Compare commits

82 Commits

Author SHA1 Message Date
Michel Hollands
a6462d1ac1 Scrape agent metrics as well
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-10 15:23:05 +01:00
Michel Hollands
0d3f9a1416 Merge pull request #35 from grafana/do_not_run_ruler_for_dashboards_without_grafana
Do not create ruler for dashboards when Grafana is not enabled
2024-04-10 13:01:45 +01:00
Michel Hollands
8fa5b63db7 Also store loki_build_info as it's used in dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-10 12:06:46 +01:00
Michel Hollands
d7063da3d4 Do not create ruler for dashboards when Grafana is not enabled
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-10 11:49:49 +01:00
Edward Welch
e7f28a261e change conditionals around how dashboards are installed
add ingress for grafana
reduce some variables and reuse existing
2024-04-06 15:36:46 +00:00
Michel Hollands
509a32bc59 Merge pull request #34 from grafana/filter_out_metrics
Filter out metrics not in list
2024-04-04 13:15:43 +01:00
Michel Hollands
6bb31ad5e0 Filter out metrics not in list
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-04 11:19:34 +01:00
Michel Hollands
7724d9c928 Merge pull request #32 from grafana/update_range_vector_in_recording_rules
Use 5m instead 1m range
2024-04-03 15:33:47 +01:00
Michel Hollands
13294675fe Merge pull request #33 from grafana/only_scrape_http_ports
Only scrape http ports
2024-04-03 14:20:54 +01:00
Michel Hollands
bf71def2f8 create separate discovery.kubernetes for metrics
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-03 14:12:53 +01:00
Michel Hollands
b37fa4adf5 Fix rebase
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-03 12:46:34 +01:00
Michel Hollands
18a5face81 cleanup comments
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-03 12:45:27 +01:00
Michel Hollands
5e908f796c add extra filter in prometheus scrape for http-metrics
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-03 12:44:48 +01:00
Michel Hollands
17b52d572a Use 5m instead 1m range
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-03 10:30:10 +01:00
Michel Hollands
6eac38d4ec Merge pull request #31 from grafana/update_tempo_version
Update Tempo version
2024-04-02 16:08:21 +01:00
Michel Hollands
3706c702a1 Update Tempo version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-02 16:02:32 +01:00
Michel Hollands
28b77dab17 Merge pull request #30 from grafana/update_loki_version
Update loki version
2024-04-02 14:37:56 +01:00
Michel Hollands
9770a3e5b3 update loki version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-02 14:34:03 +01:00
Michel Hollands
6cbffd6d9d Merge pull request #29 from grafana/update_versions
Update mimir version
2024-04-02 13:47:36 +01:00
Michel Hollands
4ae23a99d2 update mimir version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-02 11:43:21 +01:00
Michel Hollands
20232e9cf3 Merge pull request #28 from grafana/use_secrets
Add secrets for credentials and endpoints
2024-04-01 14:44:05 +01:00
Michel Hollands
043a503ce7 Use the meta namespace everywhere
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-01 14:42:52 +01:00
Michel Hollands
39f50d8580 Use 1 secret with all values
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-01 13:40:05 +01:00
Michel Hollands
d9fc9e4f4e Add secret and configmap for credentials and endpoints
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-01 13:15:50 +01:00
Michel Hollands
f61913d3da Merge pull request #27 from grafana/filter_out_errors_local_loki
Filter out log lines for local Loki as well
2024-03-28 15:55:51 +00:00
Michel Hollands
c29daab64d Filter out log lines for local Loki as well
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-28 15:42:24 +00:00
Michel Hollands
d389a9f741 Merge pull request #26 from grafana/fix_logline_filtering
Fix logline filtering
2024-03-28 11:05:18 +00:00
Michel Hollands
6f5f50f901 Remove httpPort variable
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-28 11:02:13 +00:00
Michel Hollands
efea1c5054 Fix filtering of log lines
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-28 11:00:09 +00:00
Michel Hollands
b02aee6816 temp
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-28 09:46:35 +00:00
Michel Hollands
c522e3f39e temp commit
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-28 09:19:14 +00:00
Michel Hollands
e3542e472d Merge pull request #25 from grafana/scrape_grafana_agents
Also get logs and metrics for agents
2024-03-27 16:38:16 +00:00
Michel Hollands
3a138991ff Also get logs and metrics for agents
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-27 15:24:11 +00:00
Michel Hollands
cd78caab48 Merge pull request #23 from grafana/update_chart_dependencies
Update agent version and enable clustering
2024-03-27 14:43:24 +00:00
Michel Hollands
f281741de9 use a statefulset with autoscaling
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-27 14:36:09 +00:00
Michel Hollands
381ecb2c06 Merge pull request #24 from grafana/update_scrape_intervals
Remove the scrape_interval settings
2024-03-27 13:06:43 +00:00
Michel Hollands
20cdb8dcc1 Remove the scrape_interval settings
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-27 11:23:32 +00:00
Michel Hollands
019f2b7b1e Update agent version and enable clustering
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-27 10:02:40 +00:00
Michel Hollands
1bffcac5e5 Merge pull request #22 from grafana/filter_log_lines
Filter out log lines not matching list
2024-03-27 08:57:15 +00:00
Michel Hollands
d23291dc91 Filter out log lines not matching list
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-26 17:42:17 +00:00
Michel Hollands
a89ba944a3 Merge pull request #21 from grafana/remove_helm_test_step
Comment out Test Helm Chart CI step for now
2024-03-26 17:31:39 +00:00
Michel Hollands
ef05e599e6 comment out Test Helm Chart CI step for now
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-03-26 17:30:42 +00:00
Michel Hollands
a586e753da Merge pull request #18 from grafana/add_create_kind_cluster_to_ci
Add create kind cluster to ci
2023-10-19 10:52:13 +01:00
Michel Hollands
76908c1e9e Turn on cloud metrics and traces by default
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-10-19 10:47:12 +01:00
Michel Hollands
bc5cdadb9f Rename file and do not run ruler when no mimir
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-10-19 10:28:59 +01:00
Michel Hollands
687c77c0f6 Use cloud
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-10-19 09:31:59 +01:00
Michel Hollands
2a0b14ee45 Merge pull request #19 from grafana/add_30_day_retention
Use 30 days retention instead of 24 hours
2023-10-18 13:22:08 +01:00
Michel Hollands
7e06d611a7 Use 30 days retention instead of 24 hours
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-10-18 11:42:53 +01:00
Michel Hollands
f4934d6007 Remove space
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-10-18 11:35:54 +01:00
Michel Hollands
427764278c Merge pull request #17 from grafana/add_lint_to_ci
Add lint to ci
2023-08-23 12:22:17 +01:00
Michel Hollands
1093e91741 Change namespace name
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:45:57 +01:00
Michel Hollands
1ed196299b Increase timeout
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:44:31 +01:00
Michel Hollands
faa0015c11 Install locally by default
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:28:47 +01:00
Michel Hollands
53416e042c Use correct namespace
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:26:35 +01:00
Michel Hollands
d804da13f1 Add test install
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:20:20 +01:00
Michel Hollands
8c0b68fe02 Fix kind.config
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:13:16 +01:00
Michel Hollands
99bb8f13c2 Apply linting 5
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:10:36 +01:00
Michel Hollands
26ff679cbb Apply linting 4
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:06:52 +01:00
Michel Hollands
fb3e3ece1b Apply linting 3
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:05:21 +01:00
Michel Hollands
7a5358b322 Apply linting 2
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:03:29 +01:00
Michel Hollands
9c92e18efe Apply linting
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 11:02:07 +01:00
Michel Hollands
ffe220590d Update dependencies
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 10:58:54 +01:00
Michel Hollands
e3708ce3fe Add ct.yaml
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 10:44:43 +01:00
Michel Hollands
3149f4df9b Add install step
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-18 10:22:14 +01:00
Michel Hollands
86ec586917 Fix typo
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-17 10:31:56 +01:00
Michel Hollands
6cd12bee01 Add linted rule files
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-17 10:29:24 +01:00
Michel Hollands
b042b396a2 Temp checkin
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-17 10:27:42 +01:00
Michel Hollands
bcacb70e2d Merge pull request #5 from grafana/add_skip_cdrs_to_installation_step
Update readme
2023-08-16 11:07:05 +01:00
Michel Hollands
d9c3b60659 Update documentation
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-16 11:02:16 +01:00
Michel Hollands
6d091d564e Add note about CRDs
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-16 09:47:54 +01:00
Michel Hollands
8671993962 Update readme
Signed-off-by: Michel Hollands <michel.hollands@grafana.com>
2023-08-16 09:44:36 +01:00
Michel Hollands
f80c9d7c43 Merge pull request #15 from grafana/add_retention
Add retention for Loki, Mimir and Tempo
2023-08-15 16:54:41 +01:00
Michel Hollands
60853bc8b0 Merge pull request #12 from grafana/add_agent_dashboards
Add agent dashboards
2023-08-15 16:53:39 +01:00
Michel Hollands
debdd67283 Merge pull request #13 from grafana/fix_mimir_dashboards
Fix mimir dashboards
2023-08-15 16:53:15 +01:00
Michel Hollands
8bc465b2e6 Merge pull request #14 from grafana/fix_tempo_dashboards
Fix Tempo dashboards
2023-08-15 16:53:06 +01:00
Michel Hollands
18d24c39f7 Add Loki retention
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-15 15:21:24 +01:00
Michel Hollands
23d14110a0 Add 1 day retention to Tempo and Mimir
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-15 10:47:33 +01:00
Michel Hollands
092423c2b3 Fix
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-14 15:22:40 +01:00
Michel Hollands
dcbe85a37a Fix
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-14 15:18:42 +01:00
Michel Hollands
db8558982c Also for
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-14 15:16:38 +01:00
Michel Hollands
49034b9f6b Fix dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-14 15:05:29 +01:00
Michel Hollands
aa988adb47 Add agent dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2023-08-03 15:28:47 +01:00
51 changed files with 6806 additions and 676 deletions

66
.github/workflows/helm-ci.yml vendored Normal file

@@ -0,0 +1,66 @@
---
name: helm-ci
on:
  pull_request:
    paths:
      - "charts/meta-monitoring/**"
env:
  CT_CONFIGFILE: charts/meta-monitoring/ct.yaml
jobs:
  call-lint:
    name: Lint Helm Chart
    runs-on: ubuntu-latest
    steps:
      - name: Checkout Code
        uses: actions/checkout@v3
      - name: Lint Yaml
        run: make helm-lint
  # call-test:
  #   name: Test Helm Chart
  #   runs-on: ubuntu-latest
  #   steps:
  #     - name: Checkout
  #       uses: actions/checkout@v3
  #       with:
  #         fetch-depth: 0
  #     - name: Set up Helm
  #       uses: azure/setup-helm@v3
  #       with:
  #         version: v3.8.2
  #     # Python is required because `ct lint` runs Yamale (https://github.com/23andMe/Yamale) and
  #     # yamllint (https://github.com/adrienverge/yamllint) which require Python
  #     - name: Set up Python
  #       uses: actions/setup-python@v4
  #       with:
  #         python-version: 3.7
  #     - name: Set up chart-testing
  #       uses: helm/chart-testing-action@v2.4.0
  #     - name: Run chart-testing (list-changed)
  #       id: list-changed
  #       run: |
  #         changed=$(ct list-changed --config "${CT_CONFIGFILE}")
  #         if [[ -n "$changed" ]]; then
  #           echo "changed=true" >> $GITHUB_OUTPUT
  #         fi
  #     - name: Run chart-testing (lint)
  #       run: ct lint --config "${CT_CONFIGFILE}" --check-version-increment=false
  #     - name: Create kind cluster
  #       uses: helm/kind-action@v1.8.0
  #       if: steps.list-changed.outputs.changed == 'true'
  #       with:
  #         config: tools/kind.config
  #     - name: Run chart-testing (install)
  #       run: |
  #         changed=$(ct list-changed --config "${CT_CONFIGFILE}")
  #         ct install --config "${CT_CONFIGFILE}"

10
Makefile Normal file

@@ -0,0 +1,10 @@
# Adapted from https://www.thapaliya.com/en/writings/well-documented-makefiles/
.PHONY: help
help: ## Display this help and any documented user-facing targets. Other undocumented targets may be present in the Makefile.
help:
	@awk 'BEGIN {FS = ":.*##"; printf "\nUsage:\n make <target>\n\nTargets:\n"} /^[a-zA-Z_-]+:.*?##/ { printf " %-45s %s\n", $$1, $$2 }' $(MAKEFILE_LIST)

.PHONY: helm-lint
helm-lint: ## Run helm linter
	$(MAKE) -BC charts/meta-monitoring lint

@@ -3,6 +3,8 @@
 This is a meta-monitoring chart for GEL, GEM and GET. It should be installed in a
 separate namespace next to GEM, GEL or GET installations.
+Note that this is pre-production software at the moment.
 ## Preparation
 Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml).
@@ -15,29 +17,54 @@ Create a values.yaml file based on the [default one](../charts/meta-monitoring/v
 ## Local and cloud modes
-The chart has 2 modes: local and cloud. In the local mode logs, metrics and traces are sent
+The chart has 2 modes: local and cloud. In the local mode logs, metrics and/or traces are sent
 to small Loki, Mimir and Tempo installations running in the meta-monitoring namespace.
 ![local mode](docs/images/Meta%20monitoring%20local.png)
-To enable local mode set `local.enabled` to true.
+To enable local mode set `local.<logs|metrics|traces>.enabled` to true.
-In the cloud mode the logs, metrics and traces are sent to
+In the cloud mode the logs, metrics and/or traces are sent to Grafana Cloud.
 ![cloud mode](docs/images/Meta%20monitoring%20cloud.png)
-To enable cloud mode set `cloud.enabled` to true. The `endpoint`, `username` and `password` settings for your Grafana Cloud logs, metrics and traces instances have to be filled in as well.
+To enable cloud mode set `cloud.<logs|metrics|traces>.enabled` to true. The `endpoint`, `username` and `password` settings for your Grafana Cloud logs, metrics and traces instances have to be filled in as well.
 Both modes can be enabled at the same time.
 ## Installation
 ```
-helm install -n meta -f values.yaml meta ./charts/meta-monitoring
+helm install -n meta --skip-crds -f values.yaml meta ./charts/meta-monitoring
 ```
+If the platform supports CRDs the `--skip-crds` option can be removed. However the CRDs are not used by this chart.
 For more instructions including how to update the chart go to the [installation](docs/installation.md) page.
+## Supported features
+- Specify which namespaces are monitored
+- Specify if logs, metrics or traces should be enabled for cloud or local
+- Specify the cluster name used for the logs, metrics and traces
+- Specify PII regexes that are applied to logs before they are sent to Loki (cloud or local). The capture group in the regex is replaced with *****.
+- a Grafana instance is installed (when local mode is used) with the relevant datasources installed. The following dashboards are installed:
+  - logs dashboards
+  - metrics dashboards
+  - traces dashboards
+  - agent dashboards
+- Retention is set to 24 hours
+Most of these features are enabled by default. See the values.yaml file for how to enable/disable them.
+## Caveats
+- The [loki.source.kubernetes](https://grafana.com/docs/agent/latest/flow/reference/components/loki.source.kubernetes/) component of the Grafana Agent is used to scrape Kubernetes log files. This component is marked experimental at the moment.
+- This has not been tested on Openshift yet.
+- The underlying Loki, Mimir and Tempo are at the default size installed by the Helm chart. This might need changing when monitoring bigger Loki, Mimir or Tempo installations.
+- MinIO is used as storage at the moment with a limited retention. At the moment this chart cannot be used for monitoring over longer periods.
+- Agent self monitoring is not done at the moment.
 ## Developer help topics
 - [update dependencies](docs/dev_update_dependencies.md)
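
The README's feature list says that PII regexes are applied to logs before they reach Loki, and that the capture group in each regex is replaced with `*****`. This diff does not show how the chart implements that, so the following is only a minimal Python sketch of the described behaviour, not the chart's code. It assumes one capture group per regex; the function names `mask_capture_group` and `mask_all` and the sample patterns are hypothetical.

```python
import re

MASK = "*****"


def mask_capture_group(pattern: str, line: str) -> str:
    """Replace the text matched by the first capture group with MASK.

    Text matched outside the capture group is left untouched.
    """
    regex = re.compile(pattern)

    def _replace(match: re.Match) -> str:
        # No capture group, or the group did not take part in the match:
        # return the matched text unchanged.
        if match.lastindex is None or match.group(1) is None:
            return match.group(0)
        start, end = match.span(1)
        offset = match.start(0)
        whole = match.group(0)
        return whole[: start - offset] + MASK + whole[end - offset:]

    return regex.sub(_replace, line)


def mask_all(patterns, line):
    """Apply every pattern in order to a single log line."""
    for pattern in patterns:
        line = mask_capture_group(pattern, line)
    return line


# Hypothetical patterns, for illustration only.
patterns = [r"email=(\S+)", r"ssn=(\d{3}-\d{2}-\d{4})"]
print(mask_all(patterns, "login email=jane@example.com ssn=123-45-6789"))
# prints: login email=***** ssn=*****
```

Masking only the capture group, rather than the whole match, keeps the surrounding key (here `email=`) visible in the log line, which is what the README's wording suggests.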

@@ -1,18 +1,18 @@
 dependencies:
 - name: loki
   repository: https://grafana.github.io/helm-charts
-  version: 5.8.0
+  version: 5.47.2
 - name: grafana-agent
   repository: https://grafana.github.io/helm-charts
-  version: 0.15.0
+  version: 0.37.0
 - name: mimir-distributed
   repository: https://grafana.github.io/helm-charts
-  version: 4.4.1
+  version: 5.2.0
 - name: tempo-distributed
   repository: https://grafana.github.io/helm-charts
-  version: 1.4.7
+  version: 1.9.1
 - name: minio
   repository: https://charts.min.io
   version: 5.0.11
-digest: sha256:4b04084e6fe821c4d481017b2430f7c8cd782a5d60830dd3a24eb8f10a9ece09
-generated: "2023-06-29T14:25:07.247853+01:00"
+digest: sha256:7b7e62e08d9a56e63fdb12ce3fd4d1fda4887545546ac3e98c7886be714fd763
+generated: "2024-04-02T15:09:13.121195+01:00"

@@ -25,21 +25,21 @@ appVersion: "0.0.1"
 dependencies:
   - name: loki
     repository: https://grafana.github.io/helm-charts
-    version: "5.8.0"
+    version: "5.47.2"
     condition: local.logs.enabled
   - name: grafana-agent
     repository: https://grafana.github.io/helm-charts
-    version: "0.15.0"
+    version: "0.37.0"
   - name: mimir-distributed
     repository: https://grafana.github.io/helm-charts
-    version: "4.4.1"
+    version: "5.2.0"
     condition: local.metrics.enabled
   - name: tempo-distributed
     repository: https://grafana.github.io/helm-charts
-    version: "1.4.7"
+    version: "1.9.1"
     condition: local.traces.enabled
   - name: minio
     repository: https://charts.min.io
     version: "5.0.11"
     condition: local.minio.enabled

@@ -0,0 +1,7 @@
.DEFAULT_GOAL := lint
.PHONY: lint lint-yaml

lint: lint-yaml

lint-yaml:
	yamllint -c $(CURDIR)/src/.yamllint.yaml $(CURDIR)/src

Binary file not shown.

Binary file not shown.

@@ -0,0 +1,11 @@
---
remote: origin
target-branch: main
chart-dirs:
- charts
chart-repos:
- grafana=https://grafana.github.io/helm-charts
- minio=https://charts.min.io
helm-extra-args: --timeout 1200s
check-version-increment: false
validate-maintainers: false

@@ -0,0 +1,4 @@
---
rules:
  quoted-strings:
    required: true

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -0,0 +1,786 @@
{
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [ ],
"refresh": "30s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Count",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [ ],
"type": "hidden",
"unit": "short"
},
{
"alias": "Uptime",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Container",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Version",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "version",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [ ],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count by (pod, container, version) (agent_build_info{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "max by (pod, container) (time() - process_start_time_seconds{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Agent Stats",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Agent Stats",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(prometheus_target_sync_length_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])) by (pod, scrape_job) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}/{{scrape_job}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Target Sync",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (pod) (prometheus_sd_discovered_targets{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Targets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Prometheus Discovery",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_target_interval_length_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])\n/\nrate(prometheus_target_interval_length_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])\n* 1e3\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} {{interval}} configured",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Average Scrape Interval Duration",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_exceeded_sample_limit_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "exceeded sample limit: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "duplicate timestamp: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_out_of_bounds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "out of bounds: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_out_of_order_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "out of order: {{job}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Scrape failures",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (job, instance_group_name) (rate(agent_wal_samples_appended_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{job}} {{instance_group_name}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Appended Samples",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Prometheus Retrieval",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"grafana-agent-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(agent_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(agent_build_info, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "container",
"multi": true,
"name": "container",
"options": [ ],
"query": "label_values(agent_build_info, container)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": "grafana-agent-.*",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "pod",
"multi": true,
"name": "pod",
"options": [ ],
"query": "label_values(agent_build_info{container=~\"$container\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Agent",
"uid": "",
"version": 0
}
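
The diff that follows prepends `.*` to the `job`, `pod`, `persistentvolumeclaim`, and `app` matchers, presumably so that label values carrying a release-name prefix still match. The sketch below is a minimal illustration of that effect, not code from this repository. It assumes Prometheus semantics, where `=` is an exact string comparison and `=~` is a fully anchored regular expression, and it uses Python's `re` module as a stand-in for RE2. The helper `label_matches` and the label values `monitoring/compactor` and `monitoring/meta-tempo-compactor` are hypothetical.

```python
import re


def label_matches(op: str, pattern: str, value: str) -> bool:
    """Approximate a Prometheus label matcher for the '=' and '=~' operators."""
    if op == "=":
        # Exact string comparison: regex metacharacters are taken literally.
        return pattern == value
    if op == "=~":
        # Regex matchers are fully anchored, so the whole value must match.
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical job label values, without and with a release-name prefix.
plain = "monitoring/compactor"
prefixed = "monitoring/meta-tempo-compactor"

# Matcher values after $namespace and $component have been substituted.
old_pattern = "monitoring/compactor"
new_pattern = "monitoring/.*compactor"

print(label_matches("=~", old_pattern, plain))
print(label_matches("=~", old_pattern, prefixed))
print(label_matches("=~", new_pattern, plain))
print(label_matches("=~", new_pattern, prefixed))
print(label_matches("=", new_pattern, prefixed))
```

Under these assumptions the old pattern matches only the unprefixed job, while the new pattern matches both. The last call is worth noting: several hunks below (the `query-frontend`, `querier`, and `ingester` panels) add `.*` but keep the `=` operator, so the selector is compared against the literal string and would likely need `=~` for the wildcard to take effect.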

@@ -161,7 +161,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "rate(go_gc_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])", "expr": "rate(go_gc_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])",
"interval": "", "interval": "",
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"refId": "A" "refId": "A"
@@ -256,7 +256,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}", "expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}",
"interval": "", "interval": "",
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"refId": "A" "refId": "A"
@@ -351,7 +351,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "go_goroutines{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}", "expr": "go_goroutines{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}",
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"refId": "A" "refId": "A"
} }
@@ -441,7 +441,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "rate(container_cpu_usage_seconds_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$component.*\", container!=\"POD\"}[$__rate_interval])", "expr": "rate(container_cpu_usage_seconds_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\".*$component.*\", container!=\"POD\"}[$__rate_interval])",
"interval": "", "interval": "",
"intervalFactor": 5, "intervalFactor": 5,
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
@@ -537,7 +537,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$component.*\", container!=\"POD\"}", "expr": "container_memory_working_set_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\".*$component.*\", container!=\"POD\"}",
"interval": "", "interval": "",
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"refId": "A" "refId": "A"
@@ -632,14 +632,14 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$component.*\"}[$__rate_interval])", "expr": "rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\".*$component.*\"}[$__rate_interval])",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "rx-{{pod}}", "legendFormat": "rx-{{pod}}",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "rate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$component.*\"}[$__rate_interval])", "expr": "rate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\".*$component.*\"}[$__rate_interval])",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "tx-{{pod}}", "legendFormat": "tx-{{pod}}",
@@ -735,7 +735,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "kubelet_volume_stats_available_bytes{cluster=\"$cluster\", namespace=\"$namespace\", persistentvolumeclaim=~\"$component.*\"}", "expr": "kubelet_volume_stats_available_bytes{cluster=\"$cluster\", namespace=\"$namespace\", persistentvolumeclaim=~\".*$component.*\"}",
"legendFormat": "{{persistentvolumeclaim}}", "legendFormat": "{{persistentvolumeclaim}}",
"refId": "A" "refId": "A"
} }
@@ -829,7 +829,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\", app=~\"$component.*\"}[$__rate_interval])", "expr": "rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\", app=~\".*$component.*\"}[$__rate_interval])",
"interval": "", "interval": "",
"legendFormat": "{{exported_pod}}", "legendFormat": "{{exported_pod}}",
"refId": "A" "refId": "A"
@@ -934,7 +934,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "tempodb_work_queue_length{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"} / tempodb_work_queue_max{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}", "expr": "tempodb_work_queue_length{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"} / tempodb_work_queue_max{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}",
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
"refId": "A" "refId": "A"
} }
@@ -1024,17 +1024,17 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(increase(tempodb_compaction_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (job)", "expr": "sum(increase(tempodb_compaction_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (job)",
"legendFormat": "compaction_err", "legendFormat": "compaction_err",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "sum(increase(tempodb_retention_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (job)", "expr": "sum(increase(tempodb_retention_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (job)",
"legendFormat": "retention_err", "legendFormat": "retention_err",
"refId": "C" "refId": "C"
}, },
{ {
"expr": "sum(increase(tempodb_blocklist_poll_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (job)", "expr": "sum(increase(tempodb_blocklist_poll_errors_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (job)",
"legendFormat": "blocklist_err", "legendFormat": "blocklist_err",
"refId": "D" "refId": "D"
} }
@@ -1124,18 +1124,18 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (le))",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempodb_blocklist_poll_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -1227,7 +1227,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "avg(tempodb_blocklist_length{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}) by (tenant)", "expr": "avg(tempodb_blocklist_length{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}) by (tenant)",
"instant": false, "instant": false,
"interval": "", "interval": "",
"legendFormat": "{{tenant}}", "legendFormat": "{{tenant}}",
@@ -1319,19 +1319,19 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/compactor\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*compactor\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/compactor\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*compactor\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/compactor\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempodb_retention_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*compactor\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -1422,13 +1422,13 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(increase(tempodb_retention_deleted_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval]))", "expr": "sum(increase(tempodb_retention_deleted_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval]))",
"interval": "", "interval": "",
"legendFormat": "deleted", "legendFormat": "deleted",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "sum(increase(tempodb_retention_marked_for_deletion_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval]))", "expr": "sum(increase(tempodb_retention_marked_for_deletion_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval]))",
"interval": "", "interval": "",
"legendFormat": "marked_for_deletion", "legendFormat": "marked_for_deletion",
"refId": "B" "refId": "B"
@@ -2049,7 +2049,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_traces_traceid\", job=\"$namespace/query-frontend\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_traces_traceid\", job=\"$namespace/.*query-frontend\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -2145,19 +2145,19 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -2253,7 +2253,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_search.*\", job=\"$namespace/query-frontend\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_search.*\", job=\"$namespace/.*query-frontend\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -2351,7 +2351,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A", "refId": "A",
@@ -2359,7 +2359,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B", "refId": "B",
@@ -2367,7 +2367,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C", "refId": "C",
@@ -2463,7 +2463,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_traces_traceid\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_traces_traceid\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -2559,19 +2559,19 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_traces_traceid\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -2667,7 +2667,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_search.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_search.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -2765,7 +2765,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A", "refId": "A",
@@ -2773,7 +2773,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B", "refId": "B",
@@ -2781,7 +2781,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C", "refId": "C",
@@ -2877,7 +2877,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_request_duration_seconds_count{route=\"/tempopb.Querier/FindTraceByID\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=\"/tempopb.Querier/FindTraceByID\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -2973,19 +2973,19 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=\"/tempopb.Querier/FindTraceByID\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -3081,7 +3081,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"/tempopb.Querier/Search.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"/tempopb.Querier/Search.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -3179,7 +3179,7 @@
"targets": [ "targets": [
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A", "refId": "A",
@@ -3187,7 +3187,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B", "refId": "B",
@@ -3195,7 +3195,7 @@
}, },
{ {
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Querier/Search.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C", "refId": "C",
@@ -3527,7 +3527,7 @@
}, },
"editorMode": "code", "editorMode": "code",
"exemplar": true, "exemplar": true,
"expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_metrics.*\", job=\"$namespace/query-frontend\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", route=~\".*api_metrics.*\", job=\"$namespace/.*query-frontend\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -3632,7 +3632,7 @@
}, },
"editorMode": "code", "editorMode": "code",
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"range": true, "range": true,
@@ -3646,7 +3646,7 @@
}, },
"editorMode": "code", "editorMode": "code",
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"range": true, "range": true,
@@ -3660,7 +3660,7 @@
}, },
"editorMode": "code", "editorMode": "code",
"exemplar": true, "exemplar": true,
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*query-frontend\", route=~\".*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"range": true, "range": true,
@@ -3763,7 +3763,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_metrics.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=~\"querier_.*api_metrics.*\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}} {{route}}", "legendFormat": "{{status_code}} {{route}}",
@@ -3866,7 +3866,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"range": true, "range": true,
@@ -3878,7 +3878,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"range": true, "range": true,
@@ -3890,7 +3890,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*querier\", route=~\"querier_.*api_metrics.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"range": true, "range": true,
@@ -3992,7 +3992,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "sum(rate(tempo_request_duration_seconds_count{route=\"/tempopb.MetricsGenerator/GetMetrics\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\"}[$__rate_interval])) by (status_code)", "expr": "sum(rate(tempo_request_duration_seconds_count{route=\"/tempopb.MetricsGenerator/GetMetrics\", cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\"}[$__rate_interval])) by (status_code)",
"hide": false, "hide": false,
"interval": "", "interval": "",
"legendFormat": "{{status_code}}", "legendFormat": "{{status_code}}",
@@ -4095,7 +4095,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"range": true, "range": true,
@@ -4107,7 +4107,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"range": true, "range": true,
@@ -4119,7 +4119,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=\"/tempopb.MetricsGenerator/GetMetrics\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"range": true, "range": true,
@@ -4437,7 +4437,7 @@
"pluginVersion": "9.0.0-d452322apre", "pluginVersion": "9.0.0-d452322apre",
"targets": [ "targets": [
{ {
"expr": "sum(increase(tempo_ingester_blocks_flushed_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\"}[1h]))", "expr": "sum(increase(tempo_ingester_blocks_flushed_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\"}[1h]))",
"interval": "", "interval": "",
"legendFormat": "{{pod}}", "legendFormat": "{{pod}}",
"refId": "A" "refId": "A"
@@ -5132,19 +5132,19 @@
"pluginVersion": "9.0.0-d452322apre", "pluginVersion": "9.0.0-d452322apre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"refId": "C" "refId": "C"
@@ -5359,7 +5359,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".99", "legendFormat": ".99",
"range": true, "range": true,
@@ -5371,7 +5371,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.9, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".9", "legendFormat": ".9",
"range": true, "range": true,
@@ -5383,7 +5383,7 @@
"uid": "cortex-ops-01" "uid": "cortex-ops-01"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))", "expr": "histogram_quantile(.5, sum(rate(tempo_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*metrics-generator\", route=~\"/tempopb.MetricsGenerator/PushSpans\"}[$__rate_interval])) by (le))",
"interval": "", "interval": "",
"legendFormat": ".5", "legendFormat": ".5",
"range": true, "range": true,
@@ -5496,7 +5496,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_memcache_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (status_code, method)", "expr": "sum(rate(tempo_memcache_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (status_code, method)",
"interval": "", "interval": "",
"legendFormat": "{{status_code}}-{{method}}", "legendFormat": "{{status_code}}-{{method}}",
"refId": "A" "refId": "A"
@@ -5590,19 +5590,19 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (method, le))", "expr": "histogram_quantile(.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (method, le))",
"interval": "", "interval": "",
"legendFormat": ".99-{{method}}", "legendFormat": ".99-{{method}}",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (method, le))", "expr": "histogram_quantile(.9, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (method, le))",
"interval": "", "interval": "",
"legendFormat": ".9-{{method}}", "legendFormat": ".9-{{method}}",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (method, le))", "expr": "histogram_quantile(.5, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (method, le))",
"interval": "", "interval": "",
"legendFormat": ".5-{{method}}", "legendFormat": ".5-{{method}}",
"refId": "C" "refId": "C"
@@ -5714,7 +5714,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempodb_backend_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (status_code, operation)", "expr": "sum(rate(tempodb_backend_request_duration_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (status_code, operation)",
"interval": "", "interval": "",
"legendFormat": "{{status_code}}-{{operation}}", "legendFormat": "{{status_code}}-{{operation}}",
"refId": "A" "refId": "A"
@@ -5808,17 +5808,17 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (operation, le))", "expr": "histogram_quantile(.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (operation, le))",
"legendFormat": ".99-{{operation}}", "legendFormat": ".99-{{operation}}",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "histogram_quantile(.9, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (operation, le))", "expr": "histogram_quantile(.9, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (operation, le))",
"legendFormat": ".9-{{operation}}", "legendFormat": ".9-{{operation}}",
"refId": "B" "refId": "B"
}, },
{ {
"expr": "histogram_quantile(.5, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}[$__rate_interval])) by (operation, le))", "expr": "histogram_quantile(.5, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}[$__rate_interval])) by (operation, le))",
"legendFormat": ".5-{{operation}}", "legendFormat": ".5-{{operation}}",
"refId": "C" "refId": "C"
} }
@@ -5934,7 +5934,7 @@
"type": "prometheus", "type": "prometheus",
"uid": "P666011C0B63BDCA4" "uid": "P666011C0B63BDCA4"
}, },
"expr": "gauge_memberlist_health_score{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}", "expr": "gauge_memberlist_health_score{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}",
"interval": "", "interval": "",
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
"refId": "A" "refId": "A"
@@ -6028,7 +6028,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "tempo_memberlist_client_cluster_node_health_score{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"}", "expr": "tempo_memberlist_client_cluster_node_health_score{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"}",
"interval": "", "interval": "",
"legendFormat": "{{instance}}", "legendFormat": "{{instance}}",
"refId": "A" "refId": "A"
@@ -6122,13 +6122,13 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "min(tempo_memberlist_client_cluster_members_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"})", "expr": "min(tempo_memberlist_client_cluster_members_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"})",
"interval": "", "interval": "",
"legendFormat": "min", "legendFormat": "min",
"refId": "A" "refId": "A"
}, },
{ {
"expr": "max(tempo_memberlist_client_cluster_members_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"})", "expr": "max(tempo_memberlist_client_cluster_members_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"})",
"interval": "", "interval": "",
"legendFormat": "max", "legendFormat": "max",
"refId": "B" "refId": "B"
@@ -6227,7 +6227,7 @@
"type": "prometheus", "type": "prometheus",
"uid": "P666011C0B63BDCA4" "uid": "P666011C0B63BDCA4"
}, },
"expr": "min(tempo_memberlist_client_kv_store_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"})", "expr": "min(tempo_memberlist_client_kv_store_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"})",
"interval": "", "interval": "",
"legendFormat": "min", "legendFormat": "min",
"refId": "A" "refId": "A"
@@ -6237,7 +6237,7 @@
"type": "prometheus", "type": "prometheus",
"uid": "P666011C0B63BDCA4" "uid": "P666011C0B63BDCA4"
}, },
"expr": "max(tempo_memberlist_client_kv_store_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/$component\"})", "expr": "max(tempo_memberlist_client_kv_store_count{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*$component\"})",
"interval": "", "interval": "",
"legendFormat": "max", "legendFormat": "max",
"refId": "B" "refId": "B"
@@ -6516,7 +6516,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempodb_compaction_objects_combined_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/compactor\"}[$__rate_interval])) by (level)", "expr": "sum(rate(tempodb_compaction_objects_combined_total{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"$namespace/.*compactor\"}[$__rate_interval])) by (level)",
"interval": "", "interval": "",
"legendFormat": "", "legendFormat": "",
"refId": "A" "refId": "A"
@@ -6608,7 +6608,7 @@
"pluginVersion": "9.0.0-d373beebpre", "pluginVersion": "9.0.0-d373beebpre",
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempodb_compaction_objects_written_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/compactor\"}[$__rate_interval])) by (level)", "expr": "sum(rate(tempodb_compaction_objects_written_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*compactor\"}[$__rate_interval])) by (level)",
"interval": "", "interval": "",
"legendFormat": "", "legendFormat": "",
"refId": "A" "refId": "A"
@@ -6701,7 +6701,7 @@
"uid": "P666011C0B63BDCA4" "uid": "P666011C0B63BDCA4"
}, },
"editorMode": "builder", "editorMode": "builder",
"expr": "sum(rate(tempodb_compaction_bytes_written_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/compactor\"}[$__rate_interval])) by (level)", "expr": "sum(rate(tempodb_compaction_bytes_written_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*compactor\"}[$__rate_interval])) by (level)",
"interval": "", "interval": "",
"legendFormat": "__auto", "legendFormat": "__auto",
"range": true, "range": true,
@@ -6795,7 +6795,7 @@
"uid": "P666011C0B63BDCA4" "uid": "P666011C0B63BDCA4"
}, },
"editorMode": "code", "editorMode": "code",
"expr": "sum(increase(tempodb_compaction_blocks_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/compactor\"}[5m])) by (level)", "expr": "sum(increase(tempodb_compaction_blocks_total{cluster=\"$cluster\", namespace=\"$namespace\", job=\"$namespace/.*compactor\"}[5m])) by (level)",
"interval": "", "interval": "",
"legendFormat": "__auto", "legendFormat": "__auto",
"range": true, "range": true,
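
A note on the matchers in the hunks above: several new-side expressions add `.*` to the `job` value while keeping the equality operator (`job="$namespace/.*querier"`), whereas others use the regex operator (`job=~"$namespace/.*$component"`). In PromQL, `=` compares the label value as a literal string and `=~` applies a fully anchored regular expression, so `.*` only acts as a wildcard under `=~`. The sketch below models that difference in Python. It is an illustration only: the `label_matches` helper and the job value `tempo/tempo-querier` are hypothetical, and Python's `re.fullmatch` stands in for Prometheus' RE2 matching.

```python
import re


def label_matches(op: str, pattern: str, value: str) -> bool:
    """Approximate PromQL label matching for the '=' and '=~' operators."""
    if op == "=":
        # Equality matcher: the pattern is compared as a literal string.
        return value == pattern
    if op == "=~":
        # Regex matcher: the pattern must match the whole label value.
        return re.fullmatch(pattern, value) is not None
    raise ValueError(f"unsupported operator: {op}")


# Hypothetical job label, as it might look when a release prefix is present.
job = "tempo/tempo-querier"
# Selector value after substituting a hypothetical $namespace of "tempo".
selector = "tempo/.*querier"

print(label_matches("=", selector, job))   # literal comparison
print(label_matches("=~", selector, job))  # anchored regex comparison
```

Under these assumptions the `=` form does not select the prefixed job, while the `=~` form does, so panels that keep `=` may return no series after the change.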

@@ -282,7 +282,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\", route=~\"api_.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\", route=~\"api_.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -369,7 +369,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",route=~\"api_.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",route=~\"api_.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -378,7 +378,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",route=~\"api_.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",route=~\"api_.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -387,7 +387,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",route=~\"api_.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",route=~\"api_.*\"}[$__interval])) by (route)", "expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",route=~\"api_.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",route=~\"api_.*\"}[$__interval])) by (route)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -492,7 +492,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\", route=~\"querier_api_.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\", route=~\"querier_api_.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -579,7 +579,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",route=~\"querier_api_.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",route=~\"querier_api_.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -588,7 +588,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",route=~\"querier_api_.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",route=~\"querier_api_.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -597,7 +597,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/querier\",route=~\"querier_api_.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\",route=~\"querier_api_.*\"}[$__interval])) by (route)", "expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",route=~\"querier_api_.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",route=~\"querier_api_.*\"}[$__interval])) by (route)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -702,7 +702,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_querier_external_endpoint_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_querier_external_endpoint_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -789,7 +789,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_querier_external_endpoint_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\"}[$__interval])) by (le,endpoint)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_querier_external_endpoint_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"}[$__interval])) by (le,endpoint)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -798,7 +798,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_querier_external_endpoint_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\"}[$__interval])) by (le,endpoint)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_querier_external_endpoint_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"}[$__interval])) by (le,endpoint)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -807,7 +807,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_querier_external_endpoint_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/querier\"}[$__interval])) by (endpoint) * 1e3 / sum(rate(tempo_querier_external_endpoint_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\"}[$__interval])) by (endpoint)", "expr": "sum(rate(tempo_querier_external_endpoint_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"}[$__interval])) by (endpoint) * 1e3 / sum(rate(tempo_querier_external_endpoint_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"}[$__interval])) by (endpoint)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -912,7 +912,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\", route=~\"/tempopb.Querier/.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\", route=~\"/tempopb.Querier/.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -999,7 +999,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1008,7 +1008,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (le,route)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (le,route)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1017,7 +1017,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (route)", "expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (route) * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Querier/.*\"}[$__interval])) by (route)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1122,7 +1122,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1209,7 +1209,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1218,7 +1218,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1227,7 +1227,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by ()", "expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",method=~\"Memcache.Get|Memcache.GetMulti\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1332,7 +1332,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\",operation=\"GET\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",operation=\"GET\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1419,7 +1419,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",operation=\"GET\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",operation=\"GET\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1428,7 +1428,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/querier\",operation=\"GET\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",operation=\"GET\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1437,7 +1437,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/querier\",operation=\"GET\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/querier\",operation=\"GET\"}[$__interval])) by ()", "expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",operation=\"GET\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\",operation=\"GET\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,


@@ -621,7 +621,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -948,7 +948,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/ingester\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1275,7 +1275,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/metrics-generator\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*metrics-generator\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1602,7 +1602,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1929,7 +1929,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/querier\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -2256,7 +2256,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/compactor\"})", "expr": "sum by(instance) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,


@@ -89,7 +89,7 @@
], ],
"targets": [ "targets": [
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\"})\n) by (limit_name)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\"})\n) by (limit_name)\n",
"format": "table", "format": "table",
"instant": true, "instant": true,
"intervalFactor": 2, "intervalFactor": 2,
@@ -198,7 +198,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_distributor_bytes_received_total{cluster=~\"$cluster\", job=~\"($namespace)/distributor\",tenant=\"$tenant\"}[$__rate_interval]))", "expr": "sum(rate(tempo_distributor_bytes_received_total{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\",tenant=\"$tenant\"}[$__rate_interval]))",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -207,7 +207,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\",limit_name=\"ingestion_rate_limit_bytes\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",limit_name=\"ingestion_rate_limit_bytes\"})\n) by (ingestion_rate_limit_bytes)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\",limit_name=\"ingestion_rate_limit_bytes\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",limit_name=\"ingestion_rate_limit_bytes\"})\n) by (ingestion_rate_limit_bytes)\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -216,7 +216,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\",limit_name=\"ingestion_burst_size_bytes\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",limit_name=\"ingestion_burst_size_bytes\"})\n) by (ingestion_burst_size_bytes)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\",limit_name=\"ingestion_burst_size_bytes\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",limit_name=\"ingestion_burst_size_bytes\"})\n) by (ingestion_burst_size_bytes)\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -303,7 +303,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_distributor_spans_received_total{cluster=~\"$cluster\", job=~\"($namespace)/distributor\",tenant=\"$tenant\"}[$__rate_interval]))", "expr": "sum(rate(tempo_distributor_spans_received_total{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\",tenant=\"$tenant\"}[$__rate_interval]))",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -312,7 +312,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_discarded_spans_total{cluster=~\"$cluster\", job=~\"($namespace)/distributor\",tenant=\"$tenant\"}[$__rate_interval])) by (reason)", "expr": "sum(rate(tempo_discarded_spans_total{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\",tenant=\"$tenant\"}[$__rate_interval])) by (reason)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -408,7 +408,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "max(tempo_ingester_live_traces{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",tenant=\"$tenant\"})", "expr": "max(tempo_ingester_live_traces{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",tenant=\"$tenant\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -417,7 +417,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\",limit_name=\"max_global_traces_per_user\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",limit_name=\"max_global_traces_per_user\"})\n) by (max_global_traces_per_user)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\",limit_name=\"max_global_traces_per_user\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",limit_name=\"max_global_traces_per_user\"})\n) by (max_global_traces_per_user)\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -426,7 +426,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\",limit_name=\"max_local_traces_per_user\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",limit_name=\"max_local_traces_per_user\"})\n) by (max_local_traces_per_user)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\",limit_name=\"max_local_traces_per_user\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",limit_name=\"max_local_traces_per_user\"})\n) by (max_local_traces_per_user)\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -525,7 +525,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_query_frontend_queries_total{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",tenant=\"$tenant\",op=\"traces\"}[$__rate_interval])) by (status)", "expr": "sum(rate(tempo_query_frontend_queries_total{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",tenant=\"$tenant\",op=\"traces\"}[$__rate_interval])) by (status)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -612,7 +612,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_query_frontend_queries_total{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\",tenant=\"$tenant\",op=\"search\"}[$__rate_interval])) by (status)", "expr": "sum(rate(tempo_query_frontend_queries_total{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\",tenant=\"$tenant\",op=\"search\"}[$__rate_interval])) by (status)",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -705,7 +705,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "avg(tempodb_blocklist_length{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",tenant=\"$tenant\"})", "expr": "avg(tempodb_blocklist_length{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",tenant=\"$tenant\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -786,7 +786,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(tempodb_compaction_outstanding_blocks{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",tenant=\"$tenant\"})\n/\ncount(tempo_build_info{cluster=~\"$cluster\", job=~\"($namespace)/compactor\"})\n", "expr": "sum(tempodb_compaction_outstanding_blocks{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",tenant=\"$tenant\"})\n/\ncount(tempo_build_info{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\"})\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -879,7 +879,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_metrics_generator_bytes_received_total{cluster=~\"$cluster\", job=~\"($namespace)/metrics-generator\",tenant=\"$tenant\"}[$__rate_interval]))", "expr": "sum(rate(tempo_metrics_generator_bytes_received_total{cluster=~\"$cluster\", job=~\"($namespace)/.*metrics-generator\",tenant=\"$tenant\"}[$__rate_interval]))",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -970,7 +970,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(tempo_metrics_generator_registry_active_series{cluster=~\"$cluster\", job=~\"($namespace)/metrics-generator\",tenant=\"$tenant\"})", "expr": "sum(tempo_metrics_generator_registry_active_series{cluster=~\"$cluster\", job=~\"($namespace)/.*metrics-generator\",tenant=\"$tenant\"})",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -979,7 +979,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",user=\"$tenant\",limit_name=\"metrics_generator_max_active_series\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",limit_name=\"metrics_generator_max_active_series\"})\n) by (metrics_generator_max_active_series)\n", "expr": "max(\n max by (cluster, namespace, limit_name) (tempo_limits_overrides{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",user=\"$tenant\",limit_name=\"metrics_generator_max_active_series\"})\n or max by (cluster, namespace, limit_name) (tempo_limits_defaults{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",limit_name=\"metrics_generator_max_active_series\"})\n) by (metrics_generator_max_active_series)\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1131,7 +1131,7 @@
"options": [ "options": [
], ],
"query": "label_values(tempodb_blocklist_length{cluster=~\"$cluster\", job=~\"($namespace)/compactor\"}, tenant)", "query": "label_values(tempodb_blocklist_length{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\"}, tenant)",
"refresh": 1, "refresh": 1,
"regex": "", "regex": "",
"sort": 2, "sort": 2,


@@ -399,7 +399,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum(rate(tempo_receiver_accepted_spans{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval]))", "expr": "sum(rate(tempo_receiver_accepted_spans{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval]))",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -408,7 +408,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_receiver_refused_spans{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval]))", "expr": "sum(rate(tempo_receiver_refused_spans{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval]))",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -495,7 +495,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_distributor_push_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_distributor_push_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -504,7 +504,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_distributor_push_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_distributor_push_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -513,7 +513,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_distributor_push_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval])) by () * 1e3 / sum(rate(tempo_distributor_push_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"}[$__interval])) by ()", "expr": "sum(rate(tempo_distributor_push_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval])) by () * 1e3 / sum(rate(tempo_distributor_push_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -618,7 +618,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\", route=~\"/tempopb.Pusher/Push.*\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -705,7 +705,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -714,7 +714,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -723,7 +723,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by () * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by ()", "expr": "sum(rate(tempo_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by () * 1e3 / sum(rate(tempo_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",route=~\"/tempopb.Pusher/Push.*\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -828,7 +828,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",method=\"Memcache.Put\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",method=\"Memcache.Put\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -915,7 +915,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -924,7 +924,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -933,7 +933,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",method=\"Memcache.Put\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",method=\"Memcache.Put\"}[$__interval])) by ()", "expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",method=\"Memcache.Put\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",method=\"Memcache.Put\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1038,7 +1038,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",operation=~\"(PUT|POST)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",operation=~\"(PUT|POST)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1125,7 +1125,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1134,7 +1134,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1143,7 +1143,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by ()", "expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*ingester\",operation=~\"(PUT|POST)\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1248,7 +1248,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",method=\"Memcache.Put\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",method=\"Memcache.Put\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1335,7 +1335,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1344,7 +1344,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempo_memcache_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",method=\"Memcache.Put\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1353,7 +1353,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",method=\"Memcache.Put\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",method=\"Memcache.Put\"}[$__interval])) by ()", "expr": "sum(rate(tempo_memcache_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",method=\"Memcache.Put\"}[$__interval])) by () * 1e3 / sum(rate(tempo_memcache_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",method=\"Memcache.Put\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1458,7 +1458,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",operation=~\"(PUT|POST)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n", "expr": "sum by (status) (\n label_replace(label_replace(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",operation=~\"(PUT|POST)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-z]+)\"))\n",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1545,7 +1545,7 @@
"steppedLine": false, "steppedLine": false,
"targets": [ "targets": [
{ {
"expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.99, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1554,7 +1554,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3", "expr": "histogram_quantile(0.50, sum(rate(tempodb_backend_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by (le,)) * 1e3",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,
@@ -1563,7 +1563,7 @@
"step": 10 "step": 10
}, },
{ {
"expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by ()", "expr": "sum(rate(tempodb_backend_request_duration_seconds_sum{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by () * 1e3 / sum(rate(tempodb_backend_request_duration_seconds_count{cluster=~\"$cluster\", job=~\"($namespace)/.*compactor\",operation=~\"(PUT|POST)\"}[$__interval])) by ()",
"format": "time_series", "format": "time_series",
"interval": "1m", "interval": "1m",
"intervalFactor": 2, "intervalFactor": 2,

@@ -1,53 +1,53 @@
groups: groups:
- name: loki_rules - name: "loki_rules"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:loki_request_duration_seconds:99quantile record: "cluster_job:loki_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:loki_request_duration_seconds:50quantile record: "cluster_job:loki_request_duration_seconds:50quantile"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, job) / sum(rate(loki_request_duration_seconds_count[1m])) - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(loki_request_duration_seconds_count[5m]))
by (cluster, job) by (cluster, job)"
record: cluster_job:loki_request_duration_seconds:avg record: "cluster_job:loki_request_duration_seconds:avg"
- expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, cluster, job) - expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job)"
record: cluster_job:loki_request_duration_seconds_bucket:sum_rate record: "cluster_job:loki_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job)"
record: cluster_job:loki_request_duration_seconds_sum:sum_rate record: "cluster_job:loki_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_count[1m])) by (cluster, job) - expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job)"
record: cluster_job:loki_request_duration_seconds_count:sum_rate record: "cluster_job:loki_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:loki_request_duration_seconds:99quantile record: "cluster_job_route:loki_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:loki_request_duration_seconds:50quantile record: "cluster_job_route:loki_request_duration_seconds:50quantile"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, job, route) - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)
/ sum(rate(loki_request_duration_seconds_count[1m])) by (cluster, job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: cluster_job_route:loki_request_duration_seconds:avg record: "cluster_job_route:loki_request_duration_seconds:avg"
- expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, cluster, job, - expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job,
route) route)"
record: cluster_job_route:loki_request_duration_seconds_bucket:sum_rate record: "cluster_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, job, route) - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)"
record: cluster_job_route:loki_request_duration_seconds_sum:sum_rate record: "cluster_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_count[1m])) by (cluster, job, route) - expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: cluster_job_route:loki_request_duration_seconds_count:sum_rate record: "cluster_job_route:loki_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:loki_request_duration_seconds:99quantile record: "cluster_namespace_job_route:loki_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:loki_request_duration_seconds:50quantile record: "cluster_namespace_job_route:loki_request_duration_seconds:50quantile"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, namespace, - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) / sum(rate(loki_request_duration_seconds_count[1m])) by (cluster, job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route) namespace, job, route)"
record: cluster_namespace_job_route:loki_request_duration_seconds:avg record: "cluster_namespace_job_route:loki_request_duration_seconds:avg"
- expr: sum(rate(loki_request_duration_seconds_bucket[1m])) by (le, cluster, namespace, - expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:loki_request_duration_seconds_bucket:sum_rate record: "cluster_namespace_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_sum[1m])) by (cluster, namespace, - expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate record: "cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(loki_request_duration_seconds_count[1m])) by (cluster, namespace, - expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate record: "cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate"

@@ -1,304 +1,299 @@
groups: groups:
- name: mimir_api_1 - name: "mimir_api_1"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_request_duration_seconds:99quantile record: "cluster_job:cortex_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_request_duration_seconds:50quantile record: "cluster_job:cortex_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, job) / sum(rate(cortex_request_duration_seconds_count[1m])) - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(cortex_request_duration_seconds_count[5m]))
by (cluster, job) by (cluster, job)"
record: cluster_job:cortex_request_duration_seconds:avg record: "cluster_job:cortex_request_duration_seconds:avg"
- expr: sum(rate(cortex_request_duration_seconds_bucket[1m])) by (le, cluster, job) - expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, job)"
record: cluster_job:cortex_request_duration_seconds_bucket:sum_rate record: "cluster_job:cortex_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_request_duration_seconds_sum:sum_rate record: "cluster_job:cortex_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job)"
record: cluster_job:cortex_request_duration_seconds_count:sum_rate record: "cluster_job:cortex_request_duration_seconds_count:sum_rate"
- name: mimir_api_2 - name: "mimir_api_2"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:cortex_request_duration_seconds:99quantile record: "cluster_job_route:cortex_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:cortex_request_duration_seconds:50quantile record: "cluster_job_route:cortex_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, job, route) - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job, route)
/ sum(rate(cortex_request_duration_seconds_count[1m])) by (cluster, job, route) / sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: cluster_job_route:cortex_request_duration_seconds:avg record: "cluster_job_route:cortex_request_duration_seconds:avg"
- expr: sum(rate(cortex_request_duration_seconds_bucket[1m])) by (le, cluster, job, - expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, job,
route) route)"
record: cluster_job_route:cortex_request_duration_seconds_bucket:sum_rate record: "cluster_job_route:cortex_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, job, route) - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job, route)"
record: cluster_job_route:cortex_request_duration_seconds_sum:sum_rate record: "cluster_job_route:cortex_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_count[1m])) by (cluster, job, route) - expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: cluster_job_route:cortex_request_duration_seconds_count:sum_rate record: "cluster_job_route:cortex_request_duration_seconds_count:sum_rate"
- name: mimir_api_3 - name: "mimir_api_3"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:cortex_request_duration_seconds:99quantile record: "cluster_namespace_job_route:cortex_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:cortex_request_duration_seconds:50quantile record: "cluster_namespace_job_route:cortex_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, namespace, - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) / sum(rate(cortex_request_duration_seconds_count[1m])) by (cluster, job, route) / sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route) namespace, job, route)"
record: cluster_namespace_job_route:cortex_request_duration_seconds:avg record: "cluster_namespace_job_route:cortex_request_duration_seconds:avg"
- expr: sum(rate(cortex_request_duration_seconds_bucket[1m])) by (le, cluster, namespace, - expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:cortex_request_duration_seconds_bucket:sum_rate record: "cluster_namespace_job_route:cortex_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_sum[1m])) by (cluster, namespace, - expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:cortex_request_duration_seconds_sum:sum_rate record: "cluster_namespace_job_route:cortex_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_request_duration_seconds_count[1m])) by (cluster, namespace, - expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route) job, route)"
record: cluster_namespace_job_route:cortex_request_duration_seconds_count:sum_rate record: "cluster_namespace_job_route:cortex_request_duration_seconds_count:sum_rate"
- name: mimir_querier_api - name: "mimir_querier_api"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_querier_request_duration_seconds:99quantile record: "cluster_job:cortex_querier_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_querier_request_duration_seconds:50quantile record: "cluster_job:cortex_querier_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job) / sum(rate(cortex_querier_request_duration_seconds_count[1m])) by (cluster, job) / sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_querier_request_duration_seconds:avg record: "cluster_job:cortex_querier_request_duration_seconds:avg"
- expr: sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
job) job)"
record: cluster_job:cortex_querier_request_duration_seconds_bucket:sum_rate record: "cluster_job:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_querier_request_duration_seconds_sum:sum_rate record: "cluster_job:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_querier_request_duration_seconds_count:sum_rate record: "cluster_job:cortex_querier_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:cortex_querier_request_duration_seconds:99quantile record: "cluster_job_route:cortex_querier_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route)) by (le, cluster, job, route))"
record: cluster_job_route:cortex_querier_request_duration_seconds:50quantile record: "cluster_job_route:cortex_querier_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job, route) / sum(rate(cortex_querier_request_duration_seconds_count[1m])) by job, route) / sum(rate(cortex_querier_request_duration_seconds_count[5m])) by
(cluster, job, route) (cluster, job, route)"
record: cluster_job_route:cortex_querier_request_duration_seconds:avg record: "cluster_job_route:cortex_querier_request_duration_seconds:avg"
- expr: sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
job, route) job, route)"
record: cluster_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate record: "cluster_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job, route) job, route)"
record: cluster_job_route:cortex_querier_request_duration_seconds_sum:sum_rate record: "cluster_job_route:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job, route) job, route)"
record: cluster_job_route:cortex_querier_request_duration_seconds_count:sum_rate record: "cluster_job_route:cortex_querier_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds:99quantile record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route)) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds:50quantile record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
namespace, job, route) / sum(rate(cortex_querier_request_duration_seconds_count[1m])) namespace, job, route) / sum(rate(cortex_querier_request_duration_seconds_count[5m]))
by (cluster, namespace, job, route) by (cluster, namespace, job, route)"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds:avg record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:avg"
- expr: sum(rate(cortex_querier_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
namespace, job, route) namespace, job, route)"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
namespace, job, route) namespace, job, route)"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds_sum:sum_rate record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_querier_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route) namespace, job, route)"
record: cluster_namespace_job_route:cortex_querier_request_duration_seconds_count:sum_rate record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_count:sum_rate"
- name: mimir_cache - name: "mimir_cache"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_memcache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_memcache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method)) by (le, cluster, job, method))"
record: cluster_job_method:cortex_memcache_request_duration_seconds:99quantile record: "cluster_job_method:cortex_memcache_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_memcache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_memcache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method)) by (le, cluster, job, method))"
record: cluster_job_method:cortex_memcache_request_duration_seconds:50quantile record: "cluster_job_method:cortex_memcache_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_memcache_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_memcache_request_duration_seconds_sum[5m])) by (cluster,
job, method) / sum(rate(cortex_memcache_request_duration_seconds_count[1m])) job, method) / sum(rate(cortex_memcache_request_duration_seconds_count[5m]))
by (cluster, job, method) by (cluster, job, method)"
record: cluster_job_method:cortex_memcache_request_duration_seconds:avg record: "cluster_job_method:cortex_memcache_request_duration_seconds:avg"
- expr: sum(rate(cortex_memcache_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_memcache_request_duration_seconds_bucket[5m])) by (le, cluster,
job, method) job, method)"
record: cluster_job_method:cortex_memcache_request_duration_seconds_bucket:sum_rate record: "cluster_job_method:cortex_memcache_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_memcache_request_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_memcache_request_duration_seconds_sum[5m])) by (cluster,
job, method) job, method)"
record: cluster_job_method:cortex_memcache_request_duration_seconds_sum:sum_rate record: "cluster_job_method:cortex_memcache_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_memcache_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_memcache_request_duration_seconds_count[5m])) by (cluster,
job, method) job, method)"
record: cluster_job_method:cortex_memcache_request_duration_seconds_count:sum_rate record: "cluster_job_method:cortex_memcache_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_cache_request_duration_seconds:99quantile record: "cluster_job:cortex_cache_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_cache_request_duration_seconds:50quantile record: "cluster_job:cortex_cache_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_cache_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job)
/ sum(rate(cortex_cache_request_duration_seconds_count[1m])) by (cluster, job) / sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster, job)"
record: cluster_job:cortex_cache_request_duration_seconds:avg record: "cluster_job:cortex_cache_request_duration_seconds:avg"
- expr: sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_cache_request_duration_seconds_bucket[5m])) by (le, cluster,
job) job)"
record: cluster_job:cortex_cache_request_duration_seconds_bucket:sum_rate record: "cluster_job:cortex_cache_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_cache_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_cache_request_duration_seconds_sum:sum_rate record: "cluster_job:cortex_cache_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_cache_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_cache_request_duration_seconds_count:sum_rate record: "cluster_job:cortex_cache_request_duration_seconds_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method)) by (le, cluster, job, method))"
record: cluster_job_method:cortex_cache_request_duration_seconds:99quantile record: "cluster_job_method:cortex_cache_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method)) by (le, cluster, job, method))"
record: cluster_job_method:cortex_cache_request_duration_seconds:50quantile record: "cluster_job_method:cortex_cache_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_cache_request_duration_seconds_sum[1m])) by (cluster, job, - expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job,
method) / sum(rate(cortex_cache_request_duration_seconds_count[1m])) by (cluster, method) / sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job, method) job, method)"
record: cluster_job_method:cortex_cache_request_duration_seconds:avg record: "cluster_job_method:cortex_cache_request_duration_seconds:avg"
- expr: sum(rate(cortex_cache_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_cache_request_duration_seconds_bucket[5m])) by (le, cluster,
job, method) job, method)"
record: cluster_job_method:cortex_cache_request_duration_seconds_bucket:sum_rate record: "cluster_job_method:cortex_cache_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_cache_request_duration_seconds_sum[1m])) by (cluster, job, - expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job,
method) method)"
record: cluster_job_method:cortex_cache_request_duration_seconds_sum:sum_rate record: "cluster_job_method:cortex_cache_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_cache_request_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job, method) job, method)"
record: cluster_job_method:cortex_cache_request_duration_seconds_count:sum_rate record: "cluster_job_method:cortex_cache_request_duration_seconds_count:sum_rate"
- name: mimir_storage - name: "mimir_storage"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_kv_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_kv_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_kv_request_duration_seconds:99quantile record: "cluster_job:cortex_kv_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_kv_request_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_kv_request_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_kv_request_duration_seconds:50quantile record: "cluster_job:cortex_kv_request_duration_seconds:50quantile"
- expr: sum(rate(cortex_kv_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_kv_request_duration_seconds_sum[5m])) by (cluster, job)
/ sum(rate(cortex_kv_request_duration_seconds_count[1m])) by (cluster, job) / sum(rate(cortex_kv_request_duration_seconds_count[5m])) by (cluster, job)"
record: cluster_job:cortex_kv_request_duration_seconds:avg record: "cluster_job:cortex_kv_request_duration_seconds:avg"
- expr: sum(rate(cortex_kv_request_duration_seconds_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_kv_request_duration_seconds_bucket[5m])) by (le, cluster,
job) job)"
record: cluster_job:cortex_kv_request_duration_seconds_bucket:sum_rate record: "cluster_job:cortex_kv_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_kv_request_duration_seconds_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_kv_request_duration_seconds_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_kv_request_duration_seconds_sum:sum_rate record: "cluster_job:cortex_kv_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_kv_request_duration_seconds_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_kv_request_duration_seconds_count[5m])) by (cluster, job)"
record: cluster_job:cortex_kv_request_duration_seconds_count:sum_rate record: "cluster_job:cortex_kv_request_duration_seconds_count:sum_rate"
- name: mimir_queries - name: "mimir_queries"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_query_frontend_retries_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_query_frontend_retries_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_query_frontend_retries:99quantile record: "cluster_job:cortex_query_frontend_retries:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_query_frontend_retries_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_query_frontend_retries_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_query_frontend_retries:50quantile record: "cluster_job:cortex_query_frontend_retries:50quantile"
- expr: sum(rate(cortex_query_frontend_retries_sum[1m])) by (cluster, job) / sum(rate(cortex_query_frontend_retries_count[1m])) - expr: "sum(rate(cortex_query_frontend_retries_sum[5m])) by (cluster, job) / sum(rate(cortex_query_frontend_retries_count[5m]))
by (cluster, job) by (cluster, job)"
record: cluster_job:cortex_query_frontend_retries:avg record: "cluster_job:cortex_query_frontend_retries:avg"
- expr: sum(rate(cortex_query_frontend_retries_bucket[1m])) by (le, cluster, job) - expr: "sum(rate(cortex_query_frontend_retries_bucket[5m])) by (le, cluster, job)"
record: cluster_job:cortex_query_frontend_retries_bucket:sum_rate record: "cluster_job:cortex_query_frontend_retries_bucket:sum_rate"
- expr: sum(rate(cortex_query_frontend_retries_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_query_frontend_retries_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_query_frontend_retries_sum:sum_rate record: "cluster_job:cortex_query_frontend_retries_sum:sum_rate"
- expr: sum(rate(cortex_query_frontend_retries_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_query_frontend_retries_count[5m])) by (cluster, job)"
record: cluster_job:cortex_query_frontend_retries_count:sum_rate record: "cluster_job:cortex_query_frontend_retries_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_query_frontend_queue_duration_seconds:99quantile record: "cluster_job:cortex_query_frontend_queue_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_query_frontend_queue_duration_seconds:50quantile record: "cluster_job:cortex_query_frontend_queue_duration_seconds:50quantile"
- expr: sum(rate(cortex_query_frontend_queue_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_sum[5m])) by (cluster,
job) / sum(rate(cortex_query_frontend_queue_duration_seconds_count[1m])) by job) / sum(rate(cortex_query_frontend_queue_duration_seconds_count[5m])) by
(cluster, job) (cluster, job)"
record: cluster_job:cortex_query_frontend_queue_duration_seconds:avg record: "cluster_job:cortex_query_frontend_queue_duration_seconds:avg"
- expr: sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[1m])) by (le, - expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m])) by (le,
cluster, job) cluster, job)"
record: cluster_job:cortex_query_frontend_queue_duration_seconds_bucket:sum_rate record: "cluster_job:cortex_query_frontend_queue_duration_seconds_bucket:sum_rate"
- expr: sum(rate(cortex_query_frontend_queue_duration_seconds_sum[1m])) by (cluster, - expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_sum[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_query_frontend_queue_duration_seconds_sum:sum_rate record: "cluster_job:cortex_query_frontend_queue_duration_seconds_sum:sum_rate"
- expr: sum(rate(cortex_query_frontend_queue_duration_seconds_count[1m])) by (cluster, - expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_count[5m])) by (cluster,
job) job)"
record: cluster_job:cortex_query_frontend_queue_duration_seconds_count:sum_rate record: "cluster_job:cortex_query_frontend_queue_duration_seconds_count:sum_rate"
- name: mimir_ingester_queries - name: "mimir_ingester_queries"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(cortex_ingester_queried_series_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_series_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_series:99quantile record: "cluster_job:cortex_ingester_queried_series:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_ingester_queried_series_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_series_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_series:50quantile record: "cluster_job:cortex_ingester_queried_series:50quantile"
- expr: sum(rate(cortex_ingester_queried_series_sum[1m])) by (cluster, job) / sum(rate(cortex_ingester_queried_series_count[1m])) - expr: "sum(rate(cortex_ingester_queried_series_sum[5m])) by (cluster, job) / sum(rate(cortex_ingester_queried_series_count[5m]))
by (cluster, job) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_series:avg record: "cluster_job:cortex_ingester_queried_series:avg"
- expr: sum(rate(cortex_ingester_queried_series_bucket[1m])) by (le, cluster, job) - expr: "sum(rate(cortex_ingester_queried_series_bucket[5m])) by (le, cluster, job)"
record: cluster_job:cortex_ingester_queried_series_bucket:sum_rate record: "cluster_job:cortex_ingester_queried_series_bucket:sum_rate"
- expr: sum(rate(cortex_ingester_queried_series_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_series_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_series_sum:sum_rate record: "cluster_job:cortex_ingester_queried_series_sum:sum_rate"
- expr: sum(rate(cortex_ingester_queried_series_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_series_count[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_series_count:sum_rate record: "cluster_job:cortex_ingester_queried_series_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_ingester_queried_samples_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_samples_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_samples:99quantile record: "cluster_job:cortex_ingester_queried_samples:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_ingester_queried_samples_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_samples_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_samples:50quantile record: "cluster_job:cortex_ingester_queried_samples:50quantile"
- expr: sum(rate(cortex_ingester_queried_samples_sum[1m])) by (cluster, job) / sum(rate(cortex_ingester_queried_samples_count[1m])) - expr: "sum(rate(cortex_ingester_queried_samples_sum[5m])) by (cluster, job) / sum(rate(cortex_ingester_queried_samples_count[5m]))
by (cluster, job) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_samples:avg record: "cluster_job:cortex_ingester_queried_samples:avg"
- expr: sum(rate(cortex_ingester_queried_samples_bucket[1m])) by (le, cluster, job) - expr: "sum(rate(cortex_ingester_queried_samples_bucket[5m])) by (le, cluster, job)"
record: cluster_job:cortex_ingester_queried_samples_bucket:sum_rate record: "cluster_job:cortex_ingester_queried_samples_bucket:sum_rate"
- expr: sum(rate(cortex_ingester_queried_samples_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_samples_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_samples_sum:sum_rate record: "cluster_job:cortex_ingester_queried_samples_sum:sum_rate"
- expr: sum(rate(cortex_ingester_queried_samples_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_samples_count[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_samples_count:sum_rate record: "cluster_job:cortex_ingester_queried_samples_count:sum_rate"
- expr: histogram_quantile(0.99, sum(rate(cortex_ingester_queried_exemplars_bucket[1m])) - expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_exemplars_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_exemplars:99quantile record: "cluster_job:cortex_ingester_queried_exemplars:99quantile"
- expr: histogram_quantile(0.50, sum(rate(cortex_ingester_queried_exemplars_bucket[1m])) - expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_exemplars_bucket[5m]))
by (le, cluster, job)) by (le, cluster, job))"
record: cluster_job:cortex_ingester_queried_exemplars:50quantile record: "cluster_job:cortex_ingester_queried_exemplars:50quantile"
- expr: sum(rate(cortex_ingester_queried_exemplars_sum[1m])) by (cluster, job) / - expr: "sum(rate(cortex_ingester_queried_exemplars_sum[5m])) by (cluster, job) /
sum(rate(cortex_ingester_queried_exemplars_count[1m])) by (cluster, job) sum(rate(cortex_ingester_queried_exemplars_count[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_exemplars:avg record: "cluster_job:cortex_ingester_queried_exemplars:avg"
- expr: sum(rate(cortex_ingester_queried_exemplars_bucket[1m])) by (le, cluster, - expr: "sum(rate(cortex_ingester_queried_exemplars_bucket[5m])) by (le, cluster,
job) job)"
record: cluster_job:cortex_ingester_queried_exemplars_bucket:sum_rate record: "cluster_job:cortex_ingester_queried_exemplars_bucket:sum_rate"
- expr: sum(rate(cortex_ingester_queried_exemplars_sum[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_exemplars_sum[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_exemplars_sum:sum_rate record: "cluster_job:cortex_ingester_queried_exemplars_sum:sum_rate"
- expr: sum(rate(cortex_ingester_queried_exemplars_count[1m])) by (cluster, job) - expr: "sum(rate(cortex_ingester_queried_exemplars_count[5m])) by (cluster, job)"
record: cluster_job:cortex_ingester_queried_exemplars_count:sum_rate record: "cluster_job:cortex_ingester_queried_exemplars_count:sum_rate"
- name: mimir_received_samples - name: "mimir_received_samples"
rules: rules:
- expr: | - expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_received_samples_total[5m]))"
sum by (cluster, namespace, job) (rate(cortex_distributor_received_samples_total[5m])) record: "cluster_namespace_job:cortex_distributor_received_samples:rate5m"
record: cluster_namespace_job:cortex_distributor_received_samples:rate5m - name: "mimir_exemplars_in"
- name: mimir_exemplars_in
rules: rules:
- expr: | - expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_exemplars_in_total[5m]))"
sum by (cluster, namespace, job) (rate(cortex_distributor_exemplars_in_total[5m])) record: "cluster_namespace_job:cortex_distributor_exemplars_in:rate5m"
record: cluster_namespace_job:cortex_distributor_exemplars_in:rate5m - name: "mimir_received_exemplars"
- name: mimir_received_exemplars
rules: rules:
- expr: | - expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_received_exemplars_total[5m]))"
sum by (cluster, namespace, job) (rate(cortex_distributor_received_exemplars_total[5m])) record: "cluster_namespace_job:cortex_distributor_received_exemplars:rate5m"
record: cluster_namespace_job:cortex_distributor_received_exemplars:rate5m - name: "mimir_exemplars_ingested"
- name: mimir_exemplars_ingested
rules: rules:
- expr: | - expr: "sum by (cluster, namespace, job) (rate(cortex_ingester_ingested_exemplars_total[5m]))"
sum by (cluster, namespace, job) (rate(cortex_ingester_ingested_exemplars_total[5m])) record: "cluster_namespace_job:cortex_ingester_ingested_exemplars:rate5m"
record: cluster_namespace_job:cortex_ingester_ingested_exemplars:rate5m - name: "mimir_exemplars_appended"
- name: mimir_exemplars_appended
rules: rules:
- expr: | - expr: "sum by (cluster, namespace, job) (rate(cortex_ingester_tsdb_exemplar_exemplars_appended_total[5m]))"
sum by (cluster, namespace, job) (rate(cortex_ingester_tsdb_exemplar_exemplars_appended_total[5m])) record: "cluster_namespace_job:cortex_ingester_tsdb_exemplar_exemplars_appended:rate5m"
record: cluster_namespace_job:cortex_ingester_tsdb_exemplar_exemplars_appended:rate5m - name: "mimir_scaling_rules"
- name: mimir_scaling_rules
rules: rules:
- expr: | - expr: |
# Convenience rule to get the number of replicas for both a deployment and a statefulset. # Convenience rule to get the number of replicas for both a deployment and a statefulset.
@@ -315,7 +310,7 @@ groups:
sum by (cluster, namespace, deployment) ( sum by (cluster, namespace, deployment) (
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*?)(?:-zone-[a-z])?") label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*?)(?:-zone-[a-z])?")
) )
record: cluster_namespace_deployment:actual_replicas:count record: "cluster_namespace_deployment:actual_replicas:count"
- expr: | - expr: |
ceil( ceil(
quantile_over_time(0.99, quantile_over_time(0.99,
@@ -326,18 +321,18 @@ groups:
/ 240000 / 240000
) )
labels: labels:
deployment: distributor deployment: "distributor"
reason: sample_rate reason: "sample_rate"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"}) sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"})
* 0.59999999999999998 / 240000 * 0.59999999999999998 / 240000
) )
labels: labels:
deployment: distributor deployment: "distributor"
reason: sample_rate_limits reason: "sample_rate_limits"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
quantile_over_time(0.99, quantile_over_time(0.99,
@@ -348,9 +343,9 @@ groups:
* 3 / 80000 * 3 / 80000
) )
labels: labels:
deployment: ingester deployment: "ingester"
reason: sample_rate reason: "sample_rate"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
quantile_over_time(0.99, quantile_over_time(0.99,
@@ -361,27 +356,27 @@ groups:
/ 1500000 / 1500000
) )
labels: labels:
deployment: ingester deployment: "ingester"
reason: active_series reason: "active_series"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="max_global_series_per_user"}) sum by (cluster, namespace) (cortex_limits_overrides{limit_name="max_global_series_per_user"})
* 3 * 0.59999999999999998 / 1500000 * 3 * 0.59999999999999998 / 1500000
) )
labels: labels:
deployment: ingester deployment: "ingester"
reason: active_series_limits reason: "active_series_limits"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"}) sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"})
* 0.59999999999999998 / 80000 * 0.59999999999999998 / 80000
) )
labels: labels:
deployment: ingester deployment: "ingester"
reason: sample_rate_limits reason: "sample_rate_limits"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
ceil( ceil(
(sum by (cluster, namespace) ( (sum by (cluster, namespace) (
@@ -393,14 +388,14 @@ groups:
) )
) )
labels: labels:
deployment: memcached deployment: "memcached"
reason: active_series reason: "active_series"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
sum by (cluster, namespace, deployment) ( sum by (cluster, namespace, deployment) (
label_replace( label_replace(
label_replace( label_replace(
sum by (cluster, namespace, pod)(rate(container_cpu_usage_seconds_total[1m])), sum by (cluster, namespace, pod)(rate(container_cpu_usage_seconds_total[5m])),
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))" "deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
), ),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it # The question mark in "(.*?)" is used to make it non-greedy, otherwise it
@@ -408,7 +403,7 @@ groups:
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?" "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
) )
) )
record: cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate record: "cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate"
- expr: | - expr: |
# Convenience rule to get the CPU request for both a deployment and a statefulset. # Convenience rule to get the CPU request for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix. # Multi-zone deployments are grouped together removing the "zone-X" suffix.
@@ -448,7 +443,7 @@ groups:
) )
) )
) )
record: cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum record: "cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum"
- expr: | - expr: |
# Jobs should be sized to their CPU usage. # Jobs should be sized to their CPU usage.
# We do this by comparing 99th percentile usage over the last 24hrs to # We do this by comparing 99th percentile usage over the last 24hrs to
@@ -461,8 +456,8 @@ groups:
cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum
) )
labels: labels:
reason: cpu_usage reason: "cpu_usage"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: | - expr: |
# Convenience rule to get the Memory utilization for both a deployment and a statefulset. # Convenience rule to get the Memory utilization for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix. # Multi-zone deployments are grouped together removing the "zone-X" suffix.
@@ -477,7 +472,7 @@ groups:
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?" "deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
) )
) )
record: cluster_namespace_deployment:container_memory_usage_bytes:sum record: "cluster_namespace_deployment:container_memory_usage_bytes:sum"
- expr: | - expr: |
# Convenience rule to get the Memory request for both a deployment and a statefulset. # Convenience rule to get the Memory request for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix. # Multi-zone deployments are grouped together removing the "zone-X" suffix.
@@ -517,7 +512,7 @@ groups:
) )
) )
) )
record: cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum record: "cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum"
- expr: | - expr: |
# Jobs should be sized to their Memory usage. # Jobs should be sized to their Memory usage.
# We do this by comparing 99th percentile usage over the last 24hrs to # We do this by comparing 99th percentile usage over the last 24hrs to
@@ -530,42 +525,31 @@ groups:
cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum
) )
labels: labels:
reason: memory_usage reason: "memory_usage"
record: cluster_namespace_deployment_reason:required_replicas:count record: "cluster_namespace_deployment_reason:required_replicas:count"
- name: mimir_alertmanager_rules - name: "mimir_alertmanager_rules"
rules: rules:
- expr: | - expr: "sum by (cluster, job, pod) (cortex_alertmanager_alerts)"
sum by (cluster, job, pod) (cortex_alertmanager_alerts) record: "cluster_job_pod:cortex_alertmanager_alerts:sum"
record: cluster_job_pod:cortex_alertmanager_alerts:sum - expr: "sum by (cluster, job, pod) (cortex_alertmanager_silences)"
- expr: | record: "cluster_job_pod:cortex_alertmanager_silences:sum"
sum by (cluster, job, pod) (cortex_alertmanager_silences) - expr: "sum by (cluster, job) (rate(cortex_alertmanager_alerts_received_total[5m]))"
record: cluster_job_pod:cortex_alertmanager_silences:sum record: "cluster_job:cortex_alertmanager_alerts_received_total:rate5m"
- expr: | - expr: "sum by (cluster, job) (rate(cortex_alertmanager_alerts_invalid_total[5m]))"
sum by (cluster, job) (rate(cortex_alertmanager_alerts_received_total[5m])) record: "cluster_job:cortex_alertmanager_alerts_invalid_total:rate5m"
record: cluster_job:cortex_alertmanager_alerts_received_total:rate5m - expr: "sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_total[5m]))"
- expr: | record: "cluster_job_integration:cortex_alertmanager_notifications_total:rate5m"
sum by (cluster, job) (rate(cortex_alertmanager_alerts_invalid_total[5m])) - expr: "sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_failed_total[5m]))"
record: cluster_job:cortex_alertmanager_alerts_invalid_total:rate5m record: "cluster_job_integration:cortex_alertmanager_notifications_failed_total:rate5m"
- expr: | - expr: "sum by (cluster, job) (rate(cortex_alertmanager_state_replication_total[5m]))"
sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_total[5m])) record: "cluster_job:cortex_alertmanager_state_replication_total:rate5m"
record: cluster_job_integration:cortex_alertmanager_notifications_total:rate5m - expr: "sum by (cluster, job) (rate(cortex_alertmanager_state_replication_failed_total[5m]))"
- expr: | record: "cluster_job:cortex_alertmanager_state_replication_failed_total:rate5m"
sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_failed_total[5m])) - expr: "sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_total[5m]))"
record: cluster_job_integration:cortex_alertmanager_notifications_failed_total:rate5m record: "cluster_job:cortex_alertmanager_partial_state_merges_total:rate5m"
- expr: | - expr: "sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_failed_total[5m]))"
sum by (cluster, job) (rate(cortex_alertmanager_state_replication_total[5m])) record: "cluster_job:cortex_alertmanager_partial_state_merges_failed_total:rate5m"
record: cluster_job:cortex_alertmanager_state_replication_total:rate5m - name: "mimir_ingester_rules"
- expr: |
sum by (cluster, job) (rate(cortex_alertmanager_state_replication_failed_total[5m]))
record: cluster_job:cortex_alertmanager_state_replication_failed_total:rate5m
- expr: |
sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_total[5m]))
record: cluster_job:cortex_alertmanager_partial_state_merges_total:rate5m
- expr: |
sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_failed_total[5m]))
record: cluster_job:cortex_alertmanager_partial_state_merges_failed_total:rate5m
- name: mimir_ingester_rules
rules: rules:
- expr: | - expr: "sum by(cluster, namespace, pod) (rate(cortex_ingester_ingested_samples_total[5m]))"
sum by(cluster, namespace, pod) (rate(cortex_ingester_ingested_samples_total[1m])) record: "cluster_namespace_pod:cortex_ingester_ingested_samples_total:rate1m"
record: cluster_namespace_pod:cortex_ingester_ingested_samples_total:rate1m

@@ -1,15 +1,15 @@
groups: groups:
- name: tempo_rules - name: "tempo_rules"
rules: rules:
- expr: histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket[1m])) by (le, cluster, namespace, job, route)) - expr: "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:tempo_request_duration_seconds:99quantile record: "cluster_namespace_job_route:tempo_request_duration_seconds:99quantile"
- expr: histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket[1m])) by (le, cluster, namespace, job, route)) - expr: "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))"
record: cluster_namespace_job_route:tempo_request_duration_seconds:50quantile record: "cluster_namespace_job_route:tempo_request_duration_seconds:50quantile"
- expr: sum(rate(tempo_request_duration_seconds_sum[1m])) by (cluster, namespace, job, route) / sum(rate(tempo_request_duration_seconds_count[1m])) by (cluster, namespace, job, route) - expr: "sum(rate(tempo_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route) / sum(rate(tempo_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)"
record: cluster_namespace_job_route:tempo_request_duration_seconds:avg record: "cluster_namespace_job_route:tempo_request_duration_seconds:avg"
- expr: sum(rate(tempo_request_duration_seconds_bucket[1m])) by (le, cluster, namespace, job, route) - expr: "sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route)"
record: cluster_namespace_job_route:tempo_request_duration_seconds_bucket:sum_rate record: "cluster_namespace_job_route:tempo_request_duration_seconds_bucket:sum_rate"
- expr: sum(rate(tempo_request_duration_seconds_sum[1m])) by (cluster, namespace, job, route) - expr: "sum(rate(tempo_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route)"
record: cluster_namespace_job_route:tempo_request_duration_seconds_sum:sum_rate record: "cluster_namespace_job_route:tempo_request_duration_seconds_sum:sum_rate"
- expr: sum(rate(tempo_request_duration_seconds_count[1m])) by (cluster, namespace, job, route) - expr: "sum(rate(tempo_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)"
record: cluster_namespace_job_route:tempo_request_duration_seconds_count:sum_rate record: "cluster_namespace_job_route:tempo_request_duration_seconds_count:sum_rate"

@@ -0,0 +1,33 @@
{{/*
Return the appropriate apiVersion for ingress.
*/}}
{{- define "ingress.apiVersion" -}}
{{- if and (.Capabilities.APIVersions.Has "networking.k8s.io/v1") (semverCompare ">= 1.19-0" .Capabilities.KubeVersion.Version) -}}
{{- print "networking.k8s.io/v1" -}}
{{- else if .Capabilities.APIVersions.Has "networking.k8s.io/v1beta1" -}}
{{- print "networking.k8s.io/v1beta1" -}}
{{- else -}}
{{- print "extensions/v1beta1" -}}
{{- end -}}
{{- end -}}
{{/*
Return if ingress is stable.
*/}}
{{- define "ingress.isStable" -}}
{{- eq (include "ingress.apiVersion" .) "networking.k8s.io/v1" -}}
{{- end -}}
{{/*
Return if ingress supports ingressClassName.
*/}}
{{- define "ingress.supportsIngressClassName" -}}
{{- or (eq (include "ingress.isStable" .) "true") (and (eq (include "ingress.apiVersion" .) "networking.k8s.io/v1beta1") (semverCompare ">= 1.18-0" .Capabilities.KubeVersion.Version)) -}}
{{- end -}}
{{/*
Return if ingress supports pathType.
*/}}
{{- define "ingress.supportsPathType" -}}
{{- or (eq (include "ingress.isStable" .) "true") (and (eq (include "ingress.apiVersion" .) "networking.k8s.io/v1beta1") (semverCompare ">= 1.18-0" .Capabilities.KubeVersion.Version)) -}}
{{- end -}}

@@ -18,10 +18,10 @@
{{- end }} {{- end }}
{{- define "agent.loki_process_targets" -}} {{- define "agent.loki_process_targets" -}}
{{- if empty .Values.logs.piiRegexes }} {{- if and (empty .Values.logs.piiRegexes) (empty .Values.logs.retain) }}
{{- include "agent.loki_write_targets" . }} {{- include "agent.loki_write_targets" . }}
{{- else }} {{- else }}
{{- printf "loki.process.PII.receiver" }} {{- printf "loki.process.filter.receiver" }}
{{- end }} {{- end }}
{{- end }} {{- end }}

@@ -8,7 +8,7 @@ data:
discovery.kubernetes "pods" { discovery.kubernetes "pods" {
role = "pod" role = "pod"
namespaces { namespaces {
own_namespace = false own_namespace = true
names = [ {{ include "agent.namespaces" . }} ] names = [ {{ include "agent.namespaces" . }} ]
} }
} }
@@ -33,22 +33,38 @@ data:
} }
rule { rule {
target_label = "cluster" target_label = "cluster"
replacement = "{{- .Values.clusterName -}}" replacement = "{{- .Values.clusterLabelValue -}}"
} }
} }
{{- if or .Values.local.logs.enabled .Values.cloud.logs.enabled }}
// Logs // Logs
{{- if or .Values.local.logs.enabled .Values.cloud.logs.enabled }} remote.kubernetes.secret "logs_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.logs.secret -}}"
}
loki.source.kubernetes "pods" { loki.source.kubernetes "pods" {
clustering {
enabled = true
}
targets = discovery.relabel.rename_meta_labels.output targets = discovery.relabel.rename_meta_labels.output
forward_to = [ {{ include "agent.loki_process_targets" . }} ] forward_to = [ {{ include "agent.loki_process_targets" . }} ]
} }
{{- if not (empty .Values.logs.piiRegexes) }} {{- if or (not (empty .Values.logs.retain)) (not (empty .Values.logs.piiRegexes)) }}
loki.process "PII" { loki.process "filter" {
forward_to = [ {{ include "agent.loki_write_targets" . }} ] forward_to = [ {{ include "agent.loki_write_targets" . }} ]
{{- if not (empty .Values.logs.retain) }}
stage.match {
selector = "{cluster=\"{{- .Values.clusterLabelValue -}}\", namespace=~\"{{- join "|" .Values.namespacesToMonitor -}}|{{- $.Release.Namespace -}}\", pod=~\"loki.*\"} !~ \"{{ join "|" .Values.logs.retain }}\""
action = "drop"
}
{{- end }}
{{- if not (empty .Values.logs.piiRegexes) }}
{{- range .Values.logs.piiRegexes }} {{- range .Values.logs.piiRegexes }}
stage.replace { stage.replace {
expression = "{{ .expression }}" expression = "{{ .expression }}"
@@ -56,26 +72,85 @@ data:
replace = "{{ .replace }}" replace = "{{ .replace }}"
} }
{{- end }} {{- end }}
{{- end }}
} }
{{- end }} {{- end }}
{{- end }} {{- end }}
{{- if or .Values.local.metrics.enabled .Values.cloud.metrics.enabled }}
// Metrics // Metrics
{{- if or .Values.local.metrics.enabled .Values.cloud.metrics.enabled }} remote.kubernetes.secret "metrics_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.metrics.secret -}}"
}
discovery.kubernetes "metric_pods" {
role = "pod"
namespaces {
own_namespace = true
names = [ {{ include "agent.namespaces" . }} ]
}
}
discovery.relabel "only_http_metrics" {
targets = discovery.kubernetes.metric_pods.targets
rule {
source_labels = ["__meta_kubernetes_namespace"]
target_label = "namespace"
}
rule {
source_labels = ["__meta_kubernetes_pod_name"]
target_label = "pod"
}
rule {
source_labels = ["__meta_kubernetes_namespace", "__meta_kubernetes_pod_label_app_kubernetes_io_name", "__meta_kubernetes_pod_label_app_kubernetes_io_component"]
separator = "/"
regex = "(.*)/(.*)/(.*)"
replacement = "${1}/${2}-${3}"
target_label = "job"
}
rule {
target_label = "cluster"
replacement = "{{- .Values.clusterLabelValue -}}"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_port_number"]
action = "drop"
regex = "9095"
}
}
prometheus.scrape "pods" { prometheus.scrape "pods" {
targets = discovery.relabel.rename_meta_labels.output clustering {
enabled = true
}
targets = discovery.relabel.only_http_metrics.output
forward_to = [ prometheus.relabel.filter.receiver ]
}
prometheus.relabel "filter" {
rule {
source_labels = ["__name__"]
regex = "({{ join "|" .Values.metrics.retain }})"
action = "keep"
}
forward_to = [ {{ include "agent.prometheus_write_targets" . }} ] forward_to = [ {{ include "agent.prometheus_write_targets" . }} ]
} }
{{- if .Values.kubeStateMetrics.enabled }} {{- if .Values.kubeStateMetrics.enabled }}
prometheus.scrape "kubeStateMetrics" { prometheus.scrape "kubeStateMetrics" {
clustering {
enabled = true
}
targets = [ { "__address__" = "{{ .Values.kubeStateMetrics.endpoint }}" } ] targets = [ { "__address__" = "{{ .Values.kubeStateMetrics.endpoint }}" } ]
forward_to = [ {{ include "agent.prometheus_write_targets" . }} ] forward_to = [ prometheus.relabel.filter.receiver ]
} }
{{- end }} {{- end }}
// cAdvisor and Kubelete metrics // cAdvisor and Kubelet metrics
// Based on https://github.com/Chewie/loutretelecom-manifests/blob/main/manifests/addons/monitoring/config.river // Based on https://github.com/Chewie/loutretelecom-manifests/blob/main/manifests/addons/monitoring/config.river
discovery.kubernetes "all_nodes" { discovery.kubernetes "all_nodes" {
role = "node" role = "node"
@@ -104,15 +179,17 @@ data:
} }
rule { rule {
target_label = "cluster" target_label = "cluster"
replacement = "{{- .Values.clusterName -}}" replacement = "{{- .Values.clusterLabelValue -}}"
} }
} }
prometheus.scrape "cadvisor" { prometheus.scrape "cadvisor" {
clustering {
enabled = true
}
targets = discovery.relabel.all_nodes.output targets = discovery.relabel.all_nodes.output
forward_to = [ {{ include "agent.prometheus_write_targets" . }} ] forward_to = [ prometheus.relabel.filter.receiver ]
scrape_interval = "15s"
metrics_path = "/metrics/cadvisor" metrics_path = "/metrics/cadvisor"
scheme = "https" scheme = "https"
@@ -123,10 +200,12 @@ data:
} }
prometheus.scrape "kubelet" { prometheus.scrape "kubelet" {
clustering {
enabled = true
}
targets = discovery.relabel.all_nodes.output targets = discovery.relabel.all_nodes.output
forward_to = [ {{ include "agent.prometheus_write_targets" . }} ] forward_to = [ prometheus.relabel.filter.receiver ]
scrape_interval = "15s"
metrics_path = "/metrics" metrics_path = "/metrics"
scheme = "https" scheme = "https"
@@ -136,18 +215,20 @@ data:
} }
} }
prometheus.exporter.unix {} prometheus.exporter.unix "promexporter" {}
prometheus.scrape "node_exporter" { prometheus.scrape "node_exporter" {
targets = prometheus.exporter.unix.targets clustering {
enabled = true
}
targets = prometheus.exporter.unix.promexporter.targets
forward_to = [prometheus.relabel.node_exporter.receiver] forward_to = [prometheus.relabel.node_exporter.receiver]
job_name = "node-exporter" job_name = "node-exporter"
scrape_interval = "15s"
} }
prometheus.relabel "node_exporter" { prometheus.relabel "node_exporter" {
forward_to = [ {{ include "agent.prometheus_write_targets" . }} ] forward_to = [ prometheus.relabel.filter.receiver ]
rule { rule {
replacement = env("HOSTNAME") replacement = env("HOSTNAME")
@@ -178,14 +259,19 @@ data:
} }
rule { rule {
target_label = "cluster" target_label = "cluster"
replacement = "{{- .Values.clusterName -}}" replacement = "{{- .Values.clusterLabelValue -}}"
} }
} }
{{- end }} {{- end }}
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }}
// Traces // Traces
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }} remote.kubernetes.secret "traces_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.traces.secret -}}"
}
// Shamelessly copied from https://github.com/grafana/intro-to-mlt/blob/main/agent/config.river // Shamelessly copied from https://github.com/grafana/intro-to-mlt/blob/main/agent/config.river
otelcol.receiver.otlp "otlp_receiver" { otelcol.receiver.otlp "otlp_receiver" {
// We don't technically need this, but it shows how to change listen address and incoming port. // We don't technically need this, but it shows how to change listen address and incoming port.
@@ -254,11 +340,10 @@ data:
{{- if .Values.cloud.logs.enabled }} {{- if .Values.cloud.logs.enabled }}
loki.write "cloud" { loki.write "cloud" {
endpoint { endpoint {
url = "{{- .Values.cloud.logs.endpoint -}}/loki/api/v1/push" url = nonsensitive(remote.kubernetes.secret.logs_credentials.data["endpoint"])
basic_auth { basic_auth {
username = "{{- .Values.cloud.logs.username -}}" username = nonsensitive(remote.kubernetes.secret.logs_credentials.data["username"])
password = "{{- .Values.cloud.logs.password -}}" password = remote.kubernetes.secret.logs_credentials.data["password"]
} }
} }
} }
@@ -267,11 +352,10 @@ data:
{{- if .Values.cloud.metrics.enabled }} {{- if .Values.cloud.metrics.enabled }}
prometheus.remote_write "cloud" { prometheus.remote_write "cloud" {
endpoint { endpoint {
url = "{{- .Values.cloud.metrics.endpoint -}}/api/prom/push" url = nonsensitive(remote.kubernetes.secret.metrics_credentials.data["endpoint"])
basic_auth { basic_auth {
username = "{{- .Values.cloud.metrics.username -}}" username = nonsensitive(remote.kubernetes.secret.metrics_credentials.data["username"])
password = "{{- .Values.cloud.metrics.password -}}" password = remote.kubernetes.secret.metrics_credentials.data["password"]
} }
} }
} }
@@ -280,13 +364,13 @@ data:
{{- if .Values.cloud.traces.enabled }} {{- if .Values.cloud.traces.enabled }}
otelcol.exporter.otlp "cloud" { otelcol.exporter.otlp "cloud" {
client { client {
endpoint = "{{- .Values.cloud.traces.endpoint -}}" endpoint = nonsensitive(remote.kubernetes.secret.traces_credentials.data["endpoint"])
auth = otelcol.auth.basic.creds.handler auth = otelcol.auth.basic.creds.handler
} }
} }
otelcol.auth.basic "creds" { otelcol.auth.basic "creds" {
username = "{{- .Values.cloud.traces.username -}}" username = nonsensitive(remote.kubernetes.secret.traces_credentials.data["username"])
password = "{{- .Values.cloud.traces.password -}}" password = remote.kubernetes.secret.traces_credentials.data["password"]
} }
{{- end }} {{- end }}
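
Review note on the `loki.process "filter"` component in the agent config above: when `logs.retain` is set, the `stage.match` block drops every line from the selected Loki pods that does not match at least one of the joined `retain` patterns. The following is a minimal Python sketch of that selection, not the agent's implementation. It assumes the line filter behaves as an unanchored regex search, and the function name and sample log lines are hypothetical.

```python
import re


def retained_lines(lines, retain):
    """Return the lines that would survive the drop stage (illustrative only)."""
    # Assumption: the joined patterns act as one unanchored regex,
    # so a line is kept if any pattern occurs anywhere in it.
    pattern = re.compile("|".join(retain))
    return [line for line in lines if pattern.search(line)]


# Hypothetical sample data; the patterns come from the default values.yaml.
retain = ["caller=metrics.go", "level=error"]
lines = [
    'level=info caller=metrics.go msg="query executed"',
    'level=error caller=push.go msg="write failed"',
    'level=info caller=push.go msg="ingest request"',
]
kept = retained_lines(lines, retain)
```

Because the entries are treated as regular expressions in this sketch, a `.` in `caller=metrics.go` matches any character. That is harmless for these defaults but worth keeping in mind when adding patterns.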

View File

@@ -0,0 +1,19 @@
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: agent-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"agent-logs-pipeline.json": |
{{ $.Files.Get "src/dashboards/agent-logs-pipeline.json" | fromJson | toJson }}
"agent-operational.json": |
{{ $.Files.Get "src/dashboards/agent-operational.json" | fromJson | toJson }}
"agent-remote-write.json": |
{{ $.Files.Get "src/dashboards/agent-remote-write.json" | fromJson | toJson }}
"agent-tracing-pipeline.json": |
{{ $.Files.Get "src/dashboards/agent-tracing-pipeline.json" | fromJson | toJson }}
"agent.json": |
{{ $.Files.Get "src/dashboards/agent.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,4 +1,4 @@
{{- if or (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled) .Values.dashboards.traces.enabled }} {{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap
@@ -80,4 +80,12 @@ data:
orgId: 1 orgId: 1
type: file type: file
{{- end }} {{- end }}
- disableDeletion: true
editable: false
folder: Agent
name: agent-1
options:
path: /var/lib/grafana/dashboards/agent-1
orgId: 1
type: file
{{- end }} {{- end }}

View File

@@ -1,4 +1,4 @@
{{- if or (or .Values.local.logs.enabled .Values.local.metrics.enabled) .Values.local.traces.enabled }} {{- if .Values.local.grafana.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -0,0 +1,57 @@
{{- if and .Values.local.grafana.enabled .Values.grafana.ingress.enabled -}}
{{- $ingressApiIsStable := eq (include "ingress.isStable" .) "true" -}}
{{- $ingressSupportsIngressClassName := eq (include "ingress.supportsIngressClassName" .) "true" -}}
{{- $ingressSupportsPathType := eq (include "ingress.supportsPathType" .) "true" -}}
apiVersion: {{ include "ingress.apiVersion" . }}
kind: Ingress
metadata:
name: grafana
namespace: {{ $.Release.Namespace }}
labels:
app: grafana
{{- range $labelKey, $labelValue := .Values.grafana.ingress.labels }}
{{ $labelKey }}: {{ $labelValue | toYaml }}
{{- end }}
{{- with .Values.grafana.ingress.annotations }}
annotations:
{{- toYaml . | nindent 4 }}
{{- end }}
spec:
{{- if and $ingressSupportsIngressClassName .Values.grafana.ingress.ingressClassName }}
ingressClassName: {{ .Values.grafana.ingress.ingressClassName }}
{{- end -}}
{{- if .Values.grafana.ingress.tls }}
tls:
{{- range .Values.grafana.ingress.tls }}
- hosts:
{{- range .hosts }}
- {{ tpl . $ | quote }}
{{- end }}
{{- with .secretName }}
secretName: {{ . }}
{{- end }}
{{- end }}
{{- end }}
rules:
{{- range .Values.grafana.ingress.hosts }}
- host: {{ tpl .host $ | quote }}
http:
paths:
{{- range .paths }}
- path: {{ .path }}
{{- if $ingressSupportsPathType }}
pathType: {{ .pathType }}
{{- end }}
backend:
{{- if $ingressApiIsStable }}
service:
name: grafana
port:
number: 3000
{{- else }}
serviceName: grafana
servicePort: 3000
{{- end }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,4 +1,4 @@
{{- if or (or .Values.local.logs.enabled .Values.local.metrics.enabled) .Values.local.traces.enabled }} {{- if .Values.local.grafana.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: PersistentVolumeClaim kind: PersistentVolumeClaim
@@ -91,6 +91,8 @@ spec:
- mountPath: /var/lib/grafana/dashboards/tempo-1 - mountPath: /var/lib/grafana/dashboards/tempo-1
name: tempo-dashboards-1 name: tempo-dashboards-1
{{- end }} {{- end }}
- mountPath: /var/lib/grafana/dashboards/agent-1
name: agent-dashboards-1
volumes: volumes:
- name: grafana-pv - name: grafana-pv
persistentVolumeClaim: persistentVolumeClaim:
@@ -131,6 +133,9 @@ spec:
configMap: configMap:
name: tempo-dashboards-1 name: tempo-dashboards-1
{{- end }} {{- end }}
- name: agent-dashboards-1
configMap:
name: agent-dashboards-1
--- ---
apiVersion: v1 apiVersion: v1

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.logs.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.logs.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.metrics.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.metrics.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.metrics.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.metrics.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.metrics.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if .Values.dashboards.traces.enabled }} {{- if and .Values.local.grafana.enabled .Values.dashboards.traces.enabled }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap

View File

@@ -1,9 +1,10 @@
{{- if or (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled) .Values.dashboards.traces.enabled }} {{- if .Values.local.grafana.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
apiVersion: apps/v1 apiVersion: apps/v1
kind: Deployment kind: Deployment
metadata: metadata:
name: meta-mimir-ruler-for-dashboards name: {{ $.Release.Namespace }}-mimir-ruler-for-dashboards
namespace: meta namespace: {{ $.Release.Namespace }}
spec: spec:
progressDeadlineSeconds: 600 progressDeadlineSeconds: 600
replicas: 1 replicas: 1
@@ -24,7 +25,7 @@ spec:
app.kubernetes.io/component: ruler-for-dashboards app.kubernetes.io/component: ruler-for-dashboards
app.kubernetes.io/instance: meta app.kubernetes.io/instance: meta
app.kubernetes.io/name: mimir app.kubernetes.io/name: mimir
namespace: meta namespace: {{ $.Release.Namespace }}
spec: spec:
containers: containers:
- args: - args:
@@ -91,8 +92,6 @@ spec:
runAsUser: 10001 runAsUser: 10001
seccompProfile: seccompProfile:
type: RuntimeDefault type: RuntimeDefault
serviceAccount: meta-mimir
serviceAccountName: meta-mimir
terminationGracePeriodSeconds: 180 terminationGracePeriodSeconds: 180
topologySpreadConstraints: topologySpreadConstraints:
- labelSelector: - labelSelector:
@@ -109,11 +108,11 @@ spec:
items: items:
- key: mimir.yaml - key: mimir.yaml
path: mimir.yaml path: mimir.yaml
name: meta-mimir-config name: {{ $.Release.Namespace }}-mimir-config
name: config name: config
- configMap: - configMap:
defaultMode: 420 defaultMode: 420
name: meta-mimir-runtime name: {{ $.Release.Namespace }}-mimir-runtime
name: runtime-config name: runtime-config
- emptyDir: {} - emptyDir: {}
name: storage name: storage
@@ -124,3 +123,4 @@ spec:
name: rules name: rules
name: rules name: rules
{{- end }} {{- end }}
{{- end }}

View File

@@ -1,4 +1,5 @@
{{- if or (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled) .Values.dashboards.traces.enabled }} {{- if .Values.local.metrics.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
--- ---
apiVersion: v1 apiVersion: v1
kind: ConfigMap kind: ConfigMap
@@ -16,3 +17,4 @@ data:
{{ ($.Files.Glob "src/rules/tempo-rules.yaml").AsConfig | indent 2 }} {{ ($.Files.Glob "src/rules/tempo-rules.yaml").AsConfig | indent 2 }}
{{- end }} {{- end }}
{{- end }} {{- end }}
{{- end }}

View File

@@ -3,20 +3,20 @@
{{- end -}} {{- end -}}
{{- if eq .Values.cloud.logs.enabled true -}} {{- if eq .Values.cloud.logs.enabled true -}}
{{- if or (empty .Values.cloud.logs.endpoint) (or (empty .Values.cloud.logs.username) (empty .Values.cloud.logs.password)) -}} {{- if empty .Values.cloud.logs.secret -}}
{{- fail "if cloud.logs is enabled then the endpoint, username and password have to be filled in" -}} {{- fail "if cloud.logs is enabled then the secret has to be filled in" -}}
{{- end -}} {{- end -}}
{{- end -}} {{- end -}}
{{- if eq .Values.cloud.metrics.enabled true -}} {{- if eq .Values.cloud.metrics.enabled true -}}
{{- if or (empty .Values.cloud.metrics.endpoint) (or (empty .Values.cloud.metrics.username) (empty .Values.cloud.metrics.password)) -}} {{- if empty .Values.cloud.metrics.secret -}}
{{- fail "if cloud.metrics is enabled then the endpoint, username and password have to be filled in" -}} {{- fail "if cloud.metrics is enabled then the secret has to be filled in" -}}
{{- end -}} {{- end -}}
{{- end -}} {{- end -}}
{{- if eq .Values.cloud.traces.enabled true -}} {{- if eq .Values.cloud.traces.enabled true -}}
{{- if or (empty .Values.cloud.traces.endpoint) (or (empty .Values.cloud.traces.username) (empty .Values.cloud.traces.password)) -}} {{- if empty .Values.cloud.traces.secret -}}
{{- fail "if cloud.traces is enabled then the endpoint, username and password have to be filled in" -}} {{- fail "if cloud.traces is enabled then the secret has to be filled in" -}}
{{- end -}} {{- end -}}
{{- end -}} {{- end -}}
@@ -37,3 +37,7 @@
{{- if empty .Values.namespacesToMonitor -}} {{- if empty .Values.namespacesToMonitor -}}
{{- fail "No namespaces have been specified in namespacesToMonitor" -}} {{- fail "No namespaces have been specified in namespacesToMonitor" -}}
{{- end -}} {{- end -}}
{{- if empty .Values.metrics.retain -}}
{{- fail "All metrics will be collected, please specify some in metrics.retain" -}}
{{- end -}}
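
Review note on the `metrics.retain` check above: the `prometheus.relabel "filter"` component builds its `keep` regex by joining this list with `|`, so the list decides which series reach the write targets. Below is a minimal Python sketch of that matching, assuming Prometheus-style relabeling where the regex is anchored at both ends. The helper name and sample metric names are hypothetical.

```python
import re


def kept_metrics(names, retain):
    """Return the metric names a keep rule would let through (illustrative only)."""
    # Assumption: relabel regexes are fully anchored, so a name must
    # match one list entry exactly rather than merely contain it.
    pattern = re.compile(r"^(?:(" + "|".join(retain) + r"))$")
    return [name for name in names if pattern.match(name)]


# Hypothetical scrape result; the retain entries are from values.yaml.
names = ["loki_build_info", "loki_build_info_extra", "go_goroutines", "up"]
kept = kept_metrics(names, ["loki_build_info", "go_goroutines"])
```

Under the same anchoring assumption, an empty list yields a pattern that matches only the empty string, so this sketch keeps nothing in that case.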

View File

@@ -1,10 +1,27 @@
# Specify the namespaces to monitor here
namespacesToMonitor: namespacesToMonitor:
- loki - loki
- mimir - mimir
- tempo - tempo
clusterName: "meta-monitoring" # TODO check if this can be derived # The name of the cluster where this will be installed
clusterLabelValue: "meta-monitoring"
# Set to true to write logs, metrics or traces to Grafana Cloud
cloud:
logs:
enabled: true
secret: "logs"
metrics:
enabled: true
secret: "metrics"
traces:
enabled: true
secret: "traces"
# Set to true for a local version of logs, metrics or traces
local: local:
grafana:
enabled: false
logs: logs:
enabled: false enabled: false
metrics: metrics:
@@ -14,32 +31,132 @@ local:
minio: minio:
enabled: false # This should be set to true if any of the previous is enabled enabled: false # This should be set to true if any of the previous is enabled
cloud: grafana:
logs: # Gateway ingress configuration
ingress:
# -- Specifies whether an ingress for the gateway should be created
enabled: true enabled: true
endpoint: # -- Ingress Class Name. MAY be required for Kubernetes versions >= 1.18
username: ingressClassName: ""
password: # -- Annotations for the gateway ingress
metrics: annotations: { }
enabled: true # -- Labels for the gateway ingress
endpoint: labels: { }
username: # -- Hosts configuration for the gateway ingress, passed through the `tpl` function to allow templating
password: hosts:
traces: - host: monitoring.example.com
enabled: true paths:
endpoint: - path: /
username: # -- pathType (e.g. ImplementationSpecific, Prefix, .. etc.) might also be required by some Ingress Controllers
password: # pathType: Prefix
# -- TLS configuration for the gateway ingress. Hosts passed through the `tpl` function to allow templating
#tls:
# - secretName: grafana-tls
# hosts:
# - monitoring.example.com
# Adding regexes here will add a stage.replace block. For more information see
# https://grafana.com/docs/agent/latest/flow/reference/components/loki.process/#stagereplace-block
logs: logs:
# Adding regexes here will add a stage.replace block for logs. For more information see
# https://grafana.com/docs/agent/latest/flow/reference/components/loki.process/#stagereplace-block
piiRegexes: piiRegexes:
# This example replaces the word after password with ***** # This example replaces the word after password with *****
# - expression: "password (\\\\S+)" # - expression: "password (\\\\S+)"
# source: "" # Empty uses the log message # source: "" # Empty uses the log message
# replace: "*****" # replace: "*****"
# The lines matching these will be kept in Loki
retain:
# This shows the queries
- caller=metrics.go
# This shows any errors
- level=error
# This shows the ingest requests and is very noisy. Uncomment to include.
# - caller=push.go
# Log lines for delete requests
- delete request for user added
- Started processing delete request
- delete request for user marked as processed
metrics:
# The list of metrics to retain for logging dashboards
retain:
- agent_config_last_load_success_timestamp_seconds
- agent_config_last_load_successful
- agent_config_load_failures_total
- container_cpu_usage_seconds_total
- container_fs_writes_bytes_total
- container_memory_working_set_bytes
- container_network_receive_bytes_total
- container_network_transmit_bytes_total
- container_spec_cpu_period
- container_spec_cpu_quota
- container_spec_memory_limit_bytes
- cortex_ingester_flush_queue_length
- go_gc_duration_seconds
- go_goroutines
- go_memstats_heap_inuse_bytes
- kubelet_volume_stats_used_bytes
- kubelet_volume_stats_capacity_bytes
- kube_persistentvolumeclaim_labels
- kube_pod_container_resource_requests
- kube_pod_container_status_last_terminated_reason
- kube_pod_container_status_restarts_total
- loki_boltdb_shipper_compact_tables_operation_duration_seconds
- loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_seconds
- loki_boltdb_shipper_retention_marker_count_total
- loki_boltdb_shipper_retention_marker_table_processed_duration_seconds_bucket
- loki_boltdb_shipper_retention_marker_table_processed_duration_seconds_count
- loki_boltdb_shipper_retention_marker_table_processed_duration_seconds_sum
- loki_boltdb_shipper_retention_marker_table_processed_total
- loki_boltdb_shipper_request_duration_seconds_bucket
- loki_boltdb_shipper_request_duration_seconds_count
- loki_boltdb_shipper_retention_sweeper_chunk_deleted_duration_seconds_count
- loki_boltdb_shipper_retention_sweeper_chunk_deleted_duration_seconds_sum
- loki_boltdb_shipper_retention_sweeper_marker_files_current
- loki_boltdb_shipper_retention_sweeper_marker_file_processing_current_time
- loki_build_info
- loki_chunk_store_index_entries_per_chunk_count
- loki_chunk_store_index_entries_per_chunk_sum
- loki_compactor_delete_requests_processed_total
- loki_compactor_delete_requests_received_total
- loki_compactor_deleted_lines
- loki_compactor_oldest_pending_delete_request_age_seconds
- loki_compactor_pending_delete_requests_count
- loki_distributor_lines_received_total
- loki_ingester_chunk_age_seconds_bucket
- loki_ingester_chunk_age_seconds_count
- loki_ingester_chunk_age_seconds_sum
- loki_ingester_chunk_bounds_hours_bucket
- loki_ingester_chunk_bounds_hours_count
- loki_ingester_chunk_bounds_hours_sum
- loki_ingester_chunk_entries_bucket
- loki_ingester_chunk_entries_count
- loki_ingester_chunk_entries_sum
- loki_ingester_chunk_size_bytes_bucket
- loki_ingester_chunk_utilization_bucket
- loki_ingester_chunk_utilization_sum
- loki_ingester_chunks_flushed_total
- loki_ingester_memory_chunks
- loki_ingester_memory_streams
- loki_request_duration_seconds_count
- loki_ruler_wal_appender_ready
- loki_ruler_wal_disk_size
- loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds
- loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds
- loki_ruler_wal_prometheus_remote_storage_samples_pending
- loki_ruler_wal_prometheus_remote_storage_samples_total
- loki_ruler_wal_samples_appended_total
- loki_ruler_wal_storage_created_series_total
- loki_write_batch_retries_total
- loki_write_dropped_bytes_total
- loki_write_dropped_entries_total
- loki_write_sent_bytes_total
- loki_write_sent_entries_total
- node_disk_read_bytes_total
- node_disk_written_bytes_total
- promtail_custom_bad_words_total
# Set enabled = true to add the default logs/metrics/traces dashboards to the local Grafana # Set enabled = true to add the default logs/metrics/traces dashboards to the local Grafana
dashboards: dashboards:
logs: logs:
@@ -63,7 +180,7 @@ kubeStateMetrics:
endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080 endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080
# The following is configuration for the dependencies. # The following is configuration for the dependencies.
# These should not be changed. # These should usually not be changed.
loki: loki:
loki: loki:
@@ -71,13 +188,22 @@ loki:
storage: storage:
type: "s3" type: "s3"
s3: s3:
endpoint: "meta-minio.meta.svc:9000"
access_key_id: rootuser
secret_access_key: rootpassword
insecure: true insecure: true
s3ForcePathStyle: true
bucketNames: bucketNames:
chunks: loki-chunks chunks: loki-chunks
ruler: loki-ruler ruler: loki-ruler
structuredConfig:
common:
storage:
s3:
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
compactor:
retention_enabled: true
limits_config:
retention_period: 30d
monitoring: monitoring:
dashboards: dashboards:
enabled: false enabled: false
@@ -96,10 +222,26 @@ loki:
grafana-agent: grafana-agent:
agent: agent:
clustering:
enabled: true
configMap: configMap:
create: false create: false
name: "agent-configmap" name: "agent-configmap"
key: 'config.river' key: 'config.river'
resources:
requests:
cpu: '1000m'
memory: '600Mi'
limits:
memory: '4Gi'
controller:
type: "statefulset"
autoscaling:
enabled: true
minReplicas: 3
maxReplicas: 30
targetMemoryUtilizationPercentage: 90
targetCPUUtilizationPercentage: 90
mimir-distributed: mimir-distributed:
minio: minio:
@@ -128,6 +270,8 @@ mimir-distributed:
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000" endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}" secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true insecure: true
limits:
compactor_blocks_retention_period: 30d
tempo-distributed: tempo-distributed:
tempo: tempo:
@@ -141,6 +285,9 @@ tempo-distributed:
access_key: "{{ .Values.global.minio.rootUser }}" access_key: "{{ .Values.global.minio.rootUser }}"
secret_key: "{{ .Values.global.minio.rootPassword }}" secret_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true insecure: true
compactor:
compaction:
block_retention: 30d
traces: traces:
otlp: otlp:
http: http:
@@ -175,4 +322,4 @@ minio:
cpu: 100m cpu: 100m
memory: 128Mi memory: 128Mi
# Changed the mc config path to '/tmp' from '/etc' as '/etc' is only writable by root and OpenShift will not permit this. # Changed the mc config path to '/tmp' from '/etc' as '/etc' is only writable by root and OpenShift will not permit this.
configPathmc: "/tmp/minio/mc/" configPathmc: "/tmp/minio/mc/"

View File

@@ -6,7 +6,26 @@
kubectl create namespace meta kubectl create namespace meta
``` ```
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). 1. Create secrets with credentials and the endpoint when sending logs, metrics or traces to Grafana Cloud.
```
kubectl create secret generic logs -n meta \
--from-literal=username=<logs username> \
--from-literal=password=<logs password> \
--from-literal=endpoint='https://logs-prod-us-central1.grafana.net/loki/api/v1/push'
kubectl create secret generic metrics -n meta \
--from-literal=username=<metrics username> \
--from-literal=password=<metrics password> \
--from-literal=endpoint='https://prometheus-us-central1.grafana.net/api/prom/push'
kubectl create secret generic traces -n meta \
--from-literal=username=<traces username> \
--from-literal=password=<traces password> \
--from-literal=endpoint='https://tempo-us-central1.grafana.net/tempo'
```
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). Fill in the names of the secrets created above as needed.
1. Install this helm chart 1. Install this helm chart

tools/kind.config Normal file
View File

@@ -0,0 +1,9 @@
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
name: meta
nodes:
- role: control-plane
- role: worker
- role: worker
- role: worker