Compare commits

108 Commits

Author SHA1 Message Date
Michel Hollands
291f680c16 Use shorter name for cluster
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 08:46:45 +01:00
Michel Hollands
1be9bc8d0a Merge pull request #118 from grafana/fix_dashboards
Fix dashboards a bit more
2024-05-13 17:04:45 +01:00
Michel Hollands
81d63a4383 Fix CPU usage of ssd querier
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 16:59:05 +01:00
Michel Hollands
333ba3a3fd Add cluster to kube state metrics
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 16:58:07 +01:00
Michel Hollands
7aa091cbf8 Merge pull request #117 from grafana/fix_dashboards
Fix dashboards
2024-05-13 14:48:45 +01:00
Michel Hollands
d309a5bc50 Fix mistakes
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:44:08 +01:00
Michel Hollands
346dd4968e Make reads-resources work for all 3 deployment modes
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:36:53 +01:00
Michel Hollands
f5c9fa0593 Update operation so it works with all types of deployment
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:07:59 +01:00
Michel Hollands
d5e8df856d Update writes dashboard work with all types
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:06:41 +01:00
Michel Hollands
2d85e7e120 Update dashboards so they work with single binary
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 10:56:35 +01:00
Michel Hollands
1a4a1ad885 Fix ruler panel
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 10:17:55 +01:00
Michel Hollands
c1ff364c29 Add missing metric in reads dashboard
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:45:37 +01:00
Michel Hollands
bd0ef0e2cc Add missing values
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:21:05 +01:00
Michel Hollands
0216163885 Add chunk reason
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:11:20 +01:00
Michel Hollands
c42718649f Fix distributor memory panel
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:03:02 +01:00
Michel Hollands
650df8217a Merge pull request #116 from grafana/fix_loki_write_endpoint
Fix local write end point
2024-05-13 08:24:18 +01:00
Michel Hollands
f7946ff713 Fix local write end point
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-12 14:32:39 +01:00
Michel Hollands
b312fc37fc Merge pull request #115 from grafana/fix_traces_forwarding
Fix local tracing pipeline
2024-05-10 15:44:03 +01:00
Michel Hollands
ad96f09600 Fix tracing pipeline
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-10 15:36:05 +01:00
Michel Hollands
090f1ef91a Merge pull request #113 from grafana/change_default_sampling_type
Suggest ratelimiting sample rate for Loki traces
2024-05-09 17:10:24 +01:00
Michel Hollands
b2957d90f0 Merge pull request #112 from grafana/update_ingress_documentation
Add docs regarding the Ingress
2024-05-09 17:10:06 +01:00
Michel Hollands
f8aea814c5 Suggest ratelimiting sample rate for Loki traces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-09 16:46:43 +01:00
Michel Hollands
91c19f07d3 Set default value for host again
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-09 16:35:46 +01:00
Michel Hollands
315b203082 Reference cloud provider docs
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-09 16:34:12 +01:00
Michel Hollands
caf4eda1be Merge pull request #111 from grafana/create_new_version
Update version
2024-05-09 09:40:54 +01:00
Michel Hollands
21ba3ebe8c Update version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-09 09:35:10 +01:00
Michel Hollands
f0a934a393 Merge pull request #109 from grafana/add_more_metrics
Add the Alloy dashboards instead of the Agent ones
2024-05-09 09:26:13 +01:00
Michel Hollands
941420b417 Merge pull request #110 from grafana/chore/update-dependencies
[dependency] Update the subcharts
2024-05-09 09:25:27 +01:00
MichelHollands
1ea10cdbfa Update dependencies
2024-05-09 07:03:08 +00:00
Michel Hollands
b99d816057 Add Alloy dashboards and metrics
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 15:59:22 +01:00
Michel Hollands
f89a6816a8 Scrape more metrics from more places
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 13:06:03 +01:00
Michel Hollands
890137e7b3 Merge pull request #108 from grafana/fix_rules
Add groups to loki-rules so they are parsed correctly
2024-05-08 11:08:42 +01:00
Michel Hollands
75395ba196 Add groups to loki-rules so they are parsed correctly
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 11:08:20 +01:00
Michel Hollands
7e3145e2eb Merge pull request #107 from grafana/remove_rules_files_mimir_tempo
Remove unused ruler files
2024-05-08 10:48:27 +01:00
Michel Hollands
232777d71a Remove unused ruler files
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 10:47:52 +01:00
Michel Hollands
d9a4d4a964 Merge pull request #106 from grafana/split_up_grafana_template
Split up Grafana yaml
2024-05-08 10:39:29 +01:00
Michel Hollands
57adbf43e2 Split up Grafana yaml
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 10:39:02 +01:00
Michel Hollands
add43ae974 Merge pull request #105 from grafana/remove_redundant_variables
Remove unused variables
2024-05-08 10:25:37 +01:00
Michel Hollands
52ec526718 Remove unused variables
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 10:24:48 +01:00
Michel Hollands
8a5ed559a2 Merge pull request #104 from grafana/fix_dependency_check
Fix name and indentation of workflow
2024-05-08 09:49:07 +01:00
Michel Hollands
188cd7e56f Fix name and indentation of workflow
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 09:46:42 +01:00
Michel Hollands
9e4dbcd44a Merge pull request #100 from grafana/combine_ci
Combine dependency updates
2024-05-08 09:40:07 +01:00
Michel Hollands
28daa27fca Merge pull request #99 from grafana/chore/update-minio
[dependency] Update the Grafana version
2024-05-08 09:38:26 +01:00
Michel Hollands
2de595baf4 Merge branch 'main' into chore/update-minio
2024-05-08 09:37:45 +01:00
Michel Hollands
95257b66d3 Merge pull request #103 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-08 09:36:02 +01:00
Michel Hollands
e9b0e57ef0 Merge pull request #95 from grafana/update_grafana
Add CI action to update Grafana version
2024-05-08 09:35:29 +01:00
Michel Hollands
03609ebb35 Merge pull request #102 from grafana/fix_alloy_config_for_traces
Fix the alloy config
2024-05-08 09:34:53 +01:00
MichelHollands
7e38d19814 Update Tempo Distributed
2024-05-08 07:03:26 +00:00
Michel Hollands
32272298d7 Fix the alloy config
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 16:35:00 +01:00
Michel Hollands
3879207e05 Merge pull request #101 from grafana/fix_minio_secret_name
Fix secret name
2024-05-07 14:40:52 +01:00
Michel Hollands
cd42da2197 Fix secret name
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 14:39:20 +01:00
Michel Hollands
56cab04af8 Merge pull request #92 from grafana/use_secret_for_minio
Use a secret for the Minio access
2024-05-07 12:37:07 +01:00
Michel Hollands
c6d0444dfa Combine dependency updates
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 11:26:32 +01:00
Michel Hollands
b99140d3f4 Merge pull request #97 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-07 11:00:53 +01:00
MichelHollands
749e271455 Update Tempo Distributed
2024-05-07 09:22:20 +00:00
MichelHollands
d938dbbfe5 Update Grafana version
2024-05-07 09:22:19 +00:00
Michel Hollands
e9125d1a9c Add corrected key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:21:42 +01:00
Michel Hollands
076685ef06 Revert key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:18:55 +01:00
Michel Hollands
b0451d626e Use $. in yaml key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:16:10 +01:00
Michel Hollands
90e949e89a Change version param
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:14:50 +01:00
Michel Hollands
06e176e720 Trim the v prefix from the released version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:11:17 +01:00
Michel Hollands
d4c886ba9d Use token from env
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:00:55 +01:00
Michel Hollands
643e73f5f1 add token
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:54:50 +01:00
Michel Hollands
7e65f3d9c9 Fix sourceid
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:46:31 +01:00
Michel Hollands
de91b4dac7 Merge pull request #96 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-07 09:22:43 +01:00
MichelHollands
9f6e52d7a1 Update Tempo Distributed
2024-05-07 08:22:14 +00:00
Michel Hollands
26e0ad0b85 Add CI action to update Grafana version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:20:51 +01:00
Michel Hollands
025bb5b0c3 Merge pull request #94 from grafana/chore/update-loki
[dependency] Update the Loki subchart
2024-05-07 08:51:44 +01:00
MichelHollands
0b31eae425 Update loki
2024-05-07 07:02:51 +00:00
Michel Hollands
ab42a96949 Update installation instructions
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:29:33 +01:00
Michel Hollands
386ff25fca Use the secret in the ruler for the dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:18:44 +01:00
Michel Hollands
c6889131a7 Use structuredConfig correctly
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:12:48 +01:00
Michel Hollands
2739bae0c0 Use correct variables
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 15:40:36 +01:00
Michel Hollands
cea8076b75 Start using a secret
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 15:38:07 +01:00
Michel Hollands
29b831ca00 Merge pull request #91 from grafana/set_step_time_to_1_minute_in_dashboards
Set the interval to 1m to match the scrape interval
2024-05-03 09:58:40 +01:00
Michel Hollands
09cf8f812c Merge pull request #90 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-03 09:58:25 +01:00
Michel Hollands
f8436a8e44 Set the interval to 1m to match the scrape interval
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 09:28:26 +01:00
MichelHollands
2b26abedbb Update Tempo Distributed
2024-05-03 07:02:32 +00:00
Michel Hollands
017c041007 Merge pull request #89 from grafana/add_extra_metrics
Add way to gather extra metrics and logs
2024-05-02 17:15:18 +01:00
Michel Hollands
e7ad1383a6 Add way to gather extra metrics and logs
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 17:14:09 +01:00
Michel Hollands
2906836eae Merge pull request #88 from grafana/more_dashboard_cleanup
Cleanup dashboards
2024-05-02 14:25:50 +01:00
Michel Hollands
c70ef27e48 Set scrape interval to 1m
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 14:24:36 +01:00
Michel Hollands
3c187def47 Fix error in query
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 14:24:09 +01:00
Michel Hollands
54eda36ec3 Cleanup dashboards we won't ship
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 13:30:45 +01:00
Michel Hollands
bc33e5a2a5 Merge pull request #83 from grafana/chore/update-loki
[dependency] Update the Loki subchart
2024-05-02 13:29:03 +01:00
Michel Hollands
31e82bbf16 Merge pull request #87 from grafana/remove_non_loki_dashboards
Remove non Loki and agent charts
2024-05-02 10:49:11 +01:00
Michel Hollands
52c1bf1778 Remove non loki and agent charts
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 10:47:58 +01:00
MichelHollands
2c5c4d8e38 Update loki
2024-05-02 07:02:34 +00:00
Michel Hollands
b6a5a3cfe3 Merge pull request #85 from grafana/open_extra_ports_for_otlp
Enabled traces from Loki
2024-05-01 17:40:35 +01:00
Michel Hollands
a01992194b Use OpenTelemetry token for traces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 17:24:38 +01:00
Michel Hollands
636b654828 Add docs on how to setup Loki
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 17:20:00 +01:00
Michel Hollands
5d553e50f6 Add config for Loki traces to OTEL
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 16:54:51 +01:00
Michel Hollands
3f200115f9 Consider the meta monitoring namespace for logs as well
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 10:16:57 +01:00
Michel Hollands
f0bdf0760d Merge pull request #82 from grafana/fix_correct_version
Update version correctly
2024-04-29 16:44:48 +01:00
Michel Hollands
314b1db19b Update version correctly
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:44:28 +01:00
Michel Hollands
b547784d54 Merge pull request #81 from grafana/update_version
Update Chart to version 0.0.2
2024-04-29 16:40:03 +01:00
Michel Hollands
af4cd1f8c0 Update to version 2
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:39:05 +01:00
Michel Hollands
116119bdc4 Merge pull request #80 from grafana/fix_logic_for_secrets
Fix a few more things
2024-04-29 16:36:24 +01:00
Michel Hollands
df794115f0 Fix a few more things
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:34:21 +01:00
Michel Hollands
c26e509f65 Merge pull request #79 from grafana/use_updated_dashboards2
Cleanup dashboards
2024-04-29 15:17:15 +01:00
Michel Hollands
95f7905e34 Cleanup dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 15:14:43 +01:00
Michel Hollands
ad1b619a33 Merge pull request #78 from grafana/chore/update-minio
[dependency] Update the Minio subchart
2024-04-29 08:35:41 +01:00
MichelHollands
446c0be743 Update minio
2024-04-29 07:02:52 +00:00
Michel Hollands
be7a32de27 Merge pull request #77 from grafana/filter_cadvisor_kubelet_metrics_on_namespace
Only get cadvisor and kubelet metrics from the required namespaces
2024-04-28 14:33:32 +01:00
Michel Hollands
e41b2f360f Merge branch 'main' into filter_cadvisor_kubelet_metrics_on_namespace
2024-04-28 14:33:07 +01:00
Michel Hollands
1cafd696c7 Merge pull request #76 from grafana/add_creation_of_dashboard
Create the mixin locally
2024-04-26 17:27:46 +01:00
Michel Hollands
c614f41d66 Only keep metrics from the monitored namespaces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 16:45:40 +01:00
Michel Hollands
1871a4ef87 Only get cadvisor and kubelet metrics from the required namespaces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 14:15:33 +01:00
88 changed files with 6725 additions and 66211 deletions

@@ -0,0 +1,30 @@
+name: Bump grafana version specified in the values.yaml
+sources:
+  latestGrafanaRelease:
+    name: Get latest grafana release on Github
+    kind: githubrelease
+    spec:
+      owner: grafana
+      repository: grafana
+      token: '{{ requiredEnv "UPDATECLI_GITHUB_TOKEN" }}'
+      versionfilter:
+        kind: latest
+    transformers:
+      - trimprefix: "v"
+conditions:
+  grafanaImagePublished:
+    name: Ensure the latest Grafana is published on DockerHub
+    kind: dockerimage
+    source-id: latestGrafanaRelease
+    spec:
+      image: "grafana/grafana"
+targets:
+  grafana:
+    name: Update Grafana version in values.yaml
+    kind: helmchart
+    spec:
+      file: values.yaml
+      key: $.grafana.version
+      name: charts/meta-monitoring
+      versionincrement: none
+    sourceid: latestGrafanaRelease
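
The `trimprefix: "v"` transformer in the manifest above strips the leading `v` from the release tag before the value is written to `$.grafana.version`. A minimal shell sketch of that single step follows. The helper name `trim_v_prefix` and the tag `v10.4.2` are hypothetical and chosen only for illustration; neither comes from this diff or from updatecli itself.

```shell
#!/bin/sh
# Illustrative helper (not part of updatecli): drop one leading "v" from a tag.
trim_v_prefix() {
  # ${1#v} removes the pattern "v" from the start of $1 if it is present;
  # a value without the prefix is returned unchanged.
  printf '%s\n' "${1#v}"
}

# Hypothetical release tag, used only for demonstration.
trim_v_prefix "v10.4.2"
```

The sketch only mirrors the string handling; fetching the release and checking the Docker image are left to updatecli.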

@@ -16,8 +16,8 @@ env:
   UPDATECLI_GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
 jobs:
-  updateLoki:
-    name: Update the Loki subchart
+  updateVersions:
+    name: Update the subcharts
     runs-on: "ubuntu-latest"
     steps:
       - name: Checkout
@@ -26,7 +26,7 @@ jobs:
       - name: Install Updatecli
         uses: updatecli/updatecli-action@v2
-      - name: Run Updatecli
+      - name: Run Updatecli for Loki
         id: update-loki
         run: |
           updatecli apply --config ${UPDATECLI_CONFIG_DIR}/loki.yaml
@@ -34,31 +34,7 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
-      - name: Create pull request
-        if: steps.update-loki.outputs.changed == 'true'
-        uses: peter-evans/create-pull-request@v5
-        with:
-          title: "[dependency] Update the Loki subchart"
-          body: "Updates the Loki subchart"
-          base: main
-          author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
-          committer: "GitHub <noreply@github.com>"
-          commit-message: Update loki
-          labels: dependencies
-          branch: chore/update-loki
-          delete-branch: true
-  updateGrafanaAlloy:
-    name: Update the Grafana Alloy subchart
-    runs-on: "ubuntu-latest"
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Install Updatecli
-        uses: updatecli/updatecli-action@v2
-      - name: Run Updatecli
+      - name: Run Updatecli for Alloy
         id: update-grafana-alloy
         run: |
           updatecli apply --config ${UPDATECLI_CONFIG_DIR}/alloy.yaml
@@ -66,31 +42,7 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
-      - name: Create pull request
-        if: steps.update-grafana-alloy.outputs.changed == 'true'
-        uses: peter-evans/create-pull-request@v5
-        with:
-          title: "[dependency] Update the Grafana Alloy subchart"
-          body: "Updates the Grafana Alloy subchart"
-          base: main
-          author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
-          committer: "GitHub <noreply@github.com>"
-          commit-message: Update Grafana Alloy
-          labels: dependencies
-          branch: chore/update-grafana-alloy
-          delete-branch: true
-  updateMimirDistributed:
-    name: Update the Mimir Distributed subchart
-    runs-on: "ubuntu-latest"
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Install Updatecli
-        uses: updatecli/updatecli-action@v2
-      - name: Run Updatecli
+      - name: Run Updatecli for Mimir
         id: update-mimir-distributed
         run: |
           updatecli apply --config ${UPDATECLI_CONFIG_DIR}/mimir-distributed.yaml
@@ -98,31 +50,7 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
-      - name: Create pull request
-        if: steps.update-mimir-distributed.outputs.changed == 'true'
-        uses: peter-evans/create-pull-request@v5
-        with:
-          title: "[dependency] Update the Mimir Distributed subchart"
-          body: "Updates the Mimir Distributed subchart"
-          base: main
-          author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
-          committer: "GitHub <noreply@github.com>"
-          commit-message: Update Mimir Distributed
-          labels: dependencies
-          branch: chore/update-mimir-distributed
-          delete-branch: true
-  updateTempoDistributed:
-    name: Update the Tempo Distributed subchart
-    runs-on: "ubuntu-latest"
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Install Updatecli
-        uses: updatecli/updatecli-action@v2
-      - name: Run Updatecli
+      - name: Run Updatecli for Tempo
         id: update-tempo-distributed
         run: |
           updatecli apply --config ${UPDATECLI_CONFIG_DIR}/tempo-distributed.yaml
@@ -130,31 +58,7 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
-      - name: Create pull request
-        if: steps.update-tempo-distributed.outputs.changed == 'true'
-        uses: peter-evans/create-pull-request@v5
-        with:
-          title: "[dependency] Update the Tempo Distributed subchart"
-          body: "Updates the tempo Distributed subchart"
-          base: main
-          author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
-          committer: "GitHub <noreply@github.com>"
-          commit-message: Update Tempo Distributed
-          labels: dependencies
-          branch: chore/update-tempo-distributed
-          delete-branch: true
-  updateMinio:
-    name: Update the Minio subchart
-    runs-on: "ubuntu-latest"
-    steps:
-      - name: Checkout
-        uses: actions/checkout@v2
-      - name: Install Updatecli
-        uses: updatecli/updatecli-action@v2
-      - name: Run Updatecli
+      - name: Run Updatecli for Minio
         id: update-minio
         run: |
           updatecli apply --config ${UPDATECLI_CONFIG_DIR}/minio.yaml
@@ -163,15 +67,47 @@ jobs:
           fi
       - name: Create pull request
-        if: steps.update-minio.outputs.changed == 'true'
+        if: steps.update-loki.outputs.changed == 'true' || steps.update-grafana-alloy.outputs.changed == 'true' || steps.update-mimir-distributed.outputs.changed == 'true' || steps.update-tempo-distributed.outputs.changed == 'true' || steps.update-minio.outputs.changed == 'true'
         uses: peter-evans/create-pull-request@v5
         with:
-          title: "[dependency] Update the Minio subchart"
-          body: "Updates the Minio subchart"
+          title: "[dependency] Update the subcharts"
+          body: "Updates the subcharts"
           base: main
           author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
           committer: "GitHub <noreply@github.com>"
-          commit-message: Update minio
+          commit-message: Update dependencies
           labels: dependencies
+          branch: chore/update-dependencies
+          delete-branch: true
+  updateGrafana:
+    name: Update the Grafana version
+    runs-on: "ubuntu-latest"
+    steps:
+      - name: Checkout
+        uses: actions/checkout@v2
+      - name: Install Updatecli
+        uses: updatecli/updatecli-action@v2
+      - name: Run Updatecli
+        id: update-grafana
+        run: |
+          updatecli apply --config ${UPDATECLI_CONFIG_DIR}/grafana.yaml
+          if ! git diff --exit-code > /dev/null; then
+            echo "changed=true" >> "${GITHUB_OUTPUT}"
+          fi
+      - name: Create pull request
+        if: steps.update-grafana.outputs.changed == 'true'
+        uses: peter-evans/create-pull-request@v5
+        with:
+          title: "[dependency] Update the Grafana version"
+          body: "Updates the Grafana version"
+          base: main
+          author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
+          committer: "GitHub <noreply@github.com>"
+          commit-message: Update Grafana version
+          labels: dependencies
           branch: chore/update-minio
           delete-branch: true
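
The combined `if:` expression on the "Create pull request" step in the workflow above opens a pull request when any of the five `changed` step outputs equals `true`. A minimal shell sketch of the same any-of check follows. The helper name `any_changed` and the flag values are hypothetical, chosen only to illustrate the logic; they are not part of the workflow.

```shell
#!/bin/sh
# Illustrative helper: succeed (exit status 0) if at least one argument is "true".
any_changed() {
  for flag in "$@"; do
    if [ "$flag" = "true" ]; then
      return 0
    fi
  done
  return 1
}

# Hypothetical outputs of the five update steps; a step that never wrote
# "changed=true" is represented here by an empty string.
if any_changed "" "" "true" "" ""; then
  echo "open pull request"
else
  echo "nothing to update"
fi
```

Since all five steps run `git diff --exit-code` against the same working tree, a change made by an earlier step is likely also reported by the later ones, so the individual flags probably do not identify which subchart changed. The any-of condition does not depend on that distinction.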

.gitignore

@@ -1 +1 @@
-production/
+.DS_Store

@@ -8,15 +8,3 @@ help:
 helm-lint: ## Run helm linter
 	$(MAKE) -BC charts/meta-monitoring lint
-MIXIN_PATH := production/loki-mixin
-MIXIN_OUT_PATH_META_MONITORING := production/loki-mixin-compiled-meta-monitoring
-mixin: ## Create our version of the mixin
-	@rm -rf $(MIXIN_PATH)
-	./scripts/clone_loki_mixin.sh
-	@rm -rf $(MIXIN_OUT_PATH_META_MONITORING) && mkdir $(MIXIN_OUT_PATH_META_MONITORING)
-	@cd $(MIXIN_PATH) && jb install
-	@mixtool generate all --output-alerts $(MIXIN_OUT_PATH_META_MONITORING)/alerts.yaml --output-rules $(MIXIN_OUT_PATH_META_MONITORING)/rules.yaml --directory $(MIXIN_OUT_PATH_META_MONITORING)/dashboards ${MIXIN_PATH}/mixin-meta-monitoring.libsonnet
-	@cp $(MIXIN_OUT_PATH_META_MONITORING)/dashboards/* charts/meta-monitoring/src/dashboards
-	@cp $(MIXIN_OUT_PATH_META_MONITORING)/rules.yaml charts/meta-monitoring/src/rules/loki-rules.yaml

@@ -19,7 +19,7 @@ In the cloud mode the logs, metrics and/or traces are sent to Grafana Cloud.
 To enable cloud mode set `cloud.<logs|metrics|traces>.enabled` to true. The `endpoint`, `username` and `password` settings for your Grafana Cloud logs, metrics and traces instances have to be filled in as well.
-Both modes can be enabled at the same time.
+Both modes can be enabled at the same time. Cloud mode is preferred.
 ## Installation
@@ -33,8 +33,6 @@ For more instructions including how to update the chart go to the [installation]
 - Specify PII regexes that are applied to logs before they are sent to Loki (cloud or local). The capture group in the regex is replaced with *****.
 - a Grafana instance is installed (when local mode is used) with the relevant datasources installed. The following dashboards are installed:
   - logs dashboards
-  - metrics dashboards
-  - traces dashboards
   - agent dashboards
 - Retention is set to 24 hours

@@ -1,7 +1,7 @@
 dependencies:
 - name: loki
   repository: https://grafana.github.io/helm-charts
-  version: 6.3.4
+  version: 6.5.1
 - name: alloy
   repository: https://grafana.github.io/helm-charts
   version: 0.1.1
@@ -10,9 +10,9 @@ dependencies:
   version: 5.3.0
 - name: tempo-distributed
   repository: https://grafana.github.io/helm-charts
-  version: 1.9.4
+  version: 1.9.9
 - name: minio
   repository: https://charts.min.io
-  version: 5.1.0
-digest: sha256:4bb2a4f62c9ebddcd64c28a94126ab3f07d319b028ea7c17ffbdf28d86b3be61
-generated: "2024-04-25T07:02:28.663945601Z"
+  version: 5.2.0
+digest: sha256:e0c7af6d328fe35f4b9a3557235f458d92225b84b1366dbb77c4626d3cdb5be9
+generated: "2024-05-09T07:02:42.911579524Z"

@@ -13,7 +13,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.1
+version: 0.0.3
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
 # follow Semantic Versioning. They should reflect the version the application is using.
@@ -22,7 +22,7 @@ appVersion: "0.0.1"
 dependencies:
   - name: loki
     repository: https://grafana.github.io/helm-charts
-    version: 6.3.4
+    version: 6.5.1
     condition: local.logs.enabled
   - name: alloy
     repository: https://grafana.github.io/helm-charts
@@ -33,9 +33,9 @@ dependencies:
     condition: local.metrics.enabled
   - name: tempo-distributed
     repository: https://grafana.github.io/helm-charts
-    version: 1.9.4
+    version: 1.9.9
     condition: local.traces.enabled
   - name: minio
     repository: https://charts.min.io
-    version: 5.1.0
+    version: 5.2.0
     condition: local.minio.enabled

Binary file not shown.

Binary file not shown.

File diff suppressed because it is too large

File diff suppressed because it is too large

File diff suppressed because it is too large

@@ -1,786 +0,0 @@
{
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"links": [ ],
"refresh": "30s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Count",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #A",
"thresholds": [ ],
"type": "hidden",
"unit": "short"
},
{
"alias": "Uptime",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value #B",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Container",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "container",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Pod",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "pod",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Version",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "version",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [ ],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "count by (pod, container, version) (agent_build_info{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A",
"step": 10
},
{
"expr": "max by (pod, container) (time() - process_start_time_seconds{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "B",
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Agent Stats",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Agent Stats",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(prometheus_target_sync_length_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])) by (pod, scrape_job) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}/{{scrape_job}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Target Sync",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (pod) (prometheus_sd_discovered_targets{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Targets",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Prometheus Discovery",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "rate(prometheus_target_interval_length_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])\n/\nrate(prometheus_target_interval_length_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m])\n* 1e3\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} {{interval}} configured",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Average Scrape Interval Duration",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_exceeded_sample_limit_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "exceeded sample limit: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_duplicate_timestamp_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "duplicate timestamp: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_out_of_bounds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "out of bounds: {{job}}",
"legendLink": null,
"step": 10
},
{
"expr": "sum by (job) (rate(prometheus_target_scrapes_sample_out_of_order_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[1m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "out of order: {{job}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Scrape failures",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by (job, instance_group_name) (rate(agent_wal_samples_appended_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"$container\"}[5m]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{job}} {{instance_group_name}}",
"legendLink": null,
"step": 10
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Appended Samples",
"tooltip": {
"shared": true,
"sort": 2,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Prometheus Retrieval",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"grafana-agent-mixin"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(agent_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(agent_build_info, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "container",
"multi": true,
"name": "container",
"options": [ ],
"query": "label_values(agent_build_info, container)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": "grafana-agent-.*",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "pod",
"multi": true,
"name": "pod",
"options": [ ],
"query": "label_values(agent_build_info{container=~\"$container\"}, pod)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "",
"title": "Agent",
"uid": "",
"version": 0
}

File diff suppressed because it is too large


@@ -0,0 +1,540 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
},
{
"datasource": "$loki_datasource",
"enable": true,
"expr": "{cluster=\"$cluster\", container=\"kube-diff-logger\"} | json | namespace_extracted=\"alloy\" | name_extracted=~\"alloy.*\"",
"iconColor": "rgba(0, 211, 255, 1)",
"instant": false,
"name": "Deployments",
"titleFormat": "{{cluster}}/{{namespace}}"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": 27,
"links": [
{
"icon": "doc",
"targetBlank": true,
"title": "Documentation",
"tooltip": "Clustering documentation",
"type": "link",
"url": "https://grafana.com/docs/alloy/latest/reference/cli/run/#clustered-mode"
},
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"alloy-mixin"
],
"targetBlank": false,
"title": "Dashboards",
"type": "dashboards"
}
],
"panels": [
{
"datasource": "${datasource}",
"fieldConfig": {
"defaults": {
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 8,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "count(cluster_node_info{cluster=\"$cluster\", namespace=\"$namespace\"})",
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A"
}
],
"title": "Nodes",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "Nodes info.\n",
"fieldConfig": {
"defaults": {
"custom": {
"align": "auto",
"cellOptions": {
"type": "auto"
},
"inspect": false
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Dashboard"
},
"properties": [
{
"id": "mappings",
"value": [
{
"options": {
"1": {
"index": 0,
"text": "Link"
}
},
"type": "value"
}
]
},
{
"id": "links",
"value": [
{
"targetBlank": false,
"title": "Detail dashboard for node",
"url": "/d/4047e755d822da63c8158cde32ae4dce/alloy-cluster-node?var-instance=${__data.fields.instance}&var-datasource=${datasource}&var-loki_datasource=${loki_datasource}&var-cluster=${cluster}&var-namespace=${namespace}"
}
]
}
]
}
]
},
"gridPos": {
"h": 9,
"w": 16,
"x": 8,
"y": 0
},
"id": 2,
"options": {
"cellHeight": "sm",
"footer": {
"countRows": false,
"fields": "",
"reducer": [
"sum"
],
"show": false
},
"showHeader": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "cluster_node_info{cluster=\"$cluster\", namespace=\"$namespace\"}",
"format": "table",
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A"
}
],
"title": "Node table",
"transformations": [
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true,
"Value": false,
"__name__": true,
"cluster": true,
"namespace": true,
"state": false
},
"indexByName": {},
"renameByName": {
"Value": "Dashboard",
"instance": "",
"state": ""
}
}
}
],
"type": "table"
},
{
"datasource": "${datasource}",
"description": "Whether the cluster state has converged.\n\nIt is normal for the cluster state to be diverged briefly as gossip events propagate. It is not normal for the cluster state to be diverged for a long period of time.\n\nThis will show one of the following:\n\n* Converged: Nodes are aware of all other nodes, with the correct states.\n* Not converged: A subset of nodes aren't aware of their peers, or don't have an updated view of peer states.\n",
"fieldConfig": {
"defaults": {
"mappings": [
{
"options": {
"1": {
"color": "red",
"index": 1,
"text": "Not converged"
}
},
"type": "value"
},
{
"options": {
"match": "null",
"result": {
"color": "green",
"index": 0,
"text": "Converged"
}
},
"type": "special"
}
],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "suffix:nodes"
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 8,
"x": 0,
"y": 9
},
"id": 3,
"options": {
"colorMode": "background",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "clamp((\n sum(stddev by (state) (cluster_node_peers{cluster=\"$cluster\", namespace=\"$namespace\"}) != 0) or\n (sum(abs(sum without (state) (cluster_node_peers{cluster=\"$cluster\", namespace=\"$namespace\"})) - scalar(count(cluster_node_info{cluster=\"$cluster\", namespace=\"$namespace\"})) != 0))\n ),\n 1, 1\n)\n",
"format": "time_series",
"instant": true,
"legendFormat": "__auto",
"range": false,
"refId": "A"
}
],
"title": "Convergence state",
"type": "stat"
},
{
"datasource": "${datasource}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "continuous-GrYlRd"
},
"custom": {
"fillOpacity": 80,
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineWidth": 0,
"spanNulls": true
},
"mappings": [
{
"options": {
"0": {
"color": "green",
"text": "Yes"
}
},
"type": "value"
},
{
"options": {
"1": {
"color": "red",
"text": "No"
}
},
"type": "value"
}
],
"max": 1,
"noValue": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 9,
"w": 16,
"x": 8,
"y": 9
},
"id": 4,
"options": {
"alignValue": "left",
"legend": {
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"mergeValues": true,
"rowHeight": 0.9,
"showValue": "auto",
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "ceil(clamp((\n sum(stddev by (state) (cluster_node_peers{cluster=\"$cluster\", namespace=\"$namespace\"})) or\n (sum(abs(sum without (state) (cluster_node_peers{cluster=\"$cluster\", namespace=\"$namespace\"})) - scalar(count(cluster_node_info{cluster=\"$cluster\", namespace=\"$namespace\"}))))\n ),\n 0, 1\n))\n",
"instant": false,
"legendFormat": "Converged",
"range": true,
"refId": "A"
}
],
"title": "Convergence state timeline",
"type": "state-timeline"
}
],
"refresh": "10s",
"schemaVersion": 39,
"tags": [
"alloy-mixin"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Mimir",
"value": "mimir_ds"
},
"hide": 0,
"includeAll": false,
"label": "Data Source",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "Loki",
"value": "loki_ds"
},
"hide": 0,
"includeAll": false,
"label": "Loki Data Source",
"multi": false,
"name": "loki_datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components, cluster)\n",
"refId": "cluster"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\"}, namespace)\n",
"refId": "namespace"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d",
"90d"
]
},
"timezone": "",
"title": "Alloy / Cluster Overview",
"uid": "",
"version": 0,
"weekStart": ""
}
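
The two convergence panels in the dashboard above share one idea: the cluster counts as converged only when every node reports the same peer count per state, and each node's peer total equals the number of nodes. The following is a minimal Python sketch of that logic, not the dashboard's implementation. The function name `is_converged` and the in-memory input `peers_by_node` are hypothetical stand-ins for the `cluster_node_peers` and `cluster_node_info` series.

```python
from statistics import pstdev
from typing import Mapping


def is_converged(peers_by_node: Mapping[str, Mapping[str, int]]) -> bool:
    """Return True when every node reports the same view of the cluster.

    peers_by_node is a hypothetical stand-in for cluster_node_peers:
    it maps an instance name to {state: peer_count}.
    """
    # Stands in for count(cluster_node_info) in the query.
    node_count = len(peers_by_node)

    # Check 1: for each state, all nodes must report the same peer count,
    # i.e. a population standard deviation of 0 across nodes.
    states = {s for counts in peers_by_node.values() for s in counts}
    for state in states:
        values = [c[state] for c in peers_by_node.values() if state in c]
        if pstdev(values) != 0:
            return False

    # Check 2: each node's peer total across states must equal the node count.
    return all(sum(c.values()) == node_count for c in peers_by_node.values())
```

An empty input returns True in this sketch, which mirrors the stat panel mapping a null result to "Converged".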


@@ -0,0 +1,970 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
},
{
"datasource": "$loki_datasource",
"enable": true,
"expr": "{cluster=\"$cluster\", container=\"kube-diff-logger\"} | json | namespace_extracted=\"alloy\" | name_extracted=~\"alloy.*\"",
"iconColor": "rgba(0, 211, 255, 1)",
"instant": false,
"name": "Deployments",
"titleFormat": "{{cluster}}/{{namespace}}"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": 28,
"links": [
{
"icon": "doc",
"targetBlank": true,
"title": "Documentation",
"tooltip": "Component controller documentation",
"type": "link",
"url": "https://grafana.com/docs/alloy/latest/concepts/component_controller/"
},
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"alloy-mixin"
],
"targetBlank": false,
"title": "Dashboards",
"type": "dashboards"
}
],
"panels": [
{
"datasource": "${datasource}",
"description": "The number of Alloy instances whose metrics are being sent and reported.\n",
"fieldConfig": {
"defaults": {
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "instances"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 10,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"colorMode": "none",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "count(alloy_component_controller_evaluating{cluster=\"$cluster\", namespace=\"$namespace\"})",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Running instances",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "Breakdown of components by health across all running instances.\n\n* Healthy: Components have been evaluated completely and are reporting themselves as healthy.\n* Unhealthy: Components either could not be evaluated or are reporting themselves as unhealthy.\n* Unknown: A component has been created but has not yet been started.\n* Exited: A component has exited. It will not return to the running state.\n\nMore information on a component's health state can be retrieved using\nthe Alloy UI.\n\nNote that components may be in a degraded state even if they report\nthemselves as healthy. Use component-specific dashboards and alerts\nto observe detailed information about the behavior of a component.\n",
"fieldConfig": {
"defaults": {
"mappings": [],
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
}
]
}
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Unhealthy"
},
"properties": [
{
"id": "thresholds",
"value": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "Unknown"
},
"properties": [
{
"id": "thresholds",
"value": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "blue",
"value": 1
}
]
}
}
]
},
{
"matcher": {
"id": "byName",
"options": "Exited"
},
"properties": [
{
"id": "thresholds",
"value": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "orange",
"value": 1
}
]
}
}
]
}
]
},
"gridPos": {
"h": 12,
"w": 14,
"x": 10,
"y": 0
},
"id": 4,
"options": {
"displayMode": "gradient",
"maxVizHeight": 300,
"minVizHeight": 16,
"minVizWidth": 8,
"namePlacement": "auto",
"orientation": "vertical",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showUnfilled": true,
"sizing": "auto",
"valueMode": "color"
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\", health_type=\"healthy\"}) or vector(0)",
"instant": true,
"legendFormat": "Healthy",
"range": false,
"refId": "A"
},
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\", health_type=\"unhealthy\"}) or vector(0)",
"instant": true,
"legendFormat": "Unhealthy",
"range": false,
"refId": "B"
},
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\", health_type=\"unknown\"}) or vector(0)",
"instant": true,
"legendFormat": "Unknown",
"range": false,
"refId": "C"
},
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\", health_type=\"exited\"}) or vector(0)",
"instant": true,
"legendFormat": "Exited",
"range": false,
"refId": "D"
}
],
"title": "Components by health",
"type": "bargauge"
},
{
"datasource": "${datasource}",
"description": "The number of running components across all running instances.\n",
"fieldConfig": {
"defaults": {
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "components"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 10,
"x": 0,
"y": 4
},
"id": 2,
"options": {
"colorMode": "none",
"graphMode": "none",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\"})",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Running components",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "The percentage of components which are in a healthy state.\n",
"fieldConfig": {
"defaults": {
"mappings": [],
"max": 1,
"min": 0,
"noValue": "No components",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percentunit"
},
"overrides": []
},
"gridPos": {
"h": 4,
"w": 10,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"showPercentChange": false,
"text": {
"valueSize": 80
},
"textMode": "auto",
"wideLayout": true
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\",health_type=\"healthy\"}) /\nsum(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\"})\n",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Overall component health",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "The frequency at which components get updated.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "points",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 3,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ops"
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 0,
"y": 12
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "sum by (instance) (rate(alloy_component_evaluation_seconds_count{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))",
"instant": false,
"legendFormat": "__auto",
"range": true,
"refId": "A"
}
],
"title": "Component evaluation rate",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "The percentiles for how long it takes to complete component evaluations.\n\nComponent evaluations must complete for components to have the latest\narguments. The longer the evaluations take, the slower it will be to\nreconcile the state of components.\n\nIf evaluation is taking too long, consider sharding your components to\ndeal with smaller amounts of data and reuse data as much as possible.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "s"
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 8,
"y": 12
},
"id": 6,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "histogram_quantile(0.99, sum(rate(alloy_component_evaluation_seconds{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])))\nor\nhistogram_quantile(0.99, sum by (le) (rate(alloy_component_evaluation_seconds_bucket{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])))\n",
"instant": false,
"legendFormat": "99th percentile",
"range": true,
"refId": "A"
},
{
"datasource": "${datasource}",
"expr": "histogram_quantile(0.50, sum(rate(alloy_component_evaluation_seconds{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])))\nor\nhistogram_quantile(0.50, sum by (le) (rate(alloy_component_evaluation_seconds_bucket{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])))\n",
"instant": false,
"legendFormat": "50th percentile",
"range": true,
"refId": "B"
},
{
"datasource": "${datasource}",
"expr": "(\n histogram_sum(sum(rate(alloy_component_evaluation_seconds{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval]))) /\n histogram_count(sum(rate(alloy_component_evaluation_seconds{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])))\n)\nor\n(\n sum(rate(alloy_component_evaluation_seconds_sum{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval])) /\n sum(rate(alloy_component_evaluation_seconds_count{cluster=\"$cluster\",namespace=\"$namespace\"}[$__rate_interval]))\n)\n",
"instant": false,
"legendFormat": "Average",
"range": true,
"refId": "C"
}
],
"title": "Component evaluation time",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "The percentage of time spent evaluating 'slow' components - components that took longer than 1 minute to evaluate.\n\nIdeally, no component should take more than 1 minute to evaluate. The components displayed in this chart\nmay be a sign of a problem with the pipeline.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percentunit"
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 16,
"y": 12
},
"id": 7,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "sum by (component_path, component_id) (rate(alloy_component_evaluation_slow_seconds{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))\n/ scalar(sum(rate(alloy_component_evaluation_seconds_sum{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval])))\n",
"instant": false,
"legendFormat": "{{component_path}} {{component_id}}",
"range": true,
"refId": "A"
}
],
"title": "Slow components evaluation times",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Detailed histogram view of how long component evaluations take.\n\nThe goal is to design your config so that evaluations take as little\ntime as possible; under 100ms is a good goal.\n",
"fieldConfig": {
"defaults": {
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"scaleDistribution": {
"type": "linear"
}
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 0,
"y": 22
},
"id": 8,
"maxDataPoints": 30,
"options": {
"calculate": false,
"cellGap": 0,
"color": {
"exponent": 0.5,
"fill": "dark-orange",
"mode": "scheme",
"reverse": false,
"scale": "exponential",
"scheme": "Spectral",
"steps": 64
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"filterValues": {
"le": 0.1
},
"legend": {
"show": true
},
"rowsFrame": {
"layout": "auto"
},
"tooltip": {
"mode": "single",
"showColorScale": false,
"yHistogram": true
},
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "s"
}
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum(increase(alloy_component_evaluation_seconds{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))\nor ignoring (le)\nsum by (le) (increase(alloy_component_evaluation_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))\n",
"format": "heatmap",
"instant": false,
"legendFormat": "{{le}}",
"range": true,
"refId": "A"
}
],
"title": "Component evaluation histogram",
"type": "heatmap"
},
{
"datasource": "${datasource}",
"description": "Detailed histogram of how long components wait to be evaluated after their dependency is updated.\n\nThe goal is to design your config so that most of the time components do not\nqueue for long; under 10ms is a good goal.\n",
"fieldConfig": {
"defaults": {
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"scaleDistribution": {
"type": "linear"
}
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 8,
"y": 22
},
"id": 9,
"maxDataPoints": 30,
"options": {
"calculate": false,
"cellGap": 0,
"color": {
"exponent": 0.5,
"fill": "dark-orange",
"mode": "scheme",
"reverse": false,
"scale": "exponential",
"scheme": "Spectral",
"steps": 64
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"filterValues": {
"le": 0.1
},
"legend": {
"show": true
},
"rowsFrame": {
"layout": "auto"
},
"tooltip": {
"mode": "single",
"showColorScale": false,
"yHistogram": true
},
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "s"
}
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum(increase(alloy_component_dependencies_wait_seconds{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))\nor ignoring (le)\nsum by (le) (increase(alloy_component_dependencies_wait_seconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\"}[$__rate_interval]))\n",
"format": "heatmap",
"instant": false,
"legendFormat": "{{le}}",
"range": true,
"refId": "A"
}
],
"title": "Component dependency wait histogram",
"type": "heatmap"
}
],
"refresh": "10s",
"schemaVersion": 39,
"tags": [
"alloy-mixin"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Mimir",
"value": "mimir_ds"
},
"hide": 0,
"includeAll": false,
"label": "Data Source",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "Loki",
"value": "loki_ds"
},
"hide": 0,
"includeAll": false,
"label": "Loki Data Source",
"multi": false,
"name": "loki_datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components, cluster)\n",
"refId": "cluster"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\"}, namespace)\n",
"refId": "namespace"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d",
"90d"
]
},
"timezone": "",
"title": "Alloy / Controller",
"uid": "bf9f456aad7108b2c808dbd9973e386f",
"version": 0,
"weekStart": ""
}


@@ -0,0 +1,923 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": 25,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"alloy-mixin"
],
"targetBlank": false,
"title": "Dashboards",
"type": "dashboards"
}
],
"panels": [
{
"datasource": "${datasource}",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 0
},
"id": 1,
"title": "Receivers for traces [otelcol.receiver]",
"type": "row"
},
{
"datasource": "${datasource}",
"description": "Number of spans successfully pushed into the pipeline.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "hue",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 0,
"y": 1
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(receiver_accepted_spans_ratio_total{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{ pod }} / {{ transport }}",
"range": true,
"refId": "A"
}
],
"title": "Accepted spans",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Number of spans that could not be pushed into the pipeline.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "hue",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 8,
"y": 1
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(receiver_refused_spans_ratio_total{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{ pod }} / {{ transport }}",
"range": true,
"refId": "A"
}
],
"title": "Refused spans",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "The duration of inbound RPCs.\n",
"fieldConfig": {
"defaults": {
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"scaleDistribution": {
"type": "linear"
}
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 16,
"y": 1
},
"id": 4,
"maxDataPoints": 30,
"options": {
"calculate": false,
"cellGap": 1,
"color": {
"exponent": 0.5,
"fill": "dark-orange",
"mode": "scheme",
"reverse": false,
"scale": "exponential",
"scheme": "Oranges",
"steps": 65
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"filterValues": {
"le": 1e-9
},
"legend": {
"show": true
},
"rowsFrame": {
"layout": "auto"
},
"tooltip": {
"mode": "single",
"showColorScale": false,
"yHistogram": true
},
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "ms"
}
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum by (le) (increase(rpc_server_duration_milliseconds_bucket{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\", rpc_service=\"opentelemetry.proto.collector.trace.v1.TraceService\"}[$__rate_interval]))",
"format": "heatmap",
"instant": false,
"legendFormat": "{{le}}",
"range": true,
"refId": "A"
}
],
"title": "RPC server duration",
"type": "heatmap"
},
{
"datasource": "${datasource}",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 11
},
"id": 5,
"title": "Batching of logs, metrics, and traces [otelcol.processor.batch]",
"type": "row"
},
{
"datasource": "${datasource}",
"description": "Number of spans, metric datapoints, or log lines in a batch\n",
"fieldConfig": {
"defaults": {
"custom": {
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"scaleDistribution": {
"type": "linear"
}
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 0,
"y": 12
},
"id": 6,
"maxDataPoints": 30,
"options": {
"calculate": false,
"cellGap": 1,
"color": {
"exponent": 0.5,
"fill": "dark-orange",
"mode": "scheme",
"reverse": false,
"scale": "exponential",
"scheme": "Oranges",
"steps": 65
},
"exemplars": {
"color": "rgba(255,0,255,0.7)"
},
"filterValues": {
"le": 1e-9
},
"legend": {
"show": true
},
"rowsFrame": {
"layout": "auto"
},
"tooltip": {
"mode": "single",
"showColorScale": false,
"yHistogram": true
},
"yAxis": {
"axisPlacement": "left",
"reverse": false,
"unit": "short"
}
},
"pluginVersion": "10.4.2",
"targets": [
{
"datasource": "${datasource}",
"expr": "sum by (le) (increase(processor_batch_batch_send_size_ratio_bucket{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval]))",
"format": "heatmap",
"instant": false,
"legendFormat": "{{le}}",
"range": true,
"refId": "A"
}
],
"title": "Number of units in the batch",
"type": "heatmap"
},
{
"datasource": "${datasource}",
"description": "Number of distinct metadata value combinations being processed\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 8,
"y": 12
},
"id": 7,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "processor_batch_metadata_cardinality_ratio{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}\n",
"instant": false,
"legendFormat": "{{ pod }}",
"range": true,
"refId": "A"
}
],
"title": "Distinct metadata values",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Number of times the batch was sent due to a timeout trigger\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 16,
"y": 12
},
"id": 8,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(processor_batch_timeout_trigger_send_ratio_total{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{ pod }}",
"range": true,
"refId": "A"
}
],
"title": "Timeout trigger",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"gridPos": {
"h": 1,
"w": 24,
"x": 0,
"y": 22
},
"id": 9,
"title": "Exporters for traces [otelcol.exporter]",
"type": "row"
},
{
"datasource": "${datasource}",
"description": "Number of spans successfully sent to destination.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "hue",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 0,
"y": 23
},
"id": 10,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(exporter_sent_spans_ratio_total{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{ pod }}",
"range": true,
"refId": "A"
}
],
"title": "Exported sent spans",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Number of spans in failed attempts to send to destination.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 20,
"gradientMode": "hue",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": []
},
"gridPos": {
"h": 10,
"w": 8,
"x": 8,
"y": 23
},
"id": 11,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(exporter_send_failed_spans_ratio_total{cluster=\"$cluster\", namespace=\"$namespace\", instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{ pod }}",
"range": true,
"refId": "A"
}
],
"title": "Exported failed spans",
"type": "timeseries"
}
],
"refresh": "10s",
"schemaVersion": 39,
"tags": [
"alloy-mixin"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Mimir",
"value": "mimir_ds"
},
"hide": 0,
"includeAll": false,
"label": "Data Source",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "Loki",
"value": "loki_ds"
},
"hide": 0,
"includeAll": false,
"label": "Loki Data Source",
"multi": false,
"name": "loki_datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components, cluster)\n",
"refId": "cluster"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\"}, namespace)\n",
"refId": "namespace"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"allValue": ".*",
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": true,
"label": "instance",
"multi": true,
"name": "instance",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\"}, instance)\n",
"refId": "instance"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d",
"90d"
]
},
"timezone": "",
"title": "Alloy / OpenTelemetry",
"uid": "9b6d37c8603e19e8922133984faad93d",
"version": 0,
"weekStart": ""
}

File diff suppressed because it is too large.


@@ -0,0 +1,840 @@
{
"annotations": {
"list": [
{
"builtIn": 1,
"datasource": {
"type": "grafana",
"uid": "-- Grafana --"
},
"enable": true,
"hide": true,
"iconColor": "rgba(0, 211, 255, 1)",
"name": "Annotations & Alerts",
"type": "dashboard"
},
{
"datasource": "$loki_datasource",
"enable": true,
"expr": "{cluster=\"$cluster\", container=\"kube-diff-logger\"} | json | namespace_extracted=\"alloy\" | name_extracted=~\"alloy.*\"",
"iconColor": "rgba(0, 211, 255, 1)",
"instant": false,
"name": "Deployments",
"titleFormat": "{{cluster}}/{{namespace}}"
}
]
},
"editable": true,
"fiscalYearStartMonth": 0,
"graphTooltip": 1,
"id": 26,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"alloy-mixin"
],
"targetBlank": false,
"title": "Dashboards",
"type": "dashboards"
}
],
"panels": [
{
"datasource": "${datasource}",
"description": "CPU usage of the Alloy process relative to 1 CPU core.\n\nFor example, 100% means using one entire CPU core.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "percentunit"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 0
},
"id": 1,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(alloy_resources_process_cpu_seconds_total{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}[$__rate_interval])",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "CPU usage",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Resident memory size of the Alloy process.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "decbytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 0
},
"id": 2,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "alloy_resources_process_resident_memory_bytes{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Memory (RSS)",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Rate at which the Alloy process performs garbage collections.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "points",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 3,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "ops"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 0,
"y": 8
},
"id": 3,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(go_gc_duration_seconds_count{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}[5m])\nand on(instance)\nalloy_build_info{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}\n",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Garbage collections",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Number of goroutines that currently exist. A number that keeps\ngrowing without bound indicates a goroutine leak.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "none"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 8,
"y": 8
},
"id": 4,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "go_goroutines{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}\nand on(instance)\nalloy_build_info{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}\n",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Goroutines",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Heap memory currently in use by the Alloy process.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "decbytes"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 8,
"x": 16,
"y": 8
},
"id": 5,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}\nand on(instance)\nalloy_build_info{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}\n",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Memory (heap inuse)",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Rate of data received across all network interfaces for the machine\nAlloy is running on.\n\nData shown here is across all running processes and not exclusive to\nthe running Alloy process.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 30,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "Bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 0,
"y": 16
},
"id": 6,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(alloy_resources_machine_rx_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Network receive bandwidth",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Rate of data sent across all network interfaces for the machine\nAlloy is running on.\n\nData shown here is across all running processes and not exclusive to\nthe running Alloy process.\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisBorderShow": false,
"axisCenteredZero": false,
"axisColorMode": "text",
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 30,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"insertNulls": false,
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "normal"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "Bps"
},
"overrides": []
},
"gridPos": {
"h": 8,
"w": 12,
"x": 12,
"y": 16
},
"id": 7,
"options": {
"legend": {
"calcs": [],
"displayMode": "list",
"placement": "bottom",
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"datasource": "${datasource}",
"expr": "rate(alloy_resources_machine_tx_bytes_total{cluster=\"$cluster\",namespace=\"$namespace\",instance=~\"$instance\"}[$__rate_interval])\n",
"instant": false,
"legendFormat": "{{instance}}",
"range": true,
"refId": "A"
}
],
"title": "Network send bandwidth",
"type": "timeseries"
}
],
"refresh": "10s",
"schemaVersion": 39,
"tags": [
"alloy-mixin"
],
"templating": {
"list": [
{
"current": {
"selected": false,
"text": "Mimir",
"value": "mimir_ds"
},
"hide": 0,
"includeAll": false,
"label": "Data Source",
"multi": false,
"name": "datasource",
"options": [],
"query": "prometheus",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"selected": false,
"text": "Loki",
"value": "loki_ds"
},
"hide": 0,
"includeAll": false,
"label": "Loki Data Source",
"multi": false,
"name": "loki_datasource",
"options": [],
"query": "loki",
"refresh": 1,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "datasource"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components, cluster)\n",
"refId": "cluster"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"current": {
"isNone": true,
"selected": false,
"text": "None",
"value": ""
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\"}, namespace)\n",
"refId": "namespace"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
},
{
"allValue": ".*",
"current": {
"selected": false,
"text": "All",
"value": "$__all"
},
"datasource": {
"uid": "${datasource}"
},
"definition": "",
"hide": 0,
"includeAll": true,
"label": "instance",
"multi": true,
"name": "instance",
"options": [],
"query": {
"query": "label_values(alloy_component_controller_running_components{cluster=\"$cluster\", namespace=\"$namespace\"}, instance)\n",
"refId": "instance"
},
"refresh": 2,
"regex": "",
"skipUrlSync": false,
"sort": 2,
"type": "query"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d",
"90d"
]
},
"timezone": "",
"title": "Alloy / Resources",
"uid": "d6a8574c31f3d7cb8f1345ec84d15a67",
"version": 0,
"weekStart": ""
}


@@ -51,6 +51,7 @@
"overrides": [ ]
},
"id": 1,
+ "interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -64,7 +65,7 @@
"span": 6,
"targets": [
{
- "expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
+ "expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "series",
"legendLink": null
@@ -98,6 +99,7 @@
"overrides": [ ]
},
"id": 2,
+ "interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -111,7 +113,7 @@
"span": 6,
"targets": [
{
- "expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}) / sum(loki_ingester_memory_streams{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
+ "expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}) / sum(loki_ingester_memory_streams{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "chunks",
"legendLink": null
@@ -157,6 +159,7 @@
"overrides": [ ]
},
"id": 3,
+ "interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -171,19 +174,19 @@
"span": 6,
"targets": [
{
- "expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
+ "expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_utilization_sum{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_utilization_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_utilization_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_utilization_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -235,6 +238,7 @@
"overrides": [ ]
},
"id": 4,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -249,19 +253,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_age_seconds_sum{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1e3 / sum(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_age_seconds_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1e3 / sum(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -325,6 +329,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -339,19 +344,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_entries_sum{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_entries_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_entries_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_entries_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -403,6 +408,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -416,7 +422,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_chunk_store_index_entries_per_chunk_sum{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_chunk_store_index_entries_per_chunk_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_chunk_store_index_entries_per_chunk_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_chunk_store_index_entries_per_chunk_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Index Entries",
"legendLink": null
@@ -462,6 +468,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -475,7 +482,7 @@
"span": 6,
"targets": [
{
"expr": "loki_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"} or cortex_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}",
"expr": "loki_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"} or cortex_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -658,6 +665,7 @@
},
"fill": 10,
"id": 8,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -673,7 +681,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -719,6 +727,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -732,7 +741,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -766,6 +775,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -780,7 +790,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (reason) (rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by (reason) (rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{reason}}",
"legendLink": null
@@ -837,13 +847,14 @@
"hideZeroBuckets": false,
"highlightCards": true,
"id": 11,
"interval": "1m",
"legend": {
"show": true
},
"span": 12,
"targets": [
{
"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "heatmap",
"intervalFactor": 2,
"legendFormat": "{{le}}",
@@ -899,13 +910,14 @@
"hideZeroBuckets": false,
"highlightCards": true,
"id": 12,
"interval": "1m",
"legend": {
"show": true
},
"span": 12,
"targets": [
{
"expr": "sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)",
"expr": "sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)",
"format": "heatmap",
"intervalFactor": 2,
"legendFormat": "{{le}}",
@@ -968,6 +980,7 @@
"overrides": [ ]
},
"id": 13,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -981,19 +994,19 @@
"span": 12,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p99",
"legendLink": null
},
{
"expr": "histogram_quantile(0.90, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"expr": "histogram_quantile(0.90, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p90",
"legendLink": null
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p50",
"legendLink": null
@@ -1039,6 +1052,7 @@
"overrides": [ ]
},
"id": 14,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1052,19 +1066,19 @@
"span": 12,
"targets": [
{
"expr": "histogram_quantile(0.5, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"expr": "histogram_quantile(0.5, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p50",
"legendLink": null
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p99",
"legendLink": null
},
{
"expr": "sum(rate(loki_ingester_chunk_bounds_hours_sum{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_ingester_chunk_bounds_hours_count{cluster=\"$cluster\", job=~\"$namespace/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_bounds_hours_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_ingester_chunk_bounds_hours_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "avg",
"legendLink": null

@@ -35,6 +35,7 @@
"fill": 1,
"format": "none",
"id": 1,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -110,6 +111,7 @@
"fill": 1,
"format": "dtdurations",
"id": 2,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -213,6 +215,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -260,6 +263,7 @@
"overrides": [ ]
},
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -307,6 +311,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -366,6 +371,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -413,6 +419,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -460,6 +467,7 @@
"overrides": [ ]
},
"id": 8,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -519,6 +527,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -566,6 +575,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -579,7 +589,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_compactor_deleted_lines{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval])) by (user)",
"expr": "sum(rate(loki_compactor_deleted_lines{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval])) by (user)",
"format": "time_series",
"legendFormat": "{{user}}",
"legendLink": null
@@ -603,10 +613,11 @@
{
"datasource": "$loki_datasource",
"id": 11,
"interval": "1m",
"span": 6,
"targets": [
{
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"Started processing delete request|delete request for user marked as processed\" | logfmt | line_format \"{{.ts}} user={{.user}} delete_request_id={{.delete_request_id}} msg={{.msg}}\" ",
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"Started processing delete request|delete request for user marked as processed\" | logfmt | line_format \"{{.ts}} user={{.user}} delete_request_id={{.delete_request_id}} msg={{.msg}}\" ",
"refId": "A"
}
],
@@ -616,10 +627,11 @@
{
"datasource": "$loki_datasource",
"id": 12,
"interval": "1m",
"span": 6,
"targets": [
{
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"delete request for user added\" | logfmt | line_format \"{{.ts}} user={{.user}} query='{{.query}}'\"",
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"delete request for user added\" | logfmt | line_format \"{{.ts}} user={{.user}} query='{{.query}}'\"",
"refId": "A"
}
],
@@ -701,6 +713,16 @@
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"label": null,
"name": "loki_datasource",
"options": [ ],
"query": "loki",
"refresh": 1,
"regex": "",
"type": "datasource"
}
]
},

@@ -38,6 +38,7 @@
},
"hiddenSeries": false,
"id": 35,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -129,6 +130,7 @@
},
"hiddenSeries": false,
"id": 41,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -216,6 +218,7 @@
},
"hiddenSeries": false,
"id": 36,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -307,6 +310,7 @@
},
"hiddenSeries": false,
"id": 40,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -398,6 +402,7 @@
},
"hiddenSeries": false,
"id": 38,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -489,6 +494,7 @@
},
"hiddenSeries": false,
"id": 39,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -575,6 +581,7 @@
},
"hiddenSeries": false,
"id": 37,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -667,6 +674,7 @@
},
"hiddenSeries": false,
"id": 42,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -759,6 +767,7 @@
},
"hiddenSeries": false,
"id": 31,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -801,7 +810,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate({cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt| level=\"$level\" |= \"$filter\" | __error__=\"\" [$__auto])) by (level)",
"expr": "sum(rate({cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt| level=\"$level\" |= \"$filter\" | __error__=\"\" [$__interval])) by (level)",
"intervalFactor": 3,
"legendFormat": "{{level}}",
"refId": "A"
@@ -857,6 +866,7 @@
"y": 6
},
"id": 29,
"interval": "1m",
"maxDataPoints": "",
"options": {
"showLabels": false,

@@ -1,724 +0,0 @@
{
"annotations": {
"list": [ ]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"iteration": 1635347545534,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"loki"
],
"targetBlank": false,
"title": "Loki Dashboards",
"type": "dashboards"
}
],
"liveNow": false,
"panels": [
{
"datasource": "${datasource}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [ ],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 2,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "8.3.0-38205pre",
"targets": [
{
"datasource": "${datasource}",
"exemplar": false,
"expr": "sum(loki_ruler_wal_appender_ready) by (pod, tenant) == 0",
"instant": true,
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Appenders Not Ready",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 11,
"x": 2,
"y": 0
},
"id": 4,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_samples_appended_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Samples Appended to WAL per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Series are unique combinations of labels",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 11,
"x": 13,
"y": 0
},
"id": 5,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_storage_created_series_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Series Created per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Difference between highest timestamp appended to WAL and highest timestamp successfully written to remote storage",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "s"
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 10
},
"id": 6,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds{tenant=~\"${tenant}\"}\n- on (tenant)\n (\n loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds{tenant=~\"${tenant}\"}\n or vector(0)\n )",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Write Behind",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 10
},
"id": 7,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_prometheus_remote_storage_samples_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Samples Sent per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 20
},
"id": 8,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum by (tenant) (loki_ruler_wal_disk_size{tenant=~\"${tenant}\"})",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "WAL Disk Size",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Some number of pending samples is expected, but if remote-write is failing this value will remain high",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 20
},
"id": 9,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "max(loki_ruler_wal_prometheus_remote_storage_samples_pending{tenant=~\"${tenant}\"}) by (tenant,pod) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Pending Samples",
"type": "timeseries"
}
],
"refresh": "10s",
"rows": [ ],
"schemaVersion": 14,
"style": "dark",
"tags": [
"loki"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(loki_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(loki_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"label": null,
"name": "loki_datasource",
"options": [ ],
"query": "loki",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": { },
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "tenant",
"options": [ ],
"query": "query_result(sum by (id) (grafanacloud_logs_instance_info) and sum(label_replace(loki_tenant:active_streams{cluster=\"$cluster\",namespace=\"$namespace\"},\"id\",\"$1\",\"tenant\",\"(.*)\")) by(id))",
"refresh": 0,
"regex": "/\"([^\"]+)\"/",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Loki / Recording Rules",
"uid": "recording-rules",
"version": 0,
"weekStart": ""
}

File diff suppressed because it is too large

@@ -90,6 +90,7 @@
]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -103,19 +104,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -191,6 +192,7 @@
]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -204,19 +206,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -253,6 +255,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -266,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/query-frontend\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -354,6 +357,7 @@
]
},
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -367,19 +371,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -455,6 +459,7 @@
]
},
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -468,19 +473,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -517,6 +522,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -530,7 +536,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/query-scheduler\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-scheduler|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -619,6 +625,7 @@
},
"gridPos": { },
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -631,19 +638,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -720,6 +727,7 @@
},
"gridPos": { },
"id": 8,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -732,19 +740,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -782,6 +790,7 @@
},
"gridPos": { },
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -794,7 +803,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/querier\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -832,6 +841,7 @@
},
"gridPos": { },
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -844,7 +854,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"querier\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -879,6 +889,7 @@
},
"gridPos": { },
"id": 11,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -891,7 +902,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"querier\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -926,6 +937,7 @@
},
"gridPos": { },
"id": 12,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1025,6 +1037,7 @@
},
"gridPos": { },
"id": 13,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1037,19 +1050,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1126,6 +1139,7 @@
},
"gridPos": { },
"id": 14,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1138,19 +1152,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1188,6 +1202,7 @@
},
"gridPos": { },
"id": 15,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1200,7 +1215,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -1238,6 +1253,7 @@
},
"gridPos": { },
"id": 16,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1250,7 +1266,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1285,6 +1301,7 @@
},
"gridPos": { },
"id": 17,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1297,7 +1314,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|index-gateway\", pod=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1332,6 +1349,7 @@
},
"gridPos": { },
"id": 18,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1344,7 +1362,7 @@
},
"targets": [
{
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary).*\"})",
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(.*index-gateway.*|(loki|enterprise-logs)-read.*|loki-single-binary).*\"})",
"format": "time_series",
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
@@ -1431,6 +1449,7 @@
},
"gridPos": { },
"id": 19,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1443,19 +1462,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1532,6 +1551,7 @@
},
"gridPos": { },
"id": 20,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1544,19 +1564,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1594,6 +1614,7 @@
},
"gridPos": { },
"id": 21,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1606,7 +1627,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/bloom-gateway\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -1644,6 +1665,7 @@
},
"gridPos": { },
"id": 22,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1656,7 +1678,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"bloom-gateway\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1691,6 +1713,7 @@
},
"gridPos": { },
"id": 23,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1703,7 +1726,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"bloom-gateway\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1738,6 +1761,7 @@
},
"gridPos": { },
"id": 24,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1836,6 +1860,7 @@
]
},
"id": 25,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1849,19 +1874,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1937,6 +1962,7 @@
]
},
"id": 26,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1950,19 +1976,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1999,6 +2025,7 @@
"overrides": [ ]
},
"id": 27,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -2012,7 +2039,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(ingester.+|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -2062,6 +2089,7 @@
},
"gridPos": { },
"id": 28,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -2148,6 +2176,7 @@
},
"gridPos": { },
"id": 29,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -2160,19 +2189,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -2249,6 +2278,7 @@
},
"gridPos": { },
"id": 30,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -2261,19 +2291,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -2311,6 +2341,7 @@
},
"gridPos": { },
"id": 31,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -2323,7 +2354,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/ruler\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*ruler|loki-backend|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null

@@ -200,6 +200,7 @@
},
"fill": 10,
"id": 1,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -215,7 +216,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -249,6 +250,7 @@
"overrides": [ ]
},
"id": 2,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -263,7 +265,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -271,7 +273,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -279,7 +281,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -333,6 +335,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -347,7 +350,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -545,6 +548,7 @@
},
"fill": 10,
"id": 4,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -560,7 +564,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -594,6 +598,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -608,7 +613,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -616,7 +621,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -624,7 +629,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -678,6 +683,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -692,7 +698,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", route=~\"(api_prom_rules|api_prom_rules_namespace_groupname|api_v1_rules|loki_api_v1_delete|loki_api_v1_detected_labels|loki_api_v1_index_stats|loki_api_v1_index_volume|loki_api_v1_index_volume_range|loki_api_v1_label_name_values|loki_api_v1_label_values|loki_api_v1_labels|loki_api_v1_patterns|loki_api_v1_query|loki_api_v1_query_range|loki_api_v1_series|otlp_v1_logs|prometheus_api_v1_rules)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -890,6 +896,7 @@
},
"fill": 10,
"id": 7,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -905,7 +912,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -939,6 +946,7 @@
"overrides": [ ]
},
"id": 8,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -953,7 +961,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -961,7 +969,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -969,7 +977,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -1023,6 +1031,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1037,7 +1046,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -1235,6 +1244,7 @@
},
"fill": 10,
"id": 10,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -1250,7 +1260,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -1284,6 +1294,7 @@
"overrides": [ ]
},
"id": 11,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1298,7 +1309,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -1306,7 +1317,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -1314,7 +1325,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -1368,6 +1379,7 @@
"overrides": [ ]
},
"id": 12,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1382,7 +1394,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone-.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -1580,6 +1592,7 @@
},
"fill": 10,
"id": 13,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -1595,7 +1608,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -1629,6 +1642,7 @@
"overrides": [ ]
},
"id": 14,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1643,7 +1657,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -1651,7 +1665,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -1659,7 +1673,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -1713,6 +1727,7 @@
"overrides": [ ]
},
"id": 15,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1727,7 +1742,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*index-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -1925,6 +1940,7 @@
},
"fill": 10,
"id": 16,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -1940,7 +1956,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -1974,6 +1990,7 @@
"overrides": [ ]
},
"id": 17,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1988,7 +2005,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 99th Percentile",
@@ -1996,7 +2013,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le,route) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} 50th Percentile",
@@ -2004,7 +2021,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}) by (route) ",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{ route }} Average",
@@ -2058,6 +2075,7 @@
"overrides": [ ]
},
"id": 18,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -2072,7 +2090,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|(loki|enterprise-logs)-backend|loki-single-binary)\", route=~\"(/base.Ruler/Rules|/indexgatewaypb.IndexGateway/GetChunkRef|/indexgatewaypb.IndexGateway/GetSeries|/indexgatewaypb.IndexGateway/GetShards|/indexgatewaypb.IndexGateway/GetStats|/indexgatewaypb.IndexGateway/GetVolume|/indexgatewaypb.IndexGateway/LabelNamesForMetricName|/indexgatewaypb.IndexGateway/LabelValuesForMetricName|/indexgatewaypb.IndexGateway/QueryIndex|/logproto.BloomGateway/FilterChunkRefs|/logproto.Pattern/Query|/logproto.Querier/GetChunkIDs|/logproto.Querier/GetDetectedLabels|/logproto.Querier/GetStats|/logproto.Querier/GetVolume|/logproto.Querier/Label|/logproto.Querier/Query|/logproto.Querier/QuerySample|/logproto.Querier/Series|/logproto.StreamData/GetStreamRates)\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -2270,6 +2288,7 @@
},
"fill": 10,
"id": 19,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -2285,7 +2304,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -2319,6 +2338,7 @@
"overrides": [ ]
},
"id": 20,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -2333,19 +2353,19 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_index_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) * 1e3 / sum(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval]))",
"expr": "sum(rate(loki_index_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) * 1e3 / sum(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -2397,6 +2417,7 @@
"overrides": [ ]
},
"id": 21,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -2411,7 +2432,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|(loki|enterprise-logs)-read|loki-single-binary)\", operation!=\"index_chunk\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,
@@ -2609,6 +2630,7 @@
},
"fill": 10,
"id": 22,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -2624,7 +2646,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -2658,6 +2680,7 @@
"overrides": [ ]
},
"id": 23,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -2672,19 +2695,19 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_boltdb_shipper_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) * 1e3 / sum(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval]))",
"expr": "sum(rate(loki_boltdb_shipper_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) * 1e3 / sum(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -2736,6 +2759,7 @@
"overrides": [ ]
},
"id": 24,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -2750,7 +2774,7 @@
"span": 4,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(querier|index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le,pod)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*querier|.*index-gateway|(loki|enterprise-logs)-read|loki-single-binary)\", operation=\"Shipper.Query\"}[$__rate_interval])) by (le,pod)) * 1e3",
"format": "time_series",
"interval": "1m",
"intervalFactor": 2,

@@ -90,6 +90,7 @@
]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -103,19 +104,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -191,6 +192,7 @@
]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -204,19 +206,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -253,6 +255,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -266,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/\"(compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -317,6 +320,7 @@
},
"fill": 1,
"id": 4,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -423,6 +427,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -482,6 +487,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -529,6 +535,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -590,6 +597,7 @@
},
"fill": 1,
"id": 8,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -696,6 +704,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -743,6 +752,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -802,6 +812,7 @@
"overrides": [ ]
},
"id": 11,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -849,6 +860,7 @@
"overrides": [ ]
},
"id": 12,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -896,6 +908,7 @@
"overrides": [ ]
},
"id": 13,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -956,6 +969,7 @@
},
"format": "short",
"id": 14,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1004,6 +1018,7 @@
"overrides": [ ]
},
"id": 15,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1095,6 +1110,7 @@
},
"format": "short",
"id": 16,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1143,6 +1159,7 @@
"overrides": [ ]
},
"id": 17,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1233,6 +1250,7 @@
"overrides": [ ]
},
"id": 18,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1280,6 +1298,7 @@
"overrides": [ ]
},
"id": 19,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1327,6 +1346,7 @@
"overrides": [ ]
},
"id": 20,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1367,7 +1387,7 @@
"span": 12,
"targets": [
{
"expr": "{cluster=~\"$cluster\", job=~\"($namespace)/\"(compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"\"}",
"expr": "{cluster=~\"$cluster\", job=~\"($namespace)/(.*compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}",
"refId": "A"
}
],

@@ -90,6 +90,7 @@
]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -103,7 +104,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -115,7 +116,7 @@
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -191,6 +192,7 @@
]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -204,7 +206,7 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -216,7 +218,7 @@
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -253,6 +255,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -266,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/distributor\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|loki-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -316,6 +319,7 @@
},
"gridPos": { },
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -328,7 +332,7 @@
},
"targets": [
{
"expr": "sum by(pod) (loki_ingester_memory_streams{cluster=~\"$cluster\", job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"expr": "sum by(pod) (loki_ingester_memory_streams{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -405,6 +409,7 @@
},
"gridPos": { },
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -417,19 +422,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -506,6 +511,7 @@
},
"gridPos": { },
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -518,19 +524,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -568,6 +574,7 @@
},
"gridPos": { },
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -580,7 +587,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -618,6 +625,7 @@
},
"gridPos": { },
"id": 8,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -630,7 +638,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -665,6 +673,7 @@
},
"gridPos": { },
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -677,7 +686,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -712,6 +721,7 @@
},
"gridPos": { },
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -724,7 +734,7 @@
},
"targets": [
{
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(ingester.*|(loki|enterprise-logs)-write|loki-single-binary).*\"})",
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary).*\"})",
"format": "time_series",
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null

@@ -200,6 +200,7 @@
},
"fill": 10,
"id": 1,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -215,7 +216,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -249,6 +250,7 @@
"overrides": [ ]
},
"id": 2,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -263,7 +265,7 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
@@ -271,7 +273,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
@@ -279,7 +281,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\", route=~\"api_prom_push|loki_api_v1_push|/httpgrpc.HTTP/Handle\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
@@ -345,6 +347,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -358,7 +361,7 @@
"span": 6,
"targets": [
{
"expr": "sum (rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval])) / sum(rate(loki_distributor_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval]))",
"expr": "sum (rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval])) / sum(rate(loki_distributor_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "bytes",
"legendLink": null
@@ -392,6 +395,7 @@
"overrides": [ ]
},
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -406,7 +410,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (tenant) (rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval])) / ignoring(tenant) group_left sum(rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval]))",
"expr": "sum by (tenant) (rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval])) / ignoring(tenant) group_left sum(rate(loki_distributor_structured_metadata_bytes_received_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*distributor|(loki|enterprise-logs)-write|loki-single-binary)\",}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{tenant}}",
"legendLink": null
@@ -619,6 +623,7 @@
},
"fill": 10,
"id": 5,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -634,7 +639,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -668,6 +673,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -682,7 +688,7 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
@@ -690,7 +696,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
@@ -698,7 +704,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester-zone.*|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
@@ -913,6 +919,7 @@
},
"fill": 10,
"id": 7,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -928,7 +935,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -962,6 +969,7 @@
"overrides": [ ]
},
"id": 8,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -976,7 +984,7 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"expr": "histogram_quantile(0.99, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
@@ -984,7 +992,7 @@
"step": 10
},
{
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"expr": "histogram_quantile(0.50, sum by (le) (cluster_job_route:loki_request_duration_seconds_bucket:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
@@ -992,7 +1000,7 @@
"step": 10
},
{
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})",
"expr": "1e3 * sum(cluster_job_route:loki_request_duration_seconds_sum:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"}) / sum(cluster_job_route:loki_request_duration_seconds_count:sum_rate{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", route=\"/logproto.Pusher/Push\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
@@ -1207,6 +1215,7 @@
},
"fill": 10,
"id": 9,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -1222,7 +1231,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -1256,6 +1265,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1270,19 +1280,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_index_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_index_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) * 1e3 / sum(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval]))",
"expr": "sum(rate(loki_index_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval])) * 1e3 / sum(rate(loki_index_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"index_chunk\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -1495,6 +1505,7 @@
},
"fill": 10,
"id": 11,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -1510,7 +1521,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -1544,6 +1555,7 @@
"overrides": [ ]
},
"id": 12,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1558,19 +1570,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_boltdb_shipper_request_duration_seconds_bucket{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_boltdb_shipper_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) * 1e3 / sum(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval]))",
"expr": "sum(rate(loki_boltdb_shipper_request_duration_seconds_sum{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval])) * 1e3 / sum(rate(loki_boltdb_shipper_request_duration_seconds_count{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\", operation=\"WRITE\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"


@@ -1,836 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\",resource=\"cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "CPU",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (workingset)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (go heap inuse)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Alertmanager",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_receive_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?alertmanager.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Receive bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_transmit_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?alertmanager.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Transmit bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk writes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_read_bytes_total[$__rate_interval]\n )\n) + ignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk reads",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=~\"(alertmanager).*\"\n }\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk space utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Alertmanager resources",
"uid": "a6883fb22799ac74479c7db872451092",
"version": 0
}

File diff suppressed because it is too large


@@ -1,940 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "CPU",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (go heap inuse)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU and memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_rss{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (RSS)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (workingset)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_receive_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?compactor.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Receive bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_transmit_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?compactor.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Transmit bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"compactor\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk writes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_read_bytes_total[$__rate_interval]\n )\n) + ignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"compactor\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk reads",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=~\"(compactor).*\"\n }\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk space utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Compactor resources",
"uid": "09a5c49e9cdb2f2b24c6d184574a07fd",
"version": 0
}

File diff suppressed because it is too large

@@ -1,312 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "count(cortex_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}) by (sha256)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sha256:{{sha256}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Startup config file hashes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "instances",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Startup config file",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "count(cortex_runtime_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}) by (sha256)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sha256:{{sha256}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Runtime config file hashes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "instances",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Runtime config file",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Config",
"uid": "5d9d0b4724c0f80d68467088ec61e003",
"version": 0
}

@@ -1,938 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(component) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{component}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "RPS / component",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"max": 1,
"min": 0,
"noValue": "0",
"unit": "percentunit"
}
},
"id": 2,
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 6,
"targets": [
{
"expr": "sum by(component) (rate(thanos_objstore_bucket_operation_failures_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) / sum by(component) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) >= 0",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{component}}",
"legendLink": null
}
],
"title": "Error rate / component",
"type": "timeseries"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Components",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(operation) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "RPS / operation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"max": 1,
"min": 0,
"noValue": "0",
"unit": "percentunit"
}
},
"id": 4,
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 6,
"targets": [
{
"expr": "sum by(operation) (rate(thanos_objstore_bucket_operation_failures_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) / sum by(operation) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) >= 0",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation}}",
"legendLink": null
}
],
"title": "Error rate / operation",
"type": "timeseries"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Operations",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Get",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: GetRange",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Exists",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Attributes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Upload",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Delete",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Object Store",
"uid": "e1324ee2a434f4158c00a9ee279d3292",
"version": 0
}
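
The latency panels in this dashboard all follow one pattern: two `histogram_quantile` targets over the `_bucket` series, plus an average computed as the rate of `_sum` divided by the rate of `_count`, each scaled by `1e3` to milliseconds. The Python sketch below mirrors that arithmetic on plain numbers. It is a simplified illustration rather than Prometheus's implementation, and the function names and sample bucket values are hypothetical.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate a quantile from cumulative histogram buckets.

    `buckets` is a list of (upper_bound_seconds, cumulative_rate) pairs and
    must contain the +Inf bucket. Values are linearly interpolated inside the
    bucket that contains the requested rank.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    if not buckets:
        return math.nan
    buckets = sorted(buckets)
    if buckets[-1][0] != math.inf or buckets[-1][1] <= 0:
        return math.nan
    rank = q * buckets[-1][1]
    # First bucket whose cumulative count reaches the rank.
    idx = next(i for i, (_, count) in enumerate(buckets) if count >= rank)
    if idx == len(buckets) - 1:
        # The rank falls in the +Inf bucket: report the highest finite bound.
        return buckets[-2][0] if len(buckets) > 1 else math.nan
    upper, count = buckets[idx]
    lower, below = (0.0, 0.0) if idx == 0 else buckets[idx - 1]
    if count == below:
        return upper
    return lower + (upper - lower) * (rank - below) / (count - below)


def average_ms(sum_rate, count_rate):
    """Mean latency in milliseconds: rate(_sum) * 1e3 / rate(_count)."""
    if count_rate == 0:
        return math.nan
    return sum_rate * 1e3 / count_rate


if __name__ == "__main__":
    # Hypothetical per-second bucket rates for one operation.
    sample = [(0.1, 10.0), (0.5, 30.0), (1.0, 40.0), (math.inf, 40.0)]
    print("p99 ms:", histogram_quantile(0.99, sample) * 1e3)
    print("p50 ms:", histogram_quantile(0.50, sample) * 1e3)
    print("avg ms:", average_ms(2.0, 40.0))
```

The guard for a zero count rate is a choice made for this sketch; in the dashboard, empty results are handled by `"nullPointMode": "null as zero"`.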

@@ -1,266 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${datasource}",
"id": 1,
"span": 12,
"targets": [
{
"expr": "max by(limit_name) (cortex_limits_defaults{cluster=~\"$cluster\",namespace=~\"$namespace\"})",
"instant": true,
"legendFormat": "",
"refId": "A"
}
],
"title": "Defaults",
"transformations": [
{
"id": "labelsToFields",
"options": { }
},
{
"id": "merge",
"options": { }
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"indexByName": {
"Value": 1,
"limit_name": 0
}
}
},
{
"id": "sortBy",
"options": {
"fields": { },
"sort": [
{
"field": "limit_name"
}
]
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${datasource}",
"id": 2,
"span": 12,
"targets": [
{
"expr": "max by(user, limit_name) (cortex_limits_overrides{cluster=~\"$cluster\",namespace=~\"$namespace\",user=~\"${tenant_id}\"})",
"instant": true,
"legendFormat": "",
"refId": "A"
}
],
"title": "Per-tenant overrides",
"transformations": [
{
"id": "labelsToFields",
"options": {
"mode": "columns",
"valueLabel": "limit_name"
}
},
{
"id": "merge",
"options": { }
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"indexByName": {
"user": 0
}
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"current": {
"selected": true,
"text": ".*",
"value": ".*"
},
"hide": 0,
"label": "Tenant ID",
"name": "tenant_id",
"options": [
{
"selected": true,
"text": ".*",
"value": ".*"
}
],
"query": ".*",
"type": "textbox"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Overrides",
"uid": "1e2c358600ac53f09faea133f811b5bb",
"version": 0
}
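
The "Per-tenant overrides" panel in this dashboard applies `labelsToFields` with `"valueLabel": "limit_name"` and then `merge`, which in effect pivots one series per (user, limit) pair into one table row per tenant. The Python sketch below shows that reshaping on in-memory data. The function name, the sample limit names, and the sort by tenant are assumptions made for illustration; it does not reproduce Grafana's transformation code.

```python
def pivot_overrides(samples):
    """Turn (labels, value) samples into one row per tenant.

    Each sample's `limit_name` label becomes a column that holds the value,
    and the `user` label identifies the row.
    """
    rows = {}
    for labels, value in samples:
        user = labels["user"]
        row = rows.setdefault(user, {"user": user})
        row[labels["limit_name"]] = value
    # Sorting by tenant is a convenience of this sketch only.
    return [rows[user] for user in sorted(rows)]


if __name__ == "__main__":
    # Hypothetical instant-query result for cortex_limits_overrides.
    result = [
        ({"user": "tenant-b", "limit_name": "max_global_series_per_user"}, 300000),
        ({"user": "tenant-a", "limit_name": "ingestion_rate"}, 50000),
        ({"user": "tenant-a", "limit_name": "max_global_series_per_user"}, 150000),
    ]
    for row in pivot_overrides(result):
        print(row)
```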

File diff suppressed because it is too large Load Diff

@@ -1,362 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "200px",
"panels": [
{
"id": 1,
"options": {
"content": "This dashboard identifies scaling-related issues by suggesting services that you might want to scale up.\nThe table that follows contains a suggested number of replicas and the reason why.\nIf the system is failing and depending on the reason, try scaling up to the specified number.\nThe specified numbers are intended as helpful guidelines when things go wrong, rather than prescriptive guidelines.\n\nReasons:\n- **sample_rate**: There are not enough replicas to handle the\n sample rate. Applies to distributor and ingesters.\n- **active_series**: There are not enough replicas\n to handle the number of active series. Applies to ingesters.\n- **cpu_usage**: There are not enough replicas\n based on the CPU usage of the jobs vs the resource requests.\n Applies to all jobs.\n- **memory_usage**: There are not enough replicas based on the memory\n usage vs the resource requests. Applies to all jobs.\n- **active_series_limits**: There are not enough replicas to hold 60% of the\n sum of all the per tenant series limits.\n- **sample_rate_limits**: There are not enough replicas to handle 60% of the\n sum of all the per tenant rate limits.\n",
"mode": "markdown"
},
"span": 12,
"title": "",
"type": "text"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Service scaling",
"titleSize": "h6"
},
{
"collapse": false,
"height": "400px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"sort": {
"col": 0,
"desc": false
},
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Required Replicas",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Cluster",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "__name__",
"thresholds": [ ],
"type": "hidden",
"unit": "short"
},
{
"alias": "Cluster",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "cluster",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Service",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "deployment",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "namespace",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Reason",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "reason",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [ ],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sort_desc(\n cluster_namespace_deployment_reason:required_replicas:count{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n > ignoring(reason) group_left\n cluster_namespace_deployment:actual_replicas:count{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Workload-based scaling",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Scaling",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Scaling",
"uid": "64bbad83507b7289b514725658e10352",
"version": 0
}
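
The "Workload-based scaling" query in this dashboard keeps a `required_replicas` series only when its value exceeds the matching `actual_replicas` series, matching on every label except `reason` (`ignoring(reason) group_left`), and then sorts the survivors in descending order. The Python sketch below reproduces that comparison on dictionaries. The key layout (cluster, namespace, deployment, reason) and the sample numbers are assumptions for illustration.

```python
def scaling_suggestions(required, actual):
    """Return rows where the suggested replica count exceeds the running count.

    `required` maps (cluster, namespace, deployment, reason) to a replica count.
    `actual` maps (cluster, namespace, deployment) to the running replica count.
    Entries with no matching `actual` series are dropped, as a PromQL binary
    operator would drop them.
    """
    rows = []
    for (cluster, namespace, deployment, reason), needed in required.items():
        running = actual.get((cluster, namespace, deployment))
        if running is not None and needed > running:
            rows.append({
                "cluster": cluster,
                "namespace": namespace,
                "deployment": deployment,
                "reason": reason,
                "required_replicas": needed,
            })
    # sort_desc(...) orders by sample value, highest first.
    return sorted(rows, key=lambda r: r["required_replicas"], reverse=True)


if __name__ == "__main__":
    required = {
        ("prod", "mimir", "ingester", "active_series"): 12,
        ("prod", "mimir", "ingester", "cpu_usage"): 8,
        ("prod", "mimir", "distributor", "sample_rate"): 3,
    }
    actual = {
        ("prod", "mimir", "ingester"): 10,
        ("prod", "mimir", "distributor"): 3,
    }
    for row in scaling_suggestions(required, actual):
        print(row)
```

Because the comparison is strict, a deployment that already runs the suggested number of replicas does not appear in the table.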

@@ -1,323 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${lokidatasource}",
"fieldConfig": {
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Time range"
},
"properties": [
{
"id": "mappings",
"value": [
{
"from": "",
"id": 1,
"text": "Instant query",
"to": "",
"type": 1,
"value": "0"
}
]
},
{
"id": "unit",
"value": "s"
}
]
},
{
"matcher": {
"id": "byName",
"options": "Step"
},
"properties": [
{
"id": "unit",
"value": "s"
}
]
}
]
},
"id": 1,
"span": 12,
"targets": [
{
"expr": "{cluster=~\"$cluster\",namespace=~\"$namespace\",name=~\"query-frontend.*\"} |= \"query stats\" != \"/api/v1/read\" | logfmt | user=~\"${tenant_id}\" | response_time > ${min_duration}",
"instant": false,
"legendFormat": "",
"range": true,
"refId": "A"
}
],
"title": "Slow queries",
"transformations": [
{
"id": "extractFields",
"options": {
"source": "labels"
}
},
{
"id": "calculateField",
"options": {
"alias": "Time range",
"binary": {
"left": "param_end",
"operator": "-",
"reducer": "sum",
"right": "param_start"
},
"mode": "binary",
"reduce": {
"reducer": "sum"
},
"replaceFields": false
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Line": true,
"Time": true,
"caller": true,
"cluster": true,
"container": true,
"host": true,
"id": true,
"job": true,
"labels": true,
"level": true,
"line": true,
"method": true,
"msg": true,
"name": true,
"namespace": true,
"param_end": true,
"param_start": true,
"param_time": true,
"path": true,
"pod": true,
"pod_template_hash": true,
"query_wall_time_seconds": true,
"stream": true,
"traceID": true,
"tsNs": true
},
"indexByName": {
"Time range": 3,
"param_query": 2,
"param_step": 4,
"response_time": 5,
"ts": 0,
"user": 1
},
"renameByName": {
"org_id": "Tenant ID",
"param_query": "Query",
"param_step": "Step",
"response_time": "Duration"
}
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"includeAll": false,
"label": "Logs datasource",
"multi": false,
"name": "lokidatasource",
"query": "loki",
"type": "datasource"
},
{
"current": {
"selected": true,
"text": "5s",
"value": "5s"
},
"hide": 0,
"label": "Min duration",
"name": "min_duration",
"options": [
{
"selected": true,
"text": "5s",
"value": "5s"
}
],
"query": "5s",
"type": "textbox"
},
{
"current": {
"selected": true,
"text": ".*",
"value": ".*"
},
"hide": 0,
"label": "Tenant ID",
"name": "tenant_id",
"options": [
{
"selected": true,
"text": ".*",
"value": ".*"
}
],
"query": ".*",
"type": "textbox"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Slow queries",
"uid": "6089e1ce1e678788f46312a0a1e647e6",
"version": 0
}
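
The "Slow queries" panel in this dashboard filters query-frontend log lines that contain `query stats`, excludes remote-read requests, parses the line as logfmt, keeps tenants matching `${tenant_id}` and durations above `${min_duration}`, and derives a "Time range" column as `param_end - param_start`. The Python sketch below walks through the same steps on raw strings. It is an approximation under stated assumptions: the sample log lines are invented, `param_start` and `param_end` are assumed to be numeric Unix seconds, the tenant regex is assumed to be fully anchored, and the logfmt and duration parsers are deliberately minimal.

```python
import re
import shlex

_UNIT_SECONDS = {
    "ns": 1e-9, "us": 1e-6, "µs": 1e-6, "ms": 1e-3,
    "s": 1.0, "m": 60.0, "h": 3600.0,
}
_DURATION_PART = re.compile(r"(\d+(?:\.\d+)?)(ns|us|µs|ms|s|m|h)")


def parse_duration(text):
    """Parse a Go-style duration such as '5s', '850ms' or '1m2.5s' into seconds."""
    pos, total = 0, 0.0
    for match in _DURATION_PART.finditer(text):
        if match.start() != pos:
            raise ValueError(f"invalid duration: {text!r}")
        total += float(match.group(1)) * _UNIT_SECONDS[match.group(2)]
        pos = match.end()
    if pos == 0 or pos != len(text):
        raise ValueError(f"invalid duration: {text!r}")
    return total


def parse_logfmt(line):
    """Minimal logfmt parser: key=value pairs, with optional double quotes."""
    return dict(tok.split("=", 1) for tok in shlex.split(line) if "=" in tok)


def slow_queries(lines, tenant_regex=".*", min_duration="5s"):
    """Apply the panel's line filters, label filters and derived column."""
    threshold = parse_duration(min_duration)
    rows = []
    for line in lines:
        # |= "query stats" != "/api/v1/read"
        if "query stats" not in line or "/api/v1/read" in line:
            continue
        fields = parse_logfmt(line)
        # | user=~"${tenant_id}"
        if not re.fullmatch(tenant_regex, fields.get("user", "")):
            continue
        # | response_time > ${min_duration}
        try:
            duration = parse_duration(fields.get("response_time", ""))
        except ValueError:
            continue
        if duration <= threshold:
            continue
        start, end = fields.get("param_start"), fields.get("param_end")
        time_range = float(end) - float(start) if start and end else None
        rows.append({
            "user": fields["user"],
            "query": fields.get("param_query"),
            "time_range_s": time_range,
            "step": fields.get("param_step"),
            "duration_s": duration,
        })
    return rows


if __name__ == "__main__":
    logs = [
        'level=info msg="query stats" user=tenant-a response_time=6.2s '
        'param_start=1715590000 param_end=1715593600 param_step=60 '
        'param_query="sum(rate(up[5m]))"',
        'level=info msg="query stats" user=tenant-a response_time=120ms '
        'param_query="up"',
    ]
    for row in slow_queries(logs):
        print(row)
```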

File diff suppressed because it is too large Load Diff

@@ -1,52 +1,53 @@
- name: "loki_rules"
rules:
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(loki_request_duration_seconds_count[5m]))
by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)
/ sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job,
route)"
record: "cluster_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate"
groups:
- name: "loki_rules"
rules:
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(loki_request_duration_seconds_count[5m]))
by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:loki_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)
/ sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, job,
route)"
record: "cluster_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:loki_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:loki_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(loki_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:loki_request_duration_seconds:50quantile"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) / sum(rate(loki_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds:avg"
- expr: "sum(rate(loki_request_duration_seconds_bucket[5m])) by (le, cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate"
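
The recording rules in this file repeat one template: for each label set, six rules are derived from a single histogram metric (99th and 50th percentile, average, and the summed rates of `_bucket`, `_sum` and `_count`), and each `record` name is prefixed with the aggregation labels joined by underscores. The Python sketch below generates rules of that shape and wraps them in a top-level `groups` key, as the new version of the file does. The function name and its parameters are hypothetical; this is not the tooling that produced the file.

```python
def histogram_rules(metric, labels, window="5m"):
    """Build the six recording rules for one histogram metric and label set."""
    by = ", ".join(labels)
    by_le = ", ".join(["le", *labels])
    prefix = "_".join(labels)

    def rate(suffix):
        return f"sum(rate({metric}_{suffix}[{window}]))"

    return [
        {"expr": f"histogram_quantile(0.99, {rate('bucket')} by ({by_le}))",
         "record": f"{prefix}:{metric}:99quantile"},
        {"expr": f"histogram_quantile(0.50, {rate('bucket')} by ({by_le}))",
         "record": f"{prefix}:{metric}:50quantile"},
        {"expr": f"{rate('sum')} by ({by}) / {rate('count')} by ({by})",
         "record": f"{prefix}:{metric}:avg"},
        {"expr": f"{rate('bucket')} by ({by_le})",
         "record": f"{prefix}:{metric}_bucket:sum_rate"},
        {"expr": f"{rate('sum')} by ({by})",
         "record": f"{prefix}:{metric}_sum:sum_rate"},
        {"expr": f"{rate('count')} by ({by})",
         "record": f"{prefix}:{metric}_count:sum_rate"},
    ]


def loki_rule_file():
    """Assemble a rule file with the top-level `groups` key."""
    label_sets = [
        ["cluster", "job"],
        ["cluster", "job", "route"],
        ["cluster", "namespace", "job", "route"],
    ]
    rules = []
    for labels in label_sets:
        rules.extend(histogram_rules("loki_request_duration_seconds", labels))
    return {"groups": [{"name": "loki_rules", "rules": rules}]}


if __name__ == "__main__":
    for rule in loki_rule_file()["groups"][0]["rules"]:
        print(rule["record"])
```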

@@ -1,555 +0,0 @@
groups:
- name: "mimir_api_1"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job) / sum(rate(cortex_request_duration_seconds_count[5m]))
by (cluster, job)"
record: "cluster_job:cortex_request_duration_seconds:avg"
- expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:cortex_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_request_duration_seconds_count:sum_rate"
- name: "mimir_api_2"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:cortex_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:cortex_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job, route)
/ sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:cortex_request_duration_seconds:avg"
- expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, job,
route)"
record: "cluster_job_route:cortex_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, job, route)"
record: "cluster_job_route:cortex_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, job, route)"
record: "cluster_job_route:cortex_request_duration_seconds_count:sum_rate"
- name: "mimir_api_3"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:cortex_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:cortex_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route) / sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:cortex_request_duration_seconds:avg"
- expr: "sum(rate(cortex_request_duration_seconds_bucket[5m])) by (le, cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:cortex_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_sum[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:cortex_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:cortex_request_duration_seconds_count:sum_rate"
- name: "mimir_querier_api"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_querier_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_querier_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job) / sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job)"
record: "cluster_job:cortex_querier_request_duration_seconds:avg"
- expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
job)"
record: "cluster_job:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job)"
record: "cluster_job:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job)"
record: "cluster_job:cortex_querier_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:cortex_querier_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, job, route))"
record: "cluster_job_route:cortex_querier_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job, route) / sum(rate(cortex_querier_request_duration_seconds_count[5m])) by
(cluster, job, route)"
record: "cluster_job_route:cortex_querier_request_duration_seconds:avg"
- expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
job, route)"
record: "cluster_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
job, route)"
record: "cluster_job_route:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
job, route)"
record: "cluster_job_route:cortex_querier_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_querier_request_duration_seconds_bucket[5m]))
by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
namespace, job, route) / sum(rate(cortex_querier_request_duration_seconds_count[5m]))
by (cluster, namespace, job, route)"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds:avg"
- expr: "sum(rate(cortex_querier_request_duration_seconds_bucket[5m])) by (le, cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_sum[5m])) by (cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_querier_request_duration_seconds_count[5m])) by (cluster,
namespace, job, route)"
record: "cluster_namespace_job_route:cortex_querier_request_duration_seconds_count:sum_rate"
- name: "mimir_cache"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_memcache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method))"
record: "cluster_job_method:cortex_memcache_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_memcache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method))"
record: "cluster_job_method:cortex_memcache_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_memcache_request_duration_seconds_sum[5m])) by (cluster,
job, method) / sum(rate(cortex_memcache_request_duration_seconds_count[5m]))
by (cluster, job, method)"
record: "cluster_job_method:cortex_memcache_request_duration_seconds:avg"
- expr: "sum(rate(cortex_memcache_request_duration_seconds_bucket[5m])) by (le, cluster,
job, method)"
record: "cluster_job_method:cortex_memcache_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_memcache_request_duration_seconds_sum[5m])) by (cluster,
job, method)"
record: "cluster_job_method:cortex_memcache_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_memcache_request_duration_seconds_count[5m])) by (cluster,
job, method)"
record: "cluster_job_method:cortex_memcache_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_cache_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_cache_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job)
/ sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_cache_request_duration_seconds:avg"
- expr: "sum(rate(cortex_cache_request_duration_seconds_bucket[5m])) by (le, cluster,
job)"
record: "cluster_job:cortex_cache_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_cache_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job)"
record: "cluster_job:cortex_cache_request_duration_seconds_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method))"
record: "cluster_job_method:cortex_cache_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_cache_request_duration_seconds_bucket[5m]))
by (le, cluster, job, method))"
record: "cluster_job_method:cortex_cache_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job,
method) / sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job, method)"
record: "cluster_job_method:cortex_cache_request_duration_seconds:avg"
- expr: "sum(rate(cortex_cache_request_duration_seconds_bucket[5m])) by (le, cluster,
job, method)"
record: "cluster_job_method:cortex_cache_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_cache_request_duration_seconds_sum[5m])) by (cluster, job,
method)"
record: "cluster_job_method:cortex_cache_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_cache_request_duration_seconds_count[5m])) by (cluster,
job, method)"
record: "cluster_job_method:cortex_cache_request_duration_seconds_count:sum_rate"
- name: "mimir_storage"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_kv_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_kv_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_kv_request_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_kv_request_duration_seconds:50quantile"
- expr: "sum(rate(cortex_kv_request_duration_seconds_sum[5m])) by (cluster, job)
/ sum(rate(cortex_kv_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_kv_request_duration_seconds:avg"
- expr: "sum(rate(cortex_kv_request_duration_seconds_bucket[5m])) by (le, cluster,
job)"
record: "cluster_job:cortex_kv_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_kv_request_duration_seconds_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_kv_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_kv_request_duration_seconds_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_kv_request_duration_seconds_count:sum_rate"
- name: "mimir_queries"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_query_frontend_retries_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_query_frontend_retries:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_query_frontend_retries_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_query_frontend_retries:50quantile"
- expr: "sum(rate(cortex_query_frontend_retries_sum[5m])) by (cluster, job) / sum(rate(cortex_query_frontend_retries_count[5m]))
by (cluster, job)"
record: "cluster_job:cortex_query_frontend_retries:avg"
- expr: "sum(rate(cortex_query_frontend_retries_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:cortex_query_frontend_retries_bucket:sum_rate"
- expr: "sum(rate(cortex_query_frontend_retries_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_query_frontend_retries_sum:sum_rate"
- expr: "sum(rate(cortex_query_frontend_retries_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_query_frontend_retries_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds:50quantile"
- expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_sum[5m])) by (cluster,
job) / sum(rate(cortex_query_frontend_queue_duration_seconds_count[5m])) by
(cluster, job)"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds:avg"
- expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_bucket[5m])) by (le,
cluster, job)"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_sum[5m])) by (cluster,
job)"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds_sum:sum_rate"
- expr: "sum(rate(cortex_query_frontend_queue_duration_seconds_count[5m])) by (cluster,
job)"
record: "cluster_job:cortex_query_frontend_queue_duration_seconds_count:sum_rate"
- name: "mimir_ingester_queries"
rules:
- expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_series_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_series:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_series_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_series:50quantile"
- expr: "sum(rate(cortex_ingester_queried_series_sum[5m])) by (cluster, job) / sum(rate(cortex_ingester_queried_series_count[5m]))
by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_series:avg"
- expr: "sum(rate(cortex_ingester_queried_series_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:cortex_ingester_queried_series_bucket:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_series_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_series_sum:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_series_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_series_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_samples_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_samples:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_samples_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_samples:50quantile"
- expr: "sum(rate(cortex_ingester_queried_samples_sum[5m])) by (cluster, job) / sum(rate(cortex_ingester_queried_samples_count[5m]))
by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_samples:avg"
- expr: "sum(rate(cortex_ingester_queried_samples_bucket[5m])) by (le, cluster, job)"
record: "cluster_job:cortex_ingester_queried_samples_bucket:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_samples_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_samples_sum:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_samples_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_samples_count:sum_rate"
- expr: "histogram_quantile(0.99, sum(rate(cortex_ingester_queried_exemplars_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_exemplars:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(cortex_ingester_queried_exemplars_bucket[5m]))
by (le, cluster, job))"
record: "cluster_job:cortex_ingester_queried_exemplars:50quantile"
- expr: "sum(rate(cortex_ingester_queried_exemplars_sum[5m])) by (cluster, job) /
sum(rate(cortex_ingester_queried_exemplars_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_exemplars:avg"
- expr: "sum(rate(cortex_ingester_queried_exemplars_bucket[5m])) by (le, cluster,
job)"
record: "cluster_job:cortex_ingester_queried_exemplars_bucket:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_exemplars_sum[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_exemplars_sum:sum_rate"
- expr: "sum(rate(cortex_ingester_queried_exemplars_count[5m])) by (cluster, job)"
record: "cluster_job:cortex_ingester_queried_exemplars_count:sum_rate"
- name: "mimir_received_samples"
rules:
- expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_received_samples_total[5m]))"
record: "cluster_namespace_job:cortex_distributor_received_samples:rate5m"
- name: "mimir_exemplars_in"
rules:
- expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_exemplars_in_total[5m]))"
record: "cluster_namespace_job:cortex_distributor_exemplars_in:rate5m"
- name: "mimir_received_exemplars"
rules:
- expr: "sum by (cluster, namespace, job) (rate(cortex_distributor_received_exemplars_total[5m]))"
record: "cluster_namespace_job:cortex_distributor_received_exemplars:rate5m"
- name: "mimir_exemplars_ingested"
rules:
- expr: "sum by (cluster, namespace, job) (rate(cortex_ingester_ingested_exemplars_total[5m]))"
record: "cluster_namespace_job:cortex_ingester_ingested_exemplars:rate5m"
- name: "mimir_exemplars_appended"
rules:
- expr: "sum by (cluster, namespace, job) (rate(cortex_ingester_tsdb_exemplar_exemplars_appended_total[5m]))"
record: "cluster_namespace_job:cortex_ingester_tsdb_exemplar_exemplars_appended:rate5m"
- name: "mimir_scaling_rules"
rules:
- expr: |
# Convenience rule to get the number of replicas for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix.
sum by (cluster, namespace, deployment) (
label_replace(
kube_deployment_spec_replicas,
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
or
sum by (cluster, namespace, deployment) (
label_replace(kube_statefulset_replicas, "deployment", "$1", "statefulset", "(.*?)(?:-zone-[a-z])?")
)
record: "cluster_namespace_deployment:actual_replicas:count"
- expr: |
ceil(
quantile_over_time(0.99,
sum by (cluster, namespace) (
cluster_namespace_job:cortex_distributor_received_samples:rate5m
)[24h:]
)
/ 240000
)
labels:
deployment: "distributor"
reason: "sample_rate"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"})
* 0.59999999999999998 / 240000
)
labels:
deployment: "distributor"
reason: "sample_rate_limits"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
quantile_over_time(0.99,
sum by (cluster, namespace) (
cluster_namespace_job:cortex_distributor_received_samples:rate5m
)[24h:]
)
* 3 / 80000
)
labels:
deployment: "ingester"
reason: "sample_rate"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
quantile_over_time(0.99,
sum by(cluster, namespace) (
cortex_ingester_memory_series
)[24h:]
)
/ 1500000
)
labels:
deployment: "ingester"
reason: "active_series"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="max_global_series_per_user"})
* 3 * 0.59999999999999998 / 1500000
)
labels:
deployment: "ingester"
reason: "active_series_limits"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
sum by (cluster, namespace) (cortex_limits_overrides{limit_name="ingestion_rate"})
* 0.59999999999999998 / 80000
)
labels:
deployment: "ingester"
reason: "sample_rate_limits"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
ceil(
(sum by (cluster, namespace) (
cortex_ingester_tsdb_storage_blocks_bytes{job=~".+/ingester.*"}
) / 4)
/
avg by (cluster, namespace) (
memcached_limit_bytes{job=~".+/memcached"}
)
)
labels:
deployment: "memcached"
reason: "active_series"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
sum by (cluster, namespace, pod)(rate(container_cpu_usage_seconds_total[5m])),
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
record: "cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate"
- expr: |
# Convenience rule to get the CPU request for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix.
# This recording rule is made compatible with the breaking changes introduced in kube-state-metrics v2
# that remove resource metrics, ref:
# - https://github.com/kubernetes/kube-state-metrics/blob/master/CHANGELOG.md#v200-alpha--2020-09-16
# - https://github.com/kubernetes/kube-state-metrics/pull/1004
#
# This is the old expression, compatible with kube-state-metrics < v2.0.0;
# kube_pod_container_resource_requests_cpu_cores was removed in v2.0.0:
(
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
kube_pod_container_resource_requests_cpu_cores,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
or
# This expression is compatible with kube-state-metrics >= v1.4.0,
# where kube_pod_container_resource_requests was introduced.
(
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
kube_pod_container_resource_requests{resource="cpu"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
record: "cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum"
- expr: |
# Jobs should be sized to their CPU usage.
# We do this by comparing 99th percentile usage over the last 24hrs to
# their current provisioned #replicas and resource requests.
ceil(
cluster_namespace_deployment:actual_replicas:count
*
quantile_over_time(0.99, cluster_namespace_deployment:container_cpu_usage_seconds_total:sum_rate[24h])
/
cluster_namespace_deployment:kube_pod_container_resource_requests_cpu_cores:sum
)
labels:
reason: "cpu_usage"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- expr: |
# Convenience rule to get the memory usage for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix.
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
container_memory_usage_bytes{image!=""},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
record: "cluster_namespace_deployment:container_memory_usage_bytes:sum"
- expr: |
# Convenience rule to get the Memory request for both a deployment and a statefulset.
# Multi-zone deployments are grouped together removing the "zone-X" suffix.
# This recording rule is made compatible with the breaking changes introduced in kube-state-metrics v2
# that remove resource metrics, ref:
# - https://github.com/kubernetes/kube-state-metrics/blob/master/CHANGELOG.md#v200-alpha--2020-09-16
# - https://github.com/kubernetes/kube-state-metrics/pull/1004
#
# This is the old expression, compatible with kube-state-metrics < v2.0.0;
# kube_pod_container_resource_requests_memory_bytes was removed in v2.0.0:
(
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
kube_pod_container_resource_requests_memory_bytes,
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
or
# This expression is compatible with kube-state-metrics >= v1.4.0,
# where kube_pod_container_resource_requests was introduced.
(
sum by (cluster, namespace, deployment) (
label_replace(
label_replace(
kube_pod_container_resource_requests{resource="memory"},
"deployment", "$1", "pod", "(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
),
# The question mark in "(.*?)" is used to make it non-greedy, otherwise it
# always matches everything and the (optional) zone is not removed.
"deployment", "$1", "deployment", "(.*?)(?:-zone-[a-z])?"
)
)
)
record: "cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum"
- expr: |
# Jobs should be sized to their Memory usage.
# We do this by comparing 99th percentile usage over the last 24hrs to
# their current provisioned #replicas and resource requests.
ceil(
cluster_namespace_deployment:actual_replicas:count
*
quantile_over_time(0.99, cluster_namespace_deployment:container_memory_usage_bytes:sum[24h])
/
cluster_namespace_deployment:kube_pod_container_resource_requests_memory_bytes:sum
)
labels:
reason: "memory_usage"
record: "cluster_namespace_deployment_reason:required_replicas:count"
- name: "mimir_alertmanager_rules"
rules:
- expr: "sum by (cluster, job, pod) (cortex_alertmanager_alerts)"
record: "cluster_job_pod:cortex_alertmanager_alerts:sum"
- expr: "sum by (cluster, job, pod) (cortex_alertmanager_silences)"
record: "cluster_job_pod:cortex_alertmanager_silences:sum"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_alerts_received_total[5m]))"
record: "cluster_job:cortex_alertmanager_alerts_received_total:rate5m"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_alerts_invalid_total[5m]))"
record: "cluster_job:cortex_alertmanager_alerts_invalid_total:rate5m"
- expr: "sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_total[5m]))"
record: "cluster_job_integration:cortex_alertmanager_notifications_total:rate5m"
- expr: "sum by (cluster, job, integration) (rate(cortex_alertmanager_notifications_failed_total[5m]))"
record: "cluster_job_integration:cortex_alertmanager_notifications_failed_total:rate5m"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_state_replication_total[5m]))"
record: "cluster_job:cortex_alertmanager_state_replication_total:rate5m"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_state_replication_failed_total[5m]))"
record: "cluster_job:cortex_alertmanager_state_replication_failed_total:rate5m"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_total[5m]))"
record: "cluster_job:cortex_alertmanager_partial_state_merges_total:rate5m"
- expr: "sum by (cluster, job) (rate(cortex_alertmanager_partial_state_merges_failed_total[5m]))"
record: "cluster_job:cortex_alertmanager_partial_state_merges_failed_total:rate5m"
- name: "mimir_ingester_rules"
rules:
- expr: "sum by(cluster, namespace, pod) (rate(cortex_ingester_ingested_samples_total[5m]))"
record: "cluster_namespace_pod:cortex_ingester_ingested_samples_total:rate1m"
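
The scaling rules in this file derive a `deployment` label from pod names with two chained `label_replace` calls. The following is a minimal Python sketch of that regex logic, not the rules' actual evaluation. It assumes that `re.fullmatch` is a fair stand-in for Prometheus's fully anchored RE2 matching for these particular patterns; the helper names and sample pod names are hypothetical.

```python
import re

# Patterns copied from the recording rules. Prometheus anchors label_replace
# regexes at both ends; re.fullmatch approximates that here (assumption).
POD_TO_WORKLOAD = r"(.*)-(?:([0-9]+)|([a-z0-9]+)-([a-z0-9]+))"
STRIP_ZONE = r"(.*?)(?:-zone-[a-z])?"
STRIP_ZONE_GREEDY = r"(.*)(?:-zone-[a-z])?"  # the variant the comments warn about


def label_replace(labels: dict, dst: str, src: str, pattern: str) -> dict:
    """Simplified label_replace with a fixed "$1" replacement.

    If the pattern does not match, the labels are returned unchanged.
    An empty replacement value removes the destination label.
    """
    match = re.fullmatch(pattern, labels.get(src, ""))
    if match is None:
        return dict(labels)
    out = dict(labels)
    value = match.group(1)
    if value:
        out[dst] = value
    else:
        out.pop(dst, None)
    return out


def deployment_for_pod(pod: str):
    """Hypothetical helper: pod name -> workload name -> name without the zone suffix."""
    labels = label_replace({"pod": pod}, "deployment", "pod", POD_TO_WORKLOAD)
    labels = label_replace(labels, "deployment", "deployment", STRIP_ZONE)
    return labels.get("deployment")


if __name__ == "__main__":
    for pod in ("ingester-zone-a-0", "distributor-7d9f8b6c5-x2x4k", "memcached-0"):
        print(pod, "->", deployment_for_pod(pod))
```

With the greedy variant, `(.*)` consumes the whole value and the optional zone group matches nothing, so the suffix is never stripped. That is the behaviour the inline comments in the rules describe.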

@@ -1,15 +0,0 @@
groups:
- name: "tempo_rules"
rules:
- expr: "histogram_quantile(0.99, sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:tempo_request_duration_seconds:99quantile"
- expr: "histogram_quantile(0.50, sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route))"
record: "cluster_namespace_job_route:tempo_request_duration_seconds:50quantile"
- expr: "sum(rate(tempo_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route) / sum(rate(tempo_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)"
record: "cluster_namespace_job_route:tempo_request_duration_seconds:avg"
- expr: "sum(rate(tempo_request_duration_seconds_bucket[5m])) by (le, cluster, namespace, job, route)"
record: "cluster_namespace_job_route:tempo_request_duration_seconds_bucket:sum_rate"
- expr: "sum(rate(tempo_request_duration_seconds_sum[5m])) by (cluster, namespace, job, route)"
record: "cluster_namespace_job_route:tempo_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(tempo_request_duration_seconds_count[5m])) by (cluster, namespace, job, route)"
record: "cluster_namespace_job_route:tempo_request_duration_seconds_count:sum_rate"
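
The rules in this file follow one pattern: quantiles are estimated from per-bucket rates, and the average is the `_sum` rate divided by the `_count` rate. The Python sketch below illustrates the arithmetic under simplifying assumptions (classic cumulative buckets sorted by upper bound, a final `+Inf` bucket, at least two buckets, and `0 < q <= 1`). It is not Prometheus's implementation, and the bucket values are made up for illustration.

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate a quantile from (upper_bound, cumulative_rate) pairs.

    Interpolates linearly inside the bucket that contains the target rank,
    treating the lowest bucket as starting at 0.
    """
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Rank falls in the +Inf bucket: report the highest finite bound.
                return buckets[-2][0]
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return math.nan


def mean_latency(sum_rate: float, count_rate: float) -> float:
    """Average duration, as in the ':avg' rules (sum rate / count rate)."""
    return sum_rate / count_rate


if __name__ == "__main__":
    # Illustrative per-second bucket rates, not real data.
    example = [(0.1, 10.0), (0.5, 30.0), (1.0, 40.0), (math.inf, 40.0)]
    print(histogram_quantile(0.50, example))
    print(histogram_quantile(0.99, example))
    print(mean_latency(12.0, 40.0))
```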

@@ -6,6 +6,24 @@
{{- join ", " $list }}
{{- end }}
{{- define "agent.all_namespaces" -}}
{{- $list := list }}
{{- range .Values.namespacesToMonitor }}
{{- $list = append $list (printf "\"%s\"" .) }}
{{- end }}
{{- $list = append $list (printf "\"%s\"" .Release.Namespace) }}
{{- join ", " $list }}
{{- end }}
{{- define "agent.all_namespaces_bar" -}}
{{- $list := list }}
{{- range .Values.namespacesToMonitor }}
{{- $list = append $list (printf "%s" .) }}
{{- end }}
{{- $list = append $list .Release.Namespace }}
{{- join "|" $list }}
{{- end }}
{{- define "agent.loki_write_targets" -}}
{{- $list := list }}
{{- if .Values.local.logs.enabled }}
@@ -39,10 +57,32 @@
{{- define "agent.tempo_write_targets" -}}
{{- $list := list }}
{{- if .Values.local.traces.enabled }}
{{- $list = append $list ("otelcol.exporter.otlp.local.input") }}
{{- $list = append $list ("otelcol.exporter.otlphttp.local.input") }}
{{- end }}
{{- if .Values.cloud.traces.enabled }}
{{- $list = append $list ("otelcol.exporter.otlp.cloud.input") }}
{{- $list = append $list ("otelcol.exporter.otlphttp.cloud.input") }}
{{- end }}
{{- join ", " $list }}
{{- end }}
{{- define "agent.all_logs" -}}
{{- $list := list }}
{{- range .Values.logs.retain }}
{{- $list = append $list . }}
{{- end }}
{{- range .Values.logs.extraLogs }}
{{- $list = append $list . }}
{{- end }}
{{- join "|" $list }}
{{- end }}
{{- define "agent.all_metrics" -}}
{{- $list := list }}
{{- range .Values.metrics.retain }}
{{- $list = append $list . }}
{{- end }}
{{- range .Values.metrics.extraMetrics }}
{{- $list = append $list . }}
{{- end }}
{{- join "|" $list }}
{{- end }}
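
For orientation, here is a small Python sketch of the strings these helpers appear to render. It mirrors the template logic shown in this hunk (quote and comma-join for `agent.all_namespaces`, pipe-join for the `_bar`, logs and metrics helpers). The function names and sample values are hypothetical, and Helm's actual rendering is authoritative.

```python
def all_namespaces(namespaces_to_monitor: list, release_namespace: str) -> str:
    """Quoted, comma-separated list, suitable for a River `names = [ ... ]` array."""
    quoted = [f'"{ns}"' for ns in namespaces_to_monitor]
    quoted.append(f'"{release_namespace}"')
    return ", ".join(quoted)


def all_namespaces_bar(namespaces_to_monitor: list, release_namespace: str) -> str:
    """Pipe-separated list, suitable for use as a regex alternation."""
    return "|".join([*namespaces_to_monitor, release_namespace])


def all_metrics(retain: list, extra_metrics: list) -> str:
    """Retained metric names followed by user-supplied extras, pipe-separated."""
    return "|".join([*retain, *extra_metrics])


if __name__ == "__main__":
    # Sample values are made up for illustration.
    print(all_namespaces(["loki", "mimir"], "meta"))
    print(all_namespaces_bar(["loki", "mimir"], "meta"))
    print(all_metrics(["up"], ["loki_build_info"]))
```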

@@ -40,10 +40,12 @@ data:
{{- if or .Values.local.logs.enabled .Values.cloud.logs.enabled }}
// Logs
{{- if .Values.cloud.logs.enabled }}
remote.kubernetes.secret "logs_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.logs.secret -}}"
}
{{- end }}
loki.source.kubernetes "pods" {
clustering {
@@ -57,9 +59,9 @@ data:
loki.process "filter" {
forward_to = [ {{ include "agent.loki_write_targets" . }} ]
{{- if not (empty .Values.logs.retain) }}
{{- if or (not (empty .Values.logs.retain)) (not (empty .Values.logs.extraLogs)) }}
stage.match {
selector = "{cluster=\"{{- .Values.clusterLabelValue -}}\", namespace=~\"{{- join "|" .Values.namespacesToMonitor -}}|{{- $.Release.Namespace -}}\", pod=~\"loki.*\"} !~ \"{{ join "|" .Values.logs.retain }}\""
selector = "{cluster=\"{{- .Values.clusterLabelValue -}}\", namespace=~\"{{- join "|" .Values.namespacesToMonitor -}}|{{- $.Release.Namespace -}}\", pod=~\"loki.*\"} !~ \"{{ include "agent.all_logs" . }}\""
action = "drop"
}
{{- end }}
@@ -80,16 +82,18 @@ data:
{{- if or .Values.local.metrics.enabled .Values.cloud.metrics.enabled }}
// Metrics
{{- if .Values.cloud.metrics.enabled }}
remote.kubernetes.secret "metrics_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.metrics.secret -}}"
}
{{- end }}
discovery.kubernetes "metric_pods" {
role = "pod"
namespaces {
own_namespace = true
names = [ {{ include "agent.namespaces" . }} ]
names = [ {{ include "agent.all_namespaces" . }} ]
}
}
@@ -131,9 +135,21 @@ data:
}
prometheus.relabel "filter" {
rule {
target_label = "cluster"
replacement = "{{- .Values.clusterLabelValue -}}"
}
rule {
source_labels = ["__name__"]
regex = "({{ join "|" .Values.metrics.retain }})"
regex = "({{ include "agent.all_metrics" . }})"
action = "keep"
}
rule {
source_labels = ["namespace"]
regex = "{{ include "agent.all_namespaces_bar" . }}"
action = "keep"
}
@@ -154,6 +170,10 @@ data:
// Based on https://github.com/Chewie/loutretelecom-manifests/blob/main/manifests/addons/monitoring/config.river
discovery.kubernetes "all_nodes" {
role = "node"
namespaces {
own_namespace = true
names = [ {{ include "agent.namespaces" . }} ]
}
}
discovery.relabel "all_nodes" {
@@ -267,24 +287,34 @@ data:
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }}
// Traces
{{- if .Values.cloud.traces.enabled }}
remote.kubernetes.secret "traces_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.traces.secret -}}"
}
{{- end }}
// Shamelessly copied from https://github.com/grafana/intro-to-mlt/blob/main/agent/config.river
otelcol.receiver.otlp "otlp_receiver" {
// We don't technically need this, but it shows how to change listen address and incoming port.
// In this case, the Agent is listening on all available bindable addresses on port 4317 (which is the
// default OTLP gRPC port) for the OTLP protocol.
grpc {
endpoint = "0.0.0.0:4317"
}
grpc {}
// We define where to send the output of all ingested traces. In this case, to the OpenTelemetry batch processor
// named 'default'.
output {
traces = [otelcol.processor.batch.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
otelcol.receiver.jaeger "jaeger" {
protocols {
thrift_http {}
}
output {
traces = [otelcol.processor.batch.default.input]
}
}
@@ -305,7 +335,7 @@ data:
{{- if .Values.local.logs.enabled }}
loki.write "local" {
endpoint {
url = "http://loki-gateway.{{- .Release.Namespace -}}.svc.cluster.local:80/loki/api/v1/push"
url = "http://loki-write.{{- .Release.Namespace -}}.svc.cluster.local:3100/loki/api/v1/push"
}
}
{{- end }}
@@ -318,21 +348,10 @@ data:
}
{{- end }}
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }}
// The OpenTelemetry exporter exports processed trace spans to another target that is listening for OTLP format traces.
// A unique label, 'local', is added to uniquely identify this exporter.
otelcol.exporter.otlp "local" {
// Define the client for exporting.
{{- if .Values.local.traces.enabled }}
otelcol.exporter.otlphttp "local" {
client {
// Send to the locally running Tempo instance, on port 4317 (OTLP gRPC).
endpoint = "meta-tempo-distributor:4317"
// Configure TLS settings for communicating with the endpoint.
tls {
// The connection is insecure.
insecure = true
// Do not verify TLS certificates when connecting.
insecure_skip_verify = true
}
endpoint = "http://{{- .Release.Name -}}-tempo-distributor.{{- .Release.Namespace -}}.svc:4318"
}
}
{{- end }}
@@ -362,7 +381,7 @@ data:
{{- end }}
{{- if .Values.cloud.traces.enabled }}
otelcol.exporter.otlp "cloud" {
otelcol.exporter.otlphttp "cloud" {
client {
endpoint = nonsensitive(remote.kubernetes.secret.traces_credentials.data["endpoint"])
auth = otelcol.auth.basic.creds.handler
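
The two `keep` rules in the `prometheus.relabel "filter"` block shown in this diff can be read as a simple predicate over each sample's labels. The Python sketch below assumes Prometheus-style relabel semantics: regexes are fully anchored and a missing label is treated as an empty string. The function name and sample labels are hypothetical.

```python
import re


def keep_sample(labels: dict, metrics_regex: str, namespaces_regex: str) -> bool:
    """Return True if a sample would survive both `keep` rules.

    metrics_regex stands for the rendered "agent.all_metrics" value,
    namespaces_regex for the rendered "agent.all_namespaces_bar" value.
    """
    name_ok = re.fullmatch(f"({metrics_regex})", labels.get("__name__", "")) is not None
    namespace_ok = re.fullmatch(namespaces_regex, labels.get("namespace", "")) is not None
    return name_ok and namespace_ok


if __name__ == "__main__":
    metrics = "loki_request_duration_seconds_bucket|up"
    namespaces = "loki|meta"
    sample = {"__name__": "up", "namespace": "loki"}
    print(keep_sample(sample, metrics, namespaces))
```

Under these assumptions, a series without a `namespace` label does not match the namespace rule and is dropped. That is worth keeping in mind for cluster-scoped metrics.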

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: agent-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"agent-logs-pipeline.json": |
{{ $.Files.Get "src/dashboards/agent-logs-pipeline.json" | fromJson | toJson }}
"agent-operational.json": |
{{ $.Files.Get "src/dashboards/agent-operational.json" | fromJson | toJson }}
"agent-remote-write.json": |
{{ $.Files.Get "src/dashboards/agent-remote-write.json" | fromJson | toJson }}
"agent-tracing-pipeline.json": |
{{ $.Files.Get "src/dashboards/agent-tracing-pipeline.json" | fromJson | toJson }}
"agent.json": |
{{ $.Files.Get "src/dashboards/agent.json" | fromJson | toJson }}
{{- end }}

@@ -0,0 +1,21 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: alloy-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"alloy-cluster-node.json": |
{{ $.Files.Get "src/dashboards/alloy-cluster-node.json" | fromJson | toJson }}
"alloy-cluster-overview.json": |
{{ $.Files.Get "src/dashboards/alloy-cluster-overview.json" | fromJson | toJson }}
"alloy-controller.json": |
{{ $.Files.Get "src/dashboards/alloy-controller.json" | fromJson | toJson }}
"alloy-opentelemetry.json": |
{{ $.Files.Get "src/dashboards/alloy-opentelemetry.json" | fromJson | toJson }}
"alloy-prometheus.json": |
{{ $.Files.Get "src/dashboards/alloy-prometheus.json" | fromJson | toJson }}
"alloy-resources.json": |
{{ $.Files.Get "src/dashboards/alloy-resources.json" | fromJson | toJson }}
{{- end }}

@@ -1,4 +1,4 @@
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap
@@ -28,64 +28,12 @@ data:
orgId: 1
type: file
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-1
folder: Alloy
name: alloy-1
options:
path: /var/lib/grafana/dashboards/mimir-1
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-2
options:
path: /var/lib/grafana/dashboards/mimir-2
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-3
options:
path: /var/lib/grafana/dashboards/mimir-3
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-4
options:
path: /var/lib/grafana/dashboards/mimir-4
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-5
options:
path: /var/lib/grafana/dashboards/mimir-5
orgId: 1
type: file
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- disableDeletion: true
editable: false
folder: Tempo
name: tempo-1
options:
path: /var/lib/grafana/dashboards/tempo-1
orgId: 1
type: file
{{- end }}
- disableDeletion: true
editable: false
folder: Agent
name: agent-1
options:
path: /var/lib/grafana/dashboards/agent-1
path: /var/lib/grafana/dashboards/alloy-1
orgId: 1
type: file
{{- end }}

View File

@@ -12,7 +12,7 @@ data:
# List of data sources to delete from the database.
deleteDatasources:
- name: Loki
- name: Loki
orgId: 1
# List of data sources to insert/update depending on what's
@@ -32,7 +32,7 @@ data:
uid: loki_ds
# <string> Sets the data source's URL, including the
# port.
url: http://loki-gateway.{{- $.Release.Namespace -}}.svc.cluster.local
url: http://{{- $.Release.Namespace -}}-loki-gateway.{{- $.Release.Namespace -}}.svc.cluster.local
# <bool> Toggles whether the data source is pre-selected
# for new panels. You can set only one default
# data source per organization.
@@ -61,6 +61,10 @@ data:
# <bool> Allows users to edit data sources from the
# Grafana UI.
editable: true
# Extra config.
jsonData:
# Scrape interval
timeInterval: 1m
{{- end }}
{{- if .Values.local.traces.enabled }}
- name: Tempo

View File

@@ -1,16 +1,4 @@
{{- if .Values.local.grafana.enabled }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -32,7 +20,7 @@ spec:
- 0
containers:
- name: grafana
image: grafana/grafana:10.0.0
image: grafana/grafana:{{- .Values.grafana.version }}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
@@ -65,7 +53,7 @@ spec:
name: grafana-pv
- mountPath: /etc/grafana/provisioning/datasources
name: datasources-provisioning
{{- if or (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled) .Values.dashboards.traces.enabled }}
{{- if .Values.dashboards.logs.enabled }}
- mountPath: /etc/grafana/provisioning/dashboards
name: dashboards-provisioning
{{- end }}
@@ -75,24 +63,8 @@ spec:
- mountPath: /var/lib/grafana/dashboards/loki-2
name: loki-dashboards-2
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- mountPath: /var/lib/grafana/dashboards/mimir-1
name: mimir-dashboards-1
- mountPath: /var/lib/grafana/dashboards/mimir-2
name: mimir-dashboards-2
- mountPath: /var/lib/grafana/dashboards/mimir-3
name: mimir-dashboards-3
- mountPath: /var/lib/grafana/dashboards/mimir-4
name: mimir-dashboards-4
- mountPath: /var/lib/grafana/dashboards/mimir-5
name: mimir-dashboards-5
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- mountPath: /var/lib/grafana/dashboards/tempo-1
name: tempo-dashboards-1
{{- end }}
- mountPath: /var/lib/grafana/dashboards/agent-1
name: agent-dashboards-1
- mountPath: /var/lib/grafana/dashboards/alloy-1
name: alloy-dashboards-1
volumes:
- name: grafana-pv
persistentVolumeClaim:
@@ -111,44 +83,7 @@ spec:
configMap:
name: loki-dashboards-2
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- name: mimir-dashboards-1
- name: alloy-dashboards-1
configMap:
name: mimir-dashboards-1
- name: mimir-dashboards-2
configMap:
name: mimir-dashboards-2
- name: mimir-dashboards-3
configMap:
name: mimir-dashboards-3
- name: mimir-dashboards-4
configMap:
name: mimir-dashboards-4
- name: mimir-dashboards-5
configMap:
name: mimir-dashboards-5
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- name: tempo-dashboards-1
configMap:
name: tempo-dashboards-1
{{- end }}
- name: agent-dashboards-1
configMap:
name: agent-dashboards-1
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
sessionAffinity: None
type: ClusterIP # Make this configurable
name: alloy-dashboards-1
{{- end }}

View File

@@ -0,0 +1,12 @@
{{- if .Values.local.grafana.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
{{- end }}

View File

@@ -0,0 +1,15 @@
{{- if .Values.local.grafana.enabled }}
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
sessionAffinity: None
type: ClusterIP # Make this configurable
{{- end }}

View File

@@ -12,8 +12,6 @@ data:
{{ $.Files.Get "src/dashboards/loki-deletion.json" | fromJson | toJson }}
"loki-logs.json": |
{{ $.Files.Get "src/dashboards/loki-logs.json" | fromJson | toJson }}
"loki-mixin-recording-rules.json": |
{{ $.Files.Get "src/dashboards/loki-mixin-recording-rules.json" | fromJson | toJson }}
"loki-operational.json": |
{{ $.Files.Get "src/dashboards/loki-operational.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"mimir-alertmanager-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-alertmanager-resources.json" | fromJson | toJson }}
"mimir-alertmanager.json": |
{{ $.Files.Get "src/dashboards/mimir-alertmanager.json" | fromJson | toJson }}
"mimir-compactor-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-compactor-resources.json" | fromJson | toJson }}
"mimir-compactor.json": |
{{ $.Files.Get "src/dashboards/mimir-compactor.json" | fromJson | toJson }}
"mimir-config.json": |
{{ $.Files.Get "src/dashboards/mimir-config.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-2
namespace: {{ $.Release.Namespace }}
data:
"mimir-object-store.json": |
{{ $.Files.Get "src/dashboards/mimir-object-store.json" | fromJson | toJson }}
"mimir-overrides.json": |
{{ $.Files.Get "src/dashboards/mimir-overrides.json" | fromJson | toJson }}
"mimir-overview-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-overview-networking.json" | fromJson | toJson }}
"mimir-overview-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-overview-resources.json" | fromJson | toJson }}
"mimir-overview.json": |
{{ $.Files.Get "src/dashboards/mimir-overview.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-3
namespace: {{ $.Release.Namespace }}
data:
"mimir-queries.json": |
{{ $.Files.Get "src/dashboards/mimir-queries.json" | fromJson | toJson }}
"mimir-reads-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-reads-networking.json" | fromJson | toJson }}
"mimir-reads-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-reads-resources.json" | fromJson | toJson }}
"mimir-reads.json": |
{{ $.Files.Get "src/dashboards/mimir-reads.json" | fromJson | toJson }}
"mimir-remote-ruler-reads-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-remote-ruler-reads-resources.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-4
namespace: {{ $.Release.Namespace }}
data:
"mimir-remote-ruler-reads.json": |
{{ $.Files.Get "src/dashboards/mimir-remote-ruler-reads.json" | fromJson | toJson }}
"mimir-rollout-progress.json": |
{{ $.Files.Get "src/dashboards/mimir-rollout-progress.json" | fromJson | toJson }}
"mimir-ruler.json": |
{{ $.Files.Get "src/dashboards/mimir-ruler.json" | fromJson | toJson }}
"mimir-scaling.json": |
{{ $.Files.Get "src/dashboards/mimir-scaling.json" | fromJson | toJson }}
"mimir-slow-queries.json": |
{{ $.Files.Get "src/dashboards/mimir-slow-queries.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-5
namespace: {{ $.Release.Namespace }}
data:
"mimir-tenants.json": |
{{ $.Files.Get "src/dashboards/mimir-tenants.json" | fromJson | toJson }}
"mimir-top-tenants.json": |
{{ $.Files.Get "src/dashboards/mimir-top-tenants.json" | fromJson | toJson }}
"mimir-writes-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-writes-networking.json" | fromJson | toJson }}
"mimir-writes-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-writes-resources.json" | fromJson | toJson }}
"mimir-writes.json": |
{{ $.Files.Get "src/dashboards/mimir-writes.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,21 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.traces.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: tempo-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"tempo-operational.json": |
{{ $.Files.Get "src/dashboards/tempo-operational.json" | fromJson | toJson }}
"tempo-reads.json": |
{{ $.Files.Get "src/dashboards/tempo-reads.json" | fromJson | toJson }}
"tempo-resources.json": |
{{ $.Files.Get "src/dashboards/tempo-resources.json" | fromJson | toJson }}
"tempo-rollout-progress.json": |
{{ $.Files.Get "src/dashboards/tempo-rollout-progress.json" | fromJson | toJson }}
"tempo-tenants.json": |
{{ $.Files.Get "src/dashboards/tempo-tenants.json" | fromJson | toJson }}
"tempo-writes.json": |
{{ $.Files.Get "src/dashboards/tempo-writes.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,5 +1,5 @@
{{- if .Values.local.grafana.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -49,6 +49,9 @@ spec:
- containerPort: 7946
name: memberlist
protocol: TCP
envFrom:
- secretRef:
name: minio
readinessProbe:
failureThreshold: 3
httpGet:

View File

@@ -1,5 +1,5 @@
{{- if .Values.local.metrics.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap
@@ -10,11 +10,5 @@ data:
{{- if .Values.dashboards.logs.enabled }}
{{ ($.Files.Glob "src/rules/loki-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
{{ ($.Files.Glob "src/rules/mimir-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
{{ ($.Files.Glob "src/rules/tempo-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -2,8 +2,7 @@
namespacesToMonitor:
- loki
# The name of the cluster where this will be installed
clusterLabelValue: "meta-monitoring"
clusterLabelValue: "meta"
# Set to true to write logs, metrics or traces to Grafana Cloud
# The secrets have to be created first
cloud:
@@ -16,7 +15,6 @@ cloud:
traces:
enabled: true
secret: "traces"
# Set to true for a local version of logs, metrics or traces
local:
grafana:
@@ -28,9 +26,9 @@ local:
traces:
enabled: false
minio:
enabled: false # This should be set to true if any of the previous is enabled
enabled: false # This should be set to true if any of the previous is enabled
grafana:
version: 10.4.2
# Gateway ingress configuration
ingress:
# -- Specifies whether an ingress for the gateway should be created
@@ -38,51 +36,90 @@ grafana:
# -- Ingress Class Name. MAY be required for Kubernetes versions >= 1.18
ingressClassName: ""
# -- Annotations for the gateway ingress
annotations: { }
annotations: {}
# -- Labels for the gateway ingress
labels: { }
labels: {}
# -- Hosts configuration for the gateway ingress, passed through the `tpl` function to allow templating
hosts:
- host: monitoring.example.com
paths:
- path: /
# -- pathType (e.g. ImplementationSpecific, Prefix, .. etc.) might also be required by some Ingress Controllers
# pathType: Prefix
pathType: Prefix
# backend:
# service:
# name: TODO
# port:
# number: TODO
# -- TLS configuration for the gateway ingress. Hosts passed through the `tpl` function to allow templating
#tls:
# - secretName: grafana-tls
# hosts:
# - monitoring.example.com
logs:
# Adding regexes here will add a stage.replace block for logs. For more information see
# https://grafana.com/docs/agent/latest/flow/reference/components/loki.process/#stagereplace-block
piiRegexes:
# This example replaces the word after password with *****
# - expression: "password (\\\\S+)"
# source: "" # Empty uses the log message
# replace: "*****"
# The lines matching these will be kept in Loki
piiRegexes: null # This example replaces the word after password with *****
# - expression: "password (\\\\S+)"
# source: "" # Empty uses the log message
# replace: "*****"
# The lines matching these will be kept in Loki
retain:
# This shows the queries
- caller=metrics.go
# This shows any errors
- level=error
# This shows the ingest requests and is very noisy. Uncomment to include.
# - caller=push.go
# Log lines for delete requests
- delete request for user added
- Started processing delete request
- delete request for user marked as processed
# This shows the ingest requests and is very noisy. Uncomment to include.
# - caller=push.go
# Additional log lines to retain
extraLogs: []
metrics:
# The list of metrics to retain for logging dashboards
retain:
- agent_config_last_load_success_timestamp_seconds
- agent_config_last_load_successful
- agent_config_load_failures_total
- alloy_build_info
- alloy_config_last_load_success_timestamp_seconds
- alloy_config_last_load_successful
- alloy_config_load_failures_total
- alloy_component_controller_evaluating
- alloy_component_dependencies_wait_seconds
- alloy_component_dependencies_wait_seconds_bucket
- alloy_component_evaluation_seconds
- alloy_component_evaluation_seconds_bucket
- alloy_component_evaluation_seconds_count
- alloy_component_evaluation_seconds_sum
- alloy_component_evaluation_slow_seconds
- alloy_component_controller_running_components
- alloy_resources_machine_rx_bytes_total
- alloy_resources_machine_tx_bytes_total
- alloy_resources_process_cpu_seconds_total
- alloy_resources_process_resident_memory_bytes
- prometheus_remote_write_wal_samples_appended_total
- prometheus_remote_write_wal_storage_active_series
- cluster_node_info
- cluster_node_lamport_time
- cluster_node_update_observers
- cluster_node_gossip_health_score
- cluster_node_gossip_proto_version
- cluster_node_gossip_received_events_total
- cluster_node_peers
- cluster_transport_rx_bytes_total
- cluster_transport_rx_packets_total
- cluster_transport_rx_packets_failed_total
- cluster_transport_stream_rx_bytes_total
- cluster_transport_stream_rx_packets_failed_total
- cluster_transport_stream_rx_packets_total
- cluster_transport_stream_tx_bytes_total
- cluster_transport_stream_tx_packets_total
- cluster_transport_stream_tx_packets_failed_total
- cluster_transport_streams
- cluster_transport_tx_packets_total
- cluster_transport_tx_packets_failed_total
- cluster_transport_rx_packet_queue_length
- cluster_transport_tx_packet_queue_length
- container_cpu_usage_seconds_total
- container_fs_writes_bytes_total
- container_memory_working_set_bytes
@@ -98,15 +135,21 @@ metrics:
- cortex_prometheus_rule_group_last_duration_seconds
- cortex_prometheus_rule_group_last_evaluation_timestamp_seconds
- cortex_prometheus_rule_group_iterations_missed_total
- exporter_send_failed_spans_ratio_total
- exporter_sent_spans_ratio_total
- go_gc_duration_seconds
- go_gc_duration_seconds_count
- go_goroutines
- go_memstats_heap_inuse_bytes
- kubelet_volume_stats_used_bytes
- kubelet_volume_stats_capacity_bytes
- kube_deployment_created
- kube_persistentvolumeclaim_labels
- kube_pod_container_info
- kube_pod_container_resource_requests
- kube_pod_container_status_last_terminated_reason
- kube_pod_container_status_restarts_total
- loki_azure_blob_request_duration_seconds_bucket
- loki_boltdb_shipper_compact_tables_operation_duration_seconds
- loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_seconds
- loki_boltdb_shipper_retention_marker_count_total
@@ -132,10 +175,16 @@ metrics:
- loki_compactor_deleted_lines
- loki_compactor_oldest_pending_delete_request_age_seconds
- loki_compactor_pending_delete_requests_count
- loki_consul_request_duration_seconds_bucket
- loki_discarded_samples_total
- loki_discarded_bytes_total
- loki_distributor_bytes_received_total
- loki_distributor_lines_received_total
- loki_distributor_structured_metadata_bytes_received_total
- loki_gcs_request_duration_seconds_bucket
- loki_gcs_request_duration_seconds_count
- loki_index_request_duration_seconds_bucket
- loki_index_request_duration_seconds_count
- loki_ingester_chunk_age_seconds_bucket
- loki_ingester_chunk_age_seconds_count
- loki_ingester_chunk_age_seconds_sum
@@ -147,6 +196,7 @@ metrics:
- loki_ingester_chunk_entries_sum
- loki_ingester_chunk_size_bytes_bucket
- loki_ingester_chunk_utilization_bucket
- loki_ingester_chunk_utilization_count
- loki_ingester_chunk_utilization_sum
- loki_ingester_chunks_flushed_total
- loki_ingester_flush_queue_length
@@ -164,6 +214,8 @@ metrics:
- loki_ruler_wal_prometheus_remote_storage_samples_total
- loki_ruler_wal_samples_appended_total
- loki_ruler_wal_storage_created_series_total
- loki_s3_request_duration_seconds_bucket
- loki_s3_request_duration_seconds_count
- loki_write_batch_retries_total
- loki_write_dropped_bytes_total
- loki_write_dropped_entries_total
@@ -171,22 +223,64 @@ metrics:
- loki_write_sent_entries_total
- node_disk_read_bytes_total
- node_disk_written_bytes_total
- process_start_time_seconds
- processor_batch_batch_send_size_ratio_bucket
- processor_batch_metadata_cardinality_ratio
- processor_batch_timeout_trigger_send_ratio_total
- prometheus_remote_storage_bytes_total
- prometheus_remote_storage_enqueue_retries_total
- prometheus_remote_storage_highest_timestamp_in_seconds
- prometheus_remote_storage_metadata_bytes_total
- prometheus_remote_storage_queue_highest_sent_timestamp_seconds
- prometheus_remote_storage_samples_dropped_total
- prometheus_remote_storage_samples_failed_total
- prometheus_remote_storage_samples_pending
- prometheus_remote_storage_samples_retried_total
- prometheus_remote_storage_samples_total
- prometheus_remote_storage_sent_batch_duration_seconds_bucket
- prometheus_remote_storage_sent_batch_duration_seconds_count
- prometheus_remote_storage_sent_batch_duration_seconds_sum
- prometheus_remote_storage_shard_capacity
- prometheus_remote_storage_shards
- prometheus_remote_storage_shards_desired
- prometheus_remote_storage_shards_max
- prometheus_remote_storage_shards_min
- prometheus_remote_storage_succeeded_samples_total
- prometheus_remote_write_wal_samples_appended_total
- prometheus_remote_write_wal_storage_active_series
- prometheus_sd_discovered_targets
- prometheus_target_interval_length_seconds_count
- prometheus_target_interval_length_seconds_sum
- prometheus_target_scrapes_exceeded_sample_limit_total
- prometheus_target_scrapes_sample_duplicate_timestamp_total
- prometheus_target_scrapes_sample_out_of_bounds_total
- prometheus_target_scrapes_sample_out_of_order_total
- prometheus_target_sync_length_seconds_sum
- prometheus_wal_watcher_current_segment
- promtail_custom_bad_words_total
# Set enabled = true to add the default logs/metrics/traces dashboards to the local Grafana
- promtail_dropped_bytes_total
- promtail_files_active_total
- promtail_read_bytes_total
- promtail_read_lines_total
- promtail_request_duration_seconds_bucket
- promtail_sent_entries_total
- rpc_server_duration_milliseconds_bucket
- receiver_accepted_spans_ratio_total
- receiver_refused_spans_ratio_total
- scrape_duration_seconds
- traces_exporter_sent_spans
- traces_exporter_send_failed_spans
- traces_loadbalancer_backend_outcome
- traces_loadbalancer_num_backends
- traces_receiver_accepted_spans
- traces_receiver_refused_spans
- up
# Additional metrics to retain
extraMetrics: []
# Set enabled = true to add the default logs dashboards to the local Grafana
dashboards:
logs:
enabled: true
metrics:
enabled: true
traces:
enabled: true
global:
minio:
rootUser: "rootuser"
rootPassword: "rootpassword"
kubeStateMetrics:
# Scrape https://github.com/kubernetes/kube-state-metrics by default
enabled: true
@@ -194,10 +288,8 @@ kubeStateMetrics:
# https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics/
# is used. Change this if kube-state-metrics is installed somewhere else.
endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080
# The following are configuration for the dependencies.
# These should usually not be changed.
loki:
loki:
auth_enabled: false
@@ -222,9 +314,9 @@ loki:
common:
storage:
s3:
access_key_id: "{{ .Values.global.minio.rootUser }}"
access_key_id: "${rootUser}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
secret_access_key: "${rootPassword}"
compactor:
retention_enabled: true
delete_request_store: s3
@@ -247,9 +339,24 @@ loki:
installOperator: false
lokiCanary:
enabled: false
test:
enabled: false
write:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
read:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
backend:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
alloy:
alloy:
clustering:
@@ -264,6 +371,15 @@ alloy:
memory: '600Mi'
limits:
memory: '4Gi'
extraPorts:
- name: "otel"
port: 4317
targetPort: 4317
protocol: "TCP"
- name: "thrifthttp"
port: 14268
targetPort: 14268
protocol: "TCP"
controller:
type: "statefulset"
autoscaling:
@@ -272,37 +388,36 @@ alloy:
maxReplicas: 30
targetMemoryUtilizationPercentage: 90
targetCPUUtilizationPercentage: 90
mimir-distributed:
minio:
enabled: false
global:
extraEnvFrom:
- secretRef:
name: "minio"
mimir:
structuredConfig:
alertmanager_storage:
s3:
bucket_name: mimir-ruler
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
blocks_storage:
backend: s3
s3:
bucket_name: mimir-tsdb
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
ruler_storage:
s3:
bucket_name: mimir-ruler
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
common:
storage:
backend: s3
s3:
bucket_name: mimir-ruler
access_key_id: "${rootUser}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "${rootPassword}"
insecure: true
limits:
compactor_blocks_retention_period: 30d
tempo-distributed:
tempo:
structuredConfig:
@@ -312,22 +427,47 @@ tempo-distributed:
s3:
bucket: tempo
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
access_key: "{{ .Values.global.minio.rootUser }}"
secret_key: "{{ .Values.global.minio.rootPassword }}"
access_key: "${rootUser}"
secret_key: "${rootPassword}"
insecure: true
compactor:
compaction:
block_retention: 30d
distributor:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
ingester:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
compactor:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
querier:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
queryFrontend:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
traces:
otlp:
http:
enabled: true
grpc:
enabled: true
minio:
rootUser: rootuser
rootPassword: rootpassword
existingSecret: "minio"
buckets:
- name: loki-chunks
policy: none

View File

@@ -25,21 +25,21 @@
```
kubectl create secret generic logs -n meta \
--from-literal=username=<logs username> \
--from-literal=password=<token>
--from-literal=password=<token> \
--from-literal=endpoint='https://logs-prod-us-central1.grafana.net/loki/api/v1/push'
kubectl create secret generic metrics -n meta \
--from-literal=username=<metrics username> \
--from-literal=password=<token>
--from-literal=password=<token> \
--from-literal=endpoint='https://prometheus-us-central1.grafana.net/api/prom/push'
kubectl create secret generic traces -n meta \
--from-literal=username=<traces username> \
--from-literal=password=<token>
--from-literal=endpoint='https://tempo-us-central1.grafana.net/tempo'
--from-literal=username=<OTLP instance ID> \
--from-literal=password=<token> \
--from-literal=endpoint='https://otlp-gateway-prod-us-east-0.grafana.net/otlp'
```
The logs, metrics and traces usernames are the `User / Username / Instance IDs` of the Loki, Prometheus/Mimir and Tempo instances in Grafana Cloud. From `Home` in Grafana click on `Stacks`. Then go to the `Details` pages of Loki, Prometheus/Mimir and Tempo.
The logs, metrics and traces usernames are the `User / Username / Instance IDs` of the Loki, Prometheus/Mimir and OpenTelemetry instances in Grafana Cloud. From `Home` in Grafana, click on `Stacks`. For Loki and Prometheus/Mimir, go to the `Details` page of each instance. For OpenTelemetry, go to the `Configure` page.
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). Fill in the names of the secrets created above as needed. An example minimal values.yaml looks like this:
@@ -67,6 +67,14 @@
kubectl create namespace meta
```
1. Create a secret named `minio` with the user and password for the local Minio:
```
kubectl create secret generic minio -n meta \
--from-literal=rootPassword=<password> \
--from-literal=rootUser=<user>
```
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). An example minimal values.yaml looks like this:
```
@@ -163,4 +171,27 @@ For each of the dashboard files in charts/meta-monitoring/src/dashboards folder
```
mimirtool rules print --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token>
```
## Configure Loki to send traces
1. In the Loki config, enable tracing:
```
loki:
  tracing:
    enabled: true
```
1. Add the following environment variables to your Loki binaries. When using the Loki Helm chart, these can be added using the `extraEnv` setting for the Loki components.
   1. `JAEGER_ENDPOINT`: HTTP address of the mmc-alloy service installed by the meta-monitoring chart, for example "http://mmc-alloy:14268/api/traces"
   1. `JAEGER_AGENT_TAGS`: extra tags you would like to add to the spans, for example 'cluster="abc",namespace="def"'
   1. `JAEGER_SAMPLER_TYPE`: the sampling strategy. We suggest setting this to `ratelimiting` so that at most 1 trace is accepted per second. See these [docs](https://www.jaegertracing.io/docs/1.57/sampling/) for more options.
   1. `JAEGER_SAMPLER_PARAM`: 1.0
1. If Loki is installed in a different namespace, you can create an [ExternalName service](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) in Kubernetes to point to the mmc-alloy service in the meta-monitoring namespace.
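
The steps above could be combined in a values.yaml for the Loki Helm chart along the following lines. This is a minimal sketch, not a tested configuration: it assumes the simple scalable deployment mode, where the `write` component accepts an `extraEnv` list, and it reuses the example endpoint and tag values from the list above. Adjust both to your installation.

```yaml
# Sketch of a values.yaml fragment for the Loki Helm chart (assumed layout).
loki:
  tracing:
    enabled: true

write:
  extraEnv:
    # Address of the mmc-alloy service; the host name is the example from above.
    - name: JAEGER_ENDPOINT
      value: "http://mmc-alloy:14268/api/traces"
    # Example tags only; replace with your own cluster and namespace.
    - name: JAEGER_AGENT_TAGS
      value: 'cluster="abc",namespace="def"'
    - name: JAEGER_SAMPLER_TYPE
      value: "ratelimiting"
    # Environment variable values must be strings, hence the quotes.
    - name: JAEGER_SAMPLER_PARAM
      value: "1.0"
```

The same `extraEnv` entries would need to be repeated for the other Loki components (for example `read` and `backend`) if they should send traces as well.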
## Configure external access using an Ingress in local mode
When using local mode, a Kubernetes [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) object is created by default to access the Grafana instance. Adapt it to your cloud provider by updating the `grafana.ingress` section of the `values.yaml` file provided to Helm. Check the documentation of your cloud provider for available options.
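
As an illustration, a `grafana.ingress` override might look like the sketch below. The key names are taken from the chart's default values.yaml. The `nginx` ingress class and the `grafana-tls` secret are assumptions that depend on your cluster, and `monitoring.example.com` is the placeholder host from the defaults.

```yaml
# Sketch of a values.yaml override for the meta-monitoring chart in local mode.
local:
  grafana:
    enabled: true

grafana:
  ingress:
    # Assumption: an NGINX ingress controller is installed in the cluster.
    ingressClassName: "nginx"
    annotations: {}
    labels: {}
    hosts:
      - host: monitoring.example.com
        paths:
          - path: /
            pathType: Prefix
    # Assumption: a TLS secret named grafana-tls already exists in the namespace.
    tls:
      - secretName: grafana-tls
        hosts:
          - monitoring.example.com
```

Cloud providers often require provider-specific entries under `annotations`, which is why that map is left empty here.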