Compare commits

..

94 Commits

Author SHA1 Message Date
Michel Hollands
57adbf43e2 Split up Grafana yaml
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 10:39:02 +01:00
Michel Hollands
add43ae974 Merge pull request #105 from grafana/remove_redundant_variables
Remove unused variables
2024-05-08 10:25:37 +01:00
Michel Hollands
52ec526718 Remove unused variables
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 10:24:48 +01:00
Michel Hollands
8a5ed559a2 Merge pull request #104 from grafana/fix_dependency_check
Fix name and indentation of workflow
2024-05-08 09:49:07 +01:00
Michel Hollands
188cd7e56f Fix name and indentation of workflow
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-08 09:46:42 +01:00
Michel Hollands
9e4dbcd44a Merge pull request #100 from grafana/combine_ci
Combine dependency updates
2024-05-08 09:40:07 +01:00
Michel Hollands
28daa27fca Merge pull request #99 from grafana/chore/update-minio
[dependency] Update the Grafana version
2024-05-08 09:38:26 +01:00
Michel Hollands
2de595baf4 Merge branch 'main' into chore/update-minio 2024-05-08 09:37:45 +01:00
Michel Hollands
95257b66d3 Merge pull request #103 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-08 09:36:02 +01:00
Michel Hollands
e9b0e57ef0 Merge pull request #95 from grafana/update_grafana
Add CI action to update Grafana version
2024-05-08 09:35:29 +01:00
Michel Hollands
03609ebb35 Merge pull request #102 from grafana/fix_alloy_config_for_traces
Fix the alloy config
2024-05-08 09:34:53 +01:00
MichelHollands
7e38d19814 Update Tempo Distributed 2024-05-08 07:03:26 +00:00
Michel Hollands
32272298d7 Fix the alloy config
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 16:35:00 +01:00
Michel Hollands
3879207e05 Merge pull request #101 from grafana/fix_minio_secret_name
Fix secret name
2024-05-07 14:40:52 +01:00
Michel Hollands
cd42da2197 Fix secret name
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 14:39:20 +01:00
Michel Hollands
56cab04af8 Merge pull request #92 from grafana/use_secret_for_minio
Use a secret for the Minio access
2024-05-07 12:37:07 +01:00
Michel Hollands
c6d0444dfa Combine dependency updates
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 11:26:32 +01:00
Michel Hollands
b99140d3f4 Merge pull request #97 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-07 11:00:53 +01:00
MichelHollands
749e271455 Update Tempo Distributed 2024-05-07 09:22:20 +00:00
MichelHollands
d938dbbfe5 Update Grafana version 2024-05-07 09:22:19 +00:00
Michel Hollands
e9125d1a9c Add corrected key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:21:42 +01:00
Michel Hollands
076685ef06 Revert key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:18:55 +01:00
Michel Hollands
b0451d626e Use $. in yaml key
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:16:10 +01:00
Michel Hollands
90e949e89a Change version param
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:14:50 +01:00
Michel Hollands
06e176e720 Trim the v prefix from the released version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:11:17 +01:00
Michel Hollands
d4c886ba9d Use token from env
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 10:00:55 +01:00
Michel Hollands
643e73f5f1 add token
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:54:50 +01:00
Michel Hollands
7e65f3d9c9 Fix sourceid
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:46:31 +01:00
Michel Hollands
de91b4dac7 Merge pull request #96 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-07 09:22:43 +01:00
MichelHollands
9f6e52d7a1 Update Tempo Distributed 2024-05-07 08:22:14 +00:00
Michel Hollands
26e0ad0b85 Add CI action to update Grafana version
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-07 09:20:51 +01:00
Michel Hollands
025bb5b0c3 Merge pull request #94 from grafana/chore/update-loki
[dependency] Update the Loki subchart
2024-05-07 08:51:44 +01:00
MichelHollands
0b31eae425 Update loki 2024-05-07 07:02:51 +00:00
Michel Hollands
ab42a96949 Update installation instructions
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:29:33 +01:00
Michel Hollands
386ff25fca Use the secret in the ruler for the dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:18:44 +01:00
Michel Hollands
c6889131a7 Use structuredConfig correctly
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-06 16:12:48 +01:00
Michel Hollands
2739bae0c0 Use correct variables
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 15:40:36 +01:00
Michel Hollands
cea8076b75 Start using a secret
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 15:38:07 +01:00
Michel Hollands
29b831ca00 Merge pull request #91 from grafana/set_step_time_to_1_minute_in_dashboards
Set the interval to 1m to match the scrape interval
2024-05-03 09:58:40 +01:00
Michel Hollands
09cf8f812c Merge pull request #90 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-05-03 09:58:25 +01:00
Michel Hollands
f8436a8e44 Set the interval to 1m to match the scrape interval
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-03 09:28:26 +01:00
MichelHollands
2b26abedbb Update Tempo Distributed 2024-05-03 07:02:32 +00:00
Michel Hollands
017c041007 Merge pull request #89 from grafana/add_extra_metrics
Add way to gather extra metrics and logs
2024-05-02 17:15:18 +01:00
Michel Hollands
e7ad1383a6 Add way to gather extra metrics and logs
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 17:14:09 +01:00
Michel Hollands
2906836eae Merge pull request #88 from grafana/more_dashboard_cleanup
Cleanup dashboards
2024-05-02 14:25:50 +01:00
Michel Hollands
c70ef27e48 Set scrape interval to 1m
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 14:24:36 +01:00
Michel Hollands
3c187def47 Fix error in query
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 14:24:09 +01:00
Michel Hollands
54eda36ec3 Cleanup dashboards we won't ship
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 13:30:45 +01:00
Michel Hollands
bc33e5a2a5 Merge pull request #83 from grafana/chore/update-loki
[dependency] Update the Loki subchart
2024-05-02 13:29:03 +01:00
Michel Hollands
31e82bbf16 Merge pull request #87 from grafana/remove_non_loki_dashboards
Remove non Loki and agent charts
2024-05-02 10:49:11 +01:00
Michel Hollands
52c1bf1778 Remove non loki and agent charts
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-02 10:47:58 +01:00
MichelHollands
2c5c4d8e38 Update loki 2024-05-02 07:02:34 +00:00
Michel Hollands
b6a5a3cfe3 Merge pull request #85 from grafana/open_extra_ports_for_otlp
Enabled traces from Loki
2024-05-01 17:40:35 +01:00
Michel Hollands
a01992194b Use OpenTelemetry token for traces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 17:24:38 +01:00
Michel Hollands
636b654828 Add docs on how to setup Loki
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 17:20:00 +01:00
Michel Hollands
5d553e50f6 Add config for Loki traces to OTEL
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 16:54:51 +01:00
Michel Hollands
3f200115f9 Consider the meta monitoring namespace for logs as well
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-01 10:16:57 +01:00
Michel Hollands
f0bdf0760d Merge pull request #82 from grafana/fix_correct_version
Update version correctly
2024-04-29 16:44:48 +01:00
Michel Hollands
314b1db19b Update version correctly
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:44:28 +01:00
Michel Hollands
b547784d54 Merge pull request #81 from grafana/update_version
Update Chart to version 0.0.2
2024-04-29 16:40:03 +01:00
Michel Hollands
af4cd1f8c0 Update to version 2
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:39:05 +01:00
Michel Hollands
116119bdc4 Merge pull request #80 from grafana/fix_logic_for_secrets
Fix a few more things
2024-04-29 16:36:24 +01:00
Michel Hollands
df794115f0 Fix a few more things
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 16:34:21 +01:00
Michel Hollands
c26e509f65 Merge pull request #79 from grafana/use_updated_dashboards2
Cleanup dashboards
2024-04-29 15:17:15 +01:00
Michel Hollands
95f7905e34 Cleanup dashboards
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-29 15:14:43 +01:00
Michel Hollands
ad1b619a33 Merge pull request #78 from grafana/chore/update-minio
[dependency] Update the Minio subchart
2024-04-29 08:35:41 +01:00
MichelHollands
446c0be743 Update minio 2024-04-29 07:02:52 +00:00
Michel Hollands
be7a32de27 Merge pull request #77 from grafana/filter_cadvisor_kubelet_metrics_on_namespace
Only get cadvisor and kubelet metrics from the required namespaces
2024-04-28 14:33:32 +01:00
Michel Hollands
e41b2f360f Merge branch 'main' into filter_cadvisor_kubelet_metrics_on_namespace 2024-04-28 14:33:07 +01:00
Michel Hollands
1cafd696c7 Merge pull request #76 from grafana/add_creation_of_dashboard
Create the mixin locally
2024-04-26 17:27:46 +01:00
Michel Hollands
c614f41d66 Only keep metrics from the monitored namespaces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 16:45:40 +01:00
Michel Hollands
2144cea411 Reformat loki-rules
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 14:43:08 +01:00
Michel Hollands
81a017551b This was moved to a separate PR
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 14:20:39 +01:00
Michel Hollands
1871a4ef87 Only get cadvisor and kubelet metrics from the required namespaces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 14:15:33 +01:00
Michel Hollands
11d80263a7 Update dashboards from Loki
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-26 14:08:59 +01:00
Michel Hollands
cdb0bee56e First draft
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-25 19:15:00 +01:00
Michel Hollands
58171a6a42 Merge pull request #75 from grafana/more_docs_fixes
Small docs fixes
2024-04-25 15:29:02 +01:00
Michel Hollands
c65445384b Merge pull request #72 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-04-25 15:28:37 +01:00
Michel Hollands
1f980f393e Small docs fixes
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-25 15:28:09 +01:00
Michel Hollands
47d9190eda Merge pull request #74 from grafana/add_more_docs
Add docs on how to install dashboards and rules in the cloud
2024-04-25 15:25:26 +01:00
Michel Hollands
5ff9bd16c9 Add docs on how to install dashboards and rules in the cloud
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-25 15:22:18 +01:00
MichelHollands
d6faaf88f5 Update Tempo Distributed 2024-04-25 07:02:32 +00:00
Michel Hollands
2d711f7168 Merge pull request #73 from grafana/update_main_page
Update installation instructions
2024-04-24 10:48:09 +01:00
Michel Hollands
c666bf69c9 Remove whitespace
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-24 10:46:31 +01:00
Michel Hollands
41619b99b1 Update example values.yaml for local mode
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-24 10:45:38 +01:00
Michel Hollands
5923139796 Update installation instructions
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-24 10:40:52 +01:00
Michel Hollands
329d5822ea Merge pull request #71 from grafana/update_installation_instructions
Update installation instructions
2024-04-23 11:12:35 +01:00
Michel Hollands
5498b27ad6 Add repo update step
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-23 11:12:00 +01:00
Michel Hollands
da687315e7 Merge pull request #70 from grafana/chore/update-loki
[dependency] Update the Loki subchart
2024-04-23 11:08:51 +01:00
Michel Hollands
8f20e45c77 Update installation instructions
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-04-23 09:25:03 +01:00
Michel Hollands
e81b1246f5 Merge pull request #69 from grafana/fix_release_flow
Get Release Helm chart Github action working
2024-04-23 08:51:54 +01:00
MichelHollands
b103fb3434 Update loki 2024-04-23 07:02:51 +00:00
Michel Hollands
9349d2d906 Merge pull request #68 from grafana/chore/update-tempo-distributed
[dependency] Update the Tempo Distributed subchart
2024-04-22 14:02:07 +01:00
MichelHollands
0938193982 Update Tempo Distributed 2024-04-22 07:03:00 +00:00
74 changed files with 6239 additions and 59865 deletions

View File

@@ -0,0 +1,30 @@
name: Bump grafana version specified in the values.yaml
sources:
latestGrafanaRelease:
name: Get latest grafana release on Github
kind: githubrelease
spec:
owner: grafana
repository: grafana
token: '{{ requiredEnv "UPDATECLI_GITHUB_TOKEN" }}'
versionfilter:
kind: latest
transformers:
- trimprefix: "v"
conditions:
grafanaImagePublished:
name: Ensure the latest Grafana is published on DockerHub
kind: dockerimage
source-id: latestGrafanaRelease
spec:
image: "grafana/grafana"
targets:
grafana:
name: Update Grafana version in values.yaml
kind: helmchart
spec:
file: values.yaml
key: $.grafana.version
name: charts/meta-monitoring
versionincrement: none
sourceid: latestGrafanaRelease

View File

@@ -16,8 +16,8 @@ env:
UPDATECLI_GITHUB_TOKEN: "${{ secrets.GITHUB_TOKEN }}"
jobs:
updateLoki:
name: Update the Loki subchart
updateVersions:
name: Update the subcharts
runs-on: "ubuntu-latest"
steps:
- name: Checkout
@@ -26,7 +26,7 @@ jobs:
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
- name: Run Updatecli for Loki
id: update-loki
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/loki.yaml
@@ -34,31 +34,7 @@ jobs:
echo "changed=true" >> "${GITHUB_OUTPUT}"
fi
- name: Create pull request
if: steps.update-loki.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Loki subchart"
body: "Updates the Loki subchart"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update loki
labels: dependencies
branch: chore/update-loki
delete-branch: true
updateGrafanaAlloy:
name: Update the Grafana Alloy subchart
runs-on: "ubuntu-latest"
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
- name: Run Updatecli for Alloy
id: update-grafana-alloy
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/alloy.yaml
@@ -66,31 +42,7 @@ jobs:
echo "changed=true" >> "${GITHUB_OUTPUT}"
fi
- name: Create pull request
if: steps.update-grafana-alloy.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Grafana Alloy subchart"
body: "Updates the Grafana Alloy subchart"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update Grafana Alloy
labels: dependencies
branch: chore/update-grafana-alloy
delete-branch: true
updateMimirDistributed:
name: Update the Mimir Distributed subchart
runs-on: "ubuntu-latest"
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
- name: Run Updatecli for Mimir
id: update-mimir-distributed
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/mimir-distributed.yaml
@@ -98,31 +50,7 @@ jobs:
echo "changed=true" >> "${GITHUB_OUTPUT}"
fi
- name: Create pull request
if: steps.update-mimir-distributed.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Mimir Distributed subchart"
body: "Updates the Mimir Distributed subchart"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update Mimir Distributed
labels: dependencies
branch: chore/update-mimir-distributed
delete-branch: true
updateTempoDistributed:
name: Update the Tempo Distributed subchart
runs-on: "ubuntu-latest"
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
- name: Run Updatecli for Tempo
id: update-tempo-distributed
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/tempo-distributed.yaml
@@ -130,31 +58,7 @@ jobs:
echo "changed=true" >> "${GITHUB_OUTPUT}"
fi
- name: Create pull request
if: steps.update-tempo-distributed.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Tempo Distributed subchart"
body: "Updates the tempo Distributed subchart"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update Tempo Distributed
labels: dependencies
branch: chore/update-tempo-distributed
delete-branch: true
updateMinio:
name: Update the Minio subchart
runs-on: "ubuntu-latest"
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
- name: Run Updatecli for Minio
id: update-minio
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/minio.yaml
@@ -163,15 +67,47 @@ jobs:
fi
- name: Create pull request
if: steps.update-minio.outputs.changed == 'true'
if: steps.update-loki.outputs.changed == 'true' || steps.update-grafana-alloy.outputs.changed == 'true' || steps.update-mimir-distributed.outputs.changed == 'true' || steps.update-tempo-distributed.outputs.changed == 'true' || steps.update-minio.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Minio subchart"
body: "Updates the Minio subchart"
title: "[dependency] Update the subcharts"
body: "Updates the subcharts"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update minio
commit-message: Update dependencies
labels: dependencies
branch: chore/update-dependencies
delete-branch: true
updateGrafana:
name: Update the Grafana version
runs-on: "ubuntu-latest"
steps:
- name: Checkout
uses: actions/checkout@v2
- name: Install Updatecli
uses: updatecli/updatecli-action@v2
- name: Run Updatecli
id: update-grafana
run: |
updatecli apply --config ${UPDATECLI_CONFIG_DIR}/grafana.yaml
if ! git diff --exit-code > /dev/null; then
echo "changed=true" >> "${GITHUB_OUTPUT}"
fi
- name: Create pull request
if: steps.update-grafana.outputs.changed == 'true'
uses: peter-evans/create-pull-request@v5
with:
title: "[dependency] Update the Grafana version"
body: "Updates the Grafana version"
base: main
author: "${{ github.actor }} <${{ github.actor }}@users.noreply.github.com>"
committer: "GitHub <noreply@github.com>"
commit-message: Update Grafana version
labels: dependencies
branch: chore/update-minio
delete-branch: true

1
.gitignore vendored Normal file
View File

@@ -0,0 +1 @@
.DS_Store

View File

@@ -1,20 +1,9 @@
# meta-monitoring-chart
This is a meta-monitoring chart for GEL, GEM and GET. It should be installed in a
separate namespace next to GEM, GEL or GET installations.
This is a meta-monitoring chart for Loki.
Note that this is pre-production software at the moment.
## Preparation
Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml).
1. Add or remove the namespaces to monitor in the `namespacesToMonitor` setting
1. Set the cluster name in the `clusterName` setting. This will be added as a label to all logs, metrics and traces.
1. Create a `meta` namespace.
## Local and cloud modes
The chart has 2 modes: local and cloud. In the local mode logs, metrics and/or traces are sent
@@ -30,16 +19,10 @@ In the cloud mode the logs, metrics and/or traces are sent to Grafana Cloud.
To enable cloud mode set `cloud.<logs|metrics|traces>.enabled` to true. The `endpoint`, `username` and `password` settings for your Grafana Cloud logs, metrics and traces instances have to be filled in as well.
Both modes can be enabled at the same time.
Both modes can be enabled at the same time. Cloud mode is preferred.
## Installation
```
helm install -n meta --skip-crds -f values.yaml meta ./charts/meta-monitoring
```
If the platform supports CRDs the `--skip-crds` option can be removed. However the CRDs are not used by this chart.
For more instructions including how to update the chart go to the [installation](docs/installation.md) page.
## Supported features
@@ -50,8 +33,6 @@ For more instructions including how to update the chart go to the [installation]
- Specify PII regexes that are applied to logs before they are sent to Loki (cloud or local). The capture group in the regex is replaced with *****.
- a Grafana instance is installed (when local mode is used) with the relevant datasources installed. The following dashboards are installed:
- logs dashboards
- metrics dashboards
- traces dashboards
- agent dashboards
- Retention is set to 24 hours
@@ -59,7 +40,6 @@ Most of these features are enabled by default. See the values.yaml file for how
## Caveats
- The [loki.source.kubernetes](https://grafana.com/docs/agent/latest/flow/reference/components/loki.source.kubernetes/) component of the Grafana Agent is used to scrape Kubernetes log files. This component is marked experimental at the moment.
- This has not been tested on Openshift yet.
- The underlying Loki, Mimir and Tempo are at the default size installed by the Helm chart. This might need changing when monitoring bigger Loki, Mimir or Tempo installations.
- MinIO is used as storage at the moment with a limited retention. At the moment this chart cannot be used for monitoring over longer periods.

View File

@@ -1,7 +1,7 @@
dependencies:
- name: loki
repository: https://grafana.github.io/helm-charts
version: 6.2.0
version: 6.5.0
- name: alloy
repository: https://grafana.github.io/helm-charts
version: 0.1.1
@@ -10,9 +10,9 @@ dependencies:
version: 5.3.0
- name: tempo-distributed
repository: https://grafana.github.io/helm-charts
version: 1.9.2
version: 1.9.9
- name: minio
repository: https://charts.min.io
version: 5.1.0
digest: sha256:f9a79bfc30df65ba2ff94a097844f6b3aa41970318d5fb1708b3aeecebbe68d1
generated: "2024-04-16T08:10:03.998180905Z"
version: 5.2.0
digest: sha256:5328702b5f6b0487aba8f7bc77d6abfcd5e094569e9205cd725971e3e31255dd
generated: "2024-05-08T07:03:21.797461955Z"

View File

@@ -13,7 +13,7 @@ type: application
# This is the chart version. This version number should be incremented each time you make changes
# to the chart and its templates, including the app version.
# Versions are expected to follow Semantic Versioning (https://semver.org/)
version: 0.0.1
version: 0.0.2
# This is the version number of the application being deployed. This version number should be
# incremented each time you make changes to the application. Versions are not expected to
# follow Semantic Versioning. They should reflect the version the application is using.
@@ -22,7 +22,7 @@ appVersion: "0.0.1"
dependencies:
- name: loki
repository: https://grafana.github.io/helm-charts
version: 6.2.0
version: 6.5.0
condition: local.logs.enabled
- name: alloy
repository: https://grafana.github.io/helm-charts
@@ -33,9 +33,9 @@ dependencies:
condition: local.metrics.enabled
- name: tempo-distributed
repository: https://grafana.github.io/helm-charts
version: 1.9.2
version: 1.9.9
condition: local.traces.enabled
- name: minio
repository: https://charts.min.io
version: 5.1.0
version: 5.2.0
condition: local.minio.enabled

Binary file not shown.

Binary file not shown.

View File

@@ -51,6 +51,7 @@
"overrides": [ ]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -64,7 +65,7 @@
"span": 6,
"targets": [
{
"expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"})",
"expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "series",
"legendLink": null
@@ -98,6 +99,7 @@
"overrides": [ ]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -111,7 +113,7 @@
"span": 6,
"targets": [
{
"expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}) / sum(loki_ingester_memory_streams{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"})",
"expr": "sum(loki_ingester_memory_chunks{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}) / sum(loki_ingester_memory_streams{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "chunks",
"legendLink": null
@@ -157,6 +159,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -171,19 +174,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_utilization_sum{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_utilization_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_utilization_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_utilization_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -235,6 +238,7 @@
"overrides": [ ]
},
"id": 4,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -249,19 +253,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1e3",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_age_seconds_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_age_seconds_sum{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) * 1e3 / sum(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_age_seconds_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1e3 / sum(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -325,6 +329,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -339,19 +344,19 @@
"span": 6,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)) * 1",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_entries_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)) * 1",
"format": "time_series",
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(loki_ingester_chunk_entries_sum{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_entries_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunk_entries_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) * 1 / sum(rate(loki_ingester_chunk_entries_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Average",
"refId": "C"
@@ -403,6 +408,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -416,7 +422,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_chunk_store_index_entries_per_chunk_sum{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m])) / sum(rate(loki_chunk_store_index_entries_per_chunk_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m]))",
"expr": "sum(rate(loki_chunk_store_index_entries_per_chunk_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_chunk_store_index_entries_per_chunk_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "Index Entries",
"legendLink": null
@@ -462,6 +468,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -475,7 +482,7 @@
"span": 6,
"targets": [
{
"expr": "loki_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"} or cortex_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}",
"expr": "loki_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"} or cortex_ingester_flush_queue_length{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -658,6 +665,7 @@
},
"fill": 10,
"id": 8,
"interval": "1m",
"linewidth": 0,
"links": [ ],
"options": {
@@ -673,7 +681,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"expr": "sum by (status) (\n label_replace(label_replace(rate(loki_ingester_chunk_age_seconds_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]),\n \"status\", \"${1}xx\", \"status_code\", \"([0-9])..\"),\n \"status\", \"${1}\", \"status_code\", \"([a-zA-Z]+)\"))\n",
"format": "time_series",
"legendFormat": "{{status}}",
"refId": "A"
@@ -719,6 +727,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -732,7 +741,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -766,6 +775,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -780,7 +790,7 @@
"stack": true,
"targets": [
{
"expr": "sum by (reason) (rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum by (reason) (rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{reason}}",
"legendLink": null
@@ -837,13 +847,14 @@
"hideZeroBuckets": false,
"highlightCards": true,
"id": 11,
"interval": "1m",
"legend": {
"show": true
},
"span": 12,
"targets": [
{
"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval]))",
"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "heatmap",
"intervalFactor": 2,
"legendFormat": "{{le}}",
@@ -899,13 +910,14 @@
"hideZeroBuckets": false,
"highlightCards": true,
"id": 12,
"interval": "1m",
"legend": {
"show": true
},
"span": 12,
"targets": [
{
"expr": "sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[$__rate_interval])) by (le)",
"expr": "sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le)",
"format": "heatmap",
"intervalFactor": 2,
"legendFormat": "{{le}}",
@@ -968,6 +980,7 @@
"overrides": [ ]
},
"id": 13,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -981,19 +994,19 @@
"span": 12,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[1m])) by (le))",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p99",
"legendLink": null
},
{
"expr": "histogram_quantile(0.90, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[1m])) by (le))",
"expr": "histogram_quantile(0.90, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p90",
"legendLink": null
},
{
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[1m])) by (le))",
"expr": "histogram_quantile(0.50, sum(rate(loki_ingester_chunk_size_bytes_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p50",
"legendLink": null
@@ -1039,6 +1052,7 @@
"overrides": [ ]
},
"id": 14,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1052,19 +1066,19 @@
"span": 12,
"targets": [
{
"expr": "histogram_quantile(0.5, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m])) by (le))",
"expr": "histogram_quantile(0.5, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p50",
"legendLink": null
},
{
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m])) by (le))",
"expr": "histogram_quantile(0.99, sum(rate(loki_ingester_chunk_bounds_hours_bucket{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) by (le))",
"format": "time_series",
"legendFormat": "p99",
"legendLink": null
},
{
"expr": "sum(rate(loki_ingester_chunk_bounds_hours_sum{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m])) / sum(rate(loki_ingester_chunk_bounds_hours_count{cluster=\"$cluster\", job=~\"$namespace/(loki|enterprise-logs)-write\"}[5m]))",
"expr": "sum(rate(loki_ingester_chunk_bounds_hours_sum{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval])) / sum(rate(loki_ingester_chunk_bounds_hours_count{cluster=\"$cluster\", job=~\"$namespace/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "avg",
"legendLink": null

View File

@@ -35,6 +35,7 @@
"fill": 1,
"format": "none",
"id": 1,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -110,6 +111,7 @@
"fill": 1,
"format": "dtdurations",
"id": 2,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -213,6 +215,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -260,6 +263,7 @@
"overrides": [ ]
},
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -307,6 +311,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -366,6 +371,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -413,6 +419,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -460,6 +467,7 @@
"overrides": [ ]
},
"id": 8,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -519,6 +527,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -566,6 +575,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -579,7 +589,7 @@
"span": 6,
"targets": [
{
"expr": "sum(rate(loki_compactor_deleted_lines{cluster=~\"$cluster\",job=~\"$namespace/(loki|enterprise-logs)-read\"}[$__rate_interval])) by (user)",
"expr": "sum(rate(loki_compactor_deleted_lines{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval])) by (user)",
"format": "time_series",
"legendFormat": "{{user}}",
"legendLink": null
@@ -603,10 +613,11 @@
{
"datasource": "$loki_datasource",
"id": 11,
"interval": "1m",
"span": 6,
"targets": [
{
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"compactor\"} |~ \"Started processing delete request|delete request for user marked as processed\" | logfmt | line_format \"{{.ts}} user={{.user}} delete_request_id={{.delete_request_id}} msg={{.msg}}\" ",
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"Started processing delete request|delete request for user marked as processed\" | logfmt | line_format \"{{.ts}} user={{.user}} delete_request_id={{.delete_request_id}} msg={{.msg}}\" ",
"refId": "A"
}
],
@@ -616,10 +627,11 @@
{
"datasource": "$loki_datasource",
"id": 12,
"interval": "1m",
"span": 6,
"targets": [
{
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"compactor\"} |~ \"delete request for user added\" | logfmt | line_format \"{{.ts}} user={{.user}} query='{{.query}}'\"",
"expr": "{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*/compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} |~ \"delete request for user added\" | logfmt | line_format \"{{.ts}} user={{.user}} query='{{.query}}'\"",
"refId": "A"
}
],
@@ -701,6 +713,16 @@
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"label": null,
"name": "loki_datasource",
"options": [ ],
"query": "loki",
"refresh": 1,
"regex": "",
"type": "datasource"
}
]
},

View File

@@ -38,6 +38,7 @@
},
"hiddenSeries": false,
"id": 35,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -114,6 +115,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"unit": "s"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -124,6 +130,7 @@
},
"hiddenSeries": false,
"id": 41,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -211,6 +218,7 @@
},
"hiddenSeries": false,
"id": 36,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -236,7 +244,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_cpu_usage_seconds_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"}[5m]))",
"expr": "sum(rate(container_cpu_usage_seconds_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\"}[$__rate_interval]))",
"refId": "A"
}
],
@@ -287,6 +295,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"unit": "bytes"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -297,6 +310,7 @@
},
"hiddenSeries": false,
"id": 40,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -373,6 +387,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"unit": "binBps"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -383,6 +402,7 @@
},
"hiddenSeries": false,
"id": 38,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -408,7 +428,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\"}[5m]))",
"expr": "sum(rate(container_network_transmit_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\"}[$__rate_interval]))",
"refId": "A"
}
],
@@ -459,6 +479,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"unit": "binBps"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -469,6 +494,7 @@
},
"hiddenSeries": false,
"id": 39,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -494,7 +520,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\"}[5m]))",
"expr": "sum(rate(container_network_receive_bytes_total{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\"}[$__rate_interval]))",
"refId": "A"
}
],
@@ -555,6 +581,7 @@
},
"hiddenSeries": false,
"id": 37,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -632,6 +659,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"unit": "ops"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -642,6 +674,7 @@
},
"hiddenSeries": false,
"id": 42,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -667,7 +700,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\", exported_pod=~\"$deployment.*\", exported_pod=~\"$pod\", container=~\"$container\"}[5m])) by (level)",
"expr": "sum(rate(promtail_custom_bad_words_total{cluster=\"$cluster\", exported_namespace=\"$namespace\", exported_pod=~\"$deployment.*\", exported_pod=~\"$pod\", container=~\"$container\"}[$__rate_interval])) by (level)",
"legendFormat": "{{level}}",
"refId": "A"
}
@@ -719,6 +752,11 @@
"dashLength": 10,
"dashes": false,
"datasource": "$loki_datasource",
"fieldConfig": {
"defaults": {
"unit": "ops"
}
},
"fill": 1,
"fillGradient": 0,
"gridPos": {
@@ -729,6 +767,7 @@
},
"hiddenSeries": false,
"id": 31,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -771,7 +810,7 @@
"steppedLine": false,
"targets": [
{
"expr": "sum(rate({cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt| level=\"$level\" |= \"$filter\" [5m])) by (level)",
"expr": "sum(rate({cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"$deployment.*\", pod=~\"$pod\", container=~\"$container\" } |logfmt| level=\"$level\" |= \"$filter\" | __error__=\"\" [$__interval])) by (level)",
"intervalFactor": 3,
"legendFormat": "{{level}}",
"refId": "A"
@@ -827,6 +866,7 @@
"y": 6
},
"id": 29,
"interval": "1m",
"maxDataPoints": "",
"options": {
"showLabels": false,

View File

@@ -1,723 +0,0 @@
{
"annotations": {
"list": [ ]
},
"editable": true,
"fiscalYearStartMonth": 0,
"gnetId": null,
"graphTooltip": 0,
"hideControls": false,
"iteration": 1635347545534,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"loki"
],
"targetBlank": false,
"title": "Loki Dashboards",
"type": "dashboards"
}
],
"liveNow": false,
"panels": [
{
"datasource": "${datasource}",
"fieldConfig": {
"defaults": {
"color": {
"mode": "thresholds"
},
"mappings": [ ],
"noValue": "0",
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 1
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 2,
"x": 0,
"y": 0
},
"id": 2,
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "auto",
"orientation": "auto",
"reduceOptions": {
"calcs": [
"lastNotNull"
],
"fields": "",
"values": false
},
"textMode": "auto"
},
"pluginVersion": "8.3.0-38205pre",
"targets": [
{
"datasource": "${datasource}",
"exemplar": false,
"expr": "sum(loki_ruler_wal_appender_ready) by (pod, tenant) == 0",
"instant": true,
"interval": "",
"legendFormat": "",
"refId": "A"
}
],
"title": "Appenders Not Ready",
"type": "stat"
},
{
"datasource": "${datasource}",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 11,
"x": 2,
"y": 0
},
"id": 4,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_samples_appended_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Samples Appended to WAL per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Series are unique combinations of labels",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 11,
"x": 13,
"y": 0
},
"id": 5,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_storage_created_series_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Series Created per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Difference between highest timestamp appended to WAL and highest timestamp successfully written to remote storage",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 10
},
"id": 6,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "loki_ruler_wal_prometheus_remote_storage_highest_timestamp_in_seconds{tenant=~\"${tenant}\"}\n- on (tenant)\n (\n loki_ruler_wal_prometheus_remote_storage_queue_highest_sent_timestamp_seconds{tenant=~\"${tenant}\"}\n or vector(0)\n )",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Write Behind",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 10
},
"id": 7,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum(rate(loki_ruler_wal_prometheus_remote_storage_samples_total{tenant=~\"${tenant}\"}[$__rate_interval])) by (tenant) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Samples Sent per Second",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "\n",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
},
"unit": "bytes"
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 0,
"y": 20
},
"id": 8,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "sum by (tenant) (loki_ruler_wal_disk_size{tenant=~\"${tenant}\"})",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "WAL Disk Size",
"type": "timeseries"
},
{
"datasource": "${datasource}",
"description": "Some number of pending samples is expected, but if remote-write is failing this value will remain high",
"fieldConfig": {
"defaults": {
"color": {
"mode": "palette-classic"
},
"custom": {
"axisLabel": "",
"axisPlacement": "auto",
"barAlignment": 0,
"drawStyle": "line",
"fillOpacity": 0,
"gradientMode": "none",
"hideFrom": {
"legend": false,
"tooltip": false,
"viz": false
},
"lineInterpolation": "linear",
"lineWidth": 1,
"pointSize": 5,
"scaleDistribution": {
"type": "linear"
},
"showPoints": "auto",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
},
"thresholdsStyle": {
"mode": "off"
}
},
"mappings": [ ],
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 80
}
]
}
},
"overrides": [ ]
},
"gridPos": {
"h": 10,
"w": 12,
"x": 12,
"y": 20
},
"id": 9,
"options": {
"legend": {
"calcs": [ ],
"displayMode": "list",
"placement": "bottom"
},
"tooltip": {
"mode": "single"
}
},
"targets": [
{
"datasource": "${datasource}",
"exemplar": true,
"expr": "max(loki_ruler_wal_prometheus_remote_storage_samples_pending{tenant=~\"${tenant}\"}) by (tenant,pod) > 0",
"interval": "",
"legendFormat": "{{tenant}}",
"refId": "A"
}
],
"title": "Pending Samples",
"type": "timeseries"
}
],
"refresh": "10s",
"rows": [ ],
"schemaVersion": 14,
"style": "dark",
"tags": [
"loki"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(loki_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(loki_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 2,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"label": null,
"name": "loki_datasource",
"options": [ ],
"query": "loki",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": { },
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": null,
"multi": false,
"name": "tenant",
"options": [ ],
"query": "query_result(sum by (id) (grafanacloud_logs_instance_info) and sum(label_replace(loki_tenant:active_streams{cluster=\"$cluster\",namespace=\"$namespace\"},\"id\",\"$1\",\"tenant\",\"(.*)\")) by(id))",
"refresh": 0,
"regex": "/\"([^\"]+)\"/",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Loki / Recording Rules",
"uid": "recording-rules",
"version": 0,
"weekStart": ""
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -90,6 +90,7 @@
]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -103,19 +104,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -191,6 +192,7 @@
]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -204,19 +206,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-read.*\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*compactor.*|(loki|enterprise-logs)-backend.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -253,6 +255,7 @@
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -266,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-read\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -317,6 +320,7 @@
},
"fill": 1,
"id": 4,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -423,6 +427,7 @@
"overrides": [ ]
},
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -482,6 +487,7 @@
"overrides": [ ]
},
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -529,6 +535,7 @@
"overrides": [ ]
},
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -590,6 +597,7 @@
},
"fill": 1,
"id": 8,
"interval": "1m",
"legend": {
"avg": false,
"current": false,
@@ -696,6 +704,7 @@
"overrides": [ ]
},
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -743,6 +752,7 @@
"overrides": [ ]
},
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -802,6 +812,7 @@
"overrides": [ ]
},
"id": 11,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -849,6 +860,7 @@
"overrides": [ ]
},
"id": 12,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -896,6 +908,7 @@
"overrides": [ ]
},
"id": 13,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -956,6 +969,7 @@
},
"format": "short",
"id": 14,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1004,6 +1018,7 @@
"overrides": [ ]
},
"id": 15,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1095,6 +1110,7 @@
},
"format": "short",
"id": 16,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1143,6 +1159,7 @@
"overrides": [ ]
},
"id": 17,
"interval": "1m",
"links": [ ],
"nullPointMode": "null as zero",
"options": {
@@ -1233,6 +1250,7 @@
"overrides": [ ]
},
"id": 18,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1280,6 +1298,7 @@
"overrides": [ ]
},
"id": 19,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1327,6 +1346,7 @@
"overrides": [ ]
},
"id": 20,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -1367,7 +1387,7 @@
"span": 12,
"targets": [
{
"expr": "{cluster=~\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-read\"}",
"expr": "{cluster=~\"$cluster\", job=~\"($namespace)/(.*compactor|(loki|enterprise-logs)-backend.*|loki-single-binary)\"}",
"refId": "A"
}
],

View File

@@ -22,6 +22,273 @@
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"custom": {
"drawStyle": "line",
"fillOpacity": 10,
"lineWidth": 1,
"pointSize": 5,
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
}
},
"thresholds": {
"mode": "absolute",
"steps": [ ]
},
"unit": "short"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "request"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FFC000",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 0
}
]
},
{
"matcher": {
"id": "byName",
"options": "limit"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#E02F44",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 0
}
]
}
]
},
"id": 1,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
}
],
"title": "CPU",
"tooltip": {
"sort": 2
},
"type": "timeseries"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"custom": {
"drawStyle": "line",
"fillOpacity": 10,
"lineWidth": 1,
"pointSize": 5,
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
}
},
"thresholds": {
"mode": "absolute",
"steps": [ ]
},
"unit": "bytes"
},
"overrides": [
{
"matcher": {
"id": "byName",
"options": "request"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#FFC000",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 0
}
]
},
{
"matcher": {
"id": "byName",
"options": "limit"
},
"properties": [
{
"id": "color",
"value": {
"fixedColor": "#E02F44",
"mode": "fixed"
}
},
{
"id": "custom.fillOpacity",
"value": 0
}
]
}
]
},
"id": 2,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
}
],
"title": "Memory (workingset)",
"tooltip": {
"sort": 2
},
"type": "timeseries"
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"custom": {
"drawStyle": "line",
"fillOpacity": 10,
"lineWidth": 1,
"pointSize": 5,
"showPoints": "never",
"spanNulls": false,
"stacking": {
"group": "A",
"mode": "none"
}
},
"thresholds": {
"mode": "absolute",
"steps": [ ]
},
"unit": "bytes"
},
"overrides": [ ]
},
"id": 3,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"title": "Memory (go heap inuse)",
"tooltip": {
"sort": 2
},
"type": "timeseries"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Distributor",
"titleSize": "h6"
},
{
"collapse": false,
"collapsed": false,
@@ -51,7 +318,8 @@
"overrides": [ ]
},
"gridPos": { },
"id": 1,
"id": 4,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -64,7 +332,7 @@
},
"targets": [
{
"expr": "sum by(pod) (loki_ingester_memory_streams{cluster=~\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-write\"})",
"expr": "sum by(pod) (loki_ingester_memory_streams{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -140,7 +408,8 @@
]
},
"gridPos": { },
"id": 2,
"id": 5,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -153,19 +422,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -241,7 +510,8 @@
]
},
"gridPos": { },
"id": 3,
"id": 6,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -254,19 +524,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -303,7 +573,8 @@
"overrides": [ ]
},
"gridPos": { },
"id": 4,
"id": 7,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -316,7 +587,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(loki|enterprise-logs)-write\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -353,7 +624,8 @@
"overrides": [ ]
},
"gridPos": { },
"id": 5,
"id": 8,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -366,7 +638,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -400,7 +672,8 @@
"overrides": [ ]
},
"gridPos": { },
"id": 6,
"id": 9,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -413,7 +686,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"loki\", pod=~\"(loki|enterprise-logs)-write.*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"loki|ingester\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary)\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -447,7 +720,8 @@
"overrides": [ ]
},
"gridPos": { },
"id": 7,
"id": 10,
"interval": "1m",
"links": [ ],
"options": {
"legend": {
@@ -460,7 +734,7 @@
},
"targets": [
{
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(loki|enterprise-logs)-write.*\"})",
"expr": "max by(persistentvolumeclaim) (kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} / kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}) and count by(persistentvolumeclaim) (kube_persistentvolumeclaim_labels{cluster=~\"$cluster\", namespace=~\"$namespace\",label_name=~\"(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary).*\"})",
"format": "time_series",
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
@@ -474,7 +748,7 @@
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Write path",
"title": "Ingester",
"titleSize": "h6",
"type": "row"
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,836 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\",resource=\"cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "CPU",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (workingset)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"alertmanager\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (go heap inuse)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Alertmanager",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_receive_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?alertmanager.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Receive bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_transmit_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?alertmanager.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Transmit bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk writes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_read_bytes_total[$__rate_interval]\n )\n) + ignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"alertmanager\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk reads",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=~\"(alertmanager).*\"\n }\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk space utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Alertmanager resources",
"uid": "a6883fb22799ac74479c7db872451092",
"version": 0
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,940 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"cpu\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "CPU",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (go heap inuse)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "CPU and memory",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_rss{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (RSS)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 4,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [
{
"alias": "request",
"color": "#FFC000",
"dashLength": 5,
"dashes": true,
"fill": 0
},
{
"alias": "limit",
"color": "#E02F44",
"dashLength": 5,
"dashes": true,
"fill": 0
}
],
"spaceLength": 10,
"span": 6,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\"} > 0)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "limit",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\",container=~\"compactor\",resource=\"memory\"})",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "request",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Memory (workingset)",
"tooltip": {
"sort": 2
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "bytes",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_receive_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?compactor.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Receive bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(pod) (rate(container_network_transmit_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\",pod=~\"(.*mimir-)?compactor.*\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Transmit bandwidth",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Network",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_written_bytes_total[$__rate_interval]\n )\n)\n+\nignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"compactor\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk writes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(instance, pod, device) (\n rate(\n node_disk_read_bytes_total[$__rate_interval]\n )\n) + ignoring(pod) group_right() (\n label_replace(\n count by(\n instance,\n pod,\n device\n )\n (\n container_fs_writes_bytes_total{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n container=~\"compactor\",\n device!~\".*sda.*\"\n }\n ),\n \"device\",\n \"$1\",\n \"device\",\n \"/dev/(.*)\"\n ) * 0\n)\n\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk reads",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "Bps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 0,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "max by(persistentvolumeclaim) (\n kubelet_volume_stats_used_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"} /\n kubelet_volume_stats_capacity_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\nand\ncount by(persistentvolumeclaim) (\n kube_persistentvolumeclaim_labels{\n cluster=~\"$cluster\", namespace=~\"$namespace\",\n label_name=~\"(compactor).*\"\n }\n)\n",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{persistentvolumeclaim}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Disk space utilization",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "percentunit",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Disk",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Compactor resources",
"uid": "09a5c49e9cdb2f2b24c6d184574a07fd",
"version": 0
}

File diff suppressed because it is too large Load Diff

View File

@@ -1,312 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "count(cortex_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}) by (sha256)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sha256:{{sha256}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Startup config file hashes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "instances",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Startup config file",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 12,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "count(cortex_runtime_config_hash{cluster=~\"$cluster\", namespace=~\"$namespace\"}) by (sha256)",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "sha256:{{sha256}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Runtime config file hashes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "instances",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Runtime config file",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Config",
"uid": "5d9d0b4724c0f80d68467088ec61e003",
"version": 0
}

View File

@@ -1,938 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 1,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(component) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{component}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "RPS / component",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"max": 1,
"min": 0,
"noValue": "0",
"unit": "percentunit"
}
},
"id": 2,
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 6,
"targets": [
{
"expr": "sum by(component) (rate(thanos_objstore_bucket_operation_failures_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) / sum by(component) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) >= 0",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{component}}",
"legendLink": null
}
],
"title": "Error rate / component",
"type": "timeseries"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Components",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 10,
"id": 3,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 0,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 6,
"stack": true,
"steppedLine": false,
"targets": [
{
"expr": "sum by(operation) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation}}",
"legendLink": null
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "RPS / operation",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "reqps",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"datasource": "$datasource",
"fieldConfig": {
"defaults": {
"max": 1,
"min": 0,
"noValue": "0",
"unit": "percentunit"
}
},
"id": 4,
"links": [ ],
"options": {
"legend": {
"showLegend": true
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"span": 6,
"targets": [
{
"expr": "sum by(operation) (rate(thanos_objstore_bucket_operation_failures_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) / sum by(operation) (rate(thanos_objstore_bucket_operations_total{cluster=~\"$cluster\", namespace=~\"$namespace\"}[$__rate_interval])) >= 0",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "{{operation}}",
"legendLink": null
}
],
"title": "Error rate / operation",
"type": "timeseries"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Operations",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 5,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Get",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 6,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"get_range\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: GetRange",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 7,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"exists\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Exists",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 8,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"attributes\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Attributes",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 9,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"upload\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Upload",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
},
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 10,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"spaceLength": 10,
"span": 4,
"stack": false,
"steppedLine": false,
"targets": [
{
"expr": "histogram_quantile(0.99, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "99th Percentile",
"refId": "A"
},
{
"expr": "histogram_quantile(0.50, sum(rate(thanos_objstore_bucket_operation_duration_seconds_bucket{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) by (le)) * 1e3",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "50th Percentile",
"refId": "B"
},
{
"expr": "sum(rate(thanos_objstore_bucket_operation_duration_seconds_sum{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval])) * 1e3 / sum(rate(thanos_objstore_bucket_operation_duration_seconds_count{cluster=~\"$cluster\", namespace=~\"$namespace\",operation=\"delete\"}[$__rate_interval]))",
"format": "time_series",
"intervalFactor": 2,
"legendFormat": "Average",
"refId": "C"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Op: Delete",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"type": "graph",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "ms",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Object Store",
"uid": "e1324ee2a434f4158c00a9ee279d3292",
"version": 0
}

View File

@@ -1,266 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${datasource}",
"id": 1,
"span": 12,
"targets": [
{
"expr": "max by(limit_name) (cortex_limits_defaults{cluster=~\"$cluster\",namespace=~\"$namespace\"})",
"instant": true,
"legendFormat": "",
"refId": "A"
}
],
"title": "Defaults",
"transformations": [
{
"id": "labelsToFields",
"options": { }
},
{
"id": "merge",
"options": { }
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"indexByName": {
"Value": 1,
"limit_name": 0
}
}
},
{
"id": "sortBy",
"options": {
"fields": { },
"sort": [
{
"field": "limit_name"
}
]
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
},
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${datasource}",
"id": 2,
"span": 12,
"targets": [
{
"expr": "max by(user, limit_name) (cortex_limits_overrides{cluster=~\"$cluster\",namespace=~\"$namespace\",user=~\"${tenant_id}\"})",
"instant": true,
"legendFormat": "",
"refId": "A"
}
],
"title": "Per-tenant overrides",
"transformations": [
{
"id": "labelsToFields",
"options": {
"mode": "columns",
"valueLabel": "limit_name"
}
},
{
"id": "merge",
"options": { }
},
{
"id": "organize",
"options": {
"excludeByName": {
"Time": true
},
"indexByName": {
"user": 0
}
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"current": {
"selected": true,
"text": ".*",
"value": ".*"
},
"hide": 0,
"label": "Tenant ID",
"name": "tenant_id",
"options": [
{
"selected": true,
"text": ".*",
"value": ".*"
}
],
"query": ".*",
"type": "textbox"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Overrides",
"uid": "1e2c358600ac53f09faea133f811b5bb",
"version": 0
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,362 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "10s",
"rows": [
{
"collapse": false,
"height": "200px",
"panels": [
{
"id": 1,
"options": {
"content": "This dashboard identifies scaling-related issues by suggesting services that you might want to scale up.\nThe table that follows contains a suggested number of replicas and the reason why.\nIf the system is failing and depending on the reason, try scaling up to the specified number.\nThe specified numbers are intended as helpful guidelines when things go wrong, rather than prescriptive guidelines.\n\nReasons:\n- **sample_rate**: There are not enough replicas to handle the\n sample rate. Applies to distributor and ingesters.\n- **active_series**: There are not enough replicas\n to handle the number of active series. Applies to ingesters.\n- **cpu_usage**: There are not enough replicas\n based on the CPU usage of the jobs vs the resource requests.\n Applies to all jobs.\n- **memory_usage**: There are not enough replicas based on the memory\n usage vs the resource requests. Applies to all jobs.\n- **active_series_limits**: There are not enough replicas to hold 60% of the\n sum of all the per tenant series limits.\n- **sample_rate_limits**: There are not enough replicas to handle 60% of the\n sum of all the per tenant rate limits.\n",
"mode": "markdown"
},
"span": 12,
"title": "",
"type": "text"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Service scaling",
"titleSize": "h6"
},
{
"collapse": false,
"height": "400px",
"panels": [
{
"aliasColors": { },
"bars": false,
"dashLength": 10,
"dashes": false,
"datasource": "$datasource",
"fill": 1,
"id": 2,
"legend": {
"avg": false,
"current": false,
"max": false,
"min": false,
"show": true,
"total": false,
"values": false
},
"lines": true,
"linewidth": 1,
"links": [ ],
"nullPointMode": "null as zero",
"percentage": false,
"pointradius": 5,
"points": false,
"renderer": "flot",
"seriesOverrides": [ ],
"sort": {
"col": 0,
"desc": false
},
"spaceLength": 10,
"span": 12,
"stack": false,
"steppedLine": false,
"styles": [
{
"alias": "Time",
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"pattern": "Time",
"type": "hidden"
},
{
"alias": "Required Replicas",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 0,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "Value",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Cluster",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "__name__",
"thresholds": [ ],
"type": "hidden",
"unit": "short"
},
{
"alias": "Cluster",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "cluster",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Service",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "deployment",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Namespace",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "namespace",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "Reason",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"link": false,
"linkTargetBlank": false,
"linkTooltip": "Drill down",
"linkUrl": "",
"pattern": "reason",
"thresholds": [ ],
"type": "number",
"unit": "short"
},
{
"alias": "",
"colorMode": null,
"colors": [ ],
"dateFormat": "YYYY-MM-DD HH:mm:ss",
"decimals": 2,
"pattern": "/.*/",
"thresholds": [ ],
"type": "string",
"unit": "short"
}
],
"targets": [
{
"expr": "sort_desc(\n cluster_namespace_deployment_reason:required_replicas:count{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n > ignoring(reason) group_left\n cluster_namespace_deployment:actual_replicas:count{cluster=~\"$cluster\", namespace=~\"$namespace\"}\n)\n",
"format": "table",
"instant": true,
"intervalFactor": 2,
"legendFormat": "",
"refId": "A"
}
],
"thresholds": [ ],
"timeFrom": null,
"timeShift": null,
"title": "Workload-based scaling",
"tooltip": {
"shared": false,
"sort": 0,
"value_type": "individual"
},
"transform": "table",
"type": "table",
"xaxis": {
"buckets": null,
"mode": "time",
"name": null,
"show": true,
"values": [ ]
},
"yaxes": [
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": 0,
"show": true
},
{
"format": "short",
"label": null,
"logBase": 1,
"max": null,
"min": null,
"show": false
}
]
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "Scaling",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": true,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": ".+",
"current": {
"selected": true,
"text": "All",
"value": "$__all"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "namespace",
"multi": true,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Scaling",
"uid": "64bbad83507b7289b514725658e10352",
"version": 0
}

View File

@@ -1,323 +0,0 @@
{
"__requires": [
{
"id": "grafana",
"name": "Grafana",
"type": "grafana",
"version": "8.0.0"
}
],
"annotations": {
"list": [ ]
},
"editable": true,
"gnetId": null,
"graphTooltip": 1,
"hideControls": false,
"links": [
{
"asDropdown": true,
"icon": "external link",
"includeVars": true,
"keepTime": true,
"tags": [
"mimir"
],
"targetBlank": false,
"title": "Mimir dashboards",
"type": "dashboards"
}
],
"refresh": "",
"rows": [
{
"collapse": false,
"height": "250px",
"panels": [
{
"datasource": "${lokidatasource}",
"fieldConfig": {
"overrides": [
{
"matcher": {
"id": "byName",
"options": "Time range"
},
"properties": [
{
"id": "mappings",
"value": [
{
"from": "",
"id": 1,
"text": "Instant query",
"to": "",
"type": 1,
"value": "0"
}
]
},
{
"id": "unit",
"value": "s"
}
]
},
{
"matcher": {
"id": "byName",
"options": "Step"
},
"properties": [
{
"id": "unit",
"value": "s"
}
]
}
]
},
"id": 1,
"span": 12,
"targets": [
{
"expr": "{cluster=~\"$cluster\",namespace=~\"$namespace\",name=~\"query-frontend.*\"} |= \"query stats\" != \"/api/v1/read\" | logfmt | user=~\"${tenant_id}\" | response_time > ${min_duration}",
"instant": false,
"legendFormat": "",
"range": true,
"refId": "A"
}
],
"title": "Slow queries",
"transformations": [
{
"id": "extractFields",
"options": {
"source": "labels"
}
},
{
"id": "calculateField",
"options": {
"alias": "Time range",
"binary": {
"left": "param_end",
"operator": "-",
"reducer": "sum",
"right": "param_start"
},
"mode": "binary",
"reduce": {
"reducer": "sum"
},
"replaceFields": false
}
},
{
"id": "organize",
"options": {
"excludeByName": {
"Line": true,
"Time": true,
"caller": true,
"cluster": true,
"container": true,
"host": true,
"id": true,
"job": true,
"labels": true,
"level": true,
"line": true,
"method": true,
"msg": true,
"name": true,
"namespace": true,
"param_end": true,
"param_start": true,
"param_time": true,
"path": true,
"pod": true,
"pod_template_hash": true,
"query_wall_time_seconds": true,
"stream": true,
"traceID": true,
"tsNs": true
},
"indexByName": {
"Time range": 3,
"param_query": 2,
"param_step": 4,
"response_time": 5,
"ts": 0,
"user": 1
},
"renameByName": {
"org_id": "Tenant ID",
"param_query": "Query",
"param_step": "Step",
"response_time": "Duration"
}
}
}
],
"type": "table"
}
],
"repeat": null,
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "",
"titleSize": "h6"
}
],
"schemaVersion": 14,
"style": "dark",
"tags": [
"mimir"
],
"templating": {
"list": [
{
"current": {
"text": "default",
"value": "default"
},
"hide": 0,
"label": "Data Source",
"name": "datasource",
"options": [ ],
"query": "prometheus",
"refresh": 1,
"regex": "",
"type": "datasource"
},
{
"allValue": ".*",
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": true,
"label": "cluster",
"multi": false,
"name": "cluster",
"options": [ ],
"query": "label_values(cortex_build_info, cluster)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"allValue": null,
"current": {
"text": "prod",
"value": "prod"
},
"datasource": "$datasource",
"hide": 0,
"includeAll": false,
"label": "namespace",
"multi": false,
"name": "namespace",
"options": [ ],
"query": "label_values(cortex_build_info{cluster=~\"$cluster\"}, namespace)",
"refresh": 1,
"regex": "",
"sort": 1,
"tagValuesQuery": "",
"tags": [ ],
"tagsQuery": "",
"type": "query",
"useTags": false
},
{
"hide": 0,
"includeAll": false,
"label": "Logs datasource",
"multi": false,
"name": "lokidatasource",
"query": "loki",
"type": "datasource"
},
{
"current": {
"selected": true,
"text": "5s",
"value": "5s"
},
"hide": 0,
"label": "Min duration",
"name": "min_duration",
"options": [
{
"selected": true,
"text": "5s",
"value": "5s"
}
],
"query": "5s",
"type": "textbox"
},
{
"current": {
"selected": true,
"text": ".*",
"value": ".*"
},
"hide": 0,
"label": "Tenant ID",
"name": "tenant_id",
"options": [
{
"selected": true,
"text": ".*",
"value": ".*"
}
],
"query": ".*",
"type": "textbox"
}
]
},
"time": {
"from": "now-1h",
"to": "now"
},
"timepicker": {
"refresh_intervals": [
"5s",
"10s",
"30s",
"1m",
"5m",
"15m",
"30m",
"1h",
"2h",
"1d"
],
"time_options": [
"5m",
"15m",
"1h",
"6h",
"12h",
"24h",
"2d",
"7d",
"30d"
]
},
"timezone": "utc",
"title": "Mimir / Slow queries",
"uid": "6089e1ce1e678788f46312a0a1e647e6",
"version": 0
}

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

File diff suppressed because it is too large Load Diff

View File

@@ -1,4 +1,3 @@
groups:
- name: "loki_rules"
rules:
- expr: "histogram_quantile(0.99, sum(rate(loki_request_duration_seconds_bucket[5m]))
@@ -50,4 +49,4 @@ groups:
record: "cluster_namespace_job_route:loki_request_duration_seconds_sum:sum_rate"
- expr: "sum(rate(loki_request_duration_seconds_count[5m])) by (cluster, namespace,
job, route)"
record: "cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate"
record: "cluster_namespace_job_route:loki_request_duration_seconds_count:sum_rate"

View File

@@ -6,6 +6,15 @@
{{- join ", " $list }}
{{- end }}
{{- define "agent.all_namespaces" -}}
{{- $list := list }}
{{- range .Values.namespacesToMonitor }}
{{- $list = append $list (printf "%s" .) }}
{{- end }}
{{- $list = append $list .Release.Namespace }}
{{- join "|" $list }}
{{- end }}
{{- define "agent.loki_write_targets" -}}
{{- $list := list }}
{{- if .Values.local.logs.enabled }}
@@ -39,10 +48,32 @@
{{- define "agent.tempo_write_targets" -}}
{{- $list := list }}
{{- if .Values.local.traces.enabled }}
{{- $list = append $list ("otelcol.exporter.otlp.local.input") }}
{{- $list = append $list ("otelcol.exporter.otlphttp.local.input") }}
{{- end }}
{{- if .Values.cloud.traces.enabled }}
{{- $list = append $list ("otelcol.exporter.otlp.cloud.input") }}
{{- $list = append $list ("otelcol.exporter.otlphttp.cloud.input") }}
{{- end }}
{{- join ", " $list }}
{{- end }}
{{- define "agent.all_logs" -}}
{{- $list := list }}
{{- range .Values.logs.retain }}
{{- $list = append $list . }}
{{- end }}
{{- range .Values.logs.extraLogs }}
{{- $list = append $list . }}
{{- end }}
{{- join "|" $list }}
{{- end }}
{{- define "agent.all_metrics" -}}
{{- $list := list }}
{{- range .Values.metrics.retain }}
{{- $list = append $list . }}
{{- end }}
{{- range .Values.metrics.extraMetrics }}
{{- $list = append $list . }}
{{- end }}
{{- join "|" $list }}
{{- end }}

View File

@@ -40,10 +40,12 @@ data:
{{- if or .Values.local.logs.enabled .Values.cloud.logs.enabled }}
// Logs
{{- if .Values.cloud.logs.enabled }}
remote.kubernetes.secret "logs_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.logs.secret -}}"
}
{{- end }}
loki.source.kubernetes "pods" {
clustering {
@@ -57,9 +59,9 @@ data:
loki.process "filter" {
forward_to = [ {{ include "agent.loki_write_targets" . }} ]
{{- if not (empty .Values.logs.retain) }}
{{- if or (not (empty .Values.logs.retain)) (not (empty .Values.logs.extraLogs)) }}
stage.match {
selector = "{cluster=\"{{- .Values.clusterLabelValue -}}\", namespace=~\"{{- join "|" .Values.namespacesToMonitor -}}|{{- $.Release.Namespace -}}\", pod=~\"loki.*\"} !~ \"{{ join "|" .Values.logs.retain }}\""
selector = "{cluster=\"{{- .Values.clusterLabelValue -}}\", namespace=~\"{{- join "|" .Values.namespacesToMonitor -}}|{{- $.Release.Namespace -}}\", pod=~\"loki.*\"} !~ \"{{ include "agent.all_logs" . }}\""
action = "drop"
}
{{- end }}
@@ -80,10 +82,12 @@ data:
{{- if or .Values.local.metrics.enabled .Values.cloud.metrics.enabled }}
// Metrics
{{- if .Values.cloud.metrics.enabled }}
remote.kubernetes.secret "metrics_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.metrics.secret -}}"
}
{{- end }}
discovery.kubernetes "metric_pods" {
role = "pod"
@@ -133,7 +137,14 @@ data:
prometheus.relabel "filter" {
rule {
source_labels = ["__name__"]
regex = "({{ join "|" .Values.metrics.retain }})"
regex = "({{ include "agent.all_metrics" . }})"
action = "keep"
}
rule {
source_labels = ["namespace"]
regex = "{{ include "agent.all_namespaces" . }}"
action = "keep"
}
@@ -154,6 +165,10 @@ data:
// Based on https://github.com/Chewie/loutretelecom-manifests/blob/main/manifests/addons/monitoring/config.river
discovery.kubernetes "all_nodes" {
role = "node"
namespaces {
own_namespace = true
names = [ {{ include "agent.namespaces" . }} ]
}
}
discovery.relabel "all_nodes" {
@@ -267,24 +282,34 @@ data:
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }}
// Traces
{{- if .Values.cloud.traces.enabled }}
remote.kubernetes.secret "traces_credentials" {
namespace = "{{- $.Release.Namespace -}}"
name = "{{- .Values.cloud.traces.secret -}}"
}
{{- end }}
// Shamelessly copied from https://github.com/grafana/intro-to-mlt/blob/main/agent/config.river
otelcol.receiver.otlp "otlp_receiver" {
// We don't technically need this, but it shows how to change listen address and incoming port.
// In this case, the Agent is listening on all available bindable addresses on port 4317 (which is the
// default OTLP gRPC port) for the OTLP protocol.
grpc {
endpoint = "0.0.0.0:4317"
}
grpc {}
// We define where to send the output of all ingested traces. In this case, to the OpenTelemetry batch processor
// named 'default'.
output {
traces = [otelcol.processor.batch.default.input]
traces = [otelcol.processor.batch.default.input]
}
}
otelcol.receiver.jaeger "jaeger" {
protocols {
thrift_http {}
}
output {
traces = [otelcol.processor.batch.default.input]
}
}
@@ -305,7 +330,7 @@ data:
{{- if .Values.local.logs.enabled }}
loki.write "local" {
endpoint {
url = "http://loki-gateway.{{- .Release.Namespace -}}.svc.cluster.local:80/loki/api/v1/push"
url = "http://{{- .Release.Namespace -}}-loki-gateway.{{- .Release.Namespace -}}.svc.cluster.local:80/loki/api/v1/push"
}
}
{{- end }}
@@ -318,21 +343,10 @@ data:
}
{{- end }}
{{- if or .Values.local.traces.enabled .Values.cloud.traces.enabled }}
// The OpenTelemetry exporter exports processed trace spans to another target that is listening for OTLP format traces.
// A unique label, 'local', is added to uniquely identify this exporter.
otelcol.exporter.otlp "local" {
// Define the client for exporting.
{{- if .Values.local.traces.enabled }}
otelcol.exporter.otlphttp "local" {
client {
// Send to the locally running Tempo instance, on port 4317 (OTLP gRPC).
endpoint = "meta-tempo-distributor:4317"
// Configure TLS settings for communicating with the endpoint.
tls {
// The connection is insecure.
insecure = true
// Do not verify TLS certificates when connecting.
insecure_skip_verify = true
}
endpoint = "http://{{- .Release.Name -}}-tempo-distributor.svc:4318"
}
}
{{- end }}
@@ -362,7 +376,7 @@ data:
{{- end }}
{{- if .Values.cloud.traces.enabled }}
otelcol.exporter.otlp "cloud" {
otelcol.exporter.otlphttp "cloud" {
client {
endpoint = nonsensitive(remote.kubernetes.secret.traces_credentials.data["endpoint"])
auth = otelcol.auth.basic.creds.handler

View File

@@ -1,4 +1,4 @@
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap

View File

@@ -1,4 +1,4 @@
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap
@@ -27,58 +27,6 @@ data:
path: /var/lib/grafana/dashboards/loki-2
orgId: 1
type: file
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-1
options:
path: /var/lib/grafana/dashboards/mimir-1
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-2
options:
path: /var/lib/grafana/dashboards/mimir-2
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-3
options:
path: /var/lib/grafana/dashboards/mimir-3
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-4
options:
path: /var/lib/grafana/dashboards/mimir-4
orgId: 1
type: file
- disableDeletion: true
editable: false
folder: Mimir
name: mimir-5
options:
path: /var/lib/grafana/dashboards/mimir-5
orgId: 1
type: file
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- disableDeletion: true
editable: false
folder: Tempo
name: tempo-1
options:
path: /var/lib/grafana/dashboards/tempo-1
orgId: 1
type: file
{{- end }}
- disableDeletion: true
editable: false

View File

@@ -12,7 +12,7 @@ data:
# List of data sources to delete from the database.
deleteDatasources:
- name: Loki
- name: Loki
orgId: 1
# List of data sources to insert/update depending on what's
@@ -32,7 +32,7 @@ data:
uid: loki_ds
# <string> Sets the data source's URL, including the
# port.
url: http://loki-gateway.{{- $.Release.Namespace -}}.svc.cluster.local
url: http://{{- $.Release.Namespace -}}-loki-gateway.{{- $.Release.Namespace -}}.svc.cluster.local
# <bool> Toggles whether the data source is pre-selected
# for new panels. You can set only one default
# data source per organization.
@@ -61,6 +61,10 @@ data:
# <bool> Allows users to edit data sources from the
# Grafana UI.
editable: true
# Extra config.
jsonData:
# Scrape interval
timeInterval: 1m
{{- end }}
{{- if .Values.local.traces.enabled }}
- name: Tempo

View File

@@ -1,16 +1,4 @@
{{- if .Values.local.grafana.enabled }}
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
---
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -32,7 +20,7 @@ spec:
- 0
containers:
- name: grafana
image: grafana/grafana:10.0.0
image: grafana/grafana:{{- .Values.grafana.version }}
imagePullPolicy: IfNotPresent
ports:
- containerPort: 3000
@@ -65,7 +53,7 @@ spec:
name: grafana-pv
- mountPath: /etc/grafana/provisioning/datasources
name: datasources-provisioning
{{- if or (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled) .Values.dashboards.traces.enabled }}
{{- if .Values.dashboards.logs.enabled }}
- mountPath: /etc/grafana/provisioning/dashboards
name: dashboards-provisioning
{{- end }}
@@ -75,22 +63,6 @@ spec:
- mountPath: /var/lib/grafana/dashboards/loki-2
name: loki-dashboards-2
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- mountPath: /var/lib/grafana/dashboards/mimir-1
name: mimir-dashboards-1
- mountPath: /var/lib/grafana/dashboards/mimir-2
name: mimir-dashboards-2
- mountPath: /var/lib/grafana/dashboards/mimir-3
name: mimir-dashboards-3
- mountPath: /var/lib/grafana/dashboards/mimir-4
name: mimir-dashboards-4
- mountPath: /var/lib/grafana/dashboards/mimir-5
name: mimir-dashboards-5
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- mountPath: /var/lib/grafana/dashboards/tempo-1
name: tempo-dashboards-1
{{- end }}
- mountPath: /var/lib/grafana/dashboards/agent-1
name: agent-dashboards-1
volumes:
@@ -111,44 +83,7 @@ spec:
configMap:
name: loki-dashboards-2
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
- name: mimir-dashboards-1
configMap:
name: mimir-dashboards-1
- name: mimir-dashboards-2
configMap:
name: mimir-dashboards-2
- name: mimir-dashboards-3
configMap:
name: mimir-dashboards-3
- name: mimir-dashboards-4
configMap:
name: mimir-dashboards-4
- name: mimir-dashboards-5
configMap:
name: mimir-dashboards-5
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
- name: tempo-dashboards-1
configMap:
name: tempo-dashboards-1
{{- end }}
- name: agent-dashboards-1
configMap:
name: agent-dashboards-1
---
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
sessionAffinity: None
type: ClusterIP # Make this configurable
{{- end }}

View File

@@ -0,0 +1,12 @@
{{- if .Values.local.grafana.enabled }}
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: grafana-pvc
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 1Gi
{{- end }}

View File

@@ -0,0 +1,15 @@
{{- if .Values.local.grafana.enabled }}
apiVersion: v1
kind: Service
metadata:
name: grafana
spec:
ports:
- port: 3000
protocol: TCP
targetPort: http-grafana
selector:
app: grafana
sessionAffinity: None
type: ClusterIP # Make this configurable
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"mimir-alertmanager-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-alertmanager-resources.json" | fromJson | toJson }}
"mimir-alertmanager.json": |
{{ $.Files.Get "src/dashboards/mimir-alertmanager.json" | fromJson | toJson }}
"mimir-compactor-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-compactor-resources.json" | fromJson | toJson }}
"mimir-compactor.json": |
{{ $.Files.Get "src/dashboards/mimir-compactor.json" | fromJson | toJson }}
"mimir-config.json": |
{{ $.Files.Get "src/dashboards/mimir-config.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-2
namespace: {{ $.Release.Namespace }}
data:
"mimir-object-store.json": |
{{ $.Files.Get "src/dashboards/mimir-object-store.json" | fromJson | toJson }}
"mimir-overrides.json": |
{{ $.Files.Get "src/dashboards/mimir-overrides.json" | fromJson | toJson }}
"mimir-overview-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-overview-networking.json" | fromJson | toJson }}
"mimir-overview-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-overview-resources.json" | fromJson | toJson }}
"mimir-overview.json": |
{{ $.Files.Get "src/dashboards/mimir-overview.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-3
namespace: {{ $.Release.Namespace }}
data:
"mimir-queries.json": |
{{ $.Files.Get "src/dashboards/mimir-queries.json" | fromJson | toJson }}
"mimir-reads-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-reads-networking.json" | fromJson | toJson }}
"mimir-reads-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-reads-resources.json" | fromJson | toJson }}
"mimir-reads.json": |
{{ $.Files.Get "src/dashboards/mimir-reads.json" | fromJson | toJson }}
"mimir-remote-ruler-reads-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-remote-ruler-reads-resources.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-4
namespace: {{ $.Release.Namespace }}
data:
"mimir-remote-ruler-reads.json": |
{{ $.Files.Get "src/dashboards/mimir-remote-ruler-reads.json" | fromJson | toJson }}
"mimir-rollout-progress.json": |
{{ $.Files.Get "src/dashboards/mimir-rollout-progress.json" | fromJson | toJson }}
"mimir-ruler.json": |
{{ $.Files.Get "src/dashboards/mimir-ruler.json" | fromJson | toJson }}
"mimir-scaling.json": |
{{ $.Files.Get "src/dashboards/mimir-scaling.json" | fromJson | toJson }}
"mimir-slow-queries.json": |
{{ $.Files.Get "src/dashboards/mimir-slow-queries.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,19 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.metrics.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: mimir-dashboards-5
namespace: {{ $.Release.Namespace }}
data:
"mimir-tenants.json": |
{{ $.Files.Get "src/dashboards/mimir-tenants.json" | fromJson | toJson }}
"mimir-top-tenants.json": |
{{ $.Files.Get "src/dashboards/mimir-top-tenants.json" | fromJson | toJson }}
"mimir-writes-networking.json": |
{{ $.Files.Get "src/dashboards/mimir-writes-networking.json" | fromJson | toJson }}
"mimir-writes-resources.json": |
{{ $.Files.Get "src/dashboards/mimir-writes-resources.json" | fromJson | toJson }}
"mimir-writes.json": |
{{ $.Files.Get "src/dashboards/mimir-writes.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,21 +0,0 @@
{{- if and .Values.local.grafana.enabled .Values.dashboards.traces.enabled }}
---
apiVersion: v1
kind: ConfigMap
metadata:
name: tempo-dashboards-1
namespace: {{ $.Release.Namespace }}
data:
"tempo-operational.json": |
{{ $.Files.Get "src/dashboards/tempo-operational.json" | fromJson | toJson }}
"tempo-reads.json": |
{{ $.Files.Get "src/dashboards/tempo-reads.json" | fromJson | toJson }}
"tempo-resources.json": |
{{ $.Files.Get "src/dashboards/tempo-resources.json" | fromJson | toJson }}
"tempo-rollout-progress.json": |
{{ $.Files.Get "src/dashboards/tempo-rollout-progress.json" | fromJson | toJson }}
"tempo-tenants.json": |
{{ $.Files.Get "src/dashboards/tempo-tenants.json" | fromJson | toJson }}
"tempo-writes.json": |
{{ $.Files.Get "src/dashboards/tempo-writes.json" | fromJson | toJson }}
{{- end }}

View File

@@ -1,5 +1,5 @@
{{- if .Values.local.grafana.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
apiVersion: apps/v1
kind: Deployment
metadata:
@@ -49,6 +49,9 @@ spec:
- containerPort: 7946
name: memberlist
protocol: TCP
envFrom:
- secretRef:
name: minio
readinessProbe:
failureThreshold: 3
httpGet:

View File

@@ -1,5 +1,5 @@
{{- if .Values.local.metrics.enabled }}
{{- if and .Values.local.grafana.enabled (or .Values.dashboards.logs.enabled .Values.dashboards.metrics.enabled .Values.dashboards.traces.enabled) }}
{{- if and .Values.local.grafana.enabled .Values.dashboards.logs.enabled }}
---
apiVersion: v1
kind: ConfigMap
@@ -10,11 +10,5 @@ data:
{{- if .Values.dashboards.logs.enabled }}
{{ ($.Files.Glob "src/rules/loki-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.dashboards.metrics.enabled }}
{{ ($.Files.Glob "src/rules/mimir-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- if .Values.dashboards.traces.enabled }}
{{ ($.Files.Glob "src/rules/tempo-rules.yaml").AsConfig | indent 2 }}
{{- end }}
{{- end }}
{{- end }}

View File

@@ -1,12 +1,10 @@
# Specify the namespaces to monitor here
namespacesToMonitor:
- loki
- mimir
- tempo
# The name of the cluster where this will be installed
clusterLabelValue: "meta-monitoring"
# Set to true to write logs, metrics or traces to Grafana Cloud
# The secrets have to be created first
cloud:
logs:
enabled: true
@@ -17,7 +15,6 @@ cloud:
traces:
enabled: true
secret: "traces"
# Set to true for a local version of logs, metrics or traces
local:
grafana:
@@ -29,9 +26,9 @@ local:
traces:
enabled: false
minio:
enabled: false # This should be set to true if any of the previous is enabled
enabled: false # This should be set to true if any of the previous is enabled
grafana:
version: 10.4.2
# Gateway ingress configuration
ingress:
# -- Specifies whether an ingress for the gateway should be created
@@ -39,9 +36,9 @@ grafana:
# -- Ingress Class Name. MAY be required for Kubernetes versions >= 1.18
ingressClassName: ""
# -- Annotations for the gateway ingress
annotations: { }
annotations: {}
# -- Labels for the gateway ingress
labels: { }
labels: {}
# -- Hosts configuration for the gateway ingress, passed through the `tpl` function to allow templating
hosts:
- host: monitoring.example.com
@@ -54,30 +51,27 @@ grafana:
# - secretName: grafana-tls
# hosts:
# - monitoring.example.com
logs:
# Adding regexes here will add a stage.replace block for logs. For more information see
# https://grafana.com/docs/agent/latest/flow/reference/components/loki.process/#stagereplace-block
piiRegexes:
# This example replaces the word after password with *****
# - expression: "password (\\\\S+)"
# source: "" # Empty uses the log message
# replace: "*****""
# The lines matching these will be kept in Loki
piiRegexes: null # This example replaces the word after password with *****
# - expression: "password (\\\\S+)"
# source: "" # Empty uses the log message
# replace: "*****""
# The lines matching these will be kept in Loki
retain:
# This shows the queries
- caller=metrics.go
# This shows any errors
- level=error
# This shows the ingest requests and is very noisy. Uncomment to include.
# - caller=push.go
# Log lines for delete requests
- delete request for user added
- Started processing delete request
- delete request for user marked as processed
# This shows the ingest requests and is very noisy. Uncomment to include.
# - caller=push.go
# Additional log lines to retain
extraLogs: []
metrics:
# The list of metrics to retain for logging dashboards
retain:
@@ -104,7 +98,9 @@ metrics:
- go_memstats_heap_inuse_bytes
- kubelet_volume_stats_used_bytes
- kubelet_volume_stats_capacity_bytes
- kube_deployment_created
- kube_persistentvolumeclaim_labels
- kube_pod_container_info
- kube_pod_container_resource_requests
- kube_pod_container_status_last_terminated_reason
- kube_pod_container_status_restarts_total
@@ -137,6 +133,7 @@ metrics:
- loki_distributor_bytes_received_total
- loki_distributor_lines_received_total
- loki_distributor_structured_metadata_bytes_received_total
- loki_index_request_duration_seconds_count
- loki_ingester_chunk_age_seconds_bucket
- loki_ingester_chunk_age_seconds_count
- loki_ingester_chunk_age_seconds_sum
@@ -173,21 +170,12 @@ metrics:
- node_disk_read_bytes_total
- node_disk_written_bytes_total
- promtail_custom_bad_words_total
# Set enabled = true to add the default logs/metrics/traces dashboards to the local Grafana
# Additional metrics to retain
extraMetrics: []
# Set enabled = true to add the default logs dashboards to the local Grafana
dashboards:
logs:
enabled: true
metrics:
enabled: true
traces:
enabled: true
global:
minio:
rootUser: "rootuser"
rootPassword: "rootpassword"
kubeStateMetrics:
# Scrape https://github.com/kubernetes/kube-state-metrics by default
enabled: true
@@ -195,10 +183,8 @@ kubeStateMetrics:
# https://artifacthub.io/packages/helm/prometheus-community/kube-state-metrics/
# is used. Change this if kube-state-metrics is installed somewhere else.
endpoint: kube-state-metrics.kube-state-metrics.svc.cluster.local:8080
# The following are configuration for the dependencies.
# These should usually not be changed.
loki:
loki:
auth_enabled: false
@@ -223,9 +209,9 @@ loki:
common:
storage:
s3:
access_key_id: "{{ .Values.global.minio.rootUser }}"
access_key_id: "${rootUser}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
secret_access_key: "${rootPassword}"
compactor:
retention_enabled: true
delete_request_store: s3
@@ -248,9 +234,24 @@ loki:
installOperator: false
lokiCanary:
enabled: false
test:
enabled: false
write:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
read:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
backend:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
alloy:
alloy:
clustering:
@@ -265,6 +266,15 @@ alloy:
memory: '600Mi'
limits:
memory: '4Gi'
extraPorts:
- name: "otel"
port: 4317
targetPort: 4317
protocol: "TCP"
- name: "thrifthttp"
port: 14268
targetPort: 14268
protocol: "TCP"
controller:
type: "statefulset"
autoscaling:
@@ -273,37 +283,36 @@ alloy:
maxReplicas: 30
targetMemoryUtilizationPercentage: 90
targetCPUUtilizationPercentage: 90
mimir-distributed:
minio:
enabled: false
global:
extraEnvFrom:
- secretRef:
name: "minio"
mimir:
structuredConfig:
alertmanager_storage:
s3:
bucket_name: mimir-ruler
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
blocks_storage:
backend: s3
s3:
bucket_name: mimir-tsdb
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
ruler_storage:
s3:
bucket_name: mimir-ruler
access_key_id: "{{ .Values.global.minio.rootUser }}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "{{ .Values.global.minio.rootPassword }}"
insecure: true
common:
storage:
backend: s3
s3:
bucket_name: mimir-ruler
access_key_id: "${rootUser}"
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
secret_access_key: "${rootPassword}"
insecure: true
limits:
compactor_blocks_retention_period: 30d
tempo-distributed:
tempo:
structuredConfig:
@@ -313,22 +322,47 @@ tempo-distributed:
s3:
bucket: tempo
endpoint: "{{ .Release.Name }}-minio.{{ .Release.Namespace }}.svc:9000"
access_key: "{{ .Values.global.minio.rootUser }}"
secret_key: "{{ .Values.global.minio.rootPassword }}"
access_key: "${rootUser}"
secret_key: "${rootPassword}"
insecure: true
compactor:
compaction:
block_retention: 30d
distributor:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
ingester:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
compactor:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
querier:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
queryFrontend:
extraArgs:
- "-config.expand-env=true"
extraEnvFrom:
- secretRef:
name: "minio"
traces:
otlp:
http:
enabled: true
grpc:
enabled: true
minio:
rootUser: rootuser
rootPassword: rootpassword
existingSecret: "minio"
buckets:
- name: loki-chunks
policy: none

View File

@@ -1,5 +1,19 @@
# Install this chart
## Preparation for Cloud mode (preferred)
1. Use an existing Grafana Cloud account or setup a new one. Then create an access token:
1. In Grafana go to Administration -> Users and Access -> Cloud access policies.
1. Click `Create access policy`.
1. Fill in the `Display name` field and check the `Write` check box for metrics, logs and traces. Then click `Create`.
1. On the newly created access policy click `Add token`.
1. Fill in the `Token name` field and click `Create`. Make a copy of the token as it will be used later on.
1. Create the meta namespace
```
@@ -11,36 +25,169 @@
```
kubectl create secret generic logs -n meta \
--from-literal=username=<logs username> \
--from-literal=password=<logs password>
--from-literal=password=<token> \
--from-literal=endpoint='https://logs-prod-us-central1.grafana.net/loki/api/v1/push'
kubectl create secret generic metrics -n meta \
--from-literal=username=<metrics username> \
--from-literal=password=<metrics password>
--from-literal=password=<token> \
--from-literal=endpoint='https://prometheus-us-central1.grafana.net/api/prom/push'
kubectl create secret generic traces -n meta \
--from-literal=username=<traces username> \
--from-literal=password=<traces password>
--from-literal=endpoint='https://tempo-us-central1.grafana.net/tempo'
--from-literal=username=<OTLP instance ID> \
--from-literal=password=<token> \
--from-literal=endpoint='https://otlp-gateway-prod-us-east-0.grafana.net/otlp'
```
The logs, metrics and traces usernames are the `User / Username / Instance IDs` of the Loki, Prometheus/Mimir and OpenTelemetry instances in Grafana Cloud. From `Home` in Grafana click on `Stacks`. Then go to the `Details` pages of Loki and Prometheus/Mimir. For OpenTelemetry go to the `Configure` page.
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). Fill in the names of the secrets created above as needed. An example minimal values.yaml looks like this:
```
namespacesToMonitor:
- loki
cloud:
logs:
enabled: true
secret: "logs"
metrics:
enabled: true
secret: "metrics"
traces:
enabled: true
secret: "traces"
```
## Preparation for Local mode
1. Create the meta namespace
```
kubectl create namespace meta
```
1. Create a secret named `minio` with the user and password for the local Minio:
```
kubectl create secret generic minio -n meta \
--from-literal=rootPassword=<password> \
--from-literal=rootUser=<user>
```
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). An example minimal values.yaml looks like this:
```
namespacesToMonitor:
- loki
cloud:
logs:
enabled: false
metrics:
enabled: false
traces:
enabled: false
local:
grafana:
enabled:true
logs:
enabled: true
metrics:
enabled: true
traces:
enabled: true
minio:
enabled: true
```
## Installing the chart
1. Add the repo
```
helm repo add grafana https://grafana.github.io/helm-charts
```
1. Fetch the latest charts from the grafana repo
```
helm repo update grafana
```
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). Fill in the names of the secrets created above as needed.
1. Install this helm chart
```
helm install -n meta -f values.yaml meta ./charts/meta-monitoring
helm install -n meta -f values.yaml meta grafana/meta-monitoring
```
1. Upgrade
```
helm upgrade --install -f values.yaml -n meta meta ./charts/meta-monitoring
helm upgrade --install -f values.yaml -n meta meta grafana/meta-monitoring
```
1. Delete this chart:
```
helm delete -n meta meta
```
```
## Installing the dashboards and rules on Grafana Cloud
## Installing the dashboards on Grafana Cloud
Only the files for the application monitored have to be copied. When monitoring Loki import dashboard files starting with 'loki-'.
For each of the dashboard files in charts/meta-monitoring/src/dashboards folder do the following:
1. Click on 'Dashboards' in Grafana
1. Click on the 'New` button and select 'Import'
1. Drop the dashboard file to the 'Upload dashboard JSON file' drop area
1. Click 'Import'
## Installing the rules on Grafana Cloud
1. Select the rules files in charts/meta-monitoring/src/rules for the application to monitor. When monitoring Loki use loki-rules.yaml.
1. Install mimirtool as per the [instructions](https://grafana.com/docs/mimir/latest/manage/tools/mimirtool/)
1. Create an access policy with Read and Write permission for Rules. Also create a token and record the token.
1. Get your cloud Prometheus endpoint and Instance ID from the `Prometheus` page in `Stacks`.
1. Use them to load the rules using mimirtool as follows:
```
mimirtool rules load --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token> *.yaml
```
1. To check the rules you have uploaded run:
```
mimirtool rules print --address=<your_cloud_prometheus_endpoint> --id=<your_instance_id> --key=<your_cloud_access_policy_token>
```
## Configure Loki to send traces
1. In the Loki config enable tracing:
```
loki:
tracing:
enabled: true
```
1. Add the following environment variables to your Loki binaries. When using the Loki Helm chart these can be added using the `extraEnv` setting for the Loki components.
1. JAEGER_ENDPOINT: http address of the mmc-alloy service installed by the meta-monitoring chart, for example "http://mmc-alloy:14268/api/traces"
1. JAEGER_AGENT_TAGS: extra tags you would like to add to the spans, for example 'cluster="abc",namespace="def"'
1. JAEGER_SAMPLER_TYPE: the sampling strategy, for example to sample all use 'const' with a value of 1 for the next environment variable
1. JAEGER_SAMPLER_PARAM: 1
1. If Loki is installed in a different namespace you can create an [ExternalName service](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) in Kubernetes to point to the mmc-alloy service in the meta monitoring namespace

20
scripts/clone_loki_mixin.sh Executable file
View File

@@ -0,0 +1,20 @@
#!/usr/bin/env bash
clean_up() {
test -d "$tmp_dir" && rm -fr "$tmp_dir"
}
here=${PWD}
tmp_dir=$( mktemp -d -t my-script )
cd $tmp_dir
echo "Cloning Loki"
git clone --filter=blob:none --no-checkout "https://github.com/grafana/loki"
cd loki
git sparse-checkout init --cone
git checkout main
git sparse-checkout set production/loki-mixin
echo "Copying production/loki-mixin to ${here}"
cp -r production ${here}

View File

@@ -0,0 +1,18 @@
(import 'dashboards.libsonnet') +
(import 'alerts.libsonnet') +
(import 'recording_rules.libsonnet') + {
grafanaDashboardFolder: 'Loki Meta Monitoring',
_config+:: {
internal_components: false,
// The Meta Monitoring helm chart uses Grafana Alloy instead of promtail
promtail+: {
enabled: false,
},
meta_monitoring+: {
enabled: true,
},
},
}