Compare commits

59 Commits

Author SHA1 Message Date
Edward Welch
0914919499 updating chart dependency 2024-10-24 16:17:27 +00:00
Ed Welch
918b6b9cb4 Merge pull request #158 from grafana/slim-bean-patch-1
Update Chart.yaml to update alloy and release new version
2024-10-24 12:01:33 -04:00
Ed Welch
55f3424118 Update Chart.yaml 2024-10-24 12:01:09 -04:00
Ed Welch
98e5ecd887 Merge pull request #157 from grafana/update-metrics-port
update metrics port to look for `metrics` in the port name vs excludi…
2024-10-24 11:46:06 -04:00
Edward Welch
58b438cdb5 update metrics port to look for metrics in the port name vs excluding ports
add some more log lines
2024-10-24 15:06:30 +00:00
J Stickler
c4af598b75 Merge pull request #147 from W0n9/main
Fix typo in loki-reads.json
2024-07-18 09:21:40 -04:00
TsungWing Wong
c78fe2d9fa Fix typo in loki-reads.json
TSBD -> TSDB
2024-07-17 13:38:58 +08:00
J Stickler
bae6e28b51 Merge pull request #146 from Vinaum8/patch-1
Update installation.md
2024-06-24 14:51:00 -04:00
Vinícius Fernandes
8f38e9508f Update installation.md
Update line 94, grafana:true to grafana : true. =P
2024-06-21 15:51:18 -03:00
Michel Hollands
de8a87dea1 Merge pull request #140 from grafana/chore/update-dependencies
[dependency] Update the subcharts
2024-06-03 08:43:45 +01:00
MichelHollands
48fad9f387 Update dependencies 2024-06-03 07:02:54 +00:00
Michel Hollands
4ec5f08646 Merge pull request #131 from grafana/add_loki_team_to_prs
Add loki-squad as PR reviewers
2024-05-31 15:21:35 +01:00
Michel Hollands
a1b66f0cd4 Merge pull request #138 from grafana/chore/update-dependencies
[dependency] Update the subcharts
2024-05-31 15:20:40 +01:00
MichelHollands
34bbe47d75 Update dependencies 2024-05-31 13:57:47 +00:00
Michel Hollands
0ef850e96c Add permissions
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:56:32 +01:00
Michel Hollands
c91a819e77 Add secret step
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:53:22 +01:00
Michel Hollands
71462a9f93 Use other token
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:49:41 +01:00
Michel Hollands
c5f1daf8f0 Use team-reviewers
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:36:05 +01:00
Michel Hollands
952c3e85d9 Use @
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:29:24 +01:00
Michel Hollands
f6b72897cd Use other form
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:06:06 +01:00
Michel Hollands
8b6314fde3 Add loki-squad as PR reviewers
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-31 14:03:07 +01:00
Michel Hollands
4d42fb664d Merge pull request #127 from grafana/chore/update-dependencies
[dependency] Update the subcharts
2024-05-22 09:14:16 +01:00
MichelHollands
9457c25ced Update dependencies 2024-05-22 07:03:07 +00:00
Michel Hollands
ca686afc3e Merge pull request #125 from grafana/update_version
Update to version 1.0
2024-05-14 14:10:13 +01:00
Michel Hollands
4b01214225 Update to version 1.0
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 14:09:41 +01:00
Michel Hollands
0e63a86fe5 Merge pull request #124 from grafana/update_main_page
Update the README and docs
2024-05-14 14:08:00 +01:00
Michel Hollands
4e8b2be044 Update the README and docs
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 14:06:49 +01:00
Michel Hollands
df12d96f9c Merge pull request #123 from grafana/cleanup_ci
Comment out installation in CI for now
2024-05-14 11:51:24 +01:00
Michel Hollands
fcb5de6793 Comment out installation in CI for now
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 11:50:57 +01:00
Michel Hollands
661662caec Merge pull request #121 from grafana/add_ci_install
Add CI install step
2024-05-14 10:52:01 +01:00
Michel Hollands
2a681ce1eb Add workflow dispatch
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 10:50:12 +01:00
Michel Hollands
52e4516e04 Add CI install step
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 10:48:45 +01:00
Michel Hollands
95085c4e72 Merge pull request #114 from grafana/chore/update-dependencies
[dependency] Update the subcharts
2024-05-14 10:38:16 +01:00
Michel Hollands
55d3c9d723 Merge pull request #120 from grafana/add_ksm_docs
Add kube-state-metrics documentation
2024-05-14 10:31:47 +01:00
Michel Hollands
618ab3778b Add kube-state-metrics documentation
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 10:31:11 +01:00
Michel Hollands
89d9bdb5e2 Merge pull request #119 from grafana/fix_cluster_name
Use shorter name for cluster
2024-05-14 08:47:56 +01:00
Michel Hollands
291f680c16 Use shorter name for cluster
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-14 08:46:45 +01:00
MichelHollands
3658769c7a Update dependencies 2024-05-14 07:04:01 +00:00
Michel Hollands
1be9bc8d0a Merge pull request #118 from grafana/fix_dashboards
Fix dashboards a bit more
2024-05-13 17:04:45 +01:00
Michel Hollands
81d63a4383 Fix CPU usage of ssd querier
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 16:59:05 +01:00
Michel Hollands
333ba3a3fd Add cluster to kube state metrics
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 16:58:07 +01:00
Michel Hollands
7aa091cbf8 Merge pull request #117 from grafana/fix_dashboards
Fix dashboards
2024-05-13 14:48:45 +01:00
Michel Hollands
d309a5bc50 Fix mistakes
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:44:08 +01:00
Michel Hollands
346dd4968e Make reads-resources work for all 3 deployment modes
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:36:53 +01:00
Michel Hollands
f5c9fa0593 Update operation so it works with all types of deployment
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:07:59 +01:00
Michel Hollands
d5e8df856d Update writes dashboard work with all types
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 14:06:41 +01:00
Michel Hollands
2d85e7e120 Update dashboards so they work with single binary
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 10:56:35 +01:00
Michel Hollands
1a4a1ad885 Fix ruler panel
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 10:17:55 +01:00
Michel Hollands
c1ff364c29 Add missing metric in reads dashboard
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:45:37 +01:00
Michel Hollands
bd0ef0e2cc Add missing values
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:21:05 +01:00
Michel Hollands
0216163885 Add chunk reason
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:11:20 +01:00
Michel Hollands
c42718649f Fix distributor memory panel
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-13 09:03:02 +01:00
Michel Hollands
650df8217a Merge pull request #116 from grafana/fix_loki_write_endpoint
Fix local write end point
2024-05-13 08:24:18 +01:00
Michel Hollands
f7946ff713 Fix local write end point
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-12 14:32:39 +01:00
Michel Hollands
b312fc37fc Merge pull request #115 from grafana/fix_traces_forwarding
Fix local tracing pipeline
2024-05-10 15:44:03 +01:00
Michel Hollands
ad96f09600 Fix tracing pipeline
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-10 15:36:05 +01:00
Michel Hollands
090f1ef91a Merge pull request #113 from grafana/change_default_sampling_type
Suggest ratelimiting sample rate for Loki traces
2024-05-09 17:10:24 +01:00
Michel Hollands
b2957d90f0 Merge pull request #112 from grafana/update_ingress_documentation
Add docs regarding the Ingress
2024-05-09 17:10:06 +01:00
Michel Hollands
f8aea814c5 Suggest ratelimiting sample rate for Loki traces
Signed-off-by: Michel Hollands <michel.hollands@gmail.com>
2024-05-09 16:46:43 +01:00
20 changed files with 168 additions and 95 deletions

.github/configs/cluster-config.yaml (new file, 19 lines)

@@ -0,0 +1,19 @@
+apiVersion: kind.x-k8s.io/v1alpha4
+kind: Cluster
+nodes:
+  - role: control-plane
+    kubeadmConfigPatches:
+      - |
+        kind: ClusterConfiguration
+        controllerManager:
+          extraArgs:
+            bind-address: 0.0.0.0
+            secure-port: "10257"
+        scheduler:
+          extraArgs:
+            bind-address: 0.0.0.0
+            secure-port: "10259"
+      - |
+        kind: KubeProxyConfiguration
+        metricsBindAddress: 0.0.0.0:10249
+  - role: worker


@@ -19,6 +19,9 @@ jobs:
   updateVersions:
     name: Update the subcharts
     runs-on: "ubuntu-latest"
+    permissions:
+      contents: write
+      id-token: write
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -66,6 +69,20 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
+      - id: get-secrets
+        uses: grafana/shared-workflows/actions/get-vault-secrets@main
+        with:
+          # Secrets placed in the ci/repo/grafana/<repo>/<path> path in Vault
+          repo_secrets: |
+            APP_ID=github-app:app-id
+            PRIVATE_KEY=github-app:private-key
+      - uses: actions/create-github-app-token@v1
+        id: app-token
+        with:
+          app-id: ${{ env.APP_ID }}
+          private-key: ${{ env.PRIVATE_KEY }}
       - name: Create pull request
         if: steps.update-loki.outputs.changed == 'true' || steps.update-grafana-alloy.outputs.changed == 'true' || steps.update-mimir-distributed.outputs.changed == 'true' || steps.update-tempo-distributed.outputs.changed == 'true' || steps.update-minio.outputs.changed == 'true'
         uses: peter-evans/create-pull-request@v5
@@ -79,10 +96,15 @@ jobs:
           labels: dependencies
           branch: chore/update-dependencies
           delete-branch: true
+          team-reviewers: "@grafana/loki-squad"
+          token: ${{ steps.app-token.outputs.token }}
   updateGrafana:
     name: Update the Grafana version
     runs-on: "ubuntu-latest"
+    permissions:
+      contents: write
+      id-token: write
     steps:
       - name: Checkout
         uses: actions/checkout@v2
@@ -98,6 +120,20 @@ jobs:
             echo "changed=true" >> "${GITHUB_OUTPUT}"
           fi
+      - id: get-secrets
+        uses: grafana/shared-workflows/actions/get-vault-secrets@main
+        with:
+          # Secrets placed in the ci/repo/grafana/<repo>/<path> path in Vault
+          repo_secrets: |
+            APP_ID=github-app:app-id
+            PRIVATE_KEY=github-app:private-key
+      - uses: actions/create-github-app-token@v1
+        id: app-token
+        with:
+          app-id: ${{ env.APP_ID }}
+          private-key: ${{ env.PRIVATE_KEY }}
       - name: Create pull request
         if: steps.update-grafana.outputs.changed == 'true'
         uses: peter-evans/create-pull-request@v5
@@ -111,3 +147,5 @@ jobs:
           labels: dependencies
           branch: chore/update-minio
           delete-branch: true
+          team-reviewers: "@grafana/loki-squad"
+          token: ${{ steps.app-token.outputs.token }}


@@ -1,6 +1,7 @@
 ---
 name: helm-ci
 on:
+  workflow_dispatch:
   pull_request:
     paths:
       - "charts/meta-monitoring/**"
@@ -24,7 +25,7 @@ jobs:
 # runs-on: ubuntu-latest
 # steps:
 # - name: Checkout
-# uses: actions/checkout@v3
+# uses: actions/checkout@v4
 # with:
 # fetch-depth: 0
@@ -38,10 +39,10 @@ jobs:
 # - name: Set up Python
 # uses: actions/setup-python@v4
 # with:
-# python-version: 3.7
+# python-version: 3.9
 # - name: Set up chart-testing
-# uses: helm/chart-testing-action@v2.4.0
+# uses: helm/chart-testing-action@v2
 # - name: Run chart-testing (list-changed)
 # id: list-changed
@@ -55,10 +56,10 @@ jobs:
 # run: ct lint --config "${CT_CONFIGFILE}" --check-version-increment=false
 # - name: Create kind cluster
-# uses: helm/kind-action@v1.8.0
+# uses: helm/kind-action@v1
 # if: steps.list-changed.outputs.changed == 'true'
 # with:
-# config: tools/kind.config
+# config: "${{ github.workspace }}/.github/configs/cluster-config.yaml"
 # - name: Run chart-testing (install)
 # run: |


@@ -1,8 +1,6 @@
 # meta-monitoring-chart
-This is a meta-monitoring chart for Loki.
-Note that this is pre-production software at the moment.
+This is a meta-monitoring chart for Loki, specifically Loki installed via the Loki helm chart.
 ## Local and cloud modes
@@ -11,19 +9,15 @@ to small Loki, Mimir and Tempo installations running in the meta-monitoring name
 ![local mode](docs/images/Meta%20monitoring%20local.png)
-To enable local mode set `local.<logs|metrics|traces>.enabled` to true.
 In the cloud mode the logs, metrics and/or traces are sent to Grafana Cloud.
 ![cloud mode](docs/images/Meta%20monitoring%20cloud.png)
-To enable cloud mode set `cloud.<logs|metrics|traces>.enabled` to true. The `endpoint`, `username` and `password` settings for your Grafana Cloud logs, metrics and traces instances have to be filled in as well.
 Both modes can be enabled at the same time. Cloud mode is preferred.
 ## Installation
-For more instructions including how to update the chart go to the [installation](docs/installation.md) page.
+For more instructions including how to install the chart go to the [installation](docs/installation.md) page.
 ## Supported features
@@ -33,8 +27,7 @@ For more instructions including how to update the chart go to the [installation]
 - Specify PII regexes that are applied to logs before they are sent to Loki (cloud or local). The capture group in the regex is replaced with *****.
 - a Grafana instance is installed (when local mode is used) with the relevant datasources installed. The following dashboards are installed:
 - logs dashboards
-- agent dashboards
-- Retention is set to 24 hours
+- Alloy dashboards
 Most of these features are enabled by default. See the values.yaml file for how to enable/disable them.
@@ -42,8 +35,7 @@ Most of these features are enabled by default. See the values.yaml file for how
 - This has not been tested on Openshift yet.
 - The underlying Loki, Mimir and Tempo are at the default size installed by the Helm chart. This might need changing when monitoring bigger Loki, Mimir or Tempo installations.
-- MinIO is used as storage at the moment with a limited retention. At the moment this chart cannot be used for monitoring over longer periods.
-- Agent self monitoring is not done at the moment.
+- MinIO is used as storage for the local mode at the moment with a limited retention. At the moment this chart cannot be used for monitoring over longer periods.
 ## Developer help topics
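
The mode settings named in the README text above could look roughly like this in a values override file. This is only a sketch: the `local.<logs|metrics|traces>.enabled` and `cloud.<logs|metrics|traces>.enabled` keys and the `endpoint`, `username` and `password` setting names come from the README, while their exact nesting under each `cloud` entry and all placeholder values are assumptions. Check the chart's values.yaml before relying on it.

```yaml
# Hypothetical values override for cloud mode only.
# The nesting of endpoint/username/password is assumed, not confirmed by the diff.
local:
  logs:
    enabled: false
  metrics:
    enabled: false
  traces:
    enabled: false

cloud:
  logs:
    enabled: true
    endpoint: "<Grafana Cloud logs endpoint>"      # placeholder
    username: "<logs instance username>"           # placeholder
    password: "<logs instance password>"           # placeholder
  metrics:
    enabled: true
    endpoint: "<Grafana Cloud metrics endpoint>"   # placeholder
    username: "<metrics instance username>"        # placeholder
    password: "<metrics instance password>"        # placeholder
  traces:
    enabled: true
    endpoint: "<Grafana Cloud traces endpoint>"    # placeholder
    username: "<traces instance username>"         # placeholder
    password: "<traces instance password>"         # placeholder
```

Because the README states that both modes can be enabled at the same time, setting the `local.*.enabled` flags to true alongside the `cloud` block should also be valid.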


@@ -1,18 +1,18 @@
 dependencies:
 - name: loki
   repository: https://grafana.github.io/helm-charts
-  version: 6.5.1
+  version: 6.6.2
 - name: alloy
   repository: https://grafana.github.io/helm-charts
-  version: 0.1.1
+  version: 0.9.2
 - name: mimir-distributed
   repository: https://grafana.github.io/helm-charts
   version: 5.3.0
 - name: tempo-distributed
   repository: https://grafana.github.io/helm-charts
-  version: 1.9.9
+  version: 1.10.0
 - name: minio
   repository: https://charts.min.io
   version: 5.2.0
-digest: sha256:e0c7af6d328fe35f4b9a3557235f458d92225b84b1366dbb77c4626d3cdb5be9
-generated: "2024-05-09T07:02:42.911579524Z"
+digest: sha256:f9db8c14f253abf3dda0e1828db974daa76e6fbc4ae8c8b59b22ae8ac80134f0
+generated: "2024-10-24T12:16:53.621381279-04:00"


@@ -13,7 +13,7 @@ type: application
 # This is the chart version. This version number should be incremented each time you make changes
 # to the chart and its templates, including the app version.
 # Versions are expected to follow Semantic Versioning (https://semver.org/)
-version: 0.0.3
+version: 1.1.0
 # This is the version number of the application being deployed. This version number should be
 # incremented each time you make changes to the application. Versions are not expected to
 # follow Semantic Versioning. They should reflect the version the application is using.
@@ -22,18 +22,18 @@ appVersion: "0.0.1"
 dependencies:
   - name: loki
     repository: https://grafana.github.io/helm-charts
-    version: 6.5.1
+    version: 6.6.2
     condition: local.logs.enabled
   - name: alloy
     repository: https://grafana.github.io/helm-charts
-    version: 0.1.1
+    version: 0.9.2
   - name: mimir-distributed
     repository: https://grafana.github.io/helm-charts
     version: 5.3.0
     condition: local.metrics.enabled
   - name: tempo-distributed
     repository: https://grafana.github.io/helm-charts
-    version: 1.9.9
+    version: 1.10.0
     condition: local.traces.enabled
   - name: minio
     repository: https://charts.min.io

Binary file not shown.

Binary file not shown.


@@ -1824,7 +1824,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*distributor.*|(loki|enterprise-logs)-write)\"}[$__rate_interval]))",
+"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*distributor.*|(loki|enterprise-logs)-write.*|$namespace-[0-9]+)\"}[$__rate_interval]))",
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",
 "refId": "A"
@@ -1921,7 +1921,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"(.*/distributor|(loki|enterprise-logs)-write|.*/loki)\"}",
+"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", job=~\"(.*/.*distributor|$namespace/(loki|enterprise-logs)-write|.*/loki|$namespace/loki-single-binary)\"}",
 "instant": false,
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",
@@ -2525,7 +2525,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
+"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary|$namespace-[0-9]+)\"}[$__rate_interval]))",
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",
 "refId": "A"
@@ -2622,7 +2622,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\"}",
+"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"(.*ingester.*|(loki|enterprise-logs)-write.*|loki-single-binary|$namespace-[0-9]+)\"}",
 "instant": false,
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",
@@ -3308,7 +3308,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "sum by(reason) (rate(loki_ingester_chunks_flushed_total{cluster=~\"$cluster\",job=~\"$namespace/.*ingester.*\", namespace=~\"$namespace\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=~\"$cluster\",job=~\"$namespace/.*ingester.*\", namespace=~\"$namespace\"}[$__rate_interval]))",
+"expr": "sum by(reason) (rate(loki_ingester_chunks_flushed_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", namespace=~\"$namespace\"}[$__rate_interval])) / ignoring(reason) group_left sum(rate(loki_ingester_chunks_flushed_total{cluster=~\"$cluster\",job=~\"($namespace)/(.*ingester.*|(loki|enterprise-logs)-write|loki-single-binary)\", namespace=~\"$namespace\"}[$__rate_interval]))",
 "interval": "",
 "legendFormat": "{{ reason }}"
 }
@@ -3388,7 +3388,7 @@
 "reverseYBuckets": false,
 "targets": [
 {
-"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"($namespace)/(ingester|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
+"expr": "sum by (le) (rate(loki_ingester_chunk_utilization_bucket{cluster=\"$cluster\", job=~\"($namespace)/(.*ingester|(loki|enterprise-logs)-write|loki-single-binary)\"}[$__rate_interval]))",
 "format": "heatmap",
 "instant": false,
 "interval": "",
@@ -3481,7 +3481,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*querier.*|(loki|enterprise-logs)-read|loki-single-binary)\"}[$__rate_interval]))",
+"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", pod=~\"(.*querier.*|(loki|enterprise-logs)-read.*|loki-single-binary|$namespace-[0-9]+)\"}[$__rate_interval]))",
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",
 "refId": "A"
@@ -3578,7 +3578,7 @@
 "steppedLine": false,
 "targets": [
 {
-"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"(.*querier.*|(loki|enterprise-logs)-read|.*loki-single-binary)\"}",
+"expr": "go_memstats_heap_inuse_bytes{cluster=\"$cluster\", namespace=\"$namespace\", pod=~\"(.*querier.*|(loki|enterprise-logs)-read.*|.*loki-single-binary|$namespace-[0-9]+)\"}",
 "instant": false,
 "intervalFactor": 3,
 "legendFormat": "{{pod}}",


@@ -104,19 +104,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -206,19 +206,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -269,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*query-frontend\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-frontend|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -371,19 +371,19 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -473,19 +473,19 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-scheduler|loki\", pod=~\"query-scheduler|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -536,7 +536,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*query-scheduler\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*query-scheduler|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -638,19 +638,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -740,19 +740,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"querier|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -803,7 +803,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*querier\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*querier|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -854,7 +854,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"querier\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -902,7 +902,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"querier\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"query-frontend|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1462,19 +1462,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1564,19 +1564,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -1627,7 +1627,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*bloom-gateway\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*bloom-gateway|loki-read|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -1678,7 +1678,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"bloom-gateway\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_written_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -1726,7 +1726,7 @@
},
"targets": [
{
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=\"bloom-gateway\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"expr": "sum by(instance, pod, device) (rate(node_disk_read_bytes_total[$__rate_interval])) + ignoring(pod) group_right() (label_replace(count by(instance, pod, device) (container_fs_writes_bytes_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"bloom-gateway|loki\", pod=~\"query-frontend|loki-read-.*|$namespace-[0-9]*\", device!~\".*sda.*\"}), \"device\", \"$1\", \"device\", \"/dev/(.*)\") * 0)\n",
"format": "time_series",
"legendFormat": "{{pod}} - {{device}}",
"legendLink": null
@@ -2189,19 +2189,19 @@
},
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\", resource=\"cpu\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\", resource=\"cpu\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -2291,19 +2291,19 @@
},
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
},
{
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\", resource=\"memory\"} > 0)",
"expr": "min(kube_pod_container_resource_requests{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\", resource=\"memory\"} > 0)",
"format": "time_series",
"legendFormat": "request",
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"ruler|loki\", pod=~\"ruler|loki-backend-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -2354,7 +2354,7 @@
},
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/ruler\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*ruler|loki-backend|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null

@@ -2449,7 +2449,7 @@
"repeatIteration": null,
"repeatRowId": null,
"showTitle": true,
"title": "TSBD Index",
"title": "TSDB Index",
"titleSize": "h6"
},
{
@@ -2897,4 +2897,4 @@
"title": "Loki / Reads",
"uid": "reads",
"version": 0
}
}

@@ -104,7 +104,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"}[$__rate_interval]))",
"expr": "sum by(pod) (rate(container_cpu_usage_seconds_total{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"}[$__rate_interval]))",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -116,7 +116,7 @@
"legendLink": null
},
{
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"expr": "min(container_spec_cpu_quota{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"} / container_spec_cpu_period{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -206,7 +206,7 @@
"span": 4,
"targets": [
{
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"})",
"expr": "max by(pod) (container_memory_working_set_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null
@@ -218,7 +218,7 @@
"legendLink": null
},
{
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor\"} > 0)",
"expr": "min(container_spec_memory_limit_bytes{cluster=~\"$cluster\", namespace=~\"$namespace\", container=~\"distributor|loki\", pod=~\"distributor|loki-write-.*|$namespace-[0-9]*\"} > 0)",
"format": "time_series",
"legendFormat": "limit",
"legendLink": null
@@ -269,7 +269,7 @@
"span": 4,
"targets": [
{
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/.*distributor\"})",
"expr": "sum by(pod) (go_memstats_heap_inuse_bytes{cluster=~\"$cluster\", job=~\"($namespace)/(.*distributor|loki-write|loki-single-binary)\"})",
"format": "time_series",
"legendFormat": "{{pod}}",
"legendLink": null

@@ -120,9 +120,9 @@ data:
replacement = "{{- .Values.clusterLabelValue -}}"
}
rule {
source_labels = ["__meta_kubernetes_pod_container_port_number"]
action = "drop"
regex = "9095"
source_labels = ["__meta_kubernetes_pod_container_port_name"]
action = "keep"
regex = ".*metrics.*"
}
}
@@ -135,6 +135,11 @@ data:
}
prometheus.relabel "filter" {
rule {
target_label = "cluster"
replacement = "{{- .Values.clusterLabelValue -}}"
}
rule {
source_labels = ["__name__"]
regex = "({{ include "agent.all_metrics" . }})"
@@ -330,7 +335,7 @@ data:
{{- if .Values.local.logs.enabled }}
loki.write "local" {
endpoint {
url = "http://{{- .Release.Namespace -}}-loki-gateway.{{- .Release.Namespace -}}.svc.cluster.local:80/loki/api/v1/push"
url = "http://loki-write.{{- .Release.Namespace -}}.svc.cluster.local:3100/loki/api/v1/push"
}
}
{{- end }}
@@ -346,7 +351,7 @@ data:
{{- if .Values.local.traces.enabled }}
otelcol.exporter.otlphttp "local" {
client {
endpoint = "http://{{- .Release.Name -}}-tempo-distributor.svc:4318"
endpoint = "http://{{- .Release.Name -}}-tempo-distributor.{{- .Release.Namespace -}}.svc:4318"
}
}
{{- end }}

@@ -2,7 +2,7 @@
namespacesToMonitor:
- loki
# The name of the cluster where this will be installed
clusterLabelValue: "meta-monitoring"
clusterLabelValue: "meta"
# Set to true to write logs, metrics or traces to Grafana Cloud
# The secrets have to be created first
cloud:
@@ -66,9 +66,11 @@ logs:
# The lines matching these will be kept in Loki
retain:
# This shows the queries
- executing query
- caller=metrics.go
# This shows any errors
- level=error
- level=warn
# Log lines for delete requests
- delete request for user added
- Started processing delete request
@@ -149,6 +151,7 @@ metrics:
- kube_pod_container_resource_requests
- kube_pod_container_status_last_terminated_reason
- kube_pod_container_status_restarts_total
- loki_azure_blob_request_duration_seconds_bucket
- loki_boltdb_shipper_compact_tables_operation_duration_seconds
- loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_seconds
- loki_boltdb_shipper_retention_marker_count_total
@@ -174,11 +177,15 @@ metrics:
- loki_compactor_deleted_lines
- loki_compactor_oldest_pending_delete_request_age_seconds
- loki_compactor_pending_delete_requests_count
- loki_consul_request_duration_seconds_bucket
- loki_discarded_samples_total
- loki_discarded_bytes_total
- loki_distributor_bytes_received_total
- loki_distributor_lines_received_total
- loki_distributor_structured_metadata_bytes_received_total
- loki_gcs_request_duration_seconds_bucket
- loki_gcs_request_duration_seconds_count
- loki_index_request_duration_seconds_bucket
- loki_index_request_duration_seconds_count
- loki_ingester_chunk_age_seconds_bucket
- loki_ingester_chunk_age_seconds_count
@@ -191,6 +198,7 @@ metrics:
- loki_ingester_chunk_entries_sum
- loki_ingester_chunk_size_bytes_bucket
- loki_ingester_chunk_utilization_bucket
- loki_ingester_chunk_utilization_count
- loki_ingester_chunk_utilization_sum
- loki_ingester_chunks_flushed_total
- loki_ingester_flush_queue_length
@@ -208,6 +216,8 @@ metrics:
- loki_ruler_wal_prometheus_remote_storage_samples_total
- loki_ruler_wal_samples_appended_total
- loki_ruler_wal_storage_created_series_total
- loki_s3_request_duration_seconds_bucket
- loki_s3_request_duration_seconds_count
- loki_write_batch_retries_total
- loki_write_dropped_bytes_total
- loki_write_dropped_entries_total

@@ -1,8 +1,12 @@
# Update the dependencies
The dependencies are the version of Loki, Mimir, Agent and so on that are included in this chart.
The dependencies are the versions of Loki, Mimir, Agent and so on that are included in this chart.
The current versions can be found in the [Chart.yaml](../charts/meta-monitoring/Chart.yaml) file.
A GitHub Action runs daily to check whether updated versions are available. If so, it creates a PR.
The manual steps are as follows:
Run this in the charts/meta-monitoring directory after updating a dependency:
```

@@ -4,7 +4,7 @@
1. Use an existing Grafana Cloud account or set up a new one. Then create an access token:
1. In Grafana go to Administration -> Users and Access -> Cloud access policies.
1. In a Grafana instance on Grafana Cloud go to Administration -> Users and Access -> Cloud access policies.
1. Click `Create access policy`.
@@ -39,7 +39,7 @@
--from-literal=endpoint='https://otlp-gateway-prod-us-east-0.grafana.net/otlp'
```
The logs, metrics and traces usernames are the `User / Username / Instance IDs` of the Loki, Prometheus/Mimir and OpenTelemetry instances in Grafana Cloud. From `Home` in Grafana click on `Stacks`. Then go to the `Details` pages of Loki and Prometheus/Mimir. For OpenTelemetry go to the `Configure` page.
The logs, metrics and traces usernames are the `User / Username / Instance IDs` of the Loki, Prometheus/Mimir and OpenTelemetry instances in Grafana Cloud. From `Home` in Grafana click on `Stacks`. Then go to the `Details` pages of Loki and Prometheus/Mimir. For OpenTelemetry go to the `Configure` page. The endpoints will also have to be changed to match your settings.
1. Create a values.yaml file based on the [default one](../charts/meta-monitoring/values.yaml). Fill in the names of the secrets created above as needed. An example minimal values.yaml looks like this:
@@ -91,7 +91,7 @@
local:
grafana:
enabled:true
enabled: true
logs:
enabled: true
metrics:
@@ -102,7 +102,7 @@
enabled: true
```
## Installing the chart
## Installing, updating and deleting the chart
1. Add the repo
@@ -175,7 +175,7 @@ For each of the dashboard files in charts/meta-monitoring/src/dashboards folder
## Configure Loki to send traces
1. In the Loki config enable tracing:
1. In the config of the Loki instance that is being monitored, enable tracing:
```
loki:
@@ -187,11 +187,15 @@ For each of the dashboard files in charts/meta-monitoring/src/dashboards folder
1. JAEGER_ENDPOINT: http address of the mmc-alloy service installed by the meta-monitoring chart, for example "http://mmc-alloy:14268/api/traces"
1. JAEGER_AGENT_TAGS: extra tags you would like to add to the spans, for example 'cluster="abc",namespace="def"'
1. JAEGER_SAMPLER_TYPE: the sampling strategy, for example to sample all use 'const' with a value of 1 for the next environment variable
1. JAEGER_SAMPLER_PARAM: 1
1. JAEGER_SAMPLER_TYPE: the sampling strategy. We suggest setting this to `ratelimiting` so that at most 1 trace is accepted per second. See these [docs](https://www.jaegertracing.io/docs/1.57/sampling/) for more options.
1. JAEGER_SAMPLER_PARAM: 1.0
1. If Loki is installed in a different namespace you can create an [ExternalName service](https://kubernetes.io/docs/concepts/services-networking/service/#externalname) in Kubernetes to point to the mmc-alloy service in the meta monitoring namespace
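
As a minimal sketch of the ExternalName approach mentioned above: the manifest below assumes Loki runs in a namespace called `loki` and the meta-monitoring chart is installed in a namespace called `meta`. Both namespace names are assumptions for illustration, so adjust them to your installation.

```yaml
# Hypothetical example: the namespaces "loki" and "meta" are assumptions.
# This creates a DNS alias in the Loki namespace that points at the
# mmc-alloy service in the meta-monitoring namespace.
apiVersion: v1
kind: Service
metadata:
  name: mmc-alloy
  namespace: loki          # namespace where Loki is installed (assumed)
spec:
  type: ExternalName
  # Fully qualified name of the real service in the meta-monitoring namespace (assumed)
  externalName: mmc-alloy.meta.svc.cluster.local
```

With an alias like this in place, a `JAEGER_ENDPOINT` such as "http://mmc-alloy:14268/api/traces" should resolve from pods in the Loki namespace without spelling out the other namespace.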
## Configure external access using an Ingress in local mode
When using local mode by default a Kubernetes [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) object is created to access the Grafana instance. This will need to be adapted to your cloud provider by updating the `grafana.ingress` section of the `values.yaml` file provided to Helm. Check the documentation of your cloud provider for available options.
When using local mode by default a Kubernetes [Ingress](https://kubernetes.io/docs/concepts/services-networking/ingress/) object is created to access the Grafana instance. This will need to be adapted to your cloud provider by updating the `grafana.ingress` section of the `values.yaml` file provided to Helm. Check the documentation of your cloud provider for available options.
## Kube-state-metrics
Metrics about Kubernetes objects are scraped from [kube-state-metrics](https://github.com/kubernetes/kube-state-metrics). This needs to be installed in the cluster. The `kubeStateMetrics.endpoint` entry in values.yaml should be set to its address (without the `/metrics` part in the URL).
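
A values.yaml fragment for this setting might look like the sketch below. The service name, namespace and port are assumptions (kube-state-metrics listens on 8080 by default), so check them against your own installation and the chart's default values.yaml for the exact address format expected.

```yaml
# Hypothetical address: service name, namespace and port are assumptions.
kubeStateMetrics:
  # Address of kube-state-metrics, without the trailing /metrics path
  endpoint: kube-state-metrics.kube-system.svc.cluster.local:8080
```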