IK.AM



Exclude Tanzu Application Platform Components from Prometheus Service Discovery

Created on Thu Sep 21 2023 • Last Updated on Thu Sep 21 2023

🏷️ Kubernetes | Tanzu | TAP | Prometheus

Warning

This article was automatically translated by OpenAI (gpt-4o). It may be edited eventually, but please be aware that it may contain incorrect information at this time.

Many components deployed on the Tanzu Application Platform expose metrics endpoints for Prometheus and are annotated with prometheus.io/scrape: true, which makes them targets for Prometheus Service Discovery.

It is convenient that metrics are collected automatically as soon as you introduce Prometheus or an agent compatible with Prometheus Service Discovery (such as DataDog, Wavefront, or Grafana Agent). The downside is that a large number of metrics can be collected unintentionally, most of which nobody is actually monitoring.

This matters especially with SaaS, where the volume of metrics per unit time often correlates with the bill, so you can end up "paying for metrics you are not using."

In this article, I will introduce a method to turn off the prometheus.io/scrape: true setting and temporarily exclude components from being scraped. You can exclude unnecessary metrics for the time being and re-enable them when needed.

To exclude components from being scraped, you need to modify the manifest (annotations), so we will use overlays.
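Before writing overlays, it can help to see which pods currently carry the annotation. The following is a minimal sketch, assuming kubectl access to the cluster; the function names list_scrape_annotations and filter_scraped are hypothetical helpers for illustration and are not part of TAP.

```shell
# Keep only the rows whose third column (the prometheus.io/scrape value)
# is "true", and print them as namespace/pod.
filter_scraped() {
  awk '$3 == "true" { print $1 "/" $2 }'
}

# Print one "namespace name scrape-value" row per pod in the cluster.
# Pods without the annotation show "<none>" in the third column.
list_scrape_annotations() {
  kubectl get pods -A --no-headers \
    -o custom-columns='NS:.metadata.namespace,NAME:.metadata.name,SCRAPE:.metadata.annotations.prometheus\.io/scrape'
}

# Usage (requires cluster access):
#   list_scrape_annotations | filter_scraped | sort
```

Grouping the result by namespace gives a rough idea of which packages are worth excluding first.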

The sections below show the configuration for each package separately, but you can also add all of the overlays to tap-values.yaml at once and update TAP a single time per profile. I have not confirmed that this list covers every package; it contains only the ones I excluded myself.

Contour

Target Profiles: run, build, view, iterate, full

This component emits the most metrics. A Grafana dashboard exists for Envoy, so if you do need these metrics, consider keeping them and using it.

Create the following Secret:

cat <<EOF > contour-disable-scrape.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: contour-disable-scrape
  namespace: tap-install
type: Opaque
stringData:
  contour-disable-scrape.yml: |
    #@ load("@ytt:overlay", "overlay")

    #@ for kind, name in [['DaemonSet', 'envoy'], ['Deployment', 'envoy'], ['Deployment', 'contour']]:
    #@overlay/match by=overlay.subset({"kind": kind, "metadata": {"name": name}}), expects="0+"
    ---
    spec:
      template:
        metadata:
          annotations:
            prometheus.io/scrape: "false"
    #@ end
---
EOF

kubectl apply -f contour-disable-scrape.yaml

Add the following settings to tap-values.yaml under package_overlays and update TAP:

package_overlays:
- name: contour
  secrets:
  - name: contour-disable-scrape
  # ...

# ... 
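After TAP has been updated, you may want to confirm that the pod templates really carry the new value. This is a rough sketch, assuming kubectl access and that Contour runs in the tanzu-system-ingress namespace (the usual TAP default, but check your installation); show_template_scrape and assert_not_scraped are hypothetical helper names.

```shell
# Print "KIND NAME SCRAPE" for every DaemonSet and Deployment in a namespace,
# reading the annotation from the pod template (the field the overlay changes).
show_template_scrape() {
  ns="$1"
  kubectl get daemonsets,deployments -n "$ns" --no-headers \
    -o custom-columns='KIND:.kind,NAME:.metadata.name,SCRAPE:.spec.template.metadata.annotations.prometheus\.io/scrape'
}

# Succeed only if no workload in the namespace is still marked "true".
assert_not_scraped() {
  ! show_template_scrape "$1" | awk '$3 == "true" { found = 1 } END { exit !found }'
}

# Usage (requires cluster access):
#   show_template_scrape tanzu-system-ingress
#   assert_not_scraped tanzu-system-ingress && echo "scraping disabled"
```

The same helpers should work for the other packages by passing their namespace instead.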

Cloud Native Runtimes

Target Profiles: run, iterate, full

This component emits the second most metrics. A Grafana dashboard exists for Knative Serving, so if you do need these metrics, consider keeping them and using it.

Create the following Secret:

cat <<EOF > cnrs-disable-scrape.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: cnrs-disable-scrape
  namespace: tap-install
type: Opaque
stringData:
  cnrs-disable-scrape.yml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment","metadata":{"namespace": "knative-serving"}}),expects="1+"
    ---
    spec:
      template:
        #@overlay/match-child-defaults missing_ok=True
        metadata:
          annotations:
            prometheus.io/scrape: 'false'
            wavefront.com/scrape: 'false'
---
EOF

kubectl apply -f cnrs-disable-scrape.yaml

Add the following settings to tap-values.yaml under package_overlays and update TAP:

package_overlays:
- name: cnrs
  secrets:
  - name: cnrs-disable-scrape
  # ...

# ... 
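An overlay like this one can be tried locally before it is applied to the cluster. The sketch below assumes the ytt CLI is installed (the render step is skipped otherwise); sample-deployment.yaml is a hypothetical stand-in for a real Knative Serving Deployment, and cnrs-overlay.yml is simply the overlay body copied out of the Secret.

```shell
# Hypothetical sample input: a Deployment in knative-serving without annotations.
cat <<'EOF' > sample-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample
  namespace: knative-serving
spec:
  template:
    metadata:
      labels:
        app: sample
EOF

# The overlay body, copied from the Secret's stringData.
cat <<'EOF' > cnrs-overlay.yml
#@ load("@ytt:overlay", "overlay")
#@overlay/match by=overlay.subset({"kind":"Deployment","metadata":{"namespace": "knative-serving"}}),expects="1+"
---
spec:
  template:
    #@overlay/match-child-defaults missing_ok=True
    metadata:
      annotations:
        prometheus.io/scrape: 'false'
        wavefront.com/scrape: 'false'
EOF

# Render locally if ytt is available; the output should include both annotations.
if command -v ytt >/dev/null 2>&1; then
  ytt -f sample-deployment.yaml -f cnrs-overlay.yml
fi
```

Note the difference in the match settings: expects="1+" makes ytt fail when no Deployment matches, whereas the Contour overlay uses expects="0+" so that it tolerates Envoy being deployed either as a DaemonSet or as a Deployment. The missing_ok=True default lets the overlay add the annotations even when the pod template has none yet.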

cert-manager

Target Profiles: run, build, view, iterate, full

Because cert-manager is included in every profile, its metrics add up across all of your clusters.

Create the following Secret:

cat <<EOF > cert-manager-disable-scrape.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: cert-manager-disable-scrape
  namespace: tap-install
type: Opaque
stringData:
  cert-manager-disable-scrape.yml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment","metadata":{"namespace": "cert-manager"}}),expects="1+"
    ---
    spec:
      template:
        #@overlay/match-child-defaults missing_ok=True
        metadata:
          annotations:
            prometheus.io/scrape: 'false'
---
EOF

kubectl apply -f cert-manager-disable-scrape.yaml

Add the following settings to tap-values.yaml under package_overlays and update TAP:

package_overlays:
- name: cert-manager
  secrets:
  - name: cert-manager-disable-scrape
  # ...

# ... 
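If an overlay does not match anything (for example with expects="1+"), the package will fail to reconcile, so it is worth checking the PackageInstall after updating TAP. A small sketch, assuming kubectl access and that the PackageInstall has the same name as the package_overlays entry; pkgi_reconciled and pkgi_error are hypothetical helper names.

```shell
# Print the status of the ReconcileSucceeded condition. This prints "True"
# once the package has applied cleanly and is typically empty otherwise.
pkgi_reconciled() {
  kubectl get packageinstall "$1" -n tap-install \
    -o jsonpath='{.status.conditions[?(@.type=="ReconcileSucceeded")].status}'
}

# Print the error message recorded by kapp-controller, if there is one.
pkgi_error() {
  kubectl get packageinstall "$1" -n tap-install \
    -o jsonpath='{.status.usefulErrorMessage}'
}

# Usage (requires cluster access):
#   pkgi_reconciled cert-manager
#   pkgi_error cert-manager
```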

Flux

Target Profiles: run, build, iterate, full

Create the following Secret:

cat <<EOF > fluxcd-source-controller-disable-scrape.yaml
---
apiVersion: v1
kind: Secret
metadata:
  name: fluxcd-source-controller-disable-scrape
  namespace: tap-install
type: Opaque
stringData:
  fluxcd-source-controller-disable-scrape.yml: |
    #@ load("@ytt:overlay", "overlay")
    #@overlay/match by=overlay.subset({"kind":"Deployment","metadata":{"namespace": "flux-system"}}),expects="1+"
    ---
    spec:
      template:
        #@overlay/match-child-defaults missing_ok=True
        metadata:
          annotations:
            prometheus.io/scrape: 'false'
---
EOF

kubectl apply -f fluxcd-source-controller-disable-scrape.yaml

Add the following settings to tap-values.yaml under package_overlays and update TAP:

package_overlays:
- name: fluxcd-source-controller
  secrets:
  - name: fluxcd-source-controller-disable-scrape
  # ...

# ...
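If you apply all four overlays at once, the package_overlays section could look like the snippet below. This is only a sketch: it writes the list to a scratch file (package-overlays-snippet.yaml, a hypothetical name) so you can review it and merge it into tap-values.yaml by hand. Keep only the entries for packages that belong to your profile, as listed under "Target Profiles" in each section.

```shell
# Write the combined overlay list to a scratch file for review. The Secret
# names are the ones created in the sections above.
cat <<'EOF' > package-overlays-snippet.yaml
package_overlays:
- name: contour
  secrets:
  - name: contour-disable-scrape
- name: cnrs
  secrets:
  - name: cnrs-disable-scrape
- name: cert-manager
  secrets:
  - name: cert-manager-disable-scrape
- name: fluxcd-source-controller
  secrets:
  - name: fluxcd-source-controller-disable-scrape
EOF

# After merging the snippet into tap-values.yaml, update TAP, for example
# (TAP_VERSION is a placeholder for your installed version):
#   tanzu package installed update tap -p tap.tanzu.vmware.com -v "$TAP_VERSION" \
#     --values-file tap-values.yaml -n tap-install
```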

If there are other components that are targets for scraping, you can handle them using the same method.

Ideally, collecting more metrics makes the platform more "observable," but that has to be weighed against the bill. The settings above deliberately trade away some observability, so re-enable metrics collection whenever you need it.
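One possible way to re-enable scraping is to flip the annotation value in the Secret file you already have and apply it again. The helper below is a hedged sketch rather than a tested procedure: enable_scrape is a hypothetical name, and I am assuming that the changed Secret is picked up the next time the package reconciles. Removing the entry from package_overlays and updating TAP is the other option.

```shell
# Print a copy of a *-disable-scrape.yaml file with the scrape annotations
# switched back to "true". Handles both quoting styles used in this article.
enable_scrape() {
  sed -E "s#((prometheus\.io|wavefront\.com)/scrape: )[\"']false[\"']#\1\"true\"#" "$1"
}

# Usage (requires cluster access):
#   enable_scrape cnrs-disable-scrape.yaml | kubectl apply -f -
```

The original file is left untouched, so the same file can be re-applied later to turn scraping off again.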
