Notes on building an OpenTelemetry (OTLP)-ready LGTM observability platform with Bitnami Helm charts
This post deploys Grafana's LGTM (Loki, Grafana, Tempo, Mimir) stack to Kubernetes.
Loki (logs), Tempo (traces), and Mimir (metrics) all support the OpenTelemetry protocol (OTLP), so the LGTM stack can serve as an observability platform that receives logs, traces, and metrics uniformly over OTLP.
This time LGTM is deployed with Bitnami's Helm charts. The following charts are used:
- https://github.com/bitnami/charts/tree/main/bitnami/grafana-loki
- https://github.com/bitnami/charts/tree/main/bitnami/grafana-operator
- https://github.com/bitnami/charts/tree/main/bitnami/grafana-tempo
- https://github.com/bitnami/charts/tree/main/bitnami/grafana-mimir
Loki, Tempo, and Mimir differ slightly in how they implement OTLP ingestion, as shown below.
| Product | Supported protocols | Endpoint path | X-Scope-OrgID request header |
|---|---|---|---|
| Loki | OTLP/HTTP | /otlp/v1/logs | Not required |
| Tempo | OTLP/HTTP, OTLP/gRPC | /v1/traces | Not required |
| Mimir | OTLP/HTTP | /otlp/v1/metrics | Required |
In addition, none of them provides an authentication mechanism.
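The observability platform in this post is therefore built with the following design.
- Support OTLP/HTTP only
- Provide Basic authentication with Nginx
- Use Nginx path rewriting to unify the OTLP endpoint paths as /v1/{logs,traces,metrics}
- Also expose the Prometheus Remote Write endpoint at /v1/remote_write
- When the X-Scope-OrgID request header is not set, have Nginx set a dummy value
- Route all other paths to Grafana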
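The routing rules above can be sketched as a small shell function. This is only an illustration of the intended mapping, not the Nginx configuration itself. The function names `route_request` and `ensure_org_id` are made up for this sketch, the upstream paths are the ones from the table plus Mimir's `/api/v1/push` Remote Write path, and `anonymous` stands in for the dummy tenant value.

```shell
#!/usr/bin/env bash
# Illustrative only: mirrors the gateway's routing rules in plain shell.

# Print "<backend> <upstream path>" for a public request path.
route_request() {
  case "$1" in
    /v1/traces*)       echo "tempo $1" ;;
    /v1/logs*)         echo "loki /otlp$1" ;;
    /v1/metrics*)      echo "mimir /otlp$1" ;;
    /v1/remote_write*) echo "mimir /api/v1/push${1#/v1/remote_write}" ;;
    *)                 echo "grafana $1" ;;
  esac
}

# Keep the caller's X-Scope-OrgID, or fall back to a dummy tenant when empty.
ensure_org_id() {
  echo "${1:-anonymous}"
}

route_request /v1/logs
route_request /v1/remote_write
route_request /dashboards
ensure_org_id ""
ensure_org_id "team-a"
```

Keeping the public paths at `/v1/...` means clients never need to know which backend uses the `/otlp` prefix.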
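The resulting architecture is shown in the following diagram.
This walkthrough uses Kind on OrbStack with MetalLB already installed. See this article for how to set it up. The same approach should work on other Kubernetes clusters.
The state of the K8s node before building the observability platform is as follows.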
$ kubectl describe node
Name: kind-control-plane
Roles: control-plane
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
kubernetes.io/arch=arm64
kubernetes.io/hostname=kind-control-plane
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 20 Jan 2025 17:59:08 +0900
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: kind-control-plane
AcquireTime: <unset>
RenewTime: Wed, 22 Jan 2025 23:25:02 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Wed, 22 Jan 2025 23:23:35 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Wed, 22 Jan 2025 23:23:35 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Wed, 22 Jan 2025 23:23:35 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Wed, 22 Jan 2025 23:23:35 +0900 Mon, 20 Jan 2025 17:59:29 +0900 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.107.2
Hostname: kind-control-plane
Capacity:
cpu: 16
ephemeral-storage: 1690465072Ki
memory: 98754248Ki
pods: 110
Allocatable:
cpu: 16
ephemeral-storage: 1690465072Ki
memory: 98754248Ki
pods: 110
System Info:
Machine ID: 2ed21c3f7b25469ebf6153009f257c5e
System UUID: 2ed21c3f7b25469ebf6153009f257c5e
Boot ID: 0863cdb6-1991-46e5-b9eb-c2de57e10d60
Kernel Version: 6.12.9-orbstack-00297-gaa9b46293ea3
OS Image: Debian GNU/Linux 12 (bookworm)
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.7.24
Kubelet Version: v1.32.0
Kube-Proxy Version: v1.32.0
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
ProviderID: kind://docker/kind/kind-control-plane
Non-terminated Pods: (11 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system coredns-668d6bf9bc-jz8vz 100m (0%) 0 (0%) 70Mi (0%) 170Mi (0%) 2d5h
kube-system coredns-668d6bf9bc-m9p5v 100m (0%) 0 (0%) 70Mi (0%) 170Mi (0%) 2d5h
kube-system etcd-kind-control-plane 100m (0%) 0 (0%) 100Mi (0%) 0 (0%) 2d5h
kube-system kindnet-kkjsb 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 2d5h
kube-system kube-apiserver-kind-control-plane 250m (1%) 0 (0%) 0 (0%) 0 (0%) 2d5h
kube-system kube-controller-manager-kind-control-plane 200m (1%) 0 (0%) 0 (0%) 0 (0%) 2d5h
kube-system kube-proxy-gcgn8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
kube-system kube-scheduler-kind-control-plane 100m (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
local-path-storage local-path-provisioner-58cc7856b6-lbnp8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
metallb-system controller-bb5f47665-fz9pt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
metallb-system speaker-ktwbc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d5h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 950m (5%) 100m (0%)
memory 290Mi (0%) 390Mi (0%)
ephemeral-storage 0 (0%) 0 (0%)
Events: <none>
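The chart versions installed below are pinned to the ones that were verified.
Installing Loki
Loki comes first.
Create the Helm values.yaml as follows.
Collecting K8s Pod logs is not needed, so promtail.enabled is set to false.
The gateway (Nginx) is provided separately, so gateway.enabled is also set to false.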
cat <<'EOF' > loki-helm-values.yaml
---
promtail:
  enabled: false
gateway:
  enabled: false
ingester:
  persistence:
    size: 8Gi
    storageClass: ""
---
EOF
Install Loki with the following command.
helm upgrade --install --create-namespace \
-n loki \
loki \
oci://registry-1.docker.io/bitnamicharts/grafana-loki \
-f loki-helm-values.yaml \
--version 4.7.2 \
--wait
Once the installation completes, the Pods, Services, and PersistentVolumeClaims look like this.
$ kubectl get pod,svc,pvc -n loki
NAME READY STATUS RESTARTS AGE
pod/loki-grafana-loki-compactor-6f6d9559f7-9qrtz 1/1 Running 0 5m11s
pod/loki-grafana-loki-distributor-57d779f6bb-b4zmp 1/1 Running 0 5m11s
pod/loki-grafana-loki-ingester-0 1/1 Running 0 5m11s
pod/loki-grafana-loki-querier-0 1/1 Running 0 5m11s
pod/loki-grafana-loki-query-frontend-674bd6c587-dftjh 1/1 Running 0 5m11s
pod/loki-memcachedchunks-0 1/1 Running 0 5m11s
pod/loki-memcachedfrontend-0 1/1 Running 0 5m11s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/loki-grafana-loki-compactor ClusterIP 10.96.208.172 <none> 3100/TCP,9095/TCP 5m11s
service/loki-grafana-loki-distributor ClusterIP 10.96.120.59 <none> 3100/TCP,9095/TCP 5m11s
service/loki-grafana-loki-gossip-ring ClusterIP None <none> 7946/TCP 5m11s
service/loki-grafana-loki-ingester ClusterIP 10.96.145.75 <none> 3100/TCP,9095/TCP 5m11s
service/loki-grafana-loki-querier ClusterIP 10.96.248.182 <none> 3100/TCP,9095/TCP 5m11s
service/loki-grafana-loki-query-frontend ClusterIP 10.96.12.177 <none> 3100/TCP,9095/TCP 5m11s
service/loki-grafana-loki-query-frontend-headless ClusterIP None <none> 3100/TCP,9095/TCP 5m11s
service/loki-memcachedchunks ClusterIP 10.96.160.73 <none> 11211/TCP 5m11s
service/loki-memcachedfrontend ClusterIP 10.96.243.212 <none> 11211/TCP 5m11s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/data-loki-grafana-loki-ingester-0 Bound pvc-7176b198-ebe9-4d2f-82a8-6f834af68136 8Gi RWO standard <unset> 5m11s
persistentvolumeclaim/data-loki-grafana-loki-querier-0 Bound pvc-f90d245d-6d1c-49db-92dc-586e0b035a94 8Gi RWO standard <unset> 5m11s
persistentvolumeclaim/loki-grafana-loki-compactor Bound pvc-e4aad685-3275-40cd-9f94-f7fa4c731df0 8Gi RWO standard <unset> 5m11s
Tempo
Next is Tempo.
Create the Helm values.yaml as follows. Only the OTLP receivers are enabled; the Jaeger receivers are turned off. The ingester ran out of memory with the default resources, so they are raised slightly.
cat <<'EOF' > tempo-helm-values.yaml
---
tempo:
  traces:
    jaeger:
      grpc: false
      thriftBinary: false
      thriftCompact: false
      thriftHttp: false
    otlp:
      http: true
      grpc: true
ingester:
  resources:
    requests:
      memory: 256Mi
    limits:
      memory: 512Mi
  persistence:
    size: 8Gi
    storageClass: ""
distributor:
  service:
    type: ClusterIP
compactor:
  persistence:
    size: 8Gi
    storageClass: ""
querier:
  persistence:
    size: 8Gi
    storageClass: ""
vulture:
  enabled: false
---
EOF
Install Tempo with the following command.
helm upgrade --install --create-namespace \
-n tempo \
tempo \
oci://registry-1.docker.io/bitnamicharts/grafana-tempo \
-f tempo-helm-values.yaml \
--version 3.8.3 \
--wait
Once the installation completes, the Pods, Services, and PersistentVolumeClaims look like this.
$ kubectl get pod,svc,pvc -n tempo
NAME READY STATUS RESTARTS AGE
pod/tempo-grafana-tempo-compactor-6dc998ff9b-xkwvj 1/1 Running 0 2m43s
pod/tempo-grafana-tempo-distributor-864dfdc486-hjbjd 1/1 Running 0 2m43s
pod/tempo-grafana-tempo-ingester-0 1/1 Running 0 2m43s
pod/tempo-grafana-tempo-metrics-generator-85769bd45-z9fp8 1/1 Running 0 2m43s
pod/tempo-grafana-tempo-querier-59ccb5b748-n8g2m 1/1 Running 0 2m43s
pod/tempo-grafana-tempo-query-frontend-65f7df794d-kpc2z 1/1 Running 0 2m43s
pod/tempo-memcached-84f684d575-w47qj 1/1 Running 0 2m43s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/tempo-grafana-tempo-compactor ClusterIP 10.96.199.145 <none> 3200/TCP,9095/TCP 2m43s
service/tempo-grafana-tempo-distributor ClusterIP 10.96.235.163 <none> 3200/TCP,9095/TCP,4318/TCP,55681/TCP,4317/TCP,55680/TCP 2m43s
service/tempo-grafana-tempo-gossip-ring ClusterIP None <none> 7946/TCP 2m43s
service/tempo-grafana-tempo-ingester ClusterIP 10.96.113.54 <none> 3200/TCP,9095/TCP 2m43s
service/tempo-grafana-tempo-metrics-generator ClusterIP 10.96.60.72 <none> 3200/TCP,9095/TCP 2m43s
service/tempo-grafana-tempo-querier ClusterIP 10.96.148.254 <none> 3200/TCP,9095/TCP 2m43s
service/tempo-grafana-tempo-query-frontend ClusterIP 10.96.166.76 <none> 3200/TCP,9095/TCP 2m43s
service/tempo-grafana-tempo-query-frontend-headless ClusterIP None <none> 3200/TCP,9095/TCP 2m43s
service/tempo-memcached ClusterIP 10.96.207.57 <none> 11211/TCP 2m43s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/data-tempo-grafana-tempo-ingester-0 Bound pvc-3ad6ac99-3ef8-431b-98f2-7984f40116fc 8Gi RWO standard <unset> 2m43s
Mimir
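Next is Mimir.
Create the Helm values.yaml as follows.
To match Grafana Cloud's behavior, the unit should be appended as a suffix to the names of metrics ingested over OTLP (for example, jvm_memory_used -> jvm_memory_used_bytes).
This requires setting limits.otel_metric_suffixes_enabled: true in the config.
The setting can go in mimir.configuration, but that value cannot be partially extended, so the default configuration is copied and then modified where customization is needed.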
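One way to obtain the default configuration to copy is to dump the chart's default values with `helm show values` and take the `mimir.configuration` block from the output. The sketch below assumes Helm 3.8 or later (for OCI chart references) and the chart version pinned in this post; the helper names and the output filename are arbitrary choices for illustration.

```shell
#!/usr/bin/env bash
# Build the OCI reference for a Bitnami chart (helper name is hypothetical).
chart_ref() {
  echo "oci://registry-1.docker.io/bitnamicharts/$1"
}

# Write a chart's default values to a file so mimir.configuration can be copied.
# Not invoked here because it needs network access to the registry.
dump_default_values() {
  helm show values "$(chart_ref "$1")" --version "$2" > "$3"
}

# Example:
#   dump_default_values grafana-mimir 1.3.2 mimir-default-values.yaml
chart_ref grafana-mimir
```

Diffing the dumped defaults against the customized block after a chart upgrade is a simple way to notice upstream changes that the copied configuration would otherwise miss.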
Here too, the gateway (Nginx) is provided separately, so gateway.enabled is set to false.
cat <<'EOF' > mimir-helm-values.yaml
---
gateway:
enabled: false
ingester:
persistence:
size: 8Gi
storageClass: ""
mimir:
#! * frontend.max_outstanding_per_tenant: 10000
#! * limits.otel_metric_suffixes_enabled: true
#! * limits.promote_otel_resource_attributes: TBD
#! See https://grafana.com/docs/mimir/latest/configure/configuration-parameters/#limits
configuration: |
usage_stats:
installation_mode: helm
activity_tracker:
filepath: {{ .Values.mimir.dataDir }}/activity.log
alertmanager_storage:
{{- if .Values.minio.enabled }}
backend: s3
s3:
access_key_id: ${MIMIR_MINIO_ACCESS_KEY_ID}
secret_access_key: ${MIMIR_MINIO_SECRET_ACCESS_KEY}
bucket_name: alertmanager
endpoint: "{{ include "grafana-mimir.minio.fullname" . }}:{{ .Values.minio.service.ports.api }}"
insecure: {{ not .Values.minio.tls.enabled }}
{{- else }}
backend: {{ .Values.alertmanager.blockStorage.backend }}
{{ .Values.alertmanager.blockStorage.backend }}:
{{- include "common.tplvalues.render" (dict "value" .Values.alertmanager.blockStorage.config "context" $) | nindent 4 }}
{{- end }}
# This configures how the store-gateway synchronizes blocks stored in the bucket. It uses Minio by default for getting started (configured via flags) but this should be changed for production deployments.
blocks_storage:
bucket_store:
sync_dir: {{ .Values.mimir.dataDir }}/tsdb-sync
{{- if .Values.memcachedchunks.enabled }}
chunks_cache:
backend: memcached
memcached:
addresses: {{ include "grafana-mimir.memcached-chunks.host" . }}
timeout: 450ms
{{- end }}
{{- if .Values.memcachedindex.enabled }}
index_cache:
backend: memcached
memcached:
addresses: {{ include "grafana-mimir.memcached-index.host" . }}
timeout: 450ms
{{- end }}
{{- if .Values.memcachedmetadata.enabled }}
metadata_cache:
backend: memcached
memcached:
addresses: {{ include "grafana-mimir.memcached-metadata.host" . }}
timeout: 450ms
{{- end }}
{{- if .Values.minio.enabled }}
backend: s3
s3:
access_key_id: ${MIMIR_MINIO_ACCESS_KEY_ID}
secret_access_key: ${MIMIR_MINIO_SECRET_ACCESS_KEY}
bucket_name: mimir
endpoint: "{{ include "grafana-mimir.minio.fullname" . }}:{{ .Values.minio.service.ports.api }}"
insecure: {{ not .Values.minio.tls.enabled }}
{{- else }}
backend: {{ .Values.mimir.blockStorage.backend }}
{{ .Values.mimir.blockStorage.backend }}:
{{- include "common.tplvalues.render" (dict "value" .Values.mimir.blockStorage.config "context" $) | nindent 4 }}
{{- end }}
tsdb:
dir: {{ .Values.mimir.dataDir }}/tsdb
ingester:
compaction_interval: 30m
deletion_delay: 2h
max_closing_blocks_concurrency: 2
max_opening_blocks_concurrency: 4
symbols_flushers_concurrency: 4
data_dir: {{ .Values.mimir.dataDir }}/ingester
sharding_ring:
wait_stability_min_duration: 1m
compactor:
data_dir: {{ .Values.mimir.dataDir }}/compactor
frontend:
parallelize_shardable_queries: true
max_outstanding_per_tenant: 10000
{{- if .Values.memcachedfrontend.enabled }}
results_cache:
backend: memcached
memcached:
timeout: 500ms
addresses: {{ include "grafana-mimir.memcached-frontend.host" . }}
cache_results: true
{{- end }}
{{- if .Values.queryScheduler.enabled }}
scheduler_address: {{ template "grafana-mimir.query-scheduler.fullname" . }}-headless.{{ .Release.Namespace }}.svc:{{ .Values.queryScheduler.service.ports.grpc }}
{{- end }}
frontend_worker:
grpc_client_config:
max_send_msg_size: 419430400 # 400MiB
{{- if .Values.queryScheduler.enabled }}
scheduler_address: {{ template "grafana-mimir.query-scheduler.fullname" . }}-headless.{{ .Release.Namespace }}.svc:{{ .Values.queryScheduler.service.ports.grpc }}
{{- else }}
frontend_address: {{ template "grafana-mimir.query-frontend.fullname" . }}-headless.{{ .Release.Namespace }}.svc:{{ .Values.queryFrontend.service.ports.grpc }}
{{- end }}
ingester:
ring:
final_sleep: 0s
num_tokens: 512
tokens_file_path: {{ .Values.mimir.dataDir }}/tokens
unregister_on_shutdown: false
ingester_client:
grpc_client_config:
max_recv_msg_size: 104857600
max_send_msg_size: 104857600
limits:
# Limit queries to 500 days. You can override this on a per-tenant basis.
max_total_query_length: 12000h
# Adjust max query parallelism to 16x sharding, without sharding we can run 15d queries fully in parallel.
# With sharding we can further shard each day another 16 times. 15 days * 16 shards = 240 subqueries.
max_query_parallelism: 240
# Avoid caching results newer than 10m because some samples can be delayed
# This prevents caching incomplete results
max_cache_freshness: 10m
# additional config
otel_metric_suffixes_enabled: true
memberlist:
abort_if_cluster_join_fails: false
compression_enabled: false
advertise_port: {{ .Values.mimir.containerPorts.gossipRing }}
bind_port: {{ .Values.mimir.containerPorts.gossipRing }}
join_members:
- dns+{{ include "grafana-mimir.gossip-ring.fullname" . }}.{{ .Release.Namespace }}.svc.{{ .Values.clusterDomain }}:{{ .Values.mimir.gossipRing.service.ports.http }}
querier:
# With query sharding we run more but smaller queries. We must strike a balance
# which allows us to process more sharded queries in parallel when requested, but not overload
# queriers during non-sharded queries.
max_concurrent: 16
query_scheduler:
# Increase from default of 100 to account for queries created by query sharding
max_outstanding_requests_per_tenant: 800
server:
grpc_server_max_concurrent_streams: 1000
grpc_server_max_connection_age: 2m
grpc_server_max_connection_age_grace: 5m
grpc_server_max_connection_idle: 1m
http_listen_port: {{ .Values.mimir.containerPorts.http }}
grpc_listen_port: {{ .Values.mimir.containerPorts.grpc }}
api:
alertmanager_http_prefix: {{ .Values.mimir.httpPrefix.alertmanager }}
prometheus_http_prefix: {{ .Values.mimir.httpPrefix.prometheus }}
store_gateway:
sharding_ring:
wait_stability_min_duration: 1m
tokens_file_path: {{ .Values.mimir.dataDir }}/tokens
{{- if .Values.ruler.enabled }}
ruler:
alertmanager_url: dnssrvnoa+http://_http-metrics._tcp.{{ include "grafana-mimir.alertmanager.fullname" . }}-headless.{{ .Release.Namespace }}.svc.{{ .Values.clusterDomain }}/alertmanager
enable_api: true
rule_path: {{ .Values.mimir.dataDir }}/ruler
ruler_storage:
{{- if .Values.minio.enabled }}
backend: s3
s3:
access_key_id: ${MIMIR_MINIO_ACCESS_KEY_ID}
secret_access_key: ${MIMIR_MINIO_SECRET_ACCESS_KEY}
bucket_name: ruler
endpoint: "{{ include "grafana-mimir.minio.fullname" . }}:{{ .Values.minio.service.ports.api }}"
insecure: {{ not .Values.minio.tls.enabled }}
{{- else }}
backend: {{ .Values.ruler.blockStorage.backend }}
{{ .Values.ruler.blockStorage.backend }}:
{{- include "common.tplvalues.render" (dict "value" .Values.ruler.blockStorage.config "context" $) | nindent 4 }}
{{- end }}
{{- end }}
{{- if .Values.alertmanager.enabled }}
alertmanager:
data_dir: {{ .Values.mimir.dataDir }}/alert-manager
enable_api: true
external_url: {{ .Values.mimir.httpPrefix.alertmanager }}
{{- if .Values.minio.enabled }}
alertmanager_storage:
backend: s3
s3:
access_key_id: ${MIMIR_MINIO_ACCESS_KEY_ID}
secret_access_key: ${MIMIR_MINIO_SECRET_ACCESS_KEY}
bucket_name: ruler
endpoint: "{{ include "grafana-mimir.minio.fullname" . }}:{{ .Values.minio.service.ports.api }}"
insecure: {{ not .Values.minio.tls.enabled }}
{{- end }}
{{- end }}
---
EOF
Install Mimir with the following command.
helm upgrade --install --create-namespace \
-n mimir \
mimir \
oci://registry-1.docker.io/bitnamicharts/grafana-mimir \
-f mimir-helm-values.yaml \
--version 1.3.2 \
--wait
Once the installation completes, the Pods, Services, and PersistentVolumeClaims look like this.
$ kubectl get pod,svc,pvc -n mimir
NAME READY STATUS RESTARTS AGE
pod/mimir-grafana-mimir-compactor-0 1/1 Running 0 2m43s
pod/mimir-grafana-mimir-distributor-54847f8855-cqbhf 1/1 Running 0 2m44s
pod/mimir-grafana-mimir-ingester-0 1/1 Running 0 2m44s
pod/mimir-grafana-mimir-ingester-1 1/1 Running 0 106s
pod/mimir-grafana-mimir-querier-6ccd5959c7-j2tfj 1/1 Running 0 2m44s
pod/mimir-grafana-mimir-query-frontend-7b64dc7b56-9rfpc 1/1 Running 0 2m44s
pod/mimir-grafana-mimir-store-gateway-0 1/1 Running 0 2m44s
pod/mimir-memcachedchunks-0 1/1 Running 0 2m44s
pod/mimir-memcachedfrontend-0 1/1 Running 0 2m44s
pod/mimir-memcachedindex-0 1/1 Running 0 2m44s
pod/mimir-memcachedmetadata-0 1/1 Running 0 2m44s
pod/mimir-minio-d77856fcf-t75gh 1/1 Running 0 2m44s
pod/mimir-minio-provisioning-2gpx6 0/1 Completed 0 50s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/mimir-grafana-mimir-compactor ClusterIP 10.96.242.3 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-distributor ClusterIP 10.96.108.117 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-distributor-headless ClusterIP None <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-gossip-ring ClusterIP None <none> 7946/TCP 2m44s
service/mimir-grafana-mimir-ingester ClusterIP 10.96.248.115 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-ingester-headless ClusterIP None <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-querier ClusterIP 10.96.149.67 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-querier-headless ClusterIP None <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-query-frontend ClusterIP 10.96.250.120 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-query-frontend-headless ClusterIP None <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-store-gateway ClusterIP 10.96.119.22 <none> 8080/TCP,9095/TCP 2m44s
service/mimir-grafana-mimir-store-gateway-headless ClusterIP None <none> 8080/TCP,9095/TCP 2m44s
service/mimir-memcachedchunks ClusterIP 10.96.5.10 <none> 11211/TCP 2m44s
service/mimir-memcachedfrontend ClusterIP 10.96.191.236 <none> 11211/TCP 2m44s
service/mimir-memcachedindex ClusterIP 10.96.69.198 <none> 11211/TCP 2m44s
service/mimir-memcachedmetadata ClusterIP 10.96.118.90 <none> 11211/TCP 2m44s
service/mimir-minio ClusterIP 10.96.164.45 <none> 80/TCP,9001/TCP 2m44s
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS VOLUMEATTRIBUTESCLASS AGE
persistentvolumeclaim/data-mimir-grafana-mimir-compactor-0 Bound pvc-f21ef1f6-aeb3-41d6-b959-764d4e8f6612 8Gi RWO standard <unset> 2m44s
persistentvolumeclaim/data-mimir-grafana-mimir-ingester-0 Bound pvc-932c10dd-694d-4e31-9811-91cd5c0cc3ac 8Gi RWO standard <unset> 2m44s
persistentvolumeclaim/data-mimir-grafana-mimir-ingester-1 Bound pvc-343f5844-c701-4674-82da-cbf5b0bc5533 8Gi RWO standard <unset> 106s
persistentvolumeclaim/data-mimir-grafana-mimir-store-gateway-0 Bound pvc-eb21d87b-b6fc-4cb0-9505-8b555ca703f8 8Gi RWO standard <unset> 2m44s
persistentvolumeclaim/mimir-minio Bound pvc-da55929e-f976-4406-a89f-ba18b67336cf 8Gi RWO standard <unset> 2m44s
Grafana
Next is Grafana. Grafana is installed through Grafana Operator, which can install Grafana itself at the same time.
Create the Helm values.yaml as follows. Change admin_password to a value of your choice.
The Tempo, Loki, and Mimir (Prometheus) data sources and a dashboard for Java applications are registered along with it.
cat <<'EOF' > grafana-helm-values.yaml
---
grafana:
  service:
    type: ClusterIP
  config:
    security:
      admin_user: admin
      admin_password: changeme
extraDeploy:
- apiVersion: grafana.integreatly.org/v1beta1
  kind: GrafanaDatasource
  metadata:
    name: tempo
    namespace: grafana
  spec:
    instanceSelector:
      matchLabels:
        app.kubernetes.io/instance: grafana
    datasource:
      name: tempo
      type: tempo
      access: proxy
      basicAuth: false
      url: http://tempo-grafana-tempo-query-frontend.tempo.svc.cluster.local:3200
      isDefault: false
      jsonData:
        tlsSkipVerify: false
        timeInterval: "5s"
        tracesToLogsV2:
          datasourceUid: loki
          spanStartTimeShift: "-3h"
          spanEndTimeShift: "3h"
          filterByTraceID: true
          filterBySpanID: false
- apiVersion: grafana.integreatly.org/v1beta1
  kind: GrafanaDatasource
  metadata:
    name: loki
    namespace: grafana
  spec:
    instanceSelector:
      matchLabels:
        app.kubernetes.io/instance: grafana
    datasource:
      name: loki
      type: loki
      access: proxy
      basicAuth: false
      url: http://loki-grafana-loki-query-frontend.loki.svc.cluster.local:3100
      isDefault: false
      jsonData:
        tlsSkipVerify: false
        timeInterval: "5s"
        maxLines: 50
        derivedFields:
        - datasourceUid: tempo
          matcherType: label
          matcherRegex: trace_id
          name: traceId
          url: ${__value.raw}
- apiVersion: grafana.integreatly.org/v1beta1
  kind: GrafanaDatasource
  metadata:
    name: mimir
    namespace: grafana
  spec:
    instanceSelector:
      matchLabels:
        app.kubernetes.io/instance: grafana
    datasource:
      name: mimir
      type: prometheus
      access: proxy
      basicAuth: false
      url: http://mimir-grafana-mimir-query-frontend.mimir.svc.cluster.local:8080/prometheus
      isDefault: false
      jsonData:
        exemplarTraceIdDestinations: [ ]
        httpHeaderName1: X-Scope-OrgID
        httpMethod: POST
        defaultEditor: code
      secureJsonData:
        httpHeaderValue1: anonymous
- apiVersion: grafana.integreatly.org/v1beta1
  kind: GrafanaDashboard
  metadata:
    name: grafanadashboard-otel-java-dashboard
    namespace: grafana
  spec:
    instanceSelector:
      matchLabels:
        app.kubernetes.io/instance: grafana
    resyncPeriod: 60m
    url: https://raw.githubusercontent.com/making/k8s-gitops/refs/heads/main/peach/platform/grafana/dashboard/otel-java-dashboard.json
---
EOF
Install Grafana with the following command.
helm upgrade --install --create-namespace \
-n grafana \
grafana \
oci://registry-1.docker.io/bitnamicharts/grafana-operator \
-f grafana-helm-values.yaml \
--version 4.8.1 \
--wait
Once the installation completes, the Pods and Services look like this. All Grafana configuration is done through Custom Resources, so no persistent volume is attached to Grafana.
$ kubectl get pod,svc -n grafana
NAME READY STATUS RESTARTS AGE
pod/grafana-grafana-operator-75d4b8b8b6-5c6h8 1/1 Running 0 113s
pod/grafana-grafana-operator-grafana-deployment-84bdbdd564-sjn26 1/1 Running 0 94s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/grafana-grafana-operator-grafana-service ClusterIP 10.96.111.125 <none> 3000/TCP 94s
The Custom Resources that were created look like this.
$ kubectl get grafana,grafanadatasource,grafanadashboard -n grafana
NAME VERSION STAGE STAGE STATUS AGE
grafana.grafana.integreatly.org/grafana-grafana-operator-grafana 11.3.0-pre complete success 2m15s
NAME NO MATCHING INSTANCES LAST RESYNC AGE
grafanadatasource.grafana.integreatly.org/loki 76s 2m15s
grafanadatasource.grafana.integreatly.org/mimir 76s 2m15s
grafanadatasource.grafana.integreatly.org/tempo 76s 2m15s
NAME NO MATCHING INSTANCES LAST RESYNC AGE
grafanadashboard.grafana.integreatly.org/grafanadashboard-otel-java-dashboard 82s 2m15s
Nginx
Finally, install Nginx as the gateway. To keep things simple, Nginx is exposed with type=LoadBalancer, TLS is not configured, and access is over plain HTTP.
cat <<'EOF' > nginx-helm-values.yaml
---
service:
  type: LoadBalancer
serverBlock: |
  map $http_x_scope_orgid $ensured_x_scope_orgid {
    default $http_x_scope_orgid;
    "" "anonymous";
  }
  server {
    listen 8080;
    server_name _;
    location /v1/traces {
      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/auth/.htpasswd;
      proxy_pass http://tempo-grafana-tempo-distributor.tempo.svc.cluster.local:4318;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    }
    location /v1/logs {
      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/auth/.htpasswd;
      # Path mapping: /v1/logs -> /otlp/v1/logs
      rewrite ^/v1/logs(.*)$ /otlp/v1/logs$1 break;
      proxy_pass http://loki-grafana-loki-distributor.loki.svc.cluster.local:3100;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    }
    location /v1/metrics {
      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/auth/.htpasswd;
      # Path mapping: /v1/metrics -> /otlp/v1/metrics
      rewrite ^/v1/metrics(.*)$ /otlp/v1/metrics$1 break;
      proxy_pass http://mimir-grafana-mimir-distributor.mimir:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Scope-OrgID $ensured_x_scope_orgid;
    }
    location /v1/remote_write {
      auth_basic "Restricted Access";
      auth_basic_user_file /etc/nginx/auth/.htpasswd;
      # Path mapping: /v1/remote_write -> /api/v1/push
      rewrite ^/v1/remote_write(.*)$ /api/v1/push$1 break;
      proxy_pass http://mimir-grafana-mimir-distributor.mimir:8080;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_set_header X-Scope-OrgID $ensured_x_scope_orgid;
    }
    location / {
      proxy_pass http://grafana-grafana-operator-grafana-service.grafana.svc.cluster.local:3000;
      proxy_set_header Host $host;
      proxy_set_header X-Real-IP $remote_addr;
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header X-Forwarded-Proto $scheme;
    }
  }
extraVolumeMounts:
- name: auth-volume
  mountPath: /etc/nginx/auth
  readOnly: true
extraVolumes:
- name: auth-volume
  secret:
    secretName: nginx-basic-auth
---
EOF
Generate the user credentials for Basic authentication beforehand.
echo 'password' | htpasswd -i -c .htpasswd myuser
kubectl create ns nginx
kubectl create secret -n nginx generic nginx-basic-auth --from-file=.htpasswd
Install Nginx with the following command.
helm upgrade --install --create-namespace \
-n nginx \
nginx \
oci://registry-1.docker.io/bitnamicharts/nginx \
-f nginx-helm-values.yaml \
--version 18.3.5 \
--wait
Once the installation completes, the Pods and Services look like this.
$ kubectl get pod,svc -n nginx
NAME READY STATUS RESTARTS AGE
pod/nginx-6d46c47b5c-82xqz 1/1 Running 0 16s
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/nginx LoadBalancer 10.96.251.24 192.168.107.200 80:32282/TCP,443:30134/TCP 16s
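Open the Nginx External IP (192.168.107.200 in this example) in a browser. You are redirected to the Grafana login page.
Unless you changed the settings, you can log in with the username / password admin / changeme.
A dashboard for Java applications is set up from the start.
The Loki, Tempo, and Mimir data sources are configured as well.
Next, send empty requests to the OTLP endpoints for traces, logs, and metrics, using the username and password for Nginx Basic authentication. The response format differs for each endpoint, but any 20X response means it works.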
$ curl -u myuser:password http://192.168.107.200/v1/traces -H "Content-Type: application/json" -d "{}" -v
> POST /v1/traces HTTP/1.1
> Host: 192.168.107.200
> Authorization: Basic bXl1c2VyOnBhc3N3b3Jk
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 2
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Wed, 22 Jan 2025 15:01:29 GMT
< Content-Type: application/json
< Content-Length: 21
< Connection: keep-alive
< X-Frame-Options: SAMEORIGIN
<
{"partialSuccess":{}}
$ curl -u myuser:password http://192.168.107.200/v1/logs -H "Content-Type: application/json" -d "{}" -v
> POST /v1/logs HTTP/1.1
> Host: 192.168.107.200
> Authorization: Basic bXl1c2VyOnBhc3N3b3Jk
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 2
>
< HTTP/1.1 204 No Content
< Server: nginx
< Date: Wed, 22 Jan 2025 15:01:55 GMT
< Connection: keep-alive
< X-Frame-Options: SAMEORIGIN
<
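The empty body only proves that the route and the authentication work. As a rough sketch of a request that carries an actual log record, the snippet below builds a minimal OTLP/JSON payload. The service name `demo` and the message text are placeholders, and the function names are made up for illustration. The curl call is wrapped in a function so that nothing is sent until it is invoked.

```shell
#!/usr/bin/env bash
# Print a minimal OTLP/JSON logs payload with one record stamped "now".
# "demo" and the message text are placeholder values.
build_log_payload() {
  now_ns="$(date +%s)000000000"
  cat <<JSON
{"resourceLogs":[{"resource":{"attributes":[{"key":"service.name","value":{"stringValue":"demo"}}]},"scopeLogs":[{"logRecords":[{"timeUnixNano":"${now_ns}","severityText":"INFO","body":{"stringValue":"hello from curl"}}]}]}]}
JSON
}

# POST the payload through the gateway (IP and credentials as used in this post).
send_log() {
  build_log_payload | curl -u myuser:password http://192.168.107.200/v1/logs \
    -H "Content-Type: application/json" --data-binary @- -v
}

build_log_payload
```

After calling `send_log`, the record should be searchable from the Loki data source in Grafana, assuming the gateway IP and credentials match your environment.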
$ curl -u myuser:password http://192.168.107.200/v1/metrics -H "Content-Type: application/json" -d "{}" -v
> POST /v1/metrics HTTP/1.1
> Host: 192.168.107.200
> Authorization: Basic bXl1c2VyOnBhc3N3b3Jk
> User-Agent: curl/8.7.1
> Accept: */*
> Content-Type: application/json
> Content-Length: 2
>
< HTTP/1.1 200 OK
< Server: nginx
< Date: Wed, 22 Jan 2025 15:02:37 GMT
< Content-Length: 0
< Connection: keep-alive
< X-Frame-Options: SAMEORIGIN
<
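Because the gateway exposes the standard `/v1/{traces,metrics,logs}` paths, an application instrumented with an OpenTelemetry SDK can usually be pointed at it with the standard exporter environment variables alone. The following is a sketch, assuming the gateway IP and credentials used in this post. Some SDKs expect header values to be percent-encoded (for example `Basic%20...` instead of `Basic ...`), so check the documentation of the SDK in use.

```shell
#!/usr/bin/env bash
# Base64-encode "user:password" for the HTTP Basic Authorization header.
basic_token="$(printf '%s' 'myuser:password' | base64)"

# Base URL only; OTLP/HTTP exporters append /v1/traces, /v1/metrics, /v1/logs.
export OTEL_EXPORTER_OTLP_ENDPOINT="http://192.168.107.200"
export OTEL_EXPORTER_OTLP_PROTOCOL="http/protobuf"
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic ${basic_token}"

echo "${OTEL_EXPORTER_OTLP_HEADERS}"
```

The encoded token is the same value that curl sends in the `Authorization` header in the requests above.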
The Helm charts installed so far are as follows.
$ helm ls -A
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
grafana grafana 1 2025-01-23 00:57:38.657919 +0900 JST deployed grafana-operator-4.8.1 5.15.1
loki loki 1 2025-01-22 23:26:09.945145 +0900 JST deployed grafana-loki-4.7.2 3.3.2
mimir mimir 1 2025-01-22 23:41:15.097167 +0900 JST deployed grafana-mimir-1.3.2 2.15.0
nginx nginx 1 2025-01-22 23:56:03.284183 +0900 JST deployed nginx-18.3.5 1.27.3
tempo tempo 1 2025-01-22 23:31:47.725091 +0900 JST deployed grafana-tempo-3.8.3 2.7.0
The state of the K8s node after building the observability platform is as follows.
$ kubectl describe node
Name: kind-control-plane
Roles: control-plane
Labels: beta.kubernetes.io/arch=arm64
beta.kubernetes.io/os=linux
kubernetes.io/arch=arm64
kubernetes.io/hostname=kind-control-plane
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=
Annotations: kubeadm.alpha.kubernetes.io/cri-socket: unix:///run/containerd/containerd.sock
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Mon, 20 Jan 2025 17:59:08 +0900
Taints: <none>
Unschedulable: false
Lease:
HolderIdentity: kind-control-plane
AcquireTime: <unset>
RenewTime: Thu, 23 Jan 2025 10:34:30 +0900
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Thu, 23 Jan 2025 10:30:21 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Thu, 23 Jan 2025 10:30:21 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Thu, 23 Jan 2025 10:30:21 +0900 Mon, 20 Jan 2025 17:59:07 +0900 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Thu, 23 Jan 2025 10:30:21 +0900 Mon, 20 Jan 2025 17:59:29 +0900 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 192.168.107.2
Hostname: kind-control-plane
Capacity:
cpu: 16
ephemeral-storage: 1690465072Ki
memory: 98754248Ki
pods: 110
Allocatable:
cpu: 16
ephemeral-storage: 1690465072Ki
memory: 98754248Ki
pods: 110
System Info:
Machine ID: 2ed21c3f7b25469ebf6153009f257c5e
System UUID: 2ed21c3f7b25469ebf6153009f257c5e
Boot ID: 0863cdb6-1991-46e5-b9eb-c2de57e10d60
Kernel Version: 6.12.9-orbstack-00297-gaa9b46293ea3
OS Image: Debian GNU/Linux 12 (bookworm)
Operating System: linux
Architecture: arm64
Container Runtime Version: containerd://1.7.24
Kubelet Version: v1.32.0
Kube-Proxy Version: v1.32.0
PodCIDR: 10.244.0.0/24
PodCIDRs: 10.244.0.0/24
ProviderID: kind://docker/kind/kind-control-plane
Non-terminated Pods: (40 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits Age
--------- ---- ------------ ---------- --------------- ------------- ---
grafana grafana-grafana-operator-75d4b8b8b6-5c6h8 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
grafana grafana-grafana-operator-grafana-deployment-84bdbdd564-sjn26 100m (0%) 0 (0%) 256Mi (0%) 1Gi (1%) 10h
kube-system coredns-668d6bf9bc-jz8vz 100m (0%) 0 (0%) 70Mi (0%) 170Mi (0%) 2d16h
kube-system coredns-668d6bf9bc-m9p5v 100m (0%) 0 (0%) 70Mi (0%) 170Mi (0%) 2d16h
kube-system etcd-kind-control-plane 100m (0%) 0 (0%) 100Mi (0%) 0 (0%) 2d16h
kube-system kindnet-kkjsb 100m (0%) 100m (0%) 50Mi (0%) 50Mi (0%) 2d16h
kube-system kube-apiserver-kind-control-plane 250m (1%) 0 (0%) 0 (0%) 0 (0%) 2d16h
kube-system kube-controller-manager-kind-control-plane 200m (1%) 0 (0%) 0 (0%) 0 (0%) 2d16h
kube-system kube-proxy-gcgn8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d16h
kube-system kube-scheduler-kind-control-plane 100m (0%) 0 (0%) 0 (0%) 0 (0%) 2d16h
local-path-storage local-path-provisioner-58cc7856b6-lbnp8 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d16h
loki loki-grafana-loki-compactor-6f6d9559f7-9qrtz 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
loki loki-grafana-loki-distributor-57d779f6bb-b4zmp 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
loki loki-grafana-loki-ingester-0 250m (1%) 375m (2%) 256Mi (0%) 384Mi (0%) 11h
loki loki-grafana-loki-querier-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
loki loki-grafana-loki-query-frontend-674bd6c587-dftjh 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
loki loki-memcachedchunks-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
loki loki-memcachedfrontend-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
metallb-system controller-bb5f47665-fz9pt 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d16h
metallb-system speaker-ktwbc 0 (0%) 0 (0%) 0 (0%) 0 (0%) 2d16h
mimir mimir-grafana-mimir-compactor-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-distributor-54847f8855-cqbhf 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-ingester-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-ingester-1 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-querier-6ccd5959c7-j2tfj 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-query-frontend-7b64dc7b56-9rfpc 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-grafana-mimir-store-gateway-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-memcachedchunks-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-memcachedfrontend-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-memcachedindex-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-memcachedmetadata-0 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
mimir mimir-minio-d77856fcf-t75gh 250m (1%) 375m (2%) 256Mi (0%) 384Mi (0%) 10h
nginx nginx-6d46c47b5c-82xqz 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 10h
tempo tempo-grafana-tempo-compactor-6dc998ff9b-xkwvj 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
tempo tempo-grafana-tempo-distributor-864dfdc486-hjbjd 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
tempo tempo-grafana-tempo-ingester-0 0 (0%) 0 (0%) 256Mi (0%) 512Mi (0%) 11h
tempo tempo-grafana-tempo-metrics-generator-85769bd45-z9fp8 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
tempo tempo-grafana-tempo-querier-59ccb5b748-n8g2m 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
tempo tempo-grafana-tempo-query-frontend-65f7df794d-kpc2z 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
tempo tempo-memcached-84f684d575-w47qj 100m (0%) 150m (0%) 128Mi (0%) 192Mi (0%) 11h
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 4050m (25%) 4600m (28%)
memory 4514Mi (4%) 7494Mi (7%)
ephemeral-storage 1350Mi (0%) 54Gi (3%)
Events: <none>
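A total of seven Memcached instances are running... The Helm Charts can be configured to share a single external Memcached, but that makes the configuration more complicated, and the documentation recommends a separate Memcached per purpose, so I left them as they are.
Also, every component is still on the default `nano` resourcesPreset, so the resources will need to be adjusted as appropriate for real-world operation.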
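To double-check the Memcached count on your own cluster, a small sketch like the following may help. It assumes `kubectl` points at the cluster and that the pods keep `memcached` in their names, as in the listing above. The function name `count_memcached_pods` is only illustrative.

```shell
# Hypothetical helper: count pods whose name contains "memcached"
# across all namespaces (column 2 is the pod name with --all-namespaces).
count_memcached_pods() {
  kubectl get pods --all-namespaces --no-headers \
    | awk '$2 ~ /memcached/ { n++ } END { print n + 0 }'
}
```

Running `count_memcached_pods` against the pod list shown above should report the same seven instances.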
Sample Java app
Now let's send signals from a sample Java app to this Observability stack. First, build the sample app.
git clone https://github.com/making/demo-zipkin-otel
cd demo-zipkin-otel
./mvnw clean install -DskipTests
This app was originally a sample for Brave OTel (Tracer), but this time we use the OTel Java Agent so that metrics and logs are collected in addition to traces.
Download the agent.
wget https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/latest/download/opentelemetry-javaagent.jar
Tip
I tested with opentelemetry-java-instrumentation 2.12.0 (opentelemetry sdk 1.46.0).
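Because the `latest` URL moves over time, you may prefer to pin the agent to the version tested here. The following is a minimal sketch, assuming the project keeps publishing the jar under `releases/download/v<version>/`. The variable `OTEL_AGENT_VERSION` and the function `otel_agent_url` are illustrative names, not part of the original setup.

```shell
# Version used in this article; change as needed.
OTEL_AGENT_VERSION=2.12.0

# Build the download URL for a specific release tag (tags are prefixed with "v").
otel_agent_url() {
  printf 'https://github.com/open-telemetry/opentelemetry-java-instrumentation/releases/download/v%s/opentelemetry-javaagent.jar\n' "$1"
}

# Print the command instead of running it, so the sketch has no side effects.
echo "wget -O opentelemetry-javaagent.jar $(otel_agent_url "$OTEL_AGENT_VERSION")"
```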
Set the following environment variables.
cat <<'EOF' > otel-opts.sh
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://192.168.107.200:80
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic bXl1c2VyOnBhc3N3b3Jk"
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=1.0
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment.name=laptop
export OTEL_RESOURCE_DISABLED_KEYS=process.command_args,process.command_line,process.executable.path,process.pid,os.type,os.description,host.arch,container.id,k8s.replicaset.name,k8s.deployment.name
EOF
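The `Authorization` value in `OTEL_EXPORTER_OTLP_HEADERS` is simply the Base64 encoding of `user:password` for the Basic authentication that Nginx enforces. The value above decodes to `myuser:password`; if your credentials differ, a sketch like this can regenerate it. The helper name `basic_auth_header` is hypothetical.

```shell
# Build the OTEL_EXPORTER_OTLP_HEADERS value for HTTP Basic authentication.
basic_auth_header() {
  # $1 = user, $2 = password; tr strips any line wrapping added by base64
  printf 'Authorization=Basic %s\n' "$(printf '%s:%s' "$1" "$2" | base64 | tr -d '\n')"
}

basic_auth_header myuser password
# → Authorization=Basic bXl1c2VyOnBhc3N3b3Jk
```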
Start the backend.
source otel-opts.sh
OTEL_SERVICE_NAME=backend java -javaagent:./opentelemetry-javaagent.jar -jar backend/target/backend-0.0.1-SNAPSHOT.jar --management.zipkin.tracing.export.enabled=false
Start the frontend.
source otel-opts.sh
OTEL_SERVICE_NAME=frontend java -javaagent:./opentelemetry-javaagent.jar -jar frontend/target/frontend-0.0.1-SNAPSHOT.jar --management.zipkin.tracing.export.enabled=false
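With `OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf`, the exporter appends a per-signal path to `OTEL_EXPORTER_OTLP_ENDPOINT`, which is why unifying the paths to `/v1/{logs,traces,metrics}` on the Nginx side lets a single endpoint setting cover all three signals. The sketch below only illustrates that path resolution; `otlp_signal_url` is a hypothetical helper, not something the agent exposes.

```shell
# Where an OTLP/HTTP exporter posts each signal, given the base endpoint.
otlp_signal_url() {
  # $1 = base endpoint, $2 = signal (traces|metrics|logs); strip one trailing slash
  printf '%s/v1/%s\n' "${1%/}" "$2"
}

for signal in traces metrics logs; do
  otlp_signal_url "http://192.168.107.200:80" "$signal"
done
```

If signals do not show up in Grafana, these are the three URLs worth checking in the Nginx access log first.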
Send a request to the frontend.
$ curl http://localhost:8080
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<title>Hello World!</title>
</head>
<body>
<p><strong>Hello World!</strong></p>
</body>
</html>
Let's keep sending requests continuously.
while true;do curl -s http://localhost:8080 > /dev/null;sleep 3;done
Opening the OTel Java Dashboard in Grafana shows graphs of the following metrics. You can switch between the apps (backend or frontend) with `job`.
backend
frontend
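Next, go to "Explore" -> "loki". Under "Label filters", set service_name to backend and click the "Run query" button to display the backend app's logs.
Clicking a log line shows its Attributes.
A "tempo" button also appears next to the traceId link. Clicking it, or opening it in a new tab, jumps to the detail view of the corresponding trace.
Click a Span to check its Attributes.
This time a "Log for this span" button appears, and clicking it shows the corresponding logs. In this way you can move back and forth between logs and traces.
Going to "Explore" -> "tempo" shows the list of traces. From this screen you can also open a specific trace and jump from there to its logs.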
Note
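Exemplars exist for jumping from metrics to traces, but I have not verified them yet.
Let's also send signals from the legacy app I tried in an earlier article.
Change the setenv.sh settings from that article as follows and start Tomcat.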
cat <<'EOF' > $CATALINA_HOME/bin/setenv.sh
export CATALINA_OPTS="$CATALINA_OPTS -javaagent:$CATALINA_HOME/bin/opentelemetry-javaagent.jar"
export OTEL_SERVICE_NAME=legacy-app
export OTEL_TRACES_EXPORTER=otlp
export OTEL_METRICS_EXPORTER=otlp
export OTEL_LOGS_EXPORTER=otlp
export OTEL_EXPORTER_OTLP_ENDPOINT=http://192.168.107.200:80
export OTEL_EXPORTER_OTLP_HEADERS="Authorization=Basic bXl1c2VyOnBhc3N3b3Jk"
export OTEL_TRACES_SAMPLER=traceidratio
export OTEL_TRACES_SAMPLER_ARG=1.0
export OTEL_EXPORTER_OTLP_COMPRESSION=gzip
export OTEL_EXPORTER_OTLP_PROTOCOL=http/protobuf
export OTEL_RESOURCE_ATTRIBUTES=deployment.environment.name=laptop
export OTEL_RESOURCE_DISABLED_KEYS=process.command_args,process.command_line,process.executable.path,process.pid,os.type,os.description,host.arch,container.id,k8s.replicaset.name,k8s.deployment.name
EOF
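The app's logs become viewable from Loki.
You can also jump from a log to its trace.
Traces can be checked as well.
So can metrics.
Sample Node.js app
Next, let's try a Node.js app. It is a sample of Generative AI instrumentation that captures traces of requests to OpenAI together with log events containing the request and response details.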
git clone https://github.com/openzipkin-contrib/zipkin-otel
cd zipkin-otel/collector-http/src/test/resources/nodejs
export OTEL_EXPORTER_OTLP_ENDPOINT=http://192.168.107.200:80
export OTEL_INSTRUMENTATION_GENAI_CAPTURE_MESSAGE_CONTENT=true
export OPENAI_API_KEY='sk-****'  # replace with your own OpenAI API key
./run.sh
The trace,
and the logs related to it, were both captured.
Uninstall
To uninstall the platform built in this article, run the following commands.
helm uninstall -n loki loki --wait
kubectl delete pvc -n loki --all
helm uninstall -n tempo tempo --wait
kubectl delete pvc -n tempo --all
helm uninstall -n mimir mimir --wait
kubectl delete pvc -n mimir --all
kubectl delete job -n mimir --all
helm uninstall -n grafana grafana --wait
helm uninstall -n nginx nginx --wait
kubectl delete secret -n nginx nginx-basic-auth
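After the uninstall, it can be reassuring to confirm that nothing is left behind. The following is a rough sketch, assuming the five namespaces used above still exist; `list_leftovers` is a hypothetical helper name.

```shell
# Hypothetical helper: list what is still present in each namespace used above.
list_leftovers() {
  for ns in loki tempo mimir grafana nginx; do
    echo "== $ns =="
    kubectl get pods,pvc,secrets -n "$ns"
  done
}
```

Call `list_leftovers` once the commands above have finished, and delete anything that remains by hand.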