Structured Logs via Grafana Loki, time-series Metrics via Prometheus, distributed Traces via Grafana Tempo, and Health Checks — all production-verified with real tenant workloads on March 24, 2026.
Custom BizFirst gauges, counters, and histograms plus full ASP.NET Core auto-instrumentation. EdgeStream Kafka metrics included. All metrics labelled by tenant_id. Exposed at /metrics, scraped every 15 seconds by Prometheus.
TelemetryEnrichmentMiddleware automatically adds TenantId, ServerId, RequestId, and TraceId to every log entry, metric, and trace span. Per-tenant Grafana dashboards via $TenantId template variable. SecurityAuditLog indexed by (TenantId, CreatedAt).
Four production-ready alert rules: HighErrorRate, HighLatency, HighKafkaLag, ComponentDown. Critical alerts route to PagerDuty immediately. Warning alerts route to Slack. Full 30-minute escalation ladder configured out of the box.
docker-compose.observability.yml spins up Prometheus (9090), Loki (3100), Tempo (4317), Grafana v12.4.1 (3000), and AlertManager in one command. Full Kubernetes manifests with liveness and readiness probe configuration included.
L2 compliance tier: every RBAC decision (ALLOW/DENY/ERROR) written to SecurityAuditLog with PolicyId, PrivilegeKey, PrincipalId, ResourceNodeId, ActionType, Reason, and TenantId. GDPR-compliant — user IDs hashed, no PII in logs, configurable TTL retention.
Full-chain traces exported via OTLP gRPC to Grafana Tempo on port 4317. Spans cover the complete request path: HTTP request → service layer → database queries → Kafka → SignalR. Service dependency graph auto-built from span relationships.
Environment-aware sampling rates configured in appsettings.json: 100% development, 10% staging, 1% production. AlwaysSampleErrors policy guarantees no error trace is ever dropped, regardless of sampling rate or environment.
Query traces in Grafana using TraceQL. TraceId is automatically written to every Loki log entry by TelemetryEnrichmentMiddleware, enabling one-click navigation from any trace span to its associated log lines.
50+ built-in metrics exposed at /metrics in Prometheus text format. BizFirst custom metrics, EdgeStream Kafka metrics, and ASP.NET Core auto-instrumented HTTP server metrics all available. Scrape interval: 15 seconds.
Structured logs pushed to Grafana Loki via HTTP sink. Three log tiers operational: L0 Console/File for development, L1 Loki for all environments, L2 SecurityAuditLog for compliance. LogQL querying available immediately in Grafana Explore.
Establishes the core multi-tenancy pattern: TelemetryEnrichmentMiddleware automatically adds TenantId, ServerId, RequestId, and TraceId to every log entry and metric. Per-tenant label enforcement introduced as a platform requirement for all custom metrics.
Three lines of C#. Full-stack observability. Production-verified.