Getting Started with BizFirst Observe
BizFirst Observe is the unified observability stack for the BizFirstAi platform. It provides logs through Grafana Loki, metrics through Prometheus, distributed traces through Grafana Tempo, and health checks, all wired up with three method calls in your Program.cs.
Production-verified: BizFirst Observe entered production on March 24, 2026 running .NET 9.0 with OpenTelemetry SDK 1.11, Grafana v12.4.1, and Grafana Loki on port 3100. The configuration below reflects that verified stack.
3-Line Setup
Add the following three calls to your Program.cs to register the full observability stack:
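var builder = WebApplication.CreateBuilder(args);
// 1. Register OTEL SDK, Prometheus exporter, Loki exporter, Tempo exporter
builder.Services.RegisterService_Observability(builder.Configuration);
// 2. Register Serilog with Loki sink and enrichment
builder.Host.RegisterSerilog_Observability(builder.Configuration);
var app = builder.Build();
// 3. Register middleware: TelemetryEnrichmentMiddleware + /metrics + /health endpoints
//    (if you call app.UseAuthentication() explicitly, call it before this line so the tenant claim is available)
app.RegisterApp_Observability();
app.Run();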
appsettings.json
Add the Observability section to your appsettings.json. Adjust endpoint URLs for your environment:
{
  "Observability": {
    "Tracing": {
      "OtlpEndpoint": "http://localhost:4317",
      "SamplingRate": 1.0,
      "AlwaysSampleErrors": true
    },
    "Metrics": {
      "PrometheusEndpoint": "http://localhost:9090",
      "ScrapeIntervalSeconds": 15
    },
    "Logging": {
      "LokiEndpoint": "http://localhost:3100",
      "MinimumLevel": "Information",
      "RetentionDays": 3
    },
    "Grafana": {
      "Endpoint": "http://localhost:3000"
    }
  }
}
Production values: Set SamplingRate to 0.01 (1%) in production and 0.10 (10%) in staging. Set RetentionDays to 30 for production and 7 for staging. AlwaysSampleErrors must always be true.
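The per-environment guidance above can also be enforced at startup. The sketch below is a minimal illustration, assuming you want a fail-fast check; `RecommendedFor` and `Validate` are hypothetical helpers, not part of BizFirst Observe. In ASP.NET Core the environment-specific values themselves would normally live in `appsettings.Production.json` and `appsettings.Staging.json`, which override `appsettings.json` when `ASPNETCORE_ENVIRONMENT` matches.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical: recommended values from the "Production values" guidance.
// Returns null for other environments (e.g. Development), where no rule is stated.
static (double SamplingRate, int RetentionDays)? RecommendedFor(string environment) =>
    environment switch
    {
        "Production" => (0.01, 30),
        "Staging" => (0.10, 7),
        _ => null
    };

// Hypothetical startup check: returns every violation found (an empty list means valid).
static List<string> Validate(string environment, double samplingRate, bool alwaysSampleErrors, int retentionDays)
{
    var errors = new List<string>();
    if (samplingRate is < 0.0 or > 1.0)
        errors.Add($"SamplingRate must be between 0.0 and 1.0 (got {samplingRate}).");
    if (!alwaysSampleErrors)
        errors.Add("AlwaysSampleErrors must always be true.");

    var recommended = RecommendedFor(environment);
    if (recommended.HasValue)
    {
        var (expectedRate, expectedRetention) = recommended.Value;
        if (samplingRate != expectedRate)
            errors.Add($"{environment}: SamplingRate should be {expectedRate} (got {samplingRate}).");
        if (retentionDays != expectedRetention)
            errors.Add($"{environment}: RetentionDays should be {expectedRetention} (got {retentionDays}).");
    }
    return errors;
}

// Example: development sampling/retention values in production, with AlwaysSampleErrors switched off.
var problems = Validate("Production", samplingRate: 1.0, alwaysSampleErrors: false, retentionDays: 3);
foreach (var problem in problems)
    Console.WriteLine(problem);
```

With these inputs three violations are printed, in order: AlwaysSampleErrors, SamplingRate, and RetentionDays.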
Verify Your Installation
After starting your application, verify each component is running:
# Check Prometheus /metrics endpoint on your service
curl http://localhost:5000/metrics
# Check Prometheus is scraping (replace with your Prometheus host)
curl http://localhost:9090/api/v1/targets
# Check Grafana is running
curl http://localhost:3000/api/health
# Check Loki is running
curl http://localhost:3100/ready
# Check your service health endpoints
curl http://localhost:5000/health
curl http://localhost:5000/health/live
curl http://localhost:5000/health/ready
Logging
Serilog Registration
BizFirst Observe registers Serilog with three sinks automatically: Console/File (L0, development), Loki HTTP push (L1, all environments), and the SQL Server SecurityAuditLog table (L2, compliance events). You do not need to configure Serilog manually.
// Normally handled by RegisterSerilog_Observability; simplified reference (Console and Loki sinks only)
using Serilog;
using Serilog.Sinks.Grafana.Loki; // LokiLabel (Serilog.Sinks.Grafana.Loki package)
Log.Logger = new LoggerConfiguration()
    .Enrich.FromLogContext()
    .WriteTo.Console()
    .WriteTo.GrafanaLoki(
        "http://localhost:3100",
        labels: new[] { new LokiLabel { Key = "app", Value = "bizfirst" } })
    .CreateLogger();
Automatic Enrichment Fields
TelemetryEnrichmentMiddleware adds the following fields to every log entry automatically. These fields are also used as Loki labels:
| Field | Source | Purpose |
|---|---|---|
| TenantId | JWT claim / request header | Tenant isolation: all queries filter by this field |
| ServerId | Machine name / pod name | Identify which instance produced the log |
| RequestId | ASP.NET Core TraceIdentifier | Correlate all logs within a single HTTP request |
| TraceId | OpenTelemetry Activity TraceId | Link log entries to the corresponding distributed trace in Tempo |
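To make the sources concrete, here is a minimal sketch of how the four values could be gathered for one request, using only System.Diagnostics. `BuildEnrichment` is a hypothetical helper, not TelemetryEnrichmentMiddleware's real code, and it assumes the tenant id and request id (in ASP.NET Core, `HttpContext.TraceIdentifier`) were resolved earlier. Reading TraceId from `Activity.Current` is what lets a log line be joined to its Tempo trace.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical helper: the four enrichment fields for the current request.
static Dictionary<string, string?> BuildEnrichment(string? tenantId, string requestId) => new()
{
    ["TenantId"] = tenantId,                              // resolved from the JWT claim or header
    ["ServerId"] = Environment.MachineName,               // typically the pod name in Kubernetes
    ["RequestId"] = requestId,                            // HttpContext.TraceIdentifier in ASP.NET Core
    ["TraceId"] = Activity.Current?.TraceId.ToHexString() // null when no span is active
};

// Simulate an incoming request with an active span (W3C ids are the .NET default).
using var request = new Activity("HTTP GET /payroll").Start();
var fields = BuildEnrichment("acme", "0HMVD0KQ1:00000001");
foreach (var (key, value) in fields)
    Console.WriteLine($"{key} = {value}");
```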
LogQL Example Queries
Use these queries in Grafana Explore (Loki data source) to investigate logs:
# All errors for a specific tenant
{tenant_id="acme"} | json | level="Error"
# Fatal logs from the payroll service
{service="payroll", tenant_id="acme"} | json | level="Fatal"
# Logs for a specific trace (use the full 32-character hex TraceId shown in Tempo)
{app="bizfirst"} | json | TraceId="4bf92f3577b34da6a3ce929d0e0e4736"
# Error and fatal logs from a specific server (set the time range, e.g. Last 1 hour, in the Grafana time picker)
{server_id="prod-node-01"} | json | level=~"Error|Fatal"
# Security audit events for a tenant
{tenant_id="acme", service="auth"} | json | EventType="DENY"
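If you build per-tenant LogQL queries in code (a support tool or a generated dashboard link, for example), escape label values instead of concatenating raw input. Double-quoted LogQL strings use Go-style escapes, so an unescaped quote or backslash in a value can break the selector or change its meaning. A minimal sketch; `EscapeLogQl` and `LogQlSelector` are hypothetical helpers, and label names are assumed to be trusted constants:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper: escapes a value for use inside a double-quoted LogQL string.
static string EscapeLogQl(string value)
{
    var sb = new StringBuilder(value.Length);
    foreach (var c in value)
    {
        switch (c)
        {
            case '\\': sb.Append(@"\\"); break;  // backslash -> \\
            case '"': sb.Append("\\\""); break;  // quote -> \"
            case '\n': sb.Append(@"\n"); break;  // newline -> \n
            default: sb.Append(c); break;
        }
    }
    return sb.ToString();
}

// Hypothetical helper: builds a stream selector such as {service="payroll", tenant_id="acme"}.
static string LogQlSelector(IEnumerable<(string Label, string Value)> matchers) =>
    "{" + string.Join(", ", matchers.Select(m => $"{m.Label}=\"{EscapeLogQl(m.Value)}\"")) + "}";

var query = LogQlSelector(new[] { ("service", "payroll"), ("tenant_id", "acme") })
            + " | json | level=\"Fatal\"";
Console.WriteLine(query); // {service="payroll", tenant_id="acme"} | json | level="Fatal"
```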
Metrics
Custom Counter & Histogram in C#
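Create custom instruments with the .NET System.Diagnostics.Metrics API, either from a static Meter as below or from a Meter created through an injected IMeterFactory. The OpenTelemetry SDK exports every meter whose name is registered with its meter provider (AddMeter). All custom metrics must include a tenant_id tag: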
using System.Diagnostics;          // Stopwatch
using System.Diagnostics.Metrics;  // Meter, Counter<T>, Histogram<T>
public class PayrollService
{
    private static readonly Meter _meter = new("BizFirst.Payroll", "1.0");
    // Counter: increment each time a payroll run completes, tagged with its outcome
    private static readonly Counter<long> _payrollCounter =
        _meter.CreateCounter<long>(
            "bizfirst_payroll_processed_total",
            description: "Total number of payroll runs processed");
    // Histogram: record the duration of each payroll run
    private static readonly Histogram<double> _payrollDuration =
        _meter.CreateHistogram<double>(
            "bizfirst_payroll_duration_seconds",
            unit: "s",
            description: "Duration of payroll run processing");
    public async Task ProcessPayroll(string tenantId)
    {
        var stopwatch = Stopwatch.StartNew();
        var status = "success";
        try
        {
            // ... payroll processing logic (awaited calls go here) ...
        }
        catch
        {
            status = "failure";
            throw;
        }
        finally
        {
            stopwatch.Stop();
            // Every measurement carries the required tenant_id tag
            _payrollCounter.Add(1,
                new KeyValuePair<string, object?>("tenant_id", tenantId),
                new KeyValuePair<string, object?>("status", status));
            _payrollDuration.Record(stopwatch.Elapsed.TotalSeconds,
                new KeyValuePair<string, object?>("tenant_id", tenantId));
        }
    }
}
PromQL Example Queries
Use these queries in Grafana Explore (Prometheus data source) to build dashboards and alerts:
# Request rate across all series (requests per second, averaged over the last 5 minutes)
sum(rate(http_server_request_duration_seconds_count[5m]))
# P99 latency for a single tenant (use by (le, tenant_id) to compare tenants side by side)
histogram_quantile(0.99,
sum(rate(http_server_request_duration_seconds_bucket{tenant_id="acme"}[5m]))
by (le))
# 5xx error ratio (sum both sides: unaggregated, each 5xx series would be divided by itself)
sum(rate(http_server_request_duration_seconds_count{http_response_status_code=~"5.."}[5m]))
/
sum(rate(http_server_request_duration_seconds_count[5m]))
# Health status for Kafka (2=Healthy, 1=Degraded, 0=Unhealthy)
bizfirst_health_check_status{component="kafka"}
# Kafka consumer lag
edgestream_kafka_consumer_lag_messages{consumer_group="payroll-processor"}
# Active HTTP requests right now
http_server_active_requests
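`histogram_quantile` never sees individual requests: it estimates the quantile from cumulative bucket counts, interpolating linearly inside the bucket where the target rank falls, and returns the highest finite bucket bound when the rank lands in the +Inf bucket. The sketch below is a simplified re-implementation for illustration only (Prometheus handles further edge cases), with hypothetical bucket data; it shows why P99 and P95 accuracy depends on where the bucket boundaries sit.

```csharp
using System;
using System.Collections.Generic;

// Simplified histogram_quantile. Buckets are (upper bound "le", cumulative count),
// sorted by bound, and the last bucket is +Inf.
static double HistogramQuantile(double q, IReadOnlyList<(double Le, double Count)> buckets)
{
    if (q < 0) return double.NegativeInfinity;
    if (q > 1) return double.PositiveInfinity;
    if (buckets.Count < 2) return double.NaN;

    double total = buckets[buckets.Count - 1].Count; // the +Inf bucket counts every observation
    if (total == 0) return double.NaN;
    double rank = q * total;

    // Find the first finite bucket whose cumulative count reaches the rank.
    int b = 0;
    while (b < buckets.Count - 1 && buckets[b].Count < rank) b++;

    // The rank falls in the +Inf bucket: return the highest finite upper bound.
    if (b == buckets.Count - 1) return buckets[buckets.Count - 2].Le;

    // Linear interpolation inside bucket b; the lowest bucket is assumed to start at 0.
    double lower = b == 0 ? 0 : buckets[b - 1].Le;
    double countBelow = b == 0 ? 0 : buckets[b - 1].Count;
    return lower + (buckets[b].Le - lower) * ((rank - countBelow) / (buckets[b].Count - countBelow));
}

// Hypothetical data: 100 requests; 50 took <= 0.1 s, 90 <= 0.5 s, 99 <= 1 s.
var latencyBuckets = new List<(double Le, double Count)>
{
    (0.1, 50), (0.5, 90), (1.0, 99), (double.PositiveInfinity, 100)
};

Console.WriteLine(HistogramQuantile(0.50, latencyBuckets)); // 0.1
Console.WriteLine(HistogramQuantile(0.99, latencyBuckets)); // 1
```

Here P99 lands exactly on the 1 s bucket boundary; any real latency between 0.5 s and 1 s is reported only as an interpolated estimate within that bucket.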
Tracing
Custom Span in C#
Use ActivitySource to create custom spans. BizFirst Observe pre-registers ActivitySources for all BizFirstAi products:
using System.Diagnostics;
using OpenTelemetry.Trace; // RecordException extension method (OpenTelemetry.Api)
public class WorkflowExecutor
{
    private static readonly ActivitySource _activitySource =
        new("BizFirst.ProcessEngine");
    public async Task ExecuteWorkflow(string workflowId, string tenantId)
    {
        // StartActivity returns null when no listener samples this source, hence the ?. calls
        using var activity = _activitySource.StartActivity("workflow.execute");
        // Add span attributes; tenant_id is required
        activity?.SetTag("tenant_id", tenantId);
        activity?.SetTag("workflow.id", workflowId);
        activity?.SetTag("workflow.version", "1.0");
        try
        {
            // Hypothetical loader: replace with however your service fetches the workflow definition
            var workflow = await LoadWorkflowAsync(workflowId, tenantId);
            // Execute each node as a child span
            foreach (var node in workflow.Nodes)
            {
                using var nodeActivity = _activitySource.StartActivity(
                    "node.execute",
                    ActivityKind.Internal,
                    activity?.Context ?? default);
                nodeActivity?.SetTag("node.id", node.Id);
                nodeActivity?.SetTag("node.type", node.Type);
                nodeActivity?.SetTag("tenant_id", tenantId);
                await node.ExecuteAsync();
            }
        }
        catch (Exception ex)
        {
            activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
            activity?.RecordException(ex);
            throw;
        }
    }
}
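`ActivitySource.StartActivity` returns null unless a listener is sampling that source, which is why the example uses `activity?.`. In the running service the OpenTelemetry SDK is that listener; in a unit test, a plain `ActivityListener` is enough to check that node spans are parented to the workflow span and carry tenant_id. A minimal, self-contained sketch (the listener setup is for testing only and is not how BizFirst Observe registers sources):

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

var source = new ActivitySource("BizFirst.ProcessEngine");
var completed = new List<Activity>();

// Sample every span from BizFirst.* sources and collect each one when it stops.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name.StartsWith("BizFirst.", StringComparison.Ordinal),
    Sample = (ref ActivityCreationOptions<ActivityContext> _) => ActivitySamplingResult.AllDataAndRecorded,
    ActivityStopped = completed.Add
};
ActivitySource.AddActivityListener(listener);

using (var workflow = source.StartActivity("workflow.execute"))
{
    workflow?.SetTag("tenant_id", "acme");
    foreach (var nodeId in new[] { "n1", "n2" })
    {
        // Activity.Current is the workflow span here, so it becomes the parent.
        using var node = source.StartActivity("node.execute");
        node?.SetTag("node.id", nodeId);
        node?.SetTag("tenant_id", "acme");
    }
}

// Children stop first, so the workflow span is the last one collected.
foreach (var span in completed)
    Console.WriteLine($"{span.OperationName} trace={span.TraceId} parent={span.ParentSpanId}");
```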
TraceQL Example Queries
Use these queries in Grafana Explore (Tempo data source) to find traces:
# Slow traces (end-to-end trace duration over 2 seconds)
{traceDuration > 2s}
# Error traces
{status=error}
# Tenant-specific traces
{.tenant_id="acme"}
# Database query spans over 500ms
{span.db.system="mssql" && duration > 500ms}
# Workflow execution traces for a specific tenant
{name="workflow.execute" && .tenant_id="acme"}
# Traces from the payroll service that contain an error span
{resource.service.name="bizfirst-payroll" && status=error}
Health Checks
Registration
BizFirst Observe registers health checks for all six core dependencies automatically via RegisterService_Observability. The equivalent manual registration looks like this:
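// kafkaProducerConfig (a Confluent.Kafka ProducerConfig), redisConnectionString and
// sqlConnectionString are placeholders for your own settings
builder.Services.AddHealthChecks()
    .AddKafka(kafkaProducerConfig, name: "kafka", tags: new[] { "ready" })
    .AddRedis(redisConnectionString, name: "redis", tags: new[] { "ready" })
    .AddSqlServer(sqlConnectionString, name: "sqlserver", tags: new[] { "ready" })
    .AddUrlGroup(new Uri("http://localhost:3100/ready"), name: "loki", tags: new[] { "ready" })
    // Probe Tempo's HTTP /ready endpoint (port 3200 in Tempo's standard configuration);
    // 4317 is the OTLP gRPC port and will not answer a plain HTTP GET with a success status
    .AddUrlGroup(new Uri("http://localhost:3200/ready"), name: "tempo", tags: new[] { "ready" })
    .AddUrlGroup(new Uri("http://localhost:3000/api/health"), name: "grafana", tags: new[] { "ready" });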
Health Response Format
GET /health returns a JSON body with the overall status and a breakdown per component. The overall status is the worst status reported by any component, so a single Degraded check makes the whole report Degraded:
{
  "status": "Degraded",
  "totalDuration": "00:00:00.1234567",
  "entries": {
    "kafka": { "status": "Healthy", "duration": "00:00:00.0120000" },
    "redis": { "status": "Healthy", "duration": "00:00:00.0030000" },
    "sqlserver": { "status": "Healthy", "duration": "00:00:00.0450000" },
    "loki": { "status": "Healthy", "duration": "00:00:00.0080000" },
    "tempo": { "status": "Degraded", "duration": "00:00:00.3210000",
               "description": "Connection timeout" },
    "grafana": { "status": "Healthy", "duration": "00:00:00.0110000" }
  }
}
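By default, ASP.NET Core health endpoints return HTTP 200 for both Healthy and Degraded and 503 only for Unhealthy (unless the endpoint overrides `ResultStatusCodes`), so a degraded component is visible only in the body. A minimal sketch using System.Text.Json that parses a body in the format above and maps each entry to the 2 = Healthy, 1 = Degraded, 0 = Unhealthy encoding used by bizfirst_health_check_status; `ToStatusCodes` is a hypothetical helper:

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// Hypothetical helper: maps each /health entry to 2 (Healthy), 1 (Degraded) or 0 (Unhealthy).
static Dictionary<string, int> ToStatusCodes(string healthJson)
{
    using var doc = JsonDocument.Parse(healthJson);
    var result = new Dictionary<string, int>();
    foreach (var entry in doc.RootElement.GetProperty("entries").EnumerateObject())
    {
        string? status = entry.Value.GetProperty("status").GetString();
        result[entry.Name] = status switch
        {
            "Healthy" => 2,
            "Degraded" => 1,
            _ => 0 // Unhealthy, or any unexpected value, is treated as down
        };
    }
    return result;
}

// A trimmed /health body in the documented format.
const string sample = """
{
  "status": "Degraded",
  "entries": {
    "kafka": { "status": "Healthy", "duration": "00:00:00.0120000" },
    "tempo": { "status": "Degraded", "duration": "00:00:00.3210000", "description": "Connection timeout" }
  }
}
""";

// Print one line per component in a Prometheus-like style, for illustration only.
foreach (var (component, code) in ToStatusCodes(sample))
    Console.WriteLine($"bizfirst_health_check_status{{component=\"{component}\"}} {code}");
```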
Kubernetes Probe Configuration
Use the dedicated liveness and readiness endpoints in your Kubernetes pod spec:
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 10
periodSeconds: 15
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 5
periodSeconds: 10
failureThreshold: 2
Multi-Tenancy
TelemetryEnrichmentMiddleware is registered automatically by app.RegisterApp_Observability(). It reads the TenantId from the authenticated user's JWT claims and adds it as a property to every log entry via Serilog's log context, as a label on every Loki log, as a tag on every OpenTelemetry span, and as a label on every Prometheus metric.
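Tenant resolution can be sketched with System.Security.Claims alone. This is a minimal illustration, not the middleware's implementation: the claim type `tenant_id`, the header `X-Tenant-Id`, and the rule that an authenticated claim always wins over a header are all assumptions. Letting a client-supplied header override a token claim would allow one tenant's requests to be labeled as another's.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Claims;

// Hypothetical resolver; the claim and header names are illustrative, not confirmed settings.
static string? ResolveTenantId(ClaimsPrincipal user, IReadOnlyDictionary<string, string> headers)
{
    const string TenantClaimType = "tenant_id";
    const string TenantHeader = "X-Tenant-Id";

    // An authenticated token always wins; a header can never override it.
    if (user.Identity?.IsAuthenticated == true)
        return user.FindFirst(TenantClaimType)?.Value; // null when the token has no tenant claim

    // Unauthenticated requests fall back to the header, for telemetry labeling only.
    return headers.TryGetValue(TenantHeader, out var tenant) ? tenant : null;
}

var jwtUser = new ClaimsPrincipal(new ClaimsIdentity(
    new[] { new Claim("tenant_id", "acme") }, authenticationType: "Bearer"));
var anonymous = new ClaimsPrincipal(new ClaimsIdentity()); // no authentication type: not authenticated
var requestHeaders = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
{
    ["x-tenant-id"] = "globex"
};

Console.WriteLine(ResolveTenantId(jwtUser, requestHeaders));   // acme
Console.WriteLine(ResolveTenantId(anonymous, requestHeaders)); // globex
```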
Enforcement rule: Every custom metric you create must include a tenant_id label. The platform enforces this at code review. Metrics without tenant_id cannot be used in per-tenant dashboards and will fail tenant isolation audits.
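The code-review rule can be backed by an automated test. A minimal sketch using `MeterListener` from System.Diagnostics.Metrics: it subscribes to every meter whose name starts with `BizFirst.` (an assumed naming convention) and records the name of any instrument that reports a measurement without a non-empty tenant_id tag. In a real test you would exercise your service instead of the two sample instruments and assert that the list is empty.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

var violations = new List<string>();

using var listener = new MeterListener();
listener.InstrumentPublished = (instrument, l) =>
{
    // Only watch BizFirst meters (assumed naming convention).
    if (instrument.Meter.Name.StartsWith("BizFirst.", StringComparison.Ordinal))
        l.EnableMeasurementEvents(instrument);
};

// Shared check: passes only if a non-empty string tenant_id tag is present.
void Check(Instrument instrument, ReadOnlySpan<KeyValuePair<string, object?>> tags)
{
    foreach (var tag in tags)
        if (tag.Key == "tenant_id" && tag.Value is string { Length: > 0 }) return;
    violations.Add(instrument.Name);
}
listener.SetMeasurementEventCallback<long>((instrument, value, tags, state) => Check(instrument, tags));
listener.SetMeasurementEventCallback<double>((instrument, value, tags, state) => Check(instrument, tags));
listener.Start();

// Two sample instruments: one tagged correctly, one missing tenant_id.
using var meter = new Meter("BizFirst.Payroll", "1.0");
var processed = meter.CreateCounter<long>("bizfirst_payroll_processed_total");
var duration = meter.CreateHistogram<double>("bizfirst_payroll_duration_seconds");

processed.Add(1, new KeyValuePair<string, object?>("tenant_id", "acme"));
duration.Record(0.42); // no tenant_id: reported as a violation

// Prints: Missing tenant_id on: bizfirst_payroll_duration_seconds
Console.WriteLine(violations.Count == 0 ? "OK" : $"Missing tenant_id on: {string.Join(", ", violations)}");
```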
Alerting
BizFirst Observe ships with four pre-configured alert rules whose notifications are routed through AlertManager. They are activated when you deploy the included docker-compose.observability.yml:
| Rule | Condition | Duration | Routing |
|---|---|---|---|
| HighErrorRate | 5xx rate > 5% | 5 minutes | PagerDuty (Critical) |
| HighLatency | P95 > 1 second | 10 minutes | Slack (Warning) |
| HighKafkaLag | Lag > 10,000 messages | 5 minutes | Slack (Warning) |
| ComponentDown | bizfirst_health_check_status == 0 | 2 minutes | PagerDuty (Critical) |
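The Duration column presumably maps to each rule's `for` clause: when the condition first becomes true the alert is pending, it fires only after the condition has held on every evaluation for that long, and a single false evaluation resets it. A minimal sketch of that state machine, assuming a one-minute evaluation interval and hypothetical 5xx ratios for HighErrorRate:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical evaluator for one alert rule: returns the state after each evaluation.
static List<string> Evaluate(IReadOnlyList<bool> conditionPerEvaluation, TimeSpan interval, TimeSpan forDuration)
{
    var result = new List<string>();
    int? activeSince = null; // index of the first evaluation in the current true streak
    for (int i = 0; i < conditionPerEvaluation.Count; i++)
    {
        if (!conditionPerEvaluation[i])
        {
            activeSince = null; // any false evaluation resets the rule
            result.Add("inactive");
            continue;
        }
        activeSince ??= i;
        var activeFor = interval * (i - activeSince.Value);
        result.Add(activeFor >= forDuration ? "firing" : "pending");
    }
    return result;
}

// HighErrorRate: 5xx ratio above 5% for 5 minutes, evaluated once per minute.
var ratios = new[] { 0.02, 0.07, 0.08, 0.03, 0.06, 0.07, 0.09, 0.08, 0.07, 0.06 };
var breaches = Array.ConvertAll(ratios, r => r > 0.05);
var states = Evaluate(breaches, TimeSpan.FromMinutes(1), TimeSpan.FromMinutes(5));
Console.WriteLine(string.Join(" ", states));
```

Running it prints `inactive pending pending inactive pending pending pending pending pending firing`: the dip to 3% at the fourth evaluation restarts the 5-minute clock.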
Deployment
For local development and single-server deployments, use the included Docker Compose file to run the full observability stack alongside your application:
# Start the full observability stack (Prometheus, Loki, Tempo, Grafana, AlertManager)
docker compose -f docker-compose.observability.yml up -d
# Verify all services are healthy
docker compose -f docker-compose.observability.yml ps
# Service port map:
# Grafana: http://localhost:3000 (dashboards + alerting UI)
# Prometheus: http://localhost:9090 (metrics query + targets)
# Loki: http://localhost:3100 (log storage)
# Tempo OTLP: grpc://localhost:4317 (trace ingestion)
# AlertManager: http://localhost:9093 (alert routing UI)
Next Steps
Ready to go further? Explore these resources: