Building AI-Assisted Operations: Agentic AI Workshop

Chapter 3: Deploy Platform Workload

In this chapter, you’ll deploy the monitoring stack that our AI agents use to analyze the cluster: Metrics Server for resource metrics, Prometheus for metrics collection, Grafana for visualization, and Alertmanager for alerts. You’ll also deploy a sample application (Podinfo) that exposes metrics for testing.

Goals

Deploy Metrics Server for resource metrics
Deploy kube-prometheus-stack (Prometheus + Grafana + Alertmanager)
Deploy Podinfo as a sample application with metrics
Enable the observability-agent to query metrics

Estimated Time: 30 minutes

Why do we need monitoring?

Our AI agents need data to make informed decisions:

k8s-agent queries the Kubernetes API for pod status, events, and configurations
observability-agent queries Prometheus for metrics like CPU, memory, and application performance

Without Prometheus, the observability-agent has no data source to query!

Step 1: Create the Project Directory

Create a new directory for this chapter’s code:

mkdir -p cfgmgmtcamp-2026-platform
cd cfgmgmtcamp-2026-platform
pulumi new typescript -f

Install the Kubernetes provider:

npm install @pulumi/kubernetes

Step 2: Write the Pulumi Program

Open index.ts and replace the contents with the following code to deploy the monitoring stack:

import * as k8s from "@pulumi/kubernetes";
import * as pulumi from "@pulumi/pulumi";

// Configuration
const config = new pulumi.Config();
const grafanaAdminPassword = config.getSecret("grafanaAdminPassword") || pulumi.secret("workshop-admin");

// Create monitoring namespace
const monitoringNs = new k8s.core.v1.Namespace("monitoring", {
    metadata: { name: "monitoring" },
});

// Create apps namespace for sample application
const appsNs = new k8s.core.v1.Namespace("apps", {
    metadata: { name: "apps" },
});

// Install Metrics Server (required for kubectl top and HPA)
const metricsServer = new k8s.helm.v3.Release("metrics-server", {
    chart: "metrics-server",
    repositoryOpts: {
        repo: "https://kubernetes-sigs.github.io/metrics-server/",
    },
    namespace: "kube-system",
    version: "3.13.0",
    values: {
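        args: [
            // Skip verification of kubelet serving certificates.
            // Acceptable for a workshop cluster, not recommended for production.
            "--kubelet-insecure-tls",
        ],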
    },
});

// Install kube-prometheus-stack (Prometheus, Grafana, Alertmanager)
const prometheusStack = new k8s.helm.v3.Release("kube-prometheus-stack", {
    name: "kube-prometheus-stack", // Explicit release name for consistent service names
    chart: "kube-prometheus-stack",
    repositoryOpts: {
        repo: "https://prometheus-community.github.io/helm-charts",
    },
    namespace: monitoringNs.metadata.name,
    version: "81.4.2",
    values: {
        grafana: {
            enabled: true,
            adminPassword: grafanaAdminPassword,
            service: {
                type: "LoadBalancer",
                port: 80,
            },
            // Disable init container that blocks startup
            sidecar: {
                datasources: {
                    enabled: true,
                    initDatasources: false,
                },
            },
            "grafana.ini": {
                // Enable anonymous access for observability-agent
                "auth.anonymous": {
                    enabled: true,
                    org_role: "Viewer",
                },
            },
        },
        prometheus: {
            prometheusSpec: {
                retention: "24h",
                serviceMonitorSelectorNilUsesHelmValues: false,
                podMonitorSelectorNilUsesHelmValues: false,
            },
        },
        alertmanager: { enabled: true },
        // Disable scrape targets that managed Kubernetes services usually do not expose
        kubeEtcd: { enabled: false },
        kubeControllerManager: { enabled: false },
        kubeScheduler: { enabled: false },
        kubeProxy: { enabled: false },
    },
}, { dependsOn: [monitoringNs] });

// Deploy Podinfo as sample application
const podinfo = new k8s.helm.v3.Release("podinfo", {
    chart: "podinfo",
    repositoryOpts: {
        repo: "https://stefanprodan.github.io/podinfo",
    },
    namespace: appsNs.metadata.name,
    version: "6.9.4",
    values: {
        replicaCount: 2,
        serviceMonitor: {
            enabled: true,
            interval: "15s",
        },
    },
}, { dependsOn: [appsNs, prometheusStack] });

// Exports
export const monitoringNamespace = monitoringNs.metadata.name;
export const appsNamespace = appsNs.metadata.name;
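
The two `...SelectorNilUsesHelmValues: false` settings are what let Prometheus pick up Podinfo's ServiceMonitor. The chart's default behavior, as far as we know, is to select only monitors labeled with the Helm release name, while `false` leaves the selector empty so every ServiceMonitor matches. The sketch below models only that label-matching rule. The `matchesSelector` helper and the sample labels are illustrative assumptions, not part of the chart or of Pulumi.

```typescript
type Labels = Record<string, string>;

// Hypothetical helper that mimics Kubernetes matchLabels semantics:
// every key/value pair in the selector must be present on the object,
// so an empty selector matches every object.
function matchesSelector(selector: Labels, labels: Labels): boolean {
    return Object.keys(selector).every((key) => labels[key] === selector[key]);
}

// Labels a Podinfo ServiceMonitor might carry (assumed for illustration).
const podinfoMonitorLabels: Labels = { "app.kubernetes.io/name": "podinfo" };

// Assumed chart default: select only monitors labeled with the release name.
const releaseOnlySelector: Labels = { release: "kube-prometheus-stack" };

// With serviceMonitorSelectorNilUsesHelmValues: false the selector stays empty.
const emptySelector: Labels = {};

console.log(matchesSelector(releaseOnlySelector, podinfoMonitorLabels)); // false
console.log(matchesSelector(emptySelector, podinfoMonitorLabels)); // true
```

Without the empty selector, the Podinfo monitor would need a matching `release` label before Prometheus scraped it.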

YAML version (the equivalent program in Pulumi YAML):

name: 03-platform-workload
runtime: yaml
description: Deploy monitoring stack and sample application

config:
  grafanaAdminPassword:
    type: string
    secret: true
    default: workshop-admin

resources:
  monitoring-ns:
    type: kubernetes:core/v1:Namespace
    properties:
      metadata:
        name: monitoring

  apps-ns:
    type: kubernetes:core/v1:Namespace
    properties:
      metadata:
        name: apps

  metrics-server:
    type: kubernetes:helm.sh/v3:Release
    properties:
      chart: metrics-server
      repositoryOpts:
        repo: https://kubernetes-sigs.github.io/metrics-server/
      namespace: kube-system
      version: "3.13.0"
      values:
        args:
          - --kubelet-insecure-tls

  kube-prometheus-stack:
    type: kubernetes:helm.sh/v3:Release
    properties:
      name: kube-prometheus-stack  # Explicit release name for consistent service names
      chart: kube-prometheus-stack
      repositoryOpts:
        repo: https://prometheus-community.github.io/helm-charts
      namespace: ${monitoring-ns.metadata.name}
      version: "81.4.2"
      values:
        grafana:
          enabled: true
          adminPassword: ${grafanaAdminPassword}
          service:
            type: LoadBalancer
            port: 80
          sidecar:
            datasources:
              enabled: true
              initDatasources: false
          grafana.ini:
            auth.anonymous:
              enabled: true
              org_role: Viewer
        prometheus:
          prometheusSpec:
            retention: 24h
            serviceMonitorSelectorNilUsesHelmValues: false
            podMonitorSelectorNilUsesHelmValues: false
        alertmanager:
          enabled: true
        kubeEtcd:
          enabled: false
        kubeControllerManager:
          enabled: false
        kubeScheduler:
          enabled: false
        kubeProxy:
          enabled: false
    options:
      dependsOn:
        - ${monitoring-ns}

  podinfo:
    type: kubernetes:helm.sh/v3:Release
    properties:
      chart: podinfo
      repositoryOpts:
        repo: https://stefanprodan.github.io/podinfo
      namespace: ${apps-ns.metadata.name}
      version: "6.9.4"
      values:
        replicaCount: 2
        serviceMonitor:
          enabled: true
          interval: 15s
    options:
      dependsOn:
        - ${apps-ns}
        - ${kube-prometheus-stack}

outputs:
  monitoringNamespace: ${monitoring-ns.metadata.name}
  appsNamespace: ${apps-ns.metadata.name}

Step 3: Configure the Stack

Create Pulumi.dev.yaml in your project directory to import the workload ESC environment:

environment:
  - cfgmgmtcamp-2026-workshop-infra-env/workload

This reuses the same ESC environment from Chapter 2, which provides:

kubernetes:kubeconfig - Automatically connects to your cluster
grafanaAdminPassword - Grafana admin password (defaults to workshop-admin)

Step 4: Deploy the Stack

Run pulumi up to deploy:

pulumi up

Warning: This deployment takes about 3-5 minutes because it installs three Helm charts (Metrics Server, kube-prometheus-stack, and Podinfo).

Step 5: Verify the Deployment

Check that all pods are running:

# Check monitoring stack
pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl get pods -n monitoring

# Check metrics server
pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl get pods -n kube-system | grep metrics

# Check sample application
pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl get pods -n apps
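
If you would rather check pod status programmatically than read the table, one option is to add `-o json` to the commands above and filter on each pod's phase. This is a minimal sketch: `podsNotReady` is a hypothetical helper and the pod names in the sample are made up.

```typescript
// Minimal shape of the fields read from `kubectl get pods -o json`.
interface PodList {
    items: { metadata: { name: string }; status: { phase: string } }[];
}

// Hypothetical helper: returns the names of pods whose phase is neither
// Running nor Succeeded. Note that a Running phase does not guarantee
// that every container has passed its readiness probe.
function podsNotReady(kubectlJson: string): string[] {
    const list = JSON.parse(kubectlJson) as PodList;
    return list.items
        .filter((pod) => pod.status.phase !== "Running" && pod.status.phase !== "Succeeded")
        .map((pod) => pod.metadata.name);
}

// Sample input with made-up pod names.
const samplePodList = JSON.stringify({
    items: [
        { metadata: { name: "prometheus-example-0" }, status: { phase: "Running" } },
        { metadata: { name: "grafana-example" }, status: { phase: "Pending" } },
    ],
});

console.log(podsNotReady(samplePodList)); // lists only grafana-example
```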

Step 6: Access Grafana

Get the Grafana LoadBalancer IP:

GRAFANA_IP=$(pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl get svc -n monitoring kube-prometheus-stack-grafana -o jsonpath='{.status.loadBalancer.ingress[0].ip}')
echo "Grafana URL: http://$GRAFANA_IP"

Username: admin
Password: workshop-admin (or your configured password)
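
If `GRAFANA_IP` comes back empty, the load balancer may still be provisioning, or your provider may publish a hostname instead of an IP address. The sketch below handles both cases from the output of `kubectl get svc ... -o json`. `grafanaUrl` is a hypothetical helper, and the sample address comes from a documentation-only IP range.

```typescript
// Only the fields this sketch reads from the Service object.
interface ServiceStatus {
    status?: {
        loadBalancer?: { ingress?: { ip?: string; hostname?: string }[] };
    };
}

// Hypothetical helper: prefers the IP, falls back to the hostname, and
// returns undefined while no address has been assigned yet.
function grafanaUrl(serviceJson: string): string | undefined {
    const service = JSON.parse(serviceJson) as ServiceStatus;
    const firstIngress = service.status?.loadBalancer?.ingress?.[0];
    const address = firstIngress?.ip ?? firstIngress?.hostname;
    return address ? `http://${address}` : undefined;
}

const withIp = JSON.stringify({
    status: { loadBalancer: { ingress: [{ ip: "203.0.113.10" }] } },
});
const stillPending = JSON.stringify({ status: { loadBalancer: {} } });

console.log(grafanaUrl(withIp)); // http://203.0.113.10
console.log(grafanaUrl(stillPending)); // undefined
```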

Step 7: Test Metrics Server

Verify that kubectl top works:

# View node resource usage
pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl top nodes

# View pod resource usage
pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl top pods -n apps
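
`kubectl top` reports CPU in millicores (for example `250m`) and memory with binary suffixes (for example `24Mi`). If you want to compare these values in a script, they first need converting to plain numbers. The helpers below are a hypothetical sketch that covers only the suffixes named in the comments, not the full Kubernetes quantity grammar.

```typescript
// Converts a CPU quantity such as "250m" or "2" into cores.
function parseCpuCores(quantity: string): number {
    const match = /^(\d+)(m)?$/.exec(quantity);
    if (!match) throw new Error(`unsupported CPU quantity: ${quantity}`);
    const value = Number(match[1]);
    return match[2] === "m" ? value / 1000 : value;
}

// Converts a memory quantity such as "24Mi" into bytes.
// Supports plain bytes and the Ki, Mi, and Gi suffixes only.
function parseMemoryBytes(quantity: string): number {
    const match = /^(\d+)(Ki|Mi|Gi)?$/.exec(quantity);
    if (!match) throw new Error(`unsupported memory quantity: ${quantity}`);
    const value = Number(match[1]);
    switch (match[2]) {
        case "Ki":
            return value * 1024;
        case "Mi":
            return value * 1024 * 1024;
        case "Gi":
            return value * 1024 * 1024 * 1024;
        default:
            return value; // no suffix: already in bytes
    }
}

console.log(parseCpuCores("250m")); // 0.25
console.log(parseMemoryBytes("24Mi")); // 25165824
```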

Step 8: Test the Observability Agent

Now that Prometheus is running, the observability-agent can query real metrics.

1. Open the Kagent dashboard (from Chapter 2).
2. Select observability-agent.
3. Try these prompts:
   - “What is the CPU usage of pods in the apps namespace?”
   - “Show me memory usage for the podinfo deployment”
   - “Which nodes have the highest CPU utilization and should I be concerned?”

The observability-agent will generate PromQL queries, execute them against Prometheus, and return actual metric data.
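
To make that flow concrete, here is a rough sketch of what executing a PromQL query against Prometheus can look like, using the instant-query endpoint `/api/v1/query` of the Prometheus HTTP API. It is not the agent's actual implementation. The in-cluster base URL is an assumption derived from this chapter's service name, namespace, and port, and the helper names and sample response are made up.

```typescript
// Assumed in-cluster address of the Prometheus service deployed in this chapter.
const PROMETHEUS_BASE_URL = "http://kube-prometheus-stack-prometheus.monitoring.svc:9090";

// Builds the URL for an instant query; the PromQL expression must be URL-encoded.
function instantQueryUrl(baseUrl: string, promql: string): string {
    return `${baseUrl}/api/v1/query?query=${encodeURIComponent(promql)}`;
}

// Shape of a successful instant-query response with a vector result.
interface InstantVectorResponse {
    status: string;
    data: {
        resultType: string;
        result: { metric: Record<string, string>; value: [number, string] }[];
    };
}

// Hypothetical helper: turns each series into a "pod: value" line.
function summarizeByPod(response: InstantVectorResponse): string[] {
    return response.data.result.map(
        (series) => `${series.metric["pod"] ?? "unknown"}: ${Number(series.value[1]).toFixed(3)}`,
    );
}

const cpuQuery = 'sum(rate(container_cpu_usage_seconds_total{namespace="apps"}[5m])) by (pod)';
console.log(instantQueryUrl(PROMETHEUS_BASE_URL, cpuQuery));

// Made-up response for illustration; sample values arrive as strings.
const sampleResponse: InstantVectorResponse = {
    status: "success",
    data: {
        resultType: "vector",
        result: [{ metric: { pod: "podinfo-example" }, value: [1700000000, "0.0123456"] }],
    },
};
console.log(summarizeByPod(sampleResponse)); // one line for podinfo-example
```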

Step 9: Explore Prometheus (Optional)

To access the Prometheus UI directly, you can port-forward:

pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl port-forward -n monitoring svc/kube-prometheus-stack-prometheus 9090:9090

Open http://localhost:9090 to access the Prometheus UI and try queries like:

# Container CPU usage in apps namespace
rate(container_cpu_usage_seconds_total{namespace="apps"}[5m])

# Container memory usage in apps namespace
container_memory_working_set_bytes{namespace="apps"}

# Podinfo HTTP requests by status code
sum(rate(http_requests_total{namespace="apps"}[5m])) by (status)
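
The `rate(...[5m])` calls above turn an ever-increasing counter into a per-second average over the window. The sketch below shows the basic idea, including how a counter reset (for example after a pod restart) is handled. It is a simplification: Prometheus also extrapolates to the window boundaries, which this code omits. `simpleRate` and the sample data are hypothetical.

```typescript
interface Sample {
    timestampSeconds: number;
    value: number;
}

// Simplified per-second rate over time-ordered counter samples.
// Returns NaN when fewer than two samples are available.
function simpleRate(samples: Sample[]): number {
    let first: Sample | undefined;
    let previous: Sample | undefined;
    let increase = 0;
    for (const sample of samples) {
        if (previous === undefined) {
            first = sample;
        } else {
            // A drop means the counter was reset, so count up from zero.
            increase += sample.value >= previous.value
                ? sample.value - previous.value
                : sample.value;
        }
        previous = sample;
    }
    if (first === undefined || previous === undefined || previous === first) {
        return NaN;
    }
    return increase / (previous.timestampSeconds - first.timestampSeconds);
}

// Scraped every 15 seconds; the counter resets between the third and fourth sample.
const counterSamples: Sample[] = [
    { timestampSeconds: 0, value: 100 },
    { timestampSeconds: 15, value: 130 },
    { timestampSeconds: 30, value: 160 },
    { timestampSeconds: 45, value: 30 },
    { timestampSeconds: 60, value: 60 },
];

console.log(simpleRate(counterSamples)); // 2 (120 increase over 60 seconds)
```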

Tip: You can also explore metrics through Grafana’s “Explore” feature, which doesn’t require port-forwarding.

Architecture Overview

graph TB
    subgraph ks["kube-system"]
        ms["Metrics Server"]
        ms --> |"kubectl top, HPA"| usage["Resource Usage"]
    end

    subgraph mon["monitoring"]
        prom["Prometheus<br/>(metrics)"]
        graf["Grafana<br/>(dashboards)"]
        alert["Alertmanager<br/>(alerts)"]
    end

    subgraph apps["apps"]
        pod["Podinfo<br/>(2 replicas)"]
        sm["ServiceMonitor"]
        pod --> sm
    end

    sm --> prom
    prom --> graf
    prom --> alert

    style ks fill:#e3f2fd,stroke:#1976d2
    style mon fill:#e8f5e9,stroke:#388e3c
    style apps fill:#fff3e0,stroke:#f57c00

Checkpoint

Before proceeding, verify:

All pods in monitoring namespace are Running
Metrics Server is running in kube-system
Podinfo is running with 2 replicas in apps
pulumi env run ... -- kubectl top nodes shows resource usage
observability-agent can answer questions about metrics

Stretch Goals

Create a Custom Dashboard: Import or create a Grafana dashboard for Podinfo metrics
Test Alerting: Check the Alertmanager UI for any firing alerts
Explore ServiceMonitor: Run pulumi env run cfgmgmtcamp-2026-workshop-infra-env/workload -- kubectl get servicemonitor -A to see how Prometheus discovers metrics

Next: Chapter 4: Multi-Agent Troubleshooting