Axiom Observability

Monitor Arial runs with Axiom for production-grade observability. Get structured logs, timing metrics, and insights into your AI workstreams.


Overview

Arial integrates with Axiom to provide centralized logging and observability for your workstream executions. When enabled, Arial automatically sends structured events to Axiom, giving you:

  • Real-time visibility into run progress
  • Historical data for debugging failed runs
  • Timing metrics for performance analysis
  • Centralized logs across multiple machines/CI runs

The integration is optional and non-blocking. If Axiom is unavailable, Arial continues running normally.
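
As a rough illustration of this fail-open design, here is a hypothetical sketch in TypeScript, not Arial's actual implementation. Only the event envelope fields, the 5-second flush interval, and the "Axiom ingest failed" message come from this document; all names are invented:

// Hypothetical sketch of a fail-open, buffered event sender (Node 18+ for global fetch).
type AxiomEvent = Record<string, unknown>;

class AxiomLogger {
  private buffer: AxiomEvent[] = [];

  constructor(
    private token = process.env.AXIOM_TOKEN,
    private dataset = process.env.AXIOM_DATASET ?? "arial",
  ) {
    // Flush every 5 seconds; unref() keeps the timer from holding the process open.
    setInterval(() => void this.flush(), 5000).unref();
  }

  log(type: string, fields: AxiomEvent = {}): void {
    if (!this.token) return; // integration is optional: no token, no-op
    this.buffer.push({ _time: new Date().toISOString(), service: "arial", type, ...fields });
  }

  async flush(): Promise<void> {
    if (!this.token || this.buffer.length === 0) return;
    const batch = this.buffer.splice(0); // drain the buffer for this attempt
    try {
      await fetch(`https://api.axiom.co/v1/datasets/${this.dataset}/ingest`, {
        method: "POST",
        headers: { Authorization: `Bearer ${this.token}`, "Content-Type": "application/json" },
        body: JSON.stringify(batch),
      });
    } catch {
      this.buffer.unshift(...batch); // keep events buffered for the next flush attempt
      console.error("Axiom ingest failed"); // non-blocking: report and carry on
    }
  }
}

The key property is that every failure path either no-ops or re-buffers; nothing in the logger can interrupt the run itself.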


Setup

1. Create an Axiom Account

Sign up at axiom.co if you don't have an account. Axiom offers a generous free tier.

2. Create a Dataset

In the Axiom dashboard:

  1. Go to Datasets
  2. Click New Dataset
  3. Name it arial (or your preferred name)

3. Create an API Token

  1. Go to Settings > API Tokens
  2. Click New API Token
  3. Give it a name like "Arial CLI"
  4. Grant it Ingest permission for your dataset
  5. Copy the token (starts with xaat-)

4. Configure Environment Variables

export AXIOM_TOKEN=xaat-your-token-here
export AXIOM_DATASET=arial

For multi-org accounts, also set:

export AXIOM_ORG_ID=your-org-id

5. Run Arial

arial run

You'll see confirmation that Axiom is enabled:

Axiom logging enabled

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `AXIOM_TOKEN` | Yes | - | API token with ingest permissions |
| `AXIOM_DATASET` | No | `arial` | Dataset to send events to |
| `AXIOM_ORG_ID` | No | - | Organization ID (for multi-org accounts) |

Event Types

Arial sends structured events to Axiom. Each event includes:

  • _time: ISO 8601 timestamp
  • service: Always "arial"
  • type: Event type identifier
  • Event-specific fields

Run Events

| Event | Fields | Description |
| --- | --- | --- |
| `run.started` | `specsDir`, `workstreamCount` | Emitted when `arial run` begins |
| `run.completed` | `duration`, `success` | Emitted when all workstreams finish |

Workstream Events

| Event | Fields | Description |
| --- | --- | --- |
| `workstream.started` | `workstreamId`, `branch` | Agent begins execution |
| `workstream.completed` | `workstreamId`, `duration` | Agent finishes successfully |
| `workstream.failed` | `workstreamId`, `error`, `retryCount` | Agent encountered an error |

Merge Events

| Event | Fields | Description |
| --- | --- | --- |
| `merge.started` | `workstreamId`, `branch` | Branch merge begins |
| `merge.completed` | `workstreamId` | Branch merged successfully |
| `merge.conflict` | `workstreamId`, `retryCount` | Merge conflict detected |

Other Events

| Event | Fields | Description |
| --- | --- | --- |
| `context.added` | `workstreamId` | Context from another workstream added |

Example Events

Run Started

{
  "_time": "2024-01-15T10:00:00.000Z",
  "service": "arial",
  "type": "run.started",
  "specsDir": "./specs",
  "workstreamCount": 3
}

Workstream Completed

{
  "_time": "2024-01-15T10:02:15.000Z",
  "service": "arial",
  "type": "workstream.completed",
  "workstreamId": "auth",
  "duration": 135000
}
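
Note that duration is reported in milliseconds: the 135000 above corresponds to the 2 minutes 15 seconds between the run start at 10:00:00 and this event at 10:02:15.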

Workstream Failed

{
  "_time": "2024-01-15T10:01:30.000Z",
  "service": "arial",
  "type": "workstream.failed",
  "workstreamId": "api",
  "error": "Agent exceeded context limit",
  "retryCount": 0
}

Querying Data

Axiom APL Examples

All events from the last hour:

['arial']
| where _time > ago(1h)

Failed workstreams:

['arial']
| where type == "workstream.failed"
| project _time, workstreamId, error

Average workstream duration:

['arial']
| where type == "workstream.completed"
| summarize avg(duration) by bin(_time, 1h)

Run success rate:

['arial']
| where type == "run.completed"
| summarize
    total = count(),
    successful = countif(success == true)
| extend success_rate = (successful * 100.0) / total

Workstreams with merge conflicts:

['arial']
| where type == "merge.conflict"
| summarize conflicts = count() by workstreamId
| order by conflicts desc

Creating Dashboards

Build an Arial monitoring dashboard with these widgets:

Run Overview

  • Success Rate: Pie chart of run.completed by success
  • Runs Over Time: Time series of run.started events

Workstream Health

  • Status Distribution: Count by workstream.completed vs workstream.failed
  • Average Duration: Line chart of workstream durations
  • Failure Reasons: Table of workstream.failed grouped by error

Merge Analytics

  • Conflict Rate: Percentage of workstreams with merge.conflict events (see the query sketch below)
  • Retry Distribution: Histogram of retryCount values
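
The conflict-rate widget, for example, can be backed by an APL query along these lines. This is a sketch, assuming the default arial dataset and counting distinct workstreams:

['arial']
| summarize
    conflicted = dcountif(workstreamId, type == "merge.conflict"),
    total = dcountif(workstreamId, type == "merge.started")
| extend conflict_rate = (conflicted * 100.0) / total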

CI/CD Integration

GitHub Actions

- name: Run Arial
  env:
    AXIOM_TOKEN: ${{ secrets.AXIOM_TOKEN }}
    AXIOM_DATASET: arial-ci
  run: arial run
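
The workflow reads the token from repository secrets: add AXIOM_TOKEN under Settings > Secrets and variables > Actions in your repository.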

GitLab CI

run-arial:
  script:
    - arial run
  variables:
    AXIOM_TOKEN: $AXIOM_TOKEN
    AXIOM_DATASET: arial-ci

Use a different dataset (e.g., arial-ci) to separate CI runs from local development.


Troubleshooting

"Axiom logging enabled" Not Shown

Verify your token is set:

echo $AXIOM_TOKEN

Ensure it starts with xaat-.

Events Not Appearing in Axiom

  1. Check dataset name: Ensure AXIOM_DATASET matches your Axiom dataset
  2. Verify token permissions: Token needs Ingest permission
  3. Wait for flush: Events are buffered and sent every 5 seconds
  4. Check for errors: Look for Axiom ingest failed messages in console output
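
If events still fail to appear, you can bypass Arial and send a test event directly to Axiom's REST ingest endpoint; a 200 response confirms the token and dataset are valid. The debug.ping event below is just a placeholder:

curl -X POST "https://api.axiom.co/v1/datasets/$AXIOM_DATASET/ingest" \
  -H "Authorization: Bearer $AXIOM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"service":"arial","type":"debug.ping"}]'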

Multi-Org Authentication Issues

If your token belongs to a specific organization, set AXIOM_ORG_ID:

export AXIOM_ORG_ID=your-org-id

Find your org ID in Axiom under Settings > Organization.

Network Errors

Arial buffers events and retries failed requests on the next flush. If network issues persist, buffered events may eventually be lost, but Arial continues executing regardless of Axiom availability.


Best Practices

  1. Use separate datasets for development, staging, and production
  2. Set up alerts in Axiom for workstream.failed events (see the monitor query sketch after this list)
  3. Monitor duration trends to catch performance regressions
  4. Review merge conflicts to identify specs that need better isolation
  5. Archive old data periodically to manage costs
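
A monitor backing the alerting practice in item 2 could start from a query like this sketch, which counts failures in 5-minute windows; tune the window and alert threshold to your workload:

['arial']
| where type == "workstream.failed"
| summarize failures = count() by bin(_time, 5m)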

Security

  • Store AXIOM_TOKEN securely (environment variable, secrets manager)
  • Use tokens with minimal permissions (ingest-only)
  • Create separate tokens for different environments
  • Rotate tokens periodically