Axiom Observability

Monitor Arial runs with Axiom for production-grade observability. Get structured logs, timing metrics, and insights into your AI workstreams.


Overview

Arial integrates with Axiom to provide centralized logging and observability for your workstream executions. When enabled, Arial automatically sends structured events to Axiom, giving you:

  • Real-time visibility into run progress
  • Historical data for debugging failed runs
  • Timing metrics for performance analysis
  • Centralized logs across multiple machines/CI runs

The integration is optional and non-blocking. If Axiom is unavailable, Arial continues running normally.
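
As a rough illustration of this fail-open design, here is a hypothetical sketch in TypeScript, not Arial's actual implementation. Only the event envelope fields, the 5-second flush interval, and the "Axiom ingest failed" message come from this document; all names are invented:

// Hypothetical sketch of a fail-open, buffered event sender (Node 18+ for global fetch).
type AxiomEvent = Record<string, unknown>;

class AxiomLogger {
  private buffer: AxiomEvent[] = [];

  constructor(
    private token = process.env.AXIOM_TOKEN,
    private dataset = process.env.AXIOM_DATASET ?? "arial",
  ) {
    // Flush every 5 seconds; unref() keeps the timer from holding the process open.
    setInterval(() => void this.flush(), 5000).unref();
  }

  log(type: string, fields: AxiomEvent = {}): void {
    if (!this.token) return; // integration is optional: no token, no-op
    this.buffer.push({ _time: new Date().toISOString(), service: "arial", type, ...fields });
  }

  async flush(): Promise<void> {
    if (!this.token || this.buffer.length === 0) return;
    const batch = this.buffer.splice(0); // drain the buffer for this attempt
    try {
      await fetch(`https://api.axiom.co/v1/datasets/${this.dataset}/ingest`, {
        method: "POST",
        headers: { Authorization: `Bearer ${this.token}`, "Content-Type": "application/json" },
        body: JSON.stringify(batch),
      });
    } catch {
      this.buffer.unshift(...batch); // keep events buffered for the next flush attempt
      console.error("Axiom ingest failed"); // non-blocking: report and carry on
    }
  }
}

The key property is that every failure path either no-ops or re-buffers; nothing in the logger can interrupt the run itself.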


Setup

1. Create an Axiom Account

Sign up at axiom.co if you don't have an account. Axiom offers a generous free tier.

2. Create a Dataset

In the Axiom dashboard:

  1. Go to Datasets
  2. Click New Dataset
  3. Name it arial (or your preferred name)

3. Create an API Token

  1. Go to Settings > API Tokens
  2. Click New API Token
  3. Give it a name like "Arial CLI"
  4. Grant it Ingest permission for your dataset
  5. Copy the token (starts with xaat-)

4. Configure Environment Variables

export AXIOM_TOKEN=xaat-your-token-here
export AXIOM_DATASET=arial

For multi-org accounts, also set:

export AXIOM_ORG_ID=your-org-id

5. Run Arial

arial run

You'll see confirmation that Axiom is enabled:

Axiom logging enabled

Environment Variables

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `AXIOM_TOKEN` | Yes | - | API token with ingest permissions |
| `AXIOM_DATASET` | No | `arial` | Dataset to send events to |
| `AXIOM_ORG_ID` | No | - | Organization ID (for multi-org accounts) |

Event Types

Arial sends structured events to Axiom. Each event includes:

  • _time: ISO 8601 timestamp
  • service: Always "arial"
  • type: Event type identifier
  • Event-specific fields

Run Events

| Event | Fields | Description |
| --- | --- | --- |
| `run.started` | `specsDir`, `workstreamCount` | Emitted when `arial run` begins |
| `run.completed` | `duration`, `success` | Emitted when all workstreams finish |

Workstream Events

| Event | Fields | Description |
| --- | --- | --- |
| `workstream.started` | `workstreamId`, `branch` | Agent begins execution |
| `workstream.completed` | `workstreamId`, `duration` | Agent finishes successfully |
| `workstream.failed` | `workstreamId`, `error`, `retryCount` | Agent encountered an error |

Merge Events

| Event | Fields | Description |
| --- | --- | --- |
| `merge.started` | `workstreamId`, `branch` | Branch merge begins |
| `merge.completed` | `workstreamId` | Branch merged successfully |
| `merge.conflict` | `workstreamId`, `retryCount` | Merge conflict detected |

Other Events

| Event | Fields | Description |
| --- | --- | --- |
| `context.added` | `workstreamId` | Context from another workstream added |

Example Events

Run Started

{
  "_time": "2024-01-15T10:00:00.000Z",
  "service": "arial",
  "type": "run.started",
  "specsDir": "./specs",
  "workstreamCount": 3
}

Workstream Completed

{
  "_time": "2024-01-15T10:02:15.000Z",
  "service": "arial",
  "type": "workstream.completed",
  "workstreamId": "auth",
  "duration": 135000
}
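
Note that duration is reported in milliseconds: the 135000 above corresponds to the 2 minutes 15 seconds between the run start at 10:00:00 and this event at 10:02:15.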

Workstream Failed

{
  "_time": "2024-01-15T10:01:30.000Z",
  "service": "arial",
  "type": "workstream.failed",
  "workstreamId": "api",
  "error": "Agent exceeded context limit",
  "retryCount": 0
}

Querying Data

Axiom APL Examples

All events from the last hour:

['arial']
| where _time > ago(1h)

Failed workstreams:

['arial']
| where type == "workstream.failed"
| project _time, workstreamId, error

Average workstream duration:

['arial']
| where type == "workstream.completed"
| summarize avg(duration) by bin(_time, 1h)

Run success rate:

['arial']
| where type == "run.completed"
| summarize
    total = count(),
    successful = countif(success == true)
| extend success_rate = (successful * 100.0) / total

Workstreams with merge conflicts:

['arial']
| where type == "merge.conflict"
| summarize conflicts = count() by workstreamId
| order by conflicts desc

Creating Dashboards

Build an Arial monitoring dashboard with these widgets:

Run Overview

  • Success Rate: Pie chart of run.completed by success
  • Runs Over Time: Time series of run.started events

Workstream Health

  • Status Distribution: Count by workstream.completed vs workstream.failed
  • Average Duration: Line chart of workstream durations
  • Failure Reasons: Table of workstream.failed grouped by error

Merge Analytics

  • Conflict Rate: Percentage of workstreams with merge.conflict events (see the query sketch below)
  • Retry Distribution: Histogram of retryCount values
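
The conflict-rate widget, for example, can be backed by an APL query along these lines. This is a sketch, assuming the default arial dataset and counting distinct workstreams:

['arial']
| summarize
    conflicted = dcountif(workstreamId, type == "merge.conflict"),
    total = dcountif(workstreamId, type == "merge.started")
| extend conflict_rate = (conflicted * 100.0) / total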

CI/CD Integration

GitHub Actions

- name: Run Arial
  env:
    AXIOM_TOKEN: ${{ secrets.AXIOM_TOKEN }}
    AXIOM_DATASET: arial-ci
  run: arial run
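
The workflow reads the token from repository secrets: add AXIOM_TOKEN under Settings > Secrets and variables > Actions in your repository.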

GitLab CI

run-arial:
  script:
    - arial run
  variables:
    AXIOM_TOKEN: $AXIOM_TOKEN
    AXIOM_DATASET: arial-ci

Use a different dataset (e.g., arial-ci) to separate CI runs from local development.


Troubleshooting

"Axiom logging enabled" Not Shown

Verify your token is set:

echo $AXIOM_TOKEN

Ensure it starts with xaat-.

Events Not Appearing in Axiom

  1. Check dataset name: Ensure AXIOM_DATASET matches your Axiom dataset
  2. Verify token permissions: Token needs Ingest permission
  3. Wait for flush: Events are buffered and sent every 5 seconds
  4. Check for errors: Look for Axiom ingest failed messages in console output
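
If events still fail to appear, you can bypass Arial and send a test event directly to Axiom's REST ingest endpoint; a 200 response confirms the token and dataset are valid. The debug.ping event below is just a placeholder:

curl -X POST "https://api.axiom.co/v1/datasets/$AXIOM_DATASET/ingest" \
  -H "Authorization: Bearer $AXIOM_TOKEN" \
  -H "Content-Type: application/json" \
  -d '[{"service":"arial","type":"debug.ping"}]'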

Multi-Org Authentication Issues

If your token belongs to a specific organization, set AXIOM_ORG_ID:

export AXIOM_ORG_ID=your-org-id

Find your org ID in Axiom under Settings > Organization.

Network Errors

Arial buffers events and retries failed requests on the next flush. If network issues persist, buffered events may eventually be lost, but Arial continues executing regardless of Axiom availability.


Best Practices

  1. Use separate datasets for development, staging, and production
  2. Set up alerts in Axiom for workstream.failed events (see the monitor query sketch after this list)
  3. Monitor duration trends to catch performance regressions
  4. Review merge conflicts to identify specs that need better isolation
  5. Archive old data periodically to manage costs
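
A monitor backing the alerting practice in item 2 could start from a query like this sketch, which counts failures in 5-minute windows; tune the window and alert threshold to your workload:

['arial']
| where type == "workstream.failed"
| summarize failures = count() by bin(_time, 5m)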

Security

  • Store AXIOM_TOKEN securely (environment variable, secrets manager)
  • Use tokens with minimal permissions (ingest-only)
  • Create separate tokens for different environments
  • Rotate tokens periodically