Auckland Transit Archive

About Auckland Transit Archive

What is this?

Auckland Transit Archive is an open data project that tracks Auckland bus reliability over time. All the data comes from Auckland Transport's public GTFS-RT feeds, and all the analysis runs directly in your web browser using DuckDB WASM.

How delays are measured

Auckland Transport broadcasts real-time predictions every few minutes through their GTFS-RT (General Transit Feed Specification - Realtime) API. Each prediction includes how many seconds early or late a bus is expected to arrive at each stop.

  • Positive delay = Bus is running late
  • Negative delay = Bus is running early
  • "On time" = Within 1 minute (60 seconds) of the scheduled time, early or late
  • "5+ minutes late" = Arrival delay > 300 seconds
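
The thresholds above can be sketched as a small classifier. This is only an illustration, not the site's actual code. The function name classify_delay and the label strings are hypothetical. Two points are assumptions: the "on time" band includes exactly 60 seconds, and delays between 1 and 5 minutes are labelled simply "late" or "early".

```python
def classify_delay(delay_seconds: float) -> str:
    """Bucket an arrival delay in seconds (positive = late, negative = early)."""
    if delay_seconds > 300:
        # Strictly more than 300 seconds, matching "Arrival delay > 300 seconds".
        return "5+ minutes late"
    if abs(delay_seconds) <= 60:
        # Assumption: the one-minute band applies to both early and late arrivals.
        return "on time"
    # Hypothetical labels for everything else outside the on-time band.
    return "late" if delay_seconds > 0 else "early"
```

For example, classify_delay(45) falls in the "on time" band, while classify_delay(301) is counted as "5+ minutes late".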

Data collection

We capture Auckland Transport's GTFS-RT feeds every 30 seconds and store them in a public data lake. The data includes:

  • Vehicle positions - GPS coordinates of every bus
  • Trip updates - Delay predictions for each stop
  • Service alerts - Disruption notices

Why DuckDB WASM?

Instead of running queries on a server, we use DuckDB compiled to WebAssembly. This means:

  • No server costs - the site is completely static
  • No rate limits - run as many queries as you want
  • Full transparency - you can inspect exactly what SQL is running
  • Privacy - your queries never leave your browser

Query the data yourself

The entire dataset is publicly accessible. You can connect with the DuckDB CLI or any other DuckDB client:

INSTALL ducklake; LOAD ducklake;
INSTALL postgres; LOAD postgres;

CREATE SECRET postgres_secret (
    TYPE postgres,
    HOST 'ep-shy-base-a7ry4sok.ap-southeast-2.aws.neon.tech',
    PORT 5432, DATABASE 'neondb',
    USER 'transit_reader', PASSWORD 'public_readonly_2024'
);

CREATE SECRET ducklake_secret (
    TYPE ducklake, METADATA_PATH '',
    DATA_PATH 's3://auckland-transit-archive/',
    METADATA_PARAMETERS MAP {'TYPE': 'postgres', 'SECRET': 'postgres_secret'}
);

ATTACH 'ducklake:ducklake_secret' AS transit (
    DATA_PATH 'https://bus-data.stochastic.systems/',
    OVERRIDE_DATA_PATH true
);

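-- Now query! Average arrival delay per route over the last 7 days,
-- converted from seconds to minutes, worst routes first.
SELECT route_id, AVG(arrival_delay)/60 AS avg_delay_minutes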
FROM transit.auckland.trip_updates
WHERE observed_at > NOW() - INTERVAL '7 days'
GROUP BY route_id ORDER BY avg_delay_minutes DESC;
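
To make clear what that query computes, here is a rough pure-Python equivalent run over a few made-up rows. The helper name average_delay_minutes and the sample values are hypothetical. Only the column names route_id and arrival_delay come from the query above. The 7-day filter is left out for brevity.

```python
from collections import defaultdict

def average_delay_minutes(rows):
    """Mean arrival delay per route in minutes, largest first."""
    totals = defaultdict(lambda: [0.0, 0])  # route_id -> [sum of seconds, row count]
    for route_id, arrival_delay in rows:
        totals[route_id][0] += arrival_delay
        totals[route_id][1] += 1
    # Divide by 60 to convert seconds to minutes, as the SQL does.
    averages = {route: total / count / 60 for route, (total, count) in totals.items()}
    return sorted(averages.items(), key=lambda item: item[1], reverse=True)

# Hypothetical (route_id, arrival_delay in seconds) rows, not real data.
sample = [("A", 120), ("A", 240), ("B", -60), ("B", 0)]
print(average_delay_minutes(sample))  # [('A', 3.0), ('B', -0.5)]
```

A negative average, as for the made-up route "B", means buses on that route were early on average across the sampled predictions.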

Limitations

  • Data depends on AT's GTFS-RT feed accuracy
  • We started collecting data recently - historical data is limited
  • Some routes may have insufficient samples for reliable statistics
  • Ferry and train data may be less complete than bus data

Contributing

This is an open source project. Contributions are welcome!

  • GitHub Repository
  • Report bugs or request features via GitHub Issues
  • Submit pull requests for improvements

Contact

For questions about this project, open an issue on GitHub or reach out via the repository's discussion forums.