Anomaly detection in time series data using BigQuery ML

Create an ARIMA model, then detect anomalies

Let’s say we have a time series dataset of the number of bicycles rented in London every day:

SELECT 
EXTRACT(date from start_date) AS start_date,
COUNT(*) AS num_trips
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_date

1. ARIMA+ Model

First, create an ML model to predict the number of trips on future days using an ARIMA model:

CREATE OR REPLACE MODEL ch09eu.bicycle_daily_trips
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='num_trips',
TIME_SERIES_TIMESTAMP_COL='start_date',
DECOMPOSE_TIME_SERIES=TRUE
)
AS (
SELECT
EXTRACT(date from start_date) AS start_date,
COUNT(*) AS num_trips
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_date
);

2. Detecting anomalies

Next, detect anomalies (unusual days):

SELECT * 
FROM ML.DETECT_ANOMALIES(
MODEL ch09eu.bicycle_daily_trips,
STRUCT (0.95 AS anomaly_prob_threshold))
ORDER BY anomaly_probability DESC
LIMIT 10

What happened in London on July 9, 2015?

Let’s ask Google.

Data Analytics & AI @ Google Cloud