Anomaly detection in time series data using BigQuery ML

Create an ARIMA model, then detect anomalies

Lak Lakshmanan
2 min readJul 15, 2021

Let’s say we have a time series dataset of the number of bicycles rented in London every day:

SELECT 
EXTRACT(date from start_date) AS start_date,
COUNT(*) AS num_trips
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_date

How can we find unusual days in terms of the number of bicycle rentals?

In BigQuery, all we need are two SQL statements.

1. ARIMA+ Model

First, create an ML model to predict the number of trips on future days using an ARIMA model:

CREATE OR REPLACE MODEL ch09eu.bicycle_daily_trips
OPTIONS(
model_type='arima_plus',
TIME_SERIES_DATA_COL='num_trips',
TIME_SERIES_TIMESTAMP_COL='start_date',
DECOMPOSE_TIME_SERIES=TRUE
)
AS (
SELECT
EXTRACT(date from start_date) AS start_date,
COUNT(*) AS num_trips
FROM `bigquery-public-data.london_bicycles.cycle_hire`
GROUP BY start_date
);

One key thing is that we ask the model to keep track of forecast errors during the modeling using DECOMPOSE_TIME_SERIES. This is so that we can turn around and ask to see the anomalies.

2. Detecting anomalies

Next, detect anomalies (unusual days):

SELECT * 
FROM ML.DETECT_ANOMALIES(
MODEL ch09eu.bicycle_daily_trips,
STRUCT (0.95 AS anomaly_prob_threshold))
ORDER BY anomaly_probability DESC
LIMIT 10

The result?

Okay, the second item on the list is Christmas. But why was July 9, 2015 so unusual?

What happened in London on July 9, 2015?

Let’s ask Google.

Ah yes. The subway network was shutdown due to a strike. That’s why there was an unusually high number of bicycle rentals.

Enjoy!

Sign up to discover human stories that deepen your understanding of the world.

Membership

Read member-only stories

Support writers you read most

Earn money for your writing

Listen to audio narrations

Read offline with the Medium app

Lak Lakshmanan
Lak Lakshmanan

Written by Lak Lakshmanan

articles are personal observations and not investment advice.

Responses (5)

What are your thoughts?

Thank you!

Thank you @Lak for such a wonderful article.

Thanks for the article Lak, big fan of the well-written documentation and new functions around Arima anomaly detection. Using it currently and really happy, have a slight overfitting issue but plenty to work with