Anomaly detection in time series data using BigQuery ML

Create an ARIMA model, then detect anomalies

Let’s say we have a time series dataset of the number of bicycles rented in London every day:

How can we find unusual days in terms of the number of bicycle rentals?

In BigQuery, all we need are two SQL statements.

1. ARIMA+ Model

First, create an ML model to predict the number of trips on future days using an ARIMA model:

One key thing is that we ask the model to keep track of forecast errors during the modeling using DECOMPOSE_TIME_SERIES. This is so that we can turn around and ask to see the anomalies.

2. Detecting anomalies

Next, detect anomalies (unusual days):

The result?

Okay, the second item on the list is Christmas. But why was July 9, 2015 so unusual?

What happened in London on July 9, 2015?

Let’s ask Google.

Ah yes. The subway network was shutdown due to a strike. That’s why there was an unusually high number of bicycle rentals.

Enjoy!

--

--

articles are personal observations and not investment advice.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store