Advent of Code 2019 in Apache Beam (Days 1–3)

Concepts: Pipeline, Map, Combine, Create, FlatMap, CoJoinByKey, Top

Inspired by Felipe Hoffa who is doing Advent of Code in BigQuery, I decided to do it in Apache Beam. This blog explains my solutions, which you can find on GitHub. If you are learning Apache Beam, I encourage you to look over my solutions, but then try to solve the problems yourself.

Day 1a: Fuel computation (Concepts: Pipeline, Map, Combine)

The problem for Day 1 consists of some simple math operations applied to a large number of modules. We can parallelize the operations over the modules using Beam. I started by writing the fuel computation:

Then, verified that the code works as desired:

The Beam pipeline doesn’t use any command-line args, and so is:

The steps read the input consisting of each module’s mass, convert the mass to a floating point number, compute the fuel for the module, sums up the fuel, and writes out the total fuel:

The Map applies the lambda function to each module, and the CombineGlobally does an aggregation of the fuel for all the modules.

Beam will automatically parallelize each of the Map steps. Had we done a CombinePerKey, Beam would do the necessary Map-Reduce to parallelize the combine steps.

Day 1b: Fuel for fuel

The second part of the problem simply involves changing the fuel computation. The rest of the code remains the same:

Day 2a: Intcode

On Day 2, the problem is pretty much parsing an “intcode”. Unfortunately, the intcode parser is a state machine, and because Beam is a distributed programming framework, there is very little for Beam to do here. We just have to write the Python code to parse the intcode:

As before, we use the provided examples to verify that our code is correct:

Once that is done, the problem asks us to replace the first two inputs and obtain the result at the first position. So:

The Beam code is trivial because there is only input and it needs to have the run_1202 function applied to it:

Day 2b: Filtering (Concepts: Create, FlatMap)

The second part of the problem requires parallel processing, and Beam is back in play. Instead of the hardcoded 12 and 2 in Day 2a, we have to make them parameters:

Because we have to find the noun-verb that results in a specific value of output, we return a tuple consists of the two inputs and the output.

The Beam pipeline starts with generating all the possible values of the noun and verb:

The Python generator creates inputs of the form (3, [0,1,2,…,99]), and Beam.Create makes the Python list of tuples a PCollection of tuples.

Each of the tuples themselves has a list in the second position. So, the next step is to split the second part. Here, because the input is a single tuple, and the output has 100, we need to use a FlatMap (use a Map for 1:1 transformations, FlatMap for 1:many):

where split_verbs is:

As before, we then run the program for each (noun, verb):

Note the idiom here when using Map — just pass in a function name if it takes only one parameter, but it takes multiple parameters, then use a lamda to specify the positional/named parameters.

Then, we filter the set of outputs for the desired output value:

where filter_for_desired is:

Here, again, we see that we are using FlatMap because it is not a 1:1 transform. The input consists of the results of processing 100x100 intcodes and the output is a single one that matches the desired output.

Day 3a (Concepts: GroupByKey, CoGroupByKey, Top)

This time, we are given a set of navigation and need to first retrieve a set of points. We can do it by building a Python list:

Once we have the locations for the two wires, we need to find all the intersection points. A scalable way to do this is to find the intersection points in each row. For that, we will make up a key-value collection for each wire where the key is the row number and the value is the columns that the wire touches the grid:

Then, to find the intersections of the wires, we co-join the two collections by row:

This yields something that looks like this, where for row #217, wire1 has 9 points and wire2 has only one point:

Given this structure, we can easily find the intersection points — this is any column that appears in both wire1 and wire2:

Once we find the intersection, then finding the Manhattan distance is quite easy:

Note that to find the minimum distance, I’m using the TopCombineFunction, asking Beam to find the top 1, and because Top normally finds the greatest, I tell it to reverse the comparison.

Day 3b: signal delay

The second part basically changes the metric. Instead of the Manhattan distance, we need to minimize the total number of steps to reach the intersection. While we could compute the Manhattan distance simply from the row, col of the points, now we have to keep track of the steps as well. We can modify the code from the previous section to store the column as well as the step number:

Then, when doing the search, we make sure to store the total number of steps, which is the signal delay:

Instead of the Manhattan distance, minimize this signal delay and you’ve got it.

Written by

Data Analytics & AI @ Google Cloud

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store