How to do product mix optimization in real-time

Linear programming on streaming data within an Apache Beam pipeline

4 min readFeb 2, 2022

Not everything needs to be machine learning! Classical operations research methods can be quite good for optimizing processes, product-mixes, and logistics if you understand the constraints you are operating under. You can even operate on streaming data within an Apache Beam pipeline to achieve real-time control and agility.

Problem

Imagine that you have a manufacturing facility that can produce 4 types of products (A, B, C, and D) from 4 ingredients (dye, labor, water, and concentrate). Moreover, the value of producing A (the profit if you will) is $50, of B is $100 and so on:

Every hour, you get new supplies:

{"dye": 5710, "labor": 870, "water": 22350, "concentrate": 5010}

We want to maximize the total profit:

50A + 100B + 125C + 40D

How much of A, B, C, and D should you manufacture?

Cost function and Constraints

It depends on how much of each ingredient you need to produce the products. Let’s say that you need 50 mg of dye, 5 person-hours, 300 liters of water and 30 ml of concentrate to make product A and similarly, you know what you need for the other products:

Moreover, you can not ship more than 25 units of B per hour and not more than 10 units of C per hour.

These are the constraints. If you write them up, you will get an equations of this form for dye (since you received 5710 mg of it):

50A + 60B + 100C + 50D <= 5710

The bounds for A is (0, inf) whereas for B, it is (0, 25).

Cost function and Constraints

We can solve a constrained optimization problem like this using linear programming. scipy has a handy implementation:

def linopt(materials):
    import numpy as np
    from scipy.optimize import linprog
    from scipy.optimize import OptimizeResult    # coefficients of optimization function to *minimize*
    c = -1 * np.array([50, 100, 125, 40])
    # constraints  A_ub @x <= b_ub (could also use a_eq, b_eq, etc.)
    A_ub = [
        [50, 60, 100, 50],
        [5, 25, 10, 5],
        [300, 400, 800, 200],
        [30, 75, 50, 20]
    ]
    b_ub = [materials['dye'], 
            materials['labor'], 
            materials['water'], 
            materials['concentrate']]
    bounds = [
        (0, np.inf),
        (0, 25),
        (0, 10),
        (0, np.inf)
    ]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds)
    return np.round(res.x)

Given the input materials:

{"dye": 5170, "labor": 700, "water": 29940, "concentrate": 5160}

the optimal product mix is:

[37. 10. 10. 35.]

Apache Beam pipeline

Wrap this in an Apache Beam pipeline, and you can basically solve the problem for every JSON message you receive in a Pub/Sub topic.

For trying it out, let’s use a text file:

    p = beam.Pipeline(options=options)
    (p
      | 'ingest'    >> beam.io.ReadFromText(INPUT)
      | 'optimize'  >> beam.Map(lambda x : linopt(json.loads(x)))
      | 'output'    >> beam.io.WriteToText(OUTPUT)
    )
    result = p.run()

Carrying inventory

In reality, we’d want to carry some inventory. If we receive 30 mg of dye, and use only 28 mg, we’d keep the remaining 2 mg for the next iteration. We can do that with Apache Beam by maintaining a stateful variable:

class Inventory:
    def __init__(self):
        # assume water and labor can not be stored in inventory
        self.dye = 0
        self.concentrate = 0    def update(self, leftover):
        self.dye = leftover[0]
        self.concentrate = leftover[3]

The optimization method is modified to use and update the inventory:

def linopt(materials, inventory):
    ...
    b_ub = [
        materials['dye'] + inventory.dye,
        materials['labor'],
        materials['water'],
        materials['concentrate'] + inventory.concentrate
    ]
    
    ...    leftover = b_ub - np.matmul(A_ub, qty)
    inventory.update(leftover)
    return qty

The pipeline is modified to use the inventory as a side input:

inventory = Inventory()(p
                    | 'ingest' >> beam.io.ReadFromText(INPUT)
                    | 'parse' >> beam.Map(lambda x: json.loads(x))      
                    | 'optimize' >> beam.Map(lambda x: linopt(x, inventory))
                    | 'output' >> beam.io.WriteToText(OUTPUT)
    )

Now, the remaining inventory from the previous timestep is used in addition to materials received:

inventory from previous step: 
[1340.0, 5.0, 140.0, 278895.0]material received: 
{'dye': 5000, 'labor': 770, 'water': 20210, 'concentrate': 5300}total available:
[6340.0, 770, 20210, 284195.0]qty:
[ 0. 17.  0. 65.]inventory for next step:
[2070.0, 20.0, 410.0, 281620.0]

The full code is on GitHub. Enjoy!

How to do product mix optimization in real-time

Linear programming on streaming data within an Apache Beam pipeline

Problem

Cost function and Constraints

Cost function and Constraints

Apache Beam pipeline

Carrying inventory

Written by Lak Lakshmanan

No responses yet