From Optimization to Prescriptive Analytics

Author: William Vorhies

Summary: True prescriptive analytics requires the use of real optimization techniques that very few applications actually use. Here’s a refresher on optimization with examples of where and how they’re best used.

Predictive analytics and optimization have gone hand in hand since the very beginning. But in 2014 some erudite journal decided we needed another phrase for this combo and it became Prescriptive Analytics, theoretically differentiating what could happen (predictive) from what should happen (prescriptive) through the application of optimization.

Originally I felt strongly that this was a distinction without a difference and only served to confuse our customers who were having a hard enough time five years ago understanding why they should even be doing predictive. Time passed. I was over ruled and prescriptive analytics became a fixed part of our nomenclature.

As our attention has been pulled increasingly to AI, the greater business value by far is still being generated by ML and predictive models. And although a great deal of lip service is being given to using models to determine what should be done, very little of this involves true optimization.

The typical business uses such as ‘next best offer’ or churn reduction or even fraud detection are much more likely to be used in the context of a rules based engine or even robotic process automation. It seems time to review where true optimization fits in this and to outline some of the problem types where optimization should be used.

Confusion over Optimization

Any quick survey of the literature will find a definition for optimization similar to this:

The act of obtaining the best results under the given circumstances. The goal being to minimize the required effort or maximize the desired benefit.

The problem is that this definition is much too broad to be used in data science. Starting with differentiating between single variable and multiple variable optimization.

Single Variable versus Multiple Variable Optimization

To clarify, we are talking about the dependent or output variable which is the goal of every predictive model. We apply an appropriate cost function and use multiple techniques to get a best fit model. In almost all cases this involves a gradient descent calculation like SGD which is itself the very definition of optimization.

So although every model we create is essentially an optimization, this is decidedly not what we’re talking about when talking about optimization in the context of prescriptive analytics.

True optimization problems need to be expressed in terms of at least two (seemingly conflicting) functions. For example:

In pricing, select the optimum price that maximizes both total revenue and profit.
In scheduling and routing (the traveling salesman problem) select the route that is the shortest and also allows visits to all necessary locations.
In physical machinery, for example in this case a waste incinerator, select the greatest throughput volume that does not exceed maximum CO2 output levels.
In human purchase behavior, select the combination of promotional offers that both maximizes sales and profit.

It should be evident that each of these examples contains conflicting goals. Maximizing unit sales for example could easily be achieved by allowing price to fall well below cost. However maximizing for both unit sales and profit requires looking at where these two different curves intersect, showing both the lowest price to earn maximum unit sales while at the same time earning the greatest profit.

Defining the Problem

Optimization problems are said to require a four part definition:

Decision Variable: The measure to be optimized (maximized or minimized, e.g. profit, travel time).

Objective: Measured by minimization or maximization.

Constraints: This is the most important element and describes the formulas by which the two conflicting forces behave. For example trash throughput versus CO2 production, or revenue versus profit.

Bounds: Bounds define the limits of the feasible solutions. It might be something seemingly obvious like prices cannot be a negative number or that the operating temperature of the waste incinerator cannot exceed some logical physical maximum temperature. Many optimization problems don’t define bounds so their solutions are open to any feasible solution in the search space.

More Than Two Variables

While it’s theoretically possible to have three or more variable to be optimized simultaneously, typically this involves such computational complexity that it’s not recommended.

For example, in the profitability example we might seek to select the promotion that 1.) maximizes sales, 2.) maximizes profit, and 3.) focuses only on a specific characteristic of merchandise such as inventory that’s over 60 days old. It’s almost always better to reduce the third variable to a bounds, and if necessary segment the problem by different bounds groupings.

Solution Techniques and Other Considerations

The mathematics and techniques available for optimization are almost as complex and multi-faceted as the techniques of machine learning itself. Start with the issue of whether the challenge falls into the category of ‘Convex Optimization Problems’ or ‘Non-Convex’.

Convex optimization problems can be shown to have a single optimum answer. Imagine the case of simply charting the two opposing constraint functions to see where they intersect, representing the single optimum solution.

More common and certainly familiar to data scientists are the ‘non-convex’. It’s easy to visualize these as the complex solution spaces we all encounter with multiple local minima and maxima. There are dozens of techniques to address these situations.

One is to bound the problem to a narrow group of data that is likely to have only a single local optima. Frequently however it’s necessary to use a multiple-start technique that randomly begins the search process at different places in the solution space and collects a group of local optima that can subsequently be compared. Evolutionary genetic algorithms are favored in this group. Still no guarantee that the true optima has been located.

Swarm techniques of which there are many with many clever names like Ant Colony, Firefly optimization, or Bee optimization are available. These are multiple start techniques but instead of sequential iterations, these are agents which start simultaneously. Despite the connotation of ‘swarm’, generally about 10 or 12 independent agents are all that’s required. Here’s a somewhat comprehensive list of optimization techniques.

Ant colony optimization (ACO); I Dorigo and Stutzle (2004)
Artificial immune system optimization; Cutello and Nicosia (2002)
Bacterial foraging optimization; Kim, Abraham and Cho (2007)
Bee optimization; Karaboga and Bosturk (2007) Pham et al (2006)
Cuckoo algorithm; Yang and Deb (2009, 2010)
Differential evolution (DE); Storn and Price (1995, 1997)
Firefly optimization; Yang (2010)
Fish optimization; Huang and Zhou (2008)
Genetic algorithms (GA); Haupt and Haupt (2004)
Particle swarm optimization (PSO), Binary Particle Swarm Optimization (BPSO); Eberhart and Kennedy (1995)
Raindrop optimization; Shah-Hosseini (2009)
Simulated annealing; Kirkpatrick, Gelatt and Vecchi (1983)
Biogeography-based optimization (BBO),
Chemical reaction optimization (CRO)
A group search optimizer (GSO),
Imperialist algorithm
Swine flow Optimization Algorithm.
Teaching Learning Based Optimization (TLBO)
Bayesian Optimization Algorithms (BOA)
Population-based incremental learning (PBIL)
Evolution strategy with covariance matrix adaptation (CMA-ES)
Charged system search Optimization Algorithm
Continuous scatter search (CSS) Optimization Algorithm
Tabu search Continuous Optimization
Evolutionary programming
League championship algorithm
Harmony search Optimization algorithm
Gravitational search algorithm Optimization
Evolution strategies Optimization
Firework algorithm, Ying Tan, 2010
Big-bang big-crunch Optimization algorithm, OK Erol, 2006
Artificial bee colony optimization (ABC), Karaboga, 2005
Backtracking Search Optimization algorithm (BSA)
Differential Search Algorithm (DSA) (A modernized particle swarm optimization algorithm)
Hybrid Particle Swarm Optimization and Gravitational Search Algorithm (PSOGSA)
Multi-objective bat algorithm (MOBA) Binary Bat Algorithm (BBA)
Flower Pollination Algorithm
The Wind Driven Optimization (WDO) algorithm
Grey Wolf Optimizer (GWO)
Generative Algorithms
Hybrid Differential Evolution Algorithm with Adaptive Crossover Mechanism
Lloyd’s Algorithm
One Rank Cuckoo Search (ORCS) algorithm: An improved cuckoo search optimization algorithm
Huffman Algorithm
Active-Set Algorithm (ASA)
Random Search Algorithm
Alternating Conditional Expectation algorithm (ACE)
Normalized Normal Constraint (NNC) algorithm

This is not to say that you need to master more than a few of these, but you should be aware that selecting the right algorithm is not simply selecting whatever default optimizer might be available in your favorite package.

If you find an opportunity for true optimization, and not just an application of a simple rule based on model score, then you may be able to create a truly optimized outcome for your organization.

Other articles by Bill Vorhies

About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2.0 million times. He can be reached at:

Bill@DataScienceCentral.com or Bill@Data-Magnum.com

Go to Source