Model Transparency and the Complication of Model-Based vs Model-Free Learning

Author: William Vorhies

Summary: High-stakes models, like those that allocate scarce resources to competing hospitals, are headline news. New thinking that contrasts model-based with model-free learning is emerging to describe conditions we must consider before building or evaluating those models.

At this point we’re all familiar with the conversation around model transparency. Whether it is driven by a common-sense desire for understanding or by regulation, as in finance, it seems like a fairly straightforward goal.

In the context of supervised model building, where almost all of us live, bias in a model is assumed to be the result of a dataset that doesn’t truly represent the whole population. Occasionally we might come across an example where the decision-making process represented by the model is flawed, but this is rare.

We wrote recently of the dangers of using models where the impact on an individual or a group might have life-changing implications. This is not simply deciding who gets the promotion or the special price. This is about who gets bail or which hospitals receive how many masks and respirators.

The good news is that most of us will probably never be in the position of having to build or defend a model with such wide ranging implications. The bad news is that the better we get at model building and the more advanced data science becomes, the greater will be the desire to ‘let the machine decide’.

Our upbringing in this modern era still causes us to think that machines don’t make mistakes. They are built to perform their function, and unlike humans will perform that flawlessly each and every time. The complication is that we are being lulled into the belief that AI is just another machine that will always give the right answer.

And it should in fact always give the same (or a consistent) answer. But will we all believe the answer is ‘right’, especially when the model is allocating scarce resources?

Planning for Moral AGI

Fortunately there are folks thinking about this. The paper “Building Machines That Learn and Think about Morality” by Christopher Burr and Geoff Keeling is a good place to start.

The context of this is planning for AGI, Artificial General Intelligence. That world is still divided between those trying to make computers learn and think like humans and those who believe the same results can be had more simply by allowing the computer to discover certain heuristic rules that explain most behavior.

AGI is many years off, but like self-driving cars it’s most likely to arrive in stages, from augmentation through full autonomy. Burr and Keeling are attempting to identify some questions and approaches for researching one of the more knotty aspects of AGI: will we agree that the answers our AGI entity provides are good? In this case ‘good’ needs to be understood as qualitative, as in moral or ethical.

The reason this is relevant now is that we have entered the age where we are calling on models to guide decisions that have strong moral or ethical implications. By the way, ‘moral’ and ‘ethical’ are both defined as describing “right behavior” which immediately suggests a subjective component.

Why Should Data Scientists Care

From a data science perspective the aspect that caught my attention was the distinction between model-based versus model-free learning. These phrases are beginning to pop up in our literature so we’re using this topic to shed a little light here.

These terms derive from reinforcement learning and represent two competing schools of practice. A model-based RL system starts with a detailed understanding of its environment, and thereby of its limits and capabilities. A model-free RL system simply conducts trial-and-error actions until a solution is found. A full model of the environment isn’t judged to be necessary; the RL will figure it out eventually.
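To make the distinction concrete, here is a minimal sketch on a toy problem. Everything in it is an assumption for illustration (the five-state corridor, the `step` dynamics, the function names and the learning parameters); none of it comes from Burr and Keeling. The model-based agent is handed the environment's rules and plans with them (value iteration), while the model-free agent can only act and observe rewards (tabular Q-learning).

```python
import random

N_STATES = 5        # toy corridor: states 0..4, state 4 is the goal
ACTIONS = (0, 1)    # 0 = move left, 1 = move right
GAMMA = 0.9         # discount factor

def step(state, action):
    """Deterministic toy dynamics: returns (next_state, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done

def backup(state, action, values):
    """One-step lookahead value of taking `action` in `state`."""
    nxt, reward, done = step(state, action)
    return reward + (0.0 if done else GAMMA * values[nxt])

def plan_model_based(sweeps=50):
    """Model-based: the agent knows `step` and plans by value iteration."""
    values = [0.0] * N_STATES
    for _ in range(sweeps):
        for s in range(N_STATES - 1):
            values[s] = max(backup(s, a, values) for a in ACTIONS)
    return [max(ACTIONS, key=lambda a: backup(s, a, values))
            for s in range(N_STATES - 1)]

def learn_model_free(episodes=500, alpha=0.5, epsilon=0.2, seed=0):
    """Model-free: the agent only samples `step` by trial and error (Q-learning)."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        s = 0
        for _ in range(100):                      # cap on episode length
            if rng.random() < epsilon:
                a = rng.choice(ACTIONS)           # explore
            else:
                best = max(q[s])                  # exploit, random tie-break
                a = rng.choice([x for x in ACTIONS if q[s][x] == best])
            nxt, reward, done = step(s, a)
            target = reward if done else reward + GAMMA * max(q[nxt])
            q[s][a] += alpha * (target - q[s][a])
            s = nxt
            if done:
                break
    return [row.index(max(row)) for row in q[:-1]]

if __name__ == "__main__":
    print("model-based policy:", plan_model_based())
    print("model-free policy: ", learn_model_free())
```

On a problem this small both agents should settle on the same policy (always move right). The difference is in how they get there: the planner never has to act in the world, while the learner needs hundreds of episodes of experience.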

In the field of human learning these two terms also represent the two originalist schools of thought about how humans learn. The model-based school believes the human infant comes equipped with ‘startup software’ that rapidly (much more rapidly than today’s RL) allows them to organize experiences of the world into successful behaviors and transfer learning between dissimilar circumstances.

The model-free school says that no model is required: the infant learns through simple heuristics that are guided by physical abilities and limitations. For example, if you only have two hands, don’t try to find a solution that requires three. Or more simply, gravity pulls down.

That might all seem too abstract for the average data scientist, but here’s where it gets interesting. The two schools of human learning are gradually converging on an understanding that both types, model-based and model-free, act together, with one type sometimes more dominant than the other.

Back to Model Building

If you are asked to build or evaluate a ‘high stakes model’ here are some considerations that Burr and Keeling say you should keep in mind.

Model-based systems seem superior at the outset, but the requirement to constantly adapt to a changing environment places the system in a loop of sensing (gaining information about the environment), modeling (building the richly reconstructive representation), planning (choosing a course of action to reach the goal) and action. This SMPA loop is computationally intensive and results in either a time cost or an accuracy cost that needs to be balanced against how fast the environment is changing.
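The trade-off in the SMPA loop can be sketched with a toy simulation. The setup is entirely hypothetical: an agent on a number line follows a target that jumps from time to time, and the `replan_every` parameter stands in for an expensive sense-and-model step that cannot be afforded on every tick.

```python
def run_smpa(target_positions, replan_every=1):
    """Follow a moving target with a sense-model-plan-act loop.

    `replan_every` > 1 mimics a costly modeling step that runs only on
    some ticks, so the agent acts on a stale picture in between.
    Returns the mean absolute distance between agent and target.
    """
    agent = 0
    believed_target = 0
    total_error = 0
    for tick, true_target in enumerate(target_positions):
        if tick % replan_every == 0:
            observation = true_target        # sense
            believed_target = observation    # model (trivially: trust the reading)
        # plan: take one step toward where we believe the target is
        if believed_target > agent:
            move = 1
        elif believed_target < agent:
            move = -1
        else:
            move = 0
        agent += move                        # act
        total_error += abs(true_target - agent)
    return total_error / len(target_positions)

if __name__ == "__main__":
    # The target sits at 5, jumps to -5, then jumps back to 5.
    targets = [5] * 10 + [-5] * 10 + [5] * 10
    print("refresh every tick:   ", run_smpa(targets, replan_every=1))
    print("refresh every 7 ticks:", run_smpa(targets, replan_every=7))
```

In this toy case the agent that refreshes its model less often ends up further from the target on average, because it keeps acting on a picture of the world that has already changed. How much that matters depends on how often the target moves, which is the balance described above.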

The model-free systems based on simpler heuristics are faster to train in a changing environment but may not transfer learning well outside of a narrow domain. However, a model-free “moral agent is not required to internalize the norms of society in order to ensure their behavior meets certain moral standards, and can potentially make do with a simplified model of the world (or maybe even a set of well-tuned heuristics) when certain institutions act as regulative constraints.”

What is needed is a mechanism that interprets or negotiates between the two approaches. This mechanism has not yet been defined but the authors point to the need for the two models to operate in parallel (presumably where resources and time allow). This is the challenge for the data scientist model builder.
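Since that mechanism is undefined, the following is only one hypothetical shape it could take, not anything the authors propose. The sketch assumes a fast heuristic and a slow planner are both available as plain functions, runs the planner only when a time budget allows, and records whether the two agreed so that disagreements can be surfaced for review. All names (`Decision`, `arbitrate`, `planner_cost`, `time_budget`) are invented for illustration.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class Decision:
    action: str
    source: str               # "model-based" or "model-free"
    agreed: Optional[bool]    # None when only the heuristic ran

def arbitrate(
    case: Any,
    heuristic: Callable[[Any], str],   # fast, model-free recommendation
    planner: Callable[[Any], str],     # slow, model-based recommendation
    planner_cost: float,               # estimated time the planner needs
    time_budget: float,                # time available for this decision
) -> Decision:
    """Hypothetical arbitration rule between the two approaches."""
    fast = heuristic(case)
    if planner_cost > time_budget:
        # No time for the full model: fall back on the heuristic alone.
        return Decision(fast, "model-free", None)
    slow = planner(case)
    # Both ran: prefer the planner, but keep track of disagreement.
    return Decision(slow, "model-based", slow == fast)

if __name__ == "__main__":
    # Toy stand-ins for the two systems.
    nearest_first = lambda case: case["nearest"]
    greatest_need = lambda case: case["neediest"]
    case = {"nearest": "Hospital A", "neediest": "Hospital B"}
    print(arbitrate(case, nearest_first, greatest_need, 0.5, 1.0))
    print(arbitrate(case, nearest_first, greatest_need, 2.0, 1.0))
```

The `agreed` flag is the design choice worth noting: when the two systems disagree on a high-stakes case, that is a natural trigger for human review and for the example-based explanation that transparency requires.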

Back to Bias and Transparency

The larger question for transparency in these high-impact situations is recognizing the limitations of each approach and being prepared to use examples to explain why the model’s recommendation was reached and, by extension, why we should believe it is the ‘right action’.

The authors summarize the current state of conflict between these two approaches. The way we actually arrive at moral decisions frequently has more to do with heuristics than an ideal model. For example, we are much more likely to act to repair or challenge an unfair act if the person impacted is close to us rather than on the other side of the world. Yet in reality the moral/ethical decision should require the right action in both cases. So designing an AI that relies mostly on model-free heuristics may cause it to consistently make suboptimal moral decisions.

Also, systems that emphasize model-free heuristics are less likely to reach acceptable solutions when the circumstances are novel.

However, model-based ‘ideal reasoner’ systems that don’t recognize the imperfect heuristic methods we actually use in moral decision making may appear biased to many. For example, an ‘in group/out group’ bias is easy to spot in many human decisions. A universally consistent solution may seem biased against the ‘in group’, although it is actually eliminating bias against the ‘out group’.

When evaluating moral decision making, we rely heavily on examples for explanations of why a decision was reached. So it goes largely without saying that the higher the stakes of the model, the more transparency is required.

The unsolved challenge that we are addressing here is how future data scientists can build these dual-personality models and balance their outcomes so that the greatest possible number of people agree that the result is “right action”.

About the author: Bill is Contributing Editor for Data Science Central. Bill is also President & Chief Data Scientist at Data-Magnum and has practiced as a data scientist since 2001. His articles have been read more than 2.1 million times.

Bill@DataScienceCentral.com or Bill@Data-Magnum.com