Recognizing the limitations of current approaches to ML and AI (and finding a role for multi-scale modeling)


Thanks to all the participants for an excellent and enlightening conference. The current AI "Indian Summer" (more on this label later) presents an opportunity to the multiscale modeling community that is, in many ways, a recapitulation and extension of the "Big Data Revolution" of roughly five years ago. That it is an extension should come as little surprise, as current approaches to ML (primarily neural-net based) represent a newish set of technologies and methods for achieving the large-scale correlation-identification goals set forth by Big Data. It is perfectly natural that the highly publicized accomplishments of Deep Learning in other domains (image identification, game-playing, optimizing click-bait) should find a ready audience in the biomedical community. But, given the potentially mission-critical nature of biomedical decisions, it behooves us to be cognizant of what actually lies underneath the AI hype: as with all analytical tools, we need to understand their limits in addition to their recognized benefits. Hence my label "AI Indian Summer," representing a particular period of exuberance in the biomedical community, even as the AI community itself warns of the next "AI Winter": https://blog.piekniewski.info/2018/05/28/ai-winter-is-well-on-its-way/

But winter is survivable, as long as you know it’s coming and have made the appropriate preparations. We can look to the AI community itself for cautionary notes regarding what the current approaches can and cannot do.

The first comes from Judea Pearl, winner of the Turing Award in 2011 for his work on AI and causal inference. As noted above, current ML is primarily a statistical tool, focused on identifying deep correlations between data sets. For problems that need only correlation (i.e. classification, diagnosis, prognosis), ML and current AI are well suited. However, as we are taught in Science 101, "correlation is not causation," and analysis and operations over inferences of causality, particularly mechanistic causality (as opposed to statistical/Granger causality), underpin the search for medical therapeutics and the development of control strategies for biological systems. A recent article from Pearl summarizes the limitations of current ML approaches in evaluating causality and counterfactuals: "Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution": https://arxiv.org/abs/1801.04016

The second comes from Gary Marcus, a cognitive scientist at New York University and one of the most prominent critics of some of the purported claims in the current wave of AI. While many of his critiques concern the pathway toward a true artificial general intelligence (and focus on the need for some innate symbolic-reasoning framework in such a system), many are particularly germane for the biomedical community as it strives toward adoption of ML and AI. Since, for all intents and purposes, the current state of the art in AI (at least in the public understanding) refers to Deep Learning, an excellent overview of its limits can be found in the article "Deep Learning: A Critical Appraisal": https://arxiv.org/abs/1801.00631 (since expanded to book length in the recently published "Rebooting AI"). Marcus has proposed six simple skeptical questions to ask when reading about an AI project (reproduced from "Rebooting AI"):

1. Stripping away the rhetoric, what did the AI system actually do here? (Does a “reading system” really read, or does it just highlight relevant bits of text?)

2. How general is the result? (For example, does an alleged reading task measure all aspects of reading, or just a tiny slice of it? If it was trained on fiction, can it read the news?)

3. Is there a demo where I can try out my own examples? (If you can’t, you should be worried about how robust the results are.)

4. If the researchers—or their press people—allege that an AI system is better than humans, then which humans, and how much better? (Was the comparison with college professors, who read for a living, or bored Amazon Mechanical Turk workers getting paid a penny a sentence?)

5. How far does succeeding at the particular task actually take us toward building genuine AI? (Is it an academic exercise, or something that could be used in the real world?)

6. How robust is the system? Could it work just as well with other data sets, without massive amounts of retraining? (For example, would a driverless car system that was trained during the day be able to drive at night, or in the snow, or if there was a detour sign not listed on its map?)

Question #6 is particularly relevant to potential biomedical applications of ML/AI/DL, as it pertains to the generalizability/brittleness of a trained system and is of considerable impact given the heterogeneity of biological systems.

Based on these insights, I see the following as key limitations of current ML/AI that specifically impact potential biomedical applications:

1. Generalizability: Is the system able to perform outside the boundaries of its initial training/testing set? (Recognize that the traditional 80/20 or 70/30 train/test approach cannot address this issue; given the quality of modern ML learning algorithms, good performance on a held-out split is almost guaranteed.) A manifestation of this is the well-known fact that the performance of ML systems degrades once they are released into the wild and encounter data not previously seen (which is why applications from Google and Amazon are perpetually updating and retraining; analogous to the Red Queen effect).

2. Brittleness/Overfitting: This is related to Generalizability. Is the training/test set representative of the distribution of outcomes/data in the real world? Biological data is inevitably sparse (compared to the data behind successful ML applications in image identification, text analysis and game playing), and current methods are extremely data hungry.

3. Interpretability: Neural networks are black boxes; we cannot readily gain insight into why they do what they do. As such, "positive results" may be due to artifacts in the data set completely distinct from the intended target of the classifier (e.g. selecting based on the type of machine that generated an image, exploiting selection bias in the groups of provided data, or classifying based on a specific annotator of the data…).

4. Inability to evaluate causality: See comments from Pearl.

5. Cannot evaluate anything “new”: The data must exist in order to train the NN, therefore it can only evaluate what has already been done. It cannot hypothesize the effect of the possible (a crucial step in developing new therapies).
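The generalizability concern in point #1 can be made concrete with a toy experiment (entirely hypothetical; the data, classifier, and numbers below are invented for illustration, not drawn from any study). A classifier can score well on a standard 80/20 held-out split while still failing on a shifted population, because the held-out data comes from the same distribution as the training data:

```python
# Hypothetical sketch: why an 80/20 train/test split cannot detect
# out-of-distribution failure. We fit a nearest-centroid classifier on
# 1-D data drawn from two Gaussians, then evaluate on (a) a held-out
# split from the SAME distribution and (b) a "shifted" population,
# e.g. a different patient cohort or a different lab's instrument.
import random
random.seed(0)

def sample(mean, n):
    return [random.gauss(mean, 1.0) for _ in range(n)]

# Training population: class 0 centered at 0, class 1 centered at 3.
data = [(x, 0) for x in sample(0.0, 500)] + [(x, 1) for x in sample(3.0, 500)]
random.shuffle(data)
train, test = data[:800], data[800:]          # the traditional 80/20 split

# "Fit": one centroid per class; predict by nearest centroid.
c0 = sum(x for x, y in train if y == 0) / sum(1 for _, y in train if y == 0)
c1 = sum(x for x, y in train if y == 1) / sum(1 for _, y in train if y == 1)
predict = lambda x: 0 if abs(x - c0) < abs(x - c1) else 1

def accuracy(pairs):
    return sum(predict(x) == y for x, y in pairs) / len(pairs)

# Shifted population: both class means moved up by 2 units.
shifted = [(x, 0) for x in sample(2.0, 500)] + [(x, 1) for x in sample(5.0, 500)]

print(f"held-out (same distribution): {accuracy(test):.2f}")    # high
print(f"shifted population:           {accuracy(shifted):.2f}")  # degraded
```

The held-out accuracy is reassuring but uninformative about the shifted cohort, which is precisely the Red Queen dynamic described above.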

So what to do? Clearly ML and AI represent powerful and potentially beneficial tools, but how can we utilize them while addressing their known limitations? My contention is that mechanism-based dynamic computational modeling (MBM) can serve in a complementary fashion to address the limitations noted above:

1. MBMs are generative functions: While not completely “general” (i.e. not natural laws), MBMs essentially represent some common basis of what is similar and conserved (i.e. functions) that underpins a wide range of data/behavior/phenotypes.

2. The generative capacity of MBMs can produce synthetic data in a fashion "akin"* to the real-world system. This is in distinction to statistical means of generating synthetic data, which include assumptions about the underlying distribution (assumptions that are often ungrounded for data representing interacting components). *Note: we need to recognize that the MBMs themselves are necessarily abstractions and incomplete…

3. MBMs are explicit instantiations of hypotheses and knowledge, and therefore intrinsically represent some interpreted set of mechanisms.

4. MBMs are inherently mechanistically causal.

5. MBMs can instantiate things yet to be. What can be imagined can be modeled.
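To make points #2 and #5 concrete, here is a minimal sketch of an MBM as a generative function (the model, its variable names, and its parameter values are all invented for illustration; no claim is made that they describe any real biological system). Because the mechanism is explicit, the same model that generates synthetic "observed" trajectories can also simulate a counterfactual intervention that has never been tried:

```python
# Hypothetical sketch: a two-variable mechanism-based model, with a
# pathogen P and an inflammatory mediator M, integrated with forward
# Euler. The mechanism (logistic pathogen growth, immune-mediated kill,
# pathogen-driven mediator production, first-order mediator decay) is
# stated explicitly, so the model is interpretable and causal by
# construction. All parameters are illustrative assumptions.
def simulate(growth=0.8, kill=0.5, activate=0.3, decay=0.2,
             p0=0.1, m0=0.05, dt=0.01, steps=2000):
    p, m = p0, m0
    traj = []
    for _ in range(steps):
        dp = growth * p * (1 - p) - kill * p * m   # growth minus immune kill
        dm = activate * p - decay * m              # driven by pathogen, decays
        p, m = max(p + dp * dt, 0.0), max(m + dm * dt, 0.0)
        traj.append((p, m))
    return traj

baseline = simulate()                # synthetic data for the "observed" system
therapy  = simulate(growth=0.4)      # counterfactual: a drug yet to exist,
                                     # hypothesized to halve pathogen growth

print(f"final pathogen load, baseline: {baseline[-1][0]:.3f}")
print(f"final pathogen load, therapy:  {therapy[-1][0]:.3f}")
```

The `therapy` run is exactly the kind of "evaluate the possible" that a purely data-trained system cannot perform: no training data for the intervention exists, yet the mechanistic structure lets us generate a plausibility estimate of its effect.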

Clearly MBMs come with their own limitations, but I would assert that the development of rigorous methods for integrating MBM and ML can allow each method to compensate for the limitations of the other. To a great degree, each represents a technological tool targeting a specific portion of the Scientific Cycle (ML => establishing the correlations that form hypotheses via abduction; MBM => accelerating hypothesis testing to falsify and determine plausibility) (https://stm.sciencemag.org/content/2/41/41ps34). The challenges of ML are excellently summarized in this report from the DoE Office of Scientific and Technical Information: https://www.osti.gov/biblio/1478744. I am hopeful that the MSM community will take the opportunity to build on this workshop as we move forward toward a more comprehensive scientific approach to the challenges of biomedicine.

- Gary An
