Back to 2018 IMAG Futures Agenda

__MSM Methodology WGs__

**Cell-to-Macroscale Working Group**

**Committee on Credible Practice of Modeling & Simulation in Healthcare**

**Model and Data Sharing Working Group**

**Population Modeling Working Group**

**Theoretical and Computational Methods**

**MSM U01 suggestions for new MSM Methodology challenges:**

- Coarse-graining approaches including coarse-grained versions of molecular dynamics models and sub-cellular models.
- Model reproducibility.
- Molecular-to-cellular link. The role of heterogeneity in both cell types and the cell microenvironment. Connection to omics data.
- Rigorous description of coupling in space and time, similar to matched asymptotic methods in PDEs.
- How to best balance the simplicity that models should bring with the complexity of biology, through a rational integration of biological data and hypotheses in multi-modeling frameworks.
- Combining continuum-based finite element simulation methods (e.g., CFD, structural, and FSI) with particle-based methods (e.g., DPD, MD) to cover the vast range of scales in biological processes.
- Machine learning that takes into account physical principles. More often than not, the data are not only noisy, with uncertainty exceeding precision, but also do not directly measure the underlying processes. The use of physical principles will enable inferences in spite of noisy data.
- Methods to predict regulation from principles (could be combined with inference from data). That heterotrophic central metabolism is generally conserved across metazoans implies that there are common principles for regulation of metabolism. Understanding why cells are regulated, from an operational or dynamic perspective, will lead to predicting how cells are regulated.
- Better integrate engineering/modeling approaches with fundamental biology and its approaches. There is a need for more integration between biology (often taught by a Socratic method and approached using qualitative, focused/reductionist methodologies) and engineering/systems approaches. Cross-talk is needed on both ends, but the language and approaches of each can be inaccessible to the other.
- Strategies for experimental data generation and experimental design that are specifically intended for model development and validation.
- The quantitative description of multiple scales in biological processes is based on theories and models from different fields (e.g., systems biology, continuum mechanics, hydrodynamics). Clinical and health data are heterogeneous, ranging from medical records and imaging data to molecular profiling. To integrate this diversity, MSM can explicitly focus on heterogeneous multiscale methods or Multimodeller Hyper Models, which combine disparate models into one process/simulation framework.
- We are currently developing a protocol to assess form and function in the micro-vasculature of the pulmonary circulation using electron microscopy techniques on both animal and human data. Originally, our project relied on clinical data acquired for the heart and large vessels only. A proper definition and calibration of a MS model of hemodynamics and vascular G&R also necessitates MS data. Thus, bigger emphasis on techniques for MS data acquisition combining different experimental modalities applied to different scales and levels of the model is needed and must be addressed by the MSM Consortium.
- MSM has historically focused on understanding and harnessing order across hierarchies. Recent results suggest that order is critical for properties such as strength, flow, and conduction, whereas heterogeneity and disorder are critical for properties such as toughness, mixing, and redundancy. Modeling the latter is challenging and computationally expensive because techniques such as Monte Carlo approaches must be used. A pressing need exists for sound homogenization methods that predict properties of disordered tissues.
- The MSM consortium needs to encourage improved methods of annotating and sharing biological knowledge. The modeling community is critically dependent on finding relevant data (wet lab or computational) but the current methods of publication and data sharing are very poor. Note that this is an international challenge that must include non-human health researchers.
- We believe that the MSM Consortium would benefit from more focus on common computational techniques rather than the current focus on common application areas.
- There is a paucity of geometric information from which realistic microcirculatory networks can be constructed, as well as limited physiological data on the heterogeneity of NO and pO2 in different organs. Our present computational methods using COMSOL software would not be sufficient to solve complex mass transport interactions in a network with heterogeneous properties.
- Employment of cutting-edge deep learning techniques and development and validation of predictive models based on longitudinal data.
- Providing interfaces/meshing between high-level, top-down phenomenological models of behavior and low-level, bottom-up models of basic physiology.
- While there is, generally speaking, much interest in machine learning and “big data” approaches to modeling complex systems, there is comparatively less emphasis on mechanism-based modeling. I would respectfully suggest that a future MSM challenge could be focused exclusively on the development, application, and uncertainty assessment of mechanism-based, multi-scale models.
- Granularity: Biologic waveforms can be assessed continually, whereas brainstem cytokines are measured at designated time points from homogenized tissue; thus, unique sets of animals are required for each time point. We proposed to align samples post hoc. However, our measurements of ventilatory pattern variability over 5-min epochs can vary during the transition period. By the time our measures stably indicate low variability and high predictability, the animal may already be in sepsis.
- Multi-fidelity modeling via machine learning of multiscale biological systems
- Physics/Biology-informed learning machines for discovering new models
- How to link, in a mechanistic rather than phenomenological way, molecular mechanisms to cell behavior. To what extent is the more mechanistic linkage useful and predictive, in a way that informs molecular interventions?
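The suggestion above about machine learning that takes physical principles into account can be made concrete with a minimal sketch. Everything here is an invented toy problem, not a method from the list: a decay rate k for the assumed law dy/dt = -k y is estimated from noisy samples by adding a penalty for violating that law, which stabilizes the fit against the noise.

```python
# Toy physics-informed fit: estimate k in y(t) = Y0 * exp(-k t) from noisy
# samples, penalizing violation of the known law dy/dt = -k y.
# K_TRUE, the noise level, lambda_phys, and the grid are all illustrative.
import math
import random

random.seed(0)
K_TRUE, Y0 = 0.7, 2.0
ts = [0.1 * i for i in range(30)]
noisy = [Y0 * math.exp(-K_TRUE * t) + random.gauss(0.0, 0.15) for t in ts]

def loss(k, lambda_phys=5.0):
    # Data term: squared error of the candidate curve against the samples.
    data = sum((Y0 * math.exp(-k * t) - y) ** 2 for t, y in zip(ts, noisy))
    # Physics term: finite-difference residual of dy/dt + k*y = 0 on the
    # noisy samples; it anchors the fit to the mechanism, not the noise.
    phys = 0.0
    for i in range(len(ts) - 1):
        dydt = (noisy[i + 1] - noisy[i]) / (ts[i + 1] - ts[i])
        ymid = 0.5 * (noisy[i] + noisy[i + 1])
        phys += (dydt + k * ymid) ** 2
    return data + lambda_phys * phys

# Coarse grid search over candidate rates; a real application would use a
# proper optimizer and a learned, rather than fixed, model form.
k_hat = min((0.01 * j for j in range(1, 200)), key=loss)
```

The same structure (data loss plus physics-residual loss) is what physics-informed learning applies at scale, with neural networks in place of the closed-form curve.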

**New MSM Methodology challenges - Draft Outline**

**Contacts: Jason Haugh (jason_haugh@ncsu.edu), Saleet Jafri (sjafri@gmu.edu)**

I. Spanning Scales in Multiscale Modeling

1. Coarse-graining approaches bridging molecular dynamics and sub-cellular models.

2. Linking molecular fundamentals to cellular behavior in a more mechanistic way, to inform molecular interventions. Connection to omics data.

3. Linking cellular and tissue scales. Modeling heterogeneity of cell types and of the cell microenvironment. Developing sound homogenization methods that predict properties of disordered tissues.

4. Combining continuum-based finite element simulation methods (e.g., CFD, structural, and FSI) with particle-based methods (e.g., DPD, MD) to cover the vast range of scales in biological processes. Common computational techniques with an emphasis on rigorous coupling in space and time.

5. Providing interfaces/meshing between high-level, top-down phenomenological models of behavior and low-level, bottom-up models of basic physiology.

II. Combining Data Analytics with Multiscale Modeling

6. Development, application, and uncertainty assessment of mechanism-based, multi-scale models.

7. Multi-fidelity modeling via machine learning of multiscale biological systems. Physics/Biology-informed learning machines for discovering new models. Machine learning that takes into account physical principles, enabling inferences to be made in spite of noisy data.

8. Methods to predict regulation from principles (could be combined with inference from data). Understanding why cells are regulated, from an operational or dynamic perspective, will lead to predicting how cells are regulated.

9. Employment of cutting-edge deep learning techniques and development and validation of predictive models based on longitudinal data.

III. Integration of Experiment with Multiscale Modeling

10. Balancing the simplicity that models should bring with the complexity of biology through integration of biological data and hypotheses in multiscale modeling frameworks.

11. Strategies for experimental data generation and experimental design that are specifically intended for model development and validation.

12. Use of various imaging modalities to acquire dynamical data and to acquire realistic geometric information at various biological scales.

13. Developing heterogeneous multiscale methods to combine disparate descriptions (theories and models from different quantitative fields and biological and clinical data) into one process/simulation framework.

14. Enhancing communication (language and approaches) between experts in engineering/modeling and experts in experimental biology.

IV. Model Reproducibility and Sharing

15. Frameworks for defining and validating model reproducibility.

16. Improved methods of annotating and sharing biological knowledge, both experimental and computational.
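Outline item 7 (multi-fidelity modeling) can be illustrated with a minimal sketch: a cheap low-fidelity model is corrected by a discrepancy model fitted to a few expensive high-fidelity runs. The functions and sample points below are invented for the example; real multi-fidelity methods use the same structure with richer surrogates.

```python
# Minimal multi-fidelity sketch: low-fidelity model + fitted discrepancy.
import math

def f_high(x):           # "expensive" ground-truth model (hypothetical)
    return math.sin(2 * x) + 0.3 * x

def f_low(x):            # cheap, systematically biased approximation
    return math.sin(2 * x)

# Only a handful of high-fidelity runs are affordable.
x_hf = [0.0, 1.0, 2.0, 3.0]
disc = [f_high(x) - f_low(x) for x in x_hf]

def fit_line(xs, ys):
    # Ordinary least-squares line through (xs, ys).
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return lambda x, a=a, b=b: a + b * x

corr = fit_line(x_hf, disc)   # simple linear model of the discrepancy

def f_multifidelity(x):
    # Cheap model everywhere, corrected by the learned discrepancy.
    return f_low(x) + corr(x)
```

Because the toy discrepancy happens to be exactly linear, the corrected model reproduces the high-fidelity one; in practice the correction is approximate and the interesting questions are where to place the few high-fidelity runs.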

COMMENTS:

Regarding II-9:

Models extracted from longitudinal data using statistical and machine learning techniques often represent local phenomena observed in one specific data set. Many times those data sets cannot be shared, as in the case of individual-level data, and therefore those models may not validate well on another data set. Techniques such as cross-validation used in machine learning may not preserve elements described by multiple models, each describing a different data set. To preserve the characteristics of different models and data sets, an ensemble-model approach can be employed in which multiple models representing different data sets are merged. This way, knowledge can be accumulated by merging models where data cannot be merged. The Reference Model for disease progression takes this approach: multiple disease models are merged and optimized to best validate against clinical trial summary data. Information on the model is available at https://simtk.org/projects/therefmodel/. In a nutshell, the model accumulates knowledge by (1) accumulating models extracted from different longitudinal data sets, and (2) accumulating validation data (results of clinical trials), which can now be imported from ClinicalTrials.gov. With such a knowledge base, it is possible to obtain a better model that fits the data and to determine which models are more dominant and which should not be used. It is also possible to explore the data and find outliers in it relative to the best model, and to ask questions about the data and the modeling assumptions. The Reference Model was able to deduce that disease models become outdated quickly due to rapid improvement in medical practice; this important finding may prompt future developers to add a time stamp to their models.

Methods: Ensemble Models, Optimization with constraints, Competitive and Cooperative merging of models, Evolutionary Computation for population generation

Future challenges: Merging human interpretation
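The ensemble-merging idea described above can be sketched in a few lines. The two risk models and the trial summaries below are entirely made up; the point is that the merge weight is chosen to best reproduce published summary outcomes, so the underlying individual-level data sets never need to be pooled.

```python
# Hedged sketch of merging models against trial *summary* data.
def model_a(age):  # risk model extracted from data set A (hypothetical)
    return 0.01 * age

def model_b(age):  # risk model extracted from data set B (hypothetical)
    return 0.002 * age ** 1.5

# Published trial summaries: (mean cohort age, observed event rate).
trials = [(50.0, 0.58), (60.0, 0.72), (70.0, 0.92)]

def summary_error(w):
    """Squared error of the w*A + (1-w)*B mixture against trial summaries."""
    return sum((w * model_a(age) + (1 - w) * model_b(age) - rate) ** 2
               for age, rate in trials)

# Optimize the merge weight by grid search; the dominant model gets more
# weight, and a model that never helps would be driven toward weight zero.
w_best = min((i / 100 for i in range(101)), key=summary_error)
```

A full system such as the Reference Model adds many models, constrained optimization, and a growing pool of validation summaries, but the accumulation-by-merging principle is the same.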

Regarding the above comment:

1. One can identify different clusters using different data sets, e.g. imaging clusters, clinical clusters and molecular clusters. A fundamental question is: are they meaningful? For instance, are we able to use cluster membership to improve disease treatment toward personalized medicine? Is cluster membership stable over time?

2. To answer the above questions, longitudinal data is a necessity. Useful longitudinal data for large populations are rare, even those collected by NIH-funded multi-center trials. It is not clear what data one could find on ClinicalTrials.gov.

Posted by Ching-Long Lin

Here are answers to Ching-Long Lin from Jacob Barhak

1. The word cluster was not used in the text above. There are things that can be done with clustering algorithms, yet first enough information needs to be gathered. The real issue is how we can use the knowledge we collect. Currently this knowledge is held in the heads of humans, who do not necessarily see the data in the same way. We want this knowledge available to many machines that collaborate faster than humans and can access much more data than humans can. This is the essence of machine learning techniques that automate reasoning tasks: machine learning describes data as models. Note that the questions above are mostly aimed at human perception rather than at automation, which is what machine learning provides. Humans need better tools to grasp the accumulating pile of data and make better decisions, so we need to develop such techniques.

2. Longitudinal data is one type of data that is indeed rare and, more importantly, not transferable by law in most cases. Focusing on longitudinal data alone therefore has a built-in limitation that diminishes its potency, and other ways should be designed to allow accumulation of knowledge. The Reference Model is one example, and I believe that in the future similar ideas will appear that allow merging of data that cannot be merged directly. The power of longitudinal data is typically sheer size and diversity. Yet the combined summary reports of multiple clinical trials are more powerful, since many more people are involved in those studies overall and diversity is maintained through the differences between studies. ClinicalTrials.gov now holds over a quarter of a million clinical trials, and the database is growing rapidly, roughly 10% a year; until recently studies did not have to report results, but now they do, and more and more studies are reporting results in this NIH-held database. The Reference Model has an interface that can read this results data for purposes of validation. So phenomena observed in one clinical study, where longitudinal data was available and transformed into a model, can now be validated and merged with other studies.

Regarding the post by Misha Pavel about "modeling at the behavioral and cognitive levels" and "recent advances in sensor and communication technology enable accurate measurements and monitoring in real life":

This is an excellent suggestion. One of the major reasons why AI, smart collaborative robotics, and autonomous vehicles/systems will have a major impact in the near future (http://engineering.columbia.edu/kai-fu-lee-speech) is advances in sensor and communication technology, in a broader sense similar to the notions of the Internet of Things (IoT) (https://en.wikipedia.org/wiki/Internet_of_things), Industry 4.0 (https://en.wikipedia.org/wiki/Industry_4.0), or the Apple Health app and ResearchKit & CareKit (https://www.apple.com/ios/health/). This topic is relevant to machine learning. I would welcome Dr. Pavel to elaborate on this topic.

Posted by Ching-Long Lin


With regards to Machine Learning and data analytics:

**Explainable Machine Learning**

**Daniel Jacobson1 and Ben Brown2**

**1Oak Ridge National Laboratory, 2Lawrence Berkeley National Laboratory**

**Figure 1.** We envision that in five to ten years’ time we will see a significant transition toward thinking of individual machine learning technologies as "optimization strategies" rather than ends unto themselves. In this emerging paradigm, the fitted learning engine is a means by which to discover important patterns or dynamics in complex data – patterns which are ultimately represented by response surfaces, or systems of dynamical equations (*etc*.) to render the content of the learning machine interpretable, or explainable, to the data scientist.

The combinatorial space of all of the possible interactions in a complex system (such as a biological organism) is very large (on the order of 10^170 possible interactions in a single human cell), and thus there are considerable computing challenges in searching for these relationships. This same challenge exists in any discipline that can describe a system by input and output matrices or tensors, e.g. the material science genome project, parameter sweeps in uncertainty quantification in simulation driven disciplines, additive manufacturing, process control systems (engineering, manufacturing), text mining (scientific literature, medical records, *etc.*), power grid management, cybersecurity, *etc*. The ability to understand such complex systems will usher in a new era in scientific machine learning. Success in the construction and application of algorithms that provide insight into the interactive mechanisms responsible for the emergent dynamics of complex systems will have transformative effects on virtually every area of science. However, machine learning (ML) techniques are often seen as black boxes, and the few techniques that “open” that box are extremely limited in the information they can obtain (e.g. main effects, pairwise interactions and other lower-order effects, usually of specified form). If current-day statistical machine learning is viewed as a collection of methods for mapping between two matrices or tensors, X and Y, in almost all cases X can be N-dimensional, but Y must be one- or low-dimensional (in some cases the use of N-dimensional Y is impossible and in others the algorithm’s computational cost scales exponentially with N), which severely limits what problems can be solved. In the cases where Y is allowed to be multidimensional, such as DNNs, the model may be able to accurately predict Y, but the process of transforming X into Y is obscured by the model, and hence scientific insight is extremely limited.

We view modern machine learning for science as composed of two primary thrusts, which we discuss below. We conclude by pointing to a near-future transition away from focusing on the “optimization strategy”, and toward “interpretation strategies” that will ultimately be independent of the underlying fitted learner – a vision we sketch in Fig. 1.

**Translating ensemble models based on histograms and other density estimates**

Since their development by Breiman [1] in the late ’90s and early 2000s, random forest (RF) methods have been extremely effective and widely used techniques in ML for both prediction and feature importance determination. Until the dawn of deep learning, RFs dominated the various machine learning leaderboards. The effectiveness of RF methods derives from the ability of decision trees to describe very precise rules about a given data set, and from the use of a collection of decision trees as an ensemble, with each tree using random subsets of the data and random subsets of parameters to limit the bias inherent in any one individual decision tree. Ensemblization of weakly dependent learners, such as randomized trees, leads to provable convergence and consistency in impressively general settings [1]. Trees are identically histograms: they are fit in a data-adaptive fashion, but they are simply Haar wavelet density estimates obtained through a hierarchical optimization procedure. The ensemble generates a smooth density estimate, or in the case of regression, a smooth response surface. Remarkably, even if no optimization whatsoever is used, the resulting histograms are still intrinsically data adaptive: they are simply histograms with random bin widths, and extremely randomized forests have shown comparable performance to RF [2]. Others have developed non-Haar versions of RF, though it is not yet clear that anything is gained by alternative bases within the trees; the ensemblized smooth estimator is often left unchanged [3]. The incredible flexibility of these algorithms has led to their adoption in virtually every area of science. Recent developments include random intersection trees (RITs) [4] and iterative RFs (iRFs) [5]: density estimates (or response surfaces) that can be mined not only for feature importance but also for interactions between features, even when those interactions are highly localized to subsets of observations and of complex forms.
It is now possible to identify interactions of any form or order at the same computational cost as main effects [4-5], enabled by importance sampling on the space of all subsets (of order 2^*P*). Hence, RF and its many descendants are extremely valuable for scientific inference, e.g. for the mapping of complex response surfaces that underlie natural phenomena (Fig. 2), the construction and parameterization of surrogate simulations, and, broadly speaking, the engineering of processes of previously intractable complexity.
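The core trick behind random intersection trees [4] can be sketched in pure Python: candidate interactions are the feature sets that survive repeated intersection of randomly drawn positive-class observations, so high-order co-occurrences are found without enumerating the 2^P subsets. The data and parameters below are invented; a real implementation would weight observations using the fitted forest, as in iRF [5].

```python
# Toy random-intersection-trees sketch on synthetic binary feature data.
import random

random.seed(1)
# Hypothetical active-feature sets, one per positive-class observation;
# 'f1' and 'f2' truly co-occur, the third feature is noise.
positives = [{'f1', 'f2', random.choice(['f3', 'f4', 'f5'])}
             for _ in range(200)]

def rit_candidates(obs, depth=4, branch=3, n_trees=50):
    """Feature sets surviving repeated random intersections."""
    found = set()
    for _ in range(n_trees):
        sets = [random.choice(obs)]          # root: one random observation
        for _ in range(depth):
            # Each child intersects its parent with a fresh random observation;
            # features not shared across draws are quickly eliminated.
            sets = [s & random.choice(obs) for s in sets for _ in range(branch)]
        found.update(frozenset(s) for s in sets if len(s) >= 2)
    return found

cands = rit_candidates(positives)
```

A spurious feature that appears in a third of the observations survives a depth-4 path with probability about (1/3)^4, while the true pair {'f1', 'f2'} survives every intersection, which is why the search cost does not grow with interaction order.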

However, there is a major bottleneck on the utility of these methods: they specialize in one- or low-dimensional Y. Hence, they are of limited utility in data fusion exercises where we wish to understand a potentially high-dimensional process (e.g. a tensor) as a function of another high-dimensional process. Classically, one would simply regress each phenotype one at a time, but this is costly, and also lossy unless we care only about E(Y|X), which may be far from true. We are often interested in the joint distribution of Y and X in detail; e.g. we may care enormously about the variance of Y given X for process control or stability analysis. Tensor-ready versions of ensemblized density estimators have the potential to open fundamentally new areas of research in statistical machine learning: such algorithms will inherit the astounding interpretability of RF “for free”, and stand to change the way we think about coupled high-dimensional processes.

**Translating Deep Neural Networks and architectures based on stacked ensembles of linear models**

We view modern machine learning as having two primary thrusts: the first we discussed above, and the second is of course Deep Learning. These algorithms top virtually every ML leaderboard and provide astounding predictive power. However, they are currently “alchemical”, in that we have little insight into why they work. The models have millions of parameters, usually far more than training observations. It has been observed that different optimization engines provide differing degrees of generalization error, and for the most part these phenomena remain mysterious [6]. Interpretation strategies for Deep Learners have focused on understanding the activation profiles of individual neurons or filter selection, e.g. [7], but these are a far cry from the response surfaces extractable from RF-based algorithms. Research into response surface extraction from DNNs is an important, though slowly emerging, frontier, and fundamentally new ideas are needed.
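One of the simplest members of the interpretation family mentioned above is input-gradient saliency: differentiate the network output with respect to its inputs to see which inputs the prediction is sensitive to. The sketch below does this by hand for a tiny fixed two-layer network; the weights are made up so that input 0 should dominate.

```python
# Hand-computed input-gradient saliency for a tiny fixed network:
# 2 inputs -> 2 tanh hidden units -> linear output. Weights are illustrative.
import math

W1 = [[2.0, 0.0],   # hidden unit 0 listens strongly to input 0
      [0.0, 0.1]]   # hidden unit 1 listens weakly to input 1
w2 = [1.0, 1.0]

def forward(x):
    h = [math.tanh(W1[j][0] * x[0] + W1[j][1] * x[1]) for j in range(2)]
    return w2[0] * h[0] + w2[1] * h[1]

def saliency(x):
    """Input gradient d(output)/d(x_i) via the chain rule (tanh' = 1 - tanh^2)."""
    pre = [W1[j][0] * x[0] + W1[j][1] * x[1] for j in range(2)]
    return [sum(w2[j] * (1 - math.tanh(pre[j]) ** 2) * W1[j][i]
                for j in range(2))
            for i in range(2)]

g = saliency([0.3, 0.3])
# g[0] dominates g[1], mirroring the weight structure the network encodes.
```

As the text notes, such pointwise gradients are a far cry from a global response surface: they describe sensitivity only at one input, which is precisely the gap that response-surface extraction from DNNs aims to close.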

**Figure 2.** An iRF-derived response surface showing AND-like rule structure in a biological process.

**Interpreting fitted learners – learning to explain what learning machines have learned:**

Currently, when we talk about “a machine learning algorithm” we are referring to a specific optimization strategy for constructing a fitted data representation inside a computer – an RF, or a DNN, *etc*. The scientist usually does not care about the capacity to predict – the scientist wants to understand why prediction is possible, to discover mechanism, to quantify uncertainty, to learn to simulate processes with intractable physics. The future of machine learning in science is therefore not the black box – the black box becomes a means to an end, and regardless of what “optimization strategy” we employed to construct a data representation, the only thing we care about at the end is understanding, in as much detail as possible, that data representation. We care about learning from the learning machine. Hence, we foresee a sea-change in the primary thrust of statistical machine learning for science, with essential new foci on interpretation strategies for fitted algorithms, including certainly DNNs and RFs, and the many architectures that will follow, including the hybrids in between [8-9]. In 10 years, when we say, “machine learning”, we will be referring more to what we learn from the machine rather than what the machine has learned from the data. The machine will become a lens through which we view our data – our partner in interpretation and hypothesis generation – as indispensable as the microscope.

**References**

1. Breiman, L. “Random forests.” Machine Learning, 45(1), 5-32 (2001).
2. Mironică, I. and Dogaru, R. “A novel feature-extraction algorithm for efficient classification of texture images.” Scientific Bulletin of UPB, Series C: Electrical Engineering (2013).
3. Menze, B.H., Kelm, B.M., Splitthoff, D.N., Koethe, U. and Hamprecht, F.A. “On oblique random forests.” In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 453-469. Springer, Berlin, Heidelberg (2011).
4. Shah, R.D. and Meinshausen, N. “Random intersection trees.” Journal of Machine Learning Research, 15, 629-654 (2014).
5. Basu, S., Kumbier, K., Brown, J.B. and Yu, B. “Iterative random forests to detect predictive and stable high-order interactions.” arXiv preprint arXiv:1706.08457 (2017). (PNAS, in press, 2018.)
6. Advani, S.M. and Saxe, A.M. “High-dimensional dynamics of generalization error in neural networks.” https://arxiv.org/abs/1710.03667 (2017).
7. Yosinski, J., Clune, J., Nguyen, A., Fuchs, T. and Lipson, H. “Understanding neural networks through deep visualization.” https://arxiv.org/abs/1506.06579 (2015).
8. Biau, G., Scornet, E. and Welbl, J. “Neural random forests.” https://arxiv.org/abs/1604.07143 (2016).
9. Zhou, Z.-H. and Feng, J. “Deep forest: Towards an alternative to deep neural networks.” arXiv preprint arXiv:1702.08835 (2017).


As I mentioned during the session, Explainable Machine Learning/AI has tremendous application to this field and is a way to capture the higher-order interactions responsible for regulation and function in biological systems across multiple scales.

**Explainable Machine Learning**

**Daniel Jacobson1 and Ben Brown2**

**1****Oak Ridge National Laboratory, 2Lawrence Berkeley National Laboratory**

**Figure ****1****.** We envision that in five to ten years’ time we will see a significant transition toward thinking of individual machine learning technologies as "optimization strategies" rather than ends unto themselves. In this emerging paradigm, the fitted learning engine is a means by which to discover important patterns or dynamics in complex data – patterns which are ultimately represented by response surfaces, or systems of dynamical equations (*etc*.) to render the content of the learning machine interpretable, or explainable, to the data scientist.

The combinatorial space of all of the possible interactions in a complex system (such as a biological organism) is very large (10170 possible interactions in a single human cell), and thus there are considerable computing challenges in searching for these relationships. This same challenge exists in any discipline that can describe a system by input and output matrices or tensors, e.g. the material science genome project, parameter sweeps in uncertainty quantification in simulation driven disciplines, additive manufacturing, process control systems (engineering, manufacturing), text mining (scientific literature, medical records, *etc.*), power grid management, cybersecurity, *etc*. The ability to understand such complex systems will usher in a new era in scientific machine learning. Success in the construction and application of algorithms that provide insight into the interactive mechanisms responsible for the emergent dynamics of complex systems will have transformative effects on virtually every area of science. However, machine learning (ML) techniques are often seen as black boxes, and the few techniques that “open” that box are extremely limited in the information they can obtain (e.g. main effects, pairwise interactions and other lower-order effects, usually of specified form). If current-day statistical machine learning is viewed as a collection of methods for mapping between two matrices or tensors, X and Y, in almost all cases X can be N-dimensional, but Y must be one- or low-dimensional (in some cases the use of N-dimensional Y is impossible and in others the algorithm’s computational cost scales exponentially with N), which severely limits what problems can be solved. In the cases where Y is allowed to be multidimensional, such as DNNs, the model may be able to accurately predict Y, but the process of transforming X into Y is obscured by the model, and hence scientific insight is extremely limited.

We view modern machine learning for science as composed of two primary thrusts, which we discuss below. We conclude by pointing to a near-future transition away from focusing on the “optimization strategy”, and toward “interpretation strategies” that will ultimately be independent of the underlying fitted learner – a vision we sketch in Fig. 1.

**Translating ensemble models based on histograms and other density estimates**

Since their development by Breiman [1] in the late ’90s and early 2000s, random forest (RF) methods have been extremely effective and widely used techniques in ML for both prediction and feature importance determination. Until the dawn of deep learning, RFs dominated the various machine learning leader boards. The effectiveness of RF methods is derived from the ability of decision trees to describe very precise rules about a given data set and the use of a collection of decision trees as an ensemble with each tree using random subsets of the data and random subsets of parameters to limit the bias inherent in any one individual decision tree. Ensemblization of weakly dependent learners, such as randomized trees, leads to provable convergence and consistency in impressively general settings [1]. Trees are identically histograms – they are fit in a data-adaptive fashion, but they are simply Haar wavelet density estimates that are obtained through a hierarchical optimization procedure. The ensemble generates a smooth density estimate, or in the case of regression, a smooth response surface. Remarkably, even if no optimization whatsoever is used, the resulting histograms are still intrinsically data adaptive – they are simply histograms with random bin widths, and “extremely Random Forests” has shown comparable performance to RF [2]. Others have developed non-Haar versions of RF, though it is not yet clear that anything is gained by alternative bases within the trees – the ensemblized smooth estimator is often left unchanged [3]. The incredible flexibility of these algorithms has led to their adoption in virtually every area of science. Recent developments include random intersection trees (RITs) [4] and iterative RFs (iRFs) [5] – density estimates (or response surfaces) that can be mined not only for feature importance but also for interactions between features – even when those interactions are highly localized to subsets of observations and of complex forms. 
It is now possible to identify interactions of any form or order at the same computational cost as main effects [4-5] – enabled by importance sampling on the space of all subsets (order 2*P*). Hence, RF and its many decedents are extremely valuable for scientific inference, e.g. for the mapping of complex response surfaces that underlie natural phenomena (Fig. 2), the construction and parameterization of surrogate simulations, and, broadly speaking, the engineering of processes of previously intractable complexity.

However, there is a major bottleneck on the utility of these methods – they specialize in one- or low-dimensional Y. Hence, they are of limited utility in data fusion exercises where we wish to understand a potentially high dimensional process (e.g. a tensor) as a function of another high-dimensional process. Classically, one would simply regress each phenotype one at a time, but it is costly, and also lossy unless we care only about E(Y|X) – which may be far from true. We are often interested in the joint distribution of Y and X in detail – e.g. we may care enormously about the variance of Y given X for process control or stability analysis. Tensor-ready versions of ensemblized density estimators have the potential to open fundamentally near areas of research in statistical machine learning: such algorithms will inherit the astounding interpretability of RF “for free”, and stand to change the way we think about coupled high-dimensional processes.

**Translating Deep Neural Networks and architectures based on stacked ensembles of linear models**

We view modern machine learning as having two primary thrusts: the first we discussed above, and the second is of course Deep Learning. These algorithms top virtually every ML leaderboard and provide astounding predictive power. However, they are currently “alchemical”, in that we have little insight into why they work. The models have millions of parameters, usually far more than training observations. It has been observed that different optimization engines yield differing degrees of generalization error, and for the most part these phenomena remain mysterious [6]. Interpretation strategies for Deep Learners have focused on understanding the activation profiles of individual neurons or filter selection, e.g. [7], but these are a far cry from the response surfaces extractable from RF-based algorithms. Research into response-surface extraction from DNNs is an important, though slowly emerging, frontier, and fundamentally new ideas are needed.
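
The “activation profile” style of interpretation mentioned above can be illustrated in miniature: fit a small network, then recompute its hidden-layer activations by hand from the fitted weights and inspect them per neuron. A hedged sketch using scikit-learn's `MLPClassifier` on a toy dataset (all choices here are illustrative):

```python
import numpy as np
from sklearn.datasets import make_moons
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=300, noise=0.2, random_state=0)
net = MLPClassifier(hidden_layer_sizes=(8,), max_iter=2000,
                    random_state=0).fit(X, y)

# recompute hidden activations relu(X @ W1 + b1) from fitted coefficients;
# each column is one neuron's activation profile over the data set
hidden = np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])
print(hidden.shape)
```

Inspecting which inputs drive each column high is the neuron-level analysis of [7]; the point of the passage above is that this remains far weaker than the explicit decision rules a forest yields.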

Figure 2. An iRF-derived response surface showing AND-like rule structure in a biological process.

**Interpreting fitted learners – learning to explain what learning machines have learned:**

Currently, when we talk about “a machine learning algorithm” we are referring to a specific optimization strategy for constructing a fitted data representation inside a computer – an RF, or a DNN, *etc*. The scientist usually does not care about the capacity to predict – the scientist wants to understand why prediction is possible, to discover mechanism, to quantify uncertainty, to learn to simulate processes with intractable physics. The future of machine learning in science is therefore not the black box – the black box becomes a means to an end, and regardless of what “optimization strategy” we employed to construct a data representation, the only thing we care about at the end is understanding, in as much detail as possible, that data representation. We care about learning from the learning machine. Hence, we foresee a sea change in the primary thrust of statistical machine learning for science, with essential new foci on interpretation strategies for fitted algorithms – certainly including DNNs and RFs, and the many architectures that will follow, including the hybrids in between [8-9]. In 10 years, when we say “machine learning”, we will be referring more to what we learn from the machine than to what the machine has learned from the data. The machine will become a lens through which we view our data – our partner in interpretation and hypothesis generation – as indispensable as the microscope.
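
A small, concrete instance of “learning from the learning machine”: once a forest is fitted, its constituent trees can be read back out as explicit if/else rules. A minimal sketch with scikit-learn's `export_text` on the Iris data (chosen here purely for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import export_text

data = load_iris()
forest = RandomForestClassifier(n_estimators=10,
                                random_state=0).fit(data.data, data.target)

# extract the human-readable rule structure of one tree in the ensemble
rules = export_text(forest.estimators_[0],
                    feature_names=list(data.feature_names))
print(rules)
```

Each printed line is a threshold rule on a named feature – a crude precursor of the response-surface mining (RIT/iRF style) that the text argues should become the primary product of a fitted learner.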

**References**

- Breiman, L. “Random forests.” Machine Learning, 45(1), 5–32 (2001).
- Mironică, I., and Dogaru, R. “A novel feature-extraction algorithm for efficient classification of texture images.” *Scientific Bulletin of UPB, Series C–Electrical Engineering* (2013).
- Menze, B. H., Kelm, B. M., Splitthoff, D. N., Koethe, U., and Hamprecht, F. A. “On oblique random forests.” In *Joint European Conference on Machine Learning and Knowledge Discovery in Databases*, pp. 453–469. Springer, Berlin, Heidelberg (2011).
- Shah, R. D., and Meinshausen, N. “Random intersection trees.” Journal of Machine Learning Research, 15, 629–654 (2014).
- Basu, S., Kumbier, K., Brown, J. B., and Yu, B. “Iterative random forests to detect predictive and stable high-order interactions.” arXiv preprint, arXiv:1706.08457 (2017). (PNAS, in press, 2018)
- Advani, S. M., and Saxe, A. M. “High-dimensional dynamics of generalization error in neural networks.” https://arxiv.org/abs/1710.03667 (2017).
- Yosinski, J., Clune, J., Nguyen, A., Fuchs, T., and Lipson, H. “Understanding neural networks through deep visualization.” https://arxiv.org/abs/1506.06579 (2015).
- Biau, G., Scornet, E., and Welbl, J. “Neural random forests.” https://arxiv.org/abs/1604.07143 (2016).
- Zhou, Z.-H., and Feng, J. “Deep forest: Towards an alternative to deep neural networks.” arXiv preprint, arXiv:1702.08835 (2017).

The key idea driving our thinking within the IMAG group is that advancing our understanding and improving clinical practice requires computational modeling at multiple scales that would capture the “complex network of interscale causal interactions” [Lytton, 2017], nicely represented in the slides of Olaf Dammann. I think that we all agree that empirical evidence available at each scale is valuable and often critical in evaluating our theoretical models and the underlying assumptions. This is also true in the process of connecting neural models at all levels to dysfunction that is often defined and observable at the level of behavioral and cognitive deficits. Yet we seem to pay less attention to modeling at the behavioral and cognitive levels and are often satisfied with surrogate measurements or subjective assessments. This may be due to the fact that, until recently, objective assessment of behavior was not easy outside of the laboratory, but recent advances in sensor and communication technology enable accurate measurement and monitoring in real life. These advances are facilitating the notion of “digital phenotyping,” which, together with the development of computational models, will enable us to connect neurophysiological models to sensory-motor and cognitive functions. I believe that the development of these interscale connections and models is among the key challenges for the IMAG efforts. If there is enough interest, I would be happy to elaborate on this topic in further discussion. Misha Pavel, m.pavel@neu.edu