**Session Description:** The session consists of two 40-minute keynote lectures, followed by an open discussion led by the moderators.

**Charge to Speakers and Audience:** Discuss concepts, approaches, and experience for (1) how domain knowledge can be maximally integrated into machine learning and (2) how the resulting learning can be used to inform multiscale models.

**Keynote Lectures (Tentative Titles):**

1. Simulation-assisted machine learning, David Craft, Harvard Medical School.

2. Predicting molecular properties with covariant compositional networks, Risi Kondor, University of Chicago.

**Keynote Speaker Bios:**

*Dr. David Craft* is an Assistant Professor and the Head of the Optimization and Systems Biology Lab at Harvard Medical School and MGH. His current focus is on the development of machine learning approaches for personalized cancer treatment. The main research questions related to this line of work are:

*What is the value of prior knowledge in this domain?* and *How can we best incorporate such expert knowledge?*

*Dr. Risi Kondor* is an Associate Professor in the Departments of Computer Science and Statistics at the University of Chicago. His current research interests lie in the areas of machine learning, statistical learning theory, non-commutative harmonic analysis, computational group theory, harmonic analysis on graphs and networks, theoretical computer science, and applications in computational physics and biology.

**Moderator Bios:**

*Dr. Paris Perdikaris* is an Assistant Professor in the Department of Mechanical Engineering and Applied Mechanics at the University of Pennsylvania. His work spans a wide range of areas in computational science and engineering, with a particular focus on the analysis and design of complex physical and biological systems using machine learning, stochastic modeling, computational mechanics, and high-performance computing.

*Dr. William Cannon* is a Scientist in the Physical and Computational Sciences Division at the Pacific Northwest National Laboratory. His work combines modeling and simulation with data analysis to understand how cells change as a function of the environment, and more broadly to understand the principles of self-organization and the emergence of biological function.

**Workshop Discussion Questions:**

- Can theory-driven machine learning approaches enable the discovery of interpretable models that can not only explain the observed data, but also elucidate mechanisms, distill causality, and help us probe interventions and counterfactuals in complex multiscale systems? For instance, causal inference generally uses generic statistical measures, such as partial correlation, to infer causal influence. If, instead, the appropriate statistical measure were known from the physics, such as a statistical odds ratio from thermodynamics, would the causal inference be more accurate or more interpretable as a mechanism?
- Can theory-driven machine learning, combined with sparse and indirect measurements of the phenomena, produce a mechanistic understanding of how biological phenomena emerge?
- Can theory-driven machine learning approaches uncover meaningful and compact representations for complex inter-connected processes, and, subsequently, enable the cost-effective exploration of vast combinatorial spaces (e.g., design of bio-molecules with target properties)?
- Uncertainty quantification is the backbone of decision making. Can theory-driven machine learning approaches enable the reliable characterization of predictive uncertainty and pinpoint its sources? The quantification of uncertainty has many practical applications, such as decision making in the clinic, the robust design of synthetic biology pathways, drug target identification and drug risk assessment, as well as guiding the judicious acquisition of new data.
- Is deep learning necessary in theory-driven learning? In principle, the more domain knowledge that is incorporated into the model, the less that needs to be learned and the easier the computing task will be.
- It is likely that the applications will utilize a range of techniques, from dynamic programming to variational methods to standard machine learning to deep learning. Is high-performance computing required when theory-driven models are employed?
- What are the challenges and limitations of incorporating theory into machine learning for use with multiscale modeling? Are some machine learning methods, such as reinforcement learning, more conducive to integration with multiscale modeling?
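To make the first discussion question concrete, the sketch below illustrates the generic statistical measure it mentions: partial correlation, i.e., the correlation between two variables after a common driver has been regressed out. The data, variable names, and coefficients are purely illustrative assumptions, not from any speaker's work; a theory-driven approach would replace this generic measure with one derived from the governing physics (e.g., a thermodynamic odds ratio).

```python
import numpy as np

def partial_correlation(x, y, z):
    """Correlation between x and y after regressing out the confounder z."""
    # Residualize x and y on z (with an intercept) via least squares.
    Z = np.column_stack([z, np.ones_like(z)])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

# Synthetic example (assumed for illustration): z drives both x and y,
# so their raw correlation is spurious and should largely vanish once
# z is controlled for.
rng = np.random.default_rng(0)
z = rng.normal(size=2000)
x = 2.0 * z + rng.normal(size=2000)
y = -1.5 * z + rng.normal(size=2000)

raw = np.corrcoef(x, y)[0, 1]
partial = partial_correlation(x, y, z)
print(f"raw correlation: {raw:.2f}, partial correlation: {partial:.2f}")
```

The gap between the raw and partial correlations is what a purely data-driven causal analysis must detect statistically; the question above asks whether domain theory could supply the right measure directly.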