Defining standards for good modeling practice

Classes of models might be defined as follows. Class 1: lists of observed objects, categorized into groups suspected of being related. Class 2: linkage maps of proposed relationships, sometimes with cause-and-effect directionality. Class 3: kinetic descriptions of relationships between variables, developed enough that model solutions fit observed time course behavior. Class 4: biophysically based models that appear to provide an "explanation", demonstrating good fitting of model solutions to data and being established on a thermodynamically solid base. (While one might regard Class 4 models as the epitome of scientific success, there may actually be no biophysical model that does not at some point incorporate a Class 3 descriptor of some component.) Because the distinction between classes is imperfect, we begin with the class that has the clearest definition and the standard most difficult to meet:

Class 4. Biophysically-based models: 

We begin with objective characteristics of models demonstrated to be "valid" and made available to the user community: they should be suitable for archiving in a Physiome Database and for publication. The goal of adhering to a standard for models is to guide the structuring, presentation, and documentation of the model. The model should be made available as a completely documented, verified, and validated model, so that other investigators can reproduce it accurately and easily, or download a working version directly onto their own computer, and then challenge it, examining its behavior against their own data. Each model should include a list of the assumptions or conditions under which the model is seemingly correct, defining the constraints or the conditions for validity. Class 4 models should have the following:

Conservation:

  1. Unitary Balance - exact balancing of units in all equations.
  2. Mass Balance - total conservation of mass of individual components. (Conservation of volume should follow from this if all partial molar volumes are known.)
  3. Charge Balance - accounting for charge transfer across membranes and for membrane potentials and Donnan equilibria.
  4. Osmotic Balance - accounting for water and solute fluxes in transient and steady states.
  5. Thermodynamic Balance - obeying Haldane constraints for reactions, and having energy balance.
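
As an illustration of items 2 and 5, the sketch below checks a single reversible reaction S <-> P with reversible Michaelis-Menten kinetics. It assumes the standard Haldane relation Keq = (Vf * Kp) / (Vr * Ks); the function names and all parameter values are hypothetical and chosen only for the example. It confirms that the net flux vanishes at equilibrium and that a closed system conserves total mass.

```python
def reverse_vmax_from_haldane(vf, ks, kp, keq):
    """Haldane relation for S <-> P, Keq = (Vf * Kp) / (Vr * Ks), solved for Vr."""
    return vf * kp / (ks * keq)


def net_rate(s, p, vf, vr, ks, kp):
    """Net flux from S to P for reversible Michaelis-Menten kinetics."""
    return (vf * s / ks - vr * p / kp) / (1.0 + s / ks + p / kp)


# Hypothetical parameters (concentrations in mM, rates in mM/s).
VF, KS, KP, KEQ = 2.0, 0.5, 1.5, 4.0
VR = reverse_vmax_from_haldane(VF, KS, KP, KEQ)

# Thermodynamic check: at P/S = Keq the net flux must be zero.
s_eq = 1.0
flux_at_equilibrium = net_rate(s_eq, KEQ * s_eq, VF, VR, KS, KP)


def simulate_closed(s0, p0, dt, n_steps):
    """Forward-Euler integration of the closed system; no mass enters or leaves."""
    s, p = s0, p0
    for _ in range(n_steps):
        v = net_rate(s, p, VF, VR, KS, KP)
        s -= v * dt
        p += v * dt
    return s, p


# Mass-balance check: S + P stays at its initial total, and P/S approaches Keq.
S0, P0 = 5.0, 0.0
s_end, p_end = simulate_closed(S0, P0, dt=0.001, n_steps=20000)
mass_error = abs((s_end + p_end) - (S0 + P0))
final_ratio = p_end / s_end

print("flux at equilibrium:", flux_at_equilibrium)
print("mass error:", mass_error)
print("final P/S ratio:", final_ratio)
```

If Vr were chosen independently of the Haldane relation, the simulated P/S ratio would settle at a value other than Keq, which is the kind of thermodynamic inconsistency this criterion is meant to exclude.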

Verification: Demonstration that the mathematical expressions defining the model are complete and that the computational method used provides correct solutions to the mathematics:

  1. Equations must be mathematically correct and complete, with unitary balance, with initial and boundary conditions defined, and with explicit definitions, units, and unambiguous descriptions of each parameter.
  2. Running code should be supplied in some reasonably commonly used form. The code should:
    • give numerical solutions that match appropriate reduced cases having analytical solutions, etc.
    • run correctly with no, or at least little, dependence on step size
    • run from varied initial conditions to appropriate steady states
    • run on more than one platform
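
The first three code checks can be sketched on a reduced case with a known analytical solution. The example below assumes a hypothetical one-compartment model, dC/dt = k (C_in - C), whose solution is C(t) = C_in + (C0 - C_in) exp(-k t); the model, the fourth-order Runge-Kutta solver, and the parameter values are illustrative assumptions, not part of the standard itself.

```python
import math

K = 0.8      # hypothetical rate constant, 1/s
C_IN = 2.0   # hypothetical inflow concentration, mM


def dcdt(t, c):
    """Right-hand side of the reduced model dC/dt = k (C_in - C)."""
    return K * (C_IN - c)


def analytic(c0, t):
    """Closed-form solution of the reduced model."""
    return C_IN + (c0 - C_IN) * math.exp(-K * t)


def rk4_solve(f, y0, t_end, dt):
    """Fixed-step classical Runge-Kutta integration from t = 0 to t_end."""
    n_steps = round(t_end / dt)
    y, t = y0, 0.0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + dt / 2, y + dt * k1 / 2)
        k3 = f(t + dt / 2, y + dt * k2 / 2)
        k4 = f(t + dt, y + dt * k3)
        y += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return y


# Check 1: numerical solution matches the analytical solution.
C0, T_END = 0.0, 5.0
coarse = rk4_solve(dcdt, C0, T_END, dt=0.1)
fine = rk4_solve(dcdt, C0, T_END, dt=0.05)
err_coarse = abs(coarse - analytic(C0, T_END))
err_fine = abs(fine - analytic(C0, T_END))

# Check 2: little dependence on step size.
step_dependence = abs(coarse - fine)

# Check 3: varied initial conditions reach the same steady state (C_in).
steady_states = [rk4_solve(dcdt, c0, 30.0, dt=0.1) for c0 in (0.0, 2.0, 10.0)]

print("error (dt=0.1):", err_coarse, " error (dt=0.05):", err_fine)
print("step dependence:", step_dependence)
print("steady states:", steady_states)
```

The fourth check, running on more than one platform, cannot be shown in a single script; in practice it amounts to repeating checks like these on each target system and comparing the results.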

Validation, meaning that model solutions fit physiological data: (By "valid", we mean "not invalidated", since true validity is unprovable.)

  1. Initial conditions: should be consistent with a physiological steady state (constant or oscillatory)
  2. Sets of data, fitted by the model, should be provided for evaluation of the models by others. Data sets that cannot be fitted, and that therefore serve as challenges to the model, should also be provided.
  3. The results of fitting varied data sets, showing applicability of the model to high quality experimental data from different sources and of different sorts.
  4. Parameter justification and evaluation: Since not all of a model's parameters are necessarily determined via the fitting of the selected data sets, the others should be justified through citations, solutions, etc., choosing references to prior publications that provide best characterization. Inferences obtained through fuzzy logic would not suffice, and such a model should rather be in Class 3. The parameters determined via model analysis should be described by estimated means and confidence ranges.
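
One common way to report a fitted parameter with a confidence range, offered here only as a sketch and not as the method the standard prescribes, is ordinary least squares with a Student-t interval. The example fits a first-order decay rate k to log-transformed concentrations. The data are synthetic and generated inside the script (they are not experimental measurements), and the function name `fit_decay_rate` is hypothetical.

```python
import math

# Synthetic, hypothetical data: C(t) = 10 * exp(-0.3 t) with fixed small
# multiplicative perturbations standing in for measurement noise.
TRUE_K = 0.3
times = [float(t) for t in range(8)]
perturb = [0.02, -0.01, 0.015, -0.02, 0.01, -0.015, 0.02, -0.01]
conc = [10.0 * math.exp(-TRUE_K * t) * (1.0 + e) for t, e in zip(times, perturb)]


def fit_decay_rate(t, c, t_crit):
    """Least-squares fit of ln(c) = ln(c0) - k t.

    Returns the estimate of k and the half-width of its confidence interval,
    computed as t_crit times the standard error of the slope.
    """
    n = len(t)
    y = [math.log(ci) for ci in c]
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    sxy = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y))
    slope = sxy / sxx
    intercept = y_mean - slope * t_mean
    ss_res = sum((yi - (intercept + slope * ti)) ** 2 for ti, yi in zip(t, y))
    se_slope = math.sqrt(ss_res / (n - 2) / sxx)
    return -slope, t_crit * se_slope


# Two-sided 95% Student-t critical value for n - 2 = 6 degrees of freedom.
T_CRIT_DF6 = 2.447
k_hat, half_width = fit_decay_rate(times, conc, T_CRIT_DF6)
ci_low, ci_high = k_hat - half_width, k_hat + half_width

print(f"k = {k_hat:.4f} (95% CI {ci_low:.4f} to {ci_high:.4f})")
```

For nonlinear models with several parameters, the same idea generalizes to the covariance matrix of the estimates; the linearized single-parameter case is used here only because it can be checked by hand.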

Documentation: Each model should be accompanied by:

  1. A peer-reviewed publication, or equivalent, providing a full description, with the validation. Each parameter and variable should be defined by (1) symbol, (2) name, (3) definition, and (4) units.
  2. A description of its phylogenetic heritage and historical and contemporary setting.
  3. Documentation with references for parameter values, appropriate to the species, age, sex, etc.
  4. A description of model components or submodels and their sources, if applicable.
  5. A description of models incorporating this model into a larger more integrated system, illustrating the position of this model in the hierarchy.
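
The four-part definition required in item 1 lends itself to a machine-readable record. The sketch below is one possible layout, not a format defined by the standard; the class name `ParameterRecord` and the example entry are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParameterRecord:
    """One parameter or variable, described by the four required fields."""
    symbol: str
    name: str
    definition: str
    units: str

    def __post_init__(self):
        # Reject records in which any of the four fields is left blank.
        for field_name in ("symbol", "name", "definition", "units"):
            if not getattr(self, field_name).strip():
                raise ValueError(f"{field_name} must not be empty")


# Hypothetical example entry, for illustration only.
example = ParameterRecord(
    symbol="k",
    name="washout rate constant",
    definition="first-order rate of solute clearance from the compartment",
    units="1/s",
)
print(example)
```

Keeping units as a mandatory field makes it harder to archive a model whose equations have never been checked for unitary balance.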

Adherence to Good Modeling Practices: Does the model, with its documentation, list the following?

  1. Fundamental assumptions
  2. Limitations and shortcomings
  3. Alternative models to be considered
  4. The level of detail used in the particular model, to position it within a hierarchy of related models

Provision for Critique, Commentary and Discussion: This would presumably be supported on the website providing the model and would include:

  1. Commentary by authors, by reviewers, and the responses by authors.
  2. Commentary as in letters to the editor.
  3. Critiques published subsequently by other authors or the same authors.
  4. Listings of and references to competing or alternative models.

Model classes 1 to 3, earlier phases of models:

Given that even as classically beautiful and thoroughly presented and reproducible a model as that of Hodgkin and Huxley (1952) cannot fulfill all the requirements for a Class 4 model (it does not provide for ionic balance, for example), most model presentations will probably likewise fall short, but will nevertheless be important to document and to provide in reproducible form. Recognizing that models are started with mere wisps of ideas, a database of models for the research community might include the design and development phases of models.

Class 3 Models. (Kinetic descriptions of relationships between variables and model solutions fitting observed time course behavior.)

Class 3 models are computable. Most published models lie here, illustrating that even when one attempts to define a model as completely as possible there will be identifiable shortcomings precluding its classification as Class 4. There is obviously a fuzzy edge to these classifications: when a Class 3 model is provided with fully defined assumptions and limitations it might be "elevated" to Class 4 status.

Defining Class 3 models as incomplete Class 4 models is useful: they lack one or more specific attributes of a Class 4 model. The identification of the missing attribute then defines what is needed next: more data regarding a component, the replacement of an empirical or assumed relationship with a biophysically or biochemically defined relationship, or the replacement of a hypothesized feedback loop required for realism with an identified and characterized feedback mechanism. Relationships between variables defined through combinations of fuzzy logic and optimization should be earmarked for experimental testing. Thus a list of the questionable assumptions is a critical component of a Class 3 model.

Class 2 models: (Linkage maps of proposed relationships, sometimes with cause-and-effect directionality)

An example of a Class 2 model is a bacterial metabolic model wherein the components are mostly identified and the relationships more or less known, even if the reaction kinetic parameters are not yet known. If the stoichiometries of the reactions are known, then the system can be "characterized" using flux balance analysis (FBA), a technique for estimating the steady-state balance in a network of reactions from the stoichiometries when the reactions have not been characterized either mechanistically, kinetically or thermodynamically. Imposing thermodynamic constraints on a network model, thereby converting it to energy balance analysis (EBA), improves the likelihood of the steady state fluxes being close to experimentally observed values, but still does not provide evidence on the shapes of transients or even on network stability.
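
The steady-state constraint at the heart of FBA, S v = 0 for stoichiometric matrix S and flux vector v, can be illustrated on a toy network. Everything in the sketch below is hypothetical: the five-reaction network, the flux bounds, and the choice of objective. FBA tools solve this problem with a linear-programming solver; to stay dependency-free, this sketch instead enumerates the two free fluxes on an integer grid, which is adequate only for a network this small.

```python
# Hypothetical network: R1: -> A, R2: A -> B, R3: A -> C, R4: B ->, R5: C ->
# Rows are metabolites A, B, C; columns are reactions R1..R5.
S = [
    [1, -1, -1, 0, 0],
    [0, 1, 0, -1, 0],
    [0, 0, 1, 0, -1],
]
LOWER = [0, 0, 2, 0, 0]        # R3 must carry at least 2 units (assumed demand)
UPPER = [10, 10, 10, 10, 10]   # uptake R1 capped at 10 units (assumed)
OBJECTIVE = 3                  # index of R4, the flux to maximize


def steady_state_residual(s_matrix, v):
    """Return S v; all zeros means every metabolite is balanced."""
    return [sum(coef * flux for coef, flux in zip(row, v)) for row in s_matrix]


def fluxes_from_free(v2, v3):
    """S v = 0 for this network forces v1 = v2 + v3, v4 = v2, v5 = v3."""
    return [v2 + v3, v2, v3, v2, v3]


def within_bounds(v):
    return all(lo <= x <= hi for lo, x, hi in zip(LOWER, v, UPPER))


# Brute-force stand-in for the linear program: scan integer values of the
# two free fluxes and keep the feasible vector with the largest objective.
best = None
for v2 in range(0, 11):
    for v3 in range(0, 11):
        v = fluxes_from_free(v2, v3)
        if within_bounds(v) and (best is None or v[OBJECTIVE] > best[OBJECTIVE]):
            best = v

print("optimal fluxes R1..R5:", best)
print("steady-state residual:", steady_state_residual(S, best))
```

Note that the result says nothing about how the network approaches this state or whether the state is stable, which is the limitation described above.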

Class 1 Models: (Lists of observed measures, categorized together in groups suspected of being related.)

This class might be regarded as a collection of objects masquerading as a model, and yet this is where almost all models and systems concepts begin. Examples can be taken from various fields. One would be the historical development of an understanding of the tricarboxylic acid (TCA) cycle from an incomplete set of observations of a set of solutes in the first half of the twentieth century, the organization of these by the husband and wife team, the Coris (the Cori cycle), and the later incorporation by Krebs into the TCA cycle, leading to later refinements. Another illustration is the current status of genetic regulatory networks: while the gene or set of genes providing a given protein can be readily identified, the succession of proteins involved in promoting expression and in regulating expression to stable levels is mostly unknown. These regulatory proteins achieve only minuscule concentrations, a few copies per cell, and evade identification, but nevertheless need to be recognized and their kinetics characterized via modeling, even though stochastic modeling techniques will probably be required. This early phase model type can be characterized using lists, Venn diagrams of potentially related elements, and conceptual network diagrams. (Other examples could relate to glucose, hypertension, alveolar gas exchange, etc.)

Download 'Good Modeling Standards' in checklist form: Standards checklist

Model development and archiving support at https://www.imagwiki.nibib.nih.gov/physiome provided by the following grants: NIH U01HL122199 Analyzing the Cardiac Power Grid, 09/15/2015 - 05/31/2020, NIH/NIBIB BE08407 Software Integration, JSim and SBW 6/1/09-5/31/13; NIH/NHLBI T15 HL88516-01 Modeling for Heart, Lung and Blood: From Cell to Organ, 4/1/07-3/31/11; NSF BES-0506477 Adaptive Multi-Scale Model Simulation, 8/15/05-7/31/08; NIH/NHLBI R01 HL073598 Core 3: 3D Imaging and Computer Modeling of the Respiratory Tract, 9/1/04-8/31/09; as well as prior support from NIH/NCRR P41 RR01243 Simulation Resource in Circulatory Mass Transport and Exchange, 12/1/1980-11/30/01 and NIH/NIBIB R01 EB001973 JSim: A Simulation Analysis Platform, 3/1/02-2/28/07.