Glossary of Terms


 

This is a working page of terms and definitions used in the Bridge2AI program.

 

Active learning  

A research field focused on data-efficient machine learning algorithms that interactively query an information source (for example, a human annotator or an experiment) to label new training samples with the desired outputs. In active learning, the results of existing measurements are used to prioritize which data to label next.
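
As an illustration of how existing results can prioritize what to label next, here is a minimal sketch of one common query strategy, uncertainty sampling. It assumes a binary classifier that outputs a probability for each unlabeled sample; the function name `select_queries` and the probability values are hypothetical and chosen only for this example.

```python
# Uncertainty sampling: pick the unlabeled samples whose predicted
# probability is closest to 0.5, where a binary classifier is least certain.

def select_queries(pool_probabilities, budget):
    """Return the indices of the `budget` most uncertain pool samples."""
    ranked = sorted(
        range(len(pool_probabilities)),
        key=lambda i: abs(pool_probabilities[i] - 0.5),
    )
    return ranked[:budget]


# Hypothetical model outputs P(label = 1) for five unlabeled samples.
pool_probabilities = [0.97, 0.52, 0.10, 0.45, 0.80]

# Indices of the samples that would be sent for labeling first.
queries = select_queries(pool_probabilities, budget=2)
print(queries)
```

Other query strategies exist (for example, selecting samples on which several models disagree); this sketch shows only the simplest one.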

 

AI  - Artificial Intelligence

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to perform tasks commonly associated with intelligent beings. The term may also be applied to efforts to develop systems endowed with human intellectual processes, such as the ability to reason, learn from past experience, and solve problems.

In the Department of Energy 2019 Report of the Office of Science Roundtable on Data for AI, Data and Models: A Framework for Advancing AI in Science, AI is considered to be inclusive of machine learning (ML), deep learning (DL), neural networks (NN), computer vision, and natural language processing (NLP).  

 

AI accelerator

A class of specialized hardware or computer systems designed to accelerate specific AI applications. Examples of AI accelerators include the graphics processing unit (GPU), vision processing unit (VPU), application-specific integrated circuit (ASIC), tensor processing unit (TPU), and reconfigurable devices such as field-programmable gate arrays (FPGAs).

 

AI/ML model  

An AI/ML model is an inference method that can be used to perform a “task,” such as prediction, diagnosis, classification, etc. The model is developed using training data or other knowledge.  

 

AI task  

The inference activity performed by an artificially intelligent system.  

 

AI tools  

Software frameworks and libraries, such as PyTorch and TensorFlow, that are used to build and deploy AI applications.

 

AutoML  

Short for automated machine learning: the process of automatically finding the model and model hyperparameters that best describe a particular training dataset. Not to be confused with the Google toolkit of the same name.
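
The idea of an automated search can be sketched with the simplest possible case: trying every candidate value of one hyperparameter and keeping the one with the best validation score. The example below is only an illustration under stated assumptions; it uses a toy one-dimensional k-nearest-neighbor classifier, and the data, the candidate values, and the function names are all hypothetical. Real AutoML systems search over model families and many hyperparameters with more sophisticated strategies.

```python
def knn_predict(train, x, k):
    """Predict a 0/1 label for x by majority vote of the k nearest training points."""
    neighbors = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    votes = sum(label for _, label in neighbors)
    return 1 if votes * 2 > k else 0


def accuracy(train, validation, k):
    """Fraction of validation points that are classified correctly."""
    correct = sum(knn_predict(train, x, k) == y for x, y in validation)
    return correct / len(validation)


def search_k(train, validation, candidates):
    """Score every candidate hyperparameter and return the best one with all scores."""
    scores = {k: accuracy(train, validation, k) for k in candidates}
    best_k = max(scores, key=scores.get)
    return best_k, scores


# Toy (feature, label) data; the label at 2.0 is deliberately noisy.
train = [(0.0, 0), (1.0, 0), (2.0, 1), (3.0, 0), (6.0, 1), (7.0, 1), (8.0, 1)]
validation = [(1.5, 0), (2.4, 0), (6.5, 1), (7.5, 1)]

best_k, scores = search_k(train, validation, candidates=[1, 3, 5])
print(best_k, scores)
```

With k = 1 the noisy training label is copied directly into a prediction, so a larger k scores better on this validation set.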

 

Composable services

Composable services are created from interoperable modular components that can be assembled flexibly into multiple well-defined functional and usable tools or capabilities.  

 

Curation for Input and Output of AI/ML models  

Curation of input and output for AI/ML models involves the standardization, organization, and modification of the models' inputs and outputs to make them easy to compare and extend. 

 

Data Bias in AI/ML

A type of error in which certain elements of a dataset are more heavily weighted and/or represented than others. Types of data bias include selection bias, exclusion bias, measurement bias, recall bias, and observer bias. A biased dataset not only raises issues of ethics and fairness but also results in skewed outcomes, lower accuracy, and analytical errors.
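
One simple way to look for unequal representation is to compute each group's share of the dataset. The sketch below is an illustration only: the attribute name `site`, the group labels, and the 20% threshold are hypothetical, and a check like this detects only representation imbalance, not measurement, recall, or observer bias.

```python
from collections import Counter


def representation(records, attribute):
    """Return each group's share of the dataset for one attribute."""
    counts = Counter(record[attribute] for record in records)
    total = sum(counts.values())
    return {group: n / total for group, n in counts.items()}


def underrepresented(shares, minimum_share):
    """List the groups whose share falls below a chosen threshold."""
    return sorted(group for group, share in shares.items() if share < minimum_share)


# Hypothetical dataset: ten records collected at three sites.
records = [{"site": "A"}] * 6 + [{"site": "B"}] * 3 + [{"site": "C"}] * 1

shares = representation(records, "site")
flagged = underrepresented(shares, minimum_share=0.2)  # threshold is an assumption
print(shares, flagged)
```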

 

Data Curation  

A process to prepare data for analysis and interpretation that involves the annotation, organization, and integration of data collected from a variety of sources. This process normally entails ensuring common data types, ensuring equivalent units of measurement, and correcting for source-specific artifacts.
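
The first two of these curation steps can be sketched as follows. This is a minimal illustration, assuming two hypothetical sources that report body weight with different data types and units; the field names (`id`, `weight`, `unit`) are invented for the example, and correcting source-specific artifacts is not shown.

```python
LB_TO_KG = 0.45359237  # definition of the international avoirdupois pound


def curate(record):
    """Return a record with a common data type (float) and a common unit (kg)."""
    weight = float(record["weight"])  # source A stores numbers as strings
    if record["unit"] == "lb":
        weight *= LB_TO_KG
    elif record["unit"] != "kg":
        raise ValueError(f"unknown unit: {record['unit']}")
    return {"id": record["id"], "weight_kg": round(weight, 2)}


# Hypothetical records from two sources with different conventions.
source_a = [{"id": "a1", "weight": "70.5", "unit": "kg"}]
source_b = [{"id": "b1", "weight": 150, "unit": "lb"}]

curated = [curate(record) for record in source_a + source_b]
print(curated)
```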

 

Data Curation for AI/ML

Modern AI/ML data curation involves the same organizational procedure described above, but also includes the conversion of data to tensors (with appropriate numerical formats) that can be used to train AI/ML models. Note that traditional symbolic logic-based AI/ML models take text inputs directly, without the tensor conversion.
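
The tensor conversion can be illustrated with a small sketch that turns records containing a number and a category into a two-dimensional array of floating-point values, using one-hot encoding for the category. The field names (`age`, `tissue`) and category values are hypothetical; in practice the nested list would typically be handed to a tensor library.

```python
def one_hot(value, categories):
    """Encode a categorical value as a list of 0.0/1.0 floats."""
    return [1.0 if value == category else 0.0 for category in categories]


def to_tensor(records, categories):
    """Rows are records; columns are age followed by the one-hot tissue encoding."""
    return [
        [float(record["age"])] + one_hot(record["tissue"], categories)
        for record in records
    ]


# Hypothetical curated records.
records = [{"age": 34, "tissue": "blood"}, {"age": 61, "tissue": "skin"}]
categories = ["blood", "skin"]

tensor = to_tensor(records, categories)
print(tensor)
```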

 

Data for AI  

The digital artifacts used to generate AI models and/or used in combination with AI models during inference.  

 

Data Harmonization 

Any efforts to combine data from different sources. It is related to data curation. 

 

Data Sheets 

Data sheets for datasets document the motivation for the data collection, composition, collection process, recommended uses, preprocessing/cleaning/labeling, distribution, and maintenance of the data (https://arxiv.org/pdf/1803.09010.pdf).  

 

Data types – biomedical

Any data that describe a biological or medical process, whether at the molecular, physiological, or population level, are biomedical data. The Open Biological and Biomedical Ontology Foundry (http://www.obofoundry.org/) is a useful resource for describing the varieties of biomedical data.

 

Data types – AI/ML ready  

Data are any structured collection of numbers or strings. A dataset’s “data type” denotes a specific structural format: language text is a sequence of words, images are sets of color and luminosity matrices, social network data use graph representations, and knowledge bases normally take the form of relational databases. AI/ML-ready data are usually represented as floating-point tensors.

 

DL  - Deep Learning  

A family of machine learning methods based on artificial neural networks with multiple layers. 

 

Lifecycle Management of AI/ML models  

Management of AI/ML models involves managing the lifecycle of model creation and model operations. The major steps include: ease of model training and deployment; model training and deployment at scale; monitoring of data governance, quality, and compliance; visualization of the whole pipeline; and rich connectors to data sources. 

 

Lifelong learning  

Also known as continuous learning. A strategy for dealing with catastrophic forgetting, a well-known shortcoming of artificial neural network approaches in which the model’s performance on previously learned tasks degrades as new tasks are introduced.

 

Metadata 

Attributes of the data that are searchable by human or machine, e.g., the structure, dimensionality, sparseness, and multimodality of the data; information about the types of models trained on the data; discipline domain; source; author; and other information about the data generation process. Along with standards, metadata are key enablers of FAIR (findable, accessible, interoperable, reusable) data. Metadata are described by vocabularies that follow FAIR principles and are linked to established ontologies to enable systematic linking between datasets; they can also use semantic models (see Semantic Models). Metadata are critical for understanding biases in data and for interpreting results from AI applications.

 

ML  - Machine Learning  

A family of methods that use data to train a model and then use that model to make predictions (or take actions) on future data.

 

Machine learning algorithm

Machine learning algorithms are procedures that are implemented in code to perform tasks such as prediction, diagnosis, and classification. Machine learning algorithms can be implemented in a range of modern programming languages, and their efficiency can be analyzed and described. Examples of machine learning algorithms include linear regression, decision trees, random forests, artificial neural networks, and k-means.

 

Machine learning model

A machine learning model is the output of a machine learning algorithm run on data. The model is trained using training data and then can process additional data/knowledge to make predictions. For example, for predicting a binary outcome, binary classification models can be trained using the logistic regression learning algorithm.
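
The distinction between algorithm and model can be made concrete with a minimal sketch. Here the learning algorithm is logistic regression fitted by plain gradient descent, and the resulting model is simply the pair of learned parameters. The toy data, learning rate, and epoch count are assumptions chosen for illustration, not recommended settings.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(xs, ys, learning_rate=0.5, epochs=200):
    """The learning algorithm: gradient descent on the average log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(error * x for error, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b  # the trained model is just these two numbers


def predict(model, x):
    """Use the trained model to classify a new data point as 0 or 1."""
    w, b = model
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


# Toy training data: negative features have label 0, positive have label 1.
xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]

model = train_logistic_regression(xs, ys)
print(predict(model, 1.5), predict(model, -0.5))
```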

 

Model cards 

Model cards for trained AI models document benchmarks of model performance under various conditions (https://arxiv.org/pdf/1810.03993.pdf). They may include model details, intended use, factors, metrics, evaluation data, training data, quantitative analyses, ethical considerations, and recommendations.

 

Ontology  

The models of knowledge and associated definitions and relationships among terms or categories that are essential for interoperability among datasets.  

 

Quality Assessments of Metadata

Quality assessment investigates the accuracy, consistency, and sufficiency of metadata. Examples of quality metrics include completeness (measured by the percentage of records in which an element is used), accuracy, accessibility, conformance to expectations, consistency, timeliness, provenance, element frequency, entropy, and vocabulary distribution.
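
The completeness metric described above can be computed directly. The sketch below assumes that metadata records are simple key-value mappings and that an element counts as "used" when it is present and non-empty; the element names and records are hypothetical.

```python
def completeness(records, element):
    """Percentage of records in which `element` is present and non-empty."""
    used = sum(1 for record in records if record.get(element) not in (None, ""))
    return 100.0 * used / len(records)


# Hypothetical metadata records; "author" is missing or empty in two of them.
records = [
    {"author": "Lab 1", "source": "assay"},
    {"author": "", "source": "imaging"},
    {"source": "survey"},
    {"author": "Lab 2", "source": "assay"},
]

author_completeness = completeness(records, "author")
source_completeness = completeness(records, "source")
print(author_completeness, source_completeness)
```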

 

Semantic Models 

A conceptual data model in which semantic information is included. Syntax (formats) describing data standards could be replaced with domain-specific forward models that represent the data generation process. Semantic models are useful for organizing raw data to impose a logical structure on it and for constructing metadata.

 

Transfer learning  

The act of using pre-trained models for tasks/data other than what the models were originally designed for. 

 
