AI Glossary - 2025

Welcome to our AI glossary, which provides clear explanations of key terms in artificial intelligence and related fields like data science.

Whether you're a beginner or an expert, use this glossary to:

Clarify unfamiliar concepts
Expand your AI knowledge
Discover new areas of interest

Explore the world of AI through our comprehensive collection of terms and definitions.

A

Accuracy: The proportion of correct predictions (both true positives and true negatives) among the total number of cases examined.

Activation Function: A function applied to the weighted sum of inputs in a neural network to introduce non-linearity and help the network learn complex patterns.
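
As a rough illustration of the idea, the sketch below applies two common activation functions (ReLU and sigmoid) to the weighted sum of a single neuron's inputs. The function names and example numbers are illustrative assumptions, not taken from any particular library.

```python
import math

def relu(z: float) -> float:
    # ReLU: passes positive values through and clips negatives to zero.
    return max(0.0, z)

def sigmoid(z: float) -> float:
    # Sigmoid: squashes any real number into the range (0, 1).
    return 1.0 / (1.0 + math.exp(-z))

def neuron_output(inputs, weights, bias, activation):
    # Weighted sum of the inputs plus bias, followed by the non-linearity.
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return activation(z)

# Weighted sum here is 1.0*0.5 + 2.0*(-1.0) = -1.5, which ReLU clips to 0.0.
example = neuron_output([1.0, 2.0], [0.5, -1.0], 0.0, relu)
```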

AdaBoost: An ensemble learning technique that combines multiple weak learners to create a strong learner, adjusting the weights of instances based on previous classification results.

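Adversarial Learning: A technique in which a model is trained against an adversary, for example a second neural network that competes with it (as in generative models) or deliberately perturbed inputs, in order to improve its performance or robustness.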

Agent: An entity that perceives its environment and takes actions to achieve specific goals, often used in reinforcement learning contexts.

Algorithm: A step-by-step procedure or formula for solving a problem or accomplishing a task, fundamental to AI and data science.

Annotation: The process of labelling data with relevant information, often used in supervised learning to prepare datasets for training models.

Anomaly Detection: The identification of rare items, events, or observations that significantly differ from the majority of the data.

Artificial General Intelligence (AGI): A hypothetical type of AI that would have the ability to understand, learn, and apply intelligence in a way similar to human beings across a wide range of tasks.

Artificial Intelligence (AI): The simulation of human intelligence in machines programmed to think and learn like humans, encompassing various subfields and techniques.

Artificial Neural Network (ANN): A computing system inspired by biological neural networks, consisting of interconnected nodes (artificial neurons) that process and transmit information.

Association Rule Learning: A rule-based machine learning method for discovering interesting relations between variables in large databases.

Automated Machine Learning (AutoML): The automation of the end-to-end process of applying machine learning to real-world problems, including data preparation, feature selection, and model selection.

Autonomous Vehicle: A vehicle capable of sensing its environment and operating without human involvement, using AI techniques for navigation and decision-making.

B

Backpropagation: An algorithm used in neural networks to calculate the gradient of the loss function with respect to the weights, allowing the network to learn from its errors.

Bag of Words: A text representation method that describes the occurrence of words within a document, often used in natural language processing.
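
A minimal sketch of a bag-of-words representation, assuming naive lowercase whitespace tokenisation (real pipelines usually handle punctuation and vocabulary more carefully). The helper name `bag_of_words` is hypothetical.

```python
from collections import Counter

def bag_of_words(text: str) -> Counter:
    # Word order is discarded; only the count of each word is kept.
    return Counter(text.lower().split())

bow = bag_of_words("the cat sat on the mat")
```

Here `bow["the"]` is 2, and a word that never occurs, such as `bow["dog"]`, counts as 0.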

Bagging: An ensemble learning technique that combines multiple models trained on different subsets of the training data to reduce overfitting and improve generalisation.

Batch Normalisation: A technique used to improve the stability and performance of neural networks by normalising the inputs to each layer.

Bayesian Network: A probabilistic graphical model that represents a set of variables and their conditional dependencies.

BERT (Bidirectional Encoder Representations from Transformers): A transformer-based machine learning technique for natural language processing pre-training developed by Google, designed to understand the context of a word in a sentence by looking at the words that come before and after it.

Bias: In machine learning, bias can refer to:

A constant added to the input of an activation function in neural networks.
The error introduced by approximating a real-world problem with a simplified model.

Big Data: Extremely large datasets that may be analysed computationally to reveal patterns, trends, and associations.

Binary Classification: A type of classification task where the goal is to predict one of two possible outcomes.

Bioinformatics: The application of computational techniques to analyse and interpret biological data, often involving AI and machine learning methods.

Boosting: An ensemble learning technique that combines multiple weak learners to create a strong learner, focusing on instances that previous models misclassified.

Bot: An automated program designed to perform specific tasks, often used in AI applications like chatbots or web crawlers.

Bounding Box: In computer vision, a rectangle that encloses an object of interest in an image, often used in object detection tasks.

C

Chatbot: An AI program designed to simulate human conversation through text or voice interactions.

ChatGPT: A large language model developed by OpenAI, trained to engage in conversational interactions and perform a wide range of language tasks.

Classification: A supervised learning task where the goal is to predict the categorical class labels of new instances, based on past observations.

Clustering: An unsupervised learning technique that groups similar data points together based on their features.

CNN (Convolutional Neural Network): A type of neural network particularly effective for image recognition and processing, which uses convolutional layers to detect features.

Cold Start Problem: A challenge in recommender systems where the system cannot draw any inferences for users or items about which it has not yet gathered sufficient information.

Collaborative Filtering: A technique used in recommender systems that makes predictions about a user's interests by collecting preferences from many users. It can be user-based (finding similar users) or item-based (finding similar items).

Computer Vision: A field of AI that trains computers to interpret and understand visual information from the world.

Confusion Matrix: A table used to describe the performance of a classification model, showing the counts of true positive, true negative, false positive, and false negative predictions.
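
The four counts can be tallied directly from paired labels. The sketch below assumes binary labels encoded as 1 (positive) and 0 (negative); the function name and the sample data are made up for illustration.

```python
def confusion_counts(y_true, y_pred):
    # Tally each (actual, predicted) pair into one of the four cells.
    counts = {"tp": 0, "tn": 0, "fp": 0, "fn": 0}
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            counts["tp"] += 1
        elif actual == 0 and predicted == 0:
            counts["tn"] += 1
        elif actual == 0 and predicted == 1:
            counts["fp"] += 1
        else:
            counts["fn"] += 1
    return counts

y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)

# Accuracy is the share of correct predictions (tp + tn) among all cases.
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
```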

Content-Based Filtering: A method used in recommender systems that uses the features of items to recommend additional items with similar properties. This approach doesn't rely on user behaviour data.

Continuous Learning: An AI paradigm where the model continues to learn and adapt from a continuous stream of data, rather than from a fixed dataset.

Convergence: In machine learning, the state reached by an iterative algorithm when its output becomes stable.

Coreference Resolution: The task of finding all expressions that refer to the same entity in a text, crucial for many NLP applications.

Correlation: A statistical measure that expresses the extent to which two variables are linearly related.

Cross-validation: A model validation technique for assessing how the results of a statistical analysis will generalise to an independent dataset.
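
One common variant is k-fold cross-validation, where the data is split into k parts and each part is held out once for testing. The sketch below only generates the index splits, assuming contiguous folds with no shuffling; `k_fold_indices` is a hypothetical helper, not a library function.

```python
def k_fold_indices(n_samples: int, k: int):
    """Yield (train_indices, test_indices) pairs for k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(6, 3))
```

Each sample appears in exactly one test fold, so every data point is used for evaluation once.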

Curse of Dimensionality: The various phenomena that arise when analysing data in high-dimensional spaces that do not occur in low-dimensional settings.

Cybersecurity AI: The application of AI techniques to detect, prevent, and respond to cyber threats and vulnerabilities.

D

Data Augmentation: A technique used to increase the amount of training data by applying various transformations to existing data.

Data Cleaning: The process of detecting and correcting (or removing) corrupt or inaccurate records from a dataset.

Data Engineering: The practice of designing and building systems for collecting, storing, and analysing data at scale.

Data Lake: A centralised repository that allows you to store all your structured and unstructured data at any scale.

Data Mining: The process of discovering patterns in large datasets involving methods at the intersection of machine learning, statistics, and database systems.

Data Science: An interdisciplinary field that uses scientific methods, processes, algorithms and systems to extract knowledge and insights from structured and unstructured data.

Data Warehouse: A central repository of integrated, mostly structured data from multiple sources, organised for reporting and data analysis and considered a core component of business intelligence.

Decision Tree: A tree-like model of decisions and their possible consequences, used in data mining and machine learning.

Deep Learning: A subset of machine learning based on artificial neural networks with multiple layers.

Dependency Parsing: A method for analysing the grammatical structure of a sentence, establishing relationships between "head" words and the words that modify those heads.

Dimensionality Reduction: The process of reducing the number of random variables under consideration by obtaining a set of principal variables.

Distributed Computing: A model in which components of a software system are shared among multiple computers to improve efficiency and performance.

Dropout: A regularisation technique for neural networks where randomly selected neurons are ignored during training to prevent overfitting.

E

Edge Computing: A distributed computing paradigm that brings computation and data storage closer to the location where it is needed, improving response times and saving bandwidth.

Eigenvalue: A scalar λ associated with a square matrix A such that Av = λv for some non-zero vector v (the eigenvector); in other words, multiplying the eigenvector by the matrix only scales it by λ. Important in linear algebra and used in various machine learning algorithms.
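
The defining relation Av = λv (matrix A, eigenvector v, eigenvalue λ) can be checked numerically. The small symmetric matrix below is an arbitrary example chosen for illustration, and `mat_vec` is a hypothetical helper.

```python
def mat_vec(matrix, vector):
    # Plain matrix-vector product for small nested-list matrices.
    return [sum(a * x for a, x in zip(row, vector)) for row in matrix]

A = [[2.0, 1.0],
     [1.0, 2.0]]

# v = (1, 1) is an eigenvector of A with eigenvalue 3: A v = (3, 3) = 3 v.
v = [1.0, 1.0]
eigenvalue = 3.0
Av = mat_vec(A, v)
scaled_v = [eigenvalue * x for x in v]

# u = (1, -1) is a second eigenvector, with eigenvalue 1: A u = u.
u = [1.0, -1.0]
Au = mat_vec(A, u)
```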

Embedding: A technique that transforms high-dimensional data into a lower-dimensional space while preserving important relationships, often used in natural language processing and recommendation systems.

Ensemble Learning: A machine learning technique that combines several base models to produce one optimal predictive model.

Entity Extraction (Named Entity Recognition): The process of identifying and classifying named entities (such as person names, organisations, locations) in unstructured text into predefined categories.

Entity Recognition: A subtask of information extraction that seeks to locate and classify named entities mentioned in unstructured text into pre-defined categories.

Epoch: One complete pass through the entire training dataset in machine learning, particularly in neural network training.

Ethical AI: The practice of developing and using AI systems in ways that are morally sound and beneficial to humanity.

Evaluation Metric: A measure used to assess the performance of a machine learning model, such as accuracy, precision, recall, or F1 score.

Explainable AI (XAI): AI systems that can provide clear explanations of their decision-making processes, making them more transparent and trustworthy.

Exploratory Data Analysis (EDA): An approach to analysing datasets to summarise their main characteristics, often with visual methods.

Expert System: An AI program that uses a knowledge base of human expertise for problem-solving, often in specific domains like medical diagnosis.

Exponential Smoothing: A time series forecasting method for univariate data that can be extended to support data with a systematic trend or seasonal component.

F

F1 Score: A measure of a model's accuracy calculated as the harmonic mean of precision and recall, providing a single score that balances both concerns.
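
F1 is conventionally computed as the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below derives all three from raw counts; the function name and the example counts are illustrative, and returning 0.0 when a denominator is zero is an assumption made here for simplicity.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    # Precision: share of predicted positives that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# 8 true positives, 2 false positives, 8 false negatives:
# precision = 0.8, recall = 0.5, F1 = 8/13 (about 0.615).
precision, recall, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```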

False Negative: An error in which a model incorrectly predicts a negative outcome when the actual outcome is positive.

False Positive: An error in which a model incorrectly predicts a positive outcome when the actual outcome is negative.

Feature: An individual measurable property or characteristic of a phenomenon being observed, used as input in machine learning models.

Feature Engineering: The process of using domain knowledge to extract features from raw data, enhancing the predictive power of machine learning algorithms.

Feature Selection: The process of selecting a subset of relevant features for use in model construction, aiming to improve model performance and interpretability.

Federated Learning: A machine learning technique that trains an algorithm across multiple decentralised devices or servers holding local data samples, without exchanging them.

Feedforward Neural Network: A type of artificial neural network wherein connections between nodes do not form a cycle, with information moving only in one direction.

Few-shot Learning: A machine learning approach where a model is trained to recognise new classes or perform new tasks with only a few examples.

Fine-tuning: The process of taking a pre-trained model and adapting it to a new, similar task, often with a smaller dataset.

Fuzzy Logic: A form of many-valued logic in which the truth values of variables may be any real number between 0 and 1, used to handle the concept of partial truth.

Forecasting: The process of making predictions about the future based on past and present data, often used in time series analysis.

G

Gaussian Process: A probabilistic model used for regression and classification tasks, based on the principles of Bayesian inference.

GenAI (Generative AI): AI systems capable of creating new content, such as text, images, or music, based on patterns learned from existing data.

Generative Adversarial Network (GAN): A class of machine learning frameworks where two neural networks contest with each other to generate new, synthetic instances of data that can pass for real data.

Genetic Algorithm: An optimisation technique inspired by the process of natural selection, used to find approximate solutions to search and optimisation problems.

GPT (Generative Pre-trained Transformer): A type of large language model that uses transformer architecture and is trained on vast amounts of text data to generate human-like text.

Gradient Boosting: An ensemble learning technique that builds a series of weak learners (typically decision trees) sequentially, with each new model correcting the errors of the previous ones.

Gradient Descent: An optimisation algorithm used to minimise a loss function by iteratively moving in the direction of steepest descent as defined by the negative of the gradient.
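
A minimal one-dimensional sketch of the update rule x ← x − (learning rate) × gradient. The objective f(x) = (x − 3)², the learning rate, and the step count are all arbitrary choices for illustration.

```python
def gradient_descent(grad, x0: float, learning_rate: float = 0.1, steps: int = 100) -> float:
    x = x0
    for _ in range(steps):
        # Move against the gradient, i.e. in the direction of steepest descent.
        x -= learning_rate * grad(x)
    return x

# f(x) = (x - 3)^2 has gradient f'(x) = 2 * (x - 3) and its minimum at x = 3.
x_min = gradient_descent(lambda x: 2 * (x - 3), x0=0.0)
```

With these settings the distance to the minimum shrinks by a factor of 0.8 per step, so `x_min` ends up very close to 3. A learning rate that is too large would make the iterates overshoot instead.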

Graph Neural Network (GNN): A class of neural networks designed to work directly on graph-structured data, capable of learning from both node features and graph topology.

Grid Search: A hyperparameter tuning technique that exhaustively searches through a specified subset of the hyperparameter space of a learning algorithm.

Ground Truth: The verified, correct labels or values for a dataset, typically obtained by direct observation or human annotation. In supervised learning it serves as the reference against which a model's predictions are compared to measure its accuracy.

H

Heuristic: A problem-solving approach that uses practical methods or various shortcuts to produce solutions that may not be optimal but are sufficient given a limited timeframe or deadline.

Hidden Layer: Any layer in a neural network between the input and output layers, responsible for extracting and learning complex features from the input data.

Hidden Markov Model (HMM): A statistical model in which the system being modelled is assumed to be a Markov process with hidden states, often used in speech recognition and natural language processing.

Hierarchical Clustering: A method of cluster analysis that builds a hierarchy of clusters, often visualised as a dendrogram.

Hybrid Recommender Systems: Systems that combine multiple recommendation techniques (such as collaborative filtering and content-based methods) to overcome the limitations of any single approach.

Hyperparameter: A parameter whose value is set before the learning process begins, distinguishing it from other parameters that are learned during training.

Hyperparameter Tuning: The process of finding the optimal hyperparameters for a machine learning algorithm, often through methods like grid search, random search, or Bayesian optimisation.

Hypothesis Testing: A statistical method used to make inferences about population parameters based on sample data, often used in data analysis and machine learning to validate assumptions and models.

Human-in-the-loop: An approach to AI system design that incorporates human feedback and decision-making into the algorithm's operation, often used to improve accuracy and address ethical concerns.

I

Image Recognition: A technology that can identify places, people, objects, and actions in images, often using deep learning models like Convolutional Neural Networks.

Imbalanced Data: A situation in machine learning where the classes in a classification problem are not represented equally, potentially leading to biased models.

Inductive Reasoning: A method of reasoning in which a general principle is derived from specific observations, often used in machine learning to generalise from training data.

Inference: The process of using a trained machine learning model to make predictions on new, unseen data.

Information Gain: A measure of the reduction in entropy achieved by partitioning examples according to a given feature, often used in decision tree algorithms.
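
As a sketch, information gain can be computed as the entropy of the parent set minus the size-weighted entropy of the subsets produced by a split. Entropy is measured in bits here (log base 2), and the function names and toy labels are assumptions for illustration.

```python
import math
from collections import Counter

def entropy(labels) -> float:
    # Shannon entropy in bits of the label distribution.
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(parent, children) -> float:
    # Parent entropy minus the weighted average entropy of the child subsets.
    total = len(parent)
    weighted = sum(len(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [1, 1, 0, 0]
perfect_split = information_gain(parent, [[1, 1], [0, 0]])  # subsets are pure
useless_split = information_gain(parent, [[1, 0], [1, 0]])  # subsets are as mixed as the parent
```

A split that separates the classes completely recovers the full 1 bit of parent entropy, while a split that leaves each subset as mixed as the parent gains nothing.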

Information Retrieval: A field of study concerned with finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers). It encompasses techniques for search engines, recommendation systems, and other systems that help users find relevant information.

Instance-based Learning: A family of learning algorithms that compare new problem instances with instances seen in training, which have been stored in memory.

Interpretable AI: AI systems designed to be easily understood by humans, allowing users to comprehend how decisions or predictions are made.

Intrusion Detection System: A security system that monitors network traffic or system activity for suspicious behaviour and policy violations, increasingly using AI techniques such as anomaly detection.

Inverted Index: A data structure used in information retrieval systems to allow fast full-text searches, storing a mapping from words to their locations in documents.
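
A simplified sketch of the data structure: it maps each word to the set of document IDs that contain it. Storing positions within each document, as fuller implementations do, is omitted here, and the tokenisation is a naive lowercase split. The function name and sample documents are illustrative.

```python
from collections import defaultdict

def build_inverted_index(documents):
    index = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for word in text.lower().split():
            # Record that this word occurs in this document.
            index[word].add(doc_id)
    return index

docs = ["deep learning models", "learning to rank", "rank documents"]
index = build_inverted_index(docs)

# Documents containing both words: intersect the two posting sets.
both = index["learning"] & index["rank"]
```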

IoT (Internet of Things): The network of physical objects embedded with sensors, software, and other technologies for the purpose of connecting and exchanging data with other devices and systems over the internet, often leveraging AI for data analysis.

Iterative Learning: A learning process where the model is refined multiple times on the same dataset, each time improving its performance based on previous iterations.

J

JSON (JavaScript Object Notation): A lightweight data interchange format that is easy for humans to read and write and easy for machines to parse and generate, often used in data transmission between a server and web application.

Jupyter Notebook: An open-source web application that allows you to create and share documents containing live code, equations, visualisations, and narrative text, widely used in data science for data cleaning, transformation, numerical simulation, and machine learning.

Joint Probability Distribution: A probability distribution for two or more random variables, which gives the probability of each variable falling in any particular range of its values, often used in probabilistic models and Bayesian networks.

Jaccard Similarity: A statistic used for gauging the similarity and diversity of sample sets, calculated as the size of the intersection divided by the size of the union of two sets, often used in text similarity and recommender systems.
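
The calculation described above translates almost directly into set operations. Treating two empty sets as identical (similarity 1.0) is a convention assumed here, not something the definition settles.

```python
def jaccard_similarity(a: set, b: set) -> float:
    if not a and not b:
        return 1.0  # assumed convention for two empty sets
    # Size of the intersection divided by the size of the union.
    return len(a & b) / len(a | b)

# Two shared items ("b", "c") out of four distinct items overall gives 0.5.
similarity = jaccard_similarity({"a", "b", "c"}, {"b", "c", "d"})
```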

K

K-means Clustering: An unsupervised learning algorithm that groups data into k number of clusters based on similarity, widely used for partitioning in data mining.

K-nearest Neighbours (KNN): A simple, versatile algorithm used for both classification and regression tasks, which makes predictions based on the k closest training examples in the feature space.
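
A minimal classification sketch, assuming Euclidean distance and a simple majority vote among the k nearest training points. The function name, the toy points, and the labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k: int = 3):
    # Sort training examples by their distance to the query point.
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    # Majority vote among the k closest examples.
    return Counter(nearest_labels).most_common(1)[0][0]

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (5.0, 5.0), (5.0, 6.0), (6.0, 5.0)]
labels = ["a", "a", "a", "b", "b", "b"]

near_origin = knn_predict(points, labels, (0.5, 0.5))
far_corner = knn_predict(points, labels, (5.5, 5.5))
```

For regression, the vote would typically be replaced by an average of the neighbours' target values.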

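Keras: An open-source library that provides a high-level Python interface for building and training artificial neural networks. It is closely associated with TensorFlow, and recent versions (Keras 3) can also run on other backends such as JAX and PyTorch.

Kernel: In machine learning, particularly in support vector machines, a function that measures the similarity of two inputs as if they had been mapped into a higher-dimensional feature space, without computing that mapping explicitly. This can make it easier to find a decision boundary.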

Kernel Density Estimation: A non-parametric way to estimate the probability density function of a random variable based on a finite data sample.

Knowledge Base: A technology used to store complex structured and unstructured information used by a computer system, often in AI applications like expert systems and natural language processing.

Knowledge Graph: A knowledge base that uses a graph-structured data model to represent and link entities, widely used in semantic search engines and recommendation systems.

L

L1 Regularisation: A method to reduce model complexity and prevent overfitting by adding the sum of the absolute values of the coefficients as a penalty term to the loss function.

L2 Regularisation: Similar to L1, but uses the sum of the squared coefficients as the penalty term, also known as ridge regression when applied to linear regression.
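
The two penalty terms differ only in how each coefficient is measured. In the sketch below, `lam` stands for the regularisation strength; the name and the example weights are assumptions for illustration.

```python
def l1_penalty(weights, lam: float) -> float:
    # L1: strength times the sum of absolute coefficient values.
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam: float) -> float:
    # L2: strength times the sum of squared coefficient values.
    return lam * sum(w * w for w in weights)

weights = [0.5, -2.0, 0.0]
l1 = l1_penalty(weights, lam=0.5)  # 0.5 * (0.5 + 2.0 + 0.0)
l2 = l2_penalty(weights, lam=0.5)  # 0.5 * (0.25 + 4.0 + 0.0)
```

Either value would be added to the model's loss during training. The squared term penalises large coefficients more heavily, while the absolute-value term tends to push small coefficients to exactly zero.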

Labelled Data: Data that has been tagged with one or more labels identifying certain properties, characteristics, or classifications, used in supervised learning.

Lambda Function: In programming, a small anonymous function that can have any number of arguments but can only have one expression, often used in data processing and functional programming.
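
In Python, for instance, a lambda is written inline and is often passed as a sort key or a mapping function. The record names and scores below are invented for illustration.

```python
# An anonymous function bound to a name: one argument, one expression.
square = lambda x: x * x

# A lambda used as a sort key: order (name, score) pairs by score, best first.
records = [("model_a", 0.81), ("model_b", 0.93), ("model_c", 0.77)]
best_first = sorted(records, key=lambda record: record[1], reverse=True)
```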

LangChain: A framework for developing applications powered by language models, providing tools to integrate AI models with external data sources and environments.

Latent Variable: A variable that is not directly observed but is rather inferred from other variables that are observed, often used in dimensionality reduction and probabilistic models.

Layer: In neural networks, a set of nodes that process a set of input features. Neural networks typically consist of an input layer, one or more hidden layers, and an output layer.

LDA (Latent Dirichlet Allocation): A generative statistical model that allows sets of observations to be explained by unobserved groups, commonly used for topic modelling in natural language processing.

Learning Rate: A hyperparameter that determines the step size at each iteration while moving toward a minimum of the loss function, crucial in training neural networks.

Least Squares Regression: A method for estimating the parameters of a linear regression model by minimising the sum of the squares of the differences between the observed and predicted values.

LSA (Latent Semantic Analysis): A technique in natural language processing for analysing relationships between a set of documents and the terms they contain.

LSTM (Long Short-Term Memory): A type of recurrent neural network capable of learning long-term dependencies, particularly useful for time series and natural language processing tasks.

Linear Regression: A linear approach to modelling the relationship between a scalar response and one or more explanatory variables, widely used for predictive analysis.

Logistic Regression: Despite its name, a classification algorithm used to predict a binary outcome based on a set of independent variables.

Loss Function: A function that computes the difference between the predicted output and the actual output in a machine learning model, used to optimise the model during training.

M

Machine Learning: A subset of AI that provides systems the ability to automatically learn and improve from experience without being explicitly programmed.

Machine Translation: The task of automatically converting text or speech from one language to another while preserving meaning.

Markov Chain: A mathematical system that transitions from one state to another according to certain probabilistic rules, often used in predictive modelling and natural language processing.
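
A small sketch of how such a system evolves: the next distribution over states depends only on the current one and a fixed table of transition probabilities. The two weather states and their probabilities are hypothetical values chosen for illustration.

```python
def step(distribution, transition):
    # New probability of state j = sum over i of P(in i) * P(i -> j).
    n = len(transition)
    return [sum(distribution[i] * transition[i][j] for i in range(n)) for j in range(n)]

# States: 0 = sunny, 1 = rainy. Each row gives P(next state | current state).
transition = [[0.9, 0.1],
              [0.5, 0.5]]

today = [1.0, 0.0]                      # certainly sunny today
tomorrow = step(today, transition)      # 0.9 sunny, 0.1 rainy
day_after = step(tomorrow, transition)  # 0.86 sunny, 0.14 rainy
```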

Matplotlib: A plotting library for Python, often used in data science for creating static, animated, and interactive visualisations.

Matrix: A rectangular array of numbers, symbols, or expressions, arranged in rows and columns, fundamental in linear algebra and many machine learning algorithms.

Matrix Factorisation: A class of collaborative filtering algorithms used in recommender systems that work by decomposing the user-item interaction matrix into lower-dimensional matrices.

Mean Absolute Error (MAE): A measure of difference between two continuous variables, calculated as the average of the absolute differences between prediction and actual observation.

Mean Squared Error (MSE): A measure of the average squared difference between the estimated values and the actual value, commonly used as a loss function in regression problems.
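
MAE and MSE can be computed side by side from the same errors, as in the sketch below. The function names and sample values are illustrative, and the final line shows that taking the square root of MSE gives the related RMSE measure.

```python
import math

def mean_absolute_error(y_true, y_pred) -> float:
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3.0, 5.0, 2.0, 7.0]
y_pred = [2.0, 5.0, 4.0, 8.0]

mae = mean_absolute_error(y_true, y_pred)  # errors 1, 0, 2, 1 -> mean 1.0
mse = mean_squared_error(y_true, y_pred)   # squares 1, 0, 4, 1 -> mean 1.5
rmse = math.sqrt(mse)
```

Because errors are squared before averaging, MSE weights the single error of 2 more heavily than MAE does.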

Metadata: Data that provides information about other data, often used to summarise basic information about data to make finding and working with particular instances of data easier.

Model: A simplified representation of a system or process, designed to represent complex relationships in a more understandable or analysable form.

Monte Carlo Simulation: A broad class of computational algorithms that rely on repeated random sampling to obtain numerical results, often used in optimisation and generating predictions.
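
A classic toy example is estimating π by sampling random points in the unit square and counting how many fall inside the quarter circle of radius 1. The sample size and seed below are arbitrary choices; the seed only makes the run repeatable.

```python
import random

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        # The point lies inside the quarter circle if x^2 + y^2 <= 1.
        if x * x + y * y <= 1.0:
            inside += 1
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * inside / n_samples

pi_estimate = estimate_pi(100_000)
```

The estimate is only approximate, and its error shrinks roughly with the square root of the number of samples.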

Multi-layer Perceptron (MLP): A class of feedforward artificial neural network that consists of at least three layers of nodes: an input layer, a hidden layer and an output layer.

Multicollinearity: A phenomenon in which one predictor variable in a multiple regression model can be linearly predicted from the others with a substantial degree of accuracy.

N

Naive Bayes: A family of simple probabilistic classifiers based on applying Bayes' theorem with strong independence assumptions between the features.

Natural Language Processing (NLP): A subfield of AI focused on the interaction between computers and humans using natural language, enabling machines to understand, interpret, and generate human language.

Neural Network: A series of algorithms that attempt to recognise underlying relationships in a set of data through a process that mimics the way the human brain operates.

Neuron: In artificial neural networks, a node that processes input, applies an activation function, and produces output, analogous to biological neurons.

Noise: Unexplained variation or randomness in a dataset that doesn't represent the underlying pattern or relationship.

Normalisation: The process of transforming numeric variables to have a standard scale, often to improve the performance and training stability of machine learning algorithms.

NumPy: A fundamental package for scientific computing in Python, providing support for large, multi-dimensional arrays and matrices, along with a collection of mathematical functions.

Nearest Neighbour Search: The optimisation problem of finding the point in a given set that is closest to a given point, often used in recommendation systems and anomaly detection.

Non-parametric Model: A model whose structure is not specified a priori but is determined from data, allowing the model complexity to grow with the size of the training set.

O

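One-hot Encoding: A process by which each value of a categorical variable is converted into a binary vector with one position per category, containing a 1 in the position of that value's category and 0 elsewhere, so that it can be used as input to machine learning algorithms.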
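
A minimal sketch of one-hot encoding in plain Python: each category gets its own position, and each value becomes a vector with a single 1. Ordering the categories alphabetically is an arbitrary choice made here, and `one_hot_encode` is a hypothetical helper.

```python
def one_hot_encode(values):
    # Fix an order for the distinct categories (alphabetical here).
    categories = sorted(set(values))
    position = {category: i for i, category in enumerate(categories)}
    encoded = []
    for value in values:
        row = [0] * len(categories)
        row[position[value]] = 1  # mark the column for this value's category
        encoded.append(row)
    return categories, encoded

categories, encoded = one_hot_encode(["red", "green", "blue", "green"])
```

Here the column order is blue, green, red, so "red" becomes `[0, 0, 1]` and both occurrences of "green" become `[0, 1, 0]`.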

One-shot Learning: A machine learning approach where a model can learn to recognise or classify new instances of a class from just one example.

Ontology: In AI, a formal naming and definition of the types, properties, and interrelationships of the entities that exist for a particular domain of discourse.

Optimisation: The process of maximising or minimising a function by systematically choosing input values from within an allowed set and computing the value of the function.

Outlier: An observation point that is distant from other observations, potentially due to variability in the measurement or indicating experimental error.

Overfitting: A modelling error that occurs when a function is too closely fit to a limited set of data points, potentially capturing noise and leading to poor generalisation to new data.

Output Layer: The final layer in a neural network that produces the network's prediction or output.

Optical Character Recognition (OCR): The electronic or mechanical conversion of images of typed, handwritten or printed text into machine-encoded text.

Object Detection: A computer vision technique for locating instances of objects in an image or video, often using deep learning models like YOLO or R-CNN.

Ordinal Encoding: A process of converting categorical variables that have an inherent order to integer values, preserving the relative order of categories.

P

Pandas: An open-source library providing high-performance, easy-to-use data structures and data analysis tools for Python.

Parameter: A variable in a model whose value is estimated from data and can be adjusted to improve the model's performance.

Part-of-Speech Tagging: The process of marking up words in a text as corresponding to particular parts of speech (noun, verb, adjective, etc.) based on both definition and context.

Pattern Recognition: The automated recognition of patterns and regularities in data, a fundamental component of machine learning.

Perceptron: The simplest type of artificial neural network, consisting of a single layer of input nodes connected directly to a layer of output nodes.

Precision: In classification, the ratio of true positive predictions to the total number of positive predictions, indicating how accurate the model is in its positive predictions.

Predictive Analytics: The use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data.

Principal Component Analysis (PCA): A method often used to reduce the dimensionality of large data sets by transforming a large set of variables into a smaller set of uncorrelated components that still contains most of the information.

Prompt: In the context of AI, a text input given to a language model to elicit a specific type of response or to guide the model's output.

Prompt Engineering: The practice of designing and refining prompts to effectively communicate with and extract desired outcomes from large language models.

Pruning: The process of reducing the size of decision trees by removing sections of the tree that provide little power to classify instances.

Python: A high-level, interpreted programming language widely used in data science and machine learning due to its simplicity and powerful libraries.

PyTorch: An open-source machine learning library based on the Torch library, used for applications such as computer vision and natural language processing.

Q

Q-learning: A model-free reinforcement learning algorithm used to find an optimal action-selection policy for any given Markov decision process.
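
The core of the algorithm is a single update to a table of action values: Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, a′) − Q(s, a)], with the maximum taken over the actions a′ available in the next state s′. The sketch below performs one such update; the state names, the table values, the learning rate α and the discount factor γ are all illustrative assumptions.

```python
def q_update(q_table, state, action, reward, next_state, alpha=0.5, gamma=0.9):
    # Best value achievable from the next state under the current estimates.
    best_next = max(q_table[next_state].values())
    # Target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    # Move the current estimate a fraction alpha towards the target.
    q_table[state][action] += alpha * (td_target - q_table[state][action])
    return q_table[state][action]

q_table = {
    "s0": {"left": 0.0, "right": 0.0},
    "s1": {"left": 1.0, "right": 2.0},
}

# Target = 1.0 + 0.9 * 2.0 = 2.8, so the new value is 0.0 + 0.5 * 2.8 = 1.4.
new_value = q_update(q_table, "s0", "right", reward=1.0, next_state="s1")
```

No model of the environment's transitions is needed, which is what "model-free" refers to: the update uses only the observed reward and next state.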

Quantum Computing: A type of computation that harnesses the collective properties of quantum states, such as superposition and entanglement, to perform calculations. It has potential applications in machine learning and AI for solving certain types of problems much faster than classical computers.

Quadratic Discriminant Analysis: A classification algorithm that uses quadratic surfaces to separate classes, assuming that each class has its own covariance matrix.

Quantile Regression: A type of regression analysis aimed at estimating the conditional median or other quantiles of the response variable, useful when the rate of change in the conditional quantile is of interest.

Query: In the context of databases and information retrieval systems, a request for information from a database or search engine, often used in natural language processing and information retrieval tasks in AI.

Question Answering: A computer science discipline within NLP that focuses on building systems that automatically answer questions posed by humans in natural language.

R

Random Forest: An ensemble learning method that constructs multiple decision trees during training and outputs the class chosen by most of the trees (for classification) or the mean of the individual trees' predictions (for regression).

Recall: In classification, the ratio of true positive predictions to the total number of actual positive instances, indicating how well the model identifies all positive instances.

Recommender System: An AI-based system that suggests items or content to users based on their preferences, behaviour, or similarity to other users. These systems are widely used in e-commerce, streaming services, and social media platforms.

Recurrent Neural Network (RNN): A class of artificial neural networks where connections between nodes form a directed graph along a temporal sequence, allowing it to exhibit temporal dynamic behaviour.

Regression: A set of statistical processes for estimating the relationships between variables, often used for prediction and forecasting.

Regularisation: A technique used to prevent overfitting by adding a penalty term to the loss function, discouraging complex models.
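
One common form is L2 regularisation, which adds the sum of squared weights to the loss. The sketch below assumes a mean squared error loss and a hypothetical penalty strength named lam; other losses and penalties follow the same pattern.

```python
def l2_regularised_loss(errors, weights, lam=0.1):
    """Mean squared error plus an L2 penalty on the model weights."""
    mse = sum(e * e for e in errors) / len(errors)
    # Larger weights mean a larger penalty, discouraging complex models.
    penalty = lam * sum(w * w for w in weights)
    return mse + penalty


loss = l2_regularised_loss(errors=[1.0, -1.0], weights=[3.0, 4.0], lam=0.1)
```

Setting lam to zero recovers the unregularised loss; increasing it trades training fit for smaller weights.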

Reinforcement Learning: A type of machine learning where an agent learns to make decisions by taking actions in an environment to maximise some notion of cumulative reward.

Relevance Ranking: The process of sorting search results based on their relevance to a user's query, often using machine learning algorithms to improve accuracy.

Robotics: The field concerned with the design, construction, operation, and use of robots, which often draws on AI techniques such as computer vision, natural language processing, and machine learning.

R (programming language): A programming language and free software environment for statistical computing and graphics, widely used among statisticians and data miners.

ROC Curve (Receiver Operating Characteristic Curve): A graphical plot that illustrates the diagnostic ability of a binary classifier system as its discrimination threshold is varied.
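
The points on an ROC curve can be obtained by sweeping the threshold over the classifier's scores. This is a simplified sketch with a hypothetical function name; it assumes labels are 0 or 1 and that both classes are present.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        predicted = [s >= threshold for s in scores]
        tp = sum(1 for y, p in zip(y_true, predicted) if y == 1 and p)
        fp = sum(1 for y, p in zip(y_true, predicted) if y == 0 and p)
        points.append((fp / negatives, tp / positives))
    return points


curve = roc_points([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Plotting the second value of each pair against the first gives the curve; the closer it bends toward the top-left corner, the better the classifier separates the two classes.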

Root Mean Square Error (RMSE): The square root of the mean of the squared differences between the values predicted by a model and the values actually observed, often used to evaluate regression models.
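
A minimal sketch of the calculation, using a hypothetical function name and toy values:

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(actual - predicted) ** 2
                      for actual, predicted in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


error = rmse([2.0, 4.0], [0.0, 0.0])  # square root of (4 + 16) / 2
```

Because the errors are squared before averaging, large errors influence RMSE more than small ones.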

S

Scikit-learn: A machine learning library for Python, featuring various classification, regression and clustering algorithms, designed to interoperate with Python numerical and scientific libraries.

Sentiment Analysis: The use of natural language processing, text analysis, and computational linguistics to systematically identify, extract, quantify, and study affective states and subjective information.

Sigmoid Function: An S-shaped curve that maps any input value to a value between 0 and 1, commonly used as an activation function in neural networks.
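
The standard logistic sigmoid is 1 / (1 + e^(-x)). The sketch below computes it with two algebraically equivalent branches, a common way to avoid overflow for large negative inputs; the function name is illustrative.

```python
import math


def sigmoid(x):
    """Logistic sigmoid, written to avoid overflow in math.exp."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative x, exp(x) is small, so this form cannot overflow.
    z = math.exp(x)
    return z / (1.0 + z)


midpoint = sigmoid(0.0)  # 0.5
```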

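Singular Value Decomposition (SVD): A factorisation of a real or complex matrix into the product of two orthogonal (or unitary) matrices and a rectangular diagonal matrix of non-negative singular values, used in many applications including dimensionality reduction and collaborative filtering.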

Stochastic Gradient Descent (SGD): An iterative optimisation method that updates model parameters using a gradient estimated from a single randomly chosen example or a small batch rather than from the full dataset, often used in machine learning for minimising the loss function.
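
To make the idea concrete, the sketch below fits the slope of a line y = w * x by taking one gradient step per training example, visiting the examples in random order. The function name, learning rate, epoch count, and data are assumptions made for this example.

```python
import random


def sgd_fit_slope(xs, ys, learning_rate=0.01, epochs=50, seed=0):
    """Fit w in y = w * x by stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    data = list(zip(xs, ys))
    w = 0.0
    for _ in range(epochs):
        rng.shuffle(data)  # visit examples in a new random order each epoch
        for x, y in data:
            # Gradient of (w * x - y) ** 2 with respect to w, for one example.
            gradient = 2 * (w * x - y) * x
            w -= learning_rate * gradient
    return w


slope = sgd_fit_slope([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

On this noise-free data the estimate settles very close to the true slope of 2; with noisy data the estimate would keep fluctuating around the best-fit value unless the learning rate is reduced over time.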

Supervised Learning: A type of machine learning where the algorithm learns from a labelled dataset; the labels act as an answer key against which the algorithm's predictions can be checked during training.

Support Vector Machine (SVM): A supervised learning model that analyses data for classification and regression analysis, particularly effective in high-dimensional spaces.

Swarm Intelligence: The collective behaviour of decentralised, self-organised systems, natural or artificial, often used in optimisation problems and robotics.

Synthetic Data: Artificially generated data that mimics real data in terms of essential characteristics, often used when real data is scarce or when privacy concerns prevent the use of real data.

Statistical Inference: The process of drawing conclusions about populations or scientific truths from data, central to many machine learning and data science applications.

T

TensorFlow: An open-source software library for dataflow and differentiable programming across a range of tasks, commonly used for machine learning applications such as neural networks.

Test Set: A subset of a dataset used to assess the likely performance of a model after it has been trained on a training dataset.

Text Classification: The task of assigning predefined categories to free-text documents, often used in spam detection, sentiment analysis, and topic labelling.

Text Mining: The process of deriving high-quality information from text, often used in natural language processing tasks.

Text Summarisation: The process of creating a concise and coherent summary of longer text documents while preserving key information and overall meaning.

Time Series: A series of data points indexed in time order, often used in forecasting and trend analysis.

Topic Modelling: A statistical modelling approach used in NLP to discover abstract topics that occur in a collection of documents.

Training Set: The subset of data used to initially fit a model in machine learning.

Transfer Learning: A machine learning method where a model developed for a task is reused as the starting point for a model on a second task, often saving training time and improving performance.

Transformer: A deep learning model architecture that uses self-attention mechanisms, forming the basis for many state-of-the-art natural language processing models.

True Negative: An outcome where the model correctly predicts the negative class in a binary classification problem.

True Positive: An outcome where the model correctly predicts the positive class in a binary classification problem.

Turing Test: A test of a machine's ability to exhibit intelligent behaviour equivalent to, or indistinguishable from, that of a human.

t-SNE (t-Distributed Stochastic Neighbor Embedding): A machine learning algorithm for visualisation based on Stochastic Neighbor Embedding, particularly well suited for the visualisation of high-dimensional datasets.

TF-IDF (Term Frequency-Inverse Document Frequency): A numerical statistic used in information retrieval to reflect how important a word is to a document in a collection or corpus.
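
Several TF-IDF variants exist. The sketch below uses one simple form, assumed here for illustration: term frequency is the term's share of the document's tokens, and inverse document frequency is the natural logarithm of (number of documents / number of documents containing the term). Tokenisation by whitespace is also a simplifying assumption.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: score} dictionary per document."""
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    scores = []
    for tokens in tokenised:
        counts = Counter(tokens)
        scores.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


weights = tf_idf(["the cat sat", "the dog barked"])
```

Here "the" appears in every document and so scores zero, while "cat" appears in only one document and receives a positive weight.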

U

Underfitting: A situation where a machine learning model is too simple to capture the underlying structure of the data, resulting in poor performance on both training and test data.

Unsupervised Learning: A type of machine learning where the algorithm learns patterns from unlabelled data, without explicit guidance on what to look for.

Uploading: In the context of AI, the hypothetical future process of transferring a human consciousness into a computer system or artificial body.

Utility Function: In decision theory and artificial intelligence, a function that expresses the preferences of an agent over a set of outcomes or states of the world.

Univariate Analysis: The simplest form of quantitative analysis where the data being analysed contains only one variable.

User Interface (UI): In the context of AI, the means by which users interact with AI systems, often designed to be intuitive and user-friendly.

V

Validation Set: A subset of the data, held out from the training set, used to evaluate a model's performance during training, helping to tune hyperparameters and detect overfitting.

Variance: In machine learning, a measure of how much the predictions for a given point vary between models trained on different samples of the training data.

Vector: In machine learning and data science, an array of numbers used to represent real-world objects or concepts in a way that computers can process.

Vector Space Model: A model for representing text documents as vectors of identifiers, such as index terms. It's used in information retrieval, information filtering, indexing, and relevancy rankings.
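
Once documents are represented as vectors, their similarity is often measured with cosine similarity. The following is a minimal sketch with a hypothetical function name and made-up term-count vectors.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Treat similarity with an all-zero vector as 0.0 rather than dividing by zero.
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Term counts for two documents over the same three index terms.
similarity = cosine_similarity([1, 2, 0], [2, 4, 0])
```

Because the second vector is a scaled copy of the first, the two documents point in the same direction and the similarity is (up to rounding) 1.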

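Vectorisation: The process of rewriting computations to operate on whole arrays (vectors) at once rather than on one scalar at a time, often used to speed up machine learning algorithms. The term is also used for converting data such as text into numerical vectors.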

Virtual Assistant: An AI-powered software agent that can perform tasks or services for an individual based on commands or questions, such as Siri or Alexa.

Vision Transformer (ViT): A type of neural network architecture that applies the transformer model, originally designed for natural language processing tasks, to image recognition tasks.

Visualisation: The graphical representation of data or concepts, often used in data science to gain insights and communicate findings effectively.

Voice Recognition: A technology that recognises spoken language and transcribes it into text (also called speech recognition), often used in virtual assistants and accessibility tools.

W

Weak AI: Also known as Narrow AI, it refers to AI systems designed and trained for a specific task, which cannot easily transfer their intelligence to other tasks.

Weight: In neural networks, a learnable parameter attached to a connection between neurons that controls the strength of that connection.

Word Embedding: A technique in natural language processing where words or phrases from the vocabulary are mapped to vectors of real numbers, capturing semantic meanings.

Word2Vec: A group of related models used to produce word embeddings, representing words as vectors in a multi-dimensional space.

Word Sense Disambiguation: The task of identifying which sense of a word is used in a sentence when the word has multiple meanings.

Workflow: In data science and AI, a sequence of data processing steps typically represented as a directed graph, often used in data preparation and model deployment.

Wrapper Method: In feature selection, a method that uses a predictive model to score feature subsets, selecting features that lead to the best model performance.

White Box Model: A type of model whose inner workings and decision-making process are transparent and interpretable, contrasting with black box models.

X

XAI (Explainable AI): An approach to artificial intelligence that focuses on making AI systems' decisions and reasoning processes understandable and interpretable by humans.

XGBoost (Extreme Gradient Boosting): An optimised distributed gradient boosting library designed to be highly efficient, flexible and portable, often used for supervised learning problems.

XML (eXtensible Markup Language): A markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable, often used in data storage and transfer in AI and data science applications.

Y

YOLO (You Only Look Once): A family of real-time object detection models that detect multiple objects in an image with a single forward pass through a neural network.

Yield Curve: In financial data science, a curve plotting interest rates against contract lengths (maturities) for otherwise similar debt contracts, often used in predictive modelling of economic trends.

Z

Z-score: A statistical measure that quantifies the number of standard deviations by which an observation or data point is above or below the mean of a distribution. It's often used in data preprocessing and anomaly detection.
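
A minimal sketch of the calculation using Python's standard library. It assumes the population standard deviation is wanted and that the values are not all identical (otherwise the standard deviation is zero); the function name and data are illustrative.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Return the z-score of each value relative to the whole list."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# This sample has mean 5 and population standard deviation 2.
scores = z_scores([2, 4, 4, 4, 5, 5, 7, 9])
```

In this sample the value 9 lies two standard deviations above the mean. For anomaly detection, a common rule of thumb is to flag points whose absolute z-score exceeds a chosen cut-off such as 3.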

Zero-shot Learning: A problem setup in machine learning where the model is able to correctly classify or make predictions for classes that were not observed during training.

Zeroisation: In cybersecurity and AI systems handling sensitive data, it's the practice of overwriting memory locations that contained sensitive data with zeros to prevent unauthorised access to that data.

Zipf's Law: An empirical law stating that the frequency of any word in a corpus of natural language is inversely proportional to its rank in the frequency table. It's often observed in natural language processing tasks.
