AI GLOSSARY - S
Definition: An on-policy algorithm in reinforcement learning that learns a value for each state-action pair. After taking an action, it updates that value using the reward received and the estimated value of the next state-action pair chosen by the same policy.
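The update described above is commonly known as SARSA (state, action, reward, next state, next action). A minimal sketch of a single update step follows, assuming a dictionary-based value table; the names `sarsa_update`, `q_table`, `alpha`, and `gamma` are illustrative and not taken from any particular library.

```python
def sarsa_update(q, state, action, reward, next_state, next_action,
                 alpha=0.1, gamma=0.9):
    """Apply one SARSA update to the table q in place and return the new value.

    q maps (state, action) pairs to estimated values; missing pairs count as 0.
    alpha is the learning rate and gamma the discount factor (both assumed here).
    """
    current = q.get((state, action), 0.0)
    # Value of the action the policy actually selected in the next state.
    next_value = q.get((next_state, next_action), 0.0)
    td_target = reward + gamma * next_value
    q[(state, action)] = current + alpha * (td_target - current)
    return q[(state, action)]


# Hypothetical two-state example.
q_table = {("s1", "right"): 2.0}
new_value = sarsa_update(q_table, "s0", "right", reward=1.0,
                         next_state="s1", next_action="right")
```

Because the target uses the next action that was really chosen, rather than the best available one, the estimate reflects the policy being followed.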
Definition: The capability of a system, network, or process to handle a growing amount of work or its potential to be enlarged to accommodate that growth. In AI, scalability often refers to the ability of an algorithm or model to efficiently process large volumes of data.
Definition: An open-source machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy.
Definition: A field within natural language processing that focuses on interpreting and understanding the meanings of words, phrases, and sentences in context.
Definition: A class of machine learning tasks and techniques that train on both labelled and unlabelled data, typically a small amount of labelled data combined with a large amount of unlabelled data.
Definition: The process of computationally determining whether a piece of writing is positive, negative, or neutral. It is often used to gauge the sentiment of social media posts or customer reviews to help brands understand consumer attitudes.
Definition: A type of model in machine learning that is concerned with predicting sequences of data elements, such as natural language sentences, time-series data, or genetic sequences.
Definition: A simple yet very efficient approach to fitting linear classifiers and regressors under convex loss functions, such as those used by (linear) Support Vector Machines and logistic regression. Stochastic gradient descent (SGD) updates the model parameters using one sample or a small batch at a time, and it has been successfully applied to large-scale and sparse machine learning problems.
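As a rough illustration of the idea above, the sketch below fits a logistic regression classifier with per-sample gradient updates using only the standard library. The function names, the learning rate, the epoch count, and the toy dataset are all assumptions made for this example.

```python
import math
import random


def sgd_logistic(samples, labels, lr=0.5, epochs=500, seed=0):
    """Fit a linear classifier under logistic loss, one sample per update."""
    rng = random.Random(seed)
    weights = [0.0] * len(samples[0])
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit the samples in a fresh random order
        for i in order:
            x, y = samples[i], labels[i]
            z = sum(w * xi for w, xi in zip(weights, x)) + bias
            p = 1.0 / (1.0 + math.exp(-z))  # predicted probability of class 1
            error = p - y                   # gradient of the loss w.r.t. z
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


def predict(weights, bias, x):
    """Return 1 if x falls on the positive side of the decision boundary."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if z >= 0 else 0


# Toy, linearly separable data: the logical AND of two inputs.
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
y = [0, 0, 0, 1]
weights, bias = sgd_logistic(X, y)
predictions = [predict(weights, bias, x) for x in X]
```

Each update touches only one sample, which is what makes the approach practical when the full dataset is too large to process at once.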
Definition: Refers to machine learning methods that do not rely on deep neural networks, such as linear models, decision trees, or networks with only one or two hidden layers. These techniques involve few layers of processing or transformation between input and output.
Definition: A numerical measure of how alike two data objects are. Highly similar objects typically lie a small distance apart in feature space. This concept is used in various applications such as clustering, information retrieval, and classification.
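One widely used measure of this kind is cosine similarity, which compares the direction of two feature vectors. The sketch below is illustrative only; the vectors `doc_a`, `doc_b`, and `doc_c` are made-up examples.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


doc_a = [1.0, 2.0, 0.0]
doc_b = [2.0, 4.0, 0.0]  # points in the same direction as doc_a
doc_c = [0.0, 0.0, 3.0]  # orthogonal to doc_a

sim_ab = cosine_similarity(doc_a, doc_b)
sim_ac = cosine_similarity(doc_a, doc_c)
```

Here `doc_a` and `doc_b` score close to 1 because they differ only in length, while `doc_a` and `doc_c` score 0 because they share no non-zero features.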
Definition: A probabilistic technique for approximating the global optimum of a given function. Specifically, it is a metaheuristic to approximate global optimisation in a large search space for an optimisation problem.
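The technique defined above is generally called simulated annealing. A minimal sketch for a one-dimensional function follows; the cooling schedule, step size, and the test function `bumpy` are assumptions chosen for illustration, not a recommended configuration.

```python
import math
import random


def simulated_annealing(cost, start, step=1.0, t_start=10.0, t_end=1e-3,
                        cooling=0.995, seed=0):
    """Search for a low-cost point, occasionally accepting worse moves."""
    rng = random.Random(seed)
    current = best = start
    current_cost = best_cost = cost(start)
    temperature = t_start
    while temperature > t_end:
        candidate = current + rng.uniform(-step, step)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with a probability
        # that shrinks as the temperature falls.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling  # geometric cooling schedule
    return best, best_cost


def bumpy(x):
    """A function with several local minima."""
    return x * x + 10.0 * math.sin(x)


best_x, best_cost = simulated_annealing(bumpy, start=8.0)
```

Accepting some worse moves early on is what lets the search escape local minima; as the temperature drops, the procedure behaves more and more like plain hill climbing.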
Definition: A function that turns logits (typically from multinomial logistic regression or the final layer of a neural network for multiclass classification) into probabilities that sum to one. The softmax function outputs a vector that represents a probability distribution over a list of potential outcomes.
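A small sketch of the softmax computation is given below. Subtracting the largest logit before exponentiating is a common numerical-stability step and does not change the result; the input values are arbitrary examples.

```python
import math


def softmax(logits):
    """Convert a list of logits into probabilities that sum to one."""
    shift = max(logits)  # guards against overflow in exp for large logits
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([2.0, 1.0, 0.1])
large = softmax([1000.0, 1000.0])  # would overflow without the shift
```

Larger logits receive larger probabilities, and the ordering of the inputs is preserved in the output.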
Definition: Data in which a large proportion of the values are zero or missing, as commonly happens in datasets where most elements are not relevant to a given record. In machine learning, handling sparse data requires specialised techniques to ensure model efficiency and accuracy.
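One simple technique of this kind is to store only the non-zero entries. The sketch below uses a plain dictionary keyed by index; it covers zero values only, not missing values, and the helper names `to_sparse` and `sparse_dot` are hypothetical.

```python
def to_sparse(dense):
    """Keep only the non-zero entries of a list as {index: value}."""
    return {i: v for i, v in enumerate(dense) if v != 0}


def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {index: value} dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the vector with fewer stored entries
    return sum(v * b[i] for i, v in a.items() if i in b)


dense_a = [0, 0, 3, 0, 0, 0, 2, 0]
dense_b = [1, 0, 4, 0, 0, 0, 0, 5]
sparse_a = to_sparse(dense_a)
sparse_b = to_sparse(dense_b)
dot = sparse_dot(sparse_a, sparse_b)
```

The work done by `sparse_dot` depends on the number of stored entries rather than on the full vector length, which is where the efficiency gain comes from.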
Definition: A technique that uses the spectrum (eigenvalues) of the similarity matrix of the data to perform dimensionality reduction before clustering in fewer dimensions. It is especially useful when the structure of the individual clusters is highly non-convex.
Definition: A powerful and versatile supervised machine learning algorithm used for both classification and regression, although it is mostly used for classification. Each data item is plotted as a point in n-dimensional space (where n is the number of features), with the value of each feature being the value of a particular coordinate. Classification is then performed by finding the hyperplane that separates the two classes with the largest margin, that is, the greatest distance to the nearest points of either class.
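To make the idea of a separating hyperplane concrete, the sketch below compares two hand-picked candidate hyperplanes by the distance from each to its nearest point. It does not train a support vector machine; the points, labels, and both candidates are assumptions chosen for illustration.

```python
import math


def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w . x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def margin(w, b, points, labels):
    """Smallest label-adjusted distance; positive only if all points are
    on the correct side of the hyperplane."""
    return min(y * signed_distance(w, b, x) for x, y in zip(points, labels))


points = [[1.0, 1.0], [2.0, 1.0], [4.0, 4.0], [5.0, 4.0]]
labels = [-1, -1, 1, 1]

# Two hypothetical separating hyperplanes for the same data.
margin_a = margin([1.0, 1.0], -5.5, points, labels)  # diagonal boundary
margin_b = margin([1.0, 0.0], -2.5, points, labels)  # vertical boundary
```

Both candidates classify every point correctly, but the first leaves more room between the boundary and the nearest points. A trained support vector machine searches for the hyperplane that makes this margin as large as possible.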

