AI GLOSSARY - K
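Definition: k-anonymity is a property of anonymised data in which each released record is indistinguishable from at least k - 1 other records with respect to its quasi-identifiers (attributes such as age or postcode that could be linked to external data). A person therefore cannot be singled out from a group of fewer than k individuals, even by someone who has the entire released dataset. It is a common baseline guarantee in data privacy.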
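A minimal sketch of how the k-anonymity property could be checked, assuming records are plain dictionaries. The function name `is_k_anonymous`, the column names, and the sample records are all hypothetical and chosen only for illustration.

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """Return True if every combination of quasi-identifier values
    is shared by at least k records."""
    groups = Counter(
        tuple(record[q] for q in quasi_identifiers) for record in records
    )
    return all(count >= k for count in groups.values())

# Hypothetical, already-generalised records (age bands, truncated postcodes).
records = [
    {"age_band": "30-39", "postcode": "AB1", "diagnosis": "flu"},
    {"age_band": "30-39", "postcode": "AB1", "diagnosis": "asthma"},
    {"age_band": "40-49", "postcode": "CD2", "diagnosis": "flu"},
    {"age_band": "40-49", "postcode": "CD2", "diagnosis": "diabetes"},
]

two_anonymous = is_k_anonymous(records, ["age_band", "postcode"], 2)
three_anonymous = is_k_anonymous(records, ["age_band", "postcode"], 3)
```

Here each quasi-identifier combination is shared by exactly two records, so the data satisfies the check for k = 2 but not for k = 3.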
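Definition: k-means clustering is a popular unsupervised machine learning algorithm that groups data into k clusters. It alternates between assigning each data point to its nearest centroid and moving each centroid to the mean of the points assigned to it, repeating until the assignments stop changing.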
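The assign-then-update loop of k-means can be sketched in plain Python as follows. This is an illustrative sketch, not a production implementation: taking the first k points as the starting centroids is an assumption made to keep the example deterministic, and the function name `kmeans` and the sample points are hypothetical.

```python
def kmeans(points, k, iterations=20):
    """Cluster points (tuples of floats) into k groups."""
    # Assumption: use the first k points as the initial centroids.
    centroids = [tuple(p) for p in points[:k]]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids
            ]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(
                    tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return centroids, clusters

points = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5),
          (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, clusters = kmeans(points, k=2)
```

With these sample points the loop separates the three points near (1, 1.5) from the three near (8.5, 8.8).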
Definition: k-nearest neighbours (k-NN) is a simple, instance-based learning algorithm in which the function is only approximated locally and all computation is deferred until a prediction is requested. For classification, the output is a class membership determined by a majority vote of the k nearest neighbours; for regression, it is typically the average of their values.
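A minimal sketch of k-nearest-neighbour classification by majority vote, assuming Euclidean distance as the similarity measure. The function name `knn_classify` and the labelled training points are hypothetical.

```python
import math
from collections import Counter

def knn_classify(train, query, k):
    """train is a list of (point, label) pairs; return the majority
    label among the k training points closest to query."""
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

train = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((9, 8), "b"),
]

near_a = knn_classify(train, (2, 2), k=3)
near_b = knn_classify(train, (8, 9), k=3)
```

Note that no model is fitted in advance: all the work happens inside `knn_classify` at query time, which is what "deferred computation" means here.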
Definition: Apache Kafka is a distributed streaming platform that lets you publish and subscribe to streams of records, store records in a fault-tolerant way, and process streams of records as they occur. Kafka is widely used in real-time data pipelines and streaming applications.
Definition: In machine learning, a kernel is a function used in kernel methods to enable various algorithms to operate in high-dimensional, implicit feature spaces without ever computing the coordinates of the data in that space explicitly.
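As an illustration of a kernel working in an implicit feature space, the sketch below compares a degree-2 polynomial kernel on 2-D inputs with an explicit dot product in the corresponding 3-D feature space. The choice of kernel and the function names are assumptions made for this example.

```python
import math

def poly2_kernel(x, y):
    """Degree-2 polynomial kernel: k(x, y) = (x . y)^2, computed
    directly from the 2-D inputs."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def explicit_features(x):
    """The 3-D feature map that the kernel above corresponds to."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

x, y = (1.0, 2.0), (3.0, 4.0)
implicit = poly2_kernel(x, y)
explicit = dot(explicit_features(x), explicit_features(y))
```

Both routes give the same value up to floating-point rounding, but the kernel never constructs the higher-dimensional vectors.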
Definition: Kernel methods are a class of algorithms for pattern analysis, whose best-known member is the support vector machine (SVM). They are used for pattern recognition tasks such as classification and regression.
Definition: Keyframe extraction is a video processing technique that automatically selects the most representative or significant frames of a video sequence. It is useful for video summarisation and indexing.
Definition: k-fold cross-validation is a model validation technique used in machine learning that partitions the data into k subsets (folds) of roughly equal size. One fold is held out as the test set while the remaining k - 1 folds form the training set. The process is repeated k times so that each fold serves as the test set exactly once, and the k scores are usually averaged.
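The k-fold partitioning scheme can be sketched as an index generator. This sketch assumes the data is not shuffled and that any remainder is spread over the first folds; the function name `k_fold_splits` is hypothetical.

```python
def k_fold_splits(n_samples, k):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [
        n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)
    ]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

splits = list(k_fold_splits(10, 3))
```

Each sample index appears in exactly one test fold, so every sample is used for testing once and for training k - 1 times.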
Definition: A knowledge base is a store of complex structured and unstructured information used by a computer system. In AI, a knowledge base is used to optimise information retrieval and to supply the facts and rules that inference engines reason over.
Definition: Knowledge engineering is the field of artificial intelligence dedicated to incorporating human knowledge into systems that can simulate human cognitive processes when solving complex problems.
Definition: A knowledge graph is a knowledge base that uses a graph-structured data model, in which entities are nodes and the relationships between them are edges. Search engines such as Google often use knowledge graphs to enhance search results with semantically structured knowledge.
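One common way to picture a graph-structured knowledge base is as a list of (subject, predicate, object) triples. The sketch below assumes that representation; the helper name `query` and the sample triples are hypothetical and do not reflect how any particular search engine stores its data.

```python
# Each triple is one edge: subject --predicate--> object.
triples = [
    ("Ada Lovelace", "born_in", "London"),
    ("Ada Lovelace", "occupation", "Mathematician"),
    ("London", "capital_of", "United Kingdom"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return every triple matching the given fields (None = wildcard)."""
    return [
        t for t in triples
        if (subject is None or t[0] == subject)
        and (predicate is None or t[1] == predicate)
        and (obj is None or t[2] == obj)
    ]

birthplace = query(triples, subject="Ada Lovelace", predicate="born_in")
about_london = query(triples, obj="London") + query(triples, subject="London")
```

Following edges from one result to the next (for example from a person to a city to a country) is what lets a system answer questions that no single record states directly.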
Definition: Knowledge representation concerns the ways in which knowledge can be represented in artificial intelligence applications so that a system can reason with it. It underpins artificial intelligence planning and high-level cognition, including problem-solving.
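Definition: Kullback-Leibler (KL) divergence is a measure of how one probability distribution diverges from a second, expected probability distribution. It is non-negative, equals zero only when the two distributions are identical, and is not symmetric, so it is not a true distance metric. In machine learning, it is used as a loss term when optimising classification and other tasks.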
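For discrete distributions, the Kullback-Leibler divergence of P from Q is the sum over outcomes of p * log(p / q). A minimal sketch follows, assuming both inputs are lists of probabilities over the same outcomes and that q is non-zero wherever p is non-zero; the function name `kl_divergence` and the sample distributions are hypothetical.

```python
import math

def kl_divergence(p, q):
    """KL(P || Q) in nats for two discrete distributions.
    Terms with p_i == 0 contribute nothing by convention.
    Assumes q_i > 0 wherever p_i > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.5]
q = [0.9, 0.1]

p_from_q = kl_divergence(p, q)
q_from_p = kl_divergence(q, p)
same = kl_divergence(p, p)
```

Swapping the arguments changes the result, which is why the measure is called a divergence and not a distance.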
Definition: Kurtosis is a statistical measure of how heavily the tails of a distribution differ from the tails of a normal distribution. A normal distribution has a kurtosis of 3, so results are often reported as excess kurtosis (kurtosis minus 3). In data analysis, kurtosis is used to describe the shape of a probability distribution and can help in identifying outliers.
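A minimal sketch of computing excess kurtosis from a sample, assuming the population (biased) moment formula: the fourth central moment divided by the squared variance, minus 3. The function name `excess_kurtosis` and the sample values are hypothetical.

```python
def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3, where m2 and m4 are
    the second and fourth central moments. Assumes at least two
    distinct values, so that m2 is non-zero."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0

flat = excess_kurtosis([1, 2, 3, 4, 5])
with_outlier = excess_kurtosis([1, 2, 3, 4, 5, 50])
```

The evenly spread sample gives a negative value (lighter tails than a normal distribution), while adding a single extreme value pushes the result positive.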
Definition: KEGG (Kyoto Encyclopedia of Genes and Genomes) is a collection of databases dealing with genomes, biological pathways, diseases, drugs, and chemical substances. In bioinformatics and systems biology, KEGG is used for linking genomic information with higher-order functional information.

