Glossary

Our glossary is harmonised with the ISO Information technology — Vocabulary (ISO 2023); Information technology — Cloud computing — Taxonomy based data handling for cloud services (ISO 2020) and Information technology — Cloud computing — Interoperability and portability (ISO 2017).

Data science terms

Data and information science terms, definitions
Term Description
conceptualisation an abstract, simplified view of some selected part of the world, containing the objects, concepts, and other entities that are presumed of interest for some particular purpose and the relationships between them
data reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing. Note 1 to entry: Data can be processed by humans or by automatic means. [SOURCE:ISO/IEC 2382:2015, 2121272]
database collection of data organized according to a conceptual structure describing the characteristics of these data and the relationships among their corresponding entities, supporting one or more application areas. [SOURCE:ISO/IEC 2382:2015, 2121413]
data set or dataset identifiable collection of data available for access or download in one or more formats [SOURCE:Adapted from ISO 19115-2:2009, 4.7] Beware: various conceptual and information models use different dataset definitions.
datatype defined set of data objects of a specified data structure and a set of permissible operations, such that these data objects act as operands in the execution of any one of these operations
big data extensive datasets, primarily in the data characteristics of volume, variety, velocity, and/or variability, that require a scalable technology for efficient storage, manipulation, management, and analysis. Note: The term big data is commonly used in many different ways, for example as the name of the scalable technology used to handle extensive datasets.
data variability changes in transmission rate, format or structure, semantics, or quality of datasets
data variety range of formats, logical models, timescales, and semantics of a dataset. Note: The related term data veracity refers to descriptive data and self-inquiry about objects to support real-time decision-making.
data velocity rate of flow at which data is created, transmitted, stored, analysed or visualised
data volatility characteristic of data pertaining to the rate of change of these data over time [SOURCE:ISO/IEC 2382:2015, 17.06.06]
register an official list or record of names or items; it aims to be a complete list of the objects in a specific group of objects or population, for example, all copyright-protected musical works in a country, or all legal person enterprises in another country.
data flow definition a structure which describes, categorises and constrains the allowable content of a data set that providers will supply for different reference periods. [SDMX 3.0]
datacube A statistical data set created in a multi-dimensional space (e.g., time, geography, gender), or hyper-cube, indexed by those dimensions. The term cube should not be taken literally: it is not meant to imply that there are exactly three dimensions.
data science extraction of actionable knowledge from data through a process of discovery, or hypothesis and hypothesis testing
cluster <distributed data processing> set of functional units under common control [SOURCE:ISO/IEC 2382:2015, 2120586]
scatter distribution of processing across multiple nodes in a cluster
file named set of records treated as a unit [SOURCE:ISO/IEC 2382:2015, 04.07.10]
knowledge base or K-base database that contains inference rules and information about human experience and expertise in a domain. Note 1: In self-improving systems, the knowledge base additionally contains information resulting from the solution of previously encountered problems. The terms knowledge base and K-base are standardized by ISO/IEC [ISO/IEC 2382-1:1993].
knowledge representation process or result of encoding and storing knowledge in a knowledge base. Term and definition standardized by ISO/IEC [ISO/IEC 2382-28:1995].
knowledge graph a knowledge representation that uses a graph-structured data model to represent and operate on data.
knowledge source source of information from which a knowledge base has been created for a specific kind of problem. Term and definition standardized by ISO/IEC [ISO/IEC 2382-28:1995].
knowledge engineering tool functional tool designed to facilitate the rapid development of knowledge-based systems. Note 1: A knowledge engineering tool incorporates specific strategies for knowledge representation, inference, and control, as well as elementary modeling constructs for easy handling of typical problems. Term and definition standardized by ISO/IEC [ISO/IEC 2382-28:1995].
conceptual model representation of the characteristics of a universe of discourse by means of entities and entity relationships (ISO/IEC 2382-17:1999). In this document, we use the term conceptual model for models that can be used by both humans and computers, and the term information model for models used in IT systems.
interoperability Ability of two or more systems or applications to exchange information and to mutually use the information that has been exchanged. [SOURCE:ISO/IEC 19941:2017]
data portability Ability to easily transfer data from one system to another without being required to re-enter data.
metadata data that define and describe other data [ISO/IEC 11179-1:2023]; we use the more functional definition “a statement about a potentially informative object.” Note: ISO/IEC 2382:2015 defines metadata as data about data or data elements, possibly including their data descriptions, and data about data ownership, access paths, access rights and data volatility [SOURCE:ISO/IEC 2382:2015, 17.06.05].
NERD Named-entity recognition and disambiguation
persistent identifier A persistent identifier (also called a permanent identifier or handle) is one that never changes, so that bookmarks and links do not break when a website, a database, or an API service is updated.
CIDOC-CRM The conceptual model of CIDOC, the standard conceptualisation of collection management systems in heritage organisations.
RiC Records in Contexts, a new conceptual model that replaces the four most important international archiving standards.
Wikibase A software system that helps the collaborative management of knowledge in a central repository. It was originally developed for the management of Wikidata, but it is now also available for creating private or public-private partnership knowledge graphs. It is developed by Wikimedia Deutschland.
relational model data model whose structure is based on a set of relations [SOURCE:ISO/IEC 2382:2015, 17.04.04]
non-relational model logical data model that does not follow a relational model for the storage and manipulation of data
structured data data which are organized based on a pre-defined (applicable) set of rules. Note: The predefined set of rules governing the basis on which the data is structured needs to be clearly stated and made known.
partially structured data data that has some organization. Note 1: Partially structured data is often referred to as semi-structured data by industry. Note 2: Examples of partially structured data are records with free text fields in addition to more structured fields. Such data is frequently represented in computer-interpretable or parsable formats such as XML or JSON.
horizontal scaling providing a single logical unit through the connection of multiple hardware and software entities. Note: An example of horizontal scaling is increasing the performance of distributed data processing by adding nodes to the cluster for additional resources.
vertical scaling act of increasing the performance of data processing through improvements to processors, memory, storage, or connectivity.
algorithm finite ordered set of well-defined rules for the solution of a problem. Definition standardized by ISO/IEC 2382-1:1993.
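
The datacube entry above can be made concrete with a small sketch. It is only an illustration under stated assumptions: the dimension names, the observation values, and the helper functions `slice_cube` and `aggregate` are hypothetical and do not come from SDMX or any other standard. Each cell of the cube is indexed by one coordinate per dimension, and the number of dimensions is not limited to three.

```python
from collections import defaultdict

# Hypothetical dimensions and observations, made up for illustration.
DIMENSIONS = ("year", "country", "gender")

observations = {
    (2022, "NL", "F"): 10,
    (2022, "NL", "M"): 12,
    (2022, "SK", "F"): 7,
    (2023, "NL", "F"): 11,
}


def slice_cube(cube, **fixed):
    """Return the cells whose coordinates match every fixed dimension."""
    positions = {DIMENSIONS.index(name): value for name, value in fixed.items()}
    return {
        key: value
        for key, value in cube.items()
        if all(key[i] == v for i, v in positions.items())
    }


def aggregate(cube, dimension):
    """Sum the cell values for each member of the given dimension."""
    position = DIMENSIONS.index(dimension)
    totals = defaultdict(int)
    for key, value in cube.items():
        totals[key[position]] += value
    return dict(totals)
```

Fixing some dimensions (`slice_cube(observations, year=2022, country="NL")`) returns a smaller cube, while `aggregate(observations, "year")` collapses the other dimensions into totals per year.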

Naming

Named entity terms, definitions
Term Description
party natural person or legal person, whether or not incorporated, or a group of either (ISNI 3.1; ISO 2012, p. 15)
registrant party that requests an ISNI from the Registration Authority (ISNI 3.2; ISO 2012, p. 15)
identity of a party identity of a party or a fictional character that is or was presented to the public (ISNI 3.2; ISO 2012, p. 15)
name (3.2, (ISO 2012, p15))
name word or phrase used for identification (Wikidata Q82799)
common name name generally used for a taxon, group of taxa or organism(s) (Wikidata Q502895)
family name part of a naming scheme for individuals, used in many cultures worldwide (Wikidata Q101352)
given name name typically used to differentiate people from the same family, clan, or other social group who have a common last name (Wikidata Q202444)
company name
geographical name Toponym. Name for a geographical entity or location. (Wikidata Q7884789)
namespace collection of identifiers with a unique meaning within the namespace (Wikidata Q873636)
thesaurus controlled vocabulary expanded with relations of broader, narrower and related terms, serving subject indexing and vocabulary control (Wikidata Q17152639)
authority file
register
name ambiguity
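
The namespace and name ambiguity terms above can be illustrated together. The following is a minimal sketch: the namespaces, identifiers, and entities are invented for illustration, and `resolve` and `ids_for_name` are hypothetical helpers. An identifier has a unique meaning only inside its namespace, while a name may point to several identifiers.

```python
# Hypothetical namespaces; all identifiers and entities are made up.
namespaces = {
    "people": {
        "0001": {"name": "Anna Kovács", "role": "composer"},
        "0002": {"name": "Anna Kovács", "role": "translator"},
    },
    "places": {
        "0001": {"name": "Bratislava", "role": "city"},
    },
}


def resolve(qualified_id):
    """Resolve a 'namespace:identifier' string to the entity it identifies."""
    namespace, _, local_id = qualified_id.partition(":")
    return namespaces[namespace][local_id]


def ids_for_name(namespace, name):
    """Return every identifier in a namespace whose entity carries the name."""
    return sorted(
        local_id
        for local_id, entity in namespaces[namespace].items()
        if entity["name"] == name
    )
```

The local identifier `0001` means different things in `people` and `places`, so it has to be qualified by its namespace. The name Anna Kovács is ambiguous within `people`, and only the identifiers tell the two parties apart.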

VIAF definitions

Based on the VIAF website and the OCLC glossary.

Term Description
VIAF The Virtual International Authority File (VIAF) combines multiple name authority files into a single OCLC-hosted name authority service. The goal of the service is to lower the cost and increase the utility of library authority files by matching and linking widely-used authority files and making that information available on the Web. VIAF does not create data but only processes data submitted by VIAF participants.
OCLC A nonprofit global library cooperative providing shared technology services, original research, and community programs for its membership and the library community at large. Originally “Ohio College Library Center,” later “Online Computer Library Center, Inc.” or “OCLC, Inc.”.
authority record A collection of information about a name (personal, corporate, family, or meeting), preferred title, or subject term (topical, geographic, genre, etc.).
bibliographic record A description of the physical or virtual format and intellectual content of a single resource (a book, video, map, etc.) encoded in a standardized format such as MARC.
MARC (Machine-Readable Cataloging) A family of international standards for the representation and communication of bibliographic, authority, holdings, classification, and related information in machine-readable form, based upon the Format for Information Exchange, ISO 2709. MARC standards define the three elements of record structure, content designation, and data content. MARC 21, originally developed by the Library of Congress in the 1960s, is the most widely used of the MARC standards. UNIMARC (Universal MARC Format), developed by the International Federation of Library Associations and Institutions (IFLA) in the 1970s, is the second most widely used MARC standard.
authority control Verifies an access point in a bibliographic record against an internal or external authority file such as the Library of Congress Authority File and, if a matching authority record exists, links the access point to the corresponding authority record. If the authority record is updated, the controlled (linked) access point in bibliographic records is updated automatically.
access point A name, term, code, etc., representing a specific entity that is indexed.
authorized access point An access point, representing an entity, formulated according to a specified standard.
surname A name used as a family name that may precede or follow a given name, depending on the culture.
multipart surname Surname that includes prefixes, hyphenated names, or names that begin with articles or prepositions.
given name A name chosen for a person at birth that identifies and differentiates that person from others in the same family. Depending on the culture a person is born into, the given name can precede or follow a surname (i.e. family name). A given name may also be known as a forename, first name, or personal name.
corporate name The name of an agency, association, business, firm, government, institution, nonprofit enterprise, performing group, etc. used as an authorized access point in a bibliographic record.
title A word, phrase, character, or group of characters, normally appearing on a resource, that names the manifestation or the work contained in it.
subtitle A word, character(s), or phrase that appears in conjunction with, and is subordinate to, a title proper of a manifestation. Also known as other title information.
preferred title A title forming the authorized access point that identifies a resource, especially if it has appeared under varying titles. Preferred titles generally serve one of two purposes: collocating versions of the resource including complete works, works in a particular literary or musical form (sonatas, songs) and distinguishing between different resources with the same or similar titles. Uniform title is the term used by AACR2, and Preferred title is the term used by RDA.
RDA Resource Description and Access. An international standard for creating library and cultural heritage resource metadata that are well-formed according to international models for user-focused linked data applications. RDA was created by the RDA Steering Committee (RSC) to replace the Anglo-American Cataloguing Rules, 2nd Edition Revised (AACR2), which were first published in 1978. RDA continues to be developed in a collaborative process led by the RSC in line with a set of objectives and principles informed by the Statement of International Cataloguing Principles.
metadata Literally, data about data. It is descriptive information about a particular data set, object, or resource, including how it is formatted, and when and by whom it was collected. Originally metadata most commonly referred to digital resources, but now can refer to any physical or electronic resource. It may be created automatically using software or entered by hand.
identifier A term, number, or name used to refer to a library resource, library metadata description, or an entry within an ontology.
subject The topic treated, or matter discussed, in a resource. What a resource is about. Subject schemes (for example, Library of Congress Subject Headings [LCSH]) use a controlled vocabulary to categorize library materials about the same subject.
subject scheme Subjects categorize library material and provide controlled access to the content of resources. Schemes define concepts and relationships between concepts to support user navigation. Subject schemes, such as Library of Congress Subject Headings (LCSH), use a controlled vocabulary; that is, they use the same terms to categorize the library material about the same subject. For example, a resource about atomic structure and another resource about neutrons can have the same subject entry, Nuclear physics.
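
The authority control entry in the table above describes a linking mechanism that a short sketch may clarify. It assumes a toy in-memory authority file. The record identifiers, the headings, and the functions `control_access_point` and `display` are hypothetical and do not reflect how VIAF or any library system is implemented. The bibliographic record stores a link to the authority record rather than a copy of the name, so an update to the authority record is visible in every linked record.

```python
# Hypothetical authority file: authority record id -> authorized access point.
authority_file = {"auth-1": "Kovács, Anna"}

# Bibliographic records store the link (record id), not the name string.
bibliographic_records = [
    {"title": "First work", "creator_id": "auth-1"},
    {"title": "Second work", "creator_id": "auth-1"},
]


def control_access_point(name, authorities):
    """Return the id of the matching authority record, or None if none exists."""
    for record_id, heading in authorities.items():
        if heading == name:
            return record_id
    return None


def display(record, authorities):
    """Render a record with the current form of its linked access point."""
    return f"{authorities[record['creator_id']]}: {record['title']}"
```

If the heading of `auth-1` is later changed in `authority_file`, `display` returns the new form for both bibliographic records without touching them.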

statement: a simple element of knowledge with a true or false value; an atomic statement is a declarative sentence that attributes one property or relationship to an object or event.
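
An atomic statement as defined above can be sketched as a small immutable record. This is an illustration only: the field names `subject`, `predicate`, `value`, and `is_true` are assumptions chosen for readability, not terms taken from a standard.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One atomic statement: a single property or relationship of one object."""

    subject: str
    predicate: str
    value: str
    is_true: bool = True


# Example statements, one true and one false.
statements = [
    Statement("Bratislava", "is capital of", "Slovakia"),
    Statement("Bratislava", "is capital of", "Austria", is_true=False),
]


def true_statements(items):
    """Keep only the statements whose truth value is true."""
    return [s for s in items if s.is_true]
```

Each record attributes exactly one relationship to one object and carries its own truth value, which is what makes the statement atomic.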

semantic triple, or RDF triple or simply triple,