Appendix B — Variables in Music Databases

We use two information models:

When repeatedly querying a music API, such as the Spotify API, we carry out an automated survey in which the respondent is not a human but a machine. Sending the same query to a well-designed API yields an answer that is comparable to the one we would have received yesterday or will receive tomorrow.

There are still many similarities with survey harmonisation: you would usually like to combine the data from the API with other data sources, in which case you still have to harmonise the concepts, the labelling, the translations, and the coding of your responses (which are processed into variables in a dataset).

The previous Appendix introduces the key concepts and practices of survey harmonisation. When working with APIs, you do not need to harmonise question texts in human languages, because you harmonise them in a machine-readable query language (for example, in SQL or SPARQL). The rest of the data harmonisation workflow is the same.
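As a minimal sketch of a "question" written in a machine-readable query language, the Python below builds a SPARQL query and the request URL that would carry it. It assumes the public Wikidata endpoint and the Wikidata property P36 (capital); a private Wikibase exposes its own endpoint and property numbers. The function names are illustrative, and nothing is sent over the network.

```python
from urllib.parse import urlencode

# Public Wikidata SPARQL endpoint (assumption: a private Wikibase has its own URL).
ENDPOINT = "https://query.wikidata.org/sparql"


def build_capital_query(qid: str) -> str:
    """Return a SPARQL query asking for the capital (P36) of a Wikidata item."""
    return (
        "SELECT ?capital ?capitalLabel WHERE {\n"
        f"  wd:{qid} wdt:P36 ?capital .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def build_request_url(query: str) -> str:
    """Encode the query as a GET request URL that asks for a JSON answer."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


query = build_capital_query("Q214")  # Q214 is Slovakia on Wikidata
url = build_request_url(query)
print(query)
print(url)
```

Because the query text is generated by code, the same question can be repeated word for word at any later date, which is what makes the answers comparable.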

B.1 String versus item

  • Slovakia (Q79) is a well-defined node in our Wikibase graph.
  • Slovakia as a string is not well-defined; it can only be understood if we add a reference to the natural language of the string, as in "Slovakia"@en.

Whenever possible, we want to refer to well-defined nodes in the knowledge graph. For example, our entry Slovakia (Q79) states that it is equivalent to Slovakia (Q214) on Wikidata, and Wikidata connects plenty of metadata to this concept: the geographical boundaries, the fact that it has been an independent state since 1993, its predecessors, its capital, etc.
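The difference between a language-tagged string and an item can be sketched in a few lines of Python. The class names LangString and Item are hypothetical and do not belong to any Wikibase client library; the Slovak label "Slovensko" is added only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class LangString:
    """A plain string that is only interpretable together with its language tag."""
    text: str
    lang: str

    def __str__(self) -> str:
        return f'"{self.text}"@{self.lang}'


@dataclass(frozen=True)
class Item:
    """A well-defined node: the QID is the identity, labels are only for display."""
    qid: str
    labels: Tuple[LangString, ...]


slovakia = Item("Q79", (LangString("Slovakia", "en"), LangString("Slovensko", "sk")))

# Index every language-tagged label back to the one node it names.
label_index: Dict[LangString, str] = {label: slovakia.qid for label in slovakia.labels}

for label, qid in label_index.items():
    print(f"{label} -> {qid}")
```

Both labels resolve to the same node, so statements attached to Q79 do not depend on which language a data source happened to use.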

Our aim is to have a rich and standardised description for each variable and, as much as possible, for every constant (or attribute). Katarína Kubošiová is a Slovak singer-songwriter, also known as Katarzia. To avoid any ambiguity with other people potentially called Katarína Kubošiová or Katarzia, we would like to refer to her with a globally unique identifier. Her ISNI identifier is isni: 0000000467220673, which identifies her unambiguously worldwide.
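An identifier is only useful if it was typed correctly. The last character of an ISNI is a check character computed with the ISO 7064 MOD 11-2 scheme, so a transcription error can usually be detected before the identifier is stored. The following Python sketch validates the ISNI quoted above; the function names are illustrative.

```python
def isni_check_character(first_15_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for ch in first_15_digits:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_isni(isni: str) -> bool:
    """Validate a 16-character ISNI, ignoring spaces and an optional 'isni:' prefix."""
    compact = isni.lower().replace("isni:", "").replace(" ", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return isni_check_character(compact[:15]) == compact[15]


print(is_valid_isni("isni: 0000000467220673"))  # the identifier from the text
print(is_valid_isni("0000000467220674"))        # last character altered on purpose
```

A passing check only shows that the identifier is well-formed; it does not prove that it belongs to the intended person.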

Metadata enrichment makes it possible to turn data points into nodes. For example, if we conceptualise Slovakia as a node, then we can connect sound recordings to this node (regardless of whether they have a Slovak or an English-language title) if they were registered with the SK prefix of the Slovak national ISRC registrant. We can connect Katarína Kubošiová, Katarzia, SK, and Slovakia in a graph to the concept of Slovakia with more or less clarity; in this case, for example, by defining that a sound recording was registered in Slovakia, or that the artist known as Katarzia was born in Slovakia or sings in the Slovak language.
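The connection between a recording and the Slovakia node can be sketched as follows, assuming the usual 12-character ISRC layout (two-letter prefix, three-character registrant code, two-digit year, five-digit designation). The ISRC values and the predicate name registered_in are hypothetical placeholders, not entries from our Wikibase.

```python
import re
from typing import Optional, Tuple

ISRC_PATTERN = re.compile(r"^([A-Z]{2})([A-Z0-9]{3})(\d{2})(\d{5})$")

# Prefixes mapped to nodes; Q79 is Slovakia in the Wikibase used in this appendix.
PREFIX_TO_NODE = {"SK": "Q79"}


def isrc_prefix(isrc: str) -> str:
    """Return the two-letter prefix of an ISRC, accepting optional hyphens."""
    compact = isrc.replace("-", "").upper()
    match = ISRC_PATTERN.match(compact)
    if match is None:
        raise ValueError(f"not a well-formed ISRC: {isrc!r}")
    return match.group(1)


def registration_triple(isrc: str) -> Optional[Tuple[str, str, str]]:
    """Build a (subject, predicate, object) triple, or None for an unmapped prefix."""
    node = PREFIX_TO_NODE.get(isrc_prefix(isrc))
    if node is None:
        return None
    return (isrc.replace("-", "").upper(), "registered_in", node)


print(registration_triple("SK-A01-23-00001"))  # hypothetical ISRC with the SK prefix
print(registration_triple("ZZ-A01-23-00001"))  # hypothetical ISRC with an unmapped prefix
```

The title of the recording plays no role here: the link to the node is derived from the identifier alone.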

B.1.1 1. Access Wikibase

Log in to Wikibase with your account.

B.1.2 2. Create a New Item

Go to Special Pages

Scroll down and select: Create a New Item

Fill the form with the item’s data:

  • Language - Choose the language (en)

  • Label - Give a short name for the node, for example, Katarína Kubošiová

  • Description - Enter the item description, for example Singer-songwriter born in the Slovak Republic

  • Aliases - you can add Katarzia or any other known names here.

Click Create.

The item is now created on Wikibase. Every concept that you want to use in your research should be documented in this way. It is also advisable to define an item for key persons, names, and musical works.
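The same form can, in principle, be filled in programmatically. The sketch below only assembles the parameters for the Wikibase wbeditentity API action with the values used in the form above; it does not send them. It assumes that your Wikibase instance exposes the standard MediaWiki api.php endpoint and that you add a CSRF token from a logged-in session before posting. The function name is illustrative.

```python
import json
from typing import Dict, List


def new_item_request(label: str, description: str, aliases: List[str],
                     lang: str = "en") -> Dict[str, str]:
    """Assemble POST parameters that mirror the Create a New Item form."""
    data = {
        "labels": {lang: {"language": lang, "value": label}},
        "descriptions": {lang: {"language": lang, "value": description}},
        "aliases": {lang: [{"language": lang, "value": a} for a in aliases]},
    }
    return {
        "action": "wbeditentity",
        "new": "item",
        "data": json.dumps(data, ensure_ascii=False),
        "format": "json",
        # A "token" parameter (CSRF token) must be added before the request is sent.
    }


params = new_item_request(
    label="Katarína Kubošiová",
    description="Singer-songwriter born in the Slovak Republic",
    aliases=["Katarzia"],
)
print(params["data"])
```

Keeping the diacritics intact (ensure_ascii=False) means the stored label matches the spelling used in the form.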

Note

Note: The system assigns a unique ID to every entry. For example, in our system, the ID of Ján Levoslav Bella (Slovak conductor, composer and educator), also known under the alias Jan Levoslav Bella, which omits the Slovak special characters, is Q93. Referring to Q93 removes any doubt that Ján Levoslav Bella and Jan Levoslav Bella are the same person.

B.1.3 3. Add Metadata Statements

You need to add further metadata statements to the question bank item. Metadata are statements about the data. We add standard, basic statements in subject, predicate, and object (triple) format to each question bank item.

B.1.3.1 Variable Representation

DDI has standard variable representation definitions. When the answers to a questionnaire are recorded in a raw dataset, or data is systematically queried from an API, each response is translated into a variable. We need to define how we want to represent those answers in the resulting output dataset. (See the DDI 3.3 (2020) documentation: Variable Value Representation and Question Response Domain.)

Using statements you can define the representation of the variables. You can choose from the following categories:

B.1.3.2 Define the source study

Tip

For further details, please check the disco:Study class.

Use the study (DDI) property (P270) to add a statement that links the study in which you found the concept definition. If the definition came from a formal ontology or from Wikibase, use different properties (see below).

An example of a study: Eurobarometer 88.1 (2017) (Q139)

Note

Note: If the study is not yet in Wikibase, you can create an entry for it using the Create a New Item function.
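A study statement is a triple whose subject is your concept item, whose predicate is P270, and whose object is the study item. As a rough sketch, the parameters for the Wikibase wbcreateclaim API action could be assembled as below. The concept QID Q999 is a hypothetical placeholder, the function name is illustrative, and a CSRF token from a logged-in session would have to be added before sending.

```python
import json
import re
from typing import Dict


def study_claim_request(concept_qid: str, study_qid: str,
                        study_property: str = "P270") -> Dict[str, str]:
    """Assemble parameters that link a concept item to its source study."""
    match = re.fullmatch(r"Q(\d+)", study_qid)
    if match is None:
        raise ValueError(f"not a QID: {study_qid!r}")
    value = {"entity-type": "item", "numeric-id": int(match.group(1))}
    return {
        "action": "wbcreateclaim",
        "entity": concept_qid,          # subject: the concept item
        "property": study_property,     # predicate: study (DDI)
        "snaktype": "value",
        "value": json.dumps(value),     # object: the study item
        "format": "json",
    }


# Q999 is a hypothetical concept item; Q139 is Eurobarometer 88.1 (2017).
claim = study_claim_request("Q999", "Q139")
print(claim)
```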

B.1.4 Add national language translations to your concept

On Wikibase you can add different language versions to the same question.

To do so, go to Special Pages

Scroll down and select: Set Item/Property Description

Fill the form:

  • ID - The QID of the question (for example, if you want to add a Dutch description to Ján Levoslav Bella, i.e., Slovak conductor, composer and educator, you must reference Q93).

  • Language code - the code of the language in which you want to enter the description, in this case, nl.

  • Description - Write a short definition (up to 250 characters) in the new language.

Select “Set Description”.

The entry is now updated with another language label or description.
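The form above corresponds to the Wikibase wbsetdescription API action. The following sketch assembles its parameters and enforces the 250-character limit mentioned above before anything is sent. The Dutch wording is our own illustrative translation, the function name is hypothetical, and a CSRF token from a logged-in session would still have to be added.

```python
from typing import Dict

MAX_DESCRIPTION_LENGTH = 250  # limit stated in the form instructions above


def set_description_request(qid: str, language: str, description: str) -> Dict[str, str]:
    """Assemble parameters that mirror the Set Item/Property Description form."""
    if len(description) > MAX_DESCRIPTION_LENGTH:
        raise ValueError(
            f"description has {len(description)} characters, "
            f"the limit is {MAX_DESCRIPTION_LENGTH}"
        )
    return {
        "action": "wbsetdescription",
        "id": qid,
        "language": language,
        "value": description,
        "format": "json",
    }


# Illustrative Dutch rendering of "Slovak conductor, composer and educator".
request = set_description_request("Q93", "nl", "Slowaaks dirigent, componist en pedagoog")
print(request)
```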