Tutorial on documenting your data using tables#

This is a step-by-step guide to help you document your resource with the SSbD ontology using table-based workflows (for example, Excel). You are free to use whichever editor you like, as long as the resulting tables can be converted into CSV files and the columns are named according to the specifications below.

What is a resource and why document it?#

A resource is the thing you want to describe and share in a structured way. In practice, a resource can be a dataset, a software tool, a workflow, a model, or another documented digital object. A resource can be either a class (a concept) or an individual (a specific instance of a concept), so you can document a type of experiment or simulation (a class) as well as a specific experiment or simulation (an individual). Documenting your resource means capturing its key metadata in a clear and reusable format, and the goal of this guide is to help you do that. This matters because it lets others easily discover, understand, and reuse existing datasets, software tools and other resources. It also lets you connect your resource to related resources, which can enable new insights and applications.

Quick introduction to classes and individuals#

When documenting data, we need to describe both classes (concepts of things) and individuals (actual instances of things). For example, there is a difference between the concept of a pen and one specific pen. If I ask my child to bring me a pen, I do not care which pen she brings, but she understands the concept and brings an actual individual pen.

Figure 1. Cartoon illustrating a request for a pen (concept) and the return of an individual pen. Image created partly with ChatGPT (OpenAI), 2026.

Similarly, we can ask for a specific type of experiment or a type of dataset. However, this requires that the concepts behind these types are well documented, so that both machines and humans can help us find the right resources. Figure 2 below shows examples of resources, both classes (concepts) and individuals, related to running an experiment or instrument.

Figure 2. Cartoon illustrating some concepts and individuals related to an experiment. Image created partly with ChatGPT (OpenAI), 2026.

The various concepts and individuals are connected by properties, which describe the relationships between them. For example, we can document that a specific dataset (an individual) is an instance of a specific dataset type (a class). This dataset is also generated by a specific experiment (another individual), which is an instance of a specific experiment type (a class). The relationship between the classes (concepts) is maintained at the individual level. This means that the operator can ask many different questions that connect the various resources, such as “find me all datasets that were generated by experiments of type X”, or “I want a dataset of type Y, how should I proceed?”. Even better, we can connect various concepts and, for instance, create workflows that connect datasets and simulations. Figure 3 shows this more generally.
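
The first question above can be sketched in plain Python. This is only an illustration: the triples and all names in them (the `ex:` prefix, `ex:wasGeneratedBy`, the experiment and dataset identifiers) are hypothetical and not taken from the SSbD ontology, and a real knowledge graph would normally be queried with SPARQL rather than with set comprehensions.

```python
# Hypothetical triples (subject, predicate, object); all names are illustrative.
TRIPLES = {
    ("ex:experiment1", "rdf:type", "ex:ExperimentTypeX"),
    ("ex:experiment2", "rdf:type", "ex:ExperimentTypeZ"),
    ("ex:dataset1", "rdf:type", "ex:DatasetTypeY"),
    ("ex:dataset1", "ex:wasGeneratedBy", "ex:experiment1"),
    ("ex:dataset2", "rdf:type", "ex:DatasetTypeY"),
    ("ex:dataset2", "ex:wasGeneratedBy", "ex:experiment2"),
}


def datasets_generated_by(experiment_type, triples=TRIPLES):
    """Return the datasets generated by any experiment of the given type."""
    # Step 1: find the individuals that are instances of the experiment type.
    experiments = {
        s for s, p, o in triples if p == "rdf:type" and o == experiment_type
    }
    # Step 2: find the datasets linked to one of those experiments.
    return sorted(
        s for s, p, o in triples if p == "ex:wasGeneratedBy" and o in experiments
    )


print(datasets_generated_by("ex:ExperimentTypeX"))
```

The query works only because each individual is typed with its class, which is the link between the concept level and the individual level described above.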

Figure 3. Concepts of datasets and simulations can be connected in a workflow because the inputs and outputs of the various simulations are documented. Note that relations between classes (the concepts) must be implemented as restrictions, so each hasInput and hasOutput should include a qualifier, for example hasInput some.
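
To make the note on restrictions concrete, the sketch below builds the Turtle text for a "hasInput some" relation between two classes, using the standard OWL existential restriction pattern. The helper name `some_values_restriction` and the `ex:` names are hypothetical, and the snippet assumes that the `rdfs:`, `owl:` and `ex:` prefixes are declared elsewhere in the Turtle file.

```python
def some_values_restriction(cls, prop, filler):
    """Return a Turtle snippet stating: cls subClassOf (prop some filler)."""
    return (
        f"{cls} rdfs:subClassOf [\n"
        f"    a owl:Restriction ;\n"
        f"    owl:onProperty {prop} ;\n"
        f"    owl:someValuesFrom {filler}\n"
        f"] .\n"
    )


# Hypothetical class and property names, for illustration only.
print(some_values_restriction("ex:Simulation", "ex:hasInput", "ex:InputDataset"))
```

Read as a sentence, the generated statement says that every individual of the simulation class has at least one input that is an individual of the input dataset class.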

Documenting your resources in practice#

It is important to realise that we often want to document both existing datasets and concepts (such as activities) that generate datasets, as well as types of datasets that do not yet exist. Here we present examples of how to document dataset types (as concepts), computation software (as individuals), indicators (as concepts), and computations (as concepts) in tables, where each row documents a class or an individual. The tables can then be used to generate a knowledge graph (expressed with RDF) that can be queried and integrated with other resources.

Figure 4 below shows some example tables and how they are related.

Figure 4. Examples of tables for documenting resources and how they are related.

How to construct the tables#

  • Each row in the table documents a resource.

  • The column labels in the header row are mapped to properties in an ontology (and therefore belong to a controlled vocabulary).

  • There is a column @id, which is the unique identifier for the resource. For classes, this is an IRI (Internationalized Resource Identifier) that uniquely identifies the concept in an ontology. For individuals, this is an IRI that identifies the specific instance. (NB: in the Google spreadsheets this column is called identifier, but it is the same as @id. Please use @id in your own spreadsheets.)

  • There is one (or more) @type column, which indicates whether the resource is a class or an individual, and also what kind of class an individual is a member of (for example, a dataset type, a software tool, etc.). For classes, only one @type column is allowed, with the value owl:Class. Individuals can have more than one type.

  • For classes, there is a column subClassOf, which indicates the parent class that this class is a subclass of. Several subClassOf columns are permitted. For individuals, there is no subClassOf column because individuals are not subclasses of anything.

  • For both classes and individuals, there are columns for the properties that we want to document (for example, label, description, hasInput, hasOutput, etc.). Which properties to document depends on the type of resource and the use case we have in mind. The Object Properties, Annotation Properties and Data Properties in the SSbD Core Ontology are a good starting point for deciding which properties to document for each resource. They can be found in the Reference Documentation. (In the Google spreadsheets, the columns are already filled with the chosen properties.)

  • The values in the table should be filled according to the definitions of the properties in the SSbD Core Ontology.

    • All annotation properties should be filled with a literal value (for example, a string or a number).

    • All object properties should be filled with an IRI that identifies another resource (for example, a dataset type, a software tool, etc.). Here it is important to make sure that the resource you refer to is within the range of the property.
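
Some of the rules above can be checked automatically before a table is converted. The following is a minimal sketch using only the Python standard library. It assumes the table has been exported to CSV, and the example rows (with the `ex:` prefix) and the function name `check_table` are hypothetical. It does not check property ranges, which would require the ontology itself.

```python
import csv
import io

# Hypothetical example table; the column names follow the rules listed above.
EXAMPLE_CSV = """\
@id,@type,subClassOf,label
ex:ToxicityDataset,owl:Class,ex:DatasetType,Toxicity dataset
ex:mytool-v1,ex:Software,,MyTool v1
,owl:Class,ex:DatasetType,Row without identifier
"""


def check_table(text):
    """Return a list of human-readable problems found in a CSV table."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    if "@id" not in header:
        return ["missing @id column"]
    problems = []
    for line_no, row in enumerate(reader, start=2):
        # Pair each cell with its column label; repeated labels are allowed.
        cells = list(zip(header, row))
        ident = next((v for k, v in cells if k == "@id"), "")
        types = [v for k, v in cells if k == "@type" and v]
        parents = [v for k, v in cells if k == "subClassOf" and v]
        if not ident:
            problems.append(f"line {line_no}: empty @id")
        if "owl:Class" in types and len(types) > 1:
            problems.append(f"line {line_no}: a class may only have @type owl:Class")
        if parents and "owl:Class" not in types:
            problems.append(f"line {line_no}: subClassOf used on a non-class row")
    return problems


print(check_table(EXAMPLE_CSV))
```

The sketch reads the header with `csv.reader` instead of `csv.DictReader` because several columns may share the same label (for example, two @type or subClassOf columns), and a dictionary would keep only one of them.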

Table templates#

Below are some example templates that are used for documenting SSbD-related resources within the pink project. These are only examples, and you can create your own templates based on the properties that are relevant for your use case. The important thing is that the columns are named according to the specifications above and that the values are filled according to the definitions of the properties in the SSbD Core Ontology.

[!NOTE] The prefixes (the term before the colon) are set to ‘pink’ here because these tables are examples from the pink project. Typically, each provider, even within a project, has their own prefix (which is short for their own namespace).
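
As a rough illustration of what a prefix stands for, the sketch below expands a prefixed name into a full IRI. The namespace IRI given for `pink` is a placeholder, since the real namespace is not stated here, and the function name `expand` is hypothetical. The `owl` and `prov` namespaces are the standard W3C ones.

```python
# Prefix map. The 'pink' namespace IRI is a placeholder, not the real one.
PREFIXES = {
    "pink": "https://example.org/pink#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "prov": "http://www.w3.org/ns/prov#",
}


def expand(value, prefixes=PREFIXES):
    """Expand 'prefix:name' to a full IRI; return other values unchanged."""
    prefix, sep, local = value.partition(":")
    # Values such as 'https://...' are already full IRIs and are left alone.
    if sep and prefix in prefixes and not local.startswith("//"):
        return prefixes[prefix] + local
    return value


print(expand("pink:ToxicityDataset"))
print(expand("https://orcid.org/0000-0000-0000-0001"))
```

Because each provider maps their prefix to their own namespace, two providers can use the same local name without their identifiers colliding.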

1. Dataset type table (class-level documentation)#

| @id | label | description | subClassOf | theme | @type |
| --- | --- | --- | --- | --- | --- |
| pink:ToxicityDataset | Toxicity dataset | Dataset type for toxicity endpoints. | pink:DatasetType | pink:SafetyAndSustainability | owl:Class |
| pink:ExposureDataset | Exposure dataset | Dataset type for exposure-related data. | pink:DatasetType | pink:SafetyAndSustainability | owl:Class |
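
To give a feel for how such a table becomes a graph, the sketch below writes the dataset type table above as CSV and turns every non-empty cell into a (subject, column label, value) triple. This is a simplified illustration with a hypothetical function name, not the actual conversion tooling: a real converter would also map the column labels to ontology properties and expand the prefixes to full IRIs.

```python
import csv
import io

# The dataset type table above, written as CSV.
DATASET_TYPES_CSV = """\
@id,label,description,subClassOf,theme,@type
pink:ToxicityDataset,Toxicity dataset,Dataset type for toxicity endpoints.,pink:DatasetType,pink:SafetyAndSustainability,owl:Class
pink:ExposureDataset,Exposure dataset,Dataset type for exposure-related data.,pink:DatasetType,pink:SafetyAndSustainability,owl:Class
"""


def table_to_triples(text):
    """Turn each non-empty cell into a (subject, column label, value) triple."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    id_col = header.index("@id")
    triples = []
    for row in reader:
        subject = row[id_col]  # the @id cell names the resource for this row
        for col, (label, value) in enumerate(zip(header, row)):
            if col != id_col and value:
                triples.append((subject, label, value))
    return triples


for triple in table_to_triples(DATASET_TYPES_CSV):
    print(triple)
```

Each row contributes one triple per filled column, which is why a consistent header row matters: the column label decides which property the value is attached to.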

2. Computation type table (class-level documentation)#

| @id | title | hasInput | hasOutput | subClassOf | @type |
| --- | --- | --- | --- | --- | --- |
| pink:activity_qsar_prediction | QSAR prediction | pink:ToxicityDataset | pink:ToxicityDataset | pink:Computation | owl:Class |
| pink:activity_screening | Activity Screening | pink:ExposureDataset | pink:ToxicityDataset | pink:Computation | owl:Class |
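
Because inputs and outputs are documented, questions such as “I want a dataset of type Y, how should I proceed?” can be answered from the table. The sketch below uses the two rows of the computation type table above. The function names are hypothetical, and the matching is by exact identifier only, whereas a real implementation would also take subclass relations into account.

```python
# Rows from the computation type table above: (@id, hasInput, hasOutput).
COMPUTATIONS = [
    ("pink:activity_qsar_prediction", "pink:ToxicityDataset", "pink:ToxicityDataset"),
    ("pink:activity_screening", "pink:ExposureDataset", "pink:ToxicityDataset"),
]


def producers_of(dataset_type, computations=COMPUTATIONS):
    """Return (computation, required input) pairs that output the dataset type."""
    return [(cid, inp) for cid, inp, out in computations if out == dataset_type]


def can_follow(first, second, computations=COMPUTATIONS):
    """True if the output type of `first` matches the input type of `second`."""
    outputs = {cid: out for cid, _, out in computations}
    inputs = {cid: inp for cid, inp, _ in computations}
    return outputs[first] == inputs[second]


print(producers_of("pink:ToxicityDataset"))
print(can_follow("pink:activity_screening", "pink:activity_qsar_prediction"))
```

Here both computations can produce a toxicity dataset, and the screening step can feed the QSAR prediction step because its output type matches the required input type.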

3. Software table (individual documentation)#

| @id | title | description | tierLevel | implementsModel | hasAPI | accessRights |
| --- | --- | --- | --- | --- | --- | --- |
| pink:mytool-v1 | MyTool v1 | In-house software for endpoint prediction. | pink:Tier3 | pink:QSARModel | https://example.org/api/mytool | rights:PUBLIC |
| pink:mytool-v2 | MyTool v2 | Updated release with improved descriptors. | pink:Tier3 | pink:QSARModel | https://example.org/api/mytool/v2 | rights:RESTRICTED |

4. Agent table (individual documentation)#

| @id | name | @type |
| --- | --- | --- |
| https://orcid.org/0000-0000-0000-0001 | Example Researcher | prov:Agent |
| https://example.org/org/acme-lab | Acme Lab | prov:Agent |

5. Dataset table (individual documentation)#

| @id | @type | dcterms:title | dcterms:description | dcterms:publisher |
| --- | --- | --- | --- | --- |
| https://example.org/dataset/tox-001 | pink:Dataset | My toxicity dataset | Measurements from in vitro assay campaign. | https://orcid.org/0000-0000-0000-0001 |
| https://example.org/dataset/exposure-001 | pink:Dataset | My exposure dataset | Exposure observations collected in 2025. | https://example.org/org/acme-lab |
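
The dataset table refers to rows in the agent table through dcterms:publisher. A simple cross-table check, sketched below with a hypothetical function name and the example values from the two tables above, can catch publishers that have not been documented as agents.

```python
# Agent table above: @id -> name.
AGENTS = {
    "https://orcid.org/0000-0000-0000-0001": "Example Researcher",
    "https://example.org/org/acme-lab": "Acme Lab",
}

# Dataset table above, reduced to the columns needed for the check.
DATASETS = [
    {
        "@id": "https://example.org/dataset/tox-001",
        "dcterms:publisher": "https://orcid.org/0000-0000-0000-0001",
    },
    {
        "@id": "https://example.org/dataset/exposure-001",
        "dcterms:publisher": "https://example.org/org/acme-lab",
    },
]


def unresolved_publishers(datasets=DATASETS, agents=AGENTS):
    """Return the dataset IRIs whose publisher is missing from the agent table."""
    return [d["@id"] for d in datasets if d["dcterms:publisher"] not in agents]


print(unresolved_publishers())
```

An empty result means that every publisher in the dataset table points to a documented agent.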