Using VTL for Validation

Important

A seamless integration of pysdmx and vtlengine will modify this tutorial. The current version is a placeholder for the upcoming changes showing the use of both libraries separated. For the latest updates on VTL usage, please check issue #158.

In this tutorial, we shall examine the utilization of pysdmx for reading data and metadata to generate and operate on datapoints using vtlengine.

Numerous types of operations can be performed; however, this tutorial will focus exclusively on the fundamental ones.

Required Metadata

For the present scenario, the required metadata is contingent upon the desired operations. For reference please check sdmx to vtl documentation

Step-by-Step Solution

pysdmx facilitates the reading of data and metadata from an SDMX file or service. For the purpose of this tutorial, we shall employ the XML files structures.xml (data structure) and data.csv (data).

Reading the Data

The initial step involves reading the data structure and data from the SDMX files. The following code snippet demonstrates the process:

from pathlib import Path

# Path to the structures file in SDMX-ML 2.1 (same directory as this script)
path_to_structures = Path(__file__).parent / "structures.xml"

# Path to the data file
path_to_data = Path(__file__).parent / "data.csv"

# Get Structures SDMX Message
structures_msg = read_sdmx(path_to_structures)

# Get Data message
data_msg = read_sdmx(path_to_data)

Extracting the Data and Data Structure

After reading the data and metadata, the next step is to extract the data and data structure from the SDMX messages. The following code snippet demonstrates the process using the Short URN SDMX_TYPE=AGENCY_ID:ID(VERSION)

# Extract the data structure and data for DS_1
data_structure_1 = structures_msg.get_data_structure_definition("DataStructure=MD:DS_1(1.0)")
data_1 = data_msg.get_data("DataStructure=MD:DS_1(1.0)")

# Extract the data structure and data for DS_2
data_structure_2 = structures_msg.get_data_structure_definition("DataStructure=BIS:DS_2(1.0)")
data_2 = data_msg.get_data("DataStructure=BIS:DS_2(1.0)")

To construct the datapoint, the metadata must be converted to the VTL format using the to_vtl_json upcoming DataStructureDefinition method:

from pysdmx.model.dataflow import Component, DataStructureDefinition, Role
from pysdmx.model.__utils import VTL_DTYPES_MAPPING, VTL_ROLE_MAPPING

def to_vtl_json(
    dsd: DataStructureDefinition, path: Optional[str] = None
) -> Optional[Dict[str, Any]]:
    """Formats the DataStructureDefinition as a VTL DataStructure."""

    dataset_name = dsd.id
    components = []
    NAME = "name"
    ROLE = "role"
    TYPE = "type"
    NULLABLE = "nullable"

    _components: List[Component] = []
    _components.extend(dsd.components.dimensions)
    _components.extend(dsd.components.measures)
    _components.extend(dsd.components.attributes)

    for c in _components:
        _type = VTL_DTYPES_MAPPING[c.dtype]
        _nullability = c.role != Role.DIMENSION
        _role = VTL_ROLE_MAPPING[c.role]

        component = {
            NAME: c.id,
            ROLE: _role,
            TYPE: _type,
            NULLABLE: _nullability,
        }

        components.append(component)

    result = {
        "datasets": [{"name": dataset_name, "DataStructure": components}]
    }
    if path is not None:
        with open(path, "w") as fp:
            json.dump(result, fp)
        return None

    return result

vtl_data_structure_1 = to_vtl_json(data_structure_1)
vtl_data_structure_2 = to_vtl_json(data_structure_2)

Preparing the Dictionary

To create the datapoint, a dictionary containing the required data and structures must first be prepared. The arguments data_structures and datapoints support the following types:

  • Dict[str, Any]

  • Path

  • List[Union[Dict[str, Any], Path]]

The example below uses dictionaries for simplicity:

vtl_data_structures = {
    "DS_1": vtl_data_structure_1,
    "DS_2": vtl_data_structure_2,
}

datapoints = {
    "DS_1": data_1,
    "DS_2": data_2,
}

Defining the Expression and Execution

Next, define the expression to be executed and utilize the run method of vtlengine to perform the operation. The following example demonstrates the addition of the datapoints DS_1 and DS_2, with the result assigned to a new datapoint DS_r:

For reference please check vtlengine run documentation

import vtlengine

expression = "DS_r <- DS_1 + DS_2;"

run_result = run(
    script=expression,
    data_structures=vtl_data_structures,
    datapoints=datapoints,
    return_only_persistent=True,
)