.. _vtl-handling:

Validate data using VTL
=======================

In this tutorial, we use ``pysdmx`` to read **data** and **metadata** and
generate a dataset and a VTL script, and the ``vtlengine`` library to
execute that script.

.. note::

    This tutorial assumes that you have a basic understanding of SDMX and
    VTL concepts. If you are new to these topics, please refer to the
    `VTL documentation `_ and `SDMX-VTL documentation `_

.. important::

    To use the VTL functionalities, you need to install the ``pysdmx[vtl]``
    extra.

    This tutorial also requires the ``pysdmx[data]`` extra to handle SDMX
    datasets as Pandas DataFrames, and the ``pysdmx[xml]`` extra to read and
    write SDMX-ML messages.

    Check the :ref:`installation guide ` for more information.

Many kinds of operations can be performed; this tutorial focuses only on the
fundamental ones.

.. contents::
    :local:
    :depth: 2

Step-by-Step Solution
---------------------

Using ``pysdmx``, we will read the datasets, their structures and the VTL
objects.

This tutorial uses the XML files ``structures.xml`` (data structure),
``data.xml`` (data) and ``vtl_ts.xml`` (Transformation and VTLMapping).

Files used in the example can be found here:

- :download:`data.xml <../_static/data.xml>`
- :download:`structures.xml <../_static/structures.xml>`
- :download:`vtl_ts.xml <../_static/vtl_ts.xml>`

Reading Data and Structures messages
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The first step is to read the data structure and the data from the SDMX
files. The following code snippet builds the paths to both files:

.. code-block:: python

    from pathlib import Path

    # Path to the structures file (same directory as this script)
    path_to_structures = Path(__file__).parent / "structures.xml"
    # Path to the data file (same directory as this script)
    path_to_data = Path(__file__).parent / "data.xml"

Now that we have the paths to the files, we can read the data together with
its structure and extract the datasets:

.. code-block:: python

    from pysdmx.io import get_datasets

    # With the data and metadata paths, we extract the datasets
    # with their related structures
    datasets = get_datasets(path_to_data, path_to_structures)

.. important::

    Check the :ref:`Get Datasets method docs ` for more information on how
    to generate a PandasDataset with both data and related structures.

    This method is the recommended way to read SDMX data and structures, as
    it combines them in a single Pandas Dataset, allowing you to work with
    the data and its structure seamlessly.

Getting the Transformation Scheme and VTL Mapping
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

For the next step, we have three options: read the transformation scheme and
VTL mapping from a file, read them from a Fusion Registry URL, or create the
``pysdmx`` model objects ourselves. The following snippet reads them from a
file:

.. code-block:: python

    from pathlib import Path

    from pysdmx.io import read_sdmx

    # Path to the transformation file
    path_to_vtl_ts = Path(__file__).parent / "vtl_ts.xml"
    # Read the transformation file with read_sdmx
    message = read_sdmx(path_to_vtl_ts)
    # Get the first Transformation Scheme
    ts = message.get_transformation_schemes()[0]
    # Get the first VTL Mapping Scheme
    mapping_scheme = message.get_vtl_mapping_schemes()[0]
    # Get the VTL Dataflow Mapping from the items,
    # assuming the first item is the one we want
    dataflow_mapping = mapping_scheme.items[0]

Alternatively, we can create the Transformation Scheme and VTL Mapping
objects directly in code.

.. code-block:: python

    from pysdmx.model import (
        DataflowRef,
        Transformation,
        TransformationScheme,
        VtlDataflowMapping,
        VtlMappingScheme,
    )

    # Mapping using a VtlDataflowMapping object:
    dataflow_mapping = VtlDataflowMapping(
        dataflow=DataflowRef(agency="MD", id="TEST_DF", version="1.0"),
        dataflow_alias="DS_1",
        id="VTL_MAP_1",
        name="VTL Mapping 1",
    )
    mapping_scheme = VtlMappingScheme(
        id="VTL_MAP_SCHEME_1",
        name="VTL Mapping Scheme 1",
        version="1.0",
        agency="MD",
        items=[dataflow_mapping],
    )

    # Transformation Scheme object
    ts = TransformationScheme(
        id="TS1",
        version="1.0",
        agency="MD",
        vtl_version="2.1",
        name="Transformation Scheme 1",
        items=[
            Transformation(
                id="T1",
                uri=None,
                urn=None,
                name="Transformation 1",
                description=None,
                expression="DS_1 [calc Me_4 := OBS_VALUE]",
                is_persistent=True,
                result="DS_r",
                annotations=(),
            ),
        ],
        vtl_mapping_scheme=mapping_scheme,
    )

You may also download the structures directly from the FMR or an SDMX API:

- :ref:`FMR tutorial `
- :ref:`SDMX-REST tutorial `

At this point you may use the :ref:`VTL Toolkit Model validations ` to
validate the Transformation Scheme.

Running the VTL Script
^^^^^^^^^^^^^^^^^^^^^^

.. _run_sdmx:

Now that we have the VTL script, we can run it using the
`vtlengine.run_sdmx method `_.

.. code-block:: python

    from vtlengine import run_sdmx

    # Run the VTL script with the datasets and the dataflow mapping
    run_sdmx(script=ts, datasets=datasets, mappings=dataflow_mapping)

The ``run_sdmx`` method executes the Transformation Scheme (VTL script)
using the provided datasets and dataflow mapping.

Summary
-------

In this tutorial, we have learned how to read SDMX data and metadata using
``pysdmx``, extract the Pandas Datasets, and run a VTL script using the
``vtlengine.run_sdmx`` method.

Useful additional links:

- `VTL Engine Docs `_.
- `10 Minutes to VTL Engine `_.
- `VTL Documentation `_
- `SDMX-VTL documentation `_
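
As a rough illustration of the transformation used in this tutorial, the
following pure-Python sketch shows what ``DS_1 [calc Me_4 := OBS_VALUE]`` does
conceptually: every data point keeps its existing components and gains a new
component ``Me_4`` holding a copy of ``OBS_VALUE``. The sample rows, the
``FREQ`` and ``TIME_PERIOD`` component names and the ``calc_copy`` helper are
hypothetical and exist only for this illustration; the actual evaluation is
performed by ``vtlengine``, not by this code.

.. code-block:: python

    # Hypothetical sample rows standing in for the DS_1 dataset.
    ds_1 = [
        {"FREQ": "A", "TIME_PERIOD": "2020", "OBS_VALUE": 1.5},
        {"FREQ": "A", "TIME_PERIOD": "2021", "OBS_VALUE": 2.5},
    ]


    def calc_copy(rows, target, source):
        """Return new rows where ``target`` holds a copy of ``source``."""
        # The input rows are left untouched; each output row is a new dict.
        return [{**row, target: row[source]} for row in rows]


    # Conceptual counterpart of the expression DS_1 [calc Me_4 := OBS_VALUE],
    # whose result is named DS_r in the Transformation.
    ds_r = calc_copy(ds_1, target="Me_4", source="OBS_VALUE")

    for row in ds_r:
        print(row)

The helper returns new rows instead of modifying ``ds_1`` in place, which
mirrors the fact that the transformation produces a separate result
(``DS_r``) and leaves the input dataset unchanged.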