SDMX Data IO operations

This tutorial provides an overview of how to read and write SDMX data using the pysdmx.io package. It covers the general reader and writer with a focus on data messages, and how to combine data with metadata for further processing.

Important

To use the pysdmx.io data functionalities, you need to install the pysdmx[data] extra.

For SDMX-ML support, you also need to install the pysdmx[xml] extra.

Check the installation guide for more information.

Reading

The general reader lets you read SDMX data and metadata from various formats (see IO Formats supported).

It is recommended to use the general reader for all use cases, as it automatically detects the format of the file and uses the appropriate reader.

Using Read SDMX

A typical example of reading data or metadata from a file, a string, or a URL with read_sdmx:

from pysdmx.io import read_sdmx
from pathlib import Path

# Read file from the same folder as this code
file_path = Path(__file__).parent / "sample.xml"

# Read from file
message = read_sdmx(file_path)

# Read from URL
message = read_sdmx("https://example.com/sample.xml")

By default, the pysdmx.io.read_sdmx() function will automatically detect the format of the file and use the appropriate reader.
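
To give an intuition for what content-based detection involves, the sketch below guesses a broad format family from the first meaningful character of a message. It uses only the standard library and is not how pysdmx implements detection: the function name guess_family and its three labels are hypothetical, and a real reader also has to tell apart versions and message types.

```python
def guess_family(text: str) -> str:
    """Guess the broad format family of a message from its first character.

    Hypothetical helper for illustration only; not part of pysdmx.
    """
    # Skip a byte order mark and any leading whitespace
    head = text.lstrip("\ufeff \t\r\n")
    if head.startswith("<"):
        return "xml"
    if head.startswith("{"):
        return "json"
    # Anything else is treated as CSV in this simplified sketch
    return "csv"


print(guess_family('<?xml version="1.0"?><message/>'))      # xml
print(guess_family('{"meta": {}}'))                         # json
print(guess_family("DATAFLOW,FREQ,TIME_PERIOD,OBS_VALUE"))  # csv
```

In practice you do not need such a helper: read_sdmx performs the detection for you.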

The output is always a Message object, which holds the datasets in its data attribute.

# Access the Pandas Datasets on a Data Message
data = message.data

The data is returned as a list of PandasDataset objects.

Using Get Datasets

To combine the Datasets with the related metadata, you can use the get_datasets function:

from pysdmx.io import get_datasets

# data_path and metadata_path are placeholders, e.g. "data.xml" and "structures.xml"
datasets = get_datasets(data_path, metadata_path)

The get_datasets function returns a list of PandasDataset objects, each containing the data (as a pandas DataFrame) and the related metadata as a pysdmx.model.dataflow.Schema object.

The combination of data and structures is essential for:

Data validation: Ensuring that the data conforms to the expected structure.
Data serialization in Time Series: Converting the data into a format suitable for time series analysis (see writer tutorial).
VTL validations: Running a VTL Transformation Scheme over the data (see VTL tutorial).

The combined datasets are needed when validating the data against its structure or converting it to other formats (see writer tutorial). VTL validation likewise requires the data to be combined with the structures.
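
As a rough illustration of why the structure matters for validation, the sketch below compares the columns of a dataset with the component IDs of a structure. It uses only the standard library; the helper check_columns and the sample IDs are made up for this example, and pysdmx's own validation may work differently. It is also a simplification: in a real structure some attributes are optional, so a missing column is not always an error.

```python
from collections.abc import Iterable


def check_columns(
    columns: Iterable[str], component_ids: Iterable[str]
) -> dict[str, list[str]]:
    """Compare data columns with the component IDs of a structure.

    Hypothetical helper for illustration only; not part of pysdmx.
    """
    column_set, component_set = set(columns), set(component_ids)
    return {
        # Components declared in the structure but absent from the data
        "missing": sorted(component_set - column_set),
        # Columns present in the data but unknown to the structure
        "unexpected": sorted(column_set - component_set),
    }


# Assumed component IDs and data columns, invented for this example
components = ["FREQ", "REF_AREA", "TIME_PERIOD", "OBS_VALUE"]
columns = ["FREQ", "TIME_PERIOD", "OBS_VALUE", "COMMENT"]

print(check_columns(columns, components))
# {'missing': ['REF_AREA'], 'unexpected': ['COMMENT']}
```

Without the structure there is nothing to compare the columns against, which is why the data has to be combined with its metadata first.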

Writing

The general writer lets you write SDMX data to various formats (see IO Formats supported).

We recommend using pysdmx.io.write_sdmx() for all use cases, although specific writers are also available for the supported formats.

Important

To write SDMX-ML Generic or Series messages, the PandasDataset must have its structure defined as a Schema object.

A typical example of writing data from a PandasDataset to a file with write_sdmx:

from pysdmx.io import write_sdmx
from pysdmx.io.format import Format
from pysdmx.io.pd import PandasDataset

# Replace with actual structure and data
dataset = PandasDataset(structure=..., data=...)

write_sdmx(
    dataset,
    output_path="output.csv",
    sdmx_format=Format.DATA_SDMX_CSV_2_0_0,
)

Additional arguments are available for SDMX-ML to:

Pretty print the XML output (using the prettyprint argument).
Use a custom Header (using the header argument).
Specify the dimension at observation level (using the dimension_at_observation argument). This is needed for Time Series data formats.

A typical example to write data in Time Series with a custom header (pretty printed):

Note

SDMX-ML Generic or Series messages require dataset.structure to be defined as a Schema. The example below shows a simple way to create a Schema object from a DataStructureDefinition. The DataStructureDefinition can be extracted from an SDMX Structures message, retrieved from the FMR, or created manually. See the Structures IO tutorial for more information.

from datetime import datetime

from pysdmx.io import write_sdmx
from pysdmx.io.format import Format
from pysdmx.io.pd import PandasDataset
from pysdmx.model import Organisation, DataStructureDefinition
from pysdmx.model.message import Header

dsd = DataStructureDefinition(id=...,
                              name=...,
                              components=...)

dataset = PandasDataset(data=..., structure=dsd.to_schema())

header = Header(
    id="TEST_MESSAGE",
    test=True,
    prepared=datetime.now(),
    sender=Organisation(id="MD", name="MeaningfulData"),
)

write_sdmx(
    dataset,
    output_path="output.xml",
    sdmx_format=Format.DATA_SDMX_ML_3_0,
    prettyprint=True,
    header=header,
    dimension_at_observation={"Dataflow=MD:TEST_DF(1.0)": "TIME_PERIOD"},
)
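
The key of dimension_at_observation in the example above follows the pattern Dataflow=AGENCY:ID(VERSION). When the agency, ID and version are held in separate variables, a small helper can build that string. The function short_urn below is hypothetical and not part of pysdmx; it only reproduces the pattern seen in the example.

```python
def short_urn(
    agency: str, artefact_id: str, version: str, kind: str = "Dataflow"
) -> str:
    """Build a key such as 'Dataflow=MD:TEST_DF(1.0)'.

    Hypothetical helper for illustration only; not part of pysdmx.
    """
    return f"{kind}={agency}:{artefact_id}({version})"


# Same mapping as in the write_sdmx call above, built from its parts
dimension_at_observation = {short_urn("MD", "TEST_DF", "1.0"): "TIME_PERIOD"}

print(dimension_at_observation)
# {'Dataflow=MD:TEST_DF(1.0)': 'TIME_PERIOD'}
```

The resulting dictionary can be passed unchanged as the dimension_at_observation argument.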

Convert between formats

To convert SDMX Data messages between formats, you can combine the get_datasets and write_sdmx functions:

from pysdmx.io import get_datasets, write_sdmx
from pysdmx.io.format import Format

# Read the data and structures SDMX-ML messages (any supported format can be used)
datasets = get_datasets("data.xml", "structures.xml")

# Write the data to SDMX-CSV 2.0
write_sdmx(
    sdmx_objects=datasets,
    sdmx_format=Format.DATA_SDMX_CSV_2_0_0,
    output_path="output.csv",
)