Configure your processes
The previous tutorials illustrated how pysdmx can facilitate specific steps
(such as validation and mapping) in our statistical processes in a fully
metadata-driven fashion. However, configuring these steps themselves, that is,
tweaking them to suit our needs, remains a crucial aspect. This tutorial
addresses the question of how we can achieve this using pysdmx and SDMX
reference metadata.
For example, the tutorial about validation shows how SDMX metadata can be used to validate input data. But what should we do in case some data are reported as invalid? Should we quarantine the entire input? Or should we quarantine only the subset of invalid data and proceed to the next step with the subset of valid data?
The purpose of this tutorial is to show how we can provide such configuration
details, using pysdmx and SDMX reference metadata.
Required metadata
For a scenario where we need to configure steps in our statistical processes, the following metadata in our SDMX Registry is essential:
Metadataflows: A set of metadata attributes about “something” and related artifacts (concept schemes, codelists, and metadata structure) are required.
Metadatasets: A metadata report provides values for the metadata attributes relevant to the target to which the report refers (e.g., a dataflow).
Dataflows: A set of data about a statistical domain and related artifacts (concept schemes, codelists, and data structure) are required.
For additional information about the various types of SDMX artifacts, please refer to the SDMX documentation.
Step-by-step solution
Defining the metadata
In a scenario where we receive a data submission for validation, mapping, and integration, each step can be configured differently. Configuration options may depend on the ingested data or business unit practices. For instance, considering validation:
The data received might include only what has changed compared to the previous submission. Alternatively, it could be a complete dataset, requiring different validation approaches.
In case of validation problems, businesses may choose to quarantine only the invalid data, proceeding to the next step with the subset of valid data. Others may prefer quarantining the entire submission.
These configuration options can be captured using SDMX reference metadata. To do this:
Create a Metadata Structure Definition with the configuration options using concepts and coded concepts.
Define the type(s) of attachment targets (e.g., a dataflow, a provision agreement).
Define a metadataflow (e.g., with the ID DCO for Dataflow Configuration Options) for which metadata reports will be provided.
Provide metadata reports (metadatasets) attached to the desired targets, defining their different configuration options.
For example, for the BIS_MACRO dataflow maintained by BIS, options could include:
partial_update set to the boolean value true (indicating acceptance of only new or updated data).
on_validation_error set to code F (Fail), signifying that the entire submission must be quarantined in case of validation issues.
structure_map set to the URN of the structure map to be used for mapping data from the BIS_MACRO dataflow structure to its target structure.
on_mapping_error set to code I (Ignore), as only a subset of data is mapped.
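To make these options concrete, the sketch below models them as a small typed structure on the consuming side. It is only an illustration: the DataflowOptions class, the ErrorPolicy enum, the parse_options helper, and the placeholder URN are all hypothetical names introduced here, and the input is a plain dictionary rather than an actual SDMX metadata report.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class ErrorPolicy(Enum):
    """Hypothetical error codes; the text only mentions F (Fail) and I (Ignore)."""

    FAIL = "F"
    IGNORE = "I"


@dataclass(frozen=True)
class DataflowOptions:
    """Hypothetical container for the options listed above."""

    partial_update: bool
    on_validation_error: ErrorPolicy
    structure_map: Optional[str]
    on_mapping_error: ErrorPolicy


def parse_options(raw: dict) -> DataflowOptions:
    """Convert raw option values into typed options.

    Unknown error codes raise ValueError, so a misconfigured report
    is detected before any process step runs.
    """
    return DataflowOptions(
        partial_update=bool(raw["partial_update"]),
        on_validation_error=ErrorPolicy(raw["on_validation_error"]),
        structure_map=raw.get("structure_map"),
        on_mapping_error=ErrorPolicy(raw["on_mapping_error"]),
    )


# Example values mirroring the BIS_MACRO options described above.
example = parse_options(
    {
        "partial_update": True,
        "on_validation_error": "F",
        "structure_map": "urn:placeholder:structure-map",  # placeholder, not a real URN
        "on_mapping_error": "I",
    }
)
```

Parsing the options into a typed structure up front is one possible design choice; it keeps the individual process steps free of string comparisons against raw codes.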
Connecting to a Registry
pysdmx allows retrieving metadata from an SDMX Registry in either a
synchronous (via pysdmx.fmr.RegistryClient) or asynchronous fashion
(via pysdmx.fmr.AsyncRegistryClient). The choice depends on your use
case. The asynchronous client is often preferred as it is non-blocking.
To connect to your target Registry, instantiate the client by passing the
SDMX-REST endpoint. If using the FMR, the endpoint is the URL at which the
FMR is available, followed by /sdmx/v2/.
from pysdmx.fmr import AsyncRegistryClient
client = AsyncRegistryClient("[endpoint_comes_here]")
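As a small convenience, the endpoint can be derived from the FMR base URL. The sketch below assumes a hypothetical helper named fmr_endpoint and a made-up host (fmr.example.org); neither is part of pysdmx.

```python
def fmr_endpoint(base_url: str) -> str:
    """Append the SDMX-REST path to an FMR base URL.

    Trailing slashes on the base URL are removed first, so the
    result never contains a double slash before "sdmx".
    """
    return base_url.rstrip("/") + "/sdmx/v2/"


# Hypothetical host, used for illustration only.
endpoint = fmr_endpoint("https://fmr.example.org/")
```

The resulting string can then be passed to the client constructor in place of the placeholder shown above.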
Retrieving configuration details
As mentioned earlier, we aim to retrieve configuration details for the
BIS_MACRO dataflow. The metadata report with the ID DCO_BIS_MACRO
can be obtained using the get_report method:
report = await client.get_report("BIS", "DCO_BIS_MACRO", "1.0")
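A top-level await like the one above works in a notebook or an async REPL. In a regular script, the call has to run inside a coroutine, for example via asyncio.run. The sketch below shows that pattern with a stand-in client: StubClient and fetch_options are hypothetical names, and the stub returns a plain dictionary rather than a real metadata report, so the example runs without a Registry.

```python
import asyncio


class StubClient:
    """Stand-in for a registry client; NOT part of pysdmx."""

    async def get_report(self, provider, report_id, version):
        # Return a plain dict instead of a real metadata report.
        return {"provider": provider, "id": report_id, "version": version}


async def fetch_options(client):
    """Fetch the report using the same call shape as in the tutorial."""
    return await client.get_report("BIS", "DCO_BIS_MACRO", "1.0")


# asyncio.run creates an event loop, runs the coroutine, and closes the loop.
report_stub = asyncio.run(fetch_options(StubClient()))
```

With a real client, the stub would presumably be replaced by the AsyncRegistryClient instance created earlier.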
Iterate over the report to print configuration options:
for attribute in report:
    print(attribute)
In practice, instead of printing, these attributes can be used to drive
process steps. For example, a validation step can check the value of
partial_update to determine whether mandatory attributes need validation.
check_mandatory = report["partial_update"]
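The quarantine question raised at the start of this tutorial can be handled in a similar way. The following sketch is an assumption-laden illustration, not pysdmx functionality: route_after_validation is a hypothetical helper, rows are plain Python values, and any code other than F is treated as "quarantine only the invalid rows", which is a choice made here for the example.

```python
def route_after_validation(valid_rows, invalid_rows, on_validation_error):
    """Decide what proceeds and what is quarantined.

    Returns a tuple (rows_to_process, rows_to_quarantine).
    """
    if not invalid_rows:
        # Nothing failed validation, so everything proceeds.
        return list(valid_rows), []
    if on_validation_error == "F":
        # Fail: the entire submission is quarantined.
        return [], list(valid_rows) + list(invalid_rows)
    # Assumption: any other code quarantines only the invalid subset.
    return list(valid_rows), list(invalid_rows)


# With code F, a single invalid row blocks the whole submission.
to_process, to_quarantine = route_after_validation(["a", "b"], ["c"], "F")
```

Keeping the decision in one function means the rest of the pipeline only needs to know which rows to pass on and which to set aside.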
Summary
This tutorial demonstrated how to create a client to retrieve metadata from
our Registry. Using the get_report method, we retrieved configuration
options for the BIS_MACRO dataflow. This information can now be
used to customize the behavior of statistical processes.