Map Your Data
In this tutorial, we explore how pysdmx
facilitates mapping data in
a metadata-driven fashion, relying solely on the metadata stored in an SDMX
Registry.
Required Metadata
SDMX supports various mapping rules, ranging from simple mappings (e.g.,
mapping a list of non-standard country codes to ISO 3166 2-letter country
codes) to more complex ones (e.g., many-to-many and time-dependent mapping
rules). To support the definition of mapping rules, SDMX offers structure
maps, component maps, representation maps, fixed value maps,
epoch maps, and date pattern maps. pysdmx
supports all these
types except epoch maps. Examples demonstrating how these can be used to
define mapping rules are provided below.
For additional information about the various types of SDMX artifacts, please refer to the SDMX documentation.
Step-by-step Solution
pysdmx
allows retrieving metadata from an SDMX Registry in either a
synchronous manner (via pymedal.fmr.RegistryClient
) or asynchronously
(via pymedal.fmr.AsyncRegistryClient
). The choice depends on the use case
(and preference), but we tend to use the asynchronous client by default as
it is non-blocking.
Connecting to a Registry
First, we need an instance of the client to connect to our target Registry.
When instantiating the client, we pass the SDMX-REST endpoint of the Registry.
If we use the FMR,
i.e., the reference Implementation of the SDMX Registry specification, the
endpoint will be the URL at which the FMR is available, followed by
/sdmx/v2/
.
from pysdmx.fmr import AsyncRegistryClient
client = AsyncRegistryClient("[Your_endpoint_comes_here]")
Retrieving Simple Code Mappings
A common use case is mapping codes used by external parties to codes used by
internal processes. For example, a third party might use their own “standard”
to represent currencies (e.g., C$
for the Canadian dollar), while
internal data compilation processes prefer widely adopted standards, such as
ISO 4217 3-letter currency codes (e.g., CAD
). This type of mapping
(C$ => CAD
) is expressed in SDMX using a RepresentationMap
.
pysdmx
allows retrieving such mappings via the get_code_map
method
exposed by the client. For example, let’s assume the BIS maintains a map of
the IMF 3-digit country codes to the ISO 3166 2-letter country codes, and
that the ID of this map is IMF_2_ISO
:
rm = client.get_code_map("BIS", "IMF_2_ISO")
Now we have a list of code maps that can be iterated over. Each mapping includes the source and target codes and may include business validity details describing when the association is valid (via its valid_from and valid_to properties).
for m in rm:
print(m)
# Output example:
# ValueMap(source='146', target='CH', valid_from=None, valid_to=None)
Applying Structure Maps
While the example above is quite simple, SDMX also allows more complex mappings, such as mapping a source dataflow to a target dataflow.
Let’s assume the following mapping definition:
Source |
Target |
Comp Type |
Mapping Type |
---|---|---|---|
FREQ |
Dimension |
Fixed Value |
|
CONTRACT |
CONTRACT |
Dimension |
Representation Map |
OPT_TYP |
OPTION_TYPE |
Dimension |
Implicit |
ACTIVITY_DATE |
TIME_PERIOD |
Dimension |
Date Pattern |
TO |
TO |
Measure |
Implicit |
OI |
OI |
Measure |
Implicit |
CONF_STATUS |
Attribute |
Fixed Value |
In a nutshell, we need to:
Copy the values for
OPTION_TYPE
,TO
, andOI
(Implicit).Map codes for
CONTRACT
(Representation Map).Parse dates and then reformat them for
ACTIVITY_DATE
(Date Pattern).Set fixed values for
FREQ
andCONF_STATUS
(Fixed Value).
Let’s retrieve the structure map, using the get_mapping
function exposed
by the client, assuming its ID is SRC_2_TGT
and it is maintained by the
BIS:
mapping = client.get_mapping("BIS", "SRC_2_TGT")
The object returns the different mapping rules that need to be applied, organized by mapping types. Each mapping type is available via its property.
Copying Values
Some values merely need to be copied from the source to the target. This type
of mapping can be retrieved via the implicit_maps
property. As we know,
this is the case for OPTION_TYPE
, TO
, and OI
.
for m in mapping.implicit_maps:
print(m)
# Output example:
# ImplicitComponentMap(source='OPT_TYP', target='OPTION_TYPE')
As seen, the operation to be applied is fairly simple:
The value must be copied from the source to the target.
The target component might need to be renamed (this is the case for
OPT_TYP
).
Setting Fixed Values
Another fairly simple case is setting fixed values. This is the case for
FREQ
and CONF_STATUS
, where we need to set the value to M
(Monthly) and C
(Confidential) respectively.
for m in mapping.fixed_value_maps:
print(m)
# Output example:
# FixedValueMap(target='FREQ', value='M')
Mapping Codes
Then we have one component (CONTRACT
) for which the values in the source
need to be mapped to another value in the target, using a mapping table. Such
mappings can be retrieved via the component_maps
property:
for m in mapping.component_maps:
print(m)
# Output example:
# ComponentMap(
# source='CONTRACT',
# target='CONTRACT',
# values=[
# ValueMap(source='PROD TYPE', target='_T', valid_from=None, valid_to=None),
# ValueMap(source=re.compile('^([A-Z0-9]+)$'), target='\\1', valid_from=None, valid_to=None)
# ]
# )
As can be seen, this mapping is quite interesting:
Whenever we find the value
PROD TYPE
in the source, we need to map it to_T
in the target. This is easy.But the next one is a … regular expression with a capture group. Basically, it says that whatever is between the beginning and the end of the cell should be copied over to the target, i.e. an implicit mapping…
From the above, we can learn two things:
Mappings don’t necessarily map simple values to another; sometimes, it can be a complex regular expression, with one or more capture groups.
Mappings need to be executed in order. In the example above, if the regular expression was executed first,
PROD TYPE
would never be matched to_T
.
At this stage, it is also worth noting that SDMX allows mapping N components
in the source to N components in the target (for example, 2 components in the
source to 6 components in the target). This type of mapping is available via
the multiple_component_maps
property. The objects returned are similar to
the objects returned by component_maps
except that the source and target
properties, as well as their values, allow a list of values instead of a
single one.
Reformatting Dates
We still need to map one component, i.e. ACTIVITY_DATE
. For this, we need
to use a date pattern. Such mapping types are available via the date_maps
property:
for m in rm.date_maps:
print(m)
# Output example:
# PatternMap(source='ACTIVITY_DATE', target='TIME_PERIOD', pattern='MM/dd/yyyy', frequency='M')
Here, we need to parse the date using the supplied pattern, MM/dd/yyyy
,
i.e., dates like 12/25/2013
. Once this is done, we need to format them to
SDMX reporting periods of monthly frequency (i.e., 2013-12
in this
example). Such operations can be achieved using Python strptime
and
strftime
provided by the datetime
module.
We have now mapped the 5 components in the source to the 7 components in the target, thereby completing the work needed for this tutorial.
Summary
In this tutorial, we have created a client to retrieve metadata from the SDMX
Global Registry, and we have used the get_mapping
and get_code_map
methods to retrieve mapping definitions.
This tutorial only scratches the surface of what SDMX mappings can do.
Nevertheless, we now have a good idea of how pysdmx
can help
write SDMX mapping code in Python.