.. _physical-model:

Create the Physical Data Model
===============================

In this tutorial, we explore how ``pysdmx`` assists in **creating the
physical data model for a dataflow** in a metadata-driven fashion, relying
solely on the metadata stored in an SDMX Registry.

Required Metadata
-----------------

For this scenario, we need the following metadata in our SDMX Registry:

Data Structure
    A data structure describes the expected structure of data, including the
    various **components** (dimensions, attributes, or measures) relevant for
    a statistical domain. It also provides component **data types** (string,
    integer, dates, etc.) and specifies whether these components are
    **mandatory**.

In short, the data structure contains all the information needed to create
our physical data model. Dataflows or provision agreements could also be used
to take additional constraints into account, but for this tutorial, we use
data structures.

For additional information about SDMX artifacts, refer to the
`SDMX documentation `_.

Step-by-step Solution
---------------------

``pysdmx`` allows retrieving metadata from an SDMX Registry either
synchronously (via ``pysdmx.api.fmr.RegistryClient``) or asynchronously (via
``pysdmx.api.fmr.AsyncRegistryClient``). The choice depends on the use case
and preference, but we use the asynchronous client by default as it is
non-blocking.

Connecting to a Registry
^^^^^^^^^^^^^^^^^^^^^^^^

First, we need an instance of the client to connect to our target Registry.
When instantiating the client, we pass the SDMX-REST endpoint of our Registry.
If using the `FMR `_, the reference implementation of the SDMX Registry
specification, the endpoint is the URL at which the FMR is available,
followed by ``/sdmx/v2/``.

For this tutorial, we create the physical data model for the ``CPI`` data
structure maintained by Eurostat (``ESTAT``), as published on the
`SDMX Global Registry `_.

.. code-block:: python

    from pysdmx.api.fmr import AsyncRegistryClient

    client = AsyncRegistryClient("https://registry.sdmx.org/sdmx/v2/")

Retrieving the Schema Information
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

An SDMX-REST ``schema`` query retrieves what is allowed within the context of
a data structure, dataflow, or provision agreement. The SDMX Registry uses all
available information (e.g., constraints) to return a "schema" describing
"data validity rules" for the selected context. We use this to retrieve the
metadata needed to create our physical data model.

As mentioned, we create the physical data model for the ``CPI`` data structure
maintained by Eurostat (``ESTAT``).

.. code-block:: python

    schema = await client.get_schema("datastructure", "ESTAT", "CPI", "1.0")

Creating the Physical Data Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

How a physical data model is created depends on the selected technology
(e.g., a SQL database, an AVRO schema, or a document database like MongoDB).
At a minimum, each field (or column) needs an **identifier**, an expected
**data type**, and an indication of whether the field **can be null**. All
this information is available in the ``Schema`` object returned by the
``get_schema`` method:

.. code-block:: python

    for component in schema.components:
        print(f"{component.id} ({component.dtype}). Required: {component.required}")

    # Example output:
    # FREQ (String). Required: True
    # SEASONAL_ADJUST (String). Required: True
    # REF_AREA (String). Required: True
    # ...

Mapping SDMX Data Types
^^^^^^^^^^^^^^^^^^^^^^^

Mapping SDMX data types (e.g., ``ObservationalTimePeriod``) to the types of
the selected technology is beyond the scope of this tutorial but is easily
achieved using a mapping table.

Fine-tuning the Physical Data Model
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The core information for creating the physical data model is covered.
Additional information is available for each component for fine-tuning.
For example, SDMX allows defining **facets** to provide additional
constraints beyond the data type. This information is available via the
``facets`` property.

.. code-block:: python

    print(schema.components["COMMENT_DSET"].facets)

    # Example output:
    # max_length=1050

The **role** a component plays in the data structure (dimension, attribute,
or measure) is available via the ``role`` property. Display the name or the
value, depending on the use case.

.. code-block:: python

    for component in schema.components:
        print(f"{component.id} has role: {component.role.name}")

This allows, for example, creating a composite primary key out of the
dimension values. Alternatively, get all dimensions (or measures or
attributes) directly using the appropriate property:

.. code-block:: python

    for component in schema.components.dimensions:
        print(f"{component.id}")

Last but not least, SDMX distinguishes between **coded** and **uncoded**
components. If the technology stack supports it, use the list of allowed
codes to restrict the values a component may take in the physical data model.
The list of codes is available via the ``codes`` property:

.. code-block:: python

    frequencies = [c.id for c in schema.components["FREQ"].codes]
    print(frequencies)

    # Example output:
    # ['A', 'S', 'Q', 'M', 'W', 'D', 'H', 'B', 'N']

Summary
-------

In this tutorial, we created a client to retrieve metadata from the Registry
and used its ``get_schema`` method to retrieve the structure details for the
``CPI`` data structure maintained by Eurostat. We saw the type of information
returned by the ``get_schema`` method and now have a good idea of how to use
it to create the physical data model in our technology of choice.
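
As a closing illustration of the mapping table mentioned in the section on
mapping SDMX data types, the sketch below translates a few SDMX data type
names into generic SQL column types and assembles a ``CREATE TABLE``
statement. The ``Field`` class, the ``SQL_TYPES`` mapping, the
``create_table_ddl`` helper, and the sample fields are all assumptions made
for illustration and are not part of ``pysdmx``. In practice, the identifier,
data type, required flag, and role would be read from ``schema.components``,
and the SQL types would be adapted to the target database.

.. code-block:: python

    from dataclasses import dataclass


    @dataclass(frozen=True)
    class Field:
        """Hypothetical stand-in for the details read from a schema component."""

        id: str
        dtype: str  # SDMX data type name, e.g. "String"
        required: bool
        is_dimension: bool = False


    # Assumed mapping table; adjust the SQL types to the target database.
    SQL_TYPES = {
        "String": "VARCHAR",
        "Integer": "INTEGER",
        "Double": "DOUBLE PRECISION",
        "Boolean": "BOOLEAN",
        "ObservationalTimePeriod": "VARCHAR(32)",
    }
    DEFAULT_SQL_TYPE = "VARCHAR"  # fallback for data types not listed above


    def create_table_ddl(table, fields):
        """Build a CREATE TABLE statement from a list of fields."""
        columns = []
        for f in fields:
            sql_type = SQL_TYPES.get(f.dtype, DEFAULT_SQL_TYPE)
            not_null = " NOT NULL" if f.required else ""
            columns.append(f"    {f.id} {sql_type}{not_null}")
        # Dimensions together form the composite primary key.
        key = [f.id for f in fields if f.is_dimension]
        if key:
            columns.append(f"    PRIMARY KEY ({', '.join(key)})")
        body = ",\n".join(columns)
        return f"CREATE TABLE {table} (\n{body}\n)"


    # Illustrative values only; they were not retrieved from a Registry.
    fields = [
        Field("FREQ", "String", True, is_dimension=True),
        Field("REF_AREA", "String", True, is_dimension=True),
        Field("TIME_PERIOD", "ObservationalTimePeriod", True, is_dimension=True),
        Field("OBS_VALUE", "Double", False),
    ]

    ddl = create_table_ddl("cpi", fields)
    print(ddl)

Keeping the mapping in a plain dictionary makes it easy to swap in the types
of another technology (for example, AVRO primitive types) without touching
the rest of the logic.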