Dataflows and data structures
Note
Additional information about how dataflows (and related structures) can be used to drive statistical processes is available in the following tutorials:
Model for SDMX dataflows and related structures (like schemas).
pysdmx is dataflow-centric, another area where pysdmx is opinionated. As such, when retrieving information about a dataflow, information typically provided via the data structure (and related structures like concept schemes and codelists) is already provided as part of the response.
- class pysdmx.model.dataflow.Component(id: str, required: bool, role: Role, dtype: DataType = DataType.STRING, facets: Facets | None = None, name: str | None = None, description: str | None = None, codes: Codelist | Hierarchy | None = None, attachment_level: str | None = None, array_def: ArrayBoundaries | None = None)
A component of a dataset (aka variable), such as the frequency.
Concepts are used to describe the relevant characteristics of a statistical domain. For example, exchange rates might be described with components such as the numerator currency, the denominator currency, the type of exchange rates, etc.
Some of these components are expected to be useful across statistical domains. Examples of such components include the frequency, the observation status, the confidentiality, etc.
When using components to describe the expected structure of a statistical domain, data stewards distinguish between the components that represent what is being captured (i.e. the measures), the components that help uniquely identify the measures (i.e. the dimensions) and the components that provide additional descriptive information about the measures (i.e. the attributes). This is the component role. The role can be D (for Dimension), A (for Attribute) or M (for Measure).
While dimensions and measures are typically mandatory, attributes may be either mandatory or optional. This is captured in the required property using a boolean value (true for mandatory components, false otherwise). This may vary with the statistical domain, i.e. a mandatory component within a particular domain may be optional in another.
While the value of some attributes is expected to potentially vary with each measurement (aka observation or data point), some others must be unique across all observations sharing the same (sub)set of dimension values. This is captured in the attachment_level property, which can be one of: D (for Dataset), O (for Observation), any string identifying a component ID (FREQ) or a comma-separated list of component IDs (FREQ,REF_AREA). The latter can be used to identify the dimension, group or series to which the attribute is attached. The attachment level of a component may vary with the statistical domain, i.e. a component attached to a series in a particular domain may be attached to, say, the dataset in another domain.
The codes field indicates the expected (i.e. allowed) set of values a component can take within a particular domain. In addition to (or instead of) a set of codes, additional details about the expected format may be found in the facets and dtype fields.
- id
A unique identifier for the component (e.g. FREQ).
- required
Whether the component must have a value.
- role
The role played by the component.
- dtype
The component’s data type (string, number, etc.).
- facets
Additional details such as the component’s minimum length.
- name
The component’s name.
- description
Additional descriptive information about the component.
- codes
The expected values for the component (e.g. currency codes).
- attachment_level
The attachment level (if role = A only). Attributes can be attached at different levels such as D (for dataset-level attributes), O (for observation-level attributes) or a combination of dimension IDs, separated by commas (for series- and group-level attributes).
- array_def
Any additional constraints for array types.
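The attachment_level values described above can be interpreted with a few lines of code. The following is a minimal sketch, assuming only the three documented forms (D, O, or one or more comma-separated component IDs); describe_attachment is a hypothetical helper and is not part of pysdmx.

```python
def describe_attachment(level: str) -> tuple[str, list[str]]:
    """Classify an attachment_level string.

    Hypothetical helper (not part of pysdmx). Returns a (kind, ids) pair,
    where ids lists the component IDs the attribute is attached to.
    """
    if level == "D":
        return ("dataset", [])
    if level == "O":
        return ("observation", [])
    # Anything else is read as one or more comma-separated component IDs,
    # identifying a dimension, group or series.
    ids = [part.strip() for part in level.split(",") if part.strip()]
    return ("components", ids)


print(describe_attachment("D"))
print(describe_attachment("FREQ,REF_AREA"))
```

Returning the component IDs as a list keeps the single-ID case (FREQ) and the multi-ID case (FREQ,REF_AREA) on the same code path.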
- class pysdmx.model.dataflow.Components(iterable)
A collection of components describing the data.
- append(item)
Add a component to the existing list of components.
- Return type:
None
- property attributes: Sequence[Component]
Return the list of attributes.
Attributes are components that provide descriptive information about some piece of data (aka an observation or data point).
- Returns:
The list of attributes
- property dimensions: Sequence[Component]
Return the list of dimensions.
Dimensions are components that contribute to the unique identification of a piece of data (aka an observation or data point). The combination of the values for all dimensions of an observation can therefore be seen as the observation’s primary key.
- Returns:
The list of dimensions
- extend(other)
Add the components to the existing list of components.
- Return type:
None
- insert(i, item)
Add a component at the requested index.
- Return type:
None
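To illustrate how the dimensions and attributes properties relate to component roles, here is a minimal sketch built on standard-library stand-ins. Role and Comp below only mirror a few of the documented fields; they are assumptions made for illustration, not the pysdmx classes, and the component IDs and values are made up.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    """Stand-in mirroring the documented role codes."""
    DIMENSION = "D"
    ATTRIBUTE = "A"
    MEASURE = "M"


@dataclass(frozen=True)
class Comp:
    """Stand-in with a subset of the documented Component fields."""
    id: str
    required: bool
    role: Role


comps = [
    Comp("FREQ", True, Role.DIMENSION),
    Comp("REF_AREA", True, Role.DIMENSION),
    Comp("OBS_VALUE", True, Role.MEASURE),
    Comp("OBS_STATUS", False, Role.ATTRIBUTE),
]

# Filtering by role is what the dimensions/attributes properties expose.
dimensions = [c for c in comps if c.role is Role.DIMENSION]
attributes = [c for c in comps if c.role is Role.ATTRIBUTE]


def primary_key(obs: dict[str, str]) -> tuple[str, ...]:
    """Combine the dimension values of an observation into its key."""
    return tuple(obs[d.id] for d in dimensions)


obs = {"FREQ": "A", "REF_AREA": "CH", "OBS_VALUE": "1.5", "OBS_STATUS": "A"}
key = primary_key(obs)
print(key)
```

The key is built from dimension values only, which reflects the statement above that the combination of all dimension values acts as the observation's primary key.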
- class pysdmx.model.dataflow.DataflowInfo(id: str, components: Components, agency: Organisation, name: str | None = None, description: str | None = None, version: str = '1.0', providers: Sequence[Organisation] = (), series_count: int | None = None, obs_count: int | None = None, start_period: str | None = None, end_period: str | None = None, last_updated: datetime | None = None, dsd_ref: str | None = None)
Extended information about a dataflow.
The information includes:
Some basic metadata about the dataflow (such as its ID and name).
Some useful metrics such as the number of observations.
The expected structure of data (i.e. the data schema), including the expected components, their types, etc.
- id
The identifier of the dataflow (e.g. CBS).
- components
The data structure, i.e. the components, their types, etc.
- agency
The organization responsible for the data (e.g. BIS).
- name
The dataflow’s name (e.g. Consolidated Banking Statistics).
- description
Additional descriptive information about the dataflow.
- version
The dataflow version.
- providers
The organizations providing the data.
- series_count
The number of series available in the dataflow.
- obs_count
The number of observations available in the dataflow.
- start_period
The oldest period for which data are available.
- end_period
The most recent period for which data are available.
- last_updated
When the dataflow was last updated.
- dsd_ref
The URN of the data structure used by the dataflow.
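Several of the fields above are optional and default to None, so code consuming them should guard against missing metrics. The sketch below uses a standard-library stand-in (FlowSummary, a hypothetical class holding a subset of the documented fields, with the agency simplified to a string) rather than the pysdmx class itself; the counts and periods are invented for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FlowSummary:
    """Stand-in holding a subset of the documented DataflowInfo fields."""
    id: str
    agency: str  # simplified: the documented field is an Organisation
    version: str = "1.0"
    obs_count: Optional[int] = None
    start_period: Optional[str] = None
    end_period: Optional[str] = None


def describe(flow: FlowSummary) -> str:
    """Build a one-line summary, skipping metrics that are not available."""
    parts = [f"{flow.agency}:{flow.id}({flow.version})"]
    if flow.obs_count is not None:
        parts.append(f"{flow.obs_count} observations")
    if flow.start_period and flow.end_period:
        parts.append(f"{flow.start_period} to {flow.end_period}")
    return ", ".join(parts)


# Illustrative values only; they are not real statistics.
full = FlowSummary("CBS", "BIS", obs_count=1000,
                   start_period="2000-Q1", end_period="2020-Q4")
bare = FlowSummary("CBS", "BIS")
print(describe(full))
print(describe(bare))
```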
- class pysdmx.model.dataflow.Role(value)
The various roles a component can play.
- ATTRIBUTE = 'A'
The component provides descriptive information about the data.
- DIMENSION = 'D'
The component helps identify data (e.g. primary key).
- MEASURE = 'M'
The component holds a value we measure or collect.
- class pysdmx.model.dataflow.Schema(context: str, agency: str, id: str, components: Components, version: str = '1.0', artefacts: Sequence[str] = (), generated: datetime = datetime.datetime(2024, 2, 23, 8, 32, 44, 237058))
The allowed content within a certain context.
This is equivalent to the result of a schema query in the SDMX-REST API.
The response contains the list of allowed values for the selected context (one of data structure, dataflow or provision agreement), and is typically used for validation purposes.
- context
The context for which the schema is provided. One of datastructure, dataflow or provisionagreement.
- agency
The agency maintaining the context (e.g. BIS).
- id
The ID of the context (e.g. BIS_MACRO).
- components
The list of components along with their allowed values, types, etc.
- version
The context version (e.g. 1.0).
- artefacts
The URNs of the artefacts used to generate the schema. This will typically include the URNs of data structures, codelists, concept schemes, content constraints, etc.
- generated
When the schema was generated. This is useful for metadata synchronization purposes. For example, if any of the artefacts listed under the artefacts property has been updated after the schema was generated, you might want to regenerate the schema.
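The synchronization check described for the generated property can be sketched as a simple timestamp comparison. This is an illustration under stated assumptions: is_stale is a hypothetical helper, not part of pysdmx, and the last-update time of each artefact is assumed to have been obtained separately (for example from a registry). The URNs and dates are made up.

```python
from datetime import datetime, timezone


def is_stale(generated: datetime, artefact_updates: dict[str, datetime]) -> bool:
    """Return True if any artefact was updated after the schema was generated.

    Hypothetical helper: artefact_updates maps artefact URNs to their
    last-update timestamps, which the caller must supply.
    """
    return any(updated > generated for updated in artefact_updates.values())


generated = datetime(2024, 2, 23, 8, 32, tzinfo=timezone.utc)

# Illustrative URNs and timestamps only.
updates = {
    "urn:sdmx:org.sdmx.infomodel.codelist.Codelist=BIS:CL_FREQ(1.0)":
        datetime(2024, 1, 10, tzinfo=timezone.utc),
    "urn:sdmx:org.sdmx.infomodel.codelist.Codelist=BIS:CL_AREA(1.0)":
        datetime(2024, 3, 1, tzinfo=timezone.utc),
}

stale = is_stale(generated, updates)
print(stale)
```

All timestamps in the sketch are timezone-aware; comparing aware and naive datetime values raises a TypeError in Python, so it is worth normalising them before the check.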