Dataflows and data structures

Note

Additional information about how dataflows (and related structures) can be used to drive statistical processes is available in the pysdmx tutorials.

Model for SDMX dataflows and related structures (like schemas).

pysdmx is dataflow-centric, which is another area where the library is opinionated. As a result, when you retrieve information about a dataflow, the information typically provided via the data structure (and related structures like concept schemes and codelists) is already included in the response.

A component of a dataset (aka variable), such as the frequency.

Concepts are used to describe the relevant characteristics of a statistical domain. For example, exchange rates might be described with components such as the numerator currency, the denominator currency, the type of exchange rate, etc.

Some of these components are expected to be useful across statistical domains. Examples of such components include the frequency, the observation status, the confidentiality, etc.

When using components to describe the expected structure of a statistical domain, data stewards distinguish between the components that represent what is being captured (i.e. the measures), the components that help uniquely identify the measures (i.e. the dimensions) and the components that provide additional descriptive information about the measures (i.e. the attributes). This is the component role. The role can be D (for Dimension), A (for Attribute) or M (for Measure).

While dimensions and measures are typically mandatory, attributes may be either mandatory or optional. This is captured in the required property using a boolean value (true for mandatory components, false otherwise). This may vary with the statistical domain, i.e. a mandatory component within a particular domain may be optional in another.

While the value of some attributes is expected to potentially vary with each measurement (aka observation or data point), others must be unique across all observations sharing the same (sub)set of dimension values. This is captured in the attachment_level property, which can be one of: D (for Dataset), O (for Observation), a string identifying a component ID (FREQ) or a comma-separated list of component IDs (FREQ,REF_AREA). The last two forms can be used to identify the dimension, group or series to which the attribute is attached. The attachment level of a component may vary with the statistical domain, i.e. a component attached to a series in a particular domain may be attached to, say, the dataset in another domain.
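The attachment levels described above can be told apart with a few string checks. The following is only a minimal sketch: `Attachment` and `parse_attachment_level` are hypothetical helpers written for illustration and are not part of pysdmx. It assumes that no component is itself named `D` or `O`.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Attachment:
    """Hypothetical helper type for illustration (not a pysdmx class)."""

    kind: str  # "dataset", "observation" or "components"
    component_ids: Tuple[str, ...] = ()


def parse_attachment_level(level: str) -> Attachment:
    """Interpret an attachment_level string as described in the text."""
    if level == "D":
        return Attachment("dataset")
    if level == "O":
        return Attachment("observation")
    # Anything else is read as one or more comma-separated component IDs.
    ids = tuple(part.strip() for part in level.split(",") if part.strip())
    if not ids:
        raise ValueError(f"Empty attachment level: {level!r}")
    return Attachment("components", ids)


print(parse_attachment_level("D"))
print(parse_attachment_level("FREQ"))
print(parse_attachment_level("FREQ,REF_AREA"))
```

Whether a list of component IDs denotes a series, a group or a single dimension depends on the data structure, so the sketch deliberately stops at returning the IDs.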

The codes field indicates the expected (i.e. allowed) set of values a component can take within a particular domain. In addition to (or instead of) a set of codes, additional details about the expected format may be found in the facets and dtype fields.
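To illustrate how the required flag and the allowed codes might be used together, here is a small validation sketch. `ComponentSpec` and `validate_record` are hypothetical, simplified stand-ins rather than pysdmx classes or functions, and the codes shown are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional


@dataclass(frozen=True)
class ComponentSpec:
    """Hypothetical, simplified stand-in for a component (not the pysdmx class)."""

    id: str
    required: bool
    codes: Optional[FrozenSet[str]] = None  # None means "no enumeration"


def validate_record(record: Dict[str, str], specs: List[ComponentSpec]) -> List[str]:
    """Return a list of problems; an empty list means the record passes."""
    problems = []
    for spec in specs:
        value = record.get(spec.id)
        if value is None or value == "":
            # A missing value is only a problem for mandatory components.
            if spec.required:
                problems.append(f"{spec.id}: missing required value")
            continue
        if spec.codes is not None and value not in spec.codes:
            problems.append(f"{spec.id}: {value!r} is not an allowed code")
    return problems


# Illustrative component definitions (assumed for this example).
specs = [
    ComponentSpec("FREQ", required=True, codes=frozenset({"A", "M", "Q"})),
    ComponentSpec("OBS_VALUE", required=True),
    ComponentSpec("OBS_STATUS", required=False, codes=frozenset({"A", "E"})),
]

print(validate_record({"FREQ": "M", "OBS_VALUE": "1.5"}, specs))
print(validate_record({"FREQ": "W"}, specs))
```

A fuller check would also take the data type and facets into account; they are left out here to keep the sketch short.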

id: A unique identifier for the component (e.g. FREQ).

required: Whether the component must have a value.

role: The role played by the component.

local_dtype: The component’s local data type (string, number, etc.).

local_facets: Additional local details such as the component’s minimum length.

name: The component’s name.

description: Additional descriptive information about the component.

local_codes: The expected local values for the component (e.g. currency codes).

attachment_level: The attachement level (if role = A only). Attributes can be attached at different levels such as D (for dataset-level attributes), O (for observation-level attributes) or a combination of dimension IDs, separated by commas, for series- and group-level attributes).

array_def: Any additional constraints for array types.

property dtype: DataType

Returns the component data type.

This will return the local data type (if any) or the data type of the referenced concept (if any). If neither is set, the data type will default to string.

Returns: The component data type (local, core or default).

property enumeration: Codelist | Hierarchy | None

Returns the list of valid codes for the component.

This will return the local codes (if any) or the codes of the referenced concept (if any), or None if neither is set.

Returns: The component codes (local or core).

property facets: Facets | None

Returns the component facets.

This will return the local facets (if any) or the facets of the referenced concept (if any), or None if neither is set.

Returns: The component facets (local or core).
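The dtype, enumeration and facets properties all follow the same pattern: the local value wins, then the value of the referenced concept, then a default. The sketch below illustrates that pattern for dtype only. `ConceptSketch` and `FallbackComponent` are hypothetical classes, and data types are represented as plain strings for simplicity; this is not how pysdmx implements the property.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ConceptSketch:
    """Hypothetical stand-in for a referenced concept."""

    dtype: Optional[str] = None


@dataclass(frozen=True)
class FallbackComponent:
    """Hypothetical stand-in for a component with local overrides."""

    id: str
    concept: Optional[ConceptSketch] = None
    local_dtype: Optional[str] = None

    @property
    def dtype(self) -> str:
        # 1. A local data type takes precedence.
        if self.local_dtype is not None:
            return self.local_dtype
        # 2. Otherwise, fall back to the referenced concept (the "core" type).
        if self.concept is not None and self.concept.dtype is not None:
            return self.concept.dtype
        # 3. Otherwise, default to string.
        return "string"


print(FallbackComponent("TITLE").dtype)
print(FallbackComponent("OBS_VALUE", ConceptSketch("decimal")).dtype)
print(FallbackComponent("OBS_VALUE", ConceptSketch("decimal"), "integer").dtype)
```

For enumeration and facets, the last step would return None instead of a default value.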

class pysdmx.model.dataflow.Components(iterable)

A collection of components describing the data.

append(item)

Add a component to the existing list of components.

Return type: None

property attributes: Sequence[Component]

Return the list of attributes.

Attributes are components that provide descriptive information about some piece of data (aka an observation or data point).

Returns: The list of attributes

property dimensions: Sequence[Component]

Return the list of dimensions.

Dimensions are components that contribute to the unique identification of a piece of data (aka an observation or data point). The combination of the values for all dimensions of an observation can therefore be seen as the observation’s primary key.

Returns: The list of dimensions
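Because the combination of dimension values acts as a primary key, it can be used to detect duplicate observations. The following sketch assumes records are plain dictionaries keyed by component ID; `observation_key` and `find_duplicates` are hypothetical helpers, not pysdmx functions, and the component IDs are illustrative.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def observation_key(record: Dict[str, str], dimension_ids: Sequence[str]) -> Tuple[str, ...]:
    """Build the key of an observation from its dimension values, in order."""
    return tuple(record[dim] for dim in dimension_ids)


def find_duplicates(
    records: Iterable[Dict[str, str]], dimension_ids: Sequence[str]
) -> List[Tuple[str, ...]]:
    """Return the keys that occur more than once (one entry per repeat)."""
    seen = set()
    duplicates = []
    for record in records:
        key = observation_key(record, dimension_ids)
        if key in seen:
            duplicates.append(key)
        else:
            seen.add(key)
    return duplicates


# Illustrative dimension IDs and records (assumed for this example).
dims = ["FREQ", "REF_AREA", "TIME_PERIOD"]
records = [
    {"FREQ": "M", "REF_AREA": "CH", "TIME_PERIOD": "2024-01", "OBS_VALUE": "1.0"},
    {"FREQ": "M", "REF_AREA": "CH", "TIME_PERIOD": "2024-02", "OBS_VALUE": "1.1"},
    {"FREQ": "M", "REF_AREA": "CH", "TIME_PERIOD": "2024-01", "OBS_VALUE": "9.9"},
]

print(find_duplicates(records, dims))
```

Note that the measure value (OBS_VALUE) plays no part in the key: the third record duplicates the first even though the values differ.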

extend(other)

Add the components to the existing list of components.

Return type: None

insert(i, item)

Add a component at the requested index.

Return type: None

property measures: Sequence[Component]

Return the list of measures.

Measures are components that hold the measured values.

Returns: The list of measures
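One way to picture such a collection is as a list that can be filtered by role. The sketch below is a simplified illustration under that assumption. `RoleSketch`, `SimpleComponent` and `RoleFilteredComponents` are hypothetical names, and the real Components class may be implemented differently.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class RoleSketch(Enum):
    """Hypothetical mirror of the three component roles."""

    ATTRIBUTE = "A"
    DIMENSION = "D"
    MEASURE = "M"


@dataclass(frozen=True)
class SimpleComponent:
    """Hypothetical, minimal component: an ID and a role."""

    id: str
    role: RoleSketch


class RoleFilteredComponents(list):
    """A list of components with role-based views (illustrative only)."""

    def _with_role(self, role: RoleSketch) -> List[SimpleComponent]:
        return [c for c in self if c.role is role]

    @property
    def dimensions(self) -> List[SimpleComponent]:
        return self._with_role(RoleSketch.DIMENSION)

    @property
    def measures(self) -> List[SimpleComponent]:
        return self._with_role(RoleSketch.MEASURE)

    @property
    def attributes(self) -> List[SimpleComponent]:
        return self._with_role(RoleSketch.ATTRIBUTE)


comps = RoleFilteredComponents(
    [
        SimpleComponent("FREQ", RoleSketch.DIMENSION),
        SimpleComponent("REF_AREA", RoleSketch.DIMENSION),
        SimpleComponent("OBS_VALUE", RoleSketch.MEASURE),
    ]
)
# Standard list operations still work and are reflected in the views.
comps.append(SimpleComponent("OBS_STATUS", RoleSketch.ATTRIBUTE))

print([c.id for c in comps.dimensions])
print([c.id for c in comps.attributes])
```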

class pysdmx.model.dataflow.Dataflow(id: str, uri: str | None = None, urn: str | None = None, name: str | None = None, description: str | None = None, version: str = '1.0', valid_from: datetime | None = None, valid_to: datetime | None = None, is_final: bool = False, is_external_reference: bool = False, service_url: str | None = None, structure_url: str | None = None, agency: str | Agency = '', structure: str | None = None, *, annotations: Sequence[Annotation] = ()): A flow of data that providers will provide.

class pysdmx.model.dataflow.DataflowInfo(id: str, components: Components, agency: Agency, name: str | None = None, description: str | None = None, version: str = '1.0', providers: Sequence[DataProvider] = (), series_count: int | None = None, obs_count: int | None = None, start_period: str | None = None, end_period: str | None = None, last_updated: datetime | None = None, dsd_ref: str | None = None)

Extended information about a dataflow.

The information includes:

Some basic metadata about the dataflow (such as its ID and name).
Some useful metrics such as the number of observations.
The expected structure of data (i.e. the data schema), including the expected components, their types, etc.

id: The identifier of the dataflow (e.g. CBS).

components: The data structure, i.e. the components, their types, etc.

agency: The organization responsible for the data (e.g. BIS).

name: The dataflow’s name (e.g. Consolidated Banking Statistics).

description: Additional descriptive information about the dataflow.

version: The dataflow version.

providers: The organizations providing the data.

series_count: The number of series available in the dataflow.

obs_count: The number of observations available in the dataflow.

start_period: The oldest period for which data are available.

end_period: The oldest period for which data are available.

last_updated: When the dataflow was last updated.

dsd_ref: The URN of the data structure used by the dataflow.

class pysdmx.model.dataflow.Role(value, names=<not given>, *values, module=None, qualname=None, type=None, start=1, boundary=None)

The various roles a component can play.

ATTRIBUTE = 'A': The component provides descriptive information about the data.

DIMENSION = 'D': The component helps identify data (e.g. primary key).

MEASURE = 'M': The component holds a value we measure or collect.

class pysdmx.model.dataflow.Schema(context: Literal['datastructure', 'dataflow', 'provisionagreement'], agency: str, id: str, components: Components, version: str = '1.0', artefacts: Sequence[str] = (), generated: datetime = datetime.datetime(2025, 5, 23, 14, 44, 47, 198547, tzinfo=datetime.timezone.utc))

The allowed content within a certain context.

This is equivalent to the result of a schema query in the SDMX-REST API.

The response contains the list of allowed values for the selected context (one of datastructure, dataflow or provisionagreement), and is typically used for validation purposes.

context: The context for which the schema is provided. One of datastructure, dataflow or provisionagreement.

agency: The agency maintaining the context (e.g. BIS).

id: The ID of the context (e.g. BIS_MACRO).

components: The list of components along with their allowed values, types, etc.

version: The context version (e.g. 1.0)

artefacts: The URNs of the artefacts used to generate the schema. This will typically include the URNs of data structures, codelists, concept schemes, content constraints, etc.

generated: When the schema was generated. This is useful for metadata synchronization purposes. For example, if any of the artefacts listed under the artefacts property has been updated after the schema was generated, you might want to regenerate the schema.

property short_urn: str: Returns a short URN for the schema.