Pandas Connector

This is the client for discovering and retrieving data from a compliant service.

class pysdmx.api.dc.pd.PandasConnector(api_endpoint, pem=None, timeout=20.0)

A Pandas connector for data discovery and data retrieval.

This connector is an implementation of the SDMX “data discovery and data retrieval” API for SDMX-REST v2 web services, which returns Pandas data frames for data queries.

In addition to being compliant with the SDMX-REST v2 API, the targeted service must be able to return structural metadata in SDMX-JSON v2.0.0 and data in SDMX-CSV v2.0.0.

Warning: This class is experimental and subject to change without prior notice. It is not covered by semantic versioning guarantees, and modifications to this class will not result in a major version increment. Use it with caution in production environments or critical processes.

data(dataflow, filters=None, columns=None, apply_schema=True, infer_series_keys=True, infer_index=True, labels='id')

Get data for the selected dataflow, matching the supplied filters.

Parameters:
  • dataflow (Union[str, MaintainableIdentification]) – The dataflow from which to retrieve data. Either a string representing the SDMX URN of the dataflow or the information necessary to uniquely identify it. Classes such as DataflowRef or Dataflow are examples of pysdmx classes that implement the MaintainableIdentification protocol. In case of strings, a shorthand notation (agency:id(version)) is also acceptable.

  • filters (Union[BooleanFilter, DateTimeFilter, MultiFilter, NumberFilter, TextFilter, str, None]) – The data query filters, if any. This can be a string similar to a SQL WHERE clause ("AREA='UY' AND FREQ <> 'A'"), a Python expression ("REF_AREA=='UY' and FREQ != 'A'"), or one of the filters offered by the pysdmx.api.dc.query module.

  • columns (Optional[Iterable[str]]) – The components (dimensions, attributes and measures) to be returned. If not provided, all components will be returned.

  • apply_schema (bool) – Whether to apply a schema, with data types, to the data frame. If enabled, the dataflow definition is retrieved and applied to the data, which includes type casting of the various columns.

  • infer_series_keys (bool) – Whether to attempt inferring the series keys from the dimension values. Series keys are generated by concatenating the values of all dimension columns using a period (.) as a separator. This operation will only be performed if the TIME_PERIOD column exists in the data structure. When enabled, a new column called SERIES_KEY will be added to the DataFrame.

  • infer_index (bool) – Whether to create an index for the DataFrame. This operation will only be performed if the TIME_PERIOD column exists in the data structure. When enabled, the DataFrame will be indexed using a combination of SERIES_KEY and TIME_PERIOD.

  • labels (Literal['id', 'name', 'both']) – Specifies the format of category fields in the DataFrame. The following options are available:
      - "id": Only include the code IDs (default behavior).
      - "name": Replace code IDs with their corresponding names.
      - "both": Include both the code IDs and names ("ID: Name").

Return type:

DataFrame

Returns:

The requested data, if any, returned as a Pandas data frame.
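The effect of infer_series_keys, infer_index and labels, as described in the parameter list, can be illustrated with a small pure-Python sketch. This is not the connector's implementation: the sample rows, the dimension order, and the helper names (series_key, index_rows, format_label) are all hypothetical, and the sketch assumes that TIME_PERIOD is not itself part of the series key.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical observations; each row maps a component ID to its value.
ROWS: List[Row] = [
    {"FREQ": "A", "REF_AREA": "UY", "TIME_PERIOD": "2022", "OBS_VALUE": 1.5},
    {"FREQ": "A", "REF_AREA": "UY", "TIME_PERIOD": "2023", "OBS_VALUE": 1.7},
]
# Assumed dimension order. TIME_PERIOD is kept out of the key in this sketch.
DIMENSIONS = ["FREQ", "REF_AREA"]


def series_key(row: Row, dimensions: List[str]) -> str:
    """Concatenate the dimension values of one row, separated by a period."""
    return ".".join(str(row[d]) for d in dimensions)


def index_rows(rows: List[Row], dimensions: List[str]) -> Dict[Tuple[str, str], Row]:
    """Add SERIES_KEY to each row and key the rows by (SERIES_KEY, TIME_PERIOD)."""
    indexed: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        key = series_key(row, dimensions)
        indexed[(key, str(row["TIME_PERIOD"]))] = {**row, "SERIES_KEY": key}
    return indexed


def format_label(code_id: str, name: str, labels: str = "id") -> str:
    """Render a coded value according to the three documented label modes."""
    if labels == "id":
        return code_id
    if labels == "name":
        return name
    if labels == "both":
        return f"{code_id}: {name}"
    raise ValueError(f"Unsupported labels option: {labels!r}")


if __name__ == "__main__":
    for index, row in index_rows(ROWS, DIMENSIONS).items():
        print(index, row["OBS_VALUE"])
    print(format_label("UY", "Uruguay", labels="both"))
```

In the data frame returned by the connector, the (SERIES_KEY, TIME_PERIOD) pair plays the role of the dictionary key used here.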

dataflow(dataflow)

Retrieve information about a dataflow.

This function provides details about a dataflow, including its components, to assist in querying data effectively.

Parameters:

dataflow (Union[str, MaintainableIdentification]) – Specifies the dataflow to retrieve. This can be:

  • A string representing the SDMX URN of the dataflow. A shorthand notation (agency:id(version)) is also acceptable.

  • An object implementing the MaintainableIdentification protocol (e.g., instances of DataflowRef or Dataflow).
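To make the agency:id(version) shorthand concrete, here is a minimal sketch that splits such a string into its three parts. The FlowRef tuple, the parse_shorthand helper and the sample identifier are hypothetical stand-ins for illustration only. They are not part of pysdmx, and the library's own parsing may accept more forms than this pattern does.

```python
import re
from typing import NamedTuple


class FlowRef(NamedTuple):
    """Hypothetical stand-in for a dataflow reference (not a pysdmx class)."""

    agency: str
    id: str
    version: str


# agency:id(version), with no colons or parentheses inside the parts.
_SHORTHAND = re.compile(
    r"^(?P<agency>[^:()]+):(?P<id>[^:()]+)\((?P<version>[^()]+)\)$"
)


def parse_shorthand(ref: str) -> FlowRef:
    """Split a shorthand dataflow reference into agency, ID and version."""
    match = _SHORTHAND.match(ref)
    if match is None:
        raise ValueError(f"Not in agency:id(version) form: {ref!r}")
    return FlowRef(match["agency"], match["id"], match["version"])


if __name__ == "__main__":
    # The identifier below is made up for the example.
    print(parse_shorthand("AGENCY:FLOW_ID(1.0)"))
```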

Returns:

An object containing detailed information about the requested dataflow, including:

  • Basic metadata, such as the dataflow’s ID and name.

  • Metrics, such as the number of observations or period coverage (if available from the source).

  • The expected data structure (data schema), including components, their types, and other relevant details.

Return type:

Dataflow

Raises:
  • errors.Invalid – In case the targeted service returns a client error, i.e. a status between 400 and 499.

  • errors.InternalError – In case the targeted service returns a server error, i.e. a status between 500 and 599, or in case the server response could not be deserialized.

  • errors.NotFound – In case the targeted service does not contain the requested dataflow.

  • errors.Unavailable – In case the targeted service could not be reached.
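The relationship between service responses and the error categories listed above can be sketched as a simple classification function. This is only an illustration under stated assumptions: classify_status is a hypothetical helper, it returns category names rather than raising the pysdmx exceptions, and it assumes that a 404 response is what signals a missing dataflow, which the text above does not state explicitly.

```python
from typing import Optional


def classify_status(status: Optional[int]) -> str:
    """Map an HTTP status (or None if unreachable) to an error category name."""
    if status is None:
        # No response at all: the service could not be reached.
        return "Unavailable"
    if status == 404:
        # Assumption: 404 is checked before the general client-error range.
        return "NotFound"
    if 400 <= status <= 499:
        return "Invalid"
    if 500 <= status <= 599:
        return "InternalError"
    return "OK"


if __name__ == "__main__":
    for status in (200, 400, 404, 503, None):
        print(status, "->", classify_status(status))
```

Note that InternalError also covers responses that cannot be deserialized, a case a status code alone cannot capture.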

dataflows(search_term=None)

Get the list of dataflows available in the connector.

Parameters:

search_term (Optional[str]) – A search term. If set, any dataflow containing the term in its ID, name, or description will be returned.

Returns:

A sorted and immutable collection of dataflows matching the supplied search term, if any. For each dataflow, information such as its ID, name and description is returned. If a search term is supplied and does not match any dataflow, an empty collection will be returned. The collection is sorted by agency ID, then dataflow ID and then version number.

Return type:

tuple[Dataflow, ...]
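The search and ordering rules described for the returned collection can be sketched in plain Python. FlowInfo, version_key and find_flows are hypothetical names, not pysdmx APIs. The sketch also makes two assumptions the text does not confirm: that matching is case-insensitive, and that versions are compared numerically part by part.

```python
from typing import Iterable, NamedTuple, Optional, Tuple


class FlowInfo(NamedTuple):
    """Hypothetical stand-in for the information returned per dataflow."""

    agency: str
    id: str
    version: str
    name: str = ""
    description: str = ""


def version_key(version: str) -> Tuple[int, ...]:
    """Turn '1.10' into (1, 10) so that 1.10 sorts after 1.2 (assumption)."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def find_flows(
    flows: Iterable[FlowInfo], search_term: Optional[str] = None
) -> Tuple[FlowInfo, ...]:
    """Filter by search term (if any), then sort by agency, ID and version."""
    selected = list(flows)
    if search_term:
        term = search_term.lower()  # assumption: case-insensitive matching
        selected = [
            f
            for f in selected
            if term in f.id.lower()
            or term in f.name.lower()
            or term in f.description.lower()
        ]
    return tuple(
        sorted(selected, key=lambda f: (f.agency, f.id, version_key(f.version)))
    )


if __name__ == "__main__":
    # Made-up dataflows for the example.
    sample = [
        FlowInfo("SDMX", "B", "1.0", "Beta"),
        FlowInfo("BIS", "Z", "1.10", "Rates"),
        FlowInfo("BIS", "Z", "1.2", "Rates"),
        FlowInfo("BIS", "A", "1.0", "Alpha"),
    ]
    for flow in find_flows(sample, "rates"):
        print(flow.agency, flow.id, flow.version)
```

Returning a tuple mirrors the immutable collection described above, and an unmatched search term yields an empty tuple.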

Raises:
  • errors.Invalid – In case the targeted service returns a client error, i.e. a status between 400 and 499.

  • errors.InternalError – In case the targeted service returns a server error, i.e. a status between 500 and 599, or in case the server response could not be deserialized.

  • errors.NotFound – In case the targeted service does not contain any dataflow.

  • errors.Unavailable – In case the targeted service could not be reached.