zea.data.file

zea data file (HDF5).

Functions

assert_key(file, key)

Asserts that key is in an h5py.File.

load_dict_from_hdf5_group(group)

Recursively load the contents of an HDF5 group into a plain dict.

load_file(path[, data_type, indices, ...])

Loads a zea data file (h5py file).

load_file_all_data_types(path[, indices, ...])

Loads a zea data file (h5py file).

validate_file([path, file])

Validate the structure and data of a zea HDF5 file.

Classes

File(name[, mode])

h5py.File in zea format.

Track(index, group[, timestamps, label, probe])

A single acquisition track within a File.

class zea.data.file.File(name, mode='r', *args, **kwargs)[source]

Bases: File

h5py.File in zea format.

Initialize the file.

Parameters:
  • name (str, Path, HFPath) – The path to the file. Can be a string or a Path object. It can also be a string with the prefix ‘hf://’, in which case it is resolved to a HuggingFace path.

  • mode (str, optional) – The mode to open the file in. Defaults to “r”.

  • revision (str, optional) – HuggingFace revision (branch, tag, or commit hash) to download from. Only used when name starts with hf://. Defaults to "main". Example: revision="v0.1.0".

  • repo_type (str, optional) – HuggingFace repository type. Only used when name starts with hf://. Defaults to "dataset".

  • cache_dir (str or Path, optional) – Local cache directory for downloaded HuggingFace files. Only used when name starts with hf://.

  • *args – Additional arguments to pass to h5py.File.

  • **kwargs – Additional keyword arguments to pass to h5py.File.

copy_key(key, dst)[source]

Copy a specific key to another file.

Will always copy the attributes and the scan data if it exists. Will warn if the key is not in this file or if the key already exists in the destination file.

Parameters:
  • key (str) – The key to copy.

  • dst (File) – The destination file to copy the key to.

classmethod create(path, data=None, scan=None, tracks=None, track_schedule=None, metadata=None, metrics=None, probe_name=None, probe=None, us_machine=None, description=None, compression='lzf', chunk_frames=False, overwrite=False)[source]

Create a new zea HDF5 file from data, scan, and optional metadata.

All inputs are validated against the FileSpec schema (dtypes, shapes, dimension consistency) before anything is written to disk.

For single-track files, supply data and scan. For multi-track files, supply tracks (a list of dicts with "data" and "scan" keys, or TrackSpec objects) and optionally track_schedule.

Parameters:
  • path – Destination file path.

  • data (dict | None) – Data dict accepted by DataSpec. Mutually exclusive with tracks.

  • scan (dict | None) – Scan-parameter dict accepted by ScanSpec. Mutually exclusive with tracks.

  • tracks (list | None) – List of track dicts (each with "data" and "scan" keys) or TrackSpec objects. Mutually exclusive with data/scan.

  • track_schedule (ndarray | None) – Optional int32 array of length n_total_tx indicating which track each global transmit belongs to. Only used with tracks.

  • metadata (dict | None) – Optional metadata dict accepted by MetadataSpec.

  • metrics (dict | None) – Optional metrics dict accepted by MetricsSpec.

  • probe_name (str | None) – Removed — use probe={'name': ...} instead.

  • probe (ProbeSpec | dict | None) – Probe specification as a Probe object or a plain dict accepted by ProbeSpec.

  • us_machine (str | None) – Name of the ultrasound machine.

  • description (str | None) – Free-text description of the acquisition.

  • compression (str) – HDF5 compression filter (default "lzf").

  • chunk_frames (bool) – If True, use frame-wise chunking for all datasets containing a “frames” dimension. Dataset will be stored with HDF5 chunking enabled, using a single frame (a single slice along the first dimension) per chunk.

  • overwrite (bool) – If False (default), raise if the file exists.
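
As a rough illustration of what frame-wise chunking means for chunk_frames, the sketch below computes a one-frame chunk shape for a dataset whose first dimension is the frames axis. The helper name frame_chunk_shape is hypothetical and not part of zea; it only mirrors the rule described above. A tuple of this kind is what one would typically pass as the chunks argument when creating a dataset with h5py.

```python
def frame_chunk_shape(shape):
    """Return a chunk shape holding exactly one frame (hypothetical helper).

    Assumes the first dimension of `shape` is the frames axis.
    """
    if len(shape) == 0:
        raise ValueError("scalar datasets cannot be chunked per frame")
    # One slice along the first (frames) axis, full extent along the rest.
    return (1,) + tuple(shape[1:])


# Raw data laid out as (n_frames, n_tx, n_ax, n_el, n_ch).
raw_shape = (2, 4, 64, 8, 1)
chunks = frame_chunk_shape(raw_shape)
print(chunks)  # (1, 4, 64, 8, 1)
```

With this layout, reading a single frame touches exactly one chunk, which is the usual motivation for per-frame chunking.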

Returns:

An open read-only File handle.

Return type:

File

Single-track example:

>>> import os, tempfile
>>> import numpy as np
>>> from zea import File

>>> n_frames, n_tx, n_ax, n_el = 2, 4, 64, 8
>>> raw = np.zeros((n_frames, n_tx, n_ax, n_el, 1), dtype=np.float32)
>>> geom = np.zeros((n_el, 3), dtype=np.float32)
>>> scan = {
...     "sampling_frequency": np.float32(40e6),
...     "center_frequency": np.float32(5e6),
...     "demodulation_frequency": np.float32(5e6),
...     "initial_times": np.zeros(n_tx, dtype=np.float32),
...     "t0_delays": np.zeros((n_tx, n_el), dtype=np.float32),
...     "tx_apodizations": np.ones((n_tx, n_el), dtype=np.float32),
...     "focus_distances": np.full(n_tx, np.inf, dtype=np.float32),
...     "transmit_origins": np.zeros((n_tx, 3), dtype=np.float32),
...     "polar_angles": np.zeros(n_tx, dtype=np.float32),
...     "time_to_next_transmit": np.ones((n_frames, n_tx), dtype=np.float32) * 1e-4,
... }

>>> fd, path = tempfile.mkstemp(suffix=".hdf5")
>>> os.close(fd)  # release the OS-level handle; only the path is needed
>>> f = File.create(
...     path,
...     data={"raw_data": raw},
...     scan=scan,
...     probe={"name": "verasonics_l11_4v"},
...     overwrite=True,
... )
>>> f.probe_name
'verasonics_l11_4v'
>>> f.close()
>>> os.unlink(path)
property data: _GroupProxy

Lazy proxy for the data group of a single-track file.

Supports both the new tracks/track_0/data/ layout and the legacy flat data/ layout.

Returns a GroupProxy so individual datasets can be accessed as attributes without loading everything into RAM:

with File(path) as f:
    f.data.raw_data[:, :n_tx]  # read a slice
    f.data.image.values[0]  # nested group access
Raises:

AttributeError – When the file contains more than one track. Use tracks to iterate over individual tracks.

property description

Reads the description from the data file and returns it.

format_key(key)[source]

Format the key to match the data type.

get_scan_parameters()[source]

Returns a dictionary of the scan parameters stored in the file, suitable for initializing a scan object.

If there are no scan parameters in the hdf5 file, returns an empty dictionary.

Returns:

The scan parameters.

Return type:

dict

classmethod get_shape(path, key)[source]

Get the shape of a key in a file.

Parameters:
  • path (str) – The path to the file.

  • key (str) – The key to get the shape of.

Returns:

The shape of the key.

Return type:

tuple

get_track(label)[source]

Return the track with the given label.

Parameters:

label (str) – The exact label string assigned to the desired track.

Returns:

The matching Track object.

Return type:

Track

Raises:

KeyError – If no track with that label exists, with a message listing the available labels so the error is self-diagnosing.

Example:

with File("acquisition.hdf5") as f:
    focused = f.get_track("focused")
    raw = focused.data.raw_data[:]
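
The self-diagnosing KeyError described above can be pictured with a small stand-alone sketch. The helper find_by_label is hypothetical and is not how zea implements get_track(); it only shows a label lookup whose error message lists the available labels, and how a caller might catch it.

```python
def find_by_label(labels, wanted):
    """Return the index of `wanted` in `labels` (hypothetical helper, not zea API)."""
    for index, label in enumerate(labels):
        if label == wanted:
            return index
    # List the available labels so a typo is easy to spot from the message.
    raise KeyError(f"No track labelled {wanted!r}; available labels: {labels}")


labels = ["focused", "planewave"]  # e.g. the value of File.track_labels
print(find_by_label(labels, "planewave"))  # 1

try:
    find_by_label(labels, "plane_wave")  # misspelled on purpose
except KeyError as err:
    message = str(err)
    print(message)
```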
has_key(key)[source]

Check if the file has a specific key.

Parameters:

key (str) – The key to check.

Returns:

True if the key exists, False otherwise.

Return type:

bool

static key_to_data_type(key)[source]

Convert the key to a data type.

load_data(data_type, indices=None)[source]

Load data from the file.

Deprecated: Use file.data.<key> with standard h5py slice indexing instead:

with File(path) as f:
    raw = f.data.raw_data[:]  # all frames
    raw = f.data.raw_data[0]  # first frame
    raw = f.data.raw_data[0, [0, 2]]  # frame 0, transmits 0 and 2

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.

Parameters:
  • data_type (str) – The type of data to load. Options are ‘raw_data’, ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all data is loaded.

Return type:

ndarray

load_parameters(**overrides)[source]

Load the acquisition parameters (merged probe + scan) from the file.

Reads both the scan and probe groups and merges them into a single Parameters object that owns derivation, caching, and lazy loading of derived quantities. The probe and scan groups live at the same level and have non-overlapping field names, so merging is a plain dict union.
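
The "plain dict union" with overrides can be sketched in a few lines. This is only an illustration under assumptions: merge_parameters is a hypothetical helper, the field names and values are made up, and the real method returns a Parameters object rather than a dict.

```python
def merge_parameters(probe, scan, **overrides):
    """Union probe and scan fields, then apply overrides (hypothetical helper)."""
    overlap = set(probe) & set(scan)
    if overlap:
        # The text states that the two groups have non-overlapping field names.
        raise ValueError(f"probe and scan share keys: {sorted(overlap)}")
    merged = {**probe, **scan}
    # Overrides win over file values; unknown keys simply pass through.
    merged.update(overrides)
    return merged


# Illustrative field names and values, not read from a real file.
probe = {"name": "verasonics_l11_4v"}
scan = {"center_frequency": 5e6, "sampling_frequency": 40e6}
params = merge_parameters(probe, scan, center_frequency=7e6, my_tag="demo")
print(params["center_frequency"])  # 7000000.0
print(params["my_tag"])  # demo
```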

Parameters:

**overrides – Override any parameter from the file. Custom (non-spec) keys are stored as passthrough parameters.

Returns:

The merged, derivable parameters object.

Return type:

Parameters

Raises:

AttributeError – When the file contains more than one track. Use tracks and call .load_parameters() on each track.

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     parameters = f.load_parameters()
>>> type(parameters).__name__
'Parameters'
load_transmits(key, selected_transmits)[source]

Load raw_data or aligned_data for a given list of transmits.

Parameters:
  • key (str) – The type of data to load. Options are ‘raw_data’ and ‘aligned_data’.

  • selected_transmits (list, np.ndarray) – The transmits to load.

property metadata: MetadataSpec

Return a validated MetadataSpec object from the file.

Returns:

The validated metadata spec.

Return type:

MetadataSpec

Raises:

KeyError – If the file has no metadata group.

Example

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path, revision="v0.1.0") as f:
...     meta = f.metadata
...     print(meta.subject.type)
phantom
property metrics: MetricsSpec

Return a validated MetricsSpec object from the file.

Returns:

The validated metrics spec.

Return type:

MetricsSpec

Raises:

KeyError – If the file has no metrics group.

Example:

>>> with File("my_file.hdf5") as f:
...     met = f.metrics
...     print(met.coherence_factor.shape)
property n_ax: int

Number of axial samples.

property n_el: int

Number of elements.

property n_frames: int

Number of frames.

property n_tx: int

Number of transmit events.

property name

Return the name of the file.

property path

Return the path of the file.

property probe: Probe

Returns a Probe object initialized with the parameters from the file.

Returns:

The probe object.

Return type:

Probe

Example

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     probe = f.probe
>>> probe.name
'verasonics_l11_4v'
property probe_name

Reads the probe name from the data file and returns it.

recursively_load_dict_contents_from_group(path)[source]

Load dict from contents of group.

Deprecated: Use the module-level load_dict_from_hdf5_group() function instead, passing an h5py.Group directly.

Parameters:

path (str) – path to group

Returns:

dictionary with contents of group

Return type:

dict

property scan: ScanSpec | None

Return the validated ScanSpec for this file.

This is the bare scan group as a spec object. For a full, derivable parameter object (merged probe + scan, with caching and derived properties) use load_parameters().

Raises:

AttributeError – When the file contains more than one track. Use tracks and access .scan on each track instead.

>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path, revision="v0.1.0", mode="r") as f:
...     scan = f.scan
>>> type(scan).__name__
'ScanSpec'
shape(key)[source]

Return shape of some key.

Return type:

tuple

property stem

Return the stem of the file.

summary()[source]

Print the contents of the file.

to_iterator(key)[source]

Convert the data to an iterator over all frames.

property track_labels: list[str | None]

Labels of all tracks in acquisition order.

Returns a list with one entry per track. Each entry is the label string stored on that track, or None for unlabelled tracks (e.g. single-track or legacy files). The list order matches tracks, so unpacking f.tracks in the same order as f.track_labels is always safe.

Example:

with File("acquisition.hdf5") as f:
    print(f.track_labels)  # ['focused', 'planewave']
    focused, planewave = f.tracks  # safe — same order
property track_schedule: ndarray | None

Track index for each global transmit event, shape (n_total_tx,).

Returns an int32 array that maps every transmit event (in acquisition order) to the track it belongs to, or None if no track_schedule dataset was stored in this file.

Example:

with File("multi_track.hdf5") as f:
    sched = f.track_schedule  # e.g. array([0, 1, 0, 1, ...])
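
To make the mapping concrete, the sketch below groups global transmit indices by the track they belong to. The helper transmits_per_track is hypothetical and not part of zea; it accepts any iterable of integer track indices, such as a list or the int32 array returned by this property.

```python
from collections import defaultdict


def transmits_per_track(schedule):
    """Group global transmit indices by track index (hypothetical helper)."""
    groups = defaultdict(list)
    for global_tx, track_index in enumerate(schedule):
        groups[int(track_index)].append(global_tx)
    return dict(groups)


# Two tracks fired alternately, like the array([0, 1, 0, 1, ...]) case above.
schedule = [0, 1, 0, 1, 0, 1]
print(transmits_per_track(schedule))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```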
property tracks: list[Track]

Return a list of Track objects, one per track.

Each track exposes .data (a GroupProxy), .scan (a ScanSpec) and .load_parameters() (a Parameters factory method) for that specific track.

Raises:

AttributeError – For legacy flat-format files that have no tracks/ group; use data and scan directly for those.

Example:

with File("multi_track.hdf5") as f:
    for track in f.tracks:
        raw = track.data.raw_data[:]
        parameters = track.load_parameters()
property us_machine

Reads the ultrasound machine name from the data file and returns it.

validate()[source]

Lightweight structural validation — no array data is loaded into RAM.

Checks that the file has a data group and that all keys within it are recognised zea data types. For legacy files (before zea v0.1.0) a minimal key-name check is performed. For files created with zea v0.1.0 and later (via File.create()) the keys are checked against the DataSpec schema.

Use validate_spec() for a full validation that loads all data and checks dtypes, shapes, and cross-field dimension consistency.

Returns:

{"status": "success"} on success.

Return type:

dict

Raises:

AssertionError – If the file is missing required groups or contains unrecognised data keys.

validate_spec()[source]

Full schema validation — loads all data into RAM.

Reads every dataset in the file and runs dtype, shape, and cross-dimension consistency checks as defined by FileSpec. Use this to confirm a file is fully spec-compliant before sharing or processing it.

For a fast structural check that loads no array data, use validate() instead.

Note

This method only works on files created with zea v0.1.0 and later. Files written before zea v0.1.0 should be re-saved through File.create().

Returns:

The fully validated spec object, with all data accessible as typed attributes (e.g. spec.data.raw_data, spec.scan.n_tx).

Return type:

FileSpec

Raises:

TypeError, ValueError – If the file does not conform to the spec.

>>> with File("my_file.hdf5") as f:
...     spec = f.validate_spec()
...     print(spec.scan.n_tx)
property zea_version: str | None

Return the zea version that wrote this file, or None for legacy files.

Files created with zea v0.1.0 and later store a zea_version root attribute. Files written before zea v0.1.0 return None.

class zea.data.file.Track(index, group, timestamps=None, label=None, probe=None)[source]

Bases: object

A single acquisition track within a File.

Provides the same .data, .scan and .load_parameters() interface as File but scoped to one tracks/track_N group. Obtain instances through File.tracks rather than constructing this class directly.

Example:

with File("multi_track.hdf5") as f:
    for track in f.tracks:
        raw = track.data.raw_data[:]
        parameters = track.load_parameters()
property data: _GroupProxy

Lazy proxy for this track’s data group.

property label: str | None

Human-readable name for this track (e.g. 'focused' or 'planewave').

Returns None for single-track files or legacy files written without a label. Use File.track_labels to print all labels in acquisition order and File.get_track() to retrieve a track by name.

load_parameters(**overrides)[source]

Load this track’s parameters (merged probe + scan) as Parameters.

Each track shares the same probe but has its own scan, so the returned object has the same structure as the one returned by File.load_parameters() for a single-track file.

Parameters:

**overrides – Override any parameter.

Returns:

Initialised parameters object for this track.

Return type:

Parameters

property n_ax: int

Number of axial samples.

property n_el: int

Number of elements.

property n_frames: int

Number of frames.

property n_tx: int

Number of transmit events.

property scan: ScanSpec

Return the validated ScanSpec for this track.

This is the bare scan group as a spec object. For a full, derivable parameter object (merged probe + scan) use load_parameters().

property timestamps: ndarray | None

Global transmit timestamps for this track, shape (n_frames, n_tx).

Timestamps are pre-computed when the Track is created via File.tracks. Returns None if the file has no track_schedule or any track is missing time_to_next_transmit.

zea.data.file.assert_key(file, key)[source]

Asserts that key is in an h5py.File.

zea.data.file.load_dict_from_hdf5_group(group)[source]

Recursively load the contents of an HDF5 group into a plain dict.

Datasets are returned as numpy arrays or scalars; nested groups are converted recursively. String datasets are decoded to np.str_.

Parameters:

group (Group) – An open h5py.Group (or h5py.File).

Returns:

Nested dictionary mirroring the group structure.

Return type:

dict

zea.data.file.load_file(path, data_type='raw_data', indices=None, scan_kwargs=None)[source]

Loads a zea data file (h5py file).

Returns the data together with a parameters object containing the parameters of the acquisition. Probe information is available via parameters.to_probe_dict() or File.probe.

Additionally, it can load a specific subset of frames / transmits.

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.
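
The accepted forms of indices listed above can be summarized with a small classifier. This is a sketch under assumptions: check_indices is a hypothetical helper, not zea code, and the choice of ValueError for the unsupported case is illustrative only.

```python
def check_indices(indices):
    """Describe an `indices` value per the rules above (hypothetical helper)."""
    if indices is None or (isinstance(indices, str) and indices == "all"):
        return "all data"
    if isinstance(indices, int):
        return "single frame"
    if isinstance(indices, list):
        return "specific frames"
    if isinstance(indices, tuple):
        n_list_axes = sum(isinstance(axis, list) for axis in indices)
        if n_list_axes > 1:
            # Lists on several axes at once are documented as unsupported.
            raise ValueError("use a list on at most one axis; use slices elsewhere")
        return "multiple axes"
    raise TypeError(f"unsupported indices type: {type(indices).__name__}")


print(check_indices(None))  # all data
print(check_indices(0))  # single frame
print(check_indices([0, 2]))  # specific frames
print(check_indices((slice(None), [0, 2])))  # multiple axes
```

A tuple such as ([0, 1], [0, 2]) would be rejected by this sketch; replacing one of the lists with a slice, or slicing after loading, is the route the text recommends.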

Parameters:
  • path (str, pathlike) – The path to the hdf5 file.

  • data_type (str, optional) – The type of data to load. Defaults to ‘raw_data’. Other options are ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all frames are loaded.

  • scan_kwargs (dict) – Additional keyword arguments to pass to File.load_parameters(). These will override the parameters from the file if they are present. Defaults to None.

Returns:

The loaded data (for raw_data, an array of shape (n_frames, n_tx, n_ax, n_el, n_ch)) and a Parameters object containing the parameters of the acquisition.

Return type:

Tuple[ndarray, Parameters]

zea.data.file.load_file_all_data_types(path, indices=None, scan_kwargs=None)[source]

Loads a zea data file (h5py file).

Returns all data types together with a parameters object containing the parameters of the acquisition. Probe information is available via parameters.to_probe_dict() or File.probe.

Additionally, it can load a specific subset of frames / transmits.

The indices parameter can be used to load a subset of the data. This can be

  • 'all' or None to load all data

  • an int to load a single frame

  • a List[int] to load specific frames

  • a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.

For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.

Parameters:
  • path (str, pathlike) – The path to the hdf5 file.

  • indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None in which case all frames are loaded.

  • scan_kwargs (dict) – Additional keyword arguments to pass to File.load_parameters(). These will override the parameters from the file if they are present. Defaults to None.

Returns:

A dictionary with all data types as keys and the corresponding data as values, and a Parameters object containing the parameters of the acquisition.

Return type:

Tuple[dict, Parameters]

zea.data.file.validate_file(path=None, file=None)[source]

Validate the structure and data of a zea HDF5 file.

For files created with zea v0.1.0 and later this runs the full FileSpec schema validation (dtypes, shapes, and dimension consistency). Legacy files (before zea v0.1.0) are detected by the presence of scalar dataset scan/n_frames; for those only a lightweight structural data group check is performed.

Provide either path or file, but not both.

Parameters:
  • path (str) – Path to the HDF5 file.

  • file (File) – An already-open File instance.

Returns:

{"status": "success"} on success.

Return type:

dict

Raises:
  • AssertionError – If the file is missing the data group.

  • TypeError, ValueError – If spec validation fails on files created with zea v0.1.0 and later.