zea.data.file¶
zea data file (HDF5).
Functions
| | Asserts key is in a h5py.File. |
| load_dict_from_hdf5_group | Recursively load the contents of an HDF5 group into a plain dict. |
| load_file | Loads a zea data file (h5py file). |
| load_file_all_data_types | Loads a zea data file (h5py file), returning all data types. |
| validate_file | Validate the structure and data of a zea HDF5 file. |
Classes
| File | h5py.File in zea format. |
| Track | A single acquisition track within a File. |
- class zea.data.file.File(name, mode='r', *args, **kwargs)[source]¶
Bases: File
h5py.File in zea format.
Initialize the file.
- Parameters:
name (str, Path, HFPath) – The path to the file, as a string or a Path object. A string with the prefix ‘hf://’ is resolved to a HuggingFace path.
mode (str, optional) – The mode to open the file in. Defaults to “r”.
revision (str, optional) – HuggingFace revision (branch, tag, or commit hash) to download from. Only used when name starts with hf://. Defaults to "main". Example: revision="v0.1.0".
repo_type (str, optional) – HuggingFace repository type. Only used when name starts with hf://. Defaults to "dataset".
cache_dir (str or Path, optional) – Local cache directory for downloaded HuggingFace files. Only used when name starts with hf://.
*args – Additional arguments to pass to h5py.File.
**kwargs – Additional keyword arguments to pass to h5py.File.
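The hf:// prefix handling described above can be pictured with a short sketch. The helper name split_hf_path is hypothetical, and the rule that the first two path components form the repository id is an assumption made for illustration; this section does not show how zea resolves such paths.

```python
HF_PREFIX = "hf://"


def split_hf_path(name):
    """Hypothetical helper: split 'hf://<org>/<repo>/<file path>' into parts.

    Returns None for ordinary local paths.
    """
    name = str(name)
    if not name.startswith(HF_PREFIX):
        return None
    parts = name[len(HF_PREFIX):].split("/")
    if len(parts) < 3:
        raise ValueError(f"expected hf://<org>/<repo>/<file>, got {name!r}")
    repo_id = "/".join(parts[:2])   # assumption: repository id is '<org>/<repo>'
    filename = "/".join(parts[2:])  # path of the file inside the repository
    return repo_id, filename


print(split_hf_path("local/file.hdf5"))
print(split_hf_path("hf://zeahub/picmus/database/file.hdf5"))
```

Arguments such as revision, repo_type and cache_dir would only matter on the branch where the prefix is detected, which matches the "only used when name starts with hf://" notes in the parameter list.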
- copy_key(key, dst)[source]¶
Copy a specific key to another file.
Will always copy the attributes and the scan data if it exists. Will warn if the key is not in this file or if the key already exists in the destination file.
- Parameters:
key (str) – The key to copy.
dst (File) – The destination file to copy the key to.
- classmethod create(path, data=None, scan=None, tracks=None, track_schedule=None, metadata=None, metrics=None, probe_name=None, probe=None, us_machine=None, description=None, compression='lzf', chunk_frames=False, overwrite=False)[source]¶
Create a new zea HDF5 file from data, scan, and optional metadata.
All inputs are validated against the FileSpec schema (dtypes, shapes, dimension consistency) before anything is written to disk.
For single-track files, supply data and scan. For multi-track files, supply tracks (a list of dicts with "data" and "scan" keys, or TrackSpec objects) and optionally track_schedule.
- Parameters:
path – Destination file path.
data (dict | None) – Data dict accepted by DataSpec. Mutually exclusive with tracks.
scan (dict | None) – Scan-parameter dict accepted by ScanSpec. Mutually exclusive with tracks.
tracks (list | None) – List of track dicts (each with "data" and "scan" keys) or TrackSpec objects. Mutually exclusive with data/scan.
track_schedule (ndarray | None) – Optional int32 array of length n_total_tx indicating which track each global transmit belongs to. Only used with tracks.
metadata (dict | None) – Optional metadata dict accepted by MetadataSpec.
metrics (dict | None) – Optional metrics dict accepted by MetricsSpec.
probe_name (str | None) – Removed; use probe={'name': ...} instead.
probe (ProbeSpec | dict | None) – Probe specification as a Probe object or a plain dict accepted by ProbeSpec.
us_machine (str | None) – Name of the ultrasound machine.
description (str | None) – Free-text description of the acquisition.
compression (str) – HDF5 compression filter (default "lzf").
chunk_frames (bool) – If True, use frame-wise chunking for all datasets containing a “frames” dimension. Each such dataset is stored with HDF5 chunking enabled, using a single frame (a single slice along the first dimension) per chunk.
overwrite (bool) – If False (default), raise if the file exists.
- Returns:
An open read-only File handle.
- Return type:
File
Single-track example:
>>> import os, tempfile
>>> import numpy as np
>>> from zea import File
>>> n_frames, n_tx, n_ax, n_el = 2, 4, 64, 8
>>> raw = np.zeros((n_frames, n_tx, n_ax, n_el, 1), dtype=np.float32)
>>> geom = np.zeros((n_el, 3), dtype=np.float32)
>>> scan = {
...     "sampling_frequency": np.float32(40e6),
...     "center_frequency": np.float32(5e6),
...     "demodulation_frequency": np.float32(5e6),
...     "initial_times": np.zeros(n_tx, dtype=np.float32),
...     "t0_delays": np.zeros((n_tx, n_el), dtype=np.float32),
...     "tx_apodizations": np.ones((n_tx, n_el), dtype=np.float32),
...     "focus_distances": np.full(n_tx, np.inf, dtype=np.float32),
...     "transmit_origins": np.zeros((n_tx, 3), dtype=np.float32),
...     "polar_angles": np.zeros(n_tx, dtype=np.float32),
...     "time_to_next_transmit": np.ones((n_frames, n_tx), dtype=np.float32) * 1e-4,
... }
>>> _, path = tempfile.mkstemp(suffix=".hdf5")
>>> f = File.create(
...     path,
...     data={"raw_data": raw},
...     scan=scan,
...     probe={"name": "verasonics_l11_4v"},
...     overwrite=True,
... )
>>> f.probe_name
'verasonics_l11_4v'
>>> f.close()
>>> os.unlink(path)
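To make the chunk_frames description concrete, the sketch below computes the chunk shape that "one frame per chunk" implies for a dataset whose first axis is frames. The function name frame_chunk_shape is hypothetical; how zea builds its chunk shapes internally is not shown in this section.

```python
def frame_chunk_shape(shape):
    """Chunk shape holding exactly one frame (one slice along axis 0)."""
    if len(shape) == 0:
        raise ValueError("a scalar dataset has no frames dimension")
    return (1,) + tuple(shape[1:])


# Shape of raw_data in the single-track example: (n_frames, n_tx, n_ax, n_el, n_ch)
raw_shape = (2, 4, 64, 8, 1)
print(frame_chunk_shape(raw_shape))  # (1, 4, 64, 8, 1)
```

A tuple of this form is the kind of value h5py accepts through the chunks argument of create_dataset. With it, reading a single frame touches a single chunk.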
- property data: _GroupProxy¶
Lazy proxy for the data group of a single-track file.
Supports both the new tracks/track_0/data/ layout and the legacy flat data/ layout.
Returns a GroupProxy so individual datasets can be accessed as attributes without loading everything into RAM:
with File(path) as f:
    f.data.raw_data[:, :n_tx]  # read a slice
    f.data.image.values[0]  # nested group access
- Raises:
AttributeError – When the file contains more than one track. Use tracks to iterate over individual tracks.
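The attribute-style access described above can be sketched with a small wrapper. This is an illustration only: AttrProxy is a hypothetical class, and plain nested dicts stand in for HDF5 groups. It is not zea's GroupProxy.

```python
class AttrProxy:
    """Hypothetical stand-in for a group proxy: attribute access over a mapping."""

    def __init__(self, group):
        self._group = group  # nothing is read or copied here

    def __getattr__(self, key):
        # Only called for names not found normally, i.e. dataset or group keys.
        try:
            item = self._group[key]
        except KeyError:
            raise AttributeError(key) from None
        # Nested groups are wrapped again; leaves are returned untouched so the
        # caller decides what to slice.
        return AttrProxy(item) if hasattr(item, "keys") else item


store = {"raw_data": [[1, 2], [3, 4]], "image": {"values": [10, 20]}}
data = AttrProxy(store)
print(data.raw_data[0])      # [1, 2]
print(data.image.values[1])  # 20
```

Because the wrapper only resolves a key when it is requested, nothing is loaded until the caller indexes the returned object.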
- property description¶
Reads the description from the data file and returns it.
- get_scan_parameters()[source]¶
Returns a dictionary of parameters to initialize a scan object that comes with the file (stored inside datafile).
If there are no scan parameters in the hdf5 file, returns an empty dictionary.
- Returns:
The scan parameters.
- Return type:
dict
- classmethod get_shape(path, key)[source]¶
Get the shape of a key in a file.
- Parameters:
path (str) – The path to the file.
key (str) – The key to get the shape of.
- Returns:
The shape of the key.
- Return type:
tuple
- get_track(label)[source]¶
Return the track with the given label.
- Parameters:
label (str) – The exact label string assigned to the desired track.
- Returns:
The matching Track object.
- Return type:
Track
- Raises:
KeyError – If no track with that label exists. The message lists the available labels, so the error is self-diagnosing.
Example:
with File("acquisition.hdf5") as f:
    focused = f.get_track("focused")
    raw = focused.data.raw_data[:]
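The "self-diagnosing" KeyError can be illustrated with a minimal lookup. The function find_by_label and the (label, payload) pairs are hypothetical stand-ins for Track objects; the exact wording of zea's error message is not known from this section.

```python
def find_by_label(tracks, label):
    """Return the payload of the first track whose label matches exactly."""
    for track_label, payload in tracks:
        if track_label == label:
            return payload
    # Listing what is available makes a typo obvious from the error alone.
    available = [track_label for track_label, _ in tracks]
    raise KeyError(f"No track labelled {label!r}. Available labels: {available}")


tracks = [("focused", "track_0"), ("planewave", "track_1")]
print(find_by_label(tracks, "planewave"))  # track_1
```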
- has_key(key)[source]¶
Check if the file has a specific key.
- Parameters:
key (str) – The key to check.
- Returns:
True if the key exists, False otherwise.
- Return type:
bool
- load_data(data_type, indices=None)[source]¶
Load data from the file.
Deprecated: use file.data.<key> with standard h5py slice indexing instead:
with File(path) as f:
    raw = f.data.raw_data[:]  # all frames
    raw = f.data.raw_data[0]  # first frame
    raw = f.data.raw_data[0, [0, 2]]  # frame 0, transmits 0 and 2
The indices parameter can be used to load a subset of the data. This can be
- 'all' or None to load all data
- an int to load a single frame
- a List[int] to load specific frames
- a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.
For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.
- Parameters:
data_type (str) – The type of data to load. Options are ‘raw_data’, ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.
indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None, in which case all data is loaded.
- Return type:
ndarray
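The accepted forms of indices can be summarised as a small normalisation step. This is a sketch under assumptions: normalize_indices is a hypothetical helper, and raising ValueError for lists on several axes is one possible way to enforce the documented restriction, not necessarily what zea does.

```python
def normalize_indices(indices):
    """Turn the documented `indices` forms into something usable as data[key]."""
    if indices is None or (isinstance(indices, str) and indices == "all"):
        return slice(None)  # everything
    if isinstance(indices, (int, list, slice)):
        return indices  # single frame, listed frames, or a frame range
    if isinstance(indices, tuple):
        n_lists = sum(isinstance(ix, list) for ix in indices)
        if n_lists > 1:
            raise ValueError(
                "lists on more than one axis are not supported; "
                "use a slice on one axis or slice after loading"
            )
        return indices
    raise TypeError(f"unsupported indices: {indices!r}")


print(normalize_indices("all"))               # slice(None, None, None)
print(normalize_indices((0, [0, 2])))         # (0, [0, 2])
print(normalize_indices((slice(0, 2), [1])))  # (slice(0, 2, None), [1])
```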
- load_parameters(**overrides)[source]¶
Load the acquisition parameters (merged probe + scan) from the file.
Reads both the scan and probe groups and merges them into a single Parameters object that owns derivation, caching, and lazy loading of derived quantities. The probe and scan groups live at the same level and have non-overlapping field names, so merging is a plain dict union.
- Parameters:
**overrides – Override any parameter from the file. Custom (non-spec) keys are stored as passthrough parameters.
- Returns:
The merged, derivable parameters object.
- Return type:
Parameters
- Raises:
AttributeError – When the file contains more than one track. Use tracks and call .load_parameters() on each track.
>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     parameters = f.load_parameters()
>>> type(parameters).__name__
'Parameters'
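The "plain dict union" mentioned above, together with overrides applied last, can be sketched as follows. The function merge_parameters and the field values are illustrative assumptions; the real method returns a Parameters object rather than a dict.

```python
def merge_parameters(probe, scan, **overrides):
    """Plain dict union of probe and scan fields, with overrides applied last."""
    clash = probe.keys() & scan.keys()
    if clash:
        # The field names are documented as non-overlapping; fail loudly if not.
        raise ValueError(f"overlapping fields: {sorted(clash)}")
    return {**probe, **scan, **overrides}


probe = {"name": "verasonics_l11_4v", "n_el": 128}  # illustrative values
scan = {"n_tx": 75, "center_frequency": 5e6}        # illustrative values
params = merge_parameters(probe, scan, n_tx=11, my_flag=True)
print(params["n_tx"], params["my_flag"])  # 11 True
```

Keys that appear in neither dict, such as my_flag here, simply pass through, mirroring the passthrough behaviour described for custom keys.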
- load_transmits(key, selected_transmits)[source]¶
Load raw_data or aligned_data for a given list of transmits.
- Parameters:
key (str) – The type of data to load. Options are ‘raw_data’ and ‘aligned_data’.
selected_transmits (list, np.ndarray) – The transmits to load.
- property metadata: MetadataSpec¶
Return a validated MetadataSpec object from the file.
- Returns:
The validated metadata spec.
- Return type:
MetadataSpec
- Raises:
KeyError – If the file has no metadata group.
Example
>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path, revision="v0.1.0") as f:
...     meta = f.metadata
...     print(meta.subject.type)
phantom
- property metrics: MetricsSpec¶
Return a validated MetricsSpec object from the file.
- Returns:
The validated metrics spec.
- Return type:
MetricsSpec
- Raises:
KeyError – If the file has no metrics group.
Example:
>>> with File("my_file.hdf5") as f:
...     met = f.metrics
...     print(met.coherence_factor.shape)
- property n_ax: int¶
Number of axial samples.
- property n_el: int¶
Number of elements.
- property n_frames: int¶
Number of frames.
- property n_tx: int¶
Number of transmit events.
- property name¶
Return the name of the file.
- property path¶
Return the path of the file.
- property probe: Probe¶
Returns a Probe object initialized with the parameters from the file.
- Returns:
The probe object.
- Return type:
Probe
Example
>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path) as f:
...     probe = f.probe
>>> probe.name
'verasonics_l11_4v'
- property probe_name¶
Reads the probe name from the data file and returns it.
- recursively_load_dict_contents_from_group(path)[source]¶
Load dict from contents of group.
Deprecated: use the module-level load_dict_from_hdf5_group() function instead, passing an h5py.Group directly.
- Parameters:
path (str) – Path to the group.
- Returns:
Dictionary with the contents of the group.
- Return type:
dict
- property scan: ScanSpec | None¶
Return the validated ScanSpec for this file.
This is the bare scan group as a spec object. For a full, derivable parameter object (merged probe + scan, with caching and derived properties) use load_parameters().
- Raises:
AttributeError – When the file contains more than one track. Use tracks and access .scan on each track instead.
>>> from zea import File
>>> path = (
...     "hf://zeahub/picmus/database/experiments/contrast_speckle/"
...     "contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"
... )
>>> with File(path, revision="v0.1.0", mode="r") as f:
...     scan = f.scan
>>> type(scan).__name__
'ScanSpec'
- property stem¶
Return the stem of the file.
- property track_labels: list[str | None]¶
Labels of all tracks in acquisition order.
Returns a list with one entry per track. Each entry is the label string stored on that track, or None for unlabelled tracks (e.g. single-track or legacy files). The list order matches tracks, so unpacking f.tracks in the same order as f.track_labels is always safe.
Example:
with File("acquisition.hdf5") as f:
    print(f.track_labels)  # ['focused', 'planewave']
    focused, planewave = f.tracks  # safe: same order
- property track_schedule: ndarray | None¶
Track index for each global transmit event, shape (n_total_tx,).
Returns an int32 array that maps every transmit event (in acquisition order) to the track it belongs to, or None if no track_schedule dataset was stored in this file.
Example:
with File("multi_track.hdf5") as f:
    sched = f.track_schedule  # e.g. array([0, 1, 0, 1, ...])
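The mapping that track_schedule encodes can be inverted to find which global transmit events belong to each track. The sketch below uses a plain list in place of the int32 array; transmits_per_track is a hypothetical helper, not part of zea.

```python
def transmits_per_track(track_schedule):
    """Map each track index to the global transmit indices assigned to it."""
    groups = {}
    for global_tx, track_index in enumerate(track_schedule):
        groups.setdefault(int(track_index), []).append(global_tx)
    return groups


schedule = [0, 1, 0, 1, 0, 1]  # two interleaved tracks
print(transmits_per_track(schedule))  # {0: [0, 2, 4], 1: [1, 3, 5]}
```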
- property tracks: list[Track]¶
Return a list of Track objects, one per track.
Each track exposes .data (a GroupProxy), .scan (a ScanSpec) and .load_parameters() (a Parameters factory method) for that specific track.
- Raises:
AttributeError – For legacy flat-format files that have no tracks/ group; use data and scan directly for those.
Example:
with File("multi_track.hdf5") as f:
    for track in f.tracks:
        raw = track.data.raw_data[:]
        parameters = track.load_parameters()
- property us_machine¶
Reads the ultrasound machine name from the data file and returns it.
- validate()[source]¶
Lightweight structural validation — no array data is loaded into RAM.
Checks that the file has a data group and that all keys within it are recognised zea data types. For legacy files (before zea v0.1.0) a minimal key-name check is performed. For files created with zea v0.1.0 and later (via File.create()) the keys are checked against the DataSpec schema.
Use validate_spec() for a full validation that loads all data and checks dtypes, shapes, and cross-field dimension consistency.
- Returns:
{"status": "success"} on success.
- Return type:
dict
- Raises:
AssertionError – If the file is missing required groups or contains unrecognised data keys.
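A key-name check of this kind can be sketched without touching any array data. The set of recognised names below is taken from the data_type options listed for load_data(); whether validate() uses exactly this set is an assumption, and check_data_keys is a hypothetical helper operating on a nested dict that stands in for the file.

```python
KNOWN_DATA_TYPES = {
    "raw_data", "aligned_data", "beamformed_data",
    "envelope_data", "image", "image_sc",
}


def check_data_keys(root):
    """Structural check on key names only; values are never touched."""
    assert "data" in root, "missing required group: data"
    unknown = sorted(set(root["data"]) - KNOWN_DATA_TYPES)
    assert not unknown, f"unrecognised data keys: {unknown}"
    return {"status": "success"}


print(check_data_keys({"data": {"raw_data": None, "image": None}}))
```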
- validate_spec()[source]¶
Full schema validation — loads all data into RAM.
Reads every dataset in the file and runs dtype, shape, and cross-dimension consistency checks as defined by FileSpec. Use this to confirm a file is fully spec-compliant before sharing or processing it.
For a fast structural check that loads no array data, use validate() instead.
Note
This method only works on files created with zea v0.1.0 and later. Files written before zea v0.1.0 should be re-saved through File.create().
- Returns:
The fully validated spec object, with all data accessible as typed attributes (e.g. spec.data.raw_data, spec.scan.n_tx).
- Return type:
FileSpec
- Raises:
TypeError, ValueError – If the file does not conform to the spec.
>>> with File("my_file.hdf5") as f:
...     spec = f.validate_spec()
...     print(spec.scan.n_tx)
- property zea_version: str | None¶
Return the zea version that wrote this file, or None for legacy files.
Files created with zea v0.1.0 and later store a zea_version root attribute. Files written before zea v0.1.0 return None.
- class zea.data.file.Track(index, group, timestamps=None, label=None, probe=None)[source]¶
Bases: object
A single acquisition track within a File.
Provides the same .data, .scan and .load_parameters() interface as File but scoped to one tracks/track_N group. Obtain instances through File.tracks rather than constructing this class directly.
Example:
with File("multi_track.hdf5") as f:
    for track in f.tracks:
        raw = track.data.raw_data[:]
        parameters = track.load_parameters()
- property data: _GroupProxy¶
Lazy proxy for this track’s data group.
- property label: str | None¶
Human-readable name for this track (e.g. 'focused' or 'planewave').
Returns None for single-track files or legacy files written without a label. Use File.track_labels to print all labels in acquisition order and File.get_track() to retrieve a track by name.
- load_parameters(**overrides)[source]¶
Load this track’s parameters (merged probe + scan) as Parameters.
Each track shares the same probe but has its own scan, so the returned object has the same form as the one returned by File.load_parameters() for a single-track file.
- Parameters:
**overrides – Override any parameter.
- Returns:
Initialised parameters object for this track.
- Return type:
Parameters
- property n_ax: int¶
Number of axial samples.
- property n_el: int¶
Number of elements.
- property n_frames: int¶
Number of frames.
- property n_tx: int¶
Number of transmit events.
- property scan: ScanSpec¶
Return the validated ScanSpec for this track.
This is the bare scan group as a spec object. For a full, derivable parameter object (merged probe + scan) use load_parameters().
- property timestamps: ndarray | None¶
Global transmit timestamps for this track, shape (n_frames, n_tx).
Timestamps are pre-computed when the Track is created via File.tracks. Returns None if the file has no track_schedule or any track is missing time_to_next_transmit.
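One way to picture how global timestamps could follow from track_schedule and each track's time_to_next_transmit is a running clock over the interleaved transmits. This is a sketch under explicit assumptions, not zea's implementation: it assumes every frame replays the same schedule, that frames follow each other without extra gaps, and that the clock starts at zero. The function track_timestamps and the nested-list layout are hypothetical.

```python
def track_timestamps(track_schedule, time_to_next_transmit):
    """Per-track start times, assuming every frame replays `track_schedule`.

    time_to_next_transmit[t][frame][k] is the interval after the k-th transmit
    of track t in that frame.
    """
    n_frames = len(time_to_next_transmit[0])
    stamps = [[[] for _ in range(n_frames)] for _ in time_to_next_transmit]
    clock = 0.0
    for frame in range(n_frames):
        local = [0] * len(time_to_next_transmit)  # next local transmit per track
        for track in track_schedule:
            k = local[track]
            stamps[track][frame].append(clock)  # this transmit starts now
            clock += time_to_next_transmit[track][frame][k]
            local[track] += 1
    return stamps


schedule = [0, 1, 0]                 # track 0, track 1, track 0
intervals = [[[1.0, 1.0]], [[2.0]]]  # one frame; track 0 has 2 transmits, track 1 has 1
stamps = track_timestamps(schedule, intervals)
print(stamps[0], stamps[1])  # [[0.0, 3.0]] [[1.0]]
```

Each per-track result has one row per frame and one entry per transmit of that track, matching the (n_frames, n_tx) shape stated above.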
- zea.data.file.load_dict_from_hdf5_group(group)[source]¶
Recursively load the contents of an HDF5 group into a plain dict.
Datasets are returned as numpy arrays or scalars; nested groups are converted recursively. String datasets are decoded to np.str_.
- Parameters:
group (Group) – An open h5py.Group (or h5py.File).
- Returns:
Nested dictionary mirroring the group structure.
- Return type:
dict
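The recursion can be sketched with duck typing, using nested dicts in place of HDF5 groups and byte strings in place of string datasets. load_nested is a hypothetical stand-in: with real h5py objects, a dataset would additionally need to be read (for example with item[()]) before being stored, and strings would become np.str_ rather than str.

```python
def load_nested(group):
    """Recursively copy a group-like mapping into a plain dict."""
    out = {}
    for key, item in group.items():
        if hasattr(item, "items"):
            out[key] = load_nested(item)     # nested group: recurse
        elif isinstance(item, bytes):
            out[key] = item.decode("utf-8")  # byte string -> text
        else:
            out[key] = item                  # numeric leaf, kept as is
    return out


group = {"scan": {"n_tx": 4, "label": b"focused"}, "description": b"demo"}
print(load_nested(group))
# {'scan': {'n_tx': 4, 'label': 'focused'}, 'description': 'demo'}
```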
- zea.data.file.load_file(path, data_type='raw_data', indices=None, scan_kwargs=None)[source]¶
Loads a zea data file (h5py file).
Returns the data together with a parameters object containing the parameters of the acquisition. Probe information is available via parameters.to_probe_dict() or File.probe.
Additionally, it can load a specific subset of frames / transmits.
The indices parameter can be used to load a subset of the data. This can be
- 'all' or None to load all data
- an int to load a single frame
- a List[int] to load specific frames
- a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.
For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.
- Parameters:
path (str, pathlike) – The path to the hdf5 file.
data_type (str, optional) – The type of data to load. Defaults to ‘raw_data’. Other options are ‘aligned_data’, ‘beamformed_data’, ‘envelope_data’, ‘image’ and ‘image_sc’.
indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None, in which case all frames are loaded.
scan_kwargs (dict) – Additional keyword arguments to pass to File.load_parameters(). These will override the parameters from the file if they are present. Defaults to None.
- Returns:
(ndarray): The raw data of shape (n_frames, n_tx, n_ax, n_el, n_ch).
(Parameters): A parameters object containing the parameters of the acquisition.
- Return type:
Tuple[ndarray, Parameters]
- zea.data.file.load_file_all_data_types(path, indices=None, scan_kwargs=None)[source]¶
Loads a zea data file (h5py file).
Returns all data types together with a parameters object containing the parameters of the acquisition. Probe information is available via parameters.to_probe_dict() or File.probe.
Additionally, it can load a specific subset of frames / transmits.
The indices parameter can be used to load a subset of the data. This can be
- 'all' or None to load all data
- an int to load a single frame
- a List[int] to load specific frames
- a Tuple[Union[list, slice, int], ...] to index multiple axes (i.e. frames and transmits). Note that indexing with lists of indices for multiple axes is not supported. In that case, try to define one of the axes with a slice for optimal performance. Alternatively, slice the data after loading.
For more information on the indexing options, see indexing on ndarrays and fancy indexing in h5py.
- Parameters:
path (str, pathlike) – The path to the hdf5 file.
indices (Union[Tuple[Union[list, slice, int], ...], List[int], int, None]) – The indices to load. Defaults to None, in which case all frames are loaded.
scan_kwargs (dict) – Additional keyword arguments to pass to File.load_parameters(). These will override the parameters from the file if they are present. Defaults to None.
- Returns:
(dict): A dictionary with all data types as keys and the corresponding data as values.
(Parameters): A parameters object containing the parameters of the acquisition.
- Return type:
Tuple[dict, Parameters]
- zea.data.file.validate_file(path=None, file=None)[source]¶
Validate the structure and data of a zea HDF5 file.
For files created with zea v0.1.0 and later this runs the full FileSpec schema validation (dtypes, shapes, and dimension consistency). Legacy files (before zea v0.1.0) are detected by the presence of the scalar dataset scan/n_frames; for those only a lightweight structural data group check is performed.
Provide either path or file, but not both.
- Parameters:
path – Path to the file to validate.
file – An already opened file to validate.
- Returns:
{"status": "success"} on success.
- Return type:
dict
- Raises:
AssertionError – If the file is missing the data group.
TypeError, ValueError – If spec validation fails on files created with zea v0.1.0 and later.
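The dispatch described above (exactly one of path or file, then legacy detection via scan/n_frames) can be sketched with nested dicts standing in for an open file. The helper names are hypothetical, and raising ValueError when both or neither argument is given is an assumption; this section does not state which exception zea uses.

```python
def exactly_one(path=None, file=None):
    """Return whichever of path/file was given; reject both or neither."""
    if (path is None) == (file is None):
        raise ValueError("provide either path or file, but not both")
    return path if file is None else file


def is_legacy(root):
    """Legacy files carry a scalar dataset at scan/n_frames."""
    return "n_frames" in root.get("scan", {})


def validation_mode(root):
    """Name the check that the description above assigns to this file."""
    return "lightweight data-group check" if is_legacy(root) else "full FileSpec validation"


legacy = {"data": {}, "scan": {"n_frames": 2}}
modern = {"data": {}, "scan": {"sampling_frequency": 40e6}}
print(validation_mode(legacy))  # lightweight data-group check
print(validation_mode(modern))  # full FileSpec validation
```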