Working with the zea data format

In this tutorial notebook we will show how to load a zea data file and how to access the data stored in it. There are three common ways to load a zea data file:

  1. Loading data from a single file with zea.File

  2. Loading data from a group of files with zea.Dataset

  3. Loading data in batches with zea.Dataloader


‼️ Important: This notebook is optimized for GPU/TPU. Code execution on a CPU may be very slow.

If you are running in Colab, please enable a hardware accelerator via:

Runtime → Change runtime type → Hardware accelerator → GPU/TPU 🚀.

[1]:
%%capture
%pip install zea
[2]:
config_picmus_iq = "hf://zeahub/configs/config_picmus_iq.yaml"
[3]:
import os

os.environ["KERAS_BACKEND"] = "jax"
os.environ["ZEA_DISABLE_CACHE"] = "1"
os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
[4]:
import keras
import matplotlib.pyplot as plt

import zea
from zea.visualize import set_mpl_style
zea: Using backend 'jax'

We will work with the GPU if available, using init_device to pick the best available device. Optionally, we also set the matplotlib style for plotting.

[5]:
zea.init_device(verbose=False)
set_mpl_style()

Loading a file with zea.File

The zea data format is built on HDF5, so the files can be read with the h5py package. Here we open a zea data file with zea.File and inspect its contents with the zea.File.summary() method. In the summary, every dataset element is listed with a corresponding description and unit. Note that we pass a URL to a file on the Hugging Face Hub here, but you can also use a local file path to a zea data file. We will use an example from the PICMUS dataset, converted to zea format and hosted on the Hugging Face Hub.

Tip: You can also use the HDFView tool to view the contents of the zea data file without having to run any code. Or if you use VS Code, you can install the HDF5 extension to view the contents of the file.
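The hf:// paths used in this notebook appear to follow the pattern hf://<organization>/<repository>/<path inside the repository>, judging from the cache locations printed in the outputs. The sketch below splits such a path into those parts. The helper split_hf_path is hypothetical and written only for illustration; it is not part of zea, and zea's own path resolution may work differently.

```python
def split_hf_path(path: str) -> tuple[str, str]:
    """Split 'hf://<org>/<repo>/<subpath>' into (repo_id, subpath).

    Hypothetical helper for illustration only, not a zea function.
    """
    prefix = "hf://"
    if not path.startswith(prefix):
        raise ValueError(f"Not an hf:// path: {path!r}")
    # Split into at most three pieces: organization, repository, remainder
    parts = path[len(prefix):].split("/", 2)
    if len(parts) < 2 or not all(parts[:2]):
        raise ValueError(f"Expected hf://<org>/<repo>/..., got {path!r}")
    subpath = parts[2] if len(parts) == 3 else ""
    return f"{parts[0]}/{parts[1]}", subpath


repo_id, subpath = split_hf_path("hf://zeahub/picmus/database/experiments")
print(repo_id)  # zeahub/picmus
print(subpath)  # database/experiments
```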

You can extract data and acquisition parameters (which are stored together with the data in the zea data file) as follows:

[6]:
file_path = "hf://zeahub/picmus/database/experiments/contrast_speckle/contrast_speckle_expe_dataset_iq/contrast_speckle_expe_dataset_iq.hdf5"

# we'll only load the first frame and the first 3 transmit events here
frame_idx = 0
transmit_idx = slice(0, 3)

with zea.File(file_path, mode="r", revision="v0.1.0") as file:
    file.summary()
    data = file.data.raw_data[frame_idx, transmit_idx]
    parameters = file.load_parameters()

print("Raw data shape:", data.shape)
print(parameters)
contrast_speckle_expe_dataset_iq.hdf5/
    ├── description: PICMUS (Plane-wave Imaging Challenge in Medical UltraSound) dataset converted to zea format. License: The datasets and code provided on PICMUS are free of use. The only request is to refer properly to PICMUS - The Plane Wave Imaging Challenge in Medical UltraSound and quote the proceeding paper.. Citation: H. Liebgott, A. Rodriguez-Molares, F. Cervenansky, J. D'hooge, O. Bernard. "Plane-Wave Imaging Challenge in Medical Ultrasound." 2016 IEEE International Ultrasonics Symposium (IUS), Tours, France, 2016, pp. 1-4. https://doi.org/10.1109/ULTSYM.2016.7728908
    ├── zea_version: 0.1.0
    ├── metadata/
    │   ├── credit/
    │   │   ├── /metadata/credit (shape=())
    │   │   │   ├── description: Credit or attribution for the dataset.
    │   │   │   ├── unit: -
    │   └── subject/
    │       └── type/
    │           ├── /metadata/subject/type (shape=())
    │           │   ├── description:
    │           │   ├── unit: -
    ├── metrics/
    ├── probe/
    │   ├── element_height/
    │   │   ├── /probe/element_height (shape=())
    │   │   │   ├── description: Height (elevation aperture) of a single transducer element.
    │   │   │   ├── unit: m
    │   ├── element_width/
    │   │   ├── /probe/element_width (shape=())
    │   │   │   ├── description: Width of a single transducer element.
    │   │   │   ├── unit: m
    │   ├── name/
    │   │   ├── /probe/name (shape=())
    │   │   │   ├── description: Probe model name/identifier.
    │   │   │   ├── unit: -
    │   ├── probe_bandwidth_percent/
    │   │   ├── /probe/probe_bandwidth_percent (shape=())
    │   │   │   ├── description: Fractional bandwidth as a percentage.
    │   │   │   ├── unit: %
    │   ├── probe_center_frequency/
    │   │   ├── /probe/probe_center_frequency (shape=())
    │   │   │   ├── description: Probe nominal centre frequency.
    │   │   │   ├── unit: Hz
    │   ├── probe_geometry/
    │   │   ├── /probe/probe_geometry (shape=(128, 3))
    │   │   │   ├── description: Element positions (x, y, z) per element, shape (n_el, 3).
    │   │   │   ├── unit: m
    │   └── type/
    │       ├── /probe/type (shape=())
    │       │   ├── description: Probe geometry type (linear, phased, curved, ...).
    │       │   ├── unit: -
    └── tracks/
        └── track_0/
            ├── data/
            │   └── raw_data/
            │       ├── /tracks/track_0/data/raw_data (shape=(1, 75, 832, 128, 2))
            │       │   ├── description: Raw channel data.
            │       │   ├── unit: -
            └── scan/
                ├── azimuth_angles/
                │   ├── /tracks/track_0/scan/azimuth_angles (shape=(75))
                │   │   ├── description: Azimuthal angles of transmit beams.
                │   │   ├── unit: rad
                ├── center_frequency/
                │   ├── /tracks/track_0/scan/center_frequency (shape=())
                │   │   ├── description: Center frequency of the transmit pulse.
                │   │   ├── unit: Hz
                ├── demodulation_frequency/
                │   ├── /tracks/track_0/scan/demodulation_frequency (shape=())
                │   │   ├── description: Demodulation frequency.
                │   │   ├── unit: Hz
                ├── focus_distances/
                │   ├── /tracks/track_0/scan/focus_distances (shape=(75))
                │   │   ├── description: Transmit focus distances.
                │   │   ├── unit: m
                ├── initial_times/
                │   ├── /tracks/track_0/scan/initial_times (shape=(75))
                │   │   ├── description: A/D converter start times per transmit.
                │   │   ├── unit: s
                ├── polar_angles/
                │   ├── /tracks/track_0/scan/polar_angles (shape=(75))
                │   │   ├── description: Polar angles of transmit beams.
                │   │   ├── unit: rad
                ├── sampling_frequency/
                │   ├── /tracks/track_0/scan/sampling_frequency (shape=())
                │   │   ├── description: Sampling frequency.
                │   │   ├── unit: Hz
                ├── sound_speed/
                │   ├── /tracks/track_0/scan/sound_speed (shape=())
                │   │   ├── description: Speed of sound.
                │   │   ├── unit: m/s
                ├── t0_delays/
                │   ├── /tracks/track_0/scan/t0_delays (shape=(75, 128))
                │   │   ├── description: Transmit delays per element.
                │   │   ├── unit: s
                ├── transmit_origins/
                │   ├── /tracks/track_0/scan/transmit_origins (shape=(75, 3))
                │   │   ├── description: Transmit beam origins (x, y, z).
                │   │   ├── unit: m
                └── tx_apodizations/
                    ├── /tracks/track_0/scan/tx_apodizations (shape=(75, 128))
                    │   ├── description: Transmit apodization per element.
                    │   ├── unit: -
Raw data shape: (3, 832, 128, 2)
Parameters(
    probe_center_frequency=5133300.0,
    probe_bandwidth_percent=67.0,
    probe_geometry=array(float32 (128, 3)),
    element_width=0.00027,
    element_height=0.005,
    sampling_frequency=5.208 MHz,
    center_frequency=5.208 MHz,
    demodulation_frequency=5.208 MHz,
    initial_times=array(float32 (75,)),
    t0_delays=array(float32 (75, 128)),
    tx_apodizations=array(float32 (75, 128)),
    focus_distances=array(float32 (75,)),
    transmit_origins=array(float32 (75, 3)),
    polar_angles=array(float32 (75,)),
    azimuth_angles=array(float32 (75,)),
    sound_speed=1540.0,
    n_ax=832,
    n_el=128,
    n_tx=75,
    pixels_per_wavelength=4,
    pfield_kwargs={},
    apply_lens_correction=False,
    grid_type='cartesian',
    selected_transmits=[0, 1, 2, 3, ..., 73, 74] (len=75),
    attenuation_coef=0.0,
    f_number=1.0,
    distance_to_apex=0.0
)
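
The printed raw data shape (3, 832, 128, 2) follows from indexing the stored array of shape (1, 75, 832, 128, 2) with an integer frame index and a transmit slice. The sketch below reproduces this with a NumPy array of zeros standing in for the dataset, so it runs without downloading anything. The axis labels n_tx, n_ax and n_el match the printed parameters; the labels n_frames and n_ch, and the reading of the last axis as in-phase and quadrature components of the IQ data, are assumptions made for illustration.

```python
import numpy as np

# Stand-in for file.data.raw_data, using the shape printed by file.summary():
# (n_frames, n_tx, n_ax, n_el, n_ch) = (1, 75, 832, 128, 2)
raw_data = np.zeros((1, 75, 832, 128, 2), dtype=np.float32)

frame_idx = 0
transmit_idx = slice(0, 3)

# The integer index drops the frame axis; the slice keeps 3 of the 75 transmits.
data = raw_data[frame_idx, transmit_idx]
print(data.shape)  # (3, 832, 128, 2)

# Assumption: the last axis holds the I and Q components of each sample,
# which can be combined into a single complex-valued array.
iq = data[..., 0] + 1j * data[..., 1]
print(iq.shape)  # (3, 832, 128)
```

With the real file, the same indexing reads only the selected frame and transmits rather than the full array, since HDF5 datasets support partial reads.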

Loading data with zea.Dataset

We can also load and manage a group of files (i.e. a dataset) using the zea.Dataset class. Instead of a path to a single file, we can pass a list of file paths or a directory containing multiple zea data files. The zea.Dataset class will automatically load the files and allow you to access the data in a similar way as with zea.File.

[7]:
dataset_path = "hf://zeahub/picmus/database/experiments"

dataset = zea.Dataset(dataset_path, revision="v0.1.0")

print(dataset)

for file in dataset:
    print(file)

dataset.close()
zea: Searching /tmp/zea_cache_ozng4pb2/huggingface/datasets/datasets--zeahub--picmus/snapshots/397ea75f45921ab61bb1d995290d3853b7a2fb0b/database/experiments for ['.hdf5', '.h5'] files...
Dataset with 4 files
<File "contrast_speckle_expe_dataset_rf.hdf5" (mode r, 1 track)>
<File "contrast_speckle_expe_dataset_iq.hdf5" (mode r, 1 track)>
<File "resolution_distorsion_expe_dataset_rf.hdf5" (mode r, 1 track)>
<File "resolution_distorsion_expe_dataset_iq.hdf5" (mode r, 1 track)>
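
The log line above shows that the dataset directory is searched for files ending in .hdf5 or .h5, and the four files found live in nested subfolders. A minimal sketch of such a recursive search, using only the standard library, is given below. The function find_hdf5_files is hypothetical and is not zea's implementation; in particular, it sorts the results, whereas zea may return files in a different order.

```python
import tempfile
from pathlib import Path


def find_hdf5_files(root, extensions=(".hdf5", ".h5")):
    """Recursively collect files under root whose suffix is in extensions.

    Hypothetical helper for illustration only, not a zea function.
    """
    root = Path(root)
    return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix in extensions)


# Build a small throwaway directory tree to search
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a").mkdir()
    (root / "a" / "scan_iq.hdf5").touch()
    (root / "a" / "scan_rf.h5").touch()
    (root / "notes.txt").touch()  # ignored: not an HDF5 extension
    found = [p.relative_to(root).as_posix() for p in find_hdf5_files(root)]

print(found)  # ['a/scan_iq.hdf5', 'a/scan_rf.h5']
```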

Loading data with Dataloader

In machine learning and deep learning workflows, we often want additional features such as batching, shuffling, and parallel data loading. The zea.Dataloader class provides a convenient way to create a high-performance data loader from a zea dataset. It is built on Grain and does not require TensorFlow, and it is particularly useful for training models. Batching works best when all samples have a consistent shape, which is not the case for PICMUS. Therefore, in this example we will use a small part of the CAMUS dataset.

[8]:
dataset_path = "hf://zeahub/camus-sample/val"
dataloader = zea.Dataloader(
    dataset_path,
    key="data/image/values",
    revision="v0.1.0",
    batch_size=4,
    shuffle=True,
    clip_image_range=[-60, 0],
    image_range=[-60, 0],
    normalization_range=[0, 1],
    image_size=(256, 256),
    resize_type="resize",  # or "center_crop" or "random_crop"
    seed=4,
)

for batch in dataloader:
    print("Batch shape:", batch.shape)
    break  # Just show the first batch

fig, _ = zea.visualize.plot_image_grid(batch)
zea: Searching /tmp/zea_cache_ozng4pb2/huggingface/datasets/datasets--zeahub--camus-sample/snapshots/66df8da70a28be958531748b5d67530fe496943b/val for ['.hdf5', '.h5'] files...
zea: Dataset validated. Check /tmp/zea_cache_ozng4pb2/huggingface/datasets/datasets--zeahub--camus-sample/snapshots/66df8da70a28be958531748b5d67530fe496943b/val/validated.flag for details.
zea: Caching is globally disabled for _find_h5_file_shapes.
Batch shape: (4, 256, 256, 1)
../../_images/notebooks_data_zea_data_example_15_3.png
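
To make the image_range, normalization_range, batch_size, shuffle and seed arguments more concrete, here is a rough sketch of what such options typically do: values are clipped to the image range, mapped linearly onto the normalization range, and samples are visited in a seeded random order in groups of batch_size. The functions normalize and batches are hypothetical and operate on plain Python values; they only illustrate the idea and are not how zea.Dataloader is implemented.

```python
import random


def normalize(value, image_range=(-60.0, 0.0), normalization_range=(0.0, 1.0)):
    """Clip value to image_range, then map it linearly to normalization_range."""
    lo, hi = image_range
    new_lo, new_hi = normalization_range
    clipped = min(max(value, lo), hi)
    return (clipped - lo) / (hi - lo) * (new_hi - new_lo) + new_lo


def batches(items, batch_size, shuffle=True, seed=4):
    """Yield lists of up to batch_size items, optionally in a seeded random order."""
    order = list(range(len(items)))
    if shuffle:
        random.Random(seed).shuffle(order)
    for start in range(0, len(order), batch_size):
        yield [items[i] for i in order[start:start + batch_size]]


print(normalize(-60.0))  # 0.0
print(normalize(-30.0))  # 0.5
print(normalize(5.0))    # 1.0 (clipped to the upper bound first)

# Ten samples in batches of four: the last batch holds the remaining two
sizes = [len(batch) for batch in batches(list(range(10)), batch_size=4)]
print(sizes)  # [4, 4, 2]
```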

Processing an example

We will now use one of the zea data files to demonstrate how to process it. A full example can be found in the zea_pipeline_example notebook; here we show only a simple example for completeness. We start by loading a config file, which contains all the information required to initialize a processing pipeline.

[9]:
config = zea.Config.from_path(config_picmus_iq)
zea: WARNING Config key 'scan' is deprecated; use 'parameters' instead. Aliasing 'scan' -> 'parameters'.

Now we can load the zea data file, extract data and parameters, and then process the data using the pipeline defined by the config file.

[10]:
with zea.File(config.data.dataset_folder + "/" + config.data.file_path, mode="r") as file:
    # we use config here to overwrite some of the scan parameters
    parameters = file.load_parameters()
    parameters.update(**config.parameters)
    data = file[file.format_key(config.data.dtype)][:]

pipeline = zea.Pipeline.from_config(config)
inputs = pipeline.prepare_parameters(parameters)

images = pipeline(data=data, **inputs)["data"]
images = keras.ops.convert_to_numpy(images)
zea: WARNING This ``zea.File`` '/tmp/zea_cache_ozng4pb2/huggingface/datasets/datasets--zeahub--picmus/snapshots/07fe825b53c92b1d423fadb1dfa104ed2a38aa4a/database/simulation/contrast_speckle/contrast_speckle_simu_dataset_iq/contrast_speckle_simu_dataset_iq.hdf5' was created with a legacy version of zea (<0.1.0), while you are using zea v0.1.0. It may behave in unexpected ways. Install an earlier version of zea<0.1.0 for full compatibility or re-save the file with zea v0.1.0 or later (e.g. via File.create).

Finally we can plot the result.

[11]:
image = zea.display.to_8bit(images[0], dynamic_range=(-50, 0))
plt.figure()
# Convert xlims and zlims from meters to millimeters for display
xlims_mm = [v * 1e3 for v in parameters.xlims]
zlims_mm = [v * 1e3 for v in parameters.zlims]
plt.imshow(image, cmap="gray", extent=[xlims_mm[0], xlims_mm[1], zlims_mm[1], zlims_mm[0]])
plt.xlabel("X (mm)")
plt.ylabel("Z (mm)")
[11]:
Text(0, 0.5, 'Z (mm)')
../../_images/notebooks_data_zea_data_example_21_1.png