{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Working with the ``zea`` data format\n", "In this tutorial notebook we show how to load a zea data file and how to access the data stored in it. There are three common ways to load zea data:\n", "\n", "1. Loading data from a single file with `zea.File`\n", "2. Loading data from a group of files with `zea.Dataset`\n", "3. Loading data in batches with `zea.Dataloader`" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/tue-bmd/zea/blob/main/docs/source/notebooks/data/zea_data_example.ipynb)\n", " \n", "[![View on GitHub](https://img.shields.io/badge/GitHub-View%20Source-blue?logo=github)](https://github.com/tue-bmd/zea/blob/main/docs/source/notebooks/data/zea_data_example.ipynb)\n", " \n", "[![Hugging Face dataset](https://img.shields.io/badge/Hugging%20Face-Dataset-yellow?logo=huggingface)](https://huggingface.co/datasets/zeahub/picmus)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "‼️ **Important:** This notebook is optimized for **GPU/TPU**. Code execution on a **CPU** may be very slow.\n", "\n", "If you are running in Colab, please enable a hardware accelerator via:\n", "\n", "**Runtime → Change runtime type → Hardware accelerator → GPU/TPU** 🚀." 
] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%%capture\n", "%pip install zea" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "tags": [ "parameters" ] }, "outputs": [], "source": [ "config_picmus_iq = \"hf://zeahub/configs/config_picmus_iq.yaml\"" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "tags": [] }, "outputs": [], "source": [ "import os\n", "\n", "os.environ[\"KERAS_BACKEND\"] = \"jax\"\n", "os.environ[\"ZEA_DISABLE_CACHE\"] = \"1\"\n", "os.environ[\"TF_CPP_MIN_LOG_LEVEL\"] = \"3\"" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\u001b[38;5;36mzea\u001b[0m\u001b[0m: Using backend 'jax'\n" ] } ], "source": [ "import keras\n", "import matplotlib.pyplot as plt\n", "\n", "import zea\n", "from zea.visualize import set_mpl_style" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We call `init_device` to pick the best available device, which is a GPU if one is available. Optionally, we also set the matplotlib style for plotting." ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "zea.init_device(verbose=False)\n", "set_mpl_style()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading a file with `zea.File`\n", "The zea data format works with HDF5 files. We can open a zea data file using the `h5py` package and inspect its contents using the `zea.File.summary()` function. You can see that every dataset element contains a corresponding description and unit. Note that here we pass a URL to a Hugging Face dataset, but you can also use a local file path to a zea data file. 
Here we will use an example from the [PICMUS](https://www.creatis.insa-lyon.fr/Challenge/IEEE_IUS_2016/home) dataset, converted to zea format and hosted on the [Hugging Face Hub](https://huggingface.co/datasets/zeahub/picmus).\n", "\n", "> *Tip:*\n", "> You can also use the [HDFView](https://www.hdfgroup.org/downloads/hdfview/) tool to view the contents of the zea data file without having to run any code. If you use VS Code, you can instead install the [HDF5 extension](https://marketplace.visualstudio.com/items?itemName=h5web.vscode-h5web) to view the contents of the file." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "You can extract data and acquisition parameters (which are stored together with the data in the zea data file) as follows:" ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8680274cafa148caaa2937efba1db772", "version_major": 2, "version_minor": 0 }, "text/plain": [ "database/experiments/contrast_speckle/co(…): 0%| | 0.00/64.2M [00:00\n", "\n", "\n", "\n" ] } ], "source": [ "dataset_path = \"hf://zeahub/picmus/database/experiments\"\n", "\n", "dataset = zea.Dataset(dataset_path, revision=\"v0.1.0\")\n", "\n", "print(dataset)\n", "\n", "for file in dataset:\n", "    print(file)\n", "\n", "dataset.close()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Loading data with `zea.Dataloader`\n", "\n", "In machine learning and deep learning workflows, we often need additional features such as batching, shuffling, and parallel data loading. The `zea.Dataloader` class provides a convenient way to create a high-performance data loader from a zea dataset. It is built on Grain and does not require TensorFlow. This dataloader is particularly useful for training models. It works best when all samples share a consistent shape, which is not the case for [PICMUS](https://www.creatis.insa-lyon.fr/Challenge/IEEE_IUS_2016/home). 
Therefore, in this example we will use a small part of the [CAMUS](https://www.creatis.insa-lyon.fr/Challenge/camus/) dataset." ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "ff160e5b11f2444bb2010b3629c90109", "version_major": 2, "version_minor": 0 }, "text/plain": [ "val/patient0401/patient0401_2CH_half_seq(…): 0%| | 0.00/24.9M [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "dataset_path = \"hf://zeahub/camus-sample/val\"\n", "dataloader = zea.Dataloader(\n", "    dataset_path,\n", "    key=\"data/image/values\",\n", "    revision=\"v0.1.0\",\n", "    batch_size=4,\n", "    shuffle=True,\n", "    clip_image_range=[-60, 0],\n", "    image_range=[-60, 0],\n", "    normalization_range=[0, 1],\n", "    image_size=(256, 256),\n", "    resize_type=\"resize\",  # or \"center_crop\" or \"random_crop\"\n", "    seed=4,\n", ")\n", "\n", "for batch in dataloader:\n", "    print(\"Batch shape:\", batch.shape)\n", "    break  # Just show the first batch\n", "\n", "fig, _ = zea.visualize.plot_image_grid(batch)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Processing an example\n", "We will now use one of the zea data files to demonstrate how to process it. A full example can be found in the [zea_pipeline_example](../pipeline/zea_pipeline_example.ipynb) notebook. Here we will just show a simple example for completeness. We start by loading a config file that contains all the information required to set up a processing pipeline." 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "feb66c4487da4910b8439b56ad553ae6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "config_picmus_iq.yaml: 0.00B [00:00, ?B/s]" ] }, "metadata": {}, "output_type": "display_data" }, { "name": "stdout", "output_type": "stream", "text": [ "\u001b[1m\u001b[38;5;36mzea\u001b[0m\u001b[0m: \u001b[38;5;214mWARNING\u001b[0m Config key 'scan' is deprecated; use 'parameters' instead. Aliasing 'scan' -> 'parameters'.\n" ] } ], "source": [ "config = zea.Config.from_path(config_picmus_iq)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Now we can load the zea data file, extract data and parameters, and then process the data using the pipeline defined by the config file." ] }, { "cell_type": "code", "execution_count": 10, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "8da823dcb7ff4d0598662faa0fd7e9e2", "version_major": 2, "version_minor": 0 }, "text/plain": [ "database/simulation/contrast_speckle/con(…): 0%| | 0.00/36.4M [00:00" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "image = zea.display.to_8bit(images[0], dynamic_range=(-50, 0))\n", "plt.figure()\n", "# Convert xlims and zlims from meters to millimeters for display\n", "xlims_mm = [v * 1e3 for v in parameters.xlims]\n", "zlims_mm = [v * 1e3 for v in parameters.zlims]\n", "plt.imshow(image, cmap=\"gray\", extent=[xlims_mm[0], xlims_mm[1], zlims_mm[1], zlims_mm[0]])\n", "plt.xlabel(\"X (mm)\")\n", "plt.ylabel(\"Z (mm)\")" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.11" } }, "nbformat": 4, 
"nbformat_minor": 2 }