zea.Dataset

class zea.Dataset(file_paths, validate=False, directory_splits=None, revision=None, lazy=False, _suggest_lazy=True, **kwargs)

Bases: H5FileHandleCache

Iterate over File(s) and Folder(s).

Initializes the Dataset.

Parameters:
  • file_paths (Union[List[str], str]) – Path or list of paths to HDF5 files, or to folders containing HDF5 files. Folders and files can be mixed in one list.

  • validate (bool) – Whether to validate the dataset. Defaults to False.

  • directory_splits (list | None) – Split values, one per entry in file_paths: a list of floats between 0 and 1 with the same length as file_paths. If None, all files in file_paths are used.

  • revision (str | None) – HuggingFace revision (branch, tag, or commit hash). Only used when file_paths contains hf:// paths. Defaults to None (uses HuggingFace Hub default, i.e. the main branch).

  • lazy (bool) – If True, hf:// files are not downloaded at init — each file is downloaded on first access. len(ds) returns the number of files (not total frames). Defaults to False.
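
A minimal usage sketch, assuming zea is installed and the given paths exist. The helper name describe_dataset and the example paths are hypothetical; the sketch relies only on the constructor arguments and the n_files and total_frames properties documented on this page.

```python
from typing import Dict, List, Union


def describe_dataset(file_paths: Union[List[str], str], **dataset_kwargs) -> Dict[str, int]:
    """Open a zea.Dataset and report the counts documented on this page."""
    # Imported inside the function so this sketch can be loaded without zea installed.
    import zea

    dataset = zea.Dataset(file_paths, **dataset_kwargs)
    return {"n_files": dataset.n_files, "total_frames": dataset.total_frames}


# Example calls (hypothetical paths, shown as comments so nothing runs on import):
# describe_dataset(["/data/scans", "/data/extra/scan_01.hdf5"], validate=True)
# describe_dataset("hf://some-org/some-dataset", revision="main", lazy=True)
```

For hf:// paths, revision pins a branch, tag, or commit hash, and lazy=True defers each download until the file is first accessed.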

__call__()

Call self as a function.

find_files(paths)

Find files and optionally validate folders and files.

Return type:

List[str]
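
To illustrate what resolving a mixed list of folders and files into a flat list of file paths can look like, here is a standard-library sketch. It is not zea's implementation: the helper collect_hdf5_files, the recursive folder search, and the accepted extensions (.hdf5, .h5) are all assumptions made for this example.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed extensions; zea may accept a different set.
HDF5_SUFFIXES = {".hdf5", ".h5"}


def collect_hdf5_files(paths: Iterable[str]) -> List[str]:
    """Expand folders into the HDF5 files they contain and keep explicit file paths."""
    found: List[str] = []
    for raw in paths:
        path = Path(raw)
        if path.is_dir():
            # Search the folder recursively; sort so the result order is deterministic.
            found.extend(
                str(p)
                for p in sorted(path.rglob("*"))
                if p.is_file() and p.suffix.lower() in HDF5_SUFFIXES
            )
        elif path.is_file() and path.suffix.lower() in HDF5_SUFFIXES:
            found.append(str(path))
        else:
            # Fail early on missing paths or files with an unexpected extension.
            raise FileNotFoundError(f"Not an HDF5 file or folder: {raw}")
    return found
```

Raising on an unrecognized path is one possible design choice; a real implementation might instead skip such entries or report them during validation.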

classmethod from_config(path, user=None, **kwargs)

Creates a Dataset from a config file.

load_file_shapes(key)

Load the shapes of the datasets in each file.

property n_files

Return number of files in dataset.

property total_frames

Return total number of frames in dataset.
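
The difference between the file count and the frame count can be sketched in plain Python. This assumes one array shape per file with frames along the first axis; the helper count_files_and_frames and the example shapes are hypothetical and do not come from zea.

```python
from typing import Sequence, Tuple


def count_files_and_frames(shapes: Sequence[Tuple[int, ...]]) -> Tuple[int, int]:
    """Return (number of files, total frames) given one shape per file.

    Assumes axis 0 of each shape is the frame axis.
    """
    n_files = len(shapes)
    total_frames = sum(shape[0] for shape in shapes)
    return n_files, total_frames


# Hypothetical shapes for three files: (frames, height, width).
example_shapes = [(100, 256, 256), (40, 256, 256), (75, 256, 256)]
print(count_files_and_frames(example_shapes))  # (3, 215)
```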