zea.Dataset
- class zea.Dataset(file_paths, validate=False, directory_splits=None, revision=None, lazy=False, _suggest_lazy=True, **kwargs)
Bases: H5FileHandleCache
Iterate over file(s) and folder(s).
Initializes the Dataset.
- Parameters:
file_paths (Union[List[str], str]) – Path or list of paths to folders containing HDF5 files, or to individual HDF5 files. A list may mix folders and files.
validate (bool) – Whether to validate the dataset. Defaults to False.
directory_splits (list | None) – Split values for the given paths: a list of floats between 0 and 1 with the same length as file_paths. If None, all files in file_paths are used.
revision (str | None) – HuggingFace revision (branch, tag, or commit hash). Only used when file_paths contains hf:// paths. Defaults to None, which uses the HuggingFace Hub default (the main branch).
lazy (bool) – If True, hf:// files are not downloaded at init; each file is downloaded on first access. len(ds) returns the number of files (not total frames). Defaults to False.
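The constraints on directory_splits can be checked before the constructor is called. The following is a minimal sketch, not part of zea: check_directory_splits and open_dataset are hypothetical helper names, and zea may perform its own, different checks internally. Only the two constraints stated above (values between 0 and 1, one value per entry in file_paths) are enforced.

```python
from typing import Optional, Sequence, Union


def check_directory_splits(
    file_paths: Union[Sequence[str], str],
    directory_splits: Optional[Sequence[float]],
) -> None:
    """Raise ValueError if directory_splits breaks the documented constraints.

    Hypothetical helper for illustration; not a zea function.
    """
    if directory_splits is None:
        return  # None means every file in file_paths is used
    # A single string counts as one path.
    paths = [file_paths] if isinstance(file_paths, str) else list(file_paths)
    if len(directory_splits) != len(paths):
        raise ValueError(
            f"expected {len(paths)} split value(s), got {len(directory_splits)}"
        )
    for value in directory_splits:
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"split value {value!r} is outside [0, 1]")


def open_dataset(file_paths, directory_splits=None, **options):
    """Check the arguments, then construct zea.Dataset (requires zea)."""
    check_directory_splits(file_paths, directory_splits)
    import zea  # imported here so the check above works without zea installed

    return zea.Dataset(file_paths, directory_splits=directory_splits, **options)
```

With zea installed, a call might look like open_dataset(["data/train", "data/extra.hdf5"], directory_splits=[0.8, 1.0], validate=True), where both paths are placeholders. Keeping the check separate from construction lets it fail fast before any file is opened or downloaded.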
- find_files(paths)
Find files and optionally validate folders and files.
- Return type:
List[str]
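To illustrate what resolving a mixed list of folders and files into a flat List[str] can look like, here is a standalone sketch that uses only the standard library. It is not zea's implementation: resolve_hdf5_paths is a hypothetical name, the accepted suffixes (.h5, .hdf5) and the recursive folder search are assumptions, and hf:// paths and validation are not handled.

```python
from pathlib import Path
from typing import Iterable, List, Union

# Assumption: HDF5 files are recognised by these suffixes.
HDF5_SUFFIXES = {".h5", ".hdf5"}


def resolve_hdf5_paths(paths: Union[str, Path, Iterable[str]]) -> List[str]:
    """Expand folders and keep files, returning a flat list of file paths.

    Hypothetical helper for illustration; not a zea function.
    """
    if isinstance(paths, (str, Path)):
        paths = [paths]  # accept a single path as well as a list
    found: List[str] = []
    for entry in paths:
        path = Path(entry)
        if path.is_dir():
            # Search the folder recursively; sort for a stable order.
            found.extend(
                str(candidate)
                for candidate in sorted(path.rglob("*"))
                if candidate.is_file()
                and candidate.suffix.lower() in HDF5_SUFFIXES
            )
        elif path.is_file():
            # Explicitly listed files are kept as given.
            found.append(str(path))
        else:
            raise FileNotFoundError(f"no such file or folder: {path}")
    return found
```

Explicitly listed files are kept regardless of suffix in this sketch, on the assumption that the caller knows what they are passing; only folder contents are filtered.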
- property n_files
Return number of files in dataset.
- property total_frames
Return total number of frames in dataset.
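The two properties can be combined into a quick size report. The sketch below is an illustration only: summarize is a hypothetical helper, and FakeDataset is a stand-in that exposes just n_files and total_frames so the example runs without zea or any data on disk. A real zea.Dataset instance could be passed to summarize in its place.

```python
def summarize(ds) -> str:
    """Describe any object that exposes n_files and total_frames."""
    n_files = ds.n_files
    total = ds.total_frames
    # Guard against an empty dataset to avoid dividing by zero.
    mean = total / n_files if n_files else 0.0
    return (
        f"{n_files} file(s), {total} frame(s), "
        f"{mean:.1f} frames per file on average"
    )


class FakeDataset:
    """Stand-in with the two documented properties, for demonstration only."""

    def __init__(self, frames_per_file):
        self._frames = list(frames_per_file)

    @property
    def n_files(self) -> int:
        return len(self._frames)

    @property
    def total_frames(self) -> int:
        return sum(self._frames)


print(summarize(FakeDataset([100, 50, 30])))
# prints: 3 file(s), 180 frame(s), 60.0 frames per file on average
```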