HDF5 is a general-purpose library and file format for storing
scientific data.
HDF5 can store two primary objects: datasets and
groups. A dataset is essentially a multidimensional array of data
elements, and a group is a structure for organizing objects in an HDF5
file. Using these two basic objects, one can create and store almost
any kind of scientific data structure, such as images, arrays of vectors,
and structured and unstructured grids. You can also mix and match them
in HDF5 files according to your needs.
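The two-object model can be pictured with a toy in-memory sketch in Python. This is not the HDF5 API. The `Group` and `Dataset` classes and their `add_group`, `add_dataset`, and `lookup` methods are hypothetical stand-ins. They only mirror the ideas that a group holds named members (other groups or datasets), that a dataset is a shaped array of elements of one type, and that objects are reached by slash-separated paths such as `/grid/temperature`, much as objects are addressed inside an HDF5 file.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple, Union


@dataclass
class Dataset:
    """Hypothetical stand-in for an HDF5 dataset: a shape, an element type, flat values."""
    shape: Tuple[int, ...]
    dtype: str
    values: list

    def __post_init__(self) -> None:
        # The number of stored elements must equal the product of the dimensions.
        expected = 1
        for dim in self.shape:
            expected *= dim
        if len(self.values) != expected:
            raise ValueError(f"expected {expected} elements, got {len(self.values)}")


@dataclass
class Group:
    """Hypothetical stand-in for an HDF5 group: a mapping from names to member objects."""
    members: Dict[str, Union["Group", Dataset]] = field(default_factory=dict)

    def add_group(self, name: str) -> "Group":
        child = Group()
        self.members[name] = child
        return child

    def add_dataset(self, name: str, dataset: Dataset) -> Dataset:
        self.members[name] = dataset
        return dataset

    def lookup(self, path: str) -> Union["Group", Dataset]:
        # Resolve a slash-separated path such as "/grid/temperature" from this group.
        node = self
        for part in path.strip("/").split("/"):
            if not part:
                continue  # "/" on its own refers to this group
            if not isinstance(node, Group):
                raise KeyError(f"{part!r} lies below a dataset, not a group")
            node = node.members[part]
        return node


# Build a small hierarchy: a root group holding a "grid" group and an "image" dataset.
root = Group()
grid = root.add_group("grid")
grid.add_dataset("temperature", Dataset((2, 3), "float64", [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
root.add_dataset("image", Dataset((2, 2), "uint8", [0, 255, 255, 0]))

print(root.lookup("/grid/temperature").shape)  # prints (2, 3)
```

Nesting groups inside groups is what lets images, vector arrays, and grids sit side by side in one file while each stays individually addressable.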
Efficient storage and I/O.
HDF5 was created to address the data
management needs of scientists and engineers working in high-performance,
data-intensive computing environments. As a result, the HDF5 library and
format emphasize storage and I/O efficiency. For instance, the HDF5 format
can store data in a variety of layouts, such as chunked or compressed,
and the library is tuned to read and write data efficiently on
parallel computing systems.
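Why chunking and compression go together can be shown with a small stdlib-only Python sketch. It assumes a simplified picture of a chunked layout: a 2-D array is split into fixed-size rectangular chunks, and each chunk is compressed on its own with `zlib`, which implements DEFLATE, the same algorithm used by HDF5's gzip filter. In HDF5, compression filters are applied per chunk, which is why compressed datasets must use chunked storage. The helpers `split_into_chunks` and `chunks_touched` are hypothetical. The sketch omits the chunk index a real file keeps, and it simply makes edge chunks smaller.

```python
import math
import struct
import zlib
from typing import Dict, List, Tuple

Chunk = Tuple[int, int]  # (chunk row index, chunk column index)


def split_into_chunks(data: List[List[float]], chunk_shape: Tuple[int, int]) -> Dict[Chunk, bytes]:
    """Split a 2-D array into chunks and DEFLATE-compress each chunk independently."""
    rows, cols = len(data), len(data[0])
    crows, ccols = chunk_shape
    chunks: Dict[Chunk, bytes] = {}
    for ci in range(math.ceil(rows / crows)):
        for cj in range(math.ceil(cols / ccols)):
            # Row-major elements of this chunk; edge chunks are smaller in this sketch.
            block = [
                data[r][c]
                for r in range(ci * crows, min((ci + 1) * crows, rows))
                for c in range(cj * ccols, min((cj + 1) * ccols, cols))
            ]
            raw = struct.pack(f"<{len(block)}d", *block)  # little-endian float64
            chunks[(ci, cj)] = zlib.compress(raw)
    return chunks


def chunks_touched(row_range: Tuple[int, int], col_range: Tuple[int, int],
                   chunk_shape: Tuple[int, int]) -> List[Chunk]:
    """Chunks overlapping the half-open selection [r0, r1) x [c0, c1)."""
    (r0, r1), (c0, c1) = row_range, col_range
    crows, ccols = chunk_shape
    return [
        (ci, cj)
        for ci in range(r0 // crows, (r1 - 1) // crows + 1)
        for cj in range(c0 // ccols, (c1 - 1) // ccols + 1)
    ]


# A 100 x 100 array in 25 x 40 chunks: ceil(100/25) * ceil(100/40) = 4 * 3 = 12 chunks.
data = [[float(r % 7) for _ in range(100)] for r in range(100)]
chunks = split_into_chunks(data, (25, 40))
print(len(chunks))  # prints 12

# Reading rows 10..19 and columns 30..49 needs only two of the twelve chunks.
print(chunks_touched((10, 20), (30, 50), (25, 40)))  # prints [(0, 0), (0, 1)]
```

Because each chunk is compressed independently, a partial read decompresses only the chunks it overlaps rather than the whole dataset; the chunk shape therefore trades per-chunk overhead against how much unneeded data a typical read must decompress.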
Software.
NCSA maintains a suite of free, open-source software,
including the HDF5 I/O library and several utilities. The HDF5 user
community also develops and contributes software, much of it freely
available. Unlike HDF4, HDF5 has little commercial support at
this time, but we are working successfully with vendors to change this.
Emphasis on standards.
Data can be stored in HDF5 in an endless variety of
ways, so it is important for communities of users to standardize on how their
data is organized in HDF5. Standard organization makes it easy to share data
and to build and share tools for accessing and analyzing data
stored in HDF5. The NCSA HDF team works with users to encourage them to
organize HDF5 files in standard ways.
Large and varied user community.
HDF5 users span a variety of
engineering and scientific fields, and even some non-technical fields.
Data stored in HDF5 is used in a wide range of applications, from
computational fluid dynamics to filmmaking.
Further Information:
Documents prepared for the 2002 R&D 100 Award that (still) contain useful
information:
How is HDF5 different from HDF4 and earlier versions of HDF?
HDF5 was designed to address some of the limitations of the HDF 4.x library and to meet current and anticipated requirements of modern systems and applications. Some of the limitations of HDF (4) are:
- A single file cannot store more than 20,000 complex objects, and a
single file cannot be larger than 2 gigabytes.
- The data models are less consistent than they should be. There are more
object types than necessary, and datatypes are too restricted.
- The library source is old and overly complex, does not support parallel I/O effectively, and is difficult to use in threaded applications.
HDF5 includes the following improvements:
- A new file format designed to address some of the deficiencies of HDF
4.x, particularly the need to store larger files and more objects per file.
- A simpler, more comprehensive data model that includes only two basic
structures: a multidimensional array of record structures, and a grouping
structure.
- A simpler, better-engineered library and API, with improved support for parallel I/O, threads, and other requirements imposed by modern systems and applications.
HDF5 is not compatible with HDF (4), but conversion software is available for converting HDF4 data to HDF5, and vice versa.
Last modified: August 24, 2007
