HDF5 Go Library

Pure Go implementation of the HDF5 file format - No CGo required

[Badges: Go Version | Go Report Card | CI | Coverage | License | Status | GoDoc]

A modern, pure Go library for reading and writing HDF5 files without CGo dependencies. Read support is feature-complete; write support is advancing rapidly. v0.11.6-beta adds dataset resizing, variable-length datatypes, and hyperslab selection.


✨ Features

  • ✅ Pure Go - No CGo, no C dependencies, cross-platform
  • ✅ Modern Design - Built with Go 1.25+ best practices
  • ✅ HDF5 Compatibility - Read: v0, v2, v3 superblocks | Write: v0, v2 superblocks
  • ✅ Full Dataset Reading - Compact, contiguous, chunked layouts with GZIP
  • ✅ Rich Datatypes - Integers, floats, strings (fixed/variable), compounds
  • ✅ Memory Efficient - Buffer pooling and smart memory management
  • ✅ Production Ready - Read support feature-complete
  • ✍️ Comprehensive Write Support - Datasets, groups, attributes + Smart Rebalancing!

🚀 Quick Start

Installation

go get github.com/scigolib/hdf5

Basic Usage

package main

import (
    "fmt"
    "log"
    "github.com/scigolib/hdf5"
)

func main() {
    // Open HDF5 file
    file, err := hdf5.Open("data.h5")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // Walk through file structure
    file.Walk(func(path string, obj hdf5.Object) {
        switch v := obj.(type) {
        case *hdf5.Group:
            fmt.Printf("πŸ“ %s (%d children)\n", path, len(v.Children()))
        case *hdf5.Dataset:
            fmt.Printf("πŸ“Š %s\n", path)
        }
    })
}

Output:

πŸ“ / (2 children)
πŸ“Š /temperature
πŸ“ /experiments/ (3 children)

More examples →
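
Beyond traversal, datasets can be read into Go slices. The sketch below is illustrative only: Open, Close, Walk, and the Object/Group/Dataset types are the APIs shown in this README, while the Read call and its signature are assumptions, so check the GoDoc for the actual read API.

package main

import (
    "fmt"
    "log"

    "github.com/scigolib/hdf5"
)

func main() {
    file, err := hdf5.Open("data.h5")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    // Locate a dataset via Walk (as above), then read its values.
    // NOTE: ds.Read(&values) is an assumed signature used only to
    // illustrate the flow; the real read API may differ.
    file.Walk(func(path string, obj hdf5.Object) {
        ds, ok := obj.(*hdf5.Dataset)
        if !ok || path != "/temperature" {
            return
        }
        var values []float64
        if err := ds.Read(&values); err != nil { // assumed method
            log.Printf("read %s: %v", path, err)
            return
        }
        fmt.Printf("%s: %d values read\n", path, len(values))
    })
}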


📚 Documentation

Getting Started

Reference

Advanced


⚡ Performance Tuning

NEW in v0.11.6-beta: Dataset resizing, variable-length datatypes (strings, ragged arrays), and efficient hyperslab selection (data slicing)!

When many attributes are deleted, B-tree nodes can become sparse (wasted disk space, slower searches). This library offers four rebalancing strategies:

1. Default (No Rebalancing)

Fast deletions, but the B-tree may become sparse

// No options = no rebalancing (like HDF5 C library)
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)

Use for: Append-only workloads, small files (<100MB)


2. Lazy Rebalancing (10-100x faster than immediate rebalancing)

Batch processing: rebalances once a threshold is reached

fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
    hdf5.WithLazyRebalancing(
        hdf5.LazyThreshold(0.05),         // Trigger at 5% underflow
        hdf5.LazyMaxDelay(5*time.Minute), // Force rebalance after 5 min
    ),
)

Use for: Batch deletion workloads, medium/large files (100-500MB)

Performance: ~2% overhead, occasional 100-500ms pauses


3. Incremental Rebalancing (ZERO pause)

Background processing: rebalances in a background goroutine

fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
    hdf5.WithLazyRebalancing(),  // Prerequisite!
    hdf5.WithIncrementalRebalancing(
        hdf5.IncrementalBudget(100*time.Millisecond),
        hdf5.IncrementalInterval(5*time.Second),
    ),
)
defer fw.Close()  // Stops background goroutine

Use for: Large files (>500MB), continuous operations, TB-scale data

Performance: ~4% overhead, zero user-visible pause


4. Smart Rebalancing (Auto-Pilot)

Auto-tuning: the library detects the workload and selects the optimal mode

fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate,
    hdf5.WithSmartRebalancing(
        hdf5.SmartAutoDetect(true),
        hdf5.SmartAutoSwitch(true),
    ),
)

Use for: Unknown workloads, mixed operations, research environments

Performance: ~6% overhead, adapts automatically


Performance Comparison

| Mode | Deletion Speed | Pause Time | Use Case |
|------|----------------|------------|----------|
| Default | 100% (baseline) | None | Append-only, small files |
| Lazy | 95% (10-100x faster than immediate) | 100-500ms batches | Batch deletions |
| Incremental | 92% | None (background) | Large files, continuous ops |
| Smart | 88% | Varies | Unknown workloads |

Learn more:


🎯 Current Status

Version: v0.11.6-beta (released 2025-11-06 - Dataset Resize + VLen + Hyperslab) ✅

Production Readiness: Read support is feature-complete; write support is advancing rapidly! 🎉

✅ Fully Implemented

  • File Structure:

    • Superblock parsing (v0, v2, v3)
    • Object headers v1 (legacy HDF5 < 1.8) with continuations
    • Object headers v2 (modern HDF5 >= 1.8) with continuations
    • Groups (traditional symbol tables + modern object headers)
    • B-trees (leaf + non-leaf nodes for large files)
    • Local heaps (string storage)
    • Global Heap (variable-length data)
    • Fractal heap (direct blocks for dense attributes) ✨ NEW
  • Dataset Reading:

    • Compact layout (data in object header)
    • Contiguous layout (sequential storage)
    • Chunked layout with B-tree indexing
    • GZIP/Deflate compression
    • Filter pipeline for compressed data ✨ NEW
  • Datatypes (Read + Write):

    • Basic types: int8-64, uint8-64, float32/64
    • Strings: Fixed-length (null/space/null-padded), variable-length (via Global Heap)
    • Advanced types: Arrays, Enums, References (object/region), Opaque
    • Compound types: Struct-like with nested members
  • Attributes:

    • Compact attributes (in object header) ✨ NEW
    • Dense attributes (fractal heap foundation) ✨ NEW
    • Attribute reading for groups and datasets ✨ NEW
    • Full attribute API (Group.Attributes(), Dataset.Attributes()) ✨ NEW (see the sketch after this list)
  • Navigation: Full file tree traversal via Walk()

  • Code Quality:

    • Test coverage: 89.7% in internal/ (target: >70%) ✅
    • Lint issues: 0 (34+ linters) ✅
    • TODO items: 0 (all resolved) ✅
    • 57 reference HDF5 test files ✅
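
As a rough illustration of the attribute API listed above: Group.Attributes() and Dataset.Attributes() are named in this README, but the return shape assumed below (a name-to-value map plus an error) is a guess. Treat this fragment (which reuses the imports from the Quick Start example) as a sketch rather than the actual signature.

// Sketch: enumerate the attributes of every dataset in a file.
// Attributes() exists per the feature list; the (map[string]any, error)
// return shape assumed here is NOT confirmed - check the GoDoc.
file, err := hdf5.Open("data.h5")
if err != nil {
    log.Fatal(err)
}
defer file.Close()

file.Walk(func(path string, obj hdf5.Object) {
    ds, ok := obj.(*hdf5.Dataset)
    if !ok {
        return
    }
    attrs, err := ds.Attributes() // assumed return type
    if err != nil {
        log.Printf("attributes of %s: %v", path, err)
        return
    }
    for name, value := range attrs {
        fmt.Printf("%s @%s = %v\n", path, name, value)
    }
})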

⚠️ Partial Support

  • Dense Attributes: Infrastructure ready, B-tree v2 iteration deferred to v0.12.0-rc.1 (<10% of files affected)

✍️ Write Support (v0.11.6-beta)

NEW: Advanced Write Features! ✅

Dataset Operations:

  • ✅ Create datasets (all layouts: contiguous, chunked, compact)
  • ✅ Write data (all standard datatypes)
  • ✅ Dataset resizing with unlimited dimensions (NEW! See the sketch after this list)
  • ✅ Variable-length datatypes: strings, ragged arrays (NEW!)
  • ✅ Compression (GZIP, Shuffle, Fletcher32)
  • ✅ Array and enum datatypes
  • ✅ References and opaque types
  • ✅ Attribute writing (dense & compact storage)
  • ✅ Attribute modification/deletion
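
A minimal write sketch tying the operations above together (fragment, reusing the imports from the Quick Start example). Only hdf5.CreateForWrite and hdf5.CreateTruncate appear elsewhere in this README; the dataset-creation call, its options, and the Write/Resize methods below are placeholder names, not the library's confirmed API.

// Sketch only: CreateDataset, WithUnlimitedDims, WithGzip, Write and
// Resize are ASSUMED names used to illustrate the workflow; consult the
// package documentation for the real functions and options.
fw, err := hdf5.CreateForWrite("data.h5", hdf5.CreateTruncate)
if err != nil {
    log.Fatal(err)
}
defer fw.Close()

// Hypothetical: a chunked, GZIP-compressed float64 dataset whose only
// dimension is unlimited so it can grow later.
ds, err := fw.CreateDataset("/temperature", []uint64{100},
    hdf5.WithUnlimitedDims(), hdf5.WithGzip(6))
if err != nil {
    log.Fatal(err)
}

// Write an initial block of data, then extend the dataset.
if err := ds.Write(make([]float64, 100)); err != nil { // assumed
    log.Fatal(err)
}
if err := ds.Resize([]uint64{200}); err != nil { // assumed
    log.Fatal(err)
}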

Read Enhancements:

  • ✅ Hyperslab selection (data slicing) - 10-250x faster! (NEW! See the sketch after this list)
  • ✅ Efficient partial dataset reading
  • ✅ Stride and block support
  • ✅ Chunk-aware reading (reads only the needed chunks)
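
The hyperslab items above refer to reading just a sub-region of a dataset. Given a *hdf5.Dataset ds obtained as in the Quick Start example, a partial read might look like the sketch below (fragment); the selection type and method names are assumptions, not the confirmed API.

// Sketch only: hdf5.Hyperslab and ds.ReadSelection are ASSUMED names.
// The point of hyperslab selection is that only the chunks overlapping
// the requested region are read from disk.
sel := hdf5.Hyperslab{ // assumed type
    Start:  []uint64{100, 0}, // offset of the slab in each dimension
    Count:  []uint64{10, 50}, // number of elements per dimension
    Stride: []uint64{1, 1},   // step between selected elements
}
var window []float64
if err := ds.ReadSelection(sel, &window); err != nil { // assumed method
    log.Fatal(err)
}
fmt.Printf("read %d elements from the selected region\n", len(window))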

Known Limitations (v0.11.6-beta):

  • ⚠️ Soft and external links not yet supported (hard links work; MVP APIs exist)
  • ⚠️ Compound datatype writing not yet supported (reading compounds works)
  • ⚠️ Some advanced filters not yet implemented

❌ Planned Features

Next Steps - See ROADMAP.md for the complete timeline and versioning strategy.


🔧 Development

Requirements

  • Go 1.25 or later
  • No external dependencies for the library

Building

# Clone repository
git clone https://github.com/scigolib/hdf5.git
cd hdf5

# Run tests
go test ./...

# Build examples
go build ./examples/...

# Build tools
go build ./cmd/...

Testing

# Run all tests
go test ./...

# Run with race detector
go test -race ./...

# Run with coverage
go test -coverprofile=coverage.out ./...
go tool cover -html=coverage.out

🤝 Contributing

Contributions are welcome! This is an early-stage project and we'd love your help.

Before contributing:

  1. Read CONTRIBUTING.md - Git workflow and development guidelines
  2. Check open issues
  3. Review the Architecture Overview

Ways to contribute:

  • πŸ› Report bugs
  • πŸ’‘ Suggest features
  • πŸ“ Improve documentation
  • πŸ”§ Submit pull requests
  • ⭐ Star the project

πŸ—ΊοΈ Comparison with Other Libraries

| Feature | This Library | gonum/hdf5 | go-hdf5/hdf5 |
|---------|--------------|------------|--------------|
| Pure Go | ✅ Yes | ❌ CGo wrapper | ✅ Yes |
| Reading | ✅ Full (v0.10.0) | ✅ Full | ❌ Limited |
| Writing | ✅ MVP (v0.11.0) | ✅ Full | ❌ No |
| HDF5 1.8+ | ✅ Yes | ⚠️ Limited | ❌ No |
| Advanced Datatypes | ✅ Yes (v0.11.0) | ✅ Yes | ❌ No |
| Maintained | ✅ Active | ⚠️ Slow | ❌ Inactive |
| Thread-safe | ⚠️ User must sync* | ⚠️ Conditional | ❌ No |

* Different File instances are independent. Concurrent access to the same File requires user synchronization (standard Go practice). Full thread-safety with mutexes + SWMR mode is planned for v0.12.0-rc.1.
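
Until built-in locking lands, a shared file handle can be serialized with a standard mutex. A minimal sketch using only the APIs shown earlier in this README (Open, Close, Walk):

package main

import (
    "fmt"
    "log"
    "sync"

    "github.com/scigolib/hdf5"
)

func main() {
    file, err := hdf5.Open("data.h5")
    if err != nil {
        log.Fatal(err)
    }
    defer file.Close()

    var mu sync.Mutex // guards every use of the shared file handle
    var wg sync.WaitGroup

    for worker := 0; worker < 4; worker++ {
        wg.Add(1)
        go func(id int) {
            defer wg.Done()
            mu.Lock() // one goroutine touches the file at a time
            defer mu.Unlock()
            file.Walk(func(path string, obj hdf5.Object) {
                if _, ok := obj.(*hdf5.Dataset); ok {
                    fmt.Printf("worker %d: %s\n", id, path)
                }
            })
        }(worker)
    }
    wg.Wait()
}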


📖 HDF5 Resources


📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


πŸ™ Acknowledgments

  • The HDF Group for the HDF5 format specification
  • gonum/hdf5 for inspiration
  • All contributors to this project

Special Thanks

Professor Ancha Baranova - This project would not have been possible without her invaluable help and support. Her assistance was crucial in bringing this library to life.


📞 Support


Status: Beta - Read support complete, write support advancing
Version: v0.11.6-beta (Dataset Resize + VLen + Hyperslab + 70.4% Coverage)
Last Updated: 2025-11-06


Built with ❤️ by the HDF5 Go community
Recognized by the HDF Group Forum ⭐
