I/O and Data Formats
FFTMatvec uses HDF5 (via the HighFive C++ library) for all on-disk storage. Parallel HDF5 with MPI collective I/O is used for both reading and writing, so the same files work seamlessly across any number of MPI ranks.
This page describes how matrix directories and vector files should be formatted so that FFTMatvec can read them, and what FFTMatvec produces when it writes output.
Terminology
Before diving in, here is a quick glossary of the key dimensions:
| Symbol | Meaning |
|---|---|
| Nd | Global number of block rows (observation blocks) |
| Nm | Global number of block columns (parameter blocks) |
| Nt | Block size (number of time steps per block) |
A block-triangular Toeplitz (BTT) matrix \(F\) has shape \((N_d \cdot N_t) \times (N_m \cdot N_t)\). It is stored as \(N_d\) block rows, each of length \(N_m \cdot N_t\).
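As a quick sanity check on these dimensions, the sketch below computes the dense shape of \(F\) and the length of one stored block row. The helper name `btt_shapes` is hypothetical and is not part of FFTMatvec.

```python
def btt_shapes(Nd, Nm, Nt):
    """Return (dense_shape, block_row_length) for the given block dimensions.

    Hypothetical helper for illustration; not part of the FFTMatvec API.
    """
    dense_shape = (Nd * Nt, Nm * Nt)  # shape of F viewed as a dense matrix
    block_row_length = Nm * Nt        # number of doubles stored per block row
    return dense_shape, block_row_length


dense_shape, row_len = btt_shapes(Nd=3, Nm=4, Nt=5)
print(dense_shape, row_len)  # (15, 20) 20
```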
Data Ordering (SOTI)
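FFTMatvec uses SOTI (Space-Outer-Time-Inner) ordering for all data. In this convention, data is stored with the block (sensor/parameter) index as the outer dimension and the time-step index as the inner dimension. Concretely, a vector \(\mathbf{v}\) of \(N\) blocks of size \(N_t\) is stored as:
\[
\mathbf{v} = \big[\, v_0(t_0), \ldots, v_0(t_{N_t-1}),\; v_1(t_0), \ldots, v_1(t_{N_t-1}),\; \ldots,\; v_{N-1}(t_0), \ldots, v_{N-1}(t_{N_t-1}) \,\big]
\]
where \(v_i(t_j)\) denotes the value of block \(i\) at time step \(j\).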
This is indicated by the reindex attribute in HDF5 datasets. FFTMatvec requires reindex = 1 (i.e., SOTI ordering) for all input data.
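To make the ordering concrete, here is a minimal pure-Python sketch of SOTI flattening and of the flat offset of a given (block, time step) pair. The function names `to_soti` and `soti_index` are illustrative assumptions, not FFTMatvec functions.

```python
def soti_index(block, step, Nt):
    """Flat offset of (block, time step) in SOTI ordering."""
    return block * Nt + step


def to_soti(values_by_block):
    """Flatten a list of per-block time series into one SOTI-ordered list."""
    flat = []
    for series in values_by_block:  # outer dimension: blocks
        flat.extend(series)         # inner dimension: time steps
    return flat


blocks = [[10, 11, 12], [20, 21, 22]]  # N = 2 blocks, Nt = 3 time steps
flat = to_soti(blocks)
print(flat)                           # [10, 11, 12, 20, 21, 22]
print(flat[soti_index(1, 2, Nt=3)])   # 22
```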
Matrix Directory Structure
A matrix is stored as a directory with a specific layout. When you construct a Matrix from a path (either in C++ or Python), the code expects the following structure:
my_matrix/
└── binary/
    ├── adj/
    │   ├── <prefix>000000.h5   ← block row 0
    │   ├── <prefix>000001.h5   ← block row 1
    │   ├── <prefix>000002.h5   ← block row 2
    │   └── ...                 ← one file per block row
    └── meta_adj                ← plain-text metadata file
The meta_adj File
The meta_adj file is a simple plain-text file with nine lines, each containing a single value. The format is:
<global_num_rows> (line 0: Nd — number of block rows)
<global_num_cols> (line 1: Nm — number of block columns)
<block_size> (line 2: Nt — block size / time steps)
<prefix> (line 3: filename prefix string, e.g. "F_")
<extension> (line 4: file extension, must be ".h5")
<reindexed> (line 5: must be 1)
<is_p2q> (line 6: 0 for parameter-to-observable, 1 for parameter-to-QoI)
<reverse_dof> (line 7: must be 1)
<checksum> (line 8: integer checksum, 0 means no checksum verification)
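Example meta_adj for a matrix with 10 block rows, 20 block columns, and block size 50, assuming the prefix F_, a parameter-to-observable matrix, and no checksum:
10
20
50
F_
.h5
1
0
1
0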
Block-Row HDF5 Files
Each block row is stored in a separate HDF5 file named <prefix><NNNNNN>.h5, where NNNNNN is the zero-padded (6-digit) block-row index. For example, with prefix F_, the files would be F_000000.h5, F_000001.h5, etc.
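The metadata layout and the file-naming rule can be combined into a short sketch that parses the contents of a meta_adj file and lists the block-row files it implies. This is an illustration only: `parse_meta_adj` and `block_row_filenames` are hypothetical helpers, the dictionary keys are taken from the placeholders in the format description, and the parser assumes a non-empty prefix.

```python
# Field order follows the nine-line meta_adj layout (lines 0 to 8).
META_FIELDS = [
    ("global_num_rows", int),
    ("global_num_cols", int),
    ("block_size", int),
    ("prefix", str),
    ("extension", str),
    ("reindexed", int),
    ("is_p2q", int),
    ("reverse_dof", int),
    ("checksum", int),
]


def parse_meta_adj(text):
    """Parse the text of a meta_adj file into a dict (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if len(lines) != len(META_FIELDS):
        raise ValueError(f"expected {len(META_FIELDS)} lines, got {len(lines)}")
    meta = {name: cast(raw) for (name, cast), raw in zip(META_FIELDS, lines)}
    # Constraints stated in the format description.
    if meta["extension"] != ".h5":
        raise ValueError("extension must be .h5")
    if meta["reindexed"] != 1 or meta["reverse_dof"] != 1:
        raise ValueError("reindexed and reverse_dof must both be 1")
    return meta


def block_row_filenames(meta):
    """Names of all block-row files: <prefix><6-digit index><extension>."""
    return [
        f"{meta['prefix']}{r:06d}{meta['extension']}"
        for r in range(meta["global_num_rows"])
    ]


meta = parse_meta_adj("3\n4\n5\nF_\n.h5\n1\n0\n1\n0\n")
print(block_row_filenames(meta))
# ['F_000000.h5', 'F_000001.h5', 'F_000002.h5']
```

In practice the text would come from reading the meta_adj file, for example with `open(path).read()`.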
Each HDF5 file contains:
- Dataset "vec": a 1D array of doubles with shape (Nm * Nt,), stored in SOTI ordering. This is the flattened block row: block column 0's \(N_t\) values, then block column 1's, and so on.
- Attributes on the "vec" dataset:
  - reindex (int): must be 1
  - n_param (int): must equal Nm (global number of block columns)
  - param_steps (int): must equal Nt (block size)
  - checksum (int, optional): if the meta file specifies a nonzero checksum, this must match
Vector HDF5 Format
A vector is stored as a single HDF5 file. Column vectors and row vectors use slightly different attribute names but share the same dataset structure.
Dataset
- Dataset "vec": a 1D array of doubles with shape (glob_num_blocks * block_size,), stored contiguously in SOTI ordering.
Attributes
The attributes on the "vec" dataset depend on whether the vector is a column vector (parameter-space, row_or_col = "col") or a row vector (observation-space, row_or_col = "row"):
Column vectors (parameter-space):
| Attribute | Type | Description |
|---|---|---|
| n_param | int | Global number of blocks (Nm) |
| param_steps | int | Block size (Nt) |
| reindex | int | Must be 1 (SOTI ordering) |
| checksum | int | Optional; present only if checksum ≠ 0 |
Row vectors (observation-space):
| Attribute | Type | Description |
|---|---|---|
| n_obs | int | Global number of blocks (Nd) |
| obs_steps | int | Block size (Nt) |
| reindex | int | Must be 1 (SOTI ordering) |
| qoi | int | Optional; 1 if this is a QoI (quantity-of-interest) vector |
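The naming difference between the two orientations can be captured in a small helper that builds the required attributes for either kind of vector. This is a sketch: `vector_attrs` is a hypothetical name, and the optional checksum and qoi attributes are deliberately left out.

```python
def vector_attrs(row_or_col, num_blocks, block_size):
    """Required "vec" attributes for a column or row vector (hypothetical helper)."""
    if row_or_col == "col":    # parameter-space vector
        attrs = {"n_param": num_blocks, "param_steps": block_size}
    elif row_or_col == "row":  # observation-space vector
        attrs = {"n_obs": num_blocks, "obs_steps": block_size}
    else:
        raise ValueError('row_or_col must be "row" or "col"')
    attrs["reindex"] = 1       # SOTI ordering is required
    return attrs


print(vector_attrs("col", 4, 5))  # {'n_param': 4, 'param_steps': 5, 'reindex': 1}
print(vector_attrs("row", 3, 5))  # {'n_obs': 3, 'obs_steps': 5, 'reindex': 1}
```

The resulting dictionary could then be copied onto an h5py dataset's attributes one key at a time.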
Worked Example
Consider a small matrix with:
- Nd = 3 block rows
- Nm = 4 block columns
- Nt = 5 time steps per block (block size)
- prefix F_
Directory layout
my_matrix/
└── binary/
    ├── adj/
    │   ├── F_000000.h5   ← block row 0
    │   ├── F_000001.h5   ← block row 1
    │   └── F_000002.h5   ← block row 2
    └── meta_adj          ← plain-text metadata file
Contents of meta_adj
3
4
5
F_
.h5
1
0
1
0
Each HDF5 file
Each file (e.g., F_000000.h5) contains:
- Dataset "vec": shape (20,), that is \(N_m \times N_t = 4 \times 5 = 20\) doubles
- Attributes: reindex = 1, n_param = 4, param_steps = 5
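Before handing files to FFTMatvec, it can help to check them against these requirements. The sketch below validates the shape and the required attributes of one block-row dataset. `check_block_row` is a hypothetical helper, not part of FFTMatvec; it takes a shape and a plain mapping so that it runs without h5py, and with h5py one would presumably pass the shape and attrs of the "vec" dataset.

```python
def check_block_row(shape, attrs, Nm, Nt):
    """Return a list of problems (an empty list means the dataset looks valid)."""
    problems = []
    if tuple(shape) != (Nm * Nt,):
        problems.append(f"shape {tuple(shape)} != ({Nm * Nt},)")
    expected = {"reindex": 1, "n_param": Nm, "param_steps": Nt}
    for name, want in expected.items():
        if name not in attrs:
            problems.append(f"missing attribute {name!r}")
        elif int(attrs[name]) != want:
            problems.append(f"{name} = {int(attrs[name])}, expected {want}")
    return problems


good = {"reindex": 1, "n_param": 4, "param_steps": 5}
print(check_block_row((20,), good, Nm=4, Nt=5))  # []

bad = {"reindex": 0, "n_param": 4}
for problem in check_block_row((15,), bad, Nm=4, Nt=5):
    print(problem)
```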
Corresponding vectors
- A column (input) vector for this matrix has shape (20,) (\(N_m \times N_t\)) with attributes n_param = 4, param_steps = 5, reindex = 1
- A row (output) vector has shape (15,) (\(N_d \times N_t\)) with attributes n_obs = 3, obs_steps = 5, reindex = 1
Creating Matrix Data with Python (h5py)
If you are generating matrix data outside of FFTMatvec (e.g., from a PDE solver), you can create the directory structure using Python and h5py:
import os

import h5py
import numpy as np

Nd = 3  # block rows
Nm = 4  # block columns
Nt = 5  # block size
prefix = "F_"

# meta_adj lives in binary/, the block-row files in binary/adj/
bin_dir = os.path.join("my_matrix", "binary")
adj_dir = os.path.join(bin_dir, "adj")
os.makedirs(adj_dir, exist_ok=True)

# Write meta_adj: Nd, Nm, Nt, prefix, extension, reindexed, is_p2q, reverse_dof, checksum
with open(os.path.join(bin_dir, "meta_adj"), "w") as f:
    f.write(f"{Nd}\n{Nm}\n{Nt}\n{prefix}\n.h5\n1\n0\n1\n0\n")

# Write each block row
for r in range(Nd):
    filename = os.path.join(adj_dir, f"{prefix}{r:06d}.h5")
    data = np.random.randn(Nm * Nt)  # replace with your actual data
    with h5py.File(filename, "w") as f:
        ds = f.create_dataset("vec", data=data)
        ds.attrs["reindex"] = 1
        ds.attrs["n_param"] = Nm
        ds.attrs["param_steps"] = Nt
Similarly, to create a vector file:
import h5py
import numpy as np

Nm = 4
Nt = 5
data = np.random.randn(Nm * Nt)

with h5py.File("my_vector.h5", "w") as f:
    ds = f.create_dataset("vec", data=data)
    ds.attrs["reindex"] = 1
    ds.attrs["n_param"] = Nm      # use "n_obs" for row vectors
    ds.attrs["param_steps"] = Nt  # use "obs_steps" for row vectors
Reading and Writing in C++
Reading a Matrix from Disk
// Construct a Matrix from a directory path
// The code reads binary/meta_adj and the binary/adj/<prefix>NNNNNN.h5 files
Matrix F(comm, "/path/to/my_matrix");
// Optionally load an auxiliary matrix (must share the same dimensions)
Matrix F_with_aux(comm, "/path/to/my_matrix", "/path/to/aux_matrix");
Reading a Vector from Disk
// Create a vector with matching dimensions, then read from file
Vector v(comm, Nm, Nt, "col", true); // column vector with global sizes
v.init_vec_from_file("my_vector.h5");
// With checksum verification
v.init_vec_from_file("my_vector.h5", 42); // expects checksum == 42
Writing a Vector to Disk
// After computing a result, save it
result_vec.save("output.h5");
// For QoI vectors, pass QoI = true
qoi_vec.save("qoi_output.h5", true);
Reading and Writing in Python
Reading a Matrix
import pyFFTMatvec
from mpi4py import MPI
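proc_rows, proc_cols = 1, 1  # process grid dimensions; example values, choose them for your MPI launch
comm = pyFFTMatvec.Comm(MPI.COMM_WORLD, proc_rows, proc_cols)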
# Load matrix from directory
F = pyFFTMatvec.Matrix(comm, path="/path/to/my_matrix")
# With mixed precision
p_config = pyFFTMatvec.MatvecPrecisionConfig()
p_config.fft = pyFFTMatvec.Precision.SINGLE
F = pyFFTMatvec.Matrix(comm, path="/path/to/my_matrix", p_config=p_config)
Reading and Writing Vectors
# Create matching vectors from the matrix
x = F.get_vec("input") # column vector with correct dimensions
y = F.get_vec("output") # row vector with correct dimensions
# Read vector from file
x.init_vec_from_file("my_vector.h5")
# Compute matvec
F.matvec(x, y)
# Save result
y.save("result.h5")
Checksum Verification
FFTMatvec supports optional checksum verification to ensure data consistency. If the meta_adj file specifies a nonzero checksum (line 8), then:
- Each block-row HDF5 file must have a checksum attribute on its "vec" dataset matching the value in meta_adj
- Vector files loaded via init_vec_from_file can also be checked by passing a checksum argument
This is useful when working with large datasets across different storage systems to verify data integrity.
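The rule described here (zero disables the check, any other value must match) can be sketched as follows. `checksum_matches` is a hypothetical function that only mirrors the documented behavior; it is not FFTMatvec's actual implementation.

```python
def checksum_matches(expected, attrs):
    """Apply the documented checksum rule to a dataset's attributes.

    expected == 0 means verification is disabled; otherwise the "checksum"
    attribute must be present and equal to the expected value.
    """
    if expected == 0:
        return True
    return "checksum" in attrs and int(attrs["checksum"]) == expected


print(checksum_matches(0, {}))                 # True: verification disabled
print(checksum_matches(42, {"checksum": 42}))  # True: values match
print(checksum_matches(42, {"checksum": 7}))   # False: mismatch
print(checksum_matches(42, {}))                # False: attribute missing
```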
Auxiliary Matrices
FFTMatvec supports loading an auxiliary matrix \(G\) alongside the primary matrix \(F\). This is used in the context of computing the full product \(G^T F\) (or similar operations). The auxiliary matrix must:
- Have the same dimensions (Nd, Nm, Nt) as the primary matrix
- Have the same checksum as the primary matrix
- Be stored in the same directory format as the primary matrix
You load it by providing the aux_path argument when constructing a Matrix.
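A pre-flight check of these requirements might look like the sketch below, which compares the metadata of the two matrices and reports any mismatched fields. The function name `aux_mismatches` and the dictionary keys (borrowed from the meta_adj placeholders) are assumptions for illustration.

```python
def aux_mismatches(primary_meta, aux_meta):
    """Return the metadata fields on which the auxiliary matrix differs."""
    keys = ("global_num_rows", "global_num_cols", "block_size", "checksum")
    return [k for k in keys if primary_meta[k] != aux_meta[k]]


F_meta = {"global_num_rows": 3, "global_num_cols": 4, "block_size": 5, "checksum": 0}
G_meta = {"global_num_rows": 3, "global_num_cols": 4, "block_size": 6, "checksum": 0}

print(aux_mismatches(F_meta, F_meta))  # []
print(aux_mismatches(F_meta, G_meta))  # ['block_size']
```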