data_base

Efficient, reproducible, and flexible database with a dictionary-like API.

This package provides efficient and scalable methods to store and access simulation results at a terabyte scale. A wide variety of input data and output file formats are supported (see data_base.IO.LoaderDumper), including:

  • 1D and ND numpy arrays

  • pandas and dask dataframes

  • Cell objects

A database saves each key as a folder containing at least three files:

  • Loader: JSON file containing information on how to load the data.

  • metadata: JSON file containing metadata.

  • Data file(s): The actual data, in a format specified by the Loader file. Some file formats, such as parquet and msgpack, split the data across multiple files.
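The layout above can be emulated with a minimal sketch. Note that the exact file names ("Loader", "metadata") and the parquet part name are assumptions for illustration, not guaranteed to match what data_base writes:

```python
import json
import os
import tempfile

# Sketch: emulate the on-disk layout of a single database key folder.
key_dir = tempfile.mkdtemp()

# Loader file: tells the database how to read the data back.
with open(os.path.join(key_dir, "Loader"), "w") as f:
    json.dump({"Loader": "data_base.IO.LoaderDumper.dask_to_parquet"}, f)

# metadata file: provenance information about how the data was created.
with open(os.path.join(key_dir, "metadata"), "w") as f:
    json.dump({"dumper": "dask_to_parquet", "dirty": False}, f)

# Data file(s): placeholder standing in for the actual payload.
with open(os.path.join(key_dir, "part.0.parquet"), "wb") as f:
    f.write(b"")

print(sorted(os.listdir(key_dir)))
```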

Simulation results from single_cell_parser and simrun can be imported and converted to a high-performance binary format using the data_base.db_initializers subpackage.

Example

The Loader file specifies which module to use for loading the data (the module is assumed to contain a Loader class):

{"Loader": "data_base.IO.LoaderDumper.dask_to_parquet"}

metadata contains the time, commit hash, module versions, creation date, file format, and whether the data was saved with uncommitted code (dirty). If the data was created within a Jupyter session, it also contains the code history that was used to produce this data:

{
    "dumper": "dask_to_parquet",
    "time": [2025, 2, 21, 15, 51, 23, 4, 52, -1],
    "module_list": "...",
    "module_versions": {
        "re": "2.2.1",
        ...
        "pygments": "2.18.0",
        "bluepyopt": "1.9.126"
        },
    "history": "import Interface as I ...",
    "hostname": "localhost",
    "metadata_creation_time": "together_with_new_key",
    "version": "heads/master",
    "full-revisionid": "9fd2c2a94cdc36ee806d4625e353cd289cd7ce16",
    "dirty": false,
    "error": null
}
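A consumer of this metadata might parse it and flag data produced from uncommitted code, along the lines of this sketch (the fields used here are taken from the example above):

```python
import json

# Parse a metadata file and inspect its provenance fields.
metadata = json.loads("""{
    "dumper": "dask_to_parquet",
    "version": "heads/master",
    "dirty": false,
    "error": null
}""")

if metadata["dirty"]:
    # Data written from a dirty working tree may not be reproducible
    # from the recorded commit hash alone.
    print("Warning: data was saved with uncommitted code")

print(metadata["dumper"])
```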

Saving and loading data is easily achieved:

import pandas as pd

from data_base import DataBase

db = DataBase('/path/to/database')
obj = pd.DataFrame(...)  # some pandas dataframe for example
db['my_key'] = obj  # saves the object to the database with the default format
loaded_obj = db['my_key']  # loads the object from the database
db.set('my_other_key', obj, dumper='pandas_to_msgpack')  # saves the object with a specific format

When you don’t specify a dumper, the default dumper from the configuration file is used. The default dumper is deliberately chosen to prioritize flexibility (i.e. save anything), not performance (i.e. save something specific very efficiently). Performant data formats must be specified explicitly, as they often depend on the object being saved and the intended use case. You can (but shouldn’t) reconfigure the default dumper in config/db_settings.json.
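The dictionary-like access pattern described above can be illustrated with a toy mapping backed by a folder. This is not the real DataBase implementation (which dispatches to format-specific dumpers and writes Loader and metadata files); it only shows the `db[key] = obj` / `db[key]` semantics:

```python
import json
import os
import tempfile

class ToyDB:
    """Toy stand-in for the dictionary-like DataBase API (illustration only)."""

    def __init__(self, basedir):
        self.basedir = basedir
        os.makedirs(basedir, exist_ok=True)

    def __setitem__(self, key, obj):
        # The real database picks a dumper per object type; here we just
        # serialize to JSON for simplicity.
        with open(os.path.join(self.basedir, key), "w") as f:
            json.dump(obj, f)

    def __getitem__(self, key):
        with open(os.path.join(self.basedir, key)) as f:
            return json.load(f)

db = ToyDB(tempfile.mkdtemp())
db["my_key"] = {"a": 1}
print(db["my_key"])
```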

Functions

is_data_base(path)

Checks if a given path contains a DataBase.

is_sub_data_base(parent_db, key)

Check if a given key is a sub-database of the parent database.

get_db_by_unique_id(unique_id)

Get a DataBase by its unique ID, as registered in the data base register.
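The real validity checks behind these helpers are internal to data_base. As a rough illustration grounded in the key layout described earlier (a key folder holds a Loader file, a metadata file, and data file(s)), a folder check might look like this; the actual is_data_base criteria may differ:

```python
import os
import tempfile

def looks_like_key_folder(path):
    """Rough stand-in check: Loader + metadata + at least one data file.

    Illustration only; the real data_base checks may use different criteria.
    """
    present = set(os.listdir(path))
    return {"Loader", "metadata"}.issubset(present) and len(present) >= 3

# Build a folder that matches the documented layout and test the check.
d = tempfile.mkdtemp()
for name in ("Loader", "metadata", "data.parquet"):
    open(os.path.join(d, name), "w").close()

print(looks_like_key_folder(d))
```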

Attributes

DataBase

-

Modules

IO

Read and write data.

analyze

Analyze simrun-initialized databases.

db_initializers

Initialize a database from raw simulation data.

data_base_register

Registry of databases.

dbopen

Open files directly in a database.

distributed_lock

Configuration for locking servers.

exceptions

data_base specific exceptions.

isf_data_base

The ISFDataBase class for robust and efficient data storage.

utils

Database utility and convenience functions.