data_base

Efficient, reproducible and flexible database with dictionary-like API. This package provides efficient and scalable methods to store and access simulation results at a terrabyte scale. Each data base entry contains metadata, indicating when the data was written, and the exact version of the source code that was used at this timepoint. A wide variety of input data and output file formats are supported (see data_base.IO.LoaderDumper), including:

  • 1D and ND numpy arrays

  • pandas and dask dataframes

  • Cell objects

  • ReducedLdaModel objects

Simulation results from single_cell_parser and simrun can be imported and converted to a high performance binary format using the data_base.db_initializers subpackage.

Example

Loader contains information on how to load the data. It contains which module to use (assuming it contains a Loader class):

{"Loader": "data_base.IO.LoaderDumper.dask_to_parquet"}

metadata contains the time, commit hash, module versions, creation date, file format, and whether or not the data was saved with uncommitted code (dirty). If the data was created within a Jupyter session, it also contains the code history that was used to produce this data:

{
    "dumper": "dask_to_parquet",
    "time": [2025, 2, 21, 15, 51, 23, 4, 52, -1],
    "module_list": "...",
    "module_versions": {
        "re": "2.2.1",
        ...
        "pygments": "2.18.0",
        "bluepyopt": "1.9.126"
        },
    "history": "import Interface as I ...",
    "hostname": "localhost",
    "metadata_creation_time": "together_with_new_key",
    "version": "heads/master",
    "full-revisionid": "9fd2c2a94cdc36ee806d4625e353cd289cd7ce16",
    "dirty": false,
    "error": null
}

Functions

is_data_base(path)

Checks if a given path contains a DataBase.

is_sub_data_base(parent_db, key)

Check if a given key is a sub-database of the parent database.

get_db_by_unique_id(unique_id)

Get a DataBase by its unique ID, as registered in the data base register.

Modules

IO

Read and write data.

analyze

Analyze simrun-initialized databases.

db_initializers

Initialize a database from raw simulation data.

data_base_register

Registry of databases.

dbopen

Open files directly in a database.

distributed_lock

Configuration for locking servers

exceptions

data_base specific exceptions.

isf_data_base

Database for robust and efficient data storage.

utils

Database utility and convenience functions.