data_base¶
Efficient, reproducible, and flexible database with a dictionary-like API.
This package provides efficient and scalable methods to store and access simulation results at terabyte scale.
A wide variety of input data and output file formats are supported (see data_base.IO.LoaderDumper), including:
1D and ND numpy arrays
pandas and dask dataframes
Cell objects
Databases save keys as folders containing at least three files:
Loader: a JSON file containing information on how to load the data.
metadata: a JSON file containing metadata.
Data file(s): the actual data, in a format specified by the Loader file. Some file formats, such as parquet and msgpack, split the data across multiple files.
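As a sketch of this on-disk layout, the snippet below builds one key folder by hand. The file names Loader and metadata follow the convention described above; the data file name and all file contents here are illustrative placeholders, not the exact bytes the package writes:

```python
import json
import os
import tempfile

# Illustrative sketch of the on-disk layout of a single database key.
key_dir = tempfile.mkdtemp()

with open(os.path.join(key_dir, "Loader"), "w") as f:
    json.dump({"Loader": "data_base.IO.LoaderDumper.pandas_to_parquet"}, f)

with open(os.path.join(key_dir, "metadata"), "w") as f:
    json.dump({"dumper": "pandas_to_parquet", "dirty": False}, f)

with open(os.path.join(key_dir, "data.parquet"), "wb") as f:
    f.write(b"")  # stand-in for the actual data file(s)

print(sorted(os.listdir(key_dir)))  # ['Loader', 'data.parquet', 'metadata']
```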
Simulation results from single_cell_parser and simrun can be imported and converted to a high performance binary format using the data_base.db_initializers subpackage.
Example
The Loader file contains information on how to load the data; it specifies which module to use (the module is assumed to define a Loader class):
{"Loader": "data_base.IO.LoaderDumper.dask_to_parquet"}
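The module path in that JSON can be turned into a class with a dynamic import. The helper below is a hypothetical sketch of that lookup, not the package's actual loading code:

```python
import importlib
import json

def resolve_loader(loader_json: str):
    """Hypothetical sketch: import the module named in a Loader file and
    return its ``Loader`` class (the module is assumed to define one)."""
    spec = json.loads(loader_json)
    module = importlib.import_module(spec["Loader"])
    return module.Loader

# Parsing the Loader file itself needs no import machinery:
spec = json.loads('{"Loader": "data_base.IO.LoaderDumper.dask_to_parquet"}')
print(spec["Loader"].rsplit(".", 1)[-1])  # dask_to_parquet
```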
metadata contains the time, commit hash, module versions, creation date, file format, and whether or not the data was saved with uncommitted code (dirty).
If the data was created within a Jupyter session, it also contains the code history that was used to produce this data:
{
"dumper": "dask_to_parquet",
"time": [2025, 2, 21, 15, 51, 23, 4, 52, -1],
"module_list": "...",
"module_versions": {
"re": "2.2.1",
...
"pygments": "2.18.0",
"bluepyopt": "1.9.126"
},
"history": "import Interface as I ...",
"hostname": "localhost",
"metadata_creation_time": "together_with_new_key",
"version": "heads/master",
"full-revisionid": "9fd2c2a94cdc36ee806d4625e353cd289cd7ce16",
"dirty": false,
"error": null
}
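Those fields make it possible to audit a saved result programmatically. The helper below is hypothetical, but the record mirrors fields from the metadata example above:

```python
import json

# Subset of the metadata record shown above.
metadata = json.loads("""{
  "dumper": "dask_to_parquet",
  "version": "heads/master",
  "dirty": false,
  "error": null
}""")

def is_traceable(md):
    """Hypothetical helper: a result is traceable to an exact commit only
    if it was saved from a clean working tree and without a save error."""
    return not md["dirty"] and md["error"] is None

print(is_traceable(metadata))  # True
```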
Saving and loading data is easily achieved:
import pandas as pd

from data_base import DataBase
db = DataBase('/path/to/database')
obj = pd.DataFrame(...) # some pandas dataframe for example
db['my_key'] = obj # saves the object to the database with the default format
loaded_obj = db['my_key'] # loads the object from the database
db.set('my_other_key', obj, dumper='pandas_to_msgpack') # saves the object with a specific format
When you don’t specify the dumper, the default dumper as specified in the configuration file is used.
The default dumper is purposely chosen to prioritize flexibility (i.e. save anything), not performance (i.e. save something specific very efficiently).
Performant data formats will need to be specified explicitly, as they often depend on the object being saved and the intended use case.
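One way to keep that explicit choice systematic is a small helper that maps known object types to dumper names. Only the dumper name pandas_to_msgpack is taken from this page; the helper itself is purely illustrative and not part of the data_base API:

```python
def suggest_dumper(obj):
    """Hypothetical helper: return an explicit dumper name for a known
    type, or None to fall back to the configured default dumper."""
    if type(obj).__name__ == "DataFrame":  # pandas dataframe
        return "pandas_to_msgpack"
    return None

print(suggest_dumper([1, 2, 3]))  # None -> default dumper applies
```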
You can (but shouldn't) reconfigure the default dumper in config/db_settings.json.
Functions¶
Checks if a given path contains a database.
Check if a given key is a sub-database of the parent database.
Get a DataBase by its unique ID, as registered in the database register.
Modules¶
Read and write data.
Analyze simrun-initialized databases.
Initialize a database from raw simulation data.
Registry of databases.
Open files directly in a database.
Configuration for locking servers.
Database utility and convenience functions.