dump

data_base.IO.LoaderDumper.dask_to_parquet.dump(obj, savedir, schema=None, client=None, repartition=10000)

Save a dask dataframe to one or more parquet files.
One parquet file per partition is created. Each partition is written to a file named ‘pandas_to_parquet.<n_partitions>.<partition>.parquet’. The writing of these files is parallelized using the dask client if one is provided.
In addition to the dask dataframe itself, meta information is saved in the form of a JSON file.
See also

save_object_meta() for saving meta information.

Parameters:
obj (dask.dataframe) – Dask dataframe to save
savedir (str) – Directory where the parquet files will be stored
client (dask.distributed.Client) – Dask client for parallelization.
repartition (int) – If the original object has more than twice this number of partitions, it will be repartitioned. Otherwise, the object is saved according to its original partitioning (a sketch of this rule appears at the end of this section).
Returns:

None
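
Example

A minimal usage sketch, assuming a local dask.distributed.Client and a small example dataframe; the import style, example data, and output directory are illustrative only:

    import pandas as pd
    import dask.dataframe as dd
    from dask.distributed import Client

    from data_base.IO.LoaderDumper import dask_to_parquet

    # Build a small example dask dataframe (hypothetical data).
    pdf = pd.DataFrame({"x": range(1000), "y": range(1000)})
    ddf = dd.from_pandas(pdf, npartitions=4)

    # An optional client parallelizes the per-partition writes.
    client = Client(processes=False)

    # Writes one parquet file per partition into the given directory,
    # plus a JSON file with meta information.
    dask_to_parquet.dump(ddf, "/tmp/example_parquet_dir", client=client)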
See also

Each individual partition is saved using save_helper().
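
For illustration, the repartition rule described for the repartition parameter could be read as the following sketch; this is an assumed reading of the documented behavior (including the target partition count), not the actual implementation:

    import dask.dataframe as dd

    def maybe_repartition(ddf: dd.DataFrame, repartition: int = 10000) -> dd.DataFrame:
        # Assumed rule: repartition only when the dataframe has more than
        # twice `repartition` partitions; the target partition count used
        # here is an assumption, not taken from the source.
        if ddf.npartitions > 2 * repartition:
            return ddf.repartition(npartitions=repartition)
        return ddf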