
dump

data_base.IO.LoaderDumper.dask_to_parquet.dump(obj, savedir, schema=None, client=None, repartition=10000)

Save a dask dataframe to one or more parquet files.

One parquet file is created per partition. Each partition is written to a file named ‘pandas_to_parquet.<n_partitions>.<partition>.parquet’. Writing these files is parallelized with the dask client if one is provided, as sketched below.
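A minimal sketch of how such per-partition, client-parallelized writes can be organized is given below. This is not the library's actual implementation; the helper name _write_partition, dump_sketch, and the use of to_delayed()/client.compute() are assumptions for illustration only:

    import os
    import dask

    def _write_partition(part, savedir, n_partitions, partition_idx):
        # Hypothetical helper: write one pandas partition to its own parquet
        # file, following the naming scheme described above.
        fname = "pandas_to_parquet.{}.{}.parquet".format(n_partitions, partition_idx)
        part.to_parquet(os.path.join(savedir, fname))

    def dump_sketch(ddf, savedir, client=None):
        # One delayed write task per partition of the dask dataframe.
        tasks = [
            dask.delayed(_write_partition)(part, savedir, ddf.npartitions, i)
            for i, part in enumerate(ddf.to_delayed())
        ]
        if client is not None:
            # Run the write tasks on the cluster and wait for all of them.
            client.gather(client.compute(tasks))
        else:
            # Fall back to the default (local) scheduler.
            dask.compute(*tasks)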

In addition to the dask dataframe itself, meta information is saved in the form of a JSON file.

See also

save_object_meta() for saving meta information

Parameters:
  • obj (dask.dataframe.DataFrame) – Dask dataframe to save

  • savedir (str) – Directory where the parquet files will be stored

  • client (dask.distributed.Client) – Dask client used to parallelize the writes.

  • repartition (int) – If the original object has more than twice this number of partitions, it is repartitioned. Otherwise, the object is saved with its original partitioning.

Returns:

None

See also

Each individual partition is saved using save_helper().
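
A short usage example is shown below. The output directory, the local cluster setup, and the sample dataframe are illustrative assumptions, not prescribed by the API:

    import os
    import pandas as pd
    import dask.dataframe as dd
    from dask.distributed import Client

    from data_base.IO.LoaderDumper import dask_to_parquet

    client = Client()  # local cluster; writes are distributed across its workers
    ddf = dd.from_pandas(pd.DataFrame({"a": range(100_000)}), npartitions=8)

    # Illustrative output directory; created here in case dump expects it to exist.
    os.makedirs("out_dir", exist_ok=True)

    # Writes one parquet file per partition into 'out_dir', plus the JSON meta file.
    dask_to_parquet.dump(ddf, "out_dir", client=client)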