ParallelLoader

ParallelLoader Objects

class ParallelLoader(ParallelQuery.ParallelQuery)

Parallel and Batch Loader for ApertureDB

This takes a dataset (a collection of homogeneous objects) or a derived class and inserts it into the database efficiently by splitting it into batches and passing the batches to multiple workers.

query_setup

def query_setup(generator: Subscriptable) -> None

Runs the setup for the loader, which includes creating indices. Currently, it only creates indices for properties that are also used as constraints.

Will only run when the argument generator has a get_indices method that returns a dictionary of the form:

{
    "entity": {
        "class_name": ["property_name"]
    },
}

or

{
    "connection": {
        "class_name": ["property_name"]
    },
}

Arguments:

  • generator Subscriptable - The Subscriptable object that is being ingested
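The shape expected from get_indices can be sketched with a minimal, hypothetical generator. The class name (PersonGenerator), the Person entity class, the email property, and the AddEntity query shape are illustrative assumptions, not part of this API; a real generator would derive from aperturedb's Subscriptable.

```python
class PersonGenerator:
    """Hypothetical generator; in practice this would derive from
    aperturedb's Subscriptable and be handed to the loader."""

    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, idx):
        # Produce the query for one record (shape is illustrative only).
        rec = self.records[idx]
        return [{"AddEntity": {"class": "Person", "properties": rec}}]

    def get_indices(self):
        # Matches the {"entity": {class_name: [property_name]}} form
        # that query_setup looks for, so an index would be created
        # on the Person.email property.
        return {"entity": {"Person": ["email"]}}


gen = PersonGenerator([{"email": "a@example.com"}])
```

Because the generator defines get_indices, query_setup would create the listed indices before ingestion begins; a generator without that method is simply skipped.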

ingest

def ingest(generator: Subscriptable,
           batchsize: int = 1,
           numthreads: int = 4,
           stats: bool = False) -> None

Method to ingest data into the database

Arguments:

  • generator Subscriptable - The data to ingest: a list, or an instance of a class derived from Subscriptable.
  • batchsize int, optional - Number of objects per batch. Defaults to 1.
  • numthreads int, optional - Number of worker threads to create. Defaults to 4.
  • stats bool, optional - Whether to display real-time ingestion statistics. Defaults to False.
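How batchsize and numthreads interact can be illustrated with plain Python, independent of any database connection. This is a sketch of the mechanism described above (split into batches, hand batches to workers), not the loader's actual implementation; ingest_sketch and its worker argument are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def make_batches(items, batchsize):
    # Split the dataset into consecutive batches of at most `batchsize` items.
    return [items[i:i + batchsize] for i in range(0, len(items), batchsize)]


def ingest_sketch(items, batchsize=1, numthreads=4, worker=None):
    # Hand each batch to a pool of worker threads, mirroring how the
    # loader distributes batches across `numthreads` workers.
    batches = make_batches(items, batchsize)
    results = []
    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        for processed in pool.map(worker, batches):
            results.extend(processed)
    return results


# Example: "insert" each record by tagging it, two records per batch,
# two worker threads.
data = [{"id": i} for i in range(5)]
out = ingest_sketch(data, batchsize=2, numthreads=2,
                    worker=lambda batch: [dict(r, inserted=True) for r in batch])
```

With batchsize=2 and five records, the workers receive three batches (2, 2, and 1 records); larger batches amortize per-request overhead, while more threads increase concurrency up to what the server can absorb.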