ParallelLoader
ParallelLoader Objects
class ParallelLoader(ParallelQuery.ParallelQuery)
Parallel and Batch Loader for ApertureDB
This takes a dataset (a collection of homogeneous objects), or a class derived from one, and inserts the objects into the database efficiently by splitting them into batches and passing the batches to multiple workers.
query_setup
def query_setup(generator: Subscriptable) -> None
Runs the setup for the loader, which includes creating indices. Currently, it creates indices only for properties that are also used in constraints.
This runs only when the generator argument has a get_indices method that returns a dictionary of the form:
{
"entity": {
"class_name": ["property_name"]
},
}
or
{
"connection": {
"class_name": ["property_name"]
},
}
Arguments:
generator
Subscriptable - The Subscriptable object being ingested.
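As a sketch of the get_indices contract, the class below is a self-contained stand-in for a Subscriptable-derived generator (in practice you would subclass aperturedb.Subscriptable.Subscriptable; the "Person" class name and "id" property here are hypothetical):

```python
class PersonGenerator:
    """Illustrative generator for a hypothetical 'Person' entity class.

    Mimics the Subscriptable interface: __len__ plus an element accessor,
    and a get_indices method that query_setup can consume.
    """

    def __init__(self, records):
        # records: list of property dicts, e.g. [{"id": 1, "name": "Ada"}]
        self.records = records

    def __len__(self):
        return len(self.records)

    def getitem(self, subscript):
        # One element: an ApertureDB-style (query, blobs) pair.
        rec = self.records[subscript]
        query = [{"AddEntity": {"class": "Person", "properties": rec}}]
        return query, []

    def get_indices(self):
        # query_setup will create an index on Person.id, the property
        # used in constraints during ingestion.
        return {"entity": {"Person": ["id"]}}
```

Because "id" appears in get_indices, query_setup would create an index for Person.id before ingestion begins; a "connection"-keyed dictionary works the same way for connection classes.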
ingest
def ingest(generator: Subscriptable,
batchsize: int = 1,
numthreads: int = 4,
stats: bool = False) -> None
Method to ingest data into the database
Arguments:
generator
Subscriptable - The list of data, or a class derived from Subscriptable, to be ingested.
batchsize
int, optional - The size of each batch. Defaults to 1.
numthreads
int, optional - The number of worker threads to create. Defaults to 4.
stats
bool, optional - Whether to display real-time ingestion statistics. Defaults to False.
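The batch-and-dispatch behavior of ingest can be sketched with the standard library alone. This is an illustrative stand-in, not the SDK implementation; ingest_sketch and process_batch are hypothetical names:

```python
from concurrent.futures import ThreadPoolExecutor


def ingest_sketch(items, process_batch, batchsize=1, numthreads=4):
    """Split items into batches of `batchsize` and hand them to a pool
    of `numthreads` workers, mirroring how ingest dispatches work."""
    batches = [items[i:i + batchsize]
               for i in range(0, len(items), batchsize)]
    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        # map preserves batch order even though workers run concurrently
        return list(pool.map(process_batch, batches))


# Usage: "process" each batch by counting its elements.
counts = ingest_sketch(list(range(10)), len, batchsize=3, numthreads=2)
# 10 items in batches of 3 -> batch sizes [3, 3, 3, 1]
```

Larger batchsize values amortize per-query overhead, while numthreads controls how many batches are in flight against the database at once.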