ParallelLoader
ParallelLoader Objects
class ParallelLoader(ParallelQuery.ParallelQuery)
Parallel and Batch Loader for ApertureDB
Takes a dataset (a collection of homogeneous objects) or a derived class, and inserts its elements into the database efficiently by splitting them into batches and passing the batches to multiple workers.
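The batch-and-worker strategy can be sketched as follows. This is a simplified, self-contained stand-in to illustrate the idea, not the loader's actual implementation; the function name and batch-processing callback are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def ingest_in_batches(items, process_batch, batchsize=2, numthreads=4):
    """Split items into fixed-size batches and hand each batch to a worker.

    Simplified stand-in for the splitting ParallelLoader performs on a
    dataset before dispatching real insert queries to its workers.
    """
    batches = [items[i:i + batchsize] for i in range(0, len(items), batchsize)]
    with ThreadPoolExecutor(max_workers=numthreads) as pool:
        # pool.map preserves batch order while running workers concurrently.
        return list(pool.map(process_batch, batches))

# Example: "insert" each batch by just counting its elements.
counts = ingest_in_batches(list(range(10)), len, batchsize=3)
print(counts)  # [3, 3, 3, 1]
```

Because the dataset is homogeneous, every batch can be processed by the same worker logic, which is what makes this decomposition safe.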
get_entity_indexes
def get_entity_indexes(schema: dict) -> dict
Returns a dictionary of indexes for entities' properties.
Arguments:
schema
dict - The schema dictionary to get indexes from.
Returns:
dict
- A dictionary of entity indexes.
get_connection_indexes
def get_connection_indexes(schema: dict) -> dict
Returns a dictionary of indexes for connections' properties.
Arguments:
schema
dict - The schema dictionary to get indexes from.
Returns:
dict
- A dictionary of connection indexes.
query_setup
def query_setup(generator: Subscriptable) -> None
Runs the setup for the loader, which includes creating indexes. Currently, it only creates indexes for properties that are also used as constraints.
Will only run when the argument generator has a get_indices method that returns a dictionary of the form:
{
"entity": {
"class_name": ["property_name"]
},
}
or
{
"connection": {
"class_name": ["property_name"]
},
}
Arguments:
generator
Subscriptable - The Subscriptable object that is being ingested
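A generator that opts into index creation only needs to expose a get_indices method returning one of the dictionary shapes above. The sketch below uses a hypothetical stand-in class so it runs on its own; a real generator would derive from aperturedb's Subscriptable, and the class and property names are illustrative.

```python
class PersonGenerator:
    """Hypothetical generator-style class exposing get_indices.

    A real loader generator would derive from aperturedb's Subscriptable;
    only the pieces relevant to query_setup are sketched here.
    """

    def __init__(self, records):
        self.records = records

    def __len__(self):
        return len(self.records)

    def __getitem__(self, i):
        # A real generator would return the query (and blobs) for one record.
        return self.records[i]

    def get_indices(self):
        # Ask query_setup to create an index on Person.id, a property
        # that is also used as a constraint during ingestion.
        return {"entity": {"Person": ["id"]}}

gen = PersonGenerator([{"id": 1}, {"id": 2}])
print(gen.get_indices())  # {'entity': {'Person': ['id']}}
```

The "connection" form works the same way, keyed by connection class instead of entity class.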
ingest
def ingest(generator: Subscriptable,
batchsize: int = 1,
numthreads: int = 4,
stats: bool = False) -> None
Method to ingest data into the database
Arguments:
generator
Subscriptable - The list of data, or a class derived from Subscriptable, to be ingested.
batchsize
int, optional - The size of each batch. Defaults to 1.
numthreads
int, optional - The number of workers to create. Defaults to 4.
stats
bool, optional - Whether to display statistics in real time. Defaults to False.
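A typical ingest call follows the signature above. To keep this sketch self-contained (a real call needs a database connection), a stand-in loader that merely records the call is used here; with a live connection you would construct aperturedb's ParallelLoader instead.

```python
class FakeParallelLoader:
    """Stand-in that records the ingest call instead of touching a database.

    It mirrors the documented ingest signature so the call site reads the
    same as it would with a real ParallelLoader.
    """

    def __init__(self):
        self.calls = []

    def ingest(self, generator, batchsize=1, numthreads=4, stats=False):
        self.calls.append({
            "n_items": len(generator),
            "batchsize": batchsize,
            "numthreads": numthreads,
            "stats": stats,
        })

loader = FakeParallelLoader()
records = [{"id": i} for i in range(100)]
loader.ingest(records, batchsize=10, numthreads=4, stats=True)
print(loader.calls[0]["batchsize"])  # 10
```

Larger batch sizes reduce per-query overhead at the cost of bigger individual requests; the best value depends on object size and server configuration.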