ParallelQuery Objects
class ParallelQuery(Parallelizer.Parallelizer)
Parallel and Batch Querier for ApertureDB
This class provides the abstraction for partitioning data into batches, so that they may be processed using different threads.
Arguments:
client
Connector - The database connector.
dry_run
bool, optional - Whether to run in dry run mode. Defaults to False.
generate_batch
def generate_batch(
data: List[Tuple[Commands, Blobs]]) -> Tuple[Commands, Blobs]
Here we flatten the individual queries to run them as a single query in a batch. We also update the _ref values and connection refs.
Arguments:
data
list[tuple[Query, Blobs]] - The data to be batched. Each tuple contains a list of commands and a list of blobs.
Returns:
commands
Commands - The batched commands.
blobs
Blobs - The batched blobs.
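The flattening and ref renumbering can be illustrated with a small, self-contained sketch. Plain dicts stand in for ApertureDB commands, and the offset rule is an assumption inferred from the description above, not the library's actual implementation:

```python
def generate_batch(data):
    """Flatten per-item (commands, blobs) pairs into one batch,
    offsetting _ref values so they stay unique across the batch."""
    commands, blobs = [], []
    ref_offset = 0
    for item_commands, item_blobs in data:
        max_ref = 0
        for cmd in item_commands:
            # each command is shaped like {"CommandName": {...params...}}
            name, params = next(iter(cmd.items()))
            new_params = dict(params)
            if "_ref" in new_params:
                new_params["_ref"] += ref_offset
                max_ref = max(max_ref, params["_ref"])
            # a command may connect to an earlier _ref; shift that too
            if "ref" in new_params.get("connect", {}):
                new_params["connect"] = dict(new_params["connect"])
                new_params["connect"]["ref"] += ref_offset
            commands.append({name: new_params})
        ref_offset += max_ref
        blobs.extend(item_blobs)
    return commands, blobs
```

Two items that each use `_ref: 1` internally come out of the batch with refs 1 and 2, so the combined transaction stays unambiguous.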
do_batch
def do_batch(client: Connector, batch_start: int,
data: List[Tuple[Commands, Blobs]]) -> None
Executes a batch of queries and blobs against the database.
Arguments:
client
Connector - The database connector.
batch_start
int - The index of the first item in this batch.
data
list[tuple[Commands, Blobs]] - The data to be batched. Each tuple contains a list of commands and a list of blobs.
This method also provides a way to invoke a user-defined function to handle the responses of each of the queries executed. This function can be used to process the responses from each of the corresponding queries in Parallelizer. It will be called once per query, and it needs to have 4 parameters:
- requests
- input_blobs
- responses
- output_blobs
Example usage:
class MyQueries(QueryGenerator):
    def process_responses(self, requests, input_blobs, responses, output_blobs):
        self.requests.extend(requests)
        self.responses.extend(responses)

loader = ParallelLoader(self.client)
generator = MyQueries()
loader.ingest(generator)
query
def query(generator,
batchsize: int = 1,
numthreads: int = 4,
stats: bool = False) -> None
This function takes as input the data to be executed in the specified number of threads. The generator yields a tuple: (array of commands, array of blobs).
Arguments:
generator
type - The class that generates the queries to be executed.
batchsize
int, optional - Number of queries per transaction. Defaults to 1.
numthreads
int, optional - Number of parallel workers. Defaults to 4.
stats
bool, optional - Show statistics at end of ingestion. Defaults to False.
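A minimal sketch of the generator contract described above: each item is a tuple of a command list and a blob list, and `process_responses` receives the per-query results. The class name, query content, and the sequence-style (`__len__`/`__getitem__`) interface are assumptions for illustration, not the library's required API:

```python
class FindPersonQueries:
    """Hypothetical generator: n identical FindEntity queries."""

    def __init__(self, n=100):
        self.n = n
        self.responses = []

    def __len__(self):
        return self.n

    def __getitem__(self, i):
        commands = [{"FindEntity": {"with_class": "Person",
                                    "results": {"limit": 1}}}]
        blobs = []  # no binary payload for this query
        return commands, blobs

    def process_responses(self, requests, input_blobs, responses, output_blobs):
        # called once per executed query with its request/response pair
        self.responses.extend(responses)

# Against a live instance this would run as (not executed here):
#   client = Connector("localhost", user="admin", password="admin")
#   ParallelQuery(client).query(FindPersonQueries(), batchsize=10,
#                               numthreads=4, stats=True)
```

With `batchsize=10` and `numthreads=4`, the 100 queries would be grouped into 10 transactions spread across 4 worker threads.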
debug_sample
def debug_sample(**kwargs) -> None
Sample the data to be ingested for debugging purposes.