Parallelism

The degree of parallelism for a query refers to the number of subplans that the database server executes in parallel to run the query. For example, a two-table join that six threads execute (with each thread executing one sixth of the required processing) has a higher degree of parallelism than one that two threads execute.

The database server determines the best degree of parallelism for each component of a PDQ query, based on various considerations: the number of available coservers, the number of virtual processors (VPs) on each coserver, the fragmentation of the tables that are being queried, the complexity of the query, and so forth.

Extended Parallel Server achieves a high degree of parallelism because SQL operations are completely parallel. Completely parallel means that the database server processes multiple threads simultaneously on all CPU VPs across all coservers to speed execution of a single query.

The value of PDQPRIORITY does not determine when to use PDQ to process a query in parallel. Even when the value of PDQPRIORITY is 0, the database server executes a query in parallel across all CPU VPs on all coservers.

Important:

In Extended Parallel Server, PDQPRIORITY does not affect the degree of parallelism. PDQPRIORITY values that are set by the database server administrator, by the user, and by the client application affect only the amount of memory available for parallel processing.

PDQ provides performance advantages on parallel-processing platforms composed of multiple computers. On a parallel-processing platform, PDQ distributes the execution of a query across available processors on all nodes that support coservers, and takes full advantage of the memory on each of those nodes.

When the connection coserver determines that a query requires access to data that is fragmented across coservers, the database server determines which additional coservers are required to participate in the query. It then divides the query plan into subplans for each of the participating coservers. This division is based on the fragmentation scheme of the tables and the availability of resources on the connection coserver and the participating coservers.
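The division into subplans can be pictured with a minimal Python sketch. The catalog `FRAGMENT_MAP`, the coserver numbers, and both function names are hypothetical and are not part of the database server. The sketch considers only the fragmentation scheme and ignores resource availability, which the server also takes into account.

```python
# Hypothetical catalog: table name -> ids of the coservers that hold its fragments.
FRAGMENT_MAP = {
    "customer": {1, 2, 3},
    "cash": {2, 3, 4},
    "region": {1},
}

def participating_coservers(tables, fragment_map):
    """Return the coservers that hold at least one fragment of a queried table."""
    coservers = set()
    for table in tables:
        coservers |= fragment_map[table]
    return sorted(coservers)

def build_subplans(tables, fragment_map):
    """Map each participating coserver to the tables it must scan locally."""
    return {
        coserver: [t for t in tables if coserver in fragment_map[t]]
        for coserver in participating_coservers(tables, fragment_map)
    }
```

In this sketch, a query on `customer` and `cash` involves coservers 1 through 4, and coserver 4 receives a subplan that scans only its `cash` fragment.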

Extended Parallel Server distributes each subplan to the pertinent coservers and executes the subplans in parallel. Each subplan is processed simultaneously with the others. Because each subplan represents a smaller amount of processing time than the original query plan, the database server can drastically reduce the time that is required to process the query, compared with the time that would be required if each portion of the query had to be performed consecutively.
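As a rough illustration of why parallel subplans shorten elapsed time, the following Python sketch compares two idealized estimates. The function names and the sample timings are assumptions made for illustration. The model ignores coordination and data-transfer overhead, so real gains are smaller.

```python
def sequential_time(subplan_times):
    """Elapsed time if the subplans run one after another."""
    return sum(subplan_times)

def parallel_time(subplan_times):
    """Idealized elapsed time if all subplans run at once:
    the query finishes when the slowest subplan finishes."""
    return max(subplan_times)

# Hypothetical subplan durations in seconds, one per coserver.
times = [4, 5, 3]
print(sequential_time(times), parallel_time(times))
```

Under this model the parallel estimate is bounded by the slowest subplan, so evenly sized fragments give the largest benefit.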

Parallel execution is extremely useful for decision support queries in which large volumes of data are scanned, joined, and sorted across multiple coservers.

For example, consider the following SQL request:

SELECT geo_id, SUM(dollars)
   FROM customer a, cash b
   WHERE a.cust_id = b.cust_id
   GROUP BY geo_id
   ORDER BY SUM(dollars)

In this example, the connection and participating coservers perform the following tasks:

Each coserver scans relevant fragments of the customer table and the cash table in parallel.
Each coserver joins rows from local fragments of both the customer table and the cash table by customer ID.
As participating coservers complete local join operations, they can go on to perform other portions of the join operation or aggregations. They can also perform some of the steps that are involved in selecting the geographic areas and dollar amounts that belong to particular customers, the group-by operations, and the order-by operations that are needed to complete the query.
When the query is complete, the connection coserver returns the results to the client.
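The tasks above can be sketched as a small Python simulation. It is not how the database server is implemented. The sample rows, the function names, and the use of threads to stand in for coservers are all assumptions. The sketch also assumes that both tables are fragmented on cust_id, so that matching rows sit on the same coserver; without that, rows would have to be exchanged between coservers before the join.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sample data: one fragment of each table per "coserver".
CUSTOMER_FRAGMENTS = [
    [(1, "east"), (3, "west")],    # coserver 0: (cust_id, geo_id)
    [(2, "east"), (4, "north")],   # coserver 1
]
CASH_FRAGMENTS = [
    [(1, 100), (3, 50), (1, 25)],  # coserver 0: (cust_id, dollars)
    [(2, 75), (4, 10), (2, 5)],    # coserver 1
]

def run_subplan(customer_rows, cash_rows):
    """Scan, join, and partially aggregate one coserver's local fragments."""
    geo_by_cust = dict(customer_rows)       # build side of a local hash join
    partial = defaultdict(int)
    for cust_id, dollars in cash_rows:      # probe side
        geo_id = geo_by_cust.get(cust_id)
        if geo_id is not None:
            partial[geo_id] += dollars      # local SUM(dollars) per geo_id
    return dict(partial)

def run_query(customer_fragments, cash_fragments):
    """Run one subplan per fragment pair in parallel, then merge and sort."""
    with ThreadPoolExecutor(max_workers=len(customer_fragments)) as pool:
        partials = list(pool.map(run_subplan, customer_fragments, cash_fragments))
    totals = defaultdict(int)
    for partial in partials:                # final GROUP BY: merge partial sums
        for geo_id, dollars in partial.items():
            totals[geo_id] += dollars
    # ORDER BY SUM(dollars), ascending
    return sorted(totals.items(), key=lambda item: item[1])
```

Calling `run_query(CUSTOMER_FRAGMENTS, CASH_FRAGMENTS)` returns the per-region totals in ascending order of the summed dollars, which plays the role of the connection coserver returning results to the client.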