SLIDE 12.2 - Chapter 12: Query Processing Show slide p 537 * Query processing involves * translations of sql statements into reading actual blocks from disk * optimizations and transformations * evaluations of queries * Covering only the Overview, Measures, and Joins SLIDE 12.3 - Basic Steps in Query Processing Show slide p 537 SELECT salary FROM instructor WHERE salary < 75000 SLIDE 12.4 - Basic Steps in Query Processing Show slide p 538 * Parsing Check syntax relations & attribute names Create parse-tree representation of the query Translate into relational algebra SLIDE 12.5 - Basic Steps in Query Processing: Optimization Show slide p 539 SELECT salary FROM instructor WHERE salary < 75000 READ 1 * May be many ways to represent the same query in relational algebra READ 2 * Same relational algebra expression may be executed by different algorithms READ 3 * Annotations may be added to the RA expression to specify which algorithm should be used Called EVALUATION PRIMITIVE Sequence of evaluation primitives is called a QUERY-EXECUTION plan or QUERY-EVALUATION plan SLIDE 12.6 - Basic Steps in Query Processing: Optimization Show slide p 539 READ 1 * Optimization Users writing SQL shouldn't have to worry about this READ 2 READ 3 SLIDE 12.7 - Measures of Query Cost Show slide p 540 * Evaluate plans according to their cost How do you estimate the cost ? READ 1 * Focus on number of block transfers * What makes up the cost of a disk block access ? Seek time (on avg 4 ms) Rotational Latency * Block Transfer time (on avg 0.1 ms per 4k block) READ 2 SLIDE 12.8 - Measures of Query Cost Show slide p 540 READ 1 READ 2 * CPU costs depends on many other factors and is very fast compared to disk IO SLIDE 12.9 - Measures of Query Cost Show slide p 541 READ SLIDE * * * SKIP AHEAD TO JOINS * * * SLIDE 12.23 - Join Operation Show slide p 549 READ SLIDE SLIDE 12.24 - Nested-Loop Join Show slide p 550 * This join algorithm consists of a pair of nested loops POINT 1 * theta is a join condition * for each row in r read each row in s if the 2 rows satisfy the join condition Theta Combine the 2 rows Add them to the result set * natural join theta join on attributes of the same name delete repeated attributes columns in result READ 2 READ 3 READ 4 SLIDE 12.25 - Nested-Loop Join Show slide p 550 POINT 1 Nr is number of tuples in r Br is the number of blocks to hold all tuples in r Bs is the number of blocks to hold all tuples in s * For each record in r, we have to perform a complete scan on s READ 1 * Seeks * Need 1 seek for each block of r * Need 1 seek on s for each record in r Costs only 1 seek for each scan of s because records are read sequentially READ SLIDE SLIDE 12.26 - Block Nested-Loop Join Show slide p 551 * Improve performance if we process relations on a per-block basis rather than on a per-tuple basis * Every block of the inner relation is paired with every block of the outer relation * Within each pair of blocks every tuple in one block is paired with every tuple in the other block SLIDE 12.27 - Block Nested-Loop Join Show slide p 552 READ 1 * Doing a cross product of the blocks Read block from r Read ALL blocks from s READ 2 * If all the blocks fit into memory READ 3 SLIDE 12.28 - Indexed Nested-Loop Join Show slide p 553 READ 1 READ 2 * Go read tuple from data block whose index entry matced the join condition READ 3 * Room for 1 data block of r and one index block for s READ 4 SLIDE 12.29 - Example of Indexed Nested-Loop Join Costs Show slide p 553 READ 1-4 READ 5 * Transfers: Read 1 block of takes once for each student 400 blocks of takes * 100 blocks of student * Seeks: Seek once for each scan of inner relation Seek once for each block of outer relation 2 * 100 blocks of students READ 6 * Cut I/O in half SLIDE 12.30 - Merge-Join Show slide p 553 SLIDE 12.31 - Merge Merge-Join Show slide p 557 READ 1-3 * Bb is the number of buffer blocks READ 4 SLIDE 12.32 - Hash-Join Show slide p 557 * Value in r is hashed to one of its partitions * Equal value in s is hashed to one of its partitions * Only those 2 partitions need to be checked SLIDE 12.33 - Hash-Join Show slide p 558 SLIDE 12.34 - Hash-Join Show slide p 558 SLIDE 12.35 - Hash-Join Algorithm Show slide p 558 SLIDE 12.36 - Hash-Join Algorithm Show slide p 559 SLIDE 12.37 - Handling of Overflows Show slide p 560 SLIDE 12.38 - Cost of Hash Hash-Join Show slide p 561 READ 1 SLIDE 12.39 - Cost of Hash Hash-Join Show slide p 561 SLIDE 12.40 - Cost of Hash Hash-Join Show slide p 562 SLIDE 12.41 - Cost of Hash Hash-Join Show slide p 563 * ANDs READ 1 * ORs READ 2