Dani's IT Blog

…moving from Oracle to Software Development

Oracle Exadata performance revealed – SmartScan – Part V


As I said in my first post, Oracle can optimize queries through row and column filtering. Until now, I have only shown how row filtering works.

Column filtering is an interesting option to reduce the amount of data that has to be transferred. In most queries we only need a subset of a table's columns. If we need only two out of twenty columns of a table and the storage sent all columns to the database, then roughly 80% of the data received would be trashed immediately. The request sent from the database to the storage cells specifies the needed rows and columns, and based on this information the Exadata cells send back data only for the needed columns. As we already know, Oracle uses the result set as a container for the data, so sending only specific columns is fairly easy. Enough theory, let's have a look at this.

Until now I always used SELECT * to make sure that column projection does not kick in. The execution plan does not contain any information on whether column filtering is or could be used. Basically, it can always be used if SmartScan is enabled and not all columns are requested by the database. If we want to know whether column filtering was used, we have to look at the session statistics, specifically the value of the statistic 'cell physical IO interconnect bytes'.
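One way to read that statistic for the current session is to join v$mystat with v$statname. This is just a sketch; the figures shown below were taken this way after each test query had finished:

```sql
-- Current session's value of the SmartScan interconnect statistic.
-- Read it before and after the test query; the delta is the amount
-- of data sent over the interconnect for that query.
SELECT sn.name, ms.value
FROM   v$mystat   ms
JOIN   v$statname sn ON sn.statistic# = ms.statistic#
WHERE  sn.name = 'cell physical IO interconnect bytes';
```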

All queries use the table opitz.huge_customers.

SQL> desc opitz.huge_customers
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      ID                                       NUMBER
    2      FIRST_NAME                               VARCHAR2(50 CHAR)
    3      LAST_NAME                                VARCHAR2(50 CHAR)
    4      COUNTRY                                  VARCHAR2(2 CHAR)
    5      IS_A_CLASS                               VARCHAR2(1)

First I issue a SELECT * to see how much data is transferred when all columns are selected.

SELECT * FROM opitz.huge_customers WHERE is_a_class = 'Y';

Cell physical IO interconnect bytes shows that 340 MB were transferred over the interconnect.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes        357090752  SQL

What if we select only one column?

SELECT last_name FROM opitz.huge_customers WHERE is_a_class = 'Y';

Of course, less data is transferred.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes        185055792  SQL

When we select all columns, 340MB have to be sent over the interconnect, whereas selecting only the column last_name reduces the amount of data that has to be sent to 176MB. Therefore only about half the amount of data needs to be transferred. Can Exadata reduce the data further if a column with a smaller data type is chosen?

SELECT is_a_class FROM opitz.huge_customers WHERE is_a_class = 'Y'; 

Yes, as we can see, even less data was transferred.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes         53693936  SQL

This time a bit more than 50MB were transferred. We can clearly see that it matters which columns are transferred. We know that the result set will cause some overhead. Let's calculate it to get an idea of how efficient it is.

The is_a_class column takes 1 Byte to be stored. We selected 4259648 rows, so 4259648 Bytes of payload have to be transferred. Of course some additional data has to be added so that the database knows how to interpret it. 4259648 Bytes is 4MB, and 51MB were transferred. Therefore the overhead per row was (51MB - 4MB) / 4259648 rows ≈ 11 Bytes/row.

Does the overhead change when selecting another column? For the last_name column, we transferred 185055792 Bytes for the same number of rows. I know that the column last_name only contains values of 32 Bytes (I filled the column with randomly generated 32-Byte strings). Therefore 4259648 * 32 Bytes = 136308736 Bytes, or 129MB, of payload have to be transferred. 176MB minus 129MB is again 47MB for 4259648 rows, and therefore again an overhead of 11 Bytes per row.
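The per-row overhead figures can be recomputed directly in SQL. The byte counts are the statistic values shown above, 4259648 is the row count, and 1 and 32 Bytes are the payload sizes of is_a_class and last_name:

```sql
-- (bytes over the interconnect - payload bytes) / rows, truncated
SELECT TRUNC((53693936  - 4259648 * 1)  / 4259648) AS is_a_class_overhead,
       TRUNC((185055792 - 4259648 * 32) / 4259648) AS last_name_overhead
FROM   dual;
-- both come out at roughly 11 Bytes per row
```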

Although this overhead calculation might seem accurate, it isn't. To get a better understanding, more tests would be needed, like changing the number of columns selected and changing the number of rows selected. But I'm quite sure that a good part of this overhead is per row. I don't think an overall result-set overhead of more than a few kilobytes is realistic, but I can imagine that some additional data is needed to separate a VARCHAR2 data field from its successor. The main message of this overhead calculation is: an overhead exists, and the optimizer might take this fact into account when it has to decide whether to offload or not.


Written by danirey

March 31, 2011 at 06:07

2 Responses


  1. great series Dani, I really enjoyed reading all of them. Thanks for the info.

    I also wish you good luck with the trip, hope you enjoy it very much.

    coskan gundogar

    April 29, 2011 at 00:28

    • Hi Coskan

      Thanks for the compliment


      May 3, 2011 at 14:50
