Dani's IT Blog

…moving from Oracle to Software Development

Oracle Exadata performance revealed – SmartScan – Part II


In the last post I gave a brief introduction to Oracle Exadata and a first hint at how the massive performance improvements are achieved.

Let us think through what Oracle had to do to be able to filter data at the storage level.

Normally, communication between Oracle and the storage server happens at the block level: Oracle tells the storage system which blocks it needs, and the storage sends them to the database server. With such a communication model, the storage cannot know how the data could be pre-filtered. Oracle therefore had to find a way to send additional information to the storage server, which led to a new protocol called iDB. It is used for the communication between the database server and the storage cells. Among other things, the added data contains the following information:

  • db_unique_name
  • instance name
  • consumer group
  • read mirror copy yes/no
  • sqlid
  • SQL statement or relevant parts of it

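As a rough sketch of what such a request might carry, the metadata above could be modeled like this. Note that the field names and values below are purely illustrative; the actual iDB wire format is not public.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical model of the metadata an iDB request might carry.
# Field names are illustrative, not the real iDB wire format.
@dataclass
class IDBRequest:
    db_unique_name: str
    instance_name: str
    consumer_group: str
    read_mirror_copy: bool           # may the mirror copy be read?
    sql_id: str
    predicate: Optional[str] = None  # SQL statement or relevant parts of it

req = IDBRequest(
    db_unique_name="EXDB",
    instance_name="EXDB1",
    consumer_group="BATCH",
    read_mirror_copy=False,
    sql_id="7h35uxf5uhmm1",
    predicate="WHERE amount > 1000",
)
print(req.sql_id)  # → 7h35uxf5uhmm1
```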
But the new protocol itself is not sufficient. We now have the SQL statement, or at least parts of it, on the storage cell (I will try to investigate this in more detail in the future). But what comes next?
Of course the cell software has to separate needed from unneeded data. To do so, it must be able to read Oracle blocks; this is why the cell software contains parts of the Oracle Database kernel.
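The idea of storage-side filtering can be illustrated with a toy example. This is not the real cellsrv implementation; it simply shows the principle: given rows already decoded from Oracle blocks, keep only the matching rows and only the requested columns.

```python
# Toy illustration of the SmartScan filtering idea, not the real cell
# software: rows decoded from Oracle blocks are reduced to the matching
# rows and the requested columns before anything is sent back.
rows = [
    {"id": 1, "amount": 500,  "note": "a"},
    {"id": 2, "amount": 1500, "note": "b"},
    {"id": 3, "amount": 2500, "note": "c"},
]

def smart_scan(rows, predicate, columns):
    """Return only the needed columns of the rows matching the predicate."""
    return [{c: r[c] for c in columns} for r in rows if predicate(r)]

result = smart_scan(rows, lambda r: r["amount"] > 1000, ["id", "amount"])
print(result)  # only rows 2 and 3, without the "note" column
```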

The filtered data is transferred back to the database using the iDB protocol as well. But what now? How can we put this information into the buffer cache?

Let us have a look at the structure of an Oracle block. The one shown below contains part of a heap table.

[Figure: Oracle block containing parts of a heap table]

As we can see, the data area is only a subpart of the block. By extracting single rows, the data can be reduced even further. But how can this information be stored in the buffer cache? Remember, the buffer cache contains only complete Oracle blocks. That is also why the database has to maintain multiple buffer caches if it uses more than one block size.
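To get a feel for the potential savings, here is a back-of-the-envelope calculation. All numbers (block size, header overhead, row count, and how many rows the query needs) are assumptions for illustration, not measurements.

```python
# Back-of-the-envelope savings estimate; all numbers are assumptions.
block_size = 8192          # bytes in one Oracle block (8 KB default)
header_overhead = 200      # rough block header/metadata bytes (assumed)
rows_per_block = 100       # assumed
matching_rows = 2          # rows the query actually needs (assumed)

row_size = (block_size - header_overhead) / rows_per_block
bytes_shipped_block = block_size                  # block-level protocol
bytes_shipped_filtered = matching_rows * row_size # row-level filtering

print(f"full block: {bytes_shipped_block} B, "
      f"filtered rows: {bytes_shipped_filtered:.0f} B")
```

With these made-up numbers, shipping two filtered rows instead of the whole block cuts the transferred volume by roughly a factor of fifty.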

Sending the complete block is not what we want. What is the alternative?
What about sending only the needed rows to the database and letting it fill the gaps with NULLs to create a complete block?
This would unnecessarily pollute the buffer cache with NULLs, and the database would have to take special care never to write such a block back to the storage.

Is there a better solution?
Oracle could create an artificial block containing only the needed rows gathered from multiple blocks. But what would we gain by doing so?
Although the buffer cache would no longer get polluted with NULLs, the database would still have to take special care never to write this block to the storage.

Ok, let's take it one step further. Of what use would these artificial blocks in the buffer cache be?
Only the original query, or perhaps queries selecting subsets of the original one, could ever use these blocks again. Because the data is filtered specifically for the query at row and column level, the chances that it can be reused are very low. It therefore does not make sense to keep such blocks in the buffer cache at all.

So what’s the solution then?
Once again, Oracle had the chance to reuse some "old" technology (please consider this praise for the good decisions the Oracle developers made in the past). The direct read, usually used for parallel reads, puts a result set into the PGA instead of placing the read blocks into the buffer cache. Because only the result set is saved, no unnecessary data pollutes the memory. By using the PGA instead of the SGA, special memory structures such as a result set become possible, and the memory can be freed as soon as the requester has finished reading it. To write the data into the PGA, the cells use RDMA (remote direct memory access). Without RDMA, additional memory copies would be needed, and a processor in the database server would have to handle them. Thanks to RDMA, the processors of the database server should not be involved at all.
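The contrast between the two paths can be sketched as follows. This is a toy model, not Oracle internals: a conventional read leaves the whole block in a shared cache, while a direct read hands back only a process-private result set that can be freed once consumed.

```python
# Toy contrast (not Oracle internals): a conventional read leaves the
# whole block in a shared cache, while a direct read returns only a
# per-process result set that is freed once the reader is done.
shared_buffer_cache = {}   # stands in for the SGA buffer cache

def read_via_buffer_cache(block_id, block):
    shared_buffer_cache[block_id] = block   # whole 8 KB block stays cached
    return block

def direct_read(filtered_rows):
    # Result set goes to process-private memory (stands in for the PGA);
    # nothing enters the shared cache, and the list is garbage-collected
    # as soon as the caller drops its reference.
    return list(filtered_rows)

read_via_buffer_cache(42, b"\x00" * 8192)         # conventional path
result_set = direct_read([(2, 1500), (3, 2500)])  # direct-read path

print(len(shared_buffer_cache), len(result_set))  # → 1 2
```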

Now you should have some knowledge of the underlying architecture of SmartScan. In the next post I will show how predicate offloading works.


Written by danirey

March 2, 2011 at 17:56
