Dani's IT Blog

…moving from Oracle to Software Development

Posts Tagged ‘Oracle’

Oracle Exadata performance revealed – SmartScan – Part V

with 2 comments

As I said in my first post, Oracle can optimize queries through row and column filtering. Until now I have only shown how row filtering works.

Column filtering is an interesting option to reduce the amount of data which has to be transferred. Most queries only need a subset of the columns of a table. If we need only two out of twenty columns and the storage sends all columns to the database, then roughly 80% of the data received is immediately thrown away. The request sent from the database to the storage cells specifies the needed rows and columns, and based on this information the Exadata Cells only send data for the needed columns back to the database. As we already know, Oracle uses the result set as the container for this data, so sending only specific columns is fairly easy. Enough theory, let’s have a look at this in practice.

Until now I always used SELECT * to ensure that column projection did not kick in. The execution plan does not contain any information about whether column filtering is or could be used. Basically it can always be used if SmartScan is enabled and the database does not request all columns. If we want to know whether column filtering was actually used, we have to look at the session statistics; the value of ‘cell physical IO interconnect bytes’ tells us what we need.

All queries use the table opitz.huge_customers.

SQL> desc opitz.huge_customers
           Name                            Null?    Type
           ------------------------------- -------- ----------------------------
    1      ID                                       NUMBER
    2      FIRST_NAME                               VARCHAR2(50 CHAR)
    3      LAST_NAME                                VARCHAR2(50 CHAR)
    4      COUNTRY                                  VARCHAR2(2 CHAR)
    5      IS_A_CLASS                               VARCHAR2(1)

First I issue a SELECT * to establish how much data is transferred when all columns are requested.

SELECT * FROM opitz.huge_customers WHERE is_a_class = 'Y';

Cell physical IO interconnect bytes shows that 340 MB were transferred over the interconnect.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes        357090752  SQL

What if we select only one column?

SELECT last_name FROM opitz.huge_customers WHERE is_a_class = 'Y';

Of course less data is transferred.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes        185055792  SQL
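The ratio between the two runs can be double-checked with plain arithmetic; the values below are just the two statistic readings quoted above, redone in a few lines of Python.

```python
# Compare the interconnect traffic of the two runs. These are the raw
# 'cell physical IO interconnect bytes' values measured above.
all_columns = 357090752   # SELECT * run
one_column = 185055792    # SELECT last_name run

# Fraction of the full-row traffic that remains when only one column is sent.
print(round(one_column / all_columns, 2))  # 0.52, i.e. about half the data
```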

When we select all columns, 340MB have to be sent over the interconnect, whereas selecting only the column last_name reduces the amount of data to about 176MB. Therefore only half the amount of data needs to be transferred. Can Exadata further reduce the data if a column with a smaller data type is chosen?

SELECT is_a_class FROM opitz.huge_customers WHERE is_a_class = 'Y'; 

Yes, as we can see even less data was transferred.

NAME                                        VALUE     WAIT_CLASS
------------------------------------------ ---------- -----------
cell physical IO interconnect bytes         53693936  SQL

This time a bit more than 50MB were transferred. We can clearly see that it matters which columns are transferred. We know that the result set causes some overhead. Let’s calculate it to get an idea of how efficient it is.

The is_a_class column takes 1 Byte to store. We selected 4259648 rows, therefore 4259648 Bytes of payload have to be transferred. Of course some additional data has to be added so that the database knows how to interpret the data. 4259648 Bytes is 4MB, and 51MB were transferred. Therefore the overhead per row was (51MB - 4MB) / 4259648 rows ≈ 11 Bytes/row.

Does the overhead change when selecting another column? For the last_name column, we transferred 185055792 Bytes for the same number of rows. I know that the column last_name only contains values of 32 Bytes (I filled the column with randomly generated 32 Byte long strings). Therefore 4259648 * 32 Bytes = 136308736 Bytes or 130MB of payload have to be transferred. 176MB minus 130MB is again about 46MB for 4259648 rows, and therefore again an overhead of roughly 11 Bytes per row.
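These per-row figures are easy to verify. The following is plain arithmetic on the statistic values measured above, nothing Exadata-specific:

```python
# Recompute the result-set overhead per row from the measured
# 'cell physical IO interconnect bytes' values of the two runs.
ROWS = 4259648  # rows returned by both queries

# Run 1: SELECT is_a_class (1 byte of payload per row)
interconnect_1 = 53693936
overhead_1 = (interconnect_1 - ROWS * 1) / ROWS

# Run 2: SELECT last_name (32 bytes of payload per row)
interconnect_2 = 185055792
overhead_2 = (interconnect_2 - ROWS * 32) / ROWS

print(round(overhead_1, 1))  # 11.6 bytes/row
print(round(overhead_2, 1))  # 11.4 bytes/row
```

Both runs land at roughly the same per-row overhead, which supports the assumption that the overhead scales with the row count rather than being a fixed per-result-set cost.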

Although this overhead calculation might seem accurate, it isn’t. To get a better understanding, more tests would be needed, like changing the number of columns selected and changing the number of rows selected. But I’m quite sure that a good part of this overhead is per row. I don’t think an overall result set overhead of more than a few kilobytes is realistic. But I can imagine that some additional data is needed to separate a VARCHAR2 data field from its successor. The main message of this overhead calculation is that an overhead exists, and the optimizer might take this fact into account when it has to decide whether to offload or not.


Written by danirey

March 31, 2011 at 06:07

Oracle Exadata performance revealed – SmartScan – Part IV

leave a comment »

In my last post I was writing about predicate offloading. I showed that it is possible to offload a filter such as WHERE columnX equals Y. But how useful is this in real life? Most queries are more complex than this. Many use functions, such as upper, length, substr, mod, ceil, abs, and so on.

So what happens, if a function is part of a predicate? Does the Exadata Cell know about all of them?

No. But it knows about a lot of them. You can use any Oracle 11g database to find out which ones can be handled by the Exadata Cells and therefore don’t prevent offload processing. This information is part of the data dictionary.

SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';

In addition, all those functions can be combined with the usual comparison and logical operators.


To get an idea of how well it works, I made a simple test using the MOD function. The MOD function calculates [first parameter] modulo [second parameter]. As the ID column is unique, the following query will return every 117th row.

SELECT * FROM opitz.huge_customers WHERE mod(id,117) = 0;
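The “every 117th row” claim is easy to check in miniature. The snippet below uses a toy ID range as a stand-in for a dense, unique ID column (not the real table):

```python
# Toy stand-in for a dense, unique ID column starting at 1.
ids = range(1, 1_000_001)
matches = [i for i in ids if i % 117 == 0]

print(len(matches))     # 8547
print(len(ids) // 117)  # 8547, i.e. exactly every 117th row survives the filter
```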

What did the execution plan look like?

| Id  | Operation                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT          |                |   639K|    46M|   196K  (1)| 00:39:17 |
|*  1 |  TABLE ACCESS STORAGE FULL| HUGE_CUSTOMERS |   639K|    46M|   196K  (1)| 00:39:17 |

Predicate Information (identified by operation id):

   1 - storage(MOD("ID",117)=0)

The line storage(MOD("ID",117)=0) in the Predicate Information indicates that offload processing could be used for this query. As we already know, we have to look at the wait events to know whether offload processing was actually used. As you can see from the first part of the tracefile, offload processing was used.

PARSING IN CURSOR #2 len=86 dep=0 uid=0 oct=3 lid=0 tim=1289240657920538 hv=3576726616 ad='22674a1e0' sqlid='f4d2btvam112s'
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM opitz.huge_customers WHERE mod(id,117) = 0
PARSE #2:c=2000,e=2852,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=4233741383,tim=1289240657920537
EXEC #2:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4233741383,tim=1289240657920643
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1289240657920667
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657921078
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657921136
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 179 name|mode=1263468550 2=131153 0=1 obj#=-1 tim=1289240657921343
WAIT #2: nam='reliable message' ela= 951 channel context=10055297152 channel handle=10033718256 broadcast message=10096394424 obj#=-1 tim=1289240657922493
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657922535
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=131153 0=1 obj#=-1 tim=1289240657922650
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657922694
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 114 name|mode=1263468545 2=131153 0=2 obj#=-1 tim=1289240657922835
WAIT #2: nam='cell smart table scan' ela= 136 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657931912
WAIT #2: nam='cell smart table scan' ela= 120 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657934294
WAIT #2: nam='cell smart table scan' ela= 246 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657937276
WAIT #2: nam='cell smart table scan' ela= 1126 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657955228
WAIT #2: nam='cell smart table scan' ela= 1086 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657956385
WAIT #2: nam='cell smart table scan' ela= 1085 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657957539
WAIT #2: nam='cell smart table scan' ela= 654 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657958306
WAIT #2: nam='cell smart table scan' ela= 205 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657958540
WAIT #2: nam='cell smart table scan' ela= 16 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657958584
WAIT #2: nam='cell smart table scan' ela= 14378 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657974236
WAIT #2: nam='cell smart table scan' ela= 2920 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657977313
WAIT #2: nam='cell smart table scan' ela= 1 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657977362
WAIT #2: nam='cell smart table scan' ela= 1 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657977379
FETCH #2:c=23996,e=56800,p=94,cr=103,cu=0,mis=0,r=1,dep=0,og=1,plh=4233741383,tim=1289240657977490

But is there also a way to see how much it helped? We already know that the query returns only every 117th row, therefore only a 117th of the table needs to be returned by the Exadata Cells. Can we verify and confirm this?

Yes, there is such information in the v$sesstat view. Remember that those numbers are cumulative; you have to subtract the values of those statistics before running the query from the values after running the query. The following were my results.

physical read bytes = 5891588096 = 5600 MB
cell physical IO interconnect bytes = 47898864 = 45MB

And 5600MB / 117 ≈ 48MB, which is more than the 45MB actually transferred.
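Redoing that arithmetic with the exact byte values from v$sesstat (plain arithmetic, no Exadata needed) confirms the picture:

```python
# Check the measured interconnect traffic against the 1/117th expectation.
MB = 1024 * 1024
physical_read = 5891588096   # 'physical read bytes'
interconnect = 47898864      # 'cell physical IO interconnect bytes'

expected = physical_read / 117           # if exactly 1/117th of the data came back
print(round(physical_read / MB, 1))      # 5618.7 MB read from disk on the cells
print(round(expected / MB, 1))           # 48.0 MB expected over the interconnect
print(round(interconnect / MB, 1))       # 45.7 MB actually transferred
```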

So reality is even better than the theory. The overhead of the data blocks divided by 117 was bigger than the overhead of the result set 😀

In my next post, I will write about column projection.

Written by danirey

March 9, 2011 at 07:06

Oracle Exadata performance revealed – SmartScan – Part III

with one comment

In the first two posts I described what Exadata is and discussed the new concepts introduced with Exadata.

In this post I will show how predicate offloading works and how to analyze whether the database used SmartScan (offload processing) or not.

Let us start with a very simple query which selects all A Class customers from the customers table. There is no index on the is_a_class column and the table contains approximately 500 million rows.
SELECT * FROM opitz.customers WHERE is_a_class = 'Y';
and have a look at the execution plan

| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT          |           |   500K|    36M|  3100   (1)| 00:00:38 |
|*  1 |  TABLE ACCESS STORAGE FULL| CUSTOMERS |   500K|    36M|  3100   (1)| 00:00:38 |
Query Block Name / Object Alias (identified by operation id):
   1 - SEL$1 / CUSTOMERS@SEL$1
Predicate Information (identified by operation id):
   1 - filter("IS_A_CLASS"='Y')
Column Projection Information (identified by operation id):

This execution plan is without the usage of SmartScan, but the data resides in an Exadata Cell. Therefore we see TABLE ACCESS STORAGE FULL instead of the well known TABLE ACCESS FULL. The word STORAGE indicates that the operation was executed on Exadata storage. I have set cell_offload_processing = false to force this execution plan. But what does the execution plan look like with the parameter cell_offload_processing set to its default value, which is true?

| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
|   0 | SELECT STATEMENT          |           |   500K|    36M|  3100   (1)| 00:00:38 |
|*  1 |  TABLE ACCESS STORAGE FULL| CUSTOMERS |   500K|    36M|  3100   (1)| 00:00:38 |
Query Block Name / Object Alias (identified by operation id):
   1 - SEL$1 / CUSTOMERS@SEL$1
Predicate Information (identified by operation id):
   1 - storage("IS_A_CLASS"='Y')
Column Projection Information (identified by operation id):
26 rows selected.

In the Predicate Information the line storage(“IS_A_CLASS”=’Y’) was added. This line indicates that offload processing could be used for this query. It is very important to know that the decision whether SmartScan is used is a runtime decision! When the optimizer calculates the best execution plan, it only decides whether SmartScan could be used, not whether it will be used! This is very specific to Exadata and often misunderstood.

But how can we know, whether offload processing was used or not?

Oracle added some new wait events which contain this information. A normal multiblock read from Exadata looks as follows
WAIT #33: nam='cell multiblock physical read' ela= 7552 cellhash#=398250101 diskhash#=786031317 bytes=40960 obj#=16639 tim=1286372833389835

A neat detail: we can not only see what was executed, but also which disk of which cell was involved, how many bytes were read and which object was retrieved.

When a SmartScan is done, the ‘cell smart table scan’ event will occur.
WAIT #2: nam='cell smart table scan' ela= 154 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240139259306

As we have seen, the execution plan itself is not enough to know whether a SmartScan happened; only the corresponding wait events can tell the truth. In the next post I will write about predicate offloading when functions are used to evaluate the data.

Written by danirey

March 7, 2011 at 18:01

Oracle Exadata performance revealed – SmartScan – Part I

with 2 comments

This is the start of yet another series of blog posts. This time it will be about Oracle Exadata and its SmartScan.

I just arrived back home from an Exadata Database Machine Administration Workshop with Joel Goodman as trainer. He did a great job; have a look at his blog at oratrain.wordpress.com. Fortunately, after this training I can still stand by what I wrote in a paper about SmartScan for the SOUG (Swiss Oracle User Group) in November last year. Therefore I will put parts of this paper, originally written in German, in English on my blog. As it will not be a simple translation of the original text, it might also be interesting to people who read the original article.

First let’s start with a very brief introduction to Oracle Exadata; readers already familiar with Exadata can skip this section. The Oracle Exadata Database Machine is a database appliance. It contains some servers which run the Oracle Database, usually as a Real Application Cluster, and others which act as storage servers. The storage server software is developed by Oracle and optimized for the Oracle Database. All systems are very powerful and connected to each other over InfiniBand. The current minimum configuration is a cabinet containing two database servers, each with 2x 6-core Intel Xeon CPUs and 96GB of RAM, and three storage servers, usually called Cells in the context of Exadata, each containing 2x 6-core Intel Xeon CPUs, 24GB of RAM, 12 hard disk drives and 384GB of Sun Flash Accelerator storage. Everything is connected together with 40Gbit/s InfiniBand. The smallest Database Machine configuration provides a raw read performance of 5.6 GByte/s from disk and 16 GByte/s from flash storage. For detailed hardware specifications have a look at http://www.oracle.com/us/products/database/exadata/index.html.

Those are quite impressive numbers, but how much does this improve the speed of my application? Oracle claims a performance gain of 10 to 30 times. The biggest improvement achieved by a customer was a factor of 72. But how is this possible? Such massive improvements cannot be achieved just by putting some very fast hardware together, otherwise customers would already have done so. So how does Oracle achieve these performance gains?

To be faster there are only two things you can do: either work faster or do less work. We have already seen what Oracle did to work faster, but how can they reduce the work and still achieve the same result?

In today’s three tier architectures it is standard to push a WHERE clause from the application layer down into the database layer, so that only a subset of the data has to be sent back to the application layer. But what if the storage already did this filtering? It would drastically reduce the payload on the database server and on the network between the storage and the database server. Therefore Oracle wrote an intelligent storage server which is able to apply such filters before sending the data to the database.

Oracle calls these storage level optimizations SmartScan. With SmartScan Oracle can filter the following in the storage layer:

  • an equals predicate filter in the WHERE clause
  • a WHERE clause which filters using a function
  • only returning the needed subset of columns
  • pre-joining tables on storage

In subsequent posts I will show how these optimizations work. In the next post I will show what Oracle had to implement or reuse to make such optimizations possible.

Written by danirey

February 28, 2011 at 17:27

Exadata Presentation

leave a comment »

On November 25, 2010 I will be speaking about Exadata SmartScan at an event of the SOUG SIG Systems Management. The event takes place in Thalwil, near Zürich.

Those interested can register for the event at http://www.soug.ch/events/sig-101125-agenda.html.

Written by danirey

November 11, 2010 at 12:46

The gains and pains of dbms_datapump. Part III

with 3 comments

Imran Khan, a visitor of my blog, asked me how to export a schema using dbms_datapump. While preparing a little example for him, I once again came across an ugly detail in dbms_datapump which I didn’t mention in my earlier posts.

You might receive the following error message when you execute your code the second time.

ERROR at line 1:
ORA-39001: invalid argument value
ORA-06512: at line 42

If it happens the second time you execute your code, then it might be your DBMS_DATAPUMP.ADD_FILE statement which fails.

But what’s the reason? dbms_datapump fails to add the file because it already exists. Unfortunately the error message “ORA-39001: invalid argument value” doesn’t help much in finding the problem. It makes sense that Oracle doesn’t just overwrite an existing file, but they could at least come up with an error message which points towards the problem.

So in a snippet like the following (a typical shape of the failing call; the exact code may differ) it would be line 3 causing the problem:

1  DBMS_DATAPUMP.ADD_FILE(
2    handle =>     handle,
3    filename =>   'hr.dmp',
4    directory =>  'DATA_PUMP_DIR',
5    filetype =>   dbms_datapump.ku$_file_type_dump_file);

To solve it, simply delete the file which should be used as the destination for the export. In my case it’s the file hr.dmp in the DATA_PUMP_DIR.

If you write a logfile and add it before adding the destination file, it’s quite easy to find the problem. The logfile will contain the following lines.

ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/u00/app/oracle/admin/DAR112/dmp/hr.dmp"
ORA-27038: created file already exists
Additional information: 1

To write a logfile, you can add the following lines in your script, directly after the job creation.

SELECT 'datapump_'||to_char(SYSDATE,'YYYYMMDD')||'.log'
  INTO logfileName
  FROM dual;

DBMS_DATAPUMP.ADD_FILE(
  handle =>     handle,
  filename =>   logfileName,
  directory =>  'DATA_PUMP_DIR',
  filetype =>   dbms_datapump.ku$_file_type_log_file);

If you’re interested in more details about dbms_datapump, have a look at my other posts about dbms_datapump.

Written by danirey

June 28, 2010 at 16:57

OCOB – OPITZ CONSULTING Oracle Base – in English

leave a comment »

I’m glad to inform you that OCOB – OPITZ CONSULTING Oracle Base is now available!

[Edit: June 24th 2011]
Unfortunately OPITZ CONSULTING Schweiz GmbH had to declare bankruptcy a few months ago. The German holding and its GmbHs still exist. But before you consider taking OCOB as your new standard tool for database administration, you should ask OPITZ whether they are able and willing to keep up support and improvement of OCOB.

Trivadis has a good and similar product which is available through http://www.trivadis.com/en/produkte/datenbank-tools/tvd-basenvtm.html

Unfortunately I’m not aware of existing open source projects in this area. If you know a good one please tell me about it, I’ll then post a link in this post.

OCOB is a management framework for Oracle server products. It’s mainly used for easy handling of Oracle databases. It can easily show the current status of your database, listener, ASM or application server.

oracle@ltdar2:/u00/app/oracle/product/11.1.0/dbs/ [DAR112] up
Status of Oracle Products on ltdar2.opitz-consulting.int

DB-Inst.  DB2            : down    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DAR102B        : down    (/u00/app/oracle/product/10.2.0_2)
DB-Inst.  DAR920         : down    (/u00/app/oracle/product/9.2.0)
DB-Inst.  DAR102         : open    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DB1            : down    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DAR11BETA      : down    (/u00/app/oracle/product/11.1.0_beta2)
DB-Inst.  DAR111         : down    (/u00/app/oracle/product/11.1.0)
DB-Inst.  DAR112         : up      (/u00/app/oracle/product/11.2.0)

Lsnr                     : down    
oracle@ltdar2:/u00/app/oracle/product/11.1.0/dbs/ [DAR112] 

It can be used on Unix, Linux and Windows.

You can download a copy, free of charge, at http://www.opitz-consulting.com/?id=810

But why do I bore you with this advertising post? First of all, I’m a big fan of it, and I was personally involved in the last development steps.

Written by danirey

May 10, 2010 at 12:20

Posted in English, Oracle, Tipps&Tricks
