Dani's IT Blog

…moving from Oracle to Software Development

Posts Tagged ‘English

Oracle Exadata performance revealed – SmartScan – Part IV

leave a comment »

In my last post I was writing about predicate offloading. I showed that it is possible, to offload a filter such as WHERE columnX equals Y. But how useful is this in real live? Most queries are more complex than this. Many use functions, such as upper, length, substr, mod, ceil, abs, ….

So what happens, if a function is part of a predicate? Does the Exadata Cell know about all of them?

No. But it knows about a lot of them. You can use every Oracle 11g database, to find out which one can be handled by the Exadata Cells and therefore don’t prevent offload processing. This information is part of the data dictionary.

SELECT * FROM v$sqlfn_metadata WHERE offloadable = 'YES';

In addition all those functions can be combined with the following operators.

!=, , =, IS [NOT] NULL, LIKE, [NOT] BETWEEN, [NOT] IN, EXISTS, IS OF type, NOT, AND, OR

To get an idea of how well it works, I made a simple test using the MOD function. The MOD function calculates [first parameter] modulo [second parameter]. As the ID column is unique, the following query will return every 117th row.

SELECT * FROM opitz.huge_customers WHERE mod(id,117) = 0;

What did the execution plan look like?

--------------------------------------------------------------------------------------------
| Id  | Operation                 | Name           | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |                |   639K|    46M|   196K  (1)| 00:39:17 |
|*  1 |  TABLE ACCESS STORAGE FULL| HUGE_CUSTOMERS |   639K|    46M|   196K  (1)| 00:39:17 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - storage(MOD("ID",117)=0)
       filter(MOD("ID",117)=0)

The line
storage(MOD(“ID”,117)=0)
in the Predicate Information, indicates that offload processing could be used for this query. As we already know, we have to have a look at the events, to know whether offload processing was used or not. As you can see, from the first part of the tracefile, offload processing was used.

PARSING IN CURSOR #2 len=86 dep=0 uid=0 oct=3 lid=0 tim=1289240657920538 hv=3576726616 ad='22674a1e0' sqlid='f4d2btvam112s'
SELECT /*+ GATHER_PLAN_STATISTICS */ * FROM opitz.huge_customers WHERE mod(id,117) = 0
END OF STMT
PARSE #2:c=2000,e=2852,p=0,cr=0,cu=0,mis=1,r=0,dep=0,og=1,plh=4233741383,tim=1289240657920537
EXEC #2:c=0,e=23,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=4233741383,tim=1289240657920643
WAIT #2: nam='SQL*Net message to client' ela= 1 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1289240657920667
WAIT #2: nam='ges message buffer allocation' ela= 2 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657921078
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657921136
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 179 name|mode=1263468550 2=131153 0=1 obj#=-1 tim=1289240657921343
WAIT #2: nam='reliable message' ela= 951 channel context=10055297152 channel handle=10033718256 broadcast message=10096394424 obj#=-1 tim=1289240657922493
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657922535
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 88 name|mode=1263468550 2=131153 0=1 obj#=-1 tim=1289240657922650
WAIT #2: nam='ges message buffer allocation' ela= 1 pool=0 request=1 allocated=0 obj#=-1 tim=1289240657922694
WAIT #2: nam='enq: KO - fast object checkpoint' ela= 114 name|mode=1263468545 2=131153 0=2 obj#=-1 tim=1289240657922835
WAIT #2: nam='cell smart table scan' ela= 136 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657931912
WAIT #2: nam='cell smart table scan' ela= 120 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657934294
WAIT #2: nam='cell smart table scan' ela= 246 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657937276
WAIT #2: nam='cell smart table scan' ela= 1126 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657955228
WAIT #2: nam='cell smart table scan' ela= 1086 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657956385
WAIT #2: nam='cell smart table scan' ela= 1085 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657957539
WAIT #2: nam='cell smart table scan' ela= 654 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657958306
WAIT #2: nam='cell smart table scan' ela= 205 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657958540
WAIT #2: nam='cell smart table scan' ela= 16 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657958584
WAIT #2: nam='cell smart table scan' ela= 14378 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657974236
WAIT #2: nam='cell smart table scan' ela= 2920 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240657977313
WAIT #2: nam='cell smart table scan' ela= 1 cellhash#=88802347 p2=0 p3=0 obj#=15043 tim=1289240657977362
WAIT #2: nam='cell smart table scan' ela= 1 cellhash#=398250101 p2=0 p3=0 obj#=15043 tim=1289240657977379
FETCH #2:c=23996,e=56800,p=94,cr=103,cu=0,mis=0,r=1,dep=0,og=1,plh=4233741383,tim=1289240657977490
...

But is there also a way to see, how much it helped. We already know that the query returns only every 117th row, therefore only a 117th of the table needs to be returned by the Exadata Cells. Can we verify and confirm this?

Yes, there is such information in the v$sesstat View. Remember that those numbers are cumulative and you have to subtract the values of those stats before running the query, from the values after running the query. The following where my results.

physical read bytes = 5891588096 = 5600 MB
cell physical IO interconnect bytes = 47898864 = 45MB

And 5600MB / 117 > 45MB

So reality is even better than the theory. Therefore the overhead of the data blocks divided by 117 was bigger then the overhead of the result set 😀

In my next post, I will write about column projection.

Advertisements

Written by danirey

March 9, 2011 at 07:06

Oracle Exadata performance revealed – SmartScan – Part III

with one comment

In the first two posts I described what Exadata is and discussed the new concepts introduced with Exadata.

In this post I will show, how the predicate offloading works and how to analyze whether the database used SmartScan (offload processing) or not.

Let us start with a very simple query, which selects all A Class customers from the customers table. There is no Index on the is_a_class column and the table contains approximately 500 million rows.
SELECT * FROM opitz.customers WHERE is_a_class = 'Y';
and have a look at the execution plan

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   500K|    36M|  3100   (1)| 00:00:38 |
|*  1 |  TABLE ACCESS STORAGE FULL| CUSTOMERS |   500K|    36M|  3100   (1)| 00:00:38 |
---------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / CUSTOMERS@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - filter("IS_A_CLASS"='Y')
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "CUSTOMERS"."ID"[NUMBER,22], "CUSTOMERS"."FIRST_NAME"[VARCHAR2,200],
       "CUSTOMERS"."LAST_NAME"[VARCHAR2,200], "CUSTOMERS"."COUNTRY"[VARCHAR2,8],
       "IS_A_CLASS"[VARCHAR2,1]

This execution plan is without the usage of SmartScan, but the data resists in an Exadata Cell. Therefore we can see TABLE ACCESS STORAGE FULL instead of the well known TABLE ACCESS FULL. In this case the word STORAGE indicates that the operation was executed on an Exadata. I have set cell_offload_processing = false to force this execution plan. But what would the execution plan be, with the parameter cell_offload_processing set to the default value, which is true.

---------------------------------------------------------------------------------------
| Id  | Operation                 | Name      | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT          |           |   500K|    36M|  3100   (1)| 00:00:38 |
|*  1 |  TABLE ACCESS STORAGE FULL| CUSTOMERS |   500K|    36M|  3100   (1)| 00:00:38 |
---------------------------------------------------------------------------------------
Query Block Name / Object Alias (identified by operation id):
-------------------------------------------------------------
   1 - SEL$1 / CUSTOMERS@SEL$1
Predicate Information (identified by operation id):
---------------------------------------------------
   1 - storage("IS_A_CLASS"='Y')
       filter("IS_A_CLASS"='Y')
Column Projection Information (identified by operation id):
-----------------------------------------------------------
   1 - "CUSTOMERS"."ID"[NUMBER,22], "CUSTOMERS"."FIRST_NAME"[VARCHAR2,200],
       "CUSTOMERS"."LAST_NAME"[VARCHAR2,200], "CUSTOMERS"."COUNTRY"[VARCHAR2,8],
       "IS_A_CLASS"[VARCHAR2,1]
26 rows selected.

In the Predicate Information the line storage(“IS_A_CLASS”=’Y’) was added. This line indicates that offload processing could be used for this query. It is very important to know, that the decision whether SmartScan should be used or not, is a runtime decision! When the optimizer calculates the best execution plan, it only decides whether SmartScan could be used, not whether it is used! This is very specific to Exadata and often misunderstood.

But how can we know, whether offload processing was used or not?

Oracle added some new wait events, which contain this information. A normal multiblock read from Exadata would look as follows
WAIT #33: nam='cell multiblock physical read' ela= 7552 cellhash#=398250101 diskhash#=786031317 bytes=40960 obj#=16639 tim=1286372833389835

A neat detail. We can not only see what was executed, but also on which disk of what cell, how many bytes where read and which object was retrieved.

When a SmartScan is done, the ‘cell smart table scan’ event will occur.
WAIT #2: nam='cell smart table scan' ela= 154 cellhash#=2520626383 p2=0 p3=0 obj#=15043 tim=1289240139259306

As we have seen the execution plan itself is not enough to know whether a SmartScan happened, only the according Wait events can tell the truth. In the next post I will write about predicate offloading when functions are used to evaluate the data.

Written by danirey

March 7, 2011 at 18:01

Oracle Exadata performance revealed – SmartScan – Part II

with one comment

In the last post I gave a brief introduction to Oracle Exadata and a first hint on how the massive performance improvements are achieved.

Let us think through what Oracle had to do, to be able to filter data on the storage level.

Normally communication between Oracle and the storage server is on block level. Which means Oracle tells the storage system which blocks it needs and the storage sends them to the database server. With such a communication the storage can not know, how data could be pre-filtered. Therefore Oracle had to find a way to send additional information to the storage server. Which led to a new protocol called iDB. It is used for the communication between the database server and the storage cells. The added data contains the following and more information:

  • db_unique_name
  • instance name
  • consumer group
  • read mirror copy yes/no
  • sqlid
  • SQL statement or relevant parts of it

But the new protocol itself is not sufficient. We now have the SQL statement, or at least parts of it on the storage cell (I will try to investigate this in more detail in the future). But what next?
Of course the cell software will have to separate needed and unneeded data. To do so, it has to be able to read Oracle blocks. Therefore the cell software contains parts of the Oracle Database kernel.

To transfer the filtered data back to the database, the iDB protocol is used. But what now? How can we put this information into the buffer cache?

Let us have a look at the structure of an Oracle block. The one shown below, contains a part of a heap table.

Oracle block containing parts of a heap table

As we can see, the data area is only a subpart of the block. By ripping of a single row, data can be reduced even further. But how can this information be stored in the buffer cache? Remember, the buffer cache only contains complete oracle blocks. That’s also why the database has to have multiple buffer caches, if it uses more than one block size.

Sending the complete block, is not what we want to do. What’s the alternative?
What about: We send only the needed rows to the database and it fills the gaps, to create a complete block, with NULL?
This would unnecessarily pollute the buffer cache with NULLs and the database would have to take special care for never writing this block back to the storage.

Is there a better solution?
Oracle could create an artificial block, containing only the needed rows gathered from multiple blocks. But what can we gain by doing so?
Although the buffer cache would not get polluted with NULLs anymore, the database would still have to take special care for never writing this block to the storage.

Ok, lets take it one step further. Of what use would this artificial blocks in the buffer cache be?
Only the original query, or maybe some queries selecting subsets of the original one, could ever use this blocks again. Because the data is specially filtered for the query on row and column level, chances that the data can be reused are very low. Therefore it does not make sense, to have such blocks in the buffer cache anyway.

So what’s the solution then?
Once again, Oracle had the chance to reuse some “old” technology (please consider this a laud for the good decisions the Oracle developers made in the past). The direct read, usually used for parallel reads, puts a result set into the PGA instead of putting the read blocks into the buffer cache. Because only the result set is saved, no unnecessary data pollutes the memory. By using the PGA instead of the SGA, special memory structures such as a result set are possible and the memory can be freed as soon as the requester finished reading it. To write the data into the PGA the Cells use RDMA, remote direct memory access. Without RDMA additional memory copies would be needed and a processor in the database server would have to take care of this. Thanks to RDMA, the processors of the database server should not be involved at all.

Now you should have some knowledge about the underlying architecture of the SmartScan. In the next post I will show, how the predicate offloading works.

Written by danirey

March 2, 2011 at 17:56

Oracle Exadata performance revealed – SmartScan – Part I

with 2 comments

This is the start of yet another series of blog posts. This time it will be about Oracle Exadata and its SmartScan.

I just arrived back home from an Exadata Database Machine Administration Workshop with Joel Goodman as trainer. He did a great job. Have a look at his blog on oratrain.wordpress.com. Fortunately after this training I can still stick by what I wrote in a paper about SmartScan for the SOUG (swiss oracle user group) in November last year. Therefore I will put parts of this paper, which are originally written in German, in English on my blog. As it will not be a simple translation of the original text, it might also be interesting to people who read the original article.

First lets start with a very brief introduction to Oracle Exadata – readers already knowing Exadata can skip this section. Oracle Exadata Database Machine is a database appliance. It contains some servers to run an Oracle Database, usually in a Real Application Cluster and some others used as Storage Servers. The storage server software is developed by Oracle and optimized for the Oracle Database. All systems are very powerful and connected to each other over InfiniBand. The current minimum configuration is a cabinet containing two database servers with 2x 6-core Intel Xeon and 96GB each. Three storage servers, usually called Cells in the context of Exadata, containing 2x 6-core Intel Xeon and 24GB of RAM each. The storage server contain 12 hard disk drives and 384GB Sun Flash Accelerator storage each. Everything is plumbed together with 40Gbit/s InfiniBand. The smallest Database Machine configuration provides a raw read performance from disk of 5.6 GByte/s and 16 GByte/s from the flash storage. For detailed hardware specifications have a look at http://www.oracle.com/us/products/database/exadata/index.html.

Those are quite impressive numbers, but how much does this improve the speed of my application? Oracle claims a performance gain of 10 to 30 times faster. The biggest improvement achieved by a customer was 72 times faster. But how is this possible? Such massive improvements can not be achieved only by putting some very fast hardware together, otherwise customers would have already done this. But how does Oracle achieve its performance gains?

To be faster there are only two things you can do. Either work faster or do less work. We have already seen what Oracle did to work faster, but how can they reduce work and achieve the same result nevertheless?

In todays three tier architectures it is standard, to push a WHERE clause from the application layer into the database layer. Therefore only a subset of the data has to be sent back to the application layer. But what if the storage would already do this filtering? This would drastically reduce the payload on the database server and the network between the storage and database server. Therefore Oracle wrote an intelligent storage server, which is able to apply such filters before sending the data to the database.

Oracle calls this storage level optimizations SmartScan. With SmartScan Oracle can filter the following in the storage layer:

  • an equals predicate filter in the WHERE clause
  • a WHERE clause which filters using a function
  • only returning the needed subset of columns
  • pre-joining tables on storage

In subsequent posts I will show how this optimizations work. In the next post I will show what Oracle had to implement or reuse to make such optimizations possible.

Written by danirey

February 28, 2011 at 17:27

OCOB – OPITZ CONSULTING Oracle Base – in English

leave a comment »

I’m glad to inform that OCOB – OPITZ CONSULTING Oracle Base is now available!

[Edit: June 24th 2011]
Unfortunately OPTIZ CONSULTING Schweiz GmbH had to declare bankruptcy a few months ago. The German Holding and it’s GmbHs still exist. But before you consider to take OCOB as your new standard tool for database administration, you should ask OPITZ whether they are able and want to keep up with support and improvement of OCOB.

Trivadis has a good and similar product which is available through http://www.trivadis.com/en/produkte/datenbank-tools/tvd-basenvtm.html

Unfortunately I’m not aware of existing open source projects in this area. If you know a good one please tell me about it, I’ll then post a link in this post.
[/Edit]

OCOB is a management framework for Oracle server products. It’s mainly used for easy handling of Oracle Databases. It can easily show the current status of your database, listener, asm or application server.

oracle@ltdar2:/u00/app/oracle/product/11.1.0/dbs/ [DAR112] up
Status of Oracle Products on ltdar2.opitz-consulting.int

DB-Inst.  DB2            : down    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DAR102B        : down    (/u00/app/oracle/product/10.2.0_2)
DB-Inst.  DAR920         : down    (/u00/app/oracle/product/9.2.0)
DB-Inst.  DAR102         : open    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DB1            : down    (/u00/app/oracle/product/10.2.0)
DB-Inst.  DAR11BETA      : down    (/u00/app/oracle/product/11.1.0_beta2)
DB-Inst.  DAR111         : down    (/u00/app/oracle/product/11.1.0)
DB-Inst.  DAR112         : up      (/u00/app/oracle/product/11.2.0)

Lsnr                     : down    
oracle@ltdar2:/u00/app/oracle/product/11.1.0/dbs/ [DAR112] 

It can be used on Unix, Linux and Windows.

You can download a copy, free of charge, at http://www.opitz-consulting.com/?id=810

But why do I bore you with this advertising post? First of all I’m a big fan of it and I was personally involved in the last development steps.

Written by danirey

May 10, 2010 at 12:20

Posted in English, Oracle, Tipps&Tricks

Tagged with , , ,

use menus in Vim without having a GUI

leave a comment »

While applying changes to sqplus.vim I was looking for the option to execute code only if Vim is running in a GUI ( if has("gui_running") ) While searching for this information I changed my mind, because I found out that you can use the menu also when running Vim on command line.

You just have to add the following lines to you .vimrc

:source $VIMRUNTIME/menu.vim
:set wildmenu
:set cpo-=<
:set wcm=<C-Z>

To use the newly added feature write :emenu plus a SPACE in command mode and press TAB. A menu appears above the command bar. You can change between the menu points with TAB and choose one with ENTER. After this you are in the sub menu, again you can select with TAB and ENTER.

You can also write a part of a menu point and complete it with tab. Or you type :emenu <C-D> (<C-D> means Ctrl+D for those not used this notation) which shows you a list of all possible menu entries.

The cool thing about wildmenu is, it brings you command completion for more than just menus. For example you could use it to show a list of available help entries. To try this type :h spl <C-D> and you will see a list of available help documents about split, vsplit, isplit, display …

Written by danirey

October 27, 2009 at 10:05