
This post is about an issue I ran into when trying to use pgbouncer in front of Yugabyte YSQL, the postgres layer yugabyte reuses for handling relational data.

pgbouncer: https://www.pgbouncer.org

yugabyte: https://www.yugabyte.com

postgres: https://www.postgresql.org

The case for pgbouncer.

Pgbouncer is a connection pooler for postgres databases. It's a process sitting in between the client and the actual postgres server, and it separates clients from postgres server connections by keeping a pool of connections from the clients (the so-called clients pool), and keeping a pool of connections to the postgres server (the so-called servers pool).

A postgres database connection is forked from the postmaster as an independent process when a client connects, and executes the queries on the client's behalf. Creating a process is a time-consuming and "costly" action for a computer system. The ideal way of using postgres therefore is to log on once with all the needed connections, and use and reuse these connections.

Sadly, a lot of database clients do not take this into account, and perform database-unfriendly connection management, such as creating and dropping connections in rapid succession, creating a high number of connections, or both.

pgbouncer helps in such cases with the aforementioned separation of clients and servers, linking a client to a server connection based on the configured pool mode, which can be session (a client uses a database connection for the duration of the session, until disconnect), transaction (a client uses a database connection for the duration of a transaction) or statement (a client uses a database connection for each query).

The default setting for pool mode is session. Based on testing, pgbouncer seems to function best in transaction mode, so that (varying or huge numbers of) client connections can use a lower number of (dedicated, steady) database connections.

The encountered issue.

I used a default CentOS 8.3 image that I created myself using packer, and ran it using vagrant and virtualbox. This means a lot of the machine configuration is default; because packer builds the image using scripts, there was no configuration done by me or by someone else without me knowing it.

The server was functioning correctly, so I added pgbouncer (pgbouncer is a package in EPEL), configured it and ran it. I configured pgbouncer to allow 1000 client connections (max_client_conn), and to create 100 database connections (default_pool_size, min_pool_size). The way pgbouncer works is that the server connections are not created when pgbouncer starts, but per database, when a client requests that database from pgbouncer for the first time. I manually connected through pgbouncer and it functioned correctly. So far, so good.
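For reference, a pgbouncer.ini along those lines looks roughly like this. This is a minimal sketch: the database alias, the authentication settings and the file paths are assumptions, the pool settings are the ones mentioned above.

[databases]
yugabyte = host=localhost port=5433

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 100
min_pool_size = 100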

Then I tested it with a synthetic load:

for n in $(seq 1 200); do ysqlsh -h localhost -p 6432 -c "select pg_sleep(60);" & done

What should happen is that all 200 connections get connected to pgbouncer, pgbouncer creates 100 server connections on which 100 clients get to execute, and the other 100 clients remain waiting to be serviced.

However, I noticed that there were only a couple of connections created to the database server, something like 2 or 3, and the number was slowly increasing. It also led to pgbouncer eventually killing client connections, because query_wait_timeout is set to 120 (seconds) by default, and with the low number of server connections, some sessions exceed that time waiting and thus are stopped by pgbouncer.

The investigation.

The first thing I did was google the issue. No hit that I could find actually described my issue.

To validate why I couldn't find this issue, I set up a machine with vanilla postgres and added pgbouncer to it with the same configuration, and performed the same test. In that situation pgbouncer built the server connections really quickly, and everything worked as I wanted it to work, meaning that pgbouncer connected all the client connections, and executed the queries using the statically set number of connections to the postgres server.

At that point I started thinking the modifications yugabyte made to the postgres layer might have an impact on this.

But it was clear I needed additional information to understand where to look.

Looking more deeply into the configuration of pgbouncer I found the option “verbose”, and set it to 5 to get more information, and ran the test against the yugabyte ysql server again. I found the following messages in the pgbouncer log:

2021-04-08 16:59:49.839 UTC [22773] DEBUG sbuf_after_connect_check: pending error: Connection refused

2021-04-08 16:59:49.839 UTC [22773] LOG S-0x563bd3657a80: yugabyte/yugabyte@[::1]:5433 closing because: connect failed (age=0s)

2021-04-08 16:59:49.839 UTC [22773] NOISE safe_close(35) = 0

2021-04-08 16:59:50.174 UTC [22773] DEBUG launching new connection to satisfy min_pool_size

2021-04-08 16:59:50.174 UTC [22773] DEBUG launch_new_connection: last failed, not launching new connection yet, still waiting 14 s

2021-04-08 16:59:50.505 UTC [22773] DEBUG launching new connection to satisfy min_pool_size

2021-04-08 16:59:50.505 UTC [22773] DEBUG launch_new_connection: last failed, not launching new connection yet, still waiting 14 s

2021-04-08 16:59:50.841 UTC [22773] DEBUG launching new connection to satisfy min_pool_size

2021-04-08 16:59:50.841 UTC [22773] DEBUG launch_new_connection: last failed, not launching new connection yet, still waiting 13 s

So pgbouncer encountered a 'connection refused' error, therefore closed/stopped the attempt to add a connection, and then waited for 15 seconds before trying again. Why would it encounter a connection refused, which it didn't encounter with vanilla postgres?

To be more practical, there also is a setting that allows tweaking the back-off time after such a failed connection attempt: server_login_retry. When I set it to 0, I still get the error, but pgbouncer then does build up the server connections.
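In pgbouncer.ini terms, that workaround is simply:

; retry immediately after a failed server connect instead of backing off
server_login_retry = 0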

But it's not very satisfactory to have this rather blunt workaround. I would like to get the issue solved!

Looking deeper.

I decided I needed to get this solved. Because this is a connection refused error, it's logical to look at networking. A really good tool for this case is tcpdump, which allows a superuser/root to capture network traffic and visualise it. For the investigation, I set up the server to have no other connections to the database, started pgbouncer freshly so no server connections were built up yet, and then connected to pgbouncer to trigger it to build the connections.

First regular postgres:

[vagrant@localhost ~]$ sudo tcpdump -ttt -i lo -n port 5432

dropped privs to tcpdump

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [S], seq 952020725, win 43690, options [mss 65495,sackOK,TS val 3901196965 ecr 0,nop,wscale 6], length 0

IP 127.0.0.1.postgres > 127.0.0.1.48366: Flags [S.], seq 3702813748, ack 952020726, win 43690, options [mss 65495,sackOK,TS val 3901196965 ecr 3901196965,nop,wscale 6], length 0

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [.], ack 1, win 683, options [nop,nop,TS val 3901196965 ecr 3901196965], length 0

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [P.], seq 1:9, ack 1, win 683, options [nop,nop,TS val 3901197012 ecr 3901196965], length 8

IP 127.0.0.1.postgres > 127.0.0.1.48366: Flags [.], ack 9, win 683, options [nop,nop,TS val 3901197012 ecr 3901197012], length 0

IP 127.0.0.1.postgres > 127.0.0.1.48366: Flags [P.], seq 1:2, ack 9, win 683, options [nop,nop,TS val 3901197012 ecr 3901197012], length 1

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [.], ack 2, win 683, options [nop,nop,TS val 3901197012 ecr 3901197012], length 0

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [P.], seq 9:98, ack 2, win 683, options [nop,nop,TS val 3901197012 ecr 3901197012], length 89

IP 127.0.0.1.postgres > 127.0.0.1.48366: Flags [P.], seq 2:15, ack 98, win 683, options [nop,nop,TS val 3901197013 ecr 3901197012], length 13

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [P.], seq 98:139, ack 15, win 683, options [nop,nop,TS val 3901197013 ecr 3901197013], length 41

IP 127.0.0.1.postgres > 127.0.0.1.48366: Flags [P.], seq 15:342, ack 139, win 683, options [nop,nop,TS val 3901197014 ecr 3901197013], length 327

IP 127.0.0.1.48366 > 127.0.0.1.postgres: Flags [.], ack 342, win 700, options [nop,nop,TS val 3901197053 ecr 3901197014], length 0

In essence there isn’t much noteworthy to see. The first 3 packets are the TCP 3-way handshake, after which we see pgbouncer and postgres getting the connection ready at the other layers.

Now yugabyte:

[vagrant@centos83-yb-1 ~]$ sudo tcpdump -ttt -i lo -n port 5433

dropped privs to tcpdump

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode

listening on lo, link-type EN10MB (Ethernet), capture size 262144 bytes

IP 127.0.0.1.37500 > 127.0.0.1.pyrrho: Flags [S], seq 2745440791, win 43690, options [mss 65495,sackOK,TS val 1151235630 ecr 0,nop,wscale 7], length 0

IP 127.0.0.1.pyrrho > 127.0.0.1.37500: Flags [S.], seq 2754397962, ack 2745440792, win 43690, options [mss 65495,sackOK,TS val 1151235630 ecr 1151235630,nop,wscale 7], length 0

IP 127.0.0.1.37500 > 127.0.0.1.pyrrho: Flags [.], ack 1, win 342, options [nop,nop,TS val 1151235630 ecr 1151235630], length 0

IP 127.0.0.1.37500 > 127.0.0.1.pyrrho: Flags [P.], seq 1:42, ack 1, win 342, options [nop,nop,TS val 1151235630 ecr 1151235630], length 41

IP 127.0.0.1.pyrrho > 127.0.0.1.37500: Flags [.], ack 42, win 342, options [nop,nop,TS val 1151235630 ecr 1151235630], length 0

IP 127.0.0.1.pyrrho > 127.0.0.1.37500: Flags [P.], seq 1:332, ack 42, win 342, options [nop,nop,TS val 1151235755 ecr 1151235630], length 331

IP 127.0.0.1.37500 > 127.0.0.1.pyrrho: Flags [.], ack 332, win 350, options [nop,nop,TS val 1151235756 ecr 1151235755], length 0

IP6 ::1.43518 > ::1.pyrrho: Flags [S], seq 2042629123, win 43690, options [mss 65476,sackOK,TS val 2150324005 ecr 0,nop,wscale 7], length 0

IP6 ::1.pyrrho > ::1.43518: Flags [R.], seq 0, ack 2042629124, win 0, length 0

Yugabyte listens on port 5433, which tcpdump translates to 'pyrrho'. Here too the 3-way handshake is visible, and then pgbouncer and yugabyte chatting, until… a packet is sent using IPv6 (IP6)?!? The packet is sent to the 'pyrrho' port, so the client (pgbouncer) must be doing this. Quite correctly the server responds to the IPv6 packet by telling it the port is not in use ('R': reset). That actually fits with what the pgbouncer logging told us.

The solution.

At this point I knew somehow IPv6 (IP6) got involved. It probably has to do with name resolving, because that fits a scenario where something might "spontaneously" resolve to IPv6. And since it's localhost, it's not logical this is DNS; this must be something that is local on the machine. Well, if it's name-to-address resolving and it can't be DNS, the most logical place would be the /etc/hosts file!

Apparently, with a CentOS 8 standard installation, localhost is defined in the following way in /etc/hosts:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6

And because ‘::1’, the IPv6 address, is the second one, that is the one localhost will resolve to:

ping -c 1 localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.024 ms

Well, then the solution to fully solve this would be really simple: remove the IPv6 / ::1 line from /etc/hosts. I did that, and indeed, there is no ‘connection refused’ anymore! Problem solved!
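For completeness, the relevant part of /etc/hosts then simply looks like this, with only the IPv4 definition of localhost left:

127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4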

I am excited to announce that I have accepted a role at Yugabyte as 'developer advocate'. It's not easy to leave the talented group of people that is Enkitec, many of whom I call friends. In fact, the change is much bigger, since Yugabyte is a database outside of the Oracle ecosystem, an ecosystem in which there also are a lot of people that I have gotten to know and call friends.

So why the change? In essence, my reasons are the same as the ones Bryn Llewellyn mentioned in his blogpost about his move, although I come from a different background being a consultant and not working for Oracle.

Open source

You could argue that having spent 20 or so years investigating the inner workings of the Oracle database means I am discarding all the knowledge I built up. I don't believe that's true. Yes, I have built up a lot of knowledge about how the Oracle database works, and parts of that knowledge might be known (usable?) to only a small group. However, I am looking forward to working with software for which I can actually see how it's made, so I don't have to perform extensive research to figure out how it's "probably" implemented. This is an important reason for me personally. In general I think there are many more reasons for companies to want to use open source software instead of proprietary, closed source software.

Distributed architecture & distributed SQL

The area I have spent a significant amount of time in is assessing and investigating performance of the Oracle database. Oracle, like many traditional monolithic databases, uses an architecture where the database itself is a single set of files containing the data, served by processes that validate requests to use that data and provide performance features like caching, so data can be served from memory at memory latency instead of being re-read from the files. This architecture imposes limits on the scale of use; common solutions to overcome these limits with monolithic architectures are:
– Sharing the (still singular) datafiles with multiple machines (RAC).
– Make a copy of the data files, and spool the changes to the copy (DG/OGG/replication).
– (Manually) partition the data over multiple independent databases (sharding).
– Storing data redundantly over the available (still local) disks (RAID).
I am not saying these solutions are wrong. In fact, many of them are very cleverly architected and optimized over the years with the different architectures of databases and work really well.
However, there are consequences inherently imposed by a monolithic architecture, in areas such as scalability, flexibility and failure resistance.

The Yugabyte database provides a cloud ready, cloud vendor agnostic, failure resistant, linearly scaling database (sorry for the abundance of hype words). Failure resistance is provided by keeping multiple copies of the data, which is configurable as 'replication factor'. By keeping copies of the data, failure of a single Yugabyte node (called a 'tablet server') can be overcome. By spreading these copies of data over groups of tablet servers in cloud availability zones, the outage of an entire availability zone can be survived. When such a disaster strikes, the cluster will survive and provide normal service (read and write functions) as long as 2/3rd of the nodes survive (in case of a replication factor of 3), without the need for any management to accommodate failover, like instantiating a replica/standby.

Each Yugabyte tablet server also serves as an API for data retrieval and manipulation, so with the addition of tablet servers, the processing power increases. A tablet server should use local, non-protected disks to take advantage of low latency, and by adding more tablet servers the work can be spread out further over them, increasing the amount of work that can be done. Aside from increasing API-side processing power, Yugabyte provides functionality which pushes down work to the storage, which Exadata specialists will recognize as 'smart scans', alongside things like predicate pushdown and bloom filters.

Yugabyte has not invented yet another SQL and/or NoSQL dialect. Another point that I think is really strong is that Yugabyte reuses PostgreSQL as the SQL API. This API is "wire compatible" with Postgres, which means that any product that can talk to a Postgres database can talk to Yugabyte. In fact, instead of writing its own Postgres-like layer, Yugabyte has re-used the Postgres source for the query layers, and provides the storage underneath via Yugabyte. This also means a lot of the server-side functionality of Postgres, such as PL/pgSQL, can be used. Needless to say there are limitations inherent to combining Postgres with the Yugabyte storage layer, but most common postgres database functionality is available.
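As a quick illustration of the wire compatibility (the port and user are the YSQL defaults, so treat them as assumptions about your setup): a plain postgres client simply connects to the YSQL port like it would to any postgres database:

psql -h 127.0.0.1 -p 5433 -U yugabyte -c "select version();"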

But that is not all! Aside from and in parallel to Yugabyte's Postgres API, it also provides "NoSQL" APIs, which are compatible with Apache Cassandra and with Redis. Oh, and data storage is done via LSM (log-structured merge-tree) trees, etc.

Conclusion

I hope at this point you can see that Yugabyte provides a fresh, modern approach to a database, with a lot of advantages over more traditional, monolithic databases, from which I think a lot of the cases I witnessed over the years could significantly benefit (performance-wise and availability-wise). I am really thrilled to be at the basis of building that further.

Also, the technical reasons that I described are really just a summary; if you have gotten interested or want to learn more, I would urge you (or challenge you :-D) to have a look at https://www.yugabyte.com or read more at https://docs.yugabyte.com. Or if you want to see the code, head over to https://github.com/yugabyte/yugabyte-db!

This post is about a fully documented but easy to miss feature of vagrant, which is the evaluation of the Vagrantfiles when a vagrant ‘box’ is started. As you can see, I wrote ‘Vagrantfiles’, which is plural. I (very naively) thought that the Vagrantfile in the directory where you want to start the ‘box’ determines the specifics of the vagrant ‘box’.

This begins with me trying to create a vagrant virtualbox VM using packer, where I specify a Vagrantfile during the packer build. I normally don’t spend a lot of thought on that, and just put in a very simple Vagrantfile, essentially just defining the box and setting memory and cpu amounts.

In this case, I decided to use the Vagrantfile that I wanted to use for this special-purpose 'box' as the Vagrantfile for the packer build; it has some ruby in it which adds a disk:

# this sits inside the config.vm.provider "virtualbox" do |vb| ... end block of the Vagrantfile
data1_disk = "data1.vdi"
# only create the extra disk file if it does not exist yet
if !File.exist?(data1_disk)
  vb.customize [ 'createhd', '--filename', data1_disk, '--size', 20480 ]
end
# attach the disk to the SATA controller; note this step is not guarded by a check
vb.customize [ 'storageattach', :id, '--storagectl', 'SATA Controller', '--port', 2, '--device', 0, '--type', 'hdd', '--medium', data1_disk ]

My (wrong) assumption was that this Vagrantfile was kept as a validation or template.

After the box was created, I tested running it, and for that I copied the Vagrantfile, including the above addition that checks for the disk file, creates it if it doesn't exist, and then attaches the disk to the VM. However, this failed with an error in the provisioning stage saying a disk was already attached, and therefore could not be attached again.

I didn't understand the error when I first encountered it, so I investigated the issue by removing the box (vagrant destroy), commenting out the disk addition code shown above from the Vagrantfile, and then running the box creation again (vagrant up). Much to my surprise, it still added the disk, even though I had explicitly removed the code that provided that functionality.

This severely puzzled me.

After going over the vagrant documentation and searching for the error messages, I found out that the Vagrantfile embedded with the ‘box’ is actually used, and parsed, along with the user specified Vagrantfile, and the contents of both are merged before a vagrant ‘box’ is started. This Vagrantfile can be seen in the directory which holds the zipped boxes after downloading from app.vagrantup.com. On my Mac this is ‘~/.vagrant.d/boxes/{box name}/{version}/{provider}/Vagrantfile’.

This perfectly explained what I witnessed: because the disk creation steps are in the 'embedded' Vagrantfile, they are simply executed, even if they are not in the normal Vagrantfile. And of course it threw an error when the disk creation steps were also added to the normal Vagrantfile, because then these steps were executed twice, which showed at the vb.customize 'storageattach' execution: that is not protected by a check, so the second occurrence was tried and failed.

This was really my ignorance, and maybe a bit the documentation not being overly verbose about the existence of more than one Vagrantfile. It also gives great opportunities: lots of steps or logic that would normally be in the Vagrantfile can be put in the embedded Vagrantfile, so that the regular Vagrantfile can be kept really simple.

The conclusion is that if you create vagrant boxes yourself, and want to perform provisioning steps against a vagrant box that simply always need to be done, you might as well put them in the embedded Vagrantfile.

Hopefully I got your interest by the weird name of this blogpost. This blogpost is about sensible usage of an Oracle database. Probably, there are a lot of blog posts like this, but I need to get this off my chest.

A quote any Star Wars fan would recognise is 'I sense a disturbance in the force'. I do, and I have felt it for a long time. This disturbance is the usage of the number of connections for a database. Because of my profession, this is the oracle database, but this really applies to the server side of any client/server processing setup running on at least (but probably not limited to) Intel Xeon processors.

The disturbance is the oversubscription, or sometimes even excessive oversubscription, of database connections from application servers, or any other kind of processes that act as database clients. This does not exclude parallel query processes; in other words, this applies to parallel query processes too.

What is oversubscription of database connections? Any good consultant would be able to tell you this: it depends. In principle, oversubscription means more subscribers than a system can handle. This means ‘oversubscription’ is a multidimensional beast, which can apply to CPU, memory, disk IO, network IO. That makes it hard.

This blogpost is about CPU oversubscription. The way a modern CPU, Intel Xeon in this case, works is not simple, and thus this will not and cannot be an exhaustive description. What I want to try is provide a basic, simplistic description to provide an idea and give guidance.

An Intel Xeon CPU provides a number of processing units, called 'processor' in /proc/cpuinfo, called 'cpu' in the top utility, etc., which are the execution contexts for processes. These execution contexts can be hyperthreads, which for Intel Xeon are two threads per core, or an execution context for a single core. Not all Xeon CPUs provide hyperthreading, and hyperthreading, if available in the CPU, can be disabled in a system's BIOS. A hyperthread cannot and does not do any processing; that is what a core does.
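To see how these execution contexts relate to threads, cores and sockets on a linux machine, a quick sketch (the exact output differs per system):

$ lscpu | egrep '^CPU\(s\)|Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'
$ grep -c ^processor /proc/cpuinfo   # the number of execution contexts the kernel sees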

So why do hyperthreads exist then? I think two prominent reasons are:
1. Not all processes are active all the time. By increasing the number of execution contexts, the switching of processes between execution contexts is reduced, which means a reduction of time spent on context switches.
2. Cores are incredibly powerful, and given that probably not all processes are active all the time, combining two processes on the same core will be “reasonably unnoticeable”.

The next question which then comes to mind is: is hyperthreading a good thing for a database? I cannot give an exhaustive answer to this, partially because there is a massive difference in system usage between different types of database usage. I would say that with reasonable usage, hyperthreading in modern Intel Xeon CPUs provides more benefits (reduced context switching) than downsides (like variance in the latency of CPU usage).

This means that when looking at the CPU processing power of a database server, the 'actual' processing power sits logically between the number of cores and the number of threads. But wait! Didn't I just say that threads don't process, only cores do? Yes, but let me explain how I look at this: if your processes require ACTUAL, ON CPU processing, they depend on the core to be able to handle this, versus processes that run into waiting really quickly, like doing disk IO, network IO or waiting for users to make them active, which might still appear to be running all the time, but in reality are only occasionally processing on the core.

The first type, doing the actual, on-cpu processing, should calculate CPU power more towards core count, and the second type, doing lots of stalls, should calculate CPU power more towards thread count. Overall, this is quite simply about using a core as efficiently as possible.

Now that we have gone through CPUs and their cores and threads, and oversubscription in general, the next question is: how many processes should be allocated on a database server?

The answer is simple: if you want a high performing database for your application servers, the number of processes IN TOTAL should not exceed a number sitting somewhere between CPU core count and CPU thread count.

“Isn’t that incredibly low?”

Yes, for most of the deployments that I see this would be shockingly low. But just because the number of processes is set very high somewhere doesn’t make it right. It just means it’s set that way.

“But why is it set too high everywhere?”

I don’t know. I don’t understand why lots and lots of people do allocate high, up to sometimes ASTRONOMICAL numbers of database processes, and then expect that to be the best tuned way, while there is NO LOGICAL EXPLANATION that I can see for this to make sense. In fact: the explanation why this doesn’t make sense is this blogpost.

To make the comparison with a supermarket and the number of tills: if you go shopping in a supermarket and want to pay and leave as soon as possible, there should be a ready, idle till available, or else you have to wait. For Intel Xeon hyperthreading, you could make the comparison with a till that serves two lanes with persons that want to pay at the same time, because it takes time to put all the items from the shopping basket onto the desk, and the more time that takes, the more efficient a till serving two lanes would be (an Intel Xeon CPU can actually serve two threads at the same time, optimising runtime on the single core).

“Okay, but the majority of the processes is not actually doing anything.”

Well, if the processes are actually not doing anything, why have them in the first place? If that is really true, it doesn’t make sense to have them. And don’t forget: what looks like an idle connection from both an application and a database perspective still is an actual live, running operating system process, and a running database process that has memory allocated, occupies a network socket, has breakable parse locks out, etc, and requires CPU time to maintain the connection.

In fact, by having huge numbers of database connections, you have set up the application to be able to cause "the perfect storm" on the database!

With this, I mean that what I normally see is that indeed the majority of the database connections are not used. However… if things get worse, and the database gets active and starts lacking CPU processing power, more database connections get active. That is logical, right? The database connections that normally would be active will take longer because of the increased activity, so with a constant amount of work, new work cannot use an existing connection because that is still active, and thus takes another connection that normally would sit idle. However, serving more connections increases the amount of CPU required even further, which was already lacking, so the waiting time increases further. Now because the waiting time gets higher, more connections are needed, etc.

And then I didn’t talk about dynamically increasing connection pools!

What I mean by that is that until now I talked about STATIC connection pools. Static means the minimal number of connections equals the maximal number of connections in the pool. A dynamic connection pool will have a certain number of connections, and when there is a need for more, which means all the connections are busy, it adds more connections.

Especially with Oracle, this is really a bad idea. Let me explain. Apart from too many connections in the first place, which is a bad idea already, having an expanding connection pool means not only that idle connections are put to work, but also that the database is given EVEN MORE work by initialising new connections. An oracle database connection is not lightweight, it requires initialising memory, which is an expensive operation. And the whole reason the connection is created is because the connection pool established that all the connections were busy, which almost certainly is because the database was busy (!!!!).

I hope a lot of people will make it to the end, and then realise that high numbers of connections do not make any sense. If you do have an explanation that makes sense, please comment. Please mind that a tuned setup requires the application server to be reasonably set up too; you cannot have one part set up for ultimate processing power, and another part be just a shipwreck.

This blogpost is about how the oracle database executable is created or changed during installation and patching. I take linux for the examples, because that is the platform I am almost exclusively working with. I think linux is the operating system the vast majority of oracle installations run on, and therefore an explanation with linux is helpful to most people.

The first thing to understand is the oracle executable is a dynamically linked executable. This is easy to see when you execute the ‘ldd’ utility against the oracle executable:

$ ldd oracle
	linux-vdso.so.1 (0x00007ffd3f5b0000)
	libodm19.so => /u01/app/oracle/product/19/dbhome_1/lib/libodm19.so (0x00007fa693084000)
	libofs.so => /u01/app/oracle/product/19/dbhome_1/lib/libofs.so (0x00007fa692e82000)
	libcell19.so => /u01/app/oracle/product/19/dbhome_1/lib/libcell19.so (0x00007fa692b69000)
	libskgxp19.so => /u01/app/oracle/product/19/dbhome_1/lib/libskgxp19.so (0x00007fa69284d000)
	libskjcx19.so => /u01/app/oracle/product/19/dbhome_1/lib/libskjcx19.so (0x00007fa692604000)
	librt.so.1 => /lib64/librt.so.1 (0x00007fa6923fb000)
	libclsra19.so => /u01/app/oracle/product/19/dbhome_1/lib/libclsra19.so (0x00007fa6921c0000)
	libdbcfg19.so => /u01/app/oracle/product/19/dbhome_1/lib/libdbcfg19.so (0x00007fa691f93000)
	libhasgen19.so => /u01/app/oracle/product/19/dbhome_1/lib/libhasgen19.so (0x00007fa691270000)
	libskgxn2.so => /u01/app/oracle/product/19/dbhome_1/lib/libskgxn2.so (0x00007fa69106d000)
	libocr19.so => /u01/app/oracle/product/19/dbhome_1/lib/libocr19.so (0x00007fa690d49000)
	libocrb19.so => /u01/app/oracle/product/19/dbhome_1/lib/libocrb19.so (0x00007fa690a49000)
	libocrutl19.so => /u01/app/oracle/product/19/dbhome_1/lib/libocrutl19.so (0x00007fa690828000)
	libaio.so.1 => /lib64/libaio.so.1 (0x00007fa690625000)
	libons.so => /u01/app/oracle/product/19/dbhome_1/lib/libons.so (0x00007fa6903d1000)
	libmql1.so => /u01/app/oracle/product/19/dbhome_1/lib/libmql1.so (0x00007fa69016f000)
	libipc1.so => /u01/app/oracle/product/19/dbhome_1/lib/libipc1.so (0x00007fa68fcf5000)
	libdl.so.2 => /lib64/libdl.so.2 (0x00007fa68faf1000)
	libm.so.6 => /lib64/libm.so.6 (0x00007fa68f76f000)
	libpthread.so.0 => /lib64/libpthread.so.0 (0x00007fa68f54f000)
	libnsl.so.1 => /lib64/libnsl.so.1 (0x00007fa68f336000)
	libresolv.so.2 => /lib64/libresolv.so.2 (0x00007fa68f11f000)
	libc.so.6 => /lib64/libc.so.6 (0x00007fa68ed5d000)
	/lib64/ld-linux-x86-64.so.2 (0x00007fa693287000)

The way this works is that each needed library is defined in the ELF (the executable format of linux executables) header of the oracle executable. This can be seen using the 'readelf' utility:

$ readelf -d oracle

Dynamic section at offset 0x16f03640 contains 45 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libodm19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libofs.so]
 0x0000000000000001 (NEEDED)             Shared library: [libcell19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libskgxp19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libskjcx19.so]
 0x0000000000000001 (NEEDED)             Shared library: [librt.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libclsra19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdbcfg19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libhasgen19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libskgxn2.so]
 0x0000000000000001 (NEEDED)             Shared library: [libocr19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libocrb19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libocrutl19.so]
 0x0000000000000001 (NEEDED)             Shared library: [libaio.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libons.so]
 0x0000000000000001 (NEEDED)             Shared library: [libmql1.so]
 0x0000000000000001 (NEEDED)             Shared library: [libipc1.so]
 0x0000000000000001 (NEEDED)             Shared library: [libdl.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libpthread.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libnsl.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libresolv.so.2]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000f (RPATH)              Library rpath: [/u01/app/oracle/product/19/dbhome_1/lib]
 0x000000000000000c (INIT)               0xdb0d80
 0x000000000000000d (FINI)               0x12b54b90
 0x0000000000000019 (INIT_ARRAY)         0x17398c20
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000006ffffef5 (GNU_HASH)           0x4002d0
 0x0000000000000005 (STRTAB)             0x9ddb10
 0x0000000000000006 (SYMTAB)             0x528128
 0x000000000000000a (STRSZ)              3567352 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000015 (DEBUG)              0x0
 0x0000000000000003 (PLTGOT)             0x17504000
 0x0000000000000002 (PLTRELSZ)           26136 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0xdaa768
 0x0000000000000007 (RELA)               0xda9328
 0x0000000000000008 (RELASZ)             5184 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0xda9188
 0x000000006fffffff (VERNEEDNUM)         7
 0x000000006ffffff0 (VERSYM)             0xd44a08
 0x0000000000000000 (NULL)               0x0

This shows the names of the needed shared libraries ‘(NEEDED)’. Some of the needed shared libraries are oracle shared libraries, such as libodm19.so, libofs.so, libcell19.so, libskgxp19.so and so on. Other libraries are operating system libraries, such as libc.so.6, libresolv.so.2, libnsl.so.1, libpthread.so.0, libm.so.6 and so on. The oracle libraries are found because an RPATH (runpath) is included in the header, in my case /u01/app/oracle/product/19/dbhome_1/lib. The operating system libraries are not included with the oracle installation, they are dynamically obtained from the operating system, for which the selection lies with the operating system.

So, we got the oracle executable, and we found out it’s a dynamically linked executable, which means that it’s using shared libraries for some of its functionality.

Now let’s take one step further. Whenever the oracle database software is installed or patched, it must be linked in order to build the executable with the current state of the software.
You might wonder what I mean with the phrase 'is installed': you probably don't execute a 'relink all' yourself. And that is sensible, because the installer does that for you; you can validate it in $ORACLE_HOME/install/make.log.
I’ll get to the manual linking in a bit.

The oracle database executable and compilation

The first thing to discuss now is compilation. Compilation is the process of turning text based code into a compiled form, for which a lot of compilers do not create an executable form, but an intermediary form, which is called an object. Turning an object or objects into an executable form is called linking. Compiling on linux is done using a compiler, and the default C compiler with Oracle and RedHat linux is gcc. Since Oracle 12.2, the compiler is not a requirement for installation anymore. It is documented, but I believe many may have missed this.

But isn't there the $ORACLE_HOME/rdbms/lib/config.c file, which is still there, and still used, and isn't there the make target config.o (make -f ins_rdbms.mk config.o)? Yes, both of them are still there. And still, gcc is not a requirement anymore. If you have a pressing need to change the config.c file (which lists the dba, oper, asm, backup, dataguard, keymanagement and RAC group names), you can still change it, and when you remove the config.o file, which USED to be generated with gcc, it will now be generated by the 'as' executable (the portable GNU assembler). This is visible in the oracle database executable make target (ioracle):

$ mv config.o config.O
$ make --dry-run -f ins_rdbms.mk ioracle
chmod 755 /u01/app/oracle/product/19/dbhome_1/bin
cd /u01/app/oracle/product/19/dbhome_1/rdbms/lib/; \
/usr/bin/as -o config.o `[ -f config.c ] && echo config.c || echo config.s`; \
/usr/bin/ar r /u01/app/oracle/product/19/dbhome_1/lib/libserver19.a /u01/app/oracle/product/19/dbhome_1/rdbms/lib/config.o
echo
echo " - Linking Oracle "
rm -f /u01/app/oracle/product/19/dbhome_1/rdbms/lib/oracle
/u01/app/oracle/product/19/dbhome_1/bin/orald  -o /u01/app/oracle/product/19/dbhome_1/rdbms/lib/oracle -m64 -z noexecstack -Wl,--disable-new-dtags -L/u01/app/oracle/product/19/dbhome_1/rdbms/lib/ -L/u01/app/oracle/product/19/dbhome_1/lib/ -L/u01/app/oracle/product/19/dbhome_1/lib/stubs/   -Wl,-E /u01/app/oracle/product/19/dbhome_1/rdbms/lib/opimai.o /u01/app/oracle/product/19/dbhome_1/rdbms/lib/ssoraed.o /u01/app/oracle/product/19/dbhome_1/rdbms/lib/ttcsoi.o -Wl,--whole-archive -lperfsrv19 -Wl,--no-whole-archive /u01/app/oracle/product/19/dbhome_1/lib/nautab.o /u01/app/oracle/product/19/dbhome_1/lib/naeet.o /u01/app/oracle/product/19/dbhome_1/lib/naect.o /u01/app/oracle/product/19/dbhome_1/lib/naedhs.o /u01/app/oracle/product/19/dbhome_1/rdbms/lib/config.o  -ldmext -lserver19 -lodm19 -lofs -lcell19 -lnnet19 -lskgxp19 -lsnls19 -lnls19  -lcore19 -lsnls19 -lnls19 -lcore19 -lsnls19 -lnls19 -lxml19 -lcore19 -lunls19 -lsnls19 -lnls19 -lcore19 -lnls19 -lclient19  -lvsnst19 -lcommon19 -lgeneric19 -lknlopt -loraolap19 -lskjcx19 -lslax19 -lpls19  -lrt -lplp19 -ldmext -lserver19 -lclient19  -lvsnst19 -lcommon19 -lgeneric19 `if [ -f /u01/app/oracle/product/19/dbhome_1/lib/libavserver19.a ] ; then echo "-lavserver19" ; else echo "-lavstub19"; fi` `if [ -f /u01/app/oracle/product/19/dbhome_1/lib/libavclient19.a ] ; then echo "-lavclient19" ; fi` -lknlopt -lslax19 -lpls19  -lrt -lplp19 -ljavavm19 -lserver19  -lwwg  `cat /u01/app/oracle/product/19/dbhome_1/lib/ldflags`    -lncrypt19 -lnsgr19 -lnzjs19 -ln19 -lnl19 -lngsmshd19 -lnro19 `cat /u01/app/oracle/product/19/dbhome_1/lib/ldflags`    -lncrypt19 -lnsgr19 -lnzjs19 -ln19 -lnl19 -lngsmshd19 -lnnzst19 -lzt19 -lztkg19 -lmm -lsnls19 -lnls19  -lcore19 -lsnls19 -lnls19 -lcore19 -lsnls19 -lnls19 -lxml19 -lcore19 -lunls19 -lsnls19 -lnls19 -lcore19 -lnls19 -lztkg19 `cat /u01/app/oracle/product/19/dbhome_1/lib/ldflags`    -lncrypt19 -lnsgr19 -lnzjs19 -ln19 -lnl19 -lngsmshd19 -lnro19 `cat /u01/app/oracle/product/19/dbhome_1/lib/ldflags`    -lncrypt19 -lnsgr19 -lnzjs19 -ln19 -lnl19 -lngsmshd19 -lnnzst19 -lzt19 -lztkg19   -lsnls19 -lnls19  -lcore19 -lsnls19 -lnls19 -lcore19 -lsnls19 -lnls19 -lxml19 -lcore19 -lunls19 -lsnls19 -lnls19 -lcore19 -lnls19 `if /usr/bin/ar tv /u01/app/oracle/product/19/dbhome_1/rdbms/lib/libknlopt.a | grep "kxmnsd.o" > /dev/null 2>&1 ; then echo " " ; else echo "-lordsdo19 -lserver19"; fi` -L/u01/app/oracle/product/19/dbhome_1/ctx/lib/ -lctxc19 -lctx19 -lzx19 -lgx19 -lctx19 -lzx19 -lgx19 -lclscest19 -loevm -lclsra19 -ldbcfg19 -lhasgen19 -lskgxn2 -lnnzst19 -lzt19 -lxml19 -lgeneric19 -locr19 -locrb19 -locrutl19 -lhasgen19 -lskgxn2 -lnnzst19 -lzt19 -lxml19 -lgeneric19  -lgeneric19 -lorazip -loraz -llzopro5 -lorabz2 -lorazstd -loralz4 -lipp_z -lipp_bz2 -lippdc -lipps -lippcore  -lippcp -lsnls19 -lnls19  -lcore19 -lsnls19 -lnls19 -lcore19 -lsnls19 -lnls19 -lxml19 -lcore19 -lunls19 -lsnls19 -lnls19 -lcore19 -lnls19 -lsnls19 -lunls19  -lsnls19 -lnls19  -lcore19 -lsnls19 -lnls19 -lcore19 -lsnls19 -lnls19 -lxml19 -lcore19 -lunls19 -lsnls19 -lnls19 -lcore19 -lnls19 -lasmclnt19 -lcommon19 -lcore19  -ledtn19 -laio -lons  -lmql1 -lipc1 -lfthread19    `cat /u01/app/oracle/product/19/dbhome_1/lib/sysliblist` -Wl,-rpath,/u01/app/oracle/product/19/dbhome_1/lib -lm    `cat /u01/app/oracle/product/19/dbhome_1/lib/sysliblist` -ldl -lm   -L/u01/app/oracle/product/19/dbhome_1/lib `test -x /usr/bin/hugeedit -a -r /usr/lib64/libhugetlbfs.so && test -r /u01/app/oracle/product/19/dbhome_1/rdbms/lib/shugetlbfs.o && echo -Wl,-zcommon-page-size=2097152 
-Wl,-zmax-page-size=2097152 -lhugetlbfs`
rm -f /u01/app/oracle/product/19/dbhome_1/bin/oracle
mv /u01/app/oracle/product/19/dbhome_1/rdbms/lib/oracle /u01/app/oracle/product/19/dbhome_1/bin/oracle
chmod 6751 /u01/app/oracle/product/19/dbhome_1/bin/oracle
(if [ ! -f /u01/app/oracle/product/19/dbhome_1/bin/crsd.bin ]; then \
    getcrshome="/u01/app/oracle/product/19/dbhome_1/srvm/admin/getcrshome" ; \
    if [ -f "$getcrshome" ]; then \
        crshome="`$getcrshome`"; \
        if [ -n "$crshome" ]; then \
            if [ $crshome != /u01/app/oracle/product/19/dbhome_1 ]; then \
                oracle="/u01/app/oracle/product/19/dbhome_1/bin/oracle"; \
                $crshome/bin/setasmgidwrap oracle_binary_path=$oracle; \
            fi \
        fi \
    fi \
fi\
);
$ mv config.O config.o

First of all, I am in the $ORACLE_HOME/rdbms/lib directory already. I moved the config.o file to a different name, config.O (uppercase O). This triggers the config.o file to be generated during linking via the makefile, because the make macro for generating the oracle executable checks for the existence of config.o in $ORACLE_HOME/rdbms/lib, and the generation of config.o is triggered by it not existing.
I used make with the '--dry-run' option, which means it will list what it WOULD do, without actually doing it.
Now that the make macro doesn’t find the $ORACLE_HOME/rdbms/lib/config.o file, it generates it, using ‘as’, the GNU assembler.
After the run, I move the config.O file back to config.o.
Please mind the make target config.o (make -f ins_rdbms.mk config.o) still exists, and this follows the traditional way, so using gcc, to create the object file config.o.

For anything other than config.c, Oracle provides objects (the compiled, intermediary form of C) in object files. This has several advantages. First of all, the server to install oracle on doesn't require a compiler. That also means that there is no discussion about compiler versions; Oracle knows for a fact which version of a compiler is used. Second, Oracle can use a different compiler than the GNU compiler, as long as it provides objects in Linux X86_64 (ELF) format. In fact, that is what oracle does: for Oracle 19.9, Oracle used a compiler from Intel: Intel(R) C Intel(R) 64 Compiler for applications running on Intel(R) 64, Version 17.0.2.174 Build 20170213
You can obtain that information from the oracle executable using:

$ readelf -p .comment oracle | egrep 'Intel.*Build\ [0-9]*'

I hope that at this point I made it clear that no compiler is needed anymore for oracle installation and making changes to the oracle installation, like patching.

The oracle database executable and objects and object files

It's probably a good idea to show how the oracle executable is built. The way this happens is using the make target 'ioracle', which is visible above (make -f ins_rdbms.mk ioracle).
The macro calls ‘orald’, which actually is a script in $ORACLE_HOME/bin, which calls the operating-system ‘ld’ executable, which is the GNU linker.
The arguments to 'orald' are mostly put through to 'ld'. '-o' is the output flag, which shows the executable to be built by the linker, and besides options being set, what you mainly see is -L (library path) and -l (library) switches adding libraries (object files and archive files) to build the oracle executable.
There’s a couple of places that are used to get objects to build the oracle executable:
– $ORACLE_HOME/rdbms/lib — oracle database rdbms specific libraries
– $ORACLE_HOME/lib — oracle database general libraries (objects are used by multiple “products” in the $ORACLE_HOME)
– $ORACLE_HOME/lib/stubs — this is a directory with ‘stub objects’, which are versions of operating system libraries that contain no code, but allow the oracle executable to be build, even if a operating system library is missing (see: https://docs.oracle.com/cd/E23824_01/html/819-0690/chapter2-22.html)
– $ORACLE_HOME/ctx/lib — oracle text (not sure why oracle text requires an explicit lookup to $ORACLE_HOME/ctx/lib, while other options are all in $ORACLE_HOME/lib)
– /lib64 — operating system libraries

At this point it's important to realise that the object files for linking oracle are visible in two forms: as plain object files (.o) and as archive files (.a). Archive files are exactly what the name suggests: these are archives of object files. You can look at and manipulate an archive file using the 'ar' (archiver) utility, whose workings strongly resemble how tar and jar work: t=list, x=extract, c=create.
If you take a look at one of the main archives, libserver19.a, you see that it contains 2852 object files:

$ ar -t $ORACLE_HOME/lib/libserver19.a | wc -l
2852

If you do wonder what’s inside, ‘ar -tv’ would be a good way to have an idea:

$ ar -tv $ORACLE_HOME/lib/libserver19.a
rw-r--r-- 54321/54321  11136 Oct 19 21:12 2020 kdr.o
rw-rw-r-- 94110/42424   8376 Apr 17 04:58 2019 upd.o
rw-rw-r-- 94110/42424  41968 Apr 17 04:58 2019 kd.o
...
rw-r--r-- 54321/54321  13248 Oct 19 21:11 2020 qjsntrans.o
rw-r--r-- 54321/54321  20296 Oct 19 21:11 2020 kubsd.o
rw-r--r-- 54321/54321  16720 Oct 19 21:12 2020 kqro.o

The conclusion here is that archive files are logical and sensible, otherwise the library directories would have been swamped with huge numbers of object files.

When an executable is linked, the linker requires object files, which come as plain '.o' files or grouped in '.a' files. A third type of file is needed for linking an executable that is going to be dynamically linked: the libraries (the '.so' files) the executable is dynamically going to use. The linker will validate the libraries, which means it inspects the libraries to find the symbols that the objects forming the executable are calling. A library ('.so' file) is an already compiled form; in fact it's pretty much similar to an executable, only it is invoked when a dynamically linked executable that uses it calls it, instead of directly.

The object files themselves

The above text pretty much describes how executables, libraries, object files and archives sit together, and how linking creates the oracle executable via the makefile. This describes how oracle configured it for creating the oracle executable. However, this is really flexible and can be done differently, so this is not how it always is or should be; this is how oracle has chosen to do it.

We can look one level deeper into how this works. An object file in fact is itself a container, holding one or more compiled functions:

$ nm -A opimai.o
opimai.o:                 U dbkc_free_bs_context
opimai.o:                 U dbkc_init
opimai.o:                 U dbktFlush
opimai.o:                 U __intel_new_feature_proc_init
opimai.o:                 U kgeasnmierr
opimai.o:                 U kge_pop_guard_fr
opimai.o:                 U kge_push_guard_fr
opimai.o:                 U kge_report_17099
opimai.o:                 U kgeresl
opimai.o:                 U kge_reuse_guard_fr
opimai.o:                 U ksdwrf
opimai.o:                 U kseini
opimai.o:                 U ksmdsg
opimai.o:                 U ksmgpg_
opimai.o:                 U ksmlsge_phaseone
opimai.o:                 U ksmsgaattached_
opimai.o:                 U kso_save_arg
opimai.o:                 U kso_spawn_ts_save
opimai.o:                 U ksosp_parse
opimai.o:                 U ksuginpr
opimai.o:                 U lfvinit
opimai.o:0000000000000010 T main
opimai.o:                 U opiinit
opimai.o:0000000000000300 t opimai_init
opimai.o:0000000000000140 T opimai_real
opimai.o:                 U opiterm
opimai.o:                 U sdterm
opimai.o:                 U _setjmp
opimai.o:                 U skge_sign_fr
opimai.o:                 U skgmstack
opimai.o:                 U skgp_done_args
opimai.o:                 U skgp_retrieve_args
opimai.o:                 U slgtds
opimai.o:                 U slgts
opimai.o:                 U slkbpi
opimai.o:                 U slkfpi
opimai.o:                 U sou2o
opimai.o:                 U spargs
opimai.o:                 U ssthrdmain

This example takes the object file $ORACLE_HOME/rdbms/lib/opimai.o, and this object file contains 3 actual functions (shown by an address and the symbol type 'T' or 't'), and a whole bunch of functions without an address and with symbol type 'U'. The functions with symbol type 'U' are undefined functions, which means that these functions are not in this object file, but defined somewhere else.
The important thing to consider is that a single object file can contain multiple functions.

I chose this object file, because this is in fact the object file that contains the main function, the starting function, of the oracle executable. If you obtain a stack trace of an oracle database process, the first function (called the 'first frame'), at least on recent linux versions (some other operating systems or versions might show earlier functions), will be main. This is also what the linker uses to build the executable: it follows the symbol information, together with the command line switches, to resolve and obtain all the functions. The linker will generate an error and not build the executable if it can't find or resolve the symbols and get all the information it needs.
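If you want to verify this for yourself, a quick sketch (assuming gdb is installed; mind that attaching a debugger briefly stops the process, so do not do this on a production instance) is to attach to an arbitrary oracle server process and print its backtrace; the outermost frame should show main:

$ gdb -p <pid of an oracle server process> -batch -ex 'bt'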

Patching

At this point you should have an understanding of what a dynamically linked executable, libraries, object files and archives are, and that the oracle executable is built using a makefile that invokes the linker.

It might be handy and interesting to look at patching. This information, the information about archives and objects, should give you more background about the specifics of patching. Oracle patching has many forms, and actually can do and change a lot of things in a lot of ways. It is retraceable what a patch does by looking at the contents of the patch. But that is not what this post is about.

Especially with one-off patches, in the case of a patch to fix or change one or more functions in the oracle executable, what the patch provides is the fixed and thus changed versions of these functions. However, oracle does not provide source code. In general what oracle provides is the object or objects containing the changed functions. In order to get the changed function or functions into the oracle executable, what generally happens is that the current/old versions of the object file are removed from the archive they are in, and saved in $ORACLE_HOME/.patch_storage, and the patched versions of the object file are inserted into the archive.
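Conceptually, and purely as a sketch of what opatch does for you (do not do this by hand on a real installation; the patch directory is made up, and kdr.o is just one of the objects we saw in the archive listing above), replacing an object in an archive boils down to:

$ cd $ORACLE_HOME/lib
$ ar -x libserver19.a kdr.o          # extract the current version of the object, so it can be saved
$ cp /path/to/patch/files/kdr.o .    # the patched version of the object (path is made up)
$ ar -r libserver19.a kdr.o          # put the patched version into the archive, replacing the old member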

But, as we saw, an object file generally contains more, often many more, functions. Well, this is why patches can be incompatible with other patches: if multiple patches change different (or the same) functions in the same objects, the patch applied last will undo the changes of the previous patch(es). This is also why you must request merge patches if patches are incompatible.

Dealing with individual object files, extracting them from an archive and saving them in order to be able to restore them into the archive, is tedious. Also, the archive itself doesn't mind whatever you remove from it or insert into it, even if it will break linking the oracle executable. Therefore, oracle has created opatch to perform a great deal of validations and checks, and to take the work of dependency checking from you and fully automate it. In fact, in general, you can take a (one-off) patch and try to apply it: if it applies, it will allow oracle to be relinked; if there is a conflicting patch, it will tell you. Also, if you want to revert an applied patch, you can simply roll back and have opatch load the previous version into the archive. This is way better than letting us humans deal with it directly.

After patching has changed the archives to contain the updated versions of the objects with the updated functions, these must still make it into the oracle executable. This is done by relinking the executable, which will take the objects, including the changed objects, from all the object files and archives, and create a new executable. The oracle executable is never directly touched on linux with recent versions, to my knowledge.

I hope this explanation made sense and made a lot of these things which we are dealing with as oracle DBAs more understandable. Any comments or updates are welcome!

This post is about how to make your log files being aggregated in a single place and easy searchable via a convenient web interface.

You might think: wait a minute; doesn’t this exist already? Well, yes and no. Let me explain.
a) traditional log management: traditionally, logs are/were searched for certain strings, typically error messages or parts of error messages, by a shell script run by cron, and when a match was found, a system that was set up to be able to email would send a notification that the aforementioned string was found.
b) log management via a monitoring tool: the next step was that the individual script on each server was exchanged for a monitoring tool, which performed the same task as the shell script. In reality quite often a default monitoring set/template was enabled, instead of the specific strings that were searched for with the shell script. Sometimes this was an improvement, sometimes this meant the specific issues (and thus messages) became invisible. This is still the way monitoring works in 90% of the cases, including a general, completely standard monitoring template. At least in my experience.
c) other log gathering and indexing: there are many more products that perform this function. The first one that comes to my mind is splunk, and all the options for doing this cloud based (many there!), and a lot of tools based on elasticsearch, like the “ELK stack”.

I think it's clear "a)" and "b)" are not dynamic and in fact very static. My major concern is that they don't allow exploratory investigation: a warning is simply raised, and any investigation means you have to log on and start browsing the available information locally. Everybody who has worked with HP Openview or Oracle Enterprise Manager will recognise this. Yes, it's probably all possible with these tools, but it never (ever(!)) is implemented.

For the last category, the likes of splunk, the ELK stack and the cloud based log tools: the first two definitely serve a function, but they are aimed at aggregating logs of multiple servers, and are simply too much to set up on a server alongside the processes they are meant to monitor. For the cloud based tools: it might be my conservatism, but getting a subscription and loading up logging feels awkward, especially if it's for my own test purposes.

This is where Loki comes in. Finally, I would say, there is a tool that can function on a small scale (!) and perform log aggregation and provide a searchable database without a huge setup. This is ideal for your own test or development box, to be able to discover what is going on, and have all the log files at your fingertips without endlessly going through the filesystem performing cd, tail, grep, wc, sort, uniq, et cetera. I think lots of people recognise travelling from log file to log file.

Loki gives you a database that orders log entries based on time, and Grafana provides a web based UI to view and query the loki database. This is what it looks like:

(screenshot: the Grafana explore view querying the loki database)
This is an example: it shows my test machine, where I decided to look up when linux was started, as well as when the Oracle database instance was started. The two queries are explained below, followed by their LogQL form.
* The first query uses the job label value "rdbms_alert" (indicating it came from the text-based oracle alert.log file; this is a label I assigned to it), and within the log lines with that label I added a filter for the string "Starting ORACLE", which indicates an Oracle database instance start.
* The second query uses the job label value "messages" (indicating it came from the linux /var/log/messages file; this label is mine too), and within the log lines with that label I added a filter for the string "kernel: Command line", which indicates linux startup. I additionally added a negative filter for "loki", because loki logs the queries to the messages file, which I don't want to see.
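In LogQL these two queries look roughly like this (the job label values are the ones I assigned in my promtail configuration):

{job="rdbms_alert"} |= "Starting ORACLE"
{job="messages"} |= "kernel: Command line" != "loki"

The part between braces selects the log stream by label, |= keeps lines containing the string, and != removes lines containing it.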

I hope you can see the power of having all the logs in a single place, and completely searchable.

This is just a start, a very simple proof-of-concept setup. For example, the date/time in the log lines themselves is not used; the date/time of a log line is the moment it was ingested into loki. It is possible to have loki interpret the timestamps in the lines instead.

If you are interested, but are uncertain whether this is for you, and would like to test it: I have a couple of Ansible scripts that can set up the combination of:
* promtail (the default loki log streaming tool)
* loki (the database)
* Grafana (the web UI)
The scripts are created on Oracle Linux 7.8.

Install git, and clone the https://gitlab.com/FritsHoogland/loki-setup.git repository using a user that has sudo rights:

git clone https://gitlab.com/FritsHoogland/loki-setup.git

Install ansible (you might have to install the EPEL repository).

In the loki-setup repo you find the setup scripts for loki, promtail and grafana.
You can execute the scripts in that order (loki, promtail and Grafana) by running the setup_loki.yml, setup_promtail.yml and setup_grafana.yml scripts.
IMPORTANT: do proofread the scripts, and validate the variables for your situation. Don’t worry: the scripts are easy to read.
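On Oracle Linux/RHEL 7, the whole sequence could look roughly like this; this is a sketch under the assumption that the playbooks target localhost and are run by a sudo-capable user, so do verify against the repository before running:

sudo yum install -y epel-release
sudo yum install -y ansible
cd loki-setup
ansible-playbook setup_loki.yml
ansible-playbook setup_promtail.yml
ansible-playbook setup_grafana.yml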

After installing, you can go to http://yourmachine:3000, login with admin and the password you set in the setup_grafana.yml script, and click on the compass (explore) option, and you can perform your own log browsing.

If you decide you want to remove it, I have a remove script for each of the components, which will remove that specific component entirely. The same applies here too: validate the script.

All Oracle database professionals know the current versions of the Oracle database (12.2, 18, 19, 20 at the moment of writing), and we also know the pace Oracle corporation keeps is so high that a lot of companies have a hard time keeping up with the current versions. A prominent one is Oracle corporation itself with their E-Business suite software, for which Oracle extended database support for versions 12.1.0.2 and 11.2.0.4 for E-Business suite licenses only. But this blog isn't about bitching about the pace of Oracle support and versions getting desupported.

What I do regularly encounter is that, for all kinds of reasons, a database version is not updated. Most of the time the versions encountered are 12.1.0.2 (the long term supported version of the 12.1 release), 11.2.0.4 (the long term supported version of the 11.2 release), and more and more seldom 11.2.0.3. If things truly have been left without updating you might encounter 11.2.0.2, and god forbid you still have 11.2.0.1; that version had a lot of issues.

Now what if you encounter even older versions? Younger Oracle database consultants have probably never even seen versions older than 11.2.0.1. But what if you need to work with a truly old version? Or you are just interested in such an old version, to see what it looks like and what the state of the database was at that point?

For that, I created an automatic installation script to install either:
– Release 10.2 versions: 10.2.0.1, 10.2.0.2, 10.2.0.3, 10.2.0.4, 10.2.0.5.
– Release 9.2 versions: 9.2.0.4, 9.2.0.5*, 9.2.0.6*, 9.2.0.7, 9.2.0.8.
(*=the patch install is fully scripted, but linking oracle throws an error)

Yes, this is extremely old, and if you must work with it, there has been neglect and somebody not paying attention at a quite massive scale. There are also licensing implications that do not work in your favour there.

There is a huge caveat too: the installation media for these Oracle database versions is not available for download anywhere as far as I know, and some of the patches are restricted downloads on My Oracle Support too. Since it’s Oracle proprietary software, the only way to obtain it is via Oracle.

That said, if you must or want to use these ancient versions, and you have the required files, you can use:
https://gitlab.com/FritsHoogland/ol48_oracle92 for installing Oracle 9.2 on Oracle Linux 4.8 or
https://gitlab.com/FritsHoogland/ol511_oracle102 for installing Oracle 10.2 on Oracle Linux 5.11

Clone the repository, put the required files in the files directory, edit the Vagrantfile to your liking and then build the database server by typing ‘vagrant up’.
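For the 10.2 case, the flow could look roughly like this (a sketch; the files directory and the Vagrantfile are in the repository, the installation media you have to provide yourself):

git clone https://gitlab.com/FritsHoogland/ol511_oracle102
cd ol511_oracle102
# copy the (licensed) installation media and patches into the files directory
vi Vagrantfile    # adjust settings such as memory and the version to install
vagrant up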

In case you're wondering how the operating system images are built: this is done using 'packer', and I have a repository where you can see how these are built too: https://gitlab.com/FritsHoogland/packer-oracle-linux-museum

In case you're wondering: there are even older versions; the first public version of the Oracle database on linux, as far as I know, is Oracle 8.0.5. However, the linux version to use with versions like 8.1.7 (RHEL/Centos 2.1) is so old that it doesn't play nicely with VirtualBox and packer, so in all reasonableness, Oracle 9.2 on Oracle Linux 4.8 is the earliest combination that can be used without severe issues.

This post is the result of a question that I got after presenting a session about Oracle database mutexes organised by ITOUG, as a response to the conference cancellations because of COVID-19. Thanks to Gianni Ceresa for asking!

The library cache provides shared cursors and execution plans. Because they are shared, sessions can take advantage of the work previous sessions did creating them. However, because they are shared, access needs to be regulated so that sessions do not overwrite each other's work. This is done by mutexes.

The question I got was (this is paraphrased from memory): 'when using pluggable databases, could a session in one pluggable database influence the performance of a session in another pluggable database?'

The answer I gave was that I didn’t test a multi-tenant scenario, but because the library cache is shared between the pluggable databases, it should be possible for a session in one pluggable database to block another session in another pluggable database.

So let’s test it!

I used an Oracle version 20.2 database, which automatically gives you a pluggable database. In fact, even when you don’t specify you want a multi-tenant database, it will create one. This is as expected and documented.

I created an instance called o202, and two pluggable databases, PDB1 and PDB2. Yes, I am a creative genius.

I logged in to PDB1 and executed a ‘select 8 from dual’.

SQL> show con_id

CON_ID
------------------------------
3

SQL> select 8 from dual;

         8
----------
         8

Now using another session in the root container, I dumped the library cache at level 16 to see what that looks like for the 'select 8 from dual' cursor:

SQL> show con_id

CON_ID
------------------------------
1

SQL> alter session set events 'immediate trace name library_cache level 16';

Session altered.

(don’t do this on a live environment!)
These are snippets of certain parts of the dump:

Bucket: #=100089 Mutex=0x71f7f8d8(627065225216, 6, 0, 6)
  LibraryHandle:  Address=0x67bcca90 Hash=372d86f9 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
    ObjectName:  Name=select 8 from dual
      FullHashValue=f18bf11763dd069341c61ce6372d86f9 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=1 ContainerUid=1 Identifier=925730553 OwnerIdn=9
...
    Concurrency:  DependencyMutex=0x67bccb40(0, 1, 0, 0) Mutex=0x67bccbe0(146, 22, 0, 6)
...
        Child:  childNum='0'
          LibraryHandle:  Address=0x6e6b3fd8 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
            Name:  Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=3
...
            Concurrency:  DependencyMutex=0x6e6b4088(0, 0, 0, 0) Mutex=0x67bccbe0(146, 22, 0, 6)

This is bucket 100089, with its mutex. The counter for this mutex is the second argument, which is 6.
The library cache handle on the next line is the parent handle. The sql text and the hash value sit with the parent handle.
I am not sure how to interpret the “ContainerId” here, it says 1 (indicating the root container), while this SQL is only executed in container 3, which is shown above.
After the first ‘…’, the two mutexes in the parent handle can be seen: the dependency mutex, and the parent handle mutex.
After the second ‘…’, child number 0 is visible. Here the container is specified from which this child was actually executed.
After the third ‘…’, the concurrency information for the child is shown. The important bit is the dependency mutex is independent/unique, whilst the regular mutex is the same/shared from the parent.

Now what would happen if I execute the exact same SQL (select 8 from dual) in another container; PDB2 alias container id 4?

SQL> show con_id

CON_ID
------------------------------
4

SQL> select 8 from dual;

         8
----------
         8

And look at bucket 100089 again (using the dump of the library cache at level 16):

Bucket: #=100089 Mutex=0x73f81e58(8589934592, 4, 0, 6)
  LibraryHandle:  Address=0x656e2638 Hash=372d86f9 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
    ObjectName:  Name=select 8 from dual
      FullHashValue=f18bf11763dd069341c61ce6372d86f9 Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=1 ContainerUid=1 Identifier=925730553 OwnerIdn=9
...
    Concurrency:  DependencyMutex=0x656e26e8(0, 2, 0, 0) Mutex=0x656e2788(2, 31, 0, 6)
...
        Child:  childNum='0'
          LibraryHandle:  Address=0x656e0ed8 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
            Name:  Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=3
...
            Concurrency:  DependencyMutex=0x656e0f88(0, 0, 0, 0) Mutex=0x656e2788(2, 31, 0, 6)
...
        Child:  childNum='1'
          LibraryHandle:  Address=0x655fba58 Hash=0 LockMode=N PinMode=0 LoadLockMode=0 Status=VALD
            Name:  Namespace=SQL AREA(00) Type=CURSOR(00) ContainerId=4
...
            Concurrency:  DependencyMutex=0x655fbb08(0, 0, 0, 0) Mutex=0x656e2788(2, 31, 0, 6)

Executing an identical SQL statement (which is a cursor object in the library cache) in another PDB means the same library cache hash bucket is used, and the container id is one of the sharing criteria for a child. Because the statement in both cases has a different container id, the cursors cannot be shared. This is very logical if you think about it: both containers are essentially logically isolated databases, and therefore there is no way to tell if the tables, access rights, data, really anything is the same, so the SQL is parsed again for the other container.
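One way to see this from SQL rather than from a library cache dump is to query v$sql from a sufficiently privileged session in the root container; this is just a quick check, not part of the dump based analysis above:

select con_id, sql_id, child_number, executions
from   v$sql
where  sql_text = 'select 8 from dual';

With the statement executed in both PDBs, this should show two children for the same sql_id, one with con_id 3 and one with con_id 4.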

This has some implications. Because the SQL is identical, it hashes to the same value, which means both containers need access to the same bucket to find the pointer to the parent handle. Both containers also need access to the same parent and the same heap 0 to obtain the child list to find their child. Both the bucket and the handle have a single mutex that serializes access.

Access to the parent heap 0 is done shared, and remains shared if a compatible child can be found.

However, as soon as a new child needs to be created for the same SQL, it will create the new child handle pinned in exclusive mode and insert it into the hash table/child list in the parent’s heap 0. The new child is pinned in exclusive mode, because it still needs to allocate the child heaps and create the child information. This new child is the newest entry, and therefore is the first one any session would find if it scans for a compatible child.

At this point multi-tenancy makes a difference:
– If a session in the same container starts parsing and follows the procedure of hashing the SQL text, obtaining the bucket, finding the parent handle and then scanning the child list for a compatible child, it will find the child in exclusive mode for this container, and wait for the child creation to finish, waiting on the event 'cursor: pin S wait on X'.
– If a session in ANOTHER container starts parsing and follows the same procedure of hashing the SQL text, obtaining the bucket, finding the parent handle and then scanning the child list for a compatible child, it will also find the child in exclusive mode, but because it has a different container id, it will skip the child and continue to scan the child list for compatible children. Scanning means pinning each child that could potentially be compatible (meaning it has the same container id) in shared mode, and by doing that the session either finds a compatible child, or if it doesn't, creates one itself. (A quick way to check for such waits from SQL is shown below.)
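A minimal way to see whether sessions are actually waiting like this, assuming a sufficiently privileged session (this is just a quick check, not something used in the test above):

select sid, con_id, sql_id, event, seconds_in_wait
from   v$session
where  event = 'cursor: pin S wait on X';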

So what does that mean; the conclusion:
The library cache hash table, the parent cursor and the child list/hash table are shared between all containers. These require unique (library cache hash table/handle) or shared (child list/hash table) access for each database process for usage, but the actual time they are held in these modes is so short that it is negligible.
The main reason for waiting for a cursor/library cache entry is a child pinned in exclusive mode during creation (‘cursor: pin S wait on X’).
When an identical statement is executed in another container, it will skip a child that is created or in the process of being created in another container, even when it’s pinned in exclusive mode.

This means that the multi-tenant option in my opinion does not significantly increase the risk of waiting because of library cache concurrency, specifically because it can skip child cursors pinned exclusively in another container.

Update April 4: I tested having a child from one container pinned in exclusive mode while running the same SQL in another container on 12.1.0.2.0 (the base release of 12.1.0.2), and it works exactly the same as described above. So whilst the library cache hash table and parent handles are shared, the cursor children are specific to a container and do not lock each other out.

This post is about one of the fundamentally important properties of a database: how IO is done. The test case I studied is doing a simple full table scan of a single large table. In both Oracle and postgres the table doesn’t have any indexes or constraints, which is not a realistic example, but this doesn’t change the principal topic of the study: doing a table scan.

I used a publicly available dataset from the US bureau of transportation statistics called FAF4.5.1_database.zip
The zipped file is 347MB, unzipped size 1.7GB.

In both cases Oracle Linux 7.7 (64 bit) is used, running in VirtualBox, with the storage being a USB3 SSD. Number of CPUs is 4, memory size is 6G. Filesystem type: xfs.
The Oracle version used is Oracle 19.5, the Postgresql version used is 12.1.
For Postgresql, the postgresql.conf file is not changed, except for max_parallel_workers_per_gather which is set to 0 to make postgres use a single process.
For Oracle, the parameters that I think are important: filesystemio_options=’setall’. Oracle is used filesystem based (so no ASM).

This is the table definition for Oracle:

create table faf451 (
  fr_origin varchar2(3),
  dms_orig varchar2(3),
  dms_dest varchar2(3),
  fr_dest varchar2(3),
  fr_inmode varchar2(1),
  dms_mode varchar2(1),
  fr_outmode varchar2(1),
  sctg2 varchar2(2),
  trade_type varchar2(1),
  tons number,
  value number,
  tmiles number,
  curval number,
  wgt_dist number,
  year varchar2(4)
);

This is the table definition for Postgresql:

create table faf451 (
  fr_origin varchar(3),
  dms_orig varchar(3),
  dms_dest varchar(3),
  fr_dest varchar(3),
  fr_inmode varchar(1),
  dms_mode varchar(1),
  fr_outmode varchar(1),
  sctg2 varchar(2),
  trade_type varchar(1),
  tons double precision,
  value double precision,
  tmiles double precision,
  curval double precision,
  wgt_dist double precision,
  year varchar(4)
);

In order for the data to be easily loadable into postgres using copy from, I had to remove "" (double double quotes) for the empty numeric fields. In Oracle I could specify optionally enclosed by '"'. For Oracle I used an external table definition to load the data.

Now, before doing any benchmarks, I have an idea where this is going. Oracle uses direct IO (DIO), so linux page cache management and "double buffering" are avoided. Also, oracle does asynchronous IO (AIO), which means submitting IO is separated from waiting for the notification that the submitted IOs are ready, and on top of that oracle will submit multiple IO requests at the same time. And again on top of that, oracle does multi-block IO, which means that instead of requesting each 8K database block individually, it groups adjacent blocks and requests them in one go, up to a combined size of 1MB, which means it can request up to 128 8K blocks in one IO. Postgres requests every block synchronously, so one 8K block at a time, waiting for each request to finish. That gives me a strong idea where this is going.

It should be noted that postgres explicitly is depending on the operating system page cache for buffering as a design principle. Because of DIO, blocks that are read by oracle are not cached in the operating system page cache.
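If you want to verify this IO behaviour on your own system: attaching strace to the server process doing the scan should show io_submit()/io_getevents() calls for oracle (direct path, asynchronous IO) and plain pread() calls for postgres. A rough sketch, with PID being the process serving your session:

strace -p PID -e trace=io_submit,io_getevents,pread64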

I executed my benchmark in the following way (a sketch of one batch is shown after this list):
– A run for every size is executed 5 times.
– At the start of every run for a certain size (so before every “batch” of 5 runs), the page cache is flushed: (echo 3 > /proc/sys/vm/drop_caches).
– Before each individual run, the database cache is flushed (systemctl restart postgresql-12 for postgres, alter system flush buffer_cache for oracle).
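Put together, one batch for a given dataset size looked roughly like this on the postgres side (a sketch, assuming default connection settings; the oracle runs are the same, except the buffer cache is flushed with 'alter system flush buffer_cache' and the query is run through sqlplus):

sudo sh -c 'echo 3 > /proc/sys/vm/drop_caches'   # flush the page cache once per size
for run in 1 2 3 4 5; do
  sudo systemctl restart postgresql-12           # flush the postgres buffer cache
  time psql -c "select count(*) from faf451;"
done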

I started off with 2G of the dataset, and then simply performed a 'copy from' again to load the same dataset into the table in postgres. Oracle required a bit more work: oracle was able to store the same data in far fewer blocks; the size became 1.18G. In order to have both postgres and oracle scan the same amount of data, I calculated roughly how many rows I needed to add to the table to make it 2G, and then copied that table to save it as a 2G table, so I could insert it to increase the size of the test table by 2G at a time. This way, in both oracle and postgres I could test with a 2G table and add 2G at a time until I reached 20G.
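In oracle terms that could look roughly like this (faf451_2g is a hypothetical name for the saved 2G copy; this is a sketch of the idea rather than the exact statements used):

create table faf451_2g as select * from faf451;
-- repeat to grow the test table by roughly 2G at a time:
insert /*+ append */ into faf451 select * from faf451_2g;
commit;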

These are the results. As you can see in the legend: oracle is orange, postgres is blue.
(graph: scan time per dataset size, postgres versus oracle)

What we see is that postgres is a bit slower during the first run of 5 for the smaller dataset sizes, which becomes less visible with larger datasets.
Also, postgres is way faster if the dataset fits into the page cache and it has been read into it. This is logical because postgres explicitly uses the page cache as a secondary cache, and the test is the only activity on this server, so it hasn’t been flushed by other activity.

What was totally shocking to me is that postgres performs on par with oracle, and both are roughly able to perform at the maximum IO speed of my disk: 300MB/s, especially when the dataset is bigger, meaning beyond the page cache size.

It wasn’t shocking that oracle could reach the total bandwidth of the disk: oracle uses all the techniques to optimise IO for bandwidth. But how can postgres do the same, NOT deploying these techniques, reading 8K at a time??

The first thing to check is whether postgres is doing something other than I expected. This can simply be checked with strace:

poll_wait(3, [{EPOLLIN, {u32=18818136, u64=18818136}}], 1, -1) = 1
recvfrom(11, "Q\0\0\0!select count(*) from faf451"..., 8192, 0, NULL, NULL) = 34
lseek(20, 0, SEEK_END)                  = 335740928
lseek(20, 0, SEEK_END)                  = 335740928
kill(1518, SIGUSR1)                     = 0
pread64(5, "\f\0\0\0\310ILc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846061568) = 8192
pread64(5, "\f\0\0\0HcLc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846069760) = 8192
pread64(5, "\f\0\0\0\260|Lc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846077952) = 8192
pread64(5, "\f\0\0\0000\226Lc\0\0\0\0D\1\210\1\0 \4 \0\0\0\0\230\237\312\0000\237\312\0"..., 8192, 846086144) = 8192
…etc…

The above strace output shows only 4 rows of pread64() calls, but this goes on. So no “secret” optimisation there.

Luckily, my VM has a new enough version of Linux for it to be able to use eBPF, so I can use biosnoop. Biosnoop is a tool to look at IO on one of the lower layers of the linux kernel, the block device interface (hence ‘bio’). This is the biosnoop output:

# /usr/share/bcc/tools/biosnoop
TIME(s)        COMM           PID    DISK    T  SECTOR    BYTES   LAT(ms)
0.000000000    postmaster     4143   sdb     R  66727776  708608     5.51
0.006419000    postmaster     4143   sdb     R  66731720  77824     11.06
0.006497000    postmaster     4143   sdb     R  66734432  786432    11.03
0.011550000    postmaster     4143   sdb     R  66731872  1310720   16.17
0.013470000    postmaster     4143   sdb     R  66729160  1310720   18.86
0.016439000    postmaster     4143   sdb     R  66735968  1310720   14.61
0.019220000    postmaster     4143   sdb     R  66738528  786432    15.20

Wow…so here it’s doing IOs of up to 1MB! So somewhere between postgres itself and the block device, the IOs magically grew to sizes up to 1MB…that’s weird. The only thing that sits between postgres and the block device is the linux kernel, which includes page cache management.

To get insight into that, I ran 'perf record -g -p PID' during the scan, and then perf report to look at the recorded perf data. This is what I found:

Samples: 21K of event 'cpu-clock', Event count (approx.): 5277000000
  Children      Self  Command     Shared Object       Symbol                                                                  ◆
-   41.84%     3.63%  postmaster  libpthread-2.17.so  [.] __pread_nocancel                                                    ▒
   - 38.20% __pread_nocancel                                                                                                  ▒
      - 38.08% entry_SYSCALL_64_after_hwframe                                                                                 ▒
         - 37.95% do_syscall_64                                                                                               ▒
            - 35.87% sys_pread64                                                                                              ▒
               - 35.51% vfs_read                                                                                              ▒
                  - 35.07% __vfs_read                                                                                         ▒
                     - 34.97% xfs_file_read_iter                                                                              ▒
                        - 34.69% __dta_xfs_file_buffered_aio_read_3293                                                        ▒
                           - 34.32% generic_file_read_iter                                                                    ▒
                              - 21.10% page_cache_async_readahead                                                             ▒
                                 - 21.04% ondemand_readahead                                                                  ▒
                                    - 20.99% __do_page_cache_readahead                                                        ▒
                                       + 14.14% __dta_xfs_vm_readpages_3179                                                   ▒
                                       + 5.07% __page_cache_alloc                                                             ▒
                                       + 0.97% radix_tree_lookup                                                              ▒
                                       + 0.54% blk_finish_plug                                                                ▒

If you look at rows 13-15 of the perf output, you see that the kernel is performing readahead. This is an automatic function in the linux kernel that checks whether the requests are sequential in nature, and when that is the case performs readahead, so that the scan is made faster.
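Readahead is a per block device setting that you can inspect (and change) yourself; a quick check, with sdb being the device holding the database files in my case:

blockdev --getra /dev/sdb                 # readahead window in 512-byte sectors
cat /sys/block/sdb/queue/read_ahead_kb    # the same setting in kilobytes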

The difference between Oracle database versions 12.2.0.1.191015 and 12.2.0.1.200114 also follows the pattern of a small number of changes.

There have been two spare parameters that have been changed to named undocumented parameters, and no data dictionary changes.

parameters unique in version 12.2.0.1.191015 versus 12.2.0.1.200114

NAME
--------------------------------------------------
_fifth_spare_parameter
_one-hundred-and-forty-eighth_spare_parameter

parameters unique in version 12.2.0.1.200114 versus 12.2.0.1.191015

NAME
--------------------------------------------------
_bug29825525_bct_public_dba_buffer_dynresize_delay
_enable_ptime_update_for_sys
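A list like the two above can be generated by spooling the parameter names as SYS in each version and diffing the spool files; a minimal sketch, not necessarily the exact method used for this post:

select ksppinm from x$ksppi order by ksppinm;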

On the C function side, a group of AWR functions and a group of SGA management functions have been removed, among other functions. The functions that have been added are random and diverse.

code symbol names unique in version 12.2.0.1.191015 versus 12.2.0.1.200114

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
R_CR_entropy_resource_init                                   R_CR_entropy_resource_init                                   ??
kcbzdra                                                      (kcbz)dra                                                    kernel cache buffers subroutines for kcb ??
kdmsCreateSampleInvBlkList                                   (kdm)sCreateSampleInvBlkList                                 kernel data in-memory data layer ??
kdmsFillSampleList                                           (kdm)sFillSampleList                                         kernel data in-memory data layer ??
kewmfdms_flush_drmsum                                        (kewm)fdms_flush_drmsum                                      kernel event AWR metrics ??
kewmgaeidct                                                  (kewm)gaeidct                                                kernel event AWR metrics ??
kewmusmdb_update_smdbuf                                      (kewm)usmdb_update_smdbuf                                    kernel event AWR metrics ??
kewramcs_app_map_condbid_str                                 (kewr)amcs_app_map_condbid_str                               kernel event AWR repository ??
kewramvn_append_mdb_vvwname                                  (kewr)amvn_append_mdb_vvwname                                kernel event AWR repository ??
kewrccsq_collect_csql                                        (kewr)ccsq_collect_csql                                      kernel event AWR repository ??
kewrfosp2_fos_mdb_part2                                      (kewrf)osp2_fos_mdb_part2                                    kernel event AWR repository flush ??
kewrfosp3_fos_mdb_part3                                      (kewrf)osp3_fos_mdb_part3                                    kernel event AWR repository flush ??
kewrgcfes_get_cacheid_from_enum_str                          (kewr)gcfes_get_cacheid_from_enum_str                        kernel event AWR repository ??
kewrggd_get_group_descriptor                                 (kewr)ggd_get_group_descriptor                               kernel event AWR repository ??
kewrggf_grp_get_flags                                        (kewr)ggf_grp_get_flags                                      kernel event AWR repository ??
kewrggh_grp_get_handle                                       (kewr)ggh_grp_get_handle                                     kernel event AWR repository ??
kewrggmc_grp_get_member_count                                (kewr)ggmc_grp_get_member_count                              kernel event AWR repository ??
kewrgltn_gen_lrgtest_tab_name                                (kewr)gltn_gen_lrgtest_tab_name                              kernel event AWR repository ??
kewrgvm_grp_valid_member                                     (kewr)gvm_grp_valid_member                                   kernel event AWR repository ??
kewrice_is_cache_enabled                                     (kewr)ice_is_cache_enabled                                   kernel event AWR repository ??
kewrmfp_map_flush_phase                                      (kewr)mfp_map_flush_phase                                    kernel event AWR repository ??
kewrmplvl_map_snap_level                                     (kewr)mplvl_map_snap_level                                   kernel event AWR repository ??
kewrpfbue_pdb_from_buffer_entry                              (kewr)pfbue_pdb_from_buffer_entry                            kernel event AWR repository ??
kewrptsq_prep_topsql                                         (kewr)ptsq_prep_topsql                                       kernel event AWR repository ??
kewrrc_release_cache                                         (kewr)rc_release_cache                                       kernel event AWR repository ??
kewrsaobn_set_all_objnames                                   (kewr)saobn_set_all_objnames                                 kernel event AWR repository ??
kewrsonie_set_object_names_in_entry                          (kewr)sonie_set_object_names_in_entry                        kernel event AWR repository ??
kewrsqlc_sql_iscolored_cb                                    (kewr)sqlc_sql_iscolored_cb                                  kernel event AWR repository ??
kgskltyp                                                     (kgsk)ltyp                                                   kernel generic service resource manager ??
kkeutlCopyAllocatorState                                     (kke)utlCopyAllocatorState                                   kernel compile cost engine ??
kkeutlIsAllocStructureSame                                   (kke)utlIsAllocStructureSame                                 kernel compile cost engine ??
kmgs_check_uninited_comp                                     (kmgs)_check_uninited_comp                                   kernel multi threaded/mman manage (sga) space (?) ??
kmgs_dump_partial_inuse_list_comp                            (kmgs)_dump_partial_inuse_list_comp                          kernel multi threaded/mman manage (sga) space (?) ??
kmgs_dump_quiesce_list                                       (kmgs)_dump_quiesce_list                                     kernel multi threaded/mman manage (sga) space (?) ??
kmgs_dump_resize_summary                                     (kmgs)_dump_resize_summary                                   kernel multi threaded/mman manage (sga) space (?) ??
kmgs_fill_start_sizes                                        (kmgs)_fill_start_sizes                                      kernel multi threaded/mman manage (sga) space (?) ??
kmgs_get_min_cache_grans                                     (kmgs)_get_min_cache_grans                                   kernel multi threaded/mman manage (sga) space (?) ??
kmgs_getgran_from_comp_pg                                    (kmgs)_getgran_from_comp_pg                                  kernel multi threaded/mman manage (sga) space (?) ??
kmgs_init_sgapga_comps                                       (kmgs)_init_sgapga_comps                                     kernel multi threaded/mman manage (sga) space (?) ??
kmgs_nvmksmid_2_kcbpoolid                                    (kmgs)_nvmksmid_2_kcbpoolid                                  kernel multi threaded/mman manage (sga) space (?) ??
kmgs_recv_and_donor_are_caches                               (kmgs)_recv_and_donor_are_caches                             kernel multi threaded/mman manage (sga) space (?) ??
kmgs_shrink_gran                                             (kmgs)_shrink_gran                                           kernel multi threaded/mman manage (sga) space (?) ??
kmgs_update_param_manual_helper                              (kmgs)_update_param_manual_helper                            kernel multi threaded/mman manage (sga) space (?) ??
kmgs_update_resize_summary                                   (kmgs)_update_resize_summary                                 kernel multi threaded/mman manage (sga) space (?) ??
kmgsb_in_range                                               (kmgs)b_in_range                                             kernel multi threaded/mman manage (sga) space (?) ??
kmgsdpgl                                                     (kmgs)dpgl                                                   kernel multi threaded/mman manage (sga) space (?) ??
kmgsset_timestamp                                            (kmgs)set_timestamp                                          kernel multi threaded/mman manage (sga) space (?) ??
krvxgtf                                                      (krvx)gtf                                                    kernel redo recovery extract ??
krvxrte                                                      (krvx)rte                                                    kernel redo recovery extract ??
kslsesftcb_int                                               (ksl)sesftcb_int                                             kernel service  latching and post-wait ??
ksmg_estimate_sgamax                                         (ksm)g_estimate_sgamax                                       kernel service  memory ??
ktcxbFlgPrint                                                (ktc)xbFlgPrint                                              kernel transaction control component ??
kzagetcid                                                    (kza)getcid                                                  kernel security audit  ??
kzekmdcw                                                     (kz)ekmdcw                                                   kernel security ??
qeroiFirstPart                                               (qeroi)FirstPart                                             query execute rowsource extensibel indexing query component ??
qksbgUnderOFE                                                (qksbg)UnderOFE                                              query kernel sql bind (variable) management(?) ??
ri_entcb_cmd_func                                            ri_entcb_cmd_func                                            ??
zt_yield_entropy_source_cb                                   (zt)_yield_entropy_source_cb                                 security encryption ??

code symbol names unique in version 12.2.0.1.200114 versus 12.2.0.1.191015

NAME                                                         RESOLVE                                                      ANNOTATION
------------------------------------------------------------ ------------------------------------------------------------ --------------------------------------------------------------------------------
apagwnrn                                                     (apa)gwnrn                                                   SQL Access Path Analysis ??
apagwnrnprd                                                  (apa)gwnrnprd                                                SQL Access Path Analysis ??
apatwnrn                                                     (apa)twnrn                                                   SQL Access Path Analysis ??
kafcpy_one_row                                               (kaf)cpy_one_row                                             kernel access fetch ??
kcbz_eff_bsz                                                 (kcbz)_eff_bsz                                               kernel cache buffers subroutines for kcb ??
kdilm_row_diskcompress_policy_type                           (kdil)m_row_diskcompress_policy_type                         kernel data index load ??
kdsReadAheadSafe                                             (kds)ReadAheadSafe                                           kernel data seek/scan ??
kfdFreeReqs                                                  (kfd)FreeReqs                                                kernel automatic storage management disk ??
kfdp_getNormalFgCnt                                          (kfdp)_getNormalFgCnt                                        kernel automatic storage management disk PST ??
kghunalo                                                     (kgh)unalo                                                   kernel generic heap manager ??
kjcts_syncseq_incident_dump                                  (kjc)ts_syncseq_incident_dump                                kernel lock management communication ??
kkfdIsXlate                                                  (kkfd)IsXlate                                                kernel compile fast dataflow (PQ DFO) ??
kkoRowNumLimit_Int                                           (kko)RowNumLimit_Int                                         kernel compile optimizer ??
kkoWnRowNumLimit                                             (kko)WnRowNumLimit                                           kernel compile optimizer ??
kkoarFreeStats                                               (kkoar)FreeStats                                             kernel compile optimizer automatic (sql) reoptimisation ??
kkqgbpValidPredCB                                            (kkqgbp)ValidPredCB                                          kernel compile query  group by placement ??
kkqoreApplyFKR                                               (kkqore)ApplyFKR                                             kernel compile query  or-expansion ??
kkqstIsOneToOneFunc                                          (kkq)stIsOneToOneFunc                                        kernel compile query  ??
kkquReplSCInMWithRefCB                                       (kkqu)ReplSCInMWithRefCB                                     kernel compile query  subquery unnesting ??
kkqvtOpnInView                                               (kkqvt)OpnInView                                             kernel compile query  vector transformation ??
kokujJsonSerialize                                           (kok)ujJsonSerialize                                         kernel objects kernel side ??
kpdbCheckCommonprofileCbk                                    (kpdb)CheckCommonprofileCbk                                  kernel programmatic interface pluggable database ??
kpdbSyncCreateProfile                                        (kpdbSync)CreateProfile                                      kernel programmatic interface pluggable database DBMS_PDB.KPDBSYNC SYNC_PDB ??
krvfptai_PutTxAuditInfo                                      (krv)fptai_PutTxAuditInfo                                    kernel redo recovery ??
krvtab                                                       (krvt)ab                                                     kernel redo recovery log miner viewer support ??
krvxdsr                                                      (krvx)dsr                                                    kernel redo recovery extract ??
ksmg_estimate_nonimc_sga_size                                (ksm)g_estimate_nonimc_sga_size                              kernel service  memory ??
ktspFetchMeta1                                               (ktsp)FetchMeta1                                             kernel transaction segment management segment pagetable ??
kzekmckdcw                                                   (kz)ekmckdcw                                                 kernel security ??
kzekmckdcw_cbk                                               (kz)ekmckdcw_cbk                                             kernel security ??
opiBindReorderInfo                                           (opi)BindReorderInfo                                         oracle program interface ??
qcpiJsonSerialize                                            (qcpi)JsonSerialize                                          query compile parse interim ??
qcsSqnLegalCB                                                (qcs)SqnLegalCB                                              query compile semantic analysis (parser) ??
qergiSetFirstPartFlag                                        (qergi)SetFirstPartFlag                                      query execute rowsource granule iterator (partitioning? or PX granules?) ??
qeroiFindGranuleIter                                         (qeroi)FindGranuleIter                                       query execute rowsource extensibel indexing query component ??
qesblZero                                                    (qesbl)Zero                                                  query execute services bloom filter ??
qjsnIsDollarOnly                                             (qjsn)IsDollarOnly                                           query json ??
qjsnJsonCreatDom                                             (qjsn)JsonCreatDom                                           query json ??
qjsn_ferrh                                                   (qjsn)_ferrh                                                 query json ??
qkaGetClusteringFactor                                       (qka)GetClusteringFactor                                     query kernel allocation ??
qkaIsRTRIMRequiredForViewCol                                 (qka)IsRTRIMRequiredForViewCol                               query kernel allocation ??
qksopCheckConstOrOptWithBindInAndChains                      (qksop)CheckConstOrOptWithBindInAndChains                    query kernel sql compilter operand processing ??
qksqbCorrToNonParent                                         (qksqb)CorrToNonParent                                       query kernel sql Query compilation for query blocks ??
qksvcCloneHJPred                                             (qksvc)CloneHJPred                                           query kernel sql Virtual Column ??

(disclaimer: I can't look at the source code, which means I look at the oracle executable with normal, modern tools. This also means that there's a lot of stuff that I don't see; for example, if functionality has been added inside an existing function, that's totally invisible to me)
