My reasons for moving to Yugabyte

I am excited to announce that I have accepted a role at Yugabyte as ‘developer evangelist’. It’s not easy to leave the talented group of people that is Enkitec, of which many of them I call friends. In fact, the change is much bigger since Yugabyte is a database outside of the Oracle ecosystem, in which ecosystem there also are a lot of people that I have gotten to known and call friends.

So why the change? In essence, my reasons are the same as the ones Bryn Llewellyn mentioned in his blogpost about his move, although I come from a different background being a consultant and not working for Oracle.

Open source

You could argue that having spend investigating the inner working of the Oracle database for 20 or so years means I am discarding all this knowledge I build up. I don’t believe that’s true. Yes, I have build up a lot of knowledge about the Oracle database works, and parts of that knowledge might be known (usable?) to a small group. However, I am looking forward to working with software for which I can actually see how it’s made, and I don’t have to perform extensive research to figure out how it’s “probably” implemented. This is an important reason for me personally. In general I think there are much more reasons for companies for wanting to use open source software instead of proprietary, closed source software.

Distributed architecture & distributed SQL

The area I have spend a significant amount of time in, is assessing and investigating performance of the Oracle database. Oracle, like many traditional monolithic databases, use an architecture where the database itself is actually a single set of files containing the data, which is singular, and are served by processes that perform validation of requests to use that data, and perform performance features like caching so data can be used from memory, which means it doesn’t need to be re-read from the files so it can be served at memory latency. This architecture imposes limits on the scale of use, for which common solutions to overcome these limits with monolithic architectures are:
– Sharing the (still singular) datafiles with multiple machines (RAC).
– Make a copy of the data files, and spool the changes to the copy (DG/OGG/replication).
– (Manually) partition the data over multiple independent databases (sharding).
– Storing data redundantly over the available (still local) disks (RAID).
I am not saying these solutions are wrong. In fact, many of them are very cleverly architected and optimized over the years with the different architectures of databases and work really well.
However, there are a consequences that are inherently imposed by a monolithic architecture, such as scalability, flexibility and failure resistance.

The Yugabyte database provides a cloud ready, cloud vendor agnostic, failure resistant, linear scaling database (sorry for the abundance of hype words). Failure resistance is provided by keeping multiple copies of the data, which is configurable as ‘replication factor’. By keeping copies of the data, failure of a single Yugabyte node (called a ‘tablet server’) can be overcome. By spreading these copies of data over groups of tablet servers in cloud availability zones, the outage of an entire availability zone can be survived. When such a disaster strikes, the cluster will survive and provide normal (read and write functions) as long as 2/3rd of the nodes survive (in case of a replication factor of 3), without the need for any management to accommodate failover, like instantiating a replica/standby.

Each Yugabyte tablet server also serves as an API for data retrieval and manipulation, so with the addition of tablet servers, the processing power increases. A tablet server should use local, non-protected disks to take advantage of low latency, and by adding more tablet servers the work can be spread out further to these, increasing the amount of work that can be done. Aside from increasing processing for API-side processing, Yugabyte provides functionality which pushes down the work to the storage, which Exadata specialists will recognize as ‘smart scans’, alongside things like predicate pushdown and bloom filters.

Yugabyte has not invented yet another SQL and/or no-SQL dialect. Another point that I think is really strong is that Yugabyte reuses Postgresql as the SQL API. The Postgres API is “wire compatible” with Postgres, which means that any product that can talk to a Postgres database can talk to Yugabyte. In fact, instead of writing a Postgres layer, Yugabyte has re-used the Postgres source to use the query layers of postgres, and then provide the storage for postgres using Yugabyte. This also means a lot of the server side functionality of Postgres, such as PL/pgSQL can be used. Needless to say there are limitations inherent to the postgres connection with the Yugabyte storage layer, but most common postgres database functionality is available.

But that is not all! Aside and in parallel to Yugabyte’s Postgres API, it also provides a “NoSQL” APIs, which are compatible with Apache Cassandra and with Redis. Oh, and data storage is done via LSM (log-structured merge-tree), etc.

Conclusion

I hope at this point you can see that Yugabyte provides a fresh, modern approach to a database that provides a lot of advantages over more traditional, monolithic databases, for which I think a lot of cases that I witnessed over the years could significantly benefit from (performance-wise and availability-wise). I am really thrilled to be on the basis of building that further.

Also, a lot of technical reasons that I described are really just a summary, if you have gotten interested or want to learn more, I would urge you (or challenge you :-D) to have a look at https://www.yugabyte.com or read more at https://docs.yugabyte.com. Or if you want to see the code, head over to https://github.com/yugabyte/yugabyte-db!

4 comments
  1. I wish you the best. Please keep us inforrmed as to your experiences, I am sure they will be quite interesting. I’ve been dabbling in Postgres quite a bit lately. It sort of reminds me of Oracle 6-7 in terms of features, which is not a bad thing at all.

    • I am hired as ‘evangelist’ so I would say you should do your best NOT to hear from me 😀 {insert evil laughter}.

      One fundamental issue I see with postgres is the inherent reliance on the host page cache. The uncertain nature of that cache is an issue, and the fundamental performance difference as a result for a dataset that fits in memory versus not fitting in memory. Plus the singular code path of reading a single database block at a time. But things are moving.

      I do like having the source, being able to compile it with or have the debug symbols from a repository, and being able to put the source code in a source code indexer, such as opengrok (that is what I use currently to understand what is going on for both postgres and yugabyte).

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

This site uses Akismet to reduce spam. Learn how your comment data is processed.

<span>%d</span> bloggers like this: