The latest version of PostgreSQL has some cool new features. We look at the most valid of these identified by Umair Shahid, Head of PostgreSQL at Percona, while also referencing ideas from the past that shaped their foundation.
Who would have thought at the time in the 80s that the humble fork of Ingres would become, according to the genealogy of relational databases, one of the most, if not the most successful DBMS of all time?
There are two reasons behind its success; first that it is truly open source at heart, therefore embraced by a strong and vibrant community, and then that it was shaped by the visionary ideas of Mike Stonebraker, which formed the basis of the wonder that followed . Let’s take a look at some of them:
Support for ADTs in a Database System
At the heart of the concept of an object-relational database was support for ADTs or abstract data types that went beyond the traditional ones managed by the database. These were complex objects or data that needed to be stored as nested bundles, unlike traditional relational model data flattening to eliminate duplicates. It started as an attempt to meet the needs of CAD applications that use data types like polygons, rectangles, or even fully blown objects like circuit layout engines. The form it takes today is JSON, JSONB or XML. Storing this kind of data was only part of the story; the other was also to allow executing declarative queries on it. At the time, only Stonebraker had caught up with the idea. This scheme has been further extended by allowing programmers to declare their own types and user-defined functions to work with these types of types.
Years later, PostgreSQL 9.2 introduced another type, Range, which represents a range of values of a certain type of element; 14 goes one step further by introducing “multiplication” types that allow non-contiguous ranges, helping developers to write simpler requests for complex sequences such as specifying the time slots during which a meeting room is reserved during the day.
“Old” types are also getting a boost as version 14 continues to evolve its JSON support by adding new index syntax. This means that you can access your JSON data as
TO SELECT *
O address[‘attributes’][‘country’] = ‘”United Kingdom”‘
Extensible access methods for new data types
Another innovation was the B-Tree indexes that everyone is familiar with today, as well as the R-Tree indexes which allowed two-dimensional range queries to be performed on the data. These advancements laid the groundwork for Postgres extensibility, in part reflected today in the Gist indexes and interfaces that power the well-known PostGIS geographic information system.
Another index is that of GIN for Generalized Inverted Index, under which you can index your JSON data in order to enable full-text search. We have looked at this type of index and others that Postgres supports in Deep Dive Into PostgreSQL Indexes.
With indexes, however, there are overhead costs. The most popular of these, B-tree, when updated frequently, tends to accumulate dead tuples which cause index swelling. Typically, these tuples are only removed when a blank is run, but between blanks, as the page fills up, an update or insert will cause the page to split – something that is not reversible. PostgreSQL 14 can now detect and remove these tuples even between blanks, reducing the number of page splits and hence index inflation.
Multiprocessor support: XPRS
Another novelty brought about by Postgres was the sharing of memory and processors to support parallel query optimization. Although Postgres’ one and only weakness is that it cannot evolve on its own to a cluster architecture without parallel sharing, Stonebreaker’s article on “Case for Shared Nothing” has fueled the technologies behind Gamma, Terradata and the era of Big Data as an integer. But due to the extensibility of Postgres, products like the Citus DB Cluster, PostgresForest or Postgres-xc have emerged that rely on Postgres vanilla and make it able to understand and execute queries. parallels as a distributed, non-sharing database.
However, parallelism extends to distributed workloads as well, and v14 alleviates the burden on these workloads by allowing both query parallelism for table scans as well as bulk insert on foreign tables ( a database object that represents a table present on an external data source which could be another PostgreSQL node or a completely different system).
PostgreSQL 14 also brings more refinement to query parallelization by adding support for RETURN QUERY and REFRESH MATERIALIZED VIEW.
Remaining on the distributed and replication front, logical replication that was introduced in PostgreSQL-10 has been changed in v14 to allow streaming of current transactions to subscribers. Prior to this, transactions were only replicated at commit time and as such there was a lag in waiting for the transaction to commit in order to transfer the data. The new streaming method, replication performance is improved several times.
From a performance standpoint, there is yet another tweak in the underlying libpq C-based library, which enables “pipeline mode” under which applications can send a request without having to read the result of the request previously. sent. This means that a client will wait less for the server, as multiple requests / results can be sent / received in a single network transaction. And although this is a feature introduced by PostgreSQL 14, pipeline mode is a client-side technique that does not require special server support and works on any server that supports extended query protocol v3. .
Active databases, rule systems and stored procedures
The rules or triggers and stored procedures developed under Ingres were yet another construct popularized by Postgres that found their way into all the major database engines. More information on this can be found in Connecting to the Outside World with Perl and Database Events, an article that reviews the rules, database events, and stored procedures in modern Ingres.
In v14, stored procedures have been improved by allowing data to be returned using OUT parameters, which is welcome for developers familiar with other DBMSs such as Oracle.
Log-centric storage and retrieval
Dislike logging patterns, Stonebraker “Unified primary storage and historical logging into a single, simple representation on disk” Write-ahead logging scheme which not only simplified recovery but also enabled time travel i.e. running queries “from a certain clock time to access the versions of the data that were validated at that time”, defining the idea of implementing MVCC as experienced under Oracle’s snapshot isolation level.
In PostgreSQL 14, with new features that make monitoring and observability easier, you can now track the progress of all WAL activity, as well as the progress of COPY commands and replication location statistics. These and many more can be found on the release notes.
The bottom line of this story is that PostgreSQL has been innovating since the 1980s and continues, with each new release, to set the tone for the rest to follow.
A deep dive into PostgreSQL indexes
PostgreSQL is DB-Engines DBMS of the year 2020
Connect to the Outside World with Perl and Database Events
Ingres 11 technical overview
Genealogy of relational databases
or send your comment to: firstname.lastname@example.org