Wednesday, April 8, 2026

pgcollection 2.0: Integer Keys, Range Deletes, and Oracle Parity

In my first post about pgcollection, I introduced the collection type to address the challenge of migrating Oracle associative arrays keyed by strings to PostgreSQL. For integer-keyed associative arrays, I noted that native PostgreSQL arrays work well enough for simple cases. That holds true until the keys are sparse.

Consider this Oracle pattern:

DECLARE
  TYPE cache_t IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
  cache  cache_t;
BEGIN
  cache(1)       := 'first';
  cache(1000000) := 'millionth';
  DBMS_OUTPUT.PUT_LINE('Count: ' || cache.COUNT);  -- 2
END;

The equivalent attempt with a PostgreSQL array produces a different result:

DO $$
DECLARE
  a text[];
BEGIN
  a[1]       := 'first';
  a[1000000] := 'millionth';
  RAISE NOTICE 'Length: %', array_length(a, 1);  -- 1000000
END $$;

PostgreSQL fills positions 2 through 999,999 with NULLs. You asked for two entries and got a million-element array. Worse, it is impossible to distinguish between a key that was explicitly set to NULL and one that was never set at all. pgcollection now avoids both problems.

icollection

icollection is a 64-bit integer-keyed associative array that stores only the keys you set. The same pattern from above works as expected:

DO $$
DECLARE
  cache  icollection('text');
BEGIN
  cache[1]       := 'first';
  cache[1000000] := 'millionth';

  RAISE NOTICE 'Count: %', count(cache);  -- 2
  RAISE NOTICE 'Value: %', cache[1000000];
  RAISE NOTICE 'Key 500 exists: %', exist(cache, 500);  -- false
END $$;

icollection supports the same full set of operations as collection — subscript access, forward and reverse iteration, sorting, set-returning functions, and JSON casting — with bigint keys instead of text. It maps directly to Oracle’s TABLE OF ... INDEX BY PLS_INTEGER, with the keys widened to 64-bit so overflow is not a concern during migration.
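Iteration can be sketched with the to_table() set-returning function shown later in this post; this assumes to_table() exposes key and value columns for icollection as it does for collection, with the key as bigint:

DO $$
DECLARE
  ic  icollection('text');
  r   record;
BEGIN
  ic[10] := 'ten';
  ic[20] := 'twenty';

  -- Only the two keys that were set come back; no NULL padding in between.
  FOR r IN SELECT key, value FROM to_table(ic)
  LOOP
    RAISE NOTICE '% => %', r.key, r.value;
  END LOOP;
END $$;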

The exist() function resolves the NULL ambiguity problem directly. With a PostgreSQL array, a[2] returns NULL whether the key was set to NULL or never set. With icollection, exist() distinguishes the two cases:

DO $$
DECLARE
  ic  icollection('text');
BEGIN
  ic[1] := 'value';
  ic[2] := NULL;

  RAISE NOTICE 'Key 2 exists: %', exist(ic, 2);  -- true (explicitly NULL)
  RAISE NOTICE 'Key 3 exists: %', exist(ic, 3);  -- false (never set)
END $$;

Bulk delete operations

Oracle’s .DELETE method has three forms: delete a single key, delete a range of keys, and delete everything. pgcollection 1.x supported only single-key deletion. Version 2.0 adds the other two.

Delete all elements:

capitals := delete(capitals);

The previous workaround was assigning NULL, but that discarded the value type. A collection('date') set to NULL would infer its type from the next add() call rather than preserving date. Calling delete() with no arguments removes every entry while keeping the type intact.
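A minimal sketch of the type-preservation point, assuming the subscript assignment and count() shown elsewhere in this post:

DO $$
DECLARE
  d  collection('date');
BEGIN
  d['start'] := DATE '2026-01-01';
  d := delete(d);                     -- removes every entry, keeps the date value type
  d['resume'] := DATE '2026-02-01';   -- no type re-inference needed
  RAISE NOTICE 'count: %', count(d);  -- 1
END $$;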

Range delete:

-- String keys: uses the collection's collation for comparison
capitals := delete(capitals, 'B', 'D');

-- Integer keys: numeric comparison
arr := delete(arr, 5, 15);

Only keys that exist within the range are removed. If the low bound is greater than the high bound, the call is a no-op. This matches Oracle’s behavior.
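Putting both rules together in one sketch, using the icollection and count() shown above:

DO $$
DECLARE
  arr  icollection('text');
BEGIN
  arr[1]  := 'one';
  arr[7]  := 'seven';
  arr[20] := 'twenty';

  arr := delete(arr, 5, 15);            -- only key 7 falls in [5, 15]
  RAISE NOTICE 'count: %', count(arr);  -- 2

  arr := delete(arr, 15, 5);            -- low > high: no-op
  RAISE NOTICE 'count: %', count(arr);  -- still 2
END $$;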

Oracle migration guide

The other significant addition in 2.0 is a comprehensive Oracle migration guide covering common associative array patterns side by side: basic CRUD, FIRST/NEXT iteration, EXISTS checks, bulk loading from queries, FORALL-style bulk DML, record types as values, INOUT parameters, and exception handling.

The FORALL replacement is worth examining. In Oracle, bulk DML with associative arrays looks like this:

FORALL i IN 1..ids.COUNT
  INSERT INTO employees (id, name) VALUES (ids(i), names(i));

With pgcollection, the set-returning functions make this a single statement:

INSERT INTO employees (id, name)
  SELECT key, value FROM to_table(emps);

There is no loop and no per-row context switch between PL/pgSQL and the SQL engine.

The guide also documents the behavioral differences to be aware of during migration. The most consequential is that Oracle associative arrays with string keys are always sorted, while pgcollection preserves insertion order and requires an explicit sort() call. Code that iterates with FIRST/NEXT and expects alphabetical order needs that sort() added.
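A sketch of what that looks like, assuming sort() returns the collection ordered by key and to_table() then yields rows in that order:

DO $$
DECLARE
  c  collection('text');
  r  record;
BEGIN
  c['berlin'] := 'Germany';
  c['athens'] := 'Greece';
  c['cairo']  := 'Egypt';

  c := sort(c);  -- without this, iteration follows insertion order
  FOR r IN SELECT key, value FROM to_table(c)
  LOOP
    RAISE NOTICE '% -> %', r.key, r.value;
  END LOOP;
END $$;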

Upgrading

Existing collection usage is fully backward compatible. If you are running pgcollection 1.1.1, the upgrade is a single statement:

ALTER EXTENSION collection UPDATE TO '2.0.0';

For new installations, pgcollection requires PostgreSQL 14 or later:

git clone https://github.com/aws/pgcollection.git
cd pgcollection
make && make install

Then, connected to your database:

CREATE EXTENSION collection;

Summary

pgcollection 2.0 closes two remaining gaps for Oracle migrations: sparse integer-keyed arrays with icollection, and the full set of Oracle .DELETE semantics with the new delete() overloads. The migration guide brings the patterns together in one place. Performance benchmarks comparing icollection to native PostgreSQL arrays are something I plan to cover in a follow-up post.

The pgcollection extension is available on GitHub under the Apache-2.0 license. If you run into issues or patterns the migration guide does not cover, open an issue on GitHub.

Monday, April 14, 2025

Improvements in Expanded Objects in PostgreSQL 18

Way back in PostgreSQL 9.5, there was a simple entry in the release notes:

Improve PL/pgSQL array performance (Tom Lane)


What was behind that entry was a concept called expanded objects, which gave datatypes the ability to have an in-memory representation different from their on-disk representation. In most cases, a single representation is ideal: a 4-byte integer is compact and efficient for long-term storage on disk and simple to use in memory. For more complex datatypes like arrays or JSON, however, how they are used in memory, say in a PL/pgSQL function, is very different from the needs of long-term storage. Expanded objects transition between a flat representation used on disk and an expanded representation used in memory.
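The classic beneficiary is an array built up in a loop. With expanded objects, each append mutates the in-memory representation directly; without them, every iteration would re-flatten and re-copy the whole array:

DO $$
DECLARE
  a  int[] := '{}';
BEGIN
  FOR i IN 1..100000 LOOP
    a := a || i;
  END LOOP;
  RAISE NOTICE 'length: %', array_length(a, 1);  -- 100000
END $$;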

While expanded objects have been in core for a long time, not many extensions that add new datatypes use the concept. We can infer this from a recent commit that will ship in PostgreSQL 18. Before this commit, if a local variable backed by an expanded object was assigned from another local variable in a PL/pgSQL function, the value would be flattened, assigned, and then expanded again. That round trip can be quite expensive. The commit greatly helps datatypes like pgcollection. pgcollection does provide a COPY function to work around the overhead, but reaching for it does not come naturally to developers who use the normal assignment operator for every other variable.
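The affected pattern is nothing exotic, just a plain variable-to-variable assignment:

DO $$
DECLARE
  c1  collection('text');
  c2  collection('text');
BEGIN
  c1['k'] := 'v';
  -- Before PostgreSQL 18: c1 is flattened, copied, then re-expanded.
  -- From 18 on: the expanded representation is handed over directly.
  c2 := c1;
  RAISE NOTICE 'count: %', count(c2);  -- 1
END $$;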

pgcollection is designed for in-memory use, to help people move to PostgreSQL from databases like Oracle that have concepts like associative arrays. Some applications use associative arrays extensively in their stored procedures, and migrating them to PostgreSQL can be difficult. Associative arrays indexed by integers map reasonably well to PostgreSQL arrays in PL/pgSQL, but when they are indexed by strings, the typical path has been inefficient workarounds such as JSON or hstore. Since associative arrays in Oracle are typically not persisted, expanded objects were a perfect match for pgcollection.


Monday, November 18, 2024

pgcollection: Associative Arrays in PostgreSQL

A common development practice in applications written for Oracle is to use a collection data structure in server-side code called an associative array. These are simple key/value structures where the key is either a number or string and the value can be any type including complex types. They are a convenient way to pass around a set of data to different parts of your business logic without needing to query your tables repeatedly. 

When considering a migration from Oracle to PostgreSQL, if the associative array is keyed by an integer, converting to a native PostgreSQL array works well. PostgreSQL arrays can be fixed- or variable-sized, can be sparsely populated, and can hold values of any valid PostgreSQL type, including complex types. The example below shows an array being loaded with the contents of the `users` table and then accessing individual elements within that array.


blog=# 
DO
$$
DECLARE
  r  users%ROWTYPE;
  u  users[];
BEGIN
  FOR r IN SELECT * FROM users
  LOOP
    u[r.id] = r;
  END LOOP;

  RAISE NOTICE 'id: %', u[42].id;
  RAISE NOTICE 'user_name: %', u[42].user_name;
  RAISE NOTICE 'email: %', u[42].email;
END
$$;
NOTICE:  id: 42
NOTICE:  user_name: mgiovanardi15
NOTICE:  email: mgiovanardi15@bizjournals.com
DO

Converting an associative array that is keyed by a string has been more challenging. A number of blogs give workarounds like JSON or hstore, but those are cumbersome and slow. The new pgcollection extension is designed to address this specific case. It adds a new collection data type, a key/value structure whose key is a string. The example below follows the same pattern as the array example above, but uses the collection type.


blog=#
DO
$$
DECLARE
  r  users%ROWTYPE;
  c  collection('users');
BEGIN
  FOR r IN SELECT * FROM users
  LOOP
    c[r.user_name] = r;
  END LOOP;

  RAISE NOTICE 'id: %', c['mgiovanardi15'].id;
  RAISE NOTICE 'user_name: %', c['mgiovanardi15'].user_name;
  RAISE NOTICE 'email: %', c['mgiovanardi15'].email;
END
$$;
NOTICE:  id: 42
NOTICE:  user_name: mgiovanardi15
NOTICE:  email: mgiovanardi15@bizjournals.com
DO

The pgcollection extension is still in beta, so don't run it in production without extensive testing. If you come across any issues while testing, please open an issue on GitHub.
