Wednesday, April 8, 2026

pgcollection 2.0: Integer Keys, Range Deletes, and Oracle Parity

In my first post about pgcollection, I introduced the collection type to address the challenge of migrating Oracle associative arrays keyed by strings to PostgreSQL. For integer-keyed associative arrays, I noted that native PostgreSQL arrays work well enough for simple cases. That holds true until the keys are sparse.

Consider this Oracle pattern:

DECLARE
  TYPE cache_t IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
  cache  cache_t;
BEGIN
  cache(1)       := 'first';
  cache(1000000) := 'millionth';
  DBMS_OUTPUT.PUT_LINE('Count: ' || cache.COUNT);  -- 2
END;

The equivalent attempt with a PostgreSQL array produces a different result:

DO $$
DECLARE
  a text[];
BEGIN
  a[1]       := 'first';
  a[1000000] := 'millionth';
  RAISE NOTICE 'Length: %', array_length(a, 1);  -- 1000000
END $$;

PostgreSQL fills positions 2 through 999,999 with NULLs. You asked for two entries and got a million-element array. Worse, it is impossible to distinguish between a key that was explicitly set to NULL and one that was never set at all. pgcollection now avoids both problems.

icollection

icollection is a 64-bit integer-keyed associative array that stores only the keys you set. The same pattern from above works as expected:

DO $$
DECLARE
  cache  icollection('text');
BEGIN
  cache[1]       := 'first';
  cache[1000000] := 'millionth';

  RAISE NOTICE 'Count: %', count(cache);  -- 2
  RAISE NOTICE 'Value: %', cache[1000000];
  RAISE NOTICE 'Key 500 exists: %', exist(cache, 500);  -- false
END $$;

icollection supports the same full set of operations as collection — subscript access, forward and reverse iteration, sorting, set-returning functions, and JSON casting — with bigint keys instead of text. It maps directly to Oracle’s TABLE OF ... INDEX BY PLS_INTEGER, with the keys widened to 64-bit so overflow is not a concern during migration.
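Iteration can be sketched with the to_table() set-returning function shown later in this post; this assumes to_table() exposes key and value columns for icollection as it does for collection, with the key as bigint:

DO $$
DECLARE
  ic  icollection('text');
  r   record;
BEGIN
  ic[10] := 'ten';
  ic[20] := 'twenty';

  -- Only the two keys that were set come back; no NULL padding in between.
  FOR r IN SELECT key, value FROM to_table(ic)
  LOOP
    RAISE NOTICE '% => %', r.key, r.value;
  END LOOP;
END $$;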

The exist() function resolves the NULL ambiguity problem directly. With a PostgreSQL array, a[2] returns NULL whether the key was set to NULL or never set. With icollection, exist() distinguishes the two cases:

DO $$
DECLARE
  ic  icollection('text');
BEGIN
  ic[1] := 'value';
  ic[2] := NULL;

  RAISE NOTICE 'Key 2 exists: %', exist(ic, 2);  -- true (explicitly NULL)
  RAISE NOTICE 'Key 3 exists: %', exist(ic, 3);  -- false (never set)
END $$;

Bulk delete operations

Oracle’s .DELETE method has three forms: delete a single key, delete a range of keys, and delete everything. pgcollection 1.x supported only single-key deletion. Version 2.0 adds the other two.

Delete all elements:

capitals := delete(capitals);

The previous workaround was assigning NULL, but that discarded the value type. A collection('date') set to NULL would infer its type from the next add() call rather than preserving date. Calling delete() with no arguments removes every entry while keeping the type intact.
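A minimal sketch of the type-preservation point, assuming the subscript assignment and count() shown elsewhere in this post:

DO $$
DECLARE
  d  collection('date');
BEGIN
  d['start'] := DATE '2026-01-01';
  d := delete(d);                     -- removes every entry, keeps the date value type
  d['resume'] := DATE '2026-02-01';   -- no type re-inference needed
  RAISE NOTICE 'count: %', count(d);  -- 1
END $$;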

Range delete:

-- String keys: uses the collection's collation for comparison
capitals := delete(capitals, 'B', 'D');

-- Integer keys: numeric comparison
arr := delete(arr, 5, 15);

Only keys that exist within the range are removed. If the low bound is greater than the high bound, the call is a no-op. This matches Oracle’s behavior.
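Putting both rules together in one sketch, using the icollection and count() shown above:

DO $$
DECLARE
  arr  icollection('text');
BEGIN
  arr[1]  := 'one';
  arr[7]  := 'seven';
  arr[20] := 'twenty';

  arr := delete(arr, 5, 15);            -- only key 7 falls in [5, 15]
  RAISE NOTICE 'count: %', count(arr);  -- 2

  arr := delete(arr, 15, 5);            -- low > high: no-op
  RAISE NOTICE 'count: %', count(arr);  -- still 2
END $$;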

Oracle migration guide

The other significant addition in 2.0 is a comprehensive Oracle migration guide covering common associative array patterns side by side: basic CRUD, FIRST/NEXT iteration, EXISTS checks, bulk loading from queries, FORALL-style bulk DML, record types as values, INOUT parameters, and exception handling.

The FORALL replacement is worth examining. In Oracle, bulk DML with associative arrays looks like this:

FORALL i IN 1..ids.COUNT
  INSERT INTO employees (id, name) VALUES (ids(i), names(i));

With pgcollection, the set-returning functions make this a single statement:

INSERT INTO employees (id, name)
  SELECT key, value FROM to_table(emps);

There is no loop and no per-row context switch between PL/pgSQL and the SQL engine.

The guide also documents the behavioral differences to be aware of during migration. The most consequential is that Oracle associative arrays with string keys are always sorted, while pgcollection preserves insertion order and requires an explicit sort() call. Code that iterates with FIRST/NEXT and expects alphabetical order needs that sort() added.
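A sketch of what that looks like, assuming sort() returns the collection ordered by key and to_table() then yields rows in that order:

DO $$
DECLARE
  c  collection('text');
  r  record;
BEGIN
  c['berlin'] := 'Germany';
  c['athens'] := 'Greece';
  c['cairo']  := 'Egypt';

  c := sort(c);  -- without this, iteration follows insertion order
  FOR r IN SELECT key, value FROM to_table(c)
  LOOP
    RAISE NOTICE '% -> %', r.key, r.value;
  END LOOP;
END $$;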

Upgrading

Existing collection usage is fully backward compatible. If you are running pgcollection 1.1.1, the upgrade is a single statement:

ALTER EXTENSION collection UPDATE TO '2.0.0';

For new installations, pgcollection requires PostgreSQL 14 or later:

git clone https://github.com/aws/pgcollection.git
cd pgcollection
make && make install

Then, connected to your database:

CREATE EXTENSION collection;

Summary

pgcollection 2.0 closes two remaining gaps for Oracle migrations: sparse integer-keyed arrays with icollection, and the full set of Oracle .DELETE semantics with the new delete() overloads. The migration guide brings the patterns together in one place. Performance benchmarks comparing icollection to native PostgreSQL arrays are something I plan to cover in a follow-up post.

The pgcollection extension is available on GitHub under the Apache-2.0 license. If you run into issues or patterns the migration guide does not cover, open an issue on GitHub.

Monday, April 14, 2025

Improvements in Expanded Objects in PostgreSQL 18

Way back in PostgreSQL 9.5, there was a simple entry in the release notes:

Improve PL/pgSQL array performance (Tom Lane)


What was behind that entry was a concept called expanded objects, which gave datatypes the ability to have an in-memory representation different from their on-disk representation. In most cases, a single representation is ideal: a 4-byte integer is compact and efficient for long-term storage on disk and simple to use in memory. For more complex datatypes like arrays or JSON, however, how they are used in memory, say in a PL/pgSQL function, is very different from the needs of long-term storage. Expanded objects transition between a flat representation used on disk and an expanded representation used in memory.
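The classic beneficiary is an array built up in a loop. With expanded objects, each append mutates the in-memory representation directly; without them, every iteration would re-flatten and re-copy the whole array:

DO $$
DECLARE
  a  int[] := '{}';
BEGIN
  FOR i IN 1..100000 LOOP
    a := a || i;
  END LOOP;
  RAISE NOTICE 'length: %', array_length(a, 1);  -- 100000
END $$;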

While expanded objects have been in core for a long time, not many extensions that add new datatypes use the concept. We can infer this from a recent commit that will ship in PostgreSQL 18. Before this commit, if a local variable backed by an expanded object was assigned from another local variable in a PL/pgSQL function, the value would be flattened, assigned, and then expanded again. That round trip can be quite expensive. The commit greatly helps datatypes like pgcollection. pgcollection does provide a COPY function to work around the overhead, but reaching for it does not come naturally to developers who use the normal assignment operator for every other variable.
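The affected pattern is nothing exotic, just a plain variable-to-variable assignment:

DO $$
DECLARE
  c1  collection('text');
  c2  collection('text');
BEGIN
  c1['k'] := 'v';
  -- Before PostgreSQL 18: c1 is flattened, copied, then re-expanded.
  -- From 18 on: the expanded representation is handed over directly.
  c2 := c1;
  RAISE NOTICE 'count: %', count(c2);  -- 1
END $$;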

pgcollection is designed for in-memory use, to help people move to PostgreSQL from databases like Oracle that have concepts like associative arrays. Some applications use associative arrays extensively in their stored procedures, and migrating them to PostgreSQL can be difficult. Associative arrays indexed by integers map reasonably well to PostgreSQL arrays in PL/pgSQL, but when they are indexed by strings, the typical path has been inefficient workarounds such as JSON or hstore. Since associative arrays in Oracle are typically not persisted, expanded objects were a perfect match for pgcollection.


Monday, November 18, 2024

pgcollection: Associative Arrays in PostgreSQL

A common development practice in applications written for Oracle is to use a collection data structure in server-side code called an associative array. These are simple key/value structures where the key is either a number or string and the value can be any type including complex types. They are a convenient way to pass around a set of data to different parts of your business logic without needing to query your tables repeatedly. 

When considering a migration from Oracle to PostgreSQL, if the associative array is keyed by an integer, converting to a native PostgreSQL array works well. PostgreSQL arrays can be fixed- or variable-sized, can be sparsely populated, and can hold values of any valid PostgreSQL type, including complex types. The example below shows an array being loaded with the contents of the `users` table and then accessing individual elements within that array.


blog=# 
DO
$$
DECLARE
  r  users%ROWTYPE;
  u  users[];
BEGIN
  FOR r IN SELECT * FROM users
  LOOP
    u[r.id] = r;
  END LOOP;

  RAISE NOTICE 'id: %', u[42].id;
  RAISE NOTICE 'user_name: %', u[42].user_name;
  RAISE NOTICE 'email: %', u[42].email;
END
$$;
NOTICE:  id: 42
NOTICE:  user_name: mgiovanardi15
NOTICE:  email: mgiovanardi15@bizjournals.com
DO

Converting an associative array that is keyed by a string has been more challenging. A number of blogs give workarounds like JSON or hstore, but those are cumbersome and slow. The new pgcollection extension is designed to address this specific case. It adds a new collection data type, a key/value structure whose key is a string. The example below follows the same pattern as the array example above, but uses the collection type.


blog=#
DO
$$
DECLARE
  r  users%ROWTYPE;
  c  collection('users');
BEGIN
  FOR r IN SELECT * FROM users
  LOOP
    c[r.user_name] = r;
  END LOOP;

  RAISE NOTICE 'id: %', c['mgiovanardi15'].id;
  RAISE NOTICE 'user_name: %', c['mgiovanardi15'].user_name;
  RAISE NOTICE 'email: %', c['mgiovanardi15'].email;
END
$$;
NOTICE:  id: 42
NOTICE:  user_name: mgiovanardi15
NOTICE:  email: mgiovanardi15@bizjournals.com
DO

The pgcollection extension is still in beta, so don't run it in production without extensive testing. If you come across any issues while testing, please open an issue on GitHub.
