Skip to main content

PostgreSQL: Partitioning/Sharding and Foreign Data Wrapper Step up and Configuration

Following snippet of PostgreSQL documentation, https://www.postgresql.org/docs/12/ddl-partitioning.html#DDL-PARTITIONING-OVERVIEW, explains the benefits of partitioning as fallows. 

Partitioning refers to splitting what is logically one large table into smaller physical pieces. Partitioning can provide several benefits:

  • Query performance can be improved dramatically in certain situations, particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions. Partitioning effectively substitutes for the upper tree levels of indexes, making it more likely that the heavily-used parts of the indexes fit in memory.

  • When queries or updates access a large percentage of a single partition, performance can be improved by using a sequential scan of that partition instead of using an index, which would require random-access reads scattered across the whole table.

  • Bulk loads and deletes can be accomplished by adding or removing partitions, if the usage pattern is accounted for in the partitioning design. Dropping an individual partition using DROP TABLE, or doing ALTER TABLE DETACH PARTITION, is far faster than a bulk operation. These commands also entirely avoid the VACUUM overhead caused by a bulk DELETE.

  • Seldom-used data can be migrated to cheaper and slower storage media.

These benefits will normally be worthwhile only when a table would otherwise be very large. The exact point at which a table will benefit from partitioning depends on the application, although a rule of thumb is that the size of the table should exceed the physical memory of the database server.

PostgreSQL offers built-in support for the following forms of partitioning:

Range Partitioning

The table is partitioned into ranges defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions. For example, one might partition by date ranges, or by ranges of identifiers for particular business objects. Each range's bounds are understood as being inclusive at the lower end and exclusive at the upper end. For example, if one partition's range is from 1 to 10, and the next one's range is from 10 to 20, then value 10 belongs to the second partition not the first.

List Partitioning

The table is partitioned by explicitly listing which key value(s) appear in each partition.

Hash Partitioning

The table is partitioned by specifying a modulus and a remainder for each partition. Each partition will hold the rows for which the hash value of the partition key divided by the specified modulus will produce the specified remainder.

If your application needs to use other forms of partitioning not listed above, alternative methods such as inheritance and UNION ALL views can be used instead. Such methods offer flexibility but do not have some of the performance benefits of built-in declarative partitioning.

Sharding is not partitioning. As explained by the following MongoDB documentation - https://www.mongodb.com/features/database-sharding-explained#:~:text=Sharding%20is%20a%20method%20for,storage%20capacity%20of%20the%20system 

"Sharding is a method for distributing a single dataset across multiple databases, which can then be stored on multiple machines. This allows for larger datasets to be split into smaller chunks and stored in multiple data nodes, increasing the total storage capacity of the system."

However, PostgreSQL has implemented sharding on top of partitioning by allowing any given partition of a partitioned table to be hosted by a remote server. The basis for this is in PostgreSQL’s Foreign Data Wrapper (FDW) support, https://wiki.postgresql.org/wiki/Foreign_data_wrappers, which has been a part of the core of PostgreSQL for a long time. 

This document will go into required set up, steps to enable sharding in PostgreSQL/RDS for a given table and using a simple Spring Boot application to connect database and tables.

Infrastructure Set Up

Components:
  • ServiceA: Spring Boot application configured with a single DataSource configuration
    • Controller, Repository
    • HikariCP configuration
  • AWS RDS PostgreSQL instance in eu-central-1.
  • AWS RDS PostgreSQL instance in us-east-1.

Note: This article does not go into how to spin RDS instances in AWS or how to allow networking / security for them to communicate with each other which is needed. Both of these databases can be in the same Region. To make it transparent, in this set up, I used a different region all together.

Table Structure and Sharding Strategy 


I used the following table set up to test the set up. 

Table 1: TABLE_A → SHARDED TABLE
  • id (pkey)
  • name
  • email
  • organization_id
  • organization_value (used in declarative sharding)

Configuration Set Up

No specific configuration is required at AWS RDS level as the most recent versions of PostgreSQL already come with postgres_fdw installed. We will just need to be enabling it.
  • Local database → this is the database application is configured to connect. In our set up, this is the RDS database that was set up in USA.
  • Remote database → this is the database set up in EU which is going to hold our remote table partitions.
On the local server the preparatory steps involve loading the postgres_fdw extension, allowing our local application user to use that extension, creating an entry to access the remote server, and finally mapping that user with a user in the remote server (fdw_user) that has local access to the table we’ll use as a remote partition.

Local Database Set Up

//at the LOCAL database enable postgres_fdw. "postgres" is the user.
CREATE EXTENSION postgres_fdw;
GRANT USAGE ON FOREIGN DATA WRAPPER postgres_fdw to postgres;
//at the LOCAL database, set up a server configuration to wrap our EU database.
CREATE SERVER shard_eu FOREIGN DATA WRAPPER postgres_fdw
OPTIONS (dbname 'postgres', host 'hosturl.com', port
'5400');
//at the LOCAL database, set up a user mapping to connect to REMOTE Server
CREATE USER MAPPING FOR postgres
SERVER shard_eu
OPTIONS (user 'postgres', password 'mysecretpassword');

//at the LOCAL database, create a partitioned table using LIST partitioning based on organization_value column that would have "eu", and "usa" values.
CREATE TABLE public.TABLE_A
(
id bigint NOT NULL,
name character varying COLLATE pg_catalog."default",
email character varying COLLATE pg_catalog."default",
organization_id bigint,
organization_value character varying COLLATE pg_catalog."default"
) PARTITION BY LIST(organization_value);

Remote Database Set Up

//At the REMOTE(EU) table, create a table.
CREATE TABLE public.TABLE_A_EU
(
id bigint NOT NULL,
name character varying COLLATE pg_catalog."default",
email character varying COLLATE pg_catalog."default",
organization_id bigint,
organization_value character varying COLLATE pg_catalog."default",
CONSTRAINT table_a_eu_pkey PRIMARY KEY (id)
)
WITH (
OIDS = FALSE
)
TABLESPACE pg_default;

We then at the local database correctly define Partitions. Please note the "Foreign Table" creation for the eu partition.

//AT the local database (USA), create a foreign table for "eu" partition.
CREATE FOREIGN TABLE TABLE_A_eu PARTITION OF TABLE_A
FOR VALUES in ('eu')
SERVER shard_eu;

//At the local database (USA), create table to hold "usa" partition.
create table TABLE_A_usa PARTITION OF TABLE_A for values in ('usa');

Application Set Up / Test Cases and Execution


Spring Boot Application can be used to easily configure a Datasource connection.
#Simple DataSource Setup Using Spring/Hikari - connection is only to Local database (i.e) - USA
app:
datasource:
postgres:
app:
jdbc-url: jdbc:postgresql://host_url_for_local_db:5400/appdb?useServerPrepStmts=false&rewriteBatchedStatements=true&ssl=true&sslmode=require
username: postgres
password: mysecretpassword
driverClassName: org.postgresql.Driver
maximum-pool-size: 10
minimum-idle: 10

With this application, you can just connect to the local database and interact with the TABLE_A table as if REMOTE server didn't exist at all.

When you do a simple query like

SELECT * from TABLE_A - PostgreSQL will use remote server settings through postgres_fdw extension and load data from both partitions - "eu" vs "usa".

There are limitations to POSTGRES_FDW plugin. 

Comments