PostgreSQL Roles are Cluster-Wide

A role in PostgreSQL is common to all databases in the cluster. This seems to be the result of a design decision made when the former user and group handling was unified under role. Follow these links for reference:

Roles, or rather those roles that are not just representing a specific user, ought instead to be an intrinsic part of the database model. Roles are defined by the kind of access they provide (read, write, etc) and by what relations(table, view, function, etc) they provide that access to. Access control is ideally managed within a database using roles rather than separately within each individual application that uses that database. So it makes sense that the access control rules (the roles and their associated permissions) would be defined alongside the definitions of the relations for which they are controlling access, any changes are then self contained. The access control model should be represented as part and parcel of the rest of the database model. Individual users (which are also represented as roles in PostgreSQL) are assigned one or more of the roles defined within each particular database model (based on the local enterprise definition of what needs they have of any particular database).

There is no sense to representing this kind of role at the cluster level as the definition of the role is associated specifically with the database where it actually controls access. In PostgreSQL, to encapsulate the full functionality of the database requires using not only the system catalog tables specific to that database but also the roles relevant to that database held in the cluster wide system catalog tables. With the exception of some special cases, like roles for cluster wide database management, this is an artificial split. Roles that are managed outside of the database model across all databases in the cluster make some sense either when there is only one database in the cluster, or when all the databases in the cluster are not independent and act together as part of one managed system. Roles (again, those that are not just being used to represent a specific user) should otherwise be defined and managed at the level of the individual database, or arguably even at the level of each schema within a database.

Below are some ideas for enhancing PostgreSQL to support database specific roles. These ideas build up from limited and simplistic solutions to more flexible and complex ones.

Approach One

This is what we currently do. We prefix every role name with the database name. So roles called customer and salesperson used in a database called orders would actually be called orders_customer and orders_salesperson. This approach relies on all owners of databases in the cluster playing ball and we need some custom handling around role DDL statements when changes are made to the database model.

Approach Two

This just puts some syntactic sugar around the first approach.

Syntax Behaviour
CREATE [DBONLY] ROLE abc Creates the role abc after first adding a prefix which is the name of the current database returning an error if the name already exists. Without the DBONLY option behaves as now. Most other options would be incompatible with the DBONLY option, for example LOGIN, CREATEDB, etc.
DROP [DBONLY] ROLE abc Drops the role abc with a prefix matching the name of the current database returning an error if the name is not found (except when IF EXISTS is used). Without the DBONLY option behaves as now.
GRANT … ON … TO [DBONLY] abc Grants the privileges to the role abc with a prefix matching the name of the current database returning an error if the name is not found. Without the DBONLY option behaves as now.
REVOKE … ON … FROM [DBONLY] abc Removes the privileges from the role abc with a prefix matching the name of the current database returning an error if the name is not found. Without DBONLY option behaves as now.
DROPDB Drops a database and all the roles in the cluster that have a prefix matching the given database name.

The cluster wide system catalog table pg_roles is still used. A role created manually with a database prefix (without using the DBONLY option) would be treated the same as if the role had been created database specific (when using the DBONLY option). Roles names still have to be unique across the cluster.

With this approach no meta data is retained that identifies a role as cluster-wide or database-specific. A pg_dump will simply dump the role name as now (including the prefix) and a pg_load would re-create it in the same way as now. The dropdb command would drop all roles that have a prefix matching the database name irrespective of how they were actually created.

The advantage of this approach is that no change is required to the mechanism that looks up roles and checks that the level of access is appropriate to the user that has those roles when relations are actually used.

However we can definitely do better while staying simple.

Approach Three

This works much as above except that instead of adding a prefix to the role name we add a column to the cluster wide pg_roles table, catalog_name, where we add the name of the database associated with the role when using the DBONLY option (left null otherwise). The advantage is that we now have the meta data preserved as to how the role was originally defined – either database-specific or cluster-wide.

This fixes various concerns with the second approach. DROPDB can be constrained to search for roles matching the database name in the catalog_name column, so cluster roles that just happen to have a matching database name prefix will be ignored. The pg_dump and pg_restore commands can use the DBONLY option in the output as necessary by checking whether the catalog_name column has a value or not. The role name lookups for GRANT and REVOKE remain specifically controlled using the DBONLY option (so as to avoid handling any complex role hierarchy semantics). Still no change is required to the mechanism that looks up roles and checks that the level of access is appropriate to the user that has those roles when relations are actually used (on the assumption that when GRANT is used the corresponding pg_roles row OID is assigned, the actual name is irrelevant). However two roles could now exist called the same, one with catalog_name empty and one (or more) with a database name in catalog_name. So there are consequences for displaying the content of system catalog tables in psql using \du or \dp for example, so the system catalog views would need to be amended in some way.

A downside is that all the roles are still held in one system catalog table for the cluster, but in practice this might not actually matter. The semantics for pg_dump need to be thought about: when dumping a specific database would want an option to also dump its own specific roles; when dumping all databases all roles should be dumped – database-specific and cluster-wide.

This approach needs few code changes and is backwards compatible.

Approach Four

This would add a per-database pg_roles system catalog table. This holds the roles specific to that database and the role names can be the same as role names in other databases or cluster-wide. Would probably need rules to handle prioritisation in name clashes with roles held in the cluster wide pg_roles table. For example CREATE ROLE would need to check both the database-specific pg_roles and cluster-wide pg_roles. Similar complexity would need to be handled in GRANT and REVOKE. Access control checks would now need to check both database-specific pg_roles and cluster-wide pg_roles, and there may be conflicts to resolve depending on how a role hierarchy is implemented. While having a database-specific pg_roles table makes semantic sense, is necessary for managing who can change which roles, and a thorough implementation of role priority/conflict handling might result in useful behaviour, there is a lot of additional complexity in the implementation.

Schema Handling

A further enhancement should allow roles to be separately defined per-schema within a database. Support could be added in some fashion by allowing schema qualification of the role name in CREATE ROLE, using public (or as defined in the schema path) by default. The implementation of this in the third approach above would be to add a schema_name column into pg_roles.

Leave a Reply

Your email address will not be published. Required fields are marked *