Hold the exact(ish) “CREATE VIEW” definition in the PostgreSQL System Catalog

It would be rather nice if the exact SQL definition (well almost, see below) supplied with the “create view” statement could be held in the PostgreSQL System Catalog, rather than the re-written optimised version that is currently held. This has come up before, see:

http://www.postgresql-archive.org/idea-storing-view-source-in-system-catalogs-td1987401.html
http://www.postgresql-archive.org/Preserving-the-source-code-of-views-td5775163.html

Back then this idea got stomped on for good reasons but I am not suggesting quite the same.

We would prefer PostgreSQL to store view definitions in the system catalog *AFTER* column expansion (unlike in the discussion above) but *BEFORE* internal parser rewrites. So, currently for example:

CREATE TABLE bar ( x INTEGER, y TEXT );

CREATE VIEW foo(x, y) AS
SELECT * FROM bar WHERE y IN ( ‘a’, ‘b’ ) AND x IS NOT DISTINCT FROM 0;

SELECT view_definition FROM information_schema.views
WHERE table_name = ‘foo’;

SELECT bar.x, bar.y
FROM bar
WHERE ((bar.y = ANY (ARRAY[‘a’::text, ‘b’::text])) AND (NOT (bar.x IS DISTINCT FROM 0)));
(1 row)

The column expansion and corresponding rewrite is mandatory at the time of creation (see below), but the other changes in the definition are not. They make comparing the current definition of a view as it is in a database against an externally maintained definition difficult, which is what we in particular want to be able to do.

The SQL standard (at least the ancient version I have access to) for “information_schema” seems rather open to interpretation on this issue. It says that the view definition column:

“contains a representation of the view descriptors”

What do we take “representation” to be – presumably a completely mangled internally optimised “blob” would be compliant (but otherwise useless). The spirit of the statement might be more reasonably taken to be “a human legible representation”. In that case how much rewrite should actually be allowed – to me the rewritten definition above is already getting less legible than the original, but maybe others feel different. I don’t know if it was a design decision that the parser rewrites were folded back into the stored definition or just a side effect of the implementation. Either way, if there is any additional rewriting in the future there must be a point at which the rewrites would be considered to make the stored view definition so sufficiently illegible that the spirit of the standard at least is no longer being followed – or would future rewrites be needlessly constrained by having to preserve the legibility of the stored definition?

The SQL standard is however completely explicit about column expansion:

“NOTE 13: Any implicit column references that were contained in the <query
expression> associated with the <view definition> are replaced by explicit
column references in VIEW_DEFINITION.”

The SQL standard also implies that the definition should be NULL when the length exceeds some limit (not sure what that is but I have a vague feeling from somewhere that it was 4K characters). PostgreSQL does not enforce that limit that I know of (or simply has a much higher limit) since we have some massive views that are returned entirely – and that seems very sensible to me, can’t see a point having a limit these days (except for backwards compatibility with external applications). Maybe this limit has been removed in more recent versions of the standard.

The re-written view definition is also what is output by pg_dump. The consequence of this is that any new/altered parser rewrites made in a newer version of PostgreSQL might not be automatically applied to the view when it is restored from the old versions dump file into that new version. They would however always be applied if the view was restored from a dump that used the original raw definition from the CREATE VIEW statement.

I looked at MariaDB for comparison. This does some rewriting as well – it quotes every table and column reference and adds superfluous bracketing, but does not replace IN (it does not have a comparable “NOT DISTINCT” syntax, just <=>, but I suspect it would shift an embedded NOT in similar constructs in the same way as PostgreSQL). I guess YMMV here depending on how much clever rewriting the database does. Don’t know about other databases like Oracle or SQL Server.

Arguably the same principle should be applied to any embedded SQL literal expression used in DDL operations where the parser rewrites it, e.g. in DEFAULT values for table columns or WITH clauses used in trigger conditions. No idea what the SQL standard says about these, and indeed if they are even included under the information_schema.

Storing the original CREATE VIEW statement post column expansion is obviously of no benefit internally to PostgreSQL itself (excepting the minor pg_dump point above) so I can understand the reluctance to change anything. Reading the earlier discussion on this issue also suggests that this might be a really tricky change to make, I guess due to the way the parser is currently written, so cost/benefit is poor.

Anyway, for the record at least, we have a use case where it would be better for us if the original CREATE VIEW statement was returned as-is in the “information_schema.view_definition” column, but in compliance with Note13 any column expansion re-writes will have been applied. Over and above standards compliance I believe that the column expansion is structurally necessary anyway to facilitate subsequent table/column re-naming and the automatic cascade of the re-naming to views that use them. The further parser rewritten definition could continue to be stored in the pg_catalog in addition if it is needed internally within PostgreSQL (i.e. pg_dump). However, I also suggest that it might be sensible for the original “column expansion only” definition to be used in the pg_dump output, so that the “intent” of the view when originally created is what should (or should not in the case of any incompatibilities) migrate into a future version (or be restored into the same version).

2 thoughts on “Hold the exact(ish) “CREATE VIEW” definition in the PostgreSQL System Catalog

  1. Marti Raudsepp says:

    There’s a good reason why PostgreSQL outputs the “rewritten” view definition: PostgreSQL views are always schema-bound, that is, bound to the exact tables and columns at view creation-time.

    This means that when the schema changes — mostly through column or table renames — the view still references the same renamed table. Thus the original view definition text (with old names) will be invalid, but the rewritten view definition returned from the catalogs accurately reflects the current state of the view.

    It’s debatable whether schema-bound views are the desired behavior but that’s the current implementation, and returning the un-rewritten view definition can in fact give you an invalid result.

    • timc says:

      Yes, agreed, changing names and columns in the rewritten definition is absolutely fine as well. I am just talking about “non-essential” rewrites, optimisations that make the view harder to read (and compare) but do not affect functionality within the schema.

Leave a Reply

Your email address will not be published. Required fields are marked *