Introduction
Top level design for the Personalised Information Portal, Computing Project #470.
The Personalised Information Portal (PIP) hosts web pages with an individualised cut of data that is appropriate to the teaching and administrative duties currently held by the particular member of staff viewing those pages. For example: a member of staff that is a PGR supervisor will see a page containing a set of live data on those PGR students they are supervising; a member of staff that is a PGR Selector for an institute will see a page containing a set of live data on the current applications being processed for that institute; a member of staff that is an institute director will see a page containing management reports for things like student numbers and funding information customised to their institute. Individualised web pages provide direct access to the data we have in local systems (and links to associated resources) needed by staff performing the associated duties. This: is more efficient for staff than having to search for aggregated resources held in many different places; helps the School to meet various data protection obligations; provides a useful resource for new staff (or existing staff taking on new roles) when they first start out. In addition, the content (or a particular view of it) can be subsequently used within a PDR.
Design Principles
The content on the PIP will be predominantly (and at the outset exclusively) provided from Theon (or a Theon Managed Micro Service). So the design must work simply and efficiently when using this source. The design should not however preclude the use of other sources in the future, even if these are not as directly integrated or as easy to add.
The PIP should be completely agnostic to the sources and content of data so the design must not require any context sensitive (source dependent) structure or functionality. The decision on what content to provide and how that content is presented (while remaining consistent with the overall presentation) on the PIP rests entirely with the originating service, since that is where the data is mastered and where what data to present and how is most effectively controlled. This keeps the PIP itself simple to maintain and keeps any evolving logic specific to each originating service local to that service.
The design should be lightweight, robust and fast to implement by re-using or borrowing from as much existing technology and standards as possible. For example, a solution like uPortal (what MyEd is based on) is too heavyweight as well as being an outdated approach. The PIP only needs to host static content, it does not need to support embedded applications.
Adding a new external service onto the PIP should be cheap to do on top of the changes that have to be made to the external service anyway in order to provide the appropriate content in a suitable format. Assuming ideal conditions (which depend on the nature of the service) a service owner that has the content ready that they want to put on the PIP, and is familiar with the procedure, should need no more than half a working day to include their content on the PIP.
Terminology
- PIP
- Personalised Information Portal. The service acting as a proxy to serve up data on an individualised basis from any number of channels supplied by any number of external services. The PIP is essentially a service that collates the content from other services and builds this content onto individualised web pages for users (as well as providing the content via an API for re-use). The PIP is content agnostic – it does not need or use contextual information which is specific to each individual external service. In effect the PIP is just a relay. The PIP can provide two way relaying – some services may want to expose controls for update within the PIP, when bouncing the user back to the full external service interface would be overkill. The PIP here pushes content back to the external service, changes can be made, content re-generated and relayed back to the user through the PIP. Note that we assume just one PIP exists here, in practice there might be more than one if they have more specialised roles, the design does not preclude there being any number of specialised instances.
- satellite
- A service that has content that is made available through the PIP.
- channel
- A single data set presented on the PIP and supplied by a satellite. A satellite may supply more than one channel. A channel may contain embedded callbacks to allow a PIP to remotely initiate specific actions on the satellite. A channel dataset must be supplied raw (unformatted) but can also be supplied formatted (in compliance with the PIP channel specification). A channel will also provide associated meta data, such as generation time and a “backlink” (allowing the channel to provide a link to the satellite so that the user can view additional content or perform any operations not implemented via callback).
- supplier
- A supplier is a satellite that is providing one or more channels to a PIP in order for that PIP to present the content held in that external service relevant to the current individual accessing. A supplier pushes content to a PIP.
- receiver
- A receiver is a satellite that can accept return data from one or more of its channels in a PIP in order for that PIP to remotely affect change within the corresponding data sets held in that external service. The PIP pushes content to a receiver.
- tightly coupled
- A satellite that is structurally integrated with the PIP in order to be a supplier and/or a receiver of content for the PIP. A tightly coupled satellite can operate with a closed loop (push and wait for response) or open loop (push and continue) as either supplier or receiver for the PIP.
- loosely coupled
- A satellite that uses the RESTful API provided by the PIP in order to be a supplier or a receiver of content for the PIP. A loosely coupled satellite can operate with a closed loop (push and wait for response) or open loop (push and continue) as supplier for the PIP, but it cannot operate as receiver for the PIP.
- TMMS
- Theon Managed Micro-Service. The School Database (Hypatia) is a TMMS, despite it being anything but “micro”. There are others within the School such as Referee, DiceDesk, InfROS. Over time Hypatia itself will be fractured into a separate TMMS to cover each distinct workflow. The PIP will itself be a TMMS. A TMMS provides out of the box: PostgreSQL database model management; a web based user interface (for administration); a RESTful API; internalised page generation for a web interface. Services built using a TMMS, such as the PIP, are fast to implement.
- callback
- A mechanism for a channel on a tightly coupled satellite to use a PIP to pass over user affected content and initiate an action on that satellite. A callback can be implemented as a closed loop requiring the satellite to supply updated content for the channel reflecting the results of the action.
- closed loop
- A closed loop is synchronous. An atomic transaction initiated from the push of data (either from a satellite to the PIP as a supplier or from the PIP to a satellite as a receiver) that must fully complete before control is handed back. In the context of a supplier this ensures that channel content has been transferred and made available to the end user before the transaction is complete. In the context of a receiver this ensures that return content has been transferred and any action associated with it, including perhaps updating the channel content, has completed. A closed loop can easily result in performance problems. or even worse deadlock, in the PIP and a satellite so care must be taken in implementation.
- open loop
- An open loop is asynchronous. Any processing resulting from the push of data (either from a satellite to the PIP as a supplier or from the PIP to a satellite as a receiver) is deferred and control is handed back immediately. In the context of a supplier this means that once the channel content has been transferred the operation is complete – the necessary processing to make the content available to the end user will then happen at some point thereafter. In the context of a receiver this means that once the return content has been transferred the operation is complete – any action associated with it will be down to the satellite to schedule.
- notification
- A notification is a side band of a channel data set. These are temporary messages that are handled by the PIP differently (but consistently across all channels) from the main content. They are usually, but don’t have to be, generated from running callbacks.
Design
The PIP will have its own local PostgreSQL database. This will hold the content supplied by each satellite. The content in this local database will be used to create individualised web pages. A standard Cosign authenticated Apache service fronts the content. All the web page HTML is created directly from the database on the fly by aggregating the relevant satellite channel content for the currently authenticated user. This is done by a Python server running under WSGI. This also handles the users delegated credentials and database connection pooling. Identical functionality already exists to do this, and has been proven robust and performant over many years, in the server for TheonUI. We will simply re-use the core functionality in this to develop a custom server for hosting satellite channel content. Like in TheonUI all authentication and authorisation control will be offloaded to the database itself, as the connection is made using the current users delegated credentials. The Python server will incorporate a content agnostic version of the CGI form submission handler, currently in use on TheonPortal, in order to implement callbacks.
The most recent content supplied by a satellite is always used on the web page on reload. User initiated forced refresh of satellite content can be implemented, where necessary, by using a callback for the originating satellite. Any satellite content which is raw data will, when updated, automatically trigger internal page re-generation (full or partial) within the database – the most recent re-generated page is returned on web page reload. Such content can also be re-generated, where more appropriate, by scheduled run. In most cases though the logic to constrain, build and format channel content is done in each satellite, the PIP just replays and/or reformats relevant fragments of the content into a web page. There will be a CSS stack accessed through Apache used to enforce a consistent style on the PIP and satellite channel content.
The PIP will be implemented as a TMMS, meaning all functionality and authorisation control (apart from the actual web hosting and authenticated user mapping done by Apache) is implemented in the local back-end database. In this way the PIP can: use the built-in RESTful API for direct content access and callback by users and for loosely coupled satellite data upload; use TheonUI for administrative management (global settings).
Authentication of users on the local PostgreSQL database will be done through Prometheus roles and capabilities using pgluser to manage change. Authorisation rules will be part of the TMMS model. The local PostgreSQL database will take a feed of data from the School Database containing active users assigned posts, teaching and administrative duties/scope. These attributes are used by satellites to construct appropriately grouped channel content, correlated against users and ultimately constrained within the PIP based on currently authenticated user.
Users with an administrative role that allows them to see other users pages (or specific views of those pages) do so by virtue of having the necessary user channels embedded on their own pages.
Adding a satellite and channel(s) to the PIP can be done by anyone making the necessary satellite amendments (described below) and the PIP owner enabling access for that satellite.
Tightly Coupled Satellite
There are three kinds of satellite:
- Satellite is a TMMS
- Satellite uses PostgreSQL but is not a TMMS
- Satellite does not use PostgreSQL or even a database at all
All of these can be configured as a tightly coupled satellite, but in order of increasing complexity. Any of the above could instead be configured as a loosely coupled satellite.
A tightly coupled satellite is connected to the PIP by having shared exchange tables defined in a PostgreSQL database which are each a foreign data wrapper for a corresponding table in the PIP PostgreSQL database. These are used by the satellite to supply data to the PIP or receive data from the PIP in a standardised way.
The satellite PostgreSQL database must be amended to include these FDW’s and supporting framework. The satellite must then be adapted to generate the channel content to put in to the supply FDW and handle the channel content got out of the return FDW. These satellite adaptations to generate and handle channel content would be necessary whatever the format and communication protocol so there is no overhead to doing this. The PIP provides support to make the addition of the FDW tables to the satellite low overhead (see below).
When a callback is made through the return FDW the entire transaction can be atomic forming a closed loop. Starting from the initiation within the PIP, through to the return content handling and supply content re-generation within the satellite and finally back within the PIP to re-generate any web pages, including full roll back in both the satellite and the PIP in case of error. This atomicity can also be two way, so that content supply requires complete loop completion (re-generation of web pages) within the PIP. Using a closed loop can constrain performance as a result of delays in the PIP or satellite and concurrency due to locking in the shared exchange tables. Either direction can optionally be configured as an open loop (neither end waits for completion of the other) to avoid problems of this nature where necessary.
Configurations for the three kinds of satellite are set out below.
- Satellite is a TMMS:
Satellite directly includes the PIP factory generator for a “satellite connection” and uses a PIP provided template to auto build the necessary FDW schema. Can also directly include the PIP template headers for data formatting to comply with the standard channel content requirements. - Satellite uses PostgreSQL but is not a TMMS:
Satellite applies the necessary “satellite connection” DDL from the PIP TMMS manually in order to build the necessary FDW schema (or can use IMPORT FOREIGN SCHEMA, although this requires an authenticated connection first, or can just do “CREATE EXTENSION pip”). Can also optionally use the PIP template headers for data formatting, but could do its own thing entirely if necessary (although must still comply with the standard channel content requirements). - Satellite does not use PostgreSQL or even a database at all:
Satellite must add a standalone TMMS PIP “satellite connection” database which has the necessary FDW schema. The satellite must push data into this intermediate database and or handle callbacks from it itself in whatever way is appropriate to that service. Can also optionally use the PIP template headers for data formatting, but could do its own thing entirely if necessary (although must still comply with the standard channel content requirements). Depending on the implementation of the interface between the satellite and the intermediate connection database a closed loop may not be achievable in either one or both directions, in which case a loosely coupled satellite is likely to be a simpler approach.
Loosely Coupled Satellite
This kind of satellite will supply content into the PIP PostgreSQL database shared exchange tables indirectly by using the PIP API, or directly using an appropriately authenticated database connection but that is not recommended. It is not possible to use callbacks in a loosely coupled satellite, so there will never be return content to handle. A loosely coupled satellite is most likely to be open loop but can be closed loop. A loosely coupled satellite can use the PIP template headers for data formatting, but is more likely to use a different implementation (although must still comply with the standard channel content requirements).
Implementation Summary
Functionality is almost entirely within the back-end PostgreSQL database – the shared exchange tables and associated constrained views, triggers, processing and authorisation control. The TMMS model for this database adds the API and also provides the FDW extension/templates for satellites to use. The Python WSGI server under Apache/Cosign hosts the content. Standard related tools such as pgluser and Prometheus Roles and Capabilities, LCFG and spanning maps are used to configure and setup the rest of the service.
The only custom development involved in implementing this service consists of work on the Python WSGI server (by extension of the existing TheonUI server) and on the back-end database structure and function. All other effort is essentially configuration of existing services. The communication protocol between satellites and the PIP is implemented using the standard postgres_fdw extension.
Further design work will be needed on the actual structure and content of a satellite channel.