Personalised Information Portal – Top Level Design

Introduction

Top level design for the Personalised Information Portal, Computing Project #470.

The Personalised Information Portal (PIP) hosts web pages with an individualised cut of data appropriate to the teaching and administrative duties currently held by the particular member of staff viewing those pages. For example: a member of staff who is a PGR supervisor will see a page containing a set of live data on the PGR students they are supervising; a member of staff who is a PGR Selector for an institute will see a page containing a set of live data on the current applications being processed for that institute; a member of staff who is an institute director will see a page containing management reports for things like student numbers and funding information customised to their institute. Individualised web pages provide direct access to the data we hold in local systems (and links to associated resources) needed by staff performing the associated duties. This is more efficient for staff than having to search for aggregated resources held in many different places; it helps the School to meet various data protection obligations; and it provides a useful resource for new staff (or existing staff taking on new roles) when they first start out. In addition, the content (or a particular view of it) can subsequently be used within a PDR.

Design Principles

The content on the PIP will be predominantly (and at the outset exclusively) provided from Theon (or a Theon Managed Micro Service). So the design must work simply and efficiently when using this source. The design should not, however, preclude the use of other sources in the future, even if these are not as directly integrated or as easy to add.

The PIP should be completely agnostic to the sources and content of data, so the design must not require any context-sensitive (source-dependent) structure or functionality. The decision on what content to provide, and how that content is presented (while remaining consistent with the overall presentation), rests entirely with the originating service, since that is where the data is mastered and where such decisions are most effectively controlled. This keeps the PIP itself simple to maintain and keeps any evolving logic specific to each originating service local to that service.

The design should be lightweight, robust and fast to implement, re-using or borrowing from as much existing technology and as many standards as possible. For example, a solution like uPortal (on which MyEd is based) is too heavyweight as well as being an outdated approach. The PIP only needs to host static content; it does not need to support embedded applications.

Adding a new external service to the PIP should be cheap to do on top of the changes that have to be made to the external service anyway in order to provide the appropriate content in a suitable format. Assuming ideal conditions (which depend on the nature of the service), a service owner who has the content they want to put on the PIP ready, and who is familiar with the procedure, should need no more than half a working day to include that content on the PIP.

Terminology

PIP
Personalised Information Portal. The service acting as a proxy to serve up data on an individualised basis from any number of channels supplied by any number of external services. The PIP is essentially a service that collates the content from other services and builds this content into individualised web pages for users (as well as providing the content via an API for re-use). The PIP is content agnostic: it does not need or use contextual information specific to each individual external service. In effect the PIP is just a relay. The PIP can provide two-way relaying; some services may want to expose controls for update within the PIP, when bouncing the user back to the full external service interface would be overkill. Here the PIP pushes content back to the external service, changes can be made, and content is re-generated and relayed back to the user through the PIP. Note that we assume just one PIP exists here; in practice there might be more than one with more specialised roles, and the design does not preclude any number of specialised instances.
satellite
A service that has content that is made available through the PIP.
channel
A single data set presented on the PIP and supplied by a satellite. A satellite may supply more than one channel. A channel may contain embedded callbacks to allow a PIP to remotely initiate specific actions on the satellite. A channel dataset must be supplied raw (unformatted) but can also be supplied formatted (in compliance with the PIP channel specification). A channel will also provide associated metadata, such as generation time and a “backlink” (allowing the channel to provide a link to the satellite so that the user can view additional content or perform any operations not implemented via callback).
supplier
A supplier is a satellite that provides one or more channels to a PIP so that the PIP can present the content held in that external service relevant to the individual currently accessing it. A supplier pushes content to a PIP.
receiver
A receiver is a satellite that can accept return data from one or more of its channels in a PIP, so that the PIP can remotely effect change within the corresponding data sets held in that external service. The PIP pushes content to a receiver.
tightly coupled
A satellite that is structurally integrated with the PIP in order to be a supplier and/or a receiver of content for the PIP. A tightly coupled satellite can operate with a closed loop (push and wait for response) or open loop (push and continue) as either supplier or receiver for the PIP.
loosely coupled
A satellite that uses the RESTful API provided by the PIP in order to be a supplier of content for the PIP. A loosely coupled satellite can operate with a closed loop (push and wait for response) or open loop (push and continue) as supplier for the PIP, but it cannot operate as receiver for the PIP.
TMMS
Theon Managed Micro-Service. The School Database (Hypatia) is a TMMS, despite being anything but “micro”. There are others within the School, such as Referee, DiceDesk and InfROS. Over time Hypatia itself will be split into a separate TMMS for each distinct workflow. The PIP will itself be a TMMS. A TMMS provides out of the box: PostgreSQL database model management; a web based user interface (for administration); a RESTful API; internalised page generation for a web interface. Services built using a TMMS, such as the PIP, are fast to implement.
callback
A mechanism for a channel on a tightly coupled satellite to use a PIP to pass over user-affected content and initiate an action on that satellite. A callback can be implemented as a closed loop, requiring the satellite to supply updated content for the channel reflecting the results of the action.
closed loop
A closed loop is synchronous. An atomic transaction initiated from the push of data (either from a satellite to the PIP as a supplier or from the PIP to a satellite as a receiver) that must fully complete before control is handed back. In the context of a supplier this ensures that channel content has been transferred and made available to the end user before the transaction is complete. In the context of a receiver this ensures that return content has been transferred and any action associated with it, including perhaps updating the channel content, has completed. A closed loop can easily result in performance problems, or even worse deadlock, in the PIP and a satellite, so care must be taken in implementation.
open loop
An open loop is asynchronous. Any processing resulting from the push of data (either from a satellite to the PIP as a supplier or from the PIP to a satellite as a receiver) is deferred and control is handed back immediately. In the context of a supplier this means that once the channel content has been transferred the operation is complete – the necessary processing to make the content available to the end user will then happen at some point thereafter. In the context of a receiver this means that once the return content has been transferred the operation is complete – any action associated with it will be down to the satellite to schedule.
notification
A notification is a side band of a channel data set. Notifications are temporary messages that are handled by the PIP differently from the main content (but consistently across all channels). They are usually, though not necessarily, generated by running callbacks.

Design

The PIP will have its own local PostgreSQL database. This will hold the content supplied by each satellite. The content in this local database will be used to create individualised web pages. A standard Cosign authenticated Apache service fronts the content. All the web page HTML is created directly from the database on the fly by aggregating the relevant satellite channel content for the currently authenticated user. This is done by a Python server running under WSGI, which also handles the user's delegated credentials and database connection pooling. Identical functionality already exists to do this, and has been proven robust and performant over many years, in the server for TheonUI. We will simply re-use its core functionality to develop a custom server for hosting satellite channel content. As in TheonUI, all authentication and authorisation control will be offloaded to the database itself, as the connection is made using the current user's delegated credentials. The Python server will incorporate a content agnostic version of the CGI form submission handler, currently in use on TheonPortal, in order to implement callbacks.
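To illustrate how offloading authorisation to the database works, the sketch below shows one way channel content could be keyed and constrained. The table, view and column names here are purely hypothetical, not the actual PIP schema:

```sql
-- Hypothetical sketch: channel content keyed by satellite, channel and user.
CREATE TABLE channel_content (
    satellite  TEXT NOT NULL,
    channel    TEXT NOT NULL,
    username   TEXT NOT NULL,   -- the user this fragment is constrained to
    fragment   TEXT NOT NULL,   -- raw or pre-formatted channel content
    generated  TIMESTAMPTZ NOT NULL DEFAULT now(),
    PRIMARY KEY (satellite, channel, username)
);

-- Because the WSGI server connects with the user's delegated credentials,
-- a view filtered on current_user is all the constraint the page builder
-- needs; no authorisation logic lives in the Python layer.
CREATE VIEW my_channels AS
    SELECT satellite, channel, fragment, generated
    FROM channel_content
    WHERE username = current_user;
```

The page builder then only ever reads `my_channels`, so a user can never be served another user's fragments regardless of any bug in the presentation layer.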

The most recent content supplied by a satellite is always used on the web page on reload. User initiated forced refresh of satellite content can be implemented, where necessary, by using a callback for the originating satellite. Any satellite content which is raw data will, when updated, automatically trigger internal page re-generation (full or partial) within the database; the most recent re-generated page is returned on web page reload. Such content can also be re-generated, where more appropriate, by a scheduled run. In most cases though the logic to constrain, build and format channel content is done in each satellite; the PIP just replays and/or reformats relevant fragments of the content into a web page. There will be a CSS stack accessed through Apache used to enforce a consistent style on the PIP and satellite channel content.

The PIP will be implemented as a TMMS, meaning all functionality and authorisation control (apart from the actual web hosting and authenticated user mapping done by Apache) is implemented in the local back-end database. In this way the PIP can: use the built-in RESTful API for direct content access and callback by users and for loosely coupled satellite data upload; use TheonUI for administrative management (global settings).

Authentication of users on the local PostgreSQL database will be done through Prometheus roles and capabilities, using pgluser to manage change. Authorisation rules will be part of the TMMS model. The local PostgreSQL database will take a feed of data from the School Database containing active users' assigned posts and teaching and administrative duties/scope. These attributes are used by satellites to construct appropriately grouped channel content, correlated against users and ultimately constrained within the PIP based on the currently authenticated user.
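As a rough illustration of the feed described above, the duty and scope attributes might land in a table along these lines (all names and values are illustrative only, not the actual feed format):

```sql
-- Hypothetical sketch of the duties feed from the School Database.
CREATE TABLE user_duty (
    username  TEXT NOT NULL,                 -- authenticated user
    duty      TEXT NOT NULL,                 -- e.g. 'pgr-supervisor', 'pgr-selector'
    scope     TEXT NOT NULL DEFAULT '',      -- e.g. an institute code; '' if global
    UNIQUE (username, duty, scope)
);
```

A satellite would then correlate its channel content against rows like `('s0000001', 'pgr-selector', 'inst-x')` so the PIP can constrain what each authenticated user sees.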

Users with an administrative role that allows them to see other users' pages (or specific views of those pages) do so by virtue of having the necessary user channels embedded on their own pages.

Adding a satellite and channel(s) to the PIP can be done by anyone making the necessary satellite amendments (described below) and the PIP owner enabling access for that satellite.

Tightly Coupled Satellite

There are three kinds of satellite:

  1. Satellite is a TMMS
  2. Satellite uses PostgreSQL but is not a TMMS
  3. Satellite does not use PostgreSQL or even a database at all

All of these can be configured as a tightly coupled satellite, but in order of increasing complexity. Any of the above could instead be configured as a loosely coupled satellite.

A tightly coupled satellite is connected to the PIP by having shared exchange tables defined in a PostgreSQL database which are each a foreign data wrapper for a corresponding table in the PIP PostgreSQL database. These are used by the satellite to supply data to the PIP or receive data from the PIP in a standardised way.

The satellite PostgreSQL database must be amended to include these FDWs and supporting framework. The satellite must then be adapted to generate the channel content to put into the supply FDW and to handle the channel content taken out of the return FDW. These satellite adaptations to generate and handle channel content would be necessary whatever the format and communication protocol, so there is no overhead to doing this. The PIP provides support to make the addition of the FDW tables to the satellite low overhead (see below).
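A minimal sketch of the satellite-side wiring using the standard postgres_fdw extension might look as follows. The server, mapping, table and column names are all illustrative assumptions, not the actual PIP exchange schema:

```sql
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

-- Point at the PIP database (connection details illustrative).
CREATE SERVER pip_server
    FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'pip.example', dbname 'pip');

CREATE USER MAPPING FOR CURRENT_USER
    SERVER pip_server
    OPTIONS (user 'satellite_x');

-- Supply side: rows written here land directly in the PIP's exchange table.
CREATE FOREIGN TABLE pip_supply (
    channel   TEXT NOT NULL,
    username  TEXT NOT NULL,
    fragment  TEXT NOT NULL
) SERVER pip_server OPTIONS (table_name 'exchange_supply');
```

Once this is in place, supplying a channel is just an `INSERT` into `pip_supply` from whatever query builds the channel content.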

When a callback is made through the return FDW the entire transaction can be atomic, forming a closed loop: starting from the initiation within the PIP, through the return content handling and supply content re-generation within the satellite, and finally back within the PIP to re-generate any web pages, including full roll back in both the satellite and the PIP in case of error. This atomicity can also be two way, so that content supply requires complete loop completion (re-generation of web pages) within the PIP. Using a closed loop can constrain performance, as a result of delays in the PIP or satellite and reduced concurrency due to locking in the shared exchange tables. Either direction can optionally be configured as an open loop (neither end waits for completion of the other) to avoid problems of this nature where necessary.
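One conventional way to realise the open-loop option is to queue pushed content rather than process it inline, so the pushing transaction returns immediately. This is only a sketch of the pattern, not the actual PIP mechanism, and the table and trigger names are hypothetical:

```sql
-- Open loop: the trigger on the (hypothetical) exchange table only records
-- the work; a scheduled job drains the queue later. A closed loop would
-- instead do the processing inside this trigger, within the same transaction.
CREATE TABLE callback_queue (
    id       BIGSERIAL PRIMARY KEY,
    channel  TEXT NOT NULL,
    payload  TEXT NOT NULL,
    queued   TIMESTAMPTZ NOT NULL DEFAULT now()
);

CREATE FUNCTION queue_callback() RETURNS trigger AS $$
BEGIN
    INSERT INTO callback_queue (channel, payload)
        VALUES (NEW.channel, NEW.payload);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- exchange_return is an assumed name for the real table behind the return FDW.
CREATE TRIGGER on_return AFTER INSERT ON exchange_return
    FOR EACH ROW EXECUTE FUNCTION queue_callback();
```

The trade-off is exactly as described above: the queued variant avoids locking and deadlock between the PIP and the satellite, at the cost of the action (and any channel re-generation) completing at some later point.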

Configurations for the three kinds of satellite are set out below.

  1. Satellite is a TMMS:
    Satellite directly includes the PIP factory generator for a “satellite connection” and uses a PIP-provided template to auto-build the necessary FDW schema. Can also directly include the PIP template headers for data formatting to comply with the standard channel content requirements.
  2. Satellite uses PostgreSQL but is not a TMMS:
    Satellite applies the necessary “satellite connection” DDL from the PIP TMMS manually in order to build the necessary FDW schema (or can use IMPORT FOREIGN SCHEMA, although this requires an authenticated connection first, or can just do “CREATE EXTENSION pip”). Can also optionally use the PIP template headers for data formatting, but could do its own thing entirely if necessary (although must still comply with the standard channel content requirements).
  3. Satellite does not use PostgreSQL or even a database at all:
    Satellite must add a standalone TMMS PIP “satellite connection” database which has the necessary FDW schema. The satellite must push data into this intermediate database and/or handle callbacks from it in whatever way is appropriate to that service. Can also optionally use the PIP template headers for data formatting, but could do its own thing entirely if necessary (although must still comply with the standard channel content requirements). Depending on the implementation of the interface between the satellite and the intermediate connection database a closed loop may not be achievable in one or both directions, in which case a loosely coupled satellite is likely to be a simpler approach.
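For the second kind of satellite, the alternatives mentioned above might reduce to something like the following (schema, server and extension names as described earlier; all assumed rather than confirmed):

```sql
-- Pull in the PIP's published exchange tables in one step rather than
-- declaring each foreign table by hand. This requires an authenticated
-- connection to the PIP database to already be configured.
IMPORT FOREIGN SCHEMA pip_exchange
    FROM SERVER pip_server
    INTO public;

-- Or, where the packaged extension is available:
CREATE EXTENSION pip;
```

Either route leaves the satellite with the same set of exchange tables it would otherwise have built from the PIP-provided DDL.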

Loosely Coupled Satellite

This kind of satellite will supply content into the PIP PostgreSQL database shared exchange tables indirectly by using the PIP API, or directly using an appropriately authenticated database connection (although that is not recommended). It is not possible to use callbacks in a loosely coupled satellite, so there will never be return content to handle. A loosely coupled satellite is most likely to be open loop but can be closed loop. A loosely coupled satellite can use the PIP template headers for data formatting, but is more likely to use a different implementation (although it must still comply with the standard channel content requirements).
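For comparison, the direct (not recommended) route would amount to the satellite writing into the exchange tables itself over an authenticated connection; the table and column names here are the same hypothetical ones used in the earlier sketches:

```sql
-- Direct supply from a loosely coupled satellite (illustrative only;
-- the API route is preferred as it decouples the satellite from the schema).
INSERT INTO exchange_supply (channel, username, fragment)
VALUES ('pgr-supervision', 's0000001', '...channel dataset...');
```

The API route carries the same payload but keeps the satellite insulated from any change to the underlying exchange schema.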

Implementation Summary

Functionality is almost entirely within the back-end PostgreSQL database – the shared exchange tables and associated constrained views, triggers, processing and authorisation control. The TMMS model for this database adds the API and also provides the FDW extension/templates for satellites to use. The Python WSGI server under Apache/Cosign hosts the content. Standard related tools such as pgluser and Prometheus Roles and Capabilities, LCFG and spanning maps are used to configure and setup the rest of the service.

The only custom development involved in implementing this service consists of work on the Python WSGI server (by extension of the existing TheonUI server) and on the back-end database structure and function. All other effort is essentially configuration of existing services. The communication protocol between satellites and the PIP is implemented using the standard postgres_fdw extension.

Further design work will be needed on the actual structure and content of a satellite channel.

Staff Personalised Information Portal

These blog pages will track design and development of the Staff Personalised Information Portal, which is a local computing project.

Description

The Staff Personalised Information Portal contains a personalised cut of data that is appropriate to the teaching and administrative duties currently held by the member of staff. For example: a member of staff that is a PGR supervisor will have a page containing a set of live data on those PGR students that they are supervising; a member of staff that is a PGR Selector for an institute will have a page containing a set of live data on the current applications being processed for that institute; a member of staff that is an institute director will have a page containing management reports for things like student numbers and funding information customised to their institute. The intention is that it provides a one-stop shop for access to the data (and links to associated resources) needed to perform those duties (well, that which can be satisfied through our local systems anyway). It is more efficient for staff but also helps us meet various data protection obligations. In addition the content (or a particular view of it) can subsequently be used within a PDR. We also think it would be a useful resource for new staff (or existing staff taking on new roles) when they start.

We produced a rough prototype of what such a portal might look like and examples of what content it might have on it. This was reviewed by a number of administrative staff and was also demonstrated to over ten different academic staff for feedback. There was unanimous agreement that this would be a good thing. A more comprehensive report on the feedback received will be produced as an initial deliverable for this project. This project will define and produce an underlying technical framework, evolved from the prototype and user requirements, onto which multiple sources of data can ultimately be included, as well as a basic information portal for academic staff to cover the core known requirements. An initial deliverable of the project will itemise the specific datasets that will be included as well as their source, their construction and whether any new central data is required. However the actual content on the page is likely to change and grow over time based on what different users in different roles identify that they need (and this would constitute ongoing work beyond this project).

We would like to see a usable prototype in place during Semester 2 of 2018/19 with the final version being available from early in Summer 2019 (so as to cover duty assignment, TSP bid handling and course preparation).


Mock REF Reviewer System

This project is to provide a simple paper referee and review system for the mock REF exercise. The expectation is that College will produce something for the actual REF submission (in 2021). So in this context we want something that can be thrown together fairly quickly. Also the workflow is not entirely clear at the outset, so we also want something that can be quickly altered. So I decided to use a Theon managed service. This means the database schema and corresponding UI can be prototyped and turned into a production system very quickly, but also allows the live system to be easily updated to account for any late design changes. Furthermore we now also get a RESTful API for free by doing this, and if a more custom UI turns out to be required it will be relatively quick to do using the existing API.

Production Hadoop Service

Chris and I have started working on this project. The idea is for both of us to get up to speed with installing and running a Hadoop cluster in the first instance.

To date we have got one node running all services and successfully running a job. This was using ssh for connections. We have started looking at converting this to use Kerberos before trying to build more nodes. Configuration seems to be working but we have an issue using host principals to ssh.

make comments on criteria available

Another (feature creep) request for the new system. It looks like the mechanism is there and this will be easy to bolt on subsequently, in which case it is unlikely to be done within this particular project.

Returning feedback to students wasn’t part of the original project description, but rather a requirement that has arisen in the meantime. But the fact that the reports are now stored is surely a big step in providing that feedback. When we agreed at Teaching Committee that information from the reports should be returned to students as feedback, I pointed out that it would probably not be possible to do it in June 2015, but I hope we will have something in place for June 2016.

Achieving this with the new system is probably just a case of Webmark also producing a reduced version of the form just including the relevant fields students can see, and then a specific index page for access by students that shows them that form.

Comment: That would be perfect! The “relevant fields” are:

  • Individual marking form: Comments on the criteria
  • Agreed mark form: nothing
  • Moderator’s form, if any: Comments on the criteria

I guess that an ideal “specific index page for access by students” would be to include it on student.inf.ed.ac.uk but with a time delay so that it is only accessible once the examiners’ meetings have passed.

Mostly Done

The bulk of this has now been done (the first four goals).

The “Proposal for Submissions” was followed pretty much as-is. The MSc submissions link will now jump to an MSc variant of the UG4 form but using the same underlying mechanism. Both handling flags can be set and materials uploaded at submission time.

The “Proposal for Access to Submissions” was modified slightly. In the end the existing “projsubs/ug4” and “projsubs/msc” folders were retained (rather than new URLs and folders being created) with the modified indexing code. Instead of passing arguments, the indexing behaviour is automatically changed based on whether the current year is the same as the year being processed. If not, the old behaviour is retained; if it is, the new behaviour is followed. This new index format for the current (assessment) year is as described, except that to achieve sorting a variation of the “dotable” cgi script is used and a data file is produced instead (mapped to the cgi script by a rule in .htaccess). Some other minor changes were also made to the existing setup (such as indexing in year descending order). The indexing scripts now generate the liveroot files directly (since they have to change the name and extension) rather than via stdout. They also generate a data file into the upload directory which maps the matriculation number to the submitted folder number – this is used by Webmark (see below).

The “Proposal for Webmark Returns” was also modified slightly. The end result is the same though – a PDF for each filled in form and a data file for each mark is copied into the submission upload directory. Instead of using a cc mail alias and remctl, a feature was added to Webmark to allow per-output subdirectories in the final file output path, components of which could be literal or set by the value of a form field. An additional source was added which is the data file generated by the indexing scripts above. This allows a “dynamic” output directory (which is the student's upload submission directory) to be set against the additional outputs. By this means Webmark can write the files for the indexing scripts directly into their final location (and safely, as it is part of the user submission process, which also ensures that the student's upload directory is in place before a Webmark submission can be made). A consequence of this though is that it is no longer possible to fill in a blank form for the mark returns – a student must be selected from the drop-down; in practice it would be an error condition if not all the students are listed in the drop-down.

The “Proposal for Public Access” has not been done yet – but will be another modification to the indexing scripts. In this context they will simply produce a list of projects for all years (including the year in the index) with just the student name and project title (which is a link to the copied PDF as at the moment). This list will only include students with distinctions though (as previously described).

To clean everything up, the older MSc project submissions were moved from their original “infthesis” upload location (where they were mixed up with PhD and MScRes submissions) into the new “mscprojects” location, and the indexing scripts re-run for the default three years and then individually all the way back to 2003. The indexing scripts were also manually re-run for all the existing UG4 years.