Lewis Forbes – LaTeX and Accessibility

Reading Time: 5 minutes

Programme of Study and Year: Informatics (MInf), going into final year.

Intern Position: Digital Learning Intern


As a former Computer Science and Maths student (now just a Computer Science student), I have had my share of grief from LaTeX during my time at university. Writing my 30+ page dissertation in it definitely increased my confidence, but to be honest I still don’t really understand how it works and frequently rely on Stack Overflow.

As such, it was saddening but somewhat validating to learn that the problem of accessibility in LaTeX is one which has been plaguing the community for years – it’s maintained by a team of volunteers who have struggled to implement features which are becoming standard in electronic documents. In this post I’ll share my attempts at creating the most accessible LaTeX documents possible and conclude with steps you can take to achieve this. The speed at which LaTeX seems to advance (and break) led to a lot of trial and error in this process, and it’s possible that things which worked for me might not work for you.

To determine what makes documents accessible I used the accessibility evaluation software Ally, which gives documents an accessibility percentage and suggests where improvements can be made. Ally gave the initial PDF I provided a score of 5%, but guided me to create a document with a score of 96%. It did not comment on maths, however, which is not accessible by default in LaTeX, as explained in this report by Massie and Sarantsev.

Ally’s initial rating and comments for an unchanged LaTeX document.

Based on Ally’s guidance and the information in the report mentioned, the inaccessible aspects of default LaTeX documents are their lack of tagging and headers, lack of alt text, lack of metadata, and maths which cannot be meaningfully read by screen readers.

LaTeX Accessibility Summary

If your document contains a lot of maths, I recommend creating an HTML5 file using Pandoc. See the checklist in the ‘Maths’ section below for information on this.

If your document has no maths, follow these steps:

  1. Add alt text as explained in the first two bullet points in the ‘Alt Text’ section below.
  2. Add metadata as explained in the ‘Metadata’ section below.
  3. Add tags/headings using the PDFix tool described in the ‘PDF Tagging & Headings’ section below.

As mentioned, following these steps produced a LaTeX project which Ally scored as 96% accessible, based on its PDF after tagging.


I will now go into more detail about each of the different inaccessible areas mentioned.

Alt Text

Alternative text is often included for those using screen readers, but thanks to the curb cut effect it has uses for many people – both visually impaired and sighted. Different sources recommend different ways of including it.

I successfully added alt text using the following methods:

  • Using the optional argument of \caption, as in \caption[alt text]{caption text}, as recommended by ChatGPT. This should be used when \includegraphics{} appears inside a figure environment. (Strictly speaking, this optional argument is the short caption LaTeX uses in the list of figures.)
  • Using \pdftooltip{} from the pdfcomment package, which adds tooltips with user-specified text to the document. Ally recognized these as alt text. This should be used when \includegraphics{} is used with no \caption{}. For example, I used it inside \subfloat{}. This approach was advised in this StackExchange thread.
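As a rough illustration of the two methods above, a minimal document might look like the sketch below. It is not a definitive recipe: the image names example-image, example-image-a and example-image-b are placeholders that assume the mwe package is installed, the subfig package is assumed as the source of \subfloat{}, and the alt text strings are made up for the example.

```latex
\documentclass{article}
\usepackage{graphicx}   % provides \includegraphics
\usepackage{subfig}     % provides \subfloat (assumed here)
\usepackage{pdfcomment} % provides \pdftooltip

\begin{document}

% Method 1: image inside a figure environment with a caption.
% The optional argument of \caption carries the alt text.
\begin{figure}
  \centering
  \includegraphics[width=0.5\textwidth]{example-image}
  \caption[Alt text describing the image]{Caption text shown under the image.}
\end{figure}

% Method 2: images with no \caption of their own, here inside \subfloat.
% \pdftooltip{<item>}{<tooltip text>} attaches the text as a tooltip.
\begin{figure}
  \centering
  \subfloat{\pdftooltip{\includegraphics[width=0.4\textwidth]{example-image-a}}{Alt text for the first image}}
  \quad
  \subfloat{\pdftooltip{\includegraphics[width=0.4\textwidth]{example-image-b}}{Alt text for the second image}}
\end{figure}

\end{document}
```

Whether a checker picks these up as alt text may depend on the package versions in use, so it is worth re-running the accessibility check after compiling.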

The following methods were recommended by various sources, but did not allow me to successfully add alt text to images:

Metadata

Ally highlights the need to include a PDF’s title and language, which allow screen readers to introduce the document. Both can be set with the following lines in the document preamble:

\usepackage{hyperref}
\hypersetup{pdftitle={Document Name}, pdflang={en-GB}}

PDF Tagging & Headings

There does not appear to be a way of reliably generating tagged PDFs using LaTeX. The two potential solutions I came across are both unsuitable:

  • As outlined in the tagpdf documentation, the tagpdf package is not meant for normal document production. As such, the syntax required to use it is complicated and the package likely contains bugs.
  • As outlined on the Accessibility package GitHub page, the accessibility package is also not suitable for production and is no longer maintained. Although it does produce tagged PDFs according to Ally, it sometimes leads to documents not compiling, and sometimes causes unexpected behaviour. As an example:

Once a PDF has been created, tags can be added to it using a few different services, namely Adobe Acrobat Pro DC, Microsoft Word, and PDFix. Since Acrobat Pro isn’t free to use and Word often seems to ruin the formatting, I found PDFix’s ‘Make PDF Accessible’ tool to be the best solution. This tool also allows metadata to be changed. The company appears reputable, with the PDFix privacy policy stating that they delete all provided files after 30 days and pass data to third parties “only within the extent necessary to meet its obligations”.

The only problem I found with this service was its inability to render a .pdf vector image. Using this format for images is unusual, and the problem was easily fixed by converting the image to a .png file.

Maths

Making maths accessible in LaTeX does appear to be possible but is a little complex. Most sources seem to recommend converting LaTeX documents to HTML5 documents via a semi-automated process using various tools. The aforementioned Massie and Sarantsev report provides a good overview of the topic.

I found Pandoc to be the easiest tool for this conversion. With the --mathjax option, the HTML it produces renders maths using MathJax – a JavaScript engine which creates “beautiful and accessible math in all browsers”. HTML documents are accessible by default since they are tagged, and contain conventions for setting alt text and metadata. See this MathJax documentation page for information on the screen readers that can read the maths it displays.

Once Pandoc is installed, LaTeX documents can be converted on Windows as follows:

  1. Open command prompt (press Win+R, type cmd, press enter).
  2. Copy the location of the folder containing the .tex file you wish to convert. The .bib file should be in the same directory.
  3. In command prompt, enter:
    • cd "the folder location you copied"
  4. Enter the following command, replacing myTex.tex and myBib.bib with your filenames.
    • pandoc myTex.tex -f latex -t html -s -o output.html --bibliography myBib.bib --citeproc --mathjax
  5. Move the new file output.html up one folder level. For example, from C:/folder1/folder2/folder3/output.html to C:/folder1/folder2/output.html. This is so images’ paths are correct.
  6. Open output.html.
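Before converting a full project, it may help to try the steps above on a small test file. The sketch below is one possible myTex.tex, not taken from a real project: the title, author, and the citation key exampleKey are hypothetical, and exampleKey would need a matching entry in myBib.bib.

```latex
\documentclass{article}
\usepackage{amsmath}

\title{Pandoc Test}
\author{A. N. Author}

\begin{document}
\maketitle

\section{A short test}

% Inline and displayed maths, to check that MathJax renders both.
Inline maths such as $e^{i\pi} + 1 = 0$ and a displayed equation:
\begin{equation}
  \int_0^1 x^2 \, dx = \frac{1}{3}
\end{equation}

% A citation, to check that the bibliography is picked up.
A citation to check the bibliography \cite{exampleKey}.

\bibliographystyle{plain}
\bibliography{myBib}

\end{document}
```

If the maths and the reference list both appear in output.html for this file, the same command should be a reasonable starting point for a larger document.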

You might find Pandoc is less forgiving with syntax errors than your usual compiler when it comes to your bibliography. A verifier such as BibTeX Tidy can be used to identify and correct errors.

The Future

Reducing the number of steps authors have to take to make their LaTeX documents accessible is an area of active development, as outlined in The LaTeX Project’s accessibility publications. The most recent update I’ve seen comes from this LaTeX news article introducing the final pre-release of the June 2023 version of LaTeX.

This pre-release produced viable, tagged documents for simple files, but could not format some complex files, as shown in the image below. For documents with tables it produced well-compiled PDFs, but they were untagged. This means the pre-release is currently no better than the tagging methods mentioned above.

A screenshot of an unreadable document, containing text which overflows off the page and displayed commands.
A poorly formatted document produced by the final June pre-release of LaTeX.
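For readers who want to experiment with the newer tagging support themselves, the sketch below shows how it might be switched on in a simple file. This is an assumption based on my understanding of the LaTeX Project’s documentation rather than the exact setup used for the tests above: it assumes a LaTeX release from June 2023 or later, and the testphase value in particular is experimental and may change between releases.

```latex
% \DocumentMetadata has to come before \documentclass.
% The key values below are assumptions for illustration.
\DocumentMetadata{
  lang = en-GB,
  testphase = phase-III
}
\documentclass{article}

\title{Document Name}

\begin{document}
\maketitle

\section{Introduction}

A short paragraph of plain text, which is the kind of simple
content the experimental tagging code is most likely to handle.

\end{document}
```

The resulting PDF can then be run through an accessibility checker to see whether it was actually tagged.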
