My internship working on Snapshot
I am Anita, this year’s Website Programme summer intern, and for the last ten weeks I have been working on improvements to our site-auditing tool, Snapshot.
About Me
I am an Informatics student going into the final year of my Cognitive Science degree at the University of Edinburgh.
I started working part-time in Website and Communications as the Web Audit Assistant in January 2018. I have since moved to a full-time intern post, in which I have been focusing on Snapshot, a site-mapping and auditing tool for University websites.
About Snapshot
The main purpose of Snapshot is to give site owners and editors a better understanding of their websites. EdWeb users can use Snapshot to build sitemaps, analyse how old web content is and survey what tracking cookies are active on a site.
Get a site health check with Snapshot: read our intro blog post on the tool
Summer 2017 intern Patrick’s recap of developing Snapshot
University websites can be large and complex, and therefore it is important for websites to be audited regularly. It is not unusual for websites to contain outdated content and neglected pages.
It is also important to be certain that our websites comply with data protection and privacy legislation, especially since the General Data Protection Regulation (GDPR) took effect in May 2018.
What I’ve been working on
In the last ten weeks, I have been focusing on Snapshot’s cookie-auditing functionality. Many privacy-invasive cookies are set not by the site itself but by outside websites, through elements like embedded videos, maps and Twitter feeds. These are called third-party cookies.
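To make the first-party/third-party distinction concrete, here is a minimal sketch in Python. The function name `is_third_party` and the example domains are illustrative assumptions, not part of Snapshot, and the domain comparison is deliberately simplified (a real check would also need to account for public suffixes such as `ac.uk`).

```python
from urllib.parse import urlparse


def is_third_party(cookie_domain: str, page_url: str) -> bool:
    """Return True when the cookie's domain does not cover the page's host.

    Hypothetical helper for illustration; not taken from Snapshot.
    """
    host = (urlparse(page_url).hostname or "").lower()
    # Cookie domains are often written with a leading dot (".example.org").
    domain = cookie_domain.lstrip(".").lower()
    # First-party: the page host equals the cookie domain or is a subdomain of it.
    first_party = host == domain or host.endswith("." + domain)
    return not first_party


page = "https://www.ed.ac.uk/studying"
video_cookie = is_third_party(".youtube.com", page)  # set by an embedded video
site_cookie = is_third_party(".ed.ac.uk", page)      # set by the site itself
```

With these inputs, the cookie from the embedded video counts as third-party and the site's own cookie does not.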
Snapshot collects site information using a Python web-crawling framework called Scrapy. However, this technique does not find every cookie that appears on a webpage: Scrapy downloads a page’s HTML without opening it in a browser, and many third-party cookies are set by scripts which only run when the page is actually loaded. To solve this problem, our new cookie collector uses an automated process to open pages in a headless browser called PhantomJS, so that these scripts run and their cookies can be collected.
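The general shape of this approach can be sketched as follows. This is not Snapshot's code: `collect_cookies`, `BrowserDriver` and `FakeDriver` are hypothetical names. The sketch only assumes a driver object with `get(url)` and `get_cookies()` methods, which is the interface Selenium's WebDriver provides. Whether a real driver reports cookies belonging to other domains depends on the browser and driver, so a production collector may need to read the browser's full cookie jar instead.

```python
from typing import Dict, Iterable, List, Protocol


class BrowserDriver(Protocol):
    """The two WebDriver-style methods this sketch relies on."""

    def get(self, url: str) -> None: ...

    def get_cookies(self) -> List[dict]: ...


def collect_cookies(driver: BrowserDriver, urls: Iterable[str]) -> Dict[str, List[dict]]:
    """Open each URL in the browser and record the cookies present afterwards."""
    found: Dict[str, List[dict]] = {}
    for url in urls:
        driver.get(url)  # a real browser loads the page and runs its scripts here
        found[url] = driver.get_cookies()
    return found


class FakeDriver:
    """Hypothetical stand-in so the sketch runs without a real browser."""

    def __init__(self, cookies_by_url: Dict[str, List[dict]]) -> None:
        self._cookies_by_url = cookies_by_url
        self._current = ""

    def get(self, url: str) -> None:
        self._current = url

    def get_cookies(self) -> List[dict]:
        return list(self._cookies_by_url.get(self._current, []))


fake = FakeDriver({
    "https://example.org/a": [{"name": "sid", "domain": "example.org"}],
})
result = collect_cookies(fake, ["https://example.org/a", "https://example.org/b"])
```

In real use, the fake driver would be swapped for a Selenium WebDriver instance controlling a headless browser.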
In addition to all of Snapshot’s old features, users will soon be able to view site cookie reports. These reports highlight cookies which are problematic and also list those which are safe and covered by the University cookie policy. Hopefully, this new feature will help site managers locate and eliminate unapproved cookies on their websites.
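As a rough illustration of what such a report involves, the sketch below splits collected cookies into approved and unapproved groups using an allowlist. The cookie names, the `APPROVED_COOKIES` set and the `build_cookie_report` function are all made up for this example. They do not reflect the actual University cookie policy or Snapshot's implementation.

```python
from typing import Dict, Iterable, List, Set

# Hypothetical allowlist standing in for the cookies covered by a cookie policy.
APPROVED_COOKIES: Set[str] = {"session_id", "cookie_consent"}


def build_cookie_report(cookies: Iterable[dict], approved: Set[str]) -> Dict[str, List[str]]:
    """Group cookie names by whether they appear in the approved set."""
    report: Dict[str, List[str]] = {"approved": [], "unapproved": []}
    for cookie in cookies:
        key = "approved" if cookie["name"] in approved else "unapproved"
        report[key].append(cookie["name"])
    return report


# Made-up cookies, in the list-of-dicts shape a browser driver might return.
collected = [
    {"name": "session_id", "domain": "www.example.org"},
    {"name": "ad_tracker", "domain": ".ads.example.net"},
    {"name": "cookie_consent", "domain": "www.example.org"},
]
cookie_report = build_cookie_report(collected, APPROVED_COOKIES)
```

Here `ad_tracker` ends up in the unapproved group, which is the kind of entry a site manager would want to investigate.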
Snapshot has potential for growth and improvement. Site-crawling and cookie scanning are services that would be beneficial for all University websites, even those that are not part of EdWeb. If a more universal crawler were built for Snapshot, we would gain valuable knowledge about the University web estate, large parts of which are currently uncharted.
What I have learned
This internship has been both challenging and rewarding. I have picked up many new skills, including browser automation with PhantomJS, Selenium and ChromeDriver, and how to use the Flask and Django Python web frameworks.
I’ve also had my first taste of front-end development. Lastly, I learned about and participated in user testing. Overall, I’ve gained valuable experience and confidence in my software development skills.
Related reading
If you would like to read more on the topics discussed here, I recommend these blog posts:
Mattias Appelgren: My internship and the Cookie Audit tool
Colan Mehaffey: The building blocks of our new strategy