My internship working on Snapshot

I am Anita, this year’s Website Programme summer intern, and for the last ten weeks I have been working on improvements to our site-auditing tool, Snapshot.

About Me

I am an Informatics student going into the final year of my Cognitive Science degree at the University of Edinburgh.

I started working part-time in Website and Communications as the Web Audit Assistant in January 2018. Now, I’ve transitioned to a full-time intern post, during which I have been focusing on Snapshot, a site-mapping and auditing tool for University websites.

About Snapshot

The main purpose of Snapshot is to give site owners and editors a better understanding of their websites. EdWeb users can use Snapshot to build sitemaps, analyse how old web content is and survey what tracking cookies are active on a site.

Get a site health check with Snapshot: read our intro blog post on the tool

Summer 2017 intern Patrick’s recap of developing Snapshot

University websites can be large and complex, and it is not unusual for them to contain outdated content and neglected pages. This is why regular audits are important.

It is also important to be certain that our websites comply with data protection and privacy legislation, especially since the General Data Protection Regulation (GDPR) took effect in May 2018.

What I’ve been working on

In the last ten weeks, I have been focusing on Snapshot’s cookie-auditing functionality. Many privacy-invasive cookies are third-party cookies: they are set not by the site being visited but by external websites, through elements like embedded videos, maps and Twitter feeds.
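To make the first-party versus third-party distinction concrete, here is a minimal sketch in Python. It is not Snapshot’s actual code: the function name `is_third_party` is made up for illustration, and the check is a simplification that only compares the cookie’s domain with the page’s host name.

```python
from urllib.parse import urlsplit


def is_third_party(cookie_domain: str, page_url: str) -> bool:
    """Return True if the cookie's domain does not cover the page's host.

    Hypothetical helper for illustration only; a production check would
    also consult the Public Suffix List.
    """
    host = (urlsplit(page_url).hostname or "").lower()
    domain = cookie_domain.lstrip(".").lower()
    # A cookie for "ed.ac.uk" covers "www.ed.ac.uk"; one for "youtube.com" does not.
    first_party = host == domain or host.endswith("." + domain)
    return not first_party
```

For example, a cookie with the domain `.youtube.com` found on a page under `www.ed.ac.uk` would count as third-party, while one with the domain `.ed.ac.uk` would not.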

Snapshot collects site information using Scrapy, a Python web-crawling framework. However, Scrapy downloads a page’s HTML without rendering it in a browser, so it misses the many third-party cookies that are set by scripts which only run when a page is loaded. To solve this problem, our new cookie collector automatically opens pages in PhantomJS, a headless browser, and records the cookies that are set once each page has loaded.
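The idea of loading pages in a real browser and then reading its cookies can be sketched roughly as follows. This is an illustration under assumptions, not the collector’s real code: the function name `collect_cookies` is hypothetical, and the driver is assumed to be any object offering the `get` and `get_cookies` methods of Selenium’s WebDriver interface.

```python
def collect_cookies(driver, urls):
    """Load each URL in a browser and record the cookies visible afterwards.

    `driver` is assumed to expose Selenium WebDriver's `get(url)` and
    `get_cookies()` methods. The function name is hypothetical.
    """
    cookies_by_url = {}
    for url in urls:
        driver.get(url)  # the browser loads the page and runs its scripts
        # get_cookies() returns a list of dicts with keys such as "name" and "domain"
        cookies_by_url[url] = driver.get_cookies()
    return cookies_by_url
```

With Selenium installed, a driver could be created with something like `webdriver.Chrome()` and passed in. One caveat: `get_cookies()` only returns cookies visible to the page currently loaded, and cookies are not cleared between pages here, so a full audit of cookies stored for other domains would need a browser-specific route.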

In addition to all of Snapshot’s old features, users will soon be able to view site cookie reports. These reports highlight cookies which are problematic and also list those which are safe and covered by the University cookie policy. Hopefully, this new feature will help site managers locate and eliminate unapproved cookies on their websites.
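A report like this boils down to sorting the cookies found on a site into two groups. The sketch below shows one simple way that could work; the function name `build_cookie_report` and the cookie names are placeholders invented for the example and do not reflect the University cookie policy or Snapshot’s real implementation.

```python
def build_cookie_report(cookies, approved_names):
    """Split cookies into those on an approved list and those that are not."""
    report = {"approved": [], "unapproved": []}
    for cookie in cookies:
        key = "approved" if cookie["name"] in approved_names else "unapproved"
        report[key].append(cookie)
    return report


# Hypothetical data: these names are placeholders, not the University's policy.
approved = {"session_id", "cookie_consent"}
found = [
    {"name": "session_id", "domain": "www.example.ac.uk"},
    {"name": "ad_tracker", "domain": ".ads.example.com"},
]
report = build_cookie_report(found, approved)
print([c["name"] for c in report["unapproved"]])  # prints ['ad_tracker']
```

Matching on the cookie name alone keeps the example short; a real report would probably also take the cookie’s domain into account.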

Snapshot has potential for growth and improvement. Site-crawling and cookie scanning are services that would be beneficial for all University websites, even those that are not part of EdWeb. If a more universal crawler were built for Snapshot, we would gain valuable knowledge about the University web estate, large parts of which are currently uncharted.

What I have learned

This internship has been both challenging and rewarding. I have picked up many new skills, including browser automation with PhantomJS, Selenium and ChromeDriver, and working with the Flask and Django Python frameworks.

I’ve also had my first taste of front-end development. Lastly, I learned about and participated in user testing. Overall, I’ve gained valuable experience and confidence in my software development skills.

Related reading

If you would like to read more on the topics discussed here, I recommend these blog posts:

Mattias Appelgren: My internship and the Cookie Audit tool

Colan Mehaffey: The building blocks of our new strategy

