The second week of my internship is nearly over, and although it has been less hectic than the first, it has still been very busy. I started working with the witchcraft spreadsheets by trying to locate the witches, and then met with the creators of the database to ensure historical accuracy. By the end of the week I had digested the information from the many meetings and my research, and come up with a plan of action for this project.

Starting to find locations

At the start of the week I began to work on the spreadsheet containing information about the location of all of the witches. The location of each accused Scottish witch has already been noted down in the original database. However, these locations are not matched up to coordinates for a physical place. I decided the best first task was to link each witch’s location to the coordinate point of the town or village, so that these points can then be plotted on a map.

How to find locations on Wikidata

The accused witches’ residences have been matched to physical place names and coordinates using Wikidata identifiers. Every item on Wikidata has its own identifier, known as a Q number, which distinguishes it from every other data entry. For example, the identifier for a person having Scottish nationality is Q181634. There is a Q number for every location that has been added to Wikidata, and each of these also has a coordinate point which can be used to identify it as a physical location.
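As a rough illustration of how a Q number leads to a coordinate point outside of the spreadsheet, here is a minimal Python sketch. It is not part of my actual workflow. It assumes the coordinates are read from the Wikidata "coordinate location" property (P625) through a SPARQL query, and that the query service returns them as text in the form `Point(longitude latitude)`. The function names are made up for this example, and the Q number in the usage lines is only a sample value.

```python
import re

# A Q number is the letter Q followed by a positive integer, e.g. Q181634.
QID_PATTERN = re.compile(r"Q[1-9]\d*")

# Matches text such as "Point(-3.19 55.95)" (longitude first, then latitude).
WKT_POINT_PATTERN = re.compile(
    r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)", re.IGNORECASE
)


def build_coordinate_query(qids):
    """Return a SPARQL query asking for the coordinate location (P625) of each Q number."""
    bad = [q for q in qids if not QID_PATTERN.fullmatch(q)]
    if bad:
        raise ValueError(f"Not valid Q numbers: {bad}")
    values = " ".join(f"wd:{q}" for q in qids)
    return (
        "SELECT ?item ?coord WHERE { "
        f"VALUES ?item {{ {values} }} "
        "?item wdt:P625 ?coord . }"
    )


def parse_wkt_point(wkt):
    """Convert 'Point(lon lat)' text into a (latitude, longitude) tuple."""
    match = WKT_POINT_PATTERN.fullmatch(wkt.strip())
    if match is None:
        raise ValueError(f"Not a point: {wkt!r}")
    lon, lat = float(match.group(1)), float(match.group(2))
    return lat, lon


# Sample usage: the Q number here is only an example input.
query = build_coordinate_query(["Q23436"])
latitude, longitude = parse_wkt_point("Point(-3.19 55.95)")
```

The query string would still need to be sent to the Wikidata query service, which is left out here to keep the sketch short. Swapping the order to latitude then longitude is deliberate, since most mapping tools expect it that way round.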

Google Sheets has a plugin that can find the unique Q number for each data item. If a Q number exists, the plugin’s =WIKIDATAQID function loads it into the spreadsheet. Once a location has a Q number, it can be associated on Wikidata with the specific witch who resided there. The plugin can be very slow at times but does eventually work. Once a Q number has been found, it needs to be saved as a plain value by copying the cell and pasting with ‘Paste values only’; if it is kept as the original formula, the spreadsheet will likely crash.

Figure 1: Screenshot of the witches’ location Google Sheet using the =WIKIDATAQID plugin. A Q number of -1 means there is no entry on Wikidata for that location. Source data from Scottish Witchcraft Dataset.

Figure 2: Screenshot of the Wikidata plugin =WIKIDATAFACTS which displays the first line of the Wikidata entry. Source data from Scottish Witchcraft Dataset.

The first line of each Wikidata entry can also be checked with =WIKIDATAFACTS to confirm that the Q number is correct for each location. Once every witch is matched with their location on Wikidata, these coordinate points can be projected onto a map. However, the major issue is that many of these locations have yet to be added to Wikidata as data items and therefore lack a Q number. So once I find out which locations have a Q number, I will need to track down the remaining places and create new Wikidata entries for them.

After starting this matching process, it is clear that a lot of investigation will be needed to find the places that have not been added to Wikidata. Currently 497 out of 821 places have yet to be added, so it is going to take a very long time to find these places and add them to Wikidata. But do not fear, I have another 11 weeks of this internship to solve this problem!
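To keep track of how much is left to do, the matched and unmatched places can be separated once the Q numbers have been pasted as values. The sketch below is only an illustration in Python, not something I ran on the real spreadsheet. It assumes the sheet has been exported as pairs of place name and Q number, with -1 marking a place that has no Wikidata entry, as in Figure 1. The function names and the sample rows are invented for the example.

```python
def split_by_match(rows):
    """Split (place, q_number) pairs into matched and unmatched places.

    A Q number of -1 (or an empty cell) is treated as "no Wikidata entry".
    """
    matched = {}
    unmatched = []
    for place, q_number in rows:
        q_number = str(q_number).strip()
        if q_number in ("-1", ""):
            unmatched.append(place)
        else:
            matched[place] = q_number
    return matched, unmatched


def share_missing(missing, total):
    """Return the percentage of places still missing, to one decimal place."""
    return round(100 * missing / total, 1)


# Invented sample rows, standing in for the exported spreadsheet.
sample_rows = [
    ("Example Town", "Q23436"),
    ("Example Farm", "-1"),
    ("Example Village", -1),
]
matched, unmatched = split_by_match(sample_rows)

# With the real figures from this week: 497 of 821 places are missing.
print(share_missing(497, 821))  # 60.5
```

So roughly three in five places still need to be found and added, which gives a simple number to watch shrink over the coming weeks.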

Women in Data Science conference

Aside from my main project, I had the chance to go to the Women in Data Science conference at the University of Edinburgh on Tuesday 11/06 to hear about the gender imbalance in Data Science. Coming from a Geology background with a 50:50 gender split, I tend to forget about the general gender imbalance in STEM subjects, so it was interesting to hear about these issues and what is being done to fix them. There was a range of speakers from Information Services and the Data-Driven Innovation Program, who each spoke about what their departments have done, or want to do, to tackle this issue. At the end of the conference the audience was asked several questions about why the gender imbalance has occurred and what can be done to fix it. These questions got the room thinking about the issues too, and hopefully people’s ideas for changing this imbalance will be used in the future.

Even more meetings

This week has been relatively light on meetings compared to last week, but meeting with the creators of the Witchcraft Dataset on Wednesday 12/06 was probably the most important meeting so far. In general the creators have similar thoughts to mine about finding the residence locations for the witches. I am just going to need to be careful that the locations are accurate.

After these meetings I now have some good websites, listed below, which can be used to help find the residence locations for the witches. These websites should also be very useful for locating any obscure places in Scotland which have a low internet presence.

Useful websites

GB 1900 – Uses 1900 Ordnance Survey maps to locate different places in the UK.

Scotland’s Places – This website can be used to search for information about different regions and towns in Scotland.

National Library of Scotland Maps – The online archive contains many useful maps and documents, with maps dating from the 1500s, and many of them are georeferenced.

Gazetteer of Scotland – The Gazetteer contains a record of every town and village in Scotland.

So at the end of my second week I think I am now ready to tackle the first task of finding the residence location of these witches and adding these to Wikidata. Aside from this challenge, I have produced a plan for the weeks ahead of me with my ideas for visualising the data. I have some big plans for the data so I am excited to see how the next few weeks unfold.