So we are at the end of week 6, almost halfway through the internship, and time seems to have flown by. These past weeks I have been put to work on data processing and on bulk editing Wikidata with all sorts of information extracted from the Survey of Scottish Witchcraft database.
What we haven’t really explored before on the website is the relationships between the accused witches and the people involved in the cases, as well as the links between the witch trials. What a lot of people don’t know about the Scottish witch hunts is the way accused witches would be encouraged to denounce other people as witches, sometimes in a desperate plea to lessen their sentences.
A wealth of information about mentions and cases is in the Survey. So, to get this information into Wikidata, to make it open-source and free for everyone to use, I had to get to grips with bulk processing. There are many different bits of software to aid in Wikidata editing, and one I’ve come to quite enjoy is OpenRefine.
Google Sheets and OpenRefine
I’ve found OpenRefine to be a great tool for data wrangling, but when I started using it I was slightly clueless as it’s not that well documented. It’s been amazingly helpful for reducing time in data processing. There are 3213 cases in the Survey database, one for each accused witch, so manually editing and adding each case would probably take the whole 12 weeks of this internship, maybe even longer! With OpenRefine, I can do batches of hundreds of cases in minutes; a huge time-saver.
I started by using a Google Sheets spreadsheet to lay out the data I wanted to put onto Wikidata, as well as the data I needed to link to other pages already on Wikidata. All of the accused witches have already been added to Wikidata, each with a property, Survey of Scottish Witchcraft – Accused witch ID, whose value is a unique reference from the Survey. As many of the accused witches have the same names (5 Margaret Youngs!), this ID keeps us right when distinguishing these witches from each other.
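Because each ID is unique, it works as a lookup key even when names collide. Here is a rough sketch of how such a lookup could be written as a SPARQL query for the Wikidata Query Service. The function name `build_lookup_query`, the property number `P0000` and the value `EXAMPLE-ID` are all placeholders I made up for illustration, not the real property number or a real Survey reference.

```python
def build_lookup_query(property_id: str, value: str) -> str:
    """Build a SPARQL query that finds items whose property equals the given value.

    property_id is a Wikidata property number such as "P0000" (a placeholder here).
    """
    # Escape backslashes and double quotes so the value is a valid SPARQL string literal.
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    # The wdt: prefix is predefined on the Wikidata Query Service.
    return f'SELECT ?item WHERE {{ ?item wdt:{property_id} "{escaped}" . }}'


# Placeholder property and ID, for illustration only.
print(build_lookup_query("P0000", "EXAMPLE-ID"))
```

The sketch only builds the query text; running it would mean pasting it into the query service with the real property number.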
We used a Google Sheets plugin from the Wikipedia and Wikidata Tools, called WIKIDATALOOKUP, which takes a property and a value to find the matching items. We used this plug-in to find all the Q numbers for the witches whose cases we are creating. After I created a column for these Q numbers, I was ready to import a batch spreadsheet (200 cases) into OpenRefine. It’s important to split your data into batches to (1) reduce the time OpenRefine takes to execute and (2) let fewer errors pass through, if there are any.
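I did the batching by hand in the spreadsheet, but the idea can be sketched in a few lines of Python. The function name `split_into_batches` and the stand-in list of rows are my own illustration; only the batch size of 200 and the total of 3213 cases come from the project.

```python
def split_into_batches(rows, batch_size=200):
    """Split a list of rows into consecutive batches of at most batch_size rows."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [rows[i:i + batch_size] for i in range(0, len(rows), batch_size)]


# Stand-in for the 3213 case rows; real rows would hold the spreadsheet columns.
cases = list(range(3213))
batches = split_into_batches(cases)
print(len(batches), len(batches[-1]))  # 17 batches, the last one holding 13 rows
```

Smaller batches also mean that if something does go wrong, only that one batch needs checking and fixing.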
OpenRefine has a feature you can use on columns, called reconciliation. Reconciling a column with Wikibase data means matching the cells with items on Wikidata. If no matches are found, you have the choice to create a new item whose label is the text in the cell. First, I reconciled my column of unique identifiers for the accused witches’ Wikidata pages. This matched the whole column, as expected, with the accused witches’ items. After this, I reconciled the Len column (short for “label in English”) and, as none of the cases are on Wikidata yet, I had the option to create a new item for each case.
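To make the idea concrete, here is a heavily simplified sketch of the match-or-create decision. This is not how OpenRefine implements reconciliation (it talks to a reconciliation service and scores candidates); it only does exact matching against a small lookup table. The function name `reconcile` and the IDs such as `ID-1` and `Q-EXAMPLE-1` are invented for the example.

```python
def reconcile(cells, known_items):
    """Match each cell against known items by exact text.

    Returns (matched, to_create): matched maps a cell to its item ID,
    and to_create lists the cells that would need a new item.
    """
    matched = {}
    to_create = []
    for cell in cells:
        key = cell.strip()  # ignore stray whitespace around the cell text
        if key in known_items:
            matched[cell] = known_items[key]
        else:
            to_create.append(cell)
    return matched, to_create


# Invented identifiers, for illustration only.
known = {"ID-1": "Q-EXAMPLE-1", "ID-2": "Q-EXAMPLE-2"}
matched, to_create = reconcile(["ID-1", "ID-3"], known)
print(matched)    # cells that already have an item
print(to_create)  # cells that would become new items
```

In my case the identifier column matched completely, while every cell in the Len column ended up in the "create new item" pile.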
After this, I had to consider how I wanted to structure the new case items. To do this I used the “Edit Wikibase schema” option in the Wikidata Extension drop-down. Using the columns in the schema, I laid out how I wanted the items to look. This is shown below:
Once I was happy with the schema, I checked the preview to see what the first couple of rows of data would look like when imported onto Wikidata. After double and triple checking (always important!), I clicked the “Upload edits to Wikibase” option in the Wikidata extension drop-down. And ta-da! I had created 200 cases of witchcraft investigation at the click of a button. Like magic 😉 Here’s an example of one of the cases that was created in this batch.
Recovering Histories Edit-a-thon
At the end of this week I had the pleasure of attending a Wikipedia edit-a-thon arranged by students at the University with the purpose of filling knowledge gaps on Wikipedia and of rectifying inequalities online. The focus of the afternoon was on amplifying women’s and LGBTQ+ voices, as well as making known Edinburgh’s history of slavery on Wikipedia. Did you know that fewer than 1 in 5 biographical articles on Wikipedia are about women? Definitely a much-needed and worthwhile cause.
Our project on Scottish witches is closely related to this cause, as a big factor in the persecution of witches in 16th-17th century Scotland was misogyny. The event gave us a whirlwind introduction to editing Wikipedia, and afterwards we could choose which articles we wanted to edit. I chose to create a Wikipedia article on one of the most prolific Alloa witches, Margaret Duchill (click the link to check it out!).
I had a great time learning how to edit Wikipedia, as I hadn’t done anything like that before then (except perhaps some Wiki vandalism back in the day). I hadn’t known how easy it was to actually create an article, as long as you reference. As of the weekend after the event, my article hasn’t been taken down by the Wikipedia mods, so I’m very happy about that. A successful couple of weeks I’d say!