At this point, I am assuming that my project will be some narrative exploration of my own digitised self, utilizing generative models trained on my own personal data. I don’t yet know what form the narrative will take; it may be that I can present my conversation with my digital doppelganger as a narrative in itself, or perhaps use it to create a sort of pseudo-memoir of my digital identity. Maybe a reflective nonfiction piece on my own experience with the interaction would be more interesting, or a series of narrative data visualisations attempting to capture the doppelganger’s “otherness.” I’m not sure it would be wise to set my mind on a particular kind of output at this stage, before I’ve even worked with the data.
In any case, it is pretty clear to me that a mixed-methods approach will be vital to this project. Whether the end goal is discursive design, data visualization, discourse analysis, a case study, or some other broad framing of the project, the individual steps will require both computational NLP work and qualitative, participatory design-as-research.
There are a number of methods and skills I do anticipate needing. First, I will have to do a large amount of personal data mining, cleaning, and standardization. From text files I have saved and stored personally to data exports from Meta, Google, etc., there should be quite a lot of data available to me; the difficulty here will be putting that data in a usable form and stripping out sensitive information. While I could leave in sensitive information and just make sure never to expose my model or its outputs to anyone else without carefully filtering them for leaks, I would be much more comfortable sanitizing out as much potentially vulnerable data as possible first.
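As a rough sketch of what that sanitizing pass might look like, assuming the exports have already been flattened to plain text: the two patterns and the `redact` helper below are hypothetical placeholders, and simple regular expressions like these would miss a great deal (names, addresses, account numbers), so this shows the shape of the step rather than a complete filter.

```python
import re

# Hypothetical, deliberately minimal patterns. A real pass over personal
# exports would need many more categories than these two.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    # North American style phone numbers only, e.g. 555-123-4567 or (555) 123 4567.
    "PHONE": re.compile(r"(?<!\d)(?:\(\d{3}\)|\d{3})[\s.-]?\d{3}[\s.-]?\d{4}(?!\d)"),
}


def redact(text: str) -> str:
    """Replace every match of each pattern with a bracketed placeholder."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text


if __name__ == "__main__":
    print(redact("Email me at jane.doe@example.com or call 555-123-4567."))
```

Replacing matches with labelled placeholders, rather than deleting them, keeps the sentence structure intact for training while still removing the sensitive value.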
Then I will need to determine an appropriate model and, depending on the scope of my data and my desired level of complexity, either train it from scratch or fine-tune a pretrained version. I have some limited experience fine-tuning models with PyTorch, fastai, and Hugging Face that should be useful here.
At this point, my plan becomes less clear. Once I can actually communicate with the model, I’ll have to see what happens and use whichever methods seem most relevant and interesting. It might be worth doing more rounds of quantitative research (maybe the model’s word frequencies or sentiment ratings will differ enough from my own to be an interesting line of inquiry), but my primary methods will be qualitative. I could perform a very intentional and prescriptive case study on the model, or simply interact with it and see what happens, or even start to treat it as a collaborator rather than an object of study. It really depends on what the model itself seems suited for and which questions my interactions with it raise.
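The word-frequency comparison mentioned above could start from something as simple as the following sketch. It assumes two plain-text samples (my own writing and the model's output) and a crude tokenizer; `relative_freqs` and `biggest_gaps` are hypothetical helper names, and a real analysis would want proper tokenization and much larger samples.

```python
import re
from collections import Counter


def relative_freqs(text: str) -> dict:
    """Map each lowercase word to its share of all words in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    total = len(words)
    if total == 0:
        return {}
    return {word: count / total for word, count in Counter(words).items()}


def biggest_gaps(mine: str, models: str, n: int = 5) -> list:
    """Return the n words whose relative frequency differs most.

    Positive gaps mean the model uses the word more than I do;
    negative gaps mean I use it more than the model does.
    """
    a, b = relative_freqs(mine), relative_freqs(models)
    gaps = {w: b.get(w, 0.0) - a.get(w, 0.0) for w in set(a) | set(b)}
    return sorted(gaps.items(), key=lambda kv: abs(kv[1]), reverse=True)[:n]


if __name__ == "__main__":
    for word, gap in biggest_gaps("i love tea", "i love love coffee"):
        print(f"{word}: {gap:+.3f}")
```

Using relative rather than raw counts keeps the comparison meaningful when the two samples differ in length.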


I’m curious, have you had specific past interactions with generative models or other technologies that have particularly highlighted for you the complex relationship between the self and the digital doppelgänger? If you end up going with a reflective nonfiction piece or memoir, I wonder if it would be interesting to include some reflections on the “before” — i.e., how your experiences up to this point have influenced your thinking about the digitized self — which could then offer interesting continuities with or divergences from your thinking after undertaking the research (e.g., do you have a very different experience of the digitized self when the model is trained on a general body of data vs. your own personal data?).