25 items found for ""
- Open-source causal mapping functions
Over the last three years we have developed a whole range of algorithms for causal mapping. The algorithms are published open-source in the form of functions for the programming language R. These functions are the heart of the analysis engine in our app CausalMap3. The app itself is free to use for analysing existing causal mapping data files but it is closed source, so the functions are a reproducible window into the algorithms and can be used by anyone who wants to analyse and visualise causal mapping data without using the Causal Map app. When we get a chance, we will release them as a full R package. But for now you can read and download a single R script which contains the functions and all you need to use them. We are still in the process of documenting the functions. To make them easier to use, we have a vignette which demonstrates the use of many of the functions as applied to example causal mapping data files (here is the main example file and here is a special example to demonstrate the use of combined opposites). The vignette is an .Rmd file: if you process it using the package knitr or by pressing Knit in Rstudio, you should get output like this. It contains over 50 different types of map.
- EES 2023: Sharing Our Journey on AI's Application in Qualitative Research
Hi there, Steve and I (Gabriele) are excited to tell you about our latest work on using AI in qualitative research. Back in November 2023, Steve took the stage at the European Evaluation Society. His presentation, titled "What Influences What? Using AI to Turn Stakeholders' Stories Into Causal Maps, Rapidly, Rigorously and at Scale", for us it wasn't just any talk; it was a landmark moment in our work with AI in evaluation. The presentation is accompanied by a report detailing our methodology with a proof-of-concept research study. What we find so exciting is that how the causal mapping approach provides such a simple way to make sense of large amounts of text. Basically, you make a high-level statement explaining any particular aspects you are interested in, press the big green button, (cross your fingers) and watch the maps appear. It’s a very different feel from working with traditional qualitative data analysis. Our approach introduces two innovative uses of AI in evaluation: Automated Qualitative Interviews: conducting extensive, AI-powered online interviews to probe (in this study case) current issues in the USA. Automated Data Analysis: using AI to automatically code and analyse these interviews, creating detailed causal maps that reveal shared perceptions, group-specific views, and evolving trends. It's been an adventure, figuring out how these AI tools can help evaluators get a grasp on the complex systems they work with - projects, programs, or policies. If you want to know more about this, check out our full report here. We'd love to hear your thoughts!
- News: Causal Map 3 is live!
Your causal mapping journey just got an upgrade with Causal Map 3 (CM3)! 🎉 CM3 is the latest version of the Causal Map app, which is now live and available for use! The new version comes with several new features that make it even more user-friendly and efficient than Causal Map 2 🚀. Some of the new features include: 🌓 Improved filtering options: The filters are now split into 3 different sets that are simpler and easier to use 🤩 📈 Uploading and adding data: It’s easier to upload new and additional data, and to tweak existing data. And of course, CM3 is compatible with CM2 files. 👩💻 User interface: Enjoy full-screen capabilities for maps and tables, along with improved dropdowns and map legends.. ✍ Subscriptions cover both platforms: so if you have a subscription to Causal Map 2, you get Causal Map 3 for free! 💰 These new features make CM3 an even more powerful tool 💪 for analysing and visualizing causal claims within narrative data from interviews or reports. We’re adding new features and tweaks all the time, so watch this space. If you’re interested in learning more about Causal Map 3 you can visit our new guide 📚
- Why do causal mapping rather than ordinary QDA?
Question: I've got a load of texts to analyse, can I use Causal Map just for standard Qualitative Data Analysis, like ordinary thematic analysis, to identify important themes? Our answer: That doesn't really work in Causal Map. Sorry! Question: Why not? If you've got a good qualitative text analysis pipeline, why can't you generalise it? Our answer: We put the question the other way round: why do thematic analysis (which is harder) when causal mapping can identify not only some of the important themes but also tell you how they influence one another? ... what you really want to do in the end, especially if you are doing evaluation, is find out what causes what in the eyes of your stakeholders. Identifying static themes can be interesting but often it's the causal information which helps you answer your main research and evaluation questions. Causal mapping is often a great way to cut to the chase. Thematic analysis is more of an art than a science. "What are the main themes here" is a very open-ended question which can (and should) be interpreted in different ways by different analysts ("positionality"). Whereas people (and GPT-4) tend understand the instruction "identify each and every section of text which says that one thing causally influences another" quickly and easily, and they tend to agree on how to apply the rule. The way we do causal mapping means identifying each and every causal connection. There is less room for someone's opinion in selecting what themes are most salient. Surprisingly, we can get good results without even a codebook of suggested themes aka causal factors, let alone bothering to train the AI or give it examples. This means that the steps from your initial research idea all the way up to (but not including) your final analyses can be quite easily automated in a transparent way. You can train an army of analysts to the coding for you manually, or you can press the AI button, or a combination of both, and either way you will get pretty similar results. There isn't so much room for the opinion of your analysts, whether human or robot, at any point in the pipeline. Of course causal mapping is not free of bias (or positionality) due to human analysts' or AI-analysts' "world-views". It just leaves less room for those biases than more general thematic coding. And of course, there are times when general thematic analysis or some other kind of QDA is really what you need. We're just saying that causal mapping might fit your need more often than you think. Finally, you can in fact use Causal Map to identify some kinds of theme, and in particular when we are doing manual coding we sometimes code where causal factors are mentioned but without any specific cause or effect, like if someone just says "unemployment has gone up". But this isn't a main focus of our work. And just to emphasise, causal mapping isn't proprietary. It's been around for 50 years. You can even do it (manually) in Excel.
- An AI to help you build a theory of change
We just built a toy web app called theorymaker3 - a very experimental service provided for free by us. Chat with an AI to build a graphical theory of change. The app contains a few example prompts to get started, which you can edit, or you can start with your own. It's a proof of concept and to start a conversation. At the moment you cannot save your work, so be sure to take screenshots of your diagrams and/or download them using the `Save SVG` button. If you want to recreate the same diagrams at a later time, you will have to copy and paste the prompts you used and keep them somewhere safe. You can export your diagram in SVG format, a lossless vector format which you can paste into most modern word processing and graphics programs including PowerPoint. However the export isn't that easy to work with, it doesn't resolve to the sort of connectors and boxes you might be used to manipulating in PowerPoint. Do NOT use theorymaker for any information which is sensitive or which identifies individuals. This app uses the GPT 4 API which is quite expensive for us, so feel free to mess around for quarter of an hour or so but let us know if you want to use it for a bigger project. How to make a diagram When you are happy with your first prompt, press Go and wait while the AI draws your diagram. You can continue to build up or correct your diagram like this by typing into the chat window. You can also tweak the formatting etc. You may find it better to first create just one part of your diagram and then add more parts in later steps. You could build up quite detailed diagrams like this - but as you can't yet save your work, you might not want to bother at this point. Tweaking the results by hand If you happen to know Graphviz syntax, you can edit the text in the result window to adjust your diagram directly. But the AI does not (yet) know what you have done so if you continue the chat, these changes will be lost. Theorymaker? Steve says: I built a site called theorymaker.info about ten years ago as a hobby project. It was my very first web app and I've since lost access to the code 😳. It's a bit flaky but people have been using it for all kinds of different projects around the world. You have to use a special syntax which is designed for the kind of text-heavy theories of change which are common in the kind of projects I was mostly involved in. Theorymaker 3 is a reimagining of the original Theorymaker.
- Large language models = intersubjectivity?
When old-school quant people criticise qualitative methods as being merely subjective rather than objective, our reply (up to 2023) has always been something like this: We don’t aim for objectivity, which is an illusion; we aim for intersubjective verifiability. In fact, scientific objectivity is only (sometimes) possible because of intersubjectivity - for example that people agree how to use measuring instruments. At its most basic, intersubjective verifiability simply means that we aim to be able to make statements which most relevant stakeholders would agree with; if necessary we also provide the steps necessary to verify them. Sometimes this means breaking down more ambitious claims into smaller steps. A good example is QDA (qualitative data analysis). No-one would claim that a qualitative report written as a summary of say a set of interviews is reproducible or objective or even intersubjectively valid; it’s always going to be, at the very least, flavoured by the analyst’s own positionality. We can make the process a bit more intersubjectively verifiable by breaking down the task and providing more detailed instructions of how to do the analysis, (though this may also limit the creativity needed to arrive at fundamentally new insights). We might not be able to aim for reproducibility (another analyst given the same instructions would arrive at the same results) but we can aim for Nachvollziehbarkeit or retrace-ability: in retrospect, someone else would at least be able to retrace the steps and agree that the results are one plausible answer to the question. Now, it's 2023 and we’re all drowning under dozens of recent posts and articles about using Large Language Models (LLMs) as a way of summarising or answering questions about one or more texts — documents, interviews etc. Whatever the pros and cons, these possibilities are revolutionising social science because they can suddenly level up qualitative research by making it almost reproducible. This isn’t just any reproducibility in the sense that somebody’s computer program will always produce the same results on the same text (but who knows how the code works, will it even work next year)? General LLMs are different because the prompt is the program: ideally, a good prompt is just the same literal natural-language instructions you would write for your postgrad assistant. It shouldn’t matter that no-one really knows how ChatGPT works any more than you care if you know how your postgrad assistant’s brain works. Ideally it should be irrelevant which assistant (postgrad, OpenAI, Bard, etc) follows the instructions, and we might expect to see the results of different platforms gradually converge. (This is assuming that we set the “temperature” of the prompt to 0 to discourage creativity — which has the downside of reducing the possibility of producing fundamentally new insights). Wait, you say, but these LLMs are created by skimreading the whole internet and basically answer the question “what would the whole of the internet say about this” (with a bit of politeness added by various sets of LLM guardrails)? And the internet is, as we know, is a reflection of our own flawed species, and a very unequal reflection at that. You’ll probably find more text on the internet about the Barbie movie than about climate disasters in the last few weeks. When applying an LLM to analysing a report or interview, you’re basically entrusting your work to the internet’s worldview. Yet the results are mostly amazing! (I’m not talking here about asking the LLM to give factually accurate references; I’m talking about the much more interesting task of asking it to follow a set of instructions.) I think our humanity shines back at us. Wittgenstein might have agreed that LLMs can, like us, do that amazing thing: following a rule, and even the spirit of the rule, in the sort of way that most of us would agree is right (just 100s of times faster and without getting tired). It’s as if the internet as hoovered up by LLMs is an embodiment of intersubjectivity. And perhaps it is, both in the epistemological sense (how do we know what is true?) but also in the social or even metaphysical sense according to Husserl and co: a collective creation, a life-world. Chatbots to an extent share in our language games. Applying ChatGPT to a research question, when done well, can be like saying: let’s do this in a shared way which everyone can agree on. 🔥 Yes, there are hundreds of caveats. Who puts what on the internet is just a reflection of our species: mostly colonial, patriarchal, greedy and exploitative. But what are you going to do except engage with these developments? Which other species are you rooting for, seriously?
- StorySurvey 4: Automated interviews at scale
StorySurvey, our app for automatically conducting qualitative interviews, has received a major update! The app now features an improved chat bot that will enable the automated surveys to be carried out more robustly and at a larger scale. Keep reading to see what it looks like and for an invitation to explore StorySurvey's possibilities for your organisation! Key changes in StorySurvey4 and how you can join in We have now separated the set-up interface and the interviewing bot, with the new chat bot being designed to be faster, more robust and scalable for larger surveys. You can play around with an example survey with this link, which is an automated interview relating to the respondent's experience of a recent work project and is designed to focus on interpersonal factors as explanations for what went well and what didn't. Some StorySurveys have a specifically causal focus, asking respondents for the reason behind each reason they give to a certain question, in order to build a fuller picture of their experience. Others, like this one, don't. When looking at the interface for setting up your interview, you will recognise a similar look and feel to StorySurvey3. You are still able to start with an existing prompt, edit it to suit your needs and get a shareable link to send to respondents. You can also still review the survey data in Causal Map, where Natural Language Processing (NLP) is used to synthesise the key words and ideas said by respondents. Join us in testing the latest version The latest iteration of StorySurvey was recently used in a live case study with a healthcare provider. The response rate was around 80% which was very encouraging, and nearly everyone completed quite a long interview, producing around a page or two of interview transcripts each. We also got quite a lot of positive feedback on the interview experience. Keep an eye out for a future blog post with further details. We are testing each layer of development of StorySurvey4 and are looking to partner with a couple of additional organisations who would like to try it out in a real case. The survey can have a specific causal angle or take a more open format. If you are interested in testing an automated interview with between 20 and 200 respondents, drop us a message at firstname.lastname@example.org!
- AI in evaluation: actually show your working!
There's been a lot of talk about using AI and in particular large language models in evaluation and specifically in coding and processing texts. Here at Causal Map we've been working very hard on just that (and on automating interviewing too, but that's another story). And we see fantastic potential. Our Causal Map app now has a beta version of that big "auto code" button we'd always dreamed of (and feared). However, I wanted to draw attention to a really big distinction which I think is important. There's a continuous spectrum between on at the one end transparent, reproducible approaches founded in social science and the other end of the spectrum black box approaches where responsibility is shifted from the evaluator to the AI. There may be use cases for the latter, "black box" kind of approach. Maybe one day doctors will abrogate all responsibility to medical AI. Maybe one day evaluators will abrogate all responsibility to evaluation AI. But here I'd like to set out reasons why right now we should prefer transparency. Black box coding is possible today in its rudiments and it's going to get a lot more accessible and powerful quite quickly. At its most extreme, you simply say to the AI 'Here's a load of documentation from a project. You tell me if the project is efficient, effective, sustainable, draw some conclusions and make recommendations according to criteria C, D and E. This is an extreme case, but the basic idea is submitting a long text and asking for a black box judgement about what themes are present and even what conclusions can be drawn. To be sure, it's possible to say to a model 'Yes and also show your working or print out some quotes or examples to backup your findings.' But it's very important to realise that this "show your working" question is spurious because AI at the current state of development has no more insight into its inner workings than does a human being has into his or hers, and probably less so. So while it can (and will) competently bullshit about what steps somebody might have taken to reach that conclusion it doesn't mean it's actually the steps that it did take. So basically, you have no way of knowing how the AI came up with a particular finding or conclusion using this approach and it's a massive abrogation of responsibility for an evaluator to sign off this kind of output without further analysis. Now at the other, "transparent" end of the spectrum, what we recommend is using AI merely to follow established procedures of manual coding and do it faster, more reliably and more reproducibly. That's a big win. The old school way: First of all, highlighting individual sections of text according to explicit rules set by the evaluator and then aggregating and combining those codings, again according to explicit rules. As an aside, we believe that even before we get into the AI possibilities, causal mapping in particular is a really good way to summarise documents and in particular sets of documents. Obviously, there is more to documents than simply the causal claims made within them, but if you had to pick a type of content an evaluator might want to extract from a document, causal claims are pretty central and the procedure for identifying, extracting and aggregating those claims are an order of magnitude more straightforward than any other kind of useful text analysis (unless you count word clouds...). In particular, causal mapping is particularly good at making summaries from sets of documents, such as semi structured interviews with comparable respondents, rather than only the special case of making one summary of just one document. It is already possible to say to an AI, 'please read this long document and draw a causal map saying what do you think are the main causal drivers and outcomes and intermediate links and just print out the specification of a diagram'. And the job's done. That's exactly the sort of approach we are warning against because you have no way of knowing how the model has reached that conclusion. When we use AI to help code a set of documents we tell it to explicitly identify causal claims and provide the relevant quote for each individual claim, following rules we give it and in each case, it's possible to look at the actual quote it identifies and check if it really is appropriate evidence for the causal claim. Just as with human coding, in the sort of way causal mapping has been carried out for 50 years or more It's been a lot of work to develop the right set of prompts (and they are still a work in progress) embedded in our app, but the prompts we use in any given case are pretty simple and transparent: around half a page of standard prompts which are pretty much the same across use cases and another half a page or so of prompts which are specific to the use case; these themselves are 90% derived in an automated way. Nevertheless, the evaluator bears 100% responsibility for overseeing these prompts, which are plain English. They can be followed by a team of postgrads or by the AI: there is no difference in principle. There is no black box and no magic, and any human can follow every step of the argumentation. At present, the AI is much faster and more reliable and transparent than a human coder; and a human coder is much better at seeing larger connections, reading between the lines and linking up the parts of a larger story. The most interesting part of causal coding with AI is to add this human inspiration back into the AI prompt in a transparent way. In order to then aggregate, synthesise and simplify the causal maps which result, we can use the many, more or less standard, causal mapping procedures which have been developed over the years and in particular our open source set of causal mapping algorithms. So an interested outsider can follow the chain of argument right away from the original text to the final conclusion. Responsibility is the issue here. If you feed data or documents into an AI and let it come up with its own conclusions, they aren't your conclusions and as an evaluator you can't sign off on them. Maybe this will change in the future as we learn to find our way around in this new world. But right now, you need to show your working. Of course the big worry in all of this is that higher-level, black-box approaches are much quicker and easier to apply, putting together black-box approaches to get from documents to findings to (evaluative) judgements in just a few clicks, given some generic definitions of evaluation criteria. Black-box approaches could be the beginning of the end of evaluation as we know it, but they'd be really tempting for a commissioner: for a purely document-based review, who'd bother with the time and expense to commission an evaluator if you can get your report written in a few minutes? With black-box approaches, people's fears about bias are really justified.
- Using QuIP and the Causal Map app in PhD research: understanding social protection
This blog was originally posted on the Bath Social Development & Research site. We are grateful for this guest post from Michelle James. Michelle is a PhD researcher specialising in refugee and asylum seeker welfare and wellbeing in the UK. She also works as an independent research consultant in the development sector. Her particular interests include partnership models of development, community empowerment and mobilisation, and behaviour change. I am currently in the final year of a PhD in Social Policy at the University of Bath looking at how different forms of social protection impact the wellbeing of UK asylum seekers. As an experienced QuIP analyst, already impressed by the benefits of the research tool, I knew that I wanted to incorporate QuIP data collection and analysis into my PhD methodology. Collecting data from a hard to reach, linguistically diverse and potentially vulnerable population during the covid pandemic was, however, far from straightforward. My anti-oppressive research approach led me to adopt research tools that I hoped could empower participants to enact agency within the project, minimise the extractive nature of data collection, while still generating the academically rigorous data I required for my PhD. I also needed to gather data that helped me understand what impact government, community and peer-led social protection was having on UK asylum seekers without asking leading questions to minimise response bias. As such, I chose to utilise two main data collection tools, supplemented by a range of additional data to triangulate my findings. Firstly, I trained asylum seeker/refugee peer interviewers to independently undertake QuIP interviews with those in their social network. The interviews asked participants what changes they had experienced in their lives since being dispersed to their current location by the government and who/what they attributed these changes to. I hoped that the peer interviewers would benefit personally from involvement in the project through gaining work experience that they could cite when applying for future employment. In addition, evidence suggests that asylum seekers fear speaking to British institutional researchers so I also hoped that participants may have more confidence to take part and provide detailed answers if speaking with a peer in their own language. Secondly, I undertook a photovoice project with ten asylum seekers/refugees who each captured a series of images to depict what made their lives easier/happy or harder/unhappy. The images were shared and discussed in depth by all participants at a follow up workshop and the photographers collaborated with me to put on an exhibition of their work in Summer 2022. Check out the online version of the exhibition here. The QuIP interview data, photovoice narrative statements and workshop transcript were uploaded to Causal Map, alongside survey data, peer researcher and photographer feedback interview data, and exhibition feedback statements. The Causal Map app allowed these different types of data to be effectively analysed in one place, uncovering themes and causal patterns through an inductive process. Although the data were consolidated into one Causal Map project file, the software made it possible to separate and interrogate different categories of data independently when creating visualisations to understand which data were useful in answering different types of questions. This resulted in the creation of a sub-group of key informants (QuIP interviewees and photovoice participants), whose data were used to look at the breadth and depth of significant of different types of social protection on wellbeing, with the remaining data incorporated only when applicable to specific research questions. Once the Causal Map visualisations were created, I incorporated them into my thesis alongside pertinent quotes and photovoice images to offer a more rounded qualitative and pictorial description of the wellbeing changes expressed by research participants and the impact of different forms of social protection. For example, the causal map visualisation to the right shows how government-based social protection was impacting the lives of asylum seekers at the time of data collection. Causal chain diagram (Right): Trace path from formal social protection, key informants, 50 most frequent links where source count > 3 In my thesis, this was accompanied by a range of photographs and quotes to drill down on the causal links expressed in the diagram. One example photograph and quote can be seen to the left and below. “It is not easy for an asylum seeker to stay in a hotel room for months. You spend many hours alone, it is very isolating. Hotels are often a long way from support services and you have no money for bus tickets to reach them.” Photo title: Modern Jail Image and text: W, asylum seeker, Afghanistan, 2022 Finally, the causal links and themes unearthed through inductive analysis using the Causal Map were considered alongside relevant theory and literature in my thesis leading to a number of policy and research recommendation for the improvement of social protection provision for UK asylum seekers. Overall, I found the Causal Map app to be particularly helpful in combining a diverse range of data sources into one place, and the simple interface allowed for effective induction analysis by breaking up a substantial dataset into small manageable pieces of text that could be considered independently. Following helpful training by the Causal Map and Bath SDR team, I was able to interrogate the data to create helpful visualisations to answer each of my main research questions. The quantitative nature of these causal maps is helpful for top-level policy discussion, while the retention of, and ease of access to, the qualitative data that underpin the diagrams is important for research transparency and to support a more qualitative and theoretical discussion of the main causal links found in the dataset.
- StorySurvey3: evaluation interviews, automated!
You start your evaluation full of enthusiasm and do your first face-to-face interviews. You learn a lot: how is the atmosphere in the office? Are staff reluctant to let you talk to project users? But it's a big project, there are potentially hundreds of people you could talk to. There are a set of questions you need to cover. Perhaps you want to trace out people's view of the causal links between an intervention and some outcomes. Wouldn't it be great if you could clone yourself and send yourself into everyone's inbox? With the latest iteration of our survey tool StorySurvey, you can do just that right now: Design an AI-driven interactive interview in any world language, share the link with your respondents and download the transcripts! Your respondents need an internet-connected device. It's free for reasonable use (up to 200 respondents). What is StorySurvey like for respondents? Here's an example survey. Try it! Notice that the link has "?survey=msc-conference" at the end of it, to direct you to a specific survey. That is the kind of link you send to a respondent. How do you use StorySurvey to design a survey? If you want to experiment with StorySurvey and design your own surveys, go straight to https://causalmap.shinyapps.io/StorySurvey3/. The way it works is you start from a script aka prompt. This is an instruction to the AI interviewer to tell it how to do the interview. It can be as simple or as complicated as you like, but basically it's just plain English (or French or any other world language). We have prepared a few example prompts. At Causal Map we are most interested in interviews which encourage respondents to tell causal stories, even printing out the individual causal links. But you can design any kind of interview. Your job is simply to copy and adapt any of the pre-prepared prompts, or create your own from scratch. Then test your survey by sending the prompt to the automatic interviewer who will then start interviewing you. Keep editing your prompt and testing again until you are satisfied. Then, you can get a link to the finished survey and send it to others to answer. Your survey link can be public or private. If it is public, the name you give your survey will be part of the link to the survey so it might be possible to guess it, but if you choose a private link, your survey URL will be very hard to guess. Your prompt is always public, so that others can adapt it to build ever better evaluation interviews. This way it would be great to build a library of evaluation interview prompts. How to get your results? You can view and test out StorySurvey, and get interviewed, without even logging in. However to save and share a survey, you need to log in with a Google account or email address. Right now, when you download your transcripts you can analyse them any way you want, but there is no "standard" way to do it. But soon, you will be able to visualise your results in Causal Map. Contact us if you need help with creating or launching a survey or visualising the results: email@example.com. Technical details StorySurvey uses GPT-4. If you've experimented with ChatGPT before, you might have noticed that the "temperature" is high, which is good for writing essays. At StorySurvey the 'temperature' is set to zero which means the conversation is more deterministic: good for social research. Chat-based interviews like this are no substitute for face-to-face key informant interviews - but they can be used to reach a much larger number of additional respondents. Obviously you can't reproduce a whole interview approach in a simple prompt! But it's interesting to try, and a simple prompt can still generate a useful survey. This site is free and experimental and we at Causal Map Ltd make no promises about what will happen to it in the future. If you want help with a survey, contact us. Privacy StorySurvey data is stored in a SQL database at Amazon RDS, which uses industry-standard security. Data is transferred using https (http over TLS). The text of the interview passes through the OpenAI servers. Data submitted through the OpenAI API is no longer used for service improvements (including model training) unless the organization opts in, which we have not. OpenAI deletes user data after 30 days. We recommend that respondents do not submit data which might identify themselves or others, and respondents have to accept this condition before proceeding with the survey. Enjoy!
- How hard is evaluation actually?
🏭 When machines replaced much manual labour, white-collar workers thought "I'm ok, my job is much harder to mechanise". 🖥 And then when computers came for clerical jobs, university-educated white-collar workers thought "I'm ok, my job is much harder to automate. I'm not just applying a template, my job is just harder, it requires actual intelligence". 🤖 Then came Large Language Models like GPT, and suddenly it turns out that large parts of many tasks which have needed university-level education are actually just the application of a template. Or applying a template to choose between templates, and then combining the results of the application of templates. And the same probably goes for large parts of entertainment and the arts. This is what Stephen Wolfram argues in this really interesting post, and I think he's probably right. ChatGPT has shaken up our hierarchy of what tasks count as hard. If you don't agree as an evaluator that a lot of your job is just the application of high-level and lower-level templates, you might at least agree that this is true of writing those accursed proposals we sweat over so much. Maybe the stuff we thought of as hard in evaluation, like selecting and applying a "method", suddenly looks easier. Whereas the stuff which has been neglected, like establishing a rapport, knowing which question to ask and when, or reading an undercurrent, does not look very much easier. Most importantly, whatever happens, it's still someone's job to say "I declare that this is the right kind of method to apply in this situation and I believe it has been applied in the right way and I vouch for these findings and these evaluative conclusions ... and just as I'd have had previously to vouch for the work done by an intern, I'm now going to vouch for the work done by some algorithms, and the selection of those algorithms". What do you think? How hard is evaluation really?
- ChatGPT is changing how we do evaluation. The view from Causal Map.
Causal mapping – the process of identifying and synthesising causal claims within documents – is about to become much more accessible to evaluators. At Causal Map Ltd, we use causal mapping to solve evaluation problems, for example to create “empirical theories of change” or to trace evidence of the impact of inputs on outcomes. The first part of causal mapping has involved human analysts doing “causal QDA”: reading interviews and reports in depth and highlighting sections where causal claims are made. This can be a rewarding but very time-consuming process. Natural Language Processing (NLP) models like ChatGPT can now do causal mapping pretty well, causally coding documents in seconds rather than days. And they are going to get much better in the coming months. 👄More voices: It is now possible to identify causal claims within dozens of documents or hundreds of interviews or thousands of questionnaire answers. We can involve far more stakeholders in key evaluation questions about what impacts what; and it is possible to work in several natural languages simultaneously. 🔁More reproducibility: To be clear: humans are still the best at causal coding, in particular at picking up on nuance and half-completed thoughts in texts. But NLP is good at reliably recognising explicit information in a way which is less subject to interpretation. 🍒More bites at the cherry: With NLP we can also do things that were practically impossible before, like saying “that’s great but let’s now recode the entire dataset using a different codebook, say from a gender perspective”. ❓Solving more evaluation questions: we hope to be able to more systematically compare causal datasets across time and between subgroups (region, gender, etc). 🤯New challenges We’re hard at work addressing the new challenges which NLP is bringing to causal coding: - Processing many large documents simultaneously. - Using existing pre-coded datasets to train models which are specialised for causal coding and/or for specific subject areas. - Developing a common grammar for causal coding, building on our existing work. For example, what to do when some claims are about an increase in income and others are about a decrease in income? - Optimising the prompts we give to the NLP models (this is not only a technical challenge but also has a substantive element: we have to explain to the machine in ordinary language what we actually mean by a causal claim or a causal link). - Grouping, labelling and aggregating similar causal factors. - After examining a coded dataset and further developing the "causal codebook", telling the NLP to completely recode the same dataset with the new codebook – something which has been prohibitively time-consuming up to now. - Developing human/NLP workflows. For example, a human codes a sample of the text and tells the NLP to “continue like this”. - Monitoring bias against specific groups and guarding against possible blind spots in identifying causal information. What we already offer at Causal Map We have developed a grammar and vocabulary for causal mapping, and a set of open-source algorithms for processing and visualising causal map databases. We help evaluators do things like this: - Trace the evidence for different causal pathways from one or more interventions to one or more outcomes. How many individual sources mentioned one or more of these paths? - Consolidate causal factors into a causal hierarchy - Examine and display differences between causal maps for different groups or different time points We see a lot of potential (as well as risks and pitfalls) in leveraging this functionality to help evaluators get more out of data which is currently more difficult to analyse - and we’d interested in sharing ideas and collaborating with others interested in exploring where we go next. ---  Actually we use the related model GPT3 via its API, as ChatGPT does not yet have its own API.