Podcast

Jana Lasser: Advancing Reproducibility and Reuse

In this episode of the Politics of Openness Podcast, Katja Mayer interviews Jana Lasser, Professor for Data Analysis at the IDea-Lab at the University of Graz.

Prof. Dr. Jana Lasser has a PhD in physics and her research focuses on emergent phenomena in complex social systems, employing machine learning, data science, and computational modelling. Her work spans topics such as public health measures during the COVID-19 pandemic, the spread of misinformation on social media, and the societal impacts of recommendation algorithms. Jana is also a strong advocate for scientific integrity and open science practices, emphasising the importance of transparency and reproducibility in research. She shares her experiences and motivations for embracing open science, particularly open data practices, which stem from her early encounters with the challenges of accessing scientific publications.

Jana discusses her journey from physics to computational social science and the epistemic and pragmatic reasons behind her commitment to open science. She elaborates on the concept of reproduction packages, which include comprehensive documentation and code to ensure research is reproducible and accessible. She provides an example of a recent study published in Nature Human Behaviour, highlighting the importance of thorough documentation for future use and collaboration. Jana also addresses the ethical challenges in open data, especially in sensitive research areas like hate speech. She emphasises the need for responsible data management and the role of ethics review in ensuring privacy and ethical considerations. Additionally, Jana discusses the potential of large language models (LLMs) for simulating human behaviour and the importance of openness in both models and training data, not only to enhance scientific transparency and reproducibility, but also to comply with standards of scientific integrity as well as ethical and legal frameworks.

Links: 

Personal Website Jana Lasser

Twitter

IDEA Lab Uni Graz

Network against Abuse of Power in Science

Works mentioned: 

  • Lasser, J., Herderich, A., Garland, J., Aroyehun, S. T., Garcia, D., & Galesic, M. (2023). Collective moderation of hate, toxicity, and extremity in online discussions. arXiv preprint arXiv:2303.00357. https://arxiv.org/abs/2303.00357
  • Lasser, J., Ahne, V., Heiler, G., Klimek, P., Metzler, H., Reisch, T., Sprenger, M., Thurner, S. and Sorger, J. (2020). Complexity, transparency and time pressure: practical insights into science communication in times of crisis. JCOM 19(05), N01. https://doi.org/10.22323/2.19050801

Furthermore: 

PhDComics Open Access explained https://www.youtube.com/watch?v=L5rVH1KGBCY 

Transcript of this episode:

Katja Mayer
Today, we’re joined by Dr. Jana Lasser, currently Professor for Computational Social Sciences and Humanities at RWTH Aachen and Associate Faculty at the Complexity Science Hub Vienna. Welcome, Jana! Starting from May 1st, you will be Professor for Data Analysis at the newly founded interdisciplinary research center IDea-Lab at the University of Graz. Congratulations on the new job.

With a PhD in physics, your work dives deep into understanding emergent phenomena in complex social systems, for example, through machine learning, data science, and computational modeling. Your research spans from investigating public health measures during the COVID pandemic, the dynamics of misinformation spread on social media, to the societal impacts of recommendation algorithms. Beyond your academic pursuits, you are an advocate for scientific integrity and open science practices. Furthermore, you are a board member of the network against power abuse in science, where you are actively working towards systemic change in the scientific landscape.

https://www.netzwerk-mawi.de/en

So welcome again, Jana. Thank you for joining me today. The introduction to you was very short, so maybe you would like to give us a bit more insights into your work and your relation to open science, particularly open data practices. And maybe, you also want to share with us what open science means for you.

Motivations for Open Science

Jana Lasser

Oh, yeah. Thank you, Katja, for inviting me and for giving me the opportunity to tell you a bit about my research and also my relationship to open science. So, I first came in contact with open science when I was still an undergrad, and somebody—I don’t even remember who—told me about scientific publishing and how crazy it is that we have to pay to access our own articles. I didn’t know anything about scientific careers or how science worked. I just found it very crazy that we would write something as academics and then have to pay to get it back. It’s even more crazy that the taxpayers who pay us to do the work couldn’t even access the work in the end.

PhDComics: Open Access explained https://www.youtube.com/watch?v=L5rVH1KGBCY

That kind of incited my interest, and I joined various meetups. That was still when I was in Göttingen, where I also did my PhD back then in physics. Following from that, I explored what I call the open science landscape more and more, leading me to topics of reproducibility. Not only making our results transparent and reproducible, but also the process that led us to them. Being a quantitative empirical researcher, that for me involves data—making data accessible, that involves methods—making code accessible. More recently, it also involves documentation, because even if we publish all the code and all the data, it might be in a form or a format that is just not understandable to anybody. So, that is what I’m thinking most about these days: how can we ensure that it’s not only accessible and reproducible in principle, but also in practice?

Science in public service has to be transparent and accessible

Katja Mayer
How did this motivation to enhance reproducibility in your own research come about? And how did this move towards more open science, and especially towards more reproducibility, influence your own approaches in computational social science, the field you moved to from physics after some time?

Jana Lasser
Yes, exactly. So, I moved to computational social science. I’m now applying all the analytical and modeling skills that I learned as a physicist to try to understand societies. My primary motivation, I think, was kind of ideology driven. I am still convinced that as scientists, we serve the public, and to serve the public, we have to be transparent and make our results accessible. I think this is also reflected in the issues we have, particularly in Austria, where trust in science is eroding. Given that background, it’s even more important that we lay open our approaches, how we come to conclusions, and are very transparent about how our findings come about. But then that was quickly also joined by purely pragmatic experience.

Organising reproducibility

So, I am a coder, and I was told very early in my studies that we have to document our code very thoroughly. Today, we might know what we were thinking with the code we are writing, but half a year from now, only God will know if we don’t document. The same applies to all aspects of research and doing science. I’ve seen time and time again that I am thankful to my past self if I document things thoroughly and create what I now call reproduction packages of my research. This involves putting all the primary data in one place, ensuring all the code works with that primary data, and making sure everything reproduces accurately.

This thorough documentation forces me to provide descriptions that enable myself to work with the data in the future, such as when reviews come or when I have to onboard a new master’s student who wants to do a spin-off project based on my research. Time and time again, it saves me a lot of time and headaches to keep working and building on my own research, as well as on other people’s research.

Katja Mayer
Could you give us maybe a concrete example of how such a reproduction package has worked out? Maybe a case, something that kind of illustrates it even more?

Jana Lasser
Sure. I think a good example is a study that we published last September in Nature Human Behaviour. It was about society’s conceptualization of what it means to be honest. It was a very data-driven study: we collected millions of tweets by U.S. politicians on Twitter. Then we developed a new measurement approach to measure two distinct conceptualizations of honesty: one is called belief-speaking, the other truth-seeking. Whoever’s interested in the scientific details can read the study.

Lasser, J., Aroyehun, S.T., Carrella, F. et al. From alternative conceptions of honesty to alternative facts in communications by US politicians. Nat Hum Behav 7, 2140–2151 (2023). https://doi.org/10.1038/s41562-023-01691-w  + repo with the reproduction package: https://github.com/JanaLasser/new-ontology-of-truth

We also had some involved statistics to measure our effects. The reproduction package involved giving access to the primary data to the extent possible, because we cannot publish raw tweets. However, we did publish tweet IDs and all the derivative statistics that we used for every tweet. We also provided the data collection code, the measurement instruments that we developed, all the analysis code, and all the code necessary to produce every figure in the paper, along with a very exhaustive README. If you go through it, you can really reproduce the complete article, every single figure.
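To make concrete what sharing tweet IDs plus derivative statistics can look like, here is a minimal, hypothetical sketch. The IDs, column names, and scores below are invented for illustration and are not taken from the actual package, which lives in the repository linked above:

```python
import csv
import io

# A reproduction package cannot ship raw tweet text (platform terms),
# so it ships tweet IDs plus the derived statistics used in the analysis.
# Researchers can later "rehydrate" the text via the platform API if needed.

# Hypothetical excerpt of a published derivatives file (all values invented).
derived_csv = io.StringIO(
    "tweet_id,belief_speaking,truth_seeking\n"
    "1001,0.82,0.10\n"
    "1002,0.15,0.77\n"
)

rows = list(csv.DictReader(derived_csv))

# Any figure or statistic in the paper can then be reproduced from the
# derived scores alone, e.g. the mean belief-speaking score of the sample:
mean_belief = sum(float(r["belief_speaking"]) for r in rows) / len(rows)
print(f"{mean_belief:.3f}")  # prints 0.485
```

The design point is that the derived scores, not the raw text, are what the analysis code consumes, so the package stays both reproducible and compliant with the platform's redistribution rules.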

This approach has already helped me a lot in the half year since we published the study. I’ve been using it in lectures to teach. I onboarded a new master’s student who took only a few days to get into the code because it was already there and very well documented. I will probably onboard several more master’s students on the project soon. For me, it’s also nice to work with it now that I’ve put in all the work to prepare it in that way.

Katja Mayer
So it’s really good for the sustainability of your own research practices that you can build up a corpus of knowledge that is reusable for yourself, but also for your peers then afterwards, right?

Jana Lasser
Yeah.

Careful openness: Research integrity and Ethics

Katja Mayer
Something that you also mentioned is that your work often includes very sensitive and critical topics. I can imagine the discussion on honesty or the investigation of honesty in social media behavior will reveal that there is also a lot of hate speech and other things. I think your experiences from the COVID pandemic have especially sharpened your approaches towards critical and sensitive issues in data collection and opening up data. Is there any particular experience you would like to share, especially at this intersection of research integrity, ethics, and openness?

Jana Lasser
Yeah, it’s a very good point. For every research project, it’s a new discussion and a new negotiation, balancing aspects of openness with the privacy of the data subjects and our obligation to inform society. Thankfully, for the honesty study I mentioned, it was fairly easy because we were investigating politicians. They are of public interest, so it’s kind of expected that what they say in public will also be discussed in public.

Preprint: Lasser, J., Herderich, A., Garland, J., Aroyehun, S. T., Garcia, D., & Galesic, M. (2023). Collective moderation of hate, toxicity, and extremity in online discussions. arXiv preprint arXiv:2303.00357. https://arxiv.org/abs/2303.00357

A completely different case is a study that we are currently doing about hate speech and people who try to intervene against it. Here, we are very careful with publishing our primary data because it could lead to the identification of the people involved in these efforts, which could have detrimental effects for them. Ethics is a big aspect here, and we might not be able to publish our primary data openly. However, we will still make it available for other researchers if they sign a data protection agreement.

During COVID, we had discussions not just about publishing data, but also about publishing results and in what way. We were concerned whether our results would freak people out too much, to put it bluntly. We decided internally to publish our findings as soon as they were quality-checked and we were sure enough about them. In the end, we cannot decide what society is ready to see and what not; this is up to society. People are perfectly capable of making sense of information and making their own decisions given all the information. So, we decided to publish everything as soon as we could.

Lasser, J., Ahne, V., Heiler, G., Klimek, P., Metzler, H., Reisch, T., Sprenger, M., Thurner, S. and Sorger, J. (2020). Complexity, transparency and time pressure: practical insights into science communication in times of crisis. JCOM 19(05), N01. https://doi.org/10.22323/2.19050801

It’s an ongoing discussion, and I think there is no rule that fits all application cases.

Planning responsible data (re-)use, burden or opportunity?

Katja Mayer
Do you have any experiences from that time you were just telling us about, of people reusing your data whom you had not thought about, or of people using your data in the service of specific political views or ideologies that you could not have foreseen and that were maybe troubling to you? Did that happen to you?

Jana Lasser
Not really, thankfully. So far, I have not been in a position where somebody used my research for something that I wouldn’t condone. But that could still happen. I guess the best thing we can do is to think about possible cases beforehand and make sure that we have it all covered. It is also becoming more usual to go through ethics review with the type of research we do. This wasn’t the case only two or three years ago, but these days, whenever we work with data that has something to do with humans, it’s standard to go through ethics review. Many people perceive this as a hassle and an administrative burden, but I actually like this chance to sit down and think about these aspects of my research. I think it’s a really important step, and I try to take it as seriously as I can in the limited amount of time that we usually have for these things.

Katja Mayer
Yeah, that’s for sure. The review boards also have very limited time to make decisions on those really big and very important questions. One thing I would like to stay with in this regard is the question of stewardship of your own data. What you were telling us is that you put a lot of effort into documenting the data to keep it reusable. But what about data reuse that somehow troubles you? Have you ever thought about being such a steward of your own data that you still feel responsible after it’s out in the open, and that you might follow up with people reusing your data in ways you thought would not fit the desired outcomes?

Jana Lasser
I mean, in the hypothetical case that happens, I can definitely imagine doing that. I would need to know about it in the first place, but hopefully they cite my data publication, because that’s how I get my data out there these days; then I would know about it. There’s only so much I can do. I usually publish my data under a Creative Commons license, which is rather permissive. There might be instances where I just cannot really do anything about it, but I might at least talk to the people involved and see if we can find a way that conforms to what I had in mind for my data to be used.

Katja Mayer
Yeah, you know, I asked this question because there’s an ongoing debate on whether it is sometimes better to close data, or not share it, because it could be reused in ways that are not intended or not good for society.

And so I wonder whether openness would actually help to trace those activities much better, and then to confront them with alternative worldviews or alternative interpretations, right? This would make the use of your data much more traceable than it is now, when people sometimes use scientific insights to make completely contrary arguments. It’s not easy for the scientists themselves to react to those things, because they simply don’t know about them.

Jana Lasser
For sure, yes. And by using, for example, a share-alike license, we can ensure that whatever derivative comes from our work will also have to be open, so we can at least keep an eye on it. A similar argument applies here as the one I put forward regarding scientific findings: who am I to decide what is a legitimate use of the knowledge? For me, the data I produce falls under the same umbrella. So I put it out there under a license that at least allows for public scrutiny, and then, sure, it’s probably also my responsibility to follow up and track it. But I don’t think that closing the data is the right solution, provided, of course, that there are no privacy concerns for the data subjects. Other than that, I would argue that openness is the only way.

Open Science Communication

Katja Mayer
The work with media and different publics that are not scientific in society is something I know you have had several experiences with, especially during COVID. There was a big rush; you had to produce results and feed them to politicians and the media. It seemed like nobody was ever satisfied; everyone was saying, “Faster, faster, we need more, we can’t make decisions. What should we decide? Science should give us the thing where we can make the decision from.” What lessons have you learned from your work in the field of public health at that time, interacting with policymakers and the media? It’s another form of openness, right? This kind of science communication was necessary at that time.

Jana Lasser
Yes, definitely. What have I learned? I have learned not to jump head-on into projects under high time pressure without really thinking about whether I have the resources and capacity for them, because it is very intense. The case that left the strongest impression on me was the school measures strategy during COVID. We ran a simulation study over Christmas that informed Austria’s school strategy launched in February; I had only half a day of vacation over Christmas. It was a very rewarding activity, because I believe that we as scientists should care about the impact our research has. This was a case where our research had a very direct impact, which was very satisfying, and I believe its effect was positive.

I also learned that science journalism is really on our side as scientists. Initially, I was careful in interacting with journalists because my first contact with media was related to research policy, not research itself. When I was a PhD student in Germany and a PhD representative, we dealt with several cases of power abuse in the Max Planck Society. As a representative, I received many questions from the media. Nothing bad happened, but it was a different kind of communication. I was representing both the organization and the PhD students, and journalists were interested in the scandal, so I had to be very careful about what I said.

During COVID, I felt that science journalists were very interested, supportive, and wanted to understand. They always gave me things to double-check, and making corrections was never a problem. This experience gave me a lot of confidence and security in working with journalists, and I am very willing and happy to do interviews and try to describe my research in an understandable way. It was a really good experience for me.

Katja Mayer
When you say you try to describe the research in an understandable way, are there any specific formats that you experimented with, like visualizations, where you would say this is another type of openness to present research results and the method in a visual way?

Jana Lasser
Well, “experimented with” might be a bit of a big word. I did a few things, and some of them worked. I have a Twitter profile where I communicate new research findings, mostly to an academic audience, but sometimes they reach a broader audience. I think about how I compose a Twitter thread and sometimes create different visuals than I would for a scientific publication. However, it’s really a matter of how much time I have because I still have to do this on the side.

For the COVID school research project, I worked with a postdoc who specialized in visualizations, Johannes Sorger, and we implemented a website where our simulation results could be explored interactively. I still think it’s a very cool website, but I also think it’s not very accessible in the end.

COVID19 Prevention-Measure Explorer for Schools https://vis.csh.ac.at/covid-schools

It has a lot of information in there, and maybe given another half a year, we could make it a very, very cool tool for teachers, students, and parents to use. But in the end, we were also lacking just the temporal resources to make that happen. So, yeah, I do try different things, and some of them have worked out.

Katja Mayer
But I think all these experiences are a very good basis for the new job that you are about to start, right? And since your institution has the word “lab” in its name, if I remember correctly, I hope it could be a great place to further engage with these strategies and try out new ways of communicating to the public, which is another dimension of reproducibility or accessibility that we should keep in mind when talking about openness, I guess.

Open Large Language Models for Simulating Human Behavior

So now maybe looking more to the things you’re doing right now and an outlook to the future. I heard that you and your team are currently using or trying out LLMs, large language models, for simulating human behavior. What are your experiences there? What is meant when you say open large language models? Maybe you can explain that briefly.

Jana Lasser
Yeah, I’ll start with that. So, everybody probably knows ChatGPT by now. That’s what we call a closed large language model. It’s proprietary, owned by OpenAI, and we only get access to a web interface where we can chat with it. The issue with that is that we don’t really know what happens in the background. The behavior of OpenAI’s ChatGPT model changes: there are studies that investigated things like the length or the accuracy of its responses at different points in time, and it does change, and we don’t know why. That’s an issue if we use these things for research, because we can never be sure that our results actually reproduce.

But there are open large language models: Mistral, a French company, publishes some of its models, and there are also the Llama-type models, which were initially leaked but are now, I think, published deliberately. These can be downloaded, and anybody with the right hardware can run them locally. That means the model is completely under our control; we know when it changes, because it wouldn’t change unless we changed it. That’s good for research, because we can make sure that whatever happens, we know why it happens, because we did it.

We’re trying to use these models to simulate human behavior or to see whether it’s feasible to simulate human behavior. These models produce very coherent-sounding text, and there are studies showing that they can reproduce human responses to survey questions. There are various ideas around using these large language models to study humans in new ways. For example, in ways that are not accessible to classical research because they might be unethical to do with real humans or deal with demographics that are not accessible to survey questions, such as inmate populations, or to conduct surveys that are excessively long.

These are interesting use cases. Right now, we are focused on validating whether we can reproduce known behavior reliably, and on exploring the biases in these models. The models carry lots of biases that they picked up from their training data, and even for open models we usually don’t know what that training data was. There are approaches to train new foundation models on curated datasets, but these are not there yet. So that’s where we are moving right now; it’s all moving very fast, but it’s a very exciting thing to do.

Katja Mayer
So, here, the openness on the model dimension is already given, but you most of the time don’t have access to the data that was used to train the models, right? Which I guess could also be a problem, especially when you want to focus on groups that might have been very underrepresented.

Jana Lasser
Yeah. Or we might just get caricatures of these groups because other people wrote about their conceptualizations of these underrepresented groups, and that’s what we get in the training data.

Katja Mayer
There are not yet enough open data corpora that we can use as benchmarks for other models to see whether they really work well.

The Next Frontier: Getting Better Training Data

Jana Lasser
Yes, the scale is simply not there yet. To get to the performance that models like GPT-4 have, the corpora—the number of tokens or words needed to train the model—are so vast that with hand-curated datasets, where we know what is in there, we are light years away. These corpora are just so vast that nobody really knows what exactly went into them. They probably also contain lots of copyrighted material. So this is the other battleground right now: the whole discussion about copyright and training data for these models, and what can and cannot be used.

For an initiative that approaches this in an ethical way and actually cares about what data goes in, all these hurdles—thinking about copyright, curating the data—are just so time-consuming. But I think this needs to happen now. So I think the next frontier of model development is not getting more and more data, but getting better data, where we really know what is in there. Because right now, these LLMs are a bit like magical spells: we can interact with them, we can experiment with them, but sometimes they do things and we simply don’t know why. The explanation is very likely in the training data, but since we can’t interrogate that, it’s a bit of magic, and that’s not very scientific. So I really hope that the development goes in that direction.

Katja Mayer
And you will try to keep up with the dynamics of these fields. It’s incredible how fast everything changes: every week there are new disruptions, revolutions, and whatever else is going on. So it’s actually quite hard to follow it all, I guess.

Jana Lasser
Yeah, for sure. I wouldn’t mind a break.

Outro

Katja Mayer
So thank you very much, Jana. I think you gave us beautiful insights into your work. What remains is for me to wish you a really good start in your new job; it will be exciting for sure. We will follow you and your activities, and in the spirit of openness I wish you all the best, so that you can keep up your wonderful work and maybe show us new ways of further enhancing the reproduction packages of our research. Thank you very much, Jana.

Jana Lasser
Thank you, Katja.


Dilek Fraisl: Citizen Science Generated Data

Politics of Openness Podcast with Katja Mayer and Dilek Fraisl (February 2024)

In the first episode of the new podcast “Politics of Openness,” Katja Mayer and Dilek Fraisl discuss citizen science generated data and its benefits for the Sustainable Development Goals, as well as the challenges of participatory knowledge production.

Dilek Fraisl

Dr. Dilek Fraisl is a researcher with the Novel Data Ecosystems for Sustainability (NODES) Research Group at the IIASA Advancing Systems Analysis Program. As Managing Director of the Citizen Science Global Partnership and a Consultant to the United Nations Development Programme, Dilek brings a wealth of knowledge in data ecosystems, governance, and sustainability transitions. Today, we dive deep into the world of citizen science generated data, exploring how non-scientists actively participate in research by collecting data and contributing observations. We discuss the role of citizen science in supporting SDG indicators, highlighting specific geopolitical contexts where these initiatives have made significant impacts. We also cover the process of participatory data collection, the integration of such data into systematic analyses and the routines of statistical offices, and the broader implications for policy and global sustainability goals. Join us to uncover the transformative potential of citizen science in tackling global challenges.

Links:

Transcript of the podcast episode

Katja Mayer
Welcome to “Politics of Openness,” a podcast that explores open data practices in the social sciences and related data sciences. Hosted by Dr. Katja Mayer, this series investigates both the transformative potential and the challenges of open research data amidst the digital transformation of our world. Join us as we examine the ethical, practical, and technological aspects of making research data and scientific knowledge production more accessible, interoperable and re-usable, reflecting on how these practices can enhance scientific productivity and social innovation. Questions of collective benefit, data equity, governance and all related questions of ethics, are crucial to this discussion. Through interviews with experts and practitioners in data sharing, open science, citizen science, infrastructures, digital commons and many more fields, this podcast aims to be your guide to understanding the complexities and paradoxes of openness in scientific knowledge production. In an era dominated by ubiquitous datafication, predictive analytics, and artificial intelligence, let us explore what is at stake and what actions are underway.

Today with me in the studio is Dilek Fraisl.


Dear Dilek, welcome to this podcast. I’m happy to introduce you to the audience. You are an expert in the field of sustainability and data, as well as data governance. You are a researcher in the Novel Data Ecosystems for Sustainability (NODES) research group at the IIASA Advancing Systems Analysis Program. Furthermore, you are the Managing Director of the Citizen Science Global Partnership and a consultant to the United Nations Development Programme. As I read from your CV, you have a PhD in sustainability transitions from the University of Natural Resources and Life Sciences in Vienna, and you now specialize in creating new systematic evidence and integrating diverse data sources to tackle pressing global challenges. Listeners can find a link to your profile and your publications in the podcast description.

So hello, Dilek, welcome to this podcast. As you know, this podcast is dedicated to open data politics and the multifaceted openness of knowledge production. So I guess you are the perfect guest for our focus today on citizen science, which in short can be defined as the active participation of non-scientists in research. We’ll talk a bit about what kind of data is collected there and what becomes possible by crowdsourcing data collection, or by citizens contributing observations to science. In particular, we’ll speak about the data produced in citizen science, which is a bit different from citizen-generated data in general, isn’t it? So maybe you can start by explaining those types of data and the contexts in which you have studied them.

Citizen Science and Citizen Science Generated Data and other Terminologies

Dilek Fraisl:
Thank you, Katja, so much. Hello and hello, everybody. First of all, I would like to thank you for inviting me to your podcast. What you’re suggesting about citizen science and citizen-generated data and the terminologies—well, that’s a very difficult topic. It’s never easy to answer for me, even though I’m in the field and doing a lot of research also related to definitions and terminologies of citizen science.

In my work, I use the term citizen science as an umbrella term to cover all the diverse approaches related to public participation in knowledge production. Let me put it that way, because there are, as I said, many different terminologies. Maybe you’ve heard of crowdsourcing or participatory mapping. You said participatory science, I guess, if I remember correctly, and citizen-generated data as well. These are all different terms used to describe these types of public engagement in scientific research and knowledge production activities, and this is how we define it, by using the umbrella term citizen science. The reason so many different terminologies exist for similar types of activities is that they all came into existence in different contexts and disciplines, and they really reflect the peculiarities of the fields, disciplines, or areas they come from.

The reason we say citizen science in my field is that it is academia-oriented. The term citizen science was first used to describe mostly environment-related public engagement in scientific research activities, and that’s why we’re using this term. Citizen-generated data, the other term you highlighted, is mostly used in the context of development data and official statistics to describe citizens participating in issues that are relevant to them and their own communities. This is mostly civil-society-oriented data that is produced for many different reasons. Like I said, this could be addressing an issue that people and communities are facing, or it could be collecting data for advocacy purposes on a particular topic, for instance, to hold the responsible actors accountable on these matters.

The reason I said this is a huge issue that is being discussed a lot: let me give you one example. In the U.S., we had the Citizen Science Association, a U.S.-based association, and they recently changed their name from the Citizen Science Association to the Association for Advancing Participatory Sciences.

Link to https://participatorysciences.org/

This is because the term citizen is really very sensitive in the context of the U.S., as it may exclude people who do not have citizenship. And science could also sound a little bit like it excludes those bottom-up practices I talked about, for instance, citizen-generated data in the context of civil society. So probably I have confused you more than I have explained the term.

See the following papers:

  • Cooper et al., Inclusion in citizen science: The conundrum of rebranding. Science 372, 1386–1388 (2021). DOI: 10.1126/science.abi6487

Katja Mayer:
No, don’t worry. I think what you’re telling the audience is very interesting: that recently there has been a shift towards creating more sensitivity around the language we are using.

You’re looking specifically at how the Sustainable Development Goals indicators could be supported through such citizen science initiatives. Could you maybe elaborate a bit on these types of indicators, but also on the geopolitical contexts you already mentioned, where citizen science has perhaps had the most impact, and why these particular areas are more conducive to contributions from citizen science?

Dilek Fraisl:

Sure, I’m happy to. So first of all, let’s try to understand the SDGs as a framework. The SDGs are the UN Sustainable Development Goals, and this framework as a whole has 17 goals, right? They cover diverse aspects of sustainability, from environmental degradation to poverty, policy-related issues, equality, education, or health.

SDGs https://sdgs.un.org/goals

The United Nations Sustainable Development Goals

We cannot improve what we cannot measure

So there are these 17 different SDGs, and under these 17 SDGs we’re talking about 169 targets. So each goal has several targets based on what it is trying to achieve as a global-level goal. As I always say within the data and statistics community, we cannot improve what we cannot measure. In order to understand whether we’re reaching these goals and targets, we need to be able to measure our progress towards them. The SDGs are a framework that was adopted in 2015 to be achieved by 2030, so we have a very concrete timeline and deadline for achieving the different SDGs and the targets under them. The final deadline for the overall SDG framework is 2030, but there are a lot of issues related to data availability. Even when data is available, whether it is available for all parts of the world is a question. When we talk about the SDGs, we’re talking about a global framework, and some countries, particularly in the developing world, are lagging behind in this respect.

The current ways of measuring progress towards the SDGs rely on official or traditional sources of data: censuses, household surveys, official surveys, and so on. But there are a lot of issues related to these data sources. For instance, if you want to do a household survey related to education, it costs the country a lot of money, or it is usually supported through aid organizations, but it’s still a lot of money. Sometimes the data is out of date, because these activities are expensive and cannot be organized very often. So a lot of development-related policies may be based on data that was produced five to ten years ago. There are examples of this, but we need timely, accurate, and comprehensive data to be able to achieve the SDGs and their targets. We suggested in our work that citizen science data can be one of the ways to actually measure progress towards the Sustainable Development Goals and their targets. So we did a study to understand the potential of citizen science data for measuring progress toward the SDGs.

Fraisl, D., Campbell, J., See, L., Wehn, U., Wardlaw, J., Gold, M., Moorthy, I., Arias, R., Piera, J., Oliver, J.L., Masó, J., Penker, M. and Fritz, S. 2020. Mapping citizen science contributions to the UN sustainable development goals. Sustainability Science, 15(6): 1735–1751. DOI: https://doi.org/10.1007/s11625-020-00833-7

See the whole special issue here: https://theoryandpractice.citizenscienceassociation.org/collections/contributions-of-citizen-science

This was one of the first peer-reviewed scientific publications on the topic, and it attracted quite a bit of attention within the UN and official statistics communities, among government agencies, and within the scientific community as well. We were able to show through evidence that the data produced through these initiatives can actually contribute to the SDGs. We looked at these 169 targets and the 231 unique SDG indicators. The number is huge, and we found that 33 percent of these SDG indicators can be supported through citizen science data. The types of SDGs that could benefit from citizen science data the most are usually environmental topics, and that includes health topics as well.

The health SDG is one of those that could be monitored or supported through citizen science data the most. We also published on that: we looked at the health and well-being related SDG indicators in the SDG framework and found that 85 percent of them can be supported through citizen science data.

Fraisl D, See L, Estevez D, Tomaska N, MacFeely S. Citizen science for monitoring the health and well-being related Sustainable Development Goals and the World Health Organization’s Triple Billion Targets. Front Public Health. 2023 Aug 9;11:1202188. doi: 10.3389/fpubh.2023.1202188. PMID: 37637808; PMCID: PMC10450341. https://www.frontiersin.org/journals/public-health/articles/10.3389/fpubh.2023.1202188/full

If you look at the whole SDG framework, we found at that time, based on existing citizen science projects, datasets, and initiatives, that 33 percent of the SDG indicators can be supported through citizen science data. That’s very impressive, a huge number. These indicators cover issues such as air quality and public health topics, for instance malaria, through mosquito-borne disease monitoring projects. As I said, there are a lot of air quality projects taking place in Europe, and a lot of other projects related to marine litter and plastics, which is a huge topic as well; SDG 14 deals with that. I’m talking particularly about that because I can provide a very concrete example of how these data can be useful for monitoring the SDGs but also for informing policy decisions and actions.

Van Brussel, S., & Huyse, H. (2019). Citizen science on speed? Realising the triple objective of scientific rigour, policy influence and deep citizen engagement in a large-scale citizen science project on ambient air quality in Antwerp. Journal of Environmental Planning and Management, 62(3), 534–551. https://doi.org/10.1080/09640568.2018.1428183

Fraisl, D., See, L., Bowers, R. et al. The contributions of citizen science to SDG monitoring and reporting on marine plastics. Sustain Sci 18, 2629–2647 (2023). https://doi.org/10.1007/s11625-023-01402-4

Empowering global goals: enhancing monitoring and targeting of SDGs with citizen science data

I would like to mention one more quick thing. The examples I provided concern air quality, water quality, biodiversity monitoring, and marine plastics related SDG indicators, but there are also a lot of social aspects in the SDG indicators that could be supported through citizen science data. We know that national statistical offices, for example, are working on technologies that collect data through citizen science on gender-based violence, and they have been using these data for their monitoring activities and their SDG reporting. I’m also now involved in a project through my UNDP work on monitoring citizen satisfaction with public services. These are all indicators that are amenable to citizen science approaches, and we can certainly use the potential of citizen science to leverage data, to close these data gaps, and to help inform and reform policies on these important social topics as well.

But going back to the marine litter example: we published this study that found that 33 percent of the SDG indicators can be monitored through citizen science data, but we didn’t want this to be only a scientific paper, a publish-and-perish approach.

Paper here: https://link.springer.com/article/10.1007/s11625-020-00833-7

We wanted to test the results of this and see whether this is actually really working not only in theory but also in practice. So we partnered up with UNEP, the UN Environment Programme, as well as the Ghana Statistical Service, which is the national statistical office of Ghana, to basically see whether we can use already existing data in Ghana from citizen science initiatives and whether we can leverage these data for the official statistics of Ghana.

UN Environment Programme https://www.unep.org/

National Statistical Office Ghana https://www.statsghana.gov.gh/

Pioneering Change: https://www.undp.org/policy-centre/governance/news/pioneering-change-ghanas-bold-leap-citizen-science-public-service-satisfaction

Listening to local needs and concerns

There had been no official data in the country about the marine plastics issue, because marine litter is very, very difficult to measure, especially using only traditional ways of data collection. So we partnered up with the Ghana Statistical Service. But before we identified marine litter as the issue, or chose the citizen science initiative, we first gathered with the Ghana Statistical Service at IIASA and asked them: what are your priorities as a country in this SDG framework? Even though the SDG framework is a global one, countries report on their own progress, which is then used for global comparisons. So we asked them about their sustainability priorities and needs. Looking at the Sustainable Development Goals, they suggested that marine litter is one of the biggest problems in the country, and they had no official data to back up any kind of policy. They were already in the process of producing an integrated marine and coastal management policy in Ghana, so they thought the results of this work, if successful, could feed into that policy. After we discussed this and decided it would be an area to explore, the statistical service invited the Environmental Protection Agency under the Ministry of the Environment, and we continued our discussions focusing on this topic. This was the first step: asking the country about their needs for policy and monitoring. Then we looked at already existing data and initiatives, and we used existing platforms. For instance, we used the TIDES database.

TIDES database: https://www.coastalcleanupdata.org/

TIDES is a database initiated by the Ocean Conservancy’s International Coastal Cleanup, a huge global initiative that organizes coastal cleanups all over the world. They have millions of volunteers contributing data, and they have been doing this for over 30 years in many different parts of the world. Ghana has been one of the countries where the methodology from this US-based initiative was used by civil society organizations and citizen science networks to collect data on beach litter. This data is all open and publicly available; you can have a look at it and download it from the TIDES database. We looked at it, along with the Earth Challenge Marine Data Integration platform, which gathers data from three big global citizen science initiatives on marine litter taking place in different parts of the world.

Earth Challenge Integrated Data: Plastic Pollution (MLW, MDMAP, ICC) 2015-2018
https://globalearthchallenge.earthday.org/datasets/EC2020::data-earth-challenge-integrated-data-plastic-pollution-mlw-mdmap-icc-2015-2018/about

Dataset: https://globalearthchallenge.earthday.org/datasets/EC2020::data-earth-challenge-integrated-data-plastic-pollution-mlw-mdmap-icc-2015-2018/explore?location=0.001263%2C2.357206%2C1.00

We looked at those and saw that there was sufficient data that could be representative of Ghana’s coastline. Then the Ghana Statistical Service and the Environmental Protection Agency brought together in-country partners within Ghana, including universities and civil society organizations working on environmental issues, particularly marine litter, that had been collecting data on marine litter, organizing beach cleanups in the country, and submitting these data to the TIDES database, which was then put online by the initiative’s representatives in the US. We looked at the data, and it was really promising. We organized meetings with the civil society organizations as well as the representatives of the US-based citizen science beach cleanup initiative.

Ocean Conservancy: Coastal Cleanup
https://oceanconservancy.org/trash-free-seas/international-coastal-cleanup/

Further information: The Ocean Cleanup
https://theoceancleanup.com/research/citizen-science/

They described the methodology in detail to the national statistical office and government representatives in Ghana, because citizen science is a new, very novel approach to those communities. They wanted to understand the ins and outs before they could validate the data and use it as official data for the country. Then the civil society organizations attended those meetings to explain how they were implementing, on the ground, this methodology that was developed by the scientists in the US. We discussed a lot of the challenges of data collection and the limitations of the datasets produced through this initiative, and we looked at how we could overcome them. Then the Ghanaian partners and stakeholders also invited some of the UN agencies in the country. We brought together global stakeholders from the IIASA side, from the research community, as well as the UN community, and we went through this data validation process. As a result, the Ghanaian government agencies agreed and decided that these data can be used as official data in the country to show at least part of the extent of the marine litter issue, covering beach litter. They submitted these data as country-validated data to the SDG global database, and they presented them in their voluntary national review to the High-level Political Forum, the official UN forum for SDG monitoring.

High-level Political Forum on Sustainable Development https://hlpf.un.org/

They are also using these data to inform the integrated marine and coastal management policy in the country. But what is really important here for me, not only as a researcher working in the field of citizen science but also as a citizen science advocate, is that we really managed to connect this very local-level data (a lot of citizen science projects take place at a very local level) with national and global monitoring processes. That was a very important outcome: the data from these civil society organizations is actually being used to act on this very important environmental issue of marine plastics.

Katja Mayer:
That’s impressive, really impressive. From your story, I think we can understand how much diplomatic capacity must be in place here; it’s basically data diplomacy in this story that you have told us. But in addition, I think what is really important when I listen to your experiences is that there are already some open infrastructures available that make this kind of leveraging of global and local levels possible in the end, right?

So that’s very true.

The importance of Open Infrastructures

Sometimes we tend to overlook the importance of infrastructures in that regard, without which this kind of data diplomacy would not be possible at all. But a very impressive story. I will, of course, link to all the papers that you referred to in the description of the podcast, and our listeners are invited to click through them. I think they’ll find very interesting new sources of inspiration there.

I want to come back to one aspect that you mentioned: the skepticism of the official statistics community, and the agencies using these statistics, about the quality of the data. There is one paper called “Citizen Science: What Is in It for the Official Statistics Community?” that you and your co-authors wrote.

Proden, E., Fraisl, D. and See, L., 2023. Citizen Science: What is in it for the Official Statistics Community?. Citizen Science: Theory and Practice, 8(1), p.35. DOI: https://doi.org/10.5334/cstp.584

In it, you identify a gap in our understanding of the impact of citizen science data and a lack of awareness of this data within the official statistics community. As you said, in many countries citizen science is a very new thing. Can you share with us some more insights from your research on the readiness of national data ecosystems, and their statisticians, to embrace citizen science data?

Citizen Science: What is in it for the official statistics community?

Dilek Fraisl:
Yeah, this is one of the very important nuances that we need to be aware of, in my experience. It’s not only about lack of trust in citizen science data or data quality; it’s actually about people not knowing exactly what it is. Looking at citizen science data from the perspective of the national statistical offices, they usually think of civil society organizations collecting data for advocacy purposes, and this could pose risks to the impartiality aspects of the UN Fundamental Principles of Official Statistics.

UN Fundamental Principles of Official Statistics https://unece.org/statistics/FPOS

So they want to make sure that the data are impartial, and if you’re collecting data for advocacy purposes, this can raise a lot of questions. But I guess what’s important is to be able to communicate and document the methodologies, and a lot of what you’re doing as part of a project, to these national statistical offices. I didn’t get a chance to mention that, as a result of this marine litter work, we wrote a scientific paper.

Campbell, J. et al. ‘The Role of Combining National Official Statistics with Global Monitoring to Close the Data Gaps in the Environmental SDGs’. Statistical Journal of the IAOS (2020): 443–453. https://content.iospress.com/articles/statistical-journal-of-the-iaos/sji200648

In it, we explained the process that we went through, which I have just told you about in a very simplistic way, without going into the details of the data analysis and what was done. We wrote up this process as well as our results, and we documented all of it in a way that is understandable not only to the scientific community but also to government agencies, national statistical offices, and policymakers who are not necessarily experts in using these kinds of data sources, especially citizen science.

This is really important because Ghana was the first country to use citizen science data for official statistics, especially in the context of marine litter. But this could be adapted to many different SDG indicators on the topics I mentioned. We want other countries to follow the methodology and approach that we went through, so that they don’t have to start from scratch. So it’s really important to document everything in a way that highlights not only the opportunities or the success of the results, but also the challenges, the limitations of these types of data, the limitations we faced in our dataset, and what we did to address them.

We published this as an annex to the paper in the form of checklists and simple tables, basically a toolkit that any national statistical office can take and understand, so it’s easier to implement this approach in another context or adapt it to another country. A lot of the skepticism about citizen science data in the context of official statistics is related, according to the findings of the paper you were referring to, to a lack of knowledge about this type of data, because citizen science is a new method to these communities.

They don’t really know how it works or what kind of methodologies are being used, and because they don’t understand the details, they tend to frame the whole field as not good quality. But we need to look at each citizen science initiative in its own context; we can’t judge the whole field as good-quality or bad-quality data. Instead, we should look at the particular practices of a particular citizen science initiative, and this can only be done through proper documentation.

Katja Mayer:
I totally understand what you’re saying. This also relates to the bigger question of what benefits there are for the citizens or participants in such projects, and of course, transparency and understandable documentation should be among the highest priorities in such participatory settings. However, we also have to say that in academia, exactly this kind of publication is not incentivized very much, right? These are publications for which you don’t get a lot of merit; sometimes, as you said, you have to do it as an annex to a scientific paper to get any credit for this kind of work at all. There are no publication outlets or formats that would give you the right credit for this really huge effort of translating it into another language, documenting it, and providing the kind of transparency that’s needed for the method to become mobile and useful to many other stakeholders. So I think this is a really important point.

Next priorities for the integration of citizen science into global monitoring frameworks?

I have one last question for you, if I may. Based on what you have told us, and besides all the limitations and challenges in terms of documenting and making processes transparent, what do you see ahead? Do you have other key challenges or opportunities in mind, or priorities for the further integration of citizen science into the global framework for monitoring and implementing the SDGs? What kind of institutions are needed to empower these kinds of processes and this kind of participation? Maybe you can give us a short outlook on what the priorities will be in the next years.

Dilek Fraisl:
Sure. A lot of what made this work possible was actually a project funded by the European Commission called WeObserve.

WeObserve Project https://www.weobserve.eu/

As part of the WeObserve project, what we did was to create communities of practice. The reason citizen science data has not been linked into the monitoring and achievement of multilateral agreements and international development frameworks is that these communities have unfortunately been working in silos.

Bringing together the silos and creating communities of practice

There is, for instance, a big citizen science researcher and practitioner community, and there is the UN and official statistics community, but they haven’t really been working or communicating with each other. As part of this community of practice, we managed to bring these diverse communities together. It was an open call: anybody who was interested could participate in the community of practice activities, through which we produced these scientific papers, some of which I mentioned. Basically, we managed to bring these diverse communities together, including civil society, policymakers, national statistical offices, and academia, to discuss what the issues are and what we can do.

So these kinds of communities of practice that focus particularly on these issues are really, really helpful. WeObserve has already finished, but the impact of this community of practice is still growing, in science through the scientific work that we do, and in society through addressing these official monitoring and data gaps in the SDG framework, which eventually helps to achieve the SDGs. And we now have another project, called Urban ReLeaf.

https://urbanreleaf.eu/

It is again funded by the Horizon Europe programme, and as part of it, we are recreating the community of practice that we built with WeObserve. So that could be an interesting place for your listeners to have a look at. We will launch an open call sooner rather than later, in the next month or so, for anyone who would like to participate in that SDG-related aspect.

To get involved: https://urbanreleaf.eu/get-involved/

But this time the focus is on the urban context. Anyone can participate and become part of that big group, leveraging the networks that were created with WeObserve. So that is one thing. The other thing, as you mentioned at the start while introducing me: I’m working at the Citizen Science Global Partnership.

https://citizenscienceglobal.org/

And I’m the Managing Director of the Citizen Science Global Partnership, which I will call CSGP from now on. CSGP is a network of networks that aims to advance citizen science for a sustainable world, and it is hosted at IIASA where I work. One of the foundational goals of the CSGP is really to get the word out about citizen science in terms of data and its potential, not only for addressing the data gaps or producing new data on issues that are important to citizens and communities, but also in terms of its potential impact on society.

So we want to leverage this aspect of citizen science by working on the SDG framework or other frameworks that are very important for addressing the societal challenges we’re all facing today. CSGP is a new organization, but it’s growing, and we will soon launch an open call to everyone who would like to participate in its activities.

For the open call to come, watch https://citizenscienceglobal.org/

It will be a bridge between the global citizen science community and the UN and global-level communities. A colleague of mine, the Director of Data and Analytics at the WHO, gave a talk in a citizen science session that I organized, and one of the participants raised a question about the UN not caring about our data. He put it very nicely: it’s not that we don’t care about it, we don’t know much about it. And when I want to talk to citizen science, who do I call? Is there a phone number? We are well organized as a citizen science community, but we have diverse associations in different parts of the world. The Citizen Science Global Partnership can bring together these diverse associations and networks to be the global voice for citizen science in talking to these global-level initiatives, while also highlighting the importance of the local in citizen science.

Katja Mayer:
Oh yeah, it’s very good that you mentioned this. We will, of course, share the open calls that you referred to. And I think exactly this overcoming of silos, this bridging of the boundaries of different communities, will be of utmost importance to strengthen all the opportunities that come with more data, and perhaps also to help refine some of the indicators of the SDG framework.

So it’s really a fascinating moment, and a beautiful story in these times of crisis, which is normally the narrative that goes around. We have a lot of options to participate in knowing more about the world, to bring attention to what kind of knowledge sits where, and to bridge these islands of knowledge. So I thank you very much, Dilek.

It was fascinating. You presented a hugely complicated topic in, let’s say, good and easy-to-understand terms. Thank you also for that. I know how complex it is, but I think our listeners will now have a better picture of what the issues are, what can be done, and where the opportunities currently are and where it’s important to look. Thank you very much for being here.

We will share the links, and also the contact for the Citizen Science Global Partnership, of course. And if you have any questions, we are happy for you to contact us; you will also find the email address in the description of the podcast. So Dilek, I wish you all the best for your continuing work, for the open calls, and for the many diplomatic efforts that are still there for you to master, and a lot of energy to do all that in the coming years.

Thank you very much.

Dilek Fraisl:
Thank you so much. This was really interesting and a very nice conversation.

Contact:
Dilek Fraisl: info@globalcitizenscience.org