A reality check on research reproducibility in Open Science student projects

Open Science as the foundation of transparent and reproducible science is increasingly being incorporated into curricula. We argue that Open Science education is predestined not to be taught in classical lectures and seminars, but to be experienced first-hand in student projects.


Motivation
Open Science is the foundation of transparent and reproducible science.¹ Scientific studies should be described and conducted in a way that enables others to understand and reproduce them. It is not easy to judge at a first, second or even a third glance whether a study has been designed and described in such a way. Simply finding the associated data and code is not necessarily enough to make a reproduction possible. The next challenge is to design and describe one's own studies in a way that makes them reproducible.² Both perspectives, i.e. reproducing a study and creating a reproducible study, are important. So far, reproducibility education has tended to focus on researchers rather than students (McAleer et al. 2022). The module "Open Science" in the Master's programme "Digital Science" at the Cologne University of Applied Sciences challenges that and considers both perspectives.³ The module is a combination of input and discussions centred around student projects. From the different concepts of reproducibility and replicability, to research data management and the development of research software, to the selection of licences and the communication of one's own results, everything along the way is covered. For this purpose, students reproduce self-selected studies and navigate their way through all the challenges, to either success or failure. Regardless of the outcome, this is a valuable and rewarding process for students, lecturers and the scientific community alike, and offers a long-lasting learning experience.
In this article, we introduce the concept of the "Open Science" module and one of the student projects from the winter semester 2022/2023. The chapter on the teaching concept is written by the two lecturers (Claudia Frick and Mirjam Blümm), and the chapter on the student project is written by the two students who conducted the reproducibility study (Natasha Randall and Berrak Küçük), supplemented by a statement from one of the authors of the original study (Drew Bailey).

¹ A study is reproducible if its results can be confirmed using the same method and the same data (Chiarelli et al. 2021, p. 10-11). A study is replicable if its results can be confirmed using the same method but new data, and is robust if its results can be confirmed using a new method but the same data (Eickhoff 2020, 20:11-23:58). We acknowledge the many other possible definitions and discussions (Plesser 2018, van de Sandt 2019).
² Even a literature search can be described reproducibly (Booth et al. 2016).

Open Science teaching concept
The learning objectives of the "Open Science" module include the ability to lead a scientific discourse about Open Science, apply related tools and services, process and provide research data, as well as understand and reproduce case studies. In terms of Bloom's taxonomy, students should achieve all cognitive domain levels as far as possible (Bloom et al. 1956). The course followed the "flipped classroom" principle (Kirch 2016) and was based on three main components: methodological and technical content, classroom interaction, and practical student projects. In practice, the students prepared materials provided in advance via the learning platform Ilias,⁴ and in the first half (90 minutes) of the on-site lessons, essential content was picked up and jointly developed with the help of an interactive whiteboard. The second half of the lessons was reserved for group work on the student projects. Fig. 1 shows an example unit on research data management (RDM) on the interactive whiteboard. After watching a short video about the difficulties of data reuse, students discussed use cases and compiled arguments for RDM, which they recorded with sticky notes on the whiteboard. We then discussed the different steps of RDM based on the research life cycle, and defined the overall tasks. As students had read the "Practical Guide to the International Alignment of Research Data Management" (Science Europe 2021) in preparation for the lesson, they already knew that a data management plan (DMP) is a key instrument for RDM. In class, the use and application of DMPs was reflected on, and their elements recorded on the whiteboard. In the subsequent group work, the students considered the importance of RDM for their data, and how they could use these elements for their projects.
In this spirit, we designed nine units for the module; the first unit contained a general introduction to the topics covered, and the group work started with the search for possible studies to reproduce. For this, research tools and methods were discussed in class. The following units focused on the FAIR Principles (Wilkinson et al. 2016), data and metadata formats, as well as legal and ethical aspects. The student groups reported on their progress and continued to work on their reproducibility studies and documentation.
The second Research Data Day in North Rhine-Westphalia in 2022⁵ was integrated into the schedule as the fifth unit. Students were encouraged to participate in the online programme, which offered insight into various current RDM projects.⁶ In the sixth unit, each group presented an overview of the study they had chosen to reproduce, including the topic, research question, method, data, code, and result. For the student projects, three groups were formed and three different studies chosen (Ariyo et al. 2014, Xu et al. 2022, Li et al. 2014). One of the biggest challenges from our perspective was dealing with the different levels of uncertainty: Will students identify original studies that are at least theoretically reproducible? Will the methodology and resources (e.g., data, code) be both available and understandable? Will students be able to reproduce the study? How will we handle communication if inconsistencies in the original studies arise? Despite various difficulties (such as missing metadata and outdated software), the students were able to reproduce their chosen studies to a large extent. A particularly good example, especially in terms of Open Science, is described in more detail in the next section.

Student project
Our first criterion when choosing a study for our project was that the paper should have a publicly available dataset. Such papers were not so easy to find; many studies either do not publish their data, or the data format is not very accessible. Another criterion was that the study should be understandable even without specific domain knowledge. We conducted a search through DataCite⁷ and discovered the study "Women's Preference for Masculine Traits Is Disrupted by Images of Male-on-Female Aggression" (Li et al. 2014). The paper was published in PLOS ONE, and the raw data (Li et al. 2015) was stored in the open-access repository Dryad⁸ and licensed under a CC0 1.0 licence, allowing for reusability. We found the paper interesting, and it had been cited 26 times as of April 2023, according to PLOS ONE. We therefore decided to attempt to reproduce this study.
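To make this step concrete, the following is a minimal sketch of how such a dataset search can be run against DataCite's public REST API from R. It is an illustration only: the query terms and the page size are assumptions, not the exact search we used.

# Minimal sketch of a dataset search against the DataCite REST API.
# The query terms are illustrative, not the ones used in our project.
library(httr)
library(jsonlite)

resp <- GET("https://api.datacite.org/dois",
            query = list(query = "masculinity preference aggression",
                         "resource-type-id" = "dataset",
                         "page[size]" = 10))
hits <- fromJSON(content(resp, as = "text", encoding = "UTF-8"))

# DOIs of the matching datasets; each candidate can then be screened
# for an open licence and an accessible data format
hits$data$attributes$doi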
In the original experiment, 20 photographs of men's faces were graphically transformed into pairs of feminised and masculinised versions, and the 331 female participants chose which face of each pair they preferred. They were then shown their assigned group's priming images: either male-on-female aggression (e.g., domestic violence), male-on-male aggression (e.g., boxing), male intergroup aggression (e.g., soldiers), neutral (e.g., reading a book), or pathogen (e.g., dirty toilet) images. The participants once again selected their preferred face. A linear random intercept model with a logistic response variable was used to estimate the variance in the women's masculinity preferences attributable to the different priming groups. The study concluded that, regardless of the priming images shown, participants tended to prefer the masculinised face, and this preference increased over time. The main finding of the study was a significant interaction (p=0.011) between time and the male-on-female aggression priming group: the preference for masculinity of the participants in this group had not increased over time. The study's authors had provided an email address with the paper, allowing us to contact them and state our intention to attempt to reproduce their study. The immediate response was extremely positive, expressing interest in the outcome of our project and a willingness to help in any way they could. However, at the time of our project the original paper had been published more than eight years earlier, so the authors struggled to recall many of the details of the study, hindered by a lack of documentation and metadata. Author Drew Bailey provided us with all of the available additional files, including the original R code⁹ used for the analysis. The code utilised now-obsolete functions and contained few comments; it was clear that the code had not been written for reuse by third parties, or for long-term maintenance. It quickly became evident to us that, even with a full dataset and the original code available, reproduction of a study is very difficult without a comprehensive description of the study's methodologies.
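For readers unfamiliar with this class of model: a random intercept model with a logistic response is commonly fitted as a mixed-effects logistic regression, for example with the glmer function from the R package lme4. The sketch below shows the general form under assumed variable names (chose_masculine, time, condition, participant_id); it is not the authors' original code.

# A random-intercept logistic model of the kind described above, fitted
# with lme4 in R. All variable and data frame names are illustrative.
library(lme4)

model <- glmer(
  chose_masculine ~ time * condition + (1 | participant_id),
  data   = preferences,              # hypothetical long-format data frame
  family = binomial(link = "logit")  # logistic response
)
summary(model)  # the time x condition interaction is the estimate of interest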
We therefore strived to apply the open practices we had been taught in the Open Science course when carrying out our reproduction work. We created a data management plan, organised the files into structured directories, and created additional descriptive metadata. We considered the FAIR principles (Wilkinson et al. 2016) and published our work on GitHub in heavily commented Jupyter notebooks, so that our reproduction work could be easily understood and in turn replicated by others.¹⁰ Through our work, we demonstrated the reproducibility of the paper, as we were able to successfully recreate the study's main statistical model and replicate the published results. We also constructed some of our own alternative models to test the robustness of the study; these also supported the conclusions of the original paper. Our results seemingly contributed to the body of successfully replicated studies, tackling the replication crisis in science (Baker 2016), but reproducibility alone does not necessarily tell the whole story. When exploring the original dataset, we discovered that the data for 40 of the participants had accidentally been duplicated from other participants in the study. From reading a comment in the R code and following through its implications, we also realised that one of the face image pairs was miscoded, with the wrong side assigned as masculinised; hence the results relating to that particular image had been incorrectly inverted. Our next step was therefore to correct these errors: removing the 40 duplicated participants from the dataset and inverting the wrongly assigned image data. We then recreated the study's main model with the fixed dataset. We were pleased to find that the conclusions of the paper remained valid, and the main finding was even strengthened as a result of the correction; the original p-value for the interaction of the male-on-female priming condition with time was reduced from p=0.011 to p=0.005. A correction was subsequently submitted to the journal, referencing the findings of our project work. We also published the correction on PsyArXiv (Randall et al. 2023) and linked to it in a comment on the original study. This was endorsed in a responding comment by Drew Bailey, who continued to work with and support us throughout the process; his perspective is described in the following section.
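The two corrections can be expressed in a few lines of R. The sketch below reuses the illustrative variable names from the model sketch above; duplicated_ids and miscoded_image are hypothetical placeholders for the identifiers we established during the project, not names from the original scripts.

# Illustrative sketch of the two data corrections described above.
library(dplyr)

corrected <- preferences %>%
  # 1) drop the rows of the 40 accidentally duplicated participants
  filter(!participant_id %in% duplicated_ids) %>%
  # 2) invert the responses for the miscoded face-image pair, whose
  #    masculinised version had been assigned to the wrong side
  mutate(chose_masculine = if_else(image_id == miscoded_image,
                                   1 - chose_masculine,
                                   chose_masculine))

# refit the main model on the corrected data and compare the estimates
model_corrected <- update(model, data = corrected)
summary(model_corrected)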

Original author's perspective
When I received an email from Natasha and Berrak requesting some data and clarification regarding our 2014 study on masculinity preferences, I experienced mixed emotions. On the one hand, it is flattering to learn that students are interested enough in one's work to take the time to download the data and reproduce it. On the other hand, I admit that the idea of an independent team reexamining one's previously published work is also a little scary.
Yet, the only appropriate response for me was to offer them my fullest support. I opened the R code I had written for the project and became disappointed, for it contained little documentation at all. This was one of the first projects for which we had published our data; I was a Ph.D. student, and the reproducibility movement in psychology was unfolding between the time we designed the study (2011), analysed the data (sometime between 2012 and 2013), and published the paper (2014). We took some pride in our transparency then, but I had no formal training in Open Science practices. In hindsight, it seems silly to think that anyone would ever publish raw data without clearly commented code, but I had not performed a reanalysis of data from a previously published paper at the time, and did not sufficiently consider what kind of information such an undertaking would require.
I let my co-authors know right away about the students' project. Fortunately, they were very supportive. When the students found two clear errors (duplicated participants that should have been removed when the data were pulled at the end of the study, and a miscoded item), I verified them in my data and code, reran the analysis, and informed my co-authors about the errors, taking full responsibility for them. I was, again, disappointed, but took some solace in finding that the results were largely unchanged. I think a reason for this is that the results reported in the paper were not selected on statistical significance; indeed, we reported mostly null estimates in the paper (as I recall, it was rejected at another journal partially because of this). If we had selected on significance, then errors would have been correlated on average with statistical significance, and corrections would have been more likely to invalidate our findings. To be clear, this is not a defence of sloppy data management practices, but it is yet another reason that selecting which estimates to report prior to seeing them is good practice.
Finally, I felt responsible for making sure the correction was published. I encouraged Natasha and Berrak to reproduce the model results with the corrected dataset, and supported them in preparing and publishing the correction. Despite intellectual, technical, and structural innovations designed to facilitate replication and reproducibility, these tasks remain time-consuming and difficult: even with my full cooperation, it took approximately five months from my receiving Natasha and Berrak's first email for them to post their correction online. Although I view structural factors as important for the past and future success of Open Science (e.g., Freese, Peterson 2018), social interactions have played, and will continue to play, an important role in the proliferation of Open Science practices (e.g., Janz, Freese 2021). Scientists, particularly those who received most of their training prior to the Open Science movement, should take responsibility for minimising the still-present social barriers to replicability and reproducibility.

Discussion
Open Science education is predestined not to be taught in classical lectures and seminars, but to be experienced first-hand in student projects. Two reasons for this are well illustrated by our case study. Firstly, we live in a world where the reproduction of published results is still all too rare. Student projects attempting to reproduce published studies therefore contribute to closing this gap, which is motivating for students and lecturers alike. Moreover, such projects enable students to connect with published researchers, and challenge everyone involved to communicate about possible corrections. Secondly, trying to reproduce a published study is a very vivid and long-lasting learning experience, because one experiences first-hand what a study needs in order to have even a chance of being reproduced in the first place. Additionally, published studies are self-contained units which naturally define individual student projects, and discussing their suitability at the beginning of and during the module certainly brings to light a lot of Open Science topics. All in all, this approach can be carried out successfully, but it requires a lot of flexibility and commitment from teachers, students and the scientific community as a whole.