The Replicability Crisis and Public Trust in Psychological Science

Replication failures of past findings in several scientific disciplines, including psychology, medicine, and experimental economics, have created a ‘crisis of confidence’ among scientists. Psychological science has been at the forefront of tackling these issues, with discussions about replication failures and scientific self-criticisms of questionable research practices (QRPs) increasingly taking place in public forums. How this replicability crisis impacts the public’s trust is a question yet to be answered by research. Whereas some researchers believe that the public’s trust will be positively impacted or maintained, others believe trust will be diminished. Because it is our field of expertise, we focus on trust in psychological science. We performed a study testing how public trust in past and future psychological research would be impacted by being informed about i) replication failures, ii) replication failures and criticisms of QRPs, and iii) replication failures, criticisms of QRPs, and proposed reforms. Results from a mostly European sample (N = 1129) showed that, compared to a control group, whereas trust in past research was reduced when people were informed about the aspects of the replication crisis, trust in future research was maintained except when they were also informed about proposed reforms. Potential explanations are discussed.

Science is generally held in high esteem, and the public trusts science and places a high level of confidence in scientists (e.g., Jonge, 2015; Lamberts, 2017; Lindholm, Bergman, & Gustav, 2018; National Science Board, 2016; Scheufele, 2013; German Science Barometer, 2017). However, a crisis of confidence has taken place in some scientific disciplines, caused by concerns about the replicability of past findings (e.g., Baker, 2016; Pashler & Wagenmakers, 2012). This crisis could potentially impact the public's trust in science (e.g., Rutjens, Heine, Sutton, & van Harreveld, 2017). The ongoing scientific discussion on how to improve the way we do science has resulted in a surge of criticisms against questionable research practices that produce irreproducible and misleading results. Criticism of research practices has always been part of the scientific process (e.g., Hull, 1988). What is new is that these discussions and criticisms are now increasingly taking place in blogs, on Twitter, and in Facebook discussion groups, which makes these criticisms more accessible to journalists and the general public (Brumfiel, 2009). People who lack training in the scientific process (e.g., Hallman, 2017) may misperceive legitimate disagreements as something out of the ordinary (see also Pittinsky, 2015). So far, there has been no systematic study to test the impact of different aspects of the replicability crisis (i.e., failures to replicate, criticisms of questionable research practices, and proposed reforms) on the public's trust in science.
Some researchers are concerned that the public will lose trust in scientific fields that have been criticised, such as psychological science. In the extreme case, there is the concern that public criticisms of research practices and issues regarding replicability can fuel anti-science movements (e.g., Ioannidis, 2017; Pickett & Roche, 2018). For example, legitimate calls for increased scientific transparency have been used as the basis for a political attempt to severely restrict research that the US Environmental Protection Agency can use in writing regulations (e.g., Gebellhof, 2018). Others have suggested that public criticisms and attempts to reform research practices may be perceived as science being self-correcting, thereby demonstrating scientists' commitment to improving the way they do science. This could positively impact public trust, or at least help maintain trust in science in the long run (e.g., Rutjens et al., 2017; Srivastava, 2017; Vazire, 2016). Because of the importance of trust in science among the general public (who in democratic societies vote for parties that subsequently determine science policy) and the disagreement about whether the public discussions around reproducibility will reduce or increase public trust in psychological science, we aim to experimentally examine the impact of being informed about psychology's replicability crisis, and attempts to self-correct the way psychological science is practiced.
The purpose of the present study is to provide empirical data to inform discussions that have thus far been largely anecdotal or hypothetical. Although it seems plausible that learning about replication failures will reduce trust in psychological science (especially in past findings), it is unclear whether additionally hearing about publicly expressed criticism about questionable research practices will further reduce trust or be seen as a sign of a healthy field of science and thus increase trust. Finally, a novel question in the proposed study is whether learning about current reforms in psychological science will increase trust in psychological science enough to compensate for the lower trust after hearing about replication failures. This study will allow us to examine whether the current developments in psychological science taken together, replication failures, criticisms, and proposed reforms, can be expected to decrease, increase, or keep stable, the trust in psychological science.

The replicability crisis and questionable research practices
Several fields of scientific inquiry have encountered what has been widely referred to as a replicability (or reproducibility) crisis. In psychology, the Open Science Collaboration (2015) attempted to replicate 100 studies from three major psychology journals. Of those 100, 64 did not produce statistically significant results; that is, they failed to support the hypothesis of the original study. In cancer biology, of 53 landmark studies investigated, 89% (47) could not be confirmed (Begley & Ellis, 2012; see also Davis et al., 2017). In experimental economics, an attempt to replicate 18 findings from two major journals was unable to confirm the original results in 7 (39%) cases (Camerer et al., 2016). And, in experimental philosophy, approximately 30% of 40 original findings failed to replicate (Cova et al., 2018). Other large-scale attempts to replicate important psychological findings have also yielded null effects (e.g., Hagger et al., 2016; O'Donnell et al., 2018; Wagenmakers et al., 2016). The replicability crisis has gained considerable news coverage (e.g., Connor, 2015; Engber, 2017; Feilden, 2017; Yong, 2015) and a sizable Wikipedia entry entitled "Replication crisis".
Questionable research practices (QRPs), whether intentional or not, have been discussed as one possible reason underlying the current replicability crisis by leading to a misrepresentation of the data (Banks et al., 2016; John, Loewenstein, & Prelec, 2012; Pickett & Roche, 2018; Simmons, Nelson, & Simonsohn, 2011; Wicherts, 2011). QRPs consist of data-analytic choices driven by their utility in producing more favourable statistical results, which are not transparently reported in methods sections (such as optional stopping during data collection, or selectively excluding outliers). They are problematic because they inflate Type 1 error rates (incorrectly rejecting the null hypothesis; Simmons et al., 2011) and thus the percentage of false positives in the literature. Many scientists admit to having engaged in QRPs (for psychology, see Fiedler & Schwarz, 2016; John et al., 2012; for management, see Banks et al., 2016; for ecology and evolution, see Fraser, Parker, Nakagawa, Barnett, & Fidler, 2018; for science more generally, see Martinson, Anderson, & De Vries, 2005). Many psychological scientists have started educating researchers about QRPs, as well as pointing out QRPs in the literature, perceiving them as an important underlying cause of the replicability crisis (in addition to other problems, such as publication bias) (e.g., Cumming, 2016; Neuroskeptic, 2015).

Effects of the replicability crisis on trust in psychological science
How might the published failures to replicate, and the public discussion of QRPs, impact the public's trust of psychological science? One perspective suggests that transparency about problems and improvements in scientific practice may maintain the public's trust and confidence in the long term. A compelling argument for this view is that, given QRPs are perceived to be morally unacceptable and deserving of punishment (Pickett & Roche, 2018), scientific criticisms against them may signal to the public that such behaviours are taken seriously by the psychological science community, and that psychological scientists are attempting to address these problems. As Pittinsky (2015) argues, scientists can inspire the public's faith in science by acknowledging the seriousness of scientific misconduct, and being transparent about discussions among scientists for how to improve research practices. Vazire (2017) posits that transparency increases the accountability of researchers, incentivising more reliable scientific practices that produce better quality works, and thus reducing public uncertainty about the quality of research findings which, in turn, increases trust in science. Although her arguments specifically referred to transparency about data and methods, the argument can be extended to publicly expressed criticisms against QRPs, for this too would incentivise more reliable practices. Hence, although awareness of replication failures may negatively affect public perceptions about the quality of past works (i.e., the published literature), perceptions about quality in the future of the scientific field, and thus trust in its future findings, may be increased (or maintained) when there is also awareness about scientific criticisms of QRPs and initiatives aimed at reform. Hendriks, Kienhues, and Bromme (2016) present findings that support this view. 
Participants in their study read what was described as a blog entry by an expert science blogger, followed by a commentary said to be written either by the blogger or by another person criticising the original blog post. They found that self-criticism of the blog post, compared to the external criticism, resulted in the expert science blogger being rated higher on integrity and benevolence (Hendriks et al., 2016). Integrity and benevolence have been shown to represent two dimensions of epistemic trust (Hendriks et al., 2015, 2016; Mayer & Davis, 1999). In another scenario-based study conducted by Fetterman and Sassenberg (2015), scientists tended to overestimate how negatively they would be perceived by other scientists following failed replications of their work. In addition, they found that the scientific reputation of a researcher who admitted to being "wrong about the effect" was higher than that of a researcher who questioned the replication. At the group level, one could argue that because psychological scientists themselves are testing the reliability of past research findings and criticising QRPs, their publicly expressed criticisms can have beneficial effects on the scientific reputation and the perceived benevolence and integrity of, and thus trust in, the collective (i.e., the psychological science community).
In contrast to the preceding view that transparency instils trust in the scientific process, others are concerned that the replicability crisis and public discussion about QRPs will negatively influence public perceptions of psychological science (e.g., Fanelli, 2018). This concern is expressed clearly by Klaus Fiedler in an interview, " . . . I believe that the way [the debate] unfolded over the last decade was counterproductive. It damaged psychology's public image and undermined the self-confidence of our young scientists and students" (Genschow & Crusias, 2018). The enhanced status of scientific knowledge is derived from its procedural norms, including replication and peer review (Gluckman, 2014). When large replication attempts of research in psychological science fail, and publicly expressed criticisms suggest that these failures are a result of QRPs, public perceptions of psychological science may be negatively affected, diminishing the field's status, and leading to less trust in psychological science.
Although it is possible that concerns about the negative impact of failed replications and criticisms of QRPs on public trust may be somewhat overestimated (e.g., Fetterman & Sassenberg, 2015), there is some research that supports these concerns. Pickett and Roche (2018) randomly allocated participants to be presented with either a definition of data falsification (i.e., fraud; n = 415) or selective reporting (i.e., QRPs; n = 406). Participants then judged how morally unacceptable they believed these behaviours to be, and the action that should be taken against scientists engaging in them. Although a greater percentage of participants presented with data fraud said that it was morally unacceptable (96%), a majority of those presented with QRPs also judged these as morally unacceptable (71%). Of the participants responding to QRPs, 73% thought that scientists who engaged in them should be banned from receiving funding, 63% thought they should be fired, and 37% thought QRPs should be a crime. If members of the public consider QRPs to be such a severe moral transgression deserving of punishment, then it is likely that publicly expressed criticisms that connect the replicability crisis with scientists using QRPs will have a negative impact on perceptions of the scientific field that is being criticised. Relatedly, in a representative sample of over 1000 Australians, Critchley (2008) found that the perceived benevolence of scientists (including ethical research methods and honesty about results) was positively related to trust in scientists (r = .41). Together, these findings would suggest that learning about QRPs, and criticisms on their use by psychological scientists, could reduce the public's trust in the field.
The most direct support for this position comes from a recent study conducted by Chopik, Bremner, Defever, and Keller (2018). They used a 1-hour lecture to educate undergraduate students about the replicability crisis in psychology, the potential causes for it (including QRPs), and good research practices such as sharing data and materials and designing studies with high power. In the pre-post within-subjects design, after having taken part in the lecture, students had less trust in "the results of studies done by psychologists".
It seems sensible that learning about the replicability crisis in psychology reduces trust in studies that have been done (i.e., trust in past research). But Chopik et al.'s (2018) study does not inform us about trust in future research done by psychologists nor does it inform us about how the various additive aspects of the replicability crisis (i.e., replication failures, criticisms of QRPs, and proposed reforms) work together to impact trust. Will public criticism of QRPs have a positive or negative effect on trust in future research in psychological science? And to what extent can learning about proposed reforms in psychological science improve trust, even after learning about replication failures?

Overview of present study
In the present study, we experimentally manipulated whether participants were presented with information about (1) replication failures, or (2) replication failures and criticisms of QRPs, or (3) replication failures, criticisms of QRPs, and suggestions for reform, or (4) a control condition with general information about psychology, but no information about any of the issues discussed in the other experimental conditions. We investigated the effects of being exposed to information about replication failures, criticisms of QRPs, and suggestions for reforms on three dependent variables: (1) trust in past research within psychological science, (2) trust in future research within psychological science, and (3) support for future research in psychological science. We compared the four experimental conditions on the three outcome variables. We predicted that knowledge about replication failures would reduce trust, especially in past research, but it might also reduce trust in future research. Learning about criticisms of QRPs might further reduce trust in both past and future research in psychological science. However, an alternative perspective would suggest that self-criticism would reduce trust in past research but improve trust in future research. Finally, we expected knowledge about reforms to improve trust in future research in psychological science, but an important question was whether the increase in trust when hearing about reforms would be enough to counteract the predicted negative effect of hearing about replication failures. Would knowledge about replication failures outweigh knowledge about reforms, or might learning about reforms in psychology maintain or even lead to an improvement in trust in psychological science, despite the failures to replicate past studies?
We were interested in three comparisons:
(1) First, compared to the control group, we expected that people informed about replication failures would have less trust in past and future research in psychological science.
(2) Our second interest was in whether and how learning about criticisms of QRPs in addition to learning about failures to replicate would impact trust in past and future science. It seemed plausible that if learning about QRPs had an effect on past research, it would be negative. However, with respect to future research, we did not have a directional prediction. Trust in future research could decrease (if people believe QRPs would also affect future studies) but trust in future research could also increase (if people see criticisms of QRPs as healthy self-criticism of a field trying to improve).
(3) Finally, we were interested in whether, compared to a control condition, learning about all information about the reproducibility crisis (replication failures, criticisms of QRPs, and proposed reforms) would overall reduce, maintain, or even improve people's trust in future research in psychological science. We had no directional prediction, but we believed that this comparison addressed an important question about the extent to which the discussion as it has thus far unfolded would impact people's trust in the future of psychological science.

Non-preregistered pilot study
In a pilot study, performed in response to the first round of reviews, we aimed to examine three main questions. First, we tested the comprehension checks to estimate what percentage of participants would remember key points in the presented information.
As one reviewer noted, including comprehension checks might increase a demand effect, so if the percentage of participants who pass the comprehension checks is high enough, we can ignore them in the main study to reduce demand effects. Second, we more closely examined a composite measure of trust in the psychological science community (which had too poor reliability to be used in the main study), and, third, we collected initial data (as the pilot study was largely similar in terms of independent and dependent variables as the proposed study). These materials, data, and analysis script are available on the OSF (https://osf.io/sftz2/).

Procedure
Participants were randomly allocated to the control, replication failures, criticisms of QRPs, or the reforms condition (see Materials and Measures section of the preregistered study). After reading the information for their respective condition, participants responded to two multiple-choice comprehension checks (see Supplementary Materials) to ensure they had read and understood the critical pieces of information they were provided with. Next, participants were asked how much they trust past research in psychological science, how much they trust future research in psychological science, and how much they agree that public funding should be used to support future research in psychological science (from 1 = not at all, to 10 = completely; see Materials and Measures section below). In this pilot study we also included a scale to measure trust in the psychological science community from a previous pilot test, minus one item due to poor fit statistics, but the scale showed poor psychometric properties. We do not report the scale here (but the data are available on OSF, https://osf.io/sftz2/, where data are also available from another pilot study designed to develop a measure for trust in the science community). We decided not to use the scale measuring trust in the psychological science community in the final study. All trust measures were presented in random order on a separate page. After responding to the trust measures, participants were asked, "Before taking this survey, how informed were you about a replicability or reproducibility crisis in psychology (from 1 = never heard about it, to 10 = very well-informed)?". 
Finally, before being debriefed and paid, participants provided their age and a single-item measure of political self-identification, "Please indicate how you politically self-identify, from very liberal/left-wing to very conservative/right-wing", on a 7-point scale (the farthest left point was labelled "very liberal/left-wing", the farthest right point was labelled "very conservative/right-wing", and the midpoint was labelled "Moderate/Centre"). This item was included so as to test the construct validity of the trust in the psychological science community scale (i.e., the further right/conservative people politically self-identify, the less trust in the science community they would be expected to have).

Results
Out of the 201 participants in the pilot study, 184 (91.5%) passed both comprehension checks. We deem this an acceptable pass-rate and, to avoid potential demand effects introduced by the comprehension checks, we decided not to use comprehension checks in the final study.
We further explored differences between naïve vs. more informed participants by dichotomising the informed variable so that we could compare those who had never heard about the replicability crisis (i.e., selected 1 on the rating scale, labelled "Never heard about it") with participants who had (i.e., selected 2-10 on the rating scale). From the 201 participants in the study, 60 (29.9%) had never heard about the replicability crisis. There were non-significant differences (that should be interpreted with special caution given the small sample size in the group who answered '1') between those who had never heard about the replicability crisis and those who had, in subjective SES (p = .672), level of education (p = .126), income level (p = .762), political self-identification (p = .380), age (p = .118), trust in past research (p = .789), trust in future research (p = .641), and support for future research (p = .678). Table 1 presents the descriptive statistics. Although we find no reason to assume strong effects, due to the small sample size we cannot rule out small or medium effects.
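The dichotomisation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name and the example ratings are hypothetical and are not the pilot data; only the cut-off (1 versus 2-10) comes from the text.

```python
from collections import Counter


def dichotomise_informed(rating: int) -> str:
    """Map a 1-10 'how informed were you' rating to a two-level factor."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be between 1 and 10, got {rating}")
    # Only the scale endpoint labelled "Never heard about it" counts as naive.
    return "naive" if rating == 1 else "informed"


# Hypothetical ratings for illustration only (not the pilot data).
example_ratings = [1, 1, 4, 10, 2, 1, 7]
group_sizes = Counter(dichotomise_informed(r) for r in example_ratings)
naive_share = group_sizes["naive"] / len(example_ratings)
print(group_sizes, f"{naive_share:.1%}")
```

Treating every rating above 1 as "informed" makes the naïve group as strict as possible, at the cost of pooling barely informed and well-informed participants in the comparison group.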
This pilot study allowed us to collect data very similar to that planned in the preregistered study (albeit with smaller sample sizes). We examined the effect of being informed about the various elements of the replicability crisis on trust and support for research in psychological science. Our focus was on three contrasts we were particularly interested in (see Analysis Plan). The descriptive statistics (Ms and SDs) are presented in Table 2. Almost none of the comparisons were statistically significant with 50 participants in each between-subject condition. We note that these preliminary tests are underpowered for effect sizes we still consider interesting (see Participants section below for power calculations and target sample size for the final study). Nevertheless, the descriptive statistics (and effect size confidence intervals) seem to suggest that although being informed about the various aspects of the replicability crisis might reduce trust in past research, trust in future research may be maintained (e.g., when there are criticisms of research practices or when participants are informed about proposed reforms). We performed the main study to examine whether this predicted pattern of means would emerge when collecting data from a larger sample.
In response to a concern raised by a reviewer that selecting participants who have not heard of the replication crisis might lead to selection effects, because these participants differ from participants who have heard of the replication crisis, we tested whether the manipulations had differential effects on trust for those who had heard about the replication crisis in psychology as compared to those who had not. To do this, we dichotomised participants into those who had never heard about the replicability crisis in psychology (i.e., those who selected option 1 on the item asking participants how informed they are about the replicability crisis) and those who had at least heard about it (i.e., those who selected options 2-10). Entering this dichotomised naivety variable as a factor together with experimental manipulation in an ANOVA, showed statistically nonsignificant interaction effects on trust in past research (p = .167), trust in future research (p = .622), and support for future research (p = .717). Moreover, the main effects of naivety were also not statistically significant, providing no evidence for differences between those who had never heard about the replicability crisis compared to those who had (ps = .776, .632, and .659 for trust in past research, trust in future research, and support for future research, respectively). Given the small sample size, these tests are underpowered. Nevertheless, it can be seen that the pattern of descriptive statistics for participants who were naïve about the replicability crisis, presented in Table 3, is very similar to the pattern of results for the entire sample examined as a whole (i.e., Table 2).

Participants
We were interested in small effects because, whereas in the real world people are likely to be repeatedly presented with stories about the replicability crisis, our manipulation involved a single reading of short statements. A single exposure with a small effect may translate to larger effects with repeated exposure over a longer period of time. We had difficulty in setting a smallest effect size of interest. In our pilot study, the average effect size estimate for decreased trust in past research, compared to the control group, due to being informed about each aspect of the replicability crisis was Cohen's d = 0.37 (0.36 for replication failures, 0.45 for criticisms of research practices, and 0.29 for reforms). We therefore decided to set the smallest effect size of interest to what we believed would be an effect small enough for an initial test of the concerns about the impact of the replicability crisis: Cohen's d = 0.3. Our main contrasts of interest (described in more detail in the Analysis Plan further below) were tested using one- and two-tailed t-tests. We decided on a Type 1 error rate based on Good's (1988) proposal to adjust the alpha level as a function of the sample size (adjusting the p-value to $p\sqrt{N/100}$, or adjusting the alpha level to $0.05/\sqrt{N/100}$), which is intended to prevent Lindley's paradox when collecting larger samples. Using the TOSTER package in R (Lakens, 2017) we calculated that for alpha = .0307, and 90% power to exclude an effect size outside the equivalence range of d = −0.3 and d = 0.3, we require a sample size of 275 per condition (Lakens, Scheel, & Isager, 2018). We therefore aimed to recruit a total of 1100 participants (275 participants in each of 4 conditions) using Prolific Academic, without allowing those who took part in the pilot study to participate in the main study. We pre-screened participants to be fluent in English.
Participant recruitment continued until the number of participants who had completed the survey reached the sample size requirement (N = 1100).
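The sample size reasoning above can be illustrated with a short Python sketch. It is not the TOSTER calculation itself: it uses a normal approximation for a two-sample equivalence test and assumes symmetric bounds, a true effect of zero, and equal group sizes. The function names are hypothetical, and which N enters the alpha adjustment is left as a parameter because the text does not specify it.

```python
from math import ceil, sqrt
from statistics import NormalDist


def good_adjusted_alpha(n: float, base_alpha: float = 0.05) -> float:
    """Alpha level divided by sqrt(N / 100), the sample-size adjustment described above."""
    return base_alpha / sqrt(n / 100)


def tost_n_per_group(alpha: float, power: float, d_bound: float) -> int:
    """Approximate n per group for a two-sample equivalence (TOST) test.

    Assumes symmetric bounds (-d_bound, +d_bound), a true effect of zero,
    and equal group sizes; an exact t-based calculation may differ slightly.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha)            # one-sided critical value per test
    z_power = z.inv_cdf(1 - (1 - power) / 2)  # both one-sided tests must reject
    return ceil(2 * (z_alpha + z_power) ** 2 / d_bound ** 2)


print(good_adjusted_alpha(100))  # N = 100 leaves the alpha level at its base value
print(tost_n_per_group(alpha=0.0307, power=0.90, d_bound=0.3))
```

With alpha = .0307, 90% power, and bounds of d = ±0.3, this approximation returns 275 participants per condition, in line with the figure reported above.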
Our sampling process resulted in 1129 complete responses on our preregistered dependent variables (the three main outcome measures), because some participants completed all dependent variables but not some of the other questions. These participants (528 female, 568 male, 4 other/non-binary, 29 missing values) had a mean age of 30.56 years (range: 18-75 years) and mostly resided in Europe (91.6%). For the exploratory analyses involving variables included later in the survey, sample size varies and includes only those with complete responses on the variables included in the analysis.

Procedure
We randomly allocated (using the SurveyMonkey randomiser) participants to one of four groups. As in the pilot study, in one (control) group, participants read three paragraphs of information about psychological research (see the Materials and Measures section below). In a second group, participants read the same first two paragraphs as the control group, but the last paragraph was replaced with some information about failed replications in psychological science. For the third group of participants, the second and third paragraphs of the control group were respectively replaced with information about replication failures and information about researchers criticising the use of QRPs as a cause of the failed replications. A fourth group of participants read the two paragraphs with information about replication failures and criticisms of QRPs, plus another paragraph in which the criticisms are followed by suggestions and implementation of reforms to improve the reliability of research. Participants were asked to indicate their trust in past research, trust in future research, and support for future research in psychological science. Participants were then asked to explain, in an open-answer format, why they had those levels of trust in past and future research (answers to this question are not reported in this paper). Extending the pilot, they then completed two measures of ambivalence (mixed feelings) about the events described in the vignettes (a subjective measure and an objective measure). Next, participants were asked, "Before taking this survey, how informed were you about a replicability or reproducibility crisis in psychology?" (from 1 = Never heard about it, to 10 = Very well-informed). Finally, participants provided demographic details (age and gender; the same demographic details as those collected in the pilot study were obtained from Prolific's pre-screening criteria), were debriefed, thanked for their time, and paid.
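The additive structure of the four conditions, as read from the description above, can be summarised in a small Python sketch. The paragraph labels and the `allocate` helper are hypothetical shorthand, and the simple independent randomisation shown here is an assumption; the survey platform's randomiser may work differently (for example, by balancing group sizes).

```python
import random

# Paragraph labels are hypothetical shorthand for the texts in Materials and Measures.
CONDITIONS = {
    "control": ["history", "subfields", "methods"],
    "replication_failures": ["history", "subfields", "replication_failures"],
    "criticisms_of_qrps": ["history", "replication_failures", "qrp_criticisms"],
    "reforms": ["replication_failures", "qrp_criticisms", "reforms"],
}


def allocate(participant_ids, seed=None):
    """Assign each participant independently and uniformly to one condition."""
    rng = random.Random(seed)
    names = list(CONDITIONS)
    return {pid: rng.choice(names) for pid in participant_ids}


assignment = allocate(range(8), seed=1)
for pid, condition in assignment.items():
    print(pid, condition, CONDITIONS[condition])
```

Every condition contains three paragraphs, which reflects the aim of keeping the amount of information comparable across groups.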

Materials and measures
The information provided to each group of participants was designed to be comparable in length (160 words) so as to control for how much information each group received regarding psychological science.

Information for the control group
Psychology was originally studied as a philosophical pursuit with insights coming from introspection and observation. During the Enlightenment in Europe, psychology became more popular and shortly afterwards it was studied using experiments. Now, many universities around the world study psychology as a scientific pursuit.
Psychological science includes several different subfields. Cognitive psychologists study mental processes such as perception, attention, reasoning, memory, and learning. Social psychologists study how humans think about and relate to each other, including the influence of others on one's behaviour, beliefs, attitudes, and stereotypes. Personality psychologists study enduring patterns of emotion, thought, and behaviour, in individuals.
Research in psychological science includes many different methodologies, such as controlled experiments in laboratory settings, field experiments, and large surveys, that require the use of statistical analyses. For example, psychological scientists use statistics to draw conclusions about the effects of new behavioural interventions or policy changes. Much of the conclusions drawn from research in psychological science often rely heavily on statistics.

Information for the replication failures group
Psychology was originally studied as a philosophical pursuit with insights coming from introspection and observation. During the Enlightenment in Europe, psychology became more popular and shortly afterwards it was studied using experiments. Now, many universities around the world study psychology as a scientific pursuit.
Psychological science includes several different subfields. Cognitive psychologists study mental processes such as perception, attention, reasoning, memory, and learning. Social psychologists study how humans think about and relate to each other, including the influence of others on one's behaviour, beliefs, attitudes, and stereotypes. Personality psychologists study enduring patterns of emotion, thought, and behaviour, in individuals.
Recently, some psychological scientists tested the reliability of research in their field by repeating the procedures of past studies to see if they could produce the same results. In 2015, one large collaborative project found that from the 100 original studies examined, 65 didn't produce the same results. In other words, researchers were able to successfully replicate 35% of the studies.

Information for the criticisms of QRPs group
Psychology was originally studied as a philosophical pursuit with insights coming from introspection and observation. During the Enlightenment in Europe, psychology became more popular and shortly afterwards it was studied using experiments. Now, many universities around the world study psychology as a scientific pursuit.
Recently, some psychological scientists tested the reliability of research in their field by repeating the procedures of past studies to see if they could produce the same results. In 2015, one large collaborative project found that from the 100 original studies examined, 65 didn't produce the same results. In other words, researchers were able to successfully replicate 35% of the studies.
Some psychological scientists have criticised research practices, stating that a major reason for failures to replicate past studies is that the original findings were based on poor and untransparent research practices. These include things like selective reporting, where many analyses are performed but only favourable ones reported. They argue that such practices produce false findings.

Information for the suggestions for reforms group
Recently, some psychological scientists tested the reliability of research in their field by repeating the procedures of past studies to see if they could produce the same results. In 2015, one large collaborative project found that from the 100 original studies examined, 65 didn't produce the same results. In other words, researchers were able to successfully replicate 35% of the studies.
Some psychological scientists have criticised research practices, stating that a major reason for failures to replicate past studies is that the original findings were based on poor and un-transparent research practices. These include things like selective reporting, where many analyses are performed but only favourable ones reported. They argue that such practices produce false findings.
Because of this, many psychological scientists have started initiatives to increase the reliability of research. They are pushing for greater transparency in research practices and increased use of registered reports where statistical analyses are specified before data collection and articles published regardless of results.
In the real world, news stories may frame the replicability crisis in a negative way, with headlines such as "Study reveals that a lot of psychology research really is just 'psychobabble'" (Connor, 2015), or provide content that is more nuanced and somewhat positive, such as "Housecleaning is a crucial corrective in science, and psychology has led by example." (Carey, 2018). The effects of such framing in media outlets are beyond the scope of this study, the purpose of which was to investigate how merely being informed about the various aspects of the replicability crisis impacts the public's trust.
Outcome measures. Participants responded to the following questions by giving ratings on 10-point Likert-type scales. The scales were labelled: 1 = not at all, to 10 = completely. We measured trust in past research by asking: "How much do you trust past research in psychological science?". We measured trust in future research by asking: "How much do you trust future research in psychological science?". We measured support for future research by asking: "How much do you agree that public funding should be used to support future research in psychological science?". The above three items were presented in randomised order each on a different page.
Additional measures. To examine potential reasons for how people's trust might be impacted by the replicability crisis, we included some additional measures. Participants were asked an open question: "explain why you have these levels of trust in past and future research in psychological science". Following this open question, we measured both subjective and objective ambivalence (Schneider & Schwarz, 2017;Schneider, Veenstra, van Harreveld, Schwarz, & Koole, 2016). First, we measured subjective ambivalence by asking participants to "Think about the events you have just read about in psychological science. To what extent do you have mixed thoughts or feelings about the events?". Second, we measured what is called objective ambivalence using two items, one for positive feelings and one for negative feelings: "Think about the events you just read about in psychological science. When you think about the positive [negative] aspects of the events described, while ignoring the negative [positive] aspects, how positive [negative] is your evaluation of the events in psychological science?".
Both subjective and objective ambivalence items were measured with Likert-type scales (from 1 = not at all, to 9 = very much). To calculate objective ambivalence, we used the formula ((P + N)/2) − |P − N|, where P is the positive rating and N is the negative rating. This measure can range from −3 to 9, with scores below 1 reflecting univalence, and from 1 to 9 reflecting increasing ambivalence (a score of 1 reflects neutrality; see Schneider et al., 2016 for a thorough explanation).
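As an illustration of the objective ambivalence formula above, the following minimal Python sketch computes the score from one participant's positive and negative ratings. The function name and the range check are our own illustrative choices, not part of the original analysis scripts.

```python
def objective_ambivalence(positive: float, negative: float) -> float:
    """Return ((P + N) / 2) - |P - N| for two ratings on the 1-9 scales.

    `positive` (P) and `negative` (N) are the ratings on the positive and
    negative items; the function name is hypothetical.
    """
    for rating in (positive, negative):
        if not 1 <= rating <= 9:
            raise ValueError("ratings must lie between 1 and 9")
    return (positive + negative) / 2 - abs(positive - negative)


# Strongly one-sided evaluations give the minimum; equally strong
# positive and negative evaluations give the maximum.
lowest = objective_ambivalence(9, 1)    # (10 / 2) - 8
highest = objective_ambivalence(9, 9)   # (18 / 2) - 0
neutral = objective_ambivalence(1, 1)   # (2 / 2) - 0
```

The three example calls reproduce the bounds described in the text: the most one-sided pair of ratings yields −3, two maximal ratings yield 9, and two minimal ratings yield the neutral score of 1.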

Preregistered analysis plan
We were particularly interested in planned contrasts that tested our main hypotheses, for which we planned to use Welch's t-tests against 0 (which do not assume homogeneity of variance), and equivalence tests with an equivalence range from d = −0.3 to d = 0.3. Welch's t-test is quite robust to violations of normality as long as the sample size is sufficiently large (Delacre, Lakens, & Leys, 2017). We planned that if a visual inspection suggested strongly asymmetric distributions, we would also perform a Mann-Whitney U test and rely on the more conservative of the two p-values. A visual inspection of the distributions of the three main outcome variables did not suggest strongly asymmetric distributions.
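To make the planned tests concrete, the sketch below computes Welch's t statistic with its Welch-Satterthwaite degrees of freedom, and the two one-sided test statistics of an equivalence test with bounds of d = −0.3 and d = 0.3. It is a minimal illustration rather than the analysis code used in the study. Two points are assumptions on our part: the function names, and the conversion of the standardised bounds to raw units using the square root of the average of the two group variances. P-values are not computed here; they would be obtained from a t distribution with the returned degrees of freedom using a statistics package.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n1, n2 = len(x), len(y)
    q1, q2 = variance(x) / n1, variance(y) / n2  # squared standard errors
    t = (mean(x) - mean(y)) / sqrt(q1 + q2)
    df = (q1 + q2) ** 2 / (q1 ** 2 / (n1 - 1) + q2 ** 2 / (n2 - 1))
    return t, df


def tost_statistics(x, y, d_low=-0.3, d_high=0.3):
    """t statistics for the two one-sided tests of an equivalence test.

    Assumption: the d bounds are converted to raw units with the
    standardiser sqrt((var_x + var_y) / 2).
    """
    v1, v2 = variance(x), variance(y)
    sd = sqrt((v1 + v2) / 2)
    se = sqrt(v1 / len(x) + v2 / len(y))
    diff = mean(x) - mean(y)
    t_lower = (diff - d_low * sd) / se    # tests diff > lower bound
    t_upper = (diff - d_high * sd) / se   # tests diff < upper bound
    return t_lower, t_upper


# Hypothetical ratings, for illustration only.
control = [7, 6, 8, 5, 7, 6, 9, 7]
informed = [6, 5, 7, 5, 6, 4, 8, 6]
t, df = welch_t(control, informed)
t_lower, t_upper = tost_statistics(control, informed)
```

Equivalence is concluded only when both one-sided tests are significant, that is, when `t_lower` is significantly above zero and `t_upper` is significantly below zero.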
We believe that trust in past research and trust in future research are related to different questions. Being informed about each of the different elements of the replicability crisis may affect trust in past research similarly (particularly given that all elements involve failures to replicate past studies). On the other hand, how they affect trust in future research may differ from each other, given that each element may signal different things about what future research may involve (e.g., reforms to research practices). Therefore, in correcting for multiple comparisons, we considered all tests that involve trust in past research as the dependent variable to be one family of tests, and all tests that involve trust in future research to be another family of tests. We used the Holm-Bonferroni correction to adjust our alpha levels. There are two comparisons for trust in past research (i.e., control vs. replication failures, and control vs. criticisms of QRPs): The lower of the two p-values from these comparisons was tested against an alpha of .0307/2 (.0154), and the other against an alpha of .0307. All three comparisons with the control group involve trust in future research: The lowest of the three p-values was tested against an alpha of .0307/3 (.0102), the second lowest against an alpha of .0307/2 (.0154), and the last against an alpha of .0307.
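The step-down logic of the Holm-Bonferroni correction described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are ours, the p-values in the example are hypothetical rather than results from the study, and we use the common convention of rejecting when a p-value is less than or equal to its threshold and stopping at the first non-rejection.

```python
def holm_thresholds(m, alpha=0.0307):
    """Alpha levels for the smallest to the largest of m p-values."""
    return [alpha / (m - rank) for rank in range(m)]


def holm_bonferroni(p_values, alpha=0.0307):
    """Return one reject/retain decision per p-value, in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices, smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # all larger p-values are retained as well
    return reject


# Family of three comparisons: thresholds .0307/3, .0307/2, and .0307.
future_thresholds = holm_thresholds(3)

# Hypothetical p-values for one family of three tests.
decisions = holm_bonferroni([0.001, 0.02, 0.5])
```

With these hypothetical values, only the first test is rejected: .001 falls below .0307/3, but .02 exceeds .0307/2, so the procedure stops and the remaining tests are retained.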
We included a question asking participants about their support for future research in psychological science specifically. We expected the same pattern of results as for trust in future research (i.e., compared to control, learning about (i) replication failures will reduce support, (ii) criticisms of QRPs may either reduce or increase support, and (iii) all aspects of the replication crisis may either reduce or increase support). Although this variable may be taken as belonging to the same family of tests as trust for future research, we examined support for future research as something more exploratory and treated it separately. Hence, we corrected the alpha for multiple comparisons, treating support for future research as a separate family of tests. Support for future research was thus involved in all three comparisons and had the same corrected alpha levels as trust in future research.

Preregistered analyses
Descriptive statistics (M, SD) of the three preregistered outcome measures for each of the four groups are presented in Table 4, and the data are visualised in Figure 1.

Replication failures and trust
First, we tested whether, compared to the control group, the group informed only about replication failures had less trust in past and future research in psychological science, 008. Therefore, we do not find support for the idea that trust in future research is reduced by learning about replication failures and we can conclude that the effect is not larger than 0.3. Hence, in line with previous research (e.g., Chopik et al., 2018;Wingen, Berkessel, & Englich, 2019), and as expected, we found that learning about replication failures reduces trust in past research. However, extending on the previous works, we found no evidence that trust in future research is affected. This suggests that attempting to replicate past findings may act as a signal that researchers want to get things right and that psychological science is self-correcting. There was a non-significant reduction in trust for future research compared to the control group, but the equivalence test was inconclusive. Further research is needed to disambiguate these findings. It is possible that the replication attempts signal that psychological science is self-correcting, and criticisms of QRPs indicate that researchers take these issues seriously, but it is also possible that learning about QRPs undermines these signals.

Reforms and trust
Next, we examined whether and how trust in future research would be affected if the general public was informed about the three core aspects of the replicability crisis, which include replication failures, criticisms of QRPs, and suggested reforms. This comparison informs us about whether receiving brief but complete information regarding current discussions in the field of psychology will affect trust in future research from psychological science. For this, we used a two-sided Welch's t-test and found that compared to the suggested reforms) reduces trust in future research from psychological science. This result is unexpected, given that learning about replication failures did not significantly affect trust in future research, nor did learning also about criticisms of QRPs.
Why would additionally learning about reforms reduce trust? One explanation might be that people, upon learning about new reforms such as increased transparency in research practices, are surprised to find out that transparency is not already part of the process, resulting in a negative reaction. An alternative explanation is a methodological confound. The three other conditions began with the paragraph describing the history of how psychology has been studied, whereas the reforms condition started with the paragraph on replication failures (i.e., immediately started on a negative note). This may have caused people to react more negatively to the reforms condition. However, results of exploratory robustness checks for this finding (see "Sensitivity Analyses" in supplementary materials) showed that there were non-significant differences in trust in future research between the control and reforms groups, unless we included people who indicated that they were somewhat well-informed (i.e., a rating of 8 or higher on that item) in the analyses. As such, we would warn against over-interpreting this unexpected result. Yet another explanation may be that people do not believe that the suggested reforms are extreme enough, or that they will have a significant impact soon enough, to restore their trust in future research. Further research is needed to test these alternative explanations.

Support for future research
We also pre-registered our intention to examine support for future research in psychological science. 3) for all three comparisons to the control group were significant: replication failures, t (525.91) = 3.01, p = .001; criticisms of QRPs, t (555.83) = 3.35, p < .001; and reforms, t (535.78) = 3.45, p < .001. Therefore, we can conclude that for each of these comparisons the effect is not larger than 0.3. People thus support future research in psychological science regardless of whether they have heard about any aspects of the replicability crisis. One explanation may be that people understand that psychological science is a young science and is thus likely to make mistakes along the way. Moreover, the replicability crisis, and each of the core components that we have presented (replication failures, criticisms of QRPs, and reforms), may act as a signal to the public showing that we care about getting it right, that we will identify and correct mistakes. People's views about the process of science may be analogous to how researchers view science, in so far as it is self-correcting and that it advances through periods of crises (e.g., Kuhn, 1970;Vazire, 2018). Thus, wanting to further the advancement of knowledge about humans, people desire for psychology to continue to be supported by public funding, even if they may trust its future research outputs a little less, because in the long-run the research may become more trustworthy. Of course, these explanations are speculative and should be examined by future research.

Exploratory analyses
In exploratory analyses, we examined the measures of negative and positive feelings (designed to be used to calculate an objective measure of ambivalence) people had in response to the information they were given. A one-way ANOVA revealed no statistically significant differences between groups on positive feelings, F (3, 1094) = 2.51, p = .058 (M control = 6.67, SD control = 1.20; M rep = 6.54, SD rep = 1.39; M qrp = 6.38, SD qrp = 1.38; M reform = 6.44, SD reform = 1.23). However, there were statistically significant differences between groups on negative feelings, F (3, 1096) = 4.84, p = .002. Bonferroni-corrected post hoc comparisons (corrected alpha .0083) showed that the reforms group (M = 5.01, SD = 1.83) had significantly more negative feelings than the control group (M = 4.49, SD = 1.69; p < .001, Cohen's d = 0.29, CI 95% [0.12; 0.46]), and the replications group (M = 4.60, SD = 1.78, p = .009, Cohen's d = 0.22, CI 95% [0.06; 0.39]), but not the QRPs group (M = 4.87, SD = 1.77, p = .348, Cohen's d = 0.08, CI 95% [−0.08; 0.24]). The other group differences were not statistically significant. Therefore, people in the reforms group had more negative feelings about the events they read about than people in the control group and the replication failures group. This supports our speculation that the unexpected reduction in trust in future research that resulted from reading about reforms may have been due to a methodological confound (i.e., beginning on a negative note). But this finding is also in line with the alternative explanation that people may have reacted negatively to learning about the reforms because they may have been previously unaware that the suggested reforms (e.g., transparency) were not already part of the process. Moreover, this analysis was exploratory and our conclusions are thus speculative. Future research can examine these speculative explanations.

Conclusion
Psychology's replication crisis has sparked many debates about whether public discussions about replication failures and questionable research practices will cause people to lose trust in psychological science. We examined whether informing people about three major aspects of the replication crisis (i.e., replication failures, criticisms of questionable research practices, and reforms) affects how much people trust psychological research, distinguishing trust in past research from trust in future research. Our results, using a Prolific Academic sample, suggest that being informed about replication failures and criticisms of questionable research practices may reduce trust in past research but not trust in future research. The results were a little less clear regarding what happens when people are also informed about reforms, such as increased transparency, which caused people to trust future research less. Nevertheless, people tended to indicate that future research in psychological science should be supported by public funding. We hope that these results will inform the debates about the impact of psychology's replication crisis and that future research will examine these findings in more representative samples, testing also the alternative explanations of the findings. Moreover, future research should examine the impact of these events on different samples of people (e.g., science funders may respond more favourably to reforms; students may respond negatively to the crisis and be less attracted to study past research), and the impact of different variations of presenting the information.

Note 1. Cohen's d effect sizes were calculated using the "effsize" package in R (Torchiano, 2018).