Strategies towards Evaluation beyond Scientific Impact. Pathways not only for Agricultural Research
Abstract:
Various research fields, like organic agricultural research, are dedicated to solving real-world problems and contributing to sustainable development. Therefore, systems research and the application of interdisciplinary and transdisciplinary approaches are increasingly endorsed. However, research performance depends not only on self-conception, but also on framework conditions of the scientific system, which are not always of benefit to such research fields. Recently, science and its framework conditions have been under increasing scrutiny as regards their ability to serve societal benefit. This provides opportunities for (organic) agricultural research to engage in the development of a research system that will serve its needs. This article focuses on possible strategies for facilitating a balanced research evaluation that recognises scientific quality as well as societal relevance and applicability. These strategies are (a) to strengthen the general support for evaluation beyond scientific impact, and (b) to provide accessible data for such evaluations. Synergies of interest are found between open access movements and research communities focusing on global challenges and sustainability. As both are committed to increasing the societal benefit of science, they may support evaluation criteria such as knowledge production and dissemination tailored to societal needs, and the use of open access. Additional synergies exist between all those who scrutinise current research evaluation systems for their ability to serve scientific quality, which is also a precondition for societal benefit. Here, digital communication technologies provide opportunities to increase effectiveness, transparency, fairness and plurality in the dissemination of scientific results, quality assurance and reputation. Furthermore, funders may support transdisciplinary approaches and open access and improve data availability for evaluation beyond scientific impact. 
If they begin to use current research information systems that include societal impact data while reducing the requirements for narrative reports, documentation burdens on researchers may be relieved, with the funders themselves acting as data providers for researchers, institutions and tailored dissemination beyond academia.
1. Introduction
A crucial aim of agricultural research is to address sustainable development. Global challenges like climate change [1] or the degradation of ecosystem services have fundamental negative impacts on human health and well-being [2]. Agriculture both drives and is affected by these developments ([2] p. 98), [3]. Such challenges require immediate and adequate action on the part of the whole of society, but also the contribution of relevant knowledge through research ([3] p. 3; [4] p. 322). However, whether research is able to make that contribution depends primarily on the conditions and incentives within the scientific system.
In this article, the focus will be on research evaluation, which can be an important driver for developing science in the direction of scientifically robust, societally relevant and applicable knowledge production. Currently, scientific quality assurance is mainly performed through peer review of papers and project proposals, while scientific impact is evaluated based on publication output in peer-reviewed journals and citation-based performance indicators (detailed in Section 2.3). The citations a publication receives are a measure of its acknowledgement by the researcher's peers. Citations are counted for, and within, peer-reviewed journals that are indexed in citation databases. Furthermore, a researcher's publication output and citation rates can be combined in a single index, e.g. the h-index [5]. Citations are also used as a measure of the recognition of journals, where the citations received by a journal's articles are counted, e.g. the Journal Impact Factor (IF) used by Thomson Reuters [6]. Accordingly, scientific impact is associated with high publication output in high-impact journals and high citation rates in other highly ranked journals. These measures assess, at best, the impact of research on science itself. However, they neither assess societal impact nor serve as proxies for it [7]. As a result, research that also targets audiences outside academia may not be adequately appreciated in research evaluation. The term societal impact is used here to sum up all the practical, social, environmental, economic and other 'real-world' impacts research may have for its target groups and society as a whole.
To overcome shortcomings in current research evaluation practices, several alternative evaluation concepts that take societal impacts into account have been developed over the past few years (see Section 3.2). However, such an evaluation of societal impact faces some inherent challenges, including time and attribution gaps. First, the 'time gap' describes the problem that impact, if it occurs, usually appears only some time after the research has been completed. Secondly, the 'attribution gap' means that impacts are not easily attributed to a particular research activity such as a project or publication. For example, the adoption of a particular agricultural innovation may be the result of several research activities combined with policy changes and other influences. Accordingly, the state of the art of societal impact assessment focuses on the contribution of research within complex innovation systems, instead of attributing impacts linearly in terms of cause and effect [8]. Furthermore, proxies are often employed instead of direct measures of impact. One example is the concept of 'productive interactions', defined as direct, indirect or financial interactions with stakeholders that support the use of research results and make an impact likely [9].
With bibliometric data it is possible to analyse interdisciplinary publications via references from and citations in different fields [10], as well as interactions between basic and applied research. By contrast, the assessment of societal impact (or corresponding proxies) cannot be built on bibliometric analysis, and in most cases no other sources of easy-to-use data are available either. Thus the effort involved in collecting data, for example through documentary analysis or interviews, inhibits the frequent use of such evaluation approaches.
Starting from these observations, the aim of this paper is to discuss two possible strategies to facilitate research evaluation that is more balanced, both with regard to scientific quality and impact, and to societal relevance and applicability. The first strategy is to strengthen general support for such evaluation beyond scientific impact; the second is to reduce the effort of societal impact evaluations by improving data availability.
Section 2 below introduces the relevant movements and focuses on shared interests as a basis for broader support of evaluation beyond scientific impact. Section 3 then provides concrete measures for such support, including possibilities for improving data availability for evaluation beyond scientific impact. In each section the paper shows how agricultural research that is oriented towards sustainability and real-world impact, with a special focus on organic agricultural research, could be involved in these developments in order to create good conditions for its fields of research. We will conclude with an overview of the actions that may be undertaken jointly by various actors.
2. Multiple Voices Call for Changes in Knowledge Production and Research Evaluation
Various societal groups are demanding changes in knowledge production and research evaluation, for example researchers and funding agencies engaged in sustainability, global challenges and transdisciplinary approaches, the open access movements, and researchers who scrutinise current research evaluation systems for their ability to serve scientific quality.
Several international assessments synthesise scientific and non-scientific knowledge via multiple-stakeholder processes involving science, governments, NGOs, international organisations and the private sector, for example the Millennium Ecosystem Assessment (MA) [2], the International Assessment of Agricultural Knowledge, Science and Technology for Development (IAASTD) [3] and the World Health Summit ([11] pp. 86‒87). These assessments, and some scientific groups that give policy advice, such as the WBGU (German Advisory Council on Global Change) [4], point out that there is considerable pressure on society to tackle pressing challenges adequately, which in turn requires knowledge to be produced, accessed and used in ways that assist such adequate action and are conducive to sustainable development.
However, the transfer of existing knowledge and technologies faces several challenges. On the one hand, the balance of power and conflicting interests impede the use of research evidence ([2] p. 92). The reduction in greenhouse gas emissions, for example, is still insufficient, although the IPCC has been transferring the state of the art regarding climate change to politics for 20 years [1]. On the other hand, the need to increase the accessibility, clarity and relevance of research evidence for politics has been discussed [12]. Furthermore, concepts for the transfer of knowledge and technology should reflect on possible risks. Instead of the superiority of external knowledge and novel technologies merely being assumed, these should be tested beforehand under actual conditions of use ([3] p. 72) or evaluated in sustainability assessments [13].
The challenges in knowledge transfer also lead to a demand for changes in knowledge production in order to increase the applicability and sustainable benefits of knowledge. The reasons for such demands are, firstly, that technological development is fast and may have deep, in some cases irreversible, impacts on our ecological, economic or social environment ([14] pp. 87‒93). Secondly, post-modern societies consist of complex subsystems that function according to their own inherent rules and often fail to deal with impacts that occur in more than one of them at the same time ([14] pp. 61‒63, 87‒93). Thus, knowledge production also needs to cut across specialised areas and societal subsystems ([15] p. 544; [4] p. 322) and should support transformative processes ([4] p. 322), [11]. Thirdly, true participation of stakeholders in research processes is required to support practical applicability, ownership of solutions and sustainable impact of knowledge ([2] p. 98; [3] pp. 72‒73; [4] p. 322). Accordingly, recommendations cover enhanced knowledge exchange among disciplines, between basic and applied research ([4] p. 322) and between science and politics [12], ([16] p. 9), as well as the involvement of stakeholders, including the integration of traditional and local knowledge ([2] p. 98; [3] pp. 72‒73; [4] p. 322). Such transdisciplinary processes may also be supported by involving 'knowledge brokers' as intermediaries to facilitate knowledge exchange [12], ([17] p. 17). Additionally, joint agenda setting involving science, politics, the economy and in particular civil society organisations is recommended for research on sustainability ([4] p. 322) and agriculture ([17] p. 17) and is, in some cases, already practised [18], [19], [20]. This corresponds to the aim of civil society organisations to strengthen their influence on research policy (see, for example, [21]).
The recommendations specified in this section are well summarised by the terms co-design, co-production, co-delivery and co-interpretation used by the project VisionRD4SD [22]. These recommendations show that concepts for inter- and transdisciplinary research (e.g. [23], [24], [25], [26]) and 'systems of innovation' approaches, which understand innovation as a set of complex processes involving multiple actors beyond science (e.g. [27]), are now well accepted in policy advice. Likewise, several research funders have started to support sustainability and transdisciplinarity explicitly in research programming ([14] pp. 202‒214), [28], [29].
Despite the promising developments mentioned above, current incentive systems are considered inappropriate for encouraging researchers to focus their research on sustainable development.
Reputation-building processes based on publications in high-ranking scientific journals and third-party funding are often governed by disciplinary perceptions and fail to acknowledge interdisciplinary and systemic approaches ([4] p. 351). In peer review, interdisciplinary research usually has to meet the standards of several disciplines, which adversely affects publication success [10], ([15] p. 547) and the evaluation of multidisciplinary institutions [30]. Audits based on bibliometric performance indicators [15] and, in particular, the use of journal rankings [10] have been shown to be biased against inter- and multidisciplinary research.
Some authors discuss consequences such as poorer career prospects, a reorientation of research away from complex social questions, reduced cognitive diversity within a given discipline or the entire science system [10], and a growing relevance gap between knowledge producers and knowledge users [15]. Similarly, Schneidewind et al. highlight the diversity of the sciences in objectives and theories as a basis for societal discussion processes ([14] pp. 30‒33) and good scientific policy advice ([14] p. 63).
Thus, researchers, institutions and funding agencies that move towards joint knowledge production for sustainable development may often find themselves at odds with the current incentives within scientific reputation systems. This indicates that it is necessary to improve current evaluation practices in general and to apply evaluation criteria beyond scientific impact.
Broader support for changes in knowledge production and research evaluation provides multifarious opportunities for agricultural research. As organic and sustainable farming addresses and works within the complexity of ecological systems, and farmers' knowledge and practices are key to building resilient agricultural production systems, the approaches highlighted in Section 2.1.1 have, since their early days, been advocated in agroecology [31] and organic agricultural research ([19] pp. 15‒16), [32], [33]. Agricultural researchers are often already in contact with actors along the whole agricultural value chain, and such approaches are reflected in diverse concepts of transdisciplinarity (e.g. [34], [35], [36]) and systems of innovation (e.g. [37]). Researchers' experiences, and their awareness of the challenges posed by such approaches (e.g. [19] p. 61; [38]), can promote the adequate advancement of these approaches through mutual learning with other research communities. Furthermore, the competence of (organic) agricultural research in developing applicable solutions of substantial value for some pressing social and ecological challenges may become more visible.
Research evaluation that goes beyond conventional performance indicators and involves stakeholders is seen as necessary for agricultural research too ([3] pp. 72‒73; [17] pp. 81‒84; [19] p. 56). Such research evaluation may facilitate the application of transdisciplinary and related research approaches without disadvantages for researchers' reputations. The need for such incentive effects is supported by various statements, e.g. "European agricultural research is currently not delivering the full complement of knowledge needed by the agricultural sector and in rural communities" ([19] p. 57). Similarly, the evaluation of an organic agricultural research programme in Sweden resulted in the verdict 'excellent' from scientific peers, while agricultural advisors indicated too little relevance to pressing problems [39]. The DAFA position paper "Assessment of applied research" considers it necessary to build a consensus about possible indicators, commit to their rigorous application and improve the documentation of impact on practice [40]. Thus, (organic) agricultural research may use its commonalities with sustainability research in order to jointly advance interdisciplinary and transdisciplinary research approaches and to advocate their adequate support in funding and appreciation in research evaluation.
Open access movements also aim to increase the benefit of research results for science and society. More than ten years ago, the Berlin Declaration called for open access to original research results, raw data, metadata, source materials, digital representations of pictorial and graphical materials and scholarly multimedia [41]. Arguments in favour of open access include, for example, a) that publicly funded knowledge should be regarded as public property, b) that it enhances the transfer, visibility and benefit of knowledge, which is now easily possible via digital technologies and reasonable because of the increased scientific literacy of the public, and c) that it supports participation in democratic societies [41], [42]. Furthermore, the open access movements provide concepts for greater collaboration and interaction in the creation of research results, for pluralisation and transparency in the evaluation of publications, and for the full use of technological developments in data processing (see Section 3.1).
However, the inadequate exchange, use, relevance and ownership of scientific knowledge in politics, practice and society indicate that open access alone is not sufficient to make knowledge beneficial. Thus, serving societal benefit requires, on the one hand, co-design, co-production, co-interpretation and co-delivery and, on the other, the dissemination of openly accessible research outputs tailored to target groups within and beyond science. Such a comprehensive view of the benefits of research for society increases the credibility of the arguments and supports the view that the corresponding changes in evaluation criteria can be promoted jointly by open access movements and research concerned with sustainable development. In our view, (organic) agricultural research is well placed to become a proficient actor in combining the tasks of these two groups. The (organic) agricultural research community is experienced in knowledge transfer and in inter- and transdisciplinary approaches within the diverse agricultural sector, and is aware of 'open access issues', for example the interrelations between agriculture and public goods ([3] pp. 24, 30, 73).
In general, evaluation procedures that support scientific quality are required for both basic and applied research as foundations for evidence-based decisions. However, as detailed below, current scientific impact evaluation procedures have been shown to have potentially negative consequences for scientific quality. Knowledge of these consequences and of possibilities for improvement is helpful for strengthening scientific quality, increasing awareness of the general effects of evaluation processes, and generating some 'open space' to introduce criteria related to societal impact.
Several criteria are used by the scientific community to assess scientific quality. The most common are the novelty and originality of the approach, the rigour of the methodology, the reliability, validity and falsifiability of results and the logic of the arguments presented in their interpretation. Peer review processes are broadly perceived as a functioning form of self-regulation through which the scientific community ensures scientific quality in publications and third-party funding. Correspondingly, reviewers trust the fairness and legitimacy of their own review decisions [43].
Nevertheless, peer review processes also reflect hierarchy and power within science as a social system. Editors and peers appear as 'gatekeepers', who not only maintain quality but also uphold existing paradigms and decide which of the many high-quality research papers submitted will be allowed to enter the limited space available in the journal concerned [44], [45]. Evaluative processes are found to involve not only expertise, but also interactions and emotions of peers [46] in ([43] p. 210). Instead of erroneously assuming that a "set of objective criteria is applied consistently by various reviewers", it is necessary to focus on what factors promote fair peer review processes ([43] p. 210).
Undesired decision processes such as strategic voting may occur on peer review panels; it has been suggested that fairness is improved if peers rate rather than rank proposals and advise funders instead of deciding on funding themselves [43]. Furthermore, in single-blind reviews, knowledge of the author's identity, gender and institutional affiliation may influence peer review [43], [47], [48], [49], [50]. Double-blind and triple-blind reviews, the latter including editor-blindness, partly reduce bias [45], but advantages for native speakers, preferences for the familiar and insufficient reliability of reviewer recommendations remain ([43] p. 210), [48], [49], [50]. For example, agreement between peers with and without experience in organic agricultural research has been found to be poor with regard to their assessment of scientific quality in organic farming research proposals [51]. In some cases peer review fails to identify fraud, statistical flaws, plagiarism or duplicate publication [47], [48], [49], [50]. Recently, trials involving the submission of fake papers have revealed alarmingly high acceptance rates, both in high-ranked subscription journals [52] and in open access journals [53]. The latter study includes some publishers that were already on Beall's list of 'predatory publishers', which identifies low-quality open access publishers [54], [55].
Accordingly, further possibilities for improving peer review processes are being discussed. They focus on increasing efficacy and transparency in research dissemination and quality assurance via the full use of technological developments in connection with open access (see Section 3.1).
Bibliometric indicators (Table 1) are also the results of socially embedded processes because, firstly, publication in a certain journal reflects the decisions of reviewers and editors, and secondly, citation-based performance indicators aggregate the decisions of many scientists as to whether or not to cite. In general, the publication of research evidence is influenced by researcher bias (the observer-expectancy effect), which results in a higher likelihood of false positive findings and in publication bias, meaning that "surprising and novel effects are more likely to be published than studies showing no effect" ([56] p. 3). Accordingly, "the strength of evidence for a particular finding often declines over time", which is known as the decline effect ([56] p. 3). Moreover, non-significant results often remain unpublished. This phenomenon, known as the file-drawer effect, distorts the perception of evidence and reduces research reliability and efficacy [57].
The fact that peer decisions are often influenced by metrics also has to be taken into account: Merton describes the cumulative processes of citation rates as the Matthew effect, which follows the principle that "success breeds success" and results in high citation counts being overestimated and low ones underestimated [58]. Such dynamics are reinforced by the increasing scarcity of time and a growing need to filter large amounts of accessible information [59]. Evidence of the Matthew effect, also called accumulative advantage, is frequently detected in science [60], and scientists consider it the major bias in proposal evaluation ([48] pp. 38‒39).
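The "success breeds success" dynamic can be illustrated with a toy preferential-attachment model. The sketch below is not taken from [58] or [60]; it assumes, purely for illustration, that each new citation goes to an existing paper with a probability proportional to that paper's current citation count plus one, while paper quality plays no role at all. The function names and parameters are hypothetical. Comparing the outcome with a uniform baseline shows how small early leads can compound into a concentrated distribution.

```python
import random


def simulate_cumulative_advantage(n_papers, n_citations, seed=0):
    """Toy 'success breeds success' model (hypothetical illustration).

    Each new citation is assigned to paper i with probability proportional
    to (citations already received by i + 1), so papers that are already
    cited are more likely to be cited again.
    """
    rng = random.Random(seed)
    counts = [0] * n_papers
    papers = range(n_papers)
    for _ in range(n_citations):
        weights = [c + 1 for c in counts]
        chosen = rng.choices(papers, weights=weights, k=1)[0]
        counts[chosen] += 1
    return counts


def simulate_uniform(n_papers, n_citations, seed=0):
    """Baseline: every citation is equally likely to go to any paper."""
    rng = random.Random(seed)
    counts = [0] * n_papers
    for _ in range(n_citations):
        counts[rng.randrange(n_papers)] += 1
    return counts


def top_share(counts, fraction):
    """Share of all citations held by the most-cited `fraction` of papers."""
    ranked = sorted(counts, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


if __name__ == "__main__":
    advantage = simulate_cumulative_advantage(200, 2000, seed=42)
    uniform = simulate_uniform(200, 2000, seed=42)
    print(f"Top 10% share, cumulative advantage: {top_share(advantage, 0.1):.2f}")
    print(f"Top 10% share, uniform baseline:     {top_share(uniform, 0.1):.2f}")
```

With these settings one would expect the first share to be clearly larger than the second, although the exact values depend on the seed. Because the model contains no notion of quality, any concentration it produces stems from the feedback loop alone.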
A further interaction occurs between metrics and strategic behaviour: as person-related indicators of productivity (publication output) and impact (citation-based indicators) influence funding or career options [61], dividing results into the 'least publishable unit' [62], increasing the number of authors, or citing 'hot papers' are strategies for boosting scientists' performance indicators [45].
Furthermore, indices may hide information. The popular h-index combines publication output and citation rates in a single number. Because it limits the influence of both very highly cited and rarely or never cited publications, researchers with quite different productivity and citation patterns may obtain the same h-index. This has been criticised, and the recommendation is to use several complementary indicators to measure scientific performance, in particular separate ones for productivity and impact [63].
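To make this concrete, the following minimal sketch computes the h-index using the standard definition (the largest h such that h publications have received at least h citations each) for two publication records. The records are hypothetical and invented for illustration; they are not taken from [63]. Reporting productivity (number of papers) and impact (total citations) alongside the index, as recommended above, reveals the differences that the single number hides.

```python
def h_index(citations):
    """Return the largest h such that h publications have >= h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # counts only decrease from here, so no larger h is possible
    return h


# Hypothetical citation records (one entry per publication).
researcher_a = [5, 5, 5, 5, 5]                      # few papers, evenly cited
researcher_b = [300, 120, 40, 9, 5, 1, 0, 0, 0, 0]  # more papers, a few highly cited

for name, record in (("A", researcher_a), ("B", researcher_b)):
    print(f"{name}: papers={len(record)}, citations={sum(record)}, h-index={h_index(record)}")
# A: papers=5, citations=25, h-index=5
# B: papers=10, citations=475, h-index=5
```

Both records yield h = 5, even though B has twice as many papers and almost twenty times as many citations.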
The relevance and use of journal-related metrics are also subjects of intense debate. A review of several empirical studies on the significance of the Journal Impact Factor (IF) concluded that "the literature contains evidence for associations between journal rank and measures of scientific impact (e.g. citations, importance and unread articles), but also contains at least equally strong, consistent effects of journal rank predicting scientific unreliability (e.g. retractions, effect size, sample size, replicability, fraud/misconduct, and methodology)" ([56] p. 7). For example, a correlation was detected between the decline effect and the IF: initial findings with a strong effect are more likely to be published in journals with a high IF, followed by replication studies with a weaker effect, which are more likely to be published in lower-ranked journals [56].
Moreover, the IF and other journal-based metrics are increasingly considered inappropriate for comparing the scientific output of individuals and institutions. This is reflected in the San Francisco Declaration on Research Assessment (DORA), currently signed by nearly 500 notable organisations and 11,000 individuals [64]. DORA substantiates this position with findings showing that a) citation distributions within journals are highly skewed; b) the properties of the IF are field-specific: it is a composite of multiple, highly diverse article types, including primary research papers and reviews; c) IFs can be manipulated (or 'gamed') by editorial policy; and d) the data used to calculate the IF are neither transparent nor openly available to the public [65]. Gaming of the IF is possible, for example, by increasing the proportion of editorials and news-and-views articles, which are cited in other journals although they do not count as citable items in the calculation of the IF [66]. Thus, journal-based metrics are not only found to be unreliable indicators of research quality; the pressure to publish in high-ranked journals may also compromise scientific quality. Furthermore, this pressure "slows down the dissemination of science (...) by iterations of submissions and rejections cascading down the hierarchy of journal rank" ([56] p. 5), which also greatly increases the burden on reviewers, authors and editors [67].
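DORA's first point, that citation distributions within journals are highly skewed, can be illustrated with simple arithmetic. An IF-style ratio is roughly a mean number of citations per citable item, and a mean is pulled upwards by a few highly cited papers. The citation counts below describe a hypothetical journal volume invented for illustration; they are not data from [65].

```python
import statistics

# Hypothetical citation counts for the ten citable items of one journal volume.
citations = [120, 15, 6, 4, 3, 2, 1, 1, 0, 0]

mean = statistics.mean(citations)      # roughly what an IF-style ratio reports
median = statistics.median(citations)  # what a "typical" paper receives
below_mean = sum(1 for c in citations if c < mean) / len(citations)

print(f"mean={mean:.1f}, median={median:.1f}, share of papers below mean={below_mean:.0%}")
# mean=15.2, median=2.5, share of papers below mean=90%
```

In this example a single paper accounts for most of the citations, so the journal-level average says little about the nine other articles, which is why such a figure is a poor proxy for any individual paper or author.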
In agricultural research, some scepticism about journal-related metrics is already evident: the Agricultural Economics Associations of Germany and Austria, for example, perform a 'survey-based journal ranking' because this was perceived to be more adequate than using the IF [68].
Apart from current criticism, efforts in indicator development should be acknowledged. In article-based metrics, the weighting of co-authorship and highly cited papers, the exclusion of self-citations, the adjustment of time frames and the inclusion of citation value (the rank of the citing journal) aim to assess scientific impact more precisely. Similarly, the further development of journal-based metrics (see Table 1) involves the exclusion of self-citations, the inclusion of citation value, the weighting of field-specific citation patterns, the inclusion of network analyses of citations and the weighting of the propinquity of citing journals to one another [69]. Nevertheless, these indicator variants do not take into account the self-reinforcing dynamics of bibliometric indicators and their interactions with the credibility of science. For example, weighting by citation value may even increase accumulative advantage.
To sum up, it seems appropriate to improve peer review processes, to reject certain indicators and, crucially, to apply a broad set of indicators, because scientific performance is a multi-dimensional concept and indicators always carry the risk that scientists will respond directly to them rather than to the value the indicator is supposed to measure ([10] p. 7). Explicitly, DORA recommends that funding agencies and institutions should "consider a broad range of impact measures including qualitative indicators of research impact, such as influence on policy and practice" [65]. Since societal benefit requires scientific quality as a basis for evidence but also goes beyond it, demanding a high degree of applicability and positive impacts in application, the two are complementary rather than opposed. Therefore, enriching the assessment of scientific performance with societal impact indicators can result in decisions and incentives in the scientific system that are more reliable and more beneficial to society.
Table 1. Bibliometric indicators.
Indicator | Description |
Citation count | In general, the number of citations received by a paper is counted. Citations can be summed up for all publications of an institution or person, or calculated relative to the average citation rate of the journal or respective field over a certain period (usually three years) [70], [71]. Except for the examples provided in Section 3.1, citations are counted for, and within, journals indexed in the Journal Citation Reports by Thomson Reuters or in Elsevier's Scopus database, on which the SCImago indicators are based [69]. Citations are generally counted from all items of a journal: papers, letters, corrections and retractions, editorials and other items. |
h-index | The h-index combines publication output and impact in one index: h is the largest number such that h publications have received at least h citations each (the time span for the calculation can be selected). There are several variants of the h-index that take into account the number of years of scientific activity, exclude self-citations, or weight co-authorship and highly cited papers [5]. |
IF and journal-based metrics built on the Thomson Reuters database | The Journal Impact Factor (IF) is calculated by dividing the number of current-year citations to the source items published in that journal during the previous two years by the number of citable items published in those two years. It can also be calculated for five years and excluding journal self-citations [6]. Example: $\mathrm{IF}_{2014}(A)=\frac{\text{number of citations in 2014 to articles of journal A published in 2012 and 2013}}{\text{number of citable items in journal A published in 2012 and 2013}}$ Another metric is Article Influence, in which the citation time frame is five years, journal self-citation is excluded and citation value (the impact factor of the citing journal) is weighted [69]. |
Eigenfactor | The Eigenfactor also uses Thomson Reuters citation data to calculate journal importance with several weightings. It includes a network analysis of citations and weights both citation value and field-specific citation patterns [72]. |
Journal-based metrics built on Elsevier's Scopus database | All indicators are calculated with a citation time frame of three years. The Source Normalized Impact per Paper (SNIP) is calculated in a similar way to the IF. The SCImago Journal Rank indicators (SJR and SJR2) limit journal self-citation and weight citation value. SJR2 additionally weights the closeness of the citing journals, meaning that a citation from a related field is given a higher value, because citing peers are assumed to be better able to evaluate the work [69]. |
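The two-year IF described in Table 1 can be sketched as follows. This is a simplified illustration under stated assumptions, not Thomson Reuters' actual procedure: each item is assumed to be labelled in advance as citable (e.g. research article or review) or non-citable (e.g. editorial), and the data class `Item` and the function `impact_factor` are hypothetical names. The second call shows the gaming mechanism discussed above for editorials: citations to non-citable items enter the numerator while the denominator stays unchanged.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    year: int        # publication year of the journal item
    citable: bool    # True for research articles/reviews, False for e.g. editorials
    citations: int   # citations the item received in the census year


def impact_factor(items, census_year):
    """Two-year IF: census-year citations to items from the two preceding years,
    divided by the number of citable items from those years."""
    window = {census_year - 1, census_year - 2}
    in_window = [it for it in items if it.year in window]
    numerator = sum(it.citations for it in in_window)       # citations to ALL items
    denominator = sum(1 for it in in_window if it.citable)  # citable items only
    if denominator == 0:
        raise ValueError("no citable items in the two-year window")
    return numerator / denominator


# Hypothetical journal A, census year 2014.
articles = [Item(2012, True, 3)] * 60 + [Item(2013, True, 2)] * 40  # 260 citations
older = [Item(2011, True, 50)]                                      # outside the window
editorials = [Item(2013, False, 2)] * 15                            # 30 citations

print(impact_factor(articles + older, 2014))               # 2.6
print(impact_factor(articles + older + editorials, 2014))  # 2.9
```

A five-year variant or one excluding journal self-citations, as mentioned in Table 1, would change only the window or the set of citations counted.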
3. Concrete Strategies to Support Evaluation beyond Scientific Impact
While Section 2 introduced relevant movements and pointed to shared interests as a basis for further cooperation, this section describes concrete measures for facilitating evaluation beyond scientific impact. As seen in the previous section, evaluation beyond scientific impact may introduce criteria for various aspects of knowledge production (Figure 1).
Although the quality of peer review and self-reinforcing dynamics affect both open access and subscription-based publication models, several ways of using digital communication technologies to increase the efficacy of dissemination and quality assurance are discussed in the context of open access. For peer review processes, increased transparency is the core issue [73]. Open review, in which reviews are published together with the preprint or the final paper, is possible with different degrees of openness and interactivity [42], although some aspects are controversial. Disclosing authors' identities entails the risk of increased bias, as in single-blind reviews [74], while disclosing reviewers' identities has been shown to preserve a high quality of reviews [75], though concerns remain that it may inhibit criticism and make it more difficult to find reviewers [47], [76]. However, publishing reviews, enabling interaction between reviewers and authors, and broadening the basis of feedback and assessment via comment, forum and rating functions for readers are commonly expected to increase transparency, fairness and scientific progress [44], [67], [73]. Applied examples include the journal BMJ [42], Peerevaluation.org [77] and arXiv.org. At arXiv.org, the publication of manuscripts accelerates dissemination and reduces the file-drawer effect; if a manuscript is revised and published in a journal, the updated versions are added [44], [78]. Another possibility is to guarantee publication (except in cases of fraud), but only after a double-blind review of the manuscript focusing solely on scientific quality [67]. Reviews and revised versions may be used in proposed new publication concepts with a modified role for editors [67], or even without journals [56], but also in the current system, where they can help the editorial boards of individual journals to make publication decisions.
Additionally, review approaches should allow the engagement of peers in research evaluation to be rewarded [67] and the quality of peer review activities to be assessed [77].
Open access to data is supported by several actors [79]. It enables verification, reanalysis and meta-analysis and reduces publication bias, thus safeguarding scientific quality and societal benefit [80]. Accordingly, it has been suggested that the full dissemination of research and the re-use of original datasets by external researchers be implemented as additional performance metrics [80].
Diverse citation and usage data can be accessed via the Internet for all objects with a digital object identifier (DOI) or other standard identifiers [81]. Thus, citation counting beyond the Thomson Reuters or Scopus databases is possible, e.g. via Google Scholar, CrossRef, or within open access repositories [42]. Furthermore, responses to papers can be filtered with various Web 2.0 tools (e.g. Altmetrics.com [82]), which are often combined with platforms to share and discuss diverse scholarly outputs (e.g. Impactstory.org). Such data are also being tested for evaluating the societal use of research [83]. Consequently, the call for open metrics includes open access to citation data in existing citation databases and to all emerging metrics that record citation and usage data [42].
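Because a DOI gives each object a stable key, counts collected from different sources can be joined on it. The sketch below is a minimal illustration under stated assumptions: the source names and DOIs are hypothetical placeholders, the counts are assumed to have been retrieved already (no real service API is called), and counts are kept side by side per source rather than summed, since different databases index overlapping sets of citations.

```python
from collections import defaultdict

# Common ways a DOI is written; stripped so the bare DOI can serve as the join key.
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalise_doi(raw: str) -> str:
    """Reduce a DOI to a bare, lower-case key (DOI names are case-insensitive)."""
    doi = raw.strip()
    for prefix in DOI_PREFIXES:
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi.lower()


def merge_by_doi(sources: dict[str, dict[str, int]]) -> dict[str, dict[str, int]]:
    """Build a per-DOI table of counts with one entry per source.

    Counts are not summed across sources, because databases overlap.
    If one source lists the same DOI under two spellings, the later entry wins.
    """
    table = defaultdict(dict)
    for source_name, counts in sources.items():
        for raw_doi, count in counts.items():
            table[normalise_doi(raw_doi)][source_name] = count
    return dict(table)


if __name__ == "__main__":
    # Hypothetical sources and DOIs, for illustration only.
    counts = merge_by_doi({
        "database_a": {"https://doi.org/10.9999/Demo.001": 12},
        "repository_b": {"doi:10.9999/demo.001": 15, "10.9999/demo.002": 3},
    })
    print(counts["10.9999/demo.001"])  # {'database_a': 12, 'repository_b': 15}
```

Keeping one column per source makes divergences between databases visible, which is itself useful information when open metrics are compared with proprietary ones.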
In conclusion, there are many opportunities for increasing transparency and interaction in review processes, facilitating and acknowledging cooperative behaviour, and including a greater diversity of scientific products, and of ways of recognising them, in research evaluation processes. This may help to improve current evaluation systems. Until now, these approaches have mostly been restricted to scientific outputs, but they may likewise be used to disseminate outputs and implement feedback functions tailored to diverse user communities outside academia. For example, enhanced data assessment and communication tools have also been found to support the concept of citizen science [84], in which citizens carry out research or collect data as volunteers [85].
Science policy, funding procedures and applied evaluation criteria are important drivers of research focuses and therefore determine what knowledge will be available to face future societal challenges. As already seen in Section 2.1.1, research funders are increasingly interested in supporting transdisciplinarity and related research approaches, and they also support open access. For example, the most recent European research programme, "Horizon 2020" [86], [87], highlights the need for multi-stakeholder approaches and supports "systems of innovation" via European Innovation Partnerships [88]. It also makes open access to peer-reviewed scientific publications obligatory and tests open data approaches in certain core areas [89].
Measures suited to supporting "Research and Development for Sustainable Development" through research programming are provided by VisionRD4SD, a collaboration process between European research funders. It identifies measures for the whole programme cycle, presents them in a prototype resource tool and recommends a European or international platform to support networking, dialogue and learning processes on this subject [90]. Likewise, a guide for policy-relevant sustainability research is directed at funding agencies, researchers and policymakers [91].
Institutions and funders who are interested in applying concepts of research evaluation beyond scientific impact (see criteria in Figure 1) can build on existing approaches. Evaluation concepts have been developed for interdisciplinary and transdisciplinary research and for societal impact assessment as used by research agencies and research institutions or in policy analysis (reviews may be found in [92], [93], [94]). Examples of regularly applied evaluation procedures that include societal outputs are the Standard Evaluation Protocol for Universities in the Netherlands ([95] p. 5) (see below) and the Research Excellence Framework in the UK [96].
In what follows, we suggest measures to ensure that evaluation beyond scientific impact is effective. First, steps should be taken to ensure that reviewers actually apply societal impact criteria, although reviewers may feel these indicators to be outside their disciplinary expertise [97] or of lesser importance ([48] pp. 32‒35). Interestingly, in one study ([48] pp. 32‒35), societal impact indicators such as relevance for global societal challenges or citizens' concerns, public outreach, contribution to science education and usefulness for political decision-makers were ranked higher in agricultural research than in other fields, and higher by students than by professors. Such results suggest that not only peers but also knowledge users ([15] p. 548), [97] should be involved in evaluation. To improve the ability of scientists and others to judge societal impacts, data on the societal impact of research and its proxies (hereinafter referred to as societal impact data) could provide a transparent and reliable basis for such judgement.
Furthermore, the experiences documented in Section 2.3 suggest avoiding narrow indicator sets and their use for competitive benchmarking or metrics-based resource allocation. Instead, broad indicator sets and fair, interactive processes that support organisational development [30] or learning processes [98] should be applied. One example is the above-mentioned Standard Evaluation Protocol in the Netherlands, where "the research unit's own strategy and targets are guiding principles when designing the assessment process" ([95] p. 5).
When funders or institutions begin to apply evaluation beyond scientific impact, however, they should also work to increase the acknowledgement of societal impact within the scientific reputation system in general. This is necessary to ensure that their incentives are effective and do not merely sharpen researchers' trade-offs between contributing to scientific and to societal impact. Suitable measures adopted by funders could be additional funding or awards for particularly successful projects as "take-home values" for researchers.
Moreover, research institutions and research funders should become active in improving data availability. Only with reliable and easy-to-use data beyond scientific impact can balanced research evaluation be conducted frequently enough to provide the desired incentives within the scientific system.
Until now, research funding agencies have often demanded detailed reporting on the dissemination and exploitation of results. In German federal research, exploitation plans are required as text documents for proposals and reports [99]. Proposals for Horizon 2020 include plans for dissemination and exploitation ([100] p. 17), but the need to improve digital data assessment for evaluation purposes is also emphasised ([101] p. 47). However, free-text descriptions of societal impact are difficult to analyse and offer few options for filtering and cross-referencing, so they are of little value for research evaluation or for sharing information within the scientific system. Equally, digital systems are only valuable if they allow data to be reused multiple times.
4. Improve Data Availability for Evaluation beyond Scientific Impact
To improve the availability of data for societal impact evaluation, we recommend uniting the interests of institutions and funders in such data and giving them more leverage by making use of the current state of interoperability in e-infrastructures, especially research information systems and publication metadata.
Interoperability, in general, enables the exchange, aggregation and use of information for electronic data processing between different systems. Its functionality depends on system structures and exchange formats (entities and attributes), federated identifiers (for persons, institutions, projects, publications and other objects) and shared (or even mapped) vocabularies and semantics [102]. Thus, besides its technical aspects, interoperability requires cooperation to reach agreement.
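The notion of shared or mapped vocabularies can be illustrated with a crosswalk. In the following sketch, two systems (a hypothetical institutional CRIS and a hypothetical funder reporting system) keep their own labels for output types, and a mapping translates them into one shared vocabulary; labels without a mapping are reported rather than silently dropped, because resolving them is exactly the agreement work described above. All labels and terms are invented for illustration and are not taken from CERIF or CASRAI.

```python
from typing import Iterable, Optional

# Hypothetical shared vocabulary of output types (not actual CERIF or CASRAI terms).
SHARED_TYPES = {"journal_article", "practitioner_guide", "field_day", "policy_brief"}

# Crosswalks from each system's local labels to the shared vocabulary.
CROSSWALKS = {
    "institution_cris": {
        "Peer-reviewed article": "journal_article",
        "Leaflet for farmers": "practitioner_guide",
        "Demonstration event": "field_day",
    },
    "funder_reporting": {
        "Scientific paper": "journal_article",
        "Extension material": "practitioner_guide",
        "Policy advice": "policy_brief",
    },
}


def to_shared(system: str, local_label: str) -> Optional[str]:
    """Return the shared term for a local label, or None if no valid mapping exists."""
    term = CROSSWALKS.get(system, {}).get(local_label)
    return term if term in SHARED_TYPES else None


def harmonise(records: Iterable[tuple[str, str, str]]):
    """Split (system, local_label, record_id) tuples into mapped and unmapped lists."""
    mapped, unmapped = [], []
    for system, label, record_id in records:
        term = to_shared(system, label)
        if term is None:
            unmapped.append((system, label))  # needs a decision by the partners
        else:
            mapped.append((record_id, term))
    return mapped, unmapped
```

For instance, a "Leaflet for farmers" recorded by the institution and "Extension material" reported to the funder both resolve to the same shared term, so the two systems can exchange and aggregate these records.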
The interests of institutions and funders in societal impact data may be served by Current Research Information Systems (CRIS). These are increasingly used by research institutions as a tool to manage, provide access to and disseminate research information. Standardisation of CRIS aims to enable automated data input, e.g. via connections to publication databases, and to ensure that data need only be entered manually once but can then be used many times (e.g. for automated CVs, bibliographies, project participation lists, institutional web page generation, etc.) [103]. Standardisation is promoted by euroCRIS via the CERIF standard (Common European Research Information Format) [103] and by CASRAI (Consortia Advancing Standards in Research Administration Information) via the development of data profiles and semantics [104], and is embedded in diverse collaborations with initiatives related to interoperability and open access [105].
The CERIF standard is particularly well suited to enabling interoperability between research institutions and funders, because research outputs can be assigned to projects, persons and organisational units. In the UK, interface management between the research councils and higher education institutions is already established, and societal outputs and impacts are part of the data assessment [106], [107]. The aim is to develop these systems further by applying the current CERIF standard in order to increase interoperability with institutional CRIS. It has been shown that the output and impact types used in the UK can be implemented in the current CERIF standard [108].
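The property that makes CERIF convenient here, namely outputs linked to projects, persons and organisational units, can be sketched as a small graph of typed, dated links. The following is a hypothetical, much-simplified model loosely inspired by that idea; the class names, role names and identifiers are illustrative and do not reproduce the CERIF schema.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Link:
    """A typed, dated relationship between two entity identifiers."""
    source: str
    target: str
    role: str                   # classifies the relationship, e.g. "hosts"
    start: date
    end: Optional[date] = None  # open-ended if None


@dataclass
class ResearchGraph:
    entities: dict[str, tuple[str, str]] = field(default_factory=dict)  # id -> (kind, label)
    links: list[Link] = field(default_factory=list)

    def add(self, ident: str, kind: str, label: str) -> None:
        self.entities[ident] = (kind, label)

    def link(self, source: str, target: str, role: str, start: date,
             end: Optional[date] = None) -> None:
        self.links.append(Link(source, target, role, start, end))

    def targets(self, source: str, role: str) -> list[str]:
        """Identifiers linked from `source` with the given role, in insertion order."""
        return [lk.target for lk in self.links if lk.source == source and lk.role == role]

    def outputs_of_org_unit(self, org: str) -> list[str]:
        """Collect the outputs of every project hosted by an organisational unit."""
        outputs: list[str] = []
        for project in self.targets(org, "hosts"):
            outputs.extend(self.targets(project, "produced"))
        return outputs


# Hypothetical example: one unit, one project, a journal article and a field day.
g = ResearchGraph()
g.add("OU-1", "org_unit", "Department of Organic Farming")
g.add("P-01", "project", "Weed control project")
g.add("PE-1", "person", "Researcher A")
g.add("O-1", "output", "Journal article")
g.add("O-2", "output", "Field day for farmers")
g.link("OU-1", "P-01", "hosts", date(2013, 1, 1))
g.link("PE-1", "P-01", "member_of", date(2013, 1, 1), date(2015, 12, 31))
g.link("P-01", "O-1", "produced", date(2014, 6, 1))
g.link("P-01", "O-2", "produced", date(2014, 9, 15))
```

Because the scientific and the societal output hang off the same project, both are retrieved by the same query for the organisational unit, which is the kind of assignment a funder and an institution would need to share.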
Accordingly, research funders should engage in the development and use of CERIF-CRIS that (a) include data related to interactions with, and benefit for, practice and society, and (b) partly replace written documents in the process of application and reporting. They should (c) act as data providers by making data available, e.g. via interface management with research institutions, file transfer for individual scientists and re-use of data for subsequent proposals and reports. Thus, funders can contribute to the provision of comprehensive societal impact data without increasing the documentation effort for scientists. In doing so, they also help to corroborate and ensure the quality of such data.
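The principle of entering data once and reusing them for several documents can be sketched as follows. The record fields, audience categories and the two rendering functions (a report section and a summary of previous results for a follow-up proposal) are hypothetical, and JSON serves only as a stand-in for whatever exchange format, such as a CERIF-based interface, a funder and an institution would actually agree on.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class OutputRecord:
    project: str
    output_type: str  # assumed to come from a shared vocabulary
    title: str
    year: int
    audience: str     # hypothetical categories: "science", "practice", "policy"


# Entered once, e.g. during reporting (illustrative data).
RECORDS = [
    OutputRecord("P-01", "journal_article", "Weed control in organic wheat", 2014, "science"),
    OutputRecord("P-01", "practitioner_guide", "Leaflet: mechanical weeding", 2014, "practice"),
    OutputRecord("P-01", "field_day", "On-farm demonstration", 2015, "practice"),
]


def report_section(records, project: str, year: int) -> str:
    """Annual report: list the project's outputs for one year."""
    return "\n".join(
        f"- {r.title} ({r.output_type}, {r.audience})"
        for r in records
        if r.project == project and r.year == year
    )


def previous_results(records, project: str) -> str:
    """Follow-up proposal: summarise outputs aimed at non-academic audiences."""
    titles = [r.title for r in records if r.project == project and r.audience != "science"]
    return f"{len(titles)} outputs for non-academic audiences: " + "; ".join(titles)


def export_for_institution(records, project: str) -> str:
    """Serialise the project's records so an institutional CRIS could import them."""
    return json.dumps([asdict(r) for r in records if r.project == project], indent=2)
```

The same three records feed the report, the proposal and the export, so no text has to be retyped, and the structured fields remain filterable in a way that free-text descriptions are not.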
To facilitate these aims, several measures can be applied. Regarding (a), it is necessary to develop shared vocabularies for societal impact related to outputs and outcomes. Compiling societal impact data (based on existing evaluation concepts and documentation tools) and structuring them in line with CRIS standards (e.g. CERIF, CASRAI) is one task of the project "Practice Impact II" [109]. Furthermore, funders, researchers and their associations that are interested in societal impact could formulate a mandate to CASRAI and euroCRIS to further develop shared vocabularies for types and attributes of outputs, outcomes and impacts relevant to society, and stay involved in this process. Such a commitment would also facilitate the integration of societal impact data into the CRIS of different providers, and this would create a basis for data transfer between funders and institutions with regard to (c).
Regarding (b), it is necessary to build a closer connection between those data and the documentation requirements in proposals and reports. The above-mentioned research project "Practice Impact II" is developing this with a focus on German federal research in the realm of organic and sustainable agriculture. The project integrates the user perspectives of scientists, research funding agencies and evaluators in its development and testing [109], [110], in order to achieve the usability and reduction in effort required with regard to (c) above.
Regarding (c), there are further possibilities besides the interoperability between funders and institutions. CRIS, with their function as repositories, are also tools for presenting research results to the public. Research funders could use them to support open access dissemination tailored to specific target groups within and beyond academia. Furthermore, closer connections between societal impact data and scientific publications might be established.
For bibliographic metadata of publications, such as authors, title and year, interoperability is more advanced than for other research outputs. Common vocabularies for publication types, the advancement of standards and mapping between different metadata standards are being pushed ahead by libraries [111] and open access repositories [112], [113] in order to aggregate machine-readable metadata from multiple systems and create new platforms or services [114]. Linked data standards (like the Resource Description Framework, RDF) help to exploit the full potential of web applications for bibliographic metadata. RDF, for example, allows classical standards-based metadata to be complemented with socially constructed metadata, e.g. user tags, comments, reviews, links, ratings or recommendations [115]. In future, closer links between data and publications will also evolve. For example, in 2013, the Research Data Alliance (RDA) started to build social and technical bridges to enable open sharing and interoperability of research data and to make them citable; it also has an agricultural section [79]. Linking scientific publications with their associated data in order to increase reliability is a recent innovation [80].
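To show how standards-based and socially constructed metadata can sit side by side in RDF, the following sketch writes a few statements about one publication as N-Triples, the line-based RDF serialisation. The Dublin Core terms namespace is real; the `social` namespace, the DOI and all values are hypothetical, literals are left untyped for brevity, and a production system would more likely rely on an RDF library such as rdflib than on hand-written serialisation.

```python
from dataclasses import dataclass

DCTERMS = "http://purl.org/dc/terms/"
SOCIAL = "http://example.org/social#"  # hypothetical vocabulary for user-generated metadata


@dataclass(frozen=True)
class IRI:
    value: str


def literal(text: str) -> str:
    """Quote a plain literal, escaping backslashes, quotes and newlines."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def term(obj) -> str:
    """Render an IRI in angle brackets, anything else as a plain literal."""
    return f"<{obj.value}>" if isinstance(obj, IRI) else literal(str(obj))


def to_ntriples(triples) -> str:
    return "\n".join(f"{term(s)} {term(p)} {term(o)} ." for s, p, o in triples)


def objects(triples, subject: IRI, predicate: IRI) -> list:
    """All objects for a given subject and predicate, e.g. every user tag."""
    return [o for s, p, o in triples if s == subject and p == predicate]


paper = IRI("https://doi.org/10.9999/demo.001")  # hypothetical DOI
TRIPLES = [
    # Classical, standards-based metadata.
    (paper, IRI(DCTERMS + "title"), "Weed control in organic wheat"),
    (paper, IRI(DCTERMS + "issued"), "2014"),
    # Socially constructed metadata attached to the same resource.
    (paper, IRI(SOCIAL + "tag"), "mechanical weeding"),
    (paper, IRI(SOCIAL + "tag"), "organic farming"),
    (paper, IRI(SOCIAL + "rating"), "4"),
]
```

Because both kinds of statements use the same subject identifier, a reader's tag or rating can be queried together with the bibliographic record, which is the complementarity described above.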
Accordingly, developing systems that link scientific publications, via their project, to research outputs for audiences outside academia and to the interactions and impacts of that research, as an indication of its societal relevance and applicability, is a promising opportunity. Such an increase in the visibility of knowledge tailored to specific target groups can increase the real-world impact of research and record that impact via feedback functions.
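A minimal sketch of such a link chain follows: starting from a publication's DOI, it finds the project, that project's outputs for audiences outside academia, and the feedback events recorded for those outputs. All identifiers, fields and event kinds are hypothetical, and the resulting counts are proxies for use and interaction, not measurements of impact itself.

```python
from collections import Counter

# Hypothetical linked records; identifiers and field names are illustrative.
PUBLICATION_PROJECT = {"10.9999/demo.001": "P-01"}
PROJECT_OUTPUTS = {
    "P-01": [
        {"id": "O-1", "type": "practitioner_guide", "audience": "farmers"},
        {"id": "O-2", "type": "policy_brief", "audience": "policymakers"},
    ]
}
# Events captured by feedback functions attached to each output.
FEEDBACK = [
    {"output": "O-1", "kind": "download"},
    {"output": "O-1", "kind": "comment"},
    {"output": "O-1", "kind": "download"},
    {"output": "O-2", "kind": "download"},
]


def societal_profile(doi: str) -> dict:
    """Follow publication -> project -> non-academic outputs -> feedback events."""
    project = PUBLICATION_PROJECT.get(doi)
    if project is None:
        return {}  # publication not linked to any project
    outputs = PROJECT_OUTPUTS.get(project, [])
    output_ids = {o["id"] for o in outputs}
    events = Counter(
        (f["output"], f["kind"]) for f in FEEDBACK if f["output"] in output_ids
    )
    return {
        "project": project,
        "audiences": sorted({o["audience"] for o in outputs}),
        "feedback": dict(events),
    }
```

Calling `societal_profile("10.9999/demo.001")` would report the project P-01, the audiences "farmers" and "policymakers", and two downloads and one comment for the practitioner guide, placing these traces next to the scientific publication.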
5. Conclusion: Argumentation for Evaluation beyond Scientific Impact
Joint interests of the actors introduced in this paper can be built on the basis that science needs to generate greater societal benefit, and that high scientific quality is a precondition for that. Higher societal benefit is then associated both with open access and with tailored knowledge production and dissemination for audiences beyond academia. Furthermore, evaluation beyond scientific impact can be given some leverage by the full use of digital communication technologies and progress in interoperability. The possible measures suggested in this paper assume close cooperation among various actors (Figure 3).
Research funders in particular may support changes in knowledge production, because they design programmes, define funding criteria and may provide easy-to-use data related to societal impact, for example if research institutions aim to be evaluated on a balance of scientific and societal impact.
As argued in this paper, the measures summarised above are also valid for organic agricultural research and related fields. Some specific measures and opportunities are outlined below.
• Being small, the (organic) agricultural research community may focus on commonalities with other movements. For example, it may benefit from critical voices in scientific impact evaluation, from statements of sustainability research and from open access movements, which provide the basis for introducing criteria beyond scientific impact in research evaluation.
• The (organic) agricultural research community has several synergies with the sustainability (research) community. One is the potential for mutual learning to further develop transdisciplinary research concepts and apply them proficiently. Another is the opportunity to organise more powerful support for those research approaches via adequate funding and the acknowledgement of societal impact indicators in research evaluation.
• Building up a closer connection between open access and knowledge production tailored to societal needs as two complementary aspects of the societal benefit of science corresponds well with the self-conception of (organic) agricultural research.
If agricultural research funders intend to improve the capacity of agricultural research to contribute to real-world impact and sustainable development, they should engage in improving access to societal impact data to support evaluation beyond scientific impact within the scientific system. Use cases for CRIS that integrate societal impact data, reveal funders' needs and reduce scientists' effort in preparing proposals and reports may be developed successfully in agricultural research, because funders and the research community in this field are well connected enough to develop such a use case jointly, with effective feedback loops. Furthermore, they may share their experiences in assessing societal impact data with other research fields and funders. This may lead to further involvement in processes that support the standardisation and interoperability of societal impact data.
To conclude, the range of interest groups and viable measures is such that there is no need to accept the deficits in current research evaluation systems; it is possible to change them!
Our research has been supported by the German Federal Ministry of Food and Agriculture through the Federal Agency of Agriculture and Food within the Federal Programme of Organic Farming and Other Types of Sustainable Agriculture; project title: Development and Testing of a Concept for the Documentation and Evaluation of the Societal Impact of Agricultural Research. Furthermore we would sincerely like to thank Donal Murphy Bokern, Thorsten Michaelis and Hansjörg Gaus for their very helpful recommendations, and Thomas Lindenthal for his collaboration in our initial work on this topic.