Evaluation and Comparison of Academic Impact and Disruptive Innovation Level of Medical Journals: Bibliometric Analysis and Disruptive Evaluation

Background As an important platform for researchers to present their academic findings, medical journals have an evaluation orientation that is closely tied to the value orientation of their published research results. However, the differences between the academic impact and the level of disruptive innovation of medical journals have not yet been examined. Objective This study aims to compare the relationships and differences between the academic impact, disruptive innovation levels, and peer review results of medical journals and published research papers. We also analyzed the similarities and differences in the impact evaluations, disruptive innovations, and peer reviews for different types of medical research papers and the underlying reasons. Methods The general and internal medicine Science Citation Index Expanded (SCIE) journals in 2018 were chosen as the study object to explore the differences in the academic impact and level of disruptive innovation of medical journals based on the OpenCitations Index of PubMed open PMID-to-PMID citations (POCI) and H1 Connect databases, respectively, and we compared them with the results of peer review. Results First, the correlation coefficients of the Journal Disruption Index (JDI) with the Journal Cumulative Citation for 5 years (JCC5), Journal Impact Factor (JIF), and Journal Citation Indicator (JCI) were 0.677, 0.585, and 0.621, respectively. The correlation coefficient of the absolute disruption index (Dz) with the Cumulative Citation for 5 years (CC5) was 0.635. However, the average difference between the disruptive innovation and academic influence rankings of journals reached 20 places (about 17.5%), and the average difference between the disruptive innovation and influence rankings of research papers reached about 2700 places (about 17.7%). These differences reflect the essential difference between the two evaluation systems.
Second, the top 7 journals selected based on JDI, JCC5, JIF, and JCI were the same, and all of them were H-journals. Of the top 0.1%, top 1%, and top 10% papers selected based on Dz and CC5, 8 (8/15, 53%), 96 (96/150, 64%), and 880 (880/1500, 58.67%), respectively, were the same. Third, research papers with the "changes clinical practice" tag showed only moderate innovation (4.96) and impact (241.67) levels but had high levels of peer-reviewed recognition (6.00) and attention (2.83). Conclusions The results of the study show that research evaluation based on innovation indicators is detached from the traditional impact evaluation system. The 3 evaluation systems (impact evaluation, disruptive innovation evaluation, and peer review) only have high consistency for authoritative journals and top papers. Neither a single impact indicator nor an innovation indicator can directly reflect the impact of medical research on clinical practice. How to establish an integrated, comprehensive, scientific, and reasonable journal evaluation system to improve the existing evaluation system of medical journals still needs further research.


Introduction
Scientific and technical journals play a crucial role in showcasing research findings, and the value orientation of their published results is closely intertwined with their evaluation orientation. Since Garfield [1] put forward the idea that "citation analysis can be used as an evaluation tool for journals" in 1972, the evaluation system of journals based on academic impact has become mainstream. However, relying too heavily on impact indicators for evaluation may harm academic research and the progress of disciplines.
On the one hand, some scholars have long pointed out that ranking journals according to their impact factors is noncomprehensive and may lead to misleading conclusions [2,3]. Meanwhile, many academic journals and publishers have engaged in strategic self-citation, leading to an overinflated journal impact factor (JIF) [4]. Some editorial behaviors to enhance the JIF clearly violate academic norms [5]. Some scholars are overciting each other's work to enhance their academic impact [6]. External contingencies can have a devastating effect on citation indicators [7]. Scientists themselves also present a mixed attitude toward impact factors [8].
On the other hand, despite all the benefits of increased academic impact to journals, there is a nonnegligible problem in the evaluation of journals: citation indicators essentially characterize the impact of journals rather than their disruptive innovation [9]. Relevant studies have confirmed that the level of disruptive innovation of scientific research is getting increasingly lower [10] and the progress in various disciplines is slowing [11]. This is often overlooked against the background of impact-only evaluations. Therefore, despite the urgent need for disruptive innovations in science [12], impact-based journal rankings have made it more difficult to accept novel results [13], replacing the "taste of science" with the "taste of publication" [14] in the actual environment.
The evaluation of academic journals concerns not only the journals themselves [15] but also the wide use of the evaluation results in academic review, promotion, and tenure decisions [16]. Meanwhile, the quality and results of medical research have a direct impact on human health, life, and well-being. Therefore, the general and internal medicine journals indexed in Science Citation Index Expanded (SCIE) in 2018 were chosen as the study object. The OpenCitations Index of PubMed open PMID-to-PMID citations (POCI) and H1 Connect databases were selected as the sources of citation relationship data and peer review data. We investigated the connections and contrasts between the academic impact, disruptive innovation level, and results of peer review for medical journals and published research papers. We also analyzed the similarities and differences as well as the fundamental causes of the varying evaluation results in terms of impact evaluation, disruptive innovation, and peer review for various types of medical research papers. We aimed to provide a reference for the correct understanding of the innovation level of the results published by journals; the scientific and reasonable evaluation of medical journals; and the construction of a scientific, objective, and fair academic evaluation system.

Research Object
Because review articles rarely embody disruptive innovation, this paper focuses only on the general and internal medicine journals indexed in SCIE in 2018 and the research papers they published. In addition, considering computational efficiency, accuracy, and the difficulty of data acquisition, research papers from the aforementioned journals whose citation relationships were not included in POCI and journals with too few references (fewer than 10) were excluded. Finally, 114 journals were retained at the journal level, and 15,206 research papers were retained at the paper level.

Data Resource
The data acquired in this study included journal information, literature information, citation relationship data, and peer review data. The data were obtained through the Journal Citation Reports (JCR), Web of Science (WoS), POCI, and H1 Connect databases.
Of these databases, POCI is a Resource Description Framework (RDF) data set containing details of all the citations from publications bearing PMIDs to other PMID-identified publications, harvested from the National Institutes of Health Open Citations Collection. POCI covers more than 29 million bibliographic resources and more than 717 million citation links (as of the last update in January 2023). Citations in POCI are treated not as simple links but as data entities. This means that descriptive attributes can be assigned to each citation, such as the date of its creation, its time span, and its type [17].
H1 Connect (formerly Faculty Opinions), the world's most authoritative peer review database in the biomedical field, incorporates the combined efforts of more than 8000 international authoritative experts and is a knowledge discovery tool used to evaluate published research. H1 Connect's reviewers are authoritative experts in the life sciences and medical fields. They provide commentary, opinions, and validation of key papers in their own fields. The quality and rigor of the reviewers mean that researchers can be assured of the quality of the recommended papers, and H1 Connect brings these recommendations together to surface high-quality research to a wider audience. H1 Connect's experts typically evaluate important research literature in their field within 2 months of publication, with over 90% of recommendations made within 6 months of publication.

Data Acquisition and Processing
The data acquired for this study consisted of 3 parts. The workflow comprised 2 stages: (1) data acquisition and (2) data processing.

Data Acquisition
The steps taken to acquire the data were as follows: (1) log in to JCR; (2) select Medicine, General & Internal on the "Browse categories" page; (3) select JCR Year=2018 and Citation Indexes=SCIE in the filters; (4) export the results in XLS format; (5) using the acquired journal titles, search the WoS core database with the publication year set to 2018 and the literature type set to Article; and (6) export the full records of the journal literature in XLS format. Finally, we downloaded the related H1 Connect literature data and POCI data according to the list of journals.

Data Processing
To process the data, we undertook the following steps: (1) use Navicat to import the full records of the research papers into a local SQLite database; (2) process the downloaded POCI data; (3) extract the PMIDs of all the focus papers from the full records; (4) retrieve, from the local database built from the POCI data, the references and citations of the focus papers as well as the citations of the references of the focus papers; and (5) establish the relevant data tables for the subsequent calculations.
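As a minimal sketch of this pipeline, the following code builds a local citation table from POCI-style PMID-to-PMID pairs and retrieves the references and citers of a focus paper. The table and column names (`citations`, `citing`, `cited`) and the sample PMIDs are illustrative assumptions, not the authors' actual schema.

```python
import sqlite3

# Illustrative POCI-style citation pairs: (citing_pmid, cited_pmid).
# In the real pipeline, these rows come from the downloaded POCI dump.
citation_pairs = [
    ("101", "100"), ("102", "100"), ("102", "90"),
    ("103", "90"),  ("100", "90"),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE citations (citing TEXT, cited TEXT)")
conn.executemany("INSERT INTO citations VALUES (?, ?)", citation_pairs)

def references_of(pmid):
    """PMIDs that the focus paper cites."""
    rows = conn.execute("SELECT cited FROM citations WHERE citing = ?", (pmid,))
    return {r[0] for r in rows}

def citations_of(pmid):
    """PMIDs that cite the focus paper."""
    rows = conn.execute("SELECT citing FROM citations WHERE cited = ?", (pmid,))
    return {r[0] for r in rows}

focus = "100"
refs = references_of(focus)      # references of the focus paper
citers = citations_of(focus)     # papers citing the focus paper
# Papers citing at least one reference of the focus paper (needed for N_R later),
# excluding the focus paper itself:
ref_citers = set().union(*(citations_of(r) for r in refs)) - {focus}
```

These three sets are exactly the inputs required for the disruption calculations described in the Innovation Indicators section.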

Innovation Indicators
Some researchers observed early on that some technological innovations complement and improve current technologies without replacing them, while others outright eliminate technologies that were used in the past. However, for a long time, scholars did not analyze and explain the essence of this phenomenon. It was not until 1986 that Tushman and Anderson [18] summarized it as follows: there are 2 types of major technological shifts, which either disrupt or enhance the capabilities of existing firms in an industry. Christensen [19], a professor at Harvard Business School in the United States, argued that disruptive innovations are new technologies that replace existing mainstream technologies in unexpected ways. Building on these views, Funk and Owen-Smith [20] offered a deeper and more nuanced perspective. They argued that the dichotomy between disruptive and augmentative technologies lacks nuance and that the impact of new technologies on the status quo is a matter of degree rather than an absolute distinction.
In this regard, Govindarajan and Kopalle [21] also pointed out that disruptive innovation lacks reliable and valid measurement standards. Therefore, Funk and Owen-Smith [20] created the consolidation-disruption (CD) index, which aims to quantify the degree of technological change brought about by new patents. The index drew the attention of Wu et al [22], who analogized the basic principle of the CD index to measure disruption by calculating the citation substitution of the focus paper in the citation network and who were the first to apply the evaluation of disruptive innovation to the world of bibliometrics.
Because papers are an important carrier of academic results, it is important to evaluate their innovation quantitatively, rationally, and efficiently [23]. The disruption (D) index has received widespread attention since it was proposed by Wu et al [22]. A subset of scholars then explored disruptive papers in specific subject areas based on the D index, including scientometrics [24], craniofacial surgery [25], pediatric surgery [26], synthetic biology [27], energy security [28], colorectal surgery [29], otolaryngology [30], military trauma [31], breast cancer [32], radiology [33], ophthalmology [34], plastic surgery [35], urology [36], and general surgery [37]. Park et al [10] also analyzed the annual dynamics of the disruption level of papers and patents across subject areas.
Another group of scholars conducted in-depth research on the index itself. Bornmann et al [38] explored the convergent validity of the index and variants that may enhance the effectiveness of the measure and tested the validity of the D index on literature in the field of physics [39]. Ruan et al [40] reflected in depth on the limitations of the D index as a measure of scientific and technological progress. Liu et al [41,42] empirically investigated the stabilization time window of the D index in different subject areas, addressed the mathematical inconsistency of the traditional D index, and proposed an absolute disruption index (Dz; as in Equation 2) [43].
This series of studies has made it possible to evaluate the disruption of research papers based on the D index, which has gradually matured. On this basis, Jiang and Liu [44] proposed the Journal Disruption Index (JDI) to evaluate the disruptive innovation level of journals (as in Equation 3) and validated the evaluation effect of this indicator on Chinese SCIE journals [45].
In Equations 1, 2, and 3, N_F refers to the literature that cites only the focus paper (FP), N_B refers to the literature that cites both the focus paper and at least one reference (R) of the focus paper, and N_R refers to the literature that cites at least one reference (R) of the focus paper but not the focus paper itself. n is the number of "Article"-type pieces of literature contained in the journal, and Dz_i is the Dz of the ith article in the journal.
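The equations referenced above did not survive extraction. From the definitions in this section and the original formulation of Wu et al [22], Equations 1 and 3 can be plausibly reconstructed as follows; the exact form of Equation 2 (Dz) is given in [43] and is deliberately not reproduced here, to avoid misstating it:

```latex
% Equation 1: the disruption index of Wu et al [22]
D = \frac{N_F - N_B}{N_F + N_B + N_R}

% Equation 3: the Journal Disruption Index, the mean Dz over a journal's n articles
\mathrm{JDI} = \frac{1}{n} \sum_{i=1}^{n} Dz_i
```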
In this study, Dz and JDI were chosen to evaluate the disruption of the selected studies at the literature and journal levels, respectively. Bornmann and Tekles [46] argued that a citation window of at least 3 years is necessary for the measurement regardless of the discipline. Moreover, in the determination of the stabilization time window of the D index for each discipline by Liu et al [42], the stabilization time window for clinical medicine was 4 years after publication. Therefore, in calculating Dz in this paper, the citation time window of the focus papers was set to 2018 to 2022 to ensure the validity of the results. In addition, because the raw JDI values were very small, we multiplied the JDI by 1000 in all subsequent presentations.
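To make the counting logic concrete, the sketch below computes the original D index of Wu et al [22] from two citation sets restricted to a fixed citation window. It does not implement Dz, whose exact form is defined in [43], and the toy citation sets are invented for illustration.

```python
def disruption(citers_of_focus, citers_of_refs):
    """Original D index of Wu et al: D = (N_F - N_B) / (N_F + N_B + N_R).

    citers_of_focus: papers (within the citation window) citing the focus paper
    citers_of_refs:  papers (within the window) citing >= 1 reference of the focus paper
    """
    nf = len(citers_of_focus - citers_of_refs)   # cite the focus paper only
    nb = len(citers_of_focus & citers_of_refs)   # cite the focus paper and a reference
    nr = len(citers_of_refs - citers_of_focus)   # cite a reference only
    total = nf + nb + nr
    return (nf - nb) / total if total else 0.0

# Toy example: 3 papers cite the focus paper; 2 papers cite its references.
# N_F = 2 (A, C), N_B = 1 (B), N_R = 1 (D), so D = (2 - 1) / 4 = 0.25.
d = disruption({"A", "B", "C"}, {"B", "D"})
```

Restricting both input sets to citations made in 2018 to 2022 reproduces the fixed citation window used in this study.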

Peer Review Indicators
The peer review indicators selected for this study included the peer review score (PScore), weighted peer review stars (PStar_w), and weighted peer review evaluation times (PTime_w).In this case, the weighted indicators refer to the weighting of ratings and number of evaluations by the number of evaluators when an evaluation was completed by more than one reviewer.
The advantage of peer review indicators is that they can compensate for the lag and one-sidedness of relying solely on citations among the literature to assess its quality. They also correct the shortcomings of using the traditional JIF to judge the quality of the literature. Compared with a single impact indicator, they are more scientific.
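The paper does not spell out the weighting formula, so the following sketch encodes one plausible reading, stated here as an assumption: an evaluation jointly completed by several reviewers contributes its star rating and its evaluation count scaled by the number of evaluators.

```python
def weighted_peer_indicators(evaluations):
    """Compute PStar_w and PTime_w under one plausible reading of the weighting
    (an assumption; the paper does not give the exact formula): each evaluation
    contributes its star rating and a count of 1, both scaled by the number of
    evaluators who jointly completed it.

    evaluations: list of (stars, n_evaluators) tuples for one paper.
    """
    pstar_w = sum(stars * n for stars, n in evaluations)
    ptime_w = sum(n for _, n in evaluations)
    return pstar_w, ptime_w

# A paper with a 2-star evaluation by 1 expert and a 3-star evaluation
# jointly signed by 2 experts: PStar_w = 2*1 + 3*2 = 8, PTime_w = 1 + 2 = 3.
stars_w, times_w = weighted_peer_indicators([(2, 1), (3, 2)])
```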

Impact Indicators
The impact indicators selected for this study included the Cumulative Citation for 5 years (CC5), JIF, Journal Citation Indicator (JCI), and Journal Cumulative Citation for 5 years (JCC5). Among these, the JIF is the total number of citations in a given year to scholarly articles published in the journal in the past 2 years divided by the total number of citable articles. The JCI is the average Category Normalized Citation Impact (CNCI) of the citable literature published in the specific journal in the previous 3 years. The JCC5 is obtained by dividing the sum of the citation frequencies recorded in the POCI repository for the corresponding research papers (focus papers) of the selected journals for the years 2018 to 2022 by the number of research papers published by the journal (as in Equation 4).
In Equation 4, C_t^{a_i} represents the number of citations of the ith research paper a_i of the journal in year t, and n is the number of research papers published by the journal.
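Equation 4 itself did not survive extraction; from the description above, it can be reconstructed as follows (with a_i denoting the journal's ith research paper):

```latex
% Equation 4: Journal Cumulative Citation for 5 years
\mathrm{JCC}_5 = \frac{1}{n} \sum_{i=1}^{n} \sum_{t=2018}^{2022} C_t^{a_i}
```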

Analysis of Differences in Academic Influence and Level of Disruptive Innovation of Journals
The academic impact and level of disruptive innovation of the selected 114 journals, the results of the correlation analyses of the indicators, and the differences in the rankings are shown in Table 1, Table 2, and Figure 1, respectively. From these, we can see that (1) journals at the top of the impact rankings are usually also at the top of the disruptive innovation ranking, (2) journals at the bottom of the impact rankings are usually at the bottom of the innovation rankings, and (3) there is little difference in the ranking results of journals under different impact indicators, but there is a large difference between the influence and disruptive innovation rankings. Although there are moderate correlations between the JDI of journals and the 3 influence indicators (JIF, JCI, and JCC5), the average difference between the disruptive innovation ranking and the academic influence ranking of the journals included in the study reached 20 places.
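The rank-difference comparison reported in this section can be outlined as follows; the journal scores are invented for illustration, and the simple descending-sort ranking (without tie handling) is an assumption.

```python
def rankings(scores):
    """Map item -> rank (1 = best) under descending score order."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {item: i + 1 for i, item in enumerate(ordered)}

def mean_rank_gap(scores_a, scores_b):
    """Average absolute difference between two rankings of the same items."""
    ra, rb = rankings(scores_a), rankings(scores_b)
    return sum(abs(ra[j] - rb[j]) for j in ra) / len(ra)

# Invented toy scores: disruption (JDI) vs impact (JIF) for 4 journals.
jdi = {"J1": 9.0, "J2": 3.0, "J3": 7.0, "J4": 1.0}
jif = {"J1": 50.0, "J2": 30.0, "J3": 10.0, "J4": 5.0}
gap = mean_rank_gap(jdi, jif)  # J2 and J3 swap places, so the mean gap is 0.5
```

Applied to the study's 114 journals, the same computation yields the reported average difference of 20 places between the innovation and influence rankings.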

Analysis of the Differences Between the Journals' Impact and Disruption and the Results of Peer Review
In order to better analyze the differences between the academic impact, the level of disruptive innovation, and the peer review results of journals in the field of general and internal medicine, papers indexed in H1 Connect are referred to as "H-papers," and the source journals of H-papers are referred to as "H-journals." The evaluation indicators and rankings of H-journals and the percentage of H-papers are shown in Table 3 and Table 4. We can see that (1) the top 7 journals in terms of both academic impact and disruptive innovation are all H-journals, (2) the average impact ranking of H-journals is higher than their average innovation ranking, and (3) some journals with low impact and innovation also became H-journals.

Analysis of Differences in Academic Impact and Level of Disruptive Innovation of Papers
Ideally, if an article is accepted by a specific journal, it is because its overall quality is similar to that of other papers previously published in that journal [47]. However, journal-level indicators are, at best, only moderately suggestive of the quality of an article [48], which makes indicators that measure specific articles more popular [49]. Therefore, in this study, research papers in the field of general and internal medicine were also evaluated in terms of their academic impact and level of disruptive innovation, and the results are shown in Figure 2.
From the results, we can see that, first, research papers that rank high in the impact ranking usually also rank high in the disruptive innovation ranking: 8 (8/15, 53%), 96 (96/150, 64%), and 880 (880/1500, 58.67%) of the top 0.1%, top 1%, and top 10% papers, respectively, selected based on Dz and CC5 were the same. Second, the level of disruptive innovation of research papers with the same level of impact varied greatly, and the impact level of research papers with the same level of disruptive innovation also varied greatly. Third, despite the high correlation (r=0.635, P<.001) between the Dz and CC5 of the selected research papers, the average difference between their innovation and impact rankings reached about 2700 places. Fourth, the actual analysis results showed no correlation between the innovation of the selected research papers and their number of references (r=0.006, P=.43), which indicates that the Dz index is basically unaffected by differences in the number of references in the actual evaluation process.

Analysis of Differences Between Papers' Impact and Disruption and the Results of Peer Review
In order to better analyze the differences between the academic impact, level of disruptive innovation, and peer review results of journal research papers in the field of general and internal medicine, the differences between the academic impact and disruptive innovation level and the peer review results of H-papers were analyzed. The specific results are shown in Table 5 and Figure 3. From the results, we see that, first, there were 8 (8/15, 53%), 65 (65/150, 43.3%), and 187 (187/1500, 12.47%) H-papers among the top 0.1%, top 1%, and top 10% papers, respectively, selected based on Dz, and there were 5 (5/15, 33%), 74 (74/150, 49.3%), and 220 (220/1500, 14.67%) H-papers among the top 0.1%, top 1%, and top 10% papers, respectively, selected based on CC5. Second, there was a significant positive correlation between the peer review indicators, disruptive innovation indicators, and academic impact indicators of H-papers, reflecting the consistency between quantitative evaluation and peer review at the overall level. Third, the average impact ranking of H-papers was 865 (top 5.68%), and the average disruptive innovation ranking was 1726 (top 11.35%); that is, the average impact ranking of H-papers was higher than their average disruptive innovation ranking. Fourth, some papers with a low academic impact and disruptive innovation level also became H-papers. Fifth, compared with CC5, Dz has a minor correlation advantage with PTime_w and PStar_w.

In addition to rating and commenting on the included research papers, the reviewers in H1 Connect add different labels to articles according to their internal characteristics. The relevant definitions are shown in Table 6 (refer to Du et al [50]). We categorized and counted the 257 H-papers (in this study, if a paper had more than 1 tag, it was counted separately in the calculation of each tag), and the results are shown in Table 7. From this, we can see that (1) research papers with tags in the "Novel Drug Target," "Technical Advance," and "New Finding" categories had a high academic impact and a high disruptive innovation level; (2) research papers with the "Changes Clinical Practice" tag showed only a moderate academic impact and disruptive innovation level but had high levels of peer-reviewed recognition and attention; and (3) experts showed higher recognition of and concern for research papers with the tags "Negative/Null Result," "Controversial," "Refutation," and "Interesting Hypothesis," but the academic impact and disruptive innovation levels of these papers were lower than those of the others.

Evaluation of Disruptive Innovation Is Detached From the Traditional Evaluation System
From these research results, large differences were seen between the innovation and impact rankings of the journals and research papers included in the study. This is consistent with the findings of Guo and Zhou [51]. This phenomenon reflects the essential difference between the 2 evaluation systems. It also proves that the evaluation of the disruptive innovation of research papers and journals based on Dz and JDI is not consistent with the traditional evaluation system of impact.
The essence of disruptive evaluation is to measure the innovation from the substitution level of knowledge structure.This evaluation method brings new ideas to the field of scientific research evaluation, helps relevant institutions and scholars remove the constraints of the traditional evaluation system, and helps to establish a value orientation of encouraging innovation for scientific research and scientific and technological journals, so as to promote the benign development of the academic ecology.

The 3 Evaluation Systems Only Have High Consensus on the Top Object
Given the consistency of scientific evaluation, some uniformity would be expected in the level of disruptive innovation, the level of academic impact, and the peer review results of journals as well as research papers.
However, from the research results presented in this paper, we can see that (1) the top 7 journals in terms of both academic impact and disruptive innovation were all H-journals, (2) more than one-half of the top papers selected based on Dz and CC5 were the same, (3) H-papers were, on average, ranked near the top in terms of impact and innovation, and (4) the results of the different evaluation systems only had high consensus on the authoritative journals and top papers in the field.
These findings are also consistent with those of Goman [52] and Chi et al [53]. A fundamental reason for this phenomenon is that the purposes of the 3 evaluation systems are inherently different. Therefore, in the actual evaluation process, the 3 kinds of indicators are not interchangeable, and the combination of the 3 evaluation systems may be a feasible solution for establishing a comprehensive, scientific, and reasonable journal evaluation system.

A Single Indicator Cannot Accurately Reflect the Impact of Medical Research on Clinical Practice
From the aforementioned findings, we can see that research papers of the "Novel Drug Target," "Technical Advance," and "New Finding" types have a high academic impact and high levels of innovation, which is in line with their classification definitions. This is also consistent with the findings of Du et al [50], Thelwall et al [54], and Jiang and Liu [55]. Second, research papers of the "Changes Clinical Practice" type showed only moderate levels of innovation and impact but had high levels of peer-reviewed recognition and attention. This reflects the difficulty of evaluating the impact of a particular academic paper on clinical practice, whether the evaluation system is based on academic impact or on the level of disruptive innovation. Third, peer review experts show higher recognition of and concern for research papers of the "Negative/Null Result," "Controversial," "Refutation," and "Interesting Hypothesis" types, but the impact and disruptive innovation levels of these papers are lower. This partly reflects the current academic community's excessive focus on positive results; deliberate avoidance of negative results; and overall lack of support for debatable, refutational, and other such types of research.
Several scholars have recently suggested that the evaluation system of medical journals should be redesigned around contemporary clinical impact and utility [56]. In this regard, Thelwall and Maflahi [57] advocated that the references of a guideline are an important basis for studying clinical value. However, Traylor and Herrmann-Lingen [58] found only a weak correlation between the number of citations to individual journals in guidelines and their respective JIFs. Therefore, the JIF is not a suitable tool for assessing the clinical relevance of medical research. The results of this study similarly found that research papers of the "Changes Clinical Practice" type showed only moderate levels of disruptive innovation and academic impact, but such papers received higher recognition and attention from peer review experts. Therefore, combining quantitative evaluation with peer review may be a feasible way to measure the impact of medical research on clinical practice.

Limitations
However, this study also has the following limitations. First, papers in the medical field have a preference for citing review articles [59], which has a certain impact on the evaluation of the disruptive innovation of research papers. Second, the scoring mechanism that H1 Connect provides to its reviewers has a low degree of differentiation and cannot yet fully distinguish differences in quality between papers. In addition, H1 Connect has too few evaluators. Brezis and Birukou [60] showed that, if the number of reviewers is increased to about 10, the correlation between the results and the quality of the paper improves significantly. However, it is difficult to find that many high-quality experts willing to accept open peer review in the high-pressure environment of "publish or perish" [61]. Third, because the citation data sources used in this study are all based on PubMed data, this study also suffers from the problem of missing references and citations that are not labeled with a PMID, which affects the accuracy of the evaluation results to a certain extent. In future studies, we will obtain more accurate measurement results by jointly using multiple sources of citation data.

Conclusions
In this study, the general and internal medicine journals indexed in SCIE in 2018 were chosen as the study object. The POCI and H1 Connect databases were selected as the sources of citation relationship data and peer review data. We investigated the connections and contrasts between the academic impact, level of disruptive innovation, and results of peer review for medical journals and published research papers. We also analyzed the similarities and differences as well as the fundamental causes of the varying evaluation results in terms of impact evaluation, disruptive innovation, and peer review for various types of medical research papers.
The results of this study show that the evaluation of scientific research based on the innovation index is detached from the traditional impact evaluation system, the 3 evaluation systems only have high consistency for authoritative journals and top papers, and neither the single impact index nor the innovation index can directly reflect the impact of medical research on clinical practice.
In addition, with the increasing importance of replicative science, the accuracy of statistical reports, the evidential value of reported data, and the replicability of given experimental results [62] should also be included in the examination of journal quality. How to establish a comprehensive, all-encompassing, scientific, and reasonable journal evaluation system needs to be further investigated.

Figure 1. Comparison of ranking differences based on different evaluation indicators. JCC5: Journal Cumulative Citation for 5 years; JCI: Journal Citation Indicator; JDI: Journal Disruption Index; JIF: Journal Impact Factor.

Figure 2. Comparison of the ranking of the absolute disruption index (Dz) and Cumulative Citation for 5 years (CC5) of selected research papers.

Figure 3. Comparison of ranking differences between H-papers (papers indexed in the H1 Connect database) in general and internal medicine based on different evaluation indicators. CC5: Cumulative Citation for 5 years; Dz: absolute disruption index; PScore: peer review score; PStar_w: weighted peer review stars; PTime_w: weighted peer review evaluation times.

Table 1. Comparison of academic influence and disruptive innovation level of journals. JCC5: Journal Cumulative Citation for 5 years; JDI: Journal Disruption Index; JIF: Journal Impact Factor; JCI: Journal Citation Indicator.

Table 2. Correlation analysis of journal evaluation indicators. JDI: Journal Disruption Index; JCC5: Journal Cumulative Citation for 5 years; JIF: Journal Impact Factor; JCI: Journal Citation Indicator.

Table 3. Evaluation indicators of H-journals (sources of papers indexed in the H1 Connect database).

Table 4. Ranking and percentage of H-papers (papers indexed in the H1 Connect database) in H-journals (sources of H-papers). JDI: Journal Disruption Index; JIF: Journal Impact Factor; JCI: Journal Citation Indicator.

Table 5. Validation of the correlation between 3 kinds of indicators of H-papers (papers indexed in the H1 Connect database). CC5: Cumulative Citation for 5 years; Dz: absolute disruption index; PScore: peer review score; PStar_w: weighted peer review stars; PTime_w: weighted peer review evaluation times.

Table 6. Categorization and definition of peer review tags.

Table 7. Details of research papers with different peer-reviewed tags.