UC Santa Barbara Departmental Working Papers

Title: Using downloads and citations to evaluate journals
Permalink: https://escholarship.org/uc/item/1f221007
Authors: Wood-Doughty, Alex; Bergstrom, Ted; Steigerwald, Douglas
Publication Date: 2017-11-26
Data Availability: The data associated with this publication are available upon request.

Using Downloads and Citations to Evaluate Journals

Alex Wood-Doughty, Ted Bergstrom, Douglas G. Steigerwald
Department of Economics, University of California Santa Barbara

November 26, 2017

Abstract

Download rates of academic journals have joined citation rates as commonly used measures of research influence. But in what ways and to what extent do the two measures differ? This paper examines six years of download data for more than five thousand journals subscribed to by the University of California system. While download rates of journals are highly correlated with citation rates, the average ratio of downloads to citations varies substantially among academic disciplines. We find that, typically, the ratio of a journal's downloads to citations depends positively on its impact factor. Surprisingly, we find that, controlling for citation rates, number of articles, academic discipline, and year of download, there remains a publisher effect, with some publishers recording significantly more downloads than would be predicted from the characteristics of their journals. Download statistics are recorded and supplied to libraries by journal publishers, often subject to confidentiality clauses. If libraries use download statistics to evaluate journals, they may want to account for publisher bias in these statistics.

Acknowledgments: The authors thank Chan Li and Nga Ong of the California Digital Library for helping us to obtain download data.

1 Introduction

Measures of the impact and influence of academic research are valuable to many decision-makers. University librarians use them to make purchasing and renewal decisions.[1] Academic departments use them in their hiring, tenure, and salary decisions.[2] Funding agencies use them to assess grant applicants. They are also used in determining the public rankings of journals, academic departments, and universities.[3]

Citation counts have long been the most common measures of research influence. Eugene Garfield's Institute for Scientific Information introduced the systematic use of citation data with the Science Citation Index in 1964, and Journal Citation Reports (JCR) in 1975.[4] The advent of electronic publishing has given rise to a new measure of research influence: download counts.[5]

For library evaluations, download counts offer some advantages over citation counts. Only a minority of those who download a journal article will cite it, so downloads capture a broader population of readers. And while citation counts reflect the activities of scholars worldwide, subscribing libraries can observe the number of downloads from their own institutions, which reflect their own patterns of research interests. For academic departments and granting agencies, the use of download data in addition to citation records yields an enriched profile of the influence of individual researchers' work.[6] Download data also have the advantage of being much more immediate than citation data, a valuable feature for tenure committees or grant review panels tasked with evaluating the work of younger academics.

Several previous articles have explored correlations between citations and downloads. Examples include Moed (2005); Duy and Vaughan (2006); Wan et al. (2010); Coughlin and Jansen (2015); Gorraiz, Gumpenberger and Schlögl (2014); Moed and Halevi (2016); Vaughan, Tang and Yang (2017). Brody, Harnad and Carr (2006) examine the extent to which downloads from the physics e-print archive, arxiv.org, predict later citations of an article.
McDonald (2007) explores the ability of prior downloads at the California Institute of Technology (Caltech) to predict article citations by authors from Caltech. Most of these studies are limited to a small number of journals within a few narrowly defined disciplines. Our download data include downloads at the ten University of California campuses from more than 5,000 academic journals in a wide variety of academic disciplines. This rich source of data allows us to explore several interesting questions, including the following:

[1] See Coughlin, Campbell and Jansen (2013); Gallagher, Bauer and Dollar (2005).
[2] Gibson, Anderson and Tressler (2014); Ellison (2013).
[3] Hazelkorn (2015).
[4] A brief history of the science citation index and the impact factor appears in Garfield (2007).
[5] Kurtz and Bollen (2010) present a broad-ranging summary and history of the application of download information and other direct measures of journal usage.
[6] Kurtz et al. (2005) and Kurtz and Henneken (2017) demonstrate such analysis as applied to astrophysicists.

How do the average numbers of downloads and citations, and the ratio of downloads to citations, differ across research disciplines?

Do more prestigious journals differ from less prestigious journals in the ratio of citations to downloads?

Is the ratio of downloads to citations for journals consistent across publishers?

2 Data

Our data include numbers of downloads, citations, and articles per year for more than 5,000 scholarly and scientific journals. The citations data come from the website SCImago Journal & Country Rank, which records, for each journal in each year, the number of citations during that year to articles that were published in that journal within the preceding three years. We also obtained estimates of the number of articles published annually by each journal from the SCImago website.[7] The citations and article-count data cover the period 2010-2016.

Key to our analysis are the data on successful online full-text article requests (downloads) that we obtained from the ten-campus University of California library system. While most publishers supply their subscribing libraries with institution-specific data on downloads, restrictive clauses in publishing contracts typically forbid public access to this information. The University of California data are not subject to such restrictive clauses.

Publishers prepare download data according to guidelines set by COUNTER (Counting Online Usage of Networked Electronic Resources), a nonprofit organization set up by libraries, data vendors, and publishers to ensure that online usage statistics are comparable. Almost all publishers provide journal download data to their institutional subscribers at the COUNTER level known as Journal Report 1 (JR1), which reports the monthly number of downloads of all articles that have ever been published in a journal.
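The citable-document adjustment described in footnote 7 is a simple proportional scaling of SCImago's annual document counts. A minimal sketch, with all numbers invented for illustration:

```python
def citable_estimate(docs_year, docs_3yr, citable_3yr):
    """Scale a year's document count by the citable share over the
    adjacent three-year interval (the adjustment in footnote 7)."""
    return docs_year * citable_3yr / docs_3yr

# Invented example: 180 documents in the year; over the adjacent three
# years, 480 of 540 documents were citable.
print(citable_estimate(docs_year=180, docs_3yr=540, citable_3yr=480))  # 160.0
```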
A smaller number of publishers also provide data at the Journal Report 5 (JR5) level, which reports the number of downloads in the current year, while specifying the year in which each downloaded article was published. For example, the JR5 data for 2015 would report, for each publication year back to 2000, the number of downloads in 2015 of articles published in that year.

In this paper, we analyze University of California downloads from four large commercial publishers that publish across many disciplines (Elsevier, Springer, Taylor & Francis, and Wiley), one commercial publisher that specializes in the life and physical sciences (Nature Publishing Group, NPG), and two professional society publishers (the American Chemical Society, ACS, and the Institute of Electrical and Electronics Engineers, IEEE).[8] For each of these publishers, we have annual JR5 data on downloads that occurred in each of the years 2013 to 2016 of articles that were published in each year from 2000 to 2016. For three of the publishers we have additional data: downloads for 2012 for Elsevier, and downloads for 2011 and 2012 for Springer and Taylor & Francis. For each of the 5,423 journals offered by these publishers, we have download data from four to six years, giving us a total of 26,793 journal-year observations.

We use the California Digital Library's classification system to associate each journal with a broad research area and with a specialized discipline. Because some journals are rarely downloaded or cited, they have not been classified. After eliminating these low-use journals, our data set consists of 5,423 journals classified into one of four broad research fields: Arts and Humanities, Life and Health Sciences, Physical Sciences and Engineering, and Social Sciences. Within these broad areas, journals are partitioned into 163 specialized research fields.

Table 1 shows the distribution of journals by broad research field across publishers. As the table shows, each of the four large commercial publishers has a significant presence in all four research fields, while the other publishers have more limited scope.

Table 1: Number of Journals by Research Field and Publisher

                  Arts and     Life and      Physical Sci.   Social     Number of
                  Humanities   Health Sci.   and Engin.      Sciences   Journals
Elsevier              18          875            613            267        1773
Springer              30          517            476            178        1201
Taylor & Francis      94          107            171            546         918
Wiley                 59          541            247            422        1269
ACS                    0            9             35              0          44
IEEE                   0            2            140              3         145
NPG                    0           65              8              0          73
All Publishers       201         2116           1690           1416        5423

Note: Statistics are for the universe of unique journals in our dataset.

[7] SCImago reports annual numbers of documents, which include not only articles but also book reviews, letters to the editor, and opinion pieces. SCImago also reports three-year totals of both the number of documents and the number of citable documents. We estimate annual numbers of citable documents by multiplying the number of documents reported for a year by the ratio of citable documents to documents for the adjacent three-year interval.
[8] For NPG we exclude the journal Nature, due to its broader, general-interest readership.

3 Patterns of Downloads and Citations by Field and Publisher

Because our download and citation data are compiled at the journal level, we account for differences in the number of articles per journal. For each journal in our dataset and for each year in which we have JR5 download data, we find the total number of University of

California downloads of articles published in the current year and the previous two years. We divide by the number of articles published in that journal during this period. We call this ratio the number of UC downloads per recent article for the year in which the downloads take place.

The number of citations per recent article is more commonly known as the journal's impact factor for the citation year.[9] It is the number of citations to articles published in the three previous years, divided by the number of articles published in that period.[10]

Table 2 reports the mean, median, 75th percentile, and 90th percentile of the number of UC downloads per recent article, the number of citations per recent article, and the ratio of recent UC downloads to recent citations for the journals in our sample, classified by broad research area.[11] Table 3 reports these same basic features by journal publisher.

[9] A brief history of the science citation index and the impact factor appears in Garfield (2007). Research on the use of citations is surveyed by Bornmann and Daniel (2008).
[10] Note that to calculate recent downloads, we sum downloads of items published in three years (the year of downloading and the two previous years), while the impact factor sums citations in the citing year to items published in the three years prior to the year in which the citing occurred.
[11] The mean for the ratio is the mean of the ratio of downloads to citations, not the ratio of the mean of downloads to the mean of citations.

Table 2: Downloads and Citations per Recent Article by Research Field

                         Mean    Median    P75     P90
Arts and Humanities
  UC Downloads            4.8      3.3      6.2    10.4
  Citations               1.8      1.2      2.4     4.1
  Ratio                   5.90     2.75     5.52   12.43
Life Sciences
  UC Downloads           12.8      6.0     11.5    21.5
  Citations               8.6      6.5     10.2    15.6
  Ratio                   1.84     1.00     1.64    2.78
Physical Sciences
  UC Downloads            5.3      2.6      5.3     9.7
  Citations               6.9      5.0      8.3    12.8
  Ratio                   0.92     0.55     0.92    1.50
Social Sciences
  UC Downloads            5.6      3.3      7.0    13.3
  Citations               4.3      3.1      5.5     8.8
  Ratio                   2.15     1.12     2.18    4.07
All Fields Combined
  UC Downloads            8.2      3.9      8.1    15.4
  Citations               6.7      4.8      8.3    12.8
  Ratio                   1.79     0.85     1.59    3.02

Source: 2011-2016 JR5 download reports for the University of California

Table 3: Downloads and Citations by Publisher

                         Mean    Median    P75     P90
ACS
  UC Downloads           14.8     11.4     16.8    29.2
  Citations              18.7     14.2     18.9    36.9
  Ratio                   1.04     0.70     1.00    1.73
Elsevier
  UC Downloads           12.6      6.8     13.0    23.3
  Citations               8.7      6.8     10.4    15.4
  Ratio                   2.13     1.11     1.82    3.07
IEEE
  UC Downloads            5.1      4.0      6.6     9.5
  Citations              10.5      9.0     13.1    18.8
  Ratio                   0.78     0.42     0.77    1.42
NPG
  UC Downloads          107.9     39.8    191.9   288.4
  Citations              34.4     22.3     46.4    79.6
  Ratio                   2.63     2.29     3.19    4.40
Springer
  UC Downloads            4.8      3.0      6.2    10.4
  Citations               4.9      3.9      6.8     9.9
  Ratio                   1.35     0.78     1.36    2.46
Taylor & Francis
  UC Downloads            2.8      1.8      3.7     6.6
  Citations               3.1      2.4      3.8     5.7
  Ratio                   2.36     0.73     1.87    4.59
Wiley
  UC Downloads            5.9      3.6      7.4    13.1
  Citations               7.2      5.7      8.9    13.7
  Ratio                   1.27     0.71     1.24    2.36
All Publishers Combined
  UC Downloads            8.2      3.9      8.1    15.4
  Citations               6.7      4.8      8.3    12.8
  Ratio                   1.79     0.85     1.59    3.02

Source: 2011-2016 JR5 download reports for the University of California

Across all categories, the means of citations and downloads exceed the medians. This reflects the fact that some journals have extraordinarily high levels of downloads and citations per recent article. The tables also reveal that the numbers of recent downloads per article and recent citations per article differ substantially across fields of research and between publishers. The life sciences have higher levels of both downloads and citations per recent article, and, among the four large commercial publishers, Elsevier has the highest downloads and citations per recent article.

The ratio of downloads (per recent article) to citations (per recent article) is a measure that is invariant to the number of articles. The magnitude of these ratios depends on the fact that we use downloads from University of California campuses only, while our citation measure counts citations from researchers at all institutions worldwide. (Moed and Halevi (2016) find that when both downloads and citations are attributable to the same research institutions, downloads have a much larger mean, but are less skewed, than citations.)

Table 2 shows that for Arts and Humanities journals, the ratios of downloads to citations are much higher than for journals in the other three categories. This suggests that in evaluating library subscriptions, the use of citation rates alone may undervalue journals in the arts and humanities relative to other fields. For the life sciences, the ratio of downloads to citations is slightly higher than the average for all fields. For the social sciences this ratio is close to the average, and for the physical sciences it is lower than average.

Tables 2 and 3 indicate that the prestige of a journal, as measured by its number of citations per article, may affect its ratio of downloads to citations. These tables show the ratio of downloads to citations at three quantiles of the distribution, where the 90th percentile contains the most prestigious journals. Table 2 indicates that for all four broad categories, prestige generally leads to a higher ratio of downloads to citations. Table 3 reports the prestige effect by publisher. Again, the ratio of downloads to citations appears to increase with the prestige level of journals.

Table 3 also shows substantial differences among publishers in the ratio of UC downloads to citations. The mean ratio of downloads to citations for the Nature Publishing Group is more than twice that of Elsevier, which in turn has a much higher ratio than the other large commercial publishers with multidisciplinary coverage. Significant differences in the ratio of downloads to citations across publishers give one reason to question whether the download statistics reported by publishers accurately reflect differences in usage, or whether they are somehow distorted by differences in the publishers' platforms for download access or other features of data reporting.

In subsequent discussion, we explore whether the observed cross-publisher differences

can be explained by the fact that publishers differ in the academic disciplines that they cover and in the relative prestige of the journals that they publish.

4 Estimating a function to predict downloads

Table 2 describes the behavior of downloads as a function of a single explanatory variable, citations, for each of four broad disciplinary categories. In order to investigate the relation of downloads to several variables simultaneously, it is useful to estimate a function that predicts the number of downloads from these variables. As we see from Table 2, the ratio of downloads to citations tends to be higher for relatively prestigious journals with high ratios of citations per article. This suggests that the number of downloads from a journal can be better predicted if one accounts for the number of articles in the journal as well as the number of citations. From Table 2 it is also apparent that the number of downloads from a journal depends not only on its number of citations and number of articles, but also on the academic discipline to which it is devoted. Since for each journal we have download data from each of several years, it is also appropriate to control for the year of download.

Having controlled for a journal's citations, impact factor, academic discipline, and year of download, we might expect that the identity of the journal's publisher would have little or no effect on the predicted number of downloads. To determine whether this is the case, we fit an equation that includes an indicator variable for each publisher.

The equation that we estimate thus includes the following variables. Let D_jy be the number of times in year y that University of California libraries downloaded articles that were published in journal j in year y and in the two years prior to year y. Let A_jy be the number of articles published in journal j during that three-year window. Let C_jy be the number of times that articles published in journal j in the previous three years were cited in year y. We assign indicator variables for the academic discipline to which a journal is assigned, the year in which downloads are recorded, and the journal's publisher. We then employ maximum likelihood procedures to estimate a function that predicts downloads and takes the form

    E(D_jy) = A_jy^α C_jy^β F_j Y_y P_j                                   (1)

where F_j, P_j, and Y_y are multiplicative factors corresponding, respectively, to the journal's discipline, its publisher, and the year of download. (Appendix 1 presents formal details of our estimation procedure.)

We can rewrite Equation 1 to show explicitly the separate effects of citations per article (the impact factor) and of the number of articles (the size of the journal) on the number of downloads.

Equation 1 is equivalent to

    E(D_jy) = A_jy^(α+β) (C_jy / A_jy)^β F_j Y_y P_j.                     (2)

We use maximum likelihood methods, as described in the Appendix of this paper, to estimate the parameters α + β and β and the coefficients Y_y, F_j, and P_j corresponding to the indicator variables for year of download, journal discipline, and journal publisher. For each journal we have between four and six observations, corresponding to downloads in different years. We estimate standard errors using cluster-robust methods to account for within-journal correlation.[12]

5 Results

Table 4 reports estimates of some of the parameters of Equation 2. These include the coefficient α + β, which measures the elasticity of downloads with respect to the number of articles, holding the impact factor constant, and the coefficient β, which measures the responsiveness of downloads to the impact factor, holding the number of articles constant.

The second column of Table 4 reports coefficient estimates with an indicator variable for the broad disciplinary category to which the journal belongs. (These coefficients are normalized to express their ratio to that of social science.) The third column of Table 4 reports estimates when indicator variables are used for each of 163 narrowly defined fields. Listings of these 163 fields, classified by broad disciplinary area, appear in Tables 10-13 of the Appendix, where the coefficients of the indicator variables for these disciplines also appear.

The estimates shown in Table 4 are constructed under the assumption that the coefficients β and α + β, measuring the effects of impact factor and scale of a journal, and the coefficients P_j, measuring the publisher effect, are the same across all disciplines. Table 5 shows results when we relax this assumption by fitting separate equations for each of the four broad disciplinary categories.
[12] When these results are compared with robust standard errors that account only for heteroskedasticity, we find that the cluster-robust standard errors are about twice the estimates found without accounting for within-journal correlation.
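The multiplicative form of Equation 2 can be sketched directly as code. The actual estimation uses maximum likelihood as detailed in the paper's appendix; the sketch below only evaluates the fitted functional form, with invented parameter values rather than the paper's estimates, to illustrate the elasticity interpretation of β.

```python
def predicted_downloads(articles, citations, alpha_plus_beta, beta,
                        field_factor=1.0, year_factor=1.0, publisher_factor=1.0):
    """Equation (2): E(D) = A^(alpha+beta) * (C/A)^beta * F * Y * P."""
    impact_factor = citations / articles  # citations per recent article
    return (articles ** alpha_plus_beta) * (impact_factor ** beta) \
        * field_factor * year_factor * publisher_factor

# Elasticity check with invented parameters: holding articles fixed,
# a 1% rise in citations raises predicted downloads by about beta percent.
base = predicted_downloads(articles=100, citations=300, alpha_plus_beta=0.9, beta=1.1)
bumped = predicted_downloads(articles=100, citations=303, alpha_plus_beta=0.9, beta=1.1)
print(round(100 * (bumped / base - 1), 2))  # close to 1.1
```

Because the model is log-linear, each multiplicative factor (field, year, publisher) shifts predicted downloads proportionally without changing the elasticities.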

Table 4: Effect of Journal Characteristics on Downloads

                                Broad Cat.   Fine Cat.
Impact Factor (β)                 1.179        1.082
                                 (0.102)      (0.054)
Articles (α + β)                  0.877        0.899
                                 (0.030)      (0.027)
Arts and Humanities               2.154
                                 (0.364)
Life and Health Sciences          0.968
                                 (0.056)
Physical Sciences and Engin.      0.519
                                 (0.037)
Social Sciences                   1
                                  (.)
ACS                               0.988        0.875
                                 (0.148)      (0.095)
Elsevier                          1            1
                                  (.)          (.)
IEEE                              0.505        0.573
                                 (0.052)      (0.050)
NPG                               1.636        1.587
                                 (0.260)      (0.189)
Springer                          0.616        0.618
                                 (0.030)      (0.028)
Taylor & Francis                  0.574        0.457
                                 (0.058)      (0.028)
Wiley                             0.535        0.517
                                 (0.039)      (0.026)
R²                                0.834        0.876
Number of Observations            26793        26793

Note: Standard errors clustered at the journal level are reported in parentheses.

Table 5: Separate Equations by Broad Category

                    Arts and     Life and      Physical Sci.   Social
                    Humanities   Health Sci.   and Engin.      Sciences
Impact Factor (β)     0.327        1.209         0.929           0.655
                     (0.049)      (0.061)       (0.052)         (0.058)
Articles (α + β)      0.955        0.864         0.937           0.903
                     (0.077)      (0.032)       (0.030)         (0.034)
ACS                                              1.106
                                                (0.083)
                                                [139]
Elsevier              1            1             1               1
                      (.)          (.)           (.)             (.)
                     [90]         [4,299]       [3,000]         [1,318]
IEEE                                             0.641
                                                (0.050)
                                                [537]
NPG                                1.400
                                  (0.152)
                                  [240]
Springer              0.824        0.519         0.845           0.755
                     (0.146)      (0.031)       (0.060)         (0.056)
                     [180]        [3,030]       [2,799]         [1,062]
Taylor & Francis      0.474        0.455         0.480           0.363
                     (0.068)      (0.051)       (0.047)         (0.025)
                     [530]        [637]         [999]           [3,156]
Wiley                 0.628        0.406         0.851           0.527
                     (0.102)      (0.019)       (0.076)         (0.031)
                     [216]        [2,131]       [930]           [1,500]
R²                    0.653        0.874         0.882           0.811
Number of Obs.        1016         10337         8404            7036

Note: Standard errors clustered at the journal level are reported in parentheses. The number of journal-year observations for each publisher appears in brackets.
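Because the fitted model is multiplicative, each publisher coefficient acts as a proportional shift: holding articles, impact factor, field, and year fixed, a Springer coefficient of 0.519 in the life and health sciences means predicted downloads of about 52% of those for a comparable Elsevier journal (the normalized baseline). A sketch using the Table 5 life-science coefficients, with a hypothetical baseline download count:

```python
# Publisher multipliers from the Life and Health Sciences column of Table 5;
# Elsevier is the normalized baseline (coefficient 1).
publisher_factor = {"Elsevier": 1.0, "NPG": 1.400, "Springer": 0.519,
                    "Taylor & Francis": 0.455, "Wiley": 0.406}

baseline = 1000  # hypothetical predicted downloads for an Elsevier journal
for pub, f in sorted(publisher_factor.items(), key=lambda kv: -kv[1]):
    print(f"{pub}: {baseline * f:.0f} predicted downloads")
```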

5.1 The effects of impact factor and number of articles

The coefficient β represents our estimate of the elasticity of the number of downloads of a journal with respect to its impact factor, holding the number of articles in the journal constant. Thus, holding the number of articles constant, a 1% increase in impact factor would result in a β% increase in downloads. Since the impact factor is the ratio of the number of citations to the number of articles, a 1% increase in the impact factor, holding articles constant, is equivalent to a 1% increase in citations. Thus we can also interpret β as an estimate of the elasticity of downloads with respect to citations.

In Table 4, we see that in both regressions, with broad and with fine categories, the estimates for β are slightly greater than, but not statistically significantly different from, unity. This suggests that if a journal holds its number of articles constant but experiences a 1% increase in citations, its expected number of downloads will also increase by about 1%.

In Table 5, where the parameters β and α + β are allowed to differ among broad categories, a slightly different picture emerges. The elasticity β of downloads with respect to impact factor is approximately unity for the physical sciences and engineering, but it is much smaller for the arts and humanities and for the social sciences, and significantly greater than unity for the life and health sciences. Figure 1 plots the predicted relation between impact factor and downloads for each of the four broad disciplinary categories, controlling for the number of articles, the publisher, and the date of download.
We see that for journals with relatively low impact factors, journals in the arts and humanities and the social sciences have more downloads per citation than journals in the life and health sciences and in the physical sciences and engineering, while for journals with relatively high impact factors this relation is reversed.

[Figure 1: Relationship between Downloads and Impact Factor. Predicted downloads (log scale, 10 to 10,000) plotted against impact factor (log scale, 0.1 to 10) for each of the four broad categories.]

The coefficient α + β represents the elasticity of a journal's downloads with respect to the number of articles it contains, holding the journal's impact factor constant: a 1% increase in the number of articles is predicted to result in an (α + β)% increase in downloads. Tables 4 and 5 both show estimates of α + β that are slightly less than one for all broad disciplinary categories, indicating that a journal that expands its number of articles by 1% while holding its impact factor constant can expect its downloads to increase by slightly less than 1%.

5.2 Explaining the variation in downloads

Table 6 shows the progression of the fraction of observed variation in downloads that is explained as variables are sequentially added to Equation 1. Each column reports the R² when the variables marked X are included as explanatory variables. If citations alone are used to predict downloads, variation in the number of recent citations to a journal is sufficient to account for about 75% of the variation in downloads about the mean. If, in addition, we account for journal impact factor by including a journal's number of articles as well as its number of citations, then about 77% of the variation is accounted for. Adding an indicator variable for the journal's discipline improves the R² to about 81% if broad categories are used, and 85% if fine categories are used. Including an indicator variable for publisher improves the R² to 88%, and interacting the effects of broad category with the other variables raises the overall R² to 89%.

Table 6: Progression of R² as variables are added

R²             0.748  0.768  0.772  0.807  0.852  0.876  0.886
Citations        X      X      X      X      X      X      X
Articles                X      X      X      X      X      X
Download Year                  X      X      X      X      X
Broad Cat.                            X                    X
Fine Cat.                                    X      X      X
Publisher                                           X      X
Interaction                                                X
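The sequential-specification exercise behind Table 6 can be sketched in code. The following toy example uses simulated data and ordinary least-squares R² as a stand-in for the paper's measure; all variable names and numbers are illustrative, not the paper's data:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Simulated journal-year data (illustrative only).
log_citations = rng.normal(size=n)
log_articles = rng.normal(size=n)
discipline = rng.integers(0, 4, size=n)          # four broad categories
log_downloads = (1.0 * log_citations + 0.9 * log_articles
                 + 0.3 * discipline + rng.normal(scale=0.5, size=n))

def r_squared(X, y):
    """OLS R-squared for design matrix X (intercept added automatically)."""
    X = np.column_stack([np.ones(len(y)), X])
    resid = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return 1 - resid.var() / y.var()

dummies = np.eye(4)[discipline][:, 1:]           # category indicators, one omitted

# Nested specifications, mirroring the column structure of Table 6.
specs = {
    "citations": np.column_stack([log_citations]),
    "+ articles": np.column_stack([log_citations, log_articles]),
    "+ broad category": np.column_stack([log_citations, log_articles, dummies]),
}
for name, X in specs.items():
    print(f"{name:>18}: R^2 = {r_squared(X, log_downloads):.3f}")
```

As in Table 6, the explained fraction rises monotonically as regressors are added, because each specification nests the previous one.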

5.3 The effect of download year

Table 7 shows the year-of-download coefficients from the estimating equations for each of the four broad disciplinary categories. The row for each download year reports the multiplicative factor for that year. The year 2013 is selected as the base year because we do not have data for all publishers in 2011 and 2012.^13 Changes over the first two years therefore reflect not only trends in downloading but also the changing composition of our sample. We also estimate a specification that replaces the multiplicative factors with a linear time trend; the estimated trend coefficient is reported in the last row. There appears to have been a substantial increase in downloading for journals in the life and health sciences. The other categories show modest growth, except for the physical sciences and engineering, where the download coefficient remained roughly constant from 2013 to 2016.

Table 7: Effect of Download Year

Download     Arts and     Life and Health  Physical Sciences  Social
Year         Humanities   Sciences         and Engineering    Sciences
2011         0.978        0.810            0.753              0.914
             (0.102)      (0.031)          (0.052)            (0.035)
2012         1.122        0.900            0.800              1.259
             (0.107)      (0.016)          (0.042)            (0.039)
2013         1            1                1                  1
             (.)          (.)              (.)                (.)
2014         1.308        0.986            0.948              1.192
             (0.134)      (0.015)          (0.062)            (0.044)
2015         1.035        1.005            0.869              1.182
             (0.095)      (0.016)          (0.053)            (0.043)
2016         1.245        1.274            0.977              1.393
             (0.123)      (0.040)          (0.066)            (0.052)
Average
annual       3.33%        7.7%             1.6%               5.5%
growth rate  (1.62)       (0.85)           (1.44)             (0.81)

^13 Our data for 2011 include only Springer and Taylor & Francis. For 2012, we have data from Springer, Taylor & Francis, and Elsevier. From 2013 onward we have data for all seven publishers.
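The trend row of Table 7 can be roughly cross-checked from the year coefficients themselves by regressing the log of each multiplicative factor on the year. This is only a back-of-envelope check, since the paper fits the trend inside the full model, so the numbers will differ somewhat:

```python
import numpy as np

# Year-of-download multiplicative factors from Table 7 (2013 = 1 is the base).
years = np.array([2011, 2012, 2013, 2014, 2015, 2016])
factors = {
    "Life and Health Sciences": [0.810, 0.900, 1.000, 0.986, 1.005, 1.274],
    "Social Sciences": [0.914, 1.259, 1.000, 1.192, 1.182, 1.393],
}
for category, f in factors.items():
    slope = np.polyfit(years, np.log(f), 1)[0]    # log-linear trend
    print(f"{category}: {100 * (np.exp(slope) - 1):.1f}% per year")
```

For the life and health sciences this simple fit gives about 7.7% per year, matching the reported trend estimate closely; for categories with noisier year effects the two calculations diverge more.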

5.4 The effect of academic discipline

Table 8 records discipline effects on downloads for a sample of academic disciplines. The second column of this table shows a simple ratio of downloads to citations for each discipline. The third column shows the coefficient F_j on an indicator for discipline j when fitting Equation 2 using fine categories to denote fields; these coefficients are normalized so that the mean coefficient for all journals is 1. The fourth column is the coefficient for each discipline when we fit the same equation but allow the parameters α and α + β to differ among the four broad categories, again normalized relative to the mean coefficient for all journals. The table highlights the importance of controlling for differences among the broad categories: while arts and humanities journals have larger ratios of downloads to citations, once we control for the differing relationships between citations and downloads, these large differences disappear.

5.5 The effect of journal publishers

Libraries do not, in general, maintain their own download counts. This information is collected and supplied to libraries, usually on a confidential basis, by the journals' publishers. Since the collection and distribution of these statistics are not managed in a centralized and transparent way, it is reasonable to ask whether publisher-supplied data can be reliably compared across publishers. Some librarians (Li and Wilson, 2015) have expressed concern that different publisher platforms record downloads in different ways. For example, some platforms may make it more likely that a user downloads both a PDF copy and an HTML copy of the same paper, thus counting two downloads for a single usage. The University of California has Big Deal subscriptions to all of the journals published by each of the seven publishers treated here.
If the relation between recorded downloads and actual usage were the same across publishers, then after controlling for journal characteristics such as citations, number of articles, and academic discipline, the identity of the publisher should have little or no effect on the number of downloads at the University of California. The second column of Table 9 compares the ratio of downloads to citations for each publisher relative to that of Elsevier.^14 Here we see dramatic differences among publishers: Nature Publishing Group's ratio is more than twice that of Elsevier, and Elsevier's ratio is much higher than those of the other publishers.

Of course, differences in the ratio of downloads to citations among publishers do not necessarily imply differences in the way that publishers report data. Some of these differences can be explained by the fact that publishers differ in the distribution of disciplines that they cover, and that ratios of downloads to citations differ between disciplines.^15

Table 8: Coefficients for Selected Disciplines

                                  Ratio:        Intercept:    Intercept:
                                  Downloads to  Relative to   Relative to Own
                                  Citations     All Journals  Broad Category
Arts and Humanities               2.18          2.11          1.32
  Dance                           11.3          22.1          1.60
  Literature                      4.03          4.52          0.78
  Music                           7.49          19.4          1.81
  Philosophy                      1.61          3.35          0.57
Life and Health Sciences          0.67          0.95          1.06
  Biology                         0.91          1.76          2.47
  Medicine                        0.84          1.21          1.67
  Oncology                        0.91          0.91          1.26
  Pharmacy & Pharmacology         0.54          0.77          1.06
Physical Sciences & Engin.        0.34          0.51          0.45
  Chemical Engineering            1.14          0.36          0.71
  Chemistry                       0.36          0.83          1.51
  Computer Science                0.50          0.38          0.78
  Electrical Engineering          0.47          0.59          1.17
  Mathematics                     0.66          0.82          1.24
  Mechanical Engineering          0.65          0.41          0.84
  Physics                         0.40          0.58          1.07
Social Sciences                   0.81          0.98          1.51
  Economics                       0.55          0.91          0.32
  Education                       0.68          1.00          0.53
  History                         3.57          3.93          1.01
  Law                             1.42          1.57          0.56
  Library & Information Science   0.71          0.43          0.24
  Political Science               1.97          2.09          0.90
  Psychology                      0.73          1.36          0.72

Table 2 shows that, for journals in all areas except the arts and humanities, journals with high impact factors have substantially higher ratios of downloads to citations than those with low impact factors. Thus the distribution of quality (as measured by impact factor) across a publisher's portfolio will also affect the publisher's average ratio of downloads to citations.

The estimates of Equation 2 found in Table 4 represent our efforts to control for disciplinary specialization, impact factor, size of journal, and date of download. The third and fourth columns of Table 9 show our estimates of the publisher effects that remain after controlling for these factors. The third column reports results when we control only for the four broad categories, while the fourth column reports results when we allow separate effects for each of 163 narrowly defined academic disciplines. The final four columns of the table compare publisher effects relative to that of Elsevier when we fit separate equations for each of the four broad categories while including indicator variables for each of the 163 narrowly defined disciplines.

Table 9: Estimated Publisher Effects Normalized Relative to Elsevier

                  Simple  Broad  Fine   Arts &  Life &      Physics &    Social
                  Ratio   Cat.   Cat.   Hum.    Health Sci  Engineering  Science
NPG               2.15    1.64   1.54           1.40
Elsevier          1       1      1      1       1           1            1
ACS               0.55    1.00   0.88                       1.11
IEEE              0.32    0.51   0.57                       0.64
Springer          0.67    0.62   0.62   0.82    0.52        0.85         0.76
Taylor & Francis  0.63    0.57   0.46   0.47    0.41        0.85         0.53
Wiley             0.54    0.54   0.52   0.63    0.41        0.85         0.53

Table 9 shows that when the effects of variables such as discipline and impact factor are accounted for, the publisher effect for Nature Publishing Group relative to Elsevier falls from 2.15 to 1.40, and the difference in publisher effect between Elsevier and the American Chemical Society becomes insignificant.^16 However, even after controlling for other variables, the three other broad-based commercial publishers and IEEE report substantially fewer downloads than does Elsevier.

^14 These figures are obtained from Table 3 by dividing each publisher's downloads-to-citations ratio by that of Elsevier.
^15 As Table 1 shows, NPG (Nature Publishing Group) specializes in the life and health sciences, while ACS (American Chemical Society) and IEEE (Institute of Electrical and Electronics Engineers) specialize in the physical sciences and engineering. Table 1 also shows that the broad-based commercial publishers Elsevier, Springer, Taylor & Francis, and Wiley differ markedly in disciplinary coverage. We see from Table 2 that the ratio of downloads to citations differs substantially across disciplines.

6 Conclusion

This paper originated as an exploration of the relation between journal downloads and journal citations.
If the number of times that a journal is downloaded by a library's patrons accurately measures usage, then there is a strong case that libraries should use download data in addition to, or perhaps instead of, citation data when deciding how to allocate their subscription expenditures among journals.

We found a substantial correlation between citations and reported downloads, with an R² of about 0.75 in a simple regression. But we also saw that the ratio of downloads to citations varies with other observable journal characteristics. In particular, this ratio varies substantially between disciplines, and it is higher for journals that are more prestigious as measured by impact factor.

However, our estimates uncovered a disconcerting dependence of journal downloads on the journal's publisher. This dependence persists when we control for academic discipline and for impact factor. When we fit an estimating function that controls for these variables, the numbers of recorded downloads from Nature Publishing Group, Elsevier, and the American Chemical Society are significantly greater than the corresponding numbers for journals published by the other four publishers. We see from the second column of Table 4 that, controlling for citations, impact factor, and journal discipline, journals published by Elsevier have reported downloads almost twice as high as those published by Springer, Wiley, Taylor & Francis, and IEEE, while Nature Publishing Group journals have about one and a half times as many reported downloads as Elsevier. In Table 5, where we run separate regressions for the four broad disciplines, these differences persist, although they are somewhat narrower for journals in the physical sciences and engineering.

We are left with a mystery. Why should the name of a journal's publisher have an independent effect on the number of times that its articles are downloaded?

^16 Eighteen of the seventy-three journals published by NPG belong to the Nature Reviews series, which concentrates on review articles. We ran separate estimates that allowed a distinct coefficient for the Nature Reviews journals. That coefficient was slightly higher than the coefficient for the other NPG journals, which was not significantly changed.
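If the publisher effects in Table 9 are treated as reporting artifacts rather than real usage differences, a natural correction is to deflate each journal's reported downloads by its publisher's estimated coefficient. A minimal sketch, using Table 9's fine-category column; the journal records themselves are hypothetical:

```python
# Deflate reported downloads by the estimated publisher effect
# (Table 9, fine-category column, normalized so Elsevier = 1).
publisher_effect = {
    "NPG": 1.54, "Elsevier": 1.00, "ACS": 0.88, "IEEE": 0.57,
    "Springer": 0.62, "Taylor & Francis": 0.46, "Wiley": 0.52,
}

# Hypothetical journal records, for illustration only.
journals = [
    {"title": "Journal A", "publisher": "Elsevier", "downloads": 10_000},
    {"title": "Journal B", "publisher": "Springer", "downloads": 6_200},
]
for j in journals:
    adjusted = j["downloads"] / publisher_effect[j["publisher"]]
    print(f'{j["title"]}: adjusted usage {adjusted:,.0f}')
```

Under this adjustment the two hypothetical journals have equal estimated usage, even though the Springer title's reported download count is 38% lower.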
In principle, it is possible that some characteristics of journals that we have not accounted for differ significantly among publishers and influence the number of downloads sufficiently to produce the large publisher effects that we observe. An alternative hypothesis is that different publishers record downloads in different ways. The number of downloads may depend on the nature of a publisher's platform, or it may be that some publishers' papers are more frequently downloaded from sources not counted by the publisher.

If librarians are to use download data to evaluate journals when making subscription decisions, it is important that they be able to compare the offerings of different publishers using a metric that treats all publishers equally. If the publisher effect that we find in reported downloads is not related to actual usage, then in comparing the usage value of journals from different publishers it may be appropriate to weight the downloads from different publishers differently. For example, if we assume that the publisher effects in Table 9 are due to the way publishers record downloads and not to actual usage, then an appropriate measure of usage would weight reported downloads with weights inversely proportional to the coefficients found in Table 9.

Currently, download data are collected by publishers and reported to subscribing libraries, often subject to a confidentiality clause that prevents libraries from sharing the data. If download records are to become a reliable tool for estimating usage, it might be appropriate for libraries to develop a uniform interface for downloading articles from all publishers, and to maintain their own records of journal downloads, which they would share as public information.

References

Althouse, Benjamin M., Jevin D. West, Carl T. Bergstrom, and Theodore Bergstrom. 2009. Differences in impact factor across fields and over time. Journal of the American Society for Information Science and Technology, 60(1): 27-34.

Anauati, Victoria, Sebastian Galiani, and Ramiro H. Gálvez. 2016. Quantifying the Life Cycle of Scholarly Articles across Fields of Economic Research. Economic Inquiry, 54(2): 1339-1355.

Bergstrom, Carl T., Jevin D. West, and Marc A. Wiseman. 2008. The Eigenfactor Metrics. Journal of Neuroscience, 28(45): 11433-11434.

Bollen, Johan, Herbert Van de Sompel, Joan A. Smith, and Rick Luce. 2005. Toward Alternative Metrics of Journal Impact: A Comparison of Download and Citation Data. Information Processing & Management, 41(6): 1419-1440.

Bornmann, Lutz, and Hans-Dieter Daniel. 2008. What do citation counts measure? A review of studies on citing behavior. Journal of Documentation, 64(1): 45-80.

Bouabid, Hamid. 2011. Revisiting citation aging: a model for citation distribution and life-cycle prediction. Scientometrics, 88: 199-211.

Brody, Tim, Stevan Harnad, and Leslie Carr. 2006. Earlier Web usage statistics as predictors of later citation impact. Journal of the American Society for Information Science and Technology, 57(8): 1060-1072.

Card, David, and Stefano DellaVigna. 2013. Nine Facts about Top Journals in Economics. Journal of Economic Literature, 51(1): 144-161.

Coughlin, Daniel M., and Bernard J. Jansen. 2015. Modeling journal bibliometrics to predict downloads and inform purchase decisions at university research libraries. Journal of the Association for Information Science and Technology.

Coughlin, Daniel M., Mark C. Campbell, and Bernard J. Jansen. 2013. Measuring the value of library content collections. Proceedings of the American Society for Information Science and Technology, 50(1): 1-13.

Duy, Joanna, and Liwen Vaughan. 2006. Can electronic journal usage data replace citation data as a measure of journal use? An empirical examination. The Journal of Academic Librarianship, 32(5): 512-517.

Ellison, Glenn. 2013. How Does the Market Use Citation Data? The Hirsch Index in Economics. American Economic Journal: Applied Economics, 5(3): 63-90.

Galiani, Sebastian, and Ramiro H. Gálvez. 2017. The life cycle of scholarly articles across fields of research. National Bureau of Economic Research working paper.

Gallagher, John, Kathleen Bauer, and Daniel M. Dollar. 2005. Evidence-based librarianship: Utilizing data from all available sources to make judicious print cancellation decisions. Library Collections, Acquisitions, and Technical Services, 29(2): 169-179.

Garfield, Eugene. 2007. The evolution of the Science Citation Index. International Microbiology, 10: 65-69.

Gibson, John, David L. Anderson, and John Tressler. 2014. Which Journal Rankings Best Explain Academic Salaries? Evidence from the University of California. Economic Inquiry, 52(4): 1322-1340.

Gorraiz, Juan, Christian Gumpenberger, and Christian Schlögl. 2014. Usage versus citation behaviours in four subject areas. Scientometrics, 101(2): 1077-1095.

Gould, William. 2011. Use Poisson Rather than Regress, Tell a Friend. The Stata Blog.

Hazelkorn, Ellen. 2015. Rankings and the Reshaping of Higher Education. Palgrave Macmillan.

Kurtz, Michael J., and Edwin A. Henneken. 2017. Measuring metrics: a 40-year longitudinal cross-validation of downloads and peer review in astrophysics. Journal of the Association for Information Science and Technology, 68(3): 695-708.

Kurtz, Michael J., and Johan Bollen. 2010. Usage bibliometrics. Annual Review of Information Science and Technology, 44(1): 1-64.

Kurtz, Michael J., Gunther Eichhorn, Alberto Accomazzi, and Stephen S. Murray. 2005. The effect of use and access on citation. Information Processing & Management, 41(6): 1395-1402.

Li, Chan, and Jacqueline Wilson. 2015. Inflated Journal Value Rankings: Pitfalls You Should Know about HTML and PDF Usage. Slides for talk delivered at the American Library Association Annual Conference.

McDonald, John D. 2007. Understanding journal usage: A statistical analysis of citation and use. Journal of the American Society for Information Science and Technology, 58(1): 39-50.

Moed, Henk F. 2005. Statistical relationships between downloads and citations at the level of individual documents within a single journal. Journal of the American Society for Information Science and Technology, 56(10): 1088-1097.

Moed, Henk F., and Gali Halevi. 2015. Multidimensional assessment of scholarly research impact. Journal of the Association for Information Science and Technology, 66(10): 1988-2002.

Moed, Henk F., and Gali Halevi. 2016. On full text download and citation distributions in scientific-scholarly journals. Journal of the Association for Information Science and Technology, 67(2): 412-431.

Perneger, Thomas V. 2004. Relation between online hit counts and subsequent citations: prospective study of research papers in the BMJ. BMJ, 329(7465): 546-547.

Vaughan, Liwen, Juan Tang, and Rongbin Yang. 2017. Investigating disciplinary differences in the relationships between citations and downloads. Scientometrics, 111(3).

Wan, Jin-kun, Ping-huan Hua, Ronald Rousseau, and Xiu-kun Sun. 2010. The journal download immediacy index (DII): experiences using a Chinese full-text database. Scientometrics, 82(3): 555-566.

West, Jevin, Theodore Bergstrom, and Carl T. Bergstrom. 2010. Big Macs and Eigenfactor scores: Don't let correlation coefficients fool you. Journal of the American Society for Information Science and Technology, 61(9): 1800-1807.

Wiersma, Gabriella. 2016a. Report of the ALCTS CMS Collection Evaluation and Assessment Interest Group Meeting. American Library Association Annual Conference, San Francisco, June 2015. Technical Services Quarterly, 33(2): 183-192.

Wiersma, Gabrielle. 2016b. Report of the ALCTS CMS Collection Evaluation & Assessment Interest Group Meeting. American Library Association Annual Conference, San Francisco, June 2015. Technical Services Quarterly, 33(2): 183-192.

A Statistical Methods

The number of downloads is a count variable taking non-negative integer values. Because count data are not continuous, the traditional approach of specifying the conditional mean of the variable of interest together with a normal error is not always the best approach. For the problem at hand, D_{j,y} has many small integer values, a large number of zeros, and a small number of very large counts (the source of the positive skewness in the downloads distribution), all of which suggest that the normal distribution is not appropriate. One common alternative is to transform the integer values (by taking the log of the variable of interest) into values that are well approximated by a normal distribution. Such an approach is not appealing here, because the log is not defined for the many observations that equal zero.

Instead, we model the distribution of downloads, conditional on the covariates x_{j,y}, as a Poisson random variable with distribution defined by

    P[D_{j,y} = k | x_{j,y}] = e^(−μ_{j,y}) μ_{j,y}^k / k!,    k = 0, 1, 2, ...    (3)

where μ_{j,y} depends on x_{j,y}. (The Poisson approximation is unlikely to work well for non-integer random variables, in particular for the ratio of downloads to citations.)

The key is to specify the relationship between μ_{j,y} and the covariates, for which a natural specification would be the linear form μ_{j,y} = x_{j,y}^T β. One feature of the Poisson distribution is that E[D_{j,y} | x_{j,y}] = μ_{j,y}; since downloads are non-negative, we require μ_{j,y} > 0. Unfortunately, the linear specification does not satisfy μ_{j,y} > 0 for all values of x_{j,y}^T β, so the common specification is μ_{j,y} = exp(x_{j,y}^T β). Thus

    E[D_{j,y} | x_{j,y}] = exp(x_{j,y}^T β).    (4)

The parameters are estimated via quasi-maximum likelihood. The density for an individual observation is

    f(D_{j,y} | x_{j,y}) = e^(−exp(x_{j,y}^T β)) (e^(x_{j,y}^T β))^(D_{j,y}) / D_{j,y}!    (5)

If we let the full set of observations be denoted (D, x) := {D_i, x_i^T}_{i=1}^n, the log likelihood is

    L(β | D, x) = Σ_{i=1}^n [ D_i x_i^T β − exp(x_i^T β) − log(D_i!) ],    (6)

with first-order conditions

    Σ_{i=1}^n [ D_i − exp(x_i^T β̂) ] x_i = 0,    (7)

where β̂ is the maximum likelihood estimator of β.^17 Although (7) does not have a closed-form solution, L is a concave function of β, and standard numerical optimization methods can be employed.

Under the Poisson distribution the mean equals the variance, a restriction that is unrealistic for downloads. Yet β̂ remains consistent for β even if this restriction is violated, as long as the conditional mean is correctly specified in (4).^18 More care needs to be taken in estimating the standard error of β̂. To produce consistent estimators of the standard errors we use the robust variance estimator

    V̂(β̂ | x) = ( Σ_{i=1}^n μ̂_i x_i x_i^T )^(−1) ( Σ_{i=1}^n (D_i − μ̂_i)^2 x_i x_i^T ) ( Σ_{i=1}^n μ̂_i x_i x_i^T )^(−1),    (8)

where μ̂_i = exp(x_i^T β̂).^19

B Coefficients for narrowly-defined disciplines

Tables 10-13 record discipline effects on downloads for each of the narrowly-defined disciplines within each of the four broadly-defined subject areas. The second column of each table shows the ratio of downloads to citations. The third column shows the coefficient F_j on an indicator for discipline j when fitting Equation 2; these coefficients are normalized so that the mean coefficient for all disciplines is 1. The fourth column is the coefficient for each discipline when we allow the parameters α and α + β to differ among the four broad categories, again normalized relative to the mean coefficient for all journals.

^17 Technically, β̂ is a quasi-maximum likelihood estimator, as (7) does not require that the Poisson distribution be correct.
^18 McDonald (2007) replaces the Poisson distribution with the negative binomial distribution, for which the mean need not equal the variance. While this relaxes a restriction of the Poisson model, it does so at the cost of losing consistency if the distribution is misspecified.
^19 Gould (2011) is a helpful guide for implementing this method in the software package Stata.
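The quasi-likelihood machinery of Appendix A can be sketched numerically. The following illustration uses simulated (overdispersed) data rather than the paper's data: Newton iterations solve the first-order condition (7), and the sandwich formula (8) gives the robust variance:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 2000, 3

# Simulated data with overdispersion (variance > mean), as with real downloads.
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 0.8, -0.5])
mu = np.exp(X @ beta_true + rng.normal(scale=0.5, size=n))  # extra noise
D = rng.poisson(mu)

# Newton's method for the Poisson quasi-likelihood first-order conditions:
# sum_i (D_i - exp(x_i' b)) x_i = 0.
b = np.zeros(k)
for _ in range(50):
    mu_hat = np.exp(X @ b)
    score = X.T @ (D - mu_hat)
    hessian = -(X * mu_hat[:, None]).T @ X
    step = np.linalg.solve(hessian, score)
    b -= step
    if np.max(np.abs(step)) < 1e-10:
        break

# Robust (sandwich) variance, equation (8): A^{-1} B A^{-1}.
mu_hat = np.exp(X @ b)
A = (X * mu_hat[:, None]).T @ X
B = (X * ((D - mu_hat) ** 2)[:, None]).T @ X
V = np.linalg.inv(A) @ B @ np.linalg.inv(A)
robust_se = np.sqrt(np.diag(V))
print("beta_hat:", b.round(3), "robust SE:", robust_se.round(3))
```

Because the simulated data are overdispersed, the robust standard errors exceed the naive Poisson ones, which is exactly the situation equation (8) is designed for; the slope estimates remain consistent because the conditional mean is correctly specified.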

Table 10: Discipline Effects for Arts and Humanities

                          Ratio:        Intercept:    Intercept:
                          Downloads to  Relative to   Relative to Own
                          Citations     All Journals  Broad Category
Arts and Humanities       2.18          2.11          1.32
Architecture              1.14          1.03          0.63
Dance                     11.3          22.1          1.60
Drama                     2.63          8.96          0.67
Film                      6.11          6.27          0.90
Fine Arts                 2.06          3.64          0.56
Languages                 2.83          2.39          0.66
Literature                4.03          4.52          0.78
Music                     7.49          19.4          1.81
Philology & Linguistics   1.09          1.76          0.59
Philosophy                1.61          3.35          0.57
Religion                  3.77          3.31          0.54
Visual Arts               4.56          7.54          0.80

Table 11: Discipline Effects for Life and Health Sciences

                                                  Ratio:        Intercept:    Intercept:
                                                  Downloads to  Relative to   Relative to Own
                                                  Citations     All Journals  Broad Category
Life and Health Sciences                          0.67          0.95          1.06
Agriculture                                       0.36          0.53          0.74
Alternative Medicine                              0.99          1.06          1.62
Anatomy                                           0.58          1.11          1.75
Animal Behavior                                   0.63          1.52          2.22
Animal Sciences                                   0.36          0.76          1.17
Bioethics                                         0.78          1.81          3.31
Biology                                           0.91          1.76          2.48
Biophysics                                        0.72          1.73          2.28
Botany                                            0.47          0.93          1.31
Cardiovascular Diseases                           5.32          0.86          1.22
Clinical Endocrinology                            0.62          0.78          1.12
Clinical Immunology                               0.63          0.85          1.17
Cytology                                          0.87          2.23          2.74
Dentistry                                         0.74          1.06          1.57
Dermatology                                       0.53          1.39          2.26
Diet & Clinical Nutrition                         0.56          0.79          1.16
Ecology                                           0.41          0.91          1.31
Emergency Medicine                                1.22          1.41          2.16
Food Science                                      0.24          0.40          0.56
Forestry                                          0.32          0.63          0.91
Gastroenterology                                  0.40          0.73          1.05
Genetics                                          1.38          1.07          1.45
Geriatrics                                        0.56          0.95          1.28
Gynecology & Obstetrics                           0.79          1.28          1.89
Hematologic Diseases                              0.60          0.85          1.28
Infectious Diseases                               0.72          0.91          1.27
Internal Medicine                                 0.56          1.01          1.50
Invertebrates & Protozoa                          0.45          0.77          1.21
Marine Science                                    0.50          0.93          1.35
Medical Research                                  0.57          0.77          1.20
Medicine                                          0.84          1.21          1.67
Microbiology & Immunology                         0.63          1.19          1.59
Musculoskeletal System Diseases                   1.63          0.95          1.47
Nephrology                                        1.40          0.48          0.73
Neurology                                         0.70          1.57          2.14
Neuroscience                                      1.03          1.59          2.16
Nursing                                           1.21          1.49          2.34
Occupational Therapy & Rehabilitation             1.05          0.96          1.78
Oncology                                          0.91          0.91          1.26
Ophthalmology & Optometry                         0.79          1.36          1.92
Otorhinolaryngology                               1.17          1.28          2.23
Pathology                                         1.75          0.86          1.23
Pediatrics                                        0.93          1.24          1.94
Pharmacy, Therapeutics, & Pharmacology            0.54          0.77          1.06
Physical Therapy                                  1.31          1.20          1.71
Physiology                                        0.55          0.86          1.23
Plant Physiology                                  0.37          0.99          1.59
Plant Sciences                                    0.38          0.68          1.04
Psychiatric Disorders, Individual                 0.57          0.93          1.26
Psychiatry                                        0.66          1.06          1.46
Psychotherapy                                     0.57          1.16          1.55
Public Health                                     0.87          1.06          1.52
Radiology, MRI, Ultrasonography & Medical Physics 0.70          1.01          1.55
Sciences                                          0.47          0.73          1.07
Surgery & Anesthesiology                          1.04          1.25          2.08
Surgery and By Type                               0.82          1.04          1.61
Urology                                           0.80          0.94          1.42
Vertebrates                                       0.71          1.43          2.43
Veterinary Medicine                               1.12          1.20          1.83
Zoology                                           0.56          0.89          1.37