Usage versus citation indicators

Christian Schloegl* & Juan Gorraiz**

* christian.schloegl@uni-graz.at, University of Graz, Institute of Information Science and Information Systems, Universitaetsstr. 15, A-8010 Graz, Austria
** juan.gorraiz@univie.ac.at, University of Vienna, Library and Archive Services, Bibliometrics Department, Boltzmanngasse 5, A-1090 Vienna, Austria

Background

Since the rise of electronic journals it has become much easier to collect journal usage data. Thanks to the wide availability of e-journals it is now possible to view scholarly communication through the eyes of the reader (Rowlands & Nicholas, 2007). Furthermore, compared to citation data, usage data have several advantages, such as easy and inexpensive data collection, earlier availability, and the reflection of a broader usage scope (Bollen et al., 2005; Brody et al., 2006; Duy & Vaughan, 2006; Haustein, 2011).

The authors of this proposal have already performed several analyses of usage data for oncology and pharmacology journals provided by ScienceDirect. The main results, which were published in Scientometrics (see Schloegl & Gorraiz, 2010) and in the Journal of the American Society for Information Science and Technology (see Schloegl & Gorraiz, 2011) and presented at the 10th International Conference on Science and Technology Indicators (STI 2008), can be summarized as follows:

- a strong increase in the usage of ScienceDirect e-journals from the fields of oncology and pharmacology between 2001 and 2006;
- a high correlation between article downloads and citation frequencies at journal level, which was slightly lower at article level;
- a medium to high correlation between relative indicators (usage impact factor and Garfield's impact factor);
- different obsolescence characteristics: the download half-lives amounted to approximately two years, while the cited half-lives were on average three times higher.

The main limitations of our study are that download data were only available from 2001 to 2006 and that analyses were conducted for only two fields.

A follow-up study would not only allow us to validate our previous results but also to place them on a broader basis. We would therefore be interested in also including e-journals from the social sciences and humanities in future analyses. The consideration of a more comprehensive time window would furthermore allow us to investigate the effect of e-journals on formal scholarly communication in more detail. Our previous analyses suggested a strong increase in the citations of newly published articles but, beyond that, almost no shift in the age distribution of the cited materials.
In particular, we plan to address the following issues:

1. long-term growth in the use of electronic journals
2. differences in obsolescence characteristics between citations and downloads
3. comparison of download and citation frequencies, absolute and relative, at journal level and at article level; the possibility of projecting citations on the basis of downloads
4. effects of e-journals on formal scholarly communication in the past decade
5. differences in issues 1 to 4 among fields, especially between the social sciences and humanities on the one hand and the natural sciences on the other
6. aspects which go beyond download statistics, for instance motivations for downloads
7. possibly also a comparison with other sources providing quick impact assessments (for instance, Mendeley)

The Elsevier Bibliometric Research Program would provide us with an excellent opportunity to continue our previous research. Interestingly, few global journal usage studies (unlike local journal usage studies) have been performed so far, which might be due to limited data availability. Of course, we would observe all privacy requirements raised by Elsevier, as we did in our previous research.

Data requirements

The main data sources for these analyses are ScienceDirect for the download data and Scopus for the citation data.
We kindly ask you to deliver download data from ScienceDirect for journals (not for books) from the following natural sciences subject categories:

- Oncology: >= 41 titles (full text available, journals and book series), http://www.sciencedirect.com/science/browse/sub/oncology
- Computer science: >= 160 titles (full text available, journals and book series), http://www.sciencedirect.com/science/browse/sub/computerscience

and from the following "Social Sciences and Humanities" ScienceDirect groups:

- Arts & humanities: >= 46 titles (full text available, journals and book series), http://www.sciencedirect.com/science/journals/sub/artsandhumanities/
- Economics, econometrics and finance: >= 90 titles (full text available, journals and book series), http://www.sciencedirect.com/science/journals/sub/socialsciences/

We would need the journal usage data from 2001 to 2010 at journal level. For example, for journal A the following data would be required (all at journal level):

- total number of downloadable items for each year (2001-2010)
- number of downloadable items disaggregated by document type for each year (2001-2010)
- total download counts (separated by full-text article requests (FTAs), PDFs and HTMLs and, if possible, visits/views) for each download year (2001-2010)
- download counts disaggregated by document type for each download year (2001-2010)
- download counts (full-text article requests, FTAs) for each download year (2001-2010) disaggregated by the various publication years (from the download year back to 1995, if the e-journal was already available in that year)
- percentage of non-downloaded items for each download year (2001-2010) disaggregated by document type and by the various publication years (from the download year back to 1995).

For all the titles indexed in Scopus we would also require the corresponding citation data. For a detailed specification of the data structure of the download data, please see the enclosed Excel document.

In order to be able to relate the downloads in the four subject categories to the overall downloads in ScienceDirect, we would also need the total year-wise download counts for ScienceDirect. Since we are aware that these data are very sensitive, it would be sufficient to receive relative data only, for instance as an index. Example:

Year   2001  2002  2003  2004  2005  2006  2007  2008  2009  2010  2011
Index   100   171   250   401   703   833   901  1024  1154  1290  1400

According to this example, the downloads in 2011 are 14 times the downloads in 2001.

Furthermore, usage (and, if easily possible, citation) data at article level for the following selected journals would be highly appreciated:

- Cognitive Science
- Historia Mathematica
- History of European Ideas
- The International Journal of Nautical Archaeology
- Journal of Medieval History
- Journal of Phonetics
- Language Sciences
- Russian Literature
- The Lancet Oncology
- Biochimica et Biophysica Acta (BBA) - Reviews on Cancer
- The Journal of Strategic Information Systems
- Information & Management
- Journal of Environmental Economics and Management
- Journal of Financial Economics

Here, we would need the monthly download counts from the online publication date of the first issue in 2001 to the end of 2011.
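To illustrate how such an index series relates to absolute counts, the following sketch derives an index normalized to the base year; all download totals used here are hypothetical, not actual ScienceDirect figures:

```python
# Illustration only: deriving a relative download index from absolute yearly
# counts, with the base year (here 2001) normalized to 100.
# The absolute totals below are hypothetical.

def download_index(counts, base_year):
    """Return {year: index value}, where the base year maps to 100."""
    base = counts[base_year]
    return {year: round(100 * n / base) for year, n in counts.items()}

counts = {2001: 52_000, 2002: 88_920, 2011: 728_000}  # hypothetical totals
index = download_index(counts, 2001)
# e.g. an index value of 1400 for 2011 means 14 times the 2001 downloads
```

Such an index preserves the growth pattern while keeping the absolute (sensitive) figures confidential.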
Again, for a detailed specification of the data structure of these article-level download data, please see the enclosed Excel document.
Methodology

1. Development of usage metrics

Several usage indicators have been suggested in recent years. Most suggestions are based on the classical citation indicators from the Journal Citation Reports (JCR), using download data (usually full-text article requests) instead of citations. The corresponding usage metrics are the usage impact factor (UIF) (Rowlands & Nicholas, 2007; Bollen & Van de Sompel, 2008), the usage immediacy index (Rowlands & Nicholas, 2007) or download immediacy index (Wan et al., 2010), and the usage half-life (Rowlands & Nicholas, 2007).

In our study we also plan to develop novel usage metrics. Since the majority of downloads occur in the publication year and the years immediately following it (Gorraiz & Schloegl, 2010), a usage impact factor following the same time window as the impact factor seems to be of minor relevance. Instead, we suggest a journal usage factor (JUF) considering the reference year and the two preceding years. The JUF is defined as the number of downloads in the year under consideration of journal items published in that year and the previous two years, divided by the number of items published in these three years. A three-year time window ensures that a very significant share of downloads is covered in most cases (Gorraiz et al., 2010). The download immediacy index (DII) will be calculated as the average number of articles already downloaded in the first 12 months after publication. The download half-life (DHL) is defined as the number of years, counted back from the download year, that account for 50% of the downloads of the journal in the download year.

Further characteristics of the anticipated metrics: primarily downloads (PDFs & HTMLs) will be considered, if possible also visits or hits. We aim to disaggregate all calculations by document type (articles, articles in press, review articles, conference papers, others (letters, notes, etc.)).
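As an illustrative sketch (not part of the proposal's deliverables), the JUF and DHL definitions above can be expressed in code; all input figures are hypothetical:

```python
# Illustrative sketch of the proposed metrics; all figures are hypothetical.
# downloads[download_year][publication_year] = download count
# items[publication_year] = number of items published in that year

def journal_usage_factor(downloads, items, year):
    """JUF: downloads in `year` of items published in `year` and the two
    preceding years, divided by the items published in those three years."""
    pub_years = (year, year - 1, year - 2)
    d = sum(downloads[year].get(y, 0) for y in pub_years)
    n = sum(items.get(y, 0) for y in pub_years)
    return d / n

def download_half_life(downloads, year):
    """DHL: number of years, counted back from the download year, that
    account for 50% of that year's downloads."""
    by_pub_year = downloads[year]
    half = sum(by_pub_year.values()) / 2
    cumulated = 0
    # walk from the most recent publication year backwards
    for age, pub_year in enumerate(sorted(by_pub_year, reverse=True)):
        cumulated += by_pub_year[pub_year]
        if cumulated >= half:
            return age
    return len(by_pub_year)

downloads = {2010: {2010: 500, 2009: 400, 2008: 200, 2007: 100, 2006: 50}}
items = {2010: 40, 2009: 38, 2008: 42}

juf = journal_usage_factor(downloads, items, 2010)   # 1100 downloads / 120 items
dhl = download_half_life(downloads, 2010)
```

The synchronic variant shown here fixes the download year and cumulates over publication years; the diachronic variant would instead fix the publication year and cumulate over subsequent download years.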
That means that JUF, DII and DHL are calculated for each document type. Our metrics will work at the synchronic level (downloads are tracked in one fixed year for documents issued in two or more previous publication years) as well as at the diachronic level (downloads of documents issued in a fixed publication year are tracked over two or more subsequent years). Also, all indicators will be calculated for the current and the previous two years, and the results will be compared. Due to the skewed distribution of downloads we are also interested in other distribution parameters, such as the percentage of non-downloads, quantiles or quartiles. Timelines and graphs will be provided for the most relevant indicators. Analyses will also be performed at article level for a number of selected journals, and the results will be compared with other studies (e.g. Moed, 2005). In particular, we plan to investigate the influence of downloads on citations and vice versa.

2. Comparison of download-based impact measures with other sources providing quick impact assessments

In contrast to citation data, download data provide quick impact estimates (see Schlögl & Gorraiz, 2010, 2011). In a further analysis we would like to find out whether there are other data sources which are able to
provide similarly quick and reliable impact assessments. At present, Mendeley, an academic social service for managing and sharing research papers, seems most promising for this purpose.

3. Survey

Analyses of log file data can only provide insights inherent to these data. In order to gain a deeper understanding, we also plan to perform a survey. Such a survey would enable us to answer questions like, for instance: What percentage of documents are downloaded but never read? What percentage of researchers still read print journals rather than electronic journals? In the ideal case it will be possible to elaborate the basics of a theory of downloads. We do not yet know whether we will survey a general population (e.g. all researchers at a university) or a more specific one (e.g. economists); probably a combination of the two approaches is most useful. In any case, in order to minimize data collection efforts, an electronic survey will be conducted.

References

Bollen, J.; Van de Sompel, H.; Smith, J.A. & Luce, R. (2005). Toward alternative metrics of journal impact: a comparison of download and citation data. Information Processing and Management, 41, 1419-1440. Available online at http://public.lanl.gov/herbertv/papers/ipm05jb final.pdf [26 November 2008].

Bollen, J. & Van de Sompel, H. (2008). Usage impact factor: the effects of sample characteristics on usage-based impact metrics. Journal of the American Society for Information Science and Technology, 59(1), 136-149.

Brody, T.; Harnad, S. & Carr, L. (2006). Earlier web usage statistics as predictors of later citation impact. Journal of the American Society for Information Science and Technology, 57(8), 1060-1072.

Duy, J. & Vaughan, L. (2006). Can electronic journal usage data replace citation data as a measure of journal use? An empirical examination. The Journal of Academic Librarianship, 32(5), 512-517.

Gorraiz, J. & Gumpenberger, C. (2010).
Going beyond citations: SERUM, a new tool provided by a network of libraries. Liber Quarterly, 20, 80-93.

Haustein, S. (2011). Taking a multidimensional approach toward journal evaluation. Proceedings of the ISSI Conference, Durban, South Africa, 4-7 July, Vol. 1, 280-291.

Moed, H.F. (2005). Statistical relationships between downloads and citations at the level of individual documents within a single journal. Journal of the American Society for Information Science and Technology, 56(10), 1088-1097.

Rowlands, I. & Nicholas, D. (2007). The missing link: journal usage metrics. Aslib Proceedings, 59(3), 222-228.

Schlögl, C. & Gorraiz, J. (2010). Comparison of citation and usage indicators: the case of oncology journals. Scientometrics, 82(3), 567-580.
Schlögl, C. & Gorraiz, J. (2011). Global usage versus global citation metrics: the case of pharmacology journals. Journal of the American Society for Information Science and Technology, 61(1), 161-170.

Wan, J.K.; Hua, P.H.; Rousseau, R. & Sun, X.K. (2010). The download immediacy index (DII): experiences using the CNKI full-text database. Scientometrics, 82(3), 555-566.

Members of the project team

- Assoc. Prof. Dipl.-Ing. Dr. Christian Schlögl, University of Graz, Institute of Information Science and Information Systems, Universitätsstr. 15/F3, A-8010 Graz, Austria, christian.schloegl@uni-graz.at; supervisor, researcher
- Dr. Juan Gorraiz, University of Vienna, Library and Archive Services, Bibliometrics Department, Boltzmanngasse 5, A-1090 Vienna, juan.gorraiz@univie.ac.at; researcher
- Dr. Christian Gumpenberger, University of Vienna, Library and Archive Services, Bibliometrics Department, Boltzmanngasse 5, A-1090 Vienna, christian.gumpenberger@univie.ac.at; researcher
- Mag. Peter Kraker, University of Graz, Institute of Information Science and Information Systems and Know-Center; PhD student
- N.N., University of Graz, Institute of Information Science and Information Systems; master's student

Specification of the computer infrastructure available

The execution of the project only requires a basic computer infrastructure (PC, MS Excel and MS Access, SPSS), which is already available.

Five most relevant publications by members of the project team

Gorraiz, J. & Schloegl, C. (2008). A bibliometric analysis of pharmacology and pharmacy journals: Scopus versus Web of Science. Journal of Information Science, 34(5), 715-725.

Schlögl, C. & Gorraiz, J. (2011). Global usage versus global citation metrics: the case of pharmacology journals. Journal of the American Society for Information Science and Technology, 61(1), 161-170.

Schlögl, C. & Gorraiz, J. (2010). Comparison of citation and usage indicators: the case of oncology journals. Scientometrics, 82(3), 567-580.

Schloegl, C. & Stock, W.G. (2008). Practitioners and academics as authors and readers: the case of LIS journals. Journal of Documentation, 64(5), 643-666.

Schloegl, C. & Stock, W.G. (2004). Impact and relevance of LIS journals: a scientometric analysis of international and German-language LIS journals (citation analysis versus reader survey). Journal of the American Society for Information Science and Technology, 55(13), 1155-1168.
Time table and milestones of the project

Project period: 07/12 to 09/13, with the following work packages scheduled month by month:

- Rough-cut project planning
- Analyses 1
- Analyses 2: questionnaire design, survey, analysis
- Composing paper for ISSI 2013
- Analyses 3
- Composing of journal paper 1
- Analyses 4
- Preparation of paper for QQML 2013
- Composing of journal paper 2
- Possible revisions of articles

Publication and conference presentation plan

Planned conference presentations:

- 5th International Conference on Qualitative and Quantitative Methods in Libraries (QQML 2013), Rome, Italy
- 14th International Conference of the International Society for Scientometrics and Informetrics (ISSI 2013), Vienna

Publication plan: We plan to publish two articles in leading scientometrics journals (Journal of Informetrics or Scientometrics) or information science journals (Information Processing & Management, Journal of the American Society for Information Science and Technology, or Journal of Information Science). Furthermore, we envisage publishing one article in a humanities journal which also covers quantitative studies and aspects of formal scholarly communication in the humanities.