K-means and Hierarchical Clustering Method to Improve our Understanding of Citation Contexts


Marc Bertin (1) and Iana Atanassova (2)
August 11, 2017

(1) CIRST - Université du Québec à Montréal (UQAM), Canada
(2) CRIT - Centre Tesniere, University of Bourgogne Franche-Comte, France

2nd Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL 2017), at the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan

The BIRNDL proceedings are published at http://ceur-ws.org/vol-1888/. The video of the presentation is available at https://youtu.be/mntmmrplg9y.

Research Problem

Scientific papers usually follow a specific rhetorical structure, IMRaD (Introduction, Methods, Results, and Discussion):
- the IMRaD structure plays an important role in determining the types of citation contexts;
- the specific domains and topics of the various journals, as well as their own editorial lines, can affect the immediate context of citations.

Objective

Study the properties of citation contexts on a large scale in order to create an ontology of citations that reflects the types of citations found in articles.

We propose:
- a method to analyze citation contexts at a large scale, taking into account various criteria;
- a multidimensional, cluster-based approach to this problem.

We use:
- k-means;
- hierarchical clustering.
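The slides do not say which variables describe the objects being clustered. As a minimal sketch of a multidimensional representation, assume each object (for example, a word observed in citation contexts) is described by its raw counts in the four IMRaD sections. The counts, the object names, and the helpers `section_profile` and `standardize` are hypothetical; the sketch only shows how such counts could be turned into comparable feature vectors before k-means or hierarchical clustering.

```python
from statistics import mean, pstdev

# Hypothetical: the four IMRaD dimensions used to describe each object.
SECTIONS = ("Introduction", "Methods", "Results", "Discussion")

# Made-up counts of each object in each section (illustration only,
# not data from the PLOS study).
raw_counts = {
    "word_a": [120, 10, 15, 60],
    "word_b": [5, 200, 30, 10],
    "word_c": [40, 35, 90, 80],
}


def section_profile(counts):
    """Convert raw per-section counts into relative frequencies summing to 1."""
    total = sum(counts)
    return [c / total for c in counts]


def standardize(rows):
    """Z-score each column so that no single section dominates distances."""
    columns = list(zip(*rows))
    mus = [mean(col) for col in columns]
    # Fall back to 1.0 for a constant column to avoid dividing by zero.
    sds = [pstdev(col) or 1.0 for col in columns]
    return [[(x - m) / s for x, m, s in zip(row, mus, sds)] for row in rows]


profiles = [section_profile(c) for c in raw_counts.values()]
features = standardize(profiles)
for name, vec in zip(raw_counts, features):
    print(name, dict(zip(SECTIONS, (round(v, 2) for v in vec))))
```

Normalizing to proportions removes the effect of overall frequency, and z-scoring puts the four dimensions on a common scale; both k-means and distance-based hierarchical clustering are sensitive to such scale differences. Whether the authors applied either step is not stated on the slides.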

PLOS Dataset

Journal                           Articles   Citations   Citation contexts
PLOS Biology                         1,754     170,785              91,117
PLOS Computational Biology           2,560     243,488             126,870
PLOS Genetics                        3,414     332,845             185,537
PLOS Medicine                          926      72,676              34,819
PLOS Neglected Tropical Diseases     1,872     133,022              73,211
PLOS ONE                            72,158   5,363,036           2,854,082
Total                               82,684   6,315,852           3,365,636

- All articles are published in Open Access by the Public Library of Science (PLOS).
- Format: XML following the Journal Article Tag Suite (JATS).
- Coverage: the entire corpus up to September 2013.
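In JATS XML, in-text citations are typically encoded as `<xref ref-type="bibr">` elements inside `<sec>` elements. The sketch below assumes that encoding and uses a hypothetical keyword table (`IMRAD_KEYWORDS`) to map section titles to IMRaD labels; it counts citations per section with Python's standard-library `xml.etree.ElementTree`. It is not the extraction pipeline used to build the table above, and it does not attempt to segment citation contexts (sentences).

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical keyword table mapping section titles to IMRaD labels.
IMRAD_KEYWORDS = {
    "Introduction": ("introduction", "background"),
    "Methods": ("method", "materials"),
    "Results": ("result",),
    "Discussion": ("discussion", "conclusion"),
}

# A tiny hand-written JATS-like fragment (illustration only).
JATS_SAMPLE = """<article><body>
  <sec><title>Introduction</title>
    <p>Earlier work <xref ref-type="bibr" rid="b1">[1]</xref>,
       <xref ref-type="bibr" rid="b2">[2]</xref>. A sentence without citation.</p>
  </sec>
  <sec><title>Materials and Methods</title>
    <p>We followed <xref ref-type="bibr" rid="b3">[3]</xref>.</p>
    <p>See <xref ref-type="fig" rid="f1">Figure 1</xref>.</p>
  </sec>
</body></article>"""


def imrad_label(title):
    """Return the first IMRaD label whose keywords occur in the title."""
    lowered = title.lower()
    for label, keywords in IMRAD_KEYWORDS.items():
        if any(k in lowered for k in keywords):
            return label
    return "Other"


def citations_per_section(xml_text):
    """Count bibliographic <xref> elements under each top-level body <sec>."""
    root = ET.fromstring(xml_text)
    counts = Counter()
    for sec in root.findall("./body/sec"):
        label = imrad_label(sec.findtext("title", default=""))
        n = sum(1 for x in sec.iter("xref") if x.get("ref-type") == "bibr")
        if n:
            counts[label] += n
    return counts


print(citations_per_section(JATS_SAMPLE))  # Counter({'Introduction': 2, 'Methods': 1})
```

The figure cross-reference (`ref-type="fig"`) is ignored, so only bibliographic citations are counted. JATS also defines a `sec-type` attribute that, where present, could be used instead of matching keywords in section titles.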

The Elbow Method to determine the number of clusters

[Figure: elbow plot of the sum of squared errors (SSE) against the number of clusters]

[Figure: Calinski-Harabasz criterion for 1 to 10 groups]
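Both plots are standard aids for choosing k. For each candidate k, k-means is run and the within-cluster sum of squared errors W(k) is recorded; the "elbow" is the value of k after which W(k) stops dropping sharply. The Calinski-Harabasz index, CH(k) = [B / (k - 1)] / [W / (n - k)], where B is the between-cluster sum of squares and n the number of points, rewards compact, well-separated groups (higher is better) and is undefined for k = 1. The slides do not say which software produced the plots; the sketch below is a from-scratch Python version using Lloyd's algorithm with k-means++ seeding and ten restarts, run on synthetic data with four obvious groups. The data, seed, restart count, and helper names are assumptions chosen for illustration.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def kmeans_pp_init(points, k, rng):
    """k-means++ seeding: later centres are drawn with probability ~ D^2."""
    centers = [list(rng.choice(points))]
    while len(centers) < k:
        d2 = [min(sq_dist(p, c) for c in centers) for p in points]
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers


def lloyd(points, k, rng, max_iter=100):
    """Lloyd's k-means; returns one cluster label per point."""
    centers = kmeans_pp_init(points, k, rng)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centers[j]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centre if a cluster empties
                centers[j] = mean_point(members)
    return labels


def within_between(points, labels):
    """Return (W, B): within-cluster SSE and between-cluster sum of squares."""
    grand = mean_point(points)
    W = B = 0.0
    for j in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == j]
        c = mean_point(members)
        W += sum(sq_dist(p, c) for p in members)
        B += len(members) * sq_dist(c, grand)
    return W, B


def best_labels(points, k, restarts=10, seed=0):
    """Keep the run with the lowest SSE over several random restarts."""
    rng = random.Random(seed)
    runs = [lloyd(points, k, rng) for _ in range(restarts)]
    return min(runs, key=lambda lab: within_between(points, lab)[0])


# Synthetic data: four well-separated blobs of five points (illustration only).
offsets = [(0, 0), (0.5, 0), (0, 0.5), (0.5, 0.5), (0.25, 0.25)]
corners = [(0, 0), (10, 0), (0, 10), (10, 10)]
points = [[cx + dx, cy + dy] for cx, cy in corners for dx, dy in offsets]

n = len(points)
sse, ch = {}, {}
for k in range(1, 11):
    labels = best_labels(points, k)
    W, B = within_between(points, labels)
    sse[k] = W
    g = len(set(labels))  # number of non-empty groups actually found
    if g >= 2:  # Calinski-Harabasz is undefined for a single group
        ch[k] = (B / (g - 1)) / (W / (n - g))
    print(k, round(sse[k], 3), round(ch[k], 1) if k in ch else "n/a")
```

On this toy data W(k) drops steeply up to k = 4 and only marginally after it, and CH(k) peaks at k = 4. On real citation-context data the curves may be less clear-cut, which is one reason to inspect both criteria together.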

Results: K-means clustering with k = 4

Results: Hierarchical Clustering
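The slides do not state the distance or linkage criterion behind the hierarchical clustering. Assuming Euclidean distance and average linkage (UPGMA), agglomerative clustering starts from singleton clusters and repeatedly merges the two clusters with the smallest average pairwise distance. The merge history is what a dendrogram draws, and cutting it at k groups yields a flat partition that can be compared with a k-means result. The function names and toy points are hypothetical, and the naive loop is roughly cubic in the number of points, so it only illustrates the idea.

```python
import math
from itertools import combinations


def average_linkage(points):
    """Naive agglomerative clustering with average linkage (UPGMA).

    Returns the merge history as (cluster_a, cluster_b, distance) tuples,
    where clusters are tuples of point indices.
    """
    clusters = [(i,) for i in range(len(points))]
    history = []
    while len(clusters) > 1:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            ca, cb = clusters[a], clusters[b]
            # Average of all pairwise Euclidean distances between the two clusters.
            d = sum(math.dist(points[i], points[j]) for i in ca for j in cb)
            d /= len(ca) * len(cb)
            if best is None or d < best[0]:
                best = (d, a, b)
        d, a, b = best
        history.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (a, b)]
        clusters.append(merged)
    return history


def cut(n, history, k):
    """Replay merges until k clusters remain; return sorted index lists."""
    clusters = {(i,) for i in range(n)}
    for ca, cb, _ in history:
        if len(clusters) <= k:
            break
        clusters.remove(ca)
        clusters.remove(cb)
        clusters.add(ca + cb)
    return sorted(sorted(c) for c in clusters)


# Toy 2-D points (illustration only): two tight pairs and one outlier.
points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
history = average_linkage(points)
for ca, cb, d in history:
    print(f"merge {ca} + {cb} at distance {d:.3f}")
print(cut(len(points), history, 3))  # [[0, 1], [2, 3], [4]]
```

For real data one would normally use a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` (for example with `method="average"` or `method="ward"`) together with `fcluster` to cut the tree.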

Conclusion

- We observe the atypical nature of the Methods section in terms of citation contexts, which confirms previous studies (see [6, 2, 4, 5, 1, 3]).
- One of the advantages of using the topic modeling approach is the ability to handle large volumes of textual data.
- Studying the structure of scientific papers and observing the regularities in the contexts of in-text citations is an important step towards understanding the phenomenon of citation, which is central to the process of building scientific knowledge.

Thank you!

Marc Bertin
Assistant Professor, ELICO, Université Claude Bernard Lyon 1, France
marc.bertin@protonmail.ch

Iana Atanassova
Assistant Professor, CRIT - Centre Tesniere, University of Bourgogne Franche-Comte, France
iana.atanassova@univ-fcomte.fr

Bibliography

[1] Marc Bertin and Iana Atanassova. A study of lexical distribution in citation contexts through the IMRaD standard. In Proceedings of the First Workshop on Bibliometric-enhanced Information Retrieval, co-located with the 36th European Conference on Information Retrieval (ECIR 2014), pages 5-12, Amsterdam, The Netherlands, April 13, 2014.

[2] Marc Bertin and Iana Atanassova. Multiple in-text reference phenomenon. In Proceedings of the Joint Workshop on Bibliometric-enhanced Information Retrieval and Natural Language Processing for Digital Libraries (BIRNDL), Joint Conference on Digital Libraries 2016 (JCDL), pages 14-22, Newark, NJ, USA, June 2016.

[3] Marc Bertin, Iana Atanassova, Vincent Larivière, and Yves Gingras. The distribution of references in scientific papers: an analysis of the IMRaD structure. In 14th International Society of Scientometrics and Informetrics Conference, Vienna, Austria, July 15-19, 2013. International Society for Scientometrics and Informetrics.

[4] Marc Bertin, Iana Atanassova, Vincent Larivière, and Yves Gingras. The linguistic context of citations: a cartography of the structure of scientific papers. In AAAS Annual Meeting, San Jose, CA, February 2015. American Association for the Advancement of Science.

[5] Marc Bertin, Iana Atanassova, Vincent Larivière, and Yves Gingras. Mapping the linguistic context of citations. Bulletin of the Association for Information Science and Technology (ASIST), featuring The Future of Science Mapping, 41(2), January 2015.

[6] Marc Bertin, Iana Atanassova, Vincent Larivière, and Yves Gingras. The invariant distribution of references in scientific papers. Journal of the Association for Information Science and Technology, 67(1):164-177, January 2016.