Towards a Stratified Learning Approach to Predict Future Citation Counts

Tanmoy Chakraborty (Google India PhD Fellow, IIT Kharagpur, India)
Suhansanu Kumar, Pawan Goyal, Niloy Ganguly, Animesh Mukherjee (Dept. of CSE, IIT Kharagpur, India)
Digital Libraries, September 8-12, 2014

Citation Pattern over the Years
[Figure: citation count versus years after publication of a paper]

Citation Profile of an Article
Common consensus about the growth of the citation count of a paper over time after publication [Garfield, Nature, 01] [Hirsch, PNAS, 05] [Chakraborty et al., ASONAM, 13]

Bibliometrics
- Journal impact factor
- Immediacy factor
- Altmetric
- 5-year impact factor
This observation was drawn from the analysis of a very limited set of publication data [Kulkarni et al., PLoS ONE, 07] [Callaham et al., JAMA, 02]

Publication Universe
- Crawled the entire Microsoft Academic Search
- Papers in the Computer Science domain
- Basic preprocessing

Basic statistics of papers from 1960-2010    Values
Number of valid entries                      3,473,171
Number of authors                            1,186,412
Number of unique venues                      6,143
Avg. number of papers per author             5.18
Avg. number of authors per paper             2.49

Publication Universe (Contd.)
Available metadata:
- Title
- Unique ID
- Named-entity-disambiguated author names
- Year of publication
- Named-entity-disambiguated publication venue
- Related research field(s)
- References
- Keywords
- Abstract
Available @ http://cnerg.org

Citation Profile Analysis
An exhaustive analysis of the citation profiles:
- Papers having at least 10 years of history
- Scale the entries of the citation profile between 0 and 1
- Use peak-detection heuristics
  o Each peak should be at least 75% of the max peak
  o Two consecutive peaks should be separated by at least 3 years
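The heuristics on this slide can be illustrated with a minimal Python sketch. It is not the authors' implementation. Three details are assumptions, because the slide does not specify them: scaling divides by the maximum yearly count, a "peak" is a local maximum of the yearly profile, and when two candidate peaks are closer than 3 years the taller one is kept. The function names are hypothetical.

```python
def scale_profile(counts):
    """Scale yearly citation counts into [0, 1] by dividing by the maximum (assumed scaling)."""
    top = max(counts)
    if top == 0:
        return [0.0 for _ in counts]
    return [c / top for c in counts]


def detect_peaks(counts, min_ratio=0.75, min_gap=3):
    """Return the year indices of peaks that satisfy the two slide heuristics.

    min_ratio: a peak must be at least this fraction of the max peak (75% on the slide).
    min_gap:   two kept peaks must be at least this many years apart (3 on the slide).
    """
    scaled = scale_profile(counts)
    n = len(scaled)
    candidates = []
    for i, value in enumerate(scaled):
        left = scaled[i - 1] if i > 0 else -1.0
        right = scaled[i + 1] if i < n - 1 else -1.0
        # Local maximum (assumed definition); a flat plateau yields only its last year.
        if value >= left and value > right and value >= min_ratio:
            candidates.append(i)
    # Assumed tie-break: consider the tallest candidates first, drop any that are too close.
    candidates.sort(key=lambda i: scaled[i], reverse=True)
    kept = []
    for i in candidates:
        if all(abs(i - j) >= min_gap for j in kept):
            kept.append(i)
    return sorted(kept)


# Hypothetical 10-year profiles (citations received in each year after publication).
single_peak = [0, 2, 10, 4, 3, 2, 1, 1, 0, 0]
two_peaks = [1, 8, 3, 2, 2, 9, 10, 4, 1, 0]
too_close = [0, 9, 5, 10, 2, 1, 0, 0, 0, 0]
```

With these made-up profiles, `detect_peaks(two_peaks)` keeps both peaks because they are five years apart, while `detect_peaks(too_close)` keeps only the taller of the two neighbouring candidates.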

Six Universal Citation Profiles
Q1 and Q3 represent the first and third quartiles of the data points, respectively.
Another category, Oth: papers having fewer than one citation per year on average

Application: Future Citation Count Prediction

Problem Definition

Traditional Framework [Yan et al., JCDL, 2013 (Best Paper)]
Assumption: the dataset is homogeneous in terms of citation profile

Stratified Learning
Stratification is the process of dividing members of the population into homogeneous subgroups (strata) before sampling.
- The strata should be mutually exclusive
- Every element in the population must be assigned to only one stratum
[Figure: publication dataset divided into strata]
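The two conditions on strata can be made concrete with a small Python sketch. It only shows the partitioning step under stated assumptions: `stratify` and `stratum_of` are hypothetical names, and the paper records are made-up placeholders. The category labels PeakInit, MonDec and Oth are reused from the slides purely as example strata.

```python
from collections import defaultdict


def stratify(items, stratum_of):
    """Partition items into strata keyed by the label that stratum_of returns.

    Because stratum_of returns exactly one label per item, the strata are
    mutually exclusive and every item lands in exactly one stratum.
    """
    strata = defaultdict(list)
    for item in items:
        strata[stratum_of(item)].append(item)
    return dict(strata)


# Hypothetical records: (paper id, citation-profile category).
papers = [
    ("p1", "PeakInit"),
    ("p2", "MonDec"),
    ("p3", "PeakInit"),
    ("p4", "Oth"),
]

strata = stratify(papers, stratum_of=lambda paper: paper[1])

# Sanity check: the strata together cover the dataset with no duplicates.
total = sum(len(members) for members in strata.values())
```

A separate model can then be trained on each value of `strata`, which is the general idea behind stratified learning.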

Our Framework: 2-stage Model
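The slide's diagram did not survive in this transcript, so the following Python sketch is only one plausible reading of a two-stage model. It assumes that stage 1 assigns a paper to a citation-profile category and stage 2 applies a regressor trained on that category alone. A later slide mentions an SVM; here a nearest-centroid classifier and a one-feature least-squares line are simple stand-ins, not the authors' models. All function names, features and numbers are hypothetical.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Stage 1 stand-in: compute one mean feature vector (centroid) per category."""
    grouped = {}
    for x, label in zip(features, labels):
        grouped.setdefault(label, []).append(x)
    return {label: [mean(col) for col in zip(*rows)] for label, rows in grouped.items()}


def classify(centroids, x):
    """Assign x to the category whose centroid is nearest in squared Euclidean distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(x, center))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def fit_line(xs, ys):
    """Stage 2 stand-in: least-squares fit of y = a + b * x on a single feature."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx if sxx else 0.0
    return my - b * mx, b


def fit_two_stage(features, labels, targets):
    """Fit the category classifier, then one regressor per category on feature 0."""
    centroids = fit_centroids(features, labels)
    regressors = {}
    for label in centroids:
        rows = [(x[0], y) for x, l, y in zip(features, labels, targets) if l == label]
        regressors[label] = fit_line([r[0] for r in rows], [r[1] for r in rows])
    return centroids, regressors


def predict(model, x):
    """Route x through stage 1, then apply that category's stage 2 regressor."""
    centroids, regressors = model
    label = classify(centroids, x)
    a, b = regressors[label]
    return label, a + b * x[0]


# Made-up training data: two features per paper, a category, and a future citation count.
features = [[1, 0], [2, 0], [3, 0], [1, 10], [2, 10], [3, 10]]
labels = ["PeakInit"] * 3 + ["MonDec"] * 3
targets = [2, 4, 6, 11, 12, 13]
model = fit_two_stage(features, labels, targets)
```

The point of the design is that each category gets its own regression fit, so a query such as `predict(model, [4, 9])` is scored by the line learned from its own category rather than by one line fitted to the pooled data.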

Static Features
Author-centric:
- Productivity (Max/Avg)
- H-index (Max/Avg)
- Versatility (Max/Avg)
- Sociality (Max/Avg)
Venue-centric:
- Prestige
- Impact Factor
- Versatility
Paper-centric:
- Team-size
- Reference count
- Reference diversity
- Keyword diversity
- Topic diversity
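As an illustration of how one author-centric feature might be derived, the sketch below computes the standard h-index (the largest h such that an author has h papers with at least h citations each) and then takes the maximum and average over a paper's co-authors. The slides do not say how the Max/Avg aggregation was implemented, so the helper names, the output keys and the input data are assumptions.

```python
def h_index(citations):
    """Largest h such that at least h papers have h or more citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def h_index_features(coauthor_citations):
    """Aggregate the co-authors' h-indices into Max and Avg features (assumed aggregation)."""
    values = [h_index(c) for c in coauthor_citations]
    return {"h_index_max": max(values), "h_index_avg": sum(values) / len(values)}


# Hypothetical per-paper citation counts for two co-authors of one paper.
coauthors = [
    [10, 8, 5, 4, 3],  # h-index 4
    [1, 1],            # h-index 1
]
features = h_index_features(coauthors)
```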

Performance Evaluation
(i) Coefficient of determination (R^2): the more, the better
(ii) Mean squared error (θ): the less, the better
(iii) Pearson correlation coefficient (ρ): the more, the better
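For reference, the three metrics can be computed as in the sketch below. It uses the standard textbook definitions; the exact variants used in the talk (for example, any normalization of the error) are not stated on the slide, so treat this as an assumption. The sample values are made up.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Average squared difference between actual and predicted values (lower is better)."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (higher is better)."""
    mean_a = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_a) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


def pearson(actual, predicted):
    """Pearson correlation coefficient between actual and predicted values."""
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / sqrt(var_a * var_p)


# Hypothetical actual and predicted citation counts for four papers.
actual = [1, 2, 3, 4]
doubled = [2, 4, 6, 8]
```

Predictions that are perfectly correlated with the truth but off in scale, like `doubled`, score well on ρ and poorly on R^2 and θ, which is one reason to report all three.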

Performance of SVM: Confusion Matrix

Performance Evaluation

Performance in Different Citation Regions

Feature Analysis

More About the Model
- Robustness of the categorization
  o Merging similar categories (such as PeakInit and MonDec) deteriorates the performance
- Impact of early citation information
  o Including a paper's first-year citations enhances the performance

Take Away
- The publication universe is heterogeneous in terms of citation profile
- Stratified learning, a generic approach in machine learning, helps enhance a citation count prediction model
- Author-centric features are the most distinguishing ones
- Adding the first year's citation count as a feature can improve the prediction accuracy

Future Plan
- Deeper analysis of the categorization
- Inclusion of content information as a feature in the model
- A new growth model to mimic this categorization

Thank you http://cnerg.org http://cse.iitkgp.ernet.in/~tanmoyc