Absolute Relevance? Ranking in the Scholarly Domain
Tamar Sadeh, PhD
CNI, Baltimore, MD, April 2012

Copyright Statement

All of the information and material inclusive of text, images, logos, product names is either the property of, or used with permission by Ex Libris Ltd. The information may not be distributed, modified, displayed, reproduced in whole or in part without the prior written permission of Ex Libris Ltd.

TRADEMARKS

Ex Libris, the Ex Libris logo, Aleph, SFX, SFXIT, MetaLib, DigiTool, Verde, Primo, Voyager, MetaSearch, MetaIndex and other Ex Libris products and services referenced herein are trademarks of Ex Libris, and may be registered in certain jurisdictions. All other product names, company names, marks and logos referenced may be trademarks of their respective owners.

DISCLAIMER

The information contained in this document is compiled from various sources and provided on an "AS IS" basis for general information purposes only without any representations, conditions or warranties whether express or implied, including any implied warranties of satisfactory quality, completeness, accuracy or fitness for a particular purpose. Ex Libris, its subsidiaries and related corporations ("Ex Libris Group") disclaim any and all liability for all use of this information, including losses, damages, claims or expenses any person may incur as a result of the use of this information, even if advised of the possibility of such loss or damage.

Ex Libris Ltd., 2012

The top three keys to success

Content

Speed

Relevance Ranking

This talk is about relevance ranking

What is relevance? What is relevance in the scholarly domain? How do we measure relevance? What can be done and how?

[...] relevance is. Information technology is tangible; relevance is intangible. Information technology is relatively well understood formally; relevance is understood intuitively. Information technology has to be learned; relevance is tacit. Information technology has to be explained to [...] (Saracevic, 2007)

Relevance is the measure of correspondence between a document and a query as determined by a user. (Based on Saracevic, 1975)

?

System or algorithmic relevance
Topical or subject relevance
Cognitive relevance or pertinence
Situational relevance or utility
Affective relevance

There is no absolute relevance

Librarianship: aboutness (bibliographic descriptions, classifications, ontologies)
Information Retrieval (IR): relevance

Aboutness and relevance Discovery Systems

Effectiveness of IR is measured by the probability of agreement between what the system retrieved or constructed as relevant and what a user assessed or derived as relevant

The ScholarRank Project

The Goal
Enhance the Primo relevance ranking algorithm

Relevance ranking was not new to us.

Methodology

Setting up a team
Building test environment, tools, and procedures
Defining metrics to evaluate our current success and the improvements we make
Defining measurements to assess the success of the changes, once implemented

Lab Mode

Precision and Recall

\[
\text{Precision} = \frac{\text{number of relevant documents retrieved}}{\text{total number of documents retrieved}}
\]

\[
\text{Recall} = \frac{\text{number of relevant documents retrieved}}{\text{total number of existing relevant documents}}
\]
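The two ratios above can be computed directly from sets of document identifiers. The following is a minimal sketch, not taken from the talk; the function name `precision_recall` and the sample identifiers are illustrative assumptions.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system (hypothetical ids).
    relevant:  all document ids judged relevant for the query.
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents retrieved

    # Guard against empty sets so the ratios are always defined.
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Illustrative data: 4 documents retrieved, 3 of them relevant,
# out of 6 relevant documents that exist in the collection.
p, r = precision_recall(
    ["d1", "d2", "d3", "d4"],
    ["d1", "d2", "d4", "d7", "d8", "d9"],
)
```

Recall needs the full set of existing relevant documents, which is rarely known for a large index; this is one reason to prefer rank-based metrics computed over the first result page.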

Is this item relevant enough to be on the first result page? That is the question.

Evaluation Metrics
Mean Average Precision (MAP) http://en.wikipedia.org/wiki/mean_average_precision
Mean Reciprocal Rank (MRR) http://en.wikipedia.org/wiki/mean_reciprocal_rank
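As a rough illustration of how MAP and MRR could be calculated from Yes/No judgments on ranked result lists, here is a minimal sketch. It assumes that only the judged results are known, so average precision is normalized by the number of relevant results found in the list rather than by all relevant documents in the collection. All function names and the sample judgments are hypothetical.

```python
def average_precision(judgments):
    """Average precision for one ranked list of booleans (True = relevant)."""
    hits = 0
    precision_sum = 0.0
    for rank, is_relevant in enumerate(judgments, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    # Assumption: normalize by relevant results seen in the judged list.
    return precision_sum / hits if hits else 0.0


def reciprocal_rank(judgments):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    for rank, is_relevant in enumerate(judgments, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_average_precision(all_judgments):
    return sum(average_precision(j) for j in all_judgments) / len(all_judgments)


def mean_reciprocal_rank(all_judgments):
    return sum(reciprocal_rank(j) for j in all_judgments) / len(all_judgments)


# Illustrative judgments for two queries (not data from the project).
queries = [
    [True, False, True, False],
    [False, True, False, False],
]
map_score = mean_average_precision(queries)
mrr_score = mean_reciprocal_rank(queries)
```

Both metrics have a maximal value of 1.00, reached when every relevant result is ranked ahead of every non-relevant one (MAP) or when the first result is always relevant (MRR).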

Infrastructure
Data
Tools
Automated process to run queries
Calculation of MAP and MRR metrics
Ad hoc changes of parameters in test environment

Evaluators
Academic researchers in various disciplines (physics, medicine, philosophy, anthropology, agriculture, biology)

Evaluation
Evaluators sent their queries of four types: broad-topic, narrow-topic, known-item, other
Two sets of the first 20 results were sent back and evaluated:
  One set returned by Primo
  One set returned by Google Scholar
Evaluators were not aware of the origin of the sets
Evaluators marked Yes/No; if No, why not

* Initial results (March 2011); only 15% of the Primo Central content
* MAP maximal value: 1.00

More Queries
Many more evaluators
Full scope of search
UI changed to support evaluation

[Figure: two histograms with percentages on the vertical axis. MAP bins: 0-0.19, 0.2-0.39, 0.4-0.59, 0.6-0.79, 0.8-1. MRR bins: 0-0.09, 0.1-0.19, 0.2-0.29, 0.3-0.39, 0.5, 1.]

Real-Life Mode

Monitoring after implementation
Number of times that users moved to the next page of results
Number of sessions that culminated in the selection of an item
Average number of items that were selected per session
Location of the selected items on the result list
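The four measurements listed above could be derived from a usage log along the following lines. This is only a sketch: the session record layout (`next_page_clicks`, `selected_positions`) is a hypothetical format invented for the example, not the structure of any Primo log, and "location" is summarized here as the mean position of the selected items.

```python
from statistics import mean


def summarize_sessions(sessions):
    """Aggregate real-life usage measurements from session records.

    Each session is a dict (hypothetical layout) with:
      next_page_clicks:   times the user moved to the next result page
      selected_positions: 1-based result-list positions of selected items
    """
    if not sessions:
        raise ValueError("at least one session is required")

    positions = [p for s in sessions for p in s["selected_positions"]]
    return {
        "next_page_moves": sum(s["next_page_clicks"] for s in sessions),
        "sessions_with_selection": sum(
            1 for s in sessions if s["selected_positions"]
        ),
        "avg_items_selected": mean(
            len(s["selected_positions"]) for s in sessions
        ),
        # Mean position of selected items; None when nothing was selected.
        "mean_selected_position": mean(positions) if positions else None,
    }


# Illustrative sessions (not real usage data).
sessions = [
    {"next_page_clicks": 0, "selected_positions": [1, 3]},
    {"next_page_clicks": 2, "selected_positions": []},
    {"next_page_clicks": 1, "selected_positions": [12]},
]
summary = summarize_sessions(sessions)
```

Comparing such a summary before and after a ranking change gives a behavioral signal to set beside the evaluator-based MAP and MRR scores.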

So?

Phase 1
Maximized existing algorithm

[Figure: MAP and MRR as a function of boost value (0 to 35) for material type = journal article; vertical axis from 0.305 to 0.355.]
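A parameter sweep of this kind can be sketched as follows. The talk does not say how Primo applies a boost, so the multiplicative boost on the base score, the document fields (`score`, `type`, `relevant`), and the toy data are all assumptions made for illustration.

```python
def average_precision(judgments):
    """Average precision over a ranked list of booleans (True = relevant)."""
    hits, precision_sum = 0, 0.0
    for rank, is_relevant in enumerate(judgments, start=1):
        if is_relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / hits if hits else 0.0


def rank_with_boost(docs, boost, boosted_type="journal article"):
    """Re-rank docs, multiplying the base score of one material type.

    Assumption: the boost is multiplicative; the real algorithm may differ.
    """
    def boosted_score(doc):
        factor = boost if doc["type"] == boosted_type else 1.0
        return doc["score"] * factor

    return sorted(docs, key=boosted_score, reverse=True)


def sweep_boost(queries, boosts):
    """Return {boost: MAP} over all queries for each candidate boost value."""
    results = {}
    for boost in boosts:
        aps = [
            average_precision([d["relevant"] for d in rank_with_boost(docs, boost)])
            for docs in queries
        ]
        results[boost] = sum(aps) / len(aps)
    return results


# Toy judged result set for a single query (illustrative only).
queries = [[
    {"score": 3.0, "type": "book", "relevant": False},
    {"score": 2.0, "type": "journal article", "relevant": True},
    {"score": 1.0, "type": "journal article", "relevant": True},
]]
results = sweep_boost(queries, [1.0, 2.0, 5.0])
best_boost = max(results, key=results.get)
```

On real data the curve need not rise monotonically: past some value, a stronger boost can push relevant items of other material types off the first page, which is why the metric is tracked across a range of boost values.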

Phase 2
Added factors

?

abstract, author, date, full text, journal, language, type, publisher, subject, title, citations, downloads, journal impact factor, eigenfactor, pagerank

academic degree, discipline(s), language, location, previous selections, search history

? broad-topic search, currency, exact-item search, material type, narrow-topic search

Broad-topic query

Narrow-topic query

Author-related query

Known-item query

?

The match: traditional IR methods, adapted for the scholarly environment

1 4 39

? no. of citations; no. of selections; recency; type; peer review

1 27 231

Academic degree, discipline?

Author-related query, known-item query, broad-?

Before

Tachycardia - Wikipedia, the free encyclopedia
Tachycardia comes from the Greek words tachys (rapid or accelerated) and kardia (of the heart). Tachycardia typically refers to a heart rate that exceeds the normal range for a resting...

After

?

Bibliography

Saracevic, T. (1975). Relevance: A review of and a framework for the thinking on the notion in information science. Journal of the American Society for Information Science, 26(6), 321-343.

Saracevic, T. (2007). Relevance: A review of the literature and a framework for thinking on the notion in information science. Part II: Nature and manifestations of relevance. Journal of the American Society for Information Science and Technology, 58(13), 1915-1933.

http://www.wired.com/magazine/2010/02/ff_google_algorithm/all/1

Thank You! Tamar Sadeh, PhD tamar.sadeh@exlibrisgroup.com