Academic search systems aid users in finding information on specific topics of scientific interest and have evolved from early catalog-based library systems into modern web-scale systems. However, evaluating the performance of the underlying retrieval approaches remains a challenge. A growing number of requirements for producing accurate retrieval results have to be considered, e.g., close integration of the system's users. Due to these requirements, small to mid-size academic search systems cannot evaluate their retrieval systems in-house. Evaluation infrastructures for shared tasks alleviate this situation: they allow researchers to experiment with retrieval approaches in specific search and recommendation scenarios without building their own infrastructure. In this paper, we examine the benefits and shortcomings of four state-of-the-art evaluation infrastructures for search and recommendation tasks with respect to the following requirements: support for online and offline evaluations, domain specificity of shared tasks, and reproducibility of experiments and results. In addition, we introduce an evaluation infrastructure concept design that aims to reduce these shortcomings in shared tasks for search and recommender systems.
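To make the offline-evaluation requirement mentioned above concrete, the following is a minimal sketch of how a shared-task run could be scored against graded relevance judgments with nDCG. The qrels, run, and cutoff are purely illustrative placeholders, not data or tooling from the paper.

```python
import math

# Hypothetical graded relevance judgments (qrels): query -> {doc_id: grade}
qrels = {"q1": {"d1": 2, "d2": 0, "d3": 1},
         "q2": {"d1": 1, "d4": 2}}

# Hypothetical ranked runs from a system under evaluation: query -> ranking
run = {"q1": ["d3", "d1", "d2"],
       "q2": ["d4", "d2", "d1"]}

def dcg(gains):
    # Discounted cumulative gain with a log2 position discount
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

def ndcg(qrels, run, k=10):
    # Mean nDCG@k over all queries in the run
    scores = []
    for query, ranking in run.items():
        gains = [qrels[query].get(doc, 0) for doc in ranking[:k]]
        ideal = sorted(qrels[query].values(), reverse=True)[:k]
        ideal_dcg = dcg(ideal)
        scores.append(dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0)
    return sum(scores) / len(scores)

print(f"nDCG@10 = {ndcg(qrels, run):.3f}")
```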
Relevance Assessments, Bibliometrics, and Altmetrics: A Quantitative Study on PubMed and arXiv
(2022)
Relevance is a key element in analyzing bibliometrics and information retrieval (IR). In both domains, relevance decisions are discussed theoretically and sometimes evaluated in empirical studies. IR research is often based on test collections for which explicit relevance judgments are made, while bibliometrics relies on implicit relevance signals such as citations or non-traditional indicators like altmetrics. While both types of relevance decisions share common concepts, how they relate to each other has not been empirically investigated on a larger scale. In this work, we compile a new dataset that aligns IR relevance judgments with traditional bibliometric relevance signals (and altmetrics) for life sciences and physics publications. The dataset covers PubMed and arXiv articles, for which relevance judgments are taken from TREC Precision Medicine and iSearch, respectively. It is augmented with bibliometric data from the Web of Science and Altmetrics. Based on the reviewed literature, we outline a conceptual framework that supports the answers to our research questions. Our empirical analysis shows that bibliometric (implicit) and IR (explicit) relevance signals are correlated. Likewise, there is a high correlation between bibliometrics and altmetrics, especially for documents with explicit positive relevance judgments. Furthermore, our cross-domain analysis demonstrates the presence of these relations in both research fields.
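As a rough illustration of the kind of analysis the abstract describes, the sketch below correlates explicit graded relevance judgments with citation counts using Spearman's rank correlation. The values are invented placeholders, not results from the study, and the paper does not prescribe this exact procedure.

```python
from scipy.stats import spearmanr  # SciPy's rank-correlation test

# Hypothetical aligned records for the same publications:
# explicit graded IR judgments (e.g., TREC-style qrels) and citation counts
judgments = [2, 2, 1, 1, 0, 0, 2, 1]
citations = [34, 12, 9, 15, 2, 1, 40, 7]

# Spearman's rho suits this pairing: citation counts are heavily skewed
# and relevance grades are ordinal, so a rank-based measure is robust
rho, p_value = spearmanr(judgments, citations)
print(f"Spearman rho = {rho:.2f} (p = {p_value:.3f})")
```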