Using incomplete citation data for MEDLINE results ranking
Jorge R Herskovic et al. AMIA Annu Symp Proc. 2005.
Abstract
Information overload is a significant problem for modern medicine. Searching MEDLINE for common topics often retrieves more relevant documents than users can review. Therefore, we must identify documents that are not only relevant, but also important. Our system ranks articles using citation counts and the PageRank algorithm, incorporating data from the Science Citation Index. However, citation data is usually incomplete. Therefore, we explore the relationship between the quantity of citation information available to the system and the quality of the result ranking. Specifically, we test the ability of citation count and PageRank to identify "important articles" as defined by experts from large result sets with decreasing citation information. We found that PageRank performs better than simple citation counts, but both algorithms are surprisingly robust to information loss. We conclude that even an incomplete citation database is likely to be effective for importance ranking.
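The comparison described in the abstract can be illustrated with a minimal sketch that scores a toy citation graph by citation count and by PageRank, then repeats the scoring after randomly discarding citations. Everything here is an assumption made for illustration: the article IDs, the damping factor of 0.85, the iteration count, and the uniform random removal of citation links are not taken from the paper, and the toy edges stand in for Science Citation Index data.

```python
import random

def citation_counts(edges, nodes):
    """Count incoming citations for each article (edges are (citing, cited))."""
    counts = {n: 0 for n in nodes}
    for citing, cited in edges:
        if citing in counts and cited in counts:
            counts[cited] += 1
    return counts

def pagerank(edges, nodes, damping=0.85, iterations=50):
    """Plain power-iteration PageRank; damping and iterations are assumed values."""
    nodes = list(nodes)
    n = len(nodes)
    out_links = {v: [] for v in nodes}
    for citing, cited in edges:
        if citing in out_links and cited in out_links:
            out_links[citing].append(cited)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Articles that cite nothing spread their rank evenly over all articles.
        dangling = sum(rank[v] for v in nodes if not out_links[v])
        new_rank = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            if out_links[v]:
                share = damping * rank[v] / len(out_links[v])
                for target in out_links[v]:
                    new_rank[target] += share
        rank = new_rank
    return rank

def drop_citations(edges, keep_fraction, seed=0):
    """Simulate an incomplete citation database by keeping a random subset of links."""
    rng = random.Random(seed)
    return [e for e in edges if rng.random() < keep_fraction]

def ranked(scores):
    """Order article IDs by descending score, breaking ties by ID."""
    return sorted(scores, key=lambda k: (-scores[k], k))

# Hypothetical toy data: five articles and six citation links.
NODES = ["A", "B", "C", "D", "E"]
EDGES = [("B", "A"), ("C", "A"), ("D", "A"), ("D", "B"), ("E", "B"), ("E", "C")]

if __name__ == "__main__":
    for keep in (1.0, 0.5):
        subset = drop_citations(EDGES, keep)
        print(keep, ranked(citation_counts(subset, NODES)), ranked(pagerank(subset, NODES)))
```

Comparing the two orderings at each `keep` value gives a rough feel for how each ranking shifts as citation links disappear; a real evaluation would score the orderings against an expert-defined set of important articles.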
Figures

Growth of PubMed measured in articles added per year (data retrieved from PubMed itself)

Basic system architecture. A query is passed to PubMed (Retrieval Engine) for processing. The original ordering is discarded and a new ranking is computed locally. The full PubMed entries are retrieved from a local store (Content Retrieval) and displayed.

Recall/precision curves for the simple citation count with progressively smaller datasets (intermediate curves omitted for clarity)

Recall/precision curves for the PageRank algorithm with progressively smaller datasets (intermediate curves omitted for clarity)
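The recall/precision curves named in the captions above can be approximated with a short sketch that walks down a ranked result list and records recall and precision at each cutoff. The function name, the article IDs, and the set of "important" articles are hypothetical; the paper's actual evaluation procedure is not described in this record.

```python
def precision_recall_points(ranking, important):
    """Return (recall, precision) after each position in a ranked result list."""
    important = set(important)
    if not important:
        raise ValueError("the set of important articles must not be empty")
    points = []
    hits = 0
    for k, article in enumerate(ranking, start=1):
        if article in important:
            hits += 1
        # Recall: share of important articles found so far.
        # Precision: share of the top-k results that are important.
        points.append((hits / len(important), hits / k))
    return points

if __name__ == "__main__":
    # Hypothetical ranking and expert-selected important articles.
    for recall, precision in precision_recall_points(["a", "b", "c", "d"], {"a", "c"}):
        print(f"recall={recall:.2f} precision={precision:.2f}")
```

Running the same function on rankings computed from progressively smaller citation datasets would yield one curve per dataset, which is the kind of comparison the figures describe.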