Healthc Inform Res. 2015 Jan;21(1):35-42. doi: 10.4258/hir.2015.21.1.35.

Challenges and Practical Approaches with Word Sense Disambiguation of Acronyms and Abbreviations in the Clinical Domain

Affiliations
  • 1 School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA. Sungrim.Moon@gmail.com
  • 2 Department of Computer Science, Virginia Commonwealth University, Richmond, VA, USA.
  • 3 Institute for Health Informatics, University of Minnesota, Minneapolis, MN, USA.
  • 4 Department of Surgery, University of Minnesota, Minneapolis, MN, USA.

Abstract


OBJECTIVES
Although acronyms and abbreviations are used widely in clinical text on a daily basis, relatively little research has focused on word sense disambiguation (WSD) of acronyms and abbreviations in the healthcare domain. Because clinical notes have distinctive characteristics, it is unclear whether techniques that are effective for acronym and abbreviation WSD in biomedical literature are sufficient for clinical text.
METHODS
The authors discuss feature selection for automated techniques and challenges with WSD of acronyms and abbreviations in the clinical domain.
RESULTS
There are significant challenges associated with the informal nature of clinical text, such as typographical errors and incomplete sentences; with insufficient clinical resources, such as clinical sense inventories; and with privacy and security obstacles to conducting research with clinical text. Although we anticipated that sophisticated features, such as those derived from biomedical terminologies, semantic types, part-of-speech tags, and language modeling, would be needed for feature selection with automated machine learning approaches, we found instead that simple techniques, such as bag-of-words, were quite effective in many cases. Factors such as majority sense prevalence and the degree of separateness between sense meanings were also important considerations.
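To make the bag-of-words idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes a fixed token window around the abbreviation and a naive Bayes classifier with add-one smoothing; the function and class names (`bag_of_words`, `NaiveBayesWSD`), the window size, and the four training sentences for the abbreviation "RA" are all made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def bag_of_words(text, target, window=5):
    """Count lowercase tokens within `window` positions of each target occurrence."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    target = target.lower()
    bag = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            bag.update(tokens[max(0, i - window):i])      # left context
            bag.update(tokens[i + 1:i + 1 + window])      # right context
    return bag


class NaiveBayesWSD:
    """Hypothetical multinomial naive Bayes over bag-of-words context counts."""

    def fit(self, bags, senses):
        self.sense_counts = Counter(senses)
        self.word_counts = defaultdict(Counter)
        for bag, sense in zip(bags, senses):
            self.word_counts[sense].update(bag)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, bag):
        total = sum(self.sense_counts.values())

        def score(sense):
            counts = self.word_counts[sense]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            s = math.log(self.sense_counts[sense] / total)  # log prior
            for word, n in bag.items():
                if word in self.vocab:                      # ignore unseen words
                    s += n * math.log((counts[word] + 1) / denom)
            return s

        return max(self.sense_counts, key=score)


# Toy, invented training sentences (not real clinical data).
train = [
    ("Patient with RA on methotrexate, joint swelling noted", "rheumatoid arthritis"),
    ("History of RA with joint pain and morning stiffness", "rheumatoid arthritis"),
    ("Oxygen saturation 97% on RA at rest", "room air"),
    ("Sats stable on RA, no supplemental oxygen needed", "room air"),
]
model = NaiveBayesWSD().fit(
    [bag_of_words(text, "RA") for text, _ in train],
    [sense for _, sense in train],
)
print(model.predict(bag_of_words("RA flare with joint swelling", "RA")))
print(model.predict(bag_of_words("Saturation 95% on RA", "RA")))
```

The sketch deliberately uses no terminology lookup, semantic types, or part-of-speech tags, so that it isolates the kind of simple lexical context feature described above.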
CONCLUSIONS
The first lesson is that a comprehensive understanding of the unique characteristics of clinical text is important for automatic acronym and abbreviation WSD. The second lesson is that simple approaches may be an effective starting point for these tasks. Finally, as with other WSD tasks, an understanding of baseline majority sense rates and of the separateness between senses is important. Further studies and practical solutions are needed to better address these issues.
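The baseline majority sense rate mentioned above can be computed as the accuracy of always predicting the most frequent sense. The following is a minimal sketch under that assumption; the function name `majority_sense_baseline` and the label counts are hypothetical and are not taken from the paper.

```python
from collections import Counter


def majority_sense_baseline(senses):
    """Return the most frequent sense and the accuracy of always predicting it."""
    if not senses:
        raise ValueError("at least one labeled instance is required")
    sense, count = Counter(senses).most_common(1)[0]
    return sense, count / len(senses)


# Invented sense labels for one abbreviation (illustration only).
labels = ["room air"] * 7 + ["rheumatoid arthritis"] * 2 + ["right atrium"]
sense, rate = majority_sense_baseline(labels)
print(sense, rate)
```

A classifier for this abbreviation would need to exceed the returned rate to add value over the trivial strategy, which is why a skewed sense distribution makes apparent accuracy harder to interpret.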

Keywords

Abbreviations as Topic; Medical Records; Natural Language Processing; Artificial Intelligence; Automated Pattern Recognition

MeSH Terms

Abbreviations as Topic
Delivery of Health Care
Equipment and Supplies
Humans
Machine Learning
Medical Records
Natural Language Processing
Pattern Recognition, Automated
Prevalence
Privacy
Research Personnel
Semantics
