Healthc Inform Res. 2018 Apr;24(2):148-153. doi: 10.4258/hir.2018.24.2.148.

HEDEA: A Python Tool for Extracting and Analysing Semi-structured Information from Medical Records

Affiliations
  • 1Department of Computer Science and Engineering, Thapar Institute of Engineering and Technology, Patiala, India. ajaykumar@thapar.edu

Abstract


OBJECTIVES
A key task for a medical practitioner treating a patient is to study the patient's complete medical history by going through all records, from test results to doctors' notes. With the increasing use of technology in medicine, most of these records are now digital, which removes the problem of searching through stacks of easily misplaced paper, but some remain in unstructured form. Large parts of clinical reports are written as free text and are tedious to use directly without appropriate pre-processing. In medical research, such health records can be a good, convenient source of medical data; however, their lack of structure makes the data unfit for statistical evaluation. In this paper, we introduce a system to extract, store, retrieve, and analyse information from health records, with a focus on the Indian healthcare scene.
METHODS
A Python-based tool, Healthcare Data Extraction and Analysis (HEDEA), has been designed to extract structured information from various medical records using a regular expression-based approach.
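As a rough illustration of what a regular expression-based approach to semi-structured records can look like, the sketch below pulls a few labelled fields out of a short report. It is not HEDEA's code: the field names, the patterns, the `extract_fields` helper, and the sample report are all hypothetical and chosen only to show the general technique.

```python
import re

# Hypothetical field patterns. The labels and units are illustrative
# assumptions, not the patterns used by HEDEA.
FIELD_PATTERNS = {
    "patient_name": re.compile(
        r"Patient\s*Name\s*[:\-]\s*(?P<value>[A-Za-z .]+?)\s*$", re.I | re.M
    ),
    "age": re.compile(r"Age\s*[:\-]\s*(?P<value>\d{1,3})", re.I),
    "haemoglobin_g_dl": re.compile(
        r"Ha?emoglobin\s*[:\-]?\s*(?P<value>\d+(?:\.\d+)?)\s*g/dl", re.I
    ),
    "fasting_blood_sugar_mg_dl": re.compile(
        r"(?:Fasting\s+Blood\s+Sugar|FBS)\s*[:\-]?\s*(?P<value>\d+(?:\.\d+)?)\s*mg/dl",
        re.I,
    ),
}


def extract_fields(report_text):
    """Return a dict of the fields whose pattern matches the report text."""
    record = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(report_text)
        if match:
            # Keep the raw string; type conversion is left to a later step.
            record[name] = match.group("value").strip()
    return record


# Made-up report text, used only to exercise the patterns above.
SAMPLE_REPORT = """\
Patient Name: A. Sharma
Age: 54 Years
Haemoglobin: 13.2 g/dL
Fasting Blood Sugar - 112 mg/dl
"""

record = extract_fields(SAMPLE_REPORT)
print(record)
```

Fields that do not match are simply left out of the result, so a downstream step can tell a missing value from an extracted one. A real tool would need many more patterns, one set per report format.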
RESULTS
The HEDEA system is operational and covers a large set of record formats, from which it extracts and analyses health information.
CONCLUSIONS
This tool can be used to generate analysis reports and charts from the central database. This information is provided for medical research purposes only after prior approval has been received from the patient.

Keywords

Medical Records; Information Storage and Retrieval; Data Collection; Metadata; Medical Report; Regular Expression

MeSH Terms

Boidae*
Data Collection
Delivery of Health Care
Humans
Information Storage and Retrieval
Medical Records*

Figures

  • Figure 1 System architecture diagram.

  • Figure 2 Unstructured extraction using regular expressions and distance scoring.

  • Figure 3 Sample output.

  • Figure 4 Blood sugar levels of a patient.

  • Figure 5 Body mass index (BMI) of patients in the comprehensive analytics database.


Cited by 2 articles

Extracting Structured Genotype Information from Free-Text HLA Reports Using a Rule-Based Approach
Kye Hwa Lee, Hyo Jung Kim, Yi-Jun Kim, Ju Han Kim, Eun Young Song
J Korean Med Sci. 2020;35(12). doi: 10.3346/jkms.2020.35.e78.

ANNO: A General Annotation Tool for Bilingual Clinical Note Information Extraction
Kye Hwa Lee, Hyunsung Lee, Jin-Hyeok Park, Yi-Jun Kim, Youngho Lee
Healthc Inform Res. 2022;28(1):89-94. doi: 10.4258/hir.2022.28.1.89.

