NLP PEOPLE PRIMER
1. EHR History


Reference to [1].
2. Why or why not EHR?
According to Wikipedia, “an electronic health record (EHR) is the systematized collection of patient and population electronically-stored health information in a digital format.” An EHR has both advantages and disadvantages compared with previous methods of record keeping.
Disadvantages
It may be better to discuss the disadvantages of the EHR first. The EHR is a controversial tool that struggled to gain adoption. It offers many advantages, but those advantages have hurdles to overcome and may take time to come to fruition. The main disadvantages to EHR adoption are below.
Cost
The EHR costs a lot to establish in the clinical setting. The cost breakdown listed above is an estimate from a study by Flemming et al. The link to the study is below.
Quality
Physicians are also worried that the time they spend working with electronic health records will cut into the limited time they have with patients. The EHR has a learning curve, and even after that it may still take longer than the paper record, which takes away from the face-to-face encounter with patients. This overall decline in bedside manner is a key concern for physicians.
Advantages
There are a myriad of advantages to using electronic health records, and they are key to the large-scale push for and adoption of EHRs in recent years. Among other things, a computerized record can reduce errors, lower costs, and increase medical knowledge. A few of the key features and advantages are listed below.
- CPOE, or “Computerized physician order entry, also known as computerized provider order entry or computerized practitioner order entry, refers to the process of a medical professional entering and sending medication orders and treatment instructions electronically via a computer application instead of on paper charts.” CPOE can reduce physician errors when writing prescriptions. It can also do things such as default to the generic version of a drug, so it can reduce both costs and errors.
- CDS, or “A clinical decision support system is a health information technology system that is designed to provide physicians and other health professionals with clinical decision support, that is, assistance with clinical decision-making tasks.” Clinical decision support can help reduce costs by identifying patients who need further care. For example, one use of CDS is with asthma patients, where high-cost patients are identified for early intervention. CDS can also help improve quality relative to cost.
- Distributing data between doctors is another good reason to use EHRs. For example, when a patient gets a blood test, the result can be pulled up by checking labs instead of being buried deep in the health record.
- Legibility is another key benefit of electronic health records. Busy doctors have notoriously bad handwriting. A typed health record has clean, uniform text that anyone who needs to read it can read.
- Clinical research is one of the main benefits of using the EHR. A key technological development of recent years has been the advent of data mining and machine learning techniques. While these benefits do not come immediately, we are now seeing gains from rich, longitudinal data sets. On the NLP side, clinical notes are now being processed with NLP techniques; one example is radiology, where NLP can be used to read radiology reports. All in all, having computerized versions of records enables quantitative research.

3. Future of EHR
It is actually a little hard to dream about a bright future for electronic health records when we know that what we expected of them 20 years ago has still not been fulfilled. Some standards are not widely accepted, the vendor market is driven largely by governmental stimulus instead of “moving along in a natural Darwin way” [2], technical failures happen, and communication between software systems and institutions is still hazy.
Even so, nothing prevents advocates from planning the future of the EHR with big blueprints in mind: Carl Dvorak (Epic’s President) pointed to automation analytics, genomics-informed medicine, telemedicine and next-generation analytics; Paul Black from Allscripts envisions an entirely new approach to the EHR that should be more human centered and cloud based [3]. Researchers from academia and clinical areas may take a slightly different view of the EHR’s future than those big vendors. A 2015 report published by the AMIA EHR-2020 Task Force lists the following five areas [4]:
- Simplify and speed documentation
- Refocus regulation
  - Clarifying and simplifying certification and Meaningful Use (MU) regulations
  - Improving data exchange and interoperability
  - Reduce data entry and focus on patient outcomes
- Increase transparency and streamline certification
- Foster innovation
- The EHR in 2020 must support person-centered care delivery
These five-year-old ambitions have, at least in part, yet to be achieved, but they still shed light on the path along which the EHR is evolving. Technologies such as AI, precision medicine, cloud computing, remote healthcare, and FHIR are surely meant to help solve these problems, but they do not cover all of them and are far from sufficient or perfect. Human-centered design, communication, transportability, and policy making all play important roles in the current world of the EHR, which makes it a much more complex problem. Each stakeholder, as identified in the other post, may hold a different view; the discussion above reflects only software vendors and academic researchers. Fundamentally, though, people agree on the need for accessibility, stability, and user-friendliness. Even these basic goals will probably take another 5 or 10 years to be fully addressed, given how different each party's interests are in this "big" cake.
​
So, what direction do you think the EHR should go in?
4. Related Literature


4.1 Clinical Information Extraction Review
​
Wang, Y., Wang, L., Rastegar-Mojarad, M., Moon, S., Shen, F., Afzal, N., ... & Liu, H. (2018). Clinical information extraction applications: a literature review. Journal of Biomedical Informatics, 77, 34-49.
​
This is a review paper about clinical information extraction. The EHR is usually criticized for its accessibility, and clinical information extraction is one way to address this problem. Wang et al. provide a literature review covering 2009 to 2016 on clinical information extraction (IE) in terms of publication venues, clinical IE tools, methods, and applications. Figure 1 shows that a large proportion of the papers appeared in clinical medicine and informatics journals and that, in general, the number of publications is increasing.

Fig 1. Distribution of included studies, stratified by category and year (from January 1, 2009, to September 6, 2016).
Some of the most popular IE frameworks are UIMA, GATE, and Protege, each serving a different purpose: UIMA for general unstructured information, including text, audio, and video data; GATE for text processing problems; and Protege for ontologies. Common tools include cTAKES (http://ctakes.apache.org), MetaMap (https://metamap.nlm.nih.gov), and MedLEE. Toolkits are relatively less common; the most used are WEKA, MALLET, OpenNLP, NLTK, and SPLAT.
Approaches to clinical IE are generally either rule-based or machine learning-based. Common forms of rule-based IE systems are regular expressions and logic rules: the former specify a pattern of characters to search for, and the latter are developed either through manual knowledge engineering or from an ontology knowledge base. Machine learning algorithms used for the task, favored for their efficiency and effectiveness, include support vector machines, logistic regression, conditional random fields, naive Bayes, and random forests.
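As a rough illustration of the regular-expression style of rule-based IE described above, the sketch below pulls medication mentions out of a free-text note. The pattern, the `extract_medications` helper, and the sample note are all hypothetical and invented for illustration; they are not taken from the review or from any of the tools it lists.

```python
import re

# Hypothetical rule: a drug name, a numeric dose, and a unit.
# Real clinical IE rules are far richer; this only illustrates the idea.
MEDICATION_PATTERN = re.compile(
    r"\b(?P<drug>[A-Za-z]+)\s+(?P<dose>\d+(?:\.\d+)?)\s*(?P<unit>mg|mcg|g|ml)\b",
    re.IGNORECASE,
)


def extract_medications(note):
    """Return a list of (drug, dose, unit) tuples found in a clinical note."""
    return [
        (m.group("drug").lower(), float(m.group("dose")), m.group("unit").lower())
        for m in MEDICATION_PATTERN.finditer(note)
    ]


# Made-up example note, not real patient data.
sample_note = "Started metformin 500 mg twice daily; continue lisinopril 10mg."
medications = extract_medications(sample_note)
```

A rule like this is easy to read and audit, but it only finds what the pattern anticipates, which is one reason the machine learning approaches listed above are used alongside rules.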
The common application areas include disease studies (neoplasms, diseases of the circulatory system, diseases of the digestive system, etc.), drug-related studies (adverse drug reaction, medication extraction, etc.), and clinical workflow optimization (adverse events, quality control, patient management, etc.).
4.2 NLP application in ICU Management
​
Khadanga, S., Aggarwal, K., Joty, S., & Srivastava, J. (2019). Using Clinical Notes with Time Series Data for ICU Management. arXiv preprint arXiv:1909.09702.
​
An intensive care unit (ICU) is a special department of a hospital or health care facility that provides intensive care medicine for patients who are seriously ill. In one cohort [5], the mean ICU length of stay was 3.4 (±4.5) days for intensive care patients who survived to hospital discharge, with a median of 2 days (IQR 1–4). About a third of patients (35.9%) spent only one day in the ICU, and 88.9% of patients were in the ICU for 1–6 days, representing 58.6% of the ICU bed-days in the cohort [5]. In another study, the all-cause mortality rate among patients hospitalized in the ICU was 52.3%, and 79.3% of deaths occurred within the first 15 days of hospitalization [6]. The average cost of ICU admission per patient was $31,679 ± 65,867. Estimated ICU costs were $48,744 per survivor to discharge and $61,783 per survivor at one year. Hence, predicting the condition of patients during their ICU stay can help improve acute care and allocate the hospital’s resources.
​
Figure 2. Doctor notes complement measured physiological signals for better ICU management [7]
Khadanga et al. [7] propose a natural language processing approach that combines clinical notes with time-series data to improve prediction on benchmark ICU management tasks. Although this multi-modal deep neural network improves accuracy only marginally, the method indicates that adding clinical notes could improve performance on in-hospital mortality prediction, decompensation modeling, and length-of-stay forecasting. Moreover, accurate advance forecasts of a patient’s condition could improve the quality of ICU management and decrease cost.
​
4.3 A Clinical Perspective on the Relevance of Research Domain Criteria in Electronic Health Records
​
McCoy, T. H., Castro, V. M., Rosenfield, H. R., Cagan, A., Kohane, I. S., & Perlis, R. H. (2015). A Clinical Perspective on the Relevance of Research Domain Criteria in Electronic Health Records. American Journal of Psychiatry, 172(4), 316–320. https://doi.org/10.1176/appi.ajp.2014.14091177
​
The paper I chose to write about is A Clinical Perspective on the Relevance of Research Domain Criteria in Electronic Health Records by McCoy et al., cited above with its link.
The paper attempts to tie natural language processing techniques to the EHR through a framework called Research Domain Criteria (RDoC), a proposed dimensional model of psychopathology. Here, RDoC serves as a construct connecting the electronic health record to NLP techniques.
The study finds that “in mixed-effects models, loadings for the RDoC cognitive and arousal domains were associated with length of hospital stay, while the negative valence and social domains were associated with hazard of all-cause hospital readmission.” The outcome measures, length of stay and readmission, are both related to health care cost.
The paper essentially created a list of terms related to RDoC concepts and used information extraction techniques to score clinical text against them. The resulting scores captured information beyond ICD-9 codes: “In this analysis of data from more than 2,000 patients, leveraging 53,285 documents encompassing more than 89,973,395 words, we demonstrated that eRDoC domain scores are associated with clinically meaningful outcomes, in a manner not fully accounted for by ICD-9 diagnosis code.”
To dig further into the methodology of the paper, we can walk through the accompanying workflow image. The analysis has two branches that meet at a VSM, or Vector Space Model. The first branch begins with an RDoC matrix term, searches for synonyms, retrieves raw result documents from Bing, extracts their main text, and forms n-grams from that text. The second branch is simpler: it begins with clinical admission notes, converts them to free text, and converts that text to n-grams. So, essentially, the VSM is built from n-grams of Bing results related to RDoC terms and n-grams of clinical notes.
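Both branches described above end in n-grams placed in a shared vector space. A minimal sketch of that idea follows, using word bigram counts and cosine similarity. The helper names and the two sample texts are hypothetical, and cosine similarity is an assumption chosen for illustration rather than the scoring used by McCoy et al.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Lowercase, split on whitespace, and count word n-grams."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up texts: one stands in for web text retrieved for an RDoC term,
# the other for a free-text admission note.
domain_text = "loss of interest and low mood with poor sleep"
admission_note = "patient reports low mood and poor sleep"

domain_vector = ngrams(domain_text)
note_vector = ngrams(admission_note)
similarity = cosine(domain_vector, note_vector)
```

Here the two texts share the bigrams "low mood" and "poor sleep", so the note gets a nonzero score against the domain; a note sharing no bigrams with the domain text would score zero.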
All in all, RDoC mapping of clinical notes provides key predictors that could potentially be used in the future to decrease the cost of care for those with mental illness.
​
5. Reflection Blog
Reflection blog for Part 2: EHR: click me!
References:
[1] Shortliffe, E. H. (2006). Biomedical informatics. J. J. Cimino (Ed.). Springer Science+Business Media, LLC.
[2] Death by a Thousand Clicks: Where Electronic Health Records Went Wrong. Fortune. Accessed on April 19, 2020: https://fortune.com/longform/medical-records/
[3] Next-gen EHRs: Epic, Allscripts and others reveal future of electronic health records. Healthcare IT News. Accessed on April 19, 2020: https://www.healthcareitnews.com/news/next-gen-ehrs-epic-allscripts-and-others-reveal-future-electronic-health-records
[4] Payne, T. H., Corley, S., Cullen, T. A., Gandhi, T. K., Harrington, L., Kuperman, G. J., ... & Tierney, W. M. (2015). Report of the AMIA EHR-2020 Task Force on the status and future direction of EHRs. Journal of the American Medical Informatics Association, 22(5), 1102-1110.
[5] Moitra, V. K., Guerra, C., Linde-Zwirble, W. T., & Wunsch, H. (2016). Relationship Between ICU Length of Stay and Long-Term Mortality for Elderly ICU Survivors. Crit Care Med, 44(4), 655-662. doi:10.1097/ccm.0000000000001480
[6] Unal, A. U., Kostek, O., Takir, M., Caklili, O., Uzunlulu, M., & Oguz, A. (2015). Prognosis of patients in a medical intensive care unit. Northern Clinics of Istanbul, 2(3), 189.
[7] Khadanga, S., Aggarwal, K., Joty, S., & Srivastava, J. (2019). Using Clinical Notes with Time Series Data for ICU Management. arXiv preprint arXiv:1909.09702.