NLP PEOPLE PRIMER
1. What the University of Washington Does for Security

For more information, please refer to UW IT Security.
2. Health Information Protection - HIPAA
2.1 What is HIPAA
HIPAA is the acronym for the Health Insurance Portability and Accountability Act, passed by Congress in 1996. Over the years, HIPAA has received notable updates that strengthen privacy protections for patients and health plan members, helping to ensure that healthcare data is safeguarded and patient privacy is protected.
HIPAA does the following:
- Provides the ability to transfer and continue health insurance coverage for millions of American workers and their families when they change or lose their jobs;
- Reduces health care fraud and abuse;
- Mandates industry-wide standards for health care information on electronic billing and other processes; and
- Requires the protection and confidential handling of protected health information.
2.2 What is the HIPAA Privacy Rule
The HIPAA Privacy Rule addresses the use and disclosure of individuals’ health information, called “Protected Health Information” (PHI). It applies to health plans, health care clearinghouses, and health care providers that conduct certain transactions electronically; these organizations are called “covered entities.” The Privacy Rule also establishes individuals’ privacy rights to understand and control how their health information is used.

2.3 What is a HIPAA Violation
A HIPAA violation is a failure to comply with any aspect of HIPAA standards and provisions detailed in 45 CFR Parts 160, 162, and 164.
- Unauthorized Accessing of Healthcare Records: Accessing the health records of patients for reasons other than those permitted by the Privacy Rule (treatment, payment, and healthcare operations) is a violation of patient privacy. Snooping on the healthcare records of family, friends, neighbors, co-workers, and celebrities is one of the most common HIPAA violations committed by employees.
- Improper Disposal of PHI: When physical PHI and ePHI are no longer required and retention periods have expired, HIPAA Rules require the information to be securely and permanently destroyed.
2.4 Q&A
2.4.1 Can you send patient information via email?
Yes, as long as the following three requirements are met (a sketch of an automated check for the first requirement follows the list):
- The email is sent within UW Medicine (@u.washington.edu, @uwpn.org, @uwp.washington.edu) or to one of our affiliates (@fhcrc.org, @med.va.gov, @psbc.org, @seattlecca.org, @seattlechildrens.org);
- The email transmission is secure; and
- The email contains the minimum amount of patient information necessary to meet the recipient’s needs.
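The first requirement lends itself to a simple automated check. Here is a minimal Python sketch of such a check; the allowlist simply mirrors the domains listed above and is illustrative only, not an official policy source.

```python
# Minimal sketch of an automated check for the first requirement above.
# The allowlist mirrors the domains listed in this Q&A; it is illustrative
# and would need to track current UW Medicine policy.
ALLOWED_DOMAINS = {
    # UW Medicine
    "u.washington.edu", "uwpn.org", "uwp.washington.edu",
    # Affiliates
    "fhcrc.org", "med.va.gov", "psbc.org",
    "seattlecca.org", "seattlechildrens.org",
}

def recipient_allowed(address: str) -> bool:
    """Return True if the recipient's email domain is on the allowlist."""
    domain = address.rsplit("@", 1)[-1].lower()
    return domain in ALLOWED_DOMAINS

assert recipient_allowed("provider@seattlechildrens.org")
assert not recipient_allowed("someone@hotmail.com")
```

Note that this only covers the first requirement; secure transmission and the minimum-necessary standard still require human judgment.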
2.4.2 Can you automatically forward email received by your University account to other email accounts such as Hotmail or Yahoo?
No. UW Medicine staff and students may not automatically forward email received by their University account to a personal email account. University policy prohibits this because the transmission is not necessarily secure.
2.4.3 What steps should be taken when an email containing patient information is sent to the wrong recipient?
If you are the sender, notify the HIPAA Staff in UW Medicine Compliance. If you are the recipient, immediately reply to the sender to notify them of the error, delete the email, and inform the HIPAA Staff in UW Medicine Compliance.
For more information, please refer to UW Medicine Compliance.
3. i2b2/UTHealth NLP De-identification Challenge
Doing research on patient data requires protecting patient privacy, which in practice means de-identifying the data before research begins. From the natural language processing perspective of this primer, NLP relates to de-identification in two ways. First, NLP can be performed on already de-identified data; this is the most common case. Second, NLP can be used to perform the de-identification itself. In this primer we discuss two articles by Stubbs et al. and one article by Liu et al. that detail their work using NLP to de-identify data from the 2014 i2b2/UTHealth NLP Challenge.
Overview of the Challenge
In the challenge, 10 teams participated, making a total of 22 submissions. The task featured 4 tracks, but here we will mainly discuss the first track, de-identification. The de-identification task asked systems to identify and remove PHI in clinical narratives, covering the 18 categories of PHI defined by HIPAA.
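To make the task concrete, here is a minimal rule-based sketch that flags two of the easier PHI categories (dates and phone numbers) with regular expressions. The patterns are illustrative assumptions; actual challenge systems handled far more categories and formats.

```python
import re

# Illustrative patterns for two PHI categories; real systems covered many
# more formats (e.g., "Jan 3, 2014", international phone numbers).
PHI_PATTERNS = {
    "DATE": re.compile(r"\b\d{1,2}/\d{1,2}/\d{2,4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def tag_phi(text: str):
    """Return (start, end, category) spans for every pattern match."""
    spans = []
    for category, pattern in PHI_PATTERNS.items():
        for m in pattern.finditer(text):
            spans.append((m.start(), m.end(), category))
    return sorted(spans)

print(tag_phi("Seen on 03/15/2014, call 206-555-0199 with questions."))
# [(8, 18, 'DATE'), (25, 37, 'PHONE')]
```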
Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1
This article discusses the challenge in general. Submissions were evaluated with both entity-level and token-level annotations, reflecting different needs at different institutions, and results were measured with precision, recall, and F1 at the micro and macro levels. The authors conclude that de-identification cannot yet be done perfectly by automated systems alone.
Stubbs, A., Kotfila, C., & Uzuner, Ö. (2015). Automated systems for the de-identification of longitudinal clinical narratives: Overview of 2014 i2b2/UTHealth shared task Track 1. Journal of Biomedical Informatics, 58, S11–S19. https://doi.org/10.1016/j.jbi.2015.06.007
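As a note on the metrics above: micro averaging pools true positives, false positives, and false negatives across all documents before scoring, while macro averaging scores each document separately and then averages the results. A small sketch of the difference, with made-up counts rather than any official challenge scorer:

```python
def prf(tp: int, fp: int, fn: int):
    """Precision, recall, F1 from raw counts (0.0 when undefined)."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Per-document (tp, fp, fn) counts -- made-up numbers for illustration.
docs = [(90, 5, 10), (8, 4, 12)]

# Micro: pool the counts first, then score once.
micro = prf(*(sum(c) for c in zip(*docs)))

# Macro: score each document, then average the scores.
per_doc = [prf(*d) for d in docs]
macro = tuple(sum(s) / len(per_doc) for s in zip(*per_doc))

print(f"micro P/R/F1: {micro}")
print(f"macro P/R/F1: {macro}")
```

Micro scores are dominated by the document with the most PHI instances, while macro scores weight every document equally, which is why the overview reports both.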
Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus
This paper discusses the task as a whole. For this track, the organizers de-identified a set of 1,304 longitudinal medical records describing 296 patients, focusing on replacing identifiable information with realistic surrogates. The average F-measure across systems was 0.872, and the highest was 0.964.
Stubbs, A., & Uzuner, Ö. (2015). Annotating longitudinal clinical narratives for de-identification: The 2014 i2b2/UTHealth corpus. Journal of Biomedical Informatics, 58, S20–S29. https://doi.org/10.1016/j.jbi.2015.07.020
Automatic de-identification of electronic medical records using token-level and character-level conditional random fields
Liu et al.’s paper discusses their approach to the challenge: a hybrid system combining machine learning with rule-based methods. They achieved an F-score of roughly 0.95 under both the micro and macro evaluations. They found that rule-based classifiers had higher precision, while machine learning classifiers had higher recall and F-measures, and that token-level and character-level conditional random fields are complementary to each other.
Liu, Z., Chen, Y., Tang, B., Wang, X., Chen, Q., Li, H., Wang, J., Deng, Q., & Zhu, S. (2015). Automatic de-identification of electronic medical records using token-level and character-level conditional random fields. Journal of Biomedical Informatics, 58, S47–S52. https://doi.org/10.1016/j.jbi.2015.06.009
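A token-level CRF of the kind Liu et al. describe labels each token with a PHI tag based on features of the token and its neighbors. Below is a hedged sketch using the third-party sklearn-crfsuite package; the feature set and toy data are generic illustrations, not Liu et al.'s actual configuration.

```python
# pip install sklearn-crfsuite
import sklearn_crfsuite

def token_features(tokens, i):
    """Generic per-token features; Liu et al.'s real feature set was richer
    (e.g., dictionaries, POS tags, output from the character-level CRF)."""
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_title": tok.istitle(),
        "is_digit": tok.isdigit(),
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<BOS>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "<EOS>",
    }

# Toy training data in BIO format (illustrative, not challenge data).
sent = ["Dr.", "Smith", "saw", "the", "patient", "on", "03/15/2014", "."]
tags = ["O", "B-NAME", "O", "O", "O", "O", "B-DATE", "O"]

X = [[token_features(sent, i) for i in range(len(sent))]]
y = [tags]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, y)
print(crf.predict(X))
```

A character-level CRF works the same way but labels individual characters instead of tokens, which is what lets the two models catch each other's misses.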
4. Patients and Tech Giants
In 2019, Google, the University of Chicago, and the University of Chicago Medical Center were sued over the University’s sharing of allegedly non-HIPAA-compliant data and Google’s potential privacy violations. The University had agreed to share patient data with Google to build powerful AI tools to help discover diseases. The plaintiff, Matt Dinerstein, a former patient at the UC Medical Center, claimed that the University shared records containing date stamps and physician notes, which Google could combine with its own information about people’s everyday activities (recorded through Google Maps, Search, Gmail, and Android phones) to re-identify patients. This, he argued, was a serious violation of patient privacy that he never agreed to. And interestingly, the results of the study were published in Nature.
Back in 2016, DeepMind, a London-based AI lab owned by Google’s parent company, Alphabet, was in a similar position, accused of violating patient privacy after its deal with Britain’s National Health Service. The Information Commissioner’s Office (ICO), Britain’s regulator for information and privacy, launched a probe and concluded in July 2017 that the agreement “failed to comply with data protection law.”
Dating back further, in 2014, New York-Presbyterian Hospital and Columbia University Medical Center agreed to pay a $4.8 million fine to settle alleged HIPAA violations after the electronic PHI of 6,800 patients became accessible via Google search back in 2010.
Working with tech giants: is it a lucrative opportunity or a privacy trap?
5. Reflection Blog
Reflection blog for Part 4: Security: click me!
References:
Google and the University of Chicago Are Sued Over Data Sharing. New York Times. Accessed on May 2nd, 2020: https://www.nytimes.com/2019/06/26/technology/google-university-chicago-data-sharing-lawsuit.html
Google, University of Chicago Face Revamped Health Privacy Suit. Bloomberg. Accessed on May 2nd, 2020: https://news.bloomberglaw.com/privacy-and-data-security/google-university-of-chicago-face-revamped-health-privacy-suit
Google DeepMind patient data deal with UK health service illegal, watchdog says. CNBC. Accessed on May 2nd, 2020: https://www.cnbc.com/2017/07/03/google-deepmind-nhs-deal-health-data-illegal-ico-says.html
Hospitals fined $4.8M for HIPAA violation. Healthcare IT News. Accessed on May 2nd, 2020: https://www.healthcareitnews.com/news/hospitals-fined-48m-hipaa-violation