NLP PEOPLE PRIMER
1. What is analytics?
Analytics (n.) is the systematic computational analysis of data or statistics. Analytics can involve big or small data sets, but in recent years the buzzwords of big data, data science, and machine learning have taken over the field. Because the definition is fluid, analytics can refer to any of these. In general, data science and machine learning refer to methods for dealing with big data. Data science has more of a statistical background and may involve statistical testing, while machine learning revolves around classification, regression, dimensionality reduction, and clustering, the four families of techniques around which a popular Python package called scikit-learn is organized.
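As a rough illustration (the model choices and the built-in iris dataset here are arbitrary, picked only for demonstration), here is a minimal scikit-learn sketch of those technique families:

```python
# A minimal scikit-learn sketch of the technique families named above.
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Classification: predict a discrete label from features.
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print("classification accuracy:", clf.score(X_test, y_test))

# Regression is analogous (e.g., sklearn.linear_model.LinearRegression).

# Dimensionality reduction: compress four features down to two.
X_2d = PCA(n_components=2).fit_transform(X)

# Clustering: group samples without using the labels at all.
cluster_ids = KMeans(n_clusters=3, n_init=10).fit_predict(X_2d)
print("cluster sizes:", [int((cluster_ids == k).sum()) for k in range(3)])
```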
In our primer, we’d like to illustrate different uses of software for text analytics on big biomedical data. Primarily, we’d like to introduce you to Linguistic Inquiry and Word Count (LIWC, pronounced “Luke”), which has been used to characterize big biomedical data, specifically in the fields of psychiatry and psychology.

2. Deep-dive into Text Analytics Software: What is LIWC?
LIWC is a computer program that summarizes big text data. It is a “transparent text analysis program that counts words in psychologically meaningful categories” (Tausczik and Pennebaker). LIWC calculates the degree to which various categories of words are used in a text, and can process texts ranging from e-mails to speeches, poems, and transcribed natural language, in either plain text or Word formats. LIWC derives features from narrative text by counting the words that correspond to categories in its lexicon (or dictionary), where each category is defined by the list of words that fall into it. For each category, LIWC reports the percentage of words in the text that correspond to it, yielding a profile of the psychological meaning of the words used.
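To make that counting scheme concrete, here is a toy sketch in Python. LIWC's actual dictionary is proprietary, so the two-category mini-lexicon below is entirely made up; only the percentage-of-total-words scoring mirrors how LIWC reports its features:

```python
# Toy illustration of LIWC-style scoring with a made-up mini-lexicon.
# (LIWC's real dictionary is proprietary; these word lists are placeholders.)
from collections import Counter

LEXICON = {
    "negemo": {"sad", "hate", "hurt", "worthless"},  # negative emotion words
    "i": {"i", "me", "my", "mine"},                  # first-person singular
}

def liwc_style_scores(text):
    words = text.lower().split()
    total = len(words)
    counts = Counter()
    for w in words:
        for category, members in LEXICON.items():
            if w in members:
                counts[category] += 1
    # LIWC reports each category as a percentage of total words.
    return {cat: 100 * counts[cat] / total for cat in LEXICON}

print(liwc_style_scores("i hate that i feel so sad and worthless today"))
# -> {'negemo': 30.0, 'i': 20.0}
```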
3. LIWC Applications in Research: Recovery Problem Prediction
Paper to be discussed: Kornfield, R., Sarma, P. K., Shah, D. V., McTavish, F., Landucci, G., Pe-Romashko, K., & Gustafson, D. H. (2018). Detecting Recovery Problems Just in Time: Application of Automated Linguistic Analysis and Supervised Machine Learning to an Online Substance Abuse Forum. Journal of Medical Internet Research, 20(6), e10136. https://doi.org/10.2196/10136

One potential application of LIWC in the biomedical domain is detecting problems in recovery from an online forum, where recovery means recovery from addiction. Online forums help people with addiction issues reach out to others when they need help. While trained staff may review these messages, they cannot review them all. Because of this deluge of data, alternative approaches are needed. The goal of this paper is to combine computational linguistics and machine learning to handle the overload: in essence, computational linguistics supplies the predictor variables, and machine learning makes the binary prediction of whether a post discloses a problem or not.
The training data came from a mobile health intervention for people with alcohol use disorder. The authors used distributional semantics, specifically a bag-of-words model, the LIWC software, and a hybrid of the two. The binary classifiers they used were support vector machines, decision trees, and boosted decision trees. The bag-of-words model captured domain-specific language such as “drink,” while the LIWC software, as described above, grouped words into its categories. They reported that “a boosted decision tree classifier, utilizing features from both Bag-of-Words and Linguistic Inquiry and Word Count performed best in identifying problems disclosed within the discussion forum, achieving 88% sensitivity and 82% specificity in a separate cohort of patients in recovery.”
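As a rough sketch of this kind of pipeline (not the authors' actual code; the posts, labels, and default model settings below are invented purely for illustration), scikit-learn can chain bag-of-words features into a boosted decision tree classifier:

```python
# Hypothetical sketch: bag-of-words features feeding boosted decision trees.
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.pipeline import make_pipeline

# Invented forum posts and problem/no-problem labels (1 = problem disclosed).
posts = [
    "i really want a drink tonight and i am struggling",
    "went to my meeting today and feel hopeful",
    "craving badly, almost stopped at the liquor store",
    "great week at work, staying on track with my plan",
]
labels = [1, 0, 1, 0]

model = make_pipeline(
    CountVectorizer(),             # bag-of-words counts per post
    GradientBoostingClassifier(),  # boosted decision trees
)
model.fit(posts, labels)
print(model.predict(["i keep thinking about having a drink"]))
```

In the paper's hybrid approach, LIWC category percentages would be appended to the bag-of-words features before classification; the sketch above shows only the bag-of-words half.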
In conclusion, language use reflects whether someone in addiction recovery is experiencing a problem. This finding shows that LIWC, along with distributional semantics and machine learning, can identify small patterns in language that reflect larger changes in psychological state.
4. Depression Analysis
Depression can also be analyzed with the text analytics tool we discussed above, Linguistic Inquiry and Word Count. In the paper “Language use of depressed and depression-vulnerable college students” (Rude, Gortner, & Pennebaker, 2004), the authors found that word use in essays written by formerly depressed, currently depressed, and never-depressed students differed. Depressed participants used more negatively valenced words and used ‘I’ more often than never-depressed individuals. This is consistent with the literature, including Beck's cognitive model and Pyszczynski and Greenberg's self-focus model of depression. Formerly depressed individuals did not differ from never-depressed individuals in these categories; however, their use of ‘I’ increased across the essays and was significantly greater than that of never-depressed individuals by the end of the task. This article is integral to Talkspace research because one of the main outcomes of the Talkspace research being done right now is the prediction of depression.
5. Talkspace Teletherapy Research Being Done at UW
Talkspace is the #1 online therapy application, with over 1 million users. Talkspace is represented by Michael Phelps, and you may have seen their commercials. Talkspace aims to make therapy available and ‘affordable’ for everyone; its mission is to provide more people with convenient access to licensed therapists who can help those in need live happier and healthier lives. With Talkspace, you can send a therapist text messages, audio messages, and picture and video messages in a private, text-based chat room.

A large corpus of Talkspace data is being analyzed right here at the University of Washington, using the same tool discussed in the Analytics section of this primer, LIWC. The main outcomes or labels for the Talkspace research being done at UW are remission, remission acute, improvement anxiety, chronic, improvement depression, and chronic elevated; these labels describe the clinical course of each Talkspace client. The depression literature summarized in the paragraph above justifies focusing on the ‘I’ measure and negative emotion words. In preliminary analysis, we see that ‘I’ is used least by those in remission and most in the chronic and chronic elevated categories, and likewise that negative emotion words are used least by those in remission and most by those labeled chronic and chronic elevated. Here at UW, we hope to find other significant predictor variables for depression and anxiety, and potentially differences in language use that can predict whether someone has treatment-resistant depression. All in all, this is one of many projects in the department related to telehealth applications in medicine.
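As a purely hypothetical illustration of this kind of group comparison (the column names and numbers below are invented, not Talkspace data), pandas can summarize LIWC-style scores per outcome label:

```python
# Invented sketch of comparing LIWC scores across clinical outcome labels.
import pandas as pd

df = pd.DataFrame({
    "outcome": ["remission", "remission", "chronic", "chronic elevated"],
    "i": [3.1, 2.8, 6.4, 7.0],       # % first-person singular pronouns
    "negemo": [1.2, 0.9, 3.5, 4.1],  # % negative emotion words
})

# Mean LIWC scores per outcome group.
print(df.groupby("outcome")[["i", "negemo"]].mean())
```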
6. Regulation of Predictive Analytics in Medicine
The application of machine learning and artificial intelligence in medical care is a hot topic being actively discussed in academia, with countless papers covering every stage of the process, from theoretical research to exploration within healthcare systems. While researchers are still actively examining every step, few real applications have been deployed. Addressing this issue, researchers from the University of Pennsylvania, the University of California, Berkeley, and the VA Medical Center proposed five criteria for evaluating and regulating predictive algorithms (Parikh, Obermeyer, & Navathe, 2019).
(1) Meaningful Endpoints
In most theoretical work, the most popular metric is the area under the curve (AUC), which may not be readily understandable by clinicians or patients, and which, without a suggested cutoff, is not very clinically meaningful. Proposed endpoints should instead be based on established standards of clinical benefit, even though this may involve downstream outcomes such as overall survival or change in the cost of clinical management.
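To see why an AUC alone is not actionable, here is a sketch on synthetic data of turning a model's scores into a clinical operating point by fixing a sensitivity target (the target value and the data are assumptions for illustration):

```python
# Sketch: AUC gives no operating point by itself; pick a threshold that
# meets a clinically chosen sensitivity target. Data here are synthetic.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = y_true * 0.4 + rng.random(200) * 0.8  # noisy synthetic scores

print("AUC:", round(roc_auc_score(y_true, y_score), 3))

fpr, tpr, thresholds = roc_curve(y_true, y_score)
target_sensitivity = 0.90  # would be set by clinical requirements
idx = np.argmax(tpr >= target_sensitivity)  # first threshold that qualifies
print("threshold:", round(thresholds[idx], 3),
      "sensitivity:", round(tpr[idx], 3),
      "specificity:", round(1 - fpr[idx], 3))
```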
(2) Appropriate Benchmarks
Since the quantities these algorithms optimize may not match clinical endpoints exactly, there is usually no universal standard to test them against. Because standards for comparison are not well defined in the FDA’s premarket clearance program, studies can be approved without a clear comparison against meaningful clinical practice. So far, the best bet is still the “clinician + algorithm” framework, in which an algorithm is compared with clinicians’ expert review; but this approach is limited by how difficult it is to recruit enough experts.
(3) Interoperable, Generalizable
FDA 510(k) clearance generally covers such algorithms for general use. The WAVE system, the first predictive surveillance system to receive FDA 510(k) clearance for general clinical use, is largely based on five common vital signs: heart rate, respiration rate, oxygen saturation, temperature, and blood pressure. As researchers grow more and more “greedy,” many other forms of data are pulled into their algorithms. This can mean that some of the factors used are very specific to one health system, so interoperability becomes a huge obstacle when transporting the methods to another system, even though, as the other side of the coin, such algorithms usually work quite well in their home system. Common data models, such as those from OHDSI, may be a good solution for tackling this problem.
(4) Specify Interventions
This is a general problem for these algorithms: even when they predict upcoming events with good accuracy, they are not accompanied by solutions. This can be a severe shortcoming, rendering them useless, in critical settings such as the ICU, where a decision must be made the moment a warning comes up. It is still a hard problem for a system to make predictions and propose interventions at the same time, and this remains a major limitation of machine learning in medicine.
(5) Audit Mechanisms
Algorithms are buggy and far from perfect. Just as the FDA conducts postmarket surveillance of drugs and medical equipment, similar strategies should be applied to algorithms. How the system works, how much it improves healthcare, how many errors it prevents, and how many errors it makes itself are simple yet very important questions to answer when monitoring a predictive system. Guidelines for auditing deployed algorithms should be part of the original plan, never an afterthought.
7. Artificial Intelligence (AI) Failure in Medical Practice
In 2014, IBM opened swanky new headquarters for its artificial intelligence division, known as IBM Watson. IBM's artificial intelligence creation, Watson, had wowed the world with its appearance on the TV game show Jeopardy! in 2011; it was amazing to see a computer play against human beings and do so well. IBM then announced a new career path for its AI quiz-show winner: it would become an AI doctor.

Figure 1. AI application in health care (Strickland, 2019).
7.1 Watson for Oncology
Watson for Oncology combines leading oncologists' deep expertise in cancer care with the speed of IBM Watson to help clinicians consider individualized cancer treatment options for their patients. Watson for Oncology was supposed to learn by searching the vast medical literature on cancer and the health records of real cancer patients. The hope was that Watson, with its mighty computing power, would examine hundreds of variables in these records—including demographics, tumor characteristics, treatments, and outcomes—and discover patterns invisible to humans. It would also keep up to date with the bevy of journal articles about cancer treatments being published every day.
Across the country, preeminent physicians at the University of Texas MD Anderson Cancer Center, in Houston, collaborated with IBM to create a different tool called Oncology Expert Advisor. MD Anderson got as far as testing the tool in the leukemia department, but it never became a commercial product. Watson learned fairly quickly how to scan articles about clinical studies and determine the basic outcomes. But it proved impossible to teach Watson to read the articles the way a doctor would.

7.2 Reasons that Watson for Oncology failed in the clinic
(1) Different training data for Jeopardy! and oncology
When set to the task of winning Jeopardy! or a board game such as chess, the AI software searches for the outcome most likely to lead to victory: a checkmate in chess or a correct answer in Jeopardy! The training for such game problems is supervised, since the datasets contain a large number of previous chess matches or question-answer pairs from Jeopardy! These datasets are completely labeled, with clear relationships between input and outcome.
On the other hand, completely labeled datasets of this kind are not feasible for training Watson for Oncology across the board. Where the data are structured, training is relatively straightforward: many lab results are quantitative, and AI excels at processing and analyzing image scans, so it is comparatively easy to train Watson in diagnostics. A paper in The Oncologist reported that Watson was able to achieve very high accuracy when it dealt with clear, defined tasks like diagnosis.
However, it is much harder to train Watson with unstructured, abbreviated, and often subjective information on a patient, such as doctors’ notes and hospital discharge summaries, which make up close to 80% of a patient’s record.
(2) Jeopardy! and oncology are different tasks
With Jeopardy!, Watson has the perfect scenario. The question-and-answer format is specific and defined. Watson was trained on and tested with quiz questions written in the same style. Therefore, the collection and preparation of data for analytics is relatively straightforward. All Watson needs is tremendous computational power to crunch through lots of data and determine the most likely answer.
On the other hand, oncology contains more complexity. In fact, oncology is several problems rolled into one: diagnostics, culling information from journal publications, and analyzing unstructured patient information. While AI does well analyzing the quantitative data in lab results, it does not yet have the capability to analyze texts that are rich in context and nuance.
References

Kornfield, R., Sarma, P. K., Shah, D. V., McTavish, F., Landucci, G., Pe-Romashko, K., & Gustafson, D. H. (2018). Detecting recovery problems just in time: Application of automated linguistic analysis and supervised machine learning to an online substance abuse forum. Journal of Medical Internet Research, 20(6), e10136. https://doi.org/10.2196/10136

Parikh, R. B., Obermeyer, Z., & Navathe, A. S. (2019). Regulation of predictive analytics in medicine. Science, 363(6429), 810-812.

Rude, S. S., Gortner, E. M., & Pennebaker, J. W. (2004). Language use of depressed and depression-vulnerable college students. Cognition and Emotion, 18(8), 1121-1133.

Strickland, E. (2019). How IBM Watson overpromised and underdelivered on AI health care. IEEE Spectrum. Retrieved from https://spectrum.ieee.org/biomedical/diagnostics/how-ibm-watson-overpromised-and-underdelivered-on-ai-health-care

Tausczik, Y. R., & Pennebaker, J. W. (2010). The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24-54.