Podcast on ‘Natural Language Processing and its Applications in Healthcare’

Elvin (Host): Hello and welcome to the InApp podcast, where we talk about empowering businesses with transformative digital solutions.

I’m your host, Elvin. Today, we’re going to talk about natural language processing and its applications in healthcare.

My special guest today is Mahalingam, a pre-sales manager here at InApp. He specializes in technologies that companies can use to boost their digital strategy and streamline business processes.

Thanks for being with us today.

Mahalingam (Guest): Thanks for having me, Elvin. It’s great to be here.

Elvin: I’m really excited to learn more about natural language processing, and how it applies to healthcare, an industry that affects all of us, now more than ever.

Let’s start with a quick introduction to natural language processing or NLP. What is it? And how does it work?

Mahalingam: Putting it in simple terms, NLP is all about making the interaction between humans and computers easier than writing programs. Nowadays, whenever we want the computer to do something and give an output, we either write programs or give written commands that are preprogrammed into the operating system. NLP eliminates that need and helps us give instructions in a form closer to human language. A very common example is smart assistants like Google Assistant, Siri, Cortana, Alexa, etc. We communicate with them using our voices, and they understand what we mean to a good extent. They even respond in a way that closely resembles human voices. I use that feature every day: I just say “Siri, remind me to take my medicine in two hours”, and Siri understands that I want to set a reminder for a time two hours from now, and sets it automatically. Other examples are the autocorrect systems in MS Word, Google Docs, and similar applications.

Nowadays we are all familiar with the likes of ChatGPT. It can also be considered an NLP application, one that performs both understanding and generation of natural language text.

Elvin: I use those features all the time too. How does it work?

Mahalingam: NLP involves a large pipeline of tasks, which have been fine-tuned over decades of research. The program starts by listening to spoken or typed input and detecting where each utterance starts and ends. Once that chunk is received, it may have to perform some noise removal. Once a clean piece of input is ready, it is broken into individual words called tokens, and each token has to be understood separately. Processes like stemming and lemmatization help convert inflected forms of words into their base forms. The tokens are interpreted one by one, and context is brought in whenever ambiguity is encountered. Contextual information can be managed using models like n-grams, bag-of-words, LSTMs, etc. Eventually, everything is converted into an intermediate form that the underlying programs can process. This pipeline can be based either on a set of predefined rules or on a machine learning approach that learns on the go.
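To make that pipeline a little more concrete, here is a minimal pre-processing sketch in Python using the NLTK library (the library choice and sentence are my own illustration; the pipeline described above could be built with many toolkits). It tokenizes a sentence, then shows stemming and lemmatization side by side.

```python
# A minimal NLP pre-processing sketch using NLTK (pip install nltk).
# The sentence and model choices are illustrative, not from the podcast.
import nltk
from nltk.tokenize import word_tokenize
from nltk.stem import PorterStemmer, WordNetLemmatizer

# One-time downloads of the tokenizer and WordNet data.
nltk.download("punkt", quiet=True)
nltk.download("wordnet", quiet=True)

text = "The patients were complaining of recurring headaches."

# Step 1: break the cleaned input into individual tokens.
tokens = word_tokenize(text)

# Step 2: reduce each token toward a base form, two different ways.
stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
for tok in tokens:
    print(f"{tok:12} stem={stemmer.stem(tok):12} lemma={lemmatizer.lemmatize(tok)}")
```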

In short, you can consider it analogous to how programming languages are processed, but at a much larger scale and complexity. The reason is that programming languages have standardized syntax and semantics that are verified by the compiler or interpreter before further processing, whereas there is nothing of that sort in natural language. But one thing should be remembered: the success of NLP depends on how well the input is managed, whether it is voice, text, or even handwriting. Generative models like GPT take this one step further by leveraging state-of-the-art computing facilities and billions of language tokens to make the generated text as logical and sensible as possible.

When we talk about NLP, the application that always comes to mind is the one I mentioned earlier: smart assistants. But there are hundreds of other applications that benefit from NLP, and healthcare is one of the most important. That’s why I feel today’s theme is right on point.

Elvin: Fascinating. And how does natural language processing work in the healthcare industry?

Mahalingam: A very good question. As I was exploring the opportunities for NLP in healthcare, I came across an article from Hitachi Solutions. It mentioned applications like clinical assertion, medical de-identification and anonymization, clinical entity recognition, clinical note digitization, etc.

Clinical assertion helps in medical decision-making by checking that a given list of symptoms corresponds to a particular diagnosis, based on a set of rules. De-identification helps identify personally identifiable information in medical text and remove it for regulatory purposes like HIPAA. Clinical entity recognition helps identify aspects like which tests were done and what the diagnosis is, based on a verbal transcript. Note digitization is one of the most common applications, where legacy handwritten clinical notes are converted into digital formats for integration with Electronic Health Records (EHRs).
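As a rough illustration of the rule-based side of clinical assertion, here is a toy Python sketch. The rules, symptom lists, and threshold are entirely invented for illustration; real systems use far richer, clinically validated rule sets.

```python
# Toy rule-based clinical assertion: check whether reported symptoms
# are consistent with a proposed diagnosis. These rules are invented
# purely for illustration and have no clinical validity.
RULES = {
    "influenza": {"fever", "cough", "body ache"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}

def assert_diagnosis(symptoms: set[str], diagnosis: str) -> bool:
    """Return True if enough of the rule's expected symptoms are present."""
    expected = RULES.get(diagnosis, set())
    return len(symptoms & expected) >= 2  # toy threshold

print(assert_diagnosis({"fever", "cough"}, "influenza"))  # True
print(assert_diagnosis({"headache"}, "influenza"))        # False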

We should note that none of these can actually replace a medical professional, but they can support one and point out possible errors.

Elvin: So, it’s more about helping medical professionals by streamlining these processes. It sounds like the healthcare industry is already embracing natural language processing. Why the sudden increase in adoption?

Mahalingam: Two major reasons: access to large volumes of data and storage capacity, and access to computing resources that can handle complex NLP pipelines on large datasets. Most hospitals currently run in electronic mode using EHRs. With cloud providers gaining popularity through affordable storage and computing, hospitals now have a way of using them to gain insights. Some cloud providers have even come up with healthcare-specific applications; an example is “Amazon Comprehend Medical”, offered by AWS. Beyond that, complex NLP pipelines on medical data can now be executed on the cloud with customizable VM configurations and deployment options like Kubernetes.

Elvin: We know there’s a lot of patient data in an electronic health record system. What are the steps involved in making this data useful for a modern computer-based algorithm?

Mahalingam: Absolutely. EHRs can become a huge asset as far as healthcare is concerned. But when we use them this way, we need to consider things from the data point of view, something we call “the data-centric perspective”.

When we talk about this question, we are assuming that the data is already in EHRs. In reality, that may be one of the most critical assumptions to break when we try to implement a computerized solution. I recently went to a hospital where I found an army of more than 15 people sitting there and typing everything from old handwritten files into EHRs. That may be the point where automation has to intervene first, in the form of OCR. Nevertheless, let’s go with the current assumption and consider all data to be in EHRs.

Elvin: And that’s very sensitive data.

Mahalingam: Yes, medical data is extremely sensitive. So all EHRs should be anonymized using de-identification methods. Services like AWS Database Migration Service already contain data masking mechanisms that can be used when migrating data to the cloud. In addition, services like Amazon Comprehend Medical can tag protected health information (PHI) in clinical text.
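For instance, a minimal sketch of flagging PHI with Amazon Comprehend Medical via the boto3 SDK might look like the following. The note text is made up, and the sketch assumes AWS credentials and a region are already configured.

```python
# Minimal PHI detection sketch using Amazon Comprehend Medical
# (pip install boto3; assumes AWS credentials/region are configured).
import boto3

client = boto3.client("comprehendmedical")

note = "Patient John Doe, seen on 03/12/2023, reports chest pain."

# detect_phi returns entities tagged as protected health information.
response = client.detect_phi(Text=note)

for entity in response["Entities"]:
    # Each entity carries its type (e.g., NAME, DATE) and character offsets,
    # which a de-identification step can use to mask the original text.
    print(entity["Type"], entity["Text"], entity["BeginOffset"], entity["EndOffset"])
```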

Once that is done, we may have to segregate the data into structured and unstructured forms. Some pieces of data, like test results and diagnostics, are tabular in nature and can be managed using database operations. On the other hand, unstructured data like clinicians’ notes has to be taken through a proper NLP pipeline that can extract the core entities and convert them into a form that can be stored appropriately. Once this is done, there will be enough data in a structured or semi-structured format to train a computer-based model, such as a machine learning model, for future predictions and analyses.

Elvin: Can you explain a little more about the NLP pipeline?

Mahalingam: If we use an NLP pipeline, the following steps may have to be done; a short sketch follows the list.

  1. Extract named entities like drug names, test names, dates, etc. from the note.
  2. Establish connections between the different entities using methods like n-grams, or even machine learning models like LSTMs.
  3. Expand the connections and convert them into a structured form, something like a timeline, so that there is proper tracking.
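Here is a simplified, dependency-free sketch of those three steps. The note and regex patterns are my own toy assumptions; a real system would use trained clinical NER models rather than regexes.

```python
# Toy version of the pipeline above: extract entities from a clinical
# note, link each event to its date, and arrange them on a timeline.
# The sample note and patterns are invented for illustration only.
import re

note = (
    "2023-01-10: prescribed acetaminophen 500mg. "
    "2023-02-02: CBC test ordered. "
    "2023-02-15: acetaminophen dose reduced."
)

# Step 1: extract entities; here, a date and the event text that follows it.
entries = re.findall(r"(\d{4}-\d{2}-\d{2}):\s*([^.]+)\.", note)

# Steps 2 and 3: connect each event to its date and order them as a timeline.
timeline = dict(entries)
for date in sorted(timeline):
    print(date, "->", timeline[date])
```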

Elvin: I can see how that would be extremely helpful. Can you tell us about the Optical Character Recognition system?

Mahalingam: OCR, as it is commonly called, converts scanned characters into text. OCR was conventionally used for reading documents that were originally written on typewriters. The advantage of such documents is that the letters are well defined. OCR would process the document once it was scanned and extract the characters as they occurred in the original. If it was a fully structured document, it would be converted cleanly into a passage of text. Nowadays, the algorithms have been enhanced to handle handwritten characters as well, but the challenge there is that each person writes differently, leading to possible misclassification. In healthcare applications, as we have seen, this is highly beneficial for converting legacy handwritten documents into digital forms for storage in EHRs.
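As a concrete example, here is a minimal OCR sketch using the pytesseract wrapper around the open-source Tesseract engine. The library choice is mine, the Tesseract binary must already be installed on the system, and the image path is a placeholder.

```python
# Minimal OCR sketch using pytesseract (pip install pytesseract pillow).
# Requires the Tesseract engine to be installed on the system;
# "scanned_note.png" is a placeholder path for a scanned document.
import pytesseract
from PIL import Image

image = Image.open("scanned_note.png")

# Convert the scanned characters into a plain-text passage.
text = pytesseract.image_to_string(image)
print(text)
```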

Interestingly, this feature is available in more places than we think. For example, Acrobat Reader has a feature where you can select text from PDF files, even if they are scanned. Similarly, Apple incorporated this feature into their Preview app, which works like Acrobat Reader. On iPhones and newer Android phones, you can open the camera and a text preview appears on the side, detecting text from the live feed. The algorithms have become so optimized that they run on our phones!

Elvin: There is one common concern that I wanted to ask you – As you know, we all have different writing styles. Doctors also write their diagnoses differently. How can we tackle that issue with OCR?

Mahalingam: That’s a really interesting question. As far as OCR is concerned, it is responsible only for converting whatever is presented to it into a digital form. NLP has to do the heavy lifting of understanding whether two things mean the same thing.

If we consider the OCR part of this question, then we should be concerned about the styles of writing followed by different people. Some write in distinct letters, some use cursive, and so on. And doctors are not exactly known for writing legibly. So OCR has a good chance of misinterpreting handwriting if a doctor’s prescription is fed directly into it.

This brings us back to the very first question, where we mentioned that the disadvantage of using human language is that its structure and semantics are highly flexible, especially in languages like English.

For example, “read” in the past tense and “red” the color are spoken the same way. How should an NLP system interpret that voice input? Even worse, “read” is written the same in the present and past tenses, so for a scanned document, which tense should the system assume? These things have to be decided based on context.
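A part-of-speech tagger is one standard way to resolve that kind of ambiguity from context. A small sketch with NLTK (my choice of library; the sentences are illustrative):

```python
# Resolving the tense of "read" from context with NLTK's POS tagger
# (pip install nltk; the example sentences are my own).
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

for sentence in ["I read the report yesterday.", "I read reports every day."]:
    tags = nltk.pos_tag(nltk.word_tokenize(sentence))
    # The tagger typically labels "read" VBD (past) in the first sentence
    # and VBP (present) in the second, based on the surrounding words.
    print(tags)
```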

Yet another thing to take care of is that different brands sell the same medication. For example, Indians fondly call paracetamol “Dolo”. But paracetamol also goes by another name: acetaminophen. Not many people realize this, and trust me, I ran into this exact issue once. So a patient may ask for Dolo when the doctor prescribes acetaminophen, and the NLP system has to recognize that the two are the same. It may have to use named entity recognition and the fact that both appear in the same context to conclude that they refer to the same item. At the same time, it should recognize the five different classes of immunoglobulin as five different items: IgM, for example, is not the same as IgE.
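A tiny sketch of how such normalization might be handled with a synonym table follows. The table itself is a toy of my own; real systems map recognized entities to standard ontologies such as RxNorm.

```python
# Toy drug-name normalization: map brand and regional names to one
# canonical name, while leaving genuinely distinct items alone.
# The mapping below is illustrative, not a clinical resource.
SYNONYMS = {
    "dolo": "acetaminophen",
    "paracetamol": "acetaminophen",
    "acetaminophen": "acetaminophen",
}

def normalize(term: str) -> str:
    """Map a recognized entity to its canonical name if one is known."""
    return SYNONYMS.get(term.lower(), term)

print(normalize("Dolo"))         # acetaminophen
print(normalize("paracetamol"))  # acetaminophen
print(normalize("IgM"), normalize("IgE"))  # IgM IgE -- kept distinct
```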

Elvin: Okay, so my next question is: is it right to assume that machine learning algorithms could uncover diseases in medical records that weren’t previously diagnosed?

Mahalingam: In Nick Fury’s words, “I will neither confirm nor deny that story”. It is true that computer-based analysis can bring out patterns that were previously undetected. But whether such a pattern reflects a cause or an effect is yet to be determined. For example, if a machine learning algorithm predicts a value it was not trained for, it could either be something wrong with the training itself, or something new we haven’t accounted for in the original dataset.

But as we talk about expanding computer-based algorithms to that extent, one thing we should remember is that “correlation does not equal causation”. In other words, just because two things occur together, they may not be related. One example we use to tease data scientists is the bald cyclist survey. A survey collected data from hundreds of cyclists regarding their health and physical appearance, and it so happened that all of them were bald. The computer-based algorithm gave the verdict that “cycling causes baldness”. So we should be careful about how far we take the results.

Elvin: Fair enough. So how do you see the healthcare industry using NLP in the near future?

Mahalingam: NLP holds nearly unlimited potential as far as healthcare is concerned. But adopting it will be a challenge, especially when it comes to privacy concerns. People need to be sensitized to how the process is going to be done, and strict healthcare regulations should be in place whenever any automation is done. If the healthcare domain succeeds in convincing everyone, then a number of steps can be taken.

  1. Eliminate hard copies entirely by converting all data into EHRs. Automation can be done using OCR wherever necessary. It will be a mix of manual and automated processes, especially since the accuracy of the converted information is critical. We don’t want any test result or prescription to be misinterpreted.
  2. Once the entire data is in EHR form, NLP can take over and extract essential information using methods like named entity recognition, n-grams, bag-of-words, etc. All essential data can be stored on stable storage for further processing.
  3. Computer-based algorithms can then work on this massive pool of data to generate insights and predictions. They can form decision support systems that help doctors make quick decisions, but care should be taken that this is done ethically. A toy sketch of this step follows the list.
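As a toy illustration of step 3, here is what training a simple decision-support model on structured EHR-style features could look like with scikit-learn. The features and rows are entirely made up just to show the API shape; a real system would need validated clinical data and ethical review.

```python
# Toy decision-support sketch with scikit-learn (pip install scikit-learn).
# Features: [fever, cough, headache] as 0/1 flags; label: 1 = flag the
# patient for follow-up. All rows are invented for illustration only.
from sklearn.linear_model import LogisticRegression

X = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 1],
    [0, 1, 1],
    [1, 1, 1],
    [0, 0, 0],
]
y = [1, 0, 0, 1, 1, 0]

model = LogisticRegression().fit(X, y)

# Predict a probability for a new, unseen symptom pattern; the output
# supports the clinician's decision rather than replacing it.
print(model.predict_proba([[1, 0, 1]])[0][1])
```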

Elvin: It’s going to be fascinating to see how that plays out in the next few years. If our listeners have any questions or want to know more about NLP in the healthcare industry, how can they contact you?

Mahalingam: InApp is active on LinkedIn and Twitter. Feel free to connect with us and message us in case of any questions.

Elvin: Thanks so much for joining us today, Mahalingam. Technology has a lot of exciting possibilities in the healthcare industry.

And thank you to our listeners for joining us. Tune in next time to learn more about transformative digital solutions in the InApp podcast. Have a wonderful day!