February 2024: Voice as a platform in healthcare
AI Scribes are just the first use case built on audio. More will come.
The way AI charting spread at Carbon Health has been fairly unintuitive. After the team shipped the initial version, I believed that clinicians would see the documentation benefits immediately and pick it up. In practice, clinicians don’t care by default: most have been failed so often by new technology offerings that reduced their productivity that they simply don’t want to be early adopters. When adoption did occur, it took one of two typical paths.
First, if a clinician joined after AI charting was introduced into the EHR, and AI charting was included as part of their training, they were far more likely to use it. Second, once one clinician in a clinic started using it, it would spread like wildfire to every clinician in that clinic. Remember, in urgent care it’s rare that clinicians get to work directly together and observe each other’s behaviors. But part of the reason it spreads like wildfire within a location is that clinicians actually have to change how they chart in order to get the most out of it, and that typically requires clinician-to-clinician contact.
Charting the old way doesn’t extract the most leverage from AI, because clinicians need to change how they run the visit and interact with the EHR. As a clinician, you save more than 1000 keystrokes per visit, but you need to vocalize and articulate your thought process: the paths you went down and the paths you ruled out, the clinical decisions pending diagnostic tests and the decisions you’ve already made, and the outstanding questions you have (some of which the patient can answer on the spot, and some of which are pending a diagnostic test). In the old world the clinician was thinking all of this anyway, but there wasn’t necessarily any value in vocalizing it. In the new world they vocalize it as well. The first consequence is a better experience for patients; regardless of anything else about the visit, the patient has a better understanding of what’s happening overall and of the completeness of the clinician’s algorithm. The patient can also interject when the clinician mishears or misses something (a positive change in patient behavior), which drives even more completeness. The second consequence is that because the entire interaction is recorded, we now have the patients’ descriptions of what happened, guided by the clinicians’ questions, in their own words. As a society, we’re building, in real time, an extremely high-signal dataset that wasn’t legible before.
Documentation is (one of) healthcare’s Achilles heel(s)
Many folks in healthcare (including me) complain about the absolutely massive administrative burden in the industry. As a non-clinician, I fall into that category of admin as well. A natural adaptation to an already large and growing administrative burden, constrained by fixed time, is that clinicians often use shorthand to document visits. For instance, the average length of a SOAP note typed out for a Carbon Health urgent care visit is 400-500 characters. In contrast, the average length of an AI-generated SOAP note is 1400 characters (even including clinician edits, which are generally uncommon). The difference is primarily time: the clinician would have to type literally 1000 more keystrokes per visit to achieve the same level of completeness. In addition, they’d have to remember it all.
The AI-generated SOAP notes simply capture a more accurate and thorough picture of what actually happened during the appointment.
AI Scribes != Dictation or human scribes
The current generation of AI scribes differs from dictation tools like Dragon (from Nuance) in a few material ways. First, clinicians typically don’t dictate during the visit with the patient in the room; they often do it afterward, so the audio of the dictation usually doesn’t include the patient talking. Second, with dictation the clinician has already extracted and interpreted the signal, and only that interpretation is converted into the medical record, so there’s no raw context to extract or analyze later.
Similarly for human scribes, what gets written to the patient record is not raw audio; it’s filtered through the scribe and eventually reviewed by the clinician (and even then it’s not a word-for-word account of what the patient and provider said). The clinician in the visit is likely vocalizing and articulating their algorithm much as they would with an AI scribe, but the context from the patient isn’t captured.
Our sensors are too blunt
One meta-problem in healthcare today is that our sensors are too blunt: so many of the ways we measure what’s happening to a patient’s health are expensive, time-bound, or invasive. Even when they’re not, we’re often measuring proxies and then using those proxies to infer the actual thing we want to know, and for a variety of reasons we rarely measure what’s happening with the patient on a continuous basis. (CGMs are an example of continuous measurement, but even those essentially use a proxy to estimate how your blood glucose responds, and they’re expensive and not reimbursed, so they don’t get used preventatively.)
As an example, a friend of mine was taking care of his mother, who had cancer. She would record all the doctors’ visits using the Abridge consumer app (where I’m an investor) and share them with my friend and all his siblings. They would all listen and chime in with thoughts and questions for the provider that could be asked either in between or at the next visit. A few months in, his mother experienced a pretty bad side effect of one of the therapies: she basically couldn’t walk for three months. The symptom management was almost entirely pharmacological (and not targeted). It was a relatively new therapy, less than two years old, and the way they figured out what was happening was listening to the audio and then reading the clinical trial extracts to see if and where those symptoms were discussed. After they understood it was a side effect, she shared it with the care team, who changed the prescription, and she took to Reddit and Facebook groups for non-pharmacological guidance on managing the symptoms (this is what worked). In this case, all the signal happened during the visit, but the care team didn’t catch it until she literally brought up that the symptom was a documented side effect. I think in the future the listening, research, and connecting to community can all be done in one AI-aided service or application that both the patient and clinician can access.
Audio is a new (non-invasive) sensing tool, and LLMs make it legible
One indirect benefit of AI that I don’t believe we’ve seen the effects of yet is that, by recording the audio of the visit conversation, we as a society now have net new data being passed into the patient record that we didn’t have before.
Before, patient records (and in particular the visit summaries) were primarily constructed from a clinician listening, extracting signal, interpreting it, and then writing it down (true for both charts created by hand, and dictation). Many things impaired the quality of this, including use of shorthand, clinician burnout, administrative burden, and more. Even the absolute best clinicians wouldn’t have time to document their entire decision process consistently all the time.
AI charting means that the entire contents of the patient conversation will eventually be part of the patient record. This will include the audio record, which captures:
Exactly what the patient said and how
The patient’s voice and the provider’s voice (and inflections, variations, etc.)
Exactly what the clinician said/asked that prompted the patient’s response
The clinician’s interpretation, including what they decided, what they ruled out, what they thought of ordering but didn’t, and more
We already combine this context with a ton of the knowledge we have about a patient (their demographics, medical history, historical labs, prescriptions, etc.) to generate the SOAP note, and to fill out forms like medical excuse notes and forms for workers’ compensation visits. As an industry, we’ve been so focused on the administrative benefit (and burgeoning clinical benefit) of AI charting that we don’t yet realize how our historical view of patient records will change when there’s audio for every visit. So many startups have been born and died building clinical models on top of medical records that are historically deeply incomplete and nonstandard. The medical record is about to become far more complete than it has ever been, and most importantly it will be in a relatively standardized format (audio) that can be easily ingested, processed, and interpreted using a mature technology stack in a way that wasn’t possible before. While in the short term that has started helping reduce clinician burnout, in the long term it is also going to improve the quality of clinical decision making in superlinear ways.
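To make that “combining” step concrete, here is a minimal sketch of how a transcript and patient context might be assembled into a prompt for a SOAP-note draft. The field names, prompt structure, and the placeholder LLM call in the usage comment are illustrative assumptions, not a description of any vendor’s actual pipeline.

```python
# Illustrative sketch: assembling the prompt an LLM would need to draft a
# SOAP note from the visit transcript plus structured patient context.
# The transcription and LLM calls themselves are left out, since they depend
# on whichever vendors a given EHR integrates with.
from dataclasses import dataclass, field

@dataclass
class PatientContext:
    demographics: str
    medical_history: list = field(default_factory=list)
    recent_labs: list = field(default_factory=list)
    prescriptions: list = field(default_factory=list)

def build_soap_prompt(transcript: str, ctx: PatientContext) -> str:
    """Combine the speaker-labeled transcript with patient context into a
    single prompt; the LLM's output is a draft the clinician reviews."""
    return "\n".join([
        "Draft a SOAP note for the clinician to review and edit.",
        f"Demographics: {ctx.demographics}",
        f"History: {'; '.join(ctx.medical_history) or 'none on file'}",
        f"Recent labs: {'; '.join(ctx.recent_labs) or 'none on file'}",
        f"Prescriptions: {'; '.join(ctx.prescriptions) or 'none on file'}",
        "",
        "Speaker-labeled visit transcript:",
        transcript,
        "",
        "Include decisions pending diagnostic results and any open questions.",
    ])

# Usage (hypothetical): prompt = build_soap_prompt(transcript, ctx); draft = some_llm(prompt)
```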
The second wave
The first wave of this generation of AI in healthcare has been scribing; Nuance, Carbon, Abridge, Freed, Nabla, and many others have delivered benefits, including reducing the documentation burden and cognitive load on clinicians so that they can focus on the clinical work they’re doing with the patient. Future generations will obviously include:
Copilots (eg actually predicting what the clinician should do in the visit)
Diagnostic support (an actual copilot use case where the system helps ask questions that might drive a differential diagnosis)
Predictive support; eg detecting patterns across multiple patients, or longitudinally for a single patient over time and surfacing them to the provider
Eventually there will be models that function as autonomous doctors
There already are teams and companies working on these applications around the world today.
Second order effects
Beyond the first generation, there are a few implications of the advent and widespread adoption of AI charting that I think are not yet broadly apparent. Many more use cases will be built on top of audio (and on the ability to extract patterns out of unstructured audio data nearly as quickly and cheaply as from structured data). A few examples:
Patient Portability
This isn’t broadly true yet, but I think over time patients will get access to and control over their audio. At Carbon we already make a lot of the patient record accessible in the patient app. In addition to its enterprise, provider-facing app, Abridge has a free consumer app that allows you to capture audio of medical visits. My guess is that over time a lot of consumer health apps will try to host this data as part of your holistic health record (you can imagine Apple Health hosting your audio the same way you can pull your historical Quest lab results into it). A lot of use cases will be unlocked by patients being able to access and use the audio of their visits as they please, including:
Running your audio through a model yourself, either one you find or one provided by your payor, another provider, etc. (in combination with lab results, medical history, etc.)[1]
Using models to find others with similar patterns that have similar problems to you and are also not yet diagnosed
Sharing your audio with other providers and specialists to apply their models & analysis.
An exchange probably needs to exist
Anyone familiar with how healthcare data is exchanged will probably roll their eyes at this, but we can all admit that healthcare infrastructure is not yet ready for large amounts of audio to be exchanged between parties. Someone will build this, for clinician <> clinician exchanges, clinician <> patient exchanges, and potentially clinician <> payer exchanges. Something will need to exist to manage consent, transfer, storage, model accessibility, and more.
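To make those requirements a bit more concrete, here is a rough sketch of the metadata such an exchange might need to track for each recording. The field names and consent scopes are invented for illustration; a real exchange would build on existing consent and interoperability standards rather than this exact shape.

```python
# Rough sketch of what an audio exchange might need to track per recording.
# Field names and consent scopes are invented for illustration only.
from dataclasses import dataclass, field
from datetime import datetime

@dataclass
class AudioExchangeRecord:
    recording_id: str
    patient_id: str
    recording_clinician_id: str
    storage_uri: str                        # where the encrypted audio lives
    consented_parties: list = field(default_factory=list)   # other clinicians, the patient, a payer
    consent_scope: str = "transcript_only"  # e.g. "full_audio", "transcript_only", "model_access_only"
    consent_expires_at: datetime | None = None
    permitted_models: list = field(default_factory=list)    # which models may process the audio
    access_log: list = field(default_factory=list)          # append-only: who accessed it, when, and why
```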
My instinct is that, for the most part, no clinician will sit and listen to your historical visits! But most clinicians will have access to models that understand the questions they’re trying to answer in this specific interaction or around this specific problem, and the audio will give them a higher-fidelity view than another clinician’s AI-generated SOAP note.
Recreating + reviewing charts will be easier than ever
Charts are longer but often easier to read because of their consistent structure. That consistency has an underrated benefit: you can ask questions of an AI that has listened to the audio and has every piece of data about the patient in memory. Chart reviews can require much less human overhead, because you can focus the models on visits that actually might have a problem rather than on a pure random sample.
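As a toy illustration of what model-guided chart review could look like, here is a short sketch. The scoring heuristic is a placeholder of my own; a real system would more likely ask an LLM whether the note, the transcript, and the orders are mutually consistent.

```python
# Toy sketch of model-guided chart review: score each visit for potential
# documentation or clinical inconsistencies, then spend reviewer time on the
# highest-scoring visits instead of a pure random sample.
def score_visit(chart: dict) -> float:
    """Placeholder risk score in [0, 1]; a production system might have an LLM
    check whether the SOAP note, the audio transcript, and the orders agree."""
    note = chart.get("soap_note", "").lower()
    flags = 0
    if len(note) < 300:                                # unusually short note
        flags += 1
    if "pending" in note and "follow up" not in note:  # pending result, no follow-up plan
        flags += 1
    return flags / 2

def select_for_review(charts: list, review_budget: int) -> list:
    """Return the visits most likely to need a human reviewer's attention."""
    return sorted(charts, key=score_visit, reverse=True)[:review_budget]
```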
Every visit will require audio
The benefits of audio will be so material that I believe it will eventually be required in all visits. Today there remains some resistance to recording visits (around malpractice risk, among other things). But I believe the combination of the documentation benefit, the eventual clinical benefit within the visit, and the eventual longitudinal benefit will win the industry over, as visits assisted by AI become associated with better outcomes across the board. One way this might happen is that payors reimburse differentially for visits that are recorded and charted by AI. Transparently, I’m not knowledgeable enough about payer incentives to know how this part will play out; it just feels likely.
Non-invasive public health sensing
One horizontal benefit of having audio or transcripts is that we can, in relatively real time, extract related signals from unstructured data across a multitude of patient visits happening in parallel. Imagine 50,000 patients across the country reporting the same unusual symptom within a short timespan, and being able to surface that to the clinician who is seeing you, to public health leaders, or even to the patient. This is the kind of signal that has historically been extractable from lab results, because those are instrumented, quantitative, and have been reported for years. But as AI charting spreads, it becomes possible to detect something happening across a lot of patients, in multiple practices, hospitals, and geographies, and even across borders, and to surface that information to clinicians, patients, and public health authorities in near-real time.
I’d be surprised if this isn’t used to detect spread during the next pandemic, and to capture and measure patient responses to various therapies. To do it well, the technology providers will need to work together (something there’s currently no incentive to do) to ensure patient health information remains private and the reporting is somewhat standardized. But the beauty of this is that it doesn’t need to be centralized.
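A back-of-the-envelope sketch of that kind of detection, assuming an upstream model has already extracted symptom mentions from de-identified transcripts; the thresholds are arbitrary illustrations rather than tuned values.

```python
# Simple sketch of symptom-spike detection across de-identified visits:
# count how many visits mention each symptom this week, then flag symptoms
# that far exceed their historical weekly baseline. Real surveillance would
# also need standardized symptom coding and coordination across providers.
from collections import Counter

def weekly_symptom_counts(visits: list) -> Counter:
    """visits: one list of extracted symptom strings per visit this week."""
    counts = Counter()
    for symptoms in visits:
        counts.update(set(symptoms))   # count each symptom once per visit
    return counts

def flag_spikes(current: Counter, baseline: dict, ratio: float = 3.0, min_visits: int = 50) -> list:
    """Symptoms seen in >= min_visits visits at >= ratio times their baseline."""
    return [
        symptom for symptom, n in current.items()
        if n >= min_visits and n >= ratio * baseline.get(symptom, 1)
    ]
```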
Model as a second opinion
Today, there are many stories of people with chronic conditions (or caring for chronically ill family members) uploading their health data into ChatGPT, asking it questions, and using that to help work through a somewhat intractable problem that they have been unable to get diagnosed after many, many clinician visits. Over time, people will upload audio as well. Imagine being diagnosed with a chronic condition later in life, and a model being able to play back all your historical visits and find signals that predicted the condition. This has massive benefits for future patients with that condition, and will enable someone to build a network effect helping patients with similar patterns find each other (right now patients do a lot of this on Reddit; audio just drives more precision).
Why I might be wrong
The most fundamental flaw in my argument is this: if the clinician’s algorithm is always right, then theoretically audio is overkill (or non-additive). It would mean there’s no incremental signal to extract from the audio or transcript, because everything material that the patient said was picked up, processed, and acted on by the clinician. The main reason I believe this isn’t the case is that there’s plenty of evidence that a sufficiently exotic problem can take years and tons of visits to diagnose correctly, and that when many symptoms are happening at once it can be difficult to recognize which are related, and when.
Audio (and the AI/LLM based applications that can be built on top of it) will be so impactful, that we’ll look back in a decade and wonder how we lived without it. We’re only just scratching the surface. And the best part is, it’s unequivocally better for patients.
Thanks to Kalie Dove-Maguire, James Sun, Eren Bali, Shiv Rao, and Caesar Djavaherian for helping refine this in drafts.
[1] People are already doing this with their existing medical records and lab results. Audio would just enable them to add the richness and depth of the actual clinical interaction (the input to the clinical algorithm as well as the output).
Fascinating, thanks for sharing! I'm personally amenable to audio because I can't write down the details I need as a patient and also think of questions I should ask. Having that transcript is helpful and a less distracted clinician would be very helpful!
One downside I can see here is that the docs / insurers are (hopefully) counter positioned. I want my doc to be like my lawyer and break down whatever auth barriers are between me and successfully executing their treatment plan. In my experience, docs may misrepresent reality a bit to get a test they felt was relevant paid for, but which an insurer following care guidelines views as wasteful.
If audio is a default and insurers get their hands on the audio, won't the docs and patients self-censor? How much will be said to trigger some claims authorization rule, rather than a discussion of symptoms and objective reality?
Very excited by the potential but a bit wary given my experience with chronic disease and high deductible, auth heavy plans.
Thanks again!
Kunle, this is the best so far. I think your point on adoption is particularly strong if insurance providers discover their malpractice suit risk is reduced by having this additional audio evidence, or if premiums go down if you have audio because insurance providers can sell raw audio at scale or audio insights to LLMs or other scale data users. If you had cough and respiratory data across America in audio, many drug companies would love to know how that worked against different drugs which people took over those periods. It’s a totally new vector of information and a true signal that is hard to falsify at scale. Well done. Niyi