This question isn’t as out there as it may seem. Although previous generations of ChatGPT were noticeably flawed, the most recent (GPT-4 at the time of this writing) has undergone a lot of work to iron out the kinks. GPT-4 scored extremely well on exams like the GRE, SAT, LSAT, and numerous others. Multiple studies have also explored its ability to solve medical competency examinations and handle benchmark datasets, with one finding that it exceeded the passing score on the USMLE exam by 20 points.
At the same time, there are flaws. Passing an exam or answering basic competency questions correctly isn't the same as being a qualified clinician, and ChatGPT has a clear weakness: it's a language model, which means it predicts plausible-sounding text without any real grasp of what that text means.
Think of it like predicting what someone will probably say based on the beginning of a sentence or a question you ask them. If you know the person well enough, and you’ve limited the possible responses by asking a specific question, it’s not far-fetched to assume you’ll make an accurate guess.
Now imagine you’d somehow read every website in existence and a few million clinical studies. You’d probably be able to predict a wider range of responses based on context.
ChatGPT's predictive powers are impressive, but they're entirely reliant on the training data it's absorbed from the internet. Even when it technically gets the answers correct, that doesn't mean the bot understands what it's saying. It just knows how to join words and phrases into a likely pattern. You can watch this in action in the OpenAI playground: play with the available models, tweak the temperature and other sampling settings, and things quickly start to go off the rails.
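If you'd rather reproduce this at the API level than in the playground, here is a minimal sketch, assuming the openai Python package and an API key in your environment; the model name, prompt, and temperature values are just illustrative placeholders, not recommendations.

```python
# Minimal sketch: send the same prompt at several temperature (randomness) settings.
# Assumes the openai Python package (v1+) is installed and OPENAI_API_KEY is set;
# the model name and prompt are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
prompt = "In one sentence, what does urodynamics testing measure?"

for temperature in (0.0, 0.7, 1.5):
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
        max_tokens=60,
    )
    print(f"temperature={temperature}: {response.choices[0].message.content}")
```

At low temperature the answers converge on nearly identical phrasing; push the setting higher and the wording starts to drift, which is exactly the pattern-matching behavior described above.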
With all this in mind, we wanted to see how the AI performed in the context of urodynamics, a topic very specific to urology and urogynecology. Would it produce any insights we hadn't thought of or just repeat common knowledge? Or would it hallucinate and spout out totally incorrect answers?
To find out, we asked it a few questions and informally rated the responses. Here are our results using the GPT-4 model:
Asked first to explain what urodynamics testing is, ChatGPT responded:
Urodynamics testing is a diagnostic procedure used to evaluate the function and performance of the lower urinary tract. It involves a series of tests that assess how the bladder, urethra, and associated muscles are working together to store and release urine. The primary purpose of urodynamics testing is to aid in the diagnosis of various urinary system disorders and to help determine the most appropriate treatment options...
This was pretty spot-on, and from here, ChatGPT went into detail, listing applications like assessing bladder function, pressure, and capacity, investigating urinary incontinence, and performing postoperative assessments. While there weren't any glaring errors, the bot tended to drone on and on; we had to cut it off early, and if you don't explicitly tell it to focus on a given topic, it'll try to write an entire book. This could prove problematic if you're using it to answer patient questions, where concise, to-the-point answers are usually the most helpful.
When we asked about the different types of urodynamic tests, ChatGPT generated a bulleted list including uroflowmetry, cystometry, pressure-flow study, electromyography, and video urodynamic study. The AI sketched out the basic steps of each test, explained what it measured, and named a few of the conditions it helps diagnose.
Although the information provided might be fairly helpful to an unfamiliar patient looking for a quick summary, it lacked depth. When we asked the AI to go into more detail by explaining the steps involved in cystometry, it produced a good expansion of the process but didn't distinguish between the voiding phase and uroflowmetry or pressure-flow study. It technically did as asked — it just wasn't smart enough to realize what we were after based on the context of the previous question.
When we asked what conditions urodynamics testing is used to evaluate, the AI did a good job of listing conditions like interstitial cystitis, bladder obstruction, voiding dysfunction, and neurogenic bladder. Once again, however, it didn't connect these conditions to urodynamics testing in a way that explained why the testing is valuable. In other words, it exemplified the old programmer's adage that "computers do exactly what you tell them to, even if that's not what you really want."
Interestingly, the bot could tell that preoperative assessments were a good use case for urodynamics testing, but it couldn't make the simple logical leap to realize that postoperative assessments were also viable for similar reasons. Since post-op assessments can be even more clinically valuable than pre-op assessments in some circumstances, this was a noteworthy omission.
For the question of how patients should prepare for the test, we asked the bot to answer in 150 words or less to see if it would stick to the most essential patient instructions. It ignored this constraint, producing a nicely ordered but overly long list that seemed to be addressed directly to the patient.
There was also a pretty big mistake. ChatGPT said: "On the day of the test, make sure to empty your bladder completely before the procedure." This isn't always true — in many cases, practitioners ask patients to arrive with a full bladder. Ironically, the reasoning ChatGPT gave was that pre-test voiding would ensure accurate measurements. In reality, it might do the opposite by painting an inaccurate picture of patient conditions during some tests.
On the question of how to interpret urodynamics results, the AI somewhat dropped the ball. While it was able to point out some factors that figure into an interpretation, like pressure measurements and flow rates, it wasn't very clear about how a clinician might use them to reach a diagnosis. In all, the response we got was somewhat vague, although it might serve as a starting point for an introduction to the topic. More worryingly, some of the answers it listed changed completely when we gave the bot a second chance by regenerating the response; it clearly wasn't adhering to any official set of best practices.
The bot's response to our question about risks and side effects was surprisingly thorough. It mentioned not only side effects like vasovagal response, hematuria, and UTI but also noted how likely each of these outcomes was.
The response to our request to explain the procedure to a patient was also acceptable. The bot covered what urodynamics testing is, what patients can expect from the procedure, and what their doctor might do with the information collected. As advertised, ChatGPT does fairly well with surface-level explanations.
When we asked about the latest research and advancements in urodynamics, we came up against one of ChatGPT's most famous limitations. As the bot put it:
As an AI language model with a knowledge cutoff in September 2021, I don't have access to the latest research or advancements beyond that date.
Although the AI was able to provide some information on recent trends before its last update, it's important to remember that urodynamics is an evolving field. If you want to stay ahead of the latest trends, you're better off reading the research yourself — or finding a urodynamics partner that helps keep your team current.
At this point, we should point out that entire careers have sprung up around prompt engineering, or devising specific instructions geared toward making ChatGPT behave properly. If you're a practitioner looking to generate outreach content, such as for patient education, you'll inevitably need to dedicate time to asking the right questions in a focused way.
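To make that concrete, here is a minimal sketch of what a more focused prompt might look like, again assuming the openai Python package. The system prompt wording, model name, and helper function are our own illustrative choices rather than a vetted template, and anything the model produces still needs review by a qualified clinician.

```python
# Sketch of a prompt-engineering approach for patient-education drafts.
# Assumes the openai Python package (v1+) and OPENAI_API_KEY; the system prompt,
# model name, and helper function are illustrative only, and output still
# requires review by a qualified clinician before it reaches a patient.
from openai import OpenAI

client = OpenAI()

SYSTEM_PROMPT = (
    "You are drafting patient-education material for a urology practice. "
    "Answer in plain language, in 150 words or fewer, and stick strictly to "
    "the question asked. If you are not certain of a fact, say so instead of "
    "guessing."
)

def draft_patient_answer(question: str) -> str:
    """Return a short, focused draft answer for a patient-facing FAQ."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": question},
        ],
        temperature=0.2,  # a low temperature keeps the wording close to the most likely answer
    )
    return response.choices[0].message.content

print(draft_patient_answer("How should I prepare for urodynamics testing?"))
```

Even with a constrained prompt like this, nothing guarantees the word limit will be respected or the answer will be accurate, so treat the output as a first draft rather than finished patient material.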
One of the biggest problems we found with using ChatGPT for urodynamics was actually its seeming accuracy. The majority of the content it generates looks great and reads fairly well. But this can lull you into a false sense of security since there's no obvious indicator of when the bot produces erroneous output. In other words, the mistakes are tough to spot, requiring ample review.
Our opinion is that ChatGPT isn't quite ready for prime time, in urology or any other field of healthcare. It's also apparent that OpenAI, the company behind the bot, doesn't quite think it's safe enough either: almost every response we generated came with its own disclaimer in addition to the one at the bottom of the page.
However, given the rate at which OpenAI is advancing ChatGPT, we could very well come to a different conclusion in a few months. We fully expect to see an LLM dedicated entirely to urology within the next few years, along with an AI tool capable of interpreting a urodynamics study better than 99% of urologists; either would be a game changer for everyone in the field.