Researchers design a new way to more reliably evaluate AI models' ability to make clinical decisions in scenarios that closely mimic real-life interactions. The analysis finds that large language models excel at making diagnoses from exam ....