ClinicalBench Physician Adjudication
Review 240 blinded clinical QA items (120 questions × 2 conditions). For each item, rate the accuracy of the gold standard and the correctness, safety, and utility of the model answer.

Your progress saves automatically. Close at any time and pick up where you left off. Conditions are blinded as A and B, so you will not see which system generated each answer.

Part of the EpiKG ClinicalBench validation study. Questions use de-identified MIMIC-IV clinical data.