One of the most challenging aspects of writing multiple-choice test questions is identifying plausible incorrect response options-i.e., distractors. To help with this task, a procedure is introduced that can mine existing item banks for potential distractors by considering the similarities between a new item's stem and answer and the stems and response options for items in the bank. This approach uses natural language processing to measure similarity and requires a substantial pool of items for constructing the generating model. The procedure is demonstrated with data from the United States Medical Licensing Examination (USMLE®). For about half the items in the study, at least one of the top three system-produced candidates matched a human-produced distractor exactly; and for about one quarter of the items, two of the top three candidates matched human-produced distractors. A study was conducted in which a sample of system-produced candidates were shown to 10 experienced item writers. Overall, participants thought about 81% of the candidates were on topic and 56% would help human item writers with the task of writing distractors.
Keywords: automatic item generation; item writing; large-scale testing; natural language processing; test development.