By Danielle Brewer-Deluce, Ph.D. (Member of Committee for Early-Career Anatomists) | July 6, 2022
With another semester come and gone, most of us are combing through course feedback to determine what might be improved for next year. For many (myself included), assessment is always an area for productive reform. While the options for assessment are many, given my course sizes of 400-1200 students, I frequently find myself returning to multiple-choice questions (MCQs). Originating in 1914,1 MCQs are now one of the most popular assessment modalities across higher education. Writing a good MCQ is challenging, but the payoff is an efficient (often automated) and replicable assessment you can refine over time. Given their ubiquity, this article outlines best practices for MCQ creation and revision.
MCQ Creation
Great MCQs should both address important content and be well structured. Content should align with program, course, and session learning outcomes and parallel course content in distribution, focusing primarily on critical concepts/competencies.2 In fact, explicitly mapping these three levels of outcomes onto each question can help you focus your question content and ensure alignment. Additionally, creating questions throughout the term, involving TAs as contributors, and asking others to review/proofread your questions can help with these goals.
Structurally, “one best answer” MCQs consist of a stem/vignette/question and several answer options, one of which is superior to the rest. Many institutions and licensing boards require a standardized format for submitted questions; reviewing these formats helps align your questions with those your trainees may encounter on licensing exams and ensures the time you spend writing questions is used efficiently. Generally, the stem should be just long enough to contain all necessary information (nothing superfluous or irrelevant) and should ideally be answerable without reviewing the answer options. Further, keeping the question word count (stem + answer options) to a minimum, using simple and clear language, and ordering answer options logically all reduce the cognitive load associated with answering a question, thereby increasing your assessment’s validity.3,4
MCQ Difficulty
MCQ difficulty is largely determined by two things: the quality of the distractors and the verb used in the question. All answer options should be plausible and of homogeneous construction: similar in length, grammatically correct, and logically compatible with the stem, so that no single option visually stands out. Terms like “all/none of the above” make a question easier to answer, while terms like “often” and “usually” are ambiguous and create confusion.5 Avoid both.
Selecting the verb in your question/stem is the most important part of MCQ creation, as it dictates what students should know and be able to do. Bloom’s Taxonomy,6 which has been adapted specifically for anatomy in the Blooming Anatomy Tool,7 offers an easy way to consider verb use. In these models, competencies progress from simple to complex, with each higher level subsuming the skills required at lower levels. Critically, verbs, tasks, and types of knowledge assessed have been mapped onto each cognitive level and are summarized in Figure 1. Simply pick the level of understanding you want to test and an associated verb, then create your question. For example, here are two stems at different levels about the musculocutaneous nerve:
REMEMBER: Which area of the upper limb receives motor innervation from the musculocutaneous nerve? (recall)
ANALYZE: A 32-year-old woman dislocates her shoulder, complains of sensory loss over the lateral forearm, and displays weak forearm flexion. Based on your interpretation of this clinical picture, which nerve is most likely to have been impinged?

Further, using an exam blueprint (Figure 2) to create a template for your exam can help ensure you achieve an appropriate distribution of questions across competency levels and topics.
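If you do not have a blueprint template like Figure 2 at hand, even a rough grid of topics by Bloom’s level can serve the purpose. The sketch below is a toy example in Python; the topics, levels, and counts are invented for illustration and should be replaced with your own course’s outcomes.

```python
# A toy exam blueprint: planned question counts per topic and Bloom's level.
# Topics, levels, and counts are invented for illustration only.
blueprint = {
    "Upper limb": {"Remember": 4, "Understand": 3, "Apply": 2, "Analyze": 1},
    "Lower limb": {"Remember": 4, "Understand": 3, "Apply": 2, "Analyze": 1},
    "Thorax":     {"Remember": 3, "Understand": 2, "Apply": 2, "Analyze": 1},
}

# Totals per topic make gaps or over-weighted areas easy to spot.
for topic, levels in blueprint.items():
    print(f"{topic:12s} {sum(levels.values()):3d} questions  {levels}")

# Totals per cognitive level show the balance of lower- vs higher-order questions.
level_totals = {}
for levels in blueprint.values():
    for level, count in levels.items():
        level_totals[level] = level_totals.get(level, 0) + count
print("Per level:", level_totals, "| Exam total:", sum(level_totals.values()))
```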

Reviewing & Revising your Assessments
Upon completion of an MCQ assessment, it’s critical to review the response data and item analyses, which are typically calculated by most exam software.8
- Question difficulty is measured as the % of students who answer correctly:
  - <30% = Difficult, 30-70% = Ideal, >70% = Easy
- The discrimination index compares the performance of high-scoring and low-scoring students on each test item; it ranges from -1 to 1 (a worked sketch follows this list).
  - Formula: rank students by overall score, divide them into equal tertiles (U = upper, M = middle, L = lower), then calculate (# correct answers in U – # correct answers in L) × 2 / (number of students in U + L)
  - 0-0.19 = Poor, 0.2-0.29 = Acceptable, 0.3-0.39 = Good, ≥0.4 = Excellent discriminator between higher- and lower-achieving students
- Distractors are effective if chosen by >5% of students
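As a concrete illustration, here is a minimal sketch of how those two statistics can be computed from exported response data. The function name, data layout, and example numbers are all invented for illustration; most exam platforms report these values directly, so treat this as a way to check the arithmetic rather than a definitive implementation.

```python
# Minimal sketch of the item statistics above, assuming you can export each
# student's overall score and their 0/1 result on a single question.
# All names and the data layout are illustrative, not tied to any platform.

def item_analysis(total_scores, item_correct):
    """total_scores: each student's overall exam score.
    item_correct: 0/1 flags for the same students on one question."""
    n = len(total_scores)
    difficulty = sum(item_correct) / n * 100  # % of students answering correctly

    # Rank students by overall score and split into equal tertiles.
    ranked = sorted(range(n), key=lambda i: total_scores[i], reverse=True)
    third = n // 3
    upper, lower = ranked[:third], ranked[-third:]

    u_correct = sum(item_correct[i] for i in upper)
    l_correct = sum(item_correct[i] for i in lower)
    # Discrimination index: (U correct - L correct) * 2 / (size of U + L)
    discrimination = (u_correct - l_correct) * 2 / (len(upper) + len(lower))
    return difficulty, discrimination


# Example: 9 students; the question is answered correctly by 6 of them.
scores = [92, 88, 85, 74, 70, 66, 60, 55, 41]
answers = [1, 1, 1, 1, 1, 1, 0, 0, 0]
p, d = item_analysis(scores, answers)
print(f"Difficulty: {p:.0f}% correct, discrimination index: {d:.2f}")
# -> Difficulty: 67% correct, discrimination index: 1.00
```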
When reviewing your item analysis, look out for questions with high difficulty and a low or negative discrimination index, as these are questions you likely want to drop from the assessment. Revise any question with a discrimination index <0.2 by rephrasing the stem or adjusting a poorly performing distractor. On future assessments, aim for an appropriate distribution (10:80:10%) of difficult, ideal, and easy questions.
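To make those revision rules concrete, here is a small, hypothetical triage pass over per-item statistics. The tuple layout and the example items are invented; the thresholds simply mirror the ones described above.

```python
# Hypothetical triage pass over per-item statistics:
# (question id, % correct, discrimination index). Example data is invented.
items = [
    ("Q1", 67, 0.45),   # ideal difficulty, excellent discriminator -> keep
    ("Q2", 25, -0.10),  # very hard AND negative discrimination -> consider dropping
    ("Q3", 55, 0.12),   # ideal difficulty, poor discrimination -> revise distractors
    ("Q4", 88, 0.30),   # easy, but still discriminates -> keep
]

for qid, pct_correct, disc in items:
    if pct_correct < 30 and disc <= 0:
        action = "drop from this assessment"
    elif disc < 0.2:
        action = "revise (rephrase stem or adjust a weak distractor)"
    else:
        action = "keep"
    print(f"{qid}: {pct_correct}% correct, D={disc:+.2f} -> {action}")

# Check the overall difficulty mix against the suggested 10:80:10% split.
hard = sum(p < 30 for _, p, _ in items)
easy = sum(p > 70 for _, p, _ in items)
ideal = len(items) - hard - easy
print(f"Difficult/Ideal/Easy: {hard}/{ideal}/{easy} of {len(items)} questions")
```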
How-To MCQ: A Summary
- Confirm the standardized format required by your institution/licensing board
- Review learning outcomes to select a topic
- Determine how you’ll ask the question – Consider what information is provided and what Bloom’s level is assessed
- Create your stem + question
- Create the correct answer + plausible and homogenous distractors
- Revise based on test data (discrimination index + difficulty)
We base a lot on assessment scores, so it’s only fair that we put a lot into their creation and use. MCQ assessment is challenging, but with time, tenacity, and the right tools, it can be a great asset to your course.
References:
1. Kelly, F. J. Teachers’ marks: Their variability and standardization. Contributions to Education 66, (Teachers College, Columbia University, 1914).
2. Case, S. & Swanson, D. Constructing Written Test Questions for the Basic and Clinical Sciences. (2002).
3. Gillmor, S., Poggio, J. & Embretson, S. Effects of Reducing the Cognitive Load of Mathematics Test Items on Student Performance. Numeracy 8, (2015).
4. Case, S. M., Swanson, D. B. & Becker, D. F. Verbosity, window dressing & red herrings: do they make a better test item? Acad. Med. 71, S28–S30 (1996).
5. Case, S. M. The use of imprecise terms in examination questions: how frequent is frequently? Acad. Med. 69, S4–S6 (1994).
6. Forehand, M. Bloom’s Taxonomy. Emerg. Perspect. Learn. Teach. Technol. 12 (2012).
7. Thompson, A. R. & O’Loughlin, V. D. The Blooming Anatomy Tool (BAT): A Discipline-Specific Rubric for Utilizing Bloom’s Taxonomy in the Design and Evaluation of Assessments in the Anatomical Sciences. Anat. Sci. Educ. 8, 493–501 (2015).
8. Rao, C., Kishan Prasad, H., Sajitha, K., Permi, H. & Shetty, J. Item analysis of multiple choice questions: Assessing an assessment tool in medical students. Int. J. Educ. Psychol. Res. 2, 201–204 (2016).