Everything you’ve ever wanted to know: ACCESS test development
Understandably, educators often ask us how WIDA develops and updates ACCESS for ELLs: How does field testing work, and who participates? What would lead WIDA to remove a test item? We spoke with Fabiana MacMillan, director of test development at WIDA, and Tanya Bitterman, associate director for test development at the Center for Applied Linguistics, to answer some of these questions. Read on to learn more about the development of ACCESS.
How does WIDA know that it’s time to update an assessment?
Fabiana MacMillan: Updating assessments is a very important practice that helps ensure continued test reliability, validity and security. In the case of standards-based assessments, such as ACCESS, revising test content to reflect updates to the WIDA English Language Development Standards Framework contributes to test validity by ensuring the test accurately measures what it claims to measure. These revisions also help ensure that test content reflects content taught in the classroom and allow us to make improvements in accessibility and learner experience. Updating test content also contributes to test security by limiting item exposure.
There are several factors that influence the cadence of assessment updates—two of the most important are stakes and frequency of use. The higher the assessment stakes (i.e., the importance of the decisions made based on assessment scores), the higher the risk for test security. High-stakes assessments that are administered with relative frequency, such as annual tests like ACCESS, require more frequent updates.
Tanya Bitterman: Outside of standards updates, there is also an annual refresh of ACCESS for ELLs Online. Each year we change out a certain percentage of the items on the test (50% for Writing, 67% for Speaking, and roughly 30% for Listening and Reading). We consider several factors when deciding which items to replace, such as how long they have been on the test, data on how they are performing, and how informative they are.
What is the Center for Applied Linguistics (CAL) and how do they partner with WIDA in the test development process?
Tanya: CAL is a non-profit organization whose mission is to promote access, equity and mutual understanding for linguistically and culturally diverse people around the world. We serve as WIDA’s test development partner for ACCESS and other assessments. We have a team of about 35 staff with skills in test development, test production, rater training and psychometrics who work on WIDA assessments.
CAL partners with WIDA at all stages of content development for several of WIDA’s assessments, including ACCESS for ELLs, Kindergarten ACCESS for ELLs, WIDA Screener and WIDA Screener for Kindergarten. Our test development team is responsible for developing new test items, including facilitating reviews by educators and continuously refining new items until they are ready for large-scale field testing. The team also prepares the training materials used for scoring speaking and writing tasks. Our test production team creates the graphics that appear on the assessments, assembles the final test forms for printing, and produces the code packages that deliver online test items to students.
What are some of the high-level steps that go into developing or updating an assessment?
Fabiana: The test development process involves a meticulous progression from content creation and design, through community review and data analysis, to the finalization of the assessment, ensuring it meets rigorous standards and is suitable for the target student populations. Here’s an overview of those steps:
- CAL starts by creating or updating the item specifications. Each test item has a written specification for what it must test and how it should be created. WIDA and CAL work with educators to generate ideas for the test items based on those specifications, and then work with item writers who take educator input and turn it into a first draft.
- Another group of educators provides feedback to help ensure the items are aligned to the required standards. CAL’s test development and test production staff then further refine each item and create its graphics.
- Another group of educators reviews each item to check for content appropriateness, potential bias, and topics that may be too sensitive to present on the test. This step involves the combined efforts of CAL and WIDA.
- CAL’s test development staff apply revisions. The items then proceed to field testing.
- Writing domain items go through one extra step: small-scale “tryouts.” This is where educators administer the items to their students and send the responses back to CAL, allowing CAL and WIDA to fine-tune the tasks before large-scale field testing.
How is a test item created? Who comes up with the topics/ideas?
Fabiana: The initial topics or ideas for items are always generated by groups of educators recruited from across the consortium each year. Educators are presented with item specifications prepared for the upcoming test series. Each specification indicates the grade-level cluster, the language domain being assessed, the WIDA ELD standard being assessed (e.g., Language for Science), the proficiency level to target, and the Key Language Use to assess.
With those specifications in mind, educators recommend topic ideas related to state content standards. Those topic ideas are then reviewed by CAL and the WIDA Accessibility and Inclusion team to select topics before they are handed over to professional item writers to develop draft items. Draft items then go through an iterative development and refining process including multiple rounds of reviews by WIDA, CAL and educators, as described in the answer above.
What goes into writing a ‘good’ English language proficiency test item?
Tanya: A good test item starts with a good theme—one that is accessible to all students in the grade-level cluster and free of any potentially sensitive topics.
Test items need to be situated in a content area topic. They are not testing students’ knowledge of that content area; rather, they use content area topics to assess the language of that area. For example, a Listening test item assessing the Language for Language Arts needs to assess a student’s ability to interpret language that would be heard in a language arts classroom.
Fabiana: ACCESS is not a test of content knowledge. Instead, ACCESS uses academic content as the vehicle for assessing English language proficiency. As such, test items are developed so that background knowledge of the content used as context is not required for students to successfully answer the questions. In addition to what Tanya mentioned, ACCESS items must meet additional criteria including, but not limited to, the following:
- Natural-sounding language that is maximally comprehensible and readable
- Graphics that are depicted clearly and simply (e.g., with a clear intended focus and sufficient contrast between colors to ensure everything is clearly visible)
- People, buildings and settings that represent a diversity of subgroups (e.g., ethnicity, socioeconomic status, persons with disabilities)
In WIDA assessments, why do students have to read their Listening test answer options?
Fabiana: In grade 1 and in items targeting lower proficiency levels, all response options in Listening items are images or graphics. In items assessing higher proficiency levels in grades 2 and above, however, it is often not possible to effectively target the language expectations in the ELD Standards without text answer options. Whenever text options are used in the Listening domain, they are written using language that is two proficiency levels below the target level of the item.
Tanya: It’s often extremely difficult to convey possible answers clearly in a graphic. Much of the more complex language required in higher grades and at higher proficiency levels doesn’t lend itself to representation in a small graphic that is easily and clearly interpreted.
How do WIDA and CAL make sure that test items are accessible to all students, unbiased, and appropriate for each grade level?
Fabiana: One of the most important steps in the development of items is the Bias, Sensitivity and Content Review. The ACCESS for ELLs Bias, Sensitivity and Content Review is conducted every year in the spring for Listening, Reading and Speaking domains, and typically in the fall for the Writing domain. During this step, WIDA, CAL and educators thoroughly review the content of new test items for authenticity and grade-level appropriateness.
Reviewers suggest revisions where needed to ensure that test items are accessible to all students and are free of material that might favor any subgroup of students over another based on gender, race or ethnicity, disability, home language, religion, culture, region, or socioeconomic status.
How are test items field tested? Who participates in field testing?
Tanya: ACCESS Online test items are field tested every year, embedded in the test forms that students take during their regular testing experience. All students who take ACCESS Online see field test items, which allows WIDA and CAL to collect robust data on how the items work before deciding whether to use them for operational testing. CAL staff also read or listen to hundreds of Writing and Speaking field test responses each year, to better understand how students are interacting with the tasks and to identify responses that can be used to train raters on how to score the tasks.
Fabiana: As Tanya mentioned, we embed field testing into ACCESS Online each year, and for some other assessments, like WIDA Alternate ACCESS, we conduct a standalone field test. Field testing is used to calibrate new test items and establish how difficult they are relative to all the other items on the test. During field testing, we want to administer items to students who represent those who take the operational test, so field testing does not include non-ELs.
How does WIDA know that new test items performed as they should have during test development events like field testing?
Tanya: For multiple choice items, like those on the Listening and Reading tests, WIDA and CAL look at several different statistics following administration of field test items. We look at what percentage of students answered each item correctly, as well as data on how difficult the item was, and whether the students we would expect to get it correct did so. We also look at data indicating whether the item may have favored any group of students based on factors such as gender.
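As a rough illustration only, and not WIDA’s or CAL’s actual psychometric pipeline, the kinds of classical statistics described above, such as the percentage of students answering an item correctly and whether higher-scoring students tend to get it right, can be sketched in a few lines of Python. All function names and data below are invented for the example.

```python
# Illustrative sketch of classical item analysis; all data are fabricated.
# Real operational calibration uses far larger samples and formal models.

def item_stats(responses, item):
    """Return (p-value, point-biserial) for one scored item.

    responses: list of dicts mapping item id to 0/1 scored answers.
    """
    scores = [r[item] for r in responses]
    totals = [sum(r.values()) for r in responses]
    n = len(scores)
    p = sum(scores) / n  # proportion of students answering correctly
    # Point-biserial: correlation between the item score and the total test
    # score, indicating whether higher-ability students tend to answer right.
    mean_t = sum(totals) / n
    cov = sum((s - p) * (t - mean_t) for s, t in zip(scores, totals)) / n
    var_s = sum((s - p) ** 2 for s in scores) / n
    var_t = sum((t - mean_t) ** 2 for t in totals) / n
    rpb = cov / (var_s ** 0.5 * var_t ** 0.5) if var_s and var_t else 0.0
    return p, rpb

# Fabricated field-test data: four students, three multiple choice items.
responses = [
    {"q1": 1, "q2": 1, "q3": 1},
    {"q1": 1, "q2": 0, "q3": 1},
    {"q1": 0, "q2": 1, "q3": 0},
    {"q1": 0, "q2": 0, "q3": 0},
]
p, rpb = item_stats(responses, "q1")
print(f"q1: p-value={p:.2f}, point-biserial={rpb:.2f}")
```

An item with a very low point-biserial, where strong students miss it as often as weak ones, would be flagged for review under an analysis like this.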
For constructed response items, like those on the Speaking and Writing tests, WIDA and CAL evaluate both quantitative and qualitative data. We look at the distribution of scores that were awarded by raters, i.e., what percentage of students received each of the possible scores. We also look at statistics that show how difficult the item was. CAL staff read or listen to student responses to evaluate how the text, graphics and audio that make up a writing or speaking task are working and whether responses show the qualitative characteristics expected.
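In the same illustrative spirit, the score distribution for a constructed-response item, meaning the percentage of students receiving each possible score, can be tallied in a few lines. The rater scores below are fabricated and do not reflect any actual WIDA task.

```python
# Illustrative sketch of a rater score distribution; data are fabricated.
from collections import Counter

def score_distribution(scores):
    """Return the percentage of students receiving each possible score."""
    counts = Counter(scores)
    n = len(scores)
    return {score: counts[score] / n * 100 for score in sorted(counts)}

# Fabricated rater scores on a hypothetical 1-5 scale for one Writing task.
rater_scores = [2, 3, 3, 4, 1, 3, 2, 4, 3, 5]
dist = score_distribution(rater_scores)
for score, pct in dist.items():
    print(f"score {score}: {pct:.0f}% of students")
```

A distribution heavily bunched at one end of the scale could suggest the task is too easy or too hard for the targeted proficiency level.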
What would make you get rid of a test item?
Tanya: Test items may become outdated over time. Technology changes and typical school practices change, for example. Test items may no longer reflect the current understanding of language ability (e.g., updated standards no longer support assessing a certain Key Language Use as part of a certain content area standard). Current events may introduce a sensitivity issue into an existing test item, or an issue could come up that was not identified during the development process. Data from larger-scale administration may show that a test item isn’t working as well as was initially observed. Finally, exposure of an item’s content may compromise its security.
Check out our Building a WIDA Assessment page to learn how you can contribute to WIDA's test development!