Test Assembly

Applications of Uniform Test Assembly for the LSAT (RR 05-01)

In a large-scale, high-stakes testing program such as the Law School Admission Test (LSAT), it is necessary to maintain a large bank of test items to support the demand for a new test form at nearly every administration. To ensure that the item bank can support the test assembly requirements, the quality of the item bank must be monitored continuously to identify deficiencies and direct item development efforts. Recent research along these lines has included efforts to identify the properties of the most valuable item in the item pool, identify the test assembly constraint(s) that are the most difficult to meet, determine the distribution of test-taker ability that supports the highest degree of usability of the item pool, and develop statistical test assembly targets for multiple-stage testing.

Many of these practical testing problems have recently been addressed by the application of test sampling methods. Test sampling may be described as the sequential assembly of multiple test forms such that each question (item) or item set can be used an unlimited number of times. Test forms produced in this way can therefore overlap with each other, i.e., have items or item sets in common. The result is a sample of test forms from the finite set of all test forms available from the given item pool under a given set of test assembly constraints. To ensure that the inferences from such research are statistically correct, the sampling must be uniform; that is, each feasible test form should have an equal chance of being assembled. Thus, the test assembly problem plays a fundamental role in the test sampling method. In particular, our interest is in methods of uniform test assembly that provide uniform test sampling.
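To make the notion of uniform test sampling concrete, the following is a minimal Python sketch on a toy problem. It is not the algorithm from the report: it simply enumerates every feasible form and draws from that list with equal probability, which is only practical for very small pools. The item pool, the content areas, the difficulty values, and the two constraints (both content areas represented, mean difficulty within 0.5 of zero) are all hypothetical and chosen only for illustration.

```python
import itertools
import random
from collections import Counter

# Hypothetical toy pool: item id -> (content area, difficulty).
POOL = {
    "i1": ("A", -1.0),
    "i2": ("A", 0.0),
    "i3": ("A", 1.0),
    "i4": ("B", -0.5),
    "i5": ("B", 0.5),
    "i6": ("B", 1.5),
}
FORM_LENGTH = 3  # assumed number of items per form


def is_feasible(form):
    """Check the assumed constraints for one candidate form."""
    areas = {POOL[item][0] for item in form}
    mean_difficulty = sum(POOL[item][1] for item in form) / len(form)
    return areas == {"A", "B"} and abs(mean_difficulty) <= 0.5


def feasible_forms():
    """Enumerate the finite set of all forms that satisfy the constraints."""
    candidates = itertools.combinations(sorted(POOL), FORM_LENGTH)
    return [form for form in candidates if is_feasible(form)]


def sample_forms(n, seed=0):
    """Draw n forms with replacement; each feasible form is equally likely.

    Items may be reused across draws, so sampled forms can overlap.
    """
    rng = random.Random(seed)
    forms = feasible_forms()
    return [rng.choice(forms) for _ in range(n)]


def item_usage(sample):
    """Count how often each item appears across the sampled forms."""
    return Counter(item for form in sample for item in form)


if __name__ == "__main__":
    sample = sample_forms(1000)
    print(len(feasible_forms()), "feasible forms")
    print(item_usage(sample).most_common())
```

Because every feasible form has the same selection probability here, statistics computed on the sample, such as item usage frequencies, estimate the corresponding quantities over the full set of feasible forms without bias. For realistic pools full enumeration is infeasible, which is why a dedicated uniform assembly method is needed.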

Mixed integer programming is an approach commonly applied to the test assembly problem. This paper presents a proof that the mixed integer programming approach cannot guarantee uniformity of the sampling, and goes on to formulate a test assembly algorithm that ensures uniform sampling. An extension for assembling multiple nonoverlapping test forms or test sections based on item usage frequency is described, as are extensions to multiple-stage and computerized adaptive testing.

The methods illustrated in the paper provide researchers and practitioners at testing organizations with a simple and flexible framework for assembling tests and monitoring item pools in linear, multiple-stage, and computerized adaptive testing.

Additional reports in this collection

An Overview of Research on the Testlet Effect: Associated...

A mathematical model called item response theory is often applied to high-stakes tests to estimate test-taker ability level and to determine the characteristics of test questions (i.e., items). These tests frequently contain subsets of items (testlets) grouped around a common stimulus. Because of this grouping, items within one testlet tend to be more strongly correlated with each other than with items from other testlets, which can result in moderate to strong testlet effects.

Robust Text Similarity and Its Applications for the LSAT...

Text similarity measurement provides a rich source of information and is increasingly being used in the development of new educational and psychological applications. However, due to the high-stakes nature of educational and psychological testing, it is imperative that a text similarity measure be stable (or robust) to avoid uncertainty in the data. The present research was sparked by this requirement. First, multiple sources of uncertainty that may affect the computation of semantic similarity between two texts are enumerated.