Standardized student achievement tests could help us identify and reward good teachers and get rid of bad teachers, right? This is what the U.S. Department of Education is proposing for the reauthorization of the Elementary and Secondary Education Act (which governs the flow of federal education funds), and it is being highlighted by Maine’s newly appointed acting commissioner of education, Tom Desjardin.

For example, my daughter’s reading progress stalled in second grade. On several assessment measures, including the DRAs and DIBELS, she made very limited progress that year, and on the NWEA, the standardized test used for diagnostics on entering third grade, she scored poorly. She just took the tests again in January and scored much higher. Her third-grade teacher helped her jump 30 percentile points in only four months! These are tremendous growth scores. Surely, her third-grade teacher has earned merit pay, and the school should fire the second-grade teacher. Why should there be such a difference?

This is the scenario that holds such weight with policymakers, parents and the media. The problem is that it’s a misuse of the data. Large-scale standardized tests are designed for use with large samples. The psychometricians who design the tests warn in the small print that no “high-stakes decisions” should be based solely on this or any other test. That means no single test should determine a teacher’s evaluation, whether a child graduates from high school, or whether a child qualifies for special education services or is identified as gifted and talented. Large-scale standardized student achievement tests are designed to hold large-scale policy and funding decisions accountable. To use them for accountability for individual students or teachers, with sample sizes of one or a class of 20, is malpractice.

Standardized tests have their uses and places. As an educational researcher, I use test results to identify trends and patterns that are helpful in understanding the implications of policies and programs. For example, we can identify a very strong correlation between high rates of poverty and low student achievement in Maine. As poverty increases, student achievement decreases, and the number of school days missed for lack of health care, food and transportation increases. Maine now ranks third-to-last, ahead of only Louisiana and Idaho, on measures of school funding equity because it funds its schools regressively, according to the national Education Week Quality Counts report released in January. In other words, Maine gives more school funding to communities that already have more assets. This is predicted to exacerbate the inequities of student achievement as measured by large-scale standardized tests.

Student achievement tests help us compare our state to other states and, now, even other countries with the disaggregation of national data on the PISA (Programme for International Student Assessment). These data show Maine eighth-graders were eighth in the world in science achievement in 2011. In addition, the tests allow us to make comparisons between groups so we can decide where to turn our attention as practitioners and policymakers. For example, in Maine, the share of boys rated proficient or better on the 11th-grade SAT writing tests trails that of girls by 12.1 percent. This raises red flags and shows us where to pay attention.

So here’s the punch line with the story of my daughter’s second- and third-grade teachers. It’s the same teacher. If the school had fired her second-grade teacher because of poor student growth scores, my daughter never would have had the opportunity to continue with this outstanding teacher, who has skillfully guided her through the early reading process.

The challenge for policymakers now is to ensure that standardized test scores are used properly. A growing swell of discontent and pushback is fueling the opt-out movement, the parent-driven strategy of refusing to have children take standardized tests, which is the parents’ right as legal guardians. Tying large portions (greater than 10 percent) of teacher evaluation scores to standardized student achievement scores, whether absolute scores or growth scores, will further fuel this distrust.

To help ensure public trust, and therefore participation in standardized tests, policymakers must be wise about the ethical use of test data and design policies that are psychometrically defensible.

Flynn Ross is associate professor of teacher education and coordinator of the Portland Extended Teacher Education Program at the University of Southern Maine. She is a member of the Maine Regional Network, part of the Scholars Strategy Network, which brings together scholars across the country to address public challenges and their policy implications. Members’ columns appear in the BDN every other week.