Tuesday, May 22, 2007

challenge index

Every year, Washington Post reporter Jay Mathews publishes a list of top high schools in Newsweek called the Challenge Index. He calculates the number of AP/IB/Cambridge tests taken divided by the number of seniors in a school and ranks everyone by that number. Every year people claim that his rating system is too narrow, or too damaging, or even misleading, and every year he writes a column to defend himself.
here is this year's.

His main point is that a school is good if it challenges *all* of its students, not just its highest achievers, and that even if students take an AP class and do poorly on the exam, they have still benefited from being exposed to college-level work. It makes a lot of sense. However, I think he's being too dismissive of his critics. Here's a quote from this year's column:
Recently two education experts, Andrew Rotherham and Sara Mead of the Education Sector think tank in Washington, D.C., said it was wrong for Newsweek to label "best" schools with high dropout rates and low average test scores like many of the low-income schools on the list.

Offhand, it looks like they hit on the two big ways that a school could game the rating system. I'm sure Mathews would argue that his index isn't influential enough for people to bother gaming it. He's probably mostly right, but the issues they raise are still worth considering.

First, there's the issue of low average test scores. He's absolutely correct in pointing out that standardized test scores overwhelmingly correlate with parental income levels (and have a small additional correlation with race, I'd add), so that if you don't believe that richer (or whiter) kids are simply smarter, you're seeing the lifetime effects of a second-class education reflected in those scores. In other words, schools are doing the best they can to play catch-up, and we should reward them for trying.
I agree.
However, the reason Mathews counts AP, IB, and Cambridge tests, but not community college or honors classes, is that the first three have universal, independently evaluated standards while the last two do not. A big problem in high schools is "course title inflation," where "senior calculus" is really nothing more than first-year algebra, or honors English never has you writing a paper longer than two pages. AP, IB, and Cambridge are less prone to title inflation because there's an exam at the end, so if the whole class fails the exam, you know something fishy is going on.
Except when you don't. Because the kids are poor.
See the problem? I don't think you'd find a school deliberately enrolling kids in a faux AP European History class so that they sit through a three-hour exam, score their 1s, and improve the school's Challenge Index rating. I do think you can find schools willing to offer comprehensive, challenging AP Euro classes, until the teachers realize how underprepared their students are and start assigning two-page essays and diorama projects instead.

Second, there's the issue of dropout rates. This is really simple: if your formula is (# of tests)/(# of seniors), you can make yourself look good by raising the numerator OR by lowering the denominator. I don't know if this actually happens outside of Pump Up The Volume, but conceivably a school could push its problem students out, if not by expelling them, then by making school a place where they really don't want to be. Even if nothing sinister is going on, a school with a high dropout rate is certainly not helping all its students succeed. It's giving up on a lot of them. I think the formula should be reworked to take this into account. Assuming incoming class size stays static from year to year, you could calculate (# of tests)/(# of freshmen), so that you're measuring AP participation based on a class's original size. Or maybe continue to calculate (# of tests)/(# of seniors), but then multiply the result by (1 - dropout rate).
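To make the arithmetic concrete, here's a minimal sketch in Python with made-up numbers for a hypothetical school; none of the figures below come from Newsweek or from any real school.

# Made-up numbers for a hypothetical school -- not real data.
tests_taken  = 260   # AP/IB/Cambridge exams taken this year
seniors      = 200   # size of the graduating class
freshmen     = 320   # size of that same class when it entered
dropout_rate = 1 - seniors / freshmen            # 0.375 here

# Mathews's current Challenge Index: exams per senior.
original = tests_taken / seniors                 # 1.30

# Option 1: divide by the class's original size instead.
per_freshman = tests_taken / freshmen            # 0.8125

# Option 2: keep the original ratio, but discount it by the dropout rate.
discounted = (tests_taken / seniors) * (1 - dropout_rate)   # 0.8125

print(original, per_freshman, discounted)

One thing the sketch makes obvious: if the dropout rate is computed straight from those two headcounts, the two options give the same number, since (tests/seniors) x (seniors/freshmen) = tests/freshmen. They'd only differ if the dropout rate came from some other source, like a state-reported four-year rate.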

Mathews responds to criticism with a great movie analogy:
The adjective "best" always reflects different values. Your best movie may have won the most awards; mine may have sold the most tickets. In this case, I want to recognize those schools with the teachers who add the most value, even in inner-city schools where no one has yet found a way to reduce dropouts or raise test scores significantly.

Great, but not sufficient. You've just shown me that it's possible to have different, equally valid rating systems for the same product, but you've yet to prove that your system is one of the valid ones. I'll take that movie analogy and respond with a (somewhat tortured) television one:
What are the "best" TV shows? Are they the ones that win the most Emmys or the ones that have the most viewers? Maybe you think the latter is true, so you rank your shows by Nielsen ratings. But take a closer look at Nielsen ratings. People self-report what they watch and do not always mention when they change the channel, leave the room, or mute the commercials. You don't know the real quality of a viewer's watching experience. Furthermore, Nielsen doesn't count shows recorded on DVRs unless they're watched within 24 hours, so you're not getting the true number of viewers. What Nielsen tries to measure is worth measuring, but that doesn't mean it's doing a good job of capturing it. The same could be said of the Challenge Index.
