My (imaginary) conversation with a baseball sabermetrician

Baseball gets most of the benefit of a very talented group of statisticians. They spend their time trying to figure out how much each action of a baseball player adds to his team’s chance of winning a game. Every possible action. In fact, ESPN just published an article on the value of a catcher who frames pitches well.

So, I’ve been wondering why education can’t get in on a little bit of that action. A while back, I appealed to sabermetricians, hoping to get some of that talent to play for our team.

Today, through a Twitter conversation with school data aficionado Andrew Cox (@acox), I got an idea of what the conversation might look like if a sabermetrician responded to my appeal.

We’ll call our statistician Timmy. I’ll play the part of Andrew.

Andrew: Thank you for calling. I have been looking forward to this conversation for a while now.

Timmy: It’s no problem. I’ll do whatever I can to help. But first, I need some information from you.

Andrew: Anything. Just name it.

Timmy: Well, I need to know: what is the goal of the education system?

Andrew: The goal?

Timmy: Yeah, the goal. You know, like baseball’s goal is to win games. The most successful team is the team that wins the most games.

Andrew: Yeah, well, that sort of depends on who you talk to. This article lays out 11 goals (some more explicit than others). President Obama says this. Thomas Jefferson says this. These folks say we should teach entrepreneurship.

Timmy: Well… okay. So, you’re saying that education as a whole doesn’t have an agreed-upon goal? Who decides what a school’s goal is?

Andrew: Well, school boards make a lot of decisions. Increasingly, it seems like legislators are getting a larger say.

Timmy: I see. Well, okay. So, are schools doing anything to measure how strong their students’ long-term retention is?

Andrew: Um… while I can’t speak for schools nationwide, I am not aware of any K-12 school districts that are doing anything to measure long-term retention of the content.

Timmy: Well, that makes it tricky to figure out the practices that contribute to that.

Andrew: I agree.

Timmy: Well, okay. Goals might be tough to define. I understand. It’s a diverse system. What about the means?

Andrew: The means?

Timmy: Yeah, like, in baseball, the means of reaching your goal of winning is to maximize the runs you score and minimize the runs your opponent scores. So, what are the means of reaching the goals of the educational system?

Andrew: Yeah… the means.

Timmy: Yep.

Andrew: Well, it kind of depends on which goal the school has.

Timmy: I’m sorry, I know I come from the baseball field, but isn’t learning the goal?

Andrew: Yes, absolutely.

Timmy: So, what practices maximize that?

Andrew: Well, these people say effective use of formative assessment and differentiated instruction. These folks have all kinds of advice. Some of that advice matches these folks’ advice.

Timmy: Okay, there are some things to work with there. Some of that’s teacher stuff. Some of that is student stuff. Some of that is parent stuff. Some of that is administrator stuff.

Andrew: Yup. That is pretty much true.

Timmy: How can you tell if those things are actually happening in a classroom or in a student’s home?

Andrew: That can be tricky business. Principals have a hard time getting into classrooms to support teachers. And you can’t really ask to do walkthroughs of students’ homes.

Timmy: So, what data do you have?

Andrew: We have TONS of demographic data. We have attendance and behavioral data. We have test scores.

Timmy: I’m sorry, do you really need to contract me so that I can tell you that there is value added to a kid’s experience by showing up to school and not getting in trouble?

Andrew: No… no, we knew that one.

Timmy: One thing that helps baseball statisticians is that every play is recorded from at least 3 camera angles. So, why don’t you just put cameras in each classroom to get a real sense of what teachers and students are doing?

Andrew: Some think that would be useful. Lawyers say that’s risky.

Timmy: Well, Andrew, if you don’t have a goal, you can’t really isolate the means, and if you can’t observe any of the practitioners in any real detail, then what do you expect to get done with your statistics?


Doggone it, Timmy. That’s a great question.

Perception and Reality – (Lean not unto thine own understanding…)

In Basic Economics, Thomas Sowell tells a story about a decision made by a New York politician who was attempting to address the homeless problem in New York City. The politician noticed that most of the people who were homeless were also not very wealthy, and concluded that rents were simply too high for these people to afford a place to live.

So, he decided to cap the rent prices… and the homeless problem got worse. How could this possibly be?

Well, according to Dr. Sowell, lowering rents made the apartments more affordable for those in need, but it did the same for everyone else. Suddenly cheaper rents meant fewer young people shared apartments. Also, people with several places they call home throughout the year might not have found it reasonable to pay a high rent to keep a NYC apartment they only stay in a few times a year. Lower rents made that seem more reasonable.

Evidence also suggested that more apartments were being condemned. Lower rents left landlords with fewer resources to maintain buildings, repair damage, pay for inspections, and so on.

While the decision made the apartments more affordable, it also made them scarcer. There was a disconnect between a decision-maker’s perception of a situation and the reality. That disconnect led to a decision that ended up being counterproductive.

I may have just done the same thing… maybe.



Sometimes things make so much sense. If we did this, it would HAVE to produce that. It makes so much sense. How could it possibly not work?

This perception is held by some in my community. It led me to try The 70-70 Trial, which I’ve been at for about 10 weeks now. The perception goes like this:

a. Formative assessments prepare students for summative assessments.

b. Students who struggle on formative assessments are more likely to struggle on summative assessments (and the inverse is also true).

With these two perceptions in mind, we assume that if we can ensure a student succeeds on each of the formative assessments (regardless of the timeline or the number of tries), we improve that student’s chances of success on the summative assessment.

The 70-70 trial did what it could to ensure that at least 70% of the class achieved 70% or higher proficiency on each formative assessment. (There were four.) This included in-class reteach sessions and offering second (and in some cases third) versions of each assessment. With all of those students making “C-” or better on each formative assessment, how could they possibly struggle on the unit test? That was the perception.
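If you want to check a class against the same line, the 70-70 criterion comes down to a single comparison. Here is a minimal Python sketch. It assumes scores are recorded as percentages (0 to 100) and counts a score of exactly 70 as meeting the line. `meets_70_70` is a hypothetical name, not part of any tool used in the trial.

```python
def meets_70_70(scores, share=0.70, cutoff=70.0):
    """Hypothetical helper: True if at least `share` of the class
    scored at or above `cutoff` percent on one formative assessment."""
    if not scores:
        return False  # an empty roster cannot meet the line
    at_or_above = sum(1 for s in scores if s >= cutoff)
    return at_or_above / len(scores) >= share


# Example: 7 of these 10 scores are 70 or higher, so the class just meets the line.
print(meets_70_70([70, 80, 90, 100, 60, 65, 75, 85, 95, 50]))  # True
```

Keeping `share` and `cutoff` as parameters makes it easy to try other lines, such as 80-70, without rewriting the check.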

50% of the students scored under 50% on the summative assessment. That was the reality.

Now, I am not an alarmist. I understand that one struggling class in one unit doesn’t discredit an entire educational theory. But it sure was perplexing. I’ve never seen a test where, after 8 weeks of instruction on a single unit (Unit 4 from Geometry), half of an entire class was unable to successfully complete even half of the unit test.

And when you consider that this was the class where I had put the most effort into preventing exactly that kind of struggle, well, it seems the overlap between my perception and reality wasn’t nearly as big as I thought. I just got a better view of it.

And I’m having a hard time making sense of what I’m seeing.

Conversation starter: Is failure an option?

Let’s talk about students failing classes, specifically in high school.

Let’s suppose a teacher spent the last ten years teaching high school math. Let’s suppose further that the same teacher hadn’t had a single student fail his or her class for that entire span. That record is going to be met with a fair amount of suspicion, whether or not the suspicion is fair.

Let’s suppose a different teacher spent the same ten years in a comparable district teaching high school math. Let’s suppose further that over that span, 3 out of every 4 students who started that teacher’s math class left with a failing grade at the end of the year. That record is going to be met with outrage (and, in all likelihood, the teacher would never have lasted 10 years like that).

So, ten years without a failing student is suspicious; it is potentially evidence of a rubber-stamp course. Ten years at a 75% failure rate is outrageous; it is potentially evidence of a course that is unnecessarily difficult for high school math.

So… what’s an acceptable number of failures?

Actually, let me ask this question a different way…

How many students should be failing? It seems a little strange to suggest that anyone should fail a high school course, but is there a number of failures that shows a class is healthy and functioning properly? Is that number zero? Is it 5%? 10%? 20%?

I’ll tell you what motivated this post: I am aware that some schools impose mandatory maximum failure rates on their teachers. It might be 12% or 7% or 2%. In these districts, each teacher needs to make sure that at least 88% or 93% or 98% of students earn passing grades in his or her class each semester.

The implication is that if more students than the accepted maximum fail to earn passing grades, it is a reflection on the inadequacy of the course, the instructor or the support structures. But, I’m not sure if that’s true. And besides that, how does a district or community decide the acceptable percentage of failures?

There is another side of this argument that says a school should be prepared to fail 100% of its students if they don’t meet the school’s requirements. On this view, it is the only way to motivate students to reach for the standard of proficiency the community has agreed upon. There would certainly never be an instance where a teacher actually had to fail 100% of his or her class, but if the students didn’t meet the requirements, the teacher would have the support of the school and the community to fail every student, even if that meant 100% of them.

We should probably figure this out, because failure numbers are starting to work their way into the mainstream. Consider this op-ed in the New York Times, which asserts that perhaps algebra should be reconsidered as a requirement for high school graduation, because math is a stumbling block nationwide and the resulting failures are leading to increased dropout rates. (This seems like a highly contentious point in itself, but that doesn’t mean it isn’t driving decision-making in some communities.)

Here are the issues in play:

How much of the responsibility for a high school student successfully earning a credit belongs to the school, and how much belongs to the student?

What are the costs of high standards? If we want to increase rigor, there is almost certainly a trade-off: more students will be unable or unwilling to go through the more rigorous process to earn the credit.

What are the implications for a community with class after class of students who know that teachers are pressured to pass a certain percentage of their students? Is this effect overstated?

Has this ever been studied? I’m not sure there has ever been a comprehensive, research-based statement on student failures and what the optimal percentage is. And if that’s the case, should we be making decisions based on “what seems too high” or “what seems too low”?

I am looking for some conversation on this topic. Let me know what you think. Links to posts or articles by people that you trust are appreciated, too.

The Reteaching Tightrope

So, the 70-70 trial has reached its first necessary reteach session. (I explain the 70-70 trial here.)

Only, here’s the thing: not every class that needs to explore a topic a second time is in the same situation. As part of my data collection for this trial, I am tracking the mean of each class’s top 10 scores and the mean of its bottom 10 scores on each individual assessment. I am doing this with two classes. One had a Top10-Bottom10 gap of 46.1 percentage points. The other class had a gap of 33.6 percentage points.

My reason for exploring this gap is that if a group is struggling to meet the 70-at-70 line, I want to know how the mastery of the students who understand the material compares to the mastery of the students who are struggling.
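Here is a minimal Python sketch of the Top10-Bottom10 gap as described above. It sorts one assessment’s scores, averages the ten highest and the ten lowest, and subtracts. It assumes percentage scores and at least ten students; `top_bottom_gap` is a hypothetical name. In a class of fewer than 20 students the two groups overlap, which shrinks the gap.

```python
from statistics import mean


def top_bottom_gap(scores, k=10):
    """Hypothetical helper: mean of the k highest scores minus the mean
    of the k lowest scores, in percentage points."""
    if len(scores) < k:
        raise ValueError(f"need at least {k} scores, got {len(scores)}")
    ranked = sorted(scores)
    bottom_mean = mean(ranked[:k])  # k lowest scores
    top_mean = mean(ranked[-k:])    # k highest scores
    return top_mean - bottom_mean


# Example: 20 evenly spaced scores from 0 to 95.
print(top_bottom_gap(list(range(0, 100, 5))))  # 50.0
```

Computing the gap separately for each assessment, including the retake, is what makes it possible to see whether a reteach narrowed it.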

If there’s a lot of mastery among the top performers and very little among the lowest performers, then the reteaching session becomes tricky, because a large chunk of the class falls into one of two categories: those who get it really well and those who don’t get it very well. Both groups are naturally resistant to reteaching, one because it is completely unnecessary and the other because it is completely uncomfortable.

All of which calls for a very delicate classroom management strategy for that hour, which I didn’t have today. I should have seen it coming. The successful students were not inspired to support the struggling students, and in fact a few of them blamed the struggling students for what they considered a meaningless class period. The struggling students seemed uncomfortable. I kept forcing them to do work they didn’t know how to do.

The class where the high achievers weren’t quite as high and the lower achievers weren’t as low took to the reteaching much, much better. On the second try, its gap narrowed to 28.8 percentage points, with the average of the top 10 scores over 90%. It seems like that class had a stronger sense that they all had something to gain from the extra learning time…

… as opposed to the other group where the majority felt like they had nothing to gain.

Where are education’s Sabermetrics?


In the last couple of decades, baseball has gone through a statistical revolution because of a fairly simple question: “How good is that baseball player?” For the previous 8 or 10 decades, only a limited number of metrics were used to evaluate ballplayers. Hitters, for example, were evaluated by how often they got a hit (batting average), how many runs they drove in (RBI), and how many home runs they hit. Pitchers were evaluated by the number of batters they struck out, how many runs the other team scored that the pitcher “earned”, and how many games the pitcher started in which the pitcher’s team won (or thereabouts… the pitcher “win” is actually a pretty odd (and seemingly useless) stat…).

What made those statistics appealing is that they were fairly easy to compile and communicate.

But there was a problem: traditional metrics didn’t tell enough of the story. Perhaps a pitcher earning a win had more to do with them pitching for a team that scored a lot of runs. Perhaps a hitter with a lot of home runs played in a home ballpark with shorter fences. Perhaps a hitter with a higher batting average rarely walked and hit into a lot of double plays. The traditional metrics became difficult to trust (especially to an owner deciding to commit tens of millions of dollars to a player). So, new statistical measures were developed that attempted to factor in all of the nuanced information that baseball can provide. (Read all about it…)

Education is dealing with a similar issue. We are trying to become data-driven. We want data to tell us what the education world is like. So, we started measuring things. Important things… like literacy.

Well, this floated across my Facebook wall…

Literacy was at 98%, huh?

One can fairly easily derive the intended meaning of this meme. It would seem like “” would like us to think that in 1850, the world was a much more educated place because 98% (of… something…) were literate.

But let’s examine the statement “literacy was at 98 percent.” Talk about a loaded statement! 98 percent of students could decode a text? 98 percent of eligible voters could read the ballot? 98% of families owned a book? What does “literacy was at 98 percent” mean?

Maybe there was a test and 98% of the kids passed it. That sounds good, except that perhaps, before Massachusetts made education compulsory, only kids who could already read went to school.

Literacy is complicated. Some parts of education are easy to measure (for example, attendance, homework completion, correct multiple-choice answers, grade-point average). We’d like literacy to be easier, so we invest in tests like DIBELS that attempt to boil a student’s literacy down to a set of ratings that are easy to communicate. The ACT does the same thing with college readiness. College readiness is complicated, too, but reading an ACT score isn’t.

We’ve tried to quantify as much as we can. We’ve tried to quantify student performance, teacher performance, curriculum performance. We want to know how well they are working. We want to know where we are being successful and where we are letting our students down. That’s a good thing.

The problem is that some incredibly important parts of education are very difficult to measure: for example, the impact of an individual classroom management strategy on student achievement, student engagement, how to schedule classes to optimize student achievement, or the role of extracurriculars. These are HUGE questions with answers that are not easily quantified. And most school districts lack the means (time, money, qualified personnel) to do the in-depth analysis needed for a well-rounded look at a complicated issue like overall student achievement for each student each year. So we substitute easier-to-obtain metrics like DIBELS, an ACT score, and grade-point average.

And those have become our pitcher wins, RBIs, and home runs. They don’t tell us nearly enough of the story.

Where’s our sabermetrics? Where can education go to see the stats that can combine to provide a more three-dimensional look at our system, our teachers, and our students? I understand why baseball got the first turn with the statisticians. There’s way more money in it. Maybe some of you stats folks who have decided that your financial future is secure wouldn’t mind e-mailing me. We’ll sit down. I’ll share with you the data I have (a ton) and we can develop some formulas that produce some metrics. Maybe you can tell me how well that curriculum program is working? How about what kind of environment a particular student performs best in? Which types of literacy patterns are strong predictors for future struggles in mathematics or science?

I look forward to hearing from you.