Where are education’s Sabermetrics?


In the last couple of decades, baseball has gone through a statistical revolution because of a fairly simple question: “How good is that baseball player?” For the previous 8 or 10 decades, a limited number of metrics got used to evaluate ballplayers. Hitters, for example, were evaluated by how often they got a hit (batting average), how often their at-bats produced runs (RBI), and how many home runs they hit. Pitchers were evaluated by the number of batters they struck out, how many runs the other team scored that the pitcher “earned”, and how many games the pitcher started in which the pitcher’s team won (or thereabouts… the pitcher “win” is actually a pretty odd, and seemingly useless, stat).

What made those statistics appealing is that they were fairly easy to compile and communicate.

But there was a problem: traditional metrics didn’t tell enough of the story. Perhaps a pitcher earning a win had more to do with them pitching for a team that scored a lot of runs. Perhaps a hitter with a lot of home runs played in a home ballpark with shorter fences. Perhaps a hitter with a higher batting average rarely walked and hit into a lot of double plays. The traditional metrics became difficult to trust (especially to an owner deciding to commit tens of millions of dollars to a player). So, new statistical measures were developed that attempted to factor in all of the nuanced information that baseball can provide. (Read all about it…)

Education is dealing with a similar issue. We are trying to become data-driven. We want to use data to tell us what the education world is like. So, we started measuring things. Important things… like literacy.

Well, this floated across my Facebook wall…

Literacy was at 98%, huh?

One can fairly easily derive the intended meaning of this meme. It would seem like “ParentsForLiberty.org” would like us to think that in 1850, the world was a much more educated place because 98% (of… something…) were literate.

But let’s examine the statement “literacy was at 98 percent.” Talk about a loaded statement! 98 percent of students could decode a text? 98 percent of eligible voters could read the ballot? 98% of families owned a book? What does “literacy was at 98 percent” mean?

Maybe there was a test, and 98% of the kids passed it, which is good, except perhaps before Massachusetts made education compulsory, only kids who could read went to school.

Literacy is complicated. There are some parts of education that are easy to measure (for example, attendance, homework completion, correct multiple-choice answers, grade-point average). We’d like literacy to be easier, so we invest in tests like DIBELS that attempt to take a student’s literacy and boil it down to a set of ratings that are easy to communicate. The ACT does the same thing with college readiness. College readiness is complicated, too, but reading an ACT score isn’t complicated.

We’ve tried to quantify as much as we can. We’ve tried to quantify student performance, teacher performance, curriculum performance. We want to know how well they are working. We want to know where we are being successful and where we are letting our students down. That’s a good thing.

The problem is that there are some incredibly important parts of education that are very difficult to measure: the impact of an individual classroom-management strategy on student achievement, student engagement, scheduling classes to optimize achievement, or the role of extracurriculars. These are HUGE questions with answers that are not easily quantified. And most school districts lack the means (time, money, qualified personnel) to do the in-depth analysis necessary for a well-rounded look at a complicated issue like overall student achievement for each student each year. So we substitute easier-to-obtain metrics like DIBELS, ACT scores, and grade-point average.

And those have become our Pitcher Wins, RBI, and Home Run. They don’t tell us nearly enough of the story.

Where are our sabermetrics? Where can education go to see the stats that can combine to provide a more three-dimensional look at our system, our teachers, and our students? I understand why baseball got the first turn with the statisticians. There’s way more money in it. Maybe some of you stats folks who have decided that your financial future is secure wouldn’t mind e-mailing me. We’ll sit down. I’ll share with you the data I have (a ton) and we can develop some formulas that produce some metrics. Maybe you can tell me how well that curriculum program is working? How about what kind of environment a particular student performs best in? Which types of literacy patterns are strong predictors for future struggles in mathematics or science?

I look forward to hearing from you.

Snow Day Fever



Some kinda weather we’re having, isn’t it?

One of my colleagues posted on Facebook that today was the 8th snow day of 2014. We’ve only had 16 scheduled days of school! That is a perfect one-to-one ratio of days off to days in for the first month of 2014. As my friend Josh flatly put it: “That is not a small amount.”

Making matters more interesting is the fact that finals and semester break happen during the middle of January. So, this snow has done more than save my fuel costs. It has forced schedule updates, which has meant all sorts of other issues. 

Snow days have always caused frenzy, what with arrangements for child care, late phone calls for school employees with longer commutes, hourly employees scrambling to balance budgets while missing half their hours for the month, and the like. Social media being what it is, it seems like all of those issues are being intensely reflected on (or at least vented about) these days with snow day after snow day after snow day.

Various social media threads reflect different viewpoints, of course. Many folks (mostly teachers, homemakers, and 2nd-shift workers) are rejoicing with the unexpected time with their children and opportunities to catch up on chores.

Other folks are intensely asserting that this stretch is evidence of how weak we have become as a people. This seems to carry with it the memories that many have of having to get to school in conditions every bit as bad as, or worse than, these.

I will say that it seems like schools are more careful this year than in previous years. I remember driving to work in previous years with temps significantly below zero. I wonder what the windchill was on the day I took this photo?




That absolutely isn’t an implication that we should or shouldn’t be having the snow days we are having. Without question, the number of wrecks on the freeways, the significant winds, and bitter cold air are making my 40-mile commute to work completely undesirable. 

In general, I think that schools are making an emphatic statement that we are, first and foremost, concerned about the safety and well-being of students. Maybe schools are being too careful. Maybe. But, how many winters have we weathered Snowpocalypse, The Ice Storm and The Polar Vortex all before February 1st?

High School Calculus – Update after teaching my very first semester…

Well, I am officially through one semester of my first calculus class as a teacher. Before I get into any theories or needed revisions (and there definitely are some…), I want to simply share some observations and thoughts after some reflection:

#1. There’s no substitute for having an answer key made showing all work (multiple processes, if possible) for every single handout you give. The students seemed to embrace the idea that I was relearning the calculus, but they got edgy if they sensed that I didn’t know what I was doing or was unprepared.

#2. Borrowed and stolen resources are helpful as resources, but you need to make your curriculum your own. I found that even making small edits to the handouts I got from Sam (@samjshah) or James was enough to engage my mind creatively with them, which made me so much better able to embrace the holistic value of each and every activity.

#3. Advanced math students are never more than one frustratingly-difficult assignment away from behaving just like their struggling counterparts. All of the avoidance, procrastination, off-task, distracting behavior we are accustomed to from the strugglers will show up in any teenager if the math makes them feel overwhelmed and intimidated. (I will admit, “When will we ever use this?” is a question I never expected to get from a student who signed up for Calc voluntarily…)

#4. Advanced Placement might not be all it’s cracked up to be. My class is not AP. Surprisingly, the students said rather decisively that they were, at times, deterred from AP classes (for a variety of reasons). AP ties your hands a little bit in the schedule and topics. At the very beginning, I offered an AP prep schedule for anyone who thought they might want to take the AP exam, and not a single student took me up on it. And when you consider that AP classes come with questions of grade-weighting, and exams, and GPA and blah… blah… blah (the effects of which are probably overstated anyway), making this an honest, in-depth investigation of calculus for the sake of investigating calculus seems to create the most favorable environment for student risk-taking and teacher responsiveness.

#5. Logarithms and radicals: these two things never quite seem to settle in for students.


Now, I am keenly aware that each group is different, and should I be blessed with this opportunity next year, many of these observations could need some updating.


… but maybe not.

The 70-70 Trial

Education is a world with a whole lot of theories. Intuitive theories at that. I’m sure it’s like this in most professions. We are seeing an issue. We reason out what the problem seems to be. We determine what the solution to our supposed problem seems to be. And we implement.

The problem with that approach is that problems often have multiple causes. Solutions are often biased. Results have a tendency to be counterintuitive. For example, a recently published paper suggested that increasing homework might actually cause a decrease in independent-thinking skills. This probably isn’t a conclusive study, but recognize the idea that if students aren’t demonstrating independent-thinking skills, prescribing a problem-for-problem course of study for them to do on their own might not be the best solution.

This leads me to a trial that I am running in my classroom for a semester. I have four sections of geometry. I am going to leave two as a “control group” (very imprecise usage, I’ll admit) that will run exactly the same as they did first semester. The other two will run “The 70-70 Trial.” This is one of those theories that has gotten tossed about our district many times. It seems intuitive. It seems like it addresses a persistent problem.

The theory goes like this: If you go into a test knowing that 70% of the students have 70% or more on all the formative assessments leading up to the summative assessment, then we know that the students are reasonably prepared to do well on the test. If you give a formative assessment, and you hit the 70-70 line or better, you move on with your unit, business as usual. If you miss the 70-70 line, you pause on the unit until enough of the class is ready to go.
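The decision rule above is simple enough to sketch in a few lines of code. This is just my own illustration of the logic, not anything prescribed by the district; the function name and the input format (each student’s lowest percentage across the unit’s formative assessments) are my assumptions.

```python
# A minimal sketch of the 70-70 decision rule: move on only if at least
# 70% of the class scored 70% or better on ALL formative assessments.

def ready_to_move_on(scores, score_cutoff=70, class_cutoff=0.70):
    """Return True if enough of the class cleared the bar.

    scores: one number per student -- that student's LOWEST percentage
            across the unit's formative assessments, so clearing the
            cutoff means they hit 70%+ on every formative.
    """
    if not scores:
        return False
    passing = sum(1 for s in scores if s >= score_cutoff)
    return passing / len(scores) >= class_cutoff

# Example: 17 of 28 students (about 61%) at 70 or better.
class_scores = [72] * 17 + [55] * 11
print(ready_to_move_on(class_scores))  # False: below the 70-70 line, so reteach
```

That last example is exactly the awkward case discussed later: a majority of the class is ready, but the rule says pause.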

This seems reasonable to some, and ridiculous to others. Our staff meetings have seen some pretty intense discussion over it. Proponents lean on the logic: how can a group of students with high scores on formative assessments struggle on summative assessments? Opponents speak to the time crunch: when do you decide to move on? You can’t just keep stopping and stopping forever. You’ll never get through the material. Both seem like logical points…

But, as far as I can tell, no one has tried it to see what would happen. So, I figured that I had two classes that really struggled their way through first semester. It became very hard to energize and motivate these students because of how difficult they found the material. Perhaps shaking up the classroom management and unit design will add a bit of a spark. These two classes will be the focus of the 70-70 trial. I will use this blog as a way to record my observations and entertain suggestions from anyone looking to help this idea work.

This starts one week from today. I don’t know if it will work. I have my guesses as to what I think will happen, but I am going to keep those to myself. I absolutely want to see this work because if it does, that means my students were successful. My chief area of concern is what to do when 61% of the students score 70% or better (for example). By the rules of the trial, I can’t go on. I need a reteach day, but over half the class finds themselves ready to move on. What do I do to extend the learning for those students, while supporting the learning for those who need some reteaching and another crack at the formative assessment?

These are the kinds of things I will be looking for help with. Thank you for being patient and willing to walk this path with me. I will look forward to hearing whatever ideas you have.

Oh, Data’s driving the decision-making, all right…

Data… oh data… 

Schools, teachers, administrators, and school boards are being asked left and right to “use data” to drive the school improvement process. I put “use data” in quotes because that term… “use data”… is becoming about as commonplace and vague as the term “school improvement” itself.

Perhaps if we had a better understanding of what it meant to “use data”– personally, I’d like to add a few words in there. How about instead of “use data”, we focus on “using GOOD data WELL”?

Here’s what I’m saying: teachers all over Michigan are being told that they are being evaluated, in part, on how well they “use student achievement data to make instructional decisions” (or something like that…). So, that means… what, exactly?

Are we talking about class averages on summative assessments driving comparisons between two instructors teaching the same course?

Are we talking about teachers deciding to slow down and reteach because of low formative assessment data?

Are we talking about teachers making decisions about what to do with the last two weeks of the semester because their students’ grades are lower than they’d like and they need to do something to inflate them?

All of those examples are decisions made using student achievement data. Are they all effective uses? Are they all using good data well?

I’ll tell you when this hit me. I was preparing a written report summarizing the summative assessments at the end of a geometry unit (a requirement in our district) and I was describing what I thought was contributing to my students’ low unit test scores. In general, this particular test is usually a tough one for the students. It is the first time they’ve seen a math test completely devoid of number-crunching (gotta love the proving part of geometry!), and that leads to some fairly predictable avoidance behaviors. That is, students avoid practice AND ignore the feedback on their formative assessments, two things that are going to compound the frustration on what is already a frustrating unit. But, alas… I have yet to provide any data to support this conclusion. (Remember, in Michigan the evaluations are becoming more and more focused on how teachers use data to make decisions.)

So, I thought of something. I embrace the practice of allowing students to retake assessments. Now, the nice thing about this potential data set is that the retake process is VOLUNTARY! So, the frequency of retaken assessments can give us some indication of how engaged the students are in one non-mandatory achievement-support mechanism.

So, how many formative assessments were retaken during the unit leading into the test that demonstrated the low results? 1.1% (4 retakes out of 360 student-assessments).

Is that a meaningful data set? Well, the students performed a ton better (on average) on the Unit 1 test, when the retake rate was over 12%; not as well on Unit 2, with a retake rate around 6%; and quite poorly on the Unit 3 test, with a retake rate of 1.1%.

There appears to be a correlation (although with only three data pairs to consider, it isn’t really something worth talking about), but which came first? Is the material more difficult, so it drove down engagement in the retakes? Or did the reduced engagement in the retakes drive down achievement?
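For what it’s worth, that back-of-the-napkin correlation can be computed directly. The retake rates below are the ones from the post; the unit-test averages are hypothetical placeholders, since the post only says “a ton better,” “not as well,” and “quite poorly.” And as noted, three data pairs is far too few to conclude anything.

```python
# Pearson correlation between retake rate and unit-test average,
# computed straight from the definition (no libraries needed).

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

retake_rates = [12.0, 6.0, 1.1]     # Units 1-3, from the post
test_averages = [82.0, 74.0, 61.0]  # HYPOTHETICAL, for illustration only

print(round(pearson_r(retake_rates, test_averages), 3))
```

Of course, a strong positive r here says nothing about which direction the causation runs, which is exactly the question below.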

Quite frankly, I don’t know. But, I have data.

And I know this: My conclusion and subsequent instructional changes are going to depend quite heavily on how I answer the questions I just asked. If I feel like low engagement in the retakes is a cause of the low achievement, then my changes are going to be motivational and structural, with the goal of getting more students to use the feedback on the first-try formative assessments to prepare for a second-try.

If I feel like low engagement in the retakes is a symptom of my instruction being poor during the unit, then I will need to create/steal new activities to drive my instruction.

Oh, and none of that answers (perhaps) the first important question: Is formative assessment retake rate even a useful data set? (I don’t know the answer to this question either, by the way.)

There, I’ve used data to drive my decision-making… sort of.

Question: is this really the work we want our teachers doing? I can see the possibility for a variety of valid arguments from that question. If so, what guidance can we give them in deciding which data sets are effective? And how to use them?