I've been playing with a lot of temperature data lately: you can see a short Python analysis of temperatures over at my personal blog. At work (at the University of Minnesota) I'm working with some master's students on research into the finance of weather derivatives and catastrophe bonds, so I've been thinking a lot about temperature, El Niño, snow in Siberia, and so on. I've also been thinking about how to teach probability and statistics.

So, this post is about using the normal distribution and spreadsheets to deal with real data! November has been very warm, even though I picked the coldest days to go winter camping. How warm is our November here in Minneapolis-St. Paul?

Here is a Google spreadsheet with the minimum and maximum temperatures for every November 8 since 1970:

Google spreadsheet: max, min temps, with precipitation and graphs, Nov. 8

Included are histograms of max and min temperatures, and a scatter plot of max and min against each other. In the worksheet I just wrote, I ask students to create histograms for the maximum and minimum temperatures and to use those histograms to discuss what probability model might be appropriate. The only models I suggest are uniform and normal, so it's very simplistic. The second page asks students to find some probabilities assuming a normal model.
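If you'd rather build the histogram counts outside the spreadsheet, here is a minimal Python sketch. The temperatures in it are made-up placeholders, not the real Nov. 8 values, and the 5-degree bin width and the `bin_counts` name are just my choices for illustration.

```python
from collections import Counter

# Hypothetical placeholder values (degrees F), NOT the real Nov. 8 data.
max_temps = [38, 45, 52, 41, 60, 47, 55, 49, 36, 58]

def bin_counts(values, width=5):
    """Count how many values fall in each bin [start, start + width)."""
    return Counter((v // width) * width for v in values)

width = 5
counts = bin_counts(max_temps, width)

# Print a rough text histogram, one row per bin.
for start in sorted(counts):
    print(f"{start:>3}-{start + width - 1:<3} {'#' * counts[start]}")
```

A flat-looking set of rows would point toward a uniform model, and a hump in the middle toward a normal one, which is the same conversation the worksheet is trying to start.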

Simplified spreadsheet: max & min only, Nov. 8

Immediately above is a clean version of the spreadsheet without graphs or additional data, and here below is a version showing what I'm asking students to do to find the standard deviation:

Spreadsheet with standard deviation calculations

You'll need a z-table to find the probabilities, as well.
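To check the spreadsheet and z-table work, the same steps can be sketched in Python. This is only a sketch under assumptions: the temperatures are made-up placeholders rather than the real Nov. 8 data, the helper names are mine, and I've used the sample standard deviation (dividing by n - 1). If your worksheet divides by n, use `statistics.pstdev` instead.

```python
from statistics import NormalDist, mean, stdev

# Hypothetical placeholder values (degrees F), NOT the real Nov. 8 data.
max_temps = [38, 45, 52, 41, 60, 47, 55, 49, 36, 58]

def normal_model(values):
    """Normal model with the sample mean and sample standard deviation."""
    return NormalDist(mu=mean(values), sigma=stdev(values))

def z_score(values, x):
    """How many standard deviations x lies above the mean."""
    model = normal_model(values)
    return (x - model.mean) / model.stdev

def prob_above(values, threshold):
    """P(temperature > threshold) under the normal model."""
    return 1 - normal_model(values).cdf(threshold)

print("mean:", mean(max_temps))
print("standard deviation:", stdev(max_temps))
print("z-score for 60 F:", z_score(max_temps, 60))
print("P(max > 60 F):", prob_above(max_temps, 60))
```

`NormalDist.cdf` plays the role of the z-table here: looking up the z-score in a table and subtracting from 1 should give roughly the same probability, up to the table's rounding.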

I think this is a good conversation starter in a class in which you want to get students to work together and argue about whether a normal distribution is a good model, and to get students to talk about what this information means. For instance, in the discussion about global warming/climate change/climate instability, a lot of little anecdotal facts are thrown around: "Last winter was really cold!" or "But it's a really warm fall!" In isolation these facts really don't tell us much about the global patterns in temperature, and we need to make that clear to students on a scientific level. Snow in Iowa doesn't mean global warming is a hoax. At the same time, local changes are really important to the food we're able to grow and the activities we are able to enjoy. Local changes in snowfall and precipitation change our drinking water availability and when we can plant corn. Local changes in heat influence how much milk cows give.

I'm not satisfied with this worksheet yet: it does its tiny part but doesn't go very far or give much of the big picture. Ideal improvements:

- test if a normal distribution is a good model or not
- do some linear regression on max vs min temps
- see if there is a temporal shift in the distribution of maxes and mins

Reasons I haven't done these things:

- hypothesis testing and model validation are complicated topics, and I haven't thought much yet about how to teach them
- time
- I played around with the temporal shift and can "see" one, but again haven't thought about how to teach the mathematical or statistical analysis of this idea.

Have any of you taught hypothesis testing in a lower-level college class or talked with your high school or college students about how to select models? Anyone else do exploratory data analysis for fun?

Well, there's snow on the ground and the night is pretty cold. Time to eat leftover pumpkin custard and watch movies or something 🙂 Hope you all had a happy Thanksgiving!

*Feel trapped by boring fake word problems in your math textbook? Get intros to real-life issues in the natural world and see math at work.*