Asking more of our data

"What Went Wrong" is an impressive piece of data journalism, but the reporters left valuable statistical inference on the table. The report rests on the idea that storm intensity alone did not predict which homes Hurricane Andrew destroyed; rather, home construction practices had a significant impact on the scale of destruction.

What the report misses, however, is an analytic assessment of how variable Hurricane Andrew's destruction was. In other words, how big a role did construction really play in this story? We are given extensive examples (maps, charts, anecdotes, eyewitness accounts), but we never get a number. And given the available data, that is a number that could have been produced: analysts could have built a baseline model that predicts destruction based only on storm intensity, and then compared its error to that of a model that includes other predictors like year built, builder, year of inspection, materials used, price, etc. This would put an empirical value on each of the factors discussed, estimating how much of the variance in home destruction is explained by each of these predictors.
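To make that comparison concrete, here is a minimal sketch of what I have in mind. Everything in it is an assumption for illustration: the data is synthetic (not the Herald's), the coefficients are arbitrary, year built stands in for the whole set of construction variables, and `fit_ols` and `r_squared` are small helpers I wrote for this example. It fits a wind-only baseline and a wind-plus-year-built model by ordinary least squares, then compares the share of variance each explains.

```python
import random

def fit_ols(X, y):
    """Ordinary least squares via the normal equations (X'X)b = X'y.
    Each row of X must already include a leading 1.0 for the intercept."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

def r_squared(X, y, beta):
    """Share of the variance in y explained by the fitted model."""
    mean_y = sum(y) / len(y)
    ss_res = sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
                 for row, yi in zip(X, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

# Synthetic homes: these numbers are made up, not taken from the report.
rng = random.Random(0)
homes = []
for _ in range(500):
    wind = rng.uniform(100, 170)      # hypothetical peak wind speed, mph
    years = rng.uniform(0, 42)        # hypothetical year built, minus 1950
    damage = 0.4 * wind + 1.0 * years + rng.gauss(0, 5)  # arbitrary damage score
    homes.append((wind, years, damage))

y = [h[2] for h in homes]
X_base = [[1.0, h[0]] for h in homes]        # storm intensity only
X_full = [[1.0, h[0], h[1]] for h in homes]  # storm intensity + year built

r2_base = r_squared(X_base, y, fit_ols(X_base, y))
r2_full = r_squared(X_full, y, fit_ols(X_full, y))

print(f"R^2, wind only:         {r2_base:.3f}")
print(f"R^2, wind + year built: {r2_full:.3f}")
print(f"Gain from year built:   {r2_full - r2_base:.3f}")
```

The gap between the two R² values is the kind of number I wish the story had reported. A real analysis would add the other predictors one at a time and would score the models on held-out homes rather than on the data used to fit them, since in-sample R² can only go up as variables are added.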

Readers are instead asked to come to their own conclusions about the impact of these construction flaws, and the reporters make their argument with images and extensive reports. For most stories, and perhaps for this story too, that is enough. But given that the Herald went to the trouble of collecting millions of data points, I wish they had grounded their argument in a quantitative analysis.
