Thursday, December 3, 2009

How to Lie with Statistics

Another interesting topic popped up on one of the boards I poke around on. The original poster put up a story about how a local study deemed cycling to be more dangerous than driving. It seemed to all of us to be a pretty questionable claim, so I launched into my rehearsed diatribe designed to alert all interested parties to the phenomenon known as "How to Lie with Statistics." For a play by play, read on!

I always wonder about these studies when I see them, as in my experience as an Urban Planner and Bike/Ped Coordinator, they seem to be nearly always manipulated to reveal a predetermined and flawed result.
1 - A study done here in the states determined that Louisville, Kentucky and Jacksonville, Florida (where I am) were the two least walkable communities in the continental US. Results were based on a simple ratio of total municipal land area to total miles of sidewalk. One problem though... both Louisville and Jacksonville are organized as consolidated governments with their respective counties, which gives them a much larger and more rural boundary than the urbanized city centers. In other words, Jacksonville and Louisville were punished in the walkability ratings for not having sidewalks on hundreds of miles of rural roads, while cities without consolidated city/county governments were not held accountable for the lack of sidewalks on roads in their surrounding rural county lands, and therefore fared much, much better for no reason other than bad assumptions and misguided methodology.
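
For anyone who wants to see the arithmetic, here is a quick Python sketch of the boundary problem. Every number in it is made up for illustration (I don't have the study's figures), and the function name is my own. It just shows how the same urban core scores far worse once rural county land is counted in the land area.

```python
def land_area_per_sidewalk_mile(land_area_sq_miles, sidewalk_miles):
    """The study's style of ratio: land area divided by sidewalk miles.
    A higher value reads as 'less walkable'."""
    return land_area_sq_miles / sidewalk_miles

# Hypothetical urban core: fairly dense sidewalk coverage.
core_area, core_sidewalks = 100.0, 900.0

# Hypothetical rural county land pulled in by a consolidated government:
# lots of area, almost no sidewalks.
rural_area, rural_sidewalks = 700.0, 50.0

# Scored on the urban core alone (how a non-consolidated city gets scored).
core_only = land_area_per_sidewalk_mile(core_area, core_sidewalks)

# Scored on the consolidated city/county boundary.
consolidated = land_area_per_sidewalk_mile(
    core_area + rural_area, core_sidewalks + rural_sidewalks
)

print(f"core only:    {core_only:.3f} sq mi per sidewalk mile")
print(f"consolidated: {consolidated:.3f} sq mi per sidewalk mile")
```

Same sidewalks, same downtown, much worse score. The only thing that changed is where the boundary line was drawn.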

2 - A local university completed a study of traffic/pedestrian accidents in the coastal community of Jacksonville Beach. All intersection-based pedestrian accidents with reported injuries or fatalities were included in the study. Intersections with the highest number of incidents of injury or death were targeted as needing infrastructure improvements. One problem though... There was no baseline data for the total number of pedestrians using the intersections to weigh the accident data against, so an intersection with, say, 20 incidents of injury or death was targeted as being the most dangerous, even though it was arguably the busiest intersection for pedestrian traffic in the town, with thousands of pedestrian trips a day, whereas a much less busy intersection with an injury/death incident count in the single digits was deemed less dangerous. There was no consideration of the total number of users of the facility in evaluating the odds of an accident; the raw number of accidents was the only factor considered in recommending the improvements. Luckily, the university only intended the study to be an undergraduate research experiment in GIS technology and mapping, but I still sent them, and their substandard project, packing!
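
To make the count-versus-rate point concrete, here is a small Python sketch. The intersection names, daily trip counts, and one-year study period are all assumptions of mine for illustration. The only figures loosely borrowed from the story are "20 incidents" at the busy intersection and a single-digit count at the quiet one.

```python
# Hypothetical intersections: (incidents, pedestrian trips per day).
intersections = {
    "busy_beachfront": (20, 3000),
    "quiet_side_street": (6, 200),
}

STUDY_DAYS = 365  # assumed study period, not from the actual study

def incidents_per_million_trips(incidents, trips_per_day, days=STUDY_DAYS):
    """Incident rate normalized by exposure (total pedestrian trips)."""
    total_trips = trips_per_day * days
    return incidents / total_trips * 1_000_000

# Ranking the way the study did it: raw incident counts only.
by_count = max(intersections, key=lambda name: intersections[name][0])

# Ranking with a baseline: incidents relative to how many people cross.
rates = {
    name: incidents_per_million_trips(incidents, trips)
    for name, (incidents, trips) in intersections.items()
}
by_rate = max(rates, key=rates.get)

print("most dangerous by raw count:", by_count)
print("most dangerous by rate:     ", by_rate)
```

With these made-up numbers the two rankings flip: the busy intersection has the most incidents, but any one pedestrian is at higher risk at the quiet one.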

3 - Final example, I promise! There was a recent walkability safety study done by a national fitness lobbying group here in the states that determined that the top 4 most dangerous cities in the US for walking to work are all in Florida... Orlando, Tampa, Miami, and Jacksonville. The indicator to me that something was wrong was that these are also the 4 biggest cities in the state, so right away I'm tipped off that there's probably another example of bad baseline data in the analysis. Well, it was worse than I'd imagined, and for reasons not at all related to size. In researching the methodology of the project, we discovered that the study collected ALL pedestrian fatality data and compared it to ONLY the sum total of persons who by survey claim to walk to work. ALL pedestrian fatality data... not just fatality data for persons killed while walking to work. One problem though... Here in Florida, where the humidity is almost always 80% or higher, and it rains 200 days a year, and the temperature, especially in Tampa and Miami, is 90 degrees F or more from April to November... very few people WALK to work. Plus, Florida is packed with retirees, who walk all over the place, are at a higher risk for pedestrian accidents because of their reduced mobility, and of course, as retirees... NEVER walk to work! LOTS of people in the state of Florida walk... but they walk for things that aren't schedule-dependent like getting to work on time. Rather, a higher than normal number of people in Florida walk for exercise, for recreation, and for other reasons where trip arrival and departure times are not a factor. So if you take a state where many people do walk, but not to get to work, and you divide the total number of deaths across ALL those walking trips by only a subset of the walking community, the work-based trips, then you're most certainly going to unfairly categorize Florida cities as being more dangerous on the whole than others in the US.
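
Here is the same mismatch as a Python sketch. The two cities and all of their numbers are invented for illustration; none of this is the study's data. Both hypothetical cities have the same number of walkers and the same number of pedestrian deaths, but in the Florida-style city far fewer of those walkers are commuters.

```python
# Hypothetical cities with identical overall walking risk.
cities = {
    "florida_style_city": {
        "all_ped_fatalities": 40,
        "commute_ped_fatalities": 5,
        "all_walkers": 200_000,
        "walk_commuters": 10_000,   # few people walk to work
    },
    "comparison_city": {
        "all_ped_fatalities": 40,
        "commute_ped_fatalities": 20,
        "all_walkers": 200_000,
        "walk_commuters": 40_000,   # many more walk to work
    },
}

def flawed_index(city):
    """What the study did: ALL fatalities over walk-to-work commuters only."""
    return city["all_ped_fatalities"] / city["walk_commuters"]

def matched_overall_rate(city):
    """All fatalities over all walkers: numerator and denominator match."""
    return city["all_ped_fatalities"] / city["all_walkers"]

def matched_commute_rate(city):
    """Commute fatalities over commuters: also a matched pair."""
    return city["commute_ped_fatalities"] / city["walk_commuters"]

for name, city in cities.items():
    print(
        f"{name}: flawed={flawed_index(city):.4f}, "
        f"overall={matched_overall_rate(city):.4f}, "
        f"commute={matched_commute_rate(city):.4f}"
    )
```

By either matched rate the two cities come out identical. The flawed index makes the Florida-style city look four times as dangerous, purely because its denominator leaves out most of the people doing the walking.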

Anyway... more long-winded than I'd like, but there you go. Long story short, whenever you see these types of safety or statistical evaluations, it's important to look very closely at the hows and whys of the study, because many times these studies and their 'results' are completely full of poo.

Exiting soap box, stage left...


