Boston Rambles

A Rambler Walks and Talks About the Hub of the Universe

The Problem with the Silver Standard

As Election Day approaches all manner of wild and crazy claims are made in a desperate hope to get one group or another to the polls or, as I discussed in previous entries, to entice some voters to stay home. One of the major methods of attack is to cite polls showing one or the other candidate winning, losing, catching up – whatever fits the narrative. Fortunately, there are people out there who have tried to reduce the impact of this madness by trying to put polls into an overall context from which a more reasonable prediction about the potential outcome of an election can be made. They are NOT Pollsters, they are individuals who aggregate polls, perform some statistical analysis on them, and try to come up with a number which can be easily digested: for example Clinton has a 65.0% chance of winning the election, or Trump has a 35.0% chance of winning the election. The number is easy to grasp, and is meant to be a summary of all the data out there, compressed and converted to a straightforward number that moves with the data as it comes in.

No person is more closely associated with this application of serious statistical analysis to election results than Nate Silver, the founder of the FiveThirtyEight website. He is widely regarded as something of a guru and, especially for those who need reassurance about their candidate’s chances or need a dose of reality about the likelihood their candidate will lose, he is a calming port in the madness of the media storm. He is widely seen as nonpartisan, and interested in the question of accuracy rather than in picking sides, except by a lunatic fringe who seem to think numbers are all the work of the Devil.

So it came as a bit of a shock to many readers of his site to see the odds of a Hillary Clinton victory fluctuate wildly in recent weeks, especially as many other websites, all of which can be seen in the Upshot, the New York Times’s useful aggregator of pundit and predictor websites, seemed to be relatively stable. This has set off a furor in the circles of people who care about these things: for instance

What’s Wrong with 538?

or Nate Silver Is Unskewing Polls-All Of Them- In Trump’s Direction

or Nate Silver rages at Huffington Post editor in 14-part tweetstorm

or Is Nate Silver Right?

or What the Latest Nate Silver Controversy Teaches Us About Big Data

or Poll Averagers Are Having the Wonk Version of a Knife Fight. Choose Your Side!

or Nate Silver’s Very Very Wrong Predictions About Donald Trump Are Terrifying

 

I have, in some of my recent entries about politics, also taken swipes at Nate Silver, for reasons I will discuss momentarily. However, the type of model 538 uses to make predictions has bothered me for years, going back to at least the 2008 election. Don’t get me wrong, I am extremely grateful to Silver for bringing quantitative analysis and a measure of rigor to polling, for it has taken a sea of innuendo and misinformation and pulled out the essential information needed to make at least some sense of the mountain of data tossed at us from all sides. However, I have long been interested in patterns of demography in both the United States and beyond (essentially since I was a child as my sister Cheryl, who is a devoted reader, can attest), and I have always preferred to come at the question of data from the point of view that we have actual hard information already in existence that can be extremely useful.

For example, I am a big fan of using Census data and past elections to create potential voting scenarios and then begin to look at how the specific campaign might affect the vote one way or the other. It has the benefit of using not opinions, but actual facts. Fact: Philadelphia voted for Obama over Romney by 85% to 14% in 2012. Fact: Philadelphia has slightly under 1.1 million registered voters. Fact: Philadelphia has about 1.5 million residents. And so on: college degrees, racial makeup, income per person, birth rates, death rates, citizenship figures and, most importantly, how these figures change over time. Polls, on the other hand, are not facts, they are a collection of opinions. They are the result of asking a small slice of the population who they might vote for at a time in the future, if they vote and if they are telling the truth and if the data are not skewed by the questions asked, by the political leaning of the pollster, by the weather, by the news of the day, by the conversation you overheard in a bar, and so on.

This contrasts with Silver’s approach in a fundamental way: his approach is to take the polling data as the principal data source, and adjust the data slightly to account for demographics etc. I am going to throw out a number that reflects his method: 94% based on polls and 6% based on demography, which I have seen on the website, but I don’t know if it is entirely accurate or not. It does not matter: for one thing I have read all the information about how they come to their conclusions and admit to being extremely hazy about the actual purpose of much of the adjustment that goes on, so I do not want to insist on a number and be accused of being incompetent or stupid for not getting it right. Also, I really do not want this entry to devolve into a detailed discussion of statistics because it will cause the argument to become shrouded in complicated pettifoggery that will turn off the reader. All I will say is that Silver’s approach is to primarily use polling data to make predictions while my preference is for using past data to make future predictions. Both have potential downsides, but I prefer my method because I know at the very least that if I am wrong, I can go back and compare the new data to the old data as they are both comparable data sets. Silver cannot really do this, as polls and pollsters in general are a slippery set of data and data sources.

Thus, I have often thought that, while I agree that polls are telling us something, they are not useful without a heavy input of actual facts. The main issue I have had with Silver’s analysis is that each new daily, hourly, or minute by minute update is not a reflection of the actual intentions of the voting public, but rather reflects the, for lack of a better word, ‘mood’ of the election at a given point. And his graphs of likely probability of a Clinton or Trump victory are quite volatile. As you can probably surmise from my stated interests above, my data is worthless if people are as fickle as his polling data seems to suggest they are. I just cannot believe that some large fraction of the voting public is as indecisive about the election as his data implies. This is why watching his graphs these past 6 months has been incredibly irritating to me.  If Trump is the favorite he is the favorite, if Clinton is the favorite she is the favorite, but the implication that it really is quite variable seems far fetched.

However, until the election actually happens, I can’t say whether I was justified in my sense that there is a flaw in Silver’s methodology, that perhaps the premise of his model needs adjustment. I have poked and prodded at various portions of his model and I have come up with a number of assumptions that I either cannot understand or disagree with. But, as long as there is no hard data to contradict what Silver is projecting, I have no way to make a cogent case for my methodology. It could be argued that Silver is predicting a Clinton victory just as most, if not all, prognosticators and pundits are, but one cannot help but feel that the sudden drop in the numbers relative to all the other prediction websites reflects either a volatility that does not exist, or worse, a convenient manipulation of the data in the event the election is tighter than predicted.

I would not have written anything about this until after the election, except that after I wrote my last piece a number of things happened that made me sit up and take notice. First of all, one or two articles popped up essentially taking Silver’s model to task, which delights me because now this is an argument that will be addressed by many mathematically oriented individuals and perhaps will result in improvements in election data analysis. However, Silver then went on a bender, sending off FU tweets and assorted ad hominem Twitter attacks (with a remarkable number of misspellings!) at the Huff Post writer and analyst who wrote the initial critique, and now it has quickly devolved into a war of words (to be fair, on Silver’s side only; the Huff Post seems to be taking a ‘let’s see what the election results are’ approach) that threatens to undermine the entire process by reducing it to a nerd brawl that others will make fun of, rather than advancing the interests of people who think numbers can be useful to understand phenomena such as elections.

The second and, in my mind, more important thing that happened was the publication of early voting data analysis by Jon Ralston, a longtime analyst of Nevada elections, in which he said that the data showed Trump was very likely to lose. Silver’s website has had Nevada, like the national prediction, oscillating wildly for months, from an 80% Clinton likelihood to a 50-something % Trump likelihood of victory. Unfortunately for Silver, as recently as October 24 the 538 prediction was at 75% Clinton, after which it plummeted until, as of this past weekend, it favored Trump by 50.5% to 49.5% over Clinton. This coincided almost exactly with Ralston’s report, so the timing was unfortunate as actual data seemed to suggest an imminent Clinton victory in Nevada at the same time that 538 had suddenly switched from likely Clinton to Trump favored.

Things would have been fine if Silver had then said something like “well, we have to adjust the model to take other factors into account etc etc.” Instead, a surrogate ‘whiz kid’ at 538 named Harry Enten wrote a piece on the 538 site about Nevada which, rather than acknowledge they might have screwed up their model, doubled down by essentially saying Ralston is probably wrong. Maybe he is wrong, but the article was touchy to say the least, and looked desperately like an attempt to both CYA and attack somebody simultaneously.

Again, inside baseball to most, and I agree it does not seem like much to get worked up about in the grand scheme of things. However, I got to thinking about the results and predictions and had a look at Nevada myself on the 538 website. I did not like what I saw.

Let me explain. Last night (Sunday) the odds had moved even further in Trump’s direction: 48.9% Clinton to 51.1% Trump. There are three sets of predictions in the 538 analysis, the details of which I will spare you; suffice to say that all three versions had Clinton losing to Trump by around the same percentages. I looked carefully at the section entitled ‘from a poll to a forecast’, the section where they try to explain how they get to their prediction, and the first thing I noticed was that the very first number at the top of the chart, Poll Average, had Clinton ahead by 44.5% to 43.5%. However, by the time the data went through the wringer, Clinton had 46.6% and Trump had 46.8% under the section entitled vote share if election were held today, a 0.2% lead for Trump after being behind by 1.0%, so a net change of 1.2% in the direction of Trump. A bit strange, but I was willing to accept that they had reasons for their interpretation.

What could account for that change? There are numerous factors involved in ‘adjusting’ the data, but the ONLY one with a significant impact on the result that was readily seen on the chart was what is called Trend Line Adjustment. Whatever it means, and I really do not want to get into the weeds in order to make my argument as clear as possible, Clinton’s trend line adjustment was -0.3%, while Trump’s was +1.7%, a difference of 2.0%, which more than accounts for the change in the polling numbers, regardless of what other adjustments were made to the data.
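For anyone who wants to check my arithmetic, here is a minimal sketch in Python using only the Sunday-night figures quoted above. The variable and function names are my own, and the last line rests on an assumption that I cannot confirm from the 538 chart: that the individual adjustments simply add up to the total shift.

```python
# Figures copied from the Sunday-night Nevada chart described above (percent).
poll_average = {"Clinton": 44.5, "Trump": 43.5}
vote_share_today = {"Clinton": 46.6, "Trump": 46.8}
trend_line_adjustment = {"Clinton": -0.3, "Trump": 1.7}


def clinton_margin(shares):
    """Clinton's figure minus Trump's figure, rounded to one decimal place."""
    return round(shares["Clinton"] - shares["Trump"], 1)


raw_margin = clinton_margin(poll_average)           # Clinton's lead in the raw average
adjusted_margin = clinton_margin(vote_share_today)  # Clinton's lead after all adjustments
total_shift = round(adjusted_margin - raw_margin, 1)

# Net effect of the trend line adjustment alone on Clinton's margin.
trend_shift = clinton_margin(trend_line_adjustment)

# ASSUMPTION: adjustments are additive, so whatever the trend line does not
# explain must come from the other adjustments combined.
other_adjustments = round(total_shift - trend_shift, 1)

print("raw margin (Clinton minus Trump):", raw_margin)
print("adjusted margin:", adjusted_margin)
print("total shift:", total_shift)
print("trend line shift:", trend_shift)
print("implied net of other adjustments:", other_adjustments)
```

Under that additive assumption, the trend line moves the margin 2.0 points toward Trump while everything else combined moves it about 0.8 points back toward Clinton, which is how a 2.0 point adjustment ends up as a 1.2 point net change.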

Fair enough, 538 thinks for some reason the polls are off by 2.0% and adjusts for that, giving Trump the lead when all the data is processed. That is within their model parameters and it is what the model says, so if they are wrong they are wrong but at least they are consistent.

However, I looked again at 6:30 am EST on Monday while drinking my coffee (I am obsessed as you might have noticed) and the odds had now changed to put Clinton ahead at 51.8% chance of winning to 48.2% chance for Trump. When I looked for updated polls to account for this, the only Nevada update was a poll by Remington listed at 10:50 p.m. Sunday night, which had Trump ahead by 1 point at 46% to Clinton 45%, adjusted to a tie by 538. So how did this put Clinton ahead? The poll average for Clinton changed to 44.8% to Trump’s 43.6%, probably because a previous Remington poll dropped down the importance order and it had Trump at 48% to Clinton’s 44%. Except, there was NO adjustment in that poll for Trump as there was in the later poll. It turns out Remington is associated with a group called Axiom Research, a conservative consulting group run by Ted Cruz’s former campaign manager. For some reason it has no letter grade attached to its name so I am unsure if that means that the poll is accepted at face value or that it is discounted for having no grade; regardless, it belongs to a type of polling I fear is done deliberately to muddy the waters as election day approaches, and yet Silver did not think there was any bias in the poll at the time although, interestingly, he did give a 1% Trump lean to last night’s poll. Also, I am not sure how, if you take 48% out of the average, and replace it with 46% in the average for Trump, his polling average goes UP to 43.6% from 43.5%.
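On that last puzzle, one possible explanation is that the Poll Average is a weighted average in which the older poll is not removed but merely given less weight while the new one is added. The sketch below is purely illustrative: every weight and the 43.0% figure standing in for all the other polls are invented by me, not taken from 538, and I do not know how their weights are actually set. Only the 48% and 46% Trump figures come from the two Remington polls.

```python
def weighted_average(polls):
    """Weighted mean of (value, weight) pairs."""
    total_weight = sum(weight for _, weight in polls)
    return sum(value * weight for value, weight in polls) / total_weight


# HYPOTHETICAL: all other Nevada polls lumped together as one entry.
other_polls = [(43.0, 10.0)]

# Before: the older Remington poll (Trump 48%) carries a modest weight (invented).
before = weighted_average(other_polls + [(48.0, 0.5)])

# After: the older poll is down-weighted (invented weight) and the newer
# Remington poll (Trump 46%) enters at a higher weight (also invented).
after = weighted_average(other_polls + [(48.0, 0.25), (46.0, 1.0)])

print("Trump average before:", round(before, 2))
print("Trump average after:", round(after, 2))
print("average rose:", after > before)
```

With these made-up weights Trump's average rises even though the newer poll shows him lower than the older one, because both polls sit above the overall average and the newcomer adds more pull than the old one loses. Whether anything like this is happening inside the 538 model I cannot tell from the chart, which is rather my point about transparency.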

  However, most importantly, I noticed that the data point called Trend Line Adjustment had been ‘readjusted’: Now Clinton was at -0.2% and Trump was +1.4%. This change alone was a change of net 0.4% in the adjustment formula and, unsurprisingly, the final vote share data had Clinton at 46.9% from her previous 46.6% while Trump had dropped from 46.8% to 46.6%.

A miracle!!! Clinton now leads in Nevada on the morning before the election, even though the only poll to come through in the intervening period shows Trump winning! I am not saying there is deliberate funny business going on, just that it is VERY INTERESTING that the trend line adjustment, which adjusted the data 2.0% in Trump’s favor, suddenly dropped to 1.6% overnight. Of course had I not written all this down on paper last night and this morning I would not have been able to go back and compare the data. But write it down I did and, although there has subsequently been new polling data from Nevada that favors Clinton, which has further pushed up her chances to 53.0% as I write, I am deeply suspicious of a model that changes an adjustment factor by 20% overnight for no clear reason; it is not as though I can look at Trend Line Adjustment and see that oh, they took Jon Ralston’s report into account and changed the parameters. It looks from the outside like sausage making.
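The overnight comparison can be checked the same way. This is a minimal sketch using only the numbers I wrote down Sunday night and Monday morning; the names are my own and nothing here reflects how 538 computes the adjustment internally.

```python
# Trend line adjustments as I recorded them (percentage points).
sunday_trend = {"Clinton": -0.3, "Trump": 1.7}
monday_trend = {"Clinton": -0.2, "Trump": 1.4}

# Final "vote share if election were held today" figures (percent).
sunday_share = {"Clinton": 46.6, "Trump": 46.8}
monday_share = {"Clinton": 46.9, "Trump": 46.6}


def net_toward_trump(figures):
    """Trump's figure minus Clinton's figure, rounded to one decimal place."""
    return round(figures["Trump"] - figures["Clinton"], 1)


sunday_net = net_toward_trump(sunday_trend)
monday_net = net_toward_trump(monday_trend)
trend_change = round(sunday_net - monday_net, 1)
relative_change_pct = round(100 * trend_change / sunday_net)

# Swing in the final margin over the same period, in Clinton's direction.
margin_swing = round(
    net_toward_trump(sunday_share) - net_toward_trump(monday_share), 1
)

print("net trend adjustment Sunday:", sunday_net)
print("net trend adjustment Monday:", monday_net)
print("overnight change:", trend_change, "points, or", relative_change_pct, "percent")
print("swing in final margin toward Clinton:", margin_swing)
```

By these figures the final margin swung 0.5 points toward Clinton overnight, and 0.4 of that is the change in the trend line adjustment alone.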

Further, the newer polls that have come in subsequently this morning have both been favorable to Clinton, but the Trend Line Adjustment has moved up to 1.5% for Trump and 0.1% for Clinton, which seems strange, first of all for Trump’s number to improve given that his polling numbers have dropped and secondly, interesting that only two tight polls can move Clinton 0.3% further in the adjustment. There is too much movement in factors that are too obscure to make any sense of and, consequently, I am beginning to think they are what, in my scientific days, were called fudge factors, used to massage the data to make it fit a desired outcome, not something that was said with admiration.

Perhaps tomorrow will prove Silver correct, that the cautiousness he has shown about making conclusions based on volatile data was warranted. I don’t care. I think that he should be absolutely transparent about the sausage-making and let the crowd decide if what he does to ‘adjust’ the data is legitimate or is pushing the finger down on the scale to have an outcome that best serves his purpose. If tomorrow he claims he predicted Nevada would be blue I will have serious doubts about the veracity of the model he presents and it will affirm my long held suspicion that his modelling merely serves to reflect the state of the mood of the election and not the actual outcome. If Sunday night Trump was favored in Nevada and Monday morning Clinton is favored, despite no new evidence in polling to support the change, it starts to look like somebody wants to be on the right side of history rather than that they want to create a useful mathematical model.

I have made predictions that may be right or may be wrong. My most recent projection of the likely electoral college results is quite favorable to Clinton, more so than almost any I have looked at. This is because I made the prediction in 2012 and want to see how it does in 2016. I could have waffled and tried to pull Ohio and Iowa over to the red column for instance, as polling seems to show. But I did not do that, because I am testing my hypothesis and need to know what works and what does not work. I do not care if I am ‘more right’ than Nate Silver or Sam Wang or whoever. I want to know if my approach to data analysis is useful and if so, how I can make it work better going forward. My opinion is that the polling analytical methodology used by Silver is too reliant on polling and needs to incorporate known data into the model, and to do so in a more transparent manner. What bothers me about this whole affair is that I fear I am seeing what appears to me to be evidence of tampering with the model in order to be right rather than to test a model.

I do not purport to be a soothsayer or to have my finger on the pulse of the electorate. I want to use the elections to test my hypothesis that demographic data is useful in helping us to figure out what is happening in the country and what might happen as the makeup of the inhabitants of the country changes. My goal in making predictions is to try to see how the country is changing, year by year, state by state, county by county, city by city, precinct by precinct. My entries on Ward 6 precinct 9 in South Boston and on Ward 14 precinct 3 in Grove Hall were deliberately written to compare and contrast quite different neighborhoods in Boston, both in the past and in the present, so that in the future somebody (maybe me) might look back and say “so that was what things were like at that point in time, how have they changed?” I like to think of these predictions as open-ended conversations about the constantly evolving country and how it evolves little by little. The quadrennial presidential election is merely a marker of what has changed over time.

To be sure I am interested in the results for personal reasons and for the well-being of both the country and the world (Please, not Trump, please, not Trump!), but I think of elections as large collections of data points that help us understand the direction the nation is headed and where it has come from. Mine is an optimistic world view predicated on the notion that we are a complex multicultural country that has always somehow managed to integrate all the manifold differences in, if not perfect harmony, at least in sufficient harmony that we can get along together in the collective purpose of improving our lives. Change is happening and has always happened, it is THE defining characteristic of the United States of America. It is what makes this the greatest country on Earth and it is what drives ALL of the entries in this project. Even the entries about old roads are more about how the roads came to be what they are rather than a simple history of what they were. Old roads are static, but the roadsides have changed a great deal and the dynamic of that transformation is the story of this project.

******

One final addendum: I have noticed as the day progresses that 538 averages are shooting upwards both nationally and state by state. One small piece of data about Florida caught my eye. A new Trafalgar poll (dated November 6) puts Trump up by 4 points at 50%-46%. This has been adjusted by 538 from Trump +4 to Trump +1, in order to account for the fact that Trafalgar is yet another conservative polling outfit whose polls reliably skew towards Trump. The previous Trafalgar poll (dated October 31) understandably dropped in importance. Interestingly, this poll also had Trump up by 4 points, at 49%-45%, but the adjustment was only from Trump +4 to Trump +3 at the time; apparently Trafalgar got 2 points more conservative in a week’s time. Again, a small and perhaps justifiable change. But why so suddenly and why so big a change, especially as it clearly helps Clinton by discounting the poll a couple more points and comes after many articles about the likelihood of Florida going Democratic based on the results of early voting? I repeat: I could understand if the data moved in a logical and consistent direction. However, the ‘adjustment’ factors seem to be ‘adjusted’ every time new ‘hard’ information comes in, but that is not clearly shown in any place where the reader might be able to say “okay I get it”. Nate Silver will surely never respond to my comments. However, I want them on record before the election, so that, in the event I get ad hominem attacks, I can at least say I did not make this stuff up after the election to pile on with everyone else. I might be wrong about the election but I am pretty sure I have a leg to stand on in my critique of the artificial and arbitrary ‘adjustment’ factors used to influence the direction of movement of the prediction matrix. If I do not, then tell me why not, in plain English, so all the world can see what is done in the sausage factory.
