I wrote a piece for The New Yorker a few weeks ago about a group of people who have created a neural network that predicts (or tries to predict) the box office of movies from their scripts. (It's not up on my site yet, but will be soon).
The piece drew all kinds of interesting responses, a handful of which pointed out obvious imperfections in the system. Those criticisms were entirely accurate. But they were also, I think, in some way beside the point, because no decision rule or algorithm or prediction system is ever perfect. The test of these kinds of decision aids is simply whether--in most cases for most people--they improve the quality of decision-making. They can't be perfect. But they can be good.
In "Blink," for instance, I wrote about the use of a decision tree at Cook County Hospital in Chicago to help diagnose chest pain. Lee Goldman, the physician who devised the chest pain decision rule, says very clearly that he thinks that there are individual doctors here and there who can make better decisions without it. But nonetheless Goldman's work has saved lots and lots of lives and millions and millions of dollars because it improves the quality of the average decision.
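To make the idea concrete: a rule of this general shape is just a few branches. The sketch below is purely illustrative, with made-up findings and thresholds; it is not the actual Goldman criteria.

```python
# Illustrative sketch only -- NOT the actual Goldman criteria. A chest-pain
# decision rule of this general shape asks a few yes/no questions and maps
# the answers to a risk tier, so the average decision becomes more consistent.

def triage_chest_pain(st_changes: bool, unstable_angina: bool,
                      high_risk_factors: int) -> str:
    """Return a coarse risk tier from a handful of findings (hypothetical thresholds)."""
    if st_changes:
        return "high risk: coronary care unit"
    if unstable_angina or high_risk_factors >= 2:
        return "intermediate risk: monitored bed"
    return "low risk: observation"

print(triage_chest_pain(False, False, 1))  # -> low risk: observation
```

The point of a rule like this is not that any single branch is brilliant; it is that every patient gets asked the same questions in the same order.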
Is the average movie executive better off with a neural network for analyzing scripts than without it? My guess is yes. That's why I wrote the piece. I think that one of the most important changes we're going to see in lots of professions over the next few years is the emergence of tools that close the gap between the middle and the top--that allow the decision-maker who is merely competent to avoid his errors and reach the level of good.
I think the same perspective should be applied to the basketball algorithms I've been writing about. It is easy to point out the ways in which either Hollinger's system or Berri's system fails to completely reflect the reality of what happens on the basketball court. But of course they are imperfect: neither Berri nor Hollinger would ever claim otherwise. The issue is--are we better off using them to assist decision-making than we are making judgments about basketball players using conventional metrics alone? Here I think the answer is a resounding yes. (Keep in mind that I live in New York City and have had to watch Mr. Thomas bungle his way toward disaster. I would think that.)
And the reason that lots of smart people, like Berri and Hollinger and others, spend so much time arguing back and forth about different variations on these algorithms is that every little tweak raises the quality of decision-making in the middle part of the curve just a little bit higher. That's a pretty noble goal.
That said, here are the latest updates on the Hollinger-Berri back and forth. And remember. I don't think this is a question of one of them being wrong and the other right. They are both right. It's just that one of them may be a little more right than the other.
Here we go. First, Hollinger's response, courtesy of truehoop.com (an excellent site, by the way).
And then, Berri's response.
I think the other concern with decision-making formulae is a subtler one: that over time, we might become reliant on rules and afraid to deviate from them when creative responses are necessary. While an algorithm might make a reasonably intelligent, mostly thoughtful decision-maker better, on average, it might also make a less intelligent, close-minded decision-maker tragically limited in the rare cases when the script doesn't work. Our fear of being told, say, "I really wanted to give you that new kidney, but the formula said I shouldn't" is tremendous -- it's the same reason we don't rely on computers to negotiate peace treaties, though they'd probably be more dispassionate and effective.
Posted by: Gwydion Suilebhan | November 26, 2006 at 10:12 PM
In Technopoly, Neil Postman expresses Gwydion's concern above, except in the past tense. His thesis is that we have become so dependent upon technological tools that we are now subservient to them.
Medical diagnostics, for instance, quantize a patient into a matrix of numbers. These are very helpful numbers, of course, but they have a subtle, insidious effect. The role of the doctor used to be to listen, understand, and consider; the new role is almost as an accountant, to ensure that columns of numbers add up. The doctor listens more to the tools than to the patient himself. The doctor trusts the tools more than he trusts his own judgement. The process is dehumanizing.
The solution is not to abandon diagnostics and decision-making tools, of course. But the users of these tools must be trained, foremost, in the tools' limitations. The users must maintain skepticism and common sense, and always attempt to reconcile the tool's recommendation with their own judgement.
Posted by: Bret | November 26, 2006 at 11:45 PM
Aaron Schatz of Football Outsiders has a good saying for this concept: The best is the enemy of better.
Posted by: Kevin Pelton | November 27, 2006 at 12:31 AM
Malcolm, perhaps you're a tiny bit more right than me.
Posted by: Dane | November 27, 2006 at 12:51 AM
"The doctor trusts the tools more than he trusts his own judgement. The process is dehumanizing."
Oh give me a break. A problem with the US is that everyone expects every other person they deal with, their doctor, their teacher, their lawyer, their parole officer to be their best friend.
Look, if it's your dime, you're welcome to pay your doctor $150 an hour or whatever the rate is to be your friend. When it's the state paying, or insurance (i.e., everyone else), go find your own friends and let the doctor do his job.
The entire history of the GWB presidency is one big warning about the dangers of people being extremely confident in their own opinions. Every other person in the world did the equivalent of showing these guys statistics, science, and decision trees; and these guys told us, "No, trust us, we have plenty of experience and judgement in this area".
The issue is not how confident a doctor is in his opinion; the issue is how many times he is correct compared to the statistics, versus how many times he is wrong.
Posted by: Maynard Handley | November 27, 2006 at 12:57 AM
While it's tempting to criticize rule-based judgment as too "automatic" or "impersonal," the fact is that rules work because they make decisions more regular. Humans tend to overestimate the probability of low-probability events when they have serious consequences. This certainly applies in medical decisions, multi-million dollar contracts, hurricane evacuations, etc. As a result, relying on a rule improves our judgment precisely because it takes away our tendency to find and act on exceptions. When average doctors apply rules most of the time, but violate them when they feel it's necessary, the rules don't help them improve. Of course, who thinks they're an average doctor, or goes to one?
Posted by: David Rettinger | November 27, 2006 at 07:51 AM
Highly intelligent and trained individuals have serious trouble updating their belief systems with Bayes' rule.
Any technology which does this for us is an improvement.
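A minimal sketch of the update such a technology performs, using Bayes' rule with hypothetical test characteristics, shows the base-rate effect people routinely miss:

```python
def posterior(prior: float, sensitivity: float, specificity: float) -> float:
    """P(disease | positive test) via Bayes' rule."""
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive

# Hypothetical numbers: with a 1% prior, even a test with 90% sensitivity
# and 95% specificity leaves the posterior around 15% -- far lower than
# most people (trained experts included) intuitively expect.
print(round(posterior(0.01, 0.90, 0.95), 3))  # -> 0.154
```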
Posted by: michael webster | November 27, 2006 at 08:46 AM
This post starts to address the most important feature of decision aids - that they are aids. As I wrote in a comment to the previous entry, we must understand the characteristics of the decision aid. What is the population in which the aid applies? How accurate is the aid? What other issues could influence our decision?
I wrote about this post today in my blog. I will quote some of that post to make some relevant points.
===========
There are several key points in considering decision aids. First, we must interpret them given the patient's context. Using the streptococcal pharyngitis rule that I developed, the score has different implications depending on the pretest probability of streptococcal infection (also known as the prevalence). For example, if a patient has two children with known strep throat, her pretest probability of having strep is much greater than if a college student comes in on Monday to student health after screaming all afternoon Saturday at a football game.
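The arithmetic behind that point can be sketched with the standard odds-times-likelihood-ratio conversion; the likelihood ratio below is hypothetical, not the real score's:

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Convert pretest probability to post-test probability: odds * LR."""
    pretest_odds = pretest / (1 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)

# Hypothetical LR of 3.0 for some score: the same score means very different
# things for the mother of two strep-positive kids (high pretest probability)
# than for the hoarse football fan (low pretest probability).
print(round(post_test_probability(0.50, 3.0), 2))  # -> 0.75
print(round(post_test_probability(0.05, 3.0), 2))  # -> 0.14
```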
Next, we must understand what the clinical constraints are for the aid. My strep score addresses adult pharyngitis. The original study, and subsequent validation studies, had adult patients. Unless someone validates the aid for children, using it for probability estimation in children probably is a mistake.
Finally, we should not use the aid as our only source of information. One should still do a careful targeted history and physical exam. In the chest pain example, if the patient complains of severe tearing pain, and you hear an aortic insufficiency murmur, then the decision aid is irrelevant. In a sore throat patient, if the patient has marked unilateral swelling and tenderness, one must consider the possibility of peritonsillar abscess. The decision aid does not point to that possibility; we must reserve the aid for use after we exclude other issues.
The key point is that we should call them decision aids. They can aid us, but we should not use them except when the context is appropriate.
Posted by: db | November 27, 2006 at 10:16 AM
Thanks for that DB. I think that puts it beautifully.
Posted by: Malcolm Gladwell | November 27, 2006 at 10:57 AM
Seriously, Malcolm. Their (Hollinger, Berri, Rosenbaum) back-and-forth is getting a little over the top. All of the formulas could be improved (and will be, I presume).
I think they all need to spend 2 days meeting in person to discuss and gain understanding of their derivations and get past the arguing.
Would you volunteer to fly them all to your place for a 2-day basketball statistics guru meeting? Maybe you and Henry can moderate.
Posted by: Westy | November 27, 2006 at 03:04 PM
I read your story in The New Yorker and tried to apply some of the points to the current Will Ferrell movie Stranger than Fiction. Here is my analysis:
www.canada.com/nationalpost/artslife/popcorn/story.html?id=cf76643e-f33a-4a99-b1a5-753270300996
Today, Nov. 27, Stranger's box office total is US$32.5 million, just lower than Eternal Sunshine's US$34 million, but far below Groundhog Day's US$70 million. My prediction, using your research, was that Stranger will end up between US$44 million and US$66 million. Obviously, this is an imperfect study, but its accuracy is telling. I encourage doubters to take a new movie, find movies that are comparable (boxofficemojo.com is a good resource) and, after seeing the movie, see what aspects could have been tweaked to improve box office performance. I used a high (Bruce Almighty) and a low (Adaptation) to give me my parameters.
Posted by: Craig | November 27, 2006 at 09:21 PM
What's missing from this blog entry (and may or may not be missing from the article about the neural network) is a demonstrated understanding of the difference between a decision tree and a neural network.
A decision tree is a structure to facilitate decision-making. A neural network is a computer simulation of human memory.
With a decision tree, you can inform your decision-making and make conscious decisions about which variables matter and which don't. It also exposes your decision-making process to scrutiny and provides opportunities for optimisation.
A neural network does none of those things. The best-case performance for a neural network that is trained to respond to film scripts is "This movie reminded me of Jaws, and there's some similarity to E.T. as well." It can't tell you why it's been reminded of those things, nor can it open itself to the (minimal) level of scrutiny we can have of our own thinking.
Consider that the more-likely response from the neural network is "This movie reminded me of some movies that have made money in the past" and you get an idea of the limitations of neural net technology.
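One way to see the point: a toy nearest-neighbor predictor (standing in here for the trained network, with invented script features and grosses) returns an answer shaped entirely by past examples, and nothing in it can say why:

```python
# Toy stand-in for the "reminds me of past hits" behavior. The feature
# vectors and grosses below are invented for illustration.
past_films = {
    "Jaws":   ((0.9, 0.2, 0.7), 470.0),  # (script features, gross in $M)
    "E.T.":   ((0.6, 0.9, 0.8), 790.0),
    "Ishtar": ((0.3, 0.4, 0.1), 14.0),
}

def predict_gross(script_features):
    """Predict box office as the gross of the most similar past film."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    nearest = min(past_films, key=lambda t: dist(past_films[t][0], script_features))
    return nearest, past_films[nearest][1]

# The prediction is "this reminds me of Jaws" -- and that is the whole
# explanation the model can offer.
print(predict_gross((0.8, 0.3, 0.6)))
```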
Posted by: Nick Argall | November 28, 2006 at 12:57 AM
A couple comments here:
http://andrewchen.typepad.com/andrew_chens_blog/2006/11/autmated_models.html
I'd encourage everyone to check out this diagram - I think there's some relevance here:
http://img174.imageshack.us/img174/9195/expertperformancevo9.png
Posted by: Account Deleted | November 28, 2006 at 01:59 AM
that allow the decision-making who is merely competent to avoid his errors to be reach the level of good.
oh, malcolm. perhaps someone will devise a similar tool that will help you improve your proofreading skills.
i suppose once i read the piece i'll have a better understanding of how the network monitors public opinion in order to assess a movie's likely turnout. i have a feeling, though, that it would be similar to nick's idea--looking at movies that have been box-office hits and seeking similar aspects. how long could a system like that really be successful at pitching films to a single generation?
on a more selfish note, my personal movie preferences are different enough from most people's that i might dread a tool like this to be put into widespread use. i understand that most movies are already, and always will be, made based on the likelihood of a substantial return; but if production studios start selecting screenplays based on algorithms, won't that only increase the odds of films that may not appeal to the sensibilities of the vast majority but that are still quite good and deserve to be made, well, not getting made? the blockbusters are rarely the works that stand the test of time. a system like this would probably benefit the executives, it's true, but the rest of us might miss out on a lot of strong works of art.
Posted by: juniper | November 28, 2006 at 03:21 PM
I had the most intense dream about Malcolm last night. Startling and pleasantly overwhelming in its vividness and clarity. Malcolm, are you seeing anyone?
Posted by: jen | November 28, 2006 at 05:13 PM
"i have a feeling, though, that it would be similar to nick's idea--looking at movies that have been box-office hits and seeking similar aspects."
If it's a neural network, it can't do anything else. If it did something else, it wouldn't be a neural network anymore, it'd be some other thing.
(For some really good and thought-provoking information on neural networks, I recommend "Memory and Dreams" by George Christos. Very similar themes to _The Tipping Point_ and _Blink_, too.)
Posted by: Nick Argall | November 29, 2006 at 12:33 AM
No, no, no. You're cheating! Or hiding the salami! False positive rates matter hugely to reliability and utility. You need to factor in the cost of the mistakes you will make in reliance on the test over time; or, for one-time use, you weight that cost by the probability of the false positive and then decide whether you're feeling lucky enough to risk reliance. In your coach and movie exec scenarios you're obscuring the algorithm or fuzzy logic they have to use to decide whether and how much to rely on an imperfect indicator. They didn't even have that problem until you brought them the lousy indicator. We do have fallback algorithms we use for some such situations: using the "quick and dirty" high false positive/low false negative test to winnow the field and make it practical to rely ultimately on a slow and careful test. But if false positive and false negative rates are equal, what have you got? Not that Malcolm Gladwell doesn't know all this like the back of his hand. What have you done with Malcolm, Malcolm?!
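The point reduces to one line of arithmetic: weight each error rate by its cost and compare. The numbers below are hypothetical:

```python
def expected_cost(p_fp: float, cost_fp: float, p_fn: float, cost_fn: float) -> float:
    """Expected per-use cost of relying on an imperfect indicator."""
    return p_fp * cost_fp + p_fn * cost_fn

# Hypothetical numbers: a quick-and-dirty screen with cheap false positives
# can beat a careful test whose false negatives are catastrophic.
screen = expected_cost(p_fp=0.20, cost_fp=10, p_fn=0.01, cost_fn=1000)   # 12.0
careful = expected_cost(p_fp=0.02, cost_fp=10, p_fn=0.05, cost_fn=1000)  # 50.2
print(screen < careful)  # -> True
```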
Posted by: MT | November 29, 2006 at 01:55 PM
nick,
i have an easier time swallowing chuck palahniuk's theories on SIDS than i do christos's, but the rest of the book is an interesting read. my compulsive science-major side always squirms a little when i present it with theories based on information that's impossible to verify, but i'm doing my best to force it to at least hear the presentations through to the end.
i (mostly) understand neural networks, i was just trying not to make too many assumptions, since, as i said, i hadn't read the article. but now i have, and my feeling is still that if that's all this system can do, then movie lovers are probably better off with "some other thing." lovers of money and repetition, on the other hand, will likely erect monuments to the thing all over the world.
Posted by: juniper | November 29, 2006 at 03:34 PM
I doubt that any algorithm will improve the quality of film, and it could quite possibly make cinema even more formulaic. This is unlike diagnosing heart attacks, which is empirically well known and quantifiable.
Put another way: medical diagnosis works best at the center of the bell curve, as most of us come down with maladies that afflict many others. Just as standardized algorithms won't pick up on rare diseases, neural networks will not pick out the few successful films from the majority that lose money.
This is a horrible idea for films, where success tends to happen at the margins rather than in the center, where quantified decision making actually works. The film business doesn't make money on a steady stream of products, but on a few big hits that cover the losses of numerous misfires. Furthermore, what new trend will catapult a sleeper into a hit is unpredictable to the point of randomness. Hollywood has bankrupted almost any investor who came in trying to apply market principles from a non-entertainment industry to film.
The majority of big hits throughout Hollywood history (present day included) have been films that were original in some substantial way, and originality cannot be configured into an algorithm.
How will this neural network tease out satires from the earnest productions? Would it have predicted the success of Asian horror films? Graphic WWII combat films?
All this will do is give undue confidence to outsiders wanting to make 'investments' and allow some to believe they actually know anything.
Posted by: David Schanzle | November 30, 2006 at 12:39 AM
A recent article by Atul Gawande addresses the same issue of using various criteria for obstetrical purposes, such as the Apgar score. He also posits that the increase in C-sections (which he paints, not explicitly, as a kind of standardization of childbirth techniques) has helped because traditional births can involve complications that many doctors are insufficiently experienced with to handle properly. Per this thesis, C-sections help _on average_, even though (as noted here for other specialties) any given situation might be handled differently by someone with special skills.
This is, to put it mildly, not necessarily a widely accepted thesis.
Posted by: mike | December 01, 2006 at 01:56 PM
Juniper,
Yeah, I found myself staring in disbelief at Christos's theory on SIDS. I couldn't fault his reasoning, but it's much more of an 'avenue for possible research' than something I'd promote for pediatric care.
I used to be a huge fan of neural networks and genetic algorithms - then I studied them at university. The most horrifying moment came when I realised that if sentient AI does arise, it will have AI researchers for parents.
Posted by: Nick Argall | December 06, 2006 at 10:42 PM
A couple of profs (Lee Brooks and Geoff Norman) at a school I used to attend (McMaster University)
have done quite a bit of work on rule use in diagnosis among dermatologists. Here's an abstract of a recent book chapter of theirs:
http://www.ingentaconnect.com/content/klu/ahse/1997/00000002/00000002/00146630
Note the second key finding:
2) When experts err, these errors are as likely as novices to occur on typical presentations
Read more of Brooks and Norman for the authoritative view on this phenomenon. It seems like medical diagnosis is sufficiently complicated that experts might be devoting attention to "higher level" considerations and therefore not enough to more basic diagnostic considerations. In managing the more basic diagnostic considerations, the neural network can be a boon to the expert and the novice alike.
For perspective, if you saw the show "House" this week: there was a subplot involving a patient with a mysterious, apparent snake bite. House asked his students whether they should a) administer a dangerous anti-venom solution, or b) attempt to locate the rare snake.
In the show, a) and b) both assumed c), that the doctor had accurately diagnosed the wound in the first place. If the doc is going to spend his time worrying about "second-order" decisions (i.e. find the snake v. administer the anti-venom), then perhaps s/he would be well served by a computer to manage the more basic diagnoses - that is, make sure that the primary identification has been done correctly.
Posted by: Christopher Horn | December 07, 2006 at 01:31 PM
Interesting reading. There are clearly many facets of life that can be made better through neural networks or decision trees. I would argue that many of people's everyday decisions are essentially made this way, as the mind uses mental models of situations to shortcut a complete, unbiased analysis of a given situation (to paraphrase Charlie Munger).
What I find particularly fascinating is that people in general probably agree that rigorous mental models, decision trees, or neural networks make better decisions for groups of people than any one individual might (I would point to the music world and have people compare Pandora.com to your typical radio station and see what you think provides a better musical experience for you as an individual). But people are individualistic by nature, and thus want to be treated as individuals in most one-on-one situations (like a doctor's visit).
Given the prevalence of information on the Internet, this is exactly what is happening in many facets of life. Ask yourself whether you have ever looked up medical information on the Web. I suspect many have. Armed with a wealth of new data and a potential diagnosis, you head to the doctor to discuss your findings. The doctor, knowing little about how you feel or what your history is, begins to run down the diagnostic decision tree to quickly get to the same answer. You feel you are being shortchanged, but the doctor has done just what you have done: only the doctor has done it with years of experience and training behind his or her questions or diagnoses. The doc could be wrong and you could be right, but the point is that given vastly improved access to information and choice, people are far more likely to opt for personalized solutions than ones that are the best for the masses. Any thoughts?
Posted by: Harry DeMott | December 13, 2006 at 10:29 AM