The Problem with Crowdsourcing
From YouTube to Yelp: What is the role of the Long Tail?
Writ large, the failing of communism is also what I see as one of the failings of crowdsourcing as a whole. (In the real world, research and work on crowdsourcing is what I used to do for a living, and I have to deal with a lot of evangelizing about how great it is that news reporting, etc., is in the hands of "the people" now, so it's possible that some of my cynicism there is unfairly bleeding over.) The promise of crowdsourcing news, say, or data, or information about movies or places, is that removing the barrier to entry makes the system self-sustaining in the long run and damps out outliers, incorrect information, deliberate sabotage, etc. You cast a wider net, and the net is self-regulating.

Of course, in practice that's not really what we see at all, for two reasons. The first is that, as it happens, quality of the system and number of participants aren't necessarily correlated. The classic formulation in business is the concept of the "network effect," the quintessential example of which is the telephone. A telephone network with two participants is less useful than one with three, which is less useful than one with three million. The more people you can reach out and touch, the better the network is.

But, of course, it's not really that simple. I would say, in fact, that the strength of a network depends on the degree to which it satisfies four criteria:

1. Each node must be independently strong. Well-managed peer-to-peer file sharing has this as its stock in trade: a network in which 20 people are uploading and downloading simultaneously is more useful than one with 200 participants in which only 10 are uploading and the other 190 are primarily "leeches," even though the second network has more participants. There are some instances in which something is made strong by its association with something else (epoxy, for instance). But note that I mean made strong, not made less weak; weak nodes in a system always add systemic drag.

2. The system must be at or below its threshold of saturation. Return to that P2P example: if everyone is sharing equally, you've satisfied point 1, but at some point the available bandwidth of the entire network is overloaded, and the system becomes so saturated that it stalls out and discourages additional participation. The yield from a network effect, plotted against saturation, is a curve: a network that satisfies point 1 grows more useful until it reaches an optimal point, the threshold, after which its utility declines sharply and each additional node adds overall drag to the system.

3. Each node should be relatively equal to other nodes. Having certain nodes that are much more critical to the network introduces instability and prevents the network from becoming self-regulating; when all nodes are (roughly) equal, stress on the network is distributed evenly. Put another way, a network is as weak as the degree by which its strongest node outpaces the others: if that node falters, it disproportionately compromises the entire network. This single point of failure is what made Napster less powerful than, say, BitTorrent. Likewise, nodes that are substantially weaker add drag to the system.

4. The nodes must connect with one another. A geodesic, for example, satisfies points 1 (each triangle is independently strong), 2 (geodesics can scale infinitely large) and 3 (it is composed of equal triangles), and its triangles bear on one another. On the other hand, the tessellation of floor tiles satisfies points 1, 2, and 3 but not 4: one floor tile does not really impact its neighbors. Geodesics can therefore be analyzed in terms of network effects; floors cannot. Book swaps, if the swap list is not so large as to be unwieldy and if everyone contributes equally, are strengthened by the network effect; brick-and-mortar stores, which do not improve for the nodes as more people shop there, are not.
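To put rough numbers on points 1 and 2, here is a toy sketch in Python. It is only an illustration under assumptions of my own: the function names, the linear growth up to the threshold, and the penalty of 2 per extra node past it are all made up for the example and are not measurements of any real P2P network.

```python
def supply_per_peer(uploaders: int, participants: int) -> float:
    """Upload capacity available to each participant, measured in
    units of one uploader's bandwidth (hypothetical metric)."""
    return uploaders / participants


def utility(nodes: int, threshold: int, penalty: float = 2.0) -> float:
    """Toy saturation curve (assumed shape): utility grows with each
    node up to the threshold, then falls as extra nodes add drag."""
    if nodes <= threshold:
        return float(nodes)
    # Past the threshold, each extra node costs `penalty` units (assumption).
    return max(0.0, threshold - penalty * (nodes - threshold))


# Point 1: the 20-peer network where everyone uploads, versus the
# 200-peer network where only 10 upload and 190 leech.
small_network = supply_per_peer(uploaders=20, participants=20)
large_network = supply_per_peer(uploaders=10, participants=200)

# Point 2: usefulness peaks at the threshold and declines beyond it.
below, at, beyond = utility(50, 100), utility(100, 100), utility(120, 100)
```

In this sketch the smaller network offers each participant twenty times the supply of the larger one, and the utility curve peaks at the threshold before dropping off, which is the shape described in point 2.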

So, for example, the New York Times is an example of a network: each reporter contributes positively, there are not so many reporters that the system becomes unwieldy, all of them are relatively equal and they are connected to one another via their editors and the overall journalistic mission of the paper. But suppose you don't want to have to deal with having a bunch of professional reporters, because they cost a lot. Then, you turn to the Internet. In a system governed only by the classic formulation of the network effect, where strength just goes up as n^2, this makes perfect sense: the greater the breadth of information you have, the more stories you can put out and the more topics you can cover.
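The n^2 figure comes from counting possible connections: the classic formulation, usually attributed to Metcalfe's law, values a network by the number of distinct pairs of participants, which grows roughly as the square of n. A minimal sketch of that count, using the telephone example from earlier (the function name is mine, purely for illustration):

```python
def pairwise_links(n: int) -> int:
    """Number of distinct pairs among n participants: n * (n - 1) / 2."""
    return n * (n - 1) // 2


# The telephone networks from earlier: two, three, and three million users.
two_phones = pairwise_links(2)
three_phones = pairwise_links(3)
three_million_phones = pairwise_links(3_000_000)
```

Note that this count treats every participant and every link as equally valuable, which is exactly the assumption the four criteria call into question.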

But this doesn't really hold. For one, not everyone is a decent reporter. Maybe they don't feel like doing the research. Maybe they don't know how. Maybe they don't know how to write an engaging lede. Maybe they're just re-reporting stories they've heard because they don't want to be scooped. For two, the nodes aren't equal — certain people become rumor clearing-houses (these are "the influentials" of marketing fame), so when it comes time to cover a story the system can't properly distribute its stress and relies too heavily on these supernodes.

Saturation is a problem, too, as anyone who follows a large number of people on Twitter or any other social network knows. Beyond a certain point, we simply can't handle all this information — it has to be cut down in some way. Maybe you have certain groups you follow for certain topics; maybe you only really watch the one or two people who have really important news.

One of the biggest problems, though, is point 4. The assumption is commonly made that by adding people to your net, you're increasing the chance of catching something interesting. This is only true, however, when there is some interconnectivity between those people — otherwise, you may have a net with millions of pieces of rope and huge, gaping holes.

My point, anyway, is that there's an empowerment narrative wrapped up in "Web 2.0" and the modern-day punk aesthetic it embodies, incorporates and extends: "we're casting off the shackles of Big [industry]! We're doing it ourselves! It's transformational that someone's home video can now get just as much exposure as a Michael Bay film! We're changing the system!" I don't entirely disagree with that; certainly, to borrow a term from Marxism, the means of production are shifting into the Long Tail — or whatever you want to call it: the proletariat. The masses. "The rest of us."

But this has created an ongoing fallacy that everything is explicable via the network effect, that every system improves as the number of participants grows. This is something that is only just now starting to sink in for Internet businesses dealing with the social graph: you don't care about the 500 people on your Facebook friends list, you care about the 20 who are relevant at that particular moment. It's something that content mills and SEO firms are having to come to terms with as search engines start to filter out the chaff that comes from spitting out millions of terrible websites instead of a handful of powerful, information-heavy ones.

And of course, in point of fact we see this even in those industries that were supposed to be radically changed by the "power of us." It can't be denied that indie labels have catapulted a lot of small bands to larger audiences than they would've otherwise found, but neither can the phenomena that are Justin Bieber or Lady Gaga. Open-source software has found some success, but dedicated software engineering firms continue to drive both the enterprise and the home. And while YouTube memes occasionally hit big, the box office is still a huge draw for average folks.

In short, to benefit from large networks, you have to fit those four criteria. Means of communication or transmission generally do — think about cell providers who offer unlimited minutes to other people in their network, for example; think about the way the Internet itself is structured, largely as a series of decentralized nodes. For that matter, traffic flows better when the system is completely and evenly utilized, detours are employed, routes are recalculated, etc.

Means of production or creativity, however, almost universally don't. Productive industries like software engineering learned a long time ago that you can't just throw people at a problem ("the mythical man-month"); at some point, the added bureaucratic drag of making them work together outweighs the notional benefit to the overall network. To one degree or another, all four points weigh against economies of scale in production and creativity.

For example, rather than being independently strong, many nodes are weak. In productive industries, this adds drag, as weak producers have to be compensated for elsewhere in the system (exacerbating point 3); if handled poorly, this reduces the total productivity of the network by reducing the incentive to produce or overcompensate (leveling point 3 back). This is the communist's dilemma.

It's also much easier for the network to become oversaturated. Unlike communicating or transferring something, where the goal is to get an object from point A to point B (and therefore giving more opportunities to do so increases the network's strength), productive and creative industries have to effect a change on that object. No two nodes can perform the same change in the same way, so the system is unstable: depending on what version of the changed object you want, your entry point to the network has to change. Without regulation, this takes the form of a poor signal-to-noise ratio.

Maybe the greatest flaw, though, is that the nodes don't actually intersect. Creatively speaking, there is the notion that they do: that being surrounded by other creative people is some sort of creative force multiplier. I don't know whether or not that's true, but in a practical sense it's ancillary to the network. Production and creativity are both really about refining: taking raw stuff (physical in one case, ideas and source material in the other) and boiling it down into something new.

There's a relatively clear production chain involved: adding more drilling rigs to an oil chain doesn't improve the efficiency of the refinery and, if the chain doesn't have the ability to absorb the bureaucratic overhead of those rigs, may actually decrease the overall productivity of the system (this is a real-world problem, and one reason why vertical integration is not always desirable). Similarly, being linked to more creative nodes around you doesn't directly bear on your own output. Being part of a larger reviewing network on Yelp doesn't necessarily mean that, in a vacuum, you write more reviews.

For these reasons, the network effect just doesn't apply. The empowerment narrative is a nice story, but in terms of actual returns it's just that — a self-regulated large-scale network of producers or creators becomes dysfunctional under its own weight and flaws. In the real world, by and large, this isn't really arguable, which is why everybody has to introduce some means of regulating the system. This is the second problem with crowdsourcing, as I mentioned above: it always comes back to curation. Somebody has to do the sorting, or the network is useless. Sometimes this is procedural (YouTube's "most-viewed" list, aggregate stars on Amazon or Newegg) and sometimes it is human (editorial review, "diggs") and sometimes it is a combination of the two ("your friends most recently [ ]").

For tech companies and anybody involved in trying to mediate the crowd (for example, websites that solicit user submissions or reviews), the trick is striking a careful balance. The democratization of technology now allows anyone to be a food critic, a journalist, a photographer, an author. On the other hand, we are now completely oversaturated with information. Right now, technology serves as an amplifier, but it must, inevitably, act as a filter too, or all we've done is buy ourselves a few more years until the data deluge drowns us completely.