Saturday, May 5, 2012

How A Private Data Market Could Ruin Facebook

The growing interest in a market for personal data that shares profits with the individuals who own the data could change the business landscape for companies like Facebook

Facebook's imminent IPO raises an interesting issue for many of its users. The company's value is based on its ability to exploit the online behaviours and interests of those users. 

To justify its sky-high valuation, Facebook will have to increase its profit per user at rates that seem unlikely, even by the most generous predictions. Last year, we looked at just how unlikely this is. 

The issue that concerns many Facebook users is this. The company is set to profit from selling user data, but the users whose data is being traded do not get paid at all. That seems unfair.

Today, Bernardo Huberman and Christina Aperjis at HP Labs in Palo Alto say there is an alternative: why not pay individuals for their data? TR looked at this idea earlier this week.

Setting up a market for private data won't be easy. Chief among the problems is that buyers will want unbiased samples--selections chosen at random from a certain subgroup of individuals. That's crucial for many kinds of statistical tests.

However, individuals will have different ideas about the value of their data. For example, one person might be willing to accept a few cents for their data while another might want several dollars.

If buyers choose only the cheapest data, the sample will be biased in favour of those who price their data cheaply. And if buyers pay everyone the highest price, they will be overpaying. 

So how to get an unbiased sample without overpaying? 

Huberman and Aperjis have an interesting and straightforward solution. Their idea is that a middle man, such as Facebook or a healthcare provider, asks everyone in the database how much they want for their data. The middle man then chooses an unbiased sample and works out how much these individuals want in total, adding a service fee. 

The buyer pays this price without knowing the breakdown of how much each individual will receive. The middle man then pays each individual what he or she asked, keeping the fee for the service provided. 

The clever bit is in how the middle man structures the payment to individuals. The trick here is to give each individual a choice. Something like this:

Option A: With probability 0.2, a buyer will get access to your data and you will receive a payment of $10. Otherwise, you’ll receive no payment.
Option B: With probability 0.2, a buyer will get access to your data. You’ll receive a payment of $1 irrespective of whether or not a buyer gets access.

So each time a selection of data is sold, individuals can choose to receive the higher amount if their data is selected or the lower amount whether or not it is selected.

The choice that individuals make will depend on their attitude to risk, say Huberman and Aperjis. Risk-averse individuals are more likely to choose the second option, even though its expected payout is lower ($1 rather than 0.2 x $10 = $2 in the example above), so there will always be a mix of people expecting high and low prices. 

The result is that the buyer gets an unbiased sample but doesn't have to pay the highest price to all individuals.
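Here's what one round of such a market might look like in code--a minimal Python sketch, not the paper's mechanism verbatim. The asks, the 0.2 selection probability, the $1 guaranteed payment and the service fee are illustrative assumptions, and the pricing rule is deliberately simplified.

```python
import random

def run_market(individuals, sample_size, selection_prob=0.2, fee_rate=0.1):
    """One simplified round of the middle-man mechanism described above."""
    # The middle man draws the sample uniformly at random from everyone in
    # the database, regardless of how much each person asked for -- that is
    # what keeps the sample unbiased.
    sample = random.sample(individuals, sample_size)

    payouts = {}
    for person in sample:
        if person["option"] == "A":
            # Option A: the full ask, but only if the buyer actually gets the data.
            payouts[person["id"]] = person["ask"] if random.random() < selection_prob else 0.0
        else:
            # Option B: a smaller payment, guaranteed either way.
            payouts[person["id"]] = person["guaranteed"]

    # The buyer pays the total owed to the sampled individuals plus the
    # middle man's service fee, without seeing the individual breakdown.
    buyer_price = sum(payouts.values()) * (1 + fee_rate)
    return buyer_price, payouts

# Example: 1,000 people asking anywhere from a few cents to a few dollars.
people = [{"id": i,
           "ask": round(random.uniform(0.05, 5.0), 2),
           "guaranteed": 1.0,
           "option": random.choice(["A", "B"])}
          for i in range(1000)]

price, payouts = run_market(people, sample_size=100)
print(f"Buyer pays ${price:.2f} for an unbiased sample of {len(payouts)} records")
```

Because the sample is drawn uniformly at random, people who ask for a lot are just as likely to be included as those who ask for a little; the option structure is what spares the buyer from paying everyone the top price.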

That's an interesting model which solves some of the problems that other data markets suffer from.

But not all of them. One problem is that individuals will quickly realise how the market works and band together to demand ever-increasing returns.  

Another problem is that the idea fails if a significant fraction of individuals choose to opt out altogether, because the samples will then be biased towards those willing to sell their data. Huberman and Aperjis say this can be prevented by offering a high enough base price. Perhaps.

Such a market has an obvious downside for companies like Facebook, which exploit individuals' private data for profit. If they have to share their profits with the owners of the data, there is less left for themselves.

And since Facebook will struggle to achieve the kind of profits per user it needs to justify its valuation, there is clearly trouble afoot.

Of course, Facebook may decide on an obvious way out of this conundrum--to not pay individuals for their data.

But that creates an interesting gap in the market for a social network that does pay a fair share to its users (perhaps using a different model to Huberman and Aperjis'). 

Is it possible that such a company could take a significant fraction of the market? You betcha!

Either way, Facebook loses out--it's only a question of when.  

This kind of thinking must eventually filter through to the people who intend to buy and sell Facebook shares. 

For the moment, however, the thinking is dominated by the greater fool theory of economics--buyers knowingly overpay on the basis that some other fool will pay even more. And there's only one outcome in that game.

Ref: arxiv.org/abs/1205.0030: A Market for Unbiased Private Data: Paying Individuals According to their Privacy Attitudes



Twitter Cannot Predict Elections Either

Claims that Twitter can predict the outcome of elections are riddled with flaws, according to a new analysis of research in this area

It wasn't so long ago that researchers were queuing up to explain Twitter's extraordinary ability to predict the future.  

Tweets, we were told, reflect the sentiments of the people who send them. So it stands to reason that they should hold important clues about the things people intend to do, like buying or selling shares, voting in elections and even about paying to see a movie. 

Indeed, various researchers reported that social media can reliably predict the stock market, the results of elections and even box office revenues.

But in recent months the mood has begun to change. Just a few weeks ago, we discussed new evidence indicating that this kind of social media is not so good at predicting box office revenues after all. Twitter's predictive crown is clearly slipping. 

Today, Daniel Gayo-Avello, at the University of Oviedo in Spain, knocks the crown off altogether, at least as far as elections are concerned. His unequivocal conclusion: “No, you cannot predict elections with Twitter.”

Gayo-Avello backs up this statement by reviewing the work of researchers who claim to have seen Twitter's predictive power. These claims are riddled with flaws, he says.

For example, the work in this area assumes that all tweets are trustworthy and yet political statements are littered with rumours, propaganda and humour. 

Neither does the research take demographics into account. Tweeters are overwhelmingly likely to be younger and this, of course, will bias any results. "Social media is not a representative and unbiased sample of the voting population," he says.

Then there is the problem of self-selection. The people who make political remarks are those most interested in politics. The silent majority is a huge problem, says Gayo-Avello, and more work needs to be done to understand this important group.

Most damning is the lack of a single actual prediction. Every analysis on elections so far has been done after the fact. "I have not found a single paper predicting a future result," says Gayo-Avello.

Clearly, Twitter is not all it has been cracked up to be when it comes to the art of prediction. Given the level of hype surrounding social media, it's not really surprising that the more sensational claims do not stand up to closer scrutiny. Perhaps we should have seen this coming (cough).

Gayo-Avello has a solution. He issues the following challenge to anybody working in this area: "There are elections virtually all the time, thus, if you are claiming you have a prediction method you should predict an election in the future!" 

Ref: arxiv.org/abs/1204.6441: “I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper”: A Balanced Survey on Election Prediction using Twitter Data




The New Science of Online Persuasion

Researchers are using Google AdWords to test the persuasive power of different messages.

The Web has fundamentally changed the business of advertising in just a few years. So it stands to reason that the process of creating ads is bound to change too. 

The persuasive power of a message is a crucial ingredient in any ad. But settling on the best combination of words is at best a black art and, at worst, little more than guesswork.  

So advertisers often try to test their ads before letting them out into the wild.

The traditional ways to test the effectiveness of an advertising campaign are with a survey or a focus group. Surveys are shown to a carefully selected group of people who are asked to give their opinion about various different forms of words. A focus group is similar but uses a small group of people in a more intimate setting, often recorded and watched from behind a one-way mirror. 

There are clear disadvantages with both techniques. Subjects are difficult to recruit, hard to motivate (often requiring some kind of financial reward) and the entire process is expensive and time-consuming. 

What's more, the results are hard to analyse since any number of extraneous effects can influence them. Focus groups, for example, are notoriously susceptible to group dynamics in which the view of one individual can come to dominate. And there is a general question over whether recruited subjects can ever really measure the persuasiveness of anything.

Then there is the obvious conflict created by the fact that a subject is not evaluating the messages under the conditions in which they were designed to work, that is, to get the attention of an otherwise uninterested reader.

So there's obvious interest in finding a better way to test the value of persuasive messages. One approach is to use crowdsourcing services such as Mechanical Turk to generate an immediate readership willing to take part. 

But these workers are paid to take part. So the results are no better than those that conventional methods produce, although they are cheaper and quicker to collect.

Today, Marco Guerini at the Italian research organisation Trento-Rise and a couple of buddies say they've found an interesting way round this: to test messages on Google's AdWords service.

The idea here is to use Google AdWords to place many variations of a single message to see which generates the highest click-through rate.  

That's a significant improvement over previous methods. The subjects are not paid and make their choice in the very conditions in which the message is designed to work. And the data is quick and relatively cheap to collect.

Google already has a rudimentary tool that can help with this task. The so-called AdWords Campaign Experiments (ACE) tool allows users to test two variations of an ad side-by-side.

But to really get to the heart of persuasiveness requires a much more rigorous approach. Guerini and co make some small steps in this direction by testing various adverts for medieval art at their local castle in Trento.

These guys used Google's ACE tool to test various pairs of adverts and achieved remarkable success with some of their ads. One ad, for example, achieved a click-through rate of over 6 per cent from just a few hundred impressions--that's an impressive statistic in an industry more used to measuring responses in fractions of a per cent.

However, this click-through rate was not different in a statistically significant way from its variant's, so there's no way of knowing what it was about the message that generated the interest.
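To see why a few hundred impressions isn't enough, here's a rough sketch of the kind of significance check involved--a standard two-proportion z-test with made-up click counts, not Guerini and co's actual figures.

```python
import math

def ctr_z_test(clicks_a, impressions_a, clicks_b, impressions_b):
    """Two-proportion z-test: is the difference between two click-through
    rates bigger than chance alone would explain?"""
    p_a = clicks_a / impressions_a
    p_b = clicks_b / impressions_b
    # Pooled rate under the null hypothesis that both ads perform equally well.
    p = (clicks_a + clicks_b) / (impressions_a + impressions_b)
    se = math.sqrt(p * (1 - p) * (1 / impressions_a + 1 / impressions_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the normal approximation.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Illustrative numbers: a 6% CTR against a 4% CTR, 300 impressions each.
z, p = ctr_z_test(clicks_a=18, impressions_a=300, clicks_b=12, impressions_b=300)
print(f"z = {z:.2f}, p = {p:.2f}")  # p is well above 0.05: not significant at this sample size
```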

So while Guerini and co's experiments are interesting pilots they are not extensive enough to provide any insight into the nature of persuasive messaging. That will need testing on a much larger scale.

These will not be easy experiments to perform and they present numerous challenges.  For example, the process of changing the wording of an advert is fraught with difficulty. Then there is the question of whether this method is able to test anything other than adverts designed for AdWords. It might have limited utility for testing the messages in magazine adverts or billboard posters, for instance.

But the important point is that these kinds of experiments are possible at all. And it's not hard to imagine interesting scenarios for future research. For example, AdWords could be used as part of an evolutionary algorithm. This process might start with a 'population' of messages that are tested on AdWords. The best performers are then selected to 'reproduce' with various random changes to form a new generation of messages that are again tested. And so on. 
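Here's a rough sketch of what that loop might look like. Everything in it is assumed for the sake of illustration: run_adwords_campaign is a hypothetical stand-in for placing an ad and measuring its click-through rate, and the synonym-swap mutation and selection rule are made up rather than taken from the paper.

```python
import random

# Hypothetical word substitutions used to 'mutate' an ad's wording.
SYNONYMS = {"discover": ["explore", "experience"],
            "medieval": ["ancient", "historic"],
            "art": ["masterpieces", "treasures"]}

def run_adwords_campaign(message):
    """Stand-in for a real AdWords test. In practice this would place the ad,
    collect impressions and clicks, and return the measured click-through
    rate. Here it just returns a repeatable toy score for the message."""
    return random.Random(message).random()

def mutate(message):
    """Create a new variant by swapping one word for a synonym, if one exists."""
    words = message.split()
    i = random.randrange(len(words))
    words[i] = random.choice(SYNONYMS.get(words[i], [words[i]]))
    return " ".join(words)

def evolve(seed_messages, generations=5, population=8, survivors=3):
    pool = list(seed_messages)
    for _ in range(generations):
        # Test every variant 'in the wild' and rank by click-through rate.
        ranked = sorted(pool, key=run_adwords_campaign, reverse=True)
        best = ranked[:survivors]
        # The winners 'reproduce' with random changes to fill the next generation.
        pool = best + [mutate(random.choice(best)) for _ in range(population - survivors)]
    return max(pool, key=run_adwords_campaign)

print(evolve(["discover medieval art in Trento"]))
```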

Who knows what kind of insight these kinds of approaches might produce into the nature of persuasiveness and the human mind. But we appear to have a way to carry out these experiments for the first time.

Ref: arxiv.org/abs/1204.5369: Ecological Evaluation of Persuasive Messages Using Google AdWords



The Worrying Consequences of the Wikipedia Gender Gap

Male editors dramatically outnumber female ones on Wikipedia and that could be significantly influencing the online encyclopedia's content, according to a new study

There was a time when the internet was dominated by men but in recent years that gap has dissolved. Today, surfers are just as likely to be male as female. And in some areas women dominate: they are more likely to tweet or participate in social media such as Facebook. Even the traditionally male preserve of online gaming is changing.   

So what's wrong with Wikipedia? Last year, the New York Times pointed out that women make up just 13 per cent of those who contribute to Wikipedia, despite making up almost half the readers. And a few months ago, a study of these gender differences said they hinted at a culture at Wikipedia that is resistant to female participation.

Today, Pablo Aragon and buddies at the Barcelona Media Foundation in Spain suggest that the problem is seriously influencing Wikipedia's content.

These guys have studied the biographies of the best connected individuals on 15 different Wikipedia language sites. They chose the best connected individuals by downloading all the biographies and then constructing a network in which individuals with Wikipedia biographies are nodes. They then drew a link between two nodes if one person's Wikipedia biography contained a link to the other's.

Finally, they drew up a list of the best connected people. The table above shows the top five for each of the 15 language sites. 
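The construction is simple enough to sketch in a few lines of Python. The toy links below and the use of incoming-link counts as the measure of 'best connected' are illustrative assumptions; the paper analyses the full biography network of each language edition.

```python
from collections import defaultdict

def rank_best_connected(biography_links, top_n=5):
    """biography_links: iterable of (person_a, person_b) pairs meaning a's
    biography links to b's. Returns the top_n people by incoming links."""
    incoming = defaultdict(int)
    for source, target in biography_links:
        incoming[target] += 1
    return sorted(incoming.items(), key=lambda item: item[1], reverse=True)[:top_n]

# Toy example with made-up links between biographies:
links = [("Tony Blair", "George W Bush"),
         ("Dick Cheney", "George W Bush"),
         ("George W Bush", "Tony Blair")]
print(rank_best_connected(links))   # [('George W Bush', 2), ('Tony Blair', 1)]
```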

There are some curious patterns. In many countries, politicians and leaders are the best connected individuals. For example, on the Chinese language site, Chiang Kai-shek is the best connected individual; on the English-speaking site it's George W Bush; and on the German site, Adolf Hitler tops the list.

In other countries, entertainers head the list; Frank Sinatra in Italy, Michael Jackson in Portugal and Marilyn Monroe in Norway. 

But most curious of all is the lack of women. Out of a possible total of 75, only three are women: Queen Elizabeth II, Marilyn Monroe and Margaret Thatcher.

That's a puzzling disparity, and Aragon and co point to an obvious possible explanation--that the gender gap among editors directly leads to the gender gap among the best connected individuals. 

Of course, that's only speculation but Aragon and co call it "an intriguing subject for future investigation." We'll be watching to see how that pans out.

In the meantime, the Wikimedia Foundation has set itself the goal of increasing the proportion of female contributors to 25 per cent by 2015--a step in the right direction, though the gap remains an embarrassing blot on the landscape of collaborative endeavour.

Ref: arxiv.org/abs/1204.3799: Biographical Social Networks On Wikipedia - A Cross-Cultural Study Of Links That Made History



How Dark Matter Interacts with the Human Body

Dark matter must collide with human tissue, and physicists have now calculated how often. The answer? More often than you might expect.

One of the great challenges in cosmology is understanding the nature of the universe's so-called missing mass.

Astronomers have long known that galaxies are held together by gravity, a force that depends on the amount of mass a galaxy contains. Galaxies also spin, generating a force that tends to cause this mass to fly apart. 

The galaxies astronomers can see are not being torn apart as they rotate, presumably because they are generating enough gravity to prevent this.

But that raises a conundrum. Astronomers can see how much visible mass there is in a galaxy and, when they add it all up, there isn't anywhere near enough to generate the required amount of gravity. So something else must be generating this force. 

One idea is that gravity is stronger on the galactic scale and so naturally provides the extra force to glue galaxies together.

Another is that the galaxies must be filled with matter that astronomers can't see, the so-called dark matter. To make the numbers work, this stuff needs to account for some 80 per cent of the mass of galaxies so there ought to be a lot of it around. So where is it? 

Physicists have been racing to find out with detectors of various kinds and more than one group says it has found evidence that dark matter fills our solar system in quantities even more vast than many theorists expect. If they're right, the Earth and everything on it is ploughing its way through a dense sea of dark matter at this very instant. 

Today, Katherine Freese at the University of Michigan in Ann Arbor, and Christopher Savage at Stockholm University in Sweden outline what this means for us humans, since we must also be pushing our way through this dense fog of dark stuff.

We know that whatever dark matter is, it doesn't interact very strongly with ordinary matter, because otherwise we would have spotted its effects already.

So although billions of dark matter particles must pass through us each second, most pass unhindered. Every now and again, however, one will collide with a nucleus in our body. But how often?

Freese and Savage calculate how many times nuclei in an average-sized lump of flesh ought to collide with particles of dark matter. By average-sized, they mean a 70 kg lump of meat made largely of oxygen, hydrogen, carbon and nitrogen. 

They say that dark matter is most likely to collide with oxygen and hydrogen nuclei in the body. And given the most common assumptions about dark matter, this is likely to happen about 30 times a year. 

But if the latest experimental results are correct and dark matter interactions are more common than expected, the number of human-dark matter collisions will be much higher. Freese and Savage calculate that there must be some 100,000 collisions per year for each human on the planet. 

That means you've probably been hit a handful of times while reading this post.  
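To put those rates on a human timescale, just divide a year by the collision rate--a quick back-of-envelope sketch:

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def seconds_between_hits(collisions_per_year):
    """Average waiting time between dark matter collisions, in seconds."""
    return SECONDS_PER_YEAR / collisions_per_year

# Standard assumptions: roughly 30 collisions a year.
print(f"{seconds_between_hits(30) / 86400:.0f} days between hits")       # ~12 days

# If the latest experimental hints are right: roughly 100,000 a year.
print(f"{seconds_between_hits(100_000) / 60:.1f} minutes between hits")  # ~5.3 minutes
```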

Freese and Savage make no estimate of the potential impact on health this background rate of collisions might have. That would depend on the energy and motion of a nucleus after it had been hit and what kind of damage it might wreak on nearby tissue. 

It must surely represent a tiny risk per human, but what are the implications for the population as a whole? That would be an interesting next step for a biological physicist with a little spare calculating time. 

Ref: arxiv.org/abs/1204.1339: Dark Matter Collisions With The Human Body



Friday, May 4, 2012

The Amazing Trajectories of Life-Bearing Meteorites from Earth

The asteroid that killed the dinosaurs must have ejected billions of tons of life-bearing rock into space. Now physicists have calculated what must have happened to it.

About 65 million years ago, the Earth was struck by an asteroid some 10 km in diameter with a mass of well over a trillion tons. We now know the immediate impact of this event—megatsunamis, global wildfires ignited by giant clouds of superheated ash, and, of course, the mass extinction of land-based life on Earth.

But in recent years, astrobiologists have begun to study a less well known consequence: the ejection of billions of tons of life-bearing rocks and water into space. By some estimates, the impact could have ejected as much mass as the asteroid itself. 

The question that fascinates them is what happened to all this stuff.

Today, we get an answer from Tetsuya Hara and buddies at Kyoto Sangyo University in Japan. These guys say a surprisingly large amount of Earth could have ended up not just on the Moon and Mars, as might be expected, but much further afield. 

In particular, they calculate how much would have ended up in other places that seem compatible with life: the Jovian moon Europa, the Saturnian moon Enceladus, and Earth-like exoplanets orbiting other stars.

Their results contain a number of surprises. First, they calculate that almost as much ejecta would have ended up on Europa as on the Moon: around 10^8 individual Earth rocks in some scenarios. That's because the huge gravitational field around Jupiter acts as a sink for rocks, which then get swept up by the Jovian moons as they orbit. 

But perhaps most surprising is the amount that makes its way across interstellar space. Last year, we looked at calculations suggesting that more Earth ejecta must end up in interstellar space than all the other planets combined.

Hara and co go further and estimate how much ought to have made its way to Gliese 581, a red dwarf some 20 light years from here that is thought to have a super-Earth orbiting at the edge of the habitable zone.

They say about a thousand Earth-rocks from this event would have made the trip, taking about a million years to reach their destination.

Of course, nobody knows if microbes can survive that kind of journey, or even the shorter trips to Europa and Enceladus. But Hara and buddies say that if they can, they ought to flourish on a super-Earth in the habitable zone. 

That raises another interesting question: how quickly could life-bearing ejecta from Earth (or anywhere else) seed the entire galaxy?

Hara and co calculate that it would take some 10^12 years for ejecta to spread through a volume of space the size of the Milky Way. But since our galaxy is only 10^10 years old, a single ejection event could not have done the trick.

However, they say that if life evolved at 25 different sites in the galaxy 10^10 years ago, then the combined ejecta from these places would now fill the Milky Way.

There's an interesting corollary to this. If this scenario has indeed taken place, Hara and co say: "then the probability is almost one that our solar system is visited by the microorganisms that originated in extra solar system."

Entertaining stuff!

Ref: arxiv.org/abs/1204.1719: Transfer of Life-Bearing Meteorites from Earth to Other Planets

