Archive for the ‘Nets 'n' webs’ Category

How big is a city?

August 20, 2008

Population clusters

That’s not as silly a question as it sounds. Defining the size of a city is a tricky task that has major economic implications: how much should you invest in a city if you don’t know how many people live and work there?

The standard definition is the Metropolitan Statistical Area, which attempts to capture the notion of a city as a functional economic region and requires a detailed subjective knowledge of the area before it can be calculated. The US Census Bureau has an ongoing project dedicated to keeping abreast of the way this one metric changes for cities across the continent.

Clearly that’s far from ideal. So our old friend Eugene Stanley from Boston University and a few pals have come up with a better measure called the City Clustering Algorithm. This divides an area up into a grid of a specific resolution, counts the number of people within each square and looks for clusters of populations within the grid. This allows a city to be defined in a way that does not depend on its administrative boundaries.
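In case that sounds abstract, here’s a rough sketch in Python of the clustering step. It is not the authors’ code, and the tiny example grid and occupancy threshold are invented for illustration: occupied cells that touch each other get lumped into the same cluster and their populations summed.

```python
# A minimal sketch of the clustering step, not the authors' code: given a grid of
# population counts at some resolution, group occupied neighbouring cells into
# clusters and sum their populations. The occupancy threshold is an assumption.
import numpy as np
from collections import deque

def city_clusters(pop_grid, min_pop=1):
    """Return the total population of each cluster of adjacent occupied cells."""
    occupied = pop_grid >= min_pop
    visited = np.zeros_like(occupied, dtype=bool)
    clusters = []
    rows, cols = pop_grid.shape
    for i in range(rows):
        for j in range(cols):
            if occupied[i, j] and not visited[i, j]:
                # Flood-fill the connected component starting at (i, j)
                total, queue = 0, deque([(i, j)])
                visited[i, j] = True
                while queue:
                    r, c = queue.popleft()
                    total += pop_grid[r, c]
                    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        nr, nc = r + dr, c + dc
                        if 0 <= nr < rows and 0 <= nc < cols \
                                and occupied[nr, nc] and not visited[nr, nc]:
                            visited[nr, nc] = True
                            queue.append((nr, nc))
                clusters.append(total)
    return sorted(clusters, reverse=True)

# Example: two occupied blobs separated by empty cells form two clusters
grid = np.array([[3, 2, 0, 0],
                 [1, 0, 0, 5],
                 [0, 0, 0, 4]])
print(city_clusters(grid))   # -> [9, 6]

# Coarsening the grid (say, 1 km cells merged into 4 km cells) joins nearby
# clusters, which is why the same city can look far bigger at lower resolution.
```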

That has significant implications because clusters depend on the scale on which you view them. For example, a 1 kilometre grid sees New York City’s population as a cluster of 7 million, a 4 kilometre grid makes it 17 million and the cluster identified with an 8 kilometre grid, which encompasses Boston and Philadelphia, has a population of 42 million. Take your pick.

The advantage is that this gives a more or less objective way to define a city. It also means we’ll need to reanalyse some of the fundamental properties we ascribe to cities, such as growth. For example, the group has studied only a limited number of cities in the US, UK and Africa but already says we’ll need to rethink Gibrat’s law, which states that a city’s growth rate is independent of its size.

Come to think of it, Gibrat’s is a kind of weird law anyway, which means there may be some low-hanging fruit for anybody else who wants to re-examine the nature of cities.

Ref: arxiv.org/abs/0808.2202: Laws of Population Growth

Schroedinger-like PageRank wave equation could revolutionise web rankings

August 7, 2008

Quantum PageRank

The PageRank algorithm that first set Google on a path to glory measures the importance of a page in the world wide web.  It’s fair to say that an entire field of study has grown up around the analysis of its behaviour.

That field looks set for a shake up following the publication today of an entirely new formulation of the problem of ranking web pages. Nicola Perra at the University of Cagliari in Italy and colleagues have discovered that when they re-arrange the terms in the PageRank equation the result is a Schroedinger-like wave equation.

So what, I hear you say, that’s just a gimmick. Perhaps, but the significance is that it immediately allows the entire mathematical machinery of quantum mechanics to be brought to bear on the problem–that’s 80 years of toil and sweat.

Perra and pals point out some of the obvious advantages and disadvantages of the new formulation.

First, every webpage has a quantum-like potential. The topology of this potential gives the spatial distribution of PageRank throughout the web. What’s more, this distribution can be calculated in a straightforward way which does not require iteration as the conventional PageRank algorithm does.

So the PageRank can be calculated much more quickly for relatively small webs and the team has done a simple analysis of the PageRanking of the .eu domain in this way. However, Perra admits that the iterative method would probably be quicker when working with the tens of billions of pages that make up the entire web.
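For reference, the iterative method being compared against here is the standard power-method calculation of PageRank. The sketch below is the textbook version, not Perra’s wave-equation formulation; the damping factor, tolerance and the tiny example web are my own choices.

```python
# A minimal sketch of the conventional, iterative PageRank calculation that the
# wave-equation formulation sidesteps; the damping factor and tolerance are the
# usual textbook choices, not values from the paper.
import numpy as np

def pagerank(adj, damping=0.85, tol=1e-9, max_iter=1000):
    """Power-method PageRank for a small web given as an adjacency matrix."""
    n = adj.shape[0]
    out_degree = adj.sum(axis=1)
    rank = np.full(n, 1.0 / n)
    for _ in range(max_iter):
        new_rank = np.full(n, (1.0 - damping) / n)
        for i in range(n):
            if out_degree[i] > 0:
                # Page i shares its rank equally among the pages it links to
                new_rank += damping * rank[i] * adj[i] / out_degree[i]
            else:
                # Dangling page: spread its rank over the whole web
                new_rank += damping * rank[i] / n
        if np.abs(new_rank - rank).sum() < tol:
            break
        rank = new_rank
    return rank

# Example: a tiny three-page web, 0 -> 1 -> 2 -> 0 plus 0 -> 2
web = np.array([[0, 1, 1],
                [0, 0, 1],
                [1, 0, 0]], dtype=float)
print(pagerank(web))
```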

But the great promise of this Schroedinger-like approach is something else entirely. What the wave equation allows is a study of the dynamic behaviour of PageRank: how the rankings change and under what conditions.

One of the key tools for this is called perturbation theory. It’s no exaggeration to say that perturbation theory revolutionised our understanding of the universe when it was applied to quantum theory in the 1920s and 1930s.

The promise is that it could do the same for our understanding of the web. If so, this field is in for an interesting few years ahead.

Ref: arxiv.org/abs/0807.4325: Schroedinger-like PageRank equation and localization in the WWW

The curious kernels of dictionaries

July 7, 2008

Grounded kernel

If you don’t know the meaning of a word, you look it up in the dictionary. But what if you don’t know the meaning of any of the words in the definition? Or the meaning of any of the words in the definitions of these defining words? And so on ad infinitum.

This is known as the “symbol grounding problem” and is related to the nature of meaning in language.  The way out of this problem is to assume that we somehow automatically “know” the meaning of a small kernel of words from which all others can be defined.

The thinking is that some words are so closely linked to the object to which they refer that we know their meaning without a definition. Certain individuals, events and  actions apparently fall into this category. These words are called “grounded”.

How this controversial idea might work, we’ll leave for another day. The question we’re pondering today, thanks to Alexandre Blondin Masse at the University of Quebec in Canada, is this: how small a kernel of grounded words do we need to access the entire dictionary?

We don’t have an answer for you but Blondin Masse and pals have a method based on the concept of a reachable set: “a larger vocabulary whose meanings can be learned from a smaller vocabulary through definition alone, as long as the meanings of the smaller vocabulary are themselves already grounded”.

The team have even developed algorithms to compute the reachable set for any given dictionary and, from that, the size of the grounded kernel.
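To see how a reachable set works in miniature, here’s a hedged sketch, my own toy rather than Blondin Masse’s algorithm: treat the dictionary as a map from each word to the set of words in its definition, then keep “learning” any word whose definition uses only words you already know.

```python
# A hedged sketch (not the authors' algorithm) of the "reachable set" idea:
# starting from a kernel of grounded words, repeatedly learn any word whose
# definition uses only words already known. The toy dictionary is invented.
def reachable_set(dictionary, kernel):
    """dictionary: word -> set of words used in its definition."""
    known = set(kernel)
    changed = True
    while changed:
        changed = False
        for word, defining_words in dictionary.items():
            if word not in known and defining_words <= known:
                known.add(word)      # every defining word is understood
                changed = True
    return known

toy = {
    "big":   {"large"},
    "large": {"big"},
    "huge":  {"very", "big"},
    "city":  {"large", "town"},
}
print(sorted(reachable_set(toy, kernel={"big", "very", "town"})))
# -> ['big', 'city', 'huge', 'large', 'town', 'very']
```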

It has to be said that modern dictionaries already work like this: they are based on a defining vocabulary of about 2000 words from which all others are defined, although this system does not appear to be rigorously enforced, say Blondin Masse and co.

Nobody knows whether 2000 words is close to the theoretical limit for a grounding kernel. But we expect Blondin Masse and pals to tell us soon.

Ref: arxiv.org/abs/0806.3710: How Is Meaning Grounded in Dictionary Definitions?

Cellphone records reveal the basic pattern of human mobility

June 11, 2008

Mobile phone movement

A few months back, we saw what happens when researchers get their paws on anonymised mobile phone records. Albert-Laszlo Barabasi at the University of Notre Dame in Indiana and some buddies used them to discover entirely new patterns of human behaviour.

Now Barabasi has dug deeper into the data and discovered a single basic pattern of human mobility. It’s nothing special: lots of smallish journeys interspersed with occasional long ones (the lengths of the journeys actually follow a power law).

That’s more or less what you’d expect but experimental confirmation is important.
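To get a feel for what that pattern looks like, here’s a toy simulation rather than the paper’s analysis: journey lengths drawn from a power law produce mostly short hops punctuated by rare long jumps. The exponent and units are arbitrary choices, not Barabasi’s fitted values.

```python
# A toy illustration, not the paper's analysis: journeys whose lengths follow a
# power law give many short hops and a few very long jumps. The exponent and
# cut-off are arbitrary; distances are in arbitrary units.
import numpy as np

rng = np.random.default_rng(0)

def power_law_lengths(n, exponent=1.75, r_min=1.0):
    """Sample n journey lengths from P(r) ~ r^(-exponent) via inverse transform."""
    u = rng.random(n)
    return r_min * (1.0 - u) ** (-1.0 / (exponent - 1.0))

steps = power_law_lengths(10_000)
angles = rng.uniform(0.0, 2.0 * np.pi, steps.size)
positions = np.cumsum(
    np.column_stack((steps * np.cos(angles), steps * np.sin(angles))), axis=0)

print(f"median hop: {np.median(steps):.1f}, longest hop: {steps.max():.0f}, "
      f"final distance from home: {np.hypot(*positions[-1]):.0f}")
```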

Human mobility is one of the crucial factors in understanding the spread of epidemics. Until now, the models that predict how disease spreads have had to rely on educated guesses about the way human travel patterns might affect this process.

Barabasi’s work will take just a little of the guesswork out of future efforts, and that can’t be bad.

Ref: arxiv.org/abs/0806.1256: Understanding Individual Human Mobility Patterns

Why do online opinions evolve differently to offline ones?

June 5, 2008

 Online opinions

The way in which opinions form, spread through societies and evolve over time is a hot topic among researchers because of their increasing ability to measure and simulate what’s going on.

The field offers some juicy puzzles that look ripe for picking by somebody with the right kind of insight. For example, why do people bother to vote in elections in which they have little control over the result, when a “rational” individual ought to conclude that it is not worth taking part?

A similar conundrum is why people contribute to online opinion sites such as Amazon’s book review system or the Internet Movie Database’s (IMDB) ratings system. When there are already a hundred 5-star reviews, why contribute another?

Today Fang Wu and Bernardo Huberman at HP Laboratories in Palo Alto present the results of their analysis of this problem. And curiously, it looks as if online opinions form in a subtly different way to offline ones.

The researchers studied the patterns of millions of opinions posted on Amazon and the IMDB and found some interesting trends. They say:

Contrary to the common phenomenon of group polarization observed offline, we measured a strong tendency towards moderate views in the course of time.

That might come as a surprise to anyone who has followed the discussion on almost any online forum, but Wu and Huberman have an idea of how this moderation evolves. They suggest that people are most likely to express a view when their opinion differs from the prevailing consensus, because such a contribution will have a bigger effect on the group.

They tested the idea  by looking at the contributions of people who added detailed reviews against those who simply clicked a button. Sure enough, those who invest more effort are more likely to have an opposing view. It is these opposing views that tend to moderate future views.

By contrast, sites such as Jyte in which users can only click a button to give their opinion tend to show herding behaviour in which people copy their peers, just as they often do offline.
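Here’s a toy sketch of that first mechanism. It is my own illustration rather than Wu and Huberman’s model, with invented parameters: early ratings come from enthusiasts, later visitors are more likely to post the further their own opinion sits from the current average, and the running average duly drifts back towards the middle.

```python
# A toy sketch of the suggested mechanism, not Wu and Huberman's model: early
# reviews come from enthusiasts, and later visitors are more likely to post the
# further their own opinion sits from the current average, which drags the
# running average back towards the middle. All parameters are invented.
import random

random.seed(1)

ratings = [random.uniform(4.5, 5.0) for _ in range(30)]   # enthusiastic early reviews

for _ in range(3000):                                     # later potential reviewers
    private = random.uniform(1.0, 5.0)                    # their true opinion
    current = sum(ratings) / len(ratings)
    disagreement = abs(private - current) / 4.0           # 0 .. 1
    if random.random() < disagreement:                    # the bigger the gap, the
        ratings.append(private)                           # likelier they bother to post

early = sum(ratings[:30]) / 30
late = sum(ratings[-30:]) / 30
print(f"average of first 30 ratings: {early:.2f}, of last 30: {late:.2f}")
```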

Wu and Huberman’s analysis raises more questions than answers for me. But they point out that the study of online opinions has been neglected until now.  That looks set to change.

Ref: arxiv.org/abs/0805.3537: Public Discourse in the Web Does Not Exhibit Group Polarization

The science of scriptwriting

June 4, 2008

McKee

You don’t have to delve far into the realms of scriptwriting before you’ll be pointed towards a book called Story by Robert McKee, which explains why scriptwriting is more akin to engineering than art. McKee examines story-telling like a biologist dissecting a rat. But after taking it apart, he explains how to build a story yourself using rules that wouldn’t look out of place in a computer programming textbook.

McKee has become so influential that huge numbers of films, perhaps most of them, and many TV series are now written using his rules. But the real measure of his success is that there are even anti-McKee films such as Adaptation that attempt to burst McKee’s bubble.

Given that scriptwriting has become so formulaic, shouldn’t science have a role to play in analysing it? That’s exactly what Fionn Murtagh and pals at the Royal Holloway College, University of London have done in a project that analyses scripts in a repeatable, unambiguous and potentially automatic way.

Using McKee’s rules, they compare the script of the film Casablanca, a classic pre-McKee movie, with scripts of six episodes of CSI (Crime Scene Investigation), a classic post-McKee production, and find numerous similarities.

That’s hardly surprising since McKee learnt his trade analysing films such as Casablanca, so anything written using his rules should have these similarities.

What’s interesting about the work is that Murtagh and mates want to use their technique to develop a kind of project management software for scriptwriting. That’s an ambitious goal but one that might find a handy niche market, particularly since many scripts, TV serials in particular, are now written by teams rather than individuals and so need careful project management from the start.

The challenge for Murtagh and co will be to turn this approach into a bug-free, easy-to-use package that has the potential to become commercially viable. And for that they’ll almost certainly need some outside help and funding. Anybody got any spare cash?

Ref: arxiv.org/abs/0805.3799: The Structure of Narrative: the Case of Film Scripts

VoIP threatened by steganographic attack

May 30, 2008

VoIP steganography

Steganography is the art of hiding messages as they are sent, in a process akin to camouflage. In cryptography, on the other hand, no attempt is made to hide the message, only to conceal its content.

Today, Wojciech Mazurczyk and Krzysztof Szczypiorski of the Warsaw University of Technology in Poland explain how VoIP services are wide open to steganographic attack and even measure how much information can be sent covertly in this way.

VoIP services such as Skype are vulnerable to steganographic attack because they use such a high bandwidth, which makes it relatively easy to embed a hidden message in the bit stream in a way that is almost impossible to detect.

For precisely this reason, the US Department of Defence specifies that any covert channel with a bandwidth higher than 100 bps must be considered insecure for average security requirements. For high security requirements, the DoD says the data rate should not exceed 1 bps, making it next to impossible to embed a hidden code without it being noticed.

So VoIP systems such as Skype, with their much higher data rates, are difficult to secure.

And to prove it, Mazurczyk and Szczypiorski have tested a number of steganographic attacks (including two new ones they’ve developed themselves) on a VoIP system to determine how much data could be sent. They say that during an average call (that’s 13 minutes long according to Skype) they were able to covertly transmit as much as 1.3 Mbits of data.
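A quick back-of-the-envelope check puts that figure in context against the DoD thresholds mentioned above; the only numbers used are the ones in this post.

```python
# Back-of-the-envelope check of the figures in the post: 1.3 Mbits hidden in an
# average 13-minute call, compared with the DoD's 100 bps and 1 bps thresholds.
covert_bits = 1.3e6          # bits smuggled during one average call
call_seconds = 13 * 60       # average Skype call length, per the post

covert_rate = covert_bits / call_seconds
print(f"covert channel: {covert_rate:.0f} bps")                         # ~1667 bps
print(f"vs 'average security' limit: {covert_rate / 100:.0f}x over")    # ~17x
print(f"vs 'high security' limit:    {covert_rate / 1:.0f}x over")      # ~1667x
```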

That should get a number of governments, companies and individuals thinking. How secure is your VoIP system?

Ref: arxiv.org/abs/0805.2938: Steganography of VoIP streams

World's oldest social network reconstructed from medieval land records

May 13, 2008

Medieval network

The network of links between peasants who farmed a small region of south-west France called Lot between 1260 and 1340 has been reconstructed by Nathalie Villa from the Universite de Perpignan in France et amis.

The team took their data from agricultural records that have been preserved from that time. This is a valuable dataset because it records the date, the type of transaction and the peasants involved.

Villa and co used this to recreate the network of links that existed between individuals and families in this part of France in the 13th and 14th centuries. They then drew up a self-organising map of the network (see above).
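The reconstruction step itself is simple enough to sketch. The snippet below is my own illustration with invented records, not Villa’s kernel-SOM pipeline: it links two people whenever they appear in the same transaction.

```python
# A minimal sketch of the reconstruction step only, not Villa's kernel-SOM
# analysis: build a network in which two peasants are linked whenever they
# appear in the same land transaction. The record format here is invented.
import itertools
import networkx as nx

# Hypothetical records: (date, transaction type, parties involved)
records = [
    (1275, "sale",        ["Arnaud", "Bertrand"]),
    (1281, "inheritance", ["Bertrand", "Guillaume", "Jeanne"]),
    (1290, "lease",       ["Arnaud", "Jeanne"]),
]

G = nx.Graph()
for date, kind, parties in records:
    for a, b in itertools.combinations(parties, 2):
        # Weight an edge by how many transactions link the two people
        if G.has_edge(a, b):
            G[a][b]["weight"] += 1
        else:
            G.add_edge(a, b, weight=1, first_seen=date, kind=kind)

print(G.number_of_nodes(), "people,", G.number_of_edges(), "links")
```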

But the best is surely to come. What Villa hasn’t yet done is analyse the network’s properties. Does this medieval network differ in any important ways from the kind of networks we see between individuals in the 21st century? If so, what explains the differences, and if not, what are the invariants that link our world with 13th-century France? The team promises an analysis in the near future.

In the meantime, it’s worth reflecting on the significance of this work. These kinds of networks could provide anthropologists with an exciting new way to study historical societies.

And while this may be the world’s oldest social network (if anyone knows of an older network, let us know), it’s unlikely to remain so for long. Excellent records survive of transactions in ancient Rome, from the earlier Greek empire and even from the Egyptian civilizations that built the pyramids some 4000 years ago.

If Villa’s work turns up any useful insights into the nature of medieval society in France, you can be sure that anthropologists will rush to repeat the method using data from even older societies.

All that’s left is to christen the new science of studying ancient social networks. Any suggestions?

Ref: arxiv.org/abs/0805.1374: Mining a Medieval Social Network by Kernel SOM and Related Methods

The mathematics of tackling tax evasion

May 9, 2008

Tax evasion

In recent years, economists have gained the luxury of actually being able to test their ideas in experiments involving the behaviour of real people. And one particularly new and promising area of experimental economics focuses on tax evasion, which ought to be of keen interest to many governments around the world.

A couple of years ago, Simon Gachter at the University of Nottingham carried out a number of experiments on the way people co-operate which had profound implications for tax evasion. Gachter’s conclusion was that people decide whether or not to pay taxes based on the behaviour of their peers. The implication is that in certain circumstances, tax evasion may be a kind of fashion that spreads through society like bell-bottomed jeans.

Today, Georg Zaklan from the University of Bamberg in Bavaria, Germany, and pals show just how this might work in the real world by constructing a model of tax evasion behaviour in society.

His society is an Ising spin model (most commonly used to show critical behaviour in magnetic materials) in which agents can choose to evade taxes or not based on the behaviour of their neighbours.

Sure enough, the model shows that without any control on tax evasion, the behaviour can spread rapidly, disappear equally quickly and re-appear again later (just like bell-bottoms).

But the beauty of Zaklan’s simulation is that it suggests a way in which governments can very easily prevent the spread of tax evasion. The team has modelled the effect of increasing the probability that a tax evader will be caught and show that a small increase could have profound effects on tax evasion.
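For the curious, here’s a toy version of that kind of setup, with invented parameters and simplified dynamics rather than Zaklan’s exact model: agents on a grid copy their neighbours’ behaviour with a little noise, and any evader who gets audited is forced to stay honest for a few rounds.

```python
# A toy version of the setup described above, not Zaklan's exact model: agents
# on a grid tend to copy their neighbours' behaviour (with a little noise), and
# an audited evader is forced to stay honest for a few rounds. The grid size,
# noise and penalty length are invented parameters.
import numpy as np

rng = np.random.default_rng(0)
N, NOISE, AUDIT_PROB, PENALTY = 50, 0.05, 0.01, 10

evading = rng.random((N, N)) < 0.5             # True = evades taxes
locked_honest = np.zeros((N, N), dtype=int)    # rounds left of enforced honesty

for step in range(200):
    # Each agent looks at its four neighbours and leans towards the majority
    neighbours = sum(np.roll(evading, shift, axis) for shift, axis
                     in ((1, 0), (-1, 0), (1, 1), (-1, 1)))
    follow_majority = neighbours > 2
    undecided = neighbours == 2
    evading = np.where(undecided, rng.random((N, N)) < 0.5, follow_majority)
    evading ^= rng.random((N, N)) < NOISE      # occasional independent flips

    # Audits: a caught evader is forced to comply for PENALTY rounds
    caught = evading & (rng.random((N, N)) < AUDIT_PROB)
    locked_honest = np.where(caught, PENALTY, np.maximum(locked_honest - 1, 0))
    evading &= locked_honest == 0

    if step % 50 == 0:
        print(f"step {step}: {evading.mean():.0%} evading")
```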

So what governments should do is increase the number of tax audits they carry out (as well as making sure there are adequate punishments for offenders). Zaklan says the model implies that if only 1% of the population is audited, tax evaders would be brought to heel for good.

That sounds interesting and might be worth a try in some countries, were it not for some important gaps in the paper.

The biggest of these is this: what evidence is there that tax evasion fluctuates in the real world in the way that the Ising model predicts? Zaklan doesn’t present any, so while this work is interesting, I’ll need some better evidence before I’m convinced that his model really describes what’s going on.

Ref: arxiv.org/abs/0805.0998: Controlling tax evasion fluctuations

How many politicians spoil the broth? More than 20…

April 17, 2008

Cabinet size

The Scottish author Robert Louis Stevenson once said: “politics is perhaps the only profession for which no preparation is thought necessary.”

Given that these people run the world’s biggest (and smallest) economies, how many are needed to do a decent job?

It is well known in management circles that decision making becomes difficult in groups of more than 20 or so. The British historian Northcote Parkinson studied this idea in relation to British politics and conjectured that a cabinet loses political grip as soon as its membership passes a critical size of 19-22 due to its inability to make efficient decisions.

Now Peter Klimek and pals from the Complex Systems Research Group at the Medical University of Vienna in Austria have found a similar relationship between the efficacy of political systems around the world and the size of the cabinets they employ to make decisions.

Using data supplied by the CIA (which must obviously be 100 per cent correct), they compared the cabinet sizes in 197 self-governing countries with various indicators of those countries’ economic, social and democratic performance: for example, the UN’s Human Development Index, which assesses a country’s achievement in areas such as GDP, life expectancy at birth and literacy.

The size of cabinets varied from just 5 in Liechtenstein and Monaco to 54 in Sri Lanka.

Klimek and co say that the various indicators of success are negatively correlated with cabinet size. Their message is, rather predictably, that too many cooks spoil the broth.

More interesting is their claim that there is a critical value of around 19-20 members, beyond which consensus is more difficult to achieve. They build a (somewhat unconvincing) mathematical model to show that at this critical value “dissensus” becomes more likely because it is easier to form multiple opposing factions in groups of more than 20 members. However, the transition from consensophile to dissensophile groups doesn’t look very critical to me.
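As a purely generic illustration of why size hurts, and emphatically not Klimek’s model: if members start with random yes/no positions and persuade each other one random pairing at a time, the number of exchanges needed to reach unanimity climbs quickly with group size.

```python
# A generic illustration only, not Klimek's model: members start with random
# yes/no positions, repeatedly pair off at random and one adopts the other's
# view, and we count how many exchanges a group needs to reach unanimity.
# The point is simply that the cost of consensus climbs with group size.
import random

random.seed(0)

def exchanges_until_consensus(size, trials=200):
    totals = 0
    for _ in range(trials):
        views = [random.randint(0, 1) for _ in range(size)]
        steps = 0
        while 0 < sum(views) < size:          # not yet unanimous
            a, b = random.sample(range(size), 2)
            views[a] = views[b]               # a is persuaded by b
            steps += 1
        totals += steps
    return totals / trials

for size in (5, 10, 20, 40):
    print(f"cabinet of {size:2d}: ~{exchanges_until_consensus(size):.0f} exchanges to agree")
```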

All this is of more than passing relevance in Europe, where the recent expansion of the European Union has resulted in a club of 27 nations. How will effective decisions be made? By reducing the size of the cabinet, called the European Commission, to 18 members, with various countries coming in and out on a rotating basis.

That means a third of the member states will not be represented at the executive level, which is praiseworthy for its practicality but dubious from a democratic point of view. But that’s politics.

Ref: arxiv.org/abs/0804.2202: To How Many Politicians should Government be Left?