300.000 Norwegians move house every year. If the pattern made by their journeys could be compressed into one short animation, what would it look like? What would someone seeing it be able to learn, if anything?
Deluge is a C++ application designed to answer these questions. The underlying data was generated by cross-referencing 8 million tax records from 2006 and 2007 to track changes in postal codes.
Norwegian tax returns
In Norway the incomes and fortunes of all taxpaying individuals are made public every year. Historically the records were only available as printed matter, but at some point the tax authorities decided to make the records available to news organisations as machine-readable files. Though it may seem radical, I believe this practice is essentially beneficial in a social democracy with a high level of personal taxation and a great welfare system. The press occasionally runs a few stories on monied individuals with clever accountants, so this seems to work to a certain degree.
In addition to writing a few beneficial stories, the press has used this data as the basis for two other activities: news on celebrity earnings and SEO-optimized pages for every taxpayer. These are the pages we constantly stumble across on Google when we are simply trying to find someone's phone number.
Data as granular as this is usually reserved for government administration, not handed out to the press. The underlying data set lists over 4 million individuals. It contains full name, year of birth, postal code and their attendant financial data. I was annoyed by the lack of investigation the press had given this information and asked if I could have a hard look at it on behalf of the news organization our company works with.
Uniquing tax payers
It seemed interesting to track people across years, and it turns out only 70.000 Norwegians aren't unique by name and year of birth. If you subtract these you're left with 3.950.000 unique taxpayers. As we have the postal code for each person in both years, we can track who moved between 2006 and 2007.
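The matching step described above can be sketched roughly as follows. This is a minimal illustration, not the actual Deluge code: the `TaxRecord` and `Move` structs and the `indexUnique` and `findMoves` functions are hypothetical names, and it assumes each year's records are already loaded into memory. Postal codes are kept as strings so that leading zeros survive.

```cpp
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical record layout; the real files also carry financial data.
struct TaxRecord {
    std::string fullName;
    int birthYear;
    std::string postalCode;
};

// One person whose postal code differs between the two years.
struct Move {
    std::string fullName;
    int birthYear;
    std::string fromPostalCode;
    std::string toPostalCode;
};

// A person is identified by (full name, year of birth).
using PersonKey = std::pair<std::string, int>;

// Index one year's records by key, keeping only keys that occur exactly once.
std::map<PersonKey, std::string> indexUnique(const std::vector<TaxRecord>& records) {
    std::map<PersonKey, int> counts;
    for (const TaxRecord& r : records) {
        ++counts[PersonKey(r.fullName, r.birthYear)];
    }

    std::map<PersonKey, std::string> unique;
    for (const TaxRecord& r : records) {
        PersonKey key(r.fullName, r.birthYear);
        if (counts[key] == 1) {
            unique[key] = r.postalCode;
        }
    }
    return unique;
}

// Return everyone who is unique in both years and whose postal code changed.
std::vector<Move> findMoves(const std::vector<TaxRecord>& before,
                            const std::vector<TaxRecord>& after) {
    const std::map<PersonKey, std::string> a = indexUnique(before);
    const std::map<PersonKey, std::string> b = indexUnique(after);

    std::vector<Move> moves;
    for (const auto& entry : a) {
        auto it = b.find(entry.first);
        if (it != b.end() && it->second != entry.second) {
            moves.push_back({entry.first.first, entry.first.second,
                             entry.second, it->second});
        }
    }
    return moves;
}
```

Calling `findMoves(records2006, records2007)` would return one `Move` per person whose postal code changed. Anyone sharing a name and birth year with someone else in either year is dropped, which is the conservative reading of "subtracting" the non-unique taxpayers.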
Crowdsourced postal codes
The practice of transparency regarding tax returns should not be taken to indicate that public Norwegian agencies are extremely generous with their data. Even though we have everyone's income listed on Google, the Norwegian postal agency doesn't publish the positions or bounds of postal codes. Yet, thanks to a private initiative by Erik Bolstad, this information has been made public through a crowdsourced effort (awesome).
A first sketch was made with a subset of the data in a 3D software package. It had a limit of a few thousand particles, but it did seem somewhat promising.
People under 35 with an income under 100.000 Norwegian kroner.
The data raises the question: if the patterns of everyone moving in one year could be compressed into one short animation, what would it look like? If you want this visual answer in realtime and with a decent framerate, the volume of information pushes the problem just north of what you can accomplish with Processing or interpreted languages. I don't have much experience with C++, but luckily I got to spend two weeks with a good friend and our families in a storm-ravaged resort.
Simen has had plenty of experience writing low-level code while building GRBL, his open source CNC motion control software, and could assist during a couple of late evenings in getting a quick prototype running with the Cinder OpenGL libraries. When running Word and the latest OSes, computers may seem to be running at approximately the same clip they did 10 years ago, but wonders have happened once you get sufficiently close to the hardware.
When running at full speed the visualization is clearly lacking in terms of salient features, yet I find it interesting. Then again, I like looking at Pachinko machines and waterfalls: processes comfortably stuck between the random and the ordered. When slowing the animation down and filtering for certain demographics it becomes more useful. At its best, laymen like myself can visually perceive facets of the national Norwegian migratory process that were previously only available through the statistical calculations of researchers in demography.