NoSQL: Is It Time to Ditch Your Relational Database?

April 24, 2010

This week London hosted the largest NoSQL conference so far. The aim was to explore non-relational data stores, which have grown in prominence recently, particularly with Twitter joining Facebook and Digg as a Cassandra adopter.

Pragmatism was a common theme of all the presentations, and the principle of “using the right tool for the job” came up again and again. There seemed to be general agreement that what we are really talking about is “not only SQL”; in other words, using these technologies as a complement to relational databases, where many years of experience have been accumulated.

I recommend reading the presentations for day 1 and day 2, both of which are on myNoSQL (which Makoto Inoue called “The Hello of NoSQL” in his very entertaining talk on Tokyo Cabinet). The two days I was there were enjoyable for many reasons, including many general pearls of architectural wisdom, but I want to focus on practical examples where NoSQL is in use.

So how are people using NoSQL?

Matt Wall and Simon Willison described how they do things at the Guardian. They have an Enterprise Java platform that provides feeds on which front-end developers can build useful features. Around the edges, the team have used various tools for rapid development, including Redis (read Simon’s post) for the BNP heat map and a more performant version of the MP expenses review page, Google AppEngine for Zeitgeist, and Google Docs (specifically spreadsheets) for sharing data.

Jonathan Ellis gave a technical presentation on Cassandra, which is designed from the bottom up for replicating data. Replication across nodes is easily achieved by streaming pre-sorted blocks sequentially.

Kevin Weil told us about the challenges they face at Twitter, starting with the 7TB of data they collect each day (writing that amount of data at a typical disk speed of 80MB/s would take 24.3 hours). In addition to adopting Cassandra, Twitter has developed FlockDB, which is a social graph store.
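
The 24.3-hour figure is easy to check. The following is a minimal sketch that assumes decimal units (1 TB = 10^12 bytes, 1 MB = 10^6 bytes) and a single disk writing at a sustained 80MB/s; the function name is my own and not something from the talk.

```python
def hours_to_write(terabytes: float, mb_per_second: float) -> float:
    """Hours needed to write `terabytes` of data at `mb_per_second`.

    Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.
    """
    total_bytes = terabytes * 10**12
    bytes_per_second = mb_per_second * 10**6
    seconds = total_bytes / bytes_per_second
    return seconds / 3600


if __name__ == "__main__":
    # 7 TB per day at 80 MB/s
    print(f"{hours_to_write(7, 80):.1f} hours")  # prints "24.3 hours"
```

In other words, a single disk could not even keep up with one day's data in a day, which is why the writes have to be spread across many machines.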

The BBC uses CouchDB as a key-value store for iPlayer, the home page layout, and the film network. Enda Farrell explained that they control access to CouchDB through an API, which allows them to support authorisation, sharding and JMX instrumentation.

Matthew Ford has been involved in a number of NoSQL projects, and covered the pros and cons of the document-oriented data stores (CouchDB/MongoDB) and key-value stores.

Tobias Iversson’s slide sums up where NoSQL data stores fit into the architect’s toolbox: relational databases are suitable for the majority of cases, and we understand well how to manage these.

Scaling to size vs Complexity

However, there are some specific cases, namely key-value stores and graph databases, where alternative solutions are better.

The choice between a document data store and a relational database (RDBMS) is more difficult. Cassandra has been proven with large data sets, with indexing done by pre-defining “supercolumns” that provide the mapping between indexes and their corresponding data values. CouchDB queries are done through views, while MongoDB allows ad-hoc queries. Relational and non-relational databases all deal with structured data, but a non-relational store keeps all the data for a record in one place, whereas an RDBMS requires you to join rows from normalised data tables.
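
To make the "one place versus joined tables" distinction concrete, here is a rough sketch of the same blog post stored both ways. Everything in it is invented for illustration: the table names, fields and sample data are hypothetical, SQLite stands in for the RDBMS, and a plain Python dict stands in for the kind of JSON document that CouchDB or MongoDB would hold.

```python
import json
import sqlite3

# Document style: the post and its comments live together in one record.
post_document = {
    "_id": "post-1",
    "title": "NoSQL in London",
    "comments": [
        {"author": "alice", "text": "Great write-up"},
        {"author": "bob", "text": "What about graph databases?"},
    ],
}


def comments_from_document(doc):
    """Read the comments straight out of the single document."""
    return [(c["author"], c["text"]) for c in doc["comments"]]


def comments_from_rdbms(post_id):
    """Store the same data in two normalised tables and join them back."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
    conn.execute(
        "CREATE TABLE comments ("
        "id INTEGER PRIMARY KEY, post_id INTEGER REFERENCES posts(id), "
        "author TEXT, text TEXT)"
    )
    conn.execute("INSERT INTO posts (id, title) VALUES (1, 'NoSQL in London')")
    conn.executemany(
        "INSERT INTO comments (post_id, author, text) VALUES (?, ?, ?)",
        [(1, "alice", "Great write-up"), (1, "bob", "What about graph databases?")],
    )
    # The join reassembles what the document store keeps together.
    rows = conn.execute(
        "SELECT c.author, c.text FROM posts p "
        "JOIN comments c ON c.post_id = p.id "
        "WHERE p.id = ? ORDER BY c.id",
        (post_id,),
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(json.dumps(post_document, indent=2))
    print(comments_from_document(post_document))
    print(comments_from_rdbms(1))
```

Both functions return the same list of (author, text) pairs; the difference is that the document version reads one record, while the relational version has to join two tables to rebuild it.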

Keeping the data together in one structure feels like a more natural fit for the data model typically used in an application, and avoids the object-relational mapping problems associated with mapping a hierarchical structure to a set of flat tables. However, adopting one of the “newer” data stores is intrinsically more risky than using an RDBMS, because they have been around for less time.

Although Cassandra is apparently suitable for single-server installations, I expect it will be the option for larger sites for some time, given the additional complexity associated with the super-column model. For smaller sites you may find CouchDB’s and MongoDB’s features appealing, such as CouchDB’s replication (also being adopted by MongoDB), and the easier interfacing through JSON. However, a relational database is still likely to be the right choice for the majority of cases.


The Digital Economy Bill: A Washed Up Bill from a Washed Up Government

April 7, 2010

Like many in the tech community I followed yesterday’s debate on the Digital Economy Bill. This Bill is being sponsored by Lord Mandelson, who developed an interest in clamping down on copyright infringement after he had dinner with David Geffen on his Corfu yacht.

The Bill confers heavy sanctions (to be finalised) for illegal filesharing, yet the Government seems determined to rush it through at the tail end of this Parliament without proper debate.

The turnout was poor; perhaps this was apathy, or the fact that Brown and Mandelson chose to announce the General Election on the same day. (My MP wasn’t there.)

We were initially told by Harriet Harman that the Bill would be dealt with through “super-affirmative” actions, as part of the Parliamentary washup. This means that a committee would look at the Bill and make any necessary amendments. As was pointed out by John Grogan (Labour), pushing through a contentious bill like this without proper debate is unprecedented (all other bills that have been handled in the washup have had broad agreement). Fiona Mactaggart also noted later that “secondary legislation Committees [are] not places where scrutiny occurs; they are another example of pathetic oversight by Parliament.”

On the Labour side John Robertson and Sion Simon were firmly in favour of the Bill. In one exchange, Tom Watson stated that as copyright infringement is theft, infringers should be allowed their day in court; Simon replied indignantly that they would (by appeal at a tribunal), which is a completely different thing.

Other notable points:

  • Bradshaw stated that the “technical measures” that may be imposed on repeat infringers would “not involve permanent disconnection”. These measures are covered by Clause 18 of the Bill, and Fiona Mactaggart pointed out later on that most MPs hadn’t yet seen the revised clause.
  • John Redwood asked a perfectly reasonable question of what someone could do with, for example, a downloaded song; copy it to another PC, let others listen to it. Stephen Timms told him he was “barking up the wrong tree”, with Bradshaw scornfully following up “not for the first time”; Timms replied that rights owners wouldn’t need to do that. Redwood restated the question and Timms again misinterpreted the question, refusing to allow any further comeback.
  • When Adam Afriyie challenged Tom Watson that it was the latter’s party who were introducing the bill, Watson shot back that the Tories had the power to stop it. (Afriyie coined the phrase “a washed up bill from a washed up government”.)
  • Fiona Mactaggart gave an impassioned and informed speech, pointing out that sharing intellectual property (IP) can create a market for that IP, something that many of the other MPs seem to miss.

I find the Conservative Party’s argument (via Jeremy Hunt), that something needs to be done to prevent damage to the economy, rather weak. The assertion that hundreds of millions of pounds are lost every year is presumably an extrapolation of the figure from the British Phonographic Industry mentioned in the Digital Britain Report. Hunt said that the Tories reserve the right to return to the Bill and make appropriate amendments after the next election, but that presupposes that they are actually elected.

When Blair was Prime Minister it was terrorism that was used to justify unreasonable measures. Now it’s the economy.

Yesterday the Government again showed its contempt for democracy in this country. Remember this on May 6th.

The full text of the debate is available on Hansard: http://www.publications.parliament.uk/pa/cm200910/cmhansrd/cm100406/debindx/100406-x.htm

Update: As the Bill has evolved it has gradually strayed from the original intent of the Digital Britain White Paper. Tom Watson and others proposed a number of amendments last night, including tightening up the definitions, e.g. focusing on peer to peer file sharing instead of general online infringement, and challenging the starting assumption of liability on the part of the internet connection owner. Ultimately, Timms wasn’t having any of it. The only significant change in the Bill was to drop clause 43, which would allow for unclaimed (orphaned) works of copyright to be used by others. The Bill was passed, with numerous Labour MPs filing in who hadn’t bothered to turn up for the debate, and very few Conservative MPs voting. Stephen Timms has lost whatever geek credentials he had, as he apparently doesn’t know what an IP address is. And it looks like the Bill is going to meet with a fair amount of resistance.


Training for the Marathon (Six Months To Go)

October 25, 2009

As I mentioned in my previous post, after coming back from Lesotho I signed up* for the Marathon, and am running on behalf of Sentebale. This week I received the Information and Training Booklet which I’m working through. One point that struck me was the list of items that might be purchased with the sponsorship money:

  • £10 for a school uniform (“these last a long time and encourage children to go to school”)
  • £15 for one blanket (“Lesotho gets very cold, especially in the mountains”)
  • £35 for one radiator (“In the winter months these can keep whole families warm”)
  • £60 for one goat (“Animals are very valuable in Lesotho and provide important nutrients for families with milk and meat”)
  • £100 for a bed (“… often shared by whole families so are very important”)
  • £150 puts a child through school for a year

I’m not sure what to expect over the next 6 months. As I haven’t been a regular runner I don’t know how well I’m going to adapt to running long distances, but the uncertainty is part of the excitement of the challenge.

Things I’ve learnt from reading and talking to people so far (in no particular order):

- For longer distances, it’s important to drink not just water, but fluids with salts in, otherwise you can develop hyponatremia.

- “Hitting the wall” at around 20 miles happens because the body is limited in how much glycogen it can store

- Eat 3 hours before a long run, and drink at least an hour before

- Do longer runs off road

- Map My Run has a good selection of training runs

- Join a running club in January or February, which will help with the training when it’s cold

- If you’re out for long runs in the cold you will get cold unless you have hat / gloves / running tights

- The Virgin London Marathon web site has a comprehensive section of training advice

- When you reach the finish line, check who’s around you to make sure you aren’t being passed by a Teletubby or octogenarian!

I bought some suitable trainers, shorts and socks yesterday. Watching the video of my feet on the treadmill was interesting … the left foot tends to come down harder than the right. I need to improve on this (e.g. by allowing the knees rather than the feet to lead the movement forward). Improving core strength, with the Plank exercise for example, will mean that it is easier to maintain good balance and therefore use less energy.

I’m going to aim for “getting round” comfortably, which means a 14-16 week schedule of running 4 times a week. Looking forward to trying out the trainers today.

* If you want to sponsor me, any amount would be welcome.


HIV in Southern Africa: It’s Worse Than You Think

October 2, 2009

Over the weekend I was in Lesotho for my father-in-law’s funeral. Lesotho (pronounced luh-soo-too) is a country of around 2 million people surrounded by South Africa; to get there we flew down to Johannesburg and then on to Maseru, the capital. I wasn’t sure what to expect, but discovered a ruggedly beautiful country, a small community of hospitable ex-pats, and welcoming locals.

Football in Lesotho

The tragedy of this wonderful country is that it is struggling with a very high rate of HIV infection, one of the highest in Africa, and estimated at around 20% of the population; in the 15-40 year old group it is probably closer to 4 out of every 10 people. Average life expectancy is now around 35 years.

The high rate persists for a number of reasons: although condoms are available, young people don’t like wearing them; free HIV tests are available but the stigma associated with HIV means that potential carriers are reluctant to take the test; some believe that since AIDS makes you thin, sleeping with a fat person is safe, or that having intercourse with a young virgin will cure you. Unfortunately, I’m not making this up. And since HIV/AIDS sufferers usually die from secondary causes such as pneumonia or tuberculosis, it is convenient to deny the underlying cause when someone dies.

There are around 400,000 orphans from AIDS; the boys seem to prefer living on the streets to taking up orphanage places, while the orphaned girls who aren’t looked after often become concubines. Africa has a system of extended family where an aunt or uncle will look after children if anything happens to their parents, but with one wage on average supporting 8 people (and a minimum wage of ZAR 900, or around £70, per month) the demand on the income earner becomes too great, and this system breaks down. Antiretroviral drugs are effective only when combined with a proper diet.

As desperate as this situation is, it’s important to keep trying to get it under control. Getting assistance to the people that need it is difficult but not impossible. There are numerous organisations working in Lesotho, such as Kick 4 Life (who work with 12-17 year olds through football), the Durham-Lesotho link, and Prince Harry’s charity Sentebale, which is working to transform the lives of Lesotho’s orphans and vulnerable children.

I know that money is tight for most of us at the moment, but if you feel like improving someone’s life you may be surprised how much of an impact a contribution could make. Please check out these links and make your own mind up.

Update: I have signed up to run the London Marathon on behalf of Sentebale. I’m not a regular runner, and I haven’t run the Marathon before so dragging my forty-something body around 26 miles is going to be tough. But I know it’s going to make a difference to the lives of some of the children out there. Please support me to whatever degree you can. Thanks!


MySQL Java Connector Problem on Windows Vista

August 30, 2009

I’ve just been setting up Eclipse for Stripes development on my home PC and hit a problem connecting to the database. I haven’t seen this on any of my Windows XP machines and it seems to be specific to Windows Vista, which was bundled with the PC.

Versions in use are:

  • Windows Vista Home Basic
  • MySQL Server Community Edition 5.1.37
  • Eclipse JEE Galileo (based on Eclipse 3.5)
  • JDK 1.6 Update 16
  • JARs: mysql-connector-java-5.1.6-bin.jar, ibatis-2.3.0.677.jar

I tried re-installing MySQL with User Account Control turned off, and verified that it wasn’t being blocked by the firewall. Still no joy.

Then I found this thread on CodeRanch which suggests that the problem is something to do with resolving localhost. So I checked c:/windows/system32/drivers/etc/hosts and found this:

::1             localhost

It isn't clear why the standard loopback IP address isn't included, but adding it back in fixes the problem (also see Microsoft KB article):

127.0.0.1       localhost
::1             localhost
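
To check whether a machine is affected, the hosts file can be inspected for an IPv4 loopback entry. The following is a small sketch in Python rather than anything shipped with MySQL or Windows; the function names are my own, and it only looks at the hosts file text (it does not query DNS or the resolver itself).

```python
def addresses_for(hostname, hosts_text):
    """Return the IP addresses that a hosts file maps to `hostname`."""
    addresses = []
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        fields = line.split()
        ip, names = fields[0], fields[1:]
        if hostname in names:
            addresses.append(ip)
    return addresses


def has_ipv4_loopback(hosts_text):
    """True if localhost is mapped to the IPv4 loopback address 127.0.0.1."""
    return "127.0.0.1" in addresses_for("localhost", hosts_text)


# The two versions of the hosts file shown above.
BROKEN = "::1             localhost\n"
FIXED = "127.0.0.1       localhost\n::1             localhost\n"

if __name__ == "__main__":
    print(has_ipv4_loopback(BROKEN))  # False
    print(has_ipv4_loopback(FIXED))   # True
```

To run it against a real machine, read the contents of the hosts file at the path given above and pass them to has_ipv4_loopback.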

Activate 09: How to Change the World through Technology and the Internet

July 5, 2009

Last Wednesday I went to a different type of conference from the ones I usually attend. Normally I focus on keeping up with the technologies that may be relevant in my job (which I find Future of Web Apps good for), but Activate 09 caught my eye. I enjoy listening to Tech Weekly, and have been watching the various Guardian initiatives including their Open Platform API, and the MPs’ expenses site, built by Simon Willison and his team in an amazing five days. The objective of Activate 09 was really to explore how we can use this technology to make a difference.

The event has been covered by the Guardian Activate 09 blog, James Governor, Roo Reynolds, Martin Belam, and Matt McAlister has described what the Guardian was trying to achieve with the event. Roo’s is the kind of post I would hope to have written after deciphering my copious notes. Rather than recounting the day in detail, this is more of a summary of the bits I felt were interesting or thought-provoking.

Humanity, Technology & The Web

Werner Vogels talked about reducing the risk of launching your site, by using Amazon’s infrastructure, just “in case no-one comes to your party”, and gave plenty of examples including the Facebook app Animoto & Playfish (with 27M users). Livestream has no infrastructure, but on U.S. election night they supported traffic of 40GB/s, and 90,000 concurrent channels.

Arianna Huffington & Werner Vogels at Activate09

Arianna Huffington made some great points: when trying to create reform, raw data by itself cannot be viral [you need a mechanism by which people can process it]. Also, the internet is self-correcting; in the 2008 Election it was much harder for the Republican Party to convince the American public that Obama was an angry black Muslim fundamentalist who wanted to undermine the Constitution. In the Q&A session that followed Arianna suggested that we should aim to combine the best of old media (story telling), with the best of new media. And we should shift the debate from “how to save newspapers” to “how to save journalism”.

Intellectual property and licensing came up a couple of times. In his talk, Ed Parsons mentioned that the National Rail app on the iPhone is one of the most expensive because of the licensing fee that the developers have to pay to ATOC. And in a later session, Tom Watson criticised the Ordnance Survey, saying it was disgraceful that their data was not generally available for geomapping applications. I missed the later session with the Ordnance Survey representative, but I understand that the somewhat disingenuous argument was made that it doesn’t cost the taxpayer anything because they charge people to license it.

Politics, Democracy & Public Life

Emily Bell spoke to Thomas Gensemer, whose company Blue State Digital built the site behind the Obama campaign. On a crass level, as he put it, they helped to raise 80% of the money for Obama’s campaign. But there was also a customer service angle to the campaign, such that if someone volunteers in Ohio they get an e-mail within 48 hours. He says “If you have a group of supporters, and ask them to do something that has clear tangible benefits, they will do it.” And “Ask yourself: what do I want my supporters to do today? If you can’t answer that, technology is not the answer.”

Adam Afriyie, Emily Bell, Thomas Gensemer, Tom Watson

Tom Watson was supportive of the Conservative innovation agenda, as put forward by Adam Afriyie. Adam’s talk followed a party line rather more than Tom’s; as well as criticising Ordnance Survey for not making its geodata more widely available, Tom also suggested that we need to have a government where failure is tolerated; if an initiative doesn’t work, try something else. Tom is very keen to put public data out there, and see what can be done with it. It will be interesting to see how the Tories’ election campaign evolves, as Adam alluded to them using local news sites to spread their messages during the campaign. Thomas Gensemer was not optimistic about the digital aspect of Labour’s campaign: it will be under-resourced, and there isn’t the space between now and then to start listening to the electorate.

Zimbabwe

Gerry Jackson’s talk was a story of successes in extreme adversity. Gerry set up the first independent radio station in Zimbabwe, which the government has repeatedly tried to block. Emigrating families leave mobile phones in place to contact their friends and family, and SMS is proving helpful as a means of enabling transparency on what is really happening in Zimbabwe.

Reinventing Healthcare

Jay Parkinson gave an original perspective on healthcare, based on the principles that 65% of doctor pay is overhead (i.e. paperwork), bad behaviour is what kills most Americans, and patients should be in charge of their own health. So he has built a platform, hellohealth.com, which puts patients in contact with experts, and has various tools available to support the relationship, e.g. by sending an SMS once a week to check patient details / weight.

The Rules of Civilisation

In the session, “the means of production in the hands of the many; will the internet lead to a rewriting of the rules of civilisation”, William Perrin noted that 19th century laws (on which the UK is largely run) don’t work with 21st century tools. William Heath noted that we have lots to learn about how we listen, how to achieve measured constructive self-expression, and mutual self-respect; we are now in touch with more people that we disagree with. A number of times throughout the day, the point came up that systems based on secrecy and privacy are harder to maintain when there is more transparency. Tom Steinberg believes we need someone to enforce this transparency. The questions were also posed: Does our educational methodology need to change? Should children be fed knowledge, or (instead) taught how to locate it? The best person to act as steward on the journey through that system is the person whose data it is; we need personal, portable education records. Tom thought about other ways that we can help people when dealing with government, e.g. when lots of people are doing the same thing at the same time, such as filling in a tax return. Also, what would be the effects of getting many more people to read laws before they are ratified?

How Children Can Self-Organise to Educate Themselves

By the end of the day my brain was fairly full. But I stayed for Sugata Mitra’s fascinating talk on how children who don’t have any other option can self-organise to learn how to use a computer. He covered the same material at TED, which is well worth watching.

Open Data

I came away from this conference feeling rather optimistic about what could be achievable with technology in this country during my lifetime, and impressed by the line-up that the Guardian had put together. The Cabinet Office’s announcement involving Tim Berners-Lee to advise on how to open up non-personal public data was a very positive step; I realise that technology isn’t the answer by itself, but I hope that this government or the Conservatives will take the opportunity to harness the ability of the many talented developers in this country, and give them the tools they need to create useful applications.


What I’ve Been Doing

July 5, 2009

I’m about to post on the Guardian’s Activate09 conference, and as it’s been a while since I’ve written here, a bit of an update is in order.

For the last 18 months or so I’ve been working at Carphone Warehouse on their CRM application, focusing on performance, and making sure it can do more with less as the number of customers increases. The application (it’s Java-based) seems to be holding up well and I’m probably going to need a new challenge soon.

There have been a few other things going on, like a 3 month home renovation, and the frenetic pace of family life; what free time remains has been spent on a project which plans to do a better job of presenting up to date information about local businesses. It’s an increasingly crowded space for startups, but when I launch, which should be this year, I hope people will find it useful. As part of this I’ve been exploring new languages, including PHP, Ruby and Groovy.

