Tuesday, 1 June 2010

BP: When an IT failure really isn't one

A number of articles have appeared on the role of IT in the tragic BP oil spill (namely "BP oil spill ‘slows’ but serious IT failures come to surface" and "BP oil spill slowing but IT failures revealed").

I have to say I'm rather concerned at these stories in IT publications that I hold in high regard. The reason is that when you read the story and look at the report, it does not back up the claims about IT failure in these articles. They claim that:
"...the US government released a summary of BP’s own early investigation into the problems. The document contains some damning facts about IT at the rig, ..."
Except that it doesn't: the Congressional memorandum makes no mention of IT, or for that matter the words 'information', 'computer' or 'software' (go on, click the above link and see for yourself). The memo does however make plenty of mention of procedural and (non-IT) equipment failures.

There were references to robots and supercomputers in the articles, but in the context of the clear-up (and so are not relevant). The only other part of the two articles that seemed to support their narrative was:
"BP has said the accident “was brought about by the failure of a number of processes, systems and equipment”..."
The problem with leaning on this (unattributed) quote is that it does not specifically apply to IT. In fact the above form of words is so generic that it could apply to anything. The plain English translation would be 'stuff went wrong, we don't know what, but we wish to sound like we do!'.

In short, the content and sources of these articles do not support their title or overall narrative of an IT failure: in fact they point to likely non-IT causes.

Why should this bother me? The reason is that ensuring an informed public debate on issues as serious as the BP oil leak is important, as is the role of IT in society more generally. Given the blame that will no doubt arise as a result of the tragedy, the easy route of blaming IT for the sake of expediency is not the responsible or moral one.

3 comments:

ComputerworldUK said...

Hi Andrew

Thanks for your comments. I think the article and the US government memo make clear that the system failures “played a part” (para 1 of the story) in an overall picture that includes processes and human error.

The emergency shutoff and disconnect systems are indeed IT systems, which monitor the flow of oil and communicate between the well head and the rig. If the flow of oil is considered out of balance or communications links are lost, the systems are supposed to shut off the valves. The systems did not function.

IT is not being blamed as the sole cause of the problem – BP is raising additional concerns over human error and the quality of casing and cementing around the well – but the emergency systems failed to kick in. As other posters note, and as para 8 of the story says, there are also concerns around the testing and maintenance of systems.

Nevertheless, some monitoring systems successfully identified issues (page 2) - but were results ignored? Time will tell.

Thanks for your feedback and raising those issues.

Leo

PS> The BP quotation about the "failure of a number of processes, systems and equipment" comes from the following BP press release http://www.bp.com/genericarticle.do?categoryId=2012968&contentId=7062374

Andrew Tuson said...

Leo,

Thanks for your response. It is most commendable that you got back so quickly on these issues and clarified them. Also, thanks for the additional source.

My feeling is that we'd have got a more satisfying article from a narrative focusing on the broader socio-technical issues rather than one trying to bend towards IT failure. I think we'd have got more insight out of the report that way.

As you say, time will tell. Myself, I think the community will be served by a future, more inquisitorial feature article on the extent to which IT could prevent such problems, perhaps with input from safety-critical systems experts, once more about the actual reasons for failure becomes clear.

Andrew

ComputerworldUK said...

Andrew, thanks for your comments again.

As you say, as time goes on there are definitely important questions over what could be done in the future. A lot of lessons to be learnt.

We are looking into this issue.