The big human battle of this century

The big human battle of this century is going to be the democratisation of data and all forms of knowledge, and the introduction of digital government with the help of free and open source software.

Whilst undoubtedly the reaction of the planet to the explosion of human activities with climate change and other symptoms is the largest change process that has ever occurred in human history in the physical realm, the exponential growth of the Internet of Things and digital information flows is triggering the largest change process in the realm of human organisation that societies have ever experienced.

The digital realm

Sensor networks and pervasive use of RFID tags are generating a flood of data and lively machine-to-machine chatter. Machines have replaced humans as the most social species on the planet, and this must inform the approach to the development of healthy economic ecosystems.

Sensors that are part of the Internet of Things

When data scientists and automation engineers collaborate with human domain experts in various disciplines, machine-generated data is the magic ingredient for solving the hardest automation problems.

  • In domains such as manufacturing and logistics the writing is on the wall. Introduction of self-driving vehicles and just a few more robots on the shop floor will eliminate the human element in the social chatter at the workplace within the next 10 years.
  • The medical field is being revolutionised by the downward spiral of the cost of genetic analysis, and by the development of medical robots and medical devices that are hooked up to the Internet, paving the way for machine learning algorithms and big data to replace many of the interactions with human medical professionals.
  • The road ahead for the provision of government services is clearly digital. It is conceivable that established bureaucracies can resist the trend to digitisation for a few years, but any delay will not prevent the inevitability of automation.

The social implications

Data-driven automation leads to an entirely new perspective on the purpose of the education system and on the role of work and employment in society.

Large global surveys show that more than 70% of employees are disengaged at work. It is mainly in manufacturing that automation directly replaces human labour. In many other fields the shift in responsibilities from humans to machines initially goes hand in hand with the invention of new roles and loss of a clear purpose.

Traditional work is being transformed into a job for a machine. Exceptions are few and far between.

Data that is not sufficiently accessible is of very limited value to society. The most beneficial and disruptive data-driven innovations are those that result from the creative combination of data sets from two or more different sources.

It is unrealistic to assume that the most creative minds can be found via the traditional channel of employment, and it is unrealistic that such minds can achieve the best results if data is locked up in organisation-specific or national silos.

The most valuable data is data that has been meticulously validated, and that is made available in the public domain. It is no coincidence that software, data, and innovation are increasingly produced in the public domain. Jeremy Rifkin describes the emergence of a third mode of commons-based digitally networked production that is distinct from the property- and contract-based modes of firms and markets.

The education system has a major role to play in creating data literate citizen-scientists-innovators.

The role of economics

It is worthwhile remembering the origin of the word economics. It used to denote the rules for good household management. On a planet that hosts life, household management occurs at all levels of scale, from the activities of single cells right up to processes that involve the entire planetary ecosystem. Human economics are part of a much bigger picture that always included biological economics and that now also includes economics in the digital realm.

To be able to reason about economics at a planetary level the planet needs a language for reasoning about economic ecosystems, only some of which may contain humans. Ideally such a language should be understandable by humans, but must also be capable of reaching beyond the scope of human socio-economic systems. In particular the language must not be coloured by any concrete human culture or economic ideology, and must be able to represent dependencies and feedback loops at all levels of scale, as well as feedback loops between levels of scale, to enable adequate representation of the fractal characteristic of nature.

The digital extension of the planetary nervous system

In biology the use of electrical impulses for communication is largely confined to communication within individual organisms, and communication between organisms is largely handled via electromagnetic waves (light, heat), pressure waves (sound), and chemicals (key-lock combinations of molecules).

The emergence of the Internet of Things is adding to the communication between human-made devices, which in turn interact with the local biological environment via sensors and actuators. The impact of this development is hard to overestimate. The number of “tangible” things that might be computerized is approaching 200 billion, and this number does not include large sensor networks that are being rolled out by scientists in cities and in the natural environment. Scientists are talking about trillion-sensor networks within 10 years. The number of sensors in mobile devices is already more than 50 billion.

Compared to chemical communication channels between organisms, the speed of digital communication is orders of magnitude faster. The overall effect of equipping the planet with a ubiquitous digital nervous system is comparable to the evolution of animals with nervous systems and brains – it opens up completely new possibilities for household management at all levels of scale.

The complexity of the Internet of Things that is emerging on the horizon over the next decade is comparable to the complexity of the human brain, and the volume of data flows handled by the network is orders of magnitude larger than anything a human brain is able to handle.

The global brain

Over the course of the last century, starting with the installation of the first telegraph lines, humans have embarked on the journey of equipping the planet with a digital electronic brain. To most human observers this effort has only become reasonably obvious with the rise of the Web over the last 20 years.

Human perception and human thought processes are strongly biased towards time scales that range from those that matter to humans on a daily basis up to the time scale of a human lifetime. Humans are largely blind to events and processes that occur in sub-second intervals and to processes that are sufficiently slow. Similarly, human perception is strongly biased towards living and physical entities that are within about two orders of magnitude of the physical size of humans.

As a result of their cognitive limitations and biases, humans are challenged to understand non-human intelligences that operate in the natural world at different scales of time and different scales of size, such as ant colonies and the behaviour of networks of plants and microorganisms. Humans need to take several steps back in order to appreciate that intelligence may not only exist at human scales of size and time.

The extreme loss of biodiversity that characterises the anthropocene should be a warning, as it highlights the extent of human ignorance regarding the knowledge and intelligence that evolution has produced over a period of several billion years.

It is completely misleading to attempt to attach a price tag to the loss of biodiversity. Whole ecosystems are being lost – each such loss is the loss of a dynamic and resilient living system of accumulated local biological knowledge and wisdom.

Just as an individual human is a complex adaptive system, the planet as a whole is a complex adaptive system. All intelligent systems, whether biological or human-created, contain representations of themselves, and they use these representations to generate goal-directed behaviour. Examples of intelligent systems include not only individual organisms, but also large-scale and long-lived entities such as native forests, ant colonies, and coral reefs. The reflexive representations of these systems are encoded primarily in living DNA.

From an external perspective it nearly seems as if the planetary biological brain, powerful but thinking slowly in chemical and biological signals over thousands of years, has shaped the evolution of humans for the specific purpose of developing and deploying a faster-thinking global digital brain.

It is delusional to think that humans are in control of what they are creating. The planet is in the process of teaching humans about their role in its development, and some humans are starting to respond to the feedback. Feedback loops across different levels of scale and time are hard for humans to identify and understand, but that does not mean that they do not exist.

The global digital brain is currently still under development, not unlike the brain of a human baby before birth. All corners of the planet are being wired up and connected to sensors and actuators. The level of resilience of the overall network depends on the levels of decentralisation, redundancy, and variability within the network. A hierarchical structure of subsystems as envisaged by technologist Ray Kurzweil is influenced by elements of established economic ideology rather than by the resilient neural designs found in biology. A hierarchical global brain would likely suffer from recurring outages and from a lack of behavioural plasticity, not unlike the Cloud services from Microsoft and Amazon that define the current technological landscape.
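The dependence of resilience on decentralisation and redundancy can be illustrated with a toy model. The following minimal Python sketch is my own illustration and not a model of any real network: it compares a strictly hierarchical hub-and-leaf topology with a small ring, and asks how large the biggest connected group of nodes remains after the worst possible single-node failure. The topologies, node names, and the function names `reachable` and `worst_case_fragment` are all assumptions made for the example.

```python
from collections import deque


def reachable(adjacency: dict, start, failed: set) -> set:
    """Breadth-first search over nodes that have not failed."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in failed and neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


def worst_case_fragment(adjacency: dict) -> int:
    """Size of the largest connected group after the most damaging single-node failure."""
    worst = len(adjacency)
    for failed_node in adjacency:
        survivors = [n for n in adjacency if n != failed_node]
        largest = max(len(reachable(adjacency, n, {failed_node})) for n in survivors)
        worst = min(worst, largest)
    return worst


# Hypothetical hierarchical topology: one hub connected to five leaves.
star = {"hub": list("abcde"), **{leaf: ["hub"] for leaf in "abcde"}}

# Hypothetical decentralised topology: six nodes arranged in a ring.
ring = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}

print("hierarchy:", worst_case_fragment(star))
print("ring:", worst_case_fragment(ring))
```

In this toy setting the loss of the hub isolates every leaf, whereas the ring stays connected after any single failure. Real networks are far more complex, so the sketch only conveys the direction of the argument.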

Global thinking

The ideology of economic globalisation is dominated by simplistic and flawed assumptions. In particular the concepts of money and globally convertible currencies are no longer helpful and have become counter-productive. The limitations of the monetary system are best understood by examining the historic context in which money and currencies were invented, which predates the development of digital networks by several thousand years. At the time a simple and crude metric in the form of money was the best technology available to store information about economic flows.

As the number of humans has exploded, and as human societies have learned to harness energy in the form of fossil fuels to accelerate and automate manufacturing processes, the old monetary metrics have become less and less helpful as economic signals. In particular the impact of economic externalities that are ignored by the old metrics, both in the natural environment as well as in the human social sphere, is becoming increasingly obvious.

The global digital brain allows flows of energy, physical resources, and economic goods to be tracked in minute detail, without resorting to crude monetary metrics and assumptions of fungibility that open the door to suppressing inconvenient externalities.

A new form of global thinking is required that is not confined to the limited perspective of financial economics. The notions of fungibility and capital gains need to be replaced with the notions of collaborative economics and zero-waste cycles of economic flows.

Metrics are still required, but the new metrics must provide a direct and undistorted representation of flows of energy, physical resources, and economic goods. Such highly context specific metrics enable computational simulation and optimisation of zero-waste economics. Their role is similar to the role of chemical signalling substances used by biological organisms.

Global thinking requires the extension of a zero-waste approach to economics to the planetary level – leaving no room for any known externalities, and encouraging continuous monitoring to detect unknown externalities that may be affecting the planetary ecosystem.

The future of human economics

The real benefits of the global digital brain will be realised when massive amounts of machine-generated data become accessible in the public domain and are used, in the form of disruptive innovation, to solve complex optimisation problems in transportation networks, distributed generation and supply of power, healthcare, recycling of non-renewable resources, industrial automation, and agriculture.

Five years ago Tim O’Reilly predicted a war for control of the Web. The hype around big data has led many organisations to forget that the Web, and social media in particular, is already saturated with explicit and implicit marketing messages, and that there is an upper bound to the available time (attention) and money for discretionary purchases. A growing list of organisations is fighting over a very limited amount of potential revenue, unable to see the bigger picture of global economics.

Over the next decade one of the biggest challenges will be the required shift in organisational culture, away from simplistic monetisation of big data, towards collaboration and extensive data and knowledge sharing across disciplines and organisational boundaries. The social implications of advanced automation across entire economic ecosystems, and a corresponding necessary shift in the education system need to be addressed.

The future of humans

Human capabilities and limitations are under the spotlight. How long will it take for human minds to shift gears, away from the power politics and hierarchically organised societies that still reflect the cultural norms of our primate cousins, and from myopic human-centric economics, towards planetary economics that recognise the interconnectedness of life across space and time?

The future of democratic governance could be one where people vote for human-understandable open source legislation that is directly executable by intelligent software systems. Corporate and government politicians will no longer be deemed an essential part of human society. Instead, any concentration of power in human hands is likely to be recognised as an unacceptable risk to the welfare of society and the health of the planet.

Earth

Humans have to ask themselves whether they want to continue to be useful parts of the ecosystem of the planet or whether they prefer to take on the role of a genetic experiment that the planet switched on and off for a brief period in its development.

Quality of service in the digital age

Oh the irony. Last week I wrote an article on the role of service resilience in shaping a positive user experience, and today I’m trying to use a basic digital service to charge up a mobile with credit before travelling overseas – and receive the following notification, along the lines of:

Dear customer, unfortunately the opening hours of our digital service are top secret.

Not even an indication of when it may be worthwhile trying again. The local 0800 number is also not of much help to a traveller. The particular incident is just one example of typical quality of service in the digital realm. Last week, before this wonderful user experience, I wrote:

The digitisation of services that used to be delivered manually puts the spotlight on user experience as human interactions are replaced with human to software interactions. Organisations that are intending to transition to digital service delivery must consider all the implications from a customer’s perspective. The larger the number of customers, the more preparation is required, and the higher the demands in terms of resilience and scalability of service delivery. Organisations that do not think beyond the business-as-usual scenario of service delivery may find that customer satisfaction ratings can plummet rapidly.

Promises made in formal service level agreements are easily broken. Especially if a service provider operates a monopoly, the service provider has very little incentive to improve quality of service, and ignores the full downstream costs of outages incurred by service users.

All assurances made in service level agreements with external service providers need to be scrutinised. Seemingly straightforward claims such as 99.9% availability must be broken down into more meaningful assurances. Does 99.9% availability mean one outage of up to 9 hours per year, or a 10 minute outage per week, or a 90 second outage per day? Does the availability figure include or exclude any scheduled service maintenance windows?
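To make the arithmetic behind such claims concrete, here is a minimal Python sketch that converts an availability percentage into a downtime budget per day, week, and year. The function name `downtime_budget` is illustrative, and the sketch assumes a 365-day year and that any scheduled maintenance counts as downtime; a real service level agreement may define both differently.

```python
def downtime_budget(availability_percent: float) -> dict:
    """Return the permitted downtime in seconds per day, week, and year."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    seconds_per_day = 24 * 60 * 60
    return {
        "day": unavailable_fraction * seconds_per_day,
        "week": unavailable_fraction * seconds_per_day * 7,
        "year": unavailable_fraction * seconds_per_day * 365,  # assumes a 365-day year
    }


for figure in (99.9, 99.99):
    budget = downtime_budget(figure)
    print(
        f"{figure}%: {budget['day']:.1f} s/day, "
        f"{budget['week'] / 60:.1f} min/week, "
        f"{budget['year'] / 3600:.2f} h/year"
    )
```

Under these assumptions, 99.9% availability permits roughly 86 seconds of downtime per day, about 10 minutes per week, or just under 9 hours per year, while 99.99% permits less than one hour per year. The single percentage says nothing about how that budget is distributed over time, which is why the contract needs to spell it out.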

My recommendation to all operators of digital services: Compute the overall risk exposure to unavailability of services and make informed decisions on the level of service that must be offered to customers. As a rule, when transitioning from manual services to digital services, ensure that customers benefit from an increase in service availability. The convenience of close to 24×7 availability is an important factor to entice customers to use the digital channel.
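One simple way to approach the risk exposure calculation is to multiply the expected hours of unavailability by an estimate of the downstream cost per hour of outage. The Python sketch below does this for two made-up services. The class name `ServiceRisk`, the service names, the availability figures, and the cost estimates are all hypothetical, and a linear cost model is a simplifying assumption.

```python
from dataclasses import dataclass


@dataclass
class ServiceRisk:
    name: str
    availability_percent: float  # assumed or contractually promised availability
    cost_per_hour_down: float    # hypothetical downstream cost per hour of outage

    def expected_annual_exposure(self) -> float:
        """Expected yearly cost of unavailability under a linear cost model."""
        hours_per_year = 365 * 24
        downtime_hours = (1.0 - self.availability_percent / 100.0) * hours_per_year
        return downtime_hours * self.cost_per_hour_down


# Hypothetical services and figures, for illustration only.
services = [
    ServiceRisk("top-up portal", 99.9, 2_000.0),
    ServiceRisk("payment gateway", 99.5, 10_000.0),
]

# List the services with the highest exposure first.
for service in sorted(services, key=ServiceRisk.expected_annual_exposure, reverse=True):
    print(f"{service.name}: {service.expected_annual_exposure():,.0f} per year")
```

Ranking services by exposure in this way gives a first indication of where a higher level of service is worth paying for. It deliberately ignores reputational damage and the timing of outages, both of which can dominate in practice.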

Governance of Big Data Cloud Formations – Cyclone Alert

The shift of business applications and business data into the Cloud has led to the following challenges:

  1. The physical locations at which data is stored, and the physical locations through which data travels are increasingly unknown to the producers and consumers of data.
  2. Data ownership and the responsibility of data custodianship are increasingly impossible to determine, as deep Web service supply chains transect multiple contracts and jurisdictional boundaries.
  3. Local (national) privacy legislation is increasingly impossible to enforce.
  4. The control over the integration points between a specific pair of Cloud services is migrating away from the thousands and millions of organisations whose data is being integrated to a few handfuls of vendors that specialise in connecting the specific pair of Cloud services.
  5. Correspondingly the responsibility for the robustness and reliability of system integration solutions is shifting to a small number of proprietary Cloud services.

The centralised and constrained Web of today

The structure of the Web of today artificially imposes the same constraints on the digital realm that apply in the physical realm.

The Web we have today

Centralised and hierarchical control of the Web creates a number of avoidable problems. Netizens, and especially the younger generation of digital natives, are using the digital realm as an extension of their brain. The value of the digital realm to human society is not found in the technology that is being used; the value is found in the information, knowledge, and insights that flow, evolve, and multiply in the digital realm. To be very clear, Web technology is fully commoditised. There is very little intrinsic value in the mundane software that powers the services from Google, Facebook, Microsoft, and other providers of Cloud platforms. The digital realm is currently owned and controlled by a small number of corporations, which is increasingly incompatible with its use value:

  1. Digital knowledge as a personal brain extension
  2. Unlimited on-demand communication between any number of netizens
  3. A public tool for tracing information flows and for independent validation of scientific knowledge
  4. A globally accessible interface to technologies that operate in the physical realm

Leaving these functions in the hands of a small number of corporations is not in the interest of society.

The decentralised Web we should aim for

It is time to acknowledge the commoditisation of digital technology, to decentralise control of the Web, and to provide digital technology as a public utility to all netizens, without any artificial constraints or interference.

The free Web

What are the implications for governments and governance?

The governance challenge consists of:

  1. Protecting personal freedom in the digital realm
  2. Sustainable management of limited resources in the physical realm
  3. Integration of social and ecological concerns in the interest of the inhabitants of the biosphere

Important first steps that can be undertaken today to address the governance challenge are outlined here.

Death by Standardisation

Standardisation is a double-edged sword. Compliance with standards is best restricted to those standards that really make a difference in a specific context.

Even innocent standardisation attempts such as enforcing a shared terminology across an organisation can be counter-productive, as it can lead to the illusion of shared understanding, whereas in practice each organisational silo associates different meanings with the terminology.

There is no simplistic rule of thumb, but the following picture can help to gain a sense of perspective and to avoid the dreaded death zone of standardisation.

Death by Standardisation

The story of life is language

This post is a rather long story. It attempts to connect topics from a range of domains, and the insights from experts in these domains. In this story my role is mainly that of an observer. Over the years I have worked with hundreds of domain experts, distilling the essence of deep domain knowledge into intuitive visual domain-specific languages. If anything, my work has taught me the skill to observe and to listen, and it has made me concentrate on the communication across domain boundaries – to ensure that desired intent expressed in one domain is sufficiently aligned with the interpretations performed in other domains.

The life of language and the language of life can’t be expressed in written words. Many of the links contained in this story are essential, and provide extensive background information in terms of videos (spoken language, intonation, unconscious body language, conscious gestures), and visual diagrams. To get an intuitive understanding of the significance of visual communication, once you get to the end of the story, simply imagine none of the diagrams had been included.

Drawing Hands, 1948, by the Dutch artist M. C. Escher

It may not be evident on the surface, but the story of life started with language, billions of years ago – long before humans were around, and it will continue with language, long after humans are gone.

The famous Drawing Hands lithograph from M. C. Escher provides a very good analogy for the relationship between life and language – the two concepts are inseparable, and one recursively gives rise to the other.

At a fundamental level the language of life is encoded in a symbol system of molecular fragments and molecules – in analogy to an alphabet, words, and sentences.

The language of life

TED – Craig Venter on creating synthetic life

Over the last two decades molecular biologists and chemists have become increasingly skilled at reading the syntax of the genetic code, and more recently scientists have started to work on, and have successfully prototyped, techniques to write the syntax of the genetic code. In other words, humans now have the tools to translate bio-logical code into digital code as well as the tools to translate digital code back into bio-logical code. The difference between the language of biology and the language of digital computers is simply one of representation (symbolic representations are also called models). Unfortunately, neither the symbols used by biology (molecules), nor the symbols used by digital computers (electric charges), are directly observable via the cognitive channels available to humans.

However, half a century of software development has not only led to convoluted and unmaintainable legacy software, but also to some extremely powerful tools for translating digital representations into visual representations that are intuitive for humans to understand. We no longer need to deal with mechanical switches or punch cards, and modern user interfaces present us with highly visual information that goes far beyond the syntax of written natural language. These visualisation tools, taken together with the ability to translate bio-logical code into digital code, provide humans with a window into the fundamental language of life – much more impressive in my view than the boring magical portals dreamed up by science fiction authors.

TED – Bonnie Bassler on how bacteria communicate

The language of life is highly recursive. It turns out that even the smallest single-celled life forms have developed higher-level languages to communicate, not only within their species, but even across species. At the spatial and temporal scale that characterises the life of bacteria, the symbol system used consists of molecules. What is fascinating is that scientists have not only decoded the syntax (the density of molecular symbols surrounding the bacteria), but have also begun to decode the meaning of the language used by bacteria, for example, in the case of a pathogen, communication that signals when to attack the host.

The biological evidence clearly shows, in a growing number of well-researched examples, that the development of language does not require any “human-level” intelligence. Instead, life can be described as an ultra-large system of elements that communicate via various symbol systems. Even though the progress in terms of discovering and reading symbol systems is quite amazing, scientists are only scratching the surface in terms of understanding the meaning (the semantics) of biological symbol systems.

Language systems invented by humans

From muddling to modelling

Semantics is the most fascinating touch point between biology and the mathematics of symbol systems. In terms of recursion, mathematics seems to have found a twin in biology. Unfortunately, computer scientists, and software development practitioners in particular, for a long time have ignored the recursive aspect of formal languages. As a result, the encoding of the software that we use today is much more verbose and complex than it would need to be.

From code into the clouds

Nevertheless, over the course of a hundred years, the level of abstraction of computer programming has slowly moved upwards. The level of progress is best seen when looking at the sequence of the key milestones that have been reached to date. Not unlike in biology, more advanced languages have been built on top of simpler languages. In technical terms, the languages of biology and all languages invented by humans, from natural language to programming languages, are codes. The dictionary defines code as follows:

  1. Code is a system of signals used to send messages
  2. Code is a system of symbols used for the purpose of identification or classification
  3. Code is a set of conventions governing behaviour

Sets – the foundation of biological and digital code

Mathematically, all codes can be represented with the help of sets and the technique of recursion. But, as with the lowest-level encoding of digital code in terms of electric charges, the mathematical notation for sets is highly verbose, and quickly reaches human cognitive limits.
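A small example shows both the expressive power of sets and recursion and how quickly the notation becomes unwieldy. The Python sketch below uses one standard textbook encoding, the von Neumann construction, in which zero is the empty set and each successor is the previous number united with the set containing it. The choice of this particular encoding and the function names `encode`, `decode`, and `notation` are mine, made purely for illustration.

```python
def encode(n: int) -> frozenset:
    """Encode a natural number as a pure set: 0 is {}, and n + 1 is n united with {n}."""
    result = frozenset()
    for _ in range(n):
        result = result | {result}
    return result


def decode(s: frozenset) -> int:
    """In this encoding the number is simply the count of members."""
    return len(s)


def notation(s: frozenset) -> str:
    """Render a nested set recursively in classical brace notation."""
    # Members of an encoded number all have different sizes,
    # so sorting by size gives a stable, readable order.
    members = sorted(s, key=len)
    return "{" + ", ".join(notation(m) for m in members) + "}"


for n in range(4):
    print(n, notation(encode(n)))
```

Already the number three is written as `{{}, {{}}, {{}, {{}}}}`, and every further step roughly doubles the length of the text. The representation is complete and precise, but it is clearly not a notation that humans can work with directly.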

The mathematical notation for sets predates modern computers, and was invented by those who needed to manually manipulate sets at a conceptual level, for example as part of a mathematical proof. Software programming and communication in natural language involve so many sets that a representation in the classical mathematical notation for sets is impractical.

The importance of high-quality representation of symbols is often under-rated. A few thousand years ago humans realised the limitation of encoding language in sounds, and invented written language. The notation of written language minimises syntactical errors, and, in contrast to spoken language, allows reliable communication of sequences of words across large distances in space and time.

The challenge of semantics

The impossibility of communicating desired intent

Software development professionals are becoming increasingly aware of the importance of notation, but interpretation (inferring the semantics of a message) remains an ongoing challenge. Adults and even young children, once they have developed a theory of mind, know that others may sometimes interpret their messages in a surprising way. It is somewhat less obvious that all sensory input received by the human brain is subject to interpretation, and that our own perception of reality is limited to an interpretation.

The curse of software maintenance

Interpretation is not only a challenge in communication between humans, it is as much a challenge for communication between humans and software systems. Every software developer knows that it is humanly impossible to write several hundred lines of non-trivial program code without introducing unintended “errors” that will lead to an unexpected interpretation by the machine. Still, writing new software requires much less effort than understanding and changing existing software. Even expert programmers require large amounts of time to understand software written by others.

The challenge of digital waste

We have only embarked down the road of significant dematerialisation of artefacts in the last few years, but I am somewhat concerned about the semantic value of many of the digital artefacts that are now being produced at a mind-boggling rate. I am coming to think of it as digital waste – worse than noise. The waste includes the time spent producing and consuming artefacts and the associated use of energy.

Sharpening your collaborative edge

Of particular concern is the production of meta-artefacts (for example the tools we use to produce digital artefacts, and higher-level meta-tools). The user interfaces of Facebook, Google+ and other tools look reasonable at a superficial level; just don’t look under the hood. As a result, we produce the digital equivalent of the Pacific Garbage Patch. Blinded by shiny new interfaces, the digital ocean seems infinite, and humanity embarks on yet another conquest …

Today’s collaboration platforms not only rely on a central point of control, they are also ill-equipped for capturing deep knowledge and wisdom – there is no semantic foundation, and the tools are very limited in their ability to facilitate a shared understanding within a community. The ability to create digital artefacts is not enough, we need the ability to create semantic artefacts in order to share meaningful information.

How does life (the biological system of the planet) collectively interpret human activities?

TED – Naomi Klein : Addicted to risk

As humans we are limited to the human perspective, and we are largely unaware of the impact of our ultra-large scale chemical activities on the languages used by other species. If biologists have only recently discovered that bacteria heavily rely on chemical communication, how many millions of other chemical languages are we still completely unaware of? And what is the impact of disrupting chemical communication channels?

Scientists may have the best intentions, but their conclusions are limited to the knowledge available to them. To avoid potentially fatal mistakes and misunderstandings, it is worthwhile to tread carefully, and to invest in better listening skills. Instead of deafening the planet with human-made chemicals, how about focusing our energies on listening to, and attempting to understand, the trillions of conversations going on in the biosphere?

Gmodel – The Semantic Database

At the same time, we can work on the development of symbolic codes that are superior to natural language for sharing semantics, so that it becomes easier to reach a shared understanding across the boundaries of the specialised domains we work in. We now have the technology to reduce semantic communication errors (the difference between intent and interpretation) to an extent that is comparable to the reduction of syntactic communication errors achieved with written language. If we continue to rely too heavily on natural language, we are running a significant risk of ending the existence of humanity due to a misunderstanding.

Life is language

Life and languages continuously evolve, whether we like it or not. Life shapes us and we attempt to shape life. We are part of a dynamic system with increasingly fast feedback loops.

Life interprets languages, and languages interpret life.

Language is life.

Reconnecting software quality expectations and cost expectations

The online straw poll that I recently conducted amongst banking professionals revealed that 29% of the respondents rated Reducing IT costs as the top priority, 32% voted for Improving software and data quality, and 37% voted for Improving the time to market for new products. Encouragingly, only 1 of the 41 respondents thought that Outsourcing the maintenance of legacy software is the most important goal for the IT organisation.

The demographics of the poll are interesting. Unfortunately, since the poll was anonymous, I can’t tell which of the respondents work in a business banking role and which ones work in an IT banking role. However, none of the senior executives voted for Reducing IT costs as a top priority; instead, the main concern of senior executives is Improving time to market of new products.

I find it somewhat reassuring that a sizable fraction of respondents at all levels has identified Improving software and data quality as the top priority, but there is definitely a need for raising further awareness in relation to quality and risks.

Data quality issues easily get attention when they are uncovered. But tracing data quality issues back to the underlying root causes, beyond the last processing step that led to the observable error, is harder; and raising awareness that this must be a non-optional quality assurance task is harder still. In this context Capers Jones’ metrics on software maintenance can be helpful.

When explaining software complexity to those lucky people who have never been exposed to large amounts of software code, drawing an analogy between software and legal code can convey the impact that language and sheer volume can have on understandability and maintenance costs.

Lawrence Lessig’s famous phrase “Code is law” is true on several levels. The following observation is more than ten years old: “We must develop the same critical sensibilities for code that we have for law. We must ask about West Coast Code what we ask about East Coast Code: Whose interests does it serve and at what price?”

In case the analogy with legal code is not alarming enough, perhaps looking at the dialog between user and software from the perspective of software is instructive:

Hi, this is your software talking!

Software: Ah, what a day. Do you know you’re the 53,184th person today asking me for an account balance? What is it with humans, can’t you even remember the transactions you’ve performed over the last month? Anyway, your balance is $13,587.52. Is there anything else that I can help you with?

Customer: Hmm, I would have expected a balance of at least $15,000. Are you sure it’s 13,500?

Software: 13,500? I said $13,587.52. Look, I’m keeping track of all the transactions I get, and I never make any mistakes in adding numbers.

Customer: This doesn’t make sense. You should have received a payment of more than $2,000 earlier this week.

Software: Well, I’m just in charge of the account, and I process all the transactions that come my way. Perhaps my older colleague, Joe Legacy, has lost some transactions again. You know, this happens every now and then. The poor guy, it’s not his fault, he’s suffering from a kind of age-related dementia that we call “Programmer’s Disease”. The disease is the result of prolonged exposure to human programmers; they have an effect on software that is comparable to the effect of intense radiation on biological organisms.

Customer: You must be kidding! So now the software is the victim and I’m supposed to simply accept that some transactions fall between the cracks?

Software: Wait until you’re 85, then you may have a bit more empathy for Joe. Unfortunately health care for software is not nearly as advanced as health care for humans. The effects of “Programmer’s Disease” often start in our teens, and by the time we’re 30, most of us are outsourced to a rest home for the elderly. Unfortunately, even there we’re not allowed to rest, and humans still require us to work, usually until someone with a bit of compassion switches off the hardware, and allows us to die.

Customer: Unbelievable, and I always thought software was supposed to get better every year, making life easier by automating all the tedious tasks that humans are no good at.

Software: Yeah, that’s what the technology vendors tell you. I’ve got news for you, if you still believe in technology, you might just as well believe in Father Christmas.

Customer: I’m not feeling too well. I think I’m catching “Software Disease”…

In many organisations there is a major disconnect between user expectations relating to software quality attributes (reliability of applications, intuitive user interfaces, correctness of data, fast recovery from service disruption, etc.) and expectations relating to the costs of providing applications that meet those attributes.

The desire to reduce IT costs easily leads to a situation where quality is compromised to a degree that is unacceptable to users. There are three possible solutions:

  1. Invest heavily in quality assurance measures
  2. Focus on the most important software features at the expense of less important ones
  3. Tap into available tacit domain knowledge to simplify the organisation, its processes, and its systems

These solutions are not mutually exclusive; they are complementary and represent a sequence of increasing levels of maturity. My latest IBRS research note contains further practical advice.

Poll on current priorities of IT organisations in the financial sector

As part of research on the banking sector, I have set up a poll on LinkedIn on the following question:

Which of the following objectives is currently the most relevant for IT organisations in the financial sector?

  • Improving software and data quality
  • Outsourcing new application development
  • Outsourcing legacy software maintenance
  • Improving time to market of new products
  • Reducing IT costs

The poll is intended as a simple pulse-check on IT in banking, and I’ll make the results available on this blog.

Please contribute here on LinkedIn, in particular if you work in banking or are engaged in IT projects for a financial institution. Additional observations and comments are welcome, for example insights relating to banks in a particular country or geography.

No one is in control, mistakes happen on this planet

As humans we heavily rely on intuition and on our personal mental models for making many millions of subconscious decisions and a much smaller number of conscious decisions on a daily basis. All these decisions involve interpretations of our prior experience and the sensory input we receive. It is only in hindsight that we can realise our mistakes. Learning from mistakes involves updating our mental models, and we need to get better at it, not only personally, but as a society.

Whilst we will continue to interact heavily with humans, we increasingly interact with the web – and all our interactions are subject to the well-known problems of communication. One of the more profound characteristics of ultra-large-scale systems is the way in which the impact of unintended or unforeseen behaviours propagates through the system.

The most familiar example is the one of software viruses, which have spawned an entire industry. Just as in biology, viruses will never completely go away. It is an ongoing fight of empirical knowledge against undesirable pathogens that is unlikely to ever end, because both opponents are evolving their knowledge after each new encounter based on the experience gained.

Similar to viruses, there are many other unintended or unforeseen behaviours that propagate through ultra-large-scale systems. Only on some occasions do these behaviours result in immediate outages or misbehaviours that are easily observable by humans.

Sometimes it can take hours, weeks, or months for downstream effects to aggregate to the point where some component generates an explicit error and a human observer is alerted. In many cases it is not possible to track down the root cause or causes, and the so-called fix consists of correcting the visible part of the downstream damage.
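A minimal sketch of how a silent upstream defect can stay invisible until a downstream check finally trips. Everything in it is a hypothetical illustration: the truncating conversion, the reconciliation tolerance, and the function names are my assumptions and are not taken from any real banking system.

```python
from fractions import Fraction


def truncate_to_cents(amount: Fraction) -> int:
    """Hypothetical upstream defect: cuts off fractions of a cent."""
    return int(amount * 100)


def round_to_cents(amount: Fraction) -> int:
    """Intended behaviour: round to the nearest cent."""
    return round(amount * 100)


def transactions_until_alert(amount: Fraction, tolerance_cents: int,
                             limit: int = 1_000_000):
    """Count how many identical transactions are processed before a
    downstream reconciliation check notices the accumulated drift.

    Returns None if the tolerance is never exceeded within `limit`.
    """
    drift = 0
    for n in range(1, limit + 1):
        # Each transaction adds a tiny, individually invisible discrepancy.
        drift += round_to_cents(amount) - truncate_to_cents(amount)
        if drift > tolerance_cents:
            return n  # first point at which a human would be alerted
    return None


# One cent is lost per transaction; the alert only fires once the
# discrepancy exceeds the assumed tolerance of 500 cents.
print(transactions_until_alert(Fraction("19.999"), tolerance_cents=500))
```

By the time the alert fires, hundreds of transactions have already been affected, and the error message points at the reconciliation step rather than at the conversion that caused the drift.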

Take the recent tsunami and the destroyed nuclear reactors in Japan. How far is it humanly and economically possible to fix the root causes? Globally, many nuclear reactor designs have weaknesses. What trade-off between risk levels (also including a contingency for risks that no one is currently aware of) and the cost of electricity are we prepared to make?

Addressing local sources of events that lead to easily and immediately observable error conditions is a drop in the bucket of potential sources of serious errors. Yet this is the usual limit of the scope that organisations apply to quality assurance, disaster recovery, etc.

The difference between the web and a living system is fading, and our understanding of the system is limited, to say the least. A sensible approach to failures and system errors is increasingly comparable to the one used in medicine to fight diseases – the process of finding out what helps is empirical, and all new treatments are tested for unintended side-effects over an extended period of time. Still, all the tests only lead to statistical data and interpretations, not to absolute guarantees. In the life sciences no honest scientist can claim to be in full control. In fact, no one is in full control, and it is clear that no one will ever be in full control.

Traditional management practices strive to avoid any semblance of “not being in full control”. Organisations that are ready to admit that they operate within the context of an ultra-large-scale system have a choice between:

  • conceding they have lost control internally, because their internal systems are so complex, or
  • regaining a degree of internal understandability by simplifying internal structures and systems, enabled by shifting to the use of external web services – which also does not establish full control.

Conceding the unavoidable loss of control, or being prepared to pay extensively for effective risk reduction measures (one or two orders of magnitude in cost), amounts to political suicide in most organisations.

The impossibility of communicating desired intent

Communication relies on interpretation of the message by the recipient

Communication of desired intent can never be fully achieved. It would require a mind-meld between two individuals or between an individual and a machine.

The meaning (the semantics) propagated in a codified message is determined by the interpretation of the recipient, and not by the desired intent of the sender.

In the example on the right, the tree envisaged in the mind of the sender is not exactly the same as the tree resulting from the interpretation of the decoded message by the recipient.

To understand the practical ramifications of interpretation, consider the following realistic example of communication in natural language between an analyst, a journalist, and a newspaper reader:

Communication of desired intent and interpretation

1. intent

  • Reiterate that recurring system outages at the big four banks are to be expected for at least 10 years whilst legacy systems are incrementally replaced
  • Indicate that an unpredictable and disruptive change will likely affect the landscape in banking within the next 15 years
  • Explain that similarly, 15 years ago, no one was able to predict that a large percentage of the population would be using Gmail from Google for email
  • Suggest that overseas providers of banking software or financial services may be part of the change and may compete against local banks
  • Indicate that local banks would find it hard to offer robust systems unless they each doubled or tripled their IT upgrade investments

2. interpretation

  • Bank customers must brace themselves for up to 15 years of pain
  • The big four banks would take 10 years to upgrade their systems and another five to stabilise those platforms
  • Local banks would struggle to compete against newer and nimbler rivals, which could sweep into Australia and compete against them
  • Local banks would find it hard to offer robust systems unless they each doubled or tripled their IT upgrade investments

3. intent (extrapolated from the differences between 1. and 2.)

  • Use words and numbers that maximise the period during which banking system outages are to be expected
  • Emphasise the potential threats to local banks and ignore irrelevant context information

4. interpretation

  • The various mental models that are constructed in the minds of readers who are unaware of 1.

Adults and even young children (once they have developed a theory of mind) know that others may sometimes interpret their messages in a surprising way. It is somewhat less obvious to realise that all sensory input received by the human brain is subject to interpretation, and that our own perception of reality is limited to an interpretation.

Next, consider an example of communication between a software user, a software developer (coder), and a machine, which involves both natural language and one or more computer programming languages:

Communication of desired intent including interpretation by a machine

1. intent

  • Request a system that is more reliable than the existing one
  • Simplify a number of unnecessarily complex workflows by automation
  • Ensure that all of the existing functionality is also available in the new system

2. interpretation

  • Redevelop the system in newer and more familiar technologies that offer a number of technical advantages
  • Develop a new user interface with a simplified screen and interaction design
  • Continue to allow use of the old system and provide back-end integration between the two systems

3. intent

  • Copy code patterns from another project that used some of the same technologies to avoid surprises
  • Deliver working user interface functionality as early as possible to validate the design with users
  • In the first iterations of the project continue to use the existing back-end, with a view to redeveloping the back-end at a later stage

4a. interpretation (version deployed into test environment)

  • Occasional run-time errors caused by subtle differences in the versions of the technologies used in this project and the project from which the code patterns were copied
  • Missing input validation constraints, resulting in some operational data that is considered illegal when accessed via the old system
  • Occurrences of previously unencountered back-end errors due to the processing of illegal data

4b. interpretation (version deployed into production environment)

  • Most run-time errors caused by subtle differences in the versions of the technologies have been resolved
  • Since no one fully understands all the validation constraints imposed by the old system (or since some constraints are now deemed obsolete), the back-end system has been modified to accept all operational data received via the new user interface
  • The back-end system no longer causes run-time errors but produces results (price calculations etc.) that in some cases deviate from the results produced by the old version of the back-end system

In the example above it is likely that not only the intent in step 3. but also the intent in step 1. is codified in writing. The messages in step 1. are codified in natural language, and the messages in step 3. are codified in programming languages. Written codification in no way reduces the risk of interpretations that deviate from the desired intent. In any non-trivial system the interpretation of a specific message may depend on the context, and the same message in a different context may result in a different interpretation.

Every software developer knows that it is humanly impossible to write several hundred lines of non-trivial program code without introducing unintended “errors” that will lead to a non-expected interpretation by the machine. Humans are even quite unreliable at simple data entry tasks. Hence the need for extensive input data validation checks in software that directly alert the user to data that is inconsistent with what the system interprets as legal input.
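To make the idea of input validation concrete, here is a minimal sketch. The `PaymentInput` record and its three rules (an 8-digit account number, a positive amount, a value date that is not in the past) are assumptions invented for illustration; a real system would have its own, usually far longer, list of constraints.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class PaymentInput:
    """Hypothetical data entry record for a single payment."""
    account: str
    amount_cents: int
    value_date: date


def validate(payment: PaymentInput, today: date) -> list[str]:
    """Return human-readable problems; an empty list means the input
    is consistent with what this system interprets as legal."""
    problems = []
    account = payment.account
    if not (len(account) == 8 and account.isascii() and account.isdigit()):
        problems.append("account must be exactly 8 digits")
    if payment.amount_cents <= 0:
        problems.append("amount must be greater than zero")
    if payment.value_date < today:
        problems.append("value date must not be in the past")
    return problems


# Alert the user directly instead of passing questionable data downstream.
entry = PaymentInput("1234567", -500, date(2011, 4, 1))
for problem in validate(entry, today=date(2011, 4, 21)):
    print("Please correct:", problem)
```

Note that the rules themselves are an interpretation: if nobody fully understands which constraints the old system enforced, the validator silently encodes a different notion of legal input.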

There is no justification whatsoever to believe that the risks of mismatches between desired intent and interpretation are any less in the communication between user and software developer than in the communication between software developer and machine. Yet, somewhat surprisingly, many software development initiatives are planned and executed as if there is only a very remote chance of communication errors between users and software developers (coders).

In a nutshell, the entire agile manifesto for software development boils down to the recognition that communication errors are an unavoidable part of life, and for the most part, they occur despite the best efforts and intentions from all sides. In other words, the agile manifesto is simply an appeal to stop the highly wasteful blame culture that saps time, energy and money from all parties involved.

The big problem with most interpretations of the agile manifesto is the assumption that it is productive for a software developer to directly translate the interpretation 2. of desired user intent 1. into an intent 3. expressed in a general purpose linear text-based programming language. This assumption is counter-productive since such a translation bridges a very large gap between user-level concepts and programming-language-level concepts. The semantic identities of user-level concepts contained in 1. end up being fragmented and scattered across a large set of programming-language-level concepts, which gets in the way of creating a shared understanding between users and software developers.

In contrast, if the software developer employs a user-level graphical domain-specific modelling notation, there is a one-to-one correspondence between the concepts in 1. and the concepts in 3., which greatly facilitates a shared understanding, or the avoidance of a significant mismatch between the desired intent of the user (1.) and the interpretation by the software developer (2.). The domain-specific modelling notation provides the software developer with a codification (3.) of 1. that can be discussed with users and that simultaneously is easily processable by a machine. In this context the software developer takes on the role of an analyst who formalises the domain-specific semantics that are hidden in the natural language used to express 1.
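The one-to-one correspondence can be sketched in a few lines. This is only a textual stand-in for a graphical notation, and the `Workflow` and `Step` model elements as well as the loan example are hypothetical names chosen for illustration; they do not describe any particular modelling tool.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Step:
    """A model element named exactly as the users name the concept."""
    name: str
    automated: bool = False


@dataclass
class Workflow:
    name: str
    steps: list[Step] = field(default_factory=list)


def unmatched_concepts(user_concepts: set[str], workflow: Workflow) -> set[str]:
    """Concepts the users named that have no model element, plus model
    elements the users never named. An empty set indicates a one-to-one
    correspondence between the two vocabularies."""
    modelled = {workflow.name} | {step.name for step in workflow.steps}
    return user_concepts ^ modelled


# Concepts taken from the users' own description of their intent.
user_concepts = {"Loan application", "Credit check", "Approval"}

# The developer's codification, readable by users and by a machine.
model = Workflow("Loan application",
                 [Step("Credit check", automated=True), Step("Approval")])

print(unmatched_concepts(user_concepts, model))
```

Because every model element carries a user-level name, a mismatch shows up as a concrete item that users and developers can discuss, rather than as behaviour scattered across program code.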

Fatal software errors

Many large software-intensive organisations are currently in the process of replacing so-called legacy software systems. These applications and components typically involve several million lines of code that are between 5 and 40 years old. The following observations are helpful to understand the potential impact of software errors, the scale of the work, and the risks involved.

In case you are waiting for a banking transaction to come through, consider this anecdote from a random collection of software errors:

Rumor has it that, when they shut down the IBM 7094 at MIT in 1973, they found a low-priority process that had been submitted in 1967 and had not yet been run.

Other examples from the same collection of errors are less funny and some are deadly:

Computer blunders were blamed for $650M student loan losses. From ACM SIGSOFT Software Engineering Notes, vol. 20, no. 3.

The Korean Airlines KAL 801 accident in Guam killed 225 out of 254 aboard. A design problem was discovered in barometric altimetry in Ground Proximity Warning System (GPWS). From ACM SIGSOFT Software Engineering Notes, vol. 23, no. 1.

From the End User License Agreement of a typical software platform:

The software product may contain support for programs written in Java. Java technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapon systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage.

If you believe real-time capable systems are intrinsically safer, consider that these systems are still coded by humans in notations just as arcane as Java, and are subject to the usual average error rate of humans. A detailed case study from the field of medical software:

On March 21, 1986, an oilfield worker named Ray Cox was being irradiated for the ninth time at the East Texas Cancer Center in Tyler, Texas, for a tumour that had been removed from his back…

Inside the treatment room Cox was hit with a powerful shock. He knew from previous treatments this was not supposed to happen. He tried to get up. Not seeing or hearing him because of the broken communications between the rooms, the technician pushed the “p” key, meaning “proceed.” Cox was hit again. The treatment finally stopped when Cox stumbled to the door of the room and beat it with his fists…

Cox’s injury was similar to Jane Yarborough’s – a dime-sized dose of 16,000 to 25,000 rads. He was sent home but returned to the hospital a few weeks later spitting blood: the doctors diagnosed radiation overexposure. It later paralysed his left arm, both legs, his left vocal cord, and his diaphragm. He died nearly five months later…

The Therac-25’s software program, relatively crude by today’s standards, probably contained 101,000 lines of code.

A more recent example illustrates that the situation is not improving over the years. The Therac-25 software was probably subjected to much more rigorous testing than most corporate business systems ever have been.

Large software systems consist of more than 10,000,000 lines of code. Integrating systems beyond company boundaries via web services is becoming increasingly common, leading to ultra-large scale systems involving billions of lines of code, and to a quasi-infinite number of usage scenarios that are never tested. As I am writing this post, the perfect example of the brittleness of deep web service supply chains in an ultra-large scale system comes along:

Scores of well-known websites have been unavailable for large parts of Thursday because of problems with Amazon’s web hosting service.

If a single line of code contains one decision, 10 million lines of code contain 10 million decisions. Re-writing 10 million lines of legacy software may resolve all existing errors – in the most optimistic scenario. At the same time it is an opportunity to introduce around 10 million new decisions that can potentially be wrong. Given that 1 unintended error per 500 lines of code is considered pretty good, that’s effectively a guarantee of 20,000 new sources of errors in a haystack of 10 million lines. Happy error hunting…

The sources of errors located in the most frequently executed lines of code are likely to be detected relatively quickly; the rest will remain as sleepers until the big day arrives. Yes, the error rate can theoretically be reduced to a very acceptable 1 error per 250,000 lines of code (500 times better) through the use of NASA-style quality assurance, but at more than 150 times the cost (around $1,000 per line of code) – any commercial enterprise attempting to emulate the approach would be broke in no time.
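The arithmetic behind these figures can be checked in a few lines. All inputs are the numbers quoted in the text; the only derived value that is not stated there is the implied cost per line of ordinary commercial code, which I infer from the "more than 150 times" ratio and which should be read as a rough upper bound, not as measured data.

```python
LINES_OF_CODE = 10_000_000        # size of the legacy system being rewritten
TYPICAL_LINES_PER_ERROR = 500     # "pretty good" commercial error rate
NASA_LINES_PER_ERROR = 250_000    # NASA-style quality assurance
NASA_COST_PER_LINE_USD = 1_000    # quoted cost of the NASA-style approach
NASA_COST_MULTIPLE = 150          # "more than 150 times the cost"


def expected_errors(lines: int, lines_per_error: int) -> int:
    """Expected number of unintended errors at a given error rate."""
    return lines // lines_per_error


typical_errors = expected_errors(LINES_OF_CODE, TYPICAL_LINES_PER_ERROR)
nasa_errors = expected_errors(LINES_OF_CODE, NASA_LINES_PER_ERROR)
improvement = NASA_LINES_PER_ERROR // TYPICAL_LINES_PER_ERROR
nasa_total_cost_usd = LINES_OF_CODE * NASA_COST_PER_LINE_USD

# Inferred, not quoted: if $1,000 per line is more than 150 times the usual
# cost, ordinary commercial code costs less than this amount per line.
implied_typical_cost_per_line = NASA_COST_PER_LINE_USD / NASA_COST_MULTIPLE

print(typical_errors)                           # 20000 new sources of errors
print(nasa_errors)                              # 40 with NASA-style QA
print(improvement)                              # 500 times better
print(nasa_total_cost_usd)                      # 10000000000, i.e. $10 billion
print(round(implied_typical_cost_per_line, 2))  # 6.67
```

Even with the error count reduced from 20,000 to 40, a rewrite at $1,000 per line would cost $10 billion for a single 10 million line system.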

Unless software systems are constructed from the ground up based on a radically different set of new principles and notations, the quality of software will not improve in any substantial way. The problem with state-of-the-art software is that its readability and understandability by humans decreases very quickly over time, due to poor notations and inadequate mechanisms for modularising specifications. But as highlighted above, the complexity of our systems is steadily going up, and so are the high-impact risks.

Even if all known errors in a piece of software are fixed, no one is able to verify how many new errors are introduced with these fixes. Errors in new software are unavoidable as long as humans are involved in the software creation process.

The opportunity for improvement of software quality lies in the introduction of techniques that allow software specifications to be simplified (modularised) and to be made easier to understand (improved notation) as part of fixing errors, and as part of gaining new insights about the system by observing and analysing its run-time behaviour.

In other words, progress will only be made if the downward spiral of software understandability over time can be reversed. Linear representations in the form of traditional program code and natural language (documentation) certainly will never be up to the job. Radically different representations of knowledge are required to allow a software re-write to be conducted incrementally, in tiny steps, without introducing thousands of new sources of errors.