The impossibility of communicating desired intent

Communication relies on interpretation of the message by the recipient

Communication of desired intent can never be fully achieved. It would require a mind-meld between two individuals or between an individual and a machine.

The meaning (the semantics) propagated in a codified message is determined by the interpretation of the recipient, and not by the desired intent of the sender.

In the example on the right, the tree envisaged in the mind of the sender is not exactly the same as the tree resulting from the interpretation of the decoded message by the recipient.

To understand the practical ramifications of interpretation, consider the following realistic example of communication in natural language between an analyst, a journalist, and a newspaper reader:

Communication of desired intent and interpretation

1. intent

  • Reiterate that recurring system outages at the big four banks are to be expected for at least 10 years whilst legacy systems are incrementally replaced
  • Indicate that an unpredictable and disruptive change will likely affect the landscape in banking within the next 15 years
  • Explain that similarly, 15 years ago, no one was able to predict that a large percentage of the population would be using Gmail from Google for email
  • Suggest that overseas providers of banking software or financial services may be part of the change and may compete against local banks
  • Indicate that local banks would find it hard to offer robust systems unless they each doubled or tripled their IT upgrade investments

2. interpretation

  • Bank customers must brace themselves for up to 15 years of pain
  • The big four banks would take 10 years to upgrade their systems and another five to stabilise those platforms
  • Local banks would struggle to compete against newer and nimbler rivals, which could sweep into Australia and compete against them
  • Local banks would find it hard to offer robust systems unless they each doubled or tripled their IT upgrade investments

3. intent (extrapolated from the differences between 1. and 2.)

  • Use words and numbers that maximise the period during which banking system outages are to be expected
  • Emphasise the potential threats to local banks and ignore irrelevant context information

4. interpretation

  • The various mental models that are constructed in the minds of readers who are unaware of 1.

Adults and even young children (once they have developed a theory of mind) know that others may sometimes interpret their messages in a surprising way. It is somewhat less obvious that all sensory input received by the human brain is subject to interpretation, and that our own perception of reality is limited to an interpretation.

Next, consider an example of communication between a software user, a software developer (coder), and a machine, which involves both natural language and one or more computer programming languages:

Communication of desired intent including interpretation by a machine

1. intent

  • Request a system that is more reliable than the existing one
  • Simplify a number of unnecessarily complex workflows by automation
  • Ensure that all of the existing functionality is also available in the new system

2. interpretation

  • Redevelop the system in newer and more familiar technologies that offer a number of technical advantages
  • Develop a new user interface with a simplified screen and interaction design
  • Continue to allow use of the old system and provide back-end integration between the two systems

3. intent

  • Copy code patterns from another project that used some of the same technologies to avoid surprises
  • Deliver working user interface functionality as early as possible to validate the design with users
  • In the first iterations of the project continue to use the existing back-end, with a view to redeveloping the back-end at a later stage

4a. interpretation (version deployed into test environment)

  • Occasional run-time errors caused by subtle differences in the versions of the technologies used in this project and the project from which the code patterns were copied
  • Missing input validation constraints, resulting in some operational data that is considered illegal when accessed via the old system
  • Occurrences of previously unencountered back-end errors due to the processing of illegal data

4b. interpretation (version deployed into production environment)

  • Most run-time errors caused by subtle differences in the versions of the technologies have been resolved
  • Since no one fully understands all the validation constraints imposed by the old system (or since some constraints are now deemed obsolete), the back-end system has been modified to accept all operational data received via the new user interface
  • The back-end system no longer causes run-time errors but produces results (price calculations etc.) that in some cases deviate from the results produced by the old version of the back-end system

In the example above it is likely that not only the intent in step 3. but also the intent in step 1. is codified in writing. The messages in step 1. are codified in natural language, and the messages in step 3. are codified in programming languages. Written codification in no way reduces the risk of interpretations that deviate from the desired intent. In any non-trivial system the interpretation of a specific message may depend on the context, and the same message in a different context may result in a different interpretation.
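The dependence of interpretation on context can be observed directly in a programming language. The following minimal Python sketch is only an illustration: the function name combine and the sample values are made up for this purpose. The written message (left + right) never changes, yet the machine interprets it as arithmetic in one context and as text concatenation in another.

```python
def combine(left, right):
    # The codified message is always the same expression: left + right.
    return left + right


# Context 1: the operands are numbers, so "+" is interpreted as addition.
as_numbers = combine(10, 5)

# Context 2: the operands are text, so "+" is interpreted as concatenation.
as_text = combine("10", "5")

print(as_numbers)  # 15
print(as_text)     # 105
```

Neither result is an error from the perspective of the machine. Whether it matches the desired intent depends entirely on what the sender had in mind.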

Every software developer knows that it is humanly impossible to write several hundred lines of non-trivial program code without introducing unintended “errors” that will lead to a non-expected interpretation by the machine. Humans are even quite unreliable at simple data entry tasks. Hence the need for extensive input data validation checks in software that directly alert the user to data that is inconsistent with what the system interprets as legal input.
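As a rough illustration of such validation checks, here is a minimal Python sketch. The order record, its field names (quantity, unit_price) and the rules are hypothetical and not taken from any particular system. The point is only that the software reports every mismatch between the entered data and what it interprets as legal input, instead of silently accepting it.

```python
from dataclasses import dataclass


@dataclass
class ValidationIssue:
    field: str
    message: str


def validate_order(record: dict) -> list:
    """Return every issue found in a hypothetical order record.

    An empty list means the record matches what this sketch treats as legal input.
    """
    issues = []

    quantity = record.get("quantity")
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        issues.append(ValidationIssue("quantity", "must be a positive whole number"))

    unit_price = record.get("unit_price")
    if (
        not isinstance(unit_price, (int, float))
        or isinstance(unit_price, bool)
        or unit_price < 0
    ):
        issues.append(ValidationIssue("unit_price", "must be a non-negative number"))

    return issues


# A typical data entry slip: the quantity was typed as text and the price is negative.
for issue in validate_order({"quantity": "3", "unit_price": -1}):
    print(f"{issue.field}: {issue.message}")
```

Note that the rules themselves are yet another codified interpretation of what the user meant by a legal order, and they can be wrong in exactly the same way as the rest of the code.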

There is no justification whatsoever for believing that the risks of mismatches between desired intent and interpretation are any less in the communication between user and software developer than in the communication between software developer and machine. Yet, somewhat surprisingly, many software development initiatives are planned and executed as if there is only a very remote chance of communication errors between users and software developers (coders).

In a nutshell, the entire agile manifesto for software development boils down to the recognition that communication errors are an unavoidable part of life, and for the most part, they occur despite the best efforts and intentions from all sides. In other words, the agile manifesto is simply an appeal to stop the highly wasteful blame culture that saps time, energy and money from all parties involved.

The big problem with most interpretations of the agile manifesto is the assumption that it is productive for a software developer to directly translate the interpretation 2. of desired user intent 1. into an intent 3. expressed in a general purpose linear text-based programming language. This assumption is counter-productive since such a translation bridges a very large gap between user-level concepts and programming-language-level concepts. The semantic identities of user-level concepts contained in 1. end up being fragmented and scattered across a large set of programming-language-level concepts, which gets in the way of creating a shared understanding between users and software developers.

In contrast, if the software developer employs a user-level graphical domain-specific modelling notation, there is a one-to-one correspondence between the concepts in 1. and the concepts in 3., which greatly facilitates a shared understanding, or at least helps to avoid a significant mismatch between the desired intent of the user 1. and the interpretation by the software developer 2. The domain-specific modelling notation provides the software developer with a codification 3. of 1. that can be discussed with users and that simultaneously is easily processable by a machine. In this context the software developer takes on the role of an analyst who formalises the domain-specific semantics that are hidden in the natural language used to express 1.

Fatal software errors

Many large software-intensive organisations are currently in the process of replacing so-called legacy software systems. These applications and components typically involve several million lines of code that are between 5 and 40 years old. The following observations are helpful to understand the potential impact of software errors, the scale of the work, and the risks involved.

In case you are waiting for a banking transaction to come through, consider this anecdote from a random collection of software errors:

Rumor has it that, when they shut down the IBM 7094 at MIT in 1973, they found a low-priority process that had been submitted in 1967 and had not yet been run.

Other examples from the same collection of errors are less funny and some are deadly:

Computer blunders were blamed for $650M student loan losses. From ACM SIGSOFT Software Engineering Notes, vol. 20, no. 3.

The Korean Airlines KAL 801 accident in Guam killed 225 out of 254 aboard. A design problem was discovered in barometric altimetry in Ground Proximity Warning System (GPWS). From ACM SIGSOFT Software Engineering Notes, vol. 23, no. 1.

From the End User License Agreement of a typical software platform:

The software product may contain support for programs written in Java. Java technology is not fault tolerant and is not designed, manufactured, or intended for use or resale as on-line control equipment in hazardous environments requiring fail-safe performance, such as in the operation of nuclear facilities, aircraft navigation or communication systems, air traffic control, direct life support machines, or weapon systems, in which the failure of Java technology could lead directly to death, personal injury, or severe physical or environmental damage.

If you believe real-time capable systems are intrinsically safer, consider that these systems are still coded by humans in notations just as arcane as Java, and are subject to the usual average error rate of humans. A detailed case study from the field of medical software:

On March 21, 1986, an oilfield worker named Ray Cox was being irradiated for the ninth time at the East Texas Cancer Center in Tyler, Texas, for a tumour that had been removed from his back…

Inside the treatment room Cox was hit with a powerful shock. He knew from previous treatments this was not supposed to happen. He tried to get up. Not seeing or hearing him because of the broken communications between the rooms, the technician pushed the “p” key, meaning “proceed.” Cox was hit again. The treatment finally stopped when Cox stumbled to the door of the room and beat it with his fists…

Cox’s injury was similar to Katie Yarborough’s: a dime-sized dose of 16,500 to 25,000 rads. He was sent home but returned to the hospital a few weeks later spitting blood: the doctors diagnosed radiation overexposure. It later paralysed his left arm, both legs, his left vocal cord, and his diaphragm. He died nearly five months later…

The Therac-25’s software program, relatively crude by today’s standards, probably contained 101,000 lines of code.

A more recent example illustrates that the situation is not improving over the years. The Therac-25 software was probably subjected to much more rigorous testing than most corporate business systems ever have been.

Large software systems consist of > 10,000,000 lines of code. Integrating systems beyond company boundaries via web services is becoming increasingly common, leading to ultra-large scale systems involving billions of lines of code, and to a quasi-infinite number of usage scenarios that are never tested. As I am writing this post, the perfect example of the brittleness of deep web service supply chains in an ultra-large scale system comes along:

Scores of well-known websites have been unavailable for large parts of Thursday because of problems with Amazon’s web hosting service.

If a single line of code contains one decision, 10 million lines of code contain 10 million decisions. Re-writing 10 million lines of legacy software may resolve all existing errors – in the most optimistic scenario. At the same time it is an opportunity to introduce around 10 million new decisions that can potentially be wrong. Given that 1 unintended error per 500 lines of code is considered pretty good, that’s effectively a guarantee of 20,000 new sources of errors in a haystack of 10 million lines. Happy error hunting…

The sources of errors located in the most frequently executed lines of code are likely to be detected relatively quickly; the rest will remain as sleepers until the big day arrives. Yes, the error rate can theoretically be reduced to a very acceptable 1 error per 250,000 lines of code (500 times better) through the use of NASA-style quality assurance, but at more than 150 times the cost (around $1,000 per line of code). Any commercial enterprise attempting to emulate the approach would be broke in no time.
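The arithmetic behind these figures is easy to check. The short Python sketch below only recomputes numbers from the rates quoted in the text (1 error per 500 lines, 1 error per 250,000 lines, around $1,000 per line). The function and variable names are made up for this illustration, and the rates themselves are rules of thumb, not measurements.

```python
LINES_OF_CODE = 10_000_000


def expected_errors(lines: int, lines_per_error: int) -> int:
    """Expected number of unintended errors at a given error rate."""
    return lines // lines_per_error


# "Pretty good" commercial quality: 1 error per 500 lines.
typical_errors = expected_errors(LINES_OF_CODE, 500)

# NASA-style quality assurance: 1 error per 250,000 lines.
nasa_style_errors = expected_errors(LINES_OF_CODE, 250_000)

# How much better the second rate is than the first.
improvement_factor = 250_000 // 500

# Cost of a NASA-style re-write at around $1,000 per line of code.
nasa_style_cost_usd = LINES_OF_CODE * 1_000

print(typical_errors)       # 20000
print(nasa_style_errors)    # 40
print(improvement_factor)   # 500
print(nasa_style_cost_usd)  # 10000000000
```

In other words, even the NASA-style approach would leave around 40 sleepers in a re-write of 10 million lines, at a price tag in the order of 10 billion dollars.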

Unless software systems are constructed from the ground up based on a radically different set of new principles and notations, the quality of software will not improve in any substantial way. The problem with state-of-the-art software is that its readability and understandability by humans decrease very quickly over time, due to poor notations and inadequate mechanisms for modularising specifications. But as highlighted above, the complexity of our systems is steadily going up, and so are the high-impact risks.

Even if all known errors in a piece of software are fixed, no one is able to verify how many new errors are introduced with these fixes. Errors in new software are unavoidable as long as humans are involved in the software creation process.

The opportunity for improvement of software quality lies in the introduction of techniques that allow software specifications to be simplified (modularised) and to be made easier to understand (improved notation) as part of fixing errors, and as part of gaining new insights about the system by observing and analysing its run-time behaviour.

In other words, progress will only be made if the downward spiral of software understandability over time can be reversed. Linear representations in the form of traditional program code and natural language (documentation) certainly will never be up to the job. Radically different representations of knowledge are required to allow a software re-write to be conducted incrementally, in tiny steps, without introducing thousands of new sources of errors.

Participate: Tweeting in the format URL relationship URL

Twitter has emerged as a very powerful medium for propagating ideas and thoughts. Possibly Twitter is the ideal data input tool for harnessing the collective insights of the humans and systems that are connected to the web – effectively a significant proportion of all humans and virtually every non-trivial system on the planet.

By simply adopting a convention of tweeting important insights in the format <some URL> <some relationship> <some other URL>, users can incrementally, one step at a time, create a personal model of the web. These personal models can grow arbitrarily large, and Twitter is certainly not the appropriate tool for visualising, modularising and analysing such models. But arguably, Twitter is the most elegant and simplest possible front end for capturing atoms of knowledge.

Note that URLs used on Twitter typically point to a substantial piece of information, and not a simple word or sentence. Often a URL references an entire article, a web site, or a non-trivial web-based system. These articles, web sites or systems can be considered semantic identities in that specific users (or groups of users) associate them with specific semantics (or “meaning”). Hence tweets in the <some URL> <some relationship> <some other URL> format suggested above represent connections between two semantic identities. A set of such tweets amounts to the construction of a mathematical graph, where the URLs are the vertices, and the relationships are the edges.
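To make the graph construction concrete, here is a minimal Python sketch. It assumes that a tweet in this format starts and ends with an http(s) URL and that everything in between is the relationship. The function names and the example.org URLs are invented for this illustration, and the sketch does not use any Twitter API.

```python
from typing import NamedTuple, Optional


class Edge(NamedTuple):
    source: str
    relationship: str
    target: str


def is_url(token: str) -> bool:
    # Simplifying assumption: only http(s) URLs count as vertices.
    return token.startswith(("http://", "https://"))


def parse_tweet(text: str) -> Optional[Edge]:
    """Parse '<URL> <relationship> <URL>'; return None for any other tweet."""
    tokens = text.split()
    if len(tokens) < 3:
        return None
    source, target = tokens[0], tokens[-1]
    if not (is_url(source) and is_url(target)):
        return None
    return Edge(source, " ".join(tokens[1:-1]), target)


def build_graph(tweets) -> dict:
    """Map each vertex (URL) to its outgoing (relationship, target URL) edges."""
    graph = {}
    for text in tweets:
        edge = parse_tweet(text)
        if edge is None:
            continue  # not in the URL relationship URL format
        graph.setdefault(edge.source, []).append((edge.relationship, edge.target))
        graph.setdefault(edge.target, [])
    return graph


tweets = [
    "https://example.org/a explains https://example.org/b",
    "just a normal tweet",
    "https://example.org/b is part of https://example.org/c",
]
graph = build_graph(tweets)
for url, edges in graph.items():
    print(url, edges)
```

Tweets that do not follow the convention are simply ignored, so the convention can coexist with ordinary use of Twitter.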

If we add functions for transforming graphs into the mix, and consider that we are connecting representations of semantic identities, we end up in the mathematical discipline of model theory. Considering further that Twitter models are user specific, and that the semantics that users associate with a URL are not necessarily identical but rather complementary, we can further exploit results from the mathematics of denotational semantics. For the average user there is no need to worry about the formal mathematics, and it is sufficient to understand that the <some URL> <some relationship> <some other URL> format (I will use #URLrelURL on Twitter when referencing this format) allows the articulation of insights that correspond to the atoms of knowledge that humans store in their brains.

With appropriate software technology it is extremely easy to translate sets of #URLrelURL tweets into a proper mathematical graph, and into a user specific semantic model. These models can then be analysed, modularised, visualised, compared, and transformed with the help of machine & human intelligence. Amongst other things, retweets can be taken as an indication of some degree of shared understanding in relation to a particular insight. Further qualification of the semantic significance of specific tweets can be calculated from the connections between Twitter users, and from analysis of the information/functionality offered by the two connected URLs.

The most interesting results are unlikely to be the individual mental models that are recorded via #URLrelURL tweets, but will rather be the overlay of all the mental models, leading to a complex graph with weighted edges, which can be analysed from various perspectives. This graph represents a much better organisation of semantic knowledge than the organisation of information delivered by systems like Google search.
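One simple way to picture such an overlay is sketched below in Python. It is based on an assumption of mine rather than on a defined algorithm: the weight of an edge is the number of distinct users whose personal model contains it. The user names and example.org URLs are placeholders. Other weightings, for example based on retweets, would be equally plausible.

```python
from collections import Counter


def overlay(personal_models: dict) -> Counter:
    """Combine personal models into one graph with weighted edges.

    personal_models maps a user name to an iterable of
    (source URL, relationship, target URL) tuples.
    """
    weights = Counter()
    for edges in personal_models.values():
        # A user contributes at most 1 to the weight of any given edge.
        weights.update(set(edges))
    return weights


A = "https://example.org/a"
B = "https://example.org/b"
C = "https://example.org/c"

models = {
    "alice": [(A, "explains", B), (B, "is part of", C)],
    "bob": [(A, "explains", B), (A, "explains", B)],
}
combined = overlay(models)

# Strongest shared insights first.
for edge, weight in combined.most_common():
    print(weight, edge)
```

Here the insight that A explains B ends up with weight 2 because it appears in both personal models, while the other edge keeps weight 1.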

Instead of processing semantic models, Google search must process entire web sites with arbitrary syntactic content, with no indication of which pairs of URLs constitute insights useful to humans. Google can only indirectly infer (and make assumptions about) the semantics that humans associate with URLs by applying statistics and proprietary algorithms to syntactic information.

In contrast, the raw aggregated #URLrelURL tweet model of the world captures collective human semantics, and any additional machine generated #URLrelURL insights can be marked as such. The latter insights will not necessarily be of less value, but it will be reassuring to know that they are firmly grounded in the collective semantic perspective of human web users.

Making this semantic perspective accessible to humans and to software via appropriate search, visualisation, and analysis tools will constitute a huge step forwards in terms of learning, effective collaboration, quality of decision making, and in terms of eliminating the boundary between biological and computer software intelligence.

Therefore, please join me in capturing valuable nuggets of insight in the form of <some URL> <some relationship> <some other URL> tweets.

Example: #gmodel can be used to #translate twitter models into #semantic #models