The antidote to misuse of mathematics and junk data

Depending on who you ask, perceptions of mathematics range from an esoteric discipline that has little relevance to everyday life to a collection of magical rituals and tools that shape the operations of human cultures. In an age of exponentially increasing data volumes, public perception has increasingly shifted towards the latter perspective.

On the one hand it is nice to see a greater appreciation for the role of mathematics, but on the other hand the growing use of mathematical techniques has led to a set of cognitive blind spots in human society:

  1. Blind use of mathematical formalisms – magical rituals
  2. Blind use of second hand data – unvalidated inputs
  3. Blind use of implicit assumptions – unvalidated assumptions
  4. Blind use of second hand algorithms – unvalidated software
  5. Blind use of terminology – implicit semantic integration
  6. Blind use of numbers – numbers with no sanity checks

Construction of formal models is no longer the exclusive domain of mathematicians, physical scientists, and engineers. Large and fast flowing data streams from very large networks of devices and sensors have popularised the discipline of data science, which is mostly practiced within corporations, within constraints dictated by business imperatives, and mostly without external and independent supervision.

The most worrying aspect of corporate data science is the power that corporations can wield over the interpretation of social data, and the corresponding lack of power of those that produce and share social data. The power imbalance between corporations and society is facilitated by the six cognitive blind spots, which affect the construction of formal models and their technological implementations in multiple ways:

  1. Magical rituals lead to a lack of understanding of algorithm convergence criteria and limits of applicability, to suboptimal results, and to invalid conclusions. Examples: Naive use of frequentist statistical techniques and incorrect interpretations of p-values by social scientists, or naive use of numerical algorithms by developers of machine learning algorithms.
  2. Unvalidated inputs open the door for poor measurements and questionable sampling techniques. Examples: use of data sets collected by a range of different instruments with unspecified characteristics, or incorrect priors in Bayesian probabilistic models.
  3. Unvalidated assumptions enable the use of speculative causal relationships, simplistic assumptions about human nature, and create a platform for ideological bias. Examples: many economic models rest on outdated assumptions about human behaviour, and consciously ignore evidence from other disciplines that conflicts with established economic dogma.
  4. Unvalidated software can produce invalid results, contradictions, and unexpected error conditions. Examples: outages of digital services from banks and telecommunications service providers are often treated as unavoidable, and computational errors sometimes cost hundreds of millions of dollars or hundreds of lives.
  5. Unvalidated semantic links between mathematical formalisms, data, assumptions and software facilitate further bias and spurious complexity. Examples: Many case studies show that formalisation of semantic links and systematic elimination of spurious complexity can reduce overall complexity by factors between 3 and 20, whilst improving computational performance.
  6. Unvalidated numbers can enable order of magnitude mistakes and obvious data patterns to remain undetected. Example: Without adequate visual representations, even simple numbers can be very confusing for a numerically challenged audience.
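The sixth blind spot is the easiest one to guard against mechanically. The following is a minimal sketch of an order of magnitude sanity check, not a definitive method: the function name, the threshold of one decade, and the sample readings are all assumptions made for illustration.

```python
import math
import statistics


def order_of_magnitude_outliers(values, max_decades=1.0):
    """Return the positive values whose log10 differs from the median log10
    of the data set by more than max_decades (hypothetical threshold)."""
    positive = [v for v in values if v > 0]
    median_log = statistics.median(math.log10(v) for v in positive)
    return [v for v in positive if abs(math.log10(v) - median_log) > max_decades]


# Hypothetical sensor readings; one value was recorded in the wrong unit.
readings = [12.1, 11.8, 12500.0, 12.4, 11.9]
print(order_of_magnitude_outliers(readings))
```

A check of this kind does not validate the data. It only flags numbers that deserve a second look before they are fed into a model.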

Whilst a corporation may not have an explicit agenda for creating distorted and dangerously misleading models, the mechanics of financial economics create an irresistible temptation to optimise corporate profit by systematically shifting economic externalities into cognitive blind spots. A similar logic applies to government departments that have been tasked to meet numerically specified objectives.

Mathematical understanding and numerical literacy are becoming increasingly important, but it is unrealistic to assume that the majority of the population will become sufficiently proficient in mathematics and statistics to be able to validate and critique the formal models employed by corporations and governments. Transparency, including open science, open data, and open source software, is emerging as an essential tool for independent oversight of cognitive blind spots:

  1. Mathematicians must be able to review the formalisms that are being used
  2. Statisticians must be able to review measurement techniques and input data sources
  3. Scientists and experts from disciplines relevant to the problem domain must be able to review assumptions
  4. Software engineers familiar with the software tools that are being used must be able to review software implementations
  5. Mathematicians with an understanding of category theory, model theory, denotational semantics, and conceptual modelling must be able to review semantic links between mathematical formalisms, terminology, data, assumptions, and software
  6. Mathematicians and statisticians must be able to review data representations

In a zero marginal cost society, transparency allows scarce and highly specialised mathematical knowledge to be used for the benefit of society. It is very encouraging to note the similarity in knowledge sharing culture between the mathematical community and the open source software community, and to note the decreasing relevance of opaque closed source software.

The more society depends on decisions made with the help of mathematical models, the more important it becomes that these decisions adequately accommodate the concrete needs of individuals and local communities, and that the language used to reason about economics remains understandable, and enables the articulation of economic goals in simple terms.

The big human battle of this century

The big human battle of this century is going to be the democratisation of data and all forms of knowledge, and the introduction of digital government with the help of free and open source software.

Whilst undoubtedly the reaction of the planet to the explosion of human activities with climate change and other symptoms is the largest change process that has ever occurred in human history in the physical realm, the exponential growth of the Internet of Things and digital information flows is triggering the largest change process in the realm of human organisation that societies have ever experienced.

The digital realm


Sensor networks and pervasive use of RFID tags are generating a flood of data and lively machine-to-machine chatter. Machines have replaced humans as the most social species on the planet, and this must inform the approach to the development of healthy economic ecosystems.

Internet of Things

Sensors that are part of the Internet of Things

When data scientists and automation engineers collaborate with human domain experts in various disciplines, machine-generated data is the magic ingredient for solving the hardest automation problems.

  • In domains such as manufacturing and logistics the writing is on the wall. Introduction of self-driving vehicles and just a few more robots on the shop floor will eliminate the human element in the social chatter at the workplace within the next 10 years.
  • The medical field is being revolutionised by the downward spiral of the cost of genetic analysis, and by the development of medical robots and medical devices that are hooked up to the Internet, paving the way for machine learning algorithms and big data to replace many of the interactions with human medical professionals.
  • The road ahead for the provision of government services is clearly digital. It is conceivable that established bureaucracies can resist the trend to digitisation for a few years, but any delay will not prevent the inevitability of automation.

The social implications

Data driven automation leads to an entirely new perspective on the purpose of the education system and on the role of work and employment in society.

Large global surveys show that more than 70% of employees are disengaged at work. It is mainly in manufacturing that automation directly replaces human labour. In many other fields the shift in responsibilities from humans to machines initially goes hand in hand with the invention of new roles and loss of a clear purpose.

Traditional work is being transformed into a job for a machine. Exceptions are few and far between.

Data that is not sufficiently accessible is only of very limited value to society. The most beneficial and disruptive data driven innovations are those that result from the creative combination of data sets from two or more different sources.

It is unrealistic to assume that the most creative minds can be found via the traditional channel of employment, and it is unrealistic to expect that such minds can achieve the best results if data is locked up in organisation-specific or national silos.

The most valuable data is data that has been meticulously validated, and that is made available in the public domain. It is no coincidence that software, data, and innovation are increasingly produced in the public domain. Jeremy Rifkin describes the emergence of a third mode of commons-based digitally networked production that is distinct from the property- and contract-based modes of firms and markets.

The education system has a major role to play in creating data literate citizen-scientists-innovators.

The role of economics

It is worthwhile remembering the origin of the word economics. It used to denote the rules for good household management. On a planet that hosts life, household management occurs at all levels of scale, from the activities of single cells right up to processes that involve the entire planetary ecosystem. Human economics are part of a much bigger picture that always included biological economics and that now also includes economics in the digital realm.

To be able to reason about economics at a planetary level the planet needs a language for reasoning about economic ecosystems, only some of which may contain humans. Ideally such a language should be understandable by humans, but must also be capable of reaching beyond the scope of human socio-economic systems. In particular the language must not be coloured by any concrete human culture or economic ideology, and must be able to represent dependencies and feedback loops at all levels of scale, as well as feedback loops between levels of scale, to enable adequate representation of the fractal characteristic of nature.

The digital extension of the planetary nervous system

In biology the use of electrical impulses for communication is largely confined to communication within individual organisms, and communication between organisms is largely handled via electromagnetic waves (light, heat), pressure waves (sound), and chemicals (key-lock combinations of molecules).

The emergence of the Internet of Things is adding to the communication between human made devices, which in turn interact with the local biological environment via sensors and actuators. The impact of this development is hard to overestimate. The number of “tangible” things that might be computerized is approaching 200 billion, and this number does not include large sensor networks that are being rolled out by scientists in cities and in the natural environment. Scientists are talking about trillion-sensor networks within 10 years. The number of sensors in mobile devices is already more than 50 billion.

Compared to chemical communication channels between organisms, the speed of digital communication is orders of magnitude faster. The overall effect of equipping the planet with a ubiquitous digital nervous system is comparable to the evolution of animals with nervous systems and brains – it opens up completely new possibilities for household management at all levels of scale.

The complexity of the Internet of Things that is emerging on the horizon over the next decade is comparable to the complexity of the human brain, and the volume of data flows handled by the network is orders of magnitude larger than anything a human brain is able to handle.

The global brain

Over the course of the last century, starting with the installation of the first telegraph lines, humans have embarked on the journey of equipping the planet with a digital electronic brain. To most human observers this effort has only become reasonably obvious with the rise of the Web over the last 20 years.

Human perception and human thought processes are strongly biased towards time scales ranging from those that matter to humans on a daily basis to the time scale of a human lifetime. Humans are largely blind to events and processes that occur in sub-second intervals and to processes that are sufficiently slow. Similarly, human perception is strongly biased towards living and physical entities that are comparable to the physical size of humans, plus or minus two orders of magnitude.

As a result of their cognitive limitations and biases, humans are challenged to understand non-human intelligences that operate in the natural world at different scales of time and different scales of size, such as ant colonies and the behaviour of networks of plants and microorganisms. Humans need to take several steps back in order to appreciate that intelligence may not only exist at human scales of size and time.

The extreme loss of biodiversity that characterises the anthropocene should be a warning, as it highlights the extent of human ignorance regarding the knowledge and intelligence that evolution has produced over a period of several billion years.

It is completely misleading to attempt to attach a price tag to the loss of biodiversity. Whole ecosystems are being lost – each such loss is the loss of a dynamic and resilient living system of accumulated local biological knowledge and wisdom.

Just like an individual human is a complex adaptive system, the planet as a whole is a complex adaptive system. All intelligent systems, whether biological or human created, contain representations of themselves, and they use these representations to generate goal directed behaviour. Examples of intelligent systems include not only individual organisms, but also large scale and long-lived entities such as native forests, ant colonies, and coral reefs. The reflexive representations of these systems are encoded primarily in living DNA.

From an external perspective it nearly seems as if the planetary biological brain, powerful – but thinking slowly in chemical and biological signals over thousands of years, has shaped the evolution of humans for the specific purpose of developing and deploying a faster thinking global digital brain.

It is delusional to think that humans are in control of what they are creating. The planet is in the process of teaching humans about their role in its development, and some humans are starting to respond to the feedback. Feedback loops across different levels of scale and time are hard for humans to identify and understand, but that does not mean that they do not exist.

The global digital brain is currently still under development, not unlike the brain of a human baby before birth. All corners of the planet are being wired up and connected to sensors and actuators. The level of resilience of the overall network depends on the levels of decentralisation, redundancy, and variability within the network. A hierarchical structure of subsystems as envisaged by technologist Ray Kurzweil is influenced by elements of established economic ideology rather than by the resilient neural designs found in biology. A hierarchical global brain would likely suffer from recurring outages and from a lack of behavioural plasticity, not unlike the Cloud services from Microsoft and Amazon that define the current technological landscape.

Global thinking

The ideology of economic globalisation is dominated by simplistic and flawed assumptions. In particular the concepts of money and globally convertible currencies are no longer helpful and have become counter-productive. The limitations of the monetary system are best understood by examining the historic context in which money and currencies were invented, which predates the development of digital networks by several thousand years. At the time a simple and crude metric in the form of money was the best technology available to store information about economic flows.

As the number of humans has exploded, and as human societies have learned to harness energy in the form of fossil fuels to accelerate and automate manufacturing processes, the old monetary metrics have become less and less helpful as economic signals. In particular the impact of economic externalities that are ignored by the old metrics, both in the natural environment as well as in the human social sphere, is becoming increasingly obvious.

The global digital brain allows flows of energy, physical resources, and economic goods to be tracked in minute detail, without resorting to crude monetary metrics and assumptions of fungibility that open the door to suppressing inconvenient externalities.

A new form of global thinking is required that is not confined to the limited perspective of financial economics. The notions of fungibility and capital gains need to be replaced with the notions of collaborative economics and zero-waste cycles of economic flows.

Metrics are still required, but the new metrics must provide a direct and undistorted representation of flows of energy, physical resources, and economic goods. Such highly context specific metrics enable computational simulation and optimisation of zero-waste economics. Their role is similar to the role of chemical signalling substances used by biological organisms.

Global thinking requires the extension of a zero-waste approach to economics to the planetary level – leaving no room for any known externalities, and encouraging continuous monitoring to detect unknown externalities that may be affecting the planetary ecosystem.

The future of human economics

The real benefits of the global digital brain will be realised when massive amounts of machine generated data become accessible in the public domain in the form of disruptive innovation, and are used to solve complex optimisation problems in transportation networks, distributed generation and supply of power, healthcare, recycling of non-renewable resources, industrial automation, and agriculture.

Five years ago Tim O’Reilly predicted a war for control of the Web. The hype around big data has led many organisations to forget that the Web, and social media in particular, is already saturated with explicit and implicit marketing messages, and that there is an upper bound to the available time (attention) and money for discretionary purchases. A growing list of organisations is fighting over a very limited amount of potential revenue, unable to see the bigger picture of global economics.

Over the next decade one of the biggest challenges will be the required shift in organisational culture, away from simplistic monetisation of big data, towards collaboration and extensive data and knowledge sharing across disciplines and organisational boundaries. The social implications of advanced automation across entire economic ecosystems, and a corresponding necessary shift in the education system need to be addressed.

The future of humans

Human capabilities and limitations are under the spotlight. How long will it take for human minds to shift gears, away from the power politics and hierarchically organised societies that still reflect the cultural norms of our primate cousins, and from myopic human-centric economics, towards planetary economics that recognise the interconnectedness of life across space and time?

The future of democratic governance could be one where people vote for human understandable open source legislation that is directly executable by intelligent software systems. Corporate and government politicians will no longer be deemed an essential part of human society. Instead, any concentration of power in human hands is likely to be recognised as an unacceptable risk to the welfare of society and the health of the planet.



Humans have to ask themselves whether they want to continue to be useful parts of the ecosystem of the planet or whether they prefer to take on the role of a genetic experiment that the planet switched on and off for a brief period in its development.

Big data blah $ blah $ blah $

Why does LinkedIn feed me with big data hype from 2011?
By only talking about dollar metrics, potential big data intelligence is turned into junk data science.

blunt abstraction of native domain metrics into dollars is a source of junk data

All meaningful automation, quality, energy efficiency, and resilience metrics are obliterated by translating them into dollars. Good business decisions are made by understanding the implications of domain-specific metrics:
  1. Level of automation
  2. Volume of undesirable waste
  3. Energy use
  4. Reliability and other quality of service attributes
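How a dollar figure obliterates these metrics can be sketched in a few lines. Everything in the example below is hypothetical: the unit prices, the field names, and the two operating periods are invented purely to illustrate the point, and real cost models are far richer.

```python
from dataclasses import dataclass, fields

# Hypothetical unit prices, chosen only for illustration.
WASTE_DISPOSAL_DOLLARS_PER_KG = 2
ENERGY_DOLLARS_PER_KWH = 1


@dataclass(frozen=True)
class OperatingPeriod:
    automation_level_pct: int  # share of process steps that run unattended
    waste_kg: int              # volume of undesirable waste
    energy_kwh: int            # energy use
    uptime_pct: float          # one reliability attribute


def dollar_cost(period):
    """Collapse the native domain metrics into a single dollar figure."""
    return (period.waste_kg * WASTE_DISPOSAL_DOLLARS_PER_KG
            + period.energy_kwh * ENERGY_DOLLARS_PER_KWH)


def changed_metrics(a, b):
    """Return the native metrics that differ between two periods."""
    return {f.name: (getattr(a, f.name), getattr(b, f.name))
            for f in fields(a)
            if getattr(a, f.name) != getattr(b, f.name)}


before = OperatingPeriod(automation_level_pct=40, waste_kg=100,
                         energy_kwh=300, uptime_pct=99.5)
after = OperatingPeriod(automation_level_pct=40, waste_kg=200,
                        energy_kwh=100, uptime_pct=97.0)

print(dollar_cost(before), dollar_cost(after))  # identical dollar figures
print(changed_metrics(before, after))           # yet three metrics have moved
```

In this invented case the dollar report shows no change at all, while waste has doubled and reliability has dropped. Only the native metrics reveal the trade-off.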

Any practitioner of Kaizen knows that sustainable cost reductions are the result of improvements in concrete metrics that relate directly to the product that is being developed or the service that is being delivered. The same domain expertise that is useful for Kaizen can be combined with high quality external big data sources to produce insights that enable radical innovation.

Yes, often the results have a highly desirable effect on operating costs or sales, but the real value can only be understood in terms of native domain metrics. The healthcare domain is a good example. Minimising the costs of high quality healthcare is desirable, but only when patient outcomes and quality of care are not compromised.

When management consultants only talk about results in dollars, there is a real danger of only expressing goals in financial terms. This then leads down the slippery slope of tinkering with outcomes and accounting procedures until the desirable numbers are within range. It is too late when experts start to ask questions about outcomes, and when the lack of native domain metrics exposes reductions in operational costs as a case of cutting corners.

Before believing a big data case study, always look beyond the dollars. If in doubt, survey customers to confirm claims of improved outcomes and customer satisfaction. The referenced McKinsey article does not encourage corner cutting, but it fails to highlight the need for setting targets in native domain metrics, and it distracts the reader with blunt financial metrics.

Let’s talk semantics. Do you know what I mean?

Over the last few years the talk about search engine optimisation has given way to hype about semantic search.


context matters

The challenge with semantics is always context. Any useful form of semantic search would have to consider the context of a given search request. At a minimum the following context variables are relevant: industry, organisation, product line, scientific discipline, project, and geography. When this context is known, a semantic search engine can realistically tackle the following use cases:

  1. Looking up the natural language names or idioms that are in use to refer to a specific concept
  2. Looking for domain knowledge; i.e. looking for all concepts that are related to a given concept
  3. Investigating how a word or idiom is used in other industries, organisations, products, research projects, geographies; i.e. investigating the variability of a concept across industries, organisations, products, research projects, and geographies
  4. Looking up all the instances where a concept is used in Web content
  5. Investigating how established a specific word or idiom is in the scientific community, to distinguish between established terminology and fashionable marketing jargon
  6. Looking up the formal names that are used in database definitions, program code, and database content to refer to a specific concept
  7. Looking up all the instances where a concept is used in database definitions, program code, and database content

These use cases relate to the day-to-day work of many knowledge workers. The following presentation illustrates the challenges of semantic search and it contains examples that illustrate how semantic search based on concepts differs from search based on words.

semantic search

Do you know what I mean?

The current semantic Web is largely blind to the context parameters of industry, organisation, product line, scientific discipline, and project. Google, Microsoft, and other developers of search engines consider a fixed set of filter categories such as geography, time of publication, application, etc. and apply a more or less secret sauce to deduce further context from a user’s preferences and browsing history. This approach is fundamentally flawed:

  • Each search engine relies on an idiosyncratic interpretation of user preferences and browsing history to deduce the values of further context variables, and the user is only given limited tools for influencing the interpretation, for example via articulating “likes” and “dislikes”
  • Search engines rely on idiosyncratic algorithms for translating filters, and “likes” and “dislikes” into search engine semantics
  • Search engines are unaware of the specific intent of the user at a given point in time, and without more dynamic and explicit mechanisms for a user to articulate intent, relying on a small set of filter categories, user’s preferences, and browsing history is a poor choice

The weak foundations of the “semantic Web”, which evolved from a keynote from Tim Berners-Lee in 1994, compound the problem:

“Adding semantics to the Web involves two things: allowing documents which have information in machine readable forms, and allowing links to be created with relationship values.”

Subsequently developed W3C standards are the result of design by committee with the best intentions.

All organisations that have high hopes for turning big data into gold should pause for a moment, and consider the full implication of “garbage in, garbage out” in their particular context. Ambiguous data is not the only problem. Preconceived notions about semantics are another big problem. Implicit assumptions are easily baked into analytical problem statements, thereby confining the space of potential “insights” gained from data analysis to conclusions that are consistent with preconceived interpretations of so-called metadata.

The root cause of the limitations of state-of-the-art semantic search lies in the following implicit assumptions:

  • Text / natural language is the best mechanism for users to articulate intent, i.e. a reliance on words rather than concepts
  • The best mechanism to determine context is via a limited set of filter categories, user preferences, and via browsing history

words vs concepts

Semantic search will only improve if and when Web browsers rely on explicit user guidance to translate words into concepts before executing a search request. Furthermore, to reduce search complexity, a formal notion of semantic equivalence is essential.

semantic equivalence


Lastly, the mapping between labels and semantics depends significantly on linguistic scope. For example the meaning of solution architecture in organisation A is typically different from the meaning of solution architecture in organisation B.
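The three ideas above (translating words into concepts, semantic equivalence, and linguistic scope) can be sketched with a simple lookup table. This is a toy illustration under stated assumptions, not a description of how any search engine works: the scopes, labels, and concept identifiers are all hypothetical.

```python
# Hypothetical vocabulary: (linguistic scope, label) -> concept identifier.
VOCABULARY = {
    ("organisation A", "solution architecture"): "concept:system-integration-design",
    ("organisation B", "solution architecture"): "concept:presales-proposal",
    ("organisation B", "integration blueprint"): "concept:system-integration-design",
}


def resolve(scope, label):
    """Translate a word or idiom into a concept within a given scope.

    Returns None when the label is unknown in that scope."""
    return VOCABULARY.get((scope, label.lower()))


def semantically_equivalent(scope_1, label_1, scope_2, label_2):
    """Two labels are equivalent if they resolve to the same known concept."""
    concept_1 = resolve(scope_1, label_1)
    concept_2 = resolve(scope_2, label_2)
    return concept_1 is not None and concept_1 == concept_2


# The same words carry different meanings in different scopes ...
print(semantically_equivalent("organisation A", "solution architecture",
                              "organisation B", "solution architecture"))
# ... whereas different words can refer to the same concept.
print(semantically_equivalent("organisation A", "solution architecture",
                              "organisation B", "integration blueprint"))
```

The design choice that matters here is that the scope is part of the lookup key. A search based on words alone would treat the first pair as a match and miss the second.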

linguistic scope


If the glacial speed of innovation in mainstream programming languages and tools is any indication, the main use case of semantic search is going to remain:

User looks for a product with features x, y, and z

The other use cases mentioned above may have to wait for another 10 years.

Death by Standardisation

Standardisation is a double-edged sword. Compliance with standards is best restricted to those standards that really make a difference in a specific context.

Even innocent standardisation attempts such as enforcing a shared terminology across an organisation can be counter-productive, as it can lead to the illusion of shared understanding, whereas in practice each organisational silo associates different meanings with the terminology.

There is no simplistic rule of thumb, but the following picture can help to gain a sense of perspective and to avoid the dreaded death zone of standardisation.

Death by Standardisation

The story of life is language

This post is a rather long story. It attempts to connect topics from a range of domains, and the insights from experts in these domains. In this story my role is mainly the one of an observer. Over the years I have worked with hundreds of domain experts, distilling the essence of deep domain knowledge into intuitive visual domain-specific languages. If anything, my work has taught me the skill to observe and to listen, and it has made me concentrate on the communication across domain boundaries – to ensure that desired intent expressed in one domain is sufficiently aligned with the interpretations performed in other domains.

The life of language and the language of life can’t be expressed in written words. Many of the links contained in this story are essential, and provide extensive background information in terms of videos (spoken language, intonation, unconscious body language, conscious gestures), and visual diagrams. To get an intuitive understanding of the significance of visual communication, once you get to the end of the story, simply imagine none of the diagrams had been included.

Drawing Hands, 1948, by the Dutch artist M. C. Escher

It may not be evident on the surface, but the story of life started with language, hundreds of millions of years ago – long before humans were around, and it will continue with language, long after humans are gone.

The famous Drawing Hands lithograph from M. C. Escher provides a very good analogy for the relationship between life and language – the two concepts are inseparable, and one recursively gives rise to the other.

At a fundamental level the language of life is encoded in a symbol system of molecular fragments and molecules – in analogy to an alphabet, words, and sentences.

The language of life

TED – Craig Venter on creating synthetic life

Over the last two decades molecular biologists and chemists have become increasingly skilled at reading the syntax of the genetic code; and more recently scientists started to work on, and have successfully prototyped techniques to write the syntax of the genetic code. In other words, humans now have the tools to translate bio-logical code into digital code as well as the tools to translate digital code back into bio-logical code. The difference between the language of biology and the language of digital computers is simply one of representation (symbolic representations are also called models). Unfortunately, neither the symbols used by biology (molecules), nor the symbols used by digital computers (electric charges), are directly observable via the cognitive channels available to humans.

However, half a century of software development has not only led to convoluted and unmaintainable legacy software, but also to some extremely powerful tools for translating digital representations into visual representations that are intuitive for humans to understand. We no longer need to deal with mechanical switches or punch cards, and modern user interfaces present us with highly visual information that goes far beyond the syntax of written natural language. These visualisation tools, taken together with the ability to translate bio-logical code into digital code, provide humans with a window into the fundamental language of life – much more impressive in my view than the boring magical portals dreamed up by science fiction authors.

TED – Bonnie Bassler on how bacteria communicate

The language of life is highly recursive. It turns out that even the smallest single-celled life forms have developed higher-level languages to communicate, not only within their species but even across species. At the spatial and temporal scale that characterises the life of bacteria, the symbol system used consists of molecules. What is fascinating is that scientists have not only decoded the syntax (the density of molecular symbols surrounding the bacteria), but have also begun to decode the meaning of the language used by bacteria; in the case of a pathogen, for example, communication that signals when to attack the host.

The biological evidence clearly shows, in a growing number of well-researched examples, that the development of language does not require any “human-level” intelligence. Instead, life can be described as an ultra-large system of elements that communicate via various symbol systems. Even though the progress in terms of discovering and reading symbol systems is quite amazing, scientists are only scratching the surface in terms of understanding the meaning (the semantics) of biological symbol systems.

Language systems invented by humans

From muddling to modelling

Semantics is the most fascinating touch point between biology and the mathematics of symbol systems. In terms of recursion, mathematics seems to have found a twin in biology. Unfortunately, computer scientists, and software development practitioners in particular, for a long time have ignored the recursive aspect of formal languages. As a result, the encoding of the software that we use today is much more verbose and complex than it would need to be.

From code into the clouds

Nevertheless, over the course of a hundred years, the level of abstraction of computer programming has slowly moved upwards. The level of progress is best seen when looking at the sequence of the key milestones that have been reached to date. Not unlike in biology, more advanced languages have been built on top of simpler languages. In technical terms, the languages of biology and all languages invented by humans, from natural language to programming languages, are codes. The dictionary defines code as follows:

  1. Code is a system of signals used to send messages
  2. Code is a system of symbols used for the purpose of identification or classification
  3. Code is a set of conventions governing behaviour

Sets – the foundation of biological and digital code

Mathematically, all codes can be represented with the help of sets and the technique of recursion. But, as with the lowest-level encoding of digital code in terms of electric charges, the mathematical notation for sets is highly verbose, and quickly reaches human cognitive limits.
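As a small illustration of both points, here is a sketch in Python that builds the natural numbers from nothing but sets and recursion, and then writes them out in classical brace notation. The choice of the von Neumann construction (0 is the empty set, and each successor is n ∪ {n}) is an assumption made for this example, and the function names are hypothetical.

```python
def successor(n: frozenset) -> frozenset:
    """Von Neumann successor: n + 1 is the set n together with n itself."""
    return n | frozenset([n])


def numeral(k: int) -> frozenset:
    """Build the set that represents the natural number k."""
    n = frozenset()  # zero is the empty set
    for _ in range(k):
        n = successor(n)
    return n


def notation(s: frozenset) -> str:
    """Write a nested set in classical brace notation, smallest elements first."""
    return "{" + ",".join(notation(e) for e in sorted(s, key=len)) + "}"


if __name__ == "__main__":
    for k in range(6):
        text = notation(numeral(k))
        print(k, len(text), text)
```

In this sketch the written form roughly doubles in length with every step, so even the number 5 already needs 79 characters. This is the verbosity problem in miniature.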

The mathematical notation for sets predates modern computers, and was invented by those who needed to manually manipulate sets at a conceptual level, for example as part of a mathematical proof. Software programming and communication in natural language involve so many sets that a representation in the classical mathematical notation for sets is impractical.

The importance of high-quality representation of symbols is often under-rated. A few thousand years ago humans realised the limitation of encoding language in sounds, and invented written language. The notation of written language minimises syntactical errors, and, in contrast to spoken language, allows reliable communication of sequences of words across large distances in space and time.

The challenge of semantics

The impossibility of communicating desired intent

Software development professionals are becoming increasingly aware of the importance of notation, but interpretation (inferring the semantics of a message) remains an ongoing challenge. Adults and even young children, once they have developed a theory of mind, know that others may sometimes interpret their messages in a surprising way. It is somewhat less obvious that all sensory input received by the human brain is subject to interpretation, and that our own perception of reality is limited to an interpretation.

The curse of software maintenance

Interpretation is not only a challenge in communication between humans, it is as much a challenge for communication between humans and software systems. Every software developer knows that it is humanly impossible to write several hundred lines of non-trivial program code without introducing unintended “errors” that will lead to an unexpected interpretation by the machine. Still, writing new software requires much less effort than understanding and changing existing software. Even expert programmers require large amounts of time to understand software written by others.

The challenge of digital waste

We have only embarked on the road of significant dematerialisation of artefacts in the last few years, but I am somewhat concerned about the semantic value of many of the digital artefacts that are now being produced at a mind-boggling rate. I am coming to think of it as digital waste, which is worse than noise. The waste includes the time spent producing and consuming artefacts and the associated use of energy.

Sharpening your collaborative edge

Of particular concern is the production of meta-artefacts (for example the tools we use to produce digital artefacts, and higher-level meta-tools). The user interfaces of Facebook, Google+ and other tools look reasonable at a superficial level; just don’t look under the hood. As a result, we produce the digital equivalent of the Pacific Garbage Patch. To a humanity blinded by shiny new interfaces, the digital ocean seems infinite, and so it embarks on yet another conquest …

Today’s collaboration platforms not only rely on a central point of control, they are also ill-equipped for capturing deep knowledge and wisdom: there is no semantic foundation, and the tools are very limited in their ability to facilitate a shared understanding within a community. The ability to create digital artefacts is not enough; we need the ability to create semantic artefacts in order to share meaningful information.

How does life (the biological system of the planet) collectively interpret human activities?

TED – Naomi Klein : Addicted to risk

As humans we are limited to the human perspective, and we are largely unaware of the impact of our ultra-large scale chemical activities on the languages used by other species. If biologists have only recently discovered that bacteria heavily rely on chemical communication, how many millions of other chemical languages are we still completely unaware of? And what is the impact of disrupting chemical communication channels?

Scientists may have the best intentions, but their conclusions are limited to the knowledge available to them. To avoid potentially fatal mistakes and misunderstandings, it is worthwhile to tread carefully, and to invest in better listening skills. Instead of deafening the planet with human-made chemicals, how about focusing our energies on listening to, and attempting to understand, the trillions of conversations going on in the biosphere?

Gmodel – The Semantic Database

At the same time, we can work on the development of symbolic codes that are superior to natural language for sharing semantics, so that it becomes easier to reach a shared understanding across the boundaries of the specialised domains we work in. We now have the technology to reduce semantic communication errors (the difference between intent and interpretation) to an extent that is comparable to the reduction of syntactic communication errors achieved with written language. If we continue to rely too heavily on natural language, we are running a significant risk of ending the existence of humanity due to a misunderstanding.

Life is language

Life and languages continuously evolve, whether we like it or not. Life shapes us and we attempt to shape life. We are part of a dynamic system with increasingly fast feedback loops.

Life interprets languages, and languages interpret life.

Language is life.

Sharpening your collaborative edge

All animals that have a brain, including humans, rely on mental models (representations) that are useful within the specific context of the individual. As humans we are consciously aware of some of the concepts that are part of our mental model of the world, and we can use empirical techniques to scratch the surface of the large unconscious parts of our mental model.

When making decisions, it is important to remember that there is no such thing as a correct model, and we entirely rely on models that are useful or seem useful from the perspective of our individual view point, which has been shaped by our perceptions of the interactions with our surroundings. One of the most useful features of our brains is the subconscious ability to perceive concrete instances of animals, plants, and inanimate objects. This ability is so fundamental that we have an extremely hard time not thinking in terms of instances, and we even think about abstract concepts as distinct things or sets (water, good, bad, love, cats, dogs, …). Beyond concepts, our mental model consists of the perceived connections between concepts (spatial and temporal perceptions, cause and effect perceptions, perceived meaning, perceived understanding, and other results of the computations performed by our brain).

The last two examples (perceived meaning and understanding) in combination with the unconscious parts of our mental model are the critical elements that shape human societies. Scientists that attempt to build useful models face the hard tasks of

  • making parts of their mental model explicit,
  • designing measurement tools and experiments to validate the usefulness of their models,
  • and of reaching a shared understanding amongst a group of peers in relation to the usefulness of a model.

In doing so, natural scientists and social scientists resort to mathematical techniques, in particular techniques that lead to models with predictive properties, which in turn can be validated by empirical observations in combination with statistical techniques. This approach is known as the scientific method, and it works exceptionally well in physics and chemistry, and to a very limited extent it also works in the life sciences, in the social sciences, and other domains that involve complex systems and wicked problems.

The scientific method has been instrumental in advancing human knowledge, but it has not led to any useful models for representing the conscious parts of our mental model. This should not surprise. Our mental model is simply a collection of perceptions, and to date all available tools for measuring perceptions are very crude, most being limited to measuring brain activity in response to specific external stimuli. Furthermore, each brain is the result of processing a unique sequence of inputs and derived perceptions, and our perceptions can easily lead us to beliefs that are out of touch with scientific evidence and the perceptions of others. In a world that increasingly consists of digital artefacts, and where humans spend much of their time using and producing digital artefacts, the lack of scientifically validated knowledge about how the human brain creates the perception of meaning and understanding is of potential concern.

The mathematics of shared understanding

However, in order to improve the way in which humans collaborate and make decisions, there is no need for an empirically validated model of the human brain. Instead, it is sufficient to develop a mathematical model that allows the representation of concepts, meaning, and understanding in a way that allows humans to share and compare parts of mental models. Ideally, the shared representations in question are designed by humans for humans, to ensure that digital artefacts make optimal use of the human senses (sight, hearing, taste, smell, touch, acceleration, temperature, kinesthetic sense, pain) and human cognitive abilities. Model theory and denotational semantics, the mathematical disciplines needed for representing the meaning of any kind of symbol system, have only recently begun to find their way into applied informatics. Most of the mathematics was developed decades ago: model theory took shape in the first half of the 20th century, and denotational semantics followed in the 1960s and 1970s.

To date the use of model theory and denotational semantics is mainly limited to the design of compilers and other low-level tools for translating human-readable specifications into representations that are executable by computing hardware. However, with a bit of smart software tooling, the same mathematical foundation can be used for sharing symbol systems and associated meanings amongst humans, significantly improving the speed at which perceived meaning can be communicated, and the speed at which shared understanding can be created and validated.
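For readers unfamiliar with denotational semantics, the core idea can be sketched in a few lines of Python: a function that maps each piece of syntax to the thing it stands for. The tiny arithmetic language below, with its tuple encoding and the name `meaning`, is a hypothetical example made up for this purpose and is not taken from any particular tool.

```python
def meaning(term):
    """Denotation function: map a syntactic term to the number it stands for.

    A term is either an int literal or a tuple (operator, left, right).
    """
    if isinstance(term, int):
        return term
    op, left, right = term
    if op == "add":
        return meaning(left) + meaning(right)
    if op == "mul":
        return meaning(left) * meaning(right)
    raise ValueError(f"unknown operator: {op!r}")


if __name__ == "__main__":
    # Two different pieces of syntax ...
    first = ("add", 1, ("mul", 2, 3))
    second = ("add", ("mul", 3, 2), 1)
    # ... that share one meaning.
    print(meaning(first), meaning(second))
```

Two terms that look different on the page can be compared by comparing what they denote. This is the kind of comparison that shared symbol systems amongst humans would rely on.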

For most scientists this represents an unfamiliar use of mathematics, as meaning and understanding are not measured by an apparatus, but are consciously decided by humans: The level of shared understanding between two individuals with respect to a specific model is quantified by the number of instances that conform to the model based on the agreement between both individuals. At a practical level the meaning of a concept can be defined as the usage context of the concept from the specific view point of an individual. An individual’s understanding of a concept can be defined as the set of use cases that the individual associates with the concept (consciously and subconsciously).
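Read literally, the first definition is a count over a set intersection. A minimal Python sketch follows, assuming that each individual's judgement can be written down as the set of instances that he or she accepts as conforming to the model; the names and the sample data are hypothetical.

```python
from typing import Hashable, Set


def shared_understanding(conforming_a: Set[Hashable],
                         conforming_b: Set[Hashable]) -> int:
    """Number of instances that both individuals agree conform to the model."""
    return len(conforming_a & conforming_b)


# Hypothetical example: which concrete documents count as an "invoice"?
alice = {"doc-1", "doc-2", "doc-3"}
bob = {"doc-2", "doc-3", "doc-4"}

if __name__ == "__main__":
    print(shared_understanding(alice, bob))  # 2
```

Note that the measure says nothing about who is right about doc-1 or doc-4; it only records how far the agreement currently extends.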

These definitions are extremely useful in practice. They explain why it is so hard to communicate meaning, they highlight the unavoidable influence of perception, and they encourage people to share use cases in the form of stories to increase the level of shared understanding. Most importantly, these definitions don’t leave room for correct or incorrect meanings, they only leave room for different degrees of shared understanding – and encourage a mindset of collaboration rather than competition for “The truth”. The following slides provide a road map for improving your collaborative edge.

Sharpening Your Collaborative Edge

After reaching a shared understanding with respect to a model, individuals may apply the shared model to create further instances that match new usage contexts, but the shared understanding is only updated once these new usage contexts have been shared and agreement has been reached on model conformance.
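The update rule described here can be sketched as a small Python class: a new instance only counts towards the shared understanding once every party has endorsed it. The class name, method names, and sample data are assumptions made for illustration.

```python
class SharedModel:
    """Tracks which instances all parties agree conform to a model."""

    def __init__(self, parties):
        self.parties = frozenset(parties)
        self.agreed = set()       # instances endorsed by every party
        self.endorsements = {}    # instance -> parties that endorsed it so far

    def endorse(self, party, instance):
        """Record that one party considers the instance to conform."""
        if party not in self.parties:
            raise ValueError(f"unknown party: {party!r}")
        if instance in self.agreed:
            return
        voters = self.endorsements.setdefault(instance, set())
        voters.add(party)
        if voters == self.parties:  # agreement reached: update shared state
            self.agreed.add(instance)
            del self.endorsements[instance]

    def level(self) -> int:
        """Current level of shared understanding (number of agreed instances)."""
        return len(self.agreed)


if __name__ == "__main__":
    model = SharedModel({"alice", "bob"})
    model.endorse("alice", "new usage context")
    print(model.level())  # still 0: created, but not yet shared and agreed
    model.endorse("bob", "new usage context")
    print(model.level())  # 1
```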

Emerging technologies for semantic modelling have the potential to reshape communication and collaboration to a significant degree, in particular in all those areas that rely on creating a shared understanding within a community or between communities.