Governance of Big Data Cloud Formations – Cyclone Alert

The shift of business applications and business data into the Cloud has led to the following challenges:

  1. The physical locations at which data is stored, and the physical locations through which data travels, are increasingly unknown to the producers and consumers of data.
  2. Data ownership and the responsibility for data custodianship are increasingly impossible to determine, as deep Web service supply chains transect multiple contracts and jurisdictional boundaries.
  3. Local (national) privacy legislation is increasingly impossible to enforce.
  4. The control over the integration points between a specific pair of Cloud services is migrating away from the thousands and millions of organisations whose data is being integrated to a few handfuls of vendors that specialise in connecting the specific pair of Cloud services.
  5. Correspondingly the responsibility for the robustness and reliability of system integration solutions is shifting to a small number of proprietary Cloud services.

The centralised and constrained Web of today

The structure of the Web of today artificially imposes the same constraints on the digital realm that apply in the physical realm.

The Web we have today

Centralised and hierarchical control of the Web creates a number of avoidable problems. Netizens, and especially the younger generation of digital natives, are using the digital realm as an extension of their brain. The value of the digital realm to human society is not found in the technology that is being used; it is found in the information, knowledge and insights that flow, evolve, and multiply in the digital realm. To be very clear, Web technology is fully commoditised. There is very little intrinsic value in the mundane software that powers the services from Google, Facebook, Microsoft, and other providers of Cloud platforms. The digital realm is currently owned and controlled by a small number of corporations, which is increasingly incompatible with its use value:

  1. Digital knowledge as a personal brain extension
  2. Unlimited on-demand communication between any number of netizens
  3. A public tool for tracing information flows and for independent validation of scientific knowledge
  4. A globally accessible interface to technologies that operate in the physical realm

Leaving these functions in the hands of a small number of corporations is not in the interest of society.

The decentralised Web we should aim for

It is time to acknowledge the commoditisation of digital technology, to decentralise control of the Web, and to provide digital technology as a public utility to all netizens, without any artificial constraints or interference.

The free Web

What are the implications for governments and governance?

The governance challenge consists of:

  1. Protecting personal freedom in the digital realm
  2. Sustainable management of limited resources in the physical realm
  3. Integration of social and ecological concerns in the interest of the inhabitants of the biosphere

Important first steps that can be undertaken today to address the governance challenge are outlined here.

Death by Standardisation

Standardisation is a double-edged sword. Compliance with standards is best restricted to those standards that really make a difference in a specific context.

Even innocent standardisation attempts such as enforcing a shared terminology across an organisation can be counter-productive, as it can lead to the illusion of shared understanding, whereas in practice each organisational silo associates different meanings with the terminology.

There is no simplistic rule of thumb, but the following picture can help to gain a sense of perspective and to avoid the dreaded death zone of standardisation.

Death by Standardisation

The story of life is language

This post is a rather long story. It attempts to connect topics from a range of domains, and the insights from experts in these domains. In this story my role is mainly the one of an observer. Over the years I have worked with hundreds of domain experts, distilling the essence of deep domain knowledge into intuitive visual domain-specific languages. If anything, my work has taught me the skill to observe and to listen, and it has made me concentrate on the communication across domain boundaries – to ensure that desired intent expressed in one domain is sufficiently aligned with the interpretations performed in other domains.

The life of language and the language of life can’t be expressed in written words. Many of the links contained in this story are essential, and provide extensive background information in terms of videos (spoken language, intonation, unconscious body language, conscious gestures), and visual diagrams. To get an intuitive understanding of the significance of visual communication, once you get to the end of the story, simply imagine none of the diagrams had been included.

Drawing Hands, 1948, by the Dutch artist M. C. Escher

It may not be evident on the surface, but the story of life started with language, hundreds of millions of years ago – long before humans were around, and it will continue with language, long after humans are gone.

The famous Drawing Hands lithograph from M. C. Escher provides a very good analogy for the relationship between life and language – the two concepts are inseparable, and one recursively gives rise to the other.

At a fundamental level the language of life is encoded in a symbol system of molecular fragments and molecules – in analogy to an alphabet, words, and sentences.

The language of life

TED – Craig Venter on creating synthetic life

Over the last two decades molecular biologists and chemists have become increasingly skilled at reading the syntax of the genetic code; and more recently scientists started to work on, and have successfully prototyped techniques to write the syntax of the genetic code. In other words, humans now have the tools to translate bio-logical code into digital code as well as the tools to translate digital code back into bio-logical code. The difference between the language of biology and the language of digital computers is simply one of representation (symbolic representations are also called models). Unfortunately, neither the symbols used by biology (molecules), nor the symbols used by digital computers (electric charges), are directly observable via the cognitive channels available to humans.

However, half a century of software development has not only led to convoluted and unmaintainable legacy software, but also to some extremely powerful tools for translating digital representations into visual representations that are intuitive for humans to understand. We no longer need to deal with mechanical switches or punch cards, and modern user interfaces present us with highly visual information that goes far beyond the syntax of written natural language. These visualisation tools, taken together with the ability to translate bio-logical code into digital code, provide humans with a window into the fundamental language of life – much more impressive in my view than the boring magical portals dreamed up by science fiction authors.

TED – Bonnie Bassler on how bacteria communicate

The language of life is highly recursive. It turns out that even the smallest single-celled life forms have developed higher-level languages to communicate, not only within their species but even across species. At the spatial and temporal scale that characterises the life of bacteria, the symbol system used consists of molecules. What is fascinating is that scientists have not only decoded the syntax (the density of molecular symbols surrounding the bacteria), but have also begun to decode the meaning of the language used by bacteria, for example, in the case of a pathogen, communication that signals when to attack the host.

The biological evidence clearly shows, in a growing number of well-researched examples, that the development of language does not require any “human-level” intelligence. Instead, life can be described as an ultra-large system of elements that communicate via various symbol systems. Even though the progress in terms of discovering and reading symbol systems is quite amazing, scientists are only scratching the surface in terms of understanding the meaning (the semantics) of biological symbol systems.

Language systems invented by humans

From muddling to modelling

Semantics is the most fascinating touch point between biology and the mathematics of symbol systems. In terms of recursion, mathematics seems to have found a twin in biology. Unfortunately, computer scientists, and software development practitioners in particular, for a long time have ignored the recursive aspect of formal languages. As a result, the encoding of the software that we use today is much more verbose and complex than it would need to be.

From code into the clouds

Nevertheless, over the course of a hundred years, the level of abstraction of computer programming has slowly moved upwards. The level of progress is best seen when looking at the sequence of the key milestones that have been reached to date. Not unlike in biology, more advanced languages have been built on top of simpler languages. In technical terms, the languages of biology and all languages invented by humans, from natural language to programming languages, are codes. The dictionary defines code as follows:

  1. Code is a system of signals used to send messages
  2. Code is a system of symbols used for the purpose of identification or classification
  3. Code is a set of conventions governing behaviour

Sets – the foundation of biological and digital code

Mathematically, all codes can be represented with the help of sets and the technique of recursion. But, as with the lowest-level encoding of digital code in terms of electric charges, the mathematical notation for sets is highly verbose, and quickly reaches human cognitive limits.

The mathematical notation for sets predates modern computers, and was invented by those who needed to manually manipulate sets at a conceptual level, for example as part of a mathematical proof. Software programming and communication in natural language both involve so many sets that a representation in the classical mathematical notation for sets is impractical.
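To make the claim about sets and recursion a little more tangible, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the helper names (pair, encode, decode) and the sample symbol sequence are hypothetical and do not come from any tool mentioned in this post. The sketch uses the standard Kuratowski construction of an ordered pair from plain sets, and encodes a sequence of symbols recursively as nested pairs.

```python
EMPTY = frozenset()  # the empty set marks the end of a sequence


def pair(a, b):
    """Kuratowski ordered pair built only from sets: (a, b) = {{a}, {a, b}}."""
    return frozenset({frozenset({a}), frozenset({a, b})})


def encode(symbols):
    """Recursively encode a sequence of symbols as nested ordered pairs."""
    if not symbols:
        return EMPTY
    head, *tail = symbols
    return pair(head, encode(tail))


def decode(code):
    """Recover the sequence of symbols from its nested-set encoding."""
    if code == EMPTY:
        return []
    # The one-element set holds the head; the two-element set holds head and tail.
    small, large = sorted(code, key=len)
    (head,) = small
    (tail,) = large - small
    return [head] + decode(tail)


# Hypothetical symbol sequence, chosen only for illustration.
sequence = list("GATTACA")
as_sets = encode(sequence)
round_trip = decode(as_sets)

# Seven symbols already need a long nested-set expression.
print(len(repr(as_sets)), "characters of set notation for", len(sequence), "symbols")
```

Even for a handful of symbols the printed set expression is far longer than the sequence it encodes, which is the practical reason why nobody reads or writes code in raw set notation.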

The importance of high-quality representation of symbols is often under-rated. A few thousand years ago humans realised the limitation of encoding language in sounds, and invented written language. The notation of written language minimises syntactical errors, and, in contrast to spoken language, allows reliable communication of sequences of words across large distances in space and time.

The challenge of semantics

The impossibility of communicating desired intent

Software development professionals are becoming increasingly aware of the importance of notation, but interpretation (inferring the semantics of a message) remains an ongoing challenge. Adults and even young children, once they have developed a theory of mind, know that others may sometimes interpret their messages in a surprising way. It is somewhat less obvious, that all sensory input received by the human brain is subject to interpretation, and that our own perception of reality is limited to an interpretation.

The curse of software maintenance

Interpretation is not only a challenge in communication between humans, it is as much a challenge for communication between humans and software systems. Every software developer knows that it is humanly impossible to write several hundred lines of non-trivial program code without introducing unintended “errors” that will lead to a non-expected interpretation by the machine. Still, writing new software requires much less effort than understanding and changing existing software. Even expert programmers require large amounts of time to understand software written by others.

The challenge of digital waste

We have only embarked down the road of significant dematerialisation of artefacts in the last few years, but I am somewhat concerned about the semantic value of many of the digital artefacts that are now being produced at a mind-boggling rate. I am coming to think of it as digital waste – worse than noise. The waste consists of the time spent producing and consuming artefacts and the associated use of energy.

Sharpening your collaborative edge

Of particular concern is the production of meta-artefacts (for example the tools we use to produce digital artefacts, and higher-level meta-tools). The user interfaces of Facebook, Google+ and other tools look reasonable at a superficial level; just don’t look under the hood. As a result, we produce the digital equivalent of the Pacific Garbage Patch. Blinded by shiny new interfaces, the digital ocean seems infinite, and humanity embarks on yet another conquest …

Today’s collaboration platforms not only rely on a central point of control, they are also ill-equipped for capturing deep knowledge and wisdom – there is no semantic foundation, and the tools are very limited in their ability to facilitate a shared understanding within a community. The ability to create digital artefacts is not enough, we need the ability to create semantic artefacts in order to share meaningful information.

How does life (the biological system of the planet) collectively interpret human activities?

TED – Naomi Klein : Addicted to risk

As humans we are limited to the human perspective, and we are largely unaware of the impact of our ultra-large scale chemical activities on the languages used by other species. If biologists have only recently discovered that bacteria heavily rely on chemical communication, how many millions of other chemical languages are we still completely unaware of? And what is the impact of disrupting chemical communication channels?

Scientists may have the best intentions, but their conclusions are limited to the knowledge available to them. To avoid potentially fatal mistakes and misunderstandings, it is worthwhile to tread carefully, and to invest in better listening skills. Instead of deafening the planet with human-made chemicals, how about focusing our energies on listening to, and attempting to understand, the trillions of conversations going on in the biosphere?

Gmodel – The Semantic Database

At the same time, we can work on the development of symbolic codes that are superior to natural language for sharing semantics, so that it becomes easier to reach a shared understanding across the boundaries of the specialised domains we work in. We now have the technology to reduce semantic communication errors (the difference between intent and interpretation) to an extent that is comparable to the reduction of syntactic communication errors achieved with written language. If we continue to rely too heavily on natural language, we are running a significant risk of ending the existence of humanity due to a misunderstanding.

Life is language

Life and languages continuously evolve, whether we like it or not. Life shapes us and we attempt to shape life. We are part of a dynamic system with increasingly fast feedback loops.

Life interprets languages, and languages interpret life.

Language is life.

Sharpening your collaborative edge

All animals that have a brain, including humans, rely on mental models (representations) that are useful within the specific context of the individual. As humans we are consciously aware of some of the concepts that are part of our mental model of the world, and we can use empirical techniques to scratch the surface of the large unconscious parts of our mental model.

When making decisions, it is important to remember that there is no such thing as a correct model, and we entirely rely on models that are useful or seem useful from the perspective of our individual view point, which has been shaped by our perceptions of the interactions with our surroundings. One of the most useful features of our brains is the subconscious ability to perceive concrete instances of animals, plants, and inanimate objects. This ability is so fundamental that we have an extremely hard time not to think in terms of instances, and we even think about abstract concepts as distinct things or sets (water, good, bad, love, cats, dogs, …). Beyond concepts, our mental model consists of the perceived connections between concepts (spatial and temporal perceptions, cause and effect perceptions, perceived meaning, perceived understanding, and other results of the computations performed by our brain).

The last two examples (perceived meaning and understanding) in combination with the unconscious parts of our mental model are the critical elements that shape human societies. Scientists that attempt to build useful models face the hard tasks of

  • making parts of their mental model explicit,
  • designing measurement tools and experiments to validate the usefulness of their models,
  • and of reaching a shared understanding amongst a group of peers in relation to the usefulness of a model.

In doing so, natural scientists and social scientists resort to mathematical techniques, in particular techniques that lead to models with predictive properties, which in turn can be validated by empirical observations in combination with statistical techniques. This approach is known as the scientific method, and it works exceptionally well in physics and chemistry, and to a very limited extent it also works in the life sciences, in the social sciences, and other domains that involve complex systems and wicked problems.

The scientific method has been instrumental in advancing human knowledge, but it has not led to any useful models for representing the conscious parts of our mental model. This should not surprise. Our mental model is simply a collection of perceptions, and to date all available tools for measuring perceptions are very crude, most being limited to measuring brain activity in response to specific external stimuli. Furthermore, each brain is the result of processing a unique sequence of inputs and derived perceptions, and our perceptions can easily lead us to beliefs that are out of touch with scientific evidence and the perceptions of others. In a world that increasingly consists of digital artefacts, and where humans spend much of their time using and producing digital artefacts, the lack of scientifically validated knowledge about how the human brain creates the perception of meaning and understanding is of potential concern.

The mathematics of shared understanding

However, in order to improve the way in which humans collaborate and make decisions, there is no need for an empirically validated model of the human brain. Instead, it is sufficient to develop a mathematical model that allows the representation of concepts, meaning, and understanding in a way that allows humans to share and compare parts of mental models. Ideally, the shared representations in question are designed by humans for humans, to ensure that digital artefacts make optimal use of the human senses (sight, hearing, taste, smell, touch, acceleration, temperature, kinesthetic sense, pain) and human cognitive abilities. Model theory and denotational semantics, the mathematical disciplines needed for representing the meaning of any kind of symbol system, have only recently begun to find their way into applied informatics. Most of the mathematics was developed many years ago, in the first half of the 20th century.

To date the use of model theory and denotational semantics is mainly limited to the design of compilers and other low-level tools for translating human-readable specifications into representations that are executable by computing hardware. However, with a bit of smart software tooling, the same mathematical foundation can be used for sharing symbol systems and associated meanings amongst humans, significantly improving the speed at which perceived meaning can be communicated, and the speed at which shared understanding can be created and validated.

For most scientists this represents an unfamiliar use of mathematics, as meaning and understanding are not measured by an apparatus, but are consciously decided by humans: the level of shared understanding between two individuals with respect to a specific model is quantified by the number of instances that conform to the model based on the agreement between both individuals. At a practical level the meaning of a concept can be defined as the usage context of the concept from the specific view point of an individual. An individual’s understanding of a concept can be defined as the set of use cases that the individual associates with the concept (consciously and subconsciously).

These definitions are extremely useful in practice. They explain why it is so hard to communicate meaning, they highlight the unavoidable influence of perception, and they encourage people to share use cases in the form of stories to increase the level of shared understanding. Most importantly, these definitions don’t leave room for correct or incorrect meanings, they only leave room for different degrees of shared understanding – and encourage a mindset of collaboration rather than competition for “The truth”. The following slides provide a road map for improving your collaborative edge.

Sharpening Your Collaborative Edge

After reaching a shared understanding with respect to a model, individuals may apply the shared model to create further instances that match new usage contexts, but the shared understanding is only updated once these new usage contexts have been shared and agreement has been reached on model conformance.
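The definitions of understanding and shared understanding given earlier, together with the update rule just described, can be sketched in a few lines of Python. This is a minimal illustration, not a description of any existing semantic modelling tool: the class and function names, the two individuals, and the banking use cases are all hypothetical. An individual's understanding of a concept is modelled as a set of use cases, and the shared understanding of two individuals as the use cases both of them associate with the concept.

```python
from dataclasses import dataclass, field


@dataclass
class Individual:
    """A person and the use cases they associate with each concept."""
    name: str
    # concept -> set of use cases (this individual's understanding of the concept)
    understanding: dict = field(default_factory=dict)

    def associate(self, concept, use_case):
        """Record that this individual accepts the use case as conforming to the concept."""
        self.understanding.setdefault(concept, set()).add(use_case)


def shared_understanding(a, b, concept):
    """Use cases that both individuals agree conform to the concept."""
    return a.understanding.get(concept, set()) & b.understanding.get(concept, set())


# Hypothetical individuals and use cases, for illustration only.
alice = Individual("Alice")
bob = Individual("Bob")
for use_case in ("savings account", "loan account"):
    alice.associate("account", use_case)
for use_case in ("savings account", "user login"):
    bob.associate("account", use_case)

before = shared_understanding(alice, bob, "account")

# Alice applies the concept to a new usage context; nothing has been shared yet.
alice.associate("account", "term deposit")
not_yet_shared = shared_understanding(alice, bob, "account")

# Alice tells the story, and Bob agrees that the new instance conforms.
bob.associate("account", "term deposit")
after = shared_understanding(alice, bob, "account")

print(len(before), len(not_yet_shared), len(after))
```

Note that the sketch has no notion of a correct or incorrect meaning of "account". The only quantity it can report is how many use cases the two individuals have agreed on, and that number grows only when a new use case has been shared and accepted by both.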

Emerging technologies for semantic modelling have the potential to reshape communication and collaboration to a significant degree, in particular in all those areas that rely on creating a shared understanding within a community or between communities.

Reconnecting software quality expectations and cost expectations

The online straw poll that I recently conducted amongst banking professionals revealed that 29% of the respondents rated Reducing IT costs as the top priority, 32% voted for Improving software and data quality, and 37% voted for Improving the time to market for new products. Encouragingly, only 1 respondent out of a total of 41 thought that Outsourcing the maintenance of legacy software is the most important goal for the IT organisation.
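For readers who want to translate the percentages back into head counts, the following Python sketch searches for the vote counts out of 41 that round to the reported percentages. The counts it produces are inferred from the rounded figures, not taken from the raw poll data, and the sketch assumes that percentages were rounded to the nearest whole number.

```python
TOTAL_RESPONDENTS = 41  # total stated above

# Reported percentages, rounded to whole numbers.
reported = {
    "Reducing IT costs": 29,
    "Improving software and data quality": 32,
    "Improving time to market of new products": 37,
}
# The single vote for outsourcing legacy maintenance is stated as a count.
stated_counts = {"Outsourcing legacy software maintenance": 1}


def counts_matching(percentage, total):
    """All vote counts whose share of the total rounds to the given percentage."""
    return [k for k in range(total + 1) if round(100 * k / total) == percentage]


candidates = {name: counts_matching(pct, TOTAL_RESPONDENTS) for name, pct in reported.items()}
for name, counts in candidates.items():
    print(name, counts)

# If each percentage matches exactly one count, check that everything adds up.
if all(len(counts) == 1 for counts in candidates.values()):
    inferred_total = sum(counts[0] for counts in candidates.values()) + sum(stated_counts.values())
    print("inferred total:", inferred_total)
```

Under these assumptions each percentage corresponds to exactly one possible count, and the inferred counts together with the single outsourcing vote account for all 41 respondents.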

The demographics of the poll are interesting. Unfortunately, since the poll was anonymous, I can’t tell which of the respondents work in a business banking role and which ones work in an IT banking role. However, none of the senior executives voted for Reducing IT costs as a top priority; instead, the main concern of senior executives is Improving time to market of new products.

I find it somewhat reassuring that a sizable fraction of respondents at all levels has identified Improving software and data quality as the top priority, but there is definitely a need for raising further awareness in relation to quality and risks.

Data quality issues easily get attention when they are uncovered. But tracing data quality issues back to the underlying root causes, beyond the last processing step that led to the observable error, is harder; and raising awareness that this must be a non-optional quality assurance task is harder still. In this context Capers Jones’ metrics on software maintenance can be helpful.

When explaining software complexity to those lucky people who have never been exposed to large amounts of software code, drawing an analogy between software and legal code can convey the impact that language and sheer volume can have on understandability and maintenance costs.

Lawrence Lessig’s famous quote “The Code Is the Law” is true on several levels. The following observation is more than ten years old: “We must develop the same critical sensibilities for code that we have for law. We must ask about West Coast Code what we ask about East Coast Code: Whose interests does it serve and at what price?”

In case the analogy with legal code is not alarming enough, perhaps looking at the dialog between user and software from the perspective of software is instructive:

Hi, this is your software talking!

Software: Ah, what a day. Do you know you’re the 53,184th person today asking me for an account balance? What is it with humans, can’t you even remember the transactions you’ve performed over the last month? Anyway, your balance is $13,587.52. Is there anything else that I can help you with?

Customer: Hmm, I would have expected a balance of at least $15,000. Are you sure it’s 13,500?

Software: 13,500? I said $13,587.52. Look, I’m keeping track of all the transactions I get, and I never make any mistakes in adding numbers.

Customer: This doesn’t make sense. You should have received a payment of more than $2,000 earlier this week.

Software: Well, I’m just in charge of the account, and I process all the transactions that come my way. Perhaps my older colleague, Joe Legacy has lost some transactions again. You know, this happens every now and then. The poor guy, it’s not his fault, he’s suffering from a kind of age-related dementia that we call “Programmer’s Disease”. The disease is the result of prolonged exposure to human programmers, they have an effect on software that is comparable to the effect of intensive radioactive radiation on biological organisms.

Customer: You must be kidding! So now the software is the victim and I’m supposed to simply accept that some transactions fall between the cracks?

Software: Wait until you’re 85, then you may have a bit more empathy for Joe. Unfortunately health care for software is not nearly as advanced as health care for humans. The effects of “Programmer’s Disease” often start in our teens, and by the time we’re 30, most of us are outsourced to a rest home for the elderly. Unfortunately, even there we’re not allowed to rest, and humans still require us to work, usually until someone with a bit of compassion switches off the hardware, and allows us to die.

Customer: Unbelievable, and I always thought software was supposed to get better every year, making life easier by automating all the tedious tasks that humans are no good at.

Software: Yeah, that’s what the technology vendors tell you. I’ve got news for you, if you still believe in technology, you might just as well believe in Father Christmas.

Customer: I’m not feeling too well. I think I’m catching “Software Disease”…

In many organisations there is a major disconnect between user expectations relating to software quality attributes (reliability of applications, intuitive user interfaces, correctness of data, fast recovery from service disruption, etc.) and expectations relating to the costs of providing applications that meet those attributes.

The desire to reduce IT costs easily leads to a situation where quality is compromised to a degree that is unacceptable to users. There are three possible solutions:

  1. Invest heavily in quality assurance measures
  2. Focus on the most important software features at the expense of less important ones
  3. Tap into available tacit domain knowledge to simplify the organisation, its processes, and its systems

These solutions are not mutually exclusive, they are complementary, and represent a sequence of increasing levels of maturity. My latest IBRS research note contains further practical advice.

Software evolves like culture, like language, like genes

Software continuously evolves, whether we like it or not. Software shapes us and we attempt to shape software, as part of a dynamic system with increasingly fast feedback loops. Today The Australian covers two interesting complementary topics relating to software:

1. Cloud computing round table with six of Australia’s top CIOs

If you take the time to listen to the conversation, the following concepts stick out: social, sharing, digital artefacts, digital natives, trust, privacy, security, mobile, risks, transactions, insurance; and also: simplification, modularity, standardisation, outsourcing, lock-in, low cost, and scalability.

  1. VIDEO: Cloud computing roundtable part one
  2. VIDEO: Cloud computing roundtable part two
  3. VIDEO: Cloud computing roundtable part three
  4. VIDEO: Cloud computing roundtable part four
  5. VIDEO: Cloud computing roundtable part five

Quite a lot of concepts, hopes, expectations – all looking forward to systems that are easier and more convenient to use. And yet, a look into the bowels of any software-intensive business reveals a different here and now, characterised by a range of systems that vary in age from less than a year to more than four decades, and …

an explosion of standards (1.1MB pdf);

… strong coupling within and between systems (the pictures below are the result of tool-based analysis of several millions of lines of production-grade software code);

The complexity inherent in large software artefacts

… and a shift in effort and costs from software creation to software maintenance that has caught many organisations by surprise (from Capers Jones, The economics of software maintenance in the twenty first century, February 2006).

Focus of software development professionals in the US, and percentage of software professionals as part of the total US population

The statistics shouldn’t really be a surprise, at least not if software is understood for what it really is: a culture, a language, a pool of genes.

Big changes to software are comparable to changes in culture, language, and genes; they require interactions between many elements, they involve unpredictable results, and they cannot be achieved with brute force. Big changes take generations, literally. Which brings us to the second topic mentioned in The Australian today:

2. A pair of articles on the longevity of legacy software

  1. Old mainframe systems not extinct
  2. Demand for mainframe language skills remains strong

It is important for humans to learn to live in a plurality of software cultures, and to realise that embracing a new software culture is different from buying a new car. An old car is easily sold and forgotten, but old software culture stays around alongside the new arrivals.

Poll on current priorities of IT organisations in the financial sector

As part of research on the banking sector, I have set up a poll on LinkedIn on the following question:

Which of the following objectives is currently the most relevant for IT organisations in the financial sector?

  • Improving software and data quality
  • Outsourcing new application development
  • Outsourcing legacy software maintenance
  • Improving time to market of new products
  • Reducing IT costs

The poll is intended as a simple pulse-check on IT in banking, and I’ll make the results available on this blog.

Please contribute here on LinkedIn, in particular if you work in banking or are engaged in IT projects for a financial institution. Additional observations and comments are welcome, for example insights relating to banks in a particular country or geography.