Photo: Gerardo Garcia for tangiblemode



Who has ever really listened to a word?

When it comes to listening to words, our ears tend to focus on a limited range of decoding processes. From meaning to subliminal clues of tone and intensity, most of the cognitive effort goes into ‘understanding’ the speaker. From a broader aural point of view, that is a rather poor way of listening, even to a single spoken word. Spoken words are not just symbols (such as those flat, typed words on a screen or on paper) but real things, physical objects, living events in our three-dimensional world.

So far, little attention has been paid to the richness of sound when speech is digitally broadcast or streamed. Even less so in e-learning audiovisual contexts, where sound quality most often plays a secondary role and surrenders to poor production and/or heavy digital compression. Most web multimedia players are built to prioritise visual quality by default. ‘Minimum viable quality’ for audio normally applies as soon as the blunt threshold of ‘intelligible’ speech is reached. Again, this allows for nothing more than ‘decoding’ the meaning of words…


Actually, there is much more to listen to and discover in spoken words. Real words are placed in a 3D space: they have location, size, width, projection… In short, words, like real physical objects, have a strong presence. They fill our space, not just our time.

Consider the following list of spoken words. Where does that voice come from? Could you locate it within a virtual 3D* space surrounding your own body?

[*Please note: use your usual earphones or headphones to access the 3D sound, which is not intended for loudspeakers. Set a medium volume, never too loud, and set the video quality to HD 1080p.]

With eyes closed…


…or with your eyes open:

Monday or Tuesday | example | 105



. . .


Once we start listening to words-in-space, we might also start listening to speech as a sort of ‘music’, exposing our attention to a waterfall of timings and rhythms that, actually, had always been there…

Now consider the following “word-shower”, a string of word-lists. Pay close attention to things like the time gap between consecutive words within a list, and the changing rhythms and slight overlaps produced by different speeds and speech timings.

You can either open or close your eyes:

Monday or Tuesday | example | 116_107


Throughout the word-shower above, it is possible to feel a certain relaxation ‘curve’ as the so-called “cognitive load” decreases. Cognitive load could be defined as the amount of mental effort we need in order to process (understand, digest, even enjoy) the total audio/visual/verbal information received within a specific length of time. In this example, the ratio between the number of words in a list and the duration of that list would provide an acceptable, if very basic, measure of “cognitive load”.
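The basic ratio described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the `WordList` structure, the `word_density` function, the placeholder words and the durations in seconds are all hypothetical and are not taken from the recordings.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WordList:
    """One spoken list in a word-shower (hypothetical structure)."""
    words: List[str]
    duration_s: float  # assumed playback length of the list, in seconds


def word_density(word_list: WordList) -> float:
    """Words per second: the very basic load measure described in the text."""
    if word_list.duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return len(word_list.words) / word_list.duration_s


# Placeholder lists: same duration, fewer and fewer words (assumed values).
shower = [
    WordList(["word"] * 8, 4.0),
    WordList(["word"] * 4, 4.0),
    WordList(["word"] * 2, 4.0),
]

densities = [word_density(wl) for wl in shower]
for index, density in enumerate(densities, start=1):
    print(f"list {index}: {density:.2f} words per second")
```

A falling sequence of values would correspond to the relaxation ‘curve’ mentioned above. The measure ignores everything else that shapes mental effort (word difficulty, overlap between voices, visual content), so it should be read as a rough indicator at most.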

From the centre of the word-shower onwards, you will probably start perceiving the meaning of individual, somewhat disconnected words while the sense of different word streams (sets of lists) lingers. In the very last list, the gap between words is such that you have a lot of ‘spare time’ to fill in between. That could provoke different kinds of on-the-fly perceptions: thoughts, short memories or… perhaps a sense of emptiness, nothing at all.

. . .

meaning, density & cognitive load

Far from disconnected, every word-list shown encompasses a certain unit of meaning. We could consider each of them a sort of ‘micro-chapter’ in development. They can also be seen as ‘seeds’ to grow or, in any case, small linguistic artifacts ready to develop… You might even like to know that those fragments come from a classic literary piece. At this point, the title and its well-known author don’t need to be revealed yet.

The Cognitive Load Theory, established decades ago and further developed for optimising instructional design and e-learning, doesn’t fully account for the special type of extended, hybrid perception we are aiming at.

We intend to place all this somewhere in between concise meaning/s and a more creative or aesthetic appreciation of language.

Let’s go back to listening and make a leap forward. We started this intro by talking about ‘decoding’ the meaning of a single word, understanding a single voice, facing one speaker at a time. However, our ears and brain, with proper training, can come to process two, three or even more voices at once. In this case, the cognitive strategy in place is closer to musical polyphony than to any other field.

Consider the next string and get ready to split your attention into halves, thirds, fourths, fifths or even sixths…

Monday or Tuesday | example | 102_602


Something curious happens here. It seems that, despite some extra cognitive load and the split between voices, meaning eventually emerges as a single unit. Could you tell? Notice that this takes place through 6 consecutive “stages” of meaning development. More and more detail is shown, as if meaning progressively increased its resolution. Every stage (#1, #2, #3, etc.) is named after the number of joint speech particles (the so-called n-grams) between silences. “Stage #1” is made up of single words. A progressive addition of particles (articles, prepositions, nouns, verbs, adjectives…) leads to strings of fully-formed sentences. The process peaks at “stage #6” where, in this case, a fully formed version of the text-unit (verse, paragraph, excerpt) is presented in a completely fluent state.
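One possible way to picture the stages is to group a text into runs of n words, with a silence between runs. The sketch below assumes fixed-size, non-overlapping groups, which is our simplification: the text above does not say how the boundaries between particles are actually chosen, and in the examples stage #6 is a fully fluent text rather than a series of six-word groups. The `stage_chunks` function and the sample sentence are hypothetical.

```python
from typing import List


def stage_chunks(text: str, n: int) -> List[str]:
    """Group the words of `text` into consecutive runs of n words.

    Assumption: stage #n simply means n joint words between silences.
    The last run may be shorter when the word count is not a multiple of n.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


# Placeholder sentence, not a fragment of the literary piece used in the videos.
sample = "the quiet voice moves slowly across the room tonight"

for stage in range(1, 7):
    # A slash stands for the silence between consecutive runs.
    print(f"stage #{stage}: " + " / ".join(stage_chunks(sample, stage)))
```

Read from stage #1 to stage #6, the printed lines show silences becoming rarer and word groups longer, which is one way to visualise meaning ‘increasing its resolution’.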

. . .


We could even move further, to a somewhat ‘advanced level’. Let’s call it the “super-fluent” state. By progressively increasing word density and text complexity in parallel, we can boost the quality of training in both listening and reading. The top state would point towards the art of memory, the kind of artistic skill required of professionals such as stage actors.

Monday or Tuesday | example | 116_616


Speakers, listeners, readers, language learners, academics, lecturers, creative readers, actors: all of them, according to their specific needs, are equally well placed to ‘grow’ whichever texts they choose, in space and time, out of single word-lists spreading out across a virtual 3D audio/visual space.

. . .

a starting point

This sort of walk-through, peppered with some 3D-audio and multiscreen visuals, was intended to draw your attention to spoken words as powerful sound objects in the real world with significant cognitive implications (attention, perception, memory, aesthetics). Some of these ideas are currently being turned into highly innovative tools for e-learning and aesthetic enjoyment: Augmented Reality applied to arts and education content.

Originally published in L-iterations (Jul 2, 2017).