
book reviews



Steven Mithen: The Singing Neanderthals:
the origins of music, language, mind and body
(Weidenfeld & Nicolson: 2005)


“Music and language are universal features of human society. They can be manifest vocally, physically, and in writing; they are hierarchical, combinatorial systems which involve expressive phrasing, and are reliant on rules that provide recursion and generate an infinite number of expressions from a finite set of elements. Both communication systems involve gesture and body movement. In all of these regards, they may well share what Douglas Dempster calls some ‘basic cognitive stuff’. Yet the differences are profound. Spoken language transmits information because it is constituted by symbols, which are given their full meaning by grammatical rules; notwithstanding formulaic phrases, linguistic utterances are compositional. On the other hand, musical phrases, gestures, and body language are holistic; their ‘meaning’ derives from the whole phrase as a single entity. Spoken language is both referential and manipulative; some utterances refer to things in the world, while others [aim to] make the hearer think and behave in certain ways. Music, on the other hand, is principally manipulative, because it induces emotional states and physical movement by entrainment. So, where does this leave us with regard to the relationship between music and language? ...Music is too different from language to be adequately explained as an evolutionary spin-off [- or vice versa - and ]...while music and language have their own unique properties, they still share more features than one would expect from entirely independent evolutionary histories. The remaining possibility is that there was a single precursor for both music and language: a communication system that had the characteristics that are now shared by music and language, but that split into two systems at some date in our evolutionary history.... It is my task in this book...not only to explain the origin of music and language, but also to provide a more accurate picture of the life and thought of our human ancestors.”
(Mithen, pp.25-6)

Whilst the evolutionary background of many of our complex behaviours has been the subject of much debate of late - language being only the most obvious case - music until very recently has been largely ignored, or dismissed as an inconsequential by-product of language. Yet, as Mithen convincingly argues, a mere by-product is extremely unlikely to have gained such an incredible hold upon our emotional lives...for emotions are phylogenetically old - key determinants of our choices - and anything capable of stirring them so deeply must have its own evolutionary raison d’être, even if the likes of Steven Pinker cannot recognize it...

But, of course, Pinker is a linguist - and a good Chomskyan, at that - and so trained to dismiss an enormous amount of evidence which does not accord with his prejudices...including quite a lot of important elements of language itself, elements which Steven Mithen would argue are central to the evolution of both language and music. For this book offers the best argument yet for the nature and development of what is usually described as “proto-language”...albeit, as Mithen argues, this term neglects the strongly musical nature of such communication, and hence encourages the neglect of much of importance. However, to deal properly with his arguments, we had better start with the definitions...and their lacunae:


“Bruno Nettl, the distinguished ethnomusicologist, defined music as ‘human sound communication outside the scope of language’. That is perhaps as good a definition as we can get.... The definition of language is, perhaps, more straightforward: a communication system composed of a lexicon - a collection of words with agreed meanings - and a grammar - a set of rules for how words are combined to form utterances. But, even this definition is contentious. Alison Wray, the champion of holistic proto-language, has argued that a considerable element of spoken language consists of ‘formulaic’ utterances - prefabricated phrases that are learnt and used as a whole. Idioms are the most obvious example, such as ‘straight from the horse’s mouth’ or ‘a pig in a poke’. Unlike the other sentences within this paragraph, the meaning of such phrases cannot be understood by knowledge of the English lexicon and grammatical rules. Wray and certain other linguists argue that the ‘words and rules’ definition of language places undue emphasis on the analysis of written sentences, and pays insufficient attention to the everyday use of spontaneous speech, which often contains very little corresponding to a grammatically correct sentence.... [Moreover,] traditional linguistics has neglected to study the rhythms and tempos of verbal interaction - the manner in which we synchronize our utterances when having a conversation. This is a fundamental and universal feature of our language use, and has an evident link with communal music-making.... [Furthermore,] even when sign language is taken out of the equation, many would argue that it is equally artificial to separate language from gesture. Movements of the hands or the whole body very frequently accompany spoken utterances...[and] the majority are quite spontaneous. 
Speakers are often unaware that they are gesticulating, and many find it difficult to inhibit such movements - in a manner similar to people’s inability to stop moving their bodies when they hear music.”
(Mithen, pp.11-17)

“The majority of spontaneous gestures used by modern humans are iconic, in the sense that they directly represent whatever is being verbally expressed, and..one striking finding is that everyone appears to use a similar suite of spontaneous gestures, irrespective of what language they speak. [Moreover,] gestures play a complementary role to spoken utterances, rather than being merely derivative or supplementary. So, gestures are not used simply to help the speaker retrieve words from his or her mental dictionary; they provide information that cannot be derived from the spoken utterance alone...[and] are particularly important for conveying information about the speed and direction of movement, about the relative position of people and objects, and about the relative size of people and objects.... The critical role of gesture in human communication...is perhaps best expressed in the words of David McNeill, whose 1992 book, Hand and Mind, pioneered the notion that gesture can reveal thought. McNeill explained that ‘Utterances possess two sides, only one of which is speech; the other is imagery, actional and visuo-spatial. To exclude the gestural side, as has been traditional, is tantamount to ignoring half of the message out of the brain.’ Thus body movement appears to be as crucial to language as it is to music...[even though] we are very poor at consciously attending to and using body language today. I suspect that this was very different for our non-linguistic ancestors.... The true significance of body language can perhaps be appreciated by recognizing that whereas speaking is an intermittent activity - it has been estimated that the average person talks for no more than twelve minutes a day - our body language is a continuous form of communication.... [For our ancestors, such movements] would have placed different nuances of intent or meaning onto the same basic holistic utterance/gesture. 
[Rudolf] Laban gives the simple example of the expressive range of gestures that can accompany the word ‘no’. He explains that one can ‘say’ this with movements that are pressing, flicking, wringing, dabbing, thrusting, floating, slashing, or gliding, each of which says ‘no’ in a quite different manner. Once such gestures are integrated into a sequence of body movements and vocalizations, once some are exaggerated, repeated, embedded within each other, one has both a sophisticated means of self-expression and communication, and a pattern of movements that together can be observed as pure dance alone.”
(Mithen, pp.155-7)

Amid the formalist obsession with grammar that has plagued mainstream linguistics since Chomsky, such aspects of language are relegated to the sidelines...with the consequence that many of the most interesting - and, arguably, ancient - aspects of language are then perversely ignored by those attempting to theorize its evolution. And, when the evidence speaks loudly of the priority of prosody in the creation of meaning - as follows here - the usual tactic is to return to the ineffability of grammar, without attempting to answer the question:


“Both music and language have the property of expressive phrasing. This refers to how the acoustic properties of both spoken utterances and musical phrases can be modulated to convey emphasis and emotion. It can apply to either a whole utterance or phrase, or to selected parts. The word ‘prosody’ refers to the melodic, [timbral,] and rhythmical nature of spoken utterances; when the prosody is intense, speech sounds highly musical. Prosody plays a major role in the speech directed towards infants; indeed, whether the utterances ‘spoken’ to very young babies should be considered as language or as music is contentious. [Significantly,] although the content of language can be used to express emotion, it is subservient to the prosody. I can, for instance, state that ‘I am feeling sad’. The words alone, however, may be unconvincing. If I say that ‘I am feeling sad’ in a really happy voice, priority will be given to the intonation, and the inference drawn that I am, for some unknown reason, being ironic.”
(Mithen, p.24)

One of the most fascinating - and revealing - studies to be cited by Mithen examines so-called “musical savants”, whose outstanding abilities developed either without language, or in concert with severely restricted language abilities. Such people tend to share many similarities - perfect pitch (also extremely common in pre-linguistic infants), a strong tendency to echolalia (combining extreme sensitivity to sounds with an inability to attach symbolic meanings to them) and, perhaps most surprisingly, an aversion to rote playing, and a highly sophisticated level of musical understanding right across all areas tested, easily comparable to that of well-trained professional musicians. As Mithen underlines, this radically separates them from other types of savants, whose skills appear to be strongly circumscribed, suggesting that music as a whole may have far deeper evolutionary roots than other savant skills...

However, this does not mean that musical awareness cannot be dismembered by brain damage, for it is clear from the clinical record that timbre, rhythm and melody are all dissociable with specific types of brain damage - meaning that they are to a large extent ‘modular’ in nature, as is normal with regard to the earlier stages of sensory inputs into the brain. However, this should not be confused with modular claims regarding higher brain functions - although it often is - for which there is little neurobiological support. Interestingly, although Mithen has championed one version of the latter - his “cognitive fluidity” hypothesis - and still argues for it here, its status in this book is rather that of a fifth wheel, since he has now come around to supporting other, more strongly based hypotheses regarding the inventive stagnation of erectus cultures (particularly Donald’s “mimetic culture”) which obviate the need for this claim. And prosody, lest we forget, is a key element in the making of mimetic culture...


“[It is] the variations in intonation - dynamics, speed, timbre and so forth - that infuse speech with emotional content, and often influence its meaning. Prosody, as this is called, can sound very music-like, especially when exaggerated, as in the speech used to address young children. And it has a musical equivalent in melodic contour - the way pitch rises and falls as a piece of music is played out.... A study published in 1998 by Isabelle Peretz and her colleagues is particularly important, because it explicitly attempts to identify whether the same neural network within the brain processes sentence prosody and melodic contour, or whether independent systems are used.... The study examined two individuals who were suffering from amusia, but who appeared to differ in their abilities to perceive prosody in speech and melodic contour in music.... Peretz and her colleagues designed a very clever set of tests.... They began with sixty-eight spoken sentences, recorded as pairs, that were lexically identical, but differed in their prosody and hence their meaning.  For instance, the sentence ‘he wants to leave now?’ was spoken first as a question, by stressing the final word, and then as a statement, ‘he wants to leave now’. These are known as ‘question-statement’ pairs. ‘Focus-shift’ pairs were also used, in which the emphasis of the sentence was altered - for example, ‘take the train to Bruges, Anne’ was paired with ‘take the train to Bruges, Anne’.... Sentences of a third type formed ‘timing-shift’ pairs, where the location of a pause was varied so as to alter the meaning. For instance, ‘Henry, the child, eats a lot’ was paired with ‘Henry, the child eats a lot’.... [From the results of their tests,] Peretz and her colleagues concluded that there is indeed a stage where the processing of language and of melody utilize a single, shared neural network...[which is] used for holding pitch and temporal patterns in short-term memory.”
(Mithen, pp.55-8)

This result regarding prosody, however, is atypical, in that most aspects of musical awareness appear to be separate from their counterparts in language - as revealed by lesion studies - despite the undoubted fact that brain-imaging results suggest that they do share the same networks! What this strongly suggests, I would claim, is exactly what Mithen is arguing in this book - that the two communication systems were originally one, that it was much closer to music than to language, and that they have only comparatively recently divided...with language, in consequence, “colonizing” neighboring neural areas where its structural needs differed sufficiently. And, once we remove Chomsky’s blinkers, and admit the full range of language-related behaviour as evidence, such a model looks to be very strongly supported...


“’Baby-talk’, ‘motherese’, and ‘infant-directed speech’ (IDS) are all terms used for the very distinctive manner in which we talk to infants who have not yet acquired full language competence - that is, from birth up until around three years old. The general character of IDS will be well known to all: a higher overall pitch, a wider range of pitch, longer ‘hyperarticulated’ vowels and pauses, shorter phrases, and greater repetition than are found in speech directed to older children and adults. We talk like this because human infants demonstrate an interest in, and sensitivity to, the rhythms, tempos, and melodies of speech long before they are able to understand the meanings of words. In essence, the usual melodic and rhythmic features of spoken language - prosody - are highly exaggerated, so that our utterances adopt an explicitly musical character.... In general, the exaggerated prosody of IDS helps infants to split up the sound stream they hear, so that individual words and phrases can be identified. In fact, mothers of young children fine-tune the manner in which they use prosody to their infants’ current linguistic level. One would be mistaken, however, to believe that the prosody of IDS is primarily intended to help children accomplish the truly astounding task of learning language.”
(Mithen, pp.69-71)

“Ann Fernald has identified four developmental stages of IDS, of which only the last is explicitly about facilitating language acquisition.... For newborn and very young infants, IDS serves to engage and maintain the child’s attention, by providing an auditory stimulus to which it responds. Relatively intense sounds will cause an orienting response; sounds with a gently rising pitch may elicit eye opening; while those with an abrupt rising pitch will lead to eye closure and withdrawal. With slightly older infants...it now begins to modulate arousal and emotion. When soothing a distressed infant, an adult is more likely to use a low pitch and falling pitch contours; when trying to engage attention and elicit a response, rising pitch contours are more commonly used. If an adult is attempting to maintain a child’s gaze, then her speech will most likely display a bell-shaped contour. Occasions when adults need to discourage very young infants are rare; but when these do arise, IDS takes on a similar character to the warning signals found in non-human primates - brief and staccato, with steep, high-pitched contours. As a child ages, IDS enters its third stage, and its prosody takes on a more complex function: it now not only arouses the child, but also communicates the speaker’s feelings and intentions.... Owing to its exaggerated prosody, IDS is a more powerful medium than adult-directed speech for communicating intent to young children...[as] in IDS ‘the melody is the message’....[Finally, when] young children begin to understand the meaning of words, further subtle changes to IDS occur...[in which] the specific patterns of intonation and pauses facilitate the acquisition of language itself.”
(Mithen, pp.71-2)

“The idea that IDS is not primarily about language is supported by the universality of its musical elements. Whatever country we come from, and whatever language we speak, we alter our speech patterns in essentially the same way when we talk to infants.... If the exaggerated prosody of IDS were no more than a language-learning device, one would expect to find the IDS of peoples speaking Xhosa, Chinese and Japanese [- all tonal or pitch-accent languages -] to be quite different from that of those speaking English, German, and Italian. That this is not the case strengthens the argument that the mental machinery of IDS belongs originally to a musical ability concerned with regulating social relationships and emotional states.... [Furthermore, there is a similarly] striking degree of cross-cultural unity in the melodies, rhythms, and tempos [of lullabies]”
(Mithen, pp.72-9)

Personally, I find the evidence from IDS to be one of the most compelling portions of Mithen’s argument, since it so clearly situates a holistic, near-musical communication system as developmentally prior to full-blown language, yet also - significantly - independent of it. Combined with the extremely strong parallels with the communication systems of our most vocal evolutionary cousins, I find it difficult to see how anyone bar what we might term “linguistic snobs” could fail to see the overwhelming support for a holistic proto-language - especially since there is precisely zero evidence for anything else....


“Studies of gelada monkeys have been important in understanding how acoustically variable calls, many of which sound distinctly musical, mediate social interactions.... [These include] ‘fast rhythms, slow rhythms, staccato rhythms, glissando rhythms; first-beat accented rhythms, end-accented rhythms; melodies that have evenly spaced musical intervals covering a range of two or three octaves; melodies that repeat exactly, previously produced, rising or falling musical intervals; and on and on: geladas vocalize a profusion of rhythmic and melodic forms.’ ...After making a detailed description of their use, and exploring the contexts in which they arose, [Bruce Richman] concluded that they performed much the same function as the rhythm and melody that is found in human speech and singing. In essence, the geladas used changes in rhythm and melody to designate the start and end of an utterance; to parse an utterance, so allowing others to follow along; to enable others to appreciate that the utterance was being addressed to them; and to enable others to make their own contribution at the most appropriate moment. In fact, Richman’s interpretation of how geladas use rhythm and melody appears strongly analogous to its use in the early and non-linguistic stages of infant-directed speech.”
(Mithen, pp.109-10)

Moreover, other studies have also found a nearly exact match between the type of pitch changes which mark the emotional expressions of humans and macaque monkeys - further evidence for evolutionary continuity in the prosodic aspects of speech.


“The communication systems of monkeys and apes remain little understood. It was once believed that their vocalizations were entirely involuntary, and occurred only in highly emotional contexts...[however, it is now known they] are often deliberate, and play a key role in social life.... There are some common features. First, none of the vocalizations or gestures are equivalent to human words. They lack consistent and arbitrary meanings, and are not composed into utterances by a grammar that provides an additional level of meaning.... They are holistic. Secondly, the term ‘manipulative’ is also generally applicable...[since] monkeys and apes probably simply do not appreciate that other individuals lack the knowledge and intentions that they themselves possess. Rather than being referential, their calls and gestures...are trying to generate some form of desired behaviour in another.... A third feature may be applicable to the African apes alone: their communication systems are multi-modal, in the sense that they use gesture as well as vocalization. In this regard, they are similar to human language.... Finally, a key feature of the gelada and gibbon communication systems is that they are musical in nature, in the sense that they make substantial use of rhythm and melody, and involve synchronization and turn-taking. Again, depending on how one would wish to define ‘musical’, this term could be applied to non-human primate communication systems as a whole. The holistic, manipulative, multi-modal, and musical [acronym: ‘Hmmmm’] characteristics of ape communication systems provided the ingredients for that of the earliest human ancestors, living in Africa 6 million years ago, from which human language and music ultimately evolved.”
(Mithen, pp.120-1)

And so, with holistic communication systems firmly in place in both the evolutionary and developmental tracks - and with a range of other evidence, as we have seen, all pointing in the same direction - Mithen, in the second half of his book, goes on to flesh out the evolutionary story of what he (unfortunately) wants to call “Hmmmm”, an acronym for Holistic, Manipulative, Multi-Modal & Musical communication...a term I am fairly certain will not catch on, being extremely awkward to pronounce, as well as difficult to distinguish from its immediate evolutionary successor amongst Early Humans, “Hmmmmm”...in which “Mimetic” is added to the list!

Still, acronyms aside, the arguments are strong, the evidence (particularly when all factors are considered) surprisingly clear, and the result is by far the most impressive approach to communications amongst our ancestors we have yet seen.


“We can think of sounds emitted from the mouth as deriving from ‘gestures’, each created by a particular position of the so-called articulatory machinery - the muscles of the tongue, lips, jaw, and velum (soft palate). When we say the word ‘bad’, for instance, we begin with a gesture of the lips pursed together, whereas the word ‘dad’ begins with a gesture involving the tip of the tongue and the hard palate. So, each of our syllables relates to a particular oral gesture. The psychologist Michael Studdert-Kennedy argues that such gestures provide the fundamental units of speech, just as they form the units of ape vocalizations today, and hominid vocalizations in the past. As motor actions, such gestures ultimately derive from ancient mammalian capacities for sucking, licking, swallowing, and chewing. These began the neuroanatomical differentiation of the tongue that has enabled the tongue tip, tongue body, and tongue root to be used independently from each other.... Consequently, even though we should think of hominid vocalizations as holistic in character, they must have been constituted by a series of syllables derived from oral gestures. These, therefore, had the potential ultimately to be identified as discrete units...which could be used in a compositional language.”
(Mithen, p.129)

“We should envisage each holistic utterance as being made of one, or more likely a string, of [these] vocal gestures...expressed in conjunction with hand or arm gestures, and perhaps body language as a whole.... In addition, particular levels of pitch, tempo, melody, loudness, repetition, and rhythm would have been used to create particular emotional effects for each of these ‘Hmmmm’ utterances. Recursion, the embedding of one phrase within another, is likely to have become particularly important, in order to express and induce emotions with maximum effect.”
(Mithen, pp.149-150)

“[There exist] two differing conceptions of proto-language - compositional and holistic...[and] monkey and ape vocalizations...are holistic, and provide a suitable evolutionary precursor for the type of holistic proto-language proposed by Alison Wray. But they provide no foundation for a [compositional] ‘words without grammar’ type of proto-language, as proposed by Derek Bickerton.... [Furthermore,] while [the latter] may have been adequate for communicating some basic observations about the world, it would have been unsuitable for what Alison Wray describes as ‘the other kind of messages’ - those relating to physical, emotional, and perceptual manipulation. It would not, for instance, have been suitable for the type of subtle and sensitive communication that is required for the development and maintenance of social relationships...the principal selective pressure for the evolution of vocal communication in early hominids. It is important to appreciate that Homo ergaster would have lived in socially intimate communities within which there would have been a great deal of shared experience and knowledge...[and] relatively slight demands for information exchange, compared with our experience today.... [Therefore,] there would have been limited, if any, selective pressure within their society for a ‘creative language’, one that could generate new utterances in the manner of the compositional language upon which we depend.”
(Mithen, pp.147-8)

As Mithen argues - dovetailing nicely with the work of both Jonathan Kingdon and Frank R. Wilson - the earliest shifts away from ape standards were simply by-products of the australopithecines’ erect stance, with no need for specific selection for communicative capabilities. However, the more open ground they (very) gradually ventured onto would have eventually forced them to congregate in larger groups (like geladas, incidentally), placing increased pressure on vocal communication and social intelligence. The results, however, were not straightforward, since many intertwined factors were involved:


“The increased range and diversity of vocalizations made possible by the new position and form of the larynx, and changes in dentition and facial anatomy in general, would certainly have enhanced the capacity for emotional expression and the inducing of emotions in others. But the musical implications of bipedalism go much further than simply increasing the range of sounds that could be made. Rhythm, sometimes described as the most central feature of music, is essential to efficient walking, running and, indeed, any complex coordination of our peculiar bipedal bodies. Without rhythm, we couldn’t use these effectively: just as important as the evolution of knee joints and narrow hips, bipedalism required the evolution of mental mechanisms to maintain the rhythmic coordination of muscle groups....  The key point is that, as our ancestors evolved into bipedal humans so, too, would their inherent musical abilities evolve - they got rhythm. One can easily imagine an evolutionary snowball occurring as the selection of cognitive mechanisms for time-keeping improved bipedalism, which led to the ability to engage in further physical activities that in turn required time-keeping for their efficient execution.... It may, indeed, be in this connection that the phenomenon of entrainment - the automatic movement of body to music - arose.”
(Mithen, pp.150-3)

“Whereas we should imagine the vocal communications of the australopithecines, Homo habilis and Homo rudolfensis as more melodious versions of those made by non-human primates today, those made by members of the Homo ergaster species, such as the Nariokotome boy, must have been very different, with no adequate analogy in the modern world.... I must, however, be careful not to exaggerate the musicality and communication skills of Homo ergaster, as this species marks only the beginning of an evolutionary process. The holistic phrases used by Homo ergaster - generic forms of greetings, statements, and requests - are likely to have been small in number, and the potential expressiveness of the human body may not have been realized until later, bigger-brained species of Homo had evolved. Moreover, Homo ergaster certainly lacked the anatomical adaptations for fine breathing control that are necessary for the intricate vocalizations of modern human speech and song.... The Nariokotome specimen did have a relatively large brain, compared to the 450 cubic centimetres of living African apes and australopithecines, but this is primarily a reflection of that specimen’s large body size. It is not until after 600,000 years ago that the brain size of Homo increases significantly...[which] can best be explained by selection pressures for enhanced communication, resulting in a far more advanced form of ‘Hmmmm’ than that used by Homo ergaster.”
(Mithen, p.158)

“As well as imitating how animals move, Early Humans could have imitated their calls, along with the other sounds of the natural world. We know that traditional peoples, those living close to nature, make extensive use of onomatopoeia in their names for living things...[whilst] the study of animal names provides another clue to the nature of Early Human ‘Hmmmm’ utterances, by virtue of the phenomenon of...‘sound synaesthesia’...the mapping from one type of variable - size - onto another - sound. Sound synaesthesia was recognized by Otto Jespersen in the 1920s.... Jespersen noted that ‘the sound [i] comes to be easily associated with small, and [u,o] with bigger things’.... [Moreover,] onomatopoeia and sound synaesthesia may not be the only universal principles at work in the naming of animals. The bird names of the Huambisa tend to have a relatively large number of segments of acoustically high frequency, which appear to denote quick and rapid motion, or what [ethnobiologist Brent] Berlin calls ‘birdness’. In contrast, fish names have lower frequency segments, which have connotations of smooth, slow, continuous flow - ‘fishness’.... In general, it appears that we can intuitively recognize the names belonging to certain types of animals, in languages that are quite unfamiliar to us, by making an unconscious link between the sound of the word and the physical characteristics of the animal. This finding challenges one of the most fundamental claims of linguistics: that of the arbitrary link between an entity and its name...[and] the implications for Early Human ‘Hmmmm’ utterances are profound.”
(Mithen, pp.169-71)

“The key feature of [Early Human pre-linguistic communications] is that they would not have been constructed out of individual elements that could be recombined in a different order and with different elements, so as to make new messages. Each phrase would have been an indivisible unit, that had to be learned, uttered, and understood as a single acoustic sequence [like animal calls]. As Wray points out, the inherent weakness of a communication system of this type is that the number of messages will always be limited...[and] if holistic phrases were used with insufficient frequency, they would simply drop out of memory, and be lost. Similarly, the introduction of new phrases would be slow and difficult, because it would rely on a sufficient number of individuals learning the association.... The ‘Hmmmmm’ communication system would, therefore, have been dominated by utterances descriptive of frequent and quite general events...[and] would instigate and preserve conservatism in thought and behaviour in a manner that a language constituted by words and grammatical rules would not.”
(Mithen, pp.172-3)

One useful aspect of Mithen’s approach, here, is that he incorporates key aspects of the best current theories in this area, which - unfortunately - are all too often presented separately, with no attempt at synthesis. Thus Merlin Donald’s work on mimetic culture, Geoffrey Miller’s arguments on sexual selection, Ellen Dissanayake’s theories on IDS, and William Benzon’s approach to group bonding through music all make their way into Mithen’s synthesis, as he sifts through the evidence, looking for how different aspects of the evolutionary story may have played out. However, the key theorist throughout remains Alison Wray, and her ideas are particularly important with regard to the emergence of language and music from their holistic precursor....


“[Alison] Wray uses the term ‘segmentation’ to describe the process whereby humans began to break up holistic phrases into separate units, each of which had its own referential meaning and could then be recombined with units from other utterances, to create an infinite array of other utterances. This is the emergence of compositionality, the feature that makes language so much more powerful than any other communication system. Wray suggests that segmentation may have arisen from the recognition of chance associations between the phonetic segments of the holistic utterance, and the objects or events to which they were related. Once recognized, these associations might then have been used in a referential fashion to create new, compositional phrases.... The feasibility of Wray’s process of segmentation [is] enhanced when her own characterization of holistic proto-language is replaced by the rather more complex and sophisticated perspective I have developed, in the form of ‘Hmmmmm’. [For] the presence of onomatopoeia, vocal imitation, and sound synaesthesia would have created non-arbitrary associations...[and] significantly increased the likelihood that particular phonetic segments would eventually come to refer to the relevant entities, and hence to exist as words.... The likelihood would have been further increased by the use of gesture and body language, especially if a phonetic segment of the utterance regularly occurred in combination with a gesture pointing to some entity in the world. Once some words had emerged, others would have followed more readily, by means of the segmentation process Wray describes. The musicality of ‘Hmmmmm’ would also have facilitated this process, because pitch and rhythm would have emphasized particular phonetic segments, and thus increased the likelihood that they would become perceived as discrete entities with their own meanings....
This [is] the case with regard to language acquisition by infants: the exaggerated prosody of IDS helps infants to split up the sound stream.... The musicality of ‘Hmmmmm’ would, moreover, have also ensured that holistic utterances were of sufficient length, so that the process of segmentation would have some raw material to work with.... Further confidence in the process of segmentation derives from the use of computer models to simulate the evolution of language...[for Simon] Kirby’s simulations show that...the process of learning itself can lead to the emergence of grammatical structures. Hence, if there is such a thing as ‘Universal Grammar’, it may be the product of cultural transmission through a ‘learning bottleneck’ between generations, rather than of natural selection during biological evolution; ‘poverty of the stimulus’ becomes a creative force rather than a constraint on language acquisition.”
(Mithen, pp.253-6)

“Together, Wray and Kirby have helped us to understand how compositional language evolved from holistic phrases. However, they have also posed us with an unexpected problem: why did this only happen in Africa after 200,000 years ago? ...There are two possibilities, one relating to social life, and one to human biology. As regards the first, we should note initially that Kirby found holistic languages remain stable in those simulations in which learning-agents hear so much of the speaking-agent’s utterances that they learn every single association between symbol string and meaning. In other words, there is no learning bottleneck for language to pass through, and hence, no need for generalization.... This would indeed have been quite likely in the type of hominid and Early Human communities I have outlined.... The kick-start for [wider social and economic ties] may have been a chance genetic mutation - the second possible reason.... This may have provided a new ability to identify phonetic segments in holistic utterances.... We have already seen that some aspects of language are dependent on the possession of the specific gene FOXP2, the modern human version of which seems to have appeared in Africa soon after 200,000 years ago.... Indeed, it may be significant that those members of the KE family that were afflicted with a faulty version of the FOXP2 gene had difficulties not only with grammar, but also with...the segmentation of what sound to them like holistic utterances.”
(Mithen, pp.257-8)

“The compositional utterances that emerged from holistic phrases by a process of segmentation would have begun as mere supplements...the holistic utterances providing a cultural scaffold for the gradual adoption of words and new utterances structured by grammatical rules. Moreover, the first words may initially have been of primary significance to the speaker as a means to facilitate their own thought and planning, rather than a means of communication.... Talking to oneself is something that we all occasionally do, especially when we are trying to undertake a complex task. Children do this more than adults, and their so-called ‘private speech’ has been recognized as an essential part of cognitive development...[and] private speech may have been crucial in the development of a compositional language to sufficiently complex a state for it to become a meaningful vehicle for information exchange...a supplement to ‘Hmmmmm’ and, eventually, the dominant form of communication.... The brains of infants and children would have developed in a new fashion, one consequence of which would have been the loss of perfect pitch in the majority of individuals, and a diminution of musical abilities. Once the process of segmentation had begun, we should expect a rapid evolution of grammatical rules, building on those that would have been inherited from ‘Hmmmmm’. Such rules would have evolved by the process of cultural transmission in the manner that Kirby describes, and perhaps through natural selection leading to the appearance of genetically based neural networks enabling more complex grammatical constructions.”
(Mithen, pp.259-60)

Rather than a “big bang”, therefore, this theory would predict a gradual evolution of language-enabled modern behaviour...with this only becoming universal after a considerable transitional period had ended, due to the demographic shift provided by much denser populations - and hence social ties. This, as it happens, is exactly what the African record reveals. But, what of music in all of this...Mithen’s original concern in researching this book, before the densely interwoven histories of language and music took over?


“Music emerged from the remnants of ‘Hmmmmm’, after language evolved. Compositional, referential language took over the role of information exchange so completely that ‘Hmmmmm’ became a communication system almost entirely concerned with the expression of emotion, and the forging of group identities, tasks at which language is relatively ineffective. Indeed, having been relieved of the need to transmit and manipulate information, ‘Hmmmmm’ could specialize in these roles, and was free to evolve into the communication system we now call music. As the language-using modern humans were able to invent complex instruments, the capabilities of the human body became extended and elaborated...[but, still,] throughout history, we have been using music to explore our evolutionary past.... [However,] technological developments have served both to democratise the availability of music, and to create a musical elite...[through] musical complexity and then exclusion. When the technical level of what is defined as musicality is raised, some people will be defined as unmusical, and the very nature of music will become defined to serve the needs of an emergent musical elite.”
(Mithen, pp.266-71)

“Music...maintains many features of ‘Hmmmmm’, some quite evident, such as its emotional impact and holistic nature, others requiring a moment’s reflection. It is now apparent, for instance, why even when listening to music made by instruments rather than the human voice, we treat music as a virtual person, and attribute to it an emotional state and sometimes a personality and intention. It is now also clear why so much of music is structured as if a conversation is taking place within the music itself, and why we often intuitively feel that a piece of music should have a meaning attached to it, even though we cannot grasp what that might be.... [And] if IDS is one remnant of ‘Hmmmmm’, then another is the use of spontaneous gestures when speaking...[even if] the listener/watcher may be quite unaware that some of the information he/she is receiving is coming from the gesture rather than the words being heard. Spontaneous gestures maintain the key features of ‘Hmmmmm’ - they are holistic and often both manipulative and mimetic. Had we not evolved/developed language, we might be far more effective at inferring information from such gestures, and would have grown up in a culture where such gesturing was recognized as a key means of communication, rather than as a curious hangover from our evolutionary past.... Perhaps of most significance, however, is our propensity to use holistic phrases whenever the possibility arises.... One might argue that we use such formulaic phrases simply to reduce the mental effort.... But, to my mind, their frequency in our everyday speech reflects an evolutionary history of language that for millions of years was based on holistic phrases alone: we simply can’t rid ourselves of the habit.”
(Mithen, pp.275-7)

Steven Mithen’s The Singing Neanderthals is the essential book on the interlaced histories of language and music, and - in combination with the works of Merlin Donald, William Benzon, and Terrence Deacon - makes clear how we evolved such complex and deeply paradoxical skills in the first place. In direct contrast to the theories in vogue within mainstream linguistics, these writers are not afraid to explore all the relevant evidence, and their arguments make very real sense of the archaeological record...hardly surprising in Mithen’s case, we should note, as he is an archaeologist himself...

And, this evolutionary history proves itself to be highly relevant to a proper understanding of all of our communications, rather than simply of antiquarian interest. For, just as music is more than harmony and melody, so too language is (much) more than grammar and semantics. The impoverished notions of mainstream musicology and linguistics may attempt to convince us otherwise but, as Mithen shows us, these two forms are enormously richer than that, and - when this richness is properly assessed - it becomes clear just how we should understand both them, and their evolutionary forerunner. And, for this, and much else, we have Steven Mithen to thank...


“‘Hmmmmm’ communication would have involved dance-like performance, and this might explain an intriguing feature of the Neanderthal archaeological record. When either the whole or a substantial part of a Neanderthal-occupied cave is excavated, the debris they left behind is typically found in a very restricted area. Paul Mellars, a Cambridge archaeologist with a particularly detailed and extensive knowledge of Neanderthal archaeology, has remarked upon this pattern...[and] provides two possible interpretations for each case: either the ‘empty’ areas were used for sleeping, or the groups within the caves had been very small. There is, of course, a third: those empty areas could have been used for performance.... [But] trying to understand the...world of a Neanderthal is challenging, owing to the limitations of our imaginations, the inevitable speculation involved, and the restricted evidence on which these speculations must be based. Also, I believe that all modern humans are relatively limited in their musical abilities, when compared with the Neanderthals. This is partly because the Neanderthals evolved neural networks for the musical features of ‘Hmmmmm’ that did not evolve in the Homo sapiens lineage, and partly because the evolution of language has inhibited the musical abilities inherited from the common ancestor we share with Homo neanderthalensis. Occasionally, however, we have an intense musical experience that may capture some of the richness that was commonplace to the Neanderthals...[and] other experiences might also remind us of how ‘desensitized’ we are to the music-like sounds around us. And so...I would like to quote for a second time how his teacher described her walk with Eddie, the music savant.... ‘I found that a walk with Eddie is a journey through a panorama of sounds. 
He runs his hand along metal gates, to hear the rattle; he bangs on every lamp post, and names the pitch if it has a good tone; he stops to hear a car stereo; he looks into the sky, to track airplanes and helicopters; he imitates the birds chirping; he points out the trucks rumbling down the street...If it is aural, Eddie is alert to it, and through the aural he is alert to so much more.’”
(Mithen, pp.242-5)


John Henry Calvinist