Speech Representation

Verbal narrative, it has long been assumed, is especially qualified to represent speech events because in this case, unlike any other, the object and the medium of representation are identical―language. The speech of characters can be represented directly, through quotation: “She said, ‘No, no, I can’t just now, but tomorrow I will.’” Or it can be paraphrased by a narrator and represented indirectly: “She said that she couldn’t just then, but that the next day she would.” There is also the option to narrate speech acts in an intermediate mode, called free indirect discourse: “No, no, she couldn’t just now, but tomorrow she would.” Consciousness, at least that part of it that resembles unspoken interior speech, can be represented using the same three forms: directly, as quoted interior monologue; indirectly, as thought report, also called psycho-narration (cf. Cohn 1978); or using free indirect discourse. It has been clear for some time, however, that the three discrete forms fall far short of exhausting the range of speech representation in narrative, much less the representation of consciousness, so that analysts have become increasingly willing to consider more diffuse and generalized effects of voice (e.g., Baxtin [1934/35] 1981) and fictional mind (e.g. Palmer 2004).


Speech representation in verbal narrative can be conceived in terms of a relationship between two utterances, a framing utterance and an inset (framed) utterance (Sternberg 1982), or alternatively in terms of interference or interaction between two texts, the narrator’s text and the character’s text. (For further details on the Textinterferenz approach advocated by Schmid and others, see section 3.3 below.) In direct discourse (DD), whether it represents a speech event or an unspoken thought, the transition from frame to inset is clearly visible, typically signaled typographically and/or by an introductory verb of speech or thought: “She said,” “She thought.” DD is conventionally understood to replicate exactly what the quoted character is supposed to have said or thought, preserving (for instance) expressive elements of the original utterance: “No, no.” Of course, the “originality” of direct quotation in fiction is entirely illusory (Fludernik 1993: 409–14); moreover, so is the independence of the quoted inset, which is always controlled by the framing context. DD shorn of its introductory clause, which some call free direct discourse (FDD), is the basis of interior monologues, and a staple of modernist novels.

In indirect discourse (ID), the narrator is much more evidently in control. Here the inset is grammatically subordinated to the framing utterance, with person, tense, and deixis adjusted to conform to those of the frame. According to some authorities (e.g. Banfield 1982), expressive and dialectal or idiolectal features are excluded from ID, but in fact such features are well-attested in actual narrative texts (Vološinov [1929] 1973: 131–2; McHale 1978, 1983). Types and degrees of paraphrase and summary vary widely in ID, from instances that appear quite faithful to the original utterance (though of course, no such “original” exists), through instances that preserve only its content or gist to those that minimally acknowledge that a speech event took place (Vološinov [1929] 1973: 129–33; Leech & Short 1981: 318–51). In representing consciousness, ID shades off into psycho-narration (Cohn 1978: 21–57) where the narrator analyzes the content of the character’s mind, potentially including its habitual and/or subliminal (unconscious) aspects.

Free indirect discourse (FID) is the most problematic and, no doubt for that very reason, still the most widely discussed form for representing speech, thought, and perception. (For further details on the free indirect representation of perception, see section 3.4 below.) Here frame and inset become much harder to distinguish. FID handles person and tense as ID would (though in French it is identifiable by a distinctive past-tense form, the imparfait, in narrative contexts where the passé simple would be expected). On the other hand, it treats deixis as DD would, reflecting the character’s rather than the narrator’s position: “she couldn’t just now, but tomorrow she would.” FID also tolerates many of the expressive elements characteristic of direct quotation; how many, and which ones, remains controversial. In terms of the Textinterferenz model, person and tense evoke the narrator’s text, while deictic, expressive and other features evoke the character’s text. To further complicate matters, many instances of FID entirely lack the form’s defining features so that, taken out of context, they appear indistinguishable from non-quoting narrative sentences. Manifestly, it is contextual cues more than formal features that determine, in many cases, whether or not a sentence will be interpreted as a free indirect representation of speech, thought or perception (McHale 1978; Ehrlich 1990).

In view of the range and diversity of each of these forms, especially ID and FID, and the evidence of intermediate or ambiguous instances, some analysts have concluded that a scale of possibilities would be more adequate than the three-category model (McHale 1978; Leech & Short 1981). Such scalar approaches, however, are hardly an improvement on the three-category model when it comes to capturing those diffuse and transient effects of “voice” that are such a regular experience of reading novels. Especially pointed is the dissatisfaction of some analysts with the mapping of categories deriving from speech representation onto the phenomena of represented consciousness. Consciousness in fiction, it has been compellingly argued (e.g. Palmer 2004), is much more ubiquitous and variegated than speech and is not adequately captured by speech-based models of interior discourse. (For further discussion, see section 3.4 below.)

History of the Concept and its Study


The foundation for the categorical approach to speech representation, and the source for many of the conceptual difficulties that continue to beset it, can be traced back to the ancient world. Plato in Republic III distinguishes between situations in which the poet speaks in his own voice (Plato calls this “pure narration,” haple diegesis) and those in which the poet mimics a character’s voice. Classical rhetoric recognized two categories of speech representation proper, oratio recta and oratio obliqua, direct and indirect discourse; however, FID, though already present in ancient Greek and Latin literature and in biblical narrative, would not be identified until the last decades of the 19th century. Pervasive in the 19th-century novel, from Austen to Flaubert, Zola, James and beyond (Pascal 1977), FID did not attain the threshold of visibility until, arguably, the 1857 trial of Madame Bovary, which hinged on whether certain free indirect expressions of indecent and anti-social sentiments were attributable to the author (LaCapra 1982; Toolan 2006). In any case, French and German Romance philologists identified this “new” form around the turn of the twentieth century, calling it erlebte Rede, verschleierte Rede, or style indirect libre (Tobler 1887; Kalepky 1899, 1913; Bally 1912; Lorck 1914; Lerch 1914; Lips 1926). In English, FID has also been called “narrated monologue” (Cohn) and “represented speech and thought” (Banfield); Israeli scholars call it “combined discourse.” A prescient critique of grammar-based descriptions of FID was mounted as early as 1929 by Vološinov, Baxtin’s collaborator and/or alter ego. However, Vološinov’s contribution dropped out of sight until the “rediscovery” of the Baxtin circle in the late 1960s and early 1970s, and in the meantime the forms of speech representation continued to be treated less as narratological than as grammatical phenomena, whether according to traditional models of grammar (e.g. Ullmann 1957) or in terms of the transformational-generative paradigm (Banfield 1982).

Over the course of the 20th century, scholars of FID gradually expanded the range of what had initially been perceived as a rather local and specialized phenomenon limited to third-person (heterodiegetic) literary narratives. It was identified in first-person, second-person, and present-tense contexts as well as in non-literary prose and oral narrative (Todemann 1930; Cohn 1969; Fludernik 1993: 82–104), and its historical roots were pushed back to the Middle Ages and earlier. Apart from the Romance, Germanic and Slavic languages, it has been attested in Hungarian, Finnish, Japanese, and Chinese, among others (Steinberg 1971; Coulmas ed. 1986; Hagenaar 1992; Tammi & Tommola eds. 2006). Above all, it has come to be recognized not only as a tool for regulating distance from a character―from empathetic identification at one extreme to ironic repudiation at the other―but also as one of the primary vehicles of what modernist poetics taught us to call the stream of consciousness.

Stream of consciousness is best thought of not as a form but as a particular content of consciousness, characterized by free association, the illusion of spontaneity, and constant micro-shifts among perception, introspection, anticipation, speculation, and memory (Humphrey 1954; Friedman 1955; Bickerton 1967). It can be realized formally by first-person “autonomous” interior monologue (as in Molly Bloom’s soliloquy from Ulysses, or the first three sections of Faulkner’s The Sound and the Fury), or by FID (as in Joyce’s Portrait of the Artist as a Young Man, or Virginia Woolf’s Mrs. Dalloway and To the Lighthouse), or indeed by a combination of means. Modernist innovations in stream of consciousness technique seemed to monopolize the agenda of scholarly investigation of the representation of consciousness for much of the 20th century, at least until Cohn (1978) reasserted the importance and ubiquity of less “glamorous” techniques, such as psycho-narration. Since then, cognitive narratologists in particular have taken up the challenge of investigating the presence of consciousness in fiction outside the well-worn channels of the stream of consciousness (e.g. Fludernik 1993, 1996; Palmer 2004; Zunshine 2006).


Progress in understanding speech and consciousness representation has been hampered by fundamental confusion about the concept of mimesis. Two senses of mimesis are regularly conflated: on the one hand, mimesis in the sense, derived ultimately from Plato, of the author’s speaking in a character’s voice rather than his own; on the other hand, mimesis in the sense of faithful reproduction of what we take to be reality. An unexamined assumption throughout much of the discussion of speech representation has been that mimesis in the sense of speaking for the character should correlate with mimesis in the sense of faithfulness of reproduction: that the more direct the representation was, the more realistic or life-like it would be (Sternberg 1982). Thus, DD should be the most faithful to reality, and ID the least, with FID somewhere in between. Nothing could be further from the truth; in fact, speech representation is a classic illustration of what Sternberg (1982) decries as the fallacy of “package deals” in poetics whereby forms and functions are bundled together in one-to-one relationships. Actually, the forms of speech representation stand in a many-to-many relationship to their reproductive functions: some instances of DD are highly imitative of “real” speech, while others are deliberately stylized and un-mimetic; some instances of ID or FID are more imitative of “real” speech than DD often is, while other instances are less so; etc. (Fludernik 1993: 312–15). Attempts to elaborate the three-category repertoire of speech representation into a continuous scale from maximally to minimally mimetic, in the faithfulness-of-reproduction sense (e.g. McHale 1978; cf. Genette [1972] 1980), stumble at just this point. They invariably place DD (or FDD) at the most-mimetic pole and ID at the opposite pole. But no matter how many gradations such scales admit in between, they obscure the fact that degree of faithfulness does not correspond to formal categories: one scale cuts across the other.

Moreover, the very notion of “faithfulness to reality” here is highly suspect. Another of the unexamined assumptions of speech representation scholarship is that verbal narrative is better able to represent speech than anything else because narrative and the speech it represents share one and the same medium, namely language (e.g. Genette [1972] 1980: 169–74). But this, too, is fallacious, as a glance at a transcription of spontaneous conversation would immediately confirm. At one level of analysis, conversation in novels may indeed reflect the “rules” of spontaneous real-world conversation (e.g. Toolan 1987; Thomas 2002; Herman 2002: 171–93). But at a finer-grained level, speech in the novel appears utterly unlike real-world speech. Novelistic speech is always highly schematized and stylized, depending for its effects of verisimilitude on very limited selections of speech-features, many of them derived not from actual speakers’ behavior but from literary conventions, linguistic stereotypes, and folk-linguistic attitudes. This is especially evident in representations of foreign accents, regional dialects, and specialized professional registers (Page 1973). Perhaps the most powerful factor in producing effects of “realistic” speech is textual context, which induces the reader to accept thin sprinklings of conventional or possibly arbitrary features as faithful representations of real-world speech behavior (McHale 1994). In short, the mimesis of speech in fiction is a “linguistic hallucination” (Fludernik 1993: 453); it depends on our willingness to play a “mimetic language-game” (Ron 1981).

If speech in fiction is not a faithful imitation but an effect produced by a combination of convention, selection, and contextualization, then this must also be the case for consciousness in fiction, only more so, for consciousness is at best only partly linguistic. Nevertheless, the operating assumption of much recent cognitivist work on consciousness in narrative is that fictional minds are modeled on real-world mental processes (e.g. Palmer 2004: 11). But what if consciousness in fiction is just as conventional, schematic, selective, and context-dependent as speech in fiction―just as much an effect, just as much a hallucination or language-game? Surely this is a hypothesis that ought to be entertained (Mäkelä 2006).


If speech representation always involves a quoting frame and quoted inset, this means that it involves two agents or instances of speech: two voices. The two voices are readily distinguished in DD and in content-paraphrase types of ID, but only with difficulty in FID. In FID, the effects of voice all seem to derive from the quoted character, with the narrator’s contribution reduced to the bare grammatical minimum of tense and person. Indeed, an early controversy in the scholarship on FID hinged on the question of the narrator’s putative self-effacement and empathetic identification with the character. However, FID is just as likely to serve as a vehicle of irony, and it is in these instances that the so-called dual-voice hypothesis (Vološinov [1929] 1973; Baxtin [1929] 1984; Pascal 1977) seems most compelling. According to the dual-voice hypothesis, in sentences of FID (and some instances of ID) the voice of the narrator is combined with that of the character (hence “combined discourse”) or superimposed on it. “It partook, she felt, helping Mr. Bankes to a specially tender piece, of eternity”: in this famous sentence from To the Lighthouse, the parenthetical clause (“she felt, helping Mr. Bankes,” etc.) introduces a plane of narratorial comment that ironizes Mrs. Ramsay’s experience of eternity. (Or does it? This is actually an interpretative crux in the novel.) Irony of this kind seems best accounted for in terms of the dual-voice hypothesis (Uspenskij 1973: 102–5).

With the rediscovery of the Baxtin circle, the dual-voice analysis of FID, already anticipated by Vološinov ([1929] 1973), came to be viewed in the light of wider phenomena of dialogue in the novel. According to Baxtin and his school, the text of the novel is shot through with more or less veiled dialogues between voices that “speak for” social roles, ideologies, attitudes, etc. The forms of dialogue range from outright parody and stylization to implicit rejoinders and veiled polemics (Baxtin [1929] 1984). FID is folded in among these categories, reflecting as it does (according to the dual-voice hypothesis) the internal dialogization of the sentence of speech representation itself.

Related to the Baxtinian approach, but less ideologically driven, and capable of much finer-grained analyses, is Schmid’s model of Textinterferenz ([1973] 1986, 2010: 137–74; see also Doležel 1973; de Haard 2006). The Textinterferenz approach treats speech representation as a matter of interference or interaction between two texts, the narrator’s text and the character’s text. Textual segments display varying kinds and degrees of interaction between these two texts, depending upon how various features are distributed between the narrator’s and the character’s voices. These features include thematic and ideological (or evaluative) markers; grammatical person, tense and deixis; types of speech acts (Sprachfunktion); and features of lexical, syntactical and graphological style. In DD, all the markers point to the character’s voice. In ID, person, tense and syntax can be assigned to the narrator’s text, while thematic and ideological markers, deixis, and lexical style point to the character’s voice; the speech-act level points both directions. Finally, in FID, person and tense evoke the narrator’s text, while all the other features can be assigned to the character’s text.

In the light of dialogism and Textinterferenz, speech representation comes to be reconceived as only more or less discrete instances of the pervasive heteroglossia (Tjupa → Heteroglossia) of the novel, its multiplicity of voices (Baxtin [1934/35] 1981). According to the Baxtinian account, samples of socially-inflected discourse (styles, registers, regional and social dialects, etc., with their associated attitudes and ideologies) are dispersed throughout the novel, appearing even where there is no frame/inset structure of quotation to “legitimize” or naturalize them. The language of a novel diversifies into various zones, including zones associated with specific characters, even in the absence of syntactical indications of quotation or paraphrase. This analysis of novelistic discourse was paralleled in the Anglophone world, albeit in a casual and pre-theoretical way, by Kenner’s (1978) jocular proposal of the “Uncle Charles Principle,” named after a typical sentence from Joyce’s Portrait of the Artist: “Uncle Charles repaired to the outhouse.” The sentence is attributable to the heterodiegetic narrator, but it is “colored” by Uncle Charles’ characteristic periphrasis, “repaired.” The Uncle Charles Principle, also called stylistic “contagion” or “infection” (Spitzer [1922] 1961; Vološinov [1929] 1973: 133–36; Stanzel [1979] 1984; Fludernik 1993: 332–38), involves the dispersal of a character’s idiom into the narrative prose in the proximity of that character (Koževnikova 1971).

At the opposite extreme from the dual-voice hypothesis and its extensions is the controversial no-narrator hypothesis advanced by Banfield (1982). According to Banfield, free indirect sentences of thought representation (though not of speech) in third-person heterodiegetic contexts entirely lack a narrator, and so could hardly be dual-voiced. In effect, Banfield has revived the empathetic reading of FID endorsed by early commentators, but in a way calculated to scandalize anyone committed to a communications-model approach to narrative. Indeed, it might be argued that in certain FID representations of thought, those representing what Banfield calls nonreflective consciousness, there is no discernible voice at all: “It was raining, she saw” (Banfield 1982: 183–223; Fludernik 1993: 376–79). Whereas sentences of reflective consciousness express what the character is aware of as passing through her mind (what she “thought to herself”), sentences of nonreflective consciousness express what the character perceives or apprehends without being aware of perceiving or apprehending. At this point, issues of voice shade off into even more diffuse issues of fictional minds.


Pervasive voice in the novel is mirrored by a parallel pervasiveness of consciousness. Investigating the presence of fictional consciousness, cognitive narratologists have become impatient with the so-called “speech-category approach,” which in effect limits consciousness in fiction to varieties of inner speech. Not all consciousness in fiction is inner speech, they argue―perhaps relatively little of it. As we have already seen, however, even approaches to the representation of consciousness using speech categories eventually run up against phenomena that exceed those categories in various ways. Speech categories “bleed” at their edges, trailing off into less category-bound forms of fictional mind. At one edge, for instance, ID bleeds into psycho-narration, whereby the narrator takes charge of analyzing the character’s mind, including subconscious levels that might not be accessible to the character herself, or habitual dispositions that might not manifest themselves in inner speech. At the other edge, FID bleeds into nonreflective consciousness. Indeed, almost from the earliest days of scholarship on FID, it was recognized that the speech category of FID was intimately related to a form of so-called “substitutionary perception” (Fehr 1938; see also Bühler 1937), sometimes called “represented perception” (Brinton 1980) or even “free indirect perception” (Palmer 2004): “She opened the door and looked out. It was raining harder. The cat would be around to the right. Perhaps she could go along under the eaves.” The third and fourth of these sentences are unmistakably FID (as indicated by the past-tense modals would and could, and the adverbial of doubt, perhaps), but the second is substitutionary perception.

Reorienting the study of represented consciousness away from speech categories opens up new areas of inquiry. For instance, characters can be shown to read each other’s minds―not in any science-fiction sense, but in the sense that they develop working hypotheses about what others are thinking, inferring interior states from speech and external behavior, just as one does in everyday life; they do “Theory of Mind,” in other words (Zunshine 2006). Indeed, all actions of characters in a narrative fiction must be animated by mental states or acts; otherwise, we might not be disposed to call them “actions” at all. So thought ought not to be viewed as separable from action, but rather as forming together with action a “thought-action continuum” whereby actions are animated by consciousness throughout (Palmer 2004: 212–14).

The most radical statement of this reorientation of analysis away from the speech-category approach and toward “mind in action” must surely be Fludernik’s redefinition of narrativity itself as experientiality (Fludernik 1996: 20–43; compare Antin 1995). According to Fludernik’s account, narrativity is not adequately defined in terms of sequences of events or even in terms of causal connections among events, but only in terms of the experiencing of events by a human (or anthropomorphic) subject. In other words, it is ultimately the presence of consciousness that determines narrative, and not anything else.

This is a far cry from the carving up of blocks of prose into discrete units labeled DD, ID, FID. Nevertheless, it is not as unprecedented a development as some cognitive narratologists have claimed. For instance, the analysis of informational gaps and gap-filling, as practiced by exponents of the Tel Aviv school (Perry & Sternberg [1968] 1986; Perry 1979), is every bit as finely attuned to characters’ ventures in mind-reading and the thought-action continuum as anything to be found in the new cognitivist narratology (Palmer 2004: 182; Herman → Cognitive Narratology). But if cognitive narratology sometimes overestimates its own novelty and underrates its precursors, this does not prevent it from standing at the cutting edge of research into the representation of fictional mind at the present time.

Topics for Further Investigation

(a) One is tempted to recommend (albeit facetiously) a moratorium on further research into FID proper until other, more diffuse and pervasive effects of mind and voice in fiction are better understood. Among other advantages, this might give us the opportunity to evaluate critically some of the bold claims of the cognitive narratologists with respect to fictional minds, and of the Baxtin school with respect to “dialogue” (Shepherd → Dialogism). Baxtin, in particular, has become a victim of his own (posthumous) success; serial (mis)appropriations of his approach by a diverse range of literary and cultural theories, coupled with uncritical endorsement of his ideological positions, have made critical evaluation of his account of dialogue virtually impossible.

(b) Too little is still known about the role of models (schemata, stereotypes, folk-linguistic knowledge, etc.) in the production and recognition of representations of language varieties (styles, dialects, registers, etc.) in fiction.

(c) Similarly, there is still much that remains to be clarified about the operation of textual context and its interaction with models of speech and thought in producing the effect or illusion of mimesis (though with respect to context see Ehrlich 1990).

(d) “Currently, there is a hole in literary theory between the analysis of consciousness, characterization, and focalization […] a good deal of fictional discourse is situated precisely within this analytical gap” (Palmer 2004: 186). Palmer perhaps underestimates the quantity and value of the work that has already gone into knitting together consciousness, characterization and focalization. Nevertheless, he is basically right: this is one of the holes that remain in narrative theory, and closing it should be a high priority of future research.


