Ch. 20: Punctuation

Main authors: Geoffrey Nunberg, Ted Briscoe & Rodney Huddleston

1 Preliminaries

1.1 The domain of punctuation

1.2 Indicators and characters

1.3 The status of punctuation rules

1.4 Units of syntax and units of writing

1.5 Functions and classification of punctuation indicators

2 Primary terminals

3 The secondary boundary marks: comma, semicolon and colon

3.1 Some formal preliminaries

3.2 Uses of the secondary boundary marks

3.2.1 Coordination, syndetic or subclausal

3.2.2 Supplementation, syndetic and subclausal

3.2.3 Asyndetic combinations of main clauses

3.2.4 Further cases of simple boundary marking at the subclausal level

3.2.5 Delimiting commas

4 Parentheses

5 The dash

6 Quotation marks and related indicators

7 Capitalisation

8 Word-level punctuation

8.1 Word boundaries

8.2 Hyphens

8.2.1 Some initial distinctions

8.2.2 Inherent and long hyphens

8.3 The apostrophe

8.4 The abbreviation full stop and minor reduction markers

8.5 The slash

1 Preliminaries

1.1 The domain of punctuation

The central concern of punctuation is with the use of the various punctuation marks, such as the full stop, comma, semicolon, colon, question mark, quotation marks, parentheses, and so on. These serve to give indications of the grammatical structure and/or meaning of stretches of written text. The punctuation marks are all segmental units of writing -- i.e. they fully occupy a position in the linear sequence of written symbols. There are, however, various non-segmental features which can serve the same kind of purpose as the punctuation marks. For example, titles of literary or other works may be italicised as an alternative to being enclosed in quotation marks. And while the end of a sentence is indicated segmentally by a punctuation mark (a full stop, question mark or exclamation mark), the beginning of a sentence is indicated non-segmentally by capitalisation of the first letter. We will therefore regard punctuation as covering the use not only of punctuation marks but also of such non-segmental features as italics, capital letters, bold face and small capitals. Ordinary lower-case roman represents the default form, and these non-segmental features can be regarded as modifications of the default form.

One other important aspect of punctuation is the use of space, notably to separate one word from the next. Space between words is a segmental unit: like the punctuation marks, it occupies the whole of one position in linear sequence. For example, in the first sentence of this paragraph, a word-space occupies the fourth position, the tenth position, and so on. We will use the term punctuation indicator as a general term covering punctuation marks and the other devices that fall within the domain of punctuation. The classification is thus as follows:

[1]  punctuation indicators
       segmental
         punctuation marks
         spaces
       non-segmental
         modifications

On another dimension, we need to clarify the domain of punctuation with respect to the size of the unit to which the punctuation applies. The punctuation marks mentioned above generally occur within a sentence (including its final boundary) but outside the individual words. There are two punctuation marks, however, that are normally word-internal: the apostrophe and the hyphen. Words may also contain various non-segmental marks, diacritics, but we do not regard these as falling within the domain of punctuation. For example, accents (which do not of course appear in native English words, but are nevertheless found in some words that are otherwise fully anglicised, such as fiancé) are simply a matter of word-spelling. There are also punctuation marks that can apply beyond the sentence: parentheses and quotation marks can enclose stretches of writing longer than a sentence. In addition, the division of a text into paragraphs (marked by a new line and, usually, indentation space) can also be regarded as a matter of punctuation. It is not usual, however, and nor would it be helpful, to extend the domain of punctuation to cover the lay-out of larger units (division into chapters or sections, use and format of headings, and so on). It follows that a non-segmental feature such as italics counts as a punctuation indicator when it serves to mark quotation, a title or emphasis, but not when it is used for a heading of a certain hierarchical level in the organisational structure of a book or comparable document.

1.2 Indicators and characters

In virtually all written material the apostrophe is physically -- or, as we shall say, graphically -- identical with a single quotation mark. We need, therefore, to distinguish between two kinds of concept which we will call indicators and characters. The characters are the graphical shapes, or symbols, that realise the indicators. Apostrophe and single quotation mark are then distinct indicators that may be realised by the same character.

For reasons we will discuss in #6, we take single and double quotation marks as distinct indicators, but each of them can be realised by three different characters. Quotation marks normally occur in pairs, and there is one character that is used to open the quotation (‘ or “), another which is used to close it (’ or ”), and a third that is used in both positions in fonts (such as the standard typewriter keyboard) that do not have the separate opening and closing characters (' or "). We do not have opening and closing quotation marks as distinct indicators because the choice of character is predictable from the position: they are contextual variants of the same indicator. But the apostrophe has to be distinguished from the single quotation mark because it can never be realised by the character used to open a quotation -- even when it is used at the beginning of a word, as in rock ’n’ roll (not *rock ‘n’ roll).

The distinction between indicator and character is also important with respect to dashes and hyphens. We distinguish three indicators, illustrated in

[2]  i    dash                He's late -- he always is.
     ii   (ordinary) hyphen   non-negotiable
     iii  long hyphen         the doctor–patient relationship

We also distinguish three characters: em-rule (—), en-rule (–), and hyphen-character (-). Depending in part on the resources available (the standard typewriter keyboard has only the hyphen-character), in part on the publisher's house style, the dash indicator may be realised by any of the three characters or by a sequence of two hyphen-characters; the en-rule and the single hyphen-character are flanked by spaces, while the other two realisations may or may not be. The ordinary hyphen is realised by a hyphen-character without flanking spaces. The long hyphen is a relatively minor punctuation mark with a very restricted use; in some styles it is not used at all, the ordinary hyphen taking over its functions (as in the doctor-patient relationship), but when it is recognised as a distinct indicator it is normally realised by the en-rule without flanking spaces.
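The way character and spacing jointly determine the indicator can be sketched in code. The following Python fragment is only an illustration under stated assumptions: the function name classify and the regular expression are our own, the em-rule and en-rule are taken to be the Unicode characters U+2014 and U+2013, and a hyphen-character or en-rule with a space on one side only is treated, arbitrarily, as unspaced.

```python
import re

# An optional space, one of the four realisations, an optional space.
# "--" is tried before "-" so that a double hyphen is kept together.
_PATTERN = re.compile(r"( ?)(\u2014|\u2013|--|-)( ?)")

def classify(text):
    """Label each em-rule, en-rule, double hyphen or hyphen-character in text."""
    labels = []
    for match in _PATTERN.finditer(text):
        left, chars, right = match.groups()
        spaced = left == " " and right == " "
        if chars in ("\u2014", "--"):
            label = "dash"             # flanking spaces are optional for these two
        elif spaced:
            label = "dash"             # spaced en-rule or spaced hyphen-character
        elif chars == "\u2013":
            label = "long hyphen"      # en-rule without flanking spaces
        else:
            label = "ordinary hyphen"  # hyphen-character without flanking spaces
        labels.append((chars, label))
    return labels
```

Applied to the three examples in [2], this sketch labels the double hyphen a dash, the hyphen-character in non-negotiable an ordinary hyphen, and an unspaced en-rule a long hyphen. A style that does not recognise the long hyphen would need a different rule set.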

Consider finally the full stop and ellipsis points. The full stop is used to mark the end of a sentence or an abbreviation (as in Col. Blimp), but as these always have the same realisation we regard them as different uses of a single indicator. There is, however, a distinct indicator that is used to mark omission (as in The President said, `We will send as many troops as it takes ... to restore order in the region'); this indicator, which we call ellipsis points, is realised by a sequence of three dot characters or a single character consisting of a sequence of three dots.

In most other cases we have a simple one-to-one relation between indicator and character. The following table lists the punctuation marks we shall be concerned with, giving their realisation and commonly used alternative terms:

[3]        indicator                realisation(s)   alternative terms
     i     full stop                .                period (AmE)
     ii    question mark            ?
     iii   exclamation mark         !                exclamation point (AmE)
     iv    comma                    ,
     v     semicolon                ;
     vi    colon                    :
     vii   dash                     — – - --
     viii  parenthesis              ( )              round bracket (BrE)
     ix    square bracket           [ ]              bracket (AmE)
     x     ellipsis points          ...              ellipsis
     xi    double quotation mark    “ ” "            double quote (mark), inverted commas
     xii   single quotation mark    ‘ ’ '            single quote (mark), inverted commas
     xiii  apostrophe               ’ '
     xiv   slash                    /                stroke, solidus, virgule
     xv    long hyphen              –                en-dash
     xvi   ordinary hyphen          -
     xvii  asterisk                 *
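The relation between indicators and characters in [3] is many-to-many, and a small lookup table makes this concrete. The following Python sketch is purely illustrative: the names REALISATIONS and indicators_for are our own, the quotation marks and rules are written as Unicode code points (an assumption about the encoding in use), and the inventory follows [3] together with the single-character form of the ellipsis points mentioned in the text.

```python
# Hypothetical table of indicators and the characters that may realise them.
REALISATIONS = {
    "full stop": ["."],
    "question mark": ["?"],
    "exclamation mark": ["!"],
    "comma": [","],
    "semicolon": [";"],
    "colon": [":"],
    "dash": ["\u2014", "\u2013", "-", "--"],          # em-rule, en-rule, hyphen(s)
    "parenthesis": ["(", ")"],
    "square bracket": ["[", "]"],
    "ellipsis points": ["...", "\u2026"],
    "double quotation mark": ["\u201c", "\u201d", '"'],
    "single quotation mark": ["\u2018", "\u2019", "'"],
    "apostrophe": ["\u2019", "'"],                    # never the opening character
    "slash": ["/"],
    "long hyphen": ["\u2013"],
    "ordinary hyphen": ["-"],
    "asterisk": ["*"],
}

def indicators_for(character):
    """Return, in alphabetical order, every indicator the character can realise."""
    return sorted(name for name, chars in REALISATIONS.items() if character in chars)
```

Looking up the straight single quote returns both apostrophe and single quotation mark, whereas the opening single quote returns only single quotation mark, which is the asymmetry that motivates treating the apostrophe as a separate indicator.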

1.3 The status of punctuation rules

A great deal of the written material that we read is put out by publishers (of books, newspapers, journals, etc.) with the text edited by people whose profession is precisely to prepare text for publication. To a significant extent this process involves the conscious application of codified rules, set out in manuals specific to a particular publishing house or accepted more widely as authoritative guides. Those outside the publishing trade are generally likely to be unfamiliar with at least some of the more technical rules, and in the context of preparing text for potential publication many writers will defer to the advice of handbooks and the like. It is true, of course, that style guides commonly deal with points of grammatical usage too, but here they have a less influential role: a very high proportion of our use of language involves spontaneous speech, with no need or opportunity to consult such works. For this reason, we ourselves in writing this chapter on punctuation have given greater weight to the prescriptions of major style manuals than we have in the chapters on grammar. But we should also note that many of the rules of punctuation that have been mastered by competent writers are part of tacit linguistic knowledge no less than the rules of spoken language are, and as such are never mentioned in usage manuals or style guides.

Variation

In spite of the codification mentioned above, punctuation practice is by no means entirely uniform. On some matters, such as whether or not to mark abbreviations with a full stop, we find variation from one publishing house to another. More important, there is some significant regional variation, most notably with respect to the interaction between quotation marks and other punctuation marks.

It is worth noting, however, that we do not find social variation between standard and non-standard such as we have in grammar: there is no punctuational counterpart of grammatically non-standard usage like I ain't done nothing or Who done that? -- that is, a repertory of variants that are used in a consistent way by one social group but not by another. Moreover, the style contrast between formal and informal is of relatively limited relevance to punctuation. One might say that the multiple question marks and exclamation marks in [4] belong to informal style:

[4] i They're coming for a week: what on earth are we going to do with them??

ii Thanks for inviting us -- we had a wonderful time!!

It is true that such uses of punctuation are rarely if ever found in the formal style of academic or legal writing. But we should bear in mind that there are many writers who would never use punctuation in this way: such usage is not comparable with the informal style grammar of, say, I don't know who he's referring to, which virtually all speakers would use in all but quite formal contexts in preference to I don't know to whom he's referring.

What we do find, however, is a distinction between light and heavy punctuation styles that is independent of regional and publishing house variation:

[5] i On Sundays they like to have a picnic lunch in the park if it's fine. [light]

ii On Sundays, they like to have a picnic lunch in the park, if it's fine. [heavy]

This distinction has to do with optional punctuation, especially commas: a light style puts in relatively few commas (or other marks) in those places where they are optional rather than obligatory.

1.4 Units of syntax and units of writing

The orthographic sentence

Syntax is traditionally defined as the study of the way words combine to form sentences, but from a syntactic point of view the delimitation of the sentence is quite problematic. A sentence may have the form of a clause or of a sequence of clauses, and while sentences with the form of a clause can generally be delimited straightforwardly, it is not so clear when successive clauses are syntactically combined into a larger unit. The central cases of sentences with the form of a sequence of clauses are those where the clauses are coordinated, with at least one of them being marked by a coordinator. The syntactic construction of coordination, however, does not have to be explicitly marked by means of a coordinator: coordination can be asyndetic (Ch. 15, #@@). This is evident from examples like Her family, her friends, her colleagues had all rallied to her support, where the three NPs her family, her friends and her colleagues combine to form an asyndetic NP-coordination that functions as subject of the clause. Note also that the gapping construction can occur with asyndetic coordination. Compare, then:

[6] i Kim went to the concert, but Pat stayed at home.

ii Some went by bus, some by train.

iii Some went to the concert, some stayed at home.

In [i] the coordination is marked by the coordinator but. In [ii] omission of the verb of the second clause, by gapping, serves to mark unequivocally that the clauses belong together in a larger syntactic unit. But in [iii] there is no overt marking of the coordinative relation between the clauses.

Coordination, moreover, is not the only syntactic relation that need not be marked by any formal device. The same applies with supplementation (Ch. 15, #@@). Compare:

[7] i There's another reason why we should hesitate -- (namely,) the likelihood that interest rates will rise again in a few months.

ii There's another reason why we should hesitate -- (namely,) it is likely that interest rates will rise again in a few months.

In [i] the apposed element is an NP, while in [ii] it is a main clause; in both cases the apposition marker namely is optional, so that in the version of [ii] without namely there is no structural marking of the relation between the two clauses.

For these reasons there will often be no syntactically-marked distinction between a sentence with the form of a combination of two successive main clauses and a sequence of two sentences each of which has the form of a main clause. In writing, one function of punctuation is precisely to indicate whether successive clauses belong together or are to be treated as separate. In speech, prosody also serves to convey information about the relation between successive clauses, but it is important to emphasise that punctuation cannot be described as a means of representing the prosodic properties of utterances. When we are talking about the relation between successive main clauses, therefore, we cannot be neutral between the spoken and written medium. The term orthographic sentence is therefore applied to the unit that is defined by punctuation: leaving aside complications that we will take up below, an orthographic sentence is a unit of writing that begins with a capital letter and ends with a full stop, question mark or exclamation mark. The term `orthographic sentence' embodies no commitment as to whether or not the unit concerned is syntactically a sentence, a question which may have no determinate answer. Since this chapter is about punctuation, however, we will henceforth take it for granted that the term `sentence' on its own is to be understood as ``orthographic sentence'' unless we explicitly indicate otherwise.
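The working definition of the orthographic sentence is simple enough to state as a test on a string. The Python sketch below is offered only as an illustration: the function name is_orthographic_sentence is our own, and the check deliberately ignores the complications set aside above (for example a closing quotation mark after the terminal, or a sequence of several sentences in one string).

```python
def is_orthographic_sentence(unit):
    """True if unit begins with a capital letter and ends with a primary terminal.

    This checks only the two properties named in the working definition; it
    cannot tell one orthographic sentence from a sequence of several.
    """
    starts_with_capital = unit[:1].isupper()
    ends_with_terminal = unit.endswith((".", "?", "!"))
    return starts_with_capital and ends_with_terminal
```

On this test each of the examples in [6] counts as an orthographic sentence, whatever its syntactic analysis, while an interpolated clause such as and who can blame her? does not, since it lacks the initial capital.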

The orthographic word

Similar issues arise with the word. In the grammar we make a distinction between a morphologically complex word and a syntactic construction containing separate words:

[8] i I left the watering-can in the greenhouse. [complex word]

ii Who lives in that green house opposite? [syntactic construction]

In [i] greenhouse is a single complex word (more specifically, a compound) denoting a building made of glass used for growing plants that need warmth; in [ii] green house is a sequence of two words forming a nominal with the structure modifier + head and denoting a house that is green in colour. The basis for drawing this distinction is discussed in Ch. 5, #@@, but again the criteria do not always yield clear-cut results, and even where they do the grammatical analysis will not always match up with the written form. We therefore need the concept of an orthographic word that is defined by punctuation: leaving aside again certain complications, an orthographic word is a minimal unit of writing that is flanked by spaces which are either immediately adjacent to it or are separated from it by punctuation marks. As with `orthographic sentence', the term `orthographic word' is neutral as to whether or not the unit is grammatically a single whole word, and again for the remainder of this chapter the term `word' on its own is to be understood as ``orthographic word'' unless otherwise specified.
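The definition of the orthographic word can likewise be approximated in a few lines. The following Python sketch is an illustration under stated assumptions: the names FLANKING_MARKS and orthographic_words are our own, the list of marks is partial, and because the straight single quote is stripped from word edges the sketch cannot distinguish a closing quotation mark from a word-final apostrophe (as in boys'), which is one of the complications the definition leaves aside.

```python
# Marks that may stand between a word and its flanking space (a partial,
# assumed list; the hyphen and rules are included so that a free-standing
# dash is not counted as a word).
FLANKING_MARKS = ".,;:?!()[]\"'`-\u2013\u2014"

def orthographic_words(text):
    """Split text at spaces and remove punctuation marks from the edges of each unit."""
    words = []
    for chunk in text.split():
        word = chunk.strip(FLANKING_MARKS)   # word-internal marks are untouched
        if word:                             # drop units that were all punctuation
            words.append(word)
    return words
```

Applied to [8], the sketch returns watering-can and greenhouse as single orthographic words in [i] but green and house as two in [ii], which matches the written form without deciding the grammatical question.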

1.5 Functions and classification of punctuation indicators

Four main functions

The punctuation indicators serve a range of functions which can be grouped (leaving aside a few minor special purpose uses) into four main types.

(a) Indicating boundaries

[9] i You will have to make a decision soon. It is not for me to try to influence you.

ii By all means take the book with you, but be sure to return it.

In [i] we have a succession of two sentences, their boundaries being marked by the capital letter at the beginning and the full stop at the end. In addition, the spaces mark the boundaries between words. In [ii] the comma marks the boundary between two main clauses that are combined within a single sentence. Boundary marking can be regarded as the primary function of punctuation marks; it is not mutually exclusive with the other functions, and indeed is at least incidentally involved in virtually all uses.

(b) Indicating status

[10] i What does Frank think about it?

ii The boys' behaviour was hardly likely to make her change her mind!

The question mark in [i] serves to mark the sentence boundary, but at the same time it indicates that the sentence is a question; and the capital letter at the beginning of Frank indicates that this expression has the status of a proper name. In [ii] the apostrophe marks the noun as genitive, while the exclamation mark combines the functions of marking the sentence boundary and indicating that the sentence has the status of an exclamation.

(c) Indicating omission

[11] i She goes on to say, `But Johnson ... was willing to accept a fee for the work.'

ii `F*** off!' he yelled, `or I'll call the police.'

The ellipsis points indicator in [i] marks the omission from the reported speech of one or more words that occupied this position in the original. In [ii] the asterisks mark the suppression of letters from the taboo word fuck, while the apostrophe signals the reduction and cliticisation of the word will.

(d) Indicating linkage

[12] i The Management will continue to concentrate on completing the redevelopment/acquisition program outlined above.

ii I met her in the dining-car of the London–Glasgow express.

The slash and the two types of hyphen serve to link, to relate the items on either side of them. In [i] we understand ``the program of redevelopment or acquisition''; in [ii] the ordinary hyphen joins the two noun bases into a compound noun, while the long hyphen joins the two place-names into a single modifier of the head express (with the interpretation ``express going from London to Glasgow'').

Prevention of misreading

We have noted that punctuation marks are often optional, with light and heavy styles differing with respect to how many of these optional marks are inserted. Even in what is overall a light style, however, such indicators will tend to be added if their omission might lead to an initial misreading of the sentence. Indeed, indicators may be inserted to prevent confusion of this kind even in places where they would not normally be permitted. Compare:

[13] i Liz recognised the t-shirt he took from the bag and gasped.

ii Liz recognised the man who entered the room, and gasped.

iii Most of those who can, work at home.

While [i] has no internal punctuation, [ii] -- which has the same syntactic structure in relevant respects -- has a comma which serves to make clear that it was Liz who gasped, not the man. In [iii] the comma marks the boundary between subject and verb, contrary to the general rule prohibiting punctuation in this position; what makes it justifiable here is that without it work is likely to be at first taken as head of the complement of can rather than of the matrix predicate.

Organisation of this chapter

It will be evident from the brief survey with which we began this section that a number of indicators have diverse functions. Most notably, perhaps, the full stop can mark the end of a sentence or indicate an abbreviation. As a consequence, it is not possible to draw up a satisfactory unidimensional classification of the punctuation indicators. The organisation of the rest of this chapter, therefore, represents a compromise between treating them in successive subsets and dealing with them function by function.

In #2 we describe what we call the primary terminals: the full stop as used to end a sentence, and the question and exclamation marks. With the latter two the function of marking status is more important than that of marking a terminal boundary, and they are not constrained to occur at the end of a sentence; nevertheless, they are mutually exclusive with the terminal full stop, and hence form a natural group with it.

A second group, dealt with in #3, consists of the comma, the semicolon, and the colon, which we refer to as secondary boundary marks. They are secondary in the sense that they mark boundaries within a sentence, not between sentences. Or rather, that is invariably the case with the comma and the semicolon, and predominantly the case with the colon.

We turn next, in #4, to parentheses. These occur in pairs (with distinct opening and closing characters), enclosing units which are usually smaller than a sentence, but do not have to be. In #5 we turn to the dash; in most of its uses this is a secondary boundary mark, but it has considerable affinities with the parenthesis, and hence is best dealt with at this point in the exposition.

The following section, #6, covers the related functions of quotation, citation and naming. Quotation marks, single or double, are the main indicators for these functions, but italicisation is used too, and there are also places where there is no punctuational indication at all. Square brackets and ellipsis points occur primarily within quotations and are thus dealt with in this section. Related in some respects to quotation is capitalisation, the topic of #7.

Finally, #8 deals with those aspects of word-level punctuation not already covered. By word-level punctuation we mean the marking of word boundaries and the use of punctuation marks (mainly hyphens and apostrophes) within a word. Other punctuation we will refer to by contrast as higher-level punctuation. We treat the slash as a word-level punctuation indicator on the grounds that it is not (or at least not normally) flanked by spaces.

2 Primary terminals

For the most part, discursive written text consists of a sequence of sentences, each beginning with a capital letter and ending with a primary terminal -- a full stop, a question mark or an exclamation mark.

The full stop that marks the end of a sentence we refer to as the terminal full stop, as opposed to the abbreviation full stop (and various more specialised uses of this indicator). We suggested above that the primary function of the question and exclamation marks is to indicate status rather than boundaries, and this is reflected in the fact that they differ from the terminal full stop in being able to occur medially, internally within a sentence, and to be followed by other punctuation marks:

[1] i She had finally decided -- and who can blame her? -- to go her own way.

ii Her son -- what a scoundrel he is! -- is threatening to sue her.

iii *Southern liberals -- There are a good many. -- often exhibit blithe insouciance.

Medial questions and exclamations do not normally begin with a capital letter except in the case of quotation (see #6): the expressions interpolated between the dashes in [i--ii] are thus grammatical clauses but not orthographic sentences.

Sentence terminals and clause type

In sentences with the form of a single clause, there is a significant correlation between terminals and clause type. The default relations are illustrated in:

[2]                                               clause type     sentence terminal
     i    Kim has arrived.                        declarative     full stop
     ii   Let me know if you need any help.       imperative      full stop
     iii  Have you seen my glasses?               interrogative   question mark
     iv   What nonsense they talk!                exclamative     exclamation mark

The correlation, however, is very imperfect: the punctuation marks match the meaning and illocutionary force much more directly than do the syntactic clause type categories (see Ch. 10, #@@ for the concept of illocutionary force and its relation to clause type).

Question mark

As the name implies, this indicates that the constituent it terminates has the status of a question.

Terminal of unembedded sentence

In the simplest case the question mark occurs at the end of an unembedded sentence, in which case it is in contrast with the full stop and (normally) the exclamation mark. It is the default punctuation mark following an interrogative main clause, whether closed or open. It is also used after other clause types with the punctuation itself signalling the question meaning, as rising intonation does in the corresponding spoken forms:

[3] i Have you seen today's paper? [closed interrogative]

ii Why do fools fall in love? [open interrogative]

iii You saw him, then? [declarative]

iv Take it back on Saturday? [imperative]

With imperatives (and likewise exclamatives) such cases are generally restricted to echo questions.

Examples where the sentence has the form of a sequence of main clauses are:

[4] i It would be hard to criticise the measures, wouldn't it?

ii Where did you get it from and how much did it cost?

iii It certainly looks very good, but isn't it rather expensive?

In [i] wouldn't it? is an interrogative tag; its effect is to make the whole sentence a question. Example [ii] is a coordination of two interrogatives but has just one question mark, at the end: it is punctuated as a compound question. In [iii] we have a coordination of declarative (statement) and interrogative (question): the semantic scope of the question mark is thus just the second clause, but it serves as terminal boundary mark for the whole sentence.

The question mark tends to be replaced by one of the other sentence terminals in questions that are used as indirect speech acts:

[5] i Would you tell Jill that I'll be replying to her letter shortly.

ii Why don't you try to get this report to me by tomorrow.

iii Aren't they lucky to have got away with it!

iv Who cares what I think about it, anyway!

Examples [i--ii], with the form of a closed and open interrogative respectively, are used as indirect directives, and are punctuated according to the illocutionary force, not the grammatical form or literal meaning. Examples [iii--iv] have the force of exclamatory statements and again are punctuated accordingly; in the open interrogative type [iv], however, it is equally possible to use a question mark instead.

Embedded questions

When a question is embedded, the punctuation depends on the grammatical form: it normally takes a question mark if it has main clause form, but not if it has the form of a subordinate content clause. Compare:

[6]       main clause syntax                         subordinate clause syntax
     i    a. She asked, `Where is Kim going?'        b. She asked where Kim was going.
     ii   a. Again the question arises: why were     b. Again the question arises as to why
             we not consulted?                          we were not consulted.
     iii  a. Her son (you remember him, don't        b. [no subordinate version]
             you?) has just been arrested.

Note that where the question has main clause syntax it may or may not begin with a capital letter, signalling its presentation as an embedded sentence. Example [ia] is a case of direct reported speech (Ch. 11, #@@); here a capital is required if the question is enclosed in quotation marks, but otherwise lower case is permissible, especially with relatively short questions (I'm afraid he always asks himself, what's in it for me?). In [iia] the question is cited or identified, and here capitalisation is optional. In [iiia] the question is parenthesised (and for this type there is no matching subordinate construction); here capitalisation, while not impossible, is relatively unlikely.

Parentheticals

Sentences containing interrogative parentheticals, or parentheticals in construction with an interrogative main clause, are illustrated in:

[7] i There is nothing in the structure of English that prohibits us from referring to a woman as John Smith or, shall we say, George Eliot.

ii Will he tell them?, she asked.

iii Will he tell them, I wonder?

iv Will he tell them, do you think?

Parentheticals like shall we say, dare I say, would you believe have interrogative main clause syntax but no inquiry force, and generally have no question mark, as in [i]. Examples [ii--iii] look alike syntactically, but they are understood, and hence punctuated, differently. In [ii] the whole sent[...]king a question. But I wonder in [iii] indicates that the sentence is posing a question, so the question mark goes at the end. In [iv] we have a sequence of two interrogatives, but they express a single question, and permit only a single question mark, at the end (see Ch. 10, #5.4).

Use of question mark to indicate doubt

[8] i Michelangelo Merisi (b. 1571?, Milan? -- d. July 18, 1610, Port'Ercole, Tuscany)

ii He lives with an ophthalmologist (?) in Kensington.

Example [i] illustrates the use of the question mark to indicate uncertainty about attributed dates and places; in some cases it is placed before the item in doubt (b. ?1571). In [ii] the question mark is enclosed in parentheses; this belongs to relatively informal style but again indicates uncertainty about the correctness of the item concerned (I may have doubts as to whether the person is in fact an ophthalmologist -- or perhaps I'm unsure about the spelling).

Exclamation mark

Terminal of unembedded sentence

[9] i To hell with you! Up the Socceroos! Blast! Fire! Talk about arrogance!

If only we had listened to her! That it should have come to this! Quick!

ii What a mess they made of it! How kind you are!

iii Look out! Get some water!

iv That's cheating! They had come without any money!

v Isn't it fantastic! What does it matter, anyway!

Exclamation marks are often used with sentences whose form departs from the major main clause constructions: a variety of patterns of this kind are illustrated in [i]. Replacement of the exclamation mark by a full stop in such cases would be impossible or else would completely change the interpretation (several of them, for example, could occur with a full stop when standing as an elliptical answer to a question). The remaining examples have ordinary main clause form. Those in [ii] are syntactically exclamative, and here the exclamation mark is virtually obligatory (leaving aside echo questions). The examples in [iii] are imperative; as noted above, the full stop is the default terminal for imperatives, but the exclamation mark is also commonly used. It may serve to impart a sense of urgency, and/or to give the directive the force of a command or an entreaty, as opposed, say, to a request. With declaratives, as in [iv], the exclamation mark indicates that the content is regarded as remarkable or sensational, something that merits or requires special noting. Exclamation marks are also found with interrogatives, as in [v], when the illocutionary force is that of a statement: see the discussion of [5iii--iv] above.

E Embedding

Like questions, exclamations can be embedded within a matrix sentence, and may also be subclausal:

[10] i He replied, `I've never been so insulted in my life!'

ii At first things went smoothly, but soon, alas!, the casualties began and we had to devise a new strategy.

Exclamative clauses do not show the clear difference between main clause and subordinate clause internal syntax that we illustrated for interrogatives in [6]. Nevertheless, we still find that clauses identifiable as subordinate exclamatives on external grounds do not take exclamation marks. Compare:

[11] i She remembered what a struggle it had been in those days to make ends meet.

ii It's amazing what a difference a good night's sleep can make!

In [i] the exclamative clause is complement of remembered, with the matrix clause a declarative terminated by a full stop. In [ii] the exclamative clause is embedded as extraposed subject; the matrix is declarative but takes an exclamation mark as terminal because of the exclamatory meaning associated with its predicate is amazing.

D Multiple terminals

It is possible for question and exclamation marks to be iterated for emphatic effect, and for an exclamation mark to follow a question mark:

[12] i Who, I wonder, is going to volunteer for the late shift??

ii Guess what -- we've sold the house at last!!

iii Did you see his face when she mentioned the doctor?!

This again reflects the fact that the main function of these two indicators is to indicate status: there is no comparable use of the terminal full stop, a pure boundary marker. In [iii] the question mark signals that the sentence is a question, while the exclamation mark conveys that there was something remarkable about the situation -- presumably his face showed strong emotion of one kind or another. Examples like those in [12] tend to be disfavoured by the manuals; as observed in #1.3, they are restricted to informal style.

D Punctuation of phrases and coordinate main clauses as separate sentences

[13] i He had broken the vase. Deliberately.

ii The house needs painting. And there's still the roof to be fixed.

The default punctuation here would be as a single sentence. In [i], where deliberately is interpreted as an adjunct modifying the verb broken, the division into two sentences has an information packaging function: it presents the whole as a sequence of two messages, which serves to give extra importance to the contribution of the adjunct. In speech the same effect is achieved by setting the adjunct apart prosodically. The division of a clause coordination, as in [ii], may be motivated by the same consideration, but it may also serve simply to keep the sentences shorter than they would otherwise be, and for this reason is particularly common in journalism.

3 The secondary boundary marks: comma, semicolon and colon

While the terminal full stop marks the boundaries between successive sentences, the comma, semicolon and colon normally mark boundaries within a sentence, and hence can be regarded as secondary boundary marks. They indicate a weaker boundary than the full stop, and we will see in #3.1 that there are grounds for regarding the comma as weaker than the colon or semicolon, so that these indicators may be arranged into a hierarchy of relative strength as follows:

[1] full stop > { colon, semicolon } > comma

In the present section we confine our attention to sentences containing neither parentheses nor dashes. The dash is also a secondary boundary mark in its main use, but it does not fit neatly into the hierarchy of strength, and we defer consideration of it until #5.

D Exception: colon marking a non-final sentence

One exception to the distributional distinction between the primary and secondary boundary marks is that the colon is sometimes followed by a capital letter:

[2] i Libraries have not tried hard to compete in this domain: Their collections are still dominated by books.

ii A number of questions remain to be answered: Who will take responsibility for converting the records to digital form? How are the old records to be stored? Who will have access to the digital files?

It seems best in such cases to take the colon as marking the boundary of a sentence, so that [i] will consist of a sequence of two sentences, and [ii] of a sequence of four. Clearly, however, a sentence with a colon as terminal could never be the last sentence in a text.

3.1 Some formal preliminaries

D Asymmetry between marking of left and right boundaries

There is an important asymmetry in the marking of boundaries:

[3] i Constituents whose right boundary is marked very often have no marking of their left boundary.

ii Constituents whose left boundary is marked almost always have their right boundary marked -- by a mark at least as strong as the one on the left.

Compare the following examples, where the relevant constituents are underlined:

[4] i a. There'll be no problem because anyone can take part, provided they're over 18.

b. She suggested that the most important factor had been overlooked: the cost.

c. He has written books on Babe Ruth; on Tinker, the shortstop, Evans, the second baseman, and Chance; and on Hank Aaron.

d. *Jill was in fact, keeping her options open.

ii a. *Anyone can take part, provided they're over 18 so there'll be no problem.

b. *He told the press his reason: he did not want to have to renegotiate his contract, but he did not give any explanation to the team owners.

c. *He has written books on Babe Ruth; on Tinker, the shortstop, Evans, the second baseman, and Chance; and on Hank Aaron, and they've all sold well.

d. Kim, Pat, and Alex had done most of the organising.

Examples [ia--c] have constituents with respectively a comma, colon and semicolon at the right boundary, but no mark at the left -- and they are completely well-formed. There are certainly some cases (discussed in #3.2.5) where a comma at the right boundary requires a mark at the left too, as is evident from [id]; but they do not represent the general pattern. In [iia--c] we have constituents with a comma, colon and semicolon on the left and no mark, or a weaker one, on the right, and they are strongly deviant. The only systematic exception involves the comma, and is virtually restricted to coordination, as in [iid].

D The strength hierarchy

It is constraint [3ii] that justifies the hierarchy of strength given in [1] above. In particular, it provides evidence that the comma is weaker than the colon and semicolon. A constituent with a colon or semicolon on the left cannot have a comma on the right, as illustrated in [4iib--c]. It is not possible to establish any categorical difference between colon and semicolon in this respect, and it is for this reason that we have placed them at the same position in [1]. Compare, for example:

[5] i He told the press his reason: he did not want to have to renegotiate his contract; but he did not give any explanation to the team owners.

ii With a book as complex and anarchic as this, such reductionism is misleading. You could as easily say it was about the failure of Sixties' radicalism; the decline of the dollar; the hegemony of television culture: it is all these, and more.

In [i] we have a colon on the left and a semicolon on the right. The structure of the whole sentence, at the top level, is `X; but Y': the semicolon marks the terminal boundary of all that precedes it, and hence can be said to have scope over the colon, which is included within the X. In [ii] we have the converse situation. The underlined NP has a semicolon on the left and a colon on the right. At the top level, the sentence has the form `X: Y', for the part following the colon provides an elaboration on the whole of what precedes, not just the underlined NP. This time, then, the colon has scope over the semicolon, which is included within the X. It is much more usual for a semicolon to have scope over a preceding colon than vice versa, but neither relation is formally excluded, and we cannot therefore establish any strict hierarchical ordering between these two punctuation marks.
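
Constraint [3ii], read together with the hierarchy in [1], amounts to a comparison of relative strength. The following Python sketch is purely illustrative: the rank numbers, the names STRENGTH and right_boundary_ok, and the treatment of an unmarked boundary as rank 0 are assumptions made for the illustration, and the coordination exception seen in [4iid] is deliberately left out.

```python
# Illustrative ranks for the hierarchy in [1]. The numbers are an assumption;
# only their relative order matters.
STRENGTH = {
    None: 0,  # no mark at the boundary
    ",": 1,
    ";": 2,   # colon and semicolon share a rank: no strict order between them
    ":": 2,
    ".": 3,
}


def right_boundary_ok(left, right):
    """Check constraint [3ii]: a constituent marked on the left needs a
    right boundary mark at least as strong. An unmarked left boundary
    imposes no requirement (constraint [3i])."""
    if left is None:
        return True
    return STRENGTH[right] >= STRENGTH[left]


# [4ia]: no mark on the left, comma on the right (well-formed)
print(right_boundary_ok(None, ","))
# [4iib]: colon on the left, comma on the right (deviant)
print(right_boundary_ok(":", ","))
# [5i]: colon on the left, semicolon on the right (well-formed)
print(right_boundary_ok(":", ";"))
```

Giving the colon and semicolon the same rank reflects the point made above: each can have scope over the other, so neither is treated as the stronger.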

D The single level constraint on the colon and semicolon

Two colons or semicolons may not occur at different levels within a single construction (leaving aside cases where one is located within a parenthesised element). Compare:

[6] i I wouldn't recommend it, but he can certainly take part, provided he's 18.

ii *A new policy has been instituted: the evaluation will be made by groups that will have only one responsibility: to prepare the year-end reports.

iii *All students had to take a language; Sue took French; she already spoke it well.

In [i] the first comma separates the two main clauses linked syntactically by but; the second comma then marks off an adjunct located in the second of the coordinated clauses. The second comma thus marks a boundary at a lower level than that marked by the first. This is not permitted with colons and semicolons, however, as illustrated by the unacceptability of [ii--iii]. In [ii] the second colon marks the boundary between only one responsibility and the supplement to prepare the year-end reports which provides an elaboration of it. The first colon marks the boundary between two main clauses, and since only one responsibility is a constituent of one of them, the second colon is at a lower level than the first. The same applies with the semicolons in [iii], though perhaps not so obviously. The second clause, Sue took French, provides an elaboration on the first: we infer that Sue is one of the students, and the sentence moves from a statement of a general requirement applying to all students to a particular statement concerning one student's satisfaction of the requirement. The third clause then provides an elaboration of the second: a natural interpretation is that Sue chose to take French because she already spoke it well. There is then no direct relation between the third clause and the first: the third is a supplement to the second, and the second is a supplement to the first. This means that the second semicolon is at a lower level than the first, just as the second colon is in [ii], and since it is not a comma this results in deviance.

D Further constraints on the colon

The colon is subject to two further constraints. Firstly, unlike the comma and the semicolon, it is not used to separate elements in a coordinative relation, but is restricted to constructions containing just two terms. Compare:

[7] i Many welcomed the proposal, some were indifferent, a few strongly opposed it.

ii Many welcomed the proposal; some were indifferent; a few strongly opposed it.

iii *Many welcomed the proposal: some were indifferent: a few strongly opposed it.

Combined with the constraint illustrated in [6], this means that whenever a sentence contains two colons they will belong in separate constituents, as in:

[8] The press secretary gave them the rules: they were not allowed to speak to the committee directly; all other members were forbidden to discuss what the committee had decided: a hiring freeze would take place.

Here the topmost constituent division is the one marked by the semicolon, and the colons thus occur in distinct constituents of the sentence. Neither has scope over the other.

Secondly, a constituent whose left boundary is marked by a colon cannot be followed by further material in the same clause:

[9] *Smith has written books on the Risorgimento, which was an exciting period; on the topic of this conference: the Neapolitan Revolution of 1799; and on the `Italietta' period of the late 19th century.

Here the colon marks a boundary within a non-final coordinate that is subclausal: only a comma (or a dash) would be admissible in this context.

3.2 Uses of the secondary boundary marks

We observed in #1 that the syntactic relations of coordination and supplementation need not be formally marked, so that with a sequence of main clauses there may be indeterminacy as to whether or not they are syntactically related in a coordination or apposition construction. For this reason we will look first, in ##3.2.1--2, at these constructions in cases where they are formally marked (i.e. they are syndetic rather than asyndetic) and/or involve constituents lower in the hierarchy than main clauses (i.e. they are subclausal, with the understanding that this covers subordinate clauses). Then in #3.2.3 we consider asyndetic combinations of main clauses. The last two subsections, ##3.2.4--5, deal with remaining cases of subclausal boundaries, the first covering cases where there is no requirement that the left boundary be marked as well as the right, the second with what we call delimiting commas, where both boundaries must normally be marked.

3.2.1 Coordination, syndetic or subclausal

In coordination, punctuation is commonly used to separate one coordinate from the next. The comma is the default mark; under certain conditions, however, a semicolon (but not a colon) is used instead. We will look in turn at bare and expanded coordinates, i.e. those that respectively lack or contain a coordinator (see Ch. 15, #@@).

D Non-initial bare coordinates: left boundary mark obligatory

[10] i The President will chair the first session, Dr Jones will chair the second, and I myself will look after the third.

ii The President, Dr Jones, and I myself will chair the first three sessions.

iii Do you call this government of the people, by the people, for the people?

iv They can, should, and indeed must make due restitution.

v It has a powerful, fuel-injected engine.

The underlined coordinates are neither initial nor marked by a coordinator; in this context, the indicator at the left boundary is strictly obligatory. In the case of modification in the structure of nominals, as in [v], the punctuation distinguishes coordination from the stacking of modifiers (see Ch. 5, #14.2@@). In [v] itself, then, engine is modified by a coordination of adjectives, giving the meaning ``engine that is both powerful and has fuel injection''. In a powerful fuel-injected engine, by contrast, there are two layers of modification: engine is modified by fuel-injected to form the nominal fuel-injected engine, and this is in turn modified by powerful, allowing a somewhat different interpretation, ``engine that is powerful by the standards applicable to fuel-injected ones''.

D Non-initial expanded coordinates

With coordinates introduced by a coordinator, we have no categorical rule comparable to the one given for bare coordinates. This is an area where we find variation between heavy and light punctuation, the former style including more commas in this position than the latter. Punctuational marking is more likely before a long and complex coordinate than before a short and simple one (and thus, other things being equal, before a clause than before a VP, for example); it is somewhat more likely with but than with and and or; it is inadmissible in joint coordination (Ch. 15, #@@). Compare:

[11] i Their friendship for Augusta became rather hollow, and the news that Byron had left her practically all his money caused it to crumble to oblivion.

ii He packed up his papers and stormed out of the room.

iii I'll do my best, but I doubt whether I'll get very far.

iv My flat-mate and my brother's philosophy tutor have just got engaged.

Joint coordination is illustrated in [iv]: in the intended interpretation, they got engaged to each other, which completely excludes the possibility of a comma. Punctuation may also be added to prevent misreadings, as illustrated in [13ii] of #1 (Liz recognised the man who entered the room, and gasped).

D Use of the semicolon in coordination

A semicolon can be used instead of a comma, typically in relatively formal style, under conditions illustrated in:

[12] i In the 1890s Chicago had more Germans than any of Kaiser Wilhelm's cities except Berlin and Hamburg; more Swedes than any place in Sweden except for Stockholm and Göteborg; and more Norwegians than any Norwegian town outside of Christiania (now Oslo) and Bergen.

ii After the war, the United States produced half of the world's goods; our manufacturers had no peers; and our military, bolstered by the atomic bomb, had enemies but no equals.

iii His band members are Phil Palmer, guitar; Steve Ferrone, drums; Alan Clark and Greg Phillinganes, keyboards; Nathan East, bass; and Ray Cooper, percussion.

iv Professor Brownstein will chair the first session, and the second session will be postponed; or I will chair both sessions.

v He had forgotten the thing he needed most: a map; and he was soon utterly lost.

In [i] the semicolon is motivated by the length and complexity of the coordinates. This is a rather untypical example, however, in that none of the coordinates contains a comma; usually one or more of them do, as in [ii--iv]. In such cases the punctuation helps in the perception of the hierarchical structure, with the semicolon separating constituents higher in the tree structure than the commas. A special case is seen in [iv], where we have layering of coordination: and joins the first two clauses into a single unit which as a whole is coordinated to the third by means of or. It is or, then, that marks the upper layer of coordination, and which is consequently preceded by the stronger boundary indicator. In [v] the first coordinate contains not a comma but a colon. In this case a comma could not replace the semicolon (see the discussion of [4iib]), whereas it could occur as a less preferred option in [12i--iv].

3.2.2 Supplementation, syndetic and subclausal

D Markers of supplementation

As noted in Ch. 15, #@@, supplements may be marked by such indicators as namely, that is, that is to say, viz, for example, in particular, and so on. Supplements introduced by such items may be preceded by any of the secondary boundary markers. Examples [13i--ii] have subclausal and main clause supplements respectively:

[13] i a. The 19th century cases on which the Act was based were mainly sales between businessmen and organisations, that is, sales by manufacturers and suppliers.

b. This statement is still valid today, since `resemblances' lead us to think in `as if' terms; that is, in metaphorical terms.

c. Wittgenstein's treatment of the `Other Minds' problem is an extended illustration of a point in philosophical logic: namely, that the meaningfulness of some of the things we say is dependent on contingent facts of nature.

ii a. Mature connective tissues are avascular, that is, they do not have their own blood supply.

b. One way of speaking about this is to say that images in a dream seem to appear simultaneously; that is, no part precedes or causes another part of the dream.

c. Pneumatic bearings also have a considerable application which has not been developed outside gyroscopes: for example, a patent has recently been taken out covering the use of a pneumatic bearing for a glass polishing head.

D Asyndetic subclausal supplementation

The left boundary of subclausal supplements may be marked by a comma or a colon, though the constraints outlined in #3.1 mean that the colon is admissible only if the supplement follows the clause containing the anchor:

[14] i Bishop Terry Lloyd, the only Welshman in the college, had opposed the plan.

ii They went to Bill Clinton, the only man who could help them.

iii It was her face that frightened him most of all, the frosty smile, the brilliant unblinking eyes.

iv Either eat your breakfast or get dressed, one or the other.

v The ship steered between the buoy and the island: the only course that would avoid the rocky shoals.

vi Areas with a high concentration of immigrants tend also to be areas of ethnic conflict: Los Angeles, Miami, Adams-Morgan, Crown Heights.

In [iii--v] either a comma or a colon could be used. But it is not always so: a colon would be out of order in [ii], for example. This is because the supplement provides descriptive, not identifying, information -- compare They went to the only man who could help them: Bill Clinton, where the supplement does identify. In [vi], on the other hand, the colon could not be replaced by a comma.

3.2.3 Asyndetic combinations of main clauses

In combinations of main clauses with no coordinator or supplementation marker, there is no grammatical indication of the nature of the relation between the clauses. In some cases, notably where and or but could readily be inserted, they can be interpreted as coordinate; in others, the second provides an elaboration of the first -- an explanation, an exemplification, a consequence, and so on. In general, the absence of any grammatical link strongly favours a stronger indicator than a comma to separate the clauses. Thus, although examples like the following occur, they would be widely regarded as infelicitous in varying degrees:

[15] i ?The locals prefer wine to beer, the village pub resembles a city wine bar.

ii *Your Cash Management Call Account does not incur any bank fees, however, government charges apply.

Example [i] illustrates what prescriptivists call a `spliced' or `run-on' comma, with the implication that the sentence should be split into two. A special case of this is where the second clause begins with a connective adjunct such as however, nevertheless, thus and the like; while [ii] is an attested example, it would generally be regarded as unacceptable.

Nevertheless, there are certainly conditions under which a comma is acceptable, and we will accordingly give in turn examples of these asyndetic main clause combinations marked by a comma, semicolon and colon.

D Comma

[16] i It was raining heavily, so we decided to postpone the trip.

ii To keep a child of twelve or thirteen under the impression that nothing nasty ever happens is not merely dishonest, it is unwise.

iii Some players make good salaries, others play for the love of the game.

Example [i] might be called `quasi-syndetic': although so here does not belong to the syntactic category of coordinators, it serves a similar linking function, and a comma is strongly preferred over a semicolon or colon. Yet behaves in the same way: see Ch. 15, #2.10. Example [ii] is representative of constructions where a positive clause follows a negative, especially one where the negation combines with only, simply, merely or just. In such cases the positive clause is often introduced by but, giving syndetic coordination; the asyndetic construction without but is also common, however, and readily allows a comma (as well as other marks: see below). In [iii] the comma is justified by the close parallelism between the clauses and their relative simplicity. The comma-linked cases are thus broadly coordinative in interpretation.

D Semicolon

[17] i They came on the Mayflower; they came in groups brought over by colonial proprietors; they came as indentured servants.

ii The Latin, for example, was not only clear; it was even beautiful.

iii Some colonies started under the rule of private corporations that looked for the profits in fish, fur, and tobacco; some were begun by like-minded religious seekers.

iv All students had to take a language; Sue took French.

v The bill was withdrawn; the sponsors felt there was not sufficient support to pass it this session.

The semicolon allows both coordinative and elaborative interpretations. Example [i] has three clauses at the same hierarchical level, putting it very clearly with the coordinative type. Examples [17ii--iii] are comparable to [16ii--iii], but have a semicolon instead of the comma. Again we note that here the first clause contains internal commas, so the semicolon serves to show that the boundary it marks is higher in the hierarchical structure of the sentence. These two examples can also be subsumed under the category of asyndetic coordination: the clauses could be linked by but and and respectively. In [iv--v], on the other hand, the relation is elaborative rather than coordinative, and here the semicolon could be replaced by a colon but not by a comma.

D Colon

[18] i Roosevelt was not a socialist: his solution was not to eliminate capital, but to tame and regulate it so that it could coexist harmoniously with labour.

ii He told us his preference: Jan would take Spanish; Betty would take French.

iii The rules were clear: they were not allowed to speak to the committee directly.

iv Brown pointed out the costs to the community on the radio last night, and McReady mentioned the political consequence in this morning's paper: the bill will cost the taxpayers more than $100,000 in the first year, and may be seen as giving the Republicans an unfair electoral advantage.

The colon, we have seen, is not used in syndetic coordination, and in asyndetic combinations it indicates an elaborative rather than coordinative interpretation. What it elaborates on may be a whole clause, as in [i], or a smaller element, such as his preference in [ii] or the non-final NP the rules in [iii]; indeed, there may be more than one such item, as in [iv], where the clause following the colon elaborates on both the costs and the political consequence.

Like the comma and semicolon, the colon can separate a negative--positive sequence, where the first clause contains not + only/simply/merely/just:

[19] The Romans built not only the Fort of Othona: they had a pharos, or lighthouse, on Mersea.

This does not invalidate our statement that the colon cannot be used to separate clauses in a coordinative relation. It is, rather, that the elaboration relation makes perfect sense in this context: the second clause provides an explanation or demonstration of what is said in the first clause. Note, then, that it would be quite impossible to insert but after the colon.

3.2.4 Further cases of simple boundary marking at the subclausal level

D Between verb and direct reported speech complement: obligatory comma or colon

[20] i Kim asked plaintively, `What am I going to do?'

ii He added: `Some missiles missed their targets, resulting in collateral damage.'

In this construction the reported speech is complement of the reporting verb -- see #6 for the construction where the reporting verb is in a parenthetical. The direct reported speech complement (whether enclosed in quotation marks or not) is required to be preceded by a punctuation mark, usually a comma. A colon is also possible provided the reported speech is relatively long and complex. Note the contrast with indirect reported speech, where such punctuation is inadmissible: *He added, that some missiles had missed their target.

D Before certain types of complement: optional colon

[21] i The seminar will cover: superannuation; financial planning; personal insurance; home and investment loans.

ii The question to be considered next is: `How long should artificial respiration be continued in the absence of signs of recovery?'

The complement in [i] has the form of a list; this type is semantically comparable to appositive supplementation, with the list anchored to some such expression as the following topics. In [ii] the colon occurs before the complement of be in its identifying sense -- a complement, moreover, which has the form of a main clause; this case has affinities with the reported speech construction. Constructions [20]--[21] are exceptional: in general, a verb may not be separated from its complement by punctuation.

D Between the main constituents of a gapped clause: optional comma

[22] i The first film was released in October in just a few large cities and the second, in Christmas week in more than 400 theatres across the country.

ii Some of the immigrants went to small farms in the Midwest; others, to large Eastern cities.

The second clauses here belong to the gapping construction (Ch. 15, #@@), with the comma marking the place where material is missing: was released and went respectively. In short and simple cases, however, it is more usual not to mark the gap, especially if the gapped clause is itself preceded by a comma: One of them was French, the other German.

D Between subject and verb: comma under exceptional circumstances

[23] i *The right of the people to keep and bear arms, shall not be infringed.

ii What he thought it was, was not clear.

In Present-day English there is normally a strong prohibition on punctuation separating subject and verb: examples like [i] are now completely inadmissible. The rule is relaxed, however, in certain cases. In [ii], for example, the comma prevents any confusion that might be caused by the juxtaposition of two tokens of the verb-form was. And in Most of those who can, work at home ([13iii] of #1) it prevents work at home being taken as complement of can.

3.2.5 Delimiting commas

Simple examples of delimiting commas are seen in:

[24] i Some, however, complained about the air-conditioning.

ii The plumber, it seems, had omitted to replace the washer.

iii Henry, who hasn't even read the report, insists that it was an accident.

iv I suggest, Audrey, that you drop the idea.

Here the commas mark both left and right boundaries of a subclausal constituent that is set apart from the main part of the sentence, usually indicating that it is in some sense less central to the message. If the left or right boundary coincides with that of a larger construction that is marked by a stronger indicator, then the comma is superseded by, or absorbed into, the latter:

[25] i Most of them liked it. However, some complained about the air-conditioning.

ii Things are quite difficult: unlike you, I don't get an allowance from my parents.

iii We've been making good progress; even so, we've still a long way to go.

iv The plumber had omitted to replace the washer, it seems.

v They want to question Henry, who hasn't even read the report: it's quite unfair.

vi I suggest you drop the idea, Audrey; it would be better to stay where you are.

Examples [i--iii] show the left boundary superseded by a full stop, colon and semicolon respectively, and [iv--vi] show the same for the right boundary. In most cases, as in these examples, this arises when the delimited constituent is initial or final in the construction containing it. It is not of course possible for both boundaries to coincide with a higher one, so a delimited constituent will normally have a comma marking at least one of its boundaries. Colons and semicolons do not serve this function of setting a constituent apart. They could thus not replace the right commas in [i--iii] or the left commas in [iv--vi].
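
The absorption pattern in [25] lends itself to a small sketch. In the hypothetical Python helper below, the names STRONGER and delimit and their parameters are assumptions introduced for illustration only: a delimiting comma is written at a boundary unless a stronger indicator, supplied by the enclosing construction, already occupies that position.

```python
# Indicators that supersede a delimiting comma at a shared boundary.
STRONGER = {".", ":", ";"}


def delimit(left_context, right_context):
    """Return the (left, right) marks around a delimited constituent.

    left_context and right_context give the indicator, if any, that the
    enclosing construction already places at that boundary (otherwise None).
    """
    left = left_context if left_context in STRONGER else ","
    right = right_context if right_context in STRONGER else ","
    return left, right


# [24i]: 'however' in medial position, so commas appear on both sides
print(delimit(None, None))
# [25i]: 'However' is sentence-initial, so the full stop absorbs the left comma
print(delimit(".", None))
# [25v]: the relative clause precedes a colon, which absorbs the right comma
print(delimit(None, ":"))
```

The sketch makes no attempt to decide whether a constituent should be delimited in the first place; it models only what happens to the two commas once it is.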

D Types of delimited element

The above examples illustrate the range of elements that are commonly delimited. In [25i--iii] we have an adjunct, in [iv] a parenthetical, in [v] a supplementary relative clause, in [vi] a vocative. With parentheticals and vocatives delimiting punctuation is required. Supplementary relative clauses, and similarly detached participials, are usually set off punctuationally, but (contrary to the rules given in the manuals) examples without punctuation are certainly attested. Supplementary NPs interpolated within a clause also take delimiting punctuation, as seen earlier in [14i]. In addition, commas are obligatory with the peripheral elements in left and right dislocation structures (Ch. 16, #@@): My neighbour, she's just won the lottery (left dislocation), I don't think a lot of him, the new manager (right).

E Constituents introduced by coordinators

We have seen that commas are often used to separate coordinates but, less commonly, they have a delimiting function:

<REC:12> [26]

[26] i The students, and indeed the staff too, opposed all these changes.

ii She laughed, and laughed again.

iii He seemed to be both attracted to, and overawed by, the new lodger.

The effect in [i] is to present the underlined NP as a parenthetical addition rather than an element on a par with the preceding element in terms of information packaging: we treat this too as a kind of supplementation rather than genuine coordination. In [ii] the second VP is clause-final, so it is not immediately obvious that the comma is delimiting; this becomes apparent, however, when we add a complement: She laughed, and laughed again, at the antics of the little man. Example [iii] belongs to the delayed right constituent construction (Ch. 15, #@@); the commas, though optional, help show that the new lodger is understood as complement not only of by but also of to.

E Adjuncts and complements

Because the function of delimitation is to set an element off from the central part of the message, it applies in clause structure predominantly with adjuncts rather than complements. Delimitation of a complement in its basic position is normally grossly deviant: *He blamed, the accident, on his children. With adjuncts, there is considerable variation as to when delimiting commas are used: this is the area where the contrast between the heavy and light styles of punctuation is most evident.

The main factors influencing the use of delimiting punctuation are:

<REC:13> [27]

[27] i length and complexity of the constituent

ii whether or not there are punctuation marks nearby

iii the linear position of the constituent

iv the semantic category of an adjunct

v the possibility of misparsing

vi prosody

Other things being equal, a short simple constituent is less likely to be marked off than a long complex one (e.g. one with the form of, or containing, a subordinate clause). The influence of nearby punctuation is seen in such a pair as:

<REC:14> [28]

[28] i She was not sorry he sat by her, but in fact was flattered.

ii She was not sorry he sat by her but, in fact, was flattered.

In [i] we have a comma before but, separating the coordinate main clauses, and the following adjunct in fact is not marked off. Conversely, in [ii] there is no comma before the coordinator and the adjunct is delimited. It would be possible to combine the comma of [i] with those of [ii], but to have three commas in such close proximity is likely to be perceived as noticeably heavy punctuation.

As for position, delimiting commas are most likely with adjuncts located internally within the clause. And they are more likely with elements in front position than at the end of the clause. This latter point applies, indeed, to complements too: in the relatively few cases where complements are delimited they are in front position. Compare, then:

<REC:15> [29]

[29] i a. You'll have to train every day to have any chance of winning. [adjunct]

b. To have any chance of winning, you'll have to train every day. [adjunct]

ii a. He's not humble. [complement]

b. Humble, he's not. [complement]

A delimiting comma before a final adjunct of a type that readily occurs without one may have the effect of presenting the content as a separate unit of information; it may also serve to mark the semantic scope of a negative. Compare:

<REC:17> [30]

[30] i He had seen her at the supermarket, only two days earlier.

ii She didn't buy it, because her sister had one.

In [i] the comma has an information packaging function, dividing the whole message into two units of information. In [ii] the comma indicates that the negative does not have scope over the reason adjunct: we understand ``Her sister having one was the reason for her not buying it'', not ``Her sister having one was not the reason for her buying it''.

Consider next the semantic category of the adjunct. We have noted that complements are not normally delimited and this reflects the fact that they are more tightly integrated into the main predication; similarly, within the very wide range of adjunct types, those that are related most directly to the verb and its complements are less likely to be marked off by commas than the semantically more peripheral ones. Within the (necessarily incomplete) list of categories given in [1]@@ of Ch. 8, #1, the later ones thus tend to favour delimitation more than the earlier ones. Among the categories that most strongly favour commas are adjuncts of result, evaluative adjuncts (especially when non-initial), speech-act related adjuncts and connectives:

<REC:18> [31]

[31] i They increased the rent, so that it now took 40% of our income.

ii No one had noticed us leave, fortunately.

iii Frankly, it was an absolute disgrace.

iv It now looks likely, moreover, that there will be another rate increase this year.

The use of delimiting punctuation to forestall possible misparsings is illustrated in:

<REC:19> [32]

[32] Most of the clothes, my father had bought at Myers.

The initial element here is a complement, and hence would not generally be marked off, but the comma serves to forestall an initial reading where my father is subject of a relative clause modifying clothes.

Consider finally the relevance of prosody. We have emphasised that punctuation cannot be regarded as a means of representing the prosodic properties of utterances, but there is no doubt that there is some significant degree of correlation between the use of delimiting commas and the likelihood that the constituent concerned would be set apart prosodically in speech. Compare, for example:

<REJ:36> [33]

[33] i That is probably true. However, we should consider some alternatives.

ii That is clearly unsatisfactory. Thus the original proposal still looks the best.

In speech an initial however is characteristically prosodically detached from the rest, while thus is not, and this correlates with the fact that delimiting punctuation is very much more frequent with however than with thus.

4 Parentheses

In their primary use parentheses occur in pairs and enclose what we will call a parenthesised element. Their function is to present that element as extraneous to a minimal interpretation of the text, as inessential material that can be omitted without affecting the well-formedness and without any serious loss of information. Parenthesised elements provide an elaboration, illustration, or refinement of, or a comment on, the content of the accompanying text.

D Range of parenthesised elements

<REC:22> [1]

[1] i Amazingly, only about 500,000 legal immigrants entered the U.S. in the whole of the 1930s. (In those days there was little illegal immigration.)

ii Southern liberals (there are a good many) often exhibit blithe insouciance.

iii But listening to his early recordings (which have just been re-issued by Angel), one has the impression of an artist who has not yet found his voice.

iv If your doctor bulk bills (that is, sends the bill directly to the Government) you will not have to pay anything.

v It seems that (not surprisingly) she rejected his offer.

vi The discussion is lost in a tangle of digressions and (pseudo-) philosophical pronunciamentos.

vii Any file(s) checked out must be approved by the librarian.

viii One answer might be that only different (sequences of) pitch directions count as different tones with respect to the inventory.

A very great range of expressions can be parenthesised. In [i] we have a sentence -- and indeed it could be a sequence of sentences or a whole paragraph. In [ii] we have a main clause (it could not be punctuated as a sentence), and in [iii] a subordinate clause (a supplementary relative). The parenthesised element in [iv--v] is a phrase (VP and AdvP respectively), in [vi] a combining form, and in [vii] an inflectional suffix. Finally, [viii] shows that it need not be a grammatical constituent: sequences is head of the NP sequences of pitch directions, while of is the first word of the complement PP.

In all these examples except [ii], the parenthesised element is integrable in the sense that the parentheses could be omitted or (as in [iii--v]) replaced by commas (at the left or both left and right boundaries). With the non-integrable type the status of the parenthesised element cannot be changed in this way. Where it is medial within the containing clause, the parentheses could only be replaced by dashes, which would make hardly any change to its informational status; where it is final, a colon or semicolon could be used to separate it from what precedes. The non-integrable type characteristically has the form of a main clause; we also find sequences like that in:

<REC:23> [2]

[2] The facts of her background include a beloved older brother who was institutionalised in his early 20s for `dementia praecox' (schizophrenia, probably) and died there some ten years later.

This consists of an NP followed by a modal adjunct, and if the order were reversed it would be integrable.

D Linear position

Non-integrable parenthesised elements must follow the constituent they are associated with, their anchor. Compare [1ii], for example, with *The committee included a group of (there are still a few around) Southern liberals. Integrable ones occupy the same position as they would if the parentheses were dropped, but there is a constraint prohibiting parenthesisation of an element at the absolute beginning of a clause. Thus we can parenthesise an element following a clause subordinator, as in [1v], or following a coordinator -- as in but (not surprisingly) she rejected his offer -- but not right at the beginning: *(Not surprisingly) she rejected his offer.

D Combination with other punctuation marks

Punctuation within the parentheses depends mainly on the requirements of the parenthesised element itself. Thus terminal question and exclamation marks are used when it has the appropriate status. A full stop, and associated initial capital, however, is permitted only when the parenthesised element is not embedded within a sentence: compare [1i--ii]. The hyphen in (pseudo-) philosophical is required to be inside the parentheses because if pseudo were dropped the hyphen would drop too.

Punctuation outside the parentheses depends on the requirements of the containing sentence: it is the same as it would be if the parenthesised element were omitted. Any such punctuation normally follows, rather than precedes, the parentheses, as in [1iii].

D The single layer constraint

It is normally inadmissible to have one pair of parentheses included within another (leaving aside the secondary uses mentioned in footnote 14). Some manuals recommend that where the need for such embedding arises square brackets should be used at the lower level, but this is very much a minority usage. The usual way of solving the problem is to have parentheses at one level, dashes at the other, with no constraint on which of them occurs at the higher level:

<REC:24> [3]

[3] i There was a time when the Fourth of July was an occasion for re-creating the days of the American Revolution. (I hope that it makes a comeback, despite the assaults of a misguided -- and, it has to be said, self-defeating -- `multiculturalism'.)

ii Measures by Britain -- land of la vache folle (mad cow disease) -- to contain the problem have been ineffective.

D The insulating effect of parentheses

Parentheses set the enclosed material apart from the main text in such a way that the latter cannot depend on it for its well-formedness or interpretation. This is why such examples as the following are inadmissible:

<REC:25> [4]

[4] i *Kim (and Pat) have still not been informed.

ii *She brought in a loaf of bread (and a jug of wine) and set them on the table.

iii *Ed won at Indianapolis (and Sue came in second at Daytona) in the same car.

iv *Languages like these (which linguists call `agglutinating') are of great interest. Agglutinating languages are found in many parts of the world.

In [i] the parenthesised element is included within the subject that determines the form of the verb. In [ii] it is included within the antecedent for the pronoun them. In [iii] it is included within the comparison expressed by same: Ed and Sue drove the same car. In [iv] it provides an explanation of the term `agglutinating', which is used in the following sentence. In all these cases dropping the parenthesised element naturally maintains the anomaly, but dropping the parentheses (with commas substituted in [iv]) removes it.

5 The dash

Dashes occur either in pairs or singly, marking an ostensible break or pause in the production of the text. They are not used to separate coordinates, and hence, unlike the comma and the semicolon, they do not occur in open-ended series.

D Paired dashes

<REC:58> [1]

[1] i There's a difference over goals, but the end -- namely freedom -- is the same.

ii Exeter clearly enjoyed full employment -- as full, that is, as was attainable in the conditions of the time -- while Coventry languished in the grip of severe unemployment.

iii The book -- and the movie -- were strongly condemned by the Legion of Decency.

iv Immigrants do come predominantly from one sort of area -- 85 per cent of the 11.8 million legal immigrants arriving in the U.S. between 1971 and 1990 were from the Third World; 20 per cent of them were from Mexico -- but services have not adapted to that reality.

v Many of Updike's descriptions of Hollywood -- the place -- are nicely observed.

vi In theory -- no, no theory! -- ideally, both description and dialogue should forward narrative.

When they occur in pairs they serve to set off some constituent from the rest of the text, giving it the character of an interpolation. The interpolation typically provides an elaboration, explanation or qualification of what precedes.

In this function dashes are in competition with delimiting commas and parentheses; either could replace them in [i--ii], while commas could in [iii]. They mark a clearly stronger break from the surrounding text than commas, and allow a larger range of constituent types to be delimited -- including, for example, a main clause, or combination of main clauses, as in [iv]. The distinction between integrable and non-integrable parenthesised elements drawn in #4 thus applies to dash-interpolations too.

There are also significant differences between paired dashes and parentheses. Dashes cannot enclose part of a word or a separate whole sentence: they could not, for example, replace the parentheses in [1i, vi--vii] of #4. They would also be at best very questionable in [1viii] of #4, where sequences of is a non-constituent that is not coordinated or otherwise paired with a comparable one. No less important is the functional difference. We noted that a parenthesised element is presented as inessential to and insulated from the accompanying text. This is often not so with dash-interpolations. In [1v] of this section, for example, the place is understood in a semantically restrictive sense, serving to distinguish Hollywood the place from Hollywood the industry: with parentheses it would give descriptive rather than identifying information, like a supplementary relative clause (as in Hollywood, which is a place). In [1vi] the interpolation serves to justify the correction of in theory to ideally, and the dashes are neither omissible nor replaceable by parentheses. Example [1iii] shows that dashes do not insulate the interpolation: the verb-form were agrees with the coordinate subject, and the pronoun they has it as antecedent. Note, then, that all but one of the deviant examples in [4] of #4 can be corrected by replacing the parentheses with dashes. The exception is [4iii] of #4, where comparison with same requires that the coordinates be of equal status; compare, similarly, *Kim -- and Pat -- are a happy couple.

D Single dashes

<REC:59> [2]

[2] i We could invite one of the ladies from next door -- Miss Savage, for example.

ii Initiative, self-reliance, maturity -- these are the qualities we're looking for.

iii We've got to get her to change her mind; the question is -- how?

iv You may be right -- but that isn't what I came here to discuss.

v But we would like your permission to do -- that is, to go further if need be.

vi `I think --' `I'm not interested in what you think,' he shouted.

In many cases a single dash is like the first member of a pair of dashes with the second member being superseded by or absorbed into an indicator that marks a higher level boundary. This is illustrated in [i], which may be compared with One of the ladies from next door -- Miss Savage, for example -- could be invited. There are other cases, however, where a single dash has a somewhat different function. Example [ii] is a special case of the left dislocation construction (Ch. 16, #@@): the dash follows a coordination in initial position which provides the antecedent for an anaphor in the clause nucleus, typically a demonstrative, as here. A colon might be used in this construction, but a dash is the usual punctuation. In [iii] the dash matches a prosodic pause in speech, serving to highlight the final complement; this use is also found after supplementation indicators such as namely, that is, for example, etc. In [iv] the dash signals an abrupt change of topic, and in [v] a change in grammatical construction. Finally, in [vi] it signals simply a breaking off, an interruption with no resumption.

D Relations with other indicators

A dash can follow a question or exclamation mark and a closing quotation mark or parenthesis, but otherwise it is normally mutually exclusive with other indicators -- in particular, the comma:

<REC:60> [3]

[3] i *Some of them -- Sue, for example, -- wanted to lodge a formal complaint.

ii *As he had no money -- he'd spent it all at the races --, he had to walk home.

These are both corrected by dropping the comma and retaining the dash:

<REC:61> [4]

[4] i Some of them -- Sue, for example -- wanted to lodge a formal complaint.

ii As he had no money -- he'd spent it all at the races -- he had to walk home.

The omission of the comma in [4i] is simply a further case of the absorption of the second of a pair of delimiting commas into an indicator that marks a higher constituent boundary: the comma-delimited constituent for example in [3i] is part of the larger dash-interpolation. But in [3ii] it is the comma that has wider scope, marking the top level division between all that precedes and the main clause that follows. So in [4ii] we have the unusual phenomenon of an indicator that marks the boundary of one construction superseding the indicator that would be expected to mark the boundary of a larger construction. This relationship arises only between the dash and the comma; for another example, see [1ii]. It is, however, subject to severe constraints, as evident from:

<REC:62> [5]

[5] i *Kim and Pat, who were easily the best qualified candidates -- both had PhDs -- were the only ones shortlisted.

ii *Only four people came to the meeting: Ed, Mr Lake -- Ed's father -- Sue and me.

In [i] the second dash occurs at the end not only of the main clause interpolation but also of the relative clause supplement, and the need for a comma to mark the latter boundary is too great to permit its absorption by an indicator at a lower level. Example [ii] is even more sharply deviant. Here the second dash marks the boundary of the supplement Ed's father and also of the second coordinate, which in this context requires a following comma. Both examples could most easily be corrected by substituting parentheses for the dashes and adding the comma; in [ii] we could also use semicolons to separate the coordinates, with the second dash then being absorbed into the higher boundary mark: Ed; Mr Lake -- Ed's father; Sue; and me.

As far as the scope hierarchy shown in [1] of #3 is concerned, the dash can be placed on a level with the colon and the semicolon. Example [4i] shows that a dash can have scope over a comma, while the impossibility of having a comma in place of the second dash of [4ii] shows that a comma cannot have scope over a dash. Both scope relations hold between dash and colon or semicolon. In [1iv] the second dash has scope over the semicolon, while the semicolon has scope over the dash in The results are somewhat disappointing -- 20% down on last year; nevertheless, we are confident that the full year's results will match last year's. Similar pairs can be found for dash and colon.

Like the colon, the semicolon and parentheses, the dash cannot occur at two different hierarchical levels within a single constituent. The functional similarity between dashes and parentheses, however, means that where the need for such embedding might arise, the formal constraint can be avoided by alternating between the two different indicators, as in [3] of #4.

6 Quotation marks and related indicators

D Functions of quotation marks

Quotation marks serve to assign a special status to the stretch of text they enclose, which may be anything from a word to a sequence of paragraphs. Usually they indicate that the wording of the matter enclosed is taken from another source instead of being freely selected by the writer, as with ordinary text. The main categories of enclosed matter are as in [1], with corresponding examples given in [2]:

<REC:40> [1]

[1] i direct speech

ii quotation from written works

iii certain kinds of proper names, e.g. titles of articles, or radio/TV programs

iv technical terms, or expressions used ironically or in some similar way

v expressions used metalinguistically

<REC:41> [2]

[2] i `Let's not bother,' he replied.

ii Fowler suggested that many mistakes made in writing result `from the attempt to avoid what are rightly or wrongly taken to be faults of grammar or style'.

iii `Neighbours' is Channel Nine's longest-running soap.

iv Their `mansion' was in fact a very ordinary three-bedroom house in suburbia.

v He doesn't know how to spell `supersede'.

D Single and double quotation marks

The above functions can all be indicated by means of either single or double quotation marks. AmE predominantly uses the double marks, while usage in BrE is divided, though British manuals tend to favour single marks. Strictly speaking, then, all examples containing quotation marks should have the % annotation, but we will simplify by omitting them, allowing this general statement to stand instead. (Our practice in this book is to differentiate between the two types, with single marks used for general purposes and double marks used for the special metalinguistic function of indicating meanings.)

When quotation marks are needed at different levels there is agreement that the two kinds of quotation marks should alternate:

<REC:42> [3]

[3] i Wilson's claim that `Shakespeare's ``To be or not to be'' is surely the most famous line of English literature, or any other' is disputed by French critics.

ii Wilson's claim that ``Shakespeare's `To be or not to be' is surely the most famous line of English literature, or any other'' is disputed by French critics.

In the rare cases where there are more than two levels the alternation continues: whichever type is used at level 1 is used again at level 3, while the other is used at level 2 and again at level 4, and so on. It is because of this pattern of alternation, and also the possibility of distinguishing them for special purposes, that we regard single and double quotes as distinct indicators, not merely different characters realising a single indicator (cf. #1.2 above).
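The alternation pattern just described can be sketched in a few lines of Python. This is only an illustration: the names `quote_pair`, `wrap` and the `outer` parameter are introduced here for the example, and the marks follow the plain-text convention of the examples above (a backtick opens and an apostrophe closes a single quotation, doubled for double marks).

```python
# Illustrative sketch only; names and parameters are assumptions for this example.
SINGLE = ("`", "'")    # opening and closing single marks, as typed in the examples
DOUBLE = ("``", "''")  # opening and closing double marks


def quote_pair(level, outer="single"):
    """Return (opening, closing) marks for a quotation at nesting level 1, 2, 3, ..."""
    if level < 1:
        raise ValueError("nesting levels are counted from 1")
    if outer not in ("single", "double"):
        raise ValueError("outer must be 'single' or 'double'")
    first, second = (SINGLE, DOUBLE) if outer == "single" else (DOUBLE, SINGLE)
    # Odd levels reuse the type chosen for level 1; even levels use the other type.
    return first if level % 2 == 1 else second


def wrap(text, level, outer="single"):
    """Enclose text in the marks appropriate to its nesting level."""
    opening, closing = quote_pair(level, outer)
    return f"{opening}{text}{closing}"


inner = wrap("To be or not to be", 2)
print(wrap(f"Shakespeare's {inner} is surely the most famous line", 1))
```

Passing `outer="double"` gives the pattern of [3ii] instead of [3i]; nothing else in the sketch changes.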

D The pairing of quotation marks

Quotation marks normally come in pairs, with one member marking the beginning, the other the end, of the quotation. One departure from this pattern is sometimes found in fictional writing. If a single character's speech extends over more than one paragraph, an opening quotation mark may be used at the beginning of each successive paragraph, with the closing one being reserved for the end of the final paragraph of the entire sequence. This is especially common in older (e.g. Victorian) novels, some of which have whole chapters told by a character in the 1st person, with opening quotation marks at the beginning of every paragraph. However, it is found in contemporary fiction as well.

D Quotation marks in combination with other punctuation marks

When an expression is enclosed within quotation marks inside a larger matrix sentence we need to consider the distribution of punctuation marks within the quotation itself and in the matrix sentence. This is a matter on which there is a good deal of variation, firstly between AmE and BrE, and secondly, within BrE (and other non-American varieties), between different publishing houses.

Let us begin with an untypically simple example:

<REC:32> [4]

[4] She replied, `Why are you wasting my time?' and stormed out of the room.

We will say that the question mark is internal, i.e. within the quotation marks, while the comma and full stop are external, outside the quotation marks. And what makes the example simple is that the formal location of the punctuation marks matches the meaning. The quotation is a question and hence needs a question mark; the matrix is a (non-exclamatory) statement and hence needs a full stop, while the comma separates the matrix verb from its direct speech complement. We take this to represent the default situation even though it is unusual: we need deal only with those cases which depart from this pattern. We will examine them under four headings.

D (a) An internal terminal full stop cannot occur medially within the matrix

<REC:33> [5]

[5] i *`I don't know.' she said, and stormed out of the room.

ii *She said, `I don't know.' and stormed out of the room.

iii *Nor would he consider trying to join Leslie and his men, rumoured to be close at hand and making for Scotland, `which I thought to be absolutely impossible. I decided instead to make for France', where it was hoped that Louis would back the royalist cause.

The inadmissibility of such examples is something on which all varieties are agreed. In [i] we need to replace the internal full stop with a comma -- whose position is discussed in (d) below. Example [ii] can be corrected by simply dropping the internal full stop, while [iii] requires radical reconstruction. The rule does not exclude examples like:

<REC:34> [6]

[6] i She replied, `I don't know. Does it matter?'

ii Yet Craig remains confident that the pitching `will come round sooner or later. We just have to hope everybody stays healthy.'

As far as the orthography is concerned, each of these consists of a sequence of two sentences, separated by a full stop. The quotation marks thus enclose part of the first sentence and the whole of the second.

D (b) Raising of semicolons and colons

<REC:35> [7]

[7] i We ought to get going; the train leaves in half an hour.

ii `We ought to get going,' she said; `the train leaves in half an hour.'

When a sentence containing a semicolon is quoted, and divided at the boundary marked by the semicolon, the latter is positioned in the matrix after the reporting frame, with a comma taking its place in the quotation. The same applies with colons. A dash can be treated in the same way, but it is more usual to place it within the second set of quotation marks:

<REC:36> [8]

[8] `We ought to get going,' she said `-- the train leaves in half an hour.'

D (c) Quotations at end of matrix sentence: combinations of sentence terminals

With quotations in final position it is usual to suppress one of the sentence terminals:

<REC:37> [9]

[9] i She added, `It wasn't your fault.' [suppression of matrix full stop]

ii So I asked, `Whose fault was it?' [suppression of matrix full stop]

iii Did he really say `I couldn't care less'? [suppression of internal full stop]

iv %Did he really ask, `Whose fault was it?'?

If both terminals would be full stops, the matrix one is suppressed, as in [i]. If one is a full stop and the other a question mark, the full stop is suppressed, as in [ii--iii]. If both are question marks, there is variation: in [iv] both are retained, but it is probably more usual to drop one or other of them. Exclamation marks behave in the same way as question marks.
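The suppression pattern for sentence-final quotations can be written out as a small decision procedure. The following Python sketch is an illustration only. The function names and the `keep_both_marked` switch are introduced here, not taken from the text. Where both terminals are question or exclamation marks the text reports variation, so the choice of dropping the matrix mark when the switch is off is an arbitrary assumption. Treating a mixed pair (one question mark, one exclamation mark) like a matching pair is likewise an assumption.

```python
# Illustrative sketch only; names and the keep_both_marked switch are assumptions.
VALID = {".", "?", "!"}


def final_terminals(internal, matrix, keep_both_marked=True):
    """Return (internal_kept, matrix_kept); a suppressed terminal is ''."""
    if internal not in VALID or matrix not in VALID:
        raise ValueError("terminals must be '.', '?' or '!'")
    if internal == "." and matrix == ".":
        return ".", ""            # both full stops: the matrix one is suppressed
    if internal == ".":
        return "", matrix         # full stop gives way to a matrix ? or !
    if matrix == ".":
        return internal, ""       # full stop gives way to an internal ? or !
    if keep_both_marked:
        return internal, matrix   # the % pattern, with both marks retained
    return internal, ""           # assumption: dropping the matrix mark is only one option


def render_final_quotation(frame, quotation, internal, matrix, **options):
    """Attach a sentence-final quotation to its reporting frame."""
    kept_internal, kept_matrix = final_terminals(internal, matrix, **options)
    return f"{frame} `{quotation}{kept_internal}'{kept_matrix}"


print(render_final_quotation("She added,", "It wasn't your fault", ".", "."))
print(render_final_quotation("Did he really say", "I couldn't care less", ".", "?"))
```

The two calls reproduce the punctuation of [9i] and [9iii] respectively.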

D (d) Relative order of comma or full stop and closing quotation mark

AmE has a rule that when a comma or full stop is adjacent to a closing quotation mark the latter must follow, irrespective of the relative semantic scope. BrE tends to position the punctuation marks according to scope, i.e. the meaning, subject to the constraints covered in (a)--(c) above. Meaning, however, does not always provide an unequivocal criterion, so we find a certain amount of variation within BrE practice.

The following cases are straightforward and uncontroversial, with the versions given here representing uniform BrE practice:

<REC:39> [10]

[10] i He'd apparently just been trying to `help one of my patients'.

ii Instead of doing his homework he was watching `Neighbours'.

iii I replied, `It was all Angela's fault.'

In [i--ii] the quotation is subclausal and does not license any internal punctuation; the full stop thus belongs semantically in the matrix and is hence located externally (contrary to AmE practice). In [iii], the quotation is a sentence and thus licenses a full stop; from a semantic point of view the matrix also merits a terminal full stop, but this is suppressed in accordance with (c) above, so that this time BrE matches AmE practice.
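Because the AmE rule is purely positional, it can be stated mechanically, unlike the scope-based BrE practice. The Python sketch below is a rough illustration under stated assumptions: the function name `to_ame_order` is introduced here, and the input is assumed to use the plain-text convention of the examples, where an apostrophe (or two) closes a quotation. Since a closing single mark and a possessive apostrophe are typed identically, the sketch would also move a comma or full stop past a word-final possessive apostrophe, so it is not a general-purpose converter.

```python
import re


# Illustrative sketch only; the function name is an assumption for this example.
def to_ame_order(text):
    """Move a comma or full stop that directly follows a closing mark inside it."""
    # ('{1,2}) matches a closing single or double mark; ([.,]) the adjacent stop.
    return re.sub(r"('{1,2})([.,])", r"\2\1", text)


print(to_ame_order("Instead of doing his homework he was watching `Neighbours'."))
print(to_ame_order("I replied, `It was all Angela's fault.'"))
```

The first call moves the full stop of [10ii] inside the closing mark; the second leaves [10iii] unchanged, since there the full stop is already internal.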

Less straightforward are cases like the following:

<REC:43> [11]

[11] i %`It was all Angela's fault,' I replied.

ii %She said, `It was all Angela's fault', but no one believed her.

iii %`In that case,' she said, `we'll do it ourselves.'

iv %`Some of them', she said, `look very unsafe.'

In [i] the quotation would have a full stop if it stood alone, but cannot have one here because of point (a): it is in medial position within the matrix. It can be argued, then, that the comma does duty for the inadmissible full stop, and hence belongs internally. In [ii] the quotation is the same, but this time it can be argued that the matrix has a stronger claim on the comma, as it were, since it is needed to separate the coordinate clauses of the matrix compound sentence. In [iii] the quotation would quite likely have a comma after the adjunct in that case if it stood alone, so an internal comma is justified. In [iv], we could not have a comma after the subject some of them if the quotation stood alone, so this time there is no scope justification for an internal comma. Some styles will thus punctuate the examples in the way shown here; others, however, will prefer a simpler rule that locates all the commas externally (and likewise those in [7ii] and [8]) on the grounds that they are separating the quotation from other elements in the structure of the matrix sentence -- a rule that gives the opposite result to the AmE one.

D Marking alterations to quotations: ellipsis points and square brackets

Two indicators are used to mark alterations made to quoted matter -- ellipsis points indicate omissions and square brackets indicate substitutions or additions made by the quoting writer:

<REC:44> [12]

[12] i He goes on to say, `But Johnson ... was willing to accept a fee for the work.'

ii She concluded: `The first [model] fails the test of descriptive adequacy.'

iii According to Jones, `[N]o other language has such an elaborate tense system.'

iv It says that `the first version has been superceded [sic] by a cheaper model.'

In [i] some of the original text has been omitted after Johnson. In [ii] the writer adds model to clarify the denotation of what in the original was presumably an anaphorically reduced NP (the first or the first one). In [iii] the square brackets round N indicate a change from a small to a capital letter, a change made to satisfy the requirement that a quotation with the form of a main clause begin with a capital letter, except when it follows a subordinator, as in [iv]. Example [iv] illustrates the use of square brackets to enclose a comment by the writer: sic indicates that (contrary to appearances -- in this case the misspelling) what precedes is faithful to the original text.

D Alternatives to the use of quotation marks

E Block quotes

In expository texts, quotations of a substantial length (more than five lines, according to some style manuals) are often presented as block quotes, indented and set off from the surrounding text (and often in smaller type). In this case, no quotation marks are used:

<RDR:21> [13]

[13] As J. P. Quincy wrote in 1876:

    To the free library we may hopefully look for the gradual deliverance of the people from the wiles of the rhetorician and stump orator. As the varied intelligence which books can supply shall be more and more widely assimilated, the essential elements of every political and social question may be confidently submitted to that instructed common sense upon which the founders of our government relied.

The founders of the library movement envisioned the public library as an equal partner of the public school in achieving these goals.

E Italics and other modifications

For the less central functions of quotation marks given in [1iii--v], italics are often used instead. With titles, it is common to make a distinction between various categories, with quotation marks used for articles in periodicals or chapters in books, for example, and italics for whole monographs or journals. Bold face and small capitals provide alternative means of indicating technical terms, and works on language will typically employ a variety of indicators for different kinds of metalinguistic use, as we do in this book. Italics are also commonly used for foreign language expressions, or for emphasis:

<REC:47> [14]

[14] i I now realise that the baroque love of trompe l'oeil had a spiritual dimension.

ii Ed is a writer -- a writer! -- and Sue composes crossword puzzles for magazines.

E Absence of overt indication

Direct reported speech -- in the broad sense of the term -- is not always marked as such by punctuational means, especially when it is a matter of thought or interior monologue:

<REC:45> [15]

[15] i Where can she be?, he wondered.

ii I bet she's missed the train, he thought.

In texts consisting of dialogue, it is common practice just to give the speaker's name followed by a colon or dash.

7 Capitalisation

The use of capital letters has two main functions: to mark a left boundary and to assign special status to a unit. As a boundary marker, capitalisation normally applies to the first letter of the first word of a sentence, though in verse it occurs at the beginning of a new line. The use of capitals to mark sentence boundaries has been dealt with in #@@, and in the present section we will confine our attention to capitalisation as a marker of status.

D Kinds of special status

As status markers, capitals are prototypically used with institutionalised proper names and functionally comparable expressions. In addition, they can mark personal and relative pronouns anaphoric to the name of a deity (God in His infinite mercy), personification (We can conceptualise this as a game played against Nature), emphasis or loudness (I said, Don't Do That!; He must be a Really Important Guy in your life), or key terms in technical and legal texts (the Tenant shall be responsible for all damage). Capitals are also used in many initialisms -- abbreviations (TV, VIP) or acronyms (AIDS, TESOL): see Ch. 19, #2.2@@. And there is the use of I for the nominative form of the 1st person singular pronoun.

D Grammatical categories marked by capitals

<REC:55> [1]

[1] i NP Kim Smith, the Bishop of London, The Times

ii noun (or nominal) next Monday, a Ford Cortina, a Beethoven symphony

iii adjective French, Edwardian, Pinteresque, un-American

iv clause What's Up, Doc?, Alice Doesn't Live Here Anymore

The most common case is that of the NP -- which may of course consist of just a noun, as in Kim did it. We also have nouns or nominals that are not head of a capitalised NP, as with the examples in [ii], the first two of which have head function, while Beethoven is here a modifier. Capitalised adjectives are derived from nouns that have capitals, as French from France, and so on. Both noun and adjective categories can apply to bases in complex words as well as to whole words: mid-October, un-American. Capitalised clauses are normally restricted to the titles of artistic works, which are functionally like NPs -- cf. They saw What's Up, Doc? three times.

The precise way in which capitalised expressions are marked is subject to some variation, but the above examples illustrate a very common practice. Each word in a capitalised NP or clause has a capital letter except for short transitive prepositions (such as of, in, on), coordinators and, under certain conditions, the articles. The latter have a capital when part of the official title of a publication (such as The Times) or the official name of an institution (e.g. The European Union), but not in reference to holders of offices (the Bishop of London, the Queen) or when not part of the official title (the New Scientist). With an increasing number of compound proper nouns invented as product or business names, initial capitals appear in separate bases within the word even when there is no hyphenation: PetsMart, WordPerfect.
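The common practice just described can be sketched as a small procedure. The Python function below is an illustration only: the three word lists are assumptions chosen for the example (they are not exhaustive inventories), and the parameter official_article is a hypothetical way of recording whether an initial article belongs to the official title or name.

```python
# Illustrative word lists (assumed for this sketch, not exhaustive).
SHORT_PREPOSITIONS = {"of", "in", "on", "at", "by", "for", "to"}
COORDINATORS = {"and", "or", "but", "nor"}
ARTICLES = {"the", "a", "an"}


def capitalise_expression(words, official_article=False):
    """Capitalise each word of an NP or clause, except short transitive
    prepositions, coordinators and (normally) articles.

    official_article is a hypothetical flag: set it when the initial
    article is part of the official title or name (The Times).
    """
    result = []
    for index, word in enumerate(words):
        lower = word.lower()
        if lower in SHORT_PREPOSITIONS or lower in COORDINATORS:
            result.append(lower)
        elif lower in ARTICLES:
            if index == 0 and official_article:
                result.append(lower.capitalize())
            else:
                result.append(lower)
        else:
            # Capitalise the first letter only; leave the rest untouched.
            result.append(word[:1].upper() + word[1:])
    return " ".join(result)


print(capitalise_expression(["the", "bishop", "of", "london"]))
# the Bishop of London
print(capitalise_expression(["the", "times"], official_article=True))
# The Times
print(capitalise_expression("what's up, doc?".split()))
# What's Up, Doc?
```

The sketch deliberately leaves out the variation mentioned above: it does not handle internal capitals of the PetsMart kind, nor cases such as a title that begins with a preposition.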

D Semantic categories

Capitalised expressions are used to refer to or denote a great range of different kinds of entity: indeed there would seem in principle to be no limit to it. Many are personal names, where surnames, given names and initials are capitalised (Jane Austen, T.S. Eliot). A personal name may be preceded by a capitalised appellation, abbreviated or not (Dr Jones, Professor Chomsky, Ms Greer, General Noriega, Rabbi Lionel Blum). Capitalisation is also used with the names of places (London, Steeple Bumstead), a geographical or topographical feature (the Thames, the Black Forest, the Gulf Stream), a monument or public building (the White House, the Cenotaph), an organisation (the Home Office, Amnesty International, Shell, Dolland and Aitchison), a political or economic alliance (the European Union), a country, nation or region (Great Britain, Scotland, Tyneside), languages and peoples (English, Chinese), historical or cultural periods or events (the Renaissance, the South Sea Bubble), social or artistic movements (Chartism, Decorated style), days of the week, various special days, and months (Tuesday, Christmas Day, September), deities (God), honorifics (Her Majesty), trademarks (Coca-Cola), computer software (Word, Emacs), a kind whose name is taken from a proper name (a Chevrolet, an Oscar, a Boeing 747), and more. Capitalisation is commonly accompanied by italicisation or quotation marks in the titles of published and artistic works, as described in #6 above.

Common nouns denoting roles or institutions are often capitalised when used in combination with the definite article in reference to a particular individual or entity:

<REC:57> [2]

[2] i Shortly afterwards, the Bishop ordered a pastoral letter to be read.

ii I hear the University has increased its student intake again.

These may be contrasted with such examples as In those dioceses, the bishop has considerable autonomy or I'm told the oldest university is Fez, in Morocco. With such expressions as the board of directors or the chief executive the choice between capital and small letters tends to reflect perspective: they will be capitalised when used by members of the company, especially in official material, but commonly not when used by outsiders.

8 Word-level punctuation

8.1 Word boundaries

Word boundaries are marked by space, immediately adjacent to the word or separated from it by one or more punctuation marks. Opening quotation marks, parentheses and square brackets are located between the space and the left boundary of the word, other punctuation marks between the right boundary of the word and the following space. The dash is exceptional among the higher-level punctuation marks in that it is immediately adjacent to both the word on its left and the one on its right or is separated by space from both (as in the style used in this book). These points are illustrated in sentence [1i], whose ten (orthographic) words are listed separately, in abstraction from the higher-level punctuation, in [1ii]:

<REC:48> [1]

[1] i The vice-consul -- Ed's `companion' -- hasn't (I'm told) seen Oklahoma! yet.

ii the vice-consul Ed's companion hasn't I'm told seen Oklahoma yet

The quotation marks in [i] enclose a single word, but this is incidental and they are not part of the word itself. Similarly, the exclamation mark is part of the punctuation of a proper name which happens to contain just one word, but need not (for the distinction between proper names and proper nouns, see Ch. 5, #@@). The first word is listed in [ii] as the because the capital letter in [i] is a matter of sentence punctuation; the initial capitals in Ed's, I'm and Oklahoma, however, are inherent features of these words.
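The abstraction from higher-level punctuation illustrated in [1] can be approximated mechanically. The following Python sketch is offered under stated assumptions: the character sets OPENING and CLOSING are illustrative choices for this example, the dash is assumed to be written as a spaced "--", and a closing quotation mark is not distinguished from a word-final apostrophe (so a bare genitive such as dogs' would be wrongly trimmed).

```python
# Marks that stand between a space and the left boundary of a word
# (assumed set for this sketch).
OPENING = "`(["
# Marks that stand between the right boundary of a word and a space
# (assumed set for this sketch).
CLOSING = "')].,;:?!"


def orthographic_words(sentence):
    """Return the orthographic words of a sentence, with the
    higher-level punctuation at their edges removed."""
    words = []
    for token in sentence.split():
        if token == "--":  # a spaced dash is not part of any word
            continue
        word = token.lstrip(OPENING).rstrip(CLOSING)
        if word:
            words.append(word)
    return words


example = "The vice-consul -- Ed's `companion' -- hasn't (I'm told) seen Oklahoma! yet."
print(orthographic_words(example))
# ['The', 'vice-consul', "Ed's", 'companion', "hasn't", "I'm", 'told',
#  'seen', 'Oklahoma', 'yet']
```

The output has the ten words of [1ii], except that the first is still The: deciding that this capital is a matter of sentence punctuation, while those of Ed's, I'm and Oklahoma are inherent, requires lexical knowledge that the sketch does not attempt to model.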

8.2 Hyphens

8.2.1 Some initial distinctions

There are two hyphen indicators, an ordinary hyphen and a long hyphen, which is realised by an en-rule and is of very limited distribution. As noted in #1.2, when the en-rule character is not available (as in handwriting or material written on a conventional typewriter), the functions of the long hyphen are taken over by the ordinary one.

At the first level we can distinguish three uses of the (ordinary) hyphen:

<REC:50> [2]

[2] i To join grammatical components in complex words: the hard hyphen

ii To mark a break within a word at the end of a line: the soft hyphen

iii To represent in direct speech either stuttering (`When c-c-can I come?') or exaggeratedly slow and careful pronunciation (`Speak c-l-e-a-r-l-y!')

The terms `hard' and `soft' are taken from word-processing: a hard hyphen is introduced into a document by a keystroke, while a soft one is inserted by the word-processing program. We will devote most of our attention to the hard hyphen. Nothing further need be said about use [iii], but a few comments should be made about [ii].

D The soft hyphen

The purpose of this hyphen is to allow the amount of space between words on different lines to be relatively uniform. It occurs especially, but by no means exclusively, in typeset and right-justified text, and in these cases the division is made by the printer or the word-processing program. Normally, the division is made in a manner designed to facilitate reading, based on a mixture of morphological, phonological and purely visual criteria. The precise rules used will depend on the publishing house style or the word processing system, but there is also significant regional variation, with AmE tending to favour breaks at syllable boundaries (e.g. democ-racy) and BrE those at morphological or etymological boundaries (demo-cracy). Regional differences are likely to diminish with the increasing internationalisation of publishing, and the increasing tendency to rely on automatic systems for word separation provided by word processing systems, which are for the most part developed in the United States and not redesigned to take account of other countries' traditional hyphenation practices.

Divisions are not normally permitted within monosyllabic words, or within components that have (or could have) a hard hyphen at one of their boundaries (thus school-master, but not *schoolmas-ter). They also tend to be disallowed if they would yield a unit spelt the same way as some unrelated word (*of-ten, *the-rapist, or *putt-ing, as a form of the verb put).

8.2.2 Inherent and long hyphens

Among the hard hyphens we can distinguish (though not always sharply) between those that are lexical and those that are syntactic. The lexical hyphens are found in morphologically complex bases formed by processes of lexical word-formation, as described in Ch. 19. Syntactic hyphens join forms together when they occur in a specific syntactic construction, namely as attributive modifier in a nominal.

D Lexical hyphens

The hyphen may join the bases of a compound (bee-sting) or the affix and base of a derivative (ex-wife).

E Compounds

We have noted that the component bases of what from a morphological point of view is a compound may be written in three ways: juxtaposed (blackboard), hyphenated (stage-manager) or separated (Nissen hut). It is an area where we find a great deal of variation, with respect either to particular items (e.g. startingpoint, starting-point, or starting point) or to different compounds of the same morphological type (e.g. dressmaking vs letter-writing). There are two general tendencies to be noted. First, compounds which are long established are more likely to be written in juxtaposed format than more recent ones (compare dishwasher and chip-maker). Second, AmE tends to use hyphens somewhat less than BrE.

To a large extent, the choice between the three formats has to be specified individually in the dictionary. We illustrate in [3], however, a range of morphological types where hyphens are found in most or a high proportion of cases (the categories and concepts invoked here are explained in Ch. 19):

<REC:51> [3]

[3] i compound adjective bone-dry, oil-rich, snow-white, red-hot

ii contains transitive prep free-for-all, sister-in-law, serjeant-at-arms

iii intransitive prep as 2nd base break-in, build-up, drop-out, phone-in, stand-off

iv coordinative compound Alsace-Lorraine, freeze-dry, murder-suicide

v nominal compound + @ed one-eyed, red-faced, three-bedroomed

vi numerals and fractions twenty-one, ninety-nine, five-eighths

vii dephrasal compounds cold-shoulder (V), has-been (N), old-maidish

viii verb with noun as 1st base baby-sit, gift-wrap, hand-wash, tape-record

ix 1st base is letter-name H-bomb, t-shirt, U-turn, V-sign

x rhyming-base compounds clap-trap, hoity-toity, teeny-weeny, walkie-talkie

Type [iii], with the preposition at the end, may be contrasted with the type where it occupies first position and is generally juxtaposed: downside, outbreak, uptake. Words of type [v] are themselves derivatives (formed at the top level by suffixation of @ed), but they contain a compound base whose components are joined by a hyphen, and the same applies with old-maidish in [vii]: the hyphen in such cases does not mark the top level morphological division within the word. Hyphens are used in compound numerals expressing numbers between ``21'' and ``99''. With fractions there is variation between hyphenated forms (two-thirds) and separate words (two thirds); hyphens are not used if either the denominator or the numerator contains a hyphen (thirteen twenty-eighths), but otherwise hyphenation is more likely than separation when the denominator is greater than ``4''.
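The treatment of numerals and fractions lends itself to a brief illustration. The Python sketch below is a minimal one under stated assumptions: the function names numeral and fraction are hypothetical, numeral covers only the range 0 to 99, and fraction takes its two parts as ready-made words. Where both parts are simple, usage varies between hyphenation and separation; the sketch arbitrarily chooses the hyphenated form.

```python
UNITS = [
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen",
]
TENS = {
    2: "twenty", 3: "thirty", 4: "forty", 5: "fifty",
    6: "sixty", 7: "seventy", 8: "eighty", 9: "ninety",
}


def numeral(n):
    """Spell out a number from 0 to 99, hyphenating compound numerals."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only covers 0 to 99")
    if n < 20:
        return UNITS[n]
    tens, unit = divmod(n, 10)
    if unit == 0:
        return TENS[tens]  # a simple numeral: no hyphen
    return f"{TENS[tens]}-{UNITS[unit]}"


def fraction(numerator, denominator):
    """Join the two parts of a fraction, given as words. No linking
    hyphen is used if either part already contains one."""
    if "-" in numerator or "-" in denominator:
        return f"{numerator} {denominator}"
    return f"{numerator}-{denominator}"  # one of the two attested variants


print(numeral(21), numeral(99), numeral(40))
# twenty-one ninety-nine forty
print(fraction("five", "eighths"))
# five-eighths
print(fraction(numeral(13), "twenty-eighths"))
# thirteen twenty-eighths
```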

There are also particular bases which always or usually take a hyphen: great, as used in kinship terms, always does (great-uncle), while self and the combining form pseudo@ usually do (self-knowledge, pseudo-science).

E Derivatives

Suffixes are almost invariably juxtaposed, whereas there are a number of prefixes which in BrE are usually or commonly hyphenated: non@, pre@, post@, pro@, anti@, ex@, co@, mid@ (but compare such semantically specialised forms as nonentity, midnight, etc.). It is also the usual practice, in both BrE and AmE, to insert a hyphen where there might otherwise be a danger of confusion caused by successive vowel letters or repeated sequences (re-elect, de-emphasise, de-ice, re-release), or to distinguish a word where the prefix is used in its productive sense from one where it is no longer analysable as a separate component (e.g. re-form, ``form again'', from reform; or re-cover, ``cover again'', from recover). Prefixes are generally hyphenated before a base beginning with a capital letter: un-American.

E Conflicts of scope

In general a space marks a division at a higher level of constituent structure than a hyphen. The immediate constituents of oil-rich kingdom, for example, are oil-rich and kingdom, not oil and rich kingdom. There are cases, however, that depart from this pattern:

<REC:52> [4]

[4] i inter- and intrastate pre- or post-industrial Australian-born and -educated

ii ex-army officer non-mass market pro-United States mass market-style

The coordination of prefixes, as in the first two examples of [i], is not uncommon; that of bases, as in the third, much less so -- and one occasionally finds this latter type without the second hyphen. In non-coordinative examples like those in [ii] some writers resolve the conflict between punctuation and scope or constituent structure by inserting extra hyphens (e.g. ex-army-officer); note, however, that in unselfconscious (from the base self-conscious) the problem is solved by using juxtaposition instead of hyphenation.

D Syntactic hyphens

Hyphens are also used to join into a single orthographic word sequences of two or more grammatical words functioning as attributive modifier in the structure of a nominal:

<REC:53> [5]

[5] i a well-argued reply a Bradford-based company a hard-drinking man

ii a four-point plan a fast-food outlet the small-business sector

iii out-of-town shopping the Hobart-to-Sydney classic a creamier-than-average taste a never-to-be-repeated offer the what-was-it-all-for? factor

The forms in [i] have a past participle or a gerund-participle as their final element, those in [ii] have a noun, and those in [iii] have forms which do not freely occur as attributive modifiers (see Ch. 5, #@@). Hyphenation is not used with AdjPs (a very old cat) nor, generally, with past participles and gerund-participles modified by an adverb in @ly (a beautifully executed performance, rapidly diminishing returns). With noun-headed modifiers, hyphens are used very commonly but by no means invariably -- compare an affirmative action policy, city council elections; they are not used with proper names: United States agents. The hyphen explicitly indicates that the linked items form a constituent and hence may remove potential constituent structure ambiguities. Small-business sector, for example, means ``sector comprising small business'', while small business sector can mean either that or ``business sector of small size''.

The syntactic hyphen is used with expressions in modifier function that either do not occur elsewhere in the same grammatical form (The plan contains four points/*point, The company is based in Bradford / *Bradford-based) or occur elsewhere without hyphens (The reply was well argued, We shop out of town).

D The long hyphen

This is used instead of an ordinary syntactic hyphen with adjuncts consisting of nouns or proper names where the semantic relation is ``between X and Y'' or ``from X to Y'':

<REC:54> [6]

[6] a parent--teacher meeting a French--English dictionary the 1914--18 war

It can be used with more than two components, as in the London--Paris--Bonn axis. It is also found with adjectives derived from proper names: French--German relations. There is potentially a semantic contrast between the two hyphens, as in the Llewelyn--Jones Company (a partnership) vs the Llewelyn-Jones Company (with a single compound proper name). This hyphen is also used in giving spans of page numbers, dates or the like: pages 23--64, Franz Schubert (1797--1828).

8.3 The apostrophe

The apostrophe has three distinguishable uses:

<REC:49> [7]

[7] i genitive: Kim's dog's dogs' Moses' *it's

ii reduction: can't there's fo'c's'le ma'am o'clock

iii separation: A's PhD's if's 1960's

The apostrophe occurs as a case marker on the last word of genitive NPs, except those with one of the core personal pronouns as head (thus its former shape, not *it's former shape; This is yours, not *This is your's). There are two types of genitives: 's genitives (Kim's, dog's) and bare genitives, marked in writing by the apostrophe alone and homophonous in speech with the non-genitive counterpart (dogs', Moses'); for the choice between the two types, see Ch. 18, #@@.

The most common uses of the abbreviating apostrophe mark the negative inflectional forms of auxiliary verbs, as in can't, and the cliticisation of auxiliary verbs, as in There's no time: see Ch. 18, ##5.5, 6.3@@. Fo'c's'le is an alternative spelling of forecastle, one which matches the pronunciation. Ma'am is related to madam, but there are differences of use/meaning between the two forms. The apostrophe in o'clock reflects the etymology (of the clock), but there is no alternation with the full form in the current language. The apostrophe does not normally appear at the left or right boundary of a word in established spellings: such forms as 'phone or 'flu are now clearly archaic. The form 'n' is an abbreviation of and used in a small number of fixed expressions, mainly rock 'n' roll and fish 'n' chips. Omission of initial h (the 'ammer) or the final g of the gerund-participle suffix (huntin') is found in the representation of direct speech to indicate socially distinctive pronunciations.

A minor use of the apostrophe is to separate the plural suffix from the base, as in [7iii]; this occurs when the base consists of a letter (She got three A's in philosophy), certain kinds of abbreviation, a word used metalinguistically, or a numeral (see Ch. 18, #4.1.1@@).

8.4 The abbreviation full stop and minor reduction markers

D The full stop as a marker of abbreviation

The full stop is commonly used to mark an abbreviation -- in a broad sense of that term, covering certain kinds of contraction and acronyms. This use is subject to a great deal of variation. The omission of the abbreviation full stop is more common in BrE than in AmE, and more common in recent publications than in those of, say, twenty or thirty years ago. While there are certain kinds of reduced form where a full stop is categorically excluded, it is doubtful if there are now any cases where a full stop is required in all varieties and house styles.

The alternation is illustrated in [8] for various categories of abbreviation:

<REC:63> [8]

[8] i Gen/Gen. Smith Mr/Mr. Smith fig/fig. 3 5 kg/kgs/kg./kgs.

ii T S Eliot / T.S. Eliot JFK/J.F.K.

iii eg/e.g. cf/cf. RSVP/R.S.V.P.

iv FBI/F.B.I. pc/p.c.

v NATO/?N.A.T.O. radar/*r.a.d.a.r.

vi demo/*demo.

The abbreviations in [i] occur in a limited range of contexts: the first two with following proper names, the last two with numerals: we don't (normally) write *Smith was a fine Gen(.) or *We need one more kg(.) of sugar. There has been a tradition in BrE of distinguishing `abbreviations' from contractions: in the former the last part of the full word is missing, while in the latter at least the last letter of the full word is retained. One rule is then to have a full stop for the `abbreviations' (Gen.) but not for the contractions (Mr); this rule, however, is much less widely followed than it used to be, and BrE tends to favour Gen as well as Mr, etc. With measure terms there is variation as to whether the plural is marked with @s; in either case some styles specifically exclude a full stop with these terms. In [ii] we have initial letters of personal names. Where the surname follows, the version with stops (T.S. Eliot) is still much the more usual, whereas with full initials referring to a famous figure the stops are commonly omitted (JFK). Many abbreviations are based on phrases or words from foreign languages, especially Latin, as in [iii]; the version with stops is the more usual, though the other version is now by no means rare. The words in [iv--v] are formed by the process we have called `initialism': see Ch. 19, #2.2@@; those in [iv] are pronounced as sequences of letters, while those in [v] are acronyms, with the letters having their usual phonological values. Full stops are becoming somewhat marginal in acronyms consisting of capital letters, and are excluded in those consisting of small letters. Demo in [vi] illustrates the category of back-clippings (Ch. 19, #2.3.1@@), and again the full stop is inadmissible.

D Terminal full stop omitted after abbreviation full stop

The abbreviation full stop is part of an orthographic word and as such can be followed by higher-level punctuation marks. A terminal full stop, however, is suppressed after an abbreviation full stop to avoid a sequence of two full stops. Compare:

<REB:78> [9]

[9] i a. Why did she go to Washington, D.C.? b. She lives in Washington, D.C.
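The suppression rule contrasted in [9] amounts to a simple condition, which the following Python sketch makes explicit. It is a minimal illustration only: the function name close_sentence is hypothetical, and the sketch assumes that the text passed in has not yet received its terminal mark, so that any final full stop is an abbreviation full stop.

```python
def close_sentence(text, terminal="."):
    """Add a terminal mark to a sentence. A terminal full stop is
    suppressed after an abbreviation full stop; other terminal marks
    are added as usual."""
    if terminal == "." and text.endswith("."):
        return text  # avoid a sequence of two full stops
    return text + terminal


print(close_sentence("Why did she go to Washington, D.C.", "?"))
# Why did she go to Washington, D.C.?
print(close_sentence("She lives in Washington, D.C."))
# She lives in Washington, D.C.
print(close_sentence("She lives in London"))
# She lives in London.
```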

D Asterisk and dash

The asterisk or dash can be used to reduce taboo words (though such reductions are much less common than they used to be); the dash is also found in other types of reduction, for example of names:

<REC:64> [10]

[10] F*** off! B-- off! Count von O--

8.5 The slash

We include the slash among the word-level indicators since it usually occurs without flanking spaces:

<REC:65> [11]

[11] i director/secretary flat/apartment and/or he/she

ii the June/July period staff/student relations

In [i] it indicates an ``or'' relationship (an inclusive ``or'', in the sense of Ch. 15, #@@), while in some styles it occurs as an alternant of the long hyphen, as in [ii]. A special case is s/he, ``she or he'', equivalent to the above he/she; the slash in effect indicates that the initial s is optional, and hence is here doing duty for a pair of parentheses, which are not permitted in initial position -- *(s)he. The slash is also used in a few abbreviations, such as a/c and c/o.