alt="w w Superscript upper R"/> notation obscures the way that the CFG in (7) actually works. Importantly, this CFG generates
abaaba not by combining the prefix aba with the suffix aba, but rather by combining the infix baab with the surroundings a__a, using the first rule shown in (7). An infix‐based analysis of $ww$, however, is no help.
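The infix‐based analysis can be sketched in a few lines of Python. This is only an illustration: it assumes that the CFG in (7) consists of rules of the form S → a S a, S → b S b and S → ε (an assumption, since (7) is not repeated here), and the function name `mirror_derivation` is hypothetical. The function undoes one wrapping step at a time and records the infix that remains.

```python
def mirror_derivation(s):
    """Return the successively smaller infixes of s if s is derivable by
    the ASSUMED rules S -> a S a | b S b | (empty string), else None."""
    steps = [s]
    while s:
        # The outermost symbols must match and belong to the alphabet {a, b}.
        if len(s) < 2 or s[0] != s[-1] or s[0] not in "ab":
            return None
        s = s[1:-1]          # strip the surroundings, keep the infix
        steps.append(s)
    return steps

print(mirror_derivation("abaaba"))  # ['abaaba', 'baab', 'aa', '']
print(mirror_derivation("abab"))    # None
```

The string abaaba is accepted because each step removes a matching pair of surrounding symbols; abab, which consists of two copies of ab, is rejected because no such wrapping step applies to it.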
5.5 Beyond Context‐Free Grammars
From the very outset there were doubts about whether CFGs could form the basis of a theory of natural language syntax. Chomsky (1956, Section 4) argued that even if the generative capacity of CFGs (unlike FSGs) turned out to be sufficient for English (a question he left open), the resulting grammars would be unreasonably complex.16 Relatedly, one motivation for considering Type 1 rules in the first place was the recognition that, in practice, linguists found uses for contextual restrictions on rewrite rules, for example to state selectional restrictions (Chomsky, 1959, p. 148; Chomsky and Miller, 1963, p. 295; Chomsky, 1963, p. 363).
Furthermore, it has more recently been discovered that CFGs might be insufficient, even on the straightforward basis of generative capacity, to describe some natural languages. The best‐known case is a construction in Swiss‐German (Shieber, 1985) that exhibits crossing dependencies of the sort exhibited by $ww$ in (6); these contrast with the nested dependencies exhibited by $ww^R$, which are neatly handled by CFGs. See e.g. Pullum (1986), Partee et al. (1990, pp. 503–505), Frank (2004), Kallmeyer (2010, pp. 17–20) and Jäger and Rogers (2012) for useful discussion; ideas closely related to the crucial point about Swiss‐German can be traced back to Huybregts (1976, 1984) and Bresnan et al. (1982).
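The difference between crossing and nested dependencies can be made concrete with a small Python sketch. It assumes that the crossing pattern is modeled by a copy string (a string followed by an identical copy) and the nested pattern by a mirror string (a string followed by its reversal); the function names and the index‐pair representation are illustrative choices, not part of the original discussion.

```python
def dependency_pairs(n, pattern):
    """Pair each position in the first half (length n) with the position
    in the second half that it depends on."""
    if pattern == "copy":      # i-th symbol matches i-th symbol of the copy
        return [(i, n + i) for i in range(n)]
    if pattern == "mirror":    # i-th symbol matches i-th symbol from the end
        return [(i, 2 * n - 1 - i) for i in range(n)]
    raise ValueError(f"unknown pattern: {pattern}")

def has_crossing(pairs):
    """True if some dependency starts inside another and ends outside it."""
    return any(a < c < b < d for a, b in pairs for c, d in pairs)

copy_pairs = dependency_pairs(3, "copy")      # [(0, 3), (1, 4), (2, 5)]
mirror_pairs = dependency_pairs(3, "mirror")  # [(0, 5), (1, 4), (2, 3)]
print(has_crossing(copy_pairs))    # True
print(has_crossing(mirror_pairs))  # False
```

In the mirror case every dependency is properly contained in the one outside it, which is what allows a CFG to build the string from the inside out.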
But despite these reasons for looking beyond CFGs, Type 1 or context‐sensitive grammars (CSGs) have not proven to be a particularly useful tool for linguistics; they have turned out to be “too close” to unrestricted rewriting grammars. While CSGs can generate the crossing‐dependency patterns of $ww$ and Swiss‐German, their generative capacity extends far beyond this. For example, there is even a CSG that generates .17 The sense that this stringset seems not at all “language‐like” plausibly stems from the property of CSGs that caused Chomsky the most concern initially: contextually restricted rewrites produced structural descriptions that could not be interpreted along the lines of immediate constituent analysis. Immediately after showing that CSGs could generate stringsets that no CFG could generate, Chomsky (1959, p. 148) surmised that “the extra power of grammars that do not meet Restriction 2 appears …to be a defect of such grammars, with regard to the intended interpretation.” The underlying issue here is the absence of any meaningful kind of intersubstitutability at the core of CSGs: what distinguishes a Type 1 grammar from FSGs and CFGs is exactly the fact that the substrings derivable from a symbol $A$ in one context might not be derivable from $A$ in another context.
Chomsky's discussion of the undesirable properties of CSGs focuses on their ability to, in effect, reorder constituents. For example, a permuting rule “CD → DC,” which does not itself satisfy Restriction 1 (recall that the Type 0 grammar in Figure 5.1 contains rules like this), can be mimicked by a sequence of Type 1 rules “CD → XD → XC → DC” (Chomsky, 1959, p. 148; Chomsky, 1963, p. 365). Chomsky considers using this kind of reordering to derive a question form such as will John come in a way that relates it to its corresponding declarative John will come. The CSG in (8) shows how this would work. The first group of rules shown in (8) generates the declaratives John will come and John comes as shown in (9); these are all context‐free rules, and notice that they correctly capture the intersubstitutability of will come with comes, via the nonterminal Pred. The second group of rules in (8) serves to turn “NP Aux” into “Aux NP”; in particular, the derivation in (10) uses them to derive “Aux NP come,” and then eventually will John come, from the (canonically ordered, intuitively) “NP Aux come.”
(8) S → NP Pred
    NP → John
    Pred → Aux V
    Pred → comes
    Aux → will
    V → come

    NP Aux → X Aux
    X Aux → X NP
    X NP → Aux NP
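(9) [Derivations of John will come and John comes, using the first group of rules in (8)]
(10) [Derivation of will John come, using both groups of rules in (8)]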
The fact that each step of the derivation in (10) rewrites only a single nonterminal symbol ensures that we can construct a tree structure that indicates which parts of the eventual string were derived from which nonterminal symbols.18 (This would not be possible for a derivation that implemented the reordering directly with the rule “NP Aux → Aux NP”; recall Figure 5.1.) But the resulting tree structure says “that will in this sentence is a noun phrase …and that John is a modal auxiliary, contrary to our intention” (Chomsky, 1963, p. 365). This result is undesirable because we do not want will to be in general intersubstitutable with John, or the other strings that we would expect to be derivable from the nonterminal symbol NP if this tiny grammar were expanded. So the labeled constituency relationships that can be read off the trees associated with Type 1 derivations are not interpretable as statements about intersubstitutability, as they are in more restricted grammars. In other words, Restriction 1's requirement that each derivational step rewrites only a single nonterminal symbol turned out to be insufficient to capture the important linguistic intuitions regarding categorization and intersubstitutability that underlie immediate constituent analysis.
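The way such a tree mislabels will and John can be traced mechanically. The Python sketch below is an illustration under stated assumptions: it reads the rules in (8) as single‐symbol rewrites in context, applies them in one plausible order (the exact order of steps in (10) is not reproduced here), and records for every symbol the chain of symbols it descends from. The names `rewrite`, `STEPS` and `ancestry` are hypothetical.

```python
# A sentential form is a list of (label, path) pairs; path records the chain
# of symbols that the label descends from in the derivation tree.

def rewrite(form, left, target, right, gamma):
    """Rewrite the leftmost `target` flanked by `left` and `right` as `gamma`."""
    for i, (label, path) in enumerate(form):
        lctx = tuple(l for l, _ in form[max(0, i - len(left)):i])
        rctx = tuple(l for l, _ in form[i + 1:i + 1 + len(right)])
        if label == target and lctx == left and rctx == right:
            new = [(g, path + (g,)) for g in gamma]
            return form[:i] + new + form[i + 1:]
    raise ValueError(f"rule for {target} does not apply")

# Each step is (left context, rewritten symbol, right context, replacement).
# The order of steps is an ASSUMPTION made for this sketch.
STEPS = [
    ((), "S", (), ("NP", "Pred")),
    ((), "Pred", (), ("Aux", "V")),
    ((), "NP", ("Aux",), ("X",)),     # NP Aux -> X Aux
    (("X",), "Aux", (), ("NP",)),     # X Aux  -> X NP
    ((), "X", ("NP",), ("Aux",)),     # X NP   -> Aux NP
    ((), "Aux", (), ("will",)),
    ((), "NP", (), ("John",)),
    ((), "V", (), ("come",)),
]

form = [("S", ("S",))]
for step in STEPS:
    form = rewrite(form, *step)

sentence = " ".join(label for label, _ in form)
ancestry = {label: path for label, path in form}
print(sentence)            # will John come
print(ancestry["will"])    # ('S', 'NP', 'X', 'Aux', 'will')
print(ancestry["John"])    # ('S', 'Pred', 'Aux', 'NP', 'John')
```

The recorded paths show will descending from the original NP and John descending from the original Aux, which is the unintended labeling that Chomsky describes.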
In the light of more recent developments, the difficulties raised by the issue of reordering can be seen as stemming from the tight connection between intersubstitutability (in the sense that can be captured in rewriting systems of the sort Chomsky was exploring) and linear contiguity. Only linearly contiguous strings of symbols have the chance to be placed in an equivalence class. While familiar, there is nothing necessary about this connection: a sub‐part of a string might belong to a class of intersubstitutable subexpressions without being contiguous. In this case, the relevant sub‐parts will not themselves be strings, but will be tuples of strings. To illustrate, it suffices to consider tuples of size two, i.e. pairs of strings that are co‐dependent, and together constitute an expression belonging to a meaningful grammatical category, but need not be pronounced together. For example:
(11) The pair (will, come) and the pair (must, leave) are intersubstitutable, in the sense that we can replace the former with the latter in will the students come to produce must the students leave. (As well as in John will come to produce John must leave.)
The pair (John, to be tall) and the pair (the girl, to win) are intersubstitutable, in the sense that we can replace