By ESL writers vs. by native writers: a corpus analysis of native and non-native speakers' written English

Imran Ho
Linguistics Section
School of Languages
University of Otago
Dunedin, New Zealand

Deep South v.2 n.3 (Spring 1996)

Copyright (c) 1996 by Imran Ho

1. Introduction.

This paper presents the results of a preliminary corpus study of the preposition by in the Fiction Section (Text Category K - L) of the Wellington Corpus of Written New Zealand English (NZE) and a corpus of Malaysian Short Stories (ME). The aim of the study is to contrast prepositional usage between English as a native language variety (ENL) and English as a second language variety (ESL)[1].

The distinction and classification of speech communities as ENL and ESL according to the status and function of English within the speech communities is widely accepted [2] (Platt, Weber & Ho, 1984; Kachru, 1986, 1992; Smith, 1992; Cheshire, 1991; Widdowson, 1994). Beyond describing the different status, roles and functions of English in the different speech communities, linguists have also attempted to describe and compare the features of these varieties at different levels of linguistic analysis. However, Schmied (1990:259) notes that "the analysis of non-native Englishes (ie. ESL/EFL varieties) have concentrated on the more salient features of pronunciation, loan words and idiomatic expressions while lexical-grammatical analysis is still underdeveloped". As an attempt to correct this imbalance, this paper focuses on the prepositional item by, which occupies an intermediate position between the lexical and the grammatical [2] and thus provides an interesting testing ground for investigating lexico-grammatical variation in intervarietal studies.

2. Data selection and classification.

The Oxford Concordance Programme was used to extract the relevant data from the two corpora. Subsequently, each token of by was examined in the context of the sentence it occurred in and assigned to a semantic category [2]. The semantic categorisation of preposition meanings is basic to the analysis to be carried out and thus deserves further discussion here. While it is generally accepted that prepositional items are highly polysemous (Jackendoff, 1983; Brugman, 1988; Taylor, 1992, 1995), 'the claim that a word is polysemous raises questions concerning the number of different senses, and the criteria by which these are established and demarcated' (Taylor, 1995:18). A survey of some major reference dictionaries reveals considerable discrepancy in the number of 'senses' attributed to the preposition by (see Table 1).

Table 1.

The number of different senses of the preposition by.

Dictionary                                    No. of senses
Oxford Advanced Learners Dictionary               18
Collins Cobuild English Dictionary                19
Random House - Websters (CD)                      15

The situation is not too different with reference grammar books. The three reference grammars surveyed (Quirk et al., 1985; Celce-Murcia, 1983; Downing & Locke, 1993) also vary in the number of prepositional meanings ascribed to by and other prepositions. However, they all share the view that prepositional meanings are difficult to capture and that their semantic boundaries are difficult to define and classify (Downing & Locke, 1993:595; Quirk et al., 1985:695). Quirk et al. suggest that it might be better in some cases "to think of a range or spectrum of meaning, first as a single category, then as broken up into separate overlapping sections" (ibid.). For instance, they establish a Means/Agentive spectrum of prepositional meaning and further distinguish five different meanings in that spectrum, namely Manner, Means, Instrument, Agentive, and Stimulus.

Lexical-semantics analyses also have differing views on the functions and meanings of prepositions. For instance Rauh (1995:99) notes that "discrepancies and inconsistencies have almost become an integral characteristic of the description of English prepositions". Even within a single theoretical framework, for instance cognitive grammar, it is possible to have several competing analyses of a single preposition (cf. Vandeloise, 1994).

However, the discrepancies and divergences do not necessarily mean that a study of the distribution of the semantics of a preposition is doomed from the start. A comprehensive yet restricted scheme of categorisation can still be derived. In recent times, the lexical-semantic analysis of prepositions seems to have taken centre stage within cognitive grammar (Schlesinger, 1978; Jackendoff, 1983; Brugman, 1988; Dirven, 1993; Taylor, 1995). The insights and analyses afforded by cognitive grammarians might yet provide a comprehensive approach to word meanings.

For the purpose of this paper, seven semantic categories corresponding to conceptual domains are introduced. In addition, each of these categories may have a number of finer senses. While the categories are supposedly conceptual in that they are meant to correspond to the way we perceive the world (Dirven, 1993, 1995), the finer senses within each category could be motivated by selectional restrictions and syntactic considerations. The advantage of a cognitive approach is that the extensional relationship between the different categories can be established. However, in the present paper, these relationships will not be explored. As pointed out earlier, some of these senses overlap and it is sometimes difficult to determine which category a particular usage falls under. Nonetheless, they provide a sufficient set, forming general clusters of meaning by which to analyse the data. A brief description and core examples of each concept are presented below:

2.1 Spatial.

Spatial uses of by characterise spatial dispositions (cf. Taylor, 1993). Three distinct senses of the spatial use of by will be considered: (a.) proximity, e.g. The house is by the lake. (b.) path [2], e.g. They drove by the post office. (c.) (subparts) locative, e.g. The murderer was hung by the neck; He grabbed the bicycle by its handlebars.

2.2 Temporal.

The temporal uses of prepositions are often regarded as extensions of their spatial uses. Corresponding to the static-proximity function and the dynamic-path function of spatial by, Dirven (1993) sees two distinct senses of the temporal by: (a.) connection in time (i.e. an event is located in a period of time - by day meaning during the day), e.g. He slept by day and worked by night. (b.) connection with a time-point / relative to a point in time (i.e. specifying an end-point), e.g. I will return the book by Monday.

2.3 Mode.

Mode uses of by establish the relationship between two states/events/processes that are linked or connected by a medium which serves as the means or the manner by which one is accomplished or achieved. While spatial and temporal uses answer the questions where and when respectively, mode uses answer the question how. (a.) means, e.g. He usually goes to school by bus. (b.) manner, e.g. He gained entry into the compound by bribing the guards.

2.4 Cause / Agency.

In contrast to Mode, Cause/Agency addresses the question of who and what rather than how[2]. The concept of Cause, as defined by Dirven, embodies "a situation (S1) which triggers, causes or initiates another situation (S2) where the term 'situation' includes states, events, processes and activities. The relation between S1 and S2 is such that S2 is the result of S1; [or] to put it another way, S2 would not have come about had it not been for S1".[2]

Quirk et al. also make a distinction between animate and inanimate agents (e.g. a storm, a strong gale). (a.) animate agency, e.g. The village was destroyed by the enemy. (b.) inanimate agency, e.g. The crops were ruined by frost. This distinction will be maintained in the present analysis, as it is based on a selectional criterion rather than a conceptual one. The concept of Cause/Agency is closely related to the grammatical role of by as marker of the passive subject. However, conceptually, agency is not necessarily tied to a passive construction, as is shown by the examples below: (1.) He likes the play by Chomsky.[2] (2.) Instantly, by some perverse chemistry of his body or nervous system, he feels tired and drowsy. (Dirven, 1993)

2.5 Circumstance.

The concept of Circumstance relates two situations/events/states, where one situation, state, event or process is a condition or contingency of the other, e.g. He was singled out by sheer bad luck.

2.6 Nature (Area).

The conceptual domain of Nature (or Area in Dirven, 1993) relates two states/entities where one entity governs the other or gives it its properties, e.g. He is a kind person by nature; he is a lawyer by profession.

2.7 Estimation/Numeration.

The concept of Estimation and Numeration extends over five senses, namely gradual quantification, quantity, increment, measurement and mathematical. The various senses expound the idea of measurement and estimation. The senses are illustrated by these examples respectively: a. He put the jig-saw together piece by piece. b. The coats are exported by the dozen. c. Since arriving in New Zealand, Danny has grown by two inches. d. The plot for sale is 50 feet by 100 feet. e. Multiply 13000 by 570 and you will get the answer.

The concepts above are not meant to cover the uses of by exhaustively. By also functions as an adverb, as a phrasal verb particle, as part of a complex preposition (by virtue of), and in idiomatic phrases (by and large). These are excluded from the present study.
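The extraction and exclusion steps described in this section can be illustrated with a minimal sketch. It is not the Oxford Concordance Programme used in the study; it is a hypothetical Python stand-in that lists each token of by with its left and right context and skips a small, assumed list of fixed expressions. Only "by and large" and "by virtue of" are named in the text, so the list, the function name and the context width are illustrative assumptions. Adverbial and particle uses cannot be detected by string matching and would still need to be removed by hand when each token is read in context.

```python
import re

# Assumed exclusion list: the paper names only these two fixed expressions.
EXCLUDED_PHRASES = ("by and large", "by virtue of")

def concordance(text, keyword="by", width=30):
    """Return (left context, keyword, right context) for each keyword token."""
    lowered = text.lower()
    pattern = rf"\b{re.escape(keyword)}\b"
    hits = []
    for match in re.finditer(pattern, text, flags=re.IGNORECASE):
        start, end = match.span()
        # Skip tokens that open one of the excluded fixed expressions.
        if any(lowered.startswith(phrase, start) for phrase in EXCLUDED_PHRASES):
            continue
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        hits.append((left, match.group(0), right))
    return hits

sample = (
    "The house is by the lake. By and large, he goes to school by bus. "
    "He succeeded by virtue of hard work."
)
for left, word, right in concordance(sample):
    # One keyword-in-context line per retained token.
    print(f"{left:>30} [{word}] {right}")
```

Each retained line would then be read and assigned manually to one of the seven conceptual domains, since that step depends on interpretation rather than on form.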

3. Results and Discussion

The ME subcorpus used for this study contains full-text short stories and currently stands at 49,114 words. It contains 157 tokens of by, eleven of which are used in idiomatic expressions/phrases, four as adverbs and two as particles in phrasal verbs; these fall outside the scope of the present paper. The distribution of by in the remaining 149 occurrences is shown in Table 2. The relative frequency of prepositional by in the corpus is 30.34 per 10,000 words (149 tokens in 49,114 words).

Table 2.

The distribution of senses of by in the ME subcorpus.

Conceptual Domain       Senses                     Raw      %
Spatial                 Proximity                    5    3.4
                        Path                         1    0.7
                        Locative                     2    1.3
Temporal                Connection in time           -      -
                        Rel. to a point in time     15   10.1
Mode                    Means                        5    3.4
                        Manner                      16   10.7
Cause/Agency                                        92   61.7
Circumstance                                         6    4.0
Nature                                               1    0.7
Estimation/numeration   Grad. quant.                 4    2.7
                        Quantity                     1    0.7
                        Increment                    1    0.7
                        Measurement                  -      -
                        Mathematical                 -      -
TOTAL                                              149    100

By and large, by shows an overwhelming tendency to indicate agency. The agentive uses of by account for 61.7% of the occurrences of by. This is followed by the use of by in the mode category (14.1%). Surprisingly, the spatial use of by (which is regarded as the basic sense of the item) ranks fourth (5.4%) behind temporal use (10.1%) which is supposedly derived from the spatial sense.
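The domain-level percentages quoted above are sums over the sense rows of Table 2. The following sketch, offered only as an illustration of that arithmetic, copies the raw counts from Table 2, treats a dash as zero (an assumption), and ranks the conceptual domains by their share of the 149 tokens. The dictionary layout and the function name are hypothetical.

```python
# Raw counts per sense, copied from Table 2 (ME subcorpus).
# Assumption: a dash in the table is read as a count of zero.
ME_COUNTS = {
    "Spatial": {"Proximity": 5, "Path": 1, "Locative": 2},
    "Temporal": {"Connection in time": 0, "Rel. to a point in time": 15},
    "Mode": {"Means": 5, "Manner": 16},
    "Cause/Agency": {"Cause/Agency": 92},
    "Circumstance": {"Circumstance": 6},
    "Nature": {"Nature": 1},
    "Estimation/numeration": {
        "Grad. quant.": 4, "Quantity": 1, "Increment": 1,
        "Measurement": 0, "Mathematical": 0,
    },
}

def domain_shares(counts):
    """Sum sense counts per domain; return (domain, raw, percent), largest first."""
    totals = {domain: sum(senses.values()) for domain, senses in counts.items()}
    grand_total = sum(totals.values())
    rows = [(d, n, 100 * n / grand_total) for d, n in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)

for domain, raw, pct in domain_shares(ME_COUNTS):
    print(f"{domain:<22}{raw:>5}{pct:>7.1f}")
```

The first four rows printed are Cause/Agency, Mode, Temporal and Spatial, in that order, which matches the ranking discussed above.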

The second set of data is derived from the Fiction Section of the Wellington Corpus of written New Zealand English. The fiction section is approximately six times the size of the ME subcorpus, with a total of 311,298 words. This subcorpus contains 683 tokens of by, 66 of which are used as particles, adverbs etc. The semantic distribution of the remaining 617 occurrences of by is shown in Table 3. The relative frequency of the preposition by is 19.82 per 10,000 words (617 tokens in 311,298 words).

Table 3.

The distribution of senses of by in the Wellington Corpus of written New Zealand English.

Conceptual Domain       Senses                     Raw      %
Spatial                 Proximity                   57    9.2
                        Path                         3    0.5
                        Locative                     8    1.3
Temporal                Connection in time           1    0.2
                        Rel. to a point in time     52    8.4
Mode                    Means                       31    5.0
                        Manner                      54    8.8
Agency/Cause            animate/personal           358   58.0
Circumstance                                        17    2.8
Nature                                              12    1.9
Estimation/numeration   Reduplication               15    2.4
                        Quantity                     4    0.6
                        Increment                    2    0.3
                        Measurement                  -      -
                        Mathematical                 3    0.5
TOTAL                                              617  100.0

The distribution of by in the NZE subcorpus is in some ways similar to that in the ME subcorpus. Agentive uses of by predominate, accounting for 58.0% of the occurrences. However, Mode usage is slightly less prominent in the NZE data (13.8%, against 14.1% in ME). In contrast with the ME data, spatial uses (11%) rank higher than temporal uses (8.6%). The percentages for the different senses of by in the New Zealand subcorpus and the Malaysian subcorpus are presented in graphic form in Figure 1.
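Because the two subcorpora differ greatly in size, raw token counts are only comparable after normalisation. The sketch below shows the normalisation to occurrences per 10,000 words, using the token counts and corpus sizes reported in the text; the function and variable names are illustrative only.

```python
def per_10k(tokens, corpus_words):
    """Normalise a raw token count to occurrences per 10,000 running words."""
    return 10_000 * tokens / corpus_words

# Prepositional tokens of "by" and corpus sizes as reported in the text.
ME = {"tokens": 149, "words": 49_114}
NZE = {"tokens": 617, "words": 311_298}

me_rate = per_10k(ME["tokens"], ME["words"])
nze_rate = per_10k(NZE["tokens"], NZE["words"])

print(f"ME:  {me_rate:.2f} per 10,000 words")
print(f"NZE: {nze_rate:.2f} per 10,000 words")
```

Rounded to two decimal places, the two rates are 30.34 and 19.82, the figures given for the ME and NZE subcorpora respectively.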

4. Conclusions

The following conclusions can be drawn from the results and data presented in Tables 2 and 3: (a.) The relative frequency of by is higher in ME than in NZE. (b.) By is most frequently used to indicate the agentive. The agentive use of by accounts for more than half of all occurrences in both subcorpora (61.7% in ME and 58.0% in NZE). (c.) There are some apparent differences in the semantic distribution of by between the NZE data and the ME data. Spatial usage ranks higher than temporal usage in NZE, while in ME by is more frequently used in the temporal than in the spatial dimension.
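The contrast in (b.) and (c.) can be checked directly against the tables. The sketch below is a hedged illustration rather than part of the original analysis: the domain totals are summed by hand from the sense rows of Tables 2 and 3 for four domains only, and the names used are hypothetical.

```python
# Domain-level raw counts summed from the sense rows of Tables 2 and 3.
DOMAIN_COUNTS = {
    "ME":  {"Spatial": 8,  "Temporal": 15, "Mode": 21, "Agency": 92,  "total": 149},
    "NZE": {"Spatial": 68, "Temporal": 53, "Mode": 85, "Agency": 358, "total": 617},
}

def share(variety, domain):
    """Percentage of a variety's prepositional by tokens that fall in a domain."""
    counts = DOMAIN_COUNTS[variety]
    return 100 * counts[domain] / counts["total"]

for variety in DOMAIN_COUNTS:
    spatial = share(variety, "Spatial")
    temporal = share(variety, "Temporal")
    agency = share(variety, "Agency")
    leader = "spatial" if spatial > temporal else "temporal"
    print(
        f"{variety}: agency {agency:.1f}%, spatial {spatial:.1f}%, "
        f"temporal {temporal:.1f}% ({leader} ranks higher)"
    )
```

A comparison of shares of this kind says nothing about statistical significance; it only restates the ordering visible in the tables.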

The preliminary results of this inquiry into the lexical semantics of two varieties of English seem promising. Not only does by appear to be more frequent in ME (perhaps a case of overuse; cf. Granger, 1994), but it also appears to be put to different uses in ME than in NZE. Having said that, the framework adopted for the present inquiry might need refining to cater for borderline cases, which have been assigned to only one category in this study. However, this will not affect the findings in some of the conceptual domains (e.g. the spatial and temporal domains, where the different usages are quite clear-cut).

Explanations for the observations above remain to be established at this initial stage of the research. The differences could possibly be attributed to L1 interference, to pedagogical practices in the ESL environment, or even to ESL pragmatic principles.[3] These suggestions fall within different theoretical positions with respect to explaining intervariety differences. Explanations based on L1 interference and 'faulty learning' are aligned with the 'inner processing theories' of non-native speakers' language, while differences in semantic and pragmatic principles fall within sociolinguistic and discourse theories.


