Academic Vocabulary Lists



The Academic Word List [AWL] (Coxhead, 2000) has been very useful for teachers and learners since it was released in 2000. Nevertheless, we believe that the new academic vocabulary lists that are available here are more accurate and useful, in at least four important ways. The following is a short summary of the discussion from our 2013 article in Applied Linguistics.

First, the AWL is based on an older, much smaller corpus -- just 3.5 million words of academic texts from the 1990s. Ours is based on more than 120 million words of academic texts in the Corpus of Contemporary American English (COCA), which contains texts as recent as 2015. Our academic corpus is composed of all 86 million words of academic journals in COCA, as well as 26 million words from academically-oriented magazine articles. The following table shows the size of the different sub-genres, with the number of words (in millions) shown in parentheses:

History (14.3 million words)    Education (8.5)              Law and political science (12.5)
Social Science (16.7)           Humanities (11.1)            Philosophy, religion, psychology (12.5)
Science and technology (22.8)   Medicine and health (9.7)    Business and finance (12.8)

Second, our word lists provide better coverage of academic English. The 570 "word families" in the AWL cover 7.2% of the words in the COCA academic texts, but the top 570 word families in our list cover 14.0% -- nearly twice as much. In a "neutral" corpus -- the 32 million words of academic and semi-academic texts in the British National Corpus -- the AWL covers 7.1% and our list covers 14.0% -- again nearly twice as much. Part of this difference is due to the fact that the AWL "sits on top of" the General Service List (GSL), which already has many high-frequency words, but there are other factors at play as well.
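As a rough illustration of what a coverage figure like those above measures, the following Python sketch computes the share of running words (tokens) in a text that belong to a given word list. The function name, the toy sentence, and the toy word list are all assumptions made for this example; it is not the procedure used to produce the published percentages, which would also involve decisions about tokenization, lemmatization, and word families.

```python
from collections import Counter

def coverage(tokens, word_list):
    """Return the fraction of running tokens whose lowercased form is in word_list.

    word_list is assumed to be a set of lowercase word forms.
    """
    if not tokens:
        return 0.0
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    covered = sum(n for word, n in counts.items() if word in word_list)
    return covered / total

# Toy data invented for illustration only (not taken from COCA or the BNC).
sample_tokens = (
    "the data suggest that the analysis of the data requires a new method"
).split()
sample_list = {"data", "analysis", "method", "requires"}

print(f"Coverage: {coverage(sample_tokens, sample_list):.1%}")
```

Running the same function over an academic text and a fiction text with the same list would give the kind of genre comparison described here, with a higher value expected for the academic text.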

Our academic list is also strongly oriented towards academic texts, as opposed to other genres. For example, it covers 14.0% of academic texts in COCA, 7.3% of the 85 million words of newspapers in COCA, and just 3.4% of the 86 million words of fiction texts in COCA. That's exactly what you'd want -- a list that is oriented mainly towards academic English, rather than a general word list for all types of English.

Third, we believe that our lists are more usable -- they provide the data in a number of different formats (not just "one size fits all" word families), which are oriented towards different needs. You can download the data for the 3,000 "general academic" words (sample), the words grouped into AWL-like "word families" (sample), and the top 20,000 words in COCA Academic overall (sample).

We also provide a wealth of information in the word families, which is not available in the standard AWL families (see a sample of our lists):

  1. The words are grouped by lemma (e.g. [decide] = {decide, decided, deciding}, etc.), which eliminates clutter. (For example, most people don't really need to see two separate entries for "decide" and "decides".)

  2. There are different entries by part of speech, so you know, for example, whether abstract is used more as a noun, verb, or adjective.

  3. The words are also color-coded to let you know whether the word is a "general" academic word, or whether it is a more "technical" one that occurs in just a few sub-genres.

  4. And most importantly, the entries are listed in order of frequency, to help you focus more on words that you will actually see in the real world -- rather than just having a mass of unorganized words, as with the traditional AWL word families.
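The grouping and ordering described in points 1, 2, and 4 can be sketched in a few lines of Python. The rows, counts, and function name below are hypothetical and exist only to show the idea: inflected forms are collapsed into one entry per lemma and part of speech, and the entries are then listed from most to least frequent. The actual lists may be built differently.

```python
from collections import defaultdict

# Hypothetical (form, lemma, part of speech, frequency) rows; the counts are invented.
rows = [
    ("abstract", "abstract", "adjective", 40),
    ("abstract", "abstract", "noun", 25),
    ("abstracts", "abstract", "noun", 10),
    ("decide", "decide", "verb", 30),
    ("decided", "decide", "verb", 50),
    ("deciding", "decide", "verb", 15),
]

def build_entries(rows):
    """Collapse word forms into (lemma, pos) entries, most frequent first."""
    totals = defaultdict(int)
    forms = defaultdict(set)
    for form, lemma, pos, freq in rows:
        totals[(lemma, pos)] += freq
        forms[(lemma, pos)].add(form)
    # Sort by descending frequency; ties are broken alphabetically by (lemma, pos).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], item[0]))
    return [
        (lemma, pos, freq, sorted(forms[(lemma, pos)]))
        for (lemma, pos), freq in ranked
    ]

entries = build_entries(rows)
for lemma, pos, freq, word_forms in entries:
    print(lemma, pos, freq, word_forms)
```

Keeping the part of speech in the key is what allows "abstract" to appear as separate adjective and noun entries, each with its own frequency.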

Fourth, our word lists are integrally tied into the COCA corpus, so that you can see a great deal of information about the meaning and usage of each word -- its definition, the frequency in each of the nine academic sub-genres (e.g. Medicine, Science, or Business), the collocates (nearby words, which provide great insight into meaning and usage), and many re-sortable concordance lines for each word, which show the patterns in which the word occurs. The regular AWL lists are integrated into the Compleat LexTutor site, but not to the same degree that ours are with COCA.