The databases of Kolson (1960) and Moe, Hopkins, and Rush (1982) were combined to form a corpus of words, pronunciations, and spoken word frequencies for kindergarten and first grade children. The original Kolson (1960) database includes the spoken word frequency of 3,728 words used by kindergarten children at home, at school, and when elicited by pictures. The original Moe et al. (1982) database includes the spoken word frequency of 6,412 words spoken by first grade children during a child-examiner interview.
Words from Kolson (1960) and Moe et al. (1982) were combined to form one corpus: the Child Mental Lexicon (CML). For words appearing in both Kolson and Moe et al., the raw frequencies were summed (i.e., Kolson frequency + Moe et al. frequency). The log base 10 of the combined raw frequency count of each word was then calculated, and a constant value of 1.0 was added to each log frequency, to guard against missing data resulting from the undefined log of 0. These values (log base 10 of the combined raw frequency count, plus the constant value of 1) are available from the CML. Homophonous word forms were collapsed into one form for the CML, so that only the higher frequency form was included in the corpus. An inflected form of a word (e.g., running) was eliminated from the database only if the uninflected form (e.g., run) also appeared in the database. All ungrammatical word forms (e.g., deers, mostest) were also deleted. After eliminating the aforementioned forms and combining duplicate words across the databases, 4,832 words remain in the CML.
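The frequency and inflection steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure actually used to build the CML: the function names, the `base_of` mapping from inflected to uninflected forms, and all counts are hypothetical. Whether the counts of eliminated inflected forms were folded into their base forms is not stated, so the sketch simply drops them.

```python
import math
from collections import Counter


def combine_log_frequencies(kolson, moe):
    """Sum raw counts per word across both sources, then return log10(sum) + 1."""
    combined = Counter(kolson)
    combined.update(moe)  # adds the counts for words present in both sources
    return {word: math.log10(count) + 1.0
            for word, count in combined.items() if count > 0}


def drop_redundant_inflections(freqs, base_of):
    """Drop an inflected form only when its uninflected base is also present.

    base_of is a hypothetical mapping such as {"running": "run"}.
    """
    return {word: value for word, value in freqs.items()
            if base_of.get(word, word) == word or base_of[word] not in freqs}


# Made-up counts for illustration only; not values from either database.
kolson_counts = {"run": 40, "dog": 10, "running": 5}
moe_counts = {"run": 60, "cat": 1, "swimming": 3}
base_of = {"running": "run", "swimming": "swim"}

log_freqs = combine_log_frequencies(kolson_counts, moe_counts)
cml = drop_redundant_inflections(log_freqs, base_of)
```

In this toy example "running" is removed because "run" is present, while "swimming" is kept because "swim" does not appear in either source.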
The pronunciation of each word in the CML was obtained either from the Hoosier Mental Lexicon (HML; Nusbaum, Pisoni, & Davis, 1984) or from a dictionary that provides the phonemic transcription of English words (Longman, 1983). Pronunciations not available from the HML or the Longman dictionary were entered manually by native English speakers proficient in phonemic transcription. All pronunciations were entered into the corpus using the computer-readable phonemic transcription referred to as Klattese.
Kolson (1960). The vocabulary of kindergarten children. Unpublished doctoral dissertation, University of Pittsburgh, Pittsburgh.
Longman (1993). Longman dictionary of American English. White Plains, NY: Longman.
Moe, A.J., Hopkins, C.J., & Rush, R.T. (1982). The vocabulary of first grade children. Springfield, IL: Charles C. Thomas Publisher.
Nusbaum, H.C., Pisoni, D.B., & Davis, C.K. (1984). Sizing up the Hoosier mental lexicon: Measuring the familiarity of 20,000 words (Research on Speech Perception, Progress Report No. 10). Bloomington: Indiana University, Psychology Department, Speech Research Laboratory.
Storkel, H.L. (2004). Methods for minimizing the confounding effects of word length in the analysis of phonotactic probability and neighborhood density. Journal of Speech, Language, and Hearing Research, 47, 1454-1468.