Academic
Undergraduate Degree in Computer Science, Linguistics and German
A Statistical Analysis of Stylistics and Homogeneity in the English Wikipedia
Data
Original Corpora:
BA-Corpus word-frequency distribution
LS-Corpus word-frequency distribution
SE-Corpus word-frequency distribution
RAND Corpora:
RAND-BA-Corpus word-frequency distribution
RAND-LS-Corpus word-frequency distribution
RAND-SE-Corpus word-frequency distribution
RAND2 Corpora:
RAND2-BA-Corpus word-frequency distribution
RAND2-LS-Corpus word-frequency distribution
RAND2-SE-Corpus word-frequency distribution
Figures
7.1 Word-length frequency distributions for the original corpora
BA-Corpus, LS-Corpus, SE-Corpus
7.2 Sentence-length frequencies for the original corpora
BA-Corpus, LS-Corpus, SE-Corpus
7.3 Sentence-length frequencies for the RAND corpora
RAND-BA-Corpus, RAND-LS-Corpus, RAND-SE-Corpus
7.4 Sentence-length frequencies for the RAND2 corpora
RAND2-BA-Corpus, RAND2-LS-Corpus, RAND2-SE-Corpus
7.5 Sentence-length frequencies for some literary works
Jane Austen’s Emma, Lewis Carroll’s Alice’s Adventures in Wonderland, All Project Gutenberg books packaged with NLTK
7.6 Sentence-length standard deviations for original corpora
BA-Corpus, LS-Corpus, SE-Corpus
7.7 Sentence-length standard deviations for RAND2 corpora






