Sunday, January 1, 2012

high frequency lists

When one is an aimless, weakly motivated German learner, as I am, shortcuts are very attractive. Presumably, learning the most frequently used German words is the most efficient path to German comprehension. The question is how many words are necessary. Somewhere in the internets a poster mentioned that 3000 German words are necessary for level B1, 4500-5000 words for B2, and 6000-6500 words for C1. In comparison, the number of words known by a native speaker seems to range from 10k to 20k+. Here, "word" likely refers to word families (or lemmas) rather than individual word forms, meaning that knowing "sein" encompasses knowing all of its conjugations. On the same forum linked above, level designations for English have been correlated with the following percentage coverage of a text: A2 is 80%, B1 84%, B2 87%, and C1 89%. Assuming this applies closely to German, I am supposedly at roughly 80% word coverage, yet I can do little more than guess the general topic of most conversations. How hard-fought is each percentage point above 80%? And at what percentage do I start to follow the particulars of a conversation? One poster suggests that even at the C1 level, you still have one unknown word per sentence, or 20-30 unknown words per page.

Various sources have used readily available text databases to compile frequency lists. The two easily found lists below are based on German subtitles, which are indicative of spoken German, and on an analysis of a selection of German texts from books, magazines, and newspapers. There is also a list derived from Wikipedia entries, a list compiled by a research group at Leipzig University, and another list built from subtitles in many languages, compiled for free by an industrious individual. A helpful aspect of frequency lists of individual word forms, as opposed to word families, is that they give a sense of how often each verb tense is used.
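To make "coverage" a bit more concrete, here is a minimal Python sketch that builds a frequency list from a text and reports what share of the running words the top N words account for. It is only an illustration under my own assumptions: the crude letters-only tokenizer, the function names, and the toy sample sentence are mine, not how any of the lists mentioned here were actually built. It also counts individual word forms rather than lemmas, so "ist" and "war" are tallied separately instead of under "sein".

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Assumed, deliberately naive tokenizer: lowercase the text and keep runs
    # of letters (including umlauts and ß). Real corpora need more care.
    return re.findall(r"[a-zäöüß]+", text.lower())


def frequency_list(tokens: list[str]) -> list[tuple[str, int]]:
    # Word forms ranked by descending count; ties keep first-seen order.
    return Counter(tokens).most_common()


def coverage(tokens: list[str], known: set[str]) -> float:
    # Fraction of running words (tokens) in the text that are in `known`.
    if not tokens:
        return 0.0
    return sum(1 for t in tokens if t in known) / len(tokens)


def top_n_coverage(tokens: list[str], n: int) -> float:
    # Coverage you would get by knowing only the n most frequent word forms.
    known = {word for word, _ in frequency_list(tokens)[:n]}
    return coverage(tokens, known)


if __name__ == "__main__":
    # Hypothetical toy sample; meaningful percentages need a large corpus.
    sample = (
        "Die Regierung hat bereits reagiert. Die Regierung hat jedoch "
        "kaum etwas erreicht, und die Gesellschaft wartet."
    )
    tokens = tokenize(sample)
    print(frequency_list(tokens)[:3])
    for n in (1, 3, 5):
        print(n, round(top_n_coverage(tokens, n), 2))
```

Coverage here is measured over tokens, i.e. every occurrence of a word in the text, which is the sense in which the 80% to 89% figures above are usually quoted. On a tiny sample like this the numbers climb quickly; on a real corpus the curve flattens out, which is exactly why each point above 80% is so hard-fought.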

The top 100 word families, from German text
Unknown words:
bereits - already
jedoch - nevertheless

Top 1000 words from German text
Further unknown words (up to rank 400):
192 allerdings - however
204 vergehen - to pass/elapse (e.g. vergangenen, as in während der vergangenen zwei Monate, "during the past two months")
205 denen - whom/which (dative plural of the relative pronoun)
227 bisher - to date
231 Regierung - government
235 während - during
249 sogar - even
264 kaum - barely
281 wohl - probably, no doubt
284 deshalb - therefore
289 gegenüber - compared to
311 gilt (gelten) - to apply
334 derzeit - currently
336 deutlich - clearly
355 schließlich - finally, after all
359 eher - if anything
373 insgesamt - altogether
378 ebenfalls - likewise
383 Ziel - aim
394 Gesellschaft - society, company
395 damals - back then

Top 1000 words from subtitles of television and movies in 2009
Surprisingly, and hearteningly, most of the 1000 were known. These were less familiar:
272 Leid - suffering
340 Sache - matter
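470 dran - on it; one's turn (colloquial for daran, as in du bist dran, "it's your turn")
596 sorgen - to worry (sich sorgen); to take care of (sorgen für)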
631 Arsch - ass
650 Waffe - weapon
693 retten - to save
720 bereits - already
742 Hölle - hell
744 deshalb - therefore
770 Kumpel - buddy
794 reichen - to be enough, to suffice; to hand over
829 drüben - over there
903 tragen - to carry, to wear
911 hassen - to hate
933 Gefängnis - jail
961 Arschloch - asshole
981 solcher - such
993 Bewegung - movement
995 sonst - otherwise

Jailhouse action flick, anyone?
