Sunday, January 1, 2012

high frequency lists

When one is an aimless, weakly motivated German learner, as I am, shortcuts are very attractive. Presumably, learning the most frequently used German words is the most efficient route to German comprehension. The question is how many words are necessary. Somewhere on the internets a poster mentioned that 3000 German words are necessary for level B1, 4500-5000 words for B2, and 6000-6500 words for C1. In comparison, the number of words known by a native speaker seems to range from 10k to 20k+. In this case, "word" likely refers to word families (or lemmas) rather than individual word forms, meaning that knowledge of "sein" encompasses knowledge of all its conjugations. In the same forum linked above, level designations for the English language have been correlated with the following percentage coverage of a text: A2 is 80%, B1 is 84%, B2 is 87%, and C1 is 89%. Assuming this applies closely to German, I am supposedly at roughly 80% coverage, yet I cannot do more than guess the general topic of most conversations. How hard fought is each percentage point above 80%? And at what percentage do I start to follow the particulars of a conversation? One poster suggests that at the C1 level you still have one unknown word per sentence, or 20-30 unknown words per page.
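To get a feel for what those coverage figures mean on the page, here is a small Python sketch. The coverage numbers are the ones quoted from the forum; the page and sentence sizes (250 and 12 running words) are my own rough assumptions, and `text_coverage` is a hypothetical helper that only matches exact word forms, not word families.

```python
import re

# Rough assumptions of my own, not figures from the forum.
WORDS_PER_PAGE = 250
WORDS_PER_SENTENCE = 12

# Coverage figures quoted on the forum for English CEFR levels.
LEVEL_COVERAGE = {"A2": 0.80, "B1": 0.84, "B2": 0.87, "C1": 0.89}


def unknown_per(n_words: int, coverage: float) -> float:
    """Expected number of unknown running words in a stretch of n_words."""
    return n_words * (1.0 - coverage)


def text_coverage(text: str, known: set[str]) -> float:
    """Share of running words (tokens) in `text` that appear in `known`.

    Hypothetical helper: it matches exact lowercased word forms, so "bin"
    only counts if "bin" itself is in `known`. A word-family count would
    first map each form to its lemma (e.g. "bin" -> "sein").
    """
    tokens = re.findall(r"\w+", text.lower())
    if not tokens:
        return 0.0
    return sum(t in known for t in tokens) / len(tokens)


if __name__ == "__main__":
    for level, cov in LEVEL_COVERAGE.items():
        print(f"{level}: {unknown_per(WORDS_PER_PAGE, cov):.0f} unknown per page, "
              f"{unknown_per(WORDS_PER_SENTENCE, cov):.1f} per sentence")
```

With these assumptions, 89% coverage leaves about 27.5 unknown words per 250-word page and about 1.3 per 12-word sentence, which roughly fits the forum's "20-30 unknown words per page".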

Various sources have used readily available databases of words to compile frequency lists. The two easily found lists below are based on German subtitles, which reflect spoken German, and on an analysis of a selection of German texts from books, magazines, and newspapers. There is also a list based on Wikipedia entries, a list compiled by a research group at Leipzig University, and another list from subtitles in many languages compiled for free by an industrious individual. A helpful aspect of frequency lists of individual words, as opposed to word families, is that they give a sense of how often each verb tense is used.
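As a rough idea of how such lists of individual words are produced, here is a minimal Python sketch that counts lowercased word forms in a plain-text corpus and ranks them. It is only an illustration: the sample lines are made up, `frequency_list` is a hypothetical helper, and the real lists presumably use much larger corpora and better tokenization. Because it counts word forms rather than word families, "ist" and "war" get separate ranks, which is what makes tense usage visible.

```python
import re
from collections import Counter


def frequency_list(lines: list[str]) -> list[tuple[int, str, int]]:
    """Return (rank, word form, count) tuples, most frequent first.

    Hypothetical helper: counts individual lowercased word forms
    (letters only, no digits or underscores), not lemmas or word
    families, so "ist", "war" and "gewesen" are ranked separately.
    Ties keep the order in which the words first appeared.
    """
    counts = Counter()
    for line in lines:
        counts.update(re.findall(r"[^\W\d_]+", line.lower()))
    return [(rank, word, n)
            for rank, (word, n) in enumerate(counts.most_common(), start=1)]


if __name__ == "__main__":
    # Made-up sample lines standing in for a subtitle file.
    sample = [
        "Das ist mein Kumpel.",
        "Das war knapp!",
        "Wo ist die Waffe?",
    ]
    for rank, word, n in frequency_list(sample):
        print(rank, word, n)
```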

The top 100 word families, from German text
Unknown words:
bereits - already
jedoch - nevertheless

Top 1000 words from German text
Further unknown words (up to 400):
192 allerdings - however
204 vergehen - to pass/elapse (e.g. vergangenen, as in während der vergangenen zwei Monate, during the past two months)
205 denen - to whom/which (dative plural of the relative pronoun)
227 bisher - to date
231 Regierung - government
235 während - during
249 sogar - even
264 kaum - barely
281 wohl - no doubt
284 deshalb - therefore
289 gegenüber - compared to
311 gilt (gelten) - to apply
334 derzeit - currently
336 deutlich - clearly
355 schließlich - finally, after all
359 eher - if anything
373 insgesamt - altogether
378 ebenfalls - likewise
383 Ziel - aim
394 Gesellschaft - society, company
395 damals - back then

Top 1000 words from subtitles of television and movies in 2009
Surprisingly, and hearteningly, most of the 1000 were known. These were less familiar:
272 Leid - suffering
340 Sache - matter
470 dran - colloquial for daran (on it; ich bin dran, it's my turn)
596 sorgen - to provide, to ensure (sich sorgen - to worry)
631 Arsch - ass
650 Waffe - weapon
693 retten - to save
720 bereits - already
742 Hölle - hell
744 deshalb - therefore
770 Kumpel - buddy
794 reichen - to be enough, to suffice; to reach, to range
829 drüben - over there
903 tragen - to carry, to wear
911 hassen - to hate
933 Gefängnis - jail
961 Arschloch - asshole
981 solcher - such
993 Bewegung - movement
995 sonst - otherwise

Jailhouse action flick anyone?

German vocabulary from HP1, chapters 1-6

Hmm, I seem to be unmethodical in my German learning ... why not start at the beginning? Here is the vocabulary I didn't already know from the first Harry Potter, up to chapter 6. What seems to work fairly well now is watching sections of the movie three different ways. First I watch with English audio and German subtitles for the hearing impaired, pausing frequently to scan for words I don't know and to test whether I understand what I am reading. I suppose this is similar to German audio with English subtitles, except that I know instantly which German word is being used. Then I watch twice more with German audio, once with German subtitles and once without.

One thing I find when scanning the German subtitles for the first time is that even when I am familiar with all the words, the word order or grammar baffles me and I have no idea what the phrase means. This is despite having had some instruction, so I know basic sentence construction. These are presumably more specific constructions, or colloquial phrases. I have not noted them here, but perhaps I will in the future.

As an example of how English constructions do not transfer directly to German, here is what Dudley's English mother says on his birthday: "I want everything to be special on my Dudley's special day". Dudley's German mother says instead (directly translated) "I want, that everything is perfect, when my Dudley has his special day". The German version uses more verbs. And somehow, German speakers automatically split complex sentences into subordinate clauses (dass ..., wenn ...).


das Gerücht - rumor
trauen - to trust (sich trauen - to dare)
etwas fürchten - to fear something
etwas für etwas halten - to regard/hold something to be something
weise - wise (and, btw: der Weise - a sage; die Weise - a way or manner, also a tune)
jemandem etwas anvertrauen - to entrust someone with something
heulen - to howl (ein Motor heult)
der Rabauke - hooligan
einschlafen - to fall asleep
aufwecken - to wake (someone) up
verantworten - to take responsibility
jmdm gehören - to belong to someone
der Verwandte - a relative
berühmt - famous
aufwachsen - to grow up
keineswegs - by no means/in no way
das Lebewohl - farewell
aufstehen - to get up
aufwachen - to wake up
anbrennen - to scorch
warnen - to warn
die Albernheit - silliness
die Fratze - grotesque face
die Scheibe - disk, slice, or (window) pane
der Käfig - cage
aufzeigen - to show
fassen - to grasp, to seize
das Thema - issue/matter (e.g. kein Thema - no problem)
das Geschrei - yelling, clamor
reingeraten - to get into
nässen - to moisten
verschwinden - to disappear, to vanish
der Magen - stomach
verdorben - spoiled/tainted
vergammeln - to go to seed, to rot (e.g. vergammelten Muschel?)
etwas kriegen - to get something (colloquial)
der Bohrer - drill
surren - to buzz/whir
der Schlitz - slot (e.g. Briefschlitz - mail slot)
Kusch! - Shoo!
etw. austragen - to distribute something
verflixt - darned/blasted
lumpig - measly
winzig - tiny, minuscule
rumpeln - to rumble
krachen - to crash
unbefugt - unauthorized
eindringen - to break in/infiltrate (e.g. Sie sind unbefugt hier eingedrungen)
der Hüter - guardian
verdammt - damned
wetten - to wager
die Übung - practice
sobald - as soon as
etw. verwechseln - to mix something up
geschehen - to occur
unerklärlich - inexplicable
rätselhaft - baffling
die Wut - anger
die Angst - fear
mitteilen - to inform
etw. aufnehmen - to accept something
schwören - to swear
der Unfug - mischief
der Stolz (stolz) - pride (proud)
eigenartig - strange
unnormal - abnormal
etw. in die Luft jagen - to blow something up
der Autounfall - car crash
die Schande - disgrace
vermerken - to make a note of
der Dummkopf - bozo
jmdm etw. beibringen - to teach someone something
jmdn beleidigen - to insult
die Gegenwart - presence
aufbrechen - to hit the road/to break open
ausstatten - to equip
folgende - following
gestatten - to allow
das Übliche - the usual stuff
der Auftrag - assignment/mission
etw. besorgen - to obtain something
der Federkiel - quill
die Tinte - ink
der Kram - odds and ends
der Besen - broom (e.g. Rennbesen - racing broom)
überhaupt - at all
der Kobold - goblin
abheben - to withdraw (money), to lift, scrape off
der Lümmel - bugger
Momentchen - just a moment, hold on
das Verlies - dungeon
für jmdn sorgen - to look after someone
äußerst - extremely
geheim - secret
treten - to step