pycrypt.scorers package
Submodules
pycrypt.scorers.cgetngramfrequencies module
pycrypt.scorers.czechfrequencies module
Czech frequencies, extracted from http://ufal.mff.cuni.cz/~hajic/courses/npfl067/stats/czech.html. The data is based on 564532247 characters; only the most relevant entries are kept, for speed.
pycrypt.scorers.czechscorer module
class pycrypt.scorers.czechscorer.CzechScorer
Bases: pycrypt.scorers.languagescorer.LanguageScorer
Czech scorer. Credit for the frequencies goes to MFF.
pycrypt.scorers.englishfrequencies module
pycrypt.scorers.englishscorer module
class pycrypt.scorers.englishscorer.EnglishScorer
Bases: pycrypt.scorers.languagescorer.LanguageScorer
English scorer. The frequencies were taken from the internet.
pycrypt.scorers.languagescorer module
class pycrypt.scorers.languagescorer.LanguageScorer
Bases: pycrypt.scorers.scorer.Scorer
Scorer for languages, based on N-grams and words.
words = None
minWordLen = 3
maxWordLen = 10
log = False
ngramWeights = None
wordWeight = 0
unidec = True
setWeights(ngram_weights, word_weight=0)
Set the score multipliers. ngram_weights is a list corresponding to the ideal frequencies; any weight that is 0 is ignored when scoring.
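The "a weight of 0 is ignored" rule can be illustrated with a small standalone sketch. This is not pycrypt's implementation: combine_scores and its parameters are hypothetical, and the pairing of the i-th weight with the score for N-grams of length i + 1 is an assumption made only for this example.

```python
def combine_scores(ngram_scores, ngram_weights, word_score=0.0, word_weight=0):
    """Weighted sum of per-N-gram-length scores plus an optional word score.

    Hypothetical helper, not part of pycrypt. Components whose weight
    is 0 are skipped entirely rather than multiplied by zero.
    """
    total = 0.0
    for score, weight in zip(ngram_scores, ngram_weights):
        if weight == 0:
            continue  # a zero weight means this component is ignored
        total += weight * score
    if word_weight != 0:
        total += word_weight * word_score
    return total


# Assumed layout: index 0 = unigrams, 1 = bigrams, 2 = trigrams.
# The unigram score is ignored (weight 0); the word score is ignored too.
result = combine_scores([0.9, 0.5, 0.25], [0, 2, 4], word_score=1.0, word_weight=0)
```

Skipping a zero-weight component, instead of multiplying it by zero, keeps an undefined component score (for example NaN) from leaking into the total.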
getNgramFrequencies(text, length)
Get a dictionary of the frequencies of N-grams of the given length.
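As a rough illustration of what such a frequency dictionary might look like, here is a minimal standalone sketch. It is not the library's code: ngram_frequencies is a hypothetical function, and it assumes that "frequency" means the count of an N-gram divided by the total number of N-gram windows in the text (the actual method may return raw counts or normalize differently).

```python
from collections import Counter


def ngram_frequencies(text, length):
    """Return {ngram: relative frequency} for all N-grams of `length`.

    Hypothetical stand-in, not pycrypt's getNgramFrequencies.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    # Slide a window of `length` characters over the text.
    ngrams = [text[i:i + length] for i in range(len(text) - length + 1)]
    total = len(ngrams)
    if total == 0:
        return {}  # text shorter than the requested N-gram length
    counts = Counter(ngrams)
    # Assumption: frequency = occurrences / number of windows.
    return {gram: count / total for gram, count in counts.items()}


# "ABABAB" has five overlapping bigram windows: AB, BA, AB, BA, AB.
freqs = ngram_frequencies("ABABAB", 2)
```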
score(*args, **kwargs)