pycrypt.scorers package

Submodules

pycrypt.scorers.cgetngramfrequencies module

pycrypt.scorers.czechfrequencies module

Czech frequencies, extracted from http://ufal.mff.cuni.cz/~hajic/courses/npfl067/stats/czech.html (data from 564532247 characters). Only the most relevant entries are kept, for speed.

pycrypt.scorers.czechscorer module

class pycrypt.scorers.czechscorer.CzechScorer

Bases: pycrypt.scorers.languagescorer.LanguageScorer

Czech scorer. Credit for the frequency data goes to MFF (Faculty of Mathematics and Physics, Charles University).

pycrypt.scorers.englishfrequencies module

pycrypt.scorers.englishscorer module

class pycrypt.scorers.englishscorer.EnglishScorer

Bases: pycrypt.scorers.languagescorer.LanguageScorer

English scorer. The frequencies were collected from the internet.

pycrypt.scorers.languagescorer module

class pycrypt.scorers.languagescorer.LanguageScorer

Bases: pycrypt.scorers.scorer.Scorer

Scores how closely text resembles a language, based on N-gram frequencies and a word list.

words = None
minWordLen = 3
maxWordLen = 10
log = False
ngramWeights = None
wordWeight = 0
unidec = True
setIdealNgramFrequencies(freqs)
loadWordList(path, minwordlen=3, maxwordlen=10)

Load words from a file containing one word per line.
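A minimal sketch of the one-word-per-line loading described above, written as a standalone function. It assumes that minwordlen and maxwordlen are inclusive length bounds used to filter the list, and that each line is stripped and lowercased; the function name load_word_list is hypothetical and not part of pycrypt.

```python
from pathlib import Path


def load_word_list(path, minwordlen=3, maxwordlen=10):
    """Hypothetical stand-in for loadWordList: read one word per line.

    Assumption: words whose length falls outside [minwordlen, maxwordlen]
    (inclusive) are dropped; the rest are lowercased and stored in a set.
    """
    words = set()
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            word = line.strip().lower()  # drop the newline and surrounding spaces
            if minwordlen <= len(word) <= maxwordlen:
                words.add(word)
    return words
```

A set makes the later "is this a known word?" lookups constant-time on average, which matters when scoring many candidate decryptions.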

setWeights(ngram_weights, word_weight=0)

Set the score multipliers. ngram_weights is a list whose entries correspond to the ideal N-gram frequencies, and word_weight applies to the word score. A weight of 0 means that component is ignored when scoring.
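The weighting rule can be illustrated with a small standalone function. This is a sketch under assumptions: it treats the total score as a weighted sum of per-component scores, and it never evaluates a component whose weight is 0. The function name weighted_score, its callable-based parameters, and the weighted-sum formula are hypothetical; pycrypt may combine the components differently.

```python
def weighted_score(text, ngram_scorers, ngram_weights, word_scorer=None, word_weight=0):
    """Hypothetical weighted sum of partial scores.

    ngram_scorers[i] is a callable scoring text against the i-th ideal
    frequency table, and ngram_weights[i] is its multiplier.
    """
    if len(ngram_scorers) != len(ngram_weights):
        raise ValueError("ngram_scorers and ngram_weights must have the same length")
    total = 0.0
    for scorer, weight in zip(ngram_scorers, ngram_weights):
        if weight == 0:
            continue  # zero weight: this component is never evaluated
        total += weight * scorer(text)
    if word_weight != 0 and word_scorer is not None:
        total += word_weight * word_scorer(text)
    return total
```

Skipping zero-weight components explicitly, rather than just multiplying by 0, saves the cost of counting those N-grams or matching words at all.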

getNgramFrequencies(text, length)

Return a dictionary of the frequencies of the N-grams of the given length in text.
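A self-contained sketch of N-gram frequency counting, assuming overlapping N-grams and relative frequencies (counts divided by the number of N-grams). The docstring does not say whether getNgramFrequencies returns raw counts or relative frequencies, or how it treats case, spaces, and punctuation, so the name ngram_frequencies and those choices are assumptions.

```python
from collections import Counter


def ngram_frequencies(text, length):
    """Relative frequencies of overlapping N-grams of the given length.

    Assumption: the text is uppercased and only letters are kept before counting.
    """
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    total = len(letters) - length + 1  # number of overlapping N-grams
    if length <= 0 or total <= 0:
        return {}
    counts = Counter(letters[i:i + length] for i in range(total))
    return {gram: count / total for gram, count in counts.items()}
```

For example, ngram_frequencies("abab", 2) sees the bigrams AB, BA, AB and so reports AB with frequency 2/3 and BA with 1/3.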

scoreNgrams(text)
scoreWords(text)
score(*args, **kwargs)
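One common way to turn N-gram frequencies into a resemblance score is to compare them with the ideal frequencies set via setIdealNgramFrequencies. The sketch below uses the negated sum of absolute differences over the N-grams listed in the ideal table (higher is better, 0 is a perfect match). This metric is an illustrative assumption, not necessarily what scoreNgrams computes, and the function name ngram_score is hypothetical.

```python
from collections import Counter


def ngram_score(text, ideal, length):
    """Hypothetical N-gram score: -(sum of |observed - ideal|) over ideal N-grams.

    ideal maps uppercase N-grams of the given length to expected relative
    frequencies. Only N-grams present in the ideal table are compared, which
    suits a table trimmed to the most relevant entries.
    """
    letters = "".join(ch for ch in text.upper() if ch.isalpha())
    total = len(letters) - length + 1
    if length <= 0 or total <= 0:
        return float("-inf")  # too short to contain a single N-gram
    counts = Counter(letters[i:i + length] for i in range(total))
    # Counter returns 0 for N-grams that never occur in the text
    return -sum(abs(counts[gram] / total - freq) for gram, freq in ideal.items())
```

When ranking candidate decryptions, only the relative order of the scores matters, so any distance with "closer is higher" behaves the same way in a search; the choice mainly affects how sharply near-misses are penalized.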

pycrypt.scorers.ngram_converter module

pycrypt.scorers.scorer module

class pycrypt.scorers.scorer.Scorer

Abstract class for scoring strings (e.g. by their resemblance to a language).

score(text)

Return the score of a string.
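The abstract-class pattern described here can be sketched with Python's abc module. The class names AbstractScorer and VowelRatioScorer are hypothetical illustrations, not pycrypt code, and pycrypt's Scorer may not use abc at all. The sketch also assumes that a higher score means a better match.

```python
from abc import ABC, abstractmethod


class AbstractScorer(ABC):
    """Hypothetical base class: subclasses must implement score(text)."""

    @abstractmethod
    def score(self, text):
        """Return a numeric score for text (assumed here: higher is better)."""


class VowelRatioScorer(AbstractScorer):
    """Toy scorer: the fraction of letters in text that are vowels."""

    def score(self, text):
        letters = [ch for ch in text.lower() if ch.isalpha()]
        if not letters:
            return 0.0
        return sum(ch in "aeiou" for ch in letters) / len(letters)
```

Because every scorer exposes the same score(text) method, a cracker can pick the best candidate with max(candidates, key=scorer.score) regardless of which concrete scorer it was given. Instantiating AbstractScorer directly raises TypeError.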

Module contents