python - Pointwise Mutual Information from scratch
I want to write my own PMI code in Python without relying on NLTK.
I know I have to use the formula log(p(x, y) / (p(x) p(y))).
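Here is a minimal sketch of the formula on its own. It assumes base-2 logarithms (PMI measured in bits). The function name `pmi` and the choice to return negative infinity when the pair never occurs are illustrative conventions, not anything taken from NLTK.

```python
import math


def pmi(p_xy: float, p_x: float, p_y: float, base: float = 2.0) -> float:
    """Pointwise mutual information: log(p(x, y) / (p(x) * p(y))).

    Base 2 (bits) is an assumed default; pass base=math.e for nats.
    """
    if p_x <= 0 or p_y <= 0:
        raise ValueError("p(x) and p(y) must be positive")
    if p_xy == 0:
        # The pair never co-occurs; PMI is conventionally treated as -infinity.
        return float("-inf")
    return math.log(p_xy / (p_x * p_y), base)
```

For example, `pmi(0.25, 0.5, 0.5)` returns 0.0, meaning the pair co-occurs exactly as often as independence would predict. Positive values mean the pair occurs together more often than chance, and negative values mean less often.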
Suppose I have a corpus C that contains N words, and I am looking at bigrams. Am I right in thinking that:
p(x) = (the number of times the first word occurs) / N
p(y) = (the number of times the second word occurs) / N
p(x, y) = (the number of times the first and second words occur together as a bigram) / (the total number of bigrams)
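These three estimates can be computed directly from a token list. The sketch below applies them exactly as written above: unigram counts are divided by N, and the bigram count is divided by the number of adjacent pairs, which is N - 1 for a single sequence of N tokens. The function name `bigram_pmi` is hypothetical, and the corpus is assumed to be already tokenized into a list of words.

```python
import math
from collections import Counter
from typing import List


def bigram_pmi(tokens: List[str], x: str, y: str) -> float:
    """PMI (in bits) of the adjacent pair (x, y) in a tokenized corpus."""
    n = len(tokens)
    if n < 2:
        raise ValueError("need at least two tokens to form a bigram")

    unigrams = Counter(tokens)                   # word -> count
    bigrams = Counter(zip(tokens, tokens[1:]))   # (word1, word2) -> count
    n_bigrams = n - 1                            # N tokens give N - 1 adjacent pairs

    p_x = unigrams[x] / n
    p_y = unigrams[y] / n
    p_xy = bigrams[(x, y)] / n_bigrams

    if p_x == 0 or p_y == 0:
        raise ValueError("x and y must both occur in the corpus")
    if p_xy == 0:
        return float("-inf")  # the pair never appears as a bigram
    return math.log2(p_xy / (p_x * p_y))
```

On the six-token sentence "the fox jumped on the fence", `bigram_pmi(tokens, "fox", "jumped")` gives log2(7.2), roughly 2.85 bits. Using N - 1 rather than N for the bigram denominator follows the definitions above; on a large corpus the difference between the two is negligible.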
For example, if I had the sentence "the fox jumped on the fence",
then PMI(fox, jumped) would be worked out using:
p(x) = 1/6
p(y) = 1/6
p(x, y) = 1/5 (as there are 5 bigrams, 1 of which is "fox jumped")
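The hand calculation can be checked with exact fractions. This sketch assumes the sentence is the six tokens "the fox jumped on the fence" (six words is what p(x) = 1/6 implies) and uses base-2 logarithms; both are assumptions rather than anything the question fixes.

```python
import math
from fractions import Fraction

# Assumed six-token example sentence (6 words give 5 adjacent bigrams).
tokens = "the fox jumped on the fence".split()
n = len(tokens)                        # 6
pairs = list(zip(tokens, tokens[1:]))  # 5 bigrams

p_x = Fraction(tokens.count("fox"), n)                        # 1/6
p_y = Fraction(tokens.count("jumped"), n)                     # 1/6
p_xy = Fraction(pairs.count(("fox", "jumped")), len(pairs))   # 1/5

ratio = p_xy / (p_x * p_y)          # (1/5) / (1/36) = 36/5
pmi_bits = math.log2(float(ratio))  # log2(7.2)

print(p_x, p_y, p_xy, ratio, round(pmi_bits, 3))
# prints: 1/6 1/6 1/5 36/5 2.848
```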
Any advice would be appreciated.