python - Pointwise Mutual Information from scratch -


I want to write my own PMI code in Python without relying on NLTK.

I know I have to use the formula log(p(x, y) / (p(x) p(y))).

Suppose I have a corpus C that contains n words, and I am looking at bigrams. Am I right in thinking that:

p(x) = (the number of times the first word occurs) / n

p(y) = (the number of times the second word occurs) / n

p(x, y) = (the number of times the first word and the second word occur together as a bigram) / (the total number of bigrams)
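A minimal sketch of these three estimates in plain Python, using only the standard library. The function name `pmi`, the `tokens` argument, and the choice of base 2 for the logarithm are assumptions made for illustration; the question does not fix a log base, and returning negative infinity for an unseen bigram is just one possible convention.

```python
import math
from collections import Counter


def pmi(tokens, x, y, base=2.0):
    """Return log(p(x, y) / (p(x) * p(y))) for the bigram (x, y)."""
    n = len(tokens)
    # Adjacent word pairs: a corpus of n tokens yields n - 1 bigrams.
    bigrams = list(zip(tokens, tokens[1:]))
    unigram_counts = Counter(tokens)
    bigram_counts = Counter(bigrams)

    # Assumed convention: an unseen bigram gets PMI of negative infinity.
    if bigram_counts[(x, y)] == 0:
        return float("-inf")

    p_x = unigram_counts[x] / n
    p_y = unigram_counts[y] / n
    p_xy = bigram_counts[(x, y)] / len(bigrams)
    return math.log(p_xy / (p_x * p_y), base)
```

The tokens are expected to be already split, for example with `text.split()`; tokenisation, lowercasing, and sentence boundaries are left out of this sketch.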

For example, suppose I had the sentence "the fox jumped on the fence".

PMI(fox, jumped) would then be worked out using:

p(x) = 1/6

p(y) = 1/6

p(x, y) = 1/5 (as there are 5 distinct bigram pairs, and 1 of them is "fox jumped")
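To check the arithmetic, here is a short sketch that plugs these numbers into the formula. It assumes the six-word sentence "the fox jumped on the fence" (which is what the 1/6 and 1/5 values imply) and a base-2 logarithm; both are assumptions rather than something the question states outright.

```python
import math

tokens = "the fox jumped on the fence".split()
n = len(tokens)                          # 6 words
bigrams = list(zip(tokens, tokens[1:]))  # 5 adjacent pairs

p_x = tokens.count("fox") / n                           # 1/6
p_y = tokens.count("jumped") / n                        # 1/6
p_xy = bigrams.count(("fox", "jumped")) / len(bigrams)  # 1/5

# PMI = log2((1/5) / ((1/6) * (1/6))) = log2(7.2), roughly 2.85 bits
pmi_bits = math.log2(p_xy / (p_x * p_y))
print(round(pmi_bits, 3))
```

A positive value means the pair occurs together more often than independence would predict, although with a single sentence the estimate says very little.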

Any advice is appreciated.

