python - Pointwise Mutual Information from scratch


I want to write my own PMI code in Python without relying on NLTK.

I know I have to use the formula log(p(x, y) / (p(x) p(y))).

Suppose I have a corpus C that contains N words, and I am looking at bigrams. Am I right in thinking that:

p(x) = (the number of times the first word occurs) / N

p(y) = (the number of times the second word occurs) / N

p(x, y) = (the number of times the first word and the second word occur together as a bigram) / (the total number of bigrams)
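A minimal sketch of these three estimates, assuming the corpus is already a list of tokens. The function name `pmi`, the base-2 default, the choice to return negative infinity for unseen pairs, and the toy sentence are all illustrative assumptions, not part of any library:

```python
import math
from collections import Counter

def pmi(tokens, x, y, base=2):
    """PMI of the bigram (x, y), using the estimates described above."""
    n = len(tokens)
    unigram_counts = Counter(tokens)
    bigrams = list(zip(tokens, tokens[1:]))  # adjacent word pairs
    bigram_counts = Counter(bigrams)

    # Assumption: a pair that never occurs gets -inf instead of raising on log(0).
    if bigram_counts[(x, y)] == 0:
        return float("-inf")

    p_x = unigram_counts[x] / n
    p_y = unigram_counts[y] / n
    p_xy = bigram_counts[(x, y)] / len(bigrams)
    return math.log(p_xy / (p_x * p_y), base)

# Toy corpus chosen only for illustration.
tokens = "the cat sat on the mat".split()
print(pmi(tokens, "cat", "sat"))
```

Note that the unigram probabilities are normalised by the number of words and the bigram probability by the number of bigrams, exactly as in the definitions above; other normalisations are possible and would shift the result slightly.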

For example, if I had the sentence "the fox jumped on the fence",

then PMI(fox, jumped) would be worked out using:

p(x) = 1/6

p(y) = 1/6

p(x, y) = 1/5 (as there are 5 distinct bigrams, and 1 of them is "fox jumped")
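Plugging these three values into the formula gives a ratio of (1/5) / (1/6 × 1/6) = 7.2. A quick numeric check follows; the variable names are illustrative, and since the question does not fix the base of the logarithm, both the natural and base-2 values are shown:

```python
import math

p_x, p_y, p_xy = 1 / 6, 1 / 6, 1 / 5

ratio = p_xy / (p_x * p_y)    # (1/5) / (1/36) = 7.2
pmi_nats = math.log(ratio)    # natural logarithm
pmi_bits = math.log2(ratio)   # base-2 logarithm

print(round(pmi_nats, 3), round(pmi_bits, 3))  # prints: 1.974 2.848
```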

Any advice would be appreciated.

