python - Pearson Correlation after Normalization
I want to normalize my data and then compute the Pearson correlation. Without normalization it works. With normalization I get this error message: AttributeError: 'numpy.ndarray' object has no attribute 'corr'. How can I solve this problem?
import numpy as np
import pandas as pd

filename_train = 'c:\users\xxx.xxx\workspace\dataset\!train_data.csv'
names = ['a', 'b', 'c', 'd', 'e', ...]
df_train = pd.read_csv(filename_train, names=names)

from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)

#pearson correlation
pd.set_option('display.width', 100)
pd.set_option('precision', 2)
print(normalizeddf_train.corr(method='pearson'))
You need the DataFrame constructor, because the output of fit_transform is a numpy array, which has no corr method; DataFrame.corr only works on a DataFrame:
df_train = pd.DataFrame({'a':[1,2,3],
                         'b':[4,5,6],
                         'c':[7,8,9],
                         'd':[1,3,5],
                         'e':[5,3,6],
                         'f':[7,4,3]})

print(df_train)
   a  b  c  d  e  f
0  1  4  7  1  5  7
1  2  5  8  3  3  4
2  3  6  9  5  6  3

from sklearn.preprocessing import Normalizer
normalizeddf_train = Normalizer().fit_transform(df_train)
print(normalizeddf_train)
[[ 0.08421519  0.33686077  0.58950634  0.08421519  0.42107596  0.58950634]
 [ 0.1774713   0.44367825  0.70988521  0.26620695  0.26620695  0.3549426 ]
 [ 0.21428571  0.42857143  0.64285714  0.35714286  0.42857143  0.21428571]]

print(pd.DataFrame(normalizeddf_train).corr(method='pearson'))
          0         1         2         3         4         5
0  1.000000  0.917454  0.646946  0.998477 -0.203152 -0.994805
1  0.917454  1.000000  0.896913  0.894111 -0.575930 -0.872187
2  0.646946  0.896913  1.000000  0.603899 -0.878063 -0.565959
3  0.998477  0.894111  0.603899  1.000000 -0.148832 -0.998906
4 -0.203152 -0.575930 -0.878063 -0.148832  1.000000  0.102420
5 -0.994805 -0.872187 -0.565959 -0.998906  0.102420  1.000000
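The correlation matrix above is labelled 0 to 5 because the numpy array carries no column names. If you want to keep the original labels, one option is to pass the columns and index of df_train to the constructor. This is a minimal sketch using the same sample data; the names normalized, normalized_df and corr are only illustrative and not from the original post. Note also that Normalizer rescales each row (sample) to unit norm, not each column.

```python
import pandas as pd
from sklearn.preprocessing import Normalizer

# Same sample data as in the answer above
df_train = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6], 'c': [7, 8, 9],
                         'd': [1, 3, 5], 'e': [5, 3, 6], 'f': [7, 4, 3]})

# fit_transform returns a plain numpy array, so the labels are lost here
normalized = Normalizer().fit_transform(df_train)

# Rebuild a DataFrame, reusing the original column names and index
normalized_df = pd.DataFrame(normalized,
                             columns=df_train.columns,
                             index=df_train.index)

# corr is available again, and the result is labelled a..f
corr = normalized_df.corr(method='pearson')
print(corr.round(2))
```

The values are the same as in the matrix above; only the row and column labels change.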