x86 - How to compile TensorFlow with SSE4.2 and AVX instructions?
This is the message I received from running a script to check whether TensorFlow is working:
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
I noticed that it mentioned SSE4.2 and AVX:
1) What are SSE4.2 and AVX?
2) How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?
3) How can I make TensorFlow compile using these two instruction sets?
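SSE4.2 and AVX are SIMD instruction-set extensions to x86. The warnings mean the CPU supports them but the prebuilt TensorFlow binary does not use them. On Linux, you can confirm which of these extensions the CPU supports by reading the feature flags the kernel lists in /proc/cpuinfo. The following is a minimal Python sketch under stated assumptions. It assumes a Linux host, and the function names are hypothetical. Note that the kernel spells SSE4.2 as sse4_2.

```python
import os

# Feature names as the Linux kernel spells them in /proc/cpuinfo (assumption: x86 Linux).
WANTED = ("sse4_2", "avx", "avx2", "fma")


def cpu_flags(cpuinfo_text):
    """Return the set of feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def report(flags, wanted=WANTED):
    """Map each wanted feature to True/False depending on whether the CPU reports it."""
    return {name: name in flags for name in wanted}


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # only exists on Linux
    if os.path.exists(path):
        with open(path) as f:
            print(report(cpu_flags(f.read())))
    else:
        print("no /proc/cpuinfo; this sketch only handles Linux")
```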
I ran into the same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 should suffice. In the end, I successfully built with
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
without getting any warnings or errors.
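If you prefer explicit flags like these over a catch-all option, a small helper can emit --copt options only for the features the CPU actually reports. Passing an option for a missing feature produces a binary that crashes on illegal instructions. This is a hypothetical Python sketch, not part of TensorFlow or Bazel. The mapping from cpuinfo names to gcc options is an assumption and covers only the four features used in the command above.

```python
import shlex

# Hypothetical table: cpuinfo flag name -> gcc option, in the order of the command above.
FEATURE_COPTS = (
    ("avx", "-mavx"),
    ("avx2", "-mavx2"),
    ("fma", "-mfma"),
    ("sse4_2", "-msse4.2"),
)

TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def feature_copts(cpu_flags):
    """Return --copt options only for features present in cpu_flags.

    Passing e.g. -mfma on a CPU without FMA would produce a binary that
    faults with illegal instructions, so missing features are skipped.
    """
    return ["--copt=" + opt for name, opt in FEATURE_COPTS if name in cpu_flags]


def bazel_command(cpu_flags, fpmath_both=True, cuda=True):
    """Assemble the bazel build command as an argument list."""
    cmd = ["bazel", "build", "-c", "opt"] + feature_copts(cpu_flags)
    if fpmath_both:  # -mfpmath=both is accepted by gcc but not clang
        cmd.append("--copt=-mfpmath=both")
    if cuda:
        cmd.append("--config=cuda")
    return cmd + ["-k", TARGET]


if __name__ == "__main__":
    # Example flag set; in practice it would come from /proc/cpuinfo.
    print(shlex.join(bazel_command({"sse4_2", "avx", "avx2", "fma"})))
```

Building the command as a list keeps it safe to pass to subprocess.run without shell quoting issues. shlex.join (Python 3.8+) is only used for display.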
Probably the best choice for your system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
-mfpmath=both only works with gcc, not clang. I'm not sure what TensorFlow's default -O2 or -O3 setting is. gcc -O3 enables full optimization, including auto-vectorization, but that can sometimes make code slower.
What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not for linking, so you need a different option for cross-file link-time optimization).
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system (see https://gcc.gnu.org/onlinedocs/gcc/x86-Options.html). That's not what you want. You want a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
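To see the difference between gcc's default x86-64 target and -march=native, you can ask the preprocessor which SIMD feature macros it predefines. Examples are __SSE2__, __SSE4_2__ and __AVX__. gcc -dM -E dumps all predefined macros. This is a minimal Python sketch. It assumes gcc is on PATH, and the helper names are hypothetical.

```python
import shutil
import subprocess

# Feature-test macros gcc defines when the corresponding instruction set is enabled.
SIMD_MACROS = ("__SSE2__", "__SSE4_2__", "__AVX__", "__AVX2__", "__FMA__", "__AVX512F__")


def parse_defines(text):
    """Return the macro names from '#define NAME VALUE' lines."""
    names = set()
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] == "#define":
            names.add(parts[1])
    return names


def simd_macros(extra_flags=()):
    """Ask gcc which SIMD feature macros it predefines for the given flags."""
    out = subprocess.run(
        ["gcc", *extra_flags, "-dM", "-E", "-x", "c", "-"],
        input="", capture_output=True, text=True, check=True,
    ).stdout
    defined = parse_defines(out)
    return [m for m in SIMD_MACROS if m in defined]


if __name__ == "__main__":
    if shutil.which("gcc"):
        print("default:      ", simd_macros())
        print("-march=native:", simd_macros(["-march=native"]))
```

On a default x86-64 gcc the first list should stop at __SSE2__. The second list shows what -march=native turns on for the build machine.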
-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine.) And if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
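The redundancy point can be made mechanical. Given a table of which options imply which, drop any option that another option in the list already enables. The table below is an assumed, partial model of gcc's behavior: -mavx512f enables -mavx2, -mavx2 and -mfma each enable -mavx, and -mavx enables -msse4.2. Check it against the gcc x86 options documentation before relying on it. The function names are hypothetical.

```python
# Assumed, partial table of direct implications between gcc x86 options.
IMPLIES = {
    "-mavx512f": {"-mavx2"},
    "-mavx2": {"-mavx"},
    "-mfma": {"-mavx"},
    "-mavx": {"-msse4.2"},
}


def implied_by(opt):
    """Return every option transitively enabled by opt (excluding opt itself)."""
    seen = set()
    stack = list(IMPLIES.get(opt, ()))
    while stack:
        cur = stack.pop()
        if cur not in seen:
            seen.add(cur)
            stack.extend(IMPLIES.get(cur, ()))
    return seen


def minimal_options(opts):
    """Drop options already enabled by another option in the list, keeping order."""
    covered = set()
    for opt in opts:
        covered |= implied_by(opt)
    return [opt for opt in opts if opt not in covered]


if __name__ == "__main__":
    # The explicit flags from the first answer reduce to -mavx2 and -mfma.
    print(minimal_options(["-mavx", "-mavx2", "-mfma", "-msse4.2"]))
```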
TensorFlow's ./configure defaults to enabling -march=native, so using it should avoid the need to specify compiler options manually.
-march=native also enables -mtune=native, so it optimizes for your CPU in things like which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC.