x86 - How to compile TensorFlow with SSE4.2 and AVX instructions?
This is the message I received from running a script to check whether TensorFlow is working:
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
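The cpu_feature_guard warnings say SSE4.2 and AVX are available on this machine. On Linux, one quick way to confirm which of these extensions the CPU reports is to search the flags in /proc/cpuinfo. This is a sketch that assumes a Linux host; note that the kernel spells SSE4.2 as sse4_2, unlike the compiler's -msse4.2, and the list of extensions searched for is just the set discussed in this thread.

```shell
# Print each SIMD extension of interest that the kernel reports for this CPU (Linux only).
# -w keeps e.g. "fma" from matching inside "fma4"; sort -u collapses the per-core repeats.
grep -o -w -E 'sse4_2|avx|avx2|fma|avx512f' /proc/cpuinfo | sort -u
```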
I noticed that it mentions SSE4.2 and AVX.
1) What are SSE4.2 and AVX?
2) How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?
3) How do I make TensorFlow compile using these two libraries?
I just ran into the same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 would suffice. In the end, I successfully built with
bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
without getting any warnings or errors.
Probably the best choice for any system is:
bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package
-mfpmath=both only works with gcc, not clang. I'm not sure whether the default optimization level is -O2 or -O3. gcc -O3 enables auto-vectorization, but that won't always help, and it can sometimes make code slower.
What this does: --copt for bazel build passes an option directly to gcc when compiling C and C++ files (but not when linking, so you need a different option for cross-file link-time optimization).
x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so that you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-options.html.) That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine.) And if you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
TensorFlow's ./configure defaults to enabling -march=native, so using it should avoid the need to specify compiler options manually.
-march=native also enables -mtune=native, so it optimizes for your CPU in details such as which sequence of AVX instructions is best for unaligned loads.
This all applies to gcc, clang, or ICC.