x86 - How to compile TensorFlow with SSE4.2 and AVX instructions?


This is the message I received when running a script to check whether TensorFlow is working:

I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcublas.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcudnn.so.5 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcufft.so.8.0 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:125] successfully opened CUDA library libcurand.so.8.0 locally
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
W tensorflow/core/platform/cpu_feature_guard.cc:95] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:910] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero

I noticed that it mentions SSE4.2 and AVX.

1) What are SSE4.2 and AVX?

2) How do SSE4.2 and AVX improve CPU computations for TensorFlow tasks?

3) How do I make TensorFlow compile using the two libraries?

I ran into the same problem. It seems that Yaroslav Bulatov's suggestion doesn't cover SSE4.2 support; adding --copt=-msse4.2 would suffice. In the end, I built with

bazel build -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 --config=cuda -k //tensorflow/tools/pip_package:build_pip_package 

without getting any warnings or errors.

Probably the best choice for any system is:

bazel build -c opt --copt=-march=native --copt=-mfpmath=both --config=cuda -k //tensorflow/tools/pip_package:build_pip_package 

-mfpmath=both only works with gcc, not clang. I'm not sure what the default -O2 or -O3 setting is. gcc -O3 enables auto-vectorization, but that sometimes won't help, and can make code slower.


What this does: --copt for bazel build passes an option directly to gcc for compiling C and C++ files (but not for linking, so you need a different option for cross-file link-time optimization).

x86-64 gcc defaults to using only SSE2 or older SIMD instructions, so you can run the binaries on any x86-64 system. (See https://gcc.gnu.org/onlinedocs/gcc/x86-options.html). That's not what you want. You want to make a binary that takes advantage of all the instructions your CPU can run, because you're only running this binary on the system where you built it.
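
To see this difference on your own machine, gcc can report which target options a given -march setting turns on, using gcc -Q --help=target. Below is a minimal Python sketch of that check. It assumes gcc is on your PATH and that the report uses the usual "-mflag [enabled]" line layout; the helper names parse_target_report and enabled_with are made up for illustration.

```python
import re
import subprocess

# Matches report lines such as "  -mavx2    [enabled]".
# The exact layout is an assumption based on typical gcc -Q output.
_LINE = re.compile(r"^\s*(-m[\w.\-]+)\s+\[(enabled|disabled)\]\s*$")

def parse_target_report(text):
    """Return {option: True/False} for every on/off -m option in the report."""
    result = {}
    for line in text.splitlines():
        m = _LINE.match(line)
        if m:
            result[m.group(1)] = (m.group(2) == "enabled")
    return result

def enabled_with(march=None, compiler="gcc"):
    """Run the compiler and return the set of -m options it reports as enabled."""
    cmd = [compiler, "-Q", "--help=target"]
    if march:
        cmd.append("-march=" + march)
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return {opt for opt, on in parse_target_report(out).items() if on}
```

Computing enabled_with("native") - enabled_with() should then list the options that only the native build turns on, which on a recent CPU would typically include the SSE4.2 and AVX flags from the warnings.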

-march=native enables all the options your CPU supports, so it makes -mavx512f -mavx2 -mavx -mfma -msse4.2 redundant. (Also, -mavx2 already enables -mavx and -msse4.2, so Yaroslav's command should have been fine). If you're using a CPU that doesn't support one of these options (like FMA), using -mfma would make a binary that faults with illegal instructions.
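
If you do want to pick flags by hand, it can help to check what the CPU reports first, so you never pass an option it lacks. The following is a rough Python sketch, assuming a Linux system where /proc/cpuinfo contains a "flags" line; the mapping table and function names are illustrative, not part of TensorFlow or bazel.

```python
# Linux /proc/cpuinfo feature names -> the matching gcc options discussed above.
FLAG_TO_COPT = {
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
    "avx512f": "-mavx512f",
}

def cpu_flags(cpuinfo_text):
    """Return the set of feature names from the first 'flags' line."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()

def safe_copts(cpuinfo_text):
    """Return --copt arguments only for features the CPU actually reports."""
    flags = cpu_flags(cpuinfo_text)
    return ["--copt=" + opt for name, opt in FLAG_TO_COPT.items() if name in flags]

def read_cpuinfo(path="/proc/cpuinfo"):
    """Read the kernel's CPU description (Linux only)."""
    with open(path) as f:
        return f.read()
```

For example, print(" ".join(safe_copts(read_cpuinfo()))) prints only the --copt options that are safe on the current machine. In most cases -march=native gets the same result with less effort.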

TensorFlow's ./configure defaults to enabling -march=native, so using that should avoid needing to specify compiler options manually.

-march=native enables -mtune=native, so it optimizes for your CPU for things like which sequence of AVX instructions is best for unaligned loads.

This all applies to gcc, clang, or ICC.

