path: root/tests/cppbasic-getpointer/compute.cpp
author Miao Wang <miaowang@google.com> 2016-09-08 11:53:32 -0700
committer Miccia <bono.michele94@gmail.com> 2017-02-07 10:57:26 +0100
commit 1372cad4b4d406b81ef939d5d84855aafad43396
tree 5602b83933d89a2d925a4e017f5c7bceded04682 /tests/cppbasic-getpointer/compute.cpp
parent 5871479caf83c0593d2c9341f8d599766f45c2ca
Implement multi-thread CPU GEMM for BLAS Intrinsics
- Multi-thread GEMM uses the existing RS thread pool on top of Eigen.
- A large matrix-matrix multiplication is decomposed into multiple tiled
  matrix-matrix multiplications. Each thread iterates over the unfinished
  tiles.
- The tiling applies to ONLY ONE dimension of each input matrix; whether X
  or Y is tiled depends on whether the matrix is transposed.
- For sufficiently large matrices, the performance increase is proportional
  to the number of available CPU cores.

Test: CTS test (rsblas) passes on Angler, Fugu and new devices.
Performance test with RsBlasBenchmark and RsNeuralNet demo on Angler, Ryu,
Seed, Shamu, Volantis, Fugu and new devices, showing a perf gain from
roughly 70% (Volantis, 2 cores) to 400+% (Angler, 8 cores).

Change-Id: If96f4119fd34d5d9d98a2542801495e7ffe577ae
(cherry picked from commit 41ab8faaf0d90238d42d8e2bbb7177467c10b4f6)
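
The decomposition described in the commit message can be illustrated with a minimal sketch. This is not the code from this commit: it assumes row-major, non-transposed inputs, tiles only the rows of A, uses plain `std::thread` workers in place of the RS thread pool, and uses a naive triple loop in place of Eigen. The names `gemmTile`, `tiledGemm`, `tileRows` and `numThreads` are hypothetical.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper: computes rows [rowBegin, rowEnd) of C = A * B.
// A is m x k, B is k x n, C is m x n, all row-major and not transposed.
void gemmTile(const std::vector<float>& a, const std::vector<float>& b,
              std::vector<float>& c, std::size_t rowBegin, std::size_t rowEnd,
              std::size_t k, std::size_t n) {
    for (std::size_t i = rowBegin; i < rowEnd; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (std::size_t p = 0; p < k; ++p) {
                sum += a[i * k + p] * b[p * n + j];
            }
            c[i * n + j] = sum;
        }
    }
}

// Splits A along one dimension (its rows) into tiles of tileRows rows.
// Each worker repeatedly claims the next unfinished tile from a shared
// atomic counter until no tiles remain.
std::vector<float> tiledGemm(const std::vector<float>& a,
                             const std::vector<float>& b,
                             std::size_t m, std::size_t k, std::size_t n,
                             std::size_t tileRows, unsigned numThreads) {
    assert(a.size() == m * k && b.size() == k * n);
    assert(tileRows > 0 && numThreads > 0);

    std::vector<float> c(m * n, 0.0f);
    const std::size_t numTiles = (m + tileRows - 1) / tileRows;
    std::atomic<std::size_t> nextTile{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t tile = nextTile.fetch_add(1);
            if (tile >= numTiles) break;
            const std::size_t begin = tile * tileRows;
            const std::size_t end = std::min(m, begin + tileRows);
            // Tiles cover disjoint rows of C, so no locking is needed here.
            gemmTile(a, b, c, begin, end, k, n);
        }
    };

    std::vector<std::thread> pool;
    for (unsigned t = 0; t < numThreads; ++t) pool.emplace_back(worker);
    for (auto& th : pool) th.join();
    return c;
}
```

Because each tile writes a disjoint block of rows in C, the only shared mutable state in this sketch is the tile counter. Handing out tiles dynamically, rather than assigning a fixed range per thread, lets faster cores pick up more tiles.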
Diffstat (limited to 'tests/cppbasic-getpointer/compute.cpp')
0 files changed, 0 insertions, 0 deletions