frameworks_rs.git - frameworks


author	Miao Wang <miaowang@google.com>	2016-09-08 11:53:32 -0700
committer	Miccia <bono.michele94@gmail.com>	2017-02-07 10:57:26 +0100
commit	1372cad4b4d406b81ef939d5d84855aafad43396 (patch)
tree	5602b83933d89a2d925a4e017f5c7bceded04682 /tests/cppbasic-getpointer/compute.cpp
parent	5871479caf83c0593d2c9341f8d599766f45c2ca (diff)

Implement multi-thread CPU GEMM for BLAS Intrinsics

- Multi-thread GEMM uses the existing RS thread pool on top of Eigen.
- A large matrix-matrix multiplication is decomposed into multiple tiled
  matrix-matrix multiplications. Each thread iterates over the unfinished
  work.
- The tiling applies to ONLY ONE dimension of each input matrix; whether
  X or Y is tiled depends on whether the matrix is transposed.
- For sufficiently large matrices, the performance increase is
  proportional to the number of available CPU cores.

Test: CTS test (rsblas) passes on Angler, Fugu and new devices.
      Performance test with RsBlasBenchmark and the RsNeuralNet demo on
      Angler, Ryu, Seed, Shamu, Volantis, Fugu and new devices, showing
      a perf gain of roughly 70% (Volantis, 2 cores) to 400+% (Angler,
      8 cores).

Change-Id: If96f4119fd34d5d9d98a2542801495e7ffe577ae
(cherry picked from commit 41ab8faaf0d90238d42d8e2bbb7177467c10b4f6)
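
The tiling scheme described in the commit message can be illustrated with a minimal sketch. It is not the code from this commit: it assumes plain `std::thread` workers in place of the RS thread pool, a naive triple loop in place of Eigen, row-major non-transposed inputs, and no alpha/beta scaling. The function name `tiledGemm` and its parameters are hypothetical. Only one dimension (the rows of A) is tiled, and each thread keeps claiming unfinished tiles from a shared counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper, NOT the RenderScript implementation.
// Computes C = A * B for row-major matrices:
//   A is m x k, B is k x n, C is m x n.
// Only the rows of A (and therefore of C) are tiled; every tile reads all of B.
void tiledGemm(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& c, std::size_t m, std::size_t k,
               std::size_t n, std::size_t tileRows, unsigned numThreads) {
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);
    assert(tileRows > 0 && numThreads > 0);

    const std::size_t numTiles = (m + tileRows - 1) / tileRows;
    std::atomic<std::size_t> nextTile{0};  // index of the next unclaimed tile

    auto worker = [&]() {
        // Each thread claims unfinished tiles until none are left.
        for (;;) {
            const std::size_t tile = nextTile.fetch_add(1);
            if (tile >= numTiles) break;
            const std::size_t rowBegin = tile * tileRows;
            const std::size_t rowEnd = std::min(rowBegin + tileRows, m);
            // Naive kernel for this tile; tiles write disjoint rows of C,
            // so no locking is needed on the output.
            for (std::size_t i = rowBegin; i < rowEnd; ++i) {
                for (std::size_t j = 0; j < n; ++j) {
                    float sum = 0.0f;
                    for (std::size_t p = 0; p < k; ++p) {
                        sum += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = sum;
                }
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker);
    for (auto& t : threads) t.join();
}
```

Pulling tiles from a shared counter, rather than assigning a fixed range to each thread, lets faster cores pick up more tiles. That is one plausible reading of "each thread iterates on the unfinished works"; the actual scheduling in the commit may differ.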

Diffstat (limited to 'tests/cppbasic-getpointer/compute.cpp')

0 files changed, 0 insertions, 0 deletions

