| author | Miao Wang <miaowang@google.com> | 2016-09-08 11:53:32 -0700 |
|---|---|---|
| committer | Miccia <bono.michele94@gmail.com> | 2017-02-07 10:57:26 +0100 |
| commit | 1372cad4b4d406b81ef939d5d84855aafad43396 (patch) | |
| tree | 5602b83933d89a2d925a4e017f5c7bceded04682 /tests/cppbasic-getpointer/compute.cpp | |
| parent | 5871479caf83c0593d2c9341f8d599766f45c2ca (diff) | |
Implement multi-thread CPU GEMM for BLAS Intrinsics
- Multi-threaded GEMM uses the existing RS thread pool on top of
Eigen.
- A large matrix-matrix multiplication is decomposed into multiple
tiled matrix-matrix multiplications. Each thread iterates over
the unfinished work.
- The tiling applies to ONLY ONE dimension of each input matrix;
whether X or Y is tiled depends on whether the matrix is transposed.
- For sufficiently large matrices, the performance increase is
proportional to the number of available CPU cores.
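
The decomposition described above can be illustrated with a minimal, self-contained sketch. It is not the code from this change: it uses plain `std::thread` workers and a naive inner loop in place of the RS thread pool and Eigen, assumes row-major, non-transposed inputs, and the names `tiled_gemm`, `tile_m` and `tile_n` are hypothetical. A is tiled along its rows only and B along its columns only, and each worker claims the next unfinished tile through a shared atomic counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical illustration, NOT the RenderScript implementation.
// Computes C = A * B for row-major A (m x k), B (k x n), C (m x n).
// A is tiled along its rows only and B along its columns only, so each
// tile of C uses the full K extent and no two tiles share output elements.
void tiled_gemm(const std::vector<float>& a, const std::vector<float>& b,
                std::vector<float>& c, std::size_t m, std::size_t n,
                std::size_t k, std::size_t tile_m, std::size_t tile_n,
                unsigned num_threads) {
    assert(tile_m > 0 && tile_n > 0 && num_threads > 0);
    assert(a.size() == m * k && b.size() == k * n && c.size() == m * n);

    const std::size_t tiles_m = (m + tile_m - 1) / tile_m;  // tiles along rows of A
    const std::size_t tiles_n = (n + tile_n - 1) / tile_n;  // tiles along columns of B
    const std::size_t total_tiles = tiles_m * tiles_n;

    // Shared counter: each worker repeatedly claims the next unfinished tile.
    std::atomic<std::size_t> next_tile{0};

    auto worker = [&]() {
        for (;;) {
            const std::size_t t = next_tile.fetch_add(1);
            if (t >= total_tiles) return;  // no unfinished work left
            const std::size_t row0 = (t / tiles_n) * tile_m;
            const std::size_t col0 = (t % tiles_n) * tile_n;
            const std::size_t row1 = std::min(row0 + tile_m, m);
            const std::size_t col1 = std::min(col0 + tile_n, n);
            // Naive small GEMM on one tile (stand-in for an optimized kernel).
            for (std::size_t i = row0; i < row1; ++i) {
                for (std::size_t j = col0; j < col1; ++j) {
                    float sum = 0.0f;
                    for (std::size_t p = 0; p < k; ++p) {
                        sum += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = sum;
                }
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Because tiles are handed out dynamically rather than assigned up front, a thread that finishes early simply picks up another tile, which is one way to read "each thread iterates over the unfinished work". For small matrices the thread start-up cost dominates, in line with the "sufficiently large matrices" caveat.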
Test: CTS test (rsblas) passes on Angler, Fugu and new devices.
Performance tests with RsBlasBenchmark and the RsNeuralNet demo
on Angler, Ryu, Seed, Shamu, Volantis, Fugu and new devices
show a perf gain from roughly 70% (Volantis, 2 cores) to 400+% (Angler, 8 cores).
Change-Id: If96f4119fd34d5d9d98a2542801495e7ffe577ae
(cherry picked from commit 41ab8faaf0d90238d42d8e2bbb7177467c10b4f6)
Diffstat (limited to 'tests/cppbasic-getpointer/compute.cpp')
0 files changed, 0 insertions, 0 deletions
