path: root/cpu_ref/rsCpuIntrinsicBLAS.cpp
Commit message | Author | Age | Files | Lines
* Implement multi-thread CPU GEMM for BLAS Intrinsics | Miao Wang | 2017-02-07 | 1 | -13/+182
  - Multi-thread GEMM utilizes the existing RS thread pool on top of Eigen.
  - Large matrix-matrix multiplication is decomposed into multiple tiled matrix-matrix multiplications. Each thread iterates over the unfinished work.
  - The tiling applies to ONLY ONE dimension of each input matrix, and whether to tile X or Y depends on whether the matrix is transposed.
  - The performance increase is proportional to the number of available CPU cores, for sufficiently large matrices.
  Test: CTS test (rsblas) passes on Angler, Fugu and new devices. Performance test with RsBlasBenchmark and RsNeuralNet demo on Angler, Ryu, Seed, Shamu, Volantis, Fugu and new devices, showing roughly 70% (Volantis, 2 cores) to 400+% (Angler, 8 cores) perf gain.
  Change-Id: If96f4119fd34d5d9d98a2542801495e7ffe577ae
  (cherry picked from commit 41ab8faaf0d90238d42d8e2bbb7177467c10b4f6)
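The decomposition described in this commit message can be sketched as follows. This is a minimal illustration, not the code from the commit: it assumes row-major storage and untransposed inputs, uses plain `std::thread` workers and a naive inner loop in place of the RS thread pool and Eigen, and the names `tiledGemm`, `tileRows` and `numThreads` are hypothetical. Only one dimension (the rows of A, and hence of C) is tiled, and each worker keeps claiming unfinished tiles from a shared atomic counter.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical sketch: row-major C (m x n) = A (m x k) * B (k x n).
// Only the row dimension of A (and C) is tiled; B is shared by every tile.
void tiledGemm(const std::vector<float>& a, const std::vector<float>& b,
               std::vector<float>& c, std::size_t m, std::size_t n,
               std::size_t k, std::size_t tileRows, unsigned numThreads) {
    const std::size_t rowsPerTile = std::max<std::size_t>(1, tileRows);
    const unsigned workers = std::max(1u, numThreads);
    const std::size_t numTiles = (m + rowsPerTile - 1) / rowsPerTile;
    std::atomic<std::size_t> nextTile{0};  // index of the next unclaimed tile

    auto worker = [&]() {
        // Each thread keeps claiming unfinished tiles until none remain.
        for (;;) {
            const std::size_t tile = nextTile.fetch_add(1);
            if (tile >= numTiles) break;
            const std::size_t rowBegin = tile * rowsPerTile;
            const std::size_t rowEnd = std::min(m, rowBegin + rowsPerTile);
            // Naive per-tile multiplication; tiles write disjoint rows of C,
            // so no locking is needed on the output.
            for (std::size_t i = rowBegin; i < rowEnd; ++i) {
                for (std::size_t j = 0; j < n; ++j) {
                    float sum = 0.0f;
                    for (std::size_t p = 0; p < k; ++p) {
                        sum += a[i * k + p] * b[p * n + j];
                    }
                    c[i * n + j] = sum;
                }
            }
        }
    };

    std::vector<std::thread> threads;
    for (unsigned t = 0; t < workers; ++t) threads.emplace_back(worker);
    for (auto& th : threads) th.join();
}
```

Claiming tiles from a shared counter, rather than assigning a fixed range to each thread, lets faster cores pick up more tiles, which is one plausible reading of "each thread iterates over the unfinished work".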
* Switch "transpose" for Matrix A & B, after gemmlowp change. | Miao Wang | 2016-02-03 | 1 | -2/+2
  Change-Id: I26fcaebcca828388ef6fe53c6e9e4db8e60dd4d9
* Update IntrinsicBLAS call to gemmlowp after rebase. | Miao Wang | 2015-09-14 | 1 | -5/+8
  Change-Id: Id084ac7b53ea0b3c61311b4f4c78312f397b7c5f
* Update eight_bit_int_gemm call after gemmlowp rebase and provide non-optimal path for armv7 without NEON. | Miao Wang | 2015-07-16 | 1 | -1/+44
  - gemmlowp will handle the optimal path for x86, NEON and aarch64
  Change-Id: I67ce4c1e5b3195017a3d46895a8ce096682bc172
* Making libRSSupport able to optionally bundle libblas(V8) through dlopen and dlsym. | Miao Wang | 2015-07-15 | 1 | -1/+15
  Change-Id: I3ade3ad2802f3b8e5fc5661319b98a6212e6d8a2
* Update the BNNM cpu reference implementation with NEON friendly gemmlowp. | Miao Wang | 2015-07-15 | 1 | -31/+5
  Change-Id: I5bcfd0fa988d8075e70272f277d7d7fab93d5fea
* update the offset type for BLAS.BNNM | Miao Wang | 2015-06-30 | 1 | -8/+8
  bug: 22184114
  Change-Id: I6ec212f8d5feb46fc9d0f97862b206978af1675b
  (cherry picked from commit 22cb808b0dfc9bd514d2e19b302a97f8455b5731)
* Use "override" instead of "virtual" when replacing methods. | Stephen Hines | 2015-05-22 | 1 | -10/+10
  Bug: 20306487
  Change-Id: Ic83cb04cac153a7556f5d516e8f5ec88b5527b6f
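For readers unfamiliar with the distinction this commit relies on, here is a small sketch. The class names `Intrinsic` and `BlasIntrinsic` are hypothetical and are not taken from the RenderScript sources. Marking a replacing method `override` instead of repeating `virtual` makes the compiler reject the declaration if it does not actually match a virtual method in the base class.

```cpp
#include <string>

// Hypothetical base class with one replaceable method.
class Intrinsic {
public:
    virtual ~Intrinsic() = default;
    virtual std::string name() const { return "base"; }
};

// Hypothetical derived class.
class BlasIntrinsic : public Intrinsic {
public:
    // `override` asks the compiler to verify that this replaces a base-class
    // virtual method. Dropping `const` here, for example, would become a
    // compile error instead of silently declaring an unrelated method.
    std::string name() const override { return "blas"; }
};
```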
* remove dead code (ALOGE) in rsCpuIntrinsicBLAS.cpp | Miao Wang | 2015-05-11 | 1 | -1/+0
  bug: 21028875
  Change-Id: Ia2d85a265f6e4a2617373f99b5c7bdc3810a7f24
* fix the CHER, CHPR, ZHER, ZHPR crash due to incorrect param order. | Miao Wang | 2015-05-07 | 1 | -4/+4
  Change-Id: If91cbf969c75e01afc6d93b204bc8167180c9ef9
* fix RsBlas_xgemv and RsBlas_xgbmv crash. (typo) | Miao Wang | 2015-05-07 | 1 | -8/+8
  Change-Id: Ia948afa2bc4af22f99323618738d5eb7d415ca97
* Rename BGEMM to BNNM. Modify layout of eight-bit GEMM-like intrinsic storage. | Tim Murray | 2015-04-15 | 1 | -14/+17
  Change-Id: If4b1267dfd42d6dd65bedf20c0b674479eefab35
* Add eight-bit GEMM-like intrinsic. | Tim Murray | 2015-04-03 | 1 | -0/+61
  Change-Id: I9b920900b4cb8b27e2ab27386d05f4175142d6b2
* Add BLAS to supported intrinsics. | Tim Murray | 2015-02-17 | 1 | -0/+653
  Change-Id: I8e776b2ffdbac09a73924035eee2eca0a12facb3