| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
| |
This reverts commit e886d68b9c40c941d8966b9c90d0e265c75fb19e.
Reason for revert: simulator implementation is not ready yet.
Test: lunch aosp_cf_x86_phone-userdebug && m
Test: art/test.py --run-test --optimizing --host
Change-Id: I03c8c09ea348205b0238d7a26caef3477cd6ae3b
|
| |
|
|
|
|
|
|
| |
The debug_suspend_count TLS value has been dead for a while and was
accidentally left in. Remove it entirely.
Test: ./test.py --host
Change-Id: Ie2ead0d30e5ff3885cdd83242cad2c826c7fb732
|
| |
|
| |
This reverts commit 3060bb919cd2f37c6a97e87c1581ac5294af72b3.
Reason for revert: relanding original change. The fix is setting
`device_supported: false` for libart(d)-simulator module in the .bp
file (`m checkbuild` attempted to build it for arm32 and failed).
Original commit message:
VIXL simulator for ART (Stage1)
Quick User Guide: test/README.simulator.md
This CL enables running ART run-tests in a simulator on host machine.
Some benefits of using this simulator approach:
- No need to use a target device at all.
Save developers from solving the device troubles: build, flash, usb,
adb, etc.
- Speed up development/debug/test cycle.
- Allows easy debugging/testing new instruction features without real
hardware.
- Allows using a smaller AOSP Android manifest master-art.
The Stage1 CL provides support for running 30% of current run-tests.
The remaining unsupported test cases are kept in knownfailures.json.
Future work will be supporting proper stack frame layout between
simulator and quick entrypoints, so that stack walk,
QuickArgumentVisitor, deoptimization, etc can be supported.
This CL adds libart(d)-simulator-container library to the ART APEX. It
has caused the following increase of the APEX size (small, about 0.13%
for release APEX, measured for target aosp_arm64-userdebug):
Before:
88992 com.android.art.debug.apex
51612 com.android.art.release.apex
112352 com.android.art.testing.apex
After:
89124 com.android.art.debug.apex
51680 com.android.art.release.apex
112468 com.android.art.testing.apex
Change-Id: I461c80aa9c4ce0673eef1c0254d2c539f2b6a8d5
Test: art/test.py --run-test --optimizing --simulate-arm64
Test: art/test.py --run-test --optimizing --host
Test: m test-art-host-gtest
|
| |
|
|
|
|
|
|
|
| |
This reverts commit 48ca6a681efe1fa1cf82d8af918bf9bbfd35ae96.
Reason for revert: broken build 6685551 on aosp-master on full-eng
Bug: 161440641
Change-Id: I849fe53f56c4786f0f2a1605cbfd215559f11072
|
| |
|
| |
Quick User Guide: test/README.simulator.md
This CL enables running ART run-tests in a simulator on host machine.
Some benefits of using this simulator approach:
- No need to use a target device at all.
Save developers from solving the device troubles: build, flash, usb,
adb, etc.
- Speed up development/debug/test cycle.
- Allows easy debugging/testing new instruction features without real
hardware.
- Allows using a smaller AOSP Android manifest master-art.
The Stage1 CL provides support for running 30% of current run-tests.
The remaining unsupported test cases are kept in knownfailures.json.
Future work will be supporting proper stack frame layout between
simulator and quick entrypoints, so that stack walk,
QuickArgumentVisitor, deoptimization, etc can be supported.
This CL adds libart(d)-simulator-container library to the ART APEX. It
has caused the following increase of the APEX size (small, about 0.13% for
release APEX, measured for target aosp_arm64-userdebug):
Before:
88992 com.android.art.debug.apex
51612 com.android.art.release.apex
112352 com.android.art.testing.apex
After:
89124 com.android.art.debug.apex
51680 com.android.art.release.apex
112468 com.android.art.testing.apex
Test: art/test.py --run-test --optimizing --simulate-arm64
Test: art/test.py --run-test --optimizing --host
Test: m test-art-host-gtest
Change-Id: I078812dde9aaf7128d9f262b2102251927596b7f
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
Test: Add and enable tests in 178-app-image-native-method
Test: Add and enable tests in jni_compiler_test
Test: Manually step through the new stub in GDB and check
that backtrace works at various points.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: aosp_taimen-userdebug boots.
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Bug: 112189621
Change-Id: If094e5062acbb99eefa88f2afb4815f93730cb82
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The old 'internal' JDWP implementation hasn't been used for a few
releases and it's a lot of code that's barely being tested and is at
risk of bit-rot. To simplify the runtime and remove potentially buggy
code this removes it.
We also needed to rewrite the DdmThreadNotification code since it
relied on the suspension functionality from the old debugger and was
generally unsafe.
Test: ./test.py --host
Test: atest --test-mapping cts/tests/jdwp/TEST_MAPPING
Test: atest --test-mapping cts/hostsidetests/jdwptunnel/TEST_MAPPING
Test: Manual ddms
Bug: 119034743
Change-Id: I775f310a009141296b730e4a6c2503506a329481
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
- Add a new hotness count in the ProfilingInfo to not conflict with
interpreter hotness which may use it for OSR.
- Add a baseline flag in the OatQuickMethodHeader to identify baseline
compiled methods.
- Add a -Xusetieredjit flag to experiment and test.
Bug: 119800099
Test: test.py with Xusetieredjit to true
Change-Id: I8512853f869f1312e3edc60bf64413dee9143c52
|
| |
|
|
|
|
|
|
|
| |
This is a step toward removing profiling from the interpreter, in order
to speed up the interpreter.
Bug: 119800099
Test: test.py --baseline
Change-Id: Ica1fa41a889b31262d9f5691b30a31fbcec01b34
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Several of the new tests make use of the invoke-custom opcode. This
opcode is not supported by dexter/slicer causing the tests to fail.
This reverts commit c34eab45161c51bf63e548e44645cbcc59d01268.
Reason for revert: Added tests to redefine-stress known failures
Test: ./test.py --host --redefine-stress
Bug: 134162467
Change-Id: Ic1b375a0cb1e44d0252c17115af92c269fb8efc5
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit ea2a3d949354c8b054983ba629c81bc5ff7163da.
Bug: 134162467
Reason for revert: Fails redefine stress
Change-Id: If487c0bcacaf3a3f565ff475b6dad8321e3428b9
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 712fa800b2b78e527d36c88dc369bf4b723587ea.
We incorrectly didn't check if a method was obsolete before giving its
class's MethodIds array. We then incorrectly used this array and the
(placeholder) -1 index to try to find the previous method-id. Since -1
is not a valid array index we got check failures. To fix this we
simply added a check that the method is not obsolete and if it is we
go to the slow-path.
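The guard described above can be sketched as follows. This is a minimal illustration, not ART's code: `MethodSketch`, `LookupResult` and `FindMethodId` are hypothetical names, and the slow path is reduced to returning a caller-supplied id.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified model; none of these names are ART's real types.
struct MethodSketch {
  bool obsolete;
  std::ptrdiff_t index;  // -1 is the placeholder stored for obsolete methods
};

enum class Path { kFast, kSlow };

struct LookupResult {
  Path path;
  int id;
};

// Reads the per-class id array only when the method is not obsolete;
// otherwise defers to a (here trivial) slow path instead of indexing with -1.
LookupResult FindMethodId(const MethodSketch& method,
                          const std::vector<int>& class_method_ids,
                          int slow_path_id) {
  if (method.obsolete) {
    return {Path::kSlow, slow_path_id};
  }
  assert(method.index >= 0 &&
         static_cast<std::size_t>(method.index) < class_method_ids.size());
  return {Path::kFast,
          class_method_ids[static_cast<std::size_t>(method.index)]};
}
```

Checking the obsolete flag first means the placeholder index is never used as an array subscript.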
Reason for revert: Fixed issue causing out-of-bounds array access
Test: ./test.py --host --debuggable --ndebuggable
Bug: 134162467
Change-Id: Iaffefeab6e889b4fb6554a11452d0af051001cb7
|
| |
|
|
|
|
|
|
|
|
| |
This reverts commit c84fc3a742b160ce51cbf01c2e5f971ccc0a2c6c.
Bug: 134162467
Reason for revert: Test fails on debuggable.
Change-Id: I240d58fafcc7434749947330b64c67d65b9b7a1e
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
During structural class redefinition we sometimes need to update some
of the ArtMethod/ArtField pointers held by runtime frames. This adds
support for doing this through a StackReflectiveHandleScope similar to
the StackHandleScope used for holding object references. This also
updates various places where reflective-handles to ArtMethods and
ArtFields are needed, for example the JniIdManager, field Read/Write
operations and events, field resolution, and the old debugger.
Test: ./test.py --host
Bug: 134162467
Change-Id: I4ea73e85956a07735c6d7b125c5828a4233670bc
|
| |
|
| |
Recognize appending with StringBuilder and replace the
entire expression with a runtime call that performs the
append in a more efficient manner.
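As a rough analogy only, and not the actual entrypoint: one way a single call can beat a chain of builder appends is to compute the total length up front and allocate the result once. The sketch below assumes that reading; `AppendAll` is a hypothetical name and uses std::string rather than managed strings.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: concatenates all parts with a single allocation
// instead of growing an intermediate buffer on every append.
std::string AppendAll(const std::vector<std::string>& parts) {
  std::size_t total = 0;
  for (const std::string& part : parts) {
    total += part.size();
  }
  std::string result;
  result.reserve(total);  // one allocation sized for the final string
  for (const std::string& part : parts) {
    result.append(part);
  }
  assert(result.size() == total);
  return result;
}
```

For example, `AppendAll({"a", "bc", "def"})` yields `"abcdef"`.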
For now, require the entire pattern to be in a single block
and be very strict about the StringBuilder environment uses.
Also, do not accept StringBuilder/char[]/Object/float/double
arguments as they throw non-OOME exceptions and/or require a
call from the entrypoint back to a helper function in Java;
these shall be implemented later.
Boot image size for aosp_taimen-userdebug:
- before:
arm/boot*.oat: 19653872
arm64/boot*.oat: 23292784
oat/arm64/services.odex: 22408664
- after:
arm/boot*.oat: 19432184 (-216KiB)
arm64/boot*.oat: 22992488 (-293KiB)
oat/arm64/services.odex: 22376776 (-31KiB)
Note that const-string in compiled boot image methods cannot
throw, but for apps it can and therefore its environment can
prevent the optimization for apps. We could implement either
a simple carve-out for const-string or generic environment
pruning to allow this pattern to be applied more often.
Results for the new StringBuilderAppendBenchmark on taimen:
timeAppendLongStrings: ~700ns -> ~200ns
timeAppendStringAndInt: ~220ns -> ~140ns
timeAppendStrings: ~200ns -> 130ns
Bug: 19575890
Test: 697-checker-string-append
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Test: aosp_taimen-userdebug boots.
Test: run-gtests.sh
Test: testrunner.py --target --optimizing
Test: vogar --benchmark art/benchmark/stringbuilder-append/src/StringBuilderAppendBenchmark.java
Change-Id: I51789bf299f5219f68ada4c077b6a1d3fe083964
|
| |
|
|
|
|
|
|
| |
We are not using jmpbuf and co.
Bug: 119869270
Test: m
Change-Id: I85993e2ce506b059801d8d8da8b440e93ee9e3fd
|
| |
|
|
|
|
|
|
|
|
| |
They are currently unused and I don't have plans to use them.
The alternate table made it possible to enable extra mterp checks.
However, it is possible to move the debug checks to the main path.
Test: test.py -b -r --interpreter -t 001-HelloWorld
Change-Id: I45a39ec73abaefaecf5b8c636f3f9d519a0a8bb0
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Change the position of the InterpreterCache field in Thread to be
directly below the tlsPtr_ field. Since members of both the tlsPtr_
and InterpreterCache fields are used by assembly code, we need their
offsets in asm_support.h. The fields at the end of the Thread struct have been
undergoing changes. By moving this field up we avoid the need to
update asm_support.h whenever one of the fields is modified.
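The effect on offsets can be illustrated with a small sketch. The structs below are hypothetical and far smaller than the real art::Thread; they only show that a field placed directly after the TLS pointer block keeps its offset when trailing fields are added.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical, heavily simplified stand-ins for the real fields.
struct TlsPtrSketch { void* a; void* b; };
struct InterpreterCacheSketch { void* entries[4]; };

struct ThreadSketch {
  std::uint32_t flags;
  TlsPtrSketch tls_ptr;
  InterpreterCacheSketch interpreter_cache;  // directly after tls_ptr
  std::uint32_t tail_field;                  // frequently changing tail
};

// Same layout with extra fields appended at the end.
struct ThreadSketchWithNewTail {
  std::uint32_t flags;
  TlsPtrSketch tls_ptr;
  InterpreterCacheSketch interpreter_cache;
  std::uint32_t tail_field;
  std::uint64_t new_tail_field_1;
  std::uint64_t new_tail_field_2;
};

// The kind of constant an assembly-support header would have to mirror.
constexpr std::size_t kInterpreterCacheOffset =
    offsetof(ThreadSketch, interpreter_cache);

static_assert(kInterpreterCacheOffset ==
                  offsetof(ThreadSketch, tls_ptr) + sizeof(TlsPtrSketch),
              "cache sits directly after the TLS pointer block");
static_assert(kInterpreterCacheOffset ==
                  offsetof(ThreadSketchWithNewTail, interpreter_cache),
              "appending tail fields does not move the cache");
```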
Test: ./test.py --host
Change-Id: Ic2863116ed446af155badfc3bf098add7ba0b699
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Rename the InitializeType and InitializeTypeAndVerifyAccess
entrypoints to Resolve* to better match their semantics.
Keep the InitializeStaticStorage name for now as the most
appropriate name InitializeType would clash with the old
name of the ResolveType entrypoint.
Test: m test-art-host-gtest
Test: testrunner.py --host --optimizing
Change-Id: Ide55b58c490d085ab37d8536f90699f7ed571d59
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When compiling debuggable code we would compile a new-instance String
instruction into a StringFactory.newEmptyString invoke. This
additional invoke could be observed using tracing and is inconsistent
with the interpreter, where the string is simply allocated directly.
In order to bring these two modes into alignment we added a new
AllocStringObject quick entrypoint that will be used instead of the
normal AllocObject<...> entrypoints when allocating a string. This
entrypoint directly allocates a new string in the same manner the
interpreter does.
Needs next CL for test to work.
Bug: 110884646
Test: ./test/testrunner/testrunner.py --host --runtime-option=-Xjitthreshold:0 --jit
Test: Manual inspection of compiled code.
Change-Id: I7b4b084bcf7dd9a23485c0e3cd2cd04a04b43d3d
|
| |
|
|
|
|
|
|
|
|
|
| |
Add support for the compiler to call into the runtime for
invoke-custom bytecodes.
Bug: 35337872
Test: art/test.py --host -r -t 952
Test: art/test.py --target --64 -r -t 952
Test: art/test.py --target --32 -r -t 952
Change-Id: I821432e7e5248c91b8e1d36c3112974c34171803
|
| |
|
|
|
|
|
|
|
|
| |
Implemented as a runtime call.
Bug: 66890674
Test: art/test.py --target -r -t 979
Test: art/test.py --target --64 -r -t 979
Test: art/test.py --host -r -t 979
Change-Id: I67f461c819a7d528d7455afda8b4a59e9aed381c
|
| |
|
|
|
|
|
|
|
|
| |
Implemented as a runtime call.
Bug: 66890674
Test: art/test.py --target -r -t 979
Test: art/test.py --target --64 -r -t 979
Test: art/test.py --host -r -t 979
Change-Id: I4b3d3969d455d0198cfe122eea8abd54e0ea20ee
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
MathBenchmarks.java#timePow results on taimen's little cores
fixed at frequency 1401600 with forced JIT compilation:
- before:
- X32: 356.33 (@FastNative), 315.39 (@CriticalNative)
- X64: 357.31 (@FastNative), 315.37 (@CriticalNative)
- after (LICM defeats the benchmark):
- X32: 2.88
- X64: 2.87
- after but with kAllSideEffects to prevent LICM:
- X32: 275.42
- X64: 275.67
Test: Rely on TreeHugger.
Bug: 70727450
Change-Id: Iaa31f70acabbd57c163cfeafe02eed67c1348861
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This CL implements async exception support in the switch interpreter.
It also adds support for the MTerp to detect and switch back to the
switch interpreter in cases where an async exception is detected.
Tests follow in next CL.
Test: ./test.py --host -j50
Bug: 62821960
Bug: 34415266
Change-Id: Idb53711a40c20f962de8aa6b74662676b8bd25c6
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Let clang-format reorder the header includes.
Derived with:
* .clang-format:
BasedOnStyle: Google
IncludeIsMainRegex: '(_test|-inl)?$'
* Steps:
find . -name '*.cc' -o -name '*.h' | xargs sed -i.bak -e 's/^#include/ #include/' ; git commit -a -m 'ART: Include cleanup'
git-clang-format -style=file HEAD^
manual inspection
git commit -a --amend
Test: mmma art
Change-Id: Ia963a8ce3ce5f96b5e78acd587e26908c7a70d02
|
| |
|
|
|
|
|
|
| |
Forgot to retest.
Test: test-art-host-gtest
Change-Id: I72b07c3872079452a3a01db4fbd2c4ee0060f294
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Unblock some signals (SIGABRT, SIGBUS, SIGSEGV) that could happen
inside of the ART internal fault handlers, to report crashes inside of
the signal handler. We can't use sigaction to change the handler when
this happens because it modifies global state, so add a new member
variable in Thread to track whether a call to the fault handler is
reentrant or not.
Remove the old nested signal implementation that attempted to do this.
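The reentrancy tracking can be sketched without real signals. In this hypothetical model a `thread_local` flag stands in for the new Thread member, and `HandleFault` stands in for the fault handler body; neither name comes from ART.

```cpp
#include <cassert>

// Hypothetical stand-in for the per-thread member variable.
thread_local bool g_handling_fault = false;

enum class FaultAction { kHandled, kReportCrash };

// If a fault arrives while this thread is already inside the handler, the
// handler itself crashed, so report it instead of trying to handle it again.
template <typename Fn>
FaultAction HandleFault(Fn&& try_handle) {
  if (g_handling_fault) {
    return FaultAction::kReportCrash;
  }
  g_handling_fault = true;
  const bool handled = try_handle();
  g_handling_fault = false;
  return handled ? FaultAction::kHandled : FaultAction::kReportCrash;
}
```

Because the flag is per-thread, a fault on one thread does not change how faults on other threads are treated, which is the property that changing the process-wide handler with sigaction would lose.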
Bug: http://b/35853436
Test: changed the #if 0 to #if 1, ran a dummy process that
threw a NullPointerException, inspected logcat
Change-Id: I04bb4a09433c6817933d64ec681ec433b528f2a5
|
| |
|
|
|
|
|
|
|
| |
- Update architectures that have fast paths for
array allocation to use it.
- Will add more fast paths in follow-up CLs.
Test: test-art-target test-art-host.
Change-Id: I138cccd16464a85de22a8ed31c915f876e78fb04
|
| |
|
|
|
| |
Test: test-art-host test-art-target
Change-Id: I910d1c912c7c9056ecea0e1e7da7afb2a7220dfa
|
| |
|
|
|
|
|
|
| |
Remove unused ones to facilitate the transition to compressed
dex caches.
Test: test-art-host, test-art-target
Change-Id: I1d1cb0daffa86dd9dda2eaa3c1ea3650a5c8d9d0
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Move fields `thread_local_start`, `thread_local_pos`,
`thread_local_end` and `thread_local_objects` before fields
`jni_entrypoints` and `quick_entrypoints` within
art::Thread, to avoid repetitive art::Thread field moves in
future CLs caused by the addition or deletion of entry
points.
Test: m test-art-host
Test: m test-art-target (on ARM)
Change-Id: Ib67842e44a7f21a871ca4d1bb95dc6f7cfedc829
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This reverts commit 0fb5af1c8287b1ec85c55c306a1c43820c38a337.
This takes us back to the original change and attempts to fix the
issues encountered:
- Adds transition record push/pop around artInvokePolymorphic.
- Changes X86/X64 relocations for MacSDK.
- Implements MIPS entrypoint for art_quick_invoke_polymorphic.
- Corrects size of returned reference in art_quick_invoke_polymorphic
on ARM.
Bug: 30550796,33191393
Test: art/test/run-test 953
Test: m test-art-run-test
Change-Id: Ib6b93e00b37b9d4ab743a3470ab3d77fe857cda8
|
| |
|
|
|
|
| |
This reverts commit f7aaacd97881c6924b8212c7f8fe4a4c8721ef53.
Change-Id: I6756cd1e6110bb45231f62f5e388f16c044cb145
|
| |
|
|
|
|
|
|
| |
960-default-smali64 is failing.
This reverts commit 2b615ba29c4dfcf54aaf44955f2eac60f5080b2e.
Change-Id: Iebb8ee5a917fa84c5f01660ce432798524d078ef
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Change motivated by:
- Dex cache compression: having the allocation fast path do a
dex cache lookup will be too expensive. So instead, rely on the
compiler having direct access to the class (either through BSS for
AOT, or JIT tables for JIT).
- Inlining: the entrypoints relied on the caller of the allocation to
have the same dex cache as the outer method (stored at the bottom of
the stack). This meant we could not inline methods from a different
dex file that do allocations. By avoiding the dex cache lookup in
the entrypoint, we can now remove this restriction.
Code expansion on average for Docs/Gms/FB/Framework (go/lem numbers):
- Around 0.8% on arm64
- Around 1% for x64, arm
- Around 1.5% on x86
Test: test-art-host, test-art-target, ART_USE_READ_BARRIER=true/false
Test: test-art-host, test-art-target, ART_DEFAULT_GC_TYPE=SS ART_USE_TLAB=true
Change-Id: I41f3748bb4d251996aaf6a90fae4c50176f9295f
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Avoid lock contention on a singleton VerifierDeps by allocating
temporary per-thread VerifierDeps that get merged after verification.
This saves around 35% of compile time on interpret-only.
Only the creation of extra strings is guarded by a lock, for simplicity.
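The merge step can be sketched as below. `DepsSketch` is a hypothetical stand-in for VerifierDeps (here just a set of strings); each element of `per_thread` would be filled by exactly one worker without locking, and the merge runs once after all workers have finished.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a dependency record set.
using DepsSketch = std::set<std::string>;

// Combines the per-thread sets into one; duplicates collapse naturally.
DepsSketch MergeDeps(const std::vector<DepsSketch>& per_thread) {
  DepsSketch merged;
  for (const DepsSketch& local : per_thread) {
    merged.insert(local.begin(), local.end());
  }
  return merged;
}
```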
Test: test-art-host, test-art-target
Bug: 32641252
Bug: 30937355
Change-Id: I11a2367da882b58e39afa7b42cba2e74a209b75d
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Reduces code size since we do not need to reload class before
calling slow path.
TODO: Delete read barriers in the check cast code since the slow
path will retry with the proper read barriers if the check fails.
Bug: 12687968
Bug: 29516974
Test: test-art-host + test-art-target with CC
Change-Id: Ia4eb9bbe3fe2d2016e44523cf0451210828d7b88
|
| |
|
|
|
| |
Bug: 32088975
Change-Id: I16f8b7ec6b251812af60ab25f2153d9b72f37044
|
| |
|
|
|
|
| |
Run ART test suite on host and Nexus 6.
Bug: 31464666
Change-Id: I5aa737726031adae0b132f759cf802a93d581a7f
|
| |
|
|
|
|
|
|
|
| |
* Add parentheses around macro parameters, or
use NOLINT to suppress warning.
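A short illustration of why the parentheses matter. The macros are hypothetical examples, not ones touched by this change.

```cpp
#include <cassert>

// Without parentheses the argument is pasted in as-is, so operator
// precedence can split it apart.
#define SCALE_UNSAFE(x) (x * 2)
// With parentheses the argument is evaluated as a unit.
#define SCALE_SAFE(x) ((x) * 2)

constexpr int kUnsafe = SCALE_UNSAFE(1 + 2);  // expands to (1 + 2 * 2)
constexpr int kSafe = SCALE_SAFE(1 + 2);      // expands to ((1 + 2) * 2)

static_assert(kUnsafe == 5, "precedence changes the result");
static_assert(kSafe == 6, "parenthesized parameter behaves as expected");
```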
Bug: 28705665
Test: build with WITH_TIDY=1
Change-Id: Ifc922c2e66215772042bac372754ea70074f0053
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adds a faster path for java methods annotated with
dalvik.annotation.optimization.FastNative .
Intended to replace usage of fast JNI (registering with "!(FOO)BAR" descriptors).
Performance Microbenchmark Results (Angler):
* Regular JNI cost in nanoseconds: 115
* Fast JNI cost in nanoseconds: 60
* @FastNative cost in nanoseconds: 36
Summary: Up to 67% faster (vs fast jni) JNI transition cost
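The 67% figure appears to be a speed ratio rather than a reduction in time. The sketch below computes both readings from the numbers above; the function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// How much faster the new path is, as a percentage of the new cost.
double SpeedupPercent(double old_ns, double new_ns) {
  assert(new_ns > 0.0);
  return (old_ns / new_ns - 1.0) * 100.0;
}

// How much time is saved, as a percentage of the old cost.
double TimeSavedPercent(double old_ns, double new_ns) {
  assert(old_ns > 0.0);
  return (1.0 - new_ns / old_ns) * 100.0;
}

// Rounds a percentage to the nearest whole number for reporting.
long RoundedPercent(double percent) { return std::lround(percent); }
```

With 60 ns for fast JNI and 36 ns for @FastNative, `SpeedupPercent` rounds to 67 while `TimeSavedPercent` rounds to 40, so the summary matches the first reading.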
Change-Id: Ic23823ae0f232270c068ec999fd89aa993894b0e
|
| |
|
|
|
|
|
|
|
|
|
|
| |
As entry points ReadBarrierMarkReg30 and
ReadBarrierMarkReg31 are undefined on all architectures
supporting the read barrier configuration (ARM, ARM64, x86
and x86-64), remove them from the entry point list.
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I500626e54f00aebfc095b4ef5f81b49fa43f7768
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Replace entry point ReadBarrierMark with 32
ReadBarrierMarkRegX entry points, using register
number X as input and output (instead of the standard
runtime calling convention) to save two moves in Baker's
read barrier mark slow-path code.
Test: ART host and target (ARM, ARM64) tests.
Bug: 29506760
Bug: 12687968
Change-Id: I73cfb82831cf040b8b018e984163c865cc44ed87
|
| |
|
|
|
|
|
|
|
|
|
| |
Replace String.charAt() with HArrayLength, HBoundsCheck and
HArrayGet. This allows GVN on the HArrayLength and BCE on
the HBoundsCheck as well as using the infrastructure for
HArrayGet, i.e. better handling of constant indexes than
the old intrinsic and using the HArm64IntermediateAddress.
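The decomposition can be mimicked in plain C++ to show why it helps. This is an analogy only: the three helpers below are hypothetical functions named after the HIR nodes, operating on std::u16string instead of a managed string.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

std::size_t ArrayLength(const std::u16string& s) { return s.size(); }

// Returns the index unchanged if it is in range, otherwise throws.
std::size_t BoundsCheck(std::size_t index, std::size_t length) {
  if (index >= length) {
    throw std::out_of_range("index out of bounds");
  }
  return index;
}

char16_t ArrayGet(const std::u16string& s, std::size_t checked_index) {
  return s[checked_index];
}

// charAt expressed as the three separate steps.
char16_t CharAt(const std::u16string& s, std::size_t index) {
  return ArrayGet(s, BoundsCheck(index, ArrayLength(s)));
}

// In a loop, the length is computed once (what GVN achieves) and the
// check is provably redundant because i < length (what BCE removes).
std::size_t CountChar(const std::u16string& s, char16_t c) {
  const std::size_t length = ArrayLength(s);
  std::size_t count = 0;
  for (std::size_t i = 0; i < length; ++i) {
    if (ArrayGet(s, i) == c) {
      ++count;
    }
  }
  return count;
}
```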
Bug: 28330359
Change-Id: I32bf1da7eeafe82537a60416abf6ac412baa80dc
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevents the spinning that used to happen if RunCheckpoint was called
with 3 pending checkpoints. This spinning was done when holding
thread_list_lock_ and thread_suspend_count_lock_ and could deadlock
if any of the pending checkpoints required any of these locks.
The fix is to use an overflow list instead of having a fixed limit of
3.
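A minimal sketch of the overflow-list idea, assuming one inline slot plus an unbounded list. `CheckpointQueue` is a hypothetical class, not ART's implementation, and it omits all locking.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <utility>

// Hypothetical per-thread checkpoint storage: requests never have to spin
// waiting for a free slot because extra requests go to the overflow list.
class CheckpointQueue {
 public:
  using Checkpoint = std::function<void()>;

  void Request(Checkpoint fn) {
    if (!inline_slot_) {
      inline_slot_ = std::move(fn);
    } else {
      overflow_.push_back(std::move(fn));
    }
  }

  std::size_t Pending() const {
    return (inline_slot_ ? 1 : 0) + overflow_.size();
  }

  // Runs all pending checkpoints in request order; returns how many ran.
  std::size_t RunAll() {
    std::size_t ran = 0;
    while (inline_slot_) {
      Checkpoint fn = std::move(inline_slot_);
      if (!overflow_.empty()) {
        inline_slot_ = std::move(overflow_.front());
        overflow_.pop_front();
      } else {
        inline_slot_ = nullptr;  // moved-from state is unspecified, so reset
      }
      fn();
      ++ran;
    }
    return ran;
  }

 private:
  Checkpoint inline_slot_;
  std::deque<Checkpoint> overflow_;
};
```

With a fixed limit, a fourth request would have had to wait for a slot while holding locks; here it is simply appended.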
Changed suspend stress test to have more threads and only compare last
line since there may be libbacktrace spam like:
"+E/libbacktrace(69891): void SignalHandler(int, siginfo_t *, void *):
Timed out waiting for unwind thread to indicate it completed."
Bug: 28988206
Change-Id: I2ae611506147d5199d59a08eee0395f7fa35d448
|
| |
|
|
|
|
|
|
|
| |
Speedup (GSS GC with TLAB on N5):
BinaryTrees: 1872 -> 796 ms (-57%)
MemAllocTest: 2522 -> 2219 ms (-12%)
Bug: 9986565
Change-Id: Icb9d1259461f3abe83a4a82c8aff911937eaf57d
|
| |
|
|
|
|
|
|
|
|
|
| |
Very small space savings (< 1%) after device boot and up to 10%
allocation speedup.
Some minor cleanup.
Bug: 9986565
Change-Id: I51d791c4674d6944fe9a7ee78537ac3490c1a02c
|
| |
|
| |
Add a Dalvik-style fast interpreter to Art.
Three primary deficiencies in the existing Art interpreter
will be addressed:
1. Structural inefficiencies (primarily the bloated
fetch/decode/execute overhead of the C++ interpreter
implementation).
2. Stack memory wastage. Each managed-language invoke
adds a full copy of the interpreter's compiler-generated
locals on the shared stack. We're at the mercy of
the compiler now in how much memory is wasted here. An
assembly based interpreter can manage memory usage more
effectively.
3. Shadow frame model, which not only spends twice the memory
to store the Dalvik virtual registers, but causes vreg stores
to happen twice.
This CL mostly deals with #1 (but does provide some stack memory
savings). Subsequent CLs will address the other issues.
Current status:
Passes all run-tests.
Phone boots interpret-only.
2.5x faster than Clang-compiled Art goto interpreter on fetch/decode/execute
microbenchmark, 5x faster than gcc-compiled goto interpreter.
1.6x faster than Clang goto on Caffeinemark overall
2.0x faster than Clang switch on Caffeinemark overall
68% of Dalvik interpreter performance on Caffeinemark (still much slower,
primarily because of poor invoke performance and lack of execute-inline)
Still nearly an order of magnitude slower than Dalvik on invokes
(but slightly better than Art Clang goto interpreter).
Importantly, saves ~200 bytes of stack memory per invoke (but still
wastes ~400 relative to Dalvik).
What's needed:
Remove the (large quantity of) bring-up hackery in place.
Integrate into the build mechanism. I'm still using the old Dalvik manual
build step to generate assembly code from the stub files.
Remove the suspend check hack. For bring-up purposes, I'm using an explicit
suspend check (like the other Art interpreters). However, we should be
doing a Dalvik style suspend check via the table base switch mechanism.
This should be done during the alternative interpreter activation.
General cleanup.
Add CFI info.
Update the new target bring-up README documentation.
Add other targets.
In later CLs:
Consolidate mterp handlers for expensive operations (such as new-instance) with
the code used by the switch interpreter. No need to duplicate the code for
heavyweight operations (but will need some refactoring to align).
Tuning - some fast paths need to be moved down to the assembly handlers,
rather than being dealt with in the out-of-line code.
JIT profiling. Currently, the fast interpreter is used only in the fast
case - no instrumentation, no transactions and no access checks. We
will want to implement fast + JIT-profiling as the alternate fast
interpreter. All other cases can still fall back to the reference
interpreter.
Improve invoke performance. We're nearly an order of magnitude slower than
Dalvik here. Some of that is unavoidable, but I suspect we can do
better.
Add support for our other targets.
Change-Id: I43e25dc3d786fb87245705ac74a87274ad34fedc
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Introduce an x86 fast path implementation in Optimizing for
Baker's read barriers (for both heap reference loads and GC
root loads). The marking phase of the read barrier is
performed by a slow path, invoking a new runtime entry point
(artReadBarrierMark).
Other read barrier algorithms continue to use the original
slow path based implementation, which has been renamed as
GenerateReadBarrierSlow/GenerateReadBarrierForRootSlow.
Bug: 12687968
Change-Id: Ie610c4befc19ff22378a8cba38b422dcacb54320
|