| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The tuning knobs for triggering trace compilation for the JIT
had not been revisited for several years. In that time, the
working set of some applications have significantly increased,
leading to frequent cache overlows & flushes.
This CL adds the ability to set the maximum size of the JIT's
cache on the command line, and we expect to use different settings
depending on device configuration (rule of thumb: 1K for each 1M
for system RAM, with 2M limit).
Additionally, the trace compilation trigger has been tightened to
limit the compilation of cold traces.
Change-Id: Ice22c5d9d46a93e465c57dd83f50ca3912f1672e
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
My previous "fix" (c89d83e1c05979b68037ad15413fa4460a88e36f) had the
conditions reversed, so you _had_ to use -Xjitthreshold to get a non-zero
threshold, but when you did, you'd get the default instead of what you
asked for!
This was spotted by the jank tests.
Bug: 8285558
Bug: https://code.google.com/p/android/issues/detail?id=52017
Change-Id: I28270f2573d46929eb10d30789fecf7d5a8cea75
|
| |
|
|
|
|
|
|
| |
Previously, we'd always overwrite the user-supplied value because
the architecture-specific default gets set so late.
Bug: https://code.google.com/p/android/issues/detail?id=52017
Change-Id: I469bf9ce599820f5ce3dea346aa8f680deffb0c5
|
| |
|
|
|
|
|
|
|
| |
See https://android-git.corp.google.com/g/#/c/157220
Also fix an occurrence of LOGW missed in an earlier change.
Bug: 5449033
Change-Id: I2e3b23839e6dcd09015d6402280e9300c75e3406
|
| |
|
|
|
|
|
|
|
|
| |
An leading underscore followed by a capital letter is a reserved
name space in C and C++.
This change also moves any #include directives within the include
guard in some of the compiler/codegen/arm header files.
Change-Id: I9715e2c5301699d31886e61d0fe6e29483555a2a
|
| |
|
|
| |
Change-Id: I9fb5d33f23ec7aeb2b9a3908d4125b34be0599ae
|
| |\
| |
| |
| | |
Change-Id: I99c4289bd34f63b0b970b6ed0fa992b44e805393
|
| | |
| |
| |
| | |
Change-Id: I9f9fe52bd4ceebb6dde48251a89190ba6bb00ce4
|
| | |
| |
| |
| | |
Change-Id: I236c5a1553a51f82c9bc3eaaab042046c854d3b4
|
| |/
|
|
| |
Change-Id: Idffbdb02c29e2be03a75f5a0a664603f2299504a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Fix a few miscellaneous bugs from the interpreter restructuring that were
causing a segfault on debugger attach.
Added a sanity checking routine for debugging.
Fixed a problem in which the JIT's threshold and on/off switch
wouldn't get initialized properly on thread creation.
Renamed dvmCompilerStateRefresh() to dvmCompilerUpdateGlobalState() to
better reflect its function.
Change-Id: I5b8af1ce2175e3c6f53cda19dd8e052a5f355587
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a restructuring of the Dalvik ARM and x86 interpreters:
o Combine the old portstd and portdbg interpreters into a single
portable interpreter.
o Add debug/profiling support to the fast (mterp) interpreters.
o Delete old mechansim of switching between interpreters. Now, once
you choose an interpreter at startup, you stick with it.
o Allow JIT to co-exist with profiling & debugging (necessary for
first-class support of debugging with the JIT active).
o Adds single-step capability to the fast assembly interpreters without
slowing them down (and, in fact, measurably improves their performance).
o Remove old "polling for safe point" mechanism. Breakouts now achieved
via modifying base of interpreter handler table.
o Simplify interpeter control mechanism.
o Allow thread-granularity control for profiling & debugging
The primary motivation behind this change was to improve the responsiveness
of debugging and profiling and to make it easier to add new debugging and
profiling capabilities in the future. Instead of always bailing out to the
slow debug portable interpreter, we can now stay in the fast interpreter.
A nice side effect of the change is that the fast interpreters
got a healthy speed boost because we were able to replace the
polling safepoint check that involved a dozen or so instructions
with a single table-base reload. When combined with the two earlier CLs
related to this restructuring, we show a 5.6% performance improvement
using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.
Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The key datastructure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.
Here's why:
In principio creavit Fadden Thread et InterpState. And it was good.
Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.
Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.
The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.
Yuck.
With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the decidated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.
This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.
Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
|
| |
|
|
|
|
|
|
|
|
| |
Enhanced code cache management to accommodate both trace and method
compilations. Also implemented a hacky dispatch routine for virtual
leaf methods.
Microbenchmark showed 3x speedup in leaf method invocation.
Change-Id: I79d95b7300ba993667b3aa221c1df9c7b0583521
|
| |
|
|
|
|
|
|
|
|
|
| |
Only register entry points dispatched through [r6+#offset] in
JitToInterpEntries.
For ARM targets check the size of JitToInterpEntries explicitly to
make sure that its last entry is within 128 byte from InterpState
due to the Thumb codegen constraint.
Change-Id: I74184115cb3a3c89afc3a5fe53685671d9cb1027
|
| |
|
|
|
|
|
|
|
|
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
|
|
|
|
|
|
| |
Also incorporate the former contents of OpCodeNames.h. This is a small
attempt to increase naming consistency in libdex. There will be a bit
more to come, in a follow-up.
Change-Id: Ia7ab06042dde2e19eda02ef1fee72fb4260e899d
|
| |
|
|
|
|
|
|
| |
Slight reworking of the memory barrier instruction generation to
generalize it, and then add "dmb st" for the new return-void-barrier
instruction.
Change-Id: Iad95aa5b0ba9b616a17dcbe4c6ca2e3906bb49dc
|
| |
|
|
|
|
|
|
|
| |
Most of CodegenFactory.c is at a high-enough abstraction level to reuse
for other targets. This CL moves the target-depending routines into a
new source file (ArchFactory.c) and what's left up a level into the
target-independent directory.
Change-Id: I792d5dc6b2dc8aa6aaa384039da464db2c766123
|
| |
|
|
|
|
|
| |
Much of the register utility code is target independent. Move it up
a level so the x86 JIT can use it.
Change-Id: Id9895a42281fd836cb1a2c942e106de94df62a9a
|
| |
|
|
|
|
| |
Also, on SMP systems generate memory barriers.
Change-Id: If64f7c98a8de426930b8f36ac77913e53b7b2d7a
|
| |
|
|
|
|
|
| |
The JIT was pulling it out of the dexdump directory, which is Just
Plain Wrong[tm]. Now it's part of libdex, for all to enjoy.
Change-Id: Ic1e4c981eb2d70ccc3c841ceb5a54f4f77af2008
|
| |
|
|
|
|
| |
Also re-enabled the JIT for the ARMv5te target.
Change-Id: I89fd229205e30e6ee92a4933290a7d8dca001232
|
| |
|
|
|
| |
Bug: 2517606
Change-Id: I2b5aa92ceaf23d484329330ae20de5966704280b
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Re-enabled load/store motion that had inadvertently been turned off for
non-armv7 targets. Tagged memory references with the kind of memory
they touch (Dalvik frame, literal pool, heap) to enable more aggressive
load hoisting. Eliminated some largely duplicate code in the target
specific files. Reworked temp register allocation code to allocate next
temp round-robin (to improve scheduling opportunities).
Overall, nice gain for Sapphire. Shows 5% to 15% on some benchmarks, and
measurable improvements for Passion.
|
| |
|
|
|
|
|
|
|
| |
The search for optimial value is still ongoing. The current settings are:
v5 v7
JIT profile table 512 2048
JIT code cache 512K 1M
JIT threshold 200 40
|
| |
|
|
| |
Also, fix blocking mode initialization.
|
| |
|
|
|
|
|
| |
The new flag is -Xjitcheckcg which will crawl the stack frame of the
translation requesting thread.
Bug: 2368995
|
| |
|
|
|
| |
Also, move setting of Jit table parameters to architecture-specific init
funciton.
|
| |
|
|
|
|
|
| |
Code in the template directory will occupy space in the code cache and is
invoked from JIT'ed code. Since these routines are only invoked from statically
compiled functions we can move them to the codegen directory which also has
arch-variant configurations.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The original Codegen.c is broken into three components:
- CodegenCommon.c (arch-independend)
- CodegenFactory.c (Thumb1/2 dependent)
- CodegenDriver.c (Dalvik dependent)
For the Thumb/Thumb2 directories, each contain the followin three files:
- Factory.c (low-level routines for instruction selections)
- Gen.c (invoke the ISA-specific instruction selection routines)
- Ralloc.c (arch-dependent register pools)
The FP directory contains FP-specific codegen routines depending on
Thumb/Thumb2/VFP/PortableFP:
- Thumb2VFP.c
- ThumbVFP.c
- ThumbPortableFP.c
Then the hierarchy is formed by stacking these files in the following top-down
order:
1 CodegenCommon.c
2 Thumb[2]/Factory.c
3 CodegenFactory.c
4 Thumb[2]/Gen.c
5 FP stuff
6 Thumb[2]/Ralloc.c
7 CodegenDriver.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is the first step towards enabling translation & self-cosim stress modes.
When trace selection begins, the trace head address is pinned and
remains in a limbo state until the translation is complete. Previously,
if the trace selected aborted for any reason, the trace head would remain
forever in limbo. This was not a correctness problem, but caused some
small performance anomolies and made life more difficult for self-cosimulation
mode.
This CL introduces a pseudo-translation that simply routes control to
the interpreter. When we detect that a trace selection attempt has
failed, the trace head is associated with this fully-chainable
pseudo-translation. This also has the benefit for self-cosimulation that
we are guaranteed forward progress.
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Direct usage of registers abstracted out.
Live values tracked locally. Redundant loads and stores suppressed.
Address of registers and register pairs unified w/ single "location" mechanism
Register types inferred using existing dataflow analysis pass.
Interim (i.e. Hack) mechanism for storing register liveness info. Rewrite TBD.
Stubbed-out code for linear scan allocation (for loop and long traces)
Moved optimistic lock check for monitor-enter/exit inline for Thumb2
Minor restructuring, renaming and general cleanup of codegen
Renaming of enums to follow coding convention
Formatting fixes introduced by the enum renaming
Rewrite of RallocUtil.c and addition of linear scan to come in stage 2.
|
| |
|
|
| |
Improved performance by 50% over existing JIT for some FP benchmarks.
|
| |
|
|
|
|
|
| |
Added support for Thumb2 IT. Moved compare-long and floating point
comparisons inline. Temporarily disabled use of Thumb2 CBZ & CBNZ
because they were causing too many out-of-range assembly restarts.
Bug fix for LIR3 assert.
|
| |
|