| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
Change-Id: Idffbdb02c29e2be03a75f5a0a664603f2299504a
|
| |
| |
1) Split the original literal pool into class object literals and
constants. Elements in the class object pool have to match the specified
values exactly (i.e. no +delta space optimizations) since they might be
relocated.
2) Implement dvmJitScanAllClassPointers(void (*callback)(void *))
which is the entry routine to report all memory locations in the code cache
that contain class objects (ie class object pool and predicted chaining
cells for virtual calls).
3) Major codegen changes to how/when the class object pool is populated
and how predicted chains are patched. Before this change the compiler
thread was always in the VM_WAIT state, which won't prevent GC from
running. Since the class object pointers captured by a worker thread
are no longer guaranteed to be stable at JIT time, change various
internal data structures to capture the class descriptor/loader
tuple instead. The conversion from descriptor/loader tuple to actual
class object pointers is only performed when the thread state is
RUNNING or at a GC safe point.
4) Separate the class object installation phase out of the main
dvmCompilerAssembleLIR routine so that the impact to blocking GC
requests is minimal. Add new stats to report the potential block time.
For example:
Potential GC blocked by compiler: max 46 us / avg 25 us
5) Various cleanup in the trace structure walkup code. Modified the
verbose print routine to show the class descriptor in the class literal
pool. For example:
D/dalvikvm( 1450): -------- end of chaining cells (0x007c)
D/dalvikvm( 1450): 0x44020628 (00b4): .class
(Lcom/android/unit_tests/PerformanceTests$EmptyClass;)
D/dalvikvm( 1450): 0x4402062c (00b8): .word (0xaca8d1a5)
D/dalvikvm( 1450): 0x44020630 (00bc): .word (0x401abc02)
D/dalvikvm( 1450): End
Bug: 3482956
Change-Id: I2e736b00d63adc255c33067544606b8b96b72ffc
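Items 1) and 2) above can be pictured with a minimal sketch. It assumes a
flat table of slot addresses; registerClassSlot, scanAllClassPointers,
relocateSlot, gFrom and gTo are hypothetical names for illustration only,
and only the callback shape void (*callback)(void *) comes from the
message. Each slot holds the exact pointer value (no +delta encoding), so
a relocating callback can rewrite it in place.

```c
#include <assert.h>
#include <stddef.h>

/* Callback shape taken from the commit message: it receives the address
 * of a memory location that contains a class object pointer. */
typedef void (*ClassPointerCallback)(void *slotAddr);

/* Hypothetical registry of slots (class object pool entries and predicted
 * chaining cells would both be recorded here). */
#define MAX_CLASS_SLOTS 16
static void **gClassSlots[MAX_CLASS_SLOTS];
static size_t gNumClassSlots;

static void registerClassSlot(void **slot)
{
    assert(gNumClassSlots < MAX_CLASS_SLOTS);
    gClassSlots[gNumClassSlots++] = slot;
}

/* Report every recorded slot to the callback. */
static void scanAllClassPointers(ClassPointerCallback callback)
{
    for (size_t i = 0; i < gNumClassSlots; i++) {
        callback(gClassSlots[i]);
    }
}

/* Example callback: rewrite a slot when the object it names has moved
 * from gFrom to gTo (both hypothetical globals). */
static void *gFrom;
static void *gTo;

static void relocateSlot(void *slotAddr)
{
    void **slot = (void **)slotAddr;
    if (*slot == gFrom) {
        *slot = gTo;
    }
}
```

A collector that moves a class object would set gFrom/gTo (or consult its
own forwarding table) and call scanAllClassPointers(relocateSlot).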
|
| |
| |
The current implementation is to reconstruct the leaf Dalvik frame and
punt to the interpreter, since the amount of work involved to match
each catch block and walk through the stack frames is just not worth
JIT'ing.
Additional changes:
- Fixed a control-flow bug where a block that ends with a throw shouldn't
have a fall-through block.
- Fixed a code cache lookup bug so that method-based compilation is
guaranteed a slot in the profiling table.
- Created separate handler routines based on opcode format for the
method-based JIT.
- Renamed a few core registers that also have special meanings to the
VM or ARM architecture.
Change-Id: I429b3633f281a0e04d352ae17a1c4f4a41bab156
|
| |
| |
The polling is expensive for now as it is done through three
instructions: ld/ld/branch. As a result, a bunch of bonus stuff has
been worked on to mitigate the extra overhead:
- Cleaned up resource flags for memory disambiguation.
- Rewrote load/store elimination and scheduler routines to hide
the ld/ld latency for the GC flag. Separated the dependency checking
into a memory disambiguation part and a resource conflict part.
- Allowed code motion for Dalvik/constant/non-aliasing loads to be
hoisted above branches for null/range checks.
- Created extended basic blocks following goto instructions so that
longer instruction streams can be optimized as a whole.
Without the bonus stuff, the performance dropped about 5-10% on some
benchmarks because of the lack of headroom to hide the polling latency
in tight loops. With the bonus stuff, the performance delta is within
+/-5% with polling code generated. With the bonus stuff but polling
disabled, the new bonus stuff provides consistent performance
improvements:
CaffeineMark 3.6%
Linpack 11.1%
Scimark 9.7%
Sieve 33.0%
Checkers 6.0%
As a result, GC polling is disabled by default but can be turned on
through the -Xjitsuspendpoll flag for experimental purposes.
Change-Id: Ia81fc85de3e2b70e6cc93bc37c2b845892003cdb
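The split of dependency checking into a memory disambiguation part and a
resource conflict part can be sketched as follows. This is an illustration
under stated assumptions, not the scheduler's actual code: the mask layout,
the Insn struct and every function name here are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical resource mask layout: low 16 bits are core registers,
 * the bits above tag the kind of memory an instruction touches. */
#define RES_REG_MASK   0x0000ffffu
#define RES_DALVIK_REG (1u << 16)   /* Dalvik frame slot */
#define RES_LITERAL    (1u << 17)   /* literal pool */
#define RES_HEAP_REF   (1u << 18)   /* heap reference */
#define RES_MEM_MASK   (RES_DALVIK_REG | RES_LITERAL | RES_HEAP_REF)

typedef struct {
    uint32_t useMask;  /* resources read */
    uint32_t defMask;  /* resources written */
    int vreg;          /* Dalvik register number, or -1 if not a frame access */
} Insn;

/* Part 1: memory disambiguation. */
static bool memoryMayConflict(const Insn *a, const Insn *b)
{
    uint32_t aMem = (a->useMask | a->defMask) & RES_MEM_MASK;
    uint32_t bMem = (b->useMask | b->defMask) & RES_MEM_MASK;
    uint32_t shared = aMem & bMem;

    if (shared == 0) return false;                 /* different memory kinds */
    if (((a->defMask | b->defMask) & shared) == 0)
        return false;                              /* two loads never conflict */
    if (shared == RES_DALVIK_REG) {
        assert(a->vreg >= 0 && b->vreg >= 0);
        return a->vreg == b->vreg;                 /* distinct slots are disjoint */
    }
    return true;                                   /* conservative otherwise */
}

/* Part 2: register resource conflicts (RAW, WAR, WAW). */
static bool registersConflict(const Insn *a, const Insn *b)
{
    uint32_t raw = a->defMask & b->useMask;
    uint32_t war = a->useMask & b->defMask;
    uint32_t waw = a->defMask & b->defMask;
    return ((raw | war | waw) & RES_REG_MASK) != 0;
}

/* Two instructions may be swapped only if neither part reports a conflict. */
static bool canReorder(const Insn *a, const Insn *b)
{
    return !memoryMayConflict(a, b) && !registersConflict(a, b);
}
```

Under these assumptions a load from one Dalvik frame slot can move past a
store to a different slot, but not past a store to the same slot or past
an instruction that reads the loaded register.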
|
| |
| |
- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE
Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
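As an illustration of compacting simple flags into bit fields, here is a
minimal sketch. The field names and widths are hypothetical and are not
taken from the real ArmLIR definition.

```c
#include <assert.h>

/* Before: each simple flag occupies a full int. */
typedef struct {
    int isNop;
    int insertWrapper;
    int age;
    int size;
} LooseFlags;

/* After: the same information packed into bit fields of one word.
 * Exact layout is implementation-defined, but it is never larger than
 * the loose form on common compilers. */
typedef struct {
    unsigned int isNop : 1;
    unsigned int insertWrapper : 1;
    unsigned int age : 4;    /* holds 0..15 */
    unsigned int size : 3;   /* holds 0..7 */
} CompactFlags;

/* Guard against silently truncating a value that does not fit. */
static void setAge(CompactFlags *flags, unsigned int age)
{
    assert(age < 16u);
    flags->age = age;
}
```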
|
| |
|
|
| |
Change-Id: If3fb3a36f33aaee8e5fdded4e9fa607be54f0bfb
|
| |
|
|
| |
Change-Id: I06292964a6882ea2d0c17c5c962db95e46b01543
|
| |
| |
kNumDalvikInstructions is now kNumPackedOpcodes, there is a new
kMaxOpcodeValue, and both are generated by opcode-gen.
Change-Id: Ic46f1f52d2d21382452c8e777024f4a985ad31d3
Bonus: Reworded the switch and array data comment for clarity.
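A minimal sketch of how code might use the named constants instead of
literal numbers. The values 0x100 and 0xff are assumptions for
illustration (the real values come from opcode-gen), and gOpcodeCounts,
countOpcode, totalOpcodeCount and isValidOpcodeValue are hypothetical
helpers.

```c
#include <assert.h>

/* Assumed values for illustration only; the generated header is the
 * authority on the real ones. */
enum {
    kNumPackedOpcodes = 0x100,
    kMaxOpcodeValue = 0xff
};

/* Table sized by the named constant rather than a literal 256. */
static unsigned int gOpcodeCounts[kNumPackedOpcodes];

static int isValidOpcodeValue(unsigned int value)
{
    return value <= kMaxOpcodeValue;
}

static void countOpcode(unsigned int packedOpcode)
{
    assert(packedOpcode < kNumPackedOpcodes);
    gOpcodeCounts[packedOpcode]++;
}

static unsigned int totalOpcodeCount(void)
{
    unsigned int total = 0;
    for (unsigned int i = 0; i < kNumPackedOpcodes; i++) {
        total += gOpcodeCounts[i];
    }
    return total;
}
```

If the number of opcodes changes, only the generated constants change;
the table and the loop bounds follow automatically.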
|
| |
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
| |
In particular, use it instead of just saying 256, and similarly for
255. The number of opcodes will be changing soon.
Change-Id: Icc77120c2673968dddd6b4003f717245d46e4159
|
| |
|
|
|
|
| |
Mark some functions "static".
Change-Id: Ia80bccab1f72690729e43f99783d34fe366108b2
|
| |
|
|
|
|
| |
Also re-enabled the JIT for the ARMv5te target.
Change-Id: I89fd229205e30e6ee92a4933290a7d8dca001232
|
| |
|
|
|
| |
Bug: 2561283
Change-Id: I9fd94928f3e661de97098808340ea92b28cafa07
|
| |
| |
Issue 2175597 Jit compile failures should abort translation, but not the VM
Added new dvmCompileAbort() to replace uses of dvmAbort() when something goes
wrong during the compilation of a trace. In that case, we'll abort the translation
and set its head to the interpret-only "translation".
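One way to abort a single translation without taking down the whole
process is a non-local exit back to the compile entry point. The sketch
below uses setjmp/longjmp; that mechanism is an assumption, since the
message does not say how dvmCompileAbort() is implemented, and
CompilationUnit, compileAbort, generateCode, compileTrace and the
8-instruction limit are all hypothetical.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

typedef struct {
    jmp_buf bailout;   /* where to resume when compilation is abandoned */
    bool armed;        /* true once bailout has been set up */
} CompilationUnit;

/* Stand-in for the interpret-only "translation". */
static const char kInterpretOnly[] = "interpret-only";
static const char kCompiled[] = "compiled";

/* Abandon the current translation only; the caller keeps running. */
static void compileAbort(CompilationUnit *cUnit)
{
    assert(cUnit->armed);
    longjmp(cUnit->bailout, 1);
}

/* Pretend code generator that gives up on traces it cannot handle. */
static const char *generateCode(CompilationUnit *cUnit, int numInsns)
{
    if (numInsns > 8) {
        compileAbort(cUnit);
    }
    return kCompiled;
}

static const char *compileTrace(int numInsns)
{
    CompilationUnit cUnit;
    cUnit.armed = true;
    if (setjmp(cUnit.bailout) != 0) {
        /* Reached via compileAbort(): fall back to the interpreter. */
        return kInterpretOnly;
    }
    return generateCode(&cUnit, numInsns);
}
```

A failing trace then resolves to the interpret-only entry while later
traces can still be compiled normally.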
|
| |
| |
Re-enabled load/store motion that had inadvertently been turned off for
non-armv7 targets. Tagged memory references with the kind of memory
they touch (Dalvik frame, literal pool, heap) to enable more aggressive
load hoisting. Eliminated some largely duplicate code in the target
specific files. Reworked temp register allocation code to allocate next
temp round-robin (to improve scheduling opportunities).
Overall, nice gain for Sapphire. Shows 5% to 15% on some benchmarks, and
measurable improvements for Passion.
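Round-robin temp allocation can be sketched as below. The idea is that
consecutive requests get different registers even when an earlier temp
has already been freed, which avoids false dependencies that would
constrain the scheduler. TempPool, allocTemp, freeTemp and the pool size
of 4 are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

#define NUM_TEMPS 4

typedef struct {
    bool inUse[NUM_TEMPS];
    int nextTemp;          /* where the next search starts */
} TempPool;

/* Return the first free temp at or after nextTemp, wrapping around;
 * return -1 when every temp is in use. */
static int allocTemp(TempPool *pool)
{
    assert(pool->nextTemp >= 0 && pool->nextTemp < NUM_TEMPS);
    for (int i = 0; i < NUM_TEMPS; i++) {
        int reg = (pool->nextTemp + i) % NUM_TEMPS;
        if (!pool->inUse[reg]) {
            pool->inUse[reg] = true;
            pool->nextTemp = (reg + 1) % NUM_TEMPS;
            return reg;
        }
    }
    return -1;
}

static void freeTemp(TempPool *pool, int reg)
{
    assert(reg >= 0 && reg < NUM_TEMPS);
    pool->inUse[reg] = false;
}
```

With a first-fit allocator, allocating, freeing and allocating again
would hand back the same register; here the second request moves on to
the next one.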
|
| | |
|
| |
|
|
|
| |
Renamed all of the register utilities, which used to be local because
of our include mechanism, to follow the standard dvmCompiler prefix scheme.
|
| |
|
|
|
| |
This was a bug in the def tracking: more specifically, tracking was not
reset at a possible rollback location.
|
|
|
The original Codegen.c is broken into three components:
- CodegenCommon.c (arch-independent)
- CodegenFactory.c (Thumb1/2 dependent)
- CodegenDriver.c (Dalvik dependent)
The Thumb and Thumb2 directories each contain the following three files:
- Factory.c (low-level routines for instruction selections)
- Gen.c (invoke the ISA-specific instruction selection routines)
- Ralloc.c (arch-dependent register pools)
The FP directory contains FP-specific codegen routines depending on
Thumb/Thumb2/VFP/PortableFP:
- Thumb2VFP.c
- ThumbVFP.c
- ThumbPortableFP.c
Then the hierarchy is formed by stacking these files in the following top-down
order:
1 CodegenCommon.c
2 Thumb[2]/Factory.c
3 CodegenFactory.c
4 Thumb[2]/Gen.c
5 FP stuff
6 Thumb[2]/Ralloc.c
7 CodegenDriver.c
|