| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
Change-Id: I74d152ea9cfe5b15daa9a8353ca27d8afa7474d2
|
| |\
| |
| |
| | |
Change-Id: I885fab2470352d0a625c9946d0d5c9111486b713
|
| |/
|
|
|
|
|
|
|
| |
Add support for customer device extension
Change-Id: I0402a630ba212d1c5e81cda110f61210f7b60384
(cherry picked from commit 11499df326462bfe25890a35c6abbf019ff7784e)
(cherry picked from commit e03b8f8da9cf4eef64cedf39ce9ca90d26ce5124)
(cherry picked from commit fb360be406f35b9591f12c61936657f03cc5880f)
|
| |
|
|
|
|
|
|
|
|
|
| |
For easy multiplication using reverse subtract (when
lit is 2^n-1) use the barrel shifter for rsb.
This improves arithmetic performance for code executing
in Dalvik. E.g String.hashCode.
Change-Id: Ifb086dcec344b30fd3e392ac21d508b43e820cdc
Signed-off-by: Patrik Ryd <patrik.ryd@stericsson.com>
|
| |
|
|
|
|
|
|
|
| |
See https://android-git.corp.google.com/g/#/c/157220
Also fix an occurrence of LOGW missed in an earlier change.
Bug: 5449033
Change-Id: I2e3b23839e6dcd09015d6402280e9300c75e3406
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This one was tricky to track down. The underlying problem arose
with the consolidation of InterpState with Thread. Rather than
having a state structure for each instance of the interpreter, we
moved to a model that had a single thread-local struct shared by all
interpreter instances running on that thread. A portion of interpreter
state can't be shared - and thus was saved and restored on nested
invocations of the interpreter.
The bug here was that the storage for method return values was not
included in the state that needed save/retore. In normal operation,
it doesn't need to be saved - that storage isn't live across an
invoke that could trigger a nested interpreter activation. However,
when debugging, the debugger itself may hijack threads and create
new interpreter instances for its own purposed - and there is a small
window in which live retval can be trashed.
The fix is simply to move retval into the InterpSave struct.
Change-Id: Ib621824b799c5caa16fdfa8f5689a181159059df
|
| |
|
|
|
|
|
|
| |
A Thumb2 pc-relative load is slipped into the codegen stream even though
the selected platform is armv5te (eg the emulator).
Bug: 4399358
Change-Id: I61dd6853cad6c82de43f384814c903dd9f3ae302
|
| |
|
|
| |
Change-Id: Idffbdb02c29e2be03a75f5a0a664603f2299504a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Split the original literal pool into class object literals and
constants. Elements in the class object pool have to match the specicial
values perfectly (ie no +delta space optimizations) since they might be
relocated.
2) Implement dvmJitScanAllClassPointers(void (*callback)(void *))
which is the entry routine to report all memory locations in the code cache
that contain class objects (ie class object pool and predicted chaining
cells for virtual calls).
3) Major codegen changes on how/when the class object pool are populated
and how predicted chains are patched. Before this change the compiler
thread is always in the VM_WAIT state, which won't prevent GC from
running. Since the class object pointers captured by a worker thread
are no longer guaranteed to be stable at JIT time, change various
internal data structures to capture the class descriptor/loader
tuple instead. The conversion from descriptor/loader tuple to actual
class object pointers are only performed when the thread state is
RUNNING or at GC safe point.
4) Separate the class object installation phase out of the main
dvmCompilerAssembleLIR routine so that the impact to blocking GC
requests is minimal. Add new stats to report the potential block time.
For example:
Potential GC blocked by compiler: max 46 us / avg 25 us
5) Various cleanup in the trace structure walkup code. Modified the
verbose print routine to show the class descriptor in the class literal
pool. For example:
D/dalvikvm( 1450): -------- end of chaining cells (0x007c)
D/dalvikvm( 1450): 0x44020628 (00b4): .class
(Lcom/android/unit_tests/PerformanceTests$EmptyClass;)
D/dalvikvm( 1450): 0x4402062c (00b8): .word (0xaca8d1a5)
D/dalvikvm( 1450): 0x44020630 (00bc): .word (0x401abc02)
D/dalvikvm( 1450): End
Bug: 3482956
Change-Id: I2e736b00d63adc255c33067544606b8b96b72ffc
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation is to reconstruct the leaf Dalvik frame and
punt to the interpreter, since the amount of work involed to match
each catch block and walk through the stack frames is just not worth
JIT'ing.
Additional changes:
- Fixed a control-flow bug where a block that ends with a throw shouldn't
have a fall-through block.
- Fixed a code cache lookup bug so that method-based compilation is
guaranteed a slot in the profiling table.
- Created separate handler routines based on opcode format for the
method-based JIT.
- Renamed a few core registers that also have special meanings to the
VM or ARM architecture.
Change-Id: I429b3633f281a0e04d352ae17a1c4f4a41bab156
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The key datastructure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.
Here's why:
In principio creavit Fadden Thread et InterpState. And it was good.
Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.
Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.
The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.
Yuck.
With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the decidated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.
This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.
Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
|
| |
|
|
|
|
|
|
|
| |
- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE
Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
I wanted the code to JIT a call a C function extracted so I can potentially
use it elsewhere. The functions that sometimes JIT instructions directly and
other times bail out to C can now call this, simplifying the body of the
switch. I think there's a behavioral change here with the ThumbVFP
genInlineSqrt, which previously had the wrong return value.
Tested on passion to ensure that the performance characteristics of assembler
intrinsics, C intrinsics, and library native methods haven't changed (using
the Math and Float classes).
Change-Id: Id79771a31abe3a516f403486454e9c0d9793622a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for method compilation, this CL causes all traces to
include two entry points: profiling and non-profiling. For now, the
profiling entry will only be used if dalvik is run with -Xjitprofile,
and largely works like it did before. The difference is that profiling
support no longer requires the "assert" build - it's always there now.
This will enable us to do a form of sampling profiling of
traces in order to identify hot methods or hot trace groups,
while keeping the overhead low by only switching profiling on periodically.
To turn the periodic profiling on and off, we simply unchain all existing
translations and set the appropriate global profile state. The underlying
translation lookup and chaining utilties will examine the profile state to
determine which entry point to use (i.e. - profiling or non-profiling) while
the traces naturally rechain during further execution.
Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
|
| |
|
|
|
|
|
|
| |
Remove vestiges of code intended for linear scan register allocation
in the trace compiler. New plan is to stick with local allocation for
traces and build a new linear scan allocator for the method compiler.
Change-Id: Ic265ab5a7936b144cbe7fa4dc667fa7aba579045
|
| |
|
|
| |
Change-Id: I8828bc628f110aaade578a197bf1f51b30bf1be7
|
| |
|
|
|
|
|
|
|
|
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
|
|
|
|
|
| |
This allows better use of cbz/cbnz on Thumb2 targets. Also, removed
the clrex from the inline monitor enter code (not necessary).
Change-Id: I3bfa90bcdf34f6ef3e2447c9c6f1b49a98a89e58
|
| |
|
|
|
|
| |
Also re-enabled the JIT for the ARMv5te target.
Change-Id: I89fd229205e30e6ee92a4933290a7d8dca001232
|
| |
|
|
|
|
|
| |
All invoked functions are documented in compiler/codegen/arm/CalloutHelper.h
Bug: 2567981
Change-Id: Ia7cd4107272df1b0b5588fbcc0aafcc6d0723d60
|
| |
|
|
| |
Change-Id: I2f31492f68cb753f76dd664cd6b0a52d7d32de4c
|
| |
|
|
| |
Change-Id: I7b922e223fe1f5242d1f3db1fa18f54aaed725af
|
| |
|
|
|
|
|
|
| |
Issue 2175597 Jit compile failures should abort translation, but not the VM
Added new dvmCompileAbort() to replace uses of dvmAbort() when something goes
wrong during the compliation of a trace. In that case, we'll abort the translation
and set it's head to the interpret-only "translation".
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Re-enabled load/store motion that had inadvertently been turned off for
non-armv7 targets. Tagged memory references with the kind of memory
they touch (Dalvik frame, literal pool, heap) to enable more aggressive
load hoisting. Eliminated some largely duplicate code in the target
specific files. Reworked temp register allocation code to allocate next
temp round-robin (to improve scheduling opportunities).
Overall, nice gain for Sapphire. Shows 5% to 15% on some benchmarks, and
measurable improvements for Passion.
|
| |
|
|
| |
(I saw these the other day, but preferred a separate patch.)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than make these changes in the libraries (*10 being a common case),
let's do them once and for all in the JIT.
The 2^n-1 case could be better if we generated RSB instructions, but the
current "fake" RSB is still better than a full multiply.
Thumb doesn't support reg/reg/reg/shift instructions, so we can't optimize
the "population count <= 2" cases (such as *10) there.
Tested on sholes, passion, and passion-running-sapphire (and visually
inspected to check we weren't trying to generate Thumb2 instructions there).
Also tested with the self-verifier.
|
| | |
|
| |
|
|
|
| |
Renaming of all of those register utilities which used to be local because
of our include mechanism to the standard dvmCompiler prefix scheme.
|
| | |
|
|
|
The original Codegen.c is broken into three components:
- CodegenCommon.c (arch-independend)
- CodegenFactory.c (Thumb1/2 dependent)
- CodegenDriver.c (Dalvik dependent)
For the Thumb/Thumb2 directories, each contain the followin three files:
- Factory.c (low-level routines for instruction selections)
- Gen.c (invoke the ISA-specific instruction selection routines)
- Ralloc.c (arch-dependent register pools)
The FP directory contains FP-specific codegen routines depending on
Thumb/Thumb2/VFP/PortableFP:
- Thumb2VFP.c
- ThumbVFP.c
- ThumbPortableFP.c
Then the hierarchy is formed by stacking these files in the following top-down
order:
1 CodegenCommon.c
2 Thumb[2]/Factory.c
3 CodegenFactory.c
4 Thumb[2]/Gen.c
5 FP stuff
6 Thumb[2]/Ralloc.c
7 CodegenDriver.c
|