| Commit message (Collapse) | Author | Age | Files | Lines |
| |
|
|
| |
Change-Id: I74d152ea9cfe5b15daa9a8353ca27d8afa7474d2
|
| |
|
|
|
|
|
|
| |
Makes DMB domain ISH or ISHST instead of the implicit System.
ISH (Inner Shareable) should be sufficient for all cores/clusters,
but is not sufficient for GPU or other memory-mapped peripherals.
Change-Id: Id159228daba97bc3692d2eb1ee2786bae2ee34a7
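
As a rough illustration of the domain options mentioned above, the sketch
below builds the ARM-mode (A32) encoding of DMB for each option. The option
values (SY=0xF, ST=0xE, ISH=0xB, ISHST=0xA) and the 0xF57FF050 base come from
the ARMv7-A architecture; the helper name encode_dmb is hypothetical and is
not the JIT's actual emitter.

```python
# Hypothetical helper for illustration; not the Dalvik JIT's encoder.
DMB_OPTIONS = {
    "sy": 0xF,     # full system, reads and writes (the implicit default)
    "st": 0xE,     # full system, writes only
    "ish": 0xB,    # inner shareable, reads and writes
    "ishst": 0xA,  # inner shareable, writes only
}

ARM_DMB_BASE = 0xF57FF050  # A32 DMB with the 4-bit option field cleared


def encode_dmb(option="sy"):
    """Return the 32-bit A32 encoding of `dmb <option>`."""
    return ARM_DMB_BASE | DMB_OPTIONS[option]


for name in ("sy", "ish", "ishst"):
    print(f"dmb {name:5s} -> 0x{encode_dmb(name):08x}")
```

Only the low four bits differ between the variants, so switching domains is a
change of operand rather than of instruction.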
|
| |\
| |
| |
| | |
Change-Id: I885fab2470352d0a625c9946d0d5c9111486b713
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
Add support for customer device extension
Change-Id: I0402a630ba212d1c5e81cda110f61210f7b60384
(cherry picked from commit 11499df326462bfe25890a35c6abbf019ff7784e)
(cherry picked from commit e03b8f8da9cf4eef64cedf39ce9ca90d26ce5124)
(cherry picked from commit fb360be406f35b9591f12c61936657f03cc5880f)
|
| |\|
| |
| |
| | |
Android 4.4 Release 1.0
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For easy multiplication using reverse subtract (when
lit is 2^n-1) use the barrel shifter for rsb.
This improves arithmetic performance for code executing
in Dalvik, e.g. String.hashCode.
Change-Id: Ifb086dcec344b30fd3e392ac21d508b43e820cdc
Signed-off-by: Patrik Ryd <patrik.ryd@stericsson.com>
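
A minimal sketch of the arithmetic behind this change, assuming 32-bit
registers: for lit = 2^n - 1, x * lit equals (x << n) - x, which a single
`rsb rd, rx, rx, lsl #n` can compute. The function names are hypothetical;
this models the result, not the JIT's code generator.

```python
MASK32 = 0xFFFFFFFF


def is_pow2_minus_1(lit):
    """True when lit == 2**n - 1 for some n >= 1 (1, 3, 7, 15, 31, ...)."""
    return lit > 0 and (lit & (lit + 1)) == 0


def mul_by_rsb(x, lit):
    """Model `rsb rd, rx, rx, lsl #n`: rd = (rx << n) - rx, in 32 bits."""
    if not is_pow2_minus_1(lit):
        raise ValueError("literal is not of the form 2**n - 1")
    n = lit.bit_length()  # 2**n - 1 has exactly n significant bits
    return ((x << n) - x) & MASK32


# String.hashCode multiplies the running hash by 31 = 2**5 - 1.
print(mul_by_rsb(7, 31) == (7 * 31) & MASK32)
```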
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Change 256211 (JIT: Performance Fix for const doubles) introduced a
defect that can cause the JIT to use the wrong floating point
double constant in traces in which the following conditions hold:
o Two (or more) different 64-bit floating point constants are used.
o The physical register holding the first constant is still live
at the time the second constant is used.
o The low 32 bits of the two constants are identical.
In this situation, the load/copy optimization pass will incorrectly
determine that the two constants are the same, delete the load of
the second constant and re-use the first constant value.
Note: this problem only occurs with 64-bit floating point literals.
64-bit long literals are unaffected.
This CL works around the problem, and a subsequent CL will rework
disambiguation of 64-bit immediates in a somewhat cleaner fashion.
(cherry-pick of c1757a6deab0ca0bfd42c38612d92b2f26c41dbe.)
Change-Id: I795b4b753550d2745cbbdd83ae25f4a7088990f6
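
The third condition is easy to hit in practice. The sketch below is a model
of the defect, not the actual load/copy optimization pass: it compares two
doubles by their low 32 bits only (the buggy test) and by all 64 bits. The
function names are hypothetical.

```python
import struct


def double_words(value):
    """Split an IEEE-754 double into its (low32, high32) words."""
    bits = struct.unpack("<Q", struct.pack("<d", value))[0]
    return bits & 0xFFFFFFFF, bits >> 32


def same_constant_buggy(a, b):
    # Models the defect: only the low word takes part in the comparison.
    return double_words(a)[0] == double_words(b)[0]


def same_constant_fixed(a, b):
    # Both words must match before a load may be deleted.
    return double_words(a) == double_words(b)


# 1.0 and 2.0 differ only in the high word; both low words are zero.
print(same_constant_buggy(1.0, 2.0), same_constant_fixed(1.0, 2.0))
```

Any pair of doubles with short mantissas (small integers, simple fractions)
shares an all-zero low word, so the buggy comparison would treat them as the
same constant.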
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Some recent Arm processors take a performance hit when
creating a floating point double by loading it as a pair of singles.
Legacy code to support soft floating point doubles as a pair of core
registers loaded double immediates in this way.
With this CL, we handle double immediates as a single unit.
(cherry-pick of c8129911e598ad0ca8d7b31012444ab6ce8bce45.)
Change-Id: Ic1512e34bfd233a6f5ffd58ce843965adbbad875
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This reverts commit aa897b06230453519c4ec636f229c72ac0015897.
Revert "Reject dex files that attempt to use unspecified class access flags"
This reverts commit 2f824d3e4835479409724ea02d0a23114cd4ff81.
Revert "If dalvik wants ASCII casing, it needs to ask for it."
This reverts commit d91250308fc4c423d11955174c21566fa19df07c.
Revert "JIT: Combine add with shift and offset for array load & store."
This reverts commit a9ecd84e5f5423a1ba6bbb2bb9256b0dc382de44.
Revert "JIT: Use rsb and shift in easy multiply."
This reverts commit 25b94295a57290623e34882e7fd86ea10928a54e.
Revert "Excessive JNI: Dump HPROF dump."
This reverts commit 8d30a7402d48c4ffe2bf28ede78c6b3b52b15304.
Revert "dalvik/vm: Dalvik startup with a low memory footprint"
This reverts commit 15726c81059b74bf2352db29a3decfc4ea9c1428.
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Optimize long and double array load / store for ARM JIT.
Array load / store performs a logical shift left and an add;
replace them with an add capable of performing the shift in the
same instruction.
Array load / store performs an add instead of using an offset
for vldr/vstr. Replace the add and vldr/vstr with a vldr/vstr
that is capable of handling the offset.
This improves performance for use cases involving long and double
array code execution in Dalvik, e.g. WindowOrientation.
Change-Id: I90220c349ab936cdba1987139ccdf4dc31d7bbb0
Signed-off-by: Patrik Ryd <patrik.ryd@stericsson.com>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
For easy multiplication using reverse subtract (when
lit is 2^n-1) use the barrel shifter for rsb.
This improves arithmetic performance for code executing
in Dalvik, e.g. String.hashCode.
Change-Id: Ifb086dcec344b30fd3e392ac21d508b43e820cdc
Signed-off-by: Patrik Ryd <patrik.ryd@stericsson.com>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Optimize monitor unlock for ARM Thumb2 JIT.
Monitor unlock performs a logical shift left
and a sub; replace them with a sub capable of
performing the shift in the same instruction.
This improves performance for use cases involving code
executing in Dalvik.
Change-Id: Iaf062d750c3bc941926f3c3b8a64dc9c7984a477
Signed-off-by: Patrik Ryd <patrik.ryd@stericsson.com>
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Change 256211 (JIT: Performance Fix for const doubles) introduced a
defect that can cause the JIT to use the wrong floating point
double constant in traces in which the following conditions hold:
o Two (or more) different 64-bit floating point constants are used.
o The physical register holding the first constant is still live
at the time the second constant is used.
o The low 32 bits of the two constants are identical.
In this situation, the load/copy optimization pass will incorrectly
determine that the two constants are the same, delete the load of
the second constant and re-use the first constant value.
Note: this problem only occurs with 64-bit floating point literals.
64-bit long literals are unaffected.
This CL works around the problem, and a subsequent CL will rework
disambiguation of 64-bit immediates in a somewhat cleaner fashion.
Change-Id: I33baf78402bab58d9b0ca46189f26491c2b2a751
|
| |/
|
|
|
|
|
|
|
|
|
| |
Some recent Arm processors take a performance hit when
creating a floating point double by loading it as a pair of singles.
Legacy code to support soft floating point doubles as a pair of core
registers loaded double immediates in this way.
With this CL, we handle double immediates as a single unit.
Change-Id: I91aca9da6d4b38e180479dd8f75c82dbc7b4a526
|
| |
|
|
|
|
|
|
|
| |
See https://android-git.corp.google.com/g/#/c/157220
Also fix an occurrence of LOGW missed in an earlier change.
Bug: 5449033
Change-Id: I2e3b23839e6dcd09015d6402280e9300c75e3406
|
| |
|
|
|
|
|
|
| |
A Thumb2 pc-relative load is slipped into the codegen stream even though
the selected platform is armv5te (e.g. the emulator).
Bug: 4399358
Change-Id: I61dd6853cad6c82de43f384814c903dd9f3ae302
|
| |
|
|
| |
Change-Id: Idffbdb02c29e2be03a75f5a0a664603f2299504a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Split the original literal pool into class object literals and
constants. Elements in the class object pool have to match the specified
values perfectly (i.e. no +delta space optimizations) since they might be
relocated.
2) Implement dvmJitScanAllClassPointers(void (*callback)(void *))
which is the entry routine to report all memory locations in the code cache
that contain class objects (i.e. class object pool and predicted chaining
cells for virtual calls).
3) Major codegen changes on how/when the class object pool is populated
and how predicted chains are patched. Before this change the compiler
thread is always in the VM_WAIT state, which won't prevent GC from
running. Since the class object pointers captured by a worker thread
are no longer guaranteed to be stable at JIT time, change various
internal data structures to capture the class descriptor/loader
tuple instead. The conversion from descriptor/loader tuple to actual
class object pointers is only performed when the thread state is
RUNNING or at a GC safe point.
4) Separate the class object installation phase out of the main
dvmCompilerAssembleLIR routine so that the impact to blocking GC
requests is minimal. Add new stats to report the potential block time.
For example:
Potential GC blocked by compiler: max 46 us / avg 25 us
5) Various cleanup in the trace structure walkup code. Modified the
verbose print routine to show the class descriptor in the class literal
pool. For example:
D/dalvikvm( 1450): -------- end of chaining cells (0x007c)
D/dalvikvm( 1450): 0x44020628 (00b4): .class
(Lcom/android/unit_tests/PerformanceTests$EmptyClass;)
D/dalvikvm( 1450): 0x4402062c (00b8): .word (0xaca8d1a5)
D/dalvikvm( 1450): 0x44020630 (00bc): .word (0x401abc02)
D/dalvikvm( 1450): End
Bug: 3482956
Change-Id: I2e736b00d63adc255c33067544606b8b96b72ffc
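
The difference between the two pools in item 1 can be sketched as follows.
This is an illustration under assumptions, not the JIT's code: the function
names are hypothetical, constants are modelled as 32-bit integers, and class
literals are modelled as (descriptor, loader) tuples as in item 3.

```python
MASK32 = 0xFFFFFFFF


def find_constant(pool, value, delta):
    """Return a pooled constant c with 0 <= value - c <= delta, else None.

    A near match saves pool space: the caller loads c and adds the
    difference (value - c) instead of adding a new pool entry.
    """
    for c in pool:
        if ((value - c) & MASK32) <= delta:
            return c
    return None


def find_class_literal(pool, class_key):
    """Class entries must match exactly; a +delta match makes no sense
    for a slot whose contents may later be rewritten."""
    for key in pool:
        if key == class_key:
            return key
    return None


constants = [0x1000]
print(find_constant(constants, 0x1004, delta=8))   # near match is reused
print(find_constant(constants, 0x0FFC, delta=8))   # below the entry: no match
```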
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation is to reconstruct the leaf Dalvik frame and
punt to the interpreter, since the amount of work involved to match
each catch block and walk through the stack frames is just not worth
JIT'ing.
Additional changes:
- Fixed a control-flow bug where a block that ends with a throw shouldn't
have a fall-through block.
- Fixed a code cache lookup bug so that method-based compilation is
guaranteed a slot in the profiling table.
- Created separate handler routines based on opcode format for the
method-based JIT.
- Renamed a few core registers that also have special meanings to the
VM or ARM architecture.
Change-Id: I429b3633f281a0e04d352ae17a1c4f4a41bab156
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The key datastructure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.
Here's why:
In the beginning Fadden created Thread and InterpState. And it was good.
Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.
Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.
The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.
Yuck.
With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the dedicated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.
This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.
Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
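
The resulting shape can be sketched roughly as below. All names and fields
here are hypothetical stand-ins, not the VM's actual structures: the point is
only that the stop state lives in Thread, and that each nested activation
saves a small record linked to the previous one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class InterpSaveState:
    """Hypothetical record: only the state that must survive nesting."""
    pc: int
    method: Optional[str]
    prev: Optional["InterpSaveState"] = None


@dataclass
class Thread:
    thread_id: int
    suspend_count: int = 0          # stop bits are directly reachable
    pc: int = 0
    method: Optional[str] = None
    save_chain: Optional[InterpSaveState] = None

    def enter_interpreter(self, method, pc=0):
        # Save the current activation and link it to the older ones.
        self.save_chain = InterpSaveState(self.pc, self.method, self.save_chain)
        self.method, self.pc = method, pc

    def exit_interpreter(self):
        # Restore the activation that was interrupted by the nested call.
        saved = self.save_chain
        self.pc, self.method = saved.pc, saved.method
        self.save_chain = saved.prev

    def traceback(self):
        """Walk the linked records to list nested activations, newest first."""
        frames = [self.method]
        rec = self.save_chain
        while rec is not None:
            if rec.method is not None:
                frames.append(rec.method)
            rec = rec.prev
        return frames


t = Thread(thread_id=1)
t.enter_interpreter("main")
t.enter_interpreter("callback")   # e.g. native code calling back into Dalvik
print(t.traceback())
```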
|
| |
|
|
|
|
|
|
|
| |
- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE
Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
|
| |
|
|
|
|
|
|
|
|
| |
1) Thumb 'push' can handle lr and 'pop' can handle pc, so make use of them.
2) Thumb2 push was incorrectly encoded as stmia, which should be stmdb
instead.
None of the above affect the code that we currently ship.
Change-Id: I89ab46b032a3d562355c2cc3bc05fe308ba40957
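
For reference, a small sketch of the 16-bit Thumb encodings involved in
item 1. Bit 8 of PUSH selects lr and bit 8 of POP selects pc, so no separate
instruction is needed for either. The helper names are hypothetical; the base
opcodes 0xB400 (PUSH) and 0xBC00 (POP) are the standard Thumb encodings.

```python
LR, PC = 14, 15


def _encode(base, regs, extra):
    """Build a Thumb PUSH/POP: bits 0-7 are r0-r7, bit 8 is lr or pc."""
    mask = 0
    for r in regs:
        if 0 <= r < 8:
            mask |= 1 << r
        elif r == extra:
            mask |= 1 << 8
        else:
            raise ValueError(f"r{r} cannot be encoded in this instruction")
    return base | mask


def thumb_push(regs):
    return _encode(0xB400, regs, LR)


def thumb_pop(regs):
    return _encode(0xBC00, regs, PC)


print(hex(thumb_push([4, LR])), hex(thumb_pop([4, PC])))
```

On item 2: a Thumb2 push of a register list corresponds to stmdb sp!
(decrement before), while the matching pop is ldmia sp! (increment after).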
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
I wanted the code to JIT a call to a C function extracted so I can potentially
use it elsewhere. The functions that sometimes JIT instructions directly and
other times bail out to C can now call this, simplifying the body of the
switch. I think there's a behavioral change here with the ThumbVFP
genInlineSqrt, which previously had the wrong return value.
Tested on passion to ensure that the performance characteristics of assembler
intrinsics, C intrinsics, and library native methods haven't changed (using
the Math and Float classes).
Change-Id: Id79771a31abe3a516f403486454e9c0d9793622a
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for method compilation, this CL causes all traces to
include two entry points: profiling and non-profiling. For now, the
profiling entry will only be used if dalvik is run with -Xjitprofile,
and largely works like it did before. The difference is that profiling
support no longer requires the "assert" build - it's always there now.
This will enable us to do a form of sampling profiling of
traces in order to identify hot methods or hot trace groups,
while keeping the overhead low by only switching profiling on periodically.
To turn the periodic profiling on and off, we simply unchain all existing
translations and set the appropriate global profile state. The underlying
translation lookup and chaining utilities will examine the profile state to
determine which entry point to use (i.e. - profiling or non-profiling) while
the traces naturally rechain during further execution.
Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
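
The unchain-and-rechain idea can be modelled in a few lines. This is a
simplified sketch with hypothetical class and method names, not the VM's
translation cache: each translation exposes two entries, chaining picks one
according to the global profile state, and toggling the state unchains
everything so later chaining picks the other entry.

```python
class Translation:
    def __init__(self, name):
        self.name = name
        self.chained_to = None  # entry of the successor, once chained

    def entry(self, profiling):
        """Each translation has a profiling and a non-profiling entry."""
        return (self.name, "profiling" if profiling else "non-profiling")


class TranslationCache:
    def __init__(self):
        self.profiling = False
        self.translations = {}

    def add(self, name):
        self.translations[name] = Translation(name)

    def chain(self, src, dst):
        """Link src to whichever entry of dst the profile state selects."""
        target = self.translations[dst].entry(self.profiling)
        self.translations[src].chained_to = target

    def set_profiling(self, enabled):
        # Unchain all translations; they rechain during further execution.
        for t in self.translations.values():
            t.chained_to = None
        self.profiling = enabled


cache = TranslationCache()
cache.add("A")
cache.add("B")
cache.chain("A", "B")
cache.set_profiling(True)
cache.chain("A", "B")
print(cache.translations["A"].chained_to)
```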
|
| |
|
|
|
|
|
|
| |
Remove vestiges of code intended for linear scan register allocation
in the trace compiler. New plan is to stick with local allocation for
traces and build a new linear scan allocator for the method compiler.
Change-Id: Ic265ab5a7936b144cbe7fa4dc667fa7aba579045
|
| |
|
|
| |
Change-Id: I06292964a6882ea2d0c17c5c962db95e46b01543
|
| |
|
|
|
|
|
|
|
| |
A lot of this is more about properties of opcodes as opposed to
inspecting instructions per se, and the new naming attempts to
make it clear what is being queried and what sort of data is being
returned.
Change-Id: Ice6f9f2ebf4f1cfa8c99597419aa13d1134a33b2
|
| |
|
|
|
|
|
|
|
|
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This included fixing all the accessor functions to refer to the
global ones defined in InstrUtils.[ch] instead of taking separate
"table pointer" arguments.
This did end up adding a few more truly global references to some of
the code paths, particularly when performing dex optimization, so I
went ahead and measured the time to do a cold first-boot both before
and after the change (on real hardware). The times were identical (to
one-second granularity), so I'm reasonably comfortable making this
change.
Change-Id: I604d9f7882bad4245bb11371218d13b06c3a5375
|
| |
|
|
|
|
|
|
|
| |
At one point, returning a negative width for dexopt output was useful.
That stopped being the case a long time ago.
This also removes a bad assert that went into my previous checkin.
Change-Id: I18880c2316f5499a09dc479d271ca70b2a5be259
|
| |
|
|
|
|
| |
My local build wasn't doing Thumb2. Unsurprising in retrospect.
Change-Id: I38ab4dc80e1115cf459f6d890c7d0eb2705fa7c9
|
| |
|
|
|
|
|
|
| |
Slight reworking of the memory barrier instruction generation to
generalize it, and then add "dmb st" for the new return-void-barrier
instruction.
Change-Id: Iad95aa5b0ba9b616a17dcbe4c6ca2e3906bb49dc
|
| |
|
|
|
|
|
| |
This allows better use of cbz/cbnz on Thumb2 targets. Also, removed
the clrex from the inline monitor enter code (not necessary).
Change-Id: I3bfa90bcdf34f6ef3e2447c9c6f1b49a98a89e58
|
| |
|
|
|
|
| |
Possibly the cause of [2950977 error in onDraw() method for stingray]
Change-Id: I84da4dcb04735ccbedc21fa84c11c3ee8c4aa4e9
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Previously, the JIT wasn't generating short-form compare and branch on
zero/not zero instructions for Thumb2. The reason was that these only
allow a 1-byte displacement, and when they didn't reach, the assembler would
abort the trace, split it in half and try again. This change re-enables
cbz, cbnz generation and introduces a relatively lightweight retry
mechanism.
Also includes changes for Thumb2 to always generate large displacement
literal loads and conditional branches to minimize the number of retry
attempts.
Change-Id: Icf066836fad203f5c0fcbbb2ae8e1aa73d1cf816
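
A rough model of a retry loop of this kind, under stated assumptions: the
instruction records, the "cmp+b" fallback form and the function names are all
hypothetical, and the real assembler is more involved. The reach test uses
the architectural cbz/cbnz range: forward only, an even offset of 0 to 126
bytes from the PC, which reads as the branch address plus 4.

```python
def cbz_reaches(branch_addr, target_addr):
    """True if a Thumb2 cbz/cbnz at branch_addr can reach target_addr."""
    offset = target_addr - (branch_addr + 4)
    return 0 <= offset <= 126 and offset % 2 == 0


def layout(instrs):
    """Return the start address of each instruction given current sizes."""
    addrs, addr = [], 0
    for ins in instrs:
        addrs.append(addr)
        addr += ins["size"]
    return addrs


def assemble(instrs):
    """Widen any cbz that does not reach, then retry. Returns pass count."""
    passes = 0
    while True:
        passes += 1
        addrs = layout(instrs)
        retry = False
        for i, ins in enumerate(instrs):
            if ins["op"] == "cbz" and not cbz_reaches(addrs[i], addrs[ins["target"]]):
                # Fall back to a compare plus conditional branch (4 bytes).
                ins["op"], ins["size"] = "cmp+b", 4
                retry = True
                break  # addresses shifted, so lay out again
        if not retry:
            return passes


far = [
    {"op": "cbz", "size": 2, "target": 2},
    {"op": "other", "size": 200},
    {"op": "other", "size": 2},
]
print(assemble(far), far[0]["op"])
```

Each retry converts one cbz to the longer form, so the loop runs at most once
per cbz plus a final clean pass.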
|
| |
|
|
|
|
|
|
|
|
|
| |
In an attempt to avoid unnecessary register copies, the JIT allows
data items to live in either floating point or core registers until
an instruction is used which requires one or the other. The bug here
was that sub-word data was allowed to live in floating point registers
at the point of a load or store. This CL forces the use of core registers
in those cases.
Change-Id: I60c2a0d1df9a299f6c5130371f44f2be9c348ded
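
The rule amounts to a size check when picking a register class for a memory
operation. A minimal sketch with hypothetical names, not the JIT's register
allocator:

```python
CORE, FP, ANY = "core", "fp", "any"


def reg_class_for_mem_op(size_bytes, preferred=ANY):
    """Pick the register class for a load or store of size_bytes.

    Sub-word (1- or 2-byte) accesses are forced into core registers;
    word and double-word accesses keep the caller's preference.
    """
    if size_bytes < 4:
        return CORE
    return preferred


print(reg_class_for_mem_op(2, preferred=FP))
```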
|
| |
|
|
|
|
|
|
|
|
|
| |
The JIT was incorrectly keeping a short value in a floating point
register rather than copying it to a core register before storing.
There was an assert to catch this case, but asserts don't fire in
production builds.
The fix is safe and simple - just exclude this case from the "optimization".
Change-Id: I33767c8a202b6fa36a19d918ac5b914a5e4e4de3
|
| |
|
|
| |
Change-Id: Icbe24eaf1ad499f28b68b6a5f05368271a0a7e86
|
| |
|
|
|
|
|
| |
All invoked functions are documented in compiler/codegen/arm/CalloutHelper.h
Bug: 2567981
Change-Id: Ia7cd4107272df1b0b5588fbcc0aafcc6d0723d60
|
| |
|
|
| |
Change-Id: I7b922e223fe1f5242d1f3db1fa18f54aaed725af
|
| |
|
|
|
|
|
|
| |
Issue 2175597 Jit compile failures should abort translation, but not the VM
Added new dvmCompileAbort() to replace uses of dvmAbort() when something goes
wrong during the compilation of a trace. In that case, we'll abort the translation
and set its head to the interpret-only "translation".
|
| |
|
|
|
|
| |
The Jit was using ldrex/strex to clear an owned thin lock in the
fast path. This was not necessary - an integer store works and is
much faster.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Re-enabled load/store motion that had inadvertently been turned off for
non-armv7 targets. Tagged memory references with the kind of memory
they touch (Dalvik frame, literal pool, heap) to enable more aggressive
load hoisting. Eliminated some largely duplicate code in the target
specific files. Reworked temp register allocation code to allocate next
temp round-robin (to improve scheduling opportunities).
Overall, nice gain for Sapphire. Shows 5% to 15% on some benchmarks, and
measurable improvements for Passion.
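
Round-robin temp allocation can be sketched as below. The class is a
hypothetical illustration, not the JIT's allocator: instead of always
returning the lowest free temp, the search starts after the most recently
allocated one, so back-to-back values land in different registers and the
scheduler sees fewer false dependencies.

```python
class TempPool:
    def __init__(self, regs):
        self.regs = list(regs)
        self.in_use = set()
        self.next = 0  # index at which the next search starts

    def alloc(self):
        n = len(self.regs)
        for i in range(n):
            idx = (self.next + i) % n
            reg = self.regs[idx]
            if reg not in self.in_use:
                self.in_use.add(reg)
                self.next = (idx + 1) % n
                return reg
        raise RuntimeError("no free temp register")

    def free(self, reg):
        self.in_use.discard(reg)


pool = TempPool(["r0", "r1", "r2"])
first = pool.alloc()
pool.free(first)
second = pool.alloc()   # not r0 again, even though r0 is free
print(first, second)
```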
|
| |
|
|
| |
(I saw these the other day, but preferred a separate patch.)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Rather than make these changes in the libraries (*10 being a common case),
let's do them once and for all in the JIT.
The 2^n-1 case could be better if we generated RSB instructions, but the
current "fake" RSB is still better than a full multiply.
Thumb doesn't support reg/reg/reg/shift instructions, so we can't optimize
the "population count <= 2" cases (such as *10) there.
Tested on sholes, passion, and passion-running-sapphire (and visually
inspected to check we weren't trying to generate Thumb2 instructions there).
Also tested with the self-verifier.
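
The case analysis described above can be sketched as follows, assuming
32-bit arithmetic. The function name and the return convention (None when a
real multiply is needed) are hypothetical; the sketch computes results rather
than emitting instructions.

```python
MASK32 = 0xFFFFFFFF


def easy_multiply(x, lit):
    """Return x * lit mod 2**32 via shifts and one add/sub, else None."""
    if lit <= 0 or lit > MASK32:
        return None
    bits = [i for i in range(32) if (lit >> i) & 1]
    if len(bits) == 1:
        # Power of two: a single shift.
        return (x << bits[0]) & MASK32
    if len(bits) == 2:
        # Population count 2, e.g. 10 = 8 + 2: shift, then add-with-shift.
        return ((x << bits[1]) + (x << bits[0])) & MASK32
    if (lit & (lit + 1)) == 0:
        # 2**n - 1: (x << n) - x, the reverse-subtract case.
        return ((x << len(bits)) - x) & MASK32
    return None  # fall back to a full multiply


for lit in (8, 10, 31, 11):
    print(lit, easy_multiply(7, lit))
```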
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
Two problems with monitor-exit:
1. The Jit code wasn't checking for exceptions thrown following
unlocks of fat locks using dvmUnlockObject().
2. The mterp interpreter unlock code branched to handle exceptions
thrown during dvmUnlockObject() with the wrong dalvik PC (the
dPC of the unlock, rather than the instruction following the unlock).
Similar issue with the x86 interpreter fixed. Also, deleted armv7-a
MONITOR_ENTER template, which turned out to be identical to the armv5te
one.
|
| | |
|
| |
|
|
|
| |
Renaming of all of those register utilities which used to be local because
of our include mechanism to the standard dvmCompiler prefix scheme.
|
| | |
|
| |
|
|
|
|
|
|
|
|
|
|
| |
The Jit must stop all threads in order to flush the translation cache (and
other tables). Threads which are blocked in a monitor wait cause some
headache here because they effectively hold a reference to the translation
cache (through the return address on the native stack). The new model
introduced in this CL is that for the fast path of monitor enter, control
is allowed to resume in the translation cache. However, if we need to do a
heavyweight lock (which may cause us to block) control does not return to the
translation cache but instead bails out to the interpreter. This allows us to
safely clear the code cache even if some threads are in THREAD_MONITOR state.
|