| Commit message | Author | Age | Files | Lines |
| |
|
|
| |
Change-Id: Idffbdb02c29e2be03a75f5a0a664603f2299504a
|
| |
|
|
|
|
|
|
|
|
|
|
| |
On SMP systems, Dalvik opcodes referencing volatile fields will be
rewritten to their _VOLATILE variant. On non-SMP systems, though,
this rewriting is not done. The JIT, however, needs to know about
volatility for all systems in order to avoid performing unsafe
optimizations. This change fixes the JIT's volatility test to check
either for a _VOLATILE opcode or for the volatile flag in the field access
bits, depending on SMP type.
Change-Id: I2edde58dc25f22cba88f62c5f1a2d125473309e6
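A minimal sketch of the check this implies, written in C with assumed helper
names (isVolatileOpcode and getFieldAccessFlags are illustrative, not the
actual Dalvik symbols; ACC_VOLATILE is the standard Java/Dalvik access-flag
value):

    #include <stdbool.h>

    #define ACC_VOLATILE 0x0040   /* standard Java/Dalvik access flag */

    /* Hypothetical stand-ins for the real JIT/VM queries. */
    extern bool isVolatileOpcode(int opcode);          /* *_VOLATILE variant? */
    extern int  getFieldAccessFlags(const void *field);

    /* On SMP builds, volatile accesses have already been rewritten to the
     * _VOLATILE opcodes, so the opcode alone is authoritative.  On non-SMP
     * builds no rewriting happens, so the field's access flags must be
     * consulted instead. */
    static bool jitFieldAccessIsVolatile(int opcode, const void *field, bool isSmp)
    {
        if (isSmp)
            return isVolatileOpcode(opcode);
        return (getFieldAccessFlags(field) & ACC_VOLATILE) != 0;
    }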
|
| |
|
|
| |
This reverts commit 11fb99d598ebe640719743a0d3bd7ed091e5be03.
|
| |
|
|
|
|
|
|
|
|
|
|
| |
On SMP systems, Dalvik opcodes referencing volatile fields will be
rewritten to their _VOLATILE variant. On non-SMP systems, though,
this rewriting is not done. The JIT, however, needs to know about
volatility for all systems in order to avoid performing unsafe
optimizations. This change fixes the JIT's volatility test to check
either for a _VOLATILE opcode or for the volatile flag in the field access
bits, depending on SMP type.
Change-Id: If485875c5abbf0147c3ef4b6d557faa89ed85426
|
| |
|
|
|
|
|
|
| |
In the past, it was possible to have a volatile field with a
non-volatile opcode. This is no longer the case, so this change
eliminates the volatile field flag check.
Change-Id: I1cface4e813144634b2f90732c76b0a16f08c304
|
| |
|
|
| |
Change-Id: I79cdac62baa40582bba160a04cbd4c8b2c9151a5
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Adapt the existing counted loop analysis and range/null check
elimination code to work with the new loop building heuristics.
Cleaned up the old ad-hoc loop builder.
Suspend polling is enabled by default for loops. The backward chaining
cell will be used in self-verification and profiling mode.
If the loop includes accesses to resolved fields/classes, abort code
generation for now and revert to the basic acyclic trace. Added
tests/090-loop-formation to make sure the JIT won't choke on such
instructions.
Change-Id: Idbc57df0a745be3b692f68c1acb6d4861c537f75
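For context, a sketch (plain C, purely illustrative; in Dalvik the body would
be an aget/aput carrying the checks) of the counted-loop shape the analysis
targets: one basic induction variable stepped by a constant and compared
against a loop-invariant bound, which is what allows the per-iteration null
and range checks to be hoisted out of the body:

    /* 'i' is the basic induction variable, 'n' is loop-invariant.  When the
     * JIT can prove 0 <= i < n for every iteration and n is within the array
     * bounds, the null check on 'a' and the bounds check on a[i] can be done
     * once before the loop instead of on every pass. */
    static int sumArray(const int *a, int n)
    {
        int sum = 0;
        for (int i = 0; i < n; i++)
            sum += a[i];
        return sum;
    }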
|
| |\ |
|
| | |
| |
| |
| |
| |
| | |
Neglected to git-add the most recent version in Change I0c0b6d17.
Change-Id: I5f9e630f652edcf70ab893ade6f559056ed31f8f
|
| |\| |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
We were missing a dozen or so of the new jumbo ops in the JIT's
code generator. Interestingly, our compiler error recovery mechanism was
good enough that we didn't notice (we die on an assert build,
but silently recover and continue on a production build).
Change-Id: I0c0b6d1704c47e81b39c7dcf7d1172dbdcd29856
|
| |/
|
|
| |
Change-Id: If38ce20b2ac30488118259e3b3bdfbc9f068e5c3
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is a restructuring of the Dalvik ARM and x86 interpreters:
o Combine the old portstd and portdbg interpreters into a single
portable interpreter.
o Add debug/profiling support to the fast (mterp) interpreters.
o Delete old mechanism of switching between interpreters. Now, once
you choose an interpreter at startup, you stick with it.
o Allow JIT to co-exist with profiling & debugging (necessary for
first-class support of debugging with the JIT active).
o Add single-step capability to the fast assembly interpreters without
slowing them down (in fact, this measurably improves their performance).
o Remove old "polling for safe point" mechanism. Breakouts are now achieved
by modifying the base of the interpreter handler table.
o Simplify interpreter control mechanism.
o Allow thread-granularity control for profiling & debugging.
The primary motivation behind this change was to improve the responsiveness
of debugging and profiling and to make it easier to add new debugging and
profiling capabilities in the future. Instead of always bailing out to the
slow debug portable interpreter, we can now stay in the fast interpreter.
A nice side effect of the change is that the fast interpreters
got a healthy speed boost because we were able to replace the
polling safepoint check that involved a dozen or so instructions
with a single table-base reload. When combined with the two earlier CLs
related to this restructuring, we show a 5.6% performance improvement
using libdvm_interp.so on the Checkers benchmark relative to Honeycomb.
Change-Id: I8d37e866b3618def4e582fc73f1cf69ffe428f3c
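A rough sketch of the table-base breakout idea in C; the type and field names
here (Handler, curHandlerTable, altHandlerTable) are assumptions for
illustration, not the actual Dalvik declarations:

    typedef void (*Handler)(void);   /* one handler per opcode */

    typedef struct Thread {
        /* Dispatch goes through this pointer on every instruction.  Pointing
         * it at the "alt" table makes every opcode route through a breakout
         * stub, so no per-instruction "should I stop?" polling is needed. */
        const Handler *curHandlerTable;
        const Handler *mainHandlerTable;
        const Handler *altHandlerTable;
    } Thread;

    /* Request a breakout (GC, debugger attach, single-step, ...). */
    static void requestInterpBreak(Thread *self)
    {
        self->curHandlerTable = self->altHandlerTable;
    }

    /* Resume full-speed dispatch. */
    static void clearInterpBreak(Thread *self)
    {
        self->curHandlerTable = self->mainHandlerTable;
    }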
|
| |
|
|
|
|
|
| |
For example:
chaining cell (predicted): Ljava/lang/Object;getClass
Change-Id: Ia53340baab87d6b744fc7189b141737a4a54cc42
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
1) Split the original literal pool into class object literals and
constants. Elements in the class object pool have to match the special
values perfectly (i.e., no +delta space optimizations) since they might be
relocated.
2) Implement dvmJitScanAllClassPointers(void (*callback)(void *)),
which is the entry routine to report all memory locations in the code cache
that contain class objects (i.e., the class object pool and predicted chaining
cells for virtual calls).
3) Major codegen changes on how/when the class object pool is populated
and how predicted chains are patched. Before this change the compiler
thread is always in the VM_WAIT state, which won't prevent GC from
running. Since the class object pointers captured by a worker thread
are no longer guaranteed to be stable at JIT time, change various
internal data structures to capture the class descriptor/loader
tuple instead. The conversion from the descriptor/loader tuple to actual
class object pointers is only performed when the thread state is
RUNNING or at a GC safe point.
4) Separate the class object installation phase out of the main
dvmCompilerAssembleLIR routine so that the impact on blocking GC
requests is minimal. Add new stats to report the potential block time.
For example:
Potential GC blocked by compiler: max 46 us / avg 25 us
5) Various cleanup in the trace structure walkup code. Modified the
verbose print routine to show the class descriptor in the class literal
pool. For example:
D/dalvikvm( 1450): -------- end of chaining cells (0x007c)
D/dalvikvm( 1450): 0x44020628 (00b4): .class
(Lcom/android/unit_tests/PerformanceTests$EmptyClass;)
D/dalvikvm( 1450): 0x4402062c (00b8): .word (0xaca8d1a5)
D/dalvikvm( 1450): 0x44020630 (00bc): .word (0x401abc02)
D/dalvikvm( 1450): End
Bug: 3482956
Change-Id: I2e736b00d63adc255c33067544606b8b96b72ffc
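A hedged sketch of how a collector might drive the reporting entry point named
in item 2; only the dvmJitScanAllClassPointers signature comes from this
message, while markClassObjectSlot and the surrounding code are illustrative
assumptions:

    #include <stdio.h>

    /* Entry point from item 2: reports every code-cache location (class
     * object pool entries, predicted chaining cells) holding a class
     * object pointer. */
    extern void dvmJitScanAllClassPointers(void (*callback)(void *));

    /* Hypothetical GC-side visitor: 'addr' is the address of a slot in the
     * code cache whose contents must be treated as a root (or relocated). */
    static void markClassObjectSlot(void *addr)
    {
        printf("class pointer slot at %p\n", addr);
        /* a real collector would mark or update *(Object **)addr here */
    }

    static void scanCodeCacheRoots(void)
    {
        dvmJitScanAllClassPointers(markClassObjectSlot);
    }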
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
The invoke-object-init instruction pretends to be a regular invoke
that only knows how to call Object.<init>. As such it always takes
one argument, and if we use the /range version we can specify the
"this" register with 16 bits instead of only 4.
Bug 3486699
Change-Id: I9ee4700c6935beee1dcbaa583b57befd33641414
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The current implementation is to reconstruct the leaf Dalvik frame and
punt to the interpreter, since the amount of work involved to match
each catch block and walk through the stack frames is just not worth
JIT'ing.
Additional changes:
- Fixed a control-flow bug where a block that ends with a throw shouldn't
have a fall-through block.
- Fixed a code cache lookup bug so that method-based compilation is
guaranteed a slot in the profiling table.
- Created separate handler routines based on opcode format for the
method-based JIT.
- Renamed a few core registers that also have special meanings to the
VM or ARM architecture.
Change-Id: I429b3633f281a0e04d352ae17a1c4f4a41bab156
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The key data structure for the interpreter is InterpState.
This change eliminates it, merging its data with the Thread structure.
Here's why:
In the beginning, Fadden created Thread and InterpState. And it was good.
Thread holds thread-private state, while InterpState captures data
associated with a Dalvik interpreter activation. Because JNI calls
can result in nested interpreter invocations, we can have more than one
InterpState for each actual thread. InterpState was relatively small,
and it all worked well. It was used enough that in the Arm version
a register (rGLUE) was dedicated to it.
Then, along came the JIT guys, who saw InterpState as a convenient place
to dump all sorts of useful data that they wanted quick access to through
that dedicated register. InterpState grew and grew. In terms of
space, this wasn't a big problem - but it did mean that the initialization
cost of each interpreter activation grew as well. For applications
that do a lot of callbacks from native code into Dalvik, this is
measurable. It's also mostly useless cost because much of the JIT-related
InterpState initialization was setting up useful constants - things that
don't need to be saved and restored all the time.
The biggest problem, though, deals with thread control. When something
interesting is happening that needs all threads to be stopped (such as
GC and debugger attach), we have access to all of the Thread structures,
but we don't have access to all of the InterpState structures (which
may be buried/nested on the native stack). As a result, polling for
thread suspension is done via a one-indirection pointer chase. InterpState
itself can't hold the stop bits because we can't always find it, so
instead it holds a pointer to the global or thread-specific stop control.
Yuck.
With this change, we eliminate InterpState and merge all needed data
into Thread. Further, we replace the dedicated rGLUE register with a
pointer to the Thread structure (rSELF). The small subset of state
data that needs to be saved and restored across nested interpreter
activations is collected into a record that is saved to the interpreter
frame, and restored on exit. Further, these small records are linked
together to allow tracebacks to show nested activations. Old InterpState
variables that simply contain useful constants are initialized once at
thread creation time.
This CL is large enough by itself that the new ability to streamline
suspend checks is not done here - that will happen in a future CL. Here
we just focus on consolidation.
Change-Id: Ide6b2fb85716fea454ac113f5611263a96687356
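A simplified sketch of the resulting shape (the field names are illustrative,
not the real Dalvik declarations): the former InterpState data lives directly
in Thread, and only a small save record is linked per nested activation:

    /* Small subset of interpreter state that must survive a nested
     * activation: saved to the interpreter frame on entry, restored on
     * exit, and chained so tracebacks can show nested activations. */
    typedef struct InterpSaveState {
        const unsigned short   *pc;     /* Dalvik program counter  */
        unsigned int           *fp;     /* Dalvik frame pointer    */
        struct InterpSaveState *prev;   /* enclosing activation    */
    } InterpSaveState;

    typedef struct Thread {
        InterpSaveState interpSave;     /* saved/restored around nesting */
        /* Former InterpState members that are really thread-lifetime
         * constants or thread-global state live here, initialized once at
         * thread creation and reached through rSELF. */
        const void  *mainHandlerTable;
        const void  *jitTable;
        volatile int suspendCount;      /* stop bits are now always findable */
    } Thread;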
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This shifts responsibility for marking an object as "finalizable" from
object creation to object initialization. We want to make the object
finalizable when Object.<init> completes. For performance reasons we
skip the call to the Object constructor (which doesn't do anything)
and just take the opportunity to check the class flag.
Handling of clone()d objects isn't quite right yet.
Also, fixed a minor glitch in stubdefs.
Bug 3342343
Change-Id: I5b7b819079e5862dc9cbd1830bb445a852dc63bf
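A sketch of the flag check described above, with hypothetical names (the
CLASS_ISFINALIZABLE bit position and markFinalizable stand in for the real VM
symbols):

    #define CLASS_ISFINALIZABLE (1u << 31)          /* illustrative bit */

    typedef struct ClassObject { unsigned int flags; } ClassObject;
    typedef struct Object      { ClassObject *clazz; } Object;

    extern void markFinalizable(Object *obj);        /* hypothetical hook */

    /* Handler for invoke-object-init: skip the empty Object.<init> body
     * entirely and only do the work that matters here, i.e. flag the new
     * object as finalizable if its class declares finalize(). */
    static void handleInvokeObjectInit(Object *thisObj)
    {
        if (thisObj->clazz->flags & CLASS_ISFINALIZABLE)
            markFinalizable(thisObj);
        /* no call into Object.<init> is made */
    }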
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The polling is expensive for now as it is done through three
instructions: ld/ld/branch. As a result, a bunch of bonus stuff has
been worked on to mitigate the extra overhead:
- Cleaned up resource flags for memory disambiguation.
- Rewrote load/store elimination and scheduler routines to hide
the ld/ld latency for the GC flag. Separated the dependency checking into
a memory disambiguation part and a resource conflict part.
- Allowed code motion for Dalvik/constant/non-aliasing loads to be
hoisted above branches for null/range checks.
- Created extended basic blocks following goto instructions so that
longer instruction streams can be optimized as a whole.
Without the bonus stuff, performance dropped by about 5-10% on some
benchmarks because of the lack of headroom to hide the polling latency
in tight loops. With the bonus stuff, the performance delta is within
+/-5% with polling code generated. With the bonus stuff but polling
disabled, it provides consistent performance improvements:
CaffeineMark 3.6%
Linpack 11.1%
Scimark 9.7%
Sieve 33.0%
Checkers 6.0%
As a result, GC polling is disabled by default but can be turned on
through the -Xjitsuspendpoll flag for experimental purposes.
Change-Id: Ia81fc85de3e2b70e6cc93bc37c2b845892003cdb
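The ld/ld/branch poll mentioned above, sketched as C with assumed field names;
the generated ARM code is the moral equivalent of this (one load of the
pointer, one load of the flag, one conditional branch):

    typedef struct Thread {
        /* Points at either a global or a thread-local suspend flag; the
         * extra indirection is why the poll costs two loads plus a branch. */
        const volatile int *pSuspendFlag;
    } Thread;

    extern void handleSuspendRequest(Thread *self);   /* hypothetical callout */

    static inline void gcPoll(Thread *self)
    {
        if (*self->pSuspendFlag != 0)    /* ld (pointer), ld (flag), branch */
            handleSuspendRequest(self);
    }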
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
| |
The invoke-direct-empty instruction was introduced to remove the
overhead of calling the empty Object constructor. We now need it
to do some extra work on behalf of object construction, so it's
appropriate to change the instruction name to match the role it
fills rather than the more general role it was hoped to fill.
No functional changes.
Bug 3342343
Change-Id: I65dd6a2c00c99581c9a19b16fe193b70642c8fbb
|
| |
|
|
|
|
|
|
|
| |
- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE
Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Closes a window in which the "interpret-only" template could get chained
to an existing trace while the intended translation was under construction.
Note that this CL also introduces some small, but fundamental changes in trace
formation:
1. Previously, when an exception or other trace-terminating event
occurred during trace formation, the entire trace was abandoned. With this
change, we instead end the trace at the last successful instruction.
2. We previously allowed multiple attempts (perhaps by multiple threads)
to form a trace compilation request for a dalvik PC. This was done in an
attempt to allow recovery from compiler failures. Now we enforce a new rule:
only the thread that wins the race to allocate an entry in the JitTable will
form the trace request.
3. In a (probably misguided) attempt to avoid unnecessary contention, we
previously allowed work order enqueue requests to be dropped if a requester
did not acquire TableLock on the first attempt (assuming that if the trace were
hot, it would be requested again). Now we block on enqueue.
Change-Id: I40ea4f1b012250219ca37d5c40c5f22cae2092f1
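A hedged sketch of rule 2 above, using C11 atomics and invented names
(JitEntry and claimJitEntry are not the real JitTable code):

    #include <stdatomic.h>
    #include <stdbool.h>

    typedef struct JitEntry {
        _Atomic(const unsigned short *) dPC;   /* Dalvik PC owning this slot */
        /* ... translation address, chaining fields, etc. ... */
    } JitEntry;

    /* Only the thread that successfully installs dPC into the slot "wins"
     * and goes on to form the trace request; losers keep interpreting and
     * will simply reuse whatever translation the winner produces. */
    static bool claimJitEntry(JitEntry *slot, const unsigned short *dPC)
    {
        const unsigned short *expected = NULL;
        return atomic_compare_exchange_strong(&slot->dPC, &expected, dPC);
    }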
|
| |/
|
|
|
|
|
| |
This feature has been in the code base for several releases but has never
been enabled.
Change-Id: Ia770b03ebc90a3dc7851c0cd8ef301f9762f50db
|
| |
|
|
|
|
|
|
|
|
| |
Enhanced code cache management to accommodate both trace and method
compilations. Also implemented a hacky dispatch routine for virtual
leaf methods.
Microbenchmark showed 3x speedup in leaf method invocation.
Change-Id: I79d95b7300ba993667b3aa221c1df9c7b0583521
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I wanted the code to JIT a call to a C function extracted so I can potentially
use it elsewhere. The functions that sometimes JIT instructions directly and
other times bail out to C can now call this, simplifying the body of the
switch. I think there's a behavioral change here with the ThumbVFP
genInlineSqrt, which previously had the wrong return value.
Tested on passion to ensure that the performance characteristics of assembler
intrinsics, C intrinsics, and library native methods haven't changed (using
the Math and Float classes).
Change-Id: Id79771a31abe3a516f403486454e9c0d9793622a
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change builds on an earlier bccheng change that allowed JIT'd code
to avoid reverting to the debug portable interpreter when doing traceview-style
method profiling. That CL introduced a new traceview build (libdvm_traceview)
because the performance delta was too great to enable the capability for
all builds.
In this CL, we remove the libdvm_traceview build and provide full-speed
method tracing in all builds. This is done by introducing "_PROF"
versions of invoke and return templates used by the JIT. Normally, these
templates are not used, and performace in unaffected. However, when method
profiling is enabled, all existing translation are purged and new translations
are created using the _PROF templates. These templates introduce a
smallish performance penalty above and beyond the actual tracing cost, but
again are only used when tracing has been enabled.
Strictly speaking, there is a slight burden that is placed on invokes and
returns in the non-tracing case - on the order of an additional 3 or 4
cycles per invoke/return. Those operations are already heavyweight enough
that I was unable to measure the added cost in benchmarks.
Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
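A sketch of how the template selection might look, with invented names (the
real template tables and the global trace flag are not shown in this message):

    /* Two parallel template tables: the normal invoke/return templates and
     * the _PROF counterparts that additionally emit method-trace records. */
    extern const void *invokeTemplates[];       /* hypothetical */
    extern const void *invokeTemplatesProf[];   /* hypothetical */

    extern int methodTraceActive;               /* hypothetical global flag */

    /* When method profiling is toggled, existing translations are purged and
     * recompiled; at compile time each invoke/return picks its template
     * based on the current tracing state. */
    static const void *selectInvokeTemplate(int templateIndex)
    {
        const void **table = methodTraceActive ? invokeTemplatesProf
                                               : invokeTemplates;
        return table[templateIndex];
    }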
|
| |\
| |
| |
| |
| | |
* commit '45e9a9908f8874b64294dbd3e4dcfb6b76c4b6e3':
Only generate debugging LIRs in verbose mode.
|
| | |
| |
| |
| |
| |
| |
| | |
This should reduce memory usage and JIT time a bit.
Affected opcodes: kArmPseudoSSARep and kArmPseudoDalvikByteCodeBoundary.
Change-Id: I18ce9338b8d258270df51a66f9dc98cd2d9dd0e8
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
This enables jumbo opcodes by default, and they will get used by the
current build without modification. Support has been added for arm, x86,
and the portable interpreter. x86-atom support is on the TODO list. This
commit also includes a test for the new jumbo opcodes.
Change-Id: Ic3f1b41b51645861c5196f76aaf0e96e727ea537
|
| |\|
| |
| |
| |
| |
| |
| | |
* commit 'af5aa1f4ce7eecc1b47a4c038cebb67d33f08f18':
Don't treat dvmJitToPatchPredictedChain as a Jit-to-Interp entry point.
|
| | |
| |
| |
| |
| |
| | |
It is just a native callout helper function.
Change-Id: I6398b6876f5ba579b76e732107157a4c99337796
|
| |\|
| |
| |
| |
| | |
* commit 'a85893356ac4d86ef7d7dd18807d7bef95d7dddb':
[Jit] Fix for 3311468 Maps crashed at handleFmt...
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Change https://android-git.corp.google.com/g/#change,86452 eliminated unused
chaining cells for direct JNI calls. However, a code path in CodegenDriver.c
assumed all similar invokes would have such cells. Slightly re-arranged the
code to avoid relying on the existence of the cell in cases where it isn't
needed.
Change-Id: Ifc28acf559455a292b4b915ef1302085557e1d81
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparation for method compilation, this CL causes all traces to
include two entry points: profiling and non-profiling. For now, the
profiling entry will only be used if dalvik is run with -Xjitprofile,
and largely works like it did before. The difference is that profiling
support no longer requires the "assert" build - it's always there now.
This will enable us to do a form of sampling profiling of
traces in order to identify hot methods or hot trace groups,
while keeping the overhead low by only switching profiling on periodically.
To turn the periodic profiling on and off, we simply unchain all existing
translations and set the appropriate global profile state. The underlying
translation lookup and chaining utilties will examine the profile state to
determine which entry point to use (i.e. - profiling or non-profiling) while
the traces naturally rechain during further execution.
Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
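An illustrative sketch of the dual-entry arrangement (names invented): each
translation carries both entry points, and lookup/chaining picks one according
to the global profile state:

    typedef struct JitTranslation {
        void *entryNormal;      /* full-speed entry                    */
        void *entryProfiling;   /* entry that bumps the trace counters */
    } JitTranslation;

    extern int gProfileActive;  /* hypothetical global profile state */

    /* Used by translation lookup and by chaining; callers never care which
     * mode is active, they just receive the appropriate entry point, and
     * traces rechain naturally after an unchain-all. */
    static void *selectTraceEntry(const JitTranslation *t)
    {
        return gProfileActive ? t->entryProfiling : t->entryNormal;
    }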
|
| | |
| |
| |
| | |
Change-Id: If3fb3a36f33aaee8e5fdded4e9fa607be54f0bfb
|
| |/
|
|
| |
Change-Id: I06292964a6882ea2d0c17c5c962db95e46b01543
|
| |
|
|
|
|
|
|
| |
kNumDalvikInstructions is now kNumPackedOpcodes, there is a new
kMaxOpcodeValue, and both are generated by opcode-gen.
Change-Id: Ic46f1f52d2d21382452c8e777024f4a985ad31d3
Bonus: Reworded the switch and array data comment for clarity.
|
| |
|
|
|
|
|
| |
With this change, it's still implemented as an unused opcode, but
it's now ready for its new life!
Change-Id: Ic70d311704925067e47d87b657d133a792144e65
|
| |
|
|
|
|
|
|
|
| |
A lot of this is more about properties of opcodes as opposed to
inspecting instructions per se, and the new naming attempts to
make it clear what is being queried and what sort of data is being
returned.
Change-Id: Ice6f9f2ebf4f1cfa8c99597419aa13d1134a33b2
|
| |
|
|
|
|
|
|
|
|
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
|
|
|
|
|
| |
In particular, use it instead of just saying 256, and similarly for
255. The number of opcodes will be changing soon.
Change-Id: Icc77120c2673968dddd6b4003f717245d46e4159
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This included fixing all the accessor functions to refer to the
global ones defined in InstrUtils.[ch] instead of taking separate
"table pointer" arguments.
This did end up adding a few more truly global references to some of
the code paths, particularly when performing dex optimization, so I
went ahead and measured the time to do a cold first-boot both before
and after the change (on real hardware). The times were identical (to
one-second granularity), so I'm reasonably comfortable making this
change.
Change-Id: I604d9f7882bad4245bb11371218d13b06c3a5375
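A before/after sketch of the accessor change (function and table names here
are illustrative, not the exact InstrUtils symbols):

    typedef unsigned char InstructionWidth;

    /* Before: every caller had to thread a table pointer through. */
    static int instrWidthFromTable(const InstructionWidth *table, int opcode)
    {
        return table[opcode];
    }

    /* After: one global table, built once at startup, consulted directly. */
    extern const InstructionWidth gInstrWidthTable[];    /* hypothetical */

    static int instrWidth(int opcode)
    {
        return gInstrWidthTable[opcode];
    }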
|
| |
|
|
|
|
|
|
|
| |
At one point, returning a negative width for dexopt output was useful.
That stopped being the case a long time ago.
This also removes a bad assert that went into my previous checkin.
Change-Id: I18880c2316f5499a09dc479d271ca70b2a5be259
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This is in prep for -- recurring theme here -- adding the new extended
opcode formats. It turns out that we can avoid a lot of duplicated code
if we determine the type of thing referred to in index-bearing instructions
inside the general instruction decoder. To do so straightforwardly, this
means adding a new opcode info table and then passing it into the decoder.
Rather than add another argument to the decoder, I defined a struct to
contain all the info tables together, and a pointer to that can get passed
in.
I simplified the setting up of the info tables, too, so all the
allocation is handled within InstrUtils, rather than being (partially)
duplicated in a couple places. The only downside is that dexdump will
construct one more table than it actually needs, but given that
construction is quick and the table is only 256 bytes (though will
soon be growing to -- gasp! -- 294 bytes), I figure it's not such a
big deal.
Most of the files that changed only had edits for how to refer to these
info tables.
Change-Id: Ia6f1cb25da6e558ac90c6dd3af6bce36b82a6b4d
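A sketch of the bundling idea (type and field names invented for
illustration): collect the per-opcode tables in one struct and hand the
decoder a single pointer instead of another argument:

    typedef unsigned char InstructionWidth;
    typedef unsigned int  InstructionFlags;
    typedef unsigned char InstructionFormat;
    typedef unsigned char InstructionIndexType;  /* what an index refers to */

    typedef struct InstructionInfoTables {
        const InstructionFormat    *formats;
        const InstructionIndexType *indexTypes;
        const InstructionFlags     *flags;
        const InstructionWidth     *widths;
    } InstructionInfoTables;

    /* The decoder takes the whole bundle rather than growing its
     * argument list every time a new per-opcode table is added. */
    extern void decodeInstruction(const InstructionInfoTables *tables,
                                  const unsigned short *insns,
                                  void *decodedInsnOut);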
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In particular, I altered the naming of some instruction format fields
as well as the names of instruction formats themselves, all in an attempt
to make the implementation be a more straightforward match of the spec.
This patch mostly changes comments to reflect the new harmonized
reality. The only "code-like" change is the renaming of kFmt3inline
and kFmt3rinline to kFmt35mi and kFmt3rmi (respectively), which is
what they're called in the spec.
Bonus: Added the new extended opcode instruction formats to
InstrUtils.h, though I left them commented out for now.
Change-Id: I0109f361c1e9b6f0308c45e8cda5320e9ad3060c
|
| |
|
|
|
|
|
|
| |
Slight reworking of the memory barrier instruction generation to
generalize it, and then add "dmb st" for the new return-void-barrier
instruction.
Change-Id: Iad95aa5b0ba9b616a17dcbe4c6ca2e3906bb49dc
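A sketch of the generalization, with an invented enum and emitter (the real
codegen goes through its LIR machinery rather than these names):

    typedef enum BarrierKind {
        kBarrierFull,       /* "dmb"    : full load/store barrier          */
        kBarrierStoreOnly   /* "dmb st" : stores only; return-void-barrier */
    } BarrierKind;

    extern void emitAsm(const char *mnemonic);   /* hypothetical emitter */

    static void genMemBarrier(BarrierKind kind)
    {
        emitAsm(kind == kBarrierStoreOnly ? "dmb st" : "dmb");
    }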
|