| Commit message (Collapse) | Author | Age | Files | Lines |
| ... | |
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
This shifts responsibility for marking an object as "finalizable" from
object creation to object initialization. We want to make the object
finalizable when Object.<init> completes. For performance reasons we
skip the call to the Object constructor (which doesn't do anything)
and just take the opportunity to check the class flag.
Handling of clone()d objects isn't quite right yet.
Also, fixed a minor glitch in stubdefs.
Bug 3342343
Change-Id: I5b7b819079e5862dc9cbd1830bb445a852dc63bf
|
| |/
|
|
|
|
|
|
|
|
| |
The debugging profile mode prints out a list of the top ten traces,
followed by recompilations. In some cases, it is possible that a trace
was requested, but did not finish compiling before the run ended. If
so, that could cause the dump to fail. This CL adds a check for null
codeAddress to detect those cases.
Change-Id: I415fd94d8fa9e270f75d5114fa5cc5d993bd6997
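A minimal sketch of the kind of guard described above, assuming a simplified table entry. The `TraceEntry` struct, its `execCount` field, and `sumCompiledCounts` are hypothetical names for illustration and are not taken from the Dalvik sources; only the idea of testing `codeAddress` for null comes from the commit message.

```c
#include <stddef.h>

/* Hypothetical stand-in for a JIT table entry (not Dalvik's real layout). */
typedef struct {
    const void *codeAddress;  /* NULL until the trace has finished compiling */
    unsigned    execCount;    /* illustrative profile counter */
} TraceEntry;

/*
 * Sum the counters of traces that finished compiling. Entries whose
 * codeAddress is still NULL were requested but never completed, so they
 * are skipped instead of being dereferenced during the dump.
 */
static unsigned long sumCompiledCounts(const TraceEntry *entries, size_t n,
                                       size_t *skipped)
{
    unsigned long total = 0;
    size_t i;

    *skipped = 0;
    for (i = 0; i < n; i++) {
        if (entries[i].codeAddress == NULL) {
            (*skipped)++;
            continue;
        }
        total += entries[i].execCount;
    }
    return total;
}
```

The point of the sketch is only that the null test happens before any use of the entry; the real dump routine presumably does more per entry.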
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Even though execute-inline is now a mandatory optimization, you can't be sure
the inline natives will be invoked that way. There's reflection and JNI, for
example, and there's the special case of String.equals that might be invoked
as Object.equals. This patch adds a regular native method corresponding to
each inline native, so that a corresponding libcore patch can drop its
implementations. (For example, despite the fact that we all believed last week
that the Java implementation of String.equals is never used, that turned out
not to be true: every HashMap lookup will have used it. This pair of patches
brings reality in line with our existing belief.)
Change-Id: I19e64c23bea83e91696206ca40ce4e3faf853040
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
The polling is expensive for now as it is done through three
instructions: ld/ld/branch. As a result, a bunch of bonus stuff has
been worked on to mitigate the extra overhead:
- Cleaned up resource flags for memory disambiguation.
- Rewrote load/store elimination and scheduler routines to hide
the ld/ld latency for the GC flag. Separated the dependency checking into
a memory disambiguation part and a resource conflict part.
- Allowed code motion for Dalvik/constant/non-aliasing loads to be
hoisted above branches for null/range checks.
- Created extended basic blocks following goto instructions so that
longer instruction streams can be optimized as a whole.
Without the bonus stuff, performance dropped by about 5-10% on some
benchmarks because of the lack of headroom to hide the polling latency
in tight loops. With the bonus stuff, the performance delta is within
+/-5% with polling code generated. With the bonus stuff but disabling
polling, the new bonus stuff provides consistent performance
improvements:
CaffeineMark 3.6%
Linpack 11.1%
Scimark 9.7%
Sieve 33.0%
Checkers 6.0%
As a result, GC polling is disabled by default but can be turned on
through the -Xjitsuspendpoll flag for experimental purposes.
Change-Id: Ia81fc85de3e2b70e6cc93bc37c2b845892003cdb
|
| |\
| |
| |
| | |
into dalvik-dev
|
| | |
| |
| |
| |
| | |
Bug: 3448446
Change-Id: I98e2bbc4886443ba3c27c2963d7540fcee5790bb
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
| |
The invoke-direct-empty instruction was introduced to remove the
overhead of calling the empty Object constructor. We now need it
to do some extra work on behalf of object construction, so it's
appropriate to change the instruction name to match the role it
fills rather than the more general role it was hoped to fill.
No functional changes.
Bug 3342343
Change-Id: I65dd6a2c00c99581c9a19b16fe193b70642c8fbb
|
| |
|
|
|
|
|
|
|
| |
- Set up resource masks correctly for Thumb push/pop when LR/PC are involved.
- Preserve LR around simulated heap references under self-verification mode.
- Compact a few simple flags in ArmLIR into bit fields.
- Minor performance tuning in TEMPLATE_MEM_OP_DECODE
Change-Id: Id73edac837c5bb37dfd21f372d6fa21c238cf42a
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Closes a window in which the "interpret-only" template could get chained
to an existing trace while the intended translation was under construction.
Note that this CL also introduces some small, but fundamental changes in trace
formation:
1. Previously, when an exception or other trace terminating event
occurred during trace formation, the entire trace was abandoned. With this
change, we instead end the trace at the last successful instruction.
2. We previously allowed multiple attempts (perhaps by multiple threads)
to form a trace compilation request for a dalvik PC. This was done in an
attempt to allow recovery from compiler failures. Now we enforce a new rule:
only the thread that wins the race to allocate an entry in the JitTable will
form the trace request.
3. In a (probably misguided) attempt to avoid unnecessary contention, we
previously allowed work order enqueue requests to be dropped if a requester
did not acquire TableLock on first attempt (assuming that if the trace were
hot, it would be requested again). Now we block on enqueue.
Change-Id: I40ea4f1b012250219ca37d5c40c5f22cae2092f1
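Rule 2 above (only the thread that wins the allocation race forms the trace request) can be sketched with a compare-and-swap. This is an illustration using C11 atomics, not the atomic primitives Dalvik itself uses; `Slot`, its `dPC` field, and `claimSlot` are hypothetical names, and the value 0 is assumed to mean "free".

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table slot: 0 means "free", any other value is the claimed PC. */
typedef struct {
    _Atomic uintptr_t dPC;
} Slot;

/*
 * Try to claim the slot for the given PC. Exactly one caller can move the
 * slot from 0 to pc, so only that caller sees true and goes on to form the
 * trace request; every other caller sees false and backs off.
 */
static bool claimSlot(Slot *slot, uintptr_t pc)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong(&slot->dPC, &expected, pc);
}
```

A losing thread simply continues interpreting; under the assumption above it never retries the claim for the same slot.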
|
| |/
|
|
|
|
|
| |
This feature has been in the code base for several releases but has never
been enabled.
Change-Id: Ia770b03ebc90a3dc7851c0cd8ef301f9762f50db
|
| |
|
|
|
|
|
|
|
|
| |
Enhanced code cache management to accommodate both trace and method
compilations. Also implemented a hacky dispatch routine for virtual
leaf methods.
Microbenchmark showed 3x speedup in leaf method invocation.
Change-Id: I79d95b7300ba993667b3aa221c1df9c7b0583521
|
| |
|
|
|
|
| |
c++ dislikes variables named template.
Change-Id: I6aaf623b449bfdb0c88b9664c55824268992058d
|
| |
|
|
| |
Change-Id: I366df47ebb597a629cb50046320ee3a6230d1ed9
|
| |
|
|
|
|
|
|
|
|
| |
1) Thumb 'push' can handle lr and 'pop' can handle pc, so make use of them.
2) Thumb2 push was incorrectly encoded as stmia, which should be stmdb
instead.
None of the above affect the code that we currently ship.
Change-Id: I89ab46b032a3d562355c2cc3bc05fe308ba40957
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
I wanted the code that JITs a call to a C function extracted so I can potentially
use it elsewhere. The functions that sometimes JIT instructions directly and
other times bail out to C can now call this, simplifying the body of the
switch. I think there's a behavioral change here with the ThumbVFP
genInlineSqrt, which previously had the wrong return value.
Tested on passion to ensure that the performance characteristics of assembler
intrinsics, C intrinsics, and library native methods haven't changed (using
the Math and Float classes).
Change-Id: Id79771a31abe3a516f403486454e9c0d9793622a
|
| |/
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This change builds on an earlier bccheng change that allowed JIT'd code
to avoid reverting to the debug portable interpreter when doing traceview-style
method profiling. That CL introduced a new traceview build (libdvm_traceview)
because the performance delta was too great to enable the capability for
all builds.
In this CL, we remove the libdvm_traceview build and provide full-speed
method tracing in all builds. This is done by introducing "_PROF"
versions of invoke and return templates used by the JIT. Normally, these
templates are not used, and performance is unaffected. However, when method
profiling is enabled, all existing translations are purged and new translations
are created using the _PROF templates. These templates introduce a
smallish performance penalty above and beyond the actual tracing cost, but
again are only used when tracing has been enabled.
Strictly speaking, there is a slight burden that is placed on invokes and
returns in the non-tracing case - on the order of an additional 3 or 4
cycles per invoke/return. Those operations are already heavyweight enough
that I was unable to measure the added cost in benchmarks.
Change-Id: Ic09baf4249f1e716e136a65458f4e06cea35fc18
|
| |\
| |
| |
| |
| | |
* commit '45e9a9908f8874b64294dbd3e4dcfb6b76c4b6e3':
Only generate debugging LIRs in verbose mode.
|
| | |
| |
| |
| |
| |
| |
| | |
This should reduce memory usage and JIT time a bit.
Affected opcodes: kArmPseudoSSARep and kArmPseudoDalvikByteCodeBoundary.
Change-Id: I18ce9338b8d258270df51a66f9dc98cd2d9dd0e8
|
| | |
| |
| |
| |
| |
| |
| |
| |
| | |
This enables jumbo opcodes by default, and they will get used by the
current build without modification. Support has been added for arm, x86,
and the portable interpreter. x86-atom support is on the TODO list. This
commit also includes a test for the new jumbo opcodes.
Change-Id: Ic3f1b41b51645861c5196f76aaf0e96e727ea537
|
| |\|
| |
| |
| | |
Change-Id: I56b52104f50d2e67115227e61e4b250e1116135d
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Only register entry points dispatched through [r6+#offset] in
JitToInterpEntries.
For ARM targets check the size of JitToInterpEntries explicitly to
make sure that its last entry is within 128 bytes of InterpState
due to the Thumb codegen constraint.
Change-Id: I74184115cb3a3c89afc3a5fe53685671d9cb1027
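One way to express such a size check, sketched with stand-in types. `Entries`, `State`, `MAX_THUMB_OFFSET`, and `lastEntryOffset` are hypothetical and do not reflect the real layout of InterpState or JitToInterpEntries; the actual check may well be done at runtime rather than at compile time.

```c
#include <stddef.h>

/* Hypothetical table of entry points reached via [base + #offset]. */
typedef struct {
    void (*normal)(void);
    void (*select)(void);
    void (*punt)(void);      /* last entry in this illustrative table */
} Entries;

/* Hypothetical interpreter state that embeds the table. */
typedef struct {
    int     pc;
    int     fp;
    Entries entries;
} State;

/* Assumed limit taken from the commit message: 128 bytes from the base. */
#define MAX_THUMB_OFFSET 128

/* Compile-time check (C11): the whole table must end within the limit. */
_Static_assert(offsetof(State, entries) + sizeof(Entries) <= MAX_THUMB_OFFSET,
               "entry table extends past the reachable offset range");

/* Byte offset of the last entry from the start of State. */
static size_t lastEntryOffset(void)
{
    return offsetof(State, entries) + offsetof(Entries, punt);
}
```

Checking the end of the table rather than each field means that adding a new entry point fails the build as soon as it would fall out of range.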
|
| |\|
| |
| |
| |
| |
| |
| | |
entry point.
* commit 'af5aa1f4ce7eecc1b47a4c038cebb67d33f08f18':
Don't treat dvmJitToPatchPredictedChain as a Jit-to-Interp entry point.
|
| | |
| |
| |
| |
| |
| | |
It is just a native callout helper function.
Change-Id: I6398b6876f5ba579b76e732107157a4c99337796
|
| |\|
| |
| |
| |
| | |
* commit 'a85893356ac4d86ef7d7dd18807d7bef95d7dddb':
[Jit] Fix for 3311468 Maps crashed at handleFmt...
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Change https://android-git.corp.google.com/g/#change,86452 eliminated unused
chaining cells for direct JNI calls. However, a code path in CodegenDriver.c
assumed all similar invokes would have such cells. Slightly rearranged the
code to avoid relying on the existence of the cell in cases in which it isn't
needed.
Change-Id: Ifc28acf559455a292b4b915ef1302085557e1d81
|
| |\|
| |
| |
| |
| | |
* commit '0d1aac383a4bdce9feaad2f614df42234c2dcced':
Revert "Remove inline natives for an unused performance test."
|
| | |
| |
| |
| |
| |
| | |
This reverts commit 7ecd89dc02ce00c425788bd4989bdb6cde9a618a.
Change-Id: I427635b7e3f7be45cfde78b8046dab3b23b64562
|
| |\|
| |
| |
| |
| | |
* commit '7ecd89dc02ce00c425788bd4989bdb6cde9a618a':
Remove inline natives for an unused performance test.
|
| | |
| |
| |
| | |
Change-Id: I80cfb918bdf174aeb6de83909c840563f6b945dd
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
In preparation for method compilation, this CL causes all traces to
include two entry points: profiling and non-profiling. For now, the
profiling entry will only be used if dalvik is run with -Xjitprofile,
and largely works like it did before. The difference is that profiling
support no longer requires the "assert" build - it's always there now.
This will enable us to do a form of sampling profiling of
traces in order to identify hot methods or hot trace groups,
while keeping the overhead low by only switching profiling on periodically.
To turn the periodic profiling on and off, we simply unchain all existing
translations and set the appropriate global profile state. The underlying
translation lookup and chaining utilities will examine the profile state to
determine which entry point to use (i.e. - profiling or non-profiling) while
the traces naturally rechain during further execution.
Change-Id: I9ee33e69e33869b9fab3a57e88f9bc524175172b
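The entry-point selection described above might look roughly like the following. `ProfileState`, `Translation`, and `selectEntry` are hypothetical names chosen for this sketch; the commit message only says that lookup and chaining consult a global profile state to pick between the profiling and non-profiling entries.

```c
#include <stddef.h>

/* Hypothetical global profile state. */
typedef enum { PROFILE_OFF = 0, PROFILE_ON = 1 } ProfileState;

/* Hypothetical view of a translation with its two entry points. */
typedef struct {
    const void *profilingEntry;  /* entry that updates profile counters */
    const void *normalEntry;     /* entry that skips the profiling prologue */
} Translation;

/*
 * Pick the address to chain to or dispatch through. Because callers ask
 * again whenever traces rechain after an unchain-all, flipping the state
 * is enough to migrate execution between the two entry points.
 */
static const void *selectEntry(const Translation *t, ProfileState state)
{
    if (t == NULL) {
        return NULL;  /* no translation yet: caller falls back to interpreting */
    }
    return (state == PROFILE_ON) ? t->profilingEntry : t->normalEntry;
}
```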
|
| | |
| |
| |
| |
| |
| |
| |
| | |
Remove vestiges of code intended for linear scan register allocation
in the trace compiler. New plan is to stick with local allocation for
traces and build a new linear scan allocator for the method compiler.
Change-Id: Ic265ab5a7936b144cbe7fa4dc667fa7aba579045
|
| |\ \ |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | | |
Nuked void* cast warnings and moved cacheflush into a target-specific
utility wrapper.
Change-Id: I36c841288b9ec7e03c0cb29b2e89db344f36fad1
|
| |/ /
| |
| |
| | |
Change-Id: I8828bc628f110aaade578a197bf1f51b30bf1be7
|
| | |
| |
| |
| | |
Change-Id: If3fb3a36f33aaee8e5fdded4e9fa607be54f0bfb
|
| |/
|
|
| |
Change-Id: I06292964a6882ea2d0c17c5c962db95e46b01543
|
| |
|
|
|
|
|
|
| |
kNumDalvikInstructions is now kNumPackedOpcodes, there is a new
kMaxOpcodeValue, and both are generated by opcode-gen.
Change-Id: Ic46f1f52d2d21382452c8e777024f4a985ad31d3
Bonus: Reworded the switch and array data comment for clarity.
|
| |
|
|
|
|
|
| |
With this change, it's still implemented as an unused opcode, but
it's now ready for its new life!
Change-Id: Ic70d311704925067e47d87b657d133a792144e65
|
| |
|
|
|
|
|
|
|
| |
A lot of this is more about properties of opcodes as opposed to
inspecting instructions per se, and the new naming attempts to
make it clear what is being queried and what sort of data is being
returned.
Change-Id: Ice6f9f2ebf4f1cfa8c99597419aa13d1134a33b2
|
| |
|
|
|
|
|
|
|
|
| |
Similarly "Opcode" not "OpCode".
This appears to be the general worldwide consensus on the matter. Other
residents of my office didn't seem to mind one way or the other how it's
spelled in our code, but for whatever reason, it really bugged me.
Change-Id: Ia0b73d19c54aefc0f543a9c9451dda22ee876a59
|
| |
|
|
|
|
|
|
| |
Also incorporate the former contents of OpCodeNames.h. This is a small
attempt to increase naming consistency in libdex. There will be a bit
more to come, in a follow-up.
Change-Id: Ia7ab06042dde2e19eda02ef1fee72fb4260e899d
|
| |
|
|
|
|
|
| |
In particular, use it instead of just saying 256, and similarly for
255. The number of opcodes will be changing soon.
Change-Id: Icc77120c2673968dddd6b4003f717245d46e4159
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This included fixing all the accessor functions to refer to the
global ones defined in InstrUtils.[ch] instead of taking separate
"table pointer" arguments.
This did end up adding a few more truly global references to some of
the code paths, particularly when performing dex optimization, so I
went ahead and measured the time to do a cold first-boot both before
and after the change (on real hardware). The times were identical (to
one-second granularity), so I'm reasonably comfortable making this
change.
Change-Id: I604d9f7882bad4245bb11371218d13b06c3a5375
|
| |
|
|
|
|
|
|
|
| |
At one point, returning a negative width for dexopt output was useful.
That stopped being the case a long time ago.
This also removes a bad assert that went into my previous checkin.
Change-Id: I18880c2316f5499a09dc479d271ca70b2a5be259
|
| |\
| |
| |
| |
| | |
* commit '72ef412b56becfbdd54f239ea672a48b163ff1d2':
Fix uninitialized variable warning(error).
|
| | |
| |
| |
| | |
Change-Id: I10815b033199758976950b28af7cc412f093a7f5
|