| Commit message | Author | Age | Files | Lines |
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
cpu_power has been added to keep track of the amount of power each task is
consuming. cpu_power is updated whenever stime and utime are updated for
a task. Power is computed by taking into account the frequency at which
the current core was running and the current drawn by the CPU while
actively running at that frequency.
Bug: 21498425
Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
Git-commit: 94877641f6b6ea17aa335729f548eb5647db3e3e
Git-repo: https://android.googlesource.com/kernel/msm/
Signed-off-by: Nirmal Abraham <nabrah@codeaurora.org>
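A minimal user-space sketch of the accounting idea described above, assuming a hypothetical frequency-to-current table (the real accounting happens in the kernel's cputime path; all names and values here are illustrative):

    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical table: active current (mA) drawn at each supported frequency. */
    struct freq_current { uint32_t khz; uint32_t current_ma; };

    static const struct freq_current table[] = {
        { 300000, 50 }, { 960000, 120 }, { 1497600, 250 },  /* illustrative values */
    };

    /* Charge consumed over one tick, in mA*ms, at the frequency the core ran at. */
    static uint64_t charge_for_tick(uint32_t cur_khz, uint32_t tick_ms)
    {
        for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
            if (table[i].khz == cur_khz)
                return (uint64_t)table[i].current_ma * tick_ms;
        return 0;
    }

    int main(void)
    {
        uint64_t cpu_power = 0;                    /* per-task accumulator, as in the commit */
        cpu_power += charge_for_tick(960000, 10);  /* charged when utime/stime is updated */
        printf("accumulated charge: %llu mA*ms\n", (unsigned long long)cpu_power);
        return 0;
    }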
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
sched_get_nr_running_avg() returns average nr_running and nr_iowait
task count since it was last invoked. Fix several bugs in their
calculation.
* sched_update_nr_prod() needs to consider that nr_running count can
change by more than 1 when CFS_BANDWIDTH feature is used
* sched_get_nr_running_avg() needs to sum up nr_iowait count across
all cpus, rather than just one
* sched_get_nr_running_avg() could race with sched_update_nr_prod(),
as a result of which it could use curr_time which is behind a cpu's
'last_time' value. That would lead to erroneous calculation of
average nr_running or nr_iowait.
While at it, also fix a bug in the BUG_ON() check in the
sched_update_nr_prod() function and remove the unnecessary nr_running
argument to that function.
Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
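An illustrative single-CPU sketch of the time-weighted averaging being fixed above; the names, the x100 scaling, and the structure are assumptions for illustration, not the kernel code:

    #include <stdint.h>
    #include <stdio.h>

    struct nr_stats {
        uint64_t last_time;    /* last update timestamp */
        uint64_t nr_prod_sum;  /* sum of nr_running * elapsed time */
        unsigned int nr;       /* current nr_running */
    };

    /* Called whenever nr_running changes; delta may be > 1 (cf. CFS_BANDWIDTH). */
    static void update_nr_prod(struct nr_stats *s, int delta, uint64_t now)
    {
        s->nr_prod_sum += (uint64_t)s->nr * (now - s->last_time);
        s->last_time = now;
        s->nr += delta;
    }

    /* Average nr_running (scaled by 100 for integer math) since last_read. */
    static unsigned int get_nr_running_avg(struct nr_stats *s, uint64_t now,
                                           uint64_t last_read)
    {
        s->nr_prod_sum += (uint64_t)s->nr * (now - s->last_time);
        s->last_time = now;
        unsigned int avg = (unsigned int)(s->nr_prod_sum * 100 / (now - last_read));
        s->nr_prod_sum = 0;
        return avg;
    }

    int main(void)
    {
        struct nr_stats s = { .last_time = 0, .nr = 1 };
        update_nr_prod(&s, +2, 500);   /* nr_running jumps by 2 at t=500 */
        printf("avg x100 = %u\n", get_nr_running_avg(&s, 1000, 0));  /* prints 200 */
        return 0;
    }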
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Each time a task wakes up, the scheduler evaluates its load and notifies
the governor if the resulting frequency of the destination CPU is larger
than a threshold. However, some governors wake up a separate task that
handles the frequency change, which again calls wake_up_process().
This is dangerous because if the task being woken up meets the
threshold and ends up being moved around, there is a potential for
endless recursive notifications.
Introduce a new API for waking up a task without triggering
frequency notification.
Change-Id: I24261af81b7dc410c7fb01eaa90920b8d66fbd2a
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
|
| |
|
|
|
|
|
| |
Add trace events for core control module.
Change-Id: I36da5381709f81ef1ba82025cd9cf8610edef3fc
Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
|
| |
|
|
|
|
|
|
|
|
|
|
| |
Extend sched_get_nr_running_avg() API to return average nr_big_tasks,
in addition to average nr_running and average nr_io_wait tasks. Also
add a new trace point to record values returned by
sched_get_nr_running_avg() API.
Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
[rameezmustafa@codeaurora.org: Resolve trivial merge conflicts]
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
| |
|
|
| |
Change-Id: Ic50c8f4aea56d08a791d1a2b07fe69515757df22
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There have been a few reported issues wrt. the lack of locking around
changing event->ctx. This patch tries to address those.
It avoids the whole rwsem thing; and while it appears to work, please
give it some thought in review.
What I did fail at is sensible runtime checks on the use of
event->ctx, the RCU use makes it very hard.
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
(cherry picked from commit f63a8daa5812afef4f06c962351687e1ff9ccb2b)
Bug: 30955111
Bug: 31095224
Change-Id: I5bab713034e960fad467637e98e914440de5666d
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When moving a group_leader perf event from a software-context
to a hardware-context, there's a race in checking and
updating that context. The existing locking solution
doesn't work; note that it tries to grab a lock inside
the group_leader's context object, which you can only
get at by going through a pointer that should be protected
from these races. To avoid that problem, and to produce
a simple solution, we can just use a lock per group_leader
to protect all checks on the group_leader's context.
The new lock is grabbed and released when no context locks
are held.
Bug: 30955111
Bug: 31095224
Change-Id: If37124c100ca6f4aa962559fba3bd5dbbec8e052
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry picked from commit 43761473c254b45883a64441dd0bc85a42f3645c)
There is a double fetch problem in audit_log_single_execve_arg()
where we first check the execve(2) arguments for any "bad" characters
which would require hex encoding and then re-fetch the arguments for
logging in the audit record[1]. Of course this leaves a window of
opportunity for an unsavory application to munge with the data.
This patch reworks things by only fetching the argument data once[2]
into a buffer where it is scanned and logged into the audit
records(s). In addition to fixing the double fetch, this patch
improves on the original code in a few other ways: better handling
of large arguments which require encoding, stricter record length
checking, and some performance improvements (completely unverified,
but we got rid of some strlen() calls, that's got to be a good
thing).
As part of the development of this patch, I've also created a basic
regression test for the audit-testsuite, the test can be tracked on
GitHub at the following link:
* https://github.com/linux-audit/audit-testsuite/issues/25
[1] If you pay careful attention, there is actually a triple fetch
problem due to a strnlen_user() call at the top of the function.
[2] This is a tiny white lie, we do make a call to strnlen_user()
prior to fetching the argument data. I don't like it, but due to the
way the audit record is structured we really have no choice unless we
copy the entire argument at once (which would require a rather
wasteful allocation). The good news is that with this patch the
kernel no longer relies on this strnlen_user() value for anything
beyond recording it in the log, we also update it with a trustworthy
value whenever possible.
Reported-by: Pengfei Wang <wpengfeinudt@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Paul Moore <paul@paul-moore.com>
Change-Id: I10e979e94605e3cf8d461e3e521f8f9837228aa5
Bug: 30956807
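A user-space analog of the single-fetch pattern described above (an illustration, not the audit code): copy the untrusted data once into a local buffer, then both scan it and emit it from that same snapshot so the check and the use cannot diverge.

    #include <ctype.h>
    #include <stddef.h>
    #include <stdio.h>
    #include <string.h>

    /* Decide whether the data needs hex encoding and log it, using one
     * snapshot so a concurrent writer cannot change it between check and use. */
    static void log_arg_once(const char *untrusted, char *snapshot, size_t len)
    {
        memcpy(snapshot, untrusted, len);   /* single fetch */
        snapshot[len] = '\0';

        int needs_hex = 0;
        for (size_t i = 0; i < len; i++)
            if (!isprint((unsigned char)snapshot[i]) || snapshot[i] == '"')
                needs_hex = 1;

        if (needs_hex) {
            for (size_t i = 0; i < len; i++)
                printf("%02X", (unsigned char)snapshot[i]);
            putchar('\n');
        } else {
            printf("\"%s\"\n", snapshot);
        }
    }

    int main(void)
    {
        char buf[64];
        log_arg_once("hello world", buf, strlen("hello world"));
        return 0;
    }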
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
There's a race on CPU unplug where we free the swevent hash array
while it can still have events on. This will result in a
use-after-free which is BAD.
Simply do not free the hash array on unplug. This leaves the thing
around and no use-after-free takes place.
When the last swevent dies, we do a for_each_possible_cpu() iteration
anyway to clean these up, at which time we'll free it, so no leakage
will occur.
Change-Id: I368c84b9d9c77c82121358a8fd29a5b012364321
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
When kernel.perf_event_paranoid is set to 3 (or greater), disallow all
access to performance events by users without CAP_SYS_ADMIN.
Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that
makes this value the default.
This is based on a similar feature in grsecurity
(CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making
the variable read-only. It also allows enabling further restriction
at run-time regardless of whether the default is changed.
https://lkml.org/lkml/2016/1/11/587
Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
Bug: 29054680
Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
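A small user-space check of the behavior described above (a minimal sketch, error handling kept short): with kernel.perf_event_paranoid at 3 and no CAP_SYS_ADMIN, perf_event_open() is expected to fail.

    #include <errno.h>
    #include <linux/perf_event.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        struct perf_event_attr attr;

        memset(&attr, 0, sizeof(attr));
        attr.size = sizeof(attr);
        attr.type = PERF_TYPE_SOFTWARE;
        attr.config = PERF_COUNT_SW_CPU_CLOCK;

        /* Count our own CPU clock; glibc has no wrapper, so use the raw syscall. */
        long fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
        if (fd < 0)
            printf("perf_event_open failed: %s (expected when restricted)\n",
                   strerror(errno));
        else
            printf("perf_event_open succeeded, fd=%ld\n", fd);
        return 0;
    }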
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
We're copying the on-stack structure to userspace, but forgot to give
the right number of bytes to copy. This allows the calling process to
obtain up to PAGE_SIZE bytes from the stack (and possibly adjacent
kernel memory).
This fix copies only as much as we actually have on the stack
(attr->size defaults to the size of the struct) and leaves the rest of
the userspace-provided buffer untouched.
Found using kmemcheck + trinity.
Fixes: d50dde5a10f30 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI")
Bug: 28731691
Change-Id: Iaf2ecb4d2b0fa9df09539f2fc9ad6fd123d87aa2
Cc: Dario Faggioli <raistlin@linux.it>
Cc: Juri Lelli <juri.lelli@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1392585857-10725-1-git-send-email-vegard.nossum@oracle.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
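A user-space sketch that exercises the interface whose copy-out path is fixed here; the struct layout below is the original sched_attr ABI, and the raw syscall is used since no glibc wrapper is assumed.

    #include <stdint.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Original sched_attr layout from the extended scheduling ABI. */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;
        uint64_t sched_deadline;
        uint64_t sched_period;
    };

    int main(void)
    {
        struct sched_attr attr = { 0 };

        /* With the fix, the kernel copies back only what it actually filled
         * in (attr.size bytes), never a full page from its stack. */
        if (syscall(__NR_sched_getattr, 0, &attr, sizeof(attr), 0) != 0) {
            perror("sched_getattr");
            return 1;
        }
        printf("size=%u policy=%u nice=%d\n",
               attr.size, attr.sched_policy, attr.sched_nice);
        return 0;
    }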
|
| |
|
|
|
|
|
|
|
|
|
|
| |
This deliberately changes the behavior of the per-cpuset
cpus file to not be affected by hotplug. When a cpu is offlined,
it will be removed from the cpuset/cpus file. When a cpu is onlined,
if the cpuset originally requested that the cpu be part of the cpuset,
the cpu will be restored to the cpuset. The cpus files still
have to be hierarchical, but the ranges no longer have to be drawn from
the currently online cpus, only from the physically present cpus.
Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924
|
| |
|
|
| |
Change-Id: Ic1b61b2bbb7ce74c9e9422b5e22ee9078251de21
|
| |
|
|
|
|
|
|
|
| |
Declare war on uninterruptible sleep. Add a tracepoint which
walks the kernel stack and dumps the first non-scheduler function
called before the scheduler is invoked.
Change-Id: I19e965d5206329360a92cbfe2afcc8c30f65c229
Signed-off-by: Riley Andrews <riandrews@google.com>
|
| |
|
|
|
|
|
|
|
|
|
| |
prctl_set_vma_anon_name could attempt to set the name across
two vmas at the same time due to a typo, which might corrupt
the vma list. Fix it to use tmp instead of end to limit
the name setting to a single vma at a time.
Change-Id: Ie32d8ddb0fd547efbeedd6528acdab5ca5b308b4
Reported-by: Jed Davis <jld@mozilla.com>
Signed-off-by: Colin Cross <ccross@android.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry pick from commit 73af963f9f3036dffed55c3a2898598186db1045)
__ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task !=
current, this can lead to surprising results.
For example, a sub-thread can't readlink("/proc/self/exe") if the
executable is not readable. setup_new_exec()->would_dump() notices that
inode_permission(MAY_READ) fails and then it does
set_dumpable(suid_dumpable). After that get_dumpable() fails.
(It is not clear why proc_pid_readlink() checks get_dumpable(), perhaps we
could add PTRACE_MODE_NODUMPABLE)
Change __ptrace_may_access() to use same_thread_group() instead of "task
== current". Any security check is pointless when the tasks share the
same ->mm.
Signed-off-by: Mark Grondona <mgrondona@llnl.gov>
Signed-off-by: Ben Woodard <woodard@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Bug: 26016905
Change-Id: If9e2a0eb3339d26d50a9d84671a189fe405f36a3
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Prevent an unintended kernel NULL pointer dereference.
Code:
hlist_del_rcu(&event->hlist_entry);
Fix: add a pointer check before the deletion:
if (!hlist_unhashed(&p_event->hlist_entry))
        hlist_del_rcu(&p_event->hlist_entry);
Bug: 25364034
Change-Id: Ib13a7400d4a36a4b08b0afc9b7d69c6027e741b6
Signed-off-by: Yuan Lin <yualin@google.com>
Signed-off-by: Christian Bejram <cbejram@google.com>
(cherry picked from commit 8e04ed2bc057656d343a28ab8b01718371ca9f42)
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
(cherry picked from commit https://lkml.org/lkml/2015/12/21/337)
ASLR only uses as few as 8 bits to generate the random offset for the
mmap base address on 32 bit architectures. This value was chosen to
prevent a poorly chosen value from dividing the address space in such
a way as to prevent large allocations. This may not be an issue on all
platforms. Allow the specification of a minimum number of bits so that
platforms desiring greater ASLR protection may determine where to place
the trade-off.
Bug: 24047224
Signed-off-by: Daniel Cashman <dcashman@android.com>
Signed-off-by: Daniel Cashman <dcashman@google.com>
Change-Id: I66ac01c6f4f2c8dcfc84d1f1e99490b8385b3ed4
(cherry picked from commit 00ead9ddada26be1726539b1ead14abf974d235d)
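A quick illustration of what the number of randomization bits buys, assuming 4 KiB pages (the tunable and its limits are arch-specific):

    #include <stdio.h>

    int main(void)
    {
        const unsigned long page_size = 4096;   /* assumed page size */
        const unsigned int bits[] = { 8, 16, 24 };

        /* The mmap base can land on any of 2^bits page-aligned offsets,
         * so the randomized span is 2^bits pages. */
        for (unsigned int i = 0; i < 3; i++) {
            unsigned long span = (1UL << bits[i]) * page_size;
            printf("%2u bits -> %lu MiB of mmap base randomization\n",
                   bits[i], span >> 20);
        }
        return 0;
    }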
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
while_each_thread() and next_thread() should die, almost every lockless
usage is wrong.
1. Unless g == current, the lockless while_each_thread() is not safe.
while_each_thread(g, t) can loop forever if g exits, next_thread()
can't reach the unhashed thread in this case. Note that this can
happen even if g is the group leader, it can exec.
2. Even if while_each_thread() itself was correct, people often use
it wrongly.
It was never safe to just take rcu_read_lock() and loop unless
you verify that pid_alive(g) == T, even the first next_thread()
can point to the already freed/reused memory.
This patch adds signal_struct->thread_head and task->thread_node to
create the normal rcu-safe list with the stable head. The new
for_each_thread(g, t) helper is always safe under rcu_read_lock() as
long as this task_struct can't go away.
Note: of course it is ugly to have both task_struct->thread_node and the
old task_struct->thread_group, we will kill it later, after we change
the users of while_each_thread() to use for_each_thread().
Perhaps we can kill it even before we convert all users, we can
reimplement next_thread(t) using the new thread_head/thread_node. But
we can't do this right now because this will lead to subtle behavioural
changes. For example, do/while_each_thread() always sees at least one
task, while for_each_thread() can do nothing if the whole thread group
has died. Or thread_group_empty(), currently its semantics is not clear
unless thread_group_leader(p) and we need to audit the callers before we
can change it.
So this patch adds the new interface which has to coexist with the old
one for some time, hopefully the next changes will be more or less
straightforward and the old one will go away soon.
Bug 200004307
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
Tested-by: Sergey Dyasly <dserrg@gmail.com>
Reviewed-by: Sameer Nanda <snanda@chromium.org>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mandeep Singh Baines <msb@chromium.org>
Cc: "Ma, Xindong" <xindong.ma@intel.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
(cherry picked from commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c)
Signed-off-by: Sri Krishna chowdary <schowdary@nvidia.com>
Change-Id: Id689cb1383ceba2561b66188d88258619b68f5c6
Reviewed-on: http://git-master/r/419041
Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
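A kernel-context sketch (not a standalone program) of the usage pattern the new helper enables; the walk is safe under rcu_read_lock() as long as the group leader's task_struct is pinned:

    #include <linux/rcupdate.h>
    #include <linux/sched.h>

    /* Count the threads in g's thread group with the new rcu-safe helper
     * instead of do/while_each_thread(). */
    static unsigned int count_threads(struct task_struct *g)
    {
        struct task_struct *t;
        unsigned int n = 0;

        rcu_read_lock();
        for_each_thread(g, t)
            n++;             /* may see zero threads if the whole group has died */
        rcu_read_unlock();
        return n;
    }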
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Current upstream kernel hangs with mips and powerpc targets in
uniprocessor mode if SECCOMP is configured.
Bisect points to commit dbd952127d11 ("seccomp: introduce writer locking").
Turns out that code such as
BUG_ON(!spin_is_locked(&list_lock));
can not be used in uniprocessor mode because spin_is_locked() always
returns false in this configuration, and that assert_spin_locked()
exists for that very purpose and must be used instead.
Fixes: dbd952127d11 ("seccomp: introduce writer locking")
Cc: Kees Cook <keescook@chromium.org>
Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Kees Cook <keescook@chromium.org>
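The substance of the fix, as a kernel-context sketch with a local lock name standing in for the real one:

    #include <linux/spinlock.h>

    static DEFINE_SPINLOCK(example_lock);   /* stand-in for the lock being asserted */

    static void needs_lock_held(void)
    {
        /* Broken on CONFIG_SMP=n: BUG_ON(!spin_is_locked(&example_lock)); */
        assert_spin_locked(&example_lock);  /* correct on both UP and SMP */
    }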
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Applying restrictive seccomp filter programs to large or diverse
codebases often requires handling threads which may be started early in
the process lifetime (e.g., by code that is linked in). While it is
possible to apply permissive programs prior to process start up, it is
difficult to further restrict the kernel ABI to those threads after that
point.
This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for
synchronizing thread group seccomp filters at filter installation time.
When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC,
filter) an attempt will be made to synchronize all threads in current's
threadgroup to its new seccomp filter program. This is possible iff all
threads are using a filter that is an ancestor to the filter current is
attempting to synchronize to. NULL filters (where the task is running as
SECCOMP_MODE_NONE) are also treated as ancestors allowing threads to be
transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS,
...) has been set on the calling thread, no_new_privs will be set for
all synchronized threads too. On success, 0 is returned. On failure,
the pid of one of the failing threads will be returned and no filters
will have been applied.
The race conditions against another thread are:
- requesting TSYNC (already handled by sighand lock)
- performing a clone (already handled by sighand lock)
- changing its filter (already handled by sighand lock)
- calling exec (handled by cred_guard_mutex)
The clone case is assisted by the fact that new threads will have their
seccomp state duplicated from their parent before appearing on the tasklist.
Holding cred_guard_mutex means that seccomp filters cannot be assigned
while in the middle of another thread's exec (potentially bypassing
no_new_privs or similar). The call to de_thread() may kill threads waiting
for the mutex.
Changes across threads to the filter pointer includes a barrier.
Based on patches by Will Drewry.
Suggested-by: Julien Tinnes <jln@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
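A minimal user-space sketch of the new flag (an allow-everything filter for brevity; a real filter would whitelist specific syscalls):

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <stdio.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    int main(void)
    {
        /* Trivial filter that allows every syscall, just to demonstrate TSYNC. */
        struct sock_filter insns[] = {
            BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
        };
        struct sock_fprog prog = {
            .len = sizeof(insns) / sizeof(insns[0]),
            .filter = insns,
        };

        /* Required unless the caller has the relevant CAP_SYS_ADMIN. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
            perror("prctl(PR_SET_NO_NEW_PRIVS)");
            return 1;
        }

        /* Returns 0 on success, or the TID of a thread that could not be
         * synchronized (e.g. one running an unrelated filter). */
        long ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                           SECCOMP_FILTER_FLAG_TSYNC, &prog);
        if (ret != 0)
            fprintf(stderr, "seccomp TSYNC failed, ret=%ld\n", ret);
        return ret ? 1 : 0;
    }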
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
This changes the mode setting helper to allow threads to change the
seccomp mode from another thread. We must maintain barriers to keep
TIF_SECCOMP synchronized with the rest of the seccomp state.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Conflicts:
kernel/seccomp.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Normally, task_struct.seccomp.filter is only ever read or modified by
the task that owns it (current). This property aids in fast access
during system call filtering as read access is lockless.
Updating the pointer from another task, however, opens up race
conditions. To allow cross-thread filter pointer updates, writes to the
seccomp fields are now protected by the sighand spinlock (which is shared
by all threads in the thread group). Read access remains lockless because
pointer updates themselves are atomic. However, writes (or cloning)
often entail additional checking (like maximum instruction counts)
which require locking to perform safely.
In the case of cloning threads, the child is invisible to the system
until it enters the task list. To make sure a child can't be cloned from
a thread and left in a prior state, seccomp duplication is additionally
moved under the sighand lock. Then parent and child are certain have
the same seccomp state when they exit the lock.
Based on patches by Will Drewry and David Drysdale.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Conflicts:
kernel/fork.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
In preparation for adding seccomp locking, move filter creation away
from where it is checked and applied. This will allow for locking where
no memory allocation is happening. The validation, filter attachment,
and seccomp mode setting can all happen under the future locks.
For extreme defensiveness, I've added a BUG_ON check for the calculated
size of the buffer allocation in case BPF_MAXINSNS ever changes, which
shouldn't ever happen. The compiler should actually optimize out this
check since the test above it makes it impossible.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Conflicts:
kernel/seccomp.c
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Since seccomp transitions between threads requires updates to the
no_new_privs flag to be atomic, the flag must be part of an atomic flag
set. This moves the nnp flag into a separate task field, and introduces
accessors.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Conflicts:
kernel/sys.c
Change-Id: Ief9d8a459e7e66bd7e025ed494bc06a8b1dea54d
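The flag itself is still driven from user space via prctl(); a minimal sketch:

    #include <stdio.h>
    #include <sys/prctl.h>

    int main(void)
    {
        /* One-way switch: once set, no_new_privs is inherited across
         * fork/exec and can never be cleared for this task. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0)) {
            perror("PR_SET_NO_NEW_PRIVS");
            return 1;
        }
        printf("no_new_privs = %d\n", prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0));
        return 0;
    }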
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
This adds the new "seccomp" syscall with both an "operation" and "flags"
parameter for future expansion. The third argument is a pointer value,
used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must
be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...).
In addition to the TSYNC flag later in this patch series, there is a
non-zero chance that this syscall could be used for configuring a fixed
argument area for seccomp-tracer-aware processes to pass syscall arguments
in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter"
for this syscall. Additionally, this syscall uses operation, flags,
and user pointer for arguments because strictly passing arguments via
a user pointer would mean seccomp itself would be unable to trivially
filter the seccomp syscall itself.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
Conflicts:
arch/x86/syscalls/syscall_32.tbl
arch/x86/syscalls/syscall_64.tbl
include/uapi/asm-generic/unistd.h
kernel/seccomp.c
And fixup of unistd32.h to truly enable sys_seccomp.
Change-Id: I95bea02382c52007d22e5e9dc563c7d055c2c83f
Conflicts:
arch/arm64/include/asm/unistd32.h
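A hedged sketch of the equivalence mentioned above, assuming a sock_fprog `prog` has already been built as in the TSYNC example earlier on this page:

    #include <linux/filter.h>
    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* With flags == 0, the two calls below are functionally equivalent ways
     * of installing the same filter program. */
    static long install_filter(struct sock_fprog *prog, int use_new_syscall)
    {
        if (use_new_syscall)
            return syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER, 0, prog);
        return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, prog);
    }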
|
| |
|
|
|
|
|
|
|
| |
Separates the two mode setting paths to make things more readable with
fewer #ifdefs within function bodies.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
|
| |
|
|
|
|
|
|
|
| |
To support splitting mode 1 from mode 2, extract the mode checking and
assignment logic into common functions.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
|
| |
|
|
|
|
|
|
|
|
| |
In preparation for having other callers of the seccomp mode setting
logic, split the prctl entry point away from the core logic that performs
seccomp mode setting.
Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Oleg Nesterov <oleg@redhat.com>
Reviewed-by: Andy Lutomirski <luto@amacapital.net>
|
| |
|
|
|
|
|
|
|
| |
This is required to run net_test, which builds for ARCH=um.
b/24150139
Change-Id: I7041756c057b913a09554b66cd817b4abaa712a6
Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Call acct_update_power() to track the power usage of a task only if
CONFIG_CPU_FREQ_STAT is enabled; otherwise we run into the
following build failure:
---------------
kernel/built-in.o: In function `account_user_time':
kernel/sched/cputime.c:155: undefined reference to `acct_update_power'
kernel/built-in.o: In function `__account_system_time':
kernel/sched/cputime.c:208: undefined reference to `acct_update_power'
make: *** [vmlinux] Error 1
---------------
Signed-off-by: Amit Pundir <amit.pundir@linaro.org>
(cherry picked from commit 5371356ed838799125b1379fa38cdddfd1d189fd)
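One way the guard described above can look, as a kernel-context sketch (an assumption about the shape of the fix, not necessarily the exact hunk):

    /* Only reference acct_update_power() when the code providing it is built in. */
    #ifdef CONFIG_CPU_FREQ_STAT
            acct_update_power(p, cputime);
    #endif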
|
| |
|
|
|
|
|
|
|
|
|
|
| |
cpu_power has been added to keep track of the amount of power each task is
consuming. cpu_power is updated whenever stime and utime are updated for
a task. Power is computed by taking into account the frequency at which
the current core was running and the current drawn by the CPU while
actively running at that frequency.
Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248
Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com>
(cherry picked from commit 85a6bd2bc4c903df43186e6f41209746aa6fdf05)
|
| |
|
|
|
|
|
|
|
|
|
| |
Hardened SELinux policies that prevent module_request cause a lot
of SELinux denials when using a kernel configured with module
support. lookup_exec_domain will try to request a personality-8
module which does not exist. Disable the module request, which will
make lookup_exec_domain fall back to returning the default exec domain.
Bug: 20429596
Change-Id: I4efd1eb46b901857233ce39fa8bfe7a0804d339d
|
| |
|
|
|
|
|
| |
Needed when building some sensor drivers as modules
Change-Id: Ibe90f893ae280624327c66ab8f9a2ace2e1baae0
Signed-off-by: Tom Cherry <tomcherry@google.com>
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
Add another dimension for task packing based on frequency. This patch
adds a per-cpu tunable, rq->mostly_idle_freq, which when set will
result in tasks being packed on a single cpu in a cluster as long as the
cluster frequency is less than the set threshold.
Change-Id: I8c65376801efd158c8145073a10a1a555004f1da
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
|
| |\| |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack
tasks on cpus to some extent. In some cases, it may be desirable to
have different packing limits for different cpus. For example, pack to
a higher limit on high-performance cpus compared to power-efficient
cpus.
This patch removes the global mostly_idle tunables and makes them
per-cpu, thus allowing task packing behavior to be controlled in a
fine-grained manner.
Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
|
| |\ \ |
|
| | | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | |
| | | |
hrtimer_interrupt() has the following subtle issue:
hrtimer_interrupt()
  lock(cpu_base);
  expires_next = KTIME_MAX;
  expire_timers(CLOCK_MONOTONIC);
  expires = get_next_timer(CLOCK_MONOTONIC);
  if (expires < expires_next)
    expires_next = expires;
  expire_timers(CLOCK_REALTIME);
    unlock(cpu_base);
    wakeup()
    hrtimer_start(CLOCK_MONOTONIC, newtimer);
    lock(cpu_base);
  expires = get_next_timer(CLOCK_REALTIME);
  if (expires < expires_next)
    expires_next = expires;
So because we already evaluated the next expiring timer of
CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be
earlier than the overall next expiry time in hrtimer_interrupt().
To solve this, remove the caching of the next expiry value from
hrtimer_interrupt() and reevaluate all active clock bases for the next
expiry value. To avoid another code duplication, create a shared
evaluation function and use it for hrtimer_get_next_event(),
hrtimer_force_reprogram() and hrtimer_interrupt().
There is another subtlety in this mechanism:
While hrtimer_interrupt() is running, we want to avoid to touch the
hardware device because we will reprogram it anyway at the end of
hrtimer_interrupt(). This works nicely for hrtimers which get rearmed
via the HRTIMER_RESTART mechanism, because we drop out when the
callback on that CPU is running. But that fails, if a new timer gets
enqueued like in the example above.
This has another implication: While hrtimer_interrupt() is running we
refuse remote enqueueing of timers - see hrtimer_interrupt() and
hrtimer_check_target().
hrtimer_interrupt() tries to prevent this by setting cpu_base->expires
to KTIME_MAX, but that fails if a new timer gets queued.
Prevent both the hardware access and the remote enqueue
explicitly. We can loosen the restriction on the remote enqueue now
due to reevaluation of the next expiry value, but that needs a
separate patch.
Folded in a fix from Vignesh Radhakrishnan.
Change-Id: I803322cc29a294eab73fa2046e9f3a2e5f66755e
Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: vigneshr@codeaurora.org
Cc: john.stultz@linaro.org
Cc: viresh.kumar@linaro.org
Cc: fweisbec@gmail.com
Cc: cl@linux.com
Cc: stuart.w.hayes@gmail.com
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Patch-mainline : linux-arm-kernel @ 01/23/15, 03:21
[vigneshr@codeaurora.org : Changes to the file kernel/time/hrtimer.c
is made to the file kernel/hrtimer.c]
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
|
| |\ \ \
| |/ /
|/| | |
|
| | |/
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
cpufreq governor does not send frequency change notifications for
offline CPUs. This means that a hot-removed CPU's cur_freq information
can get stale if there is a frequency change while that CPU is offline.
When the offline CPU is hotplugged back in, all subsequent load
calculations are based on the stale information until another frequency
change occurs and the corresponding set of notifications are sent out.
Avoid this incorrect load tracking by updating the cur_freq for all
CPUs in the same frequency domain.
Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5
Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
|
| |/
|
|
|
|
|
|
|
| |
With the new design of the power-off alarm, it is no longer necessary to
boot up the phone 2 minutes earlier than the actual power-off alarm time,
so remove this limit here.
Change-Id: I10c47d191d62452a14253db00cb8922368aa9216
Signed-off-by: lijuang <lijuang@codeaurora.org>
|
| |\ |
|
| | |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| |
| | |
sysctl_sched_prefer_idle controls the selection of idle cpus for waking
tasks. In some cases, waking to idle cpus helps performance, while in
other cases it hurts (as tasks incur latency associated with C-state
wakeup). Ideally the scheduler could adapt prefer_idle behavior based
on the task that is waking up, but that's hard for the scheduler to
figure out by itself. The PF_WAKE_UP_IDLE hint can be provided by an
external module/driver in such cases to guide the scheduler toward
preferring an idle cpu for select tasks irrespective of the
sysctl_sched_prefer_idle flag.
This patch enhances select_best_cpu() to consider the PF_WAKE_UP_IDLE
hint. A wakeup posted from any task that has PF_WAKE_UP_IDLE set is a
hint for the scheduler to prefer an idle cpu for the waking task.
Similarly, the scheduler will attempt to place any task with
PF_WAKE_UP_IDLE set on an idle cpu when it wakes up.
Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
|
| |/
|
|
|
|
|
|
|
|
| |
Currently kmemleak scans module memory as provided
in the area list. This takes up a lot of time with
IRQs and preemption disabled. Provide a compile-time
configurable option to enable this functionality.
Change-Id: I5117705e7e6726acdf492e7f87c0703bc1f28da0
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
|
| |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| |
Currently irq_work_cpu_notify is registered using
device_initcall().
In cases where a CPU is hotplugged early (for example,
a thermal engine hotplugging a CPU out), there is a
chance that irq_work_cpu_notify has not yet been
registered by the time the CPU is hotplugged out.
irq_work uses the CPU_DYING notifier to clear out
pending irq_work, but since the cpu notifier is not
registered that early, pending irq_work items are
never run because this pending list is per-cpu.
One specific scenario where this impacts the system
is the RCU framework using irq_work to wake up and
complete cleanup operations. In this scenario we
notice that RCU operations need cleanup on the
hotplugged CPU.
Fix this by registering irq_work_cpu_notify in
early init.
CRs-Fixed: 768180
Change-Id: Ibe7f5c77097de7a342eeb1e8d597fb2f72185ecf
Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org>
Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
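A kernel-context sketch of the change in registration time; the function and notifier names here are illustrative, not the exact hunk:

    #include <linux/init.h>

    /* Moving registration from device_initcall() to early_initcall() ensures
     * the CPU_DYING callback exists before any early hotplug, so pending
     * per-cpu irq_work gets flushed. */
    static int __init irq_work_init_cpu_notifier(void)
    {
        /* register_cpu_notifier(&cpu_notify); */
        return 0;
    }
    /* was: device_initcall(irq_work_init_cpu_notifier); */
    early_initcall(irq_work_init_cpu_notifier);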
|
| |
|
|
|
|
|
|
|
|
|
|
|
| |
It is possible that rq->clock_task was not updated in put_prev_task(),
in which case we can potentially overcharge a real-time task for time
it did not run. This is because clock_task could be stale and not
represent the exact time real-time task started running.
Fix this by forcing update of rq->clock_task when real-time task
starts running.
Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c
Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
|
| |\ |
|