path: root/kernel
Commit message (Author, Date, Files changed, Lines -/+)
* sched: cpufreq: Adds a field cpu_power in the task_struct (Ruchi Kandoi, 2017-04-21, 1 file, -4/+0)
  cpu_power has been added to keep track of the amount of power each task is consuming. cpu_power is updated whenever stime and utime are updated for a task. Power is computed by taking into account the frequency at which the current core was running and the current drawn by the cpu actively running at that frequency.
  Bug: 21498425 Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> Git-commit: 94877641f6b6ea17aa335729f548eb5647db3e3e Git-repo: https://android.googlesource.com/kernel/msm/ Signed-off-by: Nirmal Abraham <nabrah@codeaurora.org>
* sched: Fix bug in average nr_running and nr_iowait calculation (Srivatsa Vaddagiri, 2017-03-16, 3 files, -13/+21)
  sched_get_nr_running_avg() returns the average nr_running and nr_iowait task counts since it was last invoked. Fix several bugs in their calculation:
    * sched_update_nr_prod() needs to consider that the nr_running count can change by more than 1 when the CFS_BANDWIDTH feature is used.
    * sched_get_nr_running_avg() needs to sum up the nr_iowait count across all cpus, rather than just one.
    * sched_get_nr_running_avg() could race with sched_update_nr_prod(), as a result of which it could use a curr_time that is behind a cpu's 'last_time' value. That would lead to erroneous calculation of average nr_running or nr_iowait.
  While at it, also fix a bug in the BUG_ON() check in sched_update_nr_prod() and remove the unnecessary nr_running argument to sched_update_nr_prod().
  Change-Id: I46737614737292fae0d7204c4648fb9b862f65b2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* sched: Provide a wake up API without sending freq notifications (Junjie Wu, 2017-03-16, 2 files, -5/+32)
  Each time a task wakes up, the scheduler evaluates its load and notifies the governor if the resulting frequency of the destination CPU is larger than a threshold. However, some governors wake up a separate task that handles the frequency change, which again calls wake_up_process(). This is dangerous because if the task being woken up meets the threshold and ends up being moved around, there is a potential for endless recursive notifications. Introduce a new API for waking up a task without triggering a frequency notification.
  Change-Id: I24261af81b7dc410c7fb01eaa90920b8d66fbd2a Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
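  A minimal sketch of how a governor's worker wakeup might use such an API. The helper name wake_up_process_no_notif() and the gov_data structure are assumptions for illustration only; the log does not name the new API.

      /* Hypothetical governor data; only the worker task pointer matters here. */
      struct gov_data {
              struct task_struct *freq_change_task;
      };

      static void gov_queue_freq_change(struct gov_data *gd)
      {
              /*
               * A plain wake_up_process() here could re-enter the load
               * notification path and recurse endlessly; the
               * no-notification variant breaks that loop.
               */
              wake_up_process_no_notif(gd->freq_change_task);
      }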
* tracing: power: Add trace events for core control (Junjie Wu, 2017-03-07, 1 file, -1/+2)
  Add trace events for core control module.
  Change-Id: I36da5381709f81ef1ba82025cd9cf8610edef3fc Signed-off-by: Junjie Wu <junjiew@codeaurora.org>
* sched: Keep track of average nr_big_tasks (Srivatsa Vaddagiri, 2017-03-07, 3 files, -7/+54)
  Extend the sched_get_nr_running_avg() API to return average nr_big_tasks, in addition to average nr_running and average nr_io_wait tasks. Also add a new trace point to record values returned by the sched_get_nr_running_avg() API.
  Change-Id: Id3591e6d04da8db484b4d1cb9d95dba075f5ab9a Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> [rameezmustafa@codeaurora.org: Resolve trivial merge conflicts] Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* Merge remote-tracking branch 'aosp/android-msm-seed-3.10-nougat-mr1.1' into HEAD (Arvin Quilao, 2017-01-08, 2 files, -5/+11)
  Change-Id: Ic50c8f4aea56d08a791d1a2b07fe69515757df22
* BACKPORT: perf: Fix event->ctx locking (Ariel Yin, 2016-10-17, 1 file, -37/+207)
  There have been a few reported issues wrt. the lack of locking around changing event->ctx. This patch tries to address those. It avoids the whole rwsem thing; and while it appears to work, please give it some thought in review. What I did fail at is sensible runtime checks on the use of event->ctx; the RCU use makes it very hard.
  Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/20150123125834.209535886@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> (cherry picked from commit f63a8daa5812afef4f06c962351687e1ff9ccb2b) Bug: 30955111 Bug: 31095224 Change-Id: I5bab713034e960fad467637e98e914440de5666d
* perf: protect group_leader from races that cause ctx double-free (John Dias, 2016-10-17, 1 file, -0/+15)
  When moving a group_leader perf event from a software-context to a hardware-context, there's a race in checking and updating that context. The existing locking solution doesn't work; note that it tries to grab a lock inside the group_leader's context object, which you can only get at by going through a pointer that should be protected from these races.
  To avoid that problem, and to produce a simple solution, we can just use a lock per group_leader to protect all checks on the group_leader's context. The new lock is grabbed and released when no context locks are held.
  Bug: 30955111 Bug: 31095224 Change-Id: If37124c100ca6f4aa962559fba3bd5dbbec8e052
* BACKPORT: audit: fix a double fetch in audit_log_single_execve_arg() (Paul Moore, 2016-09-16, 1 file, -170/+167)
  (cherry picked from commit 43761473c254b45883a64441dd0bc85a42f3645c)
  There is a double fetch problem in audit_log_single_execve_arg() where we first check the execve(2) arguments for any "bad" characters which would require hex encoding and then re-fetch the arguments for logging in the audit record[1]. Of course this leaves a window of opportunity for an unsavory application to munge with the data.
  This patch reworks things by only fetching the argument data once[2] into a buffer where it is scanned and logged into the audit record(s). In addition to fixing the double fetch, this patch improves on the original code in a few other ways: better handling of large arguments which require encoding, stricter record length checking, and some performance improvements (completely unverified, but we got rid of some strlen() calls, that's got to be a good thing).
  As part of the development of this patch, I've also created a basic regression test for the audit-testsuite; the test can be tracked on GitHub at the following link:
    * https://github.com/linux-audit/audit-testsuite/issues/25
  [1] If you pay careful attention, there is actually a triple fetch problem due to a strnlen_user() call at the top of the function.
  [2] This is a tiny white lie; we do make a call to strnlen_user() prior to fetching the argument data. I don't like it, but due to the way the audit record is structured we really have no choice unless we copy the entire argument at once (which would require a rather wasteful allocation). The good news is that with this patch the kernel no longer relies on this strnlen_user() value for anything beyond recording it in the log; we also update it with a trustworthy value whenever possible.
  Reported-by: Pengfei Wang <wpengfeinudt@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Paul Moore <paul@paul-moore.com> Change-Id: I10e979e94605e3cf8d461e3e521f8f9837228aa5 Bug: 30956807
* perf: Fix race in swevent hash (Ariel Yin, 2016-09-16, 1 file, -18/+1)
  There's a race on CPU unplug where we free the swevent hash array while it can still have events on it. This will result in a use-after-free which is BAD. Simply do not free the hash array on unplug. This leaves the thing around and no use-after-free takes place. When the last swevent dies, we do a for_each_possible_cpu() iteration anyway to clean these up, at which time we'll free it, so no leakage will occur.
  Change-Id: I368c84b9d9c77c82121358a8fd29a5b012364321
* FROMLIST: security,perf: Allow further restriction of perf_event_open (Jeff Vander Stoep, 2016-06-21, 1 file, -0/+6)
  When kernel.perf_event_paranoid is set to 3 (or greater), disallow all access to performance events by users without CAP_SYS_ADMIN. Add a Kconfig symbol CONFIG_SECURITY_PERF_EVENTS_RESTRICT that makes this value the default.
  This is based on a similar feature in grsecurity (CONFIG_GRKERNSEC_PERF_HARDEN). This version doesn't include making the variable read-only. It also allows enabling further restriction at run-time regardless of whether the default is changed.
  https://lkml.org/lkml/2016/1/11/587 Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Bug: 29054680 Change-Id: Iff5bff4fc1042e85866df9faa01bce8d04335ab8
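  For context, a small userspace sketch (not from this patch) of what the restriction looks like to a caller: with /proc/sys/kernel/perf_event_paranoid at 3 or higher and no CAP_SYS_ADMIN, opening an event is expected to fail with EACCES.

      #include <linux/perf_event.h>
      #include <sys/syscall.h>
      #include <string.h>
      #include <stdio.h>
      #include <unistd.h>

      int main(void)
      {
              struct perf_event_attr attr;
              int fd;

              memset(&attr, 0, sizeof(attr));
              attr.size = sizeof(attr);
              attr.type = PERF_TYPE_HARDWARE;
              attr.config = PERF_COUNT_HW_CPU_CYCLES;

              /* pid = 0 (self), cpu = -1 (any), group_fd = -1, flags = 0 */
              fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
              if (fd < 0)
                      perror("perf_event_open");  /* EACCES when restricted */
              else
                      close(fd);
              return 0;
      }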
* sched: Fix information leak in sys_sched_getattr() (Vegard Nossum, 2016-06-20, 1 file, -1/+1)
  We're copying the on-stack structure to userspace, but forgot to give the right number of bytes to copy. This allows the calling process to obtain up to PAGE_SIZE bytes from the stack (and possibly adjacent kernel memory). This fix copies only as much as we actually have on the stack (attr->size defaults to the size of the struct) and leaves the rest of the userspace-provided buffer untouched.
  Found using kmemcheck + trinity.
  Fixes: d50dde5a10f30 ("sched: Add new scheduler syscalls to support an extended scheduling parameters ABI") Bug: 28731691 Change-Id: Iaf2ecb4d2b0fa9df09539f2fc9ad6fd123d87aa2 Cc: Dario Faggioli <raistlin@linux.it> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Vegard Nossum <vegard.nossum@oracle.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1392585857-10725-1-git-send-email-vegard.nossum@oracle.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
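  A simplified sketch of the idea behind the fix (not the exact upstream diff): copy back only the bytes actually populated in the on-stack struct, attr->size, instead of the caller-supplied buffer size.

      static int sched_read_attr(struct sched_attr __user *uattr,
                                 struct sched_attr *attr, unsigned int usize)
      {
              /*
               * attr lives on the kernel stack; attr->size describes how
               * much of it was filled in (it defaults to sizeof(*attr)).
               */
              if (copy_to_user(uattr, attr, attr->size))  /* buggy version used usize */
                      return -EFAULT;
              return 0;
      }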
* cpuset: Make cpusets restore on hotplug (Riley Andrews, 2016-05-10, 1 file, -15/+33)
  This deliberately changes the behavior of the per-cpuset cpus file to not be affected by hotplug. When a cpu is offlined, it will be removed from the cpuset/cpus file. When a cpu is onlined, if the cpuset originally requested that the cpu be part of the cpuset, that cpu will be restored to the cpuset. The cpus files still have to be hierarchical, but the ranges no longer have to be drawn from the currently online cpus, just the physically present cpus.
  Change-Id: I3efbae24a1f6384be1e603fb56f0d3baef61d924
* cpuset: Add allow_attach hook for cpusets on android. (Riley Andrews, 2016-05-10, 1 file, -0/+18)
  Change-Id: Ic1b61b2bbb7ce74c9e9422b5e22ee9078251de21
* sched: add sched blocked tracepoint which dumps out context of sleep. (Riley Andrews, 2016-04-20, 1 file, -0/+1)
  Declare war on uninterruptible sleep. Add a tracepoint which walks the kernel stack and dumps the first non-scheduler function called before the scheduler is invoked.
  Change-Id: I19e965d5206329360a92cbfe2afcc8c30f65c229 Signed-off-by: Riley Andrews <riandrews@google.com>
* mm: fix prctl_set_vma_anon_name (Colin Cross, 2016-03-24, 1 file, -1/+1)
  prctl_set_vma_anon_name could attempt to set the name across two vmas at the same time due to a typo, which might corrupt the vma list. Fix it to use tmp instead of end to limit the name setting to a single vma at a time.
  Change-Id: Ie32d8ddb0fd547efbeedd6528acdab5ca5b308b4 Reported-by: Jed Davis <jld@mozilla.com> Signed-off-by: Colin Cross <ccross@android.com>
* __ptrace_may_access() should not deny sub-threads (Mark Grondona, 2016-02-08, 1 file, -1/+1)
  (cherry pick from commit 73af963f9f3036dffed55c3a2898598186db1045)
  __ptrace_may_access() checks get_dumpable/ptrace_has_cap/etc if task != current; this can lead to surprising results. For example, a sub-thread can't readlink("/proc/self/exe") if the executable is not readable. setup_new_exec()->would_dump() notices that inode_permission(MAY_READ) fails and then it does set_dumpable(suid_dumpable). After that get_dumpable() fails.
  (It is not clear why proc_pid_readlink() checks get_dumpable(); perhaps we could add PTRACE_MODE_NODUMPABLE.)
  Change __ptrace_may_access() to use same_thread_group() instead of "task == current". Any security check is pointless when the tasks share the same ->mm.
  Signed-off-by: Mark Grondona <mgrondona@llnl.gov> Signed-off-by: Ben Woodard <woodard@redhat.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Bug: 26016905 Change-Id: If9e2a0eb3339d26d50a9d84671a189fe405f36a3
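  A small before/after sketch of the check described above (the surrounding dumpable/capability logic is elided):

      /* before: only the exact same task skipped the checks */
      if (task == current)
              return 0;

      /* after: any thread in the same group shares ->mm, so the checks
       * are pointless for it as well */
      if (same_thread_group(task, current))
              return 0;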
* msm: null pointer dereferencing (Wish Wu, 2016-01-22, 2 files, -2/+9)
  Prevent unintended kernel NULL pointer dereferencing.
  Code:
      hlist_del_rcu(&event->hlist_entry);
  Fix (add a pointer check):
      if (!hlist_unhashed(&p_event->hlist_entry))
              hlist_del_rcu(&p_event->hlist_entry);
  Bug: 25364034 Change-Id: Ib13a7400d4a36a4b08b0afc9b7d69c6027e741b6 Signed-off-by: Yuan Lin <yualin@google.com> Signed-off-by: Christian Bejram <cbejram@google.com> (cherry picked from commit 8e04ed2bc057656d343a28ab8b01718371ca9f42)
* FROMLIST: mm: mmap: Add new /proc tunable for mmap_base ASLR. (dcashman, 2016-01-13, 1 file, -0/+22)
  (cherry picked from commit https://lkml.org/lkml/2015/12/21/337)
  ASLR only uses as few as 8 bits to generate the random offset for the mmap base address on 32 bit architectures. This value was chosen to prevent a poorly chosen value from dividing the address space in such a way as to prevent large allocations. This may not be an issue on all platforms. Allow the specification of a minimum number of bits so that platforms desiring greater ASLR protection may determine where to place the trade-off.
  Bug: 24047224 Signed-off-by: Daniel Cashman <dcashman@android.com> Signed-off-by: Daniel Cashman <dcashman@google.com> Change-Id: I66ac01c6f4f2c8dcfc84d1f1e99490b8385b3ed4 (cherry picked from commit 00ead9ddada26be1726539b1ead14abf974d235d)
* introduce for_each_thread() to replace the buggy while_each_thread() (Oleg Nesterov, 2015-09-28, 2 files, -0/+9)
  while_each_thread() and next_thread() should die; almost every lockless usage is wrong.
  1. Unless g == current, the lockless while_each_thread() is not safe. while_each_thread(g, t) can loop forever if g exits; next_thread() can't reach the unhashed thread in this case. Note that this can happen even if g is the group leader, it can exec.
  2. Even if while_each_thread() itself was correct, people often use it wrongly. It was never safe to just take rcu_read_lock() and loop unless you verify that pid_alive(g) == T; even the first next_thread() can point to the already freed/reused memory.
  This patch adds signal_struct->thread_head and task->thread_node to create the normal rcu-safe list with the stable head. The new for_each_thread(g, t) helper is always safe under rcu_read_lock() as long as this task_struct can't go away.
  Note: of course it is ugly to have both task_struct->thread_node and the old task_struct->thread_group; we will kill it later, after we change the users of while_each_thread() to use for_each_thread(). Perhaps we can kill it even before we convert all users; we can reimplement next_thread(t) using the new thread_head/thread_node. But we can't do this right now because this will lead to subtle behavioural changes. For example, do/while_each_thread() always sees at least one task, while for_each_thread() can do nothing if the whole thread group has died. Or thread_group_empty(): currently its semantics is not clear unless thread_group_leader(p), and we need to audit the callers before we can change it.
  So this patch adds the new interface which has to coexist with the old one for some time; hopefully the next changes will be more or less straightforward and the old one will go away soon.
  Bug 200004307 Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Sergey Dyasly <dserrg@gmail.com> Tested-by: Sergey Dyasly <dserrg@gmail.com> Reviewed-by: Sameer Nanda <snanda@chromium.org> Acked-by: David Rientjes <rientjes@google.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mandeep Singh Baines <msb@chromium.org> Cc: "Ma, Xindong" <xindong.ma@intel.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit 0c740d0afc3bff0a097ad03a1c8df92757516f5c) Signed-off-by: Sri Krishna chowdary <schowdary@nvidia.com> Change-Id: Id689cb1383ceba2561b66188d88258619b68f5c6 Reviewed-on: http://git-master/r/419041 Reviewed-by: Bharat Nihalani <bnihalani@nvidia.com>
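  A minimal usage sketch of the new iterator (do_something() is a placeholder); g must be pinned so its task_struct cannot go away while iterating:

      struct task_struct *t;

      rcu_read_lock();
      for_each_thread(g, t) {
              /* safe under rcu_read_lock() even if g has exec'd or exited */
              do_something(t);
      }
      rcu_read_unlock();

      /*
       * Old, racy pattern this replaces:
       *
       *     t = g;
       *     do {
       *             do_something(t);
       *     } while_each_thread(g, t);   // can loop forever if g exits
       */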
* seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock (Guenter Roeck, 2015-09-28, 2 files, -6/+6)
  The current upstream kernel hangs with mips and powerpc targets in uniprocessor mode if SECCOMP is configured. Bisect points to commit dbd952127d11 ("seccomp: introduce writer locking"). It turns out that code such as
      BUG_ON(!spin_is_locked(&list_lock));
  cannot be used in uniprocessor mode because spin_is_locked() always returns false in this configuration, and that assert_spin_locked() exists for that very purpose and must be used instead.
  Fixes: dbd952127d11 ("seccomp: introduce writer locking") Cc: Kees Cook <keescook@chromium.org> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Kees Cook <keescook@chromium.org>
* seccomp: implement SECCOMP_FILTER_FLAG_TSYNC (Kees Cook, 2015-09-28, 1 file, -1/+134)
  Applying restrictive seccomp filter programs to large or diverse codebases often requires handling threads which may be started early in the process lifetime (e.g., by code that is linked in). While it is possible to apply permissive programs prior to process start up, it is difficult to further restrict the kernel ABI to those threads after that point.
  This change adds a new seccomp syscall flag to SECCOMP_SET_MODE_FILTER for synchronizing thread group seccomp filters at filter installation time. When calling seccomp(SECCOMP_SET_MODE_FILTER, SECCOMP_FILTER_FLAG_TSYNC, filter), an attempt will be made to synchronize all threads in current's threadgroup to its new seccomp filter program. This is possible iff all threads are using a filter that is an ancestor to the filter current is attempting to synchronize to. NULL filters (where the task is running as SECCOMP_MODE_NONE) are also treated as ancestors, allowing threads to be transitioned into SECCOMP_MODE_FILTER. If prctl(PR_SET_NO_NEW_PRIVS, ...) has been set on the calling thread, no_new_privs will be set for all synchronized threads too. On success, 0 is returned. On failure, the pid of one of the failing threads will be returned and no filters will have been applied.
  The race conditions against another thread are:
    - requesting TSYNC (already handled by sighand lock)
    - performing a clone (already handled by sighand lock)
    - changing its filter (already handled by sighand lock)
    - calling exec (handled by cred_guard_mutex)
  The clone case is assisted by the fact that new threads will have their seccomp state duplicated from their parent before appearing on the tasklist. Holding cred_guard_mutex means that seccomp filters cannot be assigned while in the middle of another thread's exec (potentially bypassing no_new_privs or similar). The call to de_thread() may kill threads waiting for the mutex. Changes across threads to the filter pointer include a barrier.
  Based on patches by Will Drewry.
  Suggested-by: Julien Tinnes <jln@chromium.org> Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
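  A userspace sketch of installing a filter on all threads in the group. Assumptions: prog is a previously built BPF program, and syscall(2) is used because libc may not provide a seccomp(2) wrapper.

      #include <linux/seccomp.h>
      #include <linux/filter.h>
      #include <sys/prctl.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      static int install_filter_all_threads(struct sock_fprog *prog)
      {
              long ret;

              /* required unless the caller has CAP_SYS_ADMIN */
              if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0))
                      return -1;

              ret = syscall(__NR_seccomp, SECCOMP_SET_MODE_FILTER,
                            SECCOMP_FILTER_FLAG_TSYNC, prog);
              /*
               * 0 on success; on failure the TID of a thread that could not
               * be synchronized is returned and no filter is applied.
               */
              return ret == 0 ? 0 : -1;
      }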
* seccomp: allow mode setting across threads (Kees Cook, 2015-09-28, 1 file, -11/+26)
  This changes the mode setting helper to allow threads to change the seccomp mode from another thread. We must maintain barriers to keep TIF_SECCOMP synchronized with the rest of the seccomp state.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
  Conflicts: kernel/seccomp.c
* seccomp: introduce writer locking (Kees Cook, 2015-09-28, 2 files, -2/+63)
  Normally, task_struct.seccomp.filter is only ever read or modified by the task that owns it (current). This property aids in fast access during system call filtering as read access is lockless. Updating the pointer from another task, however, opens up race conditions. To allow cross-thread filter pointer updates, writes to the seccomp fields are now protected by the sighand spinlock (which is shared by all threads in the thread group). Read access remains lockless because pointer updates themselves are atomic. However, writes (or cloning) often entail additional checking (like maximum instruction counts) which requires locking to perform safely.
  In the case of cloning threads, the child is invisible to the system until it enters the task list. To make sure a child can't be cloned from a thread and left in a prior state, seccomp duplication is additionally moved under the sighand lock. Then parent and child are certain to have the same seccomp state when they exit the lock.
  Based on patches by Will Drewry and David Drysdale.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
  Conflicts: kernel/fork.c
* seccomp: split filter prep from check and apply (Kees Cook, 2015-09-28, 1 file, -23/+66)
  In preparation for adding seccomp locking, move filter creation away from where it is checked and applied. This will allow for locking where no memory allocation is happening. The validation, filter attachment, and seccomp mode setting can all happen under the future locks.
  For extreme defensiveness, I've added a BUG_ON check for the calculated size of the buffer allocation in case BPF_MAXINSNS ever changes, which shouldn't ever happen. The compiler should actually optimize out this check since the test above it makes it impossible.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
  Conflicts: kernel/seccomp.c
* sched: move no_new_privs into new atomic flags (Kees Cook, 2015-09-28, 2 files, -3/+3)
  Since seccomp transitions between threads require updates to the no_new_privs flag to be atomic, the flag must be part of an atomic flag set. This moves the nnp flag into a separate task field, and introduces accessors.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
  Conflicts: kernel/sys.c
  Change-Id: Ief9d8a459e7e66bd7e025ed494bc06a8b1dea54d
* seccomp: add "seccomp" syscallKees Cook2015-09-282-5/+53
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This adds the new "seccomp" syscall with both an "operation" and "flags" parameter for future expansion. The third argument is a pointer value, used with the SECCOMP_SET_MODE_FILTER operation. Currently, flags must be 0. This is functionally equivalent to prctl(PR_SET_SECCOMP, ...). In addition to the TSYNC flag later in this patch series, there is a non-zero chance that this syscall could be used for configuring a fixed argument area for seccomp-tracer-aware processes to pass syscall arguments in the future. Hence, the use of "seccomp" not simply "seccomp_add_filter" for this syscall. Additionally, this syscall uses operation, flags, and user pointer for arguments because strictly passing arguments via a user pointer would mean seccomp itself would be unable to trivially filter the seccomp syscall itself. Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net> Conflicts: arch/x86/syscalls/syscall_32.tbl arch/x86/syscalls/syscall_64.tbl include/uapi/asm-generic/unistd.h kernel/seccomp.c And fixup of unistd32.h to truly enable sys_secomp. Change-Id: I95bea02382c52007d22e5e9dc563c7d055c2c83f Conflicts: arch/arm64/include/asm/unistd32.h
* seccomp: split mode setting routines (Kees Cook, 2015-09-28, 1 file, -23/+48)
  Separates the two mode setting paths to make things more readable with fewer #ifdefs within function bodies.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
* seccomp: extract check/assign mode helpers (Kees Cook, 2015-09-28, 1 file, -4/+18)
  To support splitting mode 1 from mode 2, extract the mode checking and assignment logic into common functions.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
* seccomp: create internal mode-setting function (Kees Cook, 2015-09-28, 1 file, -2/+14)
  In preparation for having other callers of the seccomp mode setting logic, split the prctl entry point away from the core logic that performs seccomp mode setting.
  Signed-off-by: Kees Cook <keescook@chromium.org> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Andy Lutomirski <luto@amacapital.net>
* Put device-specific code behind #ifndef CONFIG_UML. (Lorenzo Colitti, 2015-09-17, 1 file, -0/+4)
  This is required to run net_test, which builds for ARCH=um.
  b/24150139 Change-Id: I7041756c057b913a09554b66cd817b4abaa712a6 Signed-off-by: Lorenzo Colitti <lorenzo@google.com>
* sched: cpufreq: update power usage only if cpufreq_stat is enabled (Amit Pundir, 2015-08-05, 1 file, -0/+4)
  Call acct_update_power() to track power usage of task only if CONFIG_CPU_FREQ_STAT is enabled, otherwise we run into the following build failure:
      kernel/built-in.o: In function `account_user_time':
      kernel/sched/cputime.c:155: undefined reference to `acct_update_power'
      kernel/built-in.o: In function `__account_system_time':
      kernel/sched/cputime.c:208: undefined reference to `acct_update_power'
      make: *** [vmlinux] Error 1
  Signed-off-by: Amit Pundir <amit.pundir@linaro.org> (cherry picked from commit 5371356ed838799125b1379fa38cdddfd1d189fd)
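  The guard described above, sketched; the call site and argument names follow the cputime accounting code this tree patches and are assumptions here:

      #ifdef CONFIG_CPU_FREQ_STAT
              acct_update_power(p, cputime);  /* only built when cpufreq_stats is in */
      #endif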
* sched: cpufreq: Adds a field cpu_power in the task_struct (Ruchi Kandoi, 2015-08-05, 2 files, -0/+8)
  cpu_power has been added to keep track of the amount of power each task is consuming. cpu_power is updated whenever stime and utime are updated for a task. Power is computed by taking into account the frequency at which the current core was running and the current drawn by the cpu actively running at that frequency.
  Change-Id: Ic535941e7b339aab5cae9081a34049daeb44b248 Signed-off-by: Ruchi Kandoi <kandoiruchi@google.com> (cherry picked from commit 85a6bd2bc4c903df43186e6f41209746aa6fdf05)
* Disable module loading in lookup_exec_domain (Christian Bejram, 2015-06-22, 1 file, -1/+2)
  Hardened SE Linux policies that prevent module_request cause a lot of SE Linux denials when using a kernel configured with module support. lookup_exec_domain will try to request a personality-8 module which does not exist. Disable module loading, which will fall back to lookup_exec_domain returning the default exec domain.
  Bug: 20429596 Change-Id: I4efd1eb46b901857233ce39fa8bfe7a0804d339d
* time: export symbol nsecs_to_jiffies64 (Tom Cherry, 2015-03-17, 1 file, -0/+1)
  Needed when building some sensor drivers as modules.
  Change-Id: Ibe90f893ae280624327c66ab8f9a2ace2e1baae0 Signed-off-by: Tom Cherry <tomcherry@google.com>
* Merge "sched: Packing support until a frequency threshold"Linux Build Service Account2015-02-183-0/+64
|\
| * sched: Packing support until a frequency thresholdSrivatsa Vaddagiri2015-02-113-0/+64
| | | | | | | | | | | | | | | | | | | | Add another dimension for task packing based on frequency. This patch adds a per-cpu tunable, rq->mostly_idle_freq, which when set will result in tasks being packed on a single cpu in cluster as long as cluster frequency is less than set threshold. Change-Id: I8c65376801efd158c8145073a10a1a555004f1da Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* | Merge "sched: per-cpu mostly_idle threshold"Linux Build Service Account2015-02-185-50/+69
|\|
| * sched: per-cpu mostly_idle thresholdSrivatsa Vaddagiri2015-02-115-50/+69
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sched_mostly_idle_load and sched_mostly_idle_nr_run knobs help pack tasks on cpus to some extent. In some cases, it may be desirable to have different packing limits for different cpus. For example, pack to a higher limit on high-performance cpus compared to power-efficient cpus. This patch removes the global mostly_idle tunables and makes them per-cpu, thus letting task packing behavior to be controlled in a fine grained manner. Change-Id: Ifc254cda34b928eae9d6c342ce4c0f64e531e6c2 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org> Signed-off-by: Hanumath Prasad <hpprasad@codeaurora.org>
* | Merge "hrtimer: Prevent stale expiry time in hrtimer_interrupt()"Linux Build Service Account2015-02-131-58/+50
|\ \
| * | hrtimer: Prevent stale expiry time in hrtimer_interrupt()Thomas Gleixner2015-02-131-58/+50
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | hrtimer_interrupt() has the following subtle issue: hrtimer_interrupt() lock(cpu_base); expires_next = KTIME_MAX; expire_timers(CLOCK_MONOTONIC); expires = get_next_timer(CLOCK_MONOTONIC); if (expires < expires_next) expires_next = expires; expire_timers(CLOCK_REALTIME); unlock(cpu_base); wakeup() hrtimer_start(CLOCK_MONOTONIC, newtimer); lock(cpu_base(); expires = get_next_timer(CLOCK_REALTIME); if (expires < expires_next) expires_next = expires; So because we already evaluated the next expiring timer of CLOCK_MONOTONIC we ignore that the expiry time of newtimer might be earlier than the overall next expiry time in hrtimer_interrupt(). To solve this, remove the caching of the next expiry value from hrtimer_interrupt() and reevaluate all active clock bases for the next expiry value. To avoid another code duplication, create a shared evaluation function and use it for hrtimer_get_next_event(), hrtimer_force_reprogram() and hrtimer_interrupt(). There is another subtlety in this mechanism: While hrtimer_interrupt() is running, we want to avoid to touch the hardware device because we will reprogram it anyway at the end of hrtimer_interrupt(). This works nicely for hrtimers which get rearmed via the HRTIMER_RESTART mechanism, because we drop out when the callback on that CPU is running. But that fails, if a new timer gets enqueued like in the example above. This has another implication: While hrtimer_interrupt() is running we refuse remote enqueueing of timers - see hrtimer_interrupt() and hrtimer_check_target(). hrtimer_interrupt() tries to prevent this by setting cpu_base->expires to KTIME_MAX, but that fails if a new timer gets queued. Prevent both the hardware access and the remote enqueue explicitely. We can loosen the restriction on the remote enqueue now due to reevaluation of the next expiry value, but that needs a seperate patch. Folded in a fix from Vignesh Radhakrishnan. Change-Id: I803322cc29a294eab73fa2046e9f3a2e5f66755e Reported-and-tested-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Based-on-patch-by: Stanislav Fomichev <stfomichev@yandex-team.ru> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: vigneshr@codeaurora.org Cc: john.stultz@linaro.org Cc: viresh.kumar@linaro.org Cc: fweisbec@gmail.com Cc: cl@linux.com Cc: stuart.w.hayes@gmail.com Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1501202049190.5526@nanos Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Patch-mainline : linux-arm-kernel @ 01/23/15, 03:21 [vigneshr@codeaurora.org : Changes to the file kernel/time/hrtimer.c is made to the file kernel/hrtimer.c] Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
* | | Merge "sched: Update cur_freq for offline CPUs in notifier callback"Linux Build Service Account2015-02-131-5/+11
|\ \ \ | |/ / |/| |
| * | sched: Update cur_freq for offline CPUs in notifier callbackSyed Rameez Mustafa2015-02-101-5/+11
| |/ | | | | | | | | | | | | | | | | | | | | | | | | | | cpufreq governor does not send frequency change notifications for offline CPUs. This means that a hot removed CPU's cur_freq information can get stale if there is a frequency change while that CPU is offline. When the offline CPU is hotplugged back in, all subsequent load calculations are based off the stale information until another frequency change occurs and the corresponding set of notifications are sent out. Avoid this incorrect load tracking by updating the cur_freq for all CPUs in the same frequency domain. Change-Id: Ie11ad9a64e7c9b115d01a7c065f22d386eb431d5 Signed-off-by: Syed Rameez Mustafa <rameezmustafa@codeaurora.org>
* alarmtimer: set power off alarm to be triggered on time (Lijuan Gao, 2015-02-11, 1 file, -5/+1)
  With the new design of the power-off alarm, it's not necessary to boot up the phone 2 minutes earlier than the actual power-off alarm time, so remove this limit here.
  Change-Id: I10c47d191d62452a14253db00cb8922368aa9216 Signed-off-by: lijuang <lijuang@codeaurora.org>
* Merge "sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()"Linux Build Service Account2015-02-051-1/+12
|\
| * sched: Consider PF_WAKE_UP_IDLE in select_best_cpu()Srivatsa Vaddagiri2015-01-231-1/+12
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sysctl_sched_prefer_idle controls selection of idle cpus for waking tasks. In some cases, waking to idle cpus help performance while in other cases it hurts (as tasks incur latency associated with C-state wakeup). Its ideal if scheduler can adapt prefer_idle behavior based on the task that is waking up, but that's hard for scheduler to figure by itself. PF_WAKE_UP_IDLE hint can be provided by external module/driver in such case to guide scheduler in preferring an idle cpu for select tasks irrespective of sysctl_sched_prefer_idle flag. This patch enhances select_best_cpu() to consider PF_WAKE_UP_IDLE hint. Wakeup posted from any task that has PF_WAKE_UP_IDLE set is a hint for scheduler to prefer idle cpu for waking tasks. Similarly scheduler will attempt to place any task with PF_WAKE_UP_IDLE set on idle cpu when they wakeup. Change-Id: Ia8bf334d98fd9fd2ff9eda875430497d55d64ce6 Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
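  An illustrative sketch of how a driver could supply the hint; whether a dedicated helper exists in this tree is not shown in the log, so the flag is set directly here and worker_task is a placeholder:

      /* tasks woken by this task should prefer idle cpus */
      current->flags |= PF_WAKE_UP_IDLE;

      wake_up_process(worker_task);        /* worker_task: placeholder */

      /* drop the hint once the latency-sensitive phase is over */
      current->flags &= ~PF_WAKE_UP_IDLE;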
* kmemleak: Make module scanning optional using config (Vignesh Radhakrishnan, 2015-02-05, 1 file, -1/+7)
  Currently kmemleak scans module memory as provided in the area list. This takes up a lot of time with irqs and preemption disabled. Provide a compile-time configurable config to enable this functionality.
  Change-Id: I5117705e7e6726acdf492e7f87c0703bc1f28da0 Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
* irq_work: register irq_work_cpu_notify in early init (Vignesh Radhakrishnan, 2014-12-30, 1 file, -1/+1)
  Currently irq_work_cpu_notify is registered using device_initcall(). In cases where a CPU is hotplugged early (an example would be the thermal engine hotplugging a CPU out), there is a chance that irq_work_cpu_notify has not even been registered yet while the CPU is already hotplugged out. irq_work uses the CPU_DYING notifier to clear out pending irq_work, but since the cpu notifier is not registered that early, pending irq_work items are never run, because this pending list is per-cpu. One specific scenario where this impacts the system is the rcu framework using irq_work to wake up and complete cleanup operations; in this scenario we notice that RCU operations need cleanup on the hotplugged CPU. Fix this by registering irq_work_cpu_notify in early init.
  CRs-Fixed: 768180 Change-Id: Ibe7f5c77097de7a342eeb1e8d597fb2f72185ecf Signed-off-by: Prasad Sodagudi <psodagud@codeaurora.org> Signed-off-by: Vignesh Radhakrishnan <vigneshr@codeaurora.org>
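  A sketch of the change; the init function name is an assumption, since the log only names the notifier block irq_work_cpu_notify:

      /* before: notifier registered too late for early CPU hotplug */
      device_initcall(irq_work_init);

      /* after: register during early init so CPU_DYING callbacks are not missed */
      early_initcall(irq_work_init);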
* sched: Fix inaccurate accounting for real-time task (Srivatsa Vaddagiri, 2014-12-09, 1 file, -0/+9)
  It is possible that rq->clock_task was not updated in put_prev_task(), in which case we can potentially overcharge a real-time task for time it did not run. This is because clock_task could be stale and not represent the exact time the real-time task started running. Fix this by forcing an update of rq->clock_task when a real-time task starts running.
  Change-Id: I8320bb4e47924368583127b950d987925e8e6a6c Signed-off-by: Srivatsa Vaddagiri <vatsa@codeaurora.org>
* Merge "msm: rtb: Add timestamp to rtb logging"Linux Build Service Account2014-11-261-3/+11
|\