path: root/kernel
Commit message (Author, Age, Files, Lines)
* boeffla_wl_blocker: [SQUASHED] (andip71, 2019-04-20, 1 file, -0/+4)
|   [first commit]
|   boeffla_wl_blocker: add generic wakelock blocker driver v1.0.0
|
|   Based on ideas of FranciscoFranco's non-generic driver.
|
|   Sysfs node:
|     /sys/class/misc/boeffla_wakelock_blocker/wakelock_blocker
|       - list of wakelocks to be blocked, separated by semicolons
|     /sys/class/misc/boeffla_wakelock_blocker/debug
|       - write: 0/1 to switch off and on debug logging into dmesg
|       - read: get current driver internals
|     /sys/class/misc/boeffla_wakelock_blocker/version
|       - show driver version
|
|   [second commit]
|   boeffla_wl_blocker: update to wakelock blocker driver v1.0.1
|     - currently active wakelocks on the list are forcefully killed
|
|   [third commit]
|   boeffla_wl_blocker: update to wakelock blocker driver v1.1.0
|
|   There are now two lists:
|     - the previously existing list of user defined wakelocks to block
|     - a new list called "wakelock_blocker_default" which comes prepopulated
|       with the most common and safe wakelocks to block:
|       qcom_rx_wakelock;wlan;wlan_wow_wl;wlan_extscan_wl;netmgr_wl;NETLINK
|
|   A combination of both wakelock lists will be blocked finally.
|
|   [fourth commit]
|   boeffla_wl_blocker: export get_active_wakeup_sources
|
|   Change-Id: Ieb188396babfc6e4128febb4b2b79ee1108bf46f
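The two semicolon-separated lists described above can be illustrated with a small user-space sketch. It is not taken from the driver: the function names `wl_list_contains` and `wl_is_blocked` are hypothetical, and exact whole-entry matching is an assumption about how a name would be compared against the lists.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not from the driver): returns true when `name`
 * appears as a complete entry in the semicolon-separated `list`.
 */
static bool wl_list_contains(const char *list, const char *name)
{
    size_t name_len;
    const char *entry;

    assert(list != NULL && name != NULL);

    name_len = strlen(name);
    if (name_len == 0)
        return false;

    entry = list;
    while (*entry != '\0') {
        /* Length of the current entry, up to the next ';' or end of string. */
        size_t entry_len = strcspn(entry, ";");

        if (entry_len == name_len && strncmp(entry, name, name_len) == 0)
            return true;

        entry += entry_len;
        if (*entry == ';')
            entry++;    /* skip the separator */
    }
    return false;
}

/* A wakelock counts as blocked when it is on the default list or the user list. */
static bool wl_is_blocked(const char *default_list, const char *user_list,
                          const char *name)
{
    return wl_list_contains(default_list, name) ||
           wl_list_contains(user_list, name);
}
```

Whole-entry comparison keeps "wlan" from matching "wlan_wow_wl"; a plain substring search would not make that distinction.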
* asmlinkage Make __stack_chk_failed and memcmp visible (Andi Kleen, 2019-03-04, 1 file, -1/+1)
|   In LTO symbols implicitly referenced by the compiler need to be visible. Earlier these symbols were visible implicitly from being exported, but we disabled implicit visibility for EXPORTs when modules are disabled to improve code size. So now these symbols have to be marked visible explicitly.
|
|   Do this for __stack_chk_fail (with stack protector) and memcmp.
|
|   Signed-off-by: Andi Kleen <ak@linux.intel.com>
|   Link: http://lkml.kernel.org/r/1391845930-28580-10-git-send-email-ak@linux.intel.com
|   Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
* stackprotector: Increase the per-task stack canary's random range from 32 bits to 64 bits on 64-bit platforms (Daniel Micay, 2019-03-04, 1 file, -1/+1)
|   The stack canary is an 'unsigned long' and should be fully initialized to random data rather than only 32 bits of random data.
|
|   Signed-off-by: Daniel Micay <danielmicay@gmail.com>
|   Acked-by: Arjan van de Ven <arjan@linux.intel.com>
|   Acked-by: Rik van Riel <riel@redhat.com>
|   Acked-by: Kees Cook <keescook@chromium.org>
|   Cc: Arjan van Ven <arjan@linux.intel.com>
|   Cc: Linus Torvalds <torvalds@linux-foundation.org>
|   Cc: Peter Zijlstra <peterz@infradead.org>
|   Cc: Thomas Gleixner <tglx@linutronix.de>
|   Cc: kernel-hardening@lists.openwall.com
|   Cc: stable@vger.kernel.org
|   Link: http://lkml.kernel.org/r/20170504133209.3053-1-danielmicay@gmail.com
|   Signed-off-by: Ingo Molnar <mingo@kernel.org>
* workqueue: allow rescuer thread to do more work. (NeilBrown, 2019-03-04, 1 file, -1/+19)
|   When there is serious memory pressure, all workers in a pool could be blocked, and a new thread cannot be created because it requires memory allocation.
|
|   In this situation a WQ_MEM_RECLAIM workqueue will wake up the rescuer thread to do some work.
|
|   The rescuer will only handle requests that are already on ->worklist. If max_requests is 1, that means it will handle a single request.
|
|   The rescuer will be woken again in 100ms to handle another max_requests requests.
|
|   I've seen a machine (running a 3.0 based "enterprise" kernel) with thousands of requests queued for xfslogd, which has a max_requests of 1, and is needed for retiring all 'xfs' write requests. When one of the worker pools gets into this state, it progresses extremely slowly and possibly never recovers (only waited an hour or two).
|
|   With this patch we leave a pool_workqueue on mayday list until it is clearly no longer in need of assistance. This allows all requests to be handled in a timely fashion.
|
|   We keep each pool_workqueue on the mayday list until need_to_create_worker() is false, and no work for this workqueue is found in the pool.
|
|   I have tested this in combination with a (hackish) patch which forces all work items to be handled by the rescuer thread. In that context it significantly improves performance. A similar patch for a 3.0 kernel significantly improved performance on a heavy work load.
|
|   Thanks to Jan Kara for some design ideas, and to Dongsu Park for some comments and testing.
|
|   tj: Inverted the lock order between wq_mayday_lock and pool->lock with a preceding patch and simplified this patch. Added comment and updated changelog accordingly. Dongsu spotted missing get_pwq() in the simplified code.
|
|   Cc: Dongsu Park <dongsu.park@profitbricks.com>
|   Cc: Jan Kara <jack@suse.cz>
|   Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
|   Signed-off-by: NeilBrown <neilb@suse.de>
|   Signed-off-by: Tejun Heo <tj@kernel.org>
* fs: add dirtytime_expire_seconds sysctl (Theodore Ts'o, 2019-03-04, 1 file, -0/+8)
|   Add a tuning knob so we can adjust the dirtytime expiration timeout, which is very useful for testing lazytime.
|
|   Signed-off-by: Theodore Ts'o <tytso@mit.edu>
|   Reviewed-by: Jan Kara <jack@suse.cz>
* sched/cputime: Fix invalid gtime in proc (Hiroshi Shimamoto, 2019-03-04, 1 file, -0/+3)
|   /proc/stats shows invalid gtime when the thread is running in guest. When vtime accounting is not enabled, we cannot get a valid delta. The delta is calculated with now - tsk->vtime_snap, but tsk->vtime_snap is only updated when vtime accounting is runtime enabled.
|
|   This patch makes task_gtime() just return gtime without computing the buggy non-existing tickless delta when vtime accounting is not enabled.
|
|   Use context_tracking_is_enabled() to check if vtime is accounting on some cpu, in which case only we need to check the tickless delta. This way we fix the gtime value regression on machines not running nohz full.
|
|   The kernel config contains CONFIG_VIRT_CPU_ACCOUNTING_GEN=y and CONFIG_NO_HZ_FULL_ALL=n and boot without nohz_full.
|
|   I ran and stop a busy loop in VM and see the gtime in host. Dump the 43rd field which shows the gtime in every second:
|
|     # while :; do awk '{print $3" "$43}' /proc/3955/task/4014/stat; sleep 1; done
|     S 4348
|     R 7064566
|     R 7064766
|     R 7064967
|     R 7065168
|     S 4759
|     S 4759
|
|   During running busy loop, it returns large value.
|
|   After applying this patch, we can see right gtime.
|
|     # while :; do awk '{print $3" "$43}' /proc/10913/task/10956/stat; sleep 1; done
|     S 5338
|     R 5365
|     R 5465
|     R 5566
|     R 5666
|     S 5726
|     S 5726
|
|   Signed-off-by: Hiroshi Shimamoto <h-shimamoto@ct.jp.nec.com>
|   Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
|   Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|   Cc: Chris Metcalf <cmetcalf@ezchip.com>
|   Cc: Christoph Lameter <cl@linux.com>
|   Cc: Linus Torvalds <torvalds@linux-foundation.org>
|   Cc: Luiz Capitulino <lcapitulino@redhat.com>
|   Cc: Mike Galbraith <efault@gmx.de>
|   Cc: Paul E . McKenney <paulmck@linux.vnet.ibm.com>
|   Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
|   Cc: Peter Zijlstra <peterz@infradead.org>
|   Cc: Rik van Riel <riel@redhat.com>
|   Cc: Thomas Gleixner <tglx@linutronix.de>
|   Link: http://lkml.kernel.org/r/1447948054-28668-2-git-send-email-fweisbec@gmail.com
|   Signed-off-by: Ingo Molnar <mingo@kernel.org>
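The gtime fix described in this message can be sketched in user space as follows. This is a simplified illustration, not the kernel code: `struct task_sketch`, `task_gtime_sketch`, and the boolean parameter standing in for context_tracking_is_enabled() are all hypothetical, and locking is left out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the task fields named in the commit message. */
struct task_sketch {
    uint64_t gtime;       /* accumulated guest time */
    uint64_t vtime_snap;  /* only refreshed while vtime accounting runs */
    bool     in_guest;    /* task is currently executing guest code */
};

static uint64_t task_gtime_sketch(const struct task_sketch *t,
                                  bool vtime_accounting_enabled,
                                  uint64_t now)
{
    assert(t != NULL);

    /*
     * Without vtime accounting, vtime_snap is stale, so the delta
     * "now - vtime_snap" is meaningless. Return the stored value.
     */
    if (!vtime_accounting_enabled)
        return t->gtime;

    /* With accounting enabled, add the time spent in guest since the snapshot. */
    if (t->in_guest)
        return t->gtime + (now - t->vtime_snap);

    return t->gtime;
}
```

With accounting disabled the function ignores `now` altogether, which corresponds to the "R" samples no longer jumping to very large values in the second listing above.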
* sched/cputime: Fix prev steal time accouting during CPU hotplug (Wanpeng Li, 2019-03-04, 2 files, -14/+0)
|   Commit:
|
|     e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")
|
|   ... set rq->prev_* to 0 after a CPU hotplug comes back, in order to fix the case where (after CPU hotplug) steal time is smaller than rq->prev_steal_time.
|
|   However, this should never happen. Steal time was only smaller because of the KVM-specific bug fixed by the previous patch. Worse, the previous patch triggers a bug on CPU hot-unplug/plug operation: because rq->prev_steal_time is cleared, all of the CPU's past steal time will be accounted again on hot-plug.
|
|   Since the root cause has been fixed, we can just revert commit e9532e69b8d1.
|
|   Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
|   Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|   Acked-by: Paolo Bonzini <pbonzini@redhat.com>
|   Cc: Frederic Weisbecker <fweisbec@gmail.com>
|   Cc: Linus Torvalds <torvalds@linux-foundation.org>
|   Cc: Mike Galbraith <efault@gmx.de>
|   Cc: Peter Zijlstra <peterz@infradead.org>
|   Cc: Radim Krčmář <rkrcmar@redhat.com>
|   Cc: Rik van Riel <riel@redhat.com>
|   Cc: Thomas Gleixner <tglx@linutronix.de>
|   Fixes: 'commit e9532e69b8d1 ("sched/cputime: Fix steal time accounting vs. CPU hotplug")'
|   Link: http://lkml.kernel.org/r/1465813966-3116-3-git-send-email-wanpeng.li@hotmail.com
|   Signed-off-by: Ingo Molnar <mingo@kernel.org>
* perf/core: Drop kernel samples even though :u is specified (Jin Yao, 2019-02-28, 1 file, -0/+21)
|   When doing sampling, for example:
|
|     perf record -e cycles:u ...
|
|   On workloads that do a lot of kernel entry/exits we see kernel samples, even though :u is specified. This is due to skid existing.
|
|   This might be a security issue because it can leak kernel addresses even though kernel sampling support is disabled.
|
|   The patch drops the kernel samples if exclude_kernel is specified.
|
|   For example, test on Haswell desktop:
|
|     perf record -e cycles:u <mgen>
|     perf report --stdio
|
|   Before patch applied:
|
|     99.77%  mgen  mgen              [.] buf_read
|      0.20%  mgen  mgen              [.] rand_buf_init
|      0.01%  mgen  [kernel.vmlinux]  [k] apic_timer_interrupt
|      0.00%  mgen  mgen              [.] last_free_elem
|      0.00%  mgen  libc-2.23.so      [.] __random_r
|      0.00%  mgen  libc-2.23.so      [.] _int_malloc
|      0.00%  mgen  mgen              [.] rand_array_init
|      0.00%  mgen  [kernel.vmlinux]  [k] page_fault
|      0.00%  mgen  libc-2.23.so      [.] __random
|      0.00%  mgen  libc-2.23.so      [.] __strcasestr
|      0.00%  mgen  ld-2.23.so        [.] strcmp
|      0.00%  mgen  ld-2.23.so        [.] _dl_start
|      0.00%  mgen  libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
|      0.00%  mgen  ld-2.23.so        [.] _start
|
|   We can see kernel symbols apic_timer_interrupt and page_fault.
|
|   After patch applied:
|
|     99.79%  mgen  mgen              [.] buf_read
|      0.19%  mgen  mgen              [.] rand_buf_init
|      0.00%  mgen  libc-2.23.so      [.] __random_r
|      0.00%  mgen  mgen              [.] rand_array_init
|      0.00%  mgen  mgen              [.] last_free_elem
|      0.00%  mgen  libc-2.23.so      [.] vfprintf
|      0.00%  mgen  libc-2.23.so      [.] rand
|      0.00%  mgen  libc-2.23.so      [.] __random
|      0.00%  mgen  libc-2.23.so      [.] _int_malloc
|      0.00%  mgen  libc-2.23.so      [.] _IO_doallocbuf
|      0.00%  mgen  ld-2.23.so        [.] do_lookup_x
|      0.00%  mgen  ld-2.23.so        [.] open_verify.constprop.7
|      0.00%  mgen  ld-2.23.so        [.] _dl_important_hwcaps
|      0.00%  mgen  libc-2.23.so      [.] sched_setaffinity@@GLIBC_2.3.4
|      0.00%  mgen  ld-2.23.so        [.] _start
|
|   There are only userspace symbols.
|
|   Signed-off-by: Jin Yao <yao.jin@linux.intel.com>
|   Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
|   Cc: <stable@vger.kernel.org>
|   Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
|   Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
|   Cc: Jiri Olsa <jolsa@redhat.com>
|   Cc: Linus Torvalds <torvalds@linux-foundation.org>
|   Cc: Namhyung Kim <namhyung@kernel.org>
|   Cc: Peter Zijlstra <peterz@infradead.org>
|   Cc: Stephane Eranian <eranian@google.com>
|   Cc: Thomas Gleixner <tglx@linutronix.de>
|   Cc: Vince Weaver <vincent.weaver@maine.edu>
|   Cc: acme@kernel.org
|   Cc: jolsa@kernel.org
|   Cc: kan.liang@intel.com
|   Cc: mark.rutland@arm.com
|   Cc: will.deacon@arm.com
|   Cc: yao.jin@intel.com
|   Link: http://lkml.kernel.org/r/1495706947-3744-1-git-send-email-yao.jin@linux.intel.com
|   Signed-off-by: Ingo Molnar <mingo@kernel.org>
|   Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
|   Signed-off-by: Joe Maples <joe@frap129.org>
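The filtering rule stated in this message ("drops the kernel samples if exclude_kernel is specified") can be sketched as a single predicate. The struct and function names below are hypothetical, and the boolean `sample_in_user_mode` stands in for whatever register-state check the kernel performs; only the decision logic is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, reduced view of an event's attributes. */
struct event_attr_sketch {
    bool exclude_kernel;  /* set when the event was opened with :u */
};

/*
 * Returns true when a sample may be recorded. A sample that skidded into
 * kernel mode is dropped if the event asked to exclude the kernel.
 */
static bool sample_is_kept(const struct event_attr_sketch *attr,
                           bool sample_in_user_mode)
{
    assert(attr != NULL);

    if (attr->exclude_kernel && !sample_in_user_mode)
        return false;

    return true;
}
```

Dropping the sample, rather than trying to correct its address, matches the before/after listings above: the kernel symbols disappear from the report instead of being attributed elsewhere.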
* Enable arch power (Kiran Anto, 2019-02-28, 1 file, -1/+1)
|   Signed-off-by: Joe Maples <joe@frap129.org>
* kernel: Disable SCHED_HRTICK (Joe Maples, 2019-02-28, 1 file, -0/+1)
|   Signed-off-by: Joe Maples <joe@frap129.org>
* asmlinkage: Make main_extable_sort_needed visible (Andi Kleen, 2019-02-28, 1 file, -1/+1)
|   main_extable_sort_needed is used by the build system and needs to be a normal ELF symbol. Make it visible so that LTO does not remove or mangle it.
|
|   Signed-off-by: Andi Kleen <ak@linux.intel.com>
|   Link: http://lkml.kernel.org/r/1391845930-28580-8-git-send-email-ak@linux.intel.com
|   Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
|   Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
|   Signed-off-by: Joe Maples <joe@frap129.org>
* extable: skip sorting if the table is empty (Uwe Kleine-König, 2019-02-28, 1 file, -1/+1)
|   At least on ARM no-MMU the extable is empty and so there is nothing to sort. So add a check for the table to be empty which effectively only changes that the misleading pr_notice is suppressed.
|
|   Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
|   Cc: Ingo Molnar <mingo@kernel.org>
|   Cc: David Daney <david.daney@cavium.com>
|   Cc: "H. Peter Anvin" <hpa@linux.intel.com>
|   Cc: Borislav Petkov <bp@suse.de>
|   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|   Signed-off-by: Pranav Vashi <neobuddy89@gmail.com>
|   Signed-off-by: Joe Maples <joe@frap129.org>
* Merge tag 'android-8.1.0_r0.80' into android-msm-angler-3.10 (Nathan Chancellor, 2018-07-02, 1 file, -8/+95)
|\
| |   Android 8.1.0 Release 0.80 (OPM6.171019.030.E1,angler)
| |
| |   * tag 'android-8.1.0_r0.80':
| |     BACKPORT: futex: Prevent overflow by strengthen input validation
| |     Revert "BACKPORT: futex: Prevent overflow by strengthen input validation"
| |     soc: q6dspv2: apr: fix client registration refcount
| |     msm: sensor: ois: add conditional check for ioctl
| |     msm: adsprpc: Use unsigned integer for length values
| |     udp: consistently apply ufo or fragmentation
| |     drivers: cpuidle: lpm-levels: Fix untrusted pointer dereference.
| |     BACKPORT: futex: Remove requirement for lock_page() in get_futex_key()
| |     BACKPORT: futex: Prevent overflow by strengthen input validation
| |     ASoC: msm: qdsp6v2: check for buffer size before read
| |     msm: mdss: fix race condition between rotator api's
| |     UPSTREAM: scsi: sg: don't return bogus Sg_requests
| |
| |   Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
| * BACKPORT: futex: Prevent overflow by strengthen input validation (Li Jinyue, 2018-05-10, 1 file, -0/+3)
| |   UBSAN reports signed integer overflow in kernel/futex.c:
| |
| |     UBSAN: Undefined behaviour in kernel/futex.c:2041:18
| |     signed integer overflow:
| |     0 - -2147483648 cannot be represented in type 'int'
| |
| |   Add a sanity check to catch negative values of nr_wake and nr_requeue.
| |
| |   Signed-off-by: Li Jinyue <lijinyue@huawei.com>
| |   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| |   Signed-off-by: Erick Reyes <erickreyes@google.com>
| |   Signed-off-by: Oleg Matcovschi <omatcovschi@google.com>
| |   Cc: peterz@infradead.org
| |   Cc: dvhart@infradead.org
| |   Cc: stable@vger.kernel.org
| |   Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
| |   Cherry-picked from <fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a>
| |   Bug: 76106267
| |   Change-Id: I1ec41dfab44abfa5154d6f073e475bf499dac882
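The sanity check this message describes can be illustrated with a small user-space sketch. The function names are hypothetical and the -EINVAL return value is an assumption; the message itself only says that negative values of nr_wake and nr_requeue are caught.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>

/*
 * Hypothetical validation step: reject negative counts before any
 * arithmetic is done with them.
 */
static int futex_requeue_args_ok(int nr_wake, int nr_requeue)
{
    if (nr_wake < 0 || nr_requeue < 0)
        return -EINVAL;

    return 0;
}

/*
 * Once both operands are known to be non-negative, a - b always lies in
 * [-INT_MAX, INT_MAX] and cannot overflow. The UBSAN report above shows
 * the unvalidated case: 0 - INT_MIN is not representable in an int.
 */
static int checked_sub_sketch(int a, int b)
{
    assert(a >= 0 && b >= 0);
    return a - b;
}
```

Validating at the entry point means later arithmetic on the two counts needs no per-operation overflow handling.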
| * Revert "BACKPORT: futex: Prevent overflow by strengthen input validation" (Badhri Jagan Sridharan, 2018-05-10, 1 file, -3/+0)
| |   This reverts commit <40d05cf87b550f7240fade844dfda72f181a8e3a>
| |
| |   Bug: 76106267
| |   Change-Id: I5e367c56e440c0cdeb59b3c714ac335920d28295
| |   Signed-off-by: Badhri Jagan Sridharan <Badhri@google.com>
| * BACKPORT: futex: Remove requirement for lock_page() in get_futex_key() (Mel Gorman, 2018-05-10, 1 file, -8/+92)
| |   commit 65d8fc777f6dcfee12785c057a6b57f679641c90 upstream.
| |
| |   When dealing with key handling for shared futexes, we can drastically reduce the usage/need of the page lock.
| |
| |   1) For anonymous pages, the associated futex object is the mm_struct which does not require the page lock.
| |   2) For inode based keys, we can check under RCU read lock if the page mapping is still valid and take reference to the inode.
| |
| |   This just leaves one rare race that requires the page lock in the slow path when examining the swapcache.
| |
| |   Additionally realtime users currently have a problem with the page lock being contended for unbounded periods of time during futex operations.
| |
| |     Task A
| |       get_futex_key()
| |       lock_page()
| |       ---> preempted
| |
| |   Now any other task trying to lock that page will have to wait until task A gets scheduled back in, which is an unbound time.
| |
| |   With this patch, we pretty much have a lockless futex_get_key().
| |
| |   Experiments show that this patch can boost/speedup the hashing of shared futexes with the perf futex benchmarks (which is good for measuring such change) by up to 45% when there are high (> 100) thread counts on a 60 core Westmere. Lower counts are pretty much in the noise range or less than 10%, but mid range can be seen at over 30% overall throughput (hash ops/sec). This makes anon-mem shared futexes much closer to its private counterpart.
| |
| |   Signed-off-by: Mel Gorman <mgorman@suse.de>
| |   [ Ported on top of thp refcount rework, changelog, comments, fixes. ]
| |   Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
| |   Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
| |   Cc: Chris Mason <clm@fb.com>
| |   Cc: Darren Hart <dvhart@linux.intel.com>
| |   Cc: Hugh Dickins <hughd@google.com>
| |   Cc: Linus Torvalds <torvalds@linux-foundation.org>
| |   Cc: Mel Gorman <mgorman@techsingularity.net>
| |   Cc: Peter Zijlstra <peterz@infradead.org>
| |   Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
| |   Cc: dave@stgolabs.net
| |   Link: http://lkml.kernel.org/r/1455045314-8305-3-git-send-email-dave@stgolabs.net
| |   Signed-off-by: Ingo Molnar <mingo@kernel.org>
| |   Signed-off-by: Chenbo Feng <fengc@google.com>
| |   Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| |   Bug: 74250718
| |   Change-Id: I80e300aa740a553fbbbf84143d6a4165f31c8a90
| |   Signed-off-by: David Lin <dtwlin@google.com>
| * BACKPORT: futex: Prevent overflow by strengthen input validation (Li Jinyue, 2018-05-10, 1 file, -0/+3)
| |   UBSAN reports signed integer overflow in kernel/futex.c:
| |
| |     UBSAN: Undefined behaviour in kernel/futex.c:2041:18
| |     signed integer overflow:
| |     0 - -2147483648 cannot be represented in type 'int'
| |
| |   Add a sanity check to catch negative values of nr_wake and nr_requeue.
| |
| |   Signed-off-by: Li Jinyue <lijinyue@huawei.com>
| |   Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| |   Signed-off-by: Erick Reyes <erickreyes@google.com>
| |   Signed-off-by: Oleg Matcovschi <omatcovschi@google.com>
| |   Cc: peterz@infradead.org
| |   Cc: dvhart@infradead.org
| |   Cc: stable@vger.kernel.org
| |   Link: https://lkml.kernel.org/r/1513242294-31786-1-git-send-email-lijinyue@huawei.com
| |   Cherry-picked from fbe0e839d1e22d88810f3ee3e2f1479be4c0aa4a
| |   Change-Id: I954cc2848678318b60ec3f103d0c15f87b4605a4
* | Merge 3.10.108 into android-msm-angler-3.10-oreo-m5 (Nathan Chancellor, 2018-01-24, 3 files, -8/+29)
|\ \
| | |   Changes in 3.10.108: (141 commits) ipvs: SNAT packet replies only for NATed connections net: reduce skb_warn_bad_offload() noise net: skb_needs_check() accepts CHECKSUM_NONE for tx Staging: comedi: comedi_fops: Avoid orphaned proc entry udp: consistently apply ufo or fragmentation Bluetooth: bnep: bnep_add_connection() should verify that it's dealing with l2cap socket Bluetooth: cmtp: cmtp_add_connection() should verify that it's dealing with l2cap socket tcp: introduce tcp_rto_delta_us() helper for xmit timer fix tcp: enable xmit timer fix by having TLP use time when RTO should fire tcp: fix xmit timer to only be reset if data ACKed/SACKed mm/page_alloc: Remove kernel address exposure in free_reserved_area() leak in O_DIRECT readv past the EOF usb: renesas_usbhs: fix the behavior of some usbhs_pkt_handle usb: renesas_usbhs: fix the sequence in xfer_work() usb: renesas_usbhs: Fix DMAC sequence for receiving zero-length packet fs/exec.c: account for argv/envp pointers rxrpc: Fix several cases where a padded len isn't checked in ticket decode xfrm: policy: check policy direction value nl80211: check for
the required netlink attributes presence ALSA: seq: Fix use-after-free at creating a port MIPS: Send SIGILL for BPOSGE32 in `__compute_return_epc_for_insn' serial: ifx6x60: fix use-after-free on module unload KEYS: fix dereferencing NULL payload with nonzero length usb: chipidea: debug: check before accessing ci_role cpufreq: conservative: Allow down_threshold to take values from 1 to 10 powerpc/kprobes: Pause function_graph tracing during jprobes handling staging: comedi: fix clean-up of comedi_class in comedi_init() brcmfmac: fix possible buffer overflow in brcmf_cfg80211_mgmt_tx() vt: fix unchecked __put_user() in tioclinux ioctls crypto: talitos - Extend max key length for SHA384/512-HMAC and AEAD PM / Domains: Fix unsafe iteration over modified list of device links powerpc/64: Fix atomic64_inc_not_zero() to return an int powerpc: Fix emulation of mfocrf in emulate_step() powerpc/asm: Mark cr0 as clobbered in mftb() usb: renesas_usbhs: fix usbhsc_resume() for !USBHSF_RUNTIME_PWCTRL MIPS: Actually decode JALX in `__compute_return_epc_for_insn' MIPS: Fix unaligned PC interpretation in `compute_return_epc' MIPS: math-emu: Prevent wrong ISA mode instruction emulation libata: array underflow in ata_find_dev() workqueue: restore WQ_UNBOUND/max_active==1 to be ordered ext4: fix SEEK_HOLE/SEEK_DATA for blocksize < pagesize ext4: fix overflow caused by missing cast in ext4_resize_fs() media: platform: davinci: return -EINVAL for VPFE_CMD_S_CCDC_RAW_PARAMS ioctl target: Avoid mappedlun symlink creation during lun shutdown fuse: initialize the flock flag in fuse_file on allocation scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabled scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress path scsi: zfcp: fix missing trace records for early returns in TMF eh handlers scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace records scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late response usb: renesas_usbhs: fix the BCLR 
setting condition for non-DCP pipe usb: renesas_usbhs: fix usbhsf_fifo_clear() for RX direction iommu/amd: Finish TLB flush in amd_iommu_unmap() direct-io: Prevent NULL pointer access in submit_page_section USB: serial: console: fix use-after-free after failed setup KEYS: don't let add_key() update an uninstantiated key FS-Cache: fix dereference of NULL user_key_payload ext4: keep existing extra fields when inode expands MIPS: Fix mips_atomic_set() retry condition KEYS: prevent creating a different user's keyrings KEYS: encrypted: fix dereference of NULL user_key_payload md/bitmap: disable bitmap_resize for file-backed bitmaps. lib/digsig: fix dereference of NULL user_key_payload netfilter: invoke synchronize_rcu after set the _hook_ to NULL md/raid10: submit bio directly to replacement disk md: fix super_offset endianness in super_1_rdev_size_change lib/cmdline.c: fix get_options() overflow while parsing ranges ext4: fix SEEK_HOLE net: prevent sign extension in dev_get_stats() kernel/extable.c: mark core_kernel_text notrace wext: handle NULL extra data in iwe_stream_add_point better netfilter: nf_ct_ext: fix possible panic after nf_ct_extend_unregister ext4: in ext4_seek_{hole,data}, return -ENXIO for negative offsets ext4: avoid deadlock when expanding inode size sctp: don't dereference ptr before leaving _sctp_walk_{params, errors}() sctp: fix the check for _sctp_walk_params and _sctp_walk_errors sctp: fully initialize the IPv6 address in sctp_v6_to_addr() sctp: potential read out of bounds in sctp_ulpevent_type_enabled() tcp: disallow cwnd undo when switching congestion control netfilter: xt_TCPMSS: add more sanity tests on tcph->doff tcp: reset sk_rx_dst in tcp_disconnect() tcp: avoid setting cwnd to invalid ssthresh after cwnd reduction states tcp: when rearming RTO, if RTO time is in past then fire RTO ASAP tcp: initialize rcv_mss to TCP_MIN_MSS instead of 0 net/packet: check length in getsockopt() called with PACKET_HDRLEN net: Set sk_prot_creator when 
cloning sockets to the right proto net/mlx4_core: Fix VF overwrite of module param which disables DMFS on new probed PFs net: 8021q: Fix one possible panic caused by BUG_ON in free_netdev x86/io: Add "memory" clobber to insb/insw/insl/outsb/outsw/outsl kvm: async_pf: fix rcu_irq_enter() with irqs enabled net: ping: do not abuse udp_poll() scsi: qla2xxx: don't disable a not previously enabled PCI device drm/vmwgfx: Handle vmalloc() failure in vmw_local_fifo_reserve() net: xilinx_emaclite: fix receive buffer overflow serial: efm32: Fix parity management in 'efm32_uart_console_get_options()' x86/mm/32: Set the '__vmalloc_start_set' flag in initmem_init() mfd: omap-usb-tll: Fix inverted bit use for USB TLL mode pvrusb2: reduce stack usage pvr2_eeprom_analyze() usb: r8a66597-hcd: select a different endpoint on timeout usb: r8a66597-hcd: decrease timeout drivers/misc/c2port/c2port-duramar2150.c: checking for NULL instead of IS_ERR() net: phy: fix marvell phy status reading net: korina: Fix NAPI versus resources freeing xfrm: NULL dereference on allocation failure xfrm: Oops on error in pfkey_msg2xfrm_state() cpufreq: s3c2416: double free on driver init error path KVM: x86: zero base3 of unusable segments KEYS: Fix an error code in request_master_key() ipv6: avoid unregistering inet6_dev for loopback cfg80211: Validate frequencies nested in NL80211_ATTR_SCAN_FREQUENCIES cfg80211: Check if PMKID attribute is of expected size mm: fix overflow check in expand_upwards() crypto: caam - fix signals handling ir-core: fix gcc-7 warning on bool arithmetic udf: Fix deadlock between writeback and udf_setsize() perf annotate: Fix broken arrow at row 0 connecting jmp instruction to its target net/mlx4: Remove BUG_ON from ICM allocation routine ipv4: initialize fib_trie prior to register_netdev_notifier call. 
workqueue: implicit ordered attribute should be overridable packet: fix tp_reserve race in packet_set_ring staging:iio:resolver:ad2s1210 fix negative IIO_ANGL_VEL read ALSA: core: Fix unexpected error at replacing user TLV ACPI / APEI: Add missing synchronize_rcu() on NOTIFY_SCI removal qlge: avoid memcpy buffer overflow ipv6: fix memory leak with multiple tables during netns destruction ipv6: fix typo in fib6_net_exit() ip6_gre: fix endianness errors in ip6gre_err crypto: AF_ALG - remove SGL terminator indicator when chaining scsi: qla2xxx: Fix an integer overflow in sysfs code tracing: Apply trace_clock changes to instance max buffer tracing: Erase irqsoff trace with empty write btrfs: prevent to set invalid default subvolid IB/ipoib: rtnl_unlock can not come after free_netdev team: fix memory leaks IB/qib: fix false-postive maybe-uninitialized warning KVM: nVMX: fix guest CR4 loading when emulating L2 to L1 exit usb: gadget: composite: Fix use-after-free in usb_composite_overwrite_options scsi: scsi_dh_emc: return success in clariion_std_inquiry() can: esd_usb2: Fix can_dlc value for received RTR, frames x86/apic: fix build breakage caused by incomplete backport to 3.10 Linux 3.10.108 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
| * | tracing: Erase irqsoff trace with empty write (Bo Yan, 2017-11-02, 1 file, -2/+8)
| | |   commit 8dd33bcb7050dd6f8c1432732f930932c9d3a33e upstream.
| | |
| | |   One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer.
| | |
| | |   Set the max_buffer as the one to be cleared when it's the trace buffer currently in use.
| | |
| | |   Link: http://lkml.kernel.org/r/1505754215-29411-1-git-send-email-byan@nvidia.com
| | |   Cc: <mingo@redhat.com>
| | |   Cc: stable@vger.kernel.org
| | |   Fixes: 4acd4d00f ("tracing: give easy way to clear trace buffer")
| | |   Signed-off-by: Bo Yan <byan@nvidia.com>
| | |   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | |   Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | tracing: Apply trace_clock changes to instance max buffer (Baohong Liu, 2017-11-02, 1 file, -1/+1)
| | |   commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream.
| | |
| | |   Currently trace_clock timestamps are applied to both regular and max buffers only for global trace. For instance trace, trace_clock timestamps are applied only to regular buffer. But, regular and max buffers can be swapped, for example, following a snapshot. So, for instance trace, bad timestamps can be seen following a snapshot. Let's apply trace_clock timestamps to instance max buffer as well.
| | |
| | |   Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com
| | |   Cc: stable@vger.kernel.org
| | |   Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers")
| | |   Signed-off-by: Baohong Liu <baohong.liu@intel.com>
| | |   Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | |   Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | workqueue: implicit ordered attribute should be overridable (Tejun Heo, 2017-11-02, 1 file, -4/+9)
| | |   commit 0a94efb5acbb6980d7c9ab604372d93cd507e4d8 upstream.
| | |
| | |   5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered") automatically enabled ordered attribute for unbound workqueues w/ max_active == 1. Because ordered workqueues reject max_active and some attribute changes, this implicit ordered mode broke cases where the user creates an unbound workqueue w/ max_active == 1 and later explicitly changes the related attributes.
| | |
| | |   This patch distinguishes explicit and implicit ordered setting and overrides from attribute changes if implicit.
| | |
| | |   Signed-off-by: Tejun Heo <tj@kernel.org>
| | |   Fixes: 5c0338c68706 ("workqueue: restore WQ_UNBOUND/max_active==1 to be ordered")
| | |   Cc: Holger Hoffstätte <holger@applied-asynchrony.com>
| | |   Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
| | |   Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | kernel/extable.c: mark core_kernel_text notrace (Marcin Nowakowski, 2017-11-02, 1 file, -1/+1)
| | |   commit c0d80ddab89916273cb97114889d3f337bc370ae upstream.
| | |
| | |   core_kernel_text is used by MIPS in its function graph trace processing, so having this method traced leads to an infinite set of recursive calls such as:
| | |
| | |     Call Trace:
| | |       ftrace_return_to_handler+0x50/0x128
| | |       core_kernel_text+0x10/0x1b8
| | |       prepare_ftrace_return+0x6c/0x114
| | |       ftrace_graph_caller+0x20/0x44
| | |       return_to_handler+0x10/0x30
| | |       return_to_handler+0x0/0x30
| | |       return_to_handler+0x0/0x30
| | |       ftrace_ops_no_ops+0x114/0x1bc
| | |       core_kernel_text+0x10/0x1b8
| | |       core_kernel_text+0x10/0x1b8
| | |       core_kernel_text+0x10/0x1b8
| | |       ftrace_ops_no_ops+0x114/0x1bc
| | |       core_kernel_text+0x10/0x1b8
| | |       prepare_ftrace_return+0x6c/0x114
| | |       ftrace_graph_caller+0x20/0x44
| | |       (...)
| | |
| | |   Mark the function notrace to avoid it being traced.
| | |
| | |   Link: http://lkml.kernel.org/r/1498028607-6765-1-git-send-email-marcin.nowakowski@imgtec.com
| | |   Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
| | |   Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
| | |   Cc: Peter Zijlstra <peterz@infradead.org>
| | |   Cc: Thomas Meyer <thomas@m3y3r.de>
| | |   Cc: Ingo Molnar <mingo@kernel.org>
| | |   Cc: Steven Rostedt <rostedt@goodmis.org>
| | |   Cc: Daniel Borkmann <daniel@iogearbox.net>
| | |   Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
| | |   Cc: Thomas Gleixner <tglx@linutronix.de>
| | |   Cc: <stable@vger.kernel.org>
| | |   Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | |   Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | |   Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | workqueue: restore WQ_UNBOUND/max_active==1 to be ordered (Tejun Heo, 2017-11-01, 1 file, -0/+10)
| | | commit 5c0338c68706be53b3dc472e4308961c36e4ece1 upstream.
| | |
| | | The combination of WQ_UNBOUND and max_active == 1 used to imply ordered execution. After NUMA affinity 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues"), this is no longer true due to per-node worker pools.
| | |
| | | While the right way to create an ordered workqueue is alloc_ordered_workqueue(), the documentation has been misleading for a long time and people do use WQ_UNBOUND and max_active == 1 for ordered workqueues which can lead to subtle bugs which are very difficult to trigger.
| | |
| | | It's unlikely that we'd see noticeable performance impact by enforcing ordering on WQ_UNBOUND / max_active == 1 workqueues. Let's automatically set __WQ_ORDERED for those workqueues.
| | |
| | | Signed-off-by: Tejun Heo <tj@kernel.org>
| | | Reported-by: Christoph Hellwig <hch@infradead.org>
| | | Reported-by: Alexei Potashnik <alexei@purestorage.com>
| | | Fixes: 4c16bd327c74 ("workqueue: implement NUMA affinity for unbound workqueues")
| | | Cc: stable@vger.kernel.org # v3.10+
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
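The rule described in this commit message can be sketched in user-space C. This is only an illustration under stated assumptions: the flag values and the helper name `wq_apply_implicit_ordering` are hypothetical stand-ins and are not taken from the kernel source; only the condition (unbound plus max_active == 1 implies ordered) comes from the message above.

```c
/* Hypothetical flag bits; they stand in for WQ_UNBOUND and __WQ_ORDERED. */
#define SKETCH_WQ_UNBOUND (1u << 1)
#define SKETCH_WQ_ORDERED (1u << 17)

/*
 * Hypothetical helper, not a kernel function: returns the flags a
 * workqueue would carry once the implicit-ordering rule is applied.
 */
static unsigned int wq_apply_implicit_ordering(unsigned int flags,
                                               int max_active)
{
    /* Unbound workqueues limited to one active item are treated as ordered. */
    if ((flags & SKETCH_WQ_UNBOUND) && max_active == 1)
        flags |= SKETCH_WQ_ORDERED;
    return flags;
}
```

Under this sketch, an unbound queue created with max_active == 1 gains the ordered bit, while a bound queue or one with a larger max_active is left unchanged.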
* | | Merge 3.10.107 into android-msm-angler-3.10-oreo-m5 (Nathan Chancellor, 2018-01-24, 6 files, -14/+24)
|\| | Changes in 3.10.107: (270 commits) Revert "Btrfs: don't delay inode ref updates during log, replay" Btrfs: fix memory leak in reading btree blocks ext4: use more strict checks for inodes_per_block on mount ext4: fix in-superblock mount options processing ext4: add sanity checking to count_overhead() 
ext4: validate s_first_meta_bg at mount time jbd2: don't leak modified metadata buffers on an aborted journal ext4: fix fencepost in s_first_meta_bg validation ext4: trim allocation requests to group size ext4: preserve the needs_recovery flag when the journal is aborted ext4: return EROFS if device is r/o and journal replay is needed ext4: fix inode checksum calculation problem if i_extra_size is small block: fix use-after-free in sys_ioprio_get() block: allow WRITE_SAME commands with the SG_IO ioctl block: fix del_gendisk() vs blkdev_ioctl crash dm crypt: mark key as invalid until properly loaded dm space map metadata: fix 'struct sm_metadata' leak on failed create md/raid5: limit request size according to implementation limits md:raid1: fix a dead loop when read from a WriteMostly disk md linear: fix a race between linear_add() and linear_congested() CIFS: Fix a possible memory corruption during reconnect CIFS: Fix missing nls unload in smb2_reconnect() CIFS: Fix a possible memory corruption in push locks CIFS: remove bad_network_name flag fs/cifs: make share unaccessible at root level mountable cifs: Do not send echoes before Negotiate is complete ocfs2: fix crash caused by stale lvb with fsdlm plugin ocfs2: fix BUG_ON() in ocfs2_ci_checkpointed() can: raw: raw_setsockopt: limit number of can_filter that can be set can: peak: fix bad memory access and free sequence can: c_can_pci: fix null-pointer-deref in c_can_start() - set device pointer can: ti_hecc: add missing prepare and unprepare of the clock can: bcm: fix hrtimer/tasklet termination in bcm op removal can: usb_8dev: Fix memory leak of priv->cmd_msg_buffer ALSA: hda - Fix up GPIO for ASUS ROG Ranger ALSA: seq: Fix race at creating a queue ALSA: seq: Don't handle loop timeout at snd_seq_pool_done() ALSA: timer: Reject user params with too small ticks ALSA: seq: Fix link corruption by event error handling ALSA: seq: Fix racy cell insertions during snd_seq_pool_done() ALSA: seq: Fix race during FIFO resize 
ALSA: seq: Don't break snd_use_lock_sync() loop by timeout ALSA: usb-audio: Add QuickCam Communicate Deluxe/S7500 to volume_control_quirks usb: gadgetfs: restrict upper bound on device configuration size USB: gadgetfs: fix unbounded memory allocation bug USB: gadgetfs: fix use-after-free bug USB: gadgetfs: fix checks of wTotalLength in config descriptors xhci: free xhci virtual devices with leaf nodes first USB: serial: io_ti: bind to interface after fw download usb: gadget: composite: always set ep->mult to a sensible value USB: cdc-acm: fix double usb_autopm_put_interface() in acm_port_activate() USB: cdc-acm: fix open and suspend race USB: cdc-acm: fix failed open not being detected usb: dwc3: gadget: make Set Endpoint Configuration macros safe usb: host: xhci-plat: Fix timeout on removal of hot pluggable xhci controllers usb: dwc3: gadget: delay unmap of bounced requests usb: hub: Wait for connection to be reestablished after port reset usb: gadget: composite: correctly initialize ep->maxpacket USB: UHCI: report non-PME wakeup signalling for Intel hardware arm/xen: Use alloc_percpu rather than __alloc_percpu xfs: set AGI buffer type in xlog_recover_clear_agi_bucket xfs: clear _XBF_PAGES from buffers when readahead page ssb: Fix error routine when fallback SPROM fails drivers/gpu/drm/ast: Fix infinite loop if read fails scsi: avoid a permanent stop of the scsi device's request queue scsi: move the nr_phys_segments assert into scsi_init_io scsi: don't BUG_ON() empty DMA transfers scsi: storvsc: properly handle SRB_ERROR when sense message is present scsi: storvsc: properly set residual data length on errors target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER export scsi: lpfc: Add shutdown method for kexec scsi: sr: Sanity check returned mode data scsi: sd: Fix capacity calculation with 32-bit sector_t s390/vmlogrdr: fix IUCV buffer allocation libceph: verify authorize reply on connect nfs_write_end(): fix handling of short copies powerpc/ps3: Fix system hang with 
GCC 5 builds sg_write()/bsg_write() is not fit to be called under KERNEL_DS ftrace/x86: Set ftrace_stub to weak to prevent gcc from using short jumps to it cred/userns: define current_user_ns() as a function net: ti: cpmac: Fix compiler warning due to type confusion tick/broadcast: Prevent NULL pointer dereference netvsc: reduce maximum GSO size drop_monitor: add missing call to genlmsg_end drop_monitor: consider inserted data in genlmsg_end igmp: Make igmp group member RFC 3376 compliant HID: hid-cypress: validate length of report Input: xpad - use correct product id for x360w controllers Input: i8042 - add noloop quirk for Dell Embedded Box PC 3000 Input: iforce - validate number of endpoints before using them Input: kbtab - validate number of endpoints before using them Input: joydev - do not report stale values on first open Input: tca8418 - use the interrupt trigger from the device tree Input: mpr121 - handle multiple bits change of status register Input: mpr121 - set missing event capability Input: i8042 - add Clevo P650RS to the i8042 reset list i2c: fix kernel memory disclosure in dev interface vme: Fix wrong pointer utilization in ca91cx42_slave_get sysrq: attach sysrq handler correctly for 32-bit kernel pinctrl: sh-pfc: Do not unconditionally support PIN_CONFIG_BIAS_DISABLE x86/PCI: Ignore _CRS on Supermicro X8DTH-i/6/iF/6F qla2xxx: Fix crash due to null pointer access ARM: 8634/1: hw_breakpoint: blacklist Scorpion CPUs ARM: dts: da850-evm: fix read access to SPI flash NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT vmxnet3: Wake queue from reset work Fix memory leaks in cifs_do_mount() Compare prepaths when comparing superblocks Move check for prefix path to within cifs_get_root() Fix regression which breaks DFS mounting apparmor: fix uninitialized lsm_audit member apparmor: exec should not be returning ENOENT when it denies apparmor: fix disconnected bind mnts reconnection apparmor: internal paths should be treated as disconnected 
apparmor: check that xindex is in trans_table bounds apparmor: add missing id bounds check on dfa verification apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: fix module parameters can be changed after policy is locked apparmor: do not expose kernel stack vfio/pci: Fix integer overflows, bitmask check bna: Add synchronization for tx ring. sg: Fix double-free when drives detach during SG_IO move the call of __d_drop(anon) into __d_materialise_unique(dentry, anon) serial: 8250_pci: Detach low-level driver during PCI error recovery bnx2x: Correct ringparam estimate when DOWN tile/ptrace: Preserve previous registers for short regset write sysctl: fix proc_doulongvec_ms_jiffies_minmax() ISDN: eicon: silence misleading array-bounds warning ARC: [arcompact] handle unaligned access delay slot corner case parisc: Don't use BITS_PER_LONG in userspace-exported swab.h header nfs: Don't increment lock sequence ID after NFS4ERR_MOVED ipv6: addrconf: Avoid addrconf_disable_change() using RCU read-side lock af_unix: move unix_mknod() out of bindlock drm/nouveau/nv1a,nv1f/disp: fix memory clock rate retrieval crypto: api - Clear CRYPTO_ALG_DEAD bit before registering an alg ata: sata_mv:- Handle return value of devm_ioremap. mm/memory_hotplug.c: check start_pfn in test_pages_in_a_zone() mm, fs: check for fatal signals in do_generic_file_read() ARC: [arcompact] brown paper bag bug in unaligned access delay slot fixup sched/debug: Don't dump sched debug info in SysRq-W tcp: fix 0 divide in __tcp_select_window() macvtap: read vnet_hdr_size once packet: round up linear to header len vfs: fix uninitialized flags in splice_to_pipe() siano: make it work again with CONFIG_VMAP_STACK futex: Move futex_init() to core_initcall rtc: interface: ignore expired timers when enqueuing new timers irda: Fix lockdep annotations in hashbin_delete(). 
tty: serial: msm: Fix module autoload rtlwifi: rtl_usb: Fix for URB leaking when doing ifconfig up/down af_packet: remove a stray tab in packet_set_ring() MIPS: Fix special case in 64 bit IP checksumming. mm: vmpressure: fix sending wrong events on underflow ipc/shm: Fix shmat mmap nil-page protection sd: get disk reference in sd_check_events() samples/seccomp: fix 64-bit comparison macros ath5k: drop bogus warning on drv_set_key with unsupported cipher rdma_cm: fail iwarp accepts w/o connection params NFSv4: fix getacl ERANGE for some ACL buffer sizes bcma: use (get|put)_device when probing/removing device driver powerpc/xmon: Fix data-breakpoint KVM: VMX: use correct vmcs_read/write for guest segment selector/base KVM: PPC: Book3S PR: Fix illegal opcode emulation KVM: s390: fix task size check s390: TASK_SIZE for kernel threads xtensa: move parse_tag_fdt out of #ifdef CONFIG_BLK_DEV_INITRD mac80211: flush delayed work when entering suspend drm/ast: Fix test for VGA enabled drm/ttm: Make sure BOs being swapped out are cacheable fat: fix using uninitialized fields of fat_inode/fsinfo_inode drivers: hv: Turn off write permission on the hypercall page xhci: fix 10 second timeout on removal of PCI hotpluggable xhci controllers crypto: improve gcc optimization flags for serpent and wp512 mtd: pmcmsp: use kstrndup instead of kmalloc+strncpy cpmac: remove hopeless #warning mvsas: fix misleading indentation l2tp: avoid use-after-free caused by l2tp_ip_backlog_recv net: don't call strlen() on the user buffer in packet_bind_spkt() dccp: Unlock sock before calling sk_free() tcp: fix various issues for sockets morphing to listen state uapi: fix linux/packet_diag.h userspace compilation error ipv6: avoid write to a possibly cloned skb dccp: fix memory leak during tear-down of unsuccessful connection request futex: Fix potential use-after-free in FUTEX_REQUEUE_PI futex: Add missing error handling to FUTEX_REQUEUE_PI give up on gcc ilog2() constant optimizations cancel the 
setfilesize transation when io error happen crypto: ghash-clmulni - Fix load failure crypto: cryptd - Assign statesize properly ACPI / video: skip evaluating _DOD when it does not exist Drivers: hv: balloon: don't crash when memory is added in non-sorted order s390/pci: fix use after free in dma_init cpufreq: Fix and clean up show_cpuinfo_cur_freq() igb: Workaround for igb i210 firmware issue igb: add i211 to i210 PHY workaround ipv4: provide stronger user input validation in nl_fib_input() tcp: initialize icsk_ack.lrcvtime at session start time ACM gadget: fix endianness in notifications mmc: sdhci: Do not disable interrupts while waiting for clock uvcvideo: uvc_scan_fallback() for webcams with broken chain fbcon: Fix vc attr at deinit crypto: algif_hash - avoid zero-sized array virtio_balloon: init 1st buffer in stats vq c6x/ptrace: Remove useless PTRACE_SETREGSET implementation sparc/ptrace: Preserve previous registers for short regset write metag/ptrace: Preserve previous registers for short regset write metag/ptrace: Provide default TXSTATUS for short NT_PRSTATUS metag/ptrace: Reject partial NT_METAG_RPIPE writes libceph: force GFP_NOIO for socket allocations ACPI: Fix incompatibility with mcount-based function graph tracing ACPI / power: Avoid maybe-uninitialized warning rtc: s35390a: make sure all members in the output are set rtc: s35390a: implement reset routine as suggested by the reference rtc: s35390a: improve irq handling padata: avoid race in reordering HID: hid-lg: Fix immediate disconnection of Logitech Rumblepad 2 HID: i2c-hid: Add sleep between POWER ON and RESET drm/vmwgfx: NULL pointer dereference in vmw_surface_define_ioctl() drm/vmwgfx: avoid calling vzalloc with a 0 size in vmw_get_cap_3d_ioctl() drm/vmwgfx: Remove getparam error message drm/vmwgfx: fix integer overflow in vmw_surface_define_ioctl() Reset TreeId to zero on SMB2 TREE_CONNECT metag/usercopy: Drop unused macros metag/usercopy: Zero rest of buffer from copy_from_user powerpc: 
Don't try to fix up misaligned load-with-reservation instructions mm/mempolicy.c: fix error handling in set_mempolicy and mbind. mtd: bcm47xxpart: fix parsing first block after aligned TRX net/packet: fix overflow in check for priv area size x86/vdso: Plug race between mapping and ELF header setup iscsi-target: Fix TMR reference leak during session shutdown iscsi-target: Drop work-around for legacy GlobalSAN initiator xen, fbfront: fix connecting to backend char: lack of bool string made CONFIG_DEVPORT always on platform/x86: acer-wmi: setup accelerometer when machine has appropriate notify event platform/x86: acer-wmi: setup accelerometer when ACPI device was found mm: Tighten x86 /dev/mem with zeroing reads virtio-console: avoid DMA from stack catc: Combine failure cleanup code in catc_probe() catc: Use heap buffer for memory size test net: ipv6: check route protocol when deleting routes Drivers: hv: don't leak memory in vmbus_establish_gpadl() Drivers: hv: get rid of timeout in vmbus_open() ubi/upd: Always flush after prepared for an update x86/mce/AMD: Give a name to MCA bank 3 when accessed with legacy MSRs powerpc: Reject binutils 2.24 when building little endian net/packet: fix overflow in check for tp_frame_nr net/packet: fix overflow in check for tp_reserve tty: nozomi: avoid a harmless gcc warning hostap: avoid uninitialized variable use in hfa384x_get_rid gfs2: avoid uninitialized variable warning net: neigh: guard against NULL solicit() method sctp: listen on the sock only when it's state is listening or closed ip6mr: fix notification device destruction MIPS: Fix crash registers on non-crashing CPUs RDS: Fix the atomicity for congestion map update xen/x86: don't lose event interrupts p9_client_readdir() fix nfsd: check for oversized NFSv2/v3 arguments ftrace/x86: Fix triple fault with graph tracing and suspend-to-ram kvm: nVMX: Allow L1 to intercept software exceptions (#BP and #OF) tun: read vnet_hdr_sz once printk: use rcuidle console tracepoint ipv6: 
check raw payload size correctly in ioctl x86: standardize mmap_rnd() usage x86/mm/32: Enable full randomization on i386 and X86_32 mm: larger stack guard gap, between vmas mm: fix new crash in unmapped_area_topdown() Allow stack to grow up to address space limit Linux 3.10.107 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Conflicts: arch/x86/mm/mmap.c drivers/mmc/host/sdhci.c drivers/usb/host/xhci-plat.c fs/ext4/super.c kernel/sched/core.c
| * | printk: use rcuidle console tracepoint (Sergey Senozhatsky, 2017-06-20, 1 file, -1/+1)
| | | commit fc98c3c8c9dcafd67adcce69e6ce3191d5306c9c upstream.
| | |
| | | Use rcuidle console tracepoint because, apparently, it may be issued from an idle CPU:
| | |
| | |   hw-breakpoint: Failed to enable monitor mode on CPU 0.
| | |   hw-breakpoint: CPU 0 failed to disable vector catch
| | |
| | |   ===============================
| | |   [ ERR: suspicious RCU usage. ]
| | |   4.10.0-rc8-next-20170215+ #119 Not tainted
| | |   -------------------------------
| | |   ./include/trace/events/printk.h:32 suspicious rcu_dereference_check() usage!
| | |
| | |   other info that might help us debug this:
| | |
| | |   RCU used illegally from idle CPU!
| | |   rcu_scheduler_active = 2, debug_locks = 0
| | |   RCU used illegally from extended quiescent state!
| | |   2 locks held by swapper/0/0:
| | |    #0: (cpu_pm_notifier_lock){......}, at: [<c0237e2c>] cpu_pm_exit+0x10/0x54
| | |    #1: (console_lock){+.+.+.}, at: [<c01ab350>] vprintk_emit+0x264/0x474
| | |
| | |   stack backtrace:
| | |   CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-20170215+ #119
| | |   Hardware name: Generic OMAP4 (Flattened Device Tree)
| | |     console_unlock
| | |     vprintk_emit
| | |     vprintk_default
| | |     printk
| | |     reset_ctrl_regs
| | |     dbg_cpu_pm_notify
| | |     notifier_call_chain
| | |     cpu_pm_exit
| | |     omap_enter_idle_coupled
| | |     cpuidle_enter_state
| | |     cpuidle_enter_state_coupled
| | |     do_idle
| | |     cpu_startup_entry
| | |     start_kernel
| | |
| | | This RCU warning, however, is suppressed by lockdep_off() in printk(). lockdep_off() increments the ->lockdep_recursion counter and thus disables RCU_LOCKDEP_WARN() and debug_lockdep_rcu_enabled(), which want lockdep to be enabled "current->lockdep_recursion == 0".
| | |
| | | Link: http://lkml.kernel.org/r/20170217015932.11898-1-sergey.senozhatsky@gmail.com
| | | Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
| | | Reported-by: Tony Lindgren <tony@atomide.com>
| | | Tested-by: Tony Lindgren <tony@atomide.com>
| | | Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | | Cc: Petr Mladek <pmladek@suse.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: Tony Lindgren <tony@atomide.com>
| | | Cc: Russell King <rmk@armlinux.org.uk>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | [wt: changes are in kernel/printk.c in 3.10]
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | padata: avoid race in reordering (Jason A. Donenfeld, 2017-06-20, 1 file, -2/+3)
| | | commit de5540d088fe97ad583cc7d396586437b32149a5 upstream.
| | |
| | | Under extremely heavy uses of padata, crashes occur, and with list debugging turned on, this happens instead:
| | |
| | |   [87487.298728] WARNING: CPU: 1 PID: 882 at lib/list_debug.c:33 __list_add+0xae/0x130
| | |   [87487.301868] list_add corruption. prev->next should be next (ffffb17abfc043d0), but was ffff8dba70872c80. (prev=ffff8dba70872b00).
| | |   [87487.339011] [<ffffffff9a53d075>] dump_stack+0x68/0xa3
| | |   [87487.342198] [<ffffffff99e119a1>] ? console_unlock+0x281/0x6d0
| | |   [87487.345364] [<ffffffff99d6b91f>] __warn+0xff/0x140
| | |   [87487.348513] [<ffffffff99d6b9aa>] warn_slowpath_fmt+0x4a/0x50
| | |   [87487.351659] [<ffffffff9a58b5de>] __list_add+0xae/0x130
| | |   [87487.354772] [<ffffffff9add5094>] ? _raw_spin_lock+0x64/0x70
| | |   [87487.357915] [<ffffffff99eefd66>] padata_reorder+0x1e6/0x420
| | |   [87487.361084] [<ffffffff99ef0055>] padata_do_serial+0xa5/0x120
| | |
| | | padata_reorder calls list_add_tail with the list to which it's adding locked, which seems correct:
| | |
| | |   spin_lock(&squeue->serial.lock);
| | |   list_add_tail(&padata->list, &squeue->serial.list);
| | |   spin_unlock(&squeue->serial.lock);
| | |
| | | This therefore leaves only one place where such inconsistency could occur: if padata->list is added at the same time on two different threads. This padata pointer comes from the function call to padata_get_next(pd), which has in it the following block:
| | |
| | |   next_queue = per_cpu_ptr(pd->pqueue, cpu);
| | |   padata = NULL;
| | |   reorder = &next_queue->reorder;
| | |   if (!list_empty(&reorder->list)) {
| | |           padata = list_entry(reorder->list.next,
| | |                               struct padata_priv, list);
| | |           spin_lock(&reorder->lock);
| | |           list_del_init(&padata->list);
| | |           atomic_dec(&pd->reorder_objects);
| | |           spin_unlock(&reorder->lock);
| | |
| | |           pd->processed++;
| | |
| | |           goto out;
| | |   }
| | |   out:
| | |   return padata;
| | |
| | | I strongly suspect that the problem here is that two threads can race on the reorder list. Even though the deletion is locked, the call to list_entry is not locked, which means it's feasible that two threads pick up the same padata object and subsequently call list_add_tail on them at the same time. The fix is thus to hoist that lock outside of that block.
| | |
| | | Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
| | | Acked-by: Steffen Klassert <steffen.klassert@secunet.com>
| | | Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
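The locking change this message argues for can be illustrated outside the kernel. The following is a minimal user-space sketch, assuming a pthread mutex in place of the spinlock and a hand-rolled singly linked list in place of list_head; `struct item`, `struct reorder_queue` and `get_next_locked` are hypothetical names, not the padata API. The point it shows is that the emptiness check, the read of the head, and the removal all sit inside one critical section, so two callers cannot obtain the same element.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued padata object. */
struct item {
    struct item *next;
    int seq;
};

/* Hypothetical stand-in for the per-CPU reorder queue. */
struct reorder_queue {
    pthread_mutex_t lock;
    struct item *head;
};

/*
 * Take the first item off the queue, or return NULL if it is empty.
 * The head is read only after the lock is held; reading it before
 * locking is the pattern the commit message identifies as racy.
 */
static struct item *get_next_locked(struct reorder_queue *q)
{
    struct item *it;

    pthread_mutex_lock(&q->lock);
    it = q->head;
    if (it != NULL) {
        q->head = it->next;   /* unlink while still holding the lock */
        it->next = NULL;
    }
    pthread_mutex_unlock(&q->lock);

    return it;
}
```

A caller would initialise the lock with `PTHREAD_MUTEX_INITIALIZER` and call `get_next_locked()` from any number of threads; each queued item is handed out at most once.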
| * | futex: Add missing error handling to FUTEX_REQUEUE_PI (Peter Zijlstra, 2017-06-20, 1 file, -0/+2)
| | | commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream.
| | |
| | | Thomas spotted that fixup_pi_state_owner() can return errors and we fail to unlock the rt_mutex in that case.
| | |
| | | Reported-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Darren Hart <dvhart@linux.intel.com>
| | | Cc: juri.lelli@arm.com
| | | Cc: bigeasy@linutronix.de
| | | Cc: xlpang@redhat.com
| | | Cc: rostedt@goodmis.org
| | | Cc: mathieu.desnoyers@efficios.com
| | | Cc: jdesfossez@efficios.com
| | | Cc: dvhart@infradead.org
| | | Cc: bristot@redhat.com
| | | Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | futex: Fix potential use-after-free in FUTEX_REQUEUE_PI (Peter Zijlstra, 2017-06-20, 1 file, -9/+11)
| | | commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream.
| | |
| | | While working on the futex code, I stumbled over this potential use-after-free scenario. Dmitry triggered it later with syzkaller.
| | |
| | | pi_mutex is a pointer into pi_state, which we drop the reference on in unqueue_me_pi(). So any access to that pointer after that is bad.
| | |
| | | Since other sites already do rt_mutex_unlock() with hb->lock held, see for example futex_lock_pi(), simply move the unlock before unqueue_me_pi().
| | |
| | | Reported-by: Dmitry Vyukov <dvyukov@google.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Reviewed-by: Darren Hart <dvhart@linux.intel.com>
| | | Cc: juri.lelli@arm.com
| | | Cc: bigeasy@linutronix.de
| | | Cc: xlpang@redhat.com
| | | Cc: rostedt@goodmis.org
| | | Cc: mathieu.desnoyers@efficios.com
| | | Cc: jdesfossez@efficios.com
| | | Cc: dvhart@infradead.org
| | | Cc: bristot@redhat.com
| | | Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | futex: Move futex_init() to core_initcall (Yang Yang, 2017-06-20, 1 file, -1/+1)
| | | commit 25f71d1c3e98ef0e52371746220d66458eac75bc upstream.
| | |
| | | The UEVENT user mode helper is enabled before the initcalls are executed and is available when the root filesystem has been mounted.
| | |
| | | The user mode helper is triggered by device init calls and the executable might use the futex syscall.
| | |
| | | futex_init() is marked __initcall which maps to device_initcall, but there is no guarantee that futex_init() is invoked _before_ the first device init call which triggers the UEVENT user mode helper.
| | |
| | | If the user mode helper uses the futex syscall before futex_init() then the syscall crashes with a NULL pointer dereference because the futex subsystem has not been initialized yet.
| | |
| | | Move futex_init() to core_initcall so futexes are initialized before the root filesystem is mounted and the usermode helper becomes available.
| | |
| | | [ tglx: Rewrote changelog ]
| | |
| | | Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
| | | Cc: jiang.biao2@zte.com.cn
| | | Cc: jiang.zhengxiong@zte.com.cn
| | | Cc: zhong.weidong@zte.com.cn
| | | Cc: deng.huali@zte.com.cn
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Link: http://lkml.kernel.org/r/1483085875-6130-1-git-send-email-yang.yang29@zte.com.cn
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
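The ordering argument in this message can be sketched as a small user-space model. It is an illustration only: `run_initcalls`, `sketch_futex_init` and `sketch_device_probe` are hypothetical names, and the model assumes nothing beyond what the message states, namely that calls at the core level run before calls at the device level, whereas calls sharing a level have no guaranteed relative order.

```c
#include <stdbool.h>
#include <stddef.h>

/* Level numbers chosen for the sketch; lower levels run first. */
enum { LEVEL_CORE = 1, LEVEL_DEVICE = 6, LEVEL_MAX = 7 };

struct initcall {
    int level;
    void (*fn)(void);
};

static bool futex_ready;            /* set once the futex stand-in has run */
static bool helper_saw_futex_ready; /* what the device-level call observed */

/* Stand-in for futex_init(): marks the subsystem as usable. */
static void sketch_futex_init(void)
{
    futex_ready = true;
}

/* Stand-in for a device init call that triggers a futex-using helper. */
static void sketch_device_probe(void)
{
    helper_saw_futex_ready = futex_ready;
}

/* Run every registered call level by level, regardless of table order. */
static void run_initcalls(const struct initcall *calls, size_t n)
{
    for (int level = 0; level <= LEVEL_MAX; level++)
        for (size_t i = 0; i < n; i++)
            if (calls[i].level == level)
                calls[i].fn();
}
```

Registering `sketch_futex_init` at `LEVEL_CORE` means the device-level probe always observes an initialised subsystem, even when the probe appears earlier in the table; registering both at `LEVEL_DEVICE` would leave the outcome dependent on table order.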
| * | sched/debug: Don't dump sched debug info in SysRq-W (Rabin Vincent, 2017-06-20, 1 file, -1/+2)
| | | commit fb90a6e93c0684ab2629a42462400603aa829b9c upstream.
| | |
| | | sysrq_sched_debug_show() can dump a lot of information. Don't print out all that if we're just trying to get a list of blocked tasks (SysRq-W). The information is still accessible with SysRq-T.
| | |
| | | Signed-off-by: Rabin Vincent <rabinv@axis.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Steven Rostedt <rostedt@goodmis.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Link: http://lkml.kernel.org/r/1459777322-30902-1-git-send-email-rabin.vincent@axis.com
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Cc: Nikolay Borisov <n.borisov.lkml@gmail.com>
| | | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | sysctl: fix proc_doulongvec_ms_jiffies_minmax() (Eric Dumazet, 2017-06-20, 1 file, -0/+1)
| | | commit ff9f8a7cf935468a94d9927c68b00daae701667e upstream.
| | |
| | | We perform the conversion between kernel jiffies and ms only when exporting the kernel value to user space.
| | |
| | | We need to do the opposite operation when a value is written by the user.
| | |
| | | Only matters when HZ != 1000.
| | |
| | | Signed-off-by: Eric Dumazet <edumazet@google.com>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
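The symmetry this message asks for can be shown with a short arithmetic sketch. The tick rate `SKETCH_HZ` and both helper names are assumptions made for illustration and do not appear in the kernel source; the sketch only demonstrates that a value exported as milliseconds has to be converted back to jiffies on the write path, and that skipping this step is harmless only when HZ equals 1000.

```c
/* Assumed tick rate for the sketch; any value other than 1000 shows the issue. */
#define SKETCH_HZ 250UL

/* Read path: kernel value in jiffies, shown to user space in milliseconds. */
static unsigned long jiffies_to_ms_out(unsigned long jiffies)
{
    return jiffies * 1000UL / SKETCH_HZ;
}

/* Write path: user value in milliseconds, stored by the kernel in jiffies. */
static unsigned long ms_to_jiffies_in(unsigned long ms)
{
    return ms * SKETCH_HZ / 1000UL;
}
```

With a tick rate of 250, a stored value of 75 jiffies reads back as 300 ms; writing 300 without the inverse conversion would store 300 jiffies, four times the intended interval.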
| * | tick/broadcast: Prevent NULL pointer dereference (Thomas Gleixner, 2017-06-20, 1 file, -0/+3)
| | | commit c1a9eeb938b5433947e5ea22f89baff3182e7075 upstream.
| | |
| | | When a dysfunctional timer, e.g. dummy timer, is installed, the tick core tries to set up the broadcast timer.
| | |
| | | If no broadcast device is installed, the kernel crashes with a NULL pointer dereference in tick_broadcast_setup_oneshot() because the function has no sanity check.
| | |
| | | Reported-by: Mason <slash.tmp@free.fr>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: Mark Rutland <mark.rutland@arm.com>
| | | Cc: Anna-Maria Gleixner <anna-maria@linutronix.de>
| | | Cc: Richard Cochran <rcochran@linutronix.de>
| | | Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
| | | Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Sebastian Frias <sf84@laposte.net>
| | | Cc: Thibaud Cornic <thibaud_cornic@sigmadesigns.com>
| | | Cc: Robin Murphy <robin.murphy@arm.com>
| | | Link: http://lkml.kernel.org/r/1147ef90-7877-e4d2-bb2b-5c4fa8d3144b@free.fr
| | | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
* | | Merge 3.10.106 into android-msm-angler-3.10-oreo-m5 (Nathan Chancellor, 2018-01-24, 7 files, -23/+105)
|\| | Changes in 3.10.106: (252 commits) packet: fix race condition in packet_set_ring crypto: crypto_memneq - add equality testing of memory regions w/o timing leaks EVM: Use crypto_memneq() for digest comparisons libceph: don't set weight to IN when OSD is destroyed KVM: x86: fix emulation of "MOV SS, null selector" KVM: x86: Introduce segmented_write_std posix_acl: Clear SGID bit when setting file permissions 
tmpfs: clear S_ISGID when setting posix ACLs fbdev: color map copying bounds checking selinux: fix off-by-one in setprocattr tcp: avoid infinite loop in tcp_splice_read() xfrm_user: validate XFRM_MSG_NEWAE XFRMA_REPLAY_ESN_VAL replay_window xfrm_user: validate XFRM_MSG_NEWAE incoming ESN size harder KEYS: Disallow keyrings beginning with '.' to be joined as session keyrings KEYS: Change the name of the dead type to ".dead" to prevent user access KEYS: fix keyctl_set_reqkey_keyring() to not leak thread keyrings ext4: fix data exposure after a crash locking/rtmutex: Prevent dequeue vs. unlock race m68k: Fix ndelay() macro hotplug: Make register and unregister notifier API symmetric Btrfs: fix tree search logic when replaying directory entry deletes USB: serial: kl5kusb105: fix open error path block_dev: don't test bdev->bd_contains when it is not stable crypto: caam - fix AEAD givenc descriptors ext4: fix mballoc breakage with 64k block size ext4: fix stack memory corruption with 64k block size ext4: reject inodes with negative size ext4: return -ENOMEM instead of success f2fs: set ->owner for debugfs status file's file_operations block: protect iterate_bdevs() against concurrent close scsi: zfcp: fix use-after-"free" in FC ingress path after TMF scsi: zfcp: do not trace pure benign residual HBA responses at default level scsi: zfcp: fix rport unblock race with LUN recovery ftrace/x86_32: Set ftrace_stub to weak to prevent gcc from using short jumps to it IB/mad: Fix an array index check IB/multicast: Check ib_find_pkey() return value powerpc: Convert cmp to cmpd in idle enter sequence usb: gadget: composite: Test get_alt() presence instead of set_alt() USB: serial: omninet: fix NULL-derefs at open and disconnect USB: serial: quatech2: fix sleep-while-atomic in close USB: serial: pl2303: fix NULL-deref at open USB: serial: keyspan_pda: verify endpoints at probe USB: serial: spcp8x5: fix NULL-deref at open USB: serial: io_ti: fix NULL-deref at open USB: serial: io_ti: 
fix another NULL-deref at open USB: serial: iuu_phoenix: fix NULL-deref at open USB: serial: garmin_gps: fix memory leak on failed URB submit USB: serial: ti_usb_3410_5052: fix NULL-deref at open USB: serial: io_edgeport: fix NULL-deref at open USB: serial: oti6858: fix NULL-deref at open USB: serial: cyberjack: fix NULL-deref at open USB: serial: kobil_sct: fix NULL-deref in write USB: serial: mos7840: fix NULL-deref at open USB: serial: mos7720: fix NULL-deref at open USB: serial: mos7720: fix use-after-free on probe errors USB: serial: mos7720: fix parport use-after-free on probe errors USB: serial: mos7720: fix parallel probe usb: xhci-mem: use passed in GFP flags instead of GFP_KERNEL usb: musb: Fix trying to free already-free IRQ 4 ALSA: usb-audio: Fix bogus error return in snd_usb_create_stream() USB: serial: kl5kusb105: abort on open exception path staging: iio: ad7606: fix improper setting of oversampling pins usb: dwc3: gadget: always unmap EP0 requests cris: Only build flash rescue image if CONFIG_ETRAX_AXISFLASHMAP is selected hwmon: (ds620) Fix overflows seen when writing temperature limits clk: clk-wm831x: fix a logic error iommu/amd: Fix the left value check of cmd buffer scsi: mvsas: fix command_active typo target/iscsi: Fix double free in lio_target_tiqn_addtpg() mmc: mmc_test: Uninitialized return value powerpc/pci/rpadlpar: Fix device reference leaks ser_gigaset: return -ENOMEM on error instead of success net, sched: fix soft lockup in tc_classify net: stmmac: Fix race between stmmac_drv_probe and stmmac_open gro: Enter slow-path if there is no tailroom gro: use min_t() in skb_gro_reset_offset() gro: Disable frag0 optimization on IPv6 ext headers powerpc: Fix build warning on 32-bit PPC Input: i8042 - add Pegatron touchpad to noloop table mm/hugetlb.c: fix reservation race when freeing surplus pages USB: serial: kl5kusb105: fix line-state error handling USB: serial: ch341: fix initial modem-control state USB: serial: ch341: fix open error 
handling USB: serial: ch341: fix control-message error handling USB: serial: ch341: fix open and resume after B0 USB: serial: ch341: fix resume after reset USB: serial: ch341: fix modem-control and B0 handling x86/cpu: Fix bootup crashes by sanitizing the argument of the 'clearcpuid=' command-line option NFSv4.1: nfs4_fl_prepare_ds must be careful about reporting success. powerpc/ibmebus: Fix further device reference leaks powerpc/ibmebus: Fix device reference leaks in sysfs interface IB/mlx4: Set traffic class in AH IB/mlx4: Fix port query for 56Gb Ethernet links perf scripting: Avoid leaking the scripting_context variable ARM: dts: imx31: fix clock control module interrupts description svcrpc: don't leak contexts on PROC_DESTROY mmc: mxs-mmc: Fix additional cycles after transmission stop mtd: nand: xway: disable module support ubifs: Fix journal replay wrt. xattr nodes arm64/ptrace: Preserve previous registers for short regset write arm64/ptrace: Avoid uninitialised struct padding in fpr_set() arm64/ptrace: Reject attempts to set incomplete hardware breakpoint fields ARM: ux500: fix prcmu_is_cpu_in_wfi() calculation ite-cir: initialize use_demodulator before using it fuse: do not use iocb after it may have been freed crypto: caam - fix non-hmac hashes drm/i915: Don't leak edid in intel_crt_detect_ddc() s5k4ecgx: select CRC32 helper platform/x86: intel_mid_powerbtn: Set IRQ_ONESHOT net: fix harmonize_features() vs NETIF_F_HIGHDMA tcp: initialize max window for a new fastopen socket svcrpc: fix oops in absence of krb5 module ARM: 8643/3: arm/ptrace: Preserve previous registers for short regset write mac80211: Fix adding of mesh vendor IEs scsi: zfcp: fix use-after-free by not tracing WKA port open/close on failed send drm/i915: fix use-after-free in page_flip_completed() net: use a work queue to defer net_disable_timestamp() work ipv4: keep skb->dst around in presence of IP options netlabel: out of bound access in cipso_v4_validate() ip6_gre: fix ip6gre_err() 
invalid reads ping: fix a null pointer dereference l2tp: do not use udp_ioctl() packet: fix races in fanout_add() packet: Do not call fanout_release from atomic contexts net: socket: fix recvmmsg not returning error from sock_error USB: serial: mos7840: fix another NULL-deref at open USB: serial: ftdi_sio: fix modem-status error handling USB: serial: ftdi_sio: fix extreme low-latency setting USB: serial: ftdi_sio: fix line-status over-reporting USB: serial: spcp8x5: fix modem-status handling USB: serial: opticon: fix CTS retrieval at open USB: serial: ark3116: fix register-accessor error handling x86/platform/goldfish: Prevent unconditional loading goldfish: Sanitize the broken interrupt handler ocfs2: do not write error flag to user structure we cannot copy from/to mfd: pm8921: Potential NULL dereference in pm8921_remove() drm/nv50/disp: min/max are reversed in nv50_crtc_gamma_set() net: 6lowpan: fix lowpan_header_create non-compression memcpy call vti4: Don't count header length twice. 
net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames MIPS: OCTEON: Fix copy_from_user fault handling for large buffers MIPS: Clear ISA bit correctly in get_frame_info() MIPS: Prevent unaligned accesses during stack unwinding MIPS: Fix get_frame_info() handling of microMIPS function size MIPS: Fix is_jump_ins() handling of 16b microMIPS instructions MIPS: Calculate microMIPS ra properly when unwinding the stack MIPS: Handle microMIPS jumps in the same way as MIPS32/MIPS64 jumps uvcvideo: Fix a wrong macro scsi: aacraid: Reorder Adapter status check ath9k: use correct OTP register offsets for the AR9340 and AR9550 fuse: add missing FR_FORCE RDMA/core: Fix incorrect structure packing for booleans NFSv4: fix getacl head length estimation s390/qdio: clear DSCI prior to scanning multiple input queues IB/ipoib: Fix deadlock between rmmod and set_mode ktest: Fix child exit code processing nlm: Ensure callback code also checks that the files match dm: flush queued bios when process blocks to avoid deadlock USB: serial: digi_acceleport: fix OOB data sanity check USB: serial: digi_acceleport: fix OOB-event processing MIPS: ip27: Disable qlge driver in defconfig tracing: Add #undef to fix compile error USB: serial: safe_serial: fix information leak in completion handler USB: serial: omninet: fix reference leaks at open USB: iowarrior: fix NULL-deref at probe USB: iowarrior: fix NULL-deref in write USB: serial: io_ti: fix NULL-deref in interrupt callback USB: serial: io_ti: fix information leak in completion handler vxlan: correctly validate VXLAN ID against VXLAN_N_VID ipv4: mask tos for input route locking/static_keys: Add static_key_{en,dis}able() helpers net: net_enable_timestamp() can be called from irq contexts dccp/tcp: fix routing redirect race net sched actions: decrement module reference count after table flush. 
perf/core: Fix event inheritance on fork() isdn/gigaset: fix NULL-deref at probe xen: do not re-use pirq number cached in pci device msi msg data net: properly release sk_frag.page net: unix: properly re-increment inflight counter of GC discarded candidates Input: ims-pcu - validate number of endpoints before using them Input: hanwang - validate number of endpoints before using them Input: yealink - validate number of endpoints before using them Input: cm109 - validate number of endpoints before using them USB: uss720: fix NULL-deref at probe USB: idmouse: fix NULL-deref at probe USB: wusbcore: fix NULL-deref at probe uwb: i1480-dfu: fix NULL-deref at probe uwb: hwa-rc: fix NULL-deref at probe mmc: ushc: fix NULL-deref at probe ext4: mark inode dirty after converting inline directory scsi: libsas: fix ata xfer length ALSA: ctxfi: Fallback DMA mask to 32bit ALSA: ctxfi: Fix the incorrect check of dma_set_mask() call ACPI / PNP: Avoid conflicting resource reservations ACPI / resources: free memory on error in add_region_before() ACPI / PNP: Reserve ACPI resources at the fs_initcall_sync stage USB: OHCI: Fix race between ED unlink and URB submission i2c: at91: manage unexpected RXRDY flag when starting a transfer ipv4: igmp: Allow removing groups from a removed interface ptrace: fix PTRACE_LISTEN race corrupting task->state ring-buffer: Fix return value check in test_ringbuffer() metag/usercopy: Fix alignment error checking metag/usercopy: Add early abort to copy_to_user metag/usercopy: Set flags before ADDZ metag/usercopy: Fix src fixup in from user rapf loops metag/usercopy: Add missing fixups s390/decompressor: fix initrd corruption caused by bss clear net/mlx4_en: Fix bad WQE issue net/mlx4_core: Fix racy CQ (Completion Queue) free char: Drop bogus dependency of DEVPORT on !M68K powerpc: Disable HFSCR[TM] if TM is not supported pegasus: Use heap buffers for all register access rtl8150: Use heap buffers for all register access tracing: Allocate the snapshot buffer 
before enabling probe ring-buffer: Have ring_buffer_iter_empty() return true when empty netfilter: arp_tables: fix invoking 32bit "iptable -P INPUT ACCEPT" failed in 64bit kernel net: phy: handle state correctly in phy_stop_machine l2tp: take reference on sessions being dumped MIPS: KGDB: Use kernel context for sleeping threads ARM: dts: imx31: move CCM device node to AIPS2 bus devices ARM: dts: imx31: fix AVIC base address tun: Fix TUN_PKT_STRIP setting Staging: vt6655-6: potential NULL dereference in hostap_disable_hostapd() net: sctp: rework multihoming retransmission path selection to rfc4960 perf trace: Use the syscall raw_syscalls:sys_enter timestamp USB: usbtmc: add missing endpoint sanity check ping: implement proper locking USB: fix problems with duplicate endpoint addresses USB: dummy-hcd: fix bug in stop_activity (handle ep0) mm/init: fix zone boundary creation can: Fix kernel panic at security_sock_rcv_skb Drivers: hv: avoid vfree() on crash xc2028: avoid use after free xc2028: unlock on error in xc2028_set_config() xc2028: Fix use-after-free bug properly ipv6: fix ip6_tnl_parse_tlv_enc_lim() ipv6: pointer math error in ip6_tnl_parse_tlv_enc_lim() ipv6: fix the use of pcpu_tstats in ip6_tunnel sctp: avoid BUG_ON on sctp_wait_for_sndbuf sctp: deny peeloff operation on asocs with threads sleeping on it KVM: x86: clear bus pointer when destroyed kvm: exclude ioeventfd from counting kvm_io_range limit KVM: kvm_io_bus_unregister_dev() should never fail TTY: n_hdlc, fix lockdep false positive tty: n_hdlc: get rid of racy n_hdlc.tbuf ipv6: handle -EFAULT from skb_copy_bits fs: exec: apply CLOEXEC before changing dumpable task flags mm/huge_memory.c: respect FOLL_FORCE/FOLL_COW for thp dccp/tcp: do not inherit mc_list from parent char: lp: fix possible integer overflow in lp_setup() dccp: fix freeing skb too early for IPV6_RECVPKTINFO Linux 3.10.106 Signed-off-by: Nathan Chancellor <natechancellor@gmail.com> Conflicts: drivers/mfd/pm8921-core.c 
include/linux/cpu.h kernel/cpu.c net/ipv4/inet_connection_sock.c net/ipv4/ping.c
| * | ring-buffer: Have ring_buffer_iter_empty() return true when empty    Steven Rostedt (VMware)    2017-06-08    1    -2/+14
| | |
| | | commit 78f7a45dac2a2d2002f98a3a95f7979867868d73 upstream.
| | |
| | | I noticed that reading the snapshot file when it is empty no longer gives a
| | | status. It is supposed to show the status of the snapshot buffer as well as
| | | how to allocate and use it. For example:
| | |
| | |  ># cat snapshot
| | |  # tracer: nop
| | |  #
| | |  #
| | |  # * Snapshot is allocated *
| | |  #
| | |  # Snapshot commands:
| | |  # echo 0 > snapshot : Clears and frees snapshot buffer
| | |  # echo 1 > snapshot : Allocates snapshot buffer, if not already allocated.
| | |  #                      Takes a snapshot of the main buffer.
| | |  # echo 2 > snapshot : Clears snapshot buffer (but does not allocate or free)
| | |  #                      (Doesn't have to be '2' works with any number that
| | |  #                       is not a '0' or '1')
| | |
| | | But instead it just showed an empty buffer:
| | |
| | |  ># cat snapshot
| | |  # tracer: nop
| | |  #
| | |  # entries-in-buffer/entries-written: 0/0   #P:4
| | |  #
| | |  #                              _-----=> irqs-off
| | |  #                             / _----=> need-resched
| | |  #                            | / _---=> hardirq/softirq
| | |  #                            || / _--=> preempt-depth
| | |  #                            ||| /     delay
| | |  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
| | |  #              | |       |   ||||       |         |
| | |
| | | What happened was that it was using the ring_buffer_iter_empty() function to
| | | see if it was empty, and if it was, it showed the status. But that function
| | | was returning false when it was empty. The reason was that the iter header
| | | page was on the reader page, and the reader page was empty, but so was the
| | | buffer itself. The check only tested to see if the iter was on the commit
| | | page, but the commit page was no longer pointing to the reader page. As all
| | | pages were empty, the buffer was empty as well.
| | |
| | | Fixes: 651e22f2701b ("ring-buffer: Always reset iterator to reader page")
| | | Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | tracing: Allocate the snapshot buffer before enabling probe    Steven Rostedt (VMware)    2017-06-08    1    -3/+5
| | |
| | | commit df62db5be2e5f070ecd1a5ece5945b590ee112e0 upstream.
| | |
| | | Currently the snapshot trigger enables the probe and then allocates the
| | | snapshot. If the probe triggers before the allocation, it could cause the
| | | snapshot to fail and turn tracing off. It's best to allocate the snapshot
| | | buffer first, and then enable the trigger. If something goes wrong in the
| | | enabling of the trigger, the snapshot buffer is still allocated, but it can
| | | also be freed by the user by writing zero into the snapshot buffer file.
| | |
| | | Also add a check of the return status of alloc_snapshot().
| | |
| | | Fixes: 77fd5c15e3 ("tracing: Add snapshot trigger to function probes")
| | | Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | ring-buffer: Fix return value check in test_ringbuffer()    Wei Yongjun    2017-06-08    1    -4/+4
| | |
| | | commit 62277de758b155dc04b78f195a1cb5208c37b2df upstream.
| | |
| | | In case of error, the function kthread_run() returns ERR_PTR()
| | | and never returns NULL. The NULL test in the return value check
| | | should be replaced with IS_ERR().
| | |
| | | Link: http://lkml.kernel.org/r/1466184839-14927-1-git-send-email-weiyj_lk@163.com
| | |
| | | Fixes: 6c43e554a ("ring-buffer: Add ring buffer startup selftest")
| | | Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
| | | Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
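The difference between a NULL test and an IS_ERR() test can be illustrated with a small userspace sketch. This is not kernel code: the `sketch_*` helpers only imitate the general idea of the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() (a small negative errno encoded in the top of the pointer range), and `sketch_thread_run()` is a hypothetical stand-in for kthread_run().

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: errno values are encoded in the last 4095 pointer values. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno as a pointer value. */
void *sketch_err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer lies in the reserved error range. */
int sketch_is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Recover the negative errno from an error pointer. */
long sketch_ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

static int dummy_thread;

/* Hypothetical stand-in for kthread_run(): never returns NULL on failure. */
void *sketch_thread_run(int fail)
{
    return fail ? sketch_err_ptr(-ENOMEM) : (void *)&dummy_thread;
}

/* Old-style check: a NULL test never sees the encoded error. */
int start_with_null_check(int fail)
{
    void *t = sketch_thread_run(fail);

    if (!t)
        return -1;
    return 0; /* reports success even though the "thread" failed to start */
}

/* Corrected check: test the error range and propagate the errno. */
int start_with_is_err_check(int fail)
{
    void *t = sketch_thread_run(fail);

    if (sketch_is_err(t))
        return (int)sketch_ptr_err(t);
    return 0;
}
```

With a failing start, `start_with_null_check(1)` still reports success, while `start_with_is_err_check(1)` returns `-ENOMEM`.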
| * | ptrace: fix PTRACE_LISTEN race corrupting task->state    bsegall@google.com    2017-06-08    1    -4/+10
| | |
| | | commit 5402e97af667e35e54177af8f6575518bf251d51 upstream.
| | |
| | | In PT_SEIZED + LISTEN mode STOP/CONT signals cause a wakeup against
| | | __TASK_TRACED. If this races with the ptrace_unfreeze_traced at the end
| | | of a PTRACE_LISTEN, this can wake the task /after/ the check against
| | | __TASK_TRACED, but before the reset of state to TASK_TRACED. This
| | | causes it to instead clobber TASK_WAKING, allowing a subsequent wakeup
| | | against TRACED while the task is still on the rq wake_list, corrupting
| | | it.
| | |
| | | Oleg said:
| | |  "The kernel can crash or this can lead to other hard-to-debug problems.
| | |   In short, "task->state = TASK_TRACED" in ptrace_unfreeze_traced()
| | |   assumes that nobody else can wake it up, but PTRACE_LISTEN breaks the
| | |   contract. Obviously it is very wrong to manipulate task->state if this
| | |   task is already running, or WAKING, or it sleeps again"
| | |
| | | [akpm@linux-foundation.org: coding-style fixes]
| | | Fixes: 9899d11f ("ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL")
| | | Link: http://lkml.kernel.org/r/xm26y3vfhmkp.fsf_-_@bsegall-linux.mtv.corp.google.com
| | | Signed-off-by: Ben Segall <bsegall@google.com>
| | | Acked-by: Oleg Nesterov <oleg@redhat.com>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | perf/core: Fix event inheritance on fork()    Peter Zijlstra    2017-06-08    1    -2/+3
| | |
| | | commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream.
| | |
| | | While hunting for clues to a use-after-free, Oleg spotted that
| | | perf_event_init_context() can lose an error value with the result
| | | that fork() can succeed even though we did not fully inherit the perf
| | | event context.
| | |
| | | Spotted-by: Oleg Nesterov <oleg@redhat.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
| | | Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
| | | Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
| | | Cc: Dmitry Vyukov <dvyukov@google.com>
| | | Cc: Frederic Weisbecker <fweisbec@gmail.com>
| | | Cc: Jiri Olsa <jolsa@redhat.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Stephane Eranian <eranian@google.com>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: Vince Weaver <vincent.weaver@maine.edu>
| | | Cc: oleg@redhat.com
| | | Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists")
| | | Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
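How an error value can get lost across two consecutive loops can be sketched as follows. This is a generic userspace illustration and not the body of perf_event_init_context(): the names `inherit_one`, `inherit_lossy` and `inherit_checked` are hypothetical, and the two integer arrays merely stand in for two event group lists, with a negative entry modelling a failed inheritance.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for inheriting one event group. */
int inherit_one(int result)
{
    return result;
}

/* Lossy variant: a failure in the first loop only breaks out of that loop,
 * so the second loop runs anyway and overwrites ret. */
int inherit_lossy(const int *first, size_t nfirst,
                  const int *second, size_t nsecond)
{
    int ret = 0;
    size_t i;

    for (i = 0; i < nfirst; i++) {
        ret = inherit_one(first[i]);
        if (ret)
            break;
    }

    for (i = 0; i < nsecond; i++) {
        ret = inherit_one(second[i]);
        if (ret)
            break;
    }

    return ret;
}

/* Checked variant: the first failure skips everything else and is returned. */
int inherit_checked(const int *first, size_t nfirst,
                    const int *second, size_t nsecond)
{
    int ret = 0;
    size_t i;

    for (i = 0; i < nfirst; i++) {
        ret = inherit_one(first[i]);
        if (ret)
            goto out;
    }

    for (i = 0; i < nsecond; i++) {
        ret = inherit_one(second[i]);
        if (ret)
            goto out;
    }

out:
    return ret;
}
```

When the first list contains a failing entry and the second list is clean, the lossy variant returns 0 and the checked variant returns the original error.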
| * | locking/static_keys: Add static_key_{en,dis}able() helpers    Peter Zijlstra    2017-06-08    1    -4/+2
| | |
| | | commit e33886b38cc82a9fc3b2d655dfc7f50467594138 upstream.
| | |
| | | Add two helpers to make it easier to treat the refcount as boolean.
| | |
| | | [js] do not involve WARN_ON_ONCE as it causes build failures
| | |
| | | Suggested-by: Jason Baron <jasonbaron0@gmail.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Andrew Morton <akpm@linux-foundation.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Cc: linux-kernel@vger.kernel.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | | [wt: only backported for use in next fix ;
| | |      s/static_key_count(key)/atomic_read(&key->enabled)/]
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | hotplug: Make register and unregister notifier API symmetric    Michal Hocko    2017-06-08    1    -2/+1
| | |
| | | commit 777c6e0daebb3fcefbbd6f620410a946b07ef6d0 upstream.
| | |
| | | Yu Zhao has noticed that __unregister_cpu_notifier only unregisters its
| | | notifiers when HOTPLUG_CPU=y while the registration might succeed even
| | | when HOTPLUG_CPU=n if MODULE is enabled. This means that e.g. zswap
| | | might keep a stale notifier on the list on the manual clean up during
| | | the pool tear down and thus corrupt the list, resulting in the following:
| | |
| | | [ 144.964346] BUG: unable to handle kernel paging request at ffff880658a2be78
| | | [ 144.971337] IP: [<ffffffffa290b00b>] raw_notifier_chain_register+0x1b/0x40
| | | <snipped>
| | | [ 145.122628] Call Trace:
| | | [ 145.125086] [<ffffffffa28e5cf8>] __register_cpu_notifier+0x18/0x20
| | | [ 145.131350] [<ffffffffa2a5dd73>] zswap_pool_create+0x273/0x400
| | | [ 145.137268] [<ffffffffa2a5e0fc>] __zswap_param_set+0x1fc/0x300
| | | [ 145.143188] [<ffffffffa2944c1d>] ? trace_hardirqs_on+0xd/0x10
| | | [ 145.149018] [<ffffffffa2908798>] ? kernel_param_lock+0x28/0x30
| | | [ 145.154940] [<ffffffffa2a3e8cf>] ? __might_fault+0x4f/0xa0
| | | [ 145.160511] [<ffffffffa2a5e237>] zswap_compressor_param_set+0x17/0x20
| | | [ 145.167035] [<ffffffffa2908d3c>] param_attr_store+0x5c/0xb0
| | | [ 145.172694] [<ffffffffa290848d>] module_attr_store+0x1d/0x30
| | | [ 145.178443] [<ffffffffa2b2b41f>] sysfs_kf_write+0x4f/0x70
| | | [ 145.183925] [<ffffffffa2b2a5b9>] kernfs_fop_write+0x149/0x180
| | | [ 145.189761] [<ffffffffa2a99248>] __vfs_write+0x18/0x40
| | | [ 145.194982] [<ffffffffa2a9a412>] vfs_write+0xb2/0x1a0
| | | [ 145.200122] [<ffffffffa2a9a732>] SyS_write+0x52/0xa0
| | | [ 145.205177] [<ffffffffa2ff4d97>] entry_SYSCALL_64_fastpath+0x12/0x17
| | |
| | | This can be even triggered manually by changing
| | | /sys/module/zswap/parameters/compressor multiple times.
| | |
| | | Fix this issue by making unregister APIs symmetric to the register so
| | | there are no surprises.
| | |
| | | [js] backport to 3.12
| | |
| | | Fixes: 47e627bc8c9a ("[PATCH] hotplug: Allow modules to use the cpu hotplug notifiers even if !CONFIG_HOTPLUG_CPU")
| | | Reported-and-tested-by: Yu Zhao <yuzhao@google.com>
| | | Signed-off-by: Michal Hocko <mhocko@suse.com>
| | | Cc: linux-mm@kvack.org
| | | Cc: Andrew Morton <akpm@linux-foundation.org>
| | | Cc: Dan Streetman <ddstreet@ieee.org>
| | | Link: http://lkml.kernel.org/r/20161207135438.4310-1-mhocko@kernel.org
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | locking/rtmutex: Prevent dequeue vs. unlock race    Thomas Gleixner    2017-06-08    1    -2/+66
| | |
| | | commit dbb26055defd03d59f678cb5f2c992abe05b064a upstream.
| | |
| | | David reported a futex/rtmutex state corruption. It's caused by the
| | | following problem:
| | |
| | | CPU0            CPU1            CPU2
| | |
| | | l->owner=T1
| | |                 rt_mutex_lock(l)
| | |                 lock(l->wait_lock)
| | |                 l->owner = T1 | HAS_WAITERS;
| | |                 enqueue(T2)
| | |                 boost()
| | |                   unlock(l->wait_lock)
| | |                 schedule()
| | |
| | |                                 rt_mutex_lock(l)
| | |                                 lock(l->wait_lock)
| | |                                 l->owner = T1 | HAS_WAITERS;
| | |                                 enqueue(T3)
| | |                                 boost()
| | |                                   unlock(l->wait_lock)
| | |                                 schedule()
| | |                 signal(->T2)    signal(->T3)
| | |                 lock(l->wait_lock)
| | |                 dequeue(T2)
| | |                 deboost()
| | |                   unlock(l->wait_lock)
| | |                                 lock(l->wait_lock)
| | |                                 dequeue(T3)
| | |                                   ===> wait list is now empty
| | |                                 deboost()
| | |                                   unlock(l->wait_lock)
| | |                 lock(l->wait_lock)
| | |                 fixup_rt_mutex_waiters()
| | |                   if (wait_list_empty(l)) {
| | |                     owner = l->owner & ~HAS_WAITERS;
| | |                     l->owner = owner
| | |                       ==> l->owner = T1
| | |                   }
| | |
| | |                                 lock(l->wait_lock)
| | | rt_mutex_unlock(l)              fixup_rt_mutex_waiters()
| | |                                   if (wait_list_empty(l)) {
| | |                                     owner = l->owner & ~HAS_WAITERS;
| | | cmpxchg(l->owner, T1, NULL)
| | |  ===> Success (l->owner = NULL)
| | |                                     l->owner = owner
| | |                                       ==> l->owner = T1
| | |                                   }
| | |
| | | That means the problem is caused by fixup_rt_mutex_waiters() which does the
| | | RMW to clear the waiters bit unconditionally when there are no waiters in
| | | the rtmutexes rbtree.
| | |
| | | This can be fatal: A concurrent unlock can release the rtmutex in the
| | | fastpath because the waiters bit is not set. If the cmpxchg() gets in the
| | | middle of the RMW operation then the previous owner, which just unlocked
| | | the rtmutex is set as the owner again when the write takes place after the
| | | successful cmpxchg().
| | |
| | | The solution is rather trivial: verify that the owner member of the rtmutex
| | | has the waiters bit set before clearing it. This does not require a
| | | cmpxchg() or other atomic operations because the waiters bit can only be
| | | set and cleared with the rtmutex wait_lock held. It's also safe against the
| | | fast path unlock attempt. The unlock attempt via cmpxchg() will either see
| | | the bit set and take the slowpath or see the bit cleared and release it
| | | atomically in the fastpath.
| | |
| | | It's remarkable that the test program provided by David triggers on ARM64
| | | and MIPS64 really quickly, but it refuses to reproduce on x86-64, while the
| | | problem exists there as well. That refusal might explain why this was not
| | | discovered earlier despite the bug existing from day one of the rtmutex
| | | implementation more than 10 years ago.
| | |
| | | Thanks to David for meticulously instrumenting the code and providing the
| | | information which made it possible to decode this subtle problem.
| | |
| | | Reported-by: David Daney <ddaney@caviumnetworks.com>
| | | Tested-by: David Daney <david.daney@cavium.com>
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
| | | Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mark Rutland <mark.rutland@arm.com>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Sebastian Siewior <bigeasy@linutronix.de>
| | | Cc: Will Deacon <will.deacon@arm.com>
| | | Fixes: 23f78d4a03c5 ("[PATCH] pi-futex: rt mutex core")
| | | Link: http://lkml.kernel.org/r/20161130210030.351136722@linutronix.de
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | [wt: s/{READ,WRITE}_ONCE/ACCESS_ONCE/]
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
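The last interleaving in the message above can be replayed deterministically in a single-threaded userspace model. This is only a sketch under stated assumptions, not the kernel's rtmutex code: `struct model_lock`, `TASK_T1`, the split read/write helpers and `replay_race()` are all hypothetical names, the owner word is a plain integer with bit 0 standing for HAS_WAITERS, and the wait list is assumed to be empty already.

```c
#include <assert.h>
#include <stdint.h>

#define HAS_WAITERS ((uintptr_t)1)
#define TASK_T1     ((uintptr_t)0x1000)  /* hypothetical owner id */

/* Minimal model of the lock: only the owner word matters here. */
struct model_lock {
    uintptr_t owner;
};

/* Read half of the fixup, done under wait_lock in the real code. */
uintptr_t fixup_read(const struct model_lock *l)
{
    return l->owner;
}

/* Unconditional write half: always stores the stale value back. */
void buggy_fixup_write(struct model_lock *l, uintptr_t seen)
{
    l->owner = seen & ~HAS_WAITERS;
}

/* Checked write half: only writes if the waiters bit was actually set. */
void fixed_fixup_write(struct model_lock *l, uintptr_t seen)
{
    if (seen & HAS_WAITERS)
        l->owner = seen & ~HAS_WAITERS;
}

/* Models the fast-path unlock, cmpxchg(l->owner, me, NULL). */
int fastpath_unlock(struct model_lock *l, uintptr_t me)
{
    if (l->owner != me)
        return 0;   /* bit set or other owner: slow path needed */
    l->owner = 0;
    return 1;
}

/* Replays: fixup reads owner, unlock succeeds, fixup writes. */
uintptr_t replay_race(int use_fix)
{
    /* An earlier fixup already cleared the bit; the wait list is empty. */
    struct model_lock l = { .owner = TASK_T1 };
    uintptr_t seen = fixup_read(&l);

    fastpath_unlock(&l, TASK_T1);   /* succeeds, owner becomes 0 */

    if (use_fix)
        fixed_fixup_write(&l, seen);
    else
        buggy_fixup_write(&l, seen);

    return l.owner;
}
```

In this model the unconditional write resurrects T1 as owner of an already released lock, while the checked write leaves the owner word at 0. Had the waiters bit still been set when the fixup read it, the fast-path compare would have failed and the unlocker would have been forced onto the slow path.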
* | | Merge 3.10.105 into android-msm-angler-3.10-oreo-m5    Nathan Chancellor    2018-01-24    6    -25/+82
|\| |
| | | | | | | | Changes in 3.10.105: (315 commits) sched/core: Fix a race between try_to_wake_up() and a woken up task sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule() crypto: algif_skcipher - Require setkey before accept(2) crypto: af_alg - Disallow bind/setkey/... after accept(2) crypto: af_alg - Add nokey compatibility path crypto: algif_skcipher - Add nokey compatibility path crypto: hash - Add crypto_ahash_has_setkey crypto: shash - Fix has_key setting crypto: algif_hash - Require setkey before accept(2) crypto: skcipher - Add crypto_skcipher_has_setkey crypto: algif_skcipher - Add key check exception for cipher_null crypto: af_alg - Allow af_af_alg_release_parent to be called on nokey path crypto: algif_hash - Remove custom release parent function crypto: algif_skcipher - Remove custom release parent function crypto: af_alg - Forbid bind(2) when nokey child sockets are present crypto: algif_hash - Fix race condition in hash_check_key crypto: algif_skcipher - Fix race condition in skcipher_check_key crypto: algif_skcipher - Load TX SG list after waiting crypto: cryptd - initialize child shash_desc on import crypto: skcipher - Fix blkcipher walk OOM crash crypto: gcm - Fix IV buffer size in crypto_gcm_setkey MIPS: KVM: Fix unused variable build warning KVM: MIPS: Precalculate MMIO load resume PC KVM: MIPS: Drop other CPU ASIDs on guest MMU changes KVM: nVMX: postpone VMCS changes on MSR_IA32_APICBASE write KVM: MIPS: Make ERET handle ERL before EXL KVM: x86: fix wbinvd_dirty_mask use-after-free KVM: x86: fix missed SRCU usage in kvm_lapic_set_vapic_addr KVM: Disable irq while unregistering user notifier PM / devfreq: Fix incorrect type issue. 
ppp: defer netns reference release for ppp channel x86/mm/xen: Suppress hugetlbfs in PV guests xen: Add RING_COPY_REQUEST() xen-netback: don't use last request to determine minimum Tx credit xen-netback: use RING_COPY_REQUEST() throughout xen-blkback: only read request operation from shared ring once xen/pciback: Save xen_pci_op commands before processing it xen/pciback: Save the number of MSI-X entries to be copied later. xen/pciback: Return error on XEN_PCI_OP_enable_msi when device has MSI or MSI-X enabled xen/pciback: Return error on XEN_PCI_OP_enable_msix when device has MSI or MSI-X enabled xen/pciback: Do not install an IRQ handler for MSI interrupts. xen/pciback: For XEN_PCI_OP_disable_msi[|x] only disable if device has MSI(X) enabled. xen/pciback: Don't allow MSI-X ops if PCI_COMMAND_MEMORY is not set. xen-pciback: Add name prefix to global 'permissive' variable x86/xen: fix upper bound of pmd loop in xen_cleanhighmap() x86/traps: Ignore high word of regs->cs in early_idt_handler_common x86/mm: Disable preemption during CR3 read+write x86/apic: Do not init irq remapping if ioapic is disabled x86/mm/pat, /dev/mem: Remove superfluous error message x86/paravirt: Do not trace _paravirt_ident_*() functions x86/build: Build compressed x86 kernels as PIE x86/um: reuse asm-generic/barrier.h iommu/amd: Update Alias-DTE in update_device_table() iommu/amd: Free domain id when free a domain of struct dma_ops_domain ARM: 8616/1: dt: Respect property size when parsing CPUs ARM: 8618/1: decompressor: reset ttbcr fields to use TTBR0 on ARMv7 ARM: sa1100: clear reset status prior to reboot ARM: sa1111: fix pcmcia suspend/resume arm64: avoid returning from bad_mode arm64: Define AT_VECTOR_SIZE_ARCH for ARCH_DLINFO arm64: spinlocks: implement smp_mb__before_spinlock() as smp_mb() arm64: debug: avoid resetting stepping state machine when TIF_SINGLESTEP MIPS: Malta: Fix IOCU disable switch read for MIPS64 MIPS: ptrace: Fix regs_return_value for kernel context powerpc/mm: Don't 
alias user region to other regions below PAGE_OFFSET powerpc/vdso64: Use double word compare on pointers powerpc/powernv: Use CPU-endian PEST in pnv_pci_dump_p7ioc_diag_data() powerpc/64: Fix incorrect return value from __copy_tofrom_user powerpc/nvram: Fix an incorrect partition merge avr32: fix copy_from_user() avr32: fix 'undefined reference to `___copy_from_user' avr32: off by one in at32_init_pio() s390/dasd: fix hanging device after clear subchannel parisc: Ensure consistent state when switching to kernel stack at syscall entry microblaze: fix __get_user() microblaze: fix copy_from_user() mn10300: failing __get_user() and get_user() should zero m32r: fix __get_user() sh64: failing __get_user() should zero score: fix __get_user/get_user s390: get_user() should zero on failure ARC: uaccess: get_user to zero out dest in cause of fault asm-generic: make get_user() clear the destination on errors frv: fix clear_user() cris: buggered copy_from_user/copy_to_user/clear_user blackfin: fix copy_from_user() score: fix copy_from_user() and friends sh: fix copy_from_user() hexagon: fix strncpy_from_user() error return mips: copy_from_user() must zero the destination on access_ok() failure asm-generic: make copy_from_user() zero the destination properly alpha: fix copy_from_user() metag: copy_from_user() should zero the destination on access_ok() failure parisc: fix copy_from_user() openrisc: fix copy_from_user() openrisc: fix the fix of copy_from_user() mn10300: copy_from_user() should zero on access_ok() failure... 
sparc32: fix copy_from_user() ppc32: fix copy_from_user() ia64: copy_from_user() should zero the destination on access_ok() failure fix fault_in_multipages_...() on architectures with no-op access_ok() fix memory leaks in tracing_buffers_splice_read() arc: don't leak bits of kernel stack into coredump Fix potential infoleak in older kernels swapfile: fix memory corruption via malformed swapfile coredump: fix unfreezable coredumping task usb: dwc3: gadget: increment request->actual once USB: validate wMaxPacketValue entries in endpoint descriptors USB: fix typo in wMaxPacketSize validation usb: xhci: Fix panic if disconnect USB: serial: fix memleak in driver-registration error path USB: kobil_sct: fix non-atomic allocation in write path USB: serial: mos7720: fix non-atomic allocation in write path USB: serial: mos7840: fix non-atomic allocation in write path usb: renesas_usbhs: fix clearing the {BRDY,BEMP}STS condition USB: change bInterval default to 10 ms usb: gadget: fsl_qe_udc: signedness bug in qe_get_frame() USB: serial: cp210x: fix hardware flow-control disable usb: misc: legousbtower: Fix NULL pointer deference usb: gadget: function: u_ether: don't starve tx request queue USB: serial: cp210x: fix tiocmget error handling usb: gadget: u_ether: remove interrupt throttling usb: chipidea: move the lock initialization to core file Fix USB CB/CBI storage devices with CONFIG_VMAP_STACK=y ALSA: rawmidi: Fix possible deadlock with virmidi registration ALSA: timer: fix NULL pointer dereference in read()/ioctl() race ALSA: timer: fix division by zero after SNDRV_TIMER_IOCTL_CONTINUE ALSA: timer: fix NULL pointer dereference on memory allocation failure ALSA: ali5451: Fix out-of-bound position reporting ALSA: pcm : Call kill_fasync() in stream lock zfcp: fix fc_host port_type with NPIV zfcp: fix ELS/GS request&response length for hardware data router zfcp: close window with unblocked rport during rport gone zfcp: retain trace level for SCSI and HBA FSF response records 
zfcp: restore: Dont use 0 to indicate invalid LUN in rec trace zfcp: trace on request for open and close of WKA port zfcp: restore tracing of handle for port and LUN with HBA records zfcp: fix D_ID field with actual value on tracing SAN responses zfcp: fix payload trace length for SAN request&response zfcp: trace full payload of all SAN records (req,resp,iels) scsi: zfcp: spin_lock_irqsave() is not nestable scsi: mpt3sas: Fix secure erase premature termination scsi: mpt3sas: Unblock device after controller reset scsi: mpt3sas: fix hang on ata passthrough commands mpt2sas: Fix secure erase premature termination scsi: megaraid_sas: Fix data integrity failure for JBOD (passthrough) devices scsi: megaraid_sas: fix macro MEGASAS_IS_LOGICAL to avoid regression scsi: ibmvfc: Fix I/O hang when port is not mapped scsi: Fix use-after-free scsi: arcmsr: Buffer overflow in arcmsr_iop_message_xfer() scsi: scsi_debug: Fix memory leak if LBP enabled and module is unloaded scsi: arcmsr: Send SYNCHRONIZE_CACHE command to firmware ext4: validate that metadata blocks do not overlap superblock ext4: avoid modifying checksum fields directly during checksum verification ext4: use __GFP_NOFAIL in ext4_free_blocks() ext4: reinforce check of i_dtime when clearing high fields of uid and gid ext4: allow DAX writeback for hole punch ext4: sanity check the block and cluster size at mount time reiserfs: fix "new_insert_key may be used uninitialized ..." 
reiserfs: Unlock superblock before calling reiserfs_quota_on_mount() xfs: fix superblock inprogress check libxfs: clean up _calc_dquots_per_chunk btrfs: ensure that file descriptor used with subvol ioctls is a dir ocfs2/dlm: fix race between convert and migration ocfs2: fix start offset to ocfs2_zero_range_for_truncate() ubifs: Fix assertion in layout_in_gaps() ubifs: Fix xattr_names length in exit paths UBIFS: Fix possible memory leak in ubifs_readdir() ubifs: Abort readdir upon error ubifs: Fix regression in ubifs_readdir() UBI: fastmap: scrub PEB when bitflips are detected in a free PEB EC header NFSv4.x: Fix a refcount leak in nfs_callback_up_net NFSD: Using free_conn free connection NFS: Don't drop CB requests with invalid principals NFSv4: Open state recovery must account for file permission changes fs/seq_file: fix out-of-bounds read fs/super.c: fix race between freeze_super() and thaw_super() isofs: Do not return EACCES for unknown filesystems hostfs: Freeing an ERR_PTR in hostfs_fill_sb_common() driver core: Delete an unnecessary check before the function call "put_device" driver core: fix race between creating/querying glue dir and its cleanup drm/radeon: fix radeon_move_blit on 32bit systems drm: Reject page_flip for !DRIVER_MODESET drm/radeon: Ensure vblank interrupt is enabled on DPMS transition to on qxl: check for kmap failures Input: i8042 - break load dependency between atkbd/psmouse and i8042 Input: i8042 - set up shared ps2_cmd_mutex for AUX ports Input: ili210x - fix permissions on "calibrate" attribute hwrng: exynos - Disable runtime PM on probe failure hwrng: omap - Fix assumption that runtime_get_sync will always succeed hwrng: omap - Only fail if pm_runtime_get_sync returns < 0 i2c-eg20t: fix race between i2c init and interrupt enable em28xx-i2c: rt_mutex_trylock() returns zero on failure i2c: core: fix NULL pointer dereference under race condition i2c: at91: fix write transfers by clearing pending interrupt first iio: accel: kxsd9: Fix raw 
read return iio: accel: kxsd9: Fix scaling bug thermal: hwmon: Properly report critical temperature in sysfs cdc-acm: fix wrong pipe type on rx interrupt xfers timers: Use proper base migration in add_timer_on() EDAC: Increment correct counter in edac_inc_ue_error() IB/ipoib: Fix memory corruption in ipoib cm mode connect flow IB/core: Fix use after free in send_leave function IB/ipoib: Don't allow MC joins during light MC flush IB/mlx4: Fix incorrect MC join state bit-masking on SR-IOV IB/mlx4: Fix create CQ error flow IB/uverbs: Fix leak of XRC target QPs IB/cm: Mark stale CM id's whenever the mad agent was unregistered mtd: blkdevs: fix potential deadlock + lockdep warnings mtd: pmcmsp-flash: Allocating too much in init_msp_flash() mtd: nand: davinci: Reinitialize the HW ECC engine in 4bit hwctl perf symbols: Fixup symbol sizes before picking best ones perf: Tighten (and fix) the grouping condition tty: Prevent ldisc drivers from re-using stale tty fields tty: limit terminal size to 4M chars tty: vt, fix bogus division in csi_J vt: clear selection before resizing drivers/vfio: Rework offsetofend() include/stddef.h: Move offsetofend() from vfio.h to a generic kernel header stddef.h: move offsetofend inside #ifndef/#endif guard, neaten ipv6: don't call fib6_run_gc() until routing is ready ipv6: split duplicate address detection and router solicitation timer ipv6: move DAD and addrconf_verify processing to workqueue ipv6: addrconf: fix dev refcont leak when DAD failed ipv6: fix rtnl locking in setsockopt for anycast and multicast ip6_gre: fix flowi6_proto value in ip6gre_xmit_other() ipv6: correctly add local routes when lo goes up ipv6: dccp: fix out of bound access in dccp_v6_err() ipv6: dccp: add missing bind_conflict to dccp_ipv6_mapped ip6_tunnel: Clear IP6CB in ip6tunnel_xmit() ip6_tunnel: disable caching when the traffic class is inherited net/irda: handle iriap_register_lsap() allocation failure tcp: fix use after free in tcp_xmit_retransmit_queue() tcp: 
properly scale window in tcp_v[46]_reqsk_send_ack() tcp: fix overflow in __tcp_retransmit_skb() tcp: fix wrong checksum calculation on MTU probing tcp: take care of truncations done by sk_filter() bonding: Fix bonding crash net: ratelimit warnings about dst entry refcount underflow or overflow mISDN: Support DR6 indication in mISDNipac driver mISDN: Fixing missing validation in base_sock_bind() net: disable fragment reassembly if high_thresh is set to zero ipvs: count pre-established TCP states as active iwlwifi: pcie: fix access to scratch buffer svc: Avoid garbage replies when pc_func() returns rpc_drop_reply brcmsmac: Free packet if dma_mapping_error() fails in dma_rxfill brcmsmac: Initialize power in brcms_c_stf_ss_algo_channel_get() brcmfmac: avoid potential stack overflow in brcmf_cfg80211_start_ap() pstore: Fix buffer overflow while write offset equal to buffer size net/mlx4_core: Allow resetting VF admin mac to zero firewire: net: guard against rx buffer overflows firewire: net: fix fragmented datagram_size off-by-one netfilter: fix namespace handling in nf_log_proc_dostring can: bcm: fix warning in bcm_connect/proc_register net: fix sk_mem_reclaim_partial() net: avoid sk_forward_alloc overflows ipmr, ip6mr: fix scheduling while atomic and a deadlock with ipmr_get_route packet: call fanout_release, while UNREGISTERING a netdev net: sctp, forbid negative length sctp: validate chunk len before actually using it net: clear sk_err_soft in sk_clone_lock() net: mangle zero checksum in skb_checksum_help() dccp: do not send reset to already closed sockets dccp: fix out of bound access in dccp_v4_err() sctp: assign assoc_id earlier in __sctp_connect neigh: check error pointer instead of NULL for ipv4_neigh_lookup() ipv4: use new_gw for redirect neigh lookup mac80211: fix purging multicast PS buffer queue mac80211: discard multicast and 4-addr A-MSDUs cfg80211: limit scan results cache size mwifiex: printk() overflow with 32-byte SSIDs ipv4: Set skb->protocol 
properly for local output net: sky2: Fix shutdown crash kaweth: fix firmware download tracing: Move mutex to protect against resetting of seq data kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd Revert "ipc/sem.c: optimize sem_lock()" cfq: fix starvation of asynchronous writes drbd: Fix kernel_sendmsg() usage - potential NULL deref lib/genalloc.c: start search from start of chunk tools/vm/slabinfo: fix an unintentional printf rcu: Fix soft lockup for rcu_nocb_kthread ratelimit: fix bug in time interval by resetting right begin time mfd: core: Fix device reference leak in mfd_clone_cell PM / sleep: fix device reference leak in test_suspend mmc: mxs: Initialize the spinlock prior to using it mmc: block: don't use CMD23 with very old MMC cards pstore/core: drop cmpxchg based updates pstore/ram: Use memcpy_toio instead of memcpy pstore/ram: Use memcpy_fromio() to save old buffer mb86a20s: fix the locking logic mb86a20s: fix demod settings cx231xx: don't return error on success cx231xx: fix GPIOs for Pixelview SBTVD hybrid gpio: mpc8xxx: Correct irq handler function uio: fix dmem_region_start computation KEYS: Fix short sprintf buffer in /proc/keys show function hv: do not lose pending heartbeat vmbus packets staging: iio: ad5933: avoid uninitialized variable in error case mei: bus: fix received data size check in NFC fixup ACPI / APEI: Fix incorrect return value of ghes_proc() PCI: Handle read-only BARs on AMD CS553x devices tile: avoid using clocksource_cyc2ns with absolute cycle count dm flakey: fix reads to be issued if drop_writes configured mm,ksm: fix endless looping in allocating memory when ksm enable can: dev: fix deadlock reported after bus-off hwmon: (adt7411) set bit 3 in CFG1 register mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] mfd: 88pm80x: Double shifting bug in suspend/resume ASoC: omap-mcpdm: Fix irq resource handling regulator: tps65910: Work around silicon erratum SWCZ010 dm: mark request_queue dead before destroying the DM device 
fbdev/efifb: Fix 16 color palette entry calculation
metag: Only define atomic_dec_if_positive conditionally
Linux 3.10.105

Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>

Conflicts:
	arch/arm/mach-sa1100/generic.c
	arch/arm64/kernel/traps.c
	crypto/blkcipher.c
	drivers/devfreq/devfreq.c
	drivers/usb/dwc3/gadget.c
	drivers/usb/gadget/u_ether.c
	fs/ubifs/dir.c
	include/net/if_inet6.h
	lib/genalloc.c
	net/ipv6/addrconf.c
	net/ipv6/tcp_ipv6.c
	net/wireless/scan.c
	sound/core/timer.c
| * | PM / sleep: fix device reference leak in test_suspend  Johan Hovold  2017-02-10  1  -1/+3
| | | commit ceb75787bc75d0a7b88519ab8a68067ac690f55a upstream.
| | |
| | | Make sure to drop the reference taken by class_find_device() after
| | | opening the RTC device.
| | |
| | | Fixes: 77437fd4e61f (pm: boot time suspend selftest)
| | | Signed-off-by: Johan Hovold <johan@kernel.org>
| | | Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
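The lookup-then-release pattern this fix restores can be sketched in user-space C. This is only an illustration under stated assumptions: `toy_device`, `toy_find_device()`, `toy_put_device()` and `toy_open_rtc()` are hypothetical stand-ins, not the kernel's `struct device`, `class_find_device()` or `put_device()`, and the real change in test_suspend may be shaped differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a reference-counted device (hypothetical, not struct device). */
struct toy_device {
    const char *name;
    int refcount;
};

/* Like a class lookup: a successful find returns the device with one extra reference. */
struct toy_device *toy_find_device(struct toy_device *devs, size_t n, const char *name)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(devs[i].name, name) == 0) {
            devs[i].refcount++;
            return &devs[i];
        }
    }
    return NULL;
}

/* Drop one reference; dropping more references than were taken is a bug. */
void toy_put_device(struct toy_device *dev)
{
    assert(dev->refcount > 0);
    dev->refcount--;
}

/* Look the device up, use it, then release the lookup reference.
 * Returns 0 on success and -1 when no device matches. */
int toy_open_rtc(struct toy_device *devs, size_t n, const char *name)
{
    struct toy_device *dev = toy_find_device(devs, n, name);

    if (!dev)
        return -1;
    /* ... the device would be opened by name here ... */
    toy_put_device(dev); /* without this call the lookup reference leaks */
    return 0;
}
```

After a successful `toy_open_rtc()` the reference count is back at its starting value, which is the property the commit message describes.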
| * | rcu: Fix soft lockup for rcu_nocb_kthread  Ding Tianhong  2017-02-10  1  -0/+1
| | | commit bedc1969150d480c462cdac320fa944b694a7162 upstream.
| | |
| | | Carrying out the following steps results in a softlockup in the
| | | RCU callback-offload (rcuo) kthreads:
| | |
| | | 1. Connect to ixgbevf, and set the speed to 10Gb/s.
| | | 2. Use ifconfig to bring the nic up and down repeatedly.
| | |
| | | [ 317.005148] IPv6: ADDRCONF(NETDEV_CHANGE): eth2: link becomes ready
| | | [ 368.106005] BUG: soft lockup - CPU#1 stuck for 22s! [rcuos/1:15]
| | | [ 368.106005] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
| | | [ 368.106005] task: ffff88057dd8a220 ti: ffff88057dd9c000 task.ti: ffff88057dd9c000
| | | [ 368.106005] RIP: 0010:[<ffffffff81579e04>] [<ffffffff81579e04>] fib_table_lookup+0x14/0x390
| | | [ 368.106005] RSP: 0018:ffff88061fc83ce8 EFLAGS: 00000286
| | | [ 368.106005] RAX: 0000000000000001 RBX: 00000000020155c0 RCX: 0000000000000001
| | | [ 368.106005] RDX: ffff88061fc83d50 RSI: ffff88061fc83d70 RDI: ffff880036d11a00
| | | [ 368.106005] RBP: ffff88061fc83d08 R08: 0000000000000001 R09: 0000000000000000
| | | [ 368.106005] R10: ffff880036d11a00 R11: ffffffff819e0900 R12: ffff88061fc83c58
| | | [ 368.106005] R13: ffffffff816154dd R14: ffff88061fc83d08 R15: 00000000020155c0
| | | [ 368.106005] FS: 0000000000000000(0000) GS:ffff88061fc80000(0000) knlGS:0000000000000000
| | | [ 368.106005] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
| | | [ 368.106005] CR2: 00007f8c2aee9c40 CR3: 000000057b222000 CR4: 00000000000407e0
| | | [ 368.106005] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
| | | [ 368.106005] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
| | | [ 368.106005] Stack:
| | | [ 368.106005] 00000000010000c0 ffff88057b766000 ffff8802e380b000 ffff88057af03e00
| | | [ 368.106005] ffff88061fc83dc0 ffffffff815349a6 ffff88061fc83d40 ffffffff814ee146
| | | [ 368.106005] ffff8802e380af00 00000000e380af00 ffffffff819e0900 020155c0010000c0
| | | [ 368.106005] Call Trace:
| | | [ 368.106005] <IRQ>
| | | [ 368.106005]
| | | [ 368.106005] [<ffffffff815349a6>] ip_route_input_noref+0x516/0xbd0
| | | [ 368.106005] [<ffffffff814ee146>] ? skb_release_data+0xd6/0x110
| | | [ 368.106005] [<ffffffff814ee20a>] ? kfree_skb+0x3a/0xa0
| | | [ 368.106005] [<ffffffff8153698f>] ip_rcv_finish+0x29f/0x350
| | | [ 368.106005] [<ffffffff81537034>] ip_rcv+0x234/0x380
| | | [ 368.106005] [<ffffffff814fd656>] __netif_receive_skb_core+0x676/0x870
| | | [ 368.106005] [<ffffffff814fd868>] __netif_receive_skb+0x18/0x60
| | | [ 368.106005] [<ffffffff814fe4de>] process_backlog+0xae/0x180
| | | [ 368.106005] [<ffffffff814fdcb2>] net_rx_action+0x152/0x240
| | | [ 368.106005] [<ffffffff81077b3f>] __do_softirq+0xef/0x280
| | | [ 368.106005] [<ffffffff8161619c>] call_softirq+0x1c/0x30
| | | [ 368.106005] <EOI>
| | | [ 368.106005]
| | | [ 368.106005] [<ffffffff81015d95>] do_softirq+0x65/0xa0
| | | [ 368.106005] [<ffffffff81077174>] local_bh_enable+0x94/0xa0
| | | [ 368.106005] [<ffffffff81114922>] rcu_nocb_kthread+0x232/0x370
| | | [ 368.106005] [<ffffffff81098250>] ? wake_up_bit+0x30/0x30
| | | [ 368.106005] [<ffffffff811146f0>] ? rcu_start_gp+0x40/0x40
| | | [ 368.106005] [<ffffffff8109728f>] kthread+0xcf/0xe0
| | | [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
| | | [ 368.106005] [<ffffffff816147d8>] ret_from_fork+0x58/0x90
| | | [ 368.106005] [<ffffffff810971c0>] ? kthread_create_on_node+0x140/0x140
| | |
| | | ==================================cut here==============================
| | |
| | | It turns out that the rcuos callback-offload kthread is busy processing
| | | a very large quantity of RCU callbacks, and it is not relinquishing the
| | | CPU while doing so. This commit therefore adds a cond_resched_rcu_qs()
| | | within the loop to allow other tasks to run.
| | |
| | | [js] use only cond_resched() in 3.12
| | |
| | | Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
| | | [ paulmck: Substituted cond_resched_rcu_qs for cond_resched. ]
| | | Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
| | | Cc: Dhaval Giani <dhaval.giani@oracle.com>
| | | Signed-off-by: Jiri Slaby <jslaby@suse.cz>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
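The idea behind the fix, giving up the CPU inside a long callback-processing loop, can be shown with a rough user-space analogue. This is a sketch, not the kernel code: `toy_cb`, `toy_count_cb()` and `toy_run_callbacks()` are hypothetical names, POSIX `sched_yield()` stands in for `cond_resched()`, and yielding once per `batch` callbacks is an assumption made here for illustration (the commit message only says the call is added within the loop).

```c
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback node, loosely modelled on a linked list of pending callbacks. */
struct toy_cb {
    struct toy_cb *next;
    void (*func)(struct toy_cb *cb);
};

/* Example callback that only counts how many times it was invoked. */
size_t toy_invoked;

void toy_count_cb(struct toy_cb *cb)
{
    (void)cb;
    toy_invoked++;
}

/* Invoke every callback on the list, yielding the CPU after each `batch`
 * callbacks so other runnable tasks are not starved by a very long list.
 * Returns the number of callbacks invoked; *yields counts the yield points. */
size_t toy_run_callbacks(struct toy_cb *list, size_t batch, size_t *yields)
{
    size_t done = 0;

    assert(batch > 0);
    while (list) {
        struct toy_cb *next = list->next; /* read first: func may release the node */

        list->func(list);
        list = next;
        if (++done % batch == 0) {
            sched_yield(); /* user-space stand-in for cond_resched() */
            if (yields)
                (*yields)++;
        }
    }
    return done;
}
```

With five callbacks and a batch size of two, the loop yields twice, after the second and fourth callbacks.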
| * | kernel/fork: fix CLONE_CHILD_CLEARTID regression in nscd  Michal Hocko  2017-02-10  1  -6/+4
| | | commit 735f2770a770156100f534646158cb58cb8b2939 upstream.
| | |
| | | Commit fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal
| | | exit") has caused a subtle regression in nscd which uses
| | | CLONE_CHILD_CLEARTID to clear the nscd_certainly_running flag in the
| | | shared databases, so that the clients are notified when nscd is
| | | restarted. Now, when nscd uses a non-persistent database, clients that
| | | have it mapped keep thinking the database is being updated by nscd, when
| | | in fact nscd has created a new (anonymous) one (for non-persistent
| | | databases it uses an unlinked file as backend).
| | |
| | | The original proposal for the CLONE_CHILD_CLEARTID change claimed
| | | (https://lkml.org/lkml/2006/10/25/233):
| | |
| | | : The NPTL library uses the CLONE_CHILD_CLEARTID flag on clone() syscalls
| | | : on behalf of pthread_create() library calls. This feature is used to
| | | : request that the kernel clear the thread-id in user space (at an address
| | | : provided in the syscall) when the thread disassociates itself from the
| | | : address space, which is done in mm_release().
| | | :
| | | : Unfortunately, when a multi-threaded process incurs a core dump (such as
| | | : from a SIGSEGV), the core-dumping thread sends SIGKILL signals to all of
| | | : the other threads, which then proceed to clear their user-space tids
| | | : before synchronizing in exit_mm() with the start of core dumping. This
| | | : misrepresents the state of process's address space at the time of the
| | | : SIGSEGV and makes it more difficult for someone to debug NPTL and glibc
| | | : problems (misleading him/her to conclude that the threads had gone away
| | | : before the fault).
| | | :
| | | : The fix below is to simply avoid the CLONE_CHILD_CLEARTID action if a
| | | : core dump has been initiated.
| | |
| | | The resulting patch from Roland (https://lkml.org/lkml/2006/10/26/269)
| | | seems to have a larger scope than the original patch asked for. It seems
| | | that limiting the scope of the check to core dumping should work for
| | | the SIGSEGV issue described above.
| | |
| | | [Changelog partly based on Andreas' description]
| | | Fixes: fec1d0115240 ("[PATCH] Disable CLONE_CHILD_CLEARTID for abnormal exit")
| | | Link: http://lkml.kernel.org/r/1471968749-26173-1-git-send-email-mhocko@kernel.org
| | | Signed-off-by: Michal Hocko <mhocko@suse.com>
| | | Tested-by: William Preston <wpreston@suse.com>
| | | Acked-by: Oleg Nesterov <oleg@redhat.com>
| | | Cc: Roland McGrath <roland@hack.frob.com>
| | | Cc: Andreas Schwab <schwab@suse.com>
| | | Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
| | | Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | tracing: Move mutex to protect against resetting of seq data  Steven Rostedt (Red Hat)  2017-02-10  1  -7/+8
| | | commit 1245800c0f96eb6ebb368593e251d66c01e61022 upstream.
| | |
| | | The iter->seq can be reset outside the protection of the mutex. So can
| | | reading of user data. Move the mutex up to the beginning of the function.
| | |
| | | Fixes: d7350c3f45694 ("tracing/core: make the read callbacks reentrants")
| | | Reported-by: Al Viro <viro@ZenIV.linux.org.uk>
| | | Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | perf: Tighten (and fix) the grouping condition  Peter Zijlstra  2017-02-10  1  -2/+13
| | | commit c3c87e770458aa004bd7ed3f29945ff436fd6511 upstream.
| | |
| | | The fix from 9fc81d87420d ("perf: Fix events installation during
| | | moving group") was incomplete in that it failed to recognise that
| | | creating a group with events for different CPUs is semantically
| | | broken -- they cannot be co-scheduled.
| | |
| | | Furthermore, it leads to real breakage where, when we create an event
| | | for CPU Y and then migrate it to form a group on CPU X, the code gets
| | | confused where the counter is programmed -- triggered in practice
| | | as well by me via the perf fuzzer.
| | |
| | | Fix this by tightening the rules for creating groups. Only allow
| | | grouping of counters that can be co-scheduled in the same context.
| | | This means for the same task and/or the same cpu.
| | |
| | | Fixes: 9fc81d87420d ("perf: Fix events installation during moving group")
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
| | | Cc: Jiri Olsa <jolsa@redhat.com>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Link: http://lkml.kernel.org/r/20150123125834.090683288@infradead.org
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | timers: Use proper base migration in add_timer_on()  Tejun Heo  2017-02-10  1  -3/+16
| | | commit 22b886dd1018093920c4250dee2a9a3cb7cff7b8 upstream.
| | |
| | | Regardless of the previous CPU a timer was on, add_timer_on()
| | | currently simply sets timer->flags to the new CPU. As the caller must
| | | be seeing the timer as idle, this is locally fine, but the timer
| | | leaving the old base while unlocked can lead to race conditions as
| | | follows.
| | |
| | | Let's say timer was on cpu 0.
| | |
| | |   cpu 0                                 cpu 1
| | |   -----------------------------------------------------------------------------
| | |   del_timer(timer) succeeds
| | |                                         del_timer(timer)
| | |                                           lock_timer_base(timer) locks cpu_0_base
| | |   add_timer_on(timer, 1)
| | |     spin_lock(&cpu_1_base->lock)
| | |     timer->flags set to cpu_1_base
| | |     operates on @timer
| | |                                           operates on @timer
| | |
| | | This triggered with mod_delayed_work_on() which contains
| | | "if (del_timer()) add_timer_on()" sequence eventually leading to the
| | | following oops.
| | |
| | |   BUG: unable to handle kernel NULL pointer dereference at (null)
| | |   IP: [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
| | |   ...
| | |   Workqueue: wqthrash wqthrash_workfunc [wqthrash]
| | |   task: ffff8800172ca680 ti: ffff8800172d0000 task.ti: ffff8800172d0000
| | |   RIP: 0010:[<ffffffff810ca6e9>] [<ffffffff810ca6e9>] detach_if_pending+0x69/0x1a0
| | |   ...
| | |   Call Trace:
| | |    [<ffffffff810cb0b4>] del_timer+0x44/0x60
| | |    [<ffffffff8106e836>] try_to_grab_pending+0xb6/0x160
| | |    [<ffffffff8106e913>] mod_delayed_work_on+0x33/0x80
| | |    [<ffffffffa0000081>] wqthrash_workfunc+0x61/0x90 [wqthrash]
| | |    [<ffffffff8106dba8>] process_one_work+0x1e8/0x650
| | |    [<ffffffff8106e05e>] worker_thread+0x4e/0x450
| | |    [<ffffffff810746af>] kthread+0xef/0x110
| | |    [<ffffffff8185980f>] ret_from_fork+0x3f/0x70
| | |
| | | Fix it by updating add_timer_on() to perform proper migration as
| | | __mod_timer() does.
| | |
| | | Mike: apply tglx backport
| | |
| | | Reported-and-tested-by: Jeff Layton <jlayton@poochiereds.net>
| | | Signed-off-by: Tejun Heo <tj@kernel.org>
| | | Cc: Chris Worley <chris.worley@primarydata.com>
| | | Cc: bfields@fieldses.org
| | | Cc: Michael Skralivetsky <michael.skralivetsky@primarydata.com>
| | | Cc: Trond Myklebust <trond.myklebust@primarydata.com>
| | | Cc: Shaohua Li <shli@fb.com>
| | | Cc: Jeff Layton <jlayton@poochiereds.net>
| | | Cc: kernel-team@fb.com
| | | Cc: stable@vger.kernel.org
| | | Link: http://lkml.kernel.org/r/20151029103113.2f893924@tlielax.poochiereds.net
| | | Link: http://lkml.kernel.org/r/20151104171533.GI5749@mtj.duckdns.org
| | | Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Mike Galbraith <mgalbraith@suse.de>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | fix memory leaks in tracing_buffers_splice_read()  Al Viro  2017-02-06  1  -8/+9
| | | commit 1ae2293dd6d2f5c823cf97e60b70d03631cd622f upstream.
| | |
| | | Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
| * | sched/core: Fix an SMP ordering race in try_to_wake_up() vs. schedule()  Peter Zijlstra  2017-02-06  1  -0/+19
| | | commit ecf7d01c229d11a44609c0067889372c91fb4f36 upstream.
| | |
| | | Oleg noticed that it's possible to falsely observe p->on_cpu == 0 such
| | | that we'll prematurely continue with the wakeup and effectively run p on
| | | two CPUs at the same time.
| | |
| | | Even though the overlap is very limited; the task is in the middle of
| | | being scheduled out; it could still result in corruption of the
| | | scheduler data structures.
| | |
| | |   CPU0                                CPU1
| | |
| | |   set_current_state(...)
| | |
| | |     <preempt_schedule>
| | |       context_switch(X, Y)
| | |         prepare_lock_switch(Y)
| | |           Y->on_cpu = 1;
| | |         finish_lock_switch(X)
| | |           store_release(X->on_cpu, 0);
| | |
| | |                                       try_to_wake_up(X)
| | |                                         LOCK(p->pi_lock);
| | |
| | |                                         t = X->on_cpu; // 0
| | |
| | |       context_switch(Y, X)
| | |         prepare_lock_switch(X)
| | |           X->on_cpu = 1;
| | |         finish_lock_switch(Y)
| | |           store_release(Y->on_cpu, 0);
| | |     </preempt_schedule>
| | |
| | |   schedule();
| | |     deactivate_task(X);
| | |     X->on_rq = 0;
| | |
| | |                                         if (X->on_rq) // false
| | |
| | |                                         if (t) while (X->on_cpu)
| | |                                           cpu_relax();
| | |
| | |     context_switch(X, ..)
| | |       finish_lock_switch(X)
| | |         store_release(X->on_cpu, 0);
| | |
| | | Avoid the load of X->on_cpu being hoisted over the X->on_rq load.
| | |
| | | Reported-by: Oleg Nesterov <oleg@redhat.com>
| | | Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
| | | Cc: Linus Torvalds <torvalds@linux-foundation.org>
| | | Cc: Mike Galbraith <efault@gmx.de>
| | | Cc: Peter Zijlstra <peterz@infradead.org>
| | | Cc: Thomas Gleixner <tglx@linutronix.de>
| | | Signed-off-by: Ingo Molnar <mingo@kernel.org>
| | | Signed-off-by: Willy Tarreau <w@1wt.eu>
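The ordering requirement in the last sentence of this message (the load of on_cpu must not be hoisted above the load of on_rq) can be sketched with C11 atomics. This is a simplified illustration, not the scheduler code: `toy_task` and `toy_wakeup_may_proceed()` are hypothetical, and the acquire fence is used here in the role a kernel read barrier would play between the two loads.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reduction of a task to the two flags the race involves. */
struct toy_task {
    atomic_int on_rq;  /* nonzero while the task is on a run queue */
    atomic_int on_cpu; /* nonzero while the task is still running on a CPU */
};

/* Waker-side check: returns true only when the task is neither queued
 * nor still running, so it would be safe to place it on a CPU. */
bool toy_wakeup_may_proceed(struct toy_task *p)
{
    assert(p);

    if (atomic_load_explicit(&p->on_rq, memory_order_relaxed))
        return false; /* still queued: handled by a different wakeup path */

    /* Keep the on_cpu load below from being reordered before the on_rq
     * load above; without this ordering a stale on_cpu == 0 could be seen. */
    atomic_thread_fence(memory_order_acquire);

    return atomic_load_explicit(&p->on_cpu, memory_order_relaxed) == 0;
}
```

A real waker would spin until on_cpu drops to zero instead of returning false; the sketch only shows where the ordering between the two loads is enforced.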