litmus-rt.git - The LITMUS^RT kernel.

	Commit message (Collapse)	Author	Age
*	mm: invoke oom-killer from page fault	Nick Piggin	2009-01-06
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Rather than have the pagefault handler kill a process directly if it gets a VM_FAULT_OOM, have it call into the OOM killer. With increasingly sophisticated oom behaviour (cpusets, memory cgroups, oom killing throttling, oom priority adjustment or selective disabling, panic on oom, etc), it's silly to unconditionally kill the faulting process at page fault time. Create a hook for pagefault oom path to call into instead. Only converted x86 and uml so far. [akpm@linux-foundation.org: make __out_of_memory() static] [akpm@linux-foundation.org: fix comment] Signed-off-by: Nick Piggin <npiggin@suse.de> Cc: Jeff Dike <jdike@addtoit.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'cpus4096-for-linus-2' of ↵	Linus Torvalds	2009-01-02
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'cpus4096-for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (66 commits) x86: export vector_used_by_percpu_irq x86: use logical apicid in x2apic_cluster's x2apic_cpu_mask_to_apicid_and() sched: nominate preferred wakeup cpu, fix x86: fix lguest used_vectors breakage, -v2 x86: fix warning in arch/x86/kernel/io_apic.c sched: fix warning in kernel/sched.c sched: move test_sd_parent() to an SMP section of sched.h sched: add SD_BALANCE_NEWIDLE at MC and CPU level for sched_mc>0 sched: activate active load balancing in new idle cpus sched: bias task wakeups to preferred semi-idle packages sched: nominate preferred wakeup cpu sched: favour lower logical cpu number for sched_mc balance sched: framework for sched_mc/smt_power_savings=N sched: convert BALANCE_FOR_xx_POWER to inline functions x86: use possible_cpus=NUM to extend the possible cpus allowed x86: fix cpu_mask_to_apicid_and to include cpu_online_mask x86: update io_apic.c to the new cpumask code x86: Introduce topology_core_cpumask()/topology_thread_cpumask() x86: xen: use smp_call_function_many() x86: use work_on_cpu in x86/kernel/cpu/mcheck/mce_amd_64.c ... Fixed up trivial conflict in kernel/time/tick-sched.c manually
\| *	cpumask: convert struct clock_event_device to cpumask pointers.	Rusty Russell	2008-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: change calling convention of existing clock_event APIs struct clock_event_timer's cpumask field gets changed to take pointer, as does the ->broadcast function. Another single-patch change. For safety, we BUG_ON() in clockevents_register_device() if it's not set. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Cc: Ingo Molnar <mingo@elte.hu>
\| *	cpumask: centralize cpu_online_map and cpu_possible_map	Rusty Russell	2008-12-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Impact: cleanup Each SMP arch defines these themselves. Move them to a central location. Twists: 1) Some archs (m32, parisc, s390) set possible_map to all 1, so we add a CONFIG_INIT_ALL_POSSIBLE for this rather than break them. 2) mips and sparc32 '#define cpu_possible_map phys_cpu_present_map'. Those archs simply have phys_cpu_present_map replaced everywhere. 3) Alpha defined cpu_possible_map to cpu_present_map; this is tricky so I just manipulate them both in sync. 4) IA64, cris and m32r have gratuitous 'extern cpumask_t cpu_possible_map' declarations. Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> Reviewed-by: Grant Grundler <grundler@parisc-linux.org> Tested-by: Tony Luck <tony.luck@intel.com> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Mike Travis <travis@sgi.com> Cc: ink@jurassic.park.msu.ru Cc: rmk@arm.linux.org.uk Cc: starvik@axis.com Cc: tony.luck@intel.com Cc: takata@linux-m32r.org Cc: ralf@linux-mips.org Cc: grundler@parisc-linux.org Cc: paulus@samba.org Cc: schwidefsky@de.ibm.com Cc: lethal@linux-sh.org Cc: wli@holomorphy.com Cc: davem@davemloft.net Cc: jdike@addtoit.com Cc: mingo@redhat.com
* \|	take init_fs to saner place	Al Viro	2008-12-31
\|/ \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	x86, um: get rid of uml unistd.h	Al Viro	2008-10-23
\| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
*	x86, um: sanitize uml sigcontext.h uses	Al Viro	2008-10-23
\| \| \| \| \| \| \| \| \| \| \| \|	a) the only difference between sigcontext and sysdep/sigcontext is that the former contains externs for two long-dead functions. Removed, switched the only user to sysdep/sigcontext b) asm/sigcontext.h is removable - that of underlying architecture would get used. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
*	uml: remove the dead TTY_LOG code	Adrian Bunk	2008-10-16
\| \| \| \| \| \| \| \| \| \|	Remove the dead CONFIG_TTY_LOG (no kconfig option). Reported-by: Robert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: Adrian Bunk <bunk@kernel.org> Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	kernel/cpu.c: create a CPU_STARTING cpu_chain notifier	Manfred Spraul	2008-09-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Right now, there is no notifier that is called on a new cpu, before the new cpu begins processing interrupts/softirqs. Various kernel function would need that notification, e.g. kvm works around by calling smp_call_function_single(), rcu polls cpu_online_map. The patch adds a CPU_STARTING notification. It also adds a helper function that sends the message to all cpu_chain handlers. Tested on x86-64. All other archs are untested. Especially on sparc, I'm not sure if I got it right. Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	um: use generic show_mem()	Johannes Weiner	2008-07-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Remove arch-specific show_mem() in favor of the generic version. This also removes the following redundant information display: - free swap pages, printed by show_swap_cache_info() - pages in swapcache, printed by show_swap_cache_info() where show_mem() calls show_free_areas(), which calls show_swap_cache_info(). Signed-off-by: Johannes Weiner <hannes@saeurebad.de> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	Merge branch 'timers-fixes-for-linus' of ↵	Linus Torvalds	2008-07-24
\|\ \| \| \| \| \| \| \| \| \| \| \| \| \| \|	git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: nohz: adjust tick_nohz_stop_sched_tick() call of s390 as well nohz: prevent tick stop outside of the idle loop
\| *	Merge branch 'linus' into timers/nohz	Ingo Molnar	2008-07-18
\| \|\
\| * \|	nohz: prevent tick stop outside of the idle loop	Thomas Gleixner	2008-07-18
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Jack Ren and Eric Miao tracked down the following long standing problem in the NOHZ code: scheduler switch to idle task enable interrupts Window starts here ----> interrupt happens (does not set NEED_RESCHED) irq_exit() stops the tick ----> interrupt happens (does set NEED_RESCHED) return from schedule() cpu_idle(): preempt_disable(); Window ends here The interrupts can happen at any point inside the race window. The first interrupt stops the tick, the second one causes the scheduler to rerun and switch away from idle again and we end up with the tick disabled. The fact that it needs two interrupts where the first one does not set NEED_RESCHED and the second one does made the bug obscure and extremly hard to reproduce and analyse. Kudos to Jack and Eric. Solution: Limit the NOHZ functionality to the idle loop to make sure that we can not run into such a situation ever again. cpu_idle() { preempt_disable(); while(1) { tick_nohz_stop_sched_tick(1); <- tell NOHZ code that we are in the idle loop while (!need_resched()) halt(); tick_nohz_restart_sched_tick(); <- disables NOHZ mode preempt_enable_no_resched(); schedule(); preempt_disable(); } } In hindsight we should have done this forever, but ... /me grabs a large brown paperbag. Debugged-by: Jack Ren <jack.ren@marvell.com>, Debugged-by: eric miao <eric.y.miao@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* \| \|	UML: make several more things static	WANG Cong	2008-07-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Make some variables and functions static, since they don't need to be global. - Remove an unused function - arch/um/kernel/time.c::sched_clock(). - Clean the style a bit as complained by checkpatch.pl. Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	arch/um/kernel/mem.c: remove arch_validate()	WANG Cong	2008-07-24
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	- Remove arch_validate(), because no one uses it. - Remove useless macro HAVE_ARCH_VALIDATE. - Make the variable 'empty_bad_page' static. Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \| \|	arch/um/kernel/irq.c: clean up some functions	WANG Cong	2008-07-24
\| \|/ \|/\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Make activate_fd() and free_irq_by_irq_and_dev() static. Remove init_aio_irq() since it has no users. Cc: Jeff Dike <jdike@addtoit.com> Signed-off-by: WANG Cong <wangcong@zeuux.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* \|	smp_call_function: get rid of the unused nonatomic/retry argument	Jens Axboe	2008-06-26
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It's never used and the comments refer to nonatomic and retry interchangably. So get rid of it. Acked-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com> Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
* \|	uml: deal with inaccessible address space start	Tom Spink	2008-06-06
\|/ \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch makes os_get_task_size locate the bottom of the address space, as well as the top. This is for systems which put a lower limit on mmap addresses. It works by manually scanning pages from zero onwards until a valid page is found. Because the bottom of the address space may not be zero, it's not sufficient to assume the top of the address space is the size of the address space. The size is the difference between the top address and bottom address. [jdike@addtoit.com: changed the name to reflect that this function is supposed to return the top of the process address space, not its size and changed the return value to reflect that. Also some minor formatting changes] Signed-off-by: Tom Spink <tspink@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: add missing exports for UML_RANDOM=m	Al Viro	2008-05-21
\| \| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	[PATCH] take init_files to fs/file.c	Al Viro	2008-05-16
\| \| \| \|	Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
*	uml: physical memory shouldn't include initial stack	Jeff Dike	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \|	The top of physical memory should be below the initial process stack, not the top of the address space, at least for as long as the stack isn't known to the kernel VM system and appropriately reserved. Cc: "Christopher S. Aker" <caker@theshore.net> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: use PAGE_SIZE in linker scripts	Cyrill Gorcunov	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This patch includes page.h header into linker scripts that allow us to use PAGE_SIZE macro instead of numeric constant. To be able to include page.h into linker scripts page.h is needed for some modification - i.e. we need to use __ASSEMBLY__ and _AC macro [jdike@linux.intel.com - fixed conflict with as-layout.h] Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: fix bad NTP interaction with clock	Jeff Dike	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	UML's supposed nanosecond clock interacts badly with NTP when NTP decides that the clock has drifted ahead and needs to be slowed down. Slowing down the clock is done by decrementing the cycle-to-nanosecond multiplier, which is 1. Decrementing that gives you 0 and time is stopped. This is fixed by switching to a microsecond clock, with a multiplier of 1000. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: fix build when SLOB is enabled	Jeff Dike	2008-05-13
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Reintroduce uml_kmalloc for the benefit of UML libc code. The previous tactic of declaring __kmalloc so it could be called directly from the libc side of the house turned out to be getting too intimate with slab, and it doesn't work with slob. So, the uml_kmalloc wrapper is back. It calls kmalloc or whatever that translates into, and libc code calls it. kfree is left alone since that still works, leaving a somewhat inconsistent API. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: style fixes	Jeff Dike	2008-05-13
\| \| \| \| \| \| \| \| \|	A few random style fixes. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	unified (weak) sys_pipe implementation	Ulrich Drepper	2008-05-03
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	This replaces the duplicated arch-specific versions of "sys_pipe()" with one unified implementation. This removes almost 250 lines of duplicated code. It's marked __weak, so that if an architecture wants to override the default implementation it can do so by simply having its own replacement version, since many architectures use alternate calling conventions for the 'pipe()' system call for legacy reasons (ie traditional UNIX implementations often return the two file descriptors in registers) I still haven't changed the cris version even though Linus says the BKL isn't needed. The arch maintainer can easily do it if there are really no obstacles. Signed-off-by: Ulrich Drepper <drepper@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proc: remove proc_root from drivers	Alexey Dobriyan	2008-04-29
\| \| \| \| \| \| \| \| \| \| \| \|	Remove proc_root export. Creation and removal works well if parent PDE is supplied as NULL -- it worked always that way. So, one useless export removed and consistency added, some drivers created PDEs with &proc_root as parent but removed them as NULL and so on. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	proper extern for late_time_init	Adrian Bunk	2008-04-29
\| \| \| \| \| \| \| \| \| \| \|	Add a proper extern for late_time_init in include/linux/init.h Signed-off-by: Adrian Bunk <bunk@kernel.org> Acked-by: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	arch/um/kernel/um_arch.c: some small improvements	WANG Cong	2008-04-28
\| \| \| \| \| \| \| \| \|	Make some small improvements for arch/um/kernel/um_arch.c. Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Acked-by: Jeff Dike <jdike@addtoit.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	sched: add declaration of sched_tail to sched.h	Harvey Harrison	2008-02-25
\| \| \| \| \| \| \| \| \| \|	Avoids sparse warnings: kernel/sched.c:2170:17: warning: symbol 'schedule_tail' was not declared. Should it be static? Avoids the need for an external declaration in arch/um/process.c Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
*	arch/um/kernel/mem.c: fix a shadowed variable	WANG Cong	2008-02-23
\| \| \| \| \| \| \| \| \| \|	Fix a shadowed variable in arch/um/kernel/mem.c, since there is a global variable has the same name. Cc: Jeff Dike <jdike@linux.intel.com> Signed-off-by: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: fix initrd printk	Johann Felix Soden	2008-02-23
\| \| \| \| \| \| \| \| \| \| \|	If the initrd file has zero-length, the error message should contain the filepath. Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Johann Felix Soden <johfel@users.sourceforge.net> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: fix mm_context memory leak	Jeff Dike	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	[ Spotted by Miklos ] Fix a memory leak in init_new_context. The struct page ** buffer allocated for install_special_mapping was never recorded, and thus leaked when the mm_struct was freed. Fix it by saving the pointer in mm_context_t and freeing it in arch_exit_mmap. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: runtime host VMSPLIT detection	Jeff Dike	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Calculate TASK_SIZE at run-time by figuring out the host's VMSPLIT - this is needed on i386 if UML is to run on hosts with varying VMSPLITs without recompilation. TASK_SIZE is now defined in terms of a variable, task_size. This gets rid of an include of pgtable.h from processor.h, which can cause include loops. On i386, task_size is calculated early in boot by probing the address space in a binary search to figure out where the boundary between usable and non-usable memory is. This tries to make sure that a page that is considered to be in userspace is, or can be made, read-write. I'm concerned about a system-global VDSO page in kernel memory being hit and considered to be a userspace page. On x86_64, task_size is just the old value of CONFIG_TOP_ADDR. A bunch of config variable are gone now. CONFIG_TOP_ADDR is directly replaced by TASK_SIZE. NEST_LEVEL is gone since the relocation of the stubs makes it irrelevant. All the HOST_VMSPLIT stuff is gone. All references to these in arch/um/Makefile are also gone. I noticed and fixed a missing extern in os.h when adding os_get_task_size. Note: This has been revised to fix the 32-bit UML on 64-bit host bug that Miklos ran into. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: Miklos Szeredi <miklos@szeredi.hu> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	CONFIG_HIGHPTE vs. sub-page page tables.	Martin Schwidefsky	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Background: I've implemented 1K/2K page tables for s390. These sub-page page tables are required to properly support the s390 virtualization instruction with KVM. The SIE instruction requires that the page tables have 256 page table entries (pte) followed by 256 page status table entries (pgste). The pgstes are only required if the process is using the SIE instruction. The pgstes are updated by the hardware and by the hypervisor for a number of reasons, one of them is dirty and reference bit tracking. To avoid wasting memory the standard pte table allocation should return 1K/2K (31/64 bit) and 2K/4K if the process is using SIE. Problem: Page size on s390 is 4K, page table size is 1K or 2K. That means the s390 version for pte_alloc_one cannot return a pointer to a struct page. Trouble is that with the CONFIG_HIGHPTE feature on x86 pte_alloc_one cannot return a pointer to a pte either, since that would require more than 32 bit for the return value of pte_alloc_one (and the pte * would not be accessible since its not kmapped). Solution: The only solution I found to this dilemma is a new typedef: a pgtable_t. For s390 pgtable_t will be a (pte ) - to be introduced with a later patch. For everybody else it will be a (struct page ). The additional problem with the initialization of the ptl lock and the NR_PAGETABLE accounting is solved with a constructor pgtable_page_ctor and a destructor pgtable_page_dtor. The page table allocation and free functions need to call these two whenever a page table page is allocated or freed. pmd_populate will get a pgtable_t instead of a struct page pointer. To get the pgtable_t back from a pmd entry that has been installed with pmd_populate a new function pmd_pgtable is added. It replaces the pmd_page call in free_pte_range and apply_to_pte_range. Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	aout: remove unnecessary inclusions of {asm, linux}/a.out.h	David Howells	2008-02-08
\| \| \| \| \| \| \| \| \| \|	Remove now unnecessary inclusions of {asm,linux}/a.out.h. [akpm@linux-foundation.org: fix alpha build] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	aout: suppress A.OUT library support if !CONFIG_ARCH_SUPPORTS_AOUT	David Howells	2008-02-08
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Suppress A.OUT library support if CONFIG_ARCH_SUPPORTS_AOUT is not set. Not all architectures support the A.OUT binfmt, so the ELF binfmt should not be permitted to go looking for A.OUT libraries to load in such a case. Not only that, but under such conditions A.OUT core dumps are not produced either. To make this work, this patch also does the following: (1) Makes the existence of the contents of linux/a.out.h contingent on CONFIG_ARCH_SUPPORTS_AOUT. (2) Renames dump_thread() to aout_dump_thread() as it's only called by A.OUT core dumping code. (3) Moves aout_dump_thread() into asm/a.out-core.h and makes it inline. This is then included only where needed. This means that this bit of arch code will be stored in the appropriate A.OUT binfmt module rather than the core kernel. (4) Drops A.OUT support for Blackfin (according to Mike Frysinger it's not needed) and FRV. This patch depends on the previous patch to move STACK_TOP[_MAX] out of asm/a.out.h and into asm/processor.h as they're required whether or not A.OUT format is available. [jdike@addtoit.com: uml: re-remove accidentally restored code] Signed-off-by: David Howells <dhowells@redhat.com> Cc: <linux-arch@vger.kernel.org> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: redo the calculation of NR_syscalls	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Redo the calculation of NR_syscalls since that disappeared from i386 and use a similar mechanism on x86_64. We now figure out the size of the system call table in arch code and stick that in syscall_table_size. arch/um/kernel/skas/syscall.c defines NR_syscalls in terms of that since its the only thing that needs to know how many system calls there are. The old mechananism that was used on x86_64 is gone. arch/um/include/sysdep-i386/syscalls.h got some formatting since I was looking at it. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Cc: WANG Cong <xiyou.wangcong@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: remove map_cb	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \|	John Reiser noticed that a physical memory region was being mapped twice. This patch fixes that, and it inlines the responsible function, as that had only one caller. Cc: John Reiser <jreiser@BitWagon.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: SMP locking commentary	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \|	Add some more commentary about various pieces of global data not needing locking. Also got rid of unmap_physmem since that is no longer used. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: style fixes in arch/um/kernel	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Joe Perches noticed some printks in smp.c that needed fixing. While I was in there, I did the usual tidying in arch/um/kernel, which should be fairly style-clean at this point: copyright updates emacs formatting comments removal include tidying style fixes Cc: Joe Perches <joe@perches.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: get rid of syscall counters	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \|	Get rid of some syscall counters which haven't been useful in ages. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: don't kill pid 0	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	A bit of defensive programming - during development, it ocassionally happens that a call to init_new_context is missed, resulting in context holding a host pid of zero. When that address space is torn down, destroy_context does a kill(0), which instantly kills the whole UML without any errors whatsoever. This patch add a check for pids less than 2, to also catch 1 and negative pids. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: install panic notifier earlier	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	It turns out that if there's a panic early enough, UML will just sit there in the LED-blinking loop because the panic notifier hadn't been installed yet. This patch installs it earlier. It also fixes the problem which exposed the hang, namely that if you give UML a zero-sized initrd, it will ask alloc_bootmem for zero bytes, and that will cause the panic. While I was in initrd.c, I gave it a style makeover. Prompted by checkpatch, I moved a couple extern declarations of uml_exitcode to kern_util.h. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: eliminate setjmp_wrapper	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	setjmp_wrapper existed to provide setjmp to kernel code when UML used libc's setjmp and longjmp. Now that UML has its own implementation, this isn't needed and kernel code can invoke setjmp directly. do_buffer_op is massively cleaned up since it is no longer a callback from setjmp_wrapper and given a va_list from which it must extract its arguments. The actual setjmp is moved from buffer_op to do_op_one_page because the copy operation is inside an atomic section (kmap_atomic to kunmap_atomic) and it shouldn't be longjmp-ed out of. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: customize tlb.h	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Customize the hooks in tlb.h to optimize TLB flushing some more. Add start and end fields to tlb_gather_mmu, which are used to limit the address space range scanned when a region is unmapped. The interfaces which just free page tables, without actually changing mappings, don't need to cause a TLB flush. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: 64-bit tlb fixes	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \|	Some 64-bit tlb fixes - moved pmd_page_vaddr to pgtable.h since it's the same for both 2-level and 3-level page tables fixed a bogus cast on pud_page_vaddr made the address checking in update_*_range more careful Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: miscellaneous code cleanups	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Code tidying - the pid field of struct irq_fd isn't used, so it is removed os_set_fd_async needed to read flags before changing them, it doesn't need a pid passed in because it can call getpid itself, and a block of unused code needed deleting os_get_exec_close was unused, so it is removed ptrace_child called _exit for historical reasons which are no longer valid, so just calls exit instead Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: remove duplicate config symbol and unused file and variables	Karol Swietlicki	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \|	Fix the repetition of the NET symbol. It was once in UML specific options and once in networking. I removed the first occurrence, as it makes more sense to me to keep it only in networking. It also removes a mostly empty file which is not used anymore and some unused variables. Signed-off-by: Karol Swietlicki <magotari@gmail.com> Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
*	uml: cover stubs with a VMA	Jeff Dike	2008-02-05
\| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \| \|	Give the stubs a VMA. This allows the removal of a truly nasty kludge to make sure that mm->nr_ptes was correct in exit_mmap. The underlying problem was always that the stubs, which have ptes, and thus allocated a page table, weren't covered by a VMA. This patch fixes that by using install_special_mapping in arch_dup_mmap and activate_context to create the VMA. The stubs have to be moved, since shift_arg_pages seems to assume that the stack is the only VMA present at that point during exec, and uses vma_adjust to fiddle its VMA. However, that extends the stub VMA by the amount removed from the stack VMA. To avoid this problem, the stubs were moved to a different fixed location at the start of the address space. The init_stub_pte calls were moved from init_new_context to arch_dup_mmap because I was occasionally seeing arch_dup_mmap not being called, causing exit_mmap to die. Rather than figure out what was really happening, I decided it was cleaner to just move the calls so that there's no doubt that both the pte and VMA creation happen, no matter what. arch_exit_mmap is used to clear the stub ptes at exit time. The STUB_* constants in as-layout.h no longer depend on UM_TASK_SIZE, that that definition is removed, along with the comments complaining about gcc. Because the stubs are no longer at the top of the address space, some care is needed while flushing TLBs. update_pte_range checks for addresses in the stub range and skips them. flush_thread now issues two unmaps, one for the range before STUB_START and one for the range after STUB_END. Signed-off-by: Jeff Dike <jdike@linux.intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>