From 4732efbeb997189d9f9b04708dc26bf8613ed721 Mon Sep 17 00:00:00 2001 From: Jakub Jelinek Date: Tue, 6 Sep 2005 15:16:25 -0700 Subject: [PATCH] FUTEX_WAKE_OP: pthread_cond_signal() speedup ATM pthread_cond_signal is unnecessarily slow, because it wakes one waiter (which at least on UP usually means an immediate context switch to one of the waiter threads). This waiter wakes up and after a few instructions it attempts to acquire the cv internal lock, but that lock is still held by the thread calling pthread_cond_signal. So it goes to sleep and eventually the signalling thread is scheduled in, unlocks the internal lock and wakes the waiter again. Now, before 2003-09-21 NPTL was using FUTEX_REQUEUE in pthread_cond_signal to avoid this performance issue, but it was removed when locks were redesigned to the 3 state scheme (unlocked, locked uncontended, locked contended). Following scenario shows why simply using FUTEX_REQUEUE in pthread_cond_signal together with using lll_mutex_unlock_force in place of lll_mutex_unlock is not enough and probably why it has been disabled at that time: The number is value in cv->__data.__lock. thr1 thr2 thr3 0 pthread_cond_wait 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) 0 lll_futex_wait (&cv->__data.__futex, futexval) 0 pthread_cond_signal 1 lll_mutex_lock (cv->__data.__lock) 1 pthread_cond_signal 2 lll_mutex_lock (cv->__data.__lock) 2 lll_futex_wait (&cv->__data.__lock, 2) 2 lll_futex_requeue (&cv->__data.__futex, 0, 1, &cv->__data.__lock) # FUTEX_REQUEUE, not FUTEX_CMP_REQUEUE 2 lll_mutex_unlock_force (cv->__data.__lock) 0 cv->__data.__lock = 0 0 lll_futex_wake (&cv->__data.__lock, 1) 1 lll_mutex_lock (cv->__data.__lock) 0 lll_mutex_unlock (cv->__data.__lock) # Here, lll_mutex_unlock doesn't know there are threads waiting # on the internal cv's lock Now, I believe it is possible to use FUTEX_REQUEUE in pthread_cond_signal, but it will cost us not one, but 2 extra syscalls and, what's worse, one of these extra syscalls will be done for every single waiting loop in pthread_cond_*wait. We would need to use lll_mutex_unlock_force in pthread_cond_signal after requeue and lll_mutex_cond_lock in pthread_cond_*wait after lll_futex_wait. Another alternative is to do the unlocking pthread_cond_signal needs to do (the lock can't be unlocked before lll_futex_wake, as that is racy) in the kernel. I have implemented both variants, futex-requeue-glibc.patch is the first one and futex-wake_op{,-glibc}.patch is the unlocking inside of the kernel. The kernel interface allows userland to specify how exactly an unlocking operation should look like (some atomic arithmetic operation with optional constant argument and comparison of the previous futex value with another constant). It has been implemented just for ppc*, x86_64 and i?86, for other architectures I'm including just a stub header which can be used as a starting point by maintainers to write support for their arches and ATM will just return -ENOSYS for FUTEX_WAKE_OP. The requeue patch has been (lightly) tested just on x86_64, the wake_op patch on ppc64 kernel running 32-bit and 64-bit NPTL and x86_64 kernel running 32-bit and 64-bit NPTL. With the following benchmark on UP x86-64 I get: for i in nptl-orig nptl-requeue nptl-wake_op; do echo time elf/ld.so --library-path .:$i /tmp/bench; \ for j in 1 2; do echo ( time elf/ld.so --library-path .:$i /tmp/bench ) 2>&1; done; done time elf/ld.so --library-path .:nptl-orig /tmp/bench real 0m0.655s user 0m0.253s sys 0m0.403s real 0m0.657s user 0m0.269s sys 0m0.388s time elf/ld.so --library-path .:nptl-requeue /tmp/bench real 0m0.496s user 0m0.225s sys 0m0.271s real 0m0.531s user 0m0.242s sys 0m0.288s time elf/ld.so --library-path .:nptl-wake_op /tmp/bench real 0m0.380s user 0m0.176s sys 0m0.204s real 0m0.382s user 0m0.175s sys 0m0.207s The benchmark is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00001.txt Older futex-requeue-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00002.txt Older futex-wake_op-glibc.patch version is at: http://sourceware.org/ml/libc-alpha/2005-03/txt00003.txt Will post a new version (just x86-64 fixes so that the patch applies against pthread_cond_signal.S) to libc-hacker ml soon. Attached is the kernel FUTEX_WAKE_OP patch as well as a simple-minded testcase that will not test the atomicity of the operation, but at least check if the threads that should have been woken up are woken up and whether the arithmetic operation in the kernel gave the expected results. Acked-by: Ingo Molnar Cc: Ulrich Drepper Cc: Jamie Lokier Cc: Rusty Russell Signed-off-by: Yoichi Yuasa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/futex.h | 53 ++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 53 insertions(+) create mode 100644 include/asm-parisc/futex.h (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/futex.h b/include/asm-parisc/futex.h new file mode 100644 index 000000000000..2cac5ecd9d00 --- /dev/null +++ b/include/asm-parisc/futex.h @@ -0,0 +1,53 @@ +#ifndef _ASM_FUTEX_H +#define _ASM_FUTEX_H + +#ifdef __KERNEL__ + +#include +#include +#include + +static inline int +futex_atomic_op_inuser (int encoded_op, int __user *uaddr) +{ + int op = (encoded_op >> 28) & 7; + int cmp = (encoded_op >> 24) & 15; + int oparg = (encoded_op << 8) >> 20; + int cmparg = (encoded_op << 20) >> 20; + int oldval = 0, ret, tem; + if (encoded_op & (FUTEX_OP_OPARG_SHIFT << 28)) + oparg = 1 << oparg; + + if (! access_ok (VERIFY_WRITE, uaddr, sizeof(int))) + return -EFAULT; + + inc_preempt_count(); + + switch (op) { + case FUTEX_OP_SET: + case FUTEX_OP_ADD: + case FUTEX_OP_OR: + case FUTEX_OP_ANDN: + case FUTEX_OP_XOR: + default: + ret = -ENOSYS; + } + + dec_preempt_count(); + + if (!ret) { + switch (cmp) { + case FUTEX_OP_CMP_EQ: ret = (oldval == cmparg); break; + case FUTEX_OP_CMP_NE: ret = (oldval != cmparg); break; + case FUTEX_OP_CMP_LT: ret = (oldval < cmparg); break; + case FUTEX_OP_CMP_GE: ret = (oldval >= cmparg); break; + case FUTEX_OP_CMP_LE: ret = (oldval <= cmparg); break; + case FUTEX_OP_CMP_GT: ret = (oldval > cmparg); break; + default: ret = -ENOSYS; + } + } + return ret; +} + +#endif +#endif -- cgit v1.2.2 From 202e5979af4d91c7ca05892641131dee22653259 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:16:40 -0700 Subject: [PATCH] compat: be more consistent about [ug]id_t When I first wrote the compat layer patches, I was somewhat cavalier about the definition of compat_uid_t and compat_gid_t (or maybe I just misunderstood :-)). This patch makes the compat types much more consistent with the types we are being compatible with and hopefully will fix a few bugs along the way. compat type type in compat arch __compat_[ug]id_t __kernel_[ug]id_t __compat_[ug]id32_t __kernel_[ug]id32_t compat_[ug]id_t [ug]id_t The difference is that compat_uid_t is always 32 bits (for the archs we care about) but __compat_uid_t may be 16 bits on some. Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/compat.h | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/compat.h b/include/asm-parisc/compat.h index 7630d1ad2391..38b918feead9 100644 --- a/include/asm-parisc/compat.h +++ b/include/asm-parisc/compat.h @@ -13,8 +13,10 @@ typedef s32 compat_ssize_t; typedef s32 compat_time_t; typedef s32 compat_clock_t; typedef s32 compat_pid_t; -typedef u32 compat_uid_t; -typedef u32 compat_gid_t; +typedef u32 __compat_uid_t; +typedef u32 __compat_gid_t; +typedef u32 __compat_uid32_t; +typedef u32 __compat_gid32_t; typedef u16 compat_mode_t; typedef u32 compat_ino_t; typedef u32 compat_dev_t; @@ -67,8 +69,8 @@ struct compat_stat { compat_dev_t st_realdev; u16 st_basemode; u16 st_spareshort; - compat_uid_t st_uid; - compat_gid_t st_gid; + __compat_uid32_t st_uid; + __compat_gid32_t st_gid; u32 st_spare4[3]; }; -- cgit v1.2.2 From 36d57ac4a818cb4aa3edbdf63ad2ebc31106f925 Mon Sep 17 00:00:00 2001 From: "H. J. Lu" Date: Tue, 6 Sep 2005 15:16:49 -0700 Subject: [PATCH] auxiliary vector cleanups The size of auxiliary vector is fixed at 42 in linux/sched.h. But it isn't very obvious when looking at linux/elf.h. This patch adds AT_VECTOR_SIZE so that we can change it if necessary when a new vector is added. Because of include file ordering problems, doing this necessitated the extraction of the AT_* symbols into a standalone header file. Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/auxvec.h | 4 ++++ 1 file changed, 4 insertions(+) create mode 100644 include/asm-parisc/auxvec.h (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/auxvec.h b/include/asm-parisc/auxvec.h new file mode 100644 index 000000000000..9c3ac4b89dc9 --- /dev/null +++ b/include/asm-parisc/auxvec.h @@ -0,0 +1,4 @@ +#ifndef __ASMPARISC_AUXVEC_H +#define __ASMPARISC_AUXVEC_H + +#endif -- cgit v1.2.2 From f26fdd59929e1144c6caf72adcaf4561d6e682a4 Mon Sep 17 00:00:00 2001 From: Karsten Wiese Date: Tue, 6 Sep 2005 15:17:25 -0700 Subject: [PATCH] CHECK_IRQ_PER_CPU() to avoid dead code in __do_IRQ() IRQ_PER_CPU is not used by all architectures. This patch introduces the macros ARCH_HAS_IRQ_PER_CPU and CHECK_IRQ_PER_CPU() to avoid the generation of dead code in __do_IRQ(). ARCH_HAS_IRQ_PER_CPU is defined by architectures using IRQ_PER_CPU in their include/asm_ARCH/irq.h file. Through grepping the tree I found the following architectures currently use IRQ_PER_CPU: cris, ia64, ppc, ppc64 and parisc. Signed-off-by: Karsten Wiese Acked-by: Ingo Molnar Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/irq.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/irq.h b/include/asm-parisc/irq.h index 75654ba93353..f876bdf22056 100644 --- a/include/asm-parisc/irq.h +++ b/include/asm-parisc/irq.h @@ -26,6 +26,11 @@ #define NR_IRQS (CPU_IRQ_MAX + 1) +/* + * IRQ line status macro IRQ_PER_CPU is used + */ +#define ARCH_HAS_IRQ_PER_CPU + static __inline__ int irq_canonicalize(int irq) { return (irq == 2) ? 9 : irq; -- cgit v1.2.2 From c8d127418d78aaeeb1a417ef7453dc09c9118146 Mon Sep 17 00:00:00 2001 From: Christoph Hellwig Date: Tue, 6 Sep 2005 15:17:27 -0700 Subject: [PATCH] remove asm-*/hdreg.h unused and useless.. Signed-off-by: Christoph Hellwig Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/hdreg.h | 1 - 1 file changed, 1 deletion(-) delete mode 100644 include/asm-parisc/hdreg.h (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/hdreg.h b/include/asm-parisc/hdreg.h deleted file mode 100644 index 7f7fd1af0af3..000000000000 --- a/include/asm-parisc/hdreg.h +++ /dev/null @@ -1 +0,0 @@ -#include -- cgit v1.2.2 From 97de50c0add1e8f3b4e764c66a13c07235fee631 Mon Sep 17 00:00:00 2001 From: Jesper Juhl Date: Tue, 6 Sep 2005 15:17:49 -0700 Subject: [PATCH] remove verify_area(): remove verify_area() from various uaccess.h headers Remove the deprecated (and unused) verify_area() from various uaccess.h headers. Signed-off-by: Jesper Juhl Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/uaccess.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/uaccess.h b/include/asm-parisc/uaccess.h index c1b5bdea53ee..f6c417c8c484 100644 --- a/include/asm-parisc/uaccess.h +++ b/include/asm-parisc/uaccess.h @@ -40,10 +40,6 @@ static inline long access_ok(int type, const void __user * addr, return 1; } -#define verify_area(type,addr,size) (0) /* FIXME: all users should go away soon, - * and use access_ok instead, then this - * should be removed. */ - #define put_user __put_user #define get_user __get_user -- cgit v1.2.2 From 9317259ead88fe6c05120ae1e3ace99738e2c698 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:17:57 -0700 Subject: [PATCH] Create asm-generic/fcntl.h This set of patches creates asm-generic/fcntl.h and consolidates as much as possible from the asm-*/fcntl.h files into it. This patch just gathers all the identical bits of the asm-*/fcntl.h files into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell Signed-off-by: Yoichi Yuasa Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/fcntl.h | 26 +------------------------- 1 file changed, 1 insertion(+), 25 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h index def35230716a..0a4ee6398735 100644 --- a/include/asm-parisc/fcntl.h +++ b/include/asm-parisc/fcntl.h @@ -3,10 +3,6 @@ /* open/fcntl - O_SYNC is only implemented on blocks devices and on files located on an ext2 file system */ -#define O_ACCMODE 00000003 -#define O_RDONLY 00000000 -#define O_WRONLY 00000001 -#define O_RDWR 00000002 #define O_APPEND 00000010 #define O_BLKSEEK 00000100 /* HPUX only */ #define O_CREAT 00000400 /* not fcntl */ @@ -27,11 +23,6 @@ #define O_NOFOLLOW 00000200 /* don't follow links */ #define O_INVISIBLE 04000000 /* invisible I/O, for DMAPI/XDSM */ -#define F_DUPFD 0 /* dup */ -#define F_GETFD 1 /* get f_flags */ -#define F_SETFD 2 /* set f_flags */ -#define F_GETFL 3 /* more flags (cloexec) */ -#define F_SETFL 4 #define F_GETLK 5 #define F_SETLK 6 #define F_SETLKW 7 @@ -44,9 +35,6 @@ #define F_SETSIG 13 /* for sockets. */ #define F_GETSIG 14 /* for sockets. */ -/* for F_[GET|SET]FL */ -#define FD_CLOEXEC 1 /* actually anything with low bit set goes */ - /* for posix fcntl() and lockf() */ #define F_RDLCK 01 #define F_WRLCK 02 @@ -59,18 +47,6 @@ /* for leases */ #define F_INPROGRESS 16 -/* operations for bsd flock(), also used by the kernel implementation */ -#define LOCK_SH 1 /* shared lock */ -#define LOCK_EX 2 /* exclusive lock */ -#define LOCK_NB 4 /* or'd with one of the above to prevent - blocking */ -#define LOCK_UN 8 /* remove lock */ - -#define LOCK_MAND 32 /* This is a mandatory flock */ -#define LOCK_READ 64 /* ... Which allows concurrent read operations */ -#define LOCK_WRITE 128 /* ... Which allows concurrent write operations */ -#define LOCK_RW 192 /* ... Which allows concurrent read & write ops */ - struct flock { short l_type; short l_whence; @@ -87,6 +63,6 @@ struct flock64 { pid_t l_pid; }; -#define F_LINUX_SPECIFIC_BASE 1024 +#include #endif -- cgit v1.2.2 From e64ca97fd80a129e538ca42d0b12c379746b83db Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:17:58 -0700 Subject: [PATCH] Clean up the open flags This patch puts the most popular of each open flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/fcntl.h | 4 ---- 1 file changed, 4 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h index 0a4ee6398735..64664ba33440 100644 --- a/include/asm-parisc/fcntl.h +++ b/include/asm-parisc/fcntl.h @@ -6,19 +6,15 @@ #define O_APPEND 00000010 #define O_BLKSEEK 00000100 /* HPUX only */ #define O_CREAT 00000400 /* not fcntl */ -#define O_TRUNC 00001000 /* not fcntl */ #define O_EXCL 00002000 /* not fcntl */ #define O_LARGEFILE 00004000 #define O_SYNC 00100000 #define O_NONBLOCK 00200004 /* HPUX has separate NDELAY & NONBLOCK */ -#define O_NDELAY O_NONBLOCK #define O_NOCTTY 00400000 /* not fcntl */ #define O_DSYNC 01000000 /* HPUX only */ #define O_RSYNC 02000000 /* HPUX only */ #define O_NOATIME 04000000 -#define FASYNC 00020000 /* fcntl, for BSD compatibility */ -#define O_DIRECT 00040000 /* direct disk access hint - currently ignored */ #define O_DIRECTORY 00010000 /* must be a directory */ #define O_NOFOLLOW 00000200 /* don't follow links */ #define O_INVISIBLE 04000000 /* invisible I/O, for DMAPI/XDSM */ -- cgit v1.2.2 From 1abf62afb6e9cdc1b2618b69067a186b94281587 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:17:59 -0700 Subject: [PATCH] Clean up the fcntl operations This patch puts the most popular of each fcntl operation/flag into asm-generic/fcntl.h and cleans up the arch files. Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/fcntl.h | 10 ---------- 1 file changed, 10 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h index 64664ba33440..59a54fff7c63 100644 --- a/include/asm-parisc/fcntl.h +++ b/include/asm-parisc/fcntl.h @@ -19,9 +19,6 @@ #define O_NOFOLLOW 00000200 /* don't follow links */ #define O_INVISIBLE 04000000 /* invisible I/O, for DMAPI/XDSM */ -#define F_GETLK 5 -#define F_SETLK 6 -#define F_SETLKW 7 #define F_GETLK64 8 #define F_SETLK64 9 #define F_SETLKW64 10 @@ -36,13 +33,6 @@ #define F_WRLCK 02 #define F_UNLCK 03 -/* for old implementation of bsd flock () */ -#define F_EXLCK 4 /* or 3 */ -#define F_SHLCK 8 /* or 4 */ - -/* for leases */ -#define F_INPROGRESS 16 - struct flock { short l_type; short l_whence; -- cgit v1.2.2 From 5ac353f9baf7169298ebb7de86b2d697b25bca44 Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:18:00 -0700 Subject: [PATCH] Clean up struct flock definitions This patch just gathers together all the struct flock definitions except xtensa into asm-generic/fcntl.h. Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/fcntl.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h index 59a54fff7c63..eadda003c201 100644 --- a/include/asm-parisc/fcntl.h +++ b/include/asm-parisc/fcntl.h @@ -33,14 +33,6 @@ #define F_WRLCK 02 #define F_UNLCK 03 -struct flock { - short l_type; - short l_whence; - off_t l_start; - off_t l_len; - pid_t l_pid; -}; - struct flock64 { short l_type; short l_whence; -- cgit v1.2.2 From 8d286aa5eaf951bf53d4a0f64576d4b377c435ba Mon Sep 17 00:00:00 2001 From: Stephen Rothwell Date: Tue, 6 Sep 2005 15:18:01 -0700 Subject: [PATCH] Clean up struct flock64 definitions This patch gathers all the struct flock64 definitions (and the operations), puts them under !CONFIG_64BIT and cleans up the arch files. Signed-off-by: Stephen Rothwell Signed-off-by: Andrew Morton Signed-off-by: Linus Torvalds --- include/asm-parisc/fcntl.h | 8 -------- 1 file changed, 8 deletions(-) (limited to 'include/asm-parisc') diff --git a/include/asm-parisc/fcntl.h b/include/asm-parisc/fcntl.h index eadda003c201..317851fa78f3 100644 --- a/include/asm-parisc/fcntl.h +++ b/include/asm-parisc/fcntl.h @@ -33,14 +33,6 @@ #define F_WRLCK 02 #define F_UNLCK 03 -struct flock64 { - short l_type; - short l_whence; - loff_t l_start; - loff_t l_len; - pid_t l_pid; -}; - #include #endif -- cgit v1.2.2