author Ingo Molnar <mingo@elte.hu> 2005-08-01 07:39:13 -0400
committer Linus Torvalds <torvalds@g5.osdl.org> 2005-08-01 13:03:56 -0400
commit 6cb54819d7b1867053e2dfd8c0ca3a8dc65a7eff
tree 1a1422dc2e103fe92dd86bfa26b8b39b3f2413d5
parent 5d546f54324e04747e82ccbb4ea85f54bdcacd6d
[PATCH] remove sys_set_zone_reclaim()
This removes sys_set_zone_reclaim() for now. While I'm sure Martin is trying to solve a real problem, we must not hard-code an incomplete and insufficient approach into a syscall, because syscalls are pretty much for eternity. I am quite strongly convinced that this syscall must not hit v2.6.13 in its current form.

Firstly, the syscall lacks basic syscall design: e.g. it allows the global setting of VM policy for unprivileged users. (!)

[ Imagine an Oracle installation and a SAP installation on the same NUMA box fighting over the 'optimal' setting for this flag. What will they do? Will they try to set the flag to their own preferred value every second or so? ]

Secondly, it was added based on a single datapoint from Martin:

  http://marc.theaimsgroup.com/?l=linux-mm&m=111763597218177&w=2

where Martin characterizes the numbers the following way:

  ' Run-to-run variability for "make -j" is huge, so these numbers aren't terribly useful except to see that with reclaim the benchmark still finishes in a reasonable amount of time. '

In other words: the fundamental problem has likely not been solved. Only a tendential move in the right direction has been observed, and a handful of numbers were picked out of a set of hugely variable results, without showing the variability data. How much variance is there run-to-run?

I'd really suggest first walking the walk and seeing what's needed to get stable & predictable kernel compilation numbers on that NUMA box, before adding random syscalls to tune a particular aspect of the VM ... an approach which might not even matter once the whole picture has been analyzed and understood!

The third, most important point is that the syscall exposes VM tuning internals in a completely unstructured way. What sense does it make to have a _GLOBAL_ per-node setting for 'should we go to another node for reclaim'? If anything, it might make sense to do this per-app, via numalib or so.
The change is minimalistic in that it doesn't remove the syscall and the underlying infrastructure changes, only the user-visible changes. We could perhaps add a CAP_SYS_ADMIN-only sysctl for this hack, along the lines of /proc/sys/vm/swappiness, but even that looks quite counterproductive when the generic approach is that we are trying to reduce the number of external factors in the VM balance picture.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
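The second objection asks how much run-to-run variance there is, and criticizes quoting a few picked numbers without the variability data. A minimal sketch of such a summary, assuming the per-run timings (for example seconds per "make -j" run) have already been collected; the struct and function names are hypothetical and do not come from the kernel or from Martin's tests:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical summary of repeated timings of one benchmark. */
struct run_summary {
    double mean;
    double variance;   /* sample variance, n - 1 denominator */
    double min;
    double max;
    double rel_range;  /* (max - min) / mean: spread relative to the mean */
};

/* Two-pass computation: first the mean, then the squared deviations
 * from it. Needs at least two runs; assumes positive timings so the
 * mean is non-zero. */
struct run_summary summarize_runs(const double *t, size_t n)
{
    assert(t != NULL && n >= 2);

    struct run_summary s = { 0.0, 0.0, t[0], t[0], 0.0 };
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        sum += t[i];
        if (t[i] < s.min)
            s.min = t[i];
        if (t[i] > s.max)
            s.max = t[i];
    }
    s.mean = sum / (double)n;

    double sq = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = t[i] - s.mean;
        sq += d * d;
    }
    s.variance = sq / (double)(n - 1);
    s.rel_range = (s.max - s.min) / s.mean;
    return s;
}
```

Reporting the number of runs, the mean and the spread next to any headline figure makes it visible whether a difference between two kernels is larger than the run-to-run noise.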
-rw-r--r-- arch/i386/kernel/syscall_table.S 2
-rw-r--r-- arch/ia64/kernel/entry.S         2
-rw-r--r-- kernel/sys_ni.c                  1
3 files changed, 2 insertions, 3 deletions
diff --git a/arch/i386/kernel/syscall_table.S b/arch/i386/kernel/syscall_table.S
index 468500a7e894..9b21a31d4f4e 100644
--- a/arch/i386/kernel/syscall_table.S
+++ b/arch/i386/kernel/syscall_table.S
@@ -251,7 +251,7 @@ ENTRY(sys_call_table)
 	.long sys_io_submit
 	.long sys_io_cancel
 	.long sys_fadvise64		/* 250 */
-	.long sys_set_zone_reclaim
+	.long sys_ni_syscall
 	.long sys_exit_group
 	.long sys_lookup_dcookie
 	.long sys_epoll_create
diff --git a/arch/ia64/kernel/entry.S b/arch/ia64/kernel/entry.S
index 66946f3fdac7..9be53e1ea404 100644
--- a/arch/ia64/kernel/entry.S
+++ b/arch/ia64/kernel/entry.S
@@ -1573,7 +1573,7 @@ sys_call_table:
 	data8 sys_keyctl
 	data8 sys_ioprio_set
 	data8 sys_ioprio_get			// 1275
-	data8 sys_set_zone_reclaim
+	data8 sys_ni_syscall
 	data8 sys_inotify_init
 	data8 sys_inotify_add_watch
 	data8 sys_inotify_rm_watch
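Both table hunks use the same technique: the retired entry is not deleted but pointed at sys_ni_syscall. A syscall's number is simply its position in the table, so deleting the line would renumber every later syscall; that is the practical meaning of "syscalls are pretty much for eternity". A minimal userspace model, assuming a three-slot table numbered like the i386 hunk (250 = fadvise64, 251 = the retired set_zone_reclaim slot, 252 = exit_group); all hyp_* names are hypothetical, and in the kernel the indexing is done by the architecture's entry code:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model covering syscall numbers 250..252 only. */
enum { HYP_FIRST_NR = 250, HYP_NR_SLOTS = 3 };

typedef long (*hyp_syscall_fn)(long arg);

static long hyp_sys_fadvise64(long arg) { (void)arg; return 0; }
static long hyp_sys_exit_group(long arg) { return arg; }

/* Stand-in for sys_ni_syscall(): a number that stays reserved
 * but is no longer implemented. */
static long hyp_sys_ni_syscall(long arg)
{
    (void)arg;
    return -ENOSYS;
}

/* The retired slot is overwritten with the stub instead of being
 * removed, so 252 still means exit_group. */
static const hyp_syscall_fn hyp_table[] = {
    hyp_sys_fadvise64,   /* 250 */
    hyp_sys_ni_syscall,  /* 251: formerly set_zone_reclaim */
    hyp_sys_exit_group,  /* 252 */
};

static_assert(sizeof hyp_table / sizeof hyp_table[0] == HYP_NR_SLOTS,
              "table must cover every modelled syscall number");

/* Dispatch by number; numbers outside the model also get -ENOSYS. */
long hyp_dispatch(long nr, long arg)
{
    if (nr < HYP_FIRST_NR || nr >= HYP_FIRST_NR + HYP_NR_SLOTS)
        return -ENOSYS;
    return hyp_table[nr - HYP_FIRST_NR](arg);
}
```

With this table, hyp_dispatch(251, 0) returns -ENOSYS while hyp_dispatch(252, 0) still reaches the exit_group stand-in, so existing binaries keep calling the same numbers. The same reasoning applies to the ia64 table, where the slot after 1275 is the one retired.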
diff --git a/kernel/sys_ni.c b/kernel/sys_ni.c
index 42b40ae5eada..1ab2370e2efa 100644
--- a/kernel/sys_ni.c
+++ b/kernel/sys_ni.c
@@ -79,7 +79,6 @@ cond_syscall(sys_request_key);
 cond_syscall(sys_keyctl);
 cond_syscall(compat_sys_keyctl);
 cond_syscall(compat_sys_socketcall);
-cond_syscall(sys_set_zone_reclaim);
 cond_syscall(sys_inotify_init);
 cond_syscall(sys_inotify_add_watch);
 cond_syscall(sys_inotify_rm_watch);
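The sys_ni.c hunk drops the matching cond_syscall() line. As far as we understand the mechanism, cond_syscall(x) makes x a weak symbol that falls back to sys_ni_syscall when no real implementation is linked, so the syscall table can always reference x; once the table points directly at sys_ni_syscall, that fallback has nothing left to cover. A rough userspace approximation, assuming GCC or Clang on an ELF target and using the weak alias attribute; hyp_ni_syscall and hyp_sys_optional are hypothetical names, and the kernel's own macro may be spelled differently:

```c
#include <errno.h>

/* Stand-in for sys_ni_syscall(). */
long hyp_ni_syscall(void)
{
    return -ENOSYS;
}

/* Analogue of cond_syscall(hyp_sys_optional): a weak alias of the stub.
 * If another object file linked into the program defines a strong
 * hyp_sys_optional(), the linker uses that definition; otherwise every
 * call resolves to hyp_ni_syscall(). */
long hyp_sys_optional(void) __attribute__((weak, alias("hyp_ni_syscall")));
```

Built on its own, hyp_sys_optional() returns -ENOSYS; linking in an object that defines a strong hyp_sys_optional() switches callers to the real implementation without touching the caller's code.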