author Mike Stroyan <mike.stroyan@hp.com> 2005-11-29 19:12:55 -0500
committer David S. Miller <davem@davemloft.net> 2005-11-29 19:12:55 -0500
commit 18955cfcb2a5d75a08e0cb297f13ccfb6904de48
tree 0cd153fab98e3fc09b4741f811e55e81d5719111
parent 2f12c74f0cfdc93e1d47ac70766e837ef29472fd
[IPV4] tcp/route: Another look at hash table sizes
The tcp_ehash hash table gets too big on systems with really big memory.
It is worse on systems with pages larger than 4KB. It wastes memory that
could be better used. It also makes the netstat command slow because
reading /proc/net/tcp and /proc/net/tcp6 needs to go through the full
hash table.

The default value should not be larger for larger page sizes. It seems
that the effect of page size is an unintended error dating back a long
time. I also wonder if the default value really should be a larger
fraction of memory for systems with more memory. While systems with
really big RAM can afford more space for hash tables, it is not clear to
me that they benefit from increasing the allocation ratio for this table.

The amount of memory allocated is determined by net/ipv4/tcp.c:tcp_init
and mm/page_alloc.c:alloc_large_system_hash.

tcp_init calls alloc_large_system_hash passing parameters-
    bucketsize=sizeof(struct tcp_ehash_bucket)
    numentries=thash_entries
    scale=(num_physpages >= 128 * 1024) ? (25-PAGE_SHIFT) : (27-PAGE_SHIFT)
    limit=0

On i386, PAGE_SHIFT is 12 for a page size of 4K
On ia64, PAGE_SHIFT defaults to 14 for a page size of 16K

The num_physpages test above makes the allocation take a larger fraction
of the total memory on systems with larger memory. The threshold size
for an i386 system is 512MB. For an ia64 system with 16KB pages the
threshold is 2GB.

For smaller memory systems-
    On i386, scale = (27 - 12) = 15
    On ia64, scale = (27 - 14) = 13
For larger memory systems-
    On i386, scale = (25 - 12) = 13
    On ia64, scale = (25 - 14) = 11

For the rest of this discussion, I'll just track the larger memory case.

The default behavior has numentries=thash_entries=0, so the allocated
size is determined by either scale or by the default limit of 1/16 of
total memory.

In alloc_large_system_hash-
|	numentries = (flags & HASH_HIGHMEM) ? nr_all_pages : nr_kernel_pages;
|	numentries += (1UL << (20 - PAGE_SHIFT)) - 1;
|	numentries >>= 20 - PAGE_SHIFT;
|	numentries <<= 20 - PAGE_SHIFT;

At this point, numentries is the number of pages for all of memory,
rounded up to the nearest megabyte boundary.

|	/* limit to 1 bucket per 2^scale bytes of low memory */
|	if (scale > PAGE_SHIFT)
|		numentries >>= (scale - PAGE_SHIFT);
|	else
|		numentries <<= (PAGE_SHIFT - scale);

On i386, numentries >>= (13 - 12), so numentries is 1/8192 of
bytes of total memory.
On ia64, numentries <<= (14 - 11), so numentries is 1/2048 of
bytes of total memory.

|	log2qty = long_log2(numentries);
|
|	do {
|		size = bucketsize << log2qty;

bucketsize is 16, so size is 16 times numentries, rounded
down to a power of two.

On i386, size is 1/512 of bytes of total memory.
On ia64, size is 1/128 of bytes of total memory.

For smaller systems the results are:
On i386, size is 1/2048 of bytes of total memory.
On ia64, size is 1/512 of bytes of total memory.

The large page effect can be removed by just replacing the use of
PAGE_SHIFT with a constant of 12 in the calls to
alloc_large_system_hash. That makes them more like the other uses of
that function from fs/inode.c and fs/dcache.c.

Signed-off-by: David S. Miller <davem@davemloft.net>
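
The fractions quoted in the message above can be re-derived with a small userspace model. This is only a sketch, not kernel code: the function names (floor_log2, model_hash_bytes, old_tcp_scale) are made up for illustration, floor_log2 merely stands in for long_log2, and the model ignores the 1/16-of-memory limit and the allocation retry loop.

```c
#include <assert.h>

/* floor(log2(n)) for n > 0; a stand-in for the kernel's long_log2(). */
unsigned int floor_log2(unsigned long long n)
{
	unsigned int log = 0;

	assert(n > 0);
	while (n >>= 1)
		log++;
	return log;
}

/* Pre-patch scale expression quoted from tcp_init(). */
unsigned int old_tcp_scale(unsigned long long nr_pages,
			   unsigned int page_shift)
{
	assert(page_shift <= 25);
	return (nr_pages >= 128 * 1024) ? 25 - page_shift : 27 - page_shift;
}

/*
 * Hypothetical model of the default sizing path (numentries == 0).
 * Returns the table size in bytes for a machine with nr_pages pages.
 */
unsigned long long model_hash_bytes(unsigned long long nr_pages,
				    unsigned int page_shift,
				    unsigned int scale,
				    unsigned long long bucketsize)
{
	unsigned long long numentries = nr_pages;
	unsigned int mb_shift;

	assert(page_shift <= 20);
	mb_shift = 20 - page_shift;

	/* Round the page count up to a megabyte boundary. */
	numentries += (1ULL << mb_shift) - 1;
	numentries >>= mb_shift;
	numentries <<= mb_shift;

	/* One bucket per 2^scale bytes of memory. */
	if (scale > page_shift)
		numentries >>= (scale - page_shift);
	else
		numentries <<= (page_shift - scale);

	/* Round down to a power of two, then scale by the bucket size. */
	return bucketsize << floor_log2(numentries);
}
```

For a 4GB machine, calling model_hash_bytes with 2^20 pages and PAGE_SHIFT 12 gives 8MB (1/512 of memory), while 2^18 pages and PAGE_SHIFT 14 gives 32MB (1/128 of memory), matching the i386 and ia64 figures in the message.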
-rw-r--r-- net/ipv4/route.c 3
-rw-r--r-- net/ipv4/tcp.c 6
2 files changed, 3 insertions, 6 deletions
diff --git a/net/ipv4/route.c b/net/ipv4/route.c
index 381dd6a6aebb..e9c14f4a2eba 100644
--- a/net/ipv4/route.c
+++ b/net/ipv4/route.c
@@ -3149,8 +3149,7 @@ int __init ip_rt_init(void)
 					sizeof(struct rt_hash_bucket),
 					rhash_entries,
 					(num_physpages >= 128 * 1024) ?
-						(27 - PAGE_SHIFT) :
-						(29 - PAGE_SHIFT),
+					15 : 17,
 					HASH_HIGHMEM,
 					&rt_hash_log,
 					&rt_hash_mask,
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 9ac7a4f46bd8..5e6bc4b32875 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -2065,8 +2065,7 @@ void __init tcp_init(void)
 					sizeof(struct inet_ehash_bucket),
 					thash_entries,
 					(num_physpages >= 128 * 1024) ?
-						(25 - PAGE_SHIFT) :
-						(27 - PAGE_SHIFT),
+					13 : 15,
 					HASH_HIGHMEM,
 					&tcp_hashinfo.ehash_size,
 					NULL,
@@ -2082,8 +2081,7 @@ void __init tcp_init(void)
 					sizeof(struct inet_bind_hashbucket),
 					tcp_hashinfo.ehash_size,
 					(num_physpages >= 128 * 1024) ?
-						(25 - PAGE_SHIFT) :
-						(27 - PAGE_SHIFT),
+					13 : 15,
 					HASH_HIGHMEM,
 					&tcp_hashinfo.bhash_size,
 					NULL,
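
As a rough illustration of what the tcp.c hunks change, the sketch below compares the old and new scale expressions. The function names are hypothetical and exist only for this example. Note that the num_physpages threshold itself is untouched by the patch, so it still corresponds to a different amount of memory on different page sizes.

```c
#include <assert.h>

/* Pre-patch scale for the TCP tables: shrinks as the page size grows. */
unsigned int tcp_scale_before(unsigned long long num_physpages,
			      unsigned int page_shift)
{
	assert(page_shift <= 25);
	return (num_physpages >= 128 * 1024) ? 25 - page_shift
					     : 27 - page_shift;
}

/* Post-patch scale: PAGE_SHIFT replaced by the constant 12. */
unsigned int tcp_scale_after(unsigned long long num_physpages)
{
	return (num_physpages >= 128 * 1024) ? 13 : 15;
}

/* Bytes of memory per hash bucket implied by a given scale. */
unsigned long long bytes_per_bucket(unsigned int scale)
{
	assert(scale < 64);
	return 1ULL << scale;
}
```

With 4KB pages (page_shift 12) the two functions agree, so i386 defaults do not change. With 16KB pages (page_shift 14) and a large-memory system, the scale moves from 11 to 13, which means one bucket per 8192 bytes instead of one per 2048 bytes, a fourfold reduction in default table size.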