author     Dennis Zhou (Facebook) <dennisszhou@gmail.com>  2017-07-12 14:27:32 -0400
committer  Tejun Heo <tj@kernel.org>  2017-07-26 17:41:05 -0400
commit     40064aeca35c5c14789e2adcf3a1d7e5d4bd65f2 (patch)
tree       8fbd31cdcdc24666daf1eba37204c8a52fccd0a6
parent     91e914c5a4988d00a13c14297ab02b250611e00e (diff)
percpu: replace area map allocator with bitmap
The percpu memory allocator is experiencing scalability issues when
allocating and freeing large numbers of counters as in BPF.
Additionally, there is a corner case where iteration is triggered over
all chunks if the contig_hint is the right size, but wrong alignment.

This patch replaces the area map allocator with a basic bitmap
allocator implementation. Each subsequent patch will introduce new
features and replace full scanning functions with faster non-scanning
options when possible.

Implementation:
This patchset removes the area map allocator in favor of a bitmap
allocator backed by metadata blocks. The primary goal is to provide
consistency in performance and memory footprint with a focus on small
allocations (< 64 bytes). The bitmap removes the heavy memmove from the
freeing critical path and provides a consistent memory footprint. The
metadata blocks provide a bound on the amount of scanning required by
maintaining a set of hints.

In an effort to make freeing fast, the metadata is updated on the free
path if the new free area makes a page free, a block free, or spans
across blocks. This causes the chunk's contig hint to potentially be
smaller than what it could allocate by up to the smaller of a page or a
block. If the chunk's contig hint is contained within a block, a check
occurs and the hint is kept accurate. Metadata is always kept accurate
on allocation, so there will not be a situation where a chunk has a
larger contig hint than available.

Evaluation:
I have primarily done testing against a simple workload of allocation
of 1 million objects (2^20) of varying size. Deallocation was done in
order, alternating, and in reverse. These numbers were collected after
rebasing on top of a80099a152. I present the worst-case numbers here:

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
                 4B |             310 |           4770
                16B |             557 |           1325
                64B |             436 |            273
               256B |             776 |            131
              1024B |            3280 |            122

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
                 4B |             490 |             70
                16B |             515 |             75
                64B |             610 |             80
               256B |             950 |            100
              1024B |            3520 |            200

This data demonstrates the inability of the area map allocator to
handle less than ideal situations. In the best case of reverse
deallocation, the area map allocator was able to perform within range
of the bitmap allocator. In the worst case situation, freeing took
nearly 5 seconds for 1 million 4-byte objects. The bitmap allocator
dramatically improves the consistency of the free path. The small
allocations performed nearly identically regardless of the freeing
pattern.

While it does add to the allocation latency, the allocation scenario
here is optimal for the area map allocator. The area map allocator runs
into trouble when it is allocating in chunks where the latter half is
full. It is difficult to replicate this, so I present a variant where
the pages are second-half filled. Freeing was done sequentially. Below
are the numbers for this scenario:

  Area Map Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
                 4B |            4118 |           4892
                16B |            1651 |           1163
                64B |             598 |            285
               256B |             771 |            158
              1024B |            3034 |            160

  Bitmap Allocator:

        Object Size | Alloc Time (ms) | Free Time (ms)
        ----------------------------------------------
                 4B |             481 |             67
                16B |             506 |             69
                64B |             636 |             75
               256B |             892 |             90
              1024B |            3262 |            147

The data shows a parabolic curve of performance for the area map
allocator. This is due to the memmove operation being the dominant cost
at the lower object sizes, as more objects are packed in a chunk, while
at higher object sizes the traversal of the chunk slots is the
dominating cost. The bitmap allocator suffers this problem as well. The
above data shows the inability to scale for the allocation path with
the area map allocator and that the bitmap allocator demonstrates
consistent performance in general.

The second problem of additional scanning can result in the area map
allocator completing in 52 minutes when trying to allocate 1 million
4-byte objects with 8-byte alignment. The same workload takes
approximately 16 seconds to complete for the bitmap allocator.

V2:
 - Fixed a bug in pcpu_alloc_first_chunk where end_offset was setting
   the bitmap using bytes instead of bits.
 - Added a comment to pcpu_cnt_pop_pages to explain bitmap_weight.

Signed-off-by: Dennis Zhou <dennisszhou@gmail.com>
Reviewed-by: Josef Bacik <jbacik@fb.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
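To make the alloc_map/bound_map scheme above concrete, here is a minimal
userspace sketch of the same bookkeeping, assuming a fixed 4-byte
allocation unit and a single toy chunk. It is only an illustration of
the idea, not the kernel implementation (which is pcpu_alloc_area() and
pcpu_free_area() in the diff below); the names toy_alloc, toy_free and
assign_bit are invented for this sketch.

/* Illustrative only: a simplified userspace model, not the kernel code. */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define UNIT_SHIFT      2               /* 4-byte minimum allocation unit */
#define CHUNK_BITS      256             /* units tracked by this toy chunk */

static uint64_t alloc_map[CHUNK_BITS / 64];     /* bit set: unit is in use */
static uint64_t bound_map[CHUNK_BITS / 64 + 1]; /* bit set: an area starts here */

static bool test_bit(const uint64_t *map, int bit)
{
        return (map[bit / 64] >> (bit % 64)) & 1;
}

static void assign_bit(uint64_t *map, int bit, bool val)
{
        if (val)
                map[bit / 64] |= 1ULL << (bit % 64);
        else
                map[bit / 64] &= ~(1ULL << (bit % 64));
}

/* Allocate @size bytes aligned to @align bytes; returns a byte offset or -1. */
static int toy_alloc(size_t size, size_t align)
{
        int bits = (size + (1 << UNIT_SHIFT) - 1) >> UNIT_SHIFT;
        int step = align > (1 << UNIT_SHIFT) ? align >> UNIT_SHIFT : 1;
        int start, i;

        for (start = 0; start + bits <= CHUNK_BITS; start += step) {
                for (i = 0; i < bits; i++)
                        if (test_bit(alloc_map, start + i))
                                break;
                if (i < bits)
                        continue;

                /* mark the units busy and record the area boundaries */
                for (i = 0; i < bits; i++) {
                        assign_bit(alloc_map, start + i, true);
                        assign_bit(bound_map, start + i, i == 0);
                }
                assign_bit(bound_map, start + bits, true);
                return start << UNIT_SHIFT;
        }
        return -1;
}

/* Free the area at byte offset @off; its size is recovered from bound_map. */
static void toy_free(int off)
{
        int start = off >> UNIT_SHIFT, end = start + 1;

        while (end <= CHUNK_BITS && !test_bit(bound_map, end))
                end++;
        for (int i = start; i < end; i++)
                assign_bit(alloc_map, i, false);
        /* as in the patch, bound_map is only kept accurate on allocation */
}

int main(void)
{
        int a = toy_alloc(12, 4);       /* 3 units at offset 0 */
        int b = toy_alloc(64, 8);       /* 16 units, 8-byte aligned */

        printf("a=%d b=%d\n", a, b);
        toy_free(a);
        printf("reuse=%d\n", toy_alloc(8, 4));  /* lands back at offset 0 */
        return 0;
}

The point to notice is the free path: the size of an area is recovered
from the boundary map and its units are simply cleared, so there is no
memmove of an index array as in the old area map allocator, which is
where the worst-case free latency above came from.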
-rw-r--r--  include/linux/percpu.h |   1
-rw-r--r--  init/main.c            |   1
-rw-r--r--  mm/percpu-internal.h   |  34
-rw-r--r--  mm/percpu-km.c         |   2
-rw-r--r--  mm/percpu-stats.c      |  99
-rw-r--r--  mm/percpu.c            | 729
6 files changed, 362 insertions(+), 504 deletions(-)
diff --git a/include/linux/percpu.h b/include/linux/percpu.h
index 90e0cb0f7802..b7e6c98722d1 100644
--- a/include/linux/percpu.h
+++ b/include/linux/percpu.h
@@ -120,7 +120,6 @@ extern bool is_kernel_percpu_address(unsigned long addr);
120#if !defined(CONFIG_SMP) || !defined(CONFIG_HAVE_SETUP_PER_CPU_AREA) 120#if !defined(CONFIG_SMP) || !defined(CONFIG_HAVE_SETUP_PER_CPU_AREA)
121extern void __init setup_per_cpu_areas(void); 121extern void __init setup_per_cpu_areas(void);
122#endif 122#endif
123extern void __init percpu_init_late(void);
124 123
125extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp); 124extern void __percpu *__alloc_percpu_gfp(size_t size, size_t align, gfp_t gfp);
126extern void __percpu *__alloc_percpu(size_t size, size_t align); 125extern void __percpu *__alloc_percpu(size_t size, size_t align);
diff --git a/init/main.c b/init/main.c
index 052481fbe363..c9a9ffff6ec6 100644
--- a/init/main.c
+++ b/init/main.c
@@ -500,7 +500,6 @@ static void __init mm_init(void)
500 page_ext_init_flatmem(); 500 page_ext_init_flatmem();
501 mem_init(); 501 mem_init();
502 kmem_cache_init(); 502 kmem_cache_init();
503 percpu_init_late();
504 pgtable_init(); 503 pgtable_init();
505 vmalloc_init(); 504 vmalloc_init();
506 ioremap_huge_init(); 505 ioremap_huge_init();
diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index c4c8fc49780b..2e9d9bcb6fa2 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -11,14 +11,12 @@ struct pcpu_chunk {
11#endif 11#endif
12 12
13 struct list_head list; /* linked to pcpu_slot lists */ 13 struct list_head list; /* linked to pcpu_slot lists */
14 int free_size; /* free bytes in the chunk */ 14 int free_bytes; /* free bytes in the chunk */
15 int contig_hint; /* max contiguous size hint */ 15 int contig_bits; /* max contiguous size hint */
16 void *base_addr; /* base address of this chunk */ 16 void *base_addr; /* base address of this chunk */
17 17
18 int map_used; /* # of map entries used before the sentry */ 18 unsigned long *alloc_map; /* allocation map */
19 int map_alloc; /* # of map entries allocated */ 19 unsigned long *bound_map; /* boundary map */
20 int *map; /* allocation map */
21 struct list_head map_extend_list;/* on pcpu_map_extend_chunks */
22 20
23 void *data; /* chunk data */ 21 void *data; /* chunk data */
24 int first_free; /* no free below this */ 22 int first_free; /* no free below this */
@@ -45,6 +43,30 @@ extern int pcpu_nr_empty_pop_pages;
45extern struct pcpu_chunk *pcpu_first_chunk; 43extern struct pcpu_chunk *pcpu_first_chunk;
46extern struct pcpu_chunk *pcpu_reserved_chunk; 44extern struct pcpu_chunk *pcpu_reserved_chunk;
47 45
46/**
47 * pcpu_nr_pages_to_map_bits - converts the pages to size of bitmap
48 * @pages: number of physical pages
49 *
50 * This conversion is from physical pages to the number of bits
51 * required in the bitmap.
52 */
53static inline int pcpu_nr_pages_to_map_bits(int pages)
54{
55 return pages * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
56}
57
58/**
59 * pcpu_chunk_map_bits - helper to convert nr_pages to size of bitmap
60 * @chunk: chunk of interest
61 *
62 * This conversion is from the number of physical pages that the chunk
63 * serves to the number of bits in the bitmap.
64 */
65static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk)
66{
67 return pcpu_nr_pages_to_map_bits(chunk->nr_pages);
68}
69
48#ifdef CONFIG_PERCPU_STATS 70#ifdef CONFIG_PERCPU_STATS
49 71
50#include <linux/spinlock.h> 72#include <linux/spinlock.h>
diff --git a/mm/percpu-km.c b/mm/percpu-km.c
index eb58aa4c0997..d2a76642c4ae 100644
--- a/mm/percpu-km.c
+++ b/mm/percpu-km.c
@@ -69,7 +69,7 @@ static struct pcpu_chunk *pcpu_create_chunk(void)
69 chunk->base_addr = page_address(pages) - pcpu_group_offsets[0]; 69 chunk->base_addr = page_address(pages) - pcpu_group_offsets[0];
70 70
71 spin_lock_irq(&pcpu_lock); 71 spin_lock_irq(&pcpu_lock);
72 pcpu_chunk_populated(chunk, 0, nr_pages); 72 pcpu_chunk_populated(chunk, 0, nr_pages, false);
73 spin_unlock_irq(&pcpu_lock); 73 spin_unlock_irq(&pcpu_lock);
74 74
75 pcpu_stats_chunk_alloc(); 75 pcpu_stats_chunk_alloc();
diff --git a/mm/percpu-stats.c b/mm/percpu-stats.c
index e146b585fd18..ad03d73aa5fe 100644
--- a/mm/percpu-stats.c
+++ b/mm/percpu-stats.c
@@ -29,65 +29,85 @@ static int cmpint(const void *a, const void *b)
29} 29}
30 30
31/* 31/*
32 * Iterates over all chunks to find the max # of map entries used. 32 * Iterates over all chunks to find the max nr_alloc entries.
33 */ 33 */
34static int find_max_map_used(void) 34static int find_max_nr_alloc(void)
35{ 35{
36 struct pcpu_chunk *chunk; 36 struct pcpu_chunk *chunk;
37 int slot, max_map_used; 37 int slot, max_nr_alloc;
38 38
39 max_map_used = 0; 39 max_nr_alloc = 0;
40 for (slot = 0; slot < pcpu_nr_slots; slot++) 40 for (slot = 0; slot < pcpu_nr_slots; slot++)
41 list_for_each_entry(chunk, &pcpu_slot[slot], list) 41 list_for_each_entry(chunk, &pcpu_slot[slot], list)
42 max_map_used = max(max_map_used, chunk->map_used); 42 max_nr_alloc = max(max_nr_alloc, chunk->nr_alloc);
43 43
44 return max_map_used; 44 return max_nr_alloc;
45} 45}
46 46
47/* 47/*
48 * Prints out chunk state. Fragmentation is considered between 48 * Prints out chunk state. Fragmentation is considered between
49 * the beginning of the chunk to the last allocation. 49 * the beginning of the chunk to the last allocation.
50 *
51 * All statistics are in bytes unless stated otherwise.
50 */ 52 */
51static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk, 53static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk,
52 int *buffer) 54 int *buffer)
53{ 55{
54 int i, s_index, e_index, last_alloc, alloc_sign, as_len; 56 int i, last_alloc, as_len, start, end;
55 int *alloc_sizes, *p; 57 int *alloc_sizes, *p;
56 /* statistics */ 58 /* statistics */
57 int sum_frag = 0, max_frag = 0; 59 int sum_frag = 0, max_frag = 0;
58 int cur_min_alloc = 0, cur_med_alloc = 0, cur_max_alloc = 0; 60 int cur_min_alloc = 0, cur_med_alloc = 0, cur_max_alloc = 0;
59 61
60 alloc_sizes = buffer; 62 alloc_sizes = buffer;
61 s_index = (chunk->start_offset) ? 1 : 0;
62 e_index = chunk->map_used - ((chunk->end_offset) ? 1 : 0);
63
64 /* find last allocation */
65 last_alloc = -1;
66 for (i = e_index - 1; i >= s_index; i--) {
67 if (chunk->map[i] & 1) {
68 last_alloc = i;
69 break;
70 }
71 }
72 63
73 /* if the chunk is not empty - ignoring reserve */ 64 /*
74 if (last_alloc >= s_index) { 65 * find_last_bit returns the start value if nothing found.
75 as_len = last_alloc + 1 - s_index; 66 * Therefore, we must determine if it is a failure of find_last_bit
76 67 * and set the appropriate value.
77 /* 68 */
78 * Iterate through chunk map computing size info. 69 last_alloc = find_last_bit(chunk->alloc_map,
79 * The first bit is overloaded to be a used flag. 70 pcpu_chunk_map_bits(chunk) -
80 * negative = free space, positive = allocated 71 chunk->end_offset / PCPU_MIN_ALLOC_SIZE - 1);
81 */ 72 last_alloc = test_bit(last_alloc, chunk->alloc_map) ?
82 for (i = 0, p = chunk->map + s_index; i < as_len; i++, p++) { 73 last_alloc + 1 : 0;
83 alloc_sign = (*p & 1) ? 1 : -1; 74
84 alloc_sizes[i] = alloc_sign * 75 as_len = 0;
85 ((p[1] & ~1) - (p[0] & ~1)); 76 start = chunk->start_offset;
77
78 /*
79 * If a bit is set in the allocation map, the bound_map identifies
80 * where the allocation ends. If the allocation is not set, the
81 * bound_map does not identify free areas as it is only kept accurate
82 * on allocation, not free.
83 *
84 * Positive values are allocations and negative values are free
85 * fragments.
86 */
87 while (start < last_alloc) {
88 if (test_bit(start, chunk->alloc_map)) {
89 end = find_next_bit(chunk->bound_map, last_alloc,
90 start + 1);
91 alloc_sizes[as_len] = 1;
92 } else {
93 end = find_next_bit(chunk->alloc_map, last_alloc,
94 start + 1);
95 alloc_sizes[as_len] = -1;
86 } 96 }
87 97
88 sort(alloc_sizes, as_len, sizeof(chunk->map[0]), cmpint, NULL); 98 alloc_sizes[as_len++] *= (end - start) * PCPU_MIN_ALLOC_SIZE;
99
100 start = end;
101 }
102
103 /*
104 * The negative values are free fragments and thus sorting gives the
105 * free fragments at the beginning in largest first order.
106 */
107 if (as_len > 0) {
108 sort(alloc_sizes, as_len, sizeof(int), cmpint, NULL);
89 109
90 /* Iterate through the unallocated fragements. */ 110 /* iterate through the unallocated fragments */
91 for (i = 0, p = alloc_sizes; *p < 0 && i < as_len; i++, p++) { 111 for (i = 0, p = alloc_sizes; *p < 0 && i < as_len; i++, p++) {
92 sum_frag -= *p; 112 sum_frag -= *p;
93 max_frag = max(max_frag, -1 * (*p)); 113 max_frag = max(max_frag, -1 * (*p));
@@ -101,8 +121,8 @@ static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk,
101 P("nr_alloc", chunk->nr_alloc); 121 P("nr_alloc", chunk->nr_alloc);
102 P("max_alloc_size", chunk->max_alloc_size); 122 P("max_alloc_size", chunk->max_alloc_size);
103 P("empty_pop_pages", chunk->nr_empty_pop_pages); 123 P("empty_pop_pages", chunk->nr_empty_pop_pages);
104 P("free_size", chunk->free_size); 124 P("free_bytes", chunk->free_bytes);
105 P("contig_hint", chunk->contig_hint); 125 P("contig_bytes", chunk->contig_bits * PCPU_MIN_ALLOC_SIZE);
106 P("sum_frag", sum_frag); 126 P("sum_frag", sum_frag);
107 P("max_frag", max_frag); 127 P("max_frag", max_frag);
108 P("cur_min_alloc", cur_min_alloc); 128 P("cur_min_alloc", cur_min_alloc);
@@ -114,22 +134,23 @@ static void chunk_map_stats(struct seq_file *m, struct pcpu_chunk *chunk,
114static int percpu_stats_show(struct seq_file *m, void *v) 134static int percpu_stats_show(struct seq_file *m, void *v)
115{ 135{
116 struct pcpu_chunk *chunk; 136 struct pcpu_chunk *chunk;
117 int slot, max_map_used; 137 int slot, max_nr_alloc;
118 int *buffer; 138 int *buffer;
119 139
120alloc_buffer: 140alloc_buffer:
121 spin_lock_irq(&pcpu_lock); 141 spin_lock_irq(&pcpu_lock);
122 max_map_used = find_max_map_used(); 142 max_nr_alloc = find_max_nr_alloc();
123 spin_unlock_irq(&pcpu_lock); 143 spin_unlock_irq(&pcpu_lock);
124 144
125 buffer = vmalloc(max_map_used * sizeof(pcpu_first_chunk->map[0])); 145 /* there can be at most this many free and allocated fragments */
146 buffer = vmalloc((2 * max_nr_alloc + 1) * sizeof(int));
126 if (!buffer) 147 if (!buffer)
127 return -ENOMEM; 148 return -ENOMEM;
128 149
129 spin_lock_irq(&pcpu_lock); 150 spin_lock_irq(&pcpu_lock);
130 151
131 /* if the buffer allocated earlier is too small */ 152 /* if the buffer allocated earlier is too small */
132 if (max_map_used < find_max_map_used()) { 153 if (max_nr_alloc < find_max_nr_alloc()) {
133 spin_unlock_irq(&pcpu_lock); 154 spin_unlock_irq(&pcpu_lock);
134 vfree(buffer); 155 vfree(buffer);
135 goto alloc_buffer; 156 goto alloc_buffer;
diff --git a/mm/percpu.c b/mm/percpu.c
index 84cc2559d4aa..986d900e6680 100644
--- a/mm/percpu.c
+++ b/mm/percpu.c
@@ -86,10 +86,9 @@
86 86
87#include "percpu-internal.h" 87#include "percpu-internal.h"
88 88
89#define PCPU_SLOT_BASE_SHIFT 5 /* 1-31 shares the same slot */ 89/* the slots are sorted by free bytes left, 1-31 bytes share the same slot */
90#define PCPU_DFL_MAP_ALLOC 16 /* start a map with 16 ents */ 90#define PCPU_SLOT_BASE_SHIFT 5
91#define PCPU_ATOMIC_MAP_MARGIN_LOW 32 91
92#define PCPU_ATOMIC_MAP_MARGIN_HIGH 64
93#define PCPU_EMPTY_POP_PAGES_LOW 2 92#define PCPU_EMPTY_POP_PAGES_LOW 2
94#define PCPU_EMPTY_POP_PAGES_HIGH 4 93#define PCPU_EMPTY_POP_PAGES_HIGH 4
95 94
@@ -218,10 +217,10 @@ static int pcpu_size_to_slot(int size)
218 217
219static int pcpu_chunk_slot(const struct pcpu_chunk *chunk) 218static int pcpu_chunk_slot(const struct pcpu_chunk *chunk)
220{ 219{
221 if (chunk->free_size < sizeof(int) || chunk->contig_hint < sizeof(int)) 220 if (chunk->free_bytes < PCPU_MIN_ALLOC_SIZE || chunk->contig_bits == 0)
222 return 0; 221 return 0;
223 222
224 return pcpu_size_to_slot(chunk->free_size); 223 return pcpu_size_to_slot(chunk->free_bytes);
225} 224}
226 225
227/* set the pointer to a chunk in a page struct */ 226/* set the pointer to a chunk in a page struct */
@@ -317,38 +316,6 @@ static void pcpu_mem_free(void *ptr)
317} 316}
318 317
319/** 318/**
320 * pcpu_count_occupied_pages - count the number of pages an area occupies
321 * @chunk: chunk of interest
322 * @i: index of the area in question
323 *
324 * Count the number of pages chunk's @i'th area occupies. When the area's
325 * start and/or end address isn't aligned to page boundary, the straddled
326 * page is included in the count iff the rest of the page is free.
327 */
328static int pcpu_count_occupied_pages(struct pcpu_chunk *chunk, int i)
329{
330 int off = chunk->map[i] & ~1;
331 int end = chunk->map[i + 1] & ~1;
332
333 if (!PAGE_ALIGNED(off) && i > 0) {
334 int prev = chunk->map[i - 1];
335
336 if (!(prev & 1) && prev <= round_down(off, PAGE_SIZE))
337 off = round_down(off, PAGE_SIZE);
338 }
339
340 if (!PAGE_ALIGNED(end) && i + 1 < chunk->map_used) {
341 int next = chunk->map[i + 1];
342 int nend = chunk->map[i + 2] & ~1;
343
344 if (!(next & 1) && nend >= round_up(end, PAGE_SIZE))
345 end = round_up(end, PAGE_SIZE);
346 }
347
348 return max_t(int, PFN_DOWN(end) - PFN_UP(off), 0);
349}
350
351/**
352 * pcpu_chunk_relocate - put chunk in the appropriate chunk slot 319 * pcpu_chunk_relocate - put chunk in the appropriate chunk slot
353 * @chunk: chunk of interest 320 * @chunk: chunk of interest
354 * @oslot: the previous slot it was on 321 * @oslot: the previous slot it was on
@@ -374,358 +341,270 @@ static void pcpu_chunk_relocate(struct pcpu_chunk *chunk, int oslot)
374} 341}
375 342
376/** 343/**
377 * pcpu_need_to_extend - determine whether chunk area map needs to be extended 344 * pcpu_cnt_pop_pages- counts populated backing pages in range
378 * @chunk: chunk of interest 345 * @chunk: chunk of interest
379 * @is_atomic: the allocation context 346 * @bit_off: start offset
347 * @bits: size of area to check
380 * 348 *
381 * Determine whether area map of @chunk needs to be extended. If 349 * Calculates the number of populated pages in the region
382 * @is_atomic, only the amount necessary for a new allocation is 350 * [page_start, page_end). This keeps track of how many empty populated
383 * considered; however, async extension is scheduled if the left amount is 351 * pages are available and decide if async work should be scheduled.
384 * low. If !@is_atomic, it aims for more empty space. Combined, this
385 * ensures that the map is likely to have enough available space to
386 * accomodate atomic allocations which can't extend maps directly.
387 *
388 * CONTEXT:
389 * pcpu_lock.
390 * 352 *
391 * RETURNS: 353 * RETURNS:
392 * New target map allocation length if extension is necessary, 0 354 * The nr of populated pages.
393 * otherwise.
394 */ 355 */
395static int pcpu_need_to_extend(struct pcpu_chunk *chunk, bool is_atomic) 356static inline int pcpu_cnt_pop_pages(struct pcpu_chunk *chunk, int bit_off,
357 int bits)
396{ 358{
397 int margin, new_alloc; 359 int page_start = PFN_UP(bit_off * PCPU_MIN_ALLOC_SIZE);
398 360 int page_end = PFN_DOWN((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
399 lockdep_assert_held(&pcpu_lock);
400 361
401 if (is_atomic) { 362 if (page_start >= page_end)
402 margin = 3;
403
404 if (chunk->map_alloc <
405 chunk->map_used + PCPU_ATOMIC_MAP_MARGIN_LOW) {
406 if (list_empty(&chunk->map_extend_list)) {
407 list_add_tail(&chunk->map_extend_list,
408 &pcpu_map_extend_chunks);
409 pcpu_schedule_balance_work();
410 }
411 }
412 } else {
413 margin = PCPU_ATOMIC_MAP_MARGIN_HIGH;
414 }
415
416 if (chunk->map_alloc >= chunk->map_used + margin)
417 return 0; 363 return 0;
418 364
419 new_alloc = PCPU_DFL_MAP_ALLOC; 365 /*
420 while (new_alloc < chunk->map_used + margin) 366 * bitmap_weight counts the number of bits set in a bitmap up to
421 new_alloc *= 2; 367 * the specified number of bits. This is counting the populated
422 368 * pages up to page_end and then subtracting the populated pages
423 return new_alloc; 369 * up to page_start to count the populated pages in
370 * [page_start, page_end).
371 */
372 return bitmap_weight(chunk->populated, page_end) -
373 bitmap_weight(chunk->populated, page_start);
424} 374}
425 375
426/** 376/**
427 * pcpu_extend_area_map - extend area map of a chunk 377 * pcpu_chunk_update - updates the chunk metadata given a free area
428 * @chunk: chunk of interest 378 * @chunk: chunk of interest
429 * @new_alloc: new target allocation length of the area map 379 * @bit_off: chunk offset
380 * @bits: size of free area
430 * 381 *
431 * Extend area map of @chunk to have @new_alloc entries. 382 * This updates the chunk's contig hint given a free area.
383 */
384static void pcpu_chunk_update(struct pcpu_chunk *chunk, int bit_off, int bits)
385{
386 if (bits > chunk->contig_bits)
387 chunk->contig_bits = bits;
388}
389
390/**
391 * pcpu_chunk_refresh_hint - updates metadata about a chunk
392 * @chunk: chunk of interest
432 * 393 *
433 * CONTEXT: 394 * Iterates over the chunk to find the largest free area.
434 * Does GFP_KERNEL allocation. Grabs and releases pcpu_lock.
435 * 395 *
436 * RETURNS: 396 * Updates:
437 * 0 on success, -errno on failure. 397 * chunk->contig_bits
398 * nr_empty_pop_pages
438 */ 399 */
439static int pcpu_extend_area_map(struct pcpu_chunk *chunk, int new_alloc) 400static void pcpu_chunk_refresh_hint(struct pcpu_chunk *chunk)
440{ 401{
441 int *old = NULL, *new = NULL; 402 int bits, nr_empty_pop_pages;
442 size_t old_size = 0, new_size = new_alloc * sizeof(new[0]); 403 int rs, re; /* region start, region end */
443 unsigned long flags;
444 404
445 lockdep_assert_held(&pcpu_alloc_mutex); 405 /* clear metadata */
406 chunk->contig_bits = 0;
446 407
447 new = pcpu_mem_zalloc(new_size); 408 bits = nr_empty_pop_pages = 0;
448 if (!new) 409 pcpu_for_each_unpop_region(chunk->alloc_map, rs, re, 0,
449 return -ENOMEM; 410 pcpu_chunk_map_bits(chunk)) {
411 bits = re - rs;
450 412
451 /* acquire pcpu_lock and switch to new area map */ 413 pcpu_chunk_update(chunk, rs, bits);
452 spin_lock_irqsave(&pcpu_lock, flags);
453 414
454 if (new_alloc <= chunk->map_alloc) 415 nr_empty_pop_pages += pcpu_cnt_pop_pages(chunk, rs, bits);
455 goto out_unlock; 416 }
456 417
457 old_size = chunk->map_alloc * sizeof(chunk->map[0]); 418 /*
458 old = chunk->map; 419 * Keep track of nr_empty_pop_pages.
420 *
421 * The chunk maintains the previous number of free pages it held,
422 * so the delta is used to update the global counter. The reserved
423 * chunk is not part of the free page count as they are populated
424 * at init and are special to serving reserved allocations.
425 */
426 if (chunk != pcpu_reserved_chunk)
427 pcpu_nr_empty_pop_pages +=
428 (nr_empty_pop_pages - chunk->nr_empty_pop_pages);
459 429
460 memcpy(new, old, old_size); 430 chunk->nr_empty_pop_pages = nr_empty_pop_pages;
431}
461 432
462 chunk->map_alloc = new_alloc; 433/**
463 chunk->map = new; 434 * pcpu_is_populated - determines if the region is populated
464 new = NULL; 435 * @chunk: chunk of interest
436 * @bit_off: chunk offset
437 * @bits: size of area
438 * @next_off: return value for the next offset to start searching
439 *
440 * For atomic allocations, check if the backing pages are populated.
441 *
442 * RETURNS:
443 * Bool if the backing pages are populated.
444 * next_index is to skip over unpopulated blocks in pcpu_find_block_fit.
445 */
446static bool pcpu_is_populated(struct pcpu_chunk *chunk, int bit_off, int bits,
447 int *next_off)
448{
449 int page_start, page_end, rs, re;
465 450
466out_unlock: 451 page_start = PFN_DOWN(bit_off * PCPU_MIN_ALLOC_SIZE);
467 spin_unlock_irqrestore(&pcpu_lock, flags); 452 page_end = PFN_UP((bit_off + bits) * PCPU_MIN_ALLOC_SIZE);
468 453
469 /* 454 rs = page_start;
470 * pcpu_mem_free() might end up calling vfree() which uses 455 pcpu_next_unpop(chunk->populated, &rs, &re, page_end);
471 * IRQ-unsafe lock and thus can't be called under pcpu_lock. 456 if (rs >= page_end)
472 */ 457 return true;
473 pcpu_mem_free(old);
474 pcpu_mem_free(new);
475 458
476 return 0; 459 *next_off = re * PAGE_SIZE / PCPU_MIN_ALLOC_SIZE;
460 return false;
477} 461}
478 462
479/** 463/**
480 * pcpu_fit_in_area - try to fit the requested allocation in a candidate area 464 * pcpu_find_block_fit - finds the block index to start searching
481 * @chunk: chunk the candidate area belongs to 465 * @chunk: chunk of interest
482 * @off: the offset to the start of the candidate area 466 * @alloc_bits: size of request in allocation units
483 * @this_size: the size of the candidate area 467 * @align: alignment of area (max PAGE_SIZE bytes)
484 * @size: the size of the target allocation 468 * @pop_only: use populated regions only
485 * @align: the alignment of the target allocation 469 *
486 * @pop_only: only allocate from already populated region 470 * RETURNS:
487 * 471 * The offset in the bitmap to begin searching.
488 * We're trying to allocate @size bytes aligned at @align. @chunk's area 472 * -1 if no offset is found.
489 * at @off sized @this_size is a candidate. This function determines
490 * whether the target allocation fits in the candidate area and returns the
491 * number of bytes to pad after @off. If the target area doesn't fit, -1
492 * is returned.
493 *
494 * If @pop_only is %true, this function only considers the already
495 * populated part of the candidate area.
496 */ 473 */
497static int pcpu_fit_in_area(struct pcpu_chunk *chunk, int off, int this_size, 474static int pcpu_find_block_fit(struct pcpu_chunk *chunk, int alloc_bits,
498 int size, int align, bool pop_only) 475 size_t align, bool pop_only)
499{ 476{
500 int cand_off = off; 477 int bit_off, bits;
478 int re; /* region end */
501 479
502 while (true) { 480 pcpu_for_each_unpop_region(chunk->alloc_map, bit_off, re, 0,
503 int head = ALIGN(cand_off, align) - off; 481 pcpu_chunk_map_bits(chunk)) {
504 int page_start, page_end, rs, re; 482 bits = re - bit_off;
505 483
506 if (this_size < head + size) 484 /* check alignment */
507 return -1; 485 bits -= ALIGN(bit_off, align) - bit_off;
486 bit_off = ALIGN(bit_off, align);
487 if (bits < alloc_bits)
488 continue;
508 489
509 if (!pop_only) 490 bits = alloc_bits;
510 return head; 491 if (!pop_only || pcpu_is_populated(chunk, bit_off, bits,
492 &bit_off))
493 break;
511 494
512 /* 495 bits = 0;
513 * If the first unpopulated page is beyond the end of the
514 * allocation, the whole allocation is populated;
515 * otherwise, retry from the end of the unpopulated area.
516 */
517 page_start = PFN_DOWN(head + off);
518 page_end = PFN_UP(head + off + size);
519
520 rs = page_start;
521 pcpu_next_unpop(chunk->populated, &rs, &re,
522 PFN_UP(off + this_size));
523 if (rs >= page_end)
524 return head;
525 cand_off = re * PAGE_SIZE;
526 } 496 }
497
498 if (bit_off == pcpu_chunk_map_bits(chunk))
499 return -1;
500
501 return bit_off;
527} 502}
528 503
529/** 504/**
530 * pcpu_alloc_area - allocate area from a pcpu_chunk 505 * pcpu_alloc_area - allocates an area from a pcpu_chunk
531 * @chunk: chunk of interest 506 * @chunk: chunk of interest
532 * @size: wanted size in bytes 507 * @alloc_bits: size of request in allocation units
533 * @align: wanted align 508 * @align: alignment of area (max PAGE_SIZE)
534 * @pop_only: allocate only from the populated area 509 * @start: bit_off to start searching
535 * @occ_pages_p: out param for the number of pages the area occupies
536 *
537 * Try to allocate @size bytes area aligned at @align from @chunk.
538 * Note that this function only allocates the offset. It doesn't
539 * populate or map the area.
540 *
541 * @chunk->map must have at least two free slots.
542 * 510 *
543 * CONTEXT: 511 * This function takes in a @start offset to begin searching to fit an
544 * pcpu_lock. 512 * allocation of @alloc_bits with alignment @align. If it confirms a
513 * valid free area, it then updates the allocation and boundary maps
514 * accordingly.
545 * 515 *
546 * RETURNS: 516 * RETURNS:
547 * Allocated offset in @chunk on success, -1 if no matching area is 517 * Allocated addr offset in @chunk on success.
548 * found. 518 * -1 if no matching area is found.
549 */ 519 */
550static int pcpu_alloc_area(struct pcpu_chunk *chunk, int size, int align, 520static int pcpu_alloc_area(struct pcpu_chunk *chunk, int alloc_bits,
551 bool pop_only, int *occ_pages_p) 521 size_t align, int start)
552{ 522{
553 int oslot = pcpu_chunk_slot(chunk); 523 size_t align_mask = (align) ? (align - 1) : 0;
554 int max_contig = 0; 524 int bit_off, end, oslot;
555 int i, off;
556 bool seen_free = false;
557 int *p;
558
559 for (i = chunk->first_free, p = chunk->map + i; i < chunk->map_used; i++, p++) {
560 int head, tail;
561 int this_size;
562
563 off = *p;
564 if (off & 1)
565 continue;
566
567 this_size = (p[1] & ~1) - off;
568 525
569 head = pcpu_fit_in_area(chunk, off, this_size, size, align, 526 lockdep_assert_held(&pcpu_lock);
570 pop_only);
571 if (head < 0) {
572 if (!seen_free) {
573 chunk->first_free = i;
574 seen_free = true;
575 }
576 max_contig = max(this_size, max_contig);
577 continue;
578 }
579
580 /*
581 * If head is small or the previous block is free,
582 * merge'em. Note that 'small' is defined as smaller
583 * than sizeof(int), which is very small but isn't too
584 * uncommon for percpu allocations.
585 */
586 if (head && (head < sizeof(int) || !(p[-1] & 1))) {
587 *p = off += head;
588 if (p[-1] & 1)
589 chunk->free_size -= head;
590 else
591 max_contig = max(*p - p[-1], max_contig);
592 this_size -= head;
593 head = 0;
594 }
595 527
596 /* if tail is small, just keep it around */ 528 oslot = pcpu_chunk_slot(chunk);
597 tail = this_size - head - size;
598 if (tail < sizeof(int)) {
599 tail = 0;
600 size = this_size - head;
601 }
602 529
603 /* split if warranted */ 530 /*
604 if (head || tail) { 531 * Search to find a fit.
605 int nr_extra = !!head + !!tail; 532 */
606 533 end = start + alloc_bits;
607 /* insert new subblocks */ 534 bit_off = bitmap_find_next_zero_area(chunk->alloc_map, end, start,
608 memmove(p + nr_extra + 1, p + 1, 535 alloc_bits, align_mask);
609 sizeof(chunk->map[0]) * (chunk->map_used - i)); 536 if (bit_off >= end)
610 chunk->map_used += nr_extra; 537 return -1;
611
612 if (head) {
613 if (!seen_free) {
614 chunk->first_free = i;
615 seen_free = true;
616 }
617 *++p = off += head;
618 ++i;
619 max_contig = max(head, max_contig);
620 }
621 if (tail) {
622 p[1] = off + size;
623 max_contig = max(tail, max_contig);
624 }
625 }
626 538
627 if (!seen_free) 539 /* update alloc map */
628 chunk->first_free = i + 1; 540 bitmap_set(chunk->alloc_map, bit_off, alloc_bits);
629 541
630 /* update hint and mark allocated */ 542 /* update boundary map */
631 if (i + 1 == chunk->map_used) 543 set_bit(bit_off, chunk->bound_map);
632 chunk->contig_hint = max_contig; /* fully scanned */ 544 bitmap_clear(chunk->bound_map, bit_off + 1, alloc_bits - 1);
633 else 545 set_bit(bit_off + alloc_bits, chunk->bound_map);
634 chunk->contig_hint = max(chunk->contig_hint,
635 max_contig);
636 546
637 chunk->free_size -= size; 547 chunk->free_bytes -= alloc_bits * PCPU_MIN_ALLOC_SIZE;
638 *p |= 1;
639 548
640 *occ_pages_p = pcpu_count_occupied_pages(chunk, i); 549 pcpu_chunk_refresh_hint(chunk);
641 pcpu_chunk_relocate(chunk, oslot);
642 return off;
643 }
644 550
645 chunk->contig_hint = max_contig; /* fully scanned */
646 pcpu_chunk_relocate(chunk, oslot); 551 pcpu_chunk_relocate(chunk, oslot);
647 552
648 /* tell the upper layer that this chunk has no matching area */ 553 return bit_off * PCPU_MIN_ALLOC_SIZE;
649 return -1;
650} 554}
651 555
652/** 556/**
653 * pcpu_free_area - free area to a pcpu_chunk 557 * pcpu_free_area - frees the corresponding offset
654 * @chunk: chunk of interest 558 * @chunk: chunk of interest
655 * @freeme: offset of area to free 559 * @off: addr offset into chunk
656 * @occ_pages_p: out param for the number of pages the area occupies
657 *
658 * Free area starting from @freeme to @chunk. Note that this function
659 * only modifies the allocation map. It doesn't depopulate or unmap
660 * the area.
661 * 560 *
662 * CONTEXT: 561 * This function determines the size of an allocation to free using
663 * pcpu_lock. 562 * the boundary bitmap and clears the allocation map.
664 */ 563 */
665static void pcpu_free_area(struct pcpu_chunk *chunk, int freeme, 564static void pcpu_free_area(struct pcpu_chunk *chunk, int off)
666 int *occ_pages_p)
667{ 565{
668 int oslot = pcpu_chunk_slot(chunk); 566 int bit_off, bits, end, oslot;
669 int off = 0;
670 unsigned i, j;
671 int to_free = 0;
672 int *p;
673 567
674 lockdep_assert_held(&pcpu_lock); 568 lockdep_assert_held(&pcpu_lock);
675 pcpu_stats_area_dealloc(chunk); 569 pcpu_stats_area_dealloc(chunk);
676 570
677 freeme |= 1; /* we are searching for <given offset, in use> pair */ 571 oslot = pcpu_chunk_slot(chunk);
678
679 i = 0;
680 j = chunk->map_used;
681 while (i != j) {
682 unsigned k = (i + j) / 2;
683 off = chunk->map[k];
684 if (off < freeme)
685 i = k + 1;
686 else if (off > freeme)
687 j = k;
688 else
689 i = j = k;
690 }
691 BUG_ON(off != freeme);
692 572
693 if (i < chunk->first_free) 573 bit_off = off / PCPU_MIN_ALLOC_SIZE;
694 chunk->first_free = i;
695 574
696 p = chunk->map + i; 575 /* find end index */
697 *p = off &= ~1; 576 end = find_next_bit(chunk->bound_map, pcpu_chunk_map_bits(chunk),
698 chunk->free_size += (p[1] & ~1) - off; 577 bit_off + 1);
578 bits = end - bit_off;
579 bitmap_clear(chunk->alloc_map, bit_off, bits);
699 580
700 *occ_pages_p = pcpu_count_occupied_pages(chunk, i); 581 /* update metadata */
582 chunk->free_bytes += bits * PCPU_MIN_ALLOC_SIZE;
701 583
702 /* merge with next? */ 584 pcpu_chunk_refresh_hint(chunk);
703 if (!(p[1] & 1))
704 to_free++;
705 /* merge with previous? */
706 if (i > 0 && !(p[-1] & 1)) {
707 to_free++;
708 i--;
709 p--;
710 }
711 if (to_free) {
712 chunk->map_used -= to_free;
713 memmove(p + 1, p + 1 + to_free,
714 (chunk->map_used - i) * sizeof(chunk->map[0]));
715 }
716 585
717 chunk->contig_hint = max(chunk->map[i + 1] - chunk->map[i] - 1, chunk->contig_hint);
718 pcpu_chunk_relocate(chunk, oslot); 586 pcpu_chunk_relocate(chunk, oslot);
719} 587}
720 588
589/**
590 * pcpu_alloc_first_chunk - creates chunks that serve the first chunk
591 * @tmp_addr: the start of the region served
592 * @map_size: size of the region served
593 *
594 * This is responsible for creating the chunks that serve the first chunk. The
595 * base_addr is page aligned down of @tmp_addr while the region end is page
596 * aligned up. Offsets are kept track of to determine the region served. All
597 * this is done to appease the bitmap allocator in avoiding partial blocks.
598 *
599 * RETURNS:
600 * Chunk serving the region at @tmp_addr of @map_size.
601 */
721static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr, 602static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
722 int map_size, 603 int map_size)
723 int *map,
724 int init_map_size)
725{ 604{
726 struct pcpu_chunk *chunk; 605 struct pcpu_chunk *chunk;
727 unsigned long aligned_addr; 606 unsigned long aligned_addr;
728 int start_offset, region_size; 607 int start_offset, offset_bits, region_size, region_bits;
729 608
730 /* region calculations */ 609 /* region calculations */
731 aligned_addr = tmp_addr & PAGE_MASK; 610 aligned_addr = tmp_addr & PAGE_MASK;
@@ -740,83 +619,99 @@ static struct pcpu_chunk * __init pcpu_alloc_first_chunk(unsigned long tmp_addr,
740 0); 619 0);
741 620
742 INIT_LIST_HEAD(&chunk->list); 621 INIT_LIST_HEAD(&chunk->list);
743 INIT_LIST_HEAD(&chunk->map_extend_list);
744 622
745 chunk->base_addr = (void *)aligned_addr; 623 chunk->base_addr = (void *)aligned_addr;
746 chunk->start_offset = start_offset; 624 chunk->start_offset = start_offset;
747 chunk->end_offset = region_size - chunk->start_offset - map_size; 625 chunk->end_offset = region_size - chunk->start_offset - map_size;
748 626
749 chunk->nr_pages = region_size >> PAGE_SHIFT; 627 chunk->nr_pages = region_size >> PAGE_SHIFT;
628 region_bits = pcpu_chunk_map_bits(chunk);
750 629
751 chunk->map = map; 630 chunk->alloc_map = memblock_virt_alloc(
752 chunk->map_alloc = init_map_size; 631 BITS_TO_LONGS(region_bits) *
632 sizeof(chunk->alloc_map[0]), 0);
633 chunk->bound_map = memblock_virt_alloc(
634 BITS_TO_LONGS(region_bits + 1) *
635 sizeof(chunk->bound_map[0]), 0);
753 636
754 /* manage populated page bitmap */ 637 /* manage populated page bitmap */
755 chunk->immutable = true; 638 chunk->immutable = true;
756 bitmap_fill(chunk->populated, chunk->nr_pages); 639 bitmap_fill(chunk->populated, chunk->nr_pages);
757 chunk->nr_populated = chunk->nr_pages; 640 chunk->nr_populated = chunk->nr_pages;
758 chunk->nr_empty_pop_pages = chunk->nr_pages; 641 chunk->nr_empty_pop_pages =
642 pcpu_cnt_pop_pages(chunk, start_offset / PCPU_MIN_ALLOC_SIZE,
643 map_size / PCPU_MIN_ALLOC_SIZE);
759 644
760 chunk->contig_hint = chunk->free_size = map_size; 645 chunk->contig_bits = map_size / PCPU_MIN_ALLOC_SIZE;
646 chunk->free_bytes = map_size;
761 647
762 if (chunk->start_offset) { 648 if (chunk->start_offset) {
763 /* hide the beginning of the bitmap */ 649 /* hide the beginning of the bitmap */
764 chunk->nr_empty_pop_pages--; 650 offset_bits = chunk->start_offset / PCPU_MIN_ALLOC_SIZE;
765 651 bitmap_set(chunk->alloc_map, 0, offset_bits);
766 chunk->map[0] = 1; 652 set_bit(0, chunk->bound_map);
767 chunk->map[1] = chunk->start_offset; 653 set_bit(offset_bits, chunk->bound_map);
768 chunk->map_used = 1;
769 } 654 }
770 655
771 /* set chunk's free region */
772 chunk->map[++chunk->map_used] =
773 (chunk->start_offset + chunk->free_size) | 1;
774
775 if (chunk->end_offset) { 656 if (chunk->end_offset) {
776 /* hide the end of the bitmap */ 657 /* hide the end of the bitmap */
777 chunk->nr_empty_pop_pages--; 658 offset_bits = chunk->end_offset / PCPU_MIN_ALLOC_SIZE;
778 659 bitmap_set(chunk->alloc_map,
779 chunk->map[++chunk->map_used] = region_size | 1; 660 pcpu_chunk_map_bits(chunk) - offset_bits,
661 offset_bits);
662 set_bit((start_offset + map_size) / PCPU_MIN_ALLOC_SIZE,
663 chunk->bound_map);
664 set_bit(region_bits, chunk->bound_map);
780 } 665 }
781 666
667 pcpu_chunk_refresh_hint(chunk);
668
782 return chunk; 669 return chunk;
783} 670}
784 671
785static struct pcpu_chunk *pcpu_alloc_chunk(void) 672static struct pcpu_chunk *pcpu_alloc_chunk(void)
786{ 673{
787 struct pcpu_chunk *chunk; 674 struct pcpu_chunk *chunk;
675 int region_bits;
788 676
789 chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size); 677 chunk = pcpu_mem_zalloc(pcpu_chunk_struct_size);
790 if (!chunk) 678 if (!chunk)
791 return NULL; 679 return NULL;
792 680
793 chunk->map = pcpu_mem_zalloc(PCPU_DFL_MAP_ALLOC * 681 INIT_LIST_HEAD(&chunk->list);
794 sizeof(chunk->map[0])); 682 chunk->nr_pages = pcpu_unit_pages;
795 if (!chunk->map) { 683 region_bits = pcpu_chunk_map_bits(chunk);
796 pcpu_mem_free(chunk);
797 return NULL;
798 }
799 684
800 chunk->map_alloc = PCPU_DFL_MAP_ALLOC; 685 chunk->alloc_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits) *
801 chunk->map[0] = 0; 686 sizeof(chunk->alloc_map[0]));
802 chunk->map[1] = pcpu_unit_size | 1; 687 if (!chunk->alloc_map)
803 chunk->map_used = 1; 688 goto alloc_map_fail;
804 689
805 INIT_LIST_HEAD(&chunk->list); 690 chunk->bound_map = pcpu_mem_zalloc(BITS_TO_LONGS(region_bits + 1) *
806 INIT_LIST_HEAD(&chunk->map_extend_list); 691 sizeof(chunk->bound_map[0]));
807 chunk->free_size = pcpu_unit_size; 692 if (!chunk->bound_map)
808 chunk->contig_hint = pcpu_unit_size; 693 goto bound_map_fail;
809 694
810 chunk->nr_pages = pcpu_unit_pages; 695 /* init metadata */
696 chunk->contig_bits = region_bits;
697 chunk->free_bytes = chunk->nr_pages * PAGE_SIZE;
811 698
812 return chunk; 699 return chunk;
700
701bound_map_fail:
702 pcpu_mem_free(chunk->alloc_map);
703alloc_map_fail:
704 pcpu_mem_free(chunk);
705
706 return NULL;
813} 707}
814 708
815static void pcpu_free_chunk(struct pcpu_chunk *chunk) 709static void pcpu_free_chunk(struct pcpu_chunk *chunk)
816{ 710{
817 if (!chunk) 711 if (!chunk)
818 return; 712 return;
819 pcpu_mem_free(chunk->map); 713 pcpu_mem_free(chunk->bound_map);
714 pcpu_mem_free(chunk->alloc_map);
820 pcpu_mem_free(chunk); 715 pcpu_mem_free(chunk);
821} 716}
822 717
@@ -825,13 +720,17 @@ static void pcpu_free_chunk(struct pcpu_chunk *chunk)
825 * @chunk: pcpu_chunk which got populated 720 * @chunk: pcpu_chunk which got populated
826 * @page_start: the start page 721 * @page_start: the start page
827 * @page_end: the end page 722 * @page_end: the end page
723 * @for_alloc: if this is to populate for allocation
828 * 724 *
829 * Pages in [@page_start,@page_end) have been populated to @chunk. Update 725 * Pages in [@page_start,@page_end) have been populated to @chunk. Update
830 * the bookkeeping information accordingly. Must be called after each 726 * the bookkeeping information accordingly. Must be called after each
831 * successful population. 727 * successful population.
728 *
729 * If this is @for_alloc, do not increment pcpu_nr_empty_pop_pages because it
730 * is to serve an allocation in that area.
832 */ 731 */
833static void pcpu_chunk_populated(struct pcpu_chunk *chunk, 732static void pcpu_chunk_populated(struct pcpu_chunk *chunk, int page_start,
834 int page_start, int page_end) 733 int page_end, bool for_alloc)
835{ 734{
836 int nr = page_end - page_start; 735 int nr = page_end - page_start;
837 736
@@ -839,8 +738,11 @@ static void pcpu_chunk_populated(struct pcpu_chunk *chunk,
839 738
840 bitmap_set(chunk->populated, page_start, nr); 739 bitmap_set(chunk->populated, page_start, nr);
841 chunk->nr_populated += nr; 740 chunk->nr_populated += nr;
842 chunk->nr_empty_pop_pages += nr; 741
843 pcpu_nr_empty_pop_pages += nr; 742 if (!for_alloc) {
743 chunk->nr_empty_pop_pages += nr;
744 pcpu_nr_empty_pop_pages += nr;
745 }
844} 746}
845 747
846/** 748/**
@@ -945,19 +847,23 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
945 struct pcpu_chunk *chunk; 847 struct pcpu_chunk *chunk;
946 const char *err; 848 const char *err;
947 bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL; 849 bool is_atomic = (gfp & GFP_KERNEL) != GFP_KERNEL;
948 int occ_pages = 0; 850 int slot, off, cpu, ret;
949 int slot, off, new_alloc, cpu, ret;
950 unsigned long flags; 851 unsigned long flags;
951 void __percpu *ptr; 852 void __percpu *ptr;
853 size_t bits, bit_align;
952 854
953 /* 855 /*
954 * We want the lowest bit of offset available for in-use/free 856 * There is now a minimum allocation size of PCPU_MIN_ALLOC_SIZE,
955 * indicator, so force >= 16bit alignment and make size even. 857 * therefore alignment must be a minimum of that many bytes.
858 * An allocation may have internal fragmentation from rounding up
859 * of up to PCPU_MIN_ALLOC_SIZE - 1 bytes.
956 */ 860 */
957 if (unlikely(align < PCPU_MIN_ALLOC_SIZE)) 861 if (unlikely(align < PCPU_MIN_ALLOC_SIZE))
958 align = PCPU_MIN_ALLOC_SIZE; 862 align = PCPU_MIN_ALLOC_SIZE;
959 863
960 size = ALIGN(size, PCPU_MIN_ALLOC_SIZE); 864 size = ALIGN(size, PCPU_MIN_ALLOC_SIZE);
865 bits = size >> PCPU_MIN_ALLOC_SHIFT;
866 bit_align = align >> PCPU_MIN_ALLOC_SHIFT;
961 867
962 if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE || 868 if (unlikely(!size || size > PCPU_MIN_UNIT_SIZE || align > PAGE_SIZE ||
963 !is_power_of_2(align))) { 869 !is_power_of_2(align))) {
@@ -975,23 +881,13 @@ static void __percpu *pcpu_alloc(size_t size, size_t align, bool reserved,
975 if (reserved && pcpu_reserved_chunk) { 881 if (reserved && pcpu_reserved_chunk) {
976 chunk = pcpu_reserved_chunk; 882 chunk = pcpu_reserved_chunk;
977 883
978 if (size > chunk->contig_hint) { 884 off = pcpu_find_block_fit(chunk, bits, bit_align, is_atomic);
885 if (off < 0) {
979 err = "alloc from reserved chunk failed"; 886 err = "alloc from reserved chunk failed";
980 goto fail_unlock; 887 goto fail_unlock;
981 } 888 }
982 889
983 while ((new_alloc = pcpu_need_to_extend(chunk, is_atomic))) { 890 off = pcpu_alloc_area(chunk, bits, bit_align, off);
984 spin_unlock_irqrestore(&pcpu_lock, flags);
985 if (is_atomic ||
986 pcpu_extend_area_map(chunk, new_alloc) < 0) {
987 err = "failed to extend area map of reserved chunk";
988 goto fail;
989 }
990 spin_lock_irqsave(&pcpu_lock, flags);
991 }
992
993 off = pcpu_alloc_area(chunk, size, align, is_atomic,
994 &occ_pages);
995 if (off >= 0) 891 if (off >= 0)
996 goto area_found; 892 goto area_found;
997 893
@@ -1003,31 +899,15 @@ restart:
1003 /* search through normal chunks */ 899 /* search through normal chunks */
1004 for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) { 900 for (slot = pcpu_size_to_slot(size); slot < pcpu_nr_slots; slot++) {
1005 list_for_each_entry(chunk, &pcpu_slot[slot], list) { 901 list_for_each_entry(chunk, &pcpu_slot[slot], list) {
1006 if (size > chunk->contig_hint) 902 off = pcpu_find_block_fit(chunk, bits, bit_align,
903 is_atomic);
904 if (off < 0)
1007 continue; 905 continue;
1008 906
1009 new_alloc = pcpu_need_to_extend(chunk, is_atomic); 907 off = pcpu_alloc_area(chunk, bits, bit_align, off);
1010 if (new_alloc) {
1011 if (is_atomic)
1012 continue;
1013 spin_unlock_irqrestore(&pcpu_lock, flags);
1014 if (pcpu_extend_area_map(chunk,
1015 new_alloc) < 0) {
1016 err = "failed to extend area map";
1017 goto fail;
1018 }
1019 spin_lock_irqsave(&pcpu_lock, flags);
1020 /*
1021 * pcpu_lock has been dropped, need to
1022 * restart cpu_slot list walking.
1023 */
1024 goto restart;
1025 }
1026
1027 off = pcpu_alloc_area(chunk, size, align, is_atomic,
1028 &occ_pages);
1029 if (off >= 0) 908 if (off >= 0)
1030 goto area_found; 909 goto area_found;
910
1031 } 911 }
1032 } 912 }
1033 913
@@ -1077,23 +957,17 @@ area_found:
1077 957
1078 spin_lock_irqsave(&pcpu_lock, flags); 958 spin_lock_irqsave(&pcpu_lock, flags);
1079 if (ret) { 959 if (ret) {
1080 pcpu_free_area(chunk, off, &occ_pages); 960 pcpu_free_area(chunk, off);
1081 err = "failed to populate"; 961 err = "failed to populate";
1082 goto fail_unlock; 962 goto fail_unlock;
1083 } 963 }
1084 pcpu_chunk_populated(chunk, rs, re); 964 pcpu_chunk_populated(chunk, rs, re, true);
1085 spin_unlock_irqrestore(&pcpu_lock, flags); 965 spin_unlock_irqrestore(&pcpu_lock, flags);
1086 } 966 }
1087 967
1088 mutex_unlock(&pcpu_alloc_mutex); 968 mutex_unlock(&pcpu_alloc_mutex);
1089 } 969 }
1090 970
1091 if (chunk != pcpu_reserved_chunk) {
1092 spin_lock_irqsave(&pcpu_lock, flags);
1093 pcpu_nr_empty_pop_pages -= occ_pages;
1094 spin_unlock_irqrestore(&pcpu_lock, flags);
1095 }
1096
1097 if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW) 971 if (pcpu_nr_empty_pop_pages < PCPU_EMPTY_POP_PAGES_LOW)
1098 pcpu_schedule_balance_work(); 972 pcpu_schedule_balance_work();
1099 973
@@ -1211,7 +1085,6 @@ static void pcpu_balance_workfn(struct work_struct *work)
1211 if (chunk == list_first_entry(free_head, struct pcpu_chunk, list)) 1085 if (chunk == list_first_entry(free_head, struct pcpu_chunk, list))
1212 continue; 1086 continue;
1213 1087
1214 list_del_init(&chunk->map_extend_list);
1215 list_move(&chunk->list, &to_free); 1088 list_move(&chunk->list, &to_free);
1216 } 1089 }
1217 1090
@@ -1230,25 +1103,6 @@ static void pcpu_balance_workfn(struct work_struct *work)
1230 pcpu_destroy_chunk(chunk); 1103 pcpu_destroy_chunk(chunk);
1231 } 1104 }
1232 1105
1233 /* service chunks which requested async area map extension */
1234 do {
1235 int new_alloc = 0;
1236
1237 spin_lock_irq(&pcpu_lock);
1238
1239 chunk = list_first_entry_or_null(&pcpu_map_extend_chunks,
1240 struct pcpu_chunk, map_extend_list);
1241 if (chunk) {
1242 list_del_init(&chunk->map_extend_list);
1243 new_alloc = pcpu_need_to_extend(chunk, false);
1244 }
1245
1246 spin_unlock_irq(&pcpu_lock);
1247
1248 if (new_alloc)
1249 pcpu_extend_area_map(chunk, new_alloc);
1250 } while (chunk);
1251
1252 /* 1106 /*
1253 * Ensure there are certain number of free populated pages for 1107 * Ensure there are certain number of free populated pages for
1254 * atomic allocs. Fill up from the most packed so that atomic 1108 * atomic allocs. Fill up from the most packed so that atomic
@@ -1296,7 +1150,7 @@ retry_pop:
1296 if (!ret) { 1150 if (!ret) {
1297 nr_to_pop -= nr; 1151 nr_to_pop -= nr;
1298 spin_lock_irq(&pcpu_lock); 1152 spin_lock_irq(&pcpu_lock);
1299 pcpu_chunk_populated(chunk, rs, rs + nr); 1153 pcpu_chunk_populated(chunk, rs, rs + nr, false);
1300 spin_unlock_irq(&pcpu_lock); 1154 spin_unlock_irq(&pcpu_lock);
1301 } else { 1155 } else {
1302 nr_to_pop = 0; 1156 nr_to_pop = 0;
@@ -1335,7 +1189,7 @@ void free_percpu(void __percpu *ptr)
1335 void *addr; 1189 void *addr;
1336 struct pcpu_chunk *chunk; 1190 struct pcpu_chunk *chunk;
1337 unsigned long flags; 1191 unsigned long flags;
1338 int off, occ_pages; 1192 int off;
1339 1193
1340 if (!ptr) 1194 if (!ptr)
1341 return; 1195 return;
@@ -1349,13 +1203,10 @@ void free_percpu(void __percpu *ptr)
1349 chunk = pcpu_chunk_addr_search(addr); 1203 chunk = pcpu_chunk_addr_search(addr);
1350 off = addr - chunk->base_addr; 1204 off = addr - chunk->base_addr;
1351 1205
1352 pcpu_free_area(chunk, off, &occ_pages); 1206 pcpu_free_area(chunk, off);
1353
1354 if (chunk != pcpu_reserved_chunk)
1355 pcpu_nr_empty_pop_pages += occ_pages;
1356 1207
1357 /* if there are more than one fully free chunks, wake up grim reaper */ 1208 /* if there are more than one fully free chunks, wake up grim reaper */
1358 if (chunk->free_size == pcpu_unit_size) { 1209 if (chunk->free_bytes == pcpu_unit_size) {
1359 struct pcpu_chunk *pos; 1210 struct pcpu_chunk *pos;
1360 1211
1361 list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list) 1212 list_for_each_entry(pos, &pcpu_slot[pcpu_nr_slots - 1], list)
@@ -1651,8 +1502,6 @@ static void pcpu_dump_alloc_info(const char *lvl,
1651int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai, 1502int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
1652 void *base_addr) 1503 void *base_addr)
1653{ 1504{
1654 static int smap[PERCPU_DYNAMIC_EARLY_SLOTS] __initdata;
1655 static int dmap[PERCPU_DYNAMIC_EARLY_SLOTS] __initdata;
1656 size_t size_sum = ai->static_size + ai->reserved_size + ai->dyn_size; 1505 size_t size_sum = ai->static_size + ai->reserved_size + ai->dyn_size;
1657 size_t static_size, dyn_size; 1506 size_t static_size, dyn_size;
1658 struct pcpu_chunk *chunk; 1507 struct pcpu_chunk *chunk;
@@ -1787,8 +1636,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
1787 */ 1636 */
1788 tmp_addr = (unsigned long)base_addr + static_size; 1637 tmp_addr = (unsigned long)base_addr + static_size;
1789 map_size = ai->reserved_size ?: dyn_size; 1638 map_size = ai->reserved_size ?: dyn_size;
1790 chunk = pcpu_alloc_first_chunk(tmp_addr, map_size, smap, 1639 chunk = pcpu_alloc_first_chunk(tmp_addr, map_size);
1791 ARRAY_SIZE(smap));
1792 1640
1793 /* init dynamic chunk if necessary */ 1641 /* init dynamic chunk if necessary */
1794 if (ai->reserved_size) { 1642 if (ai->reserved_size) {
@@ -1797,8 +1645,7 @@ int __init pcpu_setup_first_chunk(const struct pcpu_alloc_info *ai,
1797 tmp_addr = (unsigned long)base_addr + static_size + 1645 tmp_addr = (unsigned long)base_addr + static_size +
1798 ai->reserved_size; 1646 ai->reserved_size;
1799 map_size = dyn_size; 1647 map_size = dyn_size;
1800 chunk = pcpu_alloc_first_chunk(tmp_addr, map_size, dmap, 1648 chunk = pcpu_alloc_first_chunk(tmp_addr, map_size);
1801 ARRAY_SIZE(dmap));
1802 } 1649 }
1803 1650
1804 /* link the first chunk in */ 1651 /* link the first chunk in */
@@ -2375,36 +2222,6 @@ void __init setup_per_cpu_areas(void)
2375#endif /* CONFIG_SMP */ 2222#endif /* CONFIG_SMP */
2376 2223
2377/* 2224/*
2378 * First and reserved chunks are initialized with temporary allocation
2379 * map in initdata so that they can be used before slab is online.
2380 * This function is called after slab is brought up and replaces those
2381 * with properly allocated maps.
2382 */
2383void __init percpu_init_late(void)
2384{
2385 struct pcpu_chunk *target_chunks[] =
2386 { pcpu_first_chunk, pcpu_reserved_chunk, NULL };
2387 struct pcpu_chunk *chunk;
2388 unsigned long flags;
2389 int i;
2390
2391 for (i = 0; (chunk = target_chunks[i]); i++) {
2392 int *map;
2393 const size_t size = PERCPU_DYNAMIC_EARLY_SLOTS * sizeof(map[0]);
2394
2395 BUILD_BUG_ON(size > PAGE_SIZE);
2396
2397 map = pcpu_mem_zalloc(size);
2398 BUG_ON(!map);
2399
2400 spin_lock_irqsave(&pcpu_lock, flags);
2401 memcpy(map, chunk->map, size);
2402 chunk->map = map;
2403 spin_unlock_irqrestore(&pcpu_lock, flags);
2404 }
2405}
2406
2407/*
2408 * Percpu allocator is initialized early during boot when neither slab or 2225 * Percpu allocator is initialized early during boot when neither slab or
2409 * workqueue is available. Plug async management until everything is up 2226 * workqueue is available. Plug async management until everything is up
2410 * and running. 2227 * and running.