author Linus Torvalds <torvalds@linux-foundation.org> 2015-06-25 19:00:17 -0400
committer Linus Torvalds <torvalds@linux-foundation.org> 2015-06-25 19:00:17 -0400
commit e4bc13adfd016fc1036838170288b5680d1a98b0
tree 8d2cb749397749439732f3a827cb7f2336408337 /Documentation/cgroups
parent ad90fb97515b732bc27a0109baa10af636c3c8cd
parent 3e1534cf4a2a8278e811e7c84a79da1a02347b8b
Merge branch 'for-4.2/writeback' of git://git.kernel.dk/linux-block

Pull cgroup writeback support from Jens Axboe:
 "This is the big pull request for adding cgroup writeback support.

  This code has been in development for a long time, and it has been
  simmering in for-next for a good chunk of this cycle too. This is one
  of those problems that has been talked about for at least half a
  decade, finally there's a solution and code to go with it.

  Also see last weeks writeup on LWN:

    http://lwn.net/Articles/648292/"

* 'for-4.2/writeback' of git://git.kernel.dk/linux-block: (85 commits)
  writeback, blkio: add documentation for cgroup writeback support
  vfs, writeback: replace FS_CGROUP_WRITEBACK with SB_I_CGROUPWB
  writeback: do foreign inode detection iff cgroup writeback is enabled
  v9fs: fix error handling in v9fs_session_init()
  bdi: fix wrong error return value in cgwb_create()
  buffer: remove unusued 'ret' variable
  writeback: disassociate inodes from dying bdi_writebacks
  writeback: implement foreign cgroup inode bdi_writeback switching
  writeback: add lockdep annotation to inode_to_wb()
  writeback: use unlocked_inode_to_wb transaction in inode_congested()
  writeback: implement unlocked_inode_to_wb transaction and use it for stat updates
  writeback: implement [locked_]inode_to_wb_and_lock_list()
  writeback: implement foreign cgroup inode detection
  writeback: make writeback_control track the inode being written back
  writeback: relocate wb[_try]_get(), wb_put(), inode_{attach|detach}_wb()
  mm: vmscan: disable memcg direct reclaim stalling if cgroup writeback support is in use
  writeback: implement memcg writeback domain based throttling
  writeback: reset wb_domain->dirty_limit[_tstmp] when memcg domain size changes
  writeback: implement memcg wb_domain
  writeback: update wb_over_bg_thresh() to use wb_domain aware operations
  ...

Diffstat (limited to 'Documentation/cgroups')
-rw-r--r-- Documentation/cgroups/blkio-controller.txt | 83
-rw-r--r-- Documentation/cgroups/memory.txt | 1
2 files changed, 79 insertions, 5 deletions
diff --git a/Documentation/cgroups/blkio-controller.txt b/Documentation/cgroups/blkio-controller.txt
index cd556b914786..68b6a6a470b0 100644
--- a/Documentation/cgroups/blkio-controller.txt
+++ b/Documentation/cgroups/blkio-controller.txt
@@ -387,8 +387,81 @@ groups and put applications in that group which are not driving enough
 IO to keep disk busy. In that case set group_idle=0, and CFQ will not idle
 on individual groups and throughput should improve.
 
-What works
-==========
-- Currently only sync IO queues are support. All the buffered writes are
-  still system wide and not per group. Hence we will not see service
-  differentiation between buffered writes between groups.
+Writeback
+=========
+
+Page cache is dirtied through buffered writes and shared mmaps and
+written asynchronously to the backing filesystem by the writeback
+mechanism. Writeback sits between the memory and IO domains and
+regulates the proportion of dirty memory by balancing dirtying and
+write IOs.
+
+On traditional cgroup hierarchies, relationships between different
+controllers cannot be established making it impossible for writeback
+to operate accounting for cgroup resource restrictions and all
+writeback IOs are attributed to the root cgroup.
+
+If both the blkio and memory controllers are used on the v2 hierarchy
+and the filesystem supports cgroup writeback, writeback operations
+correctly follow the resource restrictions imposed by both memory and
+blkio controllers.
+
+Writeback examines both system-wide and per-cgroup dirty memory status
+and enforces the more restrictive of the two. Also, writeback control
+parameters which are absolute values - vm.dirty_bytes and
+vm.dirty_background_bytes - are distributed across cgroups according
+to their current writeback bandwidth.
+
+There's a peculiarity stemming from the discrepancy in ownership
+granularity between memory controller and writeback. While memory
+controller tracks ownership per page, writeback operates on inode
+basis. cgroup writeback bridges the gap by tracking ownership by
+inode but migrating ownership if too many foreign pages, pages which
+don't match the current inode ownership, have been encountered while
+writing back the inode.
+
+This is a conscious design choice as writeback operations are
+inherently tied to inodes making strictly following page ownership
+complicated and inefficient. The only use case which suffers from
+this compromise is multiple cgroups concurrently dirtying disjoint
+regions of the same inode, which is an unlikely use case and decided
+to be unsupported. Note that as memory controller assigns page
+ownership on the first use and doesn't update it until the page is
+released, even if cgroup writeback strictly follows page ownership,
+multiple cgroups dirtying overlapping areas wouldn't work as expected.
+In general, write-sharing an inode across multiple cgroups is not well
+supported.
+
+Filesystem support for cgroup writeback
+---------------------------------------
+
+A filesystem can make writeback IOs cgroup-aware by updating
+address_space_operations->writepage[s]() to annotate bio's using the
+following two functions.
+
+* wbc_init_bio(@wbc, @bio)
+
+  Should be called for each bio carrying writeback data and associates
+  the bio with the inode's owner cgroup. Can be called anytime
+  between bio allocation and submission.
+
+* wbc_account_io(@wbc, @page, @bytes)
+
+  Should be called for each data segment being written out. While
+  this function doesn't care exactly when it's called during the
+  writeback session, it's the easiest and most natural to call it as
+  data segments are added to a bio.
+
+With writeback bio's annotated, cgroup support can be enabled per
+super_block by setting MS_CGROUPWB in ->s_flags. This allows for
+selective disabling of cgroup writeback support which is helpful when
+certain filesystem features, e.g. journaled data mode, are
+incompatible.
+
+wbc_init_bio() binds the specified bio to its cgroup. Depending on
+the configuration, the bio may be executed at a lower priority and if
+the writeback session is holding shared resources, e.g. a journal
+entry, may lead to priority inversion. There is no one easy solution
+for the problem. Filesystems can try to work around specific problem
+cases by skipping wbc_init_bio() or using bio_associate_blkcg()
+directly.
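
Note on the inode ownership rule in the Writeback section of the hunk above: the text says an inode has a single owner cgroup and that ownership migrates once too many foreign pages have been seen during writeback. The userspace sketch below is only a toy model of that idea, not the kernel's detection code. Every name in it (toy_inode, toy_majority_owner, toy_writeback_round, TOY_SWITCH_ROUNDS) is hypothetical, and both the majority test and the threshold of three rounds are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names exist in the kernel. */
struct toy_inode {
    int owner_cgroup;      /* cgroup currently charged for writeback */
    int foreign_rounds;    /* consecutive rounds won by one foreign cgroup */
    int foreign_candidate; /* the foreign cgroup that won those rounds */
};

/* Assumed threshold: switch after this many consecutive foreign rounds. */
#define TOY_SWITCH_ROUNDS 3

/*
 * Find the cgroup owning a strict majority of the pages in one round.
 * Returns 1 and stores it in *winner, or 0 if no cgroup has a majority.
 */
int toy_majority_owner(const int *page_owner, size_t npages, int *winner)
{
    size_t i, votes = 0, count = 0;
    int candidate = -1;

    /* First pass: pick a candidate by pairing off differing owners. */
    for (i = 0; i < npages; i++) {
        if (votes == 0) {
            candidate = page_owner[i];
            votes = 1;
        } else if (page_owner[i] == candidate) {
            votes++;
        } else {
            votes--;
        }
    }
    /* Second pass: confirm the candidate really owns more than half. */
    for (i = 0; i < npages; i++)
        if (page_owner[i] == candidate)
            count++;
    if (npages == 0 || count * 2 <= npages)
        return 0;
    *winner = candidate;
    return 1;
}

/*
 * Account one writeback round over the inode's pages.
 * Returns 1 if ownership migrated to a foreign cgroup, 0 otherwise.
 */
int toy_writeback_round(struct toy_inode *inode, const int *page_owner,
                        size_t npages)
{
    int winner;

    assert(inode != NULL);
    if (!toy_majority_owner(page_owner, npages, &winner) ||
        winner == inode->owner_cgroup) {
        inode->foreign_rounds = 0; /* current owner still dominates */
        return 0;
    }
    if (inode->foreign_rounds > 0 && winner != inode->foreign_candidate)
        inode->foreign_rounds = 0; /* a different foreign cgroup took over */
    inode->foreign_candidate = winner;
    if (++inode->foreign_rounds < TOY_SWITCH_ROUNDS)
        return 0;
    inode->owner_cgroup = winner;  /* migrate ownership */
    inode->foreign_rounds = 0;
    return 1;
}
```

In this model, two cgroups dirtying disjoint halves of one file never produce a strict majority, so ownership stays where it is and one cgroup keeps being charged for the other's IO. That is the unsupported case the documentation calls out.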
diff --git a/Documentation/cgroups/memory.txt b/Documentation/cgroups/memory.txt
index f456b4315e86..ff71e16cc752 100644
--- a/Documentation/cgroups/memory.txt
+++ b/Documentation/cgroups/memory.txt
@@ -493,6 +493,7 @@ pgpgin - # of charging events to the memory cgroup. The charging
 pgpgout - # of uncharging events to the memory cgroup. The uncharging
         event happens each time a page is unaccounted from the cgroup.
 swap - # of bytes of swap usage
+dirty - # of bytes that are waiting to get written back to the disk.
 writeback - # of bytes of file/anon cache that are queued for syncing to
         disk.
 inactive_anon - # of bytes of anonymous and swap cache memory on inactive
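
Note on the new "dirty" field: memory.stat reports one "name value" pair per line, so the added counter can be read the same way as the existing "writeback" one. The helper below is a minimal userspace sketch under that assumption. stat_lookup is a hypothetical name, and it works on text the caller has already read from a group's memory.stat file.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up `key` in memory.stat-style text ("name value" per line).
 * Returns 0 and stores the value on success, -1 if the key is absent.
 * Hypothetical helper for illustration; not part of any kernel or libc API.
 */
int stat_lookup(const char *text, const char *key, unsigned long long *value)
{
    const char *line = text;
    size_t klen;

    assert(text != NULL && key != NULL && value != NULL);
    klen = strlen(key);

    while (line && *line) {
        /* Match the whole field name, so "dirty" does not hit "total_dirty". */
        if (strncmp(line, key, klen) == 0 && line[klen] == ' ' &&
            sscanf(line + klen, "%llu", value) == 1)
            return 0;
        line = strchr(line, '\n'); /* advance to the next line, if any */
        if (line)
            line++;
    }
    return -1;
}
```

Calling stat_lookup(buf, "dirty", &v) and stat_lookup(buf, "writeback", &v) on the same buffer gives the bytes still waiting to be written back and the bytes already queued for syncing, respectively.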