diff options
| author | Tejun Heo <htejun@gmail.com> | 2005-10-20 10:47:40 -0400 |
|---|---|---|
| committer | Jens Axboe <axboe@nelson.home.kernel.dk> | 2005-10-28 02:45:47 -0400 |
| commit | 4c9f7836406f41ef9da6ee68d7a0448fdb97b5ef (patch) | |
| tree | 9ed099a117a24bee56a5e920ea95f9e2bb82add6 /Documentation/block | |
| parent | 98b11471d72a374f346bec50a00d0887719b85b3 (diff) | |
[PATCH] 05/05 update biodoc to match new generic dispatch api
Updates biodoc to reflect changes in elevator API
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jens Axboe <axboe@suse.de>
Diffstat (limited to 'Documentation/block')
| -rw-r--r-- | Documentation/block/biodoc.txt | 113 |
1 files changed, 52 insertions, 61 deletions
diff --git a/Documentation/block/biodoc.txt b/Documentation/block/biodoc.txt index 6dd274d7e1..2d65c21821 100644 --- a/Documentation/block/biodoc.txt +++ b/Documentation/block/biodoc.txt | |||
| @@ -906,9 +906,20 @@ Aside: | |||
| 906 | 906 | ||
| 907 | 907 | ||
| 908 | 4. The I/O scheduler | 908 | 4. The I/O scheduler |
| 909 | I/O schedulers are now per queue. They should be runtime switchable and modular | 909 | I/O scheduler, a.k.a. elevator, is implemented in two layers. Generic dispatch |
| 910 | but aren't yet. Jens has most bits to do this, but the sysfs implementation is | 910 | queue and specific I/O schedulers. Unless stated otherwise, elevator is used |
| 911 | missing. | 911 | to refer to both parts and I/O scheduler to specific I/O schedulers. |
| 912 | |||
| 913 | Block layer implements generic dispatch queue in ll_rw_blk.c and elevator.c. | ||
| 914 | The generic dispatch queue is responsible for properly ordering barrier | ||
| 915 | requests, requeueing, handling non-fs requests and all other subtleties. | ||
| 916 | |||
| 917 | Specific I/O schedulers are responsible for ordering normal filesystem | ||
| 918 | requests. They can also choose to delay certain requests to improve | ||
| 919 | throughput or whatever purpose. As the plural form indicates, there are | ||
| 920 | multiple I/O schedulers. They can be built as modules but at least one should | ||
| 921 | be built inside the kernel. Each queue can choose different one and can also | ||
| 922 | change to another one dynamically. | ||
| 912 | 923 | ||
| 913 | A block layer call to the i/o scheduler follows the convention elv_xxx(). This | 924 | A block layer call to the i/o scheduler follows the convention elv_xxx(). This |
| 914 | calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh, | 925 | calls elevator_xxx_fn in the elevator switch (drivers/block/elevator.c). Oh, |
| @@ -921,44 +932,36 @@ keeping work. | |||
| 921 | The functions an elevator may implement are: (* are mandatory) | 932 | The functions an elevator may implement are: (* are mandatory) |
| 922 | elevator_merge_fn called to query requests for merge with a bio | 933 | elevator_merge_fn called to query requests for merge with a bio |
| 923 | 934 | ||
| 924 | elevator_merge_req_fn " " " with another request | 935 | elevator_merge_req_fn called when two requests get merged. the one |
| 936 | which gets merged into the other one will be | ||
| 937 | never seen by I/O scheduler again. IOW, after | ||
| 938 | being merged, the request is gone. | ||
| 925 | 939 | ||
| 926 | elevator_merged_fn called when a request in the scheduler has been | 940 | elevator_merged_fn called when a request in the scheduler has been |
| 927 | involved in a merge. It is used in the deadline | 941 | involved in a merge. It is used in the deadline |
| 928 | scheduler for example, to reposition the request | 942 | scheduler for example, to reposition the request |
| 929 | if its sorting order has changed. | 943 | if its sorting order has changed. |
| 930 | 944 | ||
| 931 | *elevator_next_req_fn returns the next scheduled request, or NULL | 945 | elevator_dispatch_fn fills the dispatch queue with ready requests. |
| 932 | if there are none (or none are ready). | 946 | I/O schedulers are free to postpone requests by |
| 947 | not filling the dispatch queue unless @force | ||
| 948 | is non-zero. Once dispatched, I/O schedulers | ||
| 949 | are not allowed to manipulate the requests - | ||
| 950 | they belong to generic dispatch queue. | ||
| 933 | 951 | ||
| 934 | *elevator_add_req_fn called to add a new request into the scheduler | 952 | elevator_add_req_fn called to add a new request into the scheduler |
| 935 | 953 | ||
| 936 | elevator_queue_empty_fn returns true if the merge queue is empty. | 954 | elevator_queue_empty_fn returns true if the merge queue is empty. |
| 937 | Drivers shouldn't use this, but rather check | 955 | Drivers shouldn't use this, but rather check |
| 938 | if elv_next_request is NULL (without losing the | 956 | if elv_next_request is NULL (without losing the |
| 939 | request if one exists!) | 957 | request if one exists!) |
| 940 | 958 | ||
| 941 | elevator_remove_req_fn This is called when a driver claims ownership of | ||
| 942 | the target request - it now belongs to the | ||
| 943 | driver. It must not be modified or merged. | ||
| 944 | Drivers must not lose the request! A subsequent | ||
| 945 | call of elevator_next_req_fn must return the | ||
| 946 | _next_ request. | ||
| 947 | |||
| 948 | elevator_requeue_req_fn called to add a request to the scheduler. This | ||
| 949 | is used when the request has alrnadebeen | ||
| 950 | returned by elv_next_request, but hasn't | ||
| 951 | completed. If this is not implemented then | ||
| 952 | elevator_add_req_fn is called instead. | ||
| 953 | |||
| 954 | elevator_former_req_fn | 959 | elevator_former_req_fn |
| 955 | elevator_latter_req_fn These return the request before or after the | 960 | elevator_latter_req_fn These return the request before or after the |
| 956 | one specified in disk sort order. Used by the | 961 | one specified in disk sort order. Used by the |
| 957 | block layer to find merge possibilities. | 962 | block layer to find merge possibilities. |
| 958 | 963 | ||
| 959 | elevator_completed_req_fn called when a request is completed. This might | 964 | elevator_completed_req_fn called when a request is completed. |
| 960 | come about due to being merged with another or | ||
| 961 | when the device completes the request. | ||
| 962 | 965 | ||
| 963 | elevator_may_queue_fn returns true if the scheduler wants to allow the | 966 | elevator_may_queue_fn returns true if the scheduler wants to allow the |
| 964 | current context to queue a new request even if | 967 | current context to queue a new request even if |
| @@ -967,13 +970,33 @@ elevator_may_queue_fn returns true if the scheduler wants to allow the | |||
| 967 | 970 | ||
| 968 | elevator_set_req_fn | 971 | elevator_set_req_fn |
| 969 | elevator_put_req_fn Must be used to allocate and free any elevator | 972 | elevator_put_req_fn Must be used to allocate and free any elevator |
| 970 | specific storate for a request. | 973 | specific storage for a request. |
| 974 | |||
| 975 | elevator_activate_req_fn Called when device driver first sees a request. | ||
| 976 | I/O schedulers can use this callback to | ||
| 977 | determine when actual execution of a request | ||
| 978 | starts. | ||
| 979 | elevator_deactivate_req_fn Called when device driver decides to delay | ||
| 980 | a request by requeueing it. | ||
| 971 | 981 | ||
| 972 | elevator_init_fn | 982 | elevator_init_fn |
| 973 | elevator_exit_fn Allocate and free any elevator specific storage | 983 | elevator_exit_fn Allocate and free any elevator specific storage |
| 974 | for a queue. | 984 | for a queue. |
| 975 | 985 | ||
| 976 | 4.2 I/O scheduler implementation | 986 | 4.2 Request flows seen by I/O schedulers |
| 987 | All requests seens by I/O schedulers strictly follow one of the following three | ||
| 988 | flows. | ||
| 989 | |||
| 990 | set_req_fn -> | ||
| 991 | |||
| 992 | i. add_req_fn -> (merged_fn ->)* -> dispatch_fn -> activate_req_fn -> | ||
| 993 | (deactivate_req_fn -> activate_req_fn ->)* -> completed_req_fn | ||
| 994 | ii. add_req_fn -> (merged_fn ->)* -> merge_req_fn | ||
| 995 | iii. [none] | ||
| 996 | |||
| 997 | -> put_req_fn | ||
| 998 | |||
| 999 | 4.3 I/O scheduler implementation | ||
| 977 | The generic i/o scheduler algorithm attempts to sort/merge/batch requests for | 1000 | The generic i/o scheduler algorithm attempts to sort/merge/batch requests for |
| 978 | optimal disk scan and request servicing performance (based on generic | 1001 | optimal disk scan and request servicing performance (based on generic |
| 979 | principles and device capabilities), optimized for: | 1002 | principles and device capabilities), optimized for: |
| @@ -993,18 +1016,7 @@ request in sort order to prevent binary tree lookups. | |||
| 993 | This arrangement is not a generic block layer characteristic however, so | 1016 | This arrangement is not a generic block layer characteristic however, so |
| 994 | elevators may implement queues as they please. | 1017 | elevators may implement queues as they please. |
| 995 | 1018 | ||
| 996 | ii. Last merge hint | 1019 | ii. Merge hash |
| 997 | The last merge hint is part of the generic queue layer. I/O schedulers must do | ||
| 998 | some management on it. For the most part, the most important thing is to make | ||
| 999 | sure q->last_merge is cleared (set to NULL) when the request on it is no longer | ||
| 1000 | a candidate for merging (for example if it has been sent to the driver). | ||
| 1001 | |||
| 1002 | The last merge performed is cached as a hint for the subsequent request. If | ||
| 1003 | sequential data is being submitted, the hint is used to perform merges without | ||
| 1004 | any scanning. This is not sufficient when there are multiple processes doing | ||
| 1005 | I/O though, so a "merge hash" is used by some schedulers. | ||
| 1006 | |||
| 1007 | iii. Merge hash | ||
| 1008 | AS and deadline use a hash table indexed by the last sector of a request. This | 1020 | AS and deadline use a hash table indexed by the last sector of a request. This |
| 1009 | enables merging code to quickly look up "back merge" candidates, even when | 1021 | enables merging code to quickly look up "back merge" candidates, even when |
| 1010 | multiple I/O streams are being performed at once on one disk. | 1022 | multiple I/O streams are being performed at once on one disk. |
| @@ -1013,29 +1025,8 @@ multiple I/O streams are being performed at once on one disk. | |||
| 1013 | are far less common than "back merges" due to the nature of most I/O patterns. | 1025 | are far less common than "back merges" due to the nature of most I/O patterns. |
| 1014 | Front merges are handled by the binary trees in AS and deadline schedulers. | 1026 | Front merges are handled by the binary trees in AS and deadline schedulers. |
| 1015 | 1027 | ||
| 1016 | iv. Handling barrier cases | 1028 | iii. Plugging the queue to batch requests in anticipation of opportunities for |
| 1017 | A request with flags REQ_HARDBARRIER or REQ_SOFTBARRIER must not be ordered | 1029 | merge/sort optimizations |
| 1018 | around. That is, they must be processed after all older requests, and before | ||
| 1019 | any newer ones. This includes merges! | ||
| 1020 | |||
| 1021 | In AS and deadline schedulers, barriers have the effect of flushing the reorder | ||
| 1022 | queue. The performance cost of this will vary from nothing to a lot depending | ||
| 1023 | on i/o patterns and device characteristics. Obviously they won't improve | ||
| 1024 | performance, so their use should be kept to a minimum. | ||
| 1025 | |||
| 1026 | v. Handling insertion position directives | ||
| 1027 | A request may be inserted with a position directive. The directives are one of | ||
| 1028 | ELEVATOR_INSERT_BACK, ELEVATOR_INSERT_FRONT, ELEVATOR_INSERT_SORT. | ||
| 1029 | |||
| 1030 | ELEVATOR_INSERT_SORT is a general directive for non-barrier requests. | ||
| 1031 | ELEVATOR_INSERT_BACK is used to insert a barrier to the back of the queue. | ||
| 1032 | ELEVATOR_INSERT_FRONT is used to insert a barrier to the front of the queue, and | ||
| 1033 | overrides the ordering requested by any previous barriers. In practice this is | ||
| 1034 | harmless and required, because it is used for SCSI requeueing. This does not | ||
| 1035 | require flushing the reorder queue, so does not impose a performance penalty. | ||
| 1036 | |||
| 1037 | vi. Plugging the queue to batch requests in anticipation of opportunities for | ||
| 1038 | merge/sort optimizations | ||
| 1039 | 1030 | ||
| 1040 | This is just the same as in 2.4 so far, though per-device unplugging | 1031 | This is just the same as in 2.4 so far, though per-device unplugging |
| 1041 | support is anticipated for 2.5. Also with a priority-based i/o scheduler, | 1032 | support is anticipated for 2.5. Also with a priority-based i/o scheduler, |
| @@ -1069,7 +1060,7 @@ Aside: | |||
| 1069 | blk_kick_queue() to unplug a specific queue (right away ?) | 1060 | blk_kick_queue() to unplug a specific queue (right away ?) |
| 1070 | or optionally, all queues, is in the plan. | 1061 | or optionally, all queues, is in the plan. |
| 1071 | 1062 | ||
| 1072 | 4.3 I/O contexts | 1063 | 4.4 I/O contexts |
| 1073 | I/O contexts provide a dynamically allocated per process data area. They may | 1064 | I/O contexts provide a dynamically allocated per process data area. They may |
| 1074 | be used in I/O schedulers, and in the block layer (could be used for IO statis, | 1065 | be used in I/O schedulers, and in the block layer (could be used for IO statis, |
| 1075 | priorities for example). See *io_context in drivers/block/ll_rw_blk.c, and | 1066 | priorities for example). See *io_context in drivers/block/ll_rw_blk.c, and |
