| author | Ilya Dryomov <ilya.dryomov@inktank.com> | 2014-03-19 10:58:37 -0400 |
|---|---|---|
| committer | Sage Weil <sage@inktank.com> | 2014-04-05 00:07:26 -0400 |
| commit | e2b149cc4ba00766aceb87950c6de72ea7fc8b2e | |
| tree | 5f3d7b5dd55b7f75c412db786e1e6f4915ef9ed8 /include/linux | |
| parent | 6ed1002f368c63ef79d7f659fcb4368a90098132 | |
crush: add chooseleaf_vary_r tunable

The current crush_choose_firstn code will re-use the same 'r' value for
the recursive call. That means that if we are hitting a collision or
rejection for some reason (say, an OSD that is marked out) and need to
retry, we will keep making the same (bad) choice in that recursive
selection.

Introduce a tunable that fixes that behavior by incorporating the parent
'r' value into the recursive starting point, so that a different path
will be taken in subsequent placement attempts.

Note that this was done from the get-go for the new crush_choose_indep
algorithm.

This was exposed by a user who was seeing PGs stuck in active+remapped
after reweight-by-utilization because the up set mapped to a single OSD.

Reflects ceph.git commit a8e6c9fbf88bad056dd05d3eb790e98a5e43451a.

Signed-off-by: Ilya Dryomov <ilya.dryomov@inktank.com>
Reviewed-by: Josh Durgin <josh.durgin@inktank.com>

Diffstat (limited to 'include/linux')
| -rw-r--r-- | include/linux/crush/crush.h | 6 |
1 files changed, 6 insertions, 0 deletions
diff --git a/include/linux/crush/crush.h b/include/linux/crush/crush.h
index acaa5615d634..75f36a6c7f67 100644
--- a/include/linux/crush/crush.h
+++ b/include/linux/crush/crush.h
@@ -173,6 +173,12 @@ struct crush_map {
 	 * apply to a collision: in that case we will retry as we used
 	 * to. */
 	__u32 chooseleaf_descend_once;
+
+	/* if non-zero, feed r into chooseleaf, bit-shifted right by (r-1)
+	 * bits. a value of 1 is best for new clusters. for legacy clusters
+	 * that want to limit reshuffling, a value of 3 or 4 will make the
+	 * mappings line up a bit better with previous mappings. */
+	__u8 chooseleaf_vary_r;
 };
 
 
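
The shift rule described in the new header comment can be illustrated with a small sketch. This is not the mapper code from this commit (which is outside the 'include/linux' diff shown here). The helper name chooseleaf_sub_r is hypothetical, the sketch reads the "(r-1)" in the comment as the tunable's value minus one, and it assumes that a tunable value of 0 means the recursive call always starts from the same fixed value (0 here).

```c
#include <stdio.h>

/*
 * Hypothetical helper, not a kernel function: compute the starting 'r'
 * handed to the recursive chooseleaf selection.
 *
 *   r       - the parent's current replica/attempt value
 *   vary_r  - value of the chooseleaf_vary_r tunable
 *
 * Assumption: with vary_r == 0 the recursion always starts from 0, so a
 * retry in the parent repeats the same choice in the recursive selection.
 */
static unsigned int chooseleaf_sub_r(unsigned int r, unsigned int vary_r)
{
	if (vary_r == 0)
		return 0;		/* tunable off: parent r is ignored */
	return r >> (vary_r - 1);	/* tunable on: shift by (vary_r - 1) bits */
}

/* Print the recursive starting point for parent r = 0 .. max_r. */
static void print_sub_r_row(unsigned int vary_r, unsigned int max_r)
{
	unsigned int r;

	printf("vary_r=%u:", vary_r);
	for (r = 0; r <= max_r; r++)
		printf(" %u", chooseleaf_sub_r(r, vary_r));
	printf("\n");
}
```

Calling print_sub_r_row() with 1 passes the parent 'r' through unchanged, so every retry starts the recursive selection somewhere new. With 3 or 4 the starting point changes only every 4 or 8 parent attempts, which is consistent with the comment's remark that those values keep mappings closer to the previous ones.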
