slub: drop mutex before deleting sysfs entry

Sasha Levin recently reported a lockdep problem resulting from the new attribute propagation introduced by kmemcg series. In short, slab_mutex will be called from within the sysfs attribute store function. This will create a dependency, that will later be held backwards when a cache is destroyed - since destruction occurs with the slab_mutex held, and then calls in to the sysfs directory removal function. In this patch, I propose to adopt a strategy close to what __kmem_cache_create does before calling sysfs_slab_add, and release the lock before the call to sysfs_slab_remove. This is pretty much the last operation in the kmem_cache_shutdown() path, so we could do better by splitting this and moving this call alone to later on. This will fit nicely when sysfs handling is consistent between all caches, but will look weird now. Lockdep info: ====================================================== [ INFO: possible circular locking dependency detected ] 3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Tainted: G W ------------------------------------------------------- trinity-child13/6961 is trying to acquire lock: (s_active#43){++++.+}, at: sysfs_addrm_finish+0x31/0x60 but task is already holding lock: (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x22/0xe0 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (slab_mutex){+.+.+.}: lock_acquire+0x1aa/0x240 __mutex_lock_common+0x59/0x5a0 mutex_lock_nested+0x3f/0x50 slab_attr_store+0xde/0x110 sysfs_write_file+0xfa/0x150 vfs_write+0xb0/0x180 sys_pwrite64+0x60/0xb0 tracesys+0xe1/0xe6 -> #0 (s_active#43){++++.+}: __lock_acquire+0x14df/0x1ca0 lock_acquire+0x1aa/0x240 sysfs_deactivate+0x122/0x1a0 sysfs_addrm_finish+0x31/0x60 sysfs_remove_dir+0x89/0xd0 kobject_del+0x16/0x40 __kmem_cache_shutdown+0x40/0x60 kmem_cache_destroy+0x40/0xe0 mon_text_release+0x78/0xe0 __fput+0x122/0x2d0 ____fput+0x9/0x10 task_work_run+0xbe/0x100 do_exit+0x432/0xbd0 do_group_exit+0x84/0xd0 get_signal_to_deliver+0x81d/0x930 do_signal+0x3a/0x950 do_notify_resume+0x3e/0x90 int_signal+0x12/0x17 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(slab_mutex); lock(s_active#43); lock(slab_mutex); lock(s_active#43); *** DEADLOCK *** 2 locks held by trinity-child13/6961: #0: (mon_lock){+.+.+.}, at: mon_text_release+0x25/0xe0 #1: (slab_mutex){+.+.+.}, at: kmem_cache_destroy+0x22/0xe0 stack backtrace: Pid: 6961, comm: trinity-child13 Tainted: G W 3.7.0-rc4-next-20121106-sasha-00008-g353b62f #117 Call Trace: print_circular_bug+0x1fb/0x20c __lock_acquire+0x14df/0x1ca0 lock_acquire+0x1aa/0x240 sysfs_deactivate+0x122/0x1a0 sysfs_addrm_finish+0x31/0x60 sysfs_remove_dir+0x89/0xd0 kobject_del+0x16/0x40 __kmem_cache_shutdown+0x40/0x60 kmem_cache_destroy+0x40/0xe0 mon_text_release+0x78/0xe0 __fput+0x122/0x2d0 ____fput+0x9/0x10 task_work_run+0xbe/0x100 do_exit+0x432/0xbd0 do_group_exit+0x84/0xd0 get_signal_to_deliver+0x81d/0x930 do_signal+0x3a/0x950 do_notify_resume+0x3e/0x90 int_signal+0x12/0x17 Signed-off-by: Glauber Costa <glommer@parallels.com> Reported-by: Sasha Levin <sasha.levin@oracle.com> Cc: Michal Hocko <mhocko@suse.cz> Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Christoph Lameter <cl@linux-foundation.org> Cc: Pekka Enberg <penberg@kernel.org> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
author: Glauber Costa <glommer@parallels.com> 2012-12-18 17:23:13 -0500
committer: Linus Torvalds <torvalds@linux-foundation.org> 2012-12-18 18:02:15 -0500
commit: 5413dfba88d6f6090c8cdf181ab9172d24752f8f (patch)
tree: 6d8d90ef6a342d35d753533f3a06b3f2d9dd3dd1 /mm/slub.c
parent: ebe945c27628fca03723582eba138acc2e2f3d15 (diff)
1 files changed, 12 insertions, 1 deletions
diff --git a/mm/slub.c b/mm/slub.c
index efe2cffc29b0..ba2ca53f6c3a 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3153,8 +3153,19 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
 {
        int rc = kmem_cache_close(s);
-        if (!rc)
+        if (!rc) {
+                /*
+                 * We do the same lock strategy around sysfs_slab_add, see
+                 * __kmem_cache_create. Because this is pretty much the last
+                 * operation we do and the lock will be released shortly after
+                 * that in slab_common.c, we could just move sysfs_slab_remove
+                 * to a later point in common code. We should do that when we
+                 * have a common sysfs framework for all allocators.
+                 */
+                mutex_unlock(&slab_mutex);
                sysfs_slab_remove(s);
+                mutex_lock(&slab_mutex);
+        }
        return rc;
 }
author	Glauber Costa <glommer@parallels.com>	2012-12-18 17:23:13 -0500
committer	Linus Torvalds <torvalds@linux-foundation.org>	2012-12-18 18:02:15 -0500
commit	5413dfba88d6f6090c8cdf181ab9172d24752f8f (patch)
tree	6d8d90ef6a342d35d753533f3a06b3f2d9dd3dd1 /mm/slub.c
parent	ebe945c27628fca03723582eba138acc2e2f3d15 (diff)

diff --git a/mm/slub.c b/mm/slub.c index efe2cffc29b0..ba2ca53f6c3a 100644 --- a/mm/slub.c +++ b/mm/slub.c
@@ -3153,8 +3153,19 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
3153	{	3153	{
3154	int rc = kmem_cache_close(s);	3154	int rc = kmem_cache_close(s);
3155		3155
3156	if (!rc)	3156	if (!rc) {
		3157	/*
		3158	* We do the same lock strategy around sysfs_slab_add, see
		3159	* __kmem_cache_create. Because this is pretty much the last
		3160	* operation we do and the lock will be released shortly after
		3161	* that in slab_common.c, we could just move sysfs_slab_remove
		3162	* to a later point in common code. We should do that when we
		3163	* have a common sysfs framework for all allocators.
		3164	*/
		3165	mutex_unlock(&slab_mutex);
3157	sysfs_slab_remove(s);	3166	sysfs_slab_remove(s);
		3167	mutex_lock(&slab_mutex);
		3168	}
3158		3169
3159	return rc;	3170	return rc;
3160	}	3171	}