From 3a4d5c94e959359ece6d6b55045c3f046677f55c Mon Sep 17 00:00:00 2001
From: "Michael S. Tsirkin"
Date: Thu, 14 Jan 2010 06:17:27 +0000
Subject: vhost_net: a kernel-level virtio server

What it is: vhost net is a character device that can be used to reduce
the number of system calls involved in virtio networking.
Existing virtio net code is used in the guest without modification.

There's similarity with vringfd, with some differences and reduced scope
- uses eventfd for signalling
- structures can be moved around in memory at any time (good for
  migration, bug work-arounds in userspace)
- write logging is supported (good for migration)
- support memory table and not just an offset (needed for kvm)

common virtio related code has been put in a separate file vhost.c and
can be made into a separate module if/when more backends appear. I used
Rusty's lguest.c as the source for developing this part: this supplied
me with witty comments I wouldn't be able to write myself.

What it is not: vhost net is not a bus, and not a generic new system
call. No assumptions are made on how guest performs hypercalls.
Userspace hypervisors are supported as well as kvm.

How it works: Basically, we connect virtio frontend (configured by
userspace) to a backend. The backend could be a network device, or a tap
device. Backend is also configured by userspace, including vlan/mac
etc.

Status: This works for me, and I haven't seen any crashes.
Compared to userspace, people reported improved latency (as I save up to
4 system calls per packet), as well as better bandwidth and CPU
utilization.

Features that I plan to look at in the future:
- mergeable buffers
- zero copy
- scalability tuning: figure out the best threading model to use

Note on RCU usage (this is also documented in vhost.h, near
private_pointer which is the value protected by this variant of RCU):
what is happening is that the rcu_dereference() is being used in a
workqueue item.
The role of rcu_read_lock() is taken on by the start of
execution of the workqueue item, of rcu_read_unlock() by the end of
execution of the workqueue item, and of synchronize_rcu() by
flush_workqueue()/flush_work(). In the future we might need to apply
some gcc attribute or sparse annotation to the function passed to
INIT_WORK(). Paul's ack below is for this RCU usage.

(Includes fixes by Alan Cox, David L Stevens, Chris Wright)

Acked-by: Rusty Russell
Acked-by: Arnd Bergmann
Acked-by: "Paul E. McKenney"
Signed-off-by: Michael S. Tsirkin
Signed-off-by: David S. Miller
---
 include/linux/miscdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux/miscdevice.h')

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index adaf3c15e449..8b5f7cc0fba6 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,6 +30,7 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
+#define VHOST_NET_MINOR 233
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
-- 
cgit v1.2.2

From 578454ff7eab61d13a26b568f99a89a2c9edc881 Mon Sep 17 00:00:00 2001
From: Kay Sievers
Date: Thu, 20 May 2010 18:07:20 +0200
Subject: driver core: add devname module aliases to allow module on-demand auto-loading

This adds:
  alias: devname:
to some common kernel modules, which will allow the on-demand loading
of the kernel module when the device node is accessed.

Ideally all these modules would be compiled-in, but distros seem too
much in love with their modularization, so we need to cover the common
cases with this new facility. It will allow us to remove a bunch of
pretty useless init scripts and modprobes from init scripts.

The static device node aliases will be carried in the module itself. The
program depmod will extract this information to a file in the module
directory:
  $ cat /lib/modules/2.6.34-00650-g537b60d-dirty/modules.devname
  # Device nodes to trigger on-demand module loading.
  microcode cpu/microcode c10:184
  fuse fuse c10:229
  ppp_generic ppp c108:0
  tun net/tun c10:200
  dm_mod mapper/control c10:235

Udev will pick up the depmod created file on startup and create all the
static device nodes which the kernel modules specify, so that these
modules get automatically loaded when the device node is accessed:
  $ /sbin/udevd --debug
  ...
  static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
  static_dev_create_from_modules: mknod '/dev/fuse' c10:229
  static_dev_create_from_modules: mknod '/dev/ppp' c108:0
  static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
  static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
  udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
  udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666

A few device nodes are switched to statically allocated numbers, to allow
the static nodes to work. This might also be useful for systems which
still run a plain static /dev, which is completely unsafe to use with any
dynamic minor numbers.

Note:
The devname aliases must be limited to the *common* and *single*instance*
device nodes, like the misc devices, and never be used for conceptually
limited systems like the loop devices, which should rather get fixed
properly and get a control node for losetup to talk to, instead of
creating a random number of device nodes in advance, regardless if they
are ever used.

This facility is to hide the mess distros are creating with too
modularized kernels, and just to hide that these modules are not
compiled-in, and not to paper-over broken concepts. Thanks! :)

Cc: Greg Kroah-Hartman
Cc: David S. Miller
Cc: Miklos Szeredi
Cc: Chris Mason
Cc: Alasdair G Kergon
Cc: Tigran Aivazian
Cc: Ian Kent
Signed-Off-By: Kay Sievers
Signed-off-by: Greg Kroah-Hartman
---
 include/linux/miscdevice.h | 2 ++
 1 file changed, 2 insertions(+)

(limited to 'include/linux/miscdevice.h')

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 8b5f7cc0fba6..b631c46cffd9 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -31,6 +31,8 @@
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
 #define VHOST_NET_MINOR 233
+#define BTRFS_MINOR 234
+#define AUTOFS_MINOR 235
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
-- 
cgit v1.2.2

From 79907d89c397b8bc2e05b347ec94e928ea919d33 Mon Sep 17 00:00:00 2001
From: Alan Cox
Date: Wed, 9 Jun 2010 09:39:49 +0100
Subject: misc: Fix allocation 'borrowed' by vhost_net

10, 233 is allocated officially to /dev/kmview which is shipping in
Ubuntu and Debian distributions. vhost_net seems to have borrowed it
without making a proper request and this causes regressions in the other
distributions.

vhost_net can use a dynamic minor so use that instead. Also update the
file with a comment to try and avoid future misunderstandings.

cc: stable@kernel.org
Signed-off-by: Alan Cox
[ We should have caught this before 2.6.34 got released. - Linus ]
Signed-off-by: Linus Torvalds
---
 include/linux/miscdevice.h | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

(limited to 'include/linux/miscdevice.h')

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index b631c46cffd9..f6c9b7dcb9fd 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -3,6 +3,12 @@
 #include
 #include
 
+/*
+ * These allocations are managed by device@lanana.org. If you use an
+ * entry that is not in assigned your entry may well be moved and
+ * reassigned, or set dynamic if a fixed value is not justified.
+ */
+
 #define PSMOUSE_MINOR 1
 #define MS_BUSMOUSE_MINOR 2
 #define ATIXL_BUSMOUSE_MINOR 3
@@ -30,7 +36,6 @@
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
-#define VHOST_NET_MINOR 233
 #define BTRFS_MINOR 234
 #define AUTOFS_MINOR 235
 #define MISC_DYNAMIC_MINOR 255
-- 
cgit v1.2.2

From 7e507eb6432afdd798d4c6dccf949b8c43ef151c Mon Sep 17 00:00:00 2001
From: Peter Rajnoha
Date: Thu, 12 Aug 2010 04:14:05 +0100
Subject: dm: allow autoloading of dm mod

Add devname:mapper/control and MAPPER_CTRL_MINOR module alias
to support dm-mod module autoloading.

Signed-off-by: Kay Sievers
Signed-off-by: Peter Rajnoha
Signed-off-by: Alasdair G Kergon
---
 include/linux/miscdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux/miscdevice.h')

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index f6c9b7dcb9fd..bafffc737903 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -38,6 +38,7 @@
 #define KVM_MINOR 232
 #define BTRFS_MINOR 234
 #define AUTOFS_MINOR 235
+#define MAPPER_CTRL_MINOR 236
 #define MISC_DYNAMIC_MINOR 255
 
 struct device;
-- 
cgit v1.2.2

From 8905aaafb4b5d9764c5b4b54c7d03eb41bb0a7e9 Mon Sep 17 00:00:00 2001
From: Kay Sievers
Date: Thu, 19 Aug 2010 09:52:28 -0700
Subject: Input: uinput - add devname alias to allow module on-demand load

Recent modprobe and udev versions allow creating device nodes for
modules which are not loaded. Only the first access will cause the
in-kernel module loader to pull in the module.

Systems which never access the device node will not needlessly load the
module, and no longer need init scripts or other facilities to
unconditionally load it.

Signed-off-by: Kay Sievers
Signed-off-by: Dmitry Torokhov
---
 include/linux/miscdevice.h | 1 +
 1 file changed, 1 insertion(+)

(limited to 'include/linux/miscdevice.h')

diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index bafffc737903..18fd13028ba1 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -33,6 +33,7 @@
 #define MWAVE_MINOR 219 /* ACP/Mwave Modem */
 #define MPT_MINOR 220
 #define MPT2SAS_MINOR 221
+#define UINPUT_MINOR 223
 #define HPET_MINOR 228
 #define FUSE_MINOR 229
 #define KVM_MINOR 232
-- 
cgit v1.2.2
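
Editor's note on the modules.devname format: the devname alias commit
in this series shows depmod writing entries such as "tun net/tun c10:200"
(module name, device node path relative to /dev, then a type letter
followed by major:minor). The userspace sketch below shows one way such
a line could be split into its fields. It is only an illustration: the
struct devname_entry type, the parse_devname_line() helper and the
buffer sizes are hypothetical names chosen here, not code from udev or
depmod, and accepting 'b' for block devices alongside 'c' is an
assumption, since the listing in the commit message only shows
character devices.

```c
#include <stdio.h>

/* One parsed entry; the field sizes are arbitrary choices for this sketch. */
struct devname_entry {
    char module[64];    /* kernel module name, e.g. "tun" */
    char node[128];     /* node path relative to /dev, e.g. "net/tun" */
    char type;          /* 'c' (character) or, by assumption, 'b' (block) */
    unsigned int major; /* e.g. 10 for misc devices */
    unsigned int minor; /* e.g. 200 for net/tun */
};

/*
 * Parse one line in the "module node cMAJOR:MINOR" layout shown in the
 * commit message. Returns 0 on success and -1 for comment lines, blank
 * lines, or anything that does not match the layout.
 */
int parse_devname_line(const char *line, struct devname_entry *out)
{
    if (line == NULL || out == NULL || line[0] == '#')
        return -1;

    /* Width limits keep sscanf from overflowing the fixed-size buffers. */
    int fields = sscanf(line, "%63s %127s %c%u:%u",
                        out->module, out->node, &out->type,
                        &out->major, &out->minor);
    if (fields != 5)
        return -1;
    if (out->type != 'c' && out->type != 'b')
        return -1;
    return 0;
}
```

Calling parse_devname_line("tun net/tun c10:200", &e) fills e with the
module "tun", the node "net/tun", type 'c', major 10 and minor 200; the
"# Device nodes to trigger on-demand module loading." header line is
rejected with -1.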