author      Dan Magenheimer <dan.magenheimer@oracle.com>    2013-05-20 10:52:17 -0400
committer   Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2013-05-20 11:21:04 -0400
commit      8bb3e55103b37869175333e00fc01b34b0459529 (patch)
tree        96eb4df3801d92460a82b708014406f06df2bdd5
parent      642f2ecc092f4d2d5a9b7219090531508017c324 (diff)
staging: ramster: add how-to document
Add how-to documentation that provides a step-by-step guide
for configuring and trying out a ramster cluster.
Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-rw-r--r--   drivers/staging/zcache/ramster/ramster-howto.txt   366
1 file changed, 366 insertions(+), 0 deletions(-)
diff --git a/drivers/staging/zcache/ramster/ramster-howto.txt b/drivers/staging/zcache/ramster/ramster-howto.txt
new file mode 100644
index 000000000000..7b1ee3bbfdd5
--- /dev/null
+++ b/drivers/staging/zcache/ramster/ramster-howto.txt
@@ -0,0 +1,366 @@
RAMSTER HOW-TO

Author: Dan Magenheimer
Ramster maintainer: Konrad Wilk <konrad.wilk@oracle.com>

This is a HOWTO document for ramster which, as of this writing, is in
the kernel as a subdirectory of zcache in drivers/staging, called ramster.
(Zcache can be built with or without ramster functionality.)  If enabled
and properly configured, ramster allows memory capacity load balancing
across multiple machines in a cluster.  Further, the ramster code serves
as an example of asynchronous access for zcache (as well as cleancache and
frontswap) that may prove useful for future transcendent memory
implementations, such as KVM and NVRAM.  While ramster works today on
any network connection that supports kernel sockets, its features may
become more interesting on future high-speed fabrics/interconnects.

Ramster requires both kernel and userland support.  The userland support,
called ramster-tools, is known to work with EL6-based distros, but is a
set of poorly-hacked slightly-modified cluster tools based on ocfs2, which
includes an init file, a config file, and a userland binary that interfaces
to the kernel.  This state of userland support reflects the abysmal userland
skills of this suitably-embarrassed author; any help/patches to turn
ramster-tools into more distributable rpms/debs useful for a wider range
of distros would be appreciated.  The source RPM that can be used as a
starting point is available at:
    http://oss.oracle.com/projects/tmem/files/RAMster/

As a result of this author's ignorance, userland setup described in this
HOWTO assumes an EL6 distro and is described in EL6 syntax.  Apologies
if this offends anyone!

Kernel support has only been tested on x86_64.  Systems with an active
ocfs2 filesystem should work, but since ramster leverages a lot of
code from ocfs2, there may be latent issues.  A kernel configuration that
includes CONFIG_OCFS2_FS should build OK, and should certainly run OK
if no ocfs2 filesystem is mounted.
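
A simple way to confirm that no ocfs2 filesystem is currently mounted
(purely an optional sanity check, not part of the setup sequence below):

        # mount | grep ocfs2

No output means nothing of type ocfs2 is mounted.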

This HOWTO demonstrates memory capacity load balancing for a two-node
cluster, where one node called the "local" node becomes overcommitted
and the other node called the "remote" node provides additional RAM
capacity for use by the local node.  Ramster is capable of more complex
topologies; see the last section titled "ADVANCED RAMSTER TOPOLOGIES".

If you find any terms in this HOWTO unfamiliar or don't understand the
motivation for ramster, the following LWN reading is recommended:
-- Transcendent Memory in a Nutshell (lwn.net/Articles/454795)
-- The future calculus of memory management (lwn.net/Articles/475681)
And since ramster is built on top of zcache, this article may be helpful:
-- In-kernel memory compression (lwn.net/Articles/545244)

Now that you've memorized the contents of those articles, let's get started!

A. PRELIMINARY

1) Install two x86_64 Linux systems that are known to work when
   upgraded to a recent upstream Linux kernel version.

On each system:

2) Configure, build and install, then boot Linux, just to ensure it
   can be done with an unmodified upstream kernel.  Confirm you booted
   the upstream kernel with "uname -a".

3) If you plan to do any performance testing, or plan to test anything
   other than swapping, the "WasActive" patch is also highly recommended.
   (Search lkml.org for WasActive, apply the patch, rebuild your kernel.)
   For a demo or simple testing, the patch can be ignored.

4) Install ramster-tools as root.  An x86_64 rpm for EL6-based systems
   can be found at:
    http://oss.oracle.com/projects/tmem/files/RAMster/
   (Sorry but for now, non-EL6 users must recreate ramster-tools on
   their own from source.  See above.)

5) Ensure that debugfs is mounted at each boot.  Examples below assume it
   is mounted at /sys/kernel/debug.
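
   For example (a minimal sketch; adapt to your distro's conventions),
   debugfs can be mounted by hand with:

        # mount -t debugfs none /sys/kernel/debug

   and made persistent across boots with an /etc/fstab line such as:

        debugfs  /sys/kernel/debug  debugfs  defaults  0 0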

B. BUILDING RAMSTER INTO THE KERNEL

Do the following on each system:

1) Using the kernel configuration mechanism of your choice, change
   your config to include:

        CONFIG_CLEANCACHE=y
        CONFIG_FRONTSWAP=y
        CONFIG_STAGING=y
        CONFIG_CONFIGFS_FS=y   # NOTE: MUST BE y, not m
        CONFIG_ZCACHE=y
        CONFIG_RAMSTER=y

   For a linux-3.10 or later kernel, you should also set:

        CONFIG_ZCACHE_DEBUG=y
        CONFIG_RAMSTER_DEBUG=y

   Before building the kernel please doublecheck your kernel config
   file to ensure all of the settings are correct.
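
   One quick way to do that (assuming you build in the kernel source
   tree, so the configuration is in ./.config) is:

        # egrep 'CLEANCACHE|FRONTSWAP|STAGING|CONFIGFS_FS|ZCACHE|RAMSTER' .config

   and confirm that each of the options listed above appears with "=y".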

2) Build this kernel and change your boot file (e.g. /etc/grub.conf)
   so that the new kernel will boot.

3) Add "zcache" and "ramster" as kernel boot parameters for the new kernel.
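
   For example, on an EL6 system using /etc/grub.conf, the kernel line
   for the new kernel might end up looking something like this (the
   kernel version and root device below are only illustrative):

        kernel /vmlinuz-3.10.0-ramster ro root=/dev/sda1 zcache ramster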

4) Reboot each system approximately simultaneously.

5) Check dmesg to ensure there are some messages from ramster, prefixed
   by "ramster:"

        # dmesg | grep ramster

   You should also see a lot of files in:

        # ls /sys/kernel/debug/zcache
        # ls /sys/kernel/debug/ramster

   These are mostly counters for various zcache and ramster activities.
   You should also see files in:

        # ls /sys/kernel/mm/ramster

   These are sysfs files that control ramster as we shall see.

   Ramster will now act as a single-system zcache on each system, but it
   doesn't yet know anything about the cluster, so it can't yet do
   anything remotely.

C. CONFIGURING THE RAMSTER CLUSTER

This part can be error-prone unless you are familiar with clustering
filesystems.  We need to describe the cluster in a /etc/ramster.conf
file, and the init scripts that parse it are extremely picky about
the syntax.

1) Create a /etc/ramster.conf file and ensure it is identical on both
   systems.  This file mimics the ocfs2 format and there is a good amount
   of documentation that can be searched for ocfs2.conf, but you can use:

        cluster:
                name = ramster
                node_count = 2
        node:
                name = system1
                cluster = ramster
                number = 0
                ip_address = my.ip.ad.r1
                ip_port = 7777
        node:
                name = system2
                cluster = ramster
                number = 1
                ip_address = my.ip.ad.r2
                ip_port = 7777

   You must ensure that the "name" field in the file exactly matches
   the output of "hostname" on each system; if "hostname" shows a
   fully-qualified hostname, ensure the name is fully qualified in
   /etc/ramster.conf.  Obviously, substitute my.ip.ad.rx with proper
   ip addresses.
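
   A quick way to compare the two (the grep pattern here is only a
   sketch) is:

        # hostname
        # grep "name =" /etc/ramster.conf

   The hostname printed by the first command must appear verbatim as one
   of the node "name" values printed by the second (the cluster "name"
   line will also match; ignore it).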

2) Enable the ramster service and configure it.  If you used the
   EL6 ramster-tools, this would be:

        # chkconfig --add ramster
        # service ramster configure

   Set "load on boot" to "y", the cluster to start to "ramster" (or
   whatever name you chose in ramster.conf), the heartbeat dead threshold
   to "500", and the network idle timeout to "1000000".  Leave the others
   at their defaults.

3) Reboot both systems.  After reboot, try (assuming EL6 ramster-tools):

        # service ramster status

   You should see "Checking RAMSTER cluster "ramster": Online".  If you do
   not, something is wrong and ramster will not work.  Note that you
   should also see that the driver for "configfs" is loaded and mounted,
   the driver for ocfs2_dlmfs is not loaded, and some numbers for network
   parameters.  You will also see "Checking RAMSTER heartbeat: Not active".
   That's all OK.

4) Now you need to start the cluster heartbeat; the cluster is not "up"
   until all nodes detect a heartbeat.  In a real cluster, heartbeat detection
   is done via a cluster filesystem, but ramster doesn't require one.  Some
   hack-y kernel code in ramster can start the heartbeat for you though if
   you tell it what nodes are "up".  To enable the heartbeat, do:

        # echo 0 > /sys/kernel/mm/ramster/manual_node_up
        # echo 1 > /sys/kernel/mm/ramster/manual_node_up

   This must be done on BOTH nodes and, to avoid timeouts, must be done
   approximately concurrently on both nodes.  On an EL6 system, it is
   convenient to put these lines in /etc/rc.local.  To confirm that the
   cluster is now up, on both systems do:

        # dmesg | grep ramster

   You should see ramster "Accepted connection" messages in dmesg on both
   nodes after this.  Note that if you check userland status again with

        # service ramster status

   you will still see "Checking RAMSTER heartbeat: Not active".  That's
   still OK... the ramster kernel heartbeat hack doesn't communicate to
   userland.

5) You must now tell each node the node to which it should "remotify" pages.
   On this two-node cluster, we will assume the "local" node, node 0, has
   memory overcommitted and will use ramster to utilize RAM capacity on
   the "remote node", node 1.  To configure this, on node 0, you do:

        # echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

   You should see "ramster: node 1 set as remotification target" in dmesg
   on node 0.  Again, on EL6, /etc/rc.local is a good place to put this
   on node 0 so you don't forget to do it at each boot.

6) One more step:  By default, the ramster code does not "remotify" any
   pages; this default exists primarily for testing purposes, though it
   is sometimes useful in its own right.  This may change in the future,
   but for now, on node 0, you do:

        # echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
        # echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable

   The first enables remotifying of swap (persistent, aka frontswap) pages;
   the second enables remotifying of page cache (ephemeral, cleancache)
   pages.

   On EL6, these lines can also be put in /etc/rc.local (AFTER the
   node_up lines), or at the beginning of a script that runs a workload.
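
   Putting the pieces from steps 4-6 together, the end of /etc/rc.local
   on node 0 might look something like the following sketch (node 1 would
   keep only the two manual_node_up lines, unless it is also configured
   to remotify pages):

        echo 0 > /sys/kernel/mm/ramster/manual_node_up
        echo 1 > /sys/kernel/mm/ramster/manual_node_up
        echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum
        echo 1 > /sys/kernel/mm/ramster/pers_remotify_enable
        echo 1 > /sys/kernel/mm/ramster/eph_remotify_enable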

7) Note that most testing has been done with both/all machines booted
   roughly simultaneously to avoid cluster timeouts.  Ideally, you should
   do this too unless you are trying to break ramster rather than just
   use it. ;-)

D. TESTING RAMSTER

1) Note that ramster has no value unless pages get "remotified".  For
   swap/frontswap/persistent pages, this doesn't happen unless/until
   the workload would cause swapping to occur, at which point pages
   are put into frontswap/zcache, and the remotification thread starts
   working.  To get to the point where the system swaps, you either
   need a workload for which the working set exceeds the RAM in the
   system, or you need to somehow reduce the amount of RAM one of
   the systems sees.  The latter is easy when testing in a VM, but
   harder on physical systems.  In some cases, "mem=xxxM" on the
   kernel command line restricts memory, but for some values of xxx
   the kernel may fail to boot.  One may also try creating a fixed
   RAMdisk, doing nothing with it, but ensuring that it eats up a fixed
   amount of RAM.
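
   One way to do the RAMdisk trick (a sketch only, assuming the "brd"
   ramdisk driver is available as a module; rd_size is in KB, so the
   example below pins roughly 1GB of RAM) is:

        # modprobe brd rd_nr=1 rd_size=1048576
        # dd if=/dev/zero of=/dev/ram0 bs=1M count=1024

   The pages backing /dev/ram0 stay allocated and are not swappable, so
   the effective RAM available to the rest of the system shrinks.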

2) To see if ramster is working, on the "remote node", node 1, try:

        # grep . /sys/kernel/debug/ramster/foreign_*
        # # note, that is space-dot-space between grep and the pathname

   to monitor the number (and max) of ephemeral and persistent pages
   that ramster has sent.  If these stay at zero, ramster is not working,
   either because the workload on the local node (node 0) isn't creating
   enough memory pressure or because "remotifying" isn't working.  On the
   local system, node 0, you can also watch lots of useful information.
   Try:

        grep . /sys/kernel/debug/zcache/*pageframes* \
                /sys/kernel/debug/zcache/*zbytes* \
                /sys/kernel/debug/zcache/*zpages* \
                /sys/kernel/debug/ramster/*remote*

   Of particular note are the remote_*_pages_succ_get counters.  These
   show how many disk reads and/or disk writes have been avoided on the
   overcommitted local system by storing pages remotely using ramster.
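
   To watch these counters change while a workload runs, something like
   the following (purely a convenience) works well:

        # watch -n 5 'grep . /sys/kernel/debug/ramster/*remote*'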

   At the risk of information overload, you can also grep:

        /sys/kernel/debug/cleancache/* and /sys/kernel/debug/frontswap/*

   These show, for example, how many disk reads and/or disk writes have
   been avoided by using zcache to optimize RAM on the local system.


AUTOMATIC SWAP REPATRIATION

You may notice that while the systems are idle, the foreign persistent
page count on the remote machine slowly decreases.  This is because
ramster implements "frontswap selfshrinking":  When possible, swap
pages that have been remotified are slowly repatriated to the local
machine.  This is so that local RAM can be used when possible and
so that, in case of remote machine crash, the probability of loss
of data is reduced.

REBOOTING / POWEROFF

If a system is shut down while some of its swap pages still reside
on a remote system, the system may lock up during the shutdown
sequence.  This will occur if the network is shut down before the
swap mechanism is shut down, which is the default ordering on many
distros.  To avoid this annoying problem, simply shut off the swap
subsystem before starting the shutdown sequence, e.g.:

        # swapoff -a
        # reboot

Ideally, this swapoff-before-ifdown ordering should be enforced permanently
using shutdown scripts.
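
Until your distro's shutdown scripts are adjusted, one low-tech option
(a sketch only; the script name and location are arbitrary) is a small
wrapper that is used instead of plain "reboot":

        #!/bin/sh
        # /usr/local/sbin/reboot-ramster: turn off swap first so that no
        # swap pages are still parked on a remote node when the network
        # goes down during shutdown
        swapoff -a
        reboot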

KNOWN PROBLEMS

1) You may periodically see messages such as:

        ramster_r2net, message length problem

   This is harmless but indicates that a node is sending messages
   containing compressed pages that exceed the maximum for zcache
   (PAGE_SIZE*15/16).  The sender side needs to be fixed.

2) If you see a "No longer connected to node..." message or a "No connection
   established with node X after N seconds" message, you may be in an
   unrecoverable state.  If you are certain all of the appropriate
   cluster configuration steps described above have been performed, try
   rebooting the two servers concurrently to see if the cluster starts.

   Note that "Connection to node... shutdown, state 7" is an intermediate
   connection state.  As long as you later see "Accepted connection", the
   intermediate states are harmless.

3) There are known issues in counting certain values.  As a result
   you may see periodic warnings from the kernel.  Almost always you
   will see "ramster: bad accounting for XXX".  There are also "WARN_ONCE"
   messages.  If you see kernel warnings with a tombstone, please report
   them.  They are harmless but reflect bugs that need to be eventually fixed.

ADVANCED RAMSTER TOPOLOGIES

The kernel code for ramster can support up to eight nodes in a cluster,
but no testing has been done with more than three nodes.

In the example described above, the "remote" node serves as a RAM
overflow for the "local" node.  This can be made symmetric by appropriate
settings of the sysfs remote_target_nodenum file.  For example, by setting:

        # echo 1 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 0, and

        # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 1, each node can serve as a RAM overflow for the other.

For more than two nodes, a "RAM server" can be configured.  For a
three-node system, set:

        # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 1, and

        # echo 0 > /sys/kernel/mm/ramster/remote_target_nodenum

on node 2.  Then node 0 is a RAM server for node 1 and node 2.

In this implementation of ramster, any remote node is potentially a single
point of failure (SPOF).  Though the probability of failure is reduced
by automatic swap repatriation (see above), a proposed future enhancement
to ramster improves high availability for the cluster by sending a copy
of each page of data to two other nodes.  Patches welcome!