author     Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>  2015-11-10 19:10:45 -0500
committer  Dan Williams <dan.j.williams@intel.com>  2015-11-12 12:55:23 -0500
commit     8de5dff8bae634497f4413bc3067389f2ed267da
tree       14307943d7a9889c6140f2b79fa1a71f8fa745d4
parent     589e75d15702dc720b363a92f984876704864946
libnvdimm: documentation clarifications

A bunch of changes that I hope will help in understanding it better for
first-time readers.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>

Diffstat (limited to 'Documentation')
-rw-r--r-- Documentation/nvdimm/nvdimm.txt 49
1 files changed, 28 insertions, 21 deletions
diff --git a/Documentation/nvdimm/nvdimm.txt b/Documentation/nvdimm/nvdimm.txt
index 197a0b6b0582..e894de69915a 100644
--- a/Documentation/nvdimm/nvdimm.txt
+++ b/Documentation/nvdimm/nvdimm.txt
@@ -62,6 +62,12 @@ DAX: File system extensions to bypass the page cache and block layer to
 mmap persistent memory, from a PMEM block device, directly into a
 process address space.
 
+DSM: Device Specific Method: ACPI method to to control specific
+device - in this case the firmware.
+
+DCR: NVDIMM Control Region Structure defined in ACPI 6 Section 5.2.25.5.
+It defines a vendor-id, device-id, and interface format for a given DIMM.
+
 BTT: Block Translation Table: Persistent memory is byte addressable.
 Existing software may have an expectation that the power-fail-atomicity
 of writes is at least one sector, 512 bytes. The BTT is an indirection
@@ -133,16 +139,16 @@ device driver:
  registered, can be immediately attached to nd_pmem.
 
  2. BLK (nd_blk.ko): This driver performs I/O using a set of platform
- defined apertures. A set of apertures will all access just one DIMM.
- Multiple windows allow multiple concurrent accesses, much like
+ defined apertures. A set of apertures will access just one DIMM.
+ Multiple windows (apertures) allow multiple concurrent accesses, much like
  tagged-command-queuing, and would likely be used by different threads or
  different CPUs.
 
  The NFIT specification defines a standard format for a BLK-aperture, but
  the spec also allows for vendor specific layouts, and non-NFIT BLK
- implementations may other designs for BLK I/O. For this reason "nd_blk"
- calls back into platform-specific code to perform the I/O. One such
- implementation is defined in the "Driver Writer's Guide" and "DSM
+ implementations may have other designs for BLK I/O. For this reason
+ "nd_blk" calls back into platform-specific code to perform the I/O.
+ One such implementation is defined in the "Driver Writer's Guide" and "DSM
  Interface Example".
 
 
@@ -152,7 +158,7 @@ Why BLK?
 While PMEM provides direct byte-addressable CPU-load/store access to
 NVDIMM storage, it does not provide the best system RAS (recovery,
 availability, and serviceability) model. An access to a corrupted
-system-physical-address address causes a cpu exception while an access
+system-physical-address address causes a CPU exception while an access
 to a corrupted address through an BLK-aperture causes that block window
 to raise an error status in a register. The latter is more aligned with
 the standard error model that host-bus-adapter attached disks present.
@@ -162,7 +168,7 @@ data could be interleaved in an opaque hardware specific manner across
 several DIMMs.
 
 PMEM vs BLK
-BLK-apertures solve this RAS problem, but their presence is also the
+BLK-apertures solve these RAS problems, but their presence is also the
 major contributing factor to the complexity of the ND subsystem. They
 complicate the implementation because PMEM and BLK alias in DPA space.
 Any given DIMM's DPA-range may contribute to one or more
@@ -220,8 +226,8 @@ socket. Each unique interface (BLK or PMEM) to DPA space is identified
 by a region device with a dynamically assigned id (REGION0 - REGION5).
 
  1. The first portion of DIMM0 and DIMM1 are interleaved as REGION0. A
- single PMEM namespace is created in the REGION0-SPA-range that spans
- DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
+ single PMEM namespace is created in the REGION0-SPA-range that spans most
+ of DIMM0 and DIMM1 with a user-specified name of "pm0.0". Some of that
  interleaved system-physical-address range is reclaimed as BLK-aperture
  accessed space starting at DPA-offset (a) into each DIMM. In that
  reclaimed space we create two BLK-aperture "namespaces" from REGION2 and
@@ -230,13 +236,13 @@ by a region device with a dynamically assigned id (REGION0 - REGION5).
 
  2. In the last portion of DIMM0 and DIMM1 we have an interleaved
  system-physical-address range, REGION1, that spans those two DIMMs as
- well as DIMM2 and DIMM3. Some of REGION1 allocated to a PMEM namespace
- named "pm1.0" the rest is reclaimed in 4 BLK-aperture namespaces (for
+ well as DIMM2 and DIMM3. Some of REGION1 is allocated to a PMEM namespace
+ named "pm1.0", the rest is reclaimed in 4 BLK-aperture namespaces (for
  each DIMM in the interleave set), "blk2.1", "blk3.1", "blk4.0", and
  "blk5.0".
 
  3. The portion of DIMM2 and DIMM3 that do not participate in the REGION1
- interleaved system-physical-address range (i.e. the DPA address below
+ interleaved system-physical-address range (i.e. the DPA address past
  offset (b) are also included in the "blk4.0" and "blk5.0" namespaces.
  Note, that this example shows that BLK-aperture namespaces don't need to
  be contiguous in DPA-space.
@@ -252,15 +258,15 @@ LIBNVDIMM Kernel Device Model and LIBNDCTL Userspace API
 
 What follows is a description of the LIBNVDIMM sysfs layout and a
 corresponding object hierarchy diagram as viewed through the LIBNDCTL
-api. The example sysfs paths and diagrams are relative to the Example
+API. The example sysfs paths and diagrams are relative to the Example
 NVDIMM Platform which is also the LIBNVDIMM bus used in the LIBNDCTL unit
 test.
 
 LIBNDCTL: Context
-Every api call in the LIBNDCTL library requires a context that holds the
+Every API call in the LIBNDCTL library requires a context that holds the
 logging parameters and other library instance state. The library is
 based on the libabc template:
-https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git/
+https://git.kernel.org/cgit/linux/kernel/git/kay/libabc.git
 
 LIBNDCTL: instantiate a new library context example
 
@@ -409,7 +415,7 @@ Bit 31:28 Reserved
 LIBNVDIMM/LIBNDCTL: Region
 ----------------------
 
-A generic REGION device is registered for each PMEM range orBLK-aperture
+A generic REGION device is registered for each PMEM range or BLK-aperture
 set. Per the example there are 6 regions: 2 PMEM and 4 BLK-aperture
 sets on the "nfit_test.0" bus. The primary role of regions are to be a
 container of "mappings". A mapping is a tuple of <DIMM,
@@ -509,7 +515,7 @@ At first glance it seems since NFIT defines just PMEM and BLK interface
 types that we should simply name REGION devices with something derived
 from those type names. However, the ND subsystem explicitly keeps the
 REGION name generic and expects userspace to always consider the
-region-attributes for 4 reasons:
+region-attributes for four reasons:
 
  1. There are already more than two REGION and "namespace" types. For
  PMEM there are two subtypes. As mentioned previously we have PMEM where
@@ -698,8 +704,8 @@ static int configure_namespace(struct ndctl_region *region,
 
 Why the Term "namespace"?
 
- 1. Why not "volume" for instance? "volume" ran the risk of confusing ND
- as a volume manager like device-mapper.
+ 1. Why not "volume" for instance? "volume" ran the risk of confusing
+ ND (libnvdimm subsystem) to a volume manager like device-mapper.
 
  2. The term originated to describe the sub-devices that can be created
  within a NVME controller (see the nvme specification:
@@ -774,13 +780,14 @@ block" needs to be destroyed. Note, that to destroy a BTT the media
 needs to be written in raw mode. By default, the kernel will autodetect
 the presence of a BTT and disable raw mode. This autodetect behavior
 can be suppressed by enabling raw mode for the namespace via the
-ndctl_namespace_set_raw_mode() api.
+ndctl_namespace_set_raw_mode() API.
 
 
 Summary LIBNDCTL Diagram
 ------------------------
 
-For the given example above, here is the view of the objects as seen by the LIBNDCTL api:
+For the given example above, here is the view of the objects as seen by the
+LIBNDCTL API:
  +---+
  |CTX| +---------+ +--------------+ +---------------+
  +-+-+ +-> REGION0 +---> NAMESPACE0.0 +--> PMEM8 "pm0.0" |