| author | Ian Munsie <imunsie@au1.ibm.com> | 2014-10-08 04:55:05 -0400 |
|---|---|---|
| committer | Michael Ellerman <mpe@ellerman.id.au> | 2014-10-08 05:16:19 -0400 |
| commit | a9282d01cf357379ce29103cec5e7651a53c634d (patch) | |
| tree | efbc02a23f5dbc8453cdb4584c0ac2cef1316ba0 /Documentation/powerpc | |
| parent | 881632c905f29fd7173250fd1d5b3a9a769d02be (diff) | |
cxl: Add documentation for userspace APIs
This documentation gives an overview of the hardware architecture, userspace
APIs via /dev/cxl/afuM.N and the sysfs files. It also adds a MAINTAINERS file
entry for cxl.
Signed-off-by: Ian Munsie <imunsie@au1.ibm.com>
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Diffstat (limited to 'Documentation/powerpc')
| -rw-r--r-- | Documentation/powerpc/00-INDEX | 2 | ||||
| -rw-r--r-- | Documentation/powerpc/cxl.txt | 379 |
2 files changed, 381 insertions, 0 deletions
diff --git a/Documentation/powerpc/00-INDEX b/Documentation/powerpc/00-INDEX
index a68784d0a1ee..6fd0e8bb8140 100644
--- a/Documentation/powerpc/00-INDEX
+++ b/Documentation/powerpc/00-INDEX
| @@ -11,6 +11,8 @@ bootwrapper.txt | |||
| 11 | cpu_features.txt | 11 | cpu_features.txt |
| 12 | - info on how we support a variety of CPUs with minimal compile-time | 12 | - info on how we support a variety of CPUs with minimal compile-time |
| 13 | options. | 13 | options. |
| 14 | cxl.txt | ||
| 15 | - Overview of the CXL driver. | ||
| 14 | eeh-pci-error-recovery.txt | 16 | eeh-pci-error-recovery.txt |
| 15 | - info on PCI Bus EEH Error Recovery | 17 | - info on PCI Bus EEH Error Recovery |
| 16 | firmware-assisted-dump.txt | 18 | firmware-assisted-dump.txt |
diff --git a/Documentation/powerpc/cxl.txt b/Documentation/powerpc/cxl.txt
new file mode 100644
index 000000000000..2c71ecc519d9
--- /dev/null
+++ b/Documentation/powerpc/cxl.txt
| @@ -0,0 +1,379 @@ | |||
| 1 | Coherent Accelerator Interface (CXL) | ||
| 2 | ==================================== | ||
| 3 | |||
| 4 | Introduction | ||
| 5 | ============ | ||
| 6 | |||
| 7 | The coherent accelerator interface is designed to allow the | ||
| 8 | coherent connection of accelerators (FPGAs and other devices) to a | ||
| 9 | POWER system. These devices need to adhere to the Coherent | ||
| 10 | Accelerator Interface Architecture (CAIA). | ||
| 11 | |||
| 12 | IBM refers to this as the Coherent Accelerator Processor Interface | ||
| 13 | or CAPI. In the kernel it's referred to by the name CXL to avoid | ||
| 14 | confusion with the ISDN CAPI subsystem. | ||
| 15 | |||
| 16 | Coherent in this context means that the accelerator and CPUs can | ||
| 17 | both access system memory directly and with the same effective | ||
| 18 | addresses. | ||
| 19 | |||
| 20 | |||
| 21 | Hardware overview | ||
| 22 | ================= | ||
| 23 | |||
| 24 | POWER8 FPGA | ||
| 25 | +----------+ +---------+ | ||
| 26 | | | | | | ||
| 27 | | CPU | | AFU | | ||
| 28 | | | | | | ||
| 29 | | | | | | ||
| 30 | | | | | | ||
| 31 | +----------+ +---------+ | ||
| 32 | | PHB | | | | ||
| 33 | | +------+ | PSL | | ||
| 34 | | | CAPP |<------>| | | ||
| 35 | +---+------+ PCIE +---------+ | ||
| 36 | |||
| 37 | The POWER8 chip has a Coherently Attached Processor Proxy (CAPP) | ||
| 38 | unit which is part of the PCIe Host Bridge (PHB). This is managed | ||
| 39 | by Linux by calls into OPAL. Linux doesn't directly program the | ||
| 40 | CAPP. | ||
| 41 | |||
| 42 | The FPGA (or coherently attached device) consists of two parts: | ||
| 43 | the POWER Service Layer (PSL) and the Accelerator Function Unit | ||
| 44 | (AFU). The AFU is used to implement specific functionality behind | ||
| 45 | the PSL. The PSL, among other things, provides memory address | ||
| 46 | translation services to allow each AFU direct access to userspace | ||
| 47 | memory. | ||
| 48 | |||
| 49 | The AFU is the core part of the accelerator (e.g. the compression | ||
| 50 | or crypto function). The kernel has no knowledge of the function | ||
| 51 | of the AFU. Only userspace interacts directly with the AFU. | ||
| 52 | |||
| 53 | The PSL provides the translation and interrupt services that the | ||
| 54 | AFU needs. This is what the kernel interacts with. For example, if | ||
| 55 | the AFU needs to read a particular effective address, it sends | ||
| 56 | that address to the PSL; the PSL then translates it, fetches the | ||
| 57 | data from memory and returns it to the AFU. If the PSL has a | ||
| 58 | translation miss, it interrupts the kernel and the kernel services | ||
| 59 | the fault. The context in which this fault is serviced depends on | ||
| 60 | who owns that accelerator function. | ||
| 61 | |||
| 62 | |||
| 63 | AFU Modes | ||
| 64 | ========= | ||
| 65 | |||
| 66 | The AFU supports two programming modes: dedicated and AFU | ||
| 67 | directed. An AFU may support one or both modes. | ||
| 68 | |||
| 69 | When using dedicated mode only one MMU context is supported. In | ||
| 70 | this mode, only one userspace process can use the accelerator at | ||
| 71 | a time. | ||
| 72 | |||
| 73 | When using AFU directed mode, up to 16K simultaneous contexts can | ||
| 74 | be supported. This means up to 16K simultaneous userspace | ||
| 75 | applications may use the accelerator (although specific AFUs may | ||
| 76 | support fewer). In this mode, the AFU sends a 16 bit context ID | ||
| 77 | with each of its requests. This tells the PSL which context is | ||
| 78 | associated with each operation. If the PSL can't translate an | ||
| 79 | operation, the ID can also be accessed by the kernel so it can | ||
| 80 | determine the userspace context associated with an operation. | ||
| 81 | |||
| 82 | |||
| 83 | MMIO space | ||
| 84 | ========== | ||
| 85 | |||
| 86 | A portion of the accelerator MMIO space can be directly mapped | ||
| 87 | from the AFU to userspace. Either the whole space can be mapped or | ||
| 88 | just a per context portion. The hardware is self describing, hence | ||
| 89 | the kernel can determine the offset and size of the per context | ||
| 90 | portion. | ||
| 91 | |||
| 92 | |||
| 93 | Interrupts | ||
| 94 | ========== | ||
| 95 | |||
| 96 | AFUs may generate interrupts that are destined for userspace. These | ||
| 97 | are received by the kernel as hardware interrupts and passed on to | ||
| 98 | userspace via the read syscall documented below. | ||
| 99 | |||
| 100 | Data storage faults and error interrupts are handled by the kernel | ||
| 101 | driver. | ||
| 102 | |||
| 103 | |||
| 104 | Work Element Descriptor (WED) | ||
| 105 | ============================= | ||
| 106 | |||
| 107 | The WED is a 64-bit parameter passed to the AFU when a context is | ||
| 108 | started. Its format is up to the AFU, hence the kernel has no | ||
| 109 | knowledge of what it represents. Typically it will be the | ||
| 110 | effective address of a work queue or status block where the AFU | ||
| 111 | and userspace can share control and status information. | ||
| 112 | |||
| 113 | |||
| 114 | |||
| 115 | |||
| 116 | User API | ||
| 117 | ======== | ||
| 118 | |||
| 119 | For AFUs operating in AFU directed mode, two character device | ||
| 120 | files will be created. /dev/cxl/afu0.0m will correspond to a | ||
| 121 | master context and /dev/cxl/afu0.0s will correspond to a slave | ||
| 122 | context. Master contexts have access to the full MMIO space an | ||
| 123 | AFU provides. Slave contexts have access to only the per process | ||
| 124 | MMIO space an AFU provides. | ||
| 125 | |||
| 126 | For AFUs operating in dedicated process mode, the driver will | ||
| 127 | only create a single character device per AFU called | ||
| 128 | /dev/cxl/afu0.0d. This will have access to the entire MMIO space | ||
| 129 | that the AFU provides (like master contexts in AFU directed). | ||
| 130 | |||
| 131 | The types described below are defined in include/uapi/misc/cxl.h | ||
| 132 | |||
| 133 | The following file operations are supported on both slave and | ||
| 134 | master devices. | ||
| 135 | |||
| 136 | |||
| 137 | open | ||
| 138 | ---- | ||
| 139 | |||
| 140 | Opens the device and allocates a file descriptor to be used with | ||
| 141 | the rest of the API. | ||
| 142 | |||
| 143 | A dedicated mode AFU only has one context and only allows the | ||
| 144 | device to be opened once. | ||
| 145 | |||
| 146 | An AFU in AFU directed mode can have many contexts; the device can | ||
| 147 | be opened once for each context that is available. | ||
| 148 | |||
| 149 | When all available contexts are allocated the open call will fail | ||
| 150 | and return -ENOSPC. | ||
| 151 | |||
| 152 | Note: IRQs need to be allocated for each context, which may limit | ||
| 153 | the number of contexts that can be created, and therefore | ||
| 154 | how many times the device can be opened. The POWER8 CAPP | ||
| 155 | supports 2040 IRQs and 3 are used by the kernel, so 2037 are | ||
| 156 | left. If 1 IRQ is needed per context, then only 2037 | ||
| 157 | contexts can be allocated. If 4 IRQs are needed per context, | ||
| 158 | then only 2037/4 = 509 contexts can be allocated. | ||
| 159 | |||
| 160 | |||
| 161 | ioctl | ||
| 162 | ----- | ||
| 163 | |||
| 164 | CXL_IOCTL_START_WORK: | ||
| 165 | Starts the AFU context and associates it with the current | ||
| 166 | process. Once this ioctl is successfully executed, all memory | ||
| 167 | mapped into this process is accessible to this AFU context | ||
| 168 | using the same effective addresses. No additional calls are | ||
| 169 | required to map/unmap memory. The AFU memory context will be | ||
| 170 | updated as userspace allocates and frees memory. This ioctl | ||
| 171 | returns once the AFU context is started. | ||
| 172 | |||
| 173 | Takes a pointer to a struct cxl_ioctl_start_work: | ||
| 174 | |||
| 175 | struct cxl_ioctl_start_work { | ||
| 176 | __u64 flags; | ||
| 177 | __u64 work_element_descriptor; | ||
| 178 | __u64 amr; | ||
| 179 | __s16 num_interrupts; | ||
| 180 | __s16 reserved1; | ||
| 181 | __s32 reserved2; | ||
| 182 | __u64 reserved3; | ||
| 183 | __u64 reserved4; | ||
| 184 | __u64 reserved5; | ||
| 185 | __u64 reserved6; | ||
| 186 | }; | ||
| 187 | |||
| 188 | flags: | ||
| 189 | Indicates which optional fields in the structure are | ||
| 190 | valid. | ||
| 191 | |||
| 192 | work_element_descriptor: | ||
| 193 | The Work Element Descriptor (WED) is a 64-bit argument | ||
| 194 | defined by the AFU. Typically this is an effective | ||
| 195 | address pointing to an AFU specific structure | ||
| 196 | describing what work to perform. | ||
| 197 | |||
| 198 | amr: | ||
| 199 | Authority Mask Register (AMR), same as the powerpc | ||
| 200 | AMR. This field is only used by the kernel when the | ||
| 201 | corresponding CXL_START_WORK_AMR value is specified in | ||
| 202 | flags. If not specified the kernel will use a default | ||
| 203 | value of 0. | ||
| 204 | |||
| 205 | num_interrupts: | ||
| 206 | Number of userspace interrupts to request. This field | ||
| 207 | is only used by the kernel when the corresponding | ||
| 208 | CXL_START_WORK_NUM_IRQS value is specified in flags. | ||
| 209 | If not specified the minimum number required by the | ||
| 210 | AFU will be allocated. The min and max number can be | ||
| 211 | obtained from sysfs. | ||
| 212 | |||
| 213 | reserved fields: | ||
| 214 | For ABI padding and future extensions | ||
| 215 | |||
| 216 | CXL_IOCTL_GET_PROCESS_ELEMENT: | ||
| 217 | Get the current context id, also known as the process element. | ||
| 218 | The value is returned from the kernel as a __u32. | ||
| 219 | |||
| 220 | |||
| 221 | mmap | ||
| 222 | ---- | ||
| 223 | |||
| 224 | An AFU may have an MMIO space to facilitate communication with the | ||
| 225 | AFU. If it does, the MMIO space can be accessed via mmap. The size | ||
| 226 | and contents of this area are specific to the particular AFU. The | ||
| 227 | size can be discovered via sysfs. | ||
| 228 | |||
| 229 | In AFU directed mode, master contexts are allowed to map all of | ||
| 230 | the MMIO space and slave contexts are allowed to only map the per | ||
| 231 | process MMIO space associated with the context. In dedicated | ||
| 232 | process mode the entire MMIO space can always be mapped. | ||
| 233 | |||
| 234 | This mmap call must be done after the START_WORK ioctl. | ||
| 235 | |||
| 236 | Care should be taken when accessing MMIO space. Only 32 and 64-bit | ||
| 237 | accesses are supported by POWER8. Also, the AFU will be designed | ||
| 238 | with a specific endianness, so all MMIO accesses should consider | ||
| 239 | endianness (endian(3) helpers such as le64toh() and be64toh() | ||
| 240 | are recommended). These endian issues apply equally to shared | ||
| 241 | memory queues the WED may describe. | ||
| 242 | |||
| 243 | |||
| 244 | read | ||
| 245 | ---- | ||
| 246 | |||
| 247 | Reads events from the AFU. Blocks if no events are pending | ||
| 248 | (unless O_NONBLOCK is supplied). Returns -EIO in the case of an | ||
| 249 | unrecoverable error or if the card is removed. | ||
| 250 | |||
| 251 | read() will always return an integral number of events. | ||
| 252 | |||
| 253 | The buffer passed to read() must be at least 4K bytes. | ||
| 254 | |||
| 255 | The result of the read will be a buffer of one or more events. | ||
| 256 | Each event is of type struct cxl_event and varies in size. | ||
| 257 | |||
| 258 | struct cxl_event { | ||
| 259 | struct cxl_event_header header; | ||
| 260 | union { | ||
| 261 | struct cxl_event_afu_interrupt irq; | ||
| 262 | struct cxl_event_data_storage fault; | ||
| 263 | struct cxl_event_afu_error afu_error; | ||
| 264 | }; | ||
| 265 | }; | ||
| 266 | |||
| 267 | The struct cxl_event_header is defined as: | ||
| 268 | |||
| 269 | struct cxl_event_header { | ||
| 270 | __u16 type; | ||
| 271 | __u16 size; | ||
| 272 | __u16 process_element; | ||
| 273 | __u16 reserved1; | ||
| 274 | }; | ||
| 275 | |||
| 276 | type: | ||
| 277 | This defines the type of event. The type determines how | ||
| 278 | the rest of the event is structured. These types are | ||
| 279 | described below and defined by enum cxl_event_type. | ||
| 280 | |||
| 281 | size: | ||
| 282 | This is the size of the event in bytes including the | ||
| 283 | struct cxl_event_header. The start of the next event can | ||
| 284 | be found at this offset from the start of the current | ||
| 285 | event. | ||
| 286 | |||
| 287 | process_element: | ||
| 288 | Context ID of the event. | ||
| 289 | |||
| 290 | reserved field: | ||
| 291 | For future extensions and padding. | ||
| 292 | |||
| 293 | If the event type is CXL_EVENT_AFU_INTERRUPT then the event | ||
| 294 | structure is defined as: | ||
| 295 | |||
| 296 | struct cxl_event_afu_interrupt { | ||
| 297 | __u16 flags; | ||
| 298 | __u16 irq; /* Raised AFU interrupt number */ | ||
| 299 | __u32 reserved1; | ||
| 300 | }; | ||
| 301 | |||
| 302 | flags: | ||
| 303 | These flags indicate which optional fields are present | ||
| 304 | in this struct. Currently all fields are mandatory. | ||
| 305 | |||
| 306 | irq: | ||
| 307 | The IRQ number sent by the AFU. | ||
| 308 | |||
| 309 | reserved field: | ||
| 310 | For future extensions and padding. | ||
| 311 | |||
| 312 | If the event type is CXL_EVENT_DATA_STORAGE then the event | ||
| 313 | structure is defined as: | ||
| 314 | |||
| 315 | struct cxl_event_data_storage { | ||
| 316 | __u16 flags; | ||
| 317 | __u16 reserved1; | ||
| 318 | __u32 reserved2; | ||
| 319 | __u64 addr; | ||
| 320 | __u64 dsisr; | ||
| 321 | __u64 reserved3; | ||
| 322 | }; | ||
| 323 | |||
| 324 | flags: | ||
| 325 | These flags indicate which optional fields are present in | ||
| 326 | this struct. Currently all fields are mandatory. | ||
| 327 | |||
| 328 | addr: | ||
| 329 | The address that the AFU unsuccessfully attempted to | ||
| 330 | access. Valid accesses will be handled transparently by the | ||
| 331 | kernel but invalid accesses will generate this event. | ||
| 332 | |||
| 333 | dsisr: | ||
| 334 | This field gives information on the type of fault. It is a | ||
| 335 | copy of the DSISR from the PSL hardware when the address | ||
| 336 | fault occurred. The form of the DSISR is as defined in the | ||
| 337 | CAIA. | ||
| 338 | |||
| 339 | reserved fields: | ||
| 340 | For future extensions | ||
| 341 | |||
| 342 | If the event type is CXL_EVENT_AFU_ERROR then the event structure | ||
| 343 | is defined as: | ||
| 344 | |||
| 345 | struct cxl_event_afu_error { | ||
| 346 | __u16 flags; | ||
| 347 | __u16 reserved1; | ||
| 348 | __u32 reserved2; | ||
| 349 | __u64 error; | ||
| 350 | }; | ||
| 351 | |||
| 352 | flags: | ||
| 353 | These flags indicate which optional fields are present in | ||
| 354 | this struct. Currently all fields are mandatory. | ||
| 355 | |||
| 356 | error: | ||
| 357 | Error status from the AFU. Defined by the AFU. | ||
| 358 | |||
| 359 | reserved fields: | ||
| 360 | For future extensions and padding | ||
| 361 | |||
| 362 | Sysfs Class | ||
| 363 | =========== | ||
| 364 | |||
| 365 | A cxl sysfs class is added under /sys/class/cxl to facilitate | ||
| 366 | enumeration and tuning of the accelerators. Its layout is | ||
| 367 | described in Documentation/ABI/testing/sysfs-class-cxl | ||
| 368 | |||
| 369 | Udev rules | ||
| 370 | ========== | ||
| 371 | |||
| 372 | The following udev rules could be used to create a symlink to the | ||
| 373 | most logical chardev to use in any programming mode (afuX.Yd for | ||
| 374 | dedicated, afuX.Ys for AFU directed), since the API is virtually | ||
| 375 | identical for each: | ||
| 376 | |||
| 377 | SUBSYSTEM=="cxl", ATTRS{mode}=="dedicated_process", SYMLINK="cxl/%b" | ||
| 378 | SUBSYSTEM=="cxl", ATTRS{mode}=="afu_directed", \ | ||
| 379 | KERNEL=="afu[0-9]*.[0-9]*s", SYMLINK="cxl/%b" | ||
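
The note under open in cxl.txt bounds the number of contexts by the IRQs left over once the kernel has taken its share. A minimal sketch of that arithmetic in C follows. The macro and function names are hypothetical and exist only for illustration; the figures 2040 and 3 are the ones quoted in the note.

```c
/* Figures quoted in the note on open(): IRQs the POWER8 CAPP supports
 * and how many of those the kernel keeps for itself. The names are
 * illustrative and do not come from the cxl driver. */
#define CAPP_TOTAL_IRQS   2040u
#define CAPP_KERNEL_IRQS  3u

/*
 * Upper bound on the number of contexts imposed by IRQ availability
 * alone. Integer division rounds down, since a context cannot be given
 * a fraction of its IRQs. Returns 0 when irqs_per_context is 0, purely
 * to avoid dividing by zero (an assumption of this sketch).
 */
unsigned int max_contexts_by_irqs(unsigned int irqs_per_context)
{
	if (irqs_per_context == 0)
		return 0;
	return (CAPP_TOTAL_IRQS - CAPP_KERNEL_IRQS) / irqs_per_context;
}
```

With 1 IRQ per context this gives 2037, and with 4 it gives 509, matching the note. The limit an AFU itself supports may be lower.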

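
The read section of cxl.txt says each event starts with a struct cxl_event_header whose size field gives the offset to the next event. The sketch below walks a buffer on that basis. It is an illustration under stated assumptions, not the driver's or any library's code: struct event_header is a local mirror of the header layout shown in the patch (real code would include the uapi header instead), and walk_events and event_cb are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local mirror of struct cxl_event_header as listed in the patch.
 * Assumption: real code would use the definition from misc/cxl.h. */
struct event_header {
	uint16_t type;
	uint16_t size;            /* whole event in bytes, header included */
	uint16_t process_element;
	uint16_t reserved1;
};

/* Hypothetical callback type, not part of the cxl API. */
typedef void (*event_cb)(const struct event_header *hdr,
			 const uint8_t *payload, size_t payload_len,
			 void *arg);

/*
 * Walk `len` bytes as returned by read() and call `cb` (if non-NULL)
 * once per event. Returns the number of events visited, or -1 if a
 * header is truncated or reports a size that is smaller than the
 * header or runs past the end of the buffer.
 */
int walk_events(const uint8_t *buf, size_t len, event_cb cb, void *arg)
{
	size_t off = 0;
	int count = 0;

	while (off < len) {
		struct event_header hdr;

		if (len - off < sizeof(hdr))
			return -1;
		/* Copy out the header rather than casting the buffer. */
		memcpy(&hdr, buf + off, sizeof(hdr));
		if (hdr.size < sizeof(hdr) || hdr.size > len - off)
			return -1;
		if (cb)
			cb(&hdr, buf + off + sizeof(hdr),
			   hdr.size - sizeof(hdr), arg);
		off += hdr.size;   /* next event starts `size` bytes on */
		count++;
	}
	return count;
}
```

A caller would typically pass the buffer and byte count returned by read() on the AFU file descriptor, using a buffer of at least 4K bytes as the text requires, and dispatch on hdr->type inside the callback.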