VxDebug: Vortex Debug Extension

External debug spec for Vortex GPGPU

The Vortex Debug Extension adds support for interactive debugging of programs running on RISC-V based GPUs such as Vortex. It provides a bridge between a debugger (such as GDB) and the GPU, exposing hardware state (registers, memory, warps, threads) through the standard GDB Remote Serial Protocol (RSP).

Programming Model

Vortex is a GPGPU that uses a SIMT execution model. The smallest unit of computation is a thread. Multiple threads are grouped into warps that share a program counter (PC). These warps are further grouped into blocks (or tasks) that are dynamically scheduled on multiple Vortex cores by the Vortex runtime.

Hardware Architecture

Vortex is a highly parameterized architecture with in-order issue and out-of-order commit, based on the RISC-V ISA. It supports the GPU stack by implementing custom RISC-V instructions that allow spawning and managing threads. The Vortex core contains 6 pipeline stages, namely: schedule, fetch, decode, issue, execute, and commit. A thread on Vortex owns a private 32-register space in the register file. Multiple threads are grouped into a warp, which shares a program counter (PC). Each Vortex core can hold multiple warp contexts.

The schedule stage in Vortex is responsible for managing and scheduling these warps to the pipeline. It keeps track of active warps, stalled warps, per-warp PCs and per-warp thread masks. Each cycle, a warp scheduler dynamically picks a warp from the active warps for execution. As the scheduled warp propagates through the pipeline, it fetches an instruction from the I-Cache in the fetch stage, decodes it in the decode stage, and reads operands in the issue stage. Warps are then issued to the execute stage, where they use multi-lane functional units (FUs) for computation. Each thread in a warp is typically assigned a lane in the FU. Finally, the results are committed back to the register file in the commit stage.

The debug extension is not tied to the Vortex implementation in any way and can generally be reused for other similar architectures.

Debug System Overview

Typically, the Vortex GPGPU is connected to the host processor over the PCIe bus. A typical Vortex program consists of 2 parts: 1) host code, and 2) kernel code. The host code is responsible for setting up the device-side buffers, transferring data and launching the kernel. Vortex programs are compiled using 2 compilers, typically targeting x86_64 for the host and riscv32/64 for Vortex. When the program is launched, the host code talks to the Vortex hardware over the PCIe bus through the Vortex runtime to set up buffers and launch the kernel. Once the kernel is launched, a debugger can be attached to it to debug the kernel code.

Vortex Debug Extension

The Vortex debug extension uses a lightweight Debug Module (DM) alongside a small amount of debug logic in the cores to support runtime software debugging.

Features:

  • Can obtain all information about the platform automatically.
  • Supports up to 128 threads/warp and up to 32k warps globally.
  • Supports batch halting/resuming.
  • Supports ability to halt warps after reset (allows debugging from first instruction).
  • Supports single warp stepping and instruction injection for arbitrary code execution.
  • *Supports reading/writing GPRs and CSRs.
  • *Supports reading/writing memory.
  • *Supports software breakpoints.

* Emulated using instruction injection.

Vortex Debug System Overview

Selecting Warps

Vortex is a warp-based architecture, so the DM works at the warp level. The DM can halt/resume/step individual warps. Warps from different cores in a Vortex platform are tracked globally using a global warp array in the DM. The DM can select multiple warps for halt/resume operations. Furthermore, it can select a single warp for stepping and instruction injection.

Selecting Multiple Warps

The DM maintains a global warp status array and a global warp mask array to support batch operations. To select warps, the debugger uses DSELECT.winsel to select a 32-bit window in the global arrays, and writes WMASK.mask to select one or more warps in that window. Setting/clearing bit i in the global mask array selects/deselects the warp with warpid=i. Once one or more warps are selected, all subsequent halt/resume operations target the selected warps. The debugger can also observe the halted state of one or more warps by using DSELECT.winsel to select a 32-bit window in the global status array and reading the window of status bits through WSTATUS.status (1 means halted, 0 means active).
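The window arithmetic described above can be sketched in Python. This is only an illustration: the helper names are hypothetical, and it assumes that bit n of window w corresponds to global warp id 32*w + n, which is the natural reading of a 10-bit winsel covering 32k warps.

```python
WINDOW_BITS = 32  # each WMASK/WSTATUS access covers a window of 32 warps

def warp_window(warp_id):
    """Return (winsel, bit) locating a global warp id in the mask/status arrays."""
    return warp_id // WINDOW_BITS, warp_id % WINDOW_BITS

def masks_for_warps(warp_ids):
    """Group global warp ids into {winsel: 32-bit WMASK.mask value} entries."""
    masks = {}
    for wid in warp_ids:
        winsel, bit = warp_window(wid)
        masks[winsel] = masks.get(winsel, 0) | (1 << bit)
    return masks

def halted_warps(winsel, status):
    """Expand one WSTATUS.status window into the global ids of halted warps."""
    return [winsel * WINDOW_BITS + b
            for b in range(WINDOW_BITS) if (status >> b) & 1]
```

A debugger would then write each winsel to DSELECT.winsel followed by the matching mask to WMASK.mask before issuing a halt or resume request.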

Selecting a Single Warp/Thread

While the debug extension allows selecting multiple warps for halting/resuming, only one warp can be selected for more targeted operations such as single stepping or observing warp/thread state. The debugger can select a single warp/thread by writing its global warp id to DSELECT.warpsel and its warp-local thread id to DSELECT.threadsel. All subsequent step and instruction injection commands then use the selected warp/thread pair.

Debug Module

The Debug Module (DM) is a small IP block that resides in the Vortex top module. It exposes a control register interface for the debugger host, separate from the program’s memory space. The debugger can access these registers through a debug medium such as JTAG to control and inspect execution on the platform.

Features:

  • Supports batch halting/resuming.
  • Supports ability to halt warps after reset (allows debugging from first instruction).
  • Supports single warp stepping and instruction injection for arbitrary code execution.
  • Supports reading/writing GPRs and CSRs (using instruction injection).
  • Supports reading/writing memory (using instruction injection).
  • Supports software emulated breakpoints.

Debug Module Registers

Addr Name Description
0x0 PLATFORM Platform information register
0x1 DCONFIG Debug configuration register
0x2 DSELECT Debug selection register
0x3 WMASK Warp mask register
0x4 WACTIVE Warp active register
0x5 WSTATUS Warp status register
0x6 DCTRL Debug Control Register
0x7 DPC Debug Program Counter Register
0x8 INJECT Instruction Injection Register
0x9-0xc DSCRATCH[0-3] Debug Scratch Registers (up to 4)

0x0: PLATFORM: Platform Information Register

{reg: [
    {bits: 3,  name: 'numthreads', attr: ['3']},
    {bits: 9,  name: 'numwarps', attr: ['9']},
    {bits: 9,  name: 'numcores', attr: ['9']},
    {bits: 7,  name: 'numclusters', attr: ['7']},
    {bits: 4,  name: 'platformid', attr: ['4']}
], config:{fontsize: 12}}
Subfield Width Access Description
platformid 4 R Platform ID
numclusters 7 R Number of clusters (up to 128)
numcores 9 R Number of cores/cluster (up to 512)
numwarps 9 R Number of warps/core (up to 512)
numthreads 3 R $log_2$ of the number of threads/warp (up to 128)

Vortex uses PlatformID = 4'b0001
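As an illustration of how a debugger might unpack this register, the following Python sketch decodes a raw 32-bit PLATFORM value. It assumes the first field in the register diagram (numthreads) occupies the least-significant bits, and it returns the count fields as raw values because the text does not say whether they are stored as N or N-1. The function name is hypothetical.

```python
def decode_platform(value):
    """Split a 32-bit PLATFORM value into its fields (LSB-first layout assumed)."""
    return {
        # numthreads holds log2(threads per warp), so expand it to a count
        "threads_per_warp": 1 << (value & 0x7),
        "numwarps": (value >> 3) & 0x1FF,      # raw 9-bit field
        "numcores": (value >> 12) & 0x1FF,     # raw 9-bit field
        "numclusters": (value >> 21) & 0x7F,   # raw 7-bit field
        "platformid": (value >> 28) & 0xF,     # 4'b0001 for Vortex
    }
```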

0x1: DCONFIG: Debug Config Register

{reg: [
    {bits: 1, name: 'EH', attr: ['1']},
    {bits: 25, name: 'reserved', attr: ['25']},
    {bits: 3, name: 'RHC', attr: ['3']},
    {bits: 3, name: 'NRC', attr: ['3']},
], config: {fontsize: 12}}
Subfield Width Access Description
ndmresetcycles (NRC) 3 RW $log_2$ of the number of cycles to assert ndmreset (0x0-0x7 -> $2^0$ to $2^7$ cycles)
resethaltreqcycles (RHC) 3 RW $log_2$ of the number of cycles to assert resethaltreq (0x0-0x7 -> $2^0$ to $2^7$ cycles)
ebreakh (EH) 1 RW Enable ebreak halt

To reduce register width, each cycle count is encoded as an exponent of 2 (i.e., cycles = $2^{NRC}$ for ndmreset and $2^{RHC}$ for resethaltreq).
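The exponent encoding can be illustrated with a short Python sketch. The helper names are hypothetical, rounding a requested cycle count up to the next power of two is a choice made for this example, and the bit positions assume the first field in the register diagram (EH) is bit 0.

```python
def encode_cycles(cycles):
    """Return the smallest 3-bit exponent e such that 2**e >= cycles."""
    if not 1 <= cycles <= 128:
        raise ValueError("cycle count must be between 1 and 128")
    return (cycles - 1).bit_length()

def pack_dconfig(ndmreset_cycles, resethaltreq_cycles, ebreakh):
    """Build a DCONFIG value: EH at bit 0, RHC at bits 28:26, NRC at bits 31:29."""
    nrc = encode_cycles(ndmreset_cycles)
    rhc = encode_cycles(resethaltreq_cycles)
    return (nrc << 29) | (rhc << 26) | int(bool(ebreakh))
```

For example, a request for 100 cycles is rounded up to exponent 7, which asserts the signal for 128 cycles.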

0x2: DSELECT: Debug Select Register

{reg: [
    {bits: 7, name: 'threadsel', attr: ['7']},
    {bits: 15, name: 'warpsel', attr: ['15']},
    {bits: 10, name: 'winsel', attr: ['10']},
], config:{fontsize: 12}}
Subfield Width Access Description
winsel 10 RW Selects which 32-bit window of warp status/mask array is accessed
warpsel 15 RW Selects warp to debug (global Warp-id)
threadsel 7 RW Selects thread to debug (warp local thread-id)

Up to 128 threads/warp are supported. Up to 32k total warps can be debugged globally.
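A minimal Python sketch of packing and unpacking DSELECT follows. It assumes the first field in the register diagram (threadsel) occupies the least-significant bits; the function names are hypothetical.

```python
# (shift, width) of each DSELECT field, assuming threadsel starts at bit 0
FIELDS = {"threadsel": (0, 7), "warpsel": (7, 15), "winsel": (22, 10)}

def pack_dselect(winsel=0, warpsel=0, threadsel=0):
    """Combine the three selectors into one 32-bit DSELECT value."""
    values = {"winsel": winsel, "warpsel": warpsel, "threadsel": threadsel}
    word = 0
    for name, (shift, width) in FIELDS.items():
        if not 0 <= values[name] < (1 << width):
            raise ValueError(f"{name} does not fit in {width} bits")
        word |= values[name] << shift
    return word

def unpack_dselect(word):
    """Recover the three selectors from a 32-bit DSELECT value."""
    return {name: (word >> shift) & ((1 << width) - 1)
            for name, (shift, width) in FIELDS.items()}
```

Because all three selectors share one register, a debugger that changes only winsel would read the register, replace that field and write it back so that the warp/thread selection is preserved.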

0x3: WMASK: Warp Mask Register

{reg: [
    {bits: 32, name: 'mask', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
mask 32 RW Warp mask for the selected window (bit[n]=1 means warp n of the window is selected)

0x4: WACTIVE: Warp Active Status Register

{reg: [
    {bits: 32, name: 'astatus', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
astatus 32 R Warp active bits for the selected window (bit[n]=1 means warp n of the window is active)

0x5: WSTATUS: Warp Status Register

{reg: [
    {bits: 32, name: 'status', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
status 32 R Warp status for the selected window (bit[n]=1 means warp n of the window is halted)

0x6: DCTRL: Debug Control Register

{reg: [
    {bits: 1, name: 'HR', attr: ['1']},
    {bits: 1, name: 'RR', attr: ['1']},
    {bits: 1, name: 'RHR', attr: ['1']},
    {bits: 1, name: 'SR', attr: ['1']},
    {bits: 2, name: 'SST', attr: ['2']},
    {bits: 1, name: 'IR', attr: ['1']},
    {bits: 2, name: 'IST', attr: ['2']},
    {bits: 3, name: 'HC', attr: ['3']},
    {bits: 12, name: 'reserved', attr: ['12']},
    {bits: 1, name: 'NU', attr: ['1']},
    {bits: 1, name: 'AU', attr: ['1']},
    {bits: 1, name: 'NR', attr: ['1']},
    {bits: 1, name: 'AR', attr: ['1']},
    {bits: 1, name: 'NH', attr: ['1']},
    {bits: 1, name: 'AH', attr: ['1']},
    {bits: 1, name: 'NT', attr: ['1']},
    {bits: 1, name: 'DA', attr: ['1']}
], config:{fontsize: 12}}
Subfield Width Access Description
dmactive (DA) 1 RW Write 1 to enable the debug module; when 0, all debug module registers are reset
ndmreset (NT) 1 RW Write 1 to assert the ndmreset output for the number of cycles configured in DCONFIG.ndmresetcycles; a read returns 1 while ndmreset is asserted
allhalted (AH) 1 R All warps are halted
anyhalted (NH) 1 R At least one warp is halted
allrunning (AR) 1 R All warps are running
anyrunning (NR) 1 R At least one warp is running
allunavail (AU) 1 R All warps are unavailable
anyunavail (NU) 1 R At least one warp is unavailable
hacause (HC) 3 R Halt cause of the currently selected warp (3b000: NONE, 3b001: EBREAK, 3b010: HALTREQ, 3b011: STEP, 3b100: RESETHALTREQ)
injectstate (IST) 2 R Status of the instruction inject request (2b00: NONE, 2b01: REQ, 2b10: INFLIGHT)
injectreq (IR) 1 W Write 1 to inject an instruction (INJECT.instr) into the warp selected by DSELECT.warpsel and the thread selected by DSELECT.threadsel
stepstate (SST) 2 R Status of the step request (2b00: NONE, 2b01: REQ, 2b10: INFLIGHT)
stepreq (SR) 1 W Write 1 to step the warp selected by DSELECT.warpsel
resethaltreq (RHR) 1 W Write 1 and assert ndmreset to halt the selected warps right after reset
resumereq (RR) 1 W Write 1 to resume all selected warps in the global warp array
haltreq (HR) 1 W Write 1 to halt all selected warps in the global warp array

It is acceptable for an implementation to report hacause as HALTREQ on a RESETHALTREQ to simplify the hardware.
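To show how the status fields might be read back, here is a Python sketch that decodes a raw DCTRL value and defines masks for the write-only request bits. The names are hypothetical, and the bit positions assume the first field in the register diagram (HR) is bit 0.

```python
# Request and control bits (assumed positions: HR is bit 0, DA is bit 31)
HALTREQ, RESUMEREQ, RESETHALTREQ, STEPREQ = 1 << 0, 1 << 1, 1 << 2, 1 << 3
INJECTREQ = 1 << 6
NDMRESET, DMACTIVE = 1 << 30, 1 << 31

HALT_CAUSES = {0: "NONE", 1: "EBREAK", 2: "HALTREQ", 3: "STEP", 4: "RESETHALTREQ"}
REQ_STATES = {0: "NONE", 1: "REQ", 2: "INFLIGHT"}

def decode_dctrl(value):
    """Decode the readable fields of a 32-bit DCTRL value."""
    def flag(bit):
        return bool((value >> bit) & 1)
    return {
        "stepstate": REQ_STATES.get((value >> 4) & 0x3, "RESERVED"),
        "injectstate": REQ_STATES.get((value >> 7) & 0x3, "RESERVED"),
        "hacause": HALT_CAUSES.get((value >> 9) & 0x7, "RESERVED"),
        "anyunavail": flag(24), "allunavail": flag(25),
        "anyrunning": flag(26), "allrunning": flag(27),
        "anyhalted": flag(28), "allhalted": flag(29),
        "ndmreset": flag(30), "dmactive": flag(31),
    }
```

A halt sequence would typically write DMACTIVE | HALTREQ and then poll until the decoded allhalted (or anyhalted) flag becomes true.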

0x7: DPC: Debug Program Counter Register

{reg: [
    {bits: 32, name: 'pc', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
pc 32 RW Program counter of the warp selected by DSELECT.warpsel (the value is valid only while the warp is halted)

0x8: INJECT: Instruction Injection Register

{reg: [
    {bits: 32, name: 'instr', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
instr 32 RW Instruction to be injected when DCTRL.injectreq is asserted

0x9-0xc: DSCRATCH0-3: Debug Scratch Registers

{reg: [
    {bits: 32, name: 'data', attr: ['32']}
], config:{fontsize: 12}}
Subfield Width Access Description
data 32 RW data

Each DSCRATCH register is exposed to the core as a per-thread CSR. It can also be read/written by the debugger through backdoor access.
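One way the emulated GPR access could work is to inject a CSR instruction that moves data between a GPR and a DSCRATCH register, which the debugger then reads or writes through the backdoor. The Python sketch below builds such instruction words using the standard RISC-V CSRRW/CSRRS encodings. The CSR address is an assumption: 0x7B2 is the dscratch0 number from the RISC-V debug specification, and this document does not state which address Vortex uses. The function names are hypothetical.

```python
DSCRATCH0_CSR = 0x7B2  # ASSUMPTION: borrowed from the RISC-V debug spec, not confirmed for Vortex
OPCODE_SYSTEM = 0x73
FUNCT3_CSRRW = 0b001
FUNCT3_CSRRS = 0b010

def encode_csr(funct3, rd, csr, rs1):
    """Encode a RISC-V I-type CSR instruction: csr | rs1 | funct3 | rd | opcode."""
    return (csr << 20) | (rs1 << 15) | (funct3 << 12) | (rd << 7) | OPCODE_SYSTEM

def read_gpr_instr(gpr, csr=DSCRATCH0_CSR):
    """csrw csr, x<gpr>: copy a GPR into the scratch CSR for the debugger to read."""
    return encode_csr(FUNCT3_CSRRW, 0, csr, gpr)

def write_gpr_instr(gpr, csr=DSCRATCH0_CSR):
    """csrr x<gpr>, csr: load a GPR from a value the debugger placed in the scratch CSR."""
    return encode_csr(FUNCT3_CSRRS, gpr, csr, 0)
```

To read x5 of the selected thread, the debugger would write read_gpr_instr(5) to INJECT.instr, set DCTRL.injectreq, wait for injectstate to return to NONE, and then read DSCRATCH0.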