Note: this entire post is a lightly edited checkpoint of a conversation with some LLM. Part 1 here.

Introduction

RDMA-CM (RDMA Connection Manager) is the control-plane layer for RDMA. It exists to make connection setup look and feel like sockets: applications can bind to addresses, listen on ports, and accept or connect without worrying about raw InfiniBand identifiers. Underneath, RDMA-CM exchanges the connection metadata, programs the QPs, and hands back an endpoint ready for verbs. This is distinct from the verbs API, which handles queue pairs, memory registration, and the data path but has no notion of addresses or ports.

Background: QoS in InfiniBand

InfiniBand provides Quality of Service using Service Levels (SLs). An SL is a 4-bit value (0–15) carried in each packet’s header. The subnet manager programs the tables that map SLs onto Virtual Lanes (VLs), which are independent link-level flows with separate buffering and flow control. By assigning flows to different SLs, the fabric can prioritize some traffic or keep classes of traffic from interfering with each other. For a connected QP, the SL is part of the address information programmed during connection setup (not at QP creation), and it stays fixed for that connection’s traffic.

RDMA-CM Basics

RDMA-CM exposes a socket-like API:

  • The central handle is an rdma_cm_id, analogous to a socket fd.

  • Each cm_id belongs to a port space, such as RDMA_PS_TCP (connection-oriented) or RDMA_PS_UDP (datagrams).

  • Applications use rdma_bind_addr, rdma_listen, rdma_connect, etc., in the same way they would with sockets.

What looks like an IP+port pair in RDMA-CM is just a wrapper. With IPoIB enabled, the “IP” is the interface address (e.g. 10.94.3.105), which RDMA-CM resolves to a GID. Without IPoIB, you can still create connections, but the address is a GID given directly (usually expressed in IPv6 link-local form). In both cases, the binding resolves to a specific HCA port and GID.
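To make the "GID in IPv6 link-local form" concrete, here is a minimal sketch of how a port's default GID is formed: the 64-bit subnet prefix (fe80::/64 by default) followed by the 64-bit port GUID. The GUID value and the function name are made up for illustration.

```python
import ipaddress

# Default InfiniBand subnet prefix (upper 64 bits of the GID): fe80::/64.
LINK_LOCAL_PREFIX = 0xFE80000000000000


def gid_from_port_guid(port_guid: int, subnet_prefix: int = LINK_LOCAL_PREFIX) -> str:
    """Return the GID (as IPv6 text) for a 64-bit port GUID."""
    if not 0 <= port_guid < (1 << 64):
        raise ValueError("port GUID must fit in 64 bits")
    raw = (subnet_prefix << 64) | port_guid
    return str(ipaddress.IPv6Address(raw))


# Hypothetical port GUID, for illustration only.
print(gid_from_port_guid(0x0002C90300A1B2C3))  # fe80::2:c903:a1:b2c3
```

On a real host the GID table can be read from sysfs or with ibv_devinfo -v; the sketch only shows the layout.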

The handshake itself is a control-plane exchange: on InfiniBand it uses CM MADs over QP1, RoCE carries the same CM messages over Ethernet, and iWARP negotiates over its TCP connection. This is how QP numbers, starting PSNs, path MTU, and the path's SL are exchanged.

RDMA-CM and Service IDs

The key to QoS integration is that RDMA-CM maps the familiar notion of port numbers into InfiniBand Service IDs. A Service ID is a 64‑bit identifier that the Subnet Manager understands. In RDMA_PS_TCP, the port number you bind is encoded into the low 16 bits of the Service ID automatically, under a fixed prefix that identifies the port space (on Linux, 0x0000000001060000 | port). This provides a clean classification hook: flows created on different port ranges correspond to different Service ID ranges.
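The port-to-Service-ID encoding can be sketched in a few lines. This assumes the layout of the IBTA RDMA IP CM Service annex as used by the Linux kernel (a fixed prefix, then the IP protocol number, then the 16-bit port); the constant and function names are mine, so verify the result against what your stack actually reports.

```python
# Assumed layout: 0x0000000001 | protocol (8 bits) | port (16 bits).
RDMA_IP_CM_PREFIX = 0x0000000001000000
PROTO_TCP = 0x06  # RDMA_PS_TCP
PROTO_UDP = 0x11  # RDMA_PS_UDP


def service_id(port: int, protocol: int = PROTO_TCP) -> int:
    """Return the 64-bit Service ID RDMA-CM would derive from a port number."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    return RDMA_IP_CM_PREFIX | (protocol << 16) | port


print(f"{service_id(0x3010):#018x}")             # 0x0000000001063010
print(f"{service_id(0x3010, PROTO_UDP):#018x}")  # 0x0000000001113010
```

Because the port occupies the low 16 bits, a contiguous port range maps to a contiguous Service ID range, which is what makes range-based policy rules practical.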

Subnet Manager QoS Policies & Setup (OpenSM)

OpenSM can enforce QoS by mapping Service IDs (derived from RDMA-CM port numbers) to Service Levels (SLs) and then to Virtual Lanes (VLs).

1) Enable QoS in OpenSM

Edit /etc/opensm/opensm.conf (or the file you pass with -F/--config) and set:

qos TRUE
qos_max_vls 8              # adjust to your ASIC/link support (e.g., 4 or 8)
qos_policy_file /etc/opensm/qos-policy.conf

The SL→VL and VL arbitration tables are also set in this options file (qos_sl2vl, qos_vlarb_high, qos_vlarb_low, qos_high_limit); the policy file only decides which SL a flow is assigned.

Restart or start OpenSM with QoS active:

systemctl restart opensm    # distro services
# or manually
opensm -Q -F /etc/opensm/opensm.conf   # -F/--config selects the config file; -f is the log file
# (-Q/--qos is equivalent to qos TRUE; -Y sets an explicit policy file path)

2) Define QoS policy (port→ServiceID→SL)

Create /etc/opensm/qos-policy.conf with policy rules. A pragmatic pattern is to carve the RDMA-CM port space into classes and assign SLs per Service ID range. Example with three classes:

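# --- Classes ---------------------------------------------------
# OpenSM simple-policy (qos-ulps) syntax; check the QoS doc shipped with your OpenSM.
# Service IDs below are RDMA_PS_TCP ports with the 0x0000000001060000 prefix applied.
qos-ulps
    default : 0

    # Control plane / RPC (low bandwidth, low latency): ports 0x1000-0x1fff
    any, service-id 0x0000000001061000-0x0000000001061fff : 4

    # Bulk telemetry / background: ports 0x2000-0x2fff
    any, service-id 0x0000000001062000-0x0000000001062fff : 1

    # Data path / latency-critical (highest priority): ports 0x3000-0x3fff
    any, service-id 0x0000000001063000-0x0000000001063fff : 6
end-qos-ulps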

Notes:

  • RDMA-CM encodes RDMA_PS_TCP + port into a 64-bit Service ID (on Linux, 0x0000000001060000 | port). Express your port ranges as contiguous Service ID ranges with that prefix applied. Confirm the exact mapping in your stack by printing the CM service_id on connection or using SA queries (see verification below).

  • If you use IPoIB, classification still hinges on Service ID, not L3 QoS.
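Writing the Service ID ranges by hand is error-prone, so it can help to generate them from port ranges. The sketch below emits a stanza in OpenSM's simple qos-ulps policy syntax; the helper name and the RDMA_PS_TCP base constant are assumptions to check against your OpenSM version and kernel.

```python
# Assumed Service ID base for RDMA_PS_TCP (prefix plus TCP protocol number).
TCP_PS_BASE = 0x0000000001060000


def qos_ulps(classes, default_sl=0):
    """Build a qos-ulps stanza from (first_port, last_port, sl) tuples."""
    lines = ["qos-ulps", f"    default : {default_sl}"]
    for first_port, last_port, sl in classes:
        if not 0 <= first_port <= last_port <= 0xFFFF:
            raise ValueError("invalid port range")
        if not 0 <= sl <= 15:
            raise ValueError("SL must be 0..15")
        lo = TCP_PS_BASE | first_port
        hi = TCP_PS_BASE | last_port
        lines.append(f"    any, service-id {lo:#018x}-{hi:#018x} : {sl}")
    lines.append("end-qos-ulps")
    return "\n".join(lines)


policy = qos_ulps([
    (0x1000, 0x1FFF, 4),  # control plane / RPC
    (0x2000, 0x2FFF, 1),  # bulk telemetry / background
    (0x3000, 0x3FFF, 6),  # latency-critical data path
])
print(policy)
```

The output is meant to be pasted into the policy file and reviewed, not applied blindly.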

3) Map SLs to VLs (isolation / priority)

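In stock OpenSM the SL→VL and VL arbitration tables are options in opensm.conf rather than rules in the policy file. Define them there so the fabric actually separates traffic:

# Map each SL to a VL: 16 comma-separated entries, entry N is the VL for SL N.
# Here SL0→VL0, SL1→VL1, SL4→VL2, SL6→VL3; unused SLs fall back to VL0.
qos_sl2vl 0,1,0,0,2,0,3,0,0,0,0,0,0,0,0,0

# Basic VL arbitration (VL:weight pairs). Give higher-priority VLs larger weights.
qos_vlarb_high 0:64,1:64,2:96,3:128
qos_vlarb_low 0:32,1:32,2:48,3:64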

Guidelines:

  • Keep the number of active VLs modest (4–8). Ensure all HCAs/links support the count you choose.

  • Use distinct VLs for classes that must not interfere. Coalesce low-priority classes on the same VL if you’re VL-limited.
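A small helper can render the SL→VL table and catch out-of-range values before they reach the SM. This is a sketch that assumes the 16-entry, comma-separated qos_sl2vl format of opensm.conf; the function name and the max_data_vls parameter are illustrative.

```python
def sl2vl_line(mapping, default_vl=0, max_data_vls=8):
    """Render a qos_sl2vl option line from a sparse {sl: vl} mapping."""
    table = [default_vl] * 16  # one entry per SL, 0..15
    for sl, vl in mapping.items():
        if not 0 <= sl <= 15:
            raise ValueError(f"SL {sl} out of range")
        if not 0 <= vl < max_data_vls:
            raise ValueError(f"VL {vl} exceeds the {max_data_vls} data VLs configured")
        table[sl] = vl
    return "qos_sl2vl " + ",".join(str(vl) for vl in table)


# Same mapping as the classes above: SL0→VL0, SL1→VL1, SL4→VL2, SL6→VL3.
print(sl2vl_line({0: 0, 1: 1, 4: 2, 6: 3}))
# qos_sl2vl 0,1,0,0,2,0,3,0,0,0,0,0,0,0,0,0
```

Passing max_data_vls=4 on a fabric limited to four VLs makes the helper reject any mapping that would land on a VL the links cannot carry.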

4) Apply & persist

Ensure your distro loads the same opensm.conf and qos-policy.conf on boot. On systems with multiple HCAs/SM instances, anchor each OpenSM to a GUID/port and reuse the same policy file.

Verification / Troubleshooting

  • Check SM picked up QoS: review opensm.log for policy parsing, SL2VL, and VLArb tables.

  • Resolve paths with SA:

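    • Get a PathRecord with an explicit Service ID to see the returned SL (and MTU/rate). Using saquery (part of infiniband-diags; option spellings vary between versions, so check saquery --help first):

        saquery -p --sgid <sgid> --dgid <dgid> --service_id 0x0000000001063010 --pkey 0x7fff

      The sl field of the returned record is the SL the SM assigned for that Service ID (here, RDMA_PS_TCP port 0x3010).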
      
  • Confirm on endpoints: log the CM service_id and the final QP SL your app sees during connection events.

  • Port/VL counters: use perfquery/ibqueryerrors to confirm traffic hits the expected VLs; congestion on the wrong VL hints at SL2VL mismatch.

Putting It Together

The path from application to QoS is:

RDMA-CM port → Service ID → Subnet Manager policy → SL → VL

The application simply binds to a port; RDMA-CM turns that into a Service ID; the SM maps Service IDs to SLs; the fabric maps SLs to VLs. QoS enforcement is transparent to the app.
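The whole chain can be traced offline with a toy model. The sketch below walks a port through Service ID, policy lookup, and SL→VL mapping. The port ranges, SLs, and VLs are example values for three traffic classes, and the RDMA_PS_TCP base constant is an assumption to verify on your system.

```python
# Assumed Service ID base for RDMA_PS_TCP.
TCP_PS_BASE = 0x0000000001060000

# (first_port, last_port, sl): example policy classes.
POLICY = [
    (0x1000, 0x1FFF, 4),
    (0x2000, 0x2FFF, 1),
    (0x3000, 0x3FFF, 6),
]

# Example SL→VL table; SLs not listed fall back to VL0.
SL2VL = {0: 0, 1: 1, 4: 2, 6: 3}


def classify(port, default_sl=0):
    """Return (service_id, sl, vl) for a connection bound to `port`."""
    sid = TCP_PS_BASE | port
    sl = default_sl
    for first_port, last_port, rule_sl in POLICY:
        if (TCP_PS_BASE | first_port) <= sid <= (TCP_PS_BASE | last_port):
            sl = rule_sl
            break
    return sid, sl, SL2VL.get(sl, 0)


for port in (0x1001, 0x3010, 80):
    sid, sl, vl = classify(port)
    print(f"port {port:#06x} -> service_id {sid:#018x} -> SL {sl} -> VL {vl}")
```

Port 80 falls outside every range, so it gets the default SL and lands on VL0, which mirrors what the SM does for unmatched flows.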

Practical Notes

  • With IPoIB: CM addresses look like normal IPs, but they resolve to GIDs under the hood.

  • Without IPoIB: you can still use RDMA-CM with GID-based addresses; port numbers still become Service IDs.

  • If no QoS policy rule matches the Service ID, the policy's default SL (often 0) is used.

Conclusion

RDMA-CM is the bridge between socket-like connection setup and InfiniBand’s fabric-level QoS. It hides GIDs and QP details from the application, while exposing a port number abstraction that the Subnet Manager can map to Service Levels. This is what allows administrators to classify traffic by port ranges and enforce fabric QoS without requiring applications to manipulate QP attributes directly.