vllm.model_executor.layers.quantization.utils.mxfp4_utils ¶
Functions:
- should_use_cdna4_mx_scale_swizzle – Whether to use the CDNA4 swizzled scale layout for mxfp4 on gfx950.
_swizzle_mxfp4(quant_tensor, scale, num_warps=8) ¶
Swizzles the weights for mxfp4 MoE. Used by the OAI mxfp4 kernel.
Source code in vllm/model_executor/layers/quantization/utils/mxfp4_utils.py
should_use_cdna4_mx_scale_swizzle() ¶
Whether to use the CDNA4 swizzled scale layout for mxfp4 on gfx950.
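The CDNA4 swizzle requires BLOCK_K % 256 == 0. At TP >= 4, the A8W4 dispatch picks tiles with BK < 256 for the smaller per-rank shapes, so the swizzle must be turned off. This function is used by both the weight-load swizzle in _swizzle_mxfp4 and the kernel-argument gate in aiter_mxfp4_w4a8_moe. The two call sites must agree, because weights swizzled at load time have to be read by a kernel that expects the swizzled layout, and vice versa.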
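The constraint described above can be illustrated with a minimal sketch. This is not the vLLM implementation: the function names, the `is_gfx950` and `tp_size` parameters, and the idea of computing the decision once and passing it to both sides are assumptions made for illustration. Only the BLOCK_K % 256 == 0 requirement and the TP >= 4 cutoff come from the docstring.

```python
# Hypothetical sketch of the gating logic; names and parameters are
# illustrative assumptions, not the actual vLLM API.

CDNA4_SWIZZLE_BLOCK_K = 256  # from the docstring: BLOCK_K % 256 == 0
TP_SWIZZLE_CUTOFF = 4        # from the docstring: swizzle is off at TP >= 4


def swizzle_compatible(block_k: int) -> bool:
    """Return True if a tile's K extent satisfies the swizzle requirement."""
    return block_k % CDNA4_SWIZZLE_BLOCK_K == 0


def sketch_should_use_swizzle(is_gfx950: bool, tp_size: int) -> bool:
    """Assumed shape of the decision: only on gfx950, and only below the
    TP size at which the dispatch starts choosing BK < 256 tiles."""
    return is_gfx950 and tp_size < TP_SWIZZLE_CUTOFF


def plan_layout(is_gfx950: bool, tp_size: int) -> dict:
    """Compute the flag once and hand the same value to both consumers,
    so the weight-load side and the kernel side cannot disagree."""
    use_swizzle = sketch_should_use_swizzle(is_gfx950, tp_size)
    return {
        "swizzle_scales_at_load": use_swizzle,
        "kernel_expects_swizzled_scales": use_swizzle,
    }
```

Deriving both flags from a single call mirrors the docstring's requirement that the weight-load path and the kernel-argument gate agree. A mismatch would mean the kernel interprets scales in a layout they were not stored in.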