⑨ lab ≡ ByteLabs

Note/»Synopsys ARC NPX« — NPU for AI and »Neural« Processing

— Igor Böhm

This collection of random notes, quotes, and articles is a work in progress…

Terminology

NPX…informally stands for Neural Processor EXtension 1
NPU…Neural Processing Unit

Convolution…"Convolution is fancy multiplication" 2
1. a thing that is complex and difficult to follow
2. a coil or twist (DE: Windung, Verschlingung)
3. a sinuous fold in the surface of the brain (DE: Hirnwindung, Krümmung des Gehirns)
4. Mathematics: a function derived from two given functions by integration which expresses how the shape of one is modified by the other (DE: Faltung)
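Definition 4 above, written out for discrete signals (the form that matters for ML):

 > (f * g)[n] = Σ_m f[m] · g[n − m]

That is: slide one signal across the other, multiply the overlapping samples, and sum them up.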

CNN…Convolutional Neural Net

Machine learning is about discovering the math functions that transform input data into a desired result (a prediction, classification, etc.).

Starting with an input signal, we could convolve it with a bunch of kernels:

 > input * k_1 * k_2 * k_3 … = result

Given that convolution can do complex math (moving averages, blurs, derivatives...), it seems some combination of kernels should turn our input into something useful, right?

Convolutional Neural Nets (CNNs) process an input with layers of kernels, optimizing the kernel weights (the "plans") to reach a goal. Imagine tweaking a treatment plan until medicine usage stays below some threshold.
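A minimal sketch of the kernel-chaining idea above, using numpy's `convolve`; the two kernels (a moving average and a finite difference) are made-up examples of the "complex math" convolution can do:

```python
import numpy as np

# input * k_1 * k_2 = result
signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
k_smooth = np.array([1/3, 1/3, 1/3])   # k_1: moving average (blur)
k_deriv  = np.array([1.0, -1.0])       # k_2: finite difference (derivative)

step1 = np.convolve(signal, k_smooth, mode="same")   # smooth the input
result = np.convolve(step1, k_deriv, mode="same")    # then differentiate it
print(result)
```

In a CNN the kernel values are not hand-picked like this; they are the weights the training process optimizes.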

Tensor…multi-dimensional array
1. Mathematics: a mathematical object analogous to but more general than a vector, represented by an array of components that are functions of the coordinates of a space
2. Anatomy: a muscle that tightens or stretches a part of the body

In machine learning, a tensor is a multi-dimensional array (a list, matrix, or higher-dimensional grid) used as the fundamental data structure for storing and manipulating numerical data. Tensors hold everything from simple numbers (scalars) to complex data like images and text, enabling efficient computation, especially on GPUs, for training models like neural networks. Think of them as generalized vectors and matrices: a scalar is a 0D tensor, a vector is 1D, a matrix is 2D, and so on, allowing flexible representation of various data types and model parameters.

Key Aspects of Tensors in ML:
• Data Representation/Type: They organize diverse data: a single pixel (scalar), a row of pixels (vector), an image (matrix or 3D tensor with color channels), or batches of data.
• Building Blocks: Frameworks like TensorFlow and PyTorch are named after them; they're the basic units for inputs, weights, biases, and gradients.
• Dimensionality (Rank):
    — The number of dimensions determines the rank:
        • 0D (Scalar): A single number (e.g., 5).
        • 1D (Vector): A list of numbers (e.g., [1, 2, 3]).
        • 2D (Matrix): A grid/table of numbers (e.g., [[1,2],[3,4]]).
        • 3D+ (Higher-Order Tensor): Stacks of matrices, like a color image (height, width, color channels).
• Efficient Computation: Their array structure allows for parallel processing and optimized mathematical operations (addition, multiplication) on hardware like GPUs, crucial for deep learning.
• Automatic Differentiation: Tensors support automatic differentiation, simplifying complex training algorithms like backpropagation in neural networks. 
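The rank examples above can be checked directly with numpy; the shapes for the image and batch (224×224 RGB, batch of 32) are made-up but typical values:

```python
import numpy as np

scalar = np.array(5)                  # 0D: a single number
vector = np.array([1, 2, 3])          # 1D: a list of numbers
matrix = np.array([[1, 2], [3, 4]])   # 2D: a grid/table of numbers
image  = np.zeros((224, 224, 3))      # 3D: height, width, color channels
batch  = np.zeros((32, 224, 224, 3))  # 4D: a batch of 32 images

for t in (scalar, vector, matrix, image, batch):
    print(t.ndim, t.shape)            # rank = number of dimensions
```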

Synopsys ARC NPX6 NPU IP Architecture

 —— NPU —————————————————————————————————————————————
|
|  —— Core ———————————————————————————————————————
| | • L1 Controller with MMU
| | • DMA
| | • L1 Memory (is this (I|D)CCM and (I|D)$?)
| | • Convolution Accelerator, 4K MAC (is this the unit that accelerates the 'fancy multiplication'?)
| | • Tensor Accelerator
| | • Tensor FPU
|  ————————————————————————————————————————————————
|
| • L2 Controller with MMU
| • L2 Shared Memory
| • DMA
| • STU
 ————————————————————————————————————————————————————
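My guess at what the Convolution Accelerator's 4K MAC units compute: every output element of a convolution is a dot product, i.e. a chain of multiply-accumulate (MAC) operations, and the hardware runs thousands of them in parallel. A scalar sketch of the same computation (input and kernel values are made up for illustration):

```python
import numpy as np

def conv2d_valid(x, k):
    """Naive 2D 'valid' convolution (no kernel flip, i.e. cross-correlation)."""
    oh = x.shape[0] - k.shape[0] + 1
    ow = x.shape[1] - k.shape[1] + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            acc = 0.0
            for di in range(k.shape[0]):
                for dj in range(k.shape[1]):
                    acc += x[i + di, j + dj] * k[di, dj]   # one MAC operation
            out[i, j] = acc
    return out

x = np.arange(16, dtype=float).reshape(4, 4)   # 4x4 input feature map
k = np.ones((2, 2)) / 4.0                      # 2x2 averaging kernel
print(conv2d_valid(x, k))
```

Each output element here costs 4 MACs; a real layer with large feature maps and many channels needs billions, which is why the NPU dedicates a 4K-wide MAC array to it.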

  1. Linley Gwennap, Synopsys NPX6 Expands AI Options, TechInsights, 2022. ↩︎

  2. Kalid Azad, Intuitive Guide to Convolution. ↩︎

#Note