Arm Architecture, SoC Development, and AI Acceleration

1. Introduction to Arm

Arm is a processor design company that creates blueprints (BP) and licenses both its ISA (Instruction Set Architecture) and microarchitecture designs.
๐Ÿ‘‰ Arm Developer

Armโ€™s CPU architectures are implemented by a wide range of microarchitectures.

  • Architecture (ISA) defines:
    • Basic instruction set
    • Exception handling
    • Memory model

Arm Profiles

  • A (Application) Profile: Runs full systems like Linux
  • R (Real-time) Profile: Runs RTOS or bare-metal code (BMC), for safety-critical systems
  • M (Microcontroller) Profile: Runs RTOS or BMC, optimized for efficiency

2. Arm Architecture Profiles

A-Profile

Supports both 32-bit and 64-bit execution:

  • AArch32: T32 + A32 IS (backward compatible with Armv7-A)
  • AArch64: A64 IS (modern 64-bit execution)

Latest versions (2025):

  • v9.4-A (in sync with v8.9-A)
  • Armv9 introduces:
    • SVE2 (Scalable Vector Extension 2)
    • aSIMD (NEON SIMD)
    • Security isolation

๐Ÿ”— SIMD overview
๐Ÿ”— SVE2 documentation

Key Milestones

  • v8.1: AArch64 introduction
  • v8.2: FP16 support (Half-precision floating point)
  • v8.3: Improved JS data type conversion
  • v8.4: Memory partitioning, SHA acceleration
  • v8.5 / v9.0: Memory tagging, branch target identification
  • v8.6 / v9.1: GEMM (General Matrix Multiply)
  • v8.7 / v9.2: PCIe hotplug
  • v8.8 / v9.3: PAC enhancements, optimized memcpy(), memset()

R-Profile

Designed for time-sensitive and safety-critical systems.
Features:

  • PMSA & MPU
  • Functional safety (ISO 26262, IEC 61508 certified from R52+)
  • SSD/HDD controllers (R82 supports DRAM up to 1TB)
  • Safety Island for isolation and rapid response

M-Profile

Optimized for microcontrollers and efficiency.

  • v8-M Baseline: Minimalist, bare-metal
  • v8-M Mainline: Feature-rich extension
  • v8-M23: Successor to v6-M
  • v8-M33: Successor to v7-M

Supports Helium technology (vector extension for ML & DSP).

  • Cortex-M55, M85 support Helium
  • Difference vs Neon:
    • Helium = designed for small, low-power CPUs
    • Optimized for loop prediction, lane prediction, and complex math ops
  • Lacks a secure monitor

3. Arm System IP

AMBA (Advanced Microcontroller Bus Architecture)

Provides flexible interconnect protocols.

  • APB: Low-bandwidth control (registers, system peripherals)
  • AHB: High-speed single-clock edge
  • AXI-CE: System-wide coherency
  • CHI: High-performance coherent interconnect
  • C2C (Chip-to-Chip): Multichip SMP & coherent links

Benefits:

  • Efficient IP reuse
  • Flexibility
  • Compatibility
  • Ecosystem support

Exception Levels (Armv8 & v9)

  • EL0: Applications
  • EL1: OS Kernel
  • EL2: Hypervisor
  • EL3: Trusted firmware

Armv9 enhancements:

  • Memory tagging
  • Secure EL2
  • Pointer authentication
  • Branch target identifier
  • Crypto extensions

4. SoC Development Flow

Armโ€™s SoC design flow is similar to other chips:

  1. Frontend
    • Define system requirements
    • Choose cores, IP, and system-level optimizations
  2. IP Selection & Integration
    • Choose pre-verified RTL IP
    • Ensure EDA compatibility
  3. Frontend Design & Verification
    • RTL design (Verilog)
    • Testbenches & constraints validation
  4. Software Development (Parallel)
    • Virtual platforms (Arm FVPs)
    • Firmware, bootloaders, drivers
    • HW/SW co-design
  5. Backend Implementation
    • Logic synthesis โ†’ gate-level netlist
    • Floorplanning, place & route
    • Clock tree, parasitic extraction
    • STA, DFM, DRC/LVS checks
  6. Tape-out
    • GDSII layout to foundry
  7. Validation
    • FPGA / validation boards
  8. Mass Production
    • Yield optimization

Key Components of Arm SoCs

  • CPU cores: Cortex-A, R, M
  • Memory subsystems: caches, SRAM, ROM
  • AMBA interconnect
  • System IP & Corstone (prebuilt compute subsystems)
  • Peripherals: UART, I2C, USB, SPI, GPIOs, timers, ADC/DAC
  • Accelerators: AI/ML via Ethos NPUs
  • Security: TrustZone, secure boot, TPM
  • Power/Clock management: DVFS, gating, power islands

๐Ÿ”— SoC Labs


5. Arm Ethos-U (NPU)

Armโ€™s Ethos-U NPUs provide dedicated ML acceleration.

  • U55: 256 MAC (CNN focus)
  • U65: 512 MAC
  • U85: 2048 MAC (Transformer focus)

Workflow

  • NN ops assigned to NPU
  • Conditions & post-processing on CPU
  • Weights in flash, activations in SRAM
  • CPU triggers execution, handles interrupts

Standards & Software:

  • TOSA (Tensor Operator Set Architecture) support
  • CMSIS-NN kernels for Cortex-M fallback
  • uTFL (TensorFlow Lite Micro) support (INT8 only)
  • Vela compiler for operator partitioning & quantization
  • Arm NN: ML framework bridge
  • KleidiAI: Optimized NN runtime

6. CPU & GPU Optimization

CPU Optimizations

  • Instruction-level parallelism (pipelines, OoO execution)
  • Data-level parallelism (SIMD, SME)
  • Branch prediction & speculative execution
  • Cache prefetching & tuning
  • Power efficiency (DVFS, C-states)
  • Multi-core parallelism

GPU Optimizations

  • SIMD & SIMT execution
  • Memory hierarchy optimization
  • Multi-precision support
  • MMA units (like tensor cores)

7. Neural Network Optimizations

  • Quantization (linear + QAT)
  • Pruning (remove redundant layers)
  • Knowledge Distillation (teacher/student models)
  • Matrix optimizations:
    • Loop reordering
    • Tiling
    • Loop unrolling
    • Multi-core parallelism (OpenMP, SMT)

Advanced Features

  • SVE2 + SME (Scalable Matrix Extension)
    • Dedicated tile storage (ZA)
    • Streaming execution mode

All of these are integrated into KleidiAI libraries.

๐Ÿ”— Arm University GitHub
๐Ÿ”— Arm Developer Labs


8. Conclusion

Armโ€™s ecosystem spans CPU, GPU, and NPU architectures, connected by robust system IP and optimized through software libraries.

From application processors (A-profile) to real-time controllers (R-profile) and ultra-low-power microcontrollers (M-profile), Arm enables scalable solutions across domains such as IoT, AI, cloud, and mobile computing.

The integration of Ethos NPUs, Helium, Neon, SVE2, and SME highlights Armโ€™s commitment to AI/ML acceleration and energy-efficient computing for the next generation of devices.

Note: (Latest information: 2025 Aug), any further updates from Arm are not get updated here.