NVIDIA-Arm chip stack emerges as winning platform architecture for humanoid robot deployment wave.

New workload vertical validates AI chip demand extending beyond traditional data center, opening emerging high-margin robotics growth channel.

Trade press · Slicast · May 19, 2026 · Global · Source: forbes.com

On May 13, four humanoid robots named Bob, Frank, Gary, and Rose climbed onto a conveyor belt at a Figure AI testing facility and began sorting packages. By the 24-hour mark, the four had moved more than 30,000 barcoded boxes between bins, recharging in shifts, with zero mechanical or software failures. The livestream ran past hour 38, with 47,000 packages handled, before Figure ended the demo. CEO Brett Adcock framed it as proof that humanoids can hold a human shift. The critical technical detail, however, lies in the architecture: Helix-02, the neural network running the robots, ran entirely on the robots' onboard hardware, with no cloud connection and no data center round-trip. Every visual frame, every joint movement, and every grip adjustment was inferred locally, on chips bolted to the robot's torso.

Helix-02 is a unified neural network that controls walking, manipulation, and balance from raw sensor data, replacing more than 109,000 lines of hand-coded locomotion logic with a single set of weights. Motor control runs at 200 Hz while scene understanding runs at 7 to 9 Hz; both loops run continuously under millisecond-level latency budgets. Figure has confirmed that inference runs on dual Nvidia RTX GPU modules bolted inside the robot, with vision, manipulation, and balance all processed onboard. This architectural choice answers a question that has hovered over humanoid robotics since 2022: can a robot doing useful work in a warehouse rely on cloud inference? Figure's answer is no, and the 38-hour run showed that onboard inference alone can carry the workload.
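The loop rates quoted above translate directly into per-iteration time budgets. The short sketch below only converts the article's figures (200 Hz and 7 to 9 Hz) into periods; the function and variable names are illustrative and do not come from Figure.

```python
def period_ms(rate_hz: float) -> float:
    """Time available for one loop iteration, in milliseconds."""
    return 1000.0 / rate_hz


MOTOR_RATE_HZ = 200.0        # motor control rate quoted in the article
SCENE_RATE_HZ = (7.0, 9.0)   # scene understanding range quoted in the article

motor_period = period_ms(MOTOR_RATE_HZ)
scene_period_slow = period_ms(SCENE_RATE_HZ[0])
scene_period_fast = period_ms(SCENE_RATE_HZ[1])

# Number of motor ticks that elapse between two scene updates.
ticks_per_scene_update = (MOTOR_RATE_HZ / SCENE_RATE_HZ[1],
                          MOTOR_RATE_HZ / SCENE_RATE_HZ[0])

print(f"motor control: {motor_period:.1f} ms per tick")
print(f"scene understanding: {scene_period_fast:.0f} to {scene_period_slow:.0f} ms per update")
print(f"motor ticks per scene update: {ticks_per_scene_update[0]:.1f} to {ticks_per_scene_update[1]:.1f}")
```

At 200 Hz each motor tick has 5 ms to complete, while the slower scene loop has roughly 111 to 143 ms per update, so the fast loop runs a couple of dozen times between scene updates.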

The constraints are fundamentally physical. Wireless latency to a cloud data center runs 20 to 200 milliseconds even in optimistic conditions, while a humanoid robot balancing on two legs has roughly 10 milliseconds before a perturbation translates into a fall. Cloud inference cannot serve a closed-loop physical control system that must catch itself when it stumbles. Bandwidth compounds the problem: a pair of stereo cameras at production framerate generates gigabits per second of raw visual data, and streaming that to a data center in real time would saturate a warehouse's wireless link. Warehouses also lose connectivity routinely, and a robot that stops when the network drops is a worse worker than a human. Every humanoid robot doing closed-loop physical work at production speed will therefore run inference locally.
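The latency and bandwidth argument can be checked with back-of-the-envelope arithmetic. In the sketch below, the 10 ms balance window and the 20 to 200 ms cloud latency range come from the article; the camera parameters (1920x1080, 30 frames per second, 24 bits per pixel, uncompressed) are hypothetical values chosen only for illustration, since the article does not give the robots' camera specifications.

```python
BALANCE_WINDOW_MS = 10.0           # article: time before a perturbation becomes a fall
CLOUD_LATENCY_MS = (20.0, 200.0)   # article: wireless latency range to a cloud data center


def fits_budget(latency_ms: float, budget_ms: float = BALANCE_WINDOW_MS) -> bool:
    """True if a round trip of latency_ms fits inside the control budget."""
    return latency_ms <= budget_ms


def stereo_bitrate_gbps(width: int, height: int, fps: int,
                        bits_per_pixel: int = 24, cameras: int = 2) -> float:
    """Raw (uncompressed) video bitrate in gigabits per second."""
    return width * height * bits_per_pixel * fps * cameras / 1e9


for latency in CLOUD_LATENCY_MS:
    overshoot = latency / BALANCE_WINDOW_MS
    print(f"{latency:.0f} ms round trip: fits={fits_budget(latency)}, "
          f"{overshoot:.0f}x the balance window")

# Hypothetical camera setup (assumption, not from the article).
bitrate = stereo_bitrate_gbps(width=1920, height=1080, fps=30)
print(f"raw stereo stream: {bitrate:.2f} Gbit/s")
```

Even the best-case 20 ms round trip is twice the balance window, and under the assumed camera settings the raw stereo stream comes to about 3 Gbit/s per robot, which is consistent with the "gigabits per second" figure above. Compression would lower the bitrate but adds latency of its own.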

The Nvidia Jetson AGX Thor module shipped commercially in 2026, featuring 14 Arm Neoverse V3AE CPU cores running at up to 2.6 GHz, a Blackwell-class GPU delivering 2,070 FP4 teraflops, and 128 gigabytes of LPDDR5X memory, all inside a 130-watt power envelope. Jetson Thor is now the reference platform that major humanoid programs are designing around. It mirrors the data-center pattern of CPU plus GPU on one tightly coupled SoC, with Arm cores handling control logic and Nvidia silicon handling parallel inference. Tesla is the visible exception: its AI5 chip, custom silicon designed in-house for both Optimus and Cybercab, taped out in April, with mass production targeted for mid-2027. However, Tesla is the only credible humanoid program with the silicon capability to leave the standard stack; everyone else is buying.
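To put the module's headline numbers in perspective, the sketch below derives two ratios from the figures quoted above: peak throughput per watt and the peak compute available per control tick. The 200 Hz tick rate is borrowed from the Helix-02 description as an assumption; the article does not say that Helix-02 runs on Jetson Thor, and peak FP4 figures are an upper bound that real workloads will not reach.

```python
PEAK_FP4_TFLOPS = 2070.0   # article: Jetson AGX Thor peak FP4 throughput
POWER_ENVELOPE_W = 130.0   # article: module power envelope
CONTROL_RATE_HZ = 200.0    # assumption: a 200 Hz loop like the one described for Helix-02

# Peak throughput per watt of module power.
tflops_per_watt = PEAK_FP4_TFLOPS / POWER_ENVELOPE_W

# Peak tera-operations available within one control tick (upper bound only).
peak_tera_ops_per_tick = PEAK_FP4_TFLOPS / CONTROL_RATE_HZ

print(f"peak efficiency: {tflops_per_watt:.1f} FP4 TFLOPS per watt")
print(f"peak budget per 5 ms tick: {peak_tera_ops_per_tick:.2f} tera-ops")
```

Under these assumptions the module offers roughly 16 FP4 teraflops per watt and at most about 10 tera-operations inside each 5 ms tick, before accounting for memory bandwidth or utilization.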

The investment relevance is straightforward. A warehouse running ten humanoid robots in 2027 needs ten Jetson Thor-class modules, plus spares and development kits, and the math compounds quickly as pilots transition to fleet deployments. The US industrial sector is projected to need 3.8 million new workers by 2033, with nearly 1.9 million roles at risk of going unfilled, which leaves warehouses unable to grow output by hiring alone. Current consensus models for Arm and Nvidia value the chip businesses on data-center capex and smartphone royalty volumes; the edge inference tier inside physical AI is not yet materially represented in those models. Figure's livestream is the first commercial-grade signal that such deployments are imminent rather than theoretical.
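The fleet arithmetic can be sketched as a simple function. Only the one-module-per-robot baseline comes from the article; the spare percentage and development kit count below are hypothetical placeholders, not figures from Figure, Nvidia, or any analyst model.

```python
import math


def modules_needed(robots: int, modules_per_robot: int = 1,
                   spare_percent: int = 10, dev_kits: int = 2) -> int:
    """Total compute modules for a fleet: deployed units, spares, and dev kits.

    spare_percent and dev_kits are illustrative assumptions.
    """
    deployed = robots * modules_per_robot
    spares = math.ceil(deployed * spare_percent / 100)
    return deployed + spares + dev_kits


# A ten-robot pilot versus a thousand-robot fleet, under the same assumptions.
for fleet_size in (10, 1000):
    print(f"{fleet_size} robots -> {modules_needed(fleet_size)} modules")
```

With these placeholder ratios, a ten-robot pilot needs 13 modules and a thousand-robot fleet needs 1,102, which illustrates how module demand scales almost linearly with fleet size once pilots become deployments.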
