What is NeuReality?

NeuReality enables AI everywhere by offering a holistic solution for inference deployment that lowers the complexity, cost, and power consumption of AI inference.

NeuReality’s AI-Centric Approach

We’ve developed a revolutionary, AI-centric architecture based on the following elements:


Purpose-built and optimized for AI inference


New network-attached system-on-chip that migrates simple but critical data-path functions from software to hardware


Upper-level AI tools and an orchestrated solution that simplify the deployment process


Simplified UX for development and deployment cycles

NeuReality Architecture

NeuReality has developed a new architecture design to exploit the power of Deep Learning Accelerators (DLAs).

We accomplish this through the world’s first Network Addressable Processing Unit, or NAPU.

This architecture enables inference through hardware with AI-over-Fabric, an AI-hypervisor, and AI-pipeline offload.

This illustration shows all of the functionality that is contained within the NAPU:

AI-centric vs CPU-centric

Traditional, generic, multi-purpose CPUs perform all of their tasks in software; our purpose-built components perform those same tasks in hardware that was specifically designed for AI inference.

The following table compares these two approaches:

|                        | AI-centric NAPU | Traditional CPU-centric |
|------------------------|-----------------|-------------------------|
| Architecture Approach  | Purpose-built for inference workflows | Generic, multi-purpose chip |
| AI Pipeline Processing | Linear | "Star" model |
| Instruction Processing | Hardware based | Software based |
| Management             | AI process natively managed by cloud orchestration tools | AI process not managed, only CPU managed |
| Pre/Post Processing    | Performed in hardware | Performed in software by CPU |
| System View            | Single-chip host | Partitioned (CPU, NIC, PCIe switch) |
| Scalability            | Linear scalability | Diminishing returns |
| Total Cost of Ownership | Low | High |
| Latency                | Low | High, due to over-partitioning and bottlenecking |

In a traditional CPU-centric system, large portions of the workload still need to be run in software.

Only an AI-centric NAPU runs everything in dedicated, purpose-built hardware.

NeuReality Hardware

NR1 Network Addressable Processing Unit


The NeuReality NR1 is a network-attached inference server-on-a-chip with an embedded neural network engine, and the world’s first Network Addressable Processing Unit (NAPU). As workflow-optimized hardware devices with specialized processing units, native network capabilities, and virtualization capabilities, NAPUs are devices specialized for specific workloads, and an important building block of the heterogeneous data center of the future.

NR1-M Inference Module


The NeuReality NR1-M module is a full-height, double-wide PCIe card containing one NR1 Network Addressable Processing Unit (NAPU) system-on-chip. It operates as a network-attached inference server and can connect to an external Deep Learning Accelerator (DLA).

NR1-S Inference Server


The world’s first AI-centric server, NeuReality’s NR1-S is an optimized inference server design that contains NR1-M modules with the NR1 NAPU, enabling a truly disaggregated AI service in a scalable and efficient architecture. The system not only improves cost and power efficiency by up to 50X, but also requires no IT involvement to deploy for end users.

NeuReality Software

NeuReality complements its AI-centric hardware with software tools that do not require developers to understand how things are done at the hardware level.

Software Development Kit

The NeuReality software development kit (SDK) provides a toolchain, a runtime, and a simple UX for data scientists and DevOps engineers to remove barriers to deploying AI. The SDK consists of both cloud and AI-server components. The NR Manager deployment tools help with runtime, orchestration, full Kubernetes management, and MLOps integration. The NR Monitor offline tools allow offloading of the AI pipeline and compute-graph compilation.

Composable Runtime

NeuReality APIs

NeuReality’s APIs provide a simplified UX for development and deployment cycles. The inference APIs hide development complexity and simplify the life cycle of AI applications by exposing functionality through a dedicated API per user.

| NR Toolchain API | NR Provisioner API | NR Client API |
|------------------|-------------------|---------------|
| For analyzing, training, and validating data, as well as optimizing, compiling, and emulating | For deploying and monitoring | For connecting to register and serve the AI inference cycle |
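The division of responsibilities across these three APIs can be sketched as a compile → deploy → serve flow. The sketch below uses stand-in Python stubs; every class and method name here is hypothetical and invented for illustration, not the actual NeuReality SDK, and only models the roles described above.

```python
# Hypothetical sketch of the three NeuReality API roles.
# All names are illustrative stand-ins, not the real SDK.

class ToolchainAPI:
    """Stand-in for the toolchain role: compile a model for the NAPU."""
    def compile(self, model_name: str) -> dict:
        # A real toolchain would optimize, compile, and emulate
        # the compute graph; here we just produce an artifact record.
        return {"model": model_name, "format": "napu-graph"}

class ProvisionerAPI:
    """Stand-in for the provisioner role: deploy and monitor artifacts."""
    def __init__(self):
        self.deployed = {}
    def deploy(self, artifact: dict) -> str:
        endpoint = f"inference://{artifact['model']}"
        self.deployed[endpoint] = artifact
        return endpoint

class ClientAPI:
    """Stand-in for the client role: connect to a served model and infer."""
    def __init__(self, provisioner: ProvisionerAPI):
        self.provisioner = provisioner
    def infer(self, endpoint: str, payload: str) -> str:
        # Only endpoints registered by the provisioner can serve requests.
        assert endpoint in self.provisioner.deployed
        return f"result-for-{payload}"

# End-to-end flow: compile -> deploy -> serve.
artifact = ToolchainAPI().compile("resnet50")
provisioner = ProvisionerAPI()
endpoint = provisioner.deploy(artifact)
result = ClientAPI(provisioner).infer(endpoint, "image-001")
print(result)  # result-for-image-001
```

The point of the sketch is the separation of concerns: the toolchain never touches deployment state, and the client only reaches models that the provisioner has registered.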