Compute Resource Solutions

Introduction

  • Cisco provides compute resource solutions, including converged and hyperconverged infrastructure, as well as solutions tailored to artificial intelligence (AI), such as FlashStack Data Center, Nutanix GPT-in-a-Box, and Run:ai.

Cisco Hyperconverged Infrastructure Solutions Overview

  • Integrated hyperconverged solutions offer a robust, scalable, and efficient IT infrastructure that simplifies data center management. Joint Cisco and Nutanix approaches enable comprehensive support, seamless scalability, and advanced data protection, which makes them ideal for modern enterprises looking to optimize their IT operations and support a wide range of applications.

Introduction to Hyperconverged Solutions

  • One of the most challenging and costly aspects of designing a traditional three-tier solution is knowing how many resources will be needed over the lifecycle of the solution.

  • Hyperconverged infrastructure, on the other hand, scales linearly and predictably due to its core architecture, which automatically redistributes data as new nodes are added, using the built-in node expansion tool.

  • As you can see in the following figure, you can add storage-heavy, storage-only, or compute-only nodes, based on your needs.

  • You should consider scaling when you encounter the following situations:

    • If storage resources run out faster than compute, add storage-heavy nodes.

    • If more storage is needed, add storage-only nodes (without having to license a hypervisor).

      • Mix and match flash and hybrid nodes within the same cluster so your original cluster design does not constrain your efforts.

    • If compute resources run out faster than storage, you can add compute-only nodes to support more VMs without adding storage to the cluster.

  • The ability to purchase what you need in the short term and buy more resources as needed removes a lot of guesswork and friction when buying infrastructure.

Hyperconverged Infrastructure Simplifies Hybrid Multicloud

  • Cisco and Nutanix combine excellence in unifying management, infrastructure, and platforms to simplify operations and establish an agile foundation for supporting any application, regardless of its location.

  • Cisco offers best-in-class compute, network, and software-as-a-service (SaaS)-based infrastructure management. Cisco solutions have stateless, programmable policy-based systems, which allow users to see, control, and automate their infrastructure from a single location. Cisco also provides proactive, automated health monitoring and support capabilities.

  • Nutanix, on the other hand, is a leader in hyperconverged software. Nutanix's unified platform enables seamless workload mobility and includes a complete set of enterprise and cloud features. Nutanix's software also offers enterprise-grade disaster recovery and security capabilities.

  • As shown in the following figure, a combination of both solutions aims to simplify and accelerate the delivery of IT infrastructure.

Unified Approach to Hybrid Multicloud Benefits

  • Cisco is partnering with Nutanix to deliver a hybrid multicloud solution built on combined leadership in application, data, and infrastructure management. The Nutanix Cloud Platform, based on its industry-leading hyperconverged infrastructure foundation, is now validated, certified, and integrated with the Cisco server infrastructure to help enterprises accelerate their hybrid multicloud journey.

  • Cisco Compute Hyperconverged with Nutanix is holistically built, managed, and supported to deliver a more seamless experience, foster innovation, and accelerate the hybrid multicloud journey for customers, as illustrated in the following figure.

  • Cisco Compute Hyperconverged with Nutanix simplifies and accelerates the delivery and operation of infrastructure and applications at a global scale with these advantages:

    • This end-to-end solution is built, managed, and supported holistically and features a best-in-class cloud operating model.

    • This solution adapts to dynamic business and application requirements with flexibility and choice in Cisco servers; the latest in accelerator, network, and storage technologies; SaaS innovations; and the freedom to connect to multiple clouds.

    • Customers can innovate confidently, knowing that joint, augmented support and automated resiliency capabilities prevent and resolve issues faster.

Cisco Hyperconverged Solution Components

  • Cisco Compute Hyperconverged with Nutanix is a secure, resilient, and self-healing software platform that allows you to build your hybrid multicloud infrastructure to support all kinds of workloads and use cases across public and private clouds.

  • Cisco Compute Hyperconverged with Nutanix has the following components:

    • Nutanix Cloud Platform (NCP), which includes Nutanix Cloud Infrastructure (NCI), Nutanix Unified Storage (NUS), and Nutanix Cloud Manager (NCM)

    • Hypervisor support: Nutanix Acropolis Hypervisor (AHV) and VMware vSphere

    • One or more of the supported Cisco UCS nodes

    • Systems management: Cisco Intersight Infrastructure Service

  • The hierarchy and dependency of the NCP components are illustrated in the following figure.

  • NCM is a software control plane for provisioning, operating, automating, and governing workloads across clouds.

  • NUS is a distributed and software-defined storage solution that provides the scale that organizations need to serve any workload anywhere. It enables a unified storage platform, which can provide block, file, and object storage services.

  • NCI provides a complete software stack to unify your hybrid cloud infrastructure and enables the following functionalities:

    • Scale-out storage

    • Nutanix Acropolis Hypervisor

    • Advanced hyperconverged infrastructure

    • Virtual networking

    • Disaster recovery

    • Container services

    • Data and network security

  • Cisco Unified Computing System (UCS) compute provides a stateless server architecture that combines compute and networking into a single platform to power your applications. Cisco UCS compute provides the following advantages:

    • Simplified management

    • Complete application programming interface (API) programmability

    • Stateless configuration through logical policies and profiles

    • Multiple form-factors and peripherals to meet customer requirements

  • Cisco Intersight is a SaaS-based management platform providing global visibility and fleet management for all Cisco Compute Hyperconverged with Nutanix nodes.

  • Intersight integration with Nutanix Prism Central allows these enhanced capabilities using Intersight standalone mode:

    • Nutanix personality in Intersight

    • Connected Cisco TAC

    • Proactive Return Material Authorization (RMA)

    • Hardware contract status, end-of-life (EOL) notices, security advisories, and field notices

    • Server faults and alerts

FlashStack Data Center

  • Converged infrastructure combines computing, networking, storage, management software, and automation capabilities in a unified solution. With validated designs, converged infrastructure can simplify and accelerate deployment while reducing risk. A converged infrastructure consists of grouped, pretested, and prevalidated components. Converged infrastructure systems use storage arrays; the servers have no local storage.

  • Converged infrastructure typically includes the following elements:

    • Compute

    • Storage

    • Networking

    • Hypervisor

    • Management software

    • Automation and orchestration capabilities

  • Converged infrastructure vendors may deliver equipment that is tested, pre-racked, and ready to use. Alternatively, the equipment can be assembled at a customer site, along with a reference architecture document, which is a detailed guide on deploying and configuring the solution.

  • An example of converged infrastructure is a FlashStack data center that is powered by NVIDIA. It incorporates accelerated computing, essential AI software, and pretrained models. This stack simplifies the deployment of AI models across diverse applications and offers a comprehensive solution for a wide range of use cases.

  • The FlashStack architecture is built by using the following infrastructure components for compute, network, and storage:

    • Cisco UCS X-Series modular platform using Cisco UCS X210c M7 compute nodes with NVIDIA graphics processing units (GPUs)

    • Cisco Nexus switches

    • Cisco MDS 9000 Series Switches

    • Pure Storage FlashArray

  • The deployment consists of Red Hat OpenShift Container Platform clusters deployed on VMware vSphere installed on Cisco compute nodes with NVIDIA GPUs. Cisco Intersight manages the compute nodes. The software layer of the NVIDIA AI platform, NVIDIA AI Enterprise, powers the inferencing workflow.

  • All FlashStack components are integrated, so you can deploy the solution quickly and economically while eliminating many of the risks associated with researching, designing, building, and deploying similar solutions from the ground up. One of the main benefits of FlashStack is its ability to maintain consistency at scale.

  • The FlashStack solution uses Cisco UCS C-Series or Cisco UCS X-Series with the following hardware components:

    • A combination of the following server families:

      • Cisco UCS X9508 chassis with any number of Cisco UCS X210c M7 compute nodes.

      • Cisco UCS C-Series Rack Servers (C220 M7, C240 M7, and C245 M8, for example).

    • Cisco UCS 4th-generation 6454 Fabric Interconnects to support 25 and 100 Gigabit Ethernet connectivity from various components.

    • High-speed Cisco Nexus Operating System (NX-OS)-based Cisco Nexus 93180YC-FX3 switching design to support up to 100 Gigabit Ethernet connectivity.

    • Pure Storage FlashBlade//S500 scale-out file and object storage with 100 Gigabit Ethernet connectivity to the Cisco Nexus switching fabric.

    • Pure Storage FlashArray//XL170 storage with 25 Gigabit Ethernet connectivity to the Cisco Nexus switching fabric and 32-Gbps Fibre Channel connectivity to the Cisco MDS switching fabric.

  • The solution includes these software components:

    • Cisco Intersight platform to deploy, maintain, and support the FlashStack components.

    • Cisco Intersight Assist virtual appliance to help connect the Pure Storage FlashArray and VMware vCenter with the Cisco Intersight platform.

    • For virtualized clusters, VMware vCenter 8.0 is needed to set up and manage the virtual infrastructure and integration of the virtual environment with the Cisco Intersight software.

Nutanix GPT-in-a-Box

  • The Cisco Compute Hyperconverged with Nutanix GPT-in-a-Box solution takes the complexity out of adopting generative AI. It provides the steps for deploying the underlying infrastructure for this solution in a single box. This solution combines Cisco servers and SaaS operations with Nutanix software, using the most popular large language models (LLMs) to produce a fully validated AI-ready platform that can simplify and jumpstart your AI initiatives from the data center to the edge.

  • The solution offers three key benefits:

    • Simplicity: The solution offers both SaaS and on-premises management options and covers day-0 through day-N operations, with service profiles for compute, storage, and networking that are customized for Nutanix to help simplify and accelerate cluster deployment and deliver better performance and resiliency.

    • Flexibility: The Cisco Compute Hyperconverged with Nutanix solution addresses modern applications and use cases to offer multiple choices in Cisco UCS server deployment options, the latest accelerator and drive technologies, and SaaS innovations from two industry powerhouses, including integrations with the leading public cloud providers. Also, the solution incorporates Cisco best-in-class networking technology, including Cisco Application Centric Infrastructure (ACI) integrations, to enhance performance and resiliency for data-intensive workloads in hybrid cloud environments.

    • Resiliency: The joint solution uses only enterprise-grade components. Its augmented system protection includes a collaborative support model, proactive automated resilience and security capabilities, and support systems and case notes for faster triage. When log files are uploaded or case notes are generated, that information is shared, enabling enhanced collaboration among support teams to resolve issues faster and provide an improved customer experience. The policy-based approach minimizes human error and configuration drift, which results in consistent, reliable cluster deployments. It also enforces an overall security posture through centralized authorizations to prevent tampering with configurations.

  • This reference architecture combines these elements:

    • Nutanix GPT-in-a-Box software-defined solution

    • Cisco Compute Hyperconverged C-Series servers

    • NVIDIA L40S GPU

    • Cisco Intersight standalone mode (no external fabric interconnect)

    • Systems management: Nutanix Prism

    • A range of the most popular LLMs

Run:ai on Cisco UCS

  • Run:ai is an AI orchestration platform that offers effective solutions for managing and streamlining AI workflows. When integrated with OpenShift on Cisco UCS X-Series, Run:ai can help optimize AI and machine learning workloads. OpenShift, a Kubernetes-based platform, provides the perfect environment for deploying and managing Run:ai and enables containerization and automation of AI workloads. Cisco UCS X-Series, a highly scalable and flexible modular computing platform, provides the necessary computing power and capacity to handle resource-intensive AI tasks.

  • The integration of Run:ai with OpenShift on Cisco UCS X-Series offers a solution for AI workload management. It allows organizations to dynamically allocate resources, simplify workload management, and accelerate AI research. With Run:ai, enterprises can efficiently prioritize tasks, ensure optimal resource utilization, and reduce operational costs.

  • The following figure shows the key features of Run:ai.

Fractional GPU Sharing Capability

  • Run:ai can allocate a container with a specific amount of GPU RAM. If your code needs 4 GB of RAM, you can submit a job specifying the exact portion of the GPU memory you need. Going beyond the specified RAM amount will result in an out-of-memory exception.

  • With the fractional GPU capability, all running workloads that use the GPU share the compute in parallel and, on average, get their requested share of the compute.

  • For example, assume two containers, one with a 0.25 GPU workload and the other with a 0.75 GPU workload; both will, on average, get an equal proportion of the compute power. If one of the workloads does not use the GPU, the other workload gets the rest of the GPU's compute power.
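  • The sharing behavior described above can be modeled with a small Python sketch. This is only a toy model of the documented behavior (active workloads split the GPU's compute equally, and an idle workload's share goes to the active ones), not Run:ai's actual scheduler, and the function name is hypothetical.

```python
def compute_shares(workloads):
    """Toy model of fractional GPU compute sharing.

    workloads: dict mapping workload name -> True if it is actively
    using the GPU. Active workloads split the compute equally; an
    idle workload contributes its share to the active ones.
    """
    busy = [name for name, active in workloads.items() if active]
    if not busy:
        return {name: 0.0 for name in workloads}
    share = 1.0 / len(busy)
    return {name: (share if name in busy else 0.0) for name in workloads}


# Two active containers share the compute equally, regardless of
# their 0.25 / 0.75 memory fractions.
print(compute_shares({"job-a": True, "job-b": True}))   # both get 0.5
# If job-b goes idle, job-a gets the whole GPU's compute.
print(compute_shares({"job-a": True, "job-b": False}))  # job-a gets 1.0
```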

Dynamic MIG

  • NVIDIA MIG allows GPUs that are based on the NVIDIA Ampere architecture (such as the NVIDIA A100) to be partitioned into separate GPU instances. MIG can partition available GPU compute resources to provide a defined quality of service (QoS) with fault isolation for clients such as VMs, containers, or processes. MIG enables multiple GPU instances to run in parallel on a single physical NVIDIA GPU. The partitioning covers both compute and memory and uses fixed sizes; up to seven instances are supported, with the available MIG profiles listed in the NVIDIA documentation. A typical profile is MIG 2g.10gb, which provides 2/7 of the compute power and 10 GB of GPU memory. The division is static in the sense that you must call the NVIDIA API or the nvidia-smi command to create or remove a MIG partition.

  • To avoid static assignment, Run:ai provides a way to create a MIG partition dynamically. Dynamic MIG has these characteristics:

    • Similar to the fractional GPU capability, you can request the portion of the GPU memory that you need. Run:ai will call the NVIDIA MIG API to generate the smallest possible MIG profile for your request and allocate it to your container.

    • MIG is configured according to workload demand without draining workloads or involving an IT administrator.

    • Run:ai will automatically deallocate the partition when the workload finishes. The partition is not removed until the scheduler decides it is needed elsewhere.

    • In a single GPU cluster, you can have some MIG nodes that are dynamically allocated and some that are not.
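  • The "smallest possible MIG profile" selection can be illustrated with a short Python sketch. The profile table below follows the published MIG profiles for an NVIDIA A100 40 GB GPU; the selection logic is only an illustration of the behavior described above, not Run:ai's implementation.

```python
# NVIDIA A100 40 GB MIG profiles: name -> (compute slices, memory in GB),
# per the NVIDIA MIG documentation.
A100_40GB_PROFILES = {
    "1g.5gb": (1, 5),
    "2g.10gb": (2, 10),
    "3g.20gb": (3, 20),
    "4g.20gb": (4, 20),
    "7g.40gb": (7, 40),
}


def smallest_mig_profile(mem_gb, profiles=A100_40GB_PROFILES):
    """Return the profile with the least memory (breaking ties on the
    fewest compute slices) that still satisfies the memory request."""
    fitting = [
        (mem, slices, name)
        for name, (slices, mem) in profiles.items()
        if mem >= mem_gb
    ]
    if not fitting:
        raise ValueError(f"no MIG profile can satisfy {mem_gb} GB")
    return min(fitting)[2]


print(smallest_mig_profile(9))   # -> 2g.10gb
print(smallest_mig_profile(15))  # -> 3g.20gb (preferred over 4g.20gb)
```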

NVIDIA GPUDirect

  • NVIDIA GPUDirect is a family of technologies that enhances data movement and access for NVIDIA data center GPUs. Using GPUDirect, network adapters and storage drives can read and write directly to and from GPU memory. This ability eliminates unnecessary memory copies, decreases CPU overhead, reduces latency, and results in significant performance improvements.

  • NVIDIA GPUDirect includes these technologies:

    • GPUDirect Storage

    • GPUDirect Remote Direct Memory Access (RDMA)

  • From the developer's perspective, these technologies are presented through a comprehensive set of APIs.

  • The following figure compares a system with and without GPUDirect Storage. GPUDirect Storage enables a direct data path between local or remote storage, such as Non-Volatile Memory Express (NVMe) or NVMe over Fabric (NVMe-oF), and GPU memory. Bounce buffer refers to an intermediate memory area used to facilitate data transfer between the GPU and storage devices, such as NVMe, when direct data transfer is not possible.

  • GPUDirect Storage avoids using the CPU for the data path, instead employing a direct memory access (DMA) engine near the network interface card (NIC) or storage to move data on a direct path into or out of GPU memory. In addition to the benefits of speeding up computation with GPUs instead of CPUs, GPUDirect Storage acts as a force multiplier when whole data processing pipelines shift to GPU execution. This function becomes especially important when dataset sizes no longer fit into system memory and data I/O to the GPUs grows to be the defining bottleneck in processing time.

  • GPUDirect RDMA is another technology used in high-performance computing (HPC) and AI/ML clusters, where the GPUs in remote nodes can directly access each other's memory, as illustrated in the following figure.

  • Designed for GPU acceleration, GPUDirect RDMA provides direct communication between NVIDIA GPUs in remote systems. This communication bypasses the system CPUs and eliminates the buffer copies of data through system memory, resulting in approximately 10 times better performance.

  • The RDMA protocol enables remote direct memory access from the memory of one computer into the memory of another without involving the operating system of either one. Examples of operations include RDMA write and RDMA read. Despite the similar name, the RDMA protocol should not be confused with GPUDirect RDMA, which is one of the technologies in the NVIDIA GPUDirect family. GPUDirect RDMA enables the network card to send or receive data directly by accessing GPU memory, bypassing CPU memory copies and operating system routines. GPUDirect RDMA works with InfiniBand or RDMA over Converged Ethernet (RoCE).

  • GPUDirect RDMA is available in the Compute Unified Device Architecture (CUDA) toolkit.

  • Note: CUDA is a parallel computing platform and programming model developed by NVIDIA for general computing on GPUs. With CUDA, developers can dramatically speed up computing applications by using the power of GPUs.
