[Virtual Presenter] Welcome. In this unit, we'll cover the software ecosystem that has allowed developers to make use of GPU computing for data science.
[Audio] We'll start with a brief overview of vGPU as a foundational technology. From there, we'll move into what frameworks are and their benefits for AI. We'll also provide an overview of the NVIDIA software stack and the CUDA-X AI software acceleration libraries. Later, we'll move on to NVIDIA's containerized software catalog, known as NGC, and discuss how NVIDIA is extending AI to every enterprise using virtualization with the NVIDIA AI Enterprise software suite.
[Audio] By the end of this unit, you'll be able to: understand virtual GPU as the foundational technology upon which the AI ecosystem sits; briefly describe the deep learning stack and CUDA; define the steps that make up the AI workflow; identify the various types of frameworks from open source and third-party vendors, as well as those provided by NVIDIA; describe what makes up NGC and the Enterprise catalog and discuss their benefits; and walk through the benefits and features of NVIDIA AI Enterprise and NVIDIA's provided AI workflows.
[Audio] Let's get started.
GPU Virtualization (vGPU)
[Audio] Before we get into AI frameworks and the way NVIDIA provides and supports these frameworks, let's take a few minutes to briefly cover vGPU as a foundational technology. The workplace has experienced pandemic-driven disruption that is changing how and where we work. The adoption of digital technologies has helped organizations respond to unprecedented challenges and has made a mobile workforce increasingly prevalent. By 2030, end-user computing is expected to grow to $20 billion, with 40% of storage and compute shifting toward service-based models. However, to build an enhanced digital workspace for the post-pandemic recovery and beyond, we must move beyond defensive short-term models and focus on sustainable, resilient operating methods. Improved user experience paired with security stands at the forefront of the corporate agenda. In fact, 53% of IT executives report their companies are increasing investment in digital transformation, while 49% are looking to improve efficiencies.
[Audio] This is where NVIDIA virtual GPU technology comes into play, allowing IT to deliver graphics-rich virtual experiences across their user base, whether deploying office productivity applications for knowledge workers or providing engineers and designers with high-performance virtual workstations to access professional design and visualization applications. IT can deliver an appealing user experience while maintaining the productivity and efficiency of their users. Application and desktop virtualization solutions have been around for a long time, but their number one point of failure tends to be user experience. The reason is very simple: when applications and desktops were first virtualized, GPUs were not part of the mix. This meant that all of the capture, encode, and rendering work traditionally done on a GPU in a physical device was instead handled by the CPU in the host. Enter NVIDIA's virtual GPU, or vGPU, solution. It enables IT to virtualize a GPU and share it across multiple virtual machines, or VMs. This not only improves performance for existing VDI environments, it also opens up a whole new set of use cases that can leverage this technology. With our portfolio of virtual GPU solutions, we enable accelerated productivity across a wide range of users and applications. Knowledge workers benefit from an improved experience with office applications, browsers, and high-definition video, including video conferencing like Zoom, WebEx, and Skype. For creative and technical professionals, NVIDIA enables virtual access to the professional applications that typically run on physical workstations, including CAD and design applications such as Revit and Maya. It enables GIS apps like Esri ArcGIS Pro, oil and gas apps like Petrel, financial services apps like Bloomberg, healthcare apps like Epic, and manufacturing apps like CATIA, Siemens NX, and SolidWorks, to name a few.
[Audio] Our virtualization software is available for on-prem data centers and also in the cloud: NVIDIA Virtual PC (vPC) and Virtual Applications (vApps) software for knowledge and business workers, and NVIDIA RTX Virtual Workstation (vWS) for creative and technical professionals such as engineers, architects, and designers. We have a series of courses to walk you through each software offering. Please review the virtualization sales curriculum for more detailed information.
[Audio] Let's review how NVIDIA virtual GPU software enables multiple virtual machines to have direct access to a single physical GPU while using the same NVIDIA drivers that our customers deploy on non-virtualized operating systems. On the left-hand side, we have a standard VMware ESXi host. VMware has done a great job over the years virtualizing CPU workloads. However, certain tasks are more efficiently handled by dedicated hardware such as GPUs, which offer enhanced graphics and accelerated computing capabilities. On the right side, from the bottom up, we have a server with a GPU running the ESXi hypervisor. When the NVIDIA vGPU manager software, or VIB, is installed on the host server, we're able to assign vGPU profiles to individual VMs. NVIDIA branded drivers are then installed into the guest OS, providing for a high-end user experience. This software enables multiple VMs to share a single GPU, or, if there are multiple GPUs in the server, they can be aggregated so that a single VM can access multiple GPUs. This GPU-enabled environment provides unprecedented performance while supporting more users per server, because work that was done by the CPU can now be offloaded to the GPU.
[Audio] Most people understand the primary benefit of GPU virtualization: the ability to divide up GPU resources and share them across multiple virtual machines to deliver the best possible performance. But NVIDIA virtual GPU software, included in the NVIDIA AI Enterprise suite, delivers many other benefits that go beyond GPU sharing. With NVIDIA vGPU software, IT can deliver near bare-metal performance for compute workloads running virtualized, with minimal overhead. Integrations with partners like VMware provide IT a complete lifecycle approach to operational management, from infrastructure right-sizing to proactive management and issue remediation. These integrations allow IT to use the same familiar management tools from hypervisor and leading monitoring software vendors for deep insights into GPU usage. NVIDIA vGPU supports live migration of accelerated workloads without interruption to end users, which allows for business continuity and workload balancing. The ability to flexibly allocate GPU resources means that IT can better utilize the resources in their data center. And since virtualization enables all data to remain securely in the data center, the solution helps ensure infrastructure and data security.
[Audio] Let's now explore deep learning. We'll start with a brief review of what it is, then walk through an AI workflow. From there, we'll talk about the AI software stack and CUDA-X.
[Audio] Deep learning is a subclass of machine learning. It uses neural networks to train a model using very large data sets, in the range of terabytes or more of data. Neural networks are algorithms that mimic the human brain in understanding complex patterns. Labeled data is a set of data with labels that help the neural network learn; in the example here, the labels are the objects in the images: cars and trucks. The errors that the classifier makes on the training data are used to incrementally improve the network structure. Once the neural network-based model is trained, it can make predictions on new images. Once trained, the network and classifier are deployed against previously unseen data, which is not labeled. If the training was done correctly, the network will be able to apply its feature representation to correctly classify similar classes in different situations.
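To make the idea of error-driven learning concrete, here is a minimal, framework-free sketch (not from the course) of the same loop at toy scale: a single-neuron "perceptron" classifier whose weights are nudged whenever it mislabels a training point. The 2-D points and labels below are invented for illustration; real deep learning replaces this with many-layer networks and terabytes of data.

```python
# Toy supervised learning loop: errors on labeled training data
# incrementally adjust the model, mirroring how a neural network trains.

def train_perceptron(samples, labels, epochs=20, lr=0.1):
    """Adjust weights whenever the classifier errs on a labeled sample."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(samples, labels):
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            err = y - pred                # the error drives the update
            w[0] += lr * err * x1
            w[1] += lr * err * x2
            b += lr * err
    return w, b

def predict(w, b, point):
    """Classify a previously unseen, unlabeled point."""
    x1, x2 = point
    return 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0

# Made-up labeled data: class 1 points lie above the line x2 = x1.
samples = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (1.0, 0.0), (2.0, 1.0), (3.0, 2.0)]
labels = [1, 1, 1, 0, 0, 0]

w, b = train_perceptron(samples, labels)
# The trained model generalizes to unseen points on either side of the line.
print(predict(w, b, (0.5, 2.5)), predict(w, b, (2.5, 0.5)))  # -> 1 0
```

The same pattern, predict, measure error, update, scales up to the deep neural networks and labeled image datasets described above.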
[Audio] To understand the AI ecosystem, you have to start with the workflow. The first step is the process of preparing raw data and making it suitable for the machine learning model. Examples of tools for this are NVIDIA RAPIDS and the NVIDIA RAPIDS Accelerator for Apache Spark. Once the data is processed, we move on to the training phase. This is where we teach the model to interpret data. Examples of tools for this are PyTorch, the NVIDIA TAO Toolkit, and TensorFlow. Next, we refine the model through optimization. An example tool for this is TensorRT. Finally, we deploy the model, making it available for systems to receive data and return predictions. The NVIDIA Triton Inference Server allows the simple deployment of scalable AI models in production.
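The four stages above can be sketched as a pipeline of plain functions. This is a hedged illustration, not from the course: each stage function here is a hypothetical stand-in for the real tools (RAPIDS for prep, PyTorch for training, TensorRT for optimization, Triton for serving), and the "model" is just a mean-based threshold chosen for demonstration.

```python
# Framework-free sketch of the AI workflow:
# data prep -> training -> optimization -> deployment.

def prepare(raw):
    """Data prep: drop missing values and normalize (stand-in for RAPIDS)."""
    values = [float(v) for v in raw if v is not None]
    hi = max(values)
    return [v / hi for v in values]

def train(data):
    """Training: 'learn' a threshold from data (stand-in for PyTorch/TAO)."""
    return {"threshold": sum(data) / len(data)}

def optimize(model):
    """Optimization: shrink the model for fast inference (stand-in for TensorRT)."""
    return {"threshold": round(model["threshold"], 2)}

def deploy(model):
    """Deployment: expose a callable inference endpoint (stand-in for Triton)."""
    def infer(x):
        return "positive" if x > model["threshold"] else "negative"
    return infer

raw = [4.0, None, 8.0, 2.0, 6.0]
endpoint = deploy(optimize(train(prepare(raw))))
print(endpoint(0.9), endpoint(0.1))  # -> positive negative
```

The composition `deploy(optimize(train(prepare(raw))))` reads in the same order as the workflow diagram: each stage consumes the previous stage's output.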
[Audio] So what are frameworks? Frameworks are designed to provide higher-level building blocks that make it easy for data scientists and domain experts in computer vision, natural language processing, robotics, and other areas to design, train, and validate AI models. They can be an interface, library, or tool that allows developers to more easily and quickly build models. Data scientists use frameworks to create models for a variety of use cases such as computer vision, natural language processing, and speech recognition. For example, MXNet is a modern open-source deep learning framework used to train and deploy deep neural networks. It is scalable, allowing for fast model training, and supports a flexible programming model and multiple languages. The MXNet library is portable and can scale to multiple GPUs and multiple machines. scikit-learn is a free machine learning library for the Python programming language. It features various classification, regression, and clustering algorithms and is designed to interoperate with the Python numerical and scientific libraries NumPy and SciPy. TensorFlow is a popular open-source software library for dataflow programming across a range of tasks. It is a symbolic math library and is commonly used for deep learning applications. NVIDIA Isaac Lab is a lightweight application built on Isaac Sim for robot learning. Isaac Lab is optimized for reinforcement, imitation, and transfer learning and can train all types of robot embodiments.
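As a concrete (and hedged) example of what "higher-level building blocks" means in practice, scikit-learn, named above, lets a data scientist load a dataset, train a classifier, and score it in a handful of calls, with the numerical machinery handled by the framework. This sketch assumes scikit-learn is installed and uses its bundled Iris dataset; the model choice and parameters are illustrative, not a recommendation.

```python
# Frameworks hide the low-level math: a complete train/evaluate
# cycle with scikit-learn in a few high-level calls.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)            # the framework handles the training math
acc = accuracy_score(y_test, model.predict(X_test))
print(f"accuracy: {acc:.2f}")
```

Swapping in a different algorithm is a one-line change, which is exactly the productivity argument for frameworks made above.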
[Audio] The diagram shows the software stack for deep learning. The hardware comprises a system, which can be a workstation or a server, with one or more GPUs. The system is provisioned with an operating system and an NVIDIA driver that enables the deep learning framework to leverage the GPU functions for accelerated computing. Containers are becoming the choice for development in organizations. NVIDIA provides many frameworks as Docker containers through NGC, which is a cloud registry for GPU-accelerated software. It hosts over 100 containers for GPU-accelerated applications, tools, and frameworks. These containers help with faster and more portable development and deployment of AI applications on GPUs across the cloud, data center, and edge, and are optimized for accelerated computing on GPUs. Hence, the stack includes the NVIDIA Docker runtime specific to NVIDIA GPUs. The containers include all the required libraries to deliver high-performance GPU acceleration during the processing required for training. The CUDA Toolkit is NVIDIA's groundbreaking parallel programming model that provides essential optimizations for deep learning, machine learning, and high-performance computing, leveraging NVIDIA GPUs.
NVIDIA Deep Learning Software Stack
- CUDA: NVIDIA's groundbreaking parallel programming model; provides essential optimizations for deep learning, machine learning, and high-performance computing (HPC) leveraging NVIDIA GPUs
- NVIDIA Container Runtime: enables GPUs to be used inside containers
- NGC Containers: publicly available containers optimized to run on NVIDIA GPUs
- DL Frameworks: popular deep learning frameworks available inside the containers; a range of interfaces can be used
Stack, from the bottom up: Host OS > NVIDIA Driver > Docker Engine > NVIDIA Container Runtime for Docker > Containerized Tool (Container OS, Mounted NVIDIA Driver, CUDA Toolkit, Deep Learning Libraries, Deep Learning Frameworks).
How Do I Build an AI Platform? There are two ways to build an AI platform: Do It Yourself (DIY), or NVIDIA AI Enterprise.
[Audio] There are two ways you can go about building an AI platform. You can either take the do-it-yourself approach or leverage NVIDIA AI Enterprise, both of which we'll discuss over the next two sections. Leveraging open-source software has become a mainstream method for AI and machine learning development because it can be collaboratively shared and modified upon distribution. However, building your own AI platform based on open source can be risky without robust support for production AI. Open-source software is often distributed and maintained by community developers without dedicated resources for quality assurance and verification. Open-source software deployment is often limited to the current GPU architecture and offers only self-service support.
[Audio] With NVIDIA AI Enterprise, enterprises who leverage open-source practices can build mission-critical applications on top of the NVIDIA AI platform. NVIDIA AI Enterprise provides NVIDIA enterprise support and hardware testing and certification for past, current, and future GPUs. Now that you have an understanding of the two ways you can build an AI platform, let's explore the benefits of the NVIDIA AI Enterprise solution. Whether you take a do-it-yourself, build-your-own approach or download and use NVIDIA AI Enterprise, all software for either approach is provided in NVIDIA's NGC and Enterprise catalogs. Let's take a few minutes to explore that now. Navigating the world of software stacks for AI and accelerated applications is complex. The stack varies by use case: an AI stack is different from an HPC simulation stack, and a genomics stack is different from a visualization stack. The underlying software stack to run a particular application also varies across platforms, from on-prem to cloud, from bare metal to containers, and from VMs to microservices. The NGC catalog offers containerized software for AI, HPC, data science, and visualization applications built by NVIDIA and by our partners. The containers allow you to encapsulate an application and its complex dependencies in a single package, simplifying and accelerating end-to-end workflows, and can be deployed on premises, in the cloud, or at the edge. NGC also offers pre-trained models across a variety of domains and AI tasks, such as computer vision, NLP, and recommender systems. Such pre-trained models can be fine-tuned with your own data, saving you valuable time when it comes to AI model development. Finally, for consistent deployment, NGC also has Helm charts that allow you to deploy your application, and NGC collections, which bring together all the necessary building blocks, helping you build applications faster.
The pre-trained models in the NGC catalog are built and continually trained by NVIDIA experts. For many of our models, we provide model resumes, analogous to a job candidate's resume: you can see the dataset the model was trained on, training epochs, batch size, and, most importantly, its accuracy. This ensures that users can find the right models for their use case.
NGC and the Enterprise catalog.
[Audio] The NGC catalog has rich collections of general-purpose models such as ResNet-50 and U-Net. More importantly, the catalog also provides application-specific models, such as people or vehicle detection and pose and gaze estimation. You'll also find models in conversational AI that include speech recognition, text-to-speech, language translation, and more. Not only do you get this rich assortment of models, but these models can also be easily fine-tuned with your custom data or easily integrated into industry SDKs like Riva or DeepStream. Containers are now ubiquitous when it comes to developing and deploying software. A container is a portable unit of software that combines the application and all its dependencies into a single package that is agnostic to the underlying host OS. Using containers in an AI development environment ensures that AI applications run consistently across different computing environments. In scientific research, containers allow researchers to easily reproduce and corroborate results without having to rebuild the environment from scratch. NVIDIA NGC containers offer certified images that have been scanned for vulnerabilities and are thoroughly tested. Some of our containers are backed by enterprise support via the NVIDIA AI Enterprise program. The containers are designed to support multi-GPU and multi-node applications for high performance. NGC containers can be run with many container runtimes, including Docker, CRI-O, containerd, and Singularity, on bare metal, virtual machines, and Kubernetes environments. With a monthly update cadence for deep learning containers such as TensorFlow and PyTorch, the containers are continually improving to offer the best performance possible while targeting the latest versions of software. To provide easy access and support on your AI journey without having to build it yourself, NVIDIA AI Enterprise is the easiest on-ramp.
Fast Track AI with Pre-Trained Models from NGC
- Production quality: trained and continuously updated by experts; model resumes help you find the right fit
- Wide range of use cases: people detection, vehicle detection, gaze estimation, intent classification, question answering, speech recognition, and text-to-speech
- Adapt and integrate: adapt to your domain with custom data; integrate easily into industry SDKs
The PeopleNet model card shown notes known limitations, such as very small objects, occluded objects, dark, monochrome, or infrared camera images, and warped or blurry images.
Containers Enable You to Focus on Building AI
- Enterprise-ready software: scanned for CVEs, malware, and crypto; tested for reliability; backed by enterprise support
- Performance optimized: scalable; updated monthly; better performance on the same system
- Deploy anywhere: Docker, CRI-O, containerd, Singularity; bare metal, VMs, Kubernetes; multi-cloud, on-prem, hybrid, edge
[Audio] The next sections will briefly walk you through what it is, what it does, and how to find it. The NVIDIA AI platform consists of three important layers: accelerated infrastructure, which provides accelerated computing to power the entire AI technology stack; AI platform software, which is the NVIDIA AI Enterprise software suite for production AI; and AI services for enterprises to easily build AI applications leveraging state-of-the-art foundation models. We'll be focusing on NVIDIA AI Enterprise, the software layer of the most advanced AI platform. NVIDIA AI Enterprise provides reliability and security for production AI and consists of four important layers. The infrastructure optimization and cloud native management and orchestration layers are essential to make your infrastructure AI-ready; cloud native management and orchestration tools facilitate deployment of the solution in cloud native and hybrid environments. AI and data science development and deployment tools include the best-in-class AI software needed for development and deployment. AI workflows, frameworks, and pre-trained models are designed for enterprises to quickly get started developing specific AI use cases and addressing business outcomes. For example, customers might leverage included AI workflows to develop intelligent virtual assistants for contact centers, or digital fingerprinting to detect cybersecurity threats. The entire software stack can be flexibly deployed across accelerated cloud, data center, edge, and embedded infrastructure. Wherever you choose to run your AI workloads, applications can run anywhere NVIDIA infrastructure is available with one license. NVIDIA AI Enterprise covers your AI center of excellence (COE) needs, partnered with the most experienced group of enterprise AI experts in the market, with enterprise support included.
The NVIDIA AI platform offers: cloud-native, hybrid-optimized deployment anywhere, on-prem and in the cloud; reduced development complexity; security and scalability, with certifications across a broad partner ecosystem; improved AI model accuracy; and standard 9x5 support, with premium 24x7 support available. Now that you have a general understanding of NVIDIA AI Enterprise and its benefits, let's look at what it includes. NVIDIA offers a diverse range of SDKs, models, and frameworks; this slide provides a concise overview of their functions. For a deeper understanding of any specific model or framework, a quick Google search is recommended. To round out this discussion of the AI ecosystem, we will briefly cover NVIDIA's AI workflows. One question that frequently arises is whether there is a difference between an AI workload and a workflow. We believe there is, and NVIDIA provides solutions to address both scenarios. There are customers who are running workloads already, and these can be accelerated by NVIDIA frameworks and libraries that leverage NVIDIA GPUs. There are also organizations who would like to deploy specific workflows but aren't quite sure how to build them or how to get started. For these customers, we've created AI workflows, which are assembled, tested, documented, and customizable to give customers a head start in solving specific challenges. Now that you understand the difference between workloads and workflows, let's explore NVIDIA's AI workflows, available through NGC and the Enterprise catalog. These are prepackaged solutions designed to assist AI practitioners with specific use cases. Each workflow guides you through the necessary tools and steps to create and run a variety of workflows. These workflows have been fully tested and vetted by NVIDIA. In the future, NVIDIA plans to introduce more AI workflows to cover a broader range of use cases.
NVIDIA AI Platform
NVIDIA AI Enterprise is the software layer of the most advanced AI platform:
- AI Foundation Models & Services
- AI Platform Software (NVIDIA AI Enterprise)
- Accelerated Infrastructure
NVIDIA AI Enterprise
End-to-end AI software that includes over 50 frameworks and pretrained models.*
- AI workflows, frameworks, and pretrained models: medical imaging, speech AI, conversational AI, recommenders, video analytics, logistics, robotics, cybersecurity, autonomous vehicles
- AI and data science development and deployment tools
- Cloud native management and orchestration
- Infrastructure optimization
- Accelerated infrastructure: cloud, data center, edge, embedded
The offering: cloud-native, hybrid-optimized; deploy anywhere, on-prem and in the cloud; reduced OSS development complexity; secure and scalable, with certifications across a broad partner ecosystem; improved AI model accuracy; standard 9x5 support, premium 24x7.
*The NVIDIA NGC public catalog provides a complete listing of over 50 supported frameworks and pretrained models.
Application Workflows
SDKs, pre-trained models, and frameworks:
- CLARA: AI applications and frameworks for healthcare and medical imaging
- RIVA: multilingual speech and translation AI software development kit
- TOKKIO: framework to build and deploy AI-powered digital assistants and avatars
- MERLIN: framework for building high-performing recommender systems at scale
- MODULUS: physics-ML platform that blends physics with deep learning training data
- MAXINE: AI SDKs and cloud-native microservices for deploying AI features that enhance audio, video, and reality effects
- METROPOLIS: application framework to bring visual data and AI together
- CUOPT: operations research API using AI to create complex, real-time fleet routing workflows
- NEMO: framework to build, customize, and deploy generative AI models
- ISAAC: framework to build modular robotics applications
- DRIVE: framework to help collect data, train deep neural networks, and test, validate, and operate autonomous vehicles
- MORPHEUS: framework that enables cybersecurity developers to create optimized pipelines for filtering, processing, and classifying data
NVIDIA AI Workflows.
Terminology Explained: Workload vs. Workflow
- Workload: any application, microservice, or function, standalone or as part of a workflow, that uses compute resources to accomplish a task or output results. Data science, AI, and 3D graphics workloads can be accelerated by frameworks and libraries that leverage NVIDIA GPUs. Examples: Spark jobs, models doing video analytics, training a large language model, a text-to-speech function, video rendering.
- Workflow: a multi-step process to get from initiation to completion, where each step is a workload. For example, the generic workflow of AI is data prep > training > simulation > inference. NVIDIA AI workflows are assembled, tested, documented, and customizable to provide partners and customers a head start in solving specific challenges. Examples: audio transcription, digital fingerprinting to detect cybersecurity threats, contact center intelligent virtual assistant.
NVIDIA AI Workflows
Prepackaged reference applications to rapidly automate your business with AI:
- Intelligent Virtual Assistant: engaging contact center assistance 24/7 for lower operational costs
- Audio Transcription: world-class, accurate transcripts based on GPU-optimized models
- Digital Fingerprinting Threat Detection: cybersecurity threat detection and alert prioritization to identify and act faster
- Next Item Prediction: personalized product recommendations for increased customer engagement and retention
- Route Optimization: vehicle and robot routing optimization to reduce travel times and fuel costs
- Generative AI Knowledge Base
Runs on NVIDIA AI Enterprise across cloud, data center, edge, and embedded.
AI Workflows Accelerate the Path to AI Outcomes
Reduce the cost of developing and deploying AI solutions:
- Accelerate development and deployment: prepackaged, customizable reference applications include best-in-class AI software with cloud-native deployable packaging
- Improve accuracy and performance: frameworks and containers performance-tuned and tested for NVIDIA GPUs
- Gain confidence in AI outcomes: enterprise-grade support
Unit Summary.
Summary
Now that you have completed this unit, you should be able to:
- Define virtual GPU (vGPU)
- Describe the NVIDIA deep learning software stack and the NVIDIA CUDA-X ecosystem
- Define the steps in the AI pipeline workflow
- Define and identify open-source, third-party, and NVIDIA frameworks
- Describe the benefits of NGC and the Enterprise catalog
- Describe the benefits and use cases of NVIDIA AI Enterprise
- Describe NVIDIA's AI workflows
Coming Up Next
Continue the journey by taking the next unit!
- Unit 4: Accelerating AI with GPUs
- Unit 5: AI Software Ecosystem
- Unit 6: Data Center and Cloud Computing
- Unit 7: Compute Platforms for AI