How to Implement Deep Learning Applications for NVIDIA GPUs with GPU Coder

GPU Coder™ generates readable, portable CUDA® code from MATLAB® algorithms, leveraging CUDA libraries such as cuBLAS and cuDNN. The generated code can then be cross-compiled and deployed to NVIDIA® GPUs, from the Tesla® line to the embedded Jetson™ platform.
Learn more about GPU Coder:
Download a free Deep Learning ebook:

The first part of this talk describes how MATLAB is used to design and prototype end-to-end systems that combine a deep learning network with computer vision algorithms. You’ll learn about the capabilities in MATLAB for accessing and managing large data sets, as well as pretrained models that let you get started quickly with deep learning design. Then, you’ll see how the distributed and GPU computing capabilities integrated with MATLAB are employed during training, debugging, and verification of the network. Finally, most end-to-end systems need more than just classification: data must be pre- and post-processed, and the results are often inputs to a downstream control system. These traditional computer vision and control algorithms, written in MATLAB, interface with the deep learning network to build up the end-to-end system.
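The data-access, pretrained-model, and training steps described above can be sketched in MATLAB roughly as follows. `imageDatastore`, `alexnet`, `analyzeNetwork`, and `trainingOptions` are standard MATLAB / Deep Learning Toolbox functions, but the folder path and the specific training options shown here are illustrative assumptions, not the presenter's exact code:

```matlab
% Manage a large image set without loading it all into memory
% (the folder path is a placeholder).
imds = imageDatastore('path/to/images', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');

% Start from a pretrained reference network such as AlexNet.
net = alexnet;

% Visualize the network to gain insight before adapting it.
analyzeNetwork(net);

% Train or fine-tune on the GPU; setting 'ExecutionEnvironment'
% to 'gpu' dispatches training to the GPU, and the progress plot
% gives insight into the training process.
opts = trainingOptions('sgdm', ...
    'ExecutionEnvironment', 'gpu', ...
    'Plots', 'training-progress');
```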

The second part of this talk focuses on the embedded deployment phase. Using representative examples from automated driving to illustrate the entire workflow, see how GPU Coder automatically analyzes your MATLAB algorithm to (a) partition the algorithm between CPU and GPU execution; (b) infer memory dependencies; (c) allocate data across the GPU memory hierarchy (including global, local, shared, and constant memories); (d) minimize data transfers and device synchronizations between CPU and GPU; and (e) finally generate CUDA code that leverages optimized CUDA libraries like cuBLAS and cuDNN to deliver high performance.

Finally, you’ll see that the generated code is highly optimized, with benchmarks showing that deep learning inference with the auto-generated CUDA code is ~2.5x faster than MXNet, ~5x faster than Caffe2, and ~7x faster than TensorFlow®.

Watch this talk to learn how to:

1. Access and manage large image sets

2. Visualize networks and gain insight into the training process

3. Import reference networks such as AlexNet and GoogLeNet

4. Automatically generate portable and optimized CUDA code from the MATLAB algorithm
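The code-generation step in item 4 can be sketched roughly as follows. `coder.gpuConfig`, `coder.DeepLearningConfig`, and `codegen` are the actual GPU Coder commands, but the entry-point function name `myPredict` and the 224x224x3 input size are hypothetical, assumed here for illustration:

```matlab
% Configure CUDA code generation (here as a standalone library).
cfg = coder.gpuConfig('lib');
cfg.TargetLang = 'C++';

% Map deep learning layers onto optimized cuDNN calls.
cfg.DeepLearningConfig = coder.DeepLearningConfig('cudnn');

% Generate CUDA code for a MATLAB entry-point function
% (myPredict is a hypothetical wrapper around the trained network).
codegen -config cfg myPredict -args {ones(224,224,3,'single')}
```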

  • MathWorks
  • 11 months ago |  41min 16sec
