Visual Computing Systems

Stanford CS348K, Fall 2018

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info

Tues/Thurs 1:30-2:50pm

Room 60-109

Instructor: Kayvon Fatahalian

See the course info page for course policies, logistics, and how to prepare for the course.

Fall 2018 Schedule

Sep 25	Course Introduction + Review of Throughput Hardware Design Principles	How superscalar, multi-core, SIMD, and hardware multi-threading are used in CPUs and GPUs; understanding latency and bandwidth constraints
Sep 27	The Digital Camera Processing Pipeline	Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering
Oct 2	Modern Smartphone Camera Processing (such as in the Pixel 2 Phone)	Multi-scale processing with Gaussian and Laplacian pyramids, HDR/local tone mapping, portrait mode in the Pixel 2 camera
Oct 4	Efficiently Scheduling Image Processing Algorithms on Parallel Hardware	Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
Oct 9	Specialized Hardware for Efficient Image Processing	Benefits of fixed-function processing; comparing GPUs, DSPs, Image Signal Processors, and FPGAs for image processing; domain-specific languages for hardware synthesis such as Darkroom/Rigel; compiling Halide to hardware
Oct 11	Lossy Image and Video Compression	JPG compression, H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
Oct 16	The Light Field and Capture for VR Display	Light field representation, light-field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Oct 18	Efficient DNN Inference (for Image Analysis)	Popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
Oct 23	Algorithms for Parallel DNN Training at Scale	Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Oct 25	Hardware Accelerators for DNN Inference	GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration
Oct 30	Algorithmic Optimization: Examples of Task-Motivated DNN Structure	Neural module networks, discussion on value of modularity vs. end-to-end learning
Nov 1	Algorithmic Optimizations for DNN-Based Video Analysis	Exploiting temporal coherence in video, pipelined networks, specialization to scene and camera viewpoint, sharing computations across applications and users
Nov 6	Video Stream Processing at Cloud Scale	Facebook SVE/Lumos, Scanner, processing as a service
Nov 8	The GPU-Accelerated Real-Time Graphics Pipeline	3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Nov 13	Efficiently Accessing Memory: Hardware for Texture Mapping and Depth Buffering	Texture sampling basics, hardware texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
Nov 15	Scheduling the Graphics Pipeline onto a GPU	Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
Nov 27	Guest Lecture: Bill Mark (Google)	Topic: specialized hardware for deep learning and computational photography at Google
Nov 29	Domain-Specific Languages for Shading (with Tim Foley, NVIDIA)	RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages, Slang
Dec 4	Misc topics: Design of ML Frameworks / VR Rendering	Mapping shaders to GPUs, design of a platform for ML computations, rendering concerns of VR
Dec 6	The Fusion of Rendering and Deep Learning	How deep learning and hardware specialization stand to make real-time raytracing feasible

Assignments and Projects

optional	Optional Assignment 0: Analyzing Parallel Program Performance on a Quad-Core CPU	All CS348K students are encouraged to attempt this assignment during or before the first week of the course to check their background in parallel systems.
Oct 22	Assignment 1: Burst Mode HDR Camera RAW Processing for the kPhone 348
optional	Optional Assignment 2: Implementing a Separable Conv Layer in Halide
Dec 11	Final Project Guidelines: students will complete a substantial term project on a course-relevant topic of their choosing.