Stanford CS348K, Fall 2018
VISUAL COMPUTING SYSTEMS

Visual computing tasks such as computational imaging, image/video understanding, and real-time 3D graphics are key responsibilities of modern computer systems ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms) and for graphics, vision, and machine learning students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info
Tues/Thurs 1:30-2:50pm
Room 60-109
Instructor: Kayvon Fatahalian
See the course info page for more info on course policies, logistics, and how to prepare for the course.
Fall 2018 Schedule
Sep 25
Course Introduction + Review of Parallel Hardware Architecture
How superscalar execution, multi-core, SIMD, and hardware multi-threading are used in CPUs and GPUs; understanding latency and bandwidth constraints
Sep 27
The Digital Camera Processing Pipeline
Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, autofocus/autoexposure, image filtering
Oct 2
Modern Smartphone Camera Processing (such as in the Pixel 2 Phone)
Multi-scale processing with Gaussian and Laplacian pyramids, local tone mapping, multi-shot alignment/merging, HDR/portrait mode in the Pixel 2 camera
Oct 4
Efficiently Scheduling Image Processing Algorithms on Parallel Hardware
Balancing locality, parallelism, and work, fusion and tiling, design of the Halide domain-specific language, automatically scheduling image processing pipelines
Oct 9
Specialized Hardware for Efficient Image Processing
Benefits of fixed-function processing, comparing GPUs, DSPs, Image Signal Processors, and FPGAs for image processing, domain-specific languages for hardware synthesis such as Darkroom/Rigel, compiling Halide to hardware
Oct 11
Lossy Video Compression
H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, emerging opportunities for compression when machines, not humans, will observe most images
Oct 16
The Light Field and Image Capture for VR Display
Light field representation, light-field cameras, computational challenges of synthesizing video streams for VR output, Google's Jump VR pipeline
Oct 18
Efficient DNN Inference (for Image Analysis)
Popular DNN trunks and topologies, design of MobileNet, challenges of direct implementation, where the compute lies in modern networks, DNN pruning, neural architecture search
Oct 23
Algorithms for Parallel DNN Training at Scale
Footprint challenges of training, model vs. data parallelism, asynchronous vs. synchronous training debate, parameter server designs, key optimizations for parallel training
Oct 25
Hardware Accelerators for DNN Inference
GPUs, Google TPU, special instructions for DNN evaluation, choice of precision in arithmetic, recent ISCA/MICRO papers on DNN acceleration
Oct 30
Algorithmic Optimization: Examples of Task-Motivated DNN Structure
Neural module networks, learning to compress images/video, discussion on value of modularity vs. end-to-end learning
Nov 1
Algorithmic Optimizations for DNN-Based Video Analysis
Exploiting temporal coherence in video, pipelined networks, specialization to scene and camera viewpoint, sharing computations across applications and users
Nov 6
Video Stream Processing at Cloud Scale
Facebook SVE/Lumos, Scanner, emerging platforms for video processing as a service
Nov 8
Guest Lecture: to be announced
Topic: specialized hardware for deep learning and computational photography
Nov 13
The GPU-Accelerated Real-Time Graphics Pipeline
3D graphics pipeline as a machine architecture (abstraction), pipeline semantics/functionality, contrasting graphics pipeline architecture with compute-mode GPU architecture
Nov 15
Hardware Acceleration of Texture Mapping and Depth Buffering
Texture sampling basics, hardware texture compression, depth-and-color buffer compression, motivations for hardware multi-threading for latency hiding in modern GPUs
Nov 27
Scheduling the Graphics Pipeline onto a GPU
Molnar sorting taxonomy, dataflow scheduling under data amplification, tiled rendering for bandwidth-efficiency, deferred shading as a scheduling decision
Nov 29
The Design of Domain-Specific Languages for Shading
RenderMan Shading Language and Cg: contrasting two different levels of abstraction for shading languages, Slang
Dec 4
High-Performance Ray Tracing Techniques
Algorithms for fast ray tracing, mapping ray tracing to GPU hardware
Dec 6
Guest Lecture: Emerging GPU Hardware for Accelerating Ray Tracing
How deep learning and custom hardware will soon make real-time ray tracing feasible
Assignments and Projects
Optional Assignment 0: Analyzing Parallel Program Performance on a Quad-Core CPU
All CS348K students are encouraged to attempt this assignment during or before the first week of the course to check their background in parallel systems.
Oct 18
Required Assignment: In this multi-part assignment, students will implement a basic pipeline for processing RAW images produced by a camera sensor into high-quality images. This pipeline will employ techniques common in modern smartphone camera applications. Students will optionally tune the performance of their implementation for modern multi-core CPUs or GPUs.
Optional Assignment 2: Implementing a Separable Conv Layer in Halide
TBD
Final Project: Students will complete a substantial term project on a course-relevant topic of their choosing.