Stanford CS348K, Spring 2024

VISUAL COMPUTING SYSTEMS

Visual computing tasks such as computational imaging, image/video understanding, generative AI, and real-time 3D graphics are key responsibilities of modern computer systems, ranging from sensor-rich smartphones and autonomous robots to large datacenters. These workloads demand exceptional system efficiency, and this course examines the key ideas, techniques, and challenges associated with the design of parallel, heterogeneous systems that accelerate visual computing applications. This course is intended for systems students interested in architecting efficient graphics, image processing, and computer vision platforms (both new hardware architectures and domain-optimized programming frameworks for these platforms), and for graphics, vision, and AI students who wish to understand throughput computing principles in order to design new algorithms that map efficiently to these machines.

Basic Info

Tues/Thurs 10:30-11:50am

Location: Lathrop 282

Instructor: Kayvon Fatahalian

Welcome to CS348K Spring 2024. Please see the course info page for details on policies and logistics, as well as answers to common questions like "Am I prepared to take this class?" This is a paper-reading and in-class discussion course, so live attendance is expected of all participants.

Spring 2024 Schedule

Apr 02		Course Introduction + Importance of Explicit Goals and Constraints: Discussion of modern visual computing applications, a design exercise
Apr 04		Digital Camera Processing Pipeline (Part I) - Algorithms: Algorithms for taking raw sensor pixels to an RGB image: demosaicing, sharpening, correcting lens aberrations, multi-shot alignment/merging, image filtering, multi-scale processing with Gaussian and Laplacian pyramids, HDR (local tone mapping)
Apr 09		Digital Camera Processing Pipeline (Part II) + Camera Programming Abstractions: The Frankencamera, modern camera APIs, advanced image analysis for photography (portrait mode, autofocus, etc.)
Apr 11		Scheduling Image Processing Algorithms to Parallel Hardware (Part I): Balancing locality, parallelism, and work; fusion and tiling; design of the Halide domain-specific language; automatically scheduling image processing pipelines
Apr 16		Scheduling Image Processing Algorithms to Parallel Hardware (Part II): Detailed look at Halide's scheduling algebra
Apr 18		Efficient DNN Inference and Scheduling: Data-layout optimizations, scheduling decisions, fusion optimizations, modern libraries (e.g., CUTLASS)
Apr 23		DNN Accelerator Hardware: GPUs, TPUs, special instructions for DNN evaluation (and their efficiency vs. custom ASICs), choice of arithmetic precision, modern commercial DNN accelerators, flexibility vs. efficiency trade-offs
Apr 25		Generative Image Synthesis - Part I (Control): The importance of predictable control in content creation. Techniques for inserting new forms of control into generative image synthesis, role of human-interpretable abstractions.
Apr 30		Generative Image Synthesis - Part II (Efficient Generation): Modern techniques for generating images efficiently with generative AI: Stable Diffusion, low-dimensional spaces, consistency matching, how it all comes together in SDXL Turbo
May 02		Generating New Types of Media (Video, Animation, 3D, Worlds, and More): Video generation (like Sora), generating 3D content, virtual worlds, generating programs
May 07		Creating AI Agents (Including LLM-based Problem Solving): LLM-based problem-solving agents, systems and platforms for developing AI agents
May 09		Fast 3D World Simulation for Model Training (Part I): Training agents in virtual worlds, simulation engines for training agents, throughput-maximized engines, sim-to-real issues, hybrid RL-LLM systems
May 14		Fast 3D World Simulation for Model Training (Part II): Discussion of high-throughput systems like Madrona, and pixel-based systems like DeepMind's Genie
May 16		Differentiable Rendering and Optimizable Representations for 3D Reconstruction (Part I): Scene representations such as NeRF, dense volumes, sparse octrees, neural hash grids, 3D Gaussians
May 21		Differentiable Rendering and Optimizable Representations for 3D Reconstruction (Part II): Gaussian splatting and its performance optimization. Ray casting vs. rasterization.
May 23		Video Compression: Traditional and Learned: H.264 video representation/encoding, parallel encoding, motivations for ASIC acceleration, ML-based compression methods, emerging opportunities for compression when machines, not humans, will observe most images
May 28		The Present and Future of Videoconferencing Systems: System design issues for building a videoconferencing system: reducing latency, bandwidth, etc. How real-time video analysis will enable richer video-based applications.

Assignments

Jun 5		Term Project Information