This page contains lecture slides and recommended readings for the Fall 2018 offering of CS348K.
- The Compute Architecture of Intel Processor Graphics Gen9, Intel Technical Whitepaper
The reading for this class is not an academic paper, but a whitepaper from Intel describing the architecture of a recent GPU (the marketing name is HD Graphics 530, or larger numbers).
I'd like you to read the whitepaper, focusing on the description of the processor in Sections 5.3-5.5. Then, given your knowledge of the concepts discussed in lecture (such as superscalar execution, multi-core, multi-threading, etc.), I'd like you to describe the organization of the processor (using terms from the lecture, not Intel terms). For example, what is the basic processor building block? How many hardware threads does it support? What type of SIMD instructions are executed by those threads? Does it have superscalar execution capabilities? How many times is this block replicated for additional parallelism?
Consider your favorite data-parallel language, such as GLSL/HLSL shading languages, CUDA, OpenCL, ISPC, or just an OpenMP #pragma omp parallel for. Can you think through how an embarrassingly parallel for loop can be mapped to this architecture? (You don't need to write this down, but you could.)
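If it helps to get started, here is a minimal sketch of one possible mapping, written in Python for readability. It is not the answer to the exercise: the `Machine` shape and the `map_iteration` helper are hypothetical names, and the core, thread, and SIMD-width numbers in the usage note are made up rather than taken from the Intel whitepaper. The sketch packs consecutive loop iterations into SIMD lanes, then spreads the resulting SIMD-wide groups across cores and hardware thread contexts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Machine:
    """Hypothetical machine shape (illustrative, not from any whitepaper)."""
    cores: int             # number of replicated processor building blocks
    threads_per_core: int  # hardware thread contexts per core
    simd_width: int        # lanes per SIMD instruction


def map_iteration(i: int, m: Machine) -> tuple[int, int, int, int]:
    """Map loop iteration i to (core, thread, lane, pass).

    Assumed policy: consecutive iterations fill the lanes of one SIMD
    group, consecutive groups go to different cores first, then to
    additional thread contexts, and any remaining groups run in later
    passes on the same thread context.
    """
    lane = i % m.simd_width
    group = i // m.simd_width            # index of the SIMD-wide group
    slots = m.cores * m.threads_per_core
    slot = group % slots                 # thread context that runs this group
    pass_index = group // slots          # groups this context ran earlier
    core = slot % m.cores
    thread = slot // m.cores
    return core, thread, lane, pass_index
```

For example, with `Machine(cores=4, threads_per_core=2, simd_width=8)`, one pass covers 4 x 2 x 8 = 64 iterations, so iteration 64 wraps around to core 0, thread 0, lane 0 in pass 1. Other policies (for example, giving each thread a contiguous block of iterations) are just as valid; thinking through which one a real compiler and runtime would pick is part of the exercise.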
I also encourage you to read NVIDIA's V100 (Volta) Architecture whitepaper, linked in the "further reading" below. Can you put the organization of this GPU in correspondence with the organization of the Intel GPU? You could make a table contrasting the features of a modern AVX-capable Intel CPU, Intel Integrated Graphics (Gen9), NVIDIA GPUs, etc.
- NVIDIA Tesla V100 Whitepaper, 2017
- NVIDIA Tegra X1 Whitepaper
- The Rise of Mobile Visual Computing Systems, Fatahalian, IEEE Mobile Computing 2016
- Scalability! But at What COST?, McSherry, Isard, and Murray. HotOS 2015 (The arguments in this paper are very consistent with the way we think about performance in the visual computing domain.)
- Burst Photography for High Dynamic Range and Low-light Imaging on Mobile Cameras, Hasinoff et al. SIGGRAPH Asia 2016
- The Stanford CS448A course notes are a very good reference for camera image processing pipeline algorithms and issues.
- The interactive demos on the Stanford CS178 course site are very well done
- Clarkvision.com has some very interesting material on cameras.
- Demosaicking: Color Filter Array Interpolation, Gunturk et al. IEEE Signal Processing Magazine, 2005
- A Non-Local Algorithm for Image Denoising, Buades et al. CVPR 2005
- A Gentle Introduction to Bilateral Filtering and its Applications, Paris et al. SIGGRAPH 2008 Course Notes
- A Fast Approximation of the Bilateral Filter using a Signal Processing Approach, Paris and Durand. MIT technical report 2006 (extends their ECCV 2006 paper)
- Synthetic Depth-of-Field with a Single-Camera Mobile Phone, Wadhwa et al. SIGGRAPH 2018
- The Laplacian Pyramid as a Compact Image Code. Burt and Adelson, IEEE Transactions on Communications 1983.
- Pyramid Methods in Image Processing. Adelson et al. 1984
- Local Laplacian Filters: Edge-aware Image Processing with a Laplacian Pyramid. Paris et al. SIGGRAPH 2011
- Exposure Fusion. Mertens et al. Computer Graphics and Applications, 2007
- Fast Local Laplacian Filters: Theory and Applications. Aubry et al. TOG 2014
- A Non-Local Algorithm for Image Denoising, Buades et al. CVPR 2005
- Halide: A Language and Compiler for Optimizing Parallelism, Locality, and Recomputation in Image Processing Pipelines. Ragan-Kelley, Andrew Adams, et al. PLDI 2013 (or read the selected chapters in the Ragan-Kelley thesis below)
- Decoupling Algorithms from the Organization of Computation for High Performance Image Processing (please read Chapters 1, 4, 5, and 6.1), Ragan-Kelley (MIT Ph.D. thesis, 2014)
- Automatically Scheduling Halide Image Processing Pipelines, Mullapudi et al. SIGGRAPH 2016
- Differentiable Programming for Image Processing and Deep Learning in Halide. T. Li et al. SIGGRAPH 2018
- Parallel Associative Reductions in Halide. P. Suriana et al. CGO 2017
- Halide Language Website (contains documentation and many tutorials)
- Check out this YouTube video on scheduling
- The Frankencamera: An Experimental Platform for Computational Photography. A. Adams et al. SIGGRAPH 2010
- Darkroom: Compiling High-Level Image Processing Code into Hardware Pipelines. Hegarty et al. SIGGRAPH 2014
- Rigel: Flexible Multi-Rate Image Processing Hardware, Hegarty et al. SIGGRAPH 2016.
- Programming Heterogeneous Systems from an Image Processing DSL. Pu et al. TACO 2017
- There is no required reading for this lecture.
- Overview of the H.264/AVC Video Coding Standard
- Learning Binary Residual Representations for Domain-specific Video Streaming. Tsai et al. AAAI 2018
- Real-Time Adaptive Image Compression. Rippel and Bourdev. 2017
- Light Field Rendering. Levoy and Hanrahan SIGGRAPH 1996
- The Lumigraph. Gortler et al. SIGGRAPH 1996
- Single Lens Stereo with a Plenoptic Camera. E. Adelson and J. Wang. Transactions on Pattern Analysis and Machine Intelligence, 1992
- Light-Field Photography with a Hand-Held Plenoptic Camera. Ng et al. Stanford Technical Report, 2005
- Digital Light Field Photography. R. Ng. Stanford Ph.D. Dissertation, 2006 (see chapters 1-4)
- Jump: Virtual Reality Video, Anderson et al. SIGGRAPH Asia 2016 (Jump website)
- Casual 3D Photography, Hedman et al. SIGGRAPH Asia 2017
- Instant 3D Photography. Hedman and Kopf. SIGGRAPH 2018
- Facebook Surround 360 page
- Going Deeper with Convolutions, Szegedy et al. CVPR 2015 (the Inception paper). You may also enjoy reading this useful blog post about versions of the Inception network.
- MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Howard et al. 2017
- Stanford CS231n: Convolutional Neural Networks for Visual Recognition. If you haven't taken CS231n, I recommend that you read through the lecture notes of modules 1 and 2 for a very nice explanation of key topics.
- An Introduction to different Types of Convolutions in Deep Learning. by Paul-Louis Pröve (a nice little tutorial)
- Neural Networks and Deep Learning, Nielsen, 2016 (a free online book)
- Deep Residual Learning for Image Recognition. K. He et al. CVPR 2016 (the ResNet paper)
- Visualizing and Understanding Convolutional Networks, Zeiler and Fergus, ECCV 2014
- Facebook Tensor Comprehensions (Arxiv paper is here)
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. Han et al. ICLR 2016
- Progressive Neural Architecture Search. Liu et al. ECCV 2018
- Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. Goyal et al. 2017
- ImageNet Training in Minutes. You et al. 2018
- Scaling Distributed Machine Learning with the Parameter Server, Li et al. OSDI 2014
- Deep Gradient Compression. Lin et al. ICLR 2018
- In-Datacenter Performance Analysis of a Tensor Processing Unit. Jouppi et al. ISCA 2017
- SCNN: An Accelerator for Compressed-sparse Convolutional Neural Networks. Parashar et al. ISCA 2017
- EIE: Efficient Inference Engine on Compressed Deep Neural Network, Han et al. ISCA 2016
- vDNN: Virtualized Deep Neural Networks for Scalable, Memory-Efficient Neural Network Design, Rhu et al. MICRO 2016
- Eyeriss: A Spatial Architecture for Energy-Efficient Dataflow for Convolutional Neural Networks, Chen et al. ISCA 2016
- Inferring and Executing Programs for Visual Reasoning. J. Johnson et al. ICCV 2017
- CLEVR: A Diagnostic Dataset for Compositional Language and Elementary Visual Reasoning. J. Johnson et al. CVPR 2017
- Neural Module Networks, Andreas et al. CVPR 2016
- HydraNets: Specialized Dynamic Architectures for Efficient Inference. Mullapudi et al. CVPR 2018
- Deep Bilateral Learning for Real-Time Image Enhancement. Gharbi et al. SIGGRAPH 2017
- NoScope: Optimizing Neural Network Queries over Video at Scale. D. Kang et al. 2017
- Mainstream: Dynamic Stem-Sharing for Multi-Tenant Video Processing. Jiang et al. USENIX ATC 2018
- Deep Feature Flow for Video Recognition. Zhu et al. CVPR 2017
- Clockwork Convnets for Video Semantic Segmentation. E. Shelhamer et al. 2016
- Scanner: Efficient Video Analysis at Scale. Poms et al. SIGGRAPH 2018 (Github code here)
- SVE: Distributed Video Processing at Facebook Scale. Huang et al. SOSP 2017
- Chameleon: Scalable Adaptation of Video Analytics. Jiang et al. SIGCOMM 2018
- Encoding, Fast and Slow: Low-Latency Video Processing Using Thousands of Tiny Threads. Fouladi et al. NSDI 2017
- Please read the Fouladi et al. paper listed under the recommended readings from last class.
- A Trip Down the LoL Graphics Pipeline. A nice introductory blog post from Riot Games that illustrates all the different rendering passes used to construct a League of Legends scene. Note how each of these passes draws geometry under different graphics pipeline state configurations.
- A Trip Down the Graphics Pipeline. A much more detailed blog post by Fabian Giesen describing the Direct3D 10-class pipeline.
- The Design of the OpenGL Graphics Interface. by M. Segal and K. Akeley. [unpublished 1994]
- The Direct3D 10 System by D. Blythe. SIGGRAPH 2006
- Pyramidal Parametrics. L. Williams, Computer Graphics 1983
- Texture on Demand. D. Peachey. Pixar Technical Memo #217. 1990
- The Design and Analysis of a Cache Architecture for Texture Mapping. Z. S. Hakura and Anoop Gupta, ISCA 1997
- Prefetching in a Texture Cache Architecture. H. Igehy et al. Graphics Hardware 1998
- Cardinality-Constrained Texture Filtering. J. Manson and S. Schaefer. SIGGRAPH 2013.
- Parameterization-Aware MIP-Mapping. J. Manson and S. Schaefer. Computer Graphics Forum. 2012.
- Texture Compression using Low-Frequency Signal Modulation. S. Fenney. Graphics Hardware 2003
- iPACKMAN: High-Quality, Low-Complexity Texture Compression for Mobile Phones. J. Ström and T. Akenine-Möller. Graphics Hardware 2005
- ETC2: Texture Compression using Invalid Combinations. J. Ström and M. Pettersson. Graphics Hardware 2007
- Adaptive Scalable Texture Compression. T. Olson et al. High Performance Graphics 2012
- Block Compression in Direct3D 10. MSDN Developer Reference. 2013
- Efficient Depth Buffer Compression. J. Hasselgren and T. Akenine-Möller.
- Stochastic Depth Buffer Compression using Generalized Plane Encoding. M. Andersson et al. Computer Graphics Forum 2013
- Pomegranate: A Fully Scalable Graphics Architecture. M. Eldridge et al. SIGGRAPH 2000
- Life of a Triangle - NVIDIA's Logical Pipeline. C. Kubisch (NVIDIA GameWorks Blog, 2015)
- Fast Tessellated Rendering on Fermi GF100. T. Purcell (High Performance Graphics Hot3D talk)
- A Sorting Classification of Parallel Rendering. S. Molnar et al. IEEE Computer Graphics and Applications, 1994.
- A Language for Shading and Lighting Calculations. P. Hanrahan and J. Lawson. SIGGRAPH 1990
- Cg: A System for Programming Graphics Hardware in a C-like Language. W. R. Mark et al. SIGGRAPH 2003
- Slang: Language Mechanisms for Extensible Real-time Shading Systems. Y. He, K. Fatahalian, T. Foley. SIGGRAPH 2018
- Shader Components: Modular and High Performance Shader Development. Y. He et al. SIGGRAPH 2017
- A Real-Time Procedural Shading System for Programmable Graphics Hardware. K. Proudfoot et al. SIGGRAPH 2001
- Shade Trees. R. Cook. SIGGRAPH 1984
- An Image Synthesizer. K. Perlin. SIGGRAPH 1985