3D Game Programming: GPU Programming Basics
GPU Programming
Yanci Zhang
Game Programming II

Outline
- GPU overview
- OpenGL shading language overview
- Vertex/Geometry/Fragment shader
- Application: Per-pixel shading

What is GPU?
- GPU: Graphics Processing Unit
- Developed rapidly from primitive drawing devices into major computing resources
- Extremely powerful and flexible processor
  - Tremendous memory bandwidth and computational power
  - High-level languages have emerged
- Capable of general-purpose computation beyond graphics applications

Motivation
- In many respects the GPU is more powerful than the CPU
  - Computational power: FLOPS (floating-point operations per second)
  - Parallelism
  - Bandwidth
  - Performance growth rate

Floating Point Calculation
- FLOPS: a common benchmark measurement for rating the speed of an FPU
- CPU: Intel Core i7 980 XE (six-core): 107.55 GFLOPS
- GPU: nVidia GeForce GTX 480: 1.35 TFLOPS (single precision)
- Modern GPUs support high-precision 32-bit floating point throughout the pipeline
- Double precision is supported only on recent GPUs, at a fraction of single-precision throughput

Parallelism
- Parallelism: allows many operations to execute at the same time
- CPU
  - Does not adequately exploit parallelism
  - Dual-core, quad-core
- GPU
  - GeForce GTX 480: 480 cores

Bandwidth
- Peak performance of computer systems is often far in excess of actual application performance
- The bandwidth between key components ultimately dictates system performance
- CPU: 64-bit DDR3-2133: about 17 GB/s per channel (about 34 GB/s dual-channel)
- GPU: GeForce GTX 480: 384-bit, 177.4 GB/s

Getting Faster and Faster
- CPU
  - Annual growth 1.5x, decade growth 60x
  - Moore's law
- GPU
  - Annual growth 2.0x, decade growth 1000x
  - Faster than Moore's law
- The multi-billion-dollar video game market is a pressure cooker that drives innovation

Keys to High-Perf. Computing
- Efficient computation
  - Maximize the hardware devoted to computation
  - Allow parallelism
    - Task parallelism
    - Data parallelism
    - Instruction parallelism
  - Ensure each computation unit operates at maximum efficiency

Keys to High-Perf. Computing
- Efficient communication
  - Simply providing large amounts of computation is not sufficient
  - PEs (processing elements) often spend most of the time waiting for data
  - Minimize off-chip communication

Stream Programming Model
- A programming model allowing high efficiency in computation and communication
- Two basic components
  - Stream
    - All data is represented as a stream
    - An ordered set of data of the same data type
  - Kernels: operations on streams
- Applications are constructed by chaining multiple kernels together

Kernel
- Operates on entire streams of elements and produces new streams
- Within a kernel, computations on one stream element are never dependent on computations on another element
- Input elements and intermediate computed data are stored locally
- Fits perfectly onto data-parallel hardware

Efficient Computation (1)
- Use of transistors can be divided into three categories:
  - Control: direct the computation
  - Datapath: perform computation
  - Storage: store data

Efficient Computation (2)
- Only simple control flow in kernel execution
  - Devote most transistors to datapath hardware rather than control hardware
- Streams expose parallelism in the application
  - Allows a hardware implementation to specialize hardware

Efficient Communication
- Off-chip communication is efficient
- Intermediate results between kernels are kept on-chip to minimize off-chip communication
- High degree of latency tolerance

Instruction-Stream-Based (CPU)
- Prescribes both the operation to be executed and the required data
- Only a limited prefetch of the input data can occur
  - Jumps are expected in the instruction stream
- L2 cache consumes a large share of the transistors in a CPU

Data-Stream-Based (GPU)
- Separates two tasks:
  - Configuring PEs
  - Controlling data flow to and from PEs
- Data elements can be assembled from memory before processing
- Uses only small caches and devotes the majority of transistors to computation

Mapping Pipeline to Stream Model
- The stream formulation of the graphics pipeline
  - All data as streams
  - All computation as kernels
- Both user-programmable and nonprogrammable stages can be expressed as kernels

Fixed vs. Programmable
- Fixed
  - Very fast
  - Cannot modify the pipeline; can only turn some functions on or off
  - Hard to implement advanced techniques on GPU
- Programmable
  - Allows programmers to write shaders to change the pipeline

Programmable Graphics Hardware
- Three programmable kernels in the pipeline
  - Vertex shader
  - Geometry shader
  - Pixel shader
- Load shaders through the graphics API
- The fixed pipeline is replaced by shaders

Vertex Processor
- MIMD: Multiple Instruction stream, Multiple Data stream
- A number of processors that function asynchronously and independently

Vertex Shader: Basic Function
- Operates on a single input vertex and produces a single output vertex
- Replaces the transformation & lighting unit
- Now you have to do everything by yourself
  - Transformation
  - Lighting
  - Texture coordinate generation
- As a minimum, a vertex shader must output the vertex position in homogeneous clip space

Vertex Shader: Advanced Function
- What else can we do?
  - Displacement mapping
  - Object deformation
  - Vertex blending

Vertex Shader: Limitations
- We cannot
  - Add or delete any vertices
  - Change the primitive type
  - Change the order of the vertices that form the primitives
- No knowledge of the type of primitive and neighboring vertices

Fragment Processor
- SIMD: Single Instruction, Multiple Data
- Achieves data-level parallelism
- "get this pixel, get the next one" -> "get lots of pixels"

Fragment Shader: Basic Function
- Invoked once for each fragment covered by the primitive
- Computes the final pixel color and depth
- Can output up to eight 32-bit 4-component values for the current pixel location

Fragment Shader: Advanced Function
- Enables rich shading techniques
  - Per-pixel lighting, bump mapping, normal mapping
- Fluid simulation

Fragment Shader: Limitations
- Dynamic branching is less efficient than in the vertex processor
- Cannot change the screen coordinate of a fragment
- No arbitrary memory write

Geometry Shader
- New for 2007
- Executed after vertex shaders
- Input: whole primitive, possibly with adjacency information
- Invoked once for every primitive
- Output: multiple vertices forming a single selected topology (tristrip, linestrip, pointlist)
- Output may be fed to the rasterizer and/or to a vertex buffer in memory

Geometry Shader: Applications
- Point Sprite Expansion
- Single Pass Render-to-Cubemap
- Dynamic Particle Systems
- Fur/Fin Generation
- Shadow Volume Generation

- Graphics applications
  - Per-pixel lighting
  - Ray tracing
  - Deformation
- GPGPU
  - Computer vision
  - Physically-based simulation
  - Image processing
  - Database queries
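The stream programming model described in the slides (streams of same-typed elements, kernels that process each element independently, applications built by chaining kernels) can be sketched on the CPU in a few lines of Python. This is only an illustration: `run_kernel`, `chain`, `scale` and `bias` are hypothetical names that do not come from the slides, and a real GPU would run the per-element work in parallel rather than in a loop.

```python
from typing import Callable, Iterable, List

# A kernel maps one stream element to one output element.
Kernel = Callable[[float], float]


def run_kernel(kernel: Kernel, stream: Iterable[float]) -> List[float]:
    """Apply a kernel to every element of a stream.

    Each element is processed independently of the others, which is
    the property that lets data-parallel hardware run them all at once.
    """
    return [kernel(x) for x in stream]


def chain(kernels: Iterable[Kernel], stream: Iterable[float]) -> List[float]:
    """Build an application by feeding each kernel's output stream
    into the next kernel."""
    out = list(stream)
    for kernel in kernels:
        out = run_kernel(kernel, out)
    return out


# Two hypothetical kernels used only for this example.
def scale(x: float) -> float:
    return 2.0 * x


def bias(x: float) -> float:
    return x + 1.0


if __name__ == "__main__":
    print(chain([scale, bias], [0.0, 1.0, 2.0]))  # [1.0, 3.0, 5.0]
```

Because no kernel invocation reads another element's result, the list comprehension in `run_kernel` could be replaced by any parallel map without changing the output.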
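Per-pixel lighting, listed above as a fragment shader technique, means the lighting equation is evaluated once per fragment instead of once per vertex. The sketch below mimics that per-fragment computation in Python. The slides do not say which lighting model is used, so the Lambert diffuse term here is an assumption, and `shade_fragment` and its parameters are hypothetical names chosen for illustration.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def normalize(v: Vec3) -> Vec3:
    """Return v scaled to unit length (v must be non-zero)."""
    length = math.sqrt(sum(c * c for c in v))
    return (v[0] / length, v[1] / length, v[2] / length)


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def shade_fragment(normal: Vec3, light_dir: Vec3, albedo: Vec3) -> Vec3:
    """Compute the color of one fragment with Lambert diffuse lighting.

    normal    -- surface normal interpolated for this fragment
    light_dir -- direction from the surface point toward the light
    albedo    -- base surface color (r, g, b)
    """
    n = normalize(normal)      # interpolated normals are not unit length
    l = normalize(light_dir)
    intensity = max(dot(n, l), 0.0)  # surfaces facing away receive no light
    return (intensity * albedo[0], intensity * albedo[1], intensity * albedo[2])


if __name__ == "__main__":
    # Light shining straight along the normal: full albedo is returned.
    print(shade_fragment((0.0, 0.0, 1.0), (0.0, 0.0, 1.0), (1.0, 0.5, 0.25)))
```

In a real fragment shader the same function body would run for every fragment covered by the primitive, with `normal` supplied by the rasterizer's interpolation of the vertex shader outputs.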