Heterogeneous Computing with OpenCL 2.0 teaches OpenCL and parallel programming for complex systems that may include a variety of device architectures: multi-core CPUs, GPUs, and fully integrated Accelerated Processing Units (APUs). This fully revised edition covers the latest enhancements in OpenCL 2.0, including:
• Shared virtual memory to increase programming flexibility and reduce data transfers that consume resources
• Dynamic parallelism, which reduces processor load and avoids bottlenecks
• Improved imaging support and integration with OpenGL
Designed to work on multiple platforms, OpenCL will help you program more effectively for a heterogeneous future. Written by leaders in the parallel computing and OpenCL communities, this book explores memory spaces, optimization techniques, extensions, debugging, and profiling. Multiple case studies and examples illustrate high-performance algorithms, the distribution of work across heterogeneous systems, and embedded domain-specific languages, giving you hands-on OpenCL experience with a range of fundamental parallel algorithms.
Key Features
- Updated content to cover the latest developments in OpenCL 2.0, including improvements in memory handling, parallelism, and imaging support
- Explanations of principles and strategies to learn parallel programming with OpenCL, from understanding the abstraction models to thoroughly testing and debugging complete applications
- Example code covering image analytics, web plugins, particle simulations, video editing, performance optimization, and more
Table of Contents
- List of Figures
- List of Tables
- Foreword
- Acknowledgments
- Chapter 1: Introduction
- Abstract
- 1.1 Introduction to Heterogeneous Computing
- 1.2 The Goals of This Book
- 1.3 Thinking Parallel
- 1.4 Concurrency and Parallel Programming Models
- 1.5 Threads and Shared Memory
- 1.6 Message-Passing Communication
- 1.7 Different Grains of Parallelism
- 1.8 Heterogeneous Computing with OpenCL
- 1.9 Book Structure
- Chapter 2: Device architectures
- Abstract
- 2.1 Introduction
- 2.2 Hardware Trade-offs
- 2.3 The Architectural Design Space
- 2.4 Summary
- Chapter 3: Introduction to OpenCL
- Abstract
- 3.1 Introduction
- 3.2 The OpenCL Platform Model
- 3.3 The OpenCL Execution Model
- 3.4 Kernels and the OpenCL Programming Model
- 3.5 OpenCL Memory Model
- 3.6 The OpenCL Runtime with an Example
- 3.7 Vector Addition Using an OpenCL C++ Wrapper
- 3.8 OpenCL for CUDA Programmers
- 3.9 Summary
- Chapter 4: Examples
- Abstract
- 4.1 OpenCL Examples
- 4.2 Histogram
- 4.3 Image Rotation
- 4.4 Image Convolution
- 4.5 Producer-Consumer
- 4.6 Utility Functions
- 4.7 Summary
- Chapter 5: OpenCL runtime and concurrency model
- Abstract
- 5.1 Commands and the Queuing Model
- 5.2 Multiple Command-Queues
- 5.3 The Kernel Execution Domain: Work-Items, Work-Groups, and NDRanges
- 5.4 Native and Built-In Kernels
- 5.5 Device-Side Queuing
- 5.6 Summary
- Chapter 6: OpenCL host-side memory model
- Abstract
- 6.1 Memory Objects
- 6.2 Memory Management
- 6.3 Shared Virtual Memory
- 6.4 Summary
- Chapter 7: OpenCL device-side memory model
- Abstract
- 7.1 Synchronization and Communication
- 7.2 Global Memory
- 7.3 Constant Memory
- 7.4 Local Memory
- 7.5 Private Memory
- 7.6 Generic Address Space
- 7.7 Memory Ordering
- 7.8 Summary
- Chapter 8: Dissecting OpenCL on a heterogeneous system
- Abstract
- 8.1 OpenCL on an AMD FX-8350 CPU
- 8.2 OpenCL on the AMD Radeon R9 290X GPU
- 8.3 Memory Performance Considerations in OpenCL
- 8.4 Summary
- Chapter 9: Case study: Image clustering
- Abstract
- 9.1 Introduction
- 9.2 The Feature Histogram on the CPU
- 9.3 OpenCL Implementation
- 9.4 Performance Analysis
- 9.5 Conclusion
- Chapter 10: OpenCL profiling and debugging
- Abstract
- 10.1 Introduction
- 10.2 Profiling OpenCL Code Using Events
- 10.3 AMD CodeXL
- 10.4 Profiling Using CodeXL
- 10.5 Analyzing Kernels Using CodeXL
- 10.6 Debugging OpenCL Kernels Using CodeXL
- 10.7 Debugging Using printf
- 10.8 Summary
- Chapter 11: Mapping high-level programming languages to OpenCL 2.0: A compiler writer’s perspective
- Abstract
- 11.1 Introduction
- 11.2 A Brief Introduction to C++ AMP
- 11.3 OpenCL 2.0 as a Compiler Target
- 11.4 Mapping Key C++ AMP Constructs to OpenCL
- 11.5 C++ AMP Compilation Flow
- 11.6 Compiled C++ AMP Code
- 11.7 How Shared Virtual Memory in OpenCL 2.0 Fits in
- 11.8 Compiler Support for Tiling in C++ AMP
- 11.9 Address Space Deduction
- 11.10 Data Movement Optimization
- 11.11 Binomial Options: A Full Example
- 11.12 Preliminary Results
- 11.13 Conclusion
- Chapter 12: WebCL: Enabling OpenCL acceleration of Web applications
- Abstract
- 12.1 Introduction
- 12.2 Programming with WebCL
- 12.3 Synchronization
- 12.4 Interoperability with WebGL
- 12.5 Example Application
- 12.6 Security Enhancement
- 12.7 WebCL on the Server
- 12.8 Status and Future of WebCL
- Works Cited
- Chapter 13: Foreign lands: Plugging OpenCL in
- Abstract
- 13.1 Introduction
- 13.2 Beyond C and C++
- 13.3 Haskell OpenCL
- 13.4 Summary
- Index
- Gaster et al, Heterogeneous Computing with OpenCL, Morgan Kaufmann, 2011, Paperback, 282pp, 9780123877666, $69.95
- Herlihy: The Art of Multiprocessor Programming Revised Ed, Morgan Kaufmann, 2012, Paperback, 514pp, 9780123973375, $74.95
Software engineers, programmers, hardware engineers, graduate students