This course provides an introduction to High-Performance Computing (HPC), an essential tool in many science and engineering disciplines and, increasingly, in Internet computing, medicine, the humanities, and the entertainment industry. HPC has traditionally driven computer architecture and computer-system design and, over the last decade, has driven industry's efforts toward significantly improved energy efficiency in computation, for both economic and environmental reasons.
HPC today implies large-scale parallel computation and in many cases employs high-performance networking technologies. HPC places great emphasis on achieving high efficiency in resource utilization, which often requires a good understanding of the application; of the processor, node, and platform architecture; and of the compilers, operating systems, and programming tools. Increasingly, the drive for performance also includes a drive for energy efficiency and, as Google researchers have put it, energy-proportional computing.
This course focuses on the architecture of high-performance, energy-efficient, scalable computing environments and on algorithms for scientific and engineering problem solving in such environments. The course is suitable for scientists and engineers with computationally demanding problems, and for computer scientists and applied mathematicians interested in techniques for the efficient use of platforms suited to large-scale computation.
The course gives an overview of high-performance computer architectures and parallel programming paradigms, with an emphasis on MPI, OpenMP, and GPU programming. The MapReduce programming model will also be described. Basic algorithms for matrix operations, the solution of linear systems of equations, sorting, the Fast Fourier Transform, and other common operations will be taught.
Reducing data motion through proper data allocation and careful management of the motion that remains is critical for both performance and energy efficiency. User-level techniques for managing memory hierarchies will be discussed, and tools for performance analysis will be covered briefly.
Scalable platforms will be used for homework and projects.
Lecture 1 – Overview and Applications
Lecture 2 – Technology I
Lecture 3 – Clusters
Lecture 4 – Technology II
Lecture 5 – Memory I
Lecture 6 – Memory II
Lecture 7 – Cache
Lecture 8 – Cache II
Lecture 9 – Parallel Computing Concepts
Lecture 10 – Vectorization I
Lecture 11 – OpenMP
Lecture 12 – Vectorization II
Lecture 13 – Matrix-Vector Multiplication
Lecture 14 – Matrix-Matrix Multiplication
Lecture 15 – Cache Oblivious Algorithms
Lecture 16 – OpenCL I
Lecture 17 – OpenCL II
Lecture 18 – Interconnection Networks I
Lecture 19 – Interconnection Networks II
Lecture 20 – Interconnection Networks III
Lecture 21 – Sorting I
Lecture 22 – Sorting II
Lecture 23 – MPI I
Lecture 24 – MPI II
Lecture 25 – MPI III
Lecture 26 – LU Factorization and Solve
Lecture 27 – Data Partitioning I
Lecture 28 – Data Partitioning II
Lecture 29 – Fast Fourier Transform I
Lecture 30 – Fast Fourier Transform II