COSC6365 Introduction to High-Performance Computing, Spring 2014

Cr. 3. (3-0). Prerequisites: COSC 4310, COSC 4330, and COSC 6303 or equivalent.

Class Schedule: 4 PM – 5:30 PM, Tuesdays and Thursdays in AH2.


Instructor: Prof. Lennart Johnsson, PGH 592, x33371
Office Hours: By appointment

Teaching Assistant: Olga Datskova

What will be covered?

This course provides an introduction to High-Performance Computing (HPC), an essential tool in many fields of science and engineering and, increasingly, in Internet computing, medicine, the humanities, and the entertainment industry. HPC has traditionally driven computer architecture and computer system design, and during the last decade it has driven industry efforts toward significantly improved energy efficiency in computation, for economic and environmental reasons. This drive has affected computer architecture as well as how computer systems are managed, how they behave, and how they are programmed.

HPC today implies large-scale parallel computation and in many cases employs high-performance networking technologies. HPC places great emphasis on achieving high efficiency in resource utilization for applications, which often requires a good understanding of the application; the processor, node, and platform architecture; and compilers, operating systems, and programming tools. Thus, HPC requires good knowledge of a broad range of topics. Increasingly, the drive for performance also includes a drive for energy efficiency and, in particular, for what Google researchers have termed energy-proportional computing.

This course focuses on the architecture of high-performance, energy efficient, scalable computing environments and basic algorithms for scientific and engineering problem solving in such environments. The course is suitable for scientists and engineers with computationally demanding problems, and computer scientists and applied mathematicians with an interest in techniques for efficient use of platforms suitable for large scale computations.

The course gives an overview of high-performance computer architectures and parallel programming paradigms, with an emphasis on MPI, OpenMP, and GPU programming. The MapReduce programming model will also be described. Parallel algorithms for matrix operations, the solution of linear systems of equations, sorting, the Fast Fourier Transform, and other common operations will be taught.

Reducing the need for data motion, through proper data allocation and management of data movement, is critical for both performance and energy efficiency. User-level techniques for managing memory hierarchies will be discussed, and tools for performance analysis will be covered briefly.
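One such user-level technique is loop blocking (tiling), which restructures a loop nest so that the working set fits in cache. A minimal sketch in C, using matrix transpose as the example (the matrix size and block factor are illustrative, not prescribed by the course):

```c
#define N 256   /* matrix dimension (illustrative) */
#define B 32    /* block (tile) size, chosen so a BxB tile fits in cache */

/* Naive transpose: writes stride through b column-wise, poor locality. */
void transpose_naive(const double *a, double *b) {
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            b[j*N + i] = a[i*N + j];
}

/* Blocked transpose: processes BxB tiles so each tile of a and b
   stays resident in cache while it is being touched. */
void transpose_blocked(const double *a, double *b) {
    for (int ii = 0; ii < N; ii += B)
        for (int jj = 0; jj < N; jj += B)
            for (int i = ii; i < ii + B; i++)
                for (int j = jj; j < jj + B; j++)
                    b[j*N + i] = a[i*N + j];
}
```

Both versions compute the same result; the blocked version reduces data motion between memory levels, which is exactly the kind of trade-off the memory-hierarchy lectures will examine.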

Scalable platforms will be used for homework and projects.

Lecture Notes and Handouts

Assignments: 50%
Midterm exam: 20%
Final project: 30%

Tentatively there will be six homework assignments. The first two are aimed at familiarizing you with the computer systems and compilers to be used in the course; they do not depend on material taught in the course, but on material you should already know. The remaining four are programming assignments on course material, designed to familiarize you with the programming models introduced in class that are common in large-scale applications.

The class will meet during the final exam time for project presentations. A written report is also required for the final project. There will be no final exam. The final project, as indicated by the grading, represents a substantial piece of the course, and should be started about four weeks before the end of the semester. Final projects require the approval of the instructor, and may be based on your own suggestion, or chosen among a list of suggested projects. Final projects, if appropriate, may be performed by teams of two students, though individual projects are encouraged.

No make-up exam will be offered unless there is a verifiable excuse, such as a medical condition.

Assignment Late Policy: As a general rule, late assignments will be penalized 15% of the actual score per day.

Assignments are to be performed individually. Copying of past or current assignments of other students is not allowed.

Academic Honesty: Any student found guilty of academic dishonesty will be reported to the department, and the established procedure will be followed.


Lecture 1 – Overview: Applications
Lecture 2 – Technology I
Lecture 3 – Clusters
Lecture 4 – Memory I
Lecture 5 – Memory II
Lecture 6 – Cache
Lecture 7 – Performance Tools
Lecture 8 – Parallel Computing Concepts
Lecture 9 – OpenMP
Lecture 10 – Vectorization I
Lecture 11 – Matrix-Vector Multiplication
Lecture 12 – Matrix-Matrix Multiplication
Lecture 13 – Cache Oblivious Algorithms
Lecture 14 – OpenCL
Lecture 15 – Review
Lecture 16 – Interconnection Networks I
Lecture 17 – Interconnection Networks II
Lecture 18 – Sorting I
Lecture 19 – Sorting II
Lecture 20 – FFT I
Lecture 21 – FFT II
Lecture 22 – LU I
Lecture 23 – LU II
Lecture 24 – MPI
Lecture 25 – Partitioning I
Lecture 26 – Partitioning II
Lecture 27 – Partitioning III
Lecture 28 – N-body
Lecture 29 – Sparse Matrices
