# Introduction to CuPy
<div class="dateauthor">
10 June 2021 | Jan H. Meinke
</div>
<img src="images/cupy.png" style="float:right">

In [None]:
%matplotlib inline
import matplotlib.pyplot as plt

## CuPy
### A NumPy-like interface to GPU programming

> The best way to program a GPU is to let other people do the work!

[CuPy][] provides a [NumPy][]-like interface to use and program GPUs.

[CuPy]: https://cupy.dev/
[NumPy]: https://numpy.org/

In [None]:
import cupy
N = 2048
A = cupy.random.random((N, N))
B = cupy.random.random((N, N))
C = A@B

This cell creates two random N-by-N arrays on the GPU and multiplies them using cuBLAS, Nvidia's optimized linear algebra library.

### Exercise
In [Think Vector][TV], you [calculated the Mandelbrot set][TV_Mandelbrot] using [NumPy][] and vectorization. Take either your solution or ours and convert it to [CuPy][]. Visualize the result.

Tip: If you get an error message when visualizing the results, take a look [below](#CuPy-Arrays).

[CuPy]: https://cupy.dev/
[NumPy]: https://numpy.org/
[TV]: ThinkVector.ipynb
[TV_Mandelbrot]: ThinkVector.ipynb#Programming-exercise-Mandelbrot

## GPU Libraries

* cuBLAS
* cuDNN
* cuRAND
* cuSolver
* cuSPARSE
* cuFFT
* NCCL 

CuPy is one of the fastest and easiest ways to use Nvidia's GPU libraries from Python.

### Exercise
a) Time the execution of `C = A @ B` for different matrix sizes, e.g., 256, 512, 1024, 2048, 4096. Calculate the performance in GFLOP/s using `gflops = 2e-9 * N ** 3 / t` and store the sizes and the times in an ndarray.

b) Do the same using NumPy. How do the numbers compare?
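
One thing to keep in mind when timing: GPU operations are launched asynchronously, so a timer may stop before the multiplication has actually finished. The following is a minimal sketch of a timing helper, not a reference solution. The function name `matmul_gflops` and its parameters `xp`, `synchronize`, and `repeats` are hypothetical and only used for illustration. It is run with NumPy here so that it works on any machine.

```python
import time

import numpy


def matmul_gflops(xp, n, synchronize=None, repeats=3):
    """Time C = A @ B for n-by-n matrices; return (best time in s, GFLOP/s).

    xp is the array module (numpy here). synchronize is an optional
    callable that blocks until the device has finished its work.
    """
    a = xp.random.random((n, n))
    b = xp.random.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        if synchronize is not None:
            synchronize()  # make sure earlier work is done before timing
        start = time.perf_counter()
        c = a @ b
        if synchronize is not None:
            synchronize()  # wait for the asynchronous GPU kernel to finish
        best = min(best, time.perf_counter() - start)
    # A matrix product needs about 2 * n**3 floating-point operations.
    return best, 2e-9 * n ** 3 / best


t, gflops = matmul_gflops(numpy, 256)
print(f"N=256: {t:.2e} s, {gflops:.1f} GFLOP/s")
```

On a machine with a GPU, the same helper should work with `xp=cupy` and `synchronize=cupy.cuda.Device().synchronize`. CuPy also provides `cupyx.profiler.benchmark`, which takes care of synchronization for you.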

## CuPy Arrays

CuPy arrays live on the GPU. To copy them to the CPU as NumPy arrays, you can use

In [None]:
A = cupy.random.random((N, N))
A1_np = A.get()

In [None]:
A2_np = cupy.asnumpy(A)

The first command only works for GPU arrays, but the second can also be used for a NumPy array. If `A` is a GPU array, the data will be copied from the GPU to the CPU. If `A` is a NumPy array, no copy will be made and `A2_np` becomes a reference to `A`.
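
The difference between the two cases can be checked with a small sketch like the one below. The `try`/`except` guard and the name `gpu_available` are assumptions added here so that the cell also runs on a machine without CuPy or without a GPU; they are not part of the CuPy API.

```python
import numpy

try:
    import cupy
    gpu_available = cupy.cuda.runtime.getDeviceCount() > 0
except Exception:  # CuPy is not installed or no usable GPU was found
    cupy = None
    gpu_available = False

a_np = numpy.arange(4.0)

if gpu_available:
    # NumPy input: asnumpy returns an array that uses the same memory.
    shares_memory = numpy.shares_memory(cupy.asnumpy(a_np), a_np)

    # GPU input: asnumpy and .get() both copy the data to the CPU.
    a_gpu = cupy.asarray(a_np)
    host_copy = cupy.asnumpy(a_gpu)
    print(shares_memory, type(host_copy).__name__)
else:
    print("No GPU available, skipping the CuPy part.")
```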

In [None]:
import numpy
x = numpy.linspace(0, 1, 10)
x_gpu = cupy.asarray(x)  # Copy x to the GPU

To transfer data to the GPU use `cupy.asarray`. This doesn't just work with NumPy arrays, but also, for example, with lists.

### Exercise
Create a 2D array of random numbers on the GPU. Transfer it to the CPU. Subtract 0.5 from all elements. Copy the result back to the GPU. Calculate the average value of all elements on the GPU and write the result to the screen.

## Picking a Device

CuPy always works on the *current device*. On multi-GPU nodes this can be changed using

In [None]:
cupy.cuda.Device(1).use()
A_on_gpu1 = cupy.random.random((N, N))

If you need to switch devices regularly, you can use a `with` statement:

In [None]:
with cupy.cuda.Device(2):
    A_on_gpu2 = cupy.random.random((N, N))

Note that you can only access `A_on_gpu2` while device 2 is active. Otherwise, you'll get an error message.
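
If you are not sure which device holds an array, you can ask the array itself. The following is a minimal sketch; the guard and the names `n_devices` and `last` are assumptions made here so that the cell runs on nodes with any number of GPUs, including none.

```python
try:
    import cupy
    n_devices = cupy.cuda.runtime.getDeviceCount()
except Exception:  # CuPy is not installed or no usable GPU was found
    cupy = None
    n_devices = 0

if n_devices > 0:
    last = n_devices - 1  # use the highest device ID that exists
    with cupy.cuda.Device(last):
        a = cupy.zeros(4)
    # Every CuPy array records the device that holds its data.
    print("array lives on device", a.device.id, "of", n_devices)
    # After the with block, the previous device is current again.
    print("current device:", cupy.cuda.Device().id)
else:
    print("No GPU available.")
```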