

For data clustering tasks in which the clusters are expected to form complicated geometric patterns, it is often advantageous to consider nonlinear dimension reduction techniques, such as spectral clustering. In spectral clustering, information on the similarity between data points is used to construct a graph, together with a discrete diffusion operator on this graph known as the graph Laplacian. Graph Laplacians encode geometric information contained in data via the eigenfunctions associated with their small eigenvalues. These spectral properties provide powerful tools for data clustering and classification tasks. We give an introduction to graph-based spectral clustering and classification. When a large number of data points is available, one may instead consider continuum limits of the graph Laplacian, both to gain insight and, potentially, as the basis for numerical methods. We summarize recent insights into the properties of these algorithms by investigating their corresponding formulations in the large data limit. These results open doors to the design of new algorithms, both in specific application domains such as image segmentation and in large data regimes more generally.
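The pipeline described above, building a similarity graph, forming the graph Laplacian, and reading cluster structure off the eigenvectors with small eigenvalues, can be sketched in a few lines of NumPy. This is a minimal illustration, not the specific construction analyzed here: the Gaussian similarity kernel, the bandwidth `sigma`, the unnormalized Laplacian, and the synthetic two-blob data are all assumptions made for the example.

```python
import numpy as np

# Synthetic data: two well-separated 2D blobs (an assumption for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.2, (20, 2)),
               rng.normal(3.0, 0.2, (20, 2))])

# Similarity graph: Gaussian kernel weights between all pairs of points.
sigma = 1.0
d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
W = np.exp(-d2 / (2.0 * sigma ** 2))
np.fill_diagonal(W, 0.0)  # no self-loops

# Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix.
D = np.diag(W.sum(axis=1))
L = D - W

# The eigenvectors associated with the smallest eigenvalues encode the
# cluster geometry; for two clusters, the sign pattern of the second
# eigenvector (the Fiedler vector) yields the partition.
eigvals, eigvecs = np.linalg.eigh(L)
labels = (eigvecs[:, 1] > 0).astype(int)
print(labels)
```

For more than two clusters, one would instead embed each point via its coordinates in the first k eigenvectors and run k-means in that spectral embedding, which is the standard multi-way extension of this two-cluster sign heuristic.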
