# Dimensionality Estimation Diary

## Contents

## Papers

- Belkin, Mikhail, and Partha Niyogi. "Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering."
- Ulrike von Luxburg. "A Tutorial on Spectral Clustering."
- Fowlkes, Belongie, Chung and Malik. "Spectral Grouping Using the Nyström Method."

## Paper summaries

- A Tutorial on Spectral Clustering
- Spectral Grouping Using the Nyström Method
- Laplacian Eigenmaps and Spectral Techniques for Embedding and Clustering

## Implementation Steps

- Take a dataset - ISOMAP, turning teapot or synthetically generated

- For synthetically generated data:

- Choose random orientation of the circle in high dimension
- Produce points on circle by uniformly sampling from the random orientation in high dimension
- Add Gaussian noise

- Construct a graph from the data by connecting k-nearest neighbors of each node
- Solve Graph Laplacian of the graph and find the projection of the data onto the low dimensional embedding
- Apply KMeans++ to identify structure in low dimension
- Using the structure in low dimension, construct the graph and solve the Graph Laplacian as illustrated above

## Spectral Clustering

We performed Spectral clustering on a few toy datasets such as, concentric circles, half moons and swiss roll, to ensure the correctness of the spectral clustering implementation. Below are the plots of the data which have been clustered. The plots illustrate the clustering is done based on connectivity rather than proximity.

## Spectral Analysis

To further verify the working of Spectral analysis, we tested it on two datasets, namely Swiss Roll and Turning Teapot, to ensure the projections make sense.

### Swiss Roll

The above plot represents the projection of the second eigenvector on the swiss roll dataset. That is, the eigenvector corresponding to the eigenvalue <math>\lambda_1</math>. This shows that the values of the projection along one dimension is constant which implies that that dimension has been reduced.

The above plot represents the projection of the first eigenvector on the swiss roll dataset. That is, the eigenvector corresponding to the eigenvalue <math\lambda_0=0</math>. This shows that the projection of the data onto the first eigenvector produces an almost constant value all along.

### Turning Teapot

The above plot represents the second eigenvector of the spectral decomposition, which is represented in 2D Euclidean Space. The plot is annotated with the corresponding image of the point. As we can see, the points have been ordered by the direction of rotation although the initial data was shuffled. (Note: The missing annotations do not mean that the corresponding image is not present. The number of annotations shown has been reduced to ensure that the images are better visible)

## Dimensionality Estimation

### Turning Teapot

--Yoavfreund 20:25, 13 February 2017 (PST) It is hard to say that there is a downward slope in any of these graphs. It looks more like a number of downward steps. I think that in order to understand what is going on we need a much larger number of data-points. Instead of real datasets it might be better to generate a dataset that corresponds to a 2D circle embedded in a high dimensional spaces.

We used the centroid initialization technique of KMeans++ to estimate the dimensionality of the data. We use the two measures to compute a log-log plot which in turn is used to estimate the dimensionality:

- <math>k</math> - Number of cluster representative chosen till now
- <math>\epsilon</math> - Average of previous <math>n</math> distances of the distance of the chosen point to the closest cluster representative.

We add noise to distort the data to improve the effect of KMeans++ centroid initialisation. Below are the log-log plots of <math>k</math> vs <math>\epsilon</math> for different levels of noise: --Yoavfreund 08:59, 22 February 2017 (PST)

- I don't understand you method for creating a circle. It seems too complex. I would generate a circle in <math>R^2</math> using the first two coordinates of the 1000, and then multiply by a random orthonormal matrix to rotate it to a different direction.
- What is the scaling of log(epsilon) in the second plot? And why is it so different from the scaling in the first plot?
- Please specific the actual noise amount of "little noise" and "some noise"
- please generate the graph also for "no noise"

### High dimensional circle

We generated a circle in <math>\R^{1000}</math> using the following method:

- Generate points on a n-sphere using Marsaglia's method
- Sample two points <math>p_1</math> and <math>p_2</math> from the n-sphere. These will determine the plane the circle will lie on.
- Make them unit vectors by dividing by the norm
- Make <math>p_2</math> perpendicular to <math>p_1</math> as follows: <math>p_2 = p_2 - ( p_1 \cdot p_2 ) \odot p_1</math>
- Use the parametric equation of the circle to sample points: <math>cos(\theta) * p_1 + sin(\theta) * p2</math> where <math>\theta \in [0,2\pi]</math>

Below are the plots obtained from running KMeans++ as mentioned above:

## Running Kmeans++ with different levels of noise

Below are the plots obtained from running KMeans++ as mentioned above, for different levels of noise:

- <math> k \in [1, 50], k \in \mathbb{Z} </math>
- Noise is sampled from a Gaussian distribution: <math> \mathcal{N}(0,1) </math>

--Yoavfreund 16:22, 22 March 2017 (PDT) The step from no-noise to 1x noise is devastating, from a slope of (1--9)/5=2 , you get to a slope of (10.97-10.85)/6~0.02.

It is also clear that the noise dominates the circle because the distances are so much larger.

I would try much smaller levels of noise: 0.01,0.001,... and see what level of noise is small enough that the slope is around 2.