Alex Egg

# Semantic Image Search

This paper will explore building a sophisticated image search engine that uses a CNN pretrained on ImageNet as a feature extractor.

## Theory

It has been shown by Egg et al. [1] that the last fully connected layers of deep convolutional architectures trained on ImageNet generalize images by learning their semantic structure. The activations of the last fully connected layer can therefore serve as semantic features for transfer learning, an approach explored early on in the DeCAF paper [2]. When image features from Caffe are plotted in two dimensions [3, 4], clear clusters of images can be seen. This is not merely clustering by similarity of pixel intensities, which a naive system would achieve, but clustering as if the system understood the higher-level semantics of the object. For example, there is not just a cluster of elephants in the same orientation, but a cluster of elephants in all orientations, and even a pachyderm cluster.

The AlexNet paper [5] originally explored using these image features with a distance measure (the Euclidean distance between 4096-dimensional last-hidden-layer activations) to conduct semantic image search.
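Written out, the ranking criterion for this kind of search can be sketched as follows. The notation is ours, not the paper's: $\phi$ maps an image to its $D$-dimensional feature vector, $q$ is the query image and $x_i$ is the $i$-th database image.

$$
d(q, x_i) = \lVert \phi(q) - \phi(x_i) \rVert_2 = \sqrt{\sum_{j=1}^{D} \bigl(\phi_j(q) - \phi_j(x_i)\bigr)^2}
$$

Database images are returned in increasing order of $d(q, x_i)$, so the most similar images come first; for the VGG features used below, $D = 4096$.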

## Implementation

### Database

Our database will store the feature vector for each image in a native PostgreSQL format, so that the nearest-neighbor search can be performed by the database system itself. This allows search at a scale that would be prohibitive if all the feature vectors had to be held in memory. See the PostgreSQL array type documentation: https://www.postgresql.org/docs/9.3/static/arrays.html
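A minimal sketch of how a feature vector could be written to such an array column. The table and column names (`images`, `path`, `features`) and the helper `to_pg_array_literal` are hypothetical; `real[]` (4-byte floats) is an assumed choice, which puts a 4096-dimensional vector at roughly 16 KB per image.

```python
import math

# Hypothetical schema: one row per image, features stored as a float4 array.
CREATE_TABLE_SQL = """
CREATE TABLE images (
    id       serial PRIMARY KEY,
    path     text NOT NULL,
    features real[] NOT NULL  -- e.g. 4096 fc7 activations
);
"""


def to_pg_array_literal(vec) -> str:
    """Render a sequence of numbers as a PostgreSQL array literal, e.g. '{0.5,1.0}'."""
    parts = []
    for x in vec:
        x = float(x)
        # NaN/inf would silently poison every distance computed against this row.
        if not math.isfinite(x):
            raise ValueError("feature vectors must contain only finite values")
        parts.append(repr(x))
    return "{" + ",".join(parts) + "}"
```

With a driver such as psycopg2, a plain Python list passed as a query parameter is already adapted to a PostgreSQL array, so a helper like this is mainly useful when bulk-loading rows with `COPY` in text format.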

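Taking this a step further, PostgreSQL's `cube` extension provides `cube_distance`, which calculates the distance between two vectors inside a query:

> `cube_distance(cube, cube)` returns `double precision`: the distance between two cubes. If both cubes are points, this is the normal (Euclidean) distance function.

Note that the extension limits cubes to 100 dimensions by default (`CUBE_MAX_DIM` in `cubedata.h`), so the 4096-dimensional features described below would have to be reduced, or the extension rebuilt, before they can be stored as cubes.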
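One way this could look in practice is sketched below, assuming the hypothetical `images(id, path, features real[])` table, features already reduced to at most 100 dimensions, and psycopg2-style named parameters. `cube(float8[])` builds a zero-volume cube (a point) from an array, so `cube_distance` between two such points is their Euclidean distance. The helper names `knn_params` and `point_distance` are hypothetical.

```python
import math

# Default dimension limit of the cube extension (CUBE_MAX_DIM in cubedata.h).
MAX_CUBE_DIM = 100

SETUP_SQL = "CREATE EXTENSION IF NOT EXISTS cube;"

# Exhaustive nearest-neighbour query over the hypothetical `images` table.
# %(q)s is a list of floats (adapted to ARRAY[...] by psycopg2), %(k)s an int.
KNN_SQL = """
SELECT id, path,
       cube_distance(cube(features::float8[]), cube(%(q)s::float8[])) AS dist
FROM images
ORDER BY dist
LIMIT %(k)s;
"""


def knn_params(query, k: int) -> dict:
    """Validate a query vector against the cube limit and build query parameters."""
    if len(query) > MAX_CUBE_DIM:
        raise ValueError(f"cube supports at most {MAX_CUBE_DIM} dimensions")
    return {"q": [float(x) for x in query], "k": int(k)}


def point_distance(a, b) -> float:
    """Euclidean distance: what cube_distance returns when both cubes are points."""
    return math.dist(a, b)
```

Usage with an open psycopg2 cursor would be roughly `cur.execute(KNN_SQL, knn_params(query_vec, 10))` followed by `cur.fetchall()`; `point_distance` is only a local reference for checking the distances the database returns.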

### Feature Extractor

For the feature extractor we will use a VGG network pre-trained on ImageNet. We will take the output of its second fully connected layer (fc7), which is a 4096-dimensional vector. Every image in the database will be represented by the feature vector obtained from a single forward pass through the network.

Note: It may also be possible to reduce the dimensionality of the semantic representation with PCA. The fact that semantic clusters remain visible in a 2D projection suggests, though does not prove, that much of the useful structure survives in far fewer dimensions. A lower-dimensional representation would reduce computation and storage, possibly at some cost in retrieval accuracy.
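To make the PCA idea concrete, here is a minimal NumPy sketch that fits principal components through an SVD of the mean-centred feature matrix and reports the fraction of variance retained. The helper names `fit_pca` and `project` are hypothetical; scikit-learn's `PCA` class would be a drop-in alternative.

```python
import numpy as np


def fit_pca(features: np.ndarray, k: int):
    """Fit PCA on an (n, d) feature matrix; return (mean, components, explained)."""
    mean = features.mean(axis=0)
    centered = features - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    if not 1 <= k <= vt.shape[0]:
        raise ValueError("k must be between 1 and min(n, d)")
    components = vt[:k]  # (k, d), orthonormal rows
    # Fraction of total variance captured by the first k components.
    explained = float((s[:k] ** 2).sum() / (s ** 2).sum())
    return mean, components, explained


def project(features: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project (n, d) features onto the k principal directions -> (n, k)."""
    # Choosing k <= 100 would also fit the cube extension's default limit.
    return (features - mean) @ components.T
```

For a large database one would likely fit on a random sample of feature vectors and then project every stored vector and every query with the same `mean` and `components`, since queries and database entries must live in the same reduced space.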

For each query, the input image is passed forward through the network to obtain its feature vector. The distance between this vector and every feature vector in the database is computed, and the distances are sorted and returned as a ranked list. Because the search is exhaustive, computing the distances costs $O(nd)$ for $n$ images with $d$-dimensional features, i.e. $O(n)$ for a fixed $d$; sorting all $n$ distances adds $O(n \log n)$, although the top $k$ results alone can be selected in $O(n)$.
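The exhaustive query can be sketched in NumPy as a brute-force scan, assuming the database vectors have already been loaded as an (n, d) array (in the PostgreSQL design this scan would instead run inside the database). `exhaustive_search` is a hypothetical name. Rather than sorting all n distances, it uses `np.argpartition` to select the k smallest and sorts only those.

```python
import numpy as np


def exhaustive_search(query: np.ndarray, db: np.ndarray, top_k: int = 10):
    """Return (indices, distances) of the top_k rows of db nearest to query.

    query: (d,) feature vector; db: (n, d) matrix of stored feature vectors.
    """
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    # One Euclidean distance per database image: O(n * d).
    dists = np.linalg.norm(db - query, axis=1)
    k = min(top_k, len(dists))
    # Partial selection of the k smallest distances: O(n).
    idx = np.argpartition(dists, k - 1)[:k]
    # Sort only those k candidates, nearest first: O(k log k).
    idx = idx[np.argsort(dists[idx])]
    return idx, dists[idx]
```

For example, `exhaustive_search(query_vec, db_matrix, top_k=10)` returns the indices of the ten closest database images together with their distances, nearest first.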