What happens to Superpixels?

When you read papers on superpixels, you see superpixels neatly hugging the edges of objects, and you say: "Yes! Why not? It is natural to use superpixels."

It is only when you apply them to your own images that you run into all sorts of problems. With one method you get superpixels of wildly varying shapes; with another, where the superpixels look less scary, the superpixel boundaries sometimes fail to follow the image boundaries.

What can you change? In most methods the only user-controlled parameter is the number of superpixels. Some also let you set how much weight to give to the edges, but there is still no stable formulation.

The problem this creates is how to compare superpixels across images. Shape and size do not appear to be very meaningful; could the edge directions along the superpixel boundary be one option? I could not find any proper method for comparing superpixels across images. Any such method should take into account that, because of some quirk of the computation, a superpixel that should have been one may have been broken into two, or one that was square may have become slightly elongated on one side.

Then there is the problem of defining the neighborhood. Should a superpixel that shares even a one-pixel boundary count as a neighbor? How should the neighborhood distance be computed? Should it be the distance between the centers? The percentage of the boundary shared between the superpixels?
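One way to make these neighborhood questions concrete is to compute all of the candidate measures from a label map and compare them side by side. The following is only a minimal sketch in plain Python. It assumes the superpixels are given as a 2D grid of integer ids and that two superpixels touch when they share a pixel edge (4-connectivity). The function names and the `min_shared` threshold are hypothetical and do not come from any particular superpixel library.

```python
from collections import defaultdict
from math import hypot


def superpixel_adjacency(labels):
    """Collect shared-edge counts, internal perimeters and centroids.

    `labels` is a list of rows; each entry is the superpixel id of a pixel.
    Assumption: 4-connectivity, i.e. only pixel edges (not corners) count.
    """
    shared = defaultdict(int)     # (a, b) with a < b -> shared pixel edges
    perimeter = defaultdict(int)  # id -> pixel edges facing another superpixel
    sums = defaultdict(lambda: [0, 0, 0])  # id -> [row sum, col sum, count]
    rows, cols = len(labels), len(labels[0])
    for r in range(rows):
        for c in range(cols):
            a = labels[r][c]
            acc = sums[a]
            acc[0] += r
            acc[1] += c
            acc[2] += 1
            # Look only right and down so every pixel edge is visited once.
            for rr, cc in ((r, c + 1), (r + 1, c)):
                if rr < rows and cc < cols:
                    b = labels[rr][cc]
                    if a != b:
                        shared[(min(a, b), max(a, b))] += 1
                        perimeter[a] += 1
                        perimeter[b] += 1
    centroids = {k: (s[0] / s[2], s[1] / s[2]) for k, s in sums.items()}
    return shared, perimeter, centroids


def neighbour_measures(labels, min_shared=1):
    """Return the candidate neighborhood measures for every touching pair.

    `min_shared` (hypothetical knob) drops pairs whose common boundary is
    shorter than that many pixel edges, e.g. one-pixel contacts.
    """
    shared, perimeter, centroids = superpixel_adjacency(labels)
    measures = {}
    for (a, b), n in shared.items():
        if n < min_shared:
            continue
        (ra, ca), (rb, cb) = centroids[a], centroids[b]
        measures[(a, b)] = {
            "shared_edges": n,
            "centre_distance": hypot(ra - rb, ca - cb),
            # Fraction of each superpixel's internal boundary (the image
            # border is not counted) that is shared with the other one.
            "boundary_fraction_a": n / perimeter[a],
            "boundary_fraction_b": n / perimeter[b],
        }
    return measures
```

Raising `min_shared` above 1 answers the "one-pixel boundary" question by simply refusing such contacts. Note that the boundary fractions are asymmetric: a small superpixel can be mostly surrounded by a large one without the reverse being true.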

More and more algorithms base their calculations on superpixels, and they deal with these problems in a rather ad hoc way. They compute a large number of features per superpixel, hoping that somehow one of the features will cancel out the discrepancies discussed above. We need to develop more principled solutions with clearly thought-out objectives.


Face Recognition by Yi Ma and Features of Andrew Ng’s recent work

I was thinking about Andrew Ng’s “Building High-level Features Using Large Scale Unsupervised Learning” (ICML 2012) and Yi Ma’s Robust Face Recognition (the code is on my other blog).

What could be the benefits of taking the features produced by Andrew Ng’s network and explicitly modeling them with sparse dictionary learning? One definitely cannot use the dictionary the way Yi Ma does, since that is not feasible for a huge amount of data and people. So will the features from Andrew Ng’s work provide robustness when used for dictionary learning and then coding?

Or could group sparse coding and block dictionary learning be used to better model the network itself, thus reducing the complexity and time required to train the network?

Just a thought.
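To make the "dictionary, then coding" step concrete, here is a rough sketch of sparse-representation classification on top of arbitrary feature vectors: the training features act as dictionary atoms, a test feature is sparsely coded over them, and the class whose atoms reconstruct it best wins. Several things here are assumptions for illustration only. Plain matching pursuit is swapped in for the l1 minimization used in the sparse-representation face recognition work, the feature vectors are hypothetical stand-ins for whatever a learned network would produce, and all function names are made up.

```python
from math import sqrt


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def normalise(v):
    # Assumes v is not the zero vector.
    n = sqrt(dot(v, v))
    return [x / n for x in v]


def matching_pursuit(atoms, y, n_iter=10):
    """Greedy sparse code of y over unit-norm atoms (stand-in for l1 solver)."""
    residual = list(y)
    coef = [0.0] * len(atoms)
    for _ in range(n_iter):
        # Pick the atom most correlated with what is still unexplained.
        corr = [dot(a, residual) for a in atoms]
        k = max(range(len(atoms)), key=lambda i: abs(corr[i]))
        if abs(corr[k]) < 1e-12:
            break  # nothing left to explain
        coef[k] += corr[k]
        residual = [r - corr[k] * a for r, a in zip(residual, atoms[k])]
    return coef


def src_classify(train, y, n_iter=10):
    """Classify y by the smallest class-wise reconstruction error.

    `train` is a list of (label, feature_vector) pairs; every training
    vector becomes one dictionary atom.
    """
    labels = [lab for lab, _ in train]
    atoms = [normalise(v) for _, v in train]
    coef = matching_pursuit(atoms, y, n_iter)
    best_label, best_err = None, float("inf")
    for lab in sorted(set(labels)):
        # Reconstruct y using only the coefficients of this class.
        recon = [0.0] * len(y)
        for c, a, atom_label in zip(coef, atoms, labels):
            if atom_label == lab and c != 0.0:
                recon = [r + c * x for r, x in zip(recon, a)]
        err = sqrt(sum((p - q) ** 2 for p, q in zip(y, recon)))
        if err < best_err:
            best_label, best_err = lab, err
    return best_label
```

The sketch also shows why this does not scale as is: the dictionary grows with every training sample, which is exactly the feasibility problem raised above and the reason a learned, compact dictionary would be needed.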

NIPS 2012: Multimodal Learning with Deep Boltzmann Machines

This is quite an interesting paper from Ruslan Salakhutdinov (University of Toronto) (project page: http://www.cs.toronto.edu/~nitish/multimodal/, video lecture: http://videolectures.net/nips2012_salakhutdinov_multimodal_learning/). They used a Gaussian RBM when building the DBM.

It is interesting in terms of both the application and how the DBM is used.

Figure: multimodal DBM

In this way, given one set of features, they can use the model to find the others. I recommend watching the video lecture.

Vision and Deep Learning in 2012

This entry is an effort to collect important deep learning papers published in 2012, especially those related to computer vision.

There is a general resource, http://deeplearning.net/, but no good resource that collects deep learning papers with respect to computer vision problems.

General Resources 

Interesting Papers 

Bird Dataset

If you are a photographer, you can contribute to the Birds ID research.



Collecting Cosegmentation implementations

Hi everyone, I am trying to collect links to implementations of cosegmentation algorithms (in any language: C/C++, Java, MATLAB, Python, etc.). Unfortunately, very few authors make their implementations public, so it becomes difficult for new work in the area to be compared with what has been done previously.

CVPR 2012

What we can do with Curiosity’s 64×64 and 256×256 images from Mars

The images we are getting from the Curiosity rover are just 64×64 or 256×256.

In a world of high-resolution images and multi-spectral cameras, we have a device that is sending us only 64×64 or 256×256 images. What can we do with them? What kinds of computer vision algorithms could be run, and what information could be extracted?

It will be interesting to look in this direction and see what could be done here.

Figure: shadow in Gale Crater on Mars