Vision and Deep Learning in 2012

This entry is an effort to collect important deep learning papers published in 2012, especially those related to computer vision.

There are general resources, but no good one that collects deep learning papers on computer vision problems.

General Resources 

Interesting Papers 

Meet Edward H Adelson

If you have seen the following image, you have met Edward H. Adelson.

He is a faculty member at the M.I.T. Dept. of Brain and Cognitive Sciences. I was recently going over part of his paper “On Seeing Stuff: The Perception of Materials by Humans and Machines”, which is quite interesting. It talks about why recognizing materials is important and points out that machines are not able to do it. One example in the paper is ice cream, which even a child can recognize from its texture, whereas a machine cannot. The paper was published in 2001, and I feel machines still cannot recognize ice cream.

There are many illusions on his webpage; have a look.

Can Computer Vision do that?

I was listening to a Bach recording from the BBC Proms Bach Day, the ‘Passacaglia and Fugue’, and the comments on the YouTube video mentioned the flute player. I wanted to know where the video showed the flute player, so I had to step through the whole video (in a somewhat binary-search manner) until I found the clip.

While searching, I began wondering whether recent technology can solve the problem of finding the part of a video where a certain instrument is being “played”. Note the PLAYED part.

The naive way is to find the frames where, e.g., a flute is shown, then use sound analysis to check whether the flute can be heard in those frames. It appears to be a good solution; however, picking out the flute's sound is not easy when many other instruments are also playing. Secondly, the flute player being shown is not necessarily playing at that moment.
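The naive approach above can be sketched roughly as follows. This is only an illustration under assumptions: the per-frame flags `visible` and `audible` are taken as given, produced by some hypothetical flute detector and audio classifier that are not part of this post, and the function name `candidate_segments` and the `min_len_s` threshold are made up for the example.

```python
from typing import List, Sequence, Tuple


def candidate_segments(
    visible: Sequence[bool],
    audible: Sequence[bool],
    fps: float,
    min_len_s: float = 1.0,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds where the instrument is
    both shown and heard for at least min_len_s seconds.

    visible[i] and audible[i] are assumed outputs of a hypothetical
    visual detector and audio classifier for frame i.
    """
    # A frame counts only if the instrument is seen AND heard.
    both = [bool(v) and bool(a) for v, a in zip(visible, audible)]

    segments = []
    start = None  # index of the first frame of the current run
    for i, flag in enumerate(both):
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            # The run ended at frame i; keep it if it is long enough.
            if (i - start) / fps >= min_len_s:
                segments.append((start / fps, i / fps))
            start = None

    # Close a run that lasts until the final frame.
    if start is not None and (len(both) - start) / fps >= min_len_s:
        segments.append((start / fps, len(both) / fps))
    return segments
```

Note that this inherits both weaknesses mentioned above: it trusts the audio flag even when many instruments are playing, and it cannot tell whether the player on screen is the one producing the sound.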

The question is: can we judge (both with and without the sound) whether in a given clip an instrument is being “PLAYED” or is just being shown?

Try this by watching this video while you enjoy the amazing Bach (from 4:37).

SimpleCV: Another Open Source Computer Vision Library

SimpleCV is a Python-based library. I have not tried it, but it looks interesting. The question is how it differs from OpenCV, or whether it is just the same thing in a different language.

Car Datasets

I am looking for car detection datasets, especially rear-view and front-view ones.
I have found the following few:

    1. They also have a side-view car dataset and a multiview car dataset


The ones in 2 and 3 are better. I am trying to get more datasets; if you have some, please send me a link.