Archive for the ‘paper’ Category

How does Google Photos find trees?

A friend posted on Facebook that if he types “alligator” into his Google Photos search page, it finds all the photos of alligators in his collection, even the ones that are not tagged. I was intrigued …

If you want to learn how it works, go to the last section of this post, where I have listed links for both general and technical reading. Let me just dazzle you with some results here.

So I searched for trees in my photos,

IMG_20141107_111758710_HDR IMG_20150103_122857248_HDR IMG_20141106_120345930_HDR

It was even able to find a few things that merely look like trees, for example the following from a Museum of Fine Arts exhibition:

DSCN2797

That made sense to me: one can look at the texture, shape and colors, and infer that an image has a tree in it. They might have added some context too, such as whether there is a horizon. (I already knew that they are using a CNN; this was me trying to make sense of the results as if I did not know that.)

What really surprised me were the results below. The one on the left has no color, none of the tree-like texture, and only a very thin tree-like structure. On the right is an image in which the trees have been made blue (it’s an art installation where trees were colored blue to represent veins carrying oxygen), so it cannot use color cues.


Gainesville Blue Trees art installation

That made me think they might be using a little more than just image cues: they might be using similarity among images to check whether any similar image has been labeled, tagged by someone, or given a caption or description. In this day and age that is a fair and smart thing to do.
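The similarity idea above can be sketched as a nearest-neighbour label transfer. This is only my guess at the general idea, not Google's actual method: the feature vectors, the tags and the function names (`cosine_similarity`, `transfer_labels`) are all made up for illustration, and in practice the features would come from something like a CNN.

```python
import math
from collections import Counter


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def transfer_labels(query, labeled, k=3):
    """Collect tags from the k labeled images most similar to `query`,
    ordered by how many of those neighbours carry each tag."""
    ranked = sorted(
        labeled,
        key=lambda item: cosine_similarity(query, item[0]),
        reverse=True,
    )
    votes = Counter()
    for _features, tags in ranked[:k]:
        votes.update(tags)
    return [tag for tag, _count in votes.most_common()]


# Hypothetical toy data: (feature vector, tags) pairs.
labeled = [
    ([0.9, 0.1, 0.0], ["tree"]),
    ([0.8, 0.2, 0.1], ["tree", "park"]),
    ([0.0, 0.1, 0.9], ["car"]),
]
query = [0.85, 0.15, 0.05]  # an untagged photo, e.g. the blue trees
print(transfer_labels(query, labeled, k=2))  # -> ['tree', 'park']
```

With this kind of scheme an image of a blue tree could inherit the tag “tree” from similar tagged photos even when color cues fail.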

I remembered a very old paper, I think from the UCF Computer Vision Lab (by one of Mubarak Shah’s students, I think), where they were trying to distinguish between grass, shrubs, trees, … and it was not an easy task. So I experimented with it. The next thing I searched for was “grass”, and yes, the results were quite different. Although they were not as accurate, they did make sense.

IMG_20141106_120345930_HDR DSCN2711 IMAG0046 (1)R2 IMG_20141108_174249930_HDR

Terms like “food” give the worst results. However, more well-defined objects give much better results: for example, it was able to find a “cycle” that was not even the main part of the image, and the results for car and airplane were similarly good. For searches on the term “chair” it did an interesting thing: it found people in a sitting pose. Most of these were people sitting on a sofa or on the ground, but it had made an association between human pose and the concept of “chair”.

How does the magic work? (aka how can Google do these amazing things?)

If you want Google’s view of how the ‘magic’ works, have a look at this blog. If you are looking to learn something out of it, have a look at Freebase, CNN, Dr. Hinton…. and the fact that their final layer is just a linear classifier. If you are interested in getting some technical know-how, have a look at Dr. Hinton’s paper (with Krizhevsky and Sutskever) “ImageNet Classification with Deep Convolutional Neural Networks”.

However, we know things have moved at quite a fast pace: about a year ago Google announced that they can now provide a “natural description of images”. Technical stuff to look for: RNN (recurrent neural network), how they are using it for machine translation, have a look at their paper, etc….
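To make the “final layer is just a linear classifier” remark concrete, here is a minimal sketch of a linear layer followed by a softmax, applied to a feature vector that earlier layers would have produced. The weights, biases, features and class names are hypothetical toy values, not anything taken from the paper.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities (shifted for stability)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear_classify(features, weights, biases):
    """One score per class: dot(weights[c], features) + biases[c]."""
    scores = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(scores)


# Hypothetical example: 2 features, 2 classes.
classes = ["tree", "grass"]
weights = [[1.0, 0.0], [0.0, 1.0]]
biases = [0.0, 0.0]
features = [1.0, 2.0]  # stand-in for the output of the earlier layers

probs = linear_classify(features, weights, biases)
best = classes[probs.index(max(probs))]
print(best, probs)
```

All the hard work is in learning features for which such a simple classifier is good enough.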

NIPS 2012: Multimodal Learning with Deep Boltzmann Machines

This is quite an interesting paper from Ruslan Salakhutdinov’s group (University of Toronto) (project page, video lecture) [they used a Gaussian RBM while building the DBM].

It is interesting in terms of the application and how the DBM is used.

multi modal DBM

In this way, given one set of features, they can use the model to find the others. I recommend watching the video lecture.

Vision and Deep Learning in 2012

This entry is an effort to collect important deep learning papers published in 2012, especially those related to computer vision.

There are general resources, but no good resource that collects deep learning papers with respect to computer vision problems.

General Resources 

Interesting Papers 

Collecting Cosegmentation implementations

Hi everyone, I am trying to collect links to implementations of cosegmentation algorithms (in any language: C/C++, Java, MATLAB, Python, etc.). Unfortunately very few authors make their implementations public, so it becomes difficult for new work in the area to be compared with what has been done previously.

CVPR 2012

Co-Segmentation in CVPR 2012

Here is the list of Co-Segmentation papers in CVPR 2012 {if you find some other interesting papers regarding Co-Segmentation, please send a message or post a comment, thanks}

  • “Multi-Class Cosegmentation” Armand Joulin, Francis Bach, Jean Ponce
  • “On Multiple Foreground Cosegmentation” Gunhee Kim, Eric P. Xing
  • “Higher Level Segmentation: Detecting and Grouping of Invariant Repetitive Patterns” Yunliang Cai, George Baciu: not directly a co-segmentation paper, but it could be seen that way.
  • “Random Walks based Multi-Image Segmentation: Quasiconvexity Results and GPU-based Solutions” Maxwell D. Collins, Jia Xu, Leo Grady, Vikas Singh
  • “A Hierarchical Image Clustering Cosegmentation Framework” Edward Kim, Hongsheng Li, Xiaolei Huang
  • “Unsupervised Co-segmentation Through Region Matching” Jose C. Rubio, Joan Serrat, Antonio López, Nikos Paragios

Some interesting papers to look into

  • “Learning Image-Specific Parameters for Interactive Segmentation” Zhanghui Kuang, Dirk Schnieders, Hao Zhou, Kwan-Yee K. Wong, Yizhou Yu, Bo Peng
  • “Graph Cuts Optimization for Multi-Limb Human Segmentation in Depth Maps” Antonio Hernández-Vela, Nadezhda Zlateva, Alexander Marinov, Miguel Reyes, Petia Radeva, Dimo Dimov, Sergio Escalera
    • {just want to read it to see how the depth data is being used}
  • “Active Learning for Semantic Segmentation with Expected Change”  Alexander Vezhnevets, Joachim M. Buhmann, Vittorio Ferrari
    • Basic objective is to learn about “active learning” and how it is used
  • “Semantic Segmentation using Regions and Parts”  Pablo Arbeláez, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir Bourdev, Jitendra Malik
  • “Affinity Learning via Self-diffusion for Image Segmentation and Clustering” Bo Wang, Zhuowen Tu
  • “Bag of Textons for Image Segmentation via Soft Clustering and Convex Shift”  Zhiding Yu, Ang Li, Oscar C. Au, Chunjing Xu
  • “Multiple Clustered Instance Learning for Histopathology Cancer Image Classification, Segmentation and Clustering” Yan Xu, Jun-Yan Zhu, Eric Chang, Zhuowen Tu
  • “Maximum Weight Cliques with Mutex Constraints for Video Object Segmentation”  Tianyang Ma, Longin Jan Latecki

Getting into Deep Learning

Yes, the fever has reached me too 🙂 and I have decided to look into deep learning. Here are some interesting papers if you want to have a look:

    • Training Products of Experts by Minimizing Contrastive Divergence, Geoffrey Hinton
    • Deep Boltzmann Machines; Salakhutdinov, Hinton; Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009. Knowing and understanding the following concepts will help when reading this paper:
      • Boltzmann Machine and RBM (reading Product of Experts is highly recommended)
      • Annealed Importance Sampling (AIS) (Neal 2001) or have a look at “Importance Sampling: A Review” by Tokdar and Kass
      • Mean Field as used in Variational Inference. (Wikipedia page is quite helpful)
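As a companion to the reading list above, here is a minimal sketch of one contrastive divergence step (CD-1) for a tiny binary RBM, written from the textbook update rule rather than taken from any of the papers' code. The function names, the learning rate and the choice to use hidden probabilities (instead of samples) in the update statistics are my assumptions.

```python
import math
import random


def sigmoid(x):
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def hidden_probs(v, W, c):
    """p(h_j = 1 | v) = sigmoid(c_j + sum_i v_i * W[i][j])."""
    return [
        sigmoid(c[j] + sum(v[i] * W[i][j] for i in range(len(v))))
        for j in range(len(c))
    ]


def visible_probs(h, W, b):
    """p(v_i = 1 | h) = sigmoid(b_i + sum_j h_j * W[i][j])."""
    return [
        sigmoid(b[i] + sum(h[j] * W[i][j] for j in range(len(h))))
        for i in range(len(b))
    ]


def sample(probs, rng):
    """Draw independent binary units from their probabilities."""
    return [1 if rng.random() < p else 0 for p in probs]


def cd1_update(v0, W, b, c, lr, rng):
    """One CD-1 step on a single binary training vector v0 (in place)."""
    ph0 = hidden_probs(v0, W, c)      # positive phase
    h0 = sample(ph0, rng)
    v1 = sample(visible_probs(h0, W, b), rng)  # one Gibbs step back
    ph1 = hidden_probs(v1, W, c)      # negative phase
    for i in range(len(b)):
        for j in range(len(c)):
            W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
        b[i] += lr * (v0[i] - v1[i])
    for j in range(len(c)):
        c[j] += lr * (ph0[j] - ph1[j])


# Toy usage: 3 visible units, 2 hidden units, one training pattern.
rng = random.Random(0)
W = [[0.0, 0.0] for _ in range(3)]
b = [0.0, 0.0, 0.0]
c = [0.0, 0.0]
for _ in range(50):
    cd1_update([1, 0, 1], W, b, c, lr=0.1, rng=rng)
print(b)
```

Since the only training pattern is [1, 0, 1], the visible biases of the first and third units can only grow and that of the second can only shrink, which is an easy sanity check on the sign of the update.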

Reading is good, deriving equations is better.

Another good read for beginners is

From Neural Networks to Deep Learning

  • A very interesting point made by Jeff Hawkins (author of On Intelligence and founder of Numenta):

It requires a temporal memory that learns what follows what. It’s inherent in the brain. If a neural network has no concept of time, you will not capture a huge portion of what brains do. Most Deep Learning algorithms do not have a concept of time

Meet Edward H Adelson

If you have seen the following image, you have met Edward H Adelson,

He is a faculty member at the M.I.T. Dept. of Brain and Cognitive Sciences. I was recently going over part of his paper “On Seeing Stuff: The Perception of Materials by Humans and Machines”, quite an interesting paper. It talks about why recognizing materials is important and points out that machines are not able to do that. One example in the paper is ice cream, which due to its texture can be recognized even by a child, whereas a machine cannot do that. This paper was published in 2001; I feel that machines still cannot recognize ice cream.

There are many illusions on his webpage; have a look.