Co-Segmentation in CVPR 2012

Here is a list of Co-Segmentation papers from CVPR 2012 {if you find some other interesting papers regarding Co-Segmentation, please send a message or post a comment, thanks}

  • “Multi-Class Cosegmentation” Armand Joulin, Francis Bach, Jean Ponce
  • “On Multiple Foreground Cosegmentation” Gunhee Kim, Eric P. Xing
  • “Higher Level Segmentation: Detecting and Grouping of Invariant Repetitive Patterns” Yunliang Cai, George Baciu: not directly a co-segmentation paper, but could be seen that way.
  • “Random Walks based Multi-Image Segmentation: Quasiconvexity Results and GPU-based Solutions” Maxwell D. Collins, Jia Xu, Leo Grady, Vikas Singh
  • “A Hierarchical Image Clustering Cosegmentation Framework” Edward Kim, Hongsheng Li, Xiaolei Huang
  • “Unsupervised Co-segmentation Through Region Matching” Jose C. Rubio, Joan Serrat, Antonio López, Nikos Paragios

Some interesting papers to look into

  • “Learning Image-Specific Parameters for Interactive Segmentation” Zhanghui Kuang, Dirk Schnieders, Hao Zhou, Kwan-Yee K. Wong, Yizhou Yu, Bo Peng
  • “Graph Cuts Optimization for Multi-Limb Human Segmentation in Depth Maps” Antonio Hernández-Vela, Nadezhda Zlateva, Alexander Marinov, Miguel Reyes, Petia Radeva, Dimo Dimov, Sergio Escalera
    • {just want to read it to see how the depth data is being used}
  • “Active Learning for Semantic Segmentation with Expected Change”  Alexander Vezhnevets, Joachim M. Buhmann, Vittorio Ferrari
    • Basic objective is to learn about “Active Learning” and how it is used
  • “Semantic Segmentation using Regions and Parts”  Pablo Arbeláez, Bharath Hariharan, Chunhui Gu, Saurabh Gupta, Lubomir Bourdev, Jitendra Malik
  • “Affinity Learning via Self-diffusion for Image Segmentation and Clustering” Bo Wang, Zhuowen Tu
  • “Bag of Textons for Image Segmentation via Soft Clustering and Convex Shift”  Zhiding Yu, Ang Li, Oscar C. Au, Chunjing Xu
  • “Multiple Clustered Instance Learning for Histopathology Cancer Image Classification, Segmentation and Clustering” Yan Xu, Jun-Yan Zhu, Eric Chang, Zhuowen Tu
  • “Maximum Weight Cliques with Mutex Constraints for Video Object Segmentation”  Tianyang Ma, Longin Jan Latecki

Getting into Deep Learning

Yes, the fever has reached me too 🙂 and I have decided to look into deep learning. Some interesting papers if you want to have a look:

    • “Training Products of Experts by Minimizing Contrastive Divergence”, Geoffrey Hinton
    • “Deep Boltzmann Machines”; Salakhutdinov, Hinton; Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009. Knowing and understanding the following concepts will help in reading this paper:
      • Boltzmann Machines and RBMs (reading Products of Experts is highly recommended)
      • Annealed Importance Sampling (AIS) (Neal 2001), or have a look at “Importance Sampling: A Review” by Tokdar and Kass
      • Mean Field as used in Variational Inference (the Wikipedia page is quite helpful)
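To make the contrastive-divergence idea concrete, here is a minimal CD-1 sketch for a binary RBM in NumPy. This is my own toy illustration, not code from Hinton's paper; all names (`cd1_step`, the learning rate, the toy data) are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, lr=0.1):
    """One CD-1 update for a binary RBM.
    v0: (batch, n_visible) data; W: (n_visible, n_hidden) weights;
    b: visible bias; c: hidden bias. Updates parameters in place."""
    # Positive phase: hidden probabilities and a sample, given the data
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # One Gibbs step: reconstruct visibles, then recompute hiddens
    pv1 = sigmoid(h0 @ W.T + b)
    ph1 = sigmoid(pv1 @ W + c)
    # CD gradient: positive statistics minus (approximate) negative statistics
    n = v0.shape[0]
    W += lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b += lr * (v0 - pv1).mean(axis=0)
    c += lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

# Toy run: 6 visible units, 3 hidden units, random binary "data"
W = 0.01 * rng.standard_normal((6, 3))
b = np.zeros(6)
c = np.zeros(3)
data = (rng.random((20, 6)) < 0.5).astype(float)
for _ in range(100):
    W, b, c = cd1_step(data, W, b, c)
```

The point of CD-1 is that a single Gibbs step from the data replaces the intractable model expectation in the log-likelihood gradient, which is exactly the approximation the Products of Experts paper introduces.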

Reading is good, deriving equations is better.

Another good read for beginners is

From Neural Networks to Deep Learning

  • A very interesting point made by Jeff Hawkins (author of On Intelligence and founder of Numenta):

It requires a temporal memory that learns what follows what. It’s inherent in the brain. If a neural network has no concept of time, you will not capture a huge portion of what brains do. Most Deep Learning algorithms do not have a concept of time.

CVPR 2012, attending Sebastian Thrun’s talk

{being updated as the talk goes on}

Sebastian is wearing a version of Google Glass and showing the Google driverless car video. The project is interesting, but the story behind it is even more interesting: they show a video of one of the researchers, who is legally blind, driving it.

He is showing how Change Detection and segmentation will become important problems.

If you guys can find the video link, do share; the talk is good enough to listen to again.

Talking about California: motorbikes can pass very close to a car, or squeeze between two cars to overtake them, so it becomes difficult to track them, especially when they come so near to the car that the two appear to be one. He mentions that the same kind of problem can be seen with the Kinect, and that if someone working on tracking can solve this problem more efficiently, it could be life-saving.

He says the driverless car is a lot safer than a human in case of collision, but mentions that there are situations the computer cannot fully understand, and it reacts improperly.

He shows one more application, used by motor patrol personnel, so that they don’t have to give much attention to the car and can do their job.

Excellent talk, although we were discussing the case of such cars in cities like Lahore, Delhi, or even New York. In some crowded cities, situations commonly arise where such “safe” cars might end up in deadlock.

Is human pose enough to tell human action?

With recent papers appearing in the area of human action detection in static images, the question of pose detection has taken center stage. Many papers model the problem as recovering the pose, or recovering the body parts.

The question is: does finding a pose, or representing a human in terms of different body parts, tell us what he is doing?

For example, if someone labels all the parts of the players in the following image, can we decide that they are playing cricket?


What about the following?


Why is it not just walking?

A very important thing is CONTEXT: cricket bat, helmet, pads, green grass, crowd, etc. How much of this is needed to establish that the guys above are playing cricket?

This is not to say that there are no poses which actually represent cricket or another game; e.g., have a look at the following


But what about differentiating between certain shots in squash and certain poses of a batsman in cricket?

Should we also include the factor that different sports are photographed differently? E.g., in tennis you will find many images taken from a camera above the player’s head, looking down. This would not be possible in soccer.

Edwin Chen did a very nice tutorial on Conditional Random Fields. CRFs have been used in NLP and Computer Vision papers (two out of the three papers I read yesterday were using CRFs). The question that bothered me was: what is the exact difference between a CRF and an MRF? Where should one use just an MRF, and where a CRF? For example, Edwin Chen’s blog says every HMM is a CRF; is there a similar relationship between MRFs and CRFs?
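A sketch of the distinction as I currently understand it (my own summary, so treat it with caution): an MRF defines a joint distribution that factorizes over the cliques of an undirected graph, while a CRF uses the same factorization but conditions on the observations, so the normalizer depends on the input:

```latex
% MRF: joint distribution over x, factorizing over cliques C
P(\mathbf{x}) = \frac{1}{Z} \prod_{c \in C} \psi_c(\mathbf{x}_c),
\qquad
Z = \sum_{\mathbf{x}} \prod_{c \in C} \psi_c(\mathbf{x}_c)

% CRF: conditional distribution over labels y given observations x;
% the partition function Z(x) must be recomputed for every input
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
\prod_{c \in C} \psi_c(\mathbf{y}_c, \mathbf{x})
```

In this reading, a CRF is simply an MRF over the labels whose potentials are allowed to depend on the observations; analogous to the HMM-vs-CRF relationship, conditioning a generative MRF over (x, y) yields a CRF, but a CRF spends no modeling effort on p(x).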

There are a few pointers to this answer; have a look at

Meet Edward H Adelson

If you have seen the following image, you have met Edward H. Adelson:

He is a faculty member at MIT’s Dept. of Brain and Cognitive Sciences. I was recently going over parts of his paper “On Seeing Stuff: The Perception of Materials by Humans and Machines“, quite an interesting paper. It talks about why recognizing materials is important and points out that machines are not able to do it. One example in the paper is ice cream, which, due to its texture, can be recognized even by a child, whereas a machine cannot do it. The paper was published in 2001, and I feel machines still cannot recognize ice cream.

There are many illusions on his webpage; have a look.

Computer Vision algorithms: where are the implementations?

Why can’t one find the implementations of papers published in prestigious conferences and journals, even ones published two years ago?
It takes away an important tool for comparing current algorithms with previous work. If you want to compare, you have to implement that paper yourself and end up figuring out all the tweaked parameters. This greatly hampers the speed of research and introduces mistrust: was the author’s algorithm really working, or am I making some mistake? Was the tweaking done by hand, or is there some other algorithm picking those values?

CVPR, ECCV, and other prestigious conferences should ask authors to make their implementations available, or at least available on request, after the conference is held. I know certain implementations cannot be released due to contractual obligations; however, if the algorithm is being published, it should come with some open-source implementation, or at least an executable for the popular platforms.