Archive for February, 2012

Is human pose enough to tell human action?

With recent papers on detecting human action in static images, the question of pose detection has taken center stage. Many papers model the problem as recovering the pose or recovering the body parts.

The question is: does finding a pose, or representing a human in terms of different body parts, tell us what he is doing?

For example, if someone labels all the body parts of the players in the following image, can we decide that they are playing cricket?


What about the following?


Why is it not just walking?

A very important thing is CONTEXT: cricket bat, helmet, pads, green grass, crowd, etc. How much of it is needed to establish that the guys above are playing cricket?

That is not to say there are no poses that actually represent cricket or some other game; e.g., have a look at the following:


But what about differentiating between certain shots in squash and certain poses of a batsman in cricket?

Should we also take into account that different sports are photographed differently? E.g., in tennis you will find many images taken from a camera above the player's head, looking down. This would not be possible in soccer.






Edwin Chen did a very nice tutorial on Conditional Random Fields. CRFs have been used in NLP and computer vision papers (two out of three papers I read yesterday were using CRFs). The question that bothered me was: what is the exact difference between a CRF and an MRF? Where should one use just an MRF, and where a CRF? For example, Edwin Chen’s blog says every HMM is a CRF; is there any such relationship between MRFs and CRFs?
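
As a rough sketch of the usual textbook distinction (the notation here is an assumption, not taken from the tutorial): with observations $x$, labels $y$, and non-negative potentials $\psi_c$ over cliques $c \in \mathcal{C}$, an MRF models the joint distribution, while a CRF models only the labels given the observations, so its normalizer depends on $x$.

```latex
\begin{align*}
\text{MRF:}\quad p(x, y) &= \frac{1}{Z} \prod_{c \in \mathcal{C}} \psi_c(x_c, y_c),
& Z &= \sum_{x', y'} \prod_{c \in \mathcal{C}} \psi_c(x'_c, y'_c) \\
\text{CRF:}\quad p(y \mid x) &= \frac{1}{Z(x)} \prod_{c \in \mathcal{C}} \psi_c(y_c, x),
& Z(x) &= \sum_{y'} \prod_{c \in \mathcal{C}} \psi_c(y'_c, x)
\end{align*}
```

Read this way, a CRF can be seen as an MRF over the labels $y$ whose potentials are allowed to depend on the whole observed input $x$, and it never has to model $p(x)$ itself.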

There are a few pointers to the answer; have a look at




Meet Edward H Adelson

If you have seen the following image, you have met Edward H Adelson.

He is a faculty member at the M.I.T. Dept. of Brain and Cognitive Sciences. I was recently going over parts of his paper “On Seeing Stuff: The Perception of Materials by Humans and Machines”, which is quite interesting. It talks about why recognizing materials is important and points out that machines are not able to do it. One example in the paper is ice cream, which, because of its texture, can be recognized even by a child, whereas a machine cannot do that. The paper was published in 2001; I feel that machines still cannot recognize ice cream.

There are many illusions on his webpage; have a look.