Archive for the ‘DailyRead’ Category

NIPS 2012: Multimodal Learning with Deep Boltzmann Machines

This is quite an interesting paper from Ruslan Salakhutdinov's group (University of Toronto); there is a project page and a video lecture. [They used a Gaussian RBM while building the DBM.]

It is interesting in terms of the application and how the DBM is used.

[Figure: multimodal DBM]

In this way, given one set of features (one modality), the model can be used to infer the others. I recommend watching the video lecture.


Getting into Deep Learning

Yes, the fever has reached me too 🙂 and I have decided to look into deep learning. Here are some interesting papers if you want to have a look:

    • Training Products of Experts by Minimizing Contrastive Divergence, Geoffrey Hinton
    • Deep Boltzmann Machines; Salakhutdinov, Hinton; Proceedings of the International Conference on Artificial Intelligence and Statistics, 2009. Knowing and understanding the following concepts will help when reading this paper:
      • Boltzmann Machine and RBM (reading Products of Experts is highly recommended)
      • Annealed Importance Sampling (AIS) (Neal 2001), or have a look at “Importance Sampling: A Review” by Tokdar and Kass
      • Mean Field as used in Variational Inference (the Wikipedia page is quite helpful)
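
To make the first paper in the list a bit more concrete, here is a minimal sketch of one contrastive divergence step (CD-1) for a small binary RBM, written with NumPy. It is my own toy version, not code from any of the papers: the function name `cd1_step`, the toy data, the learning rate and the layer sizes are all assumptions chosen for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(v0, W, b, c, rng, lr=0.1):
    """One CD-1 update for a binary RBM on a batch v0 of shape (n, n_visible).

    W: (n_visible, n_hidden) weights, b: visible biases, c: hidden biases.
    All names here are illustrative, not taken from the paper.
    """
    # Positive phase: hidden probabilities given the data, then a binary sample.
    ph0 = sigmoid(v0 @ W + c)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one Gibbs step down to the visibles and back up.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + c)
    n = v0.shape[0]
    # Approximate gradient: data statistics minus one-step reconstruction statistics.
    W = W + lr * (v0.T @ ph0 - v1.T @ ph1) / n
    b = b + lr * (v0 - v1).mean(axis=0)
    c = c + lr * (ph0 - ph1).mean(axis=0)
    return W, b, c

rng = np.random.default_rng(0)
# Toy data (assumed): two binary patterns, each repeated ten times.
data = np.array([[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1]] * 10, dtype=float)
W = 0.01 * rng.standard_normal((6, 3))
b = np.zeros(6)
c = np.zeros(3)
for _ in range(500):
    W, b, c = cd1_step(data, W, b, c, rng)

# Mean-field style reconstruction of the data through the hidden layer.
recon = sigmoid(sigmoid(data @ W + c) @ W.T + b)
```

The exact log-likelihood gradient would need samples from the model's equilibrium distribution; CD-1 replaces them with a single Gibbs step started at the data, which is what makes it cheap.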

Reading is good, deriving equations is better.

Another good read for beginners is

From Neural Networks to Deep Learning

  • A very interesting point made by Jeff Hawkins (author of On Intelligence and founder of Numenta):

It requires a temporal memory that learns what follows what. It’s inherent in the brain. If a neural network has no concept of time, you will not capture a huge portion of what brains do. Most Deep Learning algorithms do not have a concept of time

Questioning Sparsity

I went through the CVPR 2011 paper “Are Sparse Representations Really Relevant for Image Classification?” (Rigamonti, Brown, Lepetit).

There have been quite a lot of papers on sparsity recently, so the question in the title is a very valid one. They report a lot of experiments and compare many different techniques. Their conclusion is that sparsity is important while learning the feature dictionary but not helpful during classification. The only thing the paper managed to convince me of, though, is that perhaps in their setting the convexity is not working.

I look forward to seeing rebuttals, or papers questioning or answering the questions raised by Rigamonti, in the coming year. Overall, this appears to be a paper that will be cited quite a lot.

Anchors and Cluster Centers

I went through “A Probabilistic Representation for Efficient Large Scale Visual Recognition Tasks” (Bhattacharya, Sukthankar, Jin, Shah), CVPR 2011. It has good results, but basically they are trying to find weights to fit a mixture of Gaussians (each centered on a selected feature vector).

This is my understanding:

Instead of clustering to find the words of the dictionary, they randomly select features from the dataset and call them ‘Anchors’. These ‘Anchors’ then do the same job as words. Instead of matching each feature with only one word, they try to get a weight on each word: each image yields K features, each feature says how important each ‘Anchor’ is, and together that gives the weight vector ‘w’. They find the weight vector through a maximum likelihood estimator. Once they have a weight vector for each image, they use an SVM for classification.
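
To check my own understanding, here is a rough sketch of that idea in NumPy. It is not the authors' implementation: I assume isotropic Gaussians with one shared width `sigma`, and I use plain EM updates on the mixture weights only (the anchors stay fixed) as the maximum likelihood estimator. The function name `anchor_weights`, the synthetic features and all sizes are made up for illustration.

```python
import numpy as np

def anchor_weights(features, anchors, sigma=1.0, n_iter=50):
    """Maximum likelihood mixture weights over fixed anchors for one image.

    features: (n, d) local descriptors of one image.
    anchors:  (m, d) descriptors picked at random from the dataset.
    Assumes isotropic Gaussians with a shared sigma (my assumption).
    """
    # Squared distance between every feature and every anchor: shape (n, m).
    d2 = ((features[:, None, :] - anchors[None, :, :]) ** 2).sum(axis=2)
    # Log-density of each feature under each anchor's Gaussian, up to a
    # constant that is the same for all anchors and cancels below.
    log_lik = -0.5 * d2 / sigma**2
    m = anchors.shape[0]
    w = np.full(m, 1.0 / m)
    for _ in range(n_iter):
        # E-step: how responsible each anchor is for each feature.
        log_r = np.log(w + 1e-300) + log_lik
        log_r -= log_r.max(axis=1, keepdims=True)  # numerical stability
        r = np.exp(log_r)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: the new weight is the average responsibility.
        w = r.mean(axis=0)
    return w

rng = np.random.default_rng(0)
# Stand-in for descriptors pooled from a whole dataset (synthetic).
all_features = rng.standard_normal((200, 8))
# Randomly selected anchors, as described above.
anchors = all_features[rng.choice(200, size=5, replace=False)]
# A fake "image" whose features all sit close to anchor 2.
image_features = anchors[2] + 0.1 * rng.standard_normal((30, 8))
w = anchor_weights(image_features, anchors)
```

In this toy case nearly all the weight lands on anchor 2. The per-image vector `w` is what would then be handed to the SVM.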

Their results are good, but I still have questions: how do they know how many Anchors to pick at random, and every time they run their experiment the results will be different because the Anchors have changed.

But again, they have done extensive experiments. It is worth having a look at their experiments section.

Bayesian Nonparametric Models on Decomposable Graphs

Quite interesting paper,