I am looking to evaluate some of the deep methods on an image segmentation problem familiar to the team. I am very familiar with regular MLPs but not so much with image processing. The problem is simple: given a set of k x k images (k ~= 500), say for each pixel whether it is class A or class B (e.g., sky vs. not sky). I have thousands of unlabeled images and a handful of labeled ones. If need be, I can generate more labeled ones (with a different method), but these will not be perfect. The resolution of the segmented image should be the same as the original. I can downsample the images if need be to save computation time, but 100x100 would probably be the extreme limit. So the goal is not unlike that in this paper by Alvarez/LeCun et al. I am wrapping my head around Theano and its algorithms to see how this would work, but I am not quite there yet. I was planning to start from the LeNet example in the tutorial. My current plan:
So I have only a handful of high-quality labeled images but can generate an arbitrary number of noisy ones (these should be reasonably good but not perfect).
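Roughly what I had in mind was to classify each pixel from the small window centred on it with an MLP; all names and sizes below are placeholders, and I am not at all sure this is the right way to set it up in Theano:

import numpy as np
import theano
import theano.tensor as T

patch = 15                               # placeholder window size around each pixel
n_in, n_hidden = patch * patch, 200
rng = np.random.RandomState(0)

def init_w(shape, name):
    return theano.shared(np.asarray(rng.uniform(-0.01, 0.01, shape),
                                    dtype=theano.config.floatX), name=name)

W1 = init_w((n_in, n_hidden), 'W1')
b1 = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b1')
W2 = init_w((n_hidden, 1), 'W2')
b2 = theano.shared(np.zeros(1, dtype=theano.config.floatX), name='b2')

x = T.matrix('x')                        # minibatch of flattened patches, one per pixel
y = T.vector('y')                        # 1 = sky, 0 = not sky, for the centre pixel

hidden = T.tanh(T.dot(x, W1) + b1)
p_sky = T.nnet.sigmoid(T.dot(hidden, W2) + b2).flatten()
cost = T.mean(T.nnet.binary_crossentropy(p_sky, y))

updates = [(p, p - 0.1 * T.grad(cost, p)) for p in [W1, b1, W2, b2]]
train = theano.function([x, y], cost, updates=updates)
predict = theano.function([x], p_sky)    # per-pixel probability of "sky"

The convolutional layers from the LeNet example would presumably replace the first hidden layer, but that is the part I have not worked out yet.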
Any guidance appreciated.
I am not very familiar with the problem of image segmentation and may not be able to give a full answer, but as far as I understand, when doing deep learning it is generally good to pre-train your network in an unsupervised manner to get a good initialization for the weights. If you don't perform any pre-training, you are likely to start from a poor random set of initial weights and may end up stuck in a bad local minimum. The pre-training can use DBNs, stacked auto-encoders, or any of the other common methods for pre-training a deep network. Since you have a lot of unlabeled data, unsupervised pre-training before any fine-tuning should be a big plus.

Coming to the issue of image segmentation, the CNN LeNet5 example from Theano may not be very helpful, as it shows a fully supervised classification problem in which the label is assigned to the whole image rather than to every pixel. You could look at something like the denoising auto-encoder (dA) and stacked dA to start with the pre-training, and then put a regression or classification layer on top of it as required.
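To make that last suggestion a bit more concrete, here is a rough, untested sketch of a single denoising auto-encoder layer in Theano, along the lines of the dA tutorial; the patch size, number of hidden units, corruption level and learning rate are only placeholders:

import numpy as np
import theano
import theano.tensor as T
from theano.tensor.shared_randomstreams import RandomStreams

# Placeholder sizes: flattened 20x20 image patches, 500 hidden units
n_visible, n_hidden = 20 * 20, 500
rng = np.random.RandomState(123)
theano_rng = RandomStreams(rng.randint(2 ** 30))

# Tied encoder/decoder weights, as in the Theano dA tutorial
W = theano.shared(np.asarray(rng.uniform(-0.01, 0.01, (n_visible, n_hidden)),
                             dtype=theano.config.floatX), name='W')
b_h = theano.shared(np.zeros(n_hidden, dtype=theano.config.floatX), name='b_h')
b_v = theano.shared(np.zeros(n_visible, dtype=theano.config.floatX), name='b_v')

x = T.matrix('x')          # minibatch of flattened patches, scaled to [0, 1]
corruption, lr = 0.3, 0.1

# Randomly zero out a fraction of the inputs, encode, then reconstruct
tilde_x = theano_rng.binomial(size=x.shape, n=1, p=1 - corruption,
                              dtype=theano.config.floatX) * x
h = T.nnet.sigmoid(T.dot(tilde_x, W) + b_h)
z = T.nnet.sigmoid(T.dot(h, W.T) + b_v)

# Cross-entropy reconstruction cost, minimised by plain gradient descent
cost = T.mean(-T.sum(x * T.log(z) + (1 - x) * T.log(1 - z), axis=1))
updates = [(p, p - lr * T.grad(cost, p)) for p in [W, b_h, b_v]]
train_step = theano.function([x], cost, updates=updates)

After greedily pre-training one or more such layers on patches from your unlabeled images, the learned W and b_h can be used to initialize the hidden layers of the supervised per-pixel network, which you would then fine-tune on the labeled images.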