I am an undergraduate student who is just beginning to build an interest in artificial intelligence, so I am not yet sure what is and is not possible. Anyway,

Is it possible for a neural network to generate a 3D model of an object by being "shown" multiple pictures of the same object taken from different angles (the angles are not perpendicular)? Is it possible for networks to interpolate the data in between? Assume the number of pictures is sufficient and the object is entirely visible.

Thanks in advance.

asked Jul 07 '10 at 14:55

Emre

I'd like to know whether the techniques people mention below are fast enough to be integrated into a neural network approach. For instance, if you could assemble 2D images into a 3D model of an object, building your 3D representation while comparing it against your 2D views, and then add some of the statistical power of a neural network approach, you'd have a pretty solid object recognizer right there.

(Aug 09 '10 at 20:33) Jacob Jensen

You seem to be putting the cart before the horse here, emphasising method (neural network) over problem (inferring a 3D model). Unless you have a good reason to use a neural network, I suggest you address the problem first and then choose an appropriate method to solve it. I say this because I see a lot of undergraduates who think AI = neural networks + genetic algorithms, when really these techniques are only applicable to a small subset of the problems addressed in machine learning. Of course, this may not apply to you, in which case please disregard.

(Aug 10 '10 at 06:02) Noel Welsh

@Noel Welsh: that's a very good point, and it can't be emphasized enough: why use a neural network when there are other, well-accepted methods and no evidence that the neural network/GA would work better?

(Aug 10 '10 at 07:09) Alexandre Passos ♦

@Alexandre Passos: Well, honestly, neural networks and GAs are just plain cooler. C4.5, naive Bayes, and mixture methods are all stultifying by comparison. Graphical models and kernel methods are almost as interesting, but just a little less accessible.

It's funny, given the above, that smart decision tree + powerful boosting method = extremely good results for almost any classification task with a smallish number of features.

You are right, though, and I jumped the gun in my own earlier comment too: the cool method is not always (or even often) the right method.

(Aug 10 '10 at 09:17) Jacob Jensen

Thanks for all of your comments!

(Sep 02 '10 at 12:05) Emre

4 Answers:

Yes. Have a look at this for more info: http://make3d.cs.cornell.edu/

This is one of the leading models of the visual cortex, and it has practical applications in computer vision as well: http://cbcl.mit.edu/jmutch/cns/

answered Aug 10 '10 at 19:54

DirectedGraph

edited Aug 10 '10 at 20:05

That's a very cool project, although it uses Markov random fields, which are not exactly neural networks (although some neural networks are MRFs).

(Aug 10 '10 at 19:56) Alexandre Passos ♦

The term "neural networks" has become so diluted that it no longer has its intended meaning. Very few algorithms actually mimic the "real neurons in the brain".

(Aug 10 '10 at 20:02) DirectedGraph

A neural network can be trained to recognize the same object under different lighting conditions and viewpoints, according to Nair and Hinton (NIPS 2009). Schlecht and Barnard, also at NIPS 2009, presented a statistical model that learns block representations of 3D objects.

Given that both of these papers were published so recently at a top conference, I'd say that reliably inducing a 3D model of an object using machine learning is still an open problem. You can do similar things with standard computer vision techniques, but they have their limitations. To see state-of-the-art computer vision reconstruction techniques, check here (sections Geometry, Pose, and 3D); also check here (sections Shape from X, and Stereo and Structure from Motion). There are dozens of papers on that site, and there are entire books on reconstruction as well. The classical techniques are structure from motion, space carving, and some others I can't remember right now. Any vision book should be a good intro to those.
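
To give a rough feel for the geometric core of these classical techniques, here is a minimal sketch of midpoint triangulation: given two camera centers and the viewing rays toward the same feature, it estimates the 3D point closest to both rays. It assumes the camera positions and ray directions are already known, which a real structure-from-motion pipeline would have to estimate first. The function names (`triangulate_midpoint` and the small vector helpers) are made up for this illustration and do not come from any of the papers or sites mentioned here.

```python
# Hypothetical helpers for 3-vectors stored as plain Python lists.
def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def scale(a, s):
    return [x * s for x in a]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def triangulate_midpoint(c1, d1, c2, d2):
    """Estimate the 3D point nearest to the rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: ray directions toward the same feature.
    Assumes the cameras are already calibrated and posed (an assumption
    of this sketch, not something the code works out for you).
    """
    w = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are (nearly) parallel; depth is undetermined")
    # Parameters of the closest points on each ray (least-squares solution).
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    # With noisy rays the two points differ; take their midpoint.
    return scale(add(p1, p2), 0.5)

# Two cameras looking at the same feature from different positions.
point = triangulate_midpoint([0, 0, 0], [1, 2, 5], [4, 0, 0], [-3, 2, 5])
print(point)
```

Repeating this for many matched feature points gives a sparse point cloud. The hard parts in practice are finding the matches and recovering the camera poses, which is what the literature above is about.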

This answer is marked "community wiki".

answered Jul 07 '10 at 16:06

Alexandre Passos ♦

edited Jul 07 '10 at 16:45

Thanks, I will check those.

(Sep 02 '10 at 12:05) Emre

+1 for referring to recent results.

(Sep 08 '10 at 01:32) Lucian Sasu

I have seen shape inferred with Gray codes for 3D scanning, and also, to some extent, with regions of interest in computer vision to infer shapes by looking along different axes. I would think that a neural network could be trained to look at a shape, but the variables would have to do with guessing the perspective. I would think that computer vision techniques would be better suited, unless you're trying to train the network with classified photos and then predict the size, shape, or some other attribute of unclassified photos.
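
For anyone unfamiliar with the Gray code trick, here is a small sketch of the encoding side of structured-light scanning as I understand it: a projector shows one stripe pattern per bit, and the on/off sequence seen at a camera pixel identifies which projector column lit it. The function names are hypothetical and the pattern width is arbitrary; a real scanner also needs thresholding, calibration, and triangulation, none of which is shown.

```python
def to_gray(n):
    """Binary-reflected Gray code of a non-negative integer."""
    return n ^ (n >> 1)

def from_gray(g):
    """Invert to_gray by folding in successively shifted copies."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

def stripe_patterns(width, bits):
    """One pattern per bit plane, most significant bit first.

    Each pattern is a list with a 0 (dark) or 1 (lit) per projector column.
    """
    return [[(to_gray(col) >> b) & 1 for col in range(width)]
            for b in reversed(range(bits))]

def decode_column(observed_bits):
    """Recover the projector column from the bits seen at one pixel."""
    g = 0
    for bit in observed_bits:
        g = (g << 1) | bit
    return from_gray(g)

patterns = stripe_patterns(width=8, bits=3)
# Pretend a camera pixel is lit by projector column 5 in every pattern.
observed = [pattern[5] for pattern in patterns]
print(decode_column(observed))  # recovers column 5
```

The reason for using a Gray code rather than plain binary is that neighbouring columns differ in exactly one pattern, so a misread at a stripe boundary shifts the decoded column by at most one.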

answered Jul 07 '10 at 15:19

th0ma5

I've seen a movie that was automatically converted into a 3D movie. I don't know the name of the software off the top of my head, but I could find out if you're interested. It worked best in shots that caught the object in motion or rotation: the algorithm picks out reference points, follows their motion, infers the shape of the object, and textures the 2D scene onto the inferred 3D scene. Unfortunately, I'm pretty sure it's closed source and very expensive, but it more or less confirms that there exists an algorithm that can turn photos into 3D (albeit many photos, in the form of frames).

answered Jul 07 '10 at 16:28

John Wilkinson

Yes, there are many algorithms for that, in the computer vision literature. Look in the geometry section of http://www.cvpapers.com/iccv2009.html for some state-of-the-art examples. They just don't do it by means of a neural network.

(Jul 07 '10 at 16:34) Alexandre Passos ♦

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.