Hi, I have been trying to code an AI for computer graphics manipulation, built on top of GIMP (GNU Image Manipulation Program), that will mimic or learn from a human user actually working in GIMP, with little success.

Please, can anyone point me to the right approach to tackle this?

Thanks

Update: the objective of the project is to teach a machine how to manipulate digital pictures (remove a background, for example) while I train it using GIMP. I was hoping to hook into GIMP since I can script it in Python (a language I know very well). I am not really an expert, so I would love to hear from more experienced ML people.

asked Oct 17 '10 at 19:56

daniel

edited Oct 19 '10 at 00:08

Can you clarify your objectives? Are you attempting to suggest tools for the user, take actions for the user? predict what the user might do? etc.

(Oct 17 '10 at 22:44) anonymous_4235

Can you provide a lot of training examples? Like 10000 examples of each different task?

(Oct 19 '10 at 00:25) Joseph Turian ♦♦

4 Answers:

There is a lot of research on using machine learning and pattern recognition methods in all sorts of image manipulation and computer vision problems, from the obvious ones (object/face detection and recognition, 3D modeling, denoising, inpainting, image segmentation) to harder ones (gestural computing, gait/pose modeling and recognition, American Sign Language transcription, anything you can think of). However, most people on this website are closer to natural language processing, and hence are not very familiar with what is done in computer vision or image manipulation these days.

So it's an area big enough that there is no single "obvious" project you can do. Pretty much any task a person does when editing an image can be automated (vectorizing, removing red eyes, removing ugly things, pasting lots of pictures together for a panoramic view), and many of these can be improved by learning at least a few hard elements. The two biggest conferences in computer vision are ICCV and CVPR. There is a very good website called cvpapers which stores lots of recent papers in computer vision, and you can search it by sub-area. I strongly encourage you to read some (at least 10) papers that catch your eye (for example, by focusing on interesting problems) from that website before thinking about what such a project could be. Also search for computer vision people in or around your university (they might be in the EE department instead of the CS department, however), talk to them, and see if any of them could advise you. The way you talk does not seem to show enough maturity to undertake research on your own, and it's better to have an advisor in the area.

answered Oct 21 '10 at 05:17

Alexandre Passos ♦

Didn't know about cvpapers, thanks for the pointer!

(Oct 21 '10 at 14:20) David Warde Farley ♦

I don't want to discourage you from working on cool ML problems, but this problem in general sounds extremely difficult; it even seems too ambitious for a PhD topic in ML.

First of all, you have a very large state space: the set of possible images is preposterously large, and it is for this reason that image data is notoriously difficult to learn from. Moreover, such a problem would not only have an image as a single state but more likely a sequence of images that transition from one to the next when the user takes actions, e.g. you start with an image and remove all of one color (like the sky), then remove parts of an object, etc. This means that you cannot assume the data is simply drawn independently at random (i.i.d.), as in the standard supervised learning setting. More likely, such a problem would fit the reinforcement learning problem definition (which is harder).

Worse yet, your action space is huge: the agent can take an action for every tool (brush, eraser, clone, etc.), for every setting of that tool (brush size, opacity, etc.), and for every pixel of the image (where you use that tool). If you are attempting to learn how different actions affect the state of the system over time, this is a lot of stuff to take in! Just consider the size of the search space: the number of possible images multiplied by the number of actions.
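To put rough numbers on that, here is a back-of-the-envelope sketch in Python. The image size (640x480, 8-bit RGB), the 10 tools, and the 100 settings per tool are assumptions picked purely for illustration; the counts are done in log10 because the number of images is far too large to be useful as a plain integer.

```python
import math

def log10_num_images(width, height, channels=3, levels=256):
    """log10 of the number of distinct images of the given size."""
    return width * height * channels * math.log10(levels)

def num_actions(width, height, tools, settings_per_tool):
    """One action = (tool, tool setting, pixel location)."""
    return tools * settings_per_tool * width * height

# Illustrative assumptions, not measurements of GIMP itself.
w, h = 640, 480
log_states = log10_num_images(w, h)
actions = num_actions(w, h, tools=10, settings_per_tool=100)

# log10(states * actions) = log10(states) + log10(actions)
log_pairs = log_states + math.log10(actions)

print("log10(number of images):        %.0f" % log_states)
print("number of single-step actions:  %d" % actions)
print("log10(state-action pairs):      %.0f" % log_pairs)
```

Under these assumptions the number of images has over two million decimal digits, so even the hundreds of millions of single-step actions barely change the exponent. The state space is the dominant problem.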

And after all that you still would have to come up with a reward function to tell the algorithm which states are better than others (or infer it from data with inverse RL, which is not quite solved).

You might be able to solve something simple, like drawing stick men or happy faces, but even that sounds hard.

There have also been many algorithms proposed to automatically segment images by objects or foreground/background. This sort of thing is usually studied in the supervised learning context where the user hand labels hundreds of images and the algorithm attempts to learn and generalize this to unseen images. But this would not really have anything to do with users using GIMP...
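As a toy illustration of that supervised setting, here is a minimal sketch: a nearest-centroid classifier over RGB values that labels each pixel as foreground or background from a handful of hand-labeled pixels. This is a deliberately simple stand-in, not one of the published segmentation algorithms, and the function names and sample colors are made up for the example.

```python
def mean_color(pixels):
    """Average (r, g, b) of a non-empty list of pixels."""
    n = len(pixels)
    return tuple(sum(p[c] for p in pixels) / n for c in range(3))

def sq_dist(a, b):
    """Squared Euclidean distance between two colors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def train(labeled_pixels):
    """labeled_pixels: list of ((r, g, b), label), label is 'fg' or 'bg'."""
    fg = [p for p, lab in labeled_pixels if lab == "fg"]
    bg = [p for p, lab in labeled_pixels if lab == "bg"]
    return {"fg": mean_color(fg), "bg": mean_color(bg)}

def predict(model, pixel):
    """Return the label whose mean color is closest to the pixel."""
    return min(model, key=lambda lab: sq_dist(model[lab], pixel))

# Hand-labeled training pixels (hypothetical): reddish object, bluish sky.
training = [
    ((200, 30, 30), "fg"), ((220, 40, 35), "fg"),
    ((30, 40, 200), "bg"), ((20, 60, 220), "bg"),
]
model = train(training)

# Label two unseen pixels.
mask = [predict(model, p) for p in [(210, 35, 30), (25, 50, 210)]]
print(mask)
```

Real systems use far richer features than raw color (texture, edges, neighboring pixels) and far more labeled data, but the train-then-generalize structure is the same.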

answered Oct 19 '10 at 00:45

anonymous_4235

edited Oct 19 '10 at 03:23

OK, I still want to do this. I know the heart of the matter is what an objective measure of a "perfect" image (picture) is. Is there a paper on this?

Wow, that is an awesome response. I mean, in the process you explain machine learning so well that a six-year-old would understand it. Please edit the Wikipedia ML page; I think you have a better explanation of what ML is.

(Oct 20 '10 at 21:30) daniel

Perfect in what sense? Beautiful? Natural? What kind of images? Whose criteria?

There's a ton of subjectivity here.

(Oct 21 '10 at 14:03) David Warde Farley ♦

Here is a paper that came to mind (there could be more recent work; check the citations), which learns a transformation that relates two images. This transformation can then be applied to a third image.

http://www.cs.utoronto.ca/~hinton/absps/cvpr07.pdf

I think this is pretty far below what you were looking for, but it should give you a more concrete idea of the difficulties that this type of thing entails.
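To make the "learn from a pair, apply to a third image" idea concrete, here is a much simpler stand-in, explicitly not the method of the linked paper: fit a per-channel gain and offset by least squares from a before/after pair, then apply it to a new image. Images are assumed to be flat lists of (r, g, b) tuples, and all names are hypothetical.

```python
def fit_affine(xs, ys):
    """Least-squares fit of y ~ a*x + b for one color channel."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    if var == 0:
        # Constant channel: fall back to a pure offset.
        return 1.0, my - mx
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var
    return a, my - a * mx

def learn_transform(before, after):
    """Fit one (gain, offset) pair per RGB channel."""
    return [fit_affine([p[c] for p in before], [q[c] for q in after])
            for c in range(3)]

def apply_transform(params, image):
    """Apply the fitted gains/offsets, clamping to the 0..255 range."""
    return [tuple(min(255, max(0, round(a * p[c] + b)))
                  for c, (a, b) in enumerate(params))
            for p in image]

# Example pair: 'after' is 'before' darkened by y = 0.5*x + 10.
before = [(10, 20, 30), (100, 110, 120), (200, 210, 220)]
after = [(15, 20, 25), (60, 65, 70), (110, 115, 120)]

params = learn_transform(before, after)
result = apply_transform(params, [(40, 80, 120)])
print(result)
```

This only captures global color adjustments. Anything spatial (warps, removing an object) needs a far more expressive model, which is where the difficulty comes in.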

answered Oct 21 '10 at 02:08

jbowlan

edited Oct 21 '10 at 02:09

Your question is incredibly broad, and you're probably going to have to restrict the class of images you want to deal with (and the type of enhancement) in any practical application. There has been a lot of work on image denoising and super-resolution, for example, which are fairly domain-agnostic but of limited scope.

For a more domain-specific approach, you might have a look at the Neural Computation paper by Eisenthal et al. (2006) about what constitutes an "attractive" face, and the SIGGRAPH paper by the same group (Leyvand et al., 2008), which focused on enhancing images of faces by using their learned models of facial attractiveness and doing a local search to try to "improve" the photograph with a modest warp.
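The "learned score plus local search" pattern can be sketched generically. The following is a plain coordinate hill climb over a small parameter vector, with a made-up quadratic `toy_score` standing in for a learned quality model; it is an assumption-laden illustration, not the procedure from either paper.

```python
def hill_climb(score, start, step=1.0, min_step=1e-3, max_iters=1000):
    """Greedy coordinate search: try +/- step on each parameter,
    keep any change that raises the score, halve the step when stuck."""
    x = list(start)
    best = score(x)
    for _ in range(max_iters):
        improved = False
        for i in range(len(x)):
            for delta in (step, -step):
                cand = x[:]
                cand[i] += delta
                s = score(cand)
                if s > best:
                    x, best, improved = cand, s, True
        if not improved:
            step /= 2
            if step < min_step:
                break
    return x, best

def toy_score(params):
    """Hypothetical stand-in for a learned model; peaks at (0.3, -0.2)."""
    return -((params[0] - 0.3) ** 2 + (params[1] + 0.2) ** 2)

# Parameters could be, say, two warp or color-adjustment amounts.
best_params, best_score = hill_climb(toy_score, [0.0, 0.0], step=0.5)
print(best_params, best_score)
```

In a real application `score` would render the edited image and run it through the learned model, so each evaluation is expensive and the search has to stay local and modest.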

answered Oct 21 '10 at 14:12

David Warde Farley ♦

edited Oct 21 '10 at 14:17


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.