How does machine learning generally handle problems where there are a variable number of inputs?

To put it another way, the input vector to a machine learning problem has to have some fixed length, so what length do you use for variable-length data?

For example, if I were learning from sentences, how would I handle the fact that the number of characters (or words) in a sentence can be any number?

  1. Do you just pick some arbitrary maximum and put zeros in the inputs of examples that fall short of the maximum length?
  2. Is there something fancy like a reverse soft max?
  3. What works well in practice?

asked May 07 '14 at 03:55

ML1982

edited May 07 '14 at 03:55


One Answer:

There are several ways to do that, depending on the kind of data you have.

For instance, I once worked with variable-length input data: neural spike times within a window. The approach there was to assume infinite temporal resolution and place a Gaussian window centered at each spike time. We could then integrate the resulting signal or calculate its inner product with another spike train, even if the two trains had different spike counts. We used this inner product definition to create a Reproducing Kernel Hilbert Space. Note that this way there was no need to define a maximum size or to binarize the space...
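
To make the idea concrete, here is a minimal sketch of such an inner product in Python. It is not necessarily the exact kernel used in that work. It assumes each spike at time s contributes an unnormalized Gaussian bump exp(-(t - s)^2 / (2 sigma^2)), in which case the integral of the product of two smoothed trains has a closed form: a sum over all spike pairs of sigma * sqrt(pi) * exp(-(s_i - t_j)^2 / (4 sigma^2)). The function name, the width parameter sigma, and the example spike times are all hypothetical.

```python
import math

def spike_train_inner_product(spikes_a, spikes_b, sigma=1.0):
    """Inner product of two Gaussian-smoothed spike trains.

    Assumption (not from the original answer): each spike at time s is
    replaced by the bump exp(-(t - s)^2 / (2 * sigma^2)). Integrating the
    product of two such bumps over all t gives
    sigma * sqrt(pi) * exp(-(a - b)^2 / (4 * sigma^2)),
    so the integral of the two smoothed signals is a sum over spike pairs.
    """
    scale = sigma * math.sqrt(math.pi)
    return sum(
        scale * math.exp(-((a - b) ** 2) / (4.0 * sigma ** 2))
        for a in spikes_a
        for b in spikes_b
    )

# Two trains with different spike counts (made-up times, in seconds).
train_a = [0.10, 0.40, 0.90]
train_b = [0.20, 0.30, 0.50, 0.70, 0.80]

# The result is a single number regardless of how many spikes each train has.
similarity = spike_train_inner_product(train_a, train_b, sigma=0.05)
print(similarity)
```

Because the function returns one scalar for any pair of trains, a matrix of these values over a dataset could be handed to a kernel method without padding or truncating the inputs.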

answered May 07 '14 at 16:48

eder


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.