Suppose we have a supervised training set $T = \{(x_1, y_1), \dots, (x_n, y_n)\}$, where $x_i$ is an example and $y_i \in \{-1, +1\}$ is its label. Further suppose that examples are only observable through a feature-extraction function $f(x; s)$, where $x$ is an example and $s \in \{s_1, \dots, s_m\}$ is an argument of the feature extraction. For each possible value of $s$, we train a linear SVM on the set $\{(f(x_1; s), y_1), \dots, (f(x_n; s), y_n)\}$. Let $w_i$ be the weight vector learned by the SVM for $s = s_i$.

My question is about combining subsets of these SVMs for improved classification. Specifically, for a test example $x$, suppose we have the scores of only the first two SVMs (feature extraction is costly): $w_1^T f(x; s_1)$ and $w_2^T f(x; s_2)$. How can we combine these scores (optimally) to obtain a final decision? A trivial answer would be to train an SVM for each subset of $s$ values, but this is not tractable.

Ideally, I'm interested in a probabilistic interpretation: assuming each SVM models $P(y \mid f(x; s_i))$, I want to express $P(y \mid f(x; s_1), f(x; s_2))$ in terms of $P(y \mid f(x; s_1))$ and $P(y \mid f(x; s_2))$.
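To make the setup concrete, here is a minimal sketch of what I mean, using scikit-learn's `LinearSVC` with Platt scaling (`CalibratedClassifierCV`) so each SVM's score can be read as $P(y \mid f(x; s_i))$. The feature extractor and data are hypothetical placeholders, not my actual pipeline:

```python
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV

def extract_features(x, s):
    """Hypothetical stand-in for the costly feature extractor f(x; s)."""
    rng = np.random.default_rng(abs(hash((x, s))) % (2**32))
    return rng.normal(size=20)

# Toy training set T = {(x_1, y_1), ..., (x_n, y_n)} with y_i in {-1, +1}.
X_raw = list(range(200))
y = np.array([1 if i % 2 == 0 else -1 for i in X_raw])
s_values = ["s1", "s2", "s3"]

# One linear SVM per value of s, each calibrated with Platt scaling so that
# predict_proba approximates P(y | f(x; s_i)).
models = {}
for s in s_values:
    F = np.vstack([extract_features(x, s) for x in X_raw])
    svm = LinearSVC()                                          # learns w_i for s = s_i
    clf = CalibratedClassifierCV(svm, method="sigmoid", cv=5)  # Platt scaling
    clf.fit(F, y)
    models[s] = clf

# At test time only some extractions are affordable, e.g. s_1 and s_2:
x_test = 1234
p1 = models["s1"].predict_proba(extract_features(x_test, "s1").reshape(1, -1))
p2 = models["s2"].predict_proba(extract_features(x_test, "s2").reshape(1, -1))
# p1 and p2 approximate P(y | f(x; s_1)) and P(y | f(x; s_2));
# the open question is how to combine them into P(y | f(x; s_1), f(x; s_2)).
```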