The problem with performing feature selection as a pre-processing step is that you fail to take the interaction between the feature representation and the learning algorithm into account. For example, if you use mutual information as a criterion, you assume conditional independence between features. The problem with forward selection is that it's greedy: assume you have features f1, f2, f3, f4 such that f1 and f2 are slightly discriminative, f3 and f4 together are the most discriminative features, but f3 or f4 alone is barely discriminative at all. Then the algorithm will select f1 and f2, but never reach f3 and f4, because at no intermediate step does adding f3 or f4 by itself look worthwhile. Backward elimination handles this better, but is still only locally optimal.
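To make the greedy failure concrete, here is a small sketch (the dataset is synthetic, built so that f4 equals f3 XOR y, i.e. f3 and f4 are individually useless but jointly perfect, while f1 and f2 are noisy copies of the label):

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
n = 400
y = rng.integers(0, 2, n)

# f1, f2: slightly discriminative (label flipped with probability 0.4)
f1 = np.where(rng.random(n) < 0.4, 1 - y, y)
f2 = np.where(rng.random(n) < 0.4, 1 - y, y)
# f3, f4: each marginally independent of y, but f3 XOR f4 == y exactly
f3 = rng.integers(0, 2, n)
f4 = f3 ^ y

X = np.column_stack([f1, f2, f3, f4])

def score(cols):
    """Cross-validated accuracy using only the given feature columns."""
    clf = DecisionTreeClassifier(random_state=0)
    return cross_val_score(clf, X[:, cols], y, cv=5).mean()

# Greedy forward selection of 2 features: at each step, add the single
# feature that most improves the score of the current subset.
selected = []
for _ in range(2):
    remaining = [i for i in range(4) if i not in selected]
    best = max(remaining, key=lambda i: score(selected + [i]))
    selected.append(best)

print("greedy picks columns:", selected, "score:", round(score(selected), 2))
print("pair (f3, f4) score:", round(score([2, 3]), 2))
```

Because f3 and f4 each look useless in isolation, the first greedy step always grabs f1 or f2, and the jointly optimal pair (f3, f4) is never evaluated.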
If you instead use an L2-regularized SVM, interactions between features are taken into account by the learning algorithm itself, and because the objective is convex you get a global optimum, though you will need to tune the regularization parameter. Still, you should be careful to add only features that you have good reason to believe will improve the results.
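A minimal sketch of tuning the regularization parameter C with cross-validation, using scikit-learn's `LinearSVC` (which applies an L2 penalty by default) on a synthetic dataset:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

# Synthetic data: 20 features, only 5 actually informative.
X, y = make_classification(n_samples=300, n_features=20,
                           n_informative=5, random_state=0)

# C is the inverse regularization strength: small C = strong
# regularization. Scaling matters for SVMs, hence the pipeline.
pipe = make_pipeline(StandardScaler(), LinearSVC(C=1.0, max_iter=10000))
grid = GridSearchCV(pipe, {"linearsvc__C": [0.01, 0.1, 1, 10, 100]}, cv=5)
grid.fit(X, y)

print("best C:", grid.best_params_["linearsvc__C"],
      "CV accuracy:", round(grid.best_score_, 2))
```

The uninformative features are down-weighted rather than discarded; if you specifically want sparse weights, an L1 penalty (`penalty="l1"`) is the usual embedded-selection alternative.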