|
Given a large collection of text documents that contain terms in multiple languages, such as German and Chinese, are there good approaches to building a feature matrix? For single-language classification tasks, n-gram modeling works fine in most cases. |
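
To make the question concrete, here is a minimal sketch of one possible baseline: character n-grams, which need no language-specific word tokenizer and so apply to German and Chinese text alike. The function names (`char_ngrams`, `build_feature_matrix`), the n-gram range of 2 to 3, and the sample documents are all assumptions made for illustration, not an established recipe.

```python
from collections import Counter


def char_ngrams(text, n_min=2, n_max=3):
    """Yield overlapping character n-grams; no word tokenizer is needed."""
    text = text.lower()
    for n in range(n_min, n_max + 1):
        for i in range(len(text) - n + 1):
            yield text[i:i + n]


def build_feature_matrix(docs, n_min=2, n_max=3):
    """Return (vocab, rows), where rows[d][j] counts n-gram vocab[j] in doc d."""
    counts = [Counter(char_ngrams(d, n_min, n_max)) for d in docs]
    vocab = sorted(set().union(*counts))        # shared vocabulary across docs
    index = {gram: j for j, gram in enumerate(vocab)}
    rows = []
    for c in counts:
        row = [0] * len(vocab)                  # dense row, fine for a toy example
        for gram, k in c.items():
            row[index[gram]] = k
        rows.append(row)
    return vocab, rows


# Hypothetical mixed-language documents (German, Chinese, English + Chinese).
docs = ["Maschinelles Lernen", "机器学习 很 有趣", "machine learning 机器学习"]
vocab, matrix = build_feature_matrix(docs)
```

For a large collection, dense rows like these would not scale; a sparse representation or feature hashing would likely be needed, and whether character n-grams are good enough for mixed-language classification is exactly the open part of the question.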