Some input from a non-expert here. I am developing a text classifier as well, with high dimensionality (many words) and a large number of input documents.
I achieved a large jump in accuracy by switching to the 'maximum entropy' classifier in my ML library, Python NLTK. This is also known as logistic regression. The classifier consumes exactly the same feature data structures as the Naive Bayes one, so it was literally a drop-in replacement.
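To show why the swap is drop-in: both NLTK classifiers train on the same list of (featureset dict, label) pairs. The toy documents and labels below are made up for illustration, but the `NaiveBayesClassifier.train` / `MaxentClassifier.train` calls are NLTK's real API.

```python
# Minimal sketch of the swap: same featuresets, two classifiers.
import nltk

def features(words):
    # NLTK featuresets are plain dicts: {feature_name: value}
    return {f"contains({w})": True for w in words}

train = [
    (features("great movie loved it".split()), "pos"),
    (features("terrible plot hated it".split()), "neg"),
    (features("great acting loved the plot".split()), "pos"),
    (features("terrible acting hated the movie".split()), "neg"),
]

# Baseline: Naive Bayes
nb = nltk.NaiveBayesClassifier.train(train)

# Drop-in replacement: maximum entropy (logistic regression),
# trained on exactly the same (featureset, label) pairs.
me = nltk.classify.MaxentClassifier.train(
    train, algorithm="iis", max_iter=10, trace=0)

test = features("loved the great plot".split())
print(nb.classify(test), me.classify(test))
```

No changes to the feature extraction code are needed; only the `train` call differs.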
I struggled for a couple of months with Naive Bayes and with selecting features (tokens, bi-grams, the top 1000 most informative features, and so on).
You might also want to build a separate classifier (maybe as an input or adjunct to your main classifier) that looks at file metadata - discretized creation date, file owner, file size, file type/extension, email headers, etc. I had surprisingly good results classifying on this type of data.
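A sketch of what metadata feature extraction might look like. The field names and bucket boundaries here are my own illustrative assumptions, not from any particular library; the point is that discretized metadata produces the same kind of featureset dicts as the token features, so it plugs into the same classifiers.

```python
# Hypothetical metadata featurizer -- bucket boundaries are arbitrary
# choices for illustration; tune them to your own data.
import os
import datetime
import tempfile

def metadata_features(path):
    st = os.stat(path)
    ext = os.path.splitext(path)[1].lower()
    mtime = datetime.datetime.fromtimestamp(st.st_mtime)
    # Discretize continuous values into coarse buckets so the
    # classifier sees categorical features, not raw numbers.
    if st.st_size < 10_000:
        size_bucket = "small"
    elif st.st_size < 1_000_000:
        size_bucket = "medium"
    else:
        size_bucket = "large"
    return {
        "extension": ext,
        "size": size_bucket,
        "mtime_year": mtime.year,
        "mtime_dow": mtime.strftime("%A"),  # day of week
    }

# Demo on a throwaway file:
with tempfile.NamedTemporaryFile(suffix=".txt", delete=False) as f:
    f.write(b"hello")
feats = metadata_features(f.name)
print(feats)
```

These featuresets can be fed to a separate classifier, or merged into the main one's feature dicts as additional keys.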