I would not say that style per se really matters, but vocabulary obviously does. If you use a bag of words / tokens / n-grams as input features for your classifier, the classifier will not be able to use the information carried by tokens it never saw during training, even though those tokens might be very important for your test set.
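To make the vocabulary point concrete, here is a minimal sketch in plain Python. The helper names (`build_vocabulary`, `vectorize`) and the toy sentences are made up for illustration; a real pipeline would probably use a library vectorizer, but the effect should be the same: tokens absent from the training vocabulary simply have no feature to land in.

```python
from collections import Counter

def build_vocabulary(docs):
    """Collect every token seen in the training documents (hypothetical helper)."""
    return sorted({tok for doc in docs for tok in doc.lower().split()})

def vectorize(doc, vocabulary):
    """Count tokens against the fixed training vocabulary.

    Returns the count vector plus the tokens that had to be dropped
    because they never appeared during training.
    """
    known = set(vocabulary)
    counts = Counter(doc.lower().split())
    vector = [counts[tok] for tok in vocabulary]
    unseen = sorted(tok for tok in counts if tok not in known)
    return vector, unseen

# Toy data: training text in a "news" register, test text in a "textbook" register.
train_docs = ["the market fell sharply today", "the team won the match"]
test_doc = "we prove the theorem by induction"

vocabulary = build_vocabulary(train_docs)
vector, unseen = vectorize(test_doc, vocabulary)

print("vocabulary:", vocabulary)
print("test vector:", vector)
print("dropped tokens:", unseen)
```

In this toy case the only token of the test sentence that survives is "the"; the words that actually carry the topic are all dropped, which is exactly the information the classifier cannot use.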
I think that to answer this question you need to train models on your specific task and measure the variance of precision and recall using cross-validation, with folds drawn from your different datasets (textbooks, research papers, news, Wikipedia, ...).
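One way to set up such an experiment is to hold out one source per fold, so every fold tests on a register the model has not seen. The sketch below is only an illustration under assumptions: the samples, the labels, and the token-voting classifier (`train`, `predict`) are toy stand-ins invented here, and you would plug in your own model and data.

```python
from collections import Counter, defaultdict
from statistics import mean, pvariance

def precision_recall(y_true, y_pred, positive):
    """Precision and recall for one class; 0.0 when undefined."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def train(pairs):
    """Toy stand-in classifier: each token votes for the labels it co-occurred with."""
    token_votes = defaultdict(Counter)
    for text, label in pairs:
        for tok in text.lower().split():
            token_votes[tok][label] += 1
    default = Counter(label for _, label in pairs).most_common(1)[0][0]
    return token_votes, default

def predict(model, text):
    token_votes, default = model
    votes = Counter()
    for tok in text.lower().split():
        votes.update(token_votes.get(tok, {}))  # unseen tokens contribute nothing
    return votes.most_common(1)[0][0] if votes else default

def leave_one_source_out(samples, positive):
    """samples: list of (source, text, label). One fold per held-out source."""
    scores = {}
    for held_out in sorted({source for source, _, _ in samples}):
        train_set = [(t, y) for s, t, y in samples if s != held_out]
        test_set = [(t, y) for s, t, y in samples if s == held_out]
        model = train(train_set)
        y_true = [y for _, y in test_set]
        y_pred = [predict(model, t) for t, _ in test_set]
        scores[held_out] = precision_recall(y_true, y_pred, positive)
    return scores

# Made-up samples: (source, text, label)
samples = [
    ("news", "stocks fell as the bank raised rates", "finance"),
    ("news", "the striker scored twice in the final", "sport"),
    ("papers", "we model interest rates with a stochastic process", "finance"),
    ("papers", "we analyse sprint performance of elite athletes", "sport"),
    ("wikipedia", "a bank is an institution that accepts deposits", "finance"),
    ("wikipedia", "football is a team sport played with a ball", "sport"),
]

scores = leave_one_source_out(samples, positive="finance")
precisions = [p for p, _ in scores.values()]
recalls = [r for _, r in scores.values()]
for source, (p, r) in scores.items():
    print(f"held out {source}: precision={p:.2f} recall={r:.2f}")
print("precision mean/variance:", mean(precisions), pvariance(precisions))
print("recall mean/variance:", mean(recalls), pvariance(recalls))
```

A large spread between folds would suggest the model depends on source-specific vocabulary; with data this small the numbers mean nothing, so treat the script as a template only.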
It probably also depends heavily on the size of your training corpus and on the number and nature of the classes / labels you target.