Does anyone know of any document corpora that include the name of each document's author? Specifically, I'm looking for large corpora (>1000) with at least a sub-sample labelled for authorship.
Most studies in authorship analysis use only a relatively small amount of data. I'd like to see whether authorship analysis works on a large scale, and I need the data to do so. My thought was that a standard dataset in document clustering might have authorship as metadata, but this isn't really information that gets 'advertised'. Any thoughts?