Revision history
Revision n. 1

Jul 09 '11 at 01:40

Robert Layton

Document clustering corpora with authorship

I was wondering if anyone knew of any document corpora that include the name of each document's author. Specifically, I'm looking for large corpora (>1000 documents), with at least a sub-sample labelled for authorship.

Most studies in authorship analysis use only a relatively small amount of data. I'd like to see whether authorship analysis works on a large scale, and I need the data to do so. My thought was that a standard document clustering dataset might include authorship as metadata, but this isn't really information that gets 'advertised'. Any thoughts?

Revision n. 2

Aug 17 '11 at 07:16

Robert Layton

Edit: There are a few new users, so I'm shamelessly bumping this question to see if anyone has any ideas. In short, I'm looking for a large collection of text with known authors for at least some of the documents.

User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.