|
Hi all, how would I find a corpus of Spanish text, as large as possible, for training general NLP models? Ideally it would be a corpus of books and/or magazines, for the sake of lower variation in grammar, spelling, etc. Thanks, P |
|
I know about this tool, FreeLing, developed at the Universitat Politècnica de Catalunya. It provides a range of NLP tools for Spanish. There are no corpora available for download, but I suppose you could get in touch with the people who work on the project, and they may be able to provide you with corpora for non-commercial use. |
|
What about the Spanish Wikipedia? http://es.wikipedia.org There are also Spanish-language books on Project Gutenberg: http://www.gutenberg.org/browse/languages/es |
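If you go the Gutenberg route, each plain-text book comes wrapped in an English license header and footer that you probably don't want in a Spanish corpus. Here is a rough sketch of trimming them. It assumes the files use the usual `*** START OF THE PROJECT GUTENBERG EBOOK ... ***` and `*** END OF ...` marker lines; older files may word these differently, so check a few by hand. The function name `strip_gutenberg_boilerplate` is just something I made up for illustration.

```python
import re

# Assumed marker wording; older Project Gutenberg files may differ.
START_RE = re.compile(
    r"^\*\*\*\s*START OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)
END_RE = re.compile(
    r"^\*\*\*\s*END OF (?:THE|THIS) PROJECT GUTENBERG EBOOK.*$", re.MULTILINE
)


def strip_gutenberg_boilerplate(text: str) -> str:
    """Return the text between the START and END marker lines.

    If a marker is missing, fall back to the start or end of the file.
    """
    start = START_RE.search(text)
    end = END_RE.search(text)
    begin = start.end() if start else 0
    stop = end.start() if end and end.start() >= begin else len(text)
    return text[begin:stop].strip()
```

You would call it on the decoded contents of each downloaded `.txt` file before adding the result to your corpus.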
|
I've been having some trouble finding a suitable corpus, so I usually end up parsing webpages, which is not that difficult. You can try sending an email to these guys; perhaps they can be of some help. |
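To give an idea of what "parsing webpages" can look like, here is a minimal sketch using only Python's standard-library `html.parser`. It is not necessarily how the poster above does it. The names `TextExtractor` and `html_to_text` are made up for this example, and it assumes you already have the page's HTML as a string (for instance, fetched with `urllib.request`).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()  # convert_charrefs=True decodes entities like &eacute;
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join chunks with spaces and collapse runs of whitespace.
        # Limitation: a word split by an inline tag gets a space inserted.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

For example, `html_to_text("<p>Hola, &iquest;qu&eacute; tal?</p>")` gives back the plain sentence with the entities decoded. Real pages also carry menus, footers, and other non-article text, so expect to add some site-specific filtering on top of this.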