|
I am looking for a reader for the Gigaword corpus (distributed by the LDC). The corpus is in SGML format. Can anyone suggest an existing Gigaword corpus reader that extracts the relevant text from these Gigaword files? I found that the LingPipe library had a Gigaword corpus reader, but it is now deprecated and the library no longer supports it. |
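
To make "extract the relevant text" concrete, here is a minimal sketch of the kind of reader being asked for. It is not an existing library: the names `GigawordDoc`, `parse_gigaword` and `read_gigaword_file` are hypothetical. It assumes the files are gzip-compressed and use the common Gigaword layout, with `<DOC id="..." type="...">` elements containing an optional `<HEADLINE>` and a `<TEXT>` body split into `<P>` paragraphs. Check this against your release, since the markup may differ between editions.

```python
import gzip
import html
import re
from typing import Iterator, List, NamedTuple


class GigawordDoc(NamedTuple):
    """Hypothetical container for one <DOC> element."""
    doc_id: str
    doc_type: str
    headline: str
    paragraphs: List[str]


# Assumed layout: <DOC id="..." type="..."> ... </DOC>
DOC_RE = re.compile(
    r'<DOC\s+id="([^"]*)"\s+type="([^"]*)"[^>]*>(.*?)</DOC>', re.DOTALL
)
HEADLINE_RE = re.compile(r"<HEADLINE>(.*?)</HEADLINE>", re.DOTALL)
TEXT_RE = re.compile(r"<TEXT>(.*?)</TEXT>", re.DOTALL)
P_RE = re.compile(r"<P>(.*?)</P>", re.DOTALL)


def _clean(fragment: str) -> str:
    """Decode character entities and collapse runs of whitespace."""
    return " ".join(html.unescape(fragment).split())


def parse_gigaword(sgml: str) -> Iterator[GigawordDoc]:
    """Yield one GigawordDoc per <DOC> element found in the string."""
    for match in DOC_RE.finditer(sgml):
        doc_id, doc_type, body = match.groups()
        headline = HEADLINE_RE.search(body)
        text = TEXT_RE.search(body)
        text_body = text.group(1) if text else ""
        paragraphs = [_clean(p) for p in P_RE.findall(text_body)]
        # Some documents may carry text without <P> markup; keep it as one block.
        if not paragraphs and text_body.strip():
            paragraphs = [_clean(text_body)]
        yield GigawordDoc(
            doc_id,
            doc_type,
            _clean(headline.group(1)) if headline else "",
            paragraphs,
        )


def read_gigaword_file(path: str) -> Iterator[GigawordDoc]:
    """Read one gzip-compressed file (assumed UTF-8) and yield its documents."""
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        yield from parse_gigaword(handle.read())
```

The regular expressions treat the files as flat text rather than validating them against the SGML DTD. This is usually enough for pulling out headlines and paragraphs, but it loads each file fully into memory and would not handle nested or malformed markup.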