Hi,

Is there a corpus that contains a collection of hate speech? I know of the Bitterlemons corpus and CORPS, but both seem unsuitable for my task. I am looking for a larger collection of this kind. Any help would be highly appreciated.

asked Oct 14 '10 at 12:13

Dexter


4 Answers:

There's nothing that comes to mind that exactly fits what you're looking for.

Constructing a corpus of hate speech would force a researcher to walk on thin political ice. It requires a firm empirical definition of Hate Speech. If this definition can be construed to include what some could interpret as political disagreement, both necessary and unnecessary fire and ire could be directed at a well-meaning researcher. Not to mention the odious nature of the task of collecting this linguistic material.

Take, for example, the very high-profile debate over the location of Park51 (or Cordoba House). Some of the opposition to the placement of the Islamic center and mosque could possibly be construed as "Hate Speech". However, other opposition was more measured and (while I disagree with the position) I wouldn't go so far as to call it "Hate Speech". I, for one, wouldn't want to be in a position to draw that line.

That said, there seems to be some research on "Hate Speech", or at least hateful speech. Here's a paper on swearing and abuse in British English which suggests that the Lancaster Corpus of Abuse exists. Then there are a few more linguistically oriented papers which don't necessarily have associated corpora, including this and this.

This may not be the most satisfying answer, but maybe it'll get you in the right direction.

answered Oct 15 '10 at 19:49

Andrew Rosenberg

Andrew, I understand & appreciate the concerns you have raised. I must add that one can create a corpus aligned with a particular definition, OR a better way would be to take conspicuous hate speeches (ones that are very extremist).

Thanks for pointing me to the Lancaster Corpus of Abuse, but detecting abuse/swearing in language may not be the right choice.

(Oct 16 '10 at 02:10) Dexter

I have hate speech data sets, but they're proprietary, with ordinal labels corresponding to the demands and restrictions of brand advertisers. We can talk about sharing, but we'll need to talk about what exactly you're going to do with them. Please email me [email protected]

answered Oct 18 '10 at 14:16

downer

Downer, Thanks! Could you please provide me a link to your home page? I am unable to find you!

(Oct 29 '10 at 06:35) Dexter

Intentionally unavailable. Email me if you want info.

(Oct 29 '10 at 08:49) downer

Hi Josh,

I tried to e-mail your Stern address, but it came back as undeliverable. Is there another way I might contact you? Thank you for your help.

M Holtz

(Jun 25 '11 at 07:35) M Holtz

Hi Josh,

I tried to e-mail your stern address, but it came back as undeliverable. Is there another way I might contact you? Thank you for your help.

M Holtz

answered Jun 21 '11 at 22:26

M Holtz

oops, my email is [email protected]

answered Jun 25 '11 at 10:35

downer


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.