
I am searching for methodologies to apply text analytics to financial crimes detection, using UIMA, combined with correlation and deviation statistical analysis.

This is a call to anybody who has:
1) de-identified financial data sets, 2) typical use cases for financial crimes detection, or 3) actual UIMA models for this.

This question is marked "community wiki".

asked Jul 07 '10 at 09:39

kameron cole

edited Jul 07 '10 at 21:33

Joseph Turian ♦♦

I haven't done 1), 2), or 3), but if you give us more details about the task, we can give you some ideas.

(Jul 07 '10 at 21:32) Joseph Turian ♦♦

I attended a talk on fraud detection by an auditing firm. Even they have a data sparsity problem. There just aren't enough positive examples, and you can't be sure the negative examples are really negative.

One possible approach would be to extract any numbers that appear in the text and tag them by type from the context. Then you could apply Benford's Law or some other statistical method.
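A minimal sketch of the Benford's Law part of this idea, in Python. The regular expression, the function names, and the choice of a Pearson chi-square statistic are assumptions made for illustration; the step of tagging numbers by type from context is not shown. Benford's Law only applies to quantities that span several orders of magnitude, and the test needs a reasonably large sample to be meaningful.

```python
import math
import re
from collections import Counter

# Benford's Law: P(d) = log10(1 + 1/d) for leading digit d in 1..9.
BENFORD = {d: math.log10(1 + 1 / d) for d in range(1, 10)}

# Assumed pattern: digits with optional thousands separators and decimals.
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def leading_digits(text):
    """Return the first non-zero digit of each number found in the text."""
    digits = []
    for match in NUMBER_RE.findall(text):
        for ch in match:
            if ch.isdigit() and ch != "0":
                digits.append(int(ch))
                break  # only the leading significant digit matters
    return digits


def benford_chi_square(digits):
    """Pearson chi-square statistic of observed leading digits vs. Benford."""
    n = len(digits)
    if n == 0:
        raise ValueError("no digits to test")
    counts = Counter(digits)
    return sum(
        (counts.get(d, 0) - n * p) ** 2 / (n * p)
        for d, p in BENFORD.items()
    )
```

With 8 degrees of freedom, a statistic above roughly 15.51 rejects conformity to Benford's Law at the 5% level. A rejection is only a hint that a set of numbers deserves a closer look, not evidence of fraud.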

(Jul 08 '10 at 13:25) cityhall

3 Answers:

So for (1), de-identification is usually done by using an entity-type tagger to identify all person, place, organization, and date/time expressions and replacing them with generic tokens (e.g., PERSON). Sometimes a de-identifier will try to consistently replace a person's name with a new one (this requires solving coreference). You should check out the biomedical domain for de-identification papers, since they have to do this a lot.
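A rough sketch of the replacement step in Python, assuming an upstream entity tagger has already produced character spans. The `deidentify` function and its `(start, end, label)` span format are hypothetical, not part of UIMA or any particular tagger. The consistent mode below only matches identical surface strings, so it is a stand-in for real coreference resolution.

```python
def deidentify(text, entities, consistent=False):
    """Replace tagged spans with generic tokens.

    entities: list of (start, end, label) character spans, assumed to come
    from an upstream entity tagger (not implemented here).
    With consistent=True, the same surface string always maps to the same
    numbered placeholder, e.g. PERSON_1. This is exact string matching only,
    not real coreference resolution.
    """
    pseudonyms = {}  # (label, lowercased surface string) -> placeholder
    counters = {}    # label -> number of distinct strings seen so far
    pieces = []
    cursor = 0
    for start, end, label in sorted(entities):
        if start < cursor:
            raise ValueError("overlapping entity spans")
        pieces.append(text[cursor:start])  # keep untagged text as-is
        if consistent:
            key = (label, text[start:end].lower())
            if key not in pseudonyms:
                counters[label] = counters.get(label, 0) + 1
                pseudonyms[key] = f"{label}_{counters[label]}"
            pieces.append(pseudonyms[key])
        else:
            pieces.append(label)
        cursor = end
    pieces.append(text[cursor:])
    return "".join(pieces)
```

For example, with "Alice" and "Bob" tagged as PERSON in "Alice paid Bob. Alice left.", the generic mode yields "PERSON paid PERSON. PERSON left.", while the consistent mode keeps the two people distinguishable as PERSON_1 and PERSON_2.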

This answer is marked "community wiki".

answered Jul 09 '10 at 10:04

aria42

The Enron email dataset might be of interest to you. UIMA can be used to tag semantically similar words-of-interest in the email body text (e.g. Jeff Skilling's emails).

This answer is marked "community wiki".

answered Jul 08 '10 at 12:01

Clifton Phua

edited Jul 08 '10 at 12:03

This paper might be of interest to you

This answer is marked "community wiki".

answered Jul 07 '10 at 10:13

Aditya Mukherji


User submitted content is under Creative Commons: Attribution - Share Alike; Other things copyright (C) 2010, MetaOptimize LLC.