Deepak S Padmanabhan – IBM India Research Lab
Information Extraction for enhancing Master Data Management
This talk introduces an IBM product that exploits Information Extraction (IE) to enhance Master Data Management, conceptualized and developed almost wholly within IBM India Research Lab. It uses dictionary-based annotators to extract entity mentions from unstructured text documents, and uses those mentions to improve relationship discovery and entity resolution. After an overview of the product, I will introduce some technical challenges in enhancing rule-based IE systems, followed by a detailed description of a technique for improving the coverage of regular expressions used in rule-based information extraction. Besides the main technical content, I will also give a brief overview of IBM India Research Lab and a broad summary of the technical work being done there.
The following papers contain the technical details that will be covered in this talk:
Karin Murthy, Prasad Deshpande, Atreyee Dey, Ramanujam Halasipuram, Mukesh Mohania, Deepak P, Jennifer Reed, Scott Schumacher, “Exploiting evidence from unstructured data to enhance master data management”, Industry Track Paper at the 38th Intl. Conference on Very Large Data Bases (VLDB 2012), Istanbul, Turkey, August 2012.
Karin Murthy, Deepak P, Prasad Deshpande, “Improving Recall of Regular Expressions for Information Extraction”, 13th Intl. Conference on Web Information Systems Engineering (WISE 2012), Paphos, Cyprus, November 2012.
School of Computing Science & Digital Media, Robert Gordon University, Riverside East, Garthdee, Aberdeen, Conference Room N118, 15:30 – 16:30.