DataExtractor lets you extract entities from text, either by choosing from a set of predefined types or by supplying your own regular expressions.

Out of the box, it can extract:
  • Emails;
  • IP addresses;
  • Phone numbers in US format;
  • CapsWord;
  • URLs;
  • DateTimeLong – dates in "N/N/N H:M:S AM/PM" format.

Example:

string text = @"
    email1 = test@domain.com
    email2 = contactus@mydomain.ru
    IP = 127.0.0.1
    Phone = (978) 369-1118, ask ALEX or LEO
    Url = http://dddddd.com:3030/page.aspx?id=5&g=6
    Url = https://microsoft.com/ or ftp://microsoft.com/
    Date = 01/02/2008 5:20:50 PM
";
// Run every built-in data type against the text.
DataExtractor ext = new DataExtractor(text, DataTypes.All);
// Register one more regex under its own name.
ext.AddRegex(Regexes.CapsWord, "AdditionalCapsWords");
// Fetch all results, then only the "Email" ones.
var results = ext.GetExtractedResults();
var emails = ext.GetExtractedResults("Email");

foreach (var result in results)
    Console.WriteLine("{0}: {1}", result.GroupName, result.Value);

The output will look like this:



The Regexes class contains a set of regexes and helper methods for working with them.
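
To illustrate the general idea of extracting entities with named regexes, here is a minimal sketch that uses only the standard System.Text.RegularExpressions namespace. It is not DataExtractor's actual code: the SimpleExtractor class, its Extract method, and all three patterns are hypothetical and deliberately simplified, so they will miss or over-match many real-world inputs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, NOT part of DataExtractor: it only illustrates
// running several named patterns over one piece of text.
public static class SimpleExtractor
{
    // Simplified illustrative patterns (assumptions, not the library's regexes).
    private static readonly KeyValuePair<string, Regex>[] Patterns =
    {
        new KeyValuePair<string, Regex>("Email",
            new Regex(@"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")),
        new KeyValuePair<string, Regex>("IP",
            new Regex(@"\b(?:\d{1,3}\.){3}\d{1,3}\b")),
        new KeyValuePair<string, Regex>("CapsWord",
            new Regex(@"\b[A-Z]{2,}\b"))
    };

    // Returns (group name, matched value) pairs, in pattern order.
    public static List<KeyValuePair<string, string>> Extract(string text)
    {
        var results = new List<KeyValuePair<string, string>>();
        foreach (var pattern in Patterns)
        {
            foreach (Match match in pattern.Value.Matches(text))
            {
                results.Add(new KeyValuePair<string, string>(pattern.Key, match.Value));
            }
        }
        return results;
    }
}
```

Calling SimpleExtractor.Extract("mail test@domain.com from 127.0.0.1, ask ALEX") returns three pairs: an "Email" entry, an "IP" entry and a "CapsWord" entry. Note that the IP pattern here does not check that each octet is at most 255; a production-quality regex would need to.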

Last edited Mar 11, 2011 at 4:06 PM by akrakovetsky, version 1
