Monday, June 27, 2011

C#/SQL: Search Builder: Defining a Fuzzy Search

We assimilate between ten to twelve billion records at Melissa Data that we collect via donation or purchase into just over a billion high quality records that are current, accurate, and more complete. Each record consists of many fields, the contents of which possess a different commercial value to our various customers. Some are interested in a record if one of its fields contains a specific value or its value is a member of a certain set, while others are interested in certain combinations of values in a set of fields.

Typically, customers assemble an SQL query to access the record set of interest in a database, and assembling that approach often becomes long and confusing as the customer's intent is fine-tuned. An ordered list of fuzzy constraints applied to their search is relatively easy to list and understand, and will usually result in better matches to their search intent. These simple fuzzy statements concern which fields are of interest, and also concerns such as: the precedence field matching should be applied when there are multiple fields being considered; the quality of the match levels; the level of match completeness desired; which, and what number of a record's fields must match; is one date range preferable to others; and more. The Search Builder is the application Melissa Data sales engineers use to convert customers' intentions to a data search.

The DataGridview provides add, remove, sort, and modify options.

This is an application designed and orginally coded in C# with WPF and SQLite. I later switched to Winforms when I couldn't find capabilities in the WPF datagridview necessary for user stories that arose later in the project.

The checked or unchecked states of the grid of checkboxes shown above in the DataGridview in both the Rank and Terms tabs are read as binary strings from the SQLite database (e.g. 10010110000101011 & 01110011011101110 correspond to two of the rows with 17 columns).

The Setup tab, seen below, contains a list of the database files previously opened and a list of the user-specified names for saved fuzzy queries which are stored in an SQLite database.

The Setup Tab

The strings seen in the controls in the Candidates and Tests tabs are loaded from a dump of the table and field names in each customer's contact database.  Those names are combined with verbs that help construct good fuzzy statements such as "lastname.required.unaltered".


The Candidates Tab

No comments:

Post a Comment