Friday, April 15, 2016

WinForms DataGridView: Empire Search Logic Builder

When judging the quality of a street address returned from a database query, you'll realize that many variables contribute to the answer. Is every address component returned? Are the address components in the desired order? If not, which of the alternative orders is preferred? Is all of the spelling correct? Is the national culture reflected in the character set? Were we successful in filtering profanity? How well did the query recover when misleading information was used in the query statement?
Of those answers, which are most significant when deciding the magnitude of a result's quality? That is the goal of the Empire Search Logic Builder: to assign a magnitude to result quality so we can compare the quality of one version of an address with another, and thereby establish the precedence of the many variations and how that precedence can be communicated to customers.

Automation is mandatory when there is a potential for dealing with more than one billion addresses. I developed the UI for this application so a user can easily specify the parameters for evaluating search results. Those parameters are then used to generate a list of the order and rank by which addresses returned from a database query are valued. Using those results, we understand the implications of the parameter settings and can let our customers choose the quality of address they purchase. An example is a marketing company that wants to purchase every single address in a certain county and intends to use the name "Current Resident" in the address line. They only need to be assured that the physical address is correct, so a relatively low magnitude of address quality is acceptable. On the other hand, if a mayoral candidate wants to assure his "friends" in a housing development of his good intentions, he will place more value on an address that contains not only the correct physical address but also the correct -- and current -- occupants' names, correctly spelled. A database query would return that type of address with a higher quality magnitude than one suitable for addressing to "Current Resident".

This first screenshot from the Search Logic Builder shows a combination of user options and settings read from an existing driver/configuration file.
Empire Search Logic Builder in Verbose Mode

This second screenshot shows a different tab on the same window with the hard-to-read strings replaced with a column of equivalent tokens and equivalent graphical symbols.
Empire Search Logic Builder in Symbol Mode

This screenshot shows the generated results: the precedence by which addresses will be valued, based on the settings specified above. Note that there are over 52K magnitude values in this results table, stacked in decreasing order of desirability. Also, many rows in the DataGridView control are highlighted because they contain a blank where the building name is not matched, as selected in the right-side list box and radio button.
Second page showing results of precedence generation


What I find technically interesting about this application is the story of the icon columns on the two pages. The icon images are stored in a .resx file, which allows for the simplest ClickOnce distribution and installation. The sets of address component parameters, however, are stored in a configuration file in the verbose form seen in the SLB's verbose-mode DataGridView, and different configuration files use different sets of verbose strings. At load, the application must scan the component lines to list the component strings used in the evaluation driven by that configuration file, pair each string to a short term (like TN2 or AP4), pair each short term to an image in the graphics resource file, and then arrange them in the order stated in the file.
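The load-time pairing described above can be sketched roughly as follows. This is a language-neutral illustration in Python, not the application's actual code. The configuration line format (components separated by "|"), the verbose strings, the ZZ9 term, and the icon resource names are all assumptions made up for the example; only the short terms TN2 and AP4 come from the description above.

```python
from typing import Dict, List, Tuple

# Hypothetical lookup tables. The real verbose strings and resource names
# are not shown in this post, so these entries are placeholders.
SHORT_TERMS: Dict[str, str] = {
    "verbose text for TN2": "TN2",
    "verbose text for AP4": "AP4",
    "verbose text for ZZ9": "ZZ9",  # ZZ9 is an invented term for illustration
}
ICON_RESOURCES: Dict[str, str] = {
    "TN2": "icon_TN2",
    "AP4": "icon_AP4",
    "ZZ9": "icon_ZZ9",
}


def scan_components(config_lines: List[str], separator: str = "|") -> List[str]:
    """Collect the distinct verbose strings in order of first appearance."""
    seen: List[str] = []
    for line in config_lines:
        for part in line.split(separator):
            text = part.strip()
            if text and text not in seen:
                seen.append(text)
    return seen


def build_columns(config_lines: List[str]) -> List[Tuple[str, str, str]]:
    """Pair each verbose string with its short term and icon resource name."""
    columns = []
    for verbose in scan_components(config_lines):
        short = SHORT_TERMS[verbose]  # a KeyError flags an unknown component
        columns.append((verbose, short, ICON_RESOURCES[short]))
    return columns
```

Keeping the pairing in two separate tables mirrors the two lookups in the description: verbose string to short term, then short term to image. In this sketch, a configuration file that introduces a new verbose string fails at the first lookup rather than silently showing a missing icon.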
Note that the left column in the first page's DataGridView shows hyphen-separated short-form identifiers for each of the verbose strings, which were prepared in a configuration dialog in an order the user specified. That is appropriate because the order is important in the evaluation. This format is considered easier to read than placing the short-form terms in columns, because the address coders at Melissa Data read those identifiers as addresses, and spreading them into columns, ordered or not, makes them harder to comprehend. The icons on the right side of that DataGridView, however, indicate the presence of a token, not the order of the tokens. Giving each token type its own column therefore makes it easier to see which tokens are present, as well as the intended trends as you read down the rows.
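The difference between the ordered identifier column and the presence-only icon columns can be illustrated with a small sketch. It is written in Python for brevity and is not the application's code. The function name, the sample identifiers, and the ZZ9 token are hypothetical.

```python
from typing import List


def presence_rows(ordered_ids: List[str], token_columns: List[str]) -> List[List[bool]]:
    """Turn hyphen-separated identifiers into one presence flag per token column.

    The identifier string keeps the user's token order. The returned flags
    deliberately discard that order and record only whether each token occurs.
    """
    rows = []
    for identifier in ordered_ids:
        present = set(identifier.split("-"))
        rows.append([token in present for token in token_columns])
    return rows


# Hypothetical rows: same column layout regardless of the order inside each identifier.
grid = presence_rows(["AP4-TN2", "TN2"], ["TN2", "AP4", "ZZ9"])
```

Here `grid` holds one row of flags per identifier. In a grid control, each `True` would correspond to showing that token's icon in its fixed column, and each `False` to leaving the cell empty.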