Wednesday, March 27, 2013

C#: GlobalAddressCompare With HttpWebRequest

A small project I delivered a couple of years ago was for the international QA team at Melissa Data. They wanted to automate generating their own street address formats, as returned by our product, and comparing them with the values returned by the Google Maps API, the Bing Maps API, and Address Doctor.



In a typical run, the application whose UI is shown above reads a large list of street addresses from a tester-prepared input file and submits each one, via a C# HttpWebRequest call, to the Google and Bing Maps web services as well as to the Address Doctor product that we run in-house. When the XML-formatted responses arrive, it filters them so that only the relevant, comparable data remains, presents that data side by side for visual comparison, and then compares the results returned by the selected services.
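
The filter-then-compare step can be sketched as follows. This is a minimal Java illustration of the idea, not the original C# code: it assumes each service's XML response has already been fetched, and the element names (Street, City, PostalCode) and the AddressResponseFilter class are hypothetical placeholders, since the real Google, Bing, and Address Doctor response schemas differ from one another and would each need their own field mapping.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

// Hypothetical helper: the element names passed in are assumptions, not real service schemas.
class AddressResponseFilter {

    // Keeps only the named elements (first occurrence of each) from one service's XML
    // response; everything else in the response is discarded.
    static Map<String, String> filter(String xml, List<String> fields) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Unparseable response", e);
        }
        Map<String, String> kept = new LinkedHashMap<>();
        for (String field : fields) {
            NodeList nodes = doc.getElementsByTagName(field);
            if (nodes.getLength() > 0) {
                kept.put(field, nodes.item(0).getTextContent().trim());
            }
        }
        return kept;
    }

    // Returns the fields whose values differ between two filtered responses.
    // Matching is case-insensitive; a field missing from either side counts as a difference.
    static List<String> differences(Map<String, String> a, Map<String, String> b, List<String> fields) {
        List<String> diffs = new ArrayList<>();
        for (String field : fields) {
            String va = a.get(field);
            String vb = b.get(field);
            if (va == null || vb == null || !va.equalsIgnoreCase(vb)) {
                diffs.add(field);
            }
        }
        return diffs;
    }
}
```

The list returned by differences() is the kind of result that could drive side-by-side highlighting; case-insensitive matching is an assumed choice, and a real comparison would likely also normalize abbreviations such as "St" and "Street".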

Java/SQL: List Dropped Records

At Melissa Data we purchase many billions of personal contact records from multiple sources representing different facets of the world of commerce. We combine them, eliminating duplicate and obsolete information (much like the goal of database normalization), to distill them into the richest, fullest set of records anywhere, baked fresh in three-week cycles. Companies who were once our competitors now submit their records to us so they can be both updated with the most current information and appended to fill in their blanks.

The records are processed in a sequence of phases, starting with a simple conversion from the source formats to our standard format. Through our software build process, each address acts like a magnet that attracts all of the data associated with it from the batch of original records. We start with as many as twenty billion records and end up with around one billion product records.
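
The "magnet" behavior can be pictured as grouping records by a normalized address key and merging each group. The Java sketch below is an assumption-laden illustration, not our build code: the AddressConsolidator class, the record shape, the normalization rule, and the "newest non-empty value wins" merge policy are all hypothetical.

```java
import java.util.*;

// Hypothetical consolidation sketch; not the actual build-process code.
class AddressConsolidator {

    // Hypothetical source record: an address, its named fields, and a source date (yyyymmdd).
    static final class SourceRecord {
        final String address;
        final Map<String, String> fields;
        final int date;

        SourceRecord(String address, Map<String, String> fields, int date) {
            this.address = address;
            this.fields = fields;
            this.date = date;
        }
    }

    // Hypothetical normalization: uppercase, drop punctuation, collapse whitespace.
    static String addressKey(String address) {
        return address.toUpperCase(Locale.ROOT)
                .replaceAll("[^A-Z0-9 ]", "")
                .replaceAll("\\s+", " ")
                .trim();
    }

    // Groups records by address key (the "magnet"), then merges each group so that
    // the newest non-empty value of every field wins.
    static Map<String, Map<String, String>> consolidate(List<SourceRecord> records) {
        Map<String, List<SourceRecord>> groups = new TreeMap<>();
        for (SourceRecord r : records) {
            groups.computeIfAbsent(addressKey(r.address), k -> new ArrayList<>()).add(r);
        }
        Map<String, Map<String, String>> merged = new TreeMap<>();
        for (Map.Entry<String, List<SourceRecord>> group : groups.entrySet()) {
            List<SourceRecord> byDate = new ArrayList<>(group.getValue());
            byDate.sort(Comparator.comparingInt(rec -> rec.date)); // oldest first
            Map<String, String> fields = new TreeMap<>();
            for (SourceRecord r : byDate) {
                r.fields.forEach((name, value) -> {
                    if (value != null && !value.isBlank()) {
                        fields.put(name, value); // a newer value overwrites an older one
                    }
                });
            }
            merged.put(group.getKey(), fields);
        }
        return merged;
    }
}
```

At real scale this grouping would be done by sorting or by the database rather than in one in-memory map; the sketch only shows the shape of the operation.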

We sometimes encounter a large but easily overlooked loss of records during the transition from one build phase to another after we've changed the code that controls the build process. It is not uncommon to lose around 400 million records between steps and not realize it unless something unexpected pops into view, because that is only about 2% of the data involved.

The application I wrote, known as "List Dropped Records," tracks every record in every phase, whether it has been moved to an archive or remains active. It opens the hundreds of thousands of files, reads the record lists in them, and compares the phases to report what has disappeared, in the form of a database table that lists the record IDs dropped in each phase. With that we can quickly learn whether we have lost records, and how the amount of loss changes from one build to another.
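
At its core, the comparison is a set difference between the record IDs accounted for in consecutive phases. This Java sketch assumes each phase's IDs fit in memory as sets of strings (at the scale described above, a real run would more likely stream sorted ID files or let the database compute the difference); the DroppedRecords class and its method names are hypothetical.

```java
import java.util.*;

// Hypothetical sketch of the phase-to-phase comparison; names are illustrative only.
class DroppedRecords {

    // IDs accounted for in one phase: active records plus those moved to an archive.
    static Set<String> accountedFor(Set<String> active, Set<String> archived) {
        Set<String> all = new HashSet<>(active);
        all.addAll(archived);
        return all;
    }

    // Maps each 1-based phase number, from the second phase onward, to the IDs that were
    // accounted for in the previous phase but have vanished from this one.
    static Map<Integer, Set<String>> droppedByPhase(List<Set<String>> accountedPerPhase) {
        Map<Integer, Set<String>> dropped = new TreeMap<>();
        for (int i = 1; i < accountedPerPhase.size(); i++) {
            Set<String> lost = new TreeSet<>(accountedPerPhase.get(i - 1));
            lost.removeAll(accountedPerPhase.get(i));
            dropped.put(i + 1, lost);
        }
        return dropped;
    }

    // Loss as a percentage of the records that entered the transition.
    static double lossPercent(long droppedCount, long inputCount) {
        return inputCount == 0 ? 0.0 : 100.0 * droppedCount / inputCount;
    }
}
```

Each (phase, ID) pair in the result corresponds to one row of a dropped-records table. For scale, lossPercent(400_000_000L, 20_000_000_000L) returns 2.0, which shows how a loss of that size can hide inside a run of this size.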

Friday, March 15, 2013

C#/Custom User Controls: AtomSet Utilities

Overview: In a project whose goal is to convert a huge set of international street addresses, obtained from hundreds of different sources, to the standard format for each address's country, the code we develop at Melissa Data executes a succession of processing steps. We want to see the effect of our code at each step, to confirm that it does what we expect or to see what it does instead. The AtomSetUtilities application is the solution I prepared to address this need.

An internal class, AtomSet, is the data storage unit for an international address throughout the sequence of processing steps. Its data is stored, in the raw format in which it was received from our sources, in a normalized database, and is assigned to the class members during construction. Because it must accommodate approximately 240 different national address formats, the definition of an AtomSet sometimes changes as development work continues or as national formats vary. The data it contains also requires a variable number of entries in its components, so a viewer application must read and present each AtomSet dynamically.
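
A class whose components hold a variable number of entries can be modeled as an ordered map from component name to a list of values. The Java sketch below uses a hypothetical AtomSetSketch class; the real AtomSet's members are not described here, and the {component, value} row layout is an assumption about how a normalized table might deliver the raw data.

```java
import java.util.*;

// Hypothetical stand-in for AtomSet; the real class's members are not documented here.
class AtomSetSketch {
    private final String addressId;
    private final String countryCode;
    // Component name -> ordered entries; each component may hold any number of entries.
    private final Map<String, List<String>> components = new LinkedHashMap<>();

    // Builds the set during construction from raw {component, value} rows,
    // as they might be read from a normalized table.
    AtomSetSketch(String addressId, String countryCode, List<String[]> rawRows) {
        this.addressId = addressId;
        this.countryCode = countryCode;
        for (String[] row : rawRows) {
            components.computeIfAbsent(row[0], k -> new ArrayList<>()).add(row[1]);
        }
    }

    String addressId() { return addressId; }

    String countryCode() { return countryCode; }

    // Unmodifiable view of the component map that a viewer can iterate.
    Map<String, List<String>> components() {
        return Collections.unmodifiableMap(components);
    }

    // Total number of entries across all components.
    int entryCount() {
        int n = 0;
        for (List<String> entries : components.values()) {
            n += entries.size();
        }
        return n;
    }
}
```

A viewer can iterate components() and create one field per entry, which is why its layout cannot be fixed at design time.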

Because there are nine different steps in the execution sequence that we want to view, I wrote a C# user control, AtomSetViewerUserControl, that can be dropped onto a UI area and loaded with whatever dataset is currently required. The screenshot below shows one of these user controls containing three dynamically placed text boxes on the AtomSetUtilities configuration property page.



One particularly interesting feature I added, which was not in the original requirements but which I thought would help communication between QA and Development, is the ability to record and reload a Snapshot. A Snapshot is the set of data values shown in the AtomSetUtilities UI at a given instant, serialized so that the running state can be reloaded later.
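
A Snapshot amounts to serializing the UI's field values and reading them back. As a minimal Java sketch, assuming the UI state can be flattened to field-name/text pairs, java.util.Properties handles escaping of separators, backslashes, and line breaks; the Snapshot class and its method names are hypothetical, and the original serialization format is not described here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.util.*;

// Hypothetical snapshot helper; the real serialization format is not documented.
class Snapshot {

    // Serializes the UI's current field values (field name -> text) to snapshot text.
    static String save(Map<String, String> uiValues) {
        Properties props = new Properties();
        props.putAll(uiValues);
        StringWriter out = new StringWriter();
        try {
            props.store(out, "AtomSetUtilities snapshot");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Restores the field values from snapshot text so the UI can be repopulated.
    static Map<String, String> load(String snapshot) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(snapshot));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        Map<String, String> values = new TreeMap<>();
        for (String name : props.stringPropertyNames()) {
            values.put(name, props.getProperty(name));
        }
        return values;
    }
}
```

Because the snapshot is plain text, QA can attach it to a bug report and a developer can reload the same state.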

Here is another property page with evidence of progress after some steps have completed:


Because I am responsible for developing 21 different utility applications that concern similar subject matter, I saw several violations of one of my favorite software design rules, the Once and Only Once rule. Stated informally: "a rule should be coded once and only once, and any duplication should be eliminated by using a method that can be called from all places where the duplication was." While developing the 21 applications, I have prepared and made efficient use of 23 libraries, among them: 1) the viewer described above; 2) the AtomSet class used in all 21 applications; 3) a class definition that is displayed in a combo box; 4) a customized file class; 5) a custom file-open dialog containing an MRU list; 6) an international character culture translator; and 7) a configuration serialization class.
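
As an example of the kind of small component worth coding once and sharing, here is a hypothetical Java sketch of the MRU (most recently used) list behind a file-open dialog. The MruList class name, the capacity rule, and exact-string matching of paths are assumptions.

```java
import java.util.*;

// Hypothetical shared MRU list; one implementation reused by every application's dialog.
class MruList {
    private final int capacity;
    private final LinkedList<String> items = new LinkedList<>();

    MruList(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    // Moves (or adds) a path to the front and trims the oldest entries beyond capacity.
    void use(String path) {
        items.remove(path);
        items.addFirst(path);
        while (items.size() > capacity) {
            items.removeLast();
        }
    }

    // Most recently used first; a copy, so callers cannot disturb the order.
    List<String> items() {
        return new ArrayList<>(items);
    }
}
```

Every application that embeds the shared dialog then gets identical MRU behavior, and a fix to use() reaches all of them at once.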

Sunday, March 10, 2013

C#: KML Generator

Project overview: We want to show high-quality competitive analysis, marketing, and project progress overviews in an easily understood geographical view.

I studied GIS tutorials and reverse-engineered some KML examples to write the KMLGenerator application, which produces two- and three-dimensional overlays on Google Maps and Google Earth. The original request was for an application that could render data geographically as a color overlay, but as I explored GIS methods I found that extruded features did a better job of conveying magnitude and making the data memorable, so I added several additional rendering and color modes.
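
In KML, extrusion is expressed on the geometry itself: a Polygon with <extrude>1</extrude> and an altitudeMode such as relativeToGround is drawn as a solid connected to the ground. The Java sketch below builds one such Placemark with a height proportional to a data value. It is a minimal illustration under assumptions, not KMLGenerator's code: the KmlExtrusion class, its method names, and the metersPerUnit scale factor are hypothetical, and the name is escaped only for &, <, and >.

```java
import java.util.*;

// Hypothetical KML helper; names and scaling are illustrative only.
class KmlExtrusion {

    // KML colors are hex strings in aabbggrr order (alpha, blue, green, red).
    static String kmlColor(int alpha, int red, int green, int blue) {
        return String.format("%02x%02x%02x%02x", alpha, blue, green, red);
    }

    // Builds a Placemark whose polygon is extruded to a height proportional to value.
    // ring holds {lon, lat} pairs; the ring is closed here if it is not already.
    static String extrudedPlacemark(String name, List<double[]> ring, double value,
                                    double metersPerUnit, String color) {
        double height = value * metersPerUnit;
        StringBuilder coords = new StringBuilder();
        for (double[] p : ring) {
            appendPoint(coords, p, height);
        }
        double[] first = ring.get(0);
        double[] last = ring.get(ring.size() - 1);
        if (first[0] != last[0] || first[1] != last[1]) {
            appendPoint(coords, first, height); // a LinearRing must end where it starts
        }
        return "<Placemark><name>" + escape(name) + "</name>"
                + "<Style><PolyStyle><color>" + color + "</color></PolyStyle></Style>"
                + "<Polygon><extrude>1</extrude><altitudeMode>relativeToGround</altitudeMode>"
                + "<outerBoundaryIs><LinearRing><coordinates>" + coords.toString().trim()
                + "</coordinates></LinearRing></outerBoundaryIs></Polygon></Placemark>";
    }

    // KML coordinates are lon,lat,altitude; Locale.ROOT keeps '.' as the decimal separator.
    private static void appendPoint(StringBuilder sb, double[] p, double height) {
        sb.append(String.format(Locale.ROOT, "%.6f,%.6f,%.1f ", p[0], p[1], height));
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A flat color overlay is the same Placemark without the extrude and altitude parts, which is one way several rendering modes can share most of their code.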

The app's UI looks like this:


It accepts easy-to-understand, user-prepared Excel CSV files containing the location, data categories, and data values, along with another file I prepared that contains the geocodes for the centroid and perimeter coordinates of geographic entities such as countries and states. It generates output that conveys numeric data values on an Earth map that looks like this:

and more of the same in a United States view:


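
Joining the user's CSV rows to the geocode file is essentially a keyed lookup. This Java sketch assumes a three-column layout (location, category, value) with a header row and no quoted commas, and a centroid table keyed by location name; those layouts and the CsvGeocodeJoin class are hypothetical, since the actual file formats are not given here. Unknown locations are collected rather than silently guessed.

```java
import java.util.*;

// Hypothetical CSV-to-geocode join; file layouts are assumptions.
class CsvGeocodeJoin {

    // One user-supplied data row joined with its location's centroid {lon, lat}.
    static final class Row {
        final String location;
        final String category;
        final double value;
        final double[] centroid;

        Row(String location, String category, double value, double[] centroid) {
            this.location = location;
            this.category = category;
            this.value = value;
            this.centroid = centroid;
        }
    }

    // Reads "location,category,value" lines (line 0 is a header), looks each location up
    // case-insensitively in the geocode table, and collects unknown locations separately.
    static List<Row> join(List<String> csvLines, Map<String, double[]> centroids, List<String> unknown) {
        Map<String, double[]> lookup = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        lookup.putAll(centroids);
        List<Row> rows = new ArrayList<>();
        for (int i = 1; i < csvLines.size(); i++) {
            String line = csvLines.get(i).trim();
            if (line.isEmpty()) {
                continue;
            }
            String[] parts = line.split(",", -1); // no support for quoted commas
            if (parts.length != 3) {
                throw new IllegalArgumentException("Line " + (i + 1) + ": expected 3 columns");
            }
            String location = parts[0].trim();
            double[] centroid = lookup.get(location);
            if (centroid == null) {
                unknown.add(location);
                continue;
            }
            rows.add(new Row(location, parts[1].trim(), Double.parseDouble(parts[2].trim()), centroid));
        }
        return rows;
    }
}
```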
Street Segments
Another project involved displaying street segments where a street is known by different names in different locations, so we could visualize what we were reading in the data. For instance, Pacific Highway South, Aurora Ave, Highway 99, and Evergreen Way are all names for Washington State Highway 99 between SeaTac and Everett, WA. The map below shows sections of a road that are known by different names, using a different color for each name. A popup provides additional information such as the local street name and the address range.



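
Coloring by name reduces to a stable name-to-color assignment, and Google Earth shows a Placemark's <description> in the balloon that pops up when it is clicked. The following Java sketch emits one LineString Placemark per segment; the StreetSegmentKml class, its palette, and the description wording are hypothetical.

```java
import java.util.*;

// Hypothetical per-name coloring of street segments; palette and wording are assumptions.
class StreetSegmentKml {
    // A small fixed palette in KML aabbggrr order.
    private static final String[] PALETTE = {
        "ff0000ff", // red
        "ff00ff00", // green
        "ffff0000", // blue
        "ff00ffff", // yellow
        "ffff00ff", // magenta
        "ffffff00"  // cyan
    };

    private final Map<String, String> colorByName = new LinkedHashMap<>();

    // Each new local name gets the next palette color; a repeated name reuses its color.
    String colorFor(String localName) {
        String color = colorByName.get(localName);
        if (color == null) {
            color = PALETTE[colorByName.size() % PALETTE.length];
            colorByName.put(localName, color);
        }
        return color;
    }

    // One LineString placemark per segment; the description text appears in the popup balloon.
    String segmentPlacemark(String localName, String addressRange, List<double[]> points) {
        StringBuilder coords = new StringBuilder();
        for (double[] p : points) {
            coords.append(String.format(Locale.ROOT, "%.6f,%.6f ", p[0], p[1]));
        }
        return "<Placemark><name>" + escape(localName) + "</name>"
                + "<description>" + escape(localName + ", addresses " + addressRange) + "</description>"
                + "<Style><LineStyle><color>" + colorFor(localName) + "</color><width>4</width></LineStyle></Style>"
                + "<LineString><coordinates>" + coords.toString().trim() + "</coordinates></LineString></Placemark>";
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With six colors, names repeat colors only when a road has more than six distinct names; a larger palette would be a simple change.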
I wrote this application to read and process a large data file and produce a separate Google Earth folder for each viewpoint. This allows the person reviewing street names for preferred values within a geocode range to run the entire list once and then visit the folders, each of which contains one road, as time permits.
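
Per-road folders map directly onto KML's <Folder> element, which Google Earth lists as an expandable item in its Places panel. A minimal Java sketch, assuming the placemarks have already been generated and grouped by a road identifier (the RoadFolders class and the grouping key are hypothetical):

```java
import java.util.*;

// Hypothetical document writer: one <Folder> per road, roads sorted by identifier.
class RoadFolders {

    static String document(Map<String, List<String>> placemarksByRoad) {
        StringBuilder kml = new StringBuilder();
        kml.append("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n");
        kml.append("<kml xmlns=\"http://www.opengis.net/kml/2.2\"><Document>\n");
        for (Map.Entry<String, List<String>> road : new TreeMap<>(placemarksByRoad).entrySet()) {
            kml.append("<Folder><name>").append(escape(road.getKey())).append("</name>\n");
            for (String placemark : road.getValue()) {
                kml.append(placemark).append('\n');
            }
            kml.append("</Folder>\n");
        }
        kml.append("</Document></kml>\n");
        return kml.toString();
    }

    private static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

A reviewer can then expand one folder at a time, which matches the one-road-per-visit workflow described above.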