Monday, May 11, 2015

Empire Test Case Builder

When constructing a database that contains all mailing addresses in the world a lot of testing is required to ensure your work is correct -- in each country. The GCAT Test Case Builder searches converted data to generate examples of all permutations of practical address types that can be used in a query like our customers will submit.  The idea is, if all possible address formats are returned from the database, than its structure and design is correct.

This first two screen shots show the part of the application that submits the candidate addresses in database queries to see whether or not results are as desired. The first generates addresses from database data that is based on user-defined patterns:



    The second produces addresses that feature the full range of field values found in the database, and      provides control over whether or not the values are transliterated. a synonym of a found database        value, or just examples of all database string values for the selected field.




The second screen shot, below, shows a dialog from the tool that seeks the presence and number of every address component in the incoming data, as well as exceptions or variations to the basic form. It color codes the results in the dialog and then can generate a combinatoric set of all permutations of the colored elements that it will drive through the Builder shown above.


Sunday, May 10, 2015

Token Type Manager

Every street address that, when written on an envelope will steer that envelope to the correct destination is composed of several informational address components. The country, city, thoroughfare name, and the premises number are four commonly used components. However, there are actually 31 components that are in use somewhere on planet Earth to get an envelope delivered.

That four member subset above is all that is necessary in some countries, so if we research the list of all terms used for the country, city, thoroughfare name, and premises number then we will be able to prepare a search strategy that will recognize the proper order and correctness of terms for that country. The purpose for the Empire Token Type Manager is to develop the set of street address components for each country on Earth.

Numerous technicians -- country specialists -- have been assigned the task of exhaustively listing all terms for each of the components that their country uses.  So, for instance, in the USA, acceptable terms for a thoroughfare name include street, avenue, boulevard, way, place, road, and many others can be entered into the EmpireTokenTypeManager to help advance the search capabilities of Melissa Data's worldwide address search and cleaning utilities.

After the technician is done with a TTM session they can save their work, after which it will be sent to a shared network location and merged with the global archive.


The information sent to the network location is actually a delta file, that is, a relatively small file that contains only the changes the technician added or removed. After a delta file has been transferred to the network OnRamp directory a second EmpireTokenTypeManager component, the ConfigFileIntegrator observes the presence of the delta file and merges it, and any others found at the same time, with the master token file. Once the merge is complete the newly updated global token file is used to generate five new files for various uses -- some in text format and some in binary -- and then committed to the company's internet-based Subversion data archive for use by several company products.

Trace Viewer

The Empire Trace Viewer presents a simple UI view that is a facade covering a massive amount of data. As the commercial product runs, and the trace feature is enabled, tracing data is saved in a file that facilitates debugging and performance statistics.  A typical session will generate hundreds of millions of lines of output.
The user story of interest concerns how to process the trace file so the fraction of contents the user wishes to view can be presented with no perceptible delay between dragging the file into the view and seeing the filtered results.  The other, similar user story is, how can the trace be presented with no perceptible delay after one of the filter options is changed.



I handled this problem by loading what is essentially function-keyed hash tables within call ID-keyed hash tables within session-keyed hash tables, which at the leaf end contains a list of file offset addresses.  With that, all lines in the trace file corresponding to which session the user recorded, which call made into the system, and which functional ID's are of interest can be accessed and presented in near immediate time.