Friday, April 15, 2016

Winforms DataGridview: Empire Search Logic Builder

When deciding the quality of a street address returned from a database query you'll realize that many variables contribute to the answer.  Is every address component returned? Are the address components in the desired order? If not in the desired order, which of the alternative orders are preferred? Is all of the spelling correct?  Is the national culture reflected in the character set? Were we successful in filtering profanity? How was the recovery when misleading information was used in the query statement?
Of those answers, which matter most when deciding the magnitude of a result's quality?  That is the goal of the Empire Search Logic Builder -- to decide a magnitude for result quality so we can compare the quality of one version of an address to another version, and thereby establish the precedence of the many variations and how that precedence can be communicated to customers.

Automation is mandatory when there is a potential for dealing with more than one billion addresses. I developed the UI for this application so a user can easily specify the parameters for search result evaluations, which are then used to generate a list of the order and rank by which addresses returned from a database query are valued.  Using those results we understand the implications of parameter settings and can let our customers choose the quality of address they purchase. An example is a marketing company that wants to purchase every single address in a certain county and intends to use the name "Current Resident" in the address line. They only need to be assured that the physical address is correct, and therefore a relatively low magnitude for the address quality is acceptable. On the other hand, if a mayoral candidate wants to assure his "friends" in a housing development of his good intentions, he will place more value on an address that contains not only the correct physical address, but also the correct -- and current -- occupants' names, correctly spelled. That type of address would be returned from a database query with a higher magnitude of quality than those suitable for addressing to "Current Resident".

This first screenshot from the Search Logic Builder shows a combination of user options and settings read from an existing driver/configuration file.
Empire Search Logic Builder in Verbose Mode

This second screenshot shows a different tab on the same window with the hard-to-read strings replaced with a column of equivalent tokens and equivalent graphical symbols.
Empire Search Logic Builder in Symbol Mode

This screenshot shows the generated results: the precedence by which addresses will be valued based on the settings specified above. Note that there are over 52K magnitude values in this results table, stacked in decreasing order of desirability. Also, many items shown in the DataGridView control are highlighted because they contain a blank where the building name is not matched, as selected in the right-side list box and radio button.
Second page showing results of precedence generation

What I think is the technically interesting part of this application is the story of the icon columns in the two pages. The icon images are stored in a resx file which allows for the simplest ClickOnce distribution and installation. However, the sets of address component parameters are stored in a configuration file in the verbose form seen in the SLB's verbose mode DataGridView. Different configuration files will make use of different sets of verbose strings. At load, the application must scan the component lines to make a list of the component strings that will be used in the evaluation driven by the configuration file and then pair each to a short term (like TN2 or AP4), pair the short term to a graphic image in the graphics resource file, and then arrange them in the order stated in the file.
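
The load-time pairing described above can be sketched roughly as follows. This is only an illustration in Python, not the application's WinForms code: the configuration line format (comma-separated verbose strings), the verbose names, the `BN1` term, and the `icon_` resource naming are all assumptions made for the example. Only the short-term style (TN2, AP4) comes from the description.

```python
# Hypothetical lookup table: verbose component string -> short term.
# The verbose names and the "icon_" resource naming are invented here.
SHORT_TERMS = {
    "Thoroughfare Name 2": "TN2",
    "Administrative Place 4": "AP4",
    "Building Name": "BN1",
}

def build_icon_columns(config_lines, short_terms=SHORT_TERMS):
    """Return (verbose, short term, icon resource name) tuples in file order."""
    columns = {}  # dicts keep insertion order, so first appearance wins
    for line in config_lines:
        for verbose in (part.strip() for part in line.split(",")):
            if verbose and verbose not in columns:
                short = short_terms[verbose]  # KeyError flags an unknown string
                columns[verbose] = (verbose, short, "icon_" + short)
    return list(columns.values())

lines = [
    "Thoroughfare Name 2, Building Name",
    "Building Name, Administrative Place 4",
]
for verbose, short, icon in build_icon_columns(lines):
    print(short, icon, verbose)
```

Each distinct verbose string yields one column, in the order it first appears in the file, so different configuration files naturally produce different sets of icon columns.
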
Note that the left column in the first page's DataGridView shows hyphen-separated short-form identifiers for each of the verbose strings, which were prepared in a configuration dialog in an order the user specified. That is appropriate because the order is important in the evaluation. The format is considered easier to read than placing the short-form terms in columns because the address coders at Melissa Data see those as addresses, and spreading them into columns, ordered or not, makes them harder to comprehend. However, the icons on the right side of that DataGridView are there to indicate the presence of a token, not the order of the tokens. So, giving each token type its own column provides an easier-to-comprehend view of the presence of a token, as well as of the trends that emerge as you read down the rows.

Monday, October 5, 2015

Application Management, Security, & Distribution

While at Attachmate, during a product version development period in which the features on which I worked had no new feature requests or bugs reported, I was assigned a three-month special project whose goal was to explore the various management and distribution mechanisms available in the market so I could report on whether any were suited to improve our product.

Among the products currently performing those services was Microsoft's Management Console, which offered an interface by which we could install our product across a network, control to whom it was available, and control what features were available to specific users and groups. Two components were required, which I had implemented in our flagship product, Extra! Personal Client, before the product version from which I was supposed to be on sabbatical had firmed up its test plans. The first was a text file that specifies the product features that are to be controlled by the network administrator via the group policy user profiles. That was the administrator file (ADM), portions of which I've included below.  The second component that I implemented was the awareness code in the product UI that the group policy was in effect, and how to know whether or not a feature should be disabled or hidden.
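
As a rough illustration of the second component, the awareness check amounts to looking up a policy value before enabling a menu item. The sketch below is Python rather than the product's own code. The value names come from the ADM file included below, while the menu labels, the dictionary standing in for the values read from the policy key, and the rule that a checked box is stored as a nonzero value are assumptions for the example.

```python
# Menu labels are hypothetical; value names are taken from the ADM file.
MENU_POLICY_VALUES = {
    "File > New Session": "DisableFileNewSession",
    "Edit > Paste": "DisableEditPaste",
    "Help > About": "DisableHelpAbout",
}

def is_disabled(menu_item, policy_values, menu_map=MENU_POLICY_VALUES):
    """True when group policy says this menu item must be disabled or hidden.

    policy_values stands in for the values found under the policy key
    (assumed here: nonzero means the administrator ticked the checkbox).
    """
    value_name = menu_map.get(menu_item)
    if value_name is None:
        return False  # item is not under policy control
    return bool(policy_values.get(value_name, 0))

# Example: the administrator has disabled Paste only.
policy = {"DisableEditPaste": 1, "DisableHelpAbout": 0}
for item in MENU_POLICY_VALUES:
    print(item, "disabled" if is_disabled(item, policy) else "enabled")
```
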

With this feature our product could be installed, configured, and made available feature-by-feature to users or groups as allowed by the network administrator. That reduced the cost of management and distribution of our product, and it provided security that was controlled in a place accessible only to those a customer would choose, which gave pause to those rushing to web-based solutions.

The Extra! Personal Client ADM File:

; EE2000 policy settings
#if version <= 2

CLASS USER   ;;;;;;;;;;;;;;;;;;
; BEGIN EXTRA! Enterprise 2000 Policy Template for use with Windows 95/98 or Windows NT 4.0

KEYNAME Software\Policies\Attachmate\EE2000

        POLICY !!EE2K_File

                PART !!EE2K_NewSession_Disable CHECKBOX
                VALUENAME DisableFileNewSession
                END PART

                PART !!EE2K_OpenSession_Disable CHECKBOX
                VALUENAME DisableFileOpenSession
                END PART

                PART !!EE2K_SaveSession_Disable CHECKBOX
                VALUENAME DisableFileSaveSession
                END PART

                PART !!EE2K_SaveSessionAs_Disable CHECKBOX
                VALUENAME DisableFileSaveSessionAs
                END PART

                PART !!EE2K_HostCapture_Disable CHECKBOX
                VALUENAME DisableFileHostCapture
                END PART

                PART !!EE2K_OpenLayout_Disable CHECKBOX
                VALUENAME DisableFileOpenLayout
                END PART

                PART !!EE2K_SaveLayout_Disable CHECKBOX
                VALUENAME DisableFileSaveLayout
                END PART

                PART !!EE2K_Properties_Disable CHECKBOX
                VALUENAME DisableFileProperties
                END PART

                PART !!EE2K_PageSetup_Disable CHECKBOX
                VALUENAME DisableFilePageSetup
                END PART

                PART !!EE2K_PrintSetup_Disable CHECKBOX
                VALUENAME DisableFilePrintSetup
                END PART

                PART !!EE2K_PrintScreen_Disable CHECKBOX
                VALUENAME DisableFilePrintScreen
                END PART

                PART !!EE2K_PrintMultipleScreens_Disable CHECKBOX
                VALUENAME DisableFilePrintMultipleScreens
                END PART

                PART !!EE2K_Capture_Disable CHECKBOX
                VALUENAME DisableFileCapture
                END PART

                PART !!EE2K_StopCapture_Disable CHECKBOX
                VALUENAME DisableFileStopCapture
                END PART

                PART !!EE2K_FinishPrinting_Disable CHECKBOX
                VALUENAME DisableFileFinishPrinting
                END PART

                PART !!EE2K_ExitSession_Disable CHECKBOX
                VALUENAME DisableFileExitSession
                END PART

                PART !!EE2K_ExitExtra_Disable CHECKBOX
                VALUENAME DisableFileExitExtra
                END PART

        END POLICY    ;file

        POLICY !!EE2K_Edit

                PART !!EE2K_Cut_Disable CHECKBOX
                VALUENAME DisableEditCut
                END PART

                PART !!EE2K_Copy_Disable CHECKBOX
                VALUENAME DisableEditCopy
                END PART

                PART !!EE2K_CopyAsTable CHECKBOX
                VALUENAME DisableEditCopyAsTable
                END PART

                PART !!EE2K_CutAndAppend CHECKBOX
                VALUENAME DisableEditCutAndAppend
                END PART

                PART !!EE2K_CopyAndAppend CHECKBOX
                VALUENAME DisableEditCopyAndAppend
                END PART

                PART !!EE2K_Paste_Disable CHECKBOX
                VALUENAME DisableEditPaste
                END PART

                PART !!EE2K_Paste_Continue CHECKBOX
                VALUENAME DisableEditPasteContinue
                END PART

                PART !!EE2K_Clear_Disable CHECKBOX
                VALUENAME DisableEditClearDisable
                END PART

                PART !!EE2K_Clear_Display CHECKBOX
                VALUENAME DisableEditClearDisplay
                END PART

                PART !!EE2K_Clear_History CHECKBOX
                VALUENAME DisableEditClearHistory
                END PART

                PART !!EE2K_SelectAll_Disable CHECKBOX
                VALUENAME DisableEditSelectAll
                END PART

                PART !!EE2K_SelectDisplay CHECKBOX
                VALUENAME DisableEditSelectDisplay
                END PART

                PART !!EE2K_Settings_Disable CHECKBOX
                VALUENAME DisableEditSettings
                END PART

        END POLICY    ;edit

        POLICY !!EE2K_View

                PART !!EE2K_Toolbars_Disable CHECKBOX
                VALUENAME DisableViewToolbars
                END PART

                PART !!EE2K_StatusBar_Disable CHECKBOX
                VALUENAME DisableViewStatusBar
                END PART

                PART !!EE2K_QuickPads_Disable CHECKBOX
                VALUENAME DisableViewQuickPads
                END PART

                PART !!EE2K_HotSpots_Disable CHECKBOX
                VALUENAME DisableViewHotSpots
                END PART

                PART !!EE2K_KeyboardMap_Disable CHECKBOX
                VALUENAME DisableViewKeyboardMap
                END PART

                PART !!EE2K_RuleLines CHECKBOX
                VALUENAME DisableViewRuleLines
                END PART

                PART !!EE2K_PrintStatus_Disable CHECKBOX
                VALUENAME DisableEditPrintStatus
                END PART

                PART !!EE2K_SessionStatus_Disable CHECKBOX
                VALUENAME DisableViewSessionStatus
                END PART

        END POLICY    ;view

        POLICY !!EE2K_Tools

                PART !!EE2K_FileTransfer_Disable CHECKBOX
                VALUENAME DisableToolsFileTransfer
                END PART

                PART !!EE2K_TransferMultiple_Disable CHECKBOX
                VALUENAME DisableToolsMultipleFileTransfer
                END PART

                PART !!EE2K_5250Transfer_Disable CHECKBOX
                VALUENAME DisableTools5250FileTransfer
                END PART

                PART !!EE2K_SendFile_Disable CHECKBOX
                VALUENAME DisableToolsSendFile
                END PART

                PART !!EE2K_ReceiveFile_Disable CHECKBOX
                VALUENAME DisableToolsReceiveFile
                END PART

                PART !!EE2K_Macro_Disable CHECKBOX
                VALUENAME DisableToolsMacro
                END PART

                PART !!EE2K_RecentMacro_Disable CHECKBOX
                VALUENAME DisableToolsRecentMacro
                END PART

                PART !!EE2K_CaptureIncomingData_Disable CHECKBOX
                VALUENAME DisableToolsCaptureIncomingData
                END PART

                PART !!EE2K_EndCapture_Disable CHECKBOX
                VALUENAME DisableToolsEndCapture
                END PART

                PART !!EE2K_RecordPages_Disable CHECKBOX
                VALUENAME DisableToolsRecordPages
                END PART

                PART !!EE2K_PageSettings_Disable CHECKBOX
                VALUENAME DisableToolsPageSettings
                END PART

                PART !!EE2K_Status_Disable CHECKBOX
                VALUENAME DisableToolsStatus
                END PART

                PART !!EE2K_AlignForm_Disable CHECKBOX
                VALUENAME DisableToolsAlignForm
                END PART

                PART !!EE2K_TestPage_Disable CHECKBOX
                VALUENAME DisableToolsPrintTestPage
                END PART

        END POLICY    ;tools

        POLICY !!EE2K_Session

                PART !!EE2K_Connect_Disable CHECKBOX
                VALUENAME DisableSessionConnect
                END PART

                PART !!EE2K_Disconnect_Disable CHECKBOX
                VALUENAME DisableSessionDisconnect
                END PART

                PART !!EE2K_Reset_Disable CHECKBOX
                VALUENAME DisableSessionReset
                END PART

                PART !!EE2K_ResetDisplay_Disable CHECKBOX
                VALUENAME DisableSessionResetDisplay
                END PART

                PART !!EE2K_ResetConnection_Disable CHECKBOX
                VALUENAME DisableSessionResetConnection
                END PART

        END POLICY    ;session

        POLICY !!EE2K_Control

                PART !!EE2K_HoldPrint_Disable CHECKBOX
                VALUENAME DisableControlHoldPrint
                END PART

                PART !!EE2K_PA1_Disable CHECKBOX
                VALUENAME DisableControlPA1
                END PART

                PART !!EE2K_PA2_Disable CHECKBOX
                VALUENAME DisableControlPA2
                END PART

                PART !!EE2K_CancelPrint_Disable CHECKBOX
                VALUENAME DisableControlCancelPrint
                END PART

                PART !!EE2K_FormFeed_Disable CHECKBOX
                VALUENAME DisableControlFormFeed
                END PART

        END POLICY    ;control

        POLICY !!EE2K_Options

                PART !!EE2K_OptionsSettings_Disable CHECKBOX
                VALUENAME DisableOptionsSettings
                END PART

                PART !!EE2K_SessionType_Disable CHECKBOX
                VALUENAME DisableOptionsSessionType
                END PART

                PART !!EE2K_GlobalPreferences_Disable CHECKBOX
                VALUENAME DisableOptionsGlobalPreferences
                END PART

                PART !!EE2K_Security_Disable CHECKBOX
                VALUENAME DisableOptionsSecurity
                END PART

                PART !!EE2K_Color_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsColors
                END PART

                PART !!EE2K_Connection_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsConnection
                END PART

                PART !!EE2K_Display_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsDisplay
                END PART

                PART !!EE2K_Font_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsFonts
                END PART

                PART !!EE2K_Navigation_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsNavigation
                END PART

                PART !!EE2K_Printer_Disable CHECKBOX
                VALUENAME DisableOptionsSettingsPrinter
                END PART

        END POLICY    ;options

        POLICY !!EE2K_Help

                PART !!EE2K_HelpTopics_Disable CHECKBOX
                VALUENAME DisableHelpHelpTopics
                END PART

                PART !!EE2K_UsingHelp_Disable CHECKBOX
                VALUENAME DisableHelpUsingHelp
                END PART

                PART !!EE2K_SupportWeb_Disable CHECKBOX
                VALUENAME DisableHelpSupportWeb
                END PART

                PART !!EE2K_OfficeCompatible_Disable CHECKBOX
                VALUENAME DisableHelpOfficeCompatible
                END PART

                PART !!EE2K_About_Disable CHECKBOX
                VALUENAME DisableHelpAbout
                END PART

        END POLICY    ;help


Monday, May 11, 2015

Empire Test Case Builder

When constructing a database that contains all mailing addresses in the world, a lot of testing is required to ensure your work is correct -- in each country. The GCAT Test Case Builder searches converted data to generate examples of all permutations of practical address types that can be used in a query like the ones our customers will submit.  The idea is, if all possible address formats are returned from the database, then its structure and design are correct.

The first two screenshots show the part of the application that submits the candidate addresses in database queries to see whether or not results are as desired. The first generates addresses from database data based on user-defined patterns:

The second produces addresses that feature the full range of field values found in the database, and provides control over whether the values are transliterated, synonyms of a found database value, or just examples of all database string values for the selected field.

The next screenshot, below, shows a dialog from the tool that seeks the presence and number of every address component in the incoming data, as well as exceptions or variations to the basic form. It color-codes the results in the dialog and can then generate a combinatoric set of all permutations of the colored elements, which it will drive through the Builder shown above.
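
The combinatoric generation can be pictured with a small Python sketch. It models the set as a Cartesian product over the options found for each component, which is one plausible reading of the description rather than the tool's actual algorithm. The component names, the sample values, and the use of `None` for "component left out" are assumptions for the example.

```python
from itertools import product

def address_variants(component_options):
    """Yield one address (a list of strings) per combination of options.

    component_options maps a component name to the values observed for it;
    None marks a variant in which that component is omitted.
    """
    for combo in product(*component_options.values()):
        yield [value for value in combo if value is not None]

# Hypothetical findings for one country: one premise value, two spellings
# of the thoroughfare, and an optional building name.
options = {
    "premise": ["12"],
    "thoroughfare": ["Main St", "Main Street"],
    "building": [None, "Rose House"],
}
for variant in address_variants(options):
    print(", ".join(variant))
```

The number of test addresses is the product of the option counts per component, which is why the generation has to be automated once a country uses more than a handful of components.
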

Sunday, May 10, 2015

Token Type Manager

Every street address that, when written on an envelope, will steer that envelope to the correct destination is composed of several informational address components. The country, city, thoroughfare name, and premises number are four commonly used components. However, there are actually 31 components in use somewhere on planet Earth to get an envelope delivered.

That four-member subset is all that is necessary in some countries, so if we research the list of all terms used for the country, city, thoroughfare name, and premises number, then we will be able to prepare a search strategy that will recognize the proper order and correctness of terms for that country. The purpose of the Empire Token Type Manager is to develop the set of street address components for each country on Earth.

Numerous technicians -- country specialists -- have been assigned the task of exhaustively listing all terms for each of the components that their country uses.  So, for instance, in the USA, acceptable terms for a thoroughfare name such as street, avenue, boulevard, way, place, road, and many others can be entered into the EmpireTokenTypeManager to help advance the search capabilities of Melissa Data's worldwide address search and cleaning utilities.

After the technician is done with a TTM session they can save their work, after which it will be sent to a shared network location and merged with the global archive.

The information sent to the network location is actually a delta file, that is, a relatively small file that contains only the changes the technician added or removed. After a delta file has been transferred to the network OnRamp directory, a second EmpireTokenTypeManager component, the ConfigFileIntegrator, observes the presence of the delta file and merges it, and any others found at the same time, with the master token file. Once the merge is complete, the newly updated global token file is used to generate five new files for various uses -- some in text format and some in binary -- and is then committed to the company's internet-based Subversion data archive for use by several company products.
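
A minimal Python sketch of the delta merge follows. The real delta and master file formats are not described here, so the in-memory shapes below (a master mapping of component name to a set of terms, and deltas carrying `added` and `removed` mappings) are assumptions chosen only to show the idea.

```python
def merge_deltas(master, deltas):
    """Apply every pending delta to a copy of the master token table."""
    merged = {component: set(terms) for component, terms in master.items()}
    for delta in deltas:
        # Additions may introduce a component the master has not seen yet.
        for component, terms in delta.get("added", {}).items():
            merged.setdefault(component, set()).update(terms)
        # Removals of terms that are not present are silently ignored.
        for component, terms in delta.get("removed", {}).items():
            merged.get(component, set()).difference_update(terms)
    return merged

master = {"thoroughfare": {"street", "avenue"}}
deltas = [
    {"added": {"thoroughfare": {"boulevard", "way"}}},
    {"removed": {"thoroughfare": {"way"}}, "added": {"premise": {"suite"}}},
]
merged = merge_deltas(master, deltas)
print({component: sorted(terms) for component, terms in merged.items()})
```

Because each delta carries only changes, several technicians' sessions can be folded into the master in one pass, and the master itself is never overwritten by a stale full copy.
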

Trace Viewer

The Empire Trace Viewer presents a simple UI view that is a facade covering a massive amount of data. When the commercial product runs with the trace feature enabled, tracing data is saved in a file that facilitates debugging and performance statistics.  A typical session will generate hundreds of millions of lines of output.
The user story of interest concerns how to process the trace file so the fraction of contents the user wishes to view can be presented with no perceptible delay between dragging the file into the view and seeing the filtered results.  The other, similar user story is, how can the trace be presented with no perceptible delay after one of the filter options is changed.

I handled this problem by loading what are essentially function-keyed hash tables within call ID-keyed hash tables within session-keyed hash tables, which at the leaf end contain lists of file offsets.  With that, all lines in the trace file corresponding to the session the user recorded, the call made into the system, and the function IDs of interest can be accessed and presented almost immediately.
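
The nested index can be sketched in a few lines of Python. The trace line layout used here (`session|call_id|function_id|message`) is invented for the example, since the real format is not shown. The point is only that the index stores byte offsets, so filtered lines are fetched by seeking rather than by rescanning the file.

```python
import io
from collections import defaultdict

def build_index(stream):
    """Map session -> call ID -> function ID -> list of line start offsets."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    while True:
        offset = stream.tell()          # where the next line starts
        line = stream.readline()
        if not line:
            break
        parts = line.rstrip(b"\r\n").split(b"|", 3)
        if len(parts) < 4:
            continue                    # skip lines that do not fit the layout
        session, call_id, function_id = (p.decode("ascii") for p in parts[:3])
        index[session][call_id][function_id].append(offset)
    return index

def read_lines(stream, index, session, call_id, function_ids):
    """Fetch only the requested lines, in file order, by seeking."""
    functions = index.get(session, {}).get(call_id, {})
    offsets = sorted(o for fid in function_ids for o in functions.get(fid, []))
    lines = []
    for offset in offsets:
        stream.seek(offset)
        lines.append(stream.readline().decode("ascii").rstrip("\r\n"))
    return lines

# An in-memory stand-in for a trace file opened in binary mode.
trace = io.BytesIO(
    b"s1|c1|parse|start\ns1|c1|match|hit\ns1|c2|parse|other\ns2|c1|parse|x\n"
)
index = build_index(trace)
print(read_lines(trace, index, "s1", "c1", ["parse", "match"]))
```

Changing a filter option only changes which keys are looked up, so no part of the file has to be read again beyond the lines actually displayed.
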

Monday, November 17, 2014

The Area Browser

A significant part of my current job involves converting data that is obtained from many different sources into a standard format that uses the appropriate terminology for the general customer. One example of that is seen in the various ways we receive our international data. Within the data products of our different international data sources we see different standards for letter casing, combinations of national and language culture, different choices for area name modifiers, and entity duplication. To deal with that we have prepared an application, known as the Area Browser, that has a somewhat cohesive set of features intended to help a geographic area specialist merge the various versions of each unique data point into the value preferred by our paying customers.

As seen just below, the Area Browser presents two tree views of the same data that allow the subject expert to line up merging or moving data so it is logically organized.  Features include several ways to move or merge tree nodes, including movement of multiple nodes and movement or merging of like-named nodes; searching and matching that is aware of culture-specific casing and transliteration; assigning the hierarchy of area level names as appropriate for the country; and manual creation, deletion, and editing of nodes.

The Area Browser

Menu items show available actions:

The Preference Selector shown below is accessed via a mnemonic or menu item for the purpose of resolving terminology differences that result in duplication across the data set that falls below a selected tree level.  It provides lists of areas for which there is not yet a preference, and for areas where there are duplicate preferences specified. It offers the means to specify terms that should be preferred or avoided, the letter casing in which that preference should be selected, and whether consideration terms should be substrings or whole words.

The Preference Selector

Monday, September 22, 2014

Performance profiling

When my team had finished the basic development work for a new set of seven APIs for Attachmate's flagship product, EXTRA! Personal Client, we received the unwelcome news that many methods in our libraries offered slower execution times than our chief competitor's.  This set off two parallel efforts to diagnose and solve the problem so we would not only have the most reliable and portable APIs, but also the APIs offering the best performance in the industry. I handled one approach, and project management worked together to resolve it their way.

The APIs were written as DLLs that were to be loaded by our customers' homegrown client applications, which then became able to automate function sequences on a remote EXTRA! session that would relay the automation conversation to some type of mainframe computer. Three primary architectural components of our overall product determined how it performed compared to our competition.  First, there was the speed of the calls inside the DLL. Most of the library methods were little more than pass-through methods that relayed a set of parameters to the EXTRA! session where processing would occur.  A cursory McCabe's evaluation suggested that almost none of the execution time was spent in the library.  Another component was the EXTRA! terminal application, which spent a fair amount of time on mainframe session processing within itself, but also on network communications with the actual mainframe computer. The last of the main components was the remote procedure call (RPC) between the library and the EXTRA! session.  This was another DLL in which the parameters and method ID would be marshalled and sent across the wire from the client DLL to the EXTRA! terminal application. Because the way this was implemented -- a COM bridge between the two modules -- was new to the Attachmate culture, the village cried out, "WITCH!", and all of product and marketing management devoted their efforts to replacing it with something better understood.  A staff programmer stated that he could solve all of our problems by replacing the offending COM bridge with a mapped memory implementation.

My route involved the use of a profiler to actually learn how much time, and what fraction of execution time, was spent in each API component -- method-by-method.  In a nutshell, the average run percentage in the three API components came in around this:

5% in the DLL while running in the client application
5-15% in the RPC -- depending on which method was used (and how much data was transferred)
80-90% in the terminal

I reported that even if we made the RPC portion of our API execution time -1% of each call's duration, we could never improve performance by more than 16% on any single call.  And, on average, they could not expect more than around 7% improvement -- again, if execution time was improved to -1% of a call's duration.  If they would devote a small amount of effort to the big problem, the terminal application, they would realize a much greater benefit. I produced a table that showed the ultimate possible improvement for each method together with the expected improvement, and stored it in the QA archives so it could be validated at some later time.

Product management was swayed by the hysterics of the other side. They devoted two years' effort to preparing an alternate RPC DLL, and then released it with the product.  They claimed internally and in the marketing propaganda that tests had shown up to a 90% reduction in API execution time. Considering that they didn't touch the component where, on average, 85% of the execution time occurred, that was amazing.  It was as if the new RPC DLL used -17 times as much processing time as the one it replaced.  They didn't listen to my remark that COM is simply Microsoft's implementation of mapped memory, and that it was being tested millions of times every minute around the world, so it was likely more reliable and efficient than anything we were likely to produce to replace it.
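
The arithmetic behind those figures can be checked with a short Python sketch. It assumes, as one reading of the profile above, that only the RPC share of a call changes while the DLL and terminal shares stay fixed. The function names are mine, and the 5% RPC share used for the second calculation is the low end of the measured range, chosen because it reproduces the -17 figure.

```python
def max_saving(rpc_share, new_rpc_share=-0.01):
    """Fraction of a call's duration saved if only the RPC share changes."""
    return rpc_share - new_rpc_share

def implied_rpc_share(rpc_share, claimed_reduction):
    """RPC share after the change that a claimed total reduction would
    require, given that everything outside the RPC is untouched."""
    untouched = 1.0 - rpc_share
    return (1.0 - claimed_reduction) - untouched

# Best case for a single call: the RPC-heaviest methods spent 15% there.
print(f"best single-call saving: {max_saving(0.15):.0%}")

# What a 90% overall reduction would demand of an RPC layer that used 5%.
implied = implied_rpc_share(0.05, 0.90)
print(f"implied new RPC share: {implied:.0%} ({implied / 0.05:.0f}x the old)")
```

With a 15% RPC share the most that can be saved is 16% of the call, and a 90% overall reduction with the other 95% untouched would need the new RPC layer to take -85% of the call, which is -17 times its original 5%.
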

I didn't mention before that after the second release there were almost zero bugs reported on the COM-based API set for its product lifespan -- which continues until today.  However, the mapped memory implementation, known as "Enhanced Transport", was continually racked with issues, also continuing to this day.  During a particularly slow development period after I'd left Attachmate, I heard that Marketing decided that configuring the product so Enhanced Transport was the default RPC mechanism would be a good improvement to feature in the advertisements.  It actually caused major fallout due to damage to the reliability of the API itself.  A followup release became necessary to fix the damage to clients' operations and Attachmate's corporate reputation.

I rejoined Attachmate 2-1/2 years later and found that Paul Riggs, a tester in QA, had eventually performed the formal analysis between the two RPC mechanisms about two years after the Enhanced version's release.  His tests found exactly what my profiling predicted.  Paul told me that he could lay his report over mine and hold it up to the light and they were nearly identical.

This is an example of how the use of good software evaluation tools can lead you to develop the correct solution and help you make your case when faced with determined and ranking opposition.