Don Cannon<br />
Stories about my favorite software development projects in my curriculum vitae.<br />
<br />
<b>Winforms DataGridView: Empire Search Logic Builder</b><br />
When judging the quality of a street address returned from a database query, you quickly realize that many variables contribute to the answer. Is every address component present? Are the address components in the desired order? If not, which of the alternative orders is preferred? Is all of the spelling correct? Is the national culture reflected in the character set? Was profanity filtered successfully? How well did the search recover when misleading information was used in the query statement?<br />
Of those answers, which are the most significant when deciding the magnitude of a result's quality? That is the goal of the Empire Search Logic Builder: to assign a quality magnitude to each result so we can compare one version of an address against another, establish the precedence of the many variations, and communicate that precedence to customers.<br />
<br />
Automation is mandatory when there is a potential for dealing with more than one billion addresses. I developed the UI for this application so a user can easily specify the parameters for evaluating search results, which are then used to generate a ranked list ordering the value of addresses returned from a database query. Using those results we understand the implications of the parameter settings and can let our customers choose which quality of address they purchase. For example, a marketing company interested in purchasing every address in a certain county, intending to use the name "Current Resident" in the address line, only needs assurance that the physical address is correct, so a relatively low quality magnitude is acceptable. On the other hand, a mayoral candidate who wants to assure his "friends" in a housing development of his good intentions will place more value on an address that contains not only the correct physical address but also the correct, and current, occupants' names in correct spelling. A database query would return that type of address with a quality magnitude higher than those suitable for addressing to "Current Resident".<br />
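The idea of collapsing many per-component checks into a single comparable quality magnitude can be sketched as follows. This is a hypothetical illustration only; the component names and weights below are invented for the example, not Melissa Data's actual scoring.<br />
<br />
```python
# Hypothetical weights for address-quality factors; the real builder
# derives its ordering from user-specified parameters, not a fixed table.
WEIGHTS = {
    "physical_address_correct": 50,
    "components_in_desired_order": 20,
    "spelling_correct": 15,
    "occupant_names_correct": 15,
}

def quality_magnitude(checks):
    """Sum the weights of every check that passed."""
    return sum(WEIGHTS[name] for name, passed in checks.items() if passed)

# "Current Resident" mailing: only the physical address must be right.
bulk_mail = quality_magnitude({
    "physical_address_correct": True,
    "components_in_desired_order": True,
    "spelling_correct": True,
    "occupant_names_correct": False,
})

# Personalized mailing: every factor matters, including occupant names.
personalized = quality_magnitude({
    "physical_address_correct": True,
    "components_in_desired_order": True,
    "spelling_correct": True,
    "occupant_names_correct": True,
})

assert personalized > bulk_mail  # the personalized address ranks higher
```
<br />
With magnitudes computed this way, two versions of the same address can be ordered directly by comparing their scores.<br />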
<br />
This first screenshot from the Search Logic Builder shows a combination of user options and settings read from an existing driver/configuration file.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXthYrDZj8IXeW6ZTLMwve2GKD8NtLBds91Wr9r4EaidxpJpt87us8hWdws0N4HH99_ZN7RS7-9NZq05z4S9Nm0OpZIdi5WkOoXwH0ZZRawLFD1jeOXjodzucvRGXl1cLoWG9ny0KWEC0Z/s1600/EmpireSearchLogicBuilder2.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiXthYrDZj8IXeW6ZTLMwve2GKD8NtLBds91Wr9r4EaidxpJpt87us8hWdws0N4HH99_ZN7RS7-9NZq05z4S9Nm0OpZIdi5WkOoXwH0ZZRawLFD1jeOXjodzucvRGXl1cLoWG9ny0KWEC0Z/s400/EmpireSearchLogicBuilder2.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Empire Search Logic Builder in Verbose Mode</td></tr>
</tbody></table>
<br />
This second screenshot shows a different tab of the same window, with the hard-to-read strings replaced by a column of equivalent short tokens and equivalent graphical symbols.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2bTV25O_ApI5bBBrAwpOUEO2HZV29K4qZoivOXa6j3KDl9mMe8wRKm_sJ2NzN8kiN-_g3kkRszS4L134VE4LhGK9zxRXvZKQnVpgew4RvbgytQjQNsRl7bSRzM8EEXyO8AANh4kmireck/s1600/EmpireSearchLogicBuilder.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="195" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2bTV25O_ApI5bBBrAwpOUEO2HZV29K4qZoivOXa6j3KDl9mMe8wRKm_sJ2NzN8kiN-_g3kkRszS4L134VE4LhGK9zxRXvZKQnVpgew4RvbgytQjQNsRl7bSRzM8EEXyO8AANh4kmireck/s400/EmpireSearchLogicBuilder.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Empire Search Logic Builder in Symbol Mode</td></tr>
</tbody></table>
<br />
This screenshot shows the generated results: the precedence by which addresses will be valued, based on the settings specified above. Note that there are over 52K magnitude values in this results table, stacked in decreasing order of desirability. Also, many items in the DataGridView control are highlighted because they contain a blank where the building name is not matched, as selected in the right-side listbox and radio button.<br />
<table align="center" cellpadding="0" cellspacing="0" class="tr-caption-container" style="margin-left: auto; margin-right: auto; text-align: center;"><tbody>
<tr><td style="text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXqkbFmENsRezZufXMu73FDS0erLAQU5u_tgIEuzIqqBC-hm7npaLdXGTtOkj6TOP5o5Tpq4J0ZJfRVIZd_FZuYyDf8jnCVbPooDqamC4ioXRr2351fVAjBvYkWkOLWf8JWcDDziq_hFao/s1600/EmpireSearchLogicBuilder3.png" imageanchor="1" style="margin-left: auto; margin-right: auto;"><img border="0" height="221" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXqkbFmENsRezZufXMu73FDS0erLAQU5u_tgIEuzIqqBC-hm7npaLdXGTtOkj6TOP5o5Tpq4J0ZJfRVIZd_FZuYyDf8jnCVbPooDqamC4ioXRr2351fVAjBvYkWkOLWf8JWcDDziq_hFao/s400/EmpireSearchLogicBuilder3.png" width="400" /></a></td></tr>
<tr><td class="tr-caption" style="text-align: center;">Second page showing results of precedence generation</td></tr>
</tbody></table>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
What I think is the technically interesting part of this application is the story of the icon columns in the two pages. The icon images are stored in a resx file, which allows for the simplest ClickOnce distribution and installation. However, the sets of address component parameters are stored in a configuration file in the verbose form seen in the SLB's verbose-mode DataGridView, and different configuration files use different sets of verbose strings. At load, the application must scan the component lines to build the list of component strings used in the evaluation driven by the configuration file, pair each string to a short token (like TN2 or AP4), pair each token to a graphic image in the graphics resource file, and then arrange them in the order stated in the file.<br />
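That load-time pairing can be sketched roughly like this. It is a simplified illustration; the verbose strings, token names, and icon resource keys are hypothetical, and the real application reads them from its configuration and resx files.<br />
<br />
```python
# Hypothetical verbose component strings as a configuration file might
# list them, in the order the file states.
config_components = [
    "Thoroughfare Name, second form",
    "Apartment Number, fourth form",
]

# Pair each verbose string with a short token, then pair the token with
# the key of an icon image in the graphics resource file.
verbose_to_token = {
    "Thoroughfare Name, second form": "TN2",
    "Apartment Number, fourth form": "AP4",
}
token_to_icon_key = {"TN2": "icon_TN2", "AP4": "icon_AP4"}

# Build the ordered (verbose, token, icon) descriptions that the
# DataGridView columns would be generated from.
columns = [
    (verbose,
     verbose_to_token[verbose],
     token_to_icon_key[verbose_to_token[verbose]])
    for verbose in config_components
]
```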
Note that the left column in the first page's DataGridView shows hyphen-separated short-form identifiers for each of the verbose strings, prepared in a configuration dialog in the order the user specified. That is appropriate because the order matters in the evaluation. This format is considered easier to read than placing the short-form terms in columns: the address coders at Melissa Data read those strings as addresses, and spreading them into columns, ordered or not, makes them harder to comprehend. However, the icons on the right side of that DataGridView indicate the presence of a token, not the order of the tokens. So giving each token type its own column provides an easier-to-comprehend view of a token's presence, as well as of the trends intended as the lines are descended.<br />
<br />
<b>Application Management, Security, &amp; Distribution</b><br />
While at Attachmate, during a product version development period in which the features I worked on had no new feature requests or reported bugs, I was assigned a three-month special project to explore the various management and distribution mechanisms available in the market and report on any that were suited to improve our product.<br />
<br />
Among the products performing those services was Microsoft's Management Console (MMC), which offered an interface by which we could install our flagship product, <i>Extra! Personal Client</i>, and specify which of its features were visible or accessible, by user or group, from the network administrator's console. Applying this research, I prepared a group policy administrative template file (ADM) that provided the list of product features the MMC console could offer to configure, along with the awareness code in the product that presented the group policy settings in the product's UI.<br />
<br />
With this feature our product could be installed, configured, and made available feature-by-feature to users or groups as allowed by the network administrator. That reduced the cost of managing and distributing our product, and it provided security controlled in a place accessible only to those a customer would choose, which gave pause to those rushing to web-based solutions.<br />
<br />
The Extra! Personal Client ADM File:<br />
<br />
<br />
; EE2000 policy settings<br />
#if version <= 2<br />
<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
CLASS USER ;;;;;;;;;;;;;;;;;;<br />
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;<br />
;************************************************<br />
;************************************************<br />
;************************************************<br />
;************************************************<br />
; BEGIN EXTRA! Enterprise 2000 Policy Template for use with Windows 95/98 or Windows NT 4.0<br />
;<br />
CATEGORY !!EE2000<br />
<br />
KEYNAME Software\Policies\Attachmate\EE2000<br />
<br />
<br />
POLICY !!EE2K_File<br />
<br />
PART !!EE2K_NewSession_Disable CHECKBOX<br />
VALUENAME DisableFileNewSession<br />
END PART<br />
<br />
PART !!EE2K_OpenSession_Disable CHECKBOX<br />
VALUENAME DisableFileOpenSession<br />
END PART<br />
<br />
PART !!EE2K_SaveSession_Disable CHECKBOX<br />
VALUENAME DisableFileSaveSession<br />
END PART<br />
<br />
PART !!EE2K_SaveSessionAs_Disable CHECKBOX<br />
VALUENAME DisableFileSaveSessionAs<br />
END PART<br />
<br />
PART !!EE2K_HostCapture_Disable CHECKBOX<br />
VALUENAME DisableFileHostCapture<br />
END PART<br />
<br />
PART !!EE2K_OpenLayout_Disable CHECKBOX<br />
VALUENAME DisableFileOpenLayout<br />
END PART<br />
<br />
PART !!EE2K_SaveLayout_Disable CHECKBOX<br />
VALUENAME DisableFileSaveLayout<br />
END PART<br />
<br />
PART !!EE2K_Properties_Disable CHECKBOX<br />
VALUENAME DisableFileProperties<br />
END PART<br />
<br />
PART !!EE2K_PageSetup_Disable CHECKBOX<br />
VALUENAME DisableFilePageSetup<br />
END PART<br />
<br />
PART !!EE2K_PrintSetup_Disable CHECKBOX<br />
VALUENAME DisableFilePrintSetup<br />
END PART<br />
<br />
PART !!EE2K_PrintScreen_Disable CHECKBOX<br />
VALUENAME DisableFilePrintScreen<br />
END PART<br />
<br />
PART !!EE2K_PrintMultipleScreens_Disable CHECKBOX<br />
VALUENAME DisableFilePrintMultipleScreens<br />
END PART<br />
<br />
PART !!EE2K_Capture_Disable CHECKBOX<br />
VALUENAME DisableFileCapture<br />
END PART<br />
<br />
PART !!EE2K_StopCapture_Disable CHECKBOX<br />
VALUENAME DisableFileStopCapture<br />
END PART<br />
<br />
PART !!EE2K_FinishPrinting_Disable CHECKBOX<br />
VALUENAME DisableFileFinishPrinting<br />
END PART<br />
<br />
PART !!EE2K_ExitSession_Disable CHECKBOX<br />
VALUENAME DisableFileExitSession<br />
END PART<br />
<br />
PART !!EE2K_ExitExtra_Disable CHECKBOX<br />
VALUENAME DisableFileExitExtra<br />
END PART<br />
<br />
END POLICY ;file<br />
;************************************************<br />
<br />
POLICY !!EE2K_Edit<br />
<br />
PART !!EE2K_Cut_Disable CHECKBOX<br />
VALUENAME DisableEditCut<br />
END PART<br />
<br />
PART !!EE2K_Copy_Disable CHECKBOX<br />
VALUENAME DisableEditCopy<br />
END PART<br />
<br />
PART !!EE2K_CopyAsTable CHECKBOX<br />
VALUENAME DisableEditCopyAsTable<br />
END PART<br />
<br />
PART !!EE2K_CutAndAppend CHECKBOX<br />
VALUENAME DisableEditCutAndAppend<br />
END PART<br />
<br />
PART !!EE2K_CopyAndAppend CHECKBOX<br />
VALUENAME DisableEditCopyAndAppend<br />
END PART<br />
<br />
PART !!EE2K_Paste_Disable CHECKBOX<br />
VALUENAME DisableEditPaste<br />
END PART<br />
<br />
PART !!EE2K_Paste_Continue CHECKBOX<br />
VALUENAME DisableEditPasteContinue<br />
END PART<br />
<br />
PART !!EE2K_Clear_Disable CHECKBOX<br />
VALUENAME DisableEditClearDisable<br />
END PART<br />
<br />
PART !!EE2K_Clear_Display CHECKBOX<br />
VALUENAME DisableEditClearDisplay<br />
END PART<br />
<br />
PART !!EE2K_Clear_History CHECKBOX<br />
VALUENAME DisableEditClearHistory<br />
END PART<br />
<br />
PART !!EE2K_SelectAll_Disable CHECKBOX<br />
VALUENAME DisableEditSelectAll<br />
END PART<br />
<br />
PART !!EE2K_SelectDisplay CHECKBOX<br />
VALUENAME DisableEditSelectDisplay<br />
END PART<br />
<br />
PART !!EE2K_Settings_Disable CHECKBOX<br />
VALUENAME DisableEditSettings<br />
END PART<br />
<br />
<br />
END POLICY ;edit<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_View<br />
<br />
PART !!EE2K_Toolbars_Disable CHECKBOX<br />
VALUENAME DisableViewToolbars<br />
END PART<br />
<br />
PART !!EE2K_StatusBar_Disable CHECKBOX<br />
VALUENAME DisableViewStatusBar<br />
END PART<br />
<br />
PART !!EE2K_QuickPads_Disable CHECKBOX<br />
VALUENAME DisableViewQuickPads<br />
END PART<br />
<br />
PART !!EE2K_HotSpots_Disable CHECKBOX<br />
VALUENAME DisableViewHotSpots<br />
END PART<br />
<br />
PART !!EE2K_KeyboardMap_Disable CHECKBOX<br />
VALUENAME DisableViewKeyboardMap<br />
END PART<br />
<br />
PART !!EE2K_RuleLines CHECKBOX<br />
VALUENAME DisableViewRuleLines<br />
END PART<br />
<br />
PART !!EE2K_PrintStatus_Disable CHECKBOX<br />
VALUENAME DisableEditPrintStatus<br />
END PART<br />
<br />
PART !!EE2K_SessionStatus_Disable CHECKBOX<br />
VALUENAME DisableViewSessionStatus<br />
END PART<br />
<br />
END POLICY ;view<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_Tools<br />
<br />
PART !!EE2K_FileTransfer_Disable CHECKBOX<br />
VALUENAME DisableToolsFileTransfer<br />
END PART<br />
<br />
PART !!EE2K_TransferMultiple_Disable CHECKBOX<br />
VALUENAME DisableToolsMultipleFileTransfer<br />
END PART<br />
<br />
PART !!EE2K_5250Transfer_Disable CHECKBOX<br />
VALUENAME DisableTools5250FileTransfer<br />
END PART<br />
<br />
PART !!EE2K_SendFile_Disable CHECKBOX<br />
VALUENAME DisableToolsSendFile<br />
END PART<br />
<br />
PART !!EE2K_ReceiveFile_Disable CHECKBOX<br />
VALUENAME DisableToolsReceiveFile<br />
END PART<br />
<br />
PART !!EE2K_Macro_Disable CHECKBOX<br />
VALUENAME DisableToolsMacro<br />
END PART<br />
<br />
PART !!EE2K_RecentMacro_Disable CHECKBOX<br />
VALUENAME DisableToolsRecentMacro<br />
END PART<br />
<br />
PART !!EE2K_CaptureIncomingData_Disable CHECKBOX<br />
VALUENAME DisableToolsCaptureIncomingData<br />
END PART<br />
<br />
PART !!EE2K_EndCapture_Disable CHECKBOX<br />
VALUENAME DisableToolsEndCapture<br />
END PART<br />
<br />
PART !!EE2K_RecordPages_Disable CHECKBOX<br />
VALUENAME DisableToolsRecordPages<br />
END PART<br />
<br />
PART !!EE2K_PageSettings_Disable CHECKBOX<br />
VALUENAME DisableToolsPageSettings<br />
END PART<br />
<br />
PART !!EE2K_Status_Disable CHECKBOX<br />
VALUENAME DisableToolsStatus<br />
END PART<br />
<br />
PART !!EE2K_AlignForm_Disable CHECKBOX<br />
VALUENAME DisableToolsAlignForm<br />
END PART<br />
<br />
PART !!EE2K_TestPage_Disable CHECKBOX<br />
VALUENAME DisableToolsPrintTestPage<br />
END PART<br />
<br />
END POLICY ;tools<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_Session<br />
<br />
PART !!EE2K_Connect_Disable CHECKBOX<br />
VALUENAME DisableSessionConnect<br />
END PART<br />
<br />
PART !!EE2K_Disconnect_Disable CHECKBOX<br />
VALUENAME DisableSessionDisconnect<br />
END PART<br />
<br />
PART !!EE2K_Reset_Disable CHECKBOX<br />
VALUENAME DisableSessionReset<br />
END PART<br />
<br />
PART !!EE2K_ResetDisplay_Disable CHECKBOX<br />
VALUENAME DisableSessionResetDisplay<br />
END PART<br />
<br />
PART !!EE2K_ResetConnection_Disable CHECKBOX<br />
VALUENAME DisableSessionResetConnection<br />
END PART<br />
<br />
END POLICY ;session<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_Control<br />
<br />
PART !!EE2K_HoldPrint_Disable CHECKBOX<br />
VALUENAME DisableControlHoldPrint<br />
END PART<br />
<br />
PART !!EE2K_PA1_Disable CHECKBOX<br />
VALUENAME DisableControlPA1<br />
END PART<br />
<br />
PART !!EE2K_PA2_Disable CHECKBOX<br />
VALUENAME DisableControlPA2<br />
END PART<br />
<br />
PART !!EE2K_CancelPrint_Disable CHECKBOX<br />
VALUENAME DisableControlCancelPrint<br />
END PART<br />
<br />
PART !!EE2K_FormFeed_Disable CHECKBOX<br />
VALUENAME DisableControlFormFeed<br />
END PART<br />
<br />
END POLICY ;control<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_Options<br />
<br />
PART !!EE2K_OptionsSettings_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettings<br />
END PART<br />
<br />
PART !!EE2K_SessionType_Disable CHECKBOX<br />
VALUENAME DisableOptionsSessionType<br />
END PART<br />
<br />
PART !!EE2K_GlobalPreferences_Disable CHECKBOX<br />
VALUENAME DisableOptionsGlobalPreferences<br />
END PART<br />
<br />
PART !!EE2K_Security_Disable CHECKBOX<br />
VALUENAME DisableOptionsSecurity<br />
END PART<br />
<br />
PART !!EE2K_Color_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsColors<br />
END PART<br />
<br />
PART !!EE2K_Connection_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsConnection<br />
END PART<br />
<br />
PART !!EE2K_Display_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsDisplay<br />
END PART<br />
<br />
PART !!EE2K_Font_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsFonts<br />
END PART<br />
<br />
PART !!EE2K_Navigation_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsNavigation<br />
END PART<br />
<br />
PART !!EE2K_Printer_Disable CHECKBOX<br />
VALUENAME DisableOptionsSettingsPrinter<br />
END PART<br />
<br />
<br />
END POLICY ;options<br />
;************************************************<br />
<br />
<br />
POLICY !!EE2K_Help<br />
<br />
PART !!EE2K_HelpTopics_Disable CHECKBOX<br />
VALUENAME DisableHelpHelpTopics<br />
END PART<br />
<br />
PART !!EE2K_UsingHelp_Disable CHECKBOX<br />
VALUENAME DisableHelpUsingHelp<br />
END PART<br />
<br />
PART !!EE2K_SupportWeb_Disable CHECKBOX<br />
VALUENAME DisableHelpSupportWeb<br />
END PART<br />
<br />
PART !!EE2K_OfficeCompatible_Disable CHECKBOX<br />
VALUENAME DisableHelpOfficeCompatible<br />
END PART<br />
<br />
PART !!EE2K_About_Disable CHECKBOX<br />
VALUENAME DisableHelpAbout<br />
END PART<br />
<br />
END POLICY ;help<br />
;************************************************<br />
<br />
END CATEGORY<br />
<br />
<b>Empire Test Case Builder</b><br />
When constructing a database that contains all mailing addresses in the world, a lot of testing is required to ensure your work is correct in each country. The GCAT Test Case Builder searches converted data to generate examples of all permutations of practical address types that can be used in queries like those our customers will submit. The idea is that if all possible address formats are returned from the database, then its structure and design are correct.<br />
<br />
The first two screenshots show the part of the application that submits the candidate addresses in database queries to see whether the results are as desired. The first generates addresses from database data based on user-defined patterns:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXh6ics977CcAiAvNbgnWGRsrXTpHg62j3jl79MhWMO_4dt3oF3cxh2p2ckF4BHIKyp3hpHaSevy-sHcyBCB7wF835e0jJgRsqum1uJoqwRVx0O_gDjAW9lpftT9V3EWW1ITEZjnykNRlq/s1600/GCAT+Test+Case+Builder.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="334" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgXh6ics977CcAiAvNbgnWGRsrXTpHg62j3jl79MhWMO_4dt3oF3cxh2p2ckF4BHIKyp3hpHaSevy-sHcyBCB7wF835e0jJgRsqum1uJoqwRVx0O_gDjAW9lpftT9V3EWW1ITEZjnykNRlq/s640/GCAT+Test+Case+Builder.png" width="640" /></a></div>
<br />
<br />
The second produces addresses featuring the full range of field values found in the database, and provides control over whether the values are transliterated, a synonym of a found database value, or simply examples of all database string values for the selected field.<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha6737BpvrKzzSP_X5jfrdijWd_x1rlN9M5Cm0jUHujqc32hYdxJaBuPNpOjrJVpVMSjNAcIRg4ctNebuhGWYxy7vyXU5I131LT-3_KR4pjGSJFgNutcOlVA0n1EjujmQJ7BV03fzCT5xn/s1600/GCAT+Test+Case+Builder2.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="332" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEha6737BpvrKzzSP_X5jfrdijWd_x1rlN9M5Cm0jUHujqc32hYdxJaBuPNpOjrJVpVMSjNAcIRg4ctNebuhGWYxy7vyXU5I131LT-3_KR4pjGSJFgNutcOlVA0n1EjujmQJ7BV03fzCT5xn/s640/GCAT+Test+Case+Builder2.png" width="640" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<br />
The screenshot below shows a dialog from the tool that detects the presence and number of every address component in the incoming data, as well as exceptions or variations from the basic form. It color-codes the results in the dialog and can then generate the combinatoric set of all permutations of the colored elements, which it drives through the Builder shown above.<br />
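Generating a combinatoric set of test cases from per-component variants can be sketched like this. It is a hypothetical illustration; the component names and variant values are invented for the example.<br />
<br />
```python
from itertools import product

# Hypothetical variants detected for each address component.
variants = {
    "premises_number": ["12", "12A"],
    "thoroughfare":    ["Main St", "Main Street"],
    "locality":        ["Springfield"],
}

# Every combination of one variant per component becomes a candidate
# address to drive through the Builder: 2 x 2 x 1 = 4 test cases.
test_cases = [
    dict(zip(variants, combo)) for combo in product(*variants.values())
]
```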
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXW1-AKOWeXqJ3ImTRwdbvwSQLdCZoNIzzPhB0gygWvH8pGBrK6ucDQwbkzfRDyfASRwPUhWJxzhLXU84JcmvJHmylusZfFXpz1aGX8-845D-G3g10jGcXD6FDchRYz5fuwynhPrFqRP3A/s1600/Empire+Category+Profiler.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="618" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhXW1-AKOWeXqJ3ImTRwdbvwSQLdCZoNIzzPhB0gygWvH8pGBrK6ucDQwbkzfRDyfASRwPUhWJxzhLXU84JcmvJHmylusZfFXpz1aGX8-845D-G3g10jGcXD6FDchRYz5fuwynhPrFqRP3A/s640/Empire+Category+Profiler.png" width="640" /></a></div>
<br />
<br />
<b>Token Type Manager</b><br />
Every street address that, when written on an envelope, will steer that envelope to the correct destination is composed of several informational address components. The country, city, thoroughfare name, and premises number are four commonly used components. However, there are actually 31 components in use somewhere on planet Earth to get an envelope delivered.<br />
<br />
That four-member subset is all that is necessary in some countries, so if we research the list of all terms used for the country, city, thoroughfare name, and premises number, we can prepare a search strategy that recognizes the proper order and correctness of terms for that country. The purpose of the Empire Token Type Manager is to develop the set of street address components for each country on Earth.<br />
<br />
Numerous technicians -- country specialists -- have been assigned the task of exhaustively listing all terms for each of the components their country uses. In the USA, for instance, acceptable terms for a thoroughfare name include street, avenue, boulevard, way, place, and road; these and many others can be entered into the EmpireTokenTypeManager to advance the search capabilities of Melissa Data's worldwide address search and cleaning utilities.<br />
<br />
After a technician finishes a TTM session they can save their work, which is then sent to a shared network location and merged with the global archive.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzIEwQNTWP4bMI2NtEyEevD0cgwBThPnDIyntonR1okRCG-0WdUtw6xiXABe_72Z7_ibin5WwD35P4IB7VRP-0xVxWX_ThLr7luoE0sXjBG8gZjn6QviwpFdhuguUiR2p8ZPUbyRYctfs-/s1600/tokenTypeManager.png" imageanchor="1" style="clear: left; float: left; margin-bottom: 1em; margin-right: 1em;"><img border="0" height="354" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjzIEwQNTWP4bMI2NtEyEevD0cgwBThPnDIyntonR1okRCG-0WdUtw6xiXABe_72Z7_ibin5WwD35P4IB7VRP-0xVxWX_ThLr7luoE0sXjBG8gZjn6QviwpFdhuguUiR2p8ZPUbyRYctfs-/s640/tokenTypeManager.png" width="640" /></a></div>
<br />
The information sent to the network location is actually a delta file: a relatively small file that contains only the changes the technician added or removed. After a delta file has been transferred to the network OnRamp directory, a second EmpireTokenTypeManager component, the ConfigFileIntegrator, observes the presence of the delta file and merges it, along with any others found at the same time, into the master token file. Once the merge is complete, the newly updated global token file is used to generate five new files for various uses -- some in text format and some in binary -- and is then committed to the company's internet-based Subversion data archive for use by several company products.<br />
<br />
<b>Trace Viewer</b><br />
The Empire Trace Viewer presents a simple UI view that is a facade covering a massive amount of data. As the commercial product runs with the trace feature enabled, tracing data is saved to a file that facilitates debugging and performance statistics. A typical session will generate hundreds of millions of lines of output.<br />
The user story of interest concerns how to process the trace file so that the fraction of its contents the user wishes to view can be presented with no perceptible delay between dragging the file into the view and seeing the filtered results. A similar user story asks how the trace can be re-presented with no perceptible delay after one of the filter options is changed.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQY5ZBEOMWEBP8PbQPcpJFuYkahUjd6Gzvskd0RLH9VZy0rjiI0XHKmWtUiZhezofFgbOh3Qniw7Odmgvtn_jXXCUaXUQGWWC9-BqbImqQ24FNaxfIuJC7LDgBlzrhTjd_Lh5KcDPYd69P/s1600/empireTraceViewer.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="366" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiQY5ZBEOMWEBP8PbQPcpJFuYkahUjd6Gzvskd0RLH9VZy0rjiI0XHKmWtUiZhezofFgbOh3Qniw7Odmgvtn_jXXCUaXUQGWWC9-BqbImqQ24FNaxfIuJC7LDgBlzrhTjd_Lh5KcDPYd69P/s640/empireTraceViewer.png" width="640" /></a></div>
<br />
<br />
I handled this problem by loading function-keyed hash tables nested within call-ID-keyed hash tables, nested within session-keyed hash tables, where each leaf holds a list of file offsets. With that, all lines in the trace file corresponding to the session the user recorded, the call made into the system, and the functional IDs of interest can be accessed and presented almost immediately.<br />
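The nested-index idea can be sketched as follows. This is a simplified illustration with an invented line format; the real viewer stores only file offsets and seeks into the trace file on demand rather than holding lines in memory.<br />
<br />
```python
from collections import defaultdict

def build_index(lines):
    """Index trace lines as session -> call ID -> function -> file offsets."""
    index = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    offset = 0
    for line in lines:
        # Hypothetical line layout: session|callID|function|payload
        session, call_id, func, _ = line.split("|", 3)
        index[session][call_id][func].append(offset)
        offset += len(line) + 1  # +1 for the newline in the file
    return index

trace = [
    "s1|c1|Connect|...",
    "s1|c1|Send|...",
    "s1|c2|Connect|...",
]
index = build_index(trace)

# Offsets of every s1/c1/Connect line are now available without a scan,
# so the filtered view can seek straight to them.
offsets = index["s1"]["c1"]["Connect"]
```
<br />
Because changing a filter only selects a different leaf list of offsets, re-filtering never rescans the file, which is what makes the "no perceptible delay" requirement achievable.<br />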
<br />
<br />
<b>The Area Browser</b><br />
A significant part of my current job involves converting data obtained from many different sources into a standard format that uses the appropriate terminology for the general customer. One example is seen in the varied ways we receive our international data. Within the data products of our different international sources we see different standards for letter casing, combinations of national and language culture, different choices for area name modifiers, and entity duplication. To deal with that we have prepared an application, known as the Area Browser, with a somewhat cohesive set of features intended to help a geographic area specialist merge the various versions of each unique data point into the value preferred by our paying customers.<br />
<br />
As seen just below, the Area Browser presents two tree views of the same data that let the subject expert line up data for merging or moving so it is logically organized. Features include several ways to move or merge tree nodes, including movement of multiple nodes, movement or merging of like-named nodes, searching and matching that is aware of culture, casing, and transliteration, assignment of the hierarchy of area level names as appropriate for the country, and manual creation, deletion, and editing of nodes.<br />
<br />
<b>The Area Browser</b><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV2_ZtUsP8GJL4ujsJQAOyUVmBc0Z3GEIa8EPU8Xg2XnPmR7NnATs1i9V1Bri-_fWuUqdoNmQTsQFfA6rdRrpCyd0qR13gY4KJutGH3TpYeOMSNdC3ylNAsPSfjVZV9CmkolUK5_ercD3j/s1600/areaBrowser.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="339" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjV2_ZtUsP8GJL4ujsJQAOyUVmBc0Z3GEIa8EPU8Xg2XnPmR7NnATs1i9V1Bri-_fWuUqdoNmQTsQFfA6rdRrpCyd0qR13gY4KJutGH3TpYeOMSNdC3ylNAsPSfjVZV9CmkolUK5_ercD3j/s1600/areaBrowser.png" width="640" /></a></div>
<br />
Menu items show available actions:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhznbZN-gNG_Fuf6iQSIwUUHjK8GdXADKreV_M54PGOlnxWg209kH6NpSoACJGbza8XSmdhljxEhsHvyzA43dmulBuuPRXHzZIsy9JaTy8JyviYMSbKXEp-hKTC5WZZi4oRuK2E54_e9dai/s1600/areaBrowserEdit.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="166" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhznbZN-gNG_Fuf6iQSIwUUHjK8GdXADKreV_M54PGOlnxWg209kH6NpSoACJGbza8XSmdhljxEhsHvyzA43dmulBuuPRXHzZIsy9JaTy8JyviYMSbKXEp-hKTC5WZZi4oRuK2E54_e9dai/s320/areaBrowserEdit.png" width="320" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiECTW4gFTw9ZDNfDEN-XzbhytkWYDGibFWeYUDMidQkVcfpSpYwF60U8MlsTfHqGJXLEdUmLTDGcJSNw4aWQjfhY-BmU5FeP2nrTqWHqRzhopiywSgw1fWZmMj5ZjY6ddr7YlwD1c3Sxko/s1600/areaBrowserSearch.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="239" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiECTW4gFTw9ZDNfDEN-XzbhytkWYDGibFWeYUDMidQkVcfpSpYwF60U8MlsTfHqGJXLEdUmLTDGcJSNw4aWQjfhY-BmU5FeP2nrTqWHqRzhopiywSgw1fWZmMj5ZjY6ddr7YlwD1c3Sxko/s320/areaBrowserSearch.png" width="320" /></a></div>
<br />
<br />
The Preference Selector shown below is accessed via a mnemonic or menu item. Its purpose is to resolve terminology differences that produce duplication in the data set below a selected tree level. It lists areas for which no preference has yet been set, and areas where duplicate preferences have been specified. It lets the user specify terms that should be preferred or avoided, the letter casing in which a preference should be selected, and whether terms under consideration should match as substrings or whole words.<br />
<br />
<b>The Preference Selector</b><br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAWVJAjjL081Hr5zFWqZLA3jx0ubECvHxsH-pO8M0Uih5cTsfkq4z4wUNwZiLG8W7zXL5bmvppiO2hSgEglIfk27-eZ3ggwOLvF0S85eMvwd7m0gaJX3L4fqMsrFPsx-y0RLmmAG02nToq/s1600/preferenceSelector.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="362" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjAWVJAjjL081Hr5zFWqZLA3jx0ubECvHxsH-pO8M0Uih5cTsfkq4z4wUNwZiLG8W7zXL5bmvppiO2hSgEglIfk27-eZ3ggwOLvF0S85eMvwd7m0gaJX3L4fqMsrFPsx-y0RLmmAG02nToq/s640/preferenceSelector.png" width="640" /></a></div>
<br />DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-66518349714602066012014-09-22T08:53:00.001-07:002014-09-22T10:04:58.700-07:00Performance profilingWhen my team had finished the basic development work on a new set of seven APIs for Attachmate's flagship product, EXTRA! Personal Client, we received the unwelcome news that many methods in our libraries had slower execution times than our chief competitor's. This set off two parallel efforts to diagnose and solve the problem so we would have not only the most reliable and portable APIs, but also the best-performing APIs in the industry. I pursued one approach; product and project management worked together to resolve it their way.<br />
<br />
The APIs were written as DLL libraries to be loaded by our customers' homegrown client applications, which could then automate function sequences on a remote EXTRA! session that relayed the automation conversation to some type of mainframe computer. Three primary architectural components determined how our product performed compared to the competition. First, there was the speed of the calls inside the DLL. Most of the library methods were little more than pass-through methods that relayed a set of parameters to the EXTRA! session, where the processing would occur. A cursory McCabe evaluation suggested that almost no execution time was spent in the library. Another component was the EXTRA! terminal application, which performed a fair amount of mainframe session processing within itself, but also in network communications with the actual mainframe computer. The last of the main components was the remote procedure call (RPC) between the library and the EXTRA! session. This was another DLL, where the parameters and method ID would be marshalled and sent across the wire from the client DLL to the EXTRA! terminal application. Because the way this was implemented -- a COM bridge between the two modules -- was new to the Attachmate culture, the village cried out, "WITCH!", and all of product and marketing management devoted their efforts to replacing it with something better understood. A staff programmer stated that he could solve all of our problems by replacing the offending COM bridge with a mapped-memory implementation.<br />
<br />
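A pass-through method of the kind described above does little more than marshal its arguments and forward them to the remote side, which is why so little time shows up in the library itself. A hypothetical sketch of the shape (all names are illustrative, not Attachmate's actual API):

```java
// Hypothetical sketch of a pass-through API method: the library only
// marshals its arguments and forwards them over the RPC bridge, so
// almost no execution time is spent here. Names are illustrative.
public class SessionApi {
    /** Minimal transport abstraction standing in for the COM bridge. */
    public interface Rpc {
        String invoke(String methodId, String marshalledArgs);
    }

    private final Rpc rpc;  // connection to the EXTRA! terminal session

    public SessionApi(Rpc rpc) {
        this.rpc = rpc;
    }

    // Typical library method: bundle the parameters with a method ID
    // and hand everything to the remote side, where the real work runs.
    public String getScreenText(int row, int col, int length) {
        String args = row + "," + col + "," + length;
        return rpc.invoke("GetScreenText", args);
    }
}
```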
My route involved using a profiler to learn exactly how much execution time occurred in each API component -- method by method. In a nutshell, the average share of the run time across the three API components came in around this:<br />
<br />
5% in the DLL while running in the client application<br />
5-15% in the RPC -- depending on which method was used (and how much data was transferred)<br />
80-90% in the terminal<br />
<br />
I reported that even if we drove our API execution time to an impossible -1% of each call's duration, we could never improve performance by more than 16% on any single call. And on average, again assuming execution time improved to -1% of a call's duration, they could not expect better than around 7% improvement. If they would devote even a small amount of effort to the big problem, the terminal application, they would realize a much greater benefit. I produced a table showing the ultimate possible improvement for each method together with the expected improvement, and stored it in the QA archives so it could be validated later.<br />
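The ceiling in that report is essentially Amdahl's law: the overall gain is bounded by the fraction of total time spent in the component you optimize. A quick sketch using the profiled shares above (the percentages come from the report; the method names are mine):

```java
// Amdahl's law: overall improvement is bounded by the share of total
// run time spent in the component being optimized.
public class AmdahlBound {
    // fractionInComponent: share of total run time spent in the part
    // being optimized (e.g. 0.15 for an RPC call at 15%).
    // componentSpeedFactor: how much faster that part becomes
    // (POSITIVE_INFINITY means "reduced to zero time").
    static double overallSpeedup(double fractionInComponent, double componentSpeedFactor) {
        return 1.0 / ((1.0 - fractionInComponent) + fractionInComponent / componentSpeedFactor);
    }

    // Maximum possible reduction in total run time if the component's
    // cost is eliminated entirely.
    static double maxReductionPercent(double fractionInComponent) {
        return fractionInComponent * 100.0;
    }

    public static void main(String[] args) {
        // Worst-case RPC call at 15% of total time: even eliminated
        // entirely, the run gets no more than ~15-16% faster.
        System.out.printf("Max single-call gain via RPC: %.0f%%%n", maxReductionPercent(0.15));
        // The terminal at 85% of total time is where the leverage is.
        System.out.printf("Gain ceiling via terminal: %.0f%%%n", maxReductionPercent(0.85));
    }
}
```

Merely halving the terminal's cost (`overallSpeedup(0.85, 2.0)`, roughly a 1.7x speedup) beats the absolute best case of eliminating the RPC outright.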
<br />
Product management was swayed by the hysterics of the other side. They devoted two years' effort to preparing an alternate RPC DLL, and then released it with the product. They claimed, internally and in the marketing propaganda, that tests had shown up to a 90% reduction in API execution time. Considering that they never touched the terminal, where on average 85% of the execution time occurred, that was an amazing claim -- as if the new RPC DLL consumed -17 times as much processing time as the old one. They didn't listen to my remark that COM is simply Microsoft's implementation of mapped memory, exercised millions of times every minute around the world, and therefore probably more reliable and efficient than anything we were going to produce to replace it.<br />
<br />
I didn't mention before that after the second release there were almost zero bugs reported against the COM-based API set for its product lifespan -- which continues to this day. The mapped-memory implementation, known as "Enhanced Transport," however, was continually racked with issues, also continuing to this day. During a particularly slow development period after I'd left Attachmate, I heard that Marketing had decided that making Enhanced Transport the default RPC mechanism would be a good improvement to feature in the advertisements. It actually caused major fallout by damaging the reliability of the API itself, and a follow-up release became necessary to repair the damage to clients' operations and to Attachmate's corporate reputation.<br />
<br />
I rejoined Attachmate two and a half years later and found that Paul Riggs, a tester in QA, had eventually performed the formal comparison of the two RPC mechanisms, about two years after the Enhanced version's release. His tests found exactly what my profiling had predicted. Paul told me that he could lay his report over mine, hold them up to the light, and they were nearly identical.<br />
<br />
This is an example of how the use of good software evaluation tools can lead you to the correct solution and help you make your case when faced with determined and higher-ranking opposition.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-39345840187126054292014-09-21T21:34:00.002-07:002014-09-29T16:38:07.591-07:00Windows Phone App: ResidentsI was asked to prepare a simple prototype Windows mobile app that a political candidate could use while canvassing neighborhoods on foot. By querying our online database, the application would prepare him with the names of the people living in the home where he was about to knock on the door. If our demographic record contained it, we would also provide information on the primary resident's political contribution history.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDBmx-6aQV-9qX1-6LwiG8272WCpHPgVv1NoAcR6bTXohVumbO9UIjZXnuYNqcDJOZYn5_kjx1ohK73nZGxHMU_nnfQSbOaSWVWu6HkLBH2UqybzG8Nlid5Yeh3oE3hLeKD59VlHeATCDC/s1600/locator.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiDBmx-6aQV-9qX1-6LwiG8272WCpHPgVv1NoAcR6bTXohVumbO9UIjZXnuYNqcDJOZYn5_kjx1ohK73nZGxHMU_nnfQSbOaSWVWu6HkLBH2UqybzG8Nlid5Yeh3oE3hLeKD59VlHeATCDC/s1600/locator.png" height="400" width="218" /></a></div>
<br />
The application ran in two modes. In the first, the mobile device's onboard GPS provided the holder's coordinates, which were checked against our database for a geocode-to-residence match. The second mode allowed the user to type the address for cases where our geocoding lacked sufficient precision to return resident information.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-60342963020211058422014-06-11T13:27:00.001-07:002014-10-02T14:32:59.879-07:00ASP.NET MVC 4 Reporting SiteRecently I've been working on preparing some views into our data that can be easily accessed and customized by the user, and whose data can be easily updated by the project administrator. I've prepared a few internal WebForms and MVC websites using the Entity Framework and LINQ to SQL. Here is a screenshot of the LINQ to SQL site, which supports paging, filtering, and sorting of data stored in an MSSQL database.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuxgeb839yi7PBr9hFksT0vqrqAkNAo__Zp_XoJeC5jN7Ub5dQotGvAXRs4Ol5MmoXn0zGi8Uo1oKLSftv6-p2D0yqi4HR25LKDnKH2jjcQeo4tPVdYyDXbIf6QA6GmekF0jdT1N6uN1r3/s1600/empireReportsMVC.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiuxgeb839yi7PBr9hFksT0vqrqAkNAo__Zp_XoJeC5jN7Ub5dQotGvAXRs4Ol5MmoXn0zGi8Uo1oKLSftv6-p2D0yqi4HR25LKDnKH2jjcQeo4tPVdYyDXbIf6QA6GmekF0jdT1N6uN1r3/s1600/empireReportsMVC.png" height="369" width="640" /></a></div>
<br />
<br />
<br />
<br />
I also built this WebForms version, which provides a few more features and pulls its data from an MSSQL database. The first view shows the options page, and the second shows one of the result pages.<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCox1ErBm1OBDXgLYX2TIP3P5yaYXkYF7sOSDKedxy-TSTW3SgC46yf_Lm0FqS36OKfJDY_u0Toa3J6fPHByWYNlPrCTdRvhQFYVciFLNkc_k0a5vjNYrPxfmiO03VBKWu2UXe_ihxrHUv/s1600/empireTestReports1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgCox1ErBm1OBDXgLYX2TIP3P5yaYXkYF7sOSDKedxy-TSTW3SgC46yf_Lm0FqS36OKfJDY_u0Toa3J6fPHByWYNlPrCTdRvhQFYVciFLNkc_k0a5vjNYrPxfmiO03VBKWu2UXe_ihxrHUv/s1600/empireTestReports1.png" height="356" width="640" /></a></div>
<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuhZaNy_Puf96QXRhwU0aBZfFLP6Q0fWt64dOelQdaVUuJshr1pGG_gPBZ0gtvKabAJSyqUFC9Yjplt5vs1vCejRMWCtfAC_9RZw9oGfVZYooTqbTxU0TaIv1IquWDt5BZ51DT_KmjTMdU/s1600/empireTestCaseWebForm1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjuhZaNy_Puf96QXRhwU0aBZfFLP6Q0fWt64dOelQdaVUuJshr1pGG_gPBZ0gtvKabAJSyqUFC9Yjplt5vs1vCejRMWCtfAC_9RZw9oGfVZYooTqbTxU0TaIv1IquWDt5BZ51DT_KmjTMdU/s1600/empireTestCaseWebForm1.png" height="408" width="640" /></a></div>
<br />DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-63916429407968401922014-02-20T14:06:00.000-08:002015-10-06T16:03:13.061-07:00File to Database Converter<div class="separator" style="clear: both; text-align: left;">
<b>File to Database Converter.</b> Transfers data from the variety of configuration and data files used in the development of Melissa Data's international product (now in Beta) to an encrypted SQLite database that is delivered to customers.</div>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6KE4o8yvFAAQiatdJr04gfO7IaCTOPzuLcbN7F0WYjlnwmQkLGGOLjb4wTtODf7IJ0Ienlv7IWcokHHhKmaTxFhdMss8uxE8QoAU2-RtKKQlQGrLx5jBGFraPL2PcHQmp_6WJqbnNkesJ/s1600/empireConfigFilesToDatabase.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="468" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi6KE4o8yvFAAQiatdJr04gfO7IaCTOPzuLcbN7F0WYjlnwmQkLGGOLjb4wTtODf7IJ0Ienlv7IWcokHHhKmaTxFhdMss8uxE8QoAU2-RtKKQlQGrLx5jBGFraPL2PcHQmp_6WJqbnNkesJ/s1600/empireConfigFilesToDatabase.png" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-25998875334568156922013-10-15T14:00:00.001-07:002020-08-19T10:35:01.023-07:00Commercial APIs and the Open/Closed Principle (hypotheticals galore)Suppose you work for a company that markets a desktop software product that is worth 10,000 times its price in operational savings to your customers. Because you must contend with some worthy competitors, your product earns revenue from only 50% of your potential market (and it doesn't sell at a price that is anywhere near its value).<br />
Now, further suppose that you, yourself, wrote a program that controls your company's product through its exposed programming interface and automates its key features, such that it can perform millions of operations per day and run your company's product 24/7 with almost no human interaction. You soon realize that your product's customers would like to develop and run similar programs that provide the same speed and cost savings in their own companies. Because your application is specific to your company's business and won't work for everyone, you prepare a library that can be loaded and used by applications your customers develop for themselves. With your library, their applications gain your application's access to your company's product, tailored to their own needs. <br />
<br />
Here are a few function prototypes that your library might offer:<br />
<br />
List&lt;String&gt; <i>getImportantData(String whichStuff)</i><br />
<i><br /></i>
bool <i>storeImportantData(String storeStuff, String where)</i><br />
<i><br /></i>
String<i> getCalculatedResults(String workStuff)</i><br />
<i><br /></i>That three-method library is an application programming interface -- an API. Now, your customers not only purchase your company's product; most of them will probably want your API library, too.<br />
<br />
<b><u>Modification 1</u></b>:<br />
So, for the first couple of years your company's sales numbers increase. Now, counting your library, you're selling two products. Then the technology changes, and you might soon be losing market share to one of your competitors: they are nearing the release of a product that offers the same features as yours, including those provided by your API library, except that it can be driven by applications built on a different programming technology. Instead of driving a monolithic desktop program -- one that offers far more features than any one customer needs -- a client can load and exercise only the subset of the monolith's components that it actually uses. That can translate to higher processing speed, less use of disk space, and lower product cost, because it's unnecessary to purchase the unused features.<br />
<br />
You certainly do not want to lose market share, so your company sets out to match the competition's new offering by developing new software that offers the same features included in your original desktop product, but now those features can be purchased and loaded on an as-needed basis. This will provide the same technology and benefits as your competitor's product.<br />
<br />
You are quick to remind everyone that a significant share of your product's sales has been due to the value provided by your API library, and there is every reason to expect the same when customers use the new product architecture. So you now want to provide an API library for the new product that delivers the same benefits as the original. The new product offers a different technology, but the actions that need to be programmable have the same basis: customers will still want to retrieve and store data, and will still need the calculations obtained from your original product. A couple of new features are made available by the change to the new architecture, and you will serve your customers best by making them available through your API. Some customers may want to automate the object-based side of the new product with client applications designed in the same object-based technology. Because the objects they use in your new product are not anchored to the monolith as before, it will be necessary to provide a little more information so that the library loaded into their client applications can obtain a connection with the main product. So you'll introduce a slightly different API set to handle this new option:<br />
<br />
List&lt;String&gt; <i>getImportantStuff(String whichStuff, long serverLocation)</i><br />
<i><br /></i>bool <i>storeImportantStuff(String storeStuff, String where</i><i>, long </i><i>serverLocation)</i><br />
<i><br /></i>String<i> getProcessedResults(String workStuff</i><i>, long </i><i>serverLocation)</i><br />
<div>
<i><br /></i></div>
<div>
Excellent! You're now providing the same capabilities to your customers that they found so compelling in your original monolithic desktop product. You are saving them disk space and product expense as well as increasing their processing speed because their client applications are making use of refactored, faster calls.</div>
<br />
<br />
------------------------------------------------------------------<br />
<br />
When Modification 1 was in design, it would have been a good idea to provide customers with what would be seen as the same library you offered with your original product. It would be loaded by customer client applications just as before, and it would offer the same set of methods. On its back side, however, it would interface with your new object-based product instead of your monolithic desktop application. In the case of this Modification 1, it might be necessary to bridge the gap between product generations by presetting new configuration options on the system that allow earlier-generation API calls to function in the new environment. With that, your customers could continue to use the client applications they had written over time. They would not need to rewrite the client applications they'd perfected over the years just to get the same benefit from your new product that they already have from the one they're using.<br />
From my time working in a developer support role, I am aware of several scenarios in which customers cannot update their client applications. Sometimes the subject-expert application author has left the company, or the application was written by a short-term contractor. Sometimes companies have lost the source code for applications they run regularly as people left, hardware was updated, or roles changed. Sometimes a change in development technology, such as the programming language or development environment, makes editing and recompiling a challenge: "<i>There isn't anyone here who does <b>that</b> anymore!</i>"<br />
When the new technology provides your customers with a business case for investment in writing new client applications, their development
team could take up that task -- one which is much simpler and less
expensive than duplicating what was already working in the old technology. <br />
<br />
This is where a software engineering best practice, the <b><i>Open/Closed Principle</i></b>, applies. It can be stated as: an <i><b>interface</b></i> may be open for extension, but closed for modification. Once published, a functional interface must remain exactly the same -- <i>no changes allowed. </i>Development organizations spend a lot of time and money writing applications to an interface, and if the interface were changed, some proportion, if not all, of that time and expense would have to be spent again reacting to the changes. When the interface developer wants to add new capabilities that would otherwise require a change to the interface, they instead <b><i>add</i></b> new member functions as extensions. With that, any legacy applications continue to work as before with every new release of the interface library. A clear and common example is the set of HTTP methods: changing the behavior of the original methods such as GET or POST would break nearly every website, so new capabilities arrive as additional methods instead. The Win32 API follows the same pattern with its "Ex" convention -- rather than changing CreateWindow, Microsoft added CreateWindowEx alongside it. <br />
So, applying this wisdom to Modification 1: the new product could provide what is, by nearly all indications, the original library, but one that also includes the new product's extension methods -- the three original methods together with the three new ones. For customers who intend to use only the new API methods and the new object-based programming technology, the objects to which your API library talks on its back end will be directly available to their new object-based client applications.<br />
With that, your customers are not forced to weigh how much time would be required to rewrite their client automation applications. They can continue to enjoy the value of the work they did in the past to run with your product, rather than being pushed into a situation where they would be wise to investigate whether a competitor might offer a less expensive or better way to get the job done. The cost arithmetic won't justify moving to a competitor's product and changing all of their API client applications when they are already running a suite of client applications that continues to work with every version of your product.<br />
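The approach above can be sketched as a thin adapter: the published three-method interface never changes, and its implementation simply forwards each call to the new server-aware methods, supplying a server location taken from configuration rather than from the caller. The method names come from the hypothetical prototypes earlier in this post; the adapter class and its default-server field are my illustration:

```java
import java.util.List;

// Open/closed sketch: the legacy interface stays frozen, and a thin
// adapter forwards each call to the new server-aware API. Method names
// come from the post's hypothetical prototypes; the rest is illustrative.
interface LegacyApi {                       // published once, never changed
    List<String> getImportantData(String whichStuff);
    boolean storeImportantData(String storeStuff, String where);
    String getCalculatedResults(String workStuff);
}

interface NewApi {                          // the new product's extension methods
    List<String> getImportantStuff(String whichStuff, long serverLocation);
    boolean storeImportantStuff(String storeStuff, String where, long serverLocation);
    String getProcessedResults(String workStuff, long serverLocation);
}

class LegacyAdapter implements LegacyApi {
    private final NewApi backend;
    private final long defaultServer;       // read from configuration, not from the caller

    LegacyAdapter(NewApi backend, long defaultServer) {
        this.backend = backend;
        this.defaultServer = defaultServer;
    }

    @Override public List<String> getImportantData(String whichStuff) {
        return backend.getImportantStuff(whichStuff, defaultServer);
    }
    @Override public boolean storeImportantData(String storeStuff, String where) {
        return backend.storeImportantStuff(storeStuff, where, defaultServer);
    }
    @Override public String getCalculatedResults(String workStuff) {
        return backend.getProcessedResults(workStuff, defaultServer);
    }
}
```

Legacy client applications keep linking against `LegacyApi` unchanged, while new applications are free to call `NewApi` directly.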
<br />
<b><u>Modification 2</u></b>:<br />
After a company meeting you learn that your company has merged with your most successful direct competitor. After the merger, your one larger company will have two large customer bases standardized on two similar products, which of course have different API definitions. How should you handle this in order to retain the product loyalty of both customer sets? It seems inevitable that, rather than continue to develop two products with an almost exact feature match, one-half of your engineering resources will be retasked. In the near future, only one of the two previously separate vendors' products, or a new hybrid product, will be sold. As with Modification 1, how do you retain the loyalty of customers who may not have used the surviving product or technology before the merger?<br />
As mentioned above, two approaches to the API problem are feasible. One: the company decides that one of the products will survive and the other will be phased out over the next couple of releases, giving that side of the customer base time to migrate. Or, you can help roughly one-half of the customers by either replacing their soon-to-be-deprecated API client applications -- for instance by preparing a code compiler/translator -- or by providing a new API library that exposes the same interface as the sunset product's API but drives your surviving product. <br />
<br />
------------------------------------------------------------------<br />
<br />
If the surviving application was developed by your new step-company,
then you will want to develop a new version of your standard API library
that loads as before, and exposes the same methods as before, but which
serves as an adapter to the new API. With that your customers' client
applications will continue to function as before, but with the new
product. Again, no changes should be necessary on your customers' end
other than installation and configuration of the new product.<br />
<br />
If the management decision is instead that a new product will be released to combine and replace the offerings of each partner in the merger then, as you might guess, you should provide a new API library that exposes both your product's and your step-sibling's APIs -- one that will load and be used with no new customer effort. You can always extend the new library to offer features available in the new application or technology, but, for the reasons cited above in Modification 1, you must always preserve the customers' ability to continue using the API client applications they prepared in the past by offering an unchanged basis API.<br />
<br />
<b><u>Modification 3</u></b>:<br />
Your development team learns that your company is preparing to develop and release a new type of system that is both similar to and different from what you've supported until now. The new system type is similar in that customers still want the ability to automate sending data, receiving data, and retrieving the results of calculations from the application -- a welcome acknowledgment to you and your development team that your API has offered a solid basis for automating your technology.<br />
When you think about preparing an API interface for this new product, consider whether your customers who have written client applications automating your other products would be able to use those same applications with this new product variation. If you stick to the basis used so far, offering the same set of methods, would they work as written with no more than some configuration changes? <br />
<br />
<u><b> </b></u><b><u>Modification 4</u></b>:<br />
Customers and sales representatives continually see significant value in, or are coerced into, adjusting their operations to take advantage of newer technologies. For example, a product built to run in a web browser can be quickly and inexpensively installed, distributed, and updated, and access security for such a system can be administered in a very general way. So of course customers and sales reps are interested in what you can do for them in a mobile or cloud-based solution. As those questions are asked, you are led to consider how you might benefit from the open-closed rule when the product you will soon offer serves browsers from the web.<br />
<br />
------------------------------------------------------------------<br />
<br />
In the context of this discussion it is important to state that no matter what new technology is adopted, customers will almost certainly continue to run their operations on the same type of system: Windows, OS X, Linux, and so on. While the new technology offers new features -- for instance a JavaScript API against which customers or their server applications can write automation in their HTML forms -- your first task is to protect your legacy customers by maintaining open-closed support: provide an API library that your customers' client applications can load and use. The important point is that you continue to provide automation access to your basis API so customers are never required to change their legacy applications. The API library you provide to maintain support for legacy client apps in this new product might need to do some tricky work on its back end to relay calls made into the standard library to the web-based objects, but you must not break the link between the previous products on which your customers have standardized and your new product technology. As mentioned above, both technologies run on the same system, where supporting both is possible and will keep delivering the benefits of your customers' workhorse applications.<br />
<br />
<u><b>Modification 5</b></u>: In Modification 4 we were concerned with a jump to a web browser thin-client offering. You can also jump to a server-based web technology, which would require that the standard API library variation you've provided for every product so far must still support the customers' legacy client applications, yet be able to communicate with the web server where your new server-based product is installed. <br />
<br />
------------------------------------------------------------------<br />
<br />
<br />
<u><b>Modification 6</b></u>: By now the pattern should be clear: with any version of your company's product, provide an API library that exposes the same API interface, so your customers are always able to use their legacy client applications with your new-generation product. That includes a mobile phone or tablet-based offering. It might involve a watch or a car, or some technology of which we are not yet aware. The pattern is: preserve the basic interface, extend it as you wish, and sometimes consider the extensions a new part of the basis moving forward.<br />
<br />
------------------------------------------------------------------<br />
<br />
<u><b>Conclusion:</b></u> If you prepare an automation library that exposes the same API basis for every generation and variation of your product you save your customers from rewriting developed, tested, and proven applications. They do not need to pay for new planning, design, development, and testing of a new generation of automation applications. They already possess applications that they know and trust will deliver. When you release a new generation of your product that features the use of a new technology, smart customers will see that as a time when they should evaluate your competitors' products, too. Because the suite of automation applications that they need is already on your side of the balance, that evaluation period will likely be very short, and to your advantage.<br />
<br />
When a potential customer evaluates your product, even without a suite of automation applications on the shelf based on your API, your reputation will help them realize that all applications they write to automate your product will be usable in all technologies supported by your products in the future.<br />
<br />
When a company's development manager considers the design and implementation of the automation applications they will produce to drive your products, they will write the suite with long-term efficiency and quality in mind. They can write more efficient applications and application components because they know they can rely on your basis API as a given.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-26224542617851418512013-03-27T22:25:00.004-07:002015-08-10T16:40:22.379-07:00C#: GlobalAddressCompare With HTTPWebRequest()A small project I delivered a couple of years ago was for the benefit of the international QA team at Melissa Data. They wanted to automate comparing the street address formats returned by our product with the values returned by the Google Maps API, the Bing Maps API, and Address Doctor.<br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilybLA6OAZwmzo4QFi5Sbl8twOcX0i4LT_DOE8Wmv_42jMJ4tc1SkmZ0WDgFGMaHd4rqd5AyGKvTlW9Ri3PfWUOVYi54SahEOl9_E_ziOP2d4chYLGJdb9-67Aywg3F8TI9CG0ny1m6lIp/s1600/googleVerify.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="104" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEilybLA6OAZwmzo4QFi5Sbl8twOcX0i4LT_DOE8Wmv_42jMJ4tc1SkmZ0WDgFGMaHd4rqd5AyGKvTlW9Ri3PfWUOVYi54SahEOl9_E_ziOP2d4chYLGJdb9-67Aywg3F8TI9CG0ny1m6lIp/s640/googleVerify.jpg" width="640" /></a></div>
<br />
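The heart of an example run is one HTTP GET per address against each selected service. The production tool is C# (HTTPWebRequest); the following is a rough Java sketch of the same request step, with the public Google geocoding XML endpoint assumed as the target URL:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Sketch of the per-address request step: build a geocoding query URL and
// fetch the raw XML response body for later filtering and comparison.
class AddressLookup {

    // Pure helper: percent-encode the street address into the query string.
    public static String buildRequestUrl(String baseUrl, String address) throws Exception {
        return baseUrl + "?address=" + URLEncoder.encode(address, StandardCharsets.UTF_8.name());
    }

    // Perform the GET and return the response text (XML for these services).
    public static String fetch(String requestUrl) throws Exception {
        HttpURLConnection conn = (HttpURLConnection) new URL(requestUrl).openConnection();
        conn.setRequestMethod("GET");
        StringBuilder body = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) body.append(line).append('\n');
        }
        return body.toString();
    }
}
```

The same two-step shape (encode, then GET) applies to the Bing Maps and Address Doctor calls; only the base URL and parameter names differ.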
In an example run, the application whose UI is shown above reads a large list of street addresses from a tester-prepared input file and submits each of them via a C# HttpWebRequest call to the Google and Bing Maps web services as well as the Address Doctor product that we run in-house. Upon receiving the XML-formatted responses it filters them so only the relevant, comparable data remains, presents that data for visual side-by-side comparison, and then compares the results returned by the selected services.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-81701922588437860502013-03-27T20:34:00.000-07:002013-03-27T22:01:52.265-07:00Java/SQL: List Dropped RecordsAt Melissa Data we purchase many billions of personal contact records from multiple sources representing different facets of the world of commerce. We combine them, eliminating duplication and obsolete information, much like the goal of database normalization, to distill them into the richest, fullest set of records anywhere -- baked fresh in three-week cycles. Companies who were once our competitors now submit their records to us so they can be both updated to contain the most current information and appended to fill in their blanks.<br />
<br />
The processing of these records is done in a sequence of phases, starting with a simple conversion from the source formats to our standard. Through our software build process each address acts like a magnet that attracts all of the data associated with it from the batch of original records. We start with a set of as many as twenty billion records and end up with around one billion product records. <br />
<br />
We sometimes encounter a large but imperceptible loss of records during the transition from one build phase to another after we've made changes to the code that controls the build process. It is not uncommon to find we've lost around 400 million records between steps -- and, because that is less than 2% of the data involved, not realize it unless something unexpected pops into view. <br />
<br />
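Detecting this kind of silent loss boils down to a set difference between the record IDs present in consecutive phases. A minimal Java sketch of that comparison (class and method names are illustrative, not from the production tool, which streams IDs out of files into a database table):

```java
import java.util.Set;
import java.util.TreeSet;

// Given the record IDs present in two consecutive build phases, report the
// IDs that disappeared and the fraction of the earlier phase that was lost.
class DroppedRecords {

    // IDs present before the phase transition but absent after it.
    public static Set<String> dropped(Set<String> before, Set<String> after) {
        Set<String> lost = new TreeSet<>(before); // sorted for stable reporting
        lost.removeAll(after);
        return lost;
    }

    // Loss as a fraction of the earlier phase, e.g. 0.02 for a 2% drop.
    public static double lossFraction(Set<String> before, Set<String> after) {
        return before.isEmpty() ? 0.0 : (double) dropped(before, after).size() / before.size();
    }
}
```

In production the "sets" are far too large for memory, so the same difference is computed by loading both phases' IDs into database tables and joining them; the logic is identical.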
The application I wrote, known as "List Dropped Records", tracks every record, whether it has been reduced to an archive or remains active, in every phase. It opens the hundreds of thousands of files, reads the record lists therein, and compares the phases in order to report what has disappeared, in the form of a database table that lists the record ID's dropped in each phase. With that we can quickly learn whether or not we have loss, and how the amount of loss changes from one build to another.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-12603399397701503342013-03-15T09:53:00.004-07:002014-09-21T21:31:51.208-07:00C#/Custom User Controls: AtomSet UtilitiesOverview: In a project whose goal is to convert a huge set of international street addresses obtained from hundreds of different sources to the format that is standard for each country, the code we develop at Melissa Data executes a succession of processing steps. We want to see the effects of our code at each step to ensure it achieves what we expect -- and if not, what it did instead. The AtomSetUtilities application is the solution I prepared to address this need.<br />
<br />
There is an internal class, AtomSet, that is the data storage unit for an international address throughout the sequence of processing steps. Its data is stored in a normalized database in the raw format in which it was received from our sources, and is assigned to the class members during construction. Because it covers approximately 240 different national address formats, the definition of an AtomSet will sometimes change as development work continues or as national formats vary. Also, the data it contains requires a variable number of entries in its components, so a viewer application must read and present each AtomSet dynamically.<br />
<br />
Because there are nine different steps in the execution sequence that we want to view, I wrote a C# user control, which I call the AtomSetViewerUserControl, that can be dropped onto a UI area and loaded with whatever dataset is currently required. The screenshot below shows one of these user controls containing three dynamically-placed text boxes on the AtomSetUtilities' configuration property page.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoN7lmOKr3Xhc-vKNv-YEXC3RiYAktRzIhEMKh0p2TgNYf30ZT4gj2d2yyXk4BkCdf0Q7XUUKXUun2p7d2cGXLlg33o_jBB6vpVAs4ul6v7Il-mcfJmFS3uv02OHrzc14bEux9-O-3aj2o/s1600/AtomSetUtilities.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhoN7lmOKr3Xhc-vKNv-YEXC3RiYAktRzIhEMKh0p2TgNYf30ZT4gj2d2yyXk4BkCdf0Q7XUUKXUun2p7d2cGXLlg33o_jBB6vpVAs4ul6v7Il-mcfJmFS3uv02OHrzc14bEux9-O-3aj2o/s640/AtomSetUtilities.jpg" height="324" width="640" /></a></div>
<br />
<br />
<div class="separator" style="clear: both; text-align: left;">
A particularly interesting feature that I added -- not in the original requirements, but one I thought would be helpful for communication between QA and Development -- is the ability to record and reload a Snapshot. A Snapshot is the set of data values in the AtomSetUtilities' UI at some instant, serialized so that running state can be reloaded later.</div>
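A Snapshot amounts to serializing the current UI field values and reading them back later. A minimal Java sketch of the idea (the real tool is C#/Winforms; the map-of-field-values representation here is purely illustrative):

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.HashMap;
import java.util.Map;

// Capture the UI state as a map of control-name -> value, persist it to a
// file, and restore it later so a running state can be shared or replayed.
class Snapshot {

    public static void save(Map<String, String> state, File file) throws IOException {
        try (ObjectOutputStream out = new ObjectOutputStream(new FileOutputStream(file))) {
            out.writeObject(new HashMap<>(state)); // copy guarantees a Serializable map
        }
    }

    @SuppressWarnings("unchecked")
    public static Map<String, String> load(File file) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new FileInputStream(file))) {
            return (Map<String, String>) in.readObject();
        }
    }
}
```

A tester can save a Snapshot at the moment a problem appears and attach the file to a bug report; a developer reloads it to see exactly the same UI state.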
<br />
Here is another property page with evidence of progress after some steps have completed:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpoGCptr-HkdSiQon_MpNFQpWLr6zNagW3hu0GE7oCMmYZPE4MoBJ6XZ64bsNXRd9KPp9-zOb_xYaHYRljOqDil66AjfVS8KKcq1umRrdLyXMcmaD6o6TTRBCsX-VnNVqqKfESQuzGuhQy/s1600/AtomSetUtilitiesMapper.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgpoGCptr-HkdSiQon_MpNFQpWLr6zNagW3hu0GE7oCMmYZPE4MoBJ6XZ64bsNXRd9KPp9-zOb_xYaHYRljOqDil66AjfVS8KKcq1umRrdLyXMcmaD6o6TTRBCsX-VnNVqqKfESQuzGuhQy/s640/AtomSetUtilitiesMapper.jpg" height="326" width="640" /></a></div>
<br />
Because I have responsibility for the development of 21 different development utility applications that concern similar subject matter, I saw several violations of one of my favorite software design rules, the Once and Only Once rule. Stated from the hip: "a rule should be coded once and only once, and any duplication should be eliminated by using a method that can be called from all the places where the duplication was". As I have developed the 21 applications I have prepared and made efficient use of 23 libraries, among which are these: 1) the viewer described above; 2) the AtomSet used in all 21 applications; 3) a class definition that is displayed in a combobox; 4) a customized file class; 5) a custom file-open dialog containing an MRU list; 6) an international character culture translator; and, 7) a configuration serialization class.<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-80158011203498050902013-03-10T21:58:00.000-08:002014-09-29T16:41:20.950-07:00C#: KML GeneratorProject overview: We want to show high-quality competitive analysis, marketing, and project progress overviews in an easily understood geographical view. <br />
<br />
I studied the GIS tutorials and reverse-engineered some KML examples to write the KMLGenerator application, which produces two- and three-dimensional overlays on Google Maps and Google Earth. The original request for an application that could render data geographically was for a color overlay, but as I explored the GIS methods I found that extruded features did a better job of conveying magnitude and making memories, so I added several additional rendering and color modes.<br />
<br />
The app's UI looks like this:<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTLU5KiKjObm_wOLbd_CJi52sLcYMGwLx-uVPC3cgtPTkdtC-WrV5tZ6Nw9LKCL5_rn4RM5gyzAGmRXddHqCTz9ys57lBSLUlzIK7zUItDu5eZQNaokyFRjKpqfpwlhqee7brYoyLBgr6h/s1600/KMLGenerator.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjTLU5KiKjObm_wOLbd_CJi52sLcYMGwLx-uVPC3cgtPTkdtC-WrV5tZ6Nw9LKCL5_rn4RM5gyzAGmRXddHqCTz9ys57lBSLUlzIK7zUItDu5eZQNaokyFRjKpqfpwlhqee7brYoyLBgr6h/s640/KMLGenerator.jpg" height="400" width="640" /></a><br />
It accepts easy-to-understand, user-prepared Excel CSV files with the location, data categories, and data values, along with another file I prepared that contains the geocodes for the centroid and perimeter coordinates of geographic entities such as countries and states. It generates output conveying numeric data values in an Earth map that looks like this:
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcXi5zfL5KP4wO1WEOHPInpMUMZpq7H6K354Z9s4w8wk6iQxAD2SiNtPWGZtrC5oBjXzJ2JJVbpNsFHc0ffmhK-gNoHJUWLrbU_0vgmkgJUY7LMNDkBE7aoVoqRme7WGsUtQyQRCeKvlJa/s1600/extrudedEarth.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhcXi5zfL5KP4wO1WEOHPInpMUMZpq7H6K354Z9s4w8wk6iQxAD2SiNtPWGZtrC5oBjXzJ2JJVbpNsFHc0ffmhK-gNoHJUWLrbU_0vgmkgJUY7LMNDkBE7aoVoqRme7WGsUtQyQRCeKvlJa/s640/extrudedEarth.jpg" height="400" width="640" /></a></div>
<br />
and more of the same in a United States view:<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_7VdceWiFRlIvgWtqsuVwyAXOHrLhH4lOmDwxOtD3NCtnSfv4277mi6HNGji1RnZzS-BS2PgXwMT3oWsoJ3ryHBvQ1PTKDTRla5V8zQEVXaHmTI2pjlToXDWjeoRC03EIvn1bm338H8Jm/s1600/extrudedUSA.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi_7VdceWiFRlIvgWtqsuVwyAXOHrLhH4lOmDwxOtD3NCtnSfv4277mi6HNGji1RnZzS-BS2PgXwMT3oWsoJ3ryHBvQ1PTKDTRla5V8zQEVXaHmTI2pjlToXDWjeoRC03EIvn1bm338H8Jm/s640/extrudedUSA.jpg" height="400" width="640" /></a><br />
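For the extruded views above, each region becomes a KML Polygon whose altitude encodes the data value. A simplified Java sketch of generating one such placemark (coordinates here are illustrative; the real tool reads the perimeter from the geocode file):

```java
// Emit one extruded KML Placemark: a Polygon whose uniform altitude (in
// meters, relativeToGround) conveys the magnitude of the region's value.
class KmlPlacemark {

    // perimeter is a list of {longitude, latitude} pairs for the region edge.
    public static String placemark(String name, double[][] perimeter, double height) {
        StringBuilder coords = new StringBuilder();
        for (double[] p : perimeter)
            coords.append(p[0]).append(',').append(p[1]).append(',').append(height).append(' ');
        return "<Placemark><name>" + name + "</name>"
             + "<Polygon><extrude>1</extrude>"
             + "<altitudeMode>relativeToGround</altitudeMode>"
             + "<outerBoundaryIs><LinearRing><coordinates>"
             + coords.toString().trim()
             + "</coordinates></LinearRing></outerBoundaryIs>"
             + "</Polygon></Placemark>";
    }
}
```

Wrapping a list of these placemarks in a `<Document>` inside a `<kml>` root yields a file Google Earth will open directly; `<extrude>1</extrude>` is what drops the walls from the elevated ring down to the ground, producing the solid columns seen in the screenshots.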
<br />
<b><br /></b>
<b>Street Segments</b><br />
Another project concerned showing street segments where the street is known by a different name in different locations so we could visualize what we were reading in the data. For instance, Pacific Highway South, Aurora Ave, Highway 99, and Evergreen Way are all different names for Washington State Highway 99 between SeaTac and Everett, WA. The map below shows sections of a road that are known by different names using a different color for each different name. A popup provides additional information such as the local street name and the address range.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9wwGafTxYsJRLOCHwCY_bThme_6rGR1YQPttJf5sZE-oqPtH3eY_vJ1gsQL9tLemZK8XtNumSR7wDHNezDQNC2Gi6m5oTABPyYaoGXFpkM_raOzd5n1BCuM_RcpJLBZHz8xViXyxnlUyL/s1600/chSegmentsInKML.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg9wwGafTxYsJRLOCHwCY_bThme_6rGR1YQPttJf5sZE-oqPtH3eY_vJ1gsQL9tLemZK8XtNumSR7wDHNezDQNC2Gi6m5oTABPyYaoGXFpkM_raOzd5n1BCuM_RcpJLBZHz8xViXyxnlUyL/s1600/chSegmentsInKML.png" height="356" width="640" /></a></div>
<br />
<div class="separator" style="clear: both; text-align: center;">
</div>
<br />
I wrote this application so it would read and process a large data file and produce a different Google Earth folder for each viewpoint. Doing so allows the person who is reviewing street names for preferred values within a geocode range to run the entire list once and visit the folders, each of which contains one road, as they have time.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-32529493439737139882013-01-08T15:15:00.000-08:002013-02-18T11:44:36.762-08:00Java/SQL: Record IndexerAt Melissa Data we purchase large mailing lists from several different facets of the business world and distill them into a form that our customers, coming from many business perspectives, can easily query to gain valuable information. We start with 15-20 billion records from our suppliers and combine them, eliminating redundancy and outdated information, until we end up with a much smaller and fuller set of customer contact records.<br />
<br />
The size of the data set just mentioned makes it very difficult to see the effect of every rule we code into the software distillation process that converts purchased data into our product. Each small error or misunderstood user story in our code might obscure or destroy a lot of value in the final product. We've seen more than 100 million records lost -- unrealized -- several times due to a minor oversight in the code involved at some distillation stage. Because the output at each phase is so large, the time necessary to properly review it would deadlock our development efforts, so we need a way to quickly locate the text of suspicious records as well as that of their ancestors at every distillation stage.<br />
<br />
At the origin of the record distillation process we've purchased thousands of raw text files that, depending on the source, may each contain between hundreds of thousands and millions of records. As the data is processed in four distinct stages -- its raw state, two distillation steps, and the final product -- all data from all original files remains present at each level. To most efficiently review and test the effects of our software at each development stage we need access to the records' text at all four distillation phases.<br />
<br />
The solution we're using for this is a SQLite database named IndexID.db. It contains two simple tables: "recordsIndex", which holds three fields -- a record ID, a file ID, and a file offset -- and "filesIndex", which holds a file ID and the fully-qualified path to the file. When a developer -- or better yet, a program -- seeks to quickly see the contents of a record, the record's ID value is used to locate and read the record text.<br />
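The lookup path is therefore: record ID to (file ID, offset) via recordsIndex, file ID to path via filesIndex, then a seek-and-read. A minimal Java sketch with the two tables stood in by maps (in the real tool these are SQLite queries against IndexID.db):

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Map;

// Resolve a record ID to its source file and byte offset, then read the
// record's text at that offset (here one record per line, for simplicity).
class RecordLookup {

    // recordsIndex: recordId -> {fileId, offset}; filesIndex: fileId -> path.
    public static String readRecord(Map<String, long[]> recordsIndex,
                                    Map<Long, String> filesIndex,
                                    String recordId) throws IOException {
        long[] loc = recordsIndex.get(recordId);   // {file ID, byte offset}
        String path = filesIndex.get(loc[0]);
        try (RandomAccessFile raf = new RandomAccessFile(path, "r")) {
            raf.seek(loc[1]);                       // jump straight to the record
            return raf.readLine();
        }
    }
}
```

The point of the two-table split is that each long file path is stored once in filesIndex, while the billions of recordsIndex rows carry only a small integer file ID.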
<br />
After four paragraphs that tell what is involved, we finally arrive at what I think is the interesting part -- the application that creates and loads the IndexID database, the IdIndexer. Every record formatted for our system contains a field that holds a list of the record's ancestor record ID's -- the ID's of the records whose data found its way into the record. As each stage of the distillation process finishes, every single bit of the original data can be found within one of the records, but always in one with a different ID from the previous or next phase. The results of all build phases are saved to files in their own directory tree. The IdIndexer opens every file in the build directory tree and locates and lists every path, filename, record ID, list of ancestor record ID's, and offset to each record in the files.<br />
<br />
The amount of disk space a database such as IndexID.db consumes is proportional to the number of characters stored in its field values. Because the number of records in IndexID.db is so great, many characters and digits are required to specify each record ID. The syntax for our record ID's uses 40 alphanumeric characters grouped by three dashes. The database storage space necessary for the 40-character ID's alone is 40 times the number of records -- which in our case is around 25 billion -- so we are talking about needing around one trillion bytes. To minimize the amount of storage for the record ID's I took two approaches, the first of which was to convert the last quarter of the ID from a ten-digit string of base-10 numerals to the base-36 equivalent, which is normally a change from ten to around three digits.<br />
<br />
In the case of the other three ID segments, the first contained only four characters, and they spelled one of a very small number of strings. The second and third segments each contain around ten alphanumeric characters -- which happily increment from zero to some alphanumeric value, and therefore contain lots of zeros at the front end. It was quickly obvious that there was a lot of duplication in all three segments. What I did was map each segment's value to a base-36 number that was incremented each time a new segment value was encountered. So, for instance, the first segment was converted to a numeral 1 in place of the original "ABCD". In the second and third segments a string like "000001R0469" was converted to a short base-36 number: "000001R0469" was the 48096th unique string encountered, and 48096 is 1140 in base 36. With those translations to the four segments we now store the record ID's for the earlier mentioned 25 billion records in approximately 8 digits instead of the original 40, which reduces disk space used from one trillion bytes to closer to 200 billion.<br />
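Both shrinking tricks are small enough to sketch in a few lines of Java (illustrative, not the production code):

```java
import java.util.HashMap;
import java.util.Map;

// Two ID-shrinking tricks: (1) re-encode a purely numeric segment in
// base 36, and (2) dictionary-encode recurring segment strings with a
// sequence number handed out on first sight.
class IdCompressor {
    private final Map<String, Integer> seen = new HashMap<>();

    // Trick 1: "0000048096" (ten base-10 digits) -> "1140" (base 36).
    public static String toBase36(String decimalSegment) {
        return Long.toString(Long.parseLong(decimalSegment), 36);
    }

    // Trick 2: the first distinct segment gets code 0, the next gets 1, and
    // so on; equal segment strings always share one small code.
    public int code(String segment) {
        return seen.computeIfAbsent(segment, s -> seen.size());
    }
}
```

The dictionary in trick 2 must itself be persisted alongside the database so the codes can be translated back to the original segment strings when a full record ID needs to be reconstructed.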
<br />DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-18226717877111818662012-10-25T19:49:00.001-07:002014-09-19T09:29:38.781-07:00JavaScript/JSON: Record VisualizerOne product we sell at Melissa Data is a list of data that matches each customer's demographic needs. We buy data ourselves in a raw form from many different vendors and refine it into a standard format that is efficient and convenient for our customers to access. Because the data we buy comes in so many different sellers' formats, several steps must be taken in the process that rearranges it into our standard format. An important type of information that we want to visualize in our development tools at Melissa Data is the seller source of each bit of data. With that we can tell what fraction of our data comes from which source, the quality of the data from each of them, and where problems have occurred between one assimilation phase and the next. When reading from a mountain of contact records a picture is often worth a thousand words. The Record Visualizer project I was assigned produced a JavaScript-based tool, within a Java servlet-based Apache website, that presents the data within the records that come into existence at each phase of our assimilation process. Records created during the build are linked by horizontal edges to their parent or children to show the sequence the data follows. The JavaScript InfoVis Toolkit (jit) provides an easy way to show this sequence via a tree, as shown below.<br />
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEZSlPohbeYnhbQCCI7nf_56Zuvx0tKRg6eHkogtOrAQti6esLa6jO3GR5Fl4CBtoucInwGv4dCNfIBqbIMeR50QKf7n2ftUakUzcXA-KWwmXwmLZSb7pnU2PZnzEF9NLl-s1x7DFzUpb7/s1600/visualizeRecord.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjEZSlPohbeYnhbQCCI7nf_56Zuvx0tKRg6eHkogtOrAQti6esLa6jO3GR5Fl4CBtoucInwGv4dCNfIBqbIMeR50QKf7n2ftUakUzcXA-KWwmXwmLZSb7pnU2PZnzEF9NLl-s1x7DFzUpb7/s400/visualizeRecord.png" height="250" width="433" /></a></div>
<br />
Each node on the tree is labeled with a record ID. When one is selected it changes colors and the data within the record is shown on the right-side panel. Note the alternating coloring of the panel text, intended to distinguish between different items in a compact sequence of entries, and the tooltip that drops down whenever the mouse pointer hovers over the purple demographic bullet point. The tooltip is a means to present sorted detail information concerning the bullet point while minimizing screen clutter. <br />
<br />
The lower screenshot shows the page again with the configuration panel dropped down to reveal the list of record fields that can be shown or hidden in the right side panel.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4t-YHxtZeuCQMabRunuFwFFbe3DsuIRP9UE1kNzB47tKwRfqv3LVVnwN8pAfJ525F5KWEkfXp-FFa842A9wY0PYJq03FRBaYtJvZUKKHxXkR9CSeXvkc-I5dE3thKjd6Vc4Jx4sqA73T8/s1600/visualizeRecord1.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh4t-YHxtZeuCQMabRunuFwFFbe3DsuIRP9UE1kNzB47tKwRfqv3LVVnwN8pAfJ525F5KWEkfXp-FFa842A9wY0PYJq03FRBaYtJvZUKKHxXkR9CSeXvkc-I5dE3thKjd6Vc4Jx4sqA73T8/s320/visualizeRecord1.png" height="300" width="520" /></a></div>
DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-52906663792153563222011-06-27T17:08:00.000-07:002013-02-18T11:47:21.958-08:00C#/SQL: Search Builder: Defining a Fuzzy SearchWe assimilate between ten to twelve billion records at Melissa Data that we collect via donation or purchase into just over a billion high quality records that are current, accurate, and more complete. Each record consists of many fields, the contents of which possess a different commercial value to our various customers. Some are interested in a record if one of its fields contains a specific value or its value is a member of a certain set, while others are interested in certain combinations of values in a set of fields.<br />
<br />
Typically, customers assemble an SQL query to access the record set of interest in a database, and that query often becomes long and confusing as the customer's intent is fine-tuned. An ordered list of <em>fuzzy</em> constraints applied to their search is relatively easy to list and understand, and will usually result in better matches to their search intent. These simple fuzzy statements concern which fields are of interest, and also concerns such as: the precedence with which field matching should be applied when multiple fields are being considered; the quality of the match levels; the level of match completeness desired; which, and what number of, a record's fields must match; whether one date range is preferable to others; and more. The Search Builder is the application Melissa Data sales engineers use to convert customers' intentions into a data search.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZwUhuVbz6PQYfW0eaOhxFGktTgdnBqloX3nZueESwVbaJI60stvDH66X4LPcul7BnXbudeiZu3HbyvKYYV2QxcllBKuew896VHJ4g937i9o4CO4weSxd54qeg3hxCIAyG4Qc-q-wN16kJ/s1600/SearchBuilderRankTab.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiZwUhuVbz6PQYfW0eaOhxFGktTgdnBqloX3nZueESwVbaJI60stvDH66X4LPcul7BnXbudeiZu3HbyvKYYV2QxcllBKuew896VHJ4g937i9o4CO4weSxd54qeg3hxCIAyG4Qc-q-wN16kJ/s400/SearchBuilderRankTab.png" width="400" /></a>The DataGridview provides add, remove, sort, and modify options.</div>
<br />
This is an application designed and originally coded in C# with WPF and SQLite. I later switched to Winforms when I couldn't find capabilities in the WPF DataGrid necessary for user stories that arose later in the project. <br />
<br />
The checked or unchecked states of the grid of checkboxes shown above in the DataGridview in both the Rank and Terms tabs are read as binary strings from the SQLite database (e.g. 10010110000101011 and 01110011011101110 correspond to two of the rows with 17 columns).<br />
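The round trip between a 17-column checkbox row and its stored binary string is straightforward; a small Java sketch of the idea (the production code is C#):

```java
// Convert a row of checkbox states to the binary string stored in SQLite
// (e.g. "10010110000101011") and back again when the grid is reloaded.
class CheckboxCodec {

    public static String toBits(boolean[] checks) {
        StringBuilder sb = new StringBuilder(checks.length);
        for (boolean b : checks) sb.append(b ? '1' : '0');
        return sb.toString();
    }

    public static boolean[] fromBits(String bits) {
        boolean[] checks = new boolean[bits.length()];
        for (int i = 0; i < bits.length(); i++) checks[i] = bits.charAt(i) == '1';
        return checks;
    }
}
```

Storing each row as one short text field keeps the schema stable even if the number of grid columns changes between versions; the string length carries the column count.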
<br />
The Setup tab, seen below, contains a list of the database files previously opened and a list of the user-specified names for saved fuzzy queries which are stored in an SQLite database. <br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiumH8a4HTe3I7_zXKAsVJe1IGKA4PMUkz_clZGToXvyiz5YMZAEy3AcYGPHo-BiVJevhvvAX9ov-6BMVfJT2djhS9guaqtuuREP2HW87BhyIaeNvZ-Az0LKpuFnodvIxEFxPAlmrktfy7X/s1600/SearchBuilderSetupTab.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiumH8a4HTe3I7_zXKAsVJe1IGKA4PMUkz_clZGToXvyiz5YMZAEy3AcYGPHo-BiVJevhvvAX9ov-6BMVfJT2djhS9guaqtuuREP2HW87BhyIaeNvZ-Az0LKpuFnodvIxEFxPAlmrktfy7X/s400/SearchBuilderSetupTab.png" width="400" /></a>The Setup Tab</div>
<br />
The strings seen in the controls in the Candidates and Tests tabs are loaded from a dump of the table and field names in each customer's contact database. Those names are combined with verbs that help construct good fuzzy statements such as "lastname.required.unaltered". <br />
<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLLu1Ym8cUQiHhDyFUR3Pg7szvGG2bY9plGqcw6bfuPuWZ6d60R9eq9gNgEEd2yKm0lUfh3Pbz5gtBzzw0RcvljLJb_lfOVcXZGs2zeVR6hhvi5iyv47mxBvNkCrpcVZSTw37CDuj7Yw3/s1600/SearchBuilderCandidatesTab.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjhLLu1Ym8cUQiHhDyFUR3Pg7szvGG2bY9plGqcw6bfuPuWZ6d60R9eq9gNgEEd2yKm0lUfh3Pbz5gtBzzw0RcvljLJb_lfOVcXZGs2zeVR6hhvi5iyv47mxBvNkCrpcVZSTw37CDuj7Yw3/s400/SearchBuilderCandidatesTab.png" width="400" /></a> The Candidates Tab</div>
DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-64554712346231473422011-06-27T16:54:00.000-07:002013-02-18T11:47:57.929-08:00Java/SQL: A Test Case BuilderWhen we look at the data we work with at Melissa Data we generally see a huge blob of a few billion records in a few files. As the program that assimilates the billions of records into a set merely 10-20% of the size of the original runs, it is pretty difficult to perceive a problem from looking at logs or inspections of the records. Even if you devote the time to a well-distributed sampling of a few million records across the dataset, you are probably peeking into less than 0.1% of the records there. No warm feeling of confidence is found there, even if we had the staff to study the few million records in that 0.1%. Add to that the fact that the assimilation application sequence uses terabytes of disk space and runs over two weeks' time each time it generates a product dataset and you'll see that we have a practical need for a way to locate and focus on problems.<br />
<br />
We needed an application that provides a means to select and process a much smaller set of records so we have a chance to view the source of problems and correct them.<br />
The Test Case Builder is an application I wrote that allows the generation of a Melissa Data dataset that contains perhaps thirty or forty records instead of a few billion. This is valuable because it requires relatively little time to complete, produces a set of data of a size that can be juggled within a person's brain, and allows for sampling that relieves our fear that we've not seen enough to be confident.<br />
<br />
The Test Case Builder is a Java class that is started from a script file at a Linux command line. A command-line parameter names the configuration file, which specifies the database location, the output location, and the directory where driver files should be placed -- and where they will be monitored by a daemon. It provides no GUI, as it is intended to run on a remote Linux server with no UI capability beyond text-based menus.<br />
<br />
Running the Test Case Builder involves preparing a couple of prerequisites so the data records are available for retrieval. First up, a Melissa Data dataset must be generated -- typically containing around 3 billion records. The second prerequisite uses another application I wrote, the IDIndexer, to comb through the huge dataset and produce a database that links the record ID of every record with the name of the file in which the record is stored and the numerical offset into the file where the record's text can be read. From there a user can drop a file into a "requests" directory containing a list of the record IDs whose record text is wanted for study. These request files usually list record IDs that were flagged as interesting during the Test department's work. <br />
The Test Case Builder monitors the requests directory and, when it finds a request file, opens it and reads the list of record IDs. For each record ID in the list it queries the IDIndexer-produced database for the name of the source file and the record's offset within it. It then opens that file, reads the record text, and writes it to a file that will serve as the input for a tiny version of an assimilation build. <br />
Each of the six phases of the assimilation process produces a set of slightly improved and refined records. At each phase any records that were combined to create a record in the new phase are stored within a field of the new child record so the ancestry can be traced. Each time a record is read by the Test Case Builder the list of its parent records is read and they, too, are recursively searched so every source of data that went into a finished product record will be included in the output file. <br />
<br />
When the Test Case Builder is done it has generated three files that contain different generations of records, and that can be opened and viewed in the Build Viewer application or can be fed to a very small dataset build process so the effects of each stage of the build can be examined and understood by the development team.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-57352280572411958112010-11-10T20:45:00.000-08:002013-02-18T11:48:20.512-08:00C#/SQL: The Record Viewer: Quick Access To 3,000,000,000 Data Records<div>
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEh5fmzWhJbahR3IIH8dpiJohJLg7i_AbViplRhkHF75g5vdIqALEfAfA8ekSRBg3Dut6cjJ-ZYAF-Qz0o2tk3DOF4H7k54muB5bykIddNC6FyaEpoaHMYlx4XfTm_BrwZ0ebLDfuPnClqmf/s1600/configurator.png"><img alt="Build Viewer Control Panel" border="0" id="BLOGGER_PHOTO_ID_5543360558178637394" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjH8nmpJCLtn4oNYMmrBeV0kuoa_5qgYu7h0pEA3FOSpKOYsC3e93DBGFtmGR4_noneUK6uJM-n_H_2I3chvJr2D9SLY0U2KHurW_wc0b5pJ-xRxW8ZcWbZ2415axRxGFO7JiqZUb6XWh90/s400/bbvControlPanel.png" style="display: block; height: 236px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
The products offered by Melissa Data are personal and business contact lists intended for use by marketing or search organizations. Melissa Data combines billions of partial records received from several sources into a smaller set of current and complete records. Marketers can use the high quality Melissa Data product to reach customers who are more likely to become actual purchasers and avoid wasteful effort and postage costs associated with attempting to contact previous or duplicate addresses. A search business can learn and provide the latest updates concerning the contact information for people or businesses.<br />
<br />
There are far too many records in play for us to accurately gauge the effects of: i) modifications to our development process; ii) changes in the sources and types of input; and iii) the software that converts the raw data to the finished output. One of my purposes at Melissa Data is to produce applications that make retrieving and viewing records practical at any stage of their assimilation.<br />
<br />
One application I wrote for those purposes was the MD Build Viewer, a C# Winforms record viewer that interfaced with both XML and MySQL backends. The Build Viewer allows list, view, search, and jump access to every one of the billions of records at each step in their production, showing their hierarchy and the changes resulting from modifications in a preceding phase. The final version of the records is read from a MySQL database, and the parent records are read from a very large flat text file in a proprietary format.<br />
<br />
Handling that amount of data shaped the Build Viewer's requirements in the form of performance goals: the UI must load access to the data (approx. 3 billion records) in under 15 seconds so the user experiences minimal delay (for perspective, the comparatively simple Linux wc shell command takes far longer than 15 seconds just to count the lines in a 3-billion-line file); switching between records, presenting parent records, and jumping to records must render in either a standard or cascading sidebar form; clicking on any record field must jump to the breakout of that record component; and right-clicking on a street address must pop up a Google map showing the address. Here are a few views:</div>
<br />
<div>
<br /></div>
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOX_Otd5ABoRNIs5tZbG7I3YhdQgzXGxTgfGMF0KLkLHpPUkXgHdNl0QURJrykKr7kRVL4_o5_b-JsRS9JJFeqDa1HNK2Vb4KSlLzec9JG7NnDoEKu6VwzBKJBJqb0DO1siWF42OIOR13X/s1600/MDBuildViewerRecordView.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565626633868547554" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjOX_Otd5ABoRNIs5tZbG7I3YhdQgzXGxTgfGMF0KLkLHpPUkXgHdNl0QURJrykKr7kRVL4_o5_b-JsRS9JJFeqDa1HNK2Vb4KSlLzec9JG7NnDoEKu6VwzBKJBJqb0DO1siWF42OIOR13X/s400/MDBuildViewerRecordView.png" style="display: block; height: 300px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
The view above shows the treeview record list and the detail of the selected record.<br />
<br />
The next shows the details of one of the individuals in that record. The yellow tree nodes are those that have been reviewed and the red ones have been read and marked via a right click menu selection for later review. The user can suppress all but the red nodes or store them to a serialized list that can be loaded in a later session. A mnemonic key sequence provides a means to jump to the next or previous red node or to a specified record number.<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT6MgRb90atggCUOEprWdS31xoOtHsI26rW0HwXdrYsLJsBNM7IoNpCltHfHV3o-76p9k_an_-ZQ0cB3q8Is0SSk2CCp_kr2Az9u1KzFBJeyK_ltN0IICcjMw2pKpkws7t7HnXHSdsRMeX/s1600/MDBuildViewerDetailView.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565627435375774658" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhT6MgRb90atggCUOEprWdS31xoOtHsI26rW0HwXdrYsLJsBNM7IoNpCltHfHV3o-76p9k_an_-ZQ0cB3q8Is0SSk2CCp_kr2Az9u1KzFBJeyK_ltN0IICcjMw2pKpkws7t7HnXHSdsRMeX/s400/MDBuildViewerDetailView.png" style="display: block; height: 300px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
Here are two shots of the sidebar view, which shows the contents of the parent records of the record selected in the treeview. In the upper shot, the parent record in the sidebar is replaced by the one just selected from the list or reached via the up or down arrow key.<br />
The lower shot shows the cascade option where the records overlap as the user clicks or accesses them via the arrow keys.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicsAag9skqAJVkHJMvNZ4g7ruP2Dp33OkhBm2rg_3KWJvqGj3s_s_u7JJSCI-vXYTEb1rXfinVWuaJU6SSQzHrwwITcnH0iTDw5Q6aGIxRxLBfU3zeueM-H2dZA7_JBDJwj5ZxiNz_rLlv/s1600/MDBuildViewerSidebarView.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565629542975251074" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEicsAag9skqAJVkHJMvNZ4g7ruP2Dp33OkhBm2rg_3KWJvqGj3s_s_u7JJSCI-vXYTEb1rXfinVWuaJU6SSQzHrwwITcnH0iTDw5Q6aGIxRxLBfU3zeueM-H2dZA7_JBDJwj5ZxiNz_rLlv/s400/MDBuildViewerSidebarView.png" style="display: block; height: 115px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2XJh2iGewfw6Mets6evJqLml-hrSvSz8AdD1ArtIcivOybt6OY69bK32l-SZ6it2MEAAu68gxZiDTDEahf_YhyphenhyphensBOrOv51VyellN2IMiELBoilQ9RoYHTh7XgGPXC7Q2IIw6k7kk_sL40/s1600/MDBuildViewerSidebarViewCascaded.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565630114647084178" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEi2XJh2iGewfw6Mets6evJqLml-hrSvSz8AdD1ArtIcivOybt6OY69bK32l-SZ6it2MEAAu68gxZiDTDEahf_YhyphenhyphensBOrOv51VyellN2IMiELBoilQ9RoYHTh7XgGPXC7Q2IIw6k7kk_sL40/s400/MDBuildViewerSidebarViewCascaded.png" style="cursor: pointer; display: block; height: 115px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
And here is one that shows a Google Map inset resulting from right-clicking on an address in the record field, and a sidebar view showing the same.<br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0lAzuOYKecBRUTnE5mnEswVPGrNZ4lxd6GTNBOXxPQr5ZBhzkKnpLA3ABDCLrVyVPV7BrR6hW0I5F15RI85BK78MZhRSeOARbAWq0lvjjCBO1OwBRADwYSUajTNE_Ei0HhVMfRkXCCgUH/s1600/MDBuildViewerGoogleView.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565633717482159666" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEg0lAzuOYKecBRUTnE5mnEswVPGrNZ4lxd6GTNBOXxPQr5ZBhzkKnpLA3ABDCLrVyVPV7BrR6hW0I5F15RI85BK78MZhRSeOARbAWq0lvjjCBO1OwBRADwYSUajTNE_Ei0HhVMfRkXCCgUH/s400/MDBuildViewerGoogleView.png" style="cursor: pointer; display: block; height: 300px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
<br />
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIzM0zdlTnv7fnDYp2KA39yucDBvlgnIUJ9JYsk3Ixy1mvR88he342hjDVXlS7nYHcWeagMdxfRiDqZyF2NDxiyIUf7G01ZiHO9XPD6N7eess5X9YN-1W0EOX0LBWGh2QKZWGfouqX_U8h/s1600/MDBuildViewerSidebarViewGoogleMap.png"><img alt="" border="0" id="BLOGGER_PHOTO_ID_5565631499040293858" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiIzM0zdlTnv7fnDYp2KA39yucDBvlgnIUJ9JYsk3Ixy1mvR88he342hjDVXlS7nYHcWeagMdxfRiDqZyF2NDxiyIUf7G01ZiHO9XPD6N7eess5X9YN-1W0EOX0LBWGh2QKZWGfouqX_U8h/s400/MDBuildViewerSidebarViewGoogleMap.png" style="cursor: pointer; display: block; height: 115px; margin: 0px auto 10px; text-align: center; width: 400px;" /></a><br />
<br />
<br />DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-35575980795641712362010-04-06T12:24:00.001-07:002013-02-18T11:48:48.715-08:00C++: Support of Windows Group Policy for AttachmateAround the time that I finished the project to rewrite the API set in Attachmate's Extra! Personal Client product I saw an increasing number of articles in the trade magazines concerning the subjects of management and distribution of applications. This was a time that straddled the shift from desktop applications to web-based. Enterprises were keen on improving their return on investment of their software dollars -- often described as reducing the enterprise-wide cost of ownership of a product.<br />
<br />
I asked my development manager if I might devote a project's time to researching this area and perhaps develop something if I ran into a good idea. After receiving his OK I obtained and installed several products including Novell's LanDesk Manager and Citrix WinFrame. In the end I felt that Microsoft had, in its Microsoft Management Console, a good candidate that I could quickly fit with Attachmate's products.<br />
<br />
At design time, based on the MMC capabilities, I saw my goals as: <br />
1) prepare the Extra! product for ease of distribution on a corporate LAN or WAN.<br />
2) prepare the Extra! product so its features list can be centrally provided or withheld from a user based on one or more of their Windows network group memberships.<br />
<br />
First up, I wrote an Administrator file for Extra! (known as an ADM file) that mapped a user's permission to access Extra!'s menu items and configuration settings. I then added code to Extra!'s frame class that obeyed the MMC protocol configured by an MMC administrator when the new extra.ADM file was used. An MMC administrator could then assign a mapping of features that should be available, disabled, or hidden to a specific user, a group of users, or the entire enterprise from his console. <br />
<br />
With that, a network administrator could prepare Windows roaming profiles for an individual, a group, or the enterprise that silently installed an Extra! product from a network server, that contained a preconfigured set of product session parameters, and a policy set that provides or prevents access to Extra! product features as appropriate for the user. <br />
<br />
Apart from the development efforts using the guinea pigs in Development, I did three different demonstrations to Product Management, QA, and Customer Support in which I brought a clean Windows laptop into the conference room -- or borrowed one from somebody in the room -- and logged in to an account that was configured to use a roaming profile. I showed the start menu items that came from the profile and also that Extra! was not installed on the PC. Then I clicked on the start menu item for Extra! which resulted in a running Extra! session showing up on the desktop about 20 seconds later. When checked, everyone attending could see that several of the menu items were disabled -- corresponding to the policy settings I set earlier in the server room while impersonating an MMC administrator. I repeated the same demonstration using a different roaming user account to show that that other user had a different set of menu items enabled and disabled.<br />
<br />
This work coincidentally was a significant contributor to Extra!'s ability to meet Windows 2000 Logo certification the next year.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-47060478451017626022010-04-05T22:07:00.000-07:002013-02-18T11:49:15.073-08:00SQL/Master Data: Information Integration at AttachmateBack in the earliest days of Attachmate's history there were three employees and the one manufacturing/sales/software development/support department. As the company began to grow new departments were formed to handle specialized tasks such as Sales, Marketing, Shipping, Software Development, Testing, and Customer Support. As each department grew they separately designed a uniquely structured Advanced Revelations (AREV) relational database to handle information relevant to their specific mission in the company. For example, Sales kept data about transactions and the customer systems involved. At the same time, but on a different database, Customer Support logged data about customer systems, problems, and missing features that, if implemented into Attachmate's product would provide for compelling sales opportunities. <br />
<br />
And as this growth was occurring Marketing was sending advertising mailers to customers for products Sales had recently sold to them and customers were frequently asked the same questions each time they communicated with a different department. That is, information in every department's database was valuable for numerous reasons if it could be made available to the other divisions. I was given the opportunity to devise a way to organize and manage the different departments' databases so accurate, consistent, and up-to-date information would be available to everyone. <br />
<br />
Going in, I expected that my goal was to decide the best technical design for an end-product data warehouse. However, it turns out that establishing nth level normalization is not the biggest challenge in this type of an endeavor. I realized early on that I was put in charge of this project because several months of weekly meetings attended by the nine department heads involved had failed to pound out a well-normalized corporate database. Each of the nine departments with an AREV database had a lot of stored data in their database, but each was constructed in a dialect that reflected the needs and understanding of the types of people who worked in the department. That is, while much of the data was exactly the same from one system to the next, there was a lack of what are known as "unique identifiers" shared by all of the databases to identify a field type. For example, a field in one department's database might cover multiple fields in another department's. Another common problem was where a field in one department's database might concern the same data, but be named differently, and some departments had developed a serious religion about their choice of field names.<br />
<br />
The steps I chose to reach an agreement involved producing a candidate list of the fields that emerged from running historical queries against all of the databases using each department's terminology. To avoid the head-butting that had occurred before my entry to the project I submitted in series -- specifically not in parallel -- the candidate table and field list to each of the department managers involved. (If any of you managers involved are reading this, my strategy was to gain managers' acceptance in order from the least agreeable of you to the most. ;-) ). Each time a change was needed while discussing the list with one manager, I would polish the changes in and start over at the first manager, successively gaining approval until the job was done. Yes, at the start it looked like an O(n^2) endeavor, but happily the managers turned out to be agreeable and it was finished in O(n) time.<br />
<br />
From that effort, together with the purchase of a newer web-based database engine, the goal of creating one master corporate database was achieved. It provided a lot of strategic information, made clear what tactics worked well from one project or product lifetime to the next, and allowed us to optimally manage our efforts in ways that certainly played a role in Attachmate's later success.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-13084427603032900012009-07-09T19:26:00.000-07:002013-02-18T11:50:20.014-08:00C++/Java Invocation API: Java App Meets Vista Logo Certification?Attachmate wanted to prepare an update to their Windows-native product, Reflection X, in Java so it could run on several previously unsupported operating systems. At the same time, another significant goal came from Attachmate's Marketing department, which specified that Reflection X must pass Microsoft's Vista Logo Certification.<br />
<br />
Vista Logo requires a candidate application to meet more than 30 Microsoft-specified requirements that are to be tested by a third-party validator, in this case Veri-Test. Two items from the list of requirements are 1) the application must provide a fully capable MSI installer; and 2) the application must conform to the Windows Error Reporting (WER) initiative. In a nutshell, the WER requirement is met when 1) serious runtime exceptions are recognized by the application; 2) the user is warned via a common Windows dialog that asks if the problem should be reported to Microsoft; 3) if the user selects OK, a minidump is transmitted to the Microsoft WER site, where it can be bucketed for statistical purposes and from where the user can be redirected to a website that provides further information on a workaround or product update; and 4) an application restart action can be started.<br />
<br />
I prepared a WiX MSI installer that installed the Reflection X jar files, documents, graphic images, language resources, and JVM, and then installed and started the Reflection X Server, the underlying Windows service. Difficulty with the WER requirement came from the fact that the JVM wraps its internal threads in structured exception handlers (SEH). That means that any serious or fatal Java exception that originates in Reflection X -- the kind that should be reported via the WER -- would be caught by the JVM, never making its way to a WER handler in RX's Java launcher application.<br />
<br />
The interesting solution to this problem was found in the precedence given to different exception handler types. As mentioned above, a SEH is defined to surround each Java thread and therefore catches all exceptions that originate within the thread. If an exception is not handled in an application's SEH then the JVM will eventually catch it with one of its own handlers, ensuring that none escape to an interested WER manager. Another handler type, a Vectored Exception Handler (VEH), is given the first look by the operating system at every exception thrown within the application, before the exception reaches any SEH. I added a VEH to Reflection X's Java launcher application that evaluates every exception Reflection X generates and reports serious or fatal failures to the WER before passing them on to the SEH that should eventually handle it.<br />
<br />
As a result of this design, and I think much to Microsoft's chagrin, Reflection X 2008 is the only Java application to earn Vista Logo Certification.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-7465114884059791932009-01-27T22:58:00.000-08:002013-02-18T11:50:40.833-08:00C: Rocket System ReliabilityI started working at Olin Rocket Research as an applications programmer soon after I graduated from the University of Washington. I was hired to maintain existing engineering applications that were written in Basic, Fortran, or Lotus macros for the Chemical and Reliability Engineering departments. <br />
Among the projects I was assigned was the update of a program that calculated the reliability of a rocket system. The program had been originally written by a mechanical engineer more than ten years earlier and then modified several times each year as needed for each new system variation. <br />
The calculation for reliability of a hardware system is similar to that for resistance of an electrical system in that serial and parallel component sequences must be identified and counted differently. The calculated reliability of a <u>series</u> of hardware components is simply the product of the set of serial components' tested reliability ratings. The calculated reliability of <u>parallel</u> components is a little trickier -- too simply stated as 1 - [(1-pc1)(1-pc2)] for a simple single redundancy system with parallel component reliability values of pc1 and pc2. <br />
The problem as I saw it was twofold. First, since the components of each different system were different in number and type -- and each different component commonly possessed a different reliability value -- the original Fortran program had to be recompiled with a new data set each time a new system was to be evaluated. The other part of the problem was that each system that had a different number of components and a different redundancy arrangement had to be described in a different way in the code. That is, each time the hardware system changed the calculation sequence had to be rewritten to reflect the new serial and parallel layout and then recompiled. That easily could be seen to have wasted months of effort over the program's lifetime. <br />
I saw that the program needed to be changed so that it would no longer be necessary to recompile every time a new system was to be evaluated. I chose to drive the program with a table described in a data file that provided system shape and component reliability values. I also saw that a switch to the C programming language was necessary because Fortran did not offer dynamic memory allocation needed for variably-sized systems nor the recursion that would be required for handling all possible shapes of parallel circuits. <br />
The next day, in a couple of hours, I replaced the 2000 lines of Fortran source code with a 50-line, table-driven recursive C-language version. From that time on, all a mechanical engineer had to do to calculate a system's reliability was feed its shape and its components' reliability values into a data file and run the program against it. No code changes or recompilation were needed from then on.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0tag:blogger.com,1999:blog-569976081798388358.post-56055476078745739122009-01-27T20:38:00.000-08:002013-02-18T11:50:53.348-08:00C: Cubic Splines InterpolationI started working at Olin Rocket Research as an applications programmer soon after I graduated from the University of Washington. I was hired to maintain existing engineering applications that were written in Basic, Fortran, or Lotus macros for the Chemical and Reliability Engineering departments. <br />
Among the projects I was assigned was to perform data reduction on about seventeen years of test laboratory data that tracked, over time and temperature change, the reaction with hydrazine rocket fuel of different metals that might be used in rocket fuel tanks, plumbing, or nozzles. The product of the work was to be two-dimensional line charts that showed the measure of the reaction recorded -- the amount of gas generated. <br />
The problem with the tool available at the time, Lotus, was that it rendered line charts in jagged, sawtooth patterns. The extremes of those charts reflected measured, accurate values, but nearly all of the points in between were not accurate. The slope of the curve approaching and leaving a naturally occurring measured data point is almost never sharply steep, flat, and then sharply steep in the other direction in the span of three infinitesimally small data points. Instead, the slope approaches and leaves zero comparatively slowly at each maximum or minimum point, and curves gradually as it connects them.<br />
To improve the quality of the visual rendering I wrote a new program that generated a large number of additional data points using cubic spline interpolation on the lab-measured data points. That involved predictive calculation of a more natural first and second derivative on both sides of each inflection point based on the slope of the surrounding lines' midpoints. With the new data points the chart would be redrawn using many more data points than those actually measured, which would effectively correct the sawtooth pattern with curves, much like the effect of using a French curve drafting tool. Each line reflected what near-instantaneous measurement would've shown.DonChttp://www.blogger.com/profile/07656194513226077338noreply@blogger.com0