Monday, November 17, 2014

The Area Browser

A significant part of my current job involves converting data obtained from many different sources into a standard format that uses the terminology our customers expect. One example is the variety of ways we receive our international data. Across our different international data sources we see different standards for letter casing, mixtures of national and language culture, different choices of area name modifiers, and entity duplication. To deal with that we built an application, known as the Area Browser, with a cohesive set of features that help a geographic area specialist merge the various versions of each unique data point into the value our paying customers prefer.

As seen just below, the Area Browser presents two tree views of the same data, which let the subject expert line up data for merging or moving so it is logically organized. Features include several ways to move or merge tree nodes, including moving multiple nodes at once; moving or merging like-named nodes; searching and matching that is wise to culture, casing, and transliteration; assigning the hierarchy of area level names appropriate for the country; and manual creation, deletion, and editing of nodes.

The Area Browser

Menu items show available actions:
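The culture-wise matching mentioned above can be sketched roughly as follows. This is a minimal illustration, not the Area Browser's actual implementation; it assumes the normalization is Unicode case folding plus diacritic stripping.

```python
import unicodedata

def normalize_area_name(name: str) -> str:
    """Normalize an area name for culture-insensitive comparison:
    case-fold (which handles locale quirks plain lower() misses),
    then decompose and strip combining diacritics so that, e.g.,
    'Sao Paulo' matches 'São Paulo'."""
    folded = name.casefold()
    decomposed = unicodedata.normalize("NFD", folded)
    return "".join(c for c in decomposed if not unicodedata.combining(c))

def names_match(a: str, b: str) -> bool:
    """True when two area names are the same after normalization."""
    return normalize_area_name(a) == normalize_area_name(b)
```

A matcher like this lets like-named nodes line up across sources even when one feed ships "SAO PAULO" and another ships "São Paulo".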

The Preference Selector, shown below, is accessed via a mnemonic or menu item and resolves terminology differences that cause duplication in the data set below a selected tree level. It provides lists of areas for which no preference has yet been set, and of areas with duplicate preferences. It offers the means to specify terms that should be preferred or avoided, the letter casing in which a preference should be applied, and whether candidate terms should match as substrings or as whole words.

The Preference Selector
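The casing and substring/whole-word options described above might look something like this sketch. The rule shape here is hypothetical, not the tool's actual schema.

```python
import re
from dataclasses import dataclass

@dataclass
class PreferenceRule:
    # Hypothetical rule shape for illustration only.
    term: str             # term to prefer or avoid
    prefer: bool          # True = preferred, False = avoided
    case_sensitive: bool  # honor letter casing when matching
    whole_word: bool      # match whole words rather than substrings

def rule_matches(rule: PreferenceRule, area_name: str) -> bool:
    """Check whether a preference rule applies to an area name,
    honoring the rule's casing and whole-word settings."""
    flags = 0 if rule.case_sensitive else re.IGNORECASE
    pattern = re.escape(rule.term)
    if rule.whole_word:
        pattern = r"\b" + pattern + r"\b"
    return re.search(pattern, area_name, flags) is not None
```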

Monday, September 22, 2014

Performance profiling

When my team finished the basic development work on a new set of seven APIs for Attachmate's flagship product, EXTRA! Personal Client, we received the unwelcome news that many methods in our libraries had slower execution times than our chief competitor's. This set off two parallel efforts to diagnose and solve the problem so we would have not only the most reliable and portable APIs, but also the best-performing APIs in the industry. I handled one approach; product management worked together to resolve it their way.

The APIs were written as DLL libraries to be loaded by our customers' homegrown client applications, which could then automate function sequences on a remote EXTRA! session that relayed the automation conversation to some type of mainframe computer. Three primary architectural components of our overall product determined how it performed compared to the competition. First, there was the speed of the calls inside the DLL. Most of the library methods were little more than pass-thru methods that relayed a set of parameters to the EXTRA! session, where the processing would occur. A cursory McCabe evaluation suggested that almost no execution time was spent in the library. Another component was the EXTRA! terminal application, which performed a fair amount of mainframe session processing within itself, but also in network communications with the actual mainframe computer. The last of the main components was the remote procedure call (RPC) between the library and the EXTRA! session. This was another DLL, where the parameters and method ID would be marshalled and sent across the wire from the client DLL to the EXTRA! terminal application. Because the way this was implemented -- a COM bridge between the two modules -- was new to the Attachmate culture, the village cried out, "WITCH!", and all of product and marketing management devoted their efforts to replacing it with something better understood. A staff programmer stated that he could solve all of our problems by replacing the offending COM bridge with a mapped-memory implementation.

My route involved using a profiler to learn how much execution time, and what fraction of it, occurred in each API component -- method by method. In a nutshell, the average run-time percentages in the three API components came in around this:

5% in the DLL while running in the client application
5-15% in the RPC -- depending on which method was used (and how much data was transferred)
80-90% in the terminal

I reported that even if we drove our API execution time to -1% of each call's duration, we could never improve performance by more than 16% on any single call. And, on average, they could not expect greater than around 7% improvement -- again, with execution time improved to -1% of a call's duration. If they would devote even a small amount of effort to the big problem, the terminal application, they would realize a much greater benefit. I produced a table showing the ultimate possible improvement for each method together with the expected improvement, and stored it in the QA archives so it could be validated at some later time.
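The arithmetic behind those bounds is Amdahl-style reasoning: optimizing one component can never save more than that component's share of execution time. A back-of-the-envelope check, using the measured fractions above (the function itself is my illustration, not a tool we used):

```python
def max_improvement(component_fraction, residual_fraction=0.0):
    """Upper bound on total run-time reduction from optimizing one
    component: the untouched components keep their full share, so the
    savings cannot exceed the component's own fraction of run time."""
    return component_fraction - residual_fraction

# Worst-case call: 15% of execution time in the RPC layer. Driving it
# clear down to an impossible -1% of the call yields at most 16%.
single_call_bound = max_improvement(0.15, -0.01)

# A claimed 90% total reduction, with only the RPC layer (about 5% of
# a typical call) changed, implies negative run time in the new layer:
implied_rpc_time = 0.05 - 0.90               # -0.85 of the old total
implied_multiple = implied_rpc_time / 0.05   # about -17x the old RPC time
```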

Product management, swayed by the hysterics of the other side, devoted two years' effort to preparing an alternate RPC DLL, and then released it with the product. They claimed, internally and in the marketing propaganda, that tests had shown up to a 90% reduction in API execution time. Considering that they never touched the component where, on average, 85% of the execution time occurred, that was amazing -- as if the new RPC DLL used -17 times as much processing time. They didn't listen to my remark that COM is simply Microsoft's implementation of mapped memory, exercised millions of times every minute around the world, and therefore likely more reliable and efficient than anything we could produce to replace it.

I didn't mention before that after the second release there were almost zero bugs reported against the COM-based API set for its product lifespan -- which continues to this day. The mapped-memory implementation, known as "Enhanced Transport," however, was continually racked with issues, also continuing to this day. During a particularly slow development period after I'd left Attachmate, I heard that Marketing decided that making Enhanced Transport the default RPC mechanism would be a good improvement to feature in the advertisements. It actually caused major fallout by damaging the reliability of the API itself. A follow-up release became necessary to repair the damage to clients' operations and to Attachmate's corporate reputation.

I rejoined Attachmate 2-1/2 years later and found that Paul Riggs, a tester in QA, had eventually performed a formal comparison of the two RPC mechanisms, about two years after the Enhanced version's release. His tests found exactly what my profiling had predicted. Paul told me that he could lay his report over mine, hold them up to the light, and they were nearly identical.

This is an example of how the use of good software evaluation tools can lead you to the correct solution and help you make your case when faced with determined and ranking opposition.

Sunday, September 21, 2014

Windows Phone App: Residents

I was asked to prepare a simple prototype Windows mobile app that a political candidate could use while canvassing neighborhoods on foot. The application was to supply him with the names of the people living in the home where he was about to knock on the door, by querying our online database. If our demographic record contained it, we would also provide information on the primary resident's political contribution history.

The application ran in two modes. In the first, the mobile device's onboard GPS provided the holder's coordinates, which were checked against our database for a geocode-to-residence match. The second mode let the user type the address, for cases where our geocode lacked sufficient precision in an area to return resident information.
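The two-mode dispatch can be sketched as below. The resolver callables stand in for our online database queries, and every name here is illustrative rather than the app's actual API.

```python
def lookup_resident(query, by_coords, by_address):
    """Dispatch a canvassing lookup. Mode 1: GPS coordinates supplied
    by the device. Mode 2: a typed street address, for areas where the
    geocode isn't precise enough. `by_coords` and `by_address` stand
    in for the online database calls."""
    if "lat" in query and "lon" in query:
        return by_coords(query["lat"], query["lon"])
    if "address" in query:
        return by_address(query["address"])
    raise ValueError("need GPS coordinates or a typed address")
```

Injecting the resolvers keeps the mode-selection logic testable without a live database connection.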

Wednesday, June 11, 2014

ASP.NET MVC 4 Reporting Site

Recently I've been working on preparing some views into our data that the user can easily access and customize, and that the project administrator can easily keep updated. I've prepared a few internal WebForms and MVC websites using the Entity Framework and LINQ to SQL. Here is a screen shot of the LINQ-to-SQL site, which supports paging, filtering, and sorting of data stored in an MSSQL database.

I also did this WebForms version, which provides a couple more features and pulls its data from an MSSQL database. The first view is of the options page, and the second shows one of the result pages.
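The paging, filtering, and sorting pipeline these grids issue is essentially Where/OrderBy/Skip/Take. An in-memory Python sketch of that query shape (not the site's code, which is LINQ to SQL against MSSQL):

```python
def page_results(rows, *, filter_fn=None, sort_key=None,
                 descending=False, page=1, page_size=20):
    """Filter, sort, then slice one page of results -- the same
    Where / OrderBy / Skip / Take pipeline a data grid issues."""
    if filter_fn:
        rows = [r for r in rows if filter_fn(r)]       # Where
    if sort_key:
        rows = sorted(rows, key=sort_key, reverse=descending)  # OrderBy
    start = (page - 1) * page_size                     # Skip
    return rows[start:start + page_size]               # Take
```

With LINQ to SQL the same pipeline is translated into a single SQL statement, so only the requested page crosses the wire.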

Thursday, February 20, 2014

File to Database Converter

Transfers data from the variety of configuration and data files used in the development of Melissa Data's international product (now in beta) to an encrypted SQLite database that is delivered to customers.
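The core of such a converter can be sketched as below. Table and column names come from each input file's header row; the encryption layer on the delivered database (e.g. SQLCipher) sits outside stock `sqlite3`, so it is not shown, and the sample data here is invented for illustration.

```python
import csv
import io
import sqlite3

def load_file_into_db(conn, table, reader):
    """Copy rows from a delimited config/data file into a SQLite
    table, creating the table from the file's header row."""
    rows = list(reader)
    header, data = rows[0], rows[1:]
    cols = ", ".join('"%s"' % h for h in header)
    marks = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, cols))
    conn.executemany('INSERT INTO "%s" VALUES (%s)' % (table, marks), data)
    conn.commit()

# Demo with an in-memory database and a fabricated two-row file.
conn = sqlite3.connect(":memory:")
sample = io.StringIO("code,name\nDE,Germany\nFR,France\n")
load_file_into_db(conn, "areas", csv.reader(sample))
```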