Monday, September 22, 2014

Performance profiling

When my team had finished the basic development work for a new set of seven APIs for Attachmate's flagship product, EXTRA! Personal Client, we received the unwelcome news that many methods in our libraries offered slower execution times than our chief competitor's.  This set off two parallel efforts to diagnose and solve the problem so that we would have not only the most reliable and portable APIs, but also the APIs offering the best performance in the industry. I handled one approach, and product and project management pursued the other their own way.

The APIs were written as DLL libraries to be loaded by our customers' homegrown client applications, which could then automate function sequences on a remote EXTRA! session that relayed the automation conversation to some type of mainframe computer. Three primary architectural components in our overall product determined how it performed against the competition.  First, there was the speed of the calls inside the DLL itself. Most of the library methods were little more than pass-through methods that relayed a set of parameters to the EXTRA! session, where the processing would occur.  A cursory McCabe evaluation suggested that almost no execution time was spent in the library.  Another component was the EXTRA! terminal application, which performed a fair amount of mainframe session processing within itself, as well as the network communications with the actual mainframe computer. The last of the main components was the remote procedure call (RPC) between the library and the EXTRA! session.  This was another DLL, where the parameters and method ID would be marshalled and sent across the wire from the client DLL to the EXTRA! terminal application. Because the way this was implemented -- a COM bridge between the two modules -- was new to the Attachmate culture, the village cried out, "WITCH!", and all of product and marketing management devoted their efforts to replacing it with something better understood.  A staff programmer stated that he could solve all of our problems by replacing the offending COM bridge with a mapped memory implementation.
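To make the pass-through architecture concrete, here is a minimal sketch of what a typical library method looked like in spirit. The names and the ISessionBridge interface are hypothetical, not the actual EXTRA! source: the exported call does little besides forwarding its parameters over the assumed COM bridge to the session.

    // A minimal sketch of the pass-through pattern described above, with
    // hypothetical names -- not the actual EXTRA! code.  The client-facing
    // DLL does almost no work itself; it marshals the call and forwards it
    // over the COM bridge to the EXTRA! session, where processing occurs.
    #include <windows.h>

    // Assumed COM interface exposed by the RPC bridge (illustrative only).
    struct ISessionBridge : public IUnknown
    {
        virtual HRESULT STDMETHODCALLTYPE InvokeMethod(
            long sessionId, long methodId, BSTR params, BSTR* result) = 0;
    };

    // Obtained via CoCreateInstance during library initialization (omitted).
    static ISessionBridge* g_bridge = nullptr;

    // A typical exported API call: validate the arguments, forward, return.
    extern "C" __declspec(dllexport)
    HRESULT GetScreenText(long sessionId, BSTR* text)
    {
        if (text == nullptr)
            return E_POINTER;
        if (g_bridge == nullptr)
            return E_FAIL;

        // Nearly all of this call's wall-clock time is spent past this point:
        // in the RPC hop and, above all, inside the terminal application.
        const long METHOD_GET_SCREEN_TEXT = 42;   // illustrative method ID
        return g_bridge->InvokeMethod(sessionId, METHOD_GET_SCREEN_TEXT,
                                      nullptr, text);
    }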

My route involved using a profiler to learn, method by method, what fraction of execution time actually occurred in each API component.  In a nutshell, the average share of execution time across the three components came in around this:

5% in the DLL while running in the client application
5-15% in the RPC -- depending on which method was used (and how much data was transferred)
80-90% in the terminal

I reported that even if we cut our API execution time to roughly 1% of each call's duration, we could never improve performance by more than 16% on any single call.  And, on average, they could not expect more than about a 7% improvement -- again, assuming execution time was cut to roughly 1% of a call's duration.  If they would devote even a small amount of effort to the big problem, the terminal application, they would realize a much greater benefit. I produced a table showing the ultimate possible improvement for each method together with the expected improvement, and stored it in the QA archives so it could be validated at some later time.
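The arithmetic behind that table is simple Amdahl's-law reasoning. The sketch below illustrates it with made-up per-method shares rather than the original profile data: whatever fraction of a call's time is spent in the DLL and RPC is a hard ceiling on how much faster the whole call can possibly get.

    // A back-of-the-envelope check of the bound above (Amdahl's-law style
    // reasoning), using illustrative numbers, not the original profile data.
    #include <cstdio>

    // If a fraction `apiShare` of a call's time is spent in the parts we
    // could optimize (the DLL plus the RPC), and that work is cut to about
    // 1% of the call, the whole call can get no more than
    // `apiShare - 0.01` faster.
    double bestPossibleImprovement(double apiShare, double residual = 0.01)
    {
        return apiShare - residual;
    }

    int main()
    {
        // Hypothetical per-method DLL + RPC shares, not the measured values.
        const double shares[] = { 0.10, 0.15, 0.20 };
        for (double s : shares)
            std::printf("API share %2.0f%% -> at most %2.0f%% faster overall\n",
                        s * 100.0, bestPossibleImprovement(s) * 100.0);
        return 0;
    }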

Product management was swayed by the hysterics of the other side; they devoted two years' effort to preparing an alternate RPC DLL, and then released it with the product.  They claimed internally, and in the marketing propaganda, that tests had shown up to a 90% reduction in API execution time. Considering that they never touched the component where, on average, 85% of the execution time occurred, that was an amazing claim -- as if the new RPC DLL used -17 times as much processing time as the old one.  They didn't listen to my remark that COM is simply Microsoft's implementation of mapped memory, and that it was being exercised millions of times every minute around the world, so it was likely more reliable and efficient than anything we could produce to replace it.

I didn't mention earlier that after the second release there were almost zero bugs reported on the COM-based API set for its product lifespan -- which continues to this day.  The mapped memory implementation, known as "Enhanced Transport," however, was continually wracked with issues, also continuing to this day.  During a particularly slow development period after I'd left Attachmate, I heard that Marketing decided that making Enhanced Transport the default RPC mechanism would be a good improvement to feature in the advertisements.  It actually caused major fallout by damaging the reliability of the API itself, and a follow-up release became necessary to repair the harm to clients' operations and to Attachmate's corporate reputation.

I rejoined Attachmate 2-1/2 years later and found that Paul Riggs, a tester in QA, had eventually performed a formal comparison of the two RPC mechanisms, about two years after the Enhanced version's release.  His tests found exactly what my profiling predicted.  Paul told me that he could lay his report over mine, hold them up to the light, and they were nearly identical.

This is an example of how good software evaluation tools can lead you to the correct solution and help you make your case when faced with determined opposition from those who outrank you.

Sunday, September 21, 2014

Windows Phone App: Residents

I was asked to prepare a simple prototype Windows mobile app that a political candidate could use while canvassing neighborhoods on foot. The application was to supply him, by querying our online database, with the names of the people living in the home whose door he was about to knock on. If our demographic record contained it, we would also provide the primary resident's political contribution history.


The application ran in two modes. In the first, the mobile device's onboard GPS provided the holder's coordinates, which were checked against our database for a geocode-to-residence match. The second mode allowed the user to type the address, for cases where our geocoding wasn't precise enough in an area to return resident information.
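Stripped of the UI, the lookup flow amounted to the little decision below. This is only a conceptual sketch with hypothetical names -- the real prototype was a Windows Phone app -- assuming two back-end queries, lookupByGeocode and lookupByAddress, against our online database.

    // Conceptual sketch of the two lookup modes, with hypothetical names;
    // not the shipped prototype code.
    #include <optional>
    #include <string>

    struct Coordinates { double latitude; double longitude; };

    struct Resident
    {
        std::string name;
        std::string contributionHistory;   // present only when our record has it
    };

    // Assumed queries against the online demographic database (stubbed here).
    std::optional<Resident> lookupByGeocode(const Coordinates&) { return std::nullopt; }
    std::optional<Resident> lookupByAddress(const std::string&) { return std::nullopt; }

    // Mode 1: use the device's GPS fix.  Mode 2: fall back to a typed address
    // when the geocode match isn't precise enough to identify the residence.
    std::optional<Resident> findResident(const std::optional<Coordinates>& gpsFix,
                                         const std::string& typedAddress)
    {
        if (gpsFix)
        {
            if (auto hit = lookupByGeocode(*gpsFix))
                return hit;
        }
        if (!typedAddress.empty())
            return lookupByAddress(typedAddress);
        return std::nullopt;
    }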