Saturday, January 24, 2009

C++/SQL: Speedbar State Machine

Speedbar Case Study

Speedbar State Machine Case Study
Object Oriented Analysis - Winter 2005
Don Cannon
March 14, 2005


[Figure: Supervisor/Agent Recorder/Player dialog with the Speedbar]
Problem Statement
My former employer had a product on the market whose engineering pace, process maturity, and team culture commonly resulted in Windows blue screen failures. When the company's Director of Development, my development manager at that job, first demonstrated the product to me, it blue screened four times during the demo (unintentionally). Each time, the PC had to be shut off and restarted, losing many of the application's volatile configuration settings. I had been hired to stabilize the product overall, with eliminating these blue screens as the highest priority.

Product Overview
The UI component of the application that exhibited this behavior was a custom toolbar called "the Speedbar". The Speedbar controlled the recording, playback, and editing of a combination of PC screen video and telephone audio captured during a call between a call center agent and a customer. It presented the function buttons needed to create an audio/video recording: video recording, audio recording, combined video and audio recording, playback, pause, fast forward, fast rewind, bookmark, annotate, and append. Each button could be displayed in one of four states: enabled, disabled, enabled and flashing, or disabled and flashing. The recorded files were stored on a network server in a relational database so that the product administrator could control access to them.

The Speedbar object's implementation held pointers to various kinds of objects, such as functions, the database connection, and recording files. Because these objects were grouped and stored in different network locations, some of them were occasionally unavailable, either during particular activities the Recorder performed or because of network fluctuations. It is therefore critical that these pointer variables be guarded so that no property or method can be invoked through them when the actual object is not available or after it has been destroyed. The primary failure of the Speedbar object was its propensity to access null objects in the software service's process space.

Analysis
The enabled/disabled/flashing state of each button on the Speedbar was determined by approximately 30 variables, most of which were declared static. An example is a Boolean named bPlaying, which recorded whether a recording was currently being played back. A preliminary search of the source files found that these variables were assigned values in approximately 2,000 locations across the product, spread over several different classes and three concurrently running modules. When I inspected the logic that combined all of these variables to decide the current enabled or disabled state of each Speedbar button, it was immediately clear that their volatility produced a level of complexity far beyond what anyone could reason about. The code was so complex that no one could credibly conclude that it was ready for production. A preliminary McCabe cyclomatic complexity measurement of the mechanism that made the enabled/disabled decisions yielded a value of 57, which characterizes it as unmaintainable.
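For readers unfamiliar with the metric, McCabe's cyclomatic complexity M of a routine's control-flow graph with E edges, N nodes, and P connected components is given below. For a single routine (P = 1) whose decisions are all two-way branches, it reduces to the number of decision points, written here as π, plus one:

```latex
M = E - N + 2P, \qquad P = 1 \;\Rightarrow\; M = \pi + 1
```

Read this way, a value of 57 means roughly 56 binary decisions inside one mechanism, and 57 linearly independent paths through it that a reviewer or tester would have to account for.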

Solution
Once the nature and extent of the complexity were understood, it became clear that the key to the solution was to reduce this multi-dimensional problem to something that could be more easily understood. I chose to reduce it to two dimensions, states and actions, by modeling it as a finite state machine. In this model, each feasible and unique combination of variable values became a node in a connected, directed graph, and each edge represented the transition to the next node caused by invoking a specific action. A node commonly had multiple outgoing edges, one for each button or runtime action enabled in that state. It was equally common for a node to have multiple incoming edges, meaning that several different states could lead to it when the right action was invoked. To implement the plan, I took the following steps:
    1. Each unique and feasible combination of variable values was given a state name.

    2. Every action capable of moving the machine from one state to another was given an action or event name. Most of these were simply button presses; the others were runtime events such as a preset recording length elapsing, a recording playing to its end, or the connection to the recording database being lost.

    3. The names of all states discovered during analysis were listed across the X-axis of an Excel spreadsheet, and the names of all actions identified in the analysis were listed down its Y-axis. In the cell at the intersection of each state and action, I entered the state that should result when that action is fired while in that state. The matrix was fairly sparse: a large fraction of its cells represented infeasible state/action combinations.

    4. The matrix was sent to QA for review and then used as the basis for their Speedbar test suite.

    5. The completed state/action matrix was copied into a header file for a new state machine class. When the class was compiled, that data became a matrix that implemented the logic specified in the spreadsheet.

    6. Every statement that assigned a value to one of the independent variables was replaced with a call that fired the corresponding action to the state machine.

    7. Whenever an action fired to the state machine changed the state, a procedure uniquely named for the resulting state was called to explicitly set the enabled/disabled state of every button. At project completion there were seventy-two unique states and twenty-four actions or events that could trigger a transition from one state to another.


Summary
Once the state machine was completed and integrated, the blue screens stopped, and the module's McCabe complexity dropped to 14. The context of any given enable/disable decision in the code was (almost) immediately obvious to an engineer, so from then on, modifying the code correctly required relatively little thought.

