
But it worked fine on my machine!

Published September 6, 2009 | By Andy Kramek

How many times have you thought that? Or even worse, actually heard yourself saying it?
Come on, now, be honest!

When you hear that, in the context of an application, or some screen or function within an application, what does it really mean? Basically it means that, whatever "it" is, it wasn't properly tested. However, before addressing the question of how we can avoid finding ourselves in this situation, let's examine the question of "testing" in a little more detail. What exactly do we, as applications developers, understand by testing?

Now, first, a caveat! This is a topic on which whole books have been written (I liked "Software Testing - A Craftsman's Approach" (Third Edition) by Paul C. Jorgensen) and I am certainly not going to try to get into a comprehensive discussion of testing and testing strategies in this article. However, we can reduce the issue to a number of discrete levels at which testing of any new piece of software should be done. These are:


  • Unit/Module: This is the initial test, whose purpose is to ensure that the code does what it is supposed to do without error. This should be done at the smallest possible functionally complete item (i.e. the form or class level) and is normally, entirely, the responsibility of the developer
  • Integration: The purpose of integration testing is twofold; first, to ensure that the code continues to function when integrated with other components of the application and, second, that the addition of the new code does not cause problems for other components. This should be done at the lowest level of existing functionality that is directly impacted by the new code (i.e. the calling component or option) and is normally, primarily, a developer responsibility
  • Regression: The objective of regression testing is to ensure that all existing parts of a system continue to function to specification after the addition of some new component or sub-system. This should be carried out at the highest level of applicable functionality (i.e. the application as a whole). There are several types of regression testing, of which the two most common are "sanity testing" (which checks for unexpected or bizarre behaviors) and "smoke testing" (which tests basic functionality). Irrespective of implementation, the essence of regression testing is to ensure that existing functionality remains unchanged by the addition of new functionality. This is, essentially, a QA function, although smoke testing by developers should always be an essential step in the build process (i.e. to ensure that the build was successful)
  • User Acceptance (UAT): This is the acid test for any application or system. The objective is to put the system into a "production" environment and have "users" work on the system in as realistic a fashion as possible. Often this is done in a special QA environment; sometimes it is done by actually installing the software on the client's hardware. Either way, this is not normally within the purview of the developer, but it is often the point at which the dreaded phrase "it worked fine on my machine" is heard from the developer

So what is it that causes the "it worked fine on my machine" reaction? While there are many possible reasons for something failing to perform as expected, basically they come down to one of two things. Either:


  • The functionality was only tested to ensure that it "worked" (i.e. that it did what it was designed to do under the conditions in which it was developed). This usually indicates a failure at the lowest level of testing because some condition, or combination of conditions, was not explicitly handled by the code
  • The data with which it was tested did not accurately represent the data against which it really had to work. This represents a failure at all levels of testing above the initial unit/module test because there is, at the end of the day, little point in testing anything unless the data with which you are working is realistic

There is nothing really that I can say, in general terms, about the first of these issues. The requirements for defensive coding, graceful and comprehensive error handling and thorough testing are well understood. Unfortunately, all too often they are 'more honour'd in the breach than the observance' (Hamlet, Act 1, Scene 4, by William Shakespeare).


  • Literary note: Despite common usage, as with many sayings taken out of context from Shakespeare's writings, the meaning is actually quite different from that which we usually take it to be. In this case, it is almost the exact opposite! Hamlet is really saying that it would be more honorable to discontinue the king's practice of holding drunken parties than to go along with, and thereby condone, it.

The second, however, is something that we can address. We absolutely must ensure that we have realistic data, both in terms of quality and quantity, to test against. Here are a couple of examples of what can happen if we don't:

Several years ago, Marcia and I worked on an application (built by someone else) whose initial interface was an empty screen into which the user typed criteria for finding a client. When the search string was submitted the application displayed the standard Windows "searching" animation (you know, the one where a torch moves back and forth, illuminating folders) and, after a few seconds, the screen would be populated with the result set. So what's wrong with that, I hear you ask? Well, nothing really, except that in the test data there was only one client. The fact that it took 'a few seconds' to return the result was, surely, an indication that something was amiss.

It turned out, on investigation, that there was a three-second wait programmed into the code. Probably it was added by the original developer to test the animation, but somehow it got left in the final version of the code that got checked into source control! Had someone correlated the observed behavior with the size of the dataset this would have been picked up immediately – but no-one did.

Another example comes from a very large company that had a process that calculated vacation entitlement for each employee based on their years of service and actual vacation/leave of absence time taken in the previous 5 years. This was a complex piece of processing (involving processing the timesheet history for the entire 5-year period) but, in the development environment, it worked and gave verifiably correct answers in about 12 seconds, which was, for an occasional process, considered to be acceptable. However, after a couple of years in production the same process on the live system was taking up to 45 minutes – which was not acceptable!

So what was the difference? In a word, volume!

In the test environment there were just under 34,000 rows in the timesheet database but, in production, there were almost 10,000,000 (the timesheet system generated two records per day for each employee, so for five years of history for 2,500 or so employees you get around ten million rows). This is a classic example of a test environment that simply was not realistic! The 34,000 rows represented about 7 days' worth of timesheet data for the 2,500 employees (less than 0.35% of the production volume) – hardly a representative sample for a process that had to run against 5 years' worth of data! So while the code worked acceptably with the test data, it totally failed in production when confronted with real data volumes.

The cause (not surprisingly) turned out to be poorly optimized code. Using the Coverage Profiler in the test environment we could see that the bulk of the time (8.96 seconds out of 12 seconds - almost three-quarters of the total processing time) was spent doing replacements into three fields in the same table, using one REPLACE statement per field! To make it worse, this was being done inside an unfiltered SCAN loop! Why use multiple REPLACE statements rather than one single REPLACE that updated all affected fields, and why an unfiltered SCAN instead of a REPLACE ALL?

Answer: Probably because the programmer originally had other stuff inside the SCAN loop, and later changed their mind but not their code!

Replacing the SCAN/REPLACE with a simple REPLACE ALL produced identical results to the original – but in less than 1 second compared to the original 12 seconds. Along with some other improvements to the code and the addition of proper indexes on the tables, we managed to get this process down from around 45 minutes to less than 2 minutes on the live system.
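
To make the difference concrete, here is a minimal sketch of the two patterns. The table and field names (and the placeholder expressions) are purely hypothetical since the original code was not reproduced here; only the shape of the code matters:

*** the original pattern: one replace per field, inside an unfiltered scan
*** (hypothetical table and field names, placeholder expressions)
select timesheet
scan
   replace nreghours  with round( nreghours, 2 )
   replace novthours  with round( novthours, 2 )
   replace nleavedays with round( nleavedays, 2 )
endscan

*** the equivalent single pass: one replace all updating all three fields
select timesheet
replace all nreghours  with round( nreghours, 2 ), ;
            novthours  with round( novthours, 2 ), ;
            nleavedays with round( nleavedays, 2 )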

So, what can we do about it? The first, and most obvious, solution is to ensure that development and testing environments use copies of actual production data whenever possible. The data does not have to be 'real time' but it must be current. Many companies use a daily, or weekly, refresh process to update their development environments to ensure that new code can be thoroughly and realistically evaluated. However, this is not always a simple task – especially when sensitive or confidential data is involved. While it is possible to 'scrub' data to remove confidential or sensitive values (like replacing real social security numbers with invalid but realistically structured values, real email addresses with standard in-house addresses, and so on), this has to be done properly. I once worked on a company's data that had been poorly scrubbed; all email addresses had been replaced with the company's own email address, all SSNs with the same "888-999-1111" value, all account numbers with "0123456789", and so on. The result was actually worse than useless since you could not validate anything in the test data because, when all values are the same, all results are "correct" (but meaningless!).
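
As an illustration, a minimal scrubbing pass might look something like the sketch below. The table and field names are hypothetical; the point is simply that every row gets a different, but still obviously fake, value so that uniqueness checks and matching logic can still be exercised:

*** hypothetical client table with ssn and email columns
select clients
scan
   *** an area number of 000 is never issued, so these can never be real ssns,
   *** but each row still gets a different value
   replace cssn   with "000-" + padl( transform( 1 + mod( recno(), 98 )), 2, "0" ) ;
                       + "-" + padl( transform( 1000 + mod( recno(), 8999 )), 4, "0" )
   *** unique, obviously fake, in-house email address per record
   replace cemail with "testuser" + transform( recno()) + "@example.com"
endscan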

There is another issue with using and refreshing copies of real data. This is when you have "standard" test data that must be present (used in regression tests, for example) or data that must be entered through the application for some reason. It is possible to address this by ensuring that all such data is 'scripted' out so that it can be re-applied after a refresh, but this is not a satisfactory solution (apart from anything else, it's too easy for it to go wrong).
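
If you do go the scripting route, the 'script' need not be anything more elaborate than a small program that re-inserts the standard rows after each refresh; something along these lines (again, the table, field and key names are hypothetical):

*** re-apply the standard regression test data after a refresh
procedure applystandarddata
   select clients
   *** remove any leftover copy of the standard record, then re-insert it
   delete for caccount = "TEST-0001"
   insert into clients ( caccount, clname, cfname ) ;
      values ( "TEST-0001", "Standard", "Regression" )
endproc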

A better solution, in my opinion anyway, is to generate realistic-looking, but spurious, test data in the first place. This is not really as hard as it might appear at first glance because VFP is very good at this sort of processing. For example, one of the commonest requirements for any application is to handle names and addresses, so let's see what we can do to generate some "test data". The first thing is that we will need some metadata to use. I created, for this purpose, a set of tables:


  • Titles: A set of standard address titles and an associated gender indicator. Thus "Miss" is defined as "female", while "Mr" is "male". Combinations like "Mr & Mrs" are also "male", while "Prof" exists twice, once as male and once as female. This is used to choose a name based on the form of address selected
  • Forenames: A list of forenames, each of which is associated with a gender indicator. The gender indicator is used, as noted above, to associate a name with the selected title
  • Surnames: Simply a list of possible surnames. There are no associated fields in this table; it is simply a list
  • Places: Another list of names; this table contains the list of candidates for use as the name of a street
  • Streets: Yet another list, this time of street name suffixes (Avenue, Boulevard etc.)
  • CyStZip: This is the most complex table of all since we need to ensure that city/state/zip and telephone area codes are correctly matched (otherwise we would not be able to test validation routines!), so this table contains real data from my standard zip code master table

Having got our metadata, all we need is some code to generate random numbers and then select, as our "data", the record whose record number matches. A simple function returns a random number between two specified values:

function gennum( tnloval, tnhival )
*** return a random whole number between the two specified values
local lnsel
lnsel = round( (tnhival - tnloval) * rand(), 0 ) + tnloval
*** never return less than 1 (the result is used as a record number)
lnsel = iif( lnsel <= 1, 1, lnsel )
return lnsel

Now in order to generate our data we need a target cursor:

create cursor gendata ( ;
ipsnpk integer(  4 ), ;
ctitle varchar( 10 ), ;
cfname varchar( 30 ), ;
cinit  varchar(  1 ), ;
clname varchar( 30 ), ;
csex   varchar(  1 ), ;
caddr  varchar( 30 ), ;
ccity  varchar( 20 ), ;
cstate varchar(  2 ), ;
czip   varchar(  5 ), ;
cphone varchar( 12 ), ;
dborn  date( 8 )) 

Now it's a simple task to generate our data. First we grab a title (and the associated gender indicator) at random by calling the gennum() function and passing the limits for the titles table:

lnsel = gennum( 1, 10 )
goto lnsel in titles
lcsex = alltrim( titles.csex )
lctitle = titles.ctitle

Next we use the gender indicator to get the first name and middle initial. (Note: I set the forenames table up by sorting the names on "gender + name", but this could easily be done in real time using queries to build cursors of names by gender and selecting from the appropriate one - see the sketch after the code below. Since my data is static I didn't bother in this case.)

*** now a suitable first name
lnsel = iif( lcsex = 'f', gennum( 1, 241 ), gennum( 242, 436 ))
goto lnsel in forenames
lcfname = forenames.cname

*** and a middle initial
lnsel = iif( lcsex = 'f', gennum( 1, 241 ), gennum( 242, 436 ))
goto lnsel in forenames
lcinit = left( forenames.cname, 1 )
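
For completeness, the query-based alternative mentioned above might look something like this (assuming the gender column in forenames is called csex, like the one in titles - the actual column name is not shown above):

*** build a cursor of names for the required gender and pick one at random
select cname from forenames where csex = m.lcsex into cursor curnames
lnsel = gennum( 1, reccount( "curnames" ))
goto lnsel in curnames
lcfname = curnames.cname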

We generate the surname, street name and street descriptor in precisely the same way from the surnames, places and streets tables respectively. The street number is just a random number between 1 and 9999. Next we need to get a set of city/state/zip and area codes – chosen at random from the cystzip table – and then generate exchange and phone numbers as random numbers in the range 100 to 999 (for exchange) and 1000 to 9999 (for phone number) respectively. We now have all the elements of the address and phone number.
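
A sketch of that step might look like the following. The row counts, and the column names in surnames, places, streets and cystzip, are assumptions (only forenames.cname and titles.csex appear in the code above), so adjust them to match the actual metadata tables:

*** surname, street name and street type (row counts are placeholders)
lnsel = gennum( 1, 500 )
goto lnsel in surnames
lclname = surnames.cname

lnsel = gennum( 1, 300 )
goto lnsel in places
lcplace = places.cname

lnsel = gennum( 1, 20 )
goto lnsel in streets
lcaddr = transform( gennum( 1, 9999 )) + " " + lcplace + " " + streets.cname

*** matched city/state/zip and area code from the zip code master extract
lnsel = gennum( 1, 42000 )
goto lnsel in cystzip
lccity  = cystzip.ccity
lcstate = cystzip.cstate
lczip   = cystzip.czip

*** the area code comes from cystzip so it always matches the zip code
lcphone = cystzip.careacode + "-" + transform( gennum( 100, 999 )) + "-" + ;
          transform( gennum( 1000, 9999 ))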

The last remaining piece is a date of birth. This is a little more tricky, but can be done in various ways – I opted for the simplest: generate a year at random between 1918 and 1989 (to give ages in the range 20 – 91), and I handled the date by only allowing days 1 through 28. Obviously, if date of birth was critical to your application you would need a more sophisticated algorithm to generate the dates, but for simple test data this works for me.
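
In code, that approach reduces to a few lines (the month is also chosen at random here, which the text above implies but does not state explicitly):

*** random year, month and day; limiting the day to 1-28 avoids having to
*** worry about month lengths or leap years
lnyear  = gennum( 1918, 1989 )
lnmonth = gennum( 1, 12 )
lnday   = gennum( 1, 28 )
ldborn  = date( lnyear, lnmonth, lnday )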

I didn't need social security numbers in this set, but they are just as easily generated as any other number. In this case, though, we should take care not to use 'valid' numbers (as of July 2009 a valid SSN cannot have an area number - the first three digits - between 734 and 749, or above 772; so providing we only generate numbers that use these "invalid" ranges we can never hit on a real SSN - even by accident).
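
A sketch of that, using the ranges quoted above, might be:

*** pick an area number only from the ranges noted above as unissued
*** (735-748, or 773 and above), then random group and serial numbers
lnarea = iif( gennum( 1, 2 ) = 1, gennum( 735, 748 ), gennum( 773, 999 ))
lcssn  = transform( lnarea ) + "-" + ;
         padl( transform( gennum( 1, 99 )), 2, "0" ) + "-" + ;
         padl( transform( gennum( 1, 9999 )), 4, "0" )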

The whole code is wrapped in a loop and so I can generate any number of records that I want simply by passing in the required number of records. On my machine, executing "do gendata with 550" generates 550 random names and addresses in about 0.5 seconds! Larger data sets take longer, of course; 55,000 names takes about 5 seconds and 1,000,000 about 90 seconds! As I said, VFP is very good at this sort of processing!
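
If you want to check the timings on your own hardware, a quick test (assuming the gendata.prg from the attached zip file, and that it leaves the gendata cursor open) is simply:

*** time the generator and report how many rows ended up in the cursor
lnstart = seconds()
do gendata with 550
? "Generated", reccount( "gendata" ), "rows in", seconds() - lnstart, "seconds"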

Remember too that these names and addresses are "valid" in the sense that city, state and zip code are real (and match), and that the area code is correct for the zip code, but everything else is randomly generated. Of course it is possible that one might occasionally hit on a "real" address (in which the randomly generated street number and name exactly match a real address in a real city/state/zip), but the chances of also generating a name that really is associated with that address, let alone the actual phone number, are infinitesimally small.

With a little bit of thought, and planning, you can develop similar routines for almost any set of data that you will ever need. Do this, and you will never again run into the situation where your test data is not realistic, or adequate.

The attached zip file includes my metadata tables and the code for generating the names and address cursor. As always, please feel free to modify and improve on my stuff. Just let me know what you do with it so that I can benefit too.
