Writing Better Code (Part 2)

As I noted in the first article in this little series, one of the really great things about Visual FoxPro is that there are usually several different ways of doing the same thing. Unfortunately, this is also one of the worst things about it because, at least if there is only one way of doing something, you don’t have to think too much about it. One of the things that I notice whenever I am working on code (my own or someone else’s) is how easily we developers fall into patterns of doing things. Once we have figured out the way to tackle some particular type of operation we tend to stick to it unquestioningly.

I had occasion, recently, to re-visit some of the most basic commands in Visual FoxPro and to critically examine their performance. I found some of the results surprising and, in the hope that you will too, I have documented the results of my research.

How the tests were run

There are several problems when trying to assess the relative performance of Visual FoxPro commands and functions. The solution that I adopted was to set up my tests as individual functions, and then to use a random number generator to determine which test to call. By running the individual tests many thousands of times, in random sequences, the effects of caching and pre-compilation are effectively cancelled out. I used the same basic methodology for all tests and won’t bother to mention it again.

For example, the code used for Macro Expansion Test #5 (InDirect EVAL() for Object Value) was:

loObj = CREATEOBJECT( 'textbox' )
loObj.Value = "This is a test string value"
lcObj = "loObj"
lnSt = SECONDS()
FOR lnReps = 1 TO 10000
luVal = EVALUATE( lcObj + '.Value' )
NEXT
lnEn = SECONDS()

(Note: 10,000 iterations was enough to be meaningfully measurable – the result was then averaged for each test). The results of each test were written out to a table and then aggregated and averaged to get the results illustrated in this article.

Name Expressions vs Macro Substitution

One of the great strengths of Visual FoxPro is its ability to use indirection at run time. Now, all experienced Visual FoxPro developers know that using a name expression (i.e. the EVALUATE() function) is ‘faster’ than macro substitution (& operator) for doing this. But the question is, does it really matter? Seems to me that there are three basic scenarios in which we may want to use indirection when working with data:

Use a reference to get a value: e.g. Retrieving the contents of a field using only its name
Use a reference to address an object: e.g. Get a property value
Use a reference to find some specific item of data: e.g. In a LOCATE command

The following table shows the results of carrying out the same operation in three different ways. First using traditional VFP macro substitution, then using the evaluate function indirectly (e.g. EVAL( lcFieldName )) and finally using a direct evaluation direct (e.g. EVAL( FIELD( nn )).

Relative performance of macro expansion and name expressions

Test Name	Total Runs	Total Time	Avg (ms)
Macro Substitute for Table Value	107000	14.8230	0.1385
Indirect EVAL() for Table Value	142000	11.6680	0.0822
Direct EVAL() for Table Value	96000	6.4920	0.0676

Macro Substitute for Object Value	1090000	13.7840	0.0126
InDirect EVAL() for Object Value	1040000	8.7900	0.0085
Direct EVAL() for Object Value	1130000	9.1930	0.0081

Macro substitute in LOCATE	1010000	8.1700	0.0081
InDirect EVAL() in LOCATE	1140000	9.1250	0.0080
Direct EVAL() in LOCATE	1140000	9.1980	0.0081

The first thing to note is that there is no intrinsically slow method when considering each approach in isolation (the worst case scenario shows a difference of no more than 0.05 milliseconds between best and worst methodology – hardly detectable!). However, when being executed repeatedly (inside a tight loop for example), it is clear that EVAL() has a significant performance advantage when dealing with either data tables or objects and so the perceived wisdom that EVAL() is faster than &expansion is obviously well founded.

Interestingly there is little detectable difference between using direct and indirect evaluation and my personal preference, if only for improving readability of the code, is therefore to use the Indirect method. In other words, I will accept (the very small) performance hit to retain code like this:

lcField = FIELD( lnCnt )
luVal = EVALUATE( lcField )

in preference to the less obvious:

luVal = EVALUATE( FIELD( lnCnt ))

The main conclusion is, therefore, that there is no benefit in using macro substitution when EVAL() is available as an option. A secondary point to note is that as the number of repetitions increases the difference rapidly becomes significant and that EVAL() should always be used when processing repetitively.

Conditional Evaluation

When it comes to evaluating some condition and switching control depending on the result, VFP 9.0 has almost an embarrassment of riches. We now have four different ways of evaluating some condition ranging from the traditional IF…THEN…ELSE, through the IIF() function, the DO CASE…ENDCASE statement and finally the new ICASE() function. The question is, therefore, does it matter which we use in any given situation? The speed of execution of these commands was compared using a standardized multiple level test in the general form:

IF lnTest = 1
  lnRes = 1
ELSE
  IF lnTest = 2
    lnRes = 2
  ELSE
    IF lnTest = 3
      lnRes = 3
    ELSE
      lnRes = 4
    ENDIF
  ENDIF
ENDIF

Relative performance of conditional testing

Test Name	Total Runs	Total Time	Avg (sec)
DO CASE Level 1	282000	0.6910	0.0025
DO CASE Level 2	242000	0.7600	0.0031
DO CASE Level 3	275000	1.1820	0.0043
DO CASE Otherwise	247000	0.9900	0.0040

ICASE Level 1	313000	0.4020	0.0013
ICASE Level 2	281000	0.4210	0.0015
ICASE Level 3	300000	0.5610	0.0019
ICASE Otherwise	302000	0.5300	0.0018

IF ELSE Level 1	203000	0.3100	0.0015
IF ELSE Level 2	235000	0.6430	0.0027
IF ELSE Level 3	213000	0.7610	0.0036
IF ELSE Otherwise	227000	0.8510	0.0037

IIF Level 1	213000	0.2700	0.0013
IIF Level 2	236000	0.4240	0.0018
IIF Level 3	209000	0.4200	0.0020
IIF Otherwise	222000	0.5100	0.0023

Level 2 of nested DO CASE inside DO CASE	67000	0.4300	0.0064
Level 2 of nested IF…ELSE inside IF…ELSE	56000	0.1400	0.0025

Once again the first thing to note is that nothing here is exactly slow. However there are some differences and a couple of anomalies that are worth noting.

One thing that stands out clearly is that the DO…CASE is invariably the slowest way of handling this sort of test although it is also the most readable. Perhaps oddly, the fastest way to execute a simple evaluation of this type is to use the ICASE() function – though of course this, like the IIF() is not really relevant when a block of code follows the conditional test. The main problem with both the IIF() and the ICASE() functions is, of course, the readability of the code. For example the ICASE that duplicates the IF…ELSE ladder above is:

lnRes = ICASE( lnTest = 1, 1, lnTest = 2, 2, lnTest = 3, 4 )

which is hardly comprehensible without considerable effort. Of course, the commonest scenario for this type of logic is inside an SQL Query and so speed of execution is, rightly, paramount in that case.

The anomaly I referred to above is that with all methods tested, the execution time increases as you go further down the levels – which is precisely what you would expect. After all it makes sense that it should take longer to make three tests than two, and longer for two than for one. However in both the CASE and the ICASE the ‘Otherwise’ condition is executed faster than the level which immediately precedes it, not much faster admittedly, but undeniably and consistently faster. I first noticed this phenomenon back in VFP 7.0 and it is interesting that it persists not only in the CASE statement in VFP 9.0 but also in the new ICASE. What it means, I do not know, but it is an observed phenomenon.

The conclusions here are, first, that if you can get away with it, use the functions in preference to the plain commands. Second that if you are doing this repeatedly, then an IF…ELSE ladder is likely to give you better performance than a DO CASE – especially if the nesting exceeds one level. The caveat is that the more complex the code becomes the harder it is to read and understand when using an IF…ELSE ladder or the functions. My personal preference here is to use the CASE structure as a comment in my code, but actually execute it using one of the other methods.

Looping Through Data

One of the most fundamental operations in almost every data centric application is the requirement to retrieve a set of records and them process them sequentially. In Visual FoxPro there are basically three ways of doing this, the original xBase DO WHILE command, the newer SCAN command and newest of all (though still pretty ancient J) the FOR…NEXT loop. There are, of course, many variants and variations but these three constructs provide the basis for all of them. In the next little batch of tests I evaluated the speed with which each record in a large name and address table could be visited and a specific column value retrieved (and, for the purposes of this test, immediately discarded).

Each of the three constructs was tested unconditionally (i.e. start at the first record and proceed to the last) and for a set of records meeting a specific condition (in this case for addresses in the state of Ohio).

The reason for this particular test is that it represents the two commonest scenarios. However unlike the other comparisons we have done to date, the code required to perform this operation is completely different for each option.

	Unconditional	Conditional
DO WHILE	GO TOP DO WHILE NOT EOF() luRes = cState SKIP ENDDO	=SEEK( ‘OH’, ‘bigtable’, ‘cState’ ) DO WHILE cState == ‘OH’ AND NOT EOF() lures = cState SKIP ENDDO
SCAN	GO TOP SCAN luRes = cState ENDSCAN	*** SCAN FOR GO TOP SCAN FOR cstate == ‘OH’ luRes = cState ENDSCAN
SCAN	GO TOP SCAN luRes = cState ENDSCAN	*** SEEK() and SCAN WHILE =SEEK( ‘OH’, ‘bigtable’, ‘cState’ ) SCAN FOR cstate == ‘OH’ luRes = cState ENDSCAN
FOR…NEXT	* Using GOTO * lnRex = RECCOUNT(‘bigtable’) FOR lnCnt = 1 TO lnRex GOTO (lnCnt) IF EOF() EXIT ENDIF luRes = cState NEXT	lnRex = RECCOUNT( 'bigtable' ) FOR lnCnt = 1 TO lnRex GOTO (lnCnt) IF EOF() EXIT ENDIF IF NOT cState == 'OH' LOOP ENDIF luRes = cState NEXT
FOR…NEXT	*** Using SKIP lnRex = RECCOUNT( 'bigtable' ) FOR lnCnt = 1 TO lnRex SKIP IF EOF() EXIT ENDIF luRes = cState NEXT

The table used has 125,000 records and so is moderately large for this type of operation. The results may surprise you, they certainly did me!

Relative performance of the Looping Tests

Test Name	Total Runs	Max (sec)	Min (sec)	Avg (sec)
DO WHILE NOT EOF() with NO controlling index	26	0.6110	0.4510	0.5160769
SCAN...ENDSCAN with NO controlling index	29	0.3610	0.3310	0.3536667
FOR...NEXT with SKIP with NO controlling index	19	1.1220	0.4500	0.5746250
FOR...NEXT & GOTO with NO controlling index	26	0.6600	0.4710	0.5691667

DO WHILE NOT EOF() with controlling index	21	2.7040	2.4240	2.5595714
SCAN...ENDSCAN with controlling index	25	2.9640	2.3530	2.5836154
FOR...NEXT with SKIP with controlling index	9	3.0840	2.4140	2.6314444
FOR...NEXT & GOTO with controlling index	26	0.9510	0.4810	0.6214500

Scan FOR cState = 'OH' with NO index set	26	0.0710	0.0600	0.0680000
Scan FOR cState = 'OH' with index set	19	0.1510	0.1100	0.1208571
FOR...NEXT With test for cState = ‘OH’	16	0.6300	0.5410	0.5867500

SEEK() & DO WHILE cState = 'OH'	19	0.0910	0.0700	0.0777500
SEEK() & SCAN WHILE cState = 'OH'	29	0.0900	0.0700	0.0735000

There are a number of things to note from this table.

First, when processing all records in a table, the impact of a controlling index is significant if you are using any loop control that relies on the SKIP command – whether explicitly (DO…WHILE and FOR…NEXT) or implicitly (SCAN…ENDSCAN).

Second, SCAN FOR is also slower when a controlling index is set. In these tests there was an optimizable index on the column used and, even though SCAN is optimizable, the addition of the controlling index slowed the process noticeably.

Third, the controlling index does not matter when we are using SEEK() and a WHILE clause with either a SCAN or a DO WHILE. Here it makes little difference which we use as both deliver essentially the same performance and we usually need a controlling index when using a WHILE as the scope anyway.

The following conclusions can be drawn from these results:

When processing all records. Ensure that there is no controlling index when using a command that includes a SKIP. Probably the best generalization is to use the FOR loop, with a GOTO, since it is essentially unaffected by the presence or absence of a controlling index
The most efficient way to handle selective processing depends on whether an index suitable for use with a SEEK() exists. If it does, then use a combination of SEEK() and WHILE scope otherwise use SCAN FOR with no controlling index set on the table

Published Monday, March 13, 2006 4:43 PM by andykr

Andy Kramek VFP Blog - Mirror

Friday, April 23, 2021