Friday, April 23, 2021

Writing Better Code (Part 2)

As I noted in the first article in this little series, one of the really great things about Visual FoxPro is that there are usually several different ways of doing the same thing. Unfortunately, this is also one of the worst things about it because, at least if there is only one way of doing something, you don’t have to think too much about it. One of the things that I notice whenever I am working on code (my own or someone else’s) is how easily we developers fall into patterns of doing things. Once we have figured out the way to tackle some particular type of operation we tend to stick to it unquestioningly.


I had occasion, recently, to re-visit some of the most basic commands in Visual FoxPro and to critically examine their performance. I found some of the results surprising and, in the hope that you will too, I have documented the results of my research.

How the tests were run

There are several problems when trying to assess the relative performance of Visual FoxPro commands and functions. The solution that I adopted was to set up my tests as individual functions, and then to use a random number generator to determine which test to call. By running the individual tests many thousands of times, in random sequences, the effects of caching and pre-compilation are effectively cancelled out. I used the same basic methodology for all tests and won’t bother to mention it again.
For example, the code used for Macro Expansion Test #5  (InDirect EVAL() for Object Value) was:
loObj = CREATEOBJECT( 'textbox' )
loObj.Value = "This is a test string value"
lcObj = "loObj"
lnSt = SECONDS()
FOR lnReps = 1 TO 10000
  luVal = EVALUATE( lcObj + '.Value' )
NEXT
lnEn = SECONDS()
(Note: 10,000 iterations was enough to be meaningfully measurable – the result was then averaged for each test). The results of each test were written out to a table and then aggregated and averaged to get the results illustrated in this article.

Name Expressions vs Macro Substitution

One of the great strengths of Visual FoxPro is its ability to use indirection at run time. Now, all experienced Visual FoxPro developers know that using a name expression (i.e. the EVALUATE() function) is ‘faster’ than macro substitution (& operator) for doing this. But the question is, does it really matter? Seems to me that there are three basic scenarios in which we may want to use indirection when working with data:
  • Use a reference to get a value: e.g. Retrieving the contents of a field using only its name
  • Use a reference to address an object: e.g. Get a property value
  • Use a reference to find some specific item of data: e.g. In a LOCATE command
The following table shows the results of carrying out the same operation in three different ways. First using traditional VFP macro substitution, then using the evaluate function indirectly (e.g. EVAL( lcFieldName )) and finally using a direct evaluation direct (e.g. EVAL( FIELD( nn )). 
Relative performance of macro expansion and name expressions
Test Name
Total Runs
Total Time
Avg (ms)
Macro Substitute for Table Value
107000
14.8230
0.1385
Indirect EVAL() for Table Value
142000
11.6680
0.0822
Direct EVAL() for Table Value
96000
6.4920
0.0676




Macro Substitute for Object Value
1090000
13.7840
0.0126
InDirect EVAL() for Object Value
1040000
8.7900
0.0085
Direct EVAL() for Object Value
1130000
9.1930
0.0081




Macro substitute in LOCATE
1010000
8.1700
0.0081
InDirect EVAL() in LOCATE
1140000
9.1250
0.0080
Direct EVAL() in LOCATE
1140000
9.1980
0.0081
The first thing to note is that there is no intrinsically slow method when considering each approach in isolation (the worst case scenario shows a difference of no more than 0.05 milliseconds between best and worst methodology – hardly detectable!). However, when being executed repeatedly (inside a tight loop for example), it is clear that EVAL() has a significant performance advantage when dealing with either data tables or objects and so the perceived wisdom that EVAL() is faster than &expansion is obviously well founded.
Interestingly there is little detectable difference between using direct and indirect evaluation and my personal preference, if only for improving readability of the code, is therefore to use the Indirect method. In other words, I will accept (the very small) performance hit to retain code like this:
lcField = FIELD( lnCnt )
luVal = EVALUATE( lcField )
in preference to the less obvious:
luVal = EVALUATE( FIELD( lnCnt ))
The main conclusion is, therefore, that there is no benefit in using macro substitution when EVAL() is available as an option. A secondary point to note is that as the number of repetitions increases the difference rapidly becomes significant and that EVAL() should always be used when processing repetitively.

Conditional Evaluation

When it comes to evaluating some condition and switching control depending on the result, VFP 9.0 has almost an embarrassment of riches. We now have four different ways of evaluating some condition ranging from the traditional IF…THEN…ELSE, through the IIF() function, the DO CASE…ENDCASE statement and finally the new ICASE() function. The question is, therefore, does it matter which we use in any given situation? The speed of execution of these commands was compared using a standardized multiple level test in the general form:
IF lnTest = 1
  lnRes = 1
ELSE
  IF lnTest = 2
    lnRes = 2
  ELSE
    IF lnTest = 3 
      lnRes = 3
    ELSE
      lnRes = 4
    ENDIF
  ENDIF
ENDIF
Relative performance of conditional testing
Test Name
Total Runs
Total Time
Avg (sec)
DO CASE Level 1
 282000
0.6910
 0.0025
DO CASE Level 2
 242000
0.7600
 0.0031
DO CASE Level 3
 275000
1.1820
 0.0043
DO CASE Otherwise
 247000
0.9900
 0.0040




ICASE Level 1
 313000
0.4020
 0.0013
ICASE Level 2
 281000
0.4210
 0.0015
ICASE Level 3
 300000
0.5610
 0.0019
ICASE Otherwise
 302000
0.5300
 0.0018




IF ELSE Level 1
 203000
0.3100
 0.0015
IF ELSE Level 2
 235000
0.6430
 0.0027
IF ELSE Level 3
 213000
0.7610
 0.0036
IF ELSE Otherwise
 227000
0.8510
 0.0037




IIF Level 1
 213000
0.2700
 0.0013
IIF Level 2
 236000
0.4240
 0.0018
IIF Level 3
 209000
0.4200
 0.0020
IIF Otherwise
 222000
0.5100
 0.0023




Level 2 of nested DO CASE inside DO CASE
67000
         0.4300
0.0064
Level 2 of nested IF…ELSE inside IF…ELSE
56000
0.1400
0.0025
Once again the first thing to note is that nothing here is exactly slow. However there are some differences and a couple of anomalies that are worth noting.
One thing that stands out clearly is that the DO…CASE is invariably the slowest way of handling this sort of test although it is also the most readable. Perhaps oddly, the fastest way to execute a simple evaluation of this type is to use the ICASE() function – though of course this, like the IIF() is not really relevant when a block of code follows the conditional test. The main problem with both the IIF() and the ICASE() functions is, of course, the readability of the code. For example the ICASE that duplicates the IF…ELSE ladder above is:
lnRes = ICASE( lnTest = 1, 1, lnTest = 2, 2, lnTest = 3, 4 )
which is hardly comprehensible without considerable effort. Of course, the commonest scenario for this type of logic is inside an SQL Query and so speed of execution is, rightly, paramount in that case.
The anomaly I referred to above is that with all methods tested, the execution time increases as you go further down the levels – which is precisely what you would expect. After all it makes sense that it should take longer to make three tests than two, and longer for two than for one. However in both the CASE and the ICASE the ‘Otherwise’ condition is executed faster than the level which immediately precedes it, not much faster admittedly, but undeniably and consistently faster. I first noticed this phenomenon back in VFP 7.0 and it is interesting that it persists not only in the CASE statement in VFP 9.0 but also in the new ICASE. What it means, I do not know, but it is an observed phenomenon.
The conclusions here are, first, that if you can get away with it, use the functions in preference to the plain commands. Second that if you are doing this repeatedly, then an IF…ELSE ladder is likely to give you better performance than a DO CASE – especially if the nesting exceeds one level. The caveat is that the more complex the code becomes the harder it is to read and understand when using an IF…ELSE ladder or the functions. My personal preference here is to use the CASE structure as a comment  in my code, but actually execute it using one of the other methods.

Looping Through Data

One of the most fundamental operations in almost every data centric application is the requirement to retrieve a set of records and them process them sequentially. In Visual FoxPro there are basically three ways of doing this, the original xBase DO WHILE command, the newer SCAN command and newest of all (though still pretty ancient J) the FOR…NEXT loop. There are, of course, many variants and variations but these three constructs provide the basis for all of them. In the next little batch of tests I evaluated the speed with which each record in a large name and address table could be visited and a specific column value retrieved (and, for the purposes of this test, immediately discarded).
Each of the three constructs was tested unconditionally (i.e. start at the first record and proceed to the last) and for a set of records meeting a specific condition (in this case for addresses in the state of Ohio).
The reason for this particular test is that it represents the two commonest scenarios. However unlike the other comparisons we have done to date, the code required to perform this operation is completely different for each option.

UnconditionalConditional
DO WHILEGO TOP
DO WHILE NOT EOF()

  luRes = cState

  
SKIP
ENDDO
=SEEK( ‘OH’, ‘bigtable’, ‘cState’ )
DO WHILE cState == ‘OH’ AND NOT EOF()

  
lures = cState
  
SKIP
ENDDO
SCANGO TOP
SCAN

  
luRes = cState
ENDSCAN
*** SCAN FOR
GO TOP
SCAN FOR cstate == ‘OH’

  
luRes = cState
ENDSCAN
*** SEEK() and SCAN WHILE
=SEEK( ‘OH’, ‘bigtable’, ‘cState’ )
SCAN FOR cstate == ‘OH’

  
luRes = cState
ENDSCAN
FOR…NEXT*** Using GOTO ***
lnRex = RECCOUNT(‘bigtable’)
FOR lnCnt = 1 TO lnRex

 
  GOTO (lnCnt)
 
 IF EOF()
    
EXIT
  
ENDIF
  
luRes = cState
NEXT
lnRex = RECCOUNT( 'bigtable' )
FOR lnCnt = 1 TO lnRex

  
GOTO (lnCnt)
  
IF EOF()
    
EXIT
  
ENDIF
  
IF NOT cState == 'OH'
    
LOOP
  
ENDIF
  
luRes = cState
NEXT
*** Using SKIP
lnRex = RECCOUNT( 'bigtable' )
FOR lnCnt = 1 TO lnRex

  
SKIP
  
IF EOF()
    
EXIT
  
ENDIF
  
luRes = cState
NEXT
 The table used has 125,000 records and so is moderately large for this type of operation. The results may surprise you, they certainly did me!
Relative performance of the Looping Tests
Test Name
Total Runs
Max (sec)
Min (sec)
Avg (sec)
DO WHILE NOT EOF() with NO controlling index
26
0.6110
0.4510
0.5160769
SCAN...ENDSCAN with NO controlling index
29
0.3610
0.3310
0.3536667
FOR...NEXT with SKIP with NO controlling index
19
1.1220
0.4500
0.5746250
FOR...NEXT & GOTO with NO controlling index
26
0.6600
0.4710
0.5691667





DO WHILE NOT EOF() with controlling index
21
2.7040
2.4240
2.5595714
SCAN...ENDSCAN with controlling index
25
2.9640
2.3530
2.5836154
FOR...NEXT with SKIP with controlling index
9
3.0840
2.4140
2.6314444
FOR...NEXT & GOTO with controlling index
26
0.9510
0.4810
0.6214500





Scan FOR cState = 'OH' with NO index set
26
0.0710
0.0600
0.0680000
Scan FOR cState = 'OH' with index set
19
0.1510
0.1100
0.1208571
FOR...NEXT With test for cState = ‘OH’
16
0.6300
0.5410
0.5867500





SEEK() & DO WHILE cState = 'OH'
19
0.0910
0.0700
0.0777500
SEEK() & SCAN WHILE cState = 'OH'
29
0.0900
0.0700
0.0735000
There are a number of things to note from this table.
First, when processing all records in a table, the impact of a controlling index is significant if you are using any loop control that relies on the SKIP command – whether explicitly (DO…WHILE and FOR…NEXT) or implicitly (SCAN…ENDSCAN).
Second, SCAN FOR is also slower when a controlling index is set. In these tests there was an optimizable index on the column used and, even though SCAN is optimizable, the addition of the controlling index slowed the process noticeably.
Third, the controlling index does not matter when we are using SEEK() and a WHILE clause with either a SCAN or a DO WHILE. Here it makes little difference which we use as both deliver essentially the same performance and we usually need a controlling index when using a WHILE as the scope anyway.
The following conclusions can be drawn from these results:
  • When processing all records. Ensure that there is no controlling index when using a command that includes a SKIP. Probably the best generalization is to use the FOR loop, with a GOTO, since it is essentially unaffected by the presence or absence of a controlling index
  • The most efficient way to handle selective processing depends on whether an index suitable for use with a SEEK() exists.  If it does, then use a combination of SEEK() and WHILE scope otherwise use SCAN FOR with no controlling index set on the table
Published Monday, March 13, 2006 4:43 PM by andykr

No comments:

Post a Comment

Writing better code (Part 1)

Writing better code (Part 1) As we all know, Visual FoxPro provides an extremely rich and varied development environment but sometimes to...