As I noted in the first article in this little series, one of the really great things about Visual FoxPro is that there are usually several different ways of doing the same thing. Unfortunately, this is also one of the worst things about it because, at least if there is only one way of doing something, you don’t have to think too much about it. One of the things that I notice whenever I am working on code (my own or someone else’s) is how easily we developers fall into patterns of doing things. Once we have figured out the way to tackle some particular type of operation we tend to stick to it unquestioningly.
I had occasion, recently, to re-visit some of the most basic commands in Visual FoxPro and to critically examine their performance. I found some of the results surprising and, in the hope that you will too, I have documented the results of my research.
How the tests were run
There are several problems when trying to assess the relative performance of Visual FoxPro commands and functions. The solution that I adopted was to set up my tests as individual functions, and then to use a random number generator to determine which test to call. By running the individual tests many thousands of times, in random sequences, the effects of caching and pre-compilation are effectively cancelled out. I used the same basic methodology for all tests and won’t bother to mention it again.
For example, the code used for Macro Expansion Test #5 (InDirect EVAL() for Object Value) was:
loObj = CREATEOBJECT( 'textbox' )
loObj.Value = "This is a test string value"
lcObj = "loObj"
lnSt = SECONDS()
FOR lnReps = 1 TO 10000
luVal = EVALUATE( lcObj + '.Value' )
NEXT
lnEn = SECONDS()
loObj.Value = "This is a test string value"
lcObj = "loObj"
lnSt = SECONDS()
FOR lnReps = 1 TO 10000
luVal = EVALUATE( lcObj + '.Value' )
NEXT
lnEn = SECONDS()
(Note: 10,000 iterations was enough to be meaningfully measurable – the result was then averaged for each test). The results of each test were written out to a table and then aggregated and averaged to get the results illustrated in this article.
Name Expressions vs Macro Substitution
One of the great strengths of Visual FoxPro is its ability to use indirection at run time. Now, all experienced Visual FoxPro developers know that using a name expression (i.e. the EVALUATE() function) is ‘faster’ than macro substitution (& operator) for doing this. But the question is, does it really matter? Seems to me that there are three basic scenarios in which we may want to use indirection when working with data:
- Use a reference to get a value: e.g. Retrieving the contents of a field using only its name
- Use a reference to address an object: e.g. Get a property value
- Use a reference to find some specific item of data: e.g. In a LOCATE command
The following table shows the results of carrying out the same operation in three different ways. First using traditional VFP macro substitution, then using the evaluate function indirectly (e.g. EVAL( lcFieldName )) and finally using a direct evaluation direct (e.g. EVAL( FIELD( nn )).
Test Name
|
Total Runs
|
Total Time
|
Avg (ms)
|
Macro Substitute for Table Value
|
107000
|
14.8230
|
0.1385
|
Indirect EVAL() for Table Value
|
142000
|
11.6680
|
0.0822
|
Direct EVAL() for Table Value
|
96000
|
6.4920
|
0.0676
|
Macro Substitute for Object Value
|
1090000
|
13.7840
|
0.0126
|
InDirect EVAL() for Object Value
|
1040000
|
8.7900
|
0.0085
|
Direct EVAL() for Object Value
|
1130000
|
9.1930
|
0.0081
|
Macro substitute in LOCATE
|
1010000
|
8.1700
|
0.0081
|
InDirect EVAL() in LOCATE
|
1140000
|
9.1250
|
0.0080
|
Direct EVAL() in LOCATE
|
1140000
|
9.1980
|
0.0081
|
The first thing to note is that there is no intrinsically slow method when considering each approach in isolation (the worst case scenario shows a difference of no more than 0.05 milliseconds between best and worst methodology – hardly detectable!). However, when being executed repeatedly (inside a tight loop for example), it is clear that EVAL() has a significant performance advantage when dealing with either data tables or objects and so the perceived wisdom that EVAL() is faster than &expansion is obviously well founded.
Interestingly there is little detectable difference between using direct and indirect evaluation and my personal preference, if only for improving readability of the code, is therefore to use the Indirect method. In other words, I will accept (the very small) performance hit to retain code like this:
lcField = FIELD( lnCnt )
luVal = EVALUATE( lcField )
luVal = EVALUATE( lcField )
in preference to the less obvious:
luVal = EVALUATE( FIELD( lnCnt ))
The main conclusion is, therefore, that there is no benefit in using macro substitution when EVAL() is available as an option. A secondary point to note is that as the number of repetitions increases the difference rapidly becomes significant and that EVAL() should always be used when processing repetitively.
Conditional Evaluation
When it comes to evaluating some condition and switching control depending on the result, VFP 9.0 has almost an embarrassment of riches. We now have four different ways of evaluating some condition ranging from the traditional IF…THEN…ELSE, through the IIF() function, the DO CASE…ENDCASE statement and finally the new ICASE() function. The question is, therefore, does it matter which we use in any given situation? The speed of execution of these commands was compared using a standardized multiple level test in the general form:
IF lnTest = 1
lnRes = 1
ELSE
IF lnTest = 2
lnRes = 2
ELSE
IF lnTest = 3
lnRes = 3
ELSE
lnRes = 4
ENDIF
ENDIF
ENDIF
lnRes = 1
ELSE
IF lnTest = 2
lnRes = 2
ELSE
IF lnTest = 3
lnRes = 3
ELSE
lnRes = 4
ENDIF
ENDIF
ENDIF
Test Name
|
Total Runs
|
Total Time
|
Avg (sec)
|
DO CASE Level 1
|
282000
|
0.6910
|
0.0025
|
DO CASE Level 2
|
242000
|
0.7600
|
0.0031
|
DO CASE Level 3
|
275000
|
1.1820
|
0.0043
|
DO CASE Otherwise
|
247000
|
0.9900
|
0.0040
|
ICASE Level 1
|
313000
|
0.4020
|
0.0013
|
ICASE Level 2
|
281000
|
0.4210
|
0.0015
|
ICASE Level 3
|
300000
|
0.5610
|
0.0019
|
ICASE Otherwise
|
302000
|
0.5300
|
0.0018
|
IF ELSE Level 1
|
203000
|
0.3100
|
0.0015
|
IF ELSE Level 2
|
235000
|
0.6430
|
0.0027
|
IF ELSE Level 3
|
213000
|
0.7610
|
0.0036
|
IF ELSE Otherwise
|
227000
|
0.8510
|
0.0037
|
IIF Level 1
|
213000
|
0.2700
|
0.0013
|
IIF Level 2
|
236000
|
0.4240
|
0.0018
|
IIF Level 3
|
209000
|
0.4200
|
0.0020
|
IIF Otherwise
|
222000
|
0.5100
|
0.0023
|
Level 2 of nested DO CASE inside DO CASE
|
67000
|
0.4300
|
0.0064
|
Level 2 of nested IF…ELSE inside IF…ELSE
|
56000
|
0.1400
|
0.0025
|
Once again the first thing to note is that nothing here is exactly slow. However there are some differences and a couple of anomalies that are worth noting.
One thing that stands out clearly is that the DO…CASE is invariably the slowest way of handling this sort of test although it is also the most readable. Perhaps oddly, the fastest way to execute a simple evaluation of this type is to use the ICASE() function – though of course this, like the IIF() is not really relevant when a block of code follows the conditional test. The main problem with both the IIF() and the ICASE() functions is, of course, the readability of the code. For example the ICASE that duplicates the IF…ELSE ladder above is:
lnRes = ICASE( lnTest = 1, 1, lnTest = 2, 2, lnTest = 3, 4 )
which is hardly comprehensible without considerable effort. Of course, the commonest scenario for this type of logic is inside an SQL Query and so speed of execution is, rightly, paramount in that case.
The anomaly I referred to above is that with all methods tested, the execution time increases as you go further down the levels – which is precisely what you would expect. After all it makes sense that it should take longer to make three tests than two, and longer for two than for one. However in both the CASE and the ICASE the ‘Otherwise’ condition is executed faster than the level which immediately precedes it, not much faster admittedly, but undeniably and consistently faster. I first noticed this phenomenon back in VFP 7.0 and it is interesting that it persists not only in the CASE statement in VFP 9.0 but also in the new ICASE. What it means, I do not know, but it is an observed phenomenon.
The conclusions here are, first, that if you can get away with it, use the functions in preference to the plain commands. Second that if you are doing this repeatedly, then an IF…ELSE ladder is likely to give you better performance than a DO CASE – especially if the nesting exceeds one level. The caveat is that the more complex the code becomes the harder it is to read and understand when using an IF…ELSE ladder or the functions. My personal preference here is to use the CASE structure as a comment in my code, but actually execute it using one of the other methods.
Looping Through Data
One of the most fundamental operations in almost every data centric application is the requirement to retrieve a set of records and them process them sequentially. In Visual FoxPro there are basically three ways of doing this, the original xBase DO WHILE command, the newer SCAN command and newest of all (though still pretty ancient J) the FOR…NEXT loop. There are, of course, many variants and variations but these three constructs provide the basis for all of them. In the next little batch of tests I evaluated the speed with which each record in a large name and address table could be visited and a specific column value retrieved (and, for the purposes of this test, immediately discarded).
Each of the three constructs was tested unconditionally (i.e. start at the first record and proceed to the last) and for a set of records meeting a specific condition (in this case for addresses in the state of Ohio).
The reason for this particular test is that it represents the two commonest scenarios. However unlike the other comparisons we have done to date, the code required to perform this operation is completely different for each option.
Unconditional | Conditional | |
DO WHILE | GO TOP DO WHILE NOT EOF() luRes = cState SKIP ENDDO | =SEEK( ‘OH’, ‘bigtable’, ‘cState’ ) DO WHILE cState == ‘OH’ AND NOT EOF() lures = cState SKIP ENDDO |
SCAN | GO TOP SCAN luRes = cState ENDSCAN | *** SCAN FOR GO TOP SCAN FOR cstate == ‘OH’ luRes = cState ENDSCAN |
*** SEEK() and SCAN WHILE =SEEK( ‘OH’, ‘bigtable’, ‘cState’ ) SCAN FOR cstate == ‘OH’ luRes = cState ENDSCAN | ||
FOR…NEXT | *** Using GOTO *** lnRex = RECCOUNT(‘bigtable’) FOR lnCnt = 1 TO lnRex GOTO (lnCnt) IF EOF() EXIT ENDIF luRes = cState NEXT | lnRex = RECCOUNT( 'bigtable' ) FOR lnCnt = 1 TO lnRex GOTO (lnCnt) IF EOF() EXIT ENDIF IF NOT cState == 'OH' LOOP ENDIF luRes = cState NEXT |
*** Using SKIP lnRex = RECCOUNT( 'bigtable' ) FOR lnCnt = 1 TO lnRex SKIP IF EOF() EXIT ENDIF luRes = cState NEXT |
The table used has 125,000 records and so is moderately large for this type of operation. The results may surprise you, they certainly did me!
Test Name
|
Total Runs
|
Max (sec)
|
Min (sec)
|
Avg (sec)
|
DO WHILE NOT EOF() with NO controlling index
|
26
|
0.6110
|
0.4510
|
0.5160769
|
SCAN...ENDSCAN with NO controlling index
|
29
|
0.3610
|
0.3310
|
0.3536667
|
FOR...NEXT with SKIP with NO controlling index
|
19
|
1.1220
|
0.4500
|
0.5746250
|
FOR...NEXT & GOTO with NO controlling index
|
26
|
0.6600
|
0.4710
|
0.5691667
|
DO WHILE NOT EOF() with controlling index
|
21
|
2.7040
|
2.4240
|
2.5595714
|
SCAN...ENDSCAN with controlling index
|
25
|
2.9640
|
2.3530
|
2.5836154
|
FOR...NEXT with SKIP with controlling index
|
9
|
3.0840
|
2.4140
|
2.6314444
|
FOR...NEXT & GOTO with controlling index
|
26
|
0.9510
|
0.4810
|
0.6214500
|
Scan FOR cState = 'OH' with NO index set
|
26
|
0.0710
|
0.0600
|
0.0680000
|
Scan FOR cState = 'OH' with index set
|
19
|
0.1510
|
0.1100
|
0.1208571
|
FOR...NEXT With test for cState = ‘OH’
|
16
|
0.6300
|
0.5410
|
0.5867500
|
SEEK() & DO WHILE cState = 'OH'
|
19
|
0.0910
|
0.0700
|
0.0777500
|
SEEK() & SCAN WHILE cState = 'OH'
|
29
|
0.0900
|
0.0700
|
0.0735000
|
There are a number of things to note from this table.
First, when processing all records in a table, the impact of a controlling index is significant if you are using any loop control that relies on the SKIP command – whether explicitly (DO…WHILE and FOR…NEXT) or implicitly (SCAN…ENDSCAN).
Second, SCAN FOR is also slower when a controlling index is set. In these tests there was an optimizable index on the column used and, even though SCAN is optimizable, the addition of the controlling index slowed the process noticeably.
Third, the controlling index does not matter when we are using SEEK() and a WHILE clause with either a SCAN or a DO WHILE. Here it makes little difference which we use as both deliver essentially the same performance and we usually need a controlling index when using a WHILE as the scope anyway.
The following conclusions can be drawn from these results:
- When processing all records. Ensure that there is no controlling index when using a command that includes a SKIP. Probably the best generalization is to use the FOR loop, with a GOTO, since it is essentially unaffected by the presence or absence of a controlling index
- The most efficient way to handle selective processing depends on whether an index suitable for use with a SEEK() exists. If it does, then use a combination of SEEK() and WHILE scope otherwise use SCAN FOR with no controlling index set on the table
Published Monday, March 13, 2006 4:43 PM by andykr
No comments:
Post a Comment