Thursday, August 18, 2005

Getting (possibly) 500% speed gain on divisions

I've moved my blog and it's post to my new blog, please go to Getting (possibly) 500% speed gain on divisions on Landman Code.One of my fellow students once made a remark about the old days when a division was much faster if it was written as multiplication.

This means, instead of devising with a 100, you multiply with 0.01 (1/100). Because some programs of mine have a lot of divisions in their core loops, I investigated the difference.

I created the following test program.

program DivVsMult
 
{$APPTYPE CONSOLE} 
 
uses 
SysUtils
Windows
 
var 
Start,Stop, Start2, Stop2, Freq:int64
i : integer
t : real
CpuSpeed : integer
begin 
{ TODO -oUser -cConsole Main : Insert code here } 
{ Cpu Speed fastes cpu = 1 slower => 10 
it's just to determin the number of time to do the loop 
Maxint div CpuSpeed is calculated } 
if ParamCount = 1 then 
CpuSpeed := StrToIntDef(ParamStr(1),1
else 
CpuSpeed := 10
Writeln('Simple Number division:'); 
Writeln('Calculating'); 
QueryPerformanceFrequency(freq); 
QueryPerformanceCounter(Start); 
for i:=0 to MaxInt div CpuSpeed do 
t := i * (1/100); 
QueryPerformanceCounter(Stop); 
Writeln(Format('First Pass Result: %f',[t])); 
{ This is needed because the compiler would optimize, 
and would notice the result of the loop isn't used at all, 
so therefor the result is useless.. so depending on the compiler, it will 
choose what to do with it, this disables that optimization } 
QueryPerformanceCounter(Start2); 
for i:=0 to MaxInt div CpuSpeed do 
t := i * (1/100); 
QueryPerformanceCounter(Stop2); 
Writeln(Format('Second Pass Result: %15.6f',[t])); 
{ This is needed because the compiler would optimize, 
and would notice the result of the loop isn't used at all, 
so therefor the result is useless.. so depending on the compiler, it will 
choose what to do with it, this disables that optimization } 
Writeln('Done, Results:'); 
Writeln(Format('/ 100 Time: %6.4f seconds'+#13#10
'/ 100 Clock: %d ticks'+#13#10
'* 0.01 Time: %6.4f seconds'+#13#10
'* 0.01 Clock: %d ticks',[(Stop-Start) / freq, , (Stop2-Start2)])); 
Writeln
Writeln('Odd Number division:'); 
QueryPerformanceCounter(Start); 
for i:=0 to high(i) div CpuSpeed do 
t := i / 556
QueryPerformanceCounter(Stop); 
Writeln(Format('First Pass Result: %15.6f',[t])); 
{ This is needed because the compiler would optimize, 
and would notice the result of the loop isn't used at all, 
so therefor the result is useless.. so depending on the compiler, it will 
choose what to do with it, this disables that optimization } 
QueryPerformanceCounter(Start2); 
for i:=0 to high(i) div CpuSpeed do 
t := i * (1/556); 
QueryPerformanceCounter(Stop2); 
Writeln(Format('Second Pass Result: %15.6f',[t])); 
Writeln(Format('/ 556 Time: %6.4f seconds'+#13#10
'/ 556 Clock: %d ticks'+#13#10
'* (1/556) Time: %6.4f seconds'+#13#10
'* (1/556) Clock: %d ticks',[(Stop-Start) / freq, (Stop-Start), (Stop2-Start2) / freq, (Stop2-Start2)])); 
Writeln(Format(' (1/556) = %15.14f (approximate)',[1/556])); 
// Readln; 
 
end
 

On an old P3 900Mhz:

Simple Number division:
Calculating
First Pass Result: 2147483,64
Second Pass Result: 2147483,64
Done, Results:
/ 100 Time: 10,3319 seconds
/ 100 Clock: 36983482 ticks
* 0.01 Time: 2,0378 seconds
* 0.01 Clock: 7294251 ticks

Odd Number division:
First Pass Result: 386238,0647482014610000
Second Pass Result: 386238,0647482014610000
Done, Results:
/ 556 Time: 10,0735 seconds
/ 556 Clock: 36058581 ticks
* (1/556) Time: 2,0446 seconds
* (1/556) Clock: 7318775 ticks
(1/556) = 0,0017985611510791 (approximate)

On a new P4 2.3 Ghz:

Simple Number division:
Calculating
First Pass Result: 2147483.64
Second Pass Result: 2147483.64
Done, Results:
/ 100 Time: 4.6227 seconds
/ 100 Clock: 16547055 ticks
* 0.01 Time: 1.0782 seconds
* 0.01 Clock: 3859508 ticks

Odd Number division:
First Pass Result: 386238.064748
Second Pass Result: 386238.064748
Done, Results:
/ 556 Time: 4.5820 seconds
/ 556 Clock: 16401425 ticks
* (1/556) Time: 12.1746 seconds
* (1/556) Clock: 43579366 ticks
(1/556) = 0.00179856115108 (approximate)

The results are variating, on simple numbers like 0.01 the speedup is allways working, but somehow the very complex numbers tend to be slower sometimes.

I use this tip allot when working with percentage.