Friday, April 21, 2006

Fast reading of files using Memory Mapping

I've moved my blog and its posts to my new blog; please go to Fast reading of files using Memory Mapping on Landman Code.

It has been six months since I last posted something. Let's just say things got a little busy :). And posting source code on Blogspot seemed to be a bitch because Blogspot would filter out the newlines. I solved that in the previous post by using a <br /> for every newline. But when copying and pasting from the page the newlines were lost (of course DelForExp fixes that, but still it sucked).

Now I have a little bit of time and a few articles I want to post. After some testing I found out that Blogspot has fixed the newline removal, so I'll try to post more frequently.

Now let's get on topic. Memory Mapped Files (MMF) can be very helpful for reading large files. Looking around the internet you can find many advantages and disadvantages. The important thing is to think about what you're doing: MMF can be very fast in one application but slow in another. It all depends on the situation, and there are plenty of articles about the subject (for instance this one by the Delphi Compiler Team).

I like MMF a lot when using binary files of a certain format. Let’s assume we have the following file format:

TCustomerStruct = packed record
    CustomerID: Longword;
    CustomerName: array[0..254] of Char;
    CustomerBirthDay: TDateTime;
    CustomerRate: Double;
    AccountManagerID: Longword;
  end;

You could read this using BlockRead:

var
  CustomerFile: file of TCustomerStruct;
  Customers: array of TCustomerStruct;
  i: Integer;
begin
  AssignFile(CustomerFile, 'c:\customers.cus');
  Reset(CustomerFile); // open the file for reading
  try
    SetLength(Customers, FileSize(CustomerFile)); // one array element per record in the file
    if Length(Customers) > 0 then
      // Pass the first element: the dynamic array variable itself is only a pointer
      BlockRead(CustomerFile, Customers[0], Length(Customers)); // read the whole lot into the array
    for i := 0 to High(Customers) do
      // List all the customers in a memo
      memCustomerList.Lines.Add('Name: ' + Customers[i].CustomerName);
  finally
    CloseFile(CustomerFile);
  end;
end;

And now using MemoryMapping:

type
  TCustomerStructArray = array[0..MaxInt div SizeOf(TCustomerStruct) - 1] of TCustomerStruct;
  PCustomerStructArray = ^TCustomerStructArray;
var
  CustomerFile: TMappedFile;
  Customers: PCustomerStructArray;
  i: Integer;
begin
  // The constructor maps the file and raises an exception if the file does not exist
  CustomerFile := TMappedFile.Create('c:\customers.cus');
  try
    Customers := PCustomerStructArray(CustomerFile.Content); // not needed, but handy
    for i := 0 to CustomerFile.Size div SizeOf(TCustomerStruct) - 1 do
      memCustomerList.Lines.Add('Name: ' + Customers^[i].CustomerName);
  finally
    CustomerFile.Free;
  end;
end;

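The MaxInt div SizeOf(TCustomerStruct) - 1 is the highest index the array type allows, chosen so that the type as a whole stays just under 2 GB. Declaring the type does not allocate any memory; it only tells the compiler how far you are allowed to index through the pointer, so you still have to stop at the real number of records in the file yourself.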
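To put a number on that bound, here is a small console sketch that prints the record size and the resulting highest index. It assumes one byte per character in CustomerName (hence AnsiChar, which is my substitution for Char so the result does not depend on the compiler version) and a 32-bit MaxInt. The program name ArrayBound is made up for this example.

```pascal
program ArrayBound;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // gives Free Pascal a 32-bit Integer and MaxInt
{$APPTYPE CONSOLE}

type
  // Same layout as TCustomerStruct above; AnsiChar is an assumption
  // that pins the name field to one byte per character.
  TCustomerStruct = packed record
    CustomerID: LongWord;                     //   4 bytes
    CustomerName: array[0..254] of AnsiChar;  // 255 bytes
    CustomerBirthDay: TDateTime;              //   8 bytes
    CustomerRate: Double;                     //   8 bytes
    AccountManagerID: LongWord;               //   4 bytes
  end;

begin
  // 4 + 255 + 8 + 8 + 4 = 279
  WriteLn('Record size:   ', SizeOf(TCustomerStruct));
  // 2147483647 div 279 = 7697074, so the highest index is 7697073
  WriteLn('Highest index: ', MaxInt div SizeOf(TCustomerStruct) - 1);
end.
```

So with this record layout the array type can address a little under 7.7 million customers, far more than fits in a 32-bit process anyway.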

The TMappedFile class is something I created myself so I can be lazy. Of course I will share that piece of code too.

unit unFileMapping;
{
Copyright (c) 2005-2006 by Davy Landman

See the file COPYING.FPC, included in this distribution,
for details about the copyright. Alternately, you may use this source under the provisions of MPL v1.x or later

This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
}

interface
uses
  Windows, SysUtils;
type
  TMappedFile = class
  private
    FMapping: THandle;
    FContent: Pointer;
    FSize: Integer;
    procedure MapFile(const AFileName: WideString);
  public
    constructor Create(const AFileName: WideString);
    destructor Destroy; override;
    property Content: Pointer read FContent;
    property Size: Integer read FSize;
  end;

implementation

function FileExistsLongFileNames(const FileName: WideString): Boolean;
var
  Attributes: DWORD;
begin
  Result := False;
  if Length(FileName) < 2 then
    Exit;
  { Compare characters rather than raw bytes: a WideChar is two bytes wide }
  if (FileName[1] = '\') and (FileName[2] = '\') then
    Attributes := GetFileAttributesW(PWideChar(FileName))
  else
    Attributes := GetFileAttributesW(PWideChar(WideString('\\?\' + FileName)));
  { $FFFFFFFF (INVALID_FILE_ATTRIBUTES) means the path could not be queried }
  Result := (Attributes <> $FFFFFFFF) and (Attributes and FILE_ATTRIBUTE_DIRECTORY = 0);
end;

{ TMappedFile }



constructor TMappedFile.Create(const AFileName: WideString);
begin
  inherited Create;
  if FileExistsLongFileNames(AFileName) then
    MapFile(AFileName)
  else
    raise Exception.Create('File "' + AFileName + '" does not exist.');
end;

destructor TMappedFile.Destroy;
begin
  if Assigned(FContent) then
    UnmapViewOfFile(FContent);
  { Close the mapping handle even when no view could be created }
  if FMapping <> 0 then
    CloseHandle(FMapping);
  inherited;
end;

procedure TMappedFile.MapFile(const AFileName: WideString);
var
  FileHandle: THandle;
begin
  FContent := nil;
  FSize := 0;
  { Compare characters rather than raw bytes: a WideChar is two bytes wide }
  if (Length(AFileName) >= 2) and (AFileName[1] = '\') and (AFileName[2] = '\') then
    { Already a UNC path }
    FileHandle := CreateFileW(PWideChar(AFileName), GENERIC_READ, FILE_SHARE_READ or
      FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0)
  else
    FileHandle := CreateFileW(PWideChar(WideString('\\?\' + AFileName)), GENERIC_READ, FILE_SHARE_READ or
      FILE_SHARE_WRITE, nil, OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, 0);
  { CreateFileW signals failure with INVALID_HANDLE_VALUE, not 0 }
  if FileHandle <> INVALID_HANDLE_VALUE then
  try
    { Only the low 32 bits are read, so files of 2 GB and larger are not supported }
    FSize := Integer(GetFileSize(FileHandle, nil));
    if FSize > 0 then
      FMapping := CreateFileMappingW(FileHandle, nil, PAGE_READONLY, 0, 0, nil)
    else
      FSize := 0;
    //Win32Check(FMapping <> 0);
  finally
    CloseHandle(FileHandle);
  end;
  if FMapping <> 0 then
    FContent := MapViewOfFile(FMapping, FILE_MAP_READ, 0, 0, 0);
  //Win32Check(FContent <> nil);
  if FContent = nil then
    FSize := 0; { nothing is mapped, so report an empty file }
end;

end.

The big advantage is that with BlockRead you either have to read the whole content of the file into the array or buffer the file in blocks yourself. With MMF there is no need to worry about that (unless you get very big files): Windows automatically pages the data into memory when it is accessed.
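For comparison, buffering the file in blocks by hand could look roughly like the sketch below. This is only an illustration under a few assumptions of mine: the file is opened as an untyped file with the record size passed to Reset, the block size of 1024 records is arbitrary, the record uses AnsiChar so it is 279 bytes on any compiler, and the program name BufferedRead is made up. It only counts the records; the real work would go where the comment says so.

```pascal
program BufferedRead;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  TCustomerStruct = packed record
    CustomerID: LongWord;
    CustomerName: array[0..254] of AnsiChar; // AnsiChar is an assumption (1 byte per char)
    CustomerBirthDay: TDateTime;
    CustomerRate: Double;
    AccountManagerID: LongWord;
  end;

const
  BlockSize = 1024; // records per block, an arbitrary choice

var
  CustomerFile: file; // untyped file; the record size is given to Reset
  Buffer: array[0..BlockSize - 1] of TCustomerStruct;
  RecordsRead, Total: Integer;
begin
  FileMode := 0; // open read-only
  AssignFile(CustomerFile, 'c:\customers.cus');
  Reset(CustomerFile, SizeOf(TCustomerStruct));
  try
    Total := 0;
    repeat
      // Ask for a full block; RecordsRead receives how many records actually arrived
      BlockRead(CustomerFile, Buffer, BlockSize, RecordsRead);
      // Process Buffer[0] .. Buffer[RecordsRead - 1] here
      Inc(Total, RecordsRead);
    until RecordsRead = 0;
    WriteLn('Records read: ', Total);
  finally
    CloseFile(CustomerFile);
  end;
end.
```

All of the bookkeeping in that loop (block size, partial last block, end-of-file check) is what the memory mapped version leaves to Windows.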