Accessing files, how hard can it be?

If there were no resource files, programming would be easy. You could just directly access anything you need by its variable name. Unfortunately, in the real world you sometimes need to load stuff from files. This seemingly simple task is in fact very complicated.

When confronted with this problem for the first time the developer will usually do something like this:

fp = fopen("datadir/datafile.dat", "r");

Which kinda works, sometimes. Unfortunately this is just wrong on so many levels. A relative path is resolved against the current working directory, so this assumes that the program binary is run from the root of your source dir. If it is not, the path “datadir/datafile.dat” is invalid. (You are of course keeping your build directory completely separate from your source dir, right?) After diving into LSB specs and config.h autogeneration, the fearless programmer might come up with something like this:

fp = fopen(INSTALL_PREFIX "/datadir/datafile.dat", "r");

Which works. Sometimes. The main downside is that you always have to run “make install” before running the binary. Otherwise the data files in INSTALL_PREFIX may be stale and cause you endless debugging headaches. It also does not work if the binary is installed to a different directory than the one given in INSTALL_PREFIX. The platforms where this can happen are Windows, OSX and, yes, Linux (though, to be honest, no one really does that).
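For reference, here is a minimal sketch of how such a compile-time prefix might be wired up. INSTALL_PREFIX is assumed to come from the build system (a generated config.h or a -D compiler flag); the fallback value and the helper name open_installed_datafile are made up for illustration.

```c
#include <stdio.h>

/* Normally supplied by the build system, e.g. via config.h or a -D flag.
   The fallback below is only an assumption for this sketch. */
#ifndef INSTALL_PREFIX
#define INSTALL_PREFIX "/usr/local"
#endif

/* Adjacent string literals are concatenated by the compiler, so the
   full path is fixed at build time. */
#define DATAFILE_PATH INSTALL_PREFIX "/datadir/datafile.dat"

/* Hypothetical helper: returns NULL if the file is not where the
   build-time prefix says it should be. */
static FILE *open_installed_datafile(void) {
    return fopen(DATAFILE_PATH, "r");
}
```

Because the path is baked into the binary, moving the installed tree afterwards silently breaks the lookup, which is exactly the weakness described above.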

Usually the next step is to change the binary to take a command line argument specifying where its data files are. Then a wrapper script is created that determines where the binary currently lies, constructs the argument and starts the binary.

This also works. Sometimes. Unfortunately there is no portable scripting language that works on all major platforms, so you need several scripts. Also, what happens if you want to run the binary under gdb? You can’t run the script under gdb, and the binary itself won’t work without the script. The only choice is to code custom support for gdb into the script itself. Even invoking gdb reliably is hard: the commands to run are completely different depending on whether you are using Libtool or not and whether you have installed the binary or not. If you want to run it under Valgrind, that needs custom support as well. The wrapper script will balloon into a rat’s nest of hacks and ifs very, very quickly.

Before going further, let’s list all the different requirements for file access. The binary would need to access its own files:

  • in the source tree when the binary is compiled in-source
  • in the source tree when the binary is compiled out-of-source
  • in the install directory when installed
  • in a custom, user specified directory (overriding all other options)
  • without wrapper scripts
  • on Unix, OSX and Windows

There are many ways to do this. A recommended exercise is to try to think up your own solution before going further.

The approach I have used is based on an environment variable, say MYPROG_PREFIX. Then we get something like this in pseudocode:

open_datafile(file_name) {
  if(envvar_set("MYPROG_PREFIX"))
    return fopen(envvar_value("MYPROG_PREFIX") + file_name);
  return platform_specific_open(file_name);
}

// One of these is #ifdeffed to platform_specific_open.

open_datafile_unix(file_name) {
  return fopen(INSTALL_PREFIX + file_name);
}

open_datafile_windows(file_name) {
  // Win apps usually lump stuff in one dir and
  // cwd is changed to that on startup.
  // I think. It's been a long time since I did win32.
  return fopen(file_name);
}

open_datafile_osx(file_name) {
  // This is for accessing files in bundles.
  // Unfortunately I don't know how to do that,
  // so this function is empty.
}

During development, the environment variable is set to point to the source dir. This is simple to do in any IDE. Vi users need to do their own thing, but they are used to it by now. ;-) The end user does not have this variable set, so their apps will load from the install directory.
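The pseudocode above could look roughly like this in C. This is only a sketch of the Unix branch: the fallback value of INSTALL_PREFIX, the helper open_in_dir and the fixed 4096-byte path buffer are assumptions made for the example, and unlike the pseudocode it inserts a '/' between the directory and the file name.

```c
#include <stdio.h>
#include <stdlib.h>

/* Normally supplied by the build system; the fallback is an assumption. */
#ifndef INSTALL_PREFIX
#define INSTALL_PREFIX "/usr/local/share/myprog"
#endif

/* Hypothetical helper: joins dir and file_name with a '/' and opens the
   result for reading. Returns NULL if the path does not fit in the
   buffer or the file cannot be opened. */
static FILE *open_in_dir(const char *dir, const char *file_name) {
    char path[4096];
    int n = snprintf(path, sizeof path, "%s/%s", dir, file_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, "r");
}

/* The environment variable, when set and non-empty, overrides
   everything else; otherwise fall back to the install prefix. */
FILE *open_datafile(const char *file_name) {
    const char *override = getenv("MYPROG_PREFIX");
    if (override != NULL && override[0] != '\0')
        return open_in_dir(override, file_name);
    return open_in_dir(INSTALL_PREFIX, file_name);
}
```

With this in place, running something like MYPROG_PREFIX=/path/to/source/data ./myprog from a shell or an IDE run configuration picks up the files from the source tree without reinstalling.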

The one remaining issue is Unix, where the binary may be relocated to somewhere other than the install location. A simple approach that comes to mind is to dynamically query the location of the current binary, and then just do CURRENT_BINARY_DIR + “/../share/mystuff/datafile.dat”.

Unfortunately POSIX does not provide a portable way to ask where the currently executing binary is. For added difficulty, suppose that your installed thing is not a binary but a shared library. It may lie in a completely different prefix than the binary that uses it, and thus the app binary’s location is useless. I seem to recall that the Autopackage people had code to work around this, but their website seems to be down at the moment so I can’t link to it.
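For what it’s worth, on Linux specifically the running executable can be found by reading the /proc/self/exe symlink. The sketch below relies on that and is therefore not portable; the function names and buffer sizes are made up for the example, and it does nothing for the shared library case.

```c
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Linux-only: writes the directory containing the running executable
   into buf. Returns 0 on success, -1 on failure or possible truncation. */
static int current_binary_dir(char *buf, size_t size) {
    if (size == 0)
        return -1;
    ssize_t n = readlink("/proc/self/exe", buf, size - 1);
    if (n < 0 || (size_t)n >= size - 1)
        return -1;               /* error, or the path may not have fit */
    buf[n] = '\0';               /* readlink does not NUL-terminate */
    char *slash = strrchr(buf, '/');
    if (slash == NULL)
        return -1;
    if (slash == buf)
        slash++;                 /* keep "/" for a binary in the root dir */
    *slash = '\0';
    return 0;
}

/* Hypothetical helper: opens <bindir>/../share/mystuff/<file_name>. */
static FILE *open_relative_to_binary(const char *file_name) {
    char dir[4096];
    char path[4096 + 256];
    if (current_binary_dir(dir, sizeof dir) != 0)
        return NULL;
    int n = snprintf(path, sizeof path, "%s/../share/mystuff/%s",
                     dir, file_name);
    if (n < 0 || (size_t)n >= sizeof path)
        return NULL;
    return fopen(path, "r");
}
```

Other platforms need their own mechanism, so something like this would be one more #ifdef branch rather than a general solution.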