Onderweg Blog

Retro file formats: dBase files with custom indexes

2023-02-27T09:00:00+01:00

With my newfound interest in retro file formats, I turned my attention to the good old dBase format.

Quoting from Wikipedia:

dBase (also stylized dBASE) was one of the first database management systems for microcomputers and the most successful in its day.

I’m old enough to remember working with dBase files on an MS-DOS computer. A typical use case for dBase on a personal computer at the time was storing a personal address book in a database. There were no iCloud backups or cheap reliable storage media back then. You’d put your database, worth hours of manual labour, on a diskette and pray you would never encounter disk read errors.

Anyway. What was nice about dBase is that it was relatively easy to create forms to enter data. Also, it included a programming language, ideal for quick data manipulation.

The dbf format

The actual data was stored in .dbf files. A .dbf file contains field definitions (names and types of the fields in the database), followed by the actual data. The data is stored in fixed-length records. Records are what we would refer to today as “rows” in SQL terminology, and fields are equivalent to columns.
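To make that layout a bit more concrete, here is a minimal sketch in C of reading the fixed 32-byte header at the start of a dBase III-style .dbf file. The byte offsets follow the commonly documented dBase III+ layout; the names `dbf_header` and `dbf_parse_header` are made up for this illustration and do not come from any of the libraries mentioned in this post.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical struct with the header fields needed to locate records. */
struct dbf_header {
    uint8_t  version;      /* byte 0: file type / version flag */
    uint32_t num_records;  /* bytes 4-7, little-endian */
    uint16_t header_size;  /* bytes 8-9, little-endian: offset of first record */
    uint16_t record_size;  /* bytes 10-11, little-endian: includes deletion flag */
};

/* Parse the first 32 bytes of a .dbf file.
 * Returns 0 on success, -1 if the buffer is missing or too short. */
int dbf_parse_header(const uint8_t *buf, size_t len, struct dbf_header *out)
{
    if (buf == NULL || out == NULL || len < 32)
        return -1;
    out->version = buf[0];
    out->num_records = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                       ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    out->header_size = (uint16_t)(buf[8] | (buf[9] << 8));
    out->record_size = (uint16_t)(buf[10] | (buf[11] << 8));
    return 0;
}
```

The field definitions follow this header as 32-byte descriptors, one per field, which is why the header size grows with the number of fields.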

While essentially an obsolete file format for most use cases, dBase files are still widely used as part of the ESRI Shapefile format. The Shapefile format is a geospatial data format. Although it is called a “shapefile”, the data is actually stored in a collection of files. One of those files is a dBase file, which is used to store columnar attributes for each shape stored in the .shp file.

Libraries for dBase files today

Although the dBase file structure is not very complicated, it is convenient that there are still libraries available today in many languages to read and write dBase files. For example, the code for reading/writing dBase files included in PHP 3.0 and in many other projects, originally created by Brad Eacker in 1993, is still a good starting point for handling dBase files in your own C code. Another example is the C implementation in Shapelib.

One of the main reasons that dBase libraries are not hard to find in these post-dBase times, is that dBase files are abundant in the world of GIS software, thanks to ESRI shapefiles. Many open source libraries that handle geospatial data contain code for handling dBase files, as part of their Shapefile implementation.

(By the way, the fact that dBase is still a widely used format, while not well suited for modern applications, is a situation that not everyone is happy with.)

In general, dBase libraries can handle only a subset of the .dbf files that you’ll encounter in the wild. That is because the most complicated part of the dBase format is undoubtedly the many versions and variants of the format. Other database software products like FoxPro and Clipper used the dBase file format as well, but with their own additions, for example support for additional field types. The dBase product itself also evolved, ending with the dBASE 7 format. Currently though, III+–V is the most common dBASE file format found in the wild, so this is the format most libraries will handle without issues.

The large number of dBase variants becomes more of a problem when working with indexes.

The jungle of dBase indexes

.dbf files are the backbone of a dBase database. But dBase actually defines many types of files. Just like Shapefiles, a dBase database can be a collection of files.

Widespread amongst these companion files are .dbt files, which are used for memo fields. These are fields that can contain more characters than the 254-character maximum of character fields.

Another important dBase file type is the index. Indexes are generally stored as a B+ tree in a separate file. Originally, dBase used .ndx for single indexes, and .mdx for multiple indexes. But of course, many new index variants appeared as well, and they still exist in the wild.

For example, Visual FoxPro supports structural compound index (.cdx) files, nonstructural compound index (.cdx) files, and standalone index (.idx) files. Clipper uses .ntx files. These are similar to .ndx files, but allow for longer search key expressions and store data in ASCII format. In Shapefiles, .ain and .aih attribute indexes are used. In ArcGIS 8 and later however, .atx attribute indexes are used. GDAL/OGR 1.7 uses .ind and .idm attribute indexes for Shapefiles, which are not compatible with other GIS software.

As you can see, it’s a jungle out there. Figuring out which legacy index format to use when creating a new dBase file is daunting, just like finding libraries or format descriptions to read and write all these kinds of formats. And once you know what index format to use, and how it should be structured, it turns out that writing code for creating those indexes is really complicated (reading is easier, but still not straightforward). That is probably why many software packages that can read .dbf files ignore indexes altogether.

While experimenting with handling dBase files in C, I was a bit stuck on how to handle indexing, for the above reasons. But then a pragmatic approach came to mind: forget legacy dBase index formats and just go for the most convenient, but still efficient, way of storing an index available today.

Using file-based key/value stores for dBase indexes

An index is basically nothing more than a way to quickly map a key to a value. In this case, the key is the value of the indexed field, and the value is the record number. Because records in a .dbf file have a fixed length, once you know the record number, you can directly access the record.
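Direct access boils down to one line of arithmetic. The sketch below assumes the header size and record size have already been read from the .dbf header, and that record numbers start at 1, as they do in dBase. The function name `dbf_record_offset` is hypothetical.

```c
#include <stdint.h>

/* Byte offset of record `recno` (1-based) within a .dbf file.
 * header_size and record_size come from the .dbf header.
 * Returns -1 for an invalid record number. */
int64_t dbf_record_offset(uint16_t header_size, uint16_t record_size,
                          uint32_t recno)
{
    if (recno == 0)
        return -1;
    return (int64_t)header_size + (int64_t)(recno - 1) * record_size;
}
```

With the offset in hand, a single `fseek` followed by an `fread` of `record_size` bytes retrieves the record, so an index only ever needs to store the record number.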

A while ago, I discovered the existence of file-based key/value databases, like Tokyo Cabinet, its successor tkrzw, and constant databases. I’m a big fan of the concept, but finding everyday use cases for personal projects is challenging. I might have found at least one now: using such a store as an index for a dBase file. Because these key/value stores are implemented as a file hash or file B+ tree database, they can act perfectly as an index. The same goes for constant databases.

All of the above key/value engines have C libraries available, often with bindings for other languages as well. So they can be used in any modern project relatively easily.

Constant databases for indexes

CDBs, short for “constant databases”, are awesome. They are basically a very reliable and fast on-disk associative array, mapping keys to values. Why “constant”? Because you create them once, and after that they are read-only. To add or change entries, you must recreate the database. That is not necessarily a problem when used for indexes: an index can be rebuilt when needed. Rebuilding is not as fast as reading, but still fast enough for most use cases.

The use case for that would be what in the dBase world is called a “non-production” index. Production indexes in dBase are automatically opened and kept up to date whenever a table is opened. Non-production .mdx files and .ndx files must be explicitly opened to be updated.

Below is an example in PHP, using PHP’s built-in DBA extension and the PECL dBase extension. It shows a simple way to build a custom index for a dBase file in a CDB database:

// Create `.dbf` file
$def = array(
    array("name",     "C",  50),
    array("age",      "N",   3, 0),
    array("email",    "C", 128)
);
$dbf = dbase_create('/tmp/test.dbf', $def);
if (!$dbf) {
    echo "Error, can't create the database\n";
    exit(1); // stop here: the calls below need a valid handle
}

// Add example records to .dbf
dbase_add_record($dbf, ['Maxim Topolov', '23', 'max@example.com']);
dbase_add_record($dbf, ['Leo Bakker', '45', 'leo@example.com']);
dbase_add_record($dbf, ['Bear Voxny', '23', 'bear@example.com']);
dbase_add_record($dbf, ['Qudo Malek', '21', 'qudo@example.com']);
dbase_add_record($dbf, ['Pavlo Nyrola', '34', 'pavlo@example.com']);

// Create index for the 'email' field in a constant database
$cdb = dba_open("/tmp/test.cdb", "n", "cdb_make");
$num = dbase_numrecords($dbf);
for ($i = 1; $i <= $num; $i++) {
    $rec = dbase_get_record($dbf, $i);
    $key = trim($rec[2]);
    dba_insert($key, (string)$i, $cdb);
}

// Cleanup
dba_close($cdb);
dbase_close($dbf);

DBMs (or “Berkeley DB style databases”) for indexes

A constant database is by far the fastest solution for lookups, but updating the index requires a rebuild. So if you need to update the index often, you can also resort to a DBM, or “Berkeley DB style database” as they are called in the PHP docs. Berkeley DB was one of the first in its genre of embedded databases for key/value data. Berkeley DB itself is not around anymore (not maintained). But its legacy is lasting. There are many modern variants, such as GNU dbm (C), TKRZW (C/C++), and for Go: BadgerDB and BoltDB, amongst many others.

TKRZW is my favorite of these. One of the reasons is that it has a nice C interface. For example, creating a TKRZW database and adding records in C requires a minimal amount of code:

TkrzwDBM *dbm = tkrzw_dbm_open("index.tkh", true, "truncate=false,num_buckets=100");
if (dbm == NULL)
{
    printf("Failure while opening database\n");
    printf("Last status message: %s\n", tkrzw_get_last_status_message());
    exit(EXIT_FAILURE);
}

// Add records.
tkrzw_dbm_set(dbm, "foo", -1, "hop", -1, true);
tkrzw_dbm_set(dbm, "bar", -1, "step", -1, true);
tkrzw_dbm_set(dbm, "baz", -1, "jump", -1, true);

// Close the database when done; this also releases the handle.
tkrzw_dbm_close(dbm);

By adding a loop through dBase records, like in the PHP example above, an index of one or more columns can be created.
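As a rough sketch of what such a loop could look like in C, the function below walks fixed-length records held in memory and collects (key, record number) pairs for one field. Everything here is hypothetical: the names `index_entry` and `dbf_collect_keys`, the fixed key buffer size, and the assumption that the records have already been read into a buffer. Deleted records (flagged with `*` in the first byte) are skipped and trailing padding spaces are trimmed, like the `trim()` call in the PHP example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define INDEX_KEY_MAX 255 /* character fields hold at most 254 bytes */

/* Hypothetical (key, record number) pair, ready to be stored in an index. */
struct index_entry {
    char     key[INDEX_KEY_MAX];
    uint32_t recno;
};

/* Collect keys from `num_records` records stored back to back in `records`.
 * `field_offset` is counted from the start of the record, so it includes
 * the 1-byte deletion flag. Returns the number of entries written to `out`. */
size_t dbf_collect_keys(const uint8_t *records, uint32_t num_records,
                        size_t record_size, size_t field_offset,
                        size_t field_len,
                        struct index_entry *out, size_t max_out)
{
    size_t n = 0;
    if (field_len >= INDEX_KEY_MAX || field_offset + field_len > record_size)
        return 0;
    for (uint32_t i = 0; i < num_records && n < max_out; i++) {
        const uint8_t *rec = records + (size_t)i * record_size;
        if (rec[0] == '*')
            continue; /* record is marked as deleted */
        size_t len = field_len;
        while (len > 0 && rec[field_offset + len - 1] == ' ')
            len--; /* trim trailing padding */
        memcpy(out[n].key, rec + field_offset, len);
        out[n].key[len] = '\0';
        out[n].recno = i + 1; /* record numbers are 1-based */
        n++;
    }
    return n;
}
```

Each collected entry can then be passed to `tkrzw_dbm_set`, with the record number formatted as a string for the value.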

Conclusion

Stepping away from index formats supported by other software that handles dBase files comes at the price of lost interoperability. Other software won’t be able to read those custom indexes. But for personal use, that’s not a problem at all, since I’m the only user.

Although you lose interoperability, it can easily beat struggling with old dBase index formats. Especially since there is no guarantee whatsoever that, even if you do pick an existing format, the index you create is interchangeable between different software products. There is no single standard with broad support for dBase indexes after all.

An unanswered question you might still have about all this is: why would someone in 2023 work with the ancient dBase format in the first place? And the answer is the same as for many hobby projects: because why not! Why would you use something as mundane as SQLite, when you can keep alive a 40-year-old database format with no Unicode support?

Reviving The Guide: dusting off an old binary file format

2022-10-28T10:00:00+02:00

Quite a while ago, in the 2010s while I was still a Windows user, The Guide was my absolute favorite note taking application.

The concept of The Guide is as simple as it is brilliant. Notes are stored in a single file (a guide). The notebook has a tree structure, and every node can have (rich) text associated with it. Nodes can be assigned individual icons and colours. The tree can be of arbitrary depth. That’s it.

The official site describes the concept as follows:

the Guide is a two-pane extrinsic outliner. This concept is similar to mindmapping in some ways.

When I moved from Windows to macOS however, my glorious time with The Guide was over. It is Windows only, so I was forced to switch to different note taking approaches.

I went for notes as separate Markdown files, organized in file system folders. Although different from my beloved Guide tree, this approach also works well for me. What I like most about it is that it’s portable, available everywhere when used in combination with for example Dropbox or Google Drive, and completely without vendor lock-in. I can use any Markdown editor to edit my notes, on desktop or on mobile.

But still. At times I miss the built-in, tree-based, one-file approach of The Guide in all its simplicity. And I would also sometimes like to browse through my old notes. These are stored in .gde files, a binary format specific to The Guide. The only way to open them on a Mac is by running The Guide in a virtual Windows machine, or in Wine. That’s not ideal, and also not much fun. Reading those files programmatically, so that I can do whatever I want with the data, is what I’m really after.

Using `libguide` in C

In order to programmatically read my old .gde files on a Mac without the Windows application, I first checked whether The Guide source code was available (for some reason I never checked that before, or I forgot). To my great joy, the source is (still) available at SourceForge. Even better, the part that handles reading and writing .gde files is nicely packaged in a stand-alone C library: libguide (compiled as a DLL). libguide can be used completely independently from the main GUI application (Guide), which was written in C++.

Next step would be to call libguide functions from my own code, and experiment with reading my old .gde files. It became apparent soon however, that libguide would not run as-is on my Mac, because of several Windows specific API calls in the code:

CreateFileMapping and MapViewOfFile are used to create memory-mapped files when reading gde files.
MultiByteToWideChar and WideCharToMultiByte are used for Unicode conversion.

Luckily, these functions can be replaced relatively easily by POSIX variants, like mmap and mbsrtowcs. So I did.
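For illustration, mapping a whole file read-only with POSIX calls can look roughly like the sketch below. This is not the actual code from my port, just a minimal example of the mmap side of the replacement; the function name `map_file_readonly` is made up.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Map a whole file read-only. Returns NULL on failure or for an empty file;
 * on success the file size is stored in *size_out.
 * Release the mapping with munmap((void *)ptr, size). */
const char *map_file_readonly(const char *path, size_t *size_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *size_out = (size_t)st.st_size;
    return (const char *)p;
}
```

On Windows the same job takes two calls (CreateFileMapping followed by MapViewOfFile); on POSIX systems a single mmap call covers both.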

After replacing the Windows-specific functions with POSIX ones and replacing the Visual Studio project with a Makefile, the library compiled. But actually reading a guide file caused a segfault.

Back to the drawing board.

Pointer size problems

It took me a while to figure out that the segfault cause was related to the difference in pointer size between 32-bit and 64-bit architectures.

See, on a 32-bit architecture (my old Windows machine) the pointer size is 4 bytes, while on my current 64-bit machine (macOS) the pointer size is 8 bytes. That is not necessarily an issue. But it becomes a problem when code explicitly relies on a specific pointer size, as turned out to be the case in libguide.

While reading a file from memory, libguide uses “fake pointers” to store unique IDs for nodes. A fake pointer here is a pointer value that does not point to a real memory address. Instead, the value of the pointer is interpreted as a uint32 value.

The small code fragment below shows how these IDs are read from a memory-mapped region that contains a gde file:

// This fragment reads the id and parent id of a node
// char *p points to memory mapped area with gde data
*fake_node_ptr =  ((struct tree_node_t **)p)[0]; 
*fake_parent_ptr = ((struct tree_node_t **)p)[1];
p += 2 * sizeof(struct tree_node_t *);

Because the number of bytes read from the memory-mapped area p depends on the pointer size of the reading machine, things go wrong when a file created on a 32-bit architecture is read on a 64-bit machine.

Writing has the same problem:

static int _guide_storer_fn(struct tree_node_t *node, void *memory_mapped_data)
{
	struct tree_node_t *parent = tree_get_parent(node);
	struct guide_nodedata_t *data =
		(struct guide_nodedata_t *)tree_get_data(node);
	FILE *fp = (FILE *)memory_mapped_data;

	// write node_id
	fwrite(&node, 1, sizeof(node), fp); // <- sizeof(node) depends on architecture
	// write parent_node_id
	fwrite(&parent, 1, sizeof(parent), fp);

	// ... write rest of the data

I fixed this by always reading and writing uint32 values for node IDs, not relying on machine pointer sizes anymore:

fwrite(&node, 1, sizeof(uint32), fp);	
fwrite(&parent, 1, sizeof(uint32), fp);

This worked. Although for me it’s still an open question if there are situations where a node id won’t fit in a uint32.
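An alternative that also sidesteps byte-order differences is to encode and decode the IDs byte by byte. The sketch below is not what libguide does; it assumes the IDs in a .gde file are stored as 4-byte little-endian values, which matches files written on 32-bit x86 Windows. The helper names are hypothetical.

```c
#include <stdint.h>

/* Read a 32-bit little-endian value from a byte buffer. */
uint32_t read_u32_le(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Write a 32-bit value as 4 little-endian bytes. */
void write_u32_le(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char)(v & 0xff);
    p[1] = (unsigned char)((v >> 8) & 0xff);
    p[2] = (unsigned char)((v >> 16) & 0xff);
    p[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Hypothetical helper: read a node id and parent id, then return the
 * cursor advanced by exactly 8 bytes, whatever the pointer size is. */
const unsigned char *read_node_ids(const unsigned char *p,
                                   uint32_t *node_id, uint32_t *parent_id)
{
    *node_id = read_u32_le(p);
    *parent_id = read_u32_le(p + 4);
    return p + 8;
}
```

Because the cursor always advances by a fixed 8 bytes, the reader behaves the same on 32-bit and 64-bit machines.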

Anyway, after these changes and some testing, I had a working cross-platform C library that can read my old guide files on my Mac (no guarantees about yours, maybe it will eat them). GitHub repo.

Multi-language parsing with Kaitai struct

This was all really nice. But what if I wanted to read or write .gde files in Go or Swift or whatever other language? Of course one can bridge the C code to other languages. Functions in a C library can be called from almost any language if there is some kind of “bridge” in between. Often this is a foreign function interface (FFI). For example, Go has cgo (the "C" pseudo-package), for Node.js there is node-ffi, etc.

But developing bridge code for multiple languages is repetitive work. Also, calling unmanaged C from managed languages like Go and Swift has all kinds of drawbacks. Ideally, instead of bridging C, these languages would have native gde parsers. Of course, creating those sounds like a lot of work. So, what if generating native parsers for multiple languages could be an automated process?

A while ago I discovered Kaitai struct, which immediately appealed to me, but I could not think of a use case for me at the time.

Kaitai Struct is a declarative language (using YAML syntax), which can be used to describe binary formats. Once you have a description of a format, you can use Kaitai Struct to generate parsers for it in multiple languages.

The main idea is that a particular format is described in Kaitai Struct language (.ksy file) and then can be compiled with ksc into source files in one of the supported programming languages. These modules will include a generated code for a parser that can read the described data structure from a file or stream and give access to it in a nice, easy-to-comprehend API.

Example of what a Kaitai Struct format description looks like:

meta:
  id: tcp_segment
  endian: be
seq:
  - id: src_port
    type: u2
  - id: dst_port
    type: u2
  - id: seq_num
    type: u4
  - id: ack_num
    type: u4

The Guide uses a binary format which has a relatively straightforward structure. Seems like an ideal use case to experiment with Kaitai Struct!

Creating a Kaitai language file for The Guide was indeed not hard. After a few iterations, I was able to parse all my old Guide files without any issues in Kaitai’s web IDE. After that, it was trivial to generate parsers for major programming languages via Kaitai’s command line tool.

Besides generating parsers, Kaitai can also export parsed data as JSON or XML:

$ gem install kaitai-struct-visualizer
# JSON output
$ ksdump -f json guide.gde gde32.kty
# XML output
$ ksdump -f xml guide.gde gde32.kty

The Kaitai files I created for parsing .gde files are available in the GitHub repo.

Conclusion

Dusting off old binary files can be fun. If you know how the format is structured, modern tools can relieve some of the effort it takes to parse them with your favourite programming language.

Although mainly an experiment, my first experience with Kaitai was quite positive. Some nice-to-haves that will hopefully arrive in Kaitai in the future:

Ability to write (generate) files with
The Kaitai project is still very much in development. Go support, for example, is not finished or well documented yet (but it works).

You don’t always have to write your own Kaitai format descriptions, by the way. For a lot of binary formats there is already one available.

Using ubisys Zigbee modules with Philips Hue

2021-05-22T10:00:00+02:00

What if you want to be able to switch a light via both the existing wall switch and the Philips Hue app and/or HomeKit? You can install an in-wall home automation module behind the existing wall switch, and voilà!

What is meant here by “in-wall switch” is a module that can be installed behind an existing light switch, and makes a “dumb” light smart. Connected lights can then be switched via an app, Philips Hue in my setup, or via the existing wall switch.

Zigbee is commonly used as a wireless protocol for these kinds of devices. But of course such a module does not have to work with Zigbee. Besides Zigbee, Wi-Fi and Z-Wave are commonly used in smart lighting. And in the US, Lutron provides a very popular solution for switching dumb lights with their Caseta smart switches via their proprietary RF protocol. But these are not for the European market.

So there is a wide range of technologies available for smart switches. But I’ll focus on Zigbee and Hue compatibility here.

Hue compatible Zigbee switches

There are multiple Zigbee in-wall switches on the market that are compatible with Philips Hue.

Well known among people who are familiar with Zigbee modules, are the modules by Sunricher. Amongst others, these are sold under the iCasa and Robb brands. Because both the Philips Hue bridge and these modules use the Zigbee Light Link standard, they can be added as third party devices to the Hue bridge.

Less known are the modules from the German manufacturer ubisys. Ubisys produces really solid Zigbee modules, like dimmers and switches, that can be flush mounted.

The ubisys S1 is the module that I have installed in several places in my home.

The really nice thing about the S1 is the support for “normal” (rocker) light switches. While the Sunricher modules only work with push buttons, the ubisys S1 supports standard European style toggle switches as well.

While this is common for Wi-Fi modules, like Shelly’s, and Z-Wave modules, there are still not a lot of Zigbee modules that work in a non-push button configuration.

What about the Philips Hue wall switch module?

Note that the solution with Zigbee switch modules discussed here is different from using the Philips Hue wall switch module. With the Hue wall switch module you can switch a Hue smart bulb via your existing switch. You can’t, however, switch dumb lights with it.

Also it has a battery, which you need to replace… eventually. Although it is expected to last at least 5 years, replacement is a hassle given its location behind a switch. The good thing about a battery powered device is that you don’t need a neutral wire in your wall box.

Because the Hue switch is battery powered, it is a Zigbee end device. It does not act as a Zigbee router, and won’t help with improving your Zigbee mesh network, in contrast to hard-wired modules and light bulbs.

So all in all, the concept of Philips’ module is quite different from other Zigbee wall switch modules.

Personally, it feels less solid, because there is no physical connection anymore between your light and the switch. If for whatever reason the Hue bridge is not available, there is no way to switch your lights.

I can imagine though, that if you already use Hue light bulbs, especially if you use color-changing lighting, the Hue wall switch is a more appropriate solution.

About the ubisys S1

A few other things that are noticeable about the ubisys S1:

The “click” sound, audible when the module switches, is relatively loud. Louder than for example in modules from Shelly and iCasa. In my experience it’s not annoying though.
The S1 does not have screw terminals for the wires. Wires (neutral, live, switch) are attached permanently to the module. Connections are made with separate wire connectors.
The module is relatively small compared to other modules like iCasa, but it still can be a real challenge to fit it in a standard wall box.
Ubisys also has a dimmer variant, the D1. That one does not work with a toggle switch by default.

Ubisys S1 in Philips Hue

A really nice thing about the ubisys S1 is that it works flawlessly with a Philips Hue bridge, and indirectly with HomeKit via Homebridge.

Adding the S1 in the Hue app works the same as with Philips Hue accessories. After a search for new lights via the Hue app, the S1 appears as a new on/off switch.

A disadvantage of using third party modules with Philips Hue is that you can’t update the firmware of non-Philips devices. This is not a technical limitation. How over-the-air (OTA) updates work in Zigbee is standardised. So, theoretically, Philips Hue can add functionality to support upgrading non-Philips devices in the future (that’s a feature request for you, Philips!).

In the meantime, you can only update via the bridge of the original manufacturer. In the case of ubisys devices, that would be the ubisys gateway, which is a pretty pricey device. As an alternative, you can also use open-source software like zigbee2mqtt in combination with a Zigbee USB stick. But that is not an easy solution for the average user.

A second disadvantage is that you won’t be able to change the default configuration of ubisys modules via the Hue Bridge. Ubisys modules have quite a few configuration options, but these are only accessible via the ubisys gateway (or, again, via zigbee2mqtt). It’s definitely a disadvantage, but in my experience not a real problem. For many use cases the default configuration suffices.

An example where it might be a problem is when you want to use a non-default switch. By default, the S1 is set up for a toggle switch (two stable states). If you want to use a push switch, for example, you’re out of luck: you will need additional configuration.

Finally, functionality supported by the S1 but not by Hue, like measuring power consumption, is only exposed through the ubisys gateway and app. Furthermore, Philips Hue does not expose non-Philips devices to HomeKit. So for HomeKit support you’ll need something like Homebridge with a Hue plugin.

Conclusion

Despite all this, if you have a fairly standard configuration, ubisys switches are an excellent choice if you want to automate your existing dumb lights and already have a working Philips Hue setup. The modules are very solid, easy to integrate with Hue, and work with existing switches (as long as there is a neutral line available).

Developing HomeKit accessories without Homebridge

2020-08-04T22:00:00+02:00

Homebridge is a really nice piece of software, written in Node.js, with which you can emulate HomeKit accessories.

In other words: with Homebridge you can add devices without native HomeKit support to HomeKit. All you need is a plugin for the device. From the Homebridge website:

Homebridge allows you to integrate with smart home devices that do not natively support HomeKit. There are over 2,000 Homebridge plugins supporting thousands of different smart accessories.

Anyone with Node.js knowledge can create plugins for Homebridge, either for personal use or to share with other Homebridge enthusiasts via NPM.

HomeKit Accessory Protocol Specification

Homebridge implements the HomeKit Accessory Protocol Specification (HAP). Thanks to Apple releasing the HAP specification, developers can create their own non-commercial HomeKit accessories.

A difference with commercial HomeKit accessories is that the latter need certification from Apple under the MFi Program. If you add an uncertified accessory (for example via a Homebridge plugin), you’ll see a message in HomeKit that the device is not certified. But beyond that, uncertified devices work perfectly fine in HomeKit.

Working with HAP directly is not a trivial task. So the usual approach is to use a framework for all the HAP-specific stuff. For Node.js, for example, you can use HAP-NodeJS, on which Homebridge is built.

Developing Homebridge plugins

So yes, Homebridge is a great way to run uncertified HomeKit accessories “out-of-the-box”, thanks to the large number of available plugins and a big community of Homebridge enthusiasts.

But there are also definitely some downsides, especially when developing your own plugins:

Homebridge plugins need to be installed globally, e.g. with npm install -g. If you want to migrate your Homebridge instance to another system, you have to manually go through your Homebridge config, and make a list of plugins you need to install on the new system.
Testing plugins while developing is relatively cumbersome. You need a local Homebridge (development) instance that runs your plugin. Every time you want to check a change, you need to restart the Homebridge instance, and check the Homebridge logs. There is no easy way of directly executing parts of a plugin for testing, without extensive mocking.

Also there are some considerations when running Homebridge:

Resources: Since Homebridge runs on Node.js, naturally you need to have a working Node.js environment. This is fine for, for example, a home server or a Raspberry Pi. But it can be problematic for embedded systems or other devices with limited resources, such as OpenWRT routers. Also in terms of memory, Homebridge is relatively resource intensive compared to compiled binaries.
Complex configuration: Homebridge acts as a “bridge” in HomeKit, which means it can host many child accessories/plugins. If you have many, the Homebridge config.json can get rather complex quickly.

That said, it’s fair to say that some of these issues are less of a problem if you use Homebridge Config UI X.

Homekit development in Go

Because of these downsides to Homebridge (for me at least), I stepped away from using Homebridge for my own custom-developed HomeKit accessories (though I still use Homebridge with a limited number of public plugins, like the excellent Homebridge Hue plugin).

As an alternative to Homebridge, I now mostly use the excellent Go framework HC. HC is not a stand-alone server with plugins like Homebridge; it is a lightweight HAP framework, comparable to HAP-NodeJS. It abstracts the HomeKit Accessory Protocol and supports all HomeKit services and characteristics.

Creating a basic Homekit accessory with HC is as simple as:

package main

import (
    "log"
    "github.com/brutella/hc"
    "github.com/brutella/hc/accessory"
)

func main() {
    // create an accessory
    info := accessory.Info{Name: "My Lamp"}
    ac := accessory.NewSwitch(info)
    
    // configure the ip transport
    config := hc.Config{Pin: "00102003"}
    t, err := hc.NewIPTransport(config, ac.Accessory)
    if err != nil {
        log.Panic(err)
    }
    
    hc.OnTermination(func(){
        <-t.Stop()
    })
    
    t.Start()
}

The result of building an accessory with HC is a single binary without any dependencies. This is great for systems with limited resources (although Go binaries are relatively large, they are still a lot smaller than a complete Node.js environment).

Also in terms of memory, HC accessories are resource friendly. The memory footprint is usually around 12 MB of physical memory, versus 100 MB used by Homebridge in my configuration with just a few plugins.

But also if resources are not a problem, it’s really nice to be able to work with a single binary that’s easily portable between systems, without any additional installation steps on different machines (like setting up Homebridge).

Follow the sun

2019-09-01T10:00:00+02:00

I created a small CLI tool called Follow the sun. What it does is switch display settings to Dark mode automatically after sunset, and back to Light mode after sunrise. The idea is that Follow the sun runs as a daemon via launchctl.

For me this was mainly an experiment. It helped me to learn more about:

Accessing private frameworks in macOS
Using CoreFoundation in C
Using macOS Objective-C runtime library (objc_msgSend, sel_registerName, etc) in C.

How to retrieve sunrise/sunset info in macOS

Sunrise/Sunset info is provided by a private macOS system framework named CoreBrightness. A bit more info on this undocumented API (Objective-C) can be found here: https://github.com/thompsonate/Shifty/issues/20.

Example output from this API:

{
    isDaylight = 0;
    nextSunrise = "2018-10-07 05:55:03 +0000";
    nextSunset = "2018-10-06 16:59:34 +0000";
    previousSunrise = "2018-10-05 05:51:38 +0000";
    previousSunset = "2018-10-04 17:04:09 +0000";
    sunrise = "2018-10-06 05:53:21 +0000";
    sunset = "2018-10-05 17:01:52 +0000";
}

You can retrieve the above schedule information from the command line in macOS Mojave:

$ /usr/bin/corebrightnessdiag sunschedule

Added in 2020: on macOS Big Sur the tool lives in a different location:

$ /usr/libexec/corebrightnessdiag sunschedule

A caveat is that CoreBrightness needs to know your location in order to calculate the correct sunset and sunrise times. If Wi-Fi is disabled, CoreBrightness (via CLLocationManager) might not be able to determine your location. In that case, the calculated sun schedule is not correct. So make sure Wi-Fi is turned on.
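Once the sunrise and sunset times are known, the decision itself is small. The sketch below only illustrates the rule described above (dark after sunset, light after sunrise); it assumes both times belong to the same day and are given as Unix timestamps. The function name `should_use_dark_mode` is hypothetical and this is not code taken from Follow the sun.

```c
#include <stdbool.h>
#include <time.h>

/* Dark mode is wanted whenever `now` falls outside the daylight window
 * [sunrise, sunset) of the current day. */
bool should_use_dark_mode(time_t now, time_t sunrise, time_t sunset)
{
    bool is_daylight = (now >= sunrise && now < sunset);
    return !is_daylight;
}
```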

The private API that provides this info is exposed through an Objective-C class called BrightnessSystemClient. To call this API from plain C, a bit of Objective-C runtime library magic is needed.

How to toggle dark/light mode

For the dark/light mode toggle, there does not seem to be any Objective-C or plain C API available in macOS. Toggling is, however, possible programmatically via AppleScript:

tell application "System Events"
    tell appearance preferences
        set dark mode to 1
    end tell
end tell

Since there is a way to directly execute AppleScript from Objective-C (and hence from C via the Objective-C runtime), I decided to go that route:

// The script source is used as a format string, so it can carry a placeholder for darkMode
CFStringRef script = CFSTR("PUT APPLE SCRIPT HERE");
id scriptString = objc_msgSend((id)objc_getClass("NSString"),
                                sel_registerName("stringWithFormat:"), script, darkMode);
id NSAppleScript = (id)objc_getClass("NSAppleScript");
SEL alloc = sel_registerName("alloc");
SEL init = sel_registerName("initWithSource:");
SEL release = sel_registerName("release");
id allocScript = objc_msgSend(NSAppleScript, alloc);
id scriptRef = objc_msgSend(allocScript, init, scriptString);
// Execute script; on failure, err receives a dictionary describing the error
id err = NULL;
id res = objc_msgSend(scriptRef, sel_registerName("executeAndReturnError:"), &err);
// Balance the alloc above
objc_msgSend(scriptRef, release);

Running via launchctl

The last part is to run the application via launchctl. This is reasonably straightforward. Besides the mandatory .plist configuration file, I created a helper script to easily start, stop and restart the service.

On building a small cross-platform CLI tool in C, Go & Swift

2019-02-25T20:00:00+01:00

A great thing about being a programmer is that if you need specific, customised, tooling, you can just write it yourself. Often there are existing options, but of course it’s a lot more fun to write your own tools, especially when you have room for some experimentation.

This is how I came to write a simple tool.. 3 times, in different languages.

The experiment

My goal was to write a very simple command line tool that can generate one-time passwords compatible with Google Authenticator. Google Authenticator uses the Time-based One-Time Password algorithm (TOTP) to generate codes. Instead of writing my own implementation, I wanted to use an existing TOTP library, since there already are many good ones.

Essentially, all I want my tool to do is accept a secret as its single input, call an existing TOTP library to generate a code, and print the generated access code to the standard output.

The question I was asking myself was: suppose I would like to use the tool on several platforms (Mac, Windows, Ubuntu), and would like to distribute the tool amongst a small group of - not necessarily technical - people (e.g. colleagues), what programming language would be the most pragmatic/viable/fun option?

Of course you can look at this question from many angles. Let’s focus on building and distributing the tool. Then, these were my “should have” requirements:

It should be possible to distribute the tool as single executable that works “out of the box”, meaning the user does not have to install dependencies like runtimes, frameworks, libraries, etc.
With the same code base (but possibly different toolchains) it should be possible to produce builds for multiple platforms.

Language choice

I wanted to create a binary for this experiment, which is why I did not consider interpreted languages and runtimes like Node.js, Ruby, and Python. In general, of course, these would all be perfectly viable options for writing a cross-platform command line tool.

There is also a disadvantage to those languages, being that the end user needs to have a runtime (e.g. Node.js) installed. Although many platforms come with common runtimes pre-installed, the user might need to install a different version. That is not always a trivial task for non-technical users.

(I’m aware that there are tools to compile interpreted languages to stand-alone executables, but that feels a bit like cheating here).

In the end, my choice was to experiment with C, Go and Swift.

I decided to stay in my “programming language comfort zone”, because learning a new language was not part of my experiment. Therefore I did not experiment with other (in my opinion) very interesting languages, such as Rust, which I will try out in the future (feel free to leave a comment with your Rust experiences). Also good to note, maybe: for this experiment I considered C++ overkill (or actually, maybe my C++ knowledge is just lacking).

What I learned

C

Typically, executables built with C are linked dynamically. That means that end users need to install dependencies (linked libraries) in order to run the tool. That’s definitely not ideal. There are ways around this, but they all come with some disadvantages:
Static linking: create a single binary that holds all required binary code. But that requires that all libraries you use (for example a TOTP library) support static linking, which is definitely not always the case. Furthermore, Apple does not support statically linked binaries on Mac OS X.
Distributing the linked dynamic libraries with your application. This means that for every target OS you’ll have to pre-build all linked libraries, make sure these libraries can be found by the executable (e.g. by changing the rpath on macOS), and bundle them with the app. In other words, you need to compile and bundle .dll (Windows), .dylib (macOS) or .so (Linux) files with your app.
C does not have a runtime that needs to be bundled with the application. Therefore the resulting executable is quite small. The only dependency (dynamic library) is the C standard library libc, which is available by default on the OSes I would like to target.
Building a single C code base on different platforms can be a pain. I generally prefer to use the “default”, or most widely supported, build chain for a platform. In my opinion that is Visual Studio on Windows, Xcode on Mac (or GCC on the Mac command line) and GCC on Linux. But that means that for every platform you need to install and set up a completely different build environment (project file, build scripts, etc.).
Compiling dependencies from source for multiple platforms is hard. Like I mentioned above, setting up a build chain for your own code on different platforms can be difficult already. It is even more difficult to compile third party libraries from source for multiple platforms. Some are relatively easy to work with cross-platform, but others are a real pain because they lack support for, or documentation on, cross-platform building.

Go

Executables built by Go are statically linked by default. This means users don’t have to install any dependencies, and you don’t need to distribute dynamic libraries with your application. For a small command line application, the only thing you need to distribute is the executable.
Unfortunately, because of the static linking, the resulting executable is relatively large. That is because a Go binary includes the Go runtime, so that the end user does not need to have Go installed (but, as Dotan Nahum points out, there are ways to trim down some fat).
Go is available as a binary distribution on all target platforms I was interested in. That makes setting up a build environment and building on these platforms painless.
A great thing about Go is that you can easily compile for multiple platforms on one machine, without the hassle of setting up different toolchains for different platforms that is common when dealing with C.

Swift

It is recommended to statically link to the Swift standard library, so that the resulting executable is not bound to the specific version of Swift that it was built with. This results in a large binary (more than 10 MB for a simple tool). The need for static linking is due to Swift lacking ABI stability. That is on the roadmap to be solved in a future Swift version though. (In comparison, Objective-C does have ABI stability by the way).
Cross-platform support has not yet matured. You can compile a Swift program on both Mac and Linux (there is no official Windows release yet), but the cross-platform build system, Swift Package Manager (SPM), is not nearly as mature as Xcode on macOS. Furthermore, many libraries that are available on CocoaPods or Carthage (macOS only) don’t support SPM (cross-platform) yet.

Conclusion

When it comes to building cross-platform and distributing the tool, Go gave me the best developer experience. Thanks to the default static linking, it is easy to create a single executable for distribution. Building a Go program on different platforms is also really easy. There is no need for writing platform-specific build scripts, or using platform-dependent toolchains. The downside is that the resulting executable is relatively large (several megabytes), but in my situation that was not a real issue.

Next up is C. Writing in C always gives me a pleasant sense of control and a feeling of freedom due to the lack of constraints from a runtime. Of course the downside to this is that you can easily shoot yourself in the foot. But the biggest problem here was that there is no single toolchain for building that works just as flawlessly cross-platform as Go.

And finally, Swift. I really like Swift as a language, but I would only consider it an obvious choice when writing command line tools specifically for macOS. Swift is too much of a “moving target” for me. That has several implications; an important one is that it is not straightforward to use Swift on other platforms. Another issue for me is that Windows is not yet officially supported.

As a final note: I wanted to share my experiences, but in the end, what language suits you best comes down to personal preference and the current state of the languages. Next year might be different (note that it’s 2019 as I write this post).
