<h1><a href="http://klickverbot.at/blog/2018/08/buildbot-conda-systemd-units">Systemd units for Buildbot in Conda</a> (2018-08-25)</h1>
<p>Buildbot is a Python framework for continuous integration systems. In <a href="https://www2.physics.ox.ac.uk/research/ion-trap-quantum-computing-group">my research group</a> we are deploying it in a <a href="https://conda.io/">Conda</a> environment, which we also use to manage all the different moving parts of our Python-centric control infrastructure on both Windows and Linux. To start up the master and worker services, the corresponding Conda environment needs to be activated first. This is easiest to achieve using simple wrapper scripts.</p>
<p>For this, let’s assume we’ve created a <code>bb</code> user, with the master and worker configurations in its home directory (<code>~/master</code> and <code>~/worker</code>). First, create a wrapper script to start up the master process:</p>
<figure class="code"> <div class="highlight"><pre><span class="c">#!/bin/bash</span>
<span class="nb">set</span> -eo pipefail
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span>~/anaconda3/bin:<span class="nv">$PATH</span>
<span class="nb">source </span>activate buildbot
buildbot start --nodaemon master
</pre></div><figcaption><span>~bb/start-master.sh </span></figcaption>
</figure>
<p>This assumes Conda has been installed into <code>~bb/anaconda3</code>, and the environment with the Buildbot installation is called <code>buildbot</code>. <code>master</code> is the name of the configuration directory, and <code>--nodaemon</code> prevents daemonisation (i.e. keeps the process running in the foreground).</p>
<p>Make the script executable, and create a Systemd unit file that invokes it:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Buildbot master service</span>
<span class="na">After</span><span class="o">=</span><span class="s">network.target</span>
<span class="k">[Service]</span>
<span class="na">User</span><span class="o">=</span><span class="s">bb</span>
<span class="na">Group</span><span class="o">=</span><span class="s">bb</span>
<span class="na">WorkingDirectory</span><span class="o">=</span><span class="s">/home/bb</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/home/bb/start-master.sh</span>
<span class="na">ExecReload</span><span class="o">=</span><span class="s">/bin/kill -HUP $MAINPID</span>
<span class="k">[Install]</span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span>
</pre></div><figcaption><span>/etc/systemd/system/buildbot-master.service </span></figcaption>
</figure>
<p>To start the master process, run</p>
<figure class="code"> <div class="highlight"><pre>systemctl start buildbot-master
</pre></div></figure>
<p>and to do so every time the system boots:</p>
<figure class="code"> <div class="highlight"><pre>systemctl <span class="nb">enable </span>buildbot-master
</pre></div></figure>
<hr />
<p>The analogous configuration for the worker process is</p>
<figure class="code"> <div class="highlight"><pre><span class="c">#!/bin/bash</span>
<span class="nb">set</span> -eo pipefail
<span class="nb">export </span><span class="nv">PATH</span><span class="o">=</span>~/anaconda3/bin:<span class="nv">$PATH</span>
<span class="nb">source </span>activate buildbot
buildbot-worker start --nodaemon worker
</pre></div><figcaption><span>~bb/start-worker.sh </span></figcaption>
</figure>
<p>and</p>
<figure class="code"> <div class="highlight"><pre><span class="k">[Unit]</span>
<span class="na">Description</span><span class="o">=</span><span class="s">Buildbot worker service</span>
<span class="na">After</span><span class="o">=</span><span class="s">network.target</span>
<span class="k">[Service]</span>
<span class="na">User</span><span class="o">=</span><span class="s">bb</span>
<span class="na">Group</span><span class="o">=</span><span class="s">bb</span>
<span class="na">WorkingDirectory</span><span class="o">=</span><span class="s">/home/bb</span>
<span class="na">ExecStart</span><span class="o">=</span><span class="s">/home/bb/start-worker.sh</span>
<span class="na">ExecReload</span><span class="o">=</span><span class="s">/bin/kill -HUP $MAINPID</span>
<span class="k">[Install]</span>
<span class="na">WantedBy</span><span class="o">=</span><span class="s">multi-user.target</span>
</pre></div><figcaption><span>/etc/systemd/system/buildbot-worker.service </span></figcaption>
</figure>
<p>Enable and start it using</p>
<figure class="code"> <div class="highlight"><pre>systemctl <span class="nb">enable </span>buildbot-worker
systemctl start buildbot-worker
</pre></div></figure>
<hr />
<p>That’s it; Buildbot now starts automatically when the system boots. To avoid starting the graphical user interface on a desktop Ubuntu install, run <code>systemctl set-default multi-user.target</code>.</p>
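<p>If the worker runs on the same host as the master, it can also be useful to order its startup after the master’s. A systemd drop-in along the following lines would do this (a sketch – the file name <code>order.conf</code> is arbitrary, and this is only helpful when master and worker share a machine):</p>

```ini
# /etc/systemd/system/buildbot-worker.service.d/order.conf
# Start the worker only after the (local) master unit has been started.
[Unit]
After=buildbot-master.service
Wants=buildbot-master.service
```

<p>Run <code>systemctl daemon-reload</code> afterwards so systemd picks up the drop-in.</p>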
<h1><a href="http://klickverbot.at/blog/2018/02/photographing-a-single-atom">Photographing a Single Atom</a> (2018-02-20)</h1>
<p class="lead">When I spent one Sunday night working on a photograph in our basement laboratory last August, I was admittedly quite pleased with the results. But I certainly didn’t expect the attention it recently received from news media around the globe after winning an award in a <a href="https://www.epsrc.ac.uk/newsevents/news/single-trapped-atom-captures-science-photography-competitions-top-prize/">photography competition</a>. In this post, I will try to provide some of the scientific background sorely missing from the original press release, and address a few commonly asked questions.</p>
<p>First of all, another look at the picture in question, taken on August 7, 2017, at 2:36 <span class="sc">AM</span> in the laboratories of the <a href="https://www2.physics.ox.ac.uk/research/ion-trap-quantum-computing-group">Ion Trap Quantum Computing group</a> (Prof. David Lucas and Prof. Andrew Steane) at the <a href="http://www2.physics.ox.ac.uk/">University of Oxford</a>. As part of my DPhil studies at <a href="https://www.balliol.ox.ac.uk">Balliol College</a>, I work on using ion traps for quantum computation, in particular towards <a href="http://nqit.ox.ac.uk/content/q2020-quantum-computer-demonstrator">distributing high-fidelity entanglement between several trap modules using optical links</a> (click for a high-quality version, 4.1 MiB):</p>
<figure><a href="/blog/2018/02/photographing-a-single-atom/nadlinger_single_atom_in_ion_trap_recrop_3200.jpg" target="_blank"><img alt="Photograph of a single Strontium atom in an ion trap." src="/blog/2018/02/photographing-a-single-atom/nadlinger_single_atom_in_ion_trap_recrop_1024.jpg" /></a><figcaption>An ion trap in an ultra-high vacuum vessel. In the centre of the picture, a small bright dot is visible – a single trapped <sup>88</sup>Sr<sup>+</sup> ion. (Overall 1<sup>st</sup> in the EPSRC 2018 Science Photography Competition; crop slightly changed here.)</figcaption></figure>
<p>Before getting into the details of the science behind all this, one particular misconception that has cropped up in the search for sensationalist headlines should be addressed:</p>
<hr />
<h3 id="is-this-an-advance-in-science-have-single-atoms-been-photographed-before">Is this an advance in science? Have single atoms been photographed before?</h3>
<p>In short: Not in the least; and yes, probably even before I was born.</p>
<p>First, the techniques that made this picture possible, ion traps and laser cooling, are part of the standard toolbox in modern physics experiments. The photo could have been taken in dozens—if not hundreds—of laboratories around the world, with any one of more than ten different species of atoms. To be very clear, the picture won a photography competition, not a science prize.</p>
<p>Nevertheless, the picture still showcases a lot of cool innovations in physics and engineering from the second half of the 20th century. To name just a few highlights—telling a scientific story in Nobel prizes necessarily paints a very incomplete picture: Both the aforementioned experimental techniques were recognized with the prestigious prize, ion traps in 1989, and the application of laser cooling to neutral atoms in 1997. In 2012, D. Wineland was awarded the Nobel prize for the development of methods to precisely manipulate the quantum state of trapped ions. This work forms the basis on which a number of research groups around the world, including ours, investigate trapped ions as building blocks for quantum information applications.</p>
<p>On the second point, this is nowhere near the first picture of a single atom, probably by almost forty years—I can’t help but think that someone working on the early ion trap experiments would have tried to take such a picture as well. Either way, taking pictures of single atoms has been a part of our experiments for a good ten years now, as a way of <a href="https://journals.aps.org/pra/abstract/10.1103/PhysRevA.81.040302">reading out the result of a quantum computation from our ion qubits</a> (<a href="https://arxiv.org/pdf/0906.3304.pdf">arXiv</a>)—the result in 0s and 1s is literally given as a pattern of spots that light up or stay dark. For another example, check out the <a href="https://www2.physics.ox.ac.uk/research/ion-trap-quantum-computing-group">second picture</a> on our group website, showing a string of <sup>43</sup>Ca<sup>+</sup> ions.</p>
<p>Neutral (uncharged) atoms can also be trapped using laser techniques. People have taken pictures of single atoms in this setting too, <a href="http://www.physics.otago.ac.nz/nx/mikkel/single-atom.html">such as this group in Otago, New Zealand</a>. Like the pictures used for ion qubit readout, these are typically taken with scientific cameras (lower noise, but usually monochrome) and through a microscope with narrow field of view. By using an ordinary camera, I was able to capture a picture in full colour, including more of the surrounding apparatus.</p>
<p>There is also this funny—possibly apocryphal—story from the early days of ion trapping, featuring a group around Hans Dehmelt (one of the above Nobel laureates) and a photo of a single Barium atom they took back in the very much analogue age of photography: Supposedly, their picture was mostly black, with only a few small bright spots from the ion itself and a bit of stray light hitting the trap electrodes. When they submitted the negatives for publication in some conference proceedings, the image editor promptly stamped out the atom, thinking it to be a speck of dust!</p>
<p>All in all, the scientific value of this image is virtually zero. Still, I hope it can convey some of the fascinating aspects of nature we get to explore in modern physics on a daily basis.</p>
<hr />
<p>The rest of this blog post is work in progress. For now, have a look at this 3D model of our trap, where the individual parts are easier to recognise than in the picture:</p>
<figure><img alt="3D model of the ion trap used in the picture, with different parts highlighted." src="/blog/2018/02/photographing-a-single-atom/blade_trap_render.jpg" /><figcaption>3D model of the trap from the photo. The <span class="sc">RF</span> and ground electrode pairs making the quadrupole potential are shown in yellow/lavender, the static "endcap" electrodes confining the ions along the trap axis in red. The purple wires are used to compensate for stray fields which would push the ion off the centre axis. On the top, the transparent cone illustrates the region our imaging system can collect photons from. (S. Woodrow, K. Thirumalai)</figcaption></figure>
<p>Colleagues of ours, the group around Rainer Blatt in Innsbruck, have <a href="https://quantumoptics.at/en/research/quantum-information.html">a few more pictures of similar traps on their website</a>.</p>
<p>A collection of some further links:</p>
<ul>
<li>
<p>Seeing a single atom in person is possible with a magnifying glass or a small microscope, as discussed in <a href="http://www.nytimes.com/1986/10/21/science/physicists-finally-get-to-see-quantum-jump-with-own-eyes.html">this New York Times article from 1986</a>. Apparently, this has been done even <a href="https://link.springer.com/chapter/10.1007/978-1-4612-4030-3_15">back in 1979</a>.</p>
</li>
<li>
<p>The radius of an atom is a bit tricky to define; according to one common definition (the space taken up in a molecular bond to another atom of the same species), the radius of the strontium atom is about 0.25 nm (a quarter of a billionth of a metre). When confined in a trap potential with frequency \(\omega\), its size is at least \(z_0 = \sqrt{\hbar / (2\ m_{Sr}\ \omega)}\) due to the Heisenberg uncertainty principle. For the trap parameters here, the radius is about 6.5 nm. This is far below the <a href="https://en.wikipedia.org/wiki/Diffraction-limited_system">diffraction limit</a> set by the light wavelength of 422 nm.</p>
</li>
<li>
<p>The temperature of the atom is approximately 0.5 mK, i.e. about 1/2000 of a degree above absolute zero (slightly above the <a href="https://en.wikipedia.org/wiki/Doppler_cooling#Minimum_temperature">Doppler limit</a>). Hence, the size due to “motion blur” would still be less than 300 nanometres.</p>
</li>
<li>
<p>The atom appears bigger due to imperfections in the lens and camera (<a href="https://en.wikipedia.org/wiki/Optical_aberration">optical aberrations</a>, plus the focus is slightly off). There seems to be a small amount of camera shake as well (the camera was mounted to the optical table in a somewhat precarious fashion using a cheap tripod head).</p>
</li>
<li>
<p>Since ions repel each other, one can take pretty pictures of interesting configurations of glowing atoms using a microscope. See for example <a href="https://www2.physics.ox.ac.uk/research/ion-trap-quantum-computing-group">our group website</a>, or the much fancier pictures by the groups at <a href="http://www.quantummetrology.de/quaccs/research/multi-ion-clocks/"><span class="sc">PTB</span> Braunschweig</a> and <a href="https://www.nist.gov/news-events/news/2016/06/nists-super-quantum-simulator-entangles-hundreds-ions"><span class="sc">NIST</span> Boulder</a>.</p>
</li>
<li>
<p>The photo was captured on August 7th, 2017, with a Canon EOS 5D Mark II and an <span class="sc">EF</span> 50 mm f/1.8 lens at 30s exposure time and f/4 (plus some extension tubes, and two flash units with colour gels).</p>
</li>
<li>
<p>The quantum efficiency and noise performance of modern digital camera sensors are surprisingly good. Roger N. Clark has collected a swath of useful information over at his website, see for example <a href="http://www.clarkvision.com/reviews/evaluation-canon-5dii/index.html">his page on the camera I used</a>, and his <a href="http://www.clarkvision.com/articles/digital.sensor.performance.summary/">overview of modern digital camera sensor performance</a>. The <a href="http://www.sensorgen.info/">Sensorgen.info</a> and <a href="http://www.photonstophotos.net/index.htm">Photons to Photos</a> websites also provide further information and data on sensor performance. In fact, it appears that if I had taken a closer look at the performance data before I took the picture, I could have optimised the settings a bit more to reduce the apparent noise.</p>
</li>
</ul>
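<p>The two numbers quoted in the list above – the uncertainty-limited wavepacket size and the Doppler-limit temperature – can be sketched with a quick back-of-the-envelope calculation. Note that the trap frequency and transition linewidth used here are assumed, typical values; the actual experimental parameters are not stated in this post:</p>

```python
import math

# Back-of-the-envelope numbers for a laser-cooled 88Sr+ ion.
hbar = 1.054571817e-34      # reduced Planck constant (J s)
k_B = 1.380649e-23          # Boltzmann constant (J/K)
m_sr = 88 * 1.66053907e-27  # approximate mass of 88Sr (kg)

# Ground-state wavepacket extent z0 = sqrt(hbar / (2 m omega)).
omega = 2 * math.pi * 1.0e6  # ASSUMED trap frequency (~1 MHz, typical for such traps)
z0 = math.sqrt(hbar / (2 * m_sr * omega))

# Doppler limit T_D = hbar * Gamma / (2 k_B) for the 422 nm cooling transition.
gamma = 2 * math.pi * 20e6   # ASSUMED natural linewidth (~20 MHz)
T_D = hbar * gamma / (2 * k_B)

print(f"z0  ~ {z0 * 1e9:.1f} nm")  # a few nm, the same scale as the ~6.5 nm quoted
print(f"T_D ~ {T_D * 1e3:.2f} mK") # ~0.5 mK, matching the temperature quoted above
```

<p>Both results land on the scales mentioned in the text, which is a useful sanity check on the formulas even without the exact experimental parameters.</p>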
<hr />
<p><em>[A long-form blog post is work in progress.]</em></p>
<h1><a href="http://klickverbot.at/blog/2016/01/testing-verilog-axi4-lite-peripherals">Testing Verilog AXI4-Lite Peripherals</a> (2016-01-30)</h1>
<p class="lead">Chips that combine one or more processor cores and FPGA fabric into one integrated system have become quite popular recently, the most well-known product being Xilinx’ ARM-based <a href="http://www.xilinx.com/products/silicon-devices/soc/zynq-7000.html"><em>Zynq</em></a> series. The standardized AXI buses connecting them make it trivial to bring custom IP cores into the processor address space. This post describes how to interface with such a core from a standalone Verilog test-bench.</p>
<p>The popularity these combined systems-on-a-chip have been enjoying lately in research labs is certainly in part due to the ease with which programmable logic can be connected to the CPU cores, compared to designing and implementing an interface between a discrete ARM processor and a stand-alone FPGA chip. This is because the Zynq chips feature several internal interconnects between the ARM cores and the programmable logic fabric (including access to the DDR system memory and cache coherency control). These buses follow ARM’s open AMBA AXI4 standard, which is available in several flavors: the base <em>AXI4</em> protocol, which defines a high-performance memory-mapped interface; <em>AXI4-Stream</em>, which realizes a unidirectional data flow with handshaking; and <em>AXI4-Lite</em>, which is similar to AXI4 but lacks advanced features like buffering, multiple widths and bursts. Any given device implementing one of these protocols acts as either a master or a slave.</p>
<p>Here, we will concern ourselves only with perhaps the simplest case, an AXI4-Lite slave. A typical example would be a low-bandwidth control channel from the ARM CPU to a custom IP core. Implementing such a device is quite easy, as the Xilinx development environment includes tooling to generate the code for interfacing with the AXI bus (although it seems that, compared to the average programmer, FPGA designers lack any sensibility for writing <em>pretty</em> or even just consistently formatted code). Of course, this leaves the question of how to verify that the IP core reacts correctly to these commands. As is usually the case in HDL design, you certainly don’t want to run the time-consuming synthesis process and re-flash the hardware on every iteration of the debugging process, only to then find yourself in an environment where it is hard to diagnose errors anyway – unless you had enough foresight to litter the code with ChipScope debug probes in all the right places.</p>
<p>These days, I use <a href="http://iverilog.icarus.com/">Icarus Verilog</a> for almost all of my simulation needs, except when some proprietary IP is involved for which no functional model is available outside the vendor tools. It is an open source project that provides a Verilog parser, optimizer and virtual machine, and together with a waveform viewer such as <a href="http://gtkwave.sourceforge.net/">GtkWave</a> makes for a nice light-weight testing environment. For small-ish projects, it tends to have already finished the simulation before the clunky and bug-ridden vendor tools such as Xilinx <em>isim</em> would have even completed starting up.</p>
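<p>For reference, a typical Icarus Verilog session is only a couple of commands (the file names here are hypothetical examples; the VCD dump file name is whatever your test-bench passes to <code>$dumpfile</code>):</p>

```
# Compile the design under test plus the test-bench into a vvp "executable",
# run the simulation, then inspect the waveform dump.
iverilog -o axi_tb.vvp axi_slave.v axi_tb.v
vvp axi_tb.vvp
gtkwave dump.vcd
```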
<p>Co-simulating the code to run on the ARM CPU and the FPGA design is certainly possible – maybe by using Verilator and piping data flow on the AXI buses back and forth between the domains, or by bringing out the “big guns”, i.e. system-level verification tools made by companies like Cadence. The most straightforward solution, however, is certainly to test the core in question in isolation, while just manually handling the necessary AXI communication in the test-bench.</p>
<p>Owing to the simplicity of the AXI4-Lite protocol, such functionality is not hard to implement. The “AMBA® AXI™ and ACE™ Protocol Specification” – available on the ARM website after logging in, and certainly floating around in other places as well – is quite clear and well-written. Interestingly, however, none of the templates provided by Xilinx seem to include the relevant pieces of HDL. So, without further ado, here is a Verilog task that reads a single word from the bus and compares it to the expected value:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">task</span> <span class="k">automatic</span> <span class="n">enforce_axi_read</span><span class="p">;</span>
<span class="k">input</span> <span class="p">[</span><span class="no">C_S_AXI_ADDR_WIDTH</span> <span class="o">-</span> <span class="mh">1</span> <span class="o">:</span> <span class="mh">0</span><span class="p">]</span> <span class="n">addr</span><span class="p">;</span>
<span class="k">input</span> <span class="p">[</span><span class="no">C_S_AXI_DATA_WIDTH</span> <span class="o">-</span> <span class="mh">1</span> <span class="o">:</span> <span class="mh">0</span><span class="p">]</span> <span class="n">expected_data</span><span class="p">;</span>
<span class="k">begin</span>
<span class="n">s_axi_araddr</span> <span class="o">=</span> <span class="n">addr</span><span class="p">;</span>
<span class="n">s_axi_arvalid</span> <span class="o">=</span> <span class="mh">1</span><span class="p">;</span>
<span class="n">s_axi_rready</span> <span class="o">=</span> <span class="mh">1</span><span class="p">;</span>
<span class="k">wait</span><span class="p">(</span><span class="n">s_axi_arready</span><span class="p">);</span>
<span class="k">wait</span><span class="p">(</span><span class="n">s_axi_rvalid</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">s_axi_rdata</span> <span class="o">!=</span> <span class="n">expected_data</span><span class="p">)</span> <span class="k">begin</span>
<span class="nb">$display</span><span class="p">(</span><span class="s">"Error: Mismatch in AXI4 read at %x: "</span><span class="p">,</span> <span class="n">addr</span><span class="p">,</span>
<span class="s">"expected %x, received %x"</span><span class="p">,</span>
<span class="n">expected_data</span><span class="p">,</span> <span class="n">s_axi_rdata</span><span class="p">);</span>
<span class="k">end</span>
<span class="p">@(</span><span class="k">posedge</span> <span class="n">s_axi_aclk</span><span class="p">)</span> <span class="p">#</span><span class="mh">1</span><span class="p">;</span>
<span class="n">s_axi_arvalid</span> <span class="o">=</span> <span class="mh">0</span><span class="p">;</span>
<span class="n">s_axi_rready</span> <span class="o">=</span> <span class="mh">0</span><span class="p">;</span>
<span class="k">end</span>
<span class="k">endtask</span>
</pre></div><figcaption><span>Reading a word from the AXI4-Lite bus and comparing it to an expected result. </span></figcaption>
</figure>
<p>All the <code>s_axi_…</code> signals are supposed to be hooked up to the corresponding ports of the unit under test, as they would be in an auto-generated test-bench module. To use it, simply insert <code>enforce_axi_read(&lt;addr&gt;, &lt;data&gt;);</code> at the appropriate point in your test sequence.</p>
<p>In the same vein, the following task writes a data word to the given address:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">task</span> <span class="k">automatic</span> <span class="n">axi_write</span><span class="p">;</span>
<span class="k">input</span> <span class="p">[</span><span class="no">C_S_AXI_ADDR_WIDTH</span> <span class="o">-</span> <span class="mh">1</span> <span class="o">:</span> <span class="mh">0</span><span class="p">]</span> <span class="n">addr</span><span class="p">;</span>
<span class="k">input</span> <span class="p">[</span><span class="no">C_S_AXI_DATA_WIDTH</span> <span class="o">-</span> <span class="mh">1</span> <span class="o">:</span> <span class="mh">0</span><span class="p">]</span> <span class="n">data</span><span class="p">;</span>
<span class="k">begin</span>
<span class="n">s_axi_wdata</span> <span class="o">=</span> <span class="n">data</span><span class="p">;</span>
<span class="n">s_axi_awaddr</span> <span class="o">=</span> <span class="n">addr</span><span class="p">;</span>
<span class="n">s_axi_awvalid</span> <span class="o">=</span> <span class="mh">1</span><span class="p">;</span>
<span class="n">s_axi_wvalid</span> <span class="o">=</span> <span class="mh">1</span><span class="p">;</span>
<span class="k">wait</span><span class="p">(</span><span class="n">s_axi_awready</span> <span class="o">&&</span> <span class="n">s_axi_wready</span><span class="p">);</span>
<span class="p">@(</span><span class="k">posedge</span> <span class="n">s_axi_aclk</span><span class="p">)</span> <span class="p">#</span><span class="mh">1</span><span class="p">;</span>
<span class="n">s_axi_awvalid</span> <span class="o">=</span> <span class="mh">0</span><span class="p">;</span>
<span class="n">s_axi_wvalid</span> <span class="o">=</span> <span class="mh">0</span><span class="p">;</span>
<span class="k">end</span>
<span class="k">endtask</span>
</pre></div><figcaption><span>Writing a word to the AXI4-Lite bus. </span></figcaption>
</figure>
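<p>Put together, a test sequence using the two tasks might look like the following sketch in the test-bench (the register offset and value are hypothetical, and the reset/clock start-up code is elided):</p>

```verilog
initial begin
    // ... reset sequence and clock generation elided ...

    // Write a value to a hypothetical register at offset 0x4,
    // then check that it reads back unchanged.
    axi_write(32'h0000_0004, 32'hdead_beef);
    enforce_axi_read(32'h0000_0004, 32'hdead_beef);

    $finish;
end
```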
<p>As a final note, be aware that these tasks are not at all intended to verify the protocol-level implementation of the AXI interface itself. A verified boilerplate solution, such as the one auto-generated by the Xilinx tools, would be used most of the time anyway. However, it might be interesting to know that ARM offers a set of <a href="http://infocenter.arm.com/help/topic/com.arm.doc.dui0534b/DUI0534B_amba_4_axi4_protocol_assertions_ug.pdf">AXI 4 Protocol Assertion</a> cores that can be inserted into the design to verify that the bus signalling conforms to the specification.</p>
<h1><a href="http://klickverbot.at/blog/2013/05/the-state-of-ldc-on-windows">The State of LDC on Windows</a> (2013-05-31)</h1>
<p class="lead">LDC is one of the three major D compilers. It uses the same frontend as DMD, the reference implementation of the language, but leverages LLVM for optimization and code generation. While it has been stable on Linux and OS X for quite some time, support for the Windows operating system family was virtually non-existent so far. There have been substantial advances recently, and this post gives an overview of the current situation.</p>
<p>Before going on to discuss the present status, though, let me quickly answer the inevitable question: Why did it take so long? It is not that the D community (or the LDC contributors in particular) failed to recognize the importance of Windows as a target platform. Rather, the lack of a working Windows port was caused by the fact that LLVM itself did not support all the required operating-system-specific features. Notably, exception handling was not implemented at all on Windows for a long time.</p>
<p>This applies to 32-bit variants of Windows (<em>Win32</em>) as well as to the newer 64-bit operating systems (<em>Win64</em>), but interestingly the reasons for this are completely different. In the latter case, the problem was just that nobody took the time to implement the (table-driven) Win64 exception handling scheme in the LLVM backend. This is not so surprising, as most of the big companies sponsoring LLVM development are not using LLVM on Windows, or in an application domain that does not require features such as native exception handling or thread-local storage support.</p>
<p>However, Kai Nacke has tackled this problem recently, along with a number of other LLVM issues blocking development of the Visual Studio-based Win64 port of LDC. A patch fixing the bulk of the bugs in the exception handling implementation is currently under review on the LLVM development mailing list, and Kai has <a href="http://forum.dlang.org/post/vscpokspiejlckivqsuq@forum.dlang.org">prepared a binary preview version of LDC</a> with all the latest patches. For more information, you can also visit the <a href="http://wiki.dlang.org/Building_and_hacking_LDC_on_Windows_using_MSVC">Building and hacking LDC on Windows using MSVC</a> page on the LDC wiki.</p>
<p>The rest of this post will discuss the situation specifically on Win32/MinGW. Here, the root problem is that Structured Exception Handling (<em>SEH</em>), the default exception handling mechanism on 32-bit Windows, is covered by a <a href="http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO1&Sect2=HITOFF&d=PALL&p=1&u=%2Fnetahtml%2FPTO%2Fsrchnum.htm&r=1&f=G&l=50&s1=5,628,016.PN.&OS=PN/5,628,016&RS=PN/5,628,016">Borland-held patent</a>. It will not expire until next year, and while Borland seems to dismiss any related concerns, the GCC and LLVM projects have decided to not include an implementation of SEH in their compiler backends for fear of legal trouble.</p>
<p>Recently, however, support for DWARF 2-style exception handling appeared in GCC/MinGW. Here, the Windows-“native” SEH is forgone for the same table-based exception handling scheme that is also used on Linux. The downside of this approach is obviously that it doesn’t integrate with SEH exceptions raised by the OS or other C libraries. But while it is theoretically possible to catch those from D, this (DMD) feature isn’t really widely used, and as such virtually all D projects should be oblivious to the exception handling mechanism used under the hood.</p>
<h2 id="status-overview">Status Overview</h2>
<p>So, what can you expect from LDC on Win32/MinGW today? First, the good parts:</p>
<ul>
<li>
<p><em>Exception handling</em> works, and all the related test cases that also pass on the various Posixen also pass on Win32/MinGW. Why this qualification? Just like GDC, LDC unfortunately doesn’t implement all the fine details of D’s exception chaining mechanism on any platform yet.</p>
</li>
<li>
<p><em>Thread-local storage (TLS)</em> support is solid. Seeing this item on the list might surprise you, as TLS is central to each and every D2 application. However, it regularly turns out to be a pain point when porting D to new platforms, as it is typically not so important for other native languages. Thus, the related parts of the toolchains are typically less well tested, and LLVM on MinGW unfortunately was no exception here. At this point, however, my fixes to TLS support have arrived in the upstream versions of both mingw-w64 and LLVM, so no custom patches are required any longer (this is also the reason why LDC requires a very recent version of both).</p>
</li>
<li>
<p>The <em>DMD, druntime and Phobos</em> test suites mostly pass, and some smaller applications I tested build and work just fine. This notably includes most functionality associated with 80-bit <code>real</code>s (aka <code>long double</code>), which is notoriously problematic as the Microsoft Visual C/C++ runtime (<em>MSVCRT</em>) does not support this type of floating point numbers at all.</p>
</li>
<li>
<p>LDC is sufficiently ABI-compatible with DMD on 32-bit Windows that virtually all of the inline assembly code in druntime and Phobos works without changes. This only covers a surprisingly small part of the total ABI though, so even if DMD emitted COFF object files, it would still be a hopeless endeavor to try and link object files produced by the two compilers together, just as it is on the other operating systems.</p>
</li>
</ul>
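<p>As a minimal smoke test of the now-working exception handling – plain D, nothing platform-specific about it – a program like the following should print the caught message when built with LDC on Win32/MinGW; if the DWARF unwind tables were emitted incorrectly, the throw would instead abort the program:</p>

```d
import std.stdio;

void main()
{
    try
    {
        throw new Exception("unwinding works");
    }
    catch (Exception e)
    {
        // Only reached if stack unwinding and the EH tables work correctly.
        writeln("caught: ", e.msg);
    }
}
```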
<p>Now, for the less pleasant points:</p>
<ul>
<li>
<p>There are still a few issues related to floating-point math, particularly with complex 80-bit numbers. Single tests in <code>std.complex</code>, <code>std.math</code>, <code>std.mathspecial</code> and <code>std.internal.math.gammafunction</code> still fail, and <code>core.stdc.fenv</code> is not implemented properly yet. It seems likely that most of these problems are again caused by functions missing from MSVCRT or its MinGW replacements (one specific example is <code>fmodl</code>, which seems to cause interesting ABI issues).</p>
</li>
<li>
<p>The <code>core.sys.windows.dll</code> tests do not build, and while this would be easy to work around, DLL creation is entirely untested at this point.</p>
</li>
<li>
<p>While MinGW theoretically supports COM, the <code>std.windows.iunknown</code> tests do not link yet because of missing symbols. There is likely an easy fix, but interfacing with COM has not been tested at all.</p>
</li>
<li>
<p>There are also still two rather disconcerting test failures in <code>core.time</code> and <code>rt.util.container</code> which have not been tracked down yet.</p>
</li>
<li>
<p>LDC currently relies on using the MinGW <code>as</code> for emitting object files, as the LLVM integrated assembler does not correctly support writing the DWARF exception handling tables yet. This is suboptimal, as it causes several issues with non-ASCII characters in symbol names and generally has a negative effect on compiler performance. It currently also causes an issue with building the <code>std.algorithm</code> unit tests in debug mode, where the humongous symbol names (in the tens of kilo(!)bytes) overflow some <code>as</code>-internal data structures.</p>
</li>
<li>
<p>And most importantly, LDC/MinGW is still virtually untested on larger real-world applications. There will certainly be a number of bugs which have not been caught by any of the test suites.</p>
</li>
</ul>
<h2 id="getting-started">Getting Started</h2>
<p>So, how to try out LDC on Windows? The easiest thing would be to just download the latest binary (preview) release. For this, first grab a <em>very recent</em> MinGW-w64 snapshot, <a href="http://sourceforge.net/projects/mingw-w64/files/Toolchains%20targetting%20Win32/Personal%20Builds/rubenvb/gcc-4.8-dw2-release/i686-w64-mingw32-gcc-dw2-4.8.0-win32_rubenvb.7z/download">such as this one</a> (<em>rubenvb</em> personal build, <em>.7z</em>, ~27 MB) and extract it to an arbitrary location. It is important that you pick one built with Dwarf 2 exception handling enabled; when in doubt, just use the above one.</p>
<p>Then, download and extract the latest <a href="http://d32gngvpvl2pi1.cloudfront.net/ldc2-0.11.0-beta3-mingw-x86.7z">LDC binary release for MinGW</a> (<em>.7z</em>, ~8.5 MB). It is a “DMD-style” package that should work from any location without any extra installation steps. Before invoking LDC, you need to make sure that the MinGW <code>bin</code> directory is on your path, though. This is easiest to achieve by starting a shell using <code>mingw32env.cmd</code> in the MinGW root directory, or of course using a MSYS shell altogether.</p>
<p>If you prefer building LDC from source yourself, a guide on <a href="http://wiki.dlang.org/Building_LDC_on_MinGW_x86">building LDC on MinGW x86</a> is available on the wiki. Any help with LDC/MinGW development would be very much appreciated!</p>
Purity in D2012-05-27T00:00:00+01:00http://klickverbot.at/blog/2012/05/purity-in-d<p class="lead">Programming language design is a controversial topic, but in light of current challenges regarding both hardware trends and maintainability, several concepts originating in the <a href="http://en.wikipedia.org/wiki/Functional_programming">functional programming</a> world are being rediscovered as universally helpful. To that end, the <a href="http://dlang.org">D programming language</a> includes its own pragmatic take on the idea of <em>functional purity</em>. This article is an introduction to D’s <code>pure</code> keyword and its interaction with other language features.</p>
<p>Purity is a powerful tool for programmer and compiler alike to aid in reasoning about source code. But before we delve into the implications and use cases of the feature, first a short definition of the actual semantics of <code>pure</code> in D. If you are already familiar with the concept as implemented in other languages, please pretend you never heard of it for the moment. There will likely be a subtle difference in D’s interpretation, the quite profound consequences of which will be covered later.</p>
<p><code>pure</code> is a function attribute, and represents a contract between functions and their callers: The implementation of a pure function <em>does not access global mutable state</em>, where »global« refers to anything besides the function parameters (which must not reference data <code>shared</code> between threads), and »access« covers all reading or writing operations. A function not marked <code>pure</code> is called <em>impure</em>.</p>
<p>In a slightly less precise way, this means that pure functions always have the same effect and/or return the same result for a given set of arguments. As a consequence, a pure function for example cannot call other impure functions, or perform any kind of I/O (in the classical sense).</p>
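<p>To make this concrete, here is a minimal (made-up) example of what the compiler does and does not accept in a <code>pure</code> function:</p>

```d
import std.stdio;

int someGlobal;

int twice(int x) pure {
    // writeln(x);        // error: pure function cannot call impure 'writeln'
    // return someGlobal; // error: pure function cannot access mutable global state
    return 2 * x;         // fine: depends only on the parameter
}

void main() {
    assert(twice(21) == 42);
}
```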
<p>However, in order to make implementing non-trivial pure functions feasible, a few things are allowed in pure code that might be illegal under a very strict definition of what comprises state (feel free to skip this if you are only interested in the »big picture«):</p>
<ul>
<li>
<p><em>Aborting the program</em>: In a systems-level language like D, there will always be ways to terminate the program. As there is really no way around this, it is explicitly allowed in the specification.</p>
</li>
<li>
<p><em>Floating point calculations</em>: On x86 processors, the behavior of floating point calculations is influenced by a number of global flags (this probably applies to other ISAs which I am less familiar with as well). Thus, if a function contains even a single, perfectly innocent x87/SSE floating point expression like <code>x + y</code> or <code>cast(int)x</code>, its result, including exceptions being thrown, can vary greatly based on global state (i.e. the processor flags).<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup> Hence, under a strict definition of purity, all floating point calculations would be disallowed. As this would be an impractical restriction, in D pure functions are allowed to read and write floating point flags (note, however, that in general D functions which change the flags are required to reset them after control flow leaves the function).</p>
</li>
</ul>
<aside>
<p>In D, non-recoverable exceptions are derived from the <code>Error</code> class. While it is still possible to catch them in non-<code>@safe</code><sup class="footnote" id="fnr2"><a href="#fn2">2</a></sup> code, any invariants normally provided by the type system are not guaranteed to hold any longer at this point.</p>
</aside>
<ul><li><p><em>Allocating garbage-collected memory</em>: If maybe not on the first look, on the second thought it should be evident that the result of an operation allocating memory (think <code>malloc</code>) fundamentally depends on global state, namely the amount of free memory available to the system. An equally valid observation, though, is that being unable to use heap-allocated memory at all is a severe restriction for many operations. But it turns out that in D, if allocating GC memory using the <code>new</code> keyword fails, it does so with a non-recoverable <code>Error</code> anyway. Thus, pure functions can use <code>new</code> without violating the guarantees the type system provides. (<em>Note: </em>Strictly speaking, even using memory from the stack would be impure, because depending on the environment, the function might end up triggering a stack overflow.)</p></li></ul>
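<p>As a small illustration of the last point, the following (hypothetical) function allocates its result on the GC heap and is nevertheless accepted as <code>pure</code>:</p>

```d
ulong[] firstSquares(uint count) pure {
    auto result = new ulong[count];  // GC allocation is permitted in pure code
    foreach (i, ref e; result)
        e = cast(ulong)(i * i);
    return result;
}

void main() {
    assert(firstSquares(4) == [0, 1, 4, 9]);
}
```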
<h2 id="what-about-referential-transparency">What About Referential Transparency?</h2>
<p>One thing is ubiquitous in the functional programming world, but conspicuously absent from the above definition: The immutability of the function parameters. This is neither an oversight, nor has it been implied – pure functions in D can alter their arguments. For example, the following snippet is perfectly valid D code:</p>
<figure class="code"> <div class="highlight"><pre><span class="kt">int</span> <span class="n">readAndIncrement</span><span class="p">(</span><span class="k">ref</span> <span class="kt">int</span> <span class="n">x</span><span class="p">)</span> <span class="k">pure</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">x</span><span class="p">++;</span>
<span class="p">}</span>
</pre></div></figure>
<p>This might be surprising to some, as purity in programming language theory typically implies referential transparency, which means that a function invocation can be replaced with its result without changing the program semantics (implying absence of side effects). However, this is not automatically the case in D. For example, this piece of code</p>
<figure class="code"> <div class="highlight"><pre><span class="kt">int</span> <span class="n">val</span> <span class="p">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">result</span> <span class="p">=</span> <span class="n">readAndIncrement</span><span class="p">(</span><span class="n">val</span><span class="p">)</span> <span class="p">*</span> <span class="n">readAndIncrement</span><span class="p">(</span><span class="n">val</span><span class="p">);</span>
<span class="c1">// assert(val == 3 && result == 2);</span>
</pre></div></figure>
<p>clearly does not give the same result if <code>readAndIncrement</code> is only evaluated once instead:</p>
<figure class="code"> <div class="highlight"><pre><span class="kt">int</span> <span class="n">val</span> <span class="p">=</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">tmp</span> <span class="p">=</span> <span class="n">readAndIncrement</span><span class="p">(</span><span class="n">val</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">result</span> <span class="p">=</span> <span class="n">tmp</span> <span class="p">*</span> <span class="n">tmp</span><span class="p">;</span>
<span class="c1">// assert(val == 2 && result == 1);</span>
</pre></div></figure>
<p>As covered in the next section, this behavior is actually very desirable in an imperative language, but what to do if you actually want the stronger guarantees of the classical definition of purity, and all the nice properties it entails? Here, another aspect of the D type systems comes to the rescue: the option to transitively mark a view on data as <code>const</code> or the data to be completely <code>immutable</code><sup class="footnote" id="fnr3"><a href="#fn3">3</a></sup>. For a closer look at this, consider the following three function declarations:</p>
<figure class="code"> <div class="highlight"><pre><span class="kt">int</span> <span class="n">a</span><span class="p">(</span><span class="kt">int</span><span class="p">[]</span> <span class="n">val</span><span class="p">)</span> <span class="k">pure</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">b</span><span class="p">(</span><span class="k">const</span> <span class="kt">int</span><span class="p">[]</span> <span class="n">val</span><span class="p">)</span> <span class="k">pure</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">c</span><span class="p">(</span><span class="k">immutable</span> <span class="kt">int</span><span class="p">[]</span> <span class="n">val</span><span class="p">)</span> <span class="k">pure</span><span class="p">;</span>
</pre></div></figure>
<p>Regarding <code>a</code> with its mutable parameter, the same observation as for <code>readAndIncrement</code> applies (<code>int[]</code> is a dynamic array, i.e. a pointer/length pair referring to a slice of memory). In case of <code>b</code> and <code>c</code>, though, something nice happens: Because the functions are pure, we know that they cannot read/change any global state, and the parameters are not mutable either, so <code>b</code> and <code>c</code> are side-effect free in the usual sense of the word – calls to them are referentially transparent.</p>
<p>That being said, is there a difference between <code>b</code> and <code>c</code> at all? From a purity point of view, there is none – <code>const</code> and <code>immutable</code> impose exactly the same restrictions on what the function can do with its parameters (the latter additionally provides the guarantee that the data will indeed never change, but as no references to it can escape besides the return value due to <code>pure</code>, this is unlikely to matter in most cases).</p>
<p>However, there is a subtle but important difference affecting the <em>calling</em> code, depending on whether the actual <em>arguments</em> to a function call are merely <code>const</code>, which both mutable and immutable values are implicitly convertible to, or <code>immutable</code> (i.e. the following applies to both <code>c</code> and <code>b</code> if called with an <code>immutable</code> array).</p>
<p>For example, consider implementing a memoization or common subexpression elimination mechanism. When coming across a <code>pure</code> function with <code>immutable</code> parameters, only the identity of the arguments has to be checked in order to be able to optimize several calls down to one, e.g. by comparing the memory addresses in the case of a runtime implementation, or by a few very simple checks in an optimizing compiler. On the other hand, if an argument type contains indirections and is only <code>const</code>, somebody else could modify the data between two calls, requiring »deep« comparisons that might not be feasible for large data structures in the runtime case, or extensive data flow analysis in a compiler.</p>
<p>The same consideration applies to parallelization: If the arguments of a pure function have no or only <code>immutable</code> indirections, it is guaranteed that it is safe to parallelize, because it can cause no side effects which could lead to non-deterministic behavior, and there can be no data races in the parameters as well. However, for <code>const</code> arguments, this cannot as easily be inferred, because another piece of code with a mutable view on the arguments could end up modifying them at the same time.</p>
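<p>Phobos actually ships a runtime take on the memoization idea in <code>std.functional.memoize</code>, which caches results keyed on the argument values – safe precisely because the wrapped function is »strongly« pure. A quick sketch (the function names are of course made up):</p>

```d
import std.functional : memoize;

// Strongly pure: value parameter, no access to global mutable state.
ulong sumTo(uint n) pure {
    ulong s = 0;
    foreach (i; 1 .. n + 1)
        s += i;
    return s;
}

// Repeated calls with the same argument can be served from a cache.
alias memoize!sumTo fastSum;

void main() {
    assert(fastSum(100) == 5050);
    assert(fastSum(100) == 5050); // second call hits the cache
}
```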
<!-- Comment about explicit annotations, std.traits.ParameterTypeTuple. -->
<h2 id="indirections-in-the-return-type">Indirections in the Return Type?</h2>
<p>In the previous examples, the functions <code>a</code>, <code>b</code> and <code>c</code> differed in whether there were mutable indirections present in their arguments, but in all three cases, the return type was <code>int</code>, the archetypical example for a value type. Is there more to consider if a pure function returns a type containing references?</p>
<p>The first essential point concerns addresses, or more precisely the definition of equality applied when considering referential transparency. In functional languages, the actual memory address that some value resides at is usually of little to no importance. D being a system programming language, however, exposes this concept. Now, consider a function <code>ulong[] primes(uint count) pure</code>, which allocates an array and fills it with the first <code>count</code> prime numbers. Invoking <code>primes</code> multiple times with the same <code>count</code> will always return the same numbers, but the arrays containing the result will be allocated at different addresses. Thus, it is clear that when considering referential transparency of functions with indirections in the return value, logical equality (<code>==</code>) instead of bit-by-bit equality (<code>is</code>) is what matters.</p>
<p>The second thing important for referential transparency are mutable indirections in the return type. For example, consider the following snippet of code using the hypothetical <code>primes</code> function:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">auto</span> <span class="n">p</span> <span class="p">=</span> <span class="n">primes</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">q</span> <span class="p">=</span> <span class="n">primes</span><span class="p">(</span><span class="mi">42</span><span class="p">);</span>
<span class="n">p</span><span class="p">[]</span> <span class="p">*=</span> <span class="mi">2</span><span class="p">;</span>
</pre></div></figure>
<p>Obviously, rewriting the second invocation of <code>primes</code> to <code>auto q = p</code> is not valid, because then, <code>q</code> would refer to the same slice of memory, and thus also contain twice the primes after the multiplication is executed. Generally speaking, the invocation of a pure function with mutable indirections in its return type cannot immediately be considered referentially transparent, but a number of calls might still be optimized as if it were, depending on how the calling code uses the return values.</p>
<h2 id="weak-purity-allows-for-stronger-guarantees">›Weak‹ Purity Allows for Stronger Guarantees</h2>
<p>At this point, it should be mentioned that the initial design of the <code>pure</code> keyword in D featured a much stricter set of rules, and while the language specification only ever had a single notion of purity (as defined in the introduction), during discussion of the current more permissive design two terms were coined: <em>weakly pure</em>, referring to functions like <code>readAndIncrement</code> and <code>a</code> from the above examples which have mutable parameters, and <em>strongly pure</em> for side-effect free functions like <code>b</code> and <code>c</code>. Note, however, that there is no exact definition for these terms and their use frequently is the source of confusion in online discussions – to the point where Don Clugston, who introduced the names in his proposal for the improved design, has already asked for them not to be used any longer.</p>
<p>Still, the terms remain in use today, and the fact that this arbitrary distinction refuses to go away corroborates the observation that the amount of guarantees <code>pure</code> provides varies greatly depending on the parameter/return types. And, if maybe only for the reason that it is unfamiliar – the actual rules are very simple –, the implications of the current design are sometimes poorly understood. So, what is the motivation behind allowing pure functions to modify their arguments in the first place?</p>
<p>The real power behind the D purity design is that relaxing the rules actually allows <em>more functions to be »strongly« pure</em>. To illustrate this, allow me to quote a recent <a href="http://altdevblogaday.com">#AltDevBlogADay</a> article by John Carmack (of <em>id Software</em> fame) titled »Functional Programming in C++«, a refreshingly pragmatic look at the benefits of applying some functional principles to C++ code:</p>
<blockquote>
<p>Programming with pure functions will involve more copying of data, and in some cases this clearly makes it the incorrect implementation strategy due to performance considerations. As an extreme example, you can write a pure <code>DrawTriangle()</code> function that takes a framebuffer as a parameter and returns a completely new framebuffer with the triangle drawn into it as a result. Don’t do that. — <a href="http://www.altdevblogaday.com/2012/04/26/functional-programming-in-c/">altdevblogaday.com/…/functional-programming-in-c</a></p>
</blockquote>
<p>There is nothing wrong with this statement, copying the frame buffer every time you draw a triangle is certainly not a good idea. But it turns out that in D, you can actually implement a <code>pure</code> triangle drawing function without committing performance suicide! Its signature might look something like this:<sup class="footnote" id="fnr4"><a href="#fn4">4</a></sup></p>
<figure class="code"> <div class="highlight"><pre><span class="k">alias</span> <span class="kt">ubyte</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="n">Color</span><span class="p">;</span>
<span class="k">struct</span> <span class="n">Vertex</span> <span class="p">{</span> <span class="kt">float</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="n">position</span><span class="p">;</span> <span class="cm">/* … */</span> <span class="p">}</span>
<span class="k">alias</span> <span class="n">Vertex</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="n">Triangle</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">drawTriangle</span><span class="p">(</span><span class="n">Color</span><span class="p">[]</span> <span class="n">framebuffer</span><span class="p">,</span> <span class="k">const</span> <span class="k">ref</span> <span class="n">Triangle</span> <span class="n">tri</span><span class="p">)</span> <span class="k">pure</span><span class="p">;</span>
</pre></div></figure>
<p>This is nice in and for itself: as remarked in the above quote, <code>drawTriangle</code> cannot realistically be referentially transparent since it needs to write to the frame buffer, but <code>pure</code> still guarantees that it does not mess around with any hidden/global state. However, there is more: Being pure, the function can now be called from other pure functions. Continuing the toy example, if allocating a new buffer every frame was an option, this could be a function to render a whole scene consisting of triangles:</p>
<figure class="code"> <div class="highlight"><pre><span class="n">Color</span><span class="p">[]</span> <span class="n">renderScene</span><span class="p">(</span>
<span class="k">const</span> <span class="n">Triangle</span><span class="p">[]</span> <span class="n">triangles</span><span class="p">,</span>
<span class="kt">ushort</span> <span class="n">width</span> <span class="p">=</span> <span class="mi">640</span><span class="p">,</span>
<span class="kt">ushort</span> <span class="n">height</span> <span class="p">=</span> <span class="mi">480</span>
<span class="p">)</span> <span class="k">pure</span> <span class="p">{</span>
<span class="k">auto</span> <span class="n">image</span> <span class="p">=</span> <span class="k">new</span> <span class="n">Color</span><span class="p">[</span><span class="n">width</span> <span class="p">*</span> <span class="n">height</span><span class="p">];</span>
<span class="k">foreach</span> <span class="p">(</span><span class="k">ref</span> <span class="n">triangle</span><span class="p">;</span> <span class="n">triangles</span><span class="p">)</span> <span class="p">{</span>
<span class="n">drawTriangle</span><span class="p">(</span><span class="n">image</span><span class="p">,</span> <span class="n">triangle</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">image</span><span class="p">;</span>
<span class="p">}</span>
</pre></div></figure>
<p>Note how the arguments of <code>renderScene</code> lack any mutable indirections – while it internally calls the argument-mutating <code>drawTriangle</code>, <code>renderScene</code> as a whole is referentially transparent!</p>
<p>Now, granted, this example might be a bit contrived, but with D unwilling to give up the bare-metal performance of imperative code, similar situations quite frequently occur in real-life code (e.g. when using any kind of mutable container in the implementation of a pure function). This is also backed by experience with the aforementioned first iteration of the purity design – relaxing the purity rules had the, at first sight, slightly paradoxical effect of enabling the <em>same strong guarantees</em> as before to be provided for a greatly <em>increased amount of code</em>.</p>
<p>A related observation is that most modern style guides discourage use of global state anyway, and thus, it should be possible to mark most D functions not dealing with I/O as pure. This is indeed true – so why not make <code>pure</code> the default and require functions to be explicitly marked as, say, <code>impure</code> instead? Regarding D version 2, the reason why this has not been done is simply that purity in its current form was only added at a relatively late point in the evolution of the language, where the impact of such a breaking change was simply considered to be too high. Nevertheless, this is certainly a promising direction to explore for future languages and a (hypothetical) next major release of D.</p>
<h2 id="templates-and-purity">Templates and Purity</h2>
<p>Up to this point, the focus was on the design of <code>pure</code> more or less in isolation. In the following sections, the main topic will be its interaction with other language features, with the first one being templates, or more specifically function templates.</p>
<p>Once instantiated with their type parameters, function templates are just normal functions, so purity should just work as previously described for them as well. This is indeed the case, but there is additional complexity because whether a function template can be pure or not might actually depend on the types it is instantiated with.</p>
<p>For an example of this, suppose you want to write a function <code>array</code> which accepts a range<sup class="footnote" id="fnr5"><a href="#fn5">5</a></sup> and returns an array containing all of its elements (this function already exists in <code>std.array</code> with a much better implementation). A first take on the problem could look somewhat like this:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">auto</span> <span class="n">array</span><span class="p">(</span><span class="n">R</span><span class="p">)(</span><span class="n">R</span> <span class="n">r</span><span class="p">)</span> <span class="k">if</span> <span class="p">(</span><span class="n">isInputRange</span><span class="p">!</span><span class="n">R</span><span class="p">)</span> <span class="p">{</span>
<span class="n">ElementType</span><span class="p">!</span><span class="n">R</span><span class="p">[]</span> <span class="n">result</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(!</span><span class="n">r</span><span class="p">.</span><span class="n">empty</span><span class="p">)</span> <span class="p">{</span>
<span class="n">result</span> <span class="p">~=</span> <span class="n">r</span><span class="p">.</span><span class="n">front</span><span class="p">;</span>
<span class="n">r</span><span class="p">.</span><span class="n">popFront</span><span class="p">();</span>
<span class="p">}</span>
<span class="k">return</span> <span class="n">result</span><span class="p">;</span>
<span class="p">}</span>
</pre></div><figcaption><span>A simple, inefficient reimplementation of <code>std.array.array</code>, which converts a range of elements into a built-in array. Could <code>pure</code> be added to this function? </span></figcaption>
</figure>
<p>It is not hard to guess what this is doing – one by one, the front element of the range is popped off and appended to the result array until there are no more elements left. But the question is now: Can this function be made <code>pure</code>? If <code>R</code> is something like the result of a <code>map</code> or <code>filter</code> operation on an array, there is no reason why it should not be callable from pure code. However, if <code>R</code> for example encapsulates a line being read from standard input, there is no way <code>r.empty</code>, <code>r.front</code> and <code>r.popFront()</code> can all be <code>pure</code>. Thus, if <code>array</code> were marked <code>pure</code>, it could not operate on such ranges anymore, even if it would otherwise be perfectly able to. So, what to do?</p>
<p>One way of approaching this problem would be to introduce syntax sugar for only conditionally applying attributes to a declaration based on some predicate (which would here depend on <code>R</code>). However, this was rejected due to the complexity and repetition it would introduce to code that really should be easy to write. The solution which was finally implemented is quite simple: Since D takes a »white-box« approach to templates anyway, meaning that in order to instantiate a template its source must be available, purity is automatically inferred by the compiler for them (along with a few similar attributes like <code>nothrow</code>).</p>
<p>For the above example, this means that <code>array</code> will be callable from pure functions if the concrete range type allows it, and simply be impure otherwise. Also note that explicitly specifying <code>pure</code> for template functions is still possible, and can be beneficial for documentation purposes if purity does not depend on the template arguments.</p>
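<p>To see the inference in action, consider a self-contained toy range (all names hypothetical) whose primitives happen to be pure:</p>

```d
// A pure input range yielding n, n - 1, …, 1.
struct Counter {
    int n;
    @property bool empty() const pure { return n == 0; }
    @property int front() const pure { return n; }
    void popFront() pure { --n; }
}

// A stripped-down 'array': purity is inferred per instantiation.
auto collect(R)(R r) {
    typeof(r.front)[] result;
    while (!r.empty) {
        result ~= r.front;
        r.popFront();
    }
    return result;
}

int[] countdown(int n) pure {
    // Compiles because collect!Counter is inferred pure; with a range
    // that e.g. read from standard input, this call would be rejected.
    return collect(Counter(n));
}

void main() {
    assert(countdown(3) == [3, 2, 1]);
}
```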
<h2 id="pure-member-functions">Pure Member Functions</h2>
<p>Unsurprisingly, struct and class member functions can be <code>pure</code> as well, and exactly the same rules as for free functions apply to them – with a single addition, or rather clarification: The implicit <code>this</code> parameter is also considered a function parameter for purity semantics, which is a fancy way of saying that pure functions may access and modify member variables.</p>
<figure class="code"> <div class="highlight"><pre><span class="k">class</span> <span class="n">Foo</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">getBar</span><span class="p">()</span> <span class="k">const</span> <span class="k">pure</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">setBar</span><span class="p">(</span><span class="kt">int</span> <span class="n">bar</span><span class="p">)</span> <span class="k">pure</span> <span class="p">{</span>
<span class="k">this</span><span class="p">.</span><span class="n">bar</span> <span class="p">=</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
<span class="k">private</span> <span class="kt">int</span> <span class="n">bar</span><span class="p">;</span>
<span class="p">}</span>
</pre></div><figcaption><span>Pure functions are allowed to access member variables (note: typically properties would be used in place of getters/setters in D). </span></figcaption>
</figure>
<p>Also note that marking a member function <code>const</code> or <code>immutable</code> is semantically equivalent to applying the attribute to its implicit <code>this</code> parameter; i.e. the above considerations regarding mutability also apply unchanged.</p>
<p>As far as class inheritance is concerned, purity behaves just as one would expect: Generally, a member function in a subclass may require less assumptions while possibly providing more guarantees than its base class equivalent (see e.g. return type covariance). Thus, a pure function might override an impure function, but not the other way round. Actually, for convenience a function overriding a pure base class method is implicitly marked <code>pure</code> (similar to <code>virtual</code> in C++); Walter Bright recently wrote <a href="http://www.drdobbs.com/blogs/cpp/232601305">a blog post</a> about this.</p>
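<p>A short (made-up) example of the inheritance rule:</p>

```d
class Shape {
    double area() const pure { return 0.0; }
}

class Square : Shape {
    double side;
    this(double s) pure { side = s; }
    // No explicit 'pure' needed – the override inherits it from Shape.area.
    // An implementation touching global mutable state would be rejected.
    override double area() const { return side * side; }
}

void main() {
    Shape s = new Square(3.0);
    assert(s.area() == 9.0);
}
```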
<h2 id="pure-and-immutable--again"><code>pure</code> and <code>immutable</code> – again?</h2>
<p>The effects of <code>const</code> and <code>immutable</code> on referential transparency have already been discussed at length. However, the guarantees of <code>pure</code> in some cases also allow additional conclusions to be drawn. A prominent case of this, because it is integrated with the type system, is that the return value of pure functions can in some cases be safely cast to <code>immutable</code>. For example, consider the function <code>ulong[] primes(uint n) pure</code> from above. At first, it is not obvious why the following code should compile:</p>
<figure class="code"> <div class="highlight"><pre><span class="k">immutable</span> <span class="kt">ulong</span><span class="p">[]</span> <span class="n">p</span> <span class="p">=</span> <span class="n">primes</span><span class="p">(</span><span class="mi">5</span><span class="p">);</span>
</pre></div></figure>
<p>After all, <code>immutable</code> is a guarantee that there are no mutable references to the data in question at all, but <code>primes</code> clearly returns an array of mutable values. Still, the above code compiles fine, so what is going on here? The reason why it is indeed safe to assume that no other mutable references exist to the return value of <code>primes</code> is of course the fact that it is pure: It does not take any arguments with mutable indirections, nor can it read any global mutable state, so even though the slice returned refers to mutable data, the caller can be sure that nobody else could potentially modify the data.</p>
<p>This seems to be a fairly minor detail, but it turns out to be surprisingly useful in practice, as it allows functions to be seamlessly used in a »functional-style« immutable data context, while at the same time not requiring unnecessary copies in more »traditional« pieces of code, where data might need to be mutated in-place for performance reasons.</p>
<h2 id="fine-but-where-is-the-escape-hatch">Fine, but where is the Escape Hatch?</h2>
<p>It lies in the very nature of purity that it is viral, in the sense that when writing a pure function, all code its implementation depends on must be pure as well. D’s purity rules make this compositional aspect of purity very natural, but still, sometimes the need arises to call a function that is nominally impure from <code>pure</code> code.</p>
<p>One such situation is dealing with legacy code, for example calling a function from an external C library which meets all the criteria to be pure, but has not been marked so in the header files. Such situations are handled the same way as all other cases where the type system cannot prove a statement about code: by using a <code>cast</code>. More specifically, by getting a pointer to the function, adding the <code>pure</code> attribute by casting, and then calling it as usual (like any other operation which potentially subverts the type system, this is forbidden in <code>@safe</code> code). If a piece of code has to deal with lots of such »dirty« calls, introducing a short <code>assumePure</code> template which nicely encapsulates the casts might be worthwhile.</p>
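<p>As a sketch of what such a helper might look like (the <code>rand</code> example is merely illustrative; attribute inference makes the template itself callable from <code>pure</code> code):</p>
<figure class="code"><pre><code>import std.traits;

// Illustrative helper: wrap an impure function pointer (or delegate) so
// that it can be called from pure code. The cast subverts the type
// system, so this must not be used in @safe code.
auto assumePure(T)(T t)
    if (isFunctionPointer!T || isDelegate!T)
{
    enum attrs = functionAttributes!T | FunctionAttribute.pure_;
    return cast(SetFunctionAttributes!(T, functionLinkage!T, attrs)) t;
}

// Example: a nominally impure C library function…
extern (C) int rand();

// …called from a pure function via the casted pointer.
int diceRoll() pure
{
    return assumePure(&amp;rand)() % 6 + 1;
}
</code></pre></figure>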
<p>And then there is this other thing where purity might momentarily be a hindrance: inserting (impure) debug code into functions, for example to log some values or to take simple call statistics by bumping a global variable every time a function is invoked. Inserting such an impure statement into the innermost of a chain of pure functions would be a major annoyance, and while this style of debugging might be scoffed at by language purists, it is sometimes quite useful in practice.</p>
<p>Initially, D did not include any special provision for this use case, but a way to »temporarily disable purity« for debugging purposes was much requested. As a result, a special case was eventually added to the rules, allowing impure code in pure functions if it is inside a <code>debug</code> conditional. This solution is easy to use, and while not perfectly clean, it is still acceptable from an aesthetic point of view, since such code has to be explicitly enabled via a command line switch (it is <em>not</em> included in normal non-<code>release</code> builds).</p>
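<p>A minimal sketch of this escape hatch in action (names made up for illustration):</p>
<figure class="code"><pre><code>import std.stdio;

int callCount; // global mutable state, normally off-limits to pure code

int square(int x) pure
{
    // Impure statements are permitted inside a debug conditional; this
    // code is only compiled in when -debug is passed to the compiler.
    debug
    {
        ++callCount;
        writeln("square(", x, ")");
    }
    return x * x;
}
</code></pre></figure>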
<h2 id="conclusion">Conclusion</h2>
<p>To reiterate the statement from the beginning, the importance of the concept of purity lies within the fact that it allows the type system to assert that a particular function call will not depend on or modify hidden state. We have seen that the <code>pure</code> keyword in D imposes fewer restrictions than in many other languages, yet the same guarantees can still be given thanks to the interesting properties of transitive const-ness and immutability; this enables very natural interaction with other language features and, perhaps most importantly, with imperative-style code.</p>
<p>Where to go for further information? The actual specification for <code>pure</code> is not very long, see the <a href="http://dlang.org/function.html">Functions</a> chapter of the language reference at <a href="http://dlang.org">dlang.org</a>. For background information about the evolution of the current design, the <a href="http://forum.dlang.org/thread/i7bp8o$6po$1@digitalmars.com">discussion started by Don Clugston</a> which led to the last big change is certainly an interesting read – the <a href="http://forum.dlang.org/group/digitalmars.D">D programming language forums</a> might also be a good place to ask specific questions about design and implementation of the concepts described here.</p>
<footer>
<p>Like what you read? <a href="http://twitter.com/?status=@klickverbot:">Let me know</a> what you think, <a href="http://twitter.com/?status=Just%20read%20»Purity%20in%20D«%20by%20@klickverbot:%20http://klickverbot.at/blog/2012/05/purity-in-d">share the article</a> on Twitter or join the discussions on <a href="http://news.ycombinator.com/item?id=4032248">Hacker News</a> and <a href="http://www.reddit.com/r/programming/comments/u84fc/purity_in_d/">Reddit</a>. Also, there is more on <a href="/blog/tags/D/" title="View all posts tagged with »D«" rel="tag">D</a>.</p>
</footer>
<p class="footnote" id="fn1" style="margin-top: 2.88em"><a href="#fnr1"><sup>1</sup></a>The consequences of this can be a lot more serious and confusing than one might think: Historically, several printer drivers on Windows modified the FPU flags when issuing a print job without changing them back afterwards. This caused quite a few programs to crash after a document was printed – the perfect case of a hard-to-debug crash occurring only on customer machines…</p>
<p class="footnote" id="fn2"><a href="#fnr2"><sup>2</sup></a>D code can be restricted to a memory safe language subset, sometimes referred to as <em>SafeD</em>. The feature can be activated on a per-function basis by applying one of three attributes: Code marked as <code>@safe</code> is guaranteed to be memory safe, and thus e.g. cannot do pointer arithmetic or use C-style memory management. <code>@system</code> code is the opposite – here the full language, including inline assembly and unsafe pointer casts is allowed. Finally, <code>@trusted</code> acts a bridge between both worlds, it contains hand-vetted interfaces to unsafe code. A typical example for the latter would be a type-safe D wrapper around a C <code>void*</code>-style API.</p>
<p class="footnote" id="fn3"><a href="#fnr3"><sup>3</sup></a>In D, as notably opposed to C++, <code>const</code> and <code>immutable</code> are <em>transitive</em>. In case of <code>const</code>, this means that everything reachable through a <code>const</code> reference automatically becomes <code>const</code> as well. For example, given <code>struct Foo { int bar; int* baz; }; void fun(const Foo* foo);</code>, in C++ <code>fun</code> is not allowed to modify <code>foo</code> itself, e.g. to set <code>foo->bar</code> to a different value, but can legally modify the value <code>foo->baz</code> points to – this is also called <em>shallow const</em>. In contrast, D features <em>deep const</em>, which means that in <code>fun</code>, <code>foo.baz</code> automatically becomes a <code>const</code> pointer to a <code>const</code> <code>int</code>, disallowing modifications to <code>*foo.baz</code> as well. The same rules apply to <code>immutable</code>, except that it additionally guarantees that no mutable view on the data exists at all, i.e. that not only <code>fun</code> does not modify its parameter, but nobody else ever does so (you could imagine <code>immutable</code> values to be stored in some kind of ROM). <code>immutable</code> implies <code>const</code>.</p>
<p class="footnote" id="fn4"><a href="#fnr4"><sup>4</sup></a>This example was picked for its illustrative qualities, but admittedly would probably only work like this for a simple software rasterizer. Besides the question of whether purity is much of a benefit here, if an actual graphics API was used to implement it, extra thought would have to be put into how to handle the GPU state in a pure manner.</p>
<p class="footnote" id="fn5"><a href="#fnr5"><sup>5</sup></a> Just as C++ iterators are a generalization of pointers, D ranges generalize the notion of an array or a slice of data. In its most basic form, a range offers three primitives, <code>empty</code>, <code>front</code> and <code>popFront</code>. This interface is completely oblivious to how the underlying data is stored – it could come from a chunk of memory as well as from a network transport or the standard input –, and provides an easy to use, yet powerful abstraction for algorithms to work on.</p>
Thrift now officially supports D!2012-03-27T00:00:00+01:00http://klickverbot.at/blog/2012/03/thrift-now-officially-supports-d<p class="lead"><a href="http://thrift.apache.org">Thrift</a> is a cross-language serialization and RPC framework, originally developed for internal use at Facebook, and now an <a href="http://apache.org">Apache Software Foundation</a> project. I started working on support for the <a href="http://dlang.org">D programming language</a> during <a href="http://www.klickverbot.at/code/gsoc/thrift/">Google Summer of Code 2011</a>, and at the end of last week, the implementation was finally incorporated into the main project.</p>
<p>First, let me thank Jake Farrell and everybody else on the Thrift team who was involved in <a href="https://issues.apache.org/jira/browse/THRIFT-1500">THRIFT-1500</a>; reviewing a ~719 kB patch certainly isn’t an easy thing to do. But now that the work is in, what can you (as a Thrift user) expect from the implementation?</p>
<p>Feature-wise, the library should roughly be up to par with the other major implementations (i.e. C++ and Java):</p>
<ul>
<li><p><em>Protocols:</em> Binary, Compact and JSON. The Dense protocol has not been implemented yet – it is only supported by the C++ implementation and I am not sure about its relevance nowadays (but if you are at a certain well-known company and it turns out that you still need the feature for new projects, let me know; adding support for it should not be hard).</p></li>
<li><p><em>Transports:</em> Socket, SSL, HTTP and log file reader/writer implementations (plus your familiar helpers, i.e. buffered/framed/memory-buffer/piped/zlib...)</p></li>
<li><p><em>Servers:</em> several single- and multithreaded variants (including a libevent-based non-blocking implementation)</p></li>
<li><p><em>Clients:</em> Both a synchronous and an asynchronous (future-based interface with one or more libevent-backed worker threads) implementation are provided. Additionally, several pooling implementations for redundancy as well as aggregation use cases are available.</p></li>
</ul>
<p>The implementation makes heavy use of D’s metaprogramming capabilities and is also able to work without code generated off-line from <code>.thrift</code> files, if so desired. There are also a few experimental gimmicks, such as the ability to generate Thrift IDL files from existing D types at compile time. Soon to come:</p>
<ul>
<li><p><em>Unix domain sockets: </em> Currently, the D implementation only supports IPv4 and IPv6 TCP sockets, because that is what the D standard library does, but starting with the next release, it will also support Unix domain sockets (if really needed, the lack of support in <code>std.socket</code> could be worked around without much effort, though).</p></li>
<li><p><code>@safe</code><em>-ty annotations: </em> The D language features built-in memory safety annotations. The majority of the methods in the D Thrift library should be memory safe (except for e.g. <code>TTransport.borrow</code>), so marking them as such will allow Thrift to be used in D programs where safety is enforced, without requiring the user to mark the Thrift calls as <code>@trusted</code>.</p></li>
</ul>
<p>So, how to get started? As said above, the source code has been merged from my <a href="https://github.com/dnadlinger/thrift">personal GitHub repo</a> to the <code>trunk</code> of the <a href="http://thrift.apache.org/developers/">main ASF repo</a>, and as soon as the currently ongoing rework of the official Thrift site is completed, the <a href="https://github.com/dnadlinger/thrift/wiki/Getting-Started-with-Thrift-and-D">Getting Started with Thrift and D</a> and <a href="https://github.com/dnadlinger/thrift/wiki/Building-Thrift-D-on-Windows">Building Thrift/D on Windows</a> pages will follow along. A recent build of the <a href="http://www.klickverbot.at/code/gsoc/thrift/docs/">API docs</a> is currently available here on my website. If you find any bugs, be sure to file them at the <a href="https://issues.apache.org/jira/browse/THRIFT">Thrift JIRA</a>.</p>
getaddrinfo cross-platform edge case behavior2012-01-31T00:00:00+00:00http://klickverbot.at/blog/2012/01/getaddrinfo-edge-case-behavior-on-windows-linux-and-osx<p>An often-needed piece of functionality in network programming is to resolve human-readable host or port names to their numerical equivalent, for example in order to pass the latter to operating system socket APIs. The <code>getaddrinfo</code> function fills this role on POSIX and Windows. Apart from some flags, it accepts two string parameters for host and service (port) names and returns a list of corresponding IP addresses and port numbers, superseding the older <code>gethostbyname</code> and <code>getservbyname</code> functions.</p>
<p>Either of its string parameters is allowed to be <code>null</code>, representing the local host/all interfaces (depending on whether <code>AI_PASSIVE</code> is specified) and an automatically assigned port, respectively. Both parameters being <code>null</code> at the same time, however, is disallowed by the specification, and leads to an <code>EAI_NONAME</code> error on POSIX or <code>WSAHOST_NOT_FOUND</code> on Windows. What happens if the strings are empty (<code>""</code>) instead of <code>null</code> is left open by RFC 2553, and not really mentioned in the operating system API documentation either.</p>
<p>It turns out that there are quite a few differences between the various operating systems here, which is bound to cause issues for <a href="http://winehq.org">Wine</a> (an implementation of the Windows API on POSIX/X systems). To get a clear understanding of how the different cases are handled, I put together a little <a href="http://dlang.org">D</a> program which tests a few combinations of host name, port, and flag parameters (see end of post). The snippet could be written in C just the same, as <code>getAddressInfo</code> directly maps to <code>getaddrinfo</code>; I chose D simply to avoid platform dependencies and an unduly large amount of boilerplate code.</p>
<p>The results are summarized in the following table, where »loopback« means that the IP addresses returned were <code>127.0.0.1</code> and <code>::1</code>, »catchall« refers to <code>0.0.0.0</code> and <code>::</code>, »public« means that the actual IP addresses of all available network interfaces were returned, and <code>NONAME</code> refers to a lookup error. »hostname« means that the actual fully qualified name of the host that ran the test was used (note that the host part of the FQDN alone usually does <em>not</em> resolve on OS X).</p>
<figure>
<table>
<thead>
<tr>
<th>Host</th>
<th>Port</th>
<th>Flags</th>
<th>Windows</th>
<th>Linux</th>
<th>OS X</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td><code>null</code></td>
<td><code>null</code></td>
<td>-</td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr>
<td> </td>
<td><code>""</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback</td>
<td><code>NONAME</code></td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>catchall</td>
<td>catchall</td>
<td><code>NONAME</code></td>
</tr>
<tr class="odd">
<td> </td>
<td><code>"0"</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback</td>
<td>loopback</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>catchall</td>
<td>catchall</td>
<td>catchall</td>
</tr>
<tr>
<td> </td>
<td><code>"80"</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback</td>
<td>loopback</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>catchall</td>
<td>catchall</td>
<td>catchall</td>
</tr>
<tr class="odd">
<td><code>""</code></td>
<td><code>null</code></td>
<td>-</td>
<td>public</td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr>
<td> </td>
<td><code>""</code></td>
<td>-</td>
<td>public</td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td><code>NONAME</code></td>
<td><code>NONAME</code></td>
</tr>
<tr class="odd">
<td> </td>
<td><code>"0"</code></td>
<td>-</td>
<td>public</td>
<td><code>NONAME</code></td>
<td>loopback</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td><code>NONAME</code></td>
<td>catchall</td>
</tr>
<tr>
<td> </td>
<td><code>"80"</code></td>
<td>-</td>
<td>public</td>
<td><code>NONAME</code></td>
<td>loopback</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td><code>NONAME</code></td>
<td>catchall</td>
</tr>
<tr class="odd">
<td><code>"localhost"</code></td>
<td><code>null</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr>
<td> </td>
<td><code>""</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr class="odd">
<td> </td>
<td><code>"0"</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr>
<td> </td>
<td><code>"80"</code></td>
<td>-</td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>loopback</td>
<td>loopback (v4)</td>
<td>loopback</td>
</tr>
<tr class="odd">
<td>hostname</td>
<td><code>null</code></td>
<td>-</td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr>
<td> </td>
<td><code>""</code></td>
<td>-</td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr class="odd">
<td> </td>
<td><code>"0"</code></td>
<td>-</td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr class="odd">
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr>
<td> </td>
<td><code>"80"</code></td>
<td>-</td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
<tr>
<td colspan="2"> </td>
<td><code>AI_PASSIVE</code></td>
<td>public</td>
<td>loopback (v4)</td>
<td>public</td>
</tr>
</tbody>
</table>
<figcaption><code>getaddrinfo()</code> behavior on Windows Server 2008 R2, Arch Linux (Kernel 3.1.4, glibc 2.14.1), and OS X 10.7.2 (Lion).</figcaption>
</figure>
<p>What caused me to investigate the issue in the first place is the behavior when given an empty, non-null host string: Windows returns the public addresses of the present interfaces, OS X resolves them to the loopback/catchall addresses, but only if a port is given, and Linux doesn’t resolve them at all! Windows is generally the most permissive, returning an error only for the explicitly disallowed combination, which is relied on by some applications (e.g. the game <em>League of Legends</em>).</p>
<p>There were also some less significant differences in behavior which are mostly not listed in the table. First, in both of the Linux VMs I tried (an up-to-date Arch box and Ubuntu Oneiric), only the IPv4 address of the loopback interface was returned. Second, as no address family, socket type or protocol hints were passed to <code>getaddrinfo()</code> in the test, each address was returned twice on OS X, once with <code>SOCK_STREAM</code>/<code>IPPROTO_TCP</code> and once with <code>SOCK_DGRAM</code>/<code>IPPROTO_UDP</code> set. Linux returned three copies of each address, for <code>STREAM</code>, <code>DGRAM</code> and <code>RAW</code>, with the according protocol types set, whereas Windows only returned a single copy with protocol type <code>IPPROTO_IP</code> and socket type set to 0.</p>
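<p>As an aside, passing hints restricts the lookup to a single entry per address and thus avoids these per-platform duplicates; a minimal sketch using the same <code>std.socket</code> API as the test program at the end of the post:</p>
<figure class="code"><pre><code>import std.socket, std.stdio;

void main()
{
    // Restricting the lookup to TCP stream sockets yields one entry per
    // address instead of per-platform STREAM/DGRAM/RAW duplicates.
    foreach (r; getAddressInfo("localhost", "80",
        SocketType.STREAM, ProtocolType.TCP))
    {
        writeln(r.address, " (", r.protocol, ")");
    }
}
</code></pre></figure>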
<p>In any case, as a result I have prepared a patch for Wine to emulate at least the succeeding/failing behavior of the Winsock incarnation of <code>getaddrinfo</code> on Linux and OS X, which should solve the bigger part of the related problems. There ideally shouldn’t be any Windows software relying on details beyond that (such as the actual number/layout of addresses returned), but who knows…</p>
<figure class="code"> <div class="highlight"><pre><span class="k">import</span> <span class="n">std</span><span class="p">.</span><span class="n">algorithm</span><span class="p">,</span> <span class="n">std</span><span class="p">.</span><span class="n">conv</span><span class="p">,</span> <span class="n">std</span><span class="p">.</span><span class="n">range</span><span class="p">,</span> <span class="n">std</span><span class="p">.</span><span class="n">socket</span><span class="p">,</span> <span class="n">std</span><span class="p">.</span><span class="n">stdio</span><span class="p">;</span>
<span class="k">alias</span> <span class="n">AIF</span> <span class="p">=</span> <span class="n">AddressInfoFlags</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">foreach</span> <span class="p">(</span><span class="n">host</span><span class="p">;</span> <span class="p">[</span><span class="kc">null</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="s">"localhost"</span><span class="p">,</span> <span class="n">Socket</span><span class="p">.</span><span class="n">hostName</span><span class="p">()])</span>
<span class="k">foreach</span> <span class="p">(</span><span class="n">port</span><span class="p">;</span> <span class="p">[</span><span class="kc">null</span><span class="p">,</span> <span class="s">""</span><span class="p">,</span> <span class="s">"0"</span><span class="p">,</span> <span class="s">"80"</span><span class="p">])</span>
<span class="k">foreach</span> <span class="p">(</span><span class="n">flags</span><span class="p">;</span> <span class="p">[</span><span class="k">cast</span><span class="p">(</span><span class="n">AIF</span><span class="p">)</span><span class="mi">0</span><span class="p">,</span> <span class="n">AIF</span><span class="p">.</span><span class="n">PASSIVE</span><span class="p">])</span> <span class="p">{</span>
<span class="n">write</span><span class="p">(</span>
<span class="n">host</span> <span class="p">?</span> <span class="s">"'"</span> <span class="p">~</span> <span class="n">host</span> <span class="p">~</span> <span class="s">"'"</span> <span class="p">:</span> <span class="s">"null"</span><span class="p">,</span> <span class="s">":"</span><span class="p">,</span>
<span class="n">port</span> <span class="p">?</span> <span class="s">"'"</span> <span class="p">~</span> <span class="n">port</span> <span class="p">~</span> <span class="s">"'"</span> <span class="p">:</span> <span class="s">"null"</span><span class="p">,</span> <span class="s">" ("</span><span class="p">,</span> <span class="n">flags</span><span class="p">,</span> <span class="s">"): "</span>
<span class="p">);</span>
<span class="k">try</span> <span class="p">{</span>
<span class="n">getAddressInfo</span><span class="p">(</span><span class="n">host</span><span class="p">,</span> <span class="n">port</span><span class="p">,</span> <span class="n">flags</span><span class="p">)</span>
<span class="p">.</span><span class="n">sort</span><span class="p">!((</span><span class="n">a</span><span class="p">,</span> <span class="n">b</span><span class="p">)</span> <span class="p">=></span> <span class="n">a</span><span class="p">.</span><span class="n">family</span> <span class="p"><</span> <span class="n">b</span><span class="p">.</span><span class="n">family</span><span class="p">)</span>
<span class="p">.</span><span class="n">map</span><span class="p">!(</span><span class="n">a</span> <span class="p">=></span> <span class="n">text</span><span class="p">(</span><span class="n">a</span><span class="p">.</span><span class="n">address</span><span class="p">,</span> <span class="s">" ("</span><span class="p">,</span> <span class="n">a</span><span class="p">.</span><span class="n">protocol</span><span class="p">,</span> <span class="s">")"</span><span class="p">))</span>
<span class="p">.</span><span class="n">join</span><span class="p">(</span><span class="s">", "</span><span class="p">)</span>
<span class="p">.</span><span class="n">writeln</span><span class="p">;</span>
<span class="p">}</span> <span class="k">catch</span> <span class="p">(</span><span class="n">Exception</span> <span class="n">e</span><span class="p">)</span> <span class="p">{</span>
<span class="n">writefln</span><span class="p">(</span><span class="s">"[%s]"</span><span class="p">,</span> <span class="n">e</span><span class="p">.</span><span class="n">msg</span><span class="p">);</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="p">}</span>
</pre></div><figcaption><span>D program used for gathering the data (longer than necessary for somewhat nicely formatted output). </span></figcaption>
</figure>
D/Thrift: Performance and other random things2011-08-01T00:00:00+01:00http://klickverbot.at/blog/2011/08/d-thrift-gsoc-performance-and-other-random-things<p>This week, I will try to keep the post short, while still informative – I already spent way too much time being unproductive due to hard-to-track-down bugs to be in the mood for writing up extensive ramblings. So, on to the meat of the recent changes (besides the usual little cleanup commits here and there):</p>
<ul>
<li>
<p><em>Async client design</em>: Yes, even though it took me quite some time to come up with the original one, I had completely missed the fact that it would be unreasonably difficult to extend the support code with resource types other than sockets – long story short, <code>TAsyncSocketManager</code> now inherits from <code>TAsyncManager</code>, instead of being a part of it. Also, I split <code>TFuture</code> into two parts, a <code>TFuture</code> interface for accessing the result, and a <code>TPromise</code> implementation for actually setting/storing it, and only the <code>TFuture</code> part is returned from the async client methods. The <a href="/code/gsoc/thrift/docs/thrift.async.base.html">thrift.async docs</a> are actually useful now.</p>
</li>
<li>
<p><em>Async socket timeouts</em>: Correctly handling the state of the connection after a <code>read</code>/<code>write</code> timeout turned out to be a surprisingly tough problem to solve (allowing other requests to be executed on the same connection after a timeout could lead to strange results). In the end, I settled for just closing the connection, which is a simple yet effective solution. To correctly implement this, I also had to finally kill the <code>TTransport.isOpen</code>-related contracts and replace them with exceptions in the right places, leading to modified/clarified <em><code>isOpen</code> semantics</em>.</p>
</li>
<li>
<p>The <em>non-blocking server</em> now handles one-way calls correctly, and modifying the task pool after it is running no longer leads to undefined results. In the process, I have also turned the static <code>event</code> struct allocations into dynamic ones, since this should have no measurable performance impact, but removes the dependence on the (unstable, per the <code>libevent</code> docs) struct layout.</p>
</li>
<li>
<p>D now also has a <code>TPipedTransport</code>, which forwards a copy of all data read/written to another transport, useful e.g. for logging requests/responses to disk.</p>
</li>
<li>
<p>The biggest chunk of time was actually spent on <em>performance investigations</em>: While I was pretty certain that the D serialization code should not perform any worse than its C++ counterpart already, the difference in speed merely being compiler-dependent, I wanted to prove this fact so that I could cross this item off the list. This involved updating <a href="http://dsource.org/projects/ldc">LDC</a> to the 2.054 frontend (only to discover that Alexey Prokhin decided to start work on it at the same time I did, the related commits in the <a href="https://bitbucket.org/lindquist/ldc">main repository</a> are his now), fixing some LDC-specific druntime bugs, etc<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup>. Unfortunately, I couldn’t test GDC because of <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6411">issue 6411</a>, but without further ado, here are the results:</p>
</li>
</ul>
<figure>
<table class="firstname">
<thead>
<tr>
<th> </th>
<th>Writing / kHz</th>
<th>Reading / kHz</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DMD v2.054, -O -release -inline</td>
<td>2 051</td>
<td>1 030</td>
</tr>
<tr>
<td>GCC 4.6.1 (C++), -O2, templates</td>
<td>5 667</td>
<td>1 050</td>
</tr>
<tr class="odd">
<td>LDC, -O3 -release</td>
<td>2 300</td>
<td>1 077</td>
</tr>
<tr>
<td>LDC, -output-ll / opt -O3</td>
<td>5 500</td>
<td>3 150</td>
</tr>
<tr class="odd">
<td>LDC, -output-ll / opt -std-compile-opts</td>
<td>6 700</td>
<td>1 950</td>
</tr>
</tbody>
</table>
</figure>
<p>At this point, I will disregard my earlier resolution and again get into the nitty-gritty details – the rest of this post can easily be summarized as <em>the D version is indeed up to par with C++, when it is equally well optimized</em>, but if you are curious about the details, read on.</p>
<p>If you read the performance figures from my last post, the first thing you will probably notice is that the C++ reading performance figure is about four times lower now. This isn’t a mistake; noting the comparatively slim advantage of the C++ version, I made a <a href="https://github.com/dnadlinger/thrift/commit/e7ab6c3b14b31c0241a1d37e674d3fefcbb53276">change</a> to it quite some time ago, which avoids allocating a new <code>TMemoryBuffer</code> instance on every loop iteration (the D version also reuses it). Without really considering the implications, though, I also moved the construction of the <code>OneOfEach</code> struct out of the loop. This seemed like a minor detail to me, but in fact, it enabled reuse of the <code>std::string</code>-internal buffers for the string members of the struct, which is unrealistic (e.g. for a pretty similar situation in the non-blocking server, there is no buffer reuse possible as well).</p>
<p>In a situation where a big part of the time is spent actually allocating and copying around memory, this makes a big difference. To test this assumption about the influence of memory allocations, I compiled a version of the D benchmark where a static buffer for the strings was used instead of reallocating them every time, and indeed, the reading performance was more than twice as high.</p>
<p>The <code>std::string</code> implementation of the GCC STL seems to be fairly inefficient in this case, because the best D result (which uses GC-allocated memory), is almost three times faster than it for the reading part. It is possible that there are some further optimizations which could improve performance (<code>-O3</code> didn’t change things for the better, in case you are wondering), but as my goal wasn’t to squeeze every last bit of performance out of this synthetic benchmark, I didn’t investigate this issue any further.</p>
<p>But now to the D results: Simply switching to LDC 2 instead of DMD didn’t give any great speedups, because <code>readAll()</code> wasn’t inlined by it either, thus leaving all the memory copying unoptimized, as discussed in the last post. To see how much of a difference this would really make, I compiled the D code to LLVM IR files and manually ran the optimizer/code generator/linker on them, with the plan being to manually add the <code>alwaysinline</code> attribute to the relevant pieces of IR:</p>
<figure><pre><code>ldc2 -c -output-ll -oq -w -release -I../src -Igen-d ….d
llvm-link *.ll -o benchmark.bc
opt {-O3, -std-compile-opts} benchmark.bc -o benchmark_opt.bc
llvm-ld -native -llphobos2 -ldl -lm -lrt benchmark_opt.bc
</code></pre></figure>
<p>I then discovered that the method calls in question were properly inlined by the stand-alone <code>opt</code> without any manual intervention anyway. I am not really sure why this happens; the inliner cost limits could be more liberal in this case, or the optimization passes being scheduled in a different way than inside LDC could have an impact, or maybe it’s connected to the fact that <code>TMemoryBuffer</code> and the caller are in different modules (to my understanding, LTO <em>shouldn’t</em> be required to optimize in this case, but it may well be that I am mistaken here).</p>
<p>The <code>LDC -output-ll</code> rows in the above table correspond to the benchmark compiled this way, with the <code>-std-compile-opts</code> and <code>-O3</code> flags passed to <code>opt</code>, respectively. This is a nice example of how important compiler optimizations for this, again, synthetic benchmark really are: for the reading part of the benchmark, <code>-O3</code> gives a nice speed boost because of the more aggressive inlining (<code>-std-compile-opts</code> doesn’t touch <code>TBinaryProtocol.readFieldBegin()</code>, which is called 15 times per loop iteration and contains some code that can completely be optimized out), but for the writing part, its result is actually <em>slower</em>, presumably because of locality effects (the call graphs are identical).</p>
<p>The only change related to benchmark performance I made since the last post was an LDC-specific workaround to stop manifest constants from incorrectly being leaked from the CTFE codegen process into the writing functions. I think the above results are justification enough to stop worrying about raw serialization performance – the results when using the Compact instead of the Binary protocol are similar – and move on to more important topics<sup class="footnote" id="fnr2"><a href="#fn2">2</a></sup>.</p>
<p class="footnote" id="fn1"><a href="#fnr1"><sup>1</sup></a> <s>If you are curious about LDC 2, you can get the source I used from the <a href="https://bitbucket.org/lindquist/ldc">official hg repo</a>, and the LDC-specific <a href="https://github.com/dnadlinger/druntime/tree/ldc2">druntime</a> and <a href="https://github.com/dnadlinger/phobos/tree/ldc2">Phobos</a> source from my clones at GitHub</s>. LDC is <a href="https://github.com/ldc-developers/ldc">officially on GitHub</a> now.</p>
<p class="footnote" id="fn2"><a href="#fnr2"><sup>2</sup></a> Such as performance-testing the actual server implementations, but I don't expect any big surprises there, and I am not sure how to reliably benchmark the network-related code – running server and clients on the same machine is probably a bad idea?</p>
D/Thrift: Non-Blocking Server, Async Client, and more2011-07-15T00:00:00+01:00http://klickverbot.at/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more<p>First of all, the usual apologies for publishing this post later than I originally planned to. No, seriously, drafting a solid asynchronous client implementation ended up being a lot more work than I originally anticipated, but I wanted to discuss my ideas in this status report. Now, the post turned out way too large anyway, but I guess that’s what I deserve. ;)</p>
<p>Also, a quick notice beforehand: A week ago, DMD 2.054 was released. It is the first version to include, amongst a wealth of other improvements, Don’s necessary CTFE fixes and my <code>std.socket</code> additions. This means that it is no longer necessary to use a Git build to use Thrift with D, you can just go to <a href="http://www.digitalmars.com/d/download.html">digitalmars.com</a> and fetch the latest package for your OS.</p>
<h2 id="small-but-useful-additions">Small but useful additions</h2>
<p>But before discussing the intricacies of non-blocking I/O, on to the mundane helper transports that found their way into the D library: The first addition was a simple <code>TInputRangeTransport</code> which, as the name says, just reads data from a generic <code>ubyte</code> input range, with some optimizations for the case where the source is a plain <code>ubyte[]</code> (<code>std.algorithm.put</code> is currently unnecessarily slow if both ranges are sliceable; I haven’t had time to prepare a fix for Phobos yet). It can e.g. be used in cases where you want to deserialize some data from a memory buffer and don’t need to write anything back (which is where <code>TMemoryBuffer</code> would be used).</p>
<p>Another addition is <code>TZlibTransport</code>, which wraps another transport to compress (deflate) data before writing it to the underlying transport, and decompress (inflate) it after reading. This is implemented by directly using zlib (via the C interface) instead of using <code>std.zlib</code>, because the API of the latter would have made it impossible to avoid needlessly allocating buffers all the time. Thankfully, the C++ library already included a zlib-based implementation, saving me from working out the various corner cases.</p>
<h2 id="some-deserialization-micro-optimizations">Some deserialization micro-optimizations</h2>
<p>The next thing I worked on was a set of further optimizations motivated by the <code>serialization_benchmark</code>. To recapitulate, it is a <a href="https://github.com/dnadlinger/thrift/blob/d-gsoc/lib/d/test/serialization_benchmark.d">trivially simple application</a> which just serializes a struct (<code>OneOfEach</code> from <code>DebugProtoTest.thrift</code>, to be precise) to a <code>TMemoryBuffer</code> and then reads the data back into the struct again, repeating both parts a number of times to be able to get meaningful timing results. Here are my related changes:</p>
<ul>
<li>
<p>First, I replaced <code>TMemoryBuffer</code> with the new <code>TInputRangeTransport</code> to avoid copying the data on each iteration of the reading loop. Because the initial copying to the memory buffer took only ~1–2% of the overall time anyway, this didn’t have a great speed impact.</p>
</li>
<li>
<p>The next change was to provide a shortcut version of <code>TTransport.readAll()</code> for <code>TInputRangeTransport</code> (and <code>TMemoryBuffer</code> as well). Previously, the generic <code>TBaseTransport</code> version which just calls <code>read()</code> in a loop was used – because the method is called about 50 times per reading loop iteration, replacing it with a simple slice assignment gave a ~20% speedup on the reading part of the serialization benchmark.</p>
</li>
<li>
<p>Furthermore, I nuked the protocol-level »read length« limit implemented for the Binary and Compact protocols. This was motivated not so much by optimization as by the fact that limiting the total amount of data read really belongs at the transport level in my eyes (it was only present because of a, uhm, misguided attempt to draw inspiration from the Java library). Incidentally, this gave another ~15% speedup in the reading benchmark. I will add support for limiting the container and string size Really Soon™ (just as for C++, to be able to somehow cap the amount of memory allocated due to network input), but one more branch per container/string read should have a negligible performance impact.</p>
</li>
<li>
<p>Finally, I removed a few instances where memory was unnecessarily zero-initialized (only to be completely overwritten later) in the reading code. For the integer buffers (used for byte order conversion) this gave a small but measurable (<5%) performance boost, and for the binary/string reading (which is both larger in size and exercised more often during the benchmark) another ~8% speedup.</p>
</li>
</ul>
<h2 id="profiling-results">Profiling results</h2>
<p>So, after all these (de)serialization micro-optimizations (I improved the writing part when first working on performance), how does the D implementation compare to its natural competitor, the C++ one? Well, frankly not too well at this point. Before discussing my findings in more detail, the performance results as measured on an x86_64 Arch Linux VM<sup class="footnote" id="fnr1"><a href="#fn1">1</a></sup>, hosted on my MacBook Pro (Intel Core i7-620M 2.66 GHz, OS X 10.6), by running each part 10 000 000 times and averaging over it (the results are in 1 000 operations per second, so both implementations can perform on the order of a million reads/writes per second):</p>
<figure>
<table class="firstname">
<thead>
<tr>
<th> </th>
<th>Writing / kHz</th>
<th>Reading / kHz</th>
</tr>
</thead>
<tbody>
<tr class="odd">
<td>DMD v2.054, -O -release -inline</td>
<td>2 051</td>
<td>1 170</td>
</tr>
<tr>
<td>GCC 4.6.1, -O2</td>
<td>4 624</td>
<td>2 053</td>
</tr>
<tr class="odd">
<td>GCC 4.6.1, -O2, templates</td>
<td>5 667</td>
<td>4 509</td>
</tr>
</tbody>
</table>
</figure>
<p>The first GCC row shows the result of the vanilla build (what you get by simply doing <code>cd lib/cpp/test; make Benchmark; ./Benchmark</code>), while for the »templates« row, I added the (undocumented?) <code>templates</code> flag to the generator invocation (<code>thrift -gen cpp:templates</code>), which causes the struct reading/writing methods to be templated on the actual protocol type, much like what I implemented for D. In this benchmark, eliminating any indirections naturally has a huge impact on the performance.</p>
<p>So, why does the D version have less than half the throughput for writing, and why is it almost four times slower at reading? Let me first point out that the actual code for the C++ and D implementations is, from a semantic point of view, virtually the same (with the exception of D using garbage collected memory for <code>string</code>/<code>binary</code> data). I think I have arrived at a point where the single largest factor influencing the performance of the serialization code is the compiler used, or to be more exact, how well it optimizes the code.</p>
<p>What follows are a few results from my profiling sessions (Valgrind 3.6.1, visualized using KCachegrind<sup class="footnote" id="fnr2"><a href="#fn2">2</a></sup>) which corroborate my assumption that compiler optimizations are the culprit here. Let’s first have a look at the profiler results for the reading part of the benchmark (this time, the loops were run only a million times each):</p>
<figure class="bigimg"><img alt="KCachegrind showing profiling results for the reading part of the C++ benchmark." src="/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more/cpp-readonly.png" /><figcaption>C++ reading time profile.</figcaption></figure>
<figure class="bigimg"><img alt="KCachegrind showing profiling results for the reading part of the D benchmark." src="/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more/d-readonly.png" /><figcaption>D reading time profile.</figcaption></figure>
<p>I only included the top six functions (by time spent in them) here for the sake of brevity, but for both implementations, the »long tail« of calls in the flat profile are actually runtime helper functions, mostly startup initialization code and memory management-related things used by the string reading functions (for D, GC calls show up prominently, because the benchmark allocates three million strings, which triggers almost 50 collections in between).</p>
<p>This also means that the compiler has done a pretty good job at combining all the tiny deserialization functions into the top-level struct reading function by inlining – with one glaring difference: DMD chose not to inline <code>TInputRangeTransport.readAll()</code>, which is ultimately called when deserializing each and every member to read the actual bytes off the wire (or in this case, from memory), yielding 49 million additional function calls. To make matters worse, this means that the number of bytes requested each time (e.g. 4 for an integer) is not known at compile time, so the generic <code>memcpy</code> implementation has to be called every time. The C++ implementation, on the other hand, only calls <code>memcpy</code> in those situations where the number of bytes copied really depends on a runtime value, which is the case for strings, as they are intrinsically variable-length (the remaining memcpy calls happen during initialization and when initially writing the struct to the buffer).</p>
<p>Profiling the writing part shows similar results:</p>
<figure class="bigimg"><img alt="KCachegrind showing profiling results for the writing part of the C++ benchmark." src="/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more/cpp-writeonly.png" /><figcaption>C++ writing time profile.</figcaption></figure>
<figure class="bigimg"><img alt="KCachegrind showing profiling results for the writing part of the D benchmark." src="/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more/d-writeonly.png" /><figcaption>D writing time profile.</figcaption></figure>
<p>Again, for the C++ version, everything is inlined into <code>OneOfEach.write()</code>, in which over 80% of the time is actually spent, and just as for the reading part, the only instance where <code>memcpy()</code> is not inlined<sup class="footnote" id="fnr3"><a href="#fn3">3</a></sup> is for strings. On the other hand, the D version is optimized <em>almost</em> as well as the C++ version, the only exception being that <code>TMemoryBuffer.write()</code> is not inlined, which again prevents <code>memcpy</code> from being optimized (the other function showing up, <code>reset()</code>, only resets the output buffer once per iteration; it is inlined into <code>main</code> in the C++ version).</p>
<p>So, to recapitulate, I am not sure whether DMD would be able to replace a <code>memcpy()</code> call with optimized asm in the first place, but not knowing the length at compile time prevents that anyway. I am pretty sure that this difference of about a hundred million function calls, together with not being able to emit optimized code for the short (2, 4, 8, …) byte copies, accounts for a large part of the performance gap.</p>
<p>This assumption is supported by data gathered from a case where GCC chose not to inline <code>TBufferBase::write()</code> (which is the common path of <code>TMemoryBuffer::write()</code>). Interestingly, this actually happens at <code>-O3</code>, which is a <em>higher</em> optimization level than the <code>-O2</code> used above (I suppose some additional optimizations performed on the function raise its estimated inlining cost above the threshold). Just for comparison, here are again the top five functions from the profile:</p>
<figure class="bigimg"><img alt="KCachegrind showing profiling results for the writing part of the C++ benchmark compiled with -O3." src="/blog/2011/07/d-thrift-gsoc-nonblocking-server-async-client-and-more/cpp-writeonly-o3.png" /><figcaption>C++ writing time profile when compiled with <code>-O3</code>, causing <code>TBufferBase::write()</code> to be no longer inlined.</figcaption></figure>
<p>Just as for D, this means <code>memcpy</code> cannot be optimized away either. And unsurprisingly, this causes the performance to go through the floor as well: the executable now only reaches 2 519 thousand operations per second. The D version is still a bit slower at 2 051 kHz, but it is on a comparable level now.</p>
<p>So, to finally come to a conclusion, most of the performance gap between C++ and D presumably comes from DMD not inlining a key function and thus not being able to optimize away <code>memcpy</code> calls as well. An obvious experiment would be to try a different compiler like GDC or LDC, both of which are known to generally optimize better than DMD does. Unfortunately, both of them are currently at front-end version 2.052, but my Thrift code currently requires 2.054.</p>
<p>There are two possible solutions to this: either sprinkle workarounds all over the Thrift code so the older DMD frontend and Phobos versions can be used for the benchmark, or update the frontend of GDC or LDC to 2.054. While the former would be entirely feasible, I think I will update the LDC frontend once I have some time to spare, as this will also be useful for other D projects (I chose LDC because I am already familiar with its codebase).</p>
<h2 id="libevent-based-non-blocking-server">Libevent-based non-blocking server</h2>
<p>If I didn’t lose you during all the talk about micro-optimization above, let me hereby present you the two main additions to the library during the last two weeks: a <em>non-blocking server implementation</em> and a Future-based <em>asynchronous client</em> interface.</p>
<p>I am not sure if I ever stated it explicitly (the timeline only has »event-based I/O Phobos lib?« in parentheses), but I was hoping to be able to come up with a small general-purpose non-blocking I/O library written in D as a by-product of this project. The obvious time to start working on it would have been when implementing the non-blocking server, but after considering several possible designs, I realized that I did not yet know the problem domain well enough to come up with something that is not just a cheap libevent/Boost.Asio rehash, yet that I could still be sure performs well enough for a production-grade Thrift server implementation.</p>
<p>Thus, I went with simply porting the C++ libevent-based server implementation over to D, which has the benefit of being battle-proven, so that I have something which I can advise people to use in production code without feeling guilty. There are a few instances where I needed to manually add a GC root for some memory passed to libevent, but other than that, the code is reasonably clean, even though it surely could be prettier if a native D »event loop« was used.</p>
<p>A word of warning for Windows users: While libevent is linked dynamically, making it easy to just use DLL builds on Windows, some pieces of the socket code have not yet been tested against WinSock. Currently, I am not even sure whether all of the code compiles on Windows, but I will run some tests there shortly to ensure all the new additions work as well.</p>
<h2 id="coroutine-based-tasyncclient">Coroutine-based <code>TAsyncClient</code></h2>
<p>Using an asynchronous/concurrent approach for network-related code with its intrinsic I/O latency seems like a very obvious thing to do, but to my knowledge e.g. the C++ libraries currently do not provide a generic async client implementation, which is part of the reason I did not tackle this earlier.</p>
<p>After getting accustomed to the general idea of non-blocking I/O, it seemed to be a good time to finally work on the topic. What I basically wanted to implement was a way to off-load client-side request/response handling, possibly for multiple connections, to a worker thread, providing a <a href="http://en.wikipedia.org/wiki/Futures_and_promises">future-based</a> interface to the client code. For multiplexing multiple connections per worker thread, I wanted to experiment with a coroutine-based design.</p>
<p>As mentioned in the beginning of this report, coming up with a solid design took me a bit longer than expected, but as of now, <code>thrift.codegen</code> includes a fully functional <code>TAsyncClient</code> implementing such a scheme, also using libevent to have a portable means for handling non-blocking I/O. The new <code>thrift.async</code> package contains the related helper code, such as <code>TAsyncSocket</code> representing a non-blocking socket.</p>
<p>The new code is not yet well-documented or tested, and is still missing some important features like the ability to set timeouts on operations, but I have successfully tested basic use cases.</p>
<h2 id="plans-for-the-second-gsoc-half">Plans for the second GSoC half</h2>
<p>Which finally brings me to the end of this post: As my project fortunately passed the mid-term evaluations, it is now time to discuss how to go forward during the second part of the Summer of Code.</p>
<p>During the next week, I will work on some of the obviously unfinished things like async client documentation and tests, and will add a few missing utilities such as a <code>tee</code>-like transport which can be used to transparently log requests.</p>
<p>Speaking of documentation, this currently is a big issue for both the D implementation and, to a lesser extent, Thrift in general. However, as of now, I have worked sufficiently long on the code that I am effectively blind for what kinds of documentation a typical user would benefit the most from – more detailed API docs? Simple stand-alone examples with well-commented code? Tutorials? It would be great if you could let me know what you think would be useful.</p>
<p>With the non-blocking server implementation being completed, only the »performance« and »documentation« items from my original timeline remain, besides some general clean-up work being left to do. However, Nitay, my mentor, suggested a few other things which could be worth looking into, such as a generalized client for querying multiple servers, to be used for things like redundancy, load distribution, data verification, etc. I will discuss this in more detail and then update the timeline accordingly.</p>
<p class="footnote" id="fn1"><a href="#fnr1"><sup>1</sup></a> Why test in a Linux VM (Virtual Box 4.0.8) rather than directly on my OS X development box? Because Linux x86_64 is probably where most of the server-side deployments will end up, only an ancient GCC is available on OS X, DMD is still 32 bit-only there, and Valgrind/Callgrind which I used for profiling is not really usable on OS X 10.6. I am aware that using a VM might skew the results a bit, but I think the impact shouldn't be too large. Incidentally, the tests compiled and ran in the Linux VM generally performed faster than on the host.</p>
<p class="footnote" id="fn2"><a href="#fnr2"><sup>2</sup></a> I patched KCachegrind to elide the middle of the symbol name for better readability in width-limited screenshots, and used my <a href="https://gist.github.com/1069843">own little demangling tool</a> for the D results.</p>
<p class="footnote" id="fn3"><a href="#fnr3"><sup>3</sup></a> Technically, GCC handles <code>memcpy</code> as compiler built-ins, so inlining might not be precisely the right term, but the effect (avoiding a function call) is the same.</p>
D/Thrift: Docs, Servers, Tests2011-07-04T00:00:00+01:00http://klickverbot.at/blog/2011/07/d-thrift-gsoc-docs-servers-tests<p>Dear Reader,</p>
<p>Let me apologize for not being terribly motivated to write a blog post right now, but I was lucky enough to catch the flu last week with temperatures between 25 °C and 35 °C outside, and while the fever has gone by now, I am still depleting my tissue stock at an insanely high rate…</p>
<p>Anyway, back to topic, the usual summary of the recent changes to my Thrift GSoC project:</p>
<ul>
<li>
<p>The most important item on the list from a user point of view are probably the <em>documentation</em> improvements: The project now has a <a href="https://github.com/dnadlinger/thrift/wiki/Getting-Started-with-Thrift-and-D">Getting Started page</a>, and I have made a complete pass through all the DDoc docs, a build of which is <a href="http://klickverbot.at/code/gsoc/thrift/docs/">available here</a> (I still have to whip up a nice design, but it should do for the moment).</p>
</li>
<li>
<p>More interesting from a coding perspective are the additions of two <em>new <code>thrift.server</code> implementations</em>, <code>TThreadedServer</code> and <code>TTaskPoolServer</code>. The former is a naive implementation of a threaded server which just spawns a new worker thread per client connection, while the latter uses a <code>std.parallelism</code> thread pool to process the queued client connections (the maximum number of active connections is configurable). I also added a D version of the <code>StressTest</code> server for sanity checking.</p>
</li>
<li>
<p>Another server-related change is the addition of server and processor <em>event handlers</em>, which can be used to hook custom code into various points during the server/request lifecycle, e.g. for collecting diagnostic data. Data can be persisted between calls by saving it as connection/call »context«, which is a <code>Variant</code> the server code passes around for you (I went with variants over e.g. templating the server code on the context object type simply to avoid adding another layer of complexity for a non-essential feature).</p>
</li>
<li>
<p>I added a standalone test case exercising the different transport types (socket, file, memory buffer) combined with the various wrapper transports (buffered, framed), modeled after the C++ <code>TransportTest</code>. This has uncovered a number of (sometimes not-so-) subtle defects in the transport implementations which have since been fixed (<code>TSocket</code> not handling <code>EINTR</code>, the framed/memory buffer <code>borrow()</code> returning less data than requested, <code>TFileReaderTransport</code> not tailing files correctly, …).</p>
</li>
<li>
<p>Build system improvements: The stand-alone test cases are now organized in a much less cumbersome scheme, and DDoc documentation is generated for the library by default. <code>lib/d/README</code> now has instructions on how to generate a self-signed certificate for SSL socket testing.</p>
</li>
</ul>
<p>If you are on OS X, you might want to manually apply <a href="https://github.com/D-Programming-Language/phobos/pull/131">Phobos pull request 131</a> until it is merged into Git master to avoid your servers crashing due to an unhandled <code>SIGPIPE</code> (you can also just set <code>signal(SIGPIPE, …)</code> to <code>SIG_IGN</code> in your startup code).</p>
<p>I have also added a list of not yet scheduled ideas to the <a href="/code/gsoc/thrift/">project page</a>. Implementing a ZLib compression transport is currently the top item on my list, after which I will start to work on a non-blocking server implementation as planned. An asynchronous version of <code>TClient</code> is something I certainly want to implement, but I am planning to defer work on it until I have tackled the non-blocking server, as I could end up using the same approach (e.g. <code>libevent</code>) for it.</p>
D/Thrift: Compact, JSON protocols, performance2011-06-22T00:00:00+01:00http://klickverbot.at/blog/2011/06/d-thrift-gsoc-protocols-compact-json-performance<p>Another week of my Google Summer of Code project passed by, and so you are reading another status update. I am not including any core D development-related news this time, first because I didn’t do much DMD/Phobos work last week, and second because it gets tedious to list everything here – feel free to see my GitHub activity stream for more information. But still, thanks to Sean Kelly for quickly fixing the <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6135">OS X threading/GC race condition</a> I encountered the week before.</p>
<p>One of my targets last week was to do some preliminary performance investigations and to use the insights gained to modify the protocol interface accordingly before implementing additional protocols. For this, I used the <code>DebugProtoTest.thrift</code>-based serialization performance test already implemented for C++ and Java (see the <a href="https://github.com/dnadlinger/thrift/blob/d-gsoc/lib/d/test/serialization_benchmark/benchmark.d">D version at GitHub</a>; a more intensive look at performance, including the creation of some more extensive benchmarks, is planned for later).</p>
<p>Ironically, the change with the biggest impact on the writing performance didn’t have anything to do with the protocol interface at all: When first writing <code>TMemoryBuffer</code>, I simply implemented <code>write()</code> as a D array appending operation, because I didn’t want to spend much time on optimizing it yet, and I figured that as long as there would not be too many reallocations, it should be reasonably fast for testing purposes. Array appending translates to a non-inlined and not really cheap D runtime call, however, and <code>TMemoryBuffer.write()</code> unsurprisingly happens to be the single most called function in the whole writing part of the benchmark. After changing <code>TMemoryBuffer</code> to manual <code>malloc</code>/<code>free</code>-style memory management, the writing part finished in <em>less than 30%</em> of the time.</p>
<p>I tried to switch to <code>GC.malloc</code> instead of manual freeing afterwards because it would make getting a buffer content slice safe and the small memory allocation overhead should not really be a problem for typical <code>TMemoryBuffer</code> use cases (it does not matter at all in this benchmark because the required amount of memory is pre-allocated), but I encountered some strange data corruption issues in the other larger test cases I have yet to track down. Most probably, I just missed some subtleties when treating <code>GC.realloc</code> as a drop-in <code>realloc</code>/<code>free</code> replacement, but I just didn’t find a way to pin-point the issue.</p>
<p>For the next step, I tackled the design of the <code>TProtocol</code> interface: When building the first prototype for the library, I had the ad-hoc idea of passing in delegates to the aggregate reading/writing functions for processing their members. I figured that this would make the interface nicer as all the <code>*Begin()/*End()</code> pairs could be collapsed into a single call, the struct member reading loop could be moved into the protocol itself instead of being duplicated over and over again (although this is not a real benefit besides a slight code size reduction because it is generated code anyway), and implementing protocols like JSON would be easier since the structural information would not have been completely lost compared to a »flat« interface.</p>
<p>I was, however, aware of the fact that this could pose a performance problem, and indeed some experimenting showed that DMD generated suboptimal code for delegate literals and was not really able to inline them, even for <code>scope</code>d delegates. From a compiler point of view, this is not really surprising as generating better code would require a fair bit of analysis to be done, but still I decided to switch to a more simplistic protocol design for the time being – even more so, as I realized that my design idea would not really simplify implementing JSON-like protocols anyway. I chose to go with the C++/Java interface verbatim, as it is proven to work (and having a similar interface across multiple languages has its own merits as well), and with the changes in, I measured a <em>20% speedup</em>, even though no inlining was possible due to virtual calls all over the place yet. (In hindsight, it might have been better to implement the template mechanism first, so that the actual impact of the protocol API change would have been more visible. Maybe I’ll revert the binary protocol back to the old interface and re-run the test to get precise numbers at some point in the future.)</p>
<p>Finally, I implemented a way to specify the concrete transport/protocol types used in the application at compile-time using templates (similar to C++ and the <code>templates</code> Thrift compiler argument), thus eliminating most virtual calls and enabling the compiler to inline calls all over the library. I expected to see a dramatic speedup here as well – when not specifying the protocol/transport type, the writing loop in the C++ benchmark is only half as fast –, but instead I saw »only« a <em>40% speedup</em> overall, with the C++ version still being significantly faster.</p>
<p>When comparing profiling data for the optimized C++ and D versions, I noticed that in the D version <code>_memcpy</code> gets called ten times as often as in the C++ version – GCC, which can inline the <code>write()</code> calls, replaces them with optimized routines for shorter lengths, and since both versions spend most of their time actually copying data around at this point, this yields a huge advantage.</p>
<p>After that, I did not make any further attempts at optimizing the D version, since performance was not my primary goal at this stage anyway – the basic design seems to be solid, and what is left are micro-optimizations. When focussing on performance later in the term, I will certainly create more benchmarks, and also try to optimize the languages I will compare D to (C++ and Java, most likely) – for example, the current C++ serialization benchmark from the official HEAD does a lot of unneeded work in the reading loop; moving out the initialization code makes it run twice as fast. I will also have a look at using GDC and LDC instead of DMD for their more sophisticated backends, and document the exact performance findings on various platforms.</p>
<p>Even though I am not going to write that much about it, I spent the bigger part of my time on non-performance-related work: Generated structs now have an appropriate <code>toString()</code> and <code>opEquals()</code> implementation, the D ThriftTest client actually checks the data it sends/receives instead of just flooding the console with messages (no idea why this hasn’t already been implemented for C++ and Java), and last but not least, I implemented the Compact and JSON protocols for D. This completes the protocol section, as I do not plan to implement the <em>Dense</em> protocol unless there is much time left to spend toward the end of the term (as previously discussed).</p>
<p>During the next (or rather: this) week, I am going to work on documentation, integrate a number of test cases I have already lying around with the repository/build system, and implement a simple multithreaded server.</p>
D/Thrift GSoC: Growing the library2011-06-14T00:00:00+01:00http://klickverbot.at/blog/2011/06/d-thrift-gsoc-growing-the-library<p>First, let me apologize for not posting an update last week – I had a busy time, but regardless I will try to let you know about the state of affairs regularly in the future. Now, what was I working on? I updated my <a href="/code/gsoc/thrift/">project page</a> based on the timeline previously discussed on my project mailing list, and – besides me being a day late, more on that below – it is still valid. These were the main points I worked on:</p>
<ul>
<li>
<p><em>Build system integration</em>: The D library is now integrated with the Thrift Autoconf/Automake build system. If a working D2 compiler is detected, the <code>libthriftd</code> static library containing all the modules is now built on issuing <code>make</code> along with the rest of Thrift. <code>make check</code> runs the unit tests for each D module and builds the standalone test executables (i.e. <code>ThriftTest</code> for now).</p>
</li>
<li>
<p><em>Socket transport enhancements</em>: Implemented <code>interrupt()</code> for the server socket, which can be used to notify a server blocking on a socket waiting for connections about shutdown; added socket timeouts; properly handled exceptions thrown by <code>std.socket</code>; …</p>
</li>
<li>
<p>Added a D implementation of <em><code>TMemoryBuffer</code></em>, which is widely used internally and a nice tool for writing unit tests as well. Implemented the <em>Framed</em> transport in D.</p>
</li>
<li>
<p>Implemented <em><code>TFileReaderTransport</code></em> and <em><code>TFileWriterTransport</code></em>, the D equivalent to the C++ <code>TFileTransport</code>. I separated the two components because I could not really think of a situation in which you would use both at once, and conflating the two would complicate the state space (I am not even sure if the C++ implementation does what it is supposed to if read/write calls are interleaved) and make the implementation unnecessarily complex. The <code>TFileWriterTransport</code> implementation performs the actual file I/O in a separate worker thread, which communicates with the main thread using a message passing approach (leveraging D’s <code>std.concurrency</code> module).</p>
</li>
<li>
<p>A simple <em>HTTP client/server transport</em>, closely modeled after the C++ implementation.</p>
</li>
<li>
<p>An <em>SSL client/server socket</em> implementation using the OpenSSL library, which is linked in dynamically (primarily for easy Windows compatibility). The actual implementation is pretty much a direct port of the C++ <code>TSSLSocket</code>, but I had to quickly write the D2 bindings for OpenSSL first. For now, the bindings live in <code>thrift.util.openssl</code>, as I only included the subset of functions I needed for Thrift, but I might move them out in the future.</p>
</li>
</ul>
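<p>The worker thread design described above for <code>TFileWriterTransport</code> can be sketched roughly like this – all names are illustrative and not the library’s actual API; note that <code>std.concurrency</code> messages must be value types or immutable:</p>

```d
import std.concurrency;
import std.stdio;

// Illustrative sketch of the message-passing design: the actual file
// I/O runs in a worker thread that is fed through std.concurrency.
struct Shutdown {}

void writerLoop()
{
    bool running = true;
    while (running)
    {
        receive(
            (immutable(ubyte)[] chunk) {
                // A real implementation would append the chunk to the
                // log file here; we only report its size.
                writefln("writing %s bytes", chunk.length);
            },
            (Shutdown s) { running = false; }
        );
    }
}

void main()
{
    auto writer = spawn(&writerLoop);
    immutable(ubyte)[] data = [1, 2, 3];
    writer.send(data);       // hand a chunk to the I/O thread
    writer.send(Shutdown()); // ask it to terminate
}
```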
<p>As always, you can find the changes <a href="https://github.com/dnadlinger/thrift">on my GitHub fork</a>. I also spent a sizable chunk of my time on contributing some improvements and fixes to the D compiler and standard library projects. As for the issues I mentioned two weeks ago, kudos to Don Clugston for promptly fixing CTFE issues <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6077">6077</a> and <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6078">6078</a>, and my <a href="https://github.com/D-Programming-Language/dmd/pull/77">DMD pull request 77</a> and <a href="https://github.com/D-Programming-Language/phobos/pull/65">Phobos pull request 65</a> were also merged in the meantime.</p>
<p>During the last two weeks, I worked on Phobos pull requests <a href="https://github.com/D-Programming-Language/phobos/pull/73">73</a> (adds <code>std.socket.socketpair</code>), <a href="https://github.com/D-Programming-Language/phobos/pull/87">87</a> (better <code>std.file</code> error messages), <a href="https://github.com/D-Programming-Language/phobos/pull/90">90</a> (fixes a mailbox handling bug in <code>std.concurrency</code> – took me quite some time to track down as it caused sporadic deadlocks in my unit tests), <a href="https://github.com/D-Programming-Language/phobos/pull/99">99</a> (adds timeout handling and hostname lookup to <code>std.socket</code> – I still don’t know why WinSock adds 500 ms to the <code>recv()</code> timeout), <a href="https://github.com/D-Programming-Language/druntime/pull/28">druntime pull request 28</a> (adds a Posix <code>netdb.h</code> module) and <a href="https://github.com/D-Programming-Language/dmd/pull/118">DMD pull request 118</a> (finally removes the <code>_DH</code> flag).</p>
<p>Furthermore, I collaborated with Daniel Murphy on fixing the long-standing issue that function pointers are not properly typechecked, resulting in <a href="https://github.com/D-Programming-Language/dmd/pull/96">DMD pull request 96</a> and <a href="https://github.com/D-Programming-Language/druntime/pull/26">druntime pull request 26</a>. I have also started to work on the dreaded DMD <a href="http://d.puremagic.com/issues/show_bug.cgi?id=314">bug 314</a>. While the basic fix is in place (I adapted the D1/LDC changes by Christian Kamm to D2/DMD) – that’s how I found the bug in <a href="https://github.com/D-Programming-Language/phobos/pull/102">Phobos pull request 102</a> –, I still need to add some more tests and solve a few more complex cases. Unfortunately, I also hit two new issues which I have not been able to fix yet: <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6108">6108</a>, a DMD contract inheritance bug, and <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6135">6135</a>, a druntime OSX threading/GC crash.</p>
D/Thrift GSoC: First results2011-05-29T00:00:00+01:00http://klickverbot.at/blog/2011/05/d-thrift-gsoc-first-results<p>The first week of my <a href="/code/gsoc/thrift/"><em>D/Thrift</em> project</a> as part of the <a href="http://d-programming-language.org">D programming language</a> <a href="http://www.google-melange.com/gsoc/org/google/gsoc2011/dprogramminglanguage">Google Summer of Code 2011</a> is over, and I am happy to be able to share some first results. If you are not sure what I am talking about yet: <a href="http://thrift.apache.org">Apache Thrift</a>, originally developed for internal use at <a href="http://facebook.com">Facebook</a>, is both a data serialization/RPC protocol and its reference implementation. In short, it works by defining data types and service interfaces in a language-agnostic interface definition file. Then, a compiler (currently written in C++) is used to generate code from that <code>.thrift</code> file, building on target language support libraries which contain the actual serialization protocol/RPC implementation. Currently, Thrift supports a large number of languages including C++, Java, PHP and Python.</p>
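<p>To illustrate, a minimal interface definition file might look like this (a made-up example; see the Thrift documentation for the full IDL syntax):</p>

```thrift
# calculator.thrift – a made-up, minimal interface definition.
struct Work {
  1: i32 lhs
  2: i32 rhs
}

service Calculator {
  i32 add(1: i32 lhs, 2: i32 rhs)
}
```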
<p>It was clear from the beginning that I would stick with this approach for my implementation, not only because the informal project goal is to establish D as an equal target language besides the existing ones, but simply because one of the main strengths of Thrift is that you can use the same interface definition for all target languages, with the compiler doing all the heavy lifting for you. I did, however, want to leverage the powerful metaprogramming capabilities of D (compile-time reflection, <abbr title="Compile Time Function Execution">CTFE</abbr>, string mixins) to lift as much work off the »ahead-of-time« C++ code generator as possible, keeping at the back of my mind the option to use the Thrift libraries beyond the traditional scope of the project for ad-hoc extension of existing D data types and interfaces with serialization/RPC functionality.</p>
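<p>To give a rough idea of the kind of compile-time reflection this builds on – a sketch of the general technique, not the library’s actual serialization code:</p>

```d
import std.stdio;

// Iterate over a struct's fields at compile time; the foreach below is
// unrolled during compilation, so no per-type code has to be generated
// ahead of time by an external tool.
void writeStruct(T)(ref T value)
{
    foreach (i, field; value.tupleof)
    {
        // __traits(identifier, ...) yields the field name at compile time.
        writefln("%s = %s", __traits(identifier, T.tupleof[i]), field);
    }
}

struct Point { int x; int y; }

void main()
{
    auto p = Point(1, 2);
    writeStruct(p); // prints "x = 1" and "y = 2"
}
```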
<p>My primary goal during the first week was to evaluate the feasibility of this approach by quickly implementing the basic parts of each Thrift component. In more detail, the sub-goals I tackled during the last week were:</p>
<ul>
<li>
<p>Create a preliminary implementation of the central parts of the support library (<code>TBinaryProtocol</code>, <code>TBufferedTransport</code>, <code>TSocket</code>, …) using the C++ and Java implementations as reference, to be able to directly test the progressing D implementation against other languages.</p>
</li>
<li>
<p>Implement the general and client-specific parts of compile-time code generation (struct reading/writing, method arguments/result struct generation, <code>TClient</code>, …), using a hand-crafted Thrift tutorial interface to test it against the Java server.</p>
</li>
<li>
<p>Implement <code>TSimpleServer</code> and related basic server functionality (e.g. <code>TServerSocket</code>) to be able to test server code generation.</p>
</li>
<li>
<p>Complete missing server-side code generation bits (<code>TProcessor</code>, server-side arguments/result structs, …), again using a hand-crafted interface to test it against the Java Thrift calculator tutorial client.</p>
</li>
<li>
<p>Add D code generation to the Thrift compiler, and run the compiler against all the test interface files coming with Thrift (<code>test/*.thrift</code>) to catch any obvious issues.</p>
</li>
<li>
<p>Implement a <code>ThriftTest</code> server and client in D to exercise the more advanced serialization code paths and fix any bugs, testing it against the C++ implementation.</p>
</li>
</ul>
<p>So far, no major problems popped up, and I was able to complete the above list as planned. I did, however, hit a few bugs in DMD, which, on the other hand, doesn’t come as a total surprise, because I am heavily using the metaprogramming facilities. I have been able to find workarounds for all of the issues, but it nevertheless took me quite some time to track them down initially: issues <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6069">6069</a>, <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6077">6077</a>, <a href="http://d.puremagic.com/issues/show_bug.cgi?id=6078">6078</a>, <a href="https://github.com/D-Programming-Language/dmd/pull/77">DMD pull request 77</a> and – this one is merely an enhancement – <a href="https://github.com/D-Programming-Language/phobos/pull/65">Phobos pull request 65</a>.</p>
<p>If you want to have a look at the code, feel free to head to <a href="https://github.com/dnadlinger/thrift/tree/d-gsoc">my GitHub Thrift fork</a>, where I regularly push my work to. And just to give you a short glimpse of the very basic features (a lot more is already implemented), this is how you could implement a simple calculator server/client which adds two numbers using the Thrift library, without using any generated code.</p>
<figure class="code"> <div class="highlight"><pre><span class="c1">// This could also be generated from a .thrift file and contain</span>
<span class="c1">// structs, exceptions, etc.</span>
<span class="k">module</span> <span class="n">calculator</span><span class="p">;</span>
<span class="k">interface</span> <span class="n">Calculator</span> <span class="p">{</span>
<span class="kt">int</span> <span class="n">add</span><span class="p">(</span><span class="kt">int</span> <span class="n">lhs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">rhs</span><span class="p">);</span>
<span class="p">}</span>
</pre></div><figcaption><span>Shared module containing the interface the server offers and the client consumes. </span></figcaption>
</figure>
<figure class="code"> <div class="highlight"><pre><span class="k">module</span> <span class="n">server</span><span class="p">;</span>
<span class="k">import</span> <span class="n">calculator</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">codegen</span><span class="p">.</span><span class="n">processor</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">protocol</span><span class="p">.</span><span class="n">binary</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">protocol</span><span class="p">.</span><span class="n">processor</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">server</span><span class="p">.</span><span class="n">simple</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">buffered</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">serversocket</span><span class="p">;</span>
<span class="k">class</span> <span class="n">CalculatorHandler</span> <span class="p">:</span> <span class="n">Calculator</span> <span class="p">{</span>
<span class="k">override</span> <span class="kt">int</span> <span class="n">add</span><span class="p">(</span><span class="kt">int</span> <span class="n">lhs</span><span class="p">,</span> <span class="kt">int</span> <span class="n">rhs</span><span class="p">)</span> <span class="p">{</span>
<span class="k">return</span> <span class="n">lhs</span> <span class="p">+</span> <span class="n">rhs</span><span class="p">;</span>
<span class="p">}</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Expose a CalculatorHandler instance at port 9090.</span>
<span class="k">auto</span> <span class="n">protocolFactory</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TBinaryProtocolFactory</span><span class="p">();</span>
<span class="k">auto</span> <span class="n">processor</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TServiceProcessor</span><span class="p">!</span><span class="n">Calculator</span><span class="p">(</span>
<span class="k">new</span> <span class="n">CalculatorHandler</span><span class="p">());</span>
<span class="k">auto</span> <span class="n">serverTransport</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TServerSocket</span><span class="p">(</span><span class="mi">9090</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">transportFactory</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TBufferedTransportFactory</span><span class="p">();</span>
<span class="k">auto</span> <span class="n">server</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TSimpleServer</span><span class="p">(</span>
<span class="n">processor</span><span class="p">,</span> <span class="n">serverTransport</span><span class="p">,</span> <span class="n">transportFactory</span><span class="p">,</span> <span class="n">protocolFactory</span><span class="p">);</span>
<span class="n">server</span><span class="p">.</span><span class="n">serve</span><span class="p">();</span>
<span class="p">}</span>
</pre></div><figcaption><span>Server implementation, accepting connections on port 9090 using the binary protocol. </span></figcaption>
</figure>
<figure class="code"> <div class="highlight"><pre><span class="k">module</span> <span class="n">client</span><span class="p">;</span>
<span class="k">import</span> <span class="n">calculator</span><span class="p">;</span>
<span class="k">import</span> <span class="n">std</span><span class="p">.</span><span class="n">stdio</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">codegen</span><span class="p">.</span><span class="n">client</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">protocol</span><span class="p">.</span><span class="n">binary</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">buffered</span><span class="p">;</span>
<span class="k">import</span> <span class="n">thrift</span><span class="p">.</span><span class="n">transport</span><span class="p">.</span><span class="n">socket</span><span class="p">;</span>
<span class="kt">void</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Set up a client for the Calculator interface and try to</span>
<span class="c1">// connect to localhost:9090.</span>
<span class="k">auto</span> <span class="n">socket</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TSocket</span><span class="p">(</span><span class="s">"localhost"</span><span class="p">,</span> <span class="mi">9090</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">transport</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TBufferedTransport</span><span class="p">(</span><span class="n">socket</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">protocol</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TBinaryProtocol</span><span class="p">(</span><span class="n">transport</span><span class="p">);</span>
<span class="k">auto</span> <span class="n">client</span> <span class="p">=</span> <span class="k">new</span> <span class="n">TClient</span><span class="p">!</span><span class="n">Calculator</span><span class="p">(</span><span class="n">protocol</span><span class="p">);</span>
<span class="n">transport</span><span class="p">.</span><span class="n">open</span><span class="p">();</span>
<span class="c1">// Call the server's add() method and print the result.</span>
<span class="k">auto</span> <span class="n">lhs</span> <span class="p">=</span> <span class="mi">2</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">rhs</span> <span class="p">=</span> <span class="mi">3</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">sum</span> <span class="p">=</span> <span class="n">client</span><span class="p">.</span><span class="n">add</span><span class="p">(</span><span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">);</span>
<span class="n">writefln</span><span class="p">(</span><span class="s">"%s + %s = %s"</span><span class="p">,</span> <span class="n">lhs</span><span class="p">,</span> <span class="n">rhs</span><span class="p">,</span> <span class="n">sum</span><span class="p">);</span>
<span class="p">}</span>
</pre></div><figcaption><span>Client implementation. Note how the interface defined above in the <code>calculator</code> module is passed to TClient as a template parameter, which then generates the necessary RPC code. </span></figcaption>
</figure>
Random D development news2011-04-26T00:00:00+01:00http://klickverbot.at/blog/2011/04/random-d-development-news<p>During the last couple of weeks, I didn’t really find time to update this blog. Nevertheless, I was able to spare some time for work on a couple of open source projects related to the <a href="http://d-programming-language.org">D programming language</a>. But first, let me quickly summarize some great changes that will be in the next <span class="caps">DMD</span> release:</p>
<p>Don Clugston has basically <a href="https://github.com/D-Programming-Language/dmd/pull/23">re-implemented <span class="caps">CTFE</span></a> to fix a whole slew of compile-time function execution bugs, among which is the dreaded <a href="http://d.puremagic.com/issues/show_bug.cgi?id=1330">bug 1330</a>. There are still some regressions compared to <span class="caps">DMD</span> 2.052 (like <a href="http://lists.puremagic.com/pipermail/dmd-internals/2011-April/001448.html">this one</a>, which breaks QtD), but apart from those, it’s a big step towards getting <span class="caps">CTFE</span> out of the »experimental feature« category. The new architecture will also make implementing reference types easier, but this is still a long way off. The next <span class="caps">DMD</span>/Phobos release will also include the new <a href="http://cis.jhu.edu/~dsimcha/d/phobos/std_parallelism.html">std.parallelism</a> module by David Simcha, some GC optimizations and a large number of other improvements (among which is the addition of the <a href="https://github.com/D-Programming-Language/dmd/commit/2e261cd640e5266c569ad224ffbfe229a0315d97">parent trait</a>, so that QtD doesn’t need a patched <span class="caps">DMD</span> any longer) – due to the GitHub migration and the larger part of x86_64 support being done, the perceived development speed in the core community really went up a notch.</p>
<p>As for my own (insignificant, compared to the above) contributions, I did some work on <a href="http://dsource.org/projects/ldc"><span class="caps">LDC</span></a> during the last few days, porting it to <a href="http://llvm.org/"><span class="caps">LLVM</span> 2.9</a> and bringing the front-end in sync with <a href="http://digitalmars.com/d/1.0/changelog.html"><span class="caps">DMD</span> 1.067</a> – you can find the changes in the default branch over at <a href="https://bitbucket.org/lindquist/ldc">Bitbucket</a>. The <span class="caps">DMD</span> updates also contained some changes to the varargs <span class="caps">ABI</span> on x86_64 and other areas of the runtime interface, which I didn’t merge yet, because it would require an update to Tango as well. I am not aware of any regressions so far (see the <a href="/code/ldc/">DStress results</a>), but feel free to ping me in case of any problems.</p>
<p>There were also some updates and bug fixes to D support in <a href="http://swig.org"><span class="caps">SWIG</span></a>, most notably support for the <a href="http://swig.org/Doc2.0/D.html#D_nspace">nspace feature</a>, which allows you to map C++ namespaces to D packages/modules (it doesn’t work for free functions and global variables yet, but this is a general <span class="caps">SWIG</span> restriction that could be easily lifted, just ask me if you need it). There was another <span class="caps">SWIG</span> release in the meantime, version 2.0.3, but it was only a »quick backup« by the maintainer before he merged some intrusive Python changes. I was caught pretty much off-guard by it and had no time for real testing and thus, it contains some bugs (mainly related to nspace support when split-proxy mode is not activated, thanks to Jonathan Pfau for the reports) – please use <span class="caps">SVN</span> trunk instead.</p>
<p>Another little project I recently worked on is <a href="/code/units/">std.units</a>, a units of measurement implementation for D. This topic came up several times on the NG previously, and every time it was suggested to include units support in Phobos, so I have merged the work into my Phobos fork. Please note, however, that this is in no way a formal review request yet. There are still a couple of items left on my to-do list, but before tackling the remaining issues, I’d greatly appreciate some feedback (see the thread on the D newsgroup, <a href="http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=134590" title="Phobos?"><span class="caps">RFC</span>: Units of measurement for D</a>).</p>
<p>Finally, a personal note: Yesterday, I received notice that I was accepted to work on my <a href="/code/gsoc/thrift">Apache Thrift project</a> under the umbrella of Digital Mars as part of the <a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2011">Google Summer of Code 2011</a> – thanks a lot to everybody who supported my proposals for their trust in me! I know that the expectations are high, and will do my very best to live up to them.</p>Quake-style drop-down terminal on OS X2011-02-27T00:00:00+00:00http://klickverbot.at/blog/2011/02/quake-style-terminal-on-osx<p>I’m currently using OS X 10.6 on my MacBook Pro and the combination of a polished UI and the familiar Posix foundation quite appeals to me (I’ll probably do a separate post on my experience with it eventually). Nevertheless, I am using the text console a lot, obviously when doing development work, but also for lots of everyday stuff I can still do faster with it.</p>
<p>Because I’m often using the console side-by-side with <span class="caps">GUI</span> applications, I found it really useful to be able to access a console overlay via a system-wide hotkey, just like in good old Quake (which I never personally played, by the way). This should give you an idea of how it looks:</p>
<figure><img alt="A screenshot of the drop-down console in Quake 4 overlaying the title screen." src="/blog/2011/02/quake-style-terminal-on-osx/quake4-console.png" /><figcaption>The in-game console at the Quake 4 title screen, toggled by the tilde key.</figcaption></figure>
<p>It doesn’t, unfortunately, come as a surprise that the OS X terminal application doesn’t support this out of the box, but fortunately there are several third-party tools for achieving this. First, I tried out <a href="http://visor.binaryage.com/"><em>Visor</em></a>, a <a href="http://www.culater.net/software/SIMBL/SIMBL.php"><em><span class="caps">SIMBL</span></em></a> plugin for Apple’s <em>Terminal</em>, which provides more or less exactly what I was looking for. Unfortunately, it turned out to be not quite as stable as I hoped (random crashes from time to time), and Terminal.app itself quite often has the annoying habit of not reacting to input, especially after killing an interactive console app with <code>Ctrl+C</code>.</p>
<p>But a few days ago, I discovered <a href="http://sites.google.com/site/iterm2home/"><em>iTerm2</em></a>, a replacement for the system terminal application, which also supports a system-wide hotkey to hide/unhide the console window and doesn’t suffer from the annoying lock-up problem of <em>Terminal</em>. It is still in alpha at the moment, but even the nightly build I am currently using (0.20.20110226) has been stable so far.</p>
<p>Just resizing the window to the top third of the screen and using it as an overlay does not quite work when using multiple virtual desktops (»Spaces« in OS X terminology) though: The console window always appears on the same desktop, even if you are working on another one, causing the window manager to switch desktops, which defeats the purpose of using an overlay in the first place. Fortunately, there happens to be an easy solution for this as well: The <a href="http://infinite-labs.net/afloat/"><em>Afloat</em></a> »window manager plugin« enables you to keep a window on all spaces, among a wealth of other power-user friendly features.</p>
<p>There is currently another minor quirk with iTerm2: Even though it defaults to <span class="caps">UTF</span>-8 input, it does not set the <code>LC_CTYPE</code> environment variable accordingly, which caused some problems with Ruby 1.9 applications for me (random encoding-related errors like »invalid byte sequence in US-<span class="caps">ASCII</span> (ArgumentError)«). The simple workaround is to add an <code>export</code> line to your <code>.profile</code>.</p>SWIG 2.0.2 with D support released2011-02-21T00:00:00+00:00http://klickverbot.at/blog/2011/02/swig-2-0-2-with-d-support-released<p>Yesterday, <a href="http://swig.org"><span class="caps">SWIG</span></a> version 2.0.2 was <a href="http://sourceforge.net/news/?group_id=1645&id=297686">officially released</a>. Along with various bug fixes for the other supported languages, this is the first release to support the <a href="http://d-programming-language.org">D programming language</a>. As always, you can get the release from the <a href="http://swig.org/download.html">download area</a>, but here are direct links to the files hosted at SourceForge for your convenience: One for the <a href="http://prdownloads.sourceforge.net/swig/swig-2.0.2.tar.gz">source tarball</a>, and another for <em><a href="http://prdownloads.sourceforge.net/swig/swigwin-2.0.2.zip">swigwin</a></em> which includes a pre-built Win32 executable.</p>
<p>Since my <a href="/blog/2010/11/announcing-d-support-in-swig/">first announcement</a>, there have been a number of changes and improvements. Among them were some critical fixes to the generated code when compiled on Windows, some minor ones regarding name collisions in the D part, and a fix to the »directors« feature, where the wrong C++ method would silently be called under certain circumstances (thanks to Jimmy Cao for reporting). Unfortunately, there were also some <a href="/blog/2010/12/swig-d-breaking-name-changes/">breaking name changes</a>, as previously mentioned on this blog. Furthermore, I added basic support for operator overloading; please refer to the <a href="http://www.swig.org/Doc2.0/D.html#D_operator_overloading">documentation</a> for details.</p>
<p>If you have any questions or need assistance with using <span class="caps">SWIG</span> on a certain library, feel free to <a href="/#contact">contact</a> me directly or to post to the <a href="http://swig.org/mail.html">swig-user</a> mailing list. During the next few days, I will be quite busy and cannot promise to reply quickly, but after that, I will be happy to help. Oh, and it would be great if you could share your personal experiences, common pitfalls and how to overcome them when using <span class="caps">SWIG</span> for the first time, since »Getting Started«-style documentation for people new to <span class="caps">SWIG</span> is a bit scarce at the moment!</p>git reset using Mercurial2011-01-09T00:00:00+00:00http://klickverbot.at/blog/2011/01/git-reset-using-mercurial<p>I am mainly a <a href="http://git-scm.com">Git</a> user, but lately I have been working with <a href="http://mercurial.selenic.com/">Mercurial</a> from time to time. I have mostly been using it for basic committing though, so when performing more advanced operations I still occasionally end up with a commit I did not mean to create. But undoing that can’t be too hard, right?</p>
<p>For example, I recently toyed around with <a href="http://mercurial.selenic.com/wiki/MqExtension">Mercurial Queues</a> to emulate Git’s staging area, one of those features that seem trivial, but which you don’t want to miss once you are used to them. Doing so, while being at, let’s say, revision <code>1000</code>, I accidentally created two changesets, <code>1001</code> and <code>1002</code>. Now, how do I get rid of these while still keeping their contents in the working copy? Using Git, this would just be <code>git reset HEAD~2</code>. Unfortunately, Mercurial seems to make your life somewhat hard in this case. This is what I came up with (please leave me a message in the comments in case I missed an easier way):</p>
<figure class='code'> <div class="highlight"><pre>hg update -r1000<br />
</pre></div></figure>
<p>This sets the working copy to the last »good« revision, only to …</p>
<figure class='code'> <div class="highlight"><pre>hg revert --all -r1002<br />
</pre></div></figure>
<p>… bring the changes from the two commits back to the working copy (but without committing them this time), so we can now …</p>
<figure class='code'> <div class="highlight"><pre>hg strip --force 1001<br />
</pre></div></figure>
<p>… strip the two changesets from the history.</p>
<p>I am perfectly aware of the fact that any other <span class="caps">SCM</span> tool will probably seem clumsy at first to someone used to Git (besides the fact that Git seems to be a natural fit for the way I think about versioning), but I still wonder whether there is a deeper reason for Mercurial not to support this more directly.</p>Breaking name changes in SWIG/D2010-12-01T00:00:00+00:00http://klickverbot.at/blog/2010/12/swig-d-breaking-name-changes<p>Sorry if this notice comes a bit late for some of you, but a few days ago I committed a breaking change to <a href="/blog/2010/11/announcing-d-support-in-swig/">D support</a> in <a href="http://swig.org"><span class="caps">SWIG</span></a> trunk. It was needed to bring the names used in the D module in line with the C# one, the naming scheme of which was intended to be language-independent by the principal maintainer (although it is only used in the C# and D parts right now).</p>
<p>Most of the changes revolve around the term »wrap D module« being replaced with »intermediary D module«, including names derived from it. To adapt your interface files, just perform the following replacements:</p>
<figure class='code'> <div class="highlight"><pre>s/cwtype/ctype/g<br />
s/dwtype/imtype/g<br />
s/dptype/dtype/g<br />
<br />
s/<span class="nv">$wcall</span>/<span class="nv">$imcall</span>/g<br />
s/<span class="nv">$dpcall</span>/<span class="nv">$dcall</span>/g<br />
<br />
s/wrapdmodule/imdmodule/g<br />
</pre></div></figure>Announcing: D support in SWIG2010-11-21T00:00:00+00:00http://klickverbot.at/blog/2010/11/announcing-d-support-in-swig<p>In a nutshell, <a href="http://swig.org"><span class="caps">SWIG</span></a> is a »glue code« generator, allowing you to access C/C++ libraries from various target languages, including C#, Go, Java, Ruby, Python … and, since I merged my work into <span class="caps">SWIG</span> trunk a few days ago, also the <a href="http://digitalmars.com/d/">D programming language</a>, both version 1 and 2.</p>
<p>Why would D support in <span class="caps">SWIG</span> be useful in the first place? After all, D is perfectly able to <a href="http://www.digitalmars.com/d/1.0/interfaceToC.html">interface with C</a> on its own, so why bother using a third-party tool?</p>
<p>Well, it turns out that even for »plain old C«, there are reasons why you’d want to use a bindings generator. Besides the obvious problem that you have to convert the C header files to D modules somehow, there is one major inconvenience with directly using C libraries from D: D code usually is on a higher abstraction level than C, and many of the features that make D interesting are simply not available when dealing with C libraries. For instance, you would have to manually convert strings between pointers to <code>\0</code>-terminated char arrays and D <code>string</code>s, and most interesting algorithms from the D2 standard library are simply unusable with C arrays.</p>
<p>While these issues can be worked around relatively easily by hand-coding a thin wrapper layer around the C library in question, there is another area where writing wrapper code by hand is not feasible: C++ class libraries. D1 does not support interfacing with C++ at all, and even though <code>extern(C++)</code> has been added to D2, the support is quite limited, and a custom wrapper layer is still required in many cases.</p>
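<p>To make the shape of such a wrapper layer concrete, here is a small sketch in plain C++ (all names invented for illustration; this is the general pattern a generator like <span class="caps">SWIG</span> automates, not its actual output): every method becomes a free <code>extern "C"</code> function taking an opaque handle, and only C-compatible types cross the boundary.</p>

```cpp
#include <cassert>

// A hypothetical C++ class we would like to expose to another language.
class Counter {
public:
    explicit Counter(int start) : value_(start) {}
    int next() { return value_++; }  // returns the current value, then increments
private:
    int value_;
};

// The hand-written "flat" C bridge: constructor, destructor and each
// method become free functions operating on an opaque void* handle.
extern "C" {
    void* Counter_new(int start) { return new Counter(start); }
    void Counter_delete(void* self) { delete static_cast<Counter*>(self); }
    int Counter_next(void* self) { return static_cast<Counter*>(self)->next(); }
}
```

<p>On the D side, such functions would then be declared <code>extern(C)</code> and wrapped again in a proxy class to restore the object-oriented interface; doing this by hand for every method of a large class library is exactly the busywork a generator removes.</p>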
<p>Here is, without further ado, a small example of what the D module for <span class="caps">SWIG</span> allows you to do. Consider the following (admittedly not very useful) piece of C++ code:</p>
<figure class='code'> <div class="highlight"><pre><span class="k">typedef</span> <span class="n">std</span><span class="o">::</span><span class="n">pair</span><span class="o"><</span><span class="kt">float</span><span class="p">,</span> <span class="kt">float</span><span class="o">></span> <span class="n">Position</span><span class="p">;</span><br />
<br />
<span class="k">class</span> <span class="nc">Shape</span> <span class="p">{</span><br />
<span class="k">public</span><span class="o">:</span><br />
<span class="n">Shape</span><span class="p">(</span> <span class="n">Position</span> <span class="n">pos</span> <span class="p">)</span> <span class="o">:</span> <span class="n">m_position</span><span class="p">(</span> <span class="n">pos</span> <span class="p">)</span> <span class="p">{}</span><br />
<span class="k">virtual</span> <span class="o">~</span><span class="n">Shape</span><span class="p">()</span> <span class="p">{}</span><br />
<br />
<span class="k">virtual</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">getDescription</span><span class="p">()</span> <span class="k">const</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span><br />
<br />
<span class="n">Position</span> <span class="n">getPosition</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><br />
<span class="k">return</span> <span class="n">m_position</span><span class="p">;</span><br />
<span class="p">}</span><br />
<br />
<span class="k">protected</span><span class="o">:</span><br />
<span class="n">Position</span> <span class="n">m_position</span><span class="p">;</span><br />
<span class="p">};</span><br />
<br />
<span class="k">class</span> <span class="nc">Circle</span> <span class="o">:</span> <span class="k">public</span> <span class="n">Shape</span> <span class="p">{</span><br />
<span class="k">public</span><span class="o">:</span><br />
<span class="n">Circle</span><span class="p">(</span> <span class="n">Position</span> <span class="n">pos</span> <span class="p">)</span> <span class="o">:</span> <span class="n">Shape</span><span class="p">(</span> <span class="n">pos</span> <span class="p">)</span> <span class="p">{}</span><br />
<span class="k">virtual</span> <span class="o">~</span><span class="n">Circle</span><span class="p">()</span> <span class="p">{}</span><br />
<br />
<span class="k">virtual</span> <span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">getDescription</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><br />
<span class="k">return</span> <span class="s">"A perfect circle."</span><span class="p">;</span><br />
<span class="p">}</span><br />
<span class="p">};</span><br />
<br />
<span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">toString</span><span class="p">(</span> <span class="k">const</span> <span class="n">Shape</span><span class="o">&</span> <span class="n">shape</span> <span class="p">)</span> <span class="p">{</span><br />
<span class="n">std</span><span class="o">::</span><span class="n">ostringstream</span> <span class="n">result</span><span class="p">;</span><br />
<br />
<span class="n">Position</span> <span class="n">p</span> <span class="o">=</span> <span class="n">shape</span><span class="p">.</span><span class="n">getPosition</span><span class="p">();</span><br />
<span class="n">result</span> <span class="o"><<</span> <span class="s">"A shape at ("</span> <span class="o"><<</span> <span class="n">p</span><span class="p">.</span><span class="n">first</span> <span class="o"><<</span> <span class="s">", "</span> <span class="o"><<</span> <span class="n">p</span><span class="p">.</span><span class="n">second</span> <span class="o"><<</span> <span class="s">")."</span><span class="p">;</span><br />
<span class="n">result</span> <span class="o"><<</span> <span class="s">" It looks like this: "</span> <span class="o"><<</span> <span class="n">shape</span><span class="p">.</span><span class="n">getDescription</span><span class="p">();</span><br />
<br />
<span class="k">return</span> <span class="n">result</span><span class="p">.</span><span class="n">str</span><span class="p">();</span><br />
<span class="p">}</span><br />
</pre></div></figure>
<p>By using <span class="caps">SWIG</span> to generate the necessary glue code, you can easily make the classes available in D, as demonstrated by the following small program:</p>
<figure class='code'> <div class="highlight"><pre><span class="k">class</span> <span class="n">Square</span> <span class="p">:</span> <span class="n">Shape</span> <span class="p">{</span><br />
<span class="k">this</span><span class="p">(</span> <span class="n">Position</span> <span class="n">pos</span> <span class="p">)</span> <span class="p">{</span><br />
<span class="k">super</span><span class="p">(</span> <span class="n">pos</span> <span class="p">);</span><br />
<span class="p">}</span><br />
<br />
<span class="k">override</span> <span class="nb">string</span> <span class="n">getDescription</span><span class="p">()</span> <span class="k">const</span> <span class="p">{</span><br />
<span class="k">return</span> <span class="s">"Quite square-ish."</span><span class="p">;</span><br />
<span class="p">}</span><br />
<span class="p">}</span><br />
<br />
<span class="kt">void</span> <span class="n">main</span><span class="p">()</span> <span class="p">{</span><br />
<span class="c1">// One of the ugliest bugs currently in D: Type inference does not</span><br />
<span class="c1">// work correctly for arrays of classes with a common supertype.</span><br />
<span class="k">auto</span> <span class="n">shapes</span> <span class="p">=</span> <span class="p">[</span><br />
<span class="k">cast</span><span class="p">(</span> <span class="n">Shape</span> <span class="p">)</span> <span class="k">new</span> <span class="n">Circle</span><span class="p">(</span> <span class="k">new</span> <span class="n">Position</span><span class="p">(</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">3</span> <span class="p">)</span> <span class="p">),</span><br />
<span class="k">new</span> <span class="n">Square</span><span class="p">(</span> <span class="k">new</span> <span class="n">Position</span><span class="p">(</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span> <span class="p">)</span> <span class="p">)</span><br />
<span class="p">];</span><br />
<br />
<span class="k">foreach</span> <span class="p">(</span> <span class="n">shape</span><span class="p">;</span> <span class="n">shapes</span> <span class="p">)</span> <span class="p">{</span><br />
<span class="n">writeln</span><span class="p">(</span> <span class="n">toString</span><span class="p">(</span> <span class="n">shape</span> <span class="p">)</span> <span class="p">);</span><br />
<span class="p">}</span><br />
<span class="p">}</span><br />
</pre></div></figure>
<figure><pre><code>A shape at (1, 3). It looks like this: A perfect circle.
A shape at (2, 1). It looks like this: Quite square-ish.</code></pre></figure>
<p>Note that <code>Shape</code> is extended on the D side just as usual and how the C++ call to <code>getDescription()</code> is transparently routed to <code>Square.getDescription()</code>. This mechanism dubbed <em>cross language polymorphism</em> is enabled by a feature of <span class="caps">SWIG</span> called »directors«, which causes the extra indirection layer needed for this to be emitted. Also note how the strings are seamlessly converted between their C++ and D representation.</p>
<p>So you want to give the D module in <span class="caps">SWIG</span> a whirl? Just head over to the <a href="https://swig.svn.sourceforge.net/svnroot/swig/trunk/"><span class="caps">SWIG</span> <span class="caps">SVN</span></a>, grab the sources from there, and <a href="http://swig.org/svn.html">build it</a>. If you are planning to run the test suite or the included examples, you might want to specify <code>--with-d1-compiler=<…></code> and <code>--with-d2-compiler=<…></code> at the <code>configure</code> command line. In case you want to play around with the small example from above, I also put up a <a href="demo.zip">small archive</a> containing the files (for such a small example, the C++ code could be included directly in the <span class="caps">SWIG</span> interface file via the <code>%inline</code> directive, but that’s how you would probably want to tackle a real library).</p>
<p>What can you expect to work? The test-suite which covers all the basic features of <span class="caps">SWIG</span> should build and run fine, which means that it will probably just work when trying to wrap a library. The source tree also includes a documentation chapter on D (<code>Doc/Manual/D.html</code>) which describes the basic structure and some of the D-specific features. As the D module started out as a fork of the C# one, the documentation on C# could be of considerable use for you as well.</p>
<p>There are still a few areas which need serious work, though. One of them is <em>operator overloading</em>, where both semantics and implementation differ quite a lot between C++ and D. It would probably not be too hard to come up with a solution (maybe using D’s extensive compile-time reflection capabilities to avoid having to add special cases to the <span class="caps">SWIG</span> module), but I would really appreciate some help from someone actually needing it here.</p>
<p>The other big one is <em>multithreading support</em>. Since I personally have not needed to use C++ libraries from D in a threaded setting yet, I have not really thought about the problems arising from multiple threads calling the wrapper code. Especially in combination with the garbage collector, I expect quite a lot of issues to pop up in a serious multithreaded environment. There are a few places which include threading-related code (<code>synchronized</code>, <code>shared</code>, …), but these are mostly remnants from the C# module, which may or may not apply to D – once again, I would be happy if somebody needing this would help me out here.</p>
<p>Speaking of C# remnants: As mentioned above, the D module was forked from the C# module, which in turn started out as a fork from the Java one. Due to this heritage, there are a few places where things could be done much more easily in D. For example, the code for <em>returning C strings to D</em> without memory leaks is unnecessarily complex at the moment. But the same applies here as well – I would be happy to support anyone wanting to clean this up, but the current implementation did its job for me so far.</p>
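<p>For the curious, the underlying problem is easy to show in isolation. The following is only a hand-written sketch of one common approach (the function name is invented; this is not the actual generated code): the C++ string is copied into <code>malloc</code>’d storage before crossing the C boundary, and freeing that copy becomes the caller’s – i.e. the target language’s – responsibility.</p>

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <string>

// Copy a std::string into malloc'd storage so the pointer stays valid
// after the std::string goes out of scope; the caller must free() the
// result, otherwise the wrapper leaks on every call.
extern "C" char* make_greeting_copy(const char* name) {
    std::string s = std::string("Hello, ") + name;
    char* out = static_cast<char*>(std::malloc(s.size() + 1));
    std::memcpy(out, s.c_str(), s.size() + 1);  // include the trailing '\0'
    return out;
}
```

<p>The bookkeeping around who allocates and who frees is precisely the part that the current implementation inherited from C#, and where D’s garbage collector would allow a simpler scheme.</p>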
<p>Anyway, I would be glad if some of you could go ahead and put <span class="caps">SWIG</span> to real-world use, so that any major bug can be fixed before the next <span class="caps">SWIG</span> release (not planned so far). If you stumble upon any issues or if any questions should arise, please feel free to contact me, either via <a href="/about#contact">mail</a>, on <a href="http://www.digitalmars.com/webnews/newsgroups.php?group=digitalmars.D">digitalmars.D</a> or in <a href="irc://irc.freenode.net/D">#D on freenode</a>. Besides that, as always, it would also be nice just to hear about what you are doing with this.</p>
<p class="update">In the meantime, two severe bugs in the code generated for Windows have been fixed; please be sure to use the latest version from <span class="caps">SVN</span>.</p>Oh, how glad I am that ActionScript 2 is dead and buried…2010-06-08T00:00:00+01:00http://klickverbot.at/blog/2010/06/oh-how-glad-i-am-that-actionscript-2-is-dead-and-buried<p>… but unfortunately not in a personal project of mine that I started quite a while ago and which I have resumed work on recently.</p>
<p>Today, I finally managed to fix a bug which had already taken me some hours to track down. Basically, mouse-over and -off events would not work properly on certain <code>MovieClip</code>s. After some digging through my custom code managing these events (which I needed to come up with because there is no way to let hover events bubble up the display hierarchy in ActionScript 2), I found that <code>hitTest()</code> wouldn’t work properly on these clips.</p>
<p>Now the fun part began. I meticulously checked every aspect of the <code>MovieClip</code>s for anything special; I even considered that it might have something to do with the fact that they were positioned right behind some <code>TextField</code>s, which could have triggered some Flash player bugs (given that they are already surrounded by a cloud of weirdness in AS2). Nope, nothing.</p>
<p>It wasn’t until I had already pretty much given up that I noticed that the name of the clips in question contained a period. After I removed it … <code>hitTest()</code> worked fine – thanks a lot for wasting my time, Adobe! Not only should this not happen in the first place, at least not without a runtime warning; the fact that you cannot use periods in <code>MovieClip</code> names is apparently also undocumented.</p>
<p>Oh well…</p>STL Algorithms2010-04-20T00:00:00+01:00http://klickverbot.at/blog/2010/04/stl-algorithms<p>While attending a workshop at <a href="http://linz.linuxwochen.at">Linuxwochen Linz</a> recently, I found myself using <code>std::for_each</code> and other algorithms from the C++ Standard Template Library without even really thinking about it, much to the surprise of the other workshop attendees, who, unlike me, were artists rather than coders. As I was thinking about the way my C++ coding style evolved over the years, I remembered that my use of <code><algorithm></code>s can be traced back to a single article on Dr. Dobb’s: <a href="http://www.drdobbs.com/cpp/184401446"><span class="caps">STL</span> Algorithms vs. Hand-Written Loops</a>.</p>
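<p>The gist of that comparison fits in a few lines. This is only an illustrative sketch (function names invented, and using a C++11 lambda rather than the hand-rolled functors of the article’s era): the same computation written once as a manual loop and once with an <code><algorithm></code> call that states the intent up front.</p>

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hand-rolled loop: the reader has to infer the intent from the body.
std::vector<int> doubled_loop(const std::vector<int>& in) {
    std::vector<int> out;
    for (std::size_t i = 0; i < in.size(); ++i)
        out.push_back(in[i] * 2);
    return out;
}

// <algorithm> version: "transform every element" is stated once, up front.
std::vector<int> doubled_transform(const std::vector<int>& in) {
    std::vector<int> out(in.size());
    std::transform(in.begin(), in.end(), out.begin(),
                   [](int x) { return x * 2; });
    return out;
}
```

<p>Both produce the same result, but the second version names the operation instead of spelling out its mechanics, which is the core of Meyers’ argument for preferring algorithms over raw loops.</p>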
<p>The text was written by Scott Meyers almost ten years ago, but in my eyes, it is still a very good read on the topic. Highly recommended!</p>The Joys of OPTLINK2010-02-13T00:00:00+00:00http://klickverbot.at/blog/2010/02/the-joys-of-optlink<p>As you might know, <span class="caps">DMD</span>/Windows (the reference compiler for the <a href="http://www.digitalmars.com/d/1.0/">D</a> programming language) does not use the standard <span class="caps">COFF</span> format for the object files it generates, but the fairly obscure <span class="caps">OMF</span> instead. This fact alone causes quite a number of annoyances. For example, the format differences make it impossible to link static libraries produced by other compilers into D projects, which is especially annoying since this also applies to <span class="caps">DLL</span> import libraries. You also cannot use any tools which expect object files in <span class="caps">COFF</span> format, and vice versa.</p>
<p>However, all of these issues, as annoying as they may be, do not pose a serious problem, they can all be worked around. But there is another one, and it has seven letters: <span class="caps">OPTLINK</span>. <span class="caps">OPTLINK</span>, courtesy of Digital Mars, is the linker which comes with <span class="caps">DMD</span>. There are quite a number of issues with it:</p>
<p>First, it is proprietary closed-source software. Apart from some people’s idealistic worries, this also poses a serious problem to more pragmatically inclined coders because <em>there are no alternative linkers</em> for <span class="caps">OMF</span>, at least none that are even half-decent. This means that if you stumble upon a bug, you can do nothing more than wait for Walter Bright to fix it.</p>
<p>Second, even if the source code was available, it would probably still be hard to fix bugs, since, according to Walter himself, large parts are written in assembler – a <em>maintainer’s nightmare</em>. This might also explain why it took him so long to fix some serious bugs in the past…</p>
<p>Third, there are bugs. <em>Lots of bugs</em>, compared to other linkers and with the pretty high version number (8.00.2) in mind. If you want to know what I am talking about, just search the D newsgroups; projects which make extensive use of templates seem to be affected more often than others. Until yesterday, I personally had been spared from this kind of issues, but the <span class="caps">OPTLINK</span> bug I encountered yesterday almost drove me crazy, because one wouldn’t expect this at all: <!--more--></p>
<p>After I had worked quite some time on Linux exclusively, I needed to compile a Windows version of a project of mine. So I went ahead and rebooted, updated <span class="caps">DMD</span>, Tango and a few other tools. Everything worked fine, the project even built fine, until I needed to build debug symbols into the binary. Every time I just added the <code>-g</code> flag to the compiler invocation, <span class="caps">OPTLINK</span> would abort with »Error 118: Filename Expected«. Because I had also upgraded my build tool, my first thought was that the linker commands could really be broken, but on closer inspection, it turned out that the invocation was generated perfectly fine. So I went on and downgraded all of the tools again, but to no avail – again the same error, although debug builds had worked flawlessly in the past.</p>
<p>After having searched for about an hour, I finally found the cause, and I could not really believe it at first: Compared to my previous D/Windows setup, I had added the Notepad++ installation directory to my <code>PATH</code>. You might ask yourself now, »Um, what? How should that break the linker?« Well, it turned out that <span class="caps">OPTLINK</span> apparently has problems with handling plus signs in all the lookup paths it uses, including not only the ones passed at the command line, but also those from the environment variables.</p>
<p>For a second I was really tempted to just drop <span class="caps">DMD</span> altogether, but unfortunately, there currently is no other D compiler of comparable quality for Windows. In my eyes, it would really help if <span class="caps">DMD</span> used <span class="caps">COFF</span> for its object files, making it possible to easily switch out <span class="caps">OPTLINK</span>, since the maturity of the tool-chain is currently the number one problem of D.</p>Setting up GDC and Tango on Linux x862009-10-26T00:00:00+00:00http://klickverbot.at/blog/2009/10/setting-up-gdc-and-tango-on-linux-x86<p>Currently, there are three more-or-less working compilers for the <a href="http://digitalmars.com/d/">D programming language</a> (version 1): The oldest and most mature one is <span class="caps">DMD</span>, short for Digital Mars D Compiler, the official reference implementation by Walter Bright, the creator of D. It has grown reasonably stable, but has certain limitations, most of them resulting from using a proprietary back-end. Additionally, not all parts of it are Open Source (starting with a capital letter). The second one is <a href="http://dsource.org/projects/ldc"><span class="caps">LDC</span></a>, a rather young, but quick-moving project which aims to port the front-end of <span class="caps">DMD</span> to the also fairly recent <a href="http://llvm.org"><span class="caps">LLVM</span></a> compiler framework in order to leverage its advanced code generation and optimization infrastructure. While it still has some bugs to iron out (most notably missing exception support on Windows), it works reasonably well on Linux x86 (32 and 64 bit). The third compiler, and subject of interest for this post, is <span class="caps">GDC</span>. 
Like the other two compilers, it uses the Digital Mars D front-end, but coupled to the very mature <span class="caps">GNU</span> Compiler Collection (<span class="caps">GCC</span>) back-end, whose C/C++ compiler is widely used on Unix-like systems like Linux, Mac OS X, various flavors of <span class="caps">BSD</span> and also Windows through <a href="http://mingw.org">MinGW</a>. Unfortunately, development on it has stalled, making it pretty much unusable due to the many bugs in the old <span class="caps">DMD</span> front-end it uses.</p>
<p>However, an effort to resurrect <span class="caps">GDC</span> has recently been started. Development takes place over at <a href="http://bitbucket.org/goshawk/gdc">bitbucket</a> (you can also find building instructions for <span class="caps">GDC</span> there) and the project has already been able to celebrate a first success: The reasonably recent front-end versions 1.040 and 2.015 (for D2) are working with <span class="caps">GCC</span> 4.3. This seemed enough of a sign of life for me to decide to give <span class="caps">GDC</span> another try. After some initial problems (some of which resulted from bugs which have already been fixed in the official Mercurial repository) I managed to compile a <span class="caps">GDC</span> binary (front-end version 1.040 against <span class="caps">GCC</span> 4.3.1) which happily compiles the <a href="http://dsource.org/projects/tango">Tango</a> standard library and a personal project of mine. This is what I did (silently omitting quite a few hours of searching and fixing bugs): <!--more--></p>
<p>First, go to some temporary directory and checkout the <span class="caps">GDC</span> sources from the Mercurial repository (at the time of writing, revision 53 was current):</p>
<figure class='code'> <div class="highlight"><pre><span class="nb">cd</span> ~/tmp<br />
hg clone http://bitbucket.org/goshawk/gdc<br />
</pre></div></figure>
<p>Then, download the core of <span class="caps">GCC</span> 4.3.1 from a <a href="http://gcc.gnu.org/mirrors.html">mirror near you</a> (version 4.3.2 should also work, but builds against 4.3.4 are currently known to be broken) and extract it inside the <span class="caps">GDC</span> sources:</p>
<figure class='code'> <div class="highlight"><pre>wget ftp://gd.tuwien.ac.at/gnu/gcc/releases/gcc-4.3.1/gcc-core-4.3.1.tar.bz2<br />
mkdir gdc/dev<br />
<span class="nb">cd </span>gdc/dev<br />
tar xjvf ../../gcc-core-4.3.1.tar.bz2<br />
</pre></div></figure>
<p>Now, link the <span class="caps">GDC</span> sources into the extracted directory and use the provided <code>setup-gcc.sh</code> script to patch <span class="caps">GCC</span> to enable D version 1:</p>
<figure class='code'> <div class="highlight"><pre><span class="nb">cd </span>gcc-4.3.1<br />
ln -s ../../../d gcc/d<br />
gcc/d/setup-gcc.sh --d-language-version<span class="o">=</span>1<br />
</pre></div></figure>
<p>After that, you are ready to build and install <span class="caps">GCC</span> with D support. For this, go to some build directory and run <code>configure</code> and <code>make</code>. You can, of course, choose an arbitrary directory for the build files (for instance, I personally prefer having them completely outside the source directory):</p>
<figure class='code'> <div class="highlight"><pre>mkdir build<br />
<span class="nb">cd </span>build<br />
../configure --enable-languages<span class="o">=</span>d --disable-multilib --disable-shared --prefix<span class="o">=</span>/opt/gdc<br />
make<br />
sudo make install<br />
</pre></div></figure>
<p>Note that I configured <span class="caps">GCC</span>/<span class="caps">GDC</span> to be installed in <code>/opt/gdc</code>. As the build also includes the C compiler, this avoids any interference with the »normal« <span class="caps">GCC</span> probably installed in <code>/usr</code>. After the build has finished – this takes quite a while, since <span class="caps">GCC</span> is built three times to bootstrap itself – you should have a working <span class="caps">GDC</span> executable in <code>/opt/gdc/bin</code>. Now for the second part, Tango:</p>
<p>Start off by fetching the Tango sources from the <span class="caps">SVN</span> to a temporary working directory (I worked with revision 5023):</p>
<figure class='code'> <div class="highlight"><pre><span class="nb">cd</span> ~/tmp<br />
svn co http://svn.dsource.org/projects/tango/trunk tango<br />
</pre></div></figure>
<p>Unfortunately, Tango currently does not compile with <span class="caps">GDC</span> out of the box, you have to apply a couple of minor changes: The first change adds build/arch files for <span class="caps">GDC</span>/Linux:</p>
<figure class='code'> <div class="highlight"><pre><span class="gh">diff --git a/build/arch/linux-i686-gdc-dbg.mak b/build/arch/linux-i686-gdc-dbg.mak</span><br />
<span class="gd">--- /dev/null</span><br />
<span class="gi">+++ b/build/arch/linux-i686-gdc-dbg.mak</span><br />
<span class="gu">@@ -0,0 +1,6 @@</span><br />
<span class="gi">+include $(ARCHDIR)/gdc.rules</span><br />
<span class="gi">+include $(ARCHDIR)/linux.inc</span><br />
<span class="gi">+</span><br />
<span class="gi">+# -Wall breaks the compilation with wrong errors</span><br />
<span class="gi">+DFLAGS_COMP=-g</span><br />
<span class="gi">+CFLAGS_COMP=-g</span><br />
<br />
<span class="gh">diff --git a/build/arch/linux-i686-gdc-opt.mak b/build/arch/linux-i686-gdc-opt.mak</span><br />
<span class="gd">--- /dev/null</span><br />
<span class="gi">+++ b/build/arch/linux-i686-gdc-opt.mak</span><br />
<span class="gu">@@ -0,0 +1,5 @@</span><br />
<span class="gi">+include $(ARCHDIR)/gdc.rules</span><br />
<span class="gi">+include $(ARCHDIR)/linux.inc</span><br />
<span class="gi">+</span><br />
<span class="gi">+DFLAGS_COMP=-O2</span><br />
<span class="gi">+CFLAGS_COMP=-O2</span><br />
<br />
<span class="gh">diff --git a/build/arch/linux-i686-gdc-tst.mak b/build/arch/linux-i686-gdc-tst.mak</span><br />
<span class="gd">--- /dev/null</span><br />
<span class="gi">+++ b/build/arch/linux-i686-gdc-tst.mak</span><br />
<span class="gu">@@ -0,0 +1,5 @@</span><br />
<span class="gi">+include $(ARCHDIR)/gdc.rules</span><br />
<span class="gi">+include $(ARCHDIR)/linux.inc</span><br />
<span class="gi">+</span><br />
<span class="gi">+DFLAGS_COMP=-g -fdeprecated -fdebug=UnitTest -funittest</span><br />
<span class="gi">+CFLAGS_COMP=-g</span><br />
</pre></div></figure>
<p>The second change removes the <code>-fversion=Posix</code> flag from the Makefile of the runtime because the <span class="caps">DMD</span> frontend <span class="caps">GDC</span> currently uses (1.040) does not allow it to be specified as it is set automatically (this restriction has been lifted in later versions):</p>
<figure class='code'> <div class="highlight"><pre><span class="gh">diff --git a/runtime/compiler/gdc/Makefile.am b/runtime/compiler/gdc/Makefile.am</span><br />
<span class="gd">--- a/runtime/compiler/gdc/Makefile.am</span><br />
<span class="gi">+++ b/runtime/compiler/gdc/Makefile.am</span><br />
<span class="gu">@@ -18,7 +18,7 @@</span><br />
 # AUTOMAKE_OPTIONS = 1.9.6 foreign no-dependencies<br />
<br />
 OUR_CFLAGS=@DEFS@ -I.<br />
<span class="gd">-D_EXTRA_DFLAGS=-nostdinc -pipe -I../../.. -I../shared -fversion=Posix</span><br />
<span class="gi">+D_EXTRA_DFLAGS=-nostdinc -pipe -I../../.. -I../shared</span><br />
 ALL_DFLAGS = $(DFLAGS) $(D_MEM_FLAGS) $(D_EXTRA_DFLAGS) $(MULTIFLAGS)<br />
<br />
 host_alias=.<br />
<span class="gh">diff --git a/runtime/compiler/gdc/Makefile.in b/runtime/compiler/gdc/Makefile.in</span><br />
<span class="gd">--- a/runtime/compiler/gdc/Makefile.in</span><br />
<span class="gi">+++ b/runtime/compiler/gdc/Makefile.in</span><br />
<span class="gu">@@ -228,7 +228,7 @@ target_vendor = @target_vendor@</span><br />
 top_builddir = @top_builddir@<br />
 top_srcdir = @top_srcdir@<br />
 OUR_CFLAGS = @DEFS@ -I.<br />
<span class="gd">-D_EXTRA_DFLAGS = -nostdinc -pipe -I../../.. -I../shared -fversion=Posix</span><br />
<span class="gi">+D_EXTRA_DFLAGS = -nostdinc -pipe -I../../.. -I../shared</span><br />
 ALL_DFLAGS = $(DFLAGS) $(D_MEM_FLAGS) $(D_EXTRA_DFLAGS) $(MULTIFLAGS)<br />
 toolexecdir = $(phobos_toolexecdir)<br />
 toolexeclibdir = $(phobos_toolexeclibdir)<br />
</pre></div></figure>
<p>The third and last change adds a workaround to Tango’s user library for a bug in the <span class="caps">DMD</span> front-end which has been fixed by now (the compiler fails to resolve the type of the template parameter in the templated <code>intpow</code> function):</p>
<figure class='code'> <div class="highlight"><pre><span class="gh">diff --git a/user/tango/math/internal/BiguintCore.d b/user/tango/math/internal/BiguintCore.d</span><br />
<span class="gd">--- a/user/tango/math/internal/BiguintCore.d</span><br />
<span class="gi">+++ b/user/tango/math/internal/BiguintCore.d</span><br />
<span class="gu">@@ -516,7 +516,7 @@ static BigUint pow(BigUint x, ulong y)</span><br />
}<br />
y0 = y/p;<br />
finalMultiplier = intpow(x0, y - y0*p);<br />
<span class="gd">- x0 = intpow(x0, p);</span><br />
<span class="gi">+ x0 = intpow!(BigDigit)(x0, p);</span><br />
}<br />
xlength = 1;<br />
}<br />
</pre></div></figure>
<p>After you have applied these patches, you should be ready to build Tango (make sure that you have a <code>cc</code> somewhere in your <code>PATH</code>, if not, create a link to your system’s <code>gcc</code>):</p>
<figure class='code'> <div class="highlight"><pre>sudo <span class="nv"><span class="caps">PATH</span></span><span class="o">=</span><span class="nv">$<span class="caps">PATH</span></span>:/opt/gdc/bin <span class="nv">DC</span><span class="o">=</span>gdc build/build.sh —lib-install-dir /opt/gdc/lib<br />
</pre></div></figure>
<p>However, I had to remove Phobos’ <code>object.d</code> from <code>/opt/gdc/include/d/4.3.1</code> first. <code>build/build.sh</code> should finish with a note reminding you that the user libraries still have to be installed. To do this, simply copy the contents of the <code>user</code> directory to <code>/opt/gdc/include/d/4.3.1</code> after removing the old include files which are part of Phobos (you have to keep the <code>gcc</code> and <code>i686-pc-linux-gnu</code> directories though). Congratulations, now you should be able to build your Tango projects with <span class="caps">GDC</span>!</p>
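<p>Spelled out as shell, the include shuffle above looks roughly like the following. This is a hedged sketch only: it rehearses the steps in a scratch directory instead of the real <code>/opt/gdc/include/d/4.3.1</code> (which you would of course manipulate with <code>sudo</code>), and the module file names are invented stand-ins:</p>

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"

# Stand-ins for the real locations: "inc" mimics /opt/gdc/include/d/4.3.1,
# and "user" mimics the user directory of the Tango source tree.
INC=inc
mkdir -p "$INC/gcc" "$INC/i686-pc-linux-gnu" "$INC/std"
touch "$INC/object.d"                      # stale Phobos include
mkdir -p user/tango
touch user/object.di user/tango/text.d     # hypothetical Tango modules

# Remove the old Phobos includes, keeping gcc/ and i686-pc-linux-gnu/:
find "$INC" -mindepth 1 -maxdepth 1 \
    ! -name gcc ! -name i686-pc-linux-gnu -exec rm -rf {} +

# Copy the Tango user library into place:
cp -r user/. "$INC"/
```

On the real system, the same <code>find</code> and <code>cp</code> run against <code>/opt/gdc/include/d/4.3.1</code> under <code>sudo</code>.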
<p>A quick tip for <a href="http://www.dsource.org/projects/dsss/"><span class="caps">DSSS</span></a> users: You probably have to modify your <code>gdc-posix-tango</code> profile to omit the <code>-version=Posix</code> switch (see above) on <code>gdmd</code> calls and add <code>-L-ltango-base-gdc</code> to the linker flags since Tango was not installed via <span class="caps">DSSS</span> in the above instructions.</p>
<p class="update">Since I originally wrote this post, Tango’s build system was modified yet another time (at least, things are much simpler now). Instead of fiddling around with the makefiles, just use the <code>bob</code> tool from the <code>build</code> directory, which <em>should</em> work with <span class="caps">GDC</span> out of the box.</p>The Power of Git2009-07-28T00:00:00+01:00http://klickverbot.at/blog/2009/07/the-power-of-git<p>As you might already know if you read my <a href="/blog/2008/08/getting-started-with-git/">blog post</a> about it, I have been using Git for quite a while now. However, I am still regularly amazed by the fact that Git <em>simply works</em>, in the sense that it really does the things you tell it to do.</p>
<p>Recently, for instance, I wanted to merge an extension to the great <a href="http://assimp.sf.net">Open Asset Import Library</a> (bindings for the <a href="http://www.digitalmars.com/d/">D programming language</a>, in fact) which I developed locally in Git to the upstream repository in a way that the commit history was kept locally. However, <span class="caps">SVN</span> is used as <span class="caps">SCM</span> system for upstream development. So I started out by importing the upstream <span class="caps">SVN</span> repository via <code>git-svn</code>:</p>
<figure class='code'> <div class="highlight"><pre><span class="nv">$ </span>mkdir assimp<span class="p">;</span> <span class="nb">cd </span>assimp<br />
<span class="nv">$ </span>git svn init https://assimp.svn.sourceforge.net/svnroot/assimp/trunk<br />
<span class="nv">$ </span>git svn fetch<br />
</pre></div></figure>
<p>Nothing too exciting here. So far, I only created a local Git clone of the <span class="caps">SVN</span> repository which I probably will use for contributing to upstream development in the future. But how to transfer the bindings from the Git repository to this one <em>including</em> their (strictly linear, i.e. master-only) commit history? Because Git does not try to be smarter than its users, the first solution I came up with worked flawlessly. Here is what I did:</p>
<figure class='code'> <div class="highlight"><pre><span class="nv">$ </span>git checkout -b d-bindings<br />
<span class="nv">$ </span>git fetch ../dAssimp<br />
<span class="nv">$ </span>git <span class="nb">read</span>-tree --prefix<span class="o">=</span>port/dAssimp FETCH_HEAD<br />
<span class="nv">$ </span>git rev-parse FETCH_HEAD > .git/MERGE_HEAD<br />
<span class="nv">$ </span>git commit<br />
</pre></div></figure>
<p>After switching to a new branch in which the history should be stored, I told Git to fetch the contents of the local <code>dAssimp</code> repository (the D bindings I developed). Because I had not made any merges, Git simply stored the <code>HEAD</code> of the other repository in <code>FETCH_HEAD</code>. The <code>read-tree</code> command reads, as the name suggests, arbitrary tree information into the index. The <code>--prefix</code> switch allows you to keep the current index and read the tree into an (empty) subdirectory instead – perfect for what I intended to do. Storing the <code>FETCH_HEAD</code>‘s object name into <code>.git/MERGE_HEAD</code> tells Git to generate a merge commit the next time <code>git commit</code> is called. There was just one last thing left to do: as <code>git read-tree</code>, according to the manpage, »does not actually update any of the files it ›caches‹«, a <code>git reset --hard</code> is needed to actually create the new files in the working copy. That’s it.</p>
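<p>For reference, the whole dance can be replayed end to end with two small throwaway repositories. Everything below (repository names, file names, commit messages) is invented for the demonstration; note the trailing slash on <code>--prefix</code>, which some Git versions insist on:</p>

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Helper so the demo works without a global Git identity configured:
ci() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

git init -q assimp     # stand-in for the git-svn clone of upstream
(cd assimp && echo code > code.txt && git add . && ci "upstream history")

git init -q dAssimp    # stand-in for the local bindings repository
(cd dAssimp && echo bindings > assimp.d && git add . && ci "D bindings")

cd assimp
git checkout -q -b d-bindings
git fetch -q ../dAssimp                          # HEAD of ../dAssimp -> FETCH_HEAD
git read-tree --prefix=port/dAssimp/ FETCH_HEAD  # graft the tree into the index
git rev-parse FETCH_HEAD > .git/MERGE_HEAD       # make the next commit a merge
ci "Merge the D bindings, history and all"
git reset -q --hard                              # materialise the files on disk
```

Afterwards the bindings live under <code>port/dAssimp</code>, and <code>git log</code> shows both lines of history joined by the merge commit.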
<p>As I found out later, I could probably have done this more easily using the <a href="http://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html">subtree merge strategy</a>, but I still like, as mentioned above, Git’s feature of »just doing what you tell it to do«…</p>Installing DMD, LDC, Tango and DSSS on (K)Ubuntu Jaunty2009-07-28T00:00:00+01:00http://klickverbot.at/blog/2009/07/installing-dmd-ldc-tango-and-dsss-on-kubuntu-jaunty<p>For quite a while now, I have been using the <a href="http://en.wikipedia.org/wiki/D_(programming_language)">D programming language</a>, version 1 (I have not looked at D2 yet, as it is said to be still rather unstable). Even though I like it very much for its syntactical quality and the language itself is reasonably mature, I must admit that setting up the toolchain correctly can still be a very cumbersome task, especially when you are new to D.</p>
<p>This post describes an installation routine that should provide you with a working D development environment containing <span class="caps">DMD</span>, <span class="caps">LDC</span>, Tango and <span class="caps">DSSS</span> on (K)Ubuntu Jaunty. Please note that it assumes your system to be »clean« – if you have already installed any D-related software, it is probably advisable to remove it completely to prevent any problems with, for instance, stale files. <!--more--></p>
<figure class='code'> <div class="highlight"><pre>mkdir -p ~/tmp<br />
<span class="nb">cd</span> ~/tmp<br />
<br />
wget http://ftp.digitalmars.com/dmd.1.050.zip<br />
unzip dmd.1.050.zip<br />
<span class="nb">cd </span>dmd/linux/bin/<br />
chmod +x dmd dumpobj obj2asm rdmd<br />
sudo cp dmd dmd.conf dumpobj obj2asm rdmd /usr/local/bin/<br />
<br />
<span class="nb">cd</span> ~/tmp<br />
svn co http://svn.dsource.org/projects/tango/trunk tango<br />
<br />
<span class="nb">cd </span>tango/<br />
sudo <span class="nv">DC</span><span class="o">=</span>dmd build/build.sh --lib-install-dir /usr/local/lib<br />
sudo cp -rf user/object.di user/rt user/std user/tango /usr/local/include/d/<br />
<br />
sudo su -c <span class="s1">'echo -e "[Environment]\nDFLAGS=-I/usr/local/include/d -defaultlib=tango-base-dmd -debuglib=tango-base-dmd -L-ltango-user-dmd -version=Tango -version=Posix" > /usr/local/bin/dmd.conf'</span><br />
<br />
sudo su -c <span class="s1">'echo -e "# dsss\ndeb http://ppa.launchpad.net/d-language-packagers/ppa/ubuntu jaunty main" >> /etc/apt/sources.list'</span><br />
sudo apt-get update<br />
sudo apt-get install dsss<br />
sudo su -c <span class="s1">'echo "profile=dmd-posix-tango" > /etc/drebuild/default'</span><br />
</pre></div></figure>
<p>You should now be able to build your D/Tango programs with <span class="caps">DMD</span> and <span class="caps">DSSS</span>.</p>
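<p>To give the fresh toolchain a quick smoke test, you can try a minimal D1/Tango program along these lines. The file name is arbitrary, and the build invocations are shown as comments since they need the compilers installed above:</p>

```shell
# A minimal D1/Tango program to smoke-test the installation:
cat > hello.d <<'EOF'
import tango.io.Stdout;

void main()
{
    Stdout("Hello, Tango!").newline;
}
EOF

# Build it with DMD (dmd.conf pulls in the Tango libraries)...
#   dmd hello.d && ./hello
# ...or let DSSS drive the build with the profile from /etc/drebuild/default:
#   dsss build hello.d
```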
<p>I would suggest giving <a href="http://www.dsource.org/projects/ldc"><span class="caps">LDC</span></a>, a fairly young compiler project which leverages <a href="http://llvm.org/"><span class="caps">LLVM</span></a> as its code-generating backend, at least a short glance. It is maturing very quickly and allows you to make use of the various features the <span class="caps">LLVM</span> compiler infrastructure provides, the most noticeable probably being its excellent optimization routines. Fortunately, there are current binary packages available at Launchpad, so all that is needed to install <span class="caps">LDC</span> is:</p>
<figure class='code'> <div class="highlight"><pre>sudo su -c <span class="s1">'echo -e "# ldc-daily\ndeb http://ppa.launchpad.net/d-language-packagers/ppa/ubuntu karmic main\ndeb http://archive.ubuntu.com/ubuntu karmic main universe" >> /etc/apt/sources.list'</span><br />
sudo apt-get update<br />
<br />
sudo apt-get install ldc-daily libtango-ldc-daily-dev<br />
<br />
<span class="nb">cd</span> ~/tmp<br />
wget -O ldc-posix-tango http://www.dsource.org/projects/ldc/browser/ldc-posix-tango?format<span class="o">=</span>raw<br />
sudo su -c <span class="s1">'sed "s:ldc.rebuild.conf:/etc/ldc/ldc.rebuild.conf:" ldc-posix-tango > /etc/drebuild/ldc-posix-tango'</span><br />
<br />
sudo su -c <span class="s1">'echo "profile=ldc-posix-tango" > /etc/drebuild/default'</span><br />
</pre></div></figure>
<p>Note that the above commands install a daily snapshot of <span class="caps">LDC</span>, which I would recommend using due to the currently fast pace of <span class="caps">LDC</span> development. In order not to break your Jaunty installation, please <em><strong>do not forget</strong> to comment out the official »karmic« repositories</em> (which contain some dependencies for <code>ldc-daily</code>) in your <code>/etc/apt/sources.list</code> and run <code>apt-get update</code> after the installation is completed.</p>
<p>Both compilers are set up to use Tango; do <em>not</em> install Tango via <span class="caps">DSSS</span>! If you want to switch compilers, just activate the corresponding profile in <code>/etc/drebuild/default</code>, and do not forget to rebuild any D libraries you might have compiled and installed with the old compiler (just run <code>dsss net install …</code> for the ones you installed using <span class="caps">DSSS</span>).</p>
<p class="update">Since I wrote this post, Tango received yet another big structural change to its codebase (amongst other changes, the core and user libraries have been merged). You should now use the supplied <em>»bob«</em> tool to build Tango. Additionally, Karmic is now stable, so you might have to adapt the <span class="caps">APT</span> repository-related instructions.</p>Getting KDE's clippboard to work with Eclipse2009-05-28T00:00:00+01:00http://klickverbot.at/blog/2009/05/getting-kdes-clippboard-to-work-with-eclipse<p>For whatever reason, Eclipse does not work well with Klipper, the <span class="caps">KDE</span> clipboard manager, in its default settings. The symptoms: Quite often, you copy a piece of text to the clipboard. When you try to paste it, it miraculously disappears and some old piece of clipboard content is pasted instead.</p>
<p>After simply trying to ignore the problem for some time, I searched and found a solution today: You have to disable the <em>»Prevent empty clipboard«</em> setting in Klipper’s configuration menu (which is accessible by right-clicking on the systray icon).</p>
<p>Intuitive? Not to me…</p>
<p class="update">Disabling the mentioned option might introduce some minor glitches to general clipboard usage (sometimes, the clipboard seems to empty itself). As those occur rather infrequently, I have not been able to find out why this happens, or even if this is connected to the configuration changes described in this post.</p>Duplicate »Translucency« KWin effect2009-04-28T00:00:00+01:00http://klickverbot.at/blog/2009/04/duplicate-translucency-kwin-effect<p>For some time, the effect »Translucency« was listed twice in the KWin <span class="caps">KCM</span> plugin list of my local <span class="caps">KDE</span> setup (<span class="caps">SVN</span> trunk). One copy was actually working, the other was just producing error messages.</p>
<p>Today, I finally had time to investigate the issue: The problem was caused by a stale <code>.desktop</code> file in <code>share/kde4/services/kwin</code> with the old name of the plugin (it was renamed from <em>maketransparent</em> to <em>translucency</em>).</p>
<p>I have no idea how this could happen, because I usually purge the whole <code>/opt/kde</code> folder every time I <code>svn up</code> my <code>qt-copy</code>, which I happen to do quite often…</p>Strange segfaults when compiling with GDC2009-02-04T00:00:00+00:00http://klickverbot.at/blog/2009/02/strange-segfaults-when-compiling-with-gdc<p>Use <code>-no-export-dynamic</code> to prevent segfaults in external libraries when linking with <span class="caps">GDC</span>.</p>
<p>More to come soon…</p>ZoneAlarm Firewall + Windows Vista = No Good2009-02-01T00:00:00+00:00http://klickverbot.at/blog/2009/02/zonealarm-firewall-windows-vista-no-good<p>I am terribly short on time at the moment, but I just have to record this:<br />
<em>Don’t use the Zone Labs ZoneAlarm Firewall on Microsoft Windows Vista. Just don’t do it!</em></p>
<p>For me, the firewall caused numerous seemingly unrelated problems: the Windows Update panel froze when started (had to kill explorer.exe), the Windows Defender updates were not working anymore, I could not install Visual Studio (Visual C++ 2008 Express to be precise), etc.</p>
<p>I will certainly write more about this sometime in the future…</p>
<p class="update">Apparently, Microsoft have already covered this issue in a <a href="http://support.microsoft.com/kb/321434">knowledge base entry</a> almost two years ago. D’oh! I will probably do some testing on this during the next days.</p>Debugging KDE applications with gdb2009-01-24T00:00:00+00:00http://klickverbot.at/blog/2009/01/debugging-kde-applications-with-gdb<p>A pretty long time has passed since my last post here, and in the meantime I have jumped right into <span class="caps">KDE</span> development myself. I have stumbled upon quite a few tricks and pitfalls that I will undoubtedly write about here some day.</p>
<p>For now, I just want to share a little gem which I have discovered a few minutes ago: In <code>kdesdk/scripts</code>, David Faure published a little script called <code>kde-devel-gdb</code> (<a href="http://websvn.kde.org/trunk/KDE/kdesdk/scripts/kde-devel-gdb?view=markup">view with WebSVN</a>), which extends gdb with the ability to print the contents of several Qt containers, including the widely used <code>QString</code>. Highly recommended!</p>KHotkeys in KDE 4.12008-11-02T00:00:00+00:00http://klickverbot.at/blog/2008/11/khotkeys-in-kde-4-1<p>I just upgraded to Kubuntu 8.10 in order to have a look at shiny new <span class="caps">KDE</span> 4. But amongst other minor annoyances, I had real trouble getting hotkeys to work. There are configuration options in the new System Settings panel (which is a huge regression compared to KControl by the way), but they seemed to have no effect.</p>
<p>After various attempts of fixing this problem myself, I finally found a (slightly hackish) solution: <a href="http://ubuntuforums.org/showpost.php?p=5541572&postcount=9">Re: Does khotkeys work on your <span class="caps">KDE</span> 4.1?</a>. Seems like <span class="caps">KDE</span> 4.1 is still <em>very</em> beta…</p>Git on Windows2008-10-12T00:00:00+01:00http://klickverbot.at/blog/2008/10/git-on-windows<p>Just found <a href="http://kylecordes.com/2008/04/30/git-windows-go/">an excellent writeup</a> by Kyle Cordes about using Git on Windows.</p>KDE Konsole corruption when Compiz is active2008-09-03T00:00:00+01:00http://klickverbot.at/blog/2008/09/kde-konsole-corruption-when-compiz-is-active<p>On my laptop I’m currently running Kubuntu 8.04 (Hardy Heron). For additional eye candy goodness I am using the <code>compiz-fusion</code> package from the Kubuntu repositories. Surprisingly, even on the laptop hardware (Asus V1S) everything went smooth out of the box – I could even manage to find some drivers for the webcam and for the finger print scanner.</p>
<p>Well, everything worked fine except for one little detail: When I had an active Konsole session on one cube face, for example a log file, and continued to work on another side of the cube, the Konsole output would often be broken when I switched back to it. It would look as if the output had been scrolled, but the non-scrolled output hadn’t been cleared from the window. This problem could be fixed by forcing the window to refresh, e.g. by switching to another (Konsole) tab.</p>
<p>In their 169.XX driver series, nVidia added a config option called <code>UseCompositeWrapper</code>, which can help to sort out this kind of redraw problems. Fortunately, enabling this via adding the following line to the <code>Device</code> section of my <code>xorg.conf</code> was enough to solve the problem:</p>
<pre><code>Option "UseCompositeWrapper" "true"
</code></pre>The Tangled Working Copy Problem2008-08-25T00:00:00+01:00http://klickverbot.at/blog/2008/08/the-tangled-working-copy-problem<p>There is a problem which probably every developer who uses revision control for their hobby projects has already experienced: the annoying situation of having two or more completely unrelated changes in your working tree. With its “index” feature, Git provides an excellent, if not quite intuitive, facility for solving the problem.</p>
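<p>When the unrelated changes at least live in separate files, the index lets you pick them apart with nothing more than selective <code>git add</code> calls; for changes tangled within a single file, <code>git add -p</code> does the same at the hunk level. A small demonstration with invented file names and messages:</p>

```shell
#!/bin/sh
set -e
cd "$(mktemp -d)"
# Helper so the demo works without a global Git identity configured:
ci() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

git init -q tangled && cd tangled
echo feature > feature.txt
echo fix     > bugfix.txt
git add . && ci "initial state"

# Two unrelated edits pile up in the working copy...
echo "more feature work" >> feature.txt
echo "the actual fix"    >> bugfix.txt

# ...but only what is staged in the index gets committed:
git add bugfix.txt
ci "Fix the bug"
git add feature.txt
ci "Continue feature work"
```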
<p><a href="http://tomayko.com/writings/the-thing-about-git">This great article</a> by Ryan Tomayko is quite a comprehensive description of the problem itself (he calls it “The Tangled Working Copy Problem”) as well as a detailed guide on how to solve it using Git.</p>Getting started with git2008-08-02T00:00:00+01:00http://klickverbot.at/blog/2008/08/getting-started-with-git<p>Recently, I decided to have a look at the revision control system <a href="http://git-scm.com">Git</a> and the »social code hosting service« <a href="http://github.com">GitHub</a>, which is currently hyped in large parts of the open source community (Git is used for the Linux kernel, Ruby on Rails, …). At first, I felt pretty overwhelmed, because the differences to other SCMs like Subversion were bigger than I had expected. But now, having read the excellent blog post <a href="http://b.lesseverything.com/2008/3/25/got-git-howto-git-and-github">Got git?</a> by Steven Bristol, I am starting to understand the concepts and the motivation behind it.</p>
<p>Frankly, I am still not quite sure if the “doing it distributed” paradigm will really establish itself in everyday coding work, but for open source projects git looks very promising at least.</p>NoInternetOpenWith2008-06-26T00:00:00+01:00http://klickverbot.at/blog/2008/06/nointernetopenwith<p>At the moment, I am forced to use a box running Windows Vista for parts of my daily work. One thing that really annoys me about Windows is that nasty dialog asking you if you want to search the internet which pops up when you want to open a file with an extension Windows doesn’t know about. I am sure that 99 percent of the users would prefer to directly jump into the window where you can choose the application. But well, that’s just Microsoft, I guess.</p>
<p>Today, I finally found a tweak that removes that useless dialog:<br />
Add a dword called <code>NoInternetOpenWith</code> with the value <code>0x1</code> under <code>HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer</code>.</p>
<p>If you do not want to poke around manually in your registry, you can copy the following lines in a new <code>.reg</code> file and double-click it to add the key.</p>
<pre><code>Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows\CurrentVersion\Policies\Explorer]
"NoInternetOpenWith"=dword:00000001
</code></pre>File encoding in Eclipse2008-06-20T00:00:00+01:00http://klickverbot.at/blog/2008/06/file-encoding-in-eclipse<p>While coding my first Rails application in Eclipse using RadRails, I found that my <span class="caps">HTML</span> files were not encoded in the correct format. Although I knew that you <em>can</em> specify the default file encoding (e.g. <span class="caps">UTF</span>-8) in Eclipse, it took me quite some time to find the corresponding option.</p>
<p>After looking for it for way too long (I guess it felt longer than it was, though), I finally found it in <code>Window > Preferences > General > Workspace</code>.</p>Distributing Rails Applications2008-06-20T00:00:00+01:00http://klickverbot.at/blog/2008/06/distributing-rails-applications<p>I just found <a href="http://www.erikveen.dds.nl/distributingrubyapplications/rails.html">this little tutorial</a> on how to package your Rails application into a neat archive for showing it to your customers.</p>
<p>I guess it will come in quite handy for me in a few weeks, because I am currently developing a small application for a friend of mine who is not a developer, but wants to run the application locally on his Windows box.</p>