Matterhorn Experience Report

Since August 2016, Galois has been funding the development of Matterhorn, a Haskell terminal client for the MatterMost chat system. Recently, our core development team—Jonathan Daugherty, Jason Dagit and myself—made the first public release of Matterhorn. In this post we’ll discuss our experience building it.

All three of us (as well as several other coworkers) were used to terminal-based chat clients like irssi, weechat, finch, and glirc. After Galois started using the web-based MatterMost system, we wanted to continue using that style of client. We tried using the IRC “bridge” for MatterMost, a system that intercepts messages from MatterMost and forwards them as though they were IRC messages, thus allowing us to use the IRC client of our choice. This was sufficient for simple chatting, but it quickly became apparent that it wasn’t really enough: MatterMost is one of a new crop of chat systems that includes features like indefinitely persistent history, editing and deletion of past messages, rich formatting, metadata, and so forth, whereas IRC is still very much based on the metaphor of a stream of simple plain-text messages. The IRC bridge necessarily exposed only a small subset of MatterMost’s functionality.

Instead of using the IRC bridge, we opted to build our own client that would expose these richer features directly to the user while letting us stay in the text-dense, keyboard-controlled terminal environments that we prefer. It’s still a work in progress and we add new features on a regular basis, but we’re very proud of the program that we’ve built so far.

After a few early prototypes in other languages—including Python and Emacs—we decided to write Matterhorn in the Haskell programming language: all three of us were proficient in Haskell and we had already written several libraries which we planned to use in this application. Haskell is a very good general-purpose programming language, but perhaps because of its origins as a research language it doesn’t tend to be a common choice for moderately-sized user-facing applications like this (although another of our coworkers, Eric Mertens, has also written an excellent terminal-based IRC client, glirc, in Haskell). We wanted to write an experience report about writing this kind of project in Haskell, commenting on the parts that worked very well, the parts that didn’t work out as well, and various aspects of writing this kind of software.

Programming the terminal with Brick

A major reason we chose to use Haskell was that we wanted to use brick, a library of Jonathan’s creation designed for building terminal-based user interfaces using higher-level layout combinators. One of our design goals with Matterhorn was to maintain a user experience that closely matches that of the official web client, both sticking to the metaphors of the original client and making the transition from the web client to the terminal painless. This meant that we’d need to be able to program non-trivial terminal interfaces quickly, both during initial development and when MatterMost releases new features. We also knew that we’d be exploring the UI design space to find a good terminal alternative to the MatterMost web interface, so we’d need to be able to quickly prototype new interface ideas. Brick allows us to do that by writing our interfaces declaratively and by separating event handling, state updates, and drawing from each other. This facilitated a relatively clean separation of concerns while making rapid prototyping of new user interface ideas very inexpensive.
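
To give a feel for that separation, here is a minimal, dependency-free sketch. It deliberately does not use brick’s API: the AppState, Event, handleEvent, and draw names are all hypothetical and exist only for illustration. The point is that state updates and drawing are two independent pure functions over the same state type.

```haskell
-- A dependency-free sketch of the "event handling / state / drawing" split.
-- None of these names come from brick or Matterhorn; they are illustrative.

-- Application state: the text being typed and the messages sent so far.
data AppState = AppState
  { editorText :: String
  , messages   :: [String]
  } deriving (Eq, Show)

-- Events the interface can receive.
data Event
  = KeyChar Char   -- a printable key was pressed
  | KeyBackspace   -- delete the last typed character
  | KeyEnter       -- send the current line
  deriving (Eq, Show)

initialState :: AppState
initialState = AppState { editorText = "", messages = [] }

-- State updates: a pure function from the old state and an event
-- to the new state. It knows nothing about how the state is drawn.
handleEvent :: AppState -> Event -> AppState
handleEvent st (KeyChar c)  = st { editorText = editorText st ++ [c] }
handleEvent st KeyBackspace = st { editorText = dropLast (editorText st) }
  where dropLast xs = take (length xs - 1) xs
handleEvent st KeyEnter
  | null (editorText st) = st
  | otherwise            = st { editorText = ""
                              , messages   = messages st ++ [editorText st] }

-- Drawing: a pure function from the state to lines of output.
-- It knows nothing about which events produced the state.
draw :: AppState -> [String]
draw st = messages st ++ ["> " ++ editorText st]
```

For example, draw (foldl handleEvent initialState [KeyChar 'h', KeyChar 'i', KeyEnter]) evaluates to ["hi", "> "]. Because drawing never mutates anything, trying out a new layout means changing draw alone.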

Managing state with lenses

We used lenses from the very start, and this turned out to be a spectacular choice.

In general, lenses helped us manage access to our increasingly complex ChatState application state type. As the application evolved, more fields were added to the state, some fields became record types, and soon the ability to reach into the application state to read or write became critical for writing comprehensible state transformations and event handlers. The variety of composition and transformation primitives available with lenses made sophisticated state transformations concise.

An unexpected advantage of using lenses is that they’re excellent for helping with data representation refactoring. At various points we found that our main ChatState type had gotten overly large and we needed to refactor it into smaller chunks. But almost every operation in Matterhorn uses some part of the ChatState and many of them involve updating multiple fields of the state. Even a rename of a single state field could involve touching several parts of the program, to say nothing of a deeper reorganization of its fields.

As we refactored ChatState with better abstractions, we realized that we could maintain a lens API that emulated the old structure even though we had changed the representation of the underlying types. We could reorganize the state type into a new shape but then write lenses that “faked” the old structure. We could then gradually remove those lenses as part of other refactors until the old interface lenses were unnecessary. This allowed us to incrementally refactor the rest of the program while providing a backwards-compatible lens interface to the older state structure.
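
The following sketch illustrates the idea with hand-rolled van Laarhoven lenses so that it needs nothing beyond base. The field names, the UserInfo record, and the markAway function are hypothetical and are not taken from Matterhorn’s actual ChatState; only the technique is the same. The state has been reorganized so that user details live in a nested record, yet csUserName and csUserStatus still exist as lenses with their old types, so older call sites keep compiling.

```haskell
{-# LANGUAGE RankNTypes #-}
import Data.Functor.Const (Const (..))
import Data.Functor.Identity (Identity (..))

-- A minimal van Laarhoven lens, the same shape the lens library uses.
type Lens' s a = forall f. Functor f => (a -> f a) -> s -> f s

view :: Lens' s a -> s -> a
view l = getConst . l Const

over :: Lens' s a -> (a -> a) -> s -> s
over l f = runIdentity . l (Identity . f)

set :: Lens' s a -> a -> s -> s
set l x = over l (const x)

-- Hypothetical old representation, before the refactor:
--   data ChatState = ChatState { _csUserName :: String, _csUserStatus :: String, ... }

-- New representation: user details moved into their own record.
data UserInfo = UserInfo
  { _uiName   :: String
  , _uiStatus :: String
  } deriving (Eq, Show)

data ChatState = ChatState
  { _csUser     :: UserInfo
  , _csChannels :: [String]
  } deriving (Eq, Show)

-- Lenses for the new structure.
uiName :: Lens' UserInfo String
uiName f u = (\n -> u { _uiName = n }) <$> f (_uiName u)

uiStatus :: Lens' UserInfo String
uiStatus f u = (\s -> u { _uiStatus = s }) <$> f (_uiStatus u)

csUser :: Lens' ChatState UserInfo
csUser f s = (\u -> s { _csUser = u }) <$> f (_csUser s)

-- Compatibility lenses: same names and types as the old flat fields,
-- implemented by composing lenses over the new structure.
csUserName :: Lens' ChatState String
csUserName = csUser . uiName

csUserStatus :: Lens' ChatState String
csUserStatus = csUser . uiStatus

-- An "old" call site that never learns the representation changed.
markAway :: ChatState -> ChatState
markAway = set csUserStatus "away"
```

Once every caller has been migrated to csUser . uiStatus (or to a better abstraction), the compatibility lenses can simply be deleted, and the compiler will point out any call site that was missed.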

Easy multi-threading

For a while, our application had two primary threads: the main application thread and a background thread for receiving websocket events. It quickly became apparent that we needed another background worker thread to make some HTTP requests asynchronously.

Luckily, we were using Haskell, so the first draft of our code for managing asynchronous requests involved about a dozen lines of code. We added a new constructor to our brick event type to represent the successful result of a background task and created a new STM.TChan that we could use to queue up background requests. A background request is represented simply: as an IO computation that returns a function of type ChatState -> ChatState. Consequently, adding a new background computation looks like this:

STM.atomically $ STM.writeTChan asyncQueue $ do
 -- Work done here is done asynchronously.
 -- After we’re done working, return a function to
 -- transform the state with the results of the work.
 return (\ state -> ...)

When Matterhorn starts up, it spawns a new thread to process these requests. In a loop, the thread grabs an IO computation from the channel, executes it, and sends the resulting state transformation back to the brick event loop:

void $ forkIO $ forever $ do
 request <- STM.atomically $ STM.readTChan asyncQueue
 response <- request
 writeBChan brickEventChan (AsyncResponseEvent response)

Then the brick event handler can use the ChatState -> ChatState function that’s included in the event and apply it to the state:

onAppEvent :: ChatState -> MHEvent -> EventM Name (Next ChatState)
onAppEvent state (AsyncResponseEvent f) = continue (f state)

And with fewer than a dozen lines, we’ve got a background worker thread with a work queue that integrates cleanly into our existing application! It’s only a few more lines to add proper error handling, and only a few more to implement high-priority background tasks that are served ahead of everything else in the queue.
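
One way those two extensions could look is sketched below. This is not Matterhorn’s actual implementation: the AsyncQueues record, the AsyncErrorEvent constructor, and the function names are assumptions made for illustration, ChatState is reduced to a stand-in type, and a plain TChan stands in for brick’s event channel so that the example depends only on base and stm.

```haskell
import Control.Concurrent (ThreadId, forkIO)
import Control.Concurrent.STM
import Control.Exception (SomeException, try)
import Control.Monad (forever)

-- Stand-in for the real application state.
type ChatState = [String]

-- A background request, as described above: an IO action that
-- produces a state transformation.
type Request = IO (ChatState -> ChatState)

-- Events sent back to the UI thread (AsyncErrorEvent is hypothetical).
data AppEvent
  = AsyncResponseEvent (ChatState -> ChatState)
  | AsyncErrorEvent SomeException

-- Two queues instead of one (a hypothetical design).
data AsyncQueues = AsyncQueues
  { highPriority   :: TChan Request
  , normalPriority :: TChan Request
  }

newAsyncQueues :: IO AsyncQueues
newAsyncQueues = AsyncQueues <$> newTChanIO <*> newTChanIO

-- Take the next request, preferring the high-priority queue. `orElse`
-- falls through to the normal queue only when the high-priority queue
-- is empty; the transaction blocks when both are empty.
nextRequest :: AsyncQueues -> STM Request
nextRequest qs =
  readTChan (highPriority qs) `orElse` readTChan (normalPriority qs)

-- Run one request, turning an exception thrown by the IO action into
-- an event. Exceptions hidden inside the returned (lazy) function are
-- not caught here. Catching SomeException is a blunt choice; a real
-- client would probably narrow the exception type.
runRequest :: Request -> IO AppEvent
runRequest req = either AsyncErrorEvent AsyncResponseEvent <$> try req

-- The worker loop: same shape as before, with the two additions.
startWorker :: AsyncQueues -> TChan AppEvent -> IO ThreadId
startWorker qs events = forkIO $ forever $ do
  req <- atomically (nextRequest qs)
  ev  <- runRequest req
  atomically (writeTChan events ev)
```

The event handler then needs one extra case for AsyncErrorEvent, for instance to show the error to the user, while the AsyncResponseEvent case stays exactly as it was.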

Large Haskell projects can be hard to structure

A consistent problem in writing Matterhorn was that new functions had an obvious place to go—many of them were about state manipulation, so they’d go in State.hs! Those functions would (usually) be small, maybe a dozen lines at most, and were nicely composable and orthogonal in functionality. Unfortunately, there were also tons of them.

In most object-oriented languages (and even some non-object-oriented languages, like Rust), data types serve double duty as modules or namespaces. This choice has both advantages and disadvantages, but one advantage is that namespaces become cheap. An operation on a piece of data is namespaced to that piece of data, so instead of head(list), you can write list.head(), and code that deals with a type is scoped to that type.

Haskell does not do this: functions can be namespaced within modules, but beyond that, they all live at the top level. It’s up to the programmer to organize them effectively. This affords more flexibility in terms of code structure, but it also means more opportunities to structure code poorly. When you’re adding a lot of effectful operations over the stateful core of a program that needs to keep track of lots of pieces of information (user lists and statuses, chat messages, rendered message caches, network connectivity information), it’s easy to toss them all in a bin until that bin overflows.

This is less of a problem when you’re writing a small application with fewer conceptual parts (or parts that are less closely-linked), and it’s also easier when you’re writing a library. In a language with more lightweight namespaces—for example, in Rust, where you can create a new local namespace with mod, without having to create a new file and add it to your package description—it’d be easier to gradually break apart a module’s functionality.

Our experience with Matterhorn has shown that it’s useful to throw everything into one place in the beginning when the ultimate design isn’t yet clear, so long as there is time set aside to pull things apart later as useful abstraction boundaries become more evident.

Cabal new-build is awesome!

All of us were early adopters of Cabal’s Nix-style new-build system and it’s excellent. Like the Nix package manager, new-build builds shared packages and identifies them with a hash of their dependencies and configuration and any other ambient state which might change the built package. It doesn’t require sandboxes, and dependencies that can be shared will be shared, but it also doesn’t put packages into the global namespace like non-sandboxed cabal.

It also affords you the ability to use a cabal.project file, which combines multiple individual Cabal packages into a larger whole. A cabal.project file can specify a set of related local packages which can then be built as a single unit.
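
As a rough illustration, a cabal.project file for a client and its API library might look like the following. The directory layout is an assumption (both packages checked out side by side under one project root); only the packages field itself is standard.

```
-- cabal.project
-- Assumed layout: ./matterhorn and ./mattermost-api are sibling
-- checkouts, each containing its own .cabal file.
packages: matterhorn/
          mattermost-api/
```

With a file like this in place, a change to the library is picked up the next time the client is built from the project root, without installing the library separately.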

It’s still in the ‘tech preview’ phase, but we’ve all had a remarkably stable and consistent time with it.

Haskell’s library ecosystem is an asset

At this point Haskell’s library ecosystem is rich enough that we didn’t have to implement any core components ourselves. For many tasks (JSON processing, HTTP, SSL, configuration file handling, etc.) we had many options available.

An abundance of choices is both a blessing and a curse: having many choices to solve a problem means some solutions may be more tailored to one’s needs, but it can also be difficult to evaluate the available options. For choosing Hackage packages, especially in cases where we weren’t already familiar with the options, we found that there are some decent heuristics for narrowing down the set of choices:

  • Look at the number of releases and the date of the most recent release. A recent release and/or a large number of releases at least suggests that the package is likely to be maintained.
  • Check http://packdeps.haskellers.com/reverse to get a sense for how many other packages depend on the package in question.
  • Is the package maintained or written by an author of note in the community? This is not necessary but if you already know the author or trust their work, it can be helpful.
  • Look at the number of downloads for the package in the last 30 days. Is the package receiving attention?

These checks are by no means perfect, and it’s also important to give all of the options a fair shake because newcomers make valuable contributions and more established packages can build up technical debt that is not readily apparent. However, these checks can at least provide a decent measure of the “liveness” of a package.

Matterhorn has enough direct dependencies that the complete set of transitive dependencies is quite large. As with any software, direct and transitive dependencies always bring with them some amount of technical debt. In some ways it’s worse with transitive dependencies because those dependencies are often even more out of our control, and problems in those dependencies can limit portability, GHC compatibility, or licensing. This isn’t necessarily bad, but it represents a trade-off that deserves consideration: using others’ work saves considerable time but potentially takes on technical debt and gives up control, whereas building it yourself means paying the cost to build from scratch in exchange for more control.

Looking ahead

Going forward, our development team will be maintaining compatibility with the latest MatterMost server release, adding new features to maintain feature parity with the upstream client, and maintaining a presence on Hackage and GitHub. We’re definitely interested in contributions, so stop by the GitHub page to get started!

Links

Matterhorn:

Mattermost-api:

MatterMost: http://about.mattermost.com/