HTML5 is Paving the Way for Semantically Aware Tools

Rich semantics are the Holy Grail for automated analysis tools; combined with extensible, familiar, and reusable tools and techniques, they can seriously cut the costs associated with robust user interface development and testing.

Previously, we discussed the set of tools available for validating and linting HTML5-based user interfaces (e.g., the W3C validator, numerous HTML/CSS editors, and tools like HTML Lint). These tools help to identify syntactic issues, but what else is possible? The syntactic (and limited semantic) checks that these tools perform are necessary, but they aren't sufficient to cover the body of intricate failures that can occur while creating the rich user experiences we've come to expect from interactive web applications and mobile devices. Linters and validators can't, for example, find bugs relating to the visual layout, and with good reason: checking a UI is hard; it's repetitive, monotonous, and, more importantly, subjective work.

However, there is still room for improvement. Surely we can push the envelope to do more. What's next, and how can we automate tasks that still challenge human analysts?

The first important insight is that many general guidelines for creating a good user interface have quantitative approximations. For example, the Windows 7 guidelines state:

Use title-style capitalization for titles, and sentence-style capitalization for all other UI elements.

and:

Write the label as a phrase or an imperative sentence, and use no ending punctuation.
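As a rough illustration, the two guidelines quoted above can be approximated with a few lines of JavaScript. This is only a sketch: the function names and the list of "minor" words are assumptions made for the example, and real title-style rules have more exceptions than this handles (an ellipsis on a command label, for instance, would be flagged here).

```javascript
// Hypothetical helpers approximating the two quoted guidelines.
// The minor-word list is an assumed, incomplete set of words that
// title-style capitalization usually leaves lowercase.
const MINOR_WORDS = new Set([
  "a", "an", "the", "and", "or", "but", "as", "at", "by",
  "for", "of", "in", "on", "to", "with",
]);

// True if every word starts with a non-lowercase character, except
// minor words that are neither the first nor the last word.
function isTitleStyle(text) {
  const words = text.trim().split(/\s+/);
  return words.every((word, i) => {
    const isEdge = i === 0 || i === words.length - 1;
    if (!isEdge && MINOR_WORDS.has(word.toLowerCase())) return true;
    return /^[^a-z]/.test(word);
  });
}

// True if the text ends with sentence-ending punctuation.
function hasEndingPunctuation(text) {
  return /[.!?]$/.test(text.trim());
}

// Each check returns a list of problems; an empty list means "no complaint".
function checkTitle(text) {
  return isTitleStyle(text) ? [] : ["title is not in title-style capitalization"];
}

function checkLabel(text) {
  return hasEndingPunctuation(text) ? ["label has ending punctuation"] : [];
}

console.log(checkTitle("Confirm file deletion"));
console.log(checkLabel("Remember my password."));
```

Both calls report one problem each; "Confirm File Deletion" and "Remember my password" would pass.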

In the words of Richard Anderson: We have the technology! That brings us to the first of a few key areas, or techniques, that could improve the tools that help ensure UI consistency:

Extensibility

We may have the technology, but we can't easily put it into play. Advances in practical natural language and image processing algorithms can be (and have been!) applied to web design analysis and verification, but these techniques are still only available separately. Many one-off tools check a small handful of important guidelines (such as Michael Tamm's image processing techniques), but there aren't any concerted efforts to unify the work that is going into these tools into one deployable and reusable system.

The HTML5 analysis tools that are well integrated (primarily IDEs and linters) lack the facilities to implement or integrate with the processing logic needed to check arbitrary UI guidelines. Most existing analysis tools simply weren't designed to be extended with new techniques, to incorporate new libraries, or to take into account external tools to generate visual renderings for testing purposes.

We view extensible tools as the first, and most important, step towards creating a unified toolchain for ensuring user interface consistency. Extensibility is about empowerment, efficiency, and community: extensible tools allow you to learn the intricacies of one tool and then either watch it grow through community contributions, or jump in and add the features you need without starting from scratch.

Most importantly, extensibility provides a path for integrating quantitative guidelines in a uniform testing and debugging interface.
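One way to picture such a uniform interface is a small registry that accepts checks as plain functions and reports every result in the same shape. The sketch below is hypothetical: `GuidelineChecker`, its methods, and the element objects are names invented for this example, not the API of any existing tool.

```javascript
// Hypothetical registry of guideline checks; all names are illustrative.
class GuidelineChecker {
  constructor() {
    this.checks = new Map(); // check name -> { run, enabled }
  }

  // Register a check: a function from an element description to a list of messages.
  register(name, run) {
    this.checks.set(name, { run, enabled: true });
    return this;
  }

  // Turn a registered check on or off without removing it.
  setEnabled(name, enabled) {
    const check = this.checks.get(name);
    if (!check) throw new Error(`Unknown check: ${name}`);
    check.enabled = enabled;
    return this;
  }

  // Run every enabled check over every element and collect uniform reports.
  run(elements) {
    const reports = [];
    for (const [name, { run, enabled }] of this.checks) {
      if (!enabled) continue;
      for (const element of elements) {
        for (const message of run(element)) {
          reports.push({ check: name, element: element.id, message });
        }
      }
    }
    return reports;
  }
}

// Two example checks contributed as ordinary functions.
const checker = new GuidelineChecker()
  .register("no-ending-punctuation", (el) =>
    /[.!?]$/.test(el.label) ? [`label "${el.label}" ends with punctuation`] : [])
  .register("non-empty-label", (el) =>
    el.label.trim() === "" ? ["label is empty"] : []);

// Elements are modeled as plain objects (an assumption for this sketch).
const elements = [
  { id: "ok-button", label: "Save changes." },
  { id: "cancel-button", label: "" },
];

console.log(checker.run(elements));
```

Because checks are just functions keyed by name, a project could disable the ones that don't apply with `setEnabled` or register its own alongside community-provided ones.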

Semantic Reasoning

However, many guidelines are inherently qualitative and subjective. To wit, consider the Design Concepts for confirmation dialogs in Windows 7, which start with:

Unnecessary confirmations are annoying

While no one will take issue with the intention, it is quite difficult to come to consensus on which dialogs are unnecessary. Thankfully, the Microsoft guidelines present many examples, justifications, alternate solutions, and explanations to support the guidelines. Taken as a whole, the document provides a sound body of advice to apply to your interfaces. Combine that document with human intuition and semantic reasoning, and we can manually create user interfaces that meet the specification. We have an edge over automated tools because we're able to reason about the semantics of user interfaces with ease.

With that in mind, let's consider another, similar situation: the W3C Web Accessibility Initiative, and the ARIA specification:

This specification provides an ontology of roles, states, and properties that define accessible user interface elements and can be used to improve the accessibility and interoperability of web content and applications. These semantics are designed to allow an author to properly convey user interface behaviors and structural information to assistive technologies in document-level markup.

The ARIA specification is designed to help automated systems better understand the semantics of user interfaces. Role attributes exist to differentiate form elements, dialogs, tooltips, etc. Many of the semantic annotations necessary to enforce common UI guidelines already exist in the ARIA specification.

Furthermore, these semantic annotations are beginning to appear in common widget toolkits for HTML, such as Dojo. It is only a matter of time before every web application has semantic annotations as a matter of course. Checking properties on all the dialog widgets will become a triviality, but only if the tool support exists to perform heuristic analyses based on these (and other) semantic annotations.

If the tools were fully extensible, then the control over these heuristics would also be at hand: Disable or modify the checks that don't apply to your applications, or write your own heuristics that do. Then share your contributions to perpetuate the use of rich semantics.
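To make the idea of a heuristic over semantic annotations concrete, here is a minimal sketch that finds dialogs lacking an accessible name. The `role`, `aria-label`, and `aria-labelledby` attributes come from ARIA; everything else (the function name and the plain-object element model, used so the example runs outside a browser) is an assumption made for illustration.

```javascript
// ARIA roles that mark an element as a dialog.
const DIALOG_ROLES = new Set(["dialog", "alertdialog"]);

// Hypothetical heuristic: return the dialog nodes that have neither an
// aria-label nor an aria-labelledby attribute with a non-empty value.
function findUnnamedDialogs(nodes) {
  return nodes.filter((node) => {
    const attrs = node.attributes || {};
    if (!DIALOG_ROLES.has(attrs.role)) return false;
    const label = (attrs["aria-label"] || "").trim();
    const labelledBy = (attrs["aria-labelledby"] || "").trim();
    return label === "" && labelledBy === "";
  });
}

// Elements modeled as plain objects; in a page, the same attributes
// could be read from the DOM instead.
const sampleNodes = [
  { id: "confirm-delete", attributes: { role: "dialog", "aria-labelledby": "confirm-title" } },
  { id: "mystery-popup", attributes: { role: "alertdialog" } },
  { id: "hint", attributes: { role: "tooltip" } },
];

// Only "mystery-popup" is reported: it is a dialog with no accessible name.
const unnamedDialogIds = findUnnamedDialogs(sampleNodes).map((n) => n.id);
console.log(unnamedDialogIds);
```

The check never inspects layout or widget internals; it relies entirely on the role annotation to know which elements are dialogs, which is what makes it portable across toolkits that emit ARIA markup.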

Familiarity

Galois has a long history of applying domain-specific languages to make programming in a specific context easier. We're working on the same general idea on this project, but the language of choice is not a new creation. Rather, we think that the language of your testing and build system should be as closely related to the actual project as possible; in the case of HTML5, JavaScript is the natural choice.

So far, we've been discussing the use of human interface guidelines as a source for criteria to check. However, we also believe that an extensible tool should be able to accept arbitrary checks, written in a rich, Turing-complete language. This is one point where our approach differs from similar tools, such as Selenium IDE. Selenium IDE is extensible through "Selenese", a simple language of 0-2 parameter commands for automating web applications and performing simple checks. Automation is, without a doubt, a critical aspect of UI testing, but the expressiveness needed to check complex properties is beyond the capabilities of Selenese (and indeed, many UI guidelines are complex enough to require a Turing-complete language as well).

Reusability

We've made the decision to trade simplicity for expressiveness by selecting JavaScript as our language for guidelines, which increases the importance of reuse. Selenese scripts would have been simpler to create, but they would not be as general and would therefore be more difficult to share and reuse. Selenese scripts are easier to use for simple tests; guidelines, however, are more general, and that generality offsets the increased encoding cost. Where a test serves a single purpose for a single application, a guideline is of value to a vast community of developers.

It's easy to imagine, for instance, that the Windows Metro-Style UI Guidelines would include an executable specification that ensures that your applications conform to the look and feel of the platform.

Where can we find these capabilities now?

The features of HTML analysis tools we've talked about above aren't new requirements; developers have needed extensible, reusable, familiar, and semantically rich tools for years, but the stepping stones were not yet in place. Today, we have a large body of examples (in the form of linters, validators, and automation tools) to build on, each one contributing to one aspect of the problem. There is a progression of tools and languages that is culminating in documents such as the HTML5 specification and tools like Selenium WebDriver and Selenium IDE.

We, as a community, are on the cusp of a new generation of extensible, semantic, reusable, and pragmatic tools to ensure consistency in rich interfaces; it's just a matter of connecting the dots and collaborating to bring all the pieces together.