This article is for the old version. Please see the updated version.

I like spending time with words and agonising over them, but coming up with names for things is notoriously daunting. I needed a slew of titles for a music project, and building a semi-random word generator seemed like a quick solution – until it started to grow. Here follows a reasonably detailed account of its development.


Sourcing the data was the easy part: evidently the resplendent scholars of Wikipedia have been compiling an extensive list of medical word stems (prefixes and suffixes) which looked large enough. A little while and a few regex headaches later, it was in a usable format.

As usual, the end goal was a JavaScript version which could rapidly generate and regenerate without constantly calling back to the server. But it needed to be backed by a functional PHP base, both for the initial page load and for the paranoids and pedants who still refuse the wonders of the Indonesian-export-themed scripting language.

PHP apparently really likes arrays, so my initial thought was to keep the data in that format. But a PHP array is not much use outside PHP, and considering that the cost of running json_decode() once per page load is minimal on a relatively small dataset, JSON seemed a better alternative.

Word Formation

We want to generate theoretically pronounceable words, so some construction logic is needed. To avoid getting lost in the fascinating chasms of structural linguistics, phonemes and morphemes and the rest, let’s go with a simple rule for the moment: only join consonants with vowels and vice versa.

The suffix changes the word form

In an attempt at forward-planning, we should choose the word’s suffix first, as this decides whether the word is a noun or an adjective. Should we want to expand the generator in future to allow for compound terms using two words, we’ll need to request adjective-noun combinations, which will require this suffix-priority generation.

So, building on the previous rule, the logic runs: first, randomly choose a suffix and check whether its ‘join’ (first letter) is a vowel; if so, find a prefix with a join (last letter) which is a consonant, if not find a prefix with a vowel join.
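As a rough illustration of that rule, here is a minimal JavaScript sketch. The stem entries and their shape (text, meaning, type) are assumptions made for the example rather than the generator's actual data or code, and 'y' is simply treated as a consonant.

```javascript
// Illustrative stems only; the object shape is an assumption for this sketch.
const PREFIXES = [
  { text: 'cardio', meaning: 'heart' },
  { text: 'hepat', meaning: 'liver' },
  { text: 'bi', meaning: 'twice, double' },
];
const SUFFIXES = [
  { text: 'itis', meaning: 'inflammation', type: 'noun' },
  { text: 'stasis', meaning: 'stopping, standing', type: 'noun' },
  { text: 'al', meaning: 'pertaining to', type: 'adjective' },
];

// Simplification: only a, e, i, o, u count as vowels.
const isVowel = (ch) => 'aeiou'.includes(ch.toLowerCase());

// Pick a random element; `rng` is injectable so the result can be made repeatable.
function pick(list, rng = Math.random) {
  return list[Math.floor(rng() * list.length)];
}

// Suffix first: it fixes the word category and the join the prefix must match.
function generateWord(prefixes, suffixes, rng = Math.random) {
  const suffix = pick(suffixes, rng);
  // A vowel-initial suffix needs a consonant-final prefix, and vice versa.
  const wantVowelEnd = !isVowel(suffix.text[0]);
  const candidates = prefixes.filter(
    (p) => isVowel(p.text[p.text.length - 1]) === wantVowelEnd
  );
  if (candidates.length === 0) return null; // no compatible prefix in the data
  const prefix = pick(candidates, rng);
  return { word: prefix.text + suffix.text, type: suffix.type, prefix, suffix };
}
```

With this sample data, a vowel-initial suffix such as 'itis' can only pair with 'hepat', giving 'hepatitis', while 'stasis' can follow either 'cardio' or 'bi'.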

Outside Input & Permalinks

We also want to allow people to save a word they like and be able to come back to it or mail it to their friend. For this, some form of URL query or hash will do nicely. The JavaScript hash change question is a world of pain which I’d prefer to avoid if at all possible. Let’s just stick to the traditional PHP-based URL query.

To add to the usual security precautions taken whenever accepting URL parameters, they were first parsed into a temporary array (not a plain variable) before being checked against the array of existing stems. They could then only affect the word formation process if found in the appropriate list.
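The original check lived in PHP, but the same allow-list idea can be sketched in JavaScript. The parameter names `prefix` and `suffix` and the tiny stem lists here are hypothetical, chosen only for the example.

```javascript
// Hypothetical stem lists; the real ones would come from the JSON data.
const KNOWN_PREFIXES = ['bi', 'cardio', 'hepat'];
const KNOWN_SUFFIXES = ['al', 'itis', 'stasis'];

// Parse a query string and keep only stems found in the known lists.
// The parameter names `prefix` and `suffix` are assumptions.
function readRequestedStems(queryString, prefixes, suffixes) {
  const params = new URLSearchParams(queryString);
  // Hold the raw input in a temporary object until it has been checked.
  const requested = {
    prefix: params.get('prefix'),
    suffix: params.get('suffix'),
  };
  return {
    prefix: prefixes.includes(requested.prefix) ? requested.prefix : null,
    suffix: suffixes.includes(requested.suffix) ? requested.suffix : null,
  };
}
```

Anything not present in the lists comes back as null, so unknown input never reaches the word formation step.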

But this introduces the first wrinkle in the relatively straightforward suffix-priority of the word formation: as both, one or none of the stems may come in via the URL query, we need to add logic to allow the prefix to take priority if it’s requested in the query.
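One way to picture the branching is the sketch below. It assumes the requested stems have already been validated and arrive as plain strings or null. Accepting a requested pair as-is, even when the join rule is broken, is also an assumption and not necessarily what the generator does.

```javascript
// Simplification: only a, e, i, o, u count as vowels.
const isVowelChar = (ch) => 'aeiou'.includes(ch.toLowerCase());
const pickFrom = (list, rng) => list[Math.floor(rng() * list.length)];

// True when exactly one side of the join is a vowel.
const joins = (prefix, suffix) =>
  isVowelChar(prefix[prefix.length - 1]) !== isVowelChar(suffix[0]);

// `requested` holds already-validated stems, or null where none was asked for.
function buildWord(prefixes, suffixes, requested = {}, rng = Math.random) {
  const { prefix = null, suffix = null } = requested;
  if (prefix && suffix) return { prefix, suffix }; // both fixed: nothing to choose
  if (prefix) {
    // Prefix takes priority: find a suffix that fits it.
    const fits = suffixes.filter((s) => joins(prefix, s));
    return fits.length ? { prefix, suffix: pickFrom(fits, rng) } : null;
  }
  // Default suffix-priority path (suffix requested, or chosen at random).
  const chosen = suffix || pickFrom(suffixes, rng);
  const fits = prefixes.filter((p) => joins(p, chosen));
  return fits.length ? { prefix: pickFrom(fits, rng), suffix: chosen } : null;
}
```

The suffix-first path stays the default, and the prefix only leads when it is the sole stem supplied.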

Dodging AJAX

We ended up building the logic twice, first in PHP then in JavaScript. Though clearly not the most elegant or efficient way, it allows the JavaScript to run independently of the server after the initial load, eliminating AJAX calls and improving the frontend performance.

We knew that we’d need to have the full list of stems in the final page to allow for manual selection. Having PHP output the prefix and suffix arrays to the page as <ul>s eliminated the need to load the JSON file as the full dataset was then already available to JavaScript via the DOM.

Interface Inception

The fundamental function of the layout is to simply show you the word itself, its word category and the English definition for both parts. We’ll rely on people’s inherent ability to construct meaning from those unconnected definitions rather than actually trying to programmatically make a single, grammatically-sound definition sentence.

There are only three primary actions required in the interface: 1) regenerate the whole word; 2) regenerate only one part of the word; 3) manually set one part of the word from a list. The ‘permalink to this word’ and ‘search for this word on Google’ would be secondary actions.

Initially it seemed obvious to lay out each definition directly under its corresponding stem, but this quickly proved to be a bad idea: even though the font containing the stem is far larger than the definition type, many short stems have long definitions. For example, the prefix ‘bi-’ is much narrower than its definition of ‘twice, double’, even with a huge difference in point size.

v1 interface showing text rollover highlight

So in this case, affordance was sacrificed for flexibility. The definitions were placed below one another, separated by a plus symbol to reinforce the notion of combining the two meanings together.

Next came the question of where to place the primary action buttons. We knew that the ‘regenerate’ button for the whole word should sit at the bottom of the container, in line with the prevailing interface pattern of ‘big bottom button activates the whole form’. As in the above image, this was combined with the secondary actions of permalink and Google search, respectively, displayed as icons with simple hover labels via title.

As for the two buttons for each stem, upon clicking a stem’s ‘regenerate’ button, a new stem of a different length will take its place. In order to allow the button to stay in the same place after regeneration and avoid the user having to chase the button each time, it should clearly be aligned to the left in the same way the text naturally aligns.

At first these were visually inside the container for each stem, invisible until hovered over. Not only did this obscure the text in an awkward way, but without any initial visual cue, it was not obvious that actions were available for the stems.

Better to have a clearer indication of the actionability of the stems. Let’s at least make them appear to have some utility without the need for a hover, even if we don’t make their buttons always visible.

Interface Matures

As cute as the rounded, skeuomorphic container was, it was just too limiting. To maintain that aesthetic consistently would have severely limited the positioning options of various elements and ultimately sacrificed usability. Where utility is the goal, never prioritise pretty over useful. Farewell pretty gradients and shadows, hello flexible flat boxes.

In the image above, though the vertical balance of the left-most version is optically pleasing, both it and the middle version confuse the visual meaning of the black bar: is it a usable button or just used to delineate the two stems? We needed two different element styles to keep clear what is a button and what is not.

Additionally, as can be seen in the middle design, an excessively long definition might stretch the container to the full width of the page while the delineating bar above the suffix would (and should) terminate where its suffix ends. The thinner lines at top and bottom of that design were an (unsuccessful) attempt at remedying that visual imbalance.

Questionable Hovers

As I’ve gone into before, I love the mousehover; it simplifies and collapses then reveals and expands when needed, it links disparate elements and enriches the palette of interactions. I’ve even argued that mystery meat is acceptable – better even – in rare circumstances, when used judiciously. But we all know that touchscreens are rapidly making it an untenable choice.

The benefit to this particular project is debatable, but I couldn’t help myself. Primarily, the twin buttons below each stem retract when not hovered over to afford the interface a little more visual clarity. Secondly, and less disputably, each stem and its definition have a subtle, bi-directional highlight in order to compensate for the inability to vertically align them.

When the stem text itself is hovered, the appropriate ‘regenerate’ button highlights, as this is what a click in either place will do. When just the stem’s parent element (containing the lower bar) is hovered, the ‘regenerate’ button turns off and the buttons behave as usual.

Aiming for progressive enhancement, we’ve added this hidden/hover state of the stem buttons inside the media query @media (hover) so it will only apply on devices which report an available hover state. Many browsers don’t yet implement this correctly, but the worst that will happen is that the buttons will always be visible: just marginally more visual clutter.

List Selection

With the current data, there are 649 prefixes and 108 suffixes, meaning displaying either list in its entirety will require a lot of space. We need to allow users to manually select a stem from these lists while both the word itself and the definitions are visible, if possible.

A drop-down list is usually a good option for collapsing long lists, but here we also wanted to be able to give an indication of each list item’s definition before it’s selected. Seems like a scrolling window will work best.

As mentioned, the two lists are output separately to <li><a>s with the definition in the anchor’s title, which are then floated left to allow them to better take advantage of the horizontal space. These list containers are placed directly below the stem (and above the definitions) as their visual connection to the appropriate stem is of prime importance. Once opened, they extend their container as wide as the screen will allow so as to minimise the vertical space needed and avoid pushing the below definitions too far down the page.
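Since the anchors already carry both the stem and its definition, the script can rebuild its dataset straight from the DOM. The helper below is a hypothetical sketch of that step; it reads only each anchor's `textContent` and `title`, and the selector in the usage comment is an assumption.

```javascript
// Convert a collection of anchors (e.g. from document.querySelectorAll)
// into plain stem objects. Only `textContent` and `title` are read.
function stemsFromAnchors(anchors) {
  return Array.from(anchors, (a) => ({
    text: a.textContent.trim(),
    meaning: a.title,
  }));
}

// In a browser (the selector is hypothetical):
// const prefixes = stemsFromAnchors(document.querySelectorAll('#prefixes li a'));
```

Because the function only touches two properties, it works on a NodeList in the browser and on plain objects elsewhere.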

Future Expansion

As mentioned above, the most obvious improvement would be to allow for generation of double-word terms. Though at least some elements of this were considered while building this initial version, the majority of work for that would be in preventing duplicate stems in the same term. Also from a UI standpoint, should controls for this be exposed or should it automatically decide if you get a single- or double-barrelled term?

Additionally, both the word formation logic and source data could be updated and expanded. The current stem list is far from comprehensive and the formation logic is much less sophisticated than it could be.

More consideration for mobile and touch-based devices could be taken as well. In its current state, the interface works well enough, but could benefit from a variety of modifications, not least of which is how to handle font size when a generated word is far too wide to fit on a small or narrow screen.

Finally, in a minor and oft-forgotten UI consideration, the current process of copy-pasting the term and its definitions from the page is less than ideal. Due to the structure of the HTML, the text from the stem buttons is included in the copy and the word itself is split by both this and line breaks. Perhaps some less-valid markup might solve this small gripe.


Flat shapes are far easier to work with than rounded, shaded, skeuomorphic elements. For all the shiny cuteness those afford, the lack of flexibility and unnecessary adherence to real-world concerns such as lighting often require too many usability compromises.

And language is amazing. Even with such simple rules for word construction, the depth and surprises which can arise are endlessly enjoyable.

That’s it for now; go make some body horror with the Medical Term Generator.

June 2015

Head image: What oesophagostasis might look like [src]