‘Data is’ versus ‘data are’

Publishing statistics is a tricky business. ONS is constantly balancing the absolute requirement for statistical accuracy with the need to communicate our key messages to as wide an audience as possible.

With such a diverse target readership there are often differences of opinion within ONS on how best to communicate our statistics.

This was evident in the first session of ONS’s Style Council – convened not only to settle differences of opinion on how best to communicate with our readers but also to set a consistent standard.

The style issue up for debate was whether we should continue to use the phrase ‘data are’ or change to ‘data is’.

The grammatically correct approach is ‘data are’, because the word ‘data’ is the plural of the Latin word ‘datum’.

As a language pedant who corrects errant apostrophes and prefers ‘fewer’ to ‘less’, I would normally uphold the rule of grammatical accuracy, but there were powerful arguments on both sides of the debate.

There is a body of opinion outside ONS that opts for the ‘data is’ route.

The Oxford Dictionaries blog splits the use of the word ‘data’ in two directions: in scientific and technical writing it is plural (‘data are’), while in general usage ‘data’ can take a singular verb.

You might argue that use of ‘data’ as a plural is fine in specialised, scientific fields. However, is that the field ONS is in? Perhaps in terms of the gathering and analysis of data, that is true. But what about the dissemination of this data? There’s the challenge.

Which raises the question: what is ONS? Is it an academic institution, a government agency or a publisher? All three, I’d say.

The Wall Street Journal decided to use the singular form of ‘data’ in 2012, because “most style guides and dictionaries have come to accept the use of the noun ‘data’ with either singular or plural verbs.”

The Oxford English Dictionary refers to ‘data’ as a “mass noun, similar to a word like information, which cannot normally have a plural and which takes a singular verb. Sentences such as ‘data was’ (as well as data were) collected over a number of years are now widely accepted in standard English.”


When the Guardian posted a piece on the topic in 2012, a user pointed out that although treating ‘data’ as a singular may be bad Latin, we’re not speaking Latin any more.

Guardian style guide guru David Marsh cites the example of the word ‘agenda’, a Latin plural now used almost universally as a singular. He described using its singular form ‘agendum’ as “hypercorrect, old-fashioned and pompous.”

The beauty of the English language is that it constantly evolves, and publishers are faced with a choice: maintain the status quo (not the band) or switch to the usage that has become more common.

Our Style Council opted to continue referring to data as plural, the main rationale being that people were less likely to be offended.

It’s a valid point, and could be considered the lesser of two evils.

At the Style Council’s next meeting in the New Year, we hope to address that hotly contested point of whether numbers under 10 should be written out in full. I anticipate much debate.

Performance dashboard

One of the things we have been focused on over the past couple of sprints is building a public performance dashboard, providing insight into the way our website is used and how it’s performing. This was a nice mixture of front-end, back-end and ops work which involved the entire team – something which doesn’t happen too often!

Here it is – https://performance.ons.gov.uk/

While it’s similar to the GDS performance dashboards (https://www.gov.uk/performance), the GDS dashboards are aimed at transactional services, whereas ours is based on user experience and being able to find the right content.

The dashboard currently pulls data from a number of sources:

  • Google Analytics – for our website usage data
  • Pingdom – for our availability and page response time data
  • Splunk – for our application-specific metrics

Although we’ve got something working, we still have a long way to go – integrating our website KPIs, adding more context around the data we provide, including more metrics from other sources and more of our internal systems, and making the data really useful to content owners around ONS and the Digital Publishing team.
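
As an illustration of how a collector behind such a dashboard could normalise metrics from these different sources, here is a minimal Python sketch. The endpoints, credentials and response fields are placeholders, not the real dashboard’s implementation, which isn’t described in this post.

```python
import requests

# Placeholder endpoints standing in for the real Google Analytics, Pingdom and
# Splunk integrations; none of these URLs or field names are from the actual dashboard.
SOURCES = {
    "page_views": "https://analytics.example/report/pageviews",
    "availability": "https://uptime.example/checks/summary",
    "app_metrics": "https://logs.example/search/results",
}

def collect_metrics():
    """Fetch each source and normalise the responses into one {metric: value} dict."""
    metrics = {}
    for name, url in SOURCES.items():
        response = requests.get(url, timeout=10)
        response.raise_for_status()
        # Assume each placeholder source returns JSON with a top-level "value" field.
        metrics[name] = response.json().get("value")
    return metrics

if __name__ == "__main__":
    print(collect_metrics())
```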


If you have any thoughts or comments on any part of the website please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.

Weeknotes: Sprint 12/13

I have been back in the office for a few days so here is a quick update on what we have been up to.

We had stacked up quite a few changes locally whilst I was off, so part of the focus has been getting all these live and out to users. There are probably a couple of items the team have been working on that I have missed, but here are the highlights from the last couple of weeks.

PDFs

All of our work around updating PDFs has now been completed, so you should see a lot more consistency with these across the site. This includes a complete compendium PDF and the ability for our statisticians to append additional tables to the back to help users, which is now available across all our content types where previously it had been limited.

Equations

We have implemented a solution using MathJax to allow us to render equations on the site. This is something we have been keen to get working for a while, but it needed a rethink of the way we planned to implement it.
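
As a rough illustration of what this enables (the example equation below is mine, not one from an ONS release): an author can write an equation in TeX notation in the content and MathJax typesets it in the browser, for instance a simple mean:

```latex
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i
```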

Performance platform

A significant chunk of the past couple of sprints has been spent getting our performance dashboard up and running. We are not quite in a position to share it, partly due to a couple of last-minute bugs but also because we are waiting for some live production data to populate the charts. We will be doing a separate post about that on here in the next week or so.


User research

Our researchers have been busy looking at the next round of lab testing, reviewing our personas and extending these to some of our internal users.

We will be back on the road in London next Tuesday, 13 September (we still have slots at 11.15 and 15.45 – if you are interested and available to help, please contact Alison) and will be focused on testing our recent changes to time series as well as validating some of the wider data journeys.

We will also be doing our annual user satisfaction survey soon and have a shorter, two-question survey coming out this week, which supports some changes we are currently working on around how we ask for feedback on the site.


If you have any thoughts or comments on these changes or any other part of the site please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.

A dashboard for the UK (and other conceits)

One of the things you will hear from agilistas at GDS and further afield is ‘show the thing’. The idea is that you need to get something, anything really, in front of users (and stakeholders) as early and often as possible to get real feedback about something real, rather than just reiterating a set of ambitions or ideas via PowerPoint or Outlook.

It is a powerful tactic.

The thing is, though, at some point you have to decide on something to actually build.

During our somewhat extended ‘discovery’ period we gathered a great deal of information on what our users actually needed and what they expected, but at some point decisions needed to be made about how to translate those needs onto screens for them to interact with (and potentially tear down).

It is at this point that Sharpie scribbles on scraps of paper start taking on increased importance and the search for other inspiring websites becomes slightly more desperate.

We settled on a broad hypothesis to test and a handful of principles we wanted to apply to that hypothesis.

The ONS website should be a data dashboard for the UK.

That simple statement was at the heart of every decision we made, and to reinforce the idea there were five design principles behind it:

  • The visual identity of the site would be defined by data visualisation. There would be no stock photography or decorative images.
  • The figures would be the headlines. We would make the pages ‘glanceable’ with the key figures standing out and acting as calls to action to find out more rather than hidden amongst dense text.
  • The design would be ‘mobile-first’ to prepare for the ongoing shift of our audiences from desktops to mobile devices.
  • Site search would be front and centre of every page.
  • We would accept that most people come via Google so every page would be a ‘landing page’.

The hypothesis and the principles were entirely influenced by the user research we had up until that point, but they were very much my interpretation (with help from my team), so we needed to build something quickly, show people and see to what extent it held up.

This has been a constant process since the launch of the Alpha 18 months ago; over time the site has evolved and some things have worked out better than others.

To be honest the thing that has worked least well was the initial main hypothesis — we never really found a way to make the homepage of the site live up to that ideal and it also became clear over time that it wasn’t something that held up as a real user need under deeper scrutiny. The homepage remains something of a quandary to be honest even after all this time.

Also it has to be said that mobile traffic didn’t really grow as expected — I stand by the decision to make the site work well on all devices but we have actually stalled at about 15% of traffic from mobile for a couple of years now. A large enough number to take it seriously but not the explosion other sites have seen.

Our search strategy remains pretty sound, although it always needs improving. Having the large central search box on each page was jarring for some users initially and we did look at shifting it to the more familiar position on the top right of the layout, but the feedback quickly reassured us that we were on the right track. The complexity of the site (despite all our efforts) means search is often the most convenient way for users to find what they are looking for, so making it a clear focal point on the site makes sense. Externally, Google continues to grow as a source of traffic to the site — sometimes reaching almost 70% — so our tactic of making the site more search friendly is also proving successful.

Internally the move to a site without ‘decoration’ was sometimes a little controversial — a lot of effort had been made in the months preceding the new project to make the ONS website more inviting and that had included a lot of images and colourful infographics — much of which we stripped away.

The reality is that users either didn’t notice that the images had gone (as they had always subconsciously dismissed them) or appreciated the new clean and focused look of the site. The prominence of charts and other data visualisations was immediately (and remains) popular. I have to be honest: if I had remained product lead for the site I would have softened my stance on this over time — I think we had to clear the decks and then make some sensible decisions about the wider use of imagery longer term, but the decision still seems valid.

So that is a little bit of behind the scenes information about why the site looks like it does these days — thanks to Jonathan, the team at CX and Onkar all of whom were instrumental in the thinking behind it all — and to Crispin and Jon who ended up building an awful lot of it!

Making life hard to make things easier

This is our particular take on principle four – do the hard work to make it simple – and it centres on probably the most controversial decision we made, at least among a particular corner of technically astute observers.

The question is always a variation on the following:

Why the heck did you build your own content management system rather than use Drupal/WordPress/Joomla/Sitecore/RedDot/Liferay? (The list is endless and you can delete as appropriate.)

This wasn’t a decision we took lightly and we looked hard at a number of the major open source CMSs out there but for what it is worth this was the thinking behind the decision – I’m not sure there is a right answer but I stand by what we chose.

We came to the project with some clear objectives and any publishing application was going to need to support them –

  • the ability to consistently publish within 60 seconds at 09.30
  • a platform that was optimised for continuous integration..
  • ..and automated testing
  • a secure platform where we had as much control as possible over the technical security
  • cloud ready
  • sustainable codebase..
  • ..and one we could recruit developers to support
  • an application that supported data visualisation as the primary visual language
  • a public website that was as performant as possible

Applications like Drupal (which I’ll use as the example as it was the closest to being used) come with a great many benefits as so much of the thinking around the standard use cases for web publishing has been done and tools built to support them. There is a large community and a ready pool of developers to get things working quickly and successfully.

The issue for us – and this is where you have to make a call and not everybody will agree – is that our main use cases for our application tended to be edge cases for the wider Drupal (and all the others) community. If we were to use something ‘off the shelf’ the team would have had to strip out a lot of un-needed / un-wanted functionality and add a fair amount of custom features – leaving only the skeleton of the original software and that created concerns about how maintainable it would be and started to dilute the advantages of using an existing product.

The feeling was we would quickly end up carrying weight (and risk) from a load of features we didn’t need and be committed to a product whose roadmap didn’t align with our needs, which could potentially leave us stranded on an old version due to the amount of customisation we would require.

Alongside this, the teams we were talking to with the closest use cases to ours (in particular the GDS and Guardian development teams, and later the Telegraph as well) had gone the DIY route.

The preference of the development team was to go with something lightweight, open source but bespoke – taking lessons from the GDS work but building something specific to our needs. It was about this time we started to talk about this as:

..a bespoke configuration of existing open source components

A decision needed to be taken, and with the evidence at hand it was clear that there were strengths and weaknesses either way. Given the preferences of the team, the work of GDS, and the other organisations with complex publishing operations we had spoken to, I approved the direction to create ‘Florence & Friends’, which made up our platform.

It was certainly not a straightforward decision and created a LOT of pain for the team in the early days – there is a great deal of ‘water carrying’ going on in the background of modern CMSs and that all needed to be replicated – but the freedom to be laser focused just on our user needs, and to boil everything down to the MVP to get stats published, allowed us to build something and get it launched on schedule.

While currently far from perfect it has provided a real foundation for our ongoing ambitions while supporting the daily publishing schedule and successfully separating business process from web publishing.

The big question when you look back at difficult decisions like this is ‘would you do it again?’. Honestly, there were times during the Beta when the answer would have been a resounding ‘heck no!’, but with a little space now, and looking at what was achieved, I continue to stand by the decision and firmly believe it has provided us with something that will work better for our needs in the medium and long term in a way the other options wouldn’t have (though they may well have provided a less painful short term!)

Time for some (series) updates

Today we have launched some fairly substantial changes to the individual time series pages on the website, as mentioned in our last release note. This is the result of work and conversations over the last couple of sprints and is aimed at clarifying this content and addressing some problems identified since we launched.

Whilst these updates change the structure of these pages we have made sure that the current locations for any given time series do not break, including the ‘/data’ version. These URLs will continue to give the latest version of the data.

What’s different?

The website now offers an additional layer of information and options for each time series. These options are linked to the datasets that these time series are populated from and are aimed at clarifying to users exactly what data is being displayed.

Source dataset:


The top block identifies the dataset whose data you are currently viewing. All the data shown will be from this particular dataset, giving a consistent picture.

Other variations of this time series:


All other variations on this time series, and the datasets that update them, are then available further down the page and can be viewed by following the links.

For an example, take a look at: http://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/abmi/ukea

In addition each of these variations will be available separately from the time series tool.
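
For anyone consuming these series programmatically, here is a minimal sketch of how the unchanged URLs could still be used. It assumes (this post doesn’t confirm it) that the ‘/data’ variant of a time series URL returns the latest data as JSON, and the response fields shown are guesses rather than a documented schema.

```python
import requests

# The '/data' suffix on this path and the JSON shape are assumptions for illustration;
# the post simply says existing URLs, including the '/data' version, keep working.
URL = "http://www.ons.gov.uk/economy/grossdomesticproductgdp/timeseries/abmi/ukea/data"

response = requests.get(URL, timeout=10)
response.raise_for_status()
series = response.json()
print(series.get("description", {}).get("title"))  # hypothetical field names
```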

Why change?

The aim of the time series structure implemented through the beta was to provide users with a consistent location where they could access the latest estimate for any given series. This structure also allowed us to support user needs such as being able to search by series ID.

Now that we are into the production cycle on the new website, we identified that the implementation did not support some of the ways we publish: due to differences in the way revisions are handled, or for consistency with the wider datasets, series sharing the same ID could have different values on the same day.

This meant we needed to keep the version of data loaded from a particular dataset consistent with all the other values, and the length of the series, in that dataset, rather than updating a ‘master’ version of data on the website as new figures became available.


If you have any thoughts or comments on these changes or any other part of the site please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.

Release note: Sprint 9

The start of this sprint led to some quick re-planning following publishing issues on 22 and 23 July for the Investment by insurance companies, pension funds and trusts and Internal Migration releases respectively. These releases, each one of a number of releases on its day, failed to publish due to a connection failure, although all the others on those days published successfully.

We changed things around so that we could investigate the issues and prioritise developing a fix. We identified a couple of possible causes and made some changes, but were unable to replicate the issue to test conclusively. We have also developed functionality to retry publication if this happens, but as the issue has not recurred we are holding this to one side for now. We will be watching this closely going forward.
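
The retry behaviour described above is essentially a retry-with-backoff around the publish step. A minimal sketch of that pattern follows; the publish callable itself is a placeholder, as the post doesn’t describe the actual publishing mechanism.

```python
import time

def publish_with_retry(publish, attempts=3, base_delay=5):
    """Call a publish function, retrying on connection failures.

    `publish` is a stand-in for whatever pushes a release live at 09.30;
    the real implementation is not described in this post.
    """
    for attempt in range(1, attempts + 1):
        try:
            return publish()
        except ConnectionError:
            if attempt == attempts:
                raise  # surface the failure after the final attempt
            time.sleep(base_delay * attempt)  # simple linear back-off before retrying
```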

Font

Up until this sprint we had not been setting a font specifically on the site, instead relying on the browser’s default serif font. The plan has always been to use Open Sans, but given this is a narrow font we wanted to take our time to ensure we did not make the accessibility of the site worse. We have worked closely with our design team to identify appropriate font weights, and have used this as an opportunity to review padding and spacing and to look at how we decrease title size on mobile screens.

Time series

The main focus of this sprint, outside of investigating the publishing issues, has been reviewing the way we currently publish time series to better support our statistical publishers. The new functionality will allow us to hold consistent variations of each series linked to the dataset that they were populated from. Functionality around being able to search by ID will be retained, and the current API will continue to show the latest version we have published. There is a significant piece of work to test and migrate content to this structure, which will be looked at in the next sprint.

User requested data routes

Following on from our recent usability testing and feedback we have updated the template for our individual user requested data pages so that they have links back to the complete list and details of how to make a request. We have also boosted common terms related to these in search so that these pages rank more highly.
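
The post doesn’t name the search engine behind the site, so as a generic illustration of what ‘boosting’ these pages means, here is a sketch that re-scores results in application code when the query contains related terms. The field names, terms and boost factor are invented for the example.

```python
# Hypothetical result shape: each result is a dict with 'score' and 'page_type' keys.
BOOSTED_TERMS = {"user requested data", "ad hoc", "request"}  # example terms only
BOOST = 2.0

def rescore(results, query):
    """Multiply the score of user requested data pages when the query matches a boosted term."""
    q = query.lower()
    for result in results:
        if result.get("page_type") == "user_requested_data" and any(term in q for term in BOOSTED_TERMS):
            result["score"] *= BOOST
    return sorted(results, key=lambda r: r["score"], reverse=True)
```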

Continuous deployment and sandpit environment

Internally we have been reviewing and rebuilding our continuous deployment pipeline to streamline our development process and have created a new ‘sandpit’ environment for our publisher to try out new ideas and improvements to content.

A number of bug fixes

  • Correction note not showing in PDF
  • Pie chart tooltip styling not consistent with new designs
  • Problems caused in Florence if related links were not a valid URL

If you have any thoughts or comments on these changes or any other part of the site please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.

Release note: Sprints 7 and 8

A lot of the last two sprints has been about fixing things and getting things stable before we start looking at some bigger enhancements to the site. With that said, we have also been focused on addressing many of the issues raised via the feedback button on the site and have been back on the road in Birmingham conducting our latest usability testing.

The focus of the testing this time was the release calendar, the A to Z and user requested data. This highlighted a number of issues and potential enhancements, many of which we have addressed and some of which we will be looking at in upcoming sprints. The following changes have been made so far:

As a direct result of user testing:

  • On the dataset page the first file will be shown by default, previously the show/hide on this page was closed for all items
  • The A to Z now includes the edition field to help navigation through census content; previously these were all displayed as just ‘2011 Census’
  • The release calendar will retain keywords entered into the filters between tabs, to help users search for upcoming/published releases
  • Our email alerts service is now available from the release calendar as users expected it to be available from here
  • The ‘previous versions’ link on statistical bulletins has been changed to be more discreet.

From feedback:

Allow larger tables. We have increased the width available to inline tables so that larger tables are presented more clearly in our statistical bulletins and articles

Table on time series page. Data in the table on time series pages is now right aligned and displaying the correct number of decimal places for all values.

Replaced the logo so it is clearer on high resolution screens.

Made the hover over consistent on all chart types.

On the internal side of things we have been working to address a number of issues, including:

Fix for time series publishing issue. We have had a problem on several occasions where the individual time series for some of the taxonomy pages have failed to update. This was caused by the zip file we generate to transfer these to the website becoming corrupt. To resolve this we now validate that the zip is correct before publication and regenerate it if required.
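
Validating a zip before publication can be done with the standard library; a minimal sketch follows (the surrounding publishing code is not shown and the function name is mine).

```python
import zipfile

def zip_is_valid(path):
    """Return True if the archive opens and every member passes a CRC check.

    testzip() reads each file in the archive and returns the name of the first
    corrupt member, or None if everything checks out.
    """
    try:
        with zipfile.ZipFile(path) as archive:
            return archive.testzip() is None
    except zipfile.BadZipFile:
        return False

# e.g. regenerate and re-check the transfer zip until zip_is_valid(path) is True
```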

Audit history in Florence (our publishing tool). Additional functionality has been added to Florence to allow members of our publishing team to track the history of all the items in the system. This is currently at an early stage but will be developed and enhanced to match their requirements.


If you have any thoughts or comments on these changes or any other part of the site please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.

Getting back into the swing of things

Earlier this week we deployed our first proper set of changes since we went live 11 weeks ago today (that went quick!). The team have been making and deploying fixes and updates regularly since we launched, but this has been our first opportunity to introduce and improve functionality on a larger scale.

So what are the changes?

We have improved the PDFs available on the site, both in terms of appearance and resilience, in response to user feedback. This includes a specific template for the PDFs which allows us to be much more responsive in making future changes. These changes take effect in anything new we publish, and we are working to update all previously published PDFs over the next couple of weeks.

We have had a lot of feedback around the PDFs on the new site, which tended to fall into one of two camps:

  • problems with only partial or incomplete files being downloaded, including missing data tables at the end
  • presentation of the PDFs, in particular font size

When we looked into these issues we realised that a change of approach was the best solution. Rather than generate each PDF as it was requested, we decided to bring this forward a step and pre-generate them before publication. This fixed a number of issues: we could ensure the files were complete and there was no risk of users getting a partial PDF, and it also meant they were served to users more quickly.
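
In outline, the change moves PDF rendering from request time to publish time, so serving becomes a static file read. A minimal sketch of that idea follows; the render function and page shape are placeholders, since the post doesn’t describe the real rendering pipeline.

```python
from pathlib import Path

def pregenerate_pdfs(pages, output_dir, render_pdf):
    """Render every page's PDF during publication rather than on request.

    `render_pdf(page)` stands in for whatever produces the finished PDF bytes,
    and `page["uri"]` is a hypothetical field used to name the output file.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    for page in pages:
        pdf_bytes = render_pdf(page)  # complete document, appended tables included
        filename = page["uri"].strip("/").replace("/", "-") + ".pdf"
        (out / filename).write_bytes(pdf_bytes)
        # Requests are then served from these pre-built files, so a user can
        # never download a partially generated PDF.
```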

We decided that whilst we were fixing these problems it also made sense to revisit the way we were generating the files themselves to give us more control of the way they were being presented so that we could solve the presentational issues that had been raised as well.

We have enhanced the table builder we use in our written content to allow the headings to be marked up correctly in HTML so going forward these will be much more accessible.

We have a lot of complex tables as part of our bulletins and articles and developing a tool to accurately convert these from Excel into HTML proved to be difficult, so much so that the first iteration of the table builder did not have the facility to mark-up these tables with appropriate headings.

This was one of the big things we identified at launch as a priority to fix as soon as possible: the accessibility of the data tables we use in written content on the site, and in particular extending our table builder to allow us to mark these up with appropriate headings.
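
The accessibility gain comes from marking header cells up as th elements with a scope attribute rather than as plain data cells. As a small illustration (the real table builder’s code isn’t shown here, and this helper is invented for the example):

```python
from html import escape

def header_row_html(headings):
    """Render a header row whose cells are <th scope="col">, which is what lets
    screen readers associate column headings with the data cells beneath them."""
    cells = "".join(f'<th scope="col">{escape(h)}</th>' for h in headings)
    return f"<thead><tr>{cells}</tr></thead>"

# Example use with hypothetical headings:
# header_row_html(["Time period", "Value", "Change on previous quarter"])
```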

In the background we have also reviewed the UI of our publishing tool, Florence, and started to update the key areas that needed a bit more attention. We have also added functionality to allow us to make bulk updates to the titles of individual time series on the site as part of the publishing process, and developed the basis for more robust and informative logging to allow us to identify problems sooner and fix them more easily once we do.


If you have any thoughts or comments on these changes or any other part of the site please do let us know by email, the comments on this blogpost or on Twitter to @ONSdigital.