We’ve already had a look at the vast array of different problems associated with attempting to track Australian health and medical research (HMR) spending. We’ve also explored the hoops that researchers have to jump through when they apply for grants or seek to gain approval from ethics committees, not to mention when they make requests for access or linkage from custodians and data gatekeepers. One might be tempted to think that no matter where Australian HMR researchers turn, the data they need is locked behind bars that are almost impossible to open – so much so that it almost takes a special kind of resilience to finish their projects.
What we haven’t discussed so far is what happens to linked datasets after research projects are complete.
Linking Datasets – and Re-Using Them
Linking datasets is an incredibly valuable activity for researchers in any discipline, and this is especially true in HMR. Linked datasets allow researchers to find new patterns in the data they’re examining and unearth previously unrecognised correlations between different phenomena, something which helps them derive new insights and solve problems that can’t be tackled properly using individual datasets alone. Indeed, the use and re-use of linked datasets offers incredible potential to speed up the pace of Australian HMR – but as we know, researchers have to jump through a number of hoops before they’re able to access and link datasets for their projects.
Considering the effort involved in linking datasets, one might be forgiven for thinking that, when a publicly funded research project concludes, any dataset that has undergone such a lengthy process of construction would be made available for future researchers to use and re-use in an anonymised or de-identified form. That’s at least what happens in a variety of countries overseas, such as the United States and the United Kingdom.1
Unfortunately, in Australia, there is no consistent policy around the re-use of linked datasets. We do not mandate that publicly funded researchers make their linked datasets available to approved users at the conclusion of projects, and whether linked datasets themselves are retained varies between jurisdictions. Some state-based linkage agencies retain linked datasets, or at the very least, retain data linkage keys to facilitate easier re-linkage of data in the future; when it comes to Commonwealth data, it’s standard practice to destroy linked datasets following project completion. Not only that, but if a linked dataset involves the use of national health administrative records (such as MBS and PBS), then this destruction is a legal requirement.2
What this means is that any researcher who wants to replicate a study or perform some form of secondary analysis on a linked dataset has to go through the whole series of protocols and processes that the original researchers went through in the first place.
The traditional justification for destroying linked datasets or making them difficult to access is that individual privacy will be impinged upon if these datasets remain open to future use, and this argument holds some merit if the datasets contain identifiable data referring to specific persons. However, these datasets can be subjected to processes of de-identification and anonymisation to remove identifiable information, and modern privacy-preserving techniques have advanced to the point where re-identification has become particularly difficult; likewise, data security protocols provide an extra layer of protection. On top of this, these concerns need to be balanced against the multitude of public benefits that would flow from making linked datasets available and streamlining Australia’s research environment more generally – an issue which we’ll discuss in a future blog post.
Indeed, failing to ensure that linked datasets are adequately available for new use is deeply problematic for HMR. Not only does it drastically slow the pace of Australian research and limit the kinds of research questions that researchers can answer, it also goes against the principles of the scientific method. Advances in HMR require observation, repetition and replication, and as such, studies and trials need others to conduct similar experiments using the same information for their validity to be confirmed or refuted. The present environment renders such necessary steps exceedingly unattractive to researchers, and as a result, the quality of Australian HMR suffers.3
Of all the barriers to Australian HMR data access, the current failure to promote the re-use of linked datasets is perhaps the most bizarre. The solution is simple: we need a nationally consistent policy that mandates linked data retention for publicly funded research. Luckily, we’re not the only ones who think so, and there is growing awareness that this practice – among others – is in need of change. For instance, the Productivity Commission has recently recommended that the Federal Government abolish its requirement to destroy linked datasets at the completion of researchers’ projects, and that data and metadata used in publicly funded research projects should be made available for other researchers to use (with appropriate approval) at project completion.4
Such moves are wise: they will make it far easier for researchers to understand if data have been manipulated or if the assumptions underpinning a specific study are valid; they will also speed up the pace of HMR, allow researchers to find exciting new ways of using existing datasets, and, just as importantly, save taxpayers considerable amounts of money. In any case, let us hope that this recommendation is taken seriously by the Federal Government and does not fall on deaf ears – the way that so many others have.
- 1. In both countries, publicly funded researchers are encouraged to make their anonymised linked datasets available for re-use and secondary use by other researchers. See RJ Mitchell et al., 'Data linkage capabilities in Australia: practical issues identified by a Population Health Research Network "Proof of Concept project"', Aust NZ J Public Health, 39, No. 4 (Aug., 2015), pp. 319-325.
- 2. See S135AA of the National Health Act, The High Level Principles for Data Integration Involving Commonwealth Data for Statistical and Research Purposes (which were adopted by the Australian Government in 2010) and, more specifically for MBS and PBS data, the Privacy Guidelines for the Medicare Benefits and Pharmaceutical Benefits Programs – March 2008.
That said, the rules surrounding the destruction of linked datasets do not mandate a period by which this destruction is meant to have taken place.
Things are changing, but slowly and in an ad-hoc fashion: the Multi Agency Data Integration Project (MADIP) aims to create enduring datasets linking Commonwealth data from various departments, including Social Services, Human Services, the Australian Taxation Office and the Australian Bureau of Statistics, and, importantly, Health.
- 3. It should be noted that failing to promote the re-use of linked datasets is not the only problem – a wide range of scientific disciplines across the world are undergoing a research reproducibility ‘crisis’ stemming from the fact that funding is primarily geared toward novel research papers, whilst prestigious journals often reject replicated research on the basis of unoriginality. See https://theconversation.com/the-science-reproducibility-crisis-and-what-can-be-done-about-it-74198.
- 4. See recommendations 6.14 and 6.17 on pp. 43 and 44 respectively. Other recommendations include publishing league tables on the progress that institutions have made in making their ‘unique research data and metadata’ available to others as well as including details regarding how and when other researchers can access project metadata and data following the completion of a research project. The report can be found at http://www.pc.gov.au/inquiries/completed/data-access/report/data-access.pdf.