Data-driven healthcare studies should include model details

The quality and quantity of healthcare data are on an upward trajectory as EMRs become more widespread. Researchers are among the main beneficiaries of this data, and they are increasingly using retrospective analyses to publish meaningful research. This is good news because, as Jeffrey Drazen, MD, Editor-in-Chief of the NEJM, recently noted:

the number of evidence-based recommendations built on randomized controlled trials (RCTs), the current gold standard for data quality, is insufficient to address the majority of clinical decisions.

However, one disappointment I have encountered fairly consistently with these studies is that they rarely, if ever, include details about the model itself. At most, they list the types of models that were considered (for example, logistic regression, naive Bayes, or SVM) along with measures of their statistical performance. They almost never report the fitted models, such as the coefficients of a regression or the conditional probabilities of a naive Bayes classifier. The exceptions I have seen are papers published outside the US, most often from the NHS or from Canada. For example, in this post, I discuss a study from the NHS that develops a 30-day readmission risk model. The level of detail in that paper should be the standard for all such publicly funded research.
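To make concrete why the coefficients matter: a logistic regression model is fully specified by its intercept and coefficients, so once they are published, anyone can compute the same risk score for a new patient. The sketch below assumes a hypothetical three-variable readmission model. The variable names and coefficient values are invented for illustration and are not taken from the NHS study or any other paper.

```python
import math

# HYPOTHETICAL coefficients for illustration only; NOT from the NHS study
# or any published model. A published logistic regression would report an
# intercept plus one coefficient per predictor, like this.
INTERCEPT = -2.0
COEFFICIENTS = {
    "age_over_65": 0.5,          # 1 if the patient is over 65, else 0
    "prior_admissions": 0.3,     # number of admissions in the prior year
    "length_of_stay_days": 0.05, # length of the index stay in days
}


def readmission_risk(patient: dict) -> float:
    """Return the predicted probability of 30-day readmission."""
    # Linear predictor: intercept + sum of (coefficient * feature value)
    z = INTERCEPT + sum(coef * patient[name] for name, coef in COEFFICIENTS.items())
    # The logistic (sigmoid) function maps the linear predictor to a probability
    return 1.0 / (1.0 + math.exp(-z))


patient = {"age_over_65": 1, "prior_admissions": 2, "length_of_stay_days": 4}
print(round(readmission_risk(patient), 3))  # prints 0.332
```

With these made-up values, the example patient has a linear predictor of -0.7 and a predicted risk of about 0.33. Without the published coefficients, a reader can neither reproduce that number nor check how well the model holds up on their own patient population.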

Now, I understand that algorithms and models can be a competitive advantage, and why someone may choose not to make their secret sauce public. But for research funded through public grants, such as those from the NHS, it should be required not only that the model details be made available, but also that the underlying data set be publicly shared (if it can be satisfactorily de-identified). The purpose of publicly supported research is to advance the knowledge base in a given field, and the best way to do so is to share as much about the research as possible so that other individuals and organizations can learn from it and carry the field forward.

