brickster.ai
News · Databricks · July 19, 2022

Lessons Learned from Deidentifying 700 Million Patient Notes

Description

Providence embarked on an ambitious journey to de-identify all of our clinical electronic medical record (EMR) data to support medical research and the development of novel treatments. This talk shares how we did this for patient notes and how you can achieve the same. First, we built a de-identification pipeline using pre-trained deep learning models, fine-tuned on our own data. We then developed an innovative methodology to evaluate re-identification risk, because U.S. healthcare law (HIPAA) requires that de-identified data have a “very low” risk of re-identification but does not specify a standard. Our next challenge was to annotate a dataset large enough to produce meaningful statistics and improve the fine-tuning of our model. Finally, through experimentation and iteration, we achieved a level of performance that would safeguard patient privacy while minimizing information loss. Our technology partner provided the computing power to efficiently process hundreds of millions of historical records as well as incremental daily loads. Through this endeavor, we have learned many lessons that we will share: • Evaluating risk of re-identification to meet HIPAA requir
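The point about annotating a dataset "large enough to produce meaningful statistics" can be made concrete with a small sketch. The description does not say how Providence measured risk, so what follows is only one common way to put an uncertainty range on a miss rate: a Wilson score interval over the share of annotated identifiers that a de-identification model failed to remove. The function name `wilson_interval` and the counts `missed_phi` and `annotated_phi` are hypothetical and are not figures from the talk.

```python
import math


def wilson_interval(failures: int, total: int, z: float = 1.96):
    """Wilson score interval for the proportion failures / total.

    z = 1.96 corresponds to an approximately 95% interval.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    if not 0 <= failures <= total:
        raise ValueError("failures must be between 0 and total")

    p = failures / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    # Clamp to [0, 1] to absorb floating-point rounding at the edges.
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


# Hypothetical annotation results (assumed numbers, not from the talk):
# identifiers the model missed, out of all identifiers found by annotators.
missed_phi = 12
annotated_phi = 10_000

low, high = wilson_interval(missed_phi, annotated_phi)
print(f"observed miss rate: {missed_phi / annotated_phi:.4%}")
print(f"approx. 95% interval: {low:.4%} to {high:.4%}")
```

The interval narrows as the annotated sample grows, which is one reason a small annotation set cannot support a claim of very low risk: with zero misses in only 100 annotated identifiers, the upper bound is still close to 3.7%.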

Description from YouTube. Full content on the video page.
