OBJECTIVE: This study aims to develop a deep learning pipeline that detects signals of dietary supplement-related adverse events (DS AEs) from Twitter.
MATERIALS AND METHODS: We obtained 247 807 tweets posted between 2012 and 2018 that mentioned both a DS and an AE. We designed a tailor-made annotation guideline for DS AEs and annotated biomedical entities and relations in 2000 tweets. For the concept extraction task, we fine-tuned BioClinical-BERT, PubMedBERT, ELECTRA, RoBERTa, and DeBERTa models, each with a CRF classifier, and compared their performance. For the relation extraction task, we fine-tuned and compared BERT, BioClinical-BERT, PubMedBERT, RoBERTa, and DeBERTa models. We combined the best-performing model from each task into an end-to-end deep learning pipeline to detect DS AE signals and compared the results with the known DS AEs in a DS knowledge base (ie, iDISK).
RESULTS: The DeBERTa-CRF model outperformed the other models in the concept extraction task, achieving a lenient microaveraged F1 score of 0.866. The RoBERTa model outperformed the other models in the relation extraction task, achieving a lenient microaveraged F1 score of 0.788. The end-to-end pipeline built on these 2 models extracted DS indications and DS AEs with a lenient microaveraged F1 score of 0.666.
CONCLUSION: We have developed a deep learning pipeline that can detect DS AE signals from Twitter. We found DS AEs that were not recorded in an existing knowledge base (iDISK), and our proposed pipeline can assist DS AE pharmacovigilance.
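The results above are reported as lenient microaveraged F1 scores. The abstract does not define the matching rule, so the following is only a minimal sketch under a stated assumption: a predicted entity counts as a lenient match when it has the same type as a gold entity and their character spans overlap. The function names, the span representation, and the example data are all hypothetical and are not taken from the study.

```python
from typing import List, Tuple

# Hypothetical span representation: (entity_type, start, end), end exclusive.
Span = Tuple[str, int, int]


def overlaps(a: Span, b: Span) -> bool:
    """Assumed lenient criterion: same entity type and overlapping offsets."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def lenient_micro_f1(gold_docs: List[List[Span]],
                     pred_docs: List[List[Span]]) -> float:
    """Pool counts over all documents and entity types, then compute F1."""
    matched_pred = matched_gold = n_pred = n_gold = 0
    for gold, pred in zip(gold_docs, pred_docs):
        n_gold += len(gold)
        n_pred += len(pred)
        # Predictions that overlap at least one gold span (for precision).
        matched_pred += sum(any(overlaps(p, g) for g in gold) for p in pred)
        # Gold spans that are overlapped by at least one prediction (for recall).
        matched_gold += sum(any(overlaps(g, p) for p in pred) for g in gold)
    precision = matched_pred / n_pred if n_pred else 0.0
    recall = matched_gold / n_gold if n_gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Made-up toy annotations for two tweets (not data from the study).
gold_docs = [[("DS", 0, 9), ("AE", 20, 28)], [("AE", 5, 12)]]
pred_docs = [[("DS", 0, 6), ("AE", 30, 35)], [("AE", 4, 12), ("DS", 15, 20)]]
score = lenient_micro_f1(gold_docs, pred_docs)
```

In this toy case 2 of 4 predictions and 2 of 3 gold spans are matched, so precision is 0.5 and recall is about 0.667. Microaveraging pools the counts before computing the score, so frequent entity types weigh more heavily than rare ones.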