Retrieval-augmented generation based doctor recommendation system using knowledge graph
Abstract
Finding suitable healthcare professionals is challenging, and the growing demand for
efficient healthcare access calls for advanced language modeling techniques that can
provide tailored medical advice. A review of existing research on doctor recommendation
systems shows that while previous authors developed functional models,
their datasets are likely outdated and no longer reflective of current healthcare
trends. These models would require complete retraining with a new or updated
dataset to ensure their relevance and accuracy. In contrast, our approach allows the
existing database to be updated with additional, current information without
retraining the model from scratch. By simply integrating
the updated data, we can maintain the integrity and functionality of the
previous system, making our method both time-efficient and resource-conserving.
This approach eliminates the need for redundant work, allowing us to leverage the
previous model while ensuring the data remains current and applicable. The proposed
Doctor Recommendation System leverages a Knowledge Graph built with
Neo4j and is enhanced by a Retrieval-Augmented Generation (RAG) model using the
LangChain framework. The system aims to provide up-to-date, personalized and
accurate doctor recommendations by integrating structured and unstructured data
sources. The Neo4j Knowledge Graph captures comprehensive relationships between
doctors, specialties, diseases, symptoms and medical conditions, offering a robust
data foundation. The LangChain framework, incorporating a large language model
(LLM), enhances the recommendation process by generating context-aware suggestions
based on patient queries and historical data. The Doctor’s Specialty Recommendation
dataset has been used with three chains, RetrievalQAChain, GraphCypherQAChain and
RetrievalQAWithSourcesChain, alongside similarity search. Based on correctness,
distance, context accuracy and CoT context accuracy, GraphCypherQAChain performed best.
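As a minimal illustrative sketch (not the paper's exact implementation), the snippet below shows how a GraphCypherQAChain can be wired to a Neo4j knowledge graph with GPT-3.5-turbo in LangChain; the connection details and the example patient query are placeholder assumptions.

```python
# Minimal sketch, assuming a Neo4j instance already populated with the knowledge graph;
# the URL, credentials and example query are illustrative placeholders, not the paper's values.
from langchain_community.graphs import Neo4jGraph
from langchain_openai import ChatOpenAI
from langchain.chains import GraphCypherQAChain

# Connect to the Neo4j knowledge graph of doctors, specialties, diseases and symptoms.
graph = Neo4jGraph(url="bolt://localhost:7687", username="neo4j", password="password")

# GPT-3.5-turbo translates the patient's question into Cypher and then
# phrases the query results as a recommendation.
llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)

chain = GraphCypherQAChain.from_llm(
    llm=llm,
    graph=graph,
    verbose=True,
    allow_dangerous_requests=True,  # newer LangChain versions require explicit opt-in
)

response = chain.invoke({"query": "Which specialty and doctor should I see for persistent migraines?"})
print(response["result"])
```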
In addition, ROUGE-1, ROUGE-2, ROUGE-3, ROUGE-L and BLEU are used to measure
the performance of all the chains and to compare them with GPT-4. On these evaluation
metrics, GraphCypherQAChain with GPT-3.5-turbo achieved 86% for ROUGE-1, 82% for
ROUGE-2, 77% for ROUGE-3, 86% for ROUGE-L and 46% for BLEU, whereas GPT-4
achieved 78%, 60%, 48%, 78% and 40% respectively.
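The following is a minimal sketch of how such ROUGE and BLEU scores can be computed for a generated recommendation against a reference answer, assuming the rouge-score and nltk packages; the example sentences are illustrative and not drawn from the dataset.

```python
# Minimal sketch of the scoring step; the reference and generated answers are made-up examples.
from rouge_score import rouge_scorer
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "You should consult a neurologist for persistent migraines."
generated = "A neurologist is the recommended specialist for persistent migraines."

# ROUGE-1/2/3 measure n-gram overlap; ROUGE-L measures longest-common-subsequence overlap.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rouge3", "rougeL"], use_stemmer=True)
for name, score in scorer.score(reference, generated).items():
    print(f"{name}: {score.fmeasure:.2f}")

# BLEU: modified n-gram precision with a brevity penalty, smoothed for short answers.
bleu = sentence_bleu(
    [reference.split()],
    generated.split(),
    smoothing_function=SmoothingFunction().method1,
)
print(f"BLEU: {bleu:.2f}")
```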