DrugSLM - Small Language Model for Drug Information

Master's Thesis Project | Federal University of Paraná (UFPR) | Computer Science Department

DrugSLM is a specialized Small Language Model (SLM) trained on drug package inserts and other pharmacological data sources, designed to understand and generate accurate, plain-language pharmaceutical information.

🎓 Academic Context

This project is part of a Master's thesis in Computer Science at the Federal University of Paraná (UFPR), Curitiba, Brazil. The research focuses on:

Democratizing access to complex pharmacological information
Structuring unstructured data from official pharmaceutical documentation
Domain adaptation of Language Models for pharmacological information
Resource-efficient fine-tuning strategies for Small Language Models (SLMs)
Validation and reliability of Generative AI in healthcare contexts

Researcher: Vinícius de Lima Gonçalves
Advisor: Professor Eduardo Todt, PhD
Institution: Department of Computer Science, UFPR

🎯 Project Vision

The working hypothesis is that high-quality outcomes depend more on rigorously structured data than on massive scale, which favors Small Language Models (SLMs). Knowledge Graphs are intended to supply precise, fine-grained context to the model. A comparison of architectures is intended to show that data structure is key to resource-efficient, reliable pharmacological AI.
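As a rough illustration of how a knowledge graph might supply precise context to a model, the sketch below stores facts as (subject, relation, object) triples and renders the facts for one drug as plain-text lines that could be prepended to a prompt. Everything in it is an assumption made for illustration: the triple schema, the relation names, the helper functions `build_index` and `context_for`, and the placeholder entities (`ExampleDrug`, `OtherDrug`, and so on) are hypothetical and do not describe the project's actual data model or real drug data.

```python
from collections import defaultdict

# Hypothetical triples (subject, relation, object).
# Placeholder names only; this is not real pharmacological data.
TRIPLES = [
    ("ExampleDrug", "has_active_ingredient", "ExampleCompound"),
    ("ExampleDrug", "indicated_for", "ExampleCondition"),
    ("ExampleDrug", "contraindicated_with", "OtherDrug"),
    ("OtherDrug", "indicated_for", "AnotherCondition"),
]


def build_index(triples):
    """Group (relation, object) pairs by subject for direct lookup."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[subject].append((relation, obj))
    return index


def context_for(drug, index):
    """Render the facts stored for one drug as plain-text lines."""
    facts = index.get(drug, [])  # .get avoids creating empty entries
    return [f"{drug} {relation.replace('_', ' ')} {obj}." for relation, obj in facts]


index = build_index(TRIPLES)
context = context_for("ExampleDrug", index)
print("\n".join(context))
```

Only facts whose subject is the requested drug are returned, so the context stays small and specific, which is the property the vision statement attributes to structured data.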

🧬 Project Lifecycle and Roadmap

The project follows a six-phase, data-centric methodology designed to support reproducibility and reliability from data acquisition to model deployment.


%%{init: {'theme': 'base', 'themeVariables': { 'fontSize': '20px', 'fontFamily': 'arial' }}}%%

flowchart LR

    classDef phase fill:#f0f4f8,stroke:#2c3e50,stroke-width:1px,color:#2c3e50, text-decoration: none;

    P1(Data Acquisition<br/>& Preparation):::phase
    P2(Modeling<br/>& Design):::phase
    P3(Training<br/>& Optimization):::phase
    P4(Evaluation<br/>& Validation):::phase
    P5(Integration<br/>& Optimization):::phase
    P6(Deployment<br/>& Feedback):::phase

    P1 ==> P2 ==> P3 ==> P4 ==> P5 ==> P6
    P2 -.-> P1
    P4 -.-> P2

    click P1 "architecture/roadmap/#1-data-acquisition-and-preparation" "Go to Phase 1: Data Acquisition and Preparation"
    click P2 "architecture/roadmap/#2-modeling-and-system-design" "Go to Phase 2: Modeling and System Design"
    click P3 "architecture/roadmap/#3-training-and-optimization" "Go to Phase 3: Training and Optimization"
    click P4 "architecture/roadmap/#4-evaluation-and-validation" "Go to Phase 4: Evaluation and Validation"
    click P5 "architecture/roadmap/#5-integration-and-optimization" "Go to Phase 5: Integration and Optimization"
    click P6 "architecture/roadmap/#6-deployment-and-feedback" "Go to Phase 6: Deployment and Feedback"

Click any node in the diagram above to explore the details of each phase, including extraction, transformation, training strategies, and validation metrics.