Ongoing
Historically, the development and evaluation of machine learning models—especially large and small language models (LLMs/SLMs)—have focused primarily on improving accuracy. More recently, dimensions such as fluency, factuality, coherence, and consistency have also been explored. However, one critical yet underexplored facet of intelligence is personalization — the model's ability to tailor outputs based on individual user preferences, which are often subjective and dynamic.
This project focuses on addressing the lack of robust evaluation frameworks to measure and understand the personalization capabilities of modern language models.
1. Static Evaluation
An evaluation framework to measure how effectively language models generate outputs that align with a user’s individual preferences at a single point in time.
🎯 Goal: To assess how well summarization models capture and reflect user-specific preferences in a static context.
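As a rough illustration of what such a single-time-step check could look like, the sketch below scores how well a generated summary aligns with a free-text user preference profile via embedding similarity. The sentence-transformers model and the choice of cosine similarity are stand-in assumptions, not the project's actual metric.

```python
# Minimal sketch: static personalization score as cosine similarity between
# a generated summary and a free-text user preference profile.
# The embedding model and similarity metric are illustrative assumptions.
from sentence_transformers import SentenceTransformer, util

encoder = SentenceTransformer("all-MiniLM-L6-v2")

def static_personalization_score(summary: str, user_profile: str) -> float:
    """Return an embedding-similarity alignment score in [-1, 1]."""
    emb_summary, emb_profile = encoder.encode([summary, user_profile])
    return float(util.cos_sim(emb_summary, emb_profile))

score = static_personalization_score(
    "The article reviews budget-friendly travel destinations in Europe.",
    "Prefers concise summaries focused on cost and practical travel tips.",
)
print(f"alignment: {score:.3f}")
```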
2. Dynamic Evaluation
An evaluation framework to measure how effectively models adapt their outputs as user preferences evolve over time, offering a temporal lens on personalization.
🎯 Goal: To evaluate the capacity of models to generate outputs that stay aligned with shifting user interests, providing a dynamic understanding of personalization.
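One way to picture this temporal lens is to apply a per-step alignment score along a sequence of (summary, preference) pairs and watch for drops when preferences shift. The sketch below uses a simple token-overlap placeholder as the per-step scorer; any alignment metric, such as the embedding-based one above, could be substituted.

```python
# Minimal sketch: dynamic personalization evaluation over a sequence of
# time steps, each with its own (summary, preference-profile) pair.
# The token-overlap scorer is only a placeholder for a real alignment metric.

def overlap_score(summary: str, profile: str) -> float:
    """Illustrative placeholder: fraction of profile tokens found in the summary."""
    summary_tokens = set(summary.lower().split())
    profile_tokens = set(profile.lower().split())
    return len(profile_tokens & summary_tokens) / max(len(profile_tokens), 1)

def dynamic_personalization_scores(steps):
    """Score each (summary, profile) pair in temporal order."""
    return [overlap_score(summary, profile) for summary, profile in steps]

timeline = [
    ("Covers new phone releases and hardware specs.", "interested in phone hardware specs"),
    ("Covers new phone releases and hardware specs.", "now interested in phone camera reviews"),
]
scores = dynamic_personalization_scores(timeline)
print(scores)  # a drop over time signals failure to track the preference shift
```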
Existing datasets often lack the diversity needed to fully capture the nuances of user behavior, interactions, and context. This limitation restricts a model's ability to generalize and personalize effectively across varied user profiles.
🎯 Goal: To boost adaptability and personalization by using data augmentation to simulate diverse user behaviors, enriching training data and enhancing model performance.
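A minimal sketch of the augmentation idea, assuming user preferences can be expressed as short free-text profiles: synthetic profiles are composed by sampling and combining interest facets. The facet lists and profile template below are illustrative, not the project's actual augmentation scheme.

```python
# Minimal sketch: augmenting training data with synthetic user profiles by
# sampling and combining interest facets (topic, style, focus).
import random

TOPICS = ["sports", "finance", "health", "technology", "travel"]
STYLES = ["bullet points", "one-paragraph overview", "detailed analysis"]
FOCUSES = ["key statistics", "expert opinions", "practical takeaways"]

def synthesize_profile(rng: random.Random) -> str:
    """Compose one synthetic user profile from randomly sampled facets."""
    topic = rng.choice(TOPICS)
    style = rng.choice(STYLES)
    focus = rng.choice(FOCUSES)
    return f"Interested in {topic}; prefers {style} emphasizing {focus}."

rng = random.Random(42)
synthetic_profiles = [synthesize_profile(rng) for _ in range(5)]
for profile in synthetic_profiles:
    print(profile)
```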
Personalized text summarization models often assume static user preferences, relying primarily on explicit input signals without accounting for the natural evolution of user interests over time. However, effectively capturing how a user's focus and behavior change is essential for deeper personalization. To address this limitation, our ongoing work introduces a dynamic approach that leverages evolving representations of user interests.
🎯 Goal: To guide summarization models toward producing more relevant and personalized outputs without altering model architectures.
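A minimal sketch of one way such an evolving representation could be maintained and used without touching the model: an exponentially decayed interest vector is updated from each interaction and used to rank preference statements that are prepended to the summarization prompt. The decay rate, toy embedder, and prompt format are all illustrative assumptions.

```python
# Minimal sketch: an exponentially decayed representation of user interests,
# updated from each new interaction and used to select preference statements
# for the summarization prompt (no change to the model itself).
import hashlib
import numpy as np

def toy_embed(text: str, dim: int = 64) -> np.ndarray:
    """Deterministic toy stand-in for a real text embedder."""
    seed = int(hashlib.md5(text.encode()).hexdigest(), 16) % (2**32)
    vec = np.random.default_rng(seed).standard_normal(dim)
    return vec / np.linalg.norm(vec)

class EvolvingUserInterest:
    """Exponentially decayed running representation of a user's interests."""

    def __init__(self, dim: int = 64, decay: float = 0.8):
        self.state = np.zeros(dim)
        self.decay = decay  # higher decay keeps older interests around longer

    def update(self, interaction_text: str) -> None:
        """Blend the newest interaction into the running interest vector."""
        self.state = self.decay * self.state + (1 - self.decay) * toy_embed(interaction_text)

    def rank_preferences(self, candidates: list[str]) -> list[str]:
        """Order candidate preference statements by similarity to current interests."""
        return sorted(candidates, key=lambda c: -float(toy_embed(c) @ self.state))

user = EvolvingUserInterest()
for interaction in ["marathon training plans", "injury recovery tips", "running shoe reviews"]:
    user.update(interaction)

preferences = ["focus on training advice", "focus on product reviews", "focus on nutrition"]
prompt_prefix = "User preferences: " + "; ".join(user.rank_preferences(preferences)[:2])
print(prompt_prefix)  # prepended to the summarization prompt, no architecture changes needed
```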
Large Language Models (LLMs) often operate as black boxes, making it difficult for users to understand how and why specific outputs are generated—limiting trust, transparency, and responsible adoption.
🎯 Goal: To improve LLM interpretability by developing techniques that reveal model reasoning, build trust, and enable transparent, responsible use.