Publications

PerSEval: Assessing Personalization in Text Summarizers

Transactions on Machine Learning Research (2024)

Personalized summarization models cater to individuals' subjective understanding of saliency, as represented by their reading history and current topics of attention. Existing personalized text summarizers are primarily evaluated with accuracy measures such as BLEU, ROUGE, and METEOR. However, a recent study argued that accuracy measures are inadequate for evaluating the degree of personalization of these models and proposed EGISES, the first metric to evaluate personalized text summaries. It suggested that accuracy is a separate aspect and should be evaluated on its own. In this paper, we challenge the necessity of an accuracy leaderboard, suggesting that relying on accuracy-based aggregated results might lead to misleading conclusions. To support this, we delve deeper into EGISES, demonstrating both theoretically and empirically that it measures the degree of responsiveness, a necessary but not sufficient condition for degree-of-personalization. We subsequently propose PerSEval, a novel measure that satisfies the required sufficiency condition. Based on the benchmarking of ten SOTA summarization models on the PENS dataset, we empirically establish that: (i) PerSEval is reliable w.r.t. human-judgment correlation (Pearson's r = 0.73; Spearman's ρ = 0.62; Kendall's τ = 0.42), (ii) PerSEval has high rank-stability, (iii) PerSEval as a rank-measure is not entailed by EGISES-based ranking, and (iv) PerSEval can be a standalone rank-measure without the need for any aggregated ranking.
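As a rough illustration of the human-judgment reliability check reported above, the three correlation coefficients can be computed with SciPy; the per-model score arrays below are hypothetical, not the PENS benchmark data.

```python
# Illustrative only: hypothetical scores, not the actual PENS benchmark results.
from scipy.stats import pearsonr, spearmanr, kendalltau

# Hypothetical per-model scores from an evaluation measure and from human judges.
metric_scores    = [0.41, 0.35, 0.52, 0.47, 0.30, 0.58, 0.44, 0.39, 0.50, 0.33]
human_judgments  = [0.45, 0.30, 0.55, 0.43, 0.28, 0.60, 0.40, 0.42, 0.48, 0.35]

r,   _ = pearsonr(metric_scores, human_judgments)    # linear correlation
rho, _ = spearmanr(metric_scores, human_judgments)   # rank correlation
tau, _ = kendalltau(metric_scores, human_judgments)  # pairwise rank agreement
print(f"Pearson r={r:.2f}, Spearman rho={rho:.2f}, Kendall tau={tau:.2f}")
```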


Are Large Language Models In-Context Personalized Summarizers? Get an iCOPERNICUS Test Done!

Empirical Methods in Natural Language Processing (2024)

Large Language Models (LLMs) have succeeded considerably in In-Context-Learning (ICL) based summarization. However, saliency is subject to users' specific preference histories, so we need reliable In-Context Personalization Learning (ICPL) capabilities within such LLMs. For an arbitrary LLM to exhibit ICPL, it needs the ability to discern contrast in user profiles. A recent study proposed EGISES, the first measure of degree-of-personalization. EGISES measures a model's responsiveness to user profile differences, but it cannot test whether a model utilizes all three types of cues provided in ICPL prompts: (i) example summaries, (ii) users' reading histories, and (iii) contrast in user profiles. To address this, we propose the iCOPERNICUS framework, a novel In-COntext PERsonalization learNIng sCrUtiny of Summarization capability in LLMs, which uses EGISES as a comparative measure. As a case study, we evaluate 17 state-of-the-art LLMs based on their reported ICL performance and observe that 15 models' ICPL degrades (min: 1.6%; max: 3.6%) when probed with richer prompts, thereby showing a lack of true ICPL.


Accuracy is not enough: Evaluating Personalization in Summarizers

Empirical Methods in Natural Language Processing (2023)

Text summarization models are evaluated in terms of their accuracy and quality using various measures such as ROUGE, BLEU, METEOR, BERTScore, PYRAMID, readability, and several other recently proposed ones. The central objective of all accuracy measures is to evaluate the model's ability to capture saliency accurately. Since saliency is subjective w.r.t. readers' preferences, there cannot be a one-size-fits-all summary for a given document. This means that in many use cases, summarization models need to be personalized w.r.t. user profiles. However, to our knowledge, there is no measure for evaluating the degree-of-personalization of a summarization model. In this paper, we first establish that existing accuracy measures cannot evaluate the degree of personalization of any summarization model, and then propose a novel measure, called EGISES, for computing it automatically. Using the PENS dataset released by Microsoft Research, we analyze the degree of personalization of ten different state-of-the-art summarization models (both extractive and abstractive), five of which are explicitly trained for personalized summarization while the remaining are appropriated to exhibit personalization. We conclude by proposing P-Accuracy, a generalized accuracy measure that also takes personalization into account, and demonstrate its robustness and reliability through meta-evaluation.


AutoReco: A Tool for Recommending Requirements for their Non-Conformance with Requirement Templates (RTs)

IEEE 31st International Requirements Engineering Conference (RE) (2023)

RTs generally possess a fixed syntactic structure comprising pre-defined slots, and requirements written in the format of RTs must conform to the template structure. When requirements do not conform to an RT, manually rewriting them to adhere to its structure is tedious. In this paper, we develop the AutoReco tool, which automatically provides recommendations for functional requirements that do not conform to requirement templates (RTs). Our preliminary results on nine case studies show an accuracy of 83.9% in providing recommendations for non-conformant requirements.


Inline Citation Classification Using Peripheral Context and Time-Evolving Augmentation

Lecture Notes in Computer Science (2023)

Citation plays a pivotal role in determining the associations among research articles. It portrays essential information in indicative, supportive, or contrastive studies. The task of inline citation classification aids in extrapolating these relationships; however, existing studies are still immature and demand further scrutiny. Current datasets and methods used for inline citation classification rely only on citation-marked sentences, constraining the model to turn a blind eye to domain knowledge and neighboring contextual sentences. In this paper, we propose a new dataset, named 3Cext, which, along with the cited sentences, provides discourse information using the vicinal sentences to analyze contrasting and entailing relationships as well as domain information. We propose PeriCite, a Transformer-based deep neural network that fuses peripheral sentences and domain knowledge. Our model achieves state-of-the-art results on the 3Cext dataset in terms of F1 against the best baseline. We conduct extensive ablations to analyze the efficacy of the proposed dataset and model fusion methods.
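A minimal sketch of the underlying idea (not the PeriCite architecture itself): pair the citation-marked sentence with its neighbouring peripheral sentences and feed both to a pretrained Transformer classifier. The encoder name, label set, and example sentences below are illustrative assumptions.

```python
# Illustrative sketch, not PeriCite: a pretrained encoder classifying a
# citation sentence paired with its peripheral context. The classification
# head here is untrained, so the output is only a demonstration of the flow.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "allenai/scibert_scivocab_uncased"  # assumed choice of encoder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

citing = "Prior work #CITATION_TAG showed that wider context windows help."
peripheral = "We build on this idea. Our setup differs in the choice of encoder."

# Encode the citation sentence and its neighbouring (peripheral) sentences as a pair.
inputs = tokenizer(citing, peripheral, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits

labels = ["background", "method", "result"]  # assumed label set for illustration
print(labels[logits.argmax(-1).item()])
```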


Consistency Analysis of NLP Approaches for a Conference Reviewer-Manuscript Match-Making System

IEEE 18th India Council International Conference (2021)

The peer-review process is an important part of scholarly communication, and the quality of a conference depends on it. Selecting a competent reviewer to review a manuscript submitted to a conference is a crucial facet of the peer-review process. This selection relies on the adopted match-making approach along with the constraint-optimization-based reviewer-allocation algorithm. The match-making approach needs to be consistent in its selection and allocation of reviewers. In this work, we propose a framework for evaluating the consistency of various standard NLP approaches used for the match-making process at a conference. The consistency analysis was performed over a real multi-tracked conference organized in 2019. We show that Contextual Neural Topic Modeling (CNTM) with a word-embedding technique was the most consistent among the 13 approaches we chose to analyze.


A Consistency Analysis of Different NLP Approaches for Reviewer-Manuscript Matchmaking

Lecture Notes in Computer Science (2021)

Selecting a potential reviewer for a manuscript submitted to a conference is a crucial task for the quality of the peer-review process, which ultimately determines the success and impact of the conference. The approach adopted to find potential reviewers needs to be consistent in its allocation decisions. In this work, we propose a framework for evaluating the reliability of different NLP approaches implemented for the match-making process. We bring various algorithmic approaches from different paradigms, together with Erie, an existing system deployed at the IEEE INFOCOM conference, onto a common platform to study the consistency with which they predict the set of potential reviewers for a given manuscript. The consistency analysis was performed over an actual multi-track conference organized in 2019. We conclude that Contextual Neural Topic Modeling (CNTM) with a balanced combinatorial optimization technique showed the best consistency among all the approaches we chose to study.
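One simple way to quantify this kind of consistency (an illustrative sketch, not the paper's exact protocol) is to measure the overlap between the reviewer sets an approach predicts for the same manuscript across repeated runs, for example via mean pairwise Jaccard overlap.

```python
# Illustrative sketch: mean pairwise Jaccard overlap between reviewer sets
# predicted for the same manuscript across repeated runs of one approach.
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if (a | b) else 1.0

def consistency(runs: list) -> float:
    """Mean pairwise Jaccard overlap over all runs for one manuscript."""
    pairs = list(combinations(runs, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)

# Hypothetical reviewer sets returned by three runs of one match-making approach.
runs = [{"r1", "r2", "r3"}, {"r1", "r2", "r4"}, {"r1", "r3", "r4"}]
print(f"consistency = {consistency(runs):.2f}")
```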


Faster Private Rating Update via Integer-Based Homomorphic Encryption

Lecture Notes in Computer Science (2021)

In encryption-based privacy-preserving recommender systems (PPRS), the user sends encrypted ratings to the server. An encrypted rating vector can contain thousands of ciphertexts, causing significant communication overhead. In some encryption-based PPRS proposed in the literature, a user who wants to rate a single item is required to send the entire rating vector to hide which item was rated. Both a user's rating value and the item being rated should remain private. This can be seen as a variant of the classical PIR-write problem: each time a user wants to modify a data block, the communication from the user should be minimal.

In encryption-based PPRS, the ratings must be encrypted using homomorphic schemes so that the server can generate recommendations. Arjan proposed a private rating update protocol for recommender system applications, whereas Lipmaa and Zhang gave a protocol for a more general database scenario. We propose a hybrid approach that combines the advantages of each protocol, yielding a more efficient protocol. Our approach has constant user-side computation, and it reduces the communication and computation overhead on the server side compared to previous approaches.
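As a minimal sketch of the additive-homomorphic idea behind such rating updates (not the hybrid protocol proposed here), a user can send an encrypted one-hot update vector that the server adds to the stored encrypted ratings without learning which item changed. The `phe` Paillier library and the toy vector size are assumptions for illustration.

```python
# Minimal sketch of an additively homomorphic rating update (Paillier),
# not the hybrid protocol described above. Requires the `phe` package.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=1024)

# Server stores the user's encrypted rating vector (4 items, initially 0).
stored = [public_key.encrypt(0) for _ in range(4)]

# User rates item 2 with value 5: an encrypted one-hot delta hides which
# entry is non-zero from the server.
delta = [public_key.encrypt(5 if i == 2 else 0) for i in range(4)]

# Server updates blindly: ciphertext addition corresponds to plaintext addition.
stored = [s + d for s, d in zip(stored, delta)]

# Only the key holder can decrypt and see the updated ratings.
print([private_key.decrypt(c) for c in stored])  # -> [0, 0, 5, 0]
```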


Improving Access to Science for Social Good

Communications in Computer and Information Science (2020)

One of the major goals of science is to make the world a socially good place to live. The old paradigm of scholarly communication through publishing has generated an enormous amount of heterogeneous data and metadata. However, most scientific results are not easily discoverable, in particular those that benefit social good and are also targeted at non-scientists. In this paper, we showcase a knowledge graph embedding (KGE) based recommendation system to be used by students involved in activities aimed at social good. The proposed recommendation system has been trained on a scholarly knowledge graph constructed for this specific goal. The results highlight that the KGEs successfully encode the structure of the KG and that, therefore, our system can provide valuable recommendations.
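A minimal illustration of the embedding idea (a TransE-style scoring rule is assumed here, not necessarily the embedding model used in the paper): entities and relations are vectors, and a candidate recommendation is scored by how well head + relation approximates tail.

```python
# Illustrative TransE-style scoring for KG-based recommendation; the random
# embeddings and toy triples are made up, not the paper's trained model.
import numpy as np

rng = np.random.default_rng(0)
dim = 8
entity_emb = {e: rng.normal(size=dim) for e in ["student_project", "paper_A", "paper_B"]}
relation_emb = {"relevant_to": rng.normal(size=dim)}

def transe_score(head: str, relation: str, tail: str) -> float:
    """Higher is better: -||h + r - t|| under the TransE assumption."""
    h, r, t = entity_emb[head], relation_emb[relation], entity_emb[tail]
    return -float(np.linalg.norm(h + r - t))

# Rank candidate papers for a social-good project by score.
candidates = ["paper_A", "paper_B"]
ranked = sorted(candidates,
                key=lambda c: transe_score("student_project", "relevant_to", c),
                reverse=True)
print(ranked)
```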


Item-Based Privacy-Preserving Recommender System with Offline Users and Reduced Trust Requirements

Lecture Notes in Computer Science (2019)

Safeguarding the privacy of ratings assigned by users is an important issue for recommender systems. Several existing protocols allow a server to generate recommendations from homomorphically encrypted ratings, thereby ensuring the privacy of rating data. After collecting the encrypted ratings, the server may require further interaction with each user, which is problematic if some users go offline. To solve this offline-user problem, previous solutions use additional semi-honest third parties. In this paper, we propose a privacy-preserving recommender system that does not suffer from the offline-user problem and, unlike previous works, does not require any additional third party. We demonstrate experimentally that the time required to generate recommendations is acceptable for practical applications.


An Enhanced Privacy-Preserving Recommender System

Communications in Computer and Information Science (2019)

A recommender system stores historical data collected over a long period from various users; this data is used to predict how new and existing users would rate an item. Since user data is stored by the system, this poses a threat to users' privacy. The goal of a privacy-preserving recommender system is to hide user ratings from the system while still allowing it to make recommendations.

A recent example is the privacy-preserving recommender scheme proposed by Badsha, Yi and Khalil. Their scheme assumes that the server is semi-honest; however, when the server is malicious, an attack is possible, as shown by Mu, Shao and Miglani. In this paper, we propose a simple modification to their scheme that preserves the privacy of ratings against a malicious server. We demonstrate that the computation and communication costs of the modified protocol are reasonable in comparison to the original protocol.


SimDoc: Topic Sequence Alignment based Document Similarity Framework

K-CAP: Proceedings of the 9th Knowledge Capture Conference (2017)

Document similarity is the problem of estimating the degree to which a given pair of documents has similar semantic content. An accurate document similarity measure can improve several enterprise-relevant tasks such as document clustering, text mining, and question answering. In this paper, we show that a document's thematic flow, which is often disregarded by bag-of-words techniques, is pivotal in estimating document similarity. To this end, we propose a novel semantic document similarity framework, called SimDoc. We model documents as topic sequences, where topics represent latent generative clusters of related words, and then use a sequence alignment algorithm to estimate their semantic similarity. We further conceptualize a novel mechanism to compute topic-topic similarity to fine-tune our system. In our experiments, we show that SimDoc outperforms many contemporary bag-of-words techniques in accurately computing document similarity and on practical applications such as document clustering.
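A simplified sketch of the core idea (not the exact SimDoc algorithm): represent each document as a sequence of topic IDs and align the two sequences with Needleman-Wunsch-style dynamic programming, using a topic-topic similarity function in place of exact matches. The topic sequences and similarity values below are illustrative.

```python
# Simplified sketch of topic-sequence alignment for document similarity;
# the topic similarity function and sequences are illustrative, not SimDoc's.

def align_score(seq_a, seq_b, topic_sim, gap=-0.5):
    """Needleman-Wunsch-style global alignment score over topic sequences."""
    n, m = len(seq_a), len(seq_b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + topic_sim(seq_a[i - 1], seq_b[j - 1])
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    return dp[n][m]

# Hypothetical topic-topic similarity: 1 for identical topics, 0.4 for "related" ones.
related = {(0, 1), (1, 0)}
topic_sim = lambda a, b: 1.0 if a == b else (0.4 if (a, b) in related else 0.0)

doc_a = [0, 1, 2, 3]   # topic sequence of document A
doc_b = [1, 1, 2, 3]   # topic sequence of document B
print(align_score(doc_a, doc_b, topic_sim))
```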


s-Birds Avengers: A Dynamic Heuristic Engine-Based Agent for the Angry Birds Problem

IEEE Transactions on Computational Intelligence and AI in Games (2016)

Angry Birds is a popular video game in which a set of birds is launched from a slingshot (bird shots) so as to kill pigs that are protected by a structure composed of different building blocks. The fewer birds we use and the more blocks we destroy, the higher the score we achieve. The AIBirds competition is an AI challenge in which an intelligent bot must be developed to play the game without human intervention. In this paper, we describe the approach implemented in s-Birds Avengers, the bot that participated in the ECAI AIBirds 2014 competition. Heuristic techniques were designed to analyze unseen structures using various structural parameters and then to discover their vulnerable points using a previously trained parameter-learning algorithm. The bot uses this analysis to decide where to hit the structure with the birds.


AskNow: A Framework for Natural Language Query Formalization in SPARQL

Lecture Notes in Computer Science (2016)

Natural Language Query Formalization involves semantically parsing queries in natural language and translating them into their corresponding formal representations. It is a key component for developing question-answering (QA) systems over RDF data, where the chosen formal representation language is often SPARQL. In this paper, we propose a framework, called AskNow, in which users pose queries in English to a target RDF knowledge base (e.g., DBpedia); the queries are first normalized into an intermediary canonical syntactic form, called Normalized Query Structure (NQS), and then translated into SPARQL queries. NQS facilitates identifying the desire (the expected output information) and the user-provided input information, and establishing their mutual semantic relationship, while remaining sufficiently adaptive to query paraphrasing. We empirically evaluate the framework with respect to the syntactic robustness of NQS and the semantic accuracy of the SPARQL translator on standard benchmark datasets.
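An illustrative sketch of the normalization-then-translation step (the NQS fields and the URI mapping below are simplified assumptions, not the full AskNow pipeline): a question such as "What is the capital of India?" is reduced to an NQS-like structure identifying the desire, the relation, and the input entity, which is then serialized as a SPARQL query over DBpedia.

```python
# Simplified sketch: an NQS-like structure turned into a SPARQL query.
# Field names and the resource/ontology mapping are illustrative, not AskNow's exact format.

def nqs_to_sparql(nqs: dict) -> str:
    """Translate a tiny NQS-like dict into a SPARQL SELECT over DBpedia."""
    return (
        f"SELECT ?{nqs['desire']} WHERE {{\n"
        f"  <http://dbpedia.org/resource/{nqs['input']}> "
        f"<http://dbpedia.org/ontology/{nqs['relation']}> ?{nqs['desire']} .\n"
        f"}}"
    )

# "What is the capital of India?" -> desire: capital, relation: capital, input: India
nqs = {"desire": "capital", "relation": "capital", "input": "India"}
print(nqs_to_sparql(nqs))
```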


SMARTSPACE: Multiagent Based Distributed Platform for Semantic Service Discovery

IEEE Transactions on Systems, Man, and Cybernetics: Systems (2014)

Service discovery is an integral issue in the area of service-oriented computing (SOC). Centralized, platform-based service discovery suffers from major drawbacks such as poor scalability and a single point of failure, while a P2P-based design incurs high maintenance overhead for the distributed service registry and querying tasks. In this paper, we propose SMARTSPACE, a hybrid multiagent-based distributed platform for efficient semantic service discovery. By utilizing reactive agents to model services, users' requests, and the registry-management middleware, the proposed service discovery algorithm, SmartDiscover, achieves fast, scalable, parallel, and concurrent service discovery within a systemic environment that can be highly dynamic, asynchronous, and concurrent. We conducted the SmartDiscover experiments within the JADE 3.7 agent framework on top of both IBM Cloud Cluster and NetLogo simulation environments. The results show promising outcomes in terms of average query response time and the number of message exchanges needed to maintain the distributed registry. The accuracy of SmartDiscover was measured and compared with the widely accepted benchmark OWL-S MX approach.