Publications

Selected publications and manuscripts related to large language models, robust NLP, and machine learning.

2026 · Manuscript in Preparation
Tiwari, K., Zhang, L. Counterfactual Text Generation via Geometric Style Control in Large Language Models.

This ongoing research investigates counterfactual text generation in large language models through structured representation-level interventions. The work explores how stylistic attributes can be modeled as directional structures in latent embedding space, enabling controlled modification of attributes such as writing style while preserving semantic meaning.

The proposed framework integrates contrastive learning, semantic regularization, and identity-preserving constraints to guide latent representations toward target attribute anchors while minimizing semantic drift. The framework is evaluated with automatic metrics, LLM-based judges, and geometric analysis of the learned latent space to study controllable attribute manipulation in language models.
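The core geometric idea, modeling a stylistic attribute as a direction in embedding space and steering representations along it, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the direction is estimated as a simple difference of class means (the names `style_direction` and `steer` are hypothetical), whereas the actual framework adds contrastive and identity-preserving objectives.

```python
import numpy as np

def style_direction(target_embs, source_embs):
    """Estimate a unit style direction as the difference of the mean
    embeddings of a target-style corpus and a source-style corpus."""
    d = target_embs.mean(axis=0) - source_embs.mean(axis=0)
    return d / np.linalg.norm(d)

def steer(h, direction, alpha=1.0):
    """Project out the representation's current component along the
    style axis, then move it toward the target attribute anchor by
    alpha along that same axis, leaving the orthogonal (semantic)
    subspace untouched."""
    proj = (h @ direction) * direction
    return h - proj + alpha * direction
```

Because only the component along the style axis is modified, the steered vector keeps its projection onto the orthogonal complement, which is the intuition behind "changing style while preserving semantic meaning."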

Large Language Models · Counterfactual Generation · Controllable Text Generation · Representation Learning · Causal NLP
2024 · Published
Tiwari, K., Zhang, L. (2024). Implications of Minimum Description Length for Adversarial Attack in Natural Language Processing. Entropy, 26(5), 354.

This work studies adversarial robustness in natural language processing from an information-theoretic perspective. Instead of directly modeling adversarial perturbations, the approach treats the attack process as a complex causal mechanism and quantifies its algorithmic information using the Minimum Description Length (MDL) framework.

Using masked language modeling, the method estimates the amount of information required to transform an original text into its adversarially modified version. This signal can then be used to identify altered tokens and detect adversarial manipulation even without access to the original text.
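The MDL intuition described above, pricing each adversarial edit by the code length of the substituted token under a language model, can be sketched in a few lines. This is a simplified illustration under stated assumptions: `token_prob` stands in for a masked language model's probability of a token in context (here stubbed in the test with fixed values), and the function names are hypothetical, not the paper's API.

```python
import math

def description_length(orig_tokens, adv_tokens, token_prob):
    """Bits needed to encode the edits turning the original token
    sequence into its adversarial version: unchanged positions cost
    nothing extra, while each substituted token costs
    -log2 p(new_token | context) under the language model."""
    bits = 0.0
    for i, (orig, adv) in enumerate(zip(orig_tokens, adv_tokens)):
        if orig != adv:
            bits += -math.log2(token_prob(adv_tokens, i))
    return bits
```

Tokens the model finds improbable in context are expensive to describe, so positions with high per-token cost flag likely adversarial edits, which is how the signal can localize altered tokens even without the original text.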

Adversarial NLP · Minimum Description Length · Causal Inference · Robust NLP · Language Models
2022 · Published
Tiwari, K., Yuan, S., Zhang, L. (2022). Robust Hate Speech Detection via Mitigating Spurious Correlations. Proceedings of AACL-IJCNLP 2022 (Short Papers), 51–56.

This work proposes a robust hate speech detection model that mitigates spurious correlations between lexical cues and prediction labels. Traditional classifiers often rely on superficial correlations, making them vulnerable to adversarial word or character perturbations.

The proposed method formulates hate speech detection using a causal graph and quantifies spurious correlations through causal strength. A regularized entropy loss function is then introduced to reduce reliance on these correlations, improving robustness against adversarial attacks.
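The regularization idea can be sketched as a loss that discourages over-confident predictions on examples driven by spurious lexical cues. This is a hedged sketch, not the paper's exact objective: `spurious_score` stands in for the causal-strength estimate derived from the causal graph (how it is computed is outside this snippet), and the weighting scheme shown is an assumption.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over a 1-D logit vector."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def regularized_entropy_loss(logits, label, spurious_score, lam=0.1):
    """Cross-entropy plus an entropy regularizer: the more an example's
    prediction leans on spurious lexical cues (high spurious_score),
    the more a low-entropy (over-confident) output distribution is
    penalized, nudging the model away from shortcut features."""
    p = softmax(logits)
    cross_entropy = -np.log(p[label])
    entropy = -(p * np.log(p + 1e-12)).sum()
    # subtracting weighted entropy rewards calibrated uncertainty
    return cross_entropy - lam * spurious_score * entropy
```

With `spurious_score = 0` the loss reduces to plain cross-entropy, so the regularizer only acts where the causal analysis indicates reliance on superficial correlations.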

Hate Speech Detection · Causal Inference · Adversarial Robustness · Robust NLP · Text Classification