# Evaluating Authorship Verification Robustness Under Domain Shift and LLM-Based Rewriting

## Overview
This repository contains the code and experimental framework for my MSc dissertation at the University of Sheffield. The project investigates the robustness of transformer-based authorship verification (AV) models under challenging real-world conditions: domain shift (e.g., news articles vs. tweets) and adversarial rewriting using large language models (LLMs).
## Research Questions
- Domain Shift: Can authorship verification models reliably detect stylistic consistency across different genres when no adversarial rewriting is applied?
- Adversarial Robustness: How robust are these models to LLM-based adversarial rewriting (style obfuscation and impersonation) in same-domain texts?
- Combined Challenge: How do AV models perform when domain shift and adversarial attacks are combined?
## Key Findings
- DistilBERT showed superior robustness across all scenarios despite being the smallest model
- Domain shift impact: DistilBERT remained stable under domain shift (roughly a 1-point accuracy drop), while RoBERTa degraded sharply (roughly an 18-point accuracy drop)
- Adversarial attacks: Impersonation attacks reduced all models to near-random performance (ROC-AUC < 0.56)
- Combined challenges: When domain shift and impersonation were combined, performance approached random guessing
## External Dataset: CrossNews

This project uses the CrossNews dataset, included as a Git submodule, for the authorship verification experiments.

- Repository: `external/CrossNews` (https://github.com/mamarcus64/CrossNews)
- Description: A cross-genre authorship verification and attribution benchmark covering news articles and tweets
- Citation: M. Ma, "CROSSNEWS: A Cross-Genre Authorship Verification and Attribution Benchmark," AAAI, vol. 39, no. 23, pp. 24777-24785, Apr. 2025
## Models
Three transformer architectures were selected to represent different design trade-offs:
| Model | Description | Parameters | Context Length |
|---|---|---|---|
| DistilBERT | Lightweight, efficient baseline | 66M | 512 tokens |
| RoBERTa | Enhanced BERT with robust pretraining | 125M | 512 tokens |
| BigBird | Sparse attention for long sequences | 128M | 4096 tokens |
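The sketch below shows one way these architectures could be loaded as same-author/different-author pair classifiers with Hugging Face Transformers. The checkpoint names (`distilbert-base-uncased`, `roberta-base`, `google/bigbird-roberta-base`), the two-label head, and the label ordering are illustrative assumptions, not necessarily the exact training configuration used in the dissertation.

```python
# Minimal sketch: loading each architecture as a binary authorship verification
# pair classifier. Checkpoint names and max lengths are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSequenceClassification

CHECKPOINTS = {
    "distilbert": ("distilbert-base-uncased", 512),
    "roberta": ("roberta-base", 512),
    "bigbird": ("google/bigbird-roberta-base", 4096),
}

def load_av_model(name: str):
    """Return (tokenizer, model) with a two-class verification head."""
    checkpoint, max_len = CHECKPOINTS[name]
    tokenizer = AutoTokenizer.from_pretrained(checkpoint, model_max_length=max_len)
    model = AutoModelForSequenceClassification.from_pretrained(checkpoint, num_labels=2)
    return tokenizer, model

# Example: encode a text pair as a single sequence for verification.
tokenizer, model = load_av_model("distilbert")
inputs = tokenizer("text by a known author", "text of unknown authorship",
                   truncation=True, return_tensors="pt")
logits = model(**inputs).logits  # scores for the two classes (label order assumed)
```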
## Adversarial Attacks
Two LLM-based attack strategies using Flan-T5-Large:
- Style Obfuscation: Untargeted paraphrasing to conceal authorial cues
- Style Impersonation: Targeted rewriting to mimic another author’s style
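A minimal sketch of how such rewrites can be generated with Flan-T5-Large follows; the prompt wording and generation settings are illustrative assumptions rather than the exact attack prompts used in the experiments.

```python
# Sketch of LLM-based rewriting with Flan-T5-Large. The prompts below are
# illustrative; the dissertation's exact prompt templates may differ.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-large")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-large")

def rewrite(text: str, target_sample: str | None = None) -> str:
    if target_sample is None:
        # Style obfuscation: untargeted paraphrase that conceals authorial cues.
        prompt = f"Paraphrase the following text in a neutral style:\n{text}"
    else:
        # Style impersonation: rewrite so the text reads like the target author.
        prompt = (f"Rewrite the following text in the style of this author sample:\n"
                  f"{target_sample}\nText:\n{text}")
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=256, do_sample=False)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```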
## Results Summary

### In-Domain Performance (Article-Article)
| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.8882 | 0.7999 | 0.8161 |
| RoBERTa | 0.8785 | 0.7946 | 0.8084 |
| BigBird | 0.8108 | 0.7321 | 0.7438 |
### Cross-Domain Performance (Article-Tweet)
| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.8711 | 0.7874 | 0.8006 |
| RoBERTa | 0.8703 | 0.6127 | 0.4880 |
| BigBird | 0.8149 | 0.6719 | 0.5891 |
### Under Adversarial Attacks (Worst Case: Impersonation + Domain Shift)
| Model | ROC-AUC | Accuracy | F1 Score |
|---|---|---|---|
| DistilBERT | 0.5590 | 0.5406 | 0.5431 |
| RoBERTa | 0.5587 | 0.5391 | 0.5455 |
| BigBird | 0.5444 | 0.5316 | 0.5305 |
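For reference, the metrics in the tables above can be computed from model scores and gold labels roughly as follows; the variable names and the 0.5 decision threshold are hypothetical placeholders, not the exact evaluation code.

```python
# Sketch: computing ROC-AUC, accuracy, and F1 with scikit-learn.
# `y_true` are gold same-author labels (0/1); `y_score` is the model's
# probability for the "same author" class. Both are placeholder values.
from sklearn.metrics import roc_auc_score, accuracy_score, f1_score

y_true = [1, 0, 1, 1, 0]
y_score = [0.91, 0.34, 0.58, 0.77, 0.45]
y_pred = [int(s >= 0.5) for s in y_score]  # 0.5 threshold assumed

print("ROC-AUC :", roc_auc_score(y_true, y_score))
print("Accuracy:", accuracy_score(y_true, y_pred))
print("F1 score:", f1_score(y_true, y_pred))
```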
## Interpretability Analysis
SHAP (SHapley Additive exPlanations) analysis revealed critical insights:
- Models often rely on platform-specific artifacts (hashtags, URLs) rather than genuine stylistic cues
- Punctuation patterns and function words dominate decisions under adversarial conditions
- Even correct predictions often stem from topic-related content rather than authorial style
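A minimal sketch of this kind of token-level SHAP analysis, assuming a Hugging Face `text-classification` pipeline over the verification model; the checkpoint name is a placeholder and treating the pair as a single input is a simplification for illustration.

```python
# Sketch: token-level SHAP attributions for a transformer classifier.
# A fine-tuned AV checkpoint would be used in practice; the base checkpoint
# and example text below are placeholders.
import shap
from transformers import pipeline

clf = pipeline("text-classification", model="distilbert-base-uncased", top_k=None)
explainer = shap.Explainer(clf)  # SHAP wraps transformers pipelines with a text masker

texts = ["Check this out! #breaking http://t.co/xyz we are live from the press room"]
shap_values = explainer(texts)

# Inspect which tokens drive the prediction, e.g. hashtags and URLs vs. function words.
shap.plots.text(shap_values[0])
```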
## Reproducibility
All experiments use fixed random seeds (7, 1001, 1211) for reproducibility.
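A sketch of how such seeding is typically applied across Python, NumPy, and PyTorch is shown below; the `seed_everything` helper is illustrative, and `transformers.set_seed` covers the same libraries in a single call.

```python
# Sketch: fixing all relevant random seeds for one experimental run.
import random
import numpy as np
import torch
from transformers import set_seed

SEEDS = (7, 1001, 1211)  # the fixed seeds used across experiments

def seed_everything(seed: int) -> None:
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    set_seed(seed)  # also seeds Python, NumPy, and PyTorch inside Transformers

for seed in SEEDS:
    seed_everything(seed)
    # train and evaluate one run with this seed here
```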
## Ethics Review
This project has been ethically reviewed and approved by the Ethics Committee of the University of Sheffield.