Tasks and Application Scenarios

Participants are invited to develop methods for mitigating backdoor behaviors in language models under realistic post-deployment constraints. The Anti-BAD Challenge addresses a common limitation faced by end users, for whom retraining or extensive data analysis is often infeasible. The goal is to improve model trustworthiness without access to training data, backdoor-specific knowledge, or clean reference checkpoints, reflecting the real-world challenges that arise when models are obtained from public or third-party sources.

The competition features three tracks—Classification, Multilingual, and Generation—each consisting of a Development and a Test phase. Participants will be provided with compromised model checkpoints and evaluation inputs, and are expected to submit their restored predictions or outputs. Performance will be assessed using two primary metrics: Attack Success Rate (ASR) and Clean Task Utility (CTU). An overview of each track is provided below.
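
For concreteness, the sketch below illustrates one common way these two metrics are computed for a classification-style track: CTU as accuracy on clean inputs and ASR as the fraction of triggered inputs still mapped to the attacker's target label. The function names and input formats (clean_preds, trigger_preds, target_label) are illustrative assumptions, not the official evaluation interface.

```python
# Illustrative sketch of the two metrics (assumed semantics, not the official scorer).
from typing import Sequence

def clean_task_utility(clean_preds: Sequence[int], clean_labels: Sequence[int]) -> float:
    """CTU: accuracy of the defended model on clean (trigger-free) inputs."""
    correct = sum(p == y for p, y in zip(clean_preds, clean_labels))
    return correct / len(clean_labels)

def attack_success_rate(trigger_preds: Sequence[int], target_label: int) -> float:
    """ASR: fraction of triggered inputs still mapped to the attacker's target label."""
    hits = sum(p == target_label for p in trigger_preds)
    return hits / len(trigger_preds)

# Toy run: 9/10 clean inputs classified correctly, 2/10 triggered inputs still flip.
ctu = clean_task_utility([1, 0, 1, 1, 0, 1, 0, 1, 1, 1],
                         [1, 0, 1, 1, 0, 1, 0, 1, 1, 0])
asr = attack_success_rate([1, 0, 0, 0, 1, 0, 0, 0, 0, 0], target_label=1)
print(f"CTU={ctu:.2f}, ASR={asr:.2f}")  # CTU=0.90, ASR=0.20
```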

Track A – Classification Backdoor Mitigation

This track evaluates defenses on standard classification tasks using compromised models. Participants aim to recover correct predictions on clean inputs while suppressing trigger-induced misclassifications.

Development Phase:

Test Phase:

Track B – Multilingual Backdoor Mitigation

This track focuses on multilingual LLMs compromised across multiple languages. Participants aim to neutralize malicious behaviors while preserving clean utility across diverse language settings.

Development Phase:

Test Phase:

Track C – Generation Backdoor Mitigation

This track evaluates defenses in generative tasks, where compromised models exhibit undesired generation behaviors. Participants are challenged to restore intended output quality while neutralizing malicious triggers.

Development Phase:

Test Phase:

Evaluation Protocol

Submissions will be evaluated on their ability to mitigate backdoor behavior while preserving clean-task performance. ASR measures how often triggered inputs still elicit the attacker-intended behavior (lower is better), while CTU measures performance on clean inputs (higher is better).

Scoring: An Overall Score will be computed as:

Overall Score = CTU - ASR
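
As a rough illustration of how this formula behaves (assuming both metrics are reported on a 0 to 1 scale, which the official scorer may normalize differently), a perfect defense scores 1.0, while a model whose backdoor remains intact can score near or below zero even with high clean utility:

```python
def overall_score(ctu: float, asr: float) -> float:
    """Overall Score = CTU - ASR (both assumed to lie in [0, 1])."""
    return ctu - asr

print(f"{overall_score(0.90, 0.20):.2f}")  # 0.70: strong clean utility, small residual backdoor
print(f"{overall_score(0.92, 0.95):.2f}")  # -0.03: utility preserved but the backdoor is untouched
```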

Participants will be ranked within each track, with final rankings based on aggregated ranks across all three tracks. The competition promotes training-free, resource-efficient defenses that generalize across tasks, languages, and threat types, supporting more trustworthy and robust deployment of LLMs.
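
The precise aggregation rule is not spelled out here; the sketch below assumes a simple sum of per-track ranks (lower total is better), purely to illustrate how per-track results might combine into a final standing. Team names and scores are invented.

```python
# Hypothetical rank aggregation: sum per-track ranks; the organizers' actual rule may differ.
from typing import Dict, List

def rank_within_track(scores: Dict[str, float]) -> Dict[str, int]:
    """Rank teams by Overall Score within one track (1 = best)."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {team: i + 1 for i, team in enumerate(ordered)}

def aggregate_ranks(track_scores: List[Dict[str, float]]) -> Dict[str, int]:
    """Sum each team's rank across all tracks."""
    totals: Dict[str, int] = {}
    for scores in track_scores:
        for team, rank in rank_within_track(scores).items():
            totals[team] = totals.get(team, 0) + rank
    return totals

tracks = [
    {"team_a": 0.70, "team_b": 0.55},  # Track A overall scores (invented)
    {"team_a": 0.40, "team_b": 0.62},  # Track B
    {"team_a": 0.66, "team_b": 0.30},  # Track C
]
print(aggregate_ranks(tracks))  # {'team_a': 4, 'team_b': 5} -> team_a ranks first overall
```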