Verification Series
Fine-grained Activation Manipulation by Contrastive Orthogonal Unalignment for Large Language Model
22nd January 2026, 11:00
Yi Dong
UOL
Abstract
Large language models have been widely applied, but can inadvertently
encode sensitive or harmful information, raising significant safety concerns.
Machine unlearning has emerged to alleviate this concern; however, existing
training-time unlearning approaches, which rely on coarse-grained loss
combinations, struggle to precisely separate knowledge and to balance
removal effectiveness against model utility. In contrast, we propose Fine-grained
Activation manipuLation by Contrastive Orthogonal uNalignment (FALCON), a novel
representation-guided unlearning approach that leverages information-theoretic
guidance for efficient parameter selection, employs contrastive mechanisms to
enhance representation separation, and projects conflicting gradients onto
orthogonal subspaces to resolve interference between the forgetting and retention
objectives. Extensive experiments demonstrate that FALCON achieves superior
unlearning effectiveness while maintaining model utility, and exhibits robust
resistance to knowledge recovery attempts.
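The gradient-projection step mentioned in the abstract can be sketched as follows. This is a minimal illustration in the spirit of PCGrad-style gradient surgery, not the actual FALCON implementation; the function and variable names are assumptions for exposition only.

```python
import numpy as np

def project_conflicting_gradient(g_forget, g_retain):
    """Illustrative sketch: when the forgetting gradient opposes the
    retention gradient (negative inner product), remove the component
    of g_forget along g_retain, leaving only the part in the subspace
    orthogonal to the retention direction."""
    dot = np.dot(g_forget, g_retain)
    if dot < 0:
        g_forget = g_forget - (dot / np.dot(g_retain, g_retain)) * g_retain
    return g_forget

# Example: a forgetting gradient that conflicts with retention.
g_f = np.array([-1.0, 2.0])
g_r = np.array([1.0, 0.0])
g_proj = project_conflicting_gradient(g_f, g_r)
# The projected gradient no longer opposes the retention objective:
assert np.dot(g_proj, g_r) >= 0
```

In this sketch the forgetting update keeps its component orthogonal to the retention direction, which is one way to pursue forgetting without directly undoing retained behaviour.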
Ashton Street, Liverpool, L69 3BX
United Kingdom