Debiased Contrastive Pairs
Generate balanced positive and negative prompts, then keep only the concept-bearing token spans so the direction isolates the intended attribute.
A portrait of a frowning person in warm window light.
A portrait of a smiling person in warm window light.
A close-up shot of a frowning face beside a curtain.
A close-up shot of a smiling face beside a curtain.
A studio photo of a frowning figure wearing a blue shirt.
A studio photo of a smiling figure wearing a blue shirt.
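The pairing step above can be sketched in a few lines. This is a minimal illustration, not the authors' pipeline: the helper names `make_pairs` and `concept_span` are hypothetical, whitespace splitting stands in for whatever tokenizer the text encoder actually uses, and treating "smiling" as the positive side is an assumption.

```python
# Templates copied from the example prompts; only the concept word varies,
# so scene, framing, and clothing are balanced across the two sides.
TEMPLATES = [
    "A portrait of a {} person in warm window light.",
    "A close-up shot of a {} face beside a curtain.",
    "A studio photo of a {} figure wearing a blue shirt.",
]


def make_pairs(templates, negative_word, positive_word):
    """Return (negative_prompt, positive_prompt) tuples, one per template."""
    return [(t.format(negative_word), t.format(positive_word)) for t in templates]


def concept_span(prompt, concept_word):
    """Return the [start, end) index range of the concept word.

    Assumption: whitespace tokens stand in for real tokenizer output.
    """
    tokens = prompt.split()
    for i, tok in enumerate(tokens):
        if tok.strip(".,").lower() == concept_word.lower():
            return (i, i + 1)
    raise ValueError(f"{concept_word!r} not found in prompt")


# Assumption: "smiling" is the positive side, "frowning" the negative side.
pairs = make_pairs(TEMPLATES, "frowning", "smiling")
spans = [
    (concept_span(neg, "frowning"), concept_span(pos, "smiling"))
    for neg, pos in pairs
]
```

Because both prompts in a pair come from the same template, the concept word lands at the same position on each side, and everything outside that span is identical and can be discarded.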
Difference-of-Means Direction
Pool the selected token embeddings, average each side, take the difference of the two means, and normalize once to obtain the global steering direction d_s.
Difference-of-means over pooled concept-token embeddings.
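A minimal NumPy sketch of the computation described above. It assumes each prompt contributes an array of shape (n_tokens, dim) holding only its concept-token embeddings; the function name `difference_of_means`, the positive-minus-negative sign convention, and the toy 2-dimensional values are illustrative assumptions rather than details from the source.

```python
import numpy as np


def difference_of_means(pos_tokens, neg_tokens, eps=1e-12):
    """Return a unit-norm direction from pooled concept-token embeddings.

    pos_tokens, neg_tokens: lists of arrays, each of shape (n_tokens, dim).
    """
    # Pool: stack every selected token from every prompt on each side.
    pos = np.concatenate(pos_tokens, axis=0)
    neg = np.concatenate(neg_tokens, axis=0)

    # Average each side, then take the difference of the two means.
    # Assumption: the direction points from the negative to the positive side.
    diff = pos.mean(axis=0) - neg.mean(axis=0)

    # Normalize once, at the end, to get a unit-length direction.
    norm = np.linalg.norm(diff)
    if norm < eps:
        raise ValueError("the two means coincide; direction is undefined")
    return diff / norm


# Toy embeddings (hypothetical values, dim = 2).
pos_tokens = [np.array([[1.0, 0.0], [3.0, 0.0]])]
neg_tokens = [np.array([[0.0, 0.0]])]
d_s = difference_of_means(pos_tokens, neg_tokens)
```

Normalizing only after the subtraction keeps the direction determined by the gap between the two means, rather than by the norms of individual token embeddings.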