Georeactor Blog


Abliterating Refusal and Code LLMs

Tags: ml, codethrough

I wrote a post on Hugging Face about CodeLlama and the paper "Refusal in LLMs is mediated by a single direction":

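The post comes with a CodeLlama model with its refusal direction removed ("abliterated", so it no longer refuses the way the safety-tuned model does), and a second version where I doubled (2x'd) the intervention vector.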
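For readers who haven't seen the paper, here is a minimal NumPy sketch of the core idea. It is not the code from the post. The refusal direction is the normalized difference between mean residual-stream activations on harmful and harmless prompts. Ablating it subtracts each activation's projection onto that direction. Baking the same projection into a weight matrix that writes to the residual stream ("weight orthogonalization") produces a standalone model. The function names are hypothetical. The `scale` parameter is my assumption about what "2x'ing" the intervention means: `scale=1.0` removes the component, and `scale=2.0` subtracts it twice. In a real run, the activations would come from a chosen CodeLlama layer and token position, and the orthogonalization would be applied to every matrix that writes to the residual stream.

```python
import numpy as np


def refusal_direction(harmful_acts: np.ndarray, harmless_acts: np.ndarray) -> np.ndarray:
    """Difference-in-means direction, normalized to unit length.

    Both inputs have shape (n_prompts, d_model): residual-stream activations
    collected at one layer and token position (collection not shown here).
    """
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)


def ablate(x: np.ndarray, r_hat: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Subtract `scale` times the projection of x onto the unit vector r_hat.

    x has shape (d_model,) or (n, d_model). scale=1.0 is standard directional
    ablation; scale=2.0 (hypothetical "2x" setting) flips the component's sign.
    """
    proj = np.expand_dims(x @ r_hat, -1)  # (..., 1) projection coefficients
    return x - scale * proj * r_hat


def orthogonalize_weight(W: np.ndarray, r_hat: np.ndarray, scale: float = 1.0) -> np.ndarray:
    """Fold the ablation into a weight matrix that writes to the residual stream.

    W has shape (d_model, d_in), i.e. y = W @ v lands in the residual stream
    (the PyTorch nn.Linear layout for output projections). Returns
    W - scale * r_hat r_hat^T W.
    """
    return W - scale * np.outer(r_hat, r_hat @ W)


# Toy demo with synthetic activations: "harmful" prompts are shifted along dim 0.
rng = np.random.default_rng(0)
d_model = 16
harmless = rng.normal(size=(64, d_model))
harmful = rng.normal(size=(64, d_model)) + 5.0 * np.eye(d_model)[0]
r_hat = refusal_direction(harmful, harmless)

x = rng.normal(size=(4, d_model))
print("projection after ablation:  ", ablate(x, r_hat) @ r_hat)
print("projection after 2x ablation:", ablate(x, r_hat, scale=2.0) @ r_hat)
```

With `scale=1.0` every projection onto `r_hat` drops to (numerically) zero. With `scale=2.0` each projection ends up with the opposite sign instead of being removed, which pushes activations away from the refusal direction rather than just neutralizing it.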

Some takeaways: