Georeactor Blog


Abliterating Refusal and Code LLMs



Tags: ml, codethrough

I wrote a post on HuggingFace about CodeLlama and the paper "Refusal in Language Models Is Mediated by a Single Direction": https://huggingface.co/blog/monsoon-nlp/refusal-in-code-llms

The post comes with an abliterated CodeLlama model (its refusal/safety behavior removed), and another version where I doubled (2x) the intervention vector.
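For readers new to the paper, here is a minimal NumPy sketch of its core operations on synthetic data. It computes the refusal direction as a difference in means between activations on harmful and harmless prompts, removes that direction from activations (directional ablation), and orthogonalizes a weight matrix that writes into the residual stream. The `alpha` parameter and all function names are hypothetical. `alpha=1.0` matches the paper's ablation. Reading "2x'd the intervention vector" as `alpha=2.0` is my assumption and may not be exactly what the HuggingFace models do.

```python
import numpy as np

def refusal_direction(harmful_acts, harmless_acts):
    """Unit-norm difference-in-means direction.

    Inputs have shape (n_prompts, d_model): residual-stream activations
    collected at a single layer and token position.
    """
    r = harmful_acts.mean(axis=0) - harmless_acts.mean(axis=0)
    return r / np.linalg.norm(r)

def ablate_activations(x, r_hat):
    """Directional ablation on activations of shape (n, d_model): x - (x . r_hat) r_hat."""
    return x - np.outer(x @ r_hat, r_hat)

def orthogonalize_weight(W, r_hat, alpha=1.0):
    """Weight orthogonalization: W - alpha * r_hat r_hat^T W.

    W has shape (d_model, d_in), like a PyTorch nn.Linear weight whose output
    is written into the residual stream. alpha=1.0 removes the r_hat component;
    alpha=2.0 (a hypothetical reading of "2x") flips that component's sign.
    """
    return W - alpha * np.outer(r_hat, r_hat @ W)

# Synthetic stand-in data; real activations would come from running the model.
rng = np.random.default_rng(0)
d_model, d_in = 16, 8
true_dir = rng.normal(size=d_model)
harmless = rng.normal(size=(64, d_model))
harmful = rng.normal(size=(64, d_model)) + 3.0 * true_dir

r_hat = refusal_direction(harmful, harmless)
ablated = ablate_activations(harmful, r_hat)

W = rng.normal(size=(d_model, d_in))
W1 = orthogonalize_weight(W, r_hat, alpha=1.0)  # standard ablation
W2 = orthogonalize_weight(W, r_hat, alpha=2.0)  # hypothetical "2x" variant

# Since r_hat is unit-norm, r_hat^T W' = (1 - alpha) r_hat^T W.
print(np.abs(ablated @ r_hat).max())          # ~0 up to floating-point error
print(np.abs(r_hat @ W1).max())               # ~0 up to floating-point error
print(np.allclose(r_hat @ W2, -(r_hat @ W)))  # True
```

Orthogonalizing the weights bakes the ablation into the model, so no runtime hooks are needed. With `alpha=1.0` the edited matrix can no longer write anything along the refusal direction. With `alpha=2.0` it writes the negated component, which pushes activations away from refusal rather than just removing it.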

Some takeaways: