Starting Tiny with Protein LLaMA

Tags: mlbioseries

After getting sidetracked on my bio ML project, I wrote a post about recent developments:

The final LoRA model is inaccurate on the corn genes that I held out for evaluation. It seems to predict the same subcellular locations for every input. Next I should try larger models, the full pretraining dataset, different tokenization, and adding new tokens through PEFT.
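A quick way to confirm that kind of collapse is to count how often each label shows up in the predictions on the held-out set. This is a minimal sketch, not the code from the project: the `majority_share` helper and the example labels are hypothetical, made up purely for illustration.

```python
from collections import Counter


def majority_share(predictions):
    """Return the most common predicted label and the fraction of
    predictions it accounts for."""
    if not predictions:
        raise ValueError("predictions is empty")
    label, count = Counter(predictions).most_common(1)[0]
    return label, count / len(predictions)


# Hypothetical predictions for held-out proteins (illustrative labels,
# not real results from the model).
example_preds = ["Nucleus", "Nucleus", "Nucleus", "Cytoplasm"]
label, share = majority_share(example_preds)
print(f"{label}: {share:.0%} of predictions")
```

If one label accounts for nearly all predictions while the held-out set covers several locations, the model is probably ignoring the sequence rather than just making scattered mistakes.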

I also tried MergeKit, without success.

Here's the blog post: