Georeactor Blog
Cartoon ML - Part 5 - IDEFICS
Two developments:
I saw a tweet about IDEFICS, a multimodal instruct model released in August. It offers a newer and more accessible alternative to the InstructBLIP model, so I decided to give it a try. Running the 9B-param model still requires an A100.
HuggingFace also recently added community blog posts. For a while I thought my first post would be comments on 5-10 ML papers, but then I landed on the idea of writing about trying IDEFICS: https://huggingface.co/blog/monsoon-nlp/idefics-newyorker
I like the prompting setup:
prompts = [
    [
        "User: Describe all characters and setting of this cartoon in detail. It may be sardonic or absurdist.",
        image,  # PIL object or URL string
        "<end_of_utterance>",
    ],
]
Because each prompt is just a list that interleaves text and images, the same setup could be used for few-shot multimodal prompts, chat-like prompts, etc.
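As a rough sketch of the few-shot case, the list format above can be extended by placing a few solved examples before the final image. The helper name `build_few_shot_prompt`, the example URLs, and the example descriptions are all hypothetical placeholders. The "\nAssistant:" turns are an assumption based on the User/Assistant dialogue style of the instruct model, not something taken from my original prompt.

```python
from typing import Any, List, Sequence, Tuple

# An image may be a PIL object or a URL string, as in the prompt above.
Image = Any


def build_few_shot_prompt(
    instruction: str,
    examples: Sequence[Tuple[Image, str]],
    query_image: Image,
) -> List[Any]:
    """Interleave solved (image, answer) examples before the final query image.

    Hypothetical helper: it only builds the prompt list and does not call any model.
    """
    prompt: List[Any] = []
    for image, answer in examples:
        # Every user turn after the first starts on a new line.
        newline = "\n" if prompt else ""
        prompt += [
            f"{newline}User: {instruction}",
            image,
            "<end_of_utterance>",
            f"\nAssistant: {answer}<end_of_utterance>",
        ]
    newline = "\n" if prompt else ""
    # Leave the last assistant turn open so the model writes the description.
    prompt += [
        f"{newline}User: {instruction}",
        query_image,
        "<end_of_utterance>",
        "\nAssistant:",
    ]
    return prompt


# Placeholder URLs and descriptions, for illustration only.
prompts = [
    build_few_shot_prompt(
        "Describe all characters and setting of this cartoon in detail.",
        examples=[
            ("https://example.com/cartoon-1.jpg", "Two dogs sit at a bar."),
            ("https://example.com/cartoon-2.jpg", "A cat presents a chart in an office."),
        ],
        query_image="https://example.com/cartoon-3.jpg",
    )
]
```

The resulting `prompts` list has the same shape as the single-image prompt, so it could be passed to the model in its place.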