Cartoon ML - Part 5 - IDEFICS

Tags: mlcodethroughnycartoon

Two developments:

I saw a tweet about IDEFICS, a multimodal instruct model released in August. It offered a newer and more accessible alternative to InstructBLIP, so I decided to give it a try. Running the 9B-parameter model still requires an A100, though.
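A quick back-of-the-envelope check shows why a 9B-parameter model needs a large GPU. At 2 bytes per parameter in half precision, the weights alone take about 9e9 × 2 = 18e9 bytes (roughly 18 GB), before counting activations or the KV cache used during generation. The sketch below assumes a nominal 9e9 parameters; the real checkpoint's count may differ slightly, and the function name is just for illustration.

```python
# Bytes needed to store one parameter in common weight dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gib(n_params: float, dtype: str) -> float:
    """Memory for the model weights alone, in GiB (2**30 bytes).

    Ignores activations, the KV cache, the CUDA context and framework
    overhead, so actual usage during generation is higher.
    """
    return n_params * BYTES_PER_PARAM[dtype] / 2**30


# Nominal parameter count for the 9B model (an assumption, not an exact figure).
for dtype in BYTES_PER_PARAM:
    print(f"{dtype:>9}: {weight_memory_gib(9e9, dtype):5.1f} GiB")
```

The table makes the trade-off visible: lower-precision weights shrink the footprint proportionally, at some cost in output quality.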

HuggingFace also recently added community blog posts. For a while I considered making my first post a set of comments on 5-10 ML papers, but then I landed on the idea of writing about trying IDEFICS:

I like the prompting setup:

prompts = [[
    "User: Describe all characters and setting of this cartoon in detail. It may be sardonic or absurdist.",
    image,  # PIL object or URL string
    "<end_of_utterance>",
    "\nAssistant:",
]]

Because text and images are interleaved in a single list, the same format works for few-shot multimodal prompts, chat-like prompts, and so on.
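A few-shot prompt in this format is just a longer list: each example becomes a completed User/Assistant exchange, and the final User turn is left open for the model. The helper below is a hypothetical sketch, not part of any library. It assumes the `<end_of_utterance>` turn marker and the `User:`/`Assistant:` prefixes shown in the IDEFICS instruct prompt above, and it passes images (PIL objects or URL strings) through unchanged.

```python
from typing import Any, List, Sequence, Tuple

END = "<end_of_utterance>"  # end-of-turn marker used in IDEFICS instruct prompts


def build_prompt(
    instruction: str,
    examples: Sequence[Tuple[Any, str]],
    query_image: Any,
) -> List[Any]:
    """Build one interleaved text/image prompt (hypothetical helper).

    Each (image, answer) pair in `examples` becomes a finished exchange;
    the last User turn carries `query_image` and leaves the Assistant
    turn open for the model to complete.
    """
    prompt: List[Any] = []
    for image, answer in examples:
        # Every User turn after the first starts on a new line.
        prefix = "User:" if not prompt else "\nUser:"
        prompt += [f"{prefix} {instruction}", image, END, f"\nAssistant: {answer}{END}"]
    prefix = "User:" if not prompt else "\nUser:"
    prompt += [f"{prefix} {instruction}", query_image, END, "\nAssistant:"]
    return prompt


# Hypothetical file names and example answer, for illustration only.
instruction = "Describe all characters and setting of this cartoon in detail."
few_shot = build_prompt(instruction, [("cartoon_1.png", "A placeholder description.")], "cartoon_2.png")
prompts = [few_shot]  # a batch: one inner list per prompt
```

With an empty `examples` list, the helper reproduces the single-turn prompt from the post, so zero-shot and few-shot prompts share one code path.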