Georeactor Blog


ML Arxiv Haul #24

Tags: arxiv

Comments / Criticism

Great moments in academia: U. Penn and Nature celebrate Katalin Karikó's Nobel Prize in Medicine (for mRNA) without acknowledging that they effectively rejected her and her work - SlateStarCodex discussion - which also raised how a Lithuanian scientist submitted a paper on CRISPR but was rejected and scooped

There was a Tweet war about whether LLMs with text-speech interfaces could be therapists, or are at least interesting posing as therapists, and whether an OpenAI person talking about that is too cavalier. Recap.

OpenAI removes access to output token probabilities:

EleutherAI Tweeter comments on tokenization of numbers:

Satirical prompt for an LLM to reject every request as unsafe:


Survey looking for evidence of GPT use in papers:

Prompt engineering to convince GPT to read a CAPTCHA:


Aligning Large Multimodal Models with Factually Augmented RLHF

Large Multimodal Models (LMM) are built across modalities and the misalignment between two modalities can result in…

Trying to standardize the process of RLHF for a text + image model.

BayesDLL: Bayesian Deep Learning Library

We release a new Bayesian neural network library for PyTorch for large-scale deep networks. Our library implements…

A new library for probabilistic AI. The pre-trained model serves as the prior mean, and the library can estimate uncertainty, among other things.

Fine-tuning Aligned Language Models Compromises Safety, Even When Users Do Not Intend To!

Optimizing large language models (LLMs) for downstream use cases often involves the customization of pre-trained LLMs…

The team fine-tunes GPT-3.5 and, with just 10 examples, can break the safety features of the standard model. Even fine-tuning on benign questions can degrade some of the safety features (I think because it reminds the model to be universally helpful?).

A similar paper, Shadow Alignment, uses 100 malicious examples against several safety-aligned models.

How to Catch an AI Liar: Lie Detection in Black-Box LLMs by Asking Unrelated Questions

Large language models (LLMs) can "lie", which we define as outputting false statements despite "knowing" the truth in a…

This paper was circulating and puzzling everyone. After asking an LLM a question, they follow up with a series of unrelated questions: some known factual questions, and some offbeat or "random" ones such as flipping a coin. A classifier trained on the responses to those follow-up questions can then tell you whether the first answer was truthful or a lie.

The team is interested in several different ways of probing the model's chain-of-thought, etc. to understand hallucination. The paper discusses these different framings, but the main point is that we can pick up on whether the model lied and is continuing to lie.
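To make the idea concrete, here is a minimal sketch of a classifier over follow-up answers. It assumes each follow-up answer is reduced to yes (1) or no (0) and that a plain logistic regression is enough; the toy data, function names, and hyperparameters are all made up for illustration and are not taken from the paper.

```python
import math

def train_logistic(features, labels, lr=0.5, epochs=500):
    """Fit logistic-regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            p = 1.0 / (1.0 + math.exp(-z))   # predicted probability of "lie"
            err = p - y                      # gradient of log-loss w.r.t. z
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias

def lie_probability(weights, bias, x):
    """Probability that the first answer was a lie, given follow-up answers x."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy data: each row holds yes(1)/no(0) answers to three
# follow-up questions; the label says whether the FIRST answer was a lie.
answers = [[1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 1, 0], [0, 1, 1], [0, 0, 0]]
was_lie = [0, 0, 0, 1, 1, 1]

weights, bias = train_logistic(answers, was_lie)
print(round(lie_probability(weights, bias, [1, 0, 1]), 3))
print(round(lie_probability(weights, bias, [0, 1, 0]), 3))
```

The point of the sketch is only that the classifier never sees the original question or answer, just the pattern of follow-up responses.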


LAION is better known for their image dataset, but here's an LLM with a well-documented training process.

OpenWebMath: An Open Dataset of High-Quality Mathematical Web Text

There is growing evidence that pretraining on high quality, carefully thought-out tokens such as code or mathematics…

A corpus of all known math on the internet, generally looking for LaTeX.

PB-LLM: Partially Binarized Large Language Models

This paper explores network binarization, a radical form of quantization, compressing model weights to a single bit…

This was introduced as a super-quantized version of models. The "partial" means that some weights are actually frozen at full precision, while the rest are binarized. The frozen weights are selected by a procedure that measures salience, so we're sorta doing a pruning or distillation of the original model.
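A rough sketch of what partial binarization could look like on a flat list of weights. Two assumptions here are mine and not necessarily the paper's: salience is approximated by weight magnitude, and the binarized weights share one scale equal to their mean absolute value. The function name `partially_binarize` is hypothetical.

```python
def partially_binarize(weights, salient_fraction=0.25):
    """Keep the most salient weights at full precision; binarize the rest.

    Salience is approximated by magnitude (an assumption for this sketch).
    Non-salient weights become +scale or -scale, where scale is their
    mean absolute value, so each one needs only a sign bit.
    """
    n_keep = max(1, round(len(weights) * salient_fraction))
    by_magnitude = sorted(range(len(weights)),
                          key=lambda i: abs(weights[i]), reverse=True)
    salient = set(by_magnitude[:n_keep])

    rest = [abs(w) for i, w in enumerate(weights) if i not in salient]
    scale = sum(rest) / len(rest) if rest else 0.0

    out = []
    for i, w in enumerate(weights):
        if i in salient:
            out.append(w)                              # kept as-is
        else:
            out.append(scale if w >= 0 else -scale)    # one sign bit
    return out, salient

layer = [0.9, -0.1, 0.2, -1.5, 0.05, -0.3, 0.15, 0.1]
compressed, kept = partially_binarize(layer, salient_fraction=0.25)
print(sorted(kept))
print(compressed)
```

With a quarter of the weights kept, the two largest-magnitude entries survive untouched and everything else collapses to a single magnitude with a sign.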

Promptbreeder: Self-Referential Self-Improvement Via Prompt Evolution

Popular prompt strategies like Chain-of-Thought Prompting can dramatically improve the reasoning abilities of Large…

This was pitched as an evolutionary model for prompts. They prompt an LLM to generate new variants of the original prompt, which seems to be based on or comparable to the "Plan-and-Solve" and "Automatic Prompt Engineer" papers. What LLM do they use? When they talk about using Google's PaLM as the "underlying LLM", is it both generating the prompts and being evaluated on the benchmark?
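For intuition, an evolutionary loop over prompts can be pictured roughly like this. It is a toy sketch, not the paper's method: `toy_mutate` stands in for an LLM call that rewrites a prompt, `toy_fitness` stands in for a benchmark score, and the pairwise tournament is just one simple selection scheme I am assuming.

```python
import random

def evolve(population, mutate, fitness, generations, seed=0):
    """Pairwise tournament: the fitter prompt's mutated copy replaces the loser."""
    rng = random.Random(seed)
    population = list(population)
    for _ in range(generations):
        i, j = rng.sample(range(len(population)), 2)
        if fitness(population[i]) < fitness(population[j]):
            i, j = j, i                       # make i the winner
        population[j] = mutate(population[i])
    return population

# Hypothetical stand-ins: a real system would call an LLM to rewrite the
# prompt and would score each prompt on a held-out benchmark.
def toy_mutate(prompt):
    return prompt + " Think step by step."

def toy_fitness(prompt):
    return prompt.count("step")

seeds = ["Solve the problem.", "Answer the question.", "Work it out."]
final = evolve(seeds, toy_mutate, toy_fitness, generations=5)
best = max(final, key=toy_fitness)
print(best)
```

The open question above still applies to any real version of this loop: the same model may be doing both the mutating and the scoring.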

Self-exfiltration is a key dangerous capability

To gauge risks from future LLMs that could be misaligned, we need to measure whether LLMs could "steal" their own…

A lengthy Substack discussion of the threat posed by a model extricating itself from its test environment and being free to do malicious stuff.

Thespian: Multi-Character Text Role-Playing Game Agents

Text-adventure games and text role-playing games are grand challenges for reinforcement learning game playing agents…

I follow this Georgia Tech group so I see a bunch of these RPG models. They continue using the LIGHT environment. They suggest using LoRA adapters for characters but instead they've continued to use reinforcement learning.

Think before you speak: Training Language Models With Pause Tokens

Language models generate responses by producing a series of tokens in immediate succession: the $(K+1)^{th}$ token is…

A Google Research project where the model unexpectedly works better when given a <pause> token. To avoid biasing the model toward fine-tuning, the token is used starting from pre-training. One theory for the success is that blank tokens make the model train more on reading further back in the input/context. They do some experiments with more or fewer pause tokens.
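A small sketch of the bookkeeping involved, working on lists of string tokens. The `<pause>` name comes from the post; the helper names are hypothetical, and I am assuming that pause positions are excluded from the loss and that the answer is read only after the last pause.

```python
PAUSE = "<pause>"  # token name from the post; the helpers below are illustrative

def append_pauses(prompt_tokens, num_pauses):
    """Give the model extra positions to compute on before it must answer."""
    return list(prompt_tokens) + [PAUSE] * num_pauses

def loss_mask(target_tokens):
    """1 where a target counts toward the loss, 0 where the target is a pause."""
    return [0 if t == PAUSE else 1 for t in target_tokens]

def extract_answer(sequence):
    """Ignore everything up to and including the last pause token."""
    last = max((i for i, t in enumerate(sequence) if t == PAUSE), default=-1)
    return sequence[last + 1:]

prompt = append_pauses(["2", "+", "2", "="], num_pauses=2)
print(prompt)
print(loss_mask(prompt + ["4"]))
print(extract_answer(prompt + ["4"]))
```

Varying `num_pauses` here corresponds to the more-or-fewer-pauses experiments mentioned above.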

Towards Carbon Transparency: A High-Resolution Carbon Emissions Database for China’s Listed Companies

The dual-carbon goals of China necessitate precise accounting of company carbon emissions, vital for green development…

Team from several institutions investigates the open data being given to potential investors about "A-share listed companies" in China. They also use satellite data on emissions (which doesn't mention methane).

Towards provably efficient quantum algorithms for large-scale machine-learning models

Large machine learning models are revolutionary technologies of artificial intelligence whose bottlenecks include huge…

Large team investigates gradient descent for machine learning on quantum computers. This introduced me to the concept of variational quantum algorithms with cooperation between classical and quantum computing. As always with quantum, the work is waiting for "fault-tolerant" quantum computing (i.e. built-in error correction) so I wouldn't expect this to roll out soon.

Vector Search with OpenAI Embeddings: Lucene Is All You Need

We provide a reproducible, end-to-end demonstration of vector search with OpenAI embeddings using Lucene on the popular…

I did some earlier experiments with OpenAI + Cohere embeddings and a vector search engine, and other people are using Postgres's vector support. The title suggests you can just use a more conventional search, but they actually use hierarchical navigable small-world network (HNSW) indexes which were just added to Lucene in the past two years.
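For reference, this is the exact (brute-force) version of the search that an HNSW index approximates. It is not HNSW and not Lucene's API, just a pure-Python baseline; the tiny 2-D vectors and the names `cosine` and `top_k` are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, index, k):
    """Score every document: O(N) per query, which is what HNSW avoids."""
    scored = sorted(index.items(),
                    key=lambda item: cosine(query, item[1]),
                    reverse=True)
    return [doc_id for doc_id, _ in scored[:k]]

# Toy stand-ins for embedding vectors (real embeddings have far more dimensions).
index = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [1.0, 1.0]}
results = top_k([1.0, 0.1], index, k=2)
print(results)
```

A graph index like HNSW trades a little recall for visiting only a small fraction of the documents per query.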

What does CLIP know about a red circle? Visual prompt engineering for VLMs

Large-scale Vision-Language Models, such as CLIP, learn powerful image-text representations that have found numerous…

Seems like it would be obvious, but they show that this multi-modal model understands that a red circle drawn on an image serves as a highlight / focus element.