A Brief Introduction to In-context Learning (ICL)

In this post, I write down cutting-edge strategies for In-context Learning and record the corresponding repos/code. We will record different methods that improve LLMs' emergent abilities.

Here's the repo of papers related to In-context Learning. All notes in this post are based on reading papers in this repo.

Here's a Chinese-language interpretation of the ICL survey.

What is In-context Learning

From the survey of In-context Learning, we can learn its definition: in-context learning is a paradigm that allows language models to learn tasks given only a few examples in the form of demonstrations.

Differences with other professional terms:

  1. Prompt: Prompts can be discrete templates or soft parameters that encourage the model to predict the desired output. Prompting is therefore a broader term than ICL; ICL is a subclass of prompting.

  2. Few-shot: few-shot learning is a general machine learning approach that uses parameter adaptation to learn the best model parameters for a task with a limited number of supervised examples. ICL, in contrast, does NOT change the model's parameters.
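To make the distinction concrete, here is a minimal sketch of how ICL works in practice: the demonstrations are simply concatenated into the model's input context, and no parameters are updated. The function name and template are illustrative, not from any particular library.

```python
# Minimal sketch of ICL prompt construction (names are illustrative).
def build_icl_prompt(demonstrations, query,
                     template="Review: {x}\nSentiment: {y}"):
    """Concatenate few-shot demonstrations and the query into one prompt.

    No parameters are updated: the "learning" happens purely through
    the demonstrations placed in the model's input context.
    """
    blocks = [template.format(x=x, y=y) for x, y in demonstrations]
    # The query reuses the same template but leaves the label empty
    # for the model to complete.
    blocks.append(template.format(x=query, y="").rstrip())
    return "\n\n".join(blocks)

demos = [
    ("A masterpiece of modern cinema", "positive"),
    ("Two hours I will never get back", "negative"),
]
prompt = build_icl_prompt(demos, "An unforgettable performance")
print(prompt)
```

The resulting prompt ends with an unfilled `Sentiment:` slot, which the LLM completes at inference time.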

Does ICL need to tune parameters?

From the content above, we can see that ICL does not involve tuning the LLM's parameters. But the Warmup stage does change the parameters. My perspective is that Warmup does update parameters, but it was added later as an extension of the ICL concept.

Knowledge Structure Graph: below is a graph listing the different stages of ICL, including Training (Warmup) & Inference. (Figure: Taxonomy of ICL)

ICL Demonstration Designing

Demonstration Designing is about selecting & ordering examples and generating an input template with the help of several different methods.

Recently, Wu et al. developed a toolkit that composes a Retriever and an Inferencer together. In other words, it merges the Selecting & Formatting processes.
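The Retriever/Inferencer split can be sketched as below. The class names mirror the idea described above, not the actual API of any specific toolkit, and the overlap-based relevance score and stub model are stand-ins for real components.

```python
# Hypothetical sketch of composing a Retriever (selects demonstrations)
# with an Inferencer (builds the prompt and queries the model).
class Retriever:
    def __init__(self, pool):
        self.pool = pool  # list of (x, y) candidate demonstrations

    def retrieve(self, query, k=2):
        # Toy relevance score: word overlap with the query
        # (a stand-in for a real embedding-based retriever).
        def overlap(item):
            return len(set(item[0].lower().split())
                       & set(query.lower().split()))
        return sorted(self.pool, key=overlap, reverse=True)[:k]

class Inferencer:
    def __init__(self, model_fn):
        self.model_fn = model_fn  # callable: prompt -> completion

    def run(self, demonstrations, query):
        prompt = "\n".join(f"Input: {x}\nOutput: {y}"
                           for x, y in demonstrations)
        prompt += f"\nInput: {query}\nOutput:"
        return self.model_fn(prompt)

pool = [("the food was great", "positive"),
        ("the plot was dull", "negative"),
        ("great acting throughout", "positive")]
retriever = Retriever(pool)
demos = retriever.retrieve("a great little film")
inferencer = Inferencer(lambda p: "positive")  # stub model for illustration
result = inferencer.run(demos, "a great little film")
print(result)
```

Composing the two stages this way means the selection strategy and the prompting/inference strategy can be swapped independently.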

Organization

ICL organization contains demonstration selecting and demonstration ordering: it is about how to organize the demonstrations.

Selecting

Unsupervised Method:
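A common unsupervised approach is kNN-style retrieval: pick the training examples closest to the test input under some similarity metric. The bag-of-words "embedding" and cosine similarity below are stand-ins for a real sentence encoder; the function names are illustrative.

```python
import math

# Sketch of unsupervised demonstration selection: keep the k training
# examples most similar to the test input.
def embed(text):
    # Toy bag-of-words count vector (stand-in for a sentence encoder).
    vec = {}
    for tok in text.lower().split():
        vec[tok] = vec.get(tok, 0) + 1
    return vec

def cosine(u, v):
    dot = sum(c * v.get(t, 0) for t, c in u.items())
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def select_demonstrations(pool, query, k=2):
    q = embed(query)
    return sorted(pool, key=lambda ex: cosine(embed(ex[0]), q),
                  reverse=True)[:k]

pool = [("a great movie", "positive"),
        ("boring plot", "negative"),
        ("great fun", "positive")]
picked = select_demonstrations(pool, "a great movie night", k=2)
print(picked)
```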

Supervised Method:
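One supervised signal used in this line of work is to score each candidate demonstration by how much it helps the LM predict the gold answer, then train a retriever on the top scorers. The sketch below only illustrates the scoring step; `lm_log_prob` is a hypothetical hook for a real LM's conditional log-probability, and the stand-in version here is purely for demonstration.

```python
# Sketch of a supervised selection signal: rank candidate
# demonstrations by how strongly they support the gold answer.
def score_candidates(candidates, query, gold, lm_log_prob):
    scored = []
    for demo in candidates:
        prompt = f"{demo[0]} -> {demo[1]}\n{query} ->"
        scored.append((lm_log_prob(prompt, gold), demo))
    scored.sort(reverse=True)
    return [demo for _, demo in scored]

# Toy stand-in "LM": favors demonstrations whose label matches the gold.
fake_lm = lambda prompt, gold: 1.0 if gold in prompt else 0.0
cands = [("good film", "positive"), ("bad film", "negative")]
ranked = score_candidates(cands, "fine film", "positive", fake_lm)
print(ranked[0])  # ('good film', 'positive')
```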

Ordering

Formatting

Some problems, such as math problems, make it hard for an LLM to learn the mapping from \(x_i\) to \(y_i\) directly. Some researchers therefore aim to design a better demonstration format for ICL by describing tasks with an instruction and by adding intermediate reasoning steps between \(x_i\) and \(y_i\).
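Inserting reasoning steps between \(x_i\) and \(y_i\) can be sketched as a simple formatting function (chain-of-thought style). The function name and template below are illustrative, not from any specific paper's code.

```python
# Sketch of formatting a demonstration with intermediate reasoning
# steps: the rationale sits between the input x_i and the label y_i.
def format_cot_demo(question, reasoning_steps, answer):
    rationale = " ".join(reasoning_steps)
    return f"Q: {question}\nA: {rationale} The answer is {answer}."

demo = format_cot_demo(
    "Roger has 5 balls and buys 2 cans of 3 balls each. "
    "How many balls does he have now?",
    ["Roger started with 5 balls.",
     "2 cans of 3 balls each is 6 balls.",
     "5 + 6 = 11."],
    "11",
)
print(demo)
```

Demonstrations formatted this way encourage the model to emit its own intermediate steps before the final answer at inference time.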

Instruction Formatting

Reasoning Steps Formatting

Score Function

The score function determines how answers are generated. Take sentiment classification as an example. Given \(x\) = "A three-hour cinema master class", the Direct model compares the probabilities of "It was great" and "It was terrible" following "A three-hour cinema master class". In contrast, the Channel model compares the probabilities of "A three-hour cinema master class" following "It was great" versus "It was terrible".
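The two scoring directions can be sketched as below. Here `log_prob` is a hypothetical stand-in for a real LM's conditional log-probability \(\log p(\text{continuation} \mid \text{context})\), and the numbers in the table are made up purely to show that the two directions can disagree.

```python
# Sketch of Direct vs. Channel scoring for classification.
# Made-up log-probabilities standing in for a real LM's scores.
LOG_PROBS = {
    ("A three-hour cinema master class", "It was great"): -2.0,
    ("A three-hour cinema master class", "It was terrible"): -3.5,
    ("It was great", "A three-hour cinema master class"): -6.0,
    ("It was terrible", "A three-hour cinema master class"): -5.0,
}

def log_prob(context, continuation):
    return LOG_PROBS[(context, continuation)]

def direct_score(x, labels):
    # Direct: compare p(y | x) across candidate labels.
    return max(labels, key=lambda y: log_prob(x, y))

def channel_score(x, labels):
    # Channel: compare p(x | y) across candidate labels.
    return max(labels, key=lambda y: log_prob(y, x))

x = "A three-hour cinema master class"
labels = ["It was great", "It was terrible"]
print(direct_score(x, labels))   # -> It was great
print(channel_score(x, labels))  # -> It was terrible
```

With these illustrative numbers the Direct and Channel models pick different labels, which is exactly why the choice of score function matters.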