A Brief Introduction to In-context Learning (ICL)¶
In this post, I write down cutting-edge strategies for In-context Learning and record the corresponding repos/code. We will record the different methods that improve LLMs' emergent abilities.
Here's the repo of papers related to In-context Learning. All notes in this post are based on reading papers in this repo.
Here's a Chinese interpretation of the ICL survey.
What is In-context Learning¶
From the survey of In-context Learning, we can learn its definition: in-context learning is a paradigm that allows language models to learn tasks given only a few examples in the form of demonstrations.
Differences from other related terms:

- Prompt: Prompts can be discrete templates or soft parameters that encourage the model to predict the desired output. Prompting is therefore a broader term than ICL; ICL is a subclass of prompting.
- Few-shot: Few-shot learning is a general machine learning approach that adapts model parameters to a task using a limited number of supervised examples. ICL, in contrast, does NOT intend to change the model's parameters.
Does ICL need to tune parameters?
From the content above, we can see that ICL does not involve tuning the LLM's parameters. But the warmup stage does change the parameters. My perspective is that warmup does update parameters, but it was added later as an extension of the ICL concept.
Knowledge Structure Graph: Below is a graph listing the different stages of ICL, including Training (Warmup) and Inference.
ICL Demonstration Designing¶
Demonstration designing is about selecting and ordering examples and generating the input template with the help of several different methods.
Wu et al. developed a toolkit that composes a Retriever and an Inferencer; in other words, it merges the selecting and formatting steps.
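The Retriever + Inferencer composition can be sketched as follows. This is a minimal illustration of the idea, not the toolkit's real API: the class names, the word-overlap relevance score, and the `model` callable are all assumptions made for the sketch.

```python
# Illustrative sketch of composing a Retriever and an Inferencer.
# The relevance score (shared-word count) and prompt template are toy
# stand-ins for real retrieval and real LM inference.
class Retriever:
    def __init__(self, pool):
        self.pool = pool  # list of {"text": ..., "label": ...} examples

    def retrieve(self, query, k=2):
        # Toy relevance: number of words shared between query and example.
        score = lambda ex: len(set(query.split()) & set(ex["text"].split()))
        return sorted(self.pool, key=score, reverse=True)[:k]

class Inferencer:
    def __init__(self, model):
        self.model = model  # callable: prompt string -> answer string

    def run(self, demos, query):
        # Format the retrieved demonstrations into a prompt, then infer.
        prompt = "\n".join(f"{d['text']} -> {d['label']}" for d in demos)
        prompt += f"\n{query} ->"
        return self.model(prompt)

pool = [
    {"text": "great movie", "label": "positive"},
    {"text": "terrible movie", "label": "negative"},
]
pipeline_out = Inferencer(lambda p: "positive").run(
    Retriever(pool).retrieve("great acting"), "great acting"
)
```

The point of the composition is that demonstration selection and prompt formatting live in one pipeline, so swapping the retriever or the template does not touch the inference code.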
Organization¶
ICL organization covers demonstration selection and demonstration ordering, i.e., how to organize the demonstrations.
Selecting¶
Unsupervised Method:
- KATE: selects the closest neighbors (by a pre-defined L2 distance or cosine similarity over sentence embeddings) as the in-context examples
- mutual information
- generating demonstrations from the LLM itself
- InfoScore
Supervised Method:
- EPR: builds supervised retrievers to recall similar examples as candidates and select demonstrations from the candidates
- UDR: enhances EPR with a unified demonstration retriever
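A KATE-style selection step can be sketched in a few lines. This is a toy sketch: the 2-D vectors stand in for real sentence embeddings (e.g., from an encoder such as SBERT), and the example pool is made up for illustration.

```python
# Minimal sketch of KATE-style demonstration selection: pick the k pool
# examples whose embeddings are closest (by cosine similarity) to the
# query embedding. Toy 2-D vectors replace real sentence embeddings.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def select_demonstrations(query_emb, pool, k=2):
    """Return the k pool items most similar to the query embedding."""
    ranked = sorted(pool, key=lambda item: cosine(query_emb, item["emb"]),
                    reverse=True)
    return ranked[:k]

pool = [
    {"text": "The movie was wonderful.", "label": "positive", "emb": [0.9, 0.1]},
    {"text": "A dull, lifeless film.",   "label": "negative", "emb": [0.1, 0.9]},
    {"text": "An instant classic.",      "label": "positive", "emb": [0.8, 0.3]},
]
demos = select_demonstrations([0.85, 0.2], pool, k=2)
# Both selected demonstrations come from the "positive" region near the query.
```

Swapping `cosine` for a negated L2 distance gives the other distance variant mentioned above without changing the selection logic.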
Ordering¶
Formatting¶
For some problems, such as math problems, it is hard for LLMs to learn the mapping from \(x_i\) to \(y_i\). Some researchers therefore aim to design a better demonstration format for ICL, describing tasks with instructions and adding intermediate reasoning steps between \(x_i\) and \(y_i\).
Instruction Formatting¶
Reasoning Steps Formatting¶
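Inserting intermediate reasoning steps between \(x_i\) and \(y_i\) is mostly a matter of prompt templating. The sketch below shows one possible template; the "Q:/A:" layout and the example questions are illustrative assumptions, not a prescribed format.

```python
# Sketch of a chain-of-thought-style demonstration format: each demo
# carries an intermediate reasoning step between the question (x_i)
# and the answer (y_i). Template and data are illustrative only.
def format_demo(question, reasoning, answer):
    return f"Q: {question}\nA: {reasoning} The answer is {answer}.\n"

def build_prompt(demos, query):
    body = "\n".join(format_demo(*d) for d in demos)
    # The trailing "A:" leaves room for the model to continue.
    return f"{body}\nQ: {query}\nA:"

demos = [
    ("Tom has 3 apples and buys 2 more. How many apples does he have?",
     "Tom starts with 3 apples and adds 2, so 3 + 2 = 5.", "5"),
]
prompt = build_prompt(demos, "Sara has 4 pens and loses 1. How many pens remain?")
```

The demonstrations show the model *how* to reason, not just the final label, which is the core idea behind reasoning-steps formatting.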
Score Function¶
The score function determines how answers are generated. Take text classification as an example. Given \(x\) = "A three-hour cinema master class", the Direct model compares the probabilities of "It was great" and "It was terrible" following "A three-hour cinema master class". In contrast, the Channel model compares the probabilities of "A three-hour cinema master class" following "It was great" or "It was terrible".