diff --git a/bibliography.bib b/bibliography.bib
index d10a58c3857..213879415d3 100644
--- a/bibliography.bib
+++ b/bibliography.bib
@@ -210,6 +210,35 @@ @misc{zhou2022large
   primaryClass={cs.LG}
 }
 
+@misc{zhang2022tempera,
+  title={TEMPERA: Test-Time Prompting via Reinforcement Learning},
+  author={Tianjun Zhang and Xuezhi Wang and Denny Zhou and Dale Schuurmans and Joseph E. Gonzalez},
+  year={2022},
+  eprint={2211.11890},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+
+@misc{deng2022rlprompt,
+  title={RLPrompt: Optimizing Discrete Text Prompts with Reinforcement Learning},
+  author={Mingkai Deng and Jianyu Wang and Cheng-Ping Hsieh and Yihan Wang and Han Guo and Tianmin Shu and Meng Song and Eric P. Xing and Zhiting Hu},
+  year={2022},
+  eprint={2205.12548},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+
+@misc{guo2021efficient,
+  title={Efficient (Soft) Q-Learning for Text Generation with Limited Good Data},
+  author={Han Guo and Bowen Tan and Zhengzhong Liu and Eric P. Xing and Zhiting Hu},
+  year={2021},
+  eprint={2106.07704},
+  archivePrefix={arXiv},
+  primaryClass={cs.CL}
+}
+
+% Models
+
 % Language Model Guides
 
 @book{jurafsky2009,
diff --git a/docs/automated_pe/_category_.json b/docs/automated_pe/_category_.json
new file mode 100644
index 00000000000..1c5c44c4be1
--- /dev/null
+++ b/docs/automated_pe/_category_.json
@@ -0,0 +1,8 @@
+{
+  "label": "⚙️ Automated Prompting",
+  "position": 70,
+  "link": {
+    "type": "generated-index",
+    "description": "Methods that automate prompt engineering"
+  }
+}
diff --git a/docs/automated_pe/ape.md b/docs/automated_pe/ape.md
new file mode 100644
index 00000000000..b438314993a
--- /dev/null
+++ b/docs/automated_pe/ape.md
@@ -0,0 +1,47 @@
+---
+sidebar_position: 1
+---
+
+# 🟢 APE
+
+Automatic Prompt Engineer (APE)(@zhou2022large) is an approach to automating the generation and
+selection of prompts. The basic idea of APE is to give an LLM a prompt containing
+few-shot exemplars and ask it to generate a prompt that could have produced them.
+
+## Example
+
+For example, we might give the LLM the following prompt:
+
+```text
+Is a banana a fruit?
+Yes
+Is a tomato a fruit?
+No
+Is a fish a fruit?
+No
+
+What would be a good prompt to generate an answer to the above questions?
+```
+
+Given a similar input, the LLM might generate the prompt highlighted below:
+
+```text
+banana
+Yes
+
+tomato
+No
+
+fish
+No
+
+watermelon
+Yes
+
+What would be a good prompt to generate an answer to the above questions?
+// highlight-start
+Is the following item a fruit:
+// highlight-end
+```
+
+## Notes
+
+Another simple automated prompt engineering strategy is to give GPT-3 your prompt and ask it to improve it.
\ No newline at end of file
diff --git a/docs/trainable/discretized.md b/docs/automated_pe/discretized.md
similarity index 100%
rename from docs/trainable/discretized.md
rename to docs/automated_pe/discretized.md
diff --git a/docs/automated_pe/more.md b/docs/automated_pe/more.md
new file mode 100644
index 00000000000..1d94baacb01
--- /dev/null
+++ b/docs/automated_pe/more.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 200
+---
+
+# More
+
+Other methods exist, such as AutoPrompt(@shin2020autoprompt), which uses gradient-based search to build prompts for masked language models (MLMs).
\ No newline at end of file
diff --git a/docs/automated_pe/overview.md b/docs/automated_pe/overview.md
new file mode 100644
index 00000000000..fd1606056e3
--- /dev/null
+++ b/docs/automated_pe/overview.md
@@ -0,0 +1,7 @@
+---
+sidebar_position: 0
+---
+
+# Overview
+
+Can prompt engineering really be automated? Sometimes.
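+
+To make the idea concrete, here is a minimal, hypothetical sketch of an
+APE-style generate-and-score loop (see the APE page for details). The
+`generate` function, the exemplars, the dev set, and the scoring are
+illustrative assumptions, not code from any of the papers in this section.
+
+```python
+# Hypothetical APE-style loop: ask an LLM to propose candidate instructions
+# from a few exemplars, then keep the candidate that scores best on a dev set.
+
+exemplars = [("banana", "Yes"), ("tomato", "No"), ("fish", "No")]
+dev_set = [("watermelon", "Yes"), ("carrot", "No")]
+
+
+def generate(prompt: str) -> str:
+    """Placeholder for an LLM completion call (an assumption, not a real API)."""
+    raise NotImplementedError
+
+
+def propose_instructions(n: int = 5) -> list[str]:
+    """Ask the LLM for candidate instructions that explain the exemplars."""
+    shots = "\n".join(f"{item}\n{label}" for item, label in exemplars)
+    meta_prompt = (
+        f"{shots}\n\n"
+        "What would be a good prompt to generate an answer to the above questions?"
+    )
+    return [generate(meta_prompt).strip() for _ in range(n)]
+
+
+def score(instruction: str) -> float:
+    """Score a candidate instruction by accuracy on the labeled dev set."""
+    hits = sum(
+        generate(f"{instruction}\n{item}").strip().lower() == label.lower()
+        for item, label in dev_set
+    )
+    return hits / len(dev_set)
+
+
+# Usage, once `generate` is wired to a real LLM:
+# best_prompt = max(propose_instructions(), key=score)
+```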
\ No newline at end of file
diff --git a/docs/automated_pe/rl.md b/docs/automated_pe/rl.md
new file mode 100644
index 00000000000..2144341d579
--- /dev/null
+++ b/docs/automated_pe/rl.md
@@ -0,0 +1,68 @@
+---
+sidebar_position: 130
+---
+
+# 🟣 Reinforcement Learning
+
+This section covers reinforcement learning methods that optimize discrete prompts (not soft prompts). These methods are considerably more complex than the other approaches covered here.
+
+## RLPrompt
+
+RLPrompt(@deng2022rlprompt) is a method that takes an input and trains a language model (the policy)
+to generate a good prompt for that input.
+
+More formally, given an input sequence $x$, the policy generates a prompt $\hat{z} = [z_1, z_2, ..., z_T]$ by selecting tokens from the vocabulary sequentially.
+
+After creating the prompt, RLPrompt combines it with the input $x$ and uses another language model to
+generate a completion. The LM output for $x$ prompted by $\hat{z}$ can be written as $y_{LM}(\hat{z}, x)$.
+
+The policy then receives a reward based on this output: $R(y_{LM}(\hat{z}, x))$.
+
+### Example
+
+Assume we have partially trained RLPrompt on classifying movie reviews and our next
+training example is `x = "I hate this movie."`. RLPrompt might generate a prompt such as
+`z = "Movie review bad or good:"`. It then combines the prompt with the input to get
+`x' = "Movie review bad or good: I hate this movie."` and uses a language model
+to generate the completion. Say the model outputs `bad`; the reward is then computed as
+`R(y_{LM}(\hat{z}, x))`. Note that Deng et al. do not use a simple 0/1 reward.
+
+## Training
+
+RLPrompt embeds a task-specific MLP inside a frozen LM. The MLP is trained with Soft Q-Learning(@guo2021efficient).
+
+## TEMPERA
+
+**TE**st-ti**M**e **P**rompt **E**diting using **R**einforcement le**A**rning
+(TEMPERA)(@zhang2022tempera) is a method for automatically generating
+interpretable prompts.
+
+At a high level, instead of building a prompt from scratch like RLPrompt, TEMPERA starts from an existing prompt and edits different parts of it to find the changes that help most.
+
+## Action Space
+
+TEMPERA is allowed to edit three parts of the prompt:
+
+### 1) The instruction
+
+The instruction $i$ is parsed with `nltk.tokenize.treebank` into a set of phrases. The actions then allow swapping, adding, and deleting phrases within the current set. For example, the sentence `"Given text, classify whether it is good or bad."` is first parsed into `["Given text", "classify", "whether", "it is", "good", "or", "bad"]`, and different editing strategies (e.g., swapping two phrases, deleting a phrase, or repeating a phrase) can then be applied to this set of phrases.
+
+### 2) In-context examples
+
+Given an example pool of $K$ examples (aka %%exemplars|exemplars%%), we want to select $k$ of them to formulate the final prompt. The action space allows swapping the positions of examples $i$ and $j$ with $0 < i < j < k$. It also supports replacing an example $i$ ($0 < i < k$) with any candidate $j$ from the pool ($k < j < K+1$).
+
+### 3) The verbalizers
+
+The editing space simply allows changing the current verbalizer to any other verbalizer from the `promptsource` collections. For example, changing from `["positive", "negative"]` to `["great", "terrible"]`.
+
+## Reward
+
+The reward is the difference in score between the prompt before and after an edit.
+
+This reward is dense: the policy receives a reward at every edit step, equal to the accuracy improvement of the current prompt (after editing) over the previous prompt (before editing).
+
+## Training
+
+TEMPERA uses a GPT architecture and is trained with proximal policy optimization.
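+
+As a rough, hypothetical illustration (not the authors' implementation), the
+sketch below shows how a dense, per-edit reward of this kind can be computed.
+It assumes helper functions `score(prompt)` (e.g., dev-set accuracy of the
+prompted LM) and `edit(prompt, action)` (one of the instruction, exemplar, or
+verbalizer edits described above); both are placeholders.
+
+```python
+from collections.abc import Callable
+
+
+def rollout_rewards(
+    start_prompt: str,
+    actions: list[str],
+    edit: Callable[[str, str], str],
+    score: Callable[[str], float],
+) -> list[float]:
+    """Apply a sequence of edit actions and return one reward per edit step.
+
+    Each step's reward is the change in score between the prompt after the
+    edit and the prompt before it, giving the policy dense feedback.
+    """
+    rewards = []
+    prompt, prev_score = start_prompt, score(start_prompt)
+    for action in actions:
+        prompt = edit(prompt, action)  # e.g., swap two phrases or replace an exemplar
+        new_score = score(prompt)
+        rewards.append(new_score - prev_score)
+        prev_score = new_score
+    return rewards
+```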
diff --git a/docs/trainable/soft_prompting.md b/docs/automated_pe/soft_prompting.md
similarity index 100%
rename from docs/trainable/soft_prompting.md
rename to docs/automated_pe/soft_prompting.md
diff --git a/docs/bibliography.md b/docs/bibliography.md
index 46a5542cb28..942daaecd48 100644
--- a/docs/bibliography.md
+++ b/docs/bibliography.md
@@ -56,7 +56,11 @@ cite them as such.
 
 #### AutoPrompt(@shin2020autoprompt) 🔵
 
-#### Automatic Prompt Engineer(@zhou2022large)
+#### Automatic Prompt Engineer(@zhou2022large) 🔵
+
+#### TEMPERA(@zhang2022tempera) 🔵
+
+#### RLPrompt(@deng2022rlprompt) 🔵
 
 ## Models
diff --git a/docs/trainable/_category_.json b/docs/trainable/_category_.json
deleted file mode 100644
index bba090d936b..00000000000
--- a/docs/trainable/_category_.json
+++ /dev/null
@@ -1,8 +0,0 @@
-{
-  "label": "💪 Prompt Tuning",
-  "position": 70,
-  "link": {
-    "type": "generated-index",
-    "description": "Prompt engineering that you can fine tune with gradients"
-  }
-}