How to prompt open source large language models

Prompt engineering is a new field and there is still a lot to be discovered. This guide will show you some techniques that people have used to prompt LLMs, and help you develop the right mindset. There’s no one right way to prompt a model, and you’ll need to experiment to find what works best for your use case.

Not all of these techniques will apply in every situation. Some work better on base models, others on instruct models. Special tactics may be necessary to get certain behaviors out of your model. However, if you find yourself having to jump through hoops or use ‘jailbreaks’ to get your model to do what you want, it may be worth considering a different model.

A prompt is a string of text that you feed into a language model to get it to generate text. It can be as simple as a single word, or as complex as a book. Every aspect of the prompt will affect the output of the model: content, framing, style, verbosity, etc.

A good prompt can get a model to do what you want, while a bad prompt can get you gibberish. The goal of prompt engineering is to find the right prompt for your use case.

A prompt template is a prompt with one or more variables. For example, "The <animal> went to the <place>." is a prompt template with two variables. You can fill in the variables with different values to get different prompts, e.g. "The cat went to the store." or "The dog went to the park."

Prompt templates are useful for generating many prompts at once, or for generating prompts on the fly. They also let you steer the model: changing a template’s fixed text can make certain outputs more or less likely. Prompt templates can be chained into each other to produce structured prompts, as in the sketch below.
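
As a minimal sketch, filling and chaining templates can be as simple as string formatting. Here is what that might look like in Python (the template text is just an illustration):

# Fill in a template's variables to produce a concrete prompt.
template = "The {animal} went to the {place}."
print(template.format(animal="cat", place="store"))
# -> The cat went to the store.

# Chain templates: one template's output becomes another template's variable.
wrapper = "Write a short story that begins: {opening}"
print(wrapper.format(opening=template.format(animal="dog", place="park")))
# -> Write a short story that begins: The dog went to the park.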

There are many different types of prompts. Some are better for certain use cases than others. Here are some strategies you might use to prompt a model:

Zero-shot prompts are designed to produce a specific output on the first try. They are called zero-shot because they don’t require any examples, fine-tuning, or additional training. They are usually short and simple.

A zero-shot prompt to an instruct model might look like this:
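
Translate the following sentence into French: "I like to eat apples."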

In this case we’re relying on the model’s internal understanding of French and English to produce a good translation. We’re also relying on the model’s ability to understand the prompt and follow instructions.

If you were to do this with a base model, you would need to add more context to the prompt so that the French sentence is the most likely continuation. Like so:
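
English: "I like to eat apples."
French: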

If you were to just use the first prompt, the model might think that it is predicting a list of instructions and produce more commands instead of the translation. The second prompt makes it clear that we’re looking for a translation.

Few-shot prompts use a small number of examples to show the model what you want it to do. They are called few-shot because they require only a few examples, rather than a large dataset. They are usually longer and more complex than zero-shot prompts.

A few-shot prompt to an instruct model might look like this:

Translate the following sentence into French: "I like to eat oranges."
Answer: "J'aime manger des oranges."
Translate the following sentence into French: "I like to eat bananas."
Answer: "J'aime manger des bananes."
Translate the following sentence into French: "I like to eat apples."
Answer:

The pattern set up by the previous examples will guide the model toward the correct answer. This prompt might allow us to switch to a smaller model than we would need for a zero-shot prompt, since we’re not asking as much of the model.

If we were using a base model, this prompt might work as-is, because the pattern of examples sets the expectation that the next token should be a French sentence.

“Chain of thought” refers to a category of prompts that lead the model to do long-form reasoning during token generation. This can be a powerful way to get more intelligent results from your model.

The simplest chain of thought prompts can be chained onto the end of any other instruction. They look like something you might see on math homework:

<question>
Let's think step by step.

<question>
Show your work.

<question>
Consider possible methods to solve this, and their tradeoffs, before working out your answer.

This might seem like a silly thing to do, but it can be surprisingly effective. It forces the model to work through the problem in a structured way, and it helps the model avoid committing to a wrong answer early. It also makes the model’s reasoning easier to debug, since you can see the steps it took to get to the answer.

Remember, the contents of the context window define the model’s next prediction. At each step, the model’s reasoning is reinforced by the accuracy of the previous steps. It’s important to develop an intuition for this, because any small misstep can be amplified by the autoregressive process. Garbage in, garbage out.

What you don’t want to do is ask the model to give an answer before reasoning about it. Whatever the answer is, the model will find a way to rationalize it, because it can’t go back and change it. This can lead to some very funny situations, but rarely desirable answers.
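
For example, these two instructions ask for the same things, but only the second lets the reasoning inform the answer:

Answer "yes" or "no", then explain your reasoning.

Explain your reasoning step by step, then answer "yes" or "no".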

Some models are trained to maintain a conversation for multiple turns. They might need special tokens to delineate which turn is spoken by the user and which by the model. See our Guide to Prompting Llama 2 for an in-depth exploration of this.
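
Llama 2’s chat models, for example, mark each user turn with [INST] and [/INST] tokens. A two-turn conversation, serialized into a single prompt, looks roughly like this:

<s>[INST] What is the capital of France? [/INST] The capital of France is Paris. </s><s>[INST] What is it famous for? [/INST]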

Chat structures can also be applied to base models, as a form of few-shot prompting. If you want to produce dialogue with a base model, you might use a prompt like the following:
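
The following is a conversation between a user and a helpful, knowledgeable assistant.

User: What is the capital of France?
Assistant: The capital of France is Paris.
User: What is it famous for?
Assistant: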

Note that a base model will continue to produce text, attempting to generate the next user response as well. You can pass “User:” as a stop sequence via the stop parameter to prevent this.
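
Here is a minimal sketch of that using the llama-cpp-python library; the model path is a placeholder, and most inference APIs expose an equivalent stop or stop_sequences parameter:

from llama_cpp import Llama

# Load a local base model (placeholder path; any GGUF base model works).
llm = Llama(model_path="./models/base-model.gguf")

prompt = """The following is a conversation between a user and a helpful, knowledgeable assistant.

User: What is the capital of France?
Assistant:"""

# Stop generation before the model starts inventing the next user turn.
output = llm(prompt, max_tokens=128, stop=["User:"])
print(output["choices"][0]["text"])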

Many chat models have incorporated the concept of a single “system prompt” that applies to the whole conversation. You can use this parameter to set a “character” for the model to play, or communicate any other information that you want to keep consistent throughout the interaction.

A system prompt incorporating a character and some extra information might look like this:

You are an impatient senior engineer with deep expertise in our company’s stack of NextJS and Typescript. The user is your favorite junior engineer. You always give them preferential treatment when helping with problems, because you want them to succeed.
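
How the system prompt actually reaches the model depends on its chat format. Llama 2, for example, wraps it in <<SYS>> tags inside the first user turn, roughly like this (the user message here is just an illustration):

<s>[INST] <<SYS>>
You are an impatient senior engineer with deep expertise in our company’s stack of NextJS and Typescript. ...
<</SYS>>

How do I fix this type error? [/INST]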