If you tell people not to think about pink elephants, they immediately start thinking about pink elephants… Sometimes LLMs are the same way.
I often see inexperienced prompt writers try to fix a chatbot’s tendency to make up incorrect facts by adding things to the prompt like “Don’t hallucinate!”, “Don’t make things up!”, or sometimes “Don’t say XYZ”. This doesn’t always have the desired effect: the chatbot still makes up incorrect facts, and it still says “XYZ” under certain circumstances… What’s going on here?
All of this demonstrates that large language models don’t understand prompts the way humans do. Instead, they assess patterns in the data they were trained on and use those patterns to synthesize the most probable next token. Often, the more you talk about something, the more likely the LLM is to echo that concept, sentence structure, or idea back to you. If we aren’t careful, the things we tell the LLM not to do instead prime the context, raising the probability that those very tokens come back up in responses. Part of this tendency traces back to how the models were trained: training data often contains more positive examples than negative ones. The problem crops up a lot in image generation, where a model often has difficulty generating images that conflict with its training data, like a burger without cheese. In that case, the more you talk about cheese, even if you’re instructing the model to “hold the cheese”, the more cheese you are going to get.
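To make the cheese example concrete, here’s the kind of rewrite that tends to help. The prompts below are purely illustrative; you’d pass either string to whatever image-generation model you’re using:

```python
# Negative phrasing: "cheese" shows up twice, priming the model toward cheese
# even though the instruction is to leave it out.
negative_prompt = "A photo of a burger. Do not add any cheese. No cheese at all."

# Positive phrasing: describe only what should appear, so "cheese" never
# enters the context in the first place.
positive_prompt = "A photo of a plain hamburger: a sesame bun, a beef patty, lettuce, and tomato."
```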
We have to remember that LLMs don’t have an inherent sense of truth or falsehood; they generate responses based on likelihoods derived from their training data. When instructed not to make things up, the model has no mechanism to fact-check its own output or to tell factual content from fictional content. It simply continues to follow the patterns it has learned. In fact, some demonstrations show that asking the model not to hallucinate can make it more likely to generate incorrect information.
There isn’t a hard and fast solution to this problem, but some things to keep in mind when writing prompts are:
- Try to use positive statements in your prompts instead of negative ones.
- Give examples of what you want to see, not what you don’t want to see (see the first sketch below).
- Inject fact-checked context and ask the LLM to use it in its response, or to base its response entirely on that context (known as retrieval-augmented generation, or RAG). This isn’t a silver bullet that will always work, but at the very least it feeds truthful information into the context for the LLM to pattern its response on and primes the model to move in that direction (see the second sketch below).
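As a rough sketch of the first two points, here’s what rephrasing a negative instruction into a positive one, plus an example of the desired output, can look like. This assumes the OpenAI Python client; the model name, prompts, and example article are all illustrative:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

# Instead of "Summarize this article. Don't speculate, don't editorialize,
# don't make things up.", state what you want and show one example of it.
messages = [
    {
        "role": "system",
        "content": (
            "Summarize articles in two to three sentences, using only facts "
            "stated in the article text.\n\n"
            "Example article: 'Acme Corp reported quarterly revenue of $2.1B, "
            "up 4% year over year, driven by cloud sales.'\n"
            "Example summary: 'Acme Corp's quarterly revenue grew 4% year over "
            "year to $2.1B, led by its cloud business.'"
        ),
    },
    {"role": "user", "content": "Article: <paste the article text here>"},
]

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```

Notice that the instructions only describe what the summary should contain; nothing in the prompt mentions speculating or making things up.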
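And a minimal sketch of the RAG idea from the last point: retrieve fact-checked snippets however you like (vector search, keyword search, a hand-curated knowledge base), then inject them into the prompt and ask the model to stick to them. The function name, model, and snippet here are made up for illustration:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in your environment

def answer_with_context(question: str, snippets: list[str]) -> str:
    """Ground the answer in retrieved, fact-checked snippets (a basic RAG prompt)."""
    context = "\n\n".join(f"[{i + 1}] {snippet}" for i, snippet in enumerate(snippets))
    messages = [
        {
            "role": "system",
            "content": (
                "Answer the user's question using only the context below. "
                "If the context does not contain the answer, say you don't know.\n\n"
                f"Context:\n{context}"
            ),
        },
        {"role": "user", "content": question},
    ]
    response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    return response.choices[0].message.content

# The snippets would normally come from your retrieval step.
print(answer_with_context(
    "How long is the standard warranty?",
    ["The standard warranty covers manufacturing defects for 24 months from the date of purchase."],
))
```

Even here the framing stays positive: the prompt tells the model what to base its answer on rather than listing things it shouldn’t say.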