Improving instruction hierarchy in frontier LLMs

IH-Challenge trains models to prioritize trusted instructions, which improves instruction-hierarchy adherence, safety steerability, and resistance to prompt injection attacks.

📰 Original Source

This article was originally published on OpenAI News.