Skip to content

Prompt Structure

Sources: This skill draws primarily from two official sources: Anthropic's Prompting Best Practices documentation, and the talk "Prompting 101" by Hannah Moran and Christian Ryan (Applied AI team at Anthropic), presented at Code w/ Claude in San Francisco on May 22, 2025. The structural template, worked example, and "what mistakes you'd rather it make" framing come from the talk; those principles and mechanical levers reflect the current docs.

Talk: youtu.be/ysPbXH0LpIE

Use this skill when writing or revising a prompt. The structure and techniques here were developed and tested by Anthropic's Applied AI team to produce behaviors any prompt benefits from: reliability across repeated runs, refusal to guess on ambiguous input, calibrated confidence when context is sufficient, and adherence to a specified order of operations. They also produce parseable, structured output — useful when the prompt feeds another system, less useful for prose answers to a human.

Example uses

  • Writing or revising a system prompt
  • Debugging an LLM application that hallucinates, hedges, or formats output inconsistently
  • Building an extraction pipeline, classifier, or agent step that must produce structured output
  • Restructuring a prompt that "kind of works" into one that works every time

The mental model

Prompt engineering is iterative empirical science, not creative writing. The goal is to remove ambiguity so the model doesn't have to guess at intent, domain, or output shape. Every quality problem you see in output is usually a missing piece of context, missing instruction, or wrong order — not a model limitation.

The colleague test. Could a smart colleague execute the task with nothing but your prompt? If they'd need to ask clarifying questions, the model will need to guess instead. Add what they'd ask for.

Three core moves do most of the work:

  1. Tell the model what kind of mistakes you'd rather it make. ("Stay factual. Don't guess if you can't see the data clearly. Refuse to assess if confidence is low.") This converts hallucinations into honest "I can't tell" answers.
  2. Control the order of operations. Tell the model what to look at first, second, third — the way a human expert would approach the task. Wrong order produces wrong answers even with perfect context.
  3. Tell the model what to do, not what not to do. Negative instructions ("don't be verbose," "don't include disclaimers") are weaker than positive ones ("respond in one sentence," "answer directly with the conclusion"). Frame the desired behavior, not the forbidden one.

The structural template

Build the prompt in this order. Not every prompt needs every section, but when output quality is bad, the fix is almost always a section that was skipped.

  1. Task & role — One sentence: what the model is and what it's here to do. ("You are an AI assistant helping a claims adjuster review Swedish car accident report forms.")

  2. Tone & disposition — How the model should behave when uncertain, and what kind of mistakes you'd rather it make. The most valuable line is usually some version of "stay factual; refuse to guess; only make a determination when confidence is high." Frame these as positive behaviors, not prohibitions.

  3. Background detail — Everything the model needs to know about the world the task lives in: things an experienced human in this role would already know but the model doesn't. The single highest-leverage place to add context.

    Concrete examples by task type:

    • Coding assistant: the codebase's conventions, the language and framework versions, the test framework, the style guide
    • Customer support: the product catalog, the refund policy, what escalation tiers mean, which actions require human approval
    • Email drafting: the user's role and communication style, who they're writing to, recent thread context
    • Data extraction: the form schema, what each field means, how humans typically misfill it
    • Research / analysis: the user's domain (e.g., "clinician, not patient"), what counts as a credible source, what's out of scope
  4. Detailed instructions — A step-by-step list of how to approach the task. Spell out the order: "First, read the form carefully and list what's checked. Then, look at the sketch with the form's contents in mind. Then, make your assessment."

  5. Examples — A handful of well-chosen edge cases beats a generic instruction every time. Especially valuable for gray-area cases your eval set keeps failing on. Wrap each example in delimiter tags (typically <example>) and the whole set in <examples> so the model can distinguish examples from instructions.

  6. The actual content / dynamic data — The form, the document, the question, the retrieved context for this request.

  7. Final reminder — Repeat what's critical. Re-emphasize the confidence rules, the order of operations, the "refuse if unclear" behavior. Recency in the prompt matters; this is your last chance to pin the model's attention. (When the prompt runs in a multi-turn conversation, the final reminder matters even more, since the model has prior turns competing for its attention.)

  8. Output format — Spell out exactly what the response should look like. Show the shape: tags, keys, structure. Match the style of your prompt to the style you want back — bulleted prompts tend to produce bulleted answers, prose prompts tend to produce prose.

Mechanical levers

These are the small techniques that compound on top of good structure.

Structured delimiters

Models parse tag-delimited content as structured. Wrap distinct sections — for example <form_schema>, <examples>, <final_verdict>. Tags also let the model refer back to specific blocks later in the prompt unambiguously. Markdown headers and fences work too, but XML-style tags are the most precise when sections need to be referenced from elsewhere in the prompt.

Few-shot for the long tail

Few-shot examples shine on the cases that are hard for the model but solvable for a human with intuition. Build an example set out of your failure cases — the gray-area inputs your eval keeps getting wrong. Each example should show the input plus a clear breakdown of how to reason about it. Aim for diversity across the example set rather than volume; three to five varied examples usually beats ten similar ones. Production applications eventually accumulate tens or hundreds of these across the eval set, but the prompt itself should stay focused.

When you're authoring for an API call

Most of this skill applies regardless of where the prompt runs. This section covers what changes when you're constructing the API call yourself — populating the system parameter and the user message directly, rather than letting a chat interface or wrapper handle that for you.

You have two slots to work with: the system parameter and the user message. This adds a placement question on top of structure.

A prompt has two parts:

  • Static: what you write once and reuse on every call — the role, the rules, the background, the examples, the output format.
  • Dynamic: what comes in fresh each call — the specific document, question, or retrieved context being processed this time.

"You are a claims adjuster. Analyze this form: [FORM]" — everything before [FORM] is static; [FORM] is dynamic.

Put the static portion (template items 1–5 and 8) in the system prompt. Put the dynamic content (items 6–7) in the user message. Keeping them separated lets the static portion be cached and reused; mixing them defeats that and makes the prompt harder to maintain.

Authoring workflow

  1. Identify what you have. A task description, a draft, a flaky prompt that needs fixing. Read it carefully.
  2. Diagnose what's missing. Walk the eight-section template mentally and call out which sections are absent or weak. Most ad-hoc prompts are missing three to five.
  3. Surface domain context. Background detail is the highest-leverage section. If you don't know the domain well enough to write it — ask the person you're authoring for, dig into the codebase, read the schema. You can't write background detail blind.
  4. Build the prompt. Apply the structural template. Use tag delimiters for sections. Separate static from dynamic.
  5. Read the model's reasoning when you can. If the runtime exposes the model's thinking or scratchpad output, read it. It's a window into where your instructions are unclear or where the model is doing work you should be doing for it. Bake the steps it had to figure out into the prompt explicitly.
  6. Iterate on the next failure mode. If quality is still off, the usual first moves are: reorder the instructions, tighten the confidence rules, or add a few-shot example for the failing case.

Worked example: the prompt evolution

The talk demonstrated this progression on a Swedish car insurance claim analyzer (a structured form plus a hand-drawn sketch of the accident). Each version fixed exactly the failure mode the previous version exhibited.

V1 — bare task. ("Review this accident report and determine fault.") The model hallucinated a skiing accident on a Swedish street name because it had no domain anchor.

V2 — added role, domain, tone. ("Swedish car insurance claims adjuster, stay factual, don't guess.") Correct domain, correctly hedged. The model said it didn't have enough info — the right answer given what was provided.

V3 — added the form schema as background detail. All 17 checkbox meanings, what the form looks like, how humans fill it out imperfectly. Confident, correct fault determination. The schema converted "the model reverse-engineering the form every request" into "the model reading the form."

V4 — added ordered instructions, final reminders, output format. Read the form first, then the sketch; refer back to checked boxes when making claims; wrap the final answer in <final_verdict> tags. Clean, parseable, confident output suitable for a database write.

The loop: run it, find the next failure mode, add the missing piece.

Anti-patterns to avoid

  • Over-explaining the chatbot persona. "You are a helpful, harmless AI assistant who..." is wasted tokens. Get to the role and task.
  • Negative instructions instead of positive ones. "Don't be verbose" is weaker than "respond in one sentence." Tell the model what to do.
  • Burying the instructions. If the actual job is buried under three paragraphs of context, restate it at the end.
  • No confidence rules. Without "refuse if unclear," the model will attempt the task on garbage input and produce confident garbage.
  • Vague output format. "Return the answer in a structured way" is not a format. Show the shape: tags, keys, types.
  • Mismatched prompt and output styles. A bulleted prompt tends to produce bulleted output. If you want prose back, write the prompt in prose.
  • Skipping the final reminder. Recency matters; the model attends hardest to what's just before its turn.

Credit

  • Anthropic, Prompting Best Practices (current as of May 2026): principles, structural guidance, and mechanical levers in this skill.
  • "Prompting 101", Hannah Moran & Christian Ryan, Applied AI team, Anthropic, presented at Code w/ Claude, San Francisco, May 22, 2025 (youtu.be/ysPbXH0LpIE): structural-template framing, the "tell the model what mistakes you'd rather it make" core move, and the V1→V4 worked example.

Any errors in synthesis or interpretation are this skill's, not the sources'.