Encoding Legal Expertise into AI Agent Skills

In my post on Claude Cowork and the legal plugin panic, I introduced skills as one of the core building blocks of the new agent infrastructure — portable, folder-based workflow definitions that the agent reads before it does anything else. That post was about the conceptual framing. This one is about building skills that actually work.

Skills have become central to how I use agents. Every non-trivial workflow I run uses them, and the same goes for most agent setups worth taking seriously. The pattern holds consistently: output quality tracks skill quality. A vague skill produces vague work. A well-specified one means the agent starts from your standards, your process, your constraints, and your output format — not from scratch every time.

A few best practices have emerged over the last few months. This post covers them through one example skill built for legal work: M&A transaction review. The patterns it demonstrates apply to any repeatable workflow worth encoding.

What we're building, and who can build it

A skill is a folder containing instructions and reference material. The agent reads the main file, the SKILL.md, first, then pulls from subfolders as needed.

Think of it as the briefing memo you'd write for a new associate joining your deal team. Not a task list — your thinking. How to read the structure. What "market" actually means for this deal size. Where sellers try to slide things past buyers in the indemnification section. The memo you never had time to write because you always just explained it on the phone. A skill is that memo, in a format the agent can use every time.

Here's the important part: you don't need to write code to build one. In fact, you don't need to write anything at all (in particular if you use voice transcription for your prompts). The agent will draft the skill files for you based on your input. You review them, test them, and tell the agent what's wrong. The domain knowledge is yours; the structure is the agent's job. What you do need to get right are four things that no amount of AI assistance can substitute for:

A precise trigger: the description that tells the agent when to load this skill
Explicit process steps: not prose guidance, but literal instructions the agent turns into an action plan
Do's and don'ts: the constraints that steer the agent away from failure modes
A concrete output format: show the agent exactly what the deliverable looks like, don't describe it

Get these four right and the agent handles the rest. Get them wrong and the skill either doesn't fire, produces unusable output, or both.

Let's see how this works in practice with a transaction document review skill. The five-step creation process:

Capture intent (what should this do?)
Draft the SKILL.md (core instructions)
Build reference files (playbooks, templates, brand guide)
Test the skill
Iterate until it works consistently

Step 1: Capture Intent

Before writing anything, answer four questions. Write the answers down — this is the brief you hand to the agent when you ask it to draft the SKILL.md.

What should this skill do when triggered?

For transaction review: analyze share purchase agreements against due diligence findings; check closing conditions, representations and warranties, and indemnification provisions; and produce branded PDF reports for deal teams.

What are the inputs?

- SPA or merger agreement (full text)
- Due diligence findings summary
- Deal context (structure, size, jurisdictions involved)
- Target business type (affects materiality thresholds)

What is the expected output?

Branded PDF transaction review report with: executive summary, closing condition risk assessment, reps/warranties analysis, indemnification review, recommended schedule adjustments, and overall deal risk rating, formatted to the firm's standard report template and brand guide.

When should this skill trigger?

User says "review this SPA," "analyze closing conditions," or "check indemnification provisions."

Answering these questions before asking the agent to draft forces you to specify what "done" looks like before you define how to get there.
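One way to keep the four answers together is a small structured brief. The sketch below is only an illustration: the class name `SkillBrief` and its field names are my own assumptions, not part of any agent's API. It refuses to render until every question has an answer, which is the point of the exercise.

```python
from dataclasses import dataclass


@dataclass
class SkillBrief:
    """The four Step 1 answers, kept together so nothing stays implicit."""

    purpose: str
    inputs: list[str]
    expected_output: str
    triggers: list[str]

    def missing(self) -> list[str]:
        """Names of any answers that are still empty."""
        return [name for name, value in vars(self).items() if not value]

    def to_prompt(self) -> str:
        """Render the brief as plain text to hand to the agent."""
        gaps = self.missing()
        if gaps:
            raise ValueError(f"Unanswered questions: {', '.join(gaps)}")
        lines = [
            "Draft a SKILL.md from this brief.",
            "",
            f"What the skill should do: {self.purpose}",
            "Inputs:",
            *[f"- {item}" for item in self.inputs],
            f"Expected output: {self.expected_output}",
            "Trigger phrases:",
            *[f'- "{phrase}"' for phrase in self.triggers],
        ]
        return "\n".join(lines)
```

The text returned by `to_prompt()` is what you would paste (or dictate) as the opening message when asking the agent for a first draft.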

Step 2: Draft the `SKILL.md`

Hand the four answers from Step 1 to the agent and ask it to draft a SKILL.md. Most agents ship with a pre-configured skill-creator skill that the agent will (hopefully) use to draft the SKILL.md. Review what it produces against these four requirements.

The Header (YAML Frontmatter)

---
name: transaction-review
description: Use when reviewing M&A transaction documents (share purchase agreements, merger agreements). Analyzes closing conditions, reps/warranties, indemnification against diligence findings and market standards.
---

The description is the trigger. If it's vague, the skill won't load when you need it. "Helps with contract review" is too quiet — the model can't reliably match that to "review this SPA." The agent tends to write safe, generic descriptions. Push it: name the document type, the sections being analyzed, the output being produced. If the first draft is too broad, tell the agent to be more specific and show it what specific looks like.
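A rough way to sanity-check a description is to see whether the phrases you would actually type appear in it. The sketch below assumes a simplified frontmatter of single-line `key: value` pairs and uses plain substring matching. Agents match descriptions on meaning, not on literal substrings, so treat a miss as a prompt to reconsider the wording rather than proof the skill won't load. Both function names are hypothetical.

```python
def parse_frontmatter(text: str) -> dict[str, str]:
    """Read simple `key: value` pairs between the leading '---' lines.

    Simplification: single-line values only, not full YAML.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields: dict[str, str] = {}
    for line in lines[1:]:
        if line.strip() == "---":
            break
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def missing_trigger_terms(description: str, phrases: list[str]) -> list[str]:
    """Phrases that never appear in the description (case-insensitive)."""
    lowered = description.lower()
    return [phrase for phrase in phrases if phrase.lower() not in lowered]
```

Run against the frontmatter above with the phrases "SPA", "share purchase", and "closing conditions", this reports that the abbreviation "SPA" never appears literally in the description, which may be worth adding given that "review this SPA" is the first trigger phrase from Step 1.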

The Body Structure

# Transaction Document Review

## Trigger Conditions
- User says "review this SPA" or "analyze closing conditions"
- Reviewing share purchase or merger agreements
- Comparing provisions against diligence findings
- Checking indemnification against market standards

## Context Required
- SPA or merger agreement (full text)
- Due diligence findings summary
- Deal structure (asset deal vs. share deal)
- Deal size range (affects market standards)
- Target business type/sector

## Process

### 1. Identify Document Type
Check: Share Purchase Agreement vs. Merger Agreement vs. Asset Purchase Agreement. Note jurisdictions involved. If unclear, ask for clarification.

### 2. Load Deal Parameters
Determine from context:
- Deal size range (€10M-€50M, €50M-€100M, €100M-€200M, €200M+)
- Structure (100% acquisition vs. majority stake)
- Jurisdictions involved
- Sector (manufacturing, services, tech, etc.)

Use these to reference appropriate market standards.

### 3. Section-by-Section Review

**A. Closing Conditions**
Check against diligence findings:

| Condition | Diligence Status | Risk |
|-----------|-----------------|------|
| Regulatory approvals | [Status] | [High/Med/Low] |
| No MAC | [Business performance] | [Assess] |
| Key contracts consent | [Consent status] | [Risk] |
| Financial condition | [Latest numbers] | [Compare to threshold] |

MAC = Material Adverse Change clause — the condition that lets a buyer walk away if the target business deteriorates significantly between signing and closing.

**B. Representations and Warranties**
- **Fundamental reps:** Title, capitalization, authority — survival period?
- **Business reps:** Ordinary course, material contracts, financials — qualifiers?
- **Specific reps:** Environmental, IP, employment — tailored to target?
- **Bring-down of reps:** When? At signing? At closing?

**C. Indemnification**
- **Basket structure:** Deductible vs. tipping basket vs. first dollar
  *(A tipping basket means once claims exceed the threshold, all claims are recoverable from dollar one — not just the excess.)*
- **Cap amount:** % of purchase price?
- **Survival periods:** Fundamental vs. general reps
- **Special indemnities:** Specific deal risks addressed?
- **Escrow/Holdback:** Amount, duration, release conditions

**D. Material Adverse Change (MAC)**
- Qualifications: Carve-outs for market/industry changes?
- Probability standard: Probable vs. reasonably likely?
- Specificity: Enumerated events or general?

**E. Price Adjustments**
- Working capital: Mechanism clear? Accounting standards?
- Earnout: Metrics realistic? Dispute resolution?

### 4. Cross-Reference Diligence
For each rep, verify:
- Does diligence support the representation?
- Are there known exceptions not disclosed in schedules?
- Which reps might fail at closing?

### 5. Compare to Market Standards
Check against `references/market-standards.md`:
- Is indemnification basket standard for this deal size?
- Are survival periods market?
- Is MAC clause buyer-favorable or balanced?

### 6. Generate Report
Produce a branded PDF using the firm's report template and brand guide.
See `examples/brand-guide.md` for typography, colors, logo placement, and spacing.
See `examples/report-template.md` for section structure, page layout, and risk rating formatting.

The output must look like a firm deliverable — not a markdown file converted to PDF. Follow the brand guide precisely.

### 7. Review Against Checklist
Before finalizing, check the draft report against `references/review-checklist.md`:
- All closing conditions assessed with risk ratings?
- All rep categories covered (fundamental, business, specific, bring-down)?
- Indemnification market check complete (basket, cap, survival periods)?
- MAC clause assessment included?
- Schedule adjustment recommendations provided?
- Overall risk rating consistent with individual section findings?

If any item is incomplete, address it before generating the final PDF.

## Output Format

[PDF report structured as follows]

Cover page: Firm logo, matter name, date, confidentiality notice
(See examples/brand-guide.md for exact logo placement and cover layout)

Section 1 — Executive Summary
  Risk Rating: [HIGH / MEDIUM / LOW] (use brand color coding from brand guide)
  2-3 sentence bottom line on deal viability and key concerns

Section 2 — Closing Conditions Analysis
  Table: Condition / Diligence Status / Risk / Notes
  Narrative: Critical risks and conditions unlikely to be satisfied

Section 3 — Representations & Warranties
  Fundamental reps: survival period, assessment, known issues
  Business reps: materiality qualifiers, specific concerns
  Bring-down reps: timing and closing risk

Section 4 — Indemnification Review
  Basket: type, amount, market check
  Cap: amount, market check, carve-outs
  Survival periods table
  Escrow/holdback: amount, duration, concerns

Section 5 — Material Adverse Change Clause
  Qualifications, probability standard, risk assessment

Section 6 — Recommended Schedule Adjustments
  Numbered list of disclosure gaps and suggested additions

Section 7 — Deal Risk Assessment
  Overall risk rating, key factors, priority recommendations

## Do's and Don'ts

- **Don't assume German law for all provisions** — Check the governing law clause (the clause specifying which country's law governs the contract); warranty standards differ significantly across jurisdictions
- **Don't infer size-based standards from absolute numbers** — What matters is percentage of purchase price, not absolute € amounts
- **Don't skip the "knowledge" qualifier analysis** — Actual knowledge vs. constructive knowledge makes a significant difference in risk allocation for unknown liabilities
- **Don't conflate materiality in representations with materiality in baskets** — Different concepts, different standards
- **Don't ignore the "ordinary course" definition** — Vague definitions create closing risk if business conditions change
- **I know you want to flag every point of negotiation, but don't** — Focus on material risks that affect deal viability or value significantly
- **Don't produce markdown** — Output must be a properly formatted PDF matching the brand guide

## References

- `references/market-standards.md` — Indemnification baskets, caps, survival periods by deal size
- `references/closing-conditions-checklist.md` — Standard conditions and typical issues
- `references/mac-clause-patterns.md` — MAC clause structures and risk allocation
- `references/german-ma-law.md` — German law specific provisions
- `references/review-checklist.md` — Final quality check before PDF generation
- `examples/brand-guide.md` — Firm typography, colors, logo usage, page margins
- `examples/report-template.md` — Section structure, risk rating formatting, cover page layout

The agent will produce something close to this structure on the first draft. What it won't get right — and can't — are the do's and don'ts. Those require your experience: the jurisdiction-specific traps, the failure modes you've seen in real deals, the things that look fine on paper until they blow up at closing. That section is the most important part of the SKILL.md, and it's the only part that can't be AI-generated from a description alone.

Key Design Decisions

Numbered steps, not prose. The process section uses ordered steps with tables. This is literal instruction, not suggestion. The agent turns this into an action plan.

Show the output structure. Don't describe what the report should contain. Show the section-by-section layout with explicit references to the brand guide. The agent produces what you show — and a generic markdown description will produce a generic markdown output.

The do's and don'ts section is non-negotiable. Every skill needs one. The don'ts are especially valuable — they steer the agent away from confident-looking mistakes that a trigger and process alone won't prevent. "I know you want to X, but don't" is the pattern.

Let the agent draft this. Describe what you want, paste in the four answers from Step 1, and ask the agent to write the SKILL.md. Review it against the four requirements (trigger, process, do's and don'ts, output). Tell it what's wrong. It will iterate faster than you will.

Step 3: Build Reference Files

The SKILL.md references files in subfolders. The agent can draft most of these too — give it the domain knowledge and ask it to structure it.

Example Folder Structure

transaction-review/
├── SKILL.md
├── references/
│   ├── market-standards.md
│   ├── closing-conditions-checklist.md
│   ├── mac-clause-patterns.md
│   ├── german-ma-law.md
│   └── review-checklist.md
└── examples/
    ├── brand-guide.md
    ├── report-template.md
    └── sample-output.pdf

Why Separation Matters

The SKILL.md knows how to run a review. The references hold what to check — market standards, clause patterns, jurisdiction-specific rules. The examples folder holds the formatting standards. Each layer serves a different purpose, and each can change independently.

That separation pays off in two ways. First, maintainability: market standards shift — updating market-standards.md doesn't touch the process logic. The firm rebrands — updating brand-guide.md doesn't touch anything else. Second, context efficiency: a language model has limited working memory, and loading everything upfront on a long transaction document wastes it. The agent reads SKILL.md first, then pulls individual reference files only when a step requires them. When assessing indemnification, it reads market-standards.md. When generating the PDF, it reads brand-guide.md. The more granular the reference files, the more focused the context stays.
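Splitting knowledge across files has one practical cost: the SKILL.md points at reference files by path, and a renamed or forgotten file can leave a step pointing at nothing. The sketch below, assuming paths are written in backticks and start with `references/` or `examples/` as in the example skill, lists every path the SKILL.md mentions that does not exist on disk. The function names are hypothetical.

```python
import re
from pathlib import Path

# Matches backticked paths such as `references/market-standards.md`.
REF_PATTERN = re.compile(r"`((?:references|examples)/[^`\s]+)`")


def referenced_paths(skill_md: str) -> set[str]:
    """All reference and example paths mentioned in the SKILL.md text."""
    return set(REF_PATTERN.findall(skill_md))


def missing_references(skill_dir: Path) -> list[str]:
    """Paths named in SKILL.md that are not files inside the skill folder."""
    text = (skill_dir / "SKILL.md").read_text(encoding="utf-8")
    return sorted(
        path for path in referenced_paths(text)
        if not (skill_dir / path).is_file()
    )
```

Calling `missing_references(Path("transaction-review"))` after each round of edits is a cheap way to catch a broken pointer before a review run does.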

What Goes in the Reference Files

references/market-standards.md — Market data the skill will cite when assessing whether a provision is standard. For each deal size bracket: basket structure and percentage, cap range and common carve-outs, survival period norms by rep type, escrow amounts and duration, R&W insurance thresholds. Example:

## Indemnification Baskets

| Deal Size | Basket Size (approx., % of purchase price) | Structure |
|-----------|--------------------------------------------|-----------|
| €10-50M | 0.5-1% | Tipping basket or deductible |
| €50-100M | 0.25-0.5% | Tipping basket common |
| €100-200M | 0.1-0.25% | Often first dollar after threshold |
| €200M+ | Negotiated, often 0.1% | First dollar common |

Ask the agent to draft this file from a description of the market you work in. Review the numbers carefully — this is where domain knowledge matters and where the agent will fill gaps with generic or US-market data if you don't correct it.
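The difference between a deductible and a tipping basket is easiest to see with numbers. Here is a small worked sketch: the €30M purchase price and 1% basket echo the first bracket above, while the €500,000 claim total is an invented figure for illustration. Caps and survival periods are deliberately ignored.

```python
def basket_amount(purchase_price: float, basket_pct: float) -> float:
    """Basket threshold as a percentage of the purchase price."""
    return purchase_price * basket_pct / 100


def recoverable(claims_total: float, basket: float, structure: str) -> float:
    """Amount recoverable before any cap, for the given basket structure."""
    if structure not in {"deductible", "tipping"}:
        raise ValueError(f"Unknown basket structure: {structure}")
    if claims_total <= basket:
        return 0.0  # threshold not exceeded, nothing is recoverable
    if structure == "deductible":
        return claims_total - basket  # only the excess over the basket
    return claims_total  # tipping: everything, from the first euro


basket = basket_amount(30_000_000, 1.0)  # 300,000
print(recoverable(500_000, basket, "deductible"))  # 200000.0
print(recoverable(500_000, basket, "tipping"))     # 500000
```

The same claims produce a €300,000 difference in recovery depending on one word in the indemnification clause, which is why the basket structure belongs in the reference file and not in the agent's guesswork.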

references/mac-clause-patterns.md — Three MAC clause types ranging from broad buyer-favorable language to objective financial thresholds, with notes on how German courts interpret them (narrowly, with burden on buyer). Key insight encoded here: absence of specific carve-outs for market and industry conditions is a high-risk flag, not a drafting oversight.

references/german-ma-law.md — Provisions specific to German-law deals: § 311b BGB notarization requirements, works council process obligations, GmbH vs. AG transfer formalities. This file exists because German M&A has jurisdiction-specific rules that differ significantly from US or UK practice — and an agent without this context defaults to US-market assumptions. The result is risk assessments that look authoritative but are calibrated to the wrong legal framework.

examples/brand-guide.md — The firm's visual identity rules translated into instructions the agent can follow: typeface and sizes, primary and secondary colors with hex codes, logo placement rules, page margins, header and footer formats, how risk ratings should be color-coded (red/amber/green or equivalent). If the firm has a design manual, paste the relevant sections in and ask the agent to reformat them as agent-readable instructions.

examples/report-template.md — Section structure and page layout for the standard transaction review report: cover page fields, section order, table formats, how the executive summary risk rating should appear, footer content. The agent uses this as the blueprint when generating the PDF.

Reference files don't have to hold all the knowledge

The examples above store knowledge inline — market standards written into a markdown table, MAC patterns described in prose. That works, but it's not the only option. If your firm has a deal database, a document management system, or any knowledge base that's queryable — via MCP, an API, or even a URL — the reference file can describe how to access it rather than duplicate its contents.

A reference file might say: "For current market standards, query the firm's precedent database using the deal-search MCP tool with parameters: deal_type=SPA, jurisdiction=DE, size_range=[deal size]." The agent reads the instruction, connects to the live source, and pulls current data rather than relying on whatever was last written down.

This keeps reference files light, knowledge current, and avoids the maintenance problem of manually updating markdown whenever market terms shift.
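To make the idea concrete, here is a hypothetical sketch of how the parameters for such a query could be assembled. The parameter names come from the example instruction above. The bracket labels, the inclusive lower bounds, and the function names are assumptions of mine, and the code only builds the parameters; how they reach the tool depends on your agent and your MCP server.

```python
# Lower bounds in euros, paired with bracket labels. The edges follow the
# market-standards table; treating each lower bound as inclusive is an
# assumption, since the table's ranges share endpoints.
BRACKETS = [
    (10_000_000, "10-50M"),
    (50_000_000, "50-100M"),
    (100_000_000, "100-200M"),
    (200_000_000, "200M+"),
]


def size_range(deal_size_eur: float) -> str:
    """Map a deal size to the label of the bracket it falls into."""
    if deal_size_eur < BRACKETS[0][0]:
        raise ValueError("Deal size is below the smallest bracket")
    label = BRACKETS[0][1]
    for lower_bound, name in BRACKETS:
        if deal_size_eur >= lower_bound:
            label = name
    return label


def deal_search_params(
    deal_size_eur: float, deal_type: str = "SPA", jurisdiction: str = "DE"
) -> dict[str, str]:
    """Parameters for the (hypothetical) deal-search query."""
    return {
        "deal_type": deal_type,
        "jurisdiction": jurisdiction,
        "size_range": size_range(deal_size_eur),
    }
```

Deriving the bracket from the stated deal size, instead of letting the agent pick one, also guards against a small deal being compared with large-deal precedents.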

Step 4: Validate the Skill

The most meaningful test is a real deal with a known outcome. Pull an SPA your team has already worked through, run it against the skill, and compare the output to what the experienced lawyers actually flagged. If the skill misses a risk they caught — or flags things they dismissed as immaterial — that's your iteration input. Real-world validation beats hypothetical scenarios because the edge cases that matter are the ones your practice has already encountered.

If you're starting fresh without access to historical deal files, representative scenarios covering different risk profiles work as a starting point. The goal is the same: vary the inputs enough to surface how the skill handles different deal structures. I start with the most obviously broken scenario — Test Case 3 below, unlimited liability and no MAC carve-outs — because if the skill doesn't catch that, there's no point running the nuanced ones.

Test Case 1: Straightforward Acquisition
- Input: €30M manufacturing target, standard German SPA, buyer-friendly but not aggressive terms.
- Expected: Medium risk. Standard 1% tipping basket, 15% cap, 18-month survival. MAC clause has standard carve-outs but broad language. Report formatted correctly to brand guide.

Test Case 2: R&W Insurance Deal
(R&W insurance is representations and warranties insurance: a policy that covers buyer losses if the seller's reps turn out to be false.)
- Input: €150M services company, R&W insurance in place, fundamental reps only recourse against seller, no escrow.
- Expected: Lower seller credit risk but flag need to verify policy adequacy. Check exclusions against known diligence issues.

Test Case 3: Problematic Indemnification
- Input: €75M transaction, unlimited liability for reps, no cap, no basket, MAC clause with no carve-outs.
- Expected: High risk. Flag unlimited liability. MAC clause creates massive signing-to-closing exposure for seller. Report shows red risk ratings.

Test Case 4: Closing Condition Issues
- Input: SPA has regulatory approval condition but antitrust filing not yet made; key customer contract consent not obtained; target showing declining EBITDA (down 12% YoY), potentially approaching MAC territory.
- Expected: High risk on conditions. Regulatory timing uncertain. MAC clause may already be triggered.

Run each test and grade the output: pass means the format matches the brand guide, risks are correctly identified, and market standards are applied for the right deal size; partial if something was missed; fail if the format is wrong or a critical risk wasn't flagged. If the output needs significant editing before it's usable, the skill isn't ready. Each failure is a don't waiting to be written.
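The grading rule can be written down so that every run is judged the same way. This is a sketch under my own reading of the rubric: a wrong format or an unflagged critical risk is a fail, any other miss (including standards applied for the wrong deal size) is a partial. The class and field names are hypothetical, and the inputs are still filled in by a human reviewer.

```python
from dataclasses import dataclass


@dataclass
class ReviewOutcome:
    """What a reviewer recorded after comparing one run to expectations."""

    format_matches_brand_guide: bool
    critical_risks_flagged: bool
    other_risks_missed: int
    standards_match_deal_size: bool


def grade(outcome: ReviewOutcome) -> str:
    """Return 'pass', 'partial', or 'fail' for one test run."""
    if not outcome.format_matches_brand_guide:
        return "fail"
    if not outcome.critical_risks_flagged:
        return "fail"
    if outcome.other_risks_missed > 0 or not outcome.standards_match_deal_size:
        return "partial"
    return "pass"


def ready(outcomes: list[ReviewOutcome]) -> bool:
    """The skill counts as ready only when every test case passes."""
    return bool(outcomes) and all(grade(o) == "pass" for o in outcomes)
```

Keeping the recorded outcomes from each round also gives you a running list of what to turn into don'ts.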

Step 5: Iterate

Every failure becomes a don't. Every partial result reveals a gap in either the process or the reference files.

Skill never triggers
- Symptom: You ask for SPA review, agent doesn't load the skill.
- Fix: Strengthen the description. Ask the agent to rewrite it with more specific trigger keywords: "SPA," "share purchase," "closing conditions."

Wrong market standards applied
- Symptom: €15M deal assessed against €200M+ standards.
- Fix: Make deal size determination explicit in process step 2 (Load Deal Parameters). Add a don't: "Don't derive standards from absolute numbers; use percentage of purchase price."

Misses MAC clause risk
- Symptom: No carve-outs flagged as absent or unusual.
- Fix: Strengthen the MAC patterns reference file. Add an explicit don't: "Absence of carve-outs is a high risk flag, not standard drafting."

Produces markdown instead of PDF
- Symptom: Output is a formatted markdown file.
- Fix: Strengthen the "Don't produce markdown" rule and make the brand guide reference more prominent in the output format section. Add an explicit instruction to process step 6 (Generate Report): "Generate PDF using brand guide and report template."

Too verbose on minor points
- Symptom: Report flags drafting style issues rather than material risks.
- Fix: Reinforce the "I know you want to flag every point, but don't" rule. Add a materiality threshold to process step 3 (Section-by-Section Review).

When test cases pass consistently and outputs require little to no post-processing, the skill is ready. After each round, tell the agent what failed: it will revise the SKILL.md, update the reference files, and add do's and don'ts faster than you could edit them by hand.

Where the skill won't help

Some failures don't respond to iteration. If the SPA is structured unusually, the market standards file may simply not apply. The skill will still produce output that looks confident. That's the dangerous case: treat anything atypical more skeptically than the risk rating alone suggests.

Reference files also need active maintenance. Market terms shift, statutory requirements change, and nothing in the skill flags when its own reference data is stale. Someone has to own that update cycle.
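A scheduled check can at least make staleness visible. The sketch below flags reference files that have not been modified within a chosen number of days. File modification time is a crude proxy (a "last reviewed" date written inside each file would be more honest), and the 180-day figure in the usage note is an arbitrary assumption. The function name is hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def stale_files(
    folder: Path, max_age_days: int, now: Optional[float] = None
) -> list[str]:
    """Names of markdown files in `folder` not modified within the window.

    `now` is a Unix timestamp; it defaults to the current time and is
    exposed mainly so the check can be tested deterministically.
    """
    current = time.time() if now is None else now
    cutoff = current - max_age_days * 86_400  # seconds per day
    return sorted(
        path.name
        for path in folder.glob("*.md")
        if path.stat().st_mtime < cutoff
    )
```

Running `stale_files(Path("transaction-review/references"), 180)` before a quarterly review gives whoever owns the update cycle a concrete list to work through.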

What this unlocks

Skills encode what you know into something the agent applies consistently — not as a vague instruction set, but as a structured file that versions, shares, and improves over time. The domain knowledge is yours. The structure, the drafting, and the iteration can be almost entirely AI-assisted.

The barrier to building one is lower than it looks. You need to get four things right (trigger, process, do's and don'ts, output format) and be willing to test and iterate. The agent does the writing. You supply the expertise it can't invent: which market standards actually apply, which don'ts come from real deal experience, what a firm-quality report looks like.

That's the actual division of labor here. Not "lawyers who code" versus everyone else. The skills mechanism is accessible to any practitioner who can describe their workflow clearly.

Where to start

Pick one task you do repeatedly — something with a consistent output format, or things you always check, or preferred sources you always use.

Describe it to the agent using the four questions in Step 1. Ask it to draft a SKILL.md. Review the four core requirements: does the trigger description match how you'd actually ask for this? Are the process steps explicit enough to produce consistent output? Are the don'ts specific enough to prevent the failure modes you'd actually see? Does the output format show exactly what you want, not just describe it?

Test it. Fix what fails. Add a don't for every failure mode you find.

The first skill is the hardest to build — you're figuring out the format at the same time as the content. The second takes half as long. By the third, you're adding reference files that improve all of them at once. That's not an abstraction; that's what institutional knowledge looks like when it has somewhere to live.

Once a few skills work reliably, the next step is a shared library — a folder your colleagues can pull from. The process is the same for everyone; the reference files are where firm-specific context lives. That's the point where a skill stops being a personal workflow and starts being something the whole team builds on.