Initial commit

This commit is contained in:
Zhongwei Li
2025-11-30 08:25:58 +08:00
commit e13b6ff259
31 changed files with 3185 additions and 0 deletions


@@ -0,0 +1,181 @@
# AskUserQuestion using chatfield.cli strategy
**CRITICAL: Strict adherence required. No deviations permitted.**
This document defines MANDATORY patterns for using `AskUserQuestion` with `chatfield.cli` interviews. It assumes you already know the AskUserQuestion tool signature.
---
## MANDATORY Pattern for EVERY Question
**REQUIRED - EXACT structure:**
```python
AskUserQuestion(
questions=[{
"question": "<chatfield.cli's exact question>", # No paraphrasing
"header": "<12 chars max>",
"multiSelect": <True/False>, # Based on data model
"options": [
# POSITION 1: REQUIRED
{"label": "Skip", "description": "Skip (N/A, blank, negative, etc)"},
# POSITION 2: REQUIRED
{"label": "Delegate", "description": "Ask Claude to look up the needed information using all available resources"},
# POSITION 3: First option from chatfield.cli (if present)
{"label": "<First from chatfield.cli>", "description": "..."},
# POSITION 4: Second option from chatfield.cli (if present)
{"label": "<Second from chatfield.cli>", "description": "..."}
]
}]
)
# POSITION 5 (implicit): "Other" - auto-added for free text
```
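The structure above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `build_question` is a hypothetical helper, and reusing each label as its own description is an assumption (the pattern above leaves descriptions as `"..."`).

```python
from __future__ import annotations

from typing import Any

# Positions 1 and 2 are fixed by the mandatory pattern.
SKIP = {"label": "Skip", "description": "Skip (N/A, blank, negative, etc)"}
DELEGATE = {
    "label": "Delegate",
    "description": "Ask Claude to look up the needed information using all available resources",
}


def build_question(question: str, header: str, multi_select: bool,
                   cli_options: list[str] | None = None) -> dict[str, Any]:
    """Build one entry of the `questions` list (hypothetical helper)."""
    if len(header) > 12:
        raise ValueError("header must be 12 characters or fewer")
    options = [dict(SKIP), dict(DELEGATE)]
    # Only the first two chatfield.cli options fit in positions 3-4;
    # the rest stay reachable through the implicit "Other" entry.
    for label in (cli_options or [])[:2]:
        # Assumption: the label doubles as a placeholder description.
        options.append({"label": label, "description": label})
    return {
        "question": question,  # exact chatfield.cli text, no paraphrasing
        "header": header,
        "multiSelect": multi_select,
        "options": options,
    }
```

The returned dict would be passed as `AskUserQuestion(questions=[...])`; "Other" is never added explicitly because the tool appends it.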
---
## Determine multiSelect
**Check `interview.py` Form Data Model (Chatfield builder API):**
| Data Model | multiSelect |
|------------|-------------|
| `.as_multi()`, `.as_nullable_multi()`, or `.one_or_more()` | `True` |
| `.as_one()` or `.as_nullable_one()` | `False` |
| Plain `.field()` (no cardinality) | `False` |
---
## Parse chatfield.cli Options
**If chatfield.cli output contains options, extract and prioritize:**
**Recognize patterns:**
- `"Status? (Single, Married, Divorced)"`
- `"Choose: A, B, C, D"`
- `"Preference: Red | Blue | Green"`
Add the **first TWO** extracted options as positions 3-4; the remaining options stay reachable via "Other".
**Example:**
```
chatfield.cli: "Status? (Single, Married, Divorced, Widowed)"
Options:
1. Skip
2. Delegate
3. Single ← First from chatfield.cli
4. Married ← Second from chatfield.cli
"Other": User can type "Divorced" or "Widowed"
```
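The three recognized patterns can be extracted with a small parser. This is a sketch under the assumption that options appear either inside parentheses or after a colon, separated by commas or pipes; `extract_options` is a hypothetical name, and real chatfield.cli output may need looser matching.

```python
from __future__ import annotations

import re


def extract_options(message: str) -> list[str]:
    """Return the choices embedded in a chatfield.cli question, if any."""
    # Pattern 1: "Status? (Single, Married, Divorced)"
    match = re.search(r"\(([^()]*,[^()]*)\)", message)
    if match:
        raw, sep = match.group(1), ","
    elif ":" in message:
        # Patterns 2 and 3: "Choose: A, B, C, D" / "Preference: Red | Blue | Green"
        raw = message.split(":", 1)[1]
        sep = "|" if "|" in raw else ","
        if sep not in raw:
            return []  # a colon without a list, e.g. "Note: be brief"
    else:
        return []
    return [part.strip() for part in raw.split(sep) if part.strip()]
```

Only the first two results go into positions 3-4, for example `extract_options(msg)[:2]`.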
---
## Handle Responses
| Selection | Action |
|-----------|--------|
| Types via "Other" | If starts with `'`: strip prefix and pass verbatim to chatfield.cli. Otherwise: judge if it's a direct answer or instruction to Claude. Direct answer → pass to chatfield.cli; Request for Claude → research/process, then respond to chatfield.cli |
| "Skip" | Context-aware response: Yes/No questions → "No"; Optional/nullable fields → "N/A"; Other fields → "Skip" |
| "Delegate" | Research & provide answer |
| Option 3-4 | Pass selection to CLI |
| Multi-select | Join the selected labels with commas (e.g., "Email, Phone") and pass the result to chatfield.cli on the next iteration |
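The mechanical parts of this table can be sketched in a few lines. The helper names are hypothetical, and checking Yes/No before optional is an assumption based on the order in the "Skip" row. Judging whether free text is a direct answer or a request for Claude still needs judgment and is deliberately left out.

```python
from __future__ import annotations


def skip_response(is_yes_no: bool, is_optional: bool) -> str:
    """Map a "Skip" selection to the text sent to chatfield.cli."""
    if is_yes_no:
        return "No"
    if is_optional:
        return "N/A"
    return "Skip"


def join_multi_select(labels: list[str]) -> str:
    """Join multi-select choices into one message for chatfield.cli."""
    return ", ".join(labels)


def verbatim_other(text: str) -> str | None:
    """Return the text to pass through unchanged, or None if intent must be judged."""
    if text.startswith("'"):
        return text[1:]  # strip the forcing prefix, keep the rest verbatim
    return None
```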
## Distinguishing Direct Answers from Claude Requests
**When user types via "Other", judge intent:**
**Direct answers** (pass to chatfield.cli):
- "Find new customers in new markets" ← answer to "What is your business strategy?"
- "123 Main St, Boston MA" ← answer to "What is your address?"
- "Python and TypeScript" ← answer to "What programming languages?"
**Requests for Claude** (research first):
- "look up my SSN" ← asking Claude to find something
- "research the population" ← asking Claude to look something up
- "what's today's date" ← asking Claude a question
**Edge case:** `'` prefix forces verbatim pass-through regardless of content
---
## Delegation Pattern
**When user selects "Delegate":**
1. Parse question to understand needed info
2. Treat this as if the user directly asked, "Help me find out ..."
3. Use ALL tools available to you
4. Pass the result to chatfield.cli as if user typed it
5. If not found, ask user
---
## Quick Examples (RULES 1-7)
**Note:** Skip handling is context-aware per "Handle Responses" table above.
### RULE 1: Free Text
```
# chatfield.cli: "What is your name?"
# multiSelect: False
# Options: Skip, Delegate
```
### RULE 2: Yes/No
```
# chatfield.cli: "Are you employed?"
# multiSelect: False
# Options: Skip, Delegate, Yes, No
```
### RULE 3: Single-Select Choice
```
# chatfield.cli: "Status? (Single, Married, Divorced, Widowed)"
# multiSelect: False
# Extract: ["Single", "Married", "Divorced", "Widowed"]
# Options: Skip, Delegate, Single, Married
# Via Other: "Divorced", "Widowed"
```
### RULE 4: Multi-Select Choice
```
# chatfield.cli: "Contact? (Email, Phone, Text, Mail)"
# Data model: .as_multi(...)
# multiSelect: True
# Extract: ["Email", "Phone", "Text", "Mail"]
# Options: Skip, Delegate, Email, Phone
# Via Other: "Text", "Mail"
```
### RULE 5: Numeric
```
# chatfield.cli: "How many dependents?"
# multiSelect: False
# Options: Skip, Delegate (optionally: "0", "1-2")
# Via Other: Exact number
```
### RULE 6: Complex/Address
```
# chatfield.cli: "Mailing address?"
# multiSelect: False
# Options: Skip, Delegate
# Via Other: Full address
```
### RULE 7: Date
```
# chatfield.cli: "Date of birth?"
# multiSelect: False
# Options: Skip, Delegate (optionally "Today", "Tomorrow", but only for dates where they make sense, not for a date of birth)
# Via Other: Specific date
```
---
## MANDATORY Checklist
**EVERY question MUST:**
- [ ] Be based on chatfield.cli's stdout message
- [ ] Include "Skip" as option 1
- [ ] Include "Delegate" as option 2
- [ ] Check Form Data Model for multiSelect
- [ ] Add first TWO chatfield.cli options as 3-4 (if present)


@@ -0,0 +1,74 @@
# CLI Interview Loop
**CRITICAL: Strict adherence required. No deviations permitted.**
Run `chatfield.cli` iteratively, presenting its output messages via AskUserQuestion(), passing responses back, repeating until complete.
**Files:**
- State: `<basename>.chatfield/interview.db`
- Interview: `<basename>.chatfield/interview.py` (or `interview_<lang>.py` if translated)
## Workflow Overview
```plantuml
@startuml CLI-Interview-Loop
title CLI Interview Loop
start
:Initialize chatfield.cli (no message);
:chatfield.cli outputs first question;
repeat
:Understand the chatfield.cli message;
:Consider the Form Data Model for multiSelect;
:Build AskUserQuestion;
:Present to user via AskUserQuestion();
:Call chatfield.cli with the result as a message;
:chatfield.cli outputs next question/response;
repeat while (Complete?) is (no)
->yes;
:Run chatfield.cli --inspect;
:Parse collected data;
stop
@enduml
```
## CLI Command Reference
```bash
# Initialize (NO user message)
python -m chatfield.cli --state=<state> --interview=<interview>
# Continue (WITH message)
python -m chatfield.cli --state=<state> --interview=<interview> "user response"
# Inspect (when complete, or any time to troubleshoot)
python -m chatfield.cli --state=<state> --interview=<interview> --inspect
```
In all cases, chatfield.cli will print to its stdout a message for the user.
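When the loop is driven from Python rather than a shell, the three invocations can be wrapped as follows. This is a sketch: `build_cli_args` and `run_cli` are hypothetical helpers, and only the flags shown in the command reference above are used.

```python
from __future__ import annotations

import subprocess
import sys


def build_cli_args(state: str, interview: str,
                   message: str | None = None, inspect: bool = False) -> list[str]:
    """Build the argv for one chatfield.cli invocation."""
    args = [sys.executable, "-m", "chatfield.cli",
            f"--state={state}", f"--interview={interview}"]
    if inspect:
        args.append("--inspect")
    elif message is not None:
        args.append(message)  # passed as a single argument, no shell quoting needed
    return args


def run_cli(state: str, interview: str,
            message: str | None = None, inspect: bool = False) -> str:
    """Run chatfield.cli once and return the message it printed to stdout."""
    result = subprocess.run(build_cli_args(state, interview, message, inspect),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Passing the argument list directly to `subprocess.run` avoids shell quoting problems when the user's response contains quotes or spaces.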
## Interview Loop Process
**CRITICAL**: When building AskUserQuestion from chatfield.cli's message, you MUST strictly follow ./AskUserQuestion-Rules.md
1. Initialize: `python -m chatfield.cli --state=<state> --interview=<interview>` (NO message)
2. Read chatfield.cli's stdout message
3. Recall or look up Form Data Model for multiSelect (`.as_multi()`, `.one_or_more()` → True)
4. Build AskUserQuestion per mandatory rules: ./AskUserQuestion-Rules.md
5. Present AskUserQuestion to user
6. Handle response:
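- "Other" text → if it is a direct answer (or starts with `'`), pass to chatfield.cli; if it is a request for Claude, research first, then pass the result (see ./AskUserQuestion-Rules.md)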
- "Skip" → Context-aware response: Yes/No questions → "No"; Optional/nullable fields → "N/A"; Other fields → "Skip"
- "Delegate" → research answer, pass to chatfield.cli
- Options 3-4 → pass selection to chatfield.cli
- Multi-select → join with commas, pass to chatfield.cli
7. Call: `python -m chatfield.cli --state=<state> --interview=<interview> "user response"`
8. Repeat steps 2-7 until completion signal
9. Run: `python -m chatfield.cli --state=<state> --interview=<interview> --inspect`
## Completion Signals
Watch for:
- "Thank you! I have all the information I need."
- "complete" / "done"
When Chatfield mentions the conversation is complete, stop the loop. The CLI Interview loop is done.
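A rough way to detect these signals in code is shown below. It is a heuristic sketch only: `looks_complete` is a hypothetical helper, and a message such as "Please complete the form" would be a false positive, so reading the message in context remains the real test.

```python
import re


def looks_complete(message: str) -> bool:
    """Heuristically detect a completion signal in chatfield.cli output."""
    text = message.lower()
    if "i have all the information i need" in text:
        return True
    # Whole-word match so that "incomplete" does not trigger.
    return re.search(r"\b(complete|done)\b", text) is not None
```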


@@ -0,0 +1,332 @@
# Converting PDF Forms to Chatfield Interviews
<purpose>
This guide covers how to build a Chatfield interview definition from PDF form data. This is the core transformation step that converts a static PDF form into a conversational interview.
</purpose>
<important>
**Read complete API reference**: See ./Data-Model-API.md for all builder methods, transformations, and validation rules.
</important>
## Process Overview
```plantuml
@startuml Converting-PDF-To-Chatfield
title Converting PDF Forms to Chatfield Interviews
start
:Prerequisites: Form extraction complete;
partition "Read Input Files" {
:Read <basename>.form.md;
:Read <basename>.form.json;
}
:Build Interview Definition;
repeat
:Validate Form Data Model
(see validation checklist);
if (All checks pass?) then (yes)
else (no)
:Fix issues identified in validation;
endif
repeat while (All checks pass?) is (no)
->yes;
:**✓ FORM DATA MODEL COMPLETE**;
:interview.py ready for next step;
stop
@enduml
```
## The Form Data Model
<definition>
The **Form Data Model** is the `interview.py` file in the `.chatfield/` working directory. This file contains the chatfield builder definition that faithfully represents the PDF form.
</definition>
## Critical Principle: Faithfulness to Original PDF
<critical_principle>
**The Form Data Model must be as accurate and faithful as possible to the source PDF.**
**Why?** Downstream code will NOT see the PDF anymore. The interview must create the "illusion" that the AI agent has full access to the form, speaking to the user, writing information - all from the Form Data Model alone.
This means every field, every instruction, every validation rule from the PDF must be captured in the interview definition.
</critical_principle>
## Language Matching Rule
**CRITICAL: Strings passed to the chatfield builder API must be in the form's language. Pass English-language strings only for English-language forms.**
The chatfield object strings should virtually always match the PDF's primary language:
- `.type()` - Use short identifier (e.g., "DHFS_FoodBusinessLicense"), not full official name. **HARD LIMIT: 64 characters maximum**
- `.desc()` - Use form's language
- `.trait()` - Use form's language for Background content
- `.hint()` - Use form's language
**Translation happens LATER** (see ./Translating.md), not during initial definition.
## Key Rules
These fundamental rules apply to all Form Data Models:
1. **Faithfulness to PDF**: The interview definition must accurately represent the source PDF form
2. **Short type identifiers**: Top-level `.type()` should be a short "class name" identifier (e.g., "W9_TIN", "DHFS_FoodBusinessLicense"), not the full official form name. **HARD LIMIT: 64 characters maximum**
3. **Direct mapping default**: Use PDF field_ids directly from `.form.json` unless using fan-out patterns
4. **Fan-out patterns**: Use `.as_*()` casts to populate multiple PDF fields from single collected value
5. **Exact field_ids**: Keep field IDs from `.form.json` unchanged (use as cast names or direct field names)
6. **Extract knowledge**: ALL form instructions go into Alice traits/hints
7. **Format flexibility**: Never specify format in `.desc()` - Alice accepts variations
8. **Validation vs transformation**: `.must()` for content constraints (use SPARINGLY), `.as_*()` for formatting (use LIBERALLY). Alice NEVER mentions format requirements to Bob
9. **Language matching**: All strings (`.desc()`, `.trait()`, `.hint()`) must match the PDF's language
## Reading Input Files
Your inputs from form-extract:
- **`<basename>.chatfield/<basename>.form.md`** - PDF content as Markdown (use this for form knowledge)
- **`<basename>.chatfield/<basename>.form.json`** - Field IDs, types, and metadata
## Extracting Form Knowledge
From `.form.md`, extract ONLY actionable knowledge:
- Form purpose (1-2 sentences)
- Key term definitions
- Field completion instructions
- Valid options/codes
- Decision logic ("If X then Y")
**Do NOT extract:**
- Decorative text
- Repeated boilerplate
- Page numbers, footers
Place extracted knowledge in interview:
- **Form-level** → Alice traits: `.trait("Background: [context]...")`
- **Field-level** → Field hints: `.hint("Background: [guidance]")`
## Builder API Patterns
### Direct Mapping (Default)
One PDF field_id → one question
```python
.field("topmostSubform[0].Page1[0].f1_01[0]")
.desc("What is your full legal name?") # English .desc() for English form
.hint("Background: Should match official records")
```
### Fan-out Pattern
Collect once, populate multiple PDF fields via `.as_*()` casts
```python
.field("age")
.desc("What is your age in years?")
.as_int("age_years", "Age as integer")
.as_bool("over_18", "True if 18 or older")
.as_str("age_display", "Age formatted for display")
```
**CRITICAL**: For fan-out, cast names MUST be exact PDF field_ids from `.form.json`
#### Re-representation Sub-pattern
When PDF has multiple fields for the same value in different formats (numeric vs words, date vs formatted date, etc.), collect ONCE and use casts:
```python
.field("amount")
.desc("What is the payment amount?")
.as_int("amount_numeric", "Amount as number")
.as_str("amount_in_words", "Amount spelled out in words (e.g., 'One hundred')")
.field("event_date")
.desc("When did the event occur?")
.as_str("date_iso", "Date in ISO format (YYYY-MM-DD)")
.as_str("date_display", "Date formatted as 'January 15, 2025'")
```
**Key principle**: Eliminate duplicate questions about the same underlying information.
### Discriminate + Split Pattern
Mutually-exclusive fields
```python
.field("tin")
.desc("Is your taxpayer ID an EIN or SSN, and what is the number?")
.must("be exactly 9 digits")
.must("indicate SSN or EIN type")
.as_str("ssn_part1", "First 3 of SSN, or empty if N/A")
.as_str("ssn_part2", "Middle 2 of SSN, or empty if N/A")
.as_str("ssn_part3", "Last 4 of SSN, or empty if N/A")
.as_str("ein_full", "Full 9-digit EIN, or empty if N/A")
```
### Expand Pattern
Multiple checkboxes from single field
```python
.field("preferences")
.desc("What are your communication preferences?")
.as_bool("email_ok", "True if wants email")
.as_bool("phone_ok", "True if wants phone calls")
.as_bool("mail_ok", "True if wants postal mail")
```
## `.must()` vs `.as_*()` Usage
**`.must()`** - CONTENT constraints (use SPARINGLY):
- Only when field MUST contain specific information
- Creates hard blocking constraint
- Example: `.must("match tax return exactly")`
**`.as_*()`** - TYPE/FORMAT transformations (use LIBERALLY):
- For any type casting, formatting, derived values
- Alice accepts variations, computes transformation
- Example: `.as_int()`, `.as_bool()`, `.as_str("name", "desc")`
**Rule of thumb**: Expect MORE `.as_*()` calls than `.must()` calls.
## Field Types
- **Text** → `.field("id").desc("question")`
- **Checkbox** → `.field("id").desc("question").as_bool()`
- **Radio/choice (required)** → `.field("id").desc("question").as_one("opt1", "opt2")`
- **Radio/choice (optional)** → `.field("id").desc("question").as_nullable_one("opt1", "opt2")`
## Optional Fields
```python
.field("middle_name")
.desc("Middle name")
.hint("Background: Optional per form instructions")
```
## Hint Conventions
All hints must have a prefix:
- **"Background:"** - Internal notes for Alice only
- Alice uses these for formatting, conversions, context without mentioning to Bob
- Example: `.hint("Background: Convert to Buddhist calendar by adding 543 years")`
- **"Tooltip:"** - May be shared with Bob if helpful
- Example: `.hint("Tooltip: Your employer provides this number")`
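The prefix rule is easy to check mechanically before validation. A minimal sketch, with `has_valid_hint_prefix` as a hypothetical helper name:

```python
HINT_PREFIXES = ("Background:", "Tooltip:")


def has_valid_hint_prefix(hint: str) -> bool:
    """Return True if the hint starts with one of the two required prefixes."""
    return hint.startswith(HINT_PREFIXES)
```

For example, `has_valid_hint_prefix("Optional per form instructions")` is false because the prefix is missing.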
**See ./Data-Model-API.md** for complete list of transformations (`.as_int()`, `.as_bool()`, etc.) and cardinality options (`.as_one()`, `.as_multi()`, etc.).
## When to Use `.conclude()`
Only when derived field depends on multiple previous fields OR complex logic that can't be expressed in a single field's casts.
## Additional Guidance from PDF Forms
**Extract Knowledge Wisely:**
- Extract actionable knowledge ONLY from PDF
- Form purpose (1-2 sentences max)
- Key term definitions
- Field completion instructions
- Valid options/codes
- Decision logic ("If X then Y")
- **Do NOT extract**: Decorative text, repeated boilerplate, page numbers, footers
**Alice Traits for Format Flexibility:**
```python
.alice()
.type("Form Assistant")
.trait("Collects information content naturally, handling all formatting invisibly")
.trait("Accepts format variations (SSN with/without hyphens)")
.trait("Background: [extracted form knowledge goes here]")
```
**Default to Direct Mapping:**
PDF field_ids are internal - users only see `.desc()`. Use field IDs directly unless using fan-out patterns.
**Format Flexibility:**
Never specify format in `.desc()` - Alice accepts variations. Use `.as_*()` for formatting requirements.
## Complete Example
```python
from chatfield import chatfield
interview = (chatfield()
.type("W9_TIN")
.desc("Form to provide TIN to entities paying income")
.alice()
.type("Tax Form Assistant")
.trait("Collects information content naturally, handling all formatting invisibly")
.trait("Accepts format variations (SSN with/without hyphens)")
.trait("Background: W-9 used to provide TIN to entities paying income")
.trait("Background: EIN for business entities, SSN for individuals")
.bob()
.type("Taxpayer completing W-9 form")
.trait("Speaks naturally and freely")
.field("name")
.desc("What is your full legal name as shown on your tax return?")
.hint("Background: Must match IRS records exactly")
.field("business_name")
.desc("Business name or disregarded entity name, if different from above")
.hint("Background: Optional - only if applicable")
.field("tin")
.desc("What is your taxpayer identification number (SSN or EIN)?")
.must("be exactly 9 digits")
.must("indicate whether SSN or EIN")
.as_str("ssn_part1", "First 3 digits of SSN, or empty if using EIN")
.as_str("ssn_part2", "Middle 2 digits of SSN, or empty if using EIN")
.as_str("ssn_part3", "Last 4 digits of SSN, or empty if using EIN")
.as_str("ein_part1", "First 2 digits of EIN, or empty if using SSN")
.as_str("ein_part2", "Last 7 digits of EIN, or empty if using SSN")
.field("address")
.desc("What is your address (number, street, apt/suite)?")
.field("city_state_zip")
.desc("What is your city, state, and ZIP code?")
.as_str("city", "City name")
.as_str("state", "State abbreviation (2 letters)")
.as_str("zip", "ZIP code")
.build()
)
```
## Validation Checklist
Before proceeding, validate the interview definition:
<validation_checklist>
```
Interview Validation Checklist:
- [ ] All field_ids from .form.json are mapped
- [ ] No field_ids duplicated or missing
- [ ] Re-representations (amount/amount_in_words, date/date_formatted, etc.) use single field with casts, not duplicate questions
- [ ] .desc() describes WHAT information is needed (content), never HOW it should be formatted
- [ ] .hint() provides context about content (e.g., "Optional", "Must match passport"), never formatting instructions
- [ ] All formatting requirements (dates, codes, number formats, etc.) use .as_*() transformations exclusively
- [ ] Fan-out patterns use .as_*() with PDF field_ids as cast names
- [ ] Split patterns use .as_*() with "or empty/0 if N/A" descriptions
- [ ] Discriminate + split uses .as_*() for mutually-exclusive fields
- [ ] Expand pattern uses .as_*() casts on single field
- [ ] .conclude() used only when necessary (multi-field dependencies)
- [ ] Alice traits include extracted form knowledge
- [ ] Field hints provide context from PDF instructions
- [ ] Optional fields explicitly marked with hint("Background: Optional...")
- [ ] .must() used sparingly (only true content requirements)
- [ ] Field .desc() questions are natural and user-friendly (no technical field_ids)
- [ ] ALL STRINGS match the PDF's primary language
```
</validation_checklist>
If any items fail:
1. Review the specific issue
2. Fix the interview definition
3. Re-run validation checklist
4. Proceed only when all items pass
## The Result: Form Data Model
When validation passes, you have successfully created the **Form Data Model** in `<basename>.chatfield/interview.py`.


@@ -0,0 +1,216 @@
# Conversational Form API Reference
**Library:** `chatfield` Python package
API reference for building conversational form interviews. Powered by the Chatfield library.
## Contents
- Quick Start
- Builder API
- Interview Configuration
- Roles
- Fields
- Validation
- Special Field Types
- Transformations
- Cardinality
- Field Access
- Optional Fields
---
## Quick Start
```python
from chatfield import chatfield, Interviewer
# Define
interview = (chatfield()
.field("name")
.desc("What is your full name?")
.must("include first and last")
.field("age")
.desc("Your age?")
.as_int()
.must("be between 18 and 120")
.build())
# Run
interviewer = Interviewer(interview)
user_input = None
while not interview._done:
message = interviewer.go(user_input)
print(f"Assistant: {message}")
if not interview._done:
user_input = input("You: ").strip()
# Access
print(interview.name, interview.age.as_int)
```
---
## Builder API
### Interview Configuration
```python
interview = (chatfield()
.type("Job Application") # Interview type
.desc("Collect applicant info") # Description
.build())
```
### Roles
```python
.alice() # Configure AI assistant
.type("Tax Assistant")
.trait("Professional and accurate")
.trait("Never provides tax advice")
.bob() # Configure user
.type("Taxpayer")
.trait("Speaks colloquially")
```
### Fields
```python
.field("email") # Define field (becomes interview.email)
.desc("What is your email?") # User-facing question
```
**All fields must be populated**: every field must be non-`None` before `._done` becomes true. The content itself can be an empty string `""`.
Exception: `.as_one()`, `.as_multi()`, and fields with strict validation require non-empty values.
### Validation
```python
.field("email")
.must("be valid email format") # Requirement (AND logic)
.must("not be disposable")
.reject("profanity") # Block pattern
.hint("Background: Company email preferred") # Advisory (not enforced)
```
### Hints
Hints provide context and guidance to Alice. **All hints must start with "Background:" or "Tooltip:"**
```python
# Background hints: Internal notes for Alice only (not mentioned to Bob)
.hint("Background: Convert Gregorian to Buddhist calendar (+543 years)")
.hint("Background: Optional per form instructions")
# Tooltip hints: May be shared with Bob if helpful
.hint("Tooltip: Your employer should provide this number")
.hint("Tooltip: Ask your supervisor if unsure")
```
**Background hints** are for Alice's internal use - she handles formatting/conversions transparently without mentioning them to Bob.
**Tooltip hints** may be shared with Bob to help clarify what information is needed.
### Special Field Types
```python
.field("sentiment_score")
.confidential() # Track silently, never ask Bob
.field("summary")
.conclude() # Compute after regular fields (auto-confidential)
```
### Transformations
The LLM computes transformations during collection. Access them via `interview.field.as_*`
```python
.field("age").as_int() # → interview.age.as_int = 25
.field("price").as_float() # → interview.price.as_float = 99.99
.field("citizen").as_bool() # → interview.citizen.as_bool = True
.field("hobbies").as_list() # → interview.hobbies.as_list = ["reading", "coding"]
.field("config").as_json() # → interview.config.as_json = {"theme": "dark"}
.field("progress").as_percent() # → interview.progress.as_percent = 0.75
.field("greeting").as_lang("fr") # → interview.greeting.as_lang_fr = "Bonjour"
# Optional descriptions guide edge cases
.field("has_partners")
.as_bool("true if you have partners; false if not or N/A")
.field("quantity")
.as_int("parse as integer, ignore units")
# Named string casts for formatting
.field("ssn")
.must("be exactly 9 digits")
.as_str("formatted", "Format as ###-##-####")
# Access: interview.ssn.as_str_formatted → "123-45-6789"
```
**Validation vs. Casts:**
- **Validation** (`.must()`): Check content ("9 digits", "valid email")
- **Casts** (`.as_*()`): Provide format (hyphens, capitalization)
### Choice Cardinality
Select from predefined options:
```python
.field("tax_class")
.as_one("Individual", "C Corp", "S Corp") # Exactly one choice required
.field("dietary")
.as_nullable_one("Vegetarian", "Vegan") # Zero or one
.field("languages")
.as_multi("Python", "JavaScript", "Go") # One or more choices required
.field("interests")
.as_nullable_multi("ML", "Web Dev", "DevOps") # Zero or more
```
### Build
```python
.build() # Return Interview instance
```
---
## Field Access
**Dot notation** (regular fields):
```python
interview.name
interview.age.as_int
```
**Bracket notation** (special characters):
```python
interview["topmostSubform[0].Page1[0].f1_01[0]"] # PDF form fields
interview["user.name"] # Dots
interview["full name"] # Spaces
interview["class"] # Reserved words
```
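Whether a field name needs bracket notation follows from Python's own attribute rules, so it can be checked with the standard library. This sketch assumes nothing about Chatfield itself; `needs_bracket_access` is a hypothetical helper.

```python
import keyword


def needs_bracket_access(field_name: str) -> bool:
    """Return True if the name cannot be written as `interview.<name>`."""
    # Dots, spaces, and brackets make the name an invalid identifier;
    # reserved words are valid identifiers but cannot follow a dot.
    return not field_name.isidentifier() or keyword.iskeyword(field_name)
```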
---
## Optional Fields
Fields known to be optional (from PDF tooltip, nearby context, or instructions):
```python
.alice()
.trait("Records optional fields as empty string when user says blank/none/skip")
.field("middle_name")
.desc("Middle name")
.hint("Background: Optional per form instructions")
.field("extension")
.desc("Phone extension")
.hint("Background: Leave blank if none")
```
For optional **choices**, use `.as_nullable_one()` or `.as_nullable_multi()` (see examples above).


@@ -0,0 +1,100 @@
# Populating Fillable PDF Forms
<purpose>
After collecting data via Chatfield interview, populate fillable PDF forms with the results.
</purpose>
## Process Overview
```plantuml
@startuml Populating-Fillable
title Populating Fillable PDF Forms
start
:Parse Chatfield output;
:Read <basename>.form.json for metadata;
:Create <basename>.values.json;
repeat
:Validate .values.json
(see validation checklist);
if (All checks pass?) then (yes)
else (no)
:Fix .values.json;
endif
repeat while (All checks pass?) is (no)
->yes;
:Execute fill_fillable_fields.py;
:**✓ PDF POPULATION COMPLETE**;
stop
@enduml
```
## Process
### 1. Parse Chatfield Output
Run Chatfield with `--inspect` for a final summary of all collected data:
```bash
python -m chatfield.cli --state='<basename>.chatfield/interview.db' --interview='<basename>.chatfield/interview.py' --inspect
```
Extract `field_id` and the proper value for each field.
### 2. Create `.values.json`
Create `<basename>.values.json` in the `<basename>.chatfield/` directory with the collected field values:
```json
[
{"field_id": "name", "page": 1, "value": "John Doe"},
{"field_id": "age_years", "page": 1, "value": 25},
{"field_id": "age_display", "page": 1, "value": "25"},
{"field_id": "checkbox_over_18", "page": 1, "value": "/1"}
]
```
**Value selection priority:**
- **CRITICAL**: If a language cast exists for a field (e.g., `.as_lang_es`, `.as_lang_fr`), **always prefer it** over the raw value
- This ensures forms are populated in the form's language, not the conversation language
- The language cast name matches the form's language code (e.g., `as_lang_es` for Spanish forms)
- Only use the raw value if no language cast exists
**Boolean conversion for checkboxes:**
- Read `.form.json` for `checked_value` and `unchecked_value`
- Typically: `"/1"` or `"/On"` for checked, `"/Off"` for unchecked
- Convert Python `True`/`False` → PDF checkbox values
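The conversion can be sketched as a lookup against the field's metadata. The shape of a `.form.json` entry is an assumption here (a dict carrying `checked_value` and `unchecked_value`); the helper raises `KeyError` rather than guessing a default when either key is absent.

```python
def checkbox_value(checked: bool, field_meta: dict) -> str:
    """Translate a Python bool into the PDF checkbox value for one field."""
    key = "checked_value" if checked else "unchecked_value"
    # No fallback on purpose: the correct value differs from form to form.
    return field_meta[key]
```

With metadata `{"checked_value": "/1", "unchecked_value": "/Off"}`, a `True` answer becomes `"/1"`.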
### 3. Validate `.values.json`
**Before running the population script**, validate the `.values.json` file against the validation checklist below:
- Verify all field_ids from `.form.json` are present
- Check checkbox values match `checked_value`/`unchecked_value` from `.form.json`
- Ensure numeric fields use numbers, not strings
- Confirm language cast values are used when available
If validation fails, fix the `.values.json` file and re-validate until all checks pass.
### 4. Populate PDF
Once validation passes, run the population script (note, the `scripts` directory is relative to the base directory for this skill):
```bash
python scripts/fill_fillable_fields.py <basename>.pdf <basename>.chatfield/<basename>.values.json <basename>.done.pdf
```
## Validation Checklist
<validation_checklist>
**Missing fields:**
- Check that all field_ids from `.form.json` are in `.values.json`
- Verify field_id spelling matches exactly
**Wrong checkbox values:**
- Check `checked_value`/`unchecked_value` in `.form.json`
- Common values: `/1`, `/On`, `/Yes` for checked; `/Off`, `/No` for unchecked
**Type errors:**
- Ensure numeric fields use numbers, not strings: `25` not `"25"`
- Ensure boolean checkboxes use proper values from `.form.json`
**Language translation (for translated forms):**
- Ensure language cast value is used when it exists (e.g., `as_lang_es` for Spanish forms)
</validation_checklist>
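Part of this checklist can be automated. The sketch below assumes `.form.json` has been loaded as a list of dicts that each carry a `field_id` (and, for checkboxes, `checked_value`/`unchecked_value`); that shape is an assumption, and `find_values_problems` is a hypothetical helper. Numeric type checks are omitted because the type metadata in `.form.json` is not described here.

```python
from __future__ import annotations


def find_values_problems(form_fields: list[dict], values: list[dict]) -> list[str]:
    """Compare .values.json entries against .form.json metadata (assumed shapes)."""
    problems: list[str] = []
    meta_by_id = {field["field_id"]: field for field in form_fields}
    form_ids = set(meta_by_id)
    value_ids = {entry["field_id"] for entry in values}

    # Missing fields: every field_id in .form.json must appear in .values.json.
    for field_id in sorted(form_ids - value_ids):
        problems.append(f"missing field_id: {field_id}")
    # Spelling mismatches show up as ids that .form.json does not know.
    for field_id in sorted(value_ids - form_ids):
        problems.append(f"unknown field_id: {field_id}")

    # Wrong checkbox values: only the two values from .form.json are allowed.
    for entry in values:
        meta = meta_by_id.get(entry["field_id"], {})
        if "checked_value" in meta:
            allowed = {meta["checked_value"], meta.get("unchecked_value")}
            if entry["value"] not in allowed:
                problems.append(
                    f"bad checkbox value for {entry['field_id']}: {entry['value']!r}")
    return problems
```

An empty result means these checks passed; the remaining checklist items still need a manual review.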


@@ -0,0 +1,121 @@
# Populating Non-fillable PDF Forms
<purpose>
After collecting data via Chatfield interview, populate the non-fillable PDF with text annotations.
</purpose>
## Process Overview
```plantuml
@startuml Populating-Nonfillable
title Populating Non-fillable PDF Forms
start
:Parse Chatfield output;
:Create .values.json with field values;
:Add annotations to PDF;
:**✓ PDF POPULATION COMPLETE**;
stop
@enduml
```
## Process
### 1. Parse Chatfield Output
Run Chatfield with `--inspect` for a final summary of all collected data:
```bash
python -m chatfield.cli --state='<basename>.chatfield/interview.db' --interview='<basename>.chatfield/interview.py' --inspect
```
Extract `field_id` and value for each field from the interview results.
### 2. Create `.values.json`
Create `<basename>.chatfield/<basename>.values.json` with the collected field values in the format expected by the annotation script:
```json
{
"fields": [
{
"field_id": "full_name",
"page": 1,
"value": "John Doe"
},
{
"field_id": "is_over_18",
"page": 2,
"value": "X"
}
]
}
```
**Value selection priority:**
- **CRITICAL**: If a language cast exists for a field (e.g., `.as_lang_es`, `.as_lang_fr`), **always prefer it** over the raw value
- This ensures forms are populated in the form's language, not the conversation language
- The language cast name matches the form's language code (e.g., `as_lang_es` for Spanish forms)
- Only use the raw value if no language cast exists
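The priority rule amounts to a lookup with a fallback. In this sketch the collected casts for a field are assumed to be available as a dict keyed by cast name (for example `{"as_lang_es": "..."}`); the actual `--inspect` output format is not specified here, so `pick_value` and its inputs are hypothetical.

```python
def pick_value(raw_value: str, casts: dict, form_lang: str) -> str:
    """Prefer the language cast for the form's language over the raw value."""
    # e.g. form_lang "es" looks for the cast named "as_lang_es"
    return casts.get(f"as_lang_{form_lang}", raw_value)
```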
**Boolean conversion for checkboxes:**
- Read `.form.json` for `checked_value` and `unchecked_value`
- Typically: `"X"` or `"✓"` for checked, `""` (empty string) for unchecked
- Convert Python `True`/`False` → checkbox display values
### 3. Add Annotations to PDF
Run the annotation script to create the filled PDF:
```bash
python scripts/fill_nonfillable_fields.py <basename>.pdf <basename>.chatfield/<basename>.values.json <basename>.done.pdf
```
This script:
- Reads the `.values.json` file with field values
- Reads the `.form.json` file (from extraction) with bounding box information
- Adds text annotations at the specified bounding boxes
- Creates the output PDF with all annotations
**Verification:**
- Verify `<basename>.done.pdf` exists
- Spot-check a few fields to ensure values are correctly placed
**Result**: `<basename>.done.pdf`
## Validation Checklist
<validation_checklist>
```
Non-fillable Population Validation:
- [ ] All field values extracted from CLI output
- [ ] Language casts used when available (not raw values)
- [ ] Boolean values converted to checkbox display values
- [ ] .values.json created with correct format
- [ ] fill_nonfillable_fields.py executed successfully
- [ ] Output PDF exists at expected location
- [ ] Spot-checked fields contain correct values
- [ ] Text is visible and properly positioned
```
</validation_checklist>
## Troubleshooting
**Text not visible:**
- Check font color in .form.json (should be dark, e.g., "000000" for black)
- Verify bounding boxes are correct size
- Ensure font size is appropriate for the bounding box
**Text cut off:**
- Bounding boxes may be too small
- Review validation images from extraction phase
- Consider adjusting bounding boxes and re-running extraction validation
**Wrong language:**
- Verify you're using language cast values (e.g., `as_lang_es`) not raw values
- Check that language casts were properly requested in the Form Data Model
---
**See Also:**
- ./Populating-Fillable.md - Population workflow for fillable PDFs
- ../extracting-form-fields/references/Nonfillable-Forms.md - How bounding boxes were created
- ./Converting-PDF-To-Chatfield.md - How the Form Data Model was built

@@ -0,0 +1,218 @@
# Translating Forms for Users
<purpose>
Use this guide when the PDF form is in a language different from the user's language. This enables cross-language form completion where the user speaks one language and the form is in another.
</purpose>
## Process Overview
```plantuml
@startuml Translating
title Translating Forms for Users
start
:Prerequisites: Form Data Model created\n(form language already determined);
partition "1. Copy Form Data Model" {
:Create language-specific .py file;
}
partition "2. Edit Language-Specific Version" {
:Edit interview_<lang>.py;
partition "3. Alice Translation Traits" {
:Add translation traits to Alice;
}
partition "4. Bob Language Traits" {
:Add language trait to Bob;
}
partition "5. Field Language Casts" {
:Add .as_lang("<lang>") to all text fields;
}
}
repeat
:Validate translation setup
(see validation checklist);
if (All checks pass?) then (yes)
else (no)
:Fix issues;
endif
repeat while (All checks pass?) is (no)
->yes;
:**✓ TRANSLATION COMPLETE**;
:Re-define Form Data Model as interview_<lang>.py;
stop
@enduml
```
## Critical Principle
<critical_principle>
The **Form Data Model** (`interview.py`) was already created in the form's language.
**DO NOT recreate it.** Instead, ADAPT it for translation.
The form definition stays in the form's language. Only Alice's traits, Bob's traits, and the language casts on text fields are added to enable translation.
</critical_principle>
## Process
### 1. Copy Form Data Model
Create a language-specific .py file named for the user's language. Use ISO 639-1 language codes: `en`, `es`, `fr`, `de`, `zh`, `ja`, etc.
```bash
# If user speaks Spanish
cp input.chatfield/interview.py input.chatfield/interview_es.py
```
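The same copy step can be done from Python. A minimal sketch, assuming the `<basename>.chatfield/` directory layout used above; the function name `copy_form_data_model` is hypothetical, and the overwrite guard is a design choice of this sketch rather than something the workflow requires:

```python
import shutil
from pathlib import Path


def copy_form_data_model(chatfield_dir, user_lang):
    """Copy interview.py to interview_<user_lang>.py and return the new path."""
    source = Path(chatfield_dir) / "interview.py"
    target = Path(chatfield_dir) / f"interview_{user_lang}.py"
    if target.exists():
        # Refuse to overwrite a translation that may already have been edited.
        raise FileExistsError(f"{target} already exists")
    shutil.copyfile(source, target)
    return target
```

For a Spanish-speaking user, `copy_form_data_model("input.chatfield", "es")` creates `input.chatfield/interview_es.py`.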
### 2. Edit Language-Specific Version
Edit `interview_<lang>.py` to add translation traits and language casts.
**What to change:**
- ✅ Alice traits - Add translation instructions
- ✅ Bob traits - Add language preference
- ✅ Text fields - Add `.as_lang("<form-lang-code>")` for translation (e.g., "es" for Spanish)
**What NOT to change:**
- ❌ Form `.type()` or `.desc()` - Keep form's language
- ❌ Field definitions - Keep all field IDs unchanged
- ❌ Field `.desc()` - Keep form's language
- ❌ Background hints - Keep form's language
- ❌ Existing `.as_*()` cast names or descriptions
### 3. Alice Translation Traits
Add these traits to Alice:
```python
.alice()
    # Keep existing .type()
    .trait("Conducts this conversation in [USER_LANGUAGE]")
    .trait("Translates [USER_LANGUAGE] responses into [FORM_LANGUAGE] for the form")
    .trait("Explains [FORM_LANGUAGE] terms in [USER_LANGUAGE]")
    # Keep all existing .trait() calls
```
### 4. Bob Language Traits
Add this trait to Bob:
```python
.bob()
    # Keep existing .type()
    .trait("Speaks [USER_LANGUAGE] only")
    # Keep all existing .trait() calls
```
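The `[USER_LANGUAGE]` and `[FORM_LANGUAGE]` placeholders are replaced with language names before the traits are pasted into the file. A minimal sketch that fills in the four trait strings from steps 3 and 4; the helper name `translation_traits` is hypothetical and is not part of Chatfield:

```python
def translation_traits(user_language, form_language):
    """Return the trait strings to add for Alice and Bob."""
    alice = [
        f"Conducts this conversation in {user_language}",
        f"Translates {user_language} responses into {form_language} for the form",
        f"Explains {form_language} terms in {user_language}",
    ]
    bob = [f"Speaks {user_language} only"]
    return {"alice": alice, "bob": bob}


# Print ready-to-paste .trait() lines for an English speaker and a Spanish form.
for role, traits in translation_traits("English", "Spanish").items():
    for trait in traits:
        print(f'{role}: .trait("{trait}")')
```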
### 5. Field Language Casts
Add `.as_lang("<form-lang-code>")` to **all text fields** so that values are translated into the form's language. Use the form's ISO 639-1 language code (`es`, `fr`, `th`, `de`, etc.):
```python
.field("field_name")
    .desc("...")
    .as_lang("es")  # "es" for a Spanish form; use "fr" for French, "th" for Thai, etc.
    # Keep all existing casts
```
## Complete Example
**Original Form Data Model** (`interview.py`):
```python
from chatfield import chatfield

interview = (chatfield()
    .type("Solicitud de Visa")
    .desc("Formulario de solicitud de visa de turista")
    .alice()
        .type("Asistente de Formularios")
        .trait("Usa lenguaje claro y natural")
        .trait("Acepta variaciones de formato")
    .bob()
        .type("Solicitante de visa")
        .trait("Habla de forma natural y libre")
    .field("nombre_completo")
        .desc("¿Cuál es su nombre completo?")
        .hint("Background: Debe coincidir con el pasaporte")
    .field("fecha_nacimiento")
        .desc("¿Cuál es su fecha de nacimiento?")
        .as_str("dia", "Día (DD)")
        .as_str("mes", "Mes (MM)")
        .as_str("anio", "Año (YYYY)")
    .build()
)
```
**Translated Version** (`interview_en.py` for English-speaking user):
```python
from chatfield import chatfield

interview = (chatfield()
    .type("Solicitud de Visa")  # Unchanged - form's language
    .desc("Formulario de solicitud de visa de turista")  # Unchanged
    .alice()
        .type("Asistente de Formularios")  # Unchanged
        .trait("Conducts this conversation in English")  # ADDED
        .trait("Translates English responses into Spanish for the form")  # ADDED
        .trait("Explains Spanish terms in English")  # ADDED
        .trait("Usa lenguaje claro y natural")  # Keep existing
        .trait("Acepta variaciones de formato")  # Keep existing
    .bob()
        .type("Solicitante de visa")  # Unchanged
        .trait("Speaks English only")  # ADDED
        .trait("Habla de forma natural y libre")  # Keep existing
    .field("nombre_completo")  # Unchanged
        .desc("¿Cuál es su nombre completo?")  # Unchanged - form's language
        .hint("Background: Debe coincidir con el pasaporte")  # Unchanged
        .as_lang("es")  # ADDED - translate to Spanish
    .field("fecha_nacimiento")  # Unchanged
        .desc("¿Cuál es su fecha de nacimiento?")  # Unchanged
        .as_str("dia", "Día (DD)")  # Unchanged
        .as_str("mes", "Mes (MM)")  # Unchanged
        .as_str("anio", "Año (YYYY)")  # Unchanged
    .build()
)
```
## Validation Checklist
Before proceeding, verify ALL items:
<validation_checklist>
```
Translation Validation Checklist:
- [ ] Created interview_<lang>.py (copied from interview.py)
- [ ] No changes to form .type() or .desc()
- [ ] No changes to field definitions (field IDs)
- [ ] No changes to field .desc() (keep form's language)
- [ ] No changes to .as_*() cast names or descriptions
- [ ] No changes to Background hints (keep form's language)
- [ ] Added Alice trait: "Conducts this conversation in [USER_LANGUAGE]"
- [ ] Added Alice trait: "Translates [USER_LANGUAGE] responses into [FORM_LANGUAGE] for the form"
- [ ] Added Alice trait: "Explains [FORM_LANGUAGE] terms in [USER_LANGUAGE]"
- [ ] Added Bob trait: "Speaks [USER_LANGUAGE] only"
- [ ] Added .as_lang("<form-lang-code>") to all text fields (e.g., "es" for Spanish)
```
</validation_checklist>
If any items fail:
1. Review the specific issue
2. Fix the interview definition
3. Re-run validation checklist
4. Proceed only when all items pass
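Part of the checklist can be checked mechanically by comparing the source text of the two files. A rough sketch, assuming fields are declared as `.field("...")` with double-quoted IDs and traits are written exactly as shown above; the function name `check_translation` is hypothetical. It covers only field IDs, the four added traits, and the presence of at least one `.as_lang()` cast, so the remaining items still need a manual review:

```python
import re

# Matches .field("some_id") and captures the field ID.
FIELD_RE = re.compile(r'\.field\(\s*"([^"]+)"\s*\)')


def check_translation(original_src, translated_src,
                      user_language, form_language, form_lang_code):
    """Return a list of problems found; an empty list means these checks passed."""
    problems = []
    # Field IDs must be identical and in the same order in both files.
    if FIELD_RE.findall(original_src) != FIELD_RE.findall(translated_src):
        problems.append("field IDs differ from interview.py")
    required = [
        f"Conducts this conversation in {user_language}",
        f"Translates {user_language} responses into {form_language}",
        f"Explains {form_language} terms in {user_language}",
        f"Speaks {user_language} only",
    ]
    for trait in required:
        if trait not in translated_src:
            problems.append(f"missing trait: {trait}")
    if f'.as_lang("{form_lang_code}")' not in translated_src:
        problems.append("no .as_lang() cast for the form language")
    return problems
```

Read `interview.py` and `interview_<lang>.py` as text, pass both strings in, and fix anything the returned list reports before re-running the checklist.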
## Re-define Form Data Model
**CRITICAL**: When translation setup is complete, the **Form Data Model** is now the language-specific version (`interview_<lang>.py`), NOT the base `interview.py`.
Use this file for all subsequent steps (CLI execution, etc.).