Files
2025-11-30 08:25:58 +08:00

3.6 KiB

Populating Non-fillable PDF Forms

After collecting data via Chatfield interview, populate the non-fillable PDF with text annotations.

Process Overview

@startuml Populating-Nonfillable
title Populating Non-fillable PDF Forms
start
:Parse Chatfield output;
:Create .values.json with field values;
:Add annotations to PDF;
:**✓ PDF POPULATION COMPLETE**;
stop
@enduml

Process

1. Parse Chatfield Output

Run Chatfield with --inspect for a final summary of all collected data:

python -m chatfield.cli --state='<basename>.chatfield/interview.db' --interview='<basename>.chatfield/interview.py' --inspect

Extract field_id and value for each field from the interview results.

2. Create .values.json

Create <basename>.chatfield/<basename>.values.json with the collected field values in the format expected by the annotation script:

{
  "fields": [
    {
      "field_id": "full_name",
      "page": 1,
      "value": "John Doe"
    },
    {
      "field_id": "is_over_18",
      "page": 2,
      "value": "X"
    }
  ]
}

Value selection priority:

  • CRITICAL: If a language cast exists for a field (e.g., .as_lang_es, .as_lang_fr), always prefer it over the raw value
  • This ensures forms are populated in the form's language, not the conversation language
  • The language cast name matches the form's language code (e.g., as_lang_es for Spanish forms)
  • Only use the raw value if no language cast exists

Boolean conversion for checkboxes:

  • Read .form.json for checked_value and unchecked_value
  • Typically: "X" or "✓" for checked, "" (empty string) for unchecked
  • Convert Python True/False → checkbox display values

3. Add Annotations to PDF

Run the annotation script to create the filled PDF:

python scripts/fill_nonfillable_fields.py <basename>.pdf <basename>.chatfield/<basename>.values.json <basename>.done.pdf

This script:

  • Reads the .values.json file with field values
  • Reads the .form.json file (from extraction) with bounding box information
  • Adds text annotations at the specified bounding boxes
  • Creates the output PDF with all annotations

Verification:

  • Verify <basename>.done.pdf exists
  • Spot-check a few fields to ensure values are correctly placed

Result: <basename>.done.pdf

Validation Checklist

<validation_checklist>

Non-fillable Population Validation:
- [ ] All field values extracted from CLI output
- [ ] Language casts used when available (not raw values)
- [ ] Boolean values converted to checkbox display values
- [ ] .values.json created with correct format
- [ ] fill_nonfillable_fields.py executed successfully
- [ ] Output PDF exists at expected location
- [ ] Spot-checked fields contain correct values
- [ ] Text is visible and properly positioned

</validation_checklist>

Troubleshooting

Text not visible:

  • Check font color in .form.json (should be dark, e.g., "000000" for black)
  • Verify bounding boxes are correct size
  • Ensure font size is appropriate for the bounding box

Text cut off:

  • Bounding boxes may be too small
  • Review validation images from extraction phase
  • Consider adjusting bounding boxes and re-running extraction validation

Wrong language:

  • Verify you're using language cast values (e.g., as_lang_es) not raw values
  • Check that language casts were properly requested in the Form Data Model

See Also:

  • ./Populating-Fillable.md - Population workflow for fillable PDFs
  • ../extracting-form-fields/references/Nonfillable-Forms.md - How bounding boxes were created
  • ./Converting-PDF-To-Chatfield.md - How the Form Data Model was built