Here’s an adjusted reply you can paste straight to Replit, incorporating the CTO’s requirement 👇

---

Thanks – that summary is spot on.

For this checkpoint, please **use the spec as the canonical system prompt for AI-powered customer messaging**, but **keep the existing FastAPI + DB contract exactly the same.**

Concretely:

1. **Keep the existing wrapper + response shape**

   * Do **not** change the shape of what Python returns to FastAPI or the DB.
   * The model wrapper must still return the same `updated_state` dict with the existing keys:

     * `soft_no_count`
     * `confusion_count`
     * `needs_human_review`
     * `appointment_status`
     * `opted_out`
     * `last_reply_type`
   * The new prompt can add **`tone_tag`** and **`intent_tag`**, but those should be:

     * Parsed inside the Python layer, and
     * Either stored **inside `updated_state`** as additional keys, or kept internally in the wrapper – **without changing** the FastAPI / DB contract.
   * In other words: same outer structure, just a richer `updated_state` internally.

2. **Save the specification as the LLM system prompt**

   * Create a dedicated prompt module, e.g.:
     `app/prompts/lia_llm_system_prompt.py`
   * Store the full LIA behaviour spec there as the **system message** that’s always sent when generating a WhatsApp reply.

3. **Implement an LLM-powered reply generator behind the existing wrapper**

   * Inside the existing model wrapper (the function that currently calls OpenAI and returns `{"reply": ..., "updated_state": ...}`):

     * Load the system prompt from `lia_llm_system_prompt.py`.
     * Build the **user message** using the structured input you already pass in (customer_name, vehicle_model, tier, stage, template_key, template_text, recent_messages, etc.), as described in the spec.
     * Call OpenAI with:

       * `system`: the spec
       * `user`: the JSON context
     * Parse the JSON response from the LLM into:

       * `reply` (the WhatsApp message text)
       * `tone_tag`
       * `intent_tag`
       * any internal flags you want
     * Then **map that back** into the existing return shape, e.g.:

       ```python
       return {
           "reply": reply_text,
           "updated_state": {
               "soft_no_count": prev_state.soft_no_count,  # or updated if you choose
               "confusion_count": prev_state.confusion_count,
               "needs_human_review": prev_state.needs_human_review,
               "appointment_status": prev_state.appointment_status,
               "opted_out": prev_state.opted_out,
               "last_reply_type": "llm",  # or similar
               "tone_tag": tone_tag,      # NEW but still inside updated_state
               "intent_tag": intent_tag   # NEW but still inside updated_state
           }
       }
       ```

   * Python can choose to **use or ignore** `tone_tag` / `intent_tag` for now (e.g. for logging or future analytics), but the outer contract stays unchanged.

4. **Keep URE logic as-is for now**

   * Don’t rewrite URE rules in this step.
   * Continue to let URE:

     * Classify `tier` (T1–T6),
     * Determine `stage` (18/12/9/6/3),
     * Pick `template_key` and `template_text`.
   * The LLM just becomes the **final wording layer** that:

     * Uses `template_text` as the “spine”,
     * Applies brand voice and behaviour rules from the spec,
     * Returns a `reply` and extra tags inside the existing state shape.

5. **Don’t touch FastAPI / DB schemas**

   * FastAPI endpoints and DB models should not change for this step.
   * This is a **drop-in behavioural upgrade** inside the existing model wrapper, not an API/schema change.

In short:

> ✅ Use the new spec as the system prompt behind the current model wrapper.
> ✅ Add `tone_tag` and `intent_tag` inside `updated_state` only.
> ❌ Don’t change the external response schema or any FastAPI/DB contracts yet.