
# Charles Session File Format (.chlsx)

Reference for AI agents that need to parse Charles Proxy session files.

---

## File Format

`.chlsx` is a **ZIP archive** containing numbered XML files, each representing one HTTP request/response pair.

### Structure inside the ZIP

```
session.chlsx
├── 00001.xml
├── 00002.xml
├── 00003.xml
├── ...
└── 00/
    ├── 00001.xml
    ├── 00002.xml
    └── ...
```

Files may be flat at the root or grouped in two-digit subdirectories (`00/`, `01/`, etc.) depending on session size.
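The sketch below collects the entry names, assuming the ZIP layout described above: numbered `.xml` files either at the root or under two-digit subdirectories. `list_session_entries` is a hypothetical helper name. It uses only the standard-library `zipfile` module and skips directory entries and non-XML files.

```python
import zipfile


def list_session_entries(path: str) -> list[str]:
    """Return the names of the XML entries in a .chlsx archive, sorted by name.

    Assumes the layout described above: numbered .xml files either at the
    archive root or inside two-digit subdirectories such as "00/".
    """
    with zipfile.ZipFile(path) as zf:
        return sorted(
            info.filename
            for info in zf.infolist()
            # Skip directory entries and anything that is not an XML file.
            if not info.is_dir() and info.filename.lower().endswith(".xml")
        )
```

Each returned name can then be read with `zf.read(name)` on an open `ZipFile`.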

### XML Structure Per File

Each XML file contains:

- **Request**: method, URL, protocol, headers, body
- **Response**: status, protocol, headers, body
- **Timing**: start time, duration

Key XML elements:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<session>
  <request>
    <method>GET</method>
    <url>https://discourse.example.com/t/123.json</url>
    <protocol>HTTP/1.1</protocol>
    <header name="Accept">application/json</header>
    <header name="Cookie">_t=abc123</header>
    <body></body>
  </request>
  <response>
    <status>200</status>
    <protocol>HTTP/1.1</protocol>
    <header name="Content-Type">application/json; charset=utf-8</header>
    <body>{"id": 123, "title": "...", "post_stream": {...}}</body>
  </response>
  <timing>
    <start>2026-01-15T10:30:00.000Z</start>
    <duration>450</duration>
  </timing>
</session>
```
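The following sketch parses one such file with the standard-library `xml.etree.ElementTree`. It assumes the element names in the example above (`request`, `response`, `header` with a `name` attribute, `body`, `timing`). `parse_exchange` is a hypothetical helper. If a real capture uses different element names, the lookups return `None` or empty values rather than raising.

```python
import xml.etree.ElementTree as ET
from typing import Any


def parse_exchange(xml_data: bytes) -> dict[str, Any]:
    """Parse one request/response XML file shaped like the example above.

    Element names are assumptions taken from that example; missing
    elements fall back to None or empty values.
    """
    root = ET.fromstring(xml_data)

    def headers(section) -> dict[str, str]:
        # Collect <header name="...">value</header> children. HTTP header
        # names are case-insensitive, so keys are lowercased. A repeated
        # header (e.g. Set-Cookie) keeps only its last value here.
        if section is None:
            return {}
        return {h.get("name", "").lower(): (h.text or "") for h in section.findall("header")}

    status = root.findtext("response/status")
    duration = root.findtext("timing/duration")
    return {
        "method": root.findtext("request/method"),
        "url": root.findtext("request/url"),
        "request_headers": headers(root.find("request")),
        "status": int(status) if status else None,
        "response_headers": headers(root.find("response")),
        "response_body": root.findtext("response/body") or "",
        "start": root.findtext("timing/start"),
        "duration": int(duration) if duration else None,
    }
```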

### .chls vs .chlsx vs .chlsj

| Extension | Format | Notes |
|---|---|---|
| `.chls` | Binary | Legacy format, harder to parse |
| `.chlsx` | ZIP + XML | **Prefer this**. Most common modern format |
| `.chlsj` | JSON | Newer, less common; each session is one JSON file with an array of request/response objects |

**Recommendation**: Configure Charles to save as `.chlsx` (File → Save Session As... → choose `.chlsx`).
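When a file's extension is missing or untrustworthy, a heuristic like the one below can help pick a parser. It is an assumption, not documented Charles behaviour. It checks for the standard ZIP signature (`PK\x03\x04`), then for JSON or XML text, and otherwise treats the file as the legacy binary format. `sniff_session_format` is a hypothetical name.

```python
import codecs


def sniff_session_format(path: str) -> str:
    """Guess "zip", "json", "xml", or "binary" from a file's first bytes.

    Heuristic sketch only; this is not documented Charles behaviour.
    """
    with open(path, "rb") as f:
        head = f.read(64)
    # A non-empty ZIP archive normally starts with a local file header
    # whose signature is the four bytes "PK\x03\x04".
    if head.startswith(b"PK\x03\x04"):
        return "zip"
    # Text formats: drop an optional UTF-8 byte-order mark and leading whitespace.
    if head.startswith(codecs.BOM_UTF8):
        head = head[len(codecs.BOM_UTF8):]
    head = head.lstrip()
    if head[:1] in (b"{", b"["):
        return "json"
    if head[:1] == b"<":
        return "xml"
    # Anything else is assumed to be the legacy binary .chls format.
    return "binary"
```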

---

## Discourse API Endpoints to Look For

These are the endpoints worth extracting from a Charles session:

| Purpose | URL pattern | Parsing target |
|---|---|---|
| Topic feed | `/latest.json` | `topic_list.topics[]` |
| Category topics | `/c/{slug}.json` | `topic_list.topics[]` |
| Single topic | `/t/{id}.json` | `post_stream.posts[]` (the first batch of posts; `post_stream.stream[]` lists every post ID) |
| Posts in topic | `/t/{id}.json?page={n}` | Later pages of `post_stream.posts[]` |
| Search | `/search.json?q=...` | `topics[]`, `posts[]` |
| User activity | `/u/{username}/activity.json` | User posts/topics |
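One way to route captured requests to these endpoints is to match each URL path against one regex per table row. The labels (`topic_feed`, `topic`, and so on) and `classify_discourse_url` are hypothetical names. Query strings such as `?q=` and `?page=` are ignored, and the `/c/` and `/t/` patterns are deliberately loose so they also accept extra path segments.

```python
import re
from typing import Optional
from urllib.parse import urlsplit

# One path regex per endpoint in the table (hypothetical labels).
# Only the URL path is matched; query strings (?q=, ?page=) are ignored.
ENDPOINT_PATTERNS = [
    ("topic_feed", re.compile(r"/latest\.json")),
    # Loose on purpose: accepts /c/{slug}.json and deeper /c/... paths.
    ("category_topics", re.compile(r"/c/.+\.json")),
    # Accepts /t/{id}.json and /t/{id}/{n}.json.
    ("topic", re.compile(r"/t/\d+(?:/\d+)?\.json")),
    ("search", re.compile(r"/search\.json")),
    ("user_activity", re.compile(r"/u/[^/]+/activity\.json")),
]


def classify_discourse_url(url: str) -> Optional[str]:
    """Return the endpoint label for a captured request URL, or None."""
    path = urlsplit(url).path
    for label, pattern in ENDPOINT_PATTERNS:
        if pattern.fullmatch(path):
            return label
    return None
```

Anything that returns `None` (JS bundles, images, avatars) can be skipped before the body is parsed.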

---

## Extraction Strategy for AI

1. **Open the `.chlsx` as a ZIP** (it is not encrypted)
2. **Iterate over all XML files** inside
3. For each XML, check if the request URL matches a Discourse API endpoint
4. Extract the JSON response body from `<response><body>`
5. Parse the JSON and convert to Markdown
6. Organize by topic ID + title for easy search
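The steps above can be sketched end to end for the single-topic endpoint. This assumes the ZIP + XML layout and element names shown earlier, and that response bodies are stored as plain JSON text. `extract_topics`, `topic_filename`, and the `<id>-<slug>.md` naming scheme are hypothetical. `html_to_text` is a crude tag stripper standing in for a real HTML-to-Markdown converter, applied to each post's `cooked` (rendered HTML) field.

```python
import json
import re
import xml.etree.ElementTree as ET
import zipfile
from html import unescape
from urllib.parse import urlsplit


def html_to_text(cooked: str) -> str:
    # Crude stand-in for an HTML-to-Markdown converter: turn line breaks and
    # paragraph ends into newlines, drop all other tags, decode entities.
    text = re.sub(r"<br\s*/?>|</p>", "\n", cooked)
    text = re.sub(r"<[^>]+>", "", text)
    return unescape(text).strip()


def topic_filename(topic_id: int, title: str) -> str:
    # Hypothetical naming scheme: "<id>-<slugified title>.md".
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{topic_id}-{slug}.md"


def extract_topics(chlsx_path: str) -> dict[str, str]:
    """Map output filename -> Markdown for every /t/{id}.json response."""
    topics: dict[int, dict] = {}
    with zipfile.ZipFile(chlsx_path) as zf:
        for name in sorted(zf.namelist()):
            if not name.endswith(".xml"):
                continue
            root = ET.fromstring(zf.read(name))
            url = root.findtext("request/url") or ""
            if not re.fullmatch(r"/t/\d+\.json", urlsplit(url).path):
                continue  # not a single-topic endpoint
            try:
                topic = json.loads(root.findtext("response/body") or "")
            except json.JSONDecodeError:
                continue  # empty, truncated, or non-JSON body
            if not isinstance(topic, dict) or "id" not in topic:
                continue  # e.g. an error payload instead of a topic
            entry = topics.setdefault(topic["id"], {"title": topic.get("title", ""), "posts": {}})
            # Merge posts across pages (?page=N); a repeated post ID overwrites.
            for post in topic.get("post_stream", {}).get("posts", []):
                entry["posts"][post["id"]] = post

    output = {}
    for topic_id, entry in topics.items():
        lines = [f"# {entry['title']}", "", f"Topic ID: {topic_id}", ""]
        for post in sorted(entry["posts"].values(), key=lambda p: p.get("post_number", 0)):
            lines += [
                f"## Post {post.get('post_number')} by {post.get('username')} ({post.get('created_at')})",
                "",
                html_to_text(post.get("cooked", "")),
                "",
            ]
        output[topic_filename(topic_id, entry["title"])] = "\n".join(lines)
    return output
```

Writing each `(filename, markdown)` pair from `extract_topics("session.chlsx").items()` to disk gives one searchable file per topic. Keying the intermediate dict by topic ID lets paginated responses for the same topic merge into a single document.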

---

## Common Pitfalls

- Some responses are paginated (`/t/{id}.json?page=1`). Collect all pages for completeness.
- Binary responses (images, JS bundles) should be skipped.
- The same topic may appear multiple times in different Charles sessions; deduplicate by topic ID + last updated timestamp.
- Session cookies captured in Charles are likely to have expired by the time the AI reads them; only the response data matters.
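A minimal sketch of the deduplication rule, assuming each capture has already been parsed into a topic dict with an `id`. It treats Discourse's `last_posted_at` field as the "last updated" timestamp. That choice is an assumption, so the field name is a parameter. `dedupe_topics` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Iterable, Optional


def _parse_ts(value: Optional[str]) -> datetime:
    # Missing timestamps sort before everything else.
    if not value:
        return datetime.min.replace(tzinfo=timezone.utc)
    # Assumes Discourse-style ISO 8601 strings ending in "Z"; fromisoformat
    # only accepts that suffix from Python 3.11, so normalise it first.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def dedupe_topics(topics: Iterable[dict], ts_field: str = "last_posted_at") -> dict[int, dict]:
    """Keep one copy per topic ID: the capture with the newest timestamp.

    ts_field defaults to last_posted_at as an assumption; pass another
    field name if your captures carry a different "last updated" value.
    """
    latest: dict[int, dict] = {}
    for topic in topics:
        current = latest.get(topic["id"])
        # Strictly newer wins; on a tie the first capture seen is kept.
        if current is None or _parse_ts(topic.get(ts_field)) > _parse_ts(current.get(ts_field)):
            latest[topic["id"]] = topic
    return latest
```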