# Charles Session File Format (.chlsx)

Reference for AI agents that need to parse Charles Proxy session files.

---

## File Format

`.chlsx` is a **ZIP archive** containing numbered XML files, each representing one HTTP request/response pair.

### Structure inside the ZIP

```
session.chlsx
├── 00001.xml
├── 00002.xml
├── 00003.xml
├── ...
└── 00/
    ├── 00001.xml
    ├── 00002.xml
    └── ...
```

Files may be flat at the root or grouped in two-digit subdirectories (`00/`, `01/`, etc.) depending on session size.
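
Because the XML files can sit at either level, a reader should not hard-code a directory depth. The following is a minimal sketch using Python's standard `zipfile` module, assuming the archive layout described above; the function names `open_session` and `list_xml_entries` are hypothetical and chosen only for illustration.

```python
import zipfile


def open_session(path: str) -> zipfile.ZipFile:
    """Open a session file, failing early if it is not actually a ZIP."""
    if not zipfile.is_zipfile(path):
        raise ValueError(f"{path} is not a ZIP archive; check the export format")
    return zipfile.ZipFile(path)


def list_xml_entries(archive: zipfile.ZipFile) -> list[str]:
    """Return every .xml member, whether at the root or in a subdirectory."""
    names = [
        info.filename
        for info in archive.infolist()
        if not info.is_dir() and info.filename.lower().endswith(".xml")
    ]
    # Zero-padded names sort lexicographically, which keeps the numbered
    # files in order within each directory.
    return sorted(names)
```

The `is_zipfile` check is deliberate: if a given export turns out not to be a ZIP, the sketch stops with a clear error instead of failing later during XML parsing.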

### XML Structure Per File

Each XML file contains:

- **Request**: method, URL, protocol, headers, body
- **Response**: status, protocol, headers, body
- **Timing**: start time, duration

Key XML elements:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<session>
  <request>
    <method>GET</method>
    <url>https://discourse.example.com/t/123.json</url>
    <protocol>HTTP/1.1</protocol>
    <header name="Accept">application/json</header>
    <header name="Cookie">_t=abc123</header>
    <body></body>
  </request>
  <response>
    <status>200</status>
    <protocol>HTTP/1.1</protocol>
    <header name="Content-Type">application/json; charset=utf-8</header>
    <body>{"id": 123, "title": "...", "post_stream": {...}}</body>
  </response>
  <timing>
    <start>2026-01-15T10:30:00.000Z</start>
    <duration>450</duration>
  </timing>
</session>
```
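
A document shaped like the sample above can be flattened into a plain dictionary with the standard library's `xml.etree.ElementTree`. This is a sketch under the assumption that real files use exactly these element names; `parse_transaction` and the dictionary keys are hypothetical names, not part of any Charles API.

```python
import xml.etree.ElementTree as ET


def _headers(parent) -> dict:
    """Collect <header name="...">value</header> children into a dict.

    Repeated header names collapse to the last value seen.
    """
    if parent is None:
        return {}
    return {h.get("name", ""): (h.text or "") for h in parent.findall("header")}


def parse_transaction(xml_bytes: bytes) -> dict:
    """Flatten one per-request XML document into a plain dict."""
    root = ET.fromstring(xml_bytes)
    return {
        "method": root.findtext("request/method", default=""),
        "url": root.findtext("request/url", default=""),
        "request_headers": _headers(root.find("request")),
        # A missing or empty <status> becomes 0.
        "status": int(root.findtext("response/status", default="0") or 0),
        "response_headers": _headers(root.find("response")),
        "response_body": root.findtext("response/body", default="") or "",
        "start": root.findtext("timing/start", default=""),
        # Kept as a string: this document does not state the unit.
        "duration": root.findtext("timing/duration", default=""),
    }
```

Passing bytes rather than a decoded string lets the parser honor the `encoding` declared in the XML prolog.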

### .chls vs .chlsx vs .chlsj

| Extension | Format | Notes |
|---|---|---|
| `.chls` | Binary | Legacy format, harder to parse |
| `.chlsx` | ZIP + XML | **Prefer this**. Most common modern format |
| `.chlsj` | JSON | Newer, less common; each session is one JSON file with an array of request/response objects |

**Recommendation**: Configure Charles to save as `.chlsx` (File → Save Session As... → choose `.chlsx`).

---

## Discourse API Endpoints to Look For

These are the endpoints worth extracting from a Charles session:

| Purpose | URL pattern | Parsing target |
|---|---|---|
| Topic feed | `/latest.json` | `topic_list.topics[]` |
| Category topics | `/c/{slug}.json` | `topic_list.topics[]` |
| Single topic | `/t/{id}.json` | The full topic with posts |
| Posts in topic | `/t/{id}/{page}.json` | Paginated posts |
| Search | `/search.json?q=...` | `topics[]`, `posts[]` |
| User activity | `/u/{username}/activity.json` | User posts/topics |
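
One way to apply the table is to translate each URL pattern into a regular expression and match it against the request path. The sketch below does that literally, one pattern per row; the labels (`topic_feed`, `single_topic`, and so on) and the function name `classify_url` are hypothetical. Matching runs on the path only, so query strings such as `?q=...` or `?page=1` do not affect the result.

```python
import re
from urllib.parse import urlsplit

# One entry per table row; {slug}, {id}, {page}, {username} become named groups.
ENDPOINT_PATTERNS = [
    ("topic_feed", re.compile(r"^/latest\.json$")),
    ("category_topics", re.compile(r"^/c/(?P<slug>[^/]+)\.json$")),
    ("single_topic", re.compile(r"^/t/(?P<id>\d+)\.json$")),
    ("topic_page", re.compile(r"^/t/(?P<id>\d+)/(?P<page>\d+)\.json$")),
    ("search", re.compile(r"^/search\.json$")),
    ("user_activity", re.compile(r"^/u/(?P<username>[^/]+)/activity\.json$")),
]


def classify_url(url: str):
    """Return (label, captured groups) for a known endpoint, else None."""
    path = urlsplit(url).path  # drops scheme, host, and query string
    for label, pattern in ENDPOINT_PATTERNS:
        match = pattern.match(path)
        if match:
            return label, match.groupdict()
    return None
```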

---

## Extraction Strategy for AI

1. **Open the `.chlsx` as a ZIP** (it is not encrypted)
2. **Iterate over all XML files** inside
3. For each XML, check if the request URL matches a Discourse API endpoint
4. Extract the JSON response body from `<response><body>`
5. Parse the JSON and convert to Markdown
6. Organize by topic ID + title for easy search
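
Steps 5 and 6 can be sketched as a single function that takes the response body of a single-topic request and returns a file stem plus Markdown text. The field names `post_stream.posts[]`, `username`, and `cooked` are assumptions about the Discourse topic JSON that this document does not spell out; `topic_to_markdown` is a hypothetical helper, and the HTML in `cooked` is passed through unchanged rather than converted.

```python
import json
import re


def topic_to_markdown(body: str) -> tuple[str, str]:
    """Convert one topic response body into (file_stem, markdown)."""
    topic = json.loads(body)
    topic_id = topic.get("id")
    title = topic.get("title", "untitled")

    lines = [f"# {title}", ""]
    # Assumed layout: post_stream.posts[] with username and cooked fields.
    for post in topic.get("post_stream", {}).get("posts", []):
        lines.append(f"## {post.get('username', 'unknown')}")
        lines.append("")
        lines.append(post.get("cooked", ""))  # rendered HTML, left as-is
        lines.append("")

    # Step 6: the stem combines topic ID and a slugified title for lookup.
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    return f"{topic_id}-{slug}", "\n".join(lines)
```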

---

## Common Pitfalls

- Some responses are paginated (`/t/{id}.json?page=1`). Collect all pages for completeness.
- Binary responses (images, JS bundles) should be skipped.
- The same topic may appear multiple times in different Charles sessions; deduplicate by topic ID + last updated timestamp.
- Session cookies captured in Charles will be expired by the time the AI reads them; only the response data matters.
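
Two of these pitfalls lend themselves to small guards. The sketch below filters on the `Content-Type` response header and keeps only the newest copy of each topic. It assumes each topic record is a dict with an `id` and a hypothetical `updated` key holding an ISO 8601 UTC timestamp in a consistent format, so string comparison orders the values correctly; the real field name depends on the payload.

```python
def is_json_response(content_type: str) -> bool:
    """True when a Content-Type header value indicates a JSON body."""
    return content_type.strip().lower().startswith("application/json")


def dedupe_topics(topics: list[dict]) -> list[dict]:
    """Keep the most recently updated copy of each topic ID."""
    newest: dict[int, dict] = {}
    for topic in topics:
        current = newest.get(topic["id"])
        # "updated" is a hypothetical key; same-format ISO 8601 strings
        # compare correctly as plain strings.
        if current is None or topic["updated"] > current["updated"]:
            newest[topic["id"]] = topic
    return list(newest.values())
```

Filtering on the header before touching the body avoids attempting to parse images or script bundles as JSON.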