feat: add tooling and documentation for archiving Discourse content via Charles Proxy .chlsx sessions.

2026-05-19 14:28:45 -06:00
parent f726814811
commit 73166b585f
8 changed files with 322 additions and 3 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -35,3 +35,6 @@ project-knowledge/.obsidian/plugins/
 project-knowledge/.obsidian/snippets/
 project-knowledge/.obsidian/cache/
 .trash/
 # Antigravity CLI local workspace configuration
 .antigravitycli/
--- a/agent-memory/workflows/ai-to-ai-prompting.md
+++ b/agent-memory/workflows/ai-to-ai-prompting.md
@@ -61,6 +61,9 @@ Use this structure by default:
 - For VS Code multi-root Copilot workflows, preserve repo-provided customizations such as `.github/prompts`, `.github/instructions`, `.github/agents`, `.github/skills`, and `AGENTS.md`. Shared `fidelity-ai-copilot` customizations should supplement these repo files, while repo-specific instructions should be treated as the practical authority when they conflict.
 - For Fidelity Jira/Confluence access from GitHub Copilot CLI or VS Code, do not assume the approved access method. First have the target AI read the current Fidelity-provided human instructions from Confluence or local exported docs, then configure the smallest matching workflow. If those instructions require terminal `curl` with environment variables such as `COPILOT_JIRA_URL` and `COPILOT_JIRA_TOKEN`, enforce that path; otherwise follow the documented Fidelity-approved method. Never print, persist, or hardcode tokens.
 - Treat `fidelity-ai-copilot` as a self-improving AI harness rather than a static prompt dump: the target AI should notice recurring useful workflows, newly discovered internal instructions, and tool changes, then propose small auditable updates to instructions, skills, prompts, agents, specs, or validation checklists. It should ask before making broad changes and keep product repos clean.
 - For corporate-tool captures in `fidelity-ai-copilot`, prefer a single raw Charles Mirror source such as `archive/charles-mirror/` and treat it as read-only evidence organized by hostname. Generated Copilot outputs should be written to separate per-platform folders only when useful, with prompts requiring source inspection, narrow scope, local-only processing, and explicit evidence paths.
 - When advising on `fidelity-ai-copilot` customization, use this routing: keep global safety and repo role in `AGENTS.md` / `.github/copilot-instructions.md`; use `.github/instructions/*.instructions.md` for path-scoped rules such as `archive/**`; use `.github/prompts/*.prompt.md` for repeatable slash-command tasks; use `.github/agents/*.agent.md` for persistent personas with tool restrictions and handoffs; use `.github/skills/*/SKILL.md` for reusable multi-step capabilities with scripts, examples, or resources. Prefer small, composable artifacts over one large instruction file.
 - For read-only evidence prompts such as Discourse/Charles Mirror search, explicitly prevent the target AI from editing the prompt/configuration files while running the workflow. If Copilot changes `.github/prompts/*.prompt.md` during an evidence query, treat that as a workflow bug unless the user specifically asked to update the prompt.
 - When the user says they will handle dependency alignment, registry configuration, or compile/test execution manually on the development machine, generated Copilot follow-ups should not ask Copilot to solve those dependency/tooling issues or run broad builds. Instead, ask Copilot for the smallest source-level fix for the specific compiler error the user provides, state that the user will rerun validation manually, and request a concise summary of changed files and expected validation impact.
 ---
--- a/ai/discourse-archive/charles-session-format.md
+++ b/ai/discourse-archive/charles-session-format.md
@@ -0,0 +1,104 @@
 # Charles Session File Format (.chlsx)
 Reference for AI agents that need to parse Charles Proxy session files.
 ---
 ## File Format
 `.chlsx` is a **ZIP archive** containing numbered XML files, each representing one HTTP request/response pair.
 ### Structure inside the ZIP
 ```
 session.chlsx
 ├── 00001.xml
 ├── 00002.xml
 ├── 00003.xml
 ├── ...
 └── 00/
    ├── 00001.xml
    ├── 00002.xml
    └── ...
 ```
 Files may be flat at the root or grouped in two-digit subdirectories (`00/`, `01/`, etc.) depending on session size.
 ### XML Structure Per File
 Each XML file contains:
 - **Request**: method, URL, protocol, headers, body
 - **Response**: status, protocol, headers, body
 - **Timing**: start time, duration
 Key XML elements:
 ```xml
 <?xml version="1.0" encoding="UTF-8"?>
 <session>
  <request>
    <method>GET</method>
    <url>https://discourse.example.com/t/123.json</url>
    <protocol>HTTP/1.1</protocol>
    <header name="Accept">application/json</header>
    <header name="Cookie">_t=abc123</header>
    <body></body>
  </request>
  <response>
    <status>200</status>
    <protocol>HTTP/1.1</protocol>
    <header name="Content-Type">application/json; charset=utf-8</header>
    <body>{"id": 123, "title": "...", "post_stream": {...}}</body>
  </response>
  <timing>
    <start>2026-01-15T10:30:00.000Z</start>
    <duration>450</duration>
  </timing>
 </session>
 ```
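As a rough illustration of reading one such file, the sketch below parses a single request/response document into a plain dictionary. It assumes the element names shown in the example above (`request`, `response`, `header`, `body`, `status`); real Charles exports may use different names or attributes, and `parse_record` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def parse_record(xml_bytes: bytes) -> dict:
    """Parse one request/response XML document into a plain dict.

    Assumes the element layout shown in the example above; adjust the
    tag names if a real export differs.
    """
    root = ET.fromstring(xml_bytes)

    def side(tag: str) -> dict:
        # Collect the parts shared by <request> and <response>.
        node = root.find(tag)
        if node is None:
            return {"headers": {}, "body": "", "protocol": ""}
        headers = {h.get("name", ""): (h.text or "") for h in node.findall("header")}
        return {
            "headers": headers,
            "body": node.findtext("body") or "",
            "protocol": node.findtext("protocol") or "",
        }

    request, response = side("request"), side("response")
    request["method"] = root.findtext("request/method") or ""
    request["url"] = root.findtext("request/url") or ""
    status = root.findtext("response/status")
    response["status"] = int(status) if status and status.isdigit() else None
    return {"request": request, "response": response}
```

The function returns empty strings rather than raising when an element is missing, so a partially recorded exchange can still be inspected.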
 ### .chls vs .chlsx vs .chlsj
 | Extension | Format | Notes |
 |---|---|---|
 | `.chls` | Binary | Legacy format, harder to parse |
 | `.chlsx` | ZIP + XML | **Prefer this**. Most common modern format |
 | `.chlsj` | JSON | Newer, less common; each session is one JSON file with an array of request/response objects |
 **Recommendation**: Configure Charles to save as `.chlsx` (File → Save Session As... → choose `.chlsx`).
 ---
 ## Discourse API Endpoints to Look For
 These are the endpoints worth extracting from a Charles session:
 | Purpose | URL pattern | Parsing target |
 |---|---|---|
 | Topic feed | `/latest.json` | `topic_list.topics[]` |
 | Category topics | `/c/{slug}.json` | `topic_list.topics[]` |
 | Single topic | `/t/{id}.json` | `id`, `title`, `post_stream.posts[]` |
 | Posts in topic | `/t/{id}/{page}.json` | Same as single topic, paginated |
 | Search | `/search.json?q=...` | `topics[]`, `posts[]` |
 | User activity | `/u/{username}/activity.json` | `user_actions[]` |
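One way to apply the URL patterns in the table is a small list of regular expressions matched against the request path. This is a sketch only: the labels (`latest`, `topic_page`, and so on) and the `classify_url` helper are illustrative names, and slug-prefixed forms such as `/t/{slug}/{id}.json` are deliberately not covered.

```python
import re
from urllib.parse import urlsplit

# Patterns mirror the table above; the labels on the left are illustrative.
ENDPOINT_PATTERNS = [
    ("latest", re.compile(r"^/latest\.json$")),
    ("search", re.compile(r"^/search\.json$")),
    ("category", re.compile(r"^/c/.+\.json$")),
    ("topic_page", re.compile(r"^/t/(?P<id>\d+)/(?P<page>\d+)\.json$")),
    ("topic", re.compile(r"^/t/(?P<id>\d+)\.json$")),
    ("user_activity", re.compile(r"^/u/(?P<username>[^/]+)/activity\.json$")),
]


def classify_url(url: str):
    """Return (label, captured_groups) for a known endpoint, else None."""
    path = urlsplit(url).path  # drops the query string, e.g. ?q=...
    for label, pattern in ENDPOINT_PATTERNS:
        match = pattern.match(path)
        if match:
            return label, match.groupdict()
    return None
```

Matching on the path alone keeps query strings such as `?q=...` from affecting classification; anything that returns `None` can be skipped.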
 ---
 ## Extraction Strategy for AI
 1. **Open the `.chlsx` as a ZIP** (it is not encrypted)
 2. **Iterate over all XML files** inside
 3. For each XML, check if the request URL matches a Discourse API endpoint
 4. Extract the JSON response body from `<response><body>`
 5. Parse the JSON and convert to Markdown
 6. Organize by topic ID + title for easy search
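The steps above can be sketched end to end as follows. The code takes the ZIP-of-XML layout described in this document at face value and raises if the file is not a ZIP; `iter_json_responses` and `collect_topics` are hypothetical helper names, and treating any payload with a `post_stream` key as a full topic is an assumption.

```python
import json
import zipfile
import xml.etree.ElementTree as ET
from urllib.parse import urlsplit


def iter_json_responses(chlsx_path):
    """Yield (url, parsed_json) for XML entries whose request path ends in .json."""
    if not zipfile.is_zipfile(chlsx_path):
        raise ValueError(f"{chlsx_path} is not a ZIP archive")
    with zipfile.ZipFile(chlsx_path) as archive:
        # namelist() includes members in subdirectories such as 00/00001.xml.
        for name in sorted(archive.namelist()):
            if not name.lower().endswith(".xml"):
                continue
            try:
                root = ET.fromstring(archive.read(name))
            except ET.ParseError:
                continue  # skip malformed entries instead of aborting
            url = root.findtext("request/url") or ""
            if not urlsplit(url).path.endswith(".json"):
                continue
            body = root.findtext("response/body") or ""
            try:
                yield url, json.loads(body)
            except json.JSONDecodeError:
                continue  # truncated, encoded, or non-JSON body


def collect_topics(chlsx_path):
    """Map topic ID -> topic JSON for responses that carry a post_stream."""
    topics = {}
    for _url, data in iter_json_responses(chlsx_path):
        if isinstance(data, dict) and "post_stream" in data and "id" in data:
            topics[data["id"]] = data
    return topics
```

Conversion to Markdown is left out here; the returned dictionary is keyed by topic ID so a later step can write one file per topic.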
 ---
 ## Common Pitfalls
 - Some responses are paginated (`/t/{id}/{page}.json`, or `/t/{id}.json?page=N`). Collect all pages for completeness.
 - Binary responses (images, JS bundles) should be skipped.
 - The same topic may appear multiple times in different Charles sessions; deduplicate by topic ID + last updated timestamp.
 - Session cookies captured in Charles will likely have expired by the time the AI reads them and are not needed for extraction; only the response data matters.
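The cross-session deduplication mentioned above could look roughly like this. It assumes each topic dictionary carries an `id` and, where available, an ISO 8601 `last_posted_at` value with a `Z` suffix or explicit offset; `keep_latest` is a hypothetical helper name.

```python
from datetime import datetime, timezone

# Fallback used when a topic has no timestamp, so it always loses a comparison.
_OLDEST = datetime.min.replace(tzinfo=timezone.utc)


def _stamp(topic: dict) -> datetime:
    raw = topic.get("last_posted_at")
    if not raw:
        return _OLDEST
    # fromisoformat() on older Python versions does not accept a trailing "Z".
    return datetime.fromisoformat(raw.replace("Z", "+00:00"))


def keep_latest(topics):
    """Deduplicate topic dicts by ID, keeping the most recently updated copy."""
    latest = {}
    for topic in topics:
        current = latest.get(topic["id"])
        if current is None or _stamp(topic) > _stamp(current):
            latest[topic["id"]] = topic
    return latest
```

Feeding the topics from every session through `keep_latest` in any order yields one entry per topic ID.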
--- a/ai/discourse-archive/copilot-prompt-charles-discourse-archiver.md
+++ b/ai/discourse-archive/copilot-prompt-charles-discourse-archiver.md
@@ -0,0 +1,140 @@
 ---
 type: copilot-prompt
 status: ready
 target: github-copilot
 purpose: Parse Charles .chlsx sessions to create a searchable Discourse archive
 ---
 # Copilot Prompt — Charles Discourse Archiver
 Paste this into GitHub Copilot on the corporate device.
 ---
 ## Prompt
 You are helping me build a local searchable archive of a Discourse forum from captured Charles Proxy session files.
 ### Background
 I browse a Discourse forum in my browser while Charles Proxy records traffic. I save the session as a `.chlsx` file. Inside that file are all the HTTP request/response pairs for the pages I visited — including Discourse API calls that return structured JSON (topics, posts, categories, user profiles).
 I need you to extract only the Discourse content and organize it into a Markdown archive that:
 - Is searchable by an AI in future sessions
 - Preserves topic titles, post authors, dates, and content
 - Groups by category
 - Deduplicates topics that appear across multiple sessions
 ### File format: `.chlsx`
 `.chlsx` is a ZIP archive. Inside are numbered XML files (e.g. `00001.xml`, `00/00001.xml`). Each XML file represents one HTTP request/response pair with this structure:
 ```xml
 <session>
  <request>
    <method>GET</method>
    <url>https://forum.example.com/t/123.json</url>
    <protocol>HTTP/1.1</protocol>
    <header name="Cookie">...</header>
    <body></body>
  </request>
  <response>
    <status>200</status>
    <protocol>HTTP/1.1</protocol>
    <header name="Content-Type">application/json; charset=utf-8</header>
    <body>{"id": 123, "title": "Some Topic", "post_stream": {...}}</body>
  </response>
  <timing>
    <start>2026-01-15T10:30:00.000Z</start>
    <duration>450</duration>
  </timing>
 </session>
 ```
 ### Discourse API endpoints to extract
 | What | URL pattern | JSON fields |
 |---|---|---|
 | Latest topics | `/latest.json` | `topic_list.topics[].{id, title, slug, category_id, created_at, last_posted_at}` |
 | Category index | `/categories.json` | `category_list.categories[].{id, name, slug}` |
 | Single topic (with posts) | `/t/{id}.json` | `id, title, slug, category_id, post_stream.posts[].{username, cooked, created_at, post_number}` |
 | Topic with page | `/t/{id}/{page}.json` | Same as above, paginated |
 | User activity | `/u/{username}/activity.json` | `user_actions[]` |
 | Search results | `/search.json?q=...` | `topics[]`, `posts[]` |
 ### What to do
 1. **Open the `.chlsx` file** as a ZIP archive.
 2. **List all XML files** inside (both flat and in subdirectories).
 3. **For each XML file**, parse it and check if the request URL matches one of the Discourse endpoints above.
 4. **Skip**: CSS, JS, images, font files, analytics, CDN assets, and any non-Discourse endpoint.
 5. **Parse the JSON response body** from `<response><body>`.
 6. **Create this folder structure** as output:
 ```
 discourse-archive/
 ├── categories.json          # All categories found
 ├── index.md                 # Master index (table of all topics with ID, title, date, category, URL)
 └── topics/
     ├── 123-your-topic-slug.md
     ├── 456-another-topic.md
     └── ...
 ```
 ### Markdown format per topic
 Each topic file should be a clean Markdown document with YAML frontmatter:
 ```markdown
 ---
 id: 123
 title: "Your Topic Title"
 slug: your-topic-slug
 category: "Category Name"
 created: 2026-01-15
 updated: 2026-01-16
 url: https://forum.example.com/t/your-topic-slug/123
 ---
 # Your Topic Title
 **Category**: Category Name
 ---
 ## Post 1 — @username1 (2026-01-15T10:30:00Z)
 Post content here (HTML stripped, plain Markdown preferred).
 ---
 ## Post 2 — @username2 (2026-01-16T14:00:00Z)
 More content.
 ---
 ```
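For reference, a minimal sketch of producing the frontmatter and file name described above. It assumes the topic JSON exposes `id`, `title`, `slug`, `created_at`, and `last_posted_at`, and that the category name has already been looked up; `build_frontmatter` and `topic_filename` are hypothetical helper names.

```python
def _quote(value: str) -> str:
    """Wrap a value in double quotes, escaping backslashes and quotes for YAML."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_frontmatter(topic: dict, base_url: str, category_name: str) -> str:
    """Return the YAML frontmatter block for one topic."""
    slug = topic.get("slug", "")
    lines = [
        "---",
        f"id: {topic['id']}",
        f"title: {_quote(topic.get('title', ''))}",
        f"slug: {slug}",
        f"category: {_quote(category_name)}",
        # Keep only the date part (YYYY-MM-DD) of the ISO timestamps.
        f"created: {(topic.get('created_at') or '')[:10]}",
        f"updated: {(topic.get('last_posted_at') or '')[:10]}",
        f"url: {base_url.rstrip('/')}/t/{slug}/{topic['id']}",
        "---",
    ]
    return "\n".join(lines) + "\n"


def topic_filename(topic: dict) -> str:
    """File name in the form 123-your-topic-slug.md."""
    return f"{topic['id']}-{topic.get('slug', 'topic')}.md"
```

Quoting the title and category avoids YAML parse errors when a title contains a colon or quotation marks.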
 ### Deduplication rules
 - If the same topic ID appears in multiple `.chlsx` files, keep the one with the most recent `last_posted_at`.
 - If a session has page 2+ of a topic (`/t/123/2.json`), merge the posts with page 1.
 - Never duplicate posts within a topic.
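As an illustration of the page-merge rule, the sketch below combines the posts from several captured pages of one topic and drops repeats. It assumes every post has a `post_number` that is unique within the topic; `merge_pages` is a hypothetical helper name.

```python
def merge_pages(pages):
    """Merge posts from several page payloads of one topic, without duplicates.

    Each page is the parsed JSON of /t/{id}.json or /t/{id}/{page}.json.
    """
    by_number = {}
    for page in pages:
        for post in page.get("post_stream", {}).get("posts", []):
            # The first copy of a post wins; later copies are ignored.
            by_number.setdefault(post["post_number"], post)
    # Return posts in thread order regardless of the order pages were captured.
    return [by_number[number] for number in sorted(by_number)]
```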
 ### What to do with the output
 Place the resulting `discourse-archive/` folder in a location I can reference in future Copilot sessions. I will point Copilot to that folder when I need to search past Discourse conversations.
 ### Constraints
 - Do not modify the original `.chlsx` file.
 - Do not upload or send the extracted data anywhere — keep it local.
 - If a topic has no readable content (deleted, access restricted), note it in the index but skip the full extraction.
 - HTML in `cooked` fields should be converted to readable plain text / Markdown (Discourse stores posts as HTML in the JSON).
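One possible approach to the `cooked` conversion, using only the Python standard library, is sketched below. It produces plain text with simple list markers rather than full Markdown, and the set of tags treated as block-level is an assumption; `cooked_to_text` is a hypothetical helper name.

```python
from html.parser import HTMLParser


class CookedToText(HTMLParser):
    """Collect text from Discourse 'cooked' HTML, breaking lines at block tags."""

    BLOCK_TAGS = {"p", "div", "br", "li", "blockquote", "pre", "h1", "h2", "h3", "h4", "tr"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decodes entities such as &amp;
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "li":
            self.parts.append("\n- ")
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.BLOCK_TAGS and tag != "br":
            self.parts.append("\n")

    def handle_data(self, data):
        self.parts.append(data)


def cooked_to_text(cooked: str) -> str:
    parser = CookedToText()
    parser.feed(cooked)
    parser.close()  # flushes any text still buffered by the parser
    lines = [line.strip() for line in "".join(parser.parts).splitlines()]
    return "\n".join(line for line in lines if line)
```

Links, images, and code formatting are flattened to their text content here; a dedicated HTML-to-Markdown converter would be needed to preserve them.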
 ### First action
 Ask me for:
 1. The path to the `.chlsx` file (or files)
 2. The Discourse base URL (so you can construct canonical topic URLs)
 3. Where I want the output folder created
--- a/project-knowledge/02-work-items/IA
+++ b/project-knowledge/02-work-items/IA
@@ -16,9 +16,13 @@
 	- Differences from skills with cli commands vs mcps
 	- Differences of skills vs instructions from vscode copilot
 	- Differences from agents vs skills using agent, what is more general? correct relationship and use
- Charles Proxy integration
+- Fidelity
- LaunchDarkly integration
+	- Charles Proxy integration
- Teams integration
+	- LaunchDarkly integration
 	- Teams integration
 	- Splunk analyzer
 	- [x] Discourse integration
 	- ServiceNow access
 - Photo uploader
 	- [x] Multi-photo session, copy multiple images to clipboard
 	- [ ] Start as a service
--- a/project-knowledge/06-daily/2026-05-18.md
+++ b/project-knowledge/06-daily/2026-05-18.md
@@ -0,0 +1,38 @@
 ---
 type: daily
 project: fidelity
 date: 2026-05-18
 status: active
 focus: [context-refresh, pdiap-12284]
 work-items: [PDIAP-12284]
 blockers: []
 tags:
  - daily
  - fidelity
 updated: 2026-05-18
 ---
 # 2026-05-18
 ## Work Done
 - Sent the daily scrum update for `PDIAP-12284 - Remove UIKit wrapping from XFlow`: continued SampleApp validation in both host modes, aligned the host-mode path with current flag behavior instead of the deprecated `enable-swift-ui` toggle, and started broader Fid4 smoke testing with temporary validation logs.
 ## Findings
 - While refreshing context for Adam's duplicate-request question, David clarified that the April 4 follow-up in the `Production - Crypto Delinking issue` thread was from David/Jeff's side after Yuva's March 30 comment.
 - That April 4 follow-up said the iOS SDK-side network requests looked correct and intentional, matched Android, and did not reproduce duplicate `open-account` API calls from the client side in non-prod.
 - Therefore, prior `PDSPS-29371` context should not be summarized as a confirmed reproduction from the iOS SDK side; it is related background, but the previous investigation did not reproduce the issue from the client-side SDK path.
 - Follow-up Copilot analysis suggested a plausible but unconfirmed link between REST migration and the current duplicate-page/account report: REST did not introduce a new duplicate-trigger mechanism by itself, but the REST/FTNetwork path may have changed timeout/error behavior enough to expose an existing XFlow re-trigger path under slow BPDC responses.
 - Treat that REST link as a hypothesis requiring current logs, dates, versions, and timeout/error evidence before reporting it as root cause.
 - Jeff asked whether the REST switch could have impacted Adam's duplicate-page/account report, while noting he assumed they were unrelated. David initially answered that REST should only affect XFlow API transport, not page sequencing or submission count, and offered to trace REST-toggle state once Adam provided an exact date and flow/page.
 - After Adam provided more context, David updated Jeff that REST still should not be treated as a direct sequencing cause, but it cannot be fully ruled out because REST/FTNetwork timeout/error behavior might expose an existing XFlow retry or page-rebuild path under load.
 - Jeff asked for either a proposed response to Adam or a statement that more information is needed, suggesting Adam should open a Discourse ticket and attach the relevant evidence if more detail is required.
 ---
 ## Next Steps
 - Frame any update to Jeff as a context refresh: related prior investigation exists, but the previous iOS SDK-side review did not reproduce duplicate client-side `open-account` calls, so current logs/examples are needed before calling the new report the same issue or a regression.
 - If discussing REST impact, separate confirmed facts from hypothesis: confirmed prior non-prod iOS review did not reproduce duplicate client-side calls; current hypothesis is that REST timeout/error semantics may expose the existing XFlow model-state retry/rebuild path under production load.
 - Prepare a concise proposed response to Adam that asks for a Discourse ticket with exact incident date/time, affected flow/page, app/XFlowSDK version, REST state if known, user journey logs, and examples needed to compare against `PDSPS-29371` / `PDIAP-11561`.
--- a/project-knowledge/06-daily/2026-05-19.md
+++ b/project-knowledge/06-daily/2026-05-19.md
@@ -0,0 +1,21 @@
 ---
 type: daily
 project: fidelity
 date: 2026-05-19
 status: active
 focus: [pdiap-12284, duplicate-ao-report]
 work-items: [PDIAP-12284]
 blockers: []
 tags:
  - daily
  - fidelity
 updated: 2026-05-19
 ---
 # 2026-05-19
 ## Work Done
 - Sent the daily scrum update for today.
 - Proposed and sent the request for a Discourse ticket to Adam to obtain details (exact date/time, affected flow/page, build version, logs, and examples) for the duplicate account-opening report.
 - Confirmed that no new Discourse ticket with the `xflog` tag has been posted yet.
--- a/project-knowledge/06-daily/index.md
+++ b/project-knowledge/06-daily/index.md
@@ -32,6 +32,12 @@ Promote durable facts into `project-knowledge/01-current/`, `project-knowledge/0
 - [2026-05-05](2026-05-05.md)
 - [2026-05-07](2026-05-07.md)
 - [2026-05-08](2026-05-08.md)
 - [2026-05-11](2026-05-11.md)
 - [2026-05-12](2026-05-12.md)
 - [2026-05-13](2026-05-13.md)
 - [2026-05-14](2026-05-14.md)
 - [2026-05-18](2026-05-18.md)
 - [2026-05-19](2026-05-19.md)
 ---