Every malware analyst spends a significant portion of their day staring at HTTP traffic. Fiddler sessions, response headers, request bodies, redirect chains. The patterns are there, but finding them requires manually scanning hundreds of sessions and knowing exactly what to look for. I wanted to change that workflow by connecting Fiddler directly to a large language model so I could ask questions about captured traffic in plain English.

The result is Fiddler-MCP-Server, an open source tool that bridges Fiddler's traffic capture to Google's Gemini model using the Model Context Protocol. An analyst can capture a browsing session and then ask things like "which sessions have suspicious redirect chains" or "show me the headers for session 47 and explain the caching behaviour" and get a structured analysis back in seconds.

The problem with manual traffic analysis

When you are triaging a potential malware delivery chain, you are typically looking at dozens or hundreds of HTTP sessions. You need to identify the initial redirect, track cookie propagation, spot TDS fingerprinting in query parameters, find the final payload delivery, and correlate it all against known patterns. That process is entirely manual in Fiddler. You click through sessions one at a time, inspect headers, decode bodies, and build a mental model of the infection chain.

LLMs are good at exactly this kind of pattern recognition when given structured data. The challenge is getting the data from Fiddler's memory into a format the model can consume, and doing it in a way that feels like a natural conversation rather than a copy-paste exercise.

Architecture: four components, one data path

The system has four components. Each one does exactly one job, and data flows through them in sequence.

Fiddler captures traffic and publishes each completed session as JSON to a local HTTP endpoint. A CustomRules script fires after every response, serialises the session data, and POSTs it to the staging server.

static function McpTryPost(oSession: Session): void {
    try {
        if ((oSession.oResponse == null) || (oSession.responseCode == 0)) return;
        var json: String = McpBuildSimpleJson(oSession);
        McpHttpPost(json);
    } catch (e) {
        FiddlerApplication.Log.LogString("MCP error: " + e.Message);
    }
}

The key design decision here is pushing data on every response rather than batching. This means the staging server always has the most recent traffic available for queries, and the analyst does not need to manually export or refresh anything.

The staging server is a Flask application that buffers sessions in a ring buffer and exposes REST endpoints for different data views. Headers, response bodies, statistics, timelines. Each endpoint returns clean JSON that can be consumed by any HTTP client.
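Before it can serve those views, the staging server has to accept sessions from the CustomRules script. A minimal sketch of that ingest side, assuming a POST route at /api/sessions and an illustrative buffer size (both are assumptions, not the project's actual values):

```python
from collections import deque
from threading import Lock

from flask import Flask, jsonify, request

app = Flask(__name__)
session_lock = Lock()
# Ring buffer: once full, appending a new session silently evicts the oldest.
live_sessions = deque(maxlen=1000)

@app.route('/api/sessions', methods=['POST'])
def ingest_session():
    payload = request.get_json(force=True)
    with session_lock:
        live_sessions.append(payload)
    return jsonify({"success": True, "buffered": len(live_sessions)})
```

The read-side endpoints then iterate the same deque under the same lock, as in the headers endpoint below.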

@self.app.route('/api/sessions/headers/<session_id>', methods=['GET'])
def get_session_headers(session_id):
    with self.session_lock:
        for session in reversed(self.live_sessions):
            if str(session.get('id', '')) == str(session_id):
                return jsonify({
                    "success": True,
                    "session_id": session_id,
                    "request_headers": session.get('requestHeaders', {}),
                    "response_headers": session.get('responseHeaders', {}),
                    "found": True
                })
    # No buffered session matches the requested ID.
    return jsonify({"success": False, "session_id": session_id, "found": False}), 404

Keeping it as plain REST means the staging server is debuggable with curl and testable independently of the MCP layer. That separation saved significant time during development.

The MCP bridge translates LLM tool invocations into REST calls against the staging server. Each MCP tool maps to one REST endpoint. When the model decides it needs session headers, it calls the tool, which calls the endpoint, which returns the JSON.

@mcp.tool()
def fiddler_mcp__session_headers(
    session_id: Annotated[str, Field(description="Session ID from live_sessions.")],
) -> Dict[str, Any]:
    """Fetch the HTTP headers for a captured session."""
    return client.get_session_headers(session_id=session_id)

I implemented seven tools in total covering session headers, response bodies, traffic statistics, timeline views, session search, risk assessment, and full session detail. Each follows the same pattern: thin MCP wrapper calling a REST helper that hits the staging server.
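The REST helper behind each wrapper is not shown above; a sketch of what it might look like, using only the standard library (the class name, base URL, and default port are assumptions for illustration):

```python
import json
import urllib.request
from typing import Any, Dict

class StagingClient:
    """Thin wrapper around the staging server's REST endpoints."""

    def __init__(self, base_url: str = "http://127.0.0.1:5000"):
        self.base_url = base_url.rstrip("/")

    def _url(self, path: str) -> str:
        # Normalise slashes so callers can pass paths either way.
        return f"{self.base_url}/{path.lstrip('/')}"

    def get_session_headers(self, session_id: str) -> Dict[str, Any]:
        # One REST call per MCP tool keeps each layer independently testable.
        url = self._url(f"/api/sessions/headers/{session_id}")
        with urllib.request.urlopen(url, timeout=10) as resp:
            return json.loads(resp.read().decode("utf-8"))
```

Because the helper is just an HTTP client, any failure it surfaces can be reproduced with curl against the same URL.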

The Gemini client orchestrates the conversation. When a tool returns data, the client injects the JSON result into a new prompt and asks Gemini to analyse it in the context of the original question. Whatever the bridge returns becomes part of the prompt sent to the model.

tool_result = self.call_tool(tool_name, arguments)

self.conversation_history.append({
    "role": "tool",
    "tool": tool_name,
    "content": json.dumps(tool_result, indent=2)
})

analysis_prompt = f"""The tool '{tool_name}' returned this result:
{json.dumps(tool_result, indent=2)}
Please analyze this result and answer: "{user_query}" """

analysis_response = self.model.generate_content(analysis_prompt)
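For call_tool to work, the client needs a way to route the tool name the model chose to the matching helper. A registry dict is one simple way to do it; the registry, decorator, and placeholder body below are illustrative, not the project's actual code:

```python
from typing import Any, Callable, Dict

# Maps the tool name the model emits to the function that fulfils it.
TOOL_REGISTRY: Dict[str, Callable[..., Dict[str, Any]]] = {}

def register_tool(name: str):
    def wrapper(fn):
        TOOL_REGISTRY[name] = fn
        return fn
    return wrapper

@register_tool("fiddler_mcp__session_headers")
def _session_headers(session_id: str) -> Dict[str, Any]:
    # Placeholder body; the real helper would call the staging server.
    return {"session_id": session_id, "request_headers": {}, "response_headers": {}}

def call_tool(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    # Unknown names come back as structured errors the model can read,
    # rather than exceptions that kill the conversation.
    if tool_name not in TOOL_REGISTRY:
        return {"success": False, "error": f"unknown tool: {tool_name}"}
    return TOOL_REGISTRY[tool_name](**arguments)
```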

What this enables

With the bridge running, an analyst can capture traffic from a suspicious site and immediately start asking questions. "List all sessions that returned JavaScript content." "Show me the full redirect chain from session 12 to the final landing page." "Are there any sessions with unusual Set-Cookie headers that might indicate TDS fingerprinting?"

The model has access to the same data the analyst would manually inspect, but it can process every session at once and surface patterns that would take a human several minutes of clicking to find.

The conversation history means follow-up questions work naturally. Ask about a specific session, then ask "compare that to session 23" without re-specifying context. The model retains the prior tool results and builds on them.
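That context carry-over works because earlier tool results are replayed into each new prompt. A minimal sketch of the replay step, with the history structure assumed from the snippet above:

```python
from typing import Dict, List

def build_context(history: List[Dict[str, str]], max_entries: int = 10) -> str:
    """Flatten recent turns into a prompt prefix so the model can
    resolve references like 'compare that to session 23'."""
    lines = []
    for entry in history[-max_entries:]:  # cap replay to bound prompt size
        if entry["role"] == "tool":
            lines.append(f"[tool {entry['tool']}] {entry['content']}")
        else:
            lines.append(f"[{entry['role']}] {entry['content']}")
    return "\n".join(lines)
```

Capping the replay window keeps prompts bounded even in long triage sessions, at the cost of forgetting the oldest exchanges.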

Design decisions that mattered

Separating the staging server from the MCP bridge turned out to be the most important architectural choice. During development I could test data flow by curling the REST endpoints directly, without needing the LLM in the loop. When something went wrong, I could immediately isolate whether the problem was in data capture, staging, or the MCP layer.

Using a ring buffer for session storage keeps memory bounded. In a malware analysis session you might capture thousands of requests across dozens of sites. The ring buffer drops the oldest sessions automatically, keeping the system responsive without manual cleanup.
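Python's collections.deque gives this behaviour for free: with maxlen set, appending past capacity evicts from the opposite end.

```python
from collections import deque

buffer = deque(maxlen=3)        # capacity chosen for illustration
for session_id in range(1, 6):  # capture five sessions
    buffer.append(session_id)

# Only the three most recent sessions survive; 1 and 2 were evicted.
print(list(buffer))  # → [3, 4, 5]
```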

Making tool names explicit and descriptive, like fiddler_mcp__session_headers rather than generic names, helps the model select the right tool. The model sees the tool list and descriptions, and clear naming reduces incorrect tool selection significantly.

What comes next

The current implementation works with Gemini but the MCP layer is model-agnostic. Swapping to Claude or any other model that supports tool use requires only changing the client. The staging server and bridge remain identical.
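One way to express that seam is a small interface every model client must satisfy; this Protocol sketch uses assumed method names, not the project's actual API:

```python
from typing import Any, Dict, List, Protocol, runtime_checkable

@runtime_checkable
class LLMClient(Protocol):
    """The only surface the bridge depends on; swap backends behind it."""

    def generate(self, prompt: str, tools: List[Dict[str, Any]]) -> str: ...

class GeminiClient:
    """Stub standing in for the real Gemini-backed client."""

    def generate(self, prompt: str, tools: List[Dict[str, Any]]) -> str:
        return "stubbed response"  # real client would call the Gemini API
```

A Claude-backed class with the same generate signature would drop in without touching the staging server or the bridge.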

I am working on adding detection rule matching directly into the bridge, so the model can not only describe what it sees in the traffic but also flag sessions that match known malware patterns from EKFiddle or YARA rule sets. That turns the system from an analysis assistant into a detection assistant.

The full source code is available on my GitHub.