From 0518df155bc4ae3b60befc7e5f1e579abff96d7a Mon Sep 17 00:00:00 2001 From: frgmt_ <105447989+frgmt0@users.noreply.github.com> Date: Mon, 6 Jan 2025 10:34:18 -0800 Subject: [PATCH 1/8] Update README.md I added some tests and more a "bloggy" approach to explaining how the reasoning works under the hood I guess? --- README.md | 111 ++++++++++++++++++++++++++++++++---------------------- 1 file changed, 67 insertions(+), 44 deletions(-) diff --git a/README.md b/README.md index 8d39f31..9b97f2f 100644 --- a/README.md +++ b/README.md @@ -1,14 +1,14 @@ # MCP Reasoner -A systematic reasoning MCP server implementation for Claude Desktop featuring both Beam Search and Monte Carlo Tree Search (MCTS) capabilities. +A reasoning implementation for Claude Desktop that lets you use both Beam Search and Monte Carlo Tree Search (MCTS). tbh this started as a way to see if we could make Claude even better at complex problem-solving... turns out we definitely can. ## Features -* Dual search strategies: - * Beam search with configurable width - * MCTS for complex decision spaces -* Thought scoring and evaluation -* Tree-based reasoning paths -* Statistical analysis of reasoning process -* MCP protocol compliance +- Two search strategies that you can switch between: + - Beam search (good for straightforward stuff) + - MCTS (when stuff gets complex) +- Tracks how good different reasoning paths are +- Maps out all the different ways Claude thinks through problems +- Analyzes how the reasoning process went +- Follows the MCP protocol (obviously) ## Installation ``` @@ -34,43 +34,66 @@ Add to Claude Desktop config: ## Search Strategies ### Beam Search -* Maintains fixed-width set of most promising paths -* Optimal for step-by-step reasoning -* Best for: Mathematical problems, logical puzzles +so beam search is pretty straightforward; it keeps track of the most promising solution paths as it goes. 
works really well when you've got problems with clear right answers, like math stuff or certain types of puzzles. + +interesting thing i found while testing: when i threw 50 puzzles from the Arc AGI benchmark at it, it only scored 24%. like, it wasn't completely lost, but... not great. here's how i tested it: + +- first, i'd check if claude actually got the pattern from the examples. if it seemed confused, i'd try to nudge it in the right direction (but dock points cause that's not ideal) +- then for the actual test cases, i had this whole scoring system: + - 5 points - nailed it + - 4 points - reasoning was solid but maybe i fucked up following the instructions + - 3 points - kinda got the pattern but didn't quite nail it + - 2 points - straight up failed + - 1 point - at least the initial reasoning wasn't completely off ### Monte Carlo Tree Search -* Simulation-based exploration of decision space -* Balances exploration and exploitation -* Best for: Complex problems with uncertain outcomes - -**Note:** Monte Carlo Tree Search allowed Claude to perform really well on the Arc AGI benchmark (scored 6/10 on the public test), whereas beam search yielded a (3/10) on the same puzzles. For super complex tasks, you'd want to direct Claude to utilize the MCTS strategy over the beam search. - -## Algorithm Details -1. Search Strategy Selection - * Beam Search: Evaluates and ranks multiple solution paths - * MCTS: Uses UCT for node selection and random rollouts -2. Thought Scoring Based On: - * Detail level - * Mathematical expressions - * Logical connectors - * Parent-child relationship strength -3. 
Process Management - * Tree-based state tracking - * Statistical analysis of reasoning - * Progress monitoring - -## Use Cases -* Mathematical problems -* Logical puzzles -* Step-by-step analysis -* Complex problem decomposition -* Decision tree exploration -* Strategy optimization - -## Future Implementations -* Implement New Algorithms - * Iterative Deepening Depth-First Search (IDDFS) - * Alpha-Beta Pruning +now THIS is where it gets interesting. MCTS absolutely crushed it compared to beam search - we're talking 48% on a different set of 50 Arc puzzles. yeah yeah, maybe they were easier puzzles (this isn't an official benchmark or anything), but doubling the performance? that's not just luck. + +the cool thing about MCTS is how it explores different possibilities. instead of just following what seems best right away, it tries out different paths to see what might work better in the long run. claude spent way more time understanding the examples before diving in, which probably helped a lot. + +## Why This Matters +adding structured reasoning to claude makes it way better... no der, right? but what's really interesting is how different methods work for different types of problems. + +why'd i test on puzzles instead of coding problems? honestly, claude's already proven itself on stuff like polyglot and codeforces. i wanted to see how it handled more abstract reasoning - the kind of stuff that's harder to measure. + +## What's Next +got some cool stuff in the pipeline: + +### IDDFS (Iterative Deepening Depth-First Search) +basically IDDFS is like... imagine you're exploring a maze but instead of going all in, you check everything 1 step deep, then 2 steps deep, and so on. 
+
+pros:
+- uses way less memory than regular DFS (which is huge for complex problems)
+- guaranteed to find the shallowest solution first
+- works really well when you don't know how deep you need to go
+
+cons:
+- might seem slower since you're re-exploring stuff
+- not great if your solution is super deep in the tree
+- can get stuck in loops if i'm not careful with the implementation, which is hard because i'm usually not careful hahaha
+
+working on `iddfs-exp` right now and the theory is that it might handle certain types of puzzles better than MCTS, especially ones where the solution path isn't too deep but needs systematic exploration.
+
+### Alpha-Beta Pruning
+ok this one's a bit of an experiment and may not work... it's traditionally used in game trees (like chess engines) but i think it could be interesting for reasoning too.
+
+pros:
+- super efficient at cutting off "dead end" reasoning paths
+- works really well when you can evaluate how good a partial solution is
+- could be amazing for problems where you need to consider opposing viewpoints
+
+cons:
+- needs a good evaluation function (which is hard af for general reasoning)
+- might miss some creative solutions by cutting off paths too early
+- really depends on the order you explore things
+
+`alphabeta-exp` is definitely gonna be rough at first, but i'm curious to see if we can make it work for non-game-tree problems. might be especially interesting for scenarios where claude needs to reason about competing hypotheses.
+
+also working on letting claude control the MCTS sampling parameters directly - could lead to some interesting adaptive behavior where it adjusts its exploration strategy based on how well it's understanding the problem.
+
+will definitely share more test results as we implement these. if you're interested in helping test or have ideas for other algorithms that might work well, hit me up. 
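to make the IDDFS idea above concrete, here's a minimal standalone sketch of the iterative-deepening loop. this is an illustration, not the actual `iddfs-exp` code — the `Node` shape and names are made up for the example:

```typescript
// Hypothetical node type: a reasoning step with child steps.
interface Node {
  value: string;
  children: Node[];
}

// Depth-limited DFS: explore at most `limit` levels below `node`.
function depthLimitedSearch(node: Node, goal: string, limit: number): Node | null {
  if (node.value === goal) return node;
  if (limit === 0) return null;
  for (const child of node.children) {
    const found = depthLimitedSearch(child, goal, limit - 1);
    if (found) return found;
  }
  return null;
}

// IDDFS: re-run depth-limited DFS with limit 0, 1, 2, ... until the goal is found.
// Memory stays proportional to depth, and the shallowest solution is found first.
function iddfs(root: Node, goal: string, maxDepth: number): Node | null {
  for (let limit = 0; limit <= maxDepth; limit++) {
    const found = depthLimitedSearch(root, goal, limit);
    if (found) return found;
  }
  return null;
}

// tiny demo tree
const tree: Node = {
  value: "start",
  children: [
    { value: "a", children: [{ value: "goal", children: [] }] },
    { value: "b", children: [] },
  ],
};
console.log(iddfs(tree, "goal", 5)?.value); // "goal"
```

the re-exploration cost mentioned in the cons is visible here: every deepening pass revisits the shallower levels, which is the price paid for the low memory footprint.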
+ +-frgmt0, Jacck ## License -This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. \ No newline at end of file +This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. From 567ae04d3ef865e86d059084cb7c78c9230cb51d Mon Sep 17 00:00:00 2001 From: frgmt_ <105447989+frgmt0@users.noreply.github.com> Date: Mon, 6 Jan 2025 10:44:12 -0800 Subject: [PATCH 2/8] Update README.md added current version, so we can do "what changed" if needed --- README.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/README.md b/README.md index 9b97f2f..4a6c838 100644 --- a/README.md +++ b/README.md @@ -1,6 +1,9 @@ # MCP Reasoner A reasoning implementation for Claude Desktop that lets you use both Beam Search and Monte Carlo Tree Search (MCTS). tbh this started as a way to see if we could make Claude even better at complex problem-solving... turns out we definitely can. +### Current Version: +1.0.0 + ## Features - Two search strategies that you can switch between: - Beam search (good for straightforward stuff) From e2ded5e2d8fab2e8b7841d49b1bd84fcd1738015 Mon Sep 17 00:00:00 2001 From: Jason Weiss Date: Mon, 6 Jan 2025 11:05:23 -0800 Subject: [PATCH 3/8] added beamWidth control and simulation control to both beam search and MCTS respectively --- dist/index.js | 26 ++++++++++++++++++++-- dist/reasoner.d.ts | 2 +- dist/reasoner.js | 32 ++++++++++++++++++++++----- dist/strategies/beam-search.d.ts | 2 +- dist/strategies/factory.d.ts | 2 +- dist/strategies/factory.js | 6 +++--- dist/strategies/mcts.d.ts | 4 ++-- dist/strategies/mcts.js | 5 +++-- dist/types.d.ts | 3 +++ dist/types.js | 3 ++- node_modules/.package-lock.json | 15 +++++++++++++ src/index.ts | 26 ++++++++++++++++++++-- src/reasoner.ts | 37 +++++++++++++++++++++++++++----- src/strategies/beam-search.ts | 2 +- src/strategies/factory.ts | 8 ++++--- src/strategies/mcts.ts | 6 ++++-- src/types.ts | 5 ++++- 17 files changed, 152 insertions(+), 32 deletions(-) 
mode change 100755 => 100644 dist/index.js diff --git a/dist/index.js b/dist/index.js old mode 100755 new mode 100644 index 6d4231a..d56c464 --- a/dist/index.js +++ b/dist/index.js @@ -22,7 +22,9 @@ function processInput(input) { thoughtNumber: Number(input.thoughtNumber || 0), totalThoughts: Number(input.totalThoughts || 0), nextThoughtNeeded: Boolean(input.nextThoughtNeeded), - strategyType: input.strategyType + strategyType: input.strategyType, + beamWidth: Number(input.beamWidth || 3), + numSimulations: Number(input.numSimulations || 50) }; // Validate if (!result.thought) { @@ -34,6 +36,12 @@ function processInput(input) { if (result.totalThoughts < 1) { throw new Error("totalThoughts must be >= 1"); } + if (result.beamWidth < 1 || result.beamWidth > 10) { + throw new Error("beamWidth must be between 1 and 10"); + } + if (result.numSimulations < 1 || result.numSimulations > 150) { + throw new Error("numSimulations must be between 1 and 150"); + } return result; } // Register the tool @@ -66,6 +74,18 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({ type: "string", enum: Object.values(ReasoningStrategy), description: "Reasoning strategy to use (beam_search or mcts)" + }, + beamWidth: { + type: "integer", + description: "Number of top paths to maintain (n-sampling). Defaults to 3 if not specified", + minimum: 1, + maximum: 10 + }, + numSimulations: { + type: "integer", + description: "Number of MCTS simulations to run. 
Defaults to 50 if not specified", + minimum: 1, + maximum: 150 } }, required: ["thought", "thoughtNumber", "totalThoughts", "nextThoughtNeeded"] @@ -92,7 +112,9 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => { thoughtNumber: step.thoughtNumber, totalThoughts: step.totalThoughts, nextThoughtNeeded: step.nextThoughtNeeded, - strategyType: step.strategyType + strategyType: step.strategyType, + beamWidth: step.beamWidth, + numSimulations: step.numSimulations }); // Get reasoning stats const stats = await reasoner.getStats(); diff --git a/dist/reasoner.d.ts b/dist/reasoner.d.ts index fcd0716..842568a 100644 --- a/dist/reasoner.d.ts +++ b/dist/reasoner.d.ts @@ -11,6 +11,6 @@ export declare class Reasoner { getCurrentStrategyName(): ReasoningStrategy; getBestPath(): Promise; clear(): Promise; - setStrategy(strategyType: ReasoningStrategy): void; + setStrategy(strategyType: ReasoningStrategy, beamWidth?: number, numSimulations?: number): void; getAvailableStrategies(): ReasoningStrategy[]; } diff --git a/dist/reasoner.js b/dist/reasoner.js index 77e6020..f597e67 100644 --- a/dist/reasoner.js +++ b/dist/reasoner.js @@ -6,8 +6,8 @@ export class Reasoner { this.stateManager = new StateManager(CONFIG.cacheSize); // Initialize available strategies this.strategies = new Map(); - this.strategies.set(ReasoningStrategy.BEAM_SEARCH, StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager)); - this.strategies.set(ReasoningStrategy.MCTS, StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager)); + this.strategies.set(ReasoningStrategy.BEAM_SEARCH, StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager, CONFIG.beamWidth)); + this.strategies.set(ReasoningStrategy.MCTS, StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager, undefined, CONFIG.numSimulations)); // Set default strategy const defaultStrategy = CONFIG.defaultStrategy; this.currentStrategy = 
this.strategies.get(defaultStrategy) || @@ -16,7 +16,19 @@ export class Reasoner { async processThought(request) { // Switch strategy if requested if (request.strategyType && this.strategies.has(request.strategyType)) { - this.currentStrategy = this.strategies.get(request.strategyType); + // Create new strategy instance with current beamWidth if specified + const strategyType = request.strategyType; + if (strategyType === ReasoningStrategy.BEAM_SEARCH) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, request.beamWidth); + this.strategies.set(strategyType, this.currentStrategy); + } + else if (strategyType === ReasoningStrategy.MCTS) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, request.numSimulations); + this.strategies.set(strategyType, this.currentStrategy); + } + else { + this.currentStrategy = this.strategies.get(strategyType); + } } // Process thought using current strategy const response = await this.currentStrategy.processThought(request); @@ -80,11 +92,21 @@ export class Reasoner { await strategy.clear(); } } - setStrategy(strategyType) { + setStrategy(strategyType, beamWidth, numSimulations) { if (!this.strategies.has(strategyType)) { throw new Error(`Unknown strategy type: ${strategyType}`); } - this.currentStrategy = this.strategies.get(strategyType); + if (strategyType === ReasoningStrategy.BEAM_SEARCH && beamWidth !== undefined) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, beamWidth); + this.strategies.set(strategyType, this.currentStrategy); + } + else if (strategyType === ReasoningStrategy.MCTS && numSimulations !== undefined) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); + this.strategies.set(strategyType, this.currentStrategy); + } + else { + this.currentStrategy = this.strategies.get(strategyType); + } } getAvailableStrategies() { 
return Array.from(this.strategies.keys()); diff --git a/dist/strategies/beam-search.d.ts b/dist/strategies/beam-search.d.ts index 943a03b..b7bad40 100644 --- a/dist/strategies/beam-search.d.ts +++ b/dist/strategies/beam-search.d.ts @@ -3,7 +3,7 @@ import { BaseStrategy } from './base.js'; export declare class BeamSearchStrategy extends BaseStrategy { private beamWidth; private beams; - constructor(stateManager: any, beamWidth?: 3); + constructor(stateManager: any, beamWidth?: number); processThought(request: ReasoningRequest): Promise; private calculatePossiblePaths; getBestPath(): Promise; diff --git a/dist/strategies/factory.d.ts b/dist/strategies/factory.d.ts index cf307e5..fc72493 100644 --- a/dist/strategies/factory.d.ts +++ b/dist/strategies/factory.d.ts @@ -5,5 +5,5 @@ export declare enum ReasoningStrategy { MCTS = "mcts" } export declare class StrategyFactory { - static createStrategy(type: ReasoningStrategy, stateManager: StateManager): BaseStrategy; + static createStrategy(type: ReasoningStrategy, stateManager: StateManager, beamWidth?: number, numSimulations?: number): BaseStrategy; } diff --git a/dist/strategies/factory.js b/dist/strategies/factory.js index c7efa23..626252b 100644 --- a/dist/strategies/factory.js +++ b/dist/strategies/factory.js @@ -6,12 +6,12 @@ export var ReasoningStrategy; ReasoningStrategy["MCTS"] = "mcts"; })(ReasoningStrategy || (ReasoningStrategy = {})); export class StrategyFactory { - static createStrategy(type, stateManager) { + static createStrategy(type, stateManager, beamWidth, numSimulations) { switch (type) { case ReasoningStrategy.BEAM_SEARCH: - return new BeamSearchStrategy(stateManager); + return new BeamSearchStrategy(stateManager, beamWidth); case ReasoningStrategy.MCTS: - return new MonteCarloTreeSearchStrategy(stateManager); + return new MonteCarloTreeSearchStrategy(stateManager, numSimulations); default: throw new Error(`Unknown strategy type: ${type}`); } diff --git a/dist/strategies/mcts.d.ts 
b/dist/strategies/mcts.d.ts index 27ce5c1..1531bea 100644 --- a/dist/strategies/mcts.d.ts +++ b/dist/strategies/mcts.d.ts @@ -3,9 +3,9 @@ import { BaseStrategy } from './base.js'; export declare class MonteCarloTreeSearchStrategy extends BaseStrategy { private readonly explorationConstant; private readonly simulationDepth; - private readonly numSimulations; + private numSimulations; private root; - constructor(stateManager: any); + constructor(stateManager: any, numSimulations?: number); processThought(request: ReasoningRequest): Promise; private runSimulations; private select; diff --git a/dist/strategies/mcts.js b/dist/strategies/mcts.js index 573015f..c0b1ab3 100644 --- a/dist/strategies/mcts.js +++ b/dist/strategies/mcts.js @@ -2,12 +2,13 @@ import { v4 as uuidv4 } from 'uuid'; import { CONFIG } from '../types.js'; import { BaseStrategy } from './base.js'; export class MonteCarloTreeSearchStrategy extends BaseStrategy { - constructor(stateManager) { + constructor(stateManager, numSimulations = CONFIG.numSimulations) { super(stateManager); this.explorationConstant = Math.sqrt(2); this.simulationDepth = CONFIG.maxDepth; - this.numSimulations = 50; this.root = null; + // Ensure numSimulations is within reasonable bounds + this.numSimulations = Math.max(1, Math.min(150, numSimulations)); } async processThought(request) { const nodeId = uuidv4(); diff --git a/dist/types.d.ts b/dist/types.d.ts index e437186..8879d57 100644 --- a/dist/types.d.ts +++ b/dist/types.d.ts @@ -14,6 +14,8 @@ export interface ReasoningRequest { nextThoughtNeeded: boolean; parentId?: string; strategyType?: string; + beamWidth?: number; + numSimulations?: number; } export interface ReasoningResponse { nodeId: string; @@ -40,4 +42,5 @@ export declare const CONFIG: { readonly temperature: 0.7; readonly cacheSize: 1000; readonly defaultStrategy: "beam_search"; + readonly numSimulations: 50; }; diff --git a/dist/types.js b/dist/types.js index 96ac36b..8e81f70 100644 --- a/dist/types.js +++ 
b/dist/types.js @@ -4,5 +4,6 @@ export const CONFIG = { minScore: 0.5, // Threshold for path viability temperature: 0.7, // For thought diversity cacheSize: 1000, // LRU cache size - defaultStrategy: 'beam_search' // Default reasoning strategy + defaultStrategy: 'beam_search', // Default reasoning strategy + numSimulations: 50 // Default number of MCTS simulations }; diff --git a/node_modules/.package-lock.json b/node_modules/.package-lock.json index 5a53277..0d64744 100644 --- a/node_modules/.package-lock.json +++ b/node_modules/.package-lock.json @@ -2936,6 +2936,21 @@ "integrity": "sha512-OO0pH2lK6a0hZnAdau5ItzHPI6pUlvI7jMVnxUQRtw4owF2wk8lOSabtGDCTP4Ggrg2MbGnWO9X8K1t4+fGMDw==", "dev": true }, + "node_modules/fsevents": { + "version": "2.3.3", + "resolved": "https://registry.npmjs.org/fsevents/-/fsevents-2.3.3.tgz", + "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^8.16.0 || ^10.6.0 || >=11.0.0" + } + }, "node_modules/function-bind": { "version": "1.1.2", "resolved": "https://registry.npmjs.org/function-bind/-/function-bind-1.1.2.tgz", diff --git a/src/index.ts b/src/index.ts index e3b4c21..6a0f63a 100644 --- a/src/index.ts +++ b/src/index.ts @@ -28,7 +28,9 @@ function processInput(input: any) { thoughtNumber: Number(input.thoughtNumber || 0), totalThoughts: Number(input.totalThoughts || 0), nextThoughtNeeded: Boolean(input.nextThoughtNeeded), - strategyType: input.strategyType as ReasoningStrategy | undefined + strategyType: input.strategyType as ReasoningStrategy | undefined, + beamWidth: Number(input.beamWidth || 3), + numSimulations: Number(input.numSimulations || 50) }; // Validate @@ -41,6 +43,12 @@ function processInput(input: any) { if (result.totalThoughts < 1) { throw new Error("totalThoughts must be >= 1"); } + if (result.beamWidth < 1 || result.beamWidth 
> 10) { + throw new Error("beamWidth must be between 1 and 10"); + } + if (result.numSimulations < 1 || result.numSimulations > 150) { + throw new Error("numSimulations must be between 1 and 150"); + } return result; } @@ -75,6 +83,18 @@ server.setRequestHandler(ListToolsRequestSchema, async () => ({ type: "string", enum: Object.values(ReasoningStrategy), description: "Reasoning strategy to use (beam_search or mcts)" + }, + beamWidth: { + type: "integer", + description: "Number of top paths to maintain (n-sampling). Defaults to 3 if not specified", + minimum: 1, + maximum: 10 + }, + numSimulations: { + type: "integer", + description: "Number of MCTS simulations to run. Defaults to 50 if not specified", + minimum: 1, + maximum: 150 } }, required: ["thought", "thoughtNumber", "totalThoughts", "nextThoughtNeeded"] @@ -104,7 +124,9 @@ server.setRequestHandler(CallToolRequestSchema, async (request) => { thoughtNumber: step.thoughtNumber, totalThoughts: step.totalThoughts, nextThoughtNeeded: step.nextThoughtNeeded, - strategyType: step.strategyType + strategyType: step.strategyType, + beamWidth: step.beamWidth, + numSimulations: step.numSimulations }); // Get reasoning stats diff --git a/src/reasoner.ts b/src/reasoner.ts index 773cc8b..df13e67 100644 --- a/src/reasoner.ts +++ b/src/reasoner.ts @@ -15,11 +15,11 @@ export class Reasoner { this.strategies = new Map(); this.strategies.set( ReasoningStrategy.BEAM_SEARCH, - StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager) + StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager, CONFIG.beamWidth) ); this.strategies.set( ReasoningStrategy.MCTS, - StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager) + StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager, undefined, CONFIG.numSimulations) ); // Set default strategy @@ -31,7 +31,26 @@ export class Reasoner { public async processThought(request: ReasoningRequest): Promise { // Switch 
strategy if requested if (request.strategyType && this.strategies.has(request.strategyType as ReasoningStrategy)) { - this.currentStrategy = this.strategies.get(request.strategyType as ReasoningStrategy)!; + // Create new strategy instance with current beamWidth if specified + const strategyType = request.strategyType as ReasoningStrategy; + if (strategyType === ReasoningStrategy.BEAM_SEARCH) { + this.currentStrategy = StrategyFactory.createStrategy( + strategyType, + this.stateManager, + request.beamWidth + ); + this.strategies.set(strategyType, this.currentStrategy); + } else if (strategyType === ReasoningStrategy.MCTS) { + this.currentStrategy = StrategyFactory.createStrategy( + strategyType, + this.stateManager, + undefined, + request.numSimulations + ); + this.strategies.set(strategyType, this.currentStrategy); + } else { + this.currentStrategy = this.strategies.get(strategyType)!; + } } // Process thought using current strategy @@ -109,11 +128,19 @@ export class Reasoner { } } - public setStrategy(strategyType: ReasoningStrategy): void { + public setStrategy(strategyType: ReasoningStrategy, beamWidth?: number, numSimulations?: number): void { if (!this.strategies.has(strategyType)) { throw new Error(`Unknown strategy type: ${strategyType}`); } - this.currentStrategy = this.strategies.get(strategyType)!; + if (strategyType === ReasoningStrategy.BEAM_SEARCH && beamWidth !== undefined) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, beamWidth); + this.strategies.set(strategyType, this.currentStrategy); + } else if (strategyType === ReasoningStrategy.MCTS && numSimulations !== undefined) { + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); + this.strategies.set(strategyType, this.currentStrategy); + } else { + this.currentStrategy = this.strategies.get(strategyType)!; + } } public getAvailableStrategies(): ReasoningStrategy[] { diff --git 
a/src/strategies/beam-search.ts b/src/strategies/beam-search.ts index 8c4ff14..9bbfc51 100644 --- a/src/strategies/beam-search.ts +++ b/src/strategies/beam-search.ts @@ -6,7 +6,7 @@ export class BeamSearchStrategy extends BaseStrategy { private beamWidth: number; private beams: Map; - constructor(stateManager: any, beamWidth = CONFIG.beamWidth) { + constructor(stateManager: any, beamWidth: number = CONFIG.beamWidth) { super(stateManager); this.beamWidth = beamWidth; this.beams = new Map(); diff --git a/src/strategies/factory.ts b/src/strategies/factory.ts index e493727..eaf58da 100644 --- a/src/strategies/factory.ts +++ b/src/strategies/factory.ts @@ -11,13 +11,15 @@ export enum ReasoningStrategy { export class StrategyFactory { static createStrategy( type: ReasoningStrategy, - stateManager: StateManager + stateManager: StateManager, + beamWidth?: number, + numSimulations?: number ): BaseStrategy { switch (type) { case ReasoningStrategy.BEAM_SEARCH: - return new BeamSearchStrategy(stateManager); + return new BeamSearchStrategy(stateManager, beamWidth); case ReasoningStrategy.MCTS: - return new MonteCarloTreeSearchStrategy(stateManager); + return new MonteCarloTreeSearchStrategy(stateManager, numSimulations); default: throw new Error(`Unknown strategy type: ${type}`); } diff --git a/src/strategies/mcts.ts b/src/strategies/mcts.ts index 7d155ee..2e1a383 100644 --- a/src/strategies/mcts.ts +++ b/src/strategies/mcts.ts @@ -11,11 +11,13 @@ interface MCTSNode extends ThoughtNode { export class MonteCarloTreeSearchStrategy extends BaseStrategy { private readonly explorationConstant = Math.sqrt(2); private readonly simulationDepth = CONFIG.maxDepth; - private readonly numSimulations = 50; + private numSimulations: number; private root: MCTSNode | null = null; - constructor(stateManager: any) { + constructor(stateManager: any, numSimulations: number = CONFIG.numSimulations) { super(stateManager); + // Ensure numSimulations is within reasonable bounds + this.numSimulations = 
Math.max(1, Math.min(150, numSimulations)); } public async processThought(request: ReasoningRequest): Promise { diff --git a/src/types.ts b/src/types.ts index ac6fe29..f0cffec 100644 --- a/src/types.ts +++ b/src/types.ts @@ -15,6 +15,8 @@ export interface ReasoningRequest { nextThoughtNeeded: boolean; parentId?: string; // For branching thoughts strategyType?: string; // Strategy to use for reasoning + beamWidth?: number; // Number of top paths to maintain (n-sampling) + numSimulations?: number; // Number of MCTS simulations to run } export interface ReasoningResponse { @@ -43,5 +45,6 @@ export const CONFIG = { minScore: 0.5, // Threshold for path viability temperature: 0.7, // For thought diversity cacheSize: 1000, // LRU cache size - defaultStrategy: 'beam_search' // Default reasoning strategy + defaultStrategy: 'beam_search', // Default reasoning strategy + numSimulations: 50 // Default number of MCTS simulations } as const; From b31050c0440632fae4d3154e94ea6fb630f4c8ca Mon Sep 17 00:00:00 2001 From: Jason Weiss Date: Mon, 6 Jan 2025 11:11:36 -0800 Subject: [PATCH 4/8] updated readme --- README.md | 14 +++++++++++++- 1 file changed, 13 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 4a6c838..5a8908c 100644 --- a/README.md +++ b/README.md @@ -2,7 +2,19 @@ A reasoning implementation for Claude Desktop that lets you use both Beam Search and Monte Carlo Tree Search (MCTS). tbh this started as a way to see if we could make Claude even better at complex problem-solving... turns out we definitely can. 
### Current Version: -1.0.0 +**v1.1.0** + +#### What's New: + +> Added model control over search parameters: +> +> beamWidth - lets Claude adjust how many paths to track (1-10) +> numSimulations - fine-tune MCTS simulation count (1-150) + +#### Previous Versions: +**v1.0.0** + +> Initial release with base Beam Search and MCTS implementations ## Features - Two search strategies that you can switch between: From d9cd8a2eed853394be18ae3eee550ec4ffb63f7f Mon Sep 17 00:00:00 2001 From: Jason Weiss Date: Mon, 6 Jan 2025 11:12:08 -0800 Subject: [PATCH 5/8] updated version --- node_modules/.package-lock.json | 2 +- package-lock.json | 4 ++-- package.json | 2 +- 3 files changed, 4 insertions(+), 4 deletions(-) diff --git a/node_modules/.package-lock.json b/node_modules/.package-lock.json index 0d64744..61aa393 100644 --- a/node_modules/.package-lock.json +++ b/node_modules/.package-lock.json @@ -1,6 +1,6 @@ { "name": "mcp-reasoner", - "version": "1.0.0", + "version": "1.1.0", "lockfileVersion": 3, "requires": true, "packages": { diff --git a/package-lock.json b/package-lock.json index 16f7f8b..e967da6 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "mcp-reasoner", - "version": "1.0.0", + "version": "1.1.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "mcp-reasoner", - "version": "1.0.0", + "version": "1.1.0", "license": "MIT", "dependencies": { "@modelcontextprotocol/sdk": "*", diff --git a/package.json b/package.json index 17926b8..07a5397 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "mcp-reasoner", - "version": "1.0.0", + "version": "1.1.0", "description": "MCP Reasoner with multiple reasoning strategies including Beam Search and Monte Carlo Tree Search", "type": "module", "main": "dist/index.js", From 28c60fe875a7466278d66ad26c055cd3868eb03c Mon Sep 17 00:00:00 2001 From: Jason Weiss Date: Tue, 7 Jan 2025 17:44:19 -0800 Subject: [PATCH 6/8] Major Changes: added 2 experimental reasoning 
algorithms cleaned up codebase to reduce redundant files while retaining the functionality changed metrics to include new experimental reasoning algorithms when they are used Updated README.md Updated LICENSE to 2025 --- README.md | 97 ++--- dist/engine.d.ts | 24 -- dist/engine.js | 78 ---- dist/reasoner.js | 29 +- dist/strategies/factory.d.ts | 4 +- dist/strategies/factory.js | 8 + index.js | 166 -------- package.json | 2 +- src/engine.ts | 99 ----- src/reasoner.ts | 36 +- src/strategies/experiments/mcts-002-alpha.ts | 391 ++++++++++++++++++ .../experiments/mcts-002alt-alpha.ts | 386 +++++++++++++++++ src/strategies/factory.ts | 10 +- state-manager.js | 40 -- tot-engine.js | 56 --- 15 files changed, 872 insertions(+), 554 deletions(-) delete mode 100644 dist/engine.d.ts delete mode 100644 dist/engine.js delete mode 100644 index.js delete mode 100644 src/engine.ts create mode 100644 src/strategies/experiments/mcts-002-alpha.ts create mode 100644 src/strategies/experiments/mcts-002alt-alpha.ts delete mode 100644 state-manager.js delete mode 100644 tot-engine.js diff --git a/README.md b/README.md index 5a8908c..24e71a8 100644 --- a/README.md +++ b/README.md @@ -2,24 +2,38 @@ A reasoning implementation for Claude Desktop that lets you use both Beam Search and Monte Carlo Tree Search (MCTS). tbh this started as a way to see if we could make Claude even better at complex problem-solving... turns out we definitely can. 
### Current Version:
-**v1.1.0**
+**v2.0.0**
 
 #### What's New:
 
+> Added 2 Experimental Reasoning Algorithms:
+> - `mcts-002-alpha`
+>   - Uses the A* Search Method along with an early *alpha* implementation of a Policy Simulation Layer
+>   - Also includes an early *alpha* implementation of an Adaptive Exploration Simulator & an Outcome-Based Reasoning Simulator
+>   - *NOTE:* the implementation of these alpha simulators is not complete and is subject to change
+> - `mcts-002alt-alpha`
+>   - Uses the Bidirectional Search Method along with an early *alpha* implementation of a Policy Simulation Layer
+>   - Also includes an early *alpha* implementation of an Adaptive Exploration Simulator & an Outcome-Based Reasoning Simulator
+>   - *NOTE:* the implementation of these alpha simulators is not complete and is subject to change
+>
+What happened to `mcts-001-alpha` and `mcts-001alt-alpha`?
+> Quite simply: they were nearly indistinguishable from the base `mcts` method. Initial testing produced near-identical thought processes, suggesting that adding a policy simulation layer on its own may not have much effect.
+
+So why add a Policy Simulation Layer now?
+> i think it's important to incorporate Policy AND Search in tandem, since that's how most of these algorithms implement them. 
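for context on what these `mcts-002-*` variants build on: base MCTS decides which branch to expand next with the UCT rule, balancing exploitation (average reward so far) against exploration (how under-visited a branch is), using the same exploration constant of √2 the existing strategy configures. a minimal standalone sketch — the field names here are illustrative, not the repo's actual node classes:

```typescript
// Per-child statistics tracked by MCTS (illustrative shape).
interface MCTSStats {
  visits: number;
  totalReward: number;
}

// UCT (Upper Confidence bound for Trees) score for one child.
function uct(child: MCTSStats, parentVisits: number, c: number = Math.SQRT2): number {
  if (child.visits === 0) return Infinity; // always try unvisited children first
  const exploitation = child.totalReward / child.visits;
  const exploration = c * Math.sqrt(Math.log(parentVisits) / child.visits);
  return exploitation + exploration;
}

// Selection step: pick the index of the child maximizing UCT.
function selectChild(children: MCTSStats[], parentVisits: number): number {
  let best = 0;
  for (let i = 1; i < children.length; i++) {
    if (uct(children[i], parentVisits) > uct(children[best], parentVisits)) best = i;
  }
  return best;
}

// Two children with the same average reward (0.5): the less-visited one
// gets a bigger exploration bonus, so it is selected.
console.log(selectChild([{ visits: 10, totalReward: 5 }, { visits: 2, totalReward: 1 }], 12)); // 1
```

a policy layer (as in the `-002` variants) would bias this selection with a learned or heuristic prior instead of treating all children uniformly — which is why pairing policy *with* search matters.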
+ +#### Previous Versions: +**v1.1.0** + > Added model control over search parameters: > > beamWidth - lets Claude adjust how many paths to track (1-10) > numSimulations - fine-tune MCTS simulation count (1-150) -#### Previous Versions: -**v1.0.0** - -> Initial release with base Beam Search and MCTS implementations - ## Features - Two search strategies that you can switch between: - Beam search (good for straightforward stuff) - - MCTS (when stuff gets complex) + - MCTS (when stuff gets complex) with alpha variations (see above) - Tracks how good different reasoning paths are - Maps out all the different ways Claude thinks through problems - Analyzes how the reasoning process went @@ -27,7 +41,12 @@ A reasoning implementation for Claude Desktop that lets you use both Beam Search ## Installation ``` +git clone https://github.com/frgmt0/mcp-reasoner.git + +OR clone the original: + git clone https://github.com/Jacck/mcp-reasoner.git + cd mcp-reasoner npm install npm run build @@ -46,69 +65,23 @@ Add to Claude Desktop config: } ``` -## Search Strategies - -### Beam Search -so beam search is pretty straightforward; it keeps track of the most promising solution paths as it goes. works really well when you've got problems with clear right answers, like math stuff or certain types of puzzles. - -interesting thing i found while testing: when i threw 50 puzzles from the Arc AGI benchmark at it, it only scored 24%. like, it wasn't completely lost, but... not great. here's how i tested it: - -- first, i'd check if claude actually got the pattern from the examples. 
if it seemed confused, i'd try to nudge it in the right direction (but dock points cause that's not ideal) -- then for the actual test cases, i had this whole scoring system: - - 5 points - nailed it - - 4 points - reasoning was solid but maybe i fucked up following the instructions - - 3 points - kinda got the pattern but didn't quite nail it - - 2 points - straight up failed - - 1 point - at least the initial reasoning wasn't completely off - -### Monte Carlo Tree Search -now THIS is where it gets interesting. MCTS absolutely crushed it compared to beam search - we're talking 48% on a different set of 50 Arc puzzles. yeah yeah, maybe they were easier puzzles (this isn't an official benchmark or anything), but doubling the performance? that's not just luck. - -the cool thing about MCTS is how it explores different possibilities. instead of just following what seems best right away, it tries out different paths to see what might work better in the long run. claude spent way more time understanding the examples before diving in, which probably helped a lot. - -## Why This Matters -adding structured reasoning to claude makes it way better... no der, right? but what's really interesting is how different methods work for different types of problems. - -why'd i test on puzzles instead of coding problems? honestly, claude's already proven itself on stuff like polyglot and codeforces. i wanted to see how it handled more abstract reasoning - the kind of stuff that's harder to measure. - -## What's Next -got some cool stuff in the pipeline: - -### IDDFS (Iterative Deepening Depth-First Search) -basically IDDFS is like... imagine you're exploring a maze but instead of going all in, you check everything 1 step deep, then 2 steps deep, and so on. 
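the "1 step deep, then 2 steps deep" idea above can be sketched in a few lines. this is a minimal, generic version over a toy tree of reasoning steps; `TreeNode`, `dls`, and `iddfs` are made-up names for illustration, not anything in this repo.

```typescript
// Minimal iterative-deepening DFS sketch (hypothetical helper, not repo code).
interface TreeNode {
  value: string;
  children: TreeNode[];
}

// Depth-limited DFS: only explore up to `limit` steps below `node`.
function dls(node: TreeNode, goal: string, limit: number): boolean {
  if (node.value === goal) return true;
  if (limit === 0) return false;
  return node.children.some(c => dls(c, goal, limit - 1));
}

// IDDFS: rerun DLS with limit 0, 1, 2, ... so the shallowest match is
// found first and memory stays proportional to the current depth.
// Returns the depth of the shallowest match, or -1 if none within maxDepth.
function iddfs(root: TreeNode, goal: string, maxDepth: number): number {
  for (let depth = 0; depth <= maxDepth; depth++) {
    if (dls(root, goal, depth)) return depth;
  }
  return -1;
}
```

the re-exploration of shallow levels looks wasteful, but because trees grow roughly exponentially with depth, the repeated shallow work is a small fraction of the total.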
- -pros: -- uses way less memory than regular DFS (which is huge for complex problems) -- guaranteed to find the shortest path to a solution -- works really well when you don't know how deep you need to go - -cons: -- might seem slower since you're re-exploring stuff -- not great if your solution is super deep in the tree -- can get stuck in loops if i'm not careful with the implementation, which is hard because im usually not careful hahaha +## Testing -working on `iddfs-exp` right now and the theory is that it might handle certain types of puzzles better than MCTS, especially ones where the solution path isn't too deep but needs systematic exploration. +[More Testing Coming Soon] -### Alpha-Beta Pruning -ok this one's a bit of an experiment and may not work... it's traditionally used in game trees (like chess engines) but i think it could be interesting for reasoning too. +## Benchmarks -pros: -- super efficient at cutting off "dead end" reasoning paths -- works really well when you can evaluate how good a partial solution is -- could be amazing for problems where you need to consider opposing viewpoints +[Benchmarking will be added soon] -cons: -* needs a good evaluation function (which is hard af for general reasoning) -* might miss some creative solutions by cutting off paths too early -* really depends on the order you explore things +Key Benchmarks to test against: -`alphabeta-exp` is definitely gonna be rough at first, but i'm curious to see if we can make it work for non-game-tree problems. might be especially interesting for scenarios where claude needs to reason about competing hypotheses. +- MATH500 -also working on letting claude control the MCTS sampling parameters directly - could lead to some interesting adaptive behavior where it adjusts its exploration strategy based on how well it's understanding the problem. +- GPQA-Diamond -will definitely share more test results as we implement these. 
if you're interested in helping test or have ideas for other algorithms that might work well, hit me up. +- GSM8K --frgmt0, Jacck +- Maybe Polyglot &/or SWE-Bench ## License This project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details. diff --git a/dist/engine.d.ts b/dist/engine.d.ts deleted file mode 100644 index c0cecf3..0000000 --- a/dist/engine.d.ts +++ /dev/null @@ -1,24 +0,0 @@ -interface ThoughtNode { - thought: string; - thoughtNumber: number; - totalThoughts: number; - nextThoughtNeeded: boolean; - score: number; - children: ThoughtNode[]; - parent?: ThoughtNode; -} -export declare class ReasoningEngine { - private thoughts; - private readonly beamWidth; - private readonly minScore; - private evaluateThought; - addThought(thought: string, thoughtNumber: number, totalThoughts: number, nextThoughtNeeded: boolean): ThoughtNode; - getBestPath(): ThoughtNode[]; - getStats(): { - totalThoughts: number; - bestScore: number; - averageScore: number; - branchingFactor: number; - }; -} -export {}; diff --git a/dist/engine.js b/dist/engine.js deleted file mode 100644 index a28e926..0000000 --- a/dist/engine.js +++ /dev/null @@ -1,78 +0,0 @@ -export class ReasoningEngine { - constructor() { - this.thoughts = []; - this.beamWidth = 3; - this.minScore = 0.5; - } - evaluateThought(thought) { - // Simple scoring based on: - // - Length (indicating detail) - // - Contains mathematical expressions - // - Contains logical connectors - let score = 0; - // Length score - score += Math.min(thought.thought.length / 100, 0.4); - // Mathematical expressions - if (/[+\-*/=<>]/.test(thought.thought)) { - score += 0.2; - } - // Logical connectors - if (/\b(therefore|because|if|then|thus|hence|so)\b/i.test(thought.thought)) { - score += 0.2; - } - return score; - } - addThought(thought, thoughtNumber, totalThoughts, nextThoughtNeeded) { - const node = { - thought, - thoughtNumber, - totalThoughts, - nextThoughtNeeded, - score: 0, - children: [] - }; - // 
Evaluate thought - node.score = this.evaluateThought(node); - // Add to parent if this is not the first thought - if (this.thoughts.length > 0) { - const potentialParents = this.thoughts.filter(t => t.thoughtNumber === thoughtNumber - 1); - if (potentialParents.length > 0) { - // Find best parent based on score - const bestParent = potentialParents.reduce((a, b) => a.score > b.score ? a : b); - node.parent = bestParent; - bestParent.children.push(node); - } - } - // Keep beam width best thoughts at each level - const sameLevel = this.thoughts.filter(t => t.thoughtNumber === thoughtNumber); - sameLevel.push(node); - if (sameLevel.length > this.beamWidth) { - sameLevel.sort((a, b) => b.score - a.score); - sameLevel.splice(this.beamWidth); - } - this.thoughts.push(node); - return node; - } - getBestPath() { - const bestLast = [...this.thoughts] - .filter(t => !t.nextThoughtNeeded) - .sort((a, b) => b.score - a.score)[0]; - if (!bestLast) - return []; - const path = [bestLast]; - let current = bestLast; - while (current.parent) { - path.unshift(current.parent); - current = current.parent; - } - return path; - } - getStats() { - return { - totalThoughts: this.thoughts.length, - bestScore: Math.max(...this.thoughts.map(t => t.score)), - averageScore: this.thoughts.reduce((a, b) => a + b.score, 0) / this.thoughts.length, - branchingFactor: this.thoughts.reduce((a, b) => a + b.children.length, 0) / this.thoughts.length - }; - } -} diff --git a/dist/reasoner.js b/dist/reasoner.js index f597e67..02adef1 100644 --- a/dist/reasoner.js +++ b/dist/reasoner.js @@ -6,8 +6,12 @@ export class Reasoner { this.stateManager = new StateManager(CONFIG.cacheSize); // Initialize available strategies this.strategies = new Map(); + // Initialize base strategies this.strategies.set(ReasoningStrategy.BEAM_SEARCH, StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager, CONFIG.beamWidth)); this.strategies.set(ReasoningStrategy.MCTS, 
StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager, undefined, CONFIG.numSimulations)); + // Initialize experimental MCTS strategies + this.strategies.set(ReasoningStrategy.MCTS_002_ALPHA, StrategyFactory.createStrategy(ReasoningStrategy.MCTS_002_ALPHA, this.stateManager, undefined, CONFIG.numSimulations)); + this.strategies.set(ReasoningStrategy.MCTS_002_ALT_ALPHA, StrategyFactory.createStrategy(ReasoningStrategy.MCTS_002_ALT_ALPHA, this.stateManager, undefined, CONFIG.numSimulations)); // Set default strategy const defaultStrategy = CONFIG.defaultStrategy; this.currentStrategy = this.strategies.get(defaultStrategy) || @@ -16,19 +20,17 @@ export class Reasoner { async processThought(request) { // Switch strategy if requested if (request.strategyType && this.strategies.has(request.strategyType)) { - // Create new strategy instance with current beamWidth if specified const strategyType = request.strategyType; + // Create new strategy instance with appropriate parameters if (strategyType === ReasoningStrategy.BEAM_SEARCH) { this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, request.beamWidth); - this.strategies.set(strategyType, this.currentStrategy); - } - else if (strategyType === ReasoningStrategy.MCTS) { - this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, request.numSimulations); - this.strategies.set(strategyType, this.currentStrategy); } else { - this.currentStrategy = this.strategies.get(strategyType); + // All MCTS variants (base and experimental) use numSimulations + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, request.numSimulations); } + // Update strategy in map + this.strategies.set(strategyType, this.currentStrategy); } // Process thought using current strategy const response = await this.currentStrategy.processThought(request); @@ -96,17 +98,16 @@ export class Reasoner { if 
(!this.strategies.has(strategyType)) { throw new Error(`Unknown strategy type: ${strategyType}`); } - if (strategyType === ReasoningStrategy.BEAM_SEARCH && beamWidth !== undefined) { + // Create new strategy instance with appropriate parameters + if (strategyType === ReasoningStrategy.BEAM_SEARCH) { this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, beamWidth); - this.strategies.set(strategyType, this.currentStrategy); - } - else if (strategyType === ReasoningStrategy.MCTS && numSimulations !== undefined) { - this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); - this.strategies.set(strategyType, this.currentStrategy); } else { - this.currentStrategy = this.strategies.get(strategyType); + // All MCTS variants (base and experimental) use numSimulations + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); } + // Update strategy in map + this.strategies.set(strategyType, this.currentStrategy); } getAvailableStrategies() { return Array.from(this.strategies.keys()); diff --git a/dist/strategies/factory.d.ts b/dist/strategies/factory.d.ts index fc72493..566dcd2 100644 --- a/dist/strategies/factory.d.ts +++ b/dist/strategies/factory.d.ts @@ -2,7 +2,9 @@ import { StateManager } from '../state.js'; import { BaseStrategy } from './base.js'; export declare enum ReasoningStrategy { BEAM_SEARCH = "beam_search", - MCTS = "mcts" + MCTS = "mcts", + MCTS_002_ALPHA = "mcts_002_alpha", + MCTS_002_ALT_ALPHA = "mcts_002_alt_alpha" } export declare class StrategyFactory { static createStrategy(type: ReasoningStrategy, stateManager: StateManager, beamWidth?: number, numSimulations?: number): BaseStrategy; diff --git a/dist/strategies/factory.js b/dist/strategies/factory.js index 626252b..5532281 100644 --- a/dist/strategies/factory.js +++ b/dist/strategies/factory.js @@ -1,9 +1,13 @@ import { BeamSearchStrategy } from 
'./beam-search.js'; import { MonteCarloTreeSearchStrategy } from './mcts.js'; +import { MCTS002AlphaStrategy } from './experiments/mcts-002-alpha.js'; +import { MCTS002AltAlphaStrategy } from './experiments/mcts-002alt-alpha.js'; export var ReasoningStrategy; (function (ReasoningStrategy) { ReasoningStrategy["BEAM_SEARCH"] = "beam_search"; ReasoningStrategy["MCTS"] = "mcts"; + ReasoningStrategy["MCTS_002_ALPHA"] = "mcts_002_alpha"; + ReasoningStrategy["MCTS_002_ALT_ALPHA"] = "mcts_002_alt_alpha"; })(ReasoningStrategy || (ReasoningStrategy = {})); export class StrategyFactory { static createStrategy(type, stateManager, beamWidth, numSimulations) { @@ -12,6 +16,10 @@ export class StrategyFactory { return new BeamSearchStrategy(stateManager, beamWidth); case ReasoningStrategy.MCTS: return new MonteCarloTreeSearchStrategy(stateManager, numSimulations); + case ReasoningStrategy.MCTS_002_ALPHA: + return new MCTS002AlphaStrategy(stateManager, numSimulations); + case ReasoningStrategy.MCTS_002_ALT_ALPHA: + return new MCTS002AltAlphaStrategy(stateManager, numSimulations); default: throw new Error(`Unknown strategy type: ${type}`); } diff --git a/index.js b/index.js deleted file mode 100644 index 6278e59..0000000 --- a/index.js +++ /dev/null @@ -1,166 +0,0 @@ -#!/usr/bin/env node - -import { Server } from "@modelcontextprotocol/sdk/server/index.js"; -import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js"; -import { - CallToolRequestSchema, - ListToolsRequestSchema, - Tool, -} from "@modelcontextprotocol/sdk/types.js"; -import chalk from 'chalk'; - -class ReasonerServer { - constructor() { - this.thoughts = []; - this.branches = {}; - } - - validateInput(input) { - const data = input; - - if (!data.thought || typeof data.thought !== 'string') { - throw new Error('Invalid thought: must be a string'); - } - if (!data.thoughtNumber || typeof data.thoughtNumber !== 'number') { - throw new Error('Invalid thoughtNumber: must be a number'); - } - if 
(!data.totalThoughts || typeof data.totalThoughts !== 'number') { - throw new Error('Invalid totalThoughts: must be a number'); - } - if (typeof data.nextThoughtNeeded !== 'boolean') { - throw new Error('Invalid nextThoughtNeeded: must be a boolean'); - } - return true; - } - - formatThought(thoughtData) { - const { thoughtNumber, totalThoughts, thought } = thoughtData; - const prefix = chalk.blue('🤔 Reasoning'); - const header = `${prefix} ${thoughtNumber}/${totalThoughts}`; - const border = '─'.repeat(Math.max(header.length, thought.length) + 4); - - return ` -┌${border}┐ -│ ${header.padEnd(border.length - 2)} │ -├${border}┤ -│ ${thought.padEnd(border.length - 2)} │ -└${border}┘`; - } - - processThought(input) { - try { - this.validateInput(input); - - // Adjust total thoughts if needed - if (input.thoughtNumber > input.totalThoughts) { - input.totalThoughts = input.thoughtNumber; - } - - // Add to history - this.thoughts.push(input); - - // Format and display - const formattedThought = this.formatThought(input); - console.error(formattedThought); - - return { - content: [{ - type: "text", - text: JSON.stringify({ - thoughtNumber: input.thoughtNumber, - totalThoughts: input.totalThoughts, - nextThoughtNeeded: input.nextThoughtNeeded, - thoughtCount: this.thoughts.length - }, null, 2) - }] - }; - } catch (error) { - return { - content: [{ - type: "text", - text: JSON.stringify({ - error: error.message, - status: 'failed' - }, null, 2) - }], - isError: true - }; - } - } -} - -const REASONER_TOOL = { - name: "reasoner", - description: "A reasoning engine that helps break down and analyze problems step by step", - inputSchema: { - type: "object", - properties: { - thought: { - type: "string", - description: "The current reasoning step" - }, - thoughtNumber: { - type: "integer", - description: "Current step number", - minimum: 1 - }, - totalThoughts: { - type: "integer", - description: "Estimated total steps needed", - minimum: 1 - }, - nextThoughtNeeded: { - type: 
"boolean", - description: "Whether another step is needed" - } - }, - required: ["thought", "thoughtNumber", "totalThoughts", "nextThoughtNeeded"] - } -}; - -// Initialize MCP server -const server = new Server( - { - name: "reasoner-server", - version: "1.0.0", - }, - { - capabilities: { - tools: {}, - }, - } -); - -const reasonerServer = new ReasonerServer(); - -// Register tool listing handler -server.setRequestHandler(ListToolsRequestSchema, async () => ({ - tools: [REASONER_TOOL], -})); - -// Register tool execution handler -server.setRequestHandler(CallToolRequestSchema, async (request) => { - if (request.params.name === "reasoner") { - return reasonerServer.processThought(request.params.arguments); - } - - return { - content: [{ - type: "text", - text: `Unknown tool: ${request.params.name}` - }], - isError: true - }; -}); - -// Start the server -async function runServer() { - const transport = new StdioServerTransport(); - await server.connect(transport); - console.error("Reasoner MCP Server running on stdio"); -} - -runServer().catch((error) => { - console.error("Fatal error running server:", error); - process.exit(1); -}); \ No newline at end of file diff --git a/package.json b/package.json index 07a5397..55d0d77 100644 --- a/package.json +++ b/package.json @@ -1,6 +1,6 @@ { "name": "mcp-reasoner", - "version": "1.1.0", + "version": "2.0.0", "description": "MCP Reasoner with multiple reasoning strategies including Beam Search and Monte Carlo Tree Search", "type": "module", "main": "dist/index.js", diff --git a/src/engine.ts b/src/engine.ts deleted file mode 100644 index df98e7e..0000000 --- a/src/engine.ts +++ /dev/null @@ -1,99 +0,0 @@ -interface ThoughtNode { - thought: string; - thoughtNumber: number; - totalThoughts: number; - nextThoughtNeeded: boolean; - score: number; - children: ThoughtNode[]; - parent?: ThoughtNode; -} - -export class ReasoningEngine { - private thoughts: ThoughtNode[] = []; - private readonly beamWidth = 3; - private readonly 
minScore = 0.5; - - private evaluateThought(thought: ThoughtNode): number { - // Simple scoring based on: - // - Length (indicating detail) - // - Contains mathematical expressions - // - Contains logical connectors - let score = 0; - - // Length score - score += Math.min(thought.thought.length / 100, 0.4); - - // Mathematical expressions - if (/[+\-*/=<>]/.test(thought.thought)) { - score += 0.2; - } - - // Logical connectors - if (/\b(therefore|because|if|then|thus|hence|so)\b/i.test(thought.thought)) { - score += 0.2; - } - - return score; - } - - public addThought(thought: string, thoughtNumber: number, totalThoughts: number, nextThoughtNeeded: boolean): ThoughtNode { - const node: ThoughtNode = { - thought, - thoughtNumber, - totalThoughts, - nextThoughtNeeded, - score: 0, - children: [] - }; - - // Evaluate thought - node.score = this.evaluateThought(node); - - // Add to parent if this is not the first thought - if (this.thoughts.length > 0) { - const potentialParents = this.thoughts.filter(t => t.thoughtNumber === thoughtNumber - 1); - if (potentialParents.length > 0) { - // Find best parent based on score - const bestParent = potentialParents.reduce((a, b) => a.score > b.score ? 
a : b); - node.parent = bestParent; - bestParent.children.push(node); - } - } - - // Keep beam width best thoughts at each level - const sameLevel = this.thoughts.filter(t => t.thoughtNumber === thoughtNumber); - sameLevel.push(node); - if (sameLevel.length > this.beamWidth) { - sameLevel.sort((a, b) => b.score - a.score); - sameLevel.splice(this.beamWidth); - } - - this.thoughts.push(node); - return node; - } - - public getBestPath(): ThoughtNode[] { - const bestLast = [...this.thoughts] - .filter(t => !t.nextThoughtNeeded) - .sort((a, b) => b.score - a.score)[0]; - - if (!bestLast) return []; - - const path: ThoughtNode[] = [bestLast]; - let current = bestLast; - while (current.parent) { - path.unshift(current.parent); - current = current.parent; - } - return path; - } - - public getStats() { - return { - totalThoughts: this.thoughts.length, - bestScore: Math.max(...this.thoughts.map(t => t.score)), - averageScore: this.thoughts.reduce((a, b) => a + b.score, 0) / this.thoughts.length, - branchingFactor: this.thoughts.reduce((a, b) => a + b.children.length, 0) / this.thoughts.length - }; - } -} \ No newline at end of file diff --git a/src/reasoner.ts b/src/reasoner.ts index df13e67..77cff01 100644 --- a/src/reasoner.ts +++ b/src/reasoner.ts @@ -13,6 +13,8 @@ export class Reasoner { // Initialize available strategies this.strategies = new Map(); + + // Initialize base strategies this.strategies.set( ReasoningStrategy.BEAM_SEARCH, StrategyFactory.createStrategy(ReasoningStrategy.BEAM_SEARCH, this.stateManager, CONFIG.beamWidth) @@ -21,6 +23,16 @@ export class Reasoner { ReasoningStrategy.MCTS, StrategyFactory.createStrategy(ReasoningStrategy.MCTS, this.stateManager, undefined, CONFIG.numSimulations) ); + + // Initialize experimental MCTS strategies + this.strategies.set( + ReasoningStrategy.MCTS_002_ALPHA, + StrategyFactory.createStrategy(ReasoningStrategy.MCTS_002_ALPHA, this.stateManager, undefined, CONFIG.numSimulations) + ); + this.strategies.set( + 
ReasoningStrategy.MCTS_002_ALT_ALPHA, + StrategyFactory.createStrategy(ReasoningStrategy.MCTS_002_ALT_ALPHA, this.stateManager, undefined, CONFIG.numSimulations) + ); // Set default strategy const defaultStrategy = CONFIG.defaultStrategy as ReasoningStrategy; @@ -31,26 +43,26 @@ export class Reasoner { public async processThought(request: ReasoningRequest): Promise { // Switch strategy if requested if (request.strategyType && this.strategies.has(request.strategyType as ReasoningStrategy)) { - // Create new strategy instance with current beamWidth if specified const strategyType = request.strategyType as ReasoningStrategy; + + // Create new strategy instance with appropriate parameters if (strategyType === ReasoningStrategy.BEAM_SEARCH) { this.currentStrategy = StrategyFactory.createStrategy( strategyType, this.stateManager, request.beamWidth ); - this.strategies.set(strategyType, this.currentStrategy); - } else if (strategyType === ReasoningStrategy.MCTS) { + } else { + // All MCTS variants (base and experimental) use numSimulations this.currentStrategy = StrategyFactory.createStrategy( strategyType, this.stateManager, undefined, request.numSimulations ); - this.strategies.set(strategyType, this.currentStrategy); - } else { - this.currentStrategy = this.strategies.get(strategyType)!; } + // Update strategy in map + this.strategies.set(strategyType, this.currentStrategy); } // Process thought using current strategy @@ -132,15 +144,15 @@ export class Reasoner { if (!this.strategies.has(strategyType)) { throw new Error(`Unknown strategy type: ${strategyType}`); } - if (strategyType === ReasoningStrategy.BEAM_SEARCH && beamWidth !== undefined) { + // Create new strategy instance with appropriate parameters + if (strategyType === ReasoningStrategy.BEAM_SEARCH) { this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, beamWidth); - this.strategies.set(strategyType, this.currentStrategy); - } else if (strategyType === ReasoningStrategy.MCTS 
&& numSimulations !== undefined) { - this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); - this.strategies.set(strategyType, this.currentStrategy); } else { - this.currentStrategy = this.strategies.get(strategyType)!; + // All MCTS variants (base and experimental) use numSimulations + this.currentStrategy = StrategyFactory.createStrategy(strategyType, this.stateManager, undefined, numSimulations); } + // Update strategy in map + this.strategies.set(strategyType, this.currentStrategy); } public getAvailableStrategies(): ReasoningStrategy[] { diff --git a/src/strategies/experiments/mcts-002-alpha.ts b/src/strategies/experiments/mcts-002-alpha.ts new file mode 100644 index 0000000..7d2a390 --- /dev/null +++ b/src/strategies/experiments/mcts-002-alpha.ts @@ -0,0 +1,391 @@ +import { v4 as uuidv4 } from 'uuid'; +import { ThoughtNode, ReasoningRequest, ReasoningResponse, CONFIG } from '../../types.js'; +import { MonteCarloTreeSearchStrategy } from '../mcts.js'; + +interface PolicyGuidedNode extends ThoughtNode { + visits: number; + totalReward: number; + untriedActions?: string[]; + policyScore: number; // Policy network prediction + valueEstimate: number; // Value network estimate + priorActionProbs: Map; // Action probabilities + puct?: number; // PUCT score for selection + actionHistory?: string[]; // Track sequence of actions + noveltyScore?: number; // Measure of thought novelty +} + +interface PolicyMetrics { + averagePolicyScore: number; + averageValueEstimate: number; + actionDistribution: { [action: string]: number }; + explorationStats: { + temperature: number; + explorationRate: number; + noveltyBonus: number; + }; + convergenceMetrics: { + policyEntropy: number; + valueStability: number; + }; +} + +export class MCTS002AlphaStrategy extends MonteCarloTreeSearchStrategy { + private readonly temperature: number; + private explorationRate: number; + private readonly learningRate: number; + private 
readonly noveltyBonus: number; + private policyMetrics: PolicyMetrics; + protected readonly simulationCount: number; + + constructor(stateManager: any, numSimulations: number = CONFIG.numSimulations) { + super(stateManager, numSimulations); + this.temperature = 1.0; + this.explorationRate = Math.sqrt(2); + this.learningRate = 0.1; + this.noveltyBonus = 0.2; + this.simulationCount = numSimulations; + this.policyMetrics = this.initializePolicyMetrics(); + } + + private initializePolicyMetrics(): PolicyMetrics { + return { + averagePolicyScore: 0, + averageValueEstimate: 0, + actionDistribution: {}, + explorationStats: { + temperature: this.temperature, + explorationRate: this.explorationRate, + noveltyBonus: this.noveltyBonus + }, + convergenceMetrics: { + policyEntropy: 0, + valueStability: 0 + } + }; + } + + public async processThought(request: ReasoningRequest): Promise { + // Get base MCTS response + const baseResponse = await super.processThought(request); + + const nodeId = uuidv4(); + const parentNode = request.parentId ? + await this.getNode(request.parentId) as PolicyGuidedNode : undefined; + + const node: PolicyGuidedNode = { + id: nodeId, + thought: request.thought, + depth: request.thoughtNumber - 1, + score: 0, + children: [], + parentId: request.parentId, + isComplete: !request.nextThoughtNeeded, + visits: 0, + totalReward: 0, + untriedActions: [], + policyScore: 0, + valueEstimate: 0, + priorActionProbs: new Map(), + actionHistory: parentNode ? 
+ [...(parentNode.actionHistory || []), this.extractAction(request.thought)] : + [this.extractAction(request.thought)] + }; + + // Initialize node with policy guidance + node.score = this.evaluateThought(node, parentNode); + node.visits = 1; + node.totalReward = node.score; + node.policyScore = this.calculatePolicyScore(node, parentNode); + node.valueEstimate = this.estimateValue(node); + node.noveltyScore = this.calculateNovelty(node); + + await this.saveNode(node); + + // Update parent if exists + if (parentNode) { + parentNode.children.push(node.id); + await this.saveNode(parentNode); + await this.updatePolicyMetrics(node, parentNode); + } + + // Run policy-guided search + if (!node.isComplete) { + await this.runPolicyGuidedSearch(node); + } + + // Calculate enhanced path statistics + const currentPath = await this.stateManager.getPath(nodeId); + const enhancedScore = this.calculatePolicyEnhancedScore(currentPath); + + return { + ...baseResponse, + score: enhancedScore, + bestScore: Math.max(baseResponse.bestScore || 0, enhancedScore) + }; + } + + private extractAction(thought: string): string { + // Simple action extraction based on first few words + return thought.split(/\s+/).slice(0, 3).join('_').toLowerCase(); + } + + private calculatePolicyScore(node: PolicyGuidedNode, parent?: PolicyGuidedNode): number { + // Combine multiple policy factors + const depthFactor = Math.exp(-0.1 * node.depth); + const parentAlignment = parent ? 
+ this.thoughtCoherence(node.thought, parent.thought) : 1; + const noveltyBonus = node.noveltyScore || 0; + + return ( + 0.4 * depthFactor + + 0.4 * parentAlignment + + 0.2 * noveltyBonus + ); + } + + private estimateValue(node: PolicyGuidedNode): number { + // Combine immediate score with future potential + const immediateValue = node.score; + const depthPotential = 1 - (node.depth / CONFIG.maxDepth); + const noveltyValue = node.noveltyScore || 0; + + return ( + 0.5 * immediateValue + + 0.3 * depthPotential + + 0.2 * noveltyValue + ); + } + + private calculateNovelty(node: PolicyGuidedNode): number { + // Measure thought novelty based on action history + const uniqueActions = new Set(node.actionHistory).size; + const historyLength = node.actionHistory?.length || 1; + const uniquenessRatio = uniqueActions / historyLength; + + // Combine with linguistic novelty + const complexityScore = (node.thought.match(/[.!?;]|therefore|because|if|then/g) || []).length / 10; + + return (0.7 * uniquenessRatio + 0.3 * complexityScore); + } + + private thoughtCoherence(thought1: string, thought2: string): number { + const words1 = new Set(thought1.toLowerCase().split(/\W+/)); + const words2 = new Set(thought2.toLowerCase().split(/\W+/)); + const intersection = new Set([...words1].filter(x => words2.has(x))); + const union = new Set([...words1, ...words2]); + return intersection.size / union.size; + } + + private async runPolicyGuidedSearch(node: PolicyGuidedNode): Promise { + for (let i = 0; i < this.simulationCount; i++) { + const selectedNode = await this.selectWithPUCT(node); + const expandedNode = await this.expandWithPolicy(selectedNode); + const reward = await this.simulateWithValueGuidance(expandedNode); + await this.backpropagateWithPolicyUpdate(expandedNode, reward); + + // Adapt exploration rate + this.adaptExplorationRate(expandedNode); + } + } + + private async selectWithPUCT(root: PolicyGuidedNode): Promise { + let node = root; + + while (node.children.length > 0) { + 
const children = await Promise.all( + node.children.map(id => this.getNode(id)) + ) as PolicyGuidedNode[]; + + node = this.selectBestPUCTChild(children); + } + + return node; + } + + private selectBestPUCTChild(nodes: PolicyGuidedNode[]): PolicyGuidedNode { + const totalVisits = nodes.reduce((sum, node) => sum + node.visits, 0); + + return nodes.reduce((best, node) => { + const exploitation = node.valueEstimate; + const exploration = Math.sqrt(Math.log(totalVisits) / node.visits); + const policyTerm = node.policyScore * this.explorationRate; + const noveltyBonus = (node.noveltyScore || 0) * this.noveltyBonus; + + const puct = exploitation + + exploration * policyTerm + + noveltyBonus; + + node.puct = puct; + return puct > (best.puct || 0) ? node : best; + }); + } + + private async expandWithPolicy(node: PolicyGuidedNode): Promise { + if (node.isComplete) return node; + + const newNode: PolicyGuidedNode = { + ...node, + id: uuidv4(), + depth: node.depth + 1, + parentId: node.id, + children: [], + visits: 1, + totalReward: 0, + policyScore: 0, + valueEstimate: 0, + priorActionProbs: new Map(), + actionHistory: [...(node.actionHistory || [])] + }; + + newNode.policyScore = this.calculatePolicyScore(newNode, node); + newNode.score = this.evaluateThought(newNode, node); + newNode.valueEstimate = this.estimateValue(newNode); + newNode.noveltyScore = this.calculateNovelty(newNode); + + await this.saveNode(newNode); + return newNode; + } + + private async simulateWithValueGuidance(node: PolicyGuidedNode): Promise { + let current = node; + let totalReward = 0; + let depth = 0; + + while (!current.isComplete && depth < CONFIG.maxDepth) { + const reward = current.valueEstimate; + totalReward += reward; + + const expanded = await this.expandWithPolicy(current); + current = expanded; + depth++; + } + + return totalReward / depth; + } + + private async backpropagateWithPolicyUpdate( + node: PolicyGuidedNode, + reward: number + ): Promise { + let current: PolicyGuidedNode | 
undefined = node;
+
+ while (current) {
+ current.visits++;
+ current.totalReward += reward;
+
+ // Update value estimate with temporal difference
+ const newValue = (1 - this.learningRate) * current.valueEstimate +
+ this.learningRate * reward;
+ current.valueEstimate = newValue;
+
+ // Update action probabilities
+ if (current.parentId) {
+ const parentNode = await this.getNode(current.parentId) as PolicyGuidedNode;
+ const actionKey = this.extractAction(current.thought);
+ const currentProb = parentNode.priorActionProbs.get(actionKey) || 0;
+ const newProb = currentProb + this.learningRate * (reward - currentProb);
+ parentNode.priorActionProbs.set(actionKey, newProb);
+ await this.saveNode(parentNode);
+ }
+
+ await this.saveNode(current);
+
+ current = current.parentId ?
+ await this.getNode(current.parentId) as PolicyGuidedNode :
+ undefined;
+ }
+ }
+
+ private adaptExplorationRate(node: PolicyGuidedNode): void {
+ const successRate = node.totalReward / node.visits;
+ const targetRate = 0.6;
+
+ if (successRate > targetRate) {
+ // Reduce exploration when doing well
+ this.explorationRate = Math.max(0.5, this.explorationRate * 0.95);
+ } else {
+ // Increase exploration when results are poor
+ this.explorationRate = Math.min(2.0, this.explorationRate / 0.95);
+ }
+ }
+
+ private async updatePolicyMetrics(node: PolicyGuidedNode, parent: PolicyGuidedNode): Promise<void> {
+ // Update running averages
+ this.policyMetrics.averagePolicyScore =
+ (this.policyMetrics.averagePolicyScore + node.policyScore) / 2;
+ this.policyMetrics.averageValueEstimate =
+ (this.policyMetrics.averageValueEstimate + node.valueEstimate) / 2;
+
+ // Update action distribution
+ const action = this.extractAction(node.thought);
+ this.policyMetrics.actionDistribution[action] =
+ (this.policyMetrics.actionDistribution[action] || 0) + 1;
+
+ // Update exploration stats
+ this.policyMetrics.explorationStats = {
+ temperature: this.temperature,
+ explorationRate: this.explorationRate,
+ 
noveltyBonus: this.noveltyBonus
+ };
+
+ // Calculate policy entropy and value stability
+ const probs = Array.from(parent.priorActionProbs.values());
+ this.policyMetrics.convergenceMetrics = {
+ policyEntropy: this.calculateEntropy(probs),
+ valueStability: Math.abs(node.valueEstimate - parent.valueEstimate)
+ };
+ }
+
+ private calculateEntropy(probs: number[]): number {
+ const sum = probs.reduce((a, b) => a + b, 0);
+ return -probs.reduce((acc, p) => {
+ const norm = p / sum;
+ return acc + (norm * Math.log2(norm + 1e-10));
+ }, 0);
+ }
+
+ private calculatePolicyEnhancedScore(path: ThoughtNode[]): number {
+ if (path.length === 0) return 0;
+
+ return path.reduce((acc, node) => {
+ const policyNode = node as PolicyGuidedNode;
+ const baseScore = node.score;
+ const policyBonus = policyNode.policyScore || 0;
+ const valueBonus = policyNode.valueEstimate || 0;
+ const noveltyBonus = (policyNode.noveltyScore || 0) * this.noveltyBonus;
+
+ return acc + (baseScore + policyBonus + valueBonus + noveltyBonus) / 4;
+ }, 0) / path.length;
+ }
+
+ public async getMetrics(): Promise<any> {
+ const baseMetrics = await super.getMetrics();
+ const nodes = await this.stateManager.getAllNodes() as PolicyGuidedNode[];
+
+ // Calculate additional policy-specific metrics
+ const currentNode = nodes[nodes.length - 1];
+ const policyStats = {
+ currentNode: currentNode ? 
{
+ policyScore: currentNode.policyScore,
+ valueEstimate: currentNode.valueEstimate,
+ noveltyScore: currentNode.noveltyScore,
+ actionHistory: currentNode.actionHistory
+ } : null,
+ averages: {
+ policyScore: nodes.reduce((sum, n) => sum + n.policyScore, 0) / nodes.length,
+ valueEstimate: nodes.reduce((sum, n) => sum + n.valueEstimate, 0) / nodes.length,
+ noveltyScore: nodes.reduce((sum, n) => sum + (n.noveltyScore || 0), 0) / nodes.length
+ },
+ policyMetrics: this.policyMetrics
+ };
+
+ return {
+ ...baseMetrics,
+ name: 'MCTS-002-Alpha (Policy Enhanced)',
+ temperature: this.temperature,
+ explorationRate: this.explorationRate,
+ learningRate: this.learningRate,
+ policyStats
+ };
+ }
+}
diff --git a/src/strategies/experiments/mcts-002alt-alpha.ts b/src/strategies/experiments/mcts-002alt-alpha.ts
new file mode 100644
index 0000000..44a6b09
--- /dev/null
+++ b/src/strategies/experiments/mcts-002alt-alpha.ts
@@ -0,0 +1,386 @@
+import { v4 as uuidv4 } from 'uuid';
+import { ThoughtNode, ReasoningRequest, ReasoningResponse, CONFIG } from '../../types.js';
+import { MCTS002AlphaStrategy } from './mcts-002-alpha.js';
+
+// Queue implementation for bidirectional search
+class Queue<T> {
+ private items: T[];
+
+ constructor() {
+ this.items = [];
+ }
+
+ enqueue(item: T): void {
+ this.items.push(item);
+ }
+
+ dequeue(): T | undefined {
+ return this.items.shift();
+ }
+
+ isEmpty(): boolean {
+ return this.items.length === 0;
+ }
+
+ size(): number {
+ return this.items.length;
+ }
+}
+
+// Mirror PolicyGuidedNode's policy fields on ThoughtNode and add A* and bidirectional properties
+interface BidirectionalPolicyNode extends ThoughtNode {
+ visits: number;
+ totalReward: number;
+ untriedActions?: string[];
+ g: number; // A* cost from start
+ h: number; // A* heuristic to goal
+ f: number; // A* f = g + h
+ policyScore: number;
+ valueEstimate: number;
+ priorActionProbs: Map<string, number>;
+ puct?: number;
+ actionHistory?: string[];
+ noveltyScore?: number;
+ parent?: string; // For path 
reconstruction
+ direction?: 'forward' | 'backward';
+ searchDepth?: number;
+ meetingPoint?: boolean;
+}
+
+export class MCTS002AltAlphaStrategy extends MCTS002AlphaStrategy {
+ private startNode: BidirectionalPolicyNode | null = null;
+ private goalNode: BidirectionalPolicyNode | null = null;
+ private bidirectionalStats: {
+ forwardExplorationRate: number;
+ backwardExplorationRate: number;
+ meetingPoints: number;
+ pathQuality: number;
+ };
+
+ constructor(stateManager: any, numSimulations: number = CONFIG.numSimulations) {
+ super(stateManager, numSimulations);
+ this.bidirectionalStats = {
+ forwardExplorationRate: Math.sqrt(2),
+ backwardExplorationRate: Math.sqrt(2),
+ meetingPoints: 0,
+ pathQuality: 0
+ };
+ }
+
+ public async processThought(request: ReasoningRequest): Promise<ReasoningResponse> {
+ // Get base response first to ensure proper MCTS initialization
+ const baseResponse = await super.processThought(request);
+
+ const nodeId = uuidv4();
+ const parentNode = request.parentId ?
+ await this.getNode(request.parentId) as BidirectionalPolicyNode : undefined;
+
+ const node: BidirectionalPolicyNode = {
+ id: nodeId,
+ thought: request.thought,
+ depth: request.thoughtNumber - 1,
+ score: 0,
+ children: [],
+ parentId: request.parentId,
+ isComplete: !request.nextThoughtNeeded,
+ visits: 0,
+ totalReward: 0,
+ untriedActions: [],
+ g: parentNode ? parentNode.g + 1 : 0,
+ h: 0,
+ f: 0,
+ policyScore: 0,
+ valueEstimate: 0,
+ priorActionProbs: new Map(),
+ actionHistory: parentNode ?
+ [...(parentNode.actionHistory || []), this.getActionKey(request.thought)] :
+ [this.getActionKey(request.thought)],
+ searchDepth: 0,
+ direction: parentNode ? 
parentNode.direction : 'forward'
+ };
+
+ // Track start and goal nodes for bidirectional search
+ if (!parentNode) {
+ this.startNode = node;
+ node.direction = 'forward';
+ }
+ if (node.isComplete) {
+ this.goalNode = node;
+ node.direction = 'backward';
+ }
+
+ // Run bidirectional search if we have both endpoints
+ if (this.startNode && this.goalNode) {
+ const path = await this.bidirectionalSearch(this.startNode, this.goalNode);
+ if (path.length > 0) {
+ await this.updatePathWithPolicyGuidance(path);
+ }
+ }
+
+ // Calculate enhanced path statistics
+ const currentPath = await this.stateManager.getPath(nodeId);
+ const enhancedScore = this.calculateBidirectionalPolicyScore(currentPath);
+
+ return {
+ ...baseResponse,
+ score: enhancedScore,
+ bestScore: Math.max(baseResponse.bestScore || 0, enhancedScore)
+ };
+ }
+
+ protected getActionKey(thought: string): string {
+ // Simple action extraction based on first few words
+ return thought.split(/\s+/).slice(0, 3).join('_').toLowerCase();
+ }
+
+ private async searchLevel(
+ queue: Queue<BidirectionalPolicyNode>,
+ visited: Map<string, BidirectionalPolicyNode>,
+ otherVisited: Map<string, BidirectionalPolicyNode>,
+ direction: 'forward' | 'backward'
+ ): Promise<BidirectionalPolicyNode | null> {
+ const levelSize = queue.size();
+
+ for (let i = 0; i < levelSize; i++) {
+ const current = queue.dequeue();
+ if (!current) continue;
+
+ // Check if we've found a meeting point
+ if (otherVisited.has(current.id)) {
+ current.meetingPoint = true;
+ this.bidirectionalStats.meetingPoints++;
+ await this.saveNode(current);
+ return current;
+ }
+
+ // Get neighbors based on direction and policy scores
+ const neighbors = direction === 'forward' ? 
+ await Promise.all(current.children.map(id => this.getNode(id))) :
+ await Promise.all([current.parentId].filter((id): id is string => !!id).map(id => this.getNode(id)));
+
+ const validNeighbors = neighbors.filter((n): n is BidirectionalPolicyNode => !!n)
+ .sort((a, b) => b.policyScore - a.policyScore); // Use policy scores for neighbor selection
+
+ for (const neighbor of validNeighbors) {
+ if (!visited.has(neighbor.id)) {
+ visited.set(neighbor.id, neighbor);
+ neighbor.parent = current.id;
+ neighbor.direction = direction;
+ neighbor.searchDepth = (current.searchDepth || 0) + 1;
+ await this.saveNode(neighbor);
+ queue.enqueue(neighbor);
+ }
+ }
+ }
+
+ return null;
+ }
+
+ private async bidirectionalSearch(
+ start: BidirectionalPolicyNode,
+ goal: BidirectionalPolicyNode
+ ): Promise<BidirectionalPolicyNode[]> {
+ const forwardQueue = new Queue<BidirectionalPolicyNode>();
+ const backwardQueue = new Queue<BidirectionalPolicyNode>();
+ const forwardVisited = new Map<string, BidirectionalPolicyNode>();
+ const backwardVisited = new Map<string, BidirectionalPolicyNode>();
+
+ forwardQueue.enqueue(start);
+ backwardQueue.enqueue(goal);
+ forwardVisited.set(start.id, start);
+ backwardVisited.set(goal.id, goal);
+
+ while (!forwardQueue.isEmpty() && !backwardQueue.isEmpty()) {
+ // Search from both directions with policy guidance
+ const meetingPoint = await this.searchLevel(
+ forwardQueue,
+ forwardVisited,
+ backwardVisited,
+ 'forward'
+ );
+
+ if (meetingPoint) {
+ const path = this.reconstructPath(
+ meetingPoint,
+ forwardVisited,
+ backwardVisited
+ );
+ this.updateBidirectionalStats(path);
+ return path;
+ }
+
+ const backMeetingPoint = await this.searchLevel(
+ backwardQueue,
+ backwardVisited,
+ forwardVisited,
+ 'backward'
+ );
+
+ if (backMeetingPoint) {
+ const path = this.reconstructPath(
+ backMeetingPoint,
+ forwardVisited,
+ backwardVisited
+ );
+ this.updateBidirectionalStats(path);
+ return path;
+ }
+
+ // Adapt exploration rates based on progress
+ this.adaptBidirectionalExploration(forwardVisited, backwardVisited);
+ }
+
+ return [];
+ }
+
+ private reconstructPath(
+ 
meetingPoint: BidirectionalPolicyNode,
+ forwardVisited: Map<string, BidirectionalPolicyNode>,
+ backwardVisited: Map<string, BidirectionalPolicyNode>
+ ): BidirectionalPolicyNode[] {
+ const path: BidirectionalPolicyNode[] = [meetingPoint];
+
+ // Reconstruct forward path
+ let current = meetingPoint;
+ while (current.parent && forwardVisited.has(current.parent)) {
+ current = forwardVisited.get(current.parent)!;
+ path.unshift(current);
+ }
+
+ // Reconstruct backward path
+ current = meetingPoint;
+ while (current.parent && backwardVisited.has(current.parent)) {
+ current = backwardVisited.get(current.parent)!;
+ path.push(current);
+ }
+
+ return path;
+ }
+
+ private async updatePathWithPolicyGuidance(path: BidirectionalPolicyNode[]): Promise<void> {
+ const pathBonus = 0.2;
+
+ for (const node of path) {
+ // Boost both policy and value estimates for nodes along the path
+ node.policyScore += pathBonus;
+ node.valueEstimate = (node.valueEstimate + 1) / 2;
+
+ // Update action probabilities with path information
+ if (node.parentId) {
+ const parentNode = await this.getNode(node.parentId) as BidirectionalPolicyNode;
+ const actionKey = this.getActionKey(node.thought);
+ const currentProb = parentNode.priorActionProbs.get(actionKey) || 0;
+ const newProb = Math.max(currentProb, 0.8); // Strong preference for path actions
+ parentNode.priorActionProbs.set(actionKey, newProb);
+ await this.saveNode(parentNode);
+ }
+
+ await this.saveNode(node);
+ }
+
+ // Update path quality metric
+ this.bidirectionalStats.pathQuality = path.reduce((acc, node) =>
+ acc + node.policyScore + node.valueEstimate, 0) / (path.length * 2);
+ }
+
+ private adaptBidirectionalExploration(
+ forwardVisited: Map<string, BidirectionalPolicyNode>,
+ backwardVisited: Map<string, BidirectionalPolicyNode>
+ ): void {
+ // Adjust exploration rates based on search progress
+ const forwardProgress = Array.from(forwardVisited.values())
+ .reduce((acc, node) => acc + node.policyScore, 0) / forwardVisited.size;
+ const backwardProgress = Array.from(backwardVisited.values())
+ .reduce((acc, node) => acc + node.policyScore, 0) / 
backwardVisited.size; + + // Increase exploration in the direction making less progress + if (forwardProgress > backwardProgress) { + this.bidirectionalStats.backwardExplorationRate *= 1.05; + this.bidirectionalStats.forwardExplorationRate *= 0.95; + } else { + this.bidirectionalStats.forwardExplorationRate *= 1.05; + this.bidirectionalStats.backwardExplorationRate *= 0.95; + } + } + + private updateBidirectionalStats(path: BidirectionalPolicyNode[]): void { + const forwardNodes = path.filter(n => n.direction === 'forward'); + const backwardNodes = path.filter(n => n.direction === 'backward'); + + // Update exploration rates based on path composition + const forwardQuality = forwardNodes.reduce((acc, n) => acc + n.policyScore, 0) / forwardNodes.length; + const backwardQuality = backwardNodes.reduce((acc, n) => acc + n.policyScore, 0) / backwardNodes.length; + + this.bidirectionalStats.pathQuality = (forwardQuality + backwardQuality) / 2; + } + + private calculateBidirectionalPolicyScore(path: ThoughtNode[]): number { + if (path.length === 0) return 0; + + return path.reduce((acc, node) => { + const biNode = node as BidirectionalPolicyNode; + const baseScore = node.score; + const policyBonus = biNode.policyScore || 0; + const valueBonus = biNode.valueEstimate || 0; + const meetingPointBonus = biNode.meetingPoint ? 0.2 : 0; + const directionBonus = biNode.direction === 'forward' ? 
this.bidirectionalStats.forwardExplorationRate * 0.1 :
+ this.bidirectionalStats.backwardExplorationRate * 0.1;
+
+ return acc + (
+ baseScore +
+ policyBonus +
+ valueBonus +
+ meetingPointBonus +
+ directionBonus
+ ) / 5;
+ }, 0) / path.length;
+ }
+
+ public async getMetrics(): Promise<any> {
+ const baseMetrics = await super.getMetrics();
+ const nodes = await this.stateManager.getAllNodes() as BidirectionalPolicyNode[];
+
+ const forwardNodes = nodes.filter(n => n.direction === 'forward');
+ const backwardNodes = nodes.filter(n => n.direction === 'backward');
+ const meetingPoints = nodes.filter(n => n.meetingPoint);
+
+ const bidirectionalMetrics = {
+ forwardSearch: {
+ nodesExplored: forwardNodes.length,
+ averagePolicyScore: forwardNodes.reduce((sum, n) => sum + n.policyScore, 0) / forwardNodes.length,
+ explorationRate: this.bidirectionalStats.forwardExplorationRate
+ },
+ backwardSearch: {
+ nodesExplored: backwardNodes.length,
+ averagePolicyScore: backwardNodes.reduce((sum, n) => sum + n.policyScore, 0) / backwardNodes.length,
+ explorationRate: this.bidirectionalStats.backwardExplorationRate
+ },
+ meetingPoints: {
+ count: this.bidirectionalStats.meetingPoints,
+ averageDepth: meetingPoints.reduce((sum, n) => sum + n.depth, 0) / (meetingPoints.length || 1)
+ },
+ pathQuality: this.bidirectionalStats.pathQuality
+ };
+
+ return {
+ ...baseMetrics,
+ name: 'MCTS-002Alt-Alpha (Bidirectional + Policy Enhanced)',
+ hasStartNode: !!this.startNode,
+ hasGoalNode: !!this.goalNode,
+ bidirectionalMetrics
+ };
+ }
+
+ public async clear(): Promise<void> {
+ await super.clear();
+ this.startNode = null;
+ this.goalNode = null;
+ this.bidirectionalStats = {
+ forwardExplorationRate: Math.sqrt(2),
+ backwardExplorationRate: Math.sqrt(2),
+ meetingPoints: 0,
+ pathQuality: 0
+ };
+ }
+}
diff --git a/src/strategies/factory.ts b/src/strategies/factory.ts
index eaf58da..b1d50a3 100644
--- a/src/strategies/factory.ts
+++ b/src/strategies/factory.ts
@@ -2,10 +2,14 @@ import { 
StateManager } from '../state.js'; import { BaseStrategy } from './base.js'; import { BeamSearchStrategy } from './beam-search.js'; import { MonteCarloTreeSearchStrategy } from './mcts.js'; +import { MCTS002AlphaStrategy } from './experiments/mcts-002-alpha.js'; +import { MCTS002AltAlphaStrategy } from './experiments/mcts-002alt-alpha.js'; export enum ReasoningStrategy { BEAM_SEARCH = 'beam_search', - MCTS = 'mcts' + MCTS = 'mcts', + MCTS_002_ALPHA = 'mcts_002_alpha', + MCTS_002_ALT_ALPHA = 'mcts_002_alt_alpha' } export class StrategyFactory { @@ -20,6 +24,10 @@ export class StrategyFactory { return new BeamSearchStrategy(stateManager, beamWidth); case ReasoningStrategy.MCTS: return new MonteCarloTreeSearchStrategy(stateManager, numSimulations); + case ReasoningStrategy.MCTS_002_ALPHA: + return new MCTS002AlphaStrategy(stateManager, numSimulations); + case ReasoningStrategy.MCTS_002_ALT_ALPHA: + return new MCTS002AltAlphaStrategy(stateManager, numSimulations); default: throw new Error(`Unknown strategy type: ${type}`); } diff --git a/state-manager.js b/state-manager.js deleted file mode 100644 index 5fd79eb..0000000 --- a/state-manager.js +++ /dev/null @@ -1,40 +0,0 @@ -export class StateManager { - constructor() { - this.sessions = new Map(); - } - - async createSession(sessionId, params) { - const state = { - context: params.context, - currentPaths: [], - depth: 0, - completed: false - }; - - this.sessions.set(sessionId, state); - return state; - } - - async getSession(sessionId) { - const state = this.sessions.get(sessionId); - if (!state) { - throw new Error('Session not found'); - } - return state; - } - - async updateSession(sessionId, newPaths) { - const state = await this.getSession(sessionId); - - state.currentPaths = newPaths; - state.depth += 1; - state.completed = state.depth >= 3; // Max depth reached - - this.sessions.set(sessionId, state); - return state; - } - - async deleteSession(sessionId) { - this.sessions.delete(sessionId); - } -} \ No newline 
at end of file diff --git a/tot-engine.js b/tot-engine.js deleted file mode 100644 index d4cc365..0000000 --- a/tot-engine.js +++ /dev/null @@ -1,56 +0,0 @@ -export class ToTEngine { - constructor() { - this.beamWidth = 5; - this.maxDepth = 3; - } - - async evaluateThought(thought) { - // Simple scoring based on thought coherence and relevance - const score = Math.random(); // Replace with actual evaluation logic - return score; - } - - async generateThoughts(context, numThoughts = 3) { - const thoughts = []; - const scores = []; - - for (let i = 0; i < numThoughts; i++) { - // In a real implementation, this would use an LLM to generate thoughts - const thought = `Thought ${i + 1} about ${context}`; - const score = await this.evaluateThought(thought); - - thoughts.push(thought); - scores.push(score); - } - - return { thoughts, scores }; - } - - async expandPath(path) { - const { thoughts, scores } = await this.generateThoughts(path[path.length - 1]); - return thoughts.map((thought, i) => ({ - path: [...path, thought], - score: scores[i] - })); - } - - async search(initialContext) { - let paths = [{ path: [initialContext], score: 1.0 }]; - - for (let depth = 0; depth < this.maxDepth; depth++) { - const newPaths = []; - - for (const currentPath of paths) { - const expansions = await this.expandPath(currentPath.path); - newPaths.push(...expansions); - } - - // Keep top k paths based on scores - paths = newPaths - .sort((a, b) => b.score - a.score) - .slice(0, this.beamWidth); - } - - return paths; - } -} \ No newline at end of file From a2251a2ba02f89aa0d54e11e2009f9e39ec47dbd Mon Sep 17 00:00:00 2001 From: Jason Weiss Date: Tue, 7 Jan 2025 17:46:53 -0800 Subject: [PATCH 7/8] forgot to update the license --- LICENSE | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/LICENSE b/LICENSE index 89b0b3b..0f2ddf3 100644 --- a/LICENSE +++ b/LICENSE @@ -1,6 +1,6 @@ MIT License -Copyright (c) 2024 Jacck, frgmt0 +Copyright (c) 2025 Jacck, frgmt0 Permission is 
hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal

From ee93c70d31a1611d0db9ac4f93592d6e15a98be2 Mon Sep 17 00:00:00 2001
From: Jason Weiss
Date: Tue, 7 Jan 2025 17:49:23 -0800
Subject: [PATCH 8/8] fixed readme for formatting

---
 README.md                       | 12 +++++++++++-
 node_modules/.package-lock.json |  2 +-
 package-lock.json               |  4 ++--
 3 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/README.md b/README.md
index 24e71a8..65d821a 100644
--- a/README.md
+++ b/README.md
@@ -7,15 +7,24 @@ A reasoning implementation for Claude Desktop that lets you use both Beam Search
 #### What's New:
 > Added 2 Experimental Reasoning Algorithms:
+>
 > - `mcts-002-alpha`
+>
 >   - Uses the A* Search Method along with an early *alpha* implementation of a Policy Simulation Layer
+>
 >   - Also includes an early *alpha* implementation of Adaptive Exploration Simulator & Outcome Based Reasoning Simulator
+>
 > *NOTE* the implementation of these alpha simulators is not complete and is subject to change
+>
 > - `mcts-002alt-alpha`
+>
 >   - Uses the Bidirectional Search Method along with an early *alpha* implementation of a Policy Simulation Layer
+>
 >   - Also includes an early *alpha* implementation of Adaptive Exploration Simulator & Outcome Based Reasoning Simulator
+>
 > *NOTE* the implementation of these alpha simulators is not complete and is subject to change
->
+
+
 What happened to `mcts-001-alpha` and `mcts-001alt-alpha`?
 > Quite simply: It was useless and near similar to the base `mcts` method. After initial testing the results yielded in basic thought processes was near similar showing that simply adding policy simulation may not have an effect.
@@ -28,6 +37,7 @@ So why add Policy Simulation Layer now? 
> Added model control over search parameters: > > beamWidth - lets Claude adjust how many paths to track (1-10) +> > numSimulations - fine-tune MCTS simulation count (1-150) ## Features diff --git a/node_modules/.package-lock.json b/node_modules/.package-lock.json index 61aa393..3cc4c68 100644 --- a/node_modules/.package-lock.json +++ b/node_modules/.package-lock.json @@ -1,6 +1,6 @@ { "name": "mcp-reasoner", - "version": "1.1.0", + "version": "2.0.0", "lockfileVersion": 3, "requires": true, "packages": { diff --git a/package-lock.json b/package-lock.json index e967da6..457840b 100644 --- a/package-lock.json +++ b/package-lock.json @@ -1,12 +1,12 @@ { "name": "mcp-reasoner", - "version": "1.1.0", + "version": "2.0.0", "lockfileVersion": 3, "requires": true, "packages": { "": { "name": "mcp-reasoner", - "version": "1.1.0", + "version": "2.0.0", "license": "MIT", "dependencies": { "@modelcontextprotocol/sdk": "*",