Werewolf v0.2 #416

hannw · 2025-09-26T18:06:16Z

Add the game engine and visualizer to Kaggle environments.

add werewolf game, prototype runs e2e werewolf env stabilized. 1. fixed renderer to return input and output of agent at each step 2. add a better random agent 3. add get_human_readable to WerewolfObservationModel add unit tests for the WerewolfEnv add werewolf basic html rendering add default max days add dummy llm agent initialized new werewolf game engine with more principled and modular design

add game init logic werewolf integration tests working e2e, resolved minor bugs refactored add history entry Enable debug agents for debug mode fixed action parsing logic and random agent Night stage stable refactor __run_interpreter to respect debug mode refactored werewolf players to receive observations in message queue. game interpreter working. werewolf game engine working end to end werewolf rendering working e2e 1. Resolved subtle werewolf voting bug, where SimultaneousMajority Voting need to log votes even if the action is invalid. Also, tie breaking extend to no valid votes, randomly select one. 2. Resolved HistoryEntry serialization issue of the subfield DataEntry (having multiple subclasses). Now, we serialize all subclasses manually to dict. 3. Resolve random werewolf agent issues in parsing observations (HistoryEntry) using isinstance, since now it only have access to json objects. 4. refactored werewolf.js. Now it works with new game engine.

add more number of valid agents to the werewolf game config fix the debug mode of agent.py to allow manual debugging Add two panel frontend for werewolf Also, add doctor seer action data entry

added day and night voting to werewolf.js Add doctor save information to werewolf.js Use action from steps to render input actions. add logos to test_werewolf.py add player_thumbnails to config schema delete obsolete observation prep temp commit test_engine.py day of actions in event log rendered correctly now separate day and night event logs into two shades in werewolf.js Event log rendering order completed correctly in werewolf.js add moderator announcement to event log in werewolf.js UI related fixes 1. add moderator observations to env.info 2. refactor action to record game phase timestamps 3. change some entry to moderator announcement type so UI will render those

add llm harness for werewolf fix the bug that reasoning and voting details were revealed to all players. 1. refactor DataEntry to have public_view method, so reasoning trace is hidden from public. 2. fix the bug of voting action add_history_entry calls in SimultaneousMajority. moderator announce roles and their abilities support better end game logging fix supported llm model names refactor player schema in config 1. modify werewolf.json 2. fix Enum value representation for model_dump 3. refactor player representation add try except in llm werewolf harness also added vertex ai credential for litellm add display name to agent schema add more supported models to llm harness refactor llm harness to have better instruction template to delineate sections pass agent_config to llm harness fixed infinite voting loop problem and elected not eliminated problems 1. infinite voting bug: if certain players Timeout by kaggle, it will result in infinite voting loop due to potential votes never cast. 2. elected not eliminated problems: set default logic need to happen after player vote cast, otherwise the check for duplicate vote logic will prevent voter new vote to be cast. 3. tally vote has a subtle bug of not eliminating abstain, "-1", votes while counting. resolve infinite voting loop bug 1. use collect_votes method to set default for all expected voters. 2. fix winning condition bug for villagers. add retry logic minor fixes add event truncation to LLMWerewolfAgent add pydantic to requirements fix prompt template json fences and wording record timestamp during history entry creation change default voting scheme to sequential voting This is for better dynamics and visualization for viewers during game replay
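The voting fixes listed above (defaulting votes for voters who never cast one, excluding "-1" abstentions from the count, and picking randomly when no valid vote exists) can be illustrated with a minimal sketch. The method name `collect_votes` and the "-1" abstain value come from the commit messages; the signatures, the `tally_votes` helper, and the player names are assumptions for illustration, not the engine's actual API.

```python
import random
from collections import Counter

ABSTAIN = "-1"  # the commit message names "-1" as the abstain vote


def collect_votes(cast, expected_voters):
    """Return a complete ballot, defaulting missing voters to ABSTAIN.

    Defaults are filled in after the real votes are cast, so a player who
    timed out can never leave the vote waiting forever.
    """
    return {voter: cast.get(voter, ABSTAIN) for voter in expected_voters}


def tally_votes(ballot, candidates, rng=None):
    """Pick the player to exile; abstentions and invalid targets are ignored."""
    rng = rng or random.Random()
    counts = Counter(
        target for target in ballot.values()
        if target != ABSTAIN and target in candidates
    )
    if not counts:
        # No valid votes at all: fall back to a random candidate.
        return rng.choice(sorted(candidates))
    top = max(counts.values())
    tied = sorted(t for t, n in counts.items() if n == top)
    return tied[0] if len(tied) == 1 else rng.choice(tied)


players = ["alice", "bob", "carol", "dave"]
ballot = collect_votes({"alice": "bob", "carol": "bob"}, players)
exiled = tally_votes(ballot, set(players), random.Random(0))
```

Here `bob` and `dave` never voted, so they are recorded as abstaining, and the two valid votes for `bob` decide the exile.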

simplify visuals in werewolf.js add player capsules in werewolf rendering fix night action reasoning and actor id rendering improve line spacing of moderator and game over message

fix issues in SequentialVoting protocol refactor action queue to use action specific queue adjust phase separator style add perceived threat level to backend and frontend improve json schema export for pydantic basemodel

refactor __run_interpreter to remove duplicated code add comments to address debug mode branching Refactoring interpreter() for readability and adding some high level documentation minor new line improvement

add checks to ensure all player ids are unique refactor phase transition to be more robust make sure allow doctor self save is configurable

Add actor id capsule to event log updates fix sequential voting bug add voter capsule to day votes add timeout display in werewolf.js

adding additional llm models with quota for testing adding snippet to track token usage fixing bug where single element instead of list is being returned Adding additional packages that seem to be required from base image fixing issue with undefined 2d array for moderator and player roles not getting set making some options required Adding README.md to werewolf with bare bones example

adding branch to werewolf_harness adding branch to werewolf_harness adding a flat() call as moderator results can be 2d, kind of a hack for now adding a flat() call as moderator results can be 2d, kind of a hack for now, another missed call add werewolf runner scripts for simple testing and experimentation remove print statements and use logger instead revert the voting index change sequential voting should only have one actor a time

add TTS to werewolf game replay Add continual audio to front end Add voices and association with model fix werewolf.js black screen issue reporting "Waiting for game data..." making the audio speed bar persistent across playbacks minor fixes for dump_audio.py add announcements to audio event add action announcements to audio event

add 3d background add 3d background rendering and portable replay folder refactor dump_audio.py to be more modularized added wolf and windmill added wolf and windmill loaded 8 stickmans and idle animation adjusted stickman directions and numbers add nameplate on top of the stickman add local image into asset for serving thumbnail nameplate text aligned center fix z placement of nameplate add moon to the scene add file dump

render audio script refactor dump_audio.py to enable debug path to play simple audio refactor werewolf.js to have hover effect and select event to playback change text color during audio playback event refactor base.py instructions fix finding audio key issues add gpt-oss model add new audio generation instructions and examples Change the tts instruction format to gemini-2.5-pro's suggestions Improve on the audio prompts

add 3d werewolf script add assets resolve moderator obs bug use logger for self_play.py resolve sequential voting protocol bug refactor dump_audio.py add instructions for audio generation in README.md add instructions for audio generation in README.md add flask and gym back remove redundant files clean up tests add agent action error code handling, default to game end if there is agent error fix out and err context fix input args of interpreter loop improve exemplar formatting Change the debug mode for tests to False for testing prod paths These tests are not really debugging, they are testing side effects of prod path which catches error and update agent status. add game rules # Conflicts: # requirements.txt

add violent language filter to discussion protocols enforce violent language filter in action initializer level add optional reveal night elimination or day exile role Fix bid driven discussion state machine and add turn by turn bidding discussion refactor action queue to a single class and refine bidding driven discussion add bid data entry and bid result data entry use deterministic tie breaking mechanisms for UrgencyBidProtocol add bid_result_public to properties add test_turn_by_turn_bidding_discussion

3d view changes Improved 3d layout Player list css improvements Fix for emojis 3d scene effects Player logo fading Adjusted panel margins fix for flipped players Day/night transitions Nameplate adjustments Status panel fix animation so backward replay restore past state fix event log so backward replay restore past log state add moderator announcement back to event log fix play bar event loop step alignment disable key control of scene add block experiment code for role balanced sampling add spotlight logic add more supported models

report cost of inference for LLMs enable configuring LLMWerewolfAgent and add text mode prompts add script to measure cost add bid action to llm harness enable harness bid reasoning config options and add cost tracker refactor measure_cost.py to use new cost trackers fix day addition bug in engine.py fix token trajectory plot add agent reset mechanism to solve global agent state carry over across episode issue and cost accumulation bug Load usage directly from litellm for cost analysis fix trajectory loading in measure_cost.py

add run.py script and refactor run_block.py to use it rearrange config files add toggle for reasoning traces

Fix pulse and spotlight animation misalignment with actor add targeting arcs Improved vote target visualizer Change color of arcs for different actions Fixed day/night timing and werewolf turns red during night time fix day voting arcs add phase divider logic to cleanly segment events separate allEvents and visibleEvents to remove undisplayed events from the replay bar fix night werewolf red light glitches Fix night eliminated werewolf resurrection issues fix phase indicator on left panel remove redundant code, change event log phase indicator text Resolve event log duplication bug Resolve game end phase indicator to no day count change thumbnail size on left panel, clean up console logs Add display_name rendering and adjust thumbnail background to be white Add optional display name to 3d player nameplate

refactor run_block.py to use run.py as subprocess and parallelize the operations add option to append timestamp to output dir add option to shuffle player ids revise run_block.py instructions add LogExecutionTime context manager to keep track of task time add instructions to configure agents fix generate unique role permutations Add register agent method Make register agent the responsibility of the user. This way we can register llm harnesses on the fly. add litellm_models.yaml config to register litellm model for cost provider etc remove capital P from player logs Fix reveal day and night elimination bug Fix reveal text for no reveal situation in visualizer Update player capsule to show display name (placeholder for model name) Handle capsule parsing better 1. introduce a caching function for player name parser 2. better formatting for list parsing in system messages add doctor and seer night glow add options to choose random first actor for roundrobin discussion and voting fix sequential voting protocol voter bug and history visibility bug Minor format improvement to llm harness Minor format improvement Fix capsule regex for parsing "... <player_name>." correct threat level text use orb as self assessed threat level detector 1. repurpose orb as threat level detector 2. decouple is_active animation with role add sound wave animation to speaker change random message. fix role_msg add censor words, refactor censor pattern compile to global add llm query logs, fix no cost logging add flag to shuffle roles refactor llm harness for better error handling and logging 1. refactored action parsing to be dispatched pattern. 2. introduce retry mechanism and callbacks for dealing with different errors (rate limit, context window, parsing error) 3. record completion and prompt in resulting actions.
fix tally exposing player role bug add global toggle for turning reasoning on/off add reveal rule messages in moderator announcement refactor action schema for player improve role specification formatting fix schema_for_player return None bug add action to ActionDataMixin block action from public view of data entry add player first person view on click of player card at left panel add reset button for 3d camera position fix query_parse bug, remove observation from action obj

remove observations from actions.py which is causing recursive memory growth optimize imports [critical] fix public view bug resulted from Mixin method resolution order (MRO) bug 1. The ActionDataMixin was wrongfully placed at the end of parent class hierarchy, it should be the first with the highest MRO priority. 2. from_history_entry has wrong signature and never used dict and DataEntry as input. It has been using HistoryEntry as input. add missing error prompt Refactor history entry access patterns 1. add const for observation key 2. agents (random and llm harness) use a more principled way to get action request history entry 3. refactor history entry type to be more detailed 4. update records.py to have access control for action data. 5. decouple history entry and player history entry view 6. remove unused history entries in observations in werewolf.py 7. control history entry access completely from state 8. clean up protocols.py and engine.py add history entry Add visible history entry type control fix raw observation key bug make actor data observable to the actor by specifying source cache schema_for_player for better latency refactor raw_observation consumer to use getter and setter refactor llm harness to record self action and reasoning Introduce StrEnum to be better json serializable and clearer types
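The MRO bug described above is easy to reproduce in isolation. The following is a minimal sketch, not the PR's code: `DataEntry`, `ActionDataMixin`, and `public_view` are names taken from the commit messages, while the class bodies, the `PRIVATE_FIELDS` tuple, and the two `Vote` classes are assumptions made up for the demonstration.

```python
class DataEntry:
    """Base record; by default every field is public."""

    def __init__(self, **fields):
        self.fields = fields

    def public_view(self):
        return dict(self.fields)


class ActionDataMixin:
    """Hides private fields such as the reasoning trace from the public view."""

    PRIVATE_FIELDS = ("reasoning",)

    def public_view(self):
        view = super().public_view()
        return {k: v for k, v in view.items() if k not in self.PRIVATE_FIELDS}


class LeakyVote(DataEntry, ActionDataMixin):
    """Mixin listed last: DataEntry.public_view is found first, so nothing is hidden."""


class SafeVote(ActionDataMixin, DataEntry):
    """Mixin listed first: its public_view runs and filters the base view."""


leaky = LeakyVote(target="bob", reasoning="bob is suspicious").public_view()
safe = SafeVote(target="bob", reasoning="bob is suspicious").public_view()
```

With the mixin last, `leaky` still contains the `reasoning` field; with the mixin first, `safe` only contains `target`. Inspecting `SafeVote.__mro__` is a quick way to confirm which `public_view` wins.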

add configs for llm experiment add packages to requirements.txt add pairwise zero sum game tournament add task shuffling to reduce LLM api load reuse run_single_game_cli code add display name to the right panel player name tag revise self_play.py to run n games from a given config add "random" and "no exile" tie exile options in SimultaneousMajority voting Add RoundByRoundBiddingDiscussion refactor _handle_night_await_actions() into smaller methods for readability. Remove check for GOOGLE_APPLICATION_CREDENTIALS as sdk looks for them rather than application Updating docs to remove internal OCTO project Refactor to use event driven architecture for Role extensibility 1. Introduce EventBus in GameState to control fan-in and fan-out of game events. 2. refactor role specific event handlers to roles. Use decorator to register event handler. 3. action confirmation centrally handled. 4. Introduce PlayerID. 5. general improvement on symbol annotations.
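As a rough illustration of the event-driven refactor described above, the sketch below shows one way an event bus with decorator-registered role handlers could be wired together. Only the name `EventBus` comes from the commit message; `EventName`, `on_event`, `Role.register`, and the `Seer` handler are hypothetical names chosen for this example and may differ from the real engine.

```python
from collections import defaultdict
from enum import Enum


class EventName(str, Enum):
    NIGHT_START = "night_start"
    PLAYER_ELIMINATED = "player_eliminated"


class EventBus:
    """Fans each published event out to every subscribed handler."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, event_name, handler):
        self._handlers[event_name].append(handler)

    def publish(self, event_name, payload):
        for handler in list(self._handlers[event_name]):
            handler(payload)


def on_event(event_name):
    """Decorator that tags a method with the event it handles."""
    def decorator(func):
        func._event_name = event_name
        return func
    return decorator


class Role:
    def register(self, bus):
        # Subscribe every method that was tagged by @on_event.
        for attr in dir(self):
            method = getattr(self, attr)
            name = getattr(method, "_event_name", None)
            if name is not None:
                bus.subscribe(name, method)


class Seer(Role):
    def __init__(self):
        self.log = []

    @on_event(EventName.NIGHT_START)
    def request_inspection(self, payload):
        self.log.append(f"inspect requested on night {payload['day']}")


bus = EventBus()
seer = Seer()
seer.register(bus)
bus.publish(EventName.NIGHT_START, {"day": 1})
bus.publish(EventName.PLAYER_ELIMINATED, {"player": "bob"})  # no subscriber, ignored
```

Keeping the handlers on the role classes means a new role only needs to declare the events it cares about; the engine does not need a role-specific branch.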

Refactor protocols to be modules 1. provide factory methods and registry for proper configurable protocols. 2. refactor the protocols to be multiple modules. Improve on player id annotations standardize variable names from history entry to event fix harness phase error add cost to litellm models remove general confirmation of action event to player The action event will clutter llm prompt Fix StrEnum repr Improve on prompting 1. Providing human friendly name of protocols. 2. Improve on LLM prompts and wording of rule sets. Improve the state machine transition to be explicit in engine.py Add phase category as attribute of DetailedPhase Resolve detailed_phase naming issue add utility to log git hash use wait random exponential to avoid "thundering herd" VM crash Use threadpool instead refactor voting to use Ballot Refactor Role initialization to use role_params dict 1. remove allow_doctor_self_save flag 2. set params directly in Role subclasses. 3. update agent config to use role_params. 4. fix run scripts role shuffling logic. 5. fix minor public announcement in night elimination manager. 6. change no elimination night announcement. Refactor day/night elimination reveal to use RevealLevel 1. moderator and night elimination manager config changes 2. general config schema and existing config updates. 3. Fix werewolf observation model reveal level. 4. Fix LLM harness observation model reveal level. Fix seer team reveal level result blob Add option to disallow doctor consecutive saving of same player This option is mainly to prevent infinite Seer save and provide more strategic game play
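The "wait random exponential" item above refers to jittered exponential backoff: each retry waits a random time below an exponentially growing ceiling, so many parallel workers do not all retry at the same instant. The sketch below is a standard-library-only illustration of that idea. The function names, parameters, and defaults are assumptions, and it does not show which helper the PR actually uses.

```python
import random
import time


def wait_random_exponential(attempt, multiplier=1.0, max_wait=60.0, rng=random):
    """Return a wait drawn uniformly from [0, min(max_wait, multiplier * 2**attempt)]."""
    ceiling = min(max_wait, multiplier * (2 ** attempt))
    return rng.uniform(0, ceiling)


def call_with_retry(func, max_attempts=5, sleep=time.sleep, rng=random,
                    retry_on=(Exception,)):
    """Call func, retrying on failure with jittered exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(wait_random_exponential(attempt, rng=rng))


# Usage with a stubbed sleep so the example finishes instantly.
calls = {"n": 0}


def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return "ok"


waits = []
result = call_with_retry(flaky, sleep=waits.append, rng=random.Random(0))
```

The random component is what prevents the "thundering herd": with a fixed backoff, every worker that failed together would also retry together.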

Audios are added to existing replay to decouple voice over from game generation

SohierDane · 2025-09-30T02:46:01Z

@hannw I'm trying to look for ways to break this into additional smaller PRs. Does this sound like a reasonable list of smaller PRs we could review in isolation? Chaining them should be fine if necessary.

The game logic & visualizers.
The harness.
The other experiment configs & runner scripts.
The LLM logos.

develra · 2025-09-30T17:02:24Z

kaggle_environments/static/player.html

                parent: ref.current,
                preact,
                styled,
+                __mainContext: context,


All the changes to this file read really strangely to me - generally it's a bad idea to link context and state like this. Can you tell me a bit more about what you are trying to accomplish with these changes?

We are trying here to overwrite the behavior of the functions of the control interface. Do you have suggestions on how to better approach this?

This is needed because Kaggle environments make assumptions about the game loop that break for some game engines like werewolf (specifically, one Kaggle step may map to many werewolf steps). Therefore, we can only overwrite the play function and the steps in order to make a "step" more intuitive for werewolf (say, one moderator announcement step that results in no player action).

This is one of the main learnings and suggestions for future third-party game integrations: some assumptions need to be relaxed or redesigned to fit a wider set of game engines.

> We are trying here to overwrite the behavior of the functions of the control interface. Do you have suggestions on how to better approach this?

Ya fair enough, I think my preference here would be to have a more well-defined interface of 'overridable controls' that we allow passing into the game renderer, and the game can choose to override those controls if they want. I'll try to think a bit about how to update the interface to support this.

> This is one of the main learnings and suggestions for future third-party game integrations: some assumptions need to be relaxed or redesigned to fit a wider set of game engines.

Just out of curiosity, why does werewolf need its own set of 'steps' instead of relying on the 'steps' defined in the replay interface?

@hannw - maybe for the time being we can do overrides similar to what setStep is doing here? https://github.com/Kaggle/kaggle-environments/blob/master/kaggle_environments/static/player.html#L536 - then can just pass in that with the context and override it as needed. Not my favorite pattern but I think it should work.

There will be a state reset issue with direct overrides (I've tried). Once we override setStep, the original copy is lost from the top-level context, which is why the setters and the global cache pattern are necessary here for the monkey patching to work correctly. We need a global cache, window.wwOriginals, to store the original setStep.
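The caching pattern described here is not specific to JavaScript. As a language-neutral illustration, here is a minimal Python analogue. Everything in it is hypothetical: `Controls` stands in for the replay control interface, `ORIGINALS` stands in for window.wwOriginals, and the step mapping is arbitrary. It is not the code in player.html or werewolf.js.

```python
ORIGINALS = {}  # stands in for the window.wwOriginals cache


class Controls:
    """Hypothetical stand-in for the replay control interface."""

    def __init__(self):
        self.step = 0

    def set_step(self, step):
        self.step = step


def install_override(controls, steps_per_event=3):
    """Patch set_step so callers pass an event index instead of a raw step."""
    # Cache the original only once, so a repeated install wraps the real
    # function and never wraps an earlier wrapper.
    if "set_step" not in ORIGINALS:
        ORIGINALS["set_step"] = controls.set_step
    original = ORIGINALS["set_step"]

    def set_step(event_index):
        original(event_index * steps_per_event)  # arbitrary mapping for the demo

    controls.set_step = set_step


def restore(controls):
    """Put the cached original back and clear the cache."""
    if "set_step" in ORIGINALS:
        controls.set_step = ORIGINALS.pop("set_step")


controls = Controls()
install_override(controls)
install_override(controls)  # a re-render installs again; still only one layer
controls.set_step(2)
step_after_override = controls.step
restore(controls)
controls.set_step(4)
step_after_restore = controls.step
```

Without the cache, the second install would capture the first wrapper as its "original", and the mapping would be applied twice.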

develra

A few requests for cleaning up the HTML and JavaScript. I left some comments about this, but given the length of the file, the same cleanup should also be applied in a number of other places.

Figure out a debug strategy for the console.logs and console.warnings so it isn't always spamming the console output unless you specifically ask for it.
Clean up the many extraneous comments and the commented-out dead code.
I'm pretty nervous about needing to maintain this over time given the complexity involved. Did you use a PLAN.MD or something for development? Might be nice to include a context doc here in case we need to dump it in for iterations in the future.
I'm not crazy about the global changes to player.html and other top-level things. Can we discuss what you need these for and figure out a more sustainable way to make the changes? Happy to assist.

Thanks!

develra · 2025-09-30T19:55:45Z