Technical Debt Tracking¶
This document tracks known technical debt in the MAID codebase. Technical debt represents areas where we've made pragmatic trade-offs that should be addressed in future refactoring efforts.
Overview¶
Technical debt is not inherently bad - it often represents reasonable decisions made under constraints. This document helps us:
- Track known debt so it doesn't get forgotten
- Prioritize which items to address first
- Plan migration paths for future refactoring
- Onboard new contributors by explaining why certain patterns exist
Active Technical Debt Items¶
TD-001: Module-Level Singletons in API Layer¶
Status: Partially migrated (Phases 1-3 complete, Phase 4 in progress)
Priority: Medium
Affected Files:
- packages/maid-engine/src/maid_engine/api/auth.py
- packages/maid-engine/src/maid_engine/api/admin/websocket.py
- packages/maid-engine/src/maid_engine/api/admin/dashboard.py
- packages/maid-engine/src/maid_engine/api/admin/entities.py
- packages/maid-engine/src/maid_engine/api/v1/events_ws.py
Phase 1 Migration Progress (2026-02-04):
Phase 1 is complete. All affected modules now have app.state-aware getter functions that:
1. Check request.app.state first for the dependency
2. Fall back to the module-level global if not found in app.state
This provides backwards compatibility while enabling new code to use proper dependency injection. The server initialization code (net/web/server.py) now stores auth components in app.state.
New app.state-aware functions added (Phase 1):
- auth.py: _get_key_store_from_app(), _get_rate_limiter_from_app(), _get_auth_rate_limiter_from_app()
- websocket.py: get_ws_manager_from_app(), get_event_broadcaster_from_app()
- dashboard.py: get_event_buffer_from_app(), get_metrics_collector_from_app(), get_websocket_handler_from_app()
- entities.py: get_component_registry_from_app()
- events_ws.py: get_event_broadcaster_from_app()
Phase 2-3 Migration Progress (2026-03-03):
Phase 2 (app.state storage) and Phase 3 (route migration) are now complete.
Singletons wired to app.state in server.py and api/app.py: - admin_ws_manager, admin_event_broadcaster (WebSocket) - dashboard_event_buffer, dashboard_metrics_collector, dashboard_ws_handler (Dashboard) - admin_log_buffer, admin_log_stream_handler (Logging) - config_audit_log (Config) - audit_log_store (Middleware) - v1_event_broadcaster (v1 Events) - component_registry (Entity management) - settings, runtime_config (Configuration)
New _from_app getters created:
- middleware.py: get_audit_log_store_from_app()
- logs.py: get_log_buffer_from_app(), get_log_stream_handler_from_app()
- config.py: get_config_audit_log_from_app(), get_settings_from_app(), get_runtime_config_from_app()
- auth.py: get_key_store_from_app(), get_rate_limiter_from_app(), get_auth_rate_limiter_from_app() (public aliases)
Route functions migrated (30+ routes):
- config.py: 9 routes
- logs.py: 4 routes + 1 class method
- router.py: 4 routes (including WebSocket endpoints)
- dependencies.py: 1 dependency function
- admin.py: 3 routes
- world.py: 2 routes
- middleware.py: 1 middleware dispatch
- dashboard.py: 2 class constructors + internal methods
- events_ws.py: 1 WebSocket endpoint
Phase 4 Migration Progress (test isolation):
Test utility tests/helpers/app_factory.py provides create_test_app() for isolated test apps. create_test_app() now provisions auth components (api_key_store, api_rate_limiter, api_auth_rate_limiter) on app.state, matching production server initialization.
Migrated (13 files — use create_test_app() or app.state-based isolation in fixtures):
- test_admin_entities.py
- test_admin_websocket.py
- test_admin_dashboard.py
- api/test_middleware.py
- test_admin_logs.py
- test_admin_config.py
- api/v1/test_events_ws.py
- api/v1/test_admin.py
- test_api_auth.py
- api/v1/test_stats.py
- api/v1/test_world.py
- api/v1/test_players.py
- api/test_benchmarks.py
Conftest files updated:
- tests/conftest.py (maid-engine): autouse=False on reset_global_state
- tests/conftest.py (maid-classic-rpg): autouse=False on reset_global_state
- tests/conftest.py (maid-stdlib): autouse=False on reset_stdlib_global_state
- tests/conftest.py (maid-tutorial-world): autouse=False on reset_global_state
Remaining work (Phase 4):
- Remove legacy reset_*() / init_auth() module-level code once no tests depend on it (currently ~10 references remain in TestGlobalFunctions classes that deliberately test the singleton API itself)
Description:
Several API modules use module-level singletons (global variables with lazy initialization) to manage stateful objects like WebSocket managers, event broadcasters, metrics collectors, and component registries.
Current Pattern:
# Module-level global variable
_ws_manager: WebSocketManager | None = None
def get_websocket_manager() -> WebSocketManager:
"""Get or create the global WebSocket manager."""
global _ws_manager
if _ws_manager is None:
_ws_manager = WebSocketManager()
return _ws_manager
def reset_websocket_manager() -> None:
"""Reset the global WebSocket manager. Useful for testing."""
global _ws_manager
_ws_manager = None
Why This Exists:
The singleton pattern was adopted for simplicity during initial development. It provides:
- Simple access to shared state from anywhere in the API layer
- Lazy initialization (objects created only when needed)
- Easy integration with FastAPI's dependency injection via Depends(get_websocket_manager)
Problems:
- Testing Difficulties
- Tests must manually reset state between runs using
reset_*()functions - State from one test can leak into another if fixtures aren't configured correctly
-
Makes it harder to reason about test isolation
-
Parallel Test Execution
- Tests that modify these globals cannot safely run in parallel
-
This limits test suite performance on multi-core machines
-
Implicit Dependencies
- The init pattern (e.g.,
init_auth()) requires careful setup/teardown - Dependencies between modules are hidden rather than explicit
-
Makes code harder to understand and maintain
-
Single Instance Limitation
- Cannot easily create multiple instances for different scenarios
- Limits flexibility in testing edge cases
Current Workaround:
We provide reset functions and a unified reset_all_api_state() function for testing:
# In test fixtures
@pytest.fixture(autouse=True)
def reset_api_state():
yield
reset_all_api_state() # Resets all API singletons
Individual reset functions are also available:
- reset_auth_singletons() - Auth module (key store, rate limiter)
- reset_admin_api_state() - All admin API singletons
- reset_dashboard_state() - Dashboard metrics and event buffer
- reset_websocket_state() - WebSocket manager and broadcaster
- reset_component_registry() - Component registry
- reset_events_ws_state() - v1 events WebSocket broadcaster
- reset_audit_log_store() - Middleware audit log store
Recommended Migration Path:
Use FastAPI's built-in dependency injection more fully:
Phase 1: App State Storage
from contextlib import asynccontextmanager
from fastapi import FastAPI
@asynccontextmanager
async def lifespan(app: FastAPI):
# Initialize state during startup
app.state.ws_manager = WebSocketManager()
app.state.key_store = APIKeyStore(...)
app.state.rate_limiter = RateLimiter(...)
app.state.metrics_collector = MetricsCollector()
yield
# Cleanup during shutdown
await app.state.ws_manager.shutdown()
Phase 2: Dependency Functions
from fastapi import Request, Depends
def get_ws_manager(request: Request) -> WebSocketManager:
"""Dependency that retrieves WebSocketManager from app state."""
return request.app.state.ws_manager
def get_key_store(request: Request) -> APIKeyStore:
"""Dependency that retrieves APIKeyStore from app state."""
return request.app.state.key_store
Phase 3: Use in Routes
from typing import Annotated
@router.websocket("/ws")
async def websocket_endpoint(
websocket: WebSocket,
manager: Annotated[WebSocketManager, Depends(get_ws_manager)],
):
await manager.connect(websocket)
Phase 4: Testing with Overrides
def test_websocket_connection():
app = create_test_app()
mock_manager = Mock(spec=WebSocketManager)
# Override dependency for testing
app.dependency_overrides[get_ws_manager] = lambda: mock_manager
client = TestClient(app)
# Test with mock_manager
Benefits of Migration:
- Better Test Isolation - Each test can create its own app with fresh state
- Parallel Tests - No shared global state means safe parallel execution
- Explicit Dependencies - Dependencies are visible in function signatures
- Easy Mocking - Use
app.dependency_overridesinstead of patching globals - Type Safety - Better IDE support and type checking
Migration Considerations:
- Requires updating all route functions to use
Depends() - Need to ensure app state is available in all contexts (background tasks, etc.)
- Some edge cases around startup order may need careful handling
- Should be done incrementally, one module at a time
Estimated Effort: Medium (2-3 days per module)
References: - FastAPI Lifespan Events: https://fastapi.tiangolo.com/advanced/events/ - FastAPI Testing with Overrides: https://fastapi.tiangolo.com/advanced/testing-dependencies/ - Python Dependency Injection Patterns: https://python-dependency-injector.ets-labs.org/
TD-002: Low Severity Issues from Implementation Review¶
Status: Mostly resolved (2026-03-XX) Priority: Low Source: Multi-agent implementation review (2026-02-04)
Resolved (2026-03-XX)¶
- Environment variable naming: Fixed 9 env vars in CLAUDE.md, 37 env vars in README.md (double underscore → single underscore to match actual
env_prefixin settings.py) - Wrong field name: Fixed
MAID_ADMIN__JWT_SECRET_KEY→MAID_ADMIN_SECRET_KEY(wrong field name and delimiter) - Class name mismatch: Fixed
LoaderPipeline→Pipelineclass name reference in CLAUDE.md - API doc coverage gaps: Added OpenAPI summaries to 49 route decorators across 9 API files
- Env prefix
__in bridge settings: Verified as INTENTIONAL — Pydantic v2 nestedmodel_configrequires__forenv_nested_delimiterin sub-models (BridgeSettings→DiscordSettings). Not a bug. - Package location drift: Admin endpoints in both maid-engine and maid-stdlib is BY DESIGN — maid-engine's admin
__init__.pylazy-imports from maid-stdlib.
Remaining (Low Priority / Opportunistic)¶
- CLI command documentation gaps — ~50 undocumented commands in CLAUDE.md
Recommended Approach: Address when working on related code.
TD-003: Pervasive except Exception: Without Specific Handling¶
Status: ✅ Resolved (2026-03-XX)
Priority: Medium
Resolution: All broad except Exception catches have been addressed:
- ~130 narrowed to specific exception types (ConnectionError, TimeoutError, OSError, KeyError, ValueError, TypeError, ImportError, AttributeError, json.JSONDecodeError)
- ~320 annotated with # noqa: BLE001 where broad catches are justified (tick loops, event dispatch, plugin isolation, network resilience, arbitrary user code execution)
- 0 un-suppressed except Exception remaining across all source files
- All justified broad catches log at ERROR level with exc_info=True
- No silent except Exception: pass patterns remain
TD-004: type: ignore Suppressions Indicating Type System Gaps¶
Status: Substantially resolved (78 remaining in source, 0 bare; 2026-03-XX)
Priority: Low
Affected Files: 78 occurrences in source (down from 126), 0 bare # type: ignore
Progress:
- 48 suppressions eliminated via
cast(), type narrowing, assert guards, widened signatures - 0 bare
# type: ignoreremaining (all have specific error codes) - Event handler subscriptions fixed with
cast(EventHandler, handler)in social systems - MockEngine/MockWorld test fixtures properly typed with
cast() json.loadsreturn values cast todict[str, Any]- SSL config paths guarded with
assertbefore use - Relationship dimension dicts widened to accept
str | RelationshipDimensionkeys
Remaining 78 are legitimate:
- Optional imports (discord, aiohttp, psutil, redis, watchfiles)
- Pydantic @computed_field / model_config typing limitations
- Command decorator generic typing (ParamSpec edge cases)
- Loader phase execute() signature overrides
- asyncpg missing stubs
TD-005: Global Singleton Proliferation Beyond API Layer¶
Status: Partially resolved (2026-03-XX)
Priority: Medium
Affected Files: ~45 global keyword usages across source files (31 in API layer tracked by TD-001, ~14 non-API)
Progress:
_allowed_originsremoved —get_allowed_origins()reads fromget_settings()directly (already cached via@lru_cache)_shutdown_taskremoved — stored inrequest.app.state.shutdown_task_config_audit_logremoved — returned frominitialize_config_audit_log(), stored inapp.state_event_hook_subscription_idsremoved — stored asbroadcaster._event_hook_subscription_ids_loaderglobals (abilities, spells, skills) — refactored to accept optionalworldparameter, cached viaworld.set_data()when provided
Remaining globals (justified/architectural):
- Performance caches (keep as-is): _tokenizer, _PRICING_CACHE, _parser, _FALLBACK_PROFANITY_WORDS, _cached_default_words, _sampler
- API-layer (tracked by TD-001): _ws_manager, _event_broadcaster, _event_buffer, _metrics_collector, _websocket_handler, _log_buffer, _log_stream_handler, _key_store, _rate_limiter, _audit_log_store, _component_registry
- RPG state (future migration): _registry (commands), _olc_manager (builder tools)
- Config/logging: _runtime_config, _audit_logger
TD-006: Excessive Any Type Annotations in Core Modules¶
Status: Substantially resolved (~180 non-trivial remaining, down from ~320; 2026-03-XX)
Priority: Low
Affected Files: ~180 non-trivial : Any annotations remaining (excluding dict[str, Any], **kwargs: Any, list[Any])
Progress:
- NPC autonomy models:
world: Any→World,entity: Any→Entityacross 12 files - weapons.py: 12
weapon_type: Any→WeaponType | str | None - Building commands:
player_id: Any→UUID,ctx: Any→CommandContext,obj: Any→Entity | Componentacross 10 files - Loader pipeline:
world: Any→World,reference_registry: Any→ReferenceRegistryacross 6 files - Engine commands:
player_entity: Any→Entity,target: Any→Entity | Nonein arguments.py, locks.py, registry.py - Classic RPG:
item_manager: Any→ItemManager,character: Any→Character,store: Any→DocumentStorein 4 files - Stdlib:
_collection() -> Any→DocumentCollection[Any]in memory/knowledge/relationships managers - Network:
self._asgi_app: Any→ASGIApp,request_id: Any→str | int | Nonein web/server.py - Core:
inventory_items: list[tuple[UUID, list[Any], set[str]]]→list[tuple[UUID, list[Component], set[str]]]in multiworld.py
Remaining ~180 are necessary/low-value:
- dict[str, Any] patterns for JSON-like data (config values, API payloads, metadata)
- **kwargs: Any in forwarding functions
- Plugin interfaces with genuinely polymorphic parameters
- ECS generic component values
- Optional dependency types (discord, aiohttp stub limitations)
TD-007: Quest Generation System Type Safety Bypass¶
Status: ✅ Resolved (2026-03-XX)
Priority: Medium
Resolution: All type safety bypasses have been eliminated:
- Public APIs added: pending_seed_count, in_flight_count, status(), entries(), entries_for_character(), recent_entries()
- quest_generation_history stored via world.set_data() instead of monkey-patching
- SystemManager.get_system() enables type-safe system lookup
- 0 remaining # type: ignore or # noqa: SLF001 in quest generation code
TD-008: Deprecated Code Awaiting Removal¶
Status: ✅ Resolved (2026-03-XX) — all deprecated paths now emit DeprecationWarning
Priority: Low
Resolution:
- auth.py: 8 deprecated methods (generate_key, validate_key, revoke_key, check, record, reset, cleanup_stale_keys, get_stats) all emit DeprecationWarning with "will be removed in v2.0"
- rate_limiter.py: 2 deprecated methods (check_rate_limit, record_request) emit DeprecationWarning
- create.py: 11 functions with global fallback paths now emit DeprecationWarning via warnings.warn() (register_entity_factory, get_entity_factory, reset_entity_factories, _ensure_default_factories, _get_factory_keys, register_template, get_template, unregister_template, list_templates, clear_templates, get_template_count)
- Legacy position.py row-based damage/defense modifiers removed
- Remaining work: Remove deprecated code in v2.0 as documented
Resolved Technical Debt¶
Items that have been addressed are moved here for historical reference.
TD-013: Unused Variable in Content Pack Loading (Resolved 2026-03-04)¶
Event catalog now stored on GameEngine._event_catalog, exposed via event_catalog property. Content pack events are collected per-pack during loading. Variable is no longer unused and # noqa: F841 suppression removed.
TD-012: PvP Rankings Potentially Undefined Variables (Resolved 2026-03-04)¶
Variables (old_winner_rating, old_winner_rank, old_loser_rating, old_loser_rank) now defined unconditionally before the guarded block. All 6 # noqa: F821 suppressions removed.
TD-011: Private Module-Level Noqa Suppressions for Encapsulation Violations (Resolved 2026-03-04)¶
Public APIs added to replace private member access:
- BaseManager.cached_items property (replaces _cache access in ecs_item_bridge.py)
- NeedDecaySystem.set_memory_provider() (replaces system._memory_provider mutation in pack.py)
- SocialFabricSystem.set_relationship_provider() (replaces system._relationship_provider mutation)
- QuestGenerationSystem.register_seed_type() (replaces system._seed_registry mutation)
- Pipeline.default_context property (replaces pipeline._default_context access in cli/app.py)
TD-010: Migration Template Placeholder Values (Resolved 2026-03-04)¶
CLI arguments --collection, --component, --field now substitute placeholder values at template creation time. A pre-migrate validation guard rejects migrations with "TODO" in metadata fields, preventing invalid migrations from being applied.
TD-009: CI Benchmark Enforcement Missing (Resolved 2026-03-04)¶
.github/workflows/benchmark.yml created to enforce performance benchmarks in CI. Uses custom timing assertions (not pytest-benchmark) to validate performance targets against documented thresholds.
TD-002-8: PO Parser Escape Sequences (Resolved 2026-02-04)¶
Comprehensive escape sequence handling was already implemented. Added tests verifying handling of \n, \t, \r, \\, \", \xNN, \uXXXX, \UXXXXXXXX, octal escapes, and C-style escapes (\a, \b, \f, \v).
TD-002-9: Locale Handling Edge Cases (Resolved 2026-02-04)¶
Added comprehensive tests for locale normalization edge cases including empty strings, whitespace, script codes (zh-Hans, zh-Hant), unknown locales, mixed case handling, and multiple separators.
TD-002-10: conversation_count Property Lock (Resolved 2026-02-04)¶
The property is already thread-safe: in asyncio, only one coroutine runs at a time (cooperative multitasking), and Python's len() on dict is atomic in CPython. Added documentation explaining this and tests verifying the behavior.
TD-002-11: Token Estimation Edge Cases (Resolved 2026-02-04)¶
Added comprehensive edge case tests for token estimation: Unicode text (Russian, Chinese, Arabic), emoji handling, mixed scripts, punctuation-only text, whitespace-only, single characters, newlines/tabs, very long text, and messages missing content keys.
TD-002-12: Callable Conditions in Persistence (Resolved 2026-02-04)¶
Not an issue: all conditions use serializable string enums (SpawnCondition, ResetCondition, etc.) rather than callables. Pydantic's model_dump_json() would fail loudly if non-serializable data were present. The architecture correctly separates serializable data models from runtime behavior.
Contributing¶
When adding new technical debt documentation:
- Assign an ID - Use the format
TD-XXXwith incrementing numbers - Document why - Explain why the pattern exists, not just that it's bad
- Provide workarounds - Help current developers work with the code as-is
- Plan migration - Include a concrete migration path with code examples
- Estimate effort - Give a rough estimate to help with prioritization
When resolving technical debt:
- Update this document - Move the item to "Resolved" section
- Remove TODO comments - Clean up the source code markers
- Update tests - Remove any workarounds that are no longer needed
- Document in PR - Reference the tech debt item in your PR description
Last updated: 2026-03-04
See also: Exception Handling Policy | Style Guide