Technical Debt Tracking¶

This document tracks known technical debt in the MAID codebase. Technical debt represents areas where we've made pragmatic trade-offs that should be addressed in future refactoring efforts.

Overview¶

Technical debt is not inherently bad - it often represents reasonable decisions made under constraints. This document helps us:

Track known debt so it doesn't get forgotten
Prioritize which items to address first
Plan migration paths for future refactoring
Onboard new contributors by explaining why certain patterns exist

Active Technical Debt Items¶

TD-001: Module-Level Singletons in API Layer¶

Status: Partially migrated (Phases 1-3 complete, Phase 4 in progress) Priority: Medium Affected Files: - packages/maid-engine/src/maid_engine/api/auth.py - packages/maid-engine/src/maid_engine/api/admin/websocket.py - packages/maid-engine/src/maid_engine/api/admin/dashboard.py - packages/maid-engine/src/maid_engine/api/admin/entities.py - packages/maid-engine/src/maid_engine/api/v1/events_ws.py

Phase 1 Migration Progress (2026-02-04):

Phase 1 is complete. All affected modules now have app.state-aware getter functions that: 1. Check request.app.state first for the dependency 2. Fall back to the module-level global if not found in app.state

This provides backwards compatibility while enabling new code to use proper dependency injection. The server initialization code (net/web/server.py) now stores auth components in app.state.

New app.state-aware functions added (Phase 1): - auth.py: _get_key_store_from_app(), _get_rate_limiter_from_app(), _get_auth_rate_limiter_from_app() - websocket.py: get_ws_manager_from_app(), get_event_broadcaster_from_app() - dashboard.py: get_event_buffer_from_app(), get_metrics_collector_from_app(), get_websocket_handler_from_app() - entities.py: get_component_registry_from_app() - events_ws.py: get_event_broadcaster_from_app()

Phase 2-3 Migration Progress (2026-03-03):

Phase 2 (app.state storage) and Phase 3 (route migration) are now complete.

Singletons wired to app.state in server.py and api/app.py: - admin_ws_manager, admin_event_broadcaster (WebSocket) - dashboard_event_buffer, dashboard_metrics_collector, dashboard_ws_handler (Dashboard) - admin_log_buffer, admin_log_stream_handler (Logging) - config_audit_log (Config) - audit_log_store (Middleware) - v1_event_broadcaster (v1 Events) - component_registry (Entity management) - settings, runtime_config (Configuration)

New _from_app getters created: - middleware.py: get_audit_log_store_from_app() - logs.py: get_log_buffer_from_app(), get_log_stream_handler_from_app() - config.py: get_config_audit_log_from_app(), get_settings_from_app(), get_runtime_config_from_app() - auth.py: get_key_store_from_app(), get_rate_limiter_from_app(), get_auth_rate_limiter_from_app() (public aliases)

Route functions migrated (30+ routes): - config.py: 9 routes - logs.py: 4 routes + 1 class method - router.py: 4 routes (including WebSocket endpoints) - dependencies.py: 1 dependency function - admin.py: 3 routes - world.py: 2 routes - middleware.py: 1 middleware dispatch - dashboard.py: 2 class constructors + internal methods - events_ws.py: 1 WebSocket endpoint

Phase 4 Migration Progress (test isolation):

Test utility tests/helpers/app_factory.py provides create_test_app() for isolated test apps. create_test_app() now provisions auth components (api_key_store, api_rate_limiter, api_auth_rate_limiter) on app.state, matching production server initialization.

Migrated (13 files — use create_test_app() or app.state-based isolation in fixtures): - test_admin_entities.py - test_admin_websocket.py - test_admin_dashboard.py - api/test_middleware.py - test_admin_logs.py - test_admin_config.py - api/v1/test_events_ws.py - api/v1/test_admin.py - test_api_auth.py - api/v1/test_stats.py - api/v1/test_world.py - api/v1/test_players.py - api/test_benchmarks.py

Conftest files updated: - tests/conftest.py (maid-engine): autouse=False on reset_global_state - tests/conftest.py (maid-classic-rpg): autouse=False on reset_global_state - tests/conftest.py (maid-stdlib): autouse=False on reset_stdlib_global_state - tests/conftest.py (maid-tutorial-world): autouse=False on reset_global_state

Remaining work (Phase 4): - Remove legacy reset_*() / init_auth() module-level code once no tests depend on it (currently ~10 references remain in TestGlobalFunctions classes that deliberately test the singleton API itself)

Description:

Several API modules use module-level singletons (global variables with lazy initialization) to manage stateful objects like WebSocket managers, event broadcasters, metrics collectors, and component registries.

Current Pattern:

# Module-level global variable
_ws_manager: WebSocketManager | None = None

def get_websocket_manager() -> WebSocketManager:
    """Get or create the global WebSocket manager."""
    global _ws_manager
    if _ws_manager is None:
        _ws_manager = WebSocketManager()
    return _ws_manager

def reset_websocket_manager() -> None:
    """Reset the global WebSocket manager. Useful for testing."""
    global _ws_manager
    _ws_manager = None

Why This Exists:

The singleton pattern was adopted for simplicity during initial development. It provides: - Simple access to shared state from anywhere in the API layer - Lazy initialization (objects created only when needed) - Easy integration with FastAPI's dependency injection via Depends(get_websocket_manager)

Problems:

Testing Difficulties
Tests must manually reset state between runs using reset_*() functions
State from one test can leak into another if fixtures aren't configured correctly
Makes it harder to reason about test isolation
Parallel Test Execution
Tests that modify these globals cannot safely run in parallel
This limits test suite performance on multi-core machines
Implicit Dependencies
The init pattern (e.g., init_auth()) requires careful setup/teardown
Dependencies between modules are hidden rather than explicit
Makes code harder to understand and maintain
Single Instance Limitation
Cannot easily create multiple instances for different scenarios
Limits flexibility in testing edge cases

Current Workaround:

We provide reset functions and a unified reset_all_api_state() function for testing:

# In test fixtures
@pytest.fixture(autouse=True)
def reset_api_state():
    yield
    reset_all_api_state()  # Resets all API singletons

Individual reset functions are also available: - reset_auth_singletons() - Auth module (key store, rate limiter) - reset_admin_api_state() - All admin API singletons - reset_dashboard_state() - Dashboard metrics and event buffer - reset_websocket_state() - WebSocket manager and broadcaster - reset_component_registry() - Component registry - reset_events_ws_state() - v1 events WebSocket broadcaster - reset_audit_log_store() - Middleware audit log store

Recommended Migration Path:

Use FastAPI's built-in dependency injection more fully:

Phase 1: App State Storage

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize state during startup
    app.state.ws_manager = WebSocketManager()
    app.state.key_store = APIKeyStore(...)
    app.state.rate_limiter = RateLimiter(...)
    app.state.metrics_collector = MetricsCollector()
    yield
    # Cleanup during shutdown
    await app.state.ws_manager.shutdown()

Phase 2: Dependency Functions

from fastapi import Request, Depends

def get_ws_manager(request: Request) -> WebSocketManager:
    """Dependency that retrieves WebSocketManager from app state."""
    return request.app.state.ws_manager

def get_key_store(request: Request) -> APIKeyStore:
    """Dependency that retrieves APIKeyStore from app state."""
    return request.app.state.key_store

Phase 3: Use in Routes

from typing import Annotated

@router.websocket("/ws")
async def websocket_endpoint(
    websocket: WebSocket,
    manager: Annotated[WebSocketManager, Depends(get_ws_manager)],
):
    await manager.connect(websocket)

Phase 4: Testing with Overrides

def test_websocket_connection():
    app = create_test_app()
    mock_manager = Mock(spec=WebSocketManager)

    # Override dependency for testing
    app.dependency_overrides[get_ws_manager] = lambda: mock_manager

    client = TestClient(app)
    # Test with mock_manager

Benefits of Migration:

Better Test Isolation - Each test can create its own app with fresh state
Parallel Tests - No shared global state means safe parallel execution
Explicit Dependencies - Dependencies are visible in function signatures
Easy Mocking - Use app.dependency_overrides instead of patching globals
Type Safety - Better IDE support and type checking

Migration Considerations:

Requires updating all route functions to use Depends()
Need to ensure app state is available in all contexts (background tasks, etc.)
Some edge cases around startup order may need careful handling
Should be done incrementally, one module at a time

Estimated Effort: Medium (2-3 days per module)

References: - FastAPI Lifespan Events: https://fastapi.tiangolo.com/advanced/events/ - FastAPI Testing with Overrides: https://fastapi.tiangolo.com/advanced/testing-dependencies/ - Python Dependency Injection Patterns: https://python-dependency-injector.ets-labs.org/

TD-002: Low Severity Issues from Implementation Review¶

Status: Mostly resolved (2026-03-XX) Priority: Low Source: Multi-agent implementation review (2026-02-04)

Resolved (2026-03-XX)¶

Environment variable naming: Fixed 9 env vars in CLAUDE.md, 37 env vars in README.md (double underscore → single underscore to match actual env_prefix in settings.py)
Wrong field name: Fixed MAID_ADMIN__JWT_SECRET_KEY → MAID_ADMIN_SECRET_KEY (wrong field name and delimiter)
Class name mismatch: Fixed LoaderPipeline → Pipeline class name reference in CLAUDE.md
API doc coverage gaps: Added OpenAPI summaries to 49 route decorators across 9 API files
Env prefix __ in bridge settings: Verified as INTENTIONAL — Pydantic v2 nested model_config requires __ for env_nested_delimiter in sub-models (BridgeSettings → DiscordSettings). Not a bug.
Package location drift: Admin endpoints in both maid-engine and maid-stdlib is BY DESIGN — maid-engine's admin __init__.py lazy-imports from maid-stdlib.

Remaining (Low Priority / Opportunistic)¶

CLI command documentation gaps — ~50 undocumented commands in CLAUDE.md

Recommended Approach: Address when working on related code.

TD-003: Pervasive `except Exception:` Without Specific Handling¶

Status: ✅ Resolved (2026-03-XX) Priority: Medium Resolution: All broad except Exception catches have been addressed: - ~130 narrowed to specific exception types (ConnectionError, TimeoutError, OSError, KeyError, ValueError, TypeError, ImportError, AttributeError, json.JSONDecodeError) - ~320 annotated with # noqa: BLE001 where broad catches are justified (tick loops, event dispatch, plugin isolation, network resilience, arbitrary user code execution) - 0 un-suppressed except Exception remaining across all source files - All justified broad catches log at ERROR level with exc_info=True - No silent except Exception: pass patterns remain

TD-004: `type: ignore` Suppressions Indicating Type System Gaps¶

Status: Substantially resolved (78 remaining in source, 0 bare; 2026-03-XX) Priority: Low Affected Files: 78 occurrences in source (down from 126), 0 bare # type: ignore

Progress:

48 suppressions eliminated via cast(), type narrowing, assert guards, widened signatures
0 bare # type: ignore remaining (all have specific error codes)
Event handler subscriptions fixed with cast(EventHandler, handler) in social systems
MockEngine/MockWorld test fixtures properly typed with cast()
json.loads return values cast to dict[str, Any]
SSL config paths guarded with assert before use
Relationship dimension dicts widened to accept str | RelationshipDimension keys

Remaining 78 are legitimate: - Optional imports (discord, aiohttp, psutil, redis, watchfiles) - Pydantic @computed_field / model_config typing limitations - Command decorator generic typing (ParamSpec edge cases) - Loader phase execute() signature overrides - asyncpg missing stubs

TD-005: Global Singleton Proliferation Beyond API Layer¶

Status: Partially resolved (2026-03-XX) Priority: Medium Affected Files: ~45 global keyword usages across source files (31 in API layer tracked by TD-001, ~14 non-API)

Progress:

_allowed_origins removed — get_allowed_origins() reads from get_settings() directly (already cached via @lru_cache)
_shutdown_task removed — stored in request.app.state.shutdown_task
_config_audit_log removed — returned from initialize_config_audit_log(), stored in app.state
_event_hook_subscription_ids removed — stored as broadcaster._event_hook_subscription_ids
_loader globals (abilities, spells, skills) — refactored to accept optional world parameter, cached via world.set_data() when provided

Remaining globals (justified/architectural): - Performance caches (keep as-is): _tokenizer, _PRICING_CACHE, _parser, _FALLBACK_PROFANITY_WORDS, _cached_default_words, _sampler - API-layer (tracked by TD-001): _ws_manager, _event_broadcaster, _event_buffer, _metrics_collector, _websocket_handler, _log_buffer, _log_stream_handler, _key_store, _rate_limiter, _audit_log_store, _component_registry - RPG state (future migration): _registry (commands), _olc_manager (builder tools) - Config/logging: _runtime_config, _audit_logger

TD-006: Excessive `Any` Type Annotations in Core Modules¶

Status: Substantially resolved (~180 non-trivial remaining, down from ~320; 2026-03-XX) Priority: Low Affected Files: ~180 non-trivial : Any annotations remaining (excluding dict[str, Any], **kwargs: Any, list[Any])

Progress:

NPC autonomy models: world: Any → World, entity: Any → Entity across 12 files
weapons.py: 12 weapon_type: Any → WeaponType | str | None
Building commands: player_id: Any → UUID, ctx: Any → CommandContext, obj: Any → Entity | Component across 10 files
Loader pipeline: world: Any → World, reference_registry: Any → ReferenceRegistry across 6 files
Engine commands: player_entity: Any → Entity, target: Any → Entity | None in arguments.py, locks.py, registry.py
Classic RPG: item_manager: Any → ItemManager, character: Any → Character, store: Any → DocumentStore in 4 files
Stdlib: _collection() -> Any → DocumentCollection[Any] in memory/knowledge/relationships managers
Network: self._asgi_app: Any → ASGIApp, request_id: Any → str | int | None in web/server.py
Core: inventory_items: list[tuple[UUID, list[Any], set[str]]] → list[tuple[UUID, list[Component], set[str]]] in multiworld.py

Remaining ~180 are necessary/low-value: - dict[str, Any] patterns for JSON-like data (config values, API payloads, metadata) - **kwargs: Any in forwarding functions - Plugin interfaces with genuinely polymorphic parameters - ECS generic component values - Optional dependency types (discord, aiohttp stub limitations)

TD-007: Quest Generation System Type Safety Bypass¶

Status: ✅ Resolved (2026-03-XX) Priority: Medium Resolution: All type safety bypasses have been eliminated: - Public APIs added: pending_seed_count, in_flight_count, status(), entries(), entries_for_character(), recent_entries() - quest_generation_history stored via world.set_data() instead of monkey-patching - SystemManager.get_system() enables type-safe system lookup - 0 remaining # type: ignore or # noqa: SLF001 in quest generation code

TD-008: Deprecated Code Awaiting Removal¶

Status: ✅ Resolved (2026-03-XX) — all deprecated paths now emit DeprecationWarning Priority: Low Resolution: - auth.py: 8 deprecated methods (generate_key, validate_key, revoke_key, check, record, reset, cleanup_stale_keys, get_stats) all emit DeprecationWarning with "will be removed in v2.0" - rate_limiter.py: 2 deprecated methods (check_rate_limit, record_request) emit DeprecationWarning - create.py: 11 functions with global fallback paths now emit DeprecationWarning via warnings.warn() (register_entity_factory, get_entity_factory, reset_entity_factories, _ensure_default_factories, _get_factory_keys, register_template, get_template, unregister_template, list_templates, clear_templates, get_template_count) - Legacy position.py row-based damage/defense modifiers removed - Remaining work: Remove deprecated code in v2.0 as documented

Resolved Technical Debt¶

Items that have been addressed are moved here for historical reference.

TD-013: Unused Variable in Content Pack Loading (Resolved 2026-03-04)¶

Event catalog now stored on GameEngine._event_catalog, exposed via event_catalog property. Content pack events are collected per-pack during loading. Variable is no longer unused and # noqa: F841 suppression removed.

TD-012: PvP Rankings Potentially Undefined Variables (Resolved 2026-03-04)¶

Variables (old_winner_rating, old_winner_rank, old_loser_rating, old_loser_rank) now defined unconditionally before the guarded block. All 6 # noqa: F821 suppressions removed.

TD-011: Private Module-Level Noqa Suppressions for Encapsulation Violations (Resolved 2026-03-04)¶

Public APIs added to replace private member access: - BaseManager.cached_items property (replaces _cache access in ecs_item_bridge.py) - NeedDecaySystem.set_memory_provider() (replaces system._memory_provider mutation in pack.py) - SocialFabricSystem.set_relationship_provider() (replaces system._relationship_provider mutation) - QuestGenerationSystem.register_seed_type() (replaces system._seed_registry mutation) - Pipeline.default_context property (replaces pipeline._default_context access in cli/app.py)

TD-010: Migration Template Placeholder Values (Resolved 2026-03-04)¶

CLI arguments --collection, --component, --field now substitute placeholder values at template creation time. A pre-migrate validation guard rejects migrations with "TODO" in metadata fields, preventing invalid migrations from being applied.

TD-009: CI Benchmark Enforcement Missing (Resolved 2026-03-04)¶

.github/workflows/benchmark.yml created to enforce performance benchmarks in CI. Uses custom timing assertions (not pytest-benchmark) to validate performance targets against documented thresholds.

TD-002-8: PO Parser Escape Sequences (Resolved 2026-02-04)¶

Comprehensive escape sequence handling was already implemented. Added tests verifying handling of \n, \t, \r, \\, \", \xNN, \uXXXX, \UXXXXXXXX, octal escapes, and C-style escapes (\a, \b, \f, \v).

TD-002-9: Locale Handling Edge Cases (Resolved 2026-02-04)¶

Added comprehensive tests for locale normalization edge cases including empty strings, whitespace, script codes (zh-Hans, zh-Hant), unknown locales, mixed case handling, and multiple separators.

TD-002-10: conversation_count Property Lock (Resolved 2026-02-04)¶

The property is already thread-safe: in asyncio, only one coroutine runs at a time (cooperative multitasking), and Python's len() on dict is atomic in CPython. Added documentation explaining this and tests verifying the behavior.

TD-002-11: Token Estimation Edge Cases (Resolved 2026-02-04)¶

Added comprehensive edge case tests for token estimation: Unicode text (Russian, Chinese, Arabic), emoji handling, mixed scripts, punctuation-only text, whitespace-only, single characters, newlines/tabs, very long text, and messages missing content keys.

TD-002-12: Callable Conditions in Persistence (Resolved 2026-02-04)¶

Not an issue: all conditions use serializable string enums (SpawnCondition, ResetCondition, etc.) rather than callables. Pydantic's model_dump_json() would fail loudly if non-serializable data were present. The architecture correctly separates serializable data models from runtime behavior.

Contributing¶

When adding new technical debt documentation:

Assign an ID - Use the format TD-XXX with incrementing numbers
Document why - Explain why the pattern exists, not just that it's bad
Provide workarounds - Help current developers work with the code as-is
Plan migration - Include a concrete migration path with code examples
Estimate effort - Give a rough estimate to help with prioritization

When resolving technical debt:

Update this document - Move the item to "Resolved" section
Remove TODO comments - Clean up the source code markers
Update tests - Remove any workarounds that are no longer needed
Document in PR - Reference the tech debt item in your PR description

Last updated: 2026-03-04

Technical Debt Tracking¶

Overview¶

Active Technical Debt Items¶

TD-001: Module-Level Singletons in API Layer¶

TD-002: Low Severity Issues from Implementation Review¶

Resolved (2026-03-XX)¶

Remaining (Low Priority / Opportunistic)¶

TD-003: Pervasive except Exception: Without Specific Handling¶

TD-004: type: ignore Suppressions Indicating Type System Gaps¶

TD-005: Global Singleton Proliferation Beyond API Layer¶

TD-006: Excessive Any Type Annotations in Core Modules¶

TD-007: Quest Generation System Type Safety Bypass¶

TD-008: Deprecated Code Awaiting Removal¶

Resolved Technical Debt¶

TD-013: Unused Variable in Content Pack Loading (Resolved 2026-03-04)¶

TD-012: PvP Rankings Potentially Undefined Variables (Resolved 2026-03-04)¶

TD-011: Private Module-Level Noqa Suppressions for Encapsulation Violations (Resolved 2026-03-04)¶

TD-010: Migration Template Placeholder Values (Resolved 2026-03-04)¶

TD-009: CI Benchmark Enforcement Missing (Resolved 2026-03-04)¶

TD-002-8: PO Parser Escape Sequences (Resolved 2026-02-04)¶

TD-002-9: Locale Handling Edge Cases (Resolved 2026-02-04)¶

TD-002-10: conversation_count Property Lock (Resolved 2026-02-04)¶

TD-002-11: Token Estimation Edge Cases (Resolved 2026-02-04)¶

TD-002-12: Callable Conditions in Persistence (Resolved 2026-02-04)¶

Contributing¶

TD-003: Pervasive `except Exception:` Without Specific Handling¶

TD-004: `type: ignore` Suppressions Indicating Type System Gaps¶

TD-006: Excessive `Any` Type Annotations in Core Modules¶