Skip to content

Technical Debt Tracking

This document tracks known technical debt in the MAID codebase. Technical debt represents areas where we've made pragmatic trade-offs that should be addressed in future refactoring efforts.

Overview

Technical debt is not inherently bad - it often represents reasonable decisions made under constraints. This document helps us:

  1. Track known debt so it doesn't get forgotten
  2. Prioritize which items to address first
  3. Plan migration paths for future refactoring
  4. Onboard new contributors by explaining why certain patterns exist

Active Technical Debt Items

TD-001: Module-Level Singletons in API Layer

Status: Partially migrated (Phases 1-3 complete, Phase 4 in progress) Priority: Medium Affected Files: - packages/maid-engine/src/maid_engine/api/auth.py - packages/maid-engine/src/maid_engine/api/admin/websocket.py - packages/maid-engine/src/maid_engine/api/admin/dashboard.py - packages/maid-engine/src/maid_engine/api/admin/entities.py - packages/maid-engine/src/maid_engine/api/v1/events_ws.py

Phase 1 Migration Progress (2026-02-04):

Phase 1 is complete. All affected modules now have app.state-aware getter functions that: 1. Check request.app.state first for the dependency 2. Fall back to the module-level global if not found in app.state

This provides backwards compatibility while enabling new code to use proper dependency injection. The server initialization code (net/web/server.py) now stores auth components in app.state.

New app.state-aware functions added (Phase 1): - auth.py: _get_key_store_from_app(), _get_rate_limiter_from_app(), _get_auth_rate_limiter_from_app() - websocket.py: get_ws_manager_from_app(), get_event_broadcaster_from_app() - dashboard.py: get_event_buffer_from_app(), get_metrics_collector_from_app(), get_websocket_handler_from_app() - entities.py: get_component_registry_from_app() - events_ws.py: get_event_broadcaster_from_app()

Phase 2-3 Migration Progress (2026-03-03):

Phase 2 (app.state storage) and Phase 3 (route migration) are now complete.

Singletons wired to app.state in server.py and api/app.py: - admin_ws_manager, admin_event_broadcaster (WebSocket) - dashboard_event_buffer, dashboard_metrics_collector, dashboard_ws_handler (Dashboard) - admin_log_buffer, admin_log_stream_handler (Logging) - config_audit_log (Config) - audit_log_store (Middleware) - v1_event_broadcaster (v1 Events) - component_registry (Entity management) - settings, runtime_config (Configuration)

New _from_app getters created: - middleware.py: get_audit_log_store_from_app() - logs.py: get_log_buffer_from_app(), get_log_stream_handler_from_app() - config.py: get_config_audit_log_from_app(), get_settings_from_app(), get_runtime_config_from_app() - auth.py: get_key_store_from_app(), get_rate_limiter_from_app(), get_auth_rate_limiter_from_app() (public aliases)

Route functions migrated (30+ routes): - config.py: 9 routes - logs.py: 4 routes + 1 class method - router.py: 4 routes (including WebSocket endpoints) - dependencies.py: 1 dependency function - admin.py: 3 routes - world.py: 2 routes - middleware.py: 1 middleware dispatch - dashboard.py: 2 class constructors + internal methods - events_ws.py: 1 WebSocket endpoint

Phase 4 Migration Progress (test isolation):

Test utility tests/helpers/app_factory.py provides create_test_app() for isolated test apps. create_test_app() now provisions auth components (api_key_store, api_rate_limiter, api_auth_rate_limiter) on app.state, matching production server initialization.

Migrated (13 files — use create_test_app() or app.state-based isolation in fixtures): - test_admin_entities.py - test_admin_websocket.py - test_admin_dashboard.py - api/test_middleware.py - test_admin_logs.py - test_admin_config.py - api/v1/test_events_ws.py - api/v1/test_admin.py - test_api_auth.py - api/v1/test_stats.py - api/v1/test_world.py - api/v1/test_players.py - api/test_benchmarks.py

Conftest files updated: - tests/conftest.py (maid-engine): autouse=False on reset_global_state - tests/conftest.py (maid-classic-rpg): autouse=False on reset_global_state - tests/conftest.py (maid-stdlib): autouse=False on reset_stdlib_global_state - tests/conftest.py (maid-tutorial-world): autouse=False on reset_global_state

Remaining work (Phase 4): - Remove legacy reset_*() / init_auth() module-level code once no tests depend on it (currently ~10 references remain in TestGlobalFunctions classes that deliberately test the singleton API itself)

Description:

Several API modules use module-level singletons (global variables with lazy initialization) to manage stateful objects like WebSocket managers, event broadcasters, metrics collectors, and component registries.

Current Pattern:

# Module-level global variable
_ws_manager: WebSocketManager | None = None

def get_websocket_manager() -> WebSocketManager:
    """Get or create the global WebSocket manager."""
    global _ws_manager
    if _ws_manager is None:
        _ws_manager = WebSocketManager()
    return _ws_manager

def reset_websocket_manager() -> None:
    """Reset the global WebSocket manager. Useful for testing."""
    global _ws_manager
    _ws_manager = None

Why This Exists:

The singleton pattern was adopted for simplicity during initial development. It provides: - Simple access to shared state from anywhere in the API layer - Lazy initialization (objects created only when needed) - Easy integration with FastAPI's dependency injection via Depends(get_websocket_manager)

Problems:

  1. Testing Difficulties
  2. Tests must manually reset state between runs using reset_*() functions
  3. State from one test can leak into another if fixtures aren't configured correctly
  4. Makes it harder to reason about test isolation

  5. Parallel Test Execution

  6. Tests that modify these globals cannot safely run in parallel
  7. This limits test suite performance on multi-core machines

  8. Implicit Dependencies

  9. The init pattern (e.g., init_auth()) requires careful setup/teardown
  10. Dependencies between modules are hidden rather than explicit
  11. Makes code harder to understand and maintain

  12. Single Instance Limitation

  13. Cannot easily create multiple instances for different scenarios
  14. Limits flexibility in testing edge cases

Current Workaround:

We provide reset functions and a unified reset_all_api_state() function for testing:

# In test fixtures
@pytest.fixture(autouse=True)
def reset_api_state():
    yield
    reset_all_api_state()  # Resets all API singletons

Individual reset functions are also available: - reset_auth_singletons() - Auth module (key store, rate limiter) - reset_admin_api_state() - All admin API singletons - reset_dashboard_state() - Dashboard metrics and event buffer - reset_websocket_state() - WebSocket manager and broadcaster - reset_component_registry() - Component registry - reset_events_ws_state() - v1 events WebSocket broadcaster - reset_audit_log_store() - Middleware audit log store

Recommended Migration Path:

Use FastAPI's built-in dependency injection more fully:

Phase 1: App State Storage

from contextlib import asynccontextmanager
from fastapi import FastAPI

@asynccontextmanager
async def lifespan(app: FastAPI):
    # Initialize state during startup
    app.state.ws_manager = WebSocketManager()
    app.state.key_store = APIKeyStore(...)
    app.state.rate_limiter = RateLimiter(...)
    app.state.metrics_collector = MetricsCollector()
    yield
    # Cleanup during shutdown
    await app.state.ws_manager.shutdown()

Phase 2: Dependency Functions

from fastapi import Request, Depends

def get_ws_manager(request: Request) -> WebSocketManager:
    """Dependency that retrieves WebSocketManager from app state."""
    return request.app.state.ws_manager

def get_key_store(request: Request) -> APIKeyStore:
    """Dependency that retrieves APIKeyStore from app state."""
    return request.app.state.key_store

Phase 3: Use in Routes

from typing import Annotated

@router.websocket("/ws")
async def websocket_endpoint(
    websocket: WebSocket,
    manager: Annotated[WebSocketManager, Depends(get_ws_manager)],
):
    await manager.connect(websocket)

Phase 4: Testing with Overrides

def test_websocket_connection():
    app = create_test_app()
    mock_manager = Mock(spec=WebSocketManager)

    # Override dependency for testing
    app.dependency_overrides[get_ws_manager] = lambda: mock_manager

    client = TestClient(app)
    # Test with mock_manager

Benefits of Migration:

  1. Better Test Isolation - Each test can create its own app with fresh state
  2. Parallel Tests - No shared global state means safe parallel execution
  3. Explicit Dependencies - Dependencies are visible in function signatures
  4. Easy Mocking - Use app.dependency_overrides instead of patching globals
  5. Type Safety - Better IDE support and type checking

Migration Considerations:

  • Requires updating all route functions to use Depends()
  • Need to ensure app state is available in all contexts (background tasks, etc.)
  • Some edge cases around startup order may need careful handling
  • Should be done incrementally, one module at a time

Estimated Effort: Medium (2-3 days per module)

References: - FastAPI Lifespan Events: https://fastapi.tiangolo.com/advanced/events/ - FastAPI Testing with Overrides: https://fastapi.tiangolo.com/advanced/testing-dependencies/ - Python Dependency Injection Patterns: https://python-dependency-injector.ets-labs.org/


TD-002: Low Severity Issues from Implementation Review

Status: Mostly resolved (2026-03-XX) Priority: Low Source: Multi-agent implementation review (2026-02-04)

Resolved (2026-03-XX)

  • Environment variable naming: Fixed 9 env vars in CLAUDE.md, 37 env vars in README.md (double underscore → single underscore to match actual env_prefix in settings.py)
  • Wrong field name: Fixed MAID_ADMIN__JWT_SECRET_KEYMAID_ADMIN_SECRET_KEY (wrong field name and delimiter)
  • Class name mismatch: Fixed LoaderPipelinePipeline class name reference in CLAUDE.md
  • API doc coverage gaps: Added OpenAPI summaries to 49 route decorators across 9 API files
  • Env prefix __ in bridge settings: Verified as INTENTIONAL — Pydantic v2 nested model_config requires __ for env_nested_delimiter in sub-models (BridgeSettingsDiscordSettings). Not a bug.
  • Package location drift: Admin endpoints in both maid-engine and maid-stdlib is BY DESIGN — maid-engine's admin __init__.py lazy-imports from maid-stdlib.

Remaining (Low Priority / Opportunistic)

  • CLI command documentation gaps — ~50 undocumented commands in CLAUDE.md

Recommended Approach: Address when working on related code.


TD-003: Pervasive except Exception: Without Specific Handling

Status: ✅ Resolved (2026-03-XX) Priority: Medium Resolution: All broad except Exception catches have been addressed: - ~130 narrowed to specific exception types (ConnectionError, TimeoutError, OSError, KeyError, ValueError, TypeError, ImportError, AttributeError, json.JSONDecodeError) - ~320 annotated with # noqa: BLE001 where broad catches are justified (tick loops, event dispatch, plugin isolation, network resilience, arbitrary user code execution) - 0 un-suppressed except Exception remaining across all source files - All justified broad catches log at ERROR level with exc_info=True - No silent except Exception: pass patterns remain


TD-004: type: ignore Suppressions Indicating Type System Gaps

Status: Substantially resolved (78 remaining in source, 0 bare; 2026-03-XX) Priority: Low Affected Files: 78 occurrences in source (down from 126), 0 bare # type: ignore

Progress:

  • 48 suppressions eliminated via cast(), type narrowing, assert guards, widened signatures
  • 0 bare # type: ignore remaining (all have specific error codes)
  • Event handler subscriptions fixed with cast(EventHandler, handler) in social systems
  • MockEngine/MockWorld test fixtures properly typed with cast()
  • json.loads return values cast to dict[str, Any]
  • SSL config paths guarded with assert before use
  • Relationship dimension dicts widened to accept str | RelationshipDimension keys

Remaining 78 are legitimate: - Optional imports (discord, aiohttp, psutil, redis, watchfiles) - Pydantic @computed_field / model_config typing limitations - Command decorator generic typing (ParamSpec edge cases) - Loader phase execute() signature overrides - asyncpg missing stubs


TD-005: Global Singleton Proliferation Beyond API Layer

Status: Partially resolved (2026-03-XX) Priority: Medium Affected Files: ~45 global keyword usages across source files (31 in API layer tracked by TD-001, ~14 non-API)

Progress:

  • _allowed_origins removed — get_allowed_origins() reads from get_settings() directly (already cached via @lru_cache)
  • _shutdown_task removed — stored in request.app.state.shutdown_task
  • _config_audit_log removed — returned from initialize_config_audit_log(), stored in app.state
  • _event_hook_subscription_ids removed — stored as broadcaster._event_hook_subscription_ids
  • _loader globals (abilities, spells, skills) — refactored to accept optional world parameter, cached via world.set_data() when provided

Remaining globals (justified/architectural): - Performance caches (keep as-is): _tokenizer, _PRICING_CACHE, _parser, _FALLBACK_PROFANITY_WORDS, _cached_default_words, _sampler - API-layer (tracked by TD-001): _ws_manager, _event_broadcaster, _event_buffer, _metrics_collector, _websocket_handler, _log_buffer, _log_stream_handler, _key_store, _rate_limiter, _audit_log_store, _component_registry - RPG state (future migration): _registry (commands), _olc_manager (builder tools) - Config/logging: _runtime_config, _audit_logger


TD-006: Excessive Any Type Annotations in Core Modules

Status: Substantially resolved (~180 non-trivial remaining, down from ~320; 2026-03-XX) Priority: Low Affected Files: ~180 non-trivial : Any annotations remaining (excluding dict[str, Any], **kwargs: Any, list[Any])

Progress:

  • NPC autonomy models: world: AnyWorld, entity: AnyEntity across 12 files
  • weapons.py: 12 weapon_type: AnyWeaponType | str | None
  • Building commands: player_id: AnyUUID, ctx: AnyCommandContext, obj: AnyEntity | Component across 10 files
  • Loader pipeline: world: AnyWorld, reference_registry: AnyReferenceRegistry across 6 files
  • Engine commands: player_entity: AnyEntity, target: AnyEntity | None in arguments.py, locks.py, registry.py
  • Classic RPG: item_manager: AnyItemManager, character: AnyCharacter, store: AnyDocumentStore in 4 files
  • Stdlib: _collection() -> AnyDocumentCollection[Any] in memory/knowledge/relationships managers
  • Network: self._asgi_app: AnyASGIApp, request_id: Anystr | int | None in web/server.py
  • Core: inventory_items: list[tuple[UUID, list[Any], set[str]]]list[tuple[UUID, list[Component], set[str]]] in multiworld.py

Remaining ~180 are necessary/low-value: - dict[str, Any] patterns for JSON-like data (config values, API payloads, metadata) - **kwargs: Any in forwarding functions - Plugin interfaces with genuinely polymorphic parameters - ECS generic component values - Optional dependency types (discord, aiohttp stub limitations)


TD-007: Quest Generation System Type Safety Bypass

Status: ✅ Resolved (2026-03-XX) Priority: Medium Resolution: All type safety bypasses have been eliminated: - Public APIs added: pending_seed_count, in_flight_count, status(), entries(), entries_for_character(), recent_entries() - quest_generation_history stored via world.set_data() instead of monkey-patching - SystemManager.get_system() enables type-safe system lookup - 0 remaining # type: ignore or # noqa: SLF001 in quest generation code


TD-008: Deprecated Code Awaiting Removal

Status: ✅ Resolved (2026-03-XX) — all deprecated paths now emit DeprecationWarning Priority: Low Resolution: - auth.py: 8 deprecated methods (generate_key, validate_key, revoke_key, check, record, reset, cleanup_stale_keys, get_stats) all emit DeprecationWarning with "will be removed in v2.0" - rate_limiter.py: 2 deprecated methods (check_rate_limit, record_request) emit DeprecationWarning - create.py: 11 functions with global fallback paths now emit DeprecationWarning via warnings.warn() (register_entity_factory, get_entity_factory, reset_entity_factories, _ensure_default_factories, _get_factory_keys, register_template, get_template, unregister_template, list_templates, clear_templates, get_template_count) - Legacy position.py row-based damage/defense modifiers removed - Remaining work: Remove deprecated code in v2.0 as documented


Resolved Technical Debt

Items that have been addressed are moved here for historical reference.

TD-013: Unused Variable in Content Pack Loading (Resolved 2026-03-04)

Event catalog now stored on GameEngine._event_catalog, exposed via event_catalog property. Content pack events are collected per-pack during loading. Variable is no longer unused and # noqa: F841 suppression removed.

TD-012: PvP Rankings Potentially Undefined Variables (Resolved 2026-03-04)

Variables (old_winner_rating, old_winner_rank, old_loser_rating, old_loser_rank) now defined unconditionally before the guarded block. All 6 # noqa: F821 suppressions removed.

TD-011: Private Module-Level Noqa Suppressions for Encapsulation Violations (Resolved 2026-03-04)

Public APIs added to replace private member access: - BaseManager.cached_items property (replaces _cache access in ecs_item_bridge.py) - NeedDecaySystem.set_memory_provider() (replaces system._memory_provider mutation in pack.py) - SocialFabricSystem.set_relationship_provider() (replaces system._relationship_provider mutation) - QuestGenerationSystem.register_seed_type() (replaces system._seed_registry mutation) - Pipeline.default_context property (replaces pipeline._default_context access in cli/app.py)

TD-010: Migration Template Placeholder Values (Resolved 2026-03-04)

CLI arguments --collection, --component, --field now substitute placeholder values at template creation time. A pre-migrate validation guard rejects migrations with "TODO" in metadata fields, preventing invalid migrations from being applied.

TD-009: CI Benchmark Enforcement Missing (Resolved 2026-03-04)

.github/workflows/benchmark.yml created to enforce performance benchmarks in CI. Uses custom timing assertions (not pytest-benchmark) to validate performance targets against documented thresholds.

TD-002-8: PO Parser Escape Sequences (Resolved 2026-02-04)

Comprehensive escape sequence handling was already implemented. Added tests verifying handling of \n, \t, \r, \\, \", \xNN, \uXXXX, \UXXXXXXXX, octal escapes, and C-style escapes (\a, \b, \f, \v).

TD-002-9: Locale Handling Edge Cases (Resolved 2026-02-04)

Added comprehensive tests for locale normalization edge cases including empty strings, whitespace, script codes (zh-Hans, zh-Hant), unknown locales, mixed case handling, and multiple separators.

TD-002-10: conversation_count Property Lock (Resolved 2026-02-04)

The property is already thread-safe: in asyncio, only one coroutine runs at a time (cooperative multitasking), and Python's len() on dict is atomic in CPython. Added documentation explaining this and tests verifying the behavior.

TD-002-11: Token Estimation Edge Cases (Resolved 2026-02-04)

Added comprehensive edge case tests for token estimation: Unicode text (Russian, Chinese, Arabic), emoji handling, mixed scripts, punctuation-only text, whitespace-only, single characters, newlines/tabs, very long text, and messages missing content keys.

TD-002-12: Callable Conditions in Persistence (Resolved 2026-02-04)

Not an issue: all conditions use serializable string enums (SpawnCondition, ResetCondition, etc.) rather than callables. Pydantic's model_dump_json() would fail loudly if non-serializable data were present. The architecture correctly separates serializable data models from runtime behavior.


Contributing

When adding new technical debt documentation:

  1. Assign an ID - Use the format TD-XXX with incrementing numbers
  2. Document why - Explain why the pattern exists, not just that it's bad
  3. Provide workarounds - Help current developers work with the code as-is
  4. Plan migration - Include a concrete migration path with code examples
  5. Estimate effort - Give a rough estimate to help with prioritization

When resolving technical debt:

  1. Update this document - Move the item to "Resolved" section
  2. Remove TODO comments - Clean up the source code markers
  3. Update tests - Remove any workarounds that are no longer needed
  4. Document in PR - Reference the tech debt item in your PR description

Last updated: 2026-03-04

See also: Exception Handling Policy | Style Guide