{
"event": "PreToolUse",
"tool_name": "Write",
"tool_input": {
"file_path": "\/var\/www\/dev.campus.systemische-tools.de\/crawler\/planung\/02-tasks.yaml",
"content": "---\n# ============================================================================\n# Crawler – task list (formal)\n# ============================================================================\n# Execution order: phases run sequentially; tasks within a phase may run in\n# parallel as long as their dependencies are empty.\n# Every task MUST pass the gates listed in acceptance_criteria before completion.\n# As of: 2026-04-21\n# ============================================================================\n\nmeta:\n  document: crawler-tasks\n  version: 1.0.0\n  total_tasks: 58\n  project_root: \/var\/www\/dev.campus.systemische-tools.de\/crawler\n\nconventions:\n  status_values: [todo, in_progress, blocked, review, done]\n  priority_values: [critical, high, medium, low]\n  all_tasks_must_pass: [G1, G4]  # lint and architecture, always\n  exit_definition: status=done AND all required_gates passed\n\n# ----------------------------------------------------------------------------\n# PHASE 0 – Foundation\n# ----------------------------------------------------------------------------\ntasks:\n- id: T-001\n  phase: 0\n  title: Create MariaDB database `crawler`\n  deliverable: database exists with utf8mb4_unicode_ci\n  commands:\n    - db_execute(sql=\"CREATE DATABASE crawler CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;\", database=\"mysql\")\n  acceptance_criteria:\n    - db_databases() contains \"crawler\"\n  dependencies: []\n  priority: critical\n  status: todo\n  estimate_hours: 0.25\n\n- id: T-002\n  phase: 0\n  title: Create and apply schema migrations\n  deliverable: src\/db\/schema.sql + src\/db\/migrations\/0001_init.sql, all tables present\n  tables: [crawls, urls, pages, page_headers, links, images, resources, structured_data, hreflang, js_errors, errors, external_checks]\n  acceptance_criteria:\n    - all 12 tables exist in the `crawler` DB\n    - foreign keys active\n    - G6 green\n  dependencies: [T-001]\n  priority: critical\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-003\n  phase: 0\n  title: Create project skeleton (hexagonal structure)\n  deliverable: |\n    crawler\/\n    src\/{domain,application,infrastructure,interfaces}\/\n    tests\/{unit,integration,e2e}\/\n    scripts\/\n    artifacts\/\n    pyproject.toml\n    requirements.txt\n    .import-linter\n    .ruff.toml\n    mypy.ini\n    README.md (structure stub only)\n  acceptance_criteria:\n    - directories + configs present\n    - `python -m src` imports without errors\n  dependencies: []\n  priority: critical\n  status: todo\n  estimate_hours: 0.5\n\n- id: T-004\n  phase: 0\n  title: Install Playwright + Chromium\n  commands:\n    - \".\/venv\/bin\/pip install playwright\"\n    - \".\/venv\/bin\/playwright install chromium\"\n  acceptance_criteria:\n    - headless Chromium starts from a Python script\n  dependencies: [T-003]\n  priority: critical\n  status: todo\n  estimate_hours: 0.25\n\n- id: T-005\n  phase: 0\n  title: Set up quality tools (ruff, mypy, radon, import-linter)\n  deliverable: configs + `scripts\/check.sh`\n  acceptance_criteria:\n    - `scripts\/check.sh` runs G1 and G4 on the empty skeleton and reports PASS\n  dependencies: [T-003]\n  priority: critical\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-006\n  phase: 0\n  title: Write custom MaxLOC linter (Python + PHP)\n  deliverable: scripts\/max_loc_linter.py, scripts\/max_loc_linter.php\n  behavior: |\n    Checks classes and methods\/functions against the 80-LOC limit, excluding\n    comments, docstrings, and blank lines. Exit code != 0 on violation.\n  acceptance_criteria:\n    - unit test against fixture files with 79\/80\/81 LOC\n    - integrated into G1\n  dependencies: [T-005]\n  priority: critical\n  status: todo\n  estimate_hours: 1.5\n\n- id: T-007\n  phase: 0\n  title: Write HTMX linter\n  deliverable: scripts\/htmx_lint.py\n  behavior: enforces HTMX-C1..C5 per CLAUDE.md\n  acceptance_criteria:\n    - G7 runs against fixture templates\n  dependencies: [T-005]\n  priority: high\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-008\n  phase: 0\n  title: Write schema drift check\n  deliverable: scripts\/schema_drift.py\n  acceptance_criteria:\n    - G6 reports a diff after a manual schema change\n  dependencies: [T-002]\n  priority: high\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-009\n  phase: 0\n  title: Fixture web server for smoke\/integration tests\n  deliverable: tests\/fixtures\/fixture_server.py (aiohttp)\n  content: static pages with known SEO defects (missing H1, duplicate title, broken link, etc.)\n  acceptance_criteria:\n    - server starts on 127.0.0.1:8089 and serves the defined page structure\n  dependencies: [T-003]\n  priority: high\n  status: todo\n  estimate_hours: 1.0\n\n# ----------------------------------------------------------------------------\n# PHASE 1 – Domain layer\n# ----------------------------------------------------------------------------\n- id: T-010\n  phase: 1\n  title: Value object `NormalizedUrl`\n  location: src\/domain\/value_objects\/normalized_url.py\n  responsibilities: [normalization, hashing, host\/scheme decomposition]\n  acceptance_criteria: [unit tests ≥ 95%, G1, G4]\n  dependencies: [T-005, T-006]\n  priority: critical\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-011\n  phase: 1\n  title: Additional value objects\n  objects: [HttpStatus, CrawlMode, RenderMode, QualityFlag, LinkRelation, MetaRobots, Hreflang]\n  acceptance_criteria: [unit tests, immutability, G1, G4]\n  dependencies: [T-010]\n  priority: critical\n  status: todo\n  estimate_hours: 1.5\n\n- id: T-012\n  phase: 1\n  title: Domain entity `Crawl`\n  fields_ref: planung\/00-planung.md §5.2\n  invariants:\n    - \"finished_at >= started_at\"\n    - \"mode ∈ {fast, full, hybrid}\"\n  acceptance_criteria: [unit tests, G1, G4]\n  dependencies: [T-011]\n  priority: critical\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-013\n  phase: 1\n  title: Domain entity `Page`\n  acceptance_criteria: [unit tests, G1, G4]\n  dependencies: [T-011]\n  priority: critical\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-014\n  phase: 1\n  title: Remaining entities – Url, Link, Image, Resource, StructuredData, JsError, CrawlError, ExternalCheck\n  acceptance_criteria: [unit tests, G1, G4]\n  dependencies: [T-011]\n  priority: critical\n  status: todo\n  estimate_hours: 2.0\n\n- id: T-015\n  phase: 1\n  title: Domain service `UrlNormalizer`\n  acceptance_criteria: [unit tests, G1, G4]\n  dependencies: [T-010]\n  priority: critical\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-016\n  phase: 1\n  title: Domain service `LinkClassifier` (internal\/external, subdomain rule)\n  acceptance_criteria: [unit tests, G1, G4]\n  dependencies: [T-010, T-015]\n  priority: critical\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-017\n  phase: 1\n  title: Domain service `QualityFlagCalculator`\n  logic_ref: planung\/00-planung.md §4.9\n  acceptance_criteria: [unit tests for every rule, G1, G4]\n  dependencies: [T-013]\n  priority: high\n  status: todo\n  estimate_hours: 1.25\n\n- id: T-018\n  phase: 1\n  title: Define ports (abstract interfaces)\n  ports:\n    - UrlQueuePort\n    - CrawlRepositoryPort\n    - PageRepositoryPort\n    - LinkRepositoryPort\n    - ImageRepositoryPort\n    - ResourceRepositoryPort\n    - StructuredDataRepositoryPort\n    - ErrorRepositoryPort\n    - ExternalCheckRepositoryPort\n    - HttpFetcherPort\n    - BrowserFetcherPort\n    - HtmlParserPort\n    - StructuredDataParserPort\n    - FileStoragePort\n    - LoggerPort\n  acceptance_criteria: [ABCs with signatures only, G1, G4]\n  dependencies: [T-011]\n  priority: critical\n  status: todo\n  estimate_hours: 1.5\n\n# ----------------------------------------------------------------------------\n# PHASE 2 – Application layer (use cases)\n# ----------------------------------------------------------------------------\n- id: T-020\n  phase: 2\n  title: Use case `StartCrawl`\n  responsibility: creates the crawl record, enqueues the base URL + sitemap URLs\n  acceptance_criteria: [unit tests with mock ports, G1, G2, G4]\n  dependencies: [T-012, T-018]\n  priority: critical\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-021\n  phase: 2\n  title: Use case `ProcessUrl`\n  responsibility: fetches a URL, parses it, persists the page, enqueues new links\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-013, T-016, T-017, T-018]\n  priority: critical\n  status: todo\n  estimate_hours: 1.5\n\n- id: T-022\n  phase: 2\n  title: Use case `ExtractSeoData`\n  responsibility: transforms the HTML\/DOM response into a Page entity incl. all SEO fields\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-013, T-017, T-018]\n  priority: critical\n  status: todo\n  estimate_hours: 1.5\n\n- id: T-023\n  phase: 2\n  title: Use case `CheckExternalLink`\n  responsibility: HEAD\/GET of an external URL, persists to external_checks, cache logic\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-014, T-018]\n  priority: high\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-024\n  phase: 2\n  title: Use case `FinalizeCrawl`\n  responsibility: sets finished_at, aggregates statistics, computes global flags (broken_internal_links etc.)\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-012, T-017, T-018]\n  priority: critical\n  status: todo\n  estimate_hours: 0.75\n\n- id: T-025\n  phase: 2\n  title: Use case `GenerateCrawlReport`\n  output: JSON report per crawl ID\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-018]\n  priority: high\n  status: todo\n  estimate_hours: 1.0\n\n- id: T-026\n  phase: 2\n  title: Use case `CompareCrawls`\n  output: diff between two crawl IDs (added, removed, changed URLs)\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-018]\n  priority: high\n  status: todo\n  estimate_hours: 1.25\n\n- id: T-027\n  phase: 2\n  title: Use case `ListCrawls`\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-018]\n  priority: medium\n  status: todo\n  estimate_hours: 0.25\n\n- id: T-028\n  phase: 2\n  title: Use case `RecheckExternalLinks`\n  acceptance_criteria: [unit tests, G1, G2, G4]\n  dependencies: [T-023]\n  priority: medium\n  status: todo\n  estimate_hours: 0.5\n\n# ----------------------------------------------------------------------------\n# PHASE 3 – I... [TRUNCATED-e6ae34f4dea9811e]"
}
}
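Task T-006 in the payload above specifies a custom linter that enforces an 80-LOC limit on classes and functions, excluding comments, docstrings, and blank lines. This is a minimal Python sketch of that counting rule using the stdlib `ast` module; the helper names are hypothetical, and the real `scripts/max_loc_linter.py` may count the `def`/`class` line differently or handle nested definitions another way.

```python
import ast

MAX_LOC = 80  # limit from T-006


def effective_loc(node: ast.AST, source_lines: list[str]) -> int:
    """Count lines of a class/function, excluding blank lines,
    comment-only lines, and a leading docstring."""
    start, end = node.lineno, node.end_lineno  # 1-based, inclusive
    lines = list(enumerate(source_lines[start - 1:end], start=start))
    # Exclude the docstring's line range, if a docstring is present.
    if ast.get_docstring(node, clean=False) is not None:
        doc_expr = node.body[0]
        lines = [
            (i, text) for i, text in lines
            if not (doc_expr.lineno <= i <= doc_expr.end_lineno)
        ]
    return sum(
        1 for _, text in lines
        if text.strip() and not text.strip().startswith("#")
    )


def violations(source: str) -> list[tuple[str, int]]:
    """Return (name, loc) for every class/function over the limit."""
    tree = ast.parse(source)
    src_lines = source.splitlines()
    found = []
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            loc = effective_loc(node, src_lines)
            if loc > MAX_LOC:
                found.append((node.name, loc))
    return found
```

A CLI wrapper would run `violations()` per file and exit non-zero when the list is non-empty, matching the "exit code != 0 on violation" behavior; the fixture test in T-006 (79/80/81 LOC) would then pin down exactly which lines count.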
{
"tool_response": {
"type": "create",
"filePath": "\/var\/www\/dev.campus.systemische-tools.de\/crawler\/planung\/02-tasks.yaml",
"structuredPatch": [],
"originalFile": null,
"userModified": false
}
}
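Task T-010 calls for a `NormalizedUrl` value object responsible for normalization, hashing, and host/scheme decomposition, and T-011 requires immutability. A frozen dataclass covers all three; this is an illustrative sketch under assumed normalization rules (lowercase scheme and host, drop default ports, drop fragments, strip trailing slashes on non-root paths) — the real `src/domain/value_objects/normalized_url.py` may pick a different rule set.

```python
from dataclasses import dataclass
from hashlib import sha256
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)  # frozen -> immutability per T-011's acceptance criteria
class NormalizedUrl:
    value: str

    @classmethod
    def from_raw(cls, raw: str) -> "NormalizedUrl":
        parts = urlsplit(raw.strip())
        scheme = parts.scheme.lower()
        host = (parts.hostname or "").lower()
        # Keep an explicit port only if it is not the scheme default.
        if parts.port and (scheme, parts.port) not in {("http", 80), ("https", 443)}:
            host = f"{host}:{parts.port}"
        path = parts.path or "/"
        if path != "/" and path.endswith("/"):
            path = path.rstrip("/")
        # Fragments never reach the server, so discard them.
        return cls(urlunsplit((scheme, host, path, parts.query, "")))

    @property
    def url_hash(self) -> str:
        """Stable dedup key, e.g. for the `urls` table."""
        return sha256(self.value.encode()).hexdigest()

    @property
    def host(self) -> str:
        return urlsplit(self.value).hostname or ""
```

Because two raw spellings of the same resource normalize to the same `value`, `url_hash` gives the queue and the repositories a single dedup key; the `LinkClassifier` from T-016 could then compare `host` suffixes to apply the subdomain rule.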