Semantic Explorer

Created: 2025-12-20 | Updated: 2025-12-29

Exploration and search within the documentation pipeline. Shows hierarchical documents, their pages, and the extracted text chunks together with their taxonomy classification, keywords, and entities.

URL	/semantic-explorer
Database	ki_content (documents, chunks)
Controller	/src/Controller/SemanticExplorerController.php
Views	/src/View/semantic-explorer/
Pipeline	/var/www/scripts/pipeline/

Data model

Table	Description	Fields
`documents`	Source documents	id, filename, title, source, page_count, status, created_at
`chunks`	Extracted text chunks	id, document_id, chunk_index, content, token_count, embedding_model, qdrant_id
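To make the one-to-many relationship between the two tables concrete, here is a hypothetical Python mirror of the rows. The column names come from the table above; the Python types, the optional embedding fields, and the helper `chunks_for_document` are assumptions for illustration only.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


# Hypothetical mirror of the ki_content tables: column names from the table
# above, Python types are assumptions.
@dataclass
class Document:
    id: int
    filename: str
    title: str
    source: str
    page_count: int
    status: str
    created_at: datetime


@dataclass
class Chunk:
    id: int
    document_id: int  # refers to Document.id
    chunk_index: int  # position of the chunk within its document
    content: str
    token_count: int
    embedding_model: Optional[str] = None  # assumption: empty until embedded
    qdrant_id: Optional[str] = None        # assumption: empty until indexed


def chunks_for_document(doc: Document, chunks: list[Chunk]) -> list[Chunk]:
    """Return the chunks that belong to doc, ordered by chunk_index."""
    own = [c for c in chunks if c.document_id == doc.id]
    return sorted(own, key=lambda c: c.chunk_index)
```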

Web UI components

Page	URL	Description
Dashboard	/semantic-explorer	Statistics, top categories, latest chunks
Entities	/semantic-explorer/entitaeten	All extracted entities
Relations	/semantic-explorer/relationen	Relationships between entities
Taxonomy	/semantic-explorer/taxonomie	Taxonomy hierarchy
Ontology	/semantic-explorer/ontologie	Ontology classes
Chunks	/semantic-explorer/chunks	All chunks, filterable
Graph	/semantic-explorer/graph	Knowledge graph visualization

Detail views

/semantic-explorer/entitaeten/{id} - Entity details, including where the entity occurs
/semantic-explorer/chunks/{id} - Chunk details: entities, keywords, semantic analysis

API endpoints

Method	Endpoint	Description
GET	/api/v1/explorer/stats	Statistics (documents, chunks, tokens, entities, relations)
GET	/api/v1/explorer/entities	All entities
GET	/api/v1/explorer/relations	All relations
GET	/api/v1/explorer/taxonomy	Taxonomy hierarchy
GET	/api/v1/explorer/ontology	Ontology classes
GET	/api/v1/explorer/documents	All documents
POST	/api/v1/explorer/search	Hybrid search (vector search + keyword matching)

Chunk analysis

For each chunk, the pipeline produces:

entities: recognized entities with their type (PERSON, ORGANIZATION, CONCEPT, etc.)
keywords: relevant keywords
token_count: number of tokens, used for budgeting the LLM context
embedding_model: the embedding model used
qdrant_id: the chunk's ID in the vector index (Qdrant)
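token_count makes it possible to decide how many retrieved chunks fit into an LLM prompt. The sketch below assumes a simple greedy strategy: take chunks in descending relevance order and skip any chunk that would exceed the remaining budget. `ScoredChunk`, `pack_context`, and the budget value are hypothetical; the pipeline's actual budgeting logic is not documented on this page.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    chunk_id: int
    token_count: int
    score: float  # relevance from the search; higher is better


def pack_context(chunks: list[ScoredChunk], budget: int) -> list[ScoredChunk]:
    """Greedily pick the highest-scoring chunks that still fit into budget tokens.

    A chunk that would overflow the remaining budget is skipped, so a later,
    smaller chunk can still use the leftover space.
    """
    selected: list[ScoredChunk] = []
    remaining = budget
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if chunk.token_count <= remaining:
            selected.append(chunk)
            remaining -= chunk.token_count
    return selected
```

Greedy packing is not optimal in general (it is a knapsack problem), but it keeps the most relevant chunk first and is cheap to compute.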

HybridSearch

The search combines semantic vector search (Qdrant) with keyword matching (MariaDB).
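This page does not say how the Qdrant and MariaDB result lists are merged. One common approach is Reciprocal Rank Fusion (RRF), sketched here under that assumption: each backend returns chunk IDs in rank order, and a chunk's fused score is the sum of 1/(k + rank) over every list it appears in. The function name and k = 60 are assumptions, not taken from the HybridSearch implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    result_lists: list[list[int]], k: int = 60
) -> list[tuple[int, float]]:
    """Fuse ranked lists of chunk IDs with Reciprocal Rank Fusion.

    Each chunk scores sum(1 / (k + rank)) over all lists containing it,
    with rank starting at 1.
    """
    scores: dict[int, float] = defaultdict(float)
    for ranking in result_lists:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by chunk ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

For example, fusing a vector ranking [42, 7, 13] with a keyword ranking [7, 99, 42] puts chunk 7 first, because it ranks highly in both lists. RRF uses only ranks, so it avoids comparing vector similarity scores and keyword-match scores that live on different scales; a weighted sum of normalized scores would be the usual alternative.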

API call

# Simple search
curl -X POST https://dev.campus.systemische-tools.de/api/v1/explorer/search \
  -H "Content-Type: application/json" \
  -d '{"query": "Systemische Therapie", "limit": 10}'
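For callers that do not use curl, the same request can be built with Python's standard library. This is a minimal sketch: the URL and JSON body are copied from the curl call above, while the function names `build_search_request` and `search` and the 10-second timeout are illustrative assumptions.

```python
import json
import urllib.request

# URL taken from the curl example above.
SEARCH_URL = "https://dev.campus.systemische-tools.de/api/v1/explorer/search"


def build_search_request(query: str, limit: int = 10) -> urllib.request.Request:
    """Build the same POST request as the curl example, without sending it."""
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        SEARCH_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(query: str, limit: int = 10, timeout: float = 10.0) -> dict:
    """Send the search request and return the decoded JSON response."""
    request = build_search_request(query, limit)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Splitting request construction from sending keeps the request shape testable without network access; `search("Systemische Therapie")` performs the actual call.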

Response format

{
  "success": true,
  "data": {
    "query": "Systemische Therapie",
    "results": [
      {
        "id": 42,
        "content": "...",
        "document_title": "Handbuch",
        "score": 0.87
      }
    ],
    "count": 5
  }
}
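A minimal sketch for consuming this envelope in Python, assuming every hit carries exactly the four fields shown and that a failed request is signalled by "success": false (the error format is not documented here). `SearchHit` and `parse_search_response` are hypothetical names.

```python
import json
from dataclasses import dataclass


@dataclass
class SearchHit:
    id: int
    content: str
    document_title: str
    score: float


def parse_search_response(raw: str) -> list[SearchHit]:
    """Convert the search response envelope into SearchHit objects.

    Raises ValueError when the API reports success = false (assumed error shape).
    """
    payload = json.loads(raw)
    if not payload.get("success"):
        raise ValueError("search request failed")
    return [
        SearchHit(
            id=hit["id"],
            content=hit["content"],
            document_title=hit["document_title"],
            score=float(hit["score"]),
        )
        for hit in payload["data"]["results"]
    ]
```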

Retrieving statistics

curl https://dev.campus.systemische-tools.de/api/v1/explorer/stats

Response

{
  "success": true,
  "data": {
    "documents": 12,
    "chunks": {
      "total": 234,
      "tokens": 125000
    },
    "entities": 450,
    "relations": 320
  }
}
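The stats payload is enough to derive a few sanity-check ratios, for example to spot documents that produced unusually many or unusually large chunks. A sketch; `summarize_stats` is a hypothetical helper, and the ratios are derived here, not returned by the API.

```python
import json


def summarize_stats(raw: str) -> dict:
    """Derive simple ratios from the /api/v1/explorer/stats response."""
    data = json.loads(raw)["data"]
    documents = data["documents"]
    chunks = data["chunks"]["total"]
    tokens = data["chunks"]["tokens"]
    return {
        # Guard against division by zero on an empty index.
        "chunks_per_document": chunks / documents if documents else 0.0,
        "tokens_per_chunk": tokens / chunks if chunks else 0.0,
        "entities_per_chunk": data["entities"] / chunks if chunks else 0.0,
    }
```

For the example response above this yields 19.5 chunks per document and roughly 534 tokens per chunk.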

Pipeline integration

The chunks are generated by the analysis pipeline:

/var/www/scripts/pipeline/pipeline.py - main pipeline
/var/www/scripts/pipeline/step_chunk.py - chunk creation
/var/www/scripts/pipeline/step_embed.py - embedding generation
/var/www/scripts/pipeline/step_semantic.py - semantic analysis
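The chunking strategy of step_chunk.py is not described on this page. As an illustration only, the following sketch assumes a fixed-size word window with overlap and fills the chunk_index, content, and token_count columns of the data model. `DraftChunk`, `chunk_text`, and the default sizes are hypothetical, and real token counts would come from the embedding model's tokenizer rather than a word count.

```python
from dataclasses import dataclass


@dataclass
class DraftChunk:
    chunk_index: int
    content: str
    token_count: int  # approximated here as the number of whitespace-separated words


def chunk_text(text: str, max_words: int = 200, overlap: int = 20) -> list[DraftChunk]:
    """Split text into overlapping windows of at most max_words words."""
    if max_words <= 0 or not 0 <= overlap < max_words:
        raise ValueError("require max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks: list[DraftChunk] = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append(DraftChunk(len(chunks), " ".join(window), len(window)))
        if start + max_words >= len(words):
            break  # this window already reaches the end of the text
    return chunks
```

Overlap keeps text that straddles a window boundary visible in both neighbouring chunks, at the cost of some duplicated tokens in the index.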

See: RAG process