KI-System

Created: 2025-12-20 | Updated: 2025-12-20

Integrated system for document processing, semantic analysis, and content creation with RAG chat.

Status	In development
Backend	Python 3.13
Frontend	PHP 8.4 + HTMX
LLM	Claude Opus 4.5 + Ollama
Databases	ki_dev (infrastructure), ki_content (user-facing)
Embedding	mxbai-embed-large (1024 dim)

Infrastructure

Ollama - LLM runtime
Qdrant - Vector database
NVIDIA CUDA - RTX 4000 SFF Ada
Database - MariaDB schema
Embedding - mxbai-embed-large

Pipeline

Document Pipeline - Nextcloud → MariaDB → Qdrant
KI-Protokoll - Claude request logging

Semantics

Entities - Extraction, deduplication, relations
Taxonomy - Automatic classification
Ontology - Knowledge structure, graph

Applications

RAG-Chat - Semantic search + LLM
Content-Studio - Author profiles, contracts, critics

Database structure

Database	Purpose	Tables
ki_dev	Development/infrastructure	protokoll, tasks, contracts, dokumentation, prompts, mcp_log
ki_content	Content/user-facing	chat_sessions, chat_messages, content, personas, knowledge_graph

Data flow

Nextcloud (local)
    ↓
Pipeline (Python)
├── Text extraction (OCR, Vision)
├── Semantic chunking
└── Metadata enrichment
    ↓
MariaDB (ki_dev + ki_content)
├── Documents, chunks
├── Entities, relations
└── Taxonomy, ontology
    ↓
Qdrant (embeddings, 1024 dim)
    ↓
Web UI (Chat, Content-Studio)

Ollama

Created: 2025-12-20 | Updated: 2025-12-20

Local LLM runtime for AI models. It uses the NVIDIA GPU for fast inference without depending on a cloud service, which allows privacy-compliant AI use.

Version	0.13.5
Port	11434
Models	/usr/share/ollama/.ollama/models

Installed models

Model	Size	Purpose
mxbai-embed-large	669 MB	Embeddings (1024 dim) for the KI-System
mistral	4.4 GB	Chat, analysis (7.2B)
llama3.2	2 GB	Fast tasks (3.2B)

Download models

ollama pull mxbai-embed-large
ollama pull mistral
ollama pull llama3.2
ollama list

Run a model

ollama run llama3.2

API - Chat

curl http://localhost:11434/api/generate -d '{
  "model": "llama3.2",
  "prompt": "Hallo!"
}'

API - Embeddings

curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Text zum Embedden"
}'

Commands

systemctl status ollama
ollama --version
ollama list

Qdrant vector database

Created: 2025-12-20 | Updated: 2025-12-20

Stores embeddings for semantic search and RAG (Retrieval-Augmented Generation), enabling similarity search across documents.

Version	1.12.5
HTTP Port	6333
gRPC Port	6334
Storage	/opt/qdrant/storage

Configuration

File: /opt/qdrant/config/config.yaml

storage:
  storage_path: /opt/qdrant/storage
  snapshots_path: /opt/qdrant/snapshots
service:
  http_port: 6333
  grpc_port: 6334
  host: 127.0.0.1
log_level: INFO

Create a collection

curl -X PUT 'http://127.0.0.1:6333/collections/my_collection' \
  -H 'Content-Type: application/json' \
  -d '{"vectors": {"size": 1024, "distance": "Cosine"}}'

Commands

systemctl status qdrant
curl http://127.0.0.1:6333

NVIDIA CUDA

Created: 2025-12-20 | Updated: 2025-12-31

GPU driver and CUDA toolkit for hardware-accelerated AI workloads. The RTX 4000 SFF Ada with 20 GB VRAM allows LLMs to be hosted locally without a cloud service.

GPU	NVIDIA RTX 4000 SFF Ada Generation
VRAM	20 GB
Driver	590.44.01
CUDA	13.1

Installation

apt-get install -y nvidia-driver nvidia-cuda-toolkit
reboot

Check status

nvidia-smi

Shows GPU status, driver version, and CUDA version. The nvcc compiler is not installed (it is only needed for CUDA development).

Document Pipeline

Created: 2025-12-20 | Updated: 2025-12-31

Automatic import and processing of documents from Nextcloud.

Source	/var/www/nextcloud/data/root/files/Documents
Formats	PDF, PPTX, DOCX, MD, TXT
Trigger	Polling + event-based
Language	Python 3.13

Pipeline steps

1. DETECT    → Detect new/changed files
2. EXTRACT   → Extract text (OCR, Vision)
3. CHUNK     → Semantic chunking
4. ENRICH    → Enrich metadata
5. STORE     → Store in MariaDB
6. EMBED     → Generate vectors
7. INDEX     → Store in Qdrant
8. ANALYZE   → Semantic analysis

Text extraction

Format	Tool	Features
PDF	PyMuPDF	OCR via Tesseract
PPTX	python-pptx	Slides + speaker notes
DOCX	python-docx	Text extraction
MD/TXT	direct	UTF-8

Image handling

Images in documents are described via a Vision API and stored as text chunks.

Chunking

Method	Semantic + hierarchical
Size	Adaptive (context-dependent)
Overlap	~10%
Hierarchy	Document → chapter → section
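
The ~10% overlap can be illustrated with a deliberately simplified sketch. It uses a fixed-size character window, which is an assumption made for illustration only: the real pipeline chunks semantically, and the function name, the `size` default, and the windowing strategy are hypothetical.

```python
def chunk_with_overlap(text: str, size: int = 1000, overlap_ratio: float = 0.10) -> list[str]:
    """Split text into windows of `size` characters that overlap by ~overlap_ratio.

    Simplified stand-in for semantic chunking: it only demonstrates how
    neighbouring chunks share their boundary text.
    """
    if size <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("size must be positive and overlap_ratio in [0, 1)")
    # Each new chunk starts `step` characters after the previous one.
    step = max(1, size - round(size * overlap_ratio))
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # the last window already reached the end of the text
    return chunks
```

With `size=10` each chunk repeats the final character of its predecessor, so a sentence cut at a chunk boundary still appears with some context in the next chunk.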

Chunk metadata

{
  "document_id": 123,
  "chunk_index": 0,
  "heading_path": ["Kapitel 1", "Abschnitt 1.2"],
  "source_folder": "/Documents/Therapie",
  "entities": ["Carl Rogers"],
  "taxonomy_terms": ["Methoden"]
}

Queue system

Queue	ki_content.pipeline_queue
Runs	ki_content.pipeline_runs (status, logging)
Retry	Max. 3 attempts, exponential backoff
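
The retry policy can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: the function name, the one-second base delay, and the injectable `sleep` parameter are assumptions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    task: Callable[[], T],
    max_attempts: int = 3,       # "max. 3 attempts" from the table above
    base_delay: float = 1.0,     # assumed starting delay in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call task until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            # Exponential backoff: base, 2*base, 4*base, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter keeps the function testable without real waiting; a queue worker would simply use the default.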

Pipeline scripts

/var/www/scripts/pipeline/
├── pipeline.py          → Main orchestration
├── detect.py            → File monitoring
├── extract.py           → Text extraction
├── chunk.py             → Semantic chunking
├── embed.py             → Embedding generation
├── analyze.py           → Semantic analysis
├── generate_semantics.py → Semantics generation (entities, relations)
├── db.py                → Database operations
├── config.py            → Configuration
├── run.sh               → Execution wrapper
│
├── generate.py          → Content generation (RAG + critics)
├── web_generate.py      → Web API for content generation
├── chat.py              → RAG chat (interactive + CLI)
├── web_chat.py          → Web API for RAG chat
│
└── venv/                → Python virtual environment

Script categories

Category	Scripts	Docs
Import pipeline	detect, extract, chunk, embed, analyze	Embedding
Semantics	generate_semantics.py	Entities
Content generation	generate.py, web_generate.py	Content-Studio
RAG chat	chat.py, web_chat.py	RAG-Chat
Infrastructure	db.py, config.py, run.sh	Database

Execution

cd /var/www/scripts/pipeline

# Scan for new documents
./run.sh scan

# Process the queue
./run.sh process

# Generate pending embeddings
./run.sh embed

# Run the full pipeline
./run.sh all

# Process a single file
./run.sh file /path/to/file.pdf

# Show status
./run.sh status

Embedding

Created: 2025-12-20 | Updated: 2025-12-20

Vector generation for semantic search and RAG.

Model	mxbai-embed-large
Dimensions	1024
Provider	Ollama (local)
Fallback	OpenAI (optional)

Qdrant collections

Collection	Purpose	Dimensions
documents	Document chunks	1024
mail	Email content	1024
entities	Entity embeddings	1024

Qdrant configuration

{
  "vectors": {
    "size": 1024,
    "distance": "Cosine"
  },
  "hnsw_config": {
    "m": 16,
    "ef_construct": 100
  }
}

Install the model

ollama pull mxbai-embed-large
ollama list

API call

curl http://localhost:11434/api/embeddings -d '{
  "model": "mxbai-embed-large",
  "prompt": "Text zum Embedden"
}'

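Python integration

import requests

def get_embedding(text: str) -> list[float]:
    """Return the embedding vector for text from the local Ollama API."""
    response = requests.post(
        'http://localhost:11434/api/embeddings',
        json={
            'model': 'mxbai-embed-large',
            'prompt': text
        },
        timeout=30,
    )
    # Surface HTTP errors here instead of a confusing KeyError below.
    response.raise_for_status()
    return response.json()['embedding']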

Storing in Qdrant

from uuid import uuid4

from qdrant_client import QdrantClient
from qdrant_client.models import PointStruct

client = QdrantClient(host="localhost", port=6333)

# text is the chunk content and embedding its vector,
# e.g. embedding = get_embedding(text)
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=str(uuid4()),  # Qdrant point IDs are unsigned integers or UUIDs
            vector=embedding,
            payload={
                "document_id": 123,
                "chunk_id": 1,
                "content_preview": text[:200]
            }
        )
    ]
)

Entities

Created: 2025-12-20 | Updated: 2025-12-29

Automatic extraction and management of entities from documents.

Method	LLM extraction (prompt-based)
Types	Derived dynamically from documents
Language	German
Curation	Manual via web UI
Database	ki_content

Entity types

Type	Description	Example
PERSON	Authors, therapists	Carl Rogers
ORGANIZATION	Institutes, publishers	Carl Auer Verlag
CONCEPT	Theories, methods	Systemtheorie
WORK	Books, articles	Die Kunst der Psychotherapie
EVENT	Conferences	Heidelberger Symposium
TERM	Technical terms	Zirkuläres Fragen

Semantics generation

The script generate_semantics.py uses Ollama to generate semantic definitions for entities.

Script	/var/www/scripts/pipeline/generate_semantics.py
Model	mistral (Ollama)
Target table	chunk_semantics

Procedure

1. Load entities that have no semantics yet
2. Load document context from the chunks table (top 5)
3. For each entity: build an LLM prompt with the context
4. Parse the JSON response (definition, domain, context, attributes)
5. Store in chunk_semantics (UPSERT)

Generated schema

{
  "definition": "Meaning in 1-2 sentences",
  "domain": "Knowledge domain",
  "context": "Usage context",
  "attributes": {},
  "usage_notes": "",
  "confidence": 0.8
}

Execution

cd /var/www/scripts/pipeline
source venv/bin/activate
python generate_semantics.py

Deduplication

Synonyms are stored in a reference table (entity_synonyms):

entity_synonyms:
  entity_id: 42 (Carl Rogers)
  synonyms:
    - "Rogers"
    - "C. Rogers"
    - "Carl R. Rogers"
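
How such a synonym table can resolve a mention to its canonical entity is sketched below. The in-memory dictionaries, the function names, and the case- and whitespace-insensitive matching are assumptions for illustration; the real lookup runs against the entity_synonyms table.

```python
from typing import Optional


def normalize(name: str) -> str:
    """Case-fold and collapse whitespace so spelling variants compare equal."""
    return " ".join(name.casefold().split())


def build_synonym_index(entity_synonyms: dict[int, list[str]]) -> dict[str, int]:
    """Map every normalized synonym to the ID of its canonical entity."""
    index: dict[str, int] = {}
    for entity_id, synonyms in entity_synonyms.items():
        for synonym in synonyms:
            index[normalize(synonym)] = entity_id
    return index


def resolve_entity(mention: str, index: dict[str, int]) -> Optional[int]:
    """Return the canonical entity ID for a mention, or None if unknown."""
    return index.get(normalize(mention))


# Example data taken from the listing above (entity 42 = Carl Rogers).
index = build_synonym_index({42: ["Carl Rogers", "Rogers", "C. Rogers", "Carl R. Rogers"]})
```

A newly extracted mention that resolves to an existing ID can be attached to that entity instead of creating a duplicate; unresolved mentions remain candidates for manual curation.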

Extraction prompt

Analysiere folgenden Text und extrahiere alle Entitäten.
Bestimme den Typ selbstständig basierend auf dem Kontext.

Text: {chunk_content}

Ausgabeformat JSON:
{
  "entities": [
    {
      "name": "Carl Rogers",
      "type": "PERSON",
      "context": "Begründer der klientenzentrierten Therapie",
      "confidence": 0.95
    }
  ]
}
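
LLM output in this format still has to be parsed defensively, because the model may return malformed JSON or incomplete items. The following is a minimal sketch under that assumption; the function name and the rule of silently skipping invalid items are hypothetical, not taken from the pipeline.

```python
import json


def parse_entities(raw_response: str, min_confidence: float = 0.0) -> list[dict]:
    """Parse the extraction response and keep only well-formed entities."""
    try:
        data = json.loads(raw_response)
    except json.JSONDecodeError:
        return []  # malformed JSON: treat as "no entities found"
    items = data.get("entities") if isinstance(data, dict) else None
    if not isinstance(items, list):
        return []
    entities = []
    for item in items:
        if not isinstance(item, dict):
            continue
        name = str(item.get("name", "")).strip()
        entity_type = str(item.get("type", "")).strip().upper()
        if not name or not entity_type:
            continue  # name and type are required
        try:
            confidence = float(item.get("confidence", 0.0))
        except (TypeError, ValueError):
            continue
        if confidence >= min_confidence:
            entities.append({
                "name": name,
                "type": entity_type,
                "context": item.get("context", ""),
                "confidence": confidence,
            })
    return entities
```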

Relations

Relation	Description
AUTHORED_BY	Person authored work
INFLUENCED	Person influenced person/concept
PART_OF	Concept is part of another concept
APPLIES	Method applies concept
EXTENDS	Concept extends concept
CITES	Work cites work

Database schema (ki_content)

entities (
    id, name, canonical_name, type,
    description, created_at
)

entity_synonyms (
    entity_id, synonym
)

entity_relations (
    source_entity_id, target_entity_id,
    relation_type, confidence
)

chunk_entities (
    chunk_id, entity_id,
    mention_count, relevance_score
)

chunk_semantics (
    id, chunk_id,
    definition, domain, context,
    attributes (JSON), usage_notes,
    confidence, source,
    created_at, updated_at
)

Web UI features

Entity list with filters
Detail view with all occurrences
Batch merge for duplicates
Manual creation/editing
Graph visualization (vanilla JS)

Taxonomy

Created: 2025-12-20 | Updated: 2025-12-31

Automatic hierarchical classification of documents.

Creation	Derived automatically from documents
Hierarchy	Dynamic (as deep as needed)
Multiple assignment	Yes
Management	Web UI

Example taxonomy

Systemische Therapie
├── Grundlagen
│   ├── Systemtheorie
│   ├── Konstruktivismus
│   └── Kybernetik
├── Methoden
│   ├── Zirkuläres Fragen
│   ├── Genogramm
│   └── Aufstellung
├── Anwendungsfelder
│   ├── Familientherapie
│   ├── Paartherapie
│   └── Organisationsberatung
└── Personen
    ├── Begründer
    └── Zeitgenössisch

Automatic derivation

1. The LLM analyzes the document contents
2. It extracts topic clusters
3. It builds a hierarchical structure
4. It assigns documents to terms
5. The user can adjust the result

Database schema (ki_content)

taxonomy_terms (
    id INT PK AUTO,
    name VARCHAR(255),
    slug VARCHAR(255) UNIQUE,
    parent_id INT FK (self-ref),
    description TEXT,
    depth INT DEFAULT 0,
    path VARCHAR(1000),
    created_at DATETIME
)

document_taxonomy (
    document_id INT PK FK,
    taxonomy_term_id INT PK FK,
    confidence DECIMAL(5,4),
    assigned_by ENUM('llm','rule','manual') DEFAULT 'llm',
    created_at DATETIME
)

LLM classification

Ordne das folgende Dokument in die Taxonomie ein.
Mehrfachzuordnung ist erlaubt.

Dokument: {document_summary}

Taxonomie:
{taxonomy_tree}

Ausgabeformat JSON:
{
  "assignments": [
    {"term_id": 42, "confidence": 0.85},
    {"term_id": 17, "confidence": 0.72}
  ]
}

Confidence threshold

Minimum: 0.5 for automatic assignment
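
Applying the threshold to the LLM's assignments could look like the sketch below. The function name and the inclusive comparison (a confidence of exactly 0.5 is accepted) are assumptions; the row layout follows the document_taxonomy schema above.

```python
MIN_CONFIDENCE = 0.5  # threshold from the text; treated as inclusive (assumption)


def select_assignments(
    document_id: int,
    assignments: list[dict],
    min_confidence: float = MIN_CONFIDENCE,
) -> list[tuple[int, int, float, str]]:
    """Return (document_id, taxonomy_term_id, confidence, assigned_by) rows."""
    rows = []
    for item in assignments:
        confidence = float(item["confidence"])
        if confidence >= min_confidence:
            # assigned_by is 'llm' because these rows come from the classifier
            rows.append((document_id, int(item["term_id"]), confidence, "llm"))
    return rows
```

Assignments below the threshold are dropped rather than stored, so they never appear as automatic classifications; a user can still add them manually in the web UI.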

Web UI features

Edit hierarchy (drag & drop)
Add/rename terms
Assign/remove documents
Statistics per term

Ontology

Created: 2025-12-20 | Updated: 2025-12-31

Formal knowledge structure with classes, properties, and relations.

Formality	Lightweight (DB-based)
Storage	MariaDB
Relation extraction	Automatic via LLM
Visualization	Graph (vanilla JS)
Export	No (no OWL/RDF)

Class structure

Person
├── Properties: name, wirkungsbereich
└── Relations:
    ├── verfasste → Werk
    └── beeinflusste → Person

Konzept
├── Properties: name, definition
└── Relations:
    ├── gehört_zu → Konzept
    └── nutzt → Methode

Werk
├── Properties: titel, jahr, typ
└── Relations:
    ├── behandelt → Konzept
    └── verfasst_von → Person

Relation types

Relation	Description
AUTHORED_BY	Person authored work
INFLUENCED	Person influenced person/concept
PART_OF	Concept is part of another concept
APPLIES	Method applies concept
CONTRADICTS	Concept contradicts concept
EXTENDS	Concept extends concept
CITES	Work cites work
SYNONYM_OF	Entity is a synonym
RELATED_TO	General relationship

Database schema (ki_content)

ontology_classes (
    id INT PK AUTO,
    name VARCHAR(255) UNIQUE,
    parent_class_id INT FK (self-ref),
    description TEXT,
    properties LONGTEXT (JSON),
    created_at DATETIME
)

entity_classifications (
    id INT PK AUTO,
    entity_id INT FK,
    ontology_class_id INT FK,
    confidence FLOAT DEFAULT 1
)

entity_relations (
    source_entity_id INT FK,
    target_entity_id INT FK,
    relation_type VARCHAR(100),
    confidence FLOAT,
    chunk_id INT FK (Herkunft)
)

LLM relation extraction

Analysiere den Text und identifiziere Relationen.

Text: {chunk_content}
Bekannte Entitäten: {entities}

Ausgabeformat JSON:
{
  "relations": [
    {
      "source": "Carl Rogers",
      "target": "Klientenzentrierte Therapie",
      "type": "AUTHORED_BY",
      "confidence": 0.92
    }
  ]
}

Graph visualization

Interactive graph built with vanilla JS:

Nodes = entities (color-coded by type)
Edges = relations (labeled)
Zoom, pan, filter
Click = show details

RAG-Chat

Created: 2025-12-20 | Updated: 2025-12-31

Retrieval-augmented generation chat with semantic search and session persistence.

Tool	/chat
API reference	/docs/api/chat
LLM	Claude Opus 4.5 (Anthropic) / Ollama (local)
Embedding	mxbai-embed-large (Ollama)
Vectors	Qdrant
Database	ki_content (chat_sessions, chat_messages)

Web UI routes

Route	Method	Description
/chat	GET	Create a new session, redirect to /chat/{uuid}
/chat/{uuid}	GET	Show session (sidebar + messages)
/chat/{uuid}/message	POST	Send message (HTMX)
/chat/{uuid}/title	POST	Update session title (HTMX)
/chat/{uuid}	DELETE	Delete session (HTMX)
/chat/sessions	GET	Session list partial (HTMX)

Features

Session management

Session persistence with UUID-based URLs
Session list in the sidebar with auto-refresh (HTMX)
Automatic title from the first message
Manual title editing via inline input
Delete session (CASCADE to messages)

Configuration

Model selection (Claude / Ollama)
Collection selection (multi-select from rag_collections)
Context limit (3/5/10/15 sources)
Temperature (0.0-1.0)
Max tokens (up to 8192)
Author profile selection for writing style

Tokens & cost

Per message: input tokens, output tokens, cost
Per session: token total and total cost in the sidebar
Cost calculation: Opus 4.5 pricing ($15/1M input, $75/1M output)
Ollama: "lokal" label instead of token counts
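
The per-message and per-session cost follows directly from the rates quoted above. The sketch below assumes those rates as constants; the function names are hypothetical, and `Decimal` is used to avoid floating-point rounding in currency values.

```python
from decimal import Decimal

# Rates as quoted in the list above (USD per 1M tokens).
INPUT_USD_PER_MTOK = Decimal("15")
OUTPUT_USD_PER_MTOK = Decimal("75")
ONE_MILLION = Decimal(1_000_000)


def message_cost(tokens_input: int, tokens_output: int) -> Decimal:
    """Cost in USD of a single message."""
    return (
        tokens_input * INPUT_USD_PER_MTOK + tokens_output * OUTPUT_USD_PER_MTOK
    ) / ONE_MILLION


def session_cost(messages: list[tuple[int, int]]) -> Decimal:
    """Total cost in USD for (tokens_input, tokens_output) pairs of a session."""
    return sum((message_cost(i, o) for i, o in messages), Decimal(0))
```

For example, a message with 1,200 input tokens and 800 output tokens costs 1,200 × 15 / 1,000,000 + 800 × 75 / 1,000,000 = 0.018 + 0.060 = 0.078 USD.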

Database schema (ki_content)

chat_sessions

id INT PK AUTO
uuid VARCHAR(36) UNIQUE
session_token VARCHAR(64) UNIQUE
user_id INT NULL
persona_id INT FK NULL
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
last_activity DATETIME DEFAULT CURRENT_TIMESTAMP
model VARCHAR(100) DEFAULT 'claude-opus-4-5-20251101'
context_limit INT DEFAULT 5
temperature DECIMAL(3,2) DEFAULT 0.50
max_tokens INT DEFAULT 4096
title VARCHAR(255) DEFAULT 'Neuer Chat'
author_profile_id INT FK NULL
system_prompt_id INT NULL
updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP
collections TEXT DEFAULT '["documents"]' (JSON-Array)

chat_messages

id INT PK AUTO
session_id INT FK NOT NULL (CASCADE DELETE)
role ENUM('user','assistant','system')
model VARCHAR(100)
content TEXT NOT NULL
start_microtime DECIMAL(16,6)
end_microtime DECIMAL(16,6)
tokens_input INT
tokens_output INT
sources LONGTEXT (JSON)
author_profile_id INT
system_prompt_id INT
collections TEXT (JSON-Array)
context_limit INT
chunks_used LONGTEXT (JSON)
llm_request_id INT FK
created_at DATETIME DEFAULT CURRENT_TIMESTAMP

RAG pipeline

User query (text)
    ↓
Generate embedding (mxbai-embed-large)
    ↓
Qdrant: find similar chunks (top-K)
    ↓
Assemble context
    ↓
LLM (Claude/Ollama): generate answer
    ↓
Store response + source references + tokens
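
The "assemble context" step can be sketched as a pure function that takes the retrieved hits and produces the prompt plus the source list to store. The hit structure (`chunk_id`, `content`, `score`), the function name, and the prompt wording are assumptions for illustration, not the format used by chat.py.

```python
def build_rag_prompt(
    question: str,
    hits: list[dict],
    context_limit: int = 5,  # matches the default context limit of a session
) -> tuple[str, list[dict]]:
    """Build an LLM prompt from the best-scoring chunks and list the sources used."""
    top_hits = sorted(hits, key=lambda h: h["score"], reverse=True)[:context_limit]
    context_blocks = []
    sources = []
    for ref, hit in enumerate(top_hits, start=1):
        # Number each passage so the answer can cite it as [n].
        context_blocks.append(f"[{ref}] {hit['content']}")
        sources.append({"ref": ref, "chunk_id": hit["chunk_id"], "score": hit["score"]})
    prompt = (
        "Answer the question using only the numbered context passages "
        "and cite them as [n].\n\n"
        "Context:\n" + "\n\n".join(context_blocks) + "\n\n"
        f"Question: {question}"
    )
    return prompt, sources
```

Returning the sources alongside the prompt makes it straightforward to persist them with the assistant message, as the last pipeline step requires.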

See also

Content-Studio Architecture

Created: 2025-12-20 | Updated: 2025-12-31

Structured content creation with author profiles, contracts, and a critic system.

Tool	/content
Documentation	/docs/content-studio
API reference	/docs/api/content
LLM	Claude Opus 4.5
Critic rounds	Max. 3
Database	ki_content

Web UI (RESTful)

URL	Description
/content	Order list
/content/new	New order
/content/{id}	Show details
/content/{id}/edit	Edit

Workflow

1. BRIEFING → Topic, audience, scope
2. CONFIGURATION → Author profile, contract, structure, sources
3. GENERATION → Chapter by chapter with progress display
4. CRITIQUE (max. 3x) → Critics analyze, automatic revision
5. VALIDATE → Contract check
6. APPROVE → Human sign-off
7. PUBLISH → Export, archiving
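
The bounded critique step (step 4) can be sketched as a loop that stops either when the critics raise no more issues or after three rounds. The callables `critique` and `revise` stand in for the LLM calls and are hypothetical, as is the function name.

```python
from typing import Callable

MAX_CRITIQUE_ROUNDS = 3  # "max. 3x" from the workflow above


def critique_loop(
    draft: str,
    critique: Callable[[str], list[str]],      # returns the issues found in a draft
    revise: Callable[[str, list[str]], str],   # returns a revised draft
    max_rounds: int = MAX_CRITIQUE_ROUNDS,
) -> tuple[str, int]:
    """Run critique/revision rounds; return the final draft and rounds used."""
    rounds = 0
    while rounds < max_rounds:
        issues = critique(draft)
        rounds += 1
        if not issues:
            break  # critics are satisfied, no revision needed
        draft = revise(draft, issues)
    return draft, rounds
```

The hard upper bound guarantees termination even if the critics never approve; the result then moves on to contract validation and human approval regardless.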

Database schemas (ki_content)

content_config

Unified config for author profiles, contracts, structures, critics, etc.

id INT PK AUTO
type ENUM('author_profile','structure','organization','contract','rule','system_prompt','critic')
name VARCHAR(100) NOT NULL
slug VARCHAR(100) NOT NULL
description TEXT
content LONGTEXT NOT NULL (JSON/YAML)
version VARCHAR(20) DEFAULT '1.0'
status ENUM('draft','active','deprecated') DEFAULT 'draft'
parent_id INT FK NULL
prompt_id INT FK NULL
sort_order INT DEFAULT 0
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP

Note: Critic definitions are stored in this table with type='critic'.

content_orders

Content orders with briefing, status, and references to their configuration.

id INT PK AUTO
title VARCHAR(255) NOT NULL
briefing TEXT
author_profile_id INT FK → content_config
contract_id INT FK → content_config
structure_id INT FK → content_config
model VARCHAR(100) DEFAULT 'claude-sonnet-4-20250514'
collections LONGTEXT (JSON-Array)
context_limit INT DEFAULT 5
status ENUM('draft','generating','critique','revision','validate','approve','published')
generation_status ENUM('idle','queued','generating','completed','failed')
generation_started_at DATETIME
generation_error TEXT
generation_log TEXT
generation_step VARCHAR(50)
critique_status ENUM('idle','critiquing','completed','failed')
critique_started_at DATETIME
critique_error TEXT
critique_log TEXT
critique_step VARCHAR(50)
current_critique_round INT DEFAULT 0
created_by INT
temperature DECIMAL(3,2) DEFAULT 0.50
max_tokens INT DEFAULT 4096
system_prompt_id INT FK → content_config
selected_critics LONGTEXT (JSON-Array)
quality_check TINYINT(1) DEFAULT 0
created_at DATETIME DEFAULT CURRENT_TIMESTAMP
updated_at DATETIME ON UPDATE CURRENT_TIMESTAMP

content_versions

Versioned content texts.

id INT PK AUTO
order_id INT FK NOT NULL → content_orders
version_number INT NOT NULL
content LONGTEXT
created_at DATETIME DEFAULT CURRENT_TIMESTAMP

content_sources

RAG sources per order (chunk references with relevance score).

order_id INT PK FK → content_orders
chunk_id INT PK FK → chunks
relevance_score DECIMAL(5,4) NULL

content_critiques

Critic feedback per version and round.

id INT PK AUTO
version_id INT FK NOT NULL → content_versions
critic_id INT FK NOT NULL → content_config (type='critic')
round INT NOT NULL
feedback LONGTEXT NULL
created_at DATETIME DEFAULT CURRENT_TIMESTAMP

content_config_history

Config change history for versioning.

id INT PK AUTO
config_id INT FK NOT NULL → content_config
content LONGTEXT NOT NULL
version VARCHAR(20) NOT NULL
changed_by VARCHAR(100) NULL
change_description TEXT NULL
created_at DATETIME DEFAULT CURRENT_TIMESTAMP

Pipeline scripts

/var/www/scripts/pipeline/
├── generate.py       → Content generation core logic
└── web_generate.py   → Web API wrapper

CLI usage

# Generate content
python /var/www/scripts/pipeline/generate.py generate <order_id> [model]

# Start a critique round
python /var/www/scripts/pipeline/generate.py critique <version_id> [model]

# Create a revision
python /var/www/scripts/pipeline/generate.py revise <version_id> [model]

KI-System database

Created: 2025-12-20 | Updated: 2025-12-31

MariaDB schema for document processing, semantics, and content creation.

Databases	ki_dev + ki_content
Tables	58 (23 + 35)
Engine	InnoDB
Charset	utf8mb4 (collation utf8mb4_unicode_ci)

Database architecture

Database	Purpose	Tables
ki_dev	Infrastructure: tasks, contracts, docs, pipeline, logs	23
ki_content	Content: chat, knowledge, entities, taxonomy	35

ki_dev (23 tables)

Contracts (3 tables)

Table	Description
contracts	Contract definitions (YAML)
contract_history	Version history
contract_validations	Validation results

Documentation (3 tables)

Table	Description
dokumentation	Hierarchical documentation pages (MCP-Docs)
dokumentation_chunks	Chunked content for RAG
dokumentation_history	Change history

Tasks (4 tables)

Table	Description
tasks	Task management
task_assignments	Assignments (human/AI)
task_comments	Comments on tasks
task_results	Results

Code analysis (4 tables)

Table	Description
code_analysis	PHP classes/interfaces/traits
code_dependencies	Dependencies between classes
code_quality	Quality scan results
code_scan_config	Scan configuration

LLM & RAG (3 tables)

Table	Description
prompts	Versioned prompts
llm_requests	Request logging with costs
rag_collections	Collection metadata (Qdrant sync)

AI & models (1 table)

Table	Description
ai_models	Registered AI models

Logging & audit (5 tables)

Table	Description
protokoll	Claude protocol log
mcp_log	MCP server logging
pipeline_log	Pipeline processing log
audit_log	System audit trail
file_backup_history	File backup history

ki_content (35 tables)

Chat (3 tables)

Table	Description
chat_sessions	Chat sessions with settings
chat_messages	Messages with chunk references
search_history	Search history

Content Studio (6 tables)

Table	Description
content_config	Unified config (profiles, contracts, structures, critics)
content_config_history	Config change history
content_orders	Content orders
content_versions	Content versions
content_critiques	Critique results
content_sources	RAG sources per order

Note: Critic personas are stored in content_config with type='critic'.

Documents & chunks (4 tables)

Table	Description
documents	Source documents from Nextcloud
document_pages	Pages per document
chunks	Extracted text chunks with metadata
generated_questions	Generated questions for chunks

Chunk mappings (3 tables)

Table	Description
chunk_entities	Entity-to-chunk mapping
chunk_semantics	Semantics-to-chunk mapping
chunk_taxonomy	Taxonomy-to-chunk mapping

Entities (7 tables)

Table	Description
entities	Extracted entities (persons, concepts, ...)
entity_types	Entity type definitions
entity_synonyms	Synonyms for deduplication
entity_relations	Relations between entities
entity_classifications	Entity-to-ontology mapping
entity_semantics	Semantic annotations
entity_taxonomy_mapping	Entity-to-taxonomy mapping

Document mappings (2 tables)

Table	Description
document_entities	Entity-to-document mapping
document_taxonomy	Document-to-taxonomy mapping

Semantics (4 tables)

Table	Description
ontology_classes	Ontology classes (hierarchical)
taxonomy_terms	Taxonomy hierarchy
stopwords	Stop words for NLP
provenance	Provenance tracking

Pipeline (4 tables)

Table	Description
pipeline_configs	Pipeline configurations
pipeline_queue	Processing queue
pipeline_runs	Pipeline runs
pipeline_steps	Step definitions

Other (2 tables)

Table	Description
prompts	Content-specific prompts
semantic_queue	Queue for semantic analysis

Database access

Important: Use MCP-DB for safe database access instead of direct SQL commands.

# MCP-DB (recommended)
db_tables(database="ki_dev")
db_tables(database="ki_content")
db_select("SELECT * FROM documents LIMIT 5", database="ki_content")
db_describe(table="chat_sessions", database="ki_content")

# Direct access (admin tasks only)
mysql -u root -p ki_dev
mysql -u root -p ki_content

See the MCP-DB documentation for details.

Change history

Date	Change
2025-12-31	Table counts corrected: ki_dev 19→23, ki_content 23→35, total 42→58
2025-12-31	critics table removed (it does not exist; critics are stored in content_config with type='critic')
2025-12-31	New tables documented: code_analysis, code_dependencies, code_quality, code_scan_config, ai_models, audit_log, dokumentation_chunks, pipeline_*, entity_*, document_*, stopwords, semantic_queue, provenance
2025-12-29	Row counts removed (they change constantly), table names verified
2025-12-21	Correction: ki_system → ki_dev/ki_content, 31 → 42 tables
2025-12-21	Removed: author_profiles, content_contracts, content_structures (replaced by content_config)
2025-12-21	Added: rag_collections, task_comments, content_config, chunk_* tables
2025-12-20	Initial version

KI-Protokoll

Created: 2025-12-20 | Updated: 2025-12-31

Automatic logging system for Claude Code sessions. It records all requests, responses, and tool calls in a MariaDB database via the hook system.

Database	ki_dev
Table	protokoll
Hook script	/var/www/tools/ki-protokoll/claude-hook/log_to_db.py
Config (user)	/root/.claude/settings.json
Config (project)	/var/www/dev.campus.systemische-tools.de/.claude/settings.local.json

Captured events

Event	Description	Hooks
UserPromptSubmit	User input	log_to_db.py
PreToolUse	Tool call (request)	log_to_db.py + specific matchers
PostToolUse	Tool result (response)	log_to_db.py + specific matchers
SessionStart	Session start	log_to_db.py
SessionEnd	Session end	log_to_db.py
Stop	Agent finished responding	log_to_db.py
SubagentStop	Subagent finished responding	log_to_db.py
Notification	Notifications	log_to_db.py

Hook scripts

Script	Purpose	Events/Matcher
log_to_db.py	Logging to the protokoll table	All events
block_direct_db.py	Blocks the mysql/mariadb CLI	PreToolUse:Bash
block_direct_task_db.py	Blocks direct task DB access	PreToolUse:Bash
block_password_exposure.py	Blocks password exposure	PreToolUse:Bash
file_backup_hook.py	Backup before file changes	PreToolUse:Edit|Write
hook_dispatcher.py	Contract validation	PreToolUse:Write, PostToolUse:Write|Edit
task_completion_guard.py	Task completion check	PreToolUse:tasks_status

Database schema

CREATE TABLE protokoll (
  id bigint(20) NOT NULL AUTO_INCREMENT,
  timestamp datetime(6) DEFAULT current_timestamp(6),
  request_ip varchar(45) NOT NULL,
  client_name varchar(255) NOT NULL,
  request text NOT NULL,
  request_timestamp datetime(6) NOT NULL,
  response text DEFAULT NULL,
  response_timestamp datetime(6) DEFAULT NULL,
  duration_ms int(10) unsigned DEFAULT NULL,
  tokens_input int(10) unsigned DEFAULT NULL,
  tokens_output int(10) unsigned DEFAULT NULL,
  tokens_total int(10) unsigned DEFAULT NULL,
  model_name varchar(255) DEFAULT NULL,
  status enum('pending','completed','error') DEFAULT 'pending',
  error_message text DEFAULT NULL,
  PRIMARY KEY (id),
  KEY idx_timestamp (timestamp),
  KEY idx_client_name (client_name),
  KEY idx_status (status)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Hook configuration

Hooks are defined in two configuration files:

User scope: /root/.claude/settings.json - applies to all projects
Project scope: .claude/settings.local.json - project-specific

Example: PreToolUse with matchers

"PreToolUse": [
  {
    "matcher": "Bash",
    "hooks": [
      {"type": "command", "command": "/var/www/scripts/hooks/block_direct_db.py"},
      {"type": "command", "command": "/var/www/scripts/hooks/block_password_exposure.py"}
    ]
  },
  {
    "matcher": "Edit|Write",
    "hooks": [
      {"type": "command", "command": "/var/www/tools/ki-protokoll/claude-hook/file_backup_hook.py", "timeout": 10}
    ]
  },
  {
    "matcher": "",
    "hooks": [
      {"type": "command", "command": "/var/www/tools/ki-protokoll/claude-hook/log_to_db.py", "timeout": 5}
    ]
  }
]

Queries

# Via MCP-DB (recommended)
db_select("SELECT id, timestamp, client_name, status FROM protokoll ORDER BY id DESC LIMIT 10", database="ki_dev")

# Requests per day
db_select("SELECT DATE(timestamp) as tag, COUNT(*) as anzahl FROM protokoll GROUP BY DATE(timestamp) ORDER BY tag DESC", database="ki_dev")

Security

Sensitive data (passwords, API keys, tokens) is automatically masked with [REDACTED]
Long content is truncated to 10,000 characters
The DB user has the corresponding privileges
Blocking hooks prevent unsafe operations
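
Masking and truncation before logging can be sketched as below. This is an illustration only: the regular expression, the list of key names, and the function name are assumptions and do not reproduce the rules implemented in log_to_db.py.

```python
import re

MAX_LENGTH = 10_000  # truncation limit stated above

# Hypothetical pattern: a sensitive key name, a separator, then the value.
_SECRET_PATTERN = re.compile(
    r"\b(password|passwd|api[_-]?key|token|secret)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def sanitize(text: str, max_length: int = MAX_LENGTH) -> str:
    """Mask secret values with [REDACTED], then cut the text to max_length."""
    redacted = _SECRET_PATTERN.sub(
        lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
    )
    return redacted[:max_length]
```

Redaction runs before truncation so that a secret straddling the cut-off point cannot survive in partial form.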

Navigation

CSS file	/public/css/nav.css
JavaScript	/public/js/app.js
Layout	/src/View/layout.php
Mobile breakpoint	768px

Structure

The navigation is divided into three main areas:

Area	Audience	Contents
Anwendungen	End users	KI-Chat, Content Studio, Semantic Explorer, Nextcloud
Entwicklung	Developers	Tasks, Protokoll, Contracts, System Explorer
Ressourcen	Everyone	Dokumentation, File Backup

CSS architecture

The navigation uses CSS variables in accordance with CSS Contract v1.0:

:root {
    --nav-bg: #2c3e50;
    --nav-text: #fff;
    --nav-hover-bg: rgba(255, 255, 255, 0.1);
    --nav-dropdown-bg: #fff;
    --nav-dropdown-text: #333;
    --nav-dropdown-hover-bg: #f8f9fa;
    --nav-dropdown-border: #eee;
    --nav-focus-color: #3498db;
}

Accessibility

Implemented WCAG 2.1 AA features:

Feature	Implementation	WCAG
Focus states	2px solid outline on all interactive elements	2.4.7
Contrast	White on #2c3e50 ≈ 11.0:1	1.4.3
ARIA	aria-label, aria-expanded on the toggle	4.1.2
Keyboard	Tab navigation, Enter/Space for dropdowns	2.1.1
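
The contrast value in the table can be re-checked with the WCAG 2.x relative-luminance formula. The sketch below is a standalone calculation; the function names are hypothetical.

```python
def _linearize(channel: int) -> float:
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a #rgb or #rrggbb color."""
    h = hex_color.lstrip("#")
    if len(h) == 3:  # expand shorthand such as #fff
        h = "".join(ch * 2 for ch in h)
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linearize(r) + 0.7152 * _linearize(g) + 0.0722 * _linearize(b)


def contrast_ratio(foreground: str, background: str) -> float:
    """WCAG contrast ratio between two colors (1.0 to 21.0)."""
    lighter, darker = sorted(
        (relative_luminance(foreground), relative_luminance(background)),
        reverse=True,
    )
    return (lighter + 0.05) / (darker + 0.05)
```

Calling `contrast_ratio("#fff", "#2c3e50")` gives a value well above the 4.5:1 minimum that success criterion 1.4.3 requires for normal text.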

Mobile navigation

At viewport width ≤ 768px:

The hamburger menu button appears
Navigation items are stacked vertically
Dropdowns open on click instead of hover
Full width for touch targets

JavaScript

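// Mobile toggle: open/close the menu and mirror the state in aria-expanded
const navToggle = document.querySelector(".nav-toggle");
const navItems = document.querySelector(".nav-items");
navToggle.addEventListener("click", function () {
    const isOpen = navItems.classList.toggle("open");
    navToggle.setAttribute("aria-expanded", String(isOpen));
});

// Dropdown toggle (mobile): open on click instead of hover
document.querySelectorAll(".nav-dropdown-btn").forEach(function (btn) {
    btn.addEventListener("click", function () {
        if (window.innerWidth <= 768) {
            btn.parentElement.classList.toggle("active");
        }
    });
});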

Files

File	Purpose
/public/css/nav.css	All navigation styles (dedicated file)
/public/js/app.js	Mobile toggle + dropdown logic
/src/View/layout.php	HTML structure of the navigation

HTML structure

<nav class="main-nav">
    <a href="/" class="nav-brand">Campus</a>
    
    <button class="nav-toggle" aria-label="Navigation">
        <span class="nav-toggle-icon"></span>
    </button>
    
    <div class="nav-items">
        <!-- Anwendungen -->
        <div class="nav-dropdown">
            <button class="nav-dropdown-btn">Anwendungen</button>
            <div class="nav-dropdown-content">
                <a href="/chat">KI-Chat</a>
                <a href="/content">Content Studio</a>
                <a href="/semantic-explorer">Semantic Explorer</a>
            </div>
        </div>
        
        <!-- Entwicklung -->
        <div class="nav-dropdown">
            <button class="nav-dropdown-btn">Entwicklung</button>
            <div class="nav-dropdown-content">
                <a href="/tasks">Tasks</a>
                <a href="/protokoll">Protokoll</a>
                <a href="/contracts">Contracts</a>
            </div>
        </div>
        
        <!-- Ressourcen -->
        <div class="nav-dropdown">
            <button class="nav-dropdown-btn">Ressourcen</button>
            <div class="nav-dropdown-content">
                <a href="/docs">Dokumentation</a>
                <a href="/backup-restore">File Backup</a>
            </div>
        </div>
    </div>
</nav>

Change history

Date	Change
2025-12-21	Ressourcen menu: File Backup added
2025-12-20	Initial: dropdown navigation with mobile support created
2025-12-20	nav.css moved out into a dedicated file
2025-12-20	/explorer route removed (redundant)


RAG Collections

Created: 2025-12-21 | Updated: 2025-12-31

Management of the Qdrant collections used for RAG search in Chat and Content Studio.

Database	ki_dev
Table	rag_collections
Repository	/src/Infrastructure/Persistence/CollectionRepository.php
Qdrant API	http://localhost:6333

Purpose

The rag_collections table synchronizes metadata of Qdrant collections with the application:

Collection selection in Chat and Content Studio
Validation of embedding dimensions
Display of collection statistics
Control of searchability (is_searchable)

Schema

Column	Type	Default	Description
id	INT AUTO_INCREMENT	-	Primary key
collection_id	VARCHAR(100) UNIQUE	-	Qdrant collection name
display_name	VARCHAR(100)	-	Display name in the UI
description	TEXT	NULL	Description
vector_size	INT	NULL	Embedding dimension (e.g. 1024)
distance_metric	VARCHAR(20)	NULL	Cosine, Euclidean, Dot
points_count	INT	0	Number of vectors
embedding_model	VARCHAR(100)	NULL	e.g. mxbai-embed-large
chunk_size	INT	NULL	Chunk size in characters
chunk_overlap	INT	NULL	Chunk overlap
source_type	ENUM	'manual'	nextcloud, mail, manual, system
source_path	VARCHAR(500)	NULL	Source path (e.g. Nextcloud folder)
is_active	TINYINT(1)	1	Collection is active
is_searchable	TINYINT(1)	1	Available in search
sort_order	INT	0	Sort order in the UI
last_synced_at	DATETIME	NULL	Last synchronization
created_at	DATETIME	CURRENT_TIMESTAMP	Creation timestamp
updated_at	DATETIME	CURRENT_TIMESTAMP	Last modification

Current collections (Qdrant)

Collections are defined in /var/www/scripts/pipeline/config.py:

collection_id	display_name	vector_size	Purpose
documents	Dokumente	1024	Nextcloud documents (PDF, DOCX, etc.)
mail	E-Mails	1024	E-mail archive
entities	Entitäten	1024	Extracted entities with embeddings

Repository methods

// Fetch all searchable collections
$collections = $collectionRepository->getSearchable();

// Collection details
$collection = $collectionRepository->findById('documents');

// Refresh statistics (Qdrant sync)
$collectionRepository->syncFromQdrant();
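
What syncFromQdrant() does internally is not shown in this document. As a rough illustration only, the Python sketch below maps the `result` object of a Qdrant collection-info response (`GET /collections/{name}`) to the statistics columns of rag_collections. The function name and the sample payload are assumptions, and only collections with a single unnamed vector configuration are handled.

```python
from datetime import datetime, timezone
from typing import Optional


def to_rag_collection_row(collection_id: str, info: dict,
                          now: Optional[datetime] = None) -> dict:
    """Map a Qdrant collection-info `result` object to rag_collections columns."""
    vectors = info["config"]["params"]["vectors"]
    synced = now or datetime.now(timezone.utc)
    return {
        "collection_id": collection_id,
        "vector_size": vectors["size"],
        "distance_metric": vectors["distance"],
        "points_count": info.get("points_count") or 0,
        # Formatted for a DATETIME column.
        "last_synced_at": synced.strftime("%Y-%m-%d %H:%M:%S"),
    }


# Hand-written sample payload for illustration, not real data from the system.
sample = {
    "status": "green",
    "points_count": 1234,
    "config": {"params": {"vectors": {"size": 1024, "distance": "Cosine"}}},
}
row = to_rag_collection_row("documents", sample)
print(row["vector_size"], row["distance_metric"], row["points_count"])
```

Keeping the mapping as a pure function separates it from the HTTP call and the database write, so it can be tested without a running Qdrant instance.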

Validation

For a multi-collection search, all selected collections must have the same embedding dimension:

$validator = new CollectionValidator($collectionRepository);
$result = $validator->validateSelection(['documents', 'entities']);

if (!$result->isValid()) {
    throw new \InvalidArgumentException($result->getError());
}
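
The CollectionValidator implementation itself is not reproduced here. The rule it enforces can be sketched in Python as follows; the function, its return shape, and the made-up `legacy` collection with 768 dimensions are assumptions for illustration.

```python
from typing import Optional


def validate_selection(selected: list[str],
                       vector_sizes: dict[str, int]) -> tuple[bool, Optional[str]]:
    """Return (True, None) if all selected collections share one vector size."""
    if not selected:
        return False, "No collection selected"
    unknown = [name for name in selected if name not in vector_sizes]
    if unknown:
        return False, f"Unknown collections: {', '.join(unknown)}"
    sizes = {vector_sizes[name] for name in selected}
    if len(sizes) > 1:
        details = ", ".join(f"{name}={vector_sizes[name]}" for name in selected)
        return False, f"Mixed embedding dimensions: {details}"
    return True, None


# Sizes as listed for the current collections; "legacy" is a hypothetical entry.
sizes = {"documents": 1024, "mail": 1024, "entities": 1024, "legacy": 768}
print(validate_selection(["documents", "entities"], sizes))
print(validate_selection(["documents", "legacy"], sizes))
```

Mixing dimensions has to be rejected before the search because a query embedding of one size cannot be compared against vectors of another size.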

UI integration

Collections are shown as a checkbox group in Chat and Content Studio:

<?php
$selected = ['documents'];
$variant = 'checkbox';
include __DIR__ . '/../partials/form/collections-select.php';
?>

Change history

Date	Change
2025-12-31	Collections corrected: entities added, dokumentation removed (does not exist in Qdrant)
2025-12-21	Initial version


Content Config

Created: 2025-12-21 | Updated: 2025-12-31

Unified configuration for author profiles, contracts, structures, and system configurations in Content Studio.

Database	ki_content
Table	content_config
History	content_config_history
Repository	/src/Infrastructure/Persistence/ContentRepository.php

Concept

The content_config table unifies several configuration types:

Type	Purpose
author_profile	Writing style and tone
contract	Quality requirements
structure	Outline templates
organization	Organization/customer data
rule	Individual rules for contracts
system_prompt	System prompts for LLM calls
critic	Critic configurations for content review

Schema: content_config

Column	Type	Default	Description
id	INT AUTO_INCREMENT	-	Primary key
type	ENUM	-	author_profile, structure, organization, contract, rule, system_prompt, critic
name	VARCHAR(100)	-	Display name
slug	VARCHAR(100)	-	URL-friendly identifier
description	TEXT	NULL	Short description
content	LONGTEXT	-	JSON configuration
version	VARCHAR(20)	'1.0'	Version number
status	ENUM	'draft'	draft, active, deprecated
parent_id	INT	NULL	FK for inheritance
prompt_id	INT	NULL	FK to the prompts table
sort_order	INT	0	Sort order
created_at	DATETIME	CURRENT_TIMESTAMP	Creation timestamp
updated_at	DATETIME	CURRENT_TIMESTAMP	Last modification

Config types

author_profile

Defines writing style and tone for content generation:

{
    "tone": "didaktisch",
    "style": "erklärend",
    "vocabulary": "fachlich",
    "target_audience": "Coaches und Berater",
    "examples": true
}

contract

Quality requirements and rules:

{
    "min_words": 500,
    "max_words": 2000,
    "required_sections": ["Einleitung", "Hauptteil", "Fazit"],
    "forbidden_phrases": ["offensichtlich", "natürlich"],
    "citation_style": "Harvard"
}
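
How Content Studio enforces a contract is not specified in this document. As a hedged illustration, a Python checker for the fields in the example above could look like the sketch below. The function name, the whitespace-based word count, the substring test for sections, and the whole-word phrase match are simplifying assumptions.

```python
import json
import re


def check_contract(text: str, contract: dict) -> list[str]:
    """Return human-readable violations; an empty list means the text passes."""
    violations = []
    word_count = len(text.split())
    if "min_words" in contract and word_count < contract["min_words"]:
        violations.append(f"too short: {word_count} < {contract['min_words']} words")
    if "max_words" in contract and word_count > contract["max_words"]:
        violations.append(f"too long: {word_count} > {contract['max_words']} words")
    for section in contract.get("required_sections", []):
        if section not in text:
            violations.append(f"missing section: {section}")
    for phrase in contract.get("forbidden_phrases", []):
        if re.search(rf"\b{re.escape(phrase)}\b", text, flags=re.IGNORECASE):
            violations.append(f"forbidden phrase: {phrase}")
    return violations


# Same shape as the contract example; min_words is lowered so the short demo text fits.
contract = json.loads("""
{
    "min_words": 5,
    "max_words": 2000,
    "required_sections": ["Einleitung", "Hauptteil", "Fazit"],
    "forbidden_phrases": ["offensichtlich", "natürlich"]
}
""")
draft = "Einleitung\nDas ist natürlich ein Test.\nHauptteil\nMehr Text folgt hier."
print(check_contract(draft, contract))
```

Returning a list of violations instead of raising on the first one lets a caller report every problem with a draft in a single pass.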

structure

Outline templates:

{
    "format": "blog_article",
    "sections": [
        {"name": "hook", "required": true},
        {"name": "problem", "required": true},
        {"name": "solution", "required": true},
        {"name": "cta", "required": false}
    ]
}
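
One way such a template might be applied is sketched below in Python, under the assumption that a draft is represented as an ordered list of section names. The helper name and the result keys are hypothetical.

```python
def check_structure(structure: dict, draft_sections: list[str]) -> dict:
    """Compare a draft's section names against a structure template."""
    template_names = [s["name"] for s in structure["sections"]]
    required = [s["name"] for s in structure["sections"] if s.get("required")]
    known_in_draft = [name for name in draft_sections if name in template_names]
    return {
        "missing": [name for name in required if name not in draft_sections],
        "unknown": [name for name in draft_sections if name not in template_names],
        # True if the known sections appear in the same relative order as in the template.
        "in_order": known_in_draft == [n for n in template_names if n in known_in_draft],
    }


structure = {
    "format": "blog_article",
    "sections": [
        {"name": "hook", "required": True},
        {"name": "problem", "required": True},
        {"name": "solution", "required": True},
        {"name": "cta", "required": False},
    ],
}
print(check_structure(structure, ["hook", "problem"]))
print(check_structure(structure, ["hook", "solution", "problem", "appendix"]))
```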

organization

Organization and customer data for personalized content:

{
    "name": "Campus am See",
    "domain": "Systemische Beratung",
    "tone_preferences": ["professionell", "warmherzig"],
    "brand_keywords": ["Coaching", "Systemisch", "Entwicklung"]
}

rule

Individual rules that can be referenced in contracts:

{
    "rule_type": "forbidden_phrase",
    "pattern": "offensichtlich|natürlich|selbstverständlich",
    "severity": "warning",
    "message": "Vermeiden Sie Füllwörter"
}
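
As a sketch of how such a rule might be evaluated (an assumption, since the rule engine is not documented here), the `pattern` field can be treated as a regular expression and applied to the text. The function name and the shape of the findings are hypothetical.

```python
import re


def apply_rule(text: str, rule: dict) -> list[dict]:
    """Return one finding per regex match of rule['pattern'] in text."""
    if rule.get("rule_type") != "forbidden_phrase":
        return []  # other rule types are out of scope for this sketch
    pattern = re.compile(rule["pattern"], flags=re.IGNORECASE)
    return [
        {
            "match": m.group(0),
            "offset": m.start(),
            "severity": rule["severity"],
            "message": rule["message"],
        }
        for m in pattern.finditer(text)
    ]


rule = {
    "rule_type": "forbidden_phrase",
    "pattern": "offensichtlich|natürlich|selbstverständlich",
    "severity": "warning",
    "message": "Vermeiden Sie Füllwörter",
}
findings = apply_rule("Das ist offensichtlich und natürlich richtig.", rule)
for finding in findings:
    print(finding["severity"], finding["offset"], finding["match"])
```

Reporting the character offset alongside the match would allow a UI to highlight the exact position of each finding.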

system_prompt

System prompts for different LLM use cases:

{
    "purpose": "rag_chat",
    "template": "Du bist ein hilfreicher Assistent...",
    "variables": ["context", "question"],
    "model_hints": {"temperature": 0.7}
}
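
The placeholder syntax of `template` is not specified in this document. Assuming `{name}`-style placeholders, which is purely an assumption, rendering could be sketched in Python with `string.Formatter` so that undeclared or missing variables are caught before the LLM call. The template text below is made up for the example.

```python
from string import Formatter


def render_prompt(config: dict, values: dict) -> str:
    """Fill the template after checking it against the declared variables."""
    declared = set(config["variables"])
    # Collect the placeholder names that actually occur in the template.
    used = {name for _, name, _, _ in Formatter().parse(config["template"]) if name}
    undeclared = used - declared
    if undeclared:
        raise ValueError(f"template uses undeclared variables: {sorted(undeclared)}")
    missing = declared - set(values)
    if missing:
        raise ValueError(f"missing values for: {sorted(missing)}")
    return config["template"].format(**{name: values[name] for name in used})


# Hypothetical config; only the key names follow the example above.
config = {
    "purpose": "rag_chat",
    "template": "Context:\n{context}\n\nQuestion: {question}",
    "variables": ["context", "question"],
    "model_hints": {"temperature": 0.7},
}
print(render_prompt(config, {"context": "Qdrant stores embeddings.",
                             "question": "Where are vectors stored?"}))
```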

critic

Critic configurations for automated content review:

{
    "focus": "clarity",
    "criteria": ["Verständlichkeit", "Struktur", "Zielgruppenpassung"],
    "severity_threshold": "medium",
    "auto_apply": false
}

Repository methods

// All active profiles
$profiles = $contentRepository->findAllProfiles();

// All active contracts
$contracts = $contentRepository->findAllContracts();

// All active structures
$structures = $contentRepository->findAllStructures();

// Fetch by type
$critics = $contentRepository->findByType('critic');

// SQL equivalent
SELECT * FROM content_config 
WHERE type = 'author_profile' AND status = 'active'
ORDER BY name;

Usage in content orders

Content orders reference config entries via foreign keys:

content_orders.author_profile_id → content_config.id (type='author_profile')
content_orders.contract_id → content_config.id (type='contract')
content_orders.structure_id → content_config.id (type='structure')
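
A single-column foreign key to content_config.id cannot by itself guarantee that the referenced row has the expected type. A minimal application-level check might look like this in Python; the function, the in-memory mapping from id to type, and the sample ids are illustrative assumptions, not the project's actual code.

```python
EXPECTED_TYPES = {
    "author_profile_id": "author_profile",
    "contract_id": "contract",
    "structure_id": "structure",
}


def check_order_references(order: dict, config_types: dict[int, str]) -> list[str]:
    """Verify that each reference in a content order points to a row of the right type."""
    errors = []
    for column, expected in EXPECTED_TYPES.items():
        config_id = order.get(column)
        if config_id is None:
            continue  # unset reference; whether NULL is allowed is not documented
        actual = config_types.get(config_id)
        if actual is None:
            errors.append(f"{column}={config_id}: no such content_config row")
        elif actual != expected:
            errors.append(f"{column}={config_id}: type is '{actual}', expected '{expected}'")
    return errors


# Made-up ids: content_config.id -> content_config.type
config_types = {1: "author_profile", 2: "contract", 3: "structure", 4: "critic"}
print(check_order_references(
    {"author_profile_id": 1, "contract_id": 2, "structure_id": 3}, config_types))
print(check_order_references(
    {"author_profile_id": 4, "contract_id": 2, "structure_id": 9}, config_types))
```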

Settings persistence

The most recently used settings are applied as defaults for the next order:

$lastSettings = $contentRepository->getLastOrderSettings();
// Returns: model, collections, context_limit, author_profile_id, contract_id, structure_id

Change history

Date	Change
2025-12-31	Missing config types documented: organization, rule, system_prompt, critic
2025-12-21	Initial version (replaces author_profiles, content_contracts, content_structures)
