{
"event": "PreToolUse",
"tool_name": "mcp__mcp-tasks__tasks_create",
"tool_input": {
"title": "5\/6 Python-Executor: Implementierung der 6 neuen step_types",
"description": "## Ziel\nPython-Executor in \/opt\/scripts\/pipeline erweitern um die 6 neuen step_types.\n\n## Neue Step-Types zu implementieren\n\n### 1. duplicate_check\n```python\ndef step_duplicate_check(doc_data, config):\n \"\"\"Prüft auf Duplikate via Content-Hash.\"\"\"\n content_hash = doc_data.get('content_hash')\n if not content_hash:\n return {'status': 'skip', 'reason': 'no_hash'}\n \n # Query: SELECT id FROM documents WHERE content_hash = ? AND id != ?\n existing = db.query(...)\n if existing:\n return {\n 'status': 'abort', # WICHTIG: Pipeline abbrechen!\n 'reason': 'duplicate',\n 'duplicate_id': existing['id']\n }\n return {'status': 'ok'}\n```\n\n### 2. text_semantic_analyze (LLM)\n```python\ndef step_text_semantic_analyze(chunks, config):\n \"\"\"Analysiert WIE etwas gesagt wird.\"\"\"\n prompt = '''Analysiere den folgenden Text semantisch:\n \n {chunk_content}\n \n Antworte im JSON-Format:\n {\n \"statement_form\": \"assertion|question|command|hypothesis\",\n \"intent\": \"informing|persuading|instructing|evaluating\",\n \"frame\": \"theoretical|practical|normative\",\n \"is_negated\": true|false,\n \"discourse_role\": \"definition|example|argument|conclusion\"\n }'''\n \n for chunk in chunks:\n response = ollama.generate(model=config['model'], prompt=prompt.format(...))\n chunk['text_semantics'] = json.loads(response)\n return chunks\n```\n\n### 3. text_semantic_store\n```python\ndef step_text_semantic_store(chunks, config):\n \"\"\"Speichert Textsemantik in DB.\"\"\"\n for chunk in chunks:\n if 'text_semantics' not in chunk:\n continue\n sem = chunk['text_semantics']\n db.insert('chunk_text_semantics', {\n 'chunk_id': chunk['id'],\n 'statement_form': sem['statement_form'],\n 'intent': sem['intent'],\n 'frame': sem['frame'],\n 'is_negated': sem['is_negated'],\n 'discourse_role': sem['discourse_role']\n })\n return {'stored': len(chunks)}\n```\n\n### 4. knowledge_semantic_analyze (LLM)\n```python\ndef step_knowledge_semantic_analyze(entities, config):\n \"\"\"Analysiert WAS Entitäten BEDEUTEN.\"\"\"\n prompt = '''Analysiere die Bedeutung dieser Entität im Kontext:\n \n Entität: {entity_name} (Typ: {entity_type})\n Kontext: {chunk_content}\n \n Antworte im JSON-Format:\n {\n \"semantic_role\": \"agent|patient|instrument|location|source|goal\",\n \"properties\": {\n \"temporal\": \"past|present|future|atemporal\",\n \"spatial\": \"local|regional|global|abstract\",\n \"modal\": \"factual|possible|necessary|hypothetical\"\n },\n \"functional_category\": \"process|state|event|quality\",\n \"context_meaning\": \"Kurze Erklärung der Bedeutung im Kontext\"\n }'''\n \n for entity in entities:\n response = ollama.generate(...)\n entity['knowledge_semantics'] = json.loads(response)\n return entities\n```\n\n### 5. knowledge_semantic_store\n```python\ndef step_knowledge_semantic_store(entities, config):\n \"\"\"Speichert Wissenssemantik in DB.\"\"\"\n for entity in entities:\n if 'knowledge_semantics' not in entity:\n continue\n sem = entity['knowledge_semantics']\n db.insert('entity_knowledge_semantics', {\n 'entity_id': entity['id'],\n 'chunk_id': entity['chunk_id'],\n 'semantic_role': sem['semantic_role'],\n 'properties': json.dumps(sem['properties']),\n 'functional_category': sem['functional_category'],\n 'context_meaning': sem['context_meaning']\n })\n return {'stored': len(entities)}\n```\n\n### 6. ontology_store\n```python\ndef step_ontology_store(classifications, config):\n \"\"\"Speichert Ontologie-Klassifikationen.\"\"\"\n for cls in classifications:\n db.insert('ontology_classes', {\n 'entity_id': cls['entity_id'],\n 'class_name': cls['class_name'],\n 'parent_class': cls.get('parent_class'),\n 'confidence': cls.get('confidence', 1.0)\n })\n return {'stored': len(classifications)}\n```\n\n## Dateien zu bearbeiten\n- `\/opt\/scripts\/pipeline\/steps\/__init__.py` - Step-Registry\n- `\/opt\/scripts\/pipeline\/steps\/semantic.py` - Neue Datei für Semantik-Steps\n- `\/opt\/scripts\/pipeline\/steps\/knowledge.py` - Neue Datei für Wissens-Steps\n- `\/opt\/scripts\/pipeline\/executor.py` - Step-Dispatcher erweitern\n\n## Abhängigkeiten\n- Task #493: DB-Tabellen müssen existieren\n- Task #494: ENUMs müssen existieren\n- LLM-Modell: mistral oder llama3",
"type": "ai_task",
"parent_task_id": 492
}
}
{
"tool_response": [
{
"type": "text",
"text": "{\n \"success\": true,\n \"task\": {\n \"id\": 497,\n \"uuid\": \"b265c524-ca27-4dbf-9cde-d94f594464aa\",\n \"title\": \"5\/6 Python-Executor: Implementierung der 6 neuen step_types\",\n \"description\": \"## Ziel\\nPython-Executor in \/opt\/scripts\/pipeline erweitern um die 6 neuen step_types.\\n\\n## Neue Step-Types zu implementieren\\n\\n### 1. duplicate_check\\n```python\\ndef step_duplicate_check(doc_data, config):\\n \\\"\\\"\\\"Prüft auf Duplikate via Content-Hash.\\\"\\\"\\\"\\n content_hash = doc_data.get('content_hash')\\n if not content_hash:\\n return {'status': 'skip', 'reason': 'no_hash'}\\n \\n # Query: SELECT id FROM documents WHERE content_hash = ? AND id != ?\\n existing = db.query(...)\\n if existing:\\n return {\\n 'status': 'abort', # WICHTIG: Pipeline abbrechen!\\n 'reason': 'duplicate',\\n 'duplicate_id': existing['id']\\n }\\n return {'status': 'ok'}\\n```\\n\\n### 2. text_semantic_analyze (LLM)\\n```python\\ndef step_text_semantic_analyze(chunks, config):\\n \\\"\\\"\\\"Analysiert WIE etwas gesagt wird.\\\"\\\"\\\"\\n prompt = '''Analysiere den folgenden Text semantisch:\\n \\n {chunk_content}\\n \\n Antworte im JSON-Format:\\n {\\n \\\"statement_form\\\": \\\"assertion|question|command|hypothesis\\\",\\n \\\"intent\\\": \\\"informing|persuading|instructing|evaluating\\\",\\n \\\"frame\\\": \\\"theoretical|practical|normative\\\",\\n \\\"is_negated\\\": true|false,\\n \\\"discourse_role\\\": \\\"definition|example|argument|conclusion\\\"\\n }'''\\n \\n for chunk in chunks:\\n response = ollama.generate(model=config['model'], prompt=prompt.format(...))\\n chunk['text_semantics'] = json.loads(response)\\n return chunks\\n```\\n\\n### 3. text_semantic_store\\n```python\\ndef step_text_semantic_store(chunks, config):\\n \\\"\\\"\\\"Speichert Textsemantik in DB.\\\"\\\"\\\"\\n for chunk in chunks:\\n if 'text_semantics' not in chunk:\\n continue\\n sem = chunk['text_semantics']\\n db.insert('chunk_text_semantics', {\\n 'chunk_id': chunk['id'],\\n 'statement_form': sem['statement_form'],\\n 'intent': sem['intent'],\\n 'frame': sem['frame'],\\n 'is_negated': sem['is_negated'],\\n 'discourse_role': sem['discourse_role']\\n })\\n return {'stored': len(chunks)}\\n```\\n\\n### 4. knowledge_semantic_analyze (LLM)\\n```python\\ndef step_knowledge_semantic_analyze(entities, config):\\n \\\"\\\"\\\"Analysiert WAS Entitäten BEDEUTEN.\\\"\\\"\\\"\\n prompt = '''Analysiere die Bedeutung dieser Entität im Kontext:\\n \\n Entität: {entity_name} (Typ: {entity_type})\\n Kontext: {chunk_content}\\n \\n Antworte im JSON-Format:\\n {\\n \\\"semantic_role\\\": \\\"agent|patient|instrument|location|source|goal\\\",\\n \\\"properties\\\": {\\n \\\"temporal\\\": \\\"past|present|future|atemporal\\\",\\n \\\"spatial\\\": \\\"local|regional|global|abstract\\\",\\n \\\"modal\\\": \\\"factual|possible|necessary|hypothetical\\\"\\n },\\n \\\"functional_category\\\": \\\"process|state|event|quality\\\",\\n \\\"context_meaning\\\": \\\"Kurze Erklärung der Bedeutung im Kontext\\\"\\n }'''\\n \\n for entity in entities:\\n response = ollama.generate(...)\\n entity['knowledge_semantics'] = json.loads(response)\\n return entities\\n```\\n\\n### 5. knowledge_semantic_store\\n```python\\ndef step_knowledge_semantic_store(entities, config):\\n \\\"\\\"\\\"Speichert Wissenssemantik in DB.\\\"\\\"\\\"\\n for entity in entities:\\n if 'knowledge_semantics' not in entity:\\n continue\\n sem = entity['knowledge_semantics']\\n db.insert('entity_knowledge_semantics', {\\n 'entity_id': entity['id'],\\n 'chunk_id': entity['chunk_id'],\\n 'semantic_role': sem['semantic_role'],\\n 'properties': json.dumps(sem['properties']),\\n 'functional_category': sem['functional_category'],\\n 'context_meaning': sem['context_meaning']\\n })\\n return {'stored': len(entities)}\\n```\\n\\n### 6. ontology_store\\n```python\\ndef step_ontology_store(classifications, config):\\n \\\"\\\"\\\"Speichert Ontologie-Klassifikationen.\\\"\\\"\\\"\\n for cls in classifications:\\n db.insert('ontology_classes', {\\n 'entity_id': cls['entity_id'],\\n 'class_name': cls['class_name'],\\n 'parent_class': cls.get('parent_class'),\\n 'confidence': cls.get('confidence', 1.0)\\n })\\n return {'stored': len(classifications)}\\n```\\n\\n## Dateien zu bearbeiten\\n- `\/opt\/scripts\/pipeline\/steps\/__init__.py` - Step-Registry\\n- `\/opt\/scripts\/pipeline\/steps\/semantic.py` - Neue Datei für Semantik-Steps\\n- `\/opt\/scripts\/pipeline\/steps\/knowledge.py` - Neue Datei für Wissens-Steps\\n- `\/opt\/scripts\/pipeline\/executor.py` - Step-Dispatcher erweitern\\n\\n## Abhängigkeiten\\n- Task #493: DB-Tabellen müssen existieren\\n- Task #494: ENUMs müssen existieren\\n- LLM-Modell: mistral oder llama3\",\n \"type\": \"ai_task\",\n \"status\": \"pending\",\n \"created_by\": \"mcp-tasks\",\n \"created_by_type\": \"ai\",\n \"parent_task_id\": 492,\n \"due_date\": null,\n \"created_at\": \"2025-12-28T01:33:12.736956\",\n \"updated_at\": \"2025-12-28T01:33:12.736959\",\n \"completed_at\": null,\n \"metadata\": {}\n },\n \"message\": \"Task #497 created\"\n}"
}
]
}