Log entry #7769

ID: 7769
Timestamp: 2025-12-23 08:33:12.080159
Client: root
IP: 145.224.96.190
Model: claude-sonnet-4-20250514
Status: completed
Tokens: 1,030 (input: 35, output: 995)
Duration: 32,445 ms
Request time: 2025-12-23 08:33:12.080159
Response time: 2025-12-23 08:33:44.524765

Request

{
    "event": "PreToolUse",
    "tool_name": "WebSearch",
    "tool_input": {
        "query": "ollama vision models list 2024 comparison llava minicpm moondream"
    }
}
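
The request above is a PreToolUse hook event: it names the tool about to run ("WebSearch") and its input. The following is a minimal sketch of a handler for such an event, assuming the JSON arrives on stdin with the field names exactly as logged ("event", "tool_name", "tool_input"); the exit-code convention and the logging policy are illustrative assumptions, not taken from this log.

#!/usr/bin/env python3
# Sketch of a PreToolUse hook handler. Assumes the event JSON shown
# above is delivered on stdin; exit code 0 lets the tool call proceed.
# Adjust the contract to whatever your hook runner actually expects.
import json
import sys

def main() -> int:
    event = json.load(sys.stdin)

    if event.get("event") != "PreToolUse":
        return 0  # not a pre-tool event, nothing to do

    tool = event.get("tool_name", "")
    tool_input = event.get("tool_input", {})

    # Example policy: log every WebSearch query before it runs.
    if tool == "WebSearch":
        query = tool_input.get("query", "")
        print(f"WebSearch about to run: {query!r}", file=sys.stderr)

    return 0  # allow the tool call

if __name__ == "__main__":
    sys.exit(main())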

Response

{
    "tool_response": {
        "query": "ollama vision models list 2024 comparison llava minicpm moondream",
        "results": [
            {
                "tool_use_id": "srvtoolu_01C6EAbG9bKZbKqdFEnX9VhT",
                "content": [
                    {
                        "title": "Vision models · Ollama Search",
                        "url": "https:\/\/ollama.com\/search?c=vision"
                    },
                    {
                        "title": "library",
                        "url": "https:\/\/ollama.com\/library"
                    },
                    {
                        "title": "Vision Models | imanoop7\/Ollama-OCR | DeepWiki",
                        "url": "https:\/\/deepwiki.com\/imanoop7\/Ollama-OCR\/7-vision-models"
                    },
                    {
                        "title": "Vision models · Ollama Blog",
                        "url": "https:\/\/ollama.com\/blog\/vision-models"
                    },
                    {
                        "title": "Multimodal LLMs on a Mac M1: A Quick Test | by Prashant Dandriyal | Medium",
                        "url": "https:\/\/prashantdandriyal.medium.com\/multimodal-llms-on-a-mac-m1-a-quick-test-5397bd33a6b6"
                    },
                    {
                        "title": "GitHub - imanoop7\/Ollama-OCR",
                        "url": "https:\/\/github.com\/imanoop7\/Ollama-OCR"
                    },
                    {
                        "title": "Thought Eddies | Multi-Modal Models with ollama",
                        "url": "https:\/\/www.danielcorin.com\/til\/ollama\/multi-modal-models\/"
                    },
                    {
                        "title": "Support newer vision models from ollama · Issue #241 · wandb\/openui",
                        "url": "https:\/\/github.com\/wandb\/openui\/issues\/241"
                    },
                    {
                        "title": "Vision Models in Ollama: AI Insights 2025",
                        "url": "https:\/\/www.byteplus.com\/en\/topic\/418117"
                    },
                    {
                        "title": "The Latest Ollama Models in 2025 Update",
                        "url": "https:\/\/www.elightwalk.com\/blog\/latest-ollama-models"
                    }
                ]
            },
            "Based on the search results, here's an overview of Ollama vision models and their comparisons:\n\n## Ollama Vision Models Overview\n\n### Available Vision Models\n\nLLaVA is a novel end-to-end trained large multimodal model that combines a vision encoder and Vicuna for general-purpose visual and language understanding.(no content)The LLaVA model collection has been updated to version 1.6, supporting higher image resolution with up to 4x more pixels, allowing the model to grasp more details.(no content)LLaVA offers 3 variants of weights: 7B, 13B, and 34B.(no content)Moondream is a small vision language model optimized for edge devices.(no content)Moondream 2 has 1.4B parameters and is only 829MB in size.(no content)It's an excellent choice for applications running on edge devices or where processing resources are limited, such as mobile applications or IoT devices.(no content)MiniCPM-V 2.6 can process images with any aspect ratio and up to 1.8 million pixels (e.g., 1344x1344).(no content)The Ollama library also includes other vision models like minicpm-v and qwen2.5vl, providing a diverse set of tools for various vision-language tasks.\n\n### Other Vision Models\n\nAdditional models include LLaVA 1.5\/1.6 (general-purpose multimodal model for visual understanding and VQA), Qwen-VL 2.5 (capable of document OCR, layout analysis, translation, and visual reasoning), and Gemma 3 (Multimodal) which accepts multiple images as input.(no content)BakLLaVA is a variation of LLaVA that is based on the Mistral 7B parameter model, offering an alternative architecture.\n\n### Model Recommendations by Use Case\n\n| Model | Size | Best For |\n|-------|------|----------|\n| LLaVA | 4.5-4.7GB | General-purpose OCR tasks where processing speed is important |\n| Llama 3.2 Vision | 11B | Processing complex documents where accuracy is critical, such as legal documents, financial statements, or technical papers |\n| Moondream 2\/1.8B | ~829MB-1.7GB | Lightweight visual models suitable for running on CPU and mobile environments |\n| MiniCPM-V | varies | High-resolution image processing |\n\n### Quick Start Commands\n\nTo pull these models: `ollama pull llama3.2-vision:11b`, `ollama pull granite3.2-vision`, `ollama pull moondream`, `ollama pull minicpm-v`\n\n### Performance Notes\n\nThese models (llava-phi3, moondream, llava-llama3, llava) are \"pretty small models and the outputs are directionally correct, but I wouldn't rely on these to get details correct.\"(no content)Note: You should have at least 8 GB of RAM available to run the 7B models, 16 GB to run the 13B models, and 32 GB to run the 33B models."
        ],
        "durationSeconds": 32.3331981099993
    }
}
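
The response summary lists pull commands (e.g. `ollama pull moondream`) but not how to invoke a vision model afterwards. Below is a minimal sketch of querying one of the named models through Ollama's local HTTP API (POST /api/generate on the default port 11434), assuming the model has already been pulled; the model name, prompt, and image path are placeholder assumptions, not part of the logged response.

# Sketch: send an image to a locally pulled Ollama vision model.
import base64
import json
import urllib.request

def describe_image(path: str, model: str = "moondream") -> str:
    # Ollama expects images as base64-encoded strings in the "images" list.
    with open(path, "rb") as fh:
        image_b64 = base64.b64encode(fh.read()).decode("ascii")

    payload = {
        "model": model,
        "prompt": "Describe this image in one sentence.",
        "images": [image_b64],
        "stream": False,  # request a single JSON reply instead of a stream
    }
    req = urllib.request.Request(
        "http://localhost:11434/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

if __name__ == "__main__":
    print(describe_image("example.jpg"))

Swapping `model` for "llava", "llama3.2-vision:11b", or "minicpm-v" follows the same call shape; per the search summary, the larger variants need correspondingly more RAM (at least 8 GB for 7B, 16 GB for 13B, 32 GB for 33B models).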