Producer/consumer pipeline orchestrator for image analysis.
Module: factory.ai/ObjAiVision.py
Class: ObjAiVision
Version: 8.0
ObjAiVision sits in the middle of an image analysis pipeline — it connects
any image producer to ObjAI's vision-model consumer interface. The
orchestrator is deliberately agnostic about which model handles each end.
Every analysis is automatically persisted to def_vision_analysis.
analyse_url(url)
  → ObjAI.capture(url, provider="playwright")
  → ObjAiMcpPlaywright.capture(url) [sync Playwright, 15 s timeout]
  → base64 PNG
  → ObjAI.prompt(image_base64=...)
  → ObjAiMcpOllama.prompt(images=[…]) [ollama.chat + llava]
  → text response + _persist_analysis()

analyse_image(path_or_b64)
  → ObjAI.prompt(image_base64=...)
  → ObjAiMcpOllama.prompt(images=[…])
  → text response + _persist_analysis()

analyse_url_diff(url)
  → ObjAI.capture(url, ...) → base64 PNG
  → _get_previous_analysis(url) → previous text
  → ObjAI.prompt(previous + new image)
  → diff description + _persist_analysis()

analyse_image_multi(path, models=[...])
  → for each model: ObjAI.prompt(...)
  → dict[model_name → text]
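
The call flows above follow a producer → consumer → persist shape. The sketch below illustrates that shape only; it is not the real ObjAiVision code. `VisionPipelineSketch`, `fake_capture`, and `fake_prompt` are hypothetical stand-ins for the Playwright producer and the Ollama consumer, and an in-memory list stands in for def_vision_analysis.

```python
import base64
from typing import Callable, Dict, List

# Hypothetical stand-ins: a producer turns a URL into a base64 image,
# a consumer turns (prompt, image) into text. Neither is the real ObjAI API.
Producer = Callable[[str], str]
Consumer = Callable[[str, str], str]


class VisionPipelineSketch:
    """Illustrative orchestrator: producer -> consumer -> persist."""

    def __init__(self, capture: Producer, prompt: Consumer) -> None:
        self._capture = capture
        self._prompt = prompt
        self.rows: List[Dict[str, str]] = []  # stands in for def_vision_analysis

    def _persist_analysis(self, url: str, prompt: str, analysis: str) -> None:
        self.rows.append({"Url": url, "Prompt": prompt, "Analysis": analysis})

    def analyse_url(self, url: str, prompt: str = "Describe this page.") -> str:
        image_b64 = self._capture(url)               # producer step
        analysis = self._prompt(prompt, image_b64)   # consumer step
        self._persist_analysis(url, prompt, analysis)
        return analysis


def fake_capture(url: str) -> str:
    """Pretend screenshot: encodes the URL bytes instead of a real PNG."""
    return base64.b64encode(url.encode("utf-8")).decode("ascii")


def fake_prompt(prompt: str, image_b64: str) -> str:
    """Pretend model: reports the size of the decoded image payload."""
    size = len(base64.b64decode(image_b64))
    return f"{prompt} [image bytes: {size}]"


pipeline = VisionPipelineSketch(capture=fake_capture, prompt=fake_prompt)
answer = pipeline.analyse_url("https://example.com")
```

Because the orchestrator only depends on the two callables, either end can be swapped without touching `analyse_url`, which is the sense in which the design is agnostic about the producer and the consumer.
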
ObjAiVision(db=0, model="llava", capture_provider="playwright")
| Parameter | Type | Default | Description |
|---|---|---|---|
| db | int | 0 | Database connection instance |
| model | str | "llava" | Vision model name (Ollama model tag) |
| capture_provider | str | "playwright" | MCP provider used for URL capture |
analyse_url(url, prompt, role) → str

Captures a URL via the configured capture_provider, then sends the
screenshot to the vision model. Persists the result.

analyse_image(image_path, prompt, role) → str

Sends an existing image to the vision model. image_path can be either a
file path or a base64-encoded image string (passed to ObjAI.prompt).

analyse_url_diff(url, prompt, role) → str

Captures the URL, retrieves the most recent stored analysis for it from
def_vision_analysis, and asks the model what has changed. Falls back to a
plain analysis if no previous record exists.
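
The fallback described for analyse_url_diff can be sketched as follows. This is a minimal illustration under stated assumptions: `_history`, `get_previous_analysis`, `analyse_url_diff_sketch`, and `stub_model` are hypothetical names, an in-memory dict replaces the def_vision_analysis lookup, and the prompt texts are placeholders.

```python
from typing import Callable, Dict, Optional

# Hypothetical in-memory store: URL -> most recent analysis text.
# The real class reads this from def_vision_analysis instead.
_history: Dict[str, str] = {}


def get_previous_analysis(url: str) -> Optional[str]:
    """Return the latest stored analysis for the URL, or None if absent."""
    return _history.get(url)


def analyse_url_diff_sketch(
    url: str,
    image_b64: str,
    ask_model: Callable[[str, str], str],
    plain_prompt: str = "Describe this page.",
    diff_prompt: str = "Describe what changed.",
) -> str:
    previous = get_previous_analysis(url)
    if previous is None:
        # No earlier record: fall back to a plain analysis.
        text = ask_model(plain_prompt, image_b64)
    else:
        # Earlier record found: give the model the old text plus the new image.
        combined = f"{diff_prompt}\n\nPrevious analysis:\n{previous}"
        text = ask_model(combined, image_b64)
    _history[url] = text  # the result becomes the "previous" text next time
    return text


def stub_model(prompt: str, image_b64: str) -> str:
    """Pretend model that only reports which branch produced the prompt."""
    return "DIFF" if "Previous analysis:" in prompt else "PLAIN"


first = analyse_url_diff_sketch("https://example.com", "aGk=", stub_model)
second = analyse_url_diff_sketch("https://example.com", "aGk=", stub_model)
```

Here the first call takes the plain branch and the second takes the diff branch, because the first call stored a result for the same URL.
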

analyse_image_multi(image_path, models, prompt, role) → dict[str, str]

Queries multiple vision models sequentially and returns all responses as a
{model_name: text} dict. Useful for comparing model outputs side-by-side.

results = vision.analyse_image_multi(
    "screenshot.png",
    models=["llava", "bakllava"],
    prompt="Describe this invoice.",
)
# {"llava": "...", "bakllava": "..."}
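
A sequential fan-out of this kind might look like the sketch below. `analyse_image_multi_sketch` and `stub_ask` are hypothetical names, and the per-model error handling is an assumption; the source does not say how the real method treats a failing model.

```python
from typing import Callable, Dict, List


def analyse_image_multi_sketch(
    image_b64: str,
    models: List[str],
    ask: Callable[[str, str, str], str],
    prompt: str = "Describe this image.",
) -> Dict[str, str]:
    """Query each model in turn and collect {model_name: text}."""
    results: Dict[str, str] = {}
    for model in models:
        try:
            results[model] = ask(model, prompt, image_b64)
        except Exception as exc:
            # Assumption: one failing model should not discard the others.
            results[model] = f"ERROR: {exc}"
    return results


def stub_ask(model: str, prompt: str, image_b64: str) -> str:
    """Pretend backend that only knows about 'llava'."""
    if model != "llava":
        raise ValueError(f"model not available: {model}")
    return f"{model} says: ok"


multi = analyse_image_multi_sketch("aGk=", ["llava", "bakllava"], stub_ask)
```

Since the models are queried one after another, total latency is roughly the sum of the individual inference times.
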
Every analyse_* call automatically writes a row to def_vision_analysis
(created on first use via create_tables_from_yaml). Schema:
| Column | Type | Description |
|---|---|---|
| _guid | VARCHAR(255) | Unique row identifier |
| Module | VARCHAR(255) | Active package name |
| Url | VARCHAR(2048) | Source URL (empty for analyse_image) |
| ImagePath | VARCHAR(2048) | File path (empty for analyse_url) |
| Prompt | TEXT | Prompt sent to the model |
| Role | TEXT | System role sent to the model |
| Analysis | LONGTEXT | Model response |
| Model | VARCHAR(255) | Ollama model tag used |
| LatencyMs | INT | Inference time in milliseconds |
| CreatedAt | DATETIME | Auto-set on insert |
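
To make the schema concrete, here is a rough sketch that writes one row with these columns. It makes several assumptions: it uses an in-memory SQLite database from the standard library rather than the project's own database layer, the column types are mapped to SQLite equivalents, and `persist_analysis` and the "factory.ai" module value are hypothetical.

```python
import sqlite3
import time
import uuid

# Assumption: SQLite stands in for the real database; types are approximated.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE def_vision_analysis (
        _guid     TEXT PRIMARY KEY,
        Module    TEXT,
        Url       TEXT,
        ImagePath TEXT,
        Prompt    TEXT,
        Role      TEXT,
        Analysis  TEXT,
        Model     TEXT,
        LatencyMs INTEGER,
        CreatedAt TEXT DEFAULT CURRENT_TIMESTAMP
    )
    """
)


def persist_analysis(conn, *, prompt, role, analysis, model, latency_ms,
                     url="", image_path="", module="factory.ai") -> str:
    """Insert one analysis row and return its generated _guid."""
    guid = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO def_vision_analysis "
        "(_guid, Module, Url, ImagePath, Prompt, Role, Analysis, Model, LatencyMs) "
        "VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
        (guid, module, url, image_path, prompt, role, analysis, model, latency_ms),
    )
    conn.commit()
    return guid


start = time.perf_counter()
analysis_text = "stub model response"  # a real call to the model would go here
latency_ms = int((time.perf_counter() - start) * 1000)

guid = persist_analysis(
    conn, url="https://example.com", prompt="What is on this page?",
    role="", analysis=analysis_text, model="llava", latency_ms=latency_ms,
)
row = conn.execute(
    "SELECT Url, ImagePath, Model, CreatedAt FROM def_vision_analysis WHERE _guid = ?",
    (guid,),
).fetchone()
```

CreatedAt is left out of the INSERT so the column default fills it in, mirroring the "auto-set on insert" behaviour in the table.
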
The following keys are available in ObjAI.yaml for targeted analysis:
| Key | Use case |
|---|---|
| VISION_DEFAULT_PROMPT | General webpage screenshot |
| VISION_IMAGE_PROMPT | Generic image file |
| VISION_LOGIN_PAGE_PROMPT | Login/auth forms |
| VISION_ERROR_PROMPT | Error banners and failure states |
| VISION_REPORT_PROMPT | Charts, KPIs, dashboards |
| VISION_CHANGE_DIFF_PROMPT | Change detection (used by analyse_url_diff) |
| VISION_INVOICE_PROMPT | Invoice data extraction |
| VISION_FORM_PROMPT | Web form field listing |
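
One way to consume these keys is a lookup that falls back to the general prompt when a key is missing. The sketch below is illustrative only: `PROMPTS` is a hypothetical dict standing in for the parsed ObjAI.yaml, the prompt texts are placeholders, and the fallback behaviour is an assumption rather than documented behaviour of ObjAI.get_ai_prompt.

```python
from typing import Dict

# Hypothetical stand-in for the parsed ObjAI.yaml; texts are placeholders.
PROMPTS: Dict[str, str] = {
    "VISION_DEFAULT_PROMPT": "Describe this webpage screenshot.",
    "VISION_INVOICE_PROMPT": "Extract the invoice number, date and total.",
    "VISION_CHANGE_DIFF_PROMPT": "Describe what changed since the last analysis.",
}


def get_prompt_or_default(
    key: str,
    prompts: Dict[str, str] = PROMPTS,
    default_key: str = "VISION_DEFAULT_PROMPT",
) -> str:
    """Return the prompt for key, or the default prompt if key is unknown."""
    return prompts.get(key, prompts[default_key])


invoice_prompt = get_prompt_or_default("VISION_INVOICE_PROMPT")
fallback_prompt = get_prompt_or_default("VISION_UNKNOWN_PROMPT")
```
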
import sys
sys.path += ["factory.core", "factory.ai"]
from ObjAiVision import ObjAiVision
import ObjAI
vision = ObjAiVision(model="llava")
# Analyse a web page
result = vision.analyse_url(
    "https://example.com",
    prompt="What is on this page?",
)
# Detect changes since last check
diff = vision.analyse_url_diff("https://example.com")
# Domain-specific prompt from YAML
invoice_prompt = ObjAI.get_ai_prompt("VISION_INVOICE_PROMPT")
result = vision.analyse_image("receipt.png", prompt=invoice_prompt)
# Fan-out across multiple models
results = vision.analyse_image_multi(
    "screenshot.png",
    models=["llava", "bakllava"],
)
| Module | Role |
|---|---|
| factory.core/ObjAI.py | Router: capture() and prompt() methods; usage log_event |
| factory.core/ObjAI.yaml | Prompt library (domain-specific prompts) |
| factory.ai/ObjAiVision.yaml | def_vision_analysis table schema |
| factory.ai/package.mcp/ObjAiMcpPlaywright.py | URL → base64 PNG (sync Playwright) |
| factory.ai/package.mcp/ObjAiMcpOllama.py | Vision LLM consumer |
| factory.ai/package.mcp/ObjAiMcpBase.py | Base class with capture() stub and usage dict |
# Ollama + llava model
ollama pull llava
# Playwright browser (sync API, no event loop conflict)
playwright install chromium
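
After running the setup commands above, a quick pre-flight check can confirm that the tools are discoverable before the first analysis. This is a hedged sketch using only the standard library; `check_vision_prerequisites` is a hypothetical helper, and it does not verify that the llava model has been pulled or that the Chromium browser binary is installed.

```python
import importlib.util
import shutil
from typing import Dict


def check_vision_prerequisites() -> Dict[str, bool]:
    """Report which prerequisites are discoverable, without starting them."""
    return {
        # The ollama command-line tool on PATH (used for `ollama pull`).
        "ollama_cli": shutil.which("ollama") is not None,
        # The Python packages providing ollama.chat and the sync Playwright API.
        "ollama_package": importlib.util.find_spec("ollama") is not None,
        "playwright_package": importlib.util.find_spec("playwright") is not None,
    }


status = check_vision_prerequisites()
missing = sorted(name for name, ok in status.items() if not ok)
```

An empty `missing` list only means the tools were found, not that a model or browser is ready to use.
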