ocr-document-processor

dkyazzentwatwa/chatgpt-skills

Extract text from images, scanned PDFs, and photographs using Optical Character Recognition (OCR). Supports multiple languages, structured output formats, and intelligent document parsing.

Installs: 1
Install command
npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill ocr-document-processor
Security audits
Gen Agent Trust Hub: PASS
Socket: PASS
Snyk: PASS
About this skill
Extract text from images, scanned PDFs, and photographs using Optical Character Recognition (OCR). Supports multiple languages, structured output formats, and intelligent document parsing. Preprocessing improves OCR accuracy on low-quality images.

Features:
- Image OCR: Extract text from PNG, JPEG, TIFF, BMP images
- PDF OCR: Process scanned PDFs page by page
- Multi-language: Support for 100+ languages
- Structured Output: Plain text, Markdown, JSON, or HTML
- Table Detection: Extract tabular data to CSV/JSON
- Batch Processing: Process multiple documents at once
- Quality Assessment: Confidence scoring for OCR results

Output includes:
- Document title (if detected)
- Structured headings
- Paragraphs
- Tables (as Markdown tables)
- Page breaks for multi-page docs

Creates styled HTML with:
- Preserved layout approximation
- Highlighted low-confidence regions
- Embedded images (optional)
- Print-friendly styling

Tips:
- Image Quality: Higher resolution (300+ DPI) improves accuracy
- Preprocessing: Use for low-quality scans
- Language: Specifying language improves speed and accuracy
- PSM Mode: Choose appropriate mode for document type
- Large Files: Process PDFs page by page for memory efficiency

Limitations:
- Handwritten text: Limited accuracy
- Complex layouts: May lose structure
- Very low quality: Preprocessing helps but has limits
- Non-Latin scripts: Require specific...
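
The confidence scoring and low-confidence highlighting mentioned above can be pictured with a small sketch. This is not the skill's actual code or API: the `OcrWord` type, the function names, the 0-100 confidence scale, and the threshold of 60 are all assumptions made for illustration, and the sample words are made up rather than real OCR output.

```python
from dataclasses import dataclass
from html import escape
from statistics import mean


@dataclass
class OcrWord:
    """One recognized word plus a confidence score (hypothetical type, 0-100 scale assumed)."""
    text: str
    confidence: float


def mean_confidence(words):
    """Average confidence across all words; 0.0 when there are no words."""
    return mean(w.confidence for w in words) if words else 0.0


def highlight_low_confidence(words, threshold=60.0):
    """Join words into an HTML string, wrapping words below `threshold` in <mark>.

    The threshold default is an arbitrary assumption, not a value from the skill.
    """
    parts = []
    for w in words:
        safe = escape(w.text)  # avoid injecting raw OCR text into HTML
        parts.append(f"<mark>{safe}</mark>" if w.confidence < threshold else safe)
    return " ".join(parts)


# Made-up sample data standing in for OCR results.
sample = [
    OcrWord("Invoice", 96.0),
    OcrWord("t0tal", 41.0),
    OcrWord("due", 88.0),
]

print(mean_confidence(sample))           # 75.0
print(highlight_low_confidence(sample))  # Invoice <mark>t0tal</mark> due
```

A page-level average like this hides individual bad words, which is why the sketch also marks each word separately; a real pipeline might use either or both.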

Source description provided by the upstream skill listing. Community reviews and install context appear in the sections below.

Community Reviews

Latest reviews

No community reviews yet. Be the first to review.

FAQ
What does ocr-document-processor do?

Extract text from images, scanned PDFs, and photographs using Optical Character Recognition (OCR). Supports multiple languages, structured output formats, and intelligent document parsing.

Is ocr-document-processor good?

ocr-document-processor does not have approved reviews yet, so SkillJury cannot publish a community verdict.

Which agents does ocr-document-processor work with?

ocr-document-processor currently lists compatibility with codex, gemini-cli, opencode, cursor, kimi-cli, and github-copilot.

What are alternatives to ocr-document-processor?

Skills in the same category include telegram-bot-builder, flutter-app-size, sharp-edges, and iterative-retrieval.

How do I install ocr-document-processor?

npx skills add https://github.com/dkyazzentwatwa/chatgpt-skills --skill ocr-document-processor
