Profile
Join date: Oct 15, 2019
Posts (13)
Jan 4, 2026 ∙ 32 min
Building an Advanced OCR System on Diverse Documents with DeepSeek and Gemma
Optical Character Recognition (OCR) has grown from simple text extraction into understanding complex documents. In this post, we'll explore how to train two cutting-edge models for OCR, DeepSeek-OCR and Gemma 3, using PyTorch on a personal, air‑gapped server. We'll cover the unique challenges of OCR on handwritten notes, printed forms, and invoices, why DeepSeek and Gemma are well suited to secure, low-resource settings, how to set up and train them offline, and how to evaluate their performance...
Nov 23, 2025 ∙ 4 min
JEPA World Models: Innovative Predictive Learning Across Images, Video, and Agents
Joint-Embedding Predictive Architectures (JEPAs) are a family of models that learn by predicting high-level features rather than pixels. They unify image-based learning (I-JEPA), video-based learning (V-JEPA), and general predictive world models for autonomous agents. If you zoom out a bit, modern self‑supervised vision methods mostly fall into two categories. Invariance-based: given two augmented views of the same image, force the encoder to produce almost identical embeddings, and push...
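To make the contrast in this excerpt concrete, here is a minimal sketch, assuming toy linear maps in place of real networks. An invariance-based loss compares the embeddings of two augmented views; a JEPA-style loss predicts a target's embedding from a context's embedding and measures the error in embedding space, never in pixel space. The function names (`invariance_loss`, `jepa_loss`) and the linear "encoders" are hypothetical illustrations, not code from the post; real JEPAs use deep encoders and masking, and in I-JEPA the target encoder is an exponential moving average of the context encoder.

```python
from typing import List

# Toy stand-ins (hypothetical): plain matrices play the role of the deep
# encoders and predictor a real JEPA would train.
Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Multiply matrix w (rows x cols) by vector x (length cols)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two embeddings."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def invariance_loss(encoder: Matrix, view_a: Vector, view_b: Vector) -> float:
    """Invariance-based objective: two augmented views should embed identically."""
    return squared_l2(matvec(encoder, view_a), matvec(encoder, view_b))


def jepa_loss(context_encoder: Matrix, target_encoder: Matrix,
              predictor: Matrix, context: Vector, target: Vector) -> float:
    """JEPA-style objective: predict the target's embedding from the context's
    embedding; the error is computed between embeddings, not raw inputs."""
    predicted = matvec(predictor, matvec(context_encoder, context))
    target_embedding = matvec(target_encoder, target)
    return squared_l2(predicted, target_embedding)
```

For example, with `encoder = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]`, the views `[1.0, 2.0, 3.0]` and `[1.0, 2.0, 5.0]` differ only in a dimension the encoder discards, so `invariance_loss` returns `0.0`. The JEPA loss, by contrast, never reconstructs the input: it only asks whether the predicted embedding matches the target's embedding.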
Jul 27, 2025 ∙ 5 min
Universal App Launcher: Build Once, Use Everywhere
When every AI app wires directly to multiple tools through separate integrations, the result is brittle prompts and duplicated glue code. MCP (Model Context Protocol) addresses this with a standard client–server contract: each Host ships one MCP Client, and each capability lives behind an MCP Server. New pairings require zero new glue. This ensures safer tool use, portable integrations, and faster iteration. MCP Overview: an architecture diagram depicting the infrastructure prior to MCP...
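The glue-code argument in this excerpt is essentially a counting argument. A minimal sketch, assuming each direct host-to-tool integration counts as one unit of custom glue and each MCP client or server counts as one unit of work: direct wiring grows as hosts × tools, while MCP grows as hosts + tools. The function names are hypothetical, and this models integration effort only, not the MCP protocol or any MCP SDK.

```python
def direct_integrations(hosts: int, tools: int) -> int:
    """Direct wiring: every host needs custom glue for every tool."""
    return hosts * tools


def mcp_integrations(hosts: int, tools: int) -> int:
    """MCP: each host ships one client; each tool sits behind one server."""
    return hosts + tools


def glue_for_new_host(tools: int, use_mcp: bool) -> int:
    """Extra integrations needed when one more host joins an existing tool set."""
    # Direct wiring: the new host needs glue for every existing tool.
    # MCP: the new host only ships its own client; existing servers are reused.
    return 1 if use_mcp else tools


if __name__ == "__main__":
    hosts, tools = 4, 10
    print("direct:", direct_integrations(hosts, tools))  # direct: 40
    print("mcp:", mcp_integrations(hosts, tools))        # mcp: 14
```

The gap widens as either side grows: adding one more host to ten tools costs ten new integrations with direct wiring, but only one new client under MCP, which is what "new pairings require zero new glue" amounts to.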
Srijon Mandal
Admin
