MIDV‑550 Today

Existing public benchmarks (e.g., [1], IDDoc [2], SROIE [3]) contain only a limited number of document classes, provide only coarse bounding‑box annotations, or lack realistic mobile acquisition conditions. Consequently, progress in robust mobile identity verification (MIV) systems has been hindered by a mismatch between training data and real‑world deployment scenarios.

Technical Report – April 2026

Abstract

The proliferation of mobile‑based identity‑verification services has created a pressing need for realistic, large‑scale datasets that capture the visual variability of government‑issued identification (ID) documents captured with consumer‑grade smartphones. We introduce MIDV‑550, a publicly released benchmark consisting of 5 550 high‑resolution images of five common ID‑document types (passport, national ID card, driver’s licence, residence permit, and employee badge) captured under uncontrolled lighting, pose, motion blur, and occlusion conditions. Each image is richly annotated with document‑level bounding boxes, per‑field polygons, text transcriptions, and a hierarchy of quality‑assessment tags. We present a systematic evaluation of state‑of‑the‑art detection (YOLOv8, EfficientDet‑D4) and recognition pipelines (CRNN, Transformer‑based OCR) on MIDV‑550, establishing baseline performance and highlighting the remaining challenges in mobile ID verification. The dataset, annotation tools, and evaluation scripts are released under a permissive CC‑BY‑4.0 license to foster reproducible research.

1. Introduction

Mobile identity verification (MIV) has become a core component of financial onboarding, e‑government services, and travel‑related applications. Unlike traditional document‑verification workflows that rely on high‑quality scanners, MIV must cope with images captured by handheld smartphones in a wide range of uncontrolled environments. This introduces a set of visual degradations (low illumination, motion blur, perspective distortion, specular highlights, and partial occlusion) that dramatically affect both document detection and optical character recognition (OCR).

A composite score is reported for overall ranking.

5. Experimental Results

5.1 Document Detection

| Model | mAP@0.5 | Inference (ms / img) |
|-------|---------|----------------------|
| Faster R‑CNN (ResNet‑101) | 0.89 | 128 |
| EfficientDet‑D4 | 0.92 | 71 |
| YOLOv8‑x (baseline) | 0.95 | 38 |
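To make the mAP@0.5 column concrete: under the usual convention, a predicted document box counts as a correct detection only when its intersection‑over‑union (IoU) with the ground‑truth box is at least 0.5. The following is a minimal sketch of that matching criterion, not the report's evaluation script. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the example coordinates are assumptions made for illustration.

```python
from typing import Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix_min, iy_min = max(a[0], b[0]), max(a[1], b[1])
    ix_max, iy_max = min(a[2], b[2]), min(a[3], b[3])
    # Clamp at zero so disjoint boxes give an empty intersection.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Box, truth: Box, threshold: float = 0.5) -> bool:
    """A prediction matches the ground truth when IoU >= threshold."""
    return iou(pred, truth) >= threshold


def throughput_fps(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image


if __name__ == "__main__":
    truth = (100.0, 100.0, 500.0, 400.0)  # hypothetical ground-truth box
    pred = (120.0, 110.0, 520.0, 420.0)   # hypothetical detector output
    print(f"IoU = {iou(pred, truth):.3f}, match = {is_true_positive(pred, truth)}")
    print(f"38 ms/img is about {throughput_fps(38):.1f} img/s")
```

Read this way, and assuming images are processed one at a time, the 38 ms per image reported for YOLOv8‑x corresponds to roughly 26 images per second, against about 8 images per second for Faster R‑CNN at 128 ms.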

Data augmentation (random motion blur, brightness jitter, perspective warp) during OCR training yields a 22 % relative CER reduction.

| Pipeline | E2E Accuracy | Composite Score (S) |
|----------|--------------|---------------------|
| YOLOv8
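For readers less familiar with the metric, the character error rate (CER) is conventionally the Levenshtein edit distance between the recognized and reference strings, divided by the reference length. A 22 % relative reduction then means the augmented model's CER is 0.78 times the baseline CER. The sketch below illustrates both quantities under that standard definition. The function names, the sample strings, and the absolute CER values 0.100 and 0.078 are invented for illustration and are not taken from the report.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance counting insertions, deletions, substitutions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference char
                            curr[j - 1] + 1,      # insert a hypothesis char
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcription must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


if __name__ == "__main__":
    # Hypothetical document-number field with two substituted characters.
    print(f"CER = {cer('P1234567', 'P1Z34561'):.3f}")
    # Illustrative values only: a drop from 0.100 to 0.078 is a 22 % relative reduction.
    print(f"relative reduction = {relative_reduction(0.100, 0.078):.2f}")
```

Because the reduction is relative, the absolute gain depends on the starting point: the same 22 % would move a baseline CER of 0.050 to 0.039.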