
Clinical NER is the backbone of healthcare analytics, turning unstructured clinical notes into usable, structured data, but traditional evaluation (precision/recall/F1) misses “near-miss” matches, clinical context, and attribute correctness. This whitepaper introduces a Multi-Tier Adaptive Evaluation Framework that automatically calibrates evaluation depth based on ground-truth data availability. It combines concept-level hybrid similarity (exact + fuzzy + medical embeddings), section-aware context validation, and attribute-level checks for medications, labs, and vitals. The result is a more clinically meaningful, data-aware, and robust way to benchmark modern NER systems, especially in the era of LLM-powered extraction.