Our CEFR Methodology
Transparent, research-based approach to determining word difficulty levels
Research-Based Approach
Our CEFR level determinations combine authoritative linguistic databases with empirical frequency analysis to provide accurate, educationally relevant word difficulty assessments. We prioritize transparency and educational value over speed, ensuring each level assignment serves language learners effectively.
Our Data Sources
Cambridge Dictionary
Weight: 40% of consensus calculation
Authority: Cambridge University Press linguistics research
Coverage: Core vocabulary with pedagogical focus
Reliability: Highest - Based on learner corpus analysis and curriculum alignment
Oxford
Weight: 30% of consensus calculation
Authority: Oxford University Press lexicographic team
Coverage: Comprehensive learner vocabulary
Reliability: High - Extensive ESL research foundation
Word Frequency
Weight: 20% of consensus calculation
Authority: Corpus linguistics and usage frequency data
Coverage: Statistical analysis of authentic language use
Reliability: High - Data-driven frequency mapping to CEFR levels
British Council
Weight: 10% of consensus calculation
Authority: British Council English language teaching expertise
Coverage: Curated educational vocabulary
Reliability: Moderate - Focused on teaching priorities
Consensus Algorithm
Weighted Consensus Calculation
Step 1: Source Validation
We verify that each source provides a valid CEFR level (A1, A2, B1, B2, C1, C2) for the queried word. Invalid responses are excluded from calculation.
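The validation step can be sketched as a small filter. This is only an illustration, not our production code: the function name `validate_sources`, the source keys, and the choice to normalize case and surrounding whitespace before checking are all assumptions made for the example.

```python
# The six valid CEFR levels named in Step 1.
VALID_LEVELS = ("A1", "A2", "B1", "B2", "C1", "C2")


def validate_sources(responses):
    """Return only the sources whose response is a valid CEFR level.

    `responses` maps a (hypothetical) source key to whatever that source
    returned. Normalizing case and whitespace is an assumption of this sketch.
    """
    valid = {}
    for source, level in responses.items():
        if not isinstance(level, str):
            continue  # missing or malformed response: exclude it
        normalized = level.strip().upper()
        if normalized in VALID_LEVELS:
            valid[source] = normalized
    return valid


if __name__ == "__main__":
    raw = {
        "cambridge": "B1",
        "oxford": "b2 ",
        "word_frequency": None,   # source had no entry for the word
        "british_council": "D1",  # not a CEFR level
    }
    print(validate_sources(raw))
```

Only the entries that survive this filter would take part in the weighted scoring that follows.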
Step 2: Weighted Scoring
Each valid source contributes to the final score based on its reliability weight:
- A1 = 1 point, A2 = 2 points, B1 = 3 points, B2 = 4 points, C1 = 5 points, C2 = 6 points
- Cambridge: Score × 0.4
- Oxford: Score × 0.3
- Word Frequency: Score × 0.2
- British Council: Score × 0.1
Step 3: Consensus Determination
The weighted average score is converted back to the nearest CEFR level. In cases of ties or ambiguous results, we defer to the most authoritative source (Cambridge Dictionary).
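Steps 2 and 3 can be sketched as follows. The point values and weights come from the lists above; everything else is an assumption made for illustration: the names `weighted_score` and `consensus_level`, the source keys, dividing by the total weight of the sources that are actually present, treating an exact half-way score as a tie, and rounding a tie down when Cambridge has no valid level. Weights are written as whole percentages and combined with `fractions.Fraction` so that ties are detected exactly rather than through floating-point comparison.

```python
from fractions import Fraction
import math

LEVEL_POINTS = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}
POINTS_TO_LEVEL = {points: level for level, points in LEVEL_POINTS.items()}

# Weights as whole percentages (40% / 30% / 20% / 10%); keys are hypothetical.
SOURCE_WEIGHTS = {
    "cambridge": 40,
    "oxford": 30,
    "word_frequency": 20,
    "british_council": 10,
}


def weighted_score(levels):
    """Weighted average of level points over the sources that are present."""
    total_weight = sum(SOURCE_WEIGHTS[source] for source in levels)
    if total_weight == 0:
        raise ValueError("no valid sources to score")
    weighted_sum = sum(
        LEVEL_POINTS[level] * SOURCE_WEIGHTS[source]
        for source, level in levels.items()
    )
    return Fraction(weighted_sum, total_weight)


def consensus_level(levels):
    """Convert the weighted score to the nearest CEFR level."""
    score = weighted_score(levels)
    lower = math.floor(score)
    remainder = score - lower
    if remainder < Fraction(1, 2):
        return POINTS_TO_LEVEL[lower]
    if remainder > Fraction(1, 2):
        return POINTS_TO_LEVEL[lower + 1]
    # Exact tie between two adjacent levels: side with Cambridge.
    # Assumption: if Cambridge is absent, fall back to the lower level.
    cambridge = levels.get("cambridge")
    if cambridge is not None and LEVEL_POINTS[cambridge] > lower:
        return POINTS_TO_LEVEL[lower + 1]
    return POINTS_TO_LEVEL[lower]


if __name__ == "__main__":
    example = {
        "cambridge": "B1",
        "oxford": "B2",
        "word_frequency": "B1",
        "british_council": "B2",
    }
    print(weighted_score(example), consensus_level(example))
```

In the example, the score is (3 × 40 + 4 × 30 + 3 × 20 + 4 × 10) / 100 = 3.4, which is nearest to 3 points, so the sketch returns B1.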
Quality Assurance
Words showing significant disagreement between sources (variance > 1.5 levels) are flagged for manual review by our linguistics team.
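One way to picture the disagreement check is shown below. It is a sketch under stated assumptions: "variance" is read here as the unweighted population variance of the sources' point values, and the function name `needs_manual_review` and the source keys are invented for the example. A different reading, such as the gap between the highest and lowest level, would flag a different set of words.

```python
from statistics import pvariance

LEVEL_POINTS = {"A1": 1, "A2": 2, "B1": 3, "B2": 4, "C1": 5, "C2": 6}

# Threshold taken from the text; its interpretation is an assumption.
REVIEW_THRESHOLD = 1.5


def needs_manual_review(levels, threshold=REVIEW_THRESHOLD):
    """Flag a word when its sources disagree by more than the threshold.

    `levels` maps a (hypothetical) source key to a valid CEFR level.
    With fewer than two sources there is nothing to compare.
    """
    points = [LEVEL_POINTS[level] for level in levels.values()]
    if len(points) < 2:
        return False
    return pvariance(points) > threshold


if __name__ == "__main__":
    agree = {"cambridge": "B1", "oxford": "B1", "word_frequency": "B2"}
    disagree = {
        "cambridge": "A1",
        "oxford": "C1",
        "word_frequency": "B1",
        "british_council": "C2",
    }
    print(needs_manual_review(agree), needs_manual_review(disagree))
```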
Limitations & Considerations
Context Dependency
Word difficulty can vary significantly based on context, register, and usage. Our levels represent general pedagogical guidelines rather than absolute classifications.
Source Availability
Some words may not appear in all databases, particularly specialized terms, proper nouns, or very recent additions to English vocabulary.
Regional Variations
Our methodology primarily reflects British and American English usage patterns. Regional vocabulary differences may affect accuracy.
Dynamic Language
Language evolves continuously. We update our databases regularly, but some classifications may lag behind current usage trends.
Quality Standards
Editorial Review
Our team of qualified English language teachers and linguists reviews classifications that show high variance between sources or that have received user feedback.
Continuous Improvement
We track accuracy metrics and user feedback to refine our methodology and source weightings.
Transparency
All source determinations are displayed openly, allowing users to understand and evaluate our consensus process.
Educational Focus
Our primary goal is supporting language learners and educators with practical, pedagogically sound level assignments.