Our CEFR Methodology

Transparent, research-based approach to determining word difficulty levels

Dr. Sarah Mitchell, Applied Linguistics PhD | Last updated: September 2025

Research-Based Approach

Our CEFR level determinations combine authoritative linguistic databases with empirical frequency analysis to provide accurate, educationally relevant word difficulty assessments. We prioritize transparency and educational value over speed, ensuring each level assignment serves language learners effectively.

Our Data Sources

Cambridge Dictionary (Primary)

Weight: 40% of consensus calculation

Authority: Cambridge University Press linguistics research

Coverage: Core vocabulary with pedagogical focus

Reliability: Highest - Based on learner corpus analysis and curriculum alignment

Oxford Learner's Dictionary

Weight: 30% of consensus calculation

Authority: Oxford University Press lexicographic team

Coverage: Comprehensive learner vocabulary

Reliability: High - Extensive ESL research foundation

Word Frequency Analysis

Weight: 20% of consensus calculation

Authority: Corpus linguistics and usage frequency data

Coverage: Statistical analysis of authentic language use

Reliability: High - Data-driven frequency mapping to CEFR levels

British Council Word Lists

Weight: 10% of consensus calculation

Authority: British Council English language teaching expertise

Coverage: Curated educational vocabulary

Reliability: Moderate - Focused on teaching priorities

Consensus Algorithm

Weighted Consensus Calculation

Step 1: Source Validation

We verify that each source provides a valid CEFR level (A1, A2, B1, B2, C1, C2) for the queried word. Invalid or missing responses are excluded from the calculation.

Step 2: Weighted Scoring

Each valid source contributes to the final score based on its reliability weight:

  • A1 = 1 point, A2 = 2 points, B1 = 3 points, B2 = 4 points, C1 = 5 points, C2 = 6 points
  • Cambridge: Score × 0.4
  • Oxford: Score × 0.3
  • Word Frequency: Score × 0.2
  • British Council: Score × 0.1

Step 3: Consensus Determination

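The weighted scores are summed and divided by the total weight of the sources that returned a valid level, and the resulting weighted average is converted back to the nearest CEFR level. When the average falls exactly halfway between two levels, or the result is otherwise ambiguous, we defer to the most authoritative source (Cambridge Dictionary).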
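
The three steps above can be sketched in Python as follows. This is a minimal illustration, not our production code: the source keys, the function name consensus_level, and the exact tie-handling are assumptions made for the example. In particular, it assumes that the weights of the valid sources are renormalized when a source is missing, that a tie means the average lies exactly halfway between two levels, and that a tie is resolved toward whichever neighbouring level is closer to the Cambridge level.

```python
import math
from fractions import Fraction
from typing import Mapping, Optional

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]  # A1 = 1 point ... C2 = 6 points

# Source weights as percentages (hypothetical keys, weights from the list above).
WEIGHTS = {"cambridge": 40, "oxford": 30, "frequency": 20, "british_council": 10}
PRIMARY = "cambridge"  # most authoritative source, used to break ties


def consensus_level(source_levels: Mapping[str, Optional[str]]) -> Optional[str]:
    """Return the consensus CEFR level, or None if no source gave a valid level."""
    # Step 1: keep only known sources that returned a valid CEFR level.
    valid = {
        source: LEVELS.index(level) + 1
        for source, level in source_levels.items()
        if source in WEIGHTS and level in LEVELS
    }
    if not valid:
        return None

    # Step 2: weighted average over the valid sources (assumption: weights
    # are renormalized by the total weight of the sources that responded).
    total_weight = sum(WEIGHTS[source] for source in valid)
    weighted_sum = sum(WEIGHTS[source] * score for source, score in valid.items())
    average = Fraction(weighted_sum, total_weight)  # exact, no float rounding

    # Step 3: convert back to the nearest level.
    lower = math.floor(average)
    remainder = average - lower
    if remainder == Fraction(1, 2) and PRIMARY in valid:
        # Exact tie: pick the neighbouring level closer to the primary source.
        nearest = lower if valid[PRIMARY] <= lower else lower + 1
    else:
        # No tie, or no primary source available (assumption: round half up).
        nearest = lower + 1 if remainder >= Fraction(1, 2) else lower
    return LEVELS[nearest - 1]
```

For example, if Cambridge and the British Council report B1 while Oxford and the frequency analysis report B2, the weighted average is 3 × 0.4 + 4 × 0.3 + 4 × 0.2 + 3 × 0.1 = 3.5, an exact tie between B1 and B2, so the sketch returns the Cambridge level, B1.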

Quality Assurance

Words showing significant disagreement between sources (variance > 1.5 levels) are flagged for manual review by our linguistics team.
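
A disagreement check along these lines could look like the sketch below. The text above does not specify how the variance is computed, so this example makes two assumptions: it uses the unweighted population variance of the numeric level scores (A1 = 1 through C2 = 6), and it never flags a word with fewer than two valid sources. The function name needs_manual_review is hypothetical.

```python
from statistics import pvariance
from typing import Mapping, Optional

LEVELS = ["A1", "A2", "B1", "B2", "C1", "C2"]
REVIEW_THRESHOLD = 1.5  # threshold taken from the text above


def needs_manual_review(source_levels: Mapping[str, Optional[str]]) -> bool:
    """Flag a word when its valid source levels disagree strongly."""
    # Convert valid levels to numeric scores; ignore invalid or missing ones.
    scores = [LEVELS.index(level) + 1 for level in source_levels.values() if level in LEVELS]
    if len(scores) < 2:
        # Assumption: disagreement is undefined with fewer than two sources.
        return False
    # Assumption: unweighted population variance of the scores.
    return pvariance(scores) > REVIEW_THRESHOLD
```

Under these assumptions, a word rated A2, A2, C1, and C1 has a variance of 2.25 and is flagged, while a word rated B1, B1, B2, and B2 has a variance of 0.25 and is not.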

Limitations & Considerations

Context Dependency

Word difficulty can vary significantly based on context, register, and usage. Our levels represent general pedagogical guidelines rather than absolute classifications.

Source Availability

Some words may not appear in all databases, particularly specialized terms, proper nouns, or very recent additions to English vocabulary.

Regional Variations

Our methodology primarily reflects British and American English usage patterns, so level assignments may be less accurate for vocabulary specific to other regional varieties of English.

Dynamic Language

Language evolves continuously. We update our databases regularly, but some classifications may lag behind current usage trends.

Quality Standards

Editorial Review

Our team of qualified English language teachers and linguists reviews classifications that show high variance between sources or that have been flagged through user feedback.

Continuous Improvement

We track accuracy metrics and user feedback to refine our methodology and source weightings.

Transparency

All source determinations are displayed openly, allowing users to understand and evaluate our consensus process.

Educational Focus

Our primary goal is supporting language learners and educators with practical, pedagogically sound level assignments.
