Evidence
Validation of the NumberSenseMMR® Framework
The consistency analysis of the Puffin Math framework was conducted independently by the University of Oxford, led by Dr. Ann Dowker (2016).
The study used assessment data collected from 3465 students in 368 schools across England, Wales, Scotland and Europe (English-speaking schools). Additionally, an independent study was carried out to correlate the results with those from an SEN school in Oklahoma.
The correlation analysis was carried out on assessment data grouped into the Number Meaning, Number Magnitude and Number Relationship components (covering strands from Visual Numbers and counting through to Multiplication). Component scores were obtained by adding the relevant scores.
Only data for ages 7 to 11 were analysed, to avoid the influence of too many small and diverse age groups.
The analysis showed that all the components in the NumberSenseMMR® framework correlated significantly with one another (p < 0.001).
A multiple regression analysis (Table 1), with Number Relationship as the dependent variable, showed that Number Magnitude was a highly significant independent predictor (beta = 0.313; t = 12.92; p < 0.001).
Similarly, Number Meaning (beta = 0.091; t = 3.784; p < 0.001) was also a significant independent predictor and Age was also significant (beta = 0.48; t = 2.46; p < 0.014).
Arguably, Age could be excluded from the multiple regression analysis, as the data did not contain a truly continuous age variable.
Dependent variable: Number Relationship
Predictor  Standardized Coefficient (beta)  t  p
Number Magnitude  0.313  12.92  p < 0.001
Number Meaning  0.091  3.784  p < 0.001
Age  0.48  2.46  p < 0.014
Table 1. Multiple Regression Analysis
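Table 1 reports standardized (beta) coefficients. As a rough illustration of how such coefficients and their t values can be obtained, the sketch below z-scores the outcome and the predictors and fits ordinary least squares with NumPy. It assumes a standard OLS model; the function name `standardized_betas` and the simulated data are hypothetical and are not the Oxford team's code or the Puffin data.

```python
import numpy as np


def standardized_betas(X, y):
    """Fit OLS on z-scored variables and return (betas, t_values).

    X: (n, p) array of predictor scores (e.g. Magnitude, Meaning, Age).
    y: (n,) array of outcome scores (e.g. Number Relationship).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # z-score every column so the coefficients are in SD units (betas)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    yz = (y - y.mean()) / y.std(ddof=1)
    # Centred data need no intercept column
    betas, *_ = np.linalg.lstsq(Xz, yz, rcond=None)

    n, p = Xz.shape
    residuals = yz - Xz @ betas
    dof = n - p - 1  # one degree of freedom is still spent on the intercept
    sigma2 = residuals @ residuals / dof
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(Xz.T @ Xz)))
    return betas, betas / se


# Simulated example (not Puffin data): y depends mostly on the first predictor
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(500, 3))
y_demo = X_demo[:, 0] + 0.3 * X_demo[:, 1] + rng.normal(scale=0.5, size=500)
betas, t_values = standardized_betas(X_demo, y_demo)
print(np.round(betas, 3), np.round(t_values, 2))
```

Because t statistics are unaffected by rescaling variables, the t values from this standardized fit match those of the unstandardized model with an intercept.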
Table 2 shows a between-participants analysis of variance (ANOVA) with Age as the factor and the Number Meaning, Number Magnitude and Number Relationship component scores as the dependent variables.
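For a one-way between-participants ANOVA with k groups and N participants, the degrees of freedom are k - 1 and N - k. With five age groups (7 to 11), the reported F(4,2381) therefore corresponds to 2386 pupils in the analysis. The minimal, standard-library-only Python sketch below computes F and both degrees of freedom from raw group scores; the scores are made up, and this is not the software used in the study.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for independent groups of scores."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Variation of group means around the grand mean, weighted by group size
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Variation of individual scores around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up scores for three small groups
f_stat, df_b, df_w = one_way_anova([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(f"F({df_b},{df_w}) = {f_stat:.2f}")
```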
The analysis showed a highly significant effect of Age on Number Meaning score (F(4,2381) = 13.26; p = 0.001). The mean scores were 18.65 (s.d. 2.113) for 7-year-olds; 18.99 (s.d. 1.76) for 8-year-olds; 19.3 (s.d. 1.39) for 9-year-olds; 14.038 (s.d. 5.198) for 10-year-olds; and 14.54 (s.d. 1.66) for 11-year-olds.
Tamhane’s T2 post hoc tests showed no significant differences between 7- and 8-year-olds, 8- and 9-year-olds, 9- and 10-year-olds, 9- and 11-year-olds or 10- and 11-year-olds, but highly significant differences between 7- and 9-year-olds, 7- and 10-year-olds, 7- and 11-year-olds, 8- and 10-year-olds, 9- and 10-year-olds and 9- and 11-year-olds. All significant differences were in the direction of older pupils scoring higher.
The analysis showed a highly significant effect of Age on Number Magnitude score (F(4,2381) = 24.467; p = 0.001). The mean scores were 12.18 (s.d. 4.74) for 7-year-olds; 11.78 (s.d. 4.75) for 8-year-olds; 12.87 (s.d. 5.177) for 9-year-olds; 14.038 (s.d. 6.63) for 10-year-olds; and 14.54 (s.d. 5.48) for 11-year-olds.
Tamhane’s T2 post hoc tests showed no significant differences between 7- and 8-year-olds, 7- and 9-year-olds, 8- and 9-year-olds or 10- and 11-year-olds, but highly significant differences between 7- and 10-year-olds, 7- and 11-year-olds, 8- and 9-year-olds, 8- and 10-year-olds, 8- and 11-year-olds, 9- and 10-year-olds and 9- and 11-year-olds. All significant differences were in the direction of older pupils scoring higher.
The analysis showed a highly significant effect of Age on Number Relationship score (F(4,2381) = 12.86; p = 0.001). The mean scores were 9.54 (s.d. 6.976) for 7-year-olds; 9.776 (s.d. 7.15) for 8-year-olds; 10.007 (s.d. 6.74) for 9-year-olds; 19.41 (s.d. 1.35) for 10-year-olds; and 19.28 (s.d. 7.1) for 11-year-olds.
Factor  F  p  Mean  s.d
Number Meaning  F(4,2381) = 13.26  p = 0.001
Age 7  18.65  2.113
Age 8  18.99  1.76
Age 9  19.3  1.39
Age 10  14.038  5.198
Age 11  14.54  1.66
Number Magnitude  F(4,2381) = 24.467  p = 0.001
Age 7  12.18  4.74
Age 8  11.78  4.75
Age 9  12.87  5.177
Age 10  14.038  6.63
Age 11  14.54  5.48
Number Relationship  F(4,2381) = 12.86  p = 0.001
Age 7  9.54  6.976
Age 8  9.776  7.15
Age 9  10.007  6.74
Age 10  19.41  1.35
Age 11  19.28  7.1
Table 2. ANOVA with Age as the factor and the component scores as dependent variables
For Number Relationship, Tamhane’s T2 post hoc tests showed no significant differences between 7- and 8-year-olds, 7- and 9-year-olds, 8- and 9-year-olds or 10- and 11-year-olds, but highly significant differences between 7- and 10-year-olds, 8- and 11-year-olds, 9- and 10-year-olds and 9- and 11-year-olds. All significant differences were in the direction of older pupils scoring higher.
Taken together, these results show a large difference in Number Meaning between 7-year-olds and older children, with a possible ceiling effect by age 9. Similarly, for Number Magnitude and Number Relationship, the largest difference was between ages 7 to 9 and ages 10 to 11.
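Tamhane’s T2 is generally described as a set of pairwise Welch t tests (which do not assume equal variances) combined with a Šidák correction for the number of comparisons; with five age groups there are 10 pairs. The plain-Python sketch below computes the Welch t statistic and Welch–Satterthwaite degrees of freedom from summary statistics and applies the Šidák adjustment to a p-value. Group sizes per age are not reported here, so the example inputs are hypothetical, and converting t into a p-value needs a t-distribution CDF (for example from SciPy), which is left out.

```python
from math import comb, sqrt


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1 mean
    v2 = sd2 ** 2 / n2  # squared standard error of group 2 mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


def sidak_adjust(p, m):
    """Šidák-adjusted p-value for one of m comparisons."""
    return 1 - (1 - p) ** m


n_comparisons = comb(5, 2)  # five age groups -> 10 pairwise comparisons

# Hypothetical summary statistics for two age groups
t, df = welch_t(14.0, 6.0, 300, 12.0, 5.0, 300)
print(f"t = {t:.2f}, df = {df:.1f}, comparisons = {n_comparisons}")
print(f"adjusted p for a raw p of 0.001: {sidak_adjust(0.001, n_comparisons):.4f}")
```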
Puffin Math Assessment Standardization
Standardised Scores
What are Standardized Scores?
Standardized scores allow an individual’s performance to be compared with well-defined reference groups. Standardized scores are sometimes also called normative scores.
The standardized score indicates the degree to which an individual’s score deviates from the average for people of the same age.
The scale is based on the ‘normal’ distribution of scores expected within the population, with the overall mean (average) standardized score set at 100 and the standard deviation at 15, so that about 68% of people score between 85 and 115.
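As a worked illustration, a standardized score on this scale can be obtained by converting a raw score to a z-score against the pupil’s age group and rescaling it to a mean of 100 and a standard deviation of 15. This minimal Python sketch assumes a simple linear conversion and uses hypothetical raw scores; the published Puffin norms may use a different (for example, normalised or table-based) conversion. It also uses `statistics.NormalDist` to confirm the roughly 68% figure for the 85 to 115 band.

```python
from statistics import NormalDist, mean, stdev


def standardized_score(raw, age_group_raw_scores):
    """Linear standard score (mean 100, SD 15) relative to the pupil's age group.

    Assumption: a plain z-score transform of raw scores; not the official norms.
    """
    mu = mean(age_group_raw_scores)
    sigma = stdev(age_group_raw_scores)
    z = (raw - mu) / sigma
    return 100 + 15 * z


# Share of a normal(100, 15) population expected between 85 and 115
scale = NormalDist(mu=100, sigma=15)
within_one_sd = scale.cdf(115) - scale.cdf(85)
print(f"{within_one_sd:.1%}")
```

A pupil scoring exactly at the age-group mean receives 100, and one scoring one standard deviation above it receives 115.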
Test Construction
The Puffin Assessment has 647 test items that are derived from dyscalculia research, taking into account the following areas:
 Progression of strands.
 Progression of questions within each strand.
 Questions within each strand meet the strand’s objective.
 An Individual Support Plan with signposts to the Puffin Intervention.
 Progression of questions within the Meaning, Magnitude and Relationship areas.
 Dynamic generation of questions based on responses and progression across the strands.
 Random generation of questions within each strand.
 Universality of language and operational symbols.
 Questions are developmentally appropriate for the baselined ages.
 Universality in the use of the illustrations.
 Universality in the use of the language.
 Measurement of the response time for each question.
 Questions reviewed for gender and cultural balance.
 The use of technology to ensure that, when the questions are read out, all students receive a common set of instructions.
 Accessibility settings so that the necessary screen adjustments can take place.
 The provision of Support Tools:
 Student Profile Questionnaire to gain a snapshot of the pupil’s current development and functioning, so that the necessary adjustments can be offered during the assessment.
 Working-out Notes for students to show their thinking on paper.
 Observation Notes for the test administrator to observe the pupil’s methods, approaches and thinking during the assessment.
The extracted data were confirmed to be results of online tests administered independently in schools by SEN Coordinators, Math Coordinators, class teachers and Higher-Level Teaching Assistants.
A sample of 3465 students was used for the standardization of the Puffin Math Assessment.
The sample data were chosen to achieve the widest balance of content, both within each set and across the sample as a whole, so as to represent the population.
Taken as a whole, the sample covered attainment across all three stages of the NumberSenseMMR® framework components, and the component measurements were taken as raw scores.
For the analysis of means and standard deviations, the data were grouped into seven age sets: 6, 7, 8, 9, 10, 11 and 12+ (see Table 2 below).
The student results were stratified by the NumberSenseMMR® framework components and were not based on Key Stage results, as this information was not available in the same format.
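The grouping into seven age sets for means and standard deviations can be sketched as follows. This minimal Python example is illustrative only: the helper names and records are hypothetical, it assumes ages of 6 and above in whole years, and it uses the sample standard deviation, which may differ from the statistic actually used.

```python
from collections import defaultdict
from statistics import mean, stdev


def age_set(age):
    """Map an age in whole years (6 or above) to '6' to '11' or '12+'."""
    return "12+" if age >= 12 else str(age)


def summarize_by_age_set(records):
    """records: iterable of (age_in_years, raw_score) pairs.

    Returns {age_set: (mean, sample standard deviation)} for sets with
    at least two pupils.
    """
    groups = defaultdict(list)
    for age, raw_score in records:
        groups[age_set(age)].append(raw_score)
    return {
        label: (mean(scores), stdev(scores))
        for label, scores in groups.items()
        if len(scores) >= 2  # a sample SD needs at least two scores
    }


# Hypothetical records: (age, raw score)
sample = [(6, 10), (6, 14), (7, 15), (7, 17), (13, 20), (15, 24)]
for label, (m, sd) in summarize_by_age_set(sample).items():
    print(label, round(m, 2), round(sd, 2))
```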
About the Sample Data
The sample test results were extracted for assessments carried out between September 2014 and April 2015. The data sets for analysis represented 3465 students in 368 schools in England, Wales, Scotland and Europe (English-speaking schools using the UK curriculum).
The distribution of schools and students by region, alongside the population figures, is shown in Table 1, and the percentage distribution of the sample data by region is shown in Diagram 1.
Five student data sets representing incomplete assessments or expired time were removed from the extracted sample before analysis.
Regions  No. of Students  %  No. of Schools  %  Population  %
North  521  15  54  15  12570  17
Midlands  525  15  62  17  15384  21
South  2038  59  200  54  35515  49
Wales  91  3  18  5  3145  4
Europe  141  4  20  5  1811  3
Scotland  149  4  14  4  3642  5
Total  3465  100  368  100  72067  100
Table 1. Number of schools and students in the standardised sample
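The percentage columns in Table 1 can be reproduced from the counts. This short Python check recomputes each region’s share of students, schools and the Population column, and prints the gap between the sample and population shares. It only re-derives figures already in the table and assumes the percentages were rounded to the nearest whole number.

```python
# Counts copied from Table 1: (students, schools, population)
TABLE_1 = {
    "North": (521, 54, 12570),
    "Midlands": (525, 62, 15384),
    "South": (2038, 200, 35515),
    "Wales": (91, 18, 3145),
    "Europe": (141, 20, 1811),
    "Scotland": (149, 14, 3642),
}


def column_percentages(column):
    """Percentage share of each region in one column, rounded to whole numbers."""
    values = [row[column] for row in TABLE_1.values()]
    total = sum(values)
    return [round(100 * v / total) for v in values]


students_pct = column_percentages(0)
schools_pct = column_percentages(1)
population_pct = column_percentages(2)

for region, s, sc, p in zip(TABLE_1, students_pct, schools_pct, population_pct):
    # Gap = sample share of students minus Population share, in percentage points
    print(f"{region:<9} students {s:>3}%  schools {sc:>3}%  population {p:>3}%  gap {s - p:+d}")
```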
Diagram 1. Distribution of schools and students by regions in the standardised sample
The sample data covered students between 6 and 15 years old. The equivalent year groups used in England, European British schools and Scotland are shown in Table 2.
The gender split in the data represented 52% male and 48% female students. The detailed split by age is shown in Diagram 2 below.
Diagram 2. Profile of male and female students in the standardised sample
Age  England  Europe British Schools  Scotland  Male  Female  No. of Students  No. of Schools
6 – 7 years  Year 1  Grade 1  P1  270  248  518  16
7 – 8 years  Year 2  Grade 2  P2  369  321  690  36
8 – 9 years  Year 3  Grade 3  P3  364  344  708  45
9 – 10 years  Year 4  Grade 4  P4  281  319  600  76
10 – 11 years  Year 5  Grade 5  P5  197  207  404  96
11 – 12 years  Year 6  Grade 6  P6  119  83  202  49
12 – 13 years  Intervention  Grade 7  Intervention  61  49  110  9
13 + years  Intervention  Grade 8  Intervention  137  96  233  41
Total  1798  1667  3465  368
Table 2. Number of students and schools participating in field testing
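The totals row of Table 2 and the 52% male / 48% female split quoted earlier can be checked against the individual rows. This Python sketch simply re-adds the table’s own figures and assumes the split was rounded to whole percentages.

```python
# Rows copied from Table 2: (age band, male, female, students, schools)
TABLE_2 = [
    ("6-7", 270, 248, 518, 16),
    ("7-8", 369, 321, 690, 36),
    ("8-9", 364, 344, 708, 45),
    ("9-10", 281, 319, 600, 76),
    ("10-11", 197, 207, 404, 96),
    ("11-12", 119, 83, 202, 49),
    ("12-13", 61, 49, 110, 9),
    ("13+", 137, 96, 233, 41),
]

# Each row's student count should equal male + female
for band, male, female, students, _ in TABLE_2:
    assert male + female == students, band

total_male = sum(row[1] for row in TABLE_2)
total_female = sum(row[2] for row in TABLE_2)
total_students = sum(row[3] for row in TABLE_2)
total_schools = sum(row[4] for row in TABLE_2)

male_pct = round(100 * total_male / total_students)
female_pct = round(100 * total_female / total_students)
print(total_male, total_female, total_students, total_schools, male_pct, female_pct)
```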
Intervention Validation
Research
There have been many attempts to raise the performance of children with low numeracy skills, although not specifically for dyscalculia. In the United States, for example, evidence-based approaches have focused on children from deprived backgrounds, usually of low socioeconomic status (C. Mussolin et al., 2009; R. Price, D. Ansari, Curr. Biol., 2007).
The 2003 Primary National Strategy gave special attention to children with low numeracy skills by:
(i) Diagnosing each child’s conceptual gaps in understanding
(ii) Giving the child more individual support in working through visual, verbal, and physical activities designed to bridge each gap.
Unfortunately, there is little quantitative evaluation of the effectiveness of these strategies: it has not been possible to tell whether identifying and targeting an individual’s conceptual gaps with a more individualized version of the same teaching is effective. A further problem is that these interventions are effective when teaching assistants have received specialist training, which not all schools can provide (A. Dowker, 2009).
Standardized approaches depend on curriculum-based definitions of typical arithmetical development and of how children with low numeracy differ from the typical trajectory.
In contrast, neuroscience research suggests that, rather than addressing isolated conceptual gaps, remediation should build the foundational number concepts first. This offers a clear cognitive target for assessment and intervention that is largely independent of learners’ social and educational circumstances. In the assessment of individual cognitive capacities, performance on set enumeration and comparison tasks can supplement curriculum-based standardized tests of arithmetic to differentiate dyscalculia from other causes of low numeracy (B. Butterworth, D. Laurillard, 2010; K. Landerl, Child Psychol., 2009).
B. Butterworth and D. Laurillard (Science, 2011) show that interventions that strengthen the meaningfulness of numbers, especially the link between maths facts and their component meanings, are crucial. In typical learners, retrieving a simple arithmetical fact from memory also activates the numerical values of its component numbers.
Without specialized intervention, most dyscalculic learners struggle with basic arithmetic in secondary school (R. S. Shalev, Child Neurol., 2005).
Effective early intervention may help to reduce the later impact of poor numeracy skills, as it does in dyslexia (Goswami, 2006).
Although this approach is very expensive, it promises to repay 12 to 19 times the investment (J. Gross, Every Child a Chance Trust, 2009).
A further study was conducted to confirm the effectiveness of the Puffin Intervention Program. It used a sample of 50 pupils aged 6 to 15 from Dynamo Intervention.
These pupils took the first assessment at the beginning of the Spring term 2015, were independently provided with 12 weeks of intervention support, and took a second assessment at the end of the term.
An analysis comparing the first and second assessments showed a percentage improvement of 11.67% for the combined MMR stages over the 12-week intervention period.
Further analysis showed an improvement of 21.44% for the combined Magnitude and Relationship stages and a staggering 31.94% for the Relationship stage alone.
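The source does not state exactly how the percentage improvements were calculated. One common definition, assumed here, is the relative change in the group’s total score from the first to the second assessment. The minimal Python sketch below uses a hypothetical function name and hypothetical scores, not the Dynamo Intervention data; averaging per-pupil percentage changes instead would generally give a different figure.

```python
def percent_improvement(first_scores, second_scores):
    """Relative change (%) in the group's total score between two assessments.

    Assumes both lists hold the same pupils' scores in the same order.
    """
    if len(first_scores) != len(second_scores):
        raise ValueError("each pupil needs a first and a second score")
    before = sum(first_scores)
    after = sum(second_scores)
    if before == 0:
        raise ValueError("baseline total must be non-zero")
    return 100 * (after - before) / before


# Hypothetical scores for three pupils on one stage
first = [40, 35, 25]
second = [48, 40, 32]
print(f"{percent_improvement(first, second):.2f}%")
```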
This shows that a small improvement in the Number Meaning and Number Magnitude stages brought a large improvement in the Number Relationship stage (Math Foundation).
This analysis provides further confidence in the reliability of the NumberSenseMMR® framework and supports the findings from neuroscience research.