## I. Introduction

The gender gap in mathematics achievement has been one of the most controversial issues in the field of mathematics over the last few decades. One reason to be concerned about differences in mathematics achievement between female and male students is that mathematics achievement can be used as a promising predictor of future income (Murnane, Willett, & Levy, 1995; Niederle & Vesterlund, 2010); furthermore, Mitra (2002) found that higher income tends to be guaranteed for women with above-average mathematical skills. The author also found that the wage differences between females and males were significant across all educational levels, but this difference disappeared for those females and males who have higher mathematical skills.

In general, the gender difference in mathematics achievement has decreased over the years, but it becomes much more apparent at higher levels of schooling (Cheema & Galluzzo, 2013; Cimpian, Lubienski, Timmer, Makowski, & Miller, 2016; Friedman, 1989; Ganley & Lubienski, 2016; Guiso, Monte, Sapienza, & Zingales, 2008; Hyde, Lindberg, Linn, Ellis, & Williams, 2008). For example, Cimpian et al. (2016) used the Early Childhood Longitudinal Study-Kindergarten Class of 1998-1999 and 2010-2011 and found that the gender gap emerged early only among higher achievers but spread to the rest of the distribution as the grade level increased.

Some studies have focused on the evident gender gap in the right tail of mathematical ability (Benbow & Stanley, 1980, 1983; Ellison & Swanson, 2010; Fryer & Levitt, 2010; Robinson & Lubienski, 2011; Wai, Putallaz, & Makel, 2010). Benbow & Stanley (1980, 1983) reported the male-female ratio in the extreme right tail was 13 to 1. Similarly, Ellison & Swanson (2010) showed that the male-female ratio at the 99th percentile exceeded 10 to 1. In contrast, Wai et al. (2010) showed that the gender gap in the right tail of mathematical ability still exists, but the ratio has declined. This body of research indicates that the gender gap should be explored depending on the different levels of mathematical ability.

Caution is required, however, in interpreting this gender gap in mathematics achievement, since the gap may differ depending on the statistical methodology used. For example, Cheema and Galluzzo (2013) argued that some of the findings from early studies could be problematic for two reasons. First, they did not control for demographic factors (e.g., race or socioeconomic status) that could affect mathematics achievement. Second, the gender gap could not be generalized to the population when the sample was not nationally representative.

This study, therefore, aims to examine the gender gap in mathematics achievement in the Trends in International Mathematics and Science Study (TIMSS) 2015 according to the level of mathematical ability, as well as the gender gap among all students, using two-level multilevel modeling. TIMSS 2015 was chosen following the suggestions of Cheema and Galluzzo (2013), since it consists of nationally representative samples and provides a variety of background information that can be used as control variables. A two-level multilevel model was used because of the hierarchical structure of the data. In addition, the gender gap among eighth grade students was examined, based on the finding of previous studies that gender differences tend to be more apparent at higher levels of schooling. Finally, this study focused on the high-achieving countries in the TIMSS 2015 mathematics assessment, which led to the selection of five East Asian countries: Chinese Taipei, Hong Kong Special Administrative Region (SAR), Japan, Korea, and Singapore. The performance of East Asian countries on TIMSS 2015 was outstanding for both the fourth and eighth grade students, and the gap between these countries and their Western counterparts such as the United States, England, and Canada was more than 50 points (Mullis, Martin, Foy, & Hooper, 2016).

## II. Literature Review

Even though many previous studies noted that the gender gap in mathematics achievement was consistent, especially in higher-level skills, other research still suggests that this gap can differ as a result of the features of the data, such as whether a study is a large-scale assessment or whether it includes important control variables such as self-efficacy or anxiety. A large-scale assessment enables researchers to generalize the results to the population level because of its large sample size, and the control variables provided in such assessments can clarify gender effects on mathematics achievement by taking other important factors into account. For example, Cheema and Galluzzo (2013) used data from the Program for International Student Assessment (PISA). One of the advantages of the PISA dataset is that it includes background information on students, and this information can be used as control variables. The results indicated that the difference between female and male students in mathematics achievement disappeared when important predictors of mathematics achievement were considered, such as self-efficacy and anxiety related to mathematics. This finding is intriguing because it suggests that the gender gap comes from emotional aspects such as the low self-efficacy or high anxiety of female students rather than from actual differences in intelligence.

As noted above, there is a tendency for the gender gap in mathematics achievement to decrease as female students’ college preparation increases, but the gap between boys and girls appears to widen among high mathematics achievers. Ellison and Swanson (2010) confirmed this tendency using data from the American Mathematics Competitions (AMC), a series of mathematics contests for high school students. The results indicated that the male-female ratio at the 99th percentile exceeded 10 to 1. The results also showed that the magnitude of the variation in the gender gap across schools in the United States was moderate, which implies that the difference in mathematics achievement between boys and girls looks similar from school to school. Finally, the results indicated that the highest-scoring girls were concentrated in a very small set of super-elite schools, whereas the highest-scoring boys came from various backgrounds. The authors speculated that “girls suffer in becoming high achievers in mathematics because they are more compliant with authority figures and/or are more sensitive to social environment” (p. 126).

Similarly, Fryer and Levitt (2010) found a gender gap in mathematics achievement in the upper tail of the distribution. The authors used data from the Early Childhood Longitudinal Study Kindergarten Cohort (ECLS-K). The sample consisted of more than 20,000 children who entered kindergarten in 1998. The participants were re-interviewed at the end of the first year at kindergarten, first grade, third grade, and fifth grade. The results indicated that the proportion of girls in the top 5 percent changed dramatically from 45 percent on entering kindergarten to 28 percent at the end of the fifth grade. Using various information variables from the data, the authors found that the gender difference in mathematics performance did not appear among the female students whose mothers worked in mathematics-related occupations or whose mothers’ educational levels were higher than those of their fathers. This evidence suggests that role models can have a positive effect on girls’ mathematics achievement.

Little research has been done on the gender gap in mathematics achievement of East Asian students, and the results of these studies indicated that the gender gap of these students differs from that of students in Western countries (Kane & Mertz, 2012; Tsai, Smith, & Hauser, 2018; Tsui, 2007). For example, Tsui (2007) used the F-test to examine the gender gap in mathematics scores between China and the United States, especially for talented students. In this study, the gender differences of the top-level students were unclear, but the results indicated that the gender gap at the general level can be related to the quality of the teachers and teaching methods. Specifically, the level of training and overall understanding of mathematical concepts of Chinese mathematics teachers seems to be higher than that of the average mathematics teacher in the United States. This study also reported that having a culturally American background might contribute to the acceptance of gender differences in mathematics. Within that culture, teachers hesitate to encourage high mathematics scores for female students, whereas this attitude is rarely observed in China.

Kane and Mertz (2012) used multiple regression to analyze gender differences in mathematics scores, taking into account the religious, cultural, and class styles of 17 countries. In this study, they used various panel data, including PISA, TIMSS, and the International Mathematical Olympiad (IMO). Among these, the TIMSS 2007 results for eighth grade students showed little evidence that female students scored lower than male students in three East Asian countries: Korea, Singapore, and Hong Kong SAR. Specifically, in Korea, female students scored slightly lower than male students. However, in all other countries, female students scored higher than male students.

Tsai, Smith, and Hauser (2018) used PISA 2012 to explore the gender gap in three East Asian countries (i.e., Japan, Korea, and Taiwan) and three Western countries (i.e., USA, Germany, and the Czech Republic). In this study, a multilevel multiple indicators and multiple causes (MIMIC; Jöreskog & Goldberger, 1975) model was applied to identify the gender gap for ninth and tenth grade students. The results indicated that while the gender gap was observed in the Western countries, the gap was not observed in the East Asian countries examined in this study.

In contrast, Meisenberg (2016) showed that the gender gap in East Asia might be just as evident as in other countries. In this study, regression models were applied to explore the cultural impact on the gender gap in three different achievement domains (i.e., reading, science, and mathematics) for 15-year-olds in 75 countries using the PISA 2000-2012 data sets. The author divided the 75 countries into 8 groups based on cultural similarities (i.e., Protestant Europe, Catholic Europe/Mediterranean, English-speaking countries, Ex-communist countries, Latin America, North Africa/Middle East, South/Southeast Asia, and East Asia). The results showed that the mean score of the female students was consistently lower than that of male students for mathematics and science. In particular, this study showed that the gender gap related to mathematics in East Asia was not significantly smaller than the gender gap of the other countries.

A few limitations still exist in these previous studies which examined gender differences in East Asian countries. To be more precise, Kane and Mertz (2012) did not consider the possibilities that the gender gap may vary depending on level of achievement. Tsui (2007) focused only on the gender differences among talented students and did not control other factors important to mathematics, whereas both Tsai et al. (2018) and Meisenberg (2016) did not consider levels of the mathematical ability when examining the gender gap. This study, therefore, examines gender differences in the East Asian countries for low- and high-achieving students, as well as for total students. At the same time, important variables related to mathematics achievement are included in the two-level multilevel model.

Strayhorn (2010) suggested that the gap in mathematics achievement can be explained by Bronfenbrenner’s (1979) ecological system theory. Within this framework, growth is accomplished through a system consisting of four hierarchical subsystems: the microsystem, mesosystem, exosystem, and macrosystem. Specifically, the most inclusive one, the macrosystem, denotes socio-cultural factors such as historical trends. The exosystem refers to the policy or faculty curriculum regarding the schools. The mesosystem typically refers to parental involvement. The last one, the microsystem, represents psychological traits (e.g., self-concept and self-efficacy). These four subsystems interact with each other and have an impact not only on academic achievement itself but also on the achievement gap among different demographic groups.

As discussed in the previous section, international gender gap studies commonly use large-scale assessments such as PISA and TIMSS. These studies generally include various variables related to achievement (e.g., psychological variables, socio-economic status variables, and school-related variables), and these variables are well matched with the subsystems described above. Thus, summarizing the results of previous studies at the individual level, the family level, and the school level can be a useful way to draw out meaningful characteristics of East Asian mathematics education.

At the individual level, the achievement level of East Asian students is generally very high. Specifically, Ho (2009) found that students in Japan, Korea, Hong Kong, and Macao showed higher mathematics achievement than the OECD average, using PISA 2003 data. Hojo and Oshio (2012) also showed that students from Japan, Korea, Taiwan, Hong Kong, and Singapore obtained high mathematics achievement compared to the other countries. The high mathematics achievement of East Asian students has continued to be reported, for example in a recent study that analyzes the factors behind high PISA achievement among East Asian students (Jerrim, 2015). However, in contrast to their high achievement, these students showed relatively low self-concept and self-efficacy (Ho, 2009; Leung, 2010; Shen, 2007).

At the family level, parents in East Asian countries have a high level of interest in their children’s education and have invested heavily in the achievement of their children (Jerrim, 2015). It is an interesting result because the educational level of East Asian parents is relatively low compared to the level of parents from Western countries (Shen, 2007).

At the school level, the quality of teachers is much higher than that in Western countries (Wößmann, 2005).

However, even among the East Asian countries, there are some differences. First, there are differences in students’ learning strategies. According to Ho (2009), students’ learning strategies and learning environments in Chinese countries such as Macao and Hong Kong are better than those in Japan and Korea. Although the mathematics anxiety of East Asian students is generally high, this anxiety is especially high in Japan and Korea when compared to the Chinese countries. Conversely, self-concept was lower in both Korea and Japan than in the two Chinese countries. There was also a difference in intrinsic motivation, with Chinese students showing higher intrinsic motivation. Students in Singapore were also found to have more positive attitudes toward mathematics and higher self-concept than those in the other East Asian countries (Leung, 2010). In addition, Hojo and Oshio (2012) presented different results regarding the scores. According to the results of this study, Japanese students’ mathematics scores were more uniform than those of China, Korea, Taiwan, Hong Kong, and Singapore.

## III. Methods

The data were obtained from TIMSS 2015. TIMSS is “a comparative assessment of the achievement of students in many countries” (Glynn, 2012, p. 1321) and consists of mathematics and science assessments for fourth and eighth grade students.

<Table III-1> and <Table III-2> present the student- and school-level variables used in this study, respectively. The outcome measures investigated were five plausible values of mathematics achievement provided in TIMSS 2015. In the TIMSS achievement tests, a subset of items is used to estimate a student’s ability, as it is almost impossible for students to answer all TIMSS achievement items (e.g., 211 mathematics items for eighth grade) in the limited testing time. Instead of calculating a point estimation for a student’s ability, the plausible value methodology (Mislevy, 1991) estimates the ability distribution from the responses to the subset of items and draws a fixed number of individual estimates called the plausible values randomly from the estimated ability distribution. These five random numbers reflect the uncertainty of the student’s proficiency in mathematics (Foy & Yin, 2016).

For the gender variable, males were coded as 1, and females were coded as 0. Thus, the estimate of gender effects indicates how male students perform better/worse on the TIMSS mathematics achievement test as compared to female students.

The control variables, all of the independent variables apart from gender, were selected based on both theoretical and empirical consideration. The control variables except science achievement and school location were the scales which combined a set of TIMSS context questionnaire items. TIMSS also provides cut scores for each scale which makes it easy to interpret the scale scores. For the student level, the following variables were selected as the control variables: science achievement, home educational resources, like learning mathematics, a student’s view on engaging teaching in mathematics lessons, confidence in mathematics, and value mathematics. For the school-level, emphasis on academic success, school disciplinary problems, and location were selected. For school disciplinary problems, the higher the score, the lower the number of problems.

In this study, gender differences were examined among low- and high-achieving students as well as the total students in order to see whether the gender gap varies according to mathematics achievement. Therefore, two groups, low- and high-achieving students, were drawn from the sample using the first and third quartile scores of the five plausible values for mathematics achievement. More specifically, all five plausible values of the low-achieving students were equal to or lower than the first quartile score of the plausible values, whereas all five plausible values of the high-achieving students were equal to or higher than the third quartile score. Since all plausible values rather than a single score were used to classify the groups, the actual proportions of the low- and high-achieving students were much smaller than 25%. For example, the proportion of the low-achieving students in Chinese Taipei was 18.4% and the proportion of high-achieving students in Chinese Taipei was 13.4%; the number of students and schools in each country are shown in <Table III-3>.
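The classification rule described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `classify_students` is hypothetical, and the code assumes that the quartiles are computed separately for each plausible value and without sampling weights, neither of which is specified in the text.

```python
from statistics import quantiles


def classify_students(plausible_values):
    """Label each student 'low', 'high', or 'middle'.

    plausible_values: one list of plausible values per student,
    all lists having the same length (five in TIMSS).
    """
    n_pv = len(plausible_values[0])

    # Assumption: first and third quartiles are taken per plausible value,
    # unweighted. The source does not state how the cut scores were computed.
    cuts = []
    for k in range(n_pv):
        q1, _, q3 = quantiles([s[k] for s in plausible_values], n=4,
                              method="inclusive")
        cuts.append((q1, q3))

    labels = []
    for student in plausible_values:
        if all(student[k] <= cuts[k][0] for k in range(n_pv)):
            labels.append("low")     # every plausible value at or below Q1
        elif all(student[k] >= cuts[k][1] for k in range(n_pv)):
            labels.append("high")    # every plausible value at or above Q3
        else:
            labels.append("middle")  # plausible values disagree or fall between
    return labels


# Made-up scores for eight students with two plausible values each.
example = [[400, 410], [420, 405], [500, 510], [520, 500],
           [540, 530], [600, 590], [620, 610], [640, 630]]
labels = classify_students(example)
```

Because a student must satisfy the cut-off on every plausible value, a student whose values straddle a quartile falls into neither group, which is why each group ends up smaller than 25% of the sample.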

A two-level multilevel model was used to examine the gender differences in mathematics achievement of total, low-, and high-achieving students due to the hierarchical structure of the TIMSS 2015.

First, the unconditional model (i.e., a model with no predictors) was estimated to obtain the variance components for mathematics achievement. The student-level model is given by

$$Y_{ij} = \beta_{0j} + e_{ij},$$

where *Y*_{ij} is the mathematics achievement of student *i* in school *j*, *β*_{0j} is the school mean achievement score of school *j*, and *e*_{ij} is the unique error of student *i* in school *j*.

The school-level model describes the school mean achievement score as varying randomly around a grand mean and is expressed as

$$\beta_{0j} = \gamma_{00} + u_{0j},$$

where *γ*_{00} is the grand mean achievement score for all schools, and *u*_{0j} is the unique effect of school *j*.
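Substituting the school-level equation into the student-level equation gives the combined form of the unconditional model, from which the variance decomposition follows. In the sketch below, the symbols $\tau_{00}$ and $\sigma^{2}$ for the between-school and within-school variances are notation introduced here for illustration, and the usual assumption that *u*_{0j} and *e*_{ij} are independent is made.

```latex
\begin{aligned}
Y_{ij} &= \gamma_{00} + u_{0j} + e_{ij}, \\
\operatorname{Var}(Y_{ij}) &= \operatorname{Var}(u_{0j}) + \operatorname{Var}(e_{ij})
  = \tau_{00} + \sigma^{2}.
\end{aligned}
```

Under these assumptions, the total variance in achievement splits into a part attributable to differences between schools and a part attributable to differences among students within the same school.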

The conditional model was used to examine gender effects after controlling for all of the independent variables associated with mathematics achievement. The student-level model is given by

$$Y_{ij} = \beta_{0j} + \beta_{1j}(\text{Gender})_{ij} + \sum_{p \geq 2} \beta_{pj} X_{pij} + e_{ij},$$

where *β*_{0j} is the school mean achievement score for female students after all the independent variables are controlled (the female students were coded as 0 and the male students as 1), *β*_{1j} is the gender effect, and *X*_{pij} is the value of the *p*-th student-level control variable for student *i* in school *j*.

Similarly, the school-level model includes school-level control variables for the grand mean, which is described as

$$\beta_{0j} = \gamma_{00} + \sum_{q} \gamma_{0q} Z_{qj} + u_{0j},$$

where *γ*_{00} is the grand mean achievement score for female students after all of the independent variables are controlled, *Z*_{qj} is the value of the *q*-th school-level control variable for school *j*, and *u*_{0j} is the unique effect of school *j*. In this specification, all of the school-level explanatory variables enter through the intercept coefficient, since this study focuses on the gender gap when possible influencing factors are controlled.

This study used SPSS 21.0 to analyze descriptive statistics and compute the mean achievement scores. Specifically, the SPSS macro program created by the IEA IDB Analyzer (IEA, 2016) was used to compute the mean achievement scores, because mathematics achievement is represented by five plausible values. For the two-level multilevel modeling, HLM 6 (Raudenbush, Bryk, Cheong, & Congdon, 2004) was used because of the hierarchical structure of TIMSS 2015. This software also makes it straightforward to analyze the plausible values.
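To illustrate what analyzing five plausible values involves, the sketch below combines the results of running the same analysis once per plausible value, using the standard multiple-imputation combining rules. It is an assumption that this matches what the software used here does internally; the function name and the input numbers are hypothetical and are not taken from the tables of this study.

```python
from math import sqrt
from statistics import mean, variance


def combine_plausible_value_estimates(estimates, sampling_variances):
    """Pool one estimate per plausible value into a single estimate and SE.

    estimates: the same coefficient estimated once per plausible value.
    sampling_variances: the squared standard error from each of those runs.
    """
    m = len(estimates)
    pooled = mean(estimates)              # final point estimate
    within = mean(sampling_variances)     # average sampling variance
    between = variance(estimates)         # variance across plausible values
    total = within + (1 + 1 / m) * between
    return pooled, sqrt(total)


# Hypothetical gender coefficients from five runs, each with SE = 2.0.
pooled, se = combine_plausible_value_estimates(
    [-9.5, -9.9, -9.7, -9.6, -9.8],
    [4.0, 4.0, 4.0, 4.0, 4.0],
)
```

The between-run term is what carries the measurement uncertainty reflected by the plausible values: analyzing only one plausible value, or their average, would leave it out and understate the standard error.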

## IV. Results

<Table IV-1>, <Table IV-2>, and <Table IV-3> provide the means and standard deviations of the measures used in this study for total, low-, and high-achieving students, respectively. As can be seen in these tables, there is a tendency for most control variables to be positively associated with the level of mathematical ability. For example, confidence in mathematics for high-achieving students was the highest, followed by total students and then by low-achieving students.

There was no consistency in the proportions of female and male students in each group. The proportions remained around 0.5 in the total sample, which was intended by the sampling procedure in TIMSS 2015. The low-achieving group had a larger number of male students in every country, but the difference in the proportions was negligible. For high-achieving students, there were more male students in Chinese Taipei, Hong Kong SAR, and Korea. Again, the difference in the proportions was small.

Prior to the multilevel modeling analysis, the mean achievement scores of female and male students were simply compared using the *t*-test. As shown in <Table IV-4>, the gender differences for total students were not significant except in Singapore. The mean score of the female students was almost 10 points higher than that of the male students in Singapore. For low-achieving students, a gender gap was present in Hong Kong SAR and Singapore. In both countries, the mean score of the female students was higher than that of the male students. For high-achieving students, a gender difference was detected in Hong Kong SAR. The mean score of the male students was 13 points higher than that of female students.
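For readers who want a concrete picture of such a mean comparison, the sketch below computes a two-sample *t* statistic. The text does not specify which variant of the *t*-test was used, so Welch's unequal-variance form is used here as an assumption; the sketch also ignores sampling weights and plausible values, which the actual analysis had to take into account. The function name is hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(male_scores, female_scores):
    """Welch's t statistic for the male-minus-female mean difference.

    The male-minus-female order mirrors the coding used in this study
    (male = 1, female = 0), so a negative value means females scored higher.
    """
    standard_error = sqrt(variance(male_scores) / len(male_scores)
                          + variance(female_scores) / len(female_scores))
    return (mean(male_scores) - mean(female_scores)) / standard_error


# Toy scores, not TIMSS data.
t_stat = welch_t([1, 2, 3], [2, 4, 6])
```

A comparison of this kind uses no control variables and treats students as independent observations, which is one reason its conclusions can differ from those of a multilevel model.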

The variance components obtained from the unconditional model are summarized in <Table IV-5>. The intraclass correlation (ICC) was also computed to measure the share of the overall variance that lies between schools. The results revealed that between-school variance accounts for 8.8% to 52.3% of the total variance in the total sample, 0.7% to 33.1% in the low-achieving group, and 3.0% to 10.5% in the high-achieving group. The between-school variance in Hong Kong SAR was much larger than that of the other countries. Next, a two-level multilevel model was applied to all of the samples, although a couple of subsamples showed rather small between-school variance (e.g., 0.7% in the low-achieving group of Korea). This is because the multilevel model still accounts for part of the between-school variance, even for these subsamples.
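The ICC is the between-school variance divided by the sum of the between-school and within-school variances. As a rough illustration, the sketch below estimates it with the one-way ANOVA (method-of-moments) estimator for balanced data. This is a swapped-in technique rather than the estimation procedure of the multilevel software used in this study, and the function name and toy data are hypothetical.

```python
from statistics import mean


def anova_icc(schools):
    """Estimate the ICC from a list of schools, each a list of student scores.

    Assumes every school has the same number of students (balanced data).
    """
    n = len(schools[0])
    if any(len(scores) != n for scores in schools):
        raise ValueError("this sketch assumes equally sized schools")
    n_schools = len(schools)

    school_means = [mean(scores) for scores in schools]
    grand_mean = mean(school_means)

    # Mean squares between and within schools.
    ms_between = (n * sum((m - grand_mean) ** 2 for m in school_means)
                  / (n_schools - 1))
    ms_within = (sum((y - m) ** 2
                     for scores, m in zip(schools, school_means)
                     for y in scores)
                 / (n_schools * (n - 1)))

    # Between-school variance, truncated at zero if the estimate is negative.
    between_var = max((ms_between - ms_within) / n, 0.0)
    return between_var / (between_var + ms_within)


# Two toy schools with two students each.
icc = anova_icc([[1, 3], [5, 7]])
```

An ICC near zero, as in some of the subsamples above, means that students in the same school resemble each other little more than students in different schools do.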

In the following analysis, the control variables as well as the gender variable were introduced in order to examine the gender gap of total, low-, and high-achieving students after controlling for all possible influencing factors on mathematics achievement. The results of the multilevel analysis for total, low-, and high-achieving students are presented in <Table IV-6>, <Table IV-7>, and <Table IV-8>, respectively.

For the total sample, the gender gap in mathematics achievement was significant except in Hong Kong SAR, even after controlling for other variables (-9.721 in Chinese Taipei, -4.815 in Japan, -6.760 in Korea, and -11.252 in Singapore). In accordance with Kane and Mertz (2012), the mean achievement score of female students was higher than that of male students in those countries. For example, the gender gap was -9.721 in Chinese Taipei, which indicates that the mean achievement score of female students was 9.721 points higher than that of male students. The surprising aspect of the results is that those gender differences were not identified when the *t*-test was used, except for Singapore.

For low-achieving students, a significant gender gap was detected in Chinese Taipei, Hong Kong SAR, and Singapore (-8.908 in Chinese Taipei, -11.242 in Hong Kong SAR, and -9.604 in Singapore). Similar to the total students’ results, the mean achievement score of female students was higher than that of male students. It is interesting that the gender difference in Hong Kong SAR was insignificant for all students but was significant for the low-achieving students. Again, the gender differences in Chinese Taipei were not detected in the *t*-test.

For high-achieving students, a gender gap was not present, which is consistent with Tsui (2007). It should be noted that this result conflicts with the consensus from the Western countries that a more evident gap between boys and girls is detected at the right tail of mathematical ability (Benbow & Stanley, 1980, 1983; Ellison & Swanson, 2010; Fryer & Levitt, 2010; Robinson & Lubienski, 2011; Wai et al., 2010). As with the results for total and low-achieving students, the results did not match those from the *t*-test.

Interestingly, there was no gender effect among either low- or high-achieving students in Japan and Korea, even though the gender gap of the total students in both countries was significant. More importantly, a gender gap for low-achieving students was not present in these two countries, while the gender gap in other countries was almost 10 points. In accordance with previous research (Ho, 2009), these two countries showed similar characteristics in control variables. In particular, the students in Japan and Korea showed relatively high scores for home educational resources and relatively lower scores for engaging teaching than other countries. This finding implies that these two factors may be related to the small achievement gap between the female and male students in the low-achieving group.

The impact of each control variable differed by sample. More specifically, science achievement had a significant impact on mathematics achievement for both total and low-achieving students. The effect of home educational resources was significant only for the total sample. The degree to which a student likes mathematics was significant for the total students in most of the countries, whereas it was insignificant for the low- and high-achieving students, apart from the low-achieving students in Singapore. The effect of confidence in mathematics was significant regardless of the sample in most countries. Emphasis on academic success affected mathematics achievement in all countries only for the total sample. The impacts of a school’s disciplinary problems and a school’s location were significant in some countries.

The impact of most variables was positive except for students’ views on engaging teaching in mathematics lessons. In other words, the more highly students rated engaging teaching in mathematics lessons, the lower their achievement. It is intriguing that this impact was not significant in Singapore, as the scale score for students’ views on engaging teaching was higher in Singapore than in the other countries.

## V. Conclusion

The gender gap between female and male students in mathematics achievement has been studied over the past decades. The consensus derived from previous research is that gender differences are more apparent in high-achieving students. Cheema and Galluzzo (2013) pointed out, however, that this gender gap can be regarded as the “mixed effect” of actual gender differences and the impact of other important variables to mathematics achievement. The controversy over the gender gap in mathematics achievement implies two things: first, a gender gap may vary depending on levels of mathematical ability. Second, researchers need to consider the influencing variables on achievement. This study, therefore, investigated the gender differences of total, low-, and high-achieving students in high-achieving countries in TIMSS 2015, which includes Chinese Taipei, Hong Kong SAR, Japan, Korea, and Singapore. In consideration of the hierarchical structure of the data, a two-level multilevel model was implemented.

The main results of this study are summarized as follows. First, gender differences in mathematics achievement in East Asian countries were different depending on the level of mathematical ability. Specifically, for total students, this study has shown that gender differences in mathematics achievement were present in most East Asian countries except for Hong Kong SAR. However, for low-achieving students, of the four countries which showed a gender gap for total students, two countries, Chinese Taipei and Singapore, showed a gender gap. It is interesting that gender differences were not detected for total students but that a significant gender gap was detected for low-achieving students in Hong Kong SAR. For high-achieving students, this study showed that the gender gap was insignificant in all countries in East Asia, which conflicts with the consensus derived from Western countries that the gender difference is more apparent among high-achievers (Benbow & Stanley, 1980, 1983; Ellison & Swanson, 2010; Fryer & Levitt, 2010; Robinson & Lubienski, 2011; Wai et al., 2010).

Second, on the question of proper methodology for examining gender differences, this study found that there were clear differences between the results from the mean comparison and those from the two-level multilevel modeling. To be more precise, for total students, the mean comparison found a gender gap only in Singapore, whereas the two-level multilevel model found a gender gap in Chinese Taipei, Japan, Korea, and Singapore. For low-achieving students, the mean comparison found a gender gap in Hong Kong SAR and Singapore, whereas the two-level multilevel model found a gender gap in Chinese Taipei, Hong Kong SAR, and Singapore. For high-achieving students, the mean comparison found a gender gap in Hong Kong SAR, whereas the two-level multilevel model found no gender gap in the East Asian countries. This finding confirms the idea set forth by Cheema and Galluzzo (2013) that the gender gap needs to be investigated alongside other important variables.

This study contributes additional evidence that gender differences in mathematics achievement should be examined with proper statistical methodology, as suggested by Cheema and Galluzzo (2013). Also, this study found no gender gap among high-achieving students in East Asian countries, which differs from the findings in Western countries, where a more apparent gender difference among high-achieving students was found.

The limitations of this study and suggestions for future research are as follows. First, this study investigated the gender gap only among eighth grade students. Further research is recommended to investigate the gender differences among high school students, which can be directly related to college entrance examinations. Second, a wider range of explanatory variables needs to be considered, especially for low- and high-achieving students, considering that the effects of the control variables for those groups were not significant in this study. Third, this study might lead to the conclusion that there is little evidence of a gender gap for high-achieving students. However, a previous study found that female students spend 19.5% more time studying mathematics than male students do (Guiso et al., 2008). In other words, not only the difference in scores between male and female students but also the learning strategies for studying mathematics efficiently need to be examined to produce more in-depth explanations of the gender gap in mathematics achievement. Finally, further research should investigate the possible reasons for gender differences in East Asian countries, especially among low-achieving students, which may help support these students.