This portfolio examines gender disparities in STEM across education, entrance into the workforce, and within the labor force. Six datasets covering different components of gender discrimination were collected from several sources and then thoroughly examined and transformed for visualization. Six visualizations were then developed to illuminate how test scores have evolved at the education level, how the shares of men and women entering professional fields have evolved, and how labor force participation, fertility rate, parental leave, earnings, and technical skills compare for men and women in the labor force and change over time. Overall, this analysis finds that gender divides are still present but are diminishing in education, in entrance into the labor force, and within the workplace. Potential reasons for the narrowing gap include lower fertility rates and improved parental leave policies.
Gender discrimination is a prominent human rights issue that manifests itself through job segregation, employment inequity, pay gaps, and a lack of freedom in career choice for women around the world. In fact, it is estimated that it will take another 14.2 years to close the educational attainment and health and survival gap, 267.6 years to close the economic participation and opportunity gap, 145.5 years to close the political empowerment gap, and 135.6 years to close the gender gap globally (Global Gender Gap Report 2021). Despite these dire statistics, over the last decade, and especially in the last few years, gender disparities have been at the forefront of the media, social justice movements, and political reforms, both globally and in the United States. Many studies, publications, and analyses have examined the different components of gender disparities, but most of them study one facet at a time.
This portfolio offers a broader toolset to examine gender inequities in education, entrance into the labor force, and within the labor force by location and over time. Developed by women in tech, this analysis focuses specifically on gender disparities in STEM and assesses how test scores, entrance into professional fields, labor force participation, parental leave, and salaries differ for men and women. Through a variety of interactive visualizations, this platform enables students, women, policy-makers, and other stakeholders to see the presence and evolution of gender disparities and to drill down into underlying factors to understand potential causes and solutions. Given the widespread awareness of gender inequality, this portfolio is expected to reveal a narrowing gender gap in education, in entrance into the labor force, and within the labor force across the factors mentioned above, while also showing that disparities are still present.
To examine this topic closely, six datasets were used to visualize gender disparities and their underlying factors. To investigate the gap at the educational level, an OECD dataset of PISA standardized test scores for boys and girls, by subject, country, gender, and year from 2000 to 2018, was collected from the OECD’s website. This dataset consisted of three separate files, one for each subject: reading, science, and math. PISA, the OECD’s Programme for International Student Assessment, measures 15-year-olds’ ability to use their reading, mathematics, and science knowledge to meet real-life challenges (PISA). To understand disparities in entrance into the workforce, a dataset containing the distribution of graduates and new entrants by field from 2005 to 2019 was downloaded from the OECD’s website.
Then, to explore broader trends in the labor force, a separate dataset was collected from Our World in Data (OWID) that contains the percentage of women active in the labor force and the average number of births per woman by country from 1960 to 2020. Additionally, an OECD dataset on the length of parental leave in fifty countries from 1970 to 2018, containing the minimum government-granted parental leave for four separate metrics, was collected. To explore inequities in STEM and non-STEM fields, a dataset from the United States Census Bureau containing 2019 median earnings for a range of occupations was collected. Finally, a Kaggle dataset on women in data science was used to examine the technical skill sets of men and women. This dataset looks at the gender divide in data science between 1990 and 2018 and contains attributes on men’s and women’s education levels, length of time coding, perceptions of ML/AI, and use of different technical tools.
After the data was collected, missing values, outliers, and data types were explored, and appropriate techniques were used to clean the datasets. Aggregation, feature generation, and additional transformations were then applied to each dataset to prepare it for visualization.
Using R, the three PISA test score datasets were combined into one data frame in which unwanted columns were removed, columns were renamed, case was standardized, and duplicates, missing values, and data types were handled. Two new columns were added: one for the total score per gender, country, and year, and one ranking the countries by total test score for each year. OECD countries without at least three years’ worth of data were also removed. The final data frame had 2087 rows and 7 columns (country, year, subject, gender, test score, total score, and rank).
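The combine, total, and rank steps can be sketched in plain Python as below. This is an illustrative sketch rather than the portfolio’s R code: the input layout (one list of `(country, year, gender, score)` tuples per subject), the helper names `combine_pisa` and `add_totals_and_rank`, and the rule of ranking countries within each year by their total summed across both genders are all assumptions.

```python
from collections import defaultdict


def combine_pisa(subject_files):
    """Merge per-subject records into one list of dicts.

    subject_files maps a subject name to (country, year, gender, score)
    tuples (a hypothetical layout). Case is standardized, duplicates are
    dropped, and rows with a missing score are removed.
    """
    seen = set()
    rows = []
    for subject, records in subject_files.items():
        for country, year, gender, score in records:
            if score is None:
                continue  # missing test score
            key = (country.strip().upper(), int(year), subject.lower(), gender.strip().lower())
            if key in seen:
                continue  # duplicate once case is standardized
            seen.add(key)
            code, yr, subj, gen = key
            rows.append({"country": code, "year": yr, "subject": subj,
                         "gender": gen, "score": float(score)})
    return rows


def add_totals_and_rank(rows, min_years=3):
    """Drop sparse countries, then add total_score and rank columns."""
    years = defaultdict(set)
    for r in rows:
        years[r["country"]].add(r["year"])
    rows = [r for r in rows if len(years[r["country"]]) >= min_years]

    # Total score per (country, year, gender), summed across subjects.
    totals = defaultdict(float)
    for r in rows:
        totals[(r["country"], r["year"], r["gender"])] += r["score"]

    # Assumed ranking rule: within each year, countries are ordered by their
    # total across both genders (rank 1 = highest).
    country_year = defaultdict(float)
    for (country, year, _gender), total in totals.items():
        country_year[(country, year)] += total
    ranks = {}
    for year in {y for _c, y in country_year}:
        ordered = sorted((c for c, y in country_year if y == year),
                         key=lambda c: country_year[(c, year)], reverse=True)
        for position, country in enumerate(ordered, start=1):
            ranks[(country, year)] = position

    return [{**r,
             "total_score": totals[(r["country"], r["year"], r["gender"])],
             "rank": ranks[(r["country"], r["year"])]}
            for r in rows]


example = {
    "Math": [("aaa", 2012, "Girl", 500), ("aaa", 2015, "Girl", 510),
             ("aaa", 2018, "Girl", 520), ("bbb", 2018, "Girl", 480),
             ("AAA", 2012, "girl", 500)],      # duplicate once upper-cased
    "Reading": [("aaa", 2012, "Girl", None)],  # missing score is dropped
}
combined = combine_pisa(example)
final = add_totals_and_rank(combined)  # "BBB" has one year only, so it is removed
```

The filter runs before totals are computed so that removed countries cannot influence the ranks of the remaining ones.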
Using Python, the OECD dataset on the distribution of new entrants by field was cleaned with standard munging practices, such as converting columns to appropriate data types and removing columns in which most values were missing. A new feature, “Differences”, was generated as the share of men minus the share of women entering a given field for each country and year. The data was then reshaped so that each professional field is a column containing the male-female difference for a given country and year. The final data frame had 386 rows and 14 columns (country code, country name, year, and one difference column for each of the eleven professional fields).
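A plain-Python sketch of the “Differences” feature and the wide reshaping follows. It assumes long-format input records with hypothetical keys `country_code`, `country`, `year`, `field`, `gender` (`"men"` or `"women"`), and `share` (a percentage); the portfolio’s actual column names are not given.

```python
def gender_differences(records):
    """Return one wide row per (country, year) with a column per field.

    Each field column holds share of men minus share of women, in
    percentage points; positive values mean more men entered the field.
    """
    shares = {}  # (code, country, year, field) -> {gender: share}
    for r in records:
        if r["share"] is None:
            continue  # a missing share cannot contribute to a difference
        key = (r["country_code"], r["country"], int(r["year"]), r["field"])
        shares.setdefault(key, {})[r["gender"]] = float(r["share"])

    wide = {}
    for (code, country, year, field), by_gender in shares.items():
        if "men" not in by_gender or "women" not in by_gender:
            continue  # both shares are needed to compute the difference
        row = wide.setdefault((code, year),
                              {"country_code": code, "country": country, "year": year})
        row[field] = round(by_gender["men"] - by_gender["women"], 2)
    return sorted(wide.values(), key=lambda row: (row["country_code"], row["year"]))


NATSCI = "Natural sciences, mathematics and statistics"
ICT = "Information and Communication Technologies"
base = {"country_code": "AAA", "country": "Exampleland", "year": 2015}
records = [  # hypothetical shares, not OECD values
    {**base, "field": "Education", "gender": "men", "share": 30.0},
    {**base, "field": "Education", "gender": "women", "share": 70.0},
    {**base, "field": NATSCI, "gender": "men", "share": 55.0},
    {**base, "field": NATSCI, "gender": "women", "share": 45.0},
    {**base, "field": ICT, "gender": "men", "share": 80.0},  # no women share: skipped
]
table = gender_differences(records)
```

Here `table` holds a single row for AAA in 2015 with `Education` at -40.0 and the natural sciences column at 10.0, matching the sign convention used by the Figure 2 color scale.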
Using Python, the STEM jobs dataset was prepared by splitting the occupation column into two separate columns, occupation title and occupation type. Then, the occupation category column was iteratively filled with the appropriate value for each occupation title. For instance, the category Computer Occupations was assigned to rows for job titles such as Computer programmers, Software developers, and Web developers. Columns were renamed, and the final data frame had 120 rows and 10 columns (occupation, occupation_category, total_employed, men_employed, women_employed, percent_of_women, total_median_earnings, men_earnings, women_earnings, and percent_of_men_earnings).
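The category-filling step, together with the two percentage columns, can be sketched as a forward fill over an assumed table layout in which each category header row (for example “Computer occupations”) precedes its detailed job titles. The input keys, the `category_headers` set, and the choice to derive `percent_of_women` and `percent_of_men_earnings` from the counts and medians (rather than read them from the Census table) are assumptions; `total_median_earnings` is omitted because it cannot be derived from the two gender-specific medians.

```python
def build_stem_table(raw_rows, category_headers):
    """Forward-fill occupation_category and derive the percentage columns.

    raw_rows: dicts with occupation, men_employed, women_employed,
    men_earnings, and women_earnings (hypothetical input layout).
    category_headers: occupation labels that start a new category.
    """
    current_category = None
    table = []
    for r in raw_rows:
        if r["occupation"] in category_headers:
            current_category = r["occupation"]  # header row: remember it, don't emit it
            continue
        men, women = r["men_employed"], r["women_employed"]
        table.append({
            "occupation": r["occupation"],
            "occupation_category": current_category,
            "total_employed": men + women,
            "men_employed": men,
            "women_employed": women,
            "percent_of_women": round(100 * women / (men + women), 1),
            "men_earnings": r["men_earnings"],
            "women_earnings": r["women_earnings"],
            # Women's median earnings as a percentage of men's.
            "percent_of_men_earnings": round(100 * r["women_earnings"] / r["men_earnings"], 1),
        })
    return table


raw = [  # hypothetical counts and earnings, not Census values
    {"occupation": "Computer occupations"},
    {"occupation": "Software developers", "men_employed": 750, "women_employed": 250,
     "men_earnings": 100000, "women_earnings": 85000},
    {"occupation": "Healthcare occupations"},
    {"occupation": "Registered nurses", "men_employed": 100, "women_employed": 900,
     "men_earnings": 80000, "women_earnings": 72000},
]
stem = build_stem_table(raw, {"Computer occupations", "Healthcare occupations"})
```

Each detailed row inherits the most recent header above it, which is the “iterative fill” described in the text under this assumed layout.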
The technical skills dataset obtained from Kaggle had a fairly straightforward cleaning process. Using Python, only the variables with relevant demographic information and the data on technical skills were kept and given clear, concise names. The dataset was then transformed from wide to long format. Finally, the data was aggregated by gender and programming skill, and the ratios were calculated. The final aggregated data frame had 37 rows and 4 columns (gender, programming_skill, counts, and ratio).
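The reshape-and-normalize step might look like the following plain-Python sketch. It assumes one dict per survey respondent with a `gender` key and one boolean-like column per tool (the tool column names are hypothetical). Each ratio divides the number of users of a tool by the number of respondents of that gender, which keeps the much larger male sample from dominating the comparison.

```python
from collections import Counter


def tool_usage_ratios(responses, tool_columns):
    """Reshape wide survey rows to long (gender, tool) pairs and aggregate.

    Returns rows with gender, programming_skill, counts, and ratio, where
    ratio = users of the tool / all respondents of that gender.
    """
    respondents = Counter(r["gender"] for r in responses)
    # Wide -> long: one (gender, tool) pair for every tool a respondent uses.
    long_pairs = [(r["gender"], tool) for r in responses
                  for tool in tool_columns if r.get(tool)]
    counts = Counter(long_pairs)
    return [
        {"gender": gender, "programming_skill": tool, "counts": n,
         "ratio": n / respondents[gender]}
        for (gender, tool), n in sorted(counts.items())
    ]


responses = [  # hypothetical respondents
    {"gender": "Male", "Python": True, "SQL": True},
    {"gender": "Male", "Python": True, "SQL": False},
    {"gender": "Male", "Python": False, "SQL": True},
    {"gender": "Male", "Python": False, "SQL": False},
    {"gender": "Female", "Python": True, "SQL": True},
    {"gender": "Female", "Python": True, "SQL": False},
]
usage = tool_usage_ratios(responses, ["Python", "SQL"])
```

In this toy input, two men and two women use Python, yet the ratios are 0.5 and 1.0 because there are twice as many male respondents.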
Lastly, the parental leave data obtained from the OECD and the fertility/labor force participation data obtained from OWID also had straightforward cleaning processes. Using R, only the relevant variables (including relevant demographic information) were kept for both datasets, rows with incorrect time values were removed, and missing values were dropped. The final fertility/labor force participation data frame had 4773 rows and 4 columns (country, year, female_labor_force_participation_rate, and fertility_rate). The final parental leave data frame had 2658 rows and 5 columns (country, indicator, gender, time, and value).
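The filtering described above can be sketched as follows. The portfolio does not say what made a time value incorrect; this sketch assumes it means a value that is not a single integer year (for example a range such as “1970-1975”) or one outside the study window. The column names passed in `keep` are taken from the final data frames listed above.

```python
def clean_rows(rows, keep, time_col="time", first_year=1970, last_year=2018):
    """Keep selected columns; drop rows with invalid years or missing values."""
    cleaned = []
    for r in rows:
        try:
            year = int(str(r.get(time_col)).strip())
        except ValueError:
            continue  # not a single integer year, e.g. "1970-1975" or "None"
        if not first_year <= year <= last_year:
            continue  # outside the study window
        subset = {col: r.get(col) for col in keep}
        if any(value is None or value == "" for value in subset.values()):
            continue  # missing value in a retained column
        subset[time_col] = year  # store the parsed integer year
        cleaned.append(subset)
    return cleaned


leave = [  # hypothetical rows
    {"country": "AAA", "indicator": "Maternity", "gender": "F", "time": "2005", "value": 14, "flag": "x"},
    {"country": "AAA", "indicator": "Maternity", "gender": "F", "time": "1970-1975", "value": 12},
    {"country": "BBB", "indicator": "Maternity", "gender": "F", "time": "2010", "value": None},
    {"country": "CCC", "indicator": "Maternity", "gender": "F", "time": "2019", "value": 20},
]
leave_clean = clean_rows(leave, ["country", "indicator", "gender", "time", "value"])
```

For the OWID data, the same function could be called with `time_col="year"`, `first_year=1960`, and `last_year=2020`.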
Six separate visualizations were then created from these cleaned datasets.
Figure 1 depicts the average test scores for each gender and subject, in three-year increments, from 2000 to 2018 for OECD countries. The intent of this visualization is to compare how boys and girls perform in education, as measured by test scores, across countries, time periods, and subjects. The data is illustrated on a faceted multiple-line graph in which the x-axis represents the year, the y-axis represents the average PISA test score, and each facet row represents a different test subject; a drop-down selects the country shown. This chart was created using Altair in Python.
A multiple-line graph was chosen as the geometry for this figure based on its intent: to assess test score trends over time and compare them by gender and against the overall average. Facets were used to segment the visualization by subject without overcrowding it, which makes it easy for the viewer to compare test scores across subjects. The color scheme was purposefully selected to show male average test scores in blue, female average test scores in pink, and total average test scores in a neutral green; pink is commonly associated with girls and blue with boys. A legend is included to explain these colors. White points were added to each line to make clear where test score values exist, at three-year increments. A drop-down was also included that enables the viewer to select a country and view its test scores over time; the default view, “ALL”, shows the average test scores across all the countries in the drop-down. This feature lets the user see how average test scores have evolved by gender in each country, giving a more granular understanding of how the gender gap manifests itself in education. The drop-down options were alphabetized to make it easier for the viewer to find their country of interest. A tooltip provides on-demand information for each data point, including the year, gender, average test score, country code, and subject. The y-axis range is also independent for each facet, which spreads the lines out according to the selected country’s test score values. Lastly, the viewer can zoom in and out to view finer details, especially where lines overlap or converge.
The design of Figure 1 has largely remained the same since its original proposal. However, the ability to filter the view by country was added. This adds another dimension and describes trends at a more granular level; a viewer can now understand how the educational gender gap has evolved overall and by country. Depending on the use case, the viewer has more control over the information they see. When preparing the dataset for visualization, a rank variable was derived that sorted countries from highest to lowest overall test score for 2018 only. This variable was created because the first prototype used a slider instead of a drop-down. However, since the slider could not accommodate all of the countries, only a subset of countries with the highest and lowest 2018 test scores was used. Because this setup could confuse the user, a drop-down was chosen instead.
***
Figure 2 depicts the difference between the shares of men and women entering eleven separate fields after graduation in OECD countries. The purpose of this novel visualization is to illustrate the gender disparity in employment opportunities for men and women in STEM fields by country and to show how it evolves over time. This chart was created using Plotly in Python.
A choropleth map was selected as the geometry to compare entrance into the labor force geospatially. With over thirty OECD countries, an interactive map shows the difference between men and women entering a field after graduation, by country and over time, in one view. The color of each country represents the difference in the shares of men and women, where positive values indicate more men entering a field and negative values indicate more women entering a field. A diverging pink-and-blue gradient was intentionally selected to show whether entry into the given field skews toward men or women in each country; the scale is shown in the legend on the right-hand side of the figure. A diverging palette was chosen over a sequential one to make clear whether men or women predominantly enter a given field. The neutral gray color, at a value of 0, indicates that the shares of men and women entering a field are equal. Additionally, the range of values intentionally changes for each field and year combination instead of being standardized. Although this makes it slightly harder to compare trends across fields and years, it is necessary to see differences between countries, since the difference for a given field ranges from 1 to 30 percentage points. A standardized range would make the differences imperceptible for most of the data.
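One way to keep the neutral gray pinned at 0 while still letting the range vary by field and year is to make each field-year range symmetric around zero. The sketch below assumes that approach; the text does not say whether the portfolio’s ranges were symmetric, and the row layout (one wide row per country and year, as in the cleaned data frame) and helper name are hypothetical. A resulting `(low, high)` pair could then be passed to Plotly Express’s `px.choropleth` through its `range_color` argument.

```python
from collections import defaultdict


def diverging_ranges(rows, fields):
    """Return {(field, year): (-m, m)} where m is the largest absolute
    men-minus-women difference among countries with data for that field
    and year, so a value of 0 always sits at the palette's neutral midpoint."""
    max_abs = defaultdict(float)
    for row in rows:
        for field in fields:
            value = row.get(field)
            if value is None:
                continue  # no data: drawn in white, excluded from the range
            key = (field, row["year"])
            max_abs[key] = max(max_abs[key], abs(value))
    return {key: (-m, m) for key, m in max_abs.items()}


ICT = "Information and Communication Technologies"
rows = [  # hypothetical differences in percentage points
    {"country_code": "AAA", "year": 2010, "Education": -25.0, ICT: 8.5},
    {"country_code": "BBB", "year": 2010, "Education": -10.0, ICT: 30.0},
    {"country_code": "AAA", "year": 2011, "Education": None},
]
ranges = diverging_ranges(rows, ["Education", ICT])
```

Because each map gets its own bound, a field whose largest gap is 5 points still uses the full color gradient instead of appearing almost uniformly gray.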
Furthermore, countries without values are shown in white to indicate clearly that no data is present for that selection. A tooltip shows the country name and the exact difference between men and women entering a field. A drop-down was added so that the viewer can filter the frame by field; subsetting by field gives viewers specific information about the career paths they are interested in. Another aspect of this novel visualization is a time slider that filters the frame on a second dimension, year. Ranging from 2010 to 2019, the slider lets a viewer see how entrance rates have changed for different fields over time and understand country-specific trends.
The vision for the novel visualization has changed since its initial proposal. The original version attempted to overlay test scores as symbols on the choropleth to see whether educational trends corresponded with occupational trends for a given country over time. Due to technical constraints and concerns about overcrowding, the graduated symbol layer of the novel visualization was not developed. To obtain the same insights, the viewer can look at this figure alongside Figure 1 and compare educational disparities over time and by country with those after graduation. Overlaying the symbols might also clutter the visual and draw attention away from its primary intent: exploring gender differences by field. Thus, this visual focuses on entrance rates and on interactivity to maximize engagement with the dataset. This visualization is novel because it combines two interactive features, a slider and a drop-down menu, to filter the frame on two dimensions at once. These features give the user considerable control over the visualization and show what the shares of men and women entering the labor force look like for each year and field, enabling very granular insights.
***
Figure 3 depicts the relationship between female labor force participation and fertility rate in OECD countries from 1960 to 2020. The purpose of this figure is to illustrate the correlation between fertility rate and labor force participation over time by country. To achieve this, three separate charts were included: the first shows how average female labor force participation differs by country, the second shows how the distribution of fertility rate over time varies by country, and the third illustrates the association between fertility rate and female labor force participation over time. This chart was created using Altair in Python.
A bar chart was selected to display average female labor force participation over time so that aggregate values can easily be compared by country. A box plot was selected to display the distribution of fertility rate over time in order to view the range of a continuous variable for each country and compare it across countries. A scatter plot was chosen to display the relationship between fertility rate and female labor force participation, as it is the most direct way to show the correlation between two continuous variables, while an additional year encoding shows the evolution over time. Since Figure 3 does not compare females to males, the neutral green used above was chosen for all charts to keep the theme consistent across the portfolio. A legend on the right of the scatter plot explains what the size and color of the points represent: darker and larger points indicate later years. A sequential color palette was chosen because the goal is to mark later years with darker colors and earlier years with lighter colors along a single time dimension. The countries on the bar chart were also ordered by decreasing labor force participation to make it apparent which countries had high and low rates over time. A tooltip on all three charts shows the exact data points for each country; this feature is essential for distinguishing values given the amount of information and the number of dimensions in Figure 3. Lastly, all three charts are linked. The default view shows all countries, but a single country can be selected on the bar chart to filter the entire view to that country, letting the user focus on a specific country of interest. Double-clicking the same bar removes the filter and resets the view.
Figure 3 was initially proposed to sit alongside the parental leave data to evaluate how family leave policy and women’s ability to maintain careers relate to one another. Because the additional dimension of fertility rate was included, a separate figure was created instead of crowding the initial visual. Thus, to assess family leave policy and women’s ability to maintain careers, the viewer can look at this figure in conjunction with Figure 4, which makes it possible to see the relationship between fertility, labor force participation, and parental leave policy.
***
Figure 4 depicts the minimum government-required parental leave length, in weeks, for each country from 1970 to 2018. The first goal of this visualization is to compare different types of required parental leave (length of maternity leave, paid paternity leave, and total length of parental leave with job protection) to one another. The second is to compare the length of each type of parental leave across countries. Lastly, it aims to examine how the length of parental leave evolves over time. The data is illustrated on a horizontal grouped bar chart in which the y-axis represents country, the x-axis represents the average length in weeks, and each animation frame represents a specific year between 1970 and 2018. This chart was created using Plotly in R.
A bar chart was selected as the geometry for this visualization to compare summarized values of parental leave against one another, within and across countries. A grouped bar chart was chosen specifically because multiple types of parental leave are compared within each country. A faceted bar chart, with one facet per parental leave metric, could have been used instead. While that design may be less cluttered, it loses the ability to compare parental leave measures within a specific country effectively. The color palette was chosen so that blue represents paternity leave (men), pink represents maternity leave (women), and the neutral green represents total parental leave, consistent with the color scheme above. A legend distinguishes the colored bars. The legend is interactive: clicking an entry toggles that metric, so the viewer can isolate specific metrics on the frame. For instance, to compare the length of maternity leave across countries, a viewer could click on the other metrics to hide them. When only one parental leave metric is shown, the chart is ordered by decreasing length for legibility, so the viewer can quickly see which countries have the longest parental leave for the given time period. A slider was also included so the viewer can see parental leave length for each year from 1970 to 2018. The early years (1970-2005) only have data every five years, while the later years have annual data. By pressing play or selecting a year on the slider, viewers can see changes over time. The animation was slowed to give the user ample time to understand how the trends evolve year over year. A tooltip shows the length of parental leave for each metric and country so that the viewer can see exact values.
Currently, the design of Figure 4 deviates from its initial proposed structure in that it does not overlay female labor force participation data to evaluate how family leave policy and women’s ability to maintain careers relate to one another. Although the two datasets were initially prepared and joined, the sheer number of features that one chart attempted to display was overwhelming. This chart already had country, year, and parental leave type dimensions displayed with a grouped bar chart. Adding fertility rate and female labor force participation, potentially with a different aggregation, would make it too cumbersome for a reader to draw insights quickly. Instead, Figure 3 was developed so that the viewer can explore both charts in tandem to see the relationship between parental leave, female labor force participation, and fertility rate.
***
Figure 5 depicts the relationship between the percentage of women employed in STEM occupations and their earnings as a percentage of men’s in 2019. The purpose of this visualization is to understand which STEM occupations have the greatest gender inequity and how the wage disparity contributes to the divide. This chart was created using Plotly in R.
A scatter plot was selected as the geometry for this visualization because the intent is to understand the relationship between two continuous variables. A bubble chart was used specifically so that additional variables could be encoded. The size of each mark encodes the total number of employees (both men and women) in the given occupation, while the color encodes the occupation as shown in the legend. The color palette was chosen so that each occupation has a noticeably different color, and the legend provides further clarity for the reader. The legend also lets the viewer isolate occupation types on the graph by selecting values to exclude, so that they can compare the percentage of women employed and earnings for the STEM occupations of their choice. A tooltip reveals the coordinates and occupation of each point on the chart. These interactions give the viewer more control over the visualization and help them understand how occupations pay and hire women. Lastly, the reference lines at 50% women employed and at women’s earnings equal to 100% of men’s are bolded to mark where women would be equally represented and equally paid; these lines are intentionally unannotated to avoid clutter among the many points and tooltips.
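The two bolded parity lines split the plot into regions. As an illustration, each occupation’s position relative to those lines can be classified as below. The function names, labels, and example points are hypothetical and are not part of the Plotly figure or the Census data.

```python
def parity_position(percent_of_women, percent_of_men_earnings):
    """Describe a Figure 5 point relative to the two bolded reference lines.

    percent_of_women: share of employees who are women (50 = equal representation).
    percent_of_men_earnings: women's median earnings as % of men's (100 = equal pay).
    """
    if percent_of_women > 50:
        representation = "more women"
    elif percent_of_women < 50:
        representation = "more men"
    else:
        representation = "equal representation"

    if percent_of_men_earnings < 100:
        pay = "women paid less"
    elif percent_of_men_earnings > 100:
        pay = "women paid more"
    else:
        pay = "equal pay"
    return representation, pay


def summarize(points):
    """Count occupations in each (representation, pay) region."""
    counts = {}
    for _occupation, women_pct, earnings_pct in points:
        key = parity_position(women_pct, earnings_pct)
        counts[key] = counts.get(key, 0) + 1
    return counts


points = [  # hypothetical occupations, not Census values
    ("Occupation A", 25.0, 85.0),
    ("Occupation B", 90.0, 92.0),
    ("Occupation C", 20.0, 101.0),
]
regions = summarize(points)
```

Counting points per region turns the visual reading of the chart (how many occupations sit below the 100% line) into a number that can be checked against the figure.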
The execution of this visualization has adhered to the proposed design and successfully communicated the desired narrative.
***
Figure 6 illustrates the percentage of respondents of each gender who use different programming tools on a regular basis. Programming tools are shown on the y-axis and the percentage of respondents using each tool on the x-axis. The data came from a 2018 study posted on Kaggle, The Gender Divide In Data Science, which surveyed over 23,000 data science and analytics professionals to assess gender discrimination in the field. The primary goal of this visualization is to explore whether male and female data science and analytics professionals in the current labor force have the same technical skills. The insights gained from this visualization tie into the figures above by exploring gender gaps in certain STEM occupations as well as gender-based wage discrimination in these fields. This chart was created using Plotly in Python.
A bidirectional bar chart was chosen because it shows a clear side-by-side comparison between males and females. Blue was chosen for the male data and pink for the female data, as those colors are intuitive and easily understood by a wide audience. Since the sample of males is drastically larger than the sample of females, the ratios were calculated as the number of individuals using each programming tool divided by the total number of respondents of that gender. The visualization has several interactive features, including zooming, panning, selecting, and hover-over. The hover-over text is black on a white background to blend with the white background of the plot, and it shows the ratio as a decimal along with the programming tool and gender. The labels show rounded percentages for males and females, respectively. A legend was intentionally omitted, as the labels, colors, and layout clearly indicate what each bar represents.
Figure 6 deviates from its initial proposed structure in that it is not a stacked bidirectional bar chart and does not include the country dimension on the x-axis. On further examination of the dataset, displaying usage for all programming tools, rather than the initially proposed subset (SQL, Python, R), was found to generate more valuable insights. One way to include both the country dimension and all programming tools would be a stacked bidirectional bar chart with more than three programming tools; however, that design would add considerable clutter and make it hard to compare usage between males and females. A simpler version of the chart was therefore developed as the final version.
***
Overall, the technical goals for all but one visualization have evolved from the initial proposal, as described above. However, the overall vision and narrative have remained the same. The proposal’s desired plots, narrative, and outcomes were realistic, with the exception of the initial novel visualization, which involved overlaying graduated symbols on top of its current design. Given the technical complexity of both the current design and a graduated symbol map, the original proposal was, in hindsight, somewhat unrealistic. In a future iteration of this project, all views with a country and year component would ideally be linked. While each figure already has many interactive features, it would be even more powerful for the viewer to interact with all of the charts in one view. Additionally, data on gender biases could be assessed alongside these more quantifiable factors to dissect the causes of gender disparities more thoroughly.
Figure 1 contains the PISA test scores for each subject and gender over time. In the default view, which averages test scores across all countries, math scores decreased from 2003 to 2018 for both boys and girls. In 2003, girls had an average score of 488.74 and boys 499.55; in 2018, girls averaged 486.73 and boys 491.76. Although average math scores decreased, the gap between boys and girls also narrowed, from 10.81 to 5.03 points. Reading scores similarly decreased for both boys and girls, contributing to an overall decrease in average test scores from 2000 to 2018. In 2000, girls averaged 505.4 and boys 474.2; in 2018, girls averaged 502.17 and boys 472.44. Although average reading scores decreased overall, the gap between girls and boys narrowed slightly, from 31.2 to 29.73 points. Lastly, from 2006 to 2018, boys’ average science scores decreased while girls’ increased. In 2006, girls averaged 493.94 and boys 496.46, while in 2018, girls averaged 498.54 and boys 487.03. In this case, the gap reversed: girls now have higher average science scores than boys. Drilling down, most countries mirror this narrowing gender gap across subjects, but their overall trends deviate significantly. For instance, Portugal (PRT) shows that math, reading, and science test scores have increased significantly over time.
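The gap arithmetic behind these statements can be checked with a short worked example. It uses only the averages quoted in the paragraph above; the sign convention (girls minus boys, so a positive gap means girls score higher) and the labels “narrowed” and “reversed” are choices made for this sketch.

```python
# Average PISA scores quoted above for the all-countries view: (girls, boys).
SCORES = {
    "math": {2003: (488.74, 499.55), 2018: (486.73, 491.76)},
    "reading": {2000: (505.4, 474.2), 2018: (502.17, 472.44)},
    "science": {2006: (493.94, 496.46), 2018: (498.54, 487.03)},
}


def gap(girls, boys):
    """Girls' minus boys' average score; positive means girls score higher."""
    return round(girls - boys, 2)


def describe_change(first_gap, last_gap):
    """Label how the gender gap moved between two survey years."""
    if first_gap * last_gap < 0:
        return "reversed"  # the higher-scoring gender switched
    if abs(last_gap) < abs(first_gap):
        return "narrowed"
    return "widened or unchanged"


for subject, by_year in SCORES.items():
    first_year, last_year = min(by_year), max(by_year)
    g0, g1 = gap(*by_year[first_year]), gap(*by_year[last_year])
    print(f"{subject}: {g0:+.2f} ({first_year}) -> {g1:+.2f} ({last_year}), "
          f"{describe_change(g0, g1)}")

# Prints:
# math: -10.81 (2003) -> -5.03 (2018), narrowed
# reading: +31.20 (2000) -> +29.73 (2018), narrowed
# science: -2.52 (2006) -> +11.51 (2018), reversed
```

Rounding to two decimals matches the precision of the quoted averages and hides floating-point noise from the subtraction.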
Figure 2 illustrates labor force entrance rates across eleven professions, both non-STEM (Agriculture, forestry, fisheries and veterinary; Arts and humanities; Business administration and law; Education; Generic programmes and qualifications; Services; Social sciences, journalism, and information) and STEM (Engineering, manufacturing, and construction; Information and Communication Technologies; Health and welfare; Natural sciences, mathematics and statistics). Specifically, the difference between the shares of men and women entering the professional sphere in each field is examined for OECD countries between 2010 and 2019. A few major trends emerge.
First, gender disparities in entrance rates remain larger for STEM fields than for non-STEM fields. Using Natural sciences, mathematics, and statistics as a proxy for STEM fields, most countries saw 1-2 percentage points more men than women entering the workforce, as indicated by shades of blue. This trend is consistent over time, except in 2011 and 2016, when more women than men entered the field. Such anomalies indicate that progress has been made, with more women pursuing STEM occupations, but that the progress has not been linear. In contrast, most non-STEM fields see more women than men entering the workforce, with the map shaded pink across almost all countries during this timeframe. Overall, the gender gap across fields is shrinking on average. For instance, in the United States, 10.95 percentage points more men than women were entering Natural sciences, mathematics, and statistics in 2010, compared with 1.48 percentage points in 2018.
Figure 3 contains three charts that examine macro trends in fertility rate and female labor force participation. The scatter plot on the bottom right shows that female labor force participation has increased over time: the darker, larger points are concentrated at higher participation values. The fertility rate has also decreased over time, as the darker, larger points are concentrated at lower fertility rate values. Turkey appears to be an exception, with higher labor force participation in earlier time periods. The box plot on the top right shows that the 1960-2020 fertility rate distribution is highest for the U.S. Virgin Islands, Turkey, Costa Rica, Colombia, Chile, and Mexico and lowest for Italy, Switzerland, Austria, Luxembourg, and Germany. The bar chart on the left shows that average female labor force participation from 1960 to 2020 has been highest for Iceland, Sweden, and Norway and lowest for Turkey, Italy, Chile, Greece, Spain, and Mexico. Although there are exceptions, many countries with lower fertility rate distributions have higher average female labor force participation, and vice versa. One notable exception is Italy, which has both a low fertility rate distribution and low average female labor force participation.
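The cross-country comparison above can be summarized numerically by averaging each country over the years and computing a Pearson correlation between the two averages; the same averages give the bar chart’s decreasing order. This is a sketch under stated assumptions: the tuple layout, the helper names, and the toy data are hypothetical (not OWID values), and a correlation across country averages only describes association. It does not establish causation and can hide exceptions such as Italy.

```python
from math import sqrt
from statistics import mean


def country_means(rows):
    """rows: (country, year, female_lfp_rate, fertility_rate) tuples.
    Returns {country: (mean_lfp, mean_fertility)} over all available years."""
    grouped = {}
    for country, _year, lfp, fertility in rows:
        grouped.setdefault(country, []).append((lfp, fertility))
    return {c: (mean(v[0] for v in vals), mean(v[1] for v in vals))
            for c, vals in grouped.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


# Hypothetical toy data, not values from the OWID dataset.
rows = [
    ("Country A", 1990, 78.0, 1.6), ("Country A", 2000, 82.0, 1.4),
    ("Country B", 1990, 58.0, 2.1), ("Country B", 2000, 62.0, 1.9),
    ("Country C", 1990, 38.0, 2.6), ("Country C", 2000, 42.0, 2.4),
]
means = country_means(rows)
# Bar-chart order: countries by decreasing average participation.
ranked = sorted(means, key=lambda c: means[c][0], reverse=True)
r = pearson([means[c][0] for c in ranked], [means[c][1] for c in ranked])
```

The toy data is deliberately perfectly inverse, so `r` comes out at about -1; real country averages would give a weaker, noisier value.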
Figure 4 depicts the minimum length of government-granted parental leave by country from 1970 to 2018. All three measures of parental leave (maternity, paid paternity, and total leave with job protection) have increased over time. As expected, the first eight steps in the animation show the largest increases, since they are spaced at five-year rather than one-year increments. During this period, from 1970 to 2005, parental leave increased drastically for almost all OECD countries. Countries with no minimum parental leave requirements, like the United States and Australia, began to introduce them. After 2005, granted parental leave continued to increase for the majority of countries, but at a slower rate. Interestingly, New Zealand's total parental leave with job protection appears to have decreased from 45 to 40 weeks between 2005 and 2018, while other countries like the United States appear to have remained constant at 12 weeks. Granted paid paternity leave has also increased over time. In 1970, almost no countries had paid paternity leave requirements; by 1995, six countries did; by 2005, a little less than half of all countries did; and by 2018, almost 75% of countries had requirements in place. As of 2018, European countries like the UK and Greece grant the most maternity leave, Asian countries like Japan and Korea grant the most paid paternity leave, and the United States consistently ranks at the bottom for both.
Figure 5 depicts the relationship between the percentage of women employed in STEM occupations and women's earnings as a percentage of men's in 2019. The purpose of this visualization is to identify which STEM occupations have the greatest gender inequity and how the wage disparity contributes to the divide. Women are paid less than men whether or not they dominate the workforce in a given profession. For instance, all computer occupations employ fewer women than men (under 50% on the y-axis), and all but one of their sub-occupations pay women less than men (under 100% on the x-axis). Healthcare occupations, by contrast, employ more women than men, yet all but two of their sub-occupations pay women less than men. These two examples illustrate that women appear to be paid less than men across nearly all types of STEM occupations. However, as the positions of points on the x-axis show, computer occupations (hard sciences) pay women the least relative to men, while healthcare occupations (soft sciences) pay women the most relative to men. At the same time, the positions of points on the y-axis show that healthcare occupations have the greatest percentage of women employed, while computer and engineering occupations have the least. Figure 5 therefore suggests that occupations employing a larger share of women tend to have smaller gender pay gaps than other professions.
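The two axes of Figure 5 reduce to two simple ratios. The Python sketch below assumes that the x-axis is women's median earnings divided by men's median earnings, multiplied by 100, and that the y-axis is women's share of workers in percent. The `Occupation` class, the helper names, and the example inputs are hypothetical. They only illustrate how a point's position is read against the 50% and 100% reference lines.

```python
from dataclasses import dataclass


@dataclass
class Occupation:
    name: str
    women_share_pct: float     # y-axis (assumed): women as % of workers in the occupation
    women_earnings_pct: float  # x-axis (assumed): women's median earnings as % of men's


def earnings_ratio_pct(median_women: float, median_men: float) -> float:
    """Women's median earnings expressed as a percentage of men's."""
    return 100.0 * median_women / median_men


def describe(occ: Occupation) -> str:
    """Read a point's position relative to the 50% (y) and 100% (x) reference lines."""
    if occ.women_share_pct < 50:
        staffing = "fewer women than men"
    elif occ.women_share_pct > 50:
        staffing = "more women than men"
    else:
        staffing = "equal numbers of women and men"
    pay = "women paid less" if occ.women_earnings_pct < 100 else "women paid at least as much"
    return f"{occ.name}: {staffing}, {pay}"


# Hypothetical inputs, not values from the ACS 2019 table.
ratio = earnings_ratio_pct(40_000, 50_000)
print(ratio)  # 80.0
print(describe(Occupation("Example occupation", 30.0, ratio)))
# Example occupation: fewer women than men, women paid less
```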
Figure 6 illustrates the percentage of survey respondents of each gender who use different programming tools on a regular basis. The primary goal of this visualization is to explore whether male and female data science and analytics professionals (a STEM specialty) have the same technical skills. The visualization shows a similar pattern of programming tool usage across male and female professionals. Python, SQL, and R are the three tools most commonly used on a regular basis by both men and women, while Julia, Ruby, and Go are the least common. Over one quarter of male and female respondents use Python, and over one tenth use SQL and R regularly. Among these three tools, slightly more men than women use Python, while slightly more women than men use SQL and R. Additionally, slightly fewer women than men use JavaScript and Bash, while slightly more women use MATLAB and SAS.
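A per-gender usage percentage like the one plotted in Figure 6 can be computed with a simple tally. The Python sketch below assumes that each respondent is represented as a `(gender, tools)` pair and that the denominator is the number of respondents of that gender. The original figure may normalize differently, for example by the total number of tool selections. The input format is a hypothetical simplification, not the actual column layout of the Kaggle survey file.

```python
from collections import Counter, defaultdict


def tool_usage_pct(responses):
    """Percent of each gender's respondents who report using each tool.

    responses: iterable of (gender, tools) pairs, one pair per respondent (assumed format).
    Returns {gender: {tool: percent}}.
    """
    totals = Counter()             # respondents per gender
    counts = defaultdict(Counter)  # tool mentions per gender
    for gender, tools in responses:
        totals[gender] += 1
        counts[gender].update(set(tools))  # count each tool at most once per respondent
    return {
        gender: {tool: 100.0 * n / totals[gender] for tool, n in counts[gender].items()}
        for gender in totals
    }


# Toy responses for illustration only, not survey data.
toy = [
    ("Female", {"Python", "SQL"}),
    ("Female", {"R"}),
    ("Male", {"Python"}),
    ("Male", {"Python", "R"}),
]
usage = tool_usage_pct(toy)
print(usage["Male"]["Python"], usage["Female"]["Python"])  # 100.0 50.0
```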
These findings highlight a systemic gender gap in education over time: girls have historically performed worse than boys in STEM subjects, while boys have historically performed worse than girls in non-STEM subjects. As of 2018, these patterns seem to be dissolving and, in some cases, reversing. However, this reversal appears to coincide with lower average test scores overall. This may reflect a cultural push to encourage more girls to focus on math and science, given women's underrepresentation in those fields. Lower average test scores may also point to larger concerns, including educational access, segregation, instruction, and curriculum. These results suggest that a greater emphasis must be placed on improving educational outcomes overall, with particular attention to math for girls and reading for boys.
The share of women entering the workforce in STEM fields appears to be increasing globally, while the share of men entering the workforce in non-STEM fields appears to be increasing as well. These trends largely mirror those in education, which suggests that student performance in school carries over into career choice. However, unlike test scores, progress toward gender equality here is not linear. In some years, the pattern reverses in both STEM and non-STEM fields, with more women than men entering STEM fields and more men than women entering non-STEM fields. These years raise questions about which underlying factors are causing these shifts, as well as which countries could serve as models for equality and representation in STEM employment. As the gap continues to narrow, the factors contributing to inequality will become harder to observe, shifting from quantifiable measures like test scores to less quantifiable influences such as biases and social norms. Looking to countries like Slovenia, Poland, and Turkey, which fairly consistently show minimal differences in the share of men and women entering STEM fields, may shed light on ways to close these gender gaps.
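Identifying countries that "fairly consistently" show minimal differences requires choosing what counts as minimal. The Python sketch below assumes a hypothetical threshold of 1 percentage point applied to the absolute gap in every year. The threshold, the helper name, and the placeholder country data are illustrative assumptions, not values used in this analysis.

```python
def consistently_small_gap(gaps_by_country, threshold_pp=1.0):
    """Countries whose absolute men-minus-women gap stays within threshold_pp every year.

    gaps_by_country: {country: {year: gap in percentage points}} (assumed format).
    Countries with no yearly data are excluded.
    """
    return sorted(
        country
        for country, by_year in gaps_by_country.items()
        if by_year and all(abs(gap) <= threshold_pp for gap in by_year.values())
    )


# Placeholder countries and values, not OECD data.
toy_gaps = {
    "Country A": {2010: 0.5, 2014: -0.8, 2018: 0.3},
    "Country B": {2010: 3.0, 2014: 1.2, 2018: 0.2},
}
print(consistently_small_gap(toy_gaps))  # ['Country A']
```

Requiring every year to pass is strict: a single aberrant year, like those discussed above, excludes a country. A looser rule, such as requiring a majority of years to pass, may be preferable.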
Trends in the workforce reveal an inverse relationship between female labor force participation and fertility rate over time: female labor force participation has increased while the fertility rate has slightly decreased. Although causality cannot be inferred, the two measures are clearly correlated. One possible explanation is that women today have more opportunities in the workforce and more independence than ever, so there may be less emphasis on having a large family. Conversely, women may place greater emphasis on family planning today in order to participate in the workforce.
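The correlation described above can be quantified with Pearson's correlation coefficient. The Python sketch below implements it directly. The input values are toy numbers constructed to be perfectly inversely related; they are not observations from the Our World in Data series. As noted above, even a strong negative coefficient would not establish causation. Pooling country-year observations would also mix differences between countries with changes over time.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least two points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined when one series is constant")
    return cov / math.sqrt(ss_x * ss_y)


# Toy values only: fertility falls as participation rises in a perfectly linear way.
fertility = [3.0, 2.5, 2.0, 1.5]
participation = [40.0, 50.0, 60.0, 70.0]
print(pearson_r(fertility, participation))  # -1.0
```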
Correspondingly, the length of government-guaranteed parental leave has increased in the majority of countries over time. While maternity leave has long been in place in most countries, paid paternity leave has been adopted and extended more recently. This shows that gender disparities do not concern women alone; men and fathers also need time to support their children, wives, and families. Moreover, by offering men paid leave, governments support women returning to work, since women no longer have to be the sole caretakers of the family. This trend, together with the increase in maternity leave length, enables women to support their families both physically and financially. Notably, the total length of parental leave with job protection is exceptionally long in some European countries, reaching up to three years in some. Although parents may not be paid for much of this leave, these governments are committed to supporting parents and offering job security. The job security that results from improved parental leave policies may also contribute to the increasing trend in female labor force participation.
Among the most recent trends in STEM professions, men dominate the workforce in the majority of occupations. This suggests that the narrowing gender gaps in education and entrance into the labor force have not yet carried over into the labor force itself. Additionally, in almost all of these professions, men earn more than women. Perhaps women are choosing not to work in these fields because they expect to be paid less. Conversely, because women are not equally represented, employers might not feel obliged to pay them equally. The latter seems more plausible given the findings from Figure 5, where occupations that employ a larger share of women have smaller gender pay gaps. This gap appears to persist in STEM professions even though men and women have similar skill sets. In data science and analytics, for example, men and women regularly use the same programming languages in the workplace, largely Python, SQL, and R. Regardless of why the pay gap exists, companies must make a concerted effort to pay employees with similar experience equitably. Failing to do so stifles productivity, labor force growth, and overall gender equality.
This research finds that gender inequality is still present but has decreased over time. In education, the disparities measured by PISA test scores have shrunk to the point where boys no longer significantly outperform girls in STEM subjects and girls no longer significantly outperform boys in non-STEM subjects. These trends largely persist into workforce entrance after education. From 2010 to 2019, more men than women entered the labor force in STEM fields, but that gap is diminishing. Likewise, more women than men entered the labor force in non-STEM fields, a gap that also appears to be closing over time. Within the labor force, female participation has increased while fertility rates have decreased. Simultaneously, government-granted maternity leave and paid paternity leave have both increased over time. According to the most recent data, male and female analytics professionals have similar technical skills, but women are underrepresented and paid less in almost every STEM field. Therefore, the data support the original hypothesis that the gender gap has narrowed over time but is still present.
In addition to understanding how the gender gap has changed over time, this analysis set out to investigate potential factors behind its evolution and its persistence today. Given the inverse relationship between fertility rate and female labor force participation, one potential reason the gender gap may be diminishing is that fertility rates have fallen. Correspondingly, the minimum length of granted parental leave for both men and women has increased across the countries studied. With fewer births and more time allotted to families who choose to have children, women may be better able to participate and perform in the workforce. Although no causal relationship can be established from this analysis alone, there is an association between the narrowing gender gap in the workforce, parental leave policies, and fertility rate. The current disparities in workforce earnings and representation are likely not attributable to test scores or technical skills. This suggests that more abstract, cultural causes contribute to the current disparity. Addressing them may require, for instance, encouraging more girls and women to study STEM subjects and join STEM professions.
Overall, this research can be used to raise awareness of gender inequality in STEM and to promote policies that combat it. However, one limitation of this analysis is that it could not explore additional factors that contribute to the gender gap across STEM fields, such as access to education, job segregation, and medical care. Subsequent analyses could focus on these factors, in addition to the parental leave policies and fertility rates discussed in this paper, to understand more comprehensively why the gender gap persists. Another limitation is that the analysis could not fully assess the current state of gender disparities as of 2022. The most recent data obtained for programming tool usage was from 2018, and the most recent data for workforce earnings and representation was from 2019. Future research can iteratively incorporate the most up-to-date information released by these sources or obtain new data altogether.
1. “Global Gender Gap Report 2021.” World Economic Forum, https://www.weforum.org/reports/global-gender-gap-report-2021/digest.
2. “PISA.” OECD, https://www.oecd.org/pisa/.
3. “International Student Assessment (PISA) - Reading Performance (PISA).” OECD Data, https://data.oecd.org/pisa/reading-performance-pisa.htm#indicator-chart.
4. “Distribution of Graduates and New Entrants by Field: Share of Tertiary Graduates by Field of Education and Gender.” OECD.Stat, https://stats.oecd.org/Index.aspx?QueryId=109881.
5. “Employment.” OECD.Stat, https://stats.oecd.org/index.aspx?queryid=54760.
6. “Fertility and Female Labor Force Participation.” Our World in Data, https://ourworldindata.org/grapher/fertility-and-female-labor-force-participation.
7. Martinlbarron. “The Gender Divide In Data Science.” Kaggle, 29 Nov. 2018, https://www.kaggle.com/code/martinlbarron/the-gender-divide-in-data-science/data?select=multipleChoiceResponses.csv.
8. “STEM and STEM-Related Occupations by Sex and Median Earnings: ACS 2019.” U.S. Census Bureau, 2019, https://www.census.gov/data/tables/time-series/demo/income-poverty/stem-occ-sex-med-earnings.html.