College Trends
College majors and employment
The American Community Survey is a survey run by the US Census Bureau that collects data on everything from the affordability of housing to employment rates for different industries. For this experiment, I will be using data derived from the American Community Survey for the years 2010-2012. The team at FiveThirtyEight has cleaned the dataset and made it available on their GitHub repo.
Here’s a quick overview of the files I’ll be working with:
- all-ages.csv: employment data by major for all ages
- recent-grads.csv: employment data by major for just recent college graduates
This challenge tests my comfort with pandas for manipulating DataFrames and calculating summary statistics.
import pandas as pd
all_ages = pd.read_csv("all-ages.csv")
all_ages.head(5)
 | Major_code | Major | Major_category | Total | Employed | Employed_full_time_year_round | Unemployed | Unemployment_rate | Median | P25th | P75th |
---|---|---|---|---|---|---|---|---|---|---|---|
0 | 1100 | GENERAL AGRICULTURE | Agriculture & Natural Resources | 128148 | 90245 | 74078 | 2423 | 0.026147 | 50000 | 34000 | 80000 |
1 | 1101 | AGRICULTURE PRODUCTION AND MANAGEMENT | Agriculture & Natural Resources | 95326 | 76865 | 64240 | 2266 | 0.028636 | 54000 | 36000 | 80000 |
2 | 1102 | AGRICULTURAL ECONOMICS | Agriculture & Natural Resources | 33955 | 26321 | 22810 | 821 | 0.030248 | 63000 | 40000 | 98000 |
3 | 1103 | ANIMAL SCIENCES | Agriculture & Natural Resources | 103549 | 81177 | 64937 | 3619 | 0.042679 | 46000 | 30000 | 72000 |
4 | 1104 | FOOD SCIENCE | Agriculture & Natural Resources | 24280 | 17281 | 12722 | 894 | 0.049188 | 62000 | 38500 | 90000 |
Summarizing major categories
In both of these datasets, majors are grouped into categories. As you may have noticed, there are multiple rows with a common value for Major_category but different values for Major. We would like to know the total number of people in each Major_category for both datasets.
I will use the Total column to calculate the number of people who fall under each Major_category and store the result as a separate dictionary for each dataset. The key for the dictionary should be the Major_category and the value should be the total count. For the counts from all_ages, store the results as a dictionary named all_ages_major_categories, and for the counts from recent-grads, store the results as a dictionary named recent_grads_major_categories.
all_ages = pd.read_csv("all-ages.csv")
all_ages_totals = all_ages.pivot_table(index="Major_category", values="Total", aggfunc="sum").sort_values("Total", ascending=False)["Total"]
all_ages_totals
Major_category
Business 9858741
Education 4700118
Humanities & Liberal Arts 3738335
Engineering 3576013
Health 2950859
Social Science 2654125
Psychology & Social Work 1987278
Arts 1805865
Communications & Journalism 1803822
Computers & Mathematics 1781378
Biology & Life Science 1338186
Industrial Arts & Consumer Services 1033798
Physical Sciences 1025318
Law & Public Policy 902926
Agriculture & Natural Resources 632437
Interdisciplinary 45199
Name: Total, dtype: int64
recent_grads = pd.read_csv("recent-grads.csv")
recent_totals = recent_grads.pivot_table(index="Major_category", values="Total", aggfunc="sum").sort_values("Total", ascending=False)["Total"]
recent_totals
Major_category
Business 1302376
Humanities & Liberal Arts 713468
Education 559129
Engineering 537583
Social Science 529966
Psychology & Social Work 481007
Health 463230
Biology & Life Science 453862
Communications & Journalism 392601
Arts 357130
Computers & Mathematics 299008
Industrial Arts & Consumer Services 229792
Physical Sciences 185479
Law & Public Policy 179107
Agriculture & Natural Resources 79981
Interdisciplinary 12296
Name: Total, dtype: int64
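The task statement asks for dictionaries, while the pivot tables above are pandas Series. Here is a minimal sketch of one way to get from a DataFrame to a plain dictionary of totals. The helper name totals_by_category and the toy rows are my own illustration, not part of the dataset, and it uses groupby as an alternative to pivot_table.

```python
import pandas as pd

# Toy stand-in for all-ages.csv / recent-grads.csv; the numbers are made up.
toy = pd.DataFrame({
    "Major": ["A", "B", "C", "D"],
    "Major_category": ["Arts", "Arts", "Health", "Health"],
    "Total": [100, 50, 200, 25],
})

def totals_by_category(df):
    """Sum the Total column per Major_category and return a plain dict."""
    totals = df.groupby("Major_category")["Total"].sum()
    # Convert NumPy integers to built-in ints so the dict prints cleanly.
    return {category: int(count) for category, count in totals.items()}

toy_major_categories = totals_by_category(toy)
print(toy_major_categories)  # {'Arts': 150, 'Health': 225}
```

With the real files loaded, the same helper could presumably be called as totals_by_category(all_ages) and totals_by_category(recent_grads) to produce all_ages_major_categories and recent_grads_major_categories.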
Low-wage job rates
The press likes to talk a lot about how many college grads are unable to get higher-wage, skilled jobs and end up working lower-wage, unskilled jobs instead. As a data person, it is your job to be skeptical of any broad claims and explore if you can acquire and analyze relevant data to obtain a more nuanced view. Let’s run some basic calculations to explore that idea further.
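The task is to use the Low_wage_jobs and Total columns to calculate the proportion of recent college graduates that worked low-wage jobs, storing the resulting float as low_wage_percent. Note that the code below divides by the Employed column rather than Total, so the number it produces is the share of employed recent graduates in low-wage jobs.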
recent_grads = pd.read_csv("recent-grads.csv")
low_wage_percent = 0.0
low_wage_sum = float(recent_grads["Low_wage_jobs"].sum())
recent_sum = float(recent_grads["Employed"].sum())
low_wage_percent = low_wage_sum / recent_sum
low_wage_percent
0.12371514957893746
So it looks like about 12.4% of employed recent grads are working in low-wage jobs.
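The choice of denominator changes the answer, so it is worth seeing both side by side. The sketch below uses made-up numbers rather than the survey data, and the helper low_wage_share is a hypothetical name introduced only for illustration.

```python
import pandas as pd

# Made-up rows with the three columns involved; not the real survey values.
toy = pd.DataFrame({
    "Total": [1000, 500],
    "Employed": [800, 400],
    "Low_wage_jobs": [100, 50],
})

def low_wage_share(df, denominator):
    """Return sum(Low_wage_jobs) / sum(denominator column) as a float."""
    return float(df["Low_wage_jobs"].sum()) / float(df[denominator].sum())

share_of_all_grads = low_wage_share(toy, "Total")     # 150 / 1500
share_of_employed = low_wage_share(toy, "Employed")   # 150 / 1200
print(share_of_all_grads, share_of_employed)
```

Because Employed is never larger than Total for a given major, dividing by Employed should always give a share at least as large as dividing by Total.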
Comparing datasets
Both the all_ages and recent_grads datasets have 173 rows, corresponding to the 173 college major codes. This enables us to do some comparisons between the two datasets and perform some initial calculations to see how similar or different the statistics of recent college graduates are from those of the entire population.
We want to know the number of majors where recent grads fare better than the overall population. For each major, determine if the Unemployment_rate is lower for recent_grads or for all_ages and increment either recent_grads_lower_emp_count or all_ages_lower_emp_count respectively.
# All majors, common to both DataFrames
majors = recent_grads['Major'].value_counts().index
recent_grads_lower_emp = []
all_ages_lower_emp = []
for major in majors:
    recent_unemply_rate = recent_grads[recent_grads["Major"] == major]["Unemployment_rate"].values[0]
    all_time_unemply_rate = all_ages[all_ages["Major"] == major]["Unemployment_rate"].values[0]
    diff = recent_unemply_rate - all_time_unemply_rate  # comparator
    if diff < 0:
        recent_grads_lower_emp.append(major)
    elif diff > 0:
        all_ages_lower_emp.append(major)
    else:
        pass  # equal
len(recent_grads_lower_emp)
43
len(all_ages_lower_emp)
128
So it looks like new grads have a lower unemployment rate than the overall population in only 43 of the 173 majors. This fits the old adage that experience is key in the job search. Let’s take a look at which majors favor new grads:
recent_grads_lower_emp
['HUMAN SERVICES AND COMMUNITY ORGANIZATION',
'ART AND MUSIC EDUCATION',
'ASTRONOMY AND ASTROPHYSICS',
'MISCELLANEOUS ENGINEERING TECHNOLOGIES',
'UNITED STATES HISTORY',
'SOCIAL PSYCHOLOGY',
'SOIL SCIENCE',
'COUNSELING PSYCHOLOGY',
'INDUSTRIAL AND MANUFACTURING ENGINEERING',
'PHYSICS',
'CHEMISTRY',
'ATMOSPHERIC SCIENCES AND METEOROLOGY',
'EDUCATIONAL PSYCHOLOGY',
'PHYSICAL SCIENCES',
'MISCELLANEOUS PSYCHOLOGY',
'EARLY CHILDHOOD EDUCATION',
'DRAMA AND THEATER ARTS',
'NEUROSCIENCE',
'GEOSCIENCES',
'HUMAN RESOURCES AND PERSONNEL MANAGEMENT',
'MATHEMATICS',
'ARCHITECTURAL ENGINEERING',
'MATHEMATICS AND COMPUTER SCIENCE',
'COURT REPORTING',
'SPECIAL NEEDS EDUCATION',
'MATHEMATICS TEACHER EDUCATION',
'GENETICS',
'ENGINEERING AND INDUSTRIAL MANAGEMENT',
'HUMANITIES',
'AREA ETHNIC AND CIVILIZATION STUDIES',
'INDUSTRIAL PRODUCTION TECHNOLOGIES',
'GENERAL AGRICULTURE',
'ART HISTORY AND CRITICISM',
'ENGINEERING MECHANICS PHYSICS AND SCIENCE',
'METALLURGICAL ENGINEERING',
'MULTI/INTERDISCIPLINARY STUDIES',
'ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION',
'MISCELLANEOUS FINE ARTS',
'ZOOLOGY',
'HEALTH AND MEDICAL PREPARATORY PROGRAMS',
'PETROLEUM ENGINEERING',
'MATERIALS ENGINEERING AND MATERIALS SCIENCE',
'BOTANY']
all_ages_lower_emp
['AEROSPACE ENGINEERING',
'PLANT SCIENCE AND AGRONOMY',
'GENERAL MEDICAL AND HEALTH SERVICES',
'ELECTRICAL ENGINEERING',
'COMMUNICATION TECHNOLOGIES',
'GEOGRAPHY',
'AGRICULTURE PRODUCTION AND MANAGEMENT',
'NUCLEAR ENGINEERING',
'MASS MEDIA',
'AGRICULTURAL ECONOMICS',
'MISCELLANEOUS SOCIAL SCIENCES',
'FOOD SCIENCE',
'VISUAL AND PERFORMING ARTS',
'ENGINEERING TECHNOLOGIES',
'MOLECULAR BIOLOGY',
'COMPUTER NETWORKING AND TELECOMMUNICATIONS',
'PHYSICAL AND HEALTH EDUCATION TEACHING',
'BIOLOGY',
'ECONOMICS',
'SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION',
'ENVIRONMENTAL ENGINEERING',
'TRANSPORTATION SCIENCES AND TECHNOLOGIES',
'HEALTH AND MEDICAL ADMINISTRATIVE SERVICES',
'ADVERTISING AND PUBLIC RELATIONS',
'COMPUTER PROGRAMMING AND DATA PROCESSING',
'POLITICAL SCIENCE AND GOVERNMENT',
'FINANCE',
'INTERNATIONAL BUSINESS',
'COMMUNICATIONS',
'BIOCHEMICAL SCIENCES',
'MUSIC',
'GEOLOGICAL AND GEOPHYSICAL ENGINEERING',
'NATURAL RESOURCES MANAGEMENT',
'TREATMENT THERAPY PROFESSIONS',
'COMMUNICATION DISORDERS SCIENCES AND SERVICES',
'PHYSIOLOGY',
'MISCELLANEOUS HEALTH MEDICAL PROFESSIONS',
'PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION',
'GENERAL ENGINEERING',
'COGNITIVE SCIENCE AND BIOPSYCHOLOGY',
'STUDIO ARTS',
'MEDICAL TECHNOLOGIES TECHNICIANS',
'COMPUTER SCIENCE',
'COMPUTER ENGINEERING',
'COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY',
'CRIMINOLOGY',
'LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE',
'MISCELLANEOUS BIOLOGY',
'MINING AND MINERAL ENGINEERING',
'INTERNATIONAL RELATIONS',
'ARCHITECTURE',
'ECOLOGY',
'OCEANOGRAPHY',
'NURSING',
'ANIMAL SCIENCES',
'SCIENCE AND COMPUTER TEACHER EDUCATION',
'THEOLOGY AND RELIGIOUS VOCATIONS',
'CONSTRUCTION SERVICES',
'BUSINESS ECONOMICS',
'SOCIAL WORK',
'MARKETING AND MARKETING RESEARCH',
'NUTRITION SCIENCES',
'COMMUNITY AND PUBLIC HEALTH',
'CIVIL ENGINEERING',
'FORESTRY',
'ELEMENTARY EDUCATION',
'MISCELLANEOUS AGRICULTURE',
'JOURNALISM',
'OTHER FOREIGN LANGUAGES',
'ACCOUNTING',
'MATERIALS SCIENCE',
'ELECTRICAL ENGINEERING TECHNOLOGY',
'LANGUAGE AND DRAMA EDUCATION',
'PSYCHOLOGY',
'OPERATIONS LOGISTICS AND E-COMMERCE',
'APPLIED MATHEMATICS',
'ENGLISH LANGUAGE AND LITERATURE',
'FAMILY AND CONSUMER SCIENCES',
'PHARMACOLOGY',
'NAVAL ARCHITECTURE AND MARINE ENGINEERING',
'SOCIOLOGY',
'SCHOOL STUDENT COUNSELING',
'COMPOSITION AND RHETORIC',
'FILM VIDEO AND PHOTOGRAPHIC ARTS',
'MISCELLANEOUS ENGINEERING',
'BIOMEDICAL ENGINEERING',
'INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY',
'LIBERAL ARTS',
'COMMERCIAL ART AND GRAPHIC DESIGN',
'BIOLOGICAL ENGINEERING',
'PRE-LAW AND LEGAL STUDIES',
'PHILOSOPHY AND RELIGIOUS STUDIES',
'ENVIRONMENTAL SCIENCE',
'PHYSICAL FITNESS PARKS RECREATION AND LEISURE',
'STATISTICS AND DECISION SCIENCE',
'MECHANICAL ENGINEERING RELATED TECHNOLOGIES',
'HISTORY',
'FINE ARTS',
'TEACHER EDUCATION: MULTIPLE LEVELS',
'NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES',
'MANAGEMENT INFORMATION SYSTEMS AND STATISTICS',
'GENERAL EDUCATION',
'PUBLIC POLICY',
'COSMETOLOGY SERVICES AND CULINARY ARTS',
'MEDICAL ASSISTING SERVICES',
'LIBRARY SCIENCE',
'HOSPITALITY MANAGEMENT',
'ACTUARIAL SCIENCE',
'BUSINESS MANAGEMENT AND ADMINISTRATION',
'INTERDISCIPLINARY SOCIAL SCIENCES',
'CLINICAL PSYCHOLOGY',
'MECHANICAL ENGINEERING',
'ANTHROPOLOGY AND ARCHEOLOGY',
'INTERCULTURAL AND INTERNATIONAL STUDIES',
'MISCELLANEOUS EDUCATION',
'PUBLIC ADMINISTRATION',
'MULTI-DISCIPLINARY OR GENERAL SCIENCE',
'CRIMINAL JUSTICE AND FIRE PROTECTION',
'GENERAL BUSINESS',
'CHEMICAL ENGINEERING',
'SECONDARY TEACHER EDUCATION',
'MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION',
'FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES',
'MICROBIOLOGY',
'COMPUTER AND INFORMATION SYSTEMS',
'GENERAL SOCIAL SCIENCES',
'GEOLOGY AND EARTH SCIENCE',
'INFORMATION SCIENCES']
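The per-major loop in this section can also be written without explicit iteration. The sketch below is an alternative, not the code that produced the lists above. It assumes both files share a Major_code column (only all_ages is shown with one here) and uses made-up rates for four hypothetical major codes. It also counts ties separately, which may help explain why the two counts reported earlier, 43 and 128, add up to 171 rather than 173.

```python
import pandas as pd

# Made-up unemployment rates for four hypothetical major codes.
toy_recent = pd.DataFrame({
    "Major_code": [1, 2, 3, 4],
    "Unemployment_rate": [0.05, 0.08, 0.04, 0.06],
})
toy_all = pd.DataFrame({
    "Major_code": [1, 2, 3, 4],
    "Unemployment_rate": [0.06, 0.05, 0.04, 0.03],
})

# Align rows by Major_code; the suffixes label which dataset each rate came from.
merged = toy_recent.merge(toy_all, on="Major_code", suffixes=("_recent", "_all"))
diff = merged["Unemployment_rate_recent"] - merged["Unemployment_rate_all"]

recent_grads_lower_emp_count = int((diff < 0).sum())  # recent grads fare better
all_ages_lower_emp_count = int((diff > 0).sum())      # overall population fares better
tie_count = int((diff == 0).sum())                    # identical rates
print(recent_grads_lower_emp_count, all_ages_lower_emp_count, tie_count)  # 1 2 1
```

Merging on a key column rather than relying on row order keeps the comparison correct even if the two files list majors in a different order.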