College Trends

Alex Egg

College majors and employment

The American Community Survey is a survey run by the US Census Bureau that collects data on everything from the affordability of housing to employment rates for different industries. For this experiment, I will be using data derived from the American Community Survey for the years 2010-2012. The team at FiveThirtyEight has cleaned the dataset and made it available in their GitHub repo.

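Here’s a quick overview of the files I’ll be working with: all-ages.csv, which covers graduates of all ages, and recent-grads.csv, which covers recent graduates only.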

This challenge tests my comfort with pandas for manipulating DataFrames and calculating summary statistics.

import pandas as pd

all_ages = pd.read_csv("all-ages.csv")
all_ages.head(5)
   Major_code                                  Major                   Major_category   Total  Employed  Employed_full_time_year_round  Unemployed  Unemployment_rate  Median  P25th  P75th
0        1100                    GENERAL AGRICULTURE  Agriculture & Natural Resources  128148     90245                          74078        2423           0.026147   50000  34000  80000
1        1101  AGRICULTURE PRODUCTION AND MANAGEMENT  Agriculture & Natural Resources   95326     76865                          64240        2266           0.028636   54000  36000  80000
2        1102                 AGRICULTURAL ECONOMICS  Agriculture & Natural Resources   33955     26321                          22810         821           0.030248   63000  40000  98000
3        1103                        ANIMAL SCIENCES  Agriculture & Natural Resources  103549     81177                          64937        3619           0.042679   46000  30000  72000
4        1104                           FOOD SCIENCE  Agriculture & Natural Resources   24280     17281                          12722         894           0.049188   62000  38500  90000

Summarizing major categories

In both of these datasets, majors are grouped into categories. As you may have noticed, there are multiple rows with a common value for Major_category but different values for Major. We would like to know the total number of people in each Major_category for both datasets.

I will use the Total column to calculate the number of people who fall under each Major_category and store the result as a separate dictionary for each dataset. The key for the dictionary should be the Major_category and the value should be the total count. For the counts from all_ages, store the results as a dictionary named all_ages_major_categories and for the counts from recent-grads, store the results as a dictionary named recent_grads_major_categories.

all_ages = pd.read_csv("all-ages.csv")
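# DataFrame.sort() was removed from pandas; sort the summed Series with sort_values() instead
all_ages_totals = all_ages.pivot_table(index="Major_category", values="Total", aggfunc="sum")["Total"].sort_values(ascending=False)
all_ages_major_categories = all_ages_totals.to_dict()  # dictionary keyed by Major_category
all_ages_totals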
Major_category
Business                               9858741
Education                              4700118
Humanities & Liberal Arts              3738335
Engineering                            3576013
Health                                 2950859
Social Science                         2654125
Psychology & Social Work               1987278
Arts                                   1805865
Communications & Journalism            1803822
Computers & Mathematics                1781378
Biology & Life Science                 1338186
Industrial Arts & Consumer Services    1033798
Physical Sciences                      1025318
Law & Public Policy                     902926
Agriculture & Natural Resources         632437
Interdisciplinary                        45199
Name: Total, dtype: int64
recent_grads = pd.read_csv("recent-grads.csv")
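# Same aggregation for recent graduates, again using sort_values() in place of the removed sort()
recent_totals = recent_grads.pivot_table(index="Major_category", values="Total", aggfunc="sum")["Total"].sort_values(ascending=False)
recent_grads_major_categories = recent_totals.to_dict()  # dictionary keyed by Major_category
recent_totals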
Major_category
Business                               1302376
Humanities & Liberal Arts               713468
Education                               559129
Engineering                             537583
Social Science                          529966
Psychology & Social Work                481007
Health                                  463230
Biology & Life Science                  453862
Communications & Journalism             392601
Arts                                    357130
Computers & Mathematics                 299008
Industrial Arts & Consumer Services     229792
Physical Sciences                       185479
Law & Public Policy                     179107
Agriculture & Natural Resources          79981
Interdisciplinary                        12296
Name: Total, dtype: int64
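
As a cross-check on the pivot tables above, the same per-category totals can be sketched with groupby and turned straight into a dictionary. The helper name totals_by_category and the toy rows are assumptions made for illustration; they are not the contents of the real CSV files.

```python
import pandas as pd


def totals_by_category(df: pd.DataFrame) -> dict:
    """Sum the Total column per Major_category and return a plain dict."""
    totals = df.groupby("Major_category")["Total"].sum()
    # Largest categories first, mirroring the sorted output above.
    return totals.sort_values(ascending=False).to_dict()


# Toy stand-in for all-ages.csv (made-up numbers, illustration only).
toy = pd.DataFrame(
    {
        "Major": ["A", "B", "C"],
        "Major_category": ["Arts", "Business", "Arts"],
        "Total": [10, 50, 5],
    }
)

toy_major_categories = totals_by_category(toy)
print(toy_major_categories)
```

Calling the same helper on all_ages and recent_grads should give one dictionary per dataset, keyed by Major_category.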

Low-wage job rates

The press likes to talk a lot about how many college grads are unable to get higher-wage, skilled jobs and end up working lower-wage, unskilled jobs instead. As a data person, it is my job to be skeptical of broad claims and to explore whether relevant data gives a more nuanced view. Let’s run some basic calculations to explore that idea further.

I will use the Low_wage_jobs and Employed columns to calculate the proportion of employed recent college graduates who worked low-wage jobs, and store the resulting float as low_wage_percent.

recent_grads = pd.read_csv("recent-grads.csv")

# Numerator: graduates in low-wage jobs; denominator: employed graduates
low_wage_sum = float(recent_grads["Low_wage_jobs"].sum())
recent_sum = float(recent_grads["Employed"].sum())

low_wage_percent = low_wage_sum / recent_sum
low_wage_percent
0.12371514957893746

So it looks like about 12.4% of employed recent grads are working in low-wage jobs.
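
The code above divides by the Employed column, so the result is a share of employed graduates. Dividing by Total instead would give a share of all graduates, which is necessarily no larger. A minimal sketch of both choices follows; the helper name low_wage_share and the toy numbers are assumptions for illustration only.

```python
import pandas as pd


def low_wage_share(df: pd.DataFrame, denominator: str = "Employed") -> float:
    """Return sum(Low_wage_jobs) / sum(denominator column) as a float."""
    return float(df["Low_wage_jobs"].sum() / df[denominator].sum())


# Toy stand-in for recent-grads.csv (made-up numbers, illustration only).
toy = pd.DataFrame(
    {
        "Total": [100, 300],
        "Employed": [80, 120],
        "Low_wage_jobs": [10, 30],
    }
)

share_of_employed = low_wage_share(toy, "Employed")  # 40 / 200
share_of_total = low_wage_share(toy, "Total")        # 40 / 400
print(share_of_employed, share_of_total)
```

Which denominator is appropriate depends on the question being asked, so it is worth stating the choice next to the reported figure.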

Comparing datasets

Both all_ages and recent_grads datasets have 173 rows, corresponding to the 173 college major codes. This enables us to do some comparisons between the two datasets and perform some initial calculations to see how similar or different the statistics of recent college graduates are from those of the entire population.
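
Before comparing row by row, it may be worth confirming that the two datasets really do describe the same majors. The sketch below is one way to do that; same_majors is a hypothetical helper, and the toy frames are made up for illustration.

```python
import pandas as pd


def same_majors(a: pd.DataFrame, b: pd.DataFrame, key: str = "Major") -> bool:
    """True when both frames have equal length and the same set of key values."""
    return len(a) == len(b) and set(a[key]) == set(b[key])


# Toy frames (made-up rows, illustration only).
left = pd.DataFrame({"Major": ["PHYSICS", "NURSING"]})
right = pd.DataFrame({"Major": ["NURSING", "PHYSICS"]})
other = pd.DataFrame({"Major": ["NURSING", "HISTORY"]})

print(same_majors(left, right))
print(same_majors(left, other))
```

With the real data, the call would be same_majors(all_ages, recent_grads).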

We want to know the number of majors where recent grads fare better than the overall population. For each major, determine whether the Unemployment_rate is lower for recent_grads or for all_ages, and add the major to either recent_grads_lower_emp or all_ages_lower_emp respectively. The length of each list is then the count we are after.

# All majors, common to both DataFrames
majors = recent_grads['Major'].value_counts().index

recent_grads_lower_emp=[]
all_ages_lower_emp=[]

for major in majors:
    recent_unemply_rate = recent_grads[recent_grads["Major"]==major]["Unemployment_rate"].values[0]
    all_time_unemply_rate = all_ages[all_ages["Major"]==major]["Unemployment_rate"].values[0]
    diff = recent_unemply_rate - all_time_unemply_rate #comparator
    
    if diff < 0:
        recent_grads_lower_emp.append(major)
    elif diff >0:
        all_ages_lower_emp.append(major)
    else:
        pass #equal
    
    
len(recent_grads_lower_emp)
43
len(all_ages_lower_emp)
128

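So it looks like recent grads have a lower unemployment rate than the overall population in only 43 of the 173 majors; the overall population does better in 128, and the remaining two fall in neither list. This fits the old adage that experience is key in the job search. Let’s take a look at which majors favor new grads: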

recent_grads_lower_emp
['HUMAN SERVICES AND COMMUNITY ORGANIZATION',
 'ART AND MUSIC EDUCATION',
 'ASTRONOMY AND ASTROPHYSICS',
 'MISCELLANEOUS ENGINEERING TECHNOLOGIES',
 'UNITED STATES HISTORY',
 'SOCIAL PSYCHOLOGY',
 'SOIL SCIENCE',
 'COUNSELING PSYCHOLOGY',
 'INDUSTRIAL AND MANUFACTURING ENGINEERING',
 'PHYSICS',
 'CHEMISTRY',
 'ATMOSPHERIC SCIENCES AND METEOROLOGY',
 'EDUCATIONAL PSYCHOLOGY',
 'PHYSICAL SCIENCES',
 'MISCELLANEOUS PSYCHOLOGY',
 'EARLY CHILDHOOD EDUCATION',
 'DRAMA AND THEATER ARTS',
 'NEUROSCIENCE',
 'GEOSCIENCES',
 'HUMAN RESOURCES AND PERSONNEL MANAGEMENT',
 'MATHEMATICS',
 'ARCHITECTURAL ENGINEERING',
 'MATHEMATICS AND COMPUTER SCIENCE',
 'COURT REPORTING',
 'SPECIAL NEEDS EDUCATION',
 'MATHEMATICS TEACHER EDUCATION',
 'GENETICS',
 'ENGINEERING AND INDUSTRIAL MANAGEMENT',
 'HUMANITIES',
 'AREA ETHNIC AND CIVILIZATION STUDIES',
 'INDUSTRIAL PRODUCTION TECHNOLOGIES',
 'GENERAL AGRICULTURE',
 'ART HISTORY AND CRITICISM',
 'ENGINEERING MECHANICS PHYSICS AND SCIENCE',
 'METALLURGICAL ENGINEERING',
 'MULTI/INTERDISCIPLINARY STUDIES',
 'ELECTRICAL, MECHANICAL, AND PRECISION TECHNOLOGIES AND PRODUCTION',
 'MISCELLANEOUS FINE ARTS',
 'ZOOLOGY',
 'HEALTH AND MEDICAL PREPARATORY PROGRAMS',
 'PETROLEUM ENGINEERING',
 'MATERIALS ENGINEERING AND MATERIALS SCIENCE',
 'BOTANY']
all_ages_lower_emp
['AEROSPACE ENGINEERING',
 'PLANT SCIENCE AND AGRONOMY',
 'GENERAL MEDICAL AND HEALTH SERVICES',
 'ELECTRICAL ENGINEERING',
 'COMMUNICATION TECHNOLOGIES',
 'GEOGRAPHY',
 'AGRICULTURE PRODUCTION AND MANAGEMENT',
 'NUCLEAR ENGINEERING',
 'MASS MEDIA',
 'AGRICULTURAL ECONOMICS',
 'MISCELLANEOUS SOCIAL SCIENCES',
 'FOOD SCIENCE',
 'VISUAL AND PERFORMING ARTS',
 'ENGINEERING TECHNOLOGIES',
 'MOLECULAR BIOLOGY',
 'COMPUTER NETWORKING AND TELECOMMUNICATIONS',
 'PHYSICAL AND HEALTH EDUCATION TEACHING',
 'BIOLOGY',
 'ECONOMICS',
 'SOCIAL SCIENCE OR HISTORY TEACHER EDUCATION',
 'ENVIRONMENTAL ENGINEERING',
 'TRANSPORTATION SCIENCES AND TECHNOLOGIES',
 'HEALTH AND MEDICAL ADMINISTRATIVE SERVICES',
 'ADVERTISING AND PUBLIC RELATIONS',
 'COMPUTER PROGRAMMING AND DATA PROCESSING',
 'POLITICAL SCIENCE AND GOVERNMENT',
 'FINANCE',
 'INTERNATIONAL BUSINESS',
 'COMMUNICATIONS',
 'BIOCHEMICAL SCIENCES',
 'MUSIC',
 'GEOLOGICAL AND GEOPHYSICAL ENGINEERING',
 'NATURAL RESOURCES MANAGEMENT',
 'TREATMENT THERAPY PROFESSIONS',
 'COMMUNICATION DISORDERS SCIENCES AND SERVICES',
 'PHYSIOLOGY',
 'MISCELLANEOUS HEALTH MEDICAL PROFESSIONS',
 'PHARMACY PHARMACEUTICAL SCIENCES AND ADMINISTRATION',
 'GENERAL ENGINEERING',
 'COGNITIVE SCIENCE AND BIOPSYCHOLOGY',
 'STUDIO ARTS',
 'MEDICAL TECHNOLOGIES TECHNICIANS',
 'COMPUTER SCIENCE',
 'COMPUTER ENGINEERING',
 'COMPUTER ADMINISTRATION MANAGEMENT AND SECURITY',
 'CRIMINOLOGY',
 'LINGUISTICS AND COMPARATIVE LANGUAGE AND LITERATURE',
 'MISCELLANEOUS BIOLOGY',
 'MINING AND MINERAL ENGINEERING',
 'INTERNATIONAL RELATIONS',
 'ARCHITECTURE',
 'ECOLOGY',
 'OCEANOGRAPHY',
 'NURSING',
 'ANIMAL SCIENCES',
 'SCIENCE AND COMPUTER TEACHER EDUCATION',
 'THEOLOGY AND RELIGIOUS VOCATIONS',
 'CONSTRUCTION SERVICES',
 'BUSINESS ECONOMICS',
 'SOCIAL WORK',
 'MARKETING AND MARKETING RESEARCH',
 'NUTRITION SCIENCES',
 'COMMUNITY AND PUBLIC HEALTH',
 'CIVIL ENGINEERING',
 'FORESTRY',
 'ELEMENTARY EDUCATION',
 'MISCELLANEOUS AGRICULTURE',
 'JOURNALISM',
 'OTHER FOREIGN LANGUAGES',
 'ACCOUNTING',
 'MATERIALS SCIENCE',
 'ELECTRICAL ENGINEERING TECHNOLOGY',
 'LANGUAGE AND DRAMA EDUCATION',
 'PSYCHOLOGY',
 'OPERATIONS LOGISTICS AND E-COMMERCE',
 'APPLIED MATHEMATICS',
 'ENGLISH LANGUAGE AND LITERATURE',
 'FAMILY AND CONSUMER SCIENCES',
 'PHARMACOLOGY',
 'NAVAL ARCHITECTURE AND MARINE ENGINEERING',
 'SOCIOLOGY',
 'SCHOOL STUDENT COUNSELING',
 'COMPOSITION AND RHETORIC',
 'FILM VIDEO AND PHOTOGRAPHIC ARTS',
 'MISCELLANEOUS ENGINEERING',
 'BIOMEDICAL ENGINEERING',
 'INDUSTRIAL AND ORGANIZATIONAL PSYCHOLOGY',
 'LIBERAL ARTS',
 'COMMERCIAL ART AND GRAPHIC DESIGN',
 'BIOLOGICAL ENGINEERING',
 'PRE-LAW AND LEGAL STUDIES',
 'PHILOSOPHY AND RELIGIOUS STUDIES',
 'ENVIRONMENTAL SCIENCE',
 'PHYSICAL FITNESS PARKS RECREATION AND LEISURE',
 'STATISTICS AND DECISION SCIENCE',
 'MECHANICAL ENGINEERING RELATED TECHNOLOGIES',
 'HISTORY',
 'FINE ARTS',
 'TEACHER EDUCATION: MULTIPLE LEVELS',
 'NUCLEAR, INDUSTRIAL RADIOLOGY, AND BIOLOGICAL TECHNOLOGIES',
 'MANAGEMENT INFORMATION SYSTEMS AND STATISTICS',
 'GENERAL EDUCATION',
 'PUBLIC POLICY',
 'COSMETOLOGY SERVICES AND CULINARY ARTS',
 'MEDICAL ASSISTING SERVICES',
 'LIBRARY SCIENCE',
 'HOSPITALITY MANAGEMENT',
 'ACTUARIAL SCIENCE',
 'BUSINESS MANAGEMENT AND ADMINISTRATION',
 'INTERDISCIPLINARY SOCIAL SCIENCES',
 'CLINICAL PSYCHOLOGY',
 'MECHANICAL ENGINEERING',
 'ANTHROPOLOGY AND ARCHEOLOGY',
 'INTERCULTURAL AND INTERNATIONAL STUDIES',
 'MISCELLANEOUS EDUCATION',
 'PUBLIC ADMINISTRATION',
 'MULTI-DISCIPLINARY OR GENERAL SCIENCE',
 'CRIMINAL JUSTICE AND FIRE PROTECTION',
 'GENERAL BUSINESS',
 'CHEMICAL ENGINEERING',
 'SECONDARY TEACHER EDUCATION',
 'MISCELLANEOUS BUSINESS & MEDICAL ADMINISTRATION',
 'FRENCH GERMAN LATIN AND OTHER COMMON FOREIGN LANGUAGE STUDIES',
 'MICROBIOLOGY',
 'COMPUTER AND INFORMATION SYSTEMS',
 'GENERAL SOCIAL SCIENCES',
 'GEOLOGY AND EARTH SCIENCE',
 'INFORMATION SCIENCES']
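
The per-major loop used in this section can also be written without explicit iteration by merging the two DataFrames on Major and comparing the rate columns directly. This is a sketch under assumptions, not the method used above: compare_unemployment is a hypothetical helper, and the toy rows are made up for illustration.

```python
import pandas as pd


def compare_unemployment(recent: pd.DataFrame, overall: pd.DataFrame):
    """Split majors by which dataset has the lower Unemployment_rate."""
    cols = ["Major", "Unemployment_rate"]
    merged = recent[cols].merge(
        overall[cols], on="Major", suffixes=("_recent", "_all")
    )
    diff = merged["Unemployment_rate_recent"] - merged["Unemployment_rate_all"]
    recent_lower = merged.loc[diff < 0, "Major"].tolist()
    all_lower = merged.loc[diff > 0, "Major"].tolist()
    # Majors with identical rates land in neither list.
    return recent_lower, all_lower


# Toy stand-ins for the two datasets (made-up rates, illustration only).
toy_recent = pd.DataFrame(
    {"Major": ["PHYSICS", "NURSING", "HISTORY"],
     "Unemployment_rate": [0.04, 0.05, 0.06]}
)
toy_all = pd.DataFrame(
    {"Major": ["PHYSICS", "NURSING", "HISTORY"],
     "Unemployment_rate": [0.05, 0.03, 0.06]}
)

recent_lower, all_lower = compare_unemployment(toy_recent, toy_all)
print(recent_lower, all_lower)
```

The merge keeps only majors present in both frames, so a major missing from one dataset is dropped silently rather than raising an error as the indexed lookup in the loop would.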

Last edited by Alex Egg, 2015-09-24 04:36:43