Song Toxicity

Exploring Toxicity of Song Lyrics Using Machine Learning


This project explores the relationship between the toxicity of song lyrics and factors such as genre and artist gender. It was originally completed for Stand Up Boston, a workshop aimed at empowering people to stand up against sexual misconduct. We used a publicly available dataset from Kaggle, which we augmented with information about artist gender scraped from Wikipedia.

Our slides highlight notable results. The data exploration and presentation were done by Irene Chen, Marzyeh Ghassemi, and Deborah Hanus.

About the Authors

Irene Chen is a PhD student at MIT in electrical engineering and computer science. She is broadly interested in creating and applying machine learning algorithms to high-impact areas, including healthcare and fairness. Prior to MIT, she worked at Dropbox as a Data Scientist, Chief of Staff, and Machine Learning Engineer. Irene received her bachelor's and master's degrees in applied math at Harvard, where she studied racial and gender discrimination on Airbnb. Irene enjoys long-distance running while listening to Adele and Broadway musicals.

Marzyeh Ghassemi is a postdoc in the Clinical Decision Making Group at MIT’s Computer Science and Artificial Intelligence Lab (CSAIL), supervised by Dr. Peter Szolovits. She will join the University of Toronto as an Assistant Professor in Computer Science and Medicine in Fall 2018 and will be affiliated with the Vector Institute. Marzyeh’s research focuses on machine learning with clinical data to predict and stratify relevant human risks, encompassing unsupervised learning, supervised learning, and structured prediction. Marzyeh was formerly a Goldwater and Marshall Scholar. Her favorite dessert is Roxbury Puddingstone from Toscanini’s, and she listens to a lot of the “classics”.

Deborah Hanus is a startup founder who loves using machine learning and data to make a difference. She has done machine learning research at MIT, Harvard, and Google Brain. She has given talks or led tutorials at AI with the Best, PyCon, SciPy, and QCon NY. She graduated from MIT with an M.Eng. in Electrical Engineering & Computer Science and a dual degree in Computer Science and Brain & Cognitive Sciences. She has been awarded the Fulbright Student Fellowship, the NSF Graduate Research Fellowship, and the Intel/ACM SIGHPC Computational Data Science Fellowship. Deborah believes that matcha can make anything better, and her favorite artist is Janelle Monáe.