-
Project archive: some projects from graduate school and some just for fun
From implementing natural language processing models to web scraping fan-made wikis with BeautifulSoup, I have a wide range of projects. This page has project summaries and links to reports and the GitHub pages where the Jupyter notebook files are hosted.
-
Tidy Data: what is it and how do I use it?
Tidy data, a concept by Hadley Wickham, simplifies dataset manipulation by giving data a consistent structure: each variable is a column, each observation is a row, and each type of observational unit is a table. These are my notes on his paper, with a couple of simplified examples of how to restructure datasets with pandas instead of R.
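As a quick taste of what that restructuring looks like, here is a minimal sketch that assumes a made-up "wide" table with one column per year. The store names, years, and sales figures are hypothetical and are not taken from the paper or from my notes. It uses `DataFrame.melt` to move the year columns into rows so that each row holds one observation.

```python
import pandas as pd

# Hypothetical "messy" table: one row per store, one column per year.
# The year values are stored in the column headers instead of in a column.
wide = pd.DataFrame({
    "store": ["A", "B"],
    "2019": [100, 80],
    "2020": [120, 90],
})

# Melt the year columns into rows: one row per (store, year) observation.
tidy = wide.melt(id_vars="store", var_name="year", value_name="sales")

# The years came from column names, so they are strings; convert to integers.
tidy["year"] = tidy["year"].astype(int)

# Sort for readability and reset the index.
tidy = tidy.sort_values(["store", "year"]).reset_index(drop=True)

print(tidy)
```

The result has three columns (`store`, `year`, `sales`) and four rows, one per store-year pair, which is usually the easier shape for grouping, filtering, and plotting.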
-
Wait, wait, which graph do I want again?
When working on data analysis projects, there are a few things I do each time to get a sense of the data that I’m working with. One of those things is plotting. Before I plot something, I ask myself these questions to make sure the plot I’m using will help me get the answer I…
-
Books I’m reading and referencing for projects
This page has a list of the books I most frequently consult while working on projects.
-
My favorite tools for practicing with Python and other programming languages
This is similar to my post about books, except it’s focused on resources for practicing Python.
-
From pandas to plotting: an exercise in going from a pandas query to a plot
This exercise uses a Kaggle dataset from an Ecuador-based grocery retailer, Corporación Favorita, accessed here. At this point, I had already cleaned and pre-processed the data (converting datatypes, resolving nulls, merging CSV files, etc.). The purpose of this exercise was to answer questions with this dataset using both pandas queries and visualizations with matplotlib or the…