At the beginning of my journey, I was mainly concerned with continuously learning new things, following one MOOC after another without spending enough time to "digest" the material and apply it to real problems. It was not a wise approach, and after a while I started to realize that it is simply not enough.
The best way to really master a topic is to find a balance between theory and continuous hands-on practice. In other words, get your hands dirty working on pet projects, replicating experiments, or, even better, capstone projects and Kaggle competitions.
| Project | Description | Links |
|---|---|---|
| Experimenting with 'ggplot2' (July - October 2017) | Building more knowledge around the 'ggplot2' package and how to use it to create powerful visualizations and custom graphical elements. Learnings and findings summarized in a set of blog posts (see Links). [Technology stack: R & R ecosystem] | Basic Plotting, Essential Concepts, Guidelines for good plots, How to work with maps, Customize with 'grid', Customize with 'ggplot2' |
| Extending ggplot2: create a new geom (June 2017) | Build a custom geom for ggplot2 that can be used to add the wind radii for a single storm observation to a map. These data are available for Atlantic basin tropical storms since 1988 through the Extended Best Track dataset. [Technology stack: R & R ecosystem] | More Info..., Code & Data |
| ML - KNN Algorithm (April 2017) | Using the KNN (K-Nearest Neighbors) algorithm to address a regression problem: predicting house values in the Seattle area. [Technology stack: Jupyter Notebook & Python ecosystem] | KNN - Regression problem, Code & Data |
| NLP - Naive Bayes Classifier (December 2016) | Using a Naive Bayes classifier to perform text classification: classifying spam vs. ham SMS messages using the SMS Spam Collection v. 1 dataset. [Technology stack: R & R ecosystem] | Naive Bayes |
| NLP - Exploring the `tidytext` package (December 2016) | Using the 'tidytext' package on different datasets (e.g. some books from the Project Gutenberg collection) to extract useful insights from text and transform it into data that can be used for further analysis. [Technology stack: R & R ecosystem] | Basic Usage, Sentiment Analysis |
| NLP - Exploring the `tm` package (November 2016) | Using the 'tm' package on the SMS Spam Collection v. 1 dataset to extract useful insights from text and transform it into data that can be used for further analysis. [Technology stack: R & R ecosystem] | Basic Usage |
| Capstone Project (June 2016) | The Capstone Project of the "Data Science Specialization", created by JHU in collaboration with SwiftKey. The goal is to create a text prediction application: when someone types "I went to the", the application should present three options for what the next word might be, for example gym, store, restaurant. The language model should be created using the HC corpora. [Technology stack: R & R ecosystem] | Artifacts, Lesson Learned, Code |
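The core idea behind the KNN regression project above (predicting house values in the Seattle area) can be sketched in a few lines of plain Python: predict the target for a query point as the mean target of its k nearest training points. The feature choice and data here are made up purely for illustration; the actual project used the Python data-science stack on a real housing dataset.

```python
from math import sqrt

def knn_regress(train, query, k=3):
    """Predict a value for `query` as the mean target of its k nearest
    training points, using Euclidean distance on the feature vectors."""
    by_distance = sorted(
        train,
        key=lambda xy: sqrt(sum((a - b) ** 2 for a, b in zip(xy[0], query))),
    )
    return sum(y for _, y in by_distance[:k]) / k

# Toy data: (features, house value in $k); features might be (sqft/1000, bedrooms)
train = [((1.0, 2), 300.0), ((1.5, 3), 400.0), ((2.0, 3), 450.0), ((3.0, 4), 600.0)]
print(knn_regress(train, (1.6, 3), k=3))  # mean of the 3 closest house values
```

In practice one would standardize the features first (otherwise the feature with the largest scale dominates the distance) and pick k by cross-validation.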
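The spam vs. ham project was done in R, but the Naive Bayes mechanics it relies on are easy to sketch from scratch in Python: count word frequencies per class, then score a new message with class priors plus Laplace-smoothed word log-likelihoods. The tiny "corpus" below is invented for illustration, not taken from the SMS Spam Collection.

```python
from collections import Counter
from math import log

def train_nb(docs):
    """docs: list of (token_list, label). Returns class counts, per-class
    word counts, and the overall vocabulary."""
    labels = Counter(label for _, label in docs)
    words = {label: Counter() for label in labels}
    for tokens, label in docs:
        words[label].update(tokens)
    vocab = {w for counts in words.values() for w in counts}
    return labels, words, vocab

def classify(tokens, labels, words, vocab):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    total_docs = sum(labels.values())
    best, best_score = None, float("-inf")
    for label, n_docs in labels.items():
        n_words = sum(words[label].values())
        score = log(n_docs / total_docs)
        for w in tokens:
            # Laplace (add-one) smoothing so unseen words don't zero the score
            score += log((words[label][w] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

docs = [
    ("win free prize now".split(), "spam"),
    ("free cash win win".split(), "spam"),
    ("are we meeting for lunch".split(), "ham"),
    ("see you at lunch tomorrow".split(), "ham"),
]
model = train_nb(docs)
print(classify("win a free prize".split(), *model))  # → spam
print(classify("lunch tomorrow".split(), *model))    # → ham
```

The "naive" part is the assumption that words are conditionally independent given the class; it is wrong for natural language, yet the classifier still works remarkably well for spam filtering.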
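The capstone's text prediction task boils down to an n-gram language model: count which word follows each (n-1)-word context, and offer the most frequent continuations. The real project was built in R on the HC corpora; the toy corpus and function names below are purely illustrative.

```python
from collections import Counter, defaultdict

def build_model(corpus, n=3):
    """Count which word follows each (n-1)-gram in the corpus."""
    model = defaultdict(Counter)
    tokens = corpus.lower().split()
    for i in range(len(tokens) - (n - 1)):
        context = tuple(tokens[i:i + n - 1])
        model[context][tokens[i + n - 1]] += 1
    return model

def predict(model, text, n=3, top=3):
    """Return the `top` most likely next words after the last (n-1) words."""
    context = tuple(text.lower().split()[-(n - 1):])
    return [word for word, _ in model[context].most_common(top)]

corpus = ("i went to the gym . i went to the store . "
          "i went to the restaurant . then to the gym again")
model = build_model(corpus, n=3)
print(predict(model, "i went to the"))  # top-3 candidates after "to the"
```

A production-quality model would add backoff to shorter contexts (e.g. Katz backoff or stupid backoff) so that an unseen trigram context can still yield a prediction from bigram or unigram counts.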