Discovering the New Digital World

At the beginning of my journey, I was mainly concerned with continuously learning new things, following one MOOC after another without spending enough time to "digest" what I had learned and apply it to real problems. It was not a wise approach and, after a while, I started to realize that it is simply not enough.

The best way to really master a topic is to find a balance between theory and continuous, hands-on practice. In other words, get your hands dirty working on pet projects, replicating experiments or, even better, tackling capstone projects and Kaggle competitions.


Experimenting with 'ggplot2' (July - October 2017)
Building more knowledge around the 'ggplot2' package and how to use it to create powerful visualizations and custom graphical elements. Learnings and findings are summarized in a set of blog posts (see Links). [Technology stack: R & R ecosystem]
Links: Basic Plotting | Essential Concepts | Guidelines for good plots | How to work with maps | Customize with 'grid' | Customize with 'ggplot2'
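A minimal flavour of the kind of 'ggplot2' code covered in those posts (the dataset and aesthetics below are illustrative only, not taken from the blog posts):

library(ggplot2)

# Scatter plot with a linear smoothing layer and custom labels, using the
# built-in 'mtcars' dataset purely as an example.
ggplot(mtcars, aes(x = wt, y = mpg, colour = factor(cyl))) +
  geom_point(size = 2) +
  geom_smooth(method = "lm", se = FALSE) +
  labs(title = "Fuel efficiency vs. weight",
       x = "Weight (1000 lbs)", y = "Miles per gallon",
       colour = "Cylinders") +
  theme_minimal()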
Extending ggplot2: create a new geom (June 2017)
Build a custom geom for 'ggplot2' that can be used to add the wind radii for a single storm observation to a map. These data are available for Atlantic basin tropical storms since 1988 through the Extended Best Track dataset. [Technology stack: R & R ecosystem]
Links: More Info... | Code & Data
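The general pattern for a new geom is a 'ggproto' object plus a thin layer() wrapper. A minimal sketch of that pattern (the names, aesthetics and the simple polygon drawing below are illustrative, not the project's actual implementation):

library(ggplot2)
library(grid)

# A ggproto object defines the geom: its required/default aesthetics and how
# a panel of data is turned into grid grobs.
GeomHurricane <- ggproto("GeomHurricane", Geom,
  required_aes = c("x", "y"),
  default_aes = aes(colour = "red", fill = "red", alpha = 0.5),
  draw_key = draw_key_polygon,
  draw_panel = function(data, panel_params, coord) {
    coords <- coord$transform(data, panel_params)
    grid::polygonGrob(coords$x, coords$y,
                      gp = grid::gpar(col = coords$colour,
                                      fill = scales::alpha(coords$fill,
                                                           coords$alpha)))
  }
)

# The user-facing constructor simply wraps layer() around the ggproto object.
geom_hurricane <- function(mapping = NULL, data = NULL, stat = "identity",
                           position = "identity", na.rm = FALSE,
                           show.legend = NA, inherit.aes = TRUE, ...) {
  layer(geom = GeomHurricane, mapping = mapping, data = data, stat = stat,
        position = position, show.legend = show.legend,
        inherit.aes = inherit.aes, params = list(na.rm = na.rm, ...))
}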
ML - KNN Algorithm (April 2017)
Using the KNN (K-Nearest Neighbors) algorithm to address a regression problem: the prediction of house values in the Seattle area. [Technology stack: Jupyter Notebook & Python ecosystem]
Links: KNN - Regression problem | Code & Data
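The underlying idea is simple enough to sketch directly. Below is a toy KNN regression in base R (the project itself was done in Python; the simulated "house" data and the helper function are purely illustrative):

# Predict a numeric target as the average of the targets of the k nearest
# training points (Euclidean distance, no feature scaling for brevity).
knn_regress <- function(train_x, train_y, new_x, k = 5) {
  query <- matrix(new_x, nrow(train_x), ncol(train_x), byrow = TRUE)
  d <- sqrt(rowSums((train_x - query)^2))
  mean(train_y[order(d)[1:k]])
}

set.seed(42)
train_x <- cbind(sqft = runif(200, 500, 4000), bedrooms = sample(1:5, 200, TRUE))
train_y <- 100 * train_x[, "sqft"] + rnorm(200, sd = 20000)  # toy "house value"
knn_regress(train_x, train_y, new_x = c(2500, 3), k = 5)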
NLP - Naive Bayes Classifier (December 2016)
Using a Naive Bayes classifier to perform text classification: classifying spam vs. ham SMS messages from the SMS Spam Collection v.1 dataset. [Technology stack: R & R ecosystem]
Links: Naive Bayes
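A minimal sketch of the approach, assuming the 'tm' and 'e1071' packages and a data frame 'sms' with 'text' and 'label' columns (loading the dataset and splitting into train/test sets are omitted):

library(tm)
library(e1071)

# Clean the messages and build a document-term matrix.
corpus <- VCorpus(VectorSource(sms$text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
dtm <- DocumentTermMatrix(corpus)

# Naive Bayes works better with categorical features, so turn word counts
# into a simple "Yes"/"No" presence indicator.
to_presence <- function(x) ifelse(x > 0, "Yes", "No")
features <- apply(as.matrix(dtm), 2, to_presence)

model <- naiveBayes(features, factor(sms$label), laplace = 1)
pred  <- predict(model, features)
table(predicted = pred, actual = sms$label)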
NLP - Exploring the 'tidytext' package (December 2016)
Using the 'tidytext' package on different datasets (e.g. some books from the Project Gutenberg collection) to find useful insights and information in text and transform it into data that can be used for further analysis. [Technology stack: R & R ecosystem]
Links: Basic Usage | Sentiment Analysis
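A minimal sketch of the tidy-text workflow, assuming the 'gutenbergr' package to fetch a Project Gutenberg book (the blog posts may have obtained the texts differently; the book ID below is just an example):

library(dplyr)
library(tidytext)
library(gutenbergr)

# Download a book, tokenize it into one word per row, drop stop words.
book <- gutenberg_download(1342)          # Pride and Prejudice, as an example

tidy_book <- book %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words, by = "word")

# Most frequent words.
tidy_book %>% count(word, sort = TRUE)

# Simple sentiment tally with the Bing lexicon.
tidy_book %>%
  inner_join(get_sentiments("bing"), by = "word") %>%
  count(sentiment)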
NLP - Exploring the 'tm' package (November 2016)
Using the 'tm' package on the SMS Spam Collection v.1 dataset to find useful insights and information in text and transform it into data that can be used for further analysis. [Technology stack: R & R ecosystem]
Links: Basic Usage
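A minimal sketch of a typical 'tm' exploration pipeline ('sms_text' stands in for the loaded SMS messages; the functions shown are just illustrative examples of the package's API):

library(tm)

# Build and clean a corpus, then turn it into a document-term matrix.
corpus <- VCorpus(VectorSource(sms_text))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
corpus <- tm_map(corpus, stripWhitespace)
dtm <- DocumentTermMatrix(corpus)

findFreqTerms(dtm, lowfreq = 50)           # terms appearing at least 50 times
dtm_small <- removeSparseTerms(dtm, 0.99)  # drop very sparse terms
inspect(dtm_small[1:5, 1:8])               # peek at a corner of the matrix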
Capstone Project (June 2016)
The Capstone Project of the "Data Science Specialization" created by JHU in collaboration with SwiftKey. The goal is to create a text prediction application: when someone types "I went to the", the application should present three options for what the next word might be, for example gym, store, restaurant. The language model has to be built from the HC Corpora. [Technology stack: R & R ecosystem]
Links: Artifacts | Lesson Learned | Code
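To give an idea of the underlying technique, here is a toy n-gram next-word predictor in R (a real model would add smoothing, back-off and pruning; the tiny in-line corpus and helper names are purely illustrative):

library(dplyr)
library(tidyr)
library(tidytext)

# Count trigrams in a corpus, then return the most frequent continuations
# of the last two words typed.
docs <- tibble(text = c("I went to the gym", "I went to the store",
                        "she went to the store", "I went to the restaurant"))

trigrams <- docs %>%
  unnest_tokens(trigram, text, token = "ngrams", n = 3) %>%
  separate(trigram, c("w1", "w2", "w3"), sep = " ") %>%
  count(w1, w2, w3, sort = TRUE)

predict_next <- function(prev2, prev1, top = 3) {
  trigrams %>%
    filter(w1 == prev2, w2 == prev1) %>%
    slice_head(n = top) %>%
    pull(w3)
}

predict_next("to", "the")   # -> "store" "gym" "restaurant"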