khuyentran1401.github.io

My Collection of Data Science Articles

I like to read a variety of data science articles to expand my knowledge and apply it to real-world problems. But it is extremely difficult to remember every article, so I want to organize the articles to make them easier to refer back to in the future.

In particular, I want to sort, summarize, and write down key takeaways from each article. I found that GitHub issues are a perfect tool for this, so I created the machine-learning-articles repo to organize my data science articles with GitHub issues.

Check out the page for the repo here. Contributions are welcome.

My Data Science Articles

Collection of useful data science topics along with code and articles in my data science blog

Portfolio


Natural Language Processing

Explore and Visualize my LinkedIn Network with Python and Sentiment Analysis

View on GitHub View on Medium

As an active user on LinkedIn with more than 500 connections, I was curious about the statistics of people in my network as well as the messages I received over the last 2 years. This project applies visualization and sentiment analysis to analyze my network and LinkedIn messages.


Create an App to Predict Sentiment and Word Similarities

View on GitHub View on Medium

Use spaCy and Streamlit to create an app that predicts sentiment and word similarities.

Predict Gender and Location of Author in Twitter

View on GitHub

Identify the gender and language variety of authors of English-language tweets. Specifically, we want to classify authors as male or female and by location: Australia, Canada, Great Britain, Ireland, New Zealand, or the United States.


Real or Not - Predict which Tweets are about Real Disasters

View on GitHub

Predict whether a tweet is about a disaster or not. Preprocess with NLTK (Natural Language Toolkit) and train models with various techniques, including TF-IDF, TF-IDF with n-grams (words and characters), SelectKBest, a binary vectorizer, Word2Vec, a neural network, and a convolutional neural network with PyTorch. Achieve an F1-score of 80%.
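To make the TF-IDF step more concrete, here is a minimal sketch in plain Python. It is not the code from the project: the `tf_idf` helper and the example tweets are made up for illustration, and it uses the simplest unsmoothed weighting (term frequency times `log(N / df)`), which differs from the smoothed variant that libraries such as scikit-learn apply by default.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document (toy TF-IDF, no smoothing)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up tweets, for illustration only.
tweets = ["forest fire near the town", "this song is fire", "the town is quiet"]
weights = tf_idf(tweets)
```

In this toy example, "forest" gets a higher weight than "fire" in the first tweet because "fire" also appears in another tweet, so it is less distinctive.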


Computer Vision

Create an App to Classify Dogs Using fastai and Streamlit

View on GitHub View on Medium View on Streamlit

This is a simple app to classify dogs using fastai and Streamlit. The app is deployed using Streamlit Sharing. Click here to view and play with the app.


Data Analysis

I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found

View on GitHub View on Medium

When searching the keyword “machine learning” on Github, I found 246,632 machine learning repositories. Since these are top repositories in machine learning, I expect the owners and the contributors of these repositories to be experts or competent in machine learning. Thus, I decided to extract the profiles of these users to gain some interesting insights into their background as well as statistics.

What I Learned from Scraping 15k Data Science Articles on Medium

View on GitHub View on Medium

As a data science writer, I wonder:

To answer these questions, I scraped all data science articles on Medium published within the last year.

Find most Frequent Words in Articles

View on GitHub View on Medium

Description: Extract the text from an article using a Python article-scraping library, then use NLTK (Natural Language Toolkit) to preprocess the text and extract the most common words in the article.

Tools: Newspaper3k (Python library for article scraping), NLTK (Natural Language Toolkit)
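The word-counting idea can be sketched with the standard library alone. This is only an illustration, not the project's code: the `most_common_words` helper, the tiny stop-word set, and the sample text are all assumptions, and the real project relies on Newspaper3k and NLTK for scraping and preprocessing.

```python
import re
from collections import Counter

# Tiny illustrative stop-word set; NLTK ships a much fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for"}


def most_common_words(text, n=3):
    """Lowercase the text, drop stop words, and return the n most frequent words."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(w for w in words if w not in STOP_WORDS).most_common(n)


# Made-up sample text, for illustration only.
sample = "Data science is the science of learning from data. Data matters."
top = most_common_words(sample, n=2)
```

Here `top` holds the two most frequent non-stop words together with their counts.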

Analyse Sentiment of Ghibli Movie Database

View on GitHub View on Medium

Description: Extract data from the Ghibli movie database, preprocess the data, and perform sentiment analysis to predict whether each movie is negative, positive, or neutral.

Tools: Beautiful Soup (a Python library for scraping), NLTK (Natural Language Toolkit), Scikit-learn, Numpy, Pandas

Find the Correlation between Depression Rate and Sunshine Hours

View on GitHub View on Medium

Motivation: My roommate and I were discussing the correlation between sunshine and depression. To support my hypothesis that less sunshine is correlated with a higher depression rate, I gathered data on both.

Tools: Beautiful Soup, Numpy, Pandas, Matplotlib
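For readers unfamiliar with how such a correlation is measured, the following is a minimal sketch of the Pearson correlation coefficient in plain Python. The `pearson` helper and the numbers are hypothetical placeholders, not data or results from the project; in practice the same value can be obtained with Numpy or Pandas.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical placeholder values, NOT measurements from the project.
sunshine_hours = [1000, 1500, 2000, 2500, 3000]
depression_rate = [6.0, 5.5, 5.0, 4.0, 3.5]
r = pearson(sunshine_hours, depression_rate)
```

A value of `r` close to -1 indicates a strong negative linear relationship, a value close to 1 a strong positive one, and a value near 0 little linear relationship. Correlation alone does not establish causation.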


Determine Predictors of Suicide

View on GitHub

Description: Find the predictors of suicide across countries, years, sexes, generations, and age groups. Apply multiple preprocessing techniques, use geopandas to visualize the distribution on a map, and train three different machine learning models, each assessed with appropriate evaluation metrics.


Analyze World Happiness Report

View on GitHub

Description: Explore the predictors of happiness among countries in the world

Visualization Techniques: Countplot, Jointplot, Heatmap

Mathematical Programming

How to choose stocks to invest in with Python

View on GitHub View on Medium

Decide which stocks to invest in each year so as to maximize the total returns using mathematical programming.


How to Find a Best Match with Python

View on GitHub View on Medium

Split employees into groups of 2 people while maximizing the preference of each employee.
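The pairing problem can be illustrated with a small brute-force search. Note that this swaps in exhaustive enumeration for the mathematical programming formulation the article uses, so it only works for a handful of people. The `best_pairing` helper, the names, and the preference scores are all hypothetical, and the sketch assumes an even number of employees.

```python
def best_pairing(people, score):
    """Exhaustively search all ways to split `people` into pairs.

    Returns (total_score, list_of_pairs) for the best split.
    Assumes len(people) is even.
    """
    if not people:
        return 0, []
    first, rest = people[0], people[1:]
    best_total, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, score)
        total = score(first, partner) + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs


# Hypothetical preferences: prefs[a][b] is how much a wants to work with b.
prefs = {
    "Ann": {"Ben": 3, "Cal": 1, "Dee": 2},
    "Ben": {"Ann": 3, "Cal": 2, "Dee": 1},
    "Cal": {"Ann": 1, "Ben": 2, "Dee": 3},
    "Dee": {"Ann": 2, "Ben": 1, "Cal": 3},
}


def mutual(a, b):
    # A pair's score is the sum of both members' preferences for each other.
    return prefs[a][b] + prefs[b][a]


total, pairs = best_pairing(list(prefs), mutual)
```

The number of possible pairings grows very quickly with group size, which is why a mathematical programming solver is the better fit for realistic inputs.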
