I like to read a variety of data science articles to expand my knowledge and apply it to real-world problems. But it is extremely difficult to remember every article, so I want to organize the articles to make them easier to refer back to in the future.
In particular, I want to sort, summarize, and write down key takeaways from each article. I found that GitHub Issues is a perfect tool for this. I created this machine-learning-articles repo to organize my data science articles with GitHub Issues.
Check out the page for the repo here. Contributions are welcome.
Collection of useful data science topics along with code and articles in my data science blog
As an active user on LinkedIn with more than 500 connections, I was curious about the statistics of people in my network as well as the messages I received over the last 2 years. This project applies visualization and sentiment analysis to analyze my network and LinkedIn messages.
Utilize spaCy and Streamlit to create an app that predicts sentiment and word similarities.
Identify gender and language variety in English-language tweets. Specifically, we want to classify authors as male or female and by location: Australia, Canada, Great Britain, Ireland, New Zealand, or the United States.
Predict whether a tweet is about a disaster or not. Preprocess with NLTK (Natural Language Toolkit) and train models with various techniques, including TF-IDF, TF-IDF with n-grams (words and characters), SelectKBest, a binary vectorizer, Word2Vec, a neural network, and a convolutional neural network with PyTorch. Achieve an F1 score of 80%.
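To make the "TF-IDF with n-grams" step concrete, here is a minimal pure-Python sketch. It is not the project's code: it uses the textbook weight `tf * log(N / df)` rather than scikit-learn's smoothed variant, and the example tweets and the helper names `word_ngrams` and `tfidf` are made up for illustration.

```python
import math
from collections import Counter

def word_ngrams(text, n_max=2):
    """Return all word n-grams of length 1..n_max from a whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = []
    for n in range(1, n_max + 1):
        grams += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return grams

def tfidf(docs, n_max=2):
    """Return one {ngram: weight} dict per document, using tf * log(N / df)."""
    doc_grams = [Counter(word_ngrams(d, n_max)) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each n-gram appears.
    df = Counter(g for grams in doc_grams for g in grams)
    return [
        {g: (count / sum(grams.values())) * math.log(n_docs / df[g])
         for g, count in grams.items()}
        for grams in doc_grams
    ]

# Hypothetical tweets, only for illustration.
tweets = ["forest fire near town", "fire alarm went off", "love this town"]
weights = tfidf(tweets)
print(sorted(weights[0].items(), key=lambda kv: -kv[1])[:3])
```

N-grams that occur in only one tweet (such as "forest fire") get a higher weight than words shared across tweets (such as "fire"), which is what makes them useful features for a classifier.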
Create an App to Classify Dogs Using fastai and Streamlit
This is a simple app to classify dogs using fastai and Streamlit. The app is deployed using Streamlit Sharing. Click here to view and play with the app.
When searching for the keyword “machine learning” on GitHub, I found 246,632 machine learning repositories. Since the top results are among the most popular repositories in machine learning, I expect their owners and contributors to be experts, or at least competent, in machine learning. Thus, I decided to extract the profiles of these users to gain some interesting insights into their backgrounds as well as statistics.
As a data science writer, I wonder:
To answer these questions, I scraped all data science articles on Medium published within the last year.
Description: Extract the text from an article using a Python article-scraping library, then use NLTK (Natural Language Toolkit) to preprocess the text and extract the most common words in the article.
Tools: Newspaper3k (Python library for article scraping), NLTK (Natural Language Toolkit)
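The "most common words" step can be sketched with the standard library alone. This is only an illustration, not the project's code: the real project uses Newspaper3k to fetch the article and NLTK for tokenization and stop words, whereas here the text is a hard-coded sample and `STOP_WORDS` is a tiny made-up list.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; NLTK's English list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for"}

def most_common_words(text, n=3):
    """Return the n most frequent non-stop-words in text as (word, count) pairs."""
    tokens = re.findall(r"[a-z']+", text.lower())
    words = [t for t in tokens if t not in STOP_WORDS]
    return Counter(words).most_common(n)

# Stand-in for the text that an article scraper would return.
sample = "Data science is fun. Data cleaning is the slow part of data science."
top_words = most_common_words(sample, n=2)
print(top_words)
```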
Description: Extract data from the Ghibli movie database, preprocess the data, and perform sentiment analysis to predict whether the sentiment for a movie is negative, positive, or neutral.
Tools: Beautiful Soup (a Python library for scraping), NLTK (Natural Language Toolkit), scikit-learn, NumPy, pandas
Motivation: My roommate and I were discussing the correlation between sunlight and depression. To support my point that less sun is correlated with a higher depression rate, I gathered data to test my hypothesis.
Tools: Beautiful Soup, NumPy, pandas, Matplotlib
Description: Find the predictors of suicide across countries, years, sexes, generations, and age groups. Apply multiple preprocessing techniques, use GeoPandas to visualize the distribution on a map, and train three different machine learning models, each evaluated with appropriate metrics.
Description: Explore the predictors of happiness among countries around the world.
Visualization Techniques: Countplot, Jointplot, Heatmap
Decide which stocks to invest in each year so as to maximize the total returns using mathematical programming.
Split employees into pairs while maximizing the total preference of the employees for their assigned partners.
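As a rough illustration of the pairing problem, the sketch below tries every way of splitting a small, even-sized group into pairs and keeps the split with the highest total preference. This is brute force, not the mathematical-programming formulation a real solution would use, and the names, the scores in `prefs`, and the `mutual` scoring rule are all hypothetical.

```python
def best_pairing(people, score):
    """Return (total_score, pairs) for the best split of people into pairs.

    Brute force: pair the first person with each remaining person in turn,
    then solve the smaller problem recursively. Assumes an even number of
    people and is only practical for small groups.
    """
    if not people:
        return 0, []
    first, rest = people[0], people[1:]
    best_total, best_pairs = float("-inf"), []
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        sub_total, sub_pairs = best_pairing(remaining, score)
        total = score(first, partner) + sub_total
        if total > best_total:
            best_total, best_pairs = total, [(first, partner)] + sub_pairs
    return best_total, best_pairs

# Hypothetical preference scores: prefs[a][b] is how much a wants to work with b.
prefs = {
    "Ann": {"Ben": 5, "Cat": 1, "Dan": 2},
    "Ben": {"Ann": 4, "Cat": 3, "Dan": 1},
    "Cat": {"Ann": 1, "Ben": 2, "Dan": 5},
    "Dan": {"Ann": 2, "Ben": 1, "Cat": 4},
}

def mutual(a, b):
    """Score a pair by adding both people's preference for each other."""
    return prefs[a][b] + prefs[b][a]

total, pairs = best_pairing(list(prefs), mutual)
print(total, pairs)
```

The number of possible pairings grows very quickly with group size, which is why an integer-programming or matching solver is the better fit beyond a dozen or so people.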