Data Science
Collection of useful data science topics along with articles and videos.
To receive a condensed overview of these tools and additional resources, sign up for CodeCut's free PDF guide. This comprehensive 264-page document covers over 100 essential data science tools, providing you with a valuable reference for your work.
How to Download the Code in This Repository to Your Local Machine
To download the code in this repo, use git clone:
git clone https://github.com/khuyentran1401/Data-science
Contents
- MLOps
- Data Management Tools
- Testing
- Productive Tools
- Python Helper Tools
- Tools for Deployment
- Speed-up Tools
- Math Tools
- Machine Learning
- Natural Language Processing
- Computer Vision
- Time Series
- Feature Engineering
- Visualization
- Mathematical Programming
- Scraping
- Python
- Logging and Debugging
- Linear Algebra
- Data Structure
- Statistics
- Web Applications
- Share Insights
- Cool Tools
- Learning Tips
- Productive Tips
- VSCode
- Book Review
- Data Science Portfolio
MLOps
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| Stop Hard Coding in a Data Science Project - Use Configuration Files Instead | 🔗 | 🔗 | 🔗 |
| Poetry: A Better Way to Manage Python Dependencies | 🔗 | | 🔗 |
| Git for Data Scientists: Learn Git through Practical Examples | 🔗 | | 🔗 |
| Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code | 🔗 | 🔗 | |
| Kedro - A Python Framework for Reproducible Data Science Project | 🔗 | 🔗 | |
| Orchestrate a Data Science Project in Python With Prefect | 🔗 | 🔗 | |
| Orchestrate Your Data Science Project with Prefect 2.0 | 🔗 | 🔗 | 🔗 |
| DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | 🔗 | 🔗 | |
| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | 🔗 | 🔗 | 🔗 |
| BentoML: Create an ML Powered Prediction Service in Minutes | 🔗 | 🔗 | 🔗 |
| How to Structure a Data Science Project for Maintainability (with DVC) | 🔗 | 🔗 | 🔗 |
| How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) | 🔗 | 🔗 | |
| GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | 🔗 | 🔗 | |
| Create Robust Data Pipelines with Prefect, Docker, and GitHub | 🔗 | 🔗 | |
| Create a Maintainable Data Pipeline with Prefect and DVC | 🔗 | 🔗 | |
| Build a Full-Stack ML Application With Pydantic And Prefect | 🔗 | 🔗 | 🔗 |
| Streamline Code Updates with DVC and GitHub Actions | 🔗 | 🔗 | 🔗 |
| Create Observable and Reproducible Notebooks with Hex | 🔗 | 🔗 | 🔗 |
| Build Reliable Machine Learning Pipelines with Continuous Integration | 🔗 | 🔗 | 🔗 |
| Automate Machine Learning Deployment with GitHub Actions | 🔗 | 🔗 | 🔗 |
| How to Build a Fully Automated Data Drift Detection Pipeline | 🔗 | 🔗 | 🔗 |
Data Management Tools
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| Introduction to DVC: Data Version Control Tool for Machine Learning Projects | 🔗 | 🔗 | 🔗 |
| Great Expectations: Always Know What to Expect From Your Data | 🔗 | 🔗 | |
| Validate Your pandas DataFrame with Pandera | 🔗 | 🔗 | 🔗 |
| Introduction to Schema: A Python Library to Validate your Data | 🔗 | 🔗 | |
| How to Create Fake Data with Faker | 🔗 | 🔗 | |
| Hypothesis and Pandera: Generate Synthetic Pandas DataFrame for Testing | 🔗 | 🔗 | 🔗 |
| What is dbt (data build tool) and When should you use it? | 🔗 | 🔗 | 🔗 |
| Streamline dbt Model Development with Notebook-Style Workspace | 🔗 | 🔗 | 🔗 |
Testing
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| Pytest for Data Scientists | 🔗 | 🔗 | 🔗 |
| 4 Lesser-Known Yet Awesome Tips for Pytest | 🔗 | 🔗 | |
| DeepDiff - Recursively Find and Ignore Trivial Differences Using Python | 🔗 | 🔗 | |
| Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
| Detect Defects in a Data Pipeline Early with Validation and Notifications | 🔗 | 🔗 | 🔗 |
| Write Readable Tests for Your Machine Learning Models with Behave | 🔗 | 🔗 | 🔗 |
Productive Tools
| Title | Article | Repository |
| --- | :---: | :---: |
| 3 Tools to Track and Visualize the Execution of your Python Code | 🔗 | 🔗 |
| 2 Tools to Automatically Reload when Python Files Change | 🔗 | 🔗 |
| 3 Ways to Get Notified with Python | 🔗 | 🔗 |
| How to Create Reusable Command-Line | 🔗 | |
| How to Strip Outputs and Execute Interactive Code in a Python Script | 🔗 | 🔗 |
| Sending Slack Notifications in Python with Prefect | 🔗 | 🔗 |
Python Helper Tools
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| Pydash: A Kitchen Sink of Missing Python Utilities | 🔗 | 🔗 | |
| Write Clean Python Code Using Pipes | 🔗 | 🔗 | 🔗 |
| Introducing FugueSQL - SQL for Pandas, Spark, and Dask DataFrames | 🔗 | 🔗 | |
| Fugue and DuckDB: Fast SQL Code in Python | 🔗 | 🔗 | |
| Simplify Data Science Workflows on BigQuery with Fugue and Python | 🔗 | 🔗 | |
Tools for Deployment
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Effortlessly Publish your Python Package to PyPI Using Poetry | 🔗 | 🔗 |
| Typer: Build Powerful CLIs in One Line of Code using Python | 🔗 | 🔗 |
Speed-up Tools
| Title | Article | Repository |
| --- | :---: | :---: |
| Cython - A Speed-Up Tool for your Python Function | 🔗 | 🔗 |
| Train your Machine Learning Model 150x Faster with cuML | 🔗 | 🔗 |
Math Tools
| Title | Article | Repository |
| --- | :---: | :---: |
| SymPy: Symbolic Computation in Python | 🔗 | 🔗 |
Machine Learning
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash | 🔗 | 🔗 | |
| How to Efficiently Fine-Tune your Machine Learning Models | 🔗 | 🔗 | |
| How to Learn Non-linear Dataset with Support Vector Machines | 🔗 | 🔗 | |
| Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data | 🔗 | 🔗 | |
| 3 Steps to Improve your Efficiency when Hypertuning ML Models | 🔗 | | |
| human-learn: Create a Human Learning Model by Drawing | 🔗 | 🔗 | |
| Patsy: Build Powerful Features with Arbitrary Python Code | 🔗 | 🔗 | |
| SHAP: Explain Any Machine Learning Model in Python | 🔗 | 🔗 | |
| Predict Movie Ratings with User-Based Collaborative Filtering | 🔗 | 🔗 | |
| River: Online Machine Learning in Python | 🔗 | 🔗 | 🔗 |
| Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | 🔗 | 🔗 | 🔗 |
Natural Language Processing
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| Sentiment Analysis of LinkedIn Messages | 🔗 | 🔗 | |
| Find Common Words in Article with Python Module Newspaper and NLTK | 🔗 | 🔗 | |
| How to Tokenize Tweets with Python | 🔗 | 🔗 | |
| How to Solve Analogies with Word2Vec | 🔗 | 🔗 | |
| What is PyTorch | 🔗 | 🔗 | |
| Convolutional Neural Network in Natural Language Processing | 🔗 | 🔗 | |
| Supercharge your Python String with TextBlob | 🔗 | 🔗 | 🔗 |
| pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know | 🔗 | 🔗 | |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 | |
| Build a Robust Conversational Assistant with Rasa | 🔗 | 🔗 | |
| I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found | 🔗 | 🔗 | |
| Checklist - Behavioral Testing of NLP Models | 🔗 | 🔗 | |
| PRegEx: Write Human-Readable Regular Expressions in Python | 🔗 | 🔗 | 🔗 |
| Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame | 🔗 | 🔗 | |
Computer Vision
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Create an App to Classify Dogs Using fastai and Streamlit | 🔗 | 🔗 |
Time Series
| Title | Article | Repository |
| --- | :---: | :---: |
| Kats: a Generalizable Framework to Analyze Time Series Data in Python | 🔗 | 🔗 |
| How to Detect Seasonality, Outliers, and Changepoints in Your Time Series | 🔗 | 🔗 |
| 4 Tools to Automatically Extract Data from Datetime in Python | 🔗 | 🔗 |
Feature Engineering
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| 3 Ways to Extract Features from Dates with Python | 🔗 | 🔗 | |
| Similarity Encoding for Dirty Categories Using dirty_cat | 🔗 | 🔗 | |
| Snorkel - A Human-In-The-Loop Platform to Build Training Data | 🔗 | 🔗 | 🔗 |
Visualization
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| How to Embed Interactive Charts on your Articles and Personal Website | 🔗 | 🔗 | |
| What I Learned from Scraping 15k Data Science Articles on Medium | 🔗 | 🔗 | |
| How to Create Interactive Plots with Altair | 🔗 | 🔗 | |
| How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool | 🔗 | 🔗 | |
| I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found | 🔗 | 🔗 | |
| Top 6 Python Libraries for Visualization: Which one to Use? | 🔗 | 🔗 | |
| Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model | 🔗 | 🔗 | |
| Visualize Gender-Specific Tweets with Scattertext | 🔗 | 🔗 | |
| Visualize Your Team's Projects Using Python Gantt Chart | 🔗 | 🔗 | |
| How to Create Bindings and Conditions Between Multiple Plots Using Altair | 🔗 | 🔗 | |
| How to Sketch your Data Science Ideas With Excalidraw | 🔗 | | |
| Pyvis: Visualize Interactive Network Graphs in Python | 🔗 | 🔗 | 🔗 |
| Build and Analyze Knowledge Graphs with Diffbot | 🔗 | | |
| Observe The Friend Paradox in Facebook Data Using Python | 🔗 | 🔗 | |
| What skills and backgrounds do data scientists have in common? | 🔗 | 🔗 | |
| Visualize Similarities Between Companies With Graph Database | 🔗 | 🔗 | |
| Visualize GitHub Social Network with PyGraphistry | 🔗 | 🔗 | |
| Find the Top Bootcamps for Data Professionals From Over 5k Profiles | 🔗 | 🔗 | |
| floWeaver - Turn Flow Data Into a Sankey Diagram In Python | 🔗 | 🔗 | |
| atoti - Build a BI Platform in Python | 🔗 | 🔗 | |
| Analyze and Visualize URLs with Network Graph | 🔗 | 🔗 | |
| statannotations: Add Statistical Significance Annotations on Seaborn Plots | 🔗 | 🔗 | 🔗 |
Mathematical Programming
| Title | Article | Repository |
| --- | :---: | :---: |
| How to choose stocks to invest in with Python | 🔗 | 🔗 |
| Maximize your Productivity with Python | 🔗 | 🔗 |
| How to Find a Good Match with Python | 🔗 | 🔗 |
| How to Solve a Staff Scheduling Problem with Python | 🔗 | 🔗 |
| How to Find Best Locations for your Restaurants with Python | 🔗 | 🔗 |
| How to Schedule Flights in Python | 🔗 | 🔗 |
| How to Solve a Production Planning and Inventory Problem in Python | 🔗 | 🔗 |
Scraping
| Title | Article | Repository |
| --- | :---: | :---: |
| Web Scrape Movie Database with Beautiful Soup | 🔗 | 🔗 |
| top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code | 🔗 | 🔗 |
Python
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| 6 Common Mistakes to Avoid in Data Science Code | 🔗 | | 🔗 |
| 5 Steps to Transform Messy Functions into Production-Ready Code | 🔗 | 🔗 | 🔗 |
| Numpy Tricks for your Data Science Projects | 🔗 | 🔗 | |
| Timing for Efficient Python Code | 🔗 | 🔗 | |
| How to Use Lambda for Efficient Python Code | 🔗 | 🔗 | |
| Python Tricks for Keeping Track of Your Data | 🔗 | 🔗 | |
| Boost Your Efficiency With Specialized Dictionary Implementations in Python | 🔗 | 🔗 | |
| Dictionary as an Alternative to If-Else | 🔗 | 🔗 | |
| How to Use Zip to Manipulate a List of Tuples | 🔗 | 🔗 | |
| Get the Most out of Your Array With These Four Numpy Methods | 🔗 | 🔗 | |
| 3 Python Tricks to Read, Create, and Run Multiple Files Automatically | 🔗 | 🔗 | |
| How to Exclude the Outliers in Pandas DataFrame | 🔗 | 🔗 | |
| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | 🔗 | 🔗 | 🔗 |
| 3 Techniques to Effortlessly Import and Execute Python Modules | 🔗 | 🔗 | |
| Simplify Your Functions with Functools' Partial and Singledispatch | 🔗 | 🔗 | |
Logging and Debugging
| Title | Article | Repository | Video |
| --- | :---: | :---: | :---: |
| How to Create and View Interactive Cheatsheets on the Command-line | 🔗 | | |
| Understand CSV Files from your Terminal with XSV | 🔗 | | |
| Prettify your Terminal Text With Termcolor and Pyfiglet | 🔗 | 🔗 | |
| Loguru: Simple as Print, Flexible as Logging | 🔗 | 🔗 | 🔗 |
| Stop Using Print to Debug in Python. Use Icecream Instead | 🔗 | | |
| Rich: Generate Rich and Beautiful Text in the Terminal with Python | 🔗 | 🔗 | |
| Create a Beautiful Dashboard in your Terminal with Wtfutil | 🔗 | 🔗 | |
| 3 Tools to Monitor and Optimize your Linux System | 🔗 | | |
| Ptpython: A Better Python REPL | 🔗 | 🔗 | |
| fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line | 🔗 | | |
| Speed Up your Command-Line Navigation with These 3 Tools | 🔗 | | |
| Python and Data Science Snippets on the Command Line | 🔗 | 🔗 | |
Statistics
| Title | Article | Repository |
| --- | :---: | :---: |
| Can Datasets of a Dinosaur and a Circle have Identical Statistics? | 🔗 | 🔗 |
| Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups | 🔗 | 🔗 |
| Bayes' Theorem, Clearly Explained with Visualization | 🔗 | 🔗 |
| Detect Change Points with Bayesian Inference and PyMC3 | 🔗 | 🔗 |
| Bayesian Linear Regression with Bambi | 🔗 | 🔗 |
| Earn More Salary as a Coder - Higher Degree or More Years of Experience? | 🔗 | 🔗 |
Linear Algebra
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Build a Matrix Module from Scratch | 🔗 | 🔗 |
| Linear Algebra for Machine Learning: Solve a System of Linear Equations | 🔗 | 🔗 |
Data Structure
| Title | Article | Repository |
| --- | :---: | :---: |
| Convex Hull: An Innovative Approach to Gift-Wrap your Data | 🔗 | 🔗 |
| How to Visualize Social Network With Graph Theory | 🔗 | 🔗 |
| How to Search Data with KDTree | 🔗 | 🔗 |
| How to Find the Nearest Hospital with a Voronoi Diagram | 🔗 | 🔗 |
Web Applications
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Create an Interactive Startup Growth Calculator with Python | 🔗 | 🔗 |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | 🔗 | 🔗 |
| PyWebIO: Write Interactive Web App in Script Way Using Python | 🔗 | 🔗 |
| PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input | 🔗 | 🔗 |
| Create an App to Deal with Boredom Using PyWebIO | 🔗 | 🔗 |
| Build a Robust Workflow to Visualize Trending GitHub Repositories in Python | 🔗 | 🔗 |
Share Insights
| Title | Article | Repository |
| --- | :---: | :---: |
| Introduction to Datapane: A Python Library to Build Interactive Reports | 🔗 | |
| Datapane's New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code | 🔗 | 🔗 |
| Introduction to Datasette: Explore and Publish Your Data in One Line of Code | 🔗 | |
| How to Share your Python Objects Across Different Environments in One Line of Code | 🔗 | 🔗 |
| How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok | 🔗 | |
| Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook | 🔗 | |
Cool Tools
| Title | Article | Repository |
| --- | :---: | :---: |
| Simulate Real-life Events in Python Using SimPy | 🔗 | 🔗 |
| How to Create Mathematical Animations like 3Blue1Brown Using Python | 🔗 | 🔗 |
Learning Tips
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Learn Data Science when Life does not Give You a Break | 🔗 | |
| How to Accelerate your Data Science Career by Putting yourself in the Right Environment | 🔗 | |
| To become a Better Data Scientist, you need to Think like a Programmer | 🔗 | |
| How not to be Overwhelmed with Data Science | 🔗 | |
Productive Tips
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Organize your Data Science Articles with Github | 🔗 | 🔗 |
| 5 Reasons why you should Switch from Jupyter Notebook to Scripts | 🔗 | |
| 7 Reasons Why you Should Start Documenting your Code | 🔗 | |
VSCode
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Leverage Visual Studio Code for your Data Science Projects | 🔗 | |
| Top 4 Code Viewers for Data Scientist in VSCode | 🔗 | |
| Incorporate the Best Practices for Python with These Top 4 VSCode Extensions | 🔗 | |
| Boost Your Efficiency with Customized Code Snippets on VSCode | 🔗 | |
| Top 9 Keyboard Shortcuts in VSCode for Data Scientists | 🔗 | |
Book Review
| Title | Article | Repository |
| --- | :---: | :---: |
| Python Machine Learning: A Comprehensive Handbook for Machine Learning | 🔗 | |
Data Science Portfolio
| Title | Article | Repository |
| --- | :---: | :---: |
| How to Create an Elegant Website for your Data Science Portfolio in 10 minutes | 🔗 | |
| Build an Impressive Github Profile in 3 Steps | 🔗 | |