

Collection of useful data science topics along with articles, videos, and code

### Data Science

Collection of useful data science topics along with articles and videos. Subscribe to:

- [CodeCut]() for articles and bite-sized Python tips in your mailbox
- [My YouTube channel]() for videos related to Python and data science

## How to Download the Code in This Repository to Your Local Machine

To download the code in this repo, clone it with `git clone`:

```bash
git clone
```

# Contents

1. [MLOps](#mlops)
1. [Data Management Tools](#data-management-tools)
1. [Testing](#testing)
1. [Productive Tools](#productive-tools)
1. [Python Helper Tools](#python-helper-tools)
1. [Tools for Deployment](#tools-for-deployment)
1. [Speed-up Tools](#speed-up-tools)
1. [Math Tools](#math-tools)
1. [Machine Learning](#machine-learning)
1. [Natural Language Processing](#natural-language-processing)
1. [Computer Vision](#computer-vision)
1. [Time Series](#time-series)
1. [Feature Engineering](#feature-engineering)
1. [Visualization](#visualization)
1. [Mathematical Programming](#mathematical-programming)
1. [Scraping](#scraping)
1. [Python](#python)
1. [Logging and Debugging](#logging-and-debugging)
1. [Linear Algebra](#linear-algebra)
1. [Data Structure](#data-structure)
1. [Statistics](#statistics)
1. [Web Applications](#web-applications)
1. [Share Insights](#share-insights)
1. [Cool Tools](#cool-tools)
1. [Learning Tips](#learning-tips)
1. [Productive Tips](#productive-tips)
1. [VSCode](#vscode)
1. [Book Review](#book-review)
1. [Data Science Portfolio](#data-science-portfolio)

# MLOps

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| Stop Hard Coding in a Data Science Project – Use Configuration Files Instead | [🔗]() | [🔗]() | [🔗]() |
| Poetry: A Better Way to Manage Python Dependencies | [🔗]() | | [🔗]() |
| Git for Data Scientists: Learn Git through Practical Examples | [🔗]() | | [🔗]() |
| Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code | [🔗]() | [🔗]() | |
| Kedro: A Python Framework for Reproducible Data Science Projects | [🔗]() | [🔗]() | |
| Orchestrate a Data Science Project in Python With Prefect | [🔗]() | [🔗]() | |
| Orchestrate Your Data Science Project with Prefect 2.0 | [🔗]() | [🔗]() | [🔗]() |
| DagsHub: a GitHub Supplement for Data Scientists and ML Engineers | [🔗]() | [🔗]() | |
| 4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python | [🔗]() | [🔗]() | [🔗]() |
| BentoML: Create an ML Powered Prediction Service in Minutes | [🔗]() | [🔗]() | [🔗]() |
| How to Structure a Data Science Project for Maintainability (with DVC) | [🔗]() | [🔗]() | [🔗]() |
| How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) | [🔗]() | [🔗]() | |
| GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model | [🔗]() | [🔗]() | |
| Create Robust Data Pipelines with Prefect, Docker, and GitHub | [🔗]() | [🔗]() | |
| Create a Maintainable Data Pipeline with Prefect and DVC | [🔗]() | [🔗]() | |
| Build a Full-Stack ML Application With Pydantic And Prefect | [🔗]() | [🔗]() | [🔗]() |
| Streamline Code Updates with DVC and GitHub Actions | [🔗]() | [🔗]() | [🔗]() |
| Create Observable and Reproducible Notebooks with Hex | [🔗]() | [🔗]() | [🔗]() |
| Build Reliable Machine Learning Pipelines with Continuous Integration | [🔗]() | [🔗]() | [🔗]() |
| Automate Machine Learning Deployment with GitHub Actions | [🔗]() | [🔗]() | [🔗]() |
| How to Build a Fully Automated Data Drift Detection Pipeline | [🔗]() | [🔗]() | [🔗]() |

# Data Management Tools

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| Introduction to DVC: Data Version Control Tool for Machine Learning Projects | [🔗]() | [🔗]() | [🔗]() |
| Great Expectations: Always Know What to Expect From Your Data | [🔗]() | [🔗]() | |
| Validate Your pandas DataFrame with Pandera | [🔗]() | [🔗]() | [🔗]() |
| Introduction to Schema: A Python Library to Validate your Data | [🔗]() | [🔗]() | |
| How to Create Fake Data with Faker | [🔗]() | [🔗]() | |
| Hypothesis and Pandera: Generate Synthetic pandas DataFrames for Testing | [🔗]() | [🔗]() | [🔗]() |
| What is dbt (data build tool) and When should you use it? | [🔗]() | [🔗]() | [🔗]() |
| Streamline dbt Model Development with Notebook-Style Workspace | [🔗]() | [🔗]() | [🔗]() |

# Testing

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| Pytest for Data Scientists | [🔗]() | [🔗]() | [🔗]() |
| 4 Lesser-Known Yet Awesome Tips for Pytest | [🔗]() | [🔗]() | |
| DeepDiff: Recursively Find and Ignore Trivial Differences Using Python | [🔗]() | [🔗]() | |
| Checklist: Behavioral Testing of NLP Models | [🔗]() | [🔗]() | |
| Detect Defects in a Data Pipeline Early with Validation and Notifications | [🔗]() | [🔗]() | [🔗]() |
| Write Readable Tests for Your Machine Learning Models with Behave | [🔗]() | [🔗]() | [🔗]() |

# Productive Tools

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| 3 Tools to Track and Visualize the Execution of your Python Code | [🔗]() | [🔗]() |
| 2 Tools to Automatically Reload when Python Files Change | [🔗]() | [🔗]() |
| 3 Ways to Get Notified with Python | [🔗]() | [🔗]() |
| How to Create Reusable Command-Line | [🔗]() | |
| How to Strip Outputs and Execute Interactive Code in a Python Script | [🔗]() | [🔗]() |
| Sending Slack Notifications in Python with Prefect | [🔗]() | [🔗]() |

# Python Helper Tools

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| Pydash: A Kitchen Sink of Missing Python Utilities | [🔗]() | [🔗]() | |
| Write Clean Python Code Using Pipes | [🔗]() | [🔗]() | [🔗]() |
| Introducing FugueSQL: SQL for Pandas, Spark, and Dask DataFrames | [🔗]() | [🔗]() | |
| Fugue and DuckDB: Fast SQL Code in Python | [🔗]() | [🔗]() | |
| Simplify Data Science Workflows on BigQuery with Fugue and Python | [🔗]() | [🔗]() | |

# Tools for Deployment

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Effortlessly Publish your Python Package to PyPI Using Poetry | [🔗]() | [🔗]() |
| Typer: Build Powerful CLIs in One Line of Code using Python | [🔗]() | [🔗]() |

# Speed-up Tools

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Cython: A Speed-Up Tool for your Python Function | [🔗]() | [🔗]() |
| Train your Machine Learning Model 150x Faster with cuML | [🔗]() | [🔗]() |

# Math Tools

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| SymPy: Symbolic Computation in Python | [🔗]() | [🔗]() |

# Machine Learning

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash | [🔗]() | [🔗]() | |
| How to Efficiently Fine-Tune your Machine Learning Models | [🔗]() | [🔗]() | |
| How to Learn Non-linear Dataset with Support Vector Machines | [🔗]() | [🔗]() | |
| Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data | [🔗]() | [🔗]() | |
| 3 Steps to Improve your Efficiency when Hypertuning ML Models | [🔗]() | | |
| human-learn: Create a Human Learning Model by Drawing | [🔗]() | [🔗]() | |
| Patsy: Build Powerful Features with Arbitrary Python Code | [🔗]() | [🔗]() | |
| SHAP: Explain Any Machine Learning Model in Python | [🔗]() | [🔗]() | |
| Predict Movie Ratings with User-Based Collaborative Filtering | [🔗]() | [🔗]() | |
| River: Online Machine Learning in Python | [🔗]() | [🔗]() | [🔗]() |
| Human-Learn: Rule-Based Learning as an Alternative to Machine Learning | [🔗]() | [🔗]() | [🔗]() |

# Natural Language Processing

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| Sentiment Analysis of LinkedIn Messages | [🔗]() | [🔗]() | |
| Find Common Words in Article with Python Module Newspaper and NLTK | [🔗]() | [🔗]() | |
| How to Tokenize Tweets with Python | [🔗]() | [🔗]() | |
| How to Solve Analogies with Word2Vec | [🔗]() | [🔗]() | |
| What is PyTorch? | [🔗]() | [🔗]() | |
| Convolutional Neural Network in Natural Language Processing | [🔗]() | [🔗]() | |
| Supercharge your Python String with TextBlob | [🔗]() | [🔗]() | [🔗]() |
| pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know | [🔗]() | [🔗]() | |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | [🔗]() | [🔗]() | |
| Build a Robust Conversational Assistant with Rasa | [🔗]() | [🔗]() | |
| I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found | [🔗]() | [🔗]() | |
| Checklist: Behavioral Testing of NLP Models | [🔗]() | [🔗]() | |
| PRegEx: Write Human-Readable Regular Expressions in Python | [🔗]() | [🔗]() | [🔗]() |
| Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame | [🔗]() | [🔗]() | |

# Computer Vision

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Create an App to Classify Dogs Using fastai and Streamlit | [🔗]() | [🔗]() |

# Time Series

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Kats: a Generalizable Framework to Analyze Time Series Data in Python | [🔗]() | [🔗]() |
| How to Detect Seasonality, Outliers, and Changepoints in Your Time Series | [🔗]() | [🔗]() |
| 4 Tools to Automatically Extract Data from Datetime in Python | [🔗]() | [🔗]() |

# Feature Engineering

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| 3 Ways to Extract Features from Dates with Python | [🔗]() | [🔗]() | |
| Similarity Encoding for Dirty Categories Using dirty_cat | [🔗]() | [🔗]() | |
| Snorkel: A Human-In-The-Loop Platform to Build Training Data | [🔗]() | [🔗]() | [🔗]() |

# Visualization

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| How to Embed Interactive Charts on your Articles and Personal Website | [🔗]() | [🔗]() | |
| What I Learned from Scraping 15k Data Science Articles on Medium | [🔗]() | [🔗]() | |
| How to Create Interactive Plots with Altair | [🔗]() | [🔗]() | |
| How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool | [🔗]() | [🔗]() | |
| I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found | [🔗]() | [🔗]() | |
| Top 6 Python Libraries for Visualization: Which one to Use? | [🔗]() | [🔗]() | |
| Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model | [🔗]() | [🔗]() | |
| Visualize Gender-Specific Tweets with Scattertext | [🔗]() | [🔗]() | |
| Visualize Your Team's Projects Using Python Gantt Chart | [🔗]() | [🔗]() | |
| How to Create Bindings and Conditions Between Multiple Plots Using Altair | [🔗]() | [🔗]() | |
| How to Sketch your Data Science Ideas With Excalidraw | [🔗]() | | |
| Pyvis: Visualize Interactive Network Graphs in Python | [🔗]() | [🔗]() | [🔗]() |
| Build and Analyze Knowledge Graphs with Diffbot | [🔗]() | | |
| Observe The Friend Paradox in Facebook Data Using Python | [🔗]() | [🔗]() | |
| What skills and backgrounds do data scientists have in common? | [🔗]() | [🔗]() | |
| Visualize Similarities Between Companies With Graph Database | [🔗]() | [🔗]() | |
| Visualize GitHub Social Network with PyGraphistry | [🔗]() | [🔗]() | |
| Find the Top Bootcamps for Data Professionals From Over 5k Profiles | [🔗]() | [🔗]() | |
| floWeaver: Turn Flow Data Into a Sankey Diagram In Python | [🔗]() | [🔗]() | |
| atoti: Build a BI Platform in Python | [🔗]() | [🔗]() | |
| Analyze and Visualize URLs with Network Graph | [🔗]() | [🔗]() | |
| statannotations: Add Statistical Significance Annotations on Seaborn Plots | [🔗]() | [🔗]() | [🔗]() |

# Mathematical Programming

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to choose stocks to invest in with Python | [🔗]() | [🔗]() |
| Maximize your Productivity with Python | [🔗]() | [🔗]() |
| How to Find a Good Match with Python | [🔗]() | [🔗]() |
| How to Solve a Staff Scheduling Problem with Python | [🔗]() | [🔗]() |
| How to Find Best Locations for your Restaurants with Python | [🔗]() | [🔗]() |
| How to Schedule Flights in Python | [🔗]() | [🔗]() |
| How to Solve a Production Planning and Inventory Problem in Python | [🔗]() | [🔗]() |

# Scraping

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Web Scrape Movie Database with Beautiful Soup | [🔗]() | [🔗]() |
| top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code | [🔗]() | [🔗]() |

# Python

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| 6 Common Mistakes to Avoid in Data Science Code | [🔗]() | | [🔗]() |
| 5 Steps to Transform Messy Functions into Production-Ready Code | [🔗]() | [🔗]() | [🔗]() |
| Numpy Tricks for your Data Science Projects | [🔗]() | [🔗]() | |
| Timing for Efficient Python Code | [🔗]() | [🔗]() | |
| How to Use Lambda for Efficient Python Code | [🔗]() | [🔗]() | |
| Python Tricks for Keeping Track of Your Data | [🔗]() | [🔗]() | |
| Boost Your Efficiency With Specialized Dictionary Implementations in Python | [🔗]() | [🔗]() | |
| Dictionary as an Alternative to If-Else | [🔗]() | [🔗]() | |
| How to Use Zip to Manipulate a List of Tuples | [🔗]() | [🔗]() | |
| Get the Most out of Your Array With These Four Numpy Methods | [🔗]() | [🔗]() | |
| 3 Python Tricks to Read, Create, and Run Multiple Files Automatically | [🔗]() | [🔗]() | |
| How to Exclude the Outliers in Pandas DataFrame | [🔗]() | [🔗]() | |
| Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable | [🔗]() | [🔗]() | [🔗]() |
| 3 Techniques to Effortlessly Import and Execute Python Modules | [🔗]() | [🔗]() | |
| Simplify Your Functions with Functools' Partial and Singledispatch | [🔗]() | [🔗]() | |

# Logging and Debugging

| Title | Article | Repository | Video |
| ------------- |:-------------:| :-----:| :-----:|
| How to Create and View Interactive Cheatsheets on the Command-line | [🔗]() | | |
| Understand CSV Files from your Terminal with XSV | [🔗]() | | |
| Prettify your Terminal Text With Termcolor and Pyfiglet | [🔗]() | [🔗]() | |
| Loguru: Simple as Print, Flexible as Logging | [🔗]() | [🔗]() | [🔗]() |
| Stop Using Print to Debug in Python. Use Icecream Instead | [🔗]() | | |
| Rich: Generate Rich and Beautiful Text in the Terminal with Python | [🔗]() | [🔗]() | |
| Create a Beautiful Dashboard in your Terminal with Wtfutil | [🔗]() | [🔗]() | |
| 3 Tools to Monitor and Optimize your Linux System | [🔗]() | | |
| Ptpython: A Better Python REPL | [🔗]() | [🔗]() | |
| fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line | [🔗]() | | |
| Speed Up your Command-Line Navigation with These 3 Tools | [🔗]() | | |
| Python and Data Science Snippets on the Command Line | [🔗]() | [🔗]() | |

# Statistics

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Can Datasets of a Dinosaur and a Circle have Identical Statistics? | [🔗]() | [🔗]() |
| Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups | [🔗]() | [🔗]() |
| Bayes' Theorem, Clearly Explained with Visualization | [🔗]() | [🔗]() |
| Detect Change Points with Bayesian Inference and PyMC3 | [🔗]() | [🔗]() |
| Bayesian Linear Regression with Bambi | [🔗]() | [🔗]() |
| Earn More Salary as a Coder: Higher Degree or More Years of Experience? | [🔗]() | [🔗]() |

# Linear Algebra

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Build a Matrix Module from Scratch | [🔗]() | [🔗]() |
| Linear Algebra for Machine Learning: Solve a System of Linear Equations | [🔗]() | [🔗]() |

# Data Structure

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Convex Hull: An Innovative Approach to Gift-Wrap your Data | [🔗]() | [🔗]() |
| How to Visualize Social Network With Graph Theory | [🔗]() | [🔗]() |
| How to Search Data with KDTree | [🔗]() | [🔗]() |
| How to Find the Nearest Hospital with a Voronoi Diagram | [🔗]() | [🔗]() |

# Web Applications

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Create an Interactive Startup Growth Calculator with Python | [🔗]() | [🔗]() |
| Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge | [🔗]() | [🔗]() |
| PyWebIO: Write Interactive Web App in Script Way Using Python | [🔗]() | [🔗]() |
| PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input | [🔗]() | [🔗]() |
| Create an App to Deal with Boredom Using PyWebIO | [🔗]() | [🔗]() |
| Build a Robust Workflow to Visualize Trending GitHub Repositories in Python | [🔗]() | [🔗]() |

# Share Insights

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Introduction to Datapane: A Python Library to Build Interactive Reports | [🔗]() | |
| Datapane's New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code | [🔗]() | [🔗]() |
| Introduction to Datasette: Explore and Publish Your Data in One Line of Code | [🔗]() | |
| How to Share your Python Objects Across Different Environments in One Line of Code | [🔗]() | [🔗]() |
| How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok | [🔗]() | |
| Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook | [🔗]() | |

# Cool Tools

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Simulate Real-life Events in Python Using SimPy | [🔗]() | [🔗]() |
| How to Create Mathematical Animations like 3Blue1Brown Using Python | [🔗]() | [🔗]() |

# Learning Tips

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Learn Data Science when Life does not Give You a Break | [🔗]() | |
| How to Accelerate your Data Science Career by Putting yourself in the Right Environment | [🔗]() | |
| To become a Better Data Scientist, you need to Think like a Programmer | [🔗]() | |
| How not to be Overwhelmed with Data Science | [🔗]() | |

# Productive Tips

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Organize your Data Science Articles with Github | [🔗]() | [🔗]() |
| 5 Reasons why you should Switch from Jupyter Notebook to Scripts | [🔗]() | |
| 7 Reasons Why you Should Start Documenting your Code | [🔗]() | |

# VSCode

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Leverage Visual Studio Code for your Data Science Projects | [🔗]() | |
| Top 4 Code Viewers for Data Scientists in VSCode | [🔗]() | |
| Incorporate the Best Practices for Python with These Top 4 VSCode Extensions | [🔗]() | |
| Boost Your Efficiency with Customized Code Snippets on VSCode | [🔗]() | |
| Top 9 Keyboard Shortcuts in VSCode for Data Scientists | [🔗]() | |

# Book Review

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| Python Machine Learning: A Comprehensive Handbook for Machine Learning | [🔗]() | |

# Data Science Portfolio

| Title | Article | Repository |
| ------------- |:-------------:| :-----:|
| How to Create an Elegant Website for your Data Science Portfolio in 10 minutes | [🔗]() | |
| Build an Impressive Github Profile in 3 Steps | [🔗]() | |