Data Science

A collection of useful data science topics, with accompanying articles, code, and videos.

The Data Scientist’s Toolkit: 100+ Essential Tools for Modern Analytics

For a condensed overview of these tools and additional resources, sign up for CodeCut's free PDF guide. The 264-page guide covers more than 100 data science tools and makes a handy reference for your work.

How to Download the Code in This Repository to Your Local Machine

To download the code in this repository to your local machine, run git clone:

git clone https://github.com/khuyentran1401/Data-science

Contents

  1. MLOps
  2. Data Management Tools
  3. Testing
  4. Productive Tools
  5. Python Helper Tools
  6. Tools for Deployment
  7. Speed-up Tools
  8. Math Tools
  9. Machine Learning
  10. Natural Language Processing
  11. Computer Vision
  12. Time Series
  13. Feature Engineering
  14. Visualization
  15. Mathematical Programming
  16. Scraping
  17. Python
  18. Logging and Debugging
  19. Statistics
  20. Linear Algebra
  21. Data Structure
  22. Web Applications
  23. Share Insights
  24. Cool Tools
  25. Learning Tips
  26. Productive Tips
  27. VSCode
  28. Book Review
  29. Data Science Portfolio

MLOps

Title | Article | Repository | Video
Stop Hard Coding in a Data Science Project – Use Configuration Files Instead 🔗🔗🔗
Poetry: A Better Way to Manage Python Dependencies 🔗🔗
Git for Data Scientists: Learn Git through Practical Examples 🔗🔗
Introduction to Weights & Biases: Track and Visualize your Machine Learning Experiments in 3 Lines of Code 🔗🔗
Kedro – A Python Framework for Reproducible Data Science Project 🔗🔗
Orchestrate a Data Science Project in Python With Prefect 🔗🔗
Orchestrate Your Data Science Project with Prefect 2.0 🔗🔗🔗
DagsHub: a GitHub Supplement for Data Scientists and ML Engineers 🔗🔗
4 pre-commit Plugins to Automate Code Reviewing and Formatting in Python 🔗🔗🔗
BentoML: Create an ML Powered Prediction Service in Minutes 🔗🔗🔗
How to Structure a Data Science Project for Maintainability (with DVC) 🔗🔗🔗
How to Structure an ML Project for Reproducibility and Maintainability (with Prefect) 🔗🔗
GitHub Actions in MLOps: Automatically Check and Deploy Your ML Model 🔗🔗
Create Robust Data Pipelines with Prefect, Docker, and GitHub 🔗🔗
Create a Maintainable Data Pipeline with Prefect and DVC 🔗🔗
Build a Full-Stack ML Application With Pydantic And Prefect 🔗🔗🔗
Streamline Code Updates with DVC and GitHub Actions 🔗🔗🔗
Create Observable and Reproducible Notebooks with Hex 🔗🔗🔗
Build Reliable Machine Learning Pipelines with Continuous Integration 🔗🔗🔗
Automate Machine Learning Deployment with GitHub Actions 🔗🔗🔗
How to Build a Fully Automated Data Drift Detection Pipeline 🔗🔗🔗

Data Management Tools

Title | Article | Repository | Video
Introduction to DVC: Data Version Control Tool for Machine Learning Projects 🔗🔗🔗
Great Expectations: Always Know What to Expect From Your Data 🔗🔗
Validate Your pandas DataFrame with Pandera 🔗🔗🔗
Introduction to Schema: A Python Library to Validate your Data 🔗🔗
How to Create Fake Data with Faker 🔗🔗
Hypothesis and Pandera: Generate Synthetic Pandas DataFrame for Testing 🔗🔗🔗
What is dbt (data build tool) and When should you use it? 🔗🔗🔗
Streamline dbt Model Development with Notebook-Style Workspace 🔗🔗🔗

Testing

Title | Article | Repository | Video
Pytest for Data Scientists 🔗🔗🔗
4 Lesser-Known Yet Awesome Tips for Pytest 🔗🔗
DeepDiff – Recursively Find and Ignore Trivial Differences Using Python 🔗🔗
Checklist – Behavioral Testing of NLP Models 🔗🔗
Detect Defects in a Data Pipeline Early with Validation and Notifications 🔗🔗🔗
Write Readable Tests for Your Machine Learning Models with Behave 🔗🔗🔗

Productive Tools

Title | Article | Repository
3 Tools to Track and Visualize the Execution of your Python Code 🔗🔗
2 Tools to Automatically Reload when Python Files Change 🔗🔗
3 Ways to Get Notified with Python 🔗🔗
How to Create Reusable Command-Line 🔗
How to Strip Outputs and Execute Interactive Code in a Python Script 🔗🔗
Sending Slack Notifications in Python with Prefect 🔗🔗

Python Helper Tools

Title | Article | Repository | Video
Pydash: A Kitchen Sink of Missing Python Utilities 🔗🔗
Write Clean Python Code Using Pipes 🔗🔗🔗
Introducing FugueSQL – SQL for Pandas, Spark, and Dask DataFrames 🔗🔗
Fugue and DuckDB: Fast SQL Code in Python 🔗🔗
Simplify Data Science Workflows on BigQuery with Fugue and Python 🔗🔗

Tools for Deployment

Title | Article | Repository
How to Effortlessly Publish your Python Package to PyPI Using Poetry 🔗🔗
Typer: Build Powerful CLIs in One Line of Code using Python 🔗🔗

Speed-up Tools

Title | Article | Repository
Cython – A Speed-Up Tool for your Python Function 🔗🔗
Train your Machine Learning Model 150x Faster with cuML 🔗🔗

Math Tools

Title | Article | Repository
SymPy: Symbolic Computation in Python 🔗🔗

Machine Learning

Title | Article | Repository | Video
How to Monitor And Log your Machine Learning Experiment Remotely with HyperDash 🔗🔗
How to Efficiently Fine-Tune your Machine Learning Models 🔗🔗
How to Learn Non-linear Dataset with Support Vector Machines 🔗🔗
Introduction to IBM Federated Learning: A Collaborative Approach to Train ML Models on Private Data 🔗🔗
3 Steps to Improve your Efficiency when Hypertuning ML Models 🔗
human-learn: Create a Human Learning Model by Drawing 🔗🔗
Patsy: Build Powerful Features with Arbitrary Python Code 🔗🔗
SHAP: Explain Any Machine Learning Model in Python 🔗🔗
Predict Movie Ratings with User-Based Collaborative Filtering 🔗🔗
River: Online Machine Learning in Python 🔗🔗🔗
Human-Learn: Rule-Based Learning as an Alternative to Machine Learning 🔗🔗🔗

Natural Language Processing

Title | Article | Repository | Video
Sentiment Analysis of LinkedIn Messages 🔗🔗
Find Common Words in Article with Python Module Newspaper and NLTK 🔗🔗
How to Tokenize Tweets with Python 🔗🔗
How to Solve Analogies with Word2Vec 🔗🔗
What is PyTorch 🔗🔗
Convolutional Neural Network in Natural Language Processing 🔗🔗
Supercharge your Python String with TextBlob 🔗🔗🔗
pyLDAvis: Topic Modelling Exploration Tool That Every NLP Data Scientist Should Know 🔗🔗
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge 🔗🔗
Build a Robust Conversational Assistant with Rasa 🔗🔗
I Analyzed 2k Data Scientist and Data Engineer Jobs and This is What I Found 🔗🔗
Checklist – Behavioral Testing of NLP Models 🔗🔗
PRegEx: Write Human-Readable Regular Expressions in Python 🔗🔗🔗
Texthero: Text Preprocessing, Representation, and Visualization for a pandas DataFrame 🔗🔗

Computer Vision

Title | Article | Repository
How to Create an App to Classify Dogs Using fastai and Streamlit 🔗🔗

Time Series

Title | Article | Repository
Kats: a Generalizable Framework to Analyze Time Series Data in Python 🔗🔗
How to Detect Seasonality, Outliers, and Changepoints in Your Time Series 🔗🔗
4 Tools to Automatically Extract Data from Datetime in Python 🔗🔗

Feature Engineering

Title | Article | Repository | Video
3 Ways to Extract Features from Dates with Python 🔗🔗
Similarity Encoding for Dirty Categories Using dirty_cat 🔗🔗
Snorkel – A Human-In-The-Loop Platform to Build Training Data 🔗🔗🔗

Visualization

Title | Article | Repository | Video
How to Embed Interactive Charts on your Articles and Personal Website 🔗🔗
What I Learned from Scraping 15k Data Science Articles on Medium 🔗🔗
How to Create Interactive Plots with Altair 🔗🔗
How to Create a Drop-Down Menu and a Slide Bar for your Favorite Visualization Tool 🔗🔗
I Scraped more than 1k Top Machine Learning Github Profiles and this is what I Found 🔗🔗
Top 6 Python Libraries for Visualization: Which one to Use? 🔗🔗
Introduction to Yellowbrick: A Python Library to Visualize the Prediction of your Machine Learning Model 🔗🔗
Visualize Gender-Specific Tweets with Scattertext 🔗🔗
Visualize Your Team’s Projects Using Python Gantt Chart 🔗🔗
How to Create Bindings and Conditions Between Multiple Plots Using Altair 🔗🔗
How to Sketch your Data Science Ideas With Excalidraw 🔗
Pyvis: Visualize Interactive Network Graphs in Python 🔗🔗🔗
Build and Analyze Knowledge Graphs with Diffbot 🔗
Observe The Friend Paradox in Facebook Data Using Python 🔗🔗
What skills and backgrounds do data scientists have in common? 🔗🔗
Visualize Similarities Between Companies With Graph Database 🔗🔗
Visualize GitHub Social Network with PyGraphistry 🔗🔗
Find the Top Bootcamps for Data Professionals From Over 5k Profiles 🔗🔗
floWeaver – Turn Flow Data Into a Sankey Diagram In Python 🔗🔗
atoti – Build a BI Platform in Python 🔗🔗
Analyze and Visualize URLs with Network Graph 🔗🔗
statannotations: Add Statistical Significance Annotations on Seaborn Plots 🔗🔗🔗

Mathematical Programming

Title | Article | Repository
How to choose stocks to invest in with Python 🔗🔗
Maximize your Productivity with Python 🔗🔗
How to Find a Good Match with Python 🔗🔗
How to Solve a Staff Scheduling Problem with Python 🔗🔗
How to Find Best Locations for your Restaurants with Python 🔗🔗
How to Schedule Flights in Python 🔗🔗
How to Solve a Production Planning and Inventory Problem in Python 🔗🔗

Scraping

Title | Article | Repository
Web Scrape Movie Database with Beautiful Soup 🔗🔗
top-github-scraper: Scrape Top Github Users and Repositories Based On a Keyword in One Line of Code 🔗🔗

Python

Title | Article | Repository | Video
6 Common Mistakes to Avoid in Data Science Code 🔗🔗
5 Steps to Transform Messy Functions into Production-Ready Code 🔗🔗🔗
Numpy Tricks for your Data Science Projects 🔗🔗
Timing for Efficient Python Code 🔗🔗
How to Use Lambda for Efficient Python Code 🔗🔗
Python Tricks for Keeping Track of Your Data 🔗🔗
Boost Your Efficiency With Specialized Dictionary Implementations in Python 🔗🔗
Dictionary as an Alternative to If-Else 🔗🔗
How to Use Zip to Manipulate a List of Tuples 🔗🔗
Get the Most out of Your Array With These Four Numpy Methods 🔗🔗
3 Python Tricks to Read, Create, and Run Multiple Files Automatically 🔗🔗
How to Exclude the Outliers in Pandas DataFrame 🔗🔗
Python Clean Code: 6 Best Practices to Make Your Python Functions More Readable 🔗🔗🔗
3 Techniques to Effortlessly Import and Execute Python Modules 🔗🔗
Simplify Your Functions with Functools’ Partial and Singledispatch 🔗🔗

Logging and Debugging

Title | Article | Repository | Video
How to Create and View Interactive Cheatsheets on the Command-line 🔗
Understand CSV Files from your Terminal with XSV 🔗
Prettify your Terminal Text With Termcolor and Pyfiglet 🔗🔗
Loguru: Simple as Print, Flexible as Logging 🔗🔗🔗
Stop Using Print to Debug in Python. Use Icecream Instead 🔗
Rich: Generate Rich and Beautiful Text in the Terminal with Python 🔗🔗
Create a Beautiful Dashboard in your Terminal with Wtfutil 🔗🔗
3 Tools to Monitor and Optimize your Linux System 🔗
Ptpython: A Better Python REPL 🔗🔗
fd: a Simple but Powerful Tool to Find and Execute Files on the Command Line 🔗
Speed Up your Command-Line Navigation with These 3 Tools 🔗
Python and Data Science Snippets on the Command Line 🔗🔗

Statistics

Title | Article | Repository
Can Datasets of a Dinosaur and a Circle have Identical Statistics? 🔗🔗
Introduction to One-Way ANOVA: A Test to Compare the Means between More than Two Groups 🔗🔗
Bayes’ Theorem, Clearly Explained with Visualization 🔗🔗
Detect Change Points with Bayesian Inference and PyMC3 🔗🔗
Bayesian Linear Regression with Bambi 🔗🔗
Earn More Salary as a Coder – Higher Degree or More Years of Experience? 🔗🔗

Linear Algebra

Title | Article | Repository
How to Build a Matrix Module from Scratch 🔗🔗
Linear Algebra for Machine Learning: Solve a System of Linear Equations 🔗🔗

Data Structure

Title | Article | Repository
Convex Hull: An Innovative Approach to Gift-Wrap your Data 🔗🔗
How to Visualize Social Network With Graph Theory 🔗🔗
How to Search Data with KDTree 🔗🔗
How to Find the Nearest Hospital with a Voronoi Diagram 🔗🔗

Web Applications

Title | Article | Repository
How to Create an Interactive Startup Growth Calculator with Python 🔗🔗
Streamlit and spaCy: Create an App to Predict Sentiment and Word Similarities with Minimal Domain Knowledge 🔗🔗
PyWebIO: Write Interactive Web App in Script Way Using Python 🔗🔗
PyWebIO 1.3.0: Add Tabs, Pin Input, and Update an Input Based on Another Input 🔗🔗
Create an App to Deal with Boredom Using PyWebIO 🔗🔗
Build a Robust Workflow to Visualize Trending GitHub Repositories in Python 🔗🔗

Share Insights

Title | Article | Repository
Introduction to Datapane: A Python Library to Build Interactive Reports 🔗
Datapane’s New Features: Create a Beautiful Dashboard in Python in a Few Lines of Code 🔗🔗
Introduction to Datasette: Explore and Publish Your Data in One Line of Code 🔗
How to Share your Python Objects Across Different Environments in One Line of Code 🔗🔗
How to Share your Jupyter Notebook in 3 Lines of Code with Ngrok 🔗
Introduction to Deepnote: Real-time Collaboration on Jupyter Notebook 🔗

Cool Tools

Title | Article | Repository
Simulate Real-life Events in Python Using SimPy 🔗🔗
How to Create Mathematical Animations like 3Blue1Brown Using Python 🔗🔗

Learning Tips

Title | Article | Repository
How to Learn Data Science when Life does not Give You a Break 🔗
How to Accelerate your Data Science Career by Putting yourself in the Right Environment 🔗
To become a Better Data Scientist, you need to Think like a Programmer 🔗
How not to be Overwhelmed with Data Science 🔗

Productive Tips

Title | Article | Repository
How to Organize your Data Science Articles with Github 🔗🔗
5 Reasons why you should Switch from Jupyter Notebook to Scripts 🔗
7 Reasons Why you Should Start Documenting your Code 🔗

VSCode

Title | Article | Repository
How to Leverage Visual Studio Code for your Data Science Projects 🔗
Top 4 Code Viewers for Data Scientists in VSCode 🔗
Incorporate the Best Practices for Python with These Top 4 VSCode Extensions 🔗
Boost Your Efficiency with Customized Code Snippets on VSCode 🔗
Top 9 Keyboard Shortcuts in VSCode for Data Scientists 🔗

Book Review

Title | Article | Repository
Python Machine Learning: A Comprehensive Handbook for Machine Learning 🔗

Data Science Portfolio

Title | Article | Repository
How to Create an Elegant Website for your Data Science Portfolio in 10 minutes 🔗
Build an Impressive Github Profile in 3 Steps 🔗