<!--- # The Data Engineering Cookbook -->
<div align="center"> <img width="341" height="426" src="images/CookbookCover.jpg" alt="Data Engineering Cookbook"> <br> <br> <br> </div>
<p align="center"> <a href="sections/01-Introduction.md">What is this Book?</a> <a href="#how-to-contribute">How to Contribute</a> <a href="https://www.youtube.com/channel/UCY8mzqqGwl5_bTpBY9qLMAA">YouTube</a> <a href="https://twitter.com/andreaskayy">Twitter</a> <a href="https://www.amazon.com/shop/plumbersofdatascience">Amazon Shop</a> </p>
<br>
If You Like This Book & Need More Help
Check out my Data Engineering Academy at LearnDataEngineering.com, trusted by almost 2,000 students!
Visit learndataengineering.com: Click Here
- Learn Data Engineering with our online Academy
- Perfect for becoming a Data Engineer or adding Data Engineering to your skill set
- Proven process based on years of experience and hundreds of hours of personal coaching
- Over 30 prepared courses on the most important techniques, fundamental tools, and platforms, plus our Associate Data Engineer Certification
- Academy Discord server with over 1,000 members
Support This Book For Free!
- Amazon: Click Here to buy whatever you like from Amazon using this link* (also check out my complete podcast gear and books)
Here's what's new:
Find the change log with all recent updates here: SEE UPDATES
Contents:
- Introduction
- Basic Engineering Skills
- Advanced Engineering Skills
- Free Hands On Courses / Tutorials
- Case Studies
- Best Practices Cloud Platforms
- 130+ Free Data Sources For Data Science
- 1001 Interview Questions
- Recommended Books, Courses, and Podcasts
- Updates
Full Table Of Contents:
Introduction
- What is this Cookbook
- Data Engineers
- My Data Science Platform Blueprint
- Who Companies Need
- How to Learn Data Engineering
- Data Engineers Skills Matrix
- How to Become a Senior Data Engineer
Basic Engineering Skills
- Learn To Code
- Get Familiar With Git
- Agile Development
- Software Engineering Culture
- Learn how a Computer Works
- Data Network Transmission
- Security and Privacy
- Linux
- Docker
- The Cloud
- Security Zone Design
Advanced Engineering Skills
- Data Science Platform
- 81 Platform & Pipeline Design Questions
- Connect
- Buffer
- Processing Frameworks
- Lambda and Kappa Architecture
- Batch Processing
- Stream Processing
- Should You do Stream or Batch Processing
- Is ETL still relevant for Analytics?
- MapReduce
- Apache Spark
- What is the Difference to MapReduce?
- How Spark Fits to Hadoop
- Spark vs Hadoop
- Spark and Hadoop a Perfect Fit
- Spark on YARN
- My Simple Rule of Thumb
- Available Languages
- Spark Driver Executor and SparkContext
- Spark Batch vs Stream processing
- How Spark uses Data From Hadoop
- What are RDDs and How to Use Them
- SparkSQL How and Why to Use It
- What are Dataframes and How to Use Them
- Machine Learning on Spark (TensorFlow)
- MLlib
- Spark Setup
- Spark Resource Management
- AWS Lambda
- Apache Flink
- Elasticsearch
- Apache Drill
- StreamSets
- Store
- Visualize
- Machine Learning
- How to do Machine Learning in production
- Why machine learning in production is harder than you think
- Models Do Not Work Forever
- Where are The Platforms That Support Machine Learning
- Training Parameter Management
- How to Convince People That Machine Learning Works
- No Rules No Physical Models
- You Have The Data. Use It!
- Data is Stronger Than Opinions
- AWS Sagemaker
Hands On Course
- Free Data Engineering Course with AWS, TDengine, Docker and Grafana
- Monitor your data in dbt & detect quality issues with Elementary
- Solving Engineers' 4 Biggest Airflow Problems
- The best alternative to Airflow? Mage.ai
Case Studies
- Data Science @Airbnb
- Data Science @Amazon
- Data Science @Baidu
- Data Science @Blackrock
- Data Science @BMW
- Data Science @Booking.com
- Data Science @CERN
- Data Science @Disney
- Data Science @DLR
- Data Science @Drivetribe
- Data Science @Dropbox
- Data Science @eBay
- Data Science @Expedia
- Data Science @Facebook
- Data Science @Google
- Data Science @Grammarly
- Data Science @ING Fraud
- Data Science @Instagram
- Data Science @LinkedIn
- Data Science @Lyft
- Data Science @NASA
- Data Science @Netflix
- Data Science @OLX
- Data Science @OTTO
- Data Science @PayPal
- Data Science @Pinterest
- Data Science @Salesforce
- Data Science @Siemens Mindsphere
- Data Science @Slack
- Data Science @Spotify
- Data Science @Symantec
- Data Science @Tinder
- Data Science @Twitter
- Data Science @Uber
- Data Science @Upwork
- Data Science @Woot
- Data Science @Zalando
Best Practices Cloud Platforms
130+ Free Data Sources For Data Science
- General And Academic
- Content Marketing
- Crime
- Drugs
- Education
- Entertainment
- Environmental And Weather Data
- Financial And Economic Data
- Government And World
- Health
- Human Rights
- Labor And Employment Data
- Politics
- Retail
- Social
- Travel And Transportation
- Various Portals
- Source Articles and Blog Posts
- Free Data Sources For Data Science
1001 Interview Questions
Recommended Books, Courses, and Podcasts
How To Contribute
If you have some cool links or topics for the cookbook, please become a contributor.
Simply fork the repo, add your ideas, and create a pull request. You can also open an issue and put your thoughts there.
Please use the "Issues" function for comments.
Important Links
Subscribe to my YouTube channel for regular updates: Link to YouTube
I have a Medium publication where you can publish your data engineering articles to reach more people: Medium publication
<br> *(As an Amazon Associate I earn from qualifying purchases from Amazon. This is free of charge for you, but super helpful for supporting this channel.)