Data Science and Machine Learning Career
This repo is designed to give prospective analytical employees some additional information that might help with the job search. It takes inspiration from Conor Dewey, Academic, CIO, ZuZooVN, Maxim, and ryanswanstorm.
Platforms:
- Triplebyte - Take a quiz. Get offers from multiple top tech companies at once - includes a machine learning track.
- Toptal - Developers seeking to gain entry into the Toptal community are put through a battery of personality and technical tests.
- Hired - Hired matches employers with qualified candidates through a combination of in-house algorithms and online support.
- Kaggle - Take your competition skills to an employer.
- Direct - Contact companies directly (not recommended)
- AI Jobs - Jobs in AI and Big Data
- Analytics Jobs UK - Support analytics workers with useful career information
Reviews:
- Glassdoor - Best employee narratives.
- Indeed - Best coverage.
- Kununu - Best well-rounded information.
- Comparably - Best comparison functionality.
- InHerSight - Best female-friendly perspective.
- Paysa - Are you getting paid your market salary?
- Levels.fyi - Compare career levels across companies.
Respected Online Courses
- Deep Learning Specialization
- Mathematics for Machine Learning
- Machine Learning and Deep Learning Fast.ai
- Machine Learning with TensorFlow on Google Cloud Platform
- TensorFlow in Practice Specialization
Competitions
- Kaggle
- DrivenData
- DataHack
- Machine Hack
- The Data Science Game
- CrowdANALYTIX
- InnoCentive
- TunedIT
- HackerEarth
- Solutions
Respected Packages (From 300 listings)
- Python
- R
- SQL
- Apache Spark
- Hadoop
- Java
- Scala
- TensorFlow, Keras
- Pandas
- NumPy
- SAS
- MongoDB
- Tableau
- Power BI
- Hive
Respected Skill Tags (From 300 listings)
- Machine Learning
- Statistics
- Applied Mathematics
- Big Data
- Deep Learning
- Data Visualisation
- Data Analysis
- NLP
- ETL
- Computer Vision
Respected Bootcamps
Name | Switchup Rating | Cost | Locations |
---|---|---|---|
NYC Data Science Academy | 4.87 | $17,600 | New York City and online |
Dataquest | 4.92 | $29 for a basic monthly subscription; $49 for a premium monthly subscription | Online |
RMOTR | 4.91 | $349 per month; one-week free trial available | Online |
Springboard | 4.73 | $499 per month | Online |
General Assembly | 3.98 | $3,950 for the part-time online courses; $15,950 for the in-person full-time immersive bootcamp program | Dallas, Providence, San Diego, San Francisco, Seattle, New York City, Washington (D.C.), Austin, Los Angeles, Atlanta, Denver, Chicago, London, Singapore, Hong Kong, Sydney, Melbourne, Boston, Santa Monica and online |
Metis | 4.91 | $750 per course | Chicago, New York City, San Francisco, Seattle, Singapore and online |
Data Science Dojo | 4.91 | Packages range from $3,799 to $4,499 with the option for flexible payment plans | Seattle, Washington (D.C.), Austin, Chicago, New York City, Toronto, Barcelona, Bucharest, Las Vegas, Singapore, Dubai, Amsterdam, Pretoria and Bangalore |
Thinkful | 4.89 | $16,000 for the full-time course; $9,500 for the flexible six-month course | Washington (D.C.), Philadelphia, Houston, Portland, Dallas, Los Angeles, Phoenix, San Diego, Atlanta, Miami, Tampa, Chicago, Raleigh-Durham, Denver, Boston, San Francisco, Detroit, Salt Lake City, Seattle, Minneapolis, Austin and online |
DataCamp | 4.61 | $25 per month | Online |
The Dev Masters | 4.97 | $4,995 for project-based learning; $6,995 for the mastering applied data science program; $3,500 for the data science for professionals program. | Los Angeles, Orange County and Santa Monica |
Ubiqum Code Academy | 4.85 | $9,000 | Amsterdam, Barcelona, Berlin and Madrid |
Level | 4.52 | $4,495 for the introductory data analytics course; $7,995 for the intermediate data analytics program | Boston, Charlotte, San Francisco, San Jose, Seattle, Toronto and online |
The Data Incubator | 4.52 | Free for accepted fellows | Boston, New York City, San Francisco, Washington (D.C.) and online |
Jedha | 5.0 | $3,595 for the full stack data science program; $995 for the fundamentals in data science | Lyon and Paris |
Science to Data Science | 4.83 | £800 registration fee, after that the course is free if you are accepted | London and online |
Podcasts
- Podcasts for Beginners:
- "More" advanced podcasts
- Podcasts to think outside the box:
Popular Careers
- Data Scientist
- Data Analyst
- Data Architect
- Data Engineer
- Business Analyst
- Marketing Analyst
- Data Manager
- Business Intelligence Analyst
- Machine Learning Engineer
- Statistician
Groups
LinkedIn Groups
- Data Mining, Statistics, Big Data, Data Visualization, and Data Science
- Artificial Intelligence, Deep Learning, Machine Learning
- Big Data, Analytics, Business Intelligence & Visualization Experts Community
- KDnuggets Machine Learning, Data Science, Data Mining, Big Data, AI
- Cloud Computing, SaaS & Virtualization
- Data Warehouse — Big Data — Hadoop — Cloud — Data Science — ETL
- Artificial Intelligence, Deep Learning and IoT
- SQL Server Business Intelligence(BI)
- Internet of Things
- Bank and Finance Technology — FinTech Banking Systems Financial Executives
- Cloud Computing
- Python Community
- Python Data Science and Machine Learning
Reddit:
- Dataengineering
- Dataisbeautiful
- Datasets
- Datascienceproject
- Learndatascience
- Learnprogramming
- Learnpython
- Machinelearning
- Learnmachinelearning
- Python
- Computervision
- Businessintelligence
- programming
- Scala
- AWS
- bigdata
- SQL
Main Industry Companies
- Artificial Intelligence
- Amazon
- Microsoft
- IBM
- Salesforce
- Intel
- OpenAI
- Biotechnology
- AbbVie
- Aduro Biotech
- Genentech
- Illumina
- Jounce Therapeutics
- Merck & Company
- Somalogic
- Finance
- JP Morgan
- Barclays
- Goldman Sachs
- ING
- Two Sigma
- Renaissance
- Citadel
- AQR
- Bridgewater
- DE Shaw
- Blackstone
- Bain Capital
- Health Care
- Berg
- CHAMPS Oncology
- DarkMatter2db
- Health Catalyst
- Kairoi Health
- Insurance
- Allianz SE
- UnitedHealth
- Anthem
- Humana
- Centene Corporation
- Logistics
- Amazon
- Walmart
- Tesla
- Convoy
- Flexport
- FedEx
- CargoX
- 6 River Systems
- Nuro
- Marketing and Advertising
- Amazon
- Asos
- Alibaba
- InsideSales.com
- Conversica
Thought Leaders
I don't know the other areas that well; send me your thought leaders by pull request.
- Artificial Intelligence
- Biotechnology
- Finance
- Marcos Lopez de Prado
- Igor Halperin
- Gordon Ritter
- Paul Bilokon
- Saeed Amen
- Apoorv Saxena
- Charles Elkan
- Mike Schuster
- Health Care
- Insurance
- Law Enforcement
- Logistics
- Marketing and Advertising
- Sports
Highest Paying Data Science Jobs
Communities
- Quora
- Reddit
Conferences
- Neural Information Processing Systems (NeurIPS, formerly NIPS)
- International Conference on Learning Representations (ICLR)
- Association for the Advancement of Artificial Intelligence (AAAI)
- IEEE Conference on Computational Intelligence and Games (CIG)
- IEEE International Conference on Machine Learning and Applications (ICMLA)
- International Conference on Machine Learning (ICML)
- International Joint Conferences on Artificial Intelligence (IJCAI)
- Association for Computational Linguistics (ACL)
Journals, Publications and Magazines
- ICML - International Conference on Machine Learning
- epjdatascience
- Journal of Data Science - an international journal devoted to applications of statistical methods at large
- Big Data Research
- Journal of Big Data
- Big Data & Society
- Data Science Journal
- datatau.com/news - Like Hacker News, but for data
- Data Science Trello Board
- Medium Data Science Topic - Data Science related publications on medium
Colleges
- All Data Science Colleges
- Data Science Degree @ Berkeley
- Data Science Degree @ UVA
- Data Science Degree @ Wisconsin
- Master of Information @ Rutgers
- MS in Computer Information Systems @ Boston University
- MS in Business Analytics @ ASU Online
- Data Science Engineer @ BTH
- MS in Applied Data Science @ Syracuse
- M.S. Management & Data Science @ Leuphana
- Master of Data Science @ Melbourne University
- MSc in Data Science @ The University of Edinburgh
- Master of Management Analytics @ Queen's University
- Master of Data Science @ Illinois Institute of Technology
Newsletters:
Data Science Weekly is definitely a fan-favorite, and for good reason. The newsletter started in 2013 and has pumped out 276 consistent issues since. It starts off with an Editor Picks section and quickly moves onto listing a bunch of data science articles and videos. Furthermore, it includes a section for job openings, tutorials, and books as well. Sent every Thursday, this one is well worth your time. Check out a recent issue.
You have probably heard of O’Reilly Media in one way or another. Personally, I have a collection of their books sitting on my desk at all times. They also publish ebooks, host conferences, and offer other learning solutions. Their data-focused newsletter delivers 10 links each week that range from news to tutorials to white-papers.
Data Elixir takes a similar approach, breaking things down into a wide-ranging collection of weekly news, insights, tools & techniques, resources, and data visualization. The newsletter goes out to over 29,000 subscribers and is delivered every Tuesday. Check out a recent issue.
Data Machina is a more technical newsletter that breaks down links by technology, hitting on topics from R to blockchain to algorithms. There’s really a little bit of everything here. I subscribe to the free version, sent every two weeks, but it looks like you can pay to receive the newsletter every week if you would like. Check out a recent issue.
Mode offers a number of enterprise data solutions, but they also put out a pretty good data newsletter every week. They primarily focus on articles that catch their eye around the community but also include a section for featured data jobs as well. Check out a recent issue.
As you might have guessed, Machine Learnings focuses on ML and AI news primarily. I particularly enjoy the Awesome and Not Awesome sections that give bite-sized news if you’re in a rush. Others seem to like it as well, as the newsletter boasts 40,000+ subscribers. Check out a recent issue.
Another newsletter that has been around for some time, The Data Science Roundup has 177 published issues and over 7,000 subscribers to date. This newsletter takes a more concise approach, offering 5 or so links each week with an insightful reflection written on each article. Check out a recent issue.
Not a data science newsletter per se, but a valuable resource nonetheless. Like most people in tech, I love Hacker News. However, I had a hard time keeping up with it, until I found this. You can dictate the frequency and amount of links that are sent to you based on the number of upvotes on each post.
This newsletter contains any recent blog posts, interviews, or news regarding Kaggle, everyone’s favorite machine learning competition site. It also includes links, resources, meetups, and job openings around the community. I couldn’t find a subscribe link for this one; I’m pretty sure Kaggle automatically subscribes you when you make an account.
Stratechery’s Daily Update is a little different from the others in that it’s a paid, daily membership. Not a traditional data science newsletter, these reports focus on tech strategy think-pieces. It’ll run you around $10/month, a little less if you pay yearly. This is one of the few places where I gladly pay for written content, Medium being the other. There are also plenty of free essays available on the site. Check out this post and others to get a feel for it.
Import AI leans heavily on technical machine learning and AI resources, often white-papers and recent research results. The issues also include an impressive amount of analysis. Even if none of that is your thing, make sure to read the Tech Tales section at the end for an always-interesting futuristic story. Check out a recent issue.
Similar to Import AI, this newsletter covers technical machine learning and AI tutorials, projects, research papers, and news. Delivered a bit sporadically, The Wild Week in AI has over 17,000 subscribers if that’s any indication of the content. Check out a recent issue.
Data Is Plural is delivered weekly, focusing solely on interesting datasets for you to explore or use in your next side-project. There’s also a pretty awesome Google Doc that serves as the archive for all these datasets dating back to 2015. Check out a recent issue.
Last but not least, the team at Towards Data Science puts out both weekly and monthly digests of the most popular posts on the publication. You can receive these emails by accepting Letters from TDS if you go to the dropdown found on their homepage. Check out a recent issue.
Project Inspiration:
Data is Beautiful - I could spend hours just browsing this subreddit of data visualizations. You’ll be interested in all of the unique ideas and questions that people think up. There’s also a monthly challenge where a dataset is chosen, and users are tasked with visualizing it in the most effective way possible. Sort by best all time for instant gratification.
Kaggle - I would be remiss if I didn’t mention the poster child of online data science. There are a couple of ways to use Kaggle effectively for inspiration. First, you can look at the trending datasets and think of interesting ways to leverage the information. If you’re more interested in machine learning and the examples themselves, the kernels feature has gotten better and better over time.
The Pudding - It really is true that visual essays are an emerging form of journalism. The Pudding embodies this movement like none other. The team uses original datasets, primary research, and interactivity in order to explore tons of interesting topics.
FiveThirtyEight - A classic, but still good to this day. I mean come on, Nate Silver is the man. The data-driven blog touches on everything from politics to sports to culture. Not to mention, they just revamped their much improved data export page.
Towards Data Science - Lastly, I’ve got to give a shoutout to the TDS Team for bringing together this community of smart people with a passion for achieving things and helping others grow in data science. Browsing recent stories will bring you more than a few interesting project ideas on any given day.
Technical Prep
- HackerRank (http://hackerrank.com/)
- CodeChef (http://codechef.com/)
- HackerEarth (http://hackerearth.com/)
- LeetCode (http://leetcode.com/)
- Topcoder (http://topcoder.com/)
- Kaggle (http://kaggle.com/)
- ChallengePost (http://challengepost.com/)
- CodeForces (http://codeforces.com/)
- Brilliant (http://brilliant.org/)
- SPOJ (http://www.spoj.com/)
- Project Euler (https://projecteuler.net/)
- CodingBat (http://codingbat.com/)
- Codewars (http://www.codewars.com/)
- Codility (https://codility.com/)
- Codingame (https://www.codingame.com/)
- CoderByte (https://coderbyte.com/)
- CodeEval (https://www.codeeval.com/)
- UVA Online Judge (https://uva.onlinejudge.org/)
- CodeFights (https://codefights.com/)
- CheckiO (http://www.checkio.org/)
- Talentbuddy (http://talentbuddy.co/)
- PythonChallenge (http://pythonchallenge.com/)
- LintCode (http://www.lintcode.com/en/)
- Rosalind (http://rosalind.info/problems/locations/)
- CrowdANALYTIX (https://www.crowdanalytix.com/)
- SQL-EX.RU (http://sql-ex.ru/)
- Kattis (http://www.kattis.com/)
- CodeKata (http://codekata.com/)
- CodeAbbey (http://codeabbey.com/)
- FightCode (http://fightcodegame.com/)
- BeatMyCode (http://www.beatmycode.com/)
- TunedIT (http://tunedit.org/)
- MLComp (http://mlcomp.org/)
- HPC University (http://hpcuniversity.org/students/weeklyChallenge/)
- Practice It (https://practiceit.cs.washington.edu/)
Interviews
General
- Interview Q&A bank
- Tech Interview Handbook
- Best Data Science Courses Online
- What it’s like to be on the data science job market
- Learn Data Science on Quora
- Tips for Data Science Interviews on Quora
- How do I prepare for a phone interview with Airbnb?
- Emily Robinson Advice Applying to Data Science Jobs
- Two Sides of Getting a Job as a Data Scientist
- Robert Chang Doing Data Science at Twitter
- Questions I’m Asking in Interviews
- Creating a Great Data Science Resume
- Data Science Interview Guide
- 3 Types of Data Science Interview Questions Joma Tech
- How to Land a Data Scientist Position at Airbnb
- Red Flags In Data Science Interviews
- Advice Building out a Portfolio
- YouTube William Chen Resume/Portfolio Tips
- How to Prepare for Data Science Interviews Quora Answers
- 120 Data Science Questions Answers
- Analytics Vidhya Comprehensive Interview Resources
- Dataquest Data Science Career Guide
- Notes and technical questions from interviewing as a Data Scientist in 2018
- Mastering the Data Science Interview Loop
- Data Science Interview Questions for Top Tech Companies
- 66 Job Interview Questions for Data Scientists
- An Annotated List of Data Scientist Technical Interview Questions
Algorithmic Coding & Python
- Time complexity in Python
- Leetcode
- Stacks and queues in Python
- Preparing for Programming Interviews with Python
- Coding Interview University on GitHub
- Philip Guo Programming Interview Tips
- Google Python Style Guide
- Algorithms in Python GitHub
- Intro to Classes and Objects in Python
- Coding Interview GitHub Compilation
- Problem Solving with Data Structures & Algorithms in Python
- Python Leetcode Video Series Nasr Maswood
- Python Tricks and Tips
- Big List of Interviewee Interview Questions
Statistics and Probability
- Basics of Probability for Data Science
- William Chen Probability
- 40 Questions on Probability for Data Science Interviews
- Common Probability Distributions
- Probability and Statistics for DS Medium Series
Data Manipulation & SQL
- Mode Tutorial
- How to Ace Data Science Interviews: SQL
- How to Write Better Queries (Datacamp)
- Practice SQL problems
- 10 Frequently Asked SQL Questions
- 45 Essential SQL Interview Questions
- More SQL practice on GitHub
Data Analysis & Pandas
- Data School Video Series
- Intro to Pandas Data Structures
- Excel Tasks in Pandas
- Data Analyst Interview Practice Checklist Udacity
- More Pandas Exercises on GitHub
Machine Learning
- How To Prepare For A Machine Learning Interview
- 21 Must-Know Data Science Interview Questions and Answers
- Top 50 Machine learning Interview questions & Answers
- Machine Learning Engineer interview questions
- Popular Machine Learning Interview Questions
- 121 Essential Machine Learning Questions & Answers
- Machine Learning in Python GitHub Repo
- The Applied Machine Learning Process
- Springboard 41 Essential Machine Learning Questions
- Data School 15 Hours of Machine Learning Videos
- Difference between boosting and bagging
- Comprehensive Guide to Ensemble Learning Analytics Vidhya
- Kaggle Data Science Glossary
- Machine Learning Interview Checklist Udacity
- Google Machine Learning Glossary
- 100 Days of ML Code Infographics
- Machine Learning for Dummies Algorithm Overview
- ML Algorithm Pros and Cons
- Advantages of Different Classification Algorithms
- The MLInterview Repo
Product and Experimentation
- Experiments at Airbnb
- When Should A/B testing not be trusted?
- TutorialsPoint A/B testing questions
- Udacity A/B Testing Course
- Summary of Udacity A/B Testing Course
- Hubspot Frequently Asked Questions about A/B Testing
- Introduction to Churn in Python
- How Do You Set Metrics? - Julie Zhou
- Metrics vs. Experience - Julie Zhou
- How Not to Run an A/B Test
- 12 Guidelines for A/B Tests
- A/B Testing at Stack Overflow
- Type I vs. Type II Errors Simplified
- A/B Testing TutorialsPoint
- Case Study: Pay as You Go
- 27 Metrics Used at Pinterest
- 70 Resources to Get Started With A/B Testing
Big Data
- Apache Spark in Python: Beginner’s Guide (Datacamp)
- Apache Spark vs. Mapreduce Whiteboard Walkthrough
- Differences Between Hadoop and Spark
- What is Hadoop?: SQL Comparison
- Data Engineering Interactive Map
- How to Learn Apache Spark? Quora Post
- YouTube Intro to Big Data with PySpark
- Spark Documentation Screencasts
- PySpark Cheatsheet
Tech Interview Handbook
- How to prepare for coding interviews
- Interview Cheatsheet - Straight-to-the-point Do's and Don'ts
- Algorithm tips and the best practice questions categorized by topic
- "Front-end Job Interview Questions" answers
- Interview formats of the top tech companies
- Behavioral questions asked by the top tech companies
- Good questions to ask your interviewers at the end of the interviews
- Helpful resume tips to get your resume noticed and the Do's and Don'ts
Python
- 50 Python interview questions and answers
- 11 Essential Python Interview Questions from Toptal
- A listing of questions that could potentially be asked for a python job listing
- Interview Questions for both beginners and experts
- Interview Cake Python Interview Questions
- Python Frequently Asked Questions (Programming)
- Python interview questions collected by Reddit users
- Python Interview Questions from questionscompiled
- Top 25 Python Interview Questions from Career Guru
- Python Interview 10 questions from Corey Schafer
- Python interview questions. Part I. Junior
- Python interview questions. Part II. Middle
- Python interview questions. Part III. Senior
- Python Interview Questions and Answers (2019)
Scala
- 4 Interview Questions for Scala Developers
- A list of Frequently Asked Questions and their answers, sorted by category
- A list of helpful Scala related questions you can use to interview potential candidates
- How Scala Developers Are Being Interviewed
- Scala Interview Questions/Answers including Language Questions, Functional Programming Questions, Reactive Programming Questions
- Top 25 Scala Interview Questions & Answers from Toptal
MongoDB
- 28 MongoDB NoSQL Database Interview Questions and Answers
- MongoDB Frequently Asked Questions by expert members with experience in MongoDB. These questions and answers will help you strengthen your technical skills, prepare for the new job test and quickly revise the concepts
- MongoDB Interview Questions from JavaTPoint.com
- MongoDB Interview Questions that have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of MongoDB
- Top 20 MongoDB interview questions from Career Guru
MySQL
- 10 MySQL Database Interview Questions for Beginners and Intermediates
- 100 MySQL interview questions
- 15 Basic MySQL Interview Questions for Database Administrators
- 28 MySQL interview questions from JavaTPoint.com
- 40 Basic MySQL Interview Questions with Answers
- Top 50 MySQL Interview Questions & Answers from Career Guru
SQL
- 10 Frequently asked SQL Query Interview Questions
- 45 Essential SQL Interview Questions from Toptal
- Common Interview Questions and Answers
- General Interview Questions and Answers
- Schema, Questions & Solutions for SQL Exercising
- SQL Interview Questions that have been designed specially to get you acquainted with the nature of questions you may encounter during your interview for the subject of SQL
- SQL Interview Questions CHEAT SHEET
Business Analytics Companies - 2019 Glassdoor Rankings
"Best to work for"
Arcadia Data, FiveTran, InfluxData, Dataiku, Confluent, Redis Labs, StreamSets, Looker, Periscope Data, ThoughtSpot, Alation, Dremio, H2O.ai and SAP
"Great to work for"
Pivotal Software, Domo, Salesforce, SiSense, Google, Couchbase, Microsoft, DataStax, Actifio, MongoDB, Databricks, MemSQL, Informatica, Talend, Qubole
"Good to work for"
Tamr, VoltDB, Sumo Logic, Reltio, Trifacta, DataRobot, MarkLogic, Delphix, EnterpriseDB, Dell EMC, Tableau Software, Amazon Web Services, Paxata, Big Squid, Kyvos Insights, RapidMiner, TIBCO
"It is a job"
Qlik, IBM, SAS, Magnitude Software, Zaloni, Splunk, Information Builders, Hewlett Packard Enterprise, MicroStrategy, Cloudera, Oracle, Alteryx, Logi Analytics, GoodData, MapR Technologies, Syncsort, SnapLogic, Outlier, Zoomdata, Hitachi Vantara/Pentaho, Datameer
Most Data Scientists (per LinkedIn Recruiter)
- IBM
- Microsoft
- Accenture
- Amazon
- Tata Consultancy
- Cognizant
- Capgemini
- Infosys
- Oracle
Most Numerous Data Science Skills (per LinkedIn Recruiter)
- Data Analysis
- Python
- R
- Machine Learning
- Statistics
- Data Mining
- Big Data
- Deep Learning
- Data Visualisation
- NLP
Most Numerous Data Science Industries (per LinkedIn Recruiter)
- Information Technology and Services
- Computer Software
- Research
- Higher Education
- Financial Services
- Telecommunications
- Management Consulting
- Internet
- Banking
- Insurance
Resume
Sponsors
Firmai.org is a project that focuses on the aggregation of open source AI-BI applications. FirmAI envisions a future of open data access and the facilitation of small-medium enterprise automation.
Tired of technical phone screens? Take Triplebyte’s <a href="https://triplebyte.com/a/Nosq7GM/careerb">quiz</a> and go straight to final onsite interviews! Also check out Triplebyte’s <a href="https://triplebyte.com/a/Nosq7GM/careera?page=salary">Salary Tool!</a> They use real data from actual offers made to Triplebyte engineers. A few of the companies that use Triplebyte include Adobe, Robinhood, Box, Dropbox, Instacart, Evernote, Hipmunk, Grammarly & Palantir
r/datascienceproject is a subreddit where you can share all your data science projects. There are no restrictions on self-promotion. Let the best post rise to the top. One rule: it has to relate to a data science project.