Data Science Foundations
SQL for Data Science
SQL is essential not only for data science but for any data-related field, because it lets us work with relational databases. As a data scientist, you will frequently need to extract data from a database before you can analyze it, so this is an extremely useful skill to acquire.
This 4-hour video is a SQL course for beginners that covers database management and how to write SQL queries. You will learn SQL using MySQL, along with SQL theory, basic database terminology, SQL keywords, data definition language (DDL), data manipulation language (DML), data control language (DCL), transaction control language (TCL), and more.
Let’s look at this course’s curriculum in more detail (these are the parts I consider most important for data science):
- Core Concepts on Relational Databases (tables and keys)
- SQL Basics (DDL, DML, DCL, and Queries)
- How to Create Tables and Insert Data (CREATE, DROP, INSERT)
- Logical and comparison operators
- SQL Constraints (NOT NULL, UNIQUE, PRIMARY KEY, etc.)
- Aggregates (COUNT, SUM, MIN, MAX)
- Update and Delete (UPDATE, WHERE, DELETE)
- Basic Queries (SELECT … FROM, ORDER BY, LIMIT, etc.)
- Wildcards (%, _)
- Unions and Joins (inner join, left join, right join, etc.)
- Nested Queries
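To make the list above more concrete, here is a minimal sketch that touches most of those topics from Python. It assumes SQLite (through Python's built-in sqlite3 module) instead of the MySQL setup used in the course, so some syntax may differ slightly, and the customers and orders tables and their rows are made up for illustration.

```python
import sqlite3

# In-memory SQLite database (an assumption: the course itself uses MySQL).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL with constraints: PRIMARY KEY, NOT NULL, UNIQUE.
cur.execute("""
    CREATE TABLE customers (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    )
""")
cur.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        amount      REAL NOT NULL
    )
""")

# DML: INSERT, UPDATE, DELETE (hypothetical rows).
cur.executemany("INSERT INTO customers (id, name) VALUES (?, ?)",
                [(1, "Ana"), (2, "Ben"), (3, "Cara")])
cur.executemany("INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
                [(1, 20.0), (1, 35.0), (2, 15.0)])
cur.execute("UPDATE orders SET amount = 40.0 WHERE customer_id = 1 AND amount = 35.0")
cur.execute("DELETE FROM orders WHERE amount < 16")

# Wildcard: names ending in "a".
names_ending_in_a = [row[0] for row in cur.execute(
    "SELECT name FROM customers WHERE name LIKE '%a' ORDER BY name")]

# Aggregates over the remaining orders.
order_stats = cur.execute(
    "SELECT COUNT(*), SUM(amount), MIN(amount), MAX(amount) FROM orders").fetchone()

# Left join: every customer, with NULL (None in Python) where there is no order.
joined = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.amount
""").fetchall()

# Nested query: customers with at least one order above 30.
big_spenders = [row[0] for row in cur.execute("""
    SELECT name FROM customers
    WHERE id IN (SELECT customer_id FROM orders WHERE amount > 30)
""")]

print(names_ending_in_a, order_stats, joined, big_spenders)
conn.close()
```

The left join keeps Ben and Cara in the result even though they have no orders left, which is the main difference from an inner join.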
Learn Statistics & Probability for Data Science
- This 1-hour YouTube video covers descriptive statistics. You will learn about the different types of data; how to build histograms and scatterplots; how to find the mean, median, and mode; and what skewness, variance, and standard deviation are. You will also see some practical examples.
- This tutorial covers the normal distribution, one of the most important concepts in statistics. It is the foundation of inferential statistics (the inferences we make from data points are mostly based on the normal distribution).
- All the video tutorials in this unit cover confidence intervals. You will learn what a confidence interval is, how to estimate a population proportion, and how to estimate a population mean (this is where you will learn the t-statistic).
- Finally, all the video tutorials in this unit cover hypothesis testing. You will learn important concepts such as p-values, types of errors (Type I and Type II), and significance tests. You will also learn how to construct a test about a proportion.
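As a rough illustration of the topics above, the sketch below computes a few descriptive statistics, then builds a confidence interval and a significance test for a proportion using only Python's standard library. The sample values and the survey counts are made up, and both the interval and the test rely on the normal approximation (a z-test), which is an assumption that only holds reasonably well for large samples.

```python
import math
import statistics
from statistics import NormalDist

# Made-up sample, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)            # arithmetic average
median = statistics.median(sample)        # middle value of the sorted data
mode = statistics.mode(sample)            # most frequent value
sample_var = statistics.variance(sample)  # sample variance (divides by n - 1)
sample_sd = statistics.stdev(sample)      # sample standard deviation

# Hypothetical survey: 60 "yes" answers out of 100 respondents.
successes, n = 60, 100
p_hat = successes / n

# 95% confidence interval for the population proportion (normal approximation).
z_crit = NormalDist().inv_cdf(0.975)
margin = z_crit * math.sqrt(p_hat * (1 - p_hat) / n)
ci_low, ci_high = p_hat - margin, p_hat + margin

# Two-sided z-test of the null hypothesis H0: p = 0.5.
p0 = 0.5
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

print(f"mean={mean}, median={median}, mode={mode}, sd={sample_sd:.3f}")
print(f"95% CI for p: ({ci_low:.3f}, {ci_high:.3f})")
print(f"z={z:.2f}, p-value={p_value:.4f}")
```

Here the p-value comes out a little under 0.05, so at a 5% significance level you would reject H0. Rejecting a true H0 would be a Type I error, and failing to reject a false one would be a Type II error.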
Learn Probability for Data Science
- The first 8 sections in this unit cover the basics of probability. There you will learn the probability concepts you need for data science, such as basic theoretical probability, probability using sample spaces, basic set operations, experimental probability, randomness, the addition rule, the multiplication rule for independent and dependent events, and conditional probability.
- This unit covers counting, permutations, and combinations. You will first learn how to count outcomes using tree diagrams and flower plots. Then you will learn the concepts behind permutations and combinations and see some applications.
- Finally, in this playlist, you will find tutorials on many discrete distributions you should know as a data scientist. Learn at least the Bernoulli, binomial, Poisson, and uniform distributions (here’s an overview of some discrete distributions). In this second playlist, you will find some continuous probability distributions you should know too (normal, Student’s t, chi-squared, exponential, and logistic).
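A few of the ideas above can be checked quickly in Python. This is only a sketch with made-up examples (a deck of cards, two dice, and a fair coin), and the helper binomial_pmf is a hypothetical name written here for illustration, not something taken from the tutorials.

```python
import math
from fractions import Fraction

# Counting: ordered and unordered selections of 3 items out of 5.
permutations_5_3 = math.perm(5, 3)   # order matters
combinations_5_3 = math.comb(5, 3)   # order does not matter

# Multiplication rule for dependent events:
# probability of drawing two aces in a row without replacement.
p_two_aces = Fraction(4, 52) * Fraction(3, 51)

# Conditional probability from a sample space of two dice:
# P(first die is 6 | the sum is 8) = P(A and B) / P(B).
sample_space = [(a, b) for a in range(1, 7) for b in range(1, 7)]
event_b = [outcome for outcome in sample_space if sum(outcome) == 8]
event_a_and_b = [outcome for outcome in event_b if outcome[0] == 6]
p_a_given_b = Fraction(len(event_a_and_b), len(event_b))

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)

# Probability of exactly 2 heads in 4 flips of a fair coin.
p_two_heads = binomial_pmf(2, 4, 0.5)

print(permutations_5_3, combinations_5_3, p_two_aces, p_a_given_b, p_two_heads)
```

A Bernoulli distribution is the special case of the binomial with n = 1, so binomial_pmf(1, 1, p) simply returns p.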
Learn Mathematics for Data Science
Two subfields of mathematics are widely used in data science: calculus and linear algebra. They are the foundation for understanding the machine learning and deep learning models that you will implement with Python later.
- This first linear algebra unit covers vectors and spaces. You will learn vectors for linear algebra, linear combination and span, linear dependence and independence, subspaces and the basis for a subspace, vector dot and cross products, and matrices.
- The second linear algebra unit is all about matrix transformations. It covers functions and linear transformations, transformations and matrix multiplication, inverse functions and transformations, how to find inverses and determinants, and how to transpose a matrix. Most of this can be done in Python with a few lines of code, but it’s good to know the math behind it.
- The calculus 1 and calculus 2 units cover the fundamentals, such as derivatives, differential equations, integrals, series, and their applications.
Paul’s Online Notes is a fantastic resource too!
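To show the math behind the linear algebra operations mentioned above, here is a small pure-Python sketch of the dot product, cross product, transpose, matrix multiplication, and the determinant and inverse of a 2x2 matrix. All function names and example values are made up for illustration. In practice you would probably reach for a library such as NumPy (np.dot, np.linalg.det, np.linalg.inv) instead of writing these by hand.

```python
from fractions import Fraction

def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    """Cross product of two 3-dimensional vectors."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def transpose(m):
    """Swap the rows and columns of a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Matrix product: entry (i, j) is the dot product of row i of a and column j of b."""
    return [[dot(row, col) for col in zip(*b)] for row in a]

def det2(m):
    """Determinant of a 2x2 matrix [[a, b], [c, d]] is ad - bc."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def inverse2(m):
    """Inverse of a 2x2 matrix, using exact fractions to avoid rounding error."""
    d = det2(m)
    if d == 0:
        raise ValueError("matrix is singular and has no inverse")
    return [[Fraction(m[1][1], d), Fraction(-m[0][1], d)],
            [Fraction(-m[1][0], d), Fraction(m[0][0], d)]]

# Hypothetical example values.
m = [[4, 7], [2, 6]]
m_inv = inverse2(m)

print(dot([1, 2, 3], [4, 5, 6]))       # 1*4 + 2*5 + 3*6
print(cross((1, 0, 0), (0, 1, 0)))     # x cross y gives z
print(transpose([[1, 2, 3], [4, 5, 6]]))
print(det2(m), matmul(m, m_inv))       # m times its inverse is the identity
```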