Blog | Data Science Portfolio

AI Ethics

Jan 24, 2022

Model Cards
A model card is a short document that provides key information about a machine learning model. Model cards increase transparency by communicating information about trained models to broad audiences. Model cards: Though AI systems are playing increasingly important roles in every industry, few people understand how these systems work. AI...

Jan 23, 2022

AI Fairness
There are many different ways of defining what we might look for in a fair machine learning (ML) model. For instance, say we’re working with a model that approves (or denies) credit card applications. Is it fair if the approval rate is equal across genders, or is it better if...
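The first criterion above, equal approval rates across groups, is usually called demographic parity. A minimal sketch of checking it, assuming binary approve/deny decisions and a single group attribute; the function names and data are hypothetical:

```python
import numpy as np

def approval_rates(approved, group):
    """Return {group value: fraction of applications approved}."""
    approved = np.asarray(approved, dtype=float)
    group = np.asarray(group)
    return {str(g): float(approved[group == g].mean()) for g in np.unique(group)}

def demographic_parity_gap(approved, group):
    """Largest difference in approval rate between any two groups (0 means parity)."""
    rates = approval_rates(approved, group)
    return max(rates.values()) - min(rates.values())

# Hypothetical decisions: 1 = approved, 0 = denied
approved = [1, 0, 1, 1, 1, 0, 0, 1]
gender = ["F", "F", "F", "F", "M", "M", "M", "M"]
print(approval_rates(approved, gender))          # → {'F': 0.75, 'M': 0.5}
print(demographic_parity_gap(approved, gender))  # → 0.25
```

Demographic parity ignores whether applicants were actually creditworthy, which is exactly why the post weighs it against alternative definitions.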

Jan 20, 2022

Identifying Bias in AI
We can visually represent the different types of bias, which occur at different stages in the ML workflow. Note that these are not mutually exclusive; that is, an ML application can easily suffer from more than one type of bias. For example, as Rachel Thomas describes in a recent research talk,...

Jan 19, 2022

Human-Centered Design for AI
AI increasingly has an impact on everything from social media to healthcare. AI is used to make credit card decisions, to conduct video surveillance in airports, and to inform military operations. These technologies have the potential to harm or help the people that they serve. By applying an ethical lens,...

Algorithms

Dec 21, 2021

Useful Algorithms
I find new and interesting algorithms and forget about them all the time! So from now on I’m going to save them all here. DBSCAN: DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that identifies clusters by finding regions where points are densely packed together, in...
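A compact sketch of the DBSCAN idea, assuming Euclidean distance and the convention that a point counts as its own neighbor. The parameter names `eps` (neighborhood radius) and `min_samples` (neighbors needed to be a core point) mirror scikit-learn's, but this hypothetical function is not scikit-learn's implementation:

```python
import numpy as np

def dbscan(X, eps, min_samples):
    """Minimal DBSCAN. Returns an array of cluster labels; -1 marks noise."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # All pairwise distances: O(n^2) memory, fine for small examples
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    neighbors = [np.flatnonzero(dists[i] <= eps) for i in range(n)]
    is_core = np.array([len(nb) >= min_samples for nb in neighbors])
    labels = np.full(n, -1)
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or not is_core[i]:
            continue
        # Grow a new cluster outward from this unlabeled core point
        labels[i] = cluster
        stack = [i]
        while stack:
            j = stack.pop()
            if not is_core[j]:
                continue  # border points join a cluster but do not expand it
            for k in neighbors[j]:
                if labels[k] == -1:
                    labels[k] = cluster
                    stack.append(k)
        cluster += 1
    return labels

# Hypothetical data: two tight groups and one isolated point
X = [[0, 0], [0, 0.1], [0.1, 0], [5, 5], [5, 5.1], [5.1, 5], [10, 10]]
print(dbscan(X, eps=0.5, min_samples=3))  # → [ 0  0  0  1  1  1 -1]
```

Unlike k-means, DBSCAN needs no cluster count up front and leaves isolated points unassigned as noise.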

Dec 1, 2021

Belief Propagation
Each row of the data contains a fully observed coffee machine, with the state of every random variable. The random variables are all binary, with \( \texttt{False} \) represented by 0 and \( \texttt{True} \) represented by 1. The variables are: Failures (we’re trying to detect these): he - No...
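The post's coffee-machine network is not reproduced here. As a hedged illustration of the core step, the sketch below runs sum-product belief propagation on a hypothetical three-variable chain of binary variables (0 = False, 1 = True). It checks the resulting marginal against brute-force enumeration, since BP is exact on tree-structured graphs:

```python
import numpy as np
from itertools import product

# Hypothetical unary potentials for a chain A - B - C
phi_A = np.array([0.7, 0.3])
phi_B = np.array([0.5, 0.5])
phi_C = np.array([0.2, 0.8])
# Hypothetical pairwise potentials that favor neighbors agreeing: psi[left, right]
psi_AB = np.array([[0.9, 0.1], [0.1, 0.9]])
psi_BC = np.array([[0.8, 0.2], [0.2, 0.8]])

# Sum-product messages into B
msg_A_to_B = phi_A @ psi_AB      # sum over a of phi_A(a) * psi_AB(a, b)
msg_C_to_B = psi_BC @ phi_C      # sum over c of psi_BC(b, c) * phi_C(c)
belief_B = phi_B * msg_A_to_B * msg_C_to_B
belief_B /= belief_B.sum()       # normalize to get P(B)

# Brute-force check: enumerate all 2^3 joint states
brute_B = np.zeros(2)
for a, b, c in product([0, 1], repeat=3):
    brute_B[b] += phi_A[a] * phi_B[b] * phi_C[c] * psi_AB[a, b] * psi_BC[b, c]
brute_B /= brute_B.sum()

print(belief_B, brute_B)  # the two marginals agree
```

Enumeration grows as 2^n, while the messages stay local, which is what makes BP practical on larger networks.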

Oct 15, 2021

Minimax
The minimax algorithm is attributed to John von Neumann (1928, Zur Theorie der Gesellschaftsspiele), but its key features were described earlier by Émile Borel (1921, La théorie du jeu et les équations intégrales à noyau symétrique). It has also been suggested that Charles Babbage may have known about the algorithm....
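A minimal recursive sketch of minimax over an explicit game tree. It assumes the players alternate turns and that each leaf holds the payoff for the maximizing player. The tree is a made-up example:

```python
def minimax(node, maximizing):
    """Return the minimax value of a game tree.

    A leaf is a number (the payoff for the maximizing player);
    an internal node is a list of child nodes.
    """
    if not isinstance(node, list):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)

# Hypothetical two-ply tree: MAX picks a branch, then MIN picks a leaf
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
print(minimax(tree, maximizing=True))  # → 3
```

MIN would answer the three branches with 3, 2 and 2, so MAX's best guaranteed outcome is 3, even though larger leaves exist.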

Astrophysics

Nov 27, 2021

Radio Interferometer Simulation
The VLA hosts 27 antennas, each comprising a 25-meter dish that houses 8 receivers and weighs 209 metric tonnes. The dishes sit on altitude-azimuth mounts and can be moved along the three arms of a Y-shaped track. Using the specially designed lifting train...
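An interferometer correlates the signals from every pair of antennas, and each pair is called a baseline. N antennas therefore give N(N - 1)/2 baselines. A quick check for the 27 VLA antennas mentioned above:

```python
from itertools import combinations

n_antennas = 27  # from the text: the VLA hosts 27 antennas
baselines = list(combinations(range(n_antennas), 2))  # every unordered antenna pair
print(len(baselines))  # → 351
```

Each baseline samples a different spatial frequency of the sky. This is why moving the dishes along the Y-shaped track changes the resolution of the array.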

Nov 10, 2021

Discover Pi Mensae
Data from the TESS mission are available from the data archive at MAST. This tutorial demonstrates how the Lightkurve Python package can be used to read in these data and create your own TESS light curves with different aperture masks. Below is a quick tutorial on how to get started...
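The effect of an aperture mask can be illustrated without downloading data. Below is a hedged NumPy sketch of simple aperture photometry on hypothetical target-pixel data: a light curve is the per-cadence sum of the pixels inside a boolean mask. With real TESS files, Lightkurve's `TargetPixelFile.to_lightcurve(aperture_mask=...)` performs this step; the arrays here are synthetic.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical target pixel data: 100 cadences of a 5x5 pixel cutout
flux = rng.normal(10.0, 0.5, size=(100, 5, 5))  # background with noise
flux[:, 2, 2] += 1000.0                         # a star on the center pixel
flux[40:45, 2, 2] -= 5.0                        # a shallow, transit-like dip

def aperture_lightcurve(flux, mask):
    """Sum the pixels inside a boolean (rows, cols) mask for every cadence."""
    return flux[:, mask].sum(axis=1)

small_mask = np.zeros((5, 5), dtype=bool)
small_mask[2, 2] = True          # just the brightest pixel
large_mask = np.zeros((5, 5), dtype=bool)
large_mask[1:4, 1:4] = True      # a 3x3 box around the star

lc_small = aperture_lightcurve(flux, small_mask)
lc_large = aperture_lightcurve(flux, large_mask)
```

A larger aperture captures more of a real star's spread-out light but also adds background noise, so the mask choice trades signal against noise.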

Oct 14, 2021

Introduction to Exoplanets
Historical records show that the uniqueness of Earth and the composition of the solar system were long controversial. In the 3rd century BC, Epicurus (341-270 B.C.) stated that “There are infinite worlds both like and unlike this world of ours. For the atoms being infinite in...

Cheat Sheets

Feb 14, 2022

Supervised Learning
Supervised learning is the machine learning task of learning a function that maps an input to an output based on example input-output pairs. A supervised learning algorithm analyzes the training data and produces an inferred function, which can be used later for mapping new examples. The most popular supervised learning...
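A minimal sketch of "learning a function from input-output pairs", using ordinary least squares on hypothetical noise-free data generated from y = 2x + 1. The fitted parameters define the inferred function, which is then applied to a new input:

```python
import numpy as np

# Hypothetical training pairs from y = 2x + 1
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Append a column of ones so the model can learn an intercept
A = np.hstack([X_train, np.ones((len(X_train), 1))])
(w, b), *_ = np.linalg.lstsq(A, y_train, rcond=None)

def predict(x):
    """The inferred function: map a new input to a predicted output."""
    return w * x + b

print(round(w, 6), round(b, 6))  # → 2.0 1.0
print(round(predict(10.0), 6))   # → 21.0
```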

Feb 14, 2022

Clustering
In this article, you will find a complete clustering cheat sheet. In eleven minutes, you will learn what clustering is and refresh your memory of the main algorithms. Clustering (also called cluster analysis) is the task of grouping similar instances into clusters. More formally, clustering is...

Computer Vision

Dec 23, 2021

Data Augmentation
Now that you’ve learned the fundamentals of convolutional classifiers, you’re ready to move on to more advanced topics. In this lesson, you’ll learn a trick that can give a boost to your image classifiers: it’s called data augmentation. The Usefulness of Fake Data: The best way to improve the performance...
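A hedged NumPy sketch of the idea: each training image is randomly flipped and rotated to produce a "new" example with the same label. Keras also provides preprocessing layers such as `RandomFlip` that do this inside the model; the plain-NumPy function below is a hypothetical stand-in, and whether a flip preserves the label depends on the task:

```python
import numpy as np

def augment(image, rng):
    """Return a randomly flipped and rotated copy of an (H, W, C) image array."""
    if rng.random() < 0.5:
        image = image[:, ::-1]                   # horizontal flip (width axis)
    k = int(rng.integers(0, 4))                  # 0 to 3 quarter turns
    return np.rot90(image, k=k, axes=(0, 1))     # odd k swaps H and W

rng = np.random.default_rng(0)
image = np.arange(2 * 3 * 1).reshape(2, 3, 1)    # hypothetical tiny "image"
batch = [augment(image, rng) for _ in range(4)]  # four augmented variants
```

Flips and rotations only rearrange pixels. The model sees new views of the same content without any new data being collected.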

Dec 22, 2021

Custom Convnets
Now that you’ve seen the layers a convnet uses to extract features, it’s time to put them together and build a network of your own! Simple to Refined: In the last three lessons, we saw how convolutional networks perform feature extraction through three operations: filter, detect, and condense. A single...

Dec 20, 2021

The Sliding Window
In the previous two lessons, we learned about the three operations that carry out feature extraction from an image: filter with a convolution layer, detect with ReLU activation, and condense with a maximum pooling layer. The convolution and pooling operations share a common feature: they are both performed over a sliding...
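The size of a sliding-window output depends on the window size, the stride, and the padding. A small sketch, assuming the TensorFlow/Keras conventions: `'valid'` means no padding, and `'same'` pads so the output length is ceil(n / stride). The function name is hypothetical.

```python
import math

def output_size(n, kernel, stride=1, padding="valid"):
    """Output length along one spatial dimension of a sliding-window operation."""
    if padding == "valid":
        return (n - kernel) // stride + 1
    if padding == "same":
        return math.ceil(n / stride)
    raise ValueError(f"unknown padding: {padding!r}")

print(output_size(28, kernel=3))                            # → 26
print(output_size(28, kernel=2, stride=2))                  # → 14
print(output_size(28, kernel=3, stride=2, padding="same"))  # → 14
```

A stride of 2 roughly halves each spatial dimension, which is why pooling layers commonly use it.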

Dec 19, 2021

Maximum Pooling
Previously, we learned about how the first two operations in this process occur in a Conv2D layer with relu activation. Now we’ll look at the third (and final) operation in this sequence: condense with maximum pooling, which in Keras is done by a MaxPool2D layer. Condense with Maximum Pooling: Adding...
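To show what "condense" means numerically, here is a minimal NumPy sketch of 2x2 max pooling with stride 2 and no padding on a single-channel feature map. The array values are made up:

```python
import numpy as np

def max_pool_2d(x, size=2, stride=2):
    """2-D max pooling ('valid' padding) over a single-channel array."""
    h = (x.shape[0] - size) // stride + 1
    w = (x.shape[1] - size) // stride + 1
    out = np.empty((h, w), dtype=x.dtype)
    for i in range(h):
        for j in range(w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest activation
    return out

fmap = np.array([[1, 0, 2, 3],
                 [4, 6, 6, 8],
                 [3, 1, 1, 0],
                 [1, 2, 2, 4]])
print(max_pool_2d(fmap))
# [[6 8]
#  [3 4]]
```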

Dec 18, 2021

Convolution and ReLU
A convolutional classifier has two parts: a convolutional base and a head of dense layers. We learned that the job of the base is to extract visual features from an image, which the head would then use to classify the image. We’re going to learn about the two most important...
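A minimal NumPy sketch of the two operations covered next, filter (convolution) and detect (ReLU). As in deep-learning libraries, the "convolution" is computed as a cross-correlation without flipping the kernel. The image and the vertical-edge kernel are hypothetical:

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D convolution as deep-learning libraries compute it
    (cross-correlation: the kernel is not flipped)."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def relu(x):
    """Detect: keep positive responses, zero out the rest."""
    return np.maximum(x, 0)

# Hypothetical image: dark left half, bright right half
image = np.array([[0, 0, 1, 1]] * 4, dtype=float)
# Hypothetical vertical-edge kernel
kernel = np.array([[-1, 1],
                   [-1, 1]], dtype=float)
features = relu(conv2d(image, kernel))  # filter, then detect
print(features)  # the edge between columns 1 and 2 lights up
```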

Dec 17, 2021

The Convolutional Classifier
Have you ever wanted to teach a computer to see? Here, we will use modern deep-learning networks to build an image classifier with Keras, design our own custom convnet with reusable blocks, learn the fundamental ideas behind visual feature extraction, and master the art of transfer learning to boost our models....

DataCamp Projects

Feb 2, 2022

Nobel Prize Winners
1. The most Nobel of Prizes: The Nobel Prize is perhaps the world’s best-known scientific award. Besides the honor, prestige, and substantial prize money, the recipient also gets a gold medal showing Alfred Nobel (1833–1896), who established the prize. Every year it’s given to scientists...

Jan 30, 2022

Google Play Store Reviews
1. Google Play Store apps and reviews: Mobile apps are everywhere. They are easy to create and can be lucrative. Because of these two factors, more and more apps are being developed. In this notebook, we will do a comprehensive analysis of the Android app market by comparing over ten...

Deep Learning

Dec 16, 2021

Binary Classification
We’ve learned about how neural networks can solve regression problems. Now we’re going to apply neural networks to another common machine learning problem: classification. Almost everything we’ve learned up until now still applies. The main difference is in the loss function we use and in what kind of outputs we...
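For binary classification, the usual choice is a sigmoid output, which turns a raw score into a probability, paired with binary cross-entropy loss. A minimal NumPy sketch with hypothetical logits and labels:

```python
import numpy as np

def sigmoid(z):
    """Squash raw scores into probabilities in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def binary_crossentropy(y_true, p, eps=1e-7):
    """Mean binary cross-entropy; p are predicted probabilities of class 1."""
    p = np.clip(p, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

logits = np.array([2.0, -1.0, 0.0])  # hypothetical raw outputs of the last layer
y_true = np.array([1.0, 0.0, 1.0])
p = sigmoid(logits)
print(binary_crossentropy(y_true, p))
```

The loss is small when confident predictions are right and grows quickly when they are confidently wrong. A 0.5 prediction always costs log(2).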

Dec 15, 2021

Dropout and Batch Normalization
There’s more to the world of deep learning than just dense layers. There are dozens of kinds of layers you might add to a model. (Try browsing through the Keras docs for a sample!) Some are like dense layers and define connections between neurons, and others can do preprocessing or...
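Two layers that do not simply connect neurons are dropout and batch normalization. Below is a hedged NumPy sketch of both at training time. It assumes "inverted" dropout, which scales the surviving units by 1/(1 - rate) as Keras' `Dropout` layer does. It also assumes batch normalization uses the current batch's statistics; at inference Keras uses moving averages instead, which this sketch omits:

```python
import numpy as np

def dropout(x, rate, rng, training=True):
    """Inverted dropout: during training, zero each unit with probability `rate`
    and scale survivors by 1 / (1 - rate) so the expected activation is unchanged."""
    if not training or rate == 0:
        return x
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-3):
    """Standardize each feature over the batch, then rescale by gamma and shift by beta."""
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return gamma * (x - mean) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
activations = rng.normal(loc=5.0, scale=3.0, size=(64, 4))  # hypothetical layer outputs
dropped = dropout(activations, rate=0.3, rng=rng)
normalized = batch_norm(activations)
print(normalized.mean(axis=0).round(6), normalized.std(axis=0).round(3))
```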

Dec 14, 2021

Overfitting and Underfitting
Recall from the example in the previous lesson that Keras will keep a history of the training and validation loss over the epochs it trains the model. In this lesson, we’re going to learn how to interpret these learning curves and how we can use them to guide...
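One way learning curves guide training is early stopping: stop once the validation loss has not improved for a while. Keras' `EarlyStopping` callback does this with parameters such as `patience`, `min_delta` and `restore_best_weights`. The function below is a simplified, hypothetical re-implementation of that logic on a made-up validation curve:

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a list of per-epoch validation losses.

    Training stops once `patience` consecutive epochs fail to improve on the
    best loss by at least `min_delta`. Epochs are 0-indexed.
    """
    best, best_epoch, wait = float("inf"), 0, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

# Hypothetical validation curve: improves, then starts to overfit
history = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.67]
print(early_stopping_epoch(history, patience=3))  # → (6, 3)
```

Epoch 3 is where the validation curve bottoms out, so restoring the weights from that epoch keeps the model from the point before it began to overfit.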

Dec 13, 2021

Stochastic Gradient Descent
Previously, we learned how to build fully-connected networks out of stacks of dense layers. When first created, all of the network’s weights are set randomly – the network doesn’t “know” anything yet. In this lesson we’re going to see how to train a neural network; we’re going to see how...
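A minimal NumPy sketch of minibatch stochastic gradient descent. It trains a single linear unit on hypothetical data drawn from y = 3x - 2 plus noise, with mean squared error as the loss. The learning rate, batch size and epoch count are illustrative choices, not values from the lesson:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data from y = 3x - 2 plus a little noise
X = rng.uniform(-1, 1, size=200)
y = 3 * X - 2 + rng.normal(scale=0.1, size=200)

w, b = rng.normal(), rng.normal()  # random initial weights: the model "knows" nothing
learning_rate, batch_size = 0.1, 16

for epoch in range(50):
    order = rng.permutation(len(X))  # reshuffle the data every epoch
    for start in range(0, len(X), batch_size):
        idx = order[start:start + batch_size]
        error = (w * X[idx] + b) - y[idx]
        # Gradients of the mean squared error with respect to w and b
        grad_w = 2 * np.mean(error * X[idx])
        grad_b = 2 * np.mean(error)
        # Step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print(round(w, 2), round(b, 2))  # close to 3 and -2
```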

Dec 12, 2021

Deep Neural Networks
We’re going to see how we can build neural networks capable of learning the complex kinds of relationships deep neural nets are famous for. The key idea here is modularity, building up a complex network from simpler functional units. We’ve seen how a linear unit computes a linear function –...
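A minimal NumPy sketch of the modular idea: a dense layer is a set of linear units, and a deep network is a stack of such layers. ReLU activations sit between the layers; without them, the stack would collapse into a single linear function. The layer sizes and random weights are hypothetical:

```python
import numpy as np

def dense(x, W, b, activation=None):
    """A dense layer: every output unit is a linear unit over all inputs."""
    z = x @ W + b
    return np.maximum(z, 0) if activation == "relu" else z

rng = np.random.default_rng(0)
# Hypothetical stack: 4 inputs -> 8 -> 8 -> 1
shapes = [(4, 8), (8, 8), (8, 1)]
params = [(rng.normal(size=s), np.zeros(s[1])) for s in shapes]

def network(x):
    for i, (W, b) in enumerate(params):
        # Hidden layers use ReLU; the final layer stays linear (a regression output)
        act = "relu" if i < len(params) - 1 else None
        x = dense(x, W, b, act)
    return x

x = rng.normal(size=(5, 4))  # a batch of 5 examples
print(network(x).shape)      # → (5, 1)
```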

Dec 11, 2021

A Single Neuron
Using Keras and TensorFlow, we’ll learn how to create a fully-connected neural network architecture; apply neural nets to two classic ML problems, regression and classification; train neural nets with stochastic gradient descent; and improve performance with dropout, batch normalization, and other techniques. What is Deep Learning? Some of the most impressive...

Feature Engineering

Dec 6, 2021

Target Encoding
Most of the techniques we’ve seen in this course have been for numerical features. The technique we’ll look at in this lesson, target encoding, is instead meant for categorical features. It’s a method of encoding categories as numbers, like one-hot or label encoding, with the difference that it also uses...
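A hedged pandas sketch of target encoding with smoothing. It assumes the "m-estimate" form: encoding = w * category_mean + (1 - w) * overall_mean, with w = n / (n + m), where n is the number of rows in the category. This pulls rare categories toward the overall mean. The function name and data are hypothetical. In practice, the encoding should be fit on a separate split so the target does not leak into the features:

```python
import pandas as pd

def m_estimate_encode(df, column, target, m=5.0):
    """Replace each category with a smoothed mean of the target."""
    overall = df[target].mean()
    stats = df.groupby(column)[target].agg(["mean", "count"])
    weight = stats["count"] / (stats["count"] + m)
    encoding = weight * stats["mean"] + (1 - weight) * overall
    return df[column].map(encoding)

# Hypothetical data
df = pd.DataFrame({
    "make":  ["audi", "audi", "bmw", "bmw", "bmw", "saab"],
    "price": [30.0, 34.0, 40.0, 44.0, 42.0, 20.0],
})
df["make_encoded"] = m_estimate_encode(df, "make", "price", m=2.0)
print(df)
```

Here "saab" appears only once, so its raw mean of 20 is pulled most of the way toward the overall mean of 35, giving 30.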

Dec 5, 2021

Principal Component Analysis
In the previous lesson we looked at our first model-based method for feature engineering: clustering. In this lesson we look at our next: principal component analysis (PCA). Just like clustering is a partitioning of the dataset based on proximity, you could think of PCA as a partitioning of the variation...
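A minimal NumPy sketch of PCA as a partition of variation. It standardizes the features, takes the SVD, and reports each component's share of the total variance. The data are hypothetical: two nearly redundant features plus one independent feature:

```python
import numpy as np

def pca(X, n_components):
    """PCA via the SVD of standardized data.

    Returns (scores, loadings, explained_variance_ratio).
    """
    X = np.asarray(X, dtype=float)
    Xs = (X - X.mean(axis=0)) / X.std(axis=0)  # center and scale each feature
    U, S, Vt = np.linalg.svd(Xs, full_matrices=False)
    loadings = Vt[:n_components]               # each row is a principal axis
    scores = Xs @ loadings.T                   # the new PCA features
    explained = S**2 / np.sum(S**2)            # share of total variance per axis
    return scores, loadings, explained[:n_components]

rng = np.random.default_rng(0)
a = rng.normal(size=200)
X = np.column_stack([a, 2 * a + 0.05 * rng.normal(size=200), rng.normal(size=200)])
scores, loadings, ratio = pca(X, n_components=2)
print(ratio)  # roughly [0.67, 0.33]: the redundant pair collapses onto one component
```

The resulting score columns are uncorrelated, which is what makes them useful as new features.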

Dec 3, 2021

Clustering with K-Means
This lesson and the next make use of what are known as unsupervised learning algorithms. Unsupervised algorithms don’t make use of a target; instead, their purpose is to learn some property of the data, to represent the structure of the features in a certain way. In the context of feature...
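A minimal NumPy sketch of k-means (Lloyd's algorithm). It alternates between assigning points to their nearest centroid and moving each centroid to the mean of its points. The resulting labels can then be added as a new "cluster" feature. The data are hypothetical, and a real project would more likely use a library implementation:

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assign each point to its nearest centroid
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=-1)
        labels = d.argmin(axis=1)
        # Move each centroid to the mean of its points (keep it if a cluster empties)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
                        for j in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids

# Hypothetical two-feature data with two obvious groups
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]])
labels, centroids = kmeans(X, k=2)
print(labels)  # e.g. [0 0 0 1 1 1]; the label numbering depends on initialization
```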

Dec 2, 2021

Feature Engineering
The plan is to go through the feature engineering course, where I’ll learn how to determine which features are the most important with mutual information; invent new features in several real-world problem domains; encode high-cardinality categoricals with a target encoding; create segmentation features with k-means clustering; decompose a dataset’s variation into...

Dec 1, 2021

Mutual Information
First encountering a new dataset can sometimes feel overwhelming. You might be presented with hundreds or thousands of features without even a description to go by. Where do you even begin? A great first step is to construct a ranking with a feature utility metric, a function measuring associations between...
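Mutual information measures how much knowing one variable reduces uncertainty about another. For discrete features, it can be computed directly from counts. The sketch below does this with hypothetical data. scikit-learn's `mutual_info_regression` and `mutual_info_classif` also estimate it for continuous features, which this sketch does not handle:

```python
import math
from collections import Counter

def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px, py = Counter(x), Counter(y)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written in terms of raw counts
        mi += p_ab * math.log(count * n / (px[a] * py[b]))
    return mi

fuel = ["gas", "gas", "diesel", "diesel"]
price_dependent = ["low", "low", "high", "high"]    # fully determined by fuel
price_independent = ["low", "high", "low", "high"]  # unrelated to fuel
print(mutual_information(fuel, price_dependent))    # ln 2 ≈ 0.693
print(mutual_information(fuel, price_independent))  # → 0.0
```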

Nov 28, 2021

Creating Features
Once you’ve identified a set of features with some potential, it’s time to start developing them. In this lesson, you’ll learn a number of common transformations you can do entirely in Pandas. We’ll use four datasets in this lesson that have a range of feature types: US Traffic Accidents, 1985 Automobiles,...

Geospatial Analysis

Jan 12, 2022

Proximity Analysis
Introduction: You are part of a crisis response team, and you want to identify how hospitals have been responding to crash collisions in New York City. Before you get started, run the code cell below to set everything up:

```python
import math
import geopandas as gpd
import pandas as pd
from...
```

Jan 11, 2022

Interactive Maps
Your first interactive map: We begin by creating a relatively simple map with folium.Map().

```python
import pandas as pd
import geopandas as gpd
import math
import folium
from folium import Choropleth, Circle, Marker
from folium.plugins import HeatMap, MarkerCluster

def embed_map(m, file_name):
    from IPython.display import IFrame
    m.save(file_name)
    return IFrame(file_name, width='100%', height='500px')

#...
```

Dec 26, 2021

Coordinate Reference Systems
The maps you create in this course portray the surface of the earth in two dimensions. But, as you know, the world is actually a three-dimensional globe. So we have to use a method called a map projection to render it as a flat surface. Map projections can’t be 100%...

Dec 24, 2021

Your First Map
In this micro-course, you’ll learn about different methods to wrangle and visualize geospatial data, or data with a geographic location. Along the way, you’ll offer solutions to several real-world problems like: Where should a global non-profit expand its reach in remote areas of the Philippines? How do purple martins, a...

Machine Learning Explainability

Dec 10, 2021

Aggregate SHAP Values
Now we’ll expand on SHAP values, seeing how aggregating many SHAP values can give more detailed alternatives to permutation importance and partial dependence plots. SHAP Values Review: SHAP values show how much a given feature changed our prediction (compared to if we made that prediction at some baseline value of...

Dec 9, 2021

SHAP Values
You’ve seen (and used) techniques to extract general insights from a machine learning model. But what if you want to break down how the model works for an individual prediction? SHAP Values (an acronym from SHapley Additive exPlanations) break down a prediction to show the impact of each feature. Where...
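To make the definition concrete, the sketch below computes exact Shapley values by enumerating every feature coalition. A feature is "absent" when it takes its value from a single baseline input. The `shap` library uses much faster algorithms (for example `TreeExplainer` for tree models); this brute-force version only illustrates the idea, and the model and inputs are hypothetical:

```python
from itertools import combinations
from math import factorial

def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a single baseline input.

    v(S) evaluates f with features in S taken from x and the rest from baseline.
    Cost is exponential in the number of features, so use it only on tiny examples.
    """
    n = len(x)

    def v(S):
        z = [x[j] if j in S else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for S in combinations(others, size):
                total += weight * (v(set(S) | {i}) - v(set(S)))
        phi.append(total)
    return phi

# Hypothetical model with an interaction between features 1 and 2
f = lambda z: 2 * z[0] + z[1] * z[2]
x, base = [1.0, 2.0, 3.0], [0.0, 0.0, 0.0]
phi = shapley_values(f, x, base)
print(phi)
```

Feature 0 receives its additive contribution of 2. The interaction term (2 × 3 = 6) is split evenly, 3 each, between features 1 and 2. The values sum to f(x) - f(baseline) = 8, which is the additivity property that SHAP explanations rely on.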

Dec 8, 2021

Permutation Importance
One of the most basic questions we might ask of a model is: What features have the biggest impact on predictions? This concept is called feature importance. There are multiple ways to measure feature importance. Some approaches answer subtly different versions of the question above. Other approaches have documented shortcomings....
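Permutation importance answers the question by shuffling one feature at a time and measuring how much the model's score drops. A minimal NumPy sketch with a hypothetical "model" that only uses column 0. scikit-learn ships a full version as `sklearn.inspection.permutation_importance`.

```python
import numpy as np

def permutation_importance(predict, X, y, metric, n_repeats=5, seed=0):
    """Mean drop in score when each column of X is shuffled.

    `metric(y_true, y_pred)` must return a score where higher is better.
    """
    rng = np.random.default_rng(seed)
    baseline = metric(y, predict(X))
    importances = np.zeros(X.shape[1])
    for col in range(X.shape[1]):
        drops = []
        for _ in range(n_repeats):
            X_perm = X.copy()
            X_perm[:, col] = rng.permutation(X_perm[:, col])  # break the link to y
            drops.append(baseline - metric(y, predict(X_perm)))
        importances[col] = np.mean(drops)
    return importances

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 2))
y = 3 * X[:, 0]
predict = lambda X: 3 * X[:, 0]  # hypothetical fitted model
neg_mse = lambda y_true, y_pred: -np.mean((y_true - y_pred) ** 2)
print(permutation_importance(predict, X, y, neg_mse))  # column 1 scores exactly 0
```

The model never looks at column 1, so shuffling that column leaves its predictions unchanged.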

Dec 7, 2021

Model Insights
Many people say machine learning models are “black boxes”, in the sense that they can make good predictions but you can’t understand the logic behind those predictions. This statement is true in the sense that most data scientists don’t know how to extract insights from models yet. What features in...

Natural Language Processing

Nov 26, 2021

Plotting Library Catalog Subjects
Kaggle has a cool dataset of the Seattle Public Library’s catalog. Each item has a list of subjects. For example, the first entry has: Musicians Fiction, Bullfighters Fiction, Best friends Fiction, Friendship Fiction, Adventure and adventurers Fiction. Using what I learned in my NLP course last semester, I used one...

Nov 23, 2021

Sentiment Analysis of Reviews
Each review is labelled with either Pos or Neg to indicate whether the review has been assessed as positive or negative in the sentiment it expresses. You should treat these labels as a reliable indicator of sentiment. You can assume that there are no neutral reviews. There are 1,382 reviews...

Python Snippets

Dec 27, 2021

Python Setup
First, look at where it is installed:

```shell
ls -l /usr/local/bin/python*
```

Note the line that ends with python3.9 and has no suffix (such as the m in python3.9m), then type:

```shell
ln -s -f /usr/local/bin/python3.9 /usr/local/bin/python
```

where ‘/usr/local/bin/python3.9’ is the path you copied from above. Then, in a new session, type:

```shell
python --version
```

Download pip by running the...

Dec 6, 2021

Comprehensions
List comprehensions are a super cool feature in Python. As such, I’ve added a popular interview question to explain each part of the list comprehension constructor below. A standard list comprehension is of the form: newlist = [expression for item in iterable if condition] where the iterable can...
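A small example mapping each part of the template onto real code, with the equivalent explicit loop for comparison. Dict and set comprehensions follow the same pattern. The list of fruits is just sample data:

```python
fruits = ["apple", "banana", "cherry", "kiwi"]

# expression = fruit.upper(), item = fruit, iterable = fruits, condition = "a" in fruit
newlist = [fruit.upper() for fruit in fruits if "a" in fruit]

# The same result written as an explicit loop
loop_result = []
for fruit in fruits:
    if "a" in fruit:
        loop_result.append(fruit.upper())

# Dict and set comprehensions use the same structure
lengths = {fruit: len(fruit) for fruit in fruits}
first_letters = {fruit[0] for fruit in fruits}

print(newlist)  # → ['APPLE', 'BANANA']
```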

Statistics

Oct 13, 2021

Data Science Foundations
SQL for Data Science: SQL is essential not only for data science but for any data-related field. SQL allows us to work with databases. As a data scientist, you will frequently need to extract data from a database in order to perform your analysis, so this is an extremely useful...
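A self-contained way to practice extracting data with SQL from Python is the standard-library `sqlite3` module with an in-memory database. The `sales` table and its rows below are hypothetical:

```python
import sqlite3

# In-memory database with a hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",  # parameter placeholders, not string formatting
    [("north", 120.0), ("south", 80.0), ("north", 30.0)],
)

# Extract an aggregated result set for analysis
rows = conn.execute(
    "SELECT region, SUM(amount) AS total FROM sales GROUP BY region ORDER BY total DESC"
).fetchall()
print(rows)  # → [('north', 150.0), ('south', 80.0)]
conn.close()
```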