A good grasp of vectors is essential for mastering machine learning. This article will present a number of simple vector operations, culminating in an applied example. Vectors are prevalent in a number of fields; however, for the purposes of this article we will focus on their application in machine learning.
My hope is that after reading this article you will have an understanding of the following:
- What is a vector?
- Vector operations
- Vector lengths and distances
- An applied example in machine learning
Visualizations will be provided where possible, and Python code snippets to reproduce the results will also be provided throughout. If you don’t know Python, don’t worry! Import the following packages if you want to follow along:
import matplotlib.pyplot as plt
import numpy as np
plt.style.use('seaborn-whitegrid')
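# Note: on matplotlib >= 3.6 this style was renamed, so you may need:
# plt.style.use('seaborn-v0_8-whitegrid')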
What is a Vector?
A definition from lexico.com is given below:
“A quantity having direction as well as magnitude, especially as determining the position of one point in space relative to another.”
Let’s focus on the phrase “determining the position of one point in space relative to another”, which is quite straightforward to interpret geometrically. An example of a two-dimensional vector is given below:

$$v = \begin{pmatrix} x \\ y \end{pmatrix}$$

A two-dimensional vector has an x and a y component. Vectors are often written with an arrow accent to indicate that we are dealing with a vector, so the above becomes:

$$\vec{v} = \begin{pmatrix} x \\ y \end{pmatrix}$$
Note that the dimensionality of a vector relates to the number of elements it contains.
In order to visualize what a two-dimensional vector looks like, take the two examples below:

$$\vec{v}_1 = \begin{pmatrix} 1 \\ 2 \end{pmatrix}, \qquad \vec{v}_2 = \begin{pmatrix} 3 \\ 1 \end{pmatrix}$$
In Python:
v1 = np.array([1,2])
v2 = np.array([3,1])
So what does a vector look like geometrically?
When reading a two-dimensional vector, we should think of the first component as the x direction and the second component as the y direction. Therefore, for $\vec{v}_1$ we move 1 unit in the x direction and 2 units in the y direction.
For $\vec{v}_2$ we move 3 units in the x direction and 1 unit in the y direction.
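A sketch of how such a plot might be produced with matplotlib is given below; the axis limits and colours here are my own choices rather than anything prescribed above:

fig, ax = plt.subplots()
# Draw both vectors as arrows anchored at the origin.
ax.quiver([0, 0], [0, 0], [v1[0], v2[0]], [v1[1], v2[1]],
          angles='xy', scale_units='xy', scale=1, color=['blue', 'black'])
ax.set_xlim(0, 4)
ax.set_ylim(0, 3)
ax.set_xlabel('x')
ax.set_ylabel('y')
plt.show()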
Elementary Vector Operations
We can extend the notion of elementary operations in algebra to vectors. Vector addition, subtraction and multiplication by a number are fairly straightforward.
Vector Addition
When we add two vectors, we add their corresponding components, and the result is a third vector. Taking the vectors we have been working with so far and adding them gives the following:

$$\vec{v}_3 = \vec{v}_1 + \vec{v}_2 = \begin{pmatrix} 1 + 3 \\ 2 + 1 \end{pmatrix} = \begin{pmatrix} 4 \\ 3 \end{pmatrix}$$

In Python:
v3 = v1 + v2
v3
Out: array([4, 3])
Vector addition has a nice geometric interpretation, as shown below. When adding two vectors, just picture lifting the tail of one vector and placing it at the head of the other. Here $\vec{v}_3 = \vec{v}_1 + \vec{v}_2$ is represented by taking the tail of one vector (dashed blue line) and putting it at the head of the black vector, which leads to a third vector illustrated by the red arrow.
Notice above that we get the same answer regardless of whether we compute $\vec{v}_3 = \vec{v}_1 + \vec{v}_2$ or $\vec{v}_3 = \vec{v}_2 + \vec{v}_1$. This property holds for all vector addition; the fact that the order in which we add vectors doesn’t affect the final outcome is known as commutativity.
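Commutativity is easy to verify in NumPy for our example vectors:

np.array_equal(v1 + v2, v2 + v1)
Out: True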
Vector Subtraction
Vector subtraction works in a similar way, in that we subtract the corresponding x and y components of the two vectors. As with vector addition, the result of this operation is a new vector, which we will again call $\vec{v}_3$.
Subtracting $\vec{v}_2$ from $\vec{v}_1$:

$$\vec{v}_3 = \vec{v}_1 - \vec{v}_2 = \begin{pmatrix} 1 - 3 \\ 2 - 1 \end{pmatrix} = \begin{pmatrix} -2 \\ 1 \end{pmatrix}$$
v1 = np.array([1,2])
v2 = np.array([3,1])
v3 = v1 - v2
v3
Out: array([-2, 1])
Subtracting $\vec{v}_1$ from $\vec{v}_2$:

$$\vec{v}_3 = \vec{v}_2 - \vec{v}_1 = \begin{pmatrix} 3 - 1 \\ 1 - 2 \end{pmatrix} = \begin{pmatrix} 2 \\ -1 \end{pmatrix}$$
v3 = v2-v1
v3
Out: array([ 2, -1])
A geometric representation of vector subtraction is given below. The rule for vector subtraction is that we flip the direction of the vector we are subtracting and place its tail at the head of the first vector.
Notice that when we subtract $\vec{v}_2$ from $\vec{v}_1$, as shown in the left plot, we flip the black vector $\vec{v}_2$ upside down and put it at the tip of the blue vector $\vec{v}_1$, which then gives the new red vector.
Scalar Multiplication
For the purposes of this article, consider a scalar to be just a real number. When we multiply a vector by a scalar, we simply multiply each element of the vector by that scalar. Let’s use our example vectors to demonstrate this concept.
Multiplying the first vector by the scalar quantity 2:

$$2\vec{v}_1 = \begin{pmatrix} 2 \times 1 \\ 2 \times 2 \end{pmatrix} = \begin{pmatrix} 2 \\ 4 \end{pmatrix}$$
v1 * 2
Out: array([2, 4])
As we see in the plot below, multiplication by a scalar simply scales the vector’s elements. Notice that the blue line in the right plot is twice the length of the one in the left plot, which shows the vector prior to multiplication.
Multiplication by a negative scalar has a similar interpretation, in that the vector is scaled by the magnitude of the given scalar, but the direction is reversed.
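For example, multiplying $\vec{v}_1$ by -2 gives a vector twice as long as $\vec{v}_1$ pointing the opposite way:

v1 * -2
Out: array([-2, -4])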
Length of a Vector
The length of a vector is also known as the magnitude or the Euclidean norm, depending on the context. The length of a vector $\vec{v}$ is denoted $\|\vec{v}\|$, and in order to calculate it for a two-dimensional vector $\vec{v} = (x, y)$ the following formula is used:

$$\|\vec{v}\| = \sqrt{x^2 + y^2}$$
To calculate this in Python we can use the function below:
def vector_length(vector):
    # Square each element, sum the squares, then take the square root.
    # This works for a vector of any dimension.
    return np.sqrt(np.sum([i**2 for i in vector]))
As a reminder, our example vectors are $\vec{v}_1 = (1, 2)$ and $\vec{v}_2 = (3, 1)$. Calculating their lengths:
lv1 = vector_length(v1)
lv2 = vector_length(v2)
print(f"Length of v1: {lv1}")
print(f"Length of v2: {lv2}") Out:
Length of v1: 2.23606797749979
Length of v2: 3.1622776601683795
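As an aside, NumPy ships with a built-in for the Euclidean norm, which agrees with our hand-rolled function:

np.linalg.norm(v1)
Out: 2.23606797749979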
Taking the length of $\vec{v}_1$ and showing the geometric interpretation below, we observe that the length of the vector is $\|\vec{v}_1\| = \sqrt{1^2 + 2^2} = \sqrt{5} \approx 2.236$, which agrees with the output from the Python function shown above. You may recognize this length as the hypotenuse of a right-angled triangle, from Pythagoras’ theorem.
Distance Between Vectors
While we are on the topic of the length of a vector, it is useful to consider the length between two vectors, known as the distance. The precise term for this quantity is the Euclidean distance. The formula for the two-dimensional case is given below:

$$d(\vec{v}, \vec{u}) = \sqrt{(v_1 - u_1)^2 + (v_2 - u_2)^2}$$

We could generalize the formula above to an n-dimensional vector. However, notice that we already have the tools to calculate this distance: we can use vector subtraction, which we have already discussed, and then take the length of the resulting vector. For two vectors of the same dimension, $\vec{v}$ and $\vec{u}$:

$$d(\vec{v}, \vec{u}) = \|\vec{v} - \vec{u}\|$$
In Python, for our example vectors:
vector_length(v1 - v2)
Out: 2.23606797749979
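As a sanity check, plugging the components directly into the two-dimensional distance formula gives the same number:

np.sqrt((1 - 3)**2 + (2 - 1)**2)
Out: 2.23606797749979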
You may be able to guess what this looks like visually, but let’s take a look anyway for the example vectors, using $\vec{v}_1 - \vec{v}_2$:
Let’s check whether this result agrees with the result we got from Python. Note that although we used $\vec{v}_1 - \vec{v}_2$ in the plot above, we would have gotten exactly the same answer from $\vec{v}_2 - \vec{v}_1$.
vector_length(v1 - v2) == vector_length(v2 - v1) == np.sqrt(5)
Out:
True
It should be said that there are other ways to calculate the distance between vectors, which we won’t go into in this article.
What does all this have to do with machine learning?
The reader would be forgiven for being slightly lost as to how exactly these funny arrows on a graph relate to machine learning. To demonstrate the usefulness of vectors, let’s use the ‘hello world’ of machine learning to put some of the maths above into context.
K-Nearest-Neighbours
Let’s say we are given a dataset (made up for illustrative purposes) of cats and dogs. We denote a cat by 0 and a dog by 1.
animals = np.array([0,1,0,0,1,0,1,0,1,0,1,0,1,0])
We have the following information about these animals from our dataset:
Weight: Animal weight in Kilograms (KG)
Food consumed per month: Weight of the food consumed in pounds.
weights = np.array([20, 15, 23, 17, 26, 18, 17, 20, 28, 19, 28, 21, 22, 19])
food_inpounds = np.array([6, 6, 4, 6, 12, 4, 9, 4, 6, 9, 12, 11, 8, 9])
On reflection these would be some very heavy cats :( . But onwards nonetheless!
Let’s visualize our dataset:
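A minimal sketch of how the scatter plot might be produced; the colour choices are my own, not prescribed by the dataset:

cats = animals == 0
dogs = animals == 1
# Plot cats and dogs separately so each class gets its own colour and legend entry.
plt.scatter(weights[cats], food_inpounds[cats], color='blue', label='cat (0)')
plt.scatter(weights[dogs], food_inpounds[dogs], color='red', label='dog (1)')
plt.xlabel('Weight (kg)')
plt.ylabel('Food consumed per month (pounds)')
plt.legend()
plt.show()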
Say we get a new data point for which we know only the animal’s weight and food consumption. Can we make a prediction regarding whether it’s a cat or a dog based on what we know? We see the animal weighs 22 kg and eats 10 pounds of food per month.
new_point = np.array([22,10])
We are going to assume that animals of similar weight and food consumption are more likely to be the same species. Therefore we are going to find the nearest neighbours to our unknown data point and classify the new point based on a majority vote between them.
How many neighbours (k) should we use?
The number of neighbours is the k in the name of the algorithm. For convenience we will use the 3 nearest neighbours to our unknown data point; therefore k=3.
How do we define ‘nearest’?
Well clearly we can see from the plot above which points are the 3 nearest neighbours to our data point. However, bear in mind that we can’t always plot data and look at it as we can in this contrived example.
You may realize that we can calculate the distance between the points, as we did in the ‘Distance Between Vectors’ section. Let’s plot the Euclidean distance between the unknown point and the labelled data.
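A minimal sketch of how these distances and the vote can be computed, reusing vector_length from earlier; the use of np.argsort and np.bincount here is my own illustration rather than prescribed code:

# Stack the two features into one (weight, food) point per animal.
points = np.column_stack((weights, food_inpounds))

# Euclidean distance from the new point to every labelled point.
distances = np.array([vector_length(p - new_point) for p in points])

# Labels of the 3 nearest neighbours.
nearest = np.argsort(distances)[:3]
print(animals[nearest])
Out: [0 1 0]

# Majority vote: the most common label among the neighbours.
prediction = np.bincount(animals[nearest]).argmax()
print(prediction)
Out: 0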
Looking at the plot above (and the distances just computed), it appears that the unknown point’s 3 nearest neighbours are 2 cats and 1 dog. Therefore the KNN algorithm would predict that the unknown point is a cat, by a majority vote of 2:1.
Questions for the Reader to Consider
This article is getting rather long, and I personally find a few conceptual questions useful when learning a new topic. Perhaps you, the reader, would like to answer the following questions in the comments section below.
- What other variables would make this algorithm perform better? Fur colour? Age? Why? In thinking about this, you are doing feature selection.
- What would the KNN vote ratio be if we decided to change k from 3 to 5?
- Can you think of a way to express how confident in % we are that the unknown point in the KNN example is a cat? What about if we changed k from 3 to 5? Are we now more or less confident?
- Are there any advantages in choosing an odd number for k as opposed to an even one?
- Looking at the formula for the length of a vector, what would happen if our variables were of drastically different scales? For example, x in [0, 1] and y in [0, 1000]. Can you think of any problems that may arise because of this? Any ideas on how to remedy them?
See all articles on www.codearmo.com and feel free to comment if you have any ideas for things you would like explained in more detail!