Tensors from Scratch #2: Elementwise operations and Broadcasting
This is a series of blog posts that will hopefully help you understand what exactly tensors are, and how you too can implement them from scratch in C. Over the series I will go from basic definitions, illustrations, and explanations to the actual code implementation.
Don't worry, I'm here to help :)
In part 1 of "Tensors from Scratch" we discussed what tensors actually are and how the shape of a tensor can give us important information about the tensor itself. As a recap:
- Tensor is an umbrella term for "mathematical objects" or "containers" that have a rank/dimensionality associated with them.
- Scalars are rank-0 tensors, vectors are rank-1 tensors, matrices are rank-2 tensors and so on.
- One important property of a tensor is its shape. Shape is defined as: the number of elements present in each axis/dimension of the tensor.
- Shape is written as comma-separated entries, one for each dimension, between parentheses, e.g. (2, 3, 4).
- From the shape we can determine the rank/dimensionality of the tensor (the shape's length), and the total number of values the tensor contains (the product of all the shape's entries).
In this part, let us see what kind of operations we can perform with tensors, and the role which shapes play there.
Elementwise binary operations
Addition, subtraction, multiplication, division, and exponentiation.
You might be familiar with the above operations between two numbers. In short, with tensors one can perform the same operations and more. These operations are called "binary" because they are performed between two tensors, and "elementwise" because the operation is performed between one element in the first tensor and the corresponding element in the second tensor, at every position.
However, when we perform binary operations between two tensors we need to keep in mind the shapes of both tensors. In most cases, we can successfully perform binary operations when the shapes of the two tensors are equal. The equality of two shapes is defined as follows:
For any two shapes S1 and S2 of the same length: when the entry (value) at every axis/position i in the first shape is equal to the entry (value) at the same axis/position i in the second shape, the two shapes are said to be equal.
Example 1: The shapes given below that correspond to rank-4 tensors are equal.
S1 = (2, 2, 4, 5)
S2 = (2, 2, 4, 5)
Example 2: The shapes given below that correspond to rank-5 tensors are not equal because the entries at axis 2 (counting from the left, starting at 0) differ, i.e. 5 ≠ 6.
S1 = (2, 3, 5, 4, 9)
S2 = (2, 3, 6, 4, 9)
Okay! Now, let's try and perform the multiplication operation between two rank-1 tensors, i.e. vectors. Assume we have:
A = [a_0, a_1, a_2]
B = [b_0, b_1, b_2]
Both the rank-1 tensors have equal shapes: (3).
The result of the elementwise multiplication operation will be another rank-1 tensor C with the same shape (3):
C = A * B = [a_0 * b_0, a_1 * b_1, a_2 * b_2]
As you can notice, the value a_i is multiplied with b_i, where i is the "index" or the position of an element in the given rank-1 tensor. Now, what if we had two rank-2 tensors with shape (2, 3) instead? The result of elementwise multiplication would be another rank-2 tensor C with the same shape (2, 3):
c_ij = a_ij * b_ij
This might seem daunting at first, but look at the subscripts on the elements of C. The index i goes from 0 to 1 and the index j goes from 0 to 2. In general, for a rank-n tensor we call these the n-dimensional indices. For a given shape entry s at some axis, the index on that particular axis goes from 0 to s - 1.
Example 1: N-dimensional indices for a rank-2 tensor having shape (2, 3):
[0 0]
[0 1]
[0 2]
[1 0]
[1 1]
[1 2]
Example 2: N-dimensional indices for a rank-4 tensor having shape (2, 3, 2, 1):
[0 0 0 0]
[0 0 1 0]
[0 1 0 0]
[0 1 1 0]
[0 2 0 0]
[0 2 1 0]
[1 0 0 0]
[1 0 1 0]
[1 1 0 0]
[1 1 1 0]
[1 2 0 0]
[1 2 1 0]
We will use N-dimensional indices a lot while implementing tensors in C.
Elementwise binary operations are performed between two tensors having equal shapes. Given some n-dimensional index, the operation is performed (for all indices) between the element of the first tensor and the element of the second tensor at that index. The result of such an operation would be a rank-n tensor with the same shape as the operands.
The illustration below should clear any doubts about this.
Shape broadcasting
In the previous section I lied a little. Here's the truth:
Elementwise binary operations can be performed between two tensors even when their shapes are not exactly equal but are broadcastable.
Let's take an example of the elementwise addition operation between two tensors A (3, 2) and B (2). I have written the shapes of the tensors alongside their names. Here's what the operation might look like:
Logically, we can add the values of B with the values in the individual rows of A, because each row of A has the same number of values as B has in total. Even though their shapes are not exactly equal, we can copy the values in the tensor B along the "downward" dimension 3 times to make its shape equal to A's shape before performing the addition operation.
Expanding and copying the shape and values of a particular tensor BEFORE performing any kind of operation with the modified tensor is called shape broadcasting.
Essentially, we broadcast the values of the tensors along a new shape in such a way that the shapes become equal before performing the operation. The values themselves do not change, only the shape changes and the values are copied.
Keep in mind, if the tensor B didn't have 2 values to begin with, then no matter what we do, we wouldn't be able to make the shape of B equal to the shape of A. Let's see the algorithm for shape broadcasting.
Algorithm for broadcasting:
- Compare the shapes of the input tensors element-wise, starting from the rightmost dimension.
- If the dimensions are equal, or one of them is 1, move to the next dimension.
- If the dimensions differ and neither is 1, raise an error.
- If one tensor has fewer dimensions, prepend 1s to its shape until both tensors have the same number of dimensions.
- The resulting tensor will have the maximum size along each dimension of the input tensors.
- For dimensions where one tensor had size 1, that tensor is virtually copied along that dimension to match the other tensor's size.
- The resultant shape will be the final shape for both the input tensors.
Example 1: Let's have two tensors A and B with shapes (3, 1) and (2) respectively. The steps to find the new shape for both tensors after broadcasting would be:
- Compare rightmost dimensions: 1 (A) vs 2 (B)
- 1 can be broadcast to 2
- Move to next dimension: 3 (A) vs no dimension (B)
- B's shape is expanded to (1, 2) for broadcasting
- Final broadcast shape: (3, 2)
Result: Both tensors are broadcast to the shape (3, 2)
Example 2: A and B with shapes (4, 3, 2) and (3, 1) respectively. The steps for the resultant shape after broadcasting:
- Compare rightmost dimensions: 2 (A) vs 1 (B)
- 1 can be broadcast to 2
- Next dimension: 3 (A) vs 3 (B) - entries match so continue
- Next dimension: 4 (A) vs no dimension (B)
- B's shape is expanded to (1, 3, 1) for broadcasting
- Final broadcast shape: (4, 3, 2)
Result: Tensor A remains (4, 3, 2), while tensor B is broadcast to (4, 3, 2)
Example 3: Non-broadcastable tensors A and B with shapes (2) and (3) respectively.
- Compare rightmost dimensions: 2 (A) vs 3 (B)
- Neither is 1, and they're not equal
- Broadcasting is not possible here.
Result: The shapes aren't broadcastable
Finally, here is an example of the elementwise addition operation for two tensors with broadcasting done implicitly. Don't worry about the operations arange and reshape, as I will be talking about those later, but take a good look at the shapes of the tensors.
>>> import numpy as np
>>> a = np.arange(1, 4, 1).reshape(3, 1)
>>> b = np.arange(1, 3, 1).reshape(2)
>>> c = a + b
>>> a
array([[1],
       [2],
       [3]])
>>> b
array([1, 2])
>>> c
array([[2, 3],
       [3, 4],
       [4, 5]])
>>> a.shape, b.shape, c.shape
((3, 1), (2,), (3, 2))
We're not done...
This part discussed elementwise binary operations and shape broadcasting on tensors. In the next part, we will focus on unary operations that we can perform on tensors. See you soon :)