Maharshi's blog

Tensors from Scratch #2: Elementwise operations and Broadcasting

This is a series of blog posts that will hopefully help you understand what exactly tensors are, and how you too can implement them from scratch in C. Over the series I will go from basic definitions, illustrations, and explanations to the actual code implementation.

Don't worry, I'm here to help :)


In part 1 of "Tensors from Scratch" we discussed what tensors actually are and how the shape of a tensor can give us important information about the tensor itself.

In this part, let us see what kind of operations we can perform with tensors, and the role which shapes play there.

Elementwise binary operations

Addition, subtraction, multiplication, division, and exponentiation.

You might be familiar with the above operations between two numbers. In short, with tensors one can perform the same operations and more. These operations are called "binary" because they are performed between two tensors, and "elementwise" because the operation is performed between one element in the first tensor and the corresponding element in the second tensor, at every position.

However, when we perform binary operations between two tensors, we need to keep the shapes of both tensors in mind. In most cases, we can successfully perform binary operations when the shapes of the two tensors are equal. The equality of two shapes is defined as follows:

For any two shapes S1 and S2 of the same rank, if the entry (value) at the i-th axis/position in the first shape is equal to the entry (value) at the i-th axis/position in the second shape, for every i, then the two shapes are said to be equal.

Example 1: The shapes given below that correspond to rank-4 tensors are equal.

S1 = (2, 2, 4, 5)
S2 = (2, 2, 4, 5)

Example 2: The shapes given below that correspond to rank-5 tensors are not equal because at axis=2 (counting from 0, starting at the left) the entries in the two shapes are different, i.e. 5 ≠ 6.

S1 = (2, 3, 5, 4, 9)
S2 = (2, 3, 6, 4, 9)
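Since we will need this check constantly, here is a minimal sketch of a shape-equality test (in Python for brevity; the helper name `shapes_equal` is my own, not part of any library):

```python
def shapes_equal(s1, s2):
    # Two shapes are equal only if they have the same rank (same number
    # of axes) and the same entry at every axis.
    if len(s1) != len(s2):
        return False
    return all(d1 == d2 for d1, d2 in zip(s1, s2))
```

With the examples above, `shapes_equal((2, 2, 4, 5), (2, 2, 4, 5))` is `True`, while `shapes_equal((2, 3, 5, 4, 9), (2, 3, 6, 4, 9))` is `False`.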

Okay! Now, let's try and perform the multiplication operation between two rank-1 tensors i.e. vectors. Assume we have:

A = [a_0, a_1, a_2, ..., a_{n-1}]
B = [b_0, b_1, b_2, ..., b_{n-1}]

Both the rank-1 tensors have equal shapes:

shape = (n)

The result of the elementwise multiplication operation will be another rank-1 tensor C with the same shape as above:

C = A × B = [a_0 × b_0, a_1 × b_1, a_2 × b_2, ..., a_{n-1} × b_{n-1}]

and,

shape_C = (n)
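As a quick sketch (in Python rather than C, just to keep it short), the rank-1 case could look like this; `multiply_rank1` is a made-up helper name:

```python
def multiply_rank1(a, b):
    # Elementwise product of two rank-1 tensors with equal shape (n).
    assert len(a) == len(b), "shapes must be equal"
    return [x * y for x, y in zip(a, b)]
```

For example, `multiply_rank1([1, 2, 3], [4, 5, 6])` gives `[4, 10, 18]`.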

As you can notice, the value a_i is multiplied with b_i where i is the "index" or the position of an element in the given rank-1 tensor. Now, what if we had two rank-2 tensors with shape (m, n) instead? The result of elementwise multiplication will be another rank-2 tensor with the same shape (m, n).

C = [ a_{0,0} × b_{0,0}      a_{0,1} × b_{0,1}      ...  a_{0,n-1} × b_{0,n-1}     ]
    [ a_{1,0} × b_{1,0}      a_{1,1} × b_{1,1}      ...  a_{1,n-1} × b_{1,n-1}     ]
    [ ...                                                                          ]
    [ a_{m-1,0} × b_{m-1,0}  a_{m-1,1} × b_{m-1,1}  ...  a_{m-1,n-1} × b_{m-1,n-1} ]

This might seem daunting at first, but look at the subscripts on the elements of C. The index i goes from 0 to m-1 and the index j goes from 0 to n-1. In general, for a rank-n tensor we call these the n-dimensional indices. For a given shape entry K at some axis, the index on that particular axis goes from 0 to K-1.

Example 1: N-dimensional indices for rank-2 tensor having shape (2,3):

[0 0]
[0 1]
[0 2]
[1 0]
[1 1]
[1 2]

Example 2: N-dimensional indices for rank-4 tensor having shape (2,3,2,1):

[0 0 0 0]
[0 0 1 0]
[0 1 0 0]
[0 1 1 0]
[0 2 0 0]
[0 2 1 0]
[1 0 0 0]
[1 0 1 0]
[1 1 0 0]
[1 1 1 0]
[1 2 0 0]
[1 2 1 0]

We will use N-dimensional indices a lot while implementing tensors in C.
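As a sketch of how such indices can be generated (Python here; we will do the same thing in C with plain loops later), assuming row-major order where the last axis varies fastest:

```python
def ndindices(shape):
    # Build every n-dimensional index for the given shape, in the order
    # shown in the tables above: the last axis varies the fastest.
    indices = [[]]
    for dim in shape:
        indices = [idx + [k] for idx in indices for k in range(dim)]
    return indices
```

For shape `(2, 3)` this produces exactly the six indices of Example 1, and for `(2, 3, 2, 1)` the twelve indices of Example 2.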

Elementwise binary operations are performed between two tensors having equal shapes. Given some n-dimensional index, the operation is performed between the element of the first tensor and the element of the second tensor at that index, for all indices. The result of such an operation is a rank-n tensor with the same shape as the operands.
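To make that definition concrete, here is one possible sketch that applies any binary operation elementwise to tensors represented as nested Python lists (a stand-in for the C version to come):

```python
import operator

def elementwise(op, a, b):
    # Recurse through both tensors in lockstep; at each n-dimensional
    # index, apply op to the pair of scalar elements found there.
    if isinstance(a, list):
        assert len(a) == len(b), "shapes must be equal"
        return [elementwise(op, x, y) for x, y in zip(a, b)]
    return op(a, b)
```

For example, `elementwise(operator.add, [[1, 2], [3, 4]], [[10, 20], [30, 40]])` gives `[[11, 22], [33, 44]]`, with the same shape as both operands.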

The illustration below should clear any doubts about this.

[Illustration: elementwise binary operation between two tensors]

Shape broadcasting

In the previous section I lied a little. Here's the truth:

Elementwise binary operations can be performed between two tensors even when their shapes are not exactly equal but are broadcastable.

Let's take an example of elementwise addition operation between two tensors A(m,n) and B(n). I have written the shapes of the tensors alongside their names. Here's what the operation might look like:

[ a_{0,0}    a_{0,1}    a_{0,2}    ...  a_{0,n-1}   ]
[ a_{1,0}    a_{1,1}    a_{1,2}    ...  a_{1,n-1}   ]  +  [b_0, b_1, b_2, ..., b_{n-1}]
[ ...                                               ]
[ a_{m-1,0}  a_{m-1,1}  a_{m-1,2}  ...  a_{m-1,n-1} ]

Logically, we can add the values of B to the values in the individual rows of A, because each row of A has the same number of values as B has in total. Even though their shapes are not exactly equal, we can copy the n values of tensor B along the "downward" dimension m times to make its shape equal to A's before performing the addition operation.

Expanding and copying the shape and values of a particular tensor BEFORE performing any kind of operation with the modified tensor is called shape broadcasting.

Essentially, we broadcast the values of the tensors along a new shape in such a way that the shapes become equal before performing the operation. The values themselves do not change, only the shape changes and the values are copied.
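For this particular (m, n) + (n) case, the copying step could be sketched like so (Python, with a hypothetical helper name of my choosing):

```python
def broadcast_row(b, m):
    # Copy the n values of B "downward" m times, producing an (m, n)
    # tensor whose every row equals B. The values themselves don't change.
    return [list(b) for _ in range(m)]
```

For example, `broadcast_row([1, 2], 3)` gives `[[1, 2], [1, 2], [1, 2]]`, which now has an equal shape to any (3, 2) tensor we want to add it to.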

Keep in mind, if the tensor B didn't have n values to begin with, then no matter what we do, we wouldn't be able to make the shape of B equal to that of A. Let's see the algorithm for shape broadcasting.


Algorithm for broadcasting:

  1. Compare the shapes of the input tensors element-wise, starting from the rightmost dimension.
  2. If the dimensions are equal, or one of them is 1, move to the next dimension.
  3. If the dimensions differ and neither is 1, raise an error.
  4. If one tensor has fewer dimensions, prepend 1s to its shape until both tensors have the same number of dimensions.
  5. The resulting tensor will have the maximum size along each dimension of the input tensors.
  6. For dimensions where one tensor had size 1, that tensor is virtually copied along that dimension to match the other tensor's size.
  7. The resultant shape will be the final shape for both the input tensors.
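The steps above can be sketched in code roughly as follows (a minimal Python version; doing step 4's prepending of 1s first lets us then compare axis by axis in a single left-to-right pass, which is equivalent to comparing from the rightmost dimension):

```python
def broadcast_shapes(s1, s2):
    # Step 4: prepend 1s so both shapes have the same number of dimensions.
    rank = max(len(s1), len(s2))
    s1 = (1,) * (rank - len(s1)) + tuple(s1)
    s2 = (1,) * (rank - len(s2)) + tuple(s2)
    result = []
    for d1, d2 in zip(s1, s2):
        if d1 == d2 or d1 == 1 or d2 == 1:
            # Steps 2 and 5: take the larger of the two sizes.
            result.append(max(d1, d2))
        else:
            # Step 3: dimensions differ and neither is 1.
            raise ValueError(f"shapes {s1} and {s2} are not broadcastable")
    return tuple(result)
```

Running this on the examples that follow reproduces their results, e.g. `broadcast_shapes((3, 1), (2,))` gives `(3, 2)`.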

Example 1: Let's have two tensors A and B with shapes (3,1) and (2) respectively. The steps to find the new shape for both tensors after broadcasting would be:

  1. Compare rightmost dimensions: 1 (A) vs 2 (B)
    • 1 can be broadcast to 2
  2. Move to next dimension: 3 (A) vs no dimension (B)
    • B's shape is expanded to (1,2) for broadcasting
  3. Final broadcast shape: (3,2)

Result: Both tensors are broadcast to the shape (3,2)

Example 2: A and B with shapes (4,3,2) and (3,1) respectively. The steps for resultant shape after broadcasting:

  1. Compare rightmost dimensions: 2 (A) vs 1 (B)
    • 1 can be broadcast to 2
  2. Next dimension: 3 (A) vs 3 (B) - entries match so continue
  3. Next dimension: 4 (A) vs no dimension (B)
    • B's shape is expanded to (1,3,1) for broadcasting
  4. Final broadcast shape: (4,3,2)

Result: Tensor A remains (4,3,2), while tensor B is broadcast to (4,3,2)

Example 3: Non-broadcastable shapes A and B with shapes (3,2) and (2,3) respectively.

  1. Compare rightmost dimensions: 2 (A) vs 3 (B)
    • Neither is 1, and they're not equal
  2. Broadcasting is not possible here.

Result: The shapes aren't broadcastable

Finally, here is an example of the elementwise addition operation for two tensors, with broadcasting done implicitly. Don't worry about the operations arange and reshape, as I will talk about those later, but take a good look at the shapes of the tensors.

>>> import numpy as np
>>> a = np.arange(1, 4, 1).reshape(3, 1)
>>> b = np.arange(1, 3, 1).reshape(2)
>>> c = a + b
>>> a
array([[1],
       [2],
       [3]])
>>> b
array([1, 2])
>>> c
array([[2, 3],
       [3, 4],
       [4, 5]])
>>> a.shape, b.shape, c.shape
((3, 1), (2,), (3, 2))

We're not done...

This part discussed elementwise binary operations and shape broadcasting on tensors. In the next part, we will focus on unary operations that we can perform on tensors. See you soon :)