# The Ultimate NumPy Tutorial for Data Science Beginners

## Highlights

- NumPy is a core Python library every data science professional should be well acquainted with
- This comprehensive NumPy tutorial covers NumPy from scratch, from basic mathematical operations to how NumPy works with image data
- Plenty of NumPy concepts and Python code in this article

## Introduction

I am a huge fan of the NumPy library in Python. I have relied on it countless times during my data science journey to perform all sorts of tasks, from basic mathematical operations to using it for image classification!

In short, NumPy is one of the most fundamental libraries in Python and perhaps the most useful of them all. NumPy handles large datasets effectively and efficiently. I can see your eyes glinting at the prospect of mastering NumPy already. 🙂 As data scientists or aspiring data science professionals, we need a solid grasp of NumPy and how it works in Python.

In this article, I am going to start off by describing what the NumPy library is and why you should prefer it over the ubiquitous but cumbersome Python lists. Then, we will cover some of the most basic NumPy operations that will get you hooked on this awesome library!

If you’re new to Python, don’t worry! You can take a comprehensive (and free) Python course to learn everything you need to get started with data science programming!

## Here’s how we’ll learn NumPy:

- What is the NumPy Library in Python?
- Python list vs NumPy arrays – What’s the Difference?
- Creating a NumPy Array
  - Basic ndarray
  - Array of zeros
  - Array of ones
  - Random numbers in ndarray
  - An array of your choice
  - Identity matrix in NumPy
  - Evenly spaced ndarray
- The Shape and Reshaping of NumPy Array
  - Dimensions of NumPy array
  - Shape of NumPy array
  - Size of NumPy array
  - Reshaping a NumPy array
  - Flattening a NumPy array
  - Transpose of a NumPy array
- Expanding and Squeezing a NumPy Array
  - Expanding a NumPy array
  - Squeezing a NumPy array
- Indexing and Slicing of NumPy Array
  - Slicing 1-D NumPy arrays
  - Slicing 2-D NumPy arrays
  - Slicing 3-D NumPy arrays
  - Negative slicing of NumPy arrays
- Stacking and Concatenating NumPy Arrays
  - Stacking ndarrays
  - Concatenating ndarrays
- Broadcasting in NumPy Arrays – A class apart!
- NumPy Ufuncs – The secret of its success!
- Maths with NumPy Arrays
  - Mean, Median and Standard deviation
  - Min-Max values and their indexes
- Sorting in NumPy Arrays
- NumPy Arrays and Images

## What is the NumPy library in Python?

NumPy stands for Numerical Python and is one of the most useful scientific libraries in Python programming. It provides support for large multidimensional array objects and various tools to work with them. **Various other libraries like Pandas, Matplotlib, and Scikit-learn are built on top of this amazing library.**

Arrays are collections of elements (values) that can have one or more dimensions. An array with one dimension is called a *Vector*, while one with two dimensions is called a *Matrix*.

NumPy arrays are called **ndarray**, or **N-dimensional arrays**, and they store elements of the same type and size. NumPy is known for its high performance, and it provides efficient storage and data operations as arrays grow in size.

NumPy comes pre-installed with Anaconda. But if you want to install NumPy separately on your machine, just run the following command in your terminal:

```
pip install numpy
```

Now you need to import the library:

```
import numpy as np
```

**np** is the de facto abbreviation for NumPy used by the data science community.

## Python Lists vs NumPy Arrays – What’s the Difference?

If you’re familiar with Python, you might be wondering why we should use NumPy arrays when we already have Python lists. After all, Python lists act as arrays that can store elements of various types. This is a perfectly valid question, and the answer lies in the way Python stores an object in memory.

A Python variable is actually a pointer to a memory location that stores all the details about the object: not just its value, but also bookkeeping information such as its type and reference count. Although this extra information is what makes Python a dynamically typed language, it comes at a cost that becomes apparent when storing a large collection of objects, like in an array.

Python lists are essentially an array of pointers, each pointing to a location that contains the information related to the element. This adds a lot of overhead in terms of memory and computation. And most of this information is rendered redundant when all the objects stored in the list are of the same type!

**To overcome this problem, we use NumPy arrays, which contain only homogeneous elements**, i.e. elements of the same data type. This makes them more efficient at storing and manipulating data, and the difference becomes apparent when the array has a large number of elements, say thousands or millions. **Also, NumPy arrays support element-wise operations directly: `+` on two arrays adds them element by element, whereas `+` on two Python lists simply concatenates them!**

This is the reason why NumPy arrays are preferred over Python lists when performing mathematical operations on a large amount of data.
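A minimal sketch of the difference, using illustrative variable names: `+` concatenates two lists but adds two arrays element by element. The last lines use the `nbytes` attribute to show the compact storage; `dtype=np.int64` is an assumption made only so that the byte count is the same on every platform.

```python
import numpy as np

list_a = [1, 2, 3]
list_b = [4, 5, 6]
print(list_a + list_b)   # [1, 2, 3, 4, 5, 6]: lists concatenate

arr_a = np.array(list_a)
arr_b = np.array(list_b)
print(arr_a + arr_b)     # [5 7 9]: arrays add element-wise
print(arr_a * 2)         # [2 4 6]: the scalar applies to every element

# Homogeneous storage: 1000 values packed as 8-byte integers
big = np.arange(1000, dtype=np.int64)
print(big.itemsize, big.nbytes)   # 8 8000
```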

## Creating a NumPy Array

**Basic ndarray**

NumPy arrays are very easy to create given the complex problems they solve. To create a very basic ndarray, you use the np.array() method. All you have to pass are the values of the array as a list:

`np.array([1,2,3,4])`

**Output:**

```
array([1, 2, 3, 4])
```

This array contains integer values. You can specify the type of data in the **dtype** argument:

`np.array([1,2,3,4],dtype=np.float32)`

**Output:**

```
array([1., 2., 3., 4.], dtype=float32)
```

Since NumPy arrays can contain only homogeneous datatypes, values will be upcast if the types do not match:

```
np.array([1,2.0,3,4])
```

**Output:**

```
array([1., 2., 3., 4.])
```

Here, NumPy has upcast integer values to float values.

**NumPy arrays can be multi-dimensional too.**

`np.array([[1,2,3,4],[5,6,7,8]])`

Here, we created a 2-dimensional array of values.

*Note: A matrix is just a rectangular array of numbers with shape N x M where N is the number of rows and M is the number of columns in the matrix. The one you just saw above is a 2 x 4 matrix.*
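To check the dtype rules above in practice, you can inspect an array's `dtype` (and, for the 2 x 4 matrix, its `shape`). This is a small sketch with illustrative variable names; the exact width of the default integer type depends on the platform and NumPy version, so the sketch checks only its kind.

```python
import numpy as np

ints = np.array([1, 2, 3, 4])
f32 = np.array([1, 2, 3, 4], dtype=np.float32)
mixed = np.array([1, 2.0, 3, 4])          # one float upcasts every element
matrix = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])

print(ints.dtype.kind)   # i  (a signed integer type; its width depends on the platform)
print(f32.dtype)         # float32
print(mixed.dtype)       # float64
print(mixed)             # [1. 2. 3. 4.]
print(matrix.shape)      # (2, 4): 2 rows, 4 columns
```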

**Array of zeros**

NumPy lets you create an array of all zeros using the **np.zeros()** method. All you have to do is pass the shape of the desired array:

`np.zeros(5)`

The one above is a 1-D array while the one below is a 2-D array:

`np.zeros((2,3))`

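**Array of ones**

Similarly, the **np.ones()** method creates an array of all ones. Like np.zeros(), it takes the shape of the array, and you can also set the **dtype**:

`np.ones(5,dtype=np.int32)`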
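The sketch below, with illustrative variable names, prints what these calls return. One detail worth knowing: without a **dtype** argument, np.zeros() and np.ones() produce float64 arrays.

```python
import numpy as np

z1 = np.zeros(5)                  # 1-D: five zeros
z2 = np.zeros((2, 3))             # 2-D: the shape is passed as a tuple
o = np.ones(5, dtype=np.int32)    # explicit integer dtype

print(z1)         # [0. 0. 0. 0. 0.]
print(z2.shape)   # (2, 3)
print(z1.dtype)   # float64 (the default)
print(o)          # [1 1 1 1 1]
print(o.dtype)    # int32
```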

**Random numbers in ndarrays**

Another very commonly used method to create ndarrays is the **np.random.rand()** method. It creates an array of a given shape filled with random values drawn uniformly from [0,1), so your values will differ from the ones shown here:

```
# random
np.random.rand(2,3)
```

**Output:**

```
array([[0.95580785, 0.98378873, 0.65133872],
       [0.38330437, 0.16033608, 0.13826526]])
```
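Because the values are random, each run gives a different result. If you need reproducible numbers (for example, in a tutorial or a test), one option is to seed the generator first. The sketch below uses `np.random.seed()` with the legacy functions used in this article, and also shows `np.random.default_rng()`, NumPy's newer Generator interface; neither appears in the original text, and the seed value 42 is arbitrary.

```python
import numpy as np

np.random.seed(42)            # fix the seed so the "random" values are reproducible
a = np.random.rand(2, 3)

np.random.seed(42)            # the same seed again ...
b = np.random.rand(2, 3)      # ... gives exactly the same values

print(a.shape)                        # (2, 3)
print(np.array_equal(a, b))           # True
print(((a >= 0) & (a < 1)).all())     # True: every value lies in [0, 1)

# Newer Generator-based interface: random() also draws from [0, 1)
g = np.random.default_rng(42).random((2, 3))
print(g.shape)                        # (2, 3)
```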

**An array of your choice**

Or, in fact, you can create an array filled with any given value using the **np.full()** method. Just pass in the shape of the desired array and the value you want:

`np.full((2,2),7)`
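A short sketch of np.full(); the second call is an added illustration showing that, unless you pass **dtype**, the array's type is taken from the fill value.

```python
import numpy as np

sevens = np.full((2, 2), 7)
print(sevens)
# [[7 7]
#  [7 7]]

# A float fill value gives a float array
halves = np.full(3, 0.5)
print(halves)          # [0.5 0.5 0.5]
print(halves.dtype)    # float64
```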

**Identity matrix in NumPy**

Another great method is **np.eye()** that returns an array with **1s** along its diagonal and **0s** everywhere else.

*An Identity matrix is a square matrix that has 1s along its main diagonal and 0s everywhere else. Below is an Identity matrix of shape 3 x 3.*

*Note: A square matrix has an N x N shape. This means it has the same number of rows and columns.*

```
# identity matrix
np.eye(3)
```

However, NumPy gives you the flexibility to change the diagonal along which the values have to be **1s**. You can either move it above the main diagonal:

```
# not an identity matrix
np.eye(3,k=1)
```

Or move it below the main diagonal:

`np.eye(3,k=-2)`

*Note: A matrix is called the Identity matrix only when the 1s are along the main diagonal and not any other diagonal!*
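Printing the three np.eye() calls from this section makes the position of the diagonal easy to see; `k` counts diagonals above (positive) or below (negative) the main one. Variable names are illustrative.

```python
import numpy as np

identity = np.eye(3)
upper = np.eye(3, k=1)     # 1s one step above the main diagonal
lower = np.eye(3, k=-2)    # 1s two steps below the main diagonal

print(identity)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]]
print(upper)
# [[0. 1. 0.]
#  [0. 0. 1.]
#  [0. 0. 0.]]
print(lower)
# [[0. 0. 0.]
#  [0. 0. 0.]
#  [1. 0. 0.]]
```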

**Evenly spaced ndarray**

You can quickly get an evenly spaced array of numbers using the **np.arange()** method:

`np.arange(5)`

The start, end and step size of the interval of values can be explicitly defined by passing in three numbers as arguments for these values respectively. A point to be noted here is that the interval is defined as [start,end) where the last number will not be included in the array:

`np.arange(2,10,2)`

Every second number was returned because the step size was set to 2. Notice that 10 was not included, since the end of the interval is excluded.

Another similar function is **np.linspace()**, but instead of step size, it takes in the number of samples that need to be retrieved from the interval. A point to note here is that the last number is included in the values returned unlike in the case of np.arange().

`np.linspace(0,1,5)`
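The sketch below prints the arrays from this section as plain Python lists (via `.tolist()`), which makes the excluded end point of np.arange() and the included end point of np.linspace() easy to compare. Variable names are illustrative.

```python
import numpy as np

r1 = np.arange(5)
r2 = np.arange(2, 10, 2)
lin = np.linspace(0, 1, 5)   # 5 samples, end point included

print(r1.tolist())    # [0, 1, 2, 3, 4]
print(r2.tolist())    # [2, 4, 6, 8]  (10 is excluded)
print(lin.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]  (1 is included)
```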

Great! Now you know how to create arrays using NumPy. But it’s also important to know the shape of the array.

## The Shape and Reshaping of NumPy Arrays

Once you have created your ndarray, the next thing you would want to do is check the number of axes, shape, and the size of the ndarray.

**Dimensions of NumPy arrays**

You can easily determine the number of dimensions, or axes, of a NumPy array using the **ndim** attribute:

```
# number of axes
a = np.array([[5,10,15],[20,25,20]])
print('Array :','\n',a)
print('Dimensions :','\n',a.ndim)
```

This array has two dimensions: 2 rows and 3 columns.

**Shape of NumPy array**

The **shape** is an attribute of the NumPy array that shows how many elements there are along each dimension. Since **shape** is a tuple, you can index it to get the length of an individual dimension:

```
a = np.array([[1,2,3],[4,5,6]])
print('Array :','\n',a)
print('Shape :','\n',a.shape)
print('Rows = ',a.shape[0])
print('Columns = ',a.shape[1])
```
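Since **shape** returns one entry per axis, **ndim** is always equal to the length of the shape tuple. A small sketch with 1-D, 2-D and 3-D arrays (the arrays themselves are arbitrary examples):

```python
import numpy as np

arrays = [
    np.array([1, 2, 3]),
    np.array([[1, 2, 3], [4, 5, 6]]),
    np.zeros((2, 3, 4)),
]
for arr in arrays:
    # ndim is always the number of entries in the shape tuple
    print(arr.ndim, arr.shape, arr.ndim == len(arr.shape))
# 1 (3,) True
# 2 (2, 3) True
# 3 (2, 3, 4) True
```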

**Size of NumPy array**

You can determine how many values there are in the array using the **size** attribute. It is the product of the lengths of all the dimensions; for a 2-D array, that is simply the number of rows multiplied by the number of columns:

```
# size of array
a = np.array([[5,10,15],[20,25,20]])
print('Size of array :',a.size)
print('Manual determination of size of array :',a.shape[0]*a.shape[1])
```
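Multiplying rows by columns only works for 2-D arrays. In general, **size** is the product of every entry in **shape**; the sketch below checks this with a 3-D array, using `np.prod()` (an addition not used in the original code).

```python
import numpy as np

a = np.array([[5, 10, 15], [20, 25, 20]])
cube = np.zeros((2, 3, 4))

print(a.size)                      # 6
print(cube.size)                   # 24
print(int(np.prod(cube.shape)))    # 24: size is the product of all the dimensions
```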

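**Reshaping a NumPy array**

You can change the shape of an ndarray with the **np.reshape()** method, as long as the new shape contains the same number of elements as the original:

```
# reshape
a = np.array([3,6,9,12])
np.reshape(a,(2,2))
```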

Here, I reshaped the ndarray from a 1-D to a 2-D ndarray.

While reshaping, if you are unsure about the length of one of the axes, just pass -1 for it. NumPy automatically calculates that length from the total number of elements (only one axis can be -1):

```
a = np.array([3,6,9,12,18,24])
print('Three rows :','\n',np.reshape(a,(3,-1)))
print('Three columns :','\n',np.reshape(a,(-1,3)))
```
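A sketch of how the -1 placeholder is resolved, plus what happens when the requested shape cannot hold the array's elements: NumPy raises a ValueError. The exact error message varies between NumPy versions, so the sketch just prints it. Variable names are illustrative.

```python
import numpy as np

a = np.array([3, 6, 9, 12, 18, 24])

rows3 = np.reshape(a, (3, -1))
cols3 = np.reshape(a, (-1, 3))
print(rows3.shape)   # (3, 2): NumPy infers 6 / 3 = 2 columns
print(cols3.shape)   # (2, 3): NumPy infers 6 / 3 = 2 rows

# 6 elements cannot be split evenly into 4 rows, so this fails
try:
    np.reshape(a, (4, -1))
except ValueError as err:
    print("ValueError:", err)
```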

**Flattening a NumPy array**

When you have a multidimensional array and want to collapse it into a one-dimensional array, you can use either the **flatten()** method or the **ravel()** method:

```
a = np.ones((2,2))
b = a.flatten()
c = a.ravel()
print('Original shape :', a.shape)
print('Array :','\n', a)
print('Shape after flatten :',b.shape)
print('Array :','\n', b)
print('Shape after ravel :',c.shape)
print('Array :','\n', c)
```

**Output:**

```
Original shape : (2, 2)
Array :
 [[1. 1.]
 [1. 1.]]
Shape after flatten : (4,)
Array :
 [1. 1. 1. 1.]
Shape after ravel : (4,)
Array :
 [1. 1. 1. 1.]
```

But an important difference between flatten() and ravel() is that the former always returns a copy of the original array, while the latter returns a view of the original array whenever possible (and makes a copy only when it must). This means that when ravel() returns a view, any changes made to it will also be reflected in the original array, which is never the case with flatten().

```
b[0] = 0
print(a)
```

**Output:**

```
[[1. 1.]
 [1. 1.]]
```
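The example above only modifies the flattened copy. The sketch below also writes through the array returned by ravel(), and uses `np.shares_memory()` (not part of the original code) to check which results share data with the original. The last lines show that ravel() falls back to a copy for a transposed array, whose elements are not stored in row order.

```python
import numpy as np

a = np.ones((2, 2))
b = a.flatten()   # always a new copy
c = a.ravel()     # a view here, because a is contiguous in memory

b[0] = 0
print(a[0, 0])    # 1.0: modifying the copy leaves a untouched

c[0] = 0
print(a[0, 0])    # 0.0: writing through the view changed a

print(np.shares_memory(a, b), np.shares_memory(a, c))   # False True

# ravel() has to copy when the data is not laid out in row order
t = np.arange(4).reshape(2, 2).T
print(np.shares_memory(t, t.ravel()))   # False
```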
