Unleash the Power of Data Analysis with NumPy: A Beginner's Guide

March 5, 2024, 2:02 p.m.

NumPy (Numerical Python) is an open source Python library that’s used in almost every field of science and engineering. It’s the universal standard for working with numerical data in Python, and it’s at the core of the scientific Python and PyData ecosystems.

NumPy users include everyone from beginning coders to experienced researchers doing state-of-the-art scientific and industrial research and development.

The NumPy API is used extensively in Pandas, SciPy, Matplotlib, scikit-learn, scikit-image and most other data science and scientific Python packages.

The NumPy library contains multidimensional array and matrix data structures. It provides ndarray, a homogeneous n-dimensional array object, with methods to efficiently operate on it. NumPy can be used to perform a wide variety of mathematical operations on arrays. It adds powerful data structures to Python that guarantee efficient calculations with arrays and matrices, and it supplies an enormous library of high-level mathematical functions that operate on these arrays and matrices.

What’s the difference between a Python list and a NumPy array?

NumPy gives you an enormous range of fast and efficient ways of creating arrays and manipulating numerical data inside them. While a Python list can contain different data types within a single list, all of the elements in a NumPy array should be homogeneous. The mathematical operations that are meant to be performed on arrays would be extremely inefficient if the arrays weren’t homogeneous.
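
As a small sketch of what homogeneity means in practice (the variable names here are illustrative only): when a list mixes integers and floats, NumPy converts every element to one common type instead of keeping them separate.

```python
import numpy as np

mixed_list = [1, 2.5, 3]          # a Python list holding an int, a float, and an int
mixed_arr = np.array(mixed_list)  # NumPy upcasts every element to one common dtype

print(type(mixed_list[0]), type(mixed_list[1]))  # the list keeps the original types
print(mixed_arr.dtype)                           # a single floating-point dtype
print(mixed_arr)
```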

Why use NumPy?

NumPy arrays are faster and more compact than Python lists. An array uses much less memory to store the same data and is convenient to work with. NumPy also provides a mechanism for specifying the data types, which allows the code to be optimized even further.
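
To illustrate how the choice of data type affects memory use, here is a minimal sketch that stores the same number of elements with two different dtypes. The array names are hypothetical; itemsize and nbytes are standard ndarray attributes.

```python
import numpy as np

wide = np.zeros(1000, dtype=np.float64)    # 8 bytes per element
narrow = np.zeros(1000, dtype=np.float32)  # 4 bytes per element

print(wide.itemsize, wide.nbytes)      # bytes per element, total bytes of data
print(narrow.itemsize, narrow.nbytes)
```

Halving the element size halves the data buffer, at the cost of reduced precision.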

What is an array?

An array is a central data structure of the NumPy library. An array is a grid of values and it contains information about the raw data, how to locate an element, and how to interpret an element. It has a grid of elements that can be indexed in various ways. The elements are all of the same type, referred to as the array dtype. An array can be indexed by a tuple of nonnegative integers, by booleans, by another array, or by integers. The rank of the array is the number of dimensions. The shape of the array is a tuple of integers giving the size of the array along each dimension. One way we can initialize NumPy arrays is from Python lists, using nested lists for two- or higher-dimensional data.

import numpy as np

# Create arrays
# Create ndarrays from lists. note: every element must be the same type (will be converted if
# possible)

data1 = [1,2,3,4,5]
arr1 = np.array(data1)

data2 = [range(1,5),range(5,9)]
arr2 = np.array(data2)

print(arr1)

print(arr2)

print(arr1.tolist())

Creating Special Arrays

# creating special array
a = np.zeros(10)
print(a)

b = np.zeros((3,6))
print(b)

c = np.ones(10)
print(c)





 

numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None, axis=0)
Return evenly spaced numbers over a specified interval.

a = np.linspace(2.0,3.0,num=5)
print(a)

a = np.linspace(2.0,3.0,num=5,endpoint=False)
print(a)

a = np.linspace(2.0,3.0,num=5,retstep=True)  # with retstep=True, a is a tuple: (samples, step)
print(a)

numpy.logspace(start, stop, num=50, endpoint=True, base=10.0, dtype=None, axis=0)
Return numbers spaced evenly on a log scale.

In linear space, the sequence starts at base ** start (base to the power of start) and ends with base ** stop.
a = np.logspace(2.0,3.0,num=4)
print(a)

a = np.logspace(2.0,3.0,num=4,endpoint=False)
print(a)

a = np.logspace(2.0,3.0,num=4,base=2)
print(a)
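
One way to see the relationship between the two functions is to build the log-spaced values by hand. The sketch below, with illustrative variable names, raises the default base of 10 to a set of linearly spaced exponents and compares the result with logspace.

```python
import numpy as np

exponents = np.linspace(2.0, 3.0, num=4)  # evenly spaced exponents from 2 to 3
by_hand = 10.0 ** exponents               # raise the default base to each exponent
direct = np.logspace(2.0, 3.0, num=4)     # same interval, default base of 10

print(by_hand)
print(direct)
print(np.allclose(by_hand, direct))       # the two approaches agree
```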

arange is like range, except it returns an array (not a list)
a = np.arange(10)
print(a)

float_array = a.astype(float)
print(type(float_array),float_array)

arr = np.arange(5,dtype=float)
print("data type: ",arr.dtype," number of dim: ",arr.ndim," shape: ",arr.shape," size: ",arr.size)

arr1 = np.array([[1,2,3],[4,5,6]])

print(arr1.shape)



Reshaping an array

arr = np.arange(10)
print(arr.shape)

temp = arr.reshape(2,5)
print(temp)



Add New Dimension to Array

arr = np.arange(5,dtype=float)
print(arr.shape)

temp = arr[:,np.newaxis]
print(temp)

print(temp.shape)
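
For comparison, the same column shape can be reached with reshape, and np.newaxis can just as well be placed first to get a row. This is a small sketch with made-up variable names, not the only way to add a dimension.

```python
import numpy as np

vec = np.arange(5, dtype=float)

col_a = vec[:, np.newaxis]   # new axis at the end: shape (5, 1)
col_b = vec.reshape(-1, 1)   # -1 lets NumPy infer the length of that axis
row = vec[np.newaxis, :]     # new axis at the front: shape (1, 5)

print(col_a.shape, col_b.shape, row.shape)
print(np.array_equal(col_a, col_b))
```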

Transposing an Array

arr = np.arange(10,dtype=float).reshape(2,5)
print(arr)

print(arr.transpose())




# Flatten: always returns a flat copy of the original array
print(arr.flatten())

# Ravel: returns a view of the original array whenever possible.

print(arr.ravel())
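
The copy versus view distinction can be checked directly. The sketch below uses np.shares_memory and a write through the ravel result; it assumes a contiguous array, which is the case where ravel can return a view.

```python
import numpy as np

grid = np.arange(10, dtype=float).reshape(2, 5)

flat_copy = grid.flatten()  # independent copy of the data
flat_view = grid.ravel()    # view onto the same data (grid is contiguous)

print(np.shares_memory(grid, flat_copy))
print(np.shares_memory(grid, flat_view))

flat_view[0] = 99.0         # writing through the view changes grid as well
print(grid[0, 0])
print(flat_copy[0])         # the copy made earlier is unaffected
```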



Summary on Axes: Reshaping, Flattening, and Selection

arr = np.arange(2*3*4)
print(arr)

temp = arr.reshape(2,3,4)
print(temp)
print(temp[:,1,1])
print(temp[1,2,:])
print(temp[1,2,3])

print(temp.ravel())

Stacking Arrays

a = np.array([10,20])
b = np.array([30,40])

ab = np.stack((a,b))
print(ab)

print(ab.T)
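
np.stack also accepts an axis argument that controls where the new dimension goes. As a brief sketch (the result names are illustrative), stacking along axis=1 gives the same result as transposing the default stack, while np.concatenate joins the arrays without adding a dimension.

```python
import numpy as np

a = np.array([10, 20])
b = np.array([30, 40])

rows = np.stack((a, b), axis=0)  # a and b become rows: [[10, 20], [30, 40]]
cols = np.stack((a, b), axis=1)  # a and b become columns: [[10, 30], [20, 40]]
joined = np.concatenate((a, b))  # stays one-dimensional: [10, 20, 30, 40]

print(rows)
print(cols)
print(joined)
```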

Array Element Selection

arr = np.arange(10,dtype=float).reshape(2,5)
print(arr)

print(arr[0])
print(arr[0,:])
print(arr[0,3])
print(arr[0,2:])

Fancy Indexing: Integer or Boolean Array Indexing

print(arr[arr>5])
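
The line above covers the boolean case. For the integer case, a short sketch on the same 2 x 5 array might look like this (the result names are hypothetical):

```python
import numpy as np

arr = np.arange(10, dtype=float).reshape(2, 5)

picked_cols = arr[:, [0, 3]]      # columns 0 and 3 of every row
picked_cells = arr[[0, 1], [2, 4]]  # pairs up indices: elements (0, 2) and (1, 4)
mask = arr > 5                    # boolean array with the same shape as arr

print(picked_cols)
print(picked_cells)
print(mask)
```

Note that indexing with two integer lists pairs the indices element by element rather than selecting a rectangular block.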

Vectorized Operations

nums = np.arange(5)
print(nums)

print(nums*10)

nums = np.sqrt(nums)
print(nums)

print(np.ceil(nums))

print(np.isnan(nums))

print(nums+np.arange(5))

print(np.maximum(nums, np.array([1, -2, 3, -4, 5])))  # compare element-wise

# compute Euclidean distance between two vectors
vec1 = np.random.randn(10)
vec2 = np.random.randn(10)

dist = np.sqrt(np.sum((vec1-vec2)**2))
print(dist)
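
The same distance can also be obtained with np.linalg.norm. The sketch below uses two small fixed vectors (chosen here only so the result is easy to check by hand) instead of random ones.

```python
import numpy as np

p = np.array([1.0, 2.0, 3.0])
q = np.array([4.0, 6.0, 3.0])

manual = np.sqrt(np.sum((p - q) ** 2))  # sqrt(3**2 + 4**2 + 0**2)
via_norm = np.linalg.norm(p - q)        # Euclidean (L2) norm of the difference

print(manual, via_norm)
```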

Math and Stats

#math and stats

rand = np.random.randn(4,2)
print(rand)

print(rand.mean())
print(rand.std())

print(rand.argmin())  # index of the minimum value in the flattened array
print(rand.argmax())  # index of the maximum value in the flattened array
print(rand.sum())

print(rand)
print(rand.sum(axis=0))

print(rand.sum(axis=1))
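
Because the random values above change from run to run, the effect of the axis argument is easier to follow on a small fixed array. This is a minimal sketch with an illustrative 3 x 2 matrix:

```python
import numpy as np

m = np.array([[1, 2],
              [3, 4],
              [5, 6]])

col_sums = m.sum(axis=0)  # collapses the rows: one sum per column -> [9, 12]
row_sums = m.sum(axis=1)  # collapses the columns: one sum per row -> [3, 7, 11]
total = m.sum()           # no axis: sum over every element -> 21

flat_idx = m.argmax()                        # position in the flattened array
row_col = np.unravel_index(flat_idx, m.shape)  # convert back to (row, column)

print(col_sums, row_sums, total)
print(flat_idx, row_col)
```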

Random Numbers

#random numbers

np.random.seed(12345)

np.random.rand(2,3) # 2 x 3 matrix of uniform samples in [0, 1)

ar  = np.random.randn(10) # random normals (mean 0, sd 1)

print(ar.mean())
print(ar.std())

np.random.randint(0, 2, 10) # 10 randomly picked 0 or 1
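
Recent NumPy versions also offer a Generator-based interface through np.random.default_rng. The sketch below mirrors the calls above with that interface; it is an alternative shown for comparison, not what the snippet above uses, and the drawn values will differ from those of np.random.seed.

```python
import numpy as np

rng = np.random.default_rng(12345)        # seeded generator for reproducible draws

uniform = rng.random((2, 3))              # 2 x 3 uniform samples in [0, 1)
normals = rng.standard_normal(10)         # 10 draws from a standard normal
coin_flips = rng.integers(0, 2, size=10)  # high is exclusive, so values are 0 or 1

print(uniform)
print(normals.mean(), normals.std())
print(coin_flips)
```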