“You do not really understand something unless you can explain it to your grandmother.”

Being able to transfer your knowledge to others using plain, easily understandable descriptions is quite important, and it also helps solidify your own comprehension of the topic. So I have decided to start a series of blog posts that build common machine learning algorithms from scratch, in order to clarify these methods for myself and make sure I correctly understand the mechanics behind each of them.

Let me be clear from the very beginning: **It’s all about fitting a function!**

Consider a dataset containing the number of trucks that crossed the border from Mexico to the U.S. through the Otay Mesa port. Note that it’s just a subset of the entire inbound crossings dataset available on Kaggle. First of all, it’s always better to plot the data, which may help us gain some insight. To load the data from a .CSV file, we are going to use Pandas, a well-known Python library for data analysis and manipulation. We can then plot the data using Matplotlib (another Python library, for data visualization).

```
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.dates as mdates

# load the raw data
df = pd.read_csv('./regression/dataset.csv')
x_training_set = pd.to_datetime(df.Date, infer_datetime_format=True).to_numpy()
y_training_set = df.Value.to_numpy()
number_of_training_examples = len(x_training_set)

# plot the raw data
fig, ax = plt.subplots(figsize=(8, 6))
year_locator = mdates.YearLocator(2)
year_month_formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_locator(year_locator)
ax.xaxis.set_minor_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(year_month_formatter)
ax.plot(x_training_set, y_training_set, ".")
fig.autofmt_xdate()
plt.show()
```

Note that in the original data, each value corresponds to a month, so I mapped the date intervals to an integer representation.
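As a minimal sketch of that mapping (using made-up month labels, not the real Date column), each month simply becomes its index in the series:

```python
# hypothetical month labels standing in for the real Date column
dates = ["2010-01", "2010-02", "2010-03", "2010-04"]

# map each month to an integer index: 2010-01 -> 0, 2010-02 -> 1, ...
x_numeric = list(range(len(dates)))
print(x_numeric)  # [0, 1, 2, 3]
```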

What we are observing here is obviously not an exact linear function, but for the sake of simplicity we can model border crossings with a linear function! As we already know, the equation of a line is as below:

\[f(x) = mx + c\]where **m** stands for the slope and **c** is the intercept on the y axis. But there are infinitely many possible values for these parameters. Let’s look at some arbitrary lines with the values [m=70, c=40000], [m=100, c=40000], and [m=140, c=40000], drawn in orange, green, and red respectively.

```
# plot arbitrary lines
def plot_line(ax, m, c, xMax):
    y0 = (1 * m) + c
    ymax = (xMax * m) + c
    ax.plot([x_training_set[0], x_training_set[xMax - 1]], [y0, ymax])

fig, ax = plt.subplots(figsize=(8, 6))
year_locator = mdates.YearLocator(2)
year_month_formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_locator(year_locator)
ax.xaxis.set_minor_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(year_month_formatter)
ax.plot(x_training_set, y_training_set, ".")
fig.autofmt_xdate()
plot_line(ax, 70, 40000, number_of_training_examples)
plot_line(ax, 100, 40000, number_of_training_examples)
plot_line(ax, 140, 40000, number_of_training_examples)
plt.show()
```

But what parameter values should we choose so that our linear equation properly fits the data points?

To find a proper fit, we basically have to minimize the average distance of the data points from our line. In other words, we measure the difference between the predicted values and the actual training data.

There are two common ways to calculate the error:

**Mean Squared Error (MSE)**: which considers the squared difference of the values.

**Mean Absolute Error (MAE)**: which considers the absolute difference of the values.

Note that we need to sum up these errors as positive numbers; otherwise negative errors would cancel out positive ones and make our optimization problem meaningless.
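As a quick sketch of the two error measures (with made-up numbers, not the border-crossing data):

```python
# hypothetical predicted and actual values for illustration
predicted = [10.0, 12.0, 15.0]
actual = [11.0, 11.0, 18.0]

n = len(actual)
# MSE: average of squared differences
mse = sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n
# MAE: average of absolute differences
mae = sum(abs(p - a) for p, a in zip(predicted, actual)) / n

print(mse)  # (1 + 1 + 9) / 3 = 3.666...
print(mae)  # (1 + 1 + 3) / 3 = 1.666...
```

Both keep every per-example error positive; MSE punishes large deviations more heavily than MAE.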

Our objective is to find the linear equation parameters **m** and **c** that minimize the average error over the whole training set (known as the **cost function**), defined below for the MSE case:

\[J(m, c) = \frac{1}{n}\sum_{i=1}^{n}\left(f(x_i) - y_i\right)^2\]

where **n** is the number of training examples. Now let’s explore the parameter space and plot the cost function to see what it looks like. For the MSE cost function we get the parameter-space/cost plot below:

```
# visualize the cost function
def line_equation(m, c, x):
    return (m * x) + c

def cost_function(m, c, training_examples_x, training_examples_y):
    sum_of_errors = 0
    item_index = 0
    for example in training_examples_x:
        predicted_y = line_equation(m, c, item_index)
        sum_of_errors += (predicted_y - training_examples_y[item_index])**2
        # sum_of_errors += abs(predicted_y - training_examples_y[item_index])  # MAE variant
        item_index += 1
    mse = sum_of_errors / len(training_examples_x)
    return mse

fig = plt.figure()
fig.set_size_inches(8, 6)
ax = fig.add_subplot(projection='3d')
cost_func_x_points = []
cost_func_y_points = []
cost_func_z_points = []
for m in np.arange(-200, 500, 10):
    for c in np.arange(-10000, 60000, 200):
        cost = cost_function(m, c, x_training_set, y_training_set)
        cost_func_x_points.append(m)
        cost_func_y_points.append(c)
        cost_func_z_points.append(cost)
ax.scatter(cost_func_x_points, cost_func_y_points,
           cost_func_z_points, c=cost_func_z_points, marker='.')
ax.set_xlabel('M')
ax.set_ylabel('C')
ax.set_zlabel('Cost')
plt.show()
```

and for the MAE we have the plot below:

As you can see, both of these cost functions are convex, with a single minimum at the bottom of the bowl (the global minimum). Based on calculus, this means that if we find the point where the derivative of the cost function is zero, we have found the optimal parameters for our model. For the MSE cost, we can directly use the equation below to find them.

\(\theta = (X^T X)^{-1} X^T Y\) where X is the matrix of training feature vectors, Y is the vector of outputs, and the result, \(\theta\), holds the parameters of our regression model.

This equation is called the Normal Equation, and you can find the math behind it here.
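As a brief sketch of where it comes from: writing the MSE cost in matrix form (dropping the constant \(1/n\) factor, which doesn’t change the minimizer) gives

\[J(\theta) = (X\theta - Y)^T (X\theta - Y)\]

and setting its gradient with respect to \(\theta\) to zero yields the Normal Equation:

\[\nabla_\theta J = 2X^T X\theta - 2X^T Y = 0 \;\Rightarrow\; X^T X\theta = X^T Y \;\Rightarrow\; \theta = (X^T X)^{-1} X^T Y\]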

So let’s run it on our dataset and see how it works.

```
# normal equation using numpy
x_training_set_numbers = np.arange(0, len(x_training_set), 1)
ones_vector = np.ones((len(x_training_set), 1))
x_training_set_numbers = np.reshape(x_training_set_numbers, (len(x_training_set_numbers), 1))
x_training_set_numbers = np.append(ones_vector, x_training_set_numbers, axis=1)
theta_list = np.linalg.inv(x_training_set_numbers.T.dot(x_training_set_numbers)).dot(x_training_set_numbers.T).dot(y_training_set)
print(theta_list)
#visualize raw data with fitted line
fig, ax = plt.subplots(figsize=(8, 6))
year_locator = mdates.YearLocator(2)
year_month_formatter = mdates.DateFormatter("%Y-%m")
ax.xaxis.set_major_locator(year_locator)
ax.xaxis.set_minor_locator(mdates.YearLocator())
ax.xaxis.set_major_formatter(year_month_formatter)
ax.plot(x_training_set,y_training_set, ".")
fig.autofmt_xdate()
plot_line(ax,theta_list[1],theta_list[0],number_of_training_examples)
plt.show()
```
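To sanity-check the approach, here is the same normal-equation computation run on a tiny synthetic dataset (made up for this check) generated from a known line; it should recover the intercept and slope exactly:

```python
import numpy as np

# synthetic data from the known line y = 2x + 3 (no noise)
x = np.arange(6)
y = 2 * x + 3

# design matrix with a column of ones for the intercept
X = np.column_stack([np.ones_like(x), x])

# normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T.dot(X)).dot(X.T).dot(y)
print(theta)  # approximately [3. 2.] -> intercept c, slope m
```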

The CUDA toolkit is another software layer on top of the Nvidia driver. As mentioned on the Nvidia website, CUDA drivers are generally backward compatible with older CUDA toolkit versions. This means that if you already have nvidia-driver-515, which is a fairly new version, it is compatible with cuda-toolkit-11-2.

**Installing Nvidia Driver**

The Nvidia driver provides the underlying libraries necessary for making the operating system (in our case **Ubuntu 20.04**) work with the graphics processor. To install the driver, you can simply run the following command in the terminal:

```
sudo ubuntu-drivers autoinstall
```

or the following command in case you need a specific version:

```
sudo apt install nvidia-driver-470
```

It is also possible to install the driver using the Ubuntu Software Center.

To verify a successful installation, run the command below:

```
nvidia-smi
```

If you look at the TensorFlow installation instructions, the compatible cuda-toolkit version is mentioned in the Conda install command.

To make TensorFlow work, we should install cuda-toolkit 11.2. Basically, you might assume that Conda will take care of the cudatoolkit installation, but it **DOESN’T**: it is necessary to install the CUDA toolkit separately on the system. We can either install it using the runfile provided by Nvidia or install it with apt.

Runfile installation:

```
wget https://developer.download.nvidia.com/compute/cuda/11.2.0/local_installers/cuda_11.2.0_460.27.04_linux.run
sudo sh cuda_11.2.0_460.27.04_linux.run
```

Apt installation:

```
sudo apt install cuda-toolkit-11-2
```

Note: it is necessary to add the CUDA toolkit path to the environment variables so that TensorFlow and PyTorch can find its libraries and tools. You can add the following lines to your **~/.bashrc** to pick up the cudatoolkit automatically in every new shell:

```
export CUDA_HOME=/usr/local/cuda
export PATH=/usr/local/cuda/bin:$PATH
export CPATH=/usr/local/cuda/include:$CPATH
export LIBRARY_PATH=/usr/local/cuda/lib64:$LIBRARY_PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/extras/CUPTI/lib64:$LD_LIBRARY_PATH
```

To verify the CUDA toolkit installation, run:

```
nvcc --version
```

**Setting up PyTorch(GPU):**

As described on the PyTorch website, it’s possible to set up different versions of PyTorch using Conda. It currently supports cuda toolkit 10.2, 11.3, and 11.6.

Note that other versions of the CUDA toolkit will not work with PyTorch, so make sure the installed version is among the ones mentioned in the PyTorch installation instructions!

**Setting up TensorFlow(GPU):**

After setting up the CUDA driver and toolkit, we are ready to install TensorFlow using the commands below:

```
conda install -c conda-forge cudatoolkit=11.2 cudnn=8.1.0
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/
python3 -m pip install tensorflow
# Verify install:
python3 -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"
```

Note that each time a new terminal is opened, it is **necessary** to define the cuda toolkit path in the environment (the `export LD_LIBRARY_PATH` line above) before using TensorFlow. It is also possible to add the path permanently using the commands below:

```
mkdir -p $CONDA_PREFIX/etc/conda/activate.d
echo 'export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$CONDA_PREFIX/lib/' > $CONDA_PREFIX/etc/conda/activate.d/env_vars.sh
```

This way Conda will automatically load the cuda toolkit path each time the environment is activated.

**Setting up TensorFlow(GPU) with Docker:**

One way to avoid problems with the cuda-toolkit, and also to use different versions of TensorFlow, is to run TensorFlow Docker containers, which **DIRECTLY** interact with the CUDA driver. There are different TensorFlow containers, each integrated with JupyterLab. After running the container, TensorFlow will be accessible through JupyterLab.

After installing **Docker**, you should install nvidia-docker2 using the following command:

```
sudo apt-get install -y nvidia-docker2
```

Pull the TensorFlow (bundled with JupyterLab) container:

```
docker pull tensorflow/tensorflow:latest-gpu-jupyter
```

Then run the container and access JupyterLab through your browser:

```
docker run --gpus all -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
```

Make sure to add the **--gpus all** flag to the command to give Docker access to the GPUs. You can then access the Jupyter notebook through **http://yourip:8888/** (use localhost if you are running it on a local computer). You can also add the **--restart unless-stopped** flag to make the Docker container start again after each reboot (unless stopped manually).

```
docker run --restart unless-stopped --gpus all -it -p 8888:8888 tensorflow/tensorflow:latest-gpu-jupyter
```

To verify that GPU devices are visible to TensorFlow, run the following code in a Jupyter notebook cell:

```
import tensorflow as tf
print(tf.config.list_physical_devices('GPU'))
```