In data science, saving progress is essential. Just like saving your progress in a video game ensures you don’t lose your hard-earned achievements, saving data ensures that complex computations and large datasets are preserved without the need to repeat time-consuming tasks.
In our previous posts, we learned about the NumPy package in Python. We also discussed how to access values inside NumPy arrays to retrieve or modify data. In this post, we’ll learn about how we can save and load these arrays so we can halt and resume progress as needed between coding sessions.
NumPy simplifies this process by offering built-in functions that allow you to save arrays to a file and load them back into your environment effortlessly. This not only saves time and computational resources but also minimizes frustration during iterative workflows. Whether you’re using Jupyter Notebook or Google Colab, saving and loading NumPy arrays is a breeze with a little practice. Let’s get started.
Saving and Loading NumPy Arrays in Jupyter Notebook
As discussed in earlier posts, Jupyter Notebook is a widely used environment for data science, favored for its interactivity and flexibility. When you’re working in Jupyter, saving and loading NumPy arrays is pretty straightforward because you’re operating on your local machine. This means you can treat file operations in Jupyter Notebook just like you would when working in any local file system. If you’re familiar with saving and opening Microsoft Office files, working with NumPy files on Jupyter Notebook shouldn’t be a big departure from what you’re used to.
Jupyter Notebook allows you to save and load files easily using NumPy’s built-in functions. By default, when saving NumPy arrays, the files will be saved to your current working directory unless you specify a different path. In the examples below, we’ll show how to save and load arrays, along with specifying file paths for more control over file locations.
Saving NumPy Arrays in Jupyter Notebook
You can save NumPy arrays to the current working directory, or you can specify a full or relative path to save them elsewhere on your machine. Here are some examples of saving in both the default and custom locations.
Saving a Single NumPy Array (.npy format)
In this scenario, we save the array twice: first to the current working directory, then to a custom folder.
import numpy as np
import os

# Create a NumPy array
array = np.array([1, 2, 3, 4, 5])

# Save the array to the current working directory
np.save('my_array.npy', array)

# Create the 'data' folder if it doesn't already exist
os.makedirs('data', exist_ok=True)

# Save the array to the 'data' folder
np.save('data/my_array.npy', array)
In the first case, the file my_array.npy is saved in the same folder where your Jupyter notebook is running. In the second case, the file is saved in a folder named data within your working directory. If the folder doesn’t exist, it is created using os.makedirs().
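To check that nothing is lost along the way, you can save an array, load it back, and compare the two. The snippet below is a minimal sketch of that round trip. The file name roundtrip_demo and the temporary folder are made up for illustration, and the example also relies on the fact that np.save adds the .npy extension when you leave it off.

```python
import os
import tempfile

import numpy as np

# A 2-D float array, so we can check that shape and dtype survive the round trip
original = np.arange(6, dtype=np.float64).reshape(2, 3)

# Use a throwaway directory so the example leaves no files behind
with tempfile.TemporaryDirectory() as tmp_dir:
    # No extension given here: np.save appends ".npy" automatically
    np.save(os.path.join(tmp_dir, "roundtrip_demo"), original)
    saved_files = os.listdir(tmp_dir)

    # Load the file back using its full name, including the extension
    restored = np.load(os.path.join(tmp_dir, "roundtrip_demo.npy"))

print(saved_files)
print(restored.shape, restored.dtype)
print(np.array_equal(original, restored))
```

The .npy format stores the shape and data type alongside the values, which is why the loaded array matches the original without any extra bookkeeping.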
Saving Multiple NumPy Arrays (.npz format)
To store several arrays in one file, use np.savez and give each array a name as a keyword argument. As with np.save, you can write to the current working directory or to a custom folder. Here’s an example:
# Create two NumPy arrays
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Save both arrays in the current working directory
np.savez('arrays.npz', array1=array1, array2=array2)

# Create the 'results' folder if it doesn't already exist
os.makedirs('results', exist_ok=True)

# Save both arrays into the 'results' folder
np.savez('results/arrays.npz', array1=array1, array2=array2)
Here, the first call saves arrays.npz in the current working directory, and the second saves a copy in the results folder. The keyword names (array1 and array2) are the names you will use to retrieve the arrays later.
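If your arrays are large, NumPy also provides np.savez_compressed, which takes the same arguments as np.savez but compresses the archive. The sketch below compares the two on made-up data (the array names features and labels and the file names are purely illustrative). How much space you save depends heavily on the contents, so treat this as a demonstration rather than a benchmark.

```python
import os
import tempfile

import numpy as np

# Illustrative data: an all-zero array compresses extremely well
features = np.zeros((100, 100))
labels = np.arange(100)

with tempfile.TemporaryDirectory() as tmp_dir:
    plain_path = os.path.join(tmp_dir, "plain.npz")
    packed_path = os.path.join(tmp_dir, "packed.npz")

    # Same arrays, same names; only the second archive is compressed
    np.savez(plain_path, features=features, labels=labels)
    np.savez_compressed(packed_path, features=features, labels=labels)

    plain_size = os.path.getsize(plain_path)
    packed_size = os.path.getsize(packed_path)

    # Compressed archives load with the same np.load call
    with np.load(packed_path) as packed:
        features_match = np.array_equal(packed["features"], features)

print(plain_size, packed_size)
print(features_match)
```

Compression costs a little extra time when saving and loading, so it tends to pay off for large or repetitive arrays rather than small ones.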
Loading NumPy Arrays in Jupyter Notebook
Loading files works similarly: you can load from the current directory or specify the path where the file is saved.
Loading a Single NumPy Array (.npy format)
# Load the array from the current working directory
loaded_array = np.load('my_array.npy')

# Load the array from a specific folder (e.g., 'data' folder)
loaded_array_from_folder = np.load('data/my_array.npy')

print(loaded_array)
print(loaded_array_from_folder)
If my_array.npy is saved in the current directory, it will load directly. If it’s saved in a specific folder like data, you just need to specify the relative path.
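Saving and loading combine naturally into a simple cache: load the file if it already exists, otherwise do the work and save the result for next time. The following is one possible sketch of that idea. The helper load_or_compute, the function slow_squares, and the file name squares.npy are all hypothetical names introduced for this example, not part of NumPy.

```python
import os
import tempfile

import numpy as np

# Records each time the "expensive" function actually runs
compute_calls = []


def load_or_compute(path, compute):
    """Return the array stored at `path`, or compute and save it first."""
    if os.path.exists(path):
        return np.load(path)
    result = compute()
    np.save(path, result)
    return result


def slow_squares():
    # Stand-in for a time-consuming computation
    compute_calls.append("ran")
    return np.arange(5) ** 2


with tempfile.TemporaryDirectory() as tmp_dir:
    cache_path = os.path.join(tmp_dir, "squares.npy")
    first = load_or_compute(cache_path, slow_squares)   # computes, then saves
    second = load_or_compute(cache_path, slow_squares)  # loads from disk

print(first)
print(np.array_equal(first, second))
print(len(compute_calls))
```

In a real notebook you would point cache_path at a permanent folder such as data, so the saved result is still there in your next session.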
Loading Multiple NumPy Arrays (.npz format)
For multiple arrays, you load the .npz file and then access individual arrays by their assigned names.
# Load from the current directory
loaded_data = np.load('arrays.npz')

# Load from the 'results' folder
loaded_data_from_folder = np.load('results/arrays.npz')

# Access individual arrays by the names used when saving
loaded_array1 = loaded_data['array1']
loaded_array2 = loaded_data['array2']

loaded_array1_from_folder = loaded_data_from_folder['array1']
loaded_array2_from_folder = loaded_data_from_folder['array2']

print(loaded_array1, loaded_array2)
print(loaded_array1_from_folder, loaded_array2_from_folder)
In this example, the same pair of arrays is loaded twice: once from the copy in the current working directory and once from the copy in the results folder.
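If you don’t remember which names were used when an archive was saved, the object returned by np.load for an .npz file has a files attribute that lists them. The sketch below uses that list to copy every array into an ordinary dictionary, and opens the archive in a with block so the underlying file is closed afterwards. The temporary folder is only there to keep the example self-contained.

```python
import os
import tempfile

import numpy as np

with tempfile.TemporaryDirectory() as tmp_dir:
    archive_path = os.path.join(tmp_dir, "arrays.npz")
    np.savez(archive_path,
             array1=np.array([1, 2, 3]),
             array2=np.array([4, 5, 6]))

    # The with block closes the archive file once we are done reading
    with np.load(archive_path) as archive:
        names = list(archive.files)  # names the arrays were saved under
        arrays = {name: archive[name] for name in names}

# The dictionary still holds the arrays after the archive is closed
print(names)
print(arrays["array2"])
```

Arrays in an .npz file are read only when you ask for them by name, so copying them into a dictionary like this is a convenient way to keep working with them after the file is closed.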
Checking File Locations in Jupyter
To see where your files are saved, you can list the contents of your current directory using shell commands inside your Jupyter notebook. The ls command is available on macOS and Linux; on Windows, use !dir instead:
# List files in the current directory
!ls

# List files in a specific folder (e.g., 'data')
!ls data
This will display the files stored in the respective directories, helping you confirm where the NumPy arrays were saved.
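If you’d rather not depend on shell commands, the same check can be done in plain Python, which behaves the same way on every operating system. Here is a minimal sketch. The helper name list_numpy_files is made up for this example, and the temporary folder only exists so the demonstration has a file to find.

```python
import os
import tempfile

import numpy as np


def list_numpy_files(folder="."):
    """Return the sorted names of .npy and .npz files in `folder`."""
    return sorted(
        name for name in os.listdir(folder)
        if name.endswith((".npy", ".npz"))
    )


# Where relative paths such as 'my_array.npy' are resolved from
print("Working directory:", os.getcwd())
print("NumPy files here:", list_numpy_files())

# Demonstration on a throwaway folder
with tempfile.TemporaryDirectory() as tmp_dir:
    np.save(os.path.join(tmp_dir, "my_array.npy"), np.array([1, 2, 3]))
    found = list_numpy_files(tmp_dir)

print(found)
```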
Saving and Loading NumPy Arrays in Google Colab
As discussed in some of our earliest posts, Google Colab is a cloud-based platform that lets you write and run Python in your browser. Because your code runs on a temporary virtual machine, Colab’s file system is also temporary and resets when your session ends. To keep your data between sessions, you’ll need to either download the files to your local machine or store them in Google Drive. In this section, we’ll walk through both approaches for saving and loading NumPy arrays when working with Colab.
Saving NumPy Arrays in Google Colab
In Colab, there are two main options for saving your NumPy arrays:
- Saving to the temporary Colab file system. Useful for short-term storage within the same session.
By default, you can save files to Colab’s file system. However, remember that once the session is closed, these files will be lost.
import numpy as np

# Create a NumPy array
array = np.array([10, 20, 30, 40, 50])

# Save the array in the temporary Colab file system
np.save('/content/my_temp_array.npy', array)

# Check if the file was saved
!ls /content
In this example, the file my_temp_array.npy is saved in Colab’s /content directory. You can list the directory’s contents using the !ls command to verify that the file is there.
- Saving to Google Drive. For persistent storage across sessions.
To keep your NumPy arrays after the session ends, you’ll need to save them to your Google Drive. First, you need to mount your Google Drive so it becomes accessible in Colab.
# Mount Google Drive
from google.colab import drive
drive.mount('/content/drive')

# Create a folder in Google Drive
!mkdir -p '/content/drive/My Drive/colab_data'

# Save the array in Google Drive
np.save('/content/drive/My Drive/colab_data/my_array.npy', array)

# Check if the file was saved to Google Drive
!ls '/content/drive/My Drive/colab_data'
After mounting Google Drive, you can save the NumPy array to a folder in your Drive (e.g., colab_data). This ensures that the file will persist even after your Colab session ends.
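A common slip is running the save cell before Drive has been mounted. One way to guard against that is to check whether the Drive folder exists and fall back to temporary storage if it doesn’t. The sketch below illustrates the idea with stand-in folders so it can run anywhere. The helper choose_save_dir is a hypothetical name, and in Colab you would pass '/content/drive/My Drive' and '/content' in place of the fake paths.

```python
import os
import tempfile

import numpy as np


def choose_save_dir(drive_root, folder_name, fallback_root):
    """Use `drive_root` if it exists (Drive is mounted), else `fallback_root`."""
    root = drive_root if os.path.isdir(drive_root) else fallback_root
    save_dir = os.path.join(root, folder_name)
    os.makedirs(save_dir, exist_ok=True)
    return save_dir


array = np.array([10, 20, 30, 40, 50])

with tempfile.TemporaryDirectory() as tmp_dir:
    # Stand-ins for '/content/drive/My Drive' and '/content'
    fake_drive = os.path.join(tmp_dir, "drive")
    fake_local = os.path.join(tmp_dir, "local")
    os.makedirs(fake_local)

    # Before "mounting": the Drive folder is missing, so we fall back
    before_mount = choose_save_dir(fake_drive, "colab_data", fake_local)
    used_fallback = before_mount.startswith(fake_local)

    # Simulate mounting by creating the Drive folder, then try again
    os.makedirs(fake_drive)
    after_mount = choose_save_dir(fake_drive, "colab_data", fake_local)
    used_drive = after_mount.startswith(fake_drive)

    np.save(os.path.join(after_mount, "my_array.npy"), array)
    saved_ok = os.path.exists(os.path.join(after_mount, "my_array.npy"))

print(used_fallback, used_drive, saved_ok)
```

Falling back silently can hide the fact that a file is only in temporary storage, so in practice you may prefer to print a warning or raise an error when Drive is not mounted.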
Loading NumPy Arrays in Google Colab
Similar to saving, loading NumPy arrays in Colab depends on where the files are stored. You can load arrays either from Colab’s temporary file system or from Google Drive.
A. Loading from Colab’s Temporary File System
If the NumPy file is still available in the temporary file system during the same session, you can load it directly:
# Load the array from Colab's temporary file system
loaded_array = np.load('/content/my_temp_array.npy')
print(loaded_array)
Because this file exists only in the current session’s temporary storage, it won’t be available after the session restarts.
B. Loading from Google Drive
If the file is saved in Google Drive, you can load it back into Colab by accessing the relevant path after mounting the Drive:
# Load the array from Google Drive
loaded_array_from_drive = np.load('/content/drive/My Drive/colab_data/my_array.npy')
print(loaded_array_from_drive)
Once the array is loaded from Google Drive, it behaves exactly like any NumPy array in memory.
Checking Your Google Drive Files in Colab
To verify files saved in Google Drive, you can use the !ls command to view the contents of your Drive’s directories:
# Check files in your Google Drive folder
!ls '/content/drive/My Drive/colab_data'
This confirms that your files are saved and accessible from the specified location.
Downloading NumPy Arrays from Colab
If you want to download NumPy arrays directly to your local machine instead of saving them to Google Drive, you can use Colab’s built-in download functionality:
from google.colab import files

# Download the NumPy array to your local machine
files.download('/content/my_temp_array.npy')
This will trigger a download of the file my_temp_array.npy from Colab’s temporary file system to your computer.