Data Structures in Python: An Overview

In previous posts, we discussed variables and how they’re some of the most basic components of any programming language. They’re used to store values that can later be accessed, printed, manipulated and processed for a very wide range of mathematical and analytical purposes. We also discussed the different types of values that can be stored in Python variables and the use cases for each.

In this post, we’ll go from the simple concept of variables to the slightly more sophisticated concept of data structures. We’ll discuss what they are, what types of data structures there are, and what each one is used for.

What is a Data Structure?

Like variables, data structures give programmers a way to assign and store data so it can be accessed and worked with efficiently. As you may recall from our previous posts, a variable typically holds one value, whether it’s a string, an integer, a floating-point number or something else. You can write a lot of useful code with variables alone. However, being limited to one value per variable quickly becomes restrictive when your code has to deal with larger data sets.

Enter data structures. Whereas variables tend to be assigned single values, data structures can hold many values at once, whether they’re whole numbers, decimals, text or a combination of them. A helpful analogy is to think of variables as the photos on your smartphone. Each one contains a different image (which is a form of data in itself). Managing and viewing your photos is easy when you only have a few, but once you have hundreds, it’s better to sort them into folders.

In programming, data structures act like those folders: containers for managing larger amounts of data. You can give them names, choose among different types, and use the properties of each type for specific data processing tasks. In Python, data structures are incredibly important because they help you manage data in ways that make your code more efficient and easier to understand. Specifically, here’s why you’ll want to learn how to use data structures as a data science novice:

Organization

Data structures arrange data in specific ways, making it easier to perform operations like searching, sorting, and modifying the data. For example, a list is one of the simplest data structures in Python that lets you store an ordered collection of items.
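Here is a small sketch of searching, sorting, and modifying a list. The `scores` variable and its values are made up purely for illustration:

```python
# Hypothetical example data: a list of exam scores
scores = [72, 95, 88, 61]

# Searching: check whether a value exists and find its position
print(88 in scores)        # True
print(scores.index(88))    # 2

# Sorting: sorted() returns a new, ordered list and leaves the original alone
print(sorted(scores))      # [61, 72, 88, 95]

# Modifying: replace the first element in place
scores[0] = 75
print(scores)              # [75, 95, 88, 61]
```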

Efficiency

Choosing the right data structure can make your program run faster and use less memory. For example, if you need to frequently access elements by their position, a list is very efficient. However, if you need to look up elements based on a key, a dictionary (another type of data structure) would be a better choice.
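To see the difference, here is a sketch that stores the same made-up prices both ways. Indexing into a list and looking up a key in a dictionary both go straight to the item (dictionary lookups are constant time on average), whereas finding an item by name in a list means checking entries one at a time. The `find_price` helper is a hypothetical function written only for this example:

```python
# Hypothetical product data stored two ways
prices_list = [("apple", 0.50), ("banana", 0.25), ("cherry", 3.00)]
prices_dict = {"apple": 0.50, "banana": 0.25, "cherry": 3.00}

# Access by position: a list jumps straight to the requested index
print(prices_list[1])  # ('banana', 0.25)

# Lookup by name in a list means scanning the items one by one
def find_price(items, name):
    for item_name, price in items:
        if item_name == name:
            return price
    return None  # name not found

print(find_price(prices_list, "cherry"))  # 3.0

# A dictionary goes directly to the value stored under a key
print(prices_dict["cherry"])  # 3.0
```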

Abstract Data Types (ADTs)

In Python, many data structures are built on the concept of Abstract Data Types (ADTs). An ADT defines what operations can be performed on a data structure, but not how these operations are implemented. For instance, Python lists allow you to append items, remove items, and access items by index; all of these operations are part of the list ADT.
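To make the idea more concrete, here’s a minimal sketch of a classic ADT, the stack, which is defined only by its operations: push an item, pop the most recent item, peek at it, and check whether the stack is empty. The `Stack` class and its method names are our own illustrative choices, not a built-in Python type. This version happens to store its items in a list, but code that uses it never needs to know that:

```python
class Stack:
    """A stack ADT (last in, first out), backed internally by a list."""

    def __init__(self):
        self._items = []  # implementation detail hidden from callers

    def push(self, item):
        # Add an item to the top of the stack
        self._items.append(item)

    def pop(self):
        # Remove and return the most recently pushed item
        return self._items.pop()

    def peek(self):
        # Return the most recently pushed item without removing it
        return self._items[-1]

    def is_empty(self):
        return len(self._items) == 0


stack = Stack()
stack.push("first")
stack.push("second")
print(stack.pop())       # second
print(stack.peek())      # first
print(stack.is_empty())  # False
```

Because callers only use `push`, `pop`, `peek`, and `is_empty`, you could swap the internal list for another structure, such as `collections.deque`, without changing any code that uses the stack.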

Memory Management

Python handles memory management for you, but understanding data structures can help you write more efficient code. For instance, lists in Python are dynamic, meaning they can grow and shrink as needed. This flexibility comes at a small performance cost, but it’s often worth it for the convenience.
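You can watch a list grow and shrink with the sketch below. It uses `sys.getsizeof` from the standard library, which reports the size of the list object itself in bytes. The exact numbers depend on your Python version and platform, but in CPython you’ll typically see the size jump in steps rather than on every append, because the list reserves spare room as it grows:

```python
import sys

# Start with an empty list and grow it one item at a time
readings = []
for value in range(10):
    readings.append(value)
    # Print the item count and the list object's size in bytes
    # (the byte counts vary by Python version and platform)
    print(len(readings), sys.getsizeof(readings))

# Lists can shrink too: delete the last three items
del readings[-3:]
print(len(readings))  # 7
```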

Types of Data Structures in Python

In Python, several data structures are built into the language, each serving different purposes and use cases. Knowing which one to use in a given situation is crucial to the correctness of your output and the overall efficiency of the programs you write. Here are the most common data structures you’ll be using in the field of data science and business analytics:

Lists

Lists are ordered collections of items that are mutable, meaning you can change their content by adding, removing, or modifying elements. Lists can also contain items of different data types, so a single list can hold strings, floating-point numbers, and integers side by side.

Use Cases

Lists are great for maintaining an ordered sequence of items. They are commonly used to store collections of data that you intend to iterate over or modify. If you expect to change a collection later, for example by adding items to it or removing items from it, a list is usually the right choice.

Syntax and Examples

Coding lists is similar to coding variables, except that the values in a list are enclosed in square brackets. Commas separate the values from one another, and strings still need to be in quotation marks. See below:

# Creating a list
fruits = ["apple", "banana", "cherry"]
print(fruits)
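Because lists are mutable, you can change them after creating them. The sketch below rebuilds the `fruits` list; "mango", "green apple", and the `mixed` list are just example values:

```python
# Creating a list
fruits = ["apple", "banana", "cherry"]

fruits.append("mango")      # add an item to the end
fruits.remove("banana")     # remove the first matching item
fruits[0] = "green apple"   # replace the item at index 0
print(fruits)  # ['green apple', 'cherry', 'mango']

# A single list can mix data types
mixed = ["apple", 3, 2.5, True]
print(type(mixed[1]))  # <class 'int'>
```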

Tuples

Tuples are ordered collections of items that are immutable, meaning their content cannot be changed after creation. Like lists, tuples can contain items of different data types.

Use Cases

Tuples are used when you need an ordered collection of items that should not change throughout the program. They are often used to group a fixed set of related values in a single variable, such as the x and y parts of a coordinate, and to return multiple values from a function.
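Returning multiple values from a function is one of the most common places you’ll meet tuples. In this sketch, `min_max` is a hypothetical helper written for illustration; Python packs its two return values into a single tuple, which you can then unpack into separate variables:

```python
def min_max(values):
    # Returning two comma-separated values actually returns one tuple
    return min(values), max(values)

result = min_max([4, 9, 1, 7])
print(result)  # (1, 9)

# Tuple unpacking assigns each element to its own variable
lowest, highest = min_max([4, 9, 1, 7])
print(lowest, highest)  # 1 9
```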

Syntax and Examples

Coding tuples is straightforward and very similar to coding lists. The main syntactic difference is that you’ll use parentheses instead of square brackets to contain the values you want to include in this data structure.

# Creating a tuple
coordinates = (10.0, 20.0)
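Here’s what immutability looks like in practice: trying to change an element raises a `TypeError`. The sketch also shows a common gotcha. It’s actually the comma, not the parentheses, that makes a tuple, so a one-item tuple needs a trailing comma:

```python
coordinates = (10.0, 20.0)
print(coordinates[0])  # 10.0

# Tuples can't be changed after creation
try:
    coordinates[0] = 15.0
except TypeError as err:
    print("Cannot modify a tuple:", err)

# A one-item tuple needs a trailing comma; without it you just get the value
single = (5,)
not_a_tuple = (5)
print(type(single), type(not_a_tuple))  # <class 'tuple'> <class 'int'>
```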

Dictionaries

Dictionaries are collections of key-value pairs. Each key in a dictionary must be unique, and keys are used to access the corresponding values. Since Python 3.7, dictionaries remember the order in which keys were inserted, but you look values up by key rather than by position.

Use Cases

Dictionaries are ideal for scenarios where you need to associate unique keys with values and perform fast lookups, insertions, and deletions based on keys.

Syntax and Examples

Coding dictionaries is also a straightforward affair. We’ll be using curly brackets to contain the values and each value needs to come with a key. The relationship between a key and a value is expressed using a colon as seen below:

# Creating a dictionary
student_grades = {"Alice": 85, "Bob": 92, "Charlie": 78}

In this dictionary, the keys are the students’ names. These keys are used to access the corresponding values, which are the grades associated with each student.

# Accessing a value by key
print(student_grades["Alice"]) # Output: 85

As you can see, we didn’t have to search through the whole dictionary to find Alice’s grade. Referencing her key was enough to retrieve the value directly, without bothering with everything else.
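Dictionaries also make insertions, updates, and deletions by key straightforward. The sketch below continues the `student_grades` example; the names "Dana" and "Eve" are made-up additions. Square-bracket access raises a `KeyError` for a key that doesn’t exist, so the `.get()` method, which returns a default value instead, is often the safer choice:

```python
student_grades = {"Alice": 85, "Bob": 92, "Charlie": 78}

student_grades["Dana"] = 88    # insert a new key-value pair
student_grades["Bob"] = 95     # update the value for an existing key
del student_grades["Charlie"]  # delete a key and its value

print(student_grades)  # {'Alice': 85, 'Bob': 95, 'Dana': 88}

# .get() returns a default instead of raising KeyError for a missing key
print(student_grades.get("Eve", "no grade"))  # no grade
```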

Sets

Sets are unordered collections of unique items. Sets do not allow duplicate elements and support operations like union, intersection, and difference.

Use Cases

Sets are useful for storing collections of unique items and performing mathematical set operations. They are also efficient for membership testing, i.e., checking if an item is part of the set.

Syntax and Examples

The syntax for sets is similar to the previous data structures. We’ll be using curly brackets like we did in dictionaries, but the values will not be accompanied by keys as seen here:

# Creating a set
fruits = {"apple", "banana", "cherry"}
# Checking membership
print("apple" in fruits) # Output: True
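The mathematical set operations mentioned above map onto simple operators. In this sketch, the team rosters are made-up examples. Since sets are unordered, the order in which Python prints their elements can vary from run to run. One more gotcha: because dictionaries also use curly brackets, `{}` creates an empty dictionary, so use `set()` when you need an empty set:

```python
# Hypothetical team rosters
team_a = {"Alice", "Bob", "Charlie"}
team_b = {"Bob", "Dana"}

print(team_a | team_b)  # union: names on either team
print(team_a & team_b)  # intersection: {'Bob'}
print(team_a - team_b)  # difference: names on team_a but not team_b

# Converting a list to a set removes duplicates
unique_fruits = set(["apple", "apple", "banana"])
print(len(unique_fruits))  # 2

# {} is an empty dictionary; an empty set needs set()
print(type({}))     # <class 'dict'>
print(type(set()))  # <class 'set'>
```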

Lists vs Tuples vs Dictionaries vs Sets

Knowing the distinction between these data structures can spell the difference between spot-on data analytics projects and abject disasters. For clarity, we’ll contrast each data structure type with the others to make sure you don’t get them mixed up:

Lists vs Tuples

Both are ordered collections, but lists are mutable while tuples are immutable. Use lists when you need to modify the sequence, and tuples when you need a fixed collection.
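One practical consequence of immutability: tuples can be used as dictionary keys, while lists can’t, because Python requires keys to be hashable and lists aren’t. The "Point A" label below is a made-up example:

```python
# A tuple works as a dictionary key
point_labels = {(10.0, 20.0): "Point A"}
print(point_labels[(10.0, 20.0)])  # Point A

# A list does not: Python raises TypeError because lists are unhashable
try:
    point_labels[[10.0, 20.0]] = "Point B"
except TypeError as err:
    print("Error:", err)
```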

Lists vs Dictionaries

Lists are ordered collections of items, accessed by index. Dictionaries are collections of key-value pairs, accessed by keys; they keep insertion order in Python 3.7 and later, but you don’t access their values by position. Use dictionaries for fast lookups based on unique keys.

Sets vs Lists and Tuples

Sets are unordered collections of unique items, making them ideal for operations involving uniqueness and set theory. Lists and tuples, by contrast, allow duplicates and maintain order.
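A common place to see this difference is removing duplicates from a list. Converting to a set drops the duplicates but doesn’t guarantee any order. If you need to keep the order in which items first appeared, `dict.fromkeys` is a handy alternative, since it relies on dictionaries preserving insertion order (guaranteed from Python 3.7). The page-visit data is made up for illustration:

```python
visits = ["home", "about", "home", "blog", "about"]

# Duplicates removed, but the order is not guaranteed
unique_unordered = set(visits)

# Duplicates removed, first-seen order kept
unique_ordered = list(dict.fromkeys(visits))
print(unique_ordered)  # ['home', 'about', 'blog']
```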

Dictionaries vs Sets

Both enforce uniqueness, though in different ways. In a dictionary, each key must be unique (the values can repeat), which makes dictionaries suitable for associative arrays. A set holds only unique values, which makes it suitable for membership testing and set operations.
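The sketch below shows these uniqueness rules in action, reusing the grade-book idea with made-up values: the keys stay unique, a repeated value is fine, and a set built from the values keeps only the distinct grades:

```python
# Made-up grades: keys (names) are unique, values (grades) may repeat
student_grades = {"Alice": 85, "Bob": 92, "Charlie": 85}
distinct_grades = set(student_grades.values())
print(len(student_grades), len(distinct_grades))  # 3 2

# Assigning to an existing key overwrites its value instead of adding a duplicate key
student_grades["Alice"] = 90
print(len(student_grades))  # 3
```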

And there you have it, data structures in Python and their four main types. In our next post, we’ll be giving each one an in-depth look along with ways to use and manipulate them. Until then, familiarize yourself with these types and the basic coding syntax involved in creating each one.

About Glen Dimaandal

Glen Dimaandal is a data scientist from the Philippines. He has a post-graduate degree in Data Science and Business Analytics from the prestigious McCombs School of Business at the University of Texas at Austin. He has nearly 20 years of experience in the field, having worked with major brands from the US, UK, Australia and the Asia-Pacific. Glen is also the CEO of SearchWorks.PH, the Philippines' most respected SEO agency.