
Sets

Sets are an important built-in data structure in Python that store multiple unique values in a single collection. Unlike Lists and Tuples, Sets do not allow duplicate values and do not maintain a fixed order of elements. In Data Analytics, Data Science, Machine Learning, Business Analytics, and Software Development, Sets are widely used for removing duplicates, performing mathematical set operations, and efficiently managing unique data.

When working with large datasets, duplicate values often appear in customer records, transaction data, product information, and analytical reports. Sets provide an efficient solution for handling such scenarios.

Organizations use Sets for:

Duplicate Removal
Data Cleaning
Customer Analysis
Product Analysis
Data Validation
Machine Learning
Business Analytics
Statistical Processing

Understanding Sets is essential for efficient data processing and data quality management.

What are Sets?

A Set is an unordered collection of unique elements.

Example:

cities = {"Jaipur", "Delhi", "Mumbai"}

Here:

cities is a Set.
Each element is unique.
Duplicate values are automatically removed.

Sets are enclosed within curly braces {}.

Why Sets are Important

Sets provide several advantages.

Benefits:

Duplicate Removal
Fast Membership Testing
Mathematical Set Operations
Efficient Data Processing

Sets are commonly used in Data Analytics workflows.

Characteristics of Sets

Sets have several important characteristics.

Unordered

Elements do not have fixed positions.

Unique Values

Duplicate elements are not allowed.

Mutable

Elements can be added and removed.

No Indexing

Elements cannot be accessed using indexes.

These characteristics make Sets different from Lists and Tuples.

Creating a Set

Example:

fruits = {"Apple", "Mango", "Orange"}

Example:

numbers = {10, 20, 30, 40}

Example:

mixed_data = {"Rahul", 25, True, 50000.50}

Sets can store different data types, as long as every element is hashable (for example numbers, strings, booleans, and tuples).
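Besides curly braces, a Set can be built with the built-in set() function from any iterable. The sketch below uses illustrative values to show this. It also shows one gotcha that matters for mixed data such as the example above: 1, True, and 1.0 compare equal, so a Set keeps only one of them.

```python
# set() builds a Set from any iterable; duplicates are dropped on the way in
from_list = set(["Apple", "Mango", "Apple"])
from_tuple = set((10, 20, 10))
from_string = set("banana")  # a string is iterated character by character

print(len(from_list))       # 2
print(len(from_tuple))      # 2
print(sorted(from_string))  # ['a', 'b', 'n']

# Gotcha: 1, True and 1.0 are equal and hash the same,
# so the Set stores only one of them
flags = {1, True, 1.0}
print(len(flags))  # 1
```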

Removing Duplicate Values

One of the most common uses of Sets is duplicate removal.

Example:

numbers = {10, 20, 20, 30, 30, 40}

print(numbers)

Output (Sets do not guarantee an order, so the values may be displayed in a different sequence):

{40, 10, 20, 30}

Duplicates are automatically eliminated.

Applications:

Data cleaning.
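In real data, duplicates often differ only in spacing or capitalisation, which a Set alone will not catch. The following is a minimal cleaning sketch. It assumes that normalising values with strip() and title() is acceptable for the data, and the city list is illustrative. It uses a set comprehension, which builds a Set in one expression:

```python
# Hypothetical raw entries with inconsistent spacing and capitalisation
raw_cities = ["jaipur", "Jaipur ", "DELHI", "Delhi", "Mumbai"]

# Normalise each value first, then let the Set drop the duplicates
clean_cities = {city.strip().title() for city in raw_cities}

print(sorted(clean_cities))                 # ['Delhi', 'Jaipur', 'Mumbai']
print(len(raw_cities) - len(clean_cities))  # 2 duplicates removed
```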

Creating an Empty Set

Incorrect:

data = {}

This creates a Dictionary.

Correct:

data = set()

Always use set() for an empty Set.
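You can confirm the difference between {} and set() by checking the type of each object:

```python
empty_braces = {}
empty_set = set()

print(type(empty_braces))  # <class 'dict'>
print(type(empty_set))     # <class 'set'>

# An empty Set is displayed as set(), never as {}
print(empty_set)           # set()
```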

Adding Elements

The add() method inserts a new element.

Example:

cities = {"Jaipur", "Delhi"}

cities.add("Mumbai")

print(cities)

Output (the order of string elements can change each time the program runs):

{'Jaipur', 'Delhi', 'Mumbai'}

Applications:

Dynamic data collection.

Adding Multiple Elements

The update() method adds every element from one or more iterables, such as Lists, Tuples, or other Sets.

Example:

cities = {"Jaipur"}

cities.update(
    ["Delhi", "Mumbai"]
)

print(cities)

Applications:

Dataset expansion.
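update() accepts several iterables in one call. Because it adds each item of an iterable, passing a plain string adds its individual characters. The sketch below, using illustrative city names, shows both behaviours and uses add() for a single string:

```python
cities = {"Jaipur"}

# update() accepts several iterables at once: a List, a Tuple, another Set
cities.update(["Delhi"], ("Mumbai",), {"Pune"})
print(sorted(cities))  # ['Delhi', 'Jaipur', 'Mumbai', 'Pune']

# Gotcha: a string is iterated character by character
letters = set()
letters.update("Pune")
print(sorted(letters))  # ['P', 'e', 'n', 'u']

# To add one string as a single element, use add() instead
whole_word = set()
whole_word.add("Pune")
print(whole_word)  # {'Pune'}
```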

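Removing Elements Using remove()

The remove() method deletes a specific element. If the element is not in the Set, it raises a KeyError.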

Example:

cities = {
    "Jaipur",
    "Delhi",
    "Mumbai"
}

cities.remove("Delhi")

print(cities)

Output (order may vary):

{'Jaipur', 'Mumbai'}

Applications:

Data cleanup.

Removing Elements Using discard()

Example:

cities = {
    "Jaipur",
    "Delhi"
}

cities.discard("Pune")

Unlike remove(), discard() raises no error when the element does not exist; the Set is simply left unchanged.

Benefits:

Safer deletion.
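The choice between remove() and discard() matters when an element might be missing. The sketch below, using illustrative values, compares the two. It also shows a membership check, which lets you use remove() without triggering the exception:

```python
cities = {"Jaipur", "Delhi"}

cities.discard("Pune")  # missing element: no error, Set unchanged

try:
    cities.remove("Pune")  # missing element: raises KeyError
except KeyError as error:
    print("remove() failed:", error)  # remove() failed: 'Pune'

# Checking membership first avoids the exception
if "Delhi" in cities:
    cities.remove("Delhi")

print(cities)  # {'Jaipur'}
```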

Removing an Arbitrary Element Using pop()

Example:

cities = {
    "Jaipur",
    "Delhi"
}

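removed_city = cities.pop()

print(removed_city)

Since Sets are unordered, you cannot choose or predict which element is removed. pop() returns the removed element and raises a KeyError if the Set is empty.

Applications:

Processing elements one at a time when their order does not matter.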
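A common use of pop() is draining a Set until nothing is left. Here is a minimal sketch; pending_cities and processed are hypothetical names chosen for illustration:

```python
pending_cities = {"Jaipur", "Delhi", "Mumbai"}
processed = []

# Keep taking one element at a time until the Set is empty
while pending_cities:
    city = pending_cities.pop()
    processed.append(city)

print(len(processed))  # 3
print(pending_cities)  # set()

# Popping from an empty Set fails, so guard it when emptiness is possible
try:
    pending_cities.pop()
except KeyError:
    print("Nothing left to pop")
```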

Clearing a Set

Example:

cities = {
    "Jaipur",
    "Delhi"
}

cities.clear()

print(cities)

Output:

set()

Applications:

Resetting collections.

Finding Set Length

Use the len() function.

Example:

cities = {
    "Jaipur",
    "Delhi",
    "Mumbai"
}

print(len(cities))

Output:

3

Applications:

Unique record counting.

Looping Through a Set

Example:

cities = {
    "Jaipur",
    "Delhi",
    "Mumbai"
}

for city in cities:
    print(city)

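The loop visits each element exactly once, but the order is not guaranteed and can change between runs.

Applications: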

Data processing.
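When a report needs a predictable order, one option is to loop over sorted(cities). sorted() returns a new sorted List and leaves the Set unchanged:

```python
cities = {"Jaipur", "Delhi", "Mumbai"}

# Alphabetical order on every run, regardless of how the Set stores its elements
for city in sorted(cities):
    print(city)
# Delhi
# Jaipur
# Mumbai
```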

Checking Membership

Example:

cities = {
    "Jaipur",
    "Delhi"
}

print("Delhi" in cities)

Output:

True

Benefits:

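Fast lookups. Checking membership in a Set takes constant time on average, while the same check on a List scans the elements one by one.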
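To see the speed difference in practice, you can time the same membership check on a List and a Set with the standard timeit module. This rough sketch uses 100,000 integers as stand-in customer IDs. Exact timings depend on the machine, but the Set is typically faster by several orders of magnitude:

```python
import timeit

# Hypothetical data: 100,000 customer IDs stored both as a List and as a Set
customer_ids_list = list(range(100_000))
customer_ids_set = set(customer_ids_list)

target = 99_999  # worst case for the List: the last element

# Run each membership check 100 times and measure the total time
list_time = timeit.timeit(lambda: target in customer_ids_list, number=100)
set_time = timeit.timeit(lambda: target in customer_ids_set, number=100)

print(f"List: {list_time:.4f} s")
print(f"Set:  {set_time:.6f} s")
```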

Set Union

Union returns a new Set containing every element from both Sets, with duplicates removed.

Example:

set1 = {1, 2, 3}
set2 = {3, 4, 5}

result = set1.union(set2)

print(result)

Output:

{1, 2, 3, 4, 5}

Applications:

Dataset merging.

Set Intersection

Intersection returns common elements.

Example:

set1 = {1, 2, 3}
set2 = {2, 3, 4}

result = set1.intersection(set2)

print(result)

Output:

{2, 3}

Applications:

Customer overlap analysis.

Set Difference

Difference returns the elements that are in the first Set but not in the second.

Example:

set1 = {1, 2, 3}
set2 = {2, 3, 4}

print(
    set1.difference(set2)
)

Output:

{1}

Applications:

Data comparison.

Symmetric Difference

Symmetric difference returns the elements that are in exactly one of the two Sets, that is, in either Set but not in both.

Example:

set1 = {1, 2, 3}
set2 = {2, 3, 4}

print(
    set1.symmetric_difference(set2)
)

Output:

{1, 4}

Applications:

Comparative analysis.
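Python also provides operator forms of union, intersection, difference, and symmetric difference, plus methods that check how two Sets relate. The operators require both operands to be Sets (or frozensets), while methods such as union() accept any iterable. Here is a short sketch with the same example values:

```python
set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Operator forms of the methods above
print(set1 | set2)  # union: {1, 2, 3, 4}
print(set1 & set2)  # intersection: {2, 3}
print(set1 - set2)  # difference: {1}
print(set1 ^ set2)  # symmetric difference: {1, 4}

# Methods also accept other iterables, such as a List
print(set1.union([5, 6]))  # {1, 2, 3, 5, 6}

# Relationship checks return True or False
print({2, 3}.issubset(set1))    # True
print(set1.issuperset({1, 2}))  # True
print(set1.isdisjoint({7, 8}))  # True
```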

Sets in Data Analytics

Data Analysts frequently use Sets for:

Duplicate Removal
Customer Analysis
Product Analysis
Data Validation

Example:

customers = [
    "Rahul",
    "Priya",
    "Rahul",
    "Amit"
]

unique_customers = set(customers)

print(unique_customers)

Output (order may vary):

{'Rahul', 'Priya', 'Amit'}

Applications:

Customer deduplication.

Customer Analytics Example

Example:

website_customers = {
    "Rahul",
    "Priya",
    "Amit"
}

mobile_customers = {
    "Priya",
    "Amit",
    "Neha"
}

common_customers = (
    website_customers
    .intersection(
        mobile_customers
    )
)

print(common_customers)

Output (order may vary):

{'Priya', 'Amit'}

Applications:

Customer behavior analysis.
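The same two Sets can answer other channel questions. This sketch uses the operator forms - (difference), | (union), and ^ (symmetric difference). It sorts the results only so the printed order is predictable:

```python
website_customers = {"Rahul", "Priya", "Amit"}
mobile_customers = {"Priya", "Amit", "Neha"}

website_only = website_customers - mobile_customers    # bought only on the website
mobile_only = mobile_customers - website_customers     # bought only on mobile
all_customers = website_customers | mobile_customers   # every unique customer
single_channel = website_customers ^ mobile_customers  # used exactly one channel

print(sorted(website_only))    # ['Rahul']
print(sorted(mobile_only))     # ['Neha']
print(len(all_customers))      # 4
print(sorted(single_channel))  # ['Neha', 'Rahul']
```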

Product Analytics Example

Example:

product_categories = {
    "Electronics",
    "Fashion",
    "Books"
}

print(product_categories)

Applications:

Inventory analysis.
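A Set of product categories becomes more useful when you compare it with a reference list. The sketch below assumes a hypothetical master Set of approved categories, allowed_categories, and checks an incoming file against it. Both the names and the values are illustrative:

```python
# Hypothetical master list of approved categories
allowed_categories = {"Electronics", "Fashion", "Books", "Home"}

# Categories found in an incoming product file (illustrative values)
incoming_categories = {"Electronics", "Books", "Toys"}

unknown = incoming_categories.difference(allowed_categories)  # need review
unused = allowed_categories.difference(incoming_categories)   # no products yet

print(sorted(unknown))  # ['Toys']
print(sorted(unused))   # ['Fashion', 'Home']
print(incoming_categories.issubset(allowed_categories))  # False
```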

Sets in Machine Learning

Machine Learning projects use Sets for:

Feature Selection
Duplicate Removal
Data Validation

Benefits:

Improved data quality.
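In a Machine Learning project, Sets can support data validation by checking whether new data has the same feature columns the model was trained on. Here is a sketch with hypothetical column names:

```python
# Hypothetical feature names from training data and from a new batch
training_features = {"age", "income", "city", "tenure"}
incoming_features = {"age", "income", "city", "signup_channel"}

missing = training_features - incoming_features  # expected but absent
extra = incoming_features - training_features    # present but never trained on
shared = training_features & incoming_features   # safe to use

print(sorted(missing))  # ['tenure']
print(sorted(extra))    # ['signup_channel']
print(sorted(shared))   # ['age', 'city', 'income']
```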

Sets in Business Analytics

Business Analysts use Sets for:

Unique Customer Analysis
Product Segmentation
KPI Validation

Benefits:

Better reporting accuracy.

Sets vs Lists

Sets

Characteristics:

Unique values.
No indexing.
Faster membership checks.

Example:

data = {1, 2, 3}

Lists

Characteristics:

Ordered.
Allow duplicates.
Index-based access.

Example:

data = [1, 2, 3]

Choose according to project requirements.
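The following sketch, using illustrative customer names, contrasts the two structures on the same data. One practical point is that converting a List to a Set discards the original order. If you need to remove duplicates and keep first-seen order, dict.fromkeys() works, because dictionaries preserve insertion order in Python 3.7 and later:

```python
orders = ["Rahul", "Priya", "Rahul", "Amit"]

print(orders[0])         # Rahul  (Lists support indexing)
print(len(orders))       # 4      (duplicates kept)
print(len(set(orders)))  # 3      (duplicates dropped, order lost)

# Remove duplicates while keeping the first-seen order
ordered_unique = list(dict.fromkeys(orders))
print(ordered_unique)    # ['Rahul', 'Priya', 'Amit']
```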

Sets vs Tuples

Sets

Mutable
Unordered
Unique values

Tuples

Immutable
Ordered
Allow duplicates

Both serve different purposes.

Frozen Sets

A Frozen Set is an immutable version of a Set.

Example:

numbers = frozenset(
    [1, 2, 3]
)

print(numbers)

Applications:

Fixed collections that must not change, and Set-like values used as dictionary keys or as elements of another Set.
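A frozenset cannot be modified, but in exchange it is hashable. This sketch shows both properties. The route_prices dictionary is a hypothetical example that uses an unordered city pair as a key:

```python
numbers = frozenset([1, 2, 3])

# A frozenset has no add(), remove(), or other methods that modify it
try:
    numbers.add(4)
except AttributeError:
    print("frozenset cannot be modified")

# Because it is hashable, a frozenset can be a dictionary key;
# route_prices is a hypothetical fare lookup for an unordered city pair
route_prices = {frozenset({"Jaipur", "Delhi"}): 1200}
print(route_prices[frozenset({"Delhi", "Jaipur"})])  # 1200

# It can also be an element of another Set
groups = {frozenset({1, 2}), frozenset({3, 4})}
print(len(groups))  # 2
```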

Common Mistakes with Sets

Attempting Index Access

Example:

numbers = {1, 2, 3}

print(numbers[0])

Raises a TypeError, because Sets do not support indexing.

Using Duplicate Values

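Python silently drops repeated values, so a Set built from data with duplicates has fewer elements than the source. Do not use a Set when duplicates must be counted or preserved.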

Using Mutable Elements

Example:

data = {[1, 2]}

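Raises a TypeError, because Lists are mutable and therefore cannot be Set elements. Use a tuple instead: data = {(1, 2)}.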

Avoiding these mistakes improves code quality.
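This sketch reproduces both errors and shows one common fix for each. A sorted List provides positions, and a tuple replaces the List inside the Set:

```python
numbers = {1, 2, 3}

# Mistake 1: indexing a Set raises TypeError
try:
    numbers[0]
except TypeError:
    print("Sets do not support indexing")

# Fix: convert to a sorted List when a position is needed
first_value = sorted(numbers)[0]
print(first_value)  # 1

# Mistake 2: a List cannot be a Set element, because Lists are unhashable
try:
    data = {[1, 2]}
except TypeError:
    print("Lists cannot be stored in a Set")

# Fix: store the pair as a tuple, which is hashable
data = {(1, 2)}
print(data)  # {(1, 2)}
```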

Best Practices for Sets

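Use Sets for Unique Data

Store values that must not repeat, such as customer IDs, in a Set to keep data clean.

Use Membership Testing

Convert a List to a Set before running many membership checks; Set lookups are much faster.

Apply Set Operations

Use union, intersection, and difference instead of nested loops to compare collections.

Validate Data Before Processing

Confirm that dropping duplicates is acceptable and that every element is hashable before converting data to a Set.

Choose Appropriate Data Structures

Use a List when order or duplicates matter, a Tuple for fixed records, and a Set for unique values.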

These practices support professional programming.

Advantages of Understanding Sets

Benefits include:

Duplicate Removal.
Faster Searches.
Efficient Comparisons.
Improved Data Quality.
Strong Programming Foundation.

Sets are extremely valuable for analytical and data processing tasks.

Learning Outcomes

After completing this lesson, you will be able to:

Understand Sets.
Create and modify Sets.
Remove duplicate values.
Apply Set operations.
Use Union, Intersection, and Difference.
Compare Sets with Lists and Tuples.
Apply Sets in Data Analytics projects.

Frequently Asked Questions (FAQs)

What are Sets in Python?

Sets are unordered collections of unique values.

Do Sets allow duplicate values?

No. Duplicate values are automatically removed.

Are Sets mutable?

Yes. Elements can be added and removed.

Can Sets be indexed?

No. Sets do not support indexing.

What is Set Union?

Union combines all unique elements from multiple Sets.

What is Set Intersection?

Intersection returns common elements.

What is a Frozen Set?

A Frozen Set is an immutable Set.

Why are Sets important in Data Analytics?

They help remove duplicates, validate data, and perform efficient comparisons.

