Sets are an important built-in data structure in Python that store multiple unique values in a single collection. Unlike Lists and Tuples, Sets do not allow duplicate values and do not maintain a fixed order of elements. In Data Analytics, Data Science, Machine Learning, Business Analytics, and Software Development, Sets are widely used for removing duplicates, performing mathematical set operations, and efficiently managing unique data.
When working with large datasets, duplicate values often appear in customer records, transaction data, product information, and analytical reports. Sets provide an efficient solution for handling such scenarios.
Organizations use Sets for:
Understanding Sets is essential for efficient data processing and data quality management.
A Set is an unordered collection of unique elements.
Example:
cities = {"Jaipur", "Delhi", "Mumbai"}
Here:
cities is a Set.
Sets are enclosed within curly braces {}.
Sets provide several advantages.
Benefits:
Sets are commonly used in Data Analytics workflows.
Sets have several important characteristics.
Elements do not have fixed positions.
Duplicate elements are not allowed.
Elements can be added and removed.
Elements cannot be accessed using indexes.
These characteristics make Sets different from Lists and Tuples.
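The characteristics above can be checked directly. The following is a minimal sketch; the variable names a and b are chosen only for illustration.

```python
# Two sets with the same elements are equal regardless of the order
# in which the elements were written, and repeats are ignored.
a = {"Jaipur", "Delhi", "Mumbai"}
b = {"Mumbai", "Jaipur", "Delhi", "Delhi"}

print(a == b)   # True
print(len(b))   # 3

# Sets are mutable: elements can be added after creation.
a.add("Pune")
print(len(a))   # 4
```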
Example:
fruits = {"Apple", "Mango", "Orange"}
Example:
numbers = {10, 20, 30, 40}
Example:
mixed_data = {"Rahul", 25, True, 50000.50}
Sets can store different data types.
One of the most common uses of Sets is duplicate removal.
Example:
numbers = {10, 20, 20, 30, 30, 40}
print(numbers)
Output:
{10, 20, 30, 40}
Duplicates are automatically eliminated.
Applications:
Data cleaning.
Incorrect:
data = {}
This creates a Dictionary.
Correct:
data = set()
Always use set() for an empty Set.
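A quick way to confirm the difference is to inspect the type of each object. This is a small illustrative sketch; the names empty_dict and empty_set are not part of the lesson.

```python
# {} is the empty dictionary literal; set() builds an empty set.
empty_dict = {}
empty_set = set()

print(type(empty_dict).__name__)  # dict
print(type(empty_set).__name__)   # set

# Only the set supports set methods such as add().
empty_set.add("Jaipur")
print(empty_set)  # {'Jaipur'}
```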
The add() method inserts a new element.
Example:
cities = {"Jaipur", "Delhi"}
cities.add("Mumbai")
print(cities)
Output (element order may vary):
{'Jaipur', 'Delhi', 'Mumbai'}
Applications:
Dynamic data collection.
The update() method adds multiple values.
Example:
cities = {"Jaipur"}
cities.update(["Delhi", "Mumbai"])
print(cities)
Applications:
Dataset expansion.
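update() is not limited to a single list. As a sketch of its behavior, it accepts one or more iterables of any kind, which also means a plain string is split into characters. The extra city "Pune" and the name letters are assumptions added for illustration.

```python
cities = {"Jaipur"}

# update() accepts several iterables at once: a list, a tuple, another set.
cities.update(["Delhi", "Mumbai"], ("Pune",), {"Delhi"})
print(sorted(cities))  # ['Delhi', 'Jaipur', 'Mumbai', 'Pune']

# A string is an iterable of characters, so each character is added.
letters = set()
letters.update("Goa")
print(sorted(letters))  # ['G', 'a', 'o']
```

To add a whole string as one element, use add() instead of update().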
Example:
cities = {
"Jaipur",
"Delhi",
"Mumbai"
}
cities.remove("Delhi")
print(cities)
Output (element order may vary):
{'Jaipur', 'Mumbai'}
Applications:
Data cleanup.
Example:
cities = {
"Jaipur",
"Delhi"
}
cities.discard("Pune")
No error occurs even if the element does not exist.
Benefits:
Safer deletion.
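The difference between remove() and discard() only shows up when the element is missing. A minimal sketch of that contrast, using a hypothetical flag named removed:

```python
cities = {"Jaipur", "Delhi"}

# discard() silently ignores an element that is not in the set.
cities.discard("Pune")

# remove() raises KeyError for an element that is not in the set.
try:
    cities.remove("Pune")
    removed = True
except KeyError:
    removed = False

print(removed)         # False
print(sorted(cities))  # ['Delhi', 'Jaipur']
```

Use remove() when a missing element indicates a bug you want to hear about, and discard() when absence is acceptable.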
Example:
cities = {
"Jaipur",
"Delhi"
}
cities.pop()
pop() removes and returns an arbitrary element. Since Sets are unordered, you cannot choose which element is removed.
Applications:
Special processing scenarios.
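Because pop() returns the element it removes, the result can be captured and used. The sketch below also shows the empty-set case; the names taken and message are illustrative only.

```python
cities = {"Jaipur", "Delhi"}

# pop() removes one arbitrary element and returns it.
taken = cities.pop()
print(taken in {"Jaipur", "Delhi"})  # True
print(len(cities))                   # 1

# Calling pop() on an empty set raises KeyError.
cities.clear()
try:
    cities.pop()
    message = "popped"
except KeyError:
    message = "set is empty"
print(message)  # set is empty
```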
Example:
cities = {
"Jaipur",
"Delhi"
}
cities.clear()
print(cities)
Output:
set()
Applications:
Resetting collections.
Use the len() function.
Example:
cities = {
"Jaipur",
"Delhi",
"Mumbai"
}
print(len(cities))
Output:
3
Applications:
Unique record counting.
Example:
cities = {
"Jaipur",
"Delhi",
"Mumbai"
}
for city in cities:
    print(city)
Applications:
Data processing.
Example:
cities = {
"Jaipur",
"Delhi"
}
print("Delhi" in cities)
Output:
True
Benefits:
Fast lookups.
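Membership checks on a Set use hashing, so each check takes roughly constant time on average, whereas a List is scanned element by element. One common use is validating records against a set of allowed values. This is a sketch with made-up data; valid_cities and records are hypothetical names.

```python
# Hypothetical allow-list and incoming records.
valid_cities = {"Jaipur", "Delhi", "Mumbai"}
records = ["Delhi", "Pune", "Jaipur", "Goa"]

# Each `in` test against the set is a hash lookup, not a scan.
accepted = [city for city in records if city in valid_cities]
rejected = [city for city in records if city not in valid_cities]

print(accepted)  # ['Delhi', 'Jaipur']
print(rejected)  # ['Pune', 'Goa']
```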
Union combines elements from multiple Sets.
Example:
set1 = {1, 2, 3}
set2 = {3, 4, 5}
result = set1.union(set2)
print(result)
Output:
{1, 2, 3, 4, 5}
Applications:
Dataset merging.
Intersection returns common elements.
Example:
set1 = {1, 2, 3}
set2 = {2, 3, 4}
result = set1.intersection(set2)
print(result)
Output:
{2, 3}
Applications:
Customer overlap analysis.
Difference returns the elements of the first Set that are not present in the second Set.
Example:
set1 = {1, 2, 3}
set2 = {2, 3, 4}
print(set1.difference(set2))
Output:
{1}
Applications:
Data comparison.
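Unlike union and intersection, difference depends on which Set comes first. A short sketch using the same two Sets, with illustrative result names:

```python
set1 = {1, 2, 3}
set2 = {2, 3, 4}

# Elements of set1 that are not in set2.
only_in_first = set1.difference(set2)
# Elements of set2 that are not in set1.
only_in_second = set2.difference(set1)

print(only_in_first)   # {1}
print(only_in_second)  # {4}
```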
Symmetric difference returns the elements that exist in exactly one of the two Sets, but not in both.
Example:
set1 = {1, 2, 3}
set2 = {2, 3, 4}
print(set1.symmetric_difference(set2))
Output:
{1, 4}
Applications:
Comparative analysis.
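Each of the four operations also has an operator form: | for union, & for intersection, - for difference, and ^ for symmetric difference. The sketch below shows them side by side. One difference worth knowing is that the operators require a Set on both sides, while the methods accept any iterable.

```python
set1 = {1, 2, 3}
set2 = {2, 3, 4}

print(sorted(set1 | set2))  # [1, 2, 3, 4]  union
print(sorted(set1 & set2))  # [2, 3]        intersection
print(sorted(set1 - set2))  # [1]           difference
print(sorted(set1 ^ set2))  # [1, 4]        symmetric difference

# Operators reject non-set operands; methods accept any iterable.
try:
    set1 | [4, 5]
    operator_ok = True
except TypeError:
    operator_ok = False

print(operator_ok)                   # False
print(sorted(set1.union([4, 5])))    # [1, 2, 3, 4, 5]
```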
Data Analysts frequently use Sets for:
Example:
customers = [
"Rahul",
"Priya",
"Rahul",
"Amit"
]
unique_customers = set(customers)
print(unique_customers)
Output (element order may vary):
{'Rahul', 'Priya', 'Amit'}
Applications:
Customer deduplication.
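Converting a List to a Set discards the original order. When the order of first appearance matters, one common pattern is to keep a Set of values already seen while building a new List. This is a sketch, not the only approach; seen and unique_in_order are illustrative names.

```python
customers = ["Rahul", "Priya", "Rahul", "Amit"]

seen = set()          # values encountered so far
unique_in_order = []  # first occurrence of each value, in input order

for name in customers:
    if name not in seen:
        seen.add(name)
        unique_in_order.append(name)

print(unique_in_order)  # ['Rahul', 'Priya', 'Amit']
```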
Example:
website_customers = {
"Rahul",
"Priya",
"Amit"
}
mobile_customers = {
"Priya",
"Amit",
"Neha"
}
common_customers = website_customers.intersection(mobile_customers)
print(common_customers)
Output (element order may vary):
{'Priya', 'Amit'}
Applications:
Customer behavior analysis.
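The same two Sets can be split into further segments with the other operations. The sketch below uses the sample customers from this example; the segment names and the overlap ratio are assumptions added for illustration.

```python
website_customers = {"Rahul", "Priya", "Amit"}
mobile_customers = {"Priya", "Amit", "Neha"}

# Customers seen on exactly one channel.
web_only = website_customers - mobile_customers
mobile_only = mobile_customers - website_customers

# Everyone seen on either channel, and those seen on both.
all_customers = website_customers | mobile_customers
both_channels = website_customers & mobile_customers

# Share of all customers who use both channels.
overlap_ratio = len(both_channels) / len(all_customers)

print(web_only)       # {'Rahul'}
print(mobile_only)    # {'Neha'}
print(overlap_ratio)  # 0.5
```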
Example:
product_categories = {
"Electronics",
"Fashion",
"Books"
}
print(product_categories)
Applications:
Inventory analysis.
Machine Learning projects use Sets for:
Benefits:
Improved data quality.
Business Analysts use Sets for:
Benefits:
Better reporting accuracy.
Characteristics:
Example:
data = {1, 2, 3}
Characteristics:
Example:
data = [1, 2, 3]
Choose according to project requirements.
Both serve different purposes.
A Frozen Set is an immutable version of a Set.
Example:
numbers = frozenset([1, 2, 3])
print(numbers)
Applications:
Constant collections that must not change, dictionary keys, and elements of other Sets.
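Two consequences of immutability are worth seeing in code: a Frozen Set has no add() method, and because it is hashable it can serve as a dictionary key. The following is a sketch; pair_counts and its value are hypothetical.

```python
numbers = frozenset([1, 2, 3])

# A frozenset has no mutating methods such as add().
try:
    numbers.add(4)
    changed = True
except AttributeError:
    changed = False
print(changed)  # False

# A frozenset is hashable, so it can be a dictionary key.
# The count stored here is made-up illustration data.
pair_counts = {frozenset({"Jaipur", "Delhi"}): 1}

# Element order does not matter when looking the key up.
print(pair_counts[frozenset({"Delhi", "Jaipur"})])  # 1
```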
Example:
numbers = {1, 2, 3}
print(numbers[0])
Produces a TypeError because Sets do not support indexing.
Duplicates are automatically removed.
Example:
data = {[1, 2]}
Produces a TypeError because a List is mutable and therefore unhashable. Set elements must be hashable.
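A common workaround is to store Tuples instead of Lists, since Tuples of hashable values are themselves hashable. A minimal sketch, with the names list_allowed and pairs chosen for illustration:

```python
# A list inside a set raises TypeError (unhashable type).
try:
    data = {[1, 2]}
    list_allowed = True
except TypeError:
    list_allowed = False
print(list_allowed)  # False

# Tuples are immutable and hashable, so they can be set elements.
pairs = {(1, 2), (3, 4), (1, 2)}
print(len(pairs))       # 2
print((1, 2) in pairs)  # True
```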
Avoiding these mistakes improves code quality.
Improve data quality.
Improve performance.
Simplify comparisons.
Improve reliability.
Match project requirements.
These practices support professional programming.
Benefits include:
Sets are extremely valuable for analytical and data processing tasks.
After completing this lesson, you will be able to:
Sets are unordered collections of unique values.
No. Duplicate values are automatically removed.
Yes. Elements can be added and removed.
No. Sets do not support indexing.
Union combines all unique elements from multiple Sets.
Intersection returns common elements.
A Frozen Set is an immutable Set.
They help remove duplicates, validate data, and perform efficient comparisons.