Commit 773fd42

[Edit] Pandas Built-in Functions: .unique() (#7404)
* [Edit] Pandas Built-in Functions: .unique()
* updated minor content and FAQ based on PAA
1 parent 21cbfc7 commit 773fd42

File tree

1 file changed

+60
-35
lines changed
  • content/pandas/concepts/built-in-functions/terms/unique

@@ -1,81 +1,106 @@
11
---
22
Title: '.unique()'
3-
Description: 'Returns an array containing all the unique elements in the data series, with no specific order.'
3+
Description: 'Returns a NumPy array of the unique values in the order they appear in the Series.'
44
Subjects:
55
- 'Computer Science'
66
- 'Data Science'
7-
- 'Data Visualization'
87
Tags:
98
- 'Arrays'
109
- 'Data'
1110
- 'Encoding'
1211
- 'Functions'
13-
- 'Pandas'
1412
CatalogContent:
1513
- 'learn-python-3'
1614
- 'paths/computer-science'
17-
- 'paths/data-science'
18-
- 'paths/data-science-foundations'
1915
---
2016

21-
The **`.unique()`** function returns unique values from a data series using a hash table. It operates similarly to `numpy.unique()` but is notably faster, especially with large datasets, and it also includes NA values.
17+
The Pandas **`.unique()`** function returns a [NumPy array](https://www.codecademy.com/resources/docs/numpy/ndarray) containing the unique elements of a Series in the order they first appear, without sorting them. It operates similarly to [NumPy's](https://www.codecademy.com/resources/docs/numpy) `.unique()`, but can be more efficient for large Series with repeated elements, and it keeps missing values (`NaN`) in the result.
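
As a rough sketch of the ordering difference (the sample values are illustrative and not part of this commit), `Series.unique()` keeps values in the order they first appear, while `numpy.unique()` returns them sorted:

```py
import numpy as np
import pandas as pd

# Illustrative data, chosen only to make the ordering visible
series = pd.Series([3, -1, 5, -1, 2, 3])

pandas_unique = series.unique()              # order of first appearance
numpy_unique = np.unique(series.to_numpy())  # sorted in ascending order

print(pandas_unique.tolist())  # [3, -1, 5, 2]
print(numpy_unique.tolist())   # [-1, 2, 3, 5]
```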
2218

23-
## Syntax
19+
## Pandas `.unique()` Syntax
2420

2521
```pseudo
26-
pd.unique(data_series)
22+
series.unique()
2723
```
2824

29-
The `data_series` parameter represents a 1-dimensional array-like data structure from which unique elements will be returned by the function. The `dtype` of the return matches that of the input, which can be of Index, Categorical, or Series type. The function lists the unique elements in the order they appear in the input data series, and it does _NOT_ sort them.
25+
**Parameters:**
3026

31-
## Example
27+
The `.unique()` function takes no parameters.
3228

33-
The following example demonstrates the use of the `.unique()` function:
29+
**Return value:**
30+
31+
Returns a NumPy array containing the unique values from a Pandas Series, in the order they appear.
32+
33+
## Example 1: Basic Usage of `.unique()`
34+
35+
In this example, `.unique()` is used to return all the unique elements in `series`:
3436

3537
```py
3638
import pandas as pd
3739

3840
series = pd.Series([3, -1, 5, -1, 2, 1, 3, 2, 1, 5, -2, 1, 2])
3941
unique_elements = series.unique()
40-
print(f"The unique elements in series {list(series)} are\n {unique_elements}")
42+
print(unique_elements)
4143
```
4244

43-
The above code outputs the following:
45+
Here is the output:
4446

4547
```shell
46-
The unique elements in series [3, -1, 5, -1, 2, 1, 3, -2, 1, 5, 2, 1, 2] are
47-
[3 -1 5 2 1 3 -2]
48+
[ 3 -1 5 2 1 -2]
4849
```
4950

50-
## Codebyte Example
51+
## Example 2: Using `.unique()` on a DataFrame Column
5152

52-
The code below shows off the effects of `unique()` on different kinds of data types: Index, Categorical, and Series. After defining the array-like objects, the `unique()` method is applied to list out the unique elements of each object, and the resulting data is printed out to the console.
53+
In this example, `.unique()` is used to return all the unique names from the `Name` column of the `df` [DataFrame](https://www.codecademy.com/resources/docs/pandas/dataframe):
5354

54-
```codebyte/python
55+
```py
5556
import pandas as pd
5657

57-
index = pd.Index([
58-
pd.Timestamp("20160101", tz="US/Eastern"),
59-
pd.Timestamp("20160101", tz="US/Eastern"),
60-
pd.Timestamp("20160102", tz="US/Eastern"),
61-
pd.Timestamp("20160101", tz="US/Central"),
62-
])
58+
df = pd.DataFrame({
59+
'Name': ['Alice', 'Bob', 'Alice', 'David', 'Bob'],
60+
'Age': [25, 30, 25, 40, 30]
61+
})
62+
63+
unique_names = df['Name'].unique()
6364

64-
print("Unique elements in Index:")
65-
print(pd.unique(index))
65+
print(unique_names)
66+
```
67+
68+
Here is the output:
6669

67-
grades = pd.Categorical(['A', 'B', 'B+', 'C-', 'D', 'A', 'B', 'A', 'B-', 'F'], categories=['A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'F'], ordered=True)
70+
```shell
71+
['Alice' 'Bob' 'David']
72+
```
6873

69-
print("\nUnique elements in Categorical:")
70-
print(pd.unique(grades))
74+
## Codebyte Example: Dealing with Missing Values Using `.unique()`
7175

72-
string_series = pd.Series(['John', 'Jack', 'Ellen', 'Kirsten', 'Jack', 'John Jr', 'Kristen', 'Ellen'])
76+
This codebyte example shows how `.unique()` deals with missing values:
7377

74-
print("\nUnique elements in String Series:")
75-
print(pd.unique(string_series))
78+
```codebyte/python
79+
import pandas as pd
7680
77-
int_series = pd.Series([2 * n for n in range(10)] + [3 * n for n in range(5)])
81+
data_with_nan = pd.Series([1, 2, 2, None, 3, None, 1])
7882
79-
print("\nUnique elements in Integer Series:")
80-
print(pd.unique(int_series))
83+
unique_with_nan = data_with_nan.unique()
84+
85+
print(unique_with_nan)
8186
```
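
For readers without the interactive runner, here is a minimal sketch of what to expect from the same data. The `.dropna()` step is an optional addition, not part of the original example, and shows one way to leave missing values out of the result:

```py
import pandas as pd

data_with_nan = pd.Series([1, 2, 2, None, 3, None, 1])

# None is stored as NaN in a numeric Series, and NaN appears once in the result
unique_with_nan = data_with_nan.unique()
print(len(unique_with_nan))            # 4
print(pd.isna(unique_with_nan).sum())  # 1

# Drop missing values first to exclude NaN from the result
unique_without_nan = data_with_nan.dropna().unique()
print(unique_without_nan.tolist())     # [1.0, 2.0, 3.0]
```

The values come back as floats because the presence of `NaN` makes the Series use a floating-point dtype.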
87+
88+
## Frequently Asked Questions
89+
90+
### 1. Does `.unique()` work on DataFrames directly?
91+
92+
No. A DataFrame has no `.unique()` method; it is defined on Series (and Index) objects. To find unique values in a DataFrame column, select the column first:
93+
94+
```py
95+
df['column_name'].unique()
96+
```
97+
98+
### 2. What is the difference between `.unique()` and `.nunique()`?
99+
100+
- `.unique()` returns a NumPy array of the unique values.
101+
- `.nunique()` returns the count of unique values.
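
A short sketch of the two side by side, using illustrative data. One detail worth noting is that `.nunique()` ignores missing values by default, so its count can be smaller than the length of the `.unique()` result:

```py
import pandas as pd

# Illustrative data with one missing value
series = pd.Series([1, 2, 2, None, 3])

print(len(series.unique()))          # 4 (NaN is included)
print(series.nunique())              # 3 (NaN is excluded by default)
print(series.nunique(dropna=False))  # 4
```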
102+
103+
### 3. What is the difference between `.unique()` and `.drop_duplicates()` in Pandas?
104+
105+
- `.unique()` is used on a single Series and returns a NumPy array of unique values in the order they appear.
106+
- `.drop_duplicates()` is used on a Series or DataFrame and returns a Pandas object (Series or DataFrame) with duplicate rows or values removed.
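
The difference in return types can be sketched as follows, again with illustrative data. Note that `.drop_duplicates()` keeps the original index labels of the rows it retains:

```py
import pandas as pd

# Illustrative integer data
series = pd.Series([3, -1, 5, -1, 2])

as_array = series.unique()            # NumPy array, no index
as_series = series.drop_duplicates()  # Series, original index preserved

print(type(as_array).__name__)   # ndarray
print(type(as_series).__name__)  # Series
print(as_series.index.tolist())  # [0, 1, 2, 4]
```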
