|
1 | 1 | ---
|
2 | 2 | Title: '.unique()'
|
3 |
| -Description: 'Returns an array containing all the unique elements in the data series, with no specific order.' |
| 3 | +Description: 'Returns a NumPy array of the unique values in the order they appear in the Series.' |
4 | 4 | Subjects:
|
5 | 5 | - 'Computer Science'
|
6 | 6 | - 'Data Science'
|
7 |
| - - 'Data Visualization' |
8 | 7 | Tags:
|
9 | 8 | - 'Arrays'
|
10 | 9 | - 'Data'
|
11 | 10 | - 'Encoding'
|
12 | 11 | - 'Functions'
|
13 |
| - - 'Pandas' |
14 | 12 | CatalogContent:
|
15 | 13 | - 'learn-python-3'
|
16 | 14 | - 'paths/computer-science'
|
17 |
| - - 'paths/data-science' |
18 |
| - - 'paths/data-science-foundations' |
19 | 15 | ---
|
20 | 16 |
|
21 |
| -The **`.unique()`** function returns unique values from a data series using a hash table. It operates similarly to `numpy.unique()` but is notably faster, especially with large datasets, and it also includes NA values. |
| 17 | +The Pandas **`.unique()`** function returns a [NumPy array](https://www.codecademy.com/resources/docs/numpy/ndarray) containing the unique elements of a Series in the order they first appear, without sorting them. It operates similarly to [NumPy's](https://www.codecademy.com/resources/docs/numpy) `unique()` function, but can be more efficient for large Series with repeated elements, and it also includes `NaN` values. |
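One practical difference from NumPy's `unique()` is ordering. The following is a minimal sketch (the sample values are made up for illustration) comparing the two on the same data:

```python
import numpy as np
import pandas as pd

series = pd.Series([3, -1, 5, -1, 2, 3])

# Pandas keeps values in the order of their first appearance
pandas_unique = series.unique()

# NumPy returns the unique values sorted in ascending order
numpy_unique = np.unique(series.to_numpy())

print(pandas_unique)
print(numpy_unique)
```

If sorted output is needed from Pandas, the result of `.unique()` can be sorted afterwards, for example with `np.sort()`.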
22 | 18 |
|
23 |
| -## Syntax |
| 19 | +## Pandas `.unique()` Syntax |
24 | 20 |
|
25 | 21 | ```pseudo
|
26 |
| -pd.unique(data_series) |
| 22 | +series.unique() |
27 | 23 | ```
|
28 | 24 |
|
29 |
| -The `data_series` parameter represents a 1-dimensional array-like data structure from which unique elements will be returned by the function. The `dtype` of the return matches that of the input, which can be of Index, Categorical, or Series type. The function lists the unique elements in the order they appear in the input data series, and it does _NOT_ sort them. |
| 25 | +**Parameters:** |
30 | 26 |
|
31 |
| -## Example |
| 27 | +The `.unique()` function takes no parameters. |
32 | 28 |
|
33 |
| -The following example demonstrates the use of the `.unique()` function: |
| 29 | +**Return value:** |
| 30 | + |
| 31 | +Returns a NumPy array containing the unique values from a Pandas Series, in the order they appear. |
| 32 | + |
| 33 | +## Example 1: Basic Usage of `.unique()` |
| 34 | + |
| 35 | +In this example, `.unique()` is used to return all the unique elements in `series`: |
34 | 36 |
|
35 | 37 | ```py
|
36 | 38 | import pandas as pd
|
37 | 39 |
|
38 | 40 | series = pd.Series([3, -1, 5, -1, 2, 1, 3, 2, 1, 5, -2, 1, 2])
|
39 | 41 | unique_elements = series.unique()
|
40 |
| -print(f"The unique elements in series {list(series)} are\n {unique_elements}") |
| 42 | +print(unique_elements) |
41 | 43 | ```
|
42 | 44 |
|
43 |
| -The above code outputs the following: |
| 45 | +Here is the output: |
44 | 46 |
|
45 | 47 | ```shell
|
46 |
| -The unique elements in series [3, -1, 5, -1, 2, 1, 3, -2, 1, 5, 2, 1, 2] are |
47 |
| -[3 -1 5 2 1 3 -2] |
| 48 | +[ 3 -1  5  2  1 -2] |
48 | 49 | ```
|
49 | 50 |
|
50 |
| -## Codebyte Example |
| 51 | +## Example 2: Using `.unique()` on a DataFrame Column |
51 | 52 |
|
52 |
| -The code below shows off the effects of `unique()` on different kinds of data types: Index, Categorical, and Series. After defining the array-like objects, the `unique()` method is applied to list out the unique elements of each object, and the resulting data is printed out to the console. |
| 53 | +In this example, `.unique()` is used to return all the unique names from the `Name` column of the `df` [DataFrame](https://www.codecademy.com/resources/docs/pandas/dataframe): |
53 | 54 |
|
54 |
| -```codebyte/python |
| 55 | +```py |
55 | 56 | import pandas as pd
|
56 | 57 |
|
57 |
| -index = pd.Index([ |
58 |
| - pd.Timestamp("20160101", tz="US/Eastern"), |
59 |
| - pd.Timestamp("20160101", tz="US/Eastern"), |
60 |
| - pd.Timestamp("20160102", tz="US/Eastern"), |
61 |
| - pd.Timestamp("20160101", tz="US/Central"), |
62 |
| - ]) |
| 58 | +df = pd.DataFrame({ |
| 59 | + 'Name': ['Alice', 'Bob', 'Alice', 'David', 'Bob'], |
| 60 | + 'Age': [25, 30, 25, 40, 30] |
| 61 | +}) |
| 62 | + |
| 63 | +unique_names = df['Name'].unique() |
63 | 64 |
|
64 |
| -print("Unique elements in Index:") |
65 |
| -print(pd.unique(index)) |
| 65 | +print(unique_names) |
| 66 | +``` |
| 67 | + |
| 68 | +Here is the output: |
66 | 69 |
|
67 |
| -grades = pd.Categorical(['A', 'B', 'B+', 'C-', 'D', 'A', 'B', 'A', 'B-', 'F'], categories=['A', 'A-', 'B+', 'B', 'B-', 'C+', 'C', 'C-', 'D+', 'D', 'F'], ordered=True) |
| 70 | +```shell |
| 71 | +['Alice' 'Bob' 'David'] |
| 72 | +``` |
68 | 73 |
|
69 |
| -print("\nUnique elements in Categorical:") |
70 |
| -print(pd.unique(grades)) |
| 74 | +## Codebyte Example: Dealing with Missing Values Using `.unique()` |
71 | 75 |
|
72 |
| -string_series = pd.Series(['John', 'Jack', 'Ellen', 'Kirsten', 'Jack', 'John Jr', 'Kristen', 'Ellen']) |
| 76 | +This codebyte example shows how `.unique()` deals with missing values: |
73 | 77 |
|
74 |
| -print("\nUnique elements in String Series:") |
75 |
| -print(pd.unique(string_series)) |
| 78 | +```codebyte/python |
| 79 | +import pandas as pd |
76 | 80 |
|
77 |
| -int_series = pd.Series([2 * n for n in range(10)] + [3 * n for n in range(5)]) |
| 81 | +data_with_nan = pd.Series([1, 2, 2, None, 3, None, 1]) |
78 | 82 |
|
79 |
| -print("\nUnique elements in Integer Series:") |
80 |
| -print(pd.unique(int_series)) |
| 83 | +unique_with_nan = data_with_nan.unique() |
| 84 | + |
| 85 | +print(unique_with_nan) |
81 | 86 | ```
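As a hedged follow-up to the codebyte above: `None` entries in a numeric Series are stored as `NaN`, so the Series becomes `float64` and `.unique()` reports `NaN` once. The sketch below also shows one possible way to leave missing values out, by calling `.dropna()` first (the variable name `unique_without_nan` is only illustrative):

```python
import pandas as pd

data_with_nan = pd.Series([1, 2, 2, None, 3, None, 1])

# None is converted to NaN, so the values are floats; NaN appears once
unique_with_nan = data_with_nan.unique()

# Drop missing values first to keep only the non-missing unique values
unique_without_nan = data_with_nan.dropna().unique()

print(unique_with_nan)
print(unique_without_nan)
```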
|
| 87 | + |
| 88 | +## Frequently Asked Questions |
| 89 | + |
| 90 | +### 1. Does `.unique()` work on DataFrames directly? |
| 91 | + |
| 92 | +No. DataFrames do not have a `.unique()` method; it is available on Series (and Index) objects. To find unique values in a DataFrame column, select the column first: |
| 93 | + |
| 94 | +```py |
| 95 | +df['column_name'].unique() |
| 96 | +``` |
| 97 | + |
| 98 | +### 2. What is the difference between `.unique()` and `.nunique()`? |
| 99 | + |
| 100 | +- `.unique()` returns a NumPy array of the unique values. |
| 101 | +- `.nunique()` returns the number of unique values as an integer, excluding `NaN` by default. |
| 102 | + |
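The contrast between the two methods can be sketched as follows, using a small made-up Series that contains a missing value. Note that `.nunique()` skips `NaN` unless `dropna=False` is passed:

```python
import pandas as pd

s = pd.Series([1, 2, 2, None])

# .unique() returns the distinct values themselves (NaN included)
values = s.unique()

# .nunique() returns how many distinct values there are; NaN is not counted by default
count = s.nunique()

# Passing dropna=False counts NaN as a distinct value too
count_with_nan = s.nunique(dropna=False)

print(values, count, count_with_nan)
```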
| 103 | +### 3. What is the difference between `.unique()` and `.drop_duplicates()` in Pandas? |
| 104 | + |
| 105 | +- `.unique()` is used on a single Series and returns a NumPy array of unique values in the order they appear. |
| 106 | +- `.drop_duplicates()` is used on a Series or DataFrame and returns a Pandas object (Series or DataFrame) with duplicate rows or values removed. |
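A minimal sketch of this difference, using illustrative data: `.unique()` hands back a plain array, while `.drop_duplicates()` keeps a Series along with the index labels of the first occurrences.

```python
import pandas as pd

s = pd.Series([3, -1, 5, -1, 2])

# NumPy array of unique values; the original index is discarded
as_array = s.unique()

# Series with duplicates removed; the original index labels are kept
as_series = s.drop_duplicates()

print(type(as_array).__name__)   # type name of the .unique() result
print(as_series.index.tolist())  # index labels that survived
```

Keeping the index can be useful when the unique values need to be traced back to their original rows.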