Commit ce371c9

imports update for pandas namespace
1 parent fa3ea34 commit ce371c9

7 files changed

+2539
-6
lines changed

maria/Intro Turorial.ipynb

Lines changed: 373 additions & 0 deletions
@@ -0,0 +1,373 @@
1+
{
2+
"cells": [
3+
{
4+
"cell_type": "markdown",
5+
"id": "0e956d07",
6+
"metadata": {},
7+
"source": [
8+
"## Quick start"
9+
]
10+
},
11+
{
12+
"cell_type": "markdown",
13+
"id": "c4f4ecab",
14+
"metadata": {},
15+
"source": [
16+
"https://narwhals-dev.github.io/narwhals/installation/"
17+
]
18+
},
19+
{
20+
"cell_type": "code",
21+
"execution_count": 2,
22+
"id": "a49c9620",
23+
"metadata": {},
24+
"outputs": [],
25+
"source": [
26+
"from __future__ import annotations\n",
27+
"import pandas as pd\n",
28+
"import polars as pl\n",
29+
"import pyarrow as pa\n",
30+
"import narwhals as nw\n",
31+
"from narwhals.typing import IntoFrame"
32+
]
33+
},
34+
{
35+
"cell_type": "code",
36+
"execution_count": 3,
37+
"id": "1aa0f584",
38+
"metadata": {},
39+
"outputs": [],
40+
"source": [
41+
"def agnostic_get_columns(df_native: IntoFrame) -> list[str]:\n",
42+
" df = nw.from_native(df_native)\n",
43+
" column_names = df.columns\n",
44+
" return column_names"
45+
]
46+
},
47+
{
48+
"cell_type": "code",
49+
"execution_count": 12,
50+
"id": "7dc54474",
51+
"metadata": {},
52+
"outputs": [],
53+
"source": [
54+
"data = {\"a\": [1, 2, 3], \"b\": [4, 5, 6]}\n",
55+
"df_pandas = pd.DataFrame(data)\n",
56+
"df_polars = pl.DataFrame(data)\n",
57+
"table_pa = pa.table(data)"
58+
]
59+
},
60+
{
61+
"cell_type": "code",
62+
"execution_count": 5,
63+
"id": "89939c27",
64+
"metadata": {},
65+
"outputs": [
66+
{
67+
"name": "stdout",
68+
"output_type": "stream",
69+
"text": [
70+
"pandas output\n",
71+
"['a', 'b']\n",
72+
"Polars output\n",
73+
"['a', 'b']\n",
74+
"PyArrow output\n",
75+
"['a', 'b']\n"
76+
]
77+
}
78+
],
79+
"source": [
80+
"print(\"pandas output\")\n",
81+
"print(agnostic_get_columns(df_pandas))\n",
82+
"\n",
83+
"print(\"Polars output\")\n",
84+
"print(agnostic_get_columns(df_polars))\n",
85+
"\n",
86+
"print(\"PyArrow output\")\n",
87+
"print(agnostic_get_columns(table_pa))"
88+
]
89+
},
90+
{
91+
"cell_type": "markdown",
92+
"id": "1c108e21",
93+
"metadata": {},
94+
"source": [
95+
"This is the simplest possible example of a dataframe-agnostic function - as we'll soon see, we can do much more advanced things."
96+
]
97+
},
98+
{
99+
"cell_type": "markdown",
100+
"id": "839e22c4",
101+
"metadata": {},
102+
"source": [
103+
"## DataFrame"
104+
]
105+
},
106+
{
107+
"cell_type": "markdown",
108+
"id": "36af6dfd",
109+
"metadata": {},
110+
"source": [
111+
"To write a dataframe-agnostic function, the steps you'll want to follow are:\n",
112+
"\n",
113+
"1. Initialise a Narwhals DataFrame or LazyFrame by passing your dataframe to `nw.from_native`. All the calculations stay lazy if we start with a lazy dataframe - Narwhals will never automatically trigger computation without you asking it to.\n",
114+
"\n",
115+
"   Note: if you need eager execution, make sure to pass `eager_only=True` to `nw.from_native`.\n",
116+
"\n",
117+
"2. Express your logic using the subset of the Polars API supported by Narwhals. \n",
118+
"\n",
119+
"3. If you need to return a dataframe to the user in its original library, call `nw.to_native`.\n",
120+
"\n",
121+
"Steps 1 and 3 are so common that we provide a utility `@nw.narwhalify` decorator, which allows you to only explicitly write step 2.\n",
122+
"\n",
123+
"Let's explore this with some simple examples.\n",
124+
"\n"
125+
]
126+
},
127+
{
128+
"cell_type": "markdown",
129+
"id": "e8003bf9",
130+
"metadata": {},
131+
"source": [
132+
"### Example 1: descriptive statistics"
133+
]
134+
},
135+
{
136+
"cell_type": "markdown",
137+
"id": "36b782d0",
138+
"metadata": {},
139+
"source": [
140+
"Just like in Polars, we can pass expressions to `DataFrame.select` or `LazyFrame.select`."
141+
]
142+
},
143+
{
144+
"cell_type": "code",
145+
"execution_count": 8,
146+
"id": "52fbbb9c",
147+
"metadata": {},
148+
"outputs": [],
149+
"source": [
150+
"from narwhals.typing import IntoFrameT\n",
151+
"\n",
152+
"def func(df: IntoFrameT) -> IntoFrameT:\n",
153+
" return (\n",
154+
" nw.from_native(df)\n",
155+
" .select(\n",
156+
" a_sum=nw.col(\"a\").sum(),\n",
157+
" a_mean=nw.col(\"a\").mean(),\n",
158+
" a_std=nw.col(\"a\").std(),\n",
159+
" )\n",
160+
" .to_native()\n",
161+
" )"
162+
]
163+
},
164+
{
165+
"cell_type": "code",
166+
"execution_count": null,
167+
"id": "98996849",
168+
"metadata": {},
169+
"outputs": [
170+
{
171+
"name": "stdout",
172+
"output_type": "stream",
173+
"text": [
174+
"   a_sum    a_mean    a_std\n",
175+
"0      4  1.333333  0.57735\n"
176+
]
177+
}
178+
],
179+
"source": [
180+
"# check in pandas\n",
181+
"df = pd.DataFrame({\"a\": [1, 1, 2]})\n",
182+
"print(func(df))"
183+
]
184+
},
185+
{
186+
"cell_type": "code",
187+
"execution_count": null,
188+
"id": "aca15172",
189+
"metadata": {},
190+
"outputs": [
191+
{
192+
"name": "stdout",
193+
"output_type": "stream",
194+
"text": [
195+
"shape: (1, 3)\n",
196+
"┌───────┬──────────┬─────────┐\n",
197+
"│ a_sum ┆ a_mean   ┆ a_std   │\n",
198+
"│ ---   ┆ ---      ┆ ---     │\n",
199+
"│ i64   ┆ f64      ┆ f64     │\n",
200+
"╞═══════╪══════════╪═════════╡\n",
201+
"│ 4     ┆ 1.333333 ┆ 0.57735 │\n",
202+
"└───────┴──────────┴─────────┘\n"
203+
]
204+
}
205+
],
206+
"source": [
207+
"# check in Polars\n",
208+
"df = pl.DataFrame({\"a\": [1, 1, 2]})\n",
209+
"print(func(df))"
210+
]
211+
},
212+
{
213+
"cell_type": "code",
214+
"execution_count": null,
215+
"id": "b192cb5a",
216+
"metadata": {},
217+
"outputs": [
218+
{
219+
"name": "stdout",
220+
"output_type": "stream",
221+
"text": [
222+
"pyarrow.Table\n",
223+
"a_sum: int64\n",
224+
"a_mean: double\n",
225+
"a_std: double\n",
226+
"----\n",
227+
"a_sum: [[4]]\n",
228+
"a_mean: [[1.3333333333333333]]\n",
229+
"a_std: [[0.5773502691896257]]\n"
230+
]
231+
}
232+
],
233+
"source": [
234+
"# check in PyArrow\n",
235+
"table = pa.table({\"a\": [1, 1, 2]})\n",
236+
"print(func(table))"
237+
]
238+
},
239+
{
240+
"cell_type": "markdown",
241+
"id": "e116914e",
242+
"metadata": {},
243+
"source": [
244+
"### Example 2: group-by and mean\n"
245+
]
246+
},
247+
{
248+
"cell_type": "markdown",
249+
"id": "4f33ced7",
250+
"metadata": {},
251+
"source": [
252+
"Just like in Polars, we can pass expressions to `GroupBy.agg`."
253+
]
254+
},
255+
{
256+
"cell_type": "code",
257+
"execution_count": 13,
258+
"id": "4ca29c0d",
259+
"metadata": {},
260+
"outputs": [],
261+
"source": [
262+
"def func(df: IntoFrameT) -> IntoFrameT:\n",
263+
"    return (\n",
264+
"        nw.from_native(df).group_by(\"a\").agg(nw.col(\"b\").mean()).sort(\"a\").to_native()\n",
265+
"    )"
266+
]
267+
},
268+
{
269+
"cell_type": "code",
270+
"execution_count": null,
271+
"id": "d701eefd",
272+
"metadata": {},
273+
"outputs": [
274+
{
275+
"name": "stdout",
276+
"output_type": "stream",
277+
"text": [
278+
"   a    b\n",
279+
"0  1  4.5\n",
280+
"1  2  6.0\n"
281+
]
282+
}
283+
],
284+
"source": [
285+
"# check in pandas\n",
286+
"df = pd.DataFrame({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
287+
"print(func(df))"
288+
]
289+
},
290+
{
291+
"cell_type": "code",
292+
"execution_count": null,
293+
"id": "0d9430f0",
294+
"metadata": {},
295+
"outputs": [
296+
{
297+
"name": "stdout",
298+
"output_type": "stream",
299+
"text": [
300+
"shape: (2, 2)\n",
301+
"┌─────┬─────┐\n",
302+
"│ a   ┆ b   │\n",
303+
"│ --- ┆ --- │\n",
304+
"│ i64 ┆ f64 │\n",
305+
"╞═════╪═════╡\n",
306+
"│ 1   ┆ 4.5 │\n",
307+
"│ 2   ┆ 6.0 │\n",
308+
"└─────┴─────┘\n"
309+
]
310+
}
311+
],
312+
"source": [
313+
"# check in Polars\n",
314+
"df = pl.DataFrame({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
315+
"print(func(df))"
316+
]
317+
},
318+
{
319+
"cell_type": "code",
320+
"execution_count": 17,
321+
"id": "a26c46bb",
322+
"metadata": {},
323+
"outputs": [
324+
{
325+
"name": "stdout",
326+
"output_type": "stream",
327+
"text": [
328+
"pyarrow.Table\n",
329+
"a: int64\n",
330+
"b: double\n",
331+
"----\n",
332+
"a: [[1,2]]\n",
333+
"b: [[4.5,6]]\n"
334+
]
335+
}
336+
],
337+
"source": [
338+
"# check in PyArrow\n",
339+
"table = pa.table({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
340+
"print(func(table))"
341+
]
342+
},
343+
{
344+
"cell_type": "code",
345+
"execution_count": null,
346+
"id": "9ee2857b",
347+
"metadata": {},
348+
"outputs": [],
349+
"source": []
350+
}
351+
],
352+
"metadata": {
353+
"kernelspec": {
354+
"display_name": "narwhals",
355+
"language": "python",
356+
"name": "python3"
357+
},
358+
"language_info": {
359+
"codemirror_mode": {
360+
"name": "ipython",
361+
"version": 3
362+
},
363+
"file_extension": ".py",
364+
"mimetype": "text/x-python",
365+
"name": "python",
366+
"nbconvert_exporter": "python",
367+
"pygments_lexer": "ipython3",
368+
"version": "3.12.12"
369+
}
370+
},
371+
"nbformat": 4,
372+
"nbformat_minor": 5
373+
}
