narwhals-dev
diff --git a/maria/Intro Turorial.ipynb b/maria/Intro Turorial.ipynb
@@ -0,0 +1,373 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "id": "0e956d07",
+   "metadata": {},
+   "source": [
+    "## Quick start"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "c4f4ecab",
+   "metadata": {},
+   "source": [
+    "Installation instructions: https://narwhals-dev.github.io/narwhals/installation/"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "id": "a49c9620",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from __future__ import annotations\n",
+    "import pandas as pd\n",
+    "import polars as pl\n",
+    "import pyarrow as pa\n",
+    "import narwhals as nw\n",
+    "from narwhals.typing import IntoFrame"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "id": "1aa0f584",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def agnostic_get_columns(df_native: IntoFrame) -> list[str]:\n",
+    "    df = nw.from_native(df_native)\n",
+    "    column_names = df.columns\n",
+    "    return column_names"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 12,
+   "id": "7dc54474",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "data = {\"a\": [1, 2, 3], \"b\": [4, 5, 6]}\n",
+    "df_pandas = pd.DataFrame(data)\n",
+    "df_polars = pl.DataFrame(data)\n",
+    "table_pa = pa.table(data)"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "id": "89939c27",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "pandas output\n",
+      "['a', 'b']\n",
+      "Polars output\n",
+      "['a', 'b']\n",
+      "PyArrow output\n",
+      "['a', 'b']\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(\"pandas output\")\n",
+    "print(agnostic_get_columns(df_pandas))\n",
+    "\n",
+    "print(\"Polars output\")\n",
+    "print(agnostic_get_columns(df_polars))\n",
+    "\n",
+    "print(\"PyArrow output\")\n",
+    "print(agnostic_get_columns(table_pa))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "1c108e21",
+   "metadata": {},
+   "source": [
+    "This is about the simplest possible dataframe-agnostic function. As we'll soon see, Narwhals can do much more advanced things."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "839e22c4",
+   "metadata": {},
+   "source": [
+    "## DataFrame"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36af6dfd",
+   "metadata": {},
+   "source": [
+    "To write a dataframe-agnostic function, the steps you'll want to follow are:\n",
+    "\n",
+    "1. Initialise a Narwhals DataFrame or LazyFrame by passing your dataframe to `nw.from_native`. If you start with a lazy dataframe, all calculations stay lazy: Narwhals never triggers computation unless you ask it to.\n",
+    "\n",
+    "    Note: if you need eager execution, make sure to pass `eager_only=True` to `nw.from_native`.\n",
+    "\n",
+    "2. Express your logic using the subset of the Polars API supported by Narwhals.\n",
+    "\n",
+    "3. If you need to return a dataframe to the user in its original library, call `nw.to_native`.\n",
+    "\n",
+    "Steps 1 and 3 are so common that we provide a utility decorator, `@nw.narwhalify`, which lets you write only step 2 explicitly.\n",
+    "\n",
+    "Let's explore this with some simple examples."
+   ]
+  },
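+  {
+   "cell_type": "markdown",
+   "id": "narwhalify-sketch-md",
+   "metadata": {},
+   "source": [
+    "As a minimal sketch of the decorator form mentioned above: this assumes `@nw.narwhalify` converts the first argument with `nw.from_native` and converts the returned frame back with `nw.to_native`, so the function body only contains step 2. The name `func_decorated` is hypothetical and used only for illustration."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "narwhalify-sketch-code",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import narwhals as nw\n",
+    "import pandas as pd\n",
+    "\n",
+    "\n",
+    "# Sketch only: `func_decorated` is an illustrative name, not part of Narwhals.\n",
+    "@nw.narwhalify\n",
+    "def func_decorated(df):\n",
+    "    # `df` is expected to arrive as a Narwhals frame (step 1 handled by the decorator);\n",
+    "    # the returned frame should be converted back to the caller's library (step 3).\n",
+    "    return df.select(\n",
+    "        a_sum=nw.col(\"a\").sum(),\n",
+    "        a_mean=nw.col(\"a\").mean(),\n",
+    "    )\n",
+    "\n",
+    "\n",
+    "# Passing a pandas DataFrame in should give a pandas DataFrame back.\n",
+    "print(func_decorated(pd.DataFrame({\"a\": [1, 1, 2]})))"
+   ]
+  },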
+  {
+   "cell_type": "markdown",
+   "id": "e8003bf9",
+   "metadata": {},
+   "source": [
+    "### Example 1: descriptive statistics"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "36b782d0",
+   "metadata": {},
+   "source": [
+    "Just like in Polars, we can pass expressions to `DataFrame.select` or `LazyFrame.select`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 8,
+   "id": "52fbbb9c",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from narwhals.typing import IntoFrameT\n",
+    "\n",
+    "def func(df: IntoFrameT) -> IntoFrameT:\n",
+    "    return (\n",
+    "        nw.from_native(df)\n",
+    "        .select(\n",
+    "            a_sum=nw.col(\"a\").sum(),\n",
+    "            a_mean=nw.col(\"a\").mean(),\n",
+    "            a_std=nw.col(\"a\").std(),\n",
+    "        )\n",
+    "        .to_native()\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "98996849",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "   a_sum    a_mean    a_std\n",
+      "0      4  1.333333  0.57735\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in pandas\n",
+    "df = pd.DataFrame({\"a\": [1, 1, 2]})\n",
+    "print(func(df))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "aca15172",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "shape: (1, 3)\n",
+      "┌───────┬──────────┬─────────┐\n",
+      "│ a_sum ┆ a_mean   ┆ a_std   │\n",
+      "│ ---   ┆ ---      ┆ ---     │\n",
+      "│ i64   ┆ f64      ┆ f64     │\n",
+      "╞═══════╪══════════╪═════════╡\n",
+      "│ 4     ┆ 1.333333 ┆ 0.57735 │\n",
+      "└───────┴──────────┴─────────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in Polars\n",
+    "df = pl.DataFrame({\"a\": [1, 1, 2]})\n",
+    "print(func(df))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "b192cb5a",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "pyarrow.Table\n",
+      "a_sum: int64\n",
+      "a_mean: double\n",
+      "a_std: double\n",
+      "----\n",
+      "a_sum: [[4]]\n",
+      "a_mean: [[1.3333333333333333]]\n",
+      "a_std: [[0.5773502691896257]]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in PyArrow\n",
+    "table = pa.table({\"a\": [1, 1, 2]})\n",
+    "print(func(table))"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "e116914e",
+   "metadata": {},
+   "source": [
+    "### Example 2: group-by and mean"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "id": "4f33ced7",
+   "metadata": {},
+   "source": [
+    "Just like in Polars, we can pass expressions to `GroupBy.agg`."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 13,
+   "id": "4ca29c0d",
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def func(df: IntoFrameT) -> IntoFrameT:\n",
+    "    return (\n",
+    "        nw.from_native(df).group_by(\"a\").agg(nw.col(\"b\").mean()).sort(\"a\").to_native()\n",
+    "    )"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "d701eefd",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "   a    b\n",
+      "0  1  4.5\n",
+      "1  2  6.0\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in pandas\n",
+    "df = pd.DataFrame({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
+    "print(func(df))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "0d9430f0",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "shape: (2, 2)\n",
+      "┌─────┬─────┐\n",
+      "│ a   ┆ b   │\n",
+      "│ --- ┆ --- │\n",
+      "│ i64 ┆ f64 │\n",
+      "╞═════╪═════╡\n",
+      "│ 1   ┆ 4.5 │\n",
+      "│ 2   ┆ 6.0 │\n",
+      "└─────┴─────┘\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in Polars\n",
+    "df = pl.DataFrame({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
+    "print(func(df))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 17,
+   "id": "a26c46bb",
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "pyarrow.Table\n",
+      "a: int64\n",
+      "b: double\n",
+      "----\n",
+      "a: [[1,2]]\n",
+      "b: [[4.5,6]]\n"
+     ]
+    }
+   ],
+   "source": [
+    "# check in PyArrow\n",
+    "table = pa.table({\"a\": [1, 1, 2], \"b\": [4, 5, 6]})\n",
+    "print(func(table))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "9ee2857b",
+   "metadata": {},
+   "outputs": [],
+   "source": []
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "narwhals",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.12"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 5
+}