
Commit 2540ebf

chetanthote, chetan thote, and kesmit13 authored
Added notebooks for load data (#103)
* Create load-CSV-data-S3
* Added notebooks for Load data sections of UI
* Modified with suggested changes
* Modified with suggested changes
* Remove extra header

---------

Co-authored-by: chetan thote <chetan@chetans-MacBook-Pro.local>
Co-authored-by: Kevin D Smith <kevin.smith@octobergray.com>
1 parent e2becae commit 2540ebf


5 files changed (+791, -0 lines)


authors/chetan-thote.toml

Lines changed: 4 additions & 0 deletions
@@ -0,0 +1,4 @@
name="Chetan Thote"
title="Product Team"
image="singlestore"
external=false
Lines changed: 11 additions & 0 deletions
@@ -0,0 +1,11 @@
[meta]
authors=["chetan-thote"]
title="Sales Data Analysis Dataset From Amazon S3"
description="""\
The Sales Data Analysis use case demonstrates how to use SingleStore's querying capabilities to analyze sales data stored in a CSV file."""
difficulty="beginner"
tags=["starter", "loaddata", "s3"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
Lines changed: 360 additions & 0 deletions
@@ -0,0 +1,360 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "97f96c34-81a9-495a-a55d-c565695e87f0",
"metadata": {},
"source": [
"<div id=\"singlestore-header\" style=\"display: flex; background-color: rgba(235, 249, 245, 0.25); padding: 5px;\">\n",
" <div id=\"icon-image\" style=\"width: 90px; height: 90px;\">\n",
" <img width=\"100%\" height=\"100%\" src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/header-icons/database.png\" />\n",
" </div>\n",
" <div id=\"text\" style=\"padding: 5px; margin-left: 10px;\">\n",
" <div id=\"badge\" style=\"display: inline-block; background-color: rgba(0, 0, 0, 0.15); border-radius: 4px; padding: 4px 8px; align-items: center; margin-top: 6px; margin-bottom: -2px; font-size: 80%\">SingleStore Notebooks</div>\n",
" <h1 style=\"font-weight: 500; margin: 8px 0 0 4px;\">Sales Data Analysis Dataset From Amazon S3</h1>\n",
" </div>\n",
"</div>"
]
},
{
"cell_type": "markdown",
"id": "612bd378-f145-42f1-b8ce-32557a4c00cd",
"metadata": {},
"source": [
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Note</b></p>\n",
" <p>This notebook can be run on a Free Starter Workspace. To create a Free Starter Workspace, navigate to <tt>Start</tt> using the left nav. You can also use your existing Standard or Premium workspace with this notebook.</p>\n",
" </div>\n",
"</div>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "481ce5ae-2ee0-4b63-b3f3-a4b53a5bc381",
"metadata": {},
"source": [
"The Sales Data Analysis use case demonstrates how to use SingleStore's querying capabilities to analyze sales data stored in a CSV file. This demo showcases typical operations that businesses perform to gain insights from their sales data, such as calculating total sales, identifying top-selling products, and analyzing sales trends over time. By working through this example, new users will learn how to load CSV data into SingleStore, run aggregate functions, and perform time-series analysis, which are core skills for using SingleStore in a business intelligence context."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "72fe6854-5b6e-4b79-a2d0-79bda0e18429",
"metadata": {},
"source": [
"<h3>Demo Flow</h3>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5ed26ab8-1217-4fbd-be0c-4e7728314671",
"metadata": {},
"source": [
"<img src=\"https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/LoadDataCSV.png\" width=\"100%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "46fb95a8-1402-4b97-b04a-560741f96181",
"metadata": {},
"source": [
"## How to use this notebook"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "a701cd90-dd42-4a06-b7a1-e0a2132af558",
"metadata": {},
"source": [
"<img src=\"https://singlestoreloaddata.s3.ap-south-1.amazonaws.com/images/notebookuse.gif\" width=\"75%\"/>"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2d22fd53-2c18-40e5-bb38-6d8ebc06f1b8",
"metadata": {},
"source": [
"## Create a database\n",
"\n",
"We need to create a database to work with in the following examples."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "1624ccea-0c15-4048-ab2a-fe2178e5912a",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;\n",
" %sql CREATE DATABASE SalesAnalysis;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "901e6ec1-2530-497a-857e-7973bb9714f1",
"metadata": {},
"source": [
"<h3>Create Table</h3>"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "7ac4285d-0d2d-44ec-8b1e-eef7b4f9358c",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE TABLE `SalesData` (\n",
" `Date` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Store_ID` bigint(20) DEFAULT NULL,\n",
" `ProductID` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Name` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Product_Category` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci,\n",
" `Quantity_Sold` bigint(20) DEFAULT NULL,\n",
" `Price` float DEFAULT NULL,\n",
" `Total_Sales` float DEFAULT NULL\n",
")"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "1de959eb-4f17-45d4-af74-42f45684d67b",
"metadata": {},
"source": [
"<h3>Load Data Using Pipelines</h3>"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "84f592b8-a12e-41d8-bff0-fe96175992b9",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"CREATE PIPELINE SalesData_Pipeline AS\n",
"LOAD DATA S3 's3://singlestoreloaddata/SalesData/sales_data.csv'\n",
"CONFIG '{ \\\"region\\\": \\\"ap-south-1\\\" }'\n",
"/*\n",
"CREDENTIALS '{\"aws_access_key_id\": \"<access key id>\",\n",
" \"aws_secret_access_key\": \"<access_secret_key>\"}'\n",
" */\n",
"INTO TABLE SalesData\n",
"FIELDS TERMINATED BY ','\n",
"LINES TERMINATED BY '\\r\\n'\n",
"IGNORE 1 lines;\n",
"\n",
"\n",
"START PIPELINE SalesData_Pipeline;"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "352e340a-a613-4ec5-94a5-c4e1f3565757",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT * FROM SalesData LIMIT 10"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "4508d431-7683-4ac9-a4e8-d939c47dd1fc",
"metadata": {},
"source": [
"<h3>Sample Queries</h3>\n",
"\n",
"Next, we run some analytical queries on the loaded sales data."
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "55ac6134-976c-4f27-bc2b-140835b64f13",
"metadata": {},
"source": [
"<b>Top-Selling Products</b>"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "d666c04b-ccb0-47cc-a1e7-efaa7a590d27",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(quantity_sold) AS total_quantity_sold FROM SalesData\n",
" GROUP BY product_name ORDER BY total_quantity_sold DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "87c36700-0db8-405f-97c0-e13a6a2ae0cb",
"metadata": {},
"source": [
"<b>Sales Trends Over Time</b>"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "b46d72c7-07a3-4e23-8fe4-c238b5517ef6",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY date ORDER BY date;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e6c232a1-acce-4d25-aebd-1a89aafba47d",
"metadata": {},
"source": [
"<b>Total Sales by Store</b>"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "af571f6c-0145-4466-9ed7-000d37e4738f",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT Store_ID, SUM(total_sales) AS total_sales FROM SalesData\n",
"GROUP BY Store_ID ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "9bf1d7f3-c636-4ac0-b2be-e48eaca747ef",
"metadata": {},
"source": [
"<b>Sales Contribution by Product (Percentage)</b>"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "5613b3e8-72d2-48dc-a7ae-47911df24cd2",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT product_name, SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData) AS sales_percentage FROM SalesData\n",
" GROUP BY product_name ORDER BY sales_percentage DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "afed201d-d9f2-49cc-8a14-df35103abd4e",
"metadata": {},
"source": [
"<b>Top Days with Highest Sales</b>"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "7fd8d785-7861-4570-88b3-0185c2c9c298",
"metadata": {},
"outputs": [],
"source": [
"%%sql\n",
"SELECT date, SUM(total_sales) AS total_sales FROM SalesData\n",
" GROUP BY date ORDER BY total_sales DESC LIMIT 5;"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "6738b6e4-5e8b-45db-b3dc-ebcb73bcf629",
"metadata": {},
"source": [
"## Conclusion\n",
"\n",
"<div class=\"alert alert-block alert-warning\">\n",
" <b class=\"fa fa-solid fa-exclamation-circle\"></b>\n",
" <div>\n",
" <p><b>Action Required</b></p>\n",
" <p> If you created a new database in your Standard or Premium Workspace, you can drop the database by running the cell below. Note: this will not drop your database for Free Starter Workspaces. To drop a Free Starter Workspace, terminate the Workspace using the UI. </p>\n",
" </div>\n",
"</div>\n",
"\n",
"We have shown how to load data from Amazon S3 into SingleStoreDB using `Pipelines`. These techniques should enable you to\n",
"integrate your Amazon S3 data with SingleStoreDB."
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "d5053a52-5579-4fea-9594-5250f6fcc289",
"metadata": {},
"outputs": [],
"source": [
"shared_tier_check = %sql show variables like 'is_shared_tier'\n",
"if not shared_tier_check or shared_tier_check[0][1] == 'OFF':\n",
" %sql DROP DATABASE IF EXISTS SalesAnalysis;"
]
},
{
"cell_type": "markdown",
"id": "2dcc585a-43c2-4598-93bf-888143dd5e29",
"metadata": {},
"source": [
"<div id=\"singlestore-footer\" style=\"background-color: rgba(194, 193, 199, 0.25); height:2px; margin-bottom:10px\"></div>\n",
"<div><img src=\"https://raw.githubusercontent.com/singlestore-labs/spaces-notebooks/master/common/images/singlestore-logo-grey.png\" style=\"padding: 0px; margin: 0px; height: 24px\"/></div>"
]
}
],
"metadata": {
"jupyterlab": {
"notebooks": {
"version_major": 6,
"version_minor": 4
}
},
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.6"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
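The notebook's pipeline options and sample queries can be mimicked in plain Python on a few made-up rows. In this sketch, `FIELDS TERMINATED BY ','`, `LINES TERMINATED BY '\r\n'` and `IGNORE 1 LINES` become a split on CRLF that skips the header line. The top-selling-products and sales-percentage queries become dictionary aggregations. The sample rows are invented for illustration, because the real contents of `sales_data.csv` are not shown here, and the column order is assumed to match the `SalesData` table. The sketch does not use SingleStore itself. It only illustrates what the SQL computes.

```python
from collections import defaultdict

# Made-up CSV payload (NOT the real sales_data.csv): a header line plus CRLF-terminated rows,
# with columns assumed to be in the same order as the SalesData table.
RAW = (
    "Date,Store_ID,ProductID,Product_Name,Product_Category,Quantity_Sold,Price,Total_Sales\r\n"
    "2023-01-01,1,P1,Widget,Tools,3,2.5,7.5\r\n"
    "2023-01-01,2,P2,Gadget,Tools,1,10.0,10.0\r\n"
    "2023-01-02,1,P1,Widget,Tools,4,2.5,10.0\r\n"
    "2023-01-02,2,P3,Gizmo,Toys,2,1.25,2.5\r\n"
)

COLUMNS = ["Date", "Store_ID", "ProductID", "Product_Name",
           "Product_Category", "Quantity_Sold", "Price", "Total_Sales"]


def parse_like_pipeline(raw: str) -> list[dict]:
    """Simplified stand-in for the pipeline's CSV handling (no quoting or escaping)."""
    # LINES TERMINATED BY '\r\n'; dropping empty pieces removes the one after the final CRLF
    # (it would also skip blank lines, which is a simplification).
    lines = [line for line in raw.split("\r\n") if line]
    rows = []
    for line in lines[1:]:  # IGNORE 1 LINES: skip the header
        values = line.split(",")  # FIELDS TERMINATED BY ','
        row = dict(zip(COLUMNS, values))
        row["Store_ID"] = int(row["Store_ID"])
        row["Quantity_Sold"] = int(row["Quantity_Sold"])
        row["Price"] = float(row["Price"])
        row["Total_Sales"] = float(row["Total_Sales"])
        rows.append(row)
    return rows


def top_selling_products(rows: list[dict], limit: int = 5) -> list[tuple[str, int]]:
    # Mirrors: SUM(quantity_sold) ... GROUP BY product_name ORDER BY total_quantity_sold DESC LIMIT 5
    totals: dict[str, int] = defaultdict(int)
    for r in rows:
        totals[r["Product_Name"]] += r["Quantity_Sold"]
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:limit]


def sales_percentage(rows: list[dict], limit: int = 5) -> list[tuple[str, float]]:
    # Mirrors: SUM(total_sales) * 100.0 / (SELECT SUM(total_sales) FROM SalesData), per product
    grand_total = sum(r["Total_Sales"] for r in rows)
    per_product: dict[str, float] = defaultdict(float)
    for r in rows:
        per_product[r["Product_Name"]] += r["Total_Sales"]
    pct = {name: total * 100.0 / grand_total for name, total in per_product.items()}
    return sorted(pct.items(), key=lambda kv: kv[1], reverse=True)[:limit]


rows = parse_like_pipeline(RAW)
print(top_selling_products(rows))  # [('Widget', 7), ('Gizmo', 2), ('Gadget', 1)]
print(sales_percentage(rows))
```

The percentage query divides each product's total by a single grand total, so the percentages over all products sum to 100. The `LIMIT 5` in the notebook only truncates the list.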
Lines changed: 12 additions & 0 deletions
@@ -0,0 +1,12 @@
[meta]
authors=["chetan-thote"]
title="Real-Time Event Monitoring Dataset From Kafka"
description="""\
The Real-Time Event Monitoring use case illustrates how to use SingleStore to process and analyze streaming data from a Kafka data source.
"""
difficulty="beginner"
tags=["starter", "loaddata", "kafka"]
lesson_areas=["Ingest"]
icon="database"
destinations=["spaces"]
minimum_tier="free-shared"
