PacktPublishing · donalus · Feb 19, 2023 · Feb 19, 2023
diff --git a/Chapter01/Excercises Ch1.ipynb → Chapter01/Chapter 1 Exercises.ipynb b/Chapter01/Excercises Ch1.ipynb → Chapter01/Chapter 1 Exercises.ipynb
@@ -16,7 +16,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 1\n",
+    "##### Exercise 1\n",
     "Use the adult.csv dataset and run the codes shown in the following Screenshots. Then answer the questions."
    ]
   },
@@ -70,7 +70,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 2 \n",
+    "##### Exercise 2 \n",
     "\n",
     "For adult_df use the .groupby() function to run the following code and create the multi-index Series mlt_sr."
    ]
@@ -295,7 +295,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 3\n",
+    "##### Exercise 3\n",
     "For this exercise you need to use a new dataset: billboard.csv. Visit https://www.billboard.com/charts/hot-100 and see the latest song rankings of the day. This dataset presents information and ranking of 317 song tracks in 80 columns. The first four columns are artist, track, time, and date_e. The first columns are intuitive descriptions of song tracks. The column date_e shows the date that the songs entered the hot-100 list. The remaining 76 columns give each song's ranking at the end of each week, from 'w1' to 'w76'. Download and read this dataset using pandas and answer the following questions."
    ]
   },
@@ -431,7 +431,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 4 \n",
+    "##### Exercise 4 \n",
     "\n",
     "We will use LaqnData.csv for this exercise. Each row of this dataset shows an hourly measurement recording of one of the five following air pollutants: NO, NO2, NOX, PM10, and PM2.5. The data was collected in a location in London for the entirety of year 2017. Read the data using Pandas and perform the following tasks."
    ]
@@ -653,7 +653,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 5 \n",
+    "##### Exercise 5 \n",
     "\n",
     "We will continue working with LaqnData.csv. \n",
     "\n",

diff --git a/Chapter02/Excercises Ch2.ipynb → Chapter02/Chapter 2 Exercises.ipynb b/Chapter02/Excercises Ch2.ipynb → Chapter02/Chapter 2 Exercises.ipynb
@@ -25,7 +25,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 1\n",
+    "##### Exercise 1\n",
     "Use adult.csv and Boolean Masking to answer the following questions. "
    ]
   },
@@ -242,7 +242,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 2    \n",
+    "##### Exercise 2    \n",
     "    a)\tRepeat the analysis on Exercise 1. a), but this time use groupby function. \n",
     "    b)\tCompare the runtime of using BM vs. groupby. (Hint: you can import the module time and use the function .time()) \n"
    ]
@@ -265,7 +265,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 3 \n",
+    "##### Exercise 3 \n",
     "\n",
     "    If you have not already, solve exercise 4 in the previous chapter. After you created pvt_df for Exercises 4, run the following code.\n"
    ]

diff --git a/Chapter03/Excercises Ch3.ipynb → Chapter03/Chapter 3 Exercises.ipynb b/Chapter03/Excercises Ch3.ipynb → Chapter03/Chapter 3 Exercises.ipynb
@@ -16,7 +16,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 1\n",
+    "##### Exercise 1\n",
     "1)\tAsk 5 colleagues or classmates to provide a definition for the term data. \n",
     "\n",
     "    a)\tReport these definitions and indicate the similarity among them. \n",
@@ -36,7 +36,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 2\n",
+    "##### Exercise 2\n",
     "\n",
     "For this exercise, we are going to use covid_impact_on_airport_traffic.csv. Answer the following questions. This dataset is from Kaggle.com, use this link to see its page: https://www.kaggle.com/terenceshin/covid19s-impact-on-airport-traffic.\n",
     "The key attribute of this dataset is PercentOfBaseline which shows the ratio of air traffic in the specific day compared to pre-pandemic time (1st Feb to 15th March 2020)"
@@ -335,7 +335,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 3    \n",
+    "##### Exercise 3    \n",
     "\n",
     "For this exercise, we are going to use US_Accidents.csv. Answer the following questions. This dataset is from Kaggle.com, use this link to see its page: https://www.kaggle.com/sobhanmoosavi/us-accidents.\n",
     "This dataset shows all the car accidents in the US from February 2016 to Dec 2020. \n",
@@ -769,7 +769,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 4 \n",
+    "##### Exercise 4 \n",
     "\n",
     "For this exercise, we are going to use fatal-police-shootings-data.csv. There are a lot of debates, discussions, dialogues, and protests happening in the US surrounding police killings. The Washington Post has been collecting data on all fatal police shootings in the US. The dataset available to the government and the public alike has date, age, gender, race, location, and other situational information of these fatal police shootings. You can read more about this data on https://www.washingtonpost.com/graphics/investigations/police-shootings-database/, and you can download the last version of the data from https://github.com/washingtonpost/data-police-shootings"
    ]
@@ -980,7 +980,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 5\n",
+    "##### Exercise 5\n",
     "For this exercise, we will be using electricity_prediction.csv. The screenshot below shows the 5 rows of this dataset and a linear regression model created to predict electricity consumption based on the weekday and daily average temperature. "
    ]
   },
@@ -1137,7 +1137,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 6\n",
+    "##### Exercise 6\n",
     "For this exercise, we will be using adult.csv. We used this dataset extensively in chapter 1. Read the dataset using pandas and call it adult_df."
    ]
   },

diff --git a/Chapter04/Excercises Ch4.ipynb → Chapter04/Chapter 4 Exercises.ipynb b/Chapter04/Excercises Ch4.ipynb → Chapter04/Chapter 4 Exercises.ipynb
@@ -16,7 +16,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 1\n",
+    "##### Exercise 1\n",
     "In your own words, describe the difference between a dataset and a database.  \n"
    ]
   },
@@ -31,7 +31,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 2\n",
+    "##### Exercise 2\n",
     "What are the advantages and disadvantages of structuring data for a relational database? Mention at least two advantages and two disadvantages. Use examples to elucidate. "
    ]
   },
@@ -54,7 +54,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 3    \n",
+    "##### Exercise 3    \n",
     "\n",
     "In this chapter, we were introduced to 4 different types of databases: relational databases, unstructured databases, distributed databases, and blockchain. \n",
     "\n",
@@ -118,7 +118,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 4 \n",
+    "##### Exercise 4 \n",
     "In this chapter, we were introduced to five different methods of connecting to databases: direct connection, webpage connection, API connection, request connection, and publicly shared. Use the following table to indicate a ranking for each of the five methods of connecting to databases based on the specified criteria. Study the rankings and provide reasoning for why they are correct."
    ]
   },
@@ -176,7 +176,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 5\n",
+    "##### Exercise 5\n",
     "Using the Chinook database as a sample, we want to investigate and find an answer to the following question: Do tracks that are titled using positive words sell better on average than tracks that are titled with negative words. We would like to only focus on the following words in the investigations. \n",
     "\n",
     "- List of negative words: ['Evil', 'Night', 'Problem', 'Sorrow', 'Dead', 'Curse', 'Venom', 'Pain', 'Lonely', 'Beast']\n",
@@ -246,7 +246,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "##### Excercise 6\n",
+    "##### Exercise 6\n",
     "In the year 2020, which of the following 12 stocks experienced the highest growth. \n",
     "\n",
     "Stocks: [‘Baba’, ‘NVR’, ‘AAPL’, ‘NFLX’, ‘FB’, ‘SBUX’, ‘NOW’, ‘AMZN’, ‘GOOGL’, ‘MSFT’, ‘FDX’, ‘TSLA’]\n",

diff --git a/Chapter05/Chaper 5 Excercises.ipynb → Chapter05/Chapter 5 Exercises.ipynb b/Chapter05/Chaper 5 Excercises.ipynb → Chapter05/Chapter 5 Exercises.ipynb
@@ -10,7 +10,7 @@
     "    AUTHOR: Dr. Roy Jafari \n",
     "\n",
     "### Chapter 5: Data Visualization \n",
-    "#### Excercises"
+    "#### Exercises"
    ]
   },
   {
@@ -30,7 +30,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 1\n",
+    "# Exercise 1\n",
     "In this exercise, we will be using Universities_imputed_reduced.csv. Draw the following described visualizations.\n",
     "\n",
     "    a.\tUse boxplots to compare the student to faculty ratio (stud./fac. ratio) for the two population public and private universities.\n",
@@ -233,7 +233,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 2\n",
+    "# Exercise 2\n",
     "\n",
     "In this exercise, we will continue using Universities_imputed_reduced.csv. Draw the following described visualizations.\n",
     "\n",
@@ -288,7 +288,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 3\n",
+    "# Exercise 3\n",
     "\n",
     "For this example, we will be using WH Report_preprocessed.csv. Draw the following described visualizations.\n",
     "\n",
@@ -352,7 +352,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 4\n",
+    "# Exercise 4\n",
     "\n",
     "For this exercise, we will continue using WH Report_preprocessed.csv. Draw the following described visualizations.\n",
     "\n",
@@ -392,7 +392,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 5\n",
+    "# Exercise 5\n",
     "\n",
     "For this exercise, we will be using whickham.csv. Draw the following described visualizations.\n",
     "\n",
@@ -587,7 +587,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 6\n",
+    "# Exercise 6\n",
     "\n",
     "For this exercise, we will be using WH Report_preprocessed.csv. \n",
     "\n",
@@ -637,7 +637,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 7\n",
+    "# Exercise 7\n",
     "\n",
     "For this exercise, we will continue using WH Report_preprocessed.csv. \n",
     "\n",

diff --git a/Chapter06/Chaper 6 Excercises.ipynb → Chapter06/Chapter 6 Exercises.ipynb b/Chapter06/Chaper 6 Excercises.ipynb → Chapter06/Chapter 6 Exercises.ipynb
@@ -10,7 +10,7 @@
     "    AUTHOR: Dr. Roy Jafari \n",
     "\n",
     "### Chapter 6: Prediction \n",
-    "#### Excercises"
+    "#### Exercises"
    ]
   },
   {
@@ -30,7 +30,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 1\n",
+    "# Exercise 1\n",
     "“MLP has the potential to create prediction models that are more accurate than predictions models that are created by linear regression.” This statement is generally correct. In this exercise, we want to explore one of the reasons why the statement is correct. Answer the following questions.\n",
     "\n",
     "    a) The following formula shows the linear equation that we used to connect the dependent and independent attributes of the MSU number of applications problem. Count and report the number of coefficients that Linear Regression can play with to fit the equation to the data.  \n",
@@ -53,7 +53,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 2\n",
+    "# Exercise 2\n",
     "2.\tIn this exercise, we will be using ToyotaCorolla_preprocessed.csv. This dataset has the following columns: Age, Milage_KM, Quarterly_Tax, Weight, \tFuel_Type_CNG, Fuel_Type_Diesel, Fuel_Type_Petrol, and Price. Each data object in this dataset is a used Toyota Corolla car. We would like to use this dataset to predict the price of used Toyota Corolla cars. \n"
    ]
   },

diff --git a/Chapter07/Chaper 7 Excercises.ipynb → Chapter07/Chapter 7 Exercises.ipynb b/Chapter07/Chaper 7 Excercises.ipynb → Chapter07/Chapter 7 Exercises.ipynb
@@ -10,7 +10,7 @@
     "    AUTHOR: Dr. Roy Jafari \n",
     "\n",
     "### Chapter 7: Classification \n",
-    "#### Excercises"
+    "#### Exercises"
    ]
   },
   {
@@ -30,7 +30,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 1\n",
+    "# Exercise 1\n",
     "The chapter asserts that before using KNN you will need to have your independent attributes normalized. This is certainly true, but how come we were able to get away with no-normalization when we performed KNN using visualization? See Figure 7.3. \n"
    ]
   },
@@ -45,7 +45,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 2\n",
+    "# Exercise 2\n",
     "We did not normalize the data when applying the Decision Tree to the Loan Application problem. For practice and deeper understanding, apply the Decision Tree to the normalized data, and answer the following questions. "
    ]
   },
@@ -88,7 +88,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 3\n",
+    "# Exercise 3\n",
     "For this exercise, we are going to use the Customer Churn.csv. This dataset is randomly collected from an Iranian telecom company’s database over a period of 12 months. A total of 3150 rows of data, each representing a customer, bear information for 13 columns. The attributes that are in this dataset are listed below:\n",
     "    \n",
     "    Call Failures: number of call failures\n",

diff --git a/Chapter08/Chaper 8 Excercises.ipynb → Chapter08/Chapter 8 Exercises.ipynb b/Chapter08/Chaper 8 Excercises.ipynb → Chapter08/Chapter 8 Exercises.ipynb
@@ -10,7 +10,7 @@
     "    AUTHOR: Dr. Roy Jafari \n",
     "\n",
     "### Chapter 8: Clustering Analysis\n",
-    "#### Excercises"
+    "#### Exercises"
    ]
   },
   {
@@ -29,7 +29,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 1\n",
+    "# Exercise 1\n",
     "In your own words, answer the following two questions. Use at most 200 words, to answer each question.\n",
     "\n",
     "    a.\tWhat is the difference between Classification and Prediction?\n",
@@ -47,7 +47,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 2\n",
+    "# Exercise 2\n",
     "Consider Figure 8.6 regarding the necessity of normalization before performing Clustering analysis. With this new appreciation you developed in this chapter, would you like to change your answer to the first exercise question from the previous chapter?\n"
    ]
   },
@@ -62,7 +62,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 3\n",
+    "# Exercise 3\n",
     "In this chapter, we used WH Report_preprocessed.csv to form meaningful clusters of countries only using 2019 data. In this exercise, we want to use the data of all the years 2010-2019. Perform the following steps to do this."
    ]
   },
@@ -126,7 +126,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 4\n",
+    "# Exercise 4\n",
     "For this exercise we will be using the dataset Mall_Customers.xlsx to form 4 meaningful clusters of customers. The following steps will help you to do this correctly. "
    ]
   },

diff --git a/Chapter09/Chaper 9 Excercises.ipynb → Chapter09/Chapter 9 Exercises.ipynb b/Chapter09/Chaper 9 Excercises.ipynb → Chapter09/Chapter 9 Exercises.ipynb
@@ -10,7 +10,7 @@
     "    AUTHOR: Dr. Roy Jafari \n",
     "\n",
     "### Chapter 9: Data Cleaning - Levels Ⅰ and Ⅱ \n",
-    "#### Excercises"
+    "#### Exercises"
    ]
   },
   {
@@ -28,7 +28,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 1\n",
+    "# Exercise 1\n",
     "In your own words describe the relationship between analytics goals and data cleaning. Your response should answer the following questions.\n",
     "\n",
     "    a.\t Is data cleaning a separate step of data analytics and can be done in isolation? In other words, can data cleaning be performed without knowing about the analytics?\n",
@@ -47,7 +47,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "# Excercise 2\n",
+    "# Exercise 2\n",
     "\n",
     "A local airport to analyze the usage of its parking has employed a Single Beam Infrared Detector (SBID) technology to count the number of people who pass the gate from the parking to the airport. \n",
     "As shown in the following figure, an SBID records the time every time the infrared connection is blocked, signaling the entrance or the exit of a passenger."