**Contribute:** Found a typo, or any other change that could improve this notebook tutorial? Please consider sending us a pull request in the notebook's public repo.

# Assignment - 1: Solution

#### (Intermediate - Advanced)

This is the first assignment of the DPhi 5 Week Data Science Bootcamp. It revolves around data analysis and visualization on the bootcamp's Learners dataset.

```
import numpy as np
import pandas as pd
```

```
learners_data = pd.read_csv("https://raw.githubusercontent.com/dphi-official/Datasets/master/DPhi%20-%20Learners%20-%20Beginners%20%26%20Absolute%20Beginners%20-%20Real%20Dataset%20-%20DPhi_Learners.csv")
```

#### Question 1: Fill all the missing values with 0. (Treat '-' as missing values)

```
def f(r):
    # Return 0 for the '-' placeholder, otherwise keep the value unchanged
    if r == '-':
        return 0
    else:
        return r

for cols in learners_data.columns:
    learners_data[cols] = learners_data[cols].apply(f) # apply calls f on every value of the column, one column at a time
```
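As an alternative to the element-wise function above, the placeholder can be handled while reading the file. The sketch below is only an illustration on a tiny made-up CSV (the names `csv_text`, `toy` and `quiz_cols`, and the values, are assumptions, not the real dataset). It relies on the `na_values` parameter of `pd.read_csv`, which marks the given string as missing, followed by `fillna(0)`.

```python
import io
import pandas as pd

# Tiny made-up CSV standing in for the learners dataset.
csv_text = "Name,Quiz1,Quiz2\nA,5,-\nB,-,4\nC,3,-\n"

# na_values tells read_csv to treat '-' as a missing value (NaN).
toy = pd.read_csv(io.StringIO(csv_text), na_values="-")

# Fill the missing quiz scores with 0.
quiz_cols = ["Quiz1", "Quiz2"]
toy[quiz_cols] = toy[quiz_cols].fillna(0)

print(toy)
print(toy.dtypes)
```

With this approach the quiz columns are parsed as numbers directly, so a separate conversion step may not be needed.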

```
# The quiz columns were read as text (object dtype), so we need to convert the quiz scores to numeric values
cols = ['Quiz1','Quiz2','Quiz3','Quiz4','Quiz5','Quiz6','Quiz7','Total_Score']
learners_data[cols] = learners_data[cols].astype('float64') # astype helps us convert the data type to float
```
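If a score column might still contain stray non-numeric text, `astype('float64')` raises an error. A more forgiving option is `pd.to_numeric` with `errors="coerce"`, which turns unparseable values into `NaN`. The example below is a minimal sketch on made-up values; the frame `toy` and the string `"n/a"` are illustrative assumptions.

```python
import pandas as pd

# Made-up scores mixing strings, numbers and one unparseable entry.
toy = pd.DataFrame(
    {"Quiz1": ["5", 0, "3.5"], "Quiz2": [0, "4", "n/a"]},
    dtype=object,
)
quiz_cols = ["Quiz1", "Quiz2"]

# Convert each column to numbers; values that cannot be parsed become NaN.
converted = toy[quiz_cols].apply(pd.to_numeric, errors="coerce")

print(converted)
print(converted.dtypes)
```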

#### Question 2: Visualize learners category with different groups and note down your inferences.

```
import matplotlib.pyplot as plt
import seaborn as sns
```

**One possible solution is shown below**

```
plt.figure(figsize=(16,12))
count_plot = sns.countplot(x='Group_ID', hue='Learner_Category', data=learners_data) # pass x, hue and data by keyword, which works across seaborn versions
count_plot.set_xticklabels(count_plot.get_xticklabels(), rotation=45) # we have specified rotation just so that the ticks are more readable
plt.show()
```

# Inferences

- All groups in the absolute beginner category have between 30 and 35 members, except AB_G19.
- Group AB_G19 of the absolute beginner category has more than 60 members.
- Every group in the beginner category has 80 or more members.
- More members are enrolled in the beginner learning track than in the absolute beginner learning track.

Approximate calculation:

absolute beginners : 19 (total # of groups in AB) * 30 (approx members) = 570

beginners : 10 (total # of groups in B) * 80 (approx members) = 800
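The estimate above is simple arithmetic; exact counts can be read from the data itself. The sketch below repeats the arithmetic and then shows, on a small made-up frame, how `value_counts`, `groupby(...).size()` and `nunique` give members per category, members per group and groups per category. The frame `toy` and the group name `B_G1` are assumptions for illustration only.

```python
import pandas as pd

# Back-of-the-envelope estimate from the text above.
approx_absolute_beginners = 19 * 30  # 19 groups, roughly 30 members each
approx_beginners = 10 * 80           # 10 groups, roughly 80 members each
print(approx_absolute_beginners, approx_beginners)

# Made-up rows standing in for the learners dataset.
toy = pd.DataFrame({
    "Group_ID": ["AB_G1", "AB_G1", "AB_G2", "B_G1", "B_G1", "B_G1"],
    "Learner_Category": ["Absolute Beginner"] * 3 + ["Beginner"] * 3,
})

# Exact number of members in each learner category.
members_per_category = toy["Learner_Category"].value_counts()

# Exact number of members in each group, broken down by category.
members_per_group = toy.groupby(["Learner_Category", "Group_ID"]).size()

# Number of distinct groups in each category.
groups_per_category = toy.groupby("Learner_Category")["Group_ID"].nunique()

print(members_per_category)
print(members_per_group)
print(groups_per_category)
```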

#### Question 3: Visualize the distribution of Total_Scores scored by each category of learners and share your inferences.

**Note:** Ignore learners whose marks are near zero, since those zeros were filled in by us.

```
sns.distplot(learners_data[learners_data['Learner_Category'] == 'Absolute Beginner']['Total_Score']) # Conditional selection
# We select only the rows where Learner_Category is 'Absolute Beginner', and from those rows only the Total_Score values
```

```
sns.distplot(learners_data[learners_data['Learner_Category'] == 'Beginner']['Total_Score'])
```
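The earlier note about ignoring near-zero marks can also be applied in code by filtering before plotting. This is a minimal sketch on made-up values; the frame `toy` and the cut-off `MIN_SCORE` are hypothetical and should be chosen after looking at the real data.

```python
import pandas as pd

# Made-up scores; the zeros represent values that were filled in for missing entries.
toy = pd.DataFrame({
    "Learner_Category": ["Beginner", "Beginner", "Beginner",
                         "Absolute Beginner", "Absolute Beginner"],
    "Total_Score": [0.0, 52.0, 47.5, 0.0, 38.0],
})

MIN_SCORE = 1.0  # hypothetical cut-off below which a score counts as "near zero"

# Combine two boolean masks: the right category and a real (non-filled) score.
is_beginner = toy["Learner_Category"] == "Beginner"
has_real_score = toy["Total_Score"] >= MIN_SCORE
beginner_scores = toy.loc[is_beginner & has_real_score, "Total_Score"]

print(beginner_scores)
```

The filtered series can then be passed to the same plotting call used above in place of the unfiltered selection.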

# Inferences

- The distributions of total scores for the two learner categories are similar.
- For the beginner category, most learners' scores lie in the range 45 to 60.
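Inferences like these can be cross-checked numerically. The sketch below, on made-up scores, computes summary statistics per category and the share of scores that fall between 45 and 60. The frame `toy` and its values are assumptions for illustration, not results from the real dataset.

```python
import pandas as pd

# Made-up total scores for the two learner categories.
toy = pd.DataFrame({
    "Learner_Category": ["Absolute Beginner"] * 3 + ["Beginner"] * 3,
    "Total_Score": [40.0, 50.0, 60.0, 45.0, 55.0, 60.0],
})

# Count, mean and median of Total_Score for each category.
summary = toy.groupby("Learner_Category")["Total_Score"].agg(["count", "mean", "median"])

# Fraction of scores in the range 45 to 60 (both ends included) per category.
in_range = toy["Total_Score"].between(45, 60)
share_in_range = in_range.groupby(toy["Learner_Category"]).mean()

print(summary)
print(share_in_range)
```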

#### Question 4: Visualize/draw the trends of quizzes mean scores for different groups of learners of learner category absolute beginner.

```
learners_data_grouped = learners_data.groupby('Group_ID').mean(numeric_only=True).round(2) # group by Group_ID and take the mean of each numeric column, rounded to 2 decimal places
learners_data_grouped.drop(columns=['Total_Score'], inplace=True) # drop the total score, as we want to visualise the group-wise mean score of each quiz
learners_data_grouped_axes_swapped = learners_data_grouped.T # transpose so that quizzes are rows and groups are columns (this step can be skipped as well)
# Plotting the trends of quizzes mean scores scored by different groups of learners.
plt.figure(figsize=(12,8))
x = ["Quiz1", "Quiz2", "Quiz3", "Quiz4", "Quiz5", "Quiz6", "Quiz7"] # Quizzes
for col in learners_data_grouped_axes_swapped.columns[:19]:
    plt.plot(x, learners_data_grouped_axes_swapped[col]) # plot one line per group on the same figure
plt.xlabel('Quizzes')
plt.ylabel('Scores')
plt.title("Patterns of Quizzes mean scores scored by different groups of learners of learner category absolute beginner")
plt.xticks(rotation=45) # just for better visibility
plt.legend(learners_data_grouped_axes_swapped.columns[:19]) # show the first 19 columns in the legend (all the absolute beginner groups)
plt.show()
```
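A positional slice such as `columns[:19]` depends on how the group names happen to sort. One way to avoid that dependency is to group by both `Learner_Category` and `Group_ID` and then select a category by label. The sketch below uses a small made-up frame; `toy`, its values and the group name `B_G1` are assumptions for illustration.

```python
import pandas as pd

# Made-up quiz scores for three groups in two learner categories.
toy = pd.DataFrame({
    "Learner_Category": ["Absolute Beginner", "Absolute Beginner",
                         "Absolute Beginner", "Beginner", "Beginner"],
    "Group_ID": ["AB_G1", "AB_G1", "AB_G2", "B_G1", "B_G1"],
    "Quiz1": [4.0, 6.0, 5.0, 8.0, 6.0],
    "Quiz2": [3.0, 5.0, 7.0, 9.0, 7.0],
})
quiz_cols = ["Quiz1", "Quiz2"]

# Mean quiz score per (category, group) pair.
group_means = toy.groupby(["Learner_Category", "Group_ID"])[quiz_cols].mean().round(2)

# Select one category by label; the result is indexed by Group_ID only.
ab_means = group_means.loc["Absolute Beginner"]

# Transpose so that quizzes are rows and groups are columns, ready for line plots.
ab_trends = ab_means.T

print(ab_trends)
```

Calling `ab_trends.plot()` would then draw one line per group without any hard-coded column positions.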

#### Question 5: Visualize/draw the trends of quizzes mean scores for different groups of learners of learner category beginner.

```
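# Plotting the trends of quizzes mean scores scored by different groups of beginner learners.
x = ["Quiz1", "Quiz2", "Quiz3", "Quiz4", "Quiz5", "Quiz6", "Quiz7"] # Quizzes
# Select the beginner rows by category rather than by column position, so that no beginner group is skipped
beginners = learners_data[learners_data['Learner_Category'] == 'Beginner']
beginner_means = beginners.groupby('Group_ID')[x].mean().round(2) # mean score of each quiz for every beginner group
plt.figure(figsize=(12,8))
for group in beginner_means.index:
    plt.plot(x, beginner_means.loc[group]) # plot one line per group on the same figure
plt.xlabel('Quizzes')
plt.ylabel('Scores')
plt.title("Patterns of Quizzes mean scores scored by different groups of learners of learner category beginner")
plt.xticks(rotation=45) # just for better visibility
plt.legend(beginner_means.index) # one legend entry per beginner group
plt.show()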
```