import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Create a DataFrame with NBA player data
nba_players = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant', 'Stephen Curry', 'James Harden', 'Giannis Antetokounmpo', 'Luka Doncic', 'Kawhi Leonard', 'Joel Embiid', 'Damian Lillard', 'Nikola Jokic'],
    'Age': [37, 33, 33, 32, 27, 22, 30, 27, 31, 26],
    'Position': ['Small Forward', 'Power Forward', 'Point Guard', 'Shooting Guard', 'Power Forward', 'Point Guard', 'Small Forward', 'Center', 'Point Guard', 'Center'],
    'Salary': [39000000, 38000000, 43000000, 38000000, 39000000, 27000000, 34000000, 29000000, 31000000, 26000000]
})

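# Define age group labels. pd.cut bins are right-inclusive, so the bins below
# are (0, 25], (25, 30], (30, 35] and (35, inf); the labels match those edges.
age_groups = ['<=25', '26-30', '31-35', '>35']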

# Create a new column with the age group for each NBA player
nba_players['Age Group'] = pd.cut(nba_players['Age'], bins=[0, 25, 30, 35, np.inf], labels=age_groups, include_lowest=True)

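# Group by age group and count the number of NBA players in each group.
# observed=False keeps every age group (even an empty one) in category order.
age_counts = nba_players.groupby('Age Group', observed=False)['Name'].count()

# Create a pie chart of the age counts, taking the labels from the counts themselves
plt.pie(age_counts, labels=age_counts.index, autopct='%1.1f%%', startangle=90)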
plt.title('Distribution of NBA Players by Age Group')
plt.axis('equal')  # Ensure a circular pie chart
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# read the CSV file
df = pd.read_csv('files/example.csv')

# define position groups
position_groups = ['Point Guard', 'Shooting Guard', 'Small Forward', 'Power Forward', 'Center']

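# create a new column with the position group for each player
# (the CSV stores Position as a numeric code: 1 = Point Guard ... 5 = Center;
# pd.cut assigns each code to the label of the bin it falls in)
df['Position Group'] = pd.cut(df['Position'], bins=[0, 1, 2, 3, 4, 5], labels=position_groups, include_lowest=True)

# group by position group and count the number of players in each group
# (observed=True drops positions with no players, so the pie has no 0% slices)
position_counts = df.groupby('Position Group', observed=True)['Player Name'].count()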

# create a pie chart of the position counts
plt.pie(position_counts, labels=position_counts.index, autopct='%1.1f%%', startangle=90)

# set the title
plt.title('NBA Players by Position')

# show the chart
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# read the CSV file
df = pd.read_csv('files/example.csv')

# define position groups
position_groups = ['Point Guard', 'Shooting Guard', 'Small Forward', 'Power Forward', 'Center']

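# create a new column with the position group for each player
# (the CSV stores Position as a numeric code: 1 = Point Guard ... 5 = Center)
df['Position Group'] = pd.cut(df['Position'], bins=[0, 1, 2, 3, 4, 5], labels=position_groups, include_lowest=True)

# group by position group and count the number of players in each group
# (observed=False keeps positions with no players, shown as 0 on the dot chart)
position_counts = df.groupby('Position Group', observed=False)['Player Name'].count()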

# create a dot chart of the position counts
fig, ax = plt.subplots()
y_pos = np.arange(len(position_counts))
ax.plot(position_counts, y_pos, 'o')
ax.set_yticks(y_pos)
ax.set_yticklabels(position_counts.index)
ax.set_xlabel('Number of Players')
ax.set_title('NBA Players by Position')

# show the chart
plt.show()

What the example.csv file looked like:

Player Name,Position
LeBron James,3
Stephen Curry,1
Kevin Durant,4
Kyrie Irving,1
Kawhi Leonard,3
Anthony Davis,5
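
A minimal sketch of how the numeric codes in this file relate to position names, assuming (as the scripts above do) that 1 through 5 stand for Point Guard through Center. The CSV text is inlined through io.StringIO so the snippet runs without files/example.csv. The names CSV_TEXT and POSITION_NAMES are illustrative and not part of the original scripts, and a dictionary lookup with map() is used here in place of pd.cut.

```python
import io

import pandas as pd

# Same rows as example.csv, inlined so the sketch is self-contained
CSV_TEXT = """Player Name,Position
LeBron James,3
Stephen Curry,1
Kevin Durant,4
Kyrie Irving,1
Kawhi Leonard,3
Anthony Davis,5
"""

# Assumed meaning of the numeric codes (taken from the position_groups list above)
POSITION_NAMES = {
    1: 'Point Guard',
    2: 'Shooting Guard',
    3: 'Small Forward',
    4: 'Power Forward',
    5: 'Center',
}

df = pd.read_csv(io.StringIO(CSV_TEXT))

# Translate each numeric code into its position name
df['Position Group'] = df['Position'].map(POSITION_NAMES)

# Count players per position; positions with no players do not appear
position_counts = df['Position Group'].value_counts()
print(position_counts)
```

Unlike the pd.cut version, this leaves 'Position Group' as ordinary strings, so positions that no player holds are simply absent from the counts.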

All Questions with Answers

  1. What are the two primary data structures in pandas and how do they differ? The two primary data structures in pandas are Series and DataFrame. A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.). It is similar to a column in a spreadsheet or a SQL table. A Series has two main components: the index and the data. The index labels the data and can be used to access and manipulate specific elements in the Series. A DataFrame, on the other hand, is a two-dimensional labeled data structure with columns of potentially different types.

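  2. How do you read a CSV file into a pandas DataFrame? Use the pd.read_csv() function and pass it the path to the file, for example pd.read_csv('files/example.csv'). It returns the file's contents as a DataFrame.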

  3. How do you select a single column from a pandas DataFrame? To select a single column from a pandas DataFrame, you can use the square bracket indexing notation with the name of the column you want to select.

  4. How do you filter rows in a pandas DataFrame based on a condition? You can filter rows in a pandas DataFrame based on a condition by using boolean indexing. Boolean indexing allows you to select only the rows that meet a specified condition.

  5. How do you group rows in a pandas DataFrame by a particular column? To group rows in a pandas DataFrame by a particular column, you can use the groupby() function. The groupby() function creates a new object that groups the rows of the DataFrame based on the values in one or more columns.

  6. How do you aggregate data in a pandas DataFrame using functions like sum and mean? To aggregate data in a pandas DataFrame using functions like sum and mean, you can call the function directly on a column (or on the whole DataFrame) for an overall result, or use the groupby() method followed by the aggregation function to compute the result per group.

  7. How do you handle missing values in a pandas DataFrame? One option is to drop the rows or columns with missing values using the dropna() method. By default it removes every row that contains at least one missing value; passing axis=1 removes columns instead. Another option is to replace missing values with a value of your choice using the fillna() method.

  8. How do you merge two pandas DataFrames together? Merging two pandas DataFrames is a common task when working with data. pandas provides the merge() function to merge two DataFrames based on one or more common columns.

  9. How do you export a pandas DataFrame to a CSV file? You can export a pandas DataFrame to a CSV file using the to_csv() method.

  10. What is the difference between a Series and a DataFrame in Pandas? In summary, a Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional table-like object. They have different use cases, methods, and operations that can be performed on them.
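
The answers above can be tried out together in one short session. This is only a sketch on a small made-up table: the column names (Player, Team, Points, City), the values, and the output file name are illustrative assumptions, not data from the original exercise.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical data for illustration only; one Points value is missing
players = pd.DataFrame({
    'Player': ['A', 'B', 'C', 'D'],
    'Team': ['Red', 'Red', 'Blue', 'Blue'],
    'Points': [10.0, 20.0, np.nan, 30.0],
})

# Question 3: square brackets with a column name select one column (a Series)
points = players['Points']

# Question 4: boolean indexing keeps only the rows where the condition is True
high_scorers = players[players['Points'] > 15]

# Questions 5 and 6: group by a column, then aggregate each group
team_totals = players.groupby('Team')['Points'].sum()
team_means = players.groupby('Team')['Points'].mean()

# Question 7: drop rows with missing values, or fill them with a chosen value
dropped = players.dropna()
filled = players.fillna({'Points': 0.0})

# Question 8: merge two DataFrames on a shared column
teams = pd.DataFrame({'Team': ['Red', 'Blue'], 'City': ['X', 'Y']})
merged = players.merge(teams, on='Team', how='left')

# Questions 9 and 2: write the DataFrame to a CSV file, then read it back
out_path = os.path.join(tempfile.mkdtemp(), 'players.csv')
players.to_csv(out_path, index=False)
round_trip = pd.read_csv(out_path)
```

Passing index=False to to_csv() keeps the row index out of the file, so reading it back gives the same three columns rather than an extra unnamed one.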