Panda Hacks
Panda Hacks by Shivansh Goel
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# Create a DataFrame with NBA player data
nba_players = pd.DataFrame({
'Name': ['LeBron James', 'Kevin Durant', 'Stephen Curry', 'James Harden', 'Giannis Antetokounmpo', 'Luka Doncic', 'Kawhi Leonard', 'Joel Embiid', 'Damian Lillard', 'Nikola Jokic'],
'Age': [37, 33, 33, 32, 27, 22, 30, 27, 31, 26],
'Position': ['Small Forward', 'Power Forward', 'Point Guard', 'Shooting Guard', 'Power Forward', 'Point Guard', 'Small Forward', 'Center', 'Point Guard', 'Center'],
'Salary': [39000000, 38000000, 43000000, 38000000, 39000000, 27000000, 34000000, 29000000, 31000000, 26000000]
})
# Define age groups
age_groups = ['<25', '25-30', '30-35', '>35']
# Create a new column with the age group for each NBA player
# (pd.cut intervals are closed on the right, so a 30-year-old falls in '25-30')
nba_players['Age Group'] = pd.cut(nba_players['Age'], bins=[0, 25, 30, 35, np.inf], labels=age_groups, include_lowest=True)
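# Group by age group and count the number of NBA players in each group
# (observed=False keeps every age group, including any empty ones)
age_counts = nba_players.groupby('Age Group', observed=False)['Name'].count()
# Create a pie chart of the age counts, labelling each slice from the grouped index
plt.pie(age_counts, labels=age_counts.index, autopct='%1.1f%%', startangle=90)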
plt.title('Distribution of NBA Players by Age Group')
plt.axis('equal') # Ensure a circular pie chart
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# read the CSV file
df = pd.read_csv('files/example.csv')
# define position groups
position_groups = ['Point Guard', 'Shooting Guard', 'Small Forward', 'Power Forward', 'Center']
# create a new column with the position group for each player
# (assumes 'Position' holds numeric codes 1-5; pd.cut only bins numeric data)
df['Position Group'] = pd.cut(df['Position'], bins=[0, 1, 2, 3, 4, 5], labels=position_groups, include_lowest=True)
# group by position group and count the number of players in each group
position_counts = df.groupby('Position Group', observed=False)['Player Name'].count()
# create a pie chart of the position counts
plt.pie(position_counts, labels=position_counts.index, autopct='%1.1f%%', startangle=90)
# set the title
plt.title('NBA Players by Position')
# show the chart
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
# read the CSV file
df = pd.read_csv('files/example.csv')
# define position groups
position_groups = ['Point Guard', 'Shooting Guard', 'Small Forward', 'Power Forward', 'Center']
# create a new column with the position group for each player
# (assumes 'Position' holds numeric codes 1-5; pd.cut only bins numeric data)
df['Position Group'] = pd.cut(df['Position'], bins=[0, 1, 2, 3, 4, 5], labels=position_groups, include_lowest=True)
# group by position group and count the number of players in each group
position_counts = df.groupby('Position Group', observed=False)['Player Name'].count()
# create a dot chart of the position counts
fig, ax = plt.subplots()
y_pos = np.arange(len(position_counts))
ax.plot(position_counts, y_pos, 'o')
ax.set_yticks(y_pos)
ax.set_yticklabels(position_counts.index)
ax.set_xlabel('Number of Players')
ax.set_title('NBA Players by Position')
# show the chart
plt.show()
All Questions with Answers
-
What are the two primary data structures in pandas and how do they differ? The two primary data structures in pandas are Series and DataFrame. A Series is a one-dimensional labeled array that can hold data of any type (integer, float, string, etc.), similar to a single column in a spreadsheet or SQL table. It has two main components: the data and the index, which labels each value and lets you access and manipulate specific elements. A DataFrame, by contrast, is a two-dimensional labeled data structure with rows and columns, where each column can have a different type.
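As a rough illustration of the difference, the sketch below builds one of each from a few of the players used earlier. The variable names `ages` and `players` are made up for this example.

```python
import pandas as pd

# A Series: one-dimensional values plus an index of labels
ages = pd.Series([37, 33, 22],
                 index=['LeBron James', 'Kevin Durant', 'Luka Doncic'])

# A DataFrame: two-dimensional, and each column may have its own type
players = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant', 'Luka Doncic'],
    'Age': [37, 33, 22],
})

print(ages['Luka Doncic'])   # look up a value by its index label
print(type(players['Age']))  # a single DataFrame column is a Series
print(players.shape)         # (number of rows, number of columns)
```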
-
How do you read a CSV file into a pandas DataFrame? Use the pd.read_csv() function, passing it the path to the file; it returns a DataFrame.
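A minimal sketch of reading CSV data follows. To keep it runnable without a file on disk, it reads from an in-memory string; the CSV rows here are hypothetical placeholders, not the contents of `files/example.csv`.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Player Name,Position,Age
Player A,Point Guard,25
Player B,Center,31
"""

# pd.read_csv accepts a file path or any file-like object
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

With a real file, the call would simply take the path, as in `pd.read_csv('files/example.csv')`.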
-
How do you select a single column from a pandas DataFrame? Use square-bracket indexing with the name of the column you want, for example df['Name']. The result is a Series.
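For instance, assuming a small DataFrame shaped like the `nba_players` table above, column selection might look like this:

```python
import pandas as pd

nba_players = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant', 'Stephen Curry'],
    'Age': [37, 33, 33],
})

# A single column name returns a Series
names = nba_players['Name']

# A list of column names returns a DataFrame, even with one column
names_frame = nba_players[['Name']]

print(type(names))
print(type(names_frame))
```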
-
How do you filter rows in a pandas DataFrame based on a condition? Use boolean indexing: write a condition on one or more columns, which produces a Series of True/False values, and pass it inside square brackets to keep only the rows where it is True.
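Boolean indexing can be sketched with a few rows from the player table above. The variable names and the thresholds are chosen only for illustration.

```python
import pandas as pd

players = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant', 'Luka Doncic', 'Nikola Jokic'],
    'Age': [37, 33, 22, 26],
    'Salary': [39000000, 38000000, 27000000, 26000000],
})

# A comparison on a column yields a boolean Series, one value per row
is_over_30 = players['Age'] > 30

# Indexing with that Series keeps only the rows where it is True
over_30 = players[is_over_30]

# Combine conditions with & (and) or | (or); each needs its own parentheses
young_high_earners = players[(players['Age'] < 30) & (players['Salary'] > 26000000)]

print(over_30)
print(young_high_earners)
```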
-
How do you group rows in a pandas DataFrame by a particular column? Use the groupby() method with the name of the column. It returns a GroupBy object that splits the rows of the DataFrame into groups based on the values in one or more columns.
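A small sketch of grouping, reusing positions from the player table above:

```python
import pandas as pd

players = pd.DataFrame({
    'Name': ['LeBron James', 'Stephen Curry', 'Luka Doncic',
             'Kawhi Leonard', 'Nikola Jokic'],
    'Position': ['Small Forward', 'Point Guard', 'Point Guard',
                 'Small Forward', 'Center'],
})

# groupby returns a GroupBy object; nothing is computed until you ask
by_position = players.groupby('Position')

# Number of rows in each group
group_sizes = by_position.size()

# Pull out the rows belonging to one group
centers = by_position.get_group('Center')

print(group_sizes)
print(centers)
```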
-
How do you aggregate data in a pandas DataFrame using functions like sum and mean? Call groupby() and then the aggregation method you want, such as sum() or mean(), to get one result per group. The same methods can also be called directly on a DataFrame or a column to aggregate without grouping.
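The following sketch aggregates salaries per position using values from the player table above. `agg` is used here to apply several aggregations at once; calling `.sum()` or `.mean()` on the grouped column works the same way for a single one.

```python
import pandas as pd

players = pd.DataFrame({
    'Name': ['LeBron James', 'Stephen Curry', 'Luka Doncic',
             'Kawhi Leonard', 'Nikola Jokic'],
    'Position': ['Small Forward', 'Point Guard', 'Point Guard',
                 'Small Forward', 'Center'],
    'Salary': [39000000, 43000000, 27000000, 34000000, 26000000],
})

# One row per position, one column per aggregation
salary_summary = players.groupby('Position')['Salary'].agg(['sum', 'mean'])

# Aggregating a whole column without grouping
overall_mean = players['Salary'].mean()

print(salary_summary)
print(overall_mean)
```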
-
How do you handle missing values in a pandas DataFrame? One option is to drop the rows or columns that contain missing values using the dropna() method. By default it removes every row that contains at least one missing value; with axis=1 it removes the affected columns instead.
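As a sketch of `dropna()` in both directions, using a made-up three-row table with one missing age:

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value in the 'Age' column
df = pd.DataFrame({
    'Name': ['Player A', 'Player B', 'Player C'],
    'Age': [25, np.nan, 31],
    'Salary': [1.0, 2.0, 3.0],
})

# Count missing values per column before deciding what to do
missing_per_column = df.isna().sum()

# Default: drop every row that has at least one missing value
rows_dropped = df.dropna()

# axis=1: drop every column that has at least one missing value
cols_dropped = df.dropna(axis=1)

print(missing_per_column)
print(rows_dropped)
print(cols_dropped)
```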
-
How do you merge two pandas DataFrames together? Use the merge() function, which combines two DataFrames by matching rows on one or more common columns.
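A minimal merge sketch, splitting a few of the player values above into two tables that share a `Name` column. The `how` parameter controls what happens to rows without a match.

```python
import pandas as pd

ages = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant'],
    'Age': [37, 33],
})
salaries = pd.DataFrame({
    'Name': ['Kevin Durant', 'Stephen Curry'],
    'Salary': [38000000, 43000000],
})

# Default (inner) merge keeps only names present in both tables
inner = pd.merge(ages, salaries, on='Name')

# Outer merge keeps every name and fills the gaps with NaN
outer = pd.merge(ages, salaries, on='Name', how='outer')

print(inner)
print(outer)
```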
-
How do you export a pandas DataFrame to a CSV file? Use the to_csv() method, passing the path of the output file. Pass index=False if you do not want the row index written as an extra column.
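The export can be sketched as a round trip: write a small DataFrame out, then read it back. The temporary directory and the file name `players.csv` are placeholders for this example.

```python
import os
import tempfile
import pandas as pd

df = pd.DataFrame({
    'Name': ['LeBron James', 'Kevin Durant'],
    'Age': [37, 33],
})

# Hypothetical output location inside a temporary directory
path = os.path.join(tempfile.mkdtemp(), 'players.csv')

# index=False leaves the row index out of the file
df.to_csv(path, index=False)

# Read the file back to confirm what was written
round_trip = pd.read_csv(path)
print(round_trip)
```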
-
What is the difference between a Series and a DataFrame in Pandas? In summary, a Series is a one-dimensional array-like object, while a DataFrame is a two-dimensional table-like object. Each column of a DataFrame is itself a Series, and all of its columns share the same row index.