Combining Masking and Indexing in Pandas - python

Consider the following data frame (the density column, computed from pop and area, is what the mask below relies on):
import pandas as pd

population_dict = {'California': 38332521,
                   'Texas': 26448193,
                   'New York': 19651127,
                   'Florida': 19552860,
                   'Illinois': 12882135}
pop = pd.Series(population_dict)
area_dict = {'California': 423967, 'Texas': 695662, 'New York': 141297,
             'Florida': 170312, 'Illinois': 149995}
area = pd.Series(area_dict)
data = pd.DataFrame({'area': area, 'pop': pop})
data['density'] = data['pop'] / data['area']
I can perform masking and indexing on columns in the same line as follows:
In [492]: data.loc[data.density > 100, ['pop', 'density']]
Out[492]:
pop density
New York 19651127 139.076746
Florida 19552860 114.806121
But what if I need to do this masking and indexing on rows? Something like:
data.loc[data.density > 100, ['New York']]
But this statement obviously gives an error, since the second position in .loc selects columns and 'New York' is a row label.

If you just want to extract information, chaining loc works just fine:
data[data.density > 100].loc[['New York']]
Output:
area pop density
New York 141297 19651127 139.076746

Try using:
data2 = data.loc[data.density > 100, ['pop', 'density']]
print(data2.loc[data2.index == 'New York'])
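For completeness, both conditions can also be combined in a single .loc call by intersecting the boolean mask with a test on the index; a minimal sketch, assuming the data and density setup from the question:
# Rows must satisfy the mask AND carry the label 'New York'
data.loc[(data.density > 100) & (data.index == 'New York'), ['pop', 'density']]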

Related

SettingWithCopyWarning when I try to add a new column to a DataFrame

Not entirely sure what the problem here is.
When I run the code below I get the following warning. Why is this the case, and how is it fixed? Thanks.
:18: SettingWithCopyWarning: A value
is trying to be set on a copy of a slice from a DataFrame. Try using
.loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
city['average'] = np.mean(city.transaction_value)
Here is the code.
import pandas as pd
import numpy as np

# Create DataFrame
city = ['Paris', 'Paris', 'Paris', 'London', 'London', 'London', 'New York', 'New York', 'New York']
transaction = [100, 90, 40, 100, 110, 40, 150, 200, 100]
df = pd.DataFrame(list(zip(city, transaction)), columns=['city', 'transaction_value'])

# Create new DataFrame to work with
transactions = df.loc[:, ['city', 'transaction_value']]
city_averages = pd.DataFrame()

for i in transactions['city'].unique():
    city = transactions[transactions['city'] == i]
    city['average'] = np.mean(city.transaction_value)
    city_averages = city_averages.append(city)
city_averages
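The warning fires because transactions[transactions['city'] == i] returns a slice that may be a view of the original frame, so assigning a new column to it is ambiguous. A minimal sketch of two common fixes, assuming the same data as above (note that .append is deprecated in recent pandas in favor of pd.concat):
import pandas as pd
import numpy as np

city = ['Paris', 'Paris', 'Paris', 'London', 'London', 'London', 'New York', 'New York', 'New York']
transaction = [100, 90, 40, 100, 110, 40, 150, 200, 100]
df = pd.DataFrame({'city': city, 'transaction_value': transaction})

# Option 1: take an explicit copy, so the assignment targets an independent frame
frames = []
for i in df['city'].unique():
    subset = df[df['city'] == i].copy()  # .copy() silences SettingWithCopyWarning
    subset['average'] = np.mean(subset.transaction_value)
    frames.append(subset)
city_averages = pd.concat(frames)

# Option 2: groupby/transform computes the per-city average in one line
df['average'] = df.groupby('city')['transaction_value'].transform('mean')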

Pandas DataFrame creation throws ValueError in loop

I have a nested dictionary (stats), which I'm trying to convert into a pandas DataFrame. When I run the code below, I get the desired result:
BAL_sp = pd.DataFrame(data = stats['sp']['Orioles'])
However, I need to do this 30 times and then concatenate the results. When I run a for loop, I get ValueError: DataFrame constructor not properly called! I don't understand it, since the key lookup in stats is valid inside the loop:
team_dict = {'LAA': 'Angels', 'ARI': 'Diamondbacks', 'BAL': 'Orioles', 'BOS': 'Red Sox', 'CHC': 'Cubs', 'CIN': 'Reds',
             'CLE': 'Indians', 'COL': 'Rockies', 'DET': 'Tigers', 'HOU': 'Astros', 'KC': 'Royals', 'LAD': 'Dodgers',
             'WSH': 'Nationals', 'NYM': 'Mets', 'OAK': 'Athletics', 'PIT': 'Pirates', 'SD': 'Padres', 'SEA': 'Mariners',
             'SF': 'Giants', 'STL': 'Cardinals', 'TB': 'Rays', 'TEX': 'Rangers', 'TOR': 'Blue Jays', 'MIN': 'Twins',
             'PHI': 'Phillies', 'ATL': 'Braves', 'CWS': 'White Sox', 'MIA': 'Marlins', 'NYY': 'Yankees', 'MIL': 'Brewers'}
frames = []
for team in team_dict.values():
    temp = pd.DataFrame(data=stats['sp'][team])
    frames.append(temp)
sp_df = pd.concat(frames)
It doesn't throw an error if I do data = [stats['sp'][team]], but that does not produce the desired result. Thank you for any help.
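For what it's worth, that ValueError is usually raised when the data argument is a plain string rather than a dict, list, or Series, which would suggest that for at least one team stats['sp'][team] is a string (wrapping it in a list is also why data = [stats['sp'][team]] stops the error). A diagnostic sketch, reusing the stats and team_dict from the question:
# Find any entry whose value is a bare string instead of tabular data
for team in team_dict.values():
    value = stats['sp'][team]
    if isinstance(value, str):
        print(team, 'has a string value:', repr(value))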

drop rows with multiple column values

I have a dataset where I have to drop rows matching any of several values in a column. I tried this, but do not know how to do it with multiple values:
import pandas as pd
df = pd.read_csv("data.csv")
new_df = df[df.location == 'New York']
new_df.count()
I also tried another method, but again do not know how to do it with multiple values:
import pandas as pd
df = pd.read_csv("data.csv")
df.drop(df[df['location'] == 'New York'].index, inplace=True)
I have to delete the rows with the values New York, Boston, and Austin, and keep all other locations.
Also, I have to replace the values of the location column:
if San Francisco then change the value to 1, if Miami then change it to 2, and so on, so that all values in location are replaced.
You can use the query method with a variable holding all the cities you want to filter out (inside query, prefix the variable with @ to reference it):
import numpy as np
import pandas as pd

np.random.seed(0)
cities = ['New York', 'Chicago', 'Miami']
data = pd.DataFrame(dict(cities=np.random.choice(cities, 10),
                         values=np.random.choice(10, 10)))
data.cities.unique()  # array(['New York', 'Chicago', 'Miami'], dtype=object)

to_drop = ['New York', 'Chicago']
data_filtered = data.query('cities not in @to_drop').copy()
data_filtered.cities.unique()  # array(['Miami'], dtype=object)
For the replacement values, you can set them manually:
data_filtered.loc[data_filtered.cities == 'Miami', ['values']] = 2
I don't quite follow what you mean by dropping rows with multiple columns, but to match multiple values you could use: new_df = df[df.location.isin(['New York', 'Boston'])] (a plain Python "in" test does not work on a pandas Series, so use isin).
You can try:
# Drop the rows with location "New York", "Boston", "Austin" (1)
df = df[~df["location"].isin(["New York", "Boston", "Austin"])]
# Replace locations with numbers: (2)
loc_map = {"San Francisco": 1, "Miami": 2, ...}
df["location"] = df["location"].map(loc_map)
For step (2), in case you have many values, you can create loc_map automatically:
loc_map = {city: i + 1 for i, city in enumerate(df.location.unique())}
Note that map returns NaN for any location missing from loc_map, so make sure the mapping covers every remaining value. Hope this helps.

pandas concat successful but error message following stops loop

I'm getting the error message
ValueError: No objects to concatenate
when using pd.concat. The concat appears to be successful, since the resulting dataframe prints correctly, but the error message still terminates the loop.
state_list = ['Colorado', 'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Michigan', 'Minnesota', 'Missouri', 'Nebraska',
              'North Carolina', 'North Dakota', 'Ohio', 'Pennsylvania', 'South Dakota', 'Tennessee', 'Texas', 'Wisconsin']
for state_name in state_list:
    ### df1 is a dataframe unique to each state ###
    condition_categories = df1['description'].unique()
    cats = []
    for cat in condition_categories:
        category_df = df1[['week', 'value']].where(df1['description'] == cat).dropna()
        category_df = category_df.set_index('week')
        category_df = category_df.rename(columns={'value': str(cat)})
        category_df.week = dtype = np.int
        cats.append(category_df)
        #print(category_df)
    df = pd.concat(cats, axis=1)
    print(df)
Sorry for the late response. It looks like there is an issue with the cats list: it is empty in one of the iterations of the outer for loop, and pd.concat raises ValueError: No objects to concatenate when given an empty sequence. You can add a condition just above the concatenation line, like below; it should work:
if len(cats) > 0:
    df = pd.concat(cats, axis=1)
else:
    print("No records to concatenate")

How to remove duplicate records from multiple lists in order in python [duplicate]

This question already has answers here:
How do I remove duplicates from a list, while preserving order?
I have three lists with four values in each list, and I have to remove the duplicate values from these three lists.
Here are three lists
country_list = ['USA', 'India', 'China', 'India']
city_list = ['New York', 'New Delhi', 'Beijing', 'New Delhi']
event_list = ['First Event', 'Second Event', 'Third Event', 'Second Event']
As shown, 'India', 'New Delhi', and 'Second Event' are repeated in all three lists, and they repeat together at the same position. I want to remove these repeating values and get a result like:
country_list = ['USA', 'India', 'China']
city_list = ['New York', 'New Delhi', 'Beijing']
event_list = ['First Event', 'Second Event', 'Third Event']
So how can I get this result? Is there any function for this?
One simple way is to do the following:
country_list = list(set(country_list))
city_list = list(set(city_list))
event_list = list(set(event_list))
Hope this helps.
Something like
country_list = list(set(country_list))
city_list = list(set(city_list))
event_list = list(set(event_list))
Ought to do it. This is because a set cannot have duplicates by definition. When you convert your list to a set, the duplicates are discarded. If you want the data to be in a form of a list once again you need to convert it back to a list as shown above. In most cases you can use the set exactly as you would use a list.
For example:
for item in set(country_list):
    print(item)
so the conversion back to a list may not be needed.
Just use set(). Have a look at this: Python Sets, and this: Sets.
For your lists you can do it like this:
>>> city_list = ['New York', 'New Delhi', 'Beijing', 'New Delhi']
>>> set(city_list)
set(['New Delhi', 'New York', 'Beijing'])
>>> list(set(city_list))
['New Delhi', 'New York', 'Beijing']
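One caveat: set() does not preserve order (the REPL output above shows 'New Delhi' moving to the front), and deduplicating the three lists independently could break their positional alignment. Since the lists are parallel records, a sketch that removes duplicate rows while preserving order, using dict.fromkeys (which keeps insertion order in Python 3.7+):
# Treat each (country, city, event) triple as one record and dedupe in order
rows = dict.fromkeys(zip(country_list, city_list, event_list))
country_list, city_list, event_list = (list(col) for col in zip(*rows))
print(country_list)  # ['USA', 'India', 'China']
print(city_list)     # ['New York', 'New Delhi', 'Beijing']
print(event_list)    # ['First Event', 'Second Event', 'Third Event']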
