Debug: Dataframe Column Referencing and Indexing [duplicate] - python

This question already has answers here:
Deleting multiple columns based on column names in Pandas
(11 answers)
Closed 4 years ago.
I can't figure this bug out. I think I am misunderstanding how a dataframe works and how to index through one. I may also be misunderstanding the for loop. (I am used to MATLAB for loops... iterations are, intuitively, way easier :D)
Here is the error:
KeyError: "['United States' 'Canada' 'Mexico'] not found in axis"
This happens at the line: as_df=as_df.drop(as_df[column])
But this makes no sense... I am calling an individual column not the entire set of dummy variables.
The following code can be copied and run. I made sure of it.
MY CODE:
import pandas as pd
import numpy as np
df=pd.DataFrame({"country": ['United States','Canada','Mexico'], "price": [23,32,21], "points": [3,4,4.5]})
df=df[['country','price','points']]
df2=df[['country']]
features=df2.columns
print(features)
target='points'
#------_-__-___---____________________
as_df=pd.concat([df[features],df[target]],axis=1)
#Now for Column Check
for column in as_df[features]:
    col=as_df[[column]]
    #Categorical Data Conversion
    #This will split the countries into their own column with 1 being when it
    #is true and 0 being when it is false
    col.select_dtypes(include='object')
    dummies=pd.get_dummies(col)
    #ML Check:
    dumcols=dummies.drop(dummies.columns[1],axis=1)
    if dumcols.shape[1] > 1:
        print(column)
        as_df=as_df.drop(as_df[column])
    else:
        dummydf=col
        as_df=pd.concat([as_df,dummydf],axis=1)
as_df.head()

I would comment instead of answering, but I do not have enough reputation to do so. (I need clarification to help you and Stack Exchange does not provide me with a way to do so "properly".)
I'm not entirely sure what your end-goal is. Could you clarify what your end result for as_df would look like? Including after the for loop ends, and after the entire code is finished running?

Found my mistake.
as_df=as_df.drop(as_df[column])
should be
as_df=as_df.drop(column,axis=1)
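Why the original line fails, as far as I can tell: drop() treats its first argument as labels, and with the default axis=0 it looks for them in the row index. as_df[column] is the column's values, so pandas searches the row index for 'United States', 'Canada' and 'Mexico' and finds none of them. A minimal sketch on a cut-down frame (the names small, raised_key_error and without_country are made up for illustration):

```python
import pandas as pd

small = pd.DataFrame({"country": ["United States", "Canada", "Mexico"],
                      "points": [3, 4, 4.5]})

# Passing the column's values: pandas looks for these strings in the row index.
try:
    small.drop(small["country"])
    raised_key_error = False
except KeyError:
    raised_key_error = True

# Passing the column's name with axis=1 removes the column itself.
without_country = small.drop("country", axis=1)

print(raised_key_error)
print(list(without_country.columns))
```

small.drop(columns="country") should be an equivalent spelling of the fixed call, if you find it easier to read.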

Related

Get the indices of the values which are greater than 0 in the column of a dataframe [duplicate]

This question already has answers here:
Python Pandas: Get index of rows where column matches certain value
(8 answers)
Closed 7 months ago.
I am having trouble finding the right solution for this. I checked many questions but haven't found this one yet. Can someone please help me?
I want to go through a column of the dataframe and check whether each value in the column is greater than 0. If it is, I want to get its index.
This is what I have tried so far:
This should do the trick:
ans = df.index[df['Column_name']>0].tolist()
ans will be the list of the indexes of the values that are greater than 0 in the column "Column_name".
If you have any questions feel free to ask me in the comments and if my comment helped you please consider marking it as the answer :)
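A small sketch of how that one-liner behaves, assuming a frame with a non-default index so that it is clear the result holds index labels rather than positions (the data and the name example are invented for illustration):

```python
import pandas as pd

example = pd.DataFrame({"Column_name": [-1, 2, 0, 5]},
                       index=[10, 11, 12, 13])

# Boolean mask: True where the value is greater than 0.
mask = example["Column_name"] > 0

# Indexing the Index object with the mask keeps only the matching labels.
ans = example.index[mask].tolist()
print(ans)  # [11, 13]
```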

Difference between using pandas drop_duplicates or value_counts on the whole frame or on one column

I am a new Python user, mainly just to finish my homework, but I am willing to dig deeper when I run into questions.
The problem comes from my professor's sample code for data cleaning. He uses drop_duplicates() and value_counts() to check the unique values of a frame. Here is his code:
spyq['sym_root'].value_counts() #method1
spyq['date'].drop_duplicates() #method2
Here is the output:
SPY 7762857 #method1
0 20100506 #method2
Here is spyq.shape, to help you understand the spyq dataframe:
spyq.shape #output is (7762857, 9)
spyq is a dataframe that contains the trading history for spy500 on a single day, 05/06/2010.
After seeing this, I wondered why he specifies a column ('date' or 'sym_root') instead of just calling spyq.drop_duplicates() or spyq.value_counts() on the whole frame, so I tried:
spyq.value_counts()
spyq.drop_duplicates()
Both outputs are (6993487, 9).
The number of rows has decreased!
But judging from the professor's sample code, no duplicated rows exist, because the number in method 1's output is exactly the same as the row count from spyq.shape!
I am confused about why the output of spyq.drop_duplicates() on the whole dataframe is not the same as spyq['column'].drop_duplicates() when there are no repeated values.
I tried to use
spyq.loc[spyq.drop_duplicates()]
to see what was dropped, but it raises an error.
Can anyone kindly help me? I know my question is kind of stupid, but I want to figure it out, and I want to learn Python from the fundamentals rather than just learn enough code to finish my homework.
Thanks!
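A toy frame may make the difference concrete. Column-level calls look at one column only, while drop_duplicates() on the whole frame removes rows that are identical in every column. The frame toy and its values below are invented for illustration and are not the spyq data. duplicated() is used as a boolean mask to list the rows that would be dropped, which is what the .loc attempt above seems to be after:

```python
import pandas as pd

toy = pd.DataFrame({
    "sym_root": ["SPY", "SPY", "SPY", "SPY"],
    "date": [20100506, 20100506, 20100506, 20100506],
    "price": [1.0, 1.0, 2.0, 3.0],
})

# One column: counts how often each value occurs in that column only.
sym_counts = toy["sym_root"].value_counts()

# One column: keeps the first occurrence of each distinct value in that column.
unique_dates = toy["date"].drop_duplicates()

# Whole frame: drops rows that repeat an earlier row in every column.
deduped = toy.drop_duplicates()

# Boolean mask of the rows that drop_duplicates() would remove.
dropped = toy[toy.duplicated()]

print(sym_counts)
print(unique_dates)
print(deduped.shape)
print(dropped)
```

In this sketch the count for SPY equals the total number of rows because every row has the same symbol, so that number by itself says nothing about whether whole rows are repeated.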

Showing all rows and columns of Pandas dataframe [duplicate]

This question already has answers here:
Pandas: Setting no. of max rows
(10 answers)
Closed 1 year ago.
I am working with Python 3 and the pandas package in Visual Studio Code, and the print() function is not displaying my dataframe correctly.
For example, when I use df.head() it looks good.
But if I use the print() statement, I no longer see all of the columns next to each other; some of them get wrapped onto the lines below for some reason, and I can't see the entire data.
Does anyone know what I can do to see the entire data, with all of the columns next to each other?
The problem comes from the pandas library, which truncates the printed dataframe when it is too long. Before your print, add this line:
pandas.set_option('display.max_rows', None)
to display every row.
Also, you will be able to see all your data by passing None to head():
trading.head(None)
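Since the question also asks about columns, here is a hedged sketch that lifts the row and column limits together and widens the line width so columns are less likely to wrap. The frame wide and the width value 1000 are assumptions chosen for illustration. pd.option_context keeps the change local to the with block:

```python
import pandas as pd

# Hypothetical frame that is both long and wide.
wide = pd.DataFrame({f"col_{i}": range(100) for i in range(30)})

# Default settings: pandas shortens the output.
truncated_text = repr(wide)

# Temporarily remove the row/column limits and allow long lines.
with pd.option_context("display.max_rows", None,
                       "display.max_columns", None,
                       "display.width", 1000):
    full_text = repr(wide)
    print(wide)
```

To change the settings for the whole session instead, the same option names can be passed to pd.set_option one at a time.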

Pandas gives me a SettingWithCopyWarning [duplicate]

This question already has answers here:
How to deal with SettingWithCopyWarning in Pandas
(20 answers)
Closed 2 years ago.
I am trying to create two new columns in my dataframe depending on the values of the columns Subscribers, External Party and Direction. If the Direction is I for Incoming, column a should become External Party and col B should become Subscriber. If the Direction is O for Outgoing, it should be the other way around. I use the code:
import pandas as pd
import numpy as np
...
df['a'] = np.where((df.Direction == 'I'), df['External Party'], df['Subscriber'])
df['b'] = np.where((df.Direction == 'O'), df['External Party'], df['Subscriber'])
I get a SettingWithCopyWarning from Pandas, but the code does what it needs to do. How can I improve this operation to avoid the error?
Thanks in advance!
Jo
Inspect the place in your code where df is created.
Most probably, it is a view of another DataFrame, something like:
df = df_src[...]
Then any attempt to save something in df causes just this warning.
To avoid it, create df as a truly independent DataFrame, with its
own data buffer. Something like:
df = df_src[...].copy()
Now df has its own data buffer, and can be modified without the
above warning.
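A minimal sketch of that pattern, assuming df is produced by filtering a larger frame. The frame df_src, its values, and the Duration column are hypothetical; only Direction, External Party and Subscriber come from the question:

```python
import numpy as np
import pandas as pd

df_src = pd.DataFrame({
    "Direction": ["I", "O", "I"],
    "External Party": ["x1", "x2", "x3"],
    "Subscriber": ["s1", "s2", "s3"],
    "Duration": [10, 5, 0],  # hypothetical column used only for filtering
})

# .copy() gives df its own data, independent of df_src.
df = df_src[df_src["Duration"] > 0].copy()

df["a"] = np.where(df["Direction"] == "I", df["External Party"], df["Subscriber"])
df["b"] = np.where(df["Direction"] == "O", df["External Party"], df["Subscriber"])

print(df[["Direction", "a", "b"]])
```

The new columns end up on df only; df_src is left as it was.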
If you are planning to work with the same df later on in your code, it is sometimes useful to create a deep copy of the df before making any changes.
The pandas native copy method does not always act as one would expect - here is a similar question that might give more insights.
You can also use the copy module that comes with Python to copy the entire object, so that there are no links between the two dataframes. Note that for a DataFrame, copy.deepcopy(df) goes through df.copy(deep=True), which is already the default behaviour of df.copy().
import copy
df_copy = copy.deepcopy(df)

Pandas map to a new column, SettingWithCopyWarning [duplicate]

This question already has an answer here:
df.loc causes a SettingWithCopyWarning warning message
(1 answer)
Closed 6 years ago.
In pandas data frame, I'm trying to map df['old_column'], apply user defined function f for each row and create a new column.
df['new_column'] = df['old_column'].map(lambda x: f(x))
This will give out "SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame." error.
I tried the following:
df.loc[:, 'new_column'] = df['old_column'].map(lambda x: f(x))
which doesn't help. What can I do?
A SettingWithCopy warning is raised for certain operations in pandas which may not have the expected result because they may be acting on copies rather than the original datasets. Unfortunately there is no easy way for pandas itself to tell whether or not a particular call will or won't do this, so this warning tends to be raised in many, many cases where (from my perspective as a user) nothing is actually amiss.
Both of your method calls are fine. If you want to get rid of the warning entirely, you can specify:
pd.options.mode.chained_assignment = None
See this StackOverflow Q&A for more information on this.
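If you would rather not silence the warning globally, one alternative (a sketch, not taken from the answer above) is to build the new column with assign(), which returns a new DataFrame instead of writing into a possible slice. The function f, the frame source, and the keep column are placeholders invented for illustration:

```python
import pandas as pd

def f(x):
    # Placeholder for the user-defined function from the question.
    return x * 2

source = pd.DataFrame({"old_column": [1, 2, 3, 4],
                       "keep": [True, False, True, True]})

# A filtered frame like this is the typical trigger for the warning.
subset = source[source["keep"]]

# assign() leaves subset untouched and returns a new frame with the extra column.
# map(f) is enough here; wrapping f in a lambda is not needed.
result = subset.assign(new_column=subset["old_column"].map(f))

print(result)
```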
