How can I debug this AttributeError in Python?

I'm trying to follow a regression tutorial for Python, since the statsmodels package does not seem to be working for me. I got this far before hitting an AttributeError.
Input:
import pandas as pd
data = pd.read_csv("China_FDIGDP.csv")
data1 = data.dropna()
data1.to_csv("data1.csv", index = False)
Data = pd.read_csv("data1.csv")
print(Data)
x = pd.Data["GDP"].values()
y = pd.Data["FDI_net_in"].values()
Here's the output:
Traceback (most recent call last):
File "FDI.py", line 20, in <module>
x = pd.Data["GDP"].values()
AttributeError: module 'pandas' has no attribute 'Data'
What am I doing wrong?
For reference, here is what print(Data) and Data.columns show:
Date FDI_net_in GDP
0 1982 4.300000e+08 2.050897e+11
1 1983 6.360000e+08 2.306867e+11
2 1984 1.258000e+09 2.599465e+11
3 1985 1.659000e+09 3.094880e+11
4 1986 1.875000e+09 3.007581e+11
Index(['Date', 'FDI_net_in', 'GDP '], dtype='object')

The error comes from these lines
x = pd.Data["GDP"].values()
y = pd.Data["FDI_net_in"].values()
You read the DataFrame into the variable Data with Data = pd.read_csv("data1.csv"), so you should access the GDP column through that variable, not through the pandas module. Note also that .values is an attribute, not a method, so drop the parentheses:
x = Data["GDP"].values
y = Data["FDI_net_in"].values

Try this. The column index you printed shows 'GDP ' with a trailing space, so strip the whitespace from the column names first:
Data.columns = Data.columns.str.strip()  # remove leading/trailing whitespace from column names
x = Data["GDP"].values
y = Data["FDI_net_in"].values

Also, rename your script if it is called pandas.py (or if you have a local pd.py), because a file with that name shadows the pandas library and produces exactly this kind of AttributeError.
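One quick way to check whether a local file is shadowing the real library (a minimal sketch):
import pandas as pd
print(pd.__file__)  # should point into site-packages, not into your project folder
If the printed path is your own pandas.py, rename that file and remove any stale bytecode (__pycache__) next to it.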

Related

I am trying to run an analysis in Jupyter, but when I run the code below I get a NameError, even though I defined df at the beginning.

df = pd.read_csv('dowjones.csv', index_col=0);
df['rm'] = 100 * (np.log(df.DJIA) - np.log(df.DJIA.shift(1)))
df.head()
I defined df here, in the first cell above.
df = df.dropna()
formula = 'MSFTtrans ~ rm'
results2 = smf.ols(formula, df).fit(cov_type = 'HAC', cov_kwds={'maxlags':10,'use_correction':True})
print(results2.summary())
Then I ran the second cell above and got:
NameError Traceback (most recent call last)
<ipython-input-3-b46efd5c722d> in <module>
2
3
----> 4 df = df.dropna()
5 formula = 'MSFTtrans ~ rm'
6 results2 = smf.ols(formula, df).fit(cov_type = 'HAC', cov_kwds={'maxlags':10,'use_correction':True})
NameError: name 'df' is not defined
This is the error I got saying df is not defined.
The trailing semicolon after pd.read_csv(...) is unnecessary, but in Python it is harmless (in a notebook it only suppresses the cell's output), so it is not the cause of the error.
The real issue is execution order: the cell that defines df was never run in the current kernel session, so when you run the second cell, df does not exist and you get the NameError. Run the first cell, then the second.
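For reference, here is everything as a single cell, so df is guaranteed to be defined before it is used (a sketch assuming dowjones.csv has DJIA and MSFTtrans columns, as in your code):
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv('dowjones.csv', index_col=0)
df['rm'] = 100 * (np.log(df.DJIA) - np.log(df.DJIA.shift(1)))  # log market return in percent
df = df.dropna()

formula = 'MSFTtrans ~ rm'
results2 = smf.ols(formula, df).fit(cov_type='HAC', cov_kwds={'maxlags': 10, 'use_correction': True})
print(results2.summary())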

Why am I getting "NameError: name 'df2' is not defined"?

Hello, I split a large 24-hour DataFrame by hour, because each hour contains more than 500 rows across three columns, and I want to merge the hourly data side by side. All of the pieces come from the same DataFrame; I just separated them. Here is my code:
...
grouped = dataframeone.groupby(dataframeone.Hour)
df_list = [df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,df15,df16,df17,df18,df19,df20,df21,df22,df23,df24,df25]
df_list[0] = grouped.get_group(0)
df_list[0] = df_list[0].drop(columns=['Hour'])
for j in range(1, 24):
    df_list[j] = grouped.get_group(j)
    df_list[j] = df_list[j].drop(columns=['Hour'])
    df = pd.concat([df_list[j-1], df_list[j]], axis=1)
print(df)
But I am getting this error:
NameError Traceback (most recent call last)
<ipython-input-47-ff832fd92e63> in <module>
24 grouped = dataframeone.groupby(dataframeone.Hour)
---> 25 df_list = [df1,df2,df3,df4,df5,df6,df7,df8,df9,df10,df11,df12,df13,df14,df15,df16,df17,df18,df19,df20,df21,df22,df23,df24,df25]
26 df_list[0] = grouped.get_group(0)
27 df_list[0] = df_list[0].drop(columns=['Hour'])
NameError: name 'df2' is not defined
I can't work out where the error comes from or why. How can I fix it? Should I use a different approach?
You get this error because df1 through df25 are never created anywhere; putting their names in a list does not define them. Build the hourly frames in a list instead, then concatenate them all side by side:
...
grouped = dataframeone.groupby(dataframeone.Hour)
df_list = []
for j in range(24):
    hour_df = grouped.get_group(j).drop(columns=['Hour'])  # one frame per hour, without the Hour column
    df_list.append(hour_df)
df = pd.concat(df_list, axis=1)  # all 24 hours side by side
print(df)

I am trying to merge two Excel sheets on one common column, but I am getting an AttributeError

import pandas as pd
#file1 = "mapp1.xlsx"
#file2 = "mapp2.xlsx"
file1 = "C:\\Users\\Sudharshan\\Desktop\\file_map\\mapp1.xlsx"
file2 = "C:\\Users\\Sudharshan\\Desktop\\file_map\\mapp2.xlsx"
df1 = pd.read_excel(file1)
df2 = pd.read_excel(file2)
marge = pd.marge(df1, df2, on="Name")
print(marge)
Please help; I have tried Jupyter Notebook, PyCharm, and Google Colab, and I get this error every time I execute the code (it says marge is not an attribute of pandas):
AttributeError
Traceback (most recent call last) in
10 df2 = pd.read_excel(file2)
11
---> 12 marge = pd.marge(df1, df2, on="Name")
13
14 print(marge)
~\anaconda3\lib\site-packages\pandas\__init__.py in __getattr__(name)
256 return _SparseArray
257
--> 258 raise AttributeError(f"module 'pandas' has no attribute '{name}'")
259
260
AttributeError: module 'pandas' has no attribute 'marge'
There is a typo in your code: it should be pd.merge, not pd.marge.
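With that one-letter fix your original call works; a minimal sketch, assuming both sheets have a Name column as in your code:
import pandas as pd

df1 = pd.read_excel("mapp1.xlsx")
df2 = pd.read_excel("mapp2.xlsx")
merged = pd.merge(df1, df2, on="Name")  # inner join on the shared Name column
print(merged)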
Also, for combining two Excel sheets column-wise, try this.
Suppose your first Excel file has four columns named 'Locations', 'Employees', 'Revenue', and 'Profit', and your second Excel file has all of these columns plus some extra ones, and you want to combine these four columns from both files into output.xlsx.
You can do this:
import pandas as pd
excel1 = 'workbook1.xlsx'
excel2 = 'workbook2.xlsx'
df1 = pd.read_excel(excel1)
df2 = pd.read_excel(excel2)
values1 = df1[['Locations','Employees','Revenue','Profit']]  # select the columns with a list, i.e. double brackets
values2 = df2[['Locations','Employees','Revenue','Profit']]
dataframes = [values1, values2]
join = pd.concat(dataframes)
join.to_excel("output.xlsx")

How do I change the dtype of a pandas series?

I am working with a data set from the Museum of Modern Art and want to convert some of the Series to integers (for calculations later on). I have tried to convert the dtype with the .astype method, but without success. I saw somewhere that you can do the conversion in the same line as the read_csv call, so I attempted that too, also unsuccessfully.
import pandas as pd
df = pd.read_csv('artworks.csv', dtype ={'BeginDate': int})
df.head()
df.dtypes
TypeError Traceback (most recent call last)
pandas_libs\parsers.pyx in pandas._libs.parsers.TextReader._convert_tokens()
ValueError: invalid literal for int() with base 10: '(1947)'
Ultimately, my goal is to convert the BeginDate and EndDate columns (which are object dtype) to integers. So as an alternative I tried to write a function that removes the parentheses from the dates and converts them to integers. This is it below:
def date_cleaner(date):
    if date != "":
        date = date.replace("(", "")
        date = date.replace(")", "")
        date = int(date)
    return date

date_cleaner(1999)
But this also raised an error when I ran it with 1999. When I pass '1999' (a string) instead, the function works as it should. The problem is that when I use the function on the pandas Series (during an iteration, for example) I get the error below:
for i, row in df.iterrows():
    birth_date = row[3]
    death_date = row[4]
    birth_date = date_cleaner(birth_date)
    death_date = date_cleaner(death_date)
    row[3] = birth_date
    row[4] = death_date
df.head()
AttributeError Traceback (most recent call last)
<ipython-input-54-dbecb2797a53> in <module>
3 death_date = row[4]
4
----> 5 birth_date = date_cleaner(birth_date)
6 death_date = date_cleaner(death_date)
7
<ipython-input-51-3ddccbf04d24> in date_cleaner(date)
6 if date != "":
7
----> 8 date = date.replace("(", "")
9 date = date.replace(")", "")
10 date = int(date)
AttributeError: 'int' object has no attribute 'replace'
What am I doing wrong, and how can I actually clean the columns and convert the dtype?
P.S. I have looked into using regular expressions, but I am new to Python and it seems quite technical.
You can use the str.strip method and then convert with astype(int):
df['BeginDate'] = df['BeginDate'].astype(str).str.strip('()').astype(int)
Actually, a full example might make it clearer:
In [10]: df = pd.DataFrame( data = [ {'BeginDate' : '(1948)' } ] )
In [11]: df
Out[11]:
BeginDate
0 (1948)
In [12]: df['BeginDate'] = df['BeginDate'].astype(str).str.strip('()').astype(int)
In [13]: df
Out[13]:
BeginDate
0 1948
Edit: to answer your follow-up question about keeping null values intact:
In [43]: def clean_year(begin_date):
...: if not pd.isnull(begin_date):
...: return int(str(begin_date).strip('()'))
...: return begin_date
...:
In [44]: df['BeginDate'] .apply(clean_year)
Out[44]:
0 1948.0
1 NaN
Name: BeginDate, dtype: float64
But keep in mind that this makes the column's dtype float, because NaN cannot be stored in a plain integer column.
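If you want integers and still keep the missing values, one option (a sketch, assuming pandas 0.24 or newer) is the nullable Int64 dtype:
s = df['BeginDate'].astype(str).str.strip('()')
df['BeginDate'] = pd.to_numeric(s, errors='coerce').astype('Int64')  # NaN becomes <NA>, valid years stay integers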

Why do I get a TypeError when I try to create a data frame?

I am writing code to analyze some data and want to create a DataFrame. How do I set it up so it runs successfully?
This is for data analysis, and I would like a DataFrame that I can filter into different grades, such as 'A'.
Here is the code I wrote:
import analyze_lc_Feb2update
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
df.shape
df_new = df[df.grade=='A']
df_new.shape
df.columns
df.int_rate.head(5)
df.int_rate.tail(5)
df.int_rate.dtype
df.term.dtype
df_new = df[df.grade =='A']
df_new.shape
output:
TypeError Traceback (most recent call last)
<ipython-input-3-7079435f776f> in <module>()
2 from imp import reload
3 analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
4 df = analyze_lc_Feb2update.create_df()
5 df.shape
6 df_new = df[df.grade=='A']
TypeError: create_df() missing 1 required positional argument: 'grade'
Based on what was provided I guess your problem is here:
from imp import reload
analyze_lc_Feb2update = reload(analyze_lc_Feb2update)
df = analyze_lc_Feb2update.create_df()
This looks like a custom module you are importing; its create_df() function requires the positional argument grade, so you need to call it with something like:
df = analyze_lc_Feb2update.create_df(grade="blah")
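If you are unsure what arguments create_df expects, you can check its signature directly (a quick sketch using the standard library's inspect module):
import inspect
print(inspect.signature(analyze_lc_Feb2update.create_df))  # shows the required parameters, e.g. (grade, ...)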
