Syntax error while trying to build dataframe in Python

I am trying to create a function in Python 3 that builds a dataframe from a CSV file. However, I keep getting a syntax error that points at
y = (data_df["Status"].replace("underperform",0).replace("outperform",1).values.tolist())
even though that line never runs, because I never actually call the function. Here is all of my code.
def Build_Data_Set(features = ["DE Ratio","Trailing P/E"]):
    data_df = pd.read_csv("key_stats.csv") #created in other file
    X = np.array(data_df[features].values#.tolist())
    y = (data_df["Status"].replace("underperform",0).replace("outperform",1).values.tolist())
    return X,y
How should I go about fixing this error?

You're missing a closing parenthesis on the line X = np.array(data_df[features].values#.tolist()) - the parenthesis is there, but it's commented out by the # sign.
Your Python interpreter does not know that you wanted the expression to end there and keeps searching for a closing parenthesis. Before it finds one, it stumbles over the assignment on the next line, which is illegal inside an open parenthesis and causes your syntax error. That is also why the error is reported on a line that "never runs": syntax errors are raised at parse time, before any code executes.
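The effect can be reproduced in miniature by feeding the same shape of mistake to compile(), so the parse failure is observable without touching the original file (the variable names here are placeholders, not the question's data):

```python
# A miniature of the same mistake: the "#" comments out the closing
# parenthesis, so the "(" on the first line is never closed and the
# parser runs on into the next line looking for it.
src = (
    "X = np.array(data#.tolist())\n"
    "y = 1\n"
)
try:
    compile(src, "<example>", "exec")
    raised = False
except SyntaxError:
    raised = True
print(raised)  # True: the two lines fail to parse as a unit
```

Note that compile() raises before anything is executed, which matches the behavior described in the question.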

You can simply do:
def Build_Data_Set(features=["DE Ratio", "Trailing P/E"]):
    data_df = pd.read_csv("key_stats.csv")  # created in other file
    X = data_df[features].values  # .values already returns an array
    y = data_df["Status"].replace({"underperform": 0, "outperform": 1}).values
    return X, y

X, y = Build_Data_Set()
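The dict form of replace used above can be seen in isolation with toy data standing in for key_stats.csv (hypothetical values, just to show the mapping):

```python
import pandas as pd

# Toy frame standing in for key_stats.csv
data_df = pd.DataFrame({"Status": ["underperform", "outperform", "outperform"]})

# One dict-based replace instead of two chained .replace() calls
y = data_df["Status"].replace({"underperform": 0, "outperform": 1}).values
print(list(y))  # [0, 1, 1]
```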

Related

How do I solve "SyntaxError: invalid character in identifier" with regards to df.iloc

df = pd.read_csv("foo.csv")
# drop NaN
df = df.dropna()
df.iloc[0,0] = 1
# drop time
for i in range(3):
    df.iloc[0,0] = 2
    df.iloc[:,i] = datetime.strptime(df.iloc[:,i], "%Y-%m-%dT%H:%M:%S%z")
I get this error:
File "accuracy.py", line 19
    df.iloc[0,0] = 2
    ^
SyntaxError: invalid character in identifier
I want to modify every value in the dataframe, so that I can work with the dates in there. However, when I use df.iloc in the for-loop, I get a similar SyntaxError to this one. I tried it again with simpler code, hence the df.iloc[0,0] = 2, but it gives the same error I would get from the other line. I also tried the code outside of the loop, and that does not raise an error, so I'm confused.
I'm working from my terminal since I can't download the csv I'm working on, which makes debugging a little more interesting. Mostly because I cannot print things if the file does not compile.
Any idea what the problem could be?
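"invalid character in identifier" is usually caused by a non-ASCII look-alike character (a non-breaking space, smart quote, or full-width symbol) pasted into the source. Since the file only fails at compile time, a hedged way to hunt for the offender is to scan the source for any character outside the ASCII range (the file name accuracy.py comes from the traceback; substitute your own):

```python
import os

def find_non_ascii(path):
    # Report (line, column, character, codepoint) for every non-ASCII
    # character in the file - these are the usual suspects behind
    # "invalid character in identifier".
    hits = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            for col, ch in enumerate(line, start=1):
                if ord(ch) > 127:
                    hits.append((lineno, col, ch, hex(ord(ch))))
    return hits

if os.path.exists("accuracy.py"):
    for hit in find_non_ascii("accuracy.py"):
        print(hit)
```

Retyping the flagged line by hand (rather than re-pasting it) usually clears the error.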

Writing a function that will normalize using either min max method or z-score method

I am fairly new to Python, so there may be a lot to improve upon, but in the following code I am trying to write a function that takes in the location of the data file, the attribute that has to be normalized, and the type of normalization to be performed ('min_max' or 'z_score').
After this, based on the normalization type that is given, I want it to apply the appropriate formula and return a dictionary where key = original value in the dataset and value = normalized value.
def normalization(fname, attr, normType):
    result = {}
    df = pd.read_csv(fname)
    targ = list(df[df.columns[attr]])
    scaler = MinMaxScaler()
    df["minmax"] = scaler.fit.transform(df[[df.columns[attr]]])
    df["zscore”] = ((df[[df.columns[attr]]]) - (df[[df.columns[attr.mean()]]]))/ (df[[df.columns[attr.std(ddof=1)]]])
    if normType == "min_max":
        result = dict(zip(targ, df.minmax.values.tolist())
    else:
        result = dict(zip(targ, df.zscore.values.tolist())
    return result
I continually get an error specifically on the line with the zscore calculation and have been struggling to troubleshoot it. I would appreciate any help that could point me in the right direction.
Thanks
Edit: Error message shown is
"SyntaxError: EOL while scanning string literal"
"zscore” alone causes that error. The problem is that the ” is a typographic (curly) quote, not a proper ASCII double-quote character, so the string literal is never terminated. Not sure how it got there; most likely bad formatting from a word processor or document while pasting code around. The fix: "zscore". (There are more bugs waiting once that one is fixed - scaler.fit.transform should be scaler.fit_transform, and both dict(zip(...)) lines are missing a closing parenthesis - but the EOL error comes from the quote.)
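For reference, here is a minimal corrected sketch of the whole function, assuming (as the question's code suggests) that attr is the column's integer position. It uses the plain formulas in place of MinMaxScaler, which gives the same result for min-max scaling, and only computes the normalization that was requested:

```python
import pandas as pd

def normalization(fname, attr, normType):
    # attr is assumed to be the column's integer position, as in the question
    df = pd.read_csv(fname)
    s = df[df.columns[attr]]
    if normType == "min_max":
        norm = (s - s.min()) / (s.max() - s.min())  # same result as MinMaxScaler
    else:  # "z_score"
        norm = (s - s.mean()) / s.std(ddof=1)       # sample standard deviation
    # key = original value, value = normalized value
    return dict(zip(s, norm))
```

Note that using the original values as dictionary keys silently merges duplicates; that matches the question's spec, but is worth knowing.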

How to generate the Python lambda filter codes in the for loop?

I am a beginner with Python lambdas, and I am trying to convert a Python for loop into a lambda expression. First I would like to explain the for-loop lines.
fred = Fred2Hdfs()  # construct the python imported objects
for i, state in enumerate(us_states):
    df_unemployee_annual = fred.getFredDF('A', state, 'search_text')  # generate dataframe from the object
    if df_unemployee_annual is None:
        continue
    if i == 0:
        fred.writeCsv2Hdfs('unemployee_annual.csv', df_unemployee_annual)  # write dataframe
    else:
        fred.appendCsv2Hdfs('unemployee_annual.csv', df_unemployee_annual)  # append dataframe
The above code works successfully without errors. Below is the lambda-based code I am trying to convert it to.
fred = Fred2Hdfs()
freq='A'
str='search_text'
result_df_list = list(map(lambda state: fred.getFredDF(freq, state, str), us_states))
result_df_list = list(filter(lambda df: df is not None, result_df_list))
print(result_df_list) # codes work correctly until this line.
#func=map(lambda df:fred.writeCsv2Hdfs('unemployee_annual_.csv', df) , result_df_list)
I am stuck on the if i == 0: line of the for loop. How can I express that condition as a Python lambda? I'm afraid I have no idea how to implement this kind of if filter with lambdas.
Tuple unpacking in a lambda's parameter list (lambda (i, df): ...) was removed in Python 3, so index into the tuple instead. Also, map is lazy in Python 3: wrap it in list(...) or the calls never execute.
list(map(lambda t: fred.writeCsv2Hdfs('unemployee_annual_.csv', t[1]) if t[0] == 0
         else fred.appendCsv2Hdfs('unemployee_annual_.csv', t[1]),
         enumerate(result_df_list)))
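The same pattern in a self-contained sketch, with stub write/append functions standing in for the Fred2Hdfs methods (the names and "dataframes" here are placeholders):

```python
calls = []  # record of what would be written to HDFS

def write_csv(name, df):   # stand-in for fred.writeCsv2Hdfs
    calls.append(("write", name, df))

def append_csv(name, df):  # stand-in for fred.appendCsv2Hdfs
    calls.append(("append", name, df))

result_df_list = ["df_AK", "df_AL", "df_AR"]  # placeholder dataframes

# First element is written, the rest are appended; list() forces the lazy map.
list(map(lambda t: write_csv("unemployee_annual_.csv", t[1]) if t[0] == 0
         else append_csv("unemployee_annual_.csv", t[1]),
         enumerate(result_df_list)))

print([c[0] for c in calls])  # ['write', 'append', 'append']
```

That said, using map purely for side effects is generally considered unidiomatic; the original for loop is arguably the clearer solution here.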

rpy2 code snippet returns an empty object

I am using rpy2 to use a R library in Python. The library has a function prebas() that returns an array in which the item with index [8] is the output. When I write this output to CSV within the R code snippet, everything works as expected (the output is a CSV over 200kB). However, when I return the same object (PREBASout[8]), it returns an empty object. So, obviously, when I write that object to CSV, the file is empty.
run_prebasso = robjects.r('''
weather <- read.csv("/home/example_inputs/weather.csv",header = T)
PAR = c(weather$PAR,weather$PAR,weather$PAR)
TAir = c(weather$TAir,weather$TAir,weather$TAir)
Precip = c(weather$Precip,weather$Precip,weather$Precip)
VPD = c(weather$VPD,weather$VPD,weather$VPD)
CO2 = c(weather$CO2,weather$CO2,weather$CO2)
DOY = c(weather$DOY,weather$DOY,weather$DOY)
library(Rprebasso)
PREBASout = prebas(nYears = 100, PAR=PAR,TAir=TAir,VPD=VPD,Precip=Precip,CO2=CO2)
write.csv(PREBASout[8],"/home/outputs/written_in_r.csv",row.names = F)
PREBASout[8]
''')
r_write_csv = robjects.r['write.csv']
r_write_csv(run_prebasso, "/home/outputs/written_in_py.csv")
This is what the code snippet returns:
(Pdb) run_prebasso
<rpy2.rinterface.NULLType object at 0x7fc1b31e6b48> [RTYPES.NILSXP]
Question: Why aren't written_in_py.csv and written_in_r.csv the same?
I have just found the bug. The problem was in the line
write.csv(PREBASout[8],"/home/outputs/written_in_r.csv",row.names = F)
The value of this statement (write.csv returns NULL) was what ended up being returned, instead of what I wanted (PREBASout[8]). When I removed the line or assigned its result to a variable, everything worked as expected.

Having an issue with using median function in numpy

I am having an issue with using the median function in numpy. The code used to work on a previous computer, but when I tried to run it on my new machine, I got the error "cannot perform reduce with flexible type". To try to fix this, I used the map() function to make sure my list was floating point, and got this error message: could not convert string to float: .
After some more attempts at debugging, it seems that my issue is with the splitting of the lines in my input file. The lines are of the form 2456893.248202,4.490 and I want to split on the ",". However, when I print out the list for the second column of that line, I get
4
.
4
9
0
so it somehow seems to be splitting on each character, though I'm not sure how. The relevant section of code is below; I appreciate any thoughts or ideas, and thanks in advance.
def curve_split(fn):
    with open(fn) as f:
        for line in f:
            line = line.strip()
            time,lc = line.split(",")
    #debugging stuff
    g=open('test.txt','w')
    l1=map(lambda x:x+'\n',lc)
    g.writelines(l1)
    g.close()
    #end debugging stuff
    return time,lc

if __name__ == '__main__':
    # place where I keep the lightcurve files from the image subtraction
    dirname = '/home/kuehn/m4/kepler/subtraction/detrending'
    files = glob.glob(dirname + '/*lc')
    print(len(files))
    # in order to create our lightcurve array, we need to know
    # the length of one of our lightcurve files
    lc0 = curve_split(files[0])
    lcarr = np.zeros([len(files),len(lc0)])
    # loop through every file
    for i,fn in enumerate(files):
        time,lc = curve_split(fn)
        lc = map(float, lc)
        # debugging
        print(fn[5:58])
        print(lc)
        print(time)
        # end debugging
        lcm = lc/np.median(float(lc))
        #lcm = ((lc[qual0]-np.median(lc[qual0]))/
        #       np.median(lc[qual0]))
        lcarr[i] = lcm
        print(fn,i,len(files))
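For what it's worth, the per-character output in the question is exactly what mapping over a string produces: iterating a string yields its individual characters, and curve_split returns the last line's second field as a string, not a list of floats. A small illustration of the difference:

```python
lc = "4.490"  # a single value taken from a line like "2456893.248202,4.490"

# Iterating a string walks over its characters, one at a time ...
chars = list(map(lambda x: x + '\n', lc))
print(chars)  # ['4\n', '.\n', '4\n', '9\n', '0\n']

# ... whereas the single value should be converted as a whole.
print(float(lc))  # 4.49
```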
