NameError: name 'air' is not defined

I am a Python beginner; the situation is:
In test.py:
import numpy as np
import pandas as pd
from numpy import *

def model(file):
    import numpy as np
    import pandas as pd
    data0 = pd.ExcelFile(file)
    data = data0.parse('For Stata')
    data1 = data.values
    varnames = list(data)
    for i in range(np.shape(data)[1]):
        var = varnames[i]
        exec(var + '=np.reshape(data1[:,i],(2217,1))')
    return air
Here air is one of the entries in varnames.
Now I run the following in a Jupyter notebook:
file0 = 'BLPreadydata.xlsx'
from test import model
model(file0)
The error that I get is:
NameError: name 'air' is not defined
EDIT: I tried to pin down the error; it actually comes from
exec(var + '=np.reshape(data1[:,i],(2217,1))')
Somehow this does not work when I call the function, but it does work when I run the same code outside a function.
NOTE:
Someone has done this in MATLAB:
vals = [1 2 3 4]
vars = {'a', 'b', 'c', 'd'}
for i = vals
    eval([vars{i} '= vals(i)'])
end
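For what it's worth, the usual Python counterpart of that MATLAB pattern avoids eval/exec altogether and stores the dynamically named values in a dict. A minimal sketch (the names here are just for illustration):

vals = [1, 2, 3, 4]
names = ['a', 'b', 'c', 'd']

# Map each name to its value instead of creating variables dynamically.
variables = {name: val for name, val in zip(names, vals)}
print(variables['c'])  # prints 3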

You should use one more for loop in the function to iterate over varnames and find 'air'; if found, store it in another variable and return that variable.
Try this:
for j in varnames:
    if j == 'air':
        c = j
Then return c:
return c

I found an answer after reading the exec(.) doc and guessing...
air actually ends up in the function's local namespace after exec(.)...
Hence, instead of
return air
put
return locals()['air']
Thanks for all the help.
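As a side note, the reason this works is a CPython detail: exec with no explicit namespace writes into the dict that locals() returns inside a function, rather than creating a true local variable. A more robust sketch passes an explicit dict to exec, so nothing depends on that behaviour:

def demo():
    namespace = {}
    exec("air = 42", namespace)
    # With an explicit dict, the assignment made by exec() is easy to
    # retrieve, and nothing depends on how exec interacts with locals().
    return namespace['air']

print(demo())  # prints 42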

Related

Why is pandas eval not working anymore with where?

I was using pandas eval within a where that sits inside a function, in order to create a column in a data frame. While it was working in the past, now it doesn't. There was a recent move to Python 3 within our Dataiku software. Could that be the reason?
Below is the code that is now in place:
import pandas as pd, numpy as np
from numpy import where, nan

d = {'ASSET': ['X','X','A','X','B'], 'PRODUCT': ['Z','Y','Z','C','Y']}
MAIN_df = pd.DataFrame(data=d)

def val_per(ASSET, PRODUCT):
    return (
        where(pd.eval("ASSET == 'X' & PRODUCT == 'Z'"), 0.04,
              where(pd.eval("PRODUCT == 'Y'"), 0.08, 1.5))
    )

MAIN_2_df = (MAIN_df.eval("PCT = @val_per(ASSET, PRODUCT)"))
The error received now is <class 'TypeError'>: unhashable type: 'numpy.ndarray'
You can replace the last line with:
MAIN_2_df = MAIN_df.copy()
MAIN_2_df['PCT'] = val_per(MAIN_2_df.ASSET, MAIN_2_df.PRODUCT)
This calls the function directly on the columns instead of going through DataFrame.eval, and since it is vectorized it will also run faster on large dataframes.
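As a further alternative, here is a sketch using np.select, which drops the pd.eval string parsing entirely (np.select is my substitution, not something the original answer uses):

import numpy as np
import pandas as pd

d = {'ASSET': ['X','X','A','X','B'], 'PRODUCT': ['Z','Y','Z','C','Y']}
MAIN_2_df = pd.DataFrame(data=d)

# Each condition is a boolean Series; np.select picks the first match
# per row and falls back to the default otherwise.
conditions = [
    (MAIN_2_df.ASSET == 'X') & (MAIN_2_df.PRODUCT == 'Z'),
    MAIN_2_df.PRODUCT == 'Y',
]
MAIN_2_df['PCT'] = np.select(conditions, [0.04, 0.08], default=1.5)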

How to solve the FunctionError and MapError

Python 3.6, PyCharm
import prettytable as pt
import numpy as np
import pandas as pd

a = np.random.randn(30, 2)
b = a.round(2)
df = pd.DataFrame(b)
df.columns = ['data1', 'data2']
tb = pt.PrettyTable()

def func1(columns):
    def func2(column):
        return tb.add_column(column, df[column])
    return map(func2, columns)

column1 = ['data1', 'data2']
print(column1)
print(func1(column1))
What I want to get is the result of:
tb.add_column('data1', df['data1'])
tb.add_column('data2', df['data2'])
As a matter of fact, the result is:
<map object at 0x000001E527357828>
I have been trying to find the answer on Stack Overflow for a long time; some answers say I can use list(func1(column1)), but the result is [None, None].
Based on the tutorial at https://ptable.readthedocs.io/en/latest/tutorial.html, PrettyTable.add_column modifies the PrettyTable in-place. Such functions generally return None, not the modified object.
You're also overcomplicating the problem by trying to use map and a fancy wrapper function. The code below is much simpler but produces the desired result.
import prettytable as pt
import numpy as np
import pandas as pd

column_names = ['data1', 'data2']
a = np.random.randn(30, 2)
b = a.round(2)
df = pd.DataFrame(b)
df.columns = column_names

tb = pt.PrettyTable()
for col in column_names:
    tb.add_column(col, df[col])
print(tb)
If you're still interested in learning about the thing that map returns, I suggest reading about iterables and iterators. map returns an iterator over the results of calling the function, and does not actually do any work until you iterate over it.
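A small sketch of that laziness, using a made-up shout helper (nothing PrettyTable-specific here):

def shout(x):
    print('processing', x)
    return x.upper()

m = map(shout, ['a', 'b'])
print(m)        # <map object at 0x...>: nothing has been processed yet
print(list(m))  # iterating triggers the calls, then prints ['A', 'B']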

Getting data from a file and inserting it into a list

I was trying to get data (a list) from a file and assign this list to a list in my Python script.
I want to know how to do it without having to assign all the variables manually.
Variables = [MPDev,WDev,DDev,LDev,PDev,MPAll,WAll,DAll,LAll,PAll,MPBlit,WBlit,DBlit,LBlit,PBlit,MPCour,WCour,DCour,LCour,PCour]
dataupdate = open("griddata.txt","r")
datalist = dataupdate.read()
#Inside the file is written:
#['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0',']
var = 0
for e in Variables:
    e = datalist[var]
    var += 1
I got it working anyway, but I would like to know a better way, to improve my skills. Thanks.
Get used to using data as a pandas dataframe. It's easy to read, easy to write.
http://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html
import pandas as pd

data = pd.read_csv("griddata.txt",
                   names=['MPDev', 'WDev', 'DDev', 'LDev', 'PDev',
                          'MPAll', 'WAll', 'DAll', 'LAll', 'PAll',
                          'MPBlit', 'WBlit', 'DBlit', 'LBlit', 'PBlit',
                          'MPCour', 'WCour', 'DCour', 'LCour', 'PCour'])
import ast

Variables = [MPDev,WDev,DDev,LDev,PDev,MPAll,WAll,DAll,LAll,PAll,MPBlit,WBlit,DBlit,LBlit,PBlit,MPCour,WCour,DCour,LCour,PCour]
dataupdate = open("tmp.txt","r")
# ast.literal_eval parses the file's text into an actual Python list.
datalist = ast.literal_eval(dataupdate.read())
#Inside the file is written:
#['0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0','0',']
for i in Variables:
    i = datalist[Variables.index(i)]
Another alternative is using a dictionary:
mydict = {}
var = 0
for e in Variables:
    mydict[e] = datalist[var]
    var += 1
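Putting the dictionary idea together end to end, here is a minimal sketch, assuming the file holds a Python-literal list of strings as in the question (the shortened names list is just for illustration):

import ast

names = ['MPDev', 'WDev', 'DDev', 'LDev', 'PDev']  # ...and so on

with open("griddata.txt") as f:
    values = ast.literal_eval(f.read())  # "['0','0',...]" -> list of strings

# zip pairs each name with its value, with no manual index counter.
stats = dict(zip(names, values))
print(stats['WDev'])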

Count occurrences of number from specific column in python

I am trying to do the equivalent of a COUNTIF() function in Excel. I am stuck on how to tell the .count() function to read from a specific column.
I have
df = pd.read_csv('testdata.csv')
df.count('1')
but this does not work, and even if it did it is not specific enough.
I am thinking I may have to use read_csv to read specific columns individually.
Example:
Column name
4
4
3
2
4
1
The function would output that there is one '1', and I could run it again and find out that there are three '4' answers, etc.
I got it to work! Thank you. I used:
print(df.col.value_counts().loc['x'])
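As a sketch of that value_counts approach on the sample column above (the column name is assumed for illustration):

import pandas as pd

df = pd.DataFrame({'Column name': [4, 4, 3, 2, 4, 1]})
counts = df['Column name'].value_counts()
print(counts.loc[1])  # 1: one occurrence of 1
print(counts.loc[4])  # 3: three occurrences of 4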
Here is an example of a simple 'countif' recipe you could try:
import pandas as pd

def countif(rng, criteria):
    return rng.eq(criteria).sum()

Example use:
df = pd.DataFrame({'column1': [4, 4, 3, 2, 4, 1],
                   'column2': [1, 2, 3, 4, 5, 6]})
countif(df['column1'], 1)
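Here rng.eq(criteria) produces a boolean Series and summing it counts the True entries, so the call above returns 1, while countif(df['column1'], 4) would return 3.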
If all else fails, why not try something like this?
import numpy as np
import pandas
import matplotlib.pyplot as plt
df = pandas.DataFrame(data=np.random.randint(0, 100, size=100), columns=["col1"])
counters = {}
for i in range(len(df)):
    if df.iloc[i]["col1"] in counters:
        counters[df.iloc[i]["col1"]] += 1
    else:
        counters[df.iloc[i]["col1"]] = 1
print(counters)
plt.bar(counters.keys(), counters.values())
plt.show()
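As an aside not in the original answer, the standard library's collections.Counter performs the same bookkeeping as the manual loop:

from collections import Counter
counters = Counter(df["col1"])  # value -> number of occurrences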

Get CSV from Tensorflow summaries

I have some very large tensorflow summaries. If these are plotted using tensorboard, I can download CSV files from them.
However, plotting these using tensorboard would take a very long time. I found in the docs that there is a method for reading the summary directly in Python. This method is summary_iterator and can be used as follows:
import tensorflow as tf
for e in tf.train.summary_iterator(path_to_events_file):  # path to the events file
    print(e)
Can I use this method to create CSV files directly? If so, how can I do this? This would save a lot of time.
One possible way of doing it would be like this:
from tensorboard.backend.event_processing import event_accumulator
import numpy as np
import pandas as pd
import sys
def create_csv(inpath, outpath):
    sg = {event_accumulator.COMPRESSED_HISTOGRAMS: 1,
          event_accumulator.IMAGES: 1,
          event_accumulator.AUDIO: 1,
          event_accumulator.SCALARS: 0,
          event_accumulator.HISTOGRAMS: 1}
    ea = event_accumulator.EventAccumulator(inpath, size_guidance=sg)
    ea.Reload()
    scalar_tags = ea.Tags()['scalars']
    df = pd.DataFrame(columns=scalar_tags)
    for tag in scalar_tags:
        events = ea.Scalars(tag)
        # A list comprehension is needed here; wrapping a map object in
        # np.array does not expand it under Python 3.
        scalars = np.array([event.value for event in events])
        df.loc[:, tag] = scalars
    df.to_csv(outpath)

if __name__ == '__main__':
    args = sys.argv
    inpath = args[1]
    outpath = args[2]
    create_csv(inpath, outpath)
Please note that this code will load the entire event file into memory, so it is best to run this on a cluster. For information about the sg argument of the EventAccumulator, see this SO question.
An additional improvement might be to not only store the value of each scalar, but also the step.
Note: the code snippet above targets recent versions of TF. For TF < 1.1, use the following import instead (keeping the event_accumulator name so the rest of the code is unchanged):
from tensorflow.tensorboard.backend.event_processing import event_accumulator
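A minimal sketch of the step-storing improvement suggested above, using the same EventAccumulator API (the scalars_to_frame name is mine; it builds one frame per tag, keeping wall time and step alongside the value):

import pandas as pd
from tensorboard.backend.event_processing import event_accumulator

def scalars_to_frame(inpath, tag):
    ea = event_accumulator.EventAccumulator(
        inpath, size_guidance={event_accumulator.SCALARS: 0})
    ea.Reload()
    events = ea.Scalars(tag)
    # Each scalar event carries wall_time, step, and value; keeping the
    # step lets tags logged at different rates be aligned later.
    return pd.DataFrame({'wall_time': [e.wall_time for e in events],
                         'step': [e.step for e in events],
                         'value': [e.value for e in events]})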
