Deserialization in pyomo of data constructed in mathematica

Deserialization in pyomo of data constructed in mathematica - python

A mathematica program returns as an output a sum of cardinal sines. The first element of the vector is
-19.9959 Sinc[0.0418879 (0. + t)] Sinc[0.0897598 (-65. + u)]
The variable is saved in a text file; however, this has to be read in pyomo as a variable, so a StringReplace is used to adapt this variable to python's grammmar
savedXPython =
Import["savedWindX.txt"] //
StringReplace[#, {"[" -> "(", "]" -> ")",
"t" -> "m.lammda[i]*180/np.pi", "u" -> "m.phi[i]*180/np.pi"}] &
Then, savedXPython was saved to another text file. However, an error appeared while working with pyomo; I asked here and the answer was to save the result in a json file instead of a text.
Export["savedWindXPython.txt", savedXPython];
Export["savedWindXPythonJ.json", savedXPython, "ExpressionJSON"];
Now, in the pyomo part, the text file was originally read as
g = open("savedWindXPython.txt","r")
b=f.readline()
g.close
later on, following this thread, the json has been read as
f = open("savedWindXPythonJ.json","r")
a=f.readline()
f.close
And then, the variable inside the pyomo code is defined as
def Wind_lammda_definition(model, i):
return m.Wind_lammda[i] == a
m.Wind_lammda_const = Constraint(m.N, rule = Wind_lammda_definition)
in the case of the json file or
def Wind_lammda_definition(model, i):
return m.Wind_lammda[i] == b
m.Wind_lammda_const = Constraint(m.N, rule = Wind_lammda_definition
in the case of the original text file
The code however doesn't work. AttributeError: 'str' object has no attribute 'is_relational', the error which prevented me from just reading the variable from the text file also appears in the json case.
It seems that using the json format has not helped. Can someone tell me if the json implementation has been done wrong?

When reading a line from your file, Pyhton will always return a string. If 1 is the only content in your line, the returned value will be equal to "1" and not 1. That can be solved with a = float(a), since you want to use a numeric value in your constraint. This will simply convert your a string into a float.

Related

Writing a function that will normalize using either min max method or z-score method

I am fairly new to Python, so there may be a lot to improve upon, but in the following code I am trying to write a function that takes in the location of the data file, the attribute that has to be normalized and the type of normalization to be performed('min_max' or 'z_score')
After this, based on the normalization type that is mentioned, I want it to apply the appropriate formula and return a dictionary where key = original value in the dataset, value = normalized value.
def normalization (fname, attr, normType):
result = {
}
df = pd.read_csv(fname)
targ = list(df[df.columns[attr]])
scaler = MinMaxScaler()
df["minmax"] = scaler.fit.transform(df[[df.columns[attr]]])
df["zscore”] = ((df[[df.columns[attr]]]) - (df[[df.columns[attr.mean()]]]))/ (df[[df.columns[attr.std(ddof=1)]]])
if normType == "min_max":
result = dict(zip(targ, df.minmax.values.tolist())
else:
result = dict(zip(targ, df.zscore.values.tolist())
return result
I continually get an error specifically on the line with the zscore calculation and have been struggling to troubleshoot it. I would appreciate any help that could point me in the right direction.
Thanks
Edit: Error message shown is
"SyntaxError: EOL while scanning string literal"

"zscore” alone causes that error. The problem is that the ” isn't a proper double-quotes character so the string isn't properly terminated. Not sure how it got there, maybe bad formatting in a document while pasting code around. The fix: "zscore"

Index out of bounds for a 2D Array?

I am having a 2D List and i am trying to retrieve a column with the index spcified as a parameter (type : IntEnum).
I get the index out of bounds error when trying to retrieve any column other then the one at index 0.
Enum:
class Column(IntEnum):
ROAD = 0
SECTION = 1
FROM = 2
TO = 3
TIMESTAMP = 4
VFLOW=5
class TrafficData:
data=[[]]
Below are member methods of TrafficData
Reading from file and storing the matrix:
def __init__(self,file):
self.data=[[word for word in line.split('\t')]for line in file.readlines()[1:]]
Retrieve-ing the desired column:
def getColumn(self,columnName):
return [line[columnName] for line in self.data]
Call:
)
column1 = traficdata.getColumn(columnName=Column.ROAD)
`column2 = traficdata.getColumn(columnName=Column.FROM)` //error
`column3 = traficdata.getColumn(columnName=Column.TO)` //error
I attached a picture with the data after __init__ processing:
[

I tested the code that you provided above, and didn't see any issues. That leads me to believe that there might be something wrong with the data that you have in the file. Could you paste the file data? (the tab delimited data)
UPDATE -
I found the issue - as suspected, it was a data issue (there is a minor code update involved too). Make the following changes -
1) When opening the file use the appropriate encoding, I used utf-16.
2) At the end of the data file that you shared, it contains the text - "(72413 row(s) affected)" along with a couple of new line characters. So, you have 2 options, either manually cleanup the data file, or update the code to ignore the "(72413 row(s) affected)" & "\n" characters.
Hope that helps.

python variable m.start() from re.finditer does not get overwritten

I am currently writing a small python program for manipulating text files. (I am a newb programmer)
First, I am using re.finditer to find a specific string in lines1. Then I write this into a file and close it.
Next I want to grab the first line and search for this in another text file. The first time using re.finditer it was working great.
The problem is: m.start() always returns the last value of the first m.start. It does not get overwritten as it was the first time using re.finditer.
Could you help me understand why?
my code:
for m in re.finditer(finder1,lines1):
end_of_line = lines1.find('\n',m.start())
#print(m.start())
found_tag = lines1[m.start()+lenfinder1:end_of_line]
writefile.write(found_tag+'\n')
lenfinder2 = len(found_tag)
input_file3 = open ('out.txt')
writefile.close()
num_of_lines3 = file_len('out.txt')
n=1
while (n < num_of_lines3):
line = linecache.getline('out.txt', n)
n = n+1
re.finditer(line,lines2)
#print(m.start())

You've not declared\initialized line that you're using here :
re.finditer(line,lines2)
So, change :
linecache.getline('out.txt', n)
to
line = linecache.getline('out.txt', n)

Python - Reading a CSV, won't print the contents of the last column

I'm pretty new to Python, and put together a script to parse a csv and ultimately output its data into a repeated html table.
I got most of it working, but there's one weird problem I haven't been able to fix. My script will find the index of the last column, but won't print out the data in that column. If I add another column to the end, even an empty one, it'll print out the data in the formerly-last column - so it's not a problem with the contents of that column.
Abridged (but still grumpy) version of the code:
import os
os.chdir('C:\\Python34\\andrea')
import csv
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
if 'phone' in tableHeader:
phoneIndex = tableHeader.index('phone')
else:
phoneIndex = -1
for row in exampleReader:
row[-1] =''
print(phoneIndex)
print(row[phoneIndex])
csvOpen.close()
my.csv
stuff,phone
1,3235556177
1,3235556170
Output
1
1
Same script, small change to the CSV file:
my.csv
stuff,phone,more
1,3235556177,
1,3235556170,
Output
1
3235556177
1
3235556170
I'm using Python 3.4.3 via Idle 3.4.3
I've had the same problem with CSVs generated directly by mysql, ones that I've opened in Excel first then re-saved as CSVs, and ones I've edited in Notepad++ and re-saved as CSVs.
I tried adding several different modes to the open function (r, rU, b, etc.) and either it made no difference or gave me an error (for example, it didn't like 'b').
My workaround is just to add an extra column to the end, but since this is a frequently used script, it'd be much better if it just worked right.
Thank you in advance for your help.

row[-1] =''
The CSV reader returns to you a list representing the row from the file. On this line you set the last value in the list to an empty string. Then you print it afterwards. Delete this line if you don't want the last column to be set to an empty string.

If you know it is the last column, you can count them and then use that value minus 1. Likewise you can use your string comparison method if you know it will always be "phone". I recommend if you are using the string compare, convert the value from the csv to lower case so that you don't have to worry about capitalization.
In my code below I created functions that show how to use either method.
import os
import csv
os.chdir('C:\\temp')
csvOpen = open('my.csv')
exampleReader = csv.reader(csvOpen)
tableHeader = next(exampleReader)
phoneColIndex = None;#init to a value that can imply state
lastColIndex = None;#init to a value that can imply state
def getPhoneIndex(header):
for i, col in enumerate(header): #use this syntax to get index of item
if col.lower() == 'phone':
return i;
return -1; #send back invalid index
def findLastColIndex(header):
return len(tableHeader) - 1;
## methods to check for phone col. 1. by string comparison
#and 2. by assuming it's the last col.
if len(tableHeader) > 1:# if only one row or less, why go any further?
phoneColIndex = getPhoneIndex(tableHeader);
lastColIndex = findLastColIndex(tableHeader)
for row in exampleReader:
print(row[phoneColIndex])
print('----------')
print(row[lastColIndex])
print('----------')
csvOpen.close()

Unexpected whitespace in python generated strings

I am using Python to generate an ASCII file composed of very long lines. This is one example line (let's say line 100 in the file, '[...]' are added by me to shorten the line):
{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,479 8,485 1,[...]}
If I open the ASCII file that I generated with ipython:
f = open('myfile','r')
print repr(f.readlines()[99])
I do obtain the expected line printed correctly ('[...]' are added by me to shorten the line):
'{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,479 8,485 1,[...]}\n'
On the contrary, if I open this file with the program that is suppose to read it, it will generate an exception, complaining about an unexpected pair after 478 1.
So I tried to open the file with vim. Still vim shows no problem, but if I copy the line as printed by vim and paste it in another text editor (in my case TextMate), this is the line that I obtain ('[...]' are added by me to shorten the line):
{6 1,14 1,[...],264 1,270 2,274 2,[...],478 1,4 79 8,485 1,[...]}
This line indeed has a problem after the pair 478 1.
I tried to generate my lines in different ways (concatenating, with cStringIO, ...), but I always obtain this result. When using the cStringIO, for example, the lines are generated as in the following (even though I tried to change this, as well, with no luck):
def _construct_arff(self,attributes,header,data_rows):
"""Create the string representation of a Weka ARFF file.
*attributes* is a dictionary with attribute_name:attribute_type
(e.g., 'num_of_days':'NUMERIC')
*header* is a list of the attributes sorted
(e.g., ['age','name','num_of_days'])
*data_rows* is a list of lists with the values, sorted as in the header
(e.g., [ [88,'John',465],[77,'Bob',223]]"""
arff_str = cStringIO.StringIO()
arff_str.write('#relation %s\n' % self.relation_name)
for idx,att_name in enumerate(header):
try:
name = att_name.replace("\\","\\\\").replace("'","\\'")
arff_str.write("#attribute '%s' %s\n" % (name,attributes[att_name]))
except UnicodeEncodeError:
arff_str.write('#attribute unicode_err_%s %s\n'
% (idx,attributes[att_name]))
arff_str.write('#data\n')
for data_row in data_rows:
row = []
for att_idx,att_name in enumerate(header):
att_type = attributes[att_name]
value = data_row[att_idx]
# numeric attributes can be sparse: None and zeros are not written
if ((not att_type == constants.ARRF_NUMERIC)
or not ((value == None) or value == 0)):
row.append('%s %s' % (att_idx,value))
arff_str.write('{' + (','.join(row)) + '}\n')
return arff_str.getvalue()
UPDATE: As you can see from the code above, the function transforms a given set of data to a special arff file format. I noticed that one of the attributes I was creating contained numbers as strings (e.g., '1', instead of 1). By forcing these numbers into integers:
features[name] = int(value)
I recreated the arff file successfully. However I don't see how this, which is a value, can have an impact on the formatting of *att_idx*, which is always an integer, as also pointed out by #JohnMachin and #gnibbler (thanks for your answers, btw). So, even if my code runs now, I still don't see why this happens. How can the value, if not properly transformed into int, influence the formatting of something else?
This file contains the wrongly formatted version.

The built-in function repr is your friend. It will show you unambiguously what you have in your file.
Do this:
f = open('myfile','r')
print repr(f.readlines()[99])
and edit your question to show the result.
Update: As to how it got there, it is impossible to tell, because it cannot have been generated by the code that you showed. The value 37 should be a value of att_idx which comes from enumerate() and so must be an int. You are formatting this int with %s ... 37 can't become 3rubbish7. Also that should generate att_idx in order 0, 1, etc etc but you are missing many values and there is nothing conditional inside your loop.
Please show us the code that you actually ran.
Update:
And again, this code won't run:
for idx,att_name in enumerate(header):
arff_str.write("#attribute '%s' %s\n" % (name,attributes[att_name]))
because name is not defined; you probably mean att_name.
Perhaps we can short-circuit all this stuffing about: post a copy of your output file (zipped if it's huge) on the web somewhere so that we can see for ourselves what might be disturbing its consumers. Please do edit your question to say which line(s) exhibits(s) the problem.
By the way, you say some of the data is string rather than integer, and the problem goes away if you coerce the data to int by doing features[name] = int(value) ... what is 'features'?? What is 'name'??
Are any of those strings unicode instead of str?
Update 2 (after bad file posted on net)
No info supplied on which line(s) exhibits(s) the problem. As it turned out, no lines exhibited the described problem with attribute 479. I wrote this checking script:
import re, sys
# sample data line:
# {40 1,101 3,319 2,375 2,525 2,530 bug}
# Looks like all data lines end in ",530 bug}" or ",530 other}"
pattern1 = r"\{(?:\d+ \d+,)*\d+ \w+\}$"
matcher1 = re.compile(pattern1).match
pattern2 = r"\{(?:\d+ \d+,)*"
matcher2 = re.compile(pattern2).match
bad_atts = re.compile(r"\D\d+\s+\W").findall
got_data = False
for lino, line in enumerate(open(sys.argv[1], "r"), 1):
if not got_data:
got_data = line.startswith('#data')
continue
if not matcher1(line):
print
print lino, repr(line)
m = matcher2(line)
if m:
print "OK up to offset", m.end()
print bad_atts(line)
Sample output (wrapped at column 80):
581 '{2 1,7 1,9 1,12 1,13 1,14 1,15 1,16 1,17 1,18 1,21 1,22 1,24 1,25 1,26 1,27
1,29 1,32 1,33 1,36 1,39 1,40 1,44 1,48 1,49 1,50 1,54 1,57 1,58 1,60 1,67 1,68
1,69 1,71 1,74 1,75 1,76 1,77 1,80 1,88 1,93 1,101 ,103 6,104 2,109 20,110 3,11
2 2,114 1,119 17,120 4,124 39,128 5,137 1,138 1,139 1,162 1,168 1,172 18,175 1,1
76 6,179 1,180 1,181 2,185 2,187 9,188 8,190 1,193 1,195 2,196 4,197 1,199 3,201
3,202 4,203 5,206 1,207 2,208 1,210 2,211 1,212 5,213 1,215 2,216 3,218 2,220 2
,221 3,225 8,226 1,233 1,241 4,242 1,248 5,254 2,255 1,257 4,258 4,260 1,266 1,2
68 1,269 3,270 2,271 5,273 1,276 1,277 1,280 1,282 1,283 11,285 1,288 1,289 1,29
6 8,298 1,299 1,303 1,304 11,306 5,308 1,309 8,310 1,315 3,316 1,319 11,320 5,32
1 11,322 2,329 1,342 2,345 1,349 1,353 2,355 2,358 3,359 1,362 1,367 2,368 1,369
1,373 2,375 9,377 1,381 4,382 1,383 3,387 1,388 5,395 2,397 2,400 1,401 7,407 2
,412 1,416 1,419 2,421 2,422 1,425 2,427 1,431 1,433 7,434 1,435 1,436 2,440 1,4
49 1,454 2,455 1,460 3,461 1,463 1,467 1,470 1,471 2,472 7,477 2,478 11,479 31,4
82 6,485 7,487 1,490 2,492 16,494 2,495 1,497 1,499 1,501 1,502 1,503 1,504 11,5
06 3,510 2,515 1,516 2,517 3,518 1,522 4,523 2,524 1,525 4,527 2,528 7,529 3,530
bug}\n'
OK up to offset 203
[',101 ,']
709 '{101 ,124 2,184 1,188 1,333 1,492 3,500 4,530 bug}\n'
OK up to offset 1
['{101 ,']
So it looks like the attribute with att_idx == 101 can sometimes contain the empty string ''. You need to sort out how this attribute is to be treated. It would help your thinking if you unwound this Byzantine code:
if ((not att_type == constants.ARRF_NUMERIC)
or not ((value == None) or value == 0)):
Aside: that "expletive deleted" code won't run; it should be ARFF, not ARRF
into:
if value or att_type != constants.ARFF_NUMERIC:
or maybe just if value: which will filter out all of None, 0, and "". Note that att_idx == 101 corresponds to the attribute "priority" which is given a STRING type in the ARFF file header:
[line 103] #attribute 'priority' STRING
By the way, your statement about features[name] = int(value) "fixing" the problem is very suspicious; int("") raises an exception.
It may help you to read the warning at the end of this wiki section about sparse ARFF files.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Deserialization in pyomo of data constructed in mathematica - python

Related

Writing a function that will normalize using either min max method or z-score method

Index out of bounds for a 2D Array?

python variable m.start() from re.finditer does not get overwritten

Python - Reading a CSV, won't print the contents of the last column

Unexpected whitespace in python generated strings

Categories

Resources