Python - looping multiple lists with enumerate for same index - python

list1 = Csvfile1._getRow(' field1')
list2 = Csvfile2._getRow(' field1')
_list1 = Csvfile1._getRow(' field2')
_list2 = Csvfile2._getRow(' field2')
for i,(a,b) in enumerate(zip(list2, list1)):
    value = False
    if field == ' field1':
        for j,(c,d) in enumerate(zip(_list2, _list1)):
            if i == j:
                if a != b and c != d:
                    value = True
                else:
                    value = False
                break
    if value == True:
        continue
    if a != b:
        # do something
The sample below shows what I need.
The values in the two CSV files are compared row by row. When the value of field1
differs between the two files, the if a != b: branch should be executed.
When the value of field1 differs between the two files and the value of field2 differs for the same row as well, the if a != b: branch should not be executed.
With large inputs this does not seem to work. Is there a better way to achieve this?
Csvfile1
field1 | field2
222 | 4 -> enter if a != b: condition loop
435 | 5 -> do not enter if a != b: condition loop
Csvfile2
field1 | field2
223 | 4
436 | 6

If I understood correctly what you want to do, try something like this:
$ cat t1.txt
field1|field2
222|4
435|5
$ cat t2.txt
field1|field2
223|4
436|6
$ python
import csv
with open("t1.txt", "rb") as csvfile:
    with open("t2.txt", "rb") as csvfile2:
        reader = csv.reader(csvfile, delimiter='|')
        reader2 = csv.reader(csvfile2, delimiter='|')
        for row1, row2 in zip(reader, reader2):
            for elem1, elem2 in zip(row1, row2):
                if elem1 != elem2:
                    print "different: {} and {}".format(elem1, elem2)
different: 222 and 223
different: 435 and 436
different: 5 and 6

# field1 and field2 columns, read from the first file (csv1) and the second file (csv2)
field1csv1 = Csvfile1._getRow(' field1')
field1csv2 = Csvfile2._getRow(' field1')
field2csv1 = Csvfile1._getRow(' field2')
field2csv2 = Csvfile2._getRow(' field2')
Whenever you work with huge lists of data, consider using a generator instead of building a full list. In Python 2, itertools.izip is the lazy (iterator) version of zip; in Python 3, zip itself is already lazy and izip no longer exists.
Plugging it in should reduce memory use, as no temporary lists will be generated:
from itertools import izip
for i, (a, b) in enumerate(izip(list2, list1)):
    value = False
    if field == ' field1':
        for j, (c, d) in enumerate(izip(_list2, _list1)):
            if i == j:
                if a != b and c != d:
                    value = True
                else:
                    value = False
                break
    if value == True:
        continue
    if a != b:
        # do something
Here is an example of how to refactor your code so that the nested Python loop disappears and most of the iteration is pushed down to the C level:
#orig
for i, (a, b) in enumerate(zip(list2, list1)):
    value = False
    if field == ' field1':
        for j, (c, d) in enumerate(zip(_list2, _list1)):
            if i == j:
                if a != b and c != d:
                    value = True
                else:
                    value = False
                break
With generators:
from itertools import izip
mygen = izip(izip(list2,list1),izip(_list2,_list1))
# yields ((a, b), (c, d)), ((x, y), (_x, _y)), ...
values = [tuple1[0]!=tuple1[1] and tuple2[0]!=tuple2[1] for tuple1, tuple2 in mygen]
Also you could use "equality" generators:
field1 = izip(field1csv1, field1csv2)
field2 = izip(field2csv1, field2csv2)
field1equal = (f[0] == f[1] for f in field1)
field2equal = (f[0] == f[1] for f in field2)
I got this far and then gave up. I have no idea what you're doing.
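For reference, the rule stated in the question (act on a row when field1 differs, but skip it when field2 differs too) can probably be expressed as a single pass over all four columns, with no inner loop. The following is only a minimal Python 3 sketch, where zip is already lazy. The function name rows_to_process and the sample lists are hypothetical, and it assumes the four lists have the same length and row order.

```python
def rows_to_process(f1_a, f1_b, f2_a, f2_b):
    """Yield the indices of rows where field1 differs but field2 is equal."""
    # zip walks the four columns in lockstep, so row i of every list is
    # seen together and no index comparison (i == j) is needed.
    for i, (a, b, c, d) in enumerate(zip(f1_a, f1_b, f2_a, f2_b)):
        if a != b and c == d:
            yield i


# Hypothetical data mirroring the sample in the question.
field1_csv1 = ["222", "435"]
field1_csv2 = ["223", "436"]
field2_csv1 = ["4", "5"]
field2_csv2 = ["4", "6"]

print(list(rows_to_process(field1_csv1, field1_csv2, field2_csv1, field2_csv2)))
# prints [0]
```

Because each row is visited once, the cost grows linearly with the number of rows, whereas the nested loop in the question rescans the second pair of lists for every row.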

Related

Why doesn't my script display the newly created file?

I am writing a Python script that opens a file and for each test case, it should write the total number of items gathered in that case and display the total amount $ that it costs.
When running my code:
f = open("shopping.txt", "r")
outFile = open("results.txt", "w")
t = int(f.readline().strip())
for z in range(t):
    # Assuming prices are unique
    myList = {}
    items = int(f.readline().strip())
    ind = 1
    # Read each line for each item
    for i in range(items):
        p, w = map(int, f.readline().strip().split())
        myList[p] = [w, ind]
        ind+=1
    weights = []
    F = int(f.readline().strip())
    for i in range(F):
        weights.append(int(f.readline().strip()))
    RES = []
    values = []
    for weight in weights:
        sortedPrice = sorted(myList.keys())[::-1]
        m = 0
        p = 0
        tmp = []
        # Grabbing all possible results using greedy method
        # Max price stored into values array and item # in RES array.
        for i in range(len(myList)):
            R = []
            s = 0
            p = 0
            if myList[sortedPrice[i]][0]<=weight:
                s=myList[sortedPrice[i]][0]
                p=sortedPrice[i]
                R+=myList[sortedPrice[i]][1],
            for j in range(i+1, len(myList)):
                if myList[sortedPrice[j]][0]+s<=weight:
                    s+=myList[sortedPrice[j]][0]
                    p+=sortedPrice[j]
                    R+=myList[sortedPrice[j]][1],
            if m<p:
                m = p
                tmp = R
        tmp.sort()
        RES.append(tmp)
        values.append(m)
    outFile.write("Test Case %d\n" %(z+1))
    outFile.write("Total Price: %d\n" %(sum(values)))
    outFile.write("Member Items:\n")
    for i in range(len(RES)):
        outFile.write("%d: %s" %(i+1, " ".join(map(str, RES[i]))))
f.close()
outFile.close()
I get the result:
Test Case 1
Total Price: 0
Member Items:
Test Case 2
Total Price: 0
Member Items:
When I expected something like this:
Test Case1
Total Price 72
Member Items
1: 1
Test Case2
Total Price 568
Member Items
1: 3 4
2: 3 6
3: 3 6
4: 3 4 6
I am relatively new to programming in general so if there is any insight anyone could give for my code, I would appreciate it. Adding to this, my guess is that the sum() and/or the map commands may be breaking and not working as intended, as I'm writing to the file to get the total value and items of the case.

Is there a way to align two separate Excel files so that each item in one file has a unique matching item in the other file? Code is almost working

I have two Excel files that need to be aligned, in the sense that each row from one Excel file has a unique corresponding row in the other Excel file, whether that is a matching data point or a blank value. The Excel files are not the same size and have some values that match and some that do not; however, both are in sequential order.
I am trying to accomplish this by inserting empty rows. I am having trouble inserting the correct number of empty rows when there is no match and then continuing to the next value. I believe that my code is very close to working. The code also combines both modified Excel files into one file as separate tabs.
j=0
iterations=100+Branch_Flow_Pre.max_row
for i in range(2, iterations):
    #if str(Branch_Flow_Pre.cell(row=i, column=1).value) == "None" and str(Branch_Flow_Post.cell(row=i, column=1).value) == "None":
    #    print("blanks, i = ",i,"j = ",j)
    #    i += 1
    if Branch_Flow_Pre.cell(row=i, column=2).value == Branch_Flow_Post.cell(row=i, column=2).value and Branch_Flow_Pre.cell(row=i, column=8).value == Branch_Flow_Post.cell(row=i, column=8).value:
        print("match, i = ",i,"j= ",j)
        i += 1
    else:
        j=0
        while j<21:
            if Branch_Flow_Pre.cell(row=i+j, column=2).value == Branch_Flow_Post.cell(row=i, column=2).value and Branch_Flow_Pre.cell(row=i+j, column=8).value == Branch_Flow_Post.cell(row=i, column=8).value:
                if j!=0:
                    for x in range(0, j+1):
                        Branch_Flow_Post.insert_rows(i)
                        print("insert Post, x = ",x,"i = ",i,"j = ",j)
                else:
                    print("error")
                i = i+j
                j=21
                break
            elif Branch_Flow_Pre.cell(row=i, column=2).value == Branch_Flow_Post.cell(row=i+j, column=2).value and Branch_Flow_Pre.cell(row=i, column=8).value == Branch_Flow_Post.cell(row=i+j, column=8).value:
                if j!=0:
                    for x in range(0, j+1):
                        Branch_Flow_Pre.insert_rows(i)
                        print("insert Pre, x = ",x,"i = ",i,"j = ",j)
                else:
                    print("error")
                i = i+j
                j=21
                break
            elif j==20:
                Branch_Flow_Post.insert_rows(i)
                print("break, i = ",i,"j = ",j," Insert Post")
                i += 1
                j = 21
                break
            else:
                print("increment, i = ",i,"j = ",j)
                j += 1
c=1
r=2
for row in Branch_Flow_Pre.values:
    for v in row:
        BF_Pre.cell(row=r, column=c).value = v
        c += 1
    c=1
    r += 1
c=1
r=2
for row in Branch_Flow_Post.values:
    for v in row:
        BF_Post.cell(row=r, column=c).value = v
        c += 1
    c=1
    r += 1
book3.save(outfilename)
## the rest is not code
desired output:
Input1 Input2 Output1 Output2
A 1 B 2 A 1
B 2 B 2 B 2
C 3 C 3
x y C 3 C 3
D 4 x y
D 4
D 4 D 4
Actual output:
Input1 Input2 Output1 Output2
A 1 B 2 A 1
B 2 B 2 B 2
C 3 C 3
x y C 3
D 4
D 4
D 4
C 3
x y
D 4
I was able to correct the code by adding additional conditions that do not allow empty cells to be counted as an acceptable match.
Here is an image of my working script: [image: working code]
This function will perform a comparison between two excel tabs with similar data and align the two tabs so that each item has a unique item in the other tab. Alternatively you can think of this as a function which will place matching items on the same excel line number in each tab so that a comparison can be easily performed between the two.
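Since the working script survives only as an image, here is a different, hedged way to get the same kind of alignment. It is not the author's method. It swaps in difflib.SequenceMatcher from the Python standard library to find the matching runs between two ordered sequences, then pads every unmatched row with None on the other side. The function name align and the sample rows are hypothetical. Each row is assumed to be reduced to a hashable key, such as a tuple of the two compared columns, and reading or writing the workbook is left out.

```python
from difflib import SequenceMatcher


def align(pre, post):
    """Pair up rows of two ordered sequences, padding gaps with None."""
    aligned = []
    # autojunk=False disables the popularity heuristic, which could
    # otherwise ignore frequently repeated rows in long sequences.
    matcher = SequenceMatcher(a=pre, b=post, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            # Matching run: rows line up one to one.
            aligned.extend(zip(pre[i1:i2], post[j1:j2]))
        else:
            # 'delete', 'insert' or 'replace': rows without a partner
            # get an empty slot (None) on the other side.
            aligned.extend((row, None) for row in pre[i1:i2])
            aligned.extend((None, row) for row in post[j1:j2])
    return aligned


# Hypothetical rows, each reduced to a tuple of the two compared columns.
pre = [("A", 1), ("B", 2), ("C", 3), ("D", 4)]
post = [("B", 2), ("C", 3), ("x", "y"), ("D", 4)]

for left, right in align(pre, post):
    print(left, right)
```

Each resulting pair corresponds to one output row: a None on either side marks where an empty row would be inserted in that tab.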

Reading from a python file

I have a text file containing antonyms in this format:
able || unable
unable || able
abaxial || adaxial
adaxial || abaxial
and I need to check whether one word is an antonym of another.
What I wrote is this:
def antonyms():
    f = open('antonyms.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0]
        y = a[1]
    return x, y
but this gives me only the last pair. Then I tried indenting the return one level further:
def antonyms():
    f = open('antonym_adjectives.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0]
        y = a[1]
        return x, y
But then I get only the first pair.
How can I get all of the pairs?
And how can I do something like:
>>> antonyms(x, y)
so that it tells me True or False?
Because no answer answers BOTH your questions:
Firstly: return exits the function immediately, so only one pair ever comes back. Use yield instead, which turns the whole function into a generator:
def antonyms():
    with open('antonyms.txt') as f:
        for line in f:
            # Strip the trailing newline first; the separator also needs its surrounding spaces.
            a = line.strip().split(' || ')
            x = a[0]
            y = a[1]
            yield x, y
To use this function to implement is_antonym(a,b):
def is_antonym(a,b):
    for x,y in antonyms():
        if a == x and b == y:
            return True
    return False
Other answers have good tips too:
A good replacement for instance would be: [x,y] = line.split(' || ')
Your problem is that return exits the function.
Instead of assigning x = a[0] and y = a[1], append those values to a list, and then return that list.
the_array = []
for line in f:
    a = line.split('||')
    the_array.append((a[0],a[1]))
return the_array
It would make more sense to write a function that gets the antonym of a given word:
def antonym(word):
    with open('antonym_adjectives.txt', 'r+') as f:
        for line in f:
            a, b = line.split('||')
            a = a.strip()
            b = b.strip()
            if a == word:
                return b
            if b == word:
                return a
You can then write antonym(x) == y to check if x and y are antonyms. (However this assumes each word has a single unique antonym).
This reads the file from the beginning each time. If your list of antonyms is manageable in size it might make more sense to read it in to memory as an array or dictionary.
If you can't assume that each word has a single unique antonym, you could turn this into a generator that will return all the antonyms for a given word.
def antonym(word):
    with open('antonym_adjectives.txt', 'r+') as f:
        for line in f:
            a, b = line.split('||')
            a = a.strip()
            b = b.strip()
            if a == word:
                yield b
            if b == word:
                yield a
Then y in antonym(x) will tell you whether x and y are antonyms, and list(antonym(x)) will give you the list of all the antonyms of x.
You could use yield:
def antonyms():
    f = open('antonyms.txt', 'r+')
    for line in f:
        a = line.split('||')
        x = a[0]
        y = a[1]
        yield x, y
for a,b in antonyms():
    # rest of code here
By the way, you can assign directly to x and y:
x,y = line.split('||')
A simple check whether a and b are antonyms, which only covers pairs formed with the prefix un (such as able/unable), could be:
(a == 'un' + b) or (b == 'un' + a)
Here is my code:
def antonyms(first,second):
    f = open('text.txt', 'r')
    for line in f.readlines():
        lst = [s.strip() for s in line.split('||')]
        if lst and len(lst) == 2:
            x = lst[0]
            y = lst[1]
            if first == x and second == y:
                return True
    return False
I don't think a list is a good data structure for this problem. After you've read all the antonym pairs into a list, you still have to search the whole list to find the antonym of a word. A dict would be more efficient.
antonym = {}
with open('antonym_adjectives.txt') as infile:
    for line in infile:
        # strip() removes the spaces around each word and the trailing newline
        x,y = [s.strip() for s in line.split('||')]
        antonym[x] = y
        antonym[y] = x
Now you can just look up an antonym in the dict:
try:
    opposite = antonym[word]
except KeyError:
    print("%s not found" %word)
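Pulling the suggestions in this thread together, a dictionary-based lookup that answers both parts of the question might look like the sketch below. It is a minimal Python 3 illustration, not a definitive implementation. The names load_antonyms and is_antonym and the in-memory sample lines are hypothetical, and each word maps to a set so that a word can have more than one antonym. With a real file, the open file object could be passed in place of the list.

```python
def load_antonyms(lines):
    """Build a word -> set-of-antonyms mapping from 'a || b' lines."""
    table = {}
    for line in lines:
        parts = [s.strip() for s in line.split("||")]
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        x, y = parts
        # Register the pair in both directions.
        table.setdefault(x, set()).add(y)
        table.setdefault(y, set()).add(x)
    return table


def is_antonym(table, a, b):
    """Return True if b is recorded as an antonym of a."""
    return b in table.get(a, set())


# Hypothetical sample standing in for the contents of the antonyms file.
sample = [
    "able || unable\n",
    "unable || able\n",
    "abaxial || adaxial\n",
    "adaxial || abaxial\n",
]
table = load_antonyms(sample)
print(is_antonym(table, "able", "unable"))   # True
print(is_antonym(table, "able", "adaxial"))  # False
```

The file is read once, and every later check is a dictionary lookup rather than another scan of the file.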

python simple if else structure wrong comparison

This is my code
for i,val in enumerate(DS3Y_pred_trans):
    if val < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
There are values less than 1.5 in the list, but the output is all 2s.
What am I missing?
This is the whole code.
import numpy as np
DS3X_train = np.genfromtxt('train.csv', dtype=float, delimiter=',')
print DS3X_train
DS3Y_train = np.genfromtxt('train_labels.csv', dtype=int, delimiter=',' )
print DS3Y_train
DS3X_test = np.genfromtxt('test.csv', dtype=float, delimiter=',')
print DS3X_test
DS3Y_test = np.genfromtxt('test_labels.csv', dtype=int, delimiter=',' )
print DS3Y_test
DS3X_train_trans = zip(*DS3X_train)
cov_train = np.cov(DS3X_train_trans)
U, s, V = np.linalg.svd(cov_train, full_matrices=True)
u = U[:,:-1]
u_trans = zip(*u)
DS3X_train_reduced = np.dot(u_trans,DS3X_train_trans)
b = np.ones((3,2000))
b[1:,:] = DS3X_train_reduced
print "\n"
DS3X_train_reduced = b
DS3X_train_reduced_trans = zip(*DS3X_train_reduced)
temp = np.dot(DS3X_train_reduced,DS3X_train_reduced_trans)
try:
    inv_temp = np.linalg.inv(temp)
except np.linalg.LinAlgError:
    pass
else:
    psue_inv = np.dot(inv_temp,DS3X_train_reduced)
    print psue_inv.shape
weight = np.dot(psue_inv,DS3Y_train)
weight_trans = zip(weight)
print weight_trans
DS3X_test_trans = zip(*DS3X_test)
DS3X_test_reduced = np.dot(u_trans,DS3X_test_trans)
b = np.ones((3,400))
b[1:,:] = DS3X_test_reduced
print "\n"
print b
DS3X_test_reduced = b
print DS3X_test_reduced.shape
DS3X_test_reduced_trans = zip(*DS3X_test_reduced)
DS3Y_pred = np.dot(DS3X_test_reduced_trans,weight_trans)
print DS3Y_pred
print DS3Y_pred.shape
DS3Y_pred_trans = zip(DS3Y_pred)
print repr(DS3Y_pred_trans[0])
for i,val in enumerate(DS3Y_pred_trans):
    if val < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
print DS3Y_pred
now regression using indicator variable and graph plottings
Your values are not numbers. In Python 2, numbers sort before objects of any other type, so when val is not a number the comparison val < 1.5 is always False.
You probably have strings:
>>> '1.0' < 1.5
False
>>> 1.0 < 1.5
True
If so, convert your values to floats first:
for i, val in enumerate(DS3Y_pred_trans):
    if float(val) < 1.5:
        DS3Y_pred_trans[i] = 1
    else:
        DS3Y_pred_trans[i] = 2
It could be that you are storing other objects in the list still; you'll need to take a close look at what is actually in the list and adjust your code accordingly, or fix how the list is created in the first place.
Since you are replacing all values anyway, you could use a list comprehension:
DS3Y_pred_trans = [1 if float(val) < 1.5 else 2 for val in DS3Y_pred_trans]
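One likely source of the non-numeric values in the code above is the call zip(DS3Y_pred): zipping a single sequence wraps every element in a 1-tuple, so each val is a tuple rather than a number. The sketch below illustrates this in Python 3, where comparing a tuple with a float raises TypeError instead of silently returning False. The list preds is a hypothetical stand-in for an n-by-1 prediction array.

```python
# Hypothetical stand-in for an (n, 1) array of predictions.
preds = [[1.2], [1.7], [0.9], [2.1]]

# zip() over a single sequence wraps each element in a 1-tuple.
wrapped = list(zip(preds))
print(wrapped[0])  # ([1.2],)

# Comparing a tuple with a float is an error in Python 3.
try:
    wrapped[0] < 1.5
    comparable = True
except TypeError:
    comparable = False
print(comparable)  # False

# Unwrap the number before comparing it with the threshold.
labels = [1 if row[0] < 1.5 else 2 for row in preds]
print(labels)  # [1, 2, 1, 2]
```

Printing repr() of the first element, as the question already does, is a quick way to confirm what the list actually holds before choosing how to unwrap it.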

how to merge file lines having the same first word in python?

I have written a Python program to merge the lines of a file that start with the same first word.
However, I am unable to get the desired output.
Can anyone please point out the mistake in my program?
Note: (line1, line2) and (line4, line5, line6) are merged because they
have the same first element.
#input
"file.txt"
line1: a b c
line2: a b1 c1
line3: d e f
line4: i j k
line5: i s t
line6: i m n
#output
a b c a b1 c1
d e f
i j k i s t i m n
#my code
for i in range(0,len(a)):
    j=i
    try:
        while True:
            if a[j][0] == a[j+1][0]:
                L.append(a[j])
                L.append(a[j+1])
                j=j+2
            else:
                print a[i]
                print L
                break
    except:
        pass
Try this (give it the file as an argument).
Produces a dictionary with the lines you expect.
import sys
if "__main__" == __name__:
    new_lines = dict()
    # start reading file
    with open(sys.argv[1]) as a:
        # iterate file by lines - removing newlines
        for a_line in a.read().splitlines():
            # candidate is first word in every sentence
            candidate = a_line.split()[0] # split on whitespace by default
            # dictionary keys are previous candidates
            if candidate in new_lines.keys():
                # word already included
                old_line = new_lines[candidate]
                new_lines[candidate] = "%s %s" % (old_line, a_line)
            else:
                # word not included
                new_lines[candidate] = a_line
    # now we have our dictionary. print it (or do what you want with it)
    for key in new_lines.keys():
        print "%s -> %s" % (key, new_lines[key])
output:
a -> a b c a b1 c1
i -> i j k i s t i m n
d -> d e f
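The output above comes back in arbitrary key order (a, i, d), while the desired output in the question keeps the order of the file. A minimal Python 3 sketch of the same dictionary idea that preserves first-seen order is shown below. It relies on dicts keeping insertion order, which the language guarantees from Python 3.7. The function name merge_by_first_word and the in-memory sample are hypothetical.

```python
def merge_by_first_word(lines):
    """Join lines that share a first word, keeping first-seen order."""
    merged = {}  # regular dicts preserve insertion order in Python 3.7+
    for line in lines:
        words = line.split()
        if not words:
            continue  # ignore blank lines
        # Group each line under its first word.
        merged.setdefault(words[0], []).append(" ".join(words))
    return [" ".join(group) for group in merged.values()]


# Hypothetical sample matching the input shown in the question.
sample = ["a b c", "a b1 c1", "d e f", "i j k", "i s t", "i m n"]
for merged_line in merge_by_first_word(sample):
    print(merged_line)
# a b c a b1 c1
# d e f
# i j k i s t i m n
```

Unlike the loop in the question, this also merges lines with the same first word that are not adjacent in the file.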
