I wrote a Python script to run an SQL query in PostgreSQL:
import sys, os, math
os.chdir(r'C:\Users\Heinz\Desktop')
print os.getcwd()
#set up psycopg2 environment
import psycopg2
#shortest_path module
query = """
select *
from shortest_path ($$
select
gid as id,
source::int4 as source,
target::int4 as target,
cost::double precision as cost,
rcost::double precision as reverse_cost
from network
$$, %s, %s, %s, %s
)
"""
#make connection between python and postgresql
conn = psycopg2.connect("dbname = 'test' user = 'postgres' host = 'localhost' password = 'xxxx'")
cur = conn.cursor()
#count rows in the table
cur.execute("select count(*) from network")
result = cur.fetchone()
k = result[0] + 1 #number of points = number of segments + 1
#run loops
#import csv module
import csv
import tempfile
element = []
i = 1
l = 1
filename = 'pi_value.csv'
with open(filename, 'wb') as f:
    while i <= k:
        while l <= k:
            cur.execute(query, (i, l, True, True))
            element = cur.fetchall()
            product = sum([a[-1] for a in element[:-1]])
            writer = csv.writer(f, delimiter = ',')
            writer.writerow([product])
            element = []
            l = l + 1
        l = 1
        i = i + 1
As you can see, I use the counters i and l (both running from 1 to k) to drive the while loops. Now I have a csv file containing the numbers I want i and l to take instead. For example, here's the csv file,
I want the loops to step through the number in each row, starting from the first one. In the innermost while loop that means l = 6, l = 31, l = 28, ..., l = 17. i also starts at 6, and only moves on to i = 31 once l has reached 17 and gone back to l = 6, and so on.
How do I write the additional lines to read this csv file and make the while loops run over the numbers in the file?
Update#1
I tried this,
element = []
with open('tc_sta_id.csv') as f1, open('pi_value.csv', 'wb') as f2:
    csvs = csv.reader(f1)
    col_num = 0
    rows = list(csvs)
    k = len(rows)
    for row in csvs:
        i = row[col_num]
        l = row[col_num]
        while i <= k:
            while l <= k:
                cur.execute(query, (i, l, True, True))
                element = cur.fetchall()
                product = sum([a[-1] for a in element[:-1]])
                writer = csv.writer(f2, delimiter = ',')
                writer.writerow([product])
                element = []
                l = l + 1
            l = row[col_num]
            i = i + 1
The script runs without errors, but the output csv file is completely blank. Please give me suggestions to fix this!
Since your question has changed quite a bit since the start, I'm adding this as a separate answer. It is an answer specifically to your update 1.
The condition for your while loop is wrong. It is based on the number of rows in your csv (8 in your example), but you compare that with the numbers found in the csv (6, 31, ...). This means your while loops stop every time you hit the second number (31 > 8). Moreover, you're not jumping to the next element of your csv; you just add 1. I haven't tried to run your code, but I think you're looping over i=6,7,8 with l=6,7,8 for each value of i. Then it tries with 31 and stops immediately, as it does with the rest (they're all over 8).
I'm not entirely sure what you want: you seem to keep wanting to use extra while loops for something, and I'm not sure what for (I can't find it in your question; everything in your question implies for loops only).
I'm also not sure whether i and l come from the same csv or not. I made you a solution where you can easily make i and l come from different csvs, but I set them at the beginning to come from the same one. If they come from the same csv, you cannot just nest the for loops with the same iterator, so we cheat and extract the rows into a list (I tested this with a simple example).
rows = list(csvs) #convert to a list to avoid problems with iterating over the same iterator
csv_for_i = rows
csv_for_l = rows
for row_i in csv_for_i:
    i = row_i[col_num]
    for row_l in csv_for_l:
        l = row_l[col_num]
        cur.execute(query, (i, l, True, True))
        element = cur.fetchall()
        product = sum([a[-1] for a in element[:-1]])
        writer = csv.writer(f2, delimiter = ',')
        writer.writerow([product])
        element = []
Let me know if this works. If so, accept the answer and I'll think about how to turn the question and the answers into something that works more nicely on Stack Overflow. Currently there are multiple questions and answers here, and that's confusing for other people searching for answers.
Just for info, a small example on pitfalls with iterators (made with csv, but it goes for all iterators).
import csv
# test.csv contents:
#
#6
#31
#17
print 'Case 1:'
with open('test.csv') as f1:
    csv1 = csv.reader(f1)
    csv2 = csv.reader(f1)
    for el1 in csv1:
        for el2 in csv2:
            print el1, el2
# Results
#
#['6'] ['31']
#['6'] ['17']
print 'Case 2:'
with open('test.csv') as f1:
    csvs = csv.reader(f1)
    rows = list(csvs)
    for el1 in rows:
        for el2 in rows:
            print el1, el2
# Results
#
#['6'] ['6']
#['6'] ['31']
#['6'] ['17']
#['31'] ['6']
#['31'] ['31']
#['31'] ['17']
#['17'] ['6']
#['17'] ['31']
#['17'] ['17']
print 'Case 3:'
with open('test.csv') as f1, open('test.csv') as f2:
    for el1 in csv.reader(f1):
        for el2 in csv.reader(f2):
            print el1, el2
# Results
#
#['6'] ['6']
#['6'] ['31']
#['6'] ['17']
print 'Case 4:'
with open('test.csv') as f1, open('test.csv') as f2:
    csv1 = csv.reader(f1)
    csv2 = csv.reader(f2)
    for el1 in csv1:
        for el2 in csv2:
            print el1, el2
# Results
#
#['6'] ['6']
#['6'] ['31']
#['6'] ['17']
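The four cases come down to one rule: a csv.reader is a one-shot iterator over its file, so a nested loop over the same reader or file handle never restarts. Here is a minimal Python 3 sketch of that behaviour. It uses an in-memory io.StringIO in place of test.csv, and the sample data and variable names are illustrative assumptions:

```python
import csv
import io

# In-memory stand-in for test.csv (assumed contents: one number per row).
data = "6\n31\n17\n"

reader = csv.reader(io.StringIO(data))
first_pass = [row[0] for row in reader]   # consumes the reader
second_pass = [row[0] for row in reader]  # nothing left to read

# Materialising the rows once makes them reusable in nested loops.
rows = list(csv.reader(io.StringIO(data)))
pairs = [(a[0], b[0]) for a in rows for b in rows]

print(first_pass)   # ['6', '31', '17']
print(second_pass)  # []
print(len(pairs))   # 9
```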
col_num is the column number in which you have your i values
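with open('yourfile') as file:
    reader = csv.reader(file)  # a separate name, so the csv module is not hidden
    next(reader) # skip the header
    col_num = 0
    rs = []  # collects the query results
    for row in reader:
        i = int(row[col_num])  # values read from a csv are strings
        while i <= k:
            cur.execute(query, (i, 100000000000, True, True))
            rs.append(cur.fetchall())
            i = i + 1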
I made you a short test using just simple python functionality.
f = open('test.csv')
csvlines = f.readlines()
f.close()
numbers = [int(n.split(',')[0]) for n in csvlines]
You might have to replace ',' with ';' or something else depending on the locale settings of your operating system.
Short explanation:
csvlines will contain the rows of your csv as strings, for example ['1,a,some text', '2,b,some other text']. You go through each of those lines and call split on it: '1,a,some text'.split(',') gives ['1','a','some text']. The first column then needs to be cast to an integer because it is still a string.
Use in your code as (edited as question was edited):
for i in numbers:
    if(i<k):
        for l in numbers:
            # not sure what your constraint on k is, but you can stop iterating
            # through the numbers with a simple if
            if(l<k):
                # do work (you can use i and l here, they will automatically
                # take the next value each iteration of the for loop;
                # try print i, l for example): 6,6; 6,31; ...; 6,17; 31,6; 31,31
                pass
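The nested for loops above visit every ordered pair of values from the list. As a hedged Python 3 sketch, the same iteration can be written with itertools.product. The sample numbers and the bound k are made up for illustration, and the database call is replaced by printing the pairs:

```python
import itertools

numbers = [6, 31, 28, 17]  # assumed sample of values read from the csv
k = 30                     # hypothetical upper bound, as in `if(l<k)`

# Keep only the values under the bound, then pair every i with every l.
allowed = [n for n in numbers if n < k]
pairs = list(itertools.product(allowed, repeat=2))

for i, l in pairs:
    # this is where cur.execute(query, (i, l, True, True)) would go
    print(i, l)
```

Filtering first and pairing afterwards gives the same pairs as the two nested if checks, in the same order.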
Related
I started out with a 4d list, something like
tokens = [[[["a"], ["b"], ["c"]], [["d"]]], [[["e"], ["f"], ["g"]],[["h"], ["i"], ["j"], ["k"], ["l"]]]]
So I converted this to a csv file using the code
import csv
def export_to_csv(tokens):
    csv_list = [["A", "B", "C", word]]
    for h_index, h in enumerate(tokens):
        for i_index, i in enumerate(h):
            for j_index, j in enumerate(i):
                csv_list.append([h_index, i_index, j_index, j])
    with open('TEST.csv', 'w') as f:
        # using csv.writer method from CSV package
        write = csv.writer(f)
        write.writerows(csv_list)
But now I want to do the reverse: convert a csv file in this format back to the list format shown above.
Assuming you wanted your csv file to look something like this (there were a couple typos in the posted code):
A,B,C,word
0,0,0,a
0,0,1,b
0,0,2,c
...
here's one solution:
import csv
def import_from_csv(filename):
    retval = []
    with open(filename) as fh:
        reader = csv.reader(fh)
        # discard header row
        next(reader)
        # process data rows
        for (x,y,z,word) in reader:
            x = int(x)
            y = int(y)
            z = int(z)
            retval.extend([[[]]] * (x + 1 - len(retval)))
            retval[x].extend([[]] * (y + 1 - len(retval[x])))
            retval[x][y].extend([0] * (z + 1 - len(retval[x][y])))
            retval[x][y][z] = [word]
    return retval
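A caveat on the padding used above, offered as a side note rather than a correction: multiplying a list that contains lists repeats references to the same inner list. The function works for a file whose indices increase one step at a time, because each call then pads by a single element, but it would share sublists if an index were skipped. A small Python 3 sketch of the difference (the variable names are illustrative):

```python
# Multiplication copies references, not the inner lists themselves.
shared = [[]] * 3
shared[0].append("x")

# A comprehension builds a fresh inner list for every slot.
independent = [[] for _ in range(3)]
independent[0].append("x")

print(shared)       # [['x'], ['x'], ['x']]
print(independent)  # [['x'], [], []]
```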
def import_from_csv(file):
    import ast
    import csv
    data = []
    # Read the CSV file
    with open(file) as fp:
        reader = csv.reader(fp)
        # Skip the first line, which contains the headers
        next(reader)
        for line in reader:
            # Read the first 3 elements of the line
            a, b, c = [int(i) for i in line[:3]]
            # When we read it back, everything comes in as strings. Use
            # `literal_eval` to convert it to a Python list
            value = ast.literal_eval(line[3])
            # Extend the list to accommodate the new element
            if len(data) < a + 1:
                data.append([[[]]])
            if len(data[a]) < b + 1:
                data[a].append([[]])
            if len(data[a][b]) < c + 1:
                data[a][b].append([])
            data[a][b][c] = value
    return data
# Test
assert import_from_csv("TEST.csv") == tokens
First, I'd make writing this construction in a CSV format independent from dimensions:
import csv
def deep_iter(seq):
    for i, val in enumerate(seq):
        if type(val) is list:
            for others in deep_iter(val):
                yield i, *others
        else:
            yield i, val
with open('TEST.csv', 'w') as f:
    csv.writer(f).writerows(deep_iter(tokens))
Next, we can use the lexicographic order of the indices to recreate the structure. All we have to do is sequentially move deeper into the output list according to the indices of a word. We stop at the penultimate index to get the last list, because the last index is pointing only at the place of the word in this list and doesn't matter due to the natural ordering:
with open('TEST.csv', 'r') as f:
    rows = [*csv.reader(f)]
res = []
for r in rows:
    index = r[:-2] # skip the last index and word
    e = res
    while index:
        i = int(index.pop(0)) # get next part of a current index
        if i < len(e):
            e = e[i]
        else:
            e.append([]) # add new record at this level
            e = e[-1]
    e.append(r[-1]) # append the word to the corresponding list
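To check that the two halves really are inverses, here is a hedged Python 3 round-trip sketch. It writes to an in-memory io.StringIO instead of TEST.csv, and the rebuild function is only a name introduced here for the reading loop above:

```python
import csv
import io

def deep_iter(seq):
    """Yield (index, ..., index, word) tuples for an arbitrarily nested list."""
    for i, val in enumerate(seq):
        if isinstance(val, list):
            for others in deep_iter(val):
                yield (i, *others)
        else:
            yield (i, val)

def rebuild(rows):
    """Recreate the nested list from rows that are sorted by their indices."""
    res = []
    for r in rows:
        e = res
        for part in r[:-2]:       # every index except the last one
            i = int(part)
            if i < len(e):
                e = e[i]          # sublist already exists
            else:
                e.append([])      # first visit: open a new sublist
                e = e[-1]
        e.append(r[-1])           # the word goes into the innermost list
    return res

tokens = [[[["a"], ["b"], ["c"]], [["d"]]],
          [[["e"], ["f"], ["g"]], [["h"], ["i"], ["j"], ["k"], ["l"]]]]

buffer = io.StringIO()
csv.writer(buffer).writerows(deep_iter(tokens))
buffer.seek(0)
restored = rebuild(csv.reader(buffer))
print(restored == tokens)  # True
```

The rebuild step relies on the rows being in the order deep_iter produced them; a shuffled file would need sorting by the index columns first.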
I've been trying to figure this out for about a year now and I'm really burnt out on it so please excuse me if this explanation is a bit rough.
I cannot include job data, but it would be accurate to imagine 2 csv files both with the first column populated with values (Serial numbers/phone numbers/names, doesn't matter - just values). Between both csv files, some values would match while other values would only be contained in one or the other (Timmy is in both files and is a match, Robert is only in file 1 and does not match any name in file 2).
I can successfully output a value ONCE when it exists in both csv files (i.e. both files contain "Value78", and the output file contains "Value78" only once).
When I try to tack on an else statement to my if condition, to handle non-matching items, the program will output 1 entry for every item it does not match with (makes 100% sense, matches happen once but every other comparison result besides the match is a non-match).
I cannot envision a structure or method to hold the fields that don't match back so that they can be output once and not overrun my terminal or output file.
My goal is to output two csv files, matches and non-matches, with the non-matches having only one entry per value.
Anyways, onto the code:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
            else:
                pass
The "else: pass" is where the logic of filtering out non-matches is escaping me. Outputting from this else statement will write the non-matching value (len(VList) - 1) times for an iteration that DOES produce 1 match, the entire len(VList) for an iteration with no match. I've tried using a counter and only outputting if the counter equals the len(VList), (incrementing in the else statement, writing output under the scope of the second for loop), but received the same output as if I tried outputting non-matches.
Below is one way you might go about deduplicating and then writing to a file:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
list_of_non_matches = []
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    matched = set()  # values that matched at least one vendor row
    for x in range(len(MyList)):
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                OFile.write(MyList[x][0] + '\n')
                matched.add(MyList[x][0])
            else:
                list_of_non_matches.append(MyList[x][0])
    # Remove duplicates from the non matches, and drop values that did match
    # (a matching value still hits the else branch for every other vendor row)
    new_list = []
    for x in list_of_non_matches:
        if x not in new_list and x not in matched:
            new_list.append(x)
    # Write the new list to a file
    for i in new_list:
        NFile.write(i + '\n')
Does this work?
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,'r') as MFile, \
     open(VENDORUNITS,'r') as VFile, \
     open(MATCHES,'w') as OFile, \
     open(NONMATCHES,'w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    MyVals = [x for x in MyList]
    MyVals = [x[0] for x in MyVals]
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    vVals = [x for x in VList]
    vVals = [x[0] for x in vVals]
    for val in MyVals:
        if val in vVals:
            OFile.write(val + '\n')
        else:
            NFile.write(val + '\n')
#for x in range(len(MyList)):
# for y in range(len(VList)):
# if str(MyList[x][0]) == str(VList[y][0]):
# OFile.write(MyList[x][0] + '\n')
# else:
# pass
Sorry, I had some issues with my PC. I was able to solve my own question the night I posted. The solution I used is so simple I'm kicking myself for not figuring it out way sooner:
import csv
MYUNITS = 'MyUnits.csv'
VENDORUNITS = 'VendorUnits.csv'
MATCHES = 'Matches.csv'
NONMATCHES = 'NonMatches.csv'
with open(MYUNITS,mode='r') as MFile, \
     open(VENDORUNITS,mode='r') as VFile, \
     open(MATCHES,mode='w') as OFile, \
     open(NONMATCHES,mode='w') as NFile:
    MyReader = csv.reader(MFile,delimiter=',',quotechar='"')
    MyList = list(MyReader)
    VendorReader = csv.reader(VFile,delimiter=',',quotechar='"')
    VList = list(VendorReader)
    for x in range(len(MyList)):
        tmp = ''
        for y in range(len(VList)):
            if str(MyList[x][0]) == str(VList[y][0]):
                tmp = '' #Sets to blank so comparison fails, works because break
                OFile.write(MyList[x][0] + '\n')
                break
            else:
                tmp = str(MyList[x][0])
        if tmp != '':
            NFile.write(tmp + '\n')
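For comparison, the nested loops above can be replaced by set membership tests. The following is a hedged Python 3 sketch that works on plain lists of first-column values; split_matches and the sample names are hypothetical, and reading and writing the csv files is left out:

```python
def split_matches(my_values, vendor_values):
    """Return (matches, non_matches) for my_values, each value listed once."""
    vendor = set(vendor_values)   # constant-time membership tests
    seen = set()
    matches, non_matches = [], []
    for value in my_values:
        if value in seen:         # skip repeats within my_values
            continue
        seen.add(value)
        if value in vendor:
            matches.append(value)
        else:
            non_matches.append(value)
    return matches, non_matches

matches, non_matches = split_matches(
    ["Timmy", "Robert", "Value78", "Robert"],
    ["Value78", "Timmy", "Alice"],
)
print(matches)      # ['Timmy', 'Value78']
print(non_matches)  # ['Robert']
```

Each value is classified exactly once, so no temporary flag or break is needed, and the order of the first file is preserved.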
import csv
f = open("savewl_ssj500k22_Minfreq1-lowercaseWords_1.csv", "r")
csvF = csv.reader(f, delimiter="\t")
s = 0
sez = []
sezB = []
for q in f:
    s = s + 1
    if s > 3:
        l = q.split(",")
        x = l[1]
        y = l[0]
        sezB.append(y)
        sezB.append(int(x))
        sez.append(sezB)
print(sez)
f.close()
How can I get this to work so that all rows from the .csv end up in the list sez?
With this code I get a MemoryError.
The file has 77214 lines that look like this: je,17031
On every iteration you append sezB, which itself keeps growing.
So the amount of data in sez grows as O((number of lines)^2).
It looks something like this pattern (just for the explanation):
[[1,2], [1,2,3,4], [1,2,3,4,5,6], .....]
I guess you wanted to reset sezB to [] on every iteration.
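A minimal Python 3 sketch of that fix, assuming comma-separated lines such as je,17031 and three leading lines to skip, as in the question. The read_pairs name, the in-memory sample and its second data row are illustrative only:

```python
import csv
import io

def read_pairs(lines, skip=3):
    """Return [[word, count], ...], ignoring the first `skip` lines."""
    sez = []
    for number, row in enumerate(csv.reader(lines), start=1):
        if number <= skip or len(row) < 2:
            continue
        sez.append([row[0], int(row[1])])  # a fresh two-item list per row
    return sez

sample = io.StringIO("h1\nh2\nh3\nje,17031\nin,9042\n")
pairs = read_pairs(sample)
print(pairs)  # [['je', 17031], ['in', 9042]]
```

Because a new list is built for every row, the result grows linearly with the number of lines.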
Your code can be simplified to
import csv
s = 0
sez = []
sezB = []
with open("savewl_ssj500k22_Minfreq1-lowercaseWords_1.csv", "r") as f:
    csvF = csv.reader(f, delimiter="\t")
    for q in f:
        s += 1
        if s > 3:
            l = q.split(",")
            x, y = l[:2]
            sezB.extend([x, int(y)])
            sez.append(sezB)
print(sez)
As you can see, you constantly add 2 more elements to the sezB list, which is not that much, but you also keep adding the resulting sezB list to the sez list.
So since the file has 77214 lines, sez will need to hold about 6 billion (5962079010) strings, which is way too many to be stored in memory...
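The quoted figure is the sum 2 + 4 + ... + 2*77214, that is, each appended entry counted at the length the list had when it was appended (the three skipped lines are ignored, so the real count is marginally smaller). A quick Python 3 check of the arithmetic:

```python
lines = 77214  # line count stated in the question

# Entry n is counted with the 2*n items the list held when it was appended.
total_items = sum(2 * n for n in range(1, lines + 1))

print(total_items)                         # 5962079010
print(total_items == lines * (lines + 1))  # True
```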
I have this function:
import csv
myfile = r'csvlist.csv'
with open(myfile, 'r', newline='') as f:
    c = csv.reader(f, delimiter=',')
    i = next(c).index('Wasted Years')
    filtering = [row for row in c if row[i] == '25']
    total = sum(float(row["Prices"]) for row in c)
print(filtering, "The total is %s" % total)
The filtering part works well. The total is supposed to iterate over the items in a column and output their sum, but it prints 0 for some reason. Any ideas?
i is the index of an int column that has data in each cell, like 25, 18, 30, etc.; the rows are filtered by a specific number, in this case '25'.
total is supposed to sum everything in the Prices column and output a total; these are float-like records.
First Solution
c is a csv.reader object, which is an iterator: you can iterate through it only once. The first time you iterate through c is when you calculate filtering. After that, c is exhausted. One way to deal with this is to create two identical iterators, c1 and c2:
import itertools
import csv
myfile = r'csvlist.csv'
with open(myfile, 'r', newline='') as f:
    c = csv.reader(f, delimiter=',')
    c1, c2 = itertools.tee(c) # Split into 2 separate iterables
    # Use the first iterable, c1
    i = next(c1).index('Wasted Years')
    filtering = [row for row in c1 if row[i] == '25']
    # Use a different iterable, c2
    next(c2) # Skip the header row
    total = sum(float(row["Prices"]) for row in c2)
print(filtering, "The total is %s" % total)
Second Solution
Another solution is to rewind the file pointer to the beginning before iterating through c the second time:
import csv
myfile = r'csvlist.csv'
with open(myfile, 'r', newline='') as f:
    c = csv.reader(f, delimiter=',')
    i = next(c).index('Wasted Years')
    filtering = [row for row in c if row[i] == '25']
    f.seek(0) # Rewind the file to the beginning
    next(c) # Skip the header row
    total = sum(float(row["Prices"]) for row in c)
print(filtering, "The total is %s" % total)
Third Solution
I found out what you and I did wrong: The first time calculating the filtering, we use csv.reader, but the second time when calculating the total, we treated the reader as if it was a csv.DictReader. Let's use csv.DictReader all the way through:
import csv
myfile = r'csvlist.csv'
with open(myfile, 'r', newline='') as f:
    c = csv.DictReader(f)
    filtering = [row for row in c if row['Wasted Years'] == '25']
    # Rewind and skip header
    f.seek(0)
    next(f)
    total = sum(float(row["Prices"]) for row in c)
print(filtering, "The total is %s" % total)
I have a feeling that you want to calculate the total from filtering, not from all the csv rows. If that is the case:
total = sum(float(row["Prices"]) for row in filtering) # filtering, not c
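An alternative to rewinding is to read the rows into a list once and reuse that list. The following is a hedged Python 3 sketch. Only the column names Wasted Years and Prices come from the question; the Name column and the sample rows are invented for illustration:

```python
import csv
import io

# Hypothetical sample data standing in for csvlist.csv.
sample = io.StringIO(
    "Name,Wasted Years,Prices\n"
    "a,25,10.5\n"
    "b,18,3.0\n"
    "c,25,4.5\n"
)

rows = list(csv.DictReader(sample))  # read once, reuse freely
filtering = [row for row in rows if row["Wasted Years"] == "25"]
total_all = sum(float(row["Prices"]) for row in rows)
total_filtered = sum(float(row["Prices"]) for row in filtering)

print(total_all)       # 18.0
print(total_filtered)  # 15.0
```

A list can be iterated any number of times, so neither f.seek(0) nor a second reader is needed. For very large files the rewinding approach uses less memory.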
I have a delimited text file. For each line, I am trying to build every pairwise combination of its items and tag both items of each pair with the line number.
Here is an example (you can download it here too if you want https://gist.github.com/anonymous/4107418c63b88c6da44281a8ae7a321f)
"A,B "
"AFD,DNGS,SGDH "
"NHYG,QHD,lkd,uyete"
"AFD,TTT"
I want to have it like this
A_1 B_1
AFD_2 DNGS_2
AFD_2 SGDH_2
DNGS_2 SGDH_2
NHYG_3 QHD_3
NHYG_3 lkd_3
NHYG_3 uyete_3
QHD_3 lkd_3
QHD_3 uyete_3
lkd_3 uyete_3
AFD_4 TTT_4
That is, A_1 and B_1 come from the first row,
AFD_2 and DNGS_2 come from the second row, and so on.
I have tried to do it, but I cannot figure it out:
#!/usr/bin/python
import itertools
# make my output
out = {}
# give a name to my data
file_name = 'data.txt'
# read all the lines
for n, line in enumerate(open(file_name).readlines()):
    # split each line by tab
    item1 = line.split('\t')
    # split each string from the others by a comma
    item2 = item1.split(',')
    # iterate over all combinations of 2 strings
    for i in itertools.combinations(item2,2):
        # save the data into out
        out.write('\t'.join(i))
Output Answer 1
"A_1, B "_1
"AFD_2, DNGS_2
"AFD_2, SGDH "_2
DNGS_2, SGDH "_2
"NHYG_3, QHD_3
"NHYG_3, lkd_3
"NHYG_3, uyete"_3
QHD_3, lkd_3
QHD_3, uyete"_3
lkd_3, uyete"_3
"AFD_4, TTT"_4
answer 2
"A_1 B "_1
"AFD_2 DNGS_2
"AFD_2 SGDH "_2
DNGS_2 SGDH "_2
"NHYG_3 QHD_3
"NHYG_3 lkd_3
"NHYG_3 uyete"_3
QHD_3 lkd_3
QHD_3 uyete"_3
lkd_3 uyete"_3
"AFD_4 TTT"_4
Try this
#!/usr/bin/python
from itertools import combinations
with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = line.strip().split(',')
        x = [['%s_%d' % (x, n) for x in item] for item in combinations(items, 2)]
        result.append(x)
for res in result:
    for elem in res:
        print(',\t'.join(elem))
You need a list of lists of lists to represent each pair. You can build them using a list comprehension in a loop.
I wasn't sure what you wanted as your actual output format, but this prints your expected output.
If there are quotes in the input file, the simple fix is
items = line.replace("\"", "").strip().split(',')
for the above code. This would break if there were other double quotes in the data, so it is only OK if you know there aren't any.
Otherwise, create a small function to strip the quotes. This example also writes to a file.
#!/usr/bin/python
from itertools import combinations
def remquotes(s):
    beg, end = 0, len(s)
    if s[0] == '"': beg = 1
    if s[-1] == '"': end = -1
    return s[beg:end]
with open('data1.txt') as f:
    result = []
    for n, line in enumerate(f, start=1):
        items = remquotes(line.strip()).strip().split(',')
        x = [['%s_%d' % (x, n) for x in item] for item in combinations(items, 2)]
        result.append(x)
with open('out.txt', 'w') as fout:
    for res in result:
        for elem in res:
            linestr = ',\t'.join(elem)
            print(linestr)
            fout.write(linestr + '\n')
Similar to the other answer, with one addition: based on the comments, it looks like you actually want to write to a tab-delimited text file rather than to a dictionary.
#!/usr/bin/python
import itertools
file_name = 'data.txt'
out_file = 'out.txt'
with open(file_name) as infile, open(out_file, "w") as out:
    for n,line in enumerate(infile):
        row = [i + "_" + str(n+1) for i in line.strip().split(",")]
        for i in itertools.combinations(row,2):
            out.write('\t'.join(i) + '\n')
The following seems to work with a minimal amount of code:
import itertools
input_filename = 'data.txt'
output_filename = 'split_data.txt'
with open(input_filename, 'rt') as inp, open(output_filename, 'wt') as outp:
    for n, line in enumerate(inp, 1):
        items = ('{}_{}'.format(x.strip(), n)
                 for x in line.replace('"', '').split(','))
        for combo in itertools.combinations(items, 2):
            outp.write('\t'.join(combo) + '\n')
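The same logic can be wrapped in a function so that it is checkable against the sample from the question without touching any files. This is a hedged Python 3 sketch; labelled_pairs is a name introduced here, and it assumes the quotes only ever wrap whole lines:

```python
import itertools

def labelled_pairs(lines):
    """Yield 'X_n<TAB>Y_n' for every pair of items on line n (1-based)."""
    for n, line in enumerate(lines, start=1):
        # Drop the surrounding quotes, then trim each comma-separated item.
        items = [x.strip() for x in line.replace('"', '').split(',')]
        items = ['{}_{}'.format(x, n) for x in items if x]
        for combo in itertools.combinations(items, 2):
            yield '\t'.join(combo)

sample = ['"A,B "\n', '"AFD,DNGS,SGDH "\n', '"NHYG,QHD,lkd,uyete"\n', '"AFD,TTT"\n']
result = list(labelled_pairs(sample))
print(repr(result[0]))  # 'A_1\tB_1'
print(len(result))      # 11
```

Because the function accepts any iterable of lines, an open file object can be passed in directly, and the yielded strings can be written out with a trailing newline each.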