I have a text file in the following format:
1,"20130219111529","90UP:34","0000","9999","356708","2"
"-2","20130219105824","0001:11","0000","","162_005",""
I want to compare row 1 and row 2 (In this case 1 and -2) for some purpose. To strip out all the quotes and parse this file I have the following code:
if os.path.exists(FileName):
    with open(FileName) as File:
        for row in csv.reader(File, delimiter=',', skipinitialspace=True):
            print(row)
The following is the output:
['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
I want to iterate through the columns. For example, iterate through '1' then '-2' and so on.
How do I go about doing this?
Use zip(). It turns two iterables into one iterable of tuples, pairing up the elements at the same position in each list.
l1 = ['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
l2 = ['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
for elem1, elem2 in zip(l1, l2):
    print("elem1 is {0} and elem2 is {1}.".format(elem1, elem2))
Perhaps the following, which keeps the previous row around so that each row can be compared with the one before it:
if os.path.exists(FileName):
    with open(FileName) as File:
        lastRow = []
        # loop over the lines in the file
        for row in csv.reader(File, delimiter=',', skipinitialspace=True):
            # save the first row, for comparison below
            if lastRow == []:
                lastRow = row
                continue
            # loop over the columns (assumes all rows have the same number of columns)
            for colNum in range(len(row)):
                # compare row[colNum] and lastRow[colNum] as you wish
                pass
            # save this row, to compare with the next row in the loop
            lastRow = row
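As a minimal sketch of the same idea, the comparison can be done by pairing each row with the next one via zip(rows, rows[1:]). The helper name differing_columns and the in-memory SAMPLE (standing in for the real file) are assumptions made for illustration; "compare" is taken here to mean simple string inequality.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''1,"20130219111529","90UP:34","0000","9999","356708","2"
"-2","20130219105824","0001:11","0000","","162_005",""
'''


def differing_columns(prev_row, row):
    """Return the indices of the columns whose values differ between two rows."""
    return [i for i, (a, b) in enumerate(zip(prev_row, row)) if a != b]


rows = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))

# Pair every row with the row that follows it and compare them column by column.
diffs = [differing_columns(prev, cur) for prev, cur in zip(rows, rows[1:])]
print(diffs)  # [[0, 1, 2, 4, 5, 6]]
```

Only column 3 ('0000' in both rows) is identical, so every other index is reported.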
Just print the first element of each row:
for row in csv.reader(File, delimiter=',', skipinitialspace=True):
    print(row[0])
EDIT
# csv.reader returns an iterator with no len(), so read the rows into a list first
rows = list(csv.reader(File, delimiter=',', skipinitialspace=True))
print(len(rows))  # how many rows were read from the file
for row in rows:
    print(row[0])
If you want to iterate through the columns (as you said in the question, though I'm not sure that is what you meant), you can do the following:
if os.path.exists(file_name):
    with open(file_name) as csv_file:
        for columns in zip(*csv.reader(csv_file, delimiter=',', skipinitialspace=True)):
            print(columns)
This will output the following:
('1', '-2')
('20130219111529', '20130219105824')
('90UP:34', '0001:11')
('0000', '0000')
('9999', '')
('356708', '162_005')
('2', '')
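Once the rows are transposed, a single column can be picked out and converted for comparison. The sketch below assumes the first column always holds integers (true for the two sample rows, not guaranteed for a real file) and uses an in-memory SAMPLE in place of the file.

```python
import csv
import io

# Sample data copied from the question; io.StringIO stands in for the real file.
SAMPLE = '''1,"20130219111529","90UP:34","0000","9999","356708","2"
"-2","20130219105824","0001:11","0000","","162_005",""
'''

rows = csv.reader(io.StringIO(SAMPLE), skipinitialspace=True)

# zip(*rows) unpacks the rows as separate arguments, so each tuple is one column.
columns = list(zip(*rows))

# Assumption: every value in the first column is an integer literal.
first_column = [int(value) for value in columns[0]]
print(first_column)  # [1, -2]
```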
I have been crunching on this for a long time. Is there an easy way, using NumPy or pandas or by fixing my code, to get the unique values for each column in a row, where the values are separated by "|"?
I.e., the data:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"
Output should be:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
My broken code:
import csv
import pprint
your_list = csv.reader(open('out.csv'))
your_list = list(your_list)
#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1
It outputs:
id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,
Thanks in advance!
NumPy/pandas is a bit overkill for what you can achieve with csv.DictReader and csv.DictWriter plus a collections.OrderedDict, e.g.:
import csv
from collections import OrderedDict
# If using Python 2.x, use open('output.csv', 'wb') instead
with open('input.csv') as fin, open('output.csv', 'w') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        for k, v in row.items():
            # OrderedDict.fromkeys drops duplicates but keeps first-seen order
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)
Gives you:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
If you don't care about the order of the items separated by |, this will work:
lst = ["id","fname","lname","education","gradyear","attributes",
       "1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL",
       "2","john","doe","htw","2000","dev"]
def no_duplicate(string):
    return "|".join(set(string.split("|")))
# map() returns an iterator on Python 3, so wrap it in list() before printing
result = list(map(no_duplicate, lst))
print(result)
Result (the order within each field can vary, since sets are unordered):
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes', '1', 'john', 'smith', 'ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', '2', 'john', 'doe', 'htw', '2000', 'dev']
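If the arbitrary ordering of set() is a problem, two small variations are possible. The sketch below is illustrative only: the helper names dedupe_keep_order and dedupe_sorted are made up for this example, and the first one relies on dict preserving insertion order, which holds on Python 3.7 and later.

```python
def dedupe_keep_order(field, sep="|"):
    """Drop repeated values, keeping the first occurrence of each."""
    # dict preserves insertion order on Python 3.7+, so it acts as an ordered set.
    return sep.join(dict.fromkeys(field.split(sep)))


def dedupe_sorted(field, sep="|"):
    """Drop repeated values and sort the survivors for a repeatable result."""
    return sep.join(sorted(set(field.split(sep))))


fields = [
    "mit|harvard|ft|ft|ft",
    "2003|207|212|212|212",
    "qa|admin,co|master|NULL|NULL",
    "htw",
]

kept = [dedupe_keep_order(f) for f in fields]
ordered = [dedupe_sorted(f) for f in fields]
print(kept)     # ['mit|harvard|ft', '2003|207|212', 'qa|admin,co|master|NULL', 'htw']
print(ordered)  # ['ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', 'htw']
```

The first variant matches the output requested in the question; the second sorts the values as strings, so uppercase letters come before lowercase ones.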
I want to turn a CSV file into a list of lists, with one list per column. For example:
6,2,4
5,2,3
7,3,6
into
[[6,5,7],[2,2,3],[4,3,6]]
I've only managed to open the file and print it as rows:
with open(input,'rb') as csvfile:
    csv_file = csv.reader(csvfile)
    header = csv_file.next()
    raw_data = csv_file
If you are sure that every row has the same number of items, you can use zip:
import csv
with open('test.csv') as csvfile:
    rows = csv.reader(csvfile)
    res = list(zip(*rows))
    print(res)
    # [('6', '5', '7'), ('2', '2', '3'), ('4', '3', '6')]
Or, if the rows have different numbers of items:
6,2,4
5,2
7
Use zip_longest and filter out the None padding:
import csv
from itertools import zip_longest
with open('test.txt') as csvfile:
    rows = csv.reader(csvfile)
    res = list(zip_longest(*rows))
    print(res)
    # [('6', '5', '7'), ('2', '2', None), ('4', None, None)]
    # keep only the real values; "is not None" is clearer than filter(None.__ne__, ...)
    res2 = [[x for x in col if x is not None] for col in res]
    print(res2)
    # [['6', '5', '7'], ['2', '2'], ['4']]
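The question asks for lists of numbers, while csv.reader always yields strings. A minimal sketch of the extra conversion step follows, assuming every cell is an integer literal and every row has the same length; SAMPLE stands in for the file.

```python
import csv
import io

# In-memory copy of the question's data; io.StringIO stands in for the file.
SAMPLE = "6,2,4\n5,2,3\n7,3,6\n"

rows = csv.reader(io.StringIO(SAMPLE))

# Transpose with zip(*rows), then convert each cell to int and each column to a list.
columns = [[int(value) for value in column] for column in zip(*rows)]
print(columns)  # [[6, 5, 7], [2, 2, 3], [4, 3, 6]]
```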
You could start by reading it into a list of lists:
from csv import reader as csvreader
with open(input, 'r') as fp:
    reader = csvreader(fp)
    li = list(reader)
Then slice it into a new sequence. I'm sure there are other tricks with itertools, but this is what I came up with:
from itertools import count
def my_gen():
    for i in count():
        try:
            yield [x[i] for x in li]
        except IndexError:
            break
You can now turn the generator into a list, which will have the original columns as rows:
list(my_gen())
Or maybe like this...
from csv import reader
with open('test.csv') as csv_file:
    csv_reader = reader(csv_file)
    rows = list(csv_reader)
    print(rows)
I am trying to import a CSV file while removing the '$' signs from the first column.
Is there any way I can omit the '$' sign with csv.reader?
If not, how can I modify aList to remove the $ signs?
>>> import csv
>>> with open('test.csv', 'rb') as csvfile:
...     reader = csv.reader(csvfile, delimiter=',')
...     for a in reader:
...         print a
...
['$135.20 ', '2']
['$137.20 ', '3']
['$139.20 ', '4']
['$141.20 ', '5']
['$143.20 ', '8']
>>> print(aList)
[['$135.20 ', '2'], ['$137.20 ', '3'], ['$139.20 ', '4'], ['$141.20 ', '5'], ['$143.20 ', '8']]
Ultimately, I would like to prep aList for Numpy functions.
You can modify the first column and accumulate the results in another list:
results = []
for col_a, col_b in reader:
    results.append([col_a[1:], col_b])
This removes the first character from the first column and appends both columns to the list results.
You can do it like this:
for a in reader:
    print(a[0][1:], a[1])
a[0] is the first entry in the row, and a[0][1:] is that entry starting from its second character.
For example:
a = "$123"
print(a[1:])
# prints 123
If you want to modify the list itself, try the following (a csv.reader cannot be indexed, so this works on aList, the list of rows):
for x in range(len(aList)):
    aList[x] = [aList[x][0][1:], aList[x][1]]
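Since the stated goal is numeric work afterwards, the values can be converted to numbers while the '$' is stripped. This is a sketch under assumptions: the helper name parse_price is invented here, the second column is assumed to hold integers, and SAMPLE reproduces only the first three rows of the question's data.

```python
import csv
import io

# First three rows from the question, including the trailing space after each price.
SAMPLE = "$135.20 ,2\n$137.20 ,3\n$139.20 ,4\n"


def parse_price(text):
    """Turn a string such as '$135.20 ' into the float 135.2."""
    # strip() drops surrounding whitespace, lstrip('$') drops the leading dollar sign.
    return float(text.strip().lstrip("$"))


a_list = [
    [parse_price(price), int(count)]
    for price, count in csv.reader(io.StringIO(SAMPLE))
]
print(a_list)  # [[135.2, 2], [137.2, 3], [139.2, 4]]
```

A nested list of numbers like this can be passed directly to numpy.array.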
I want to read a CSV file and store it in a defaultdict(list):
Pos ID Name
1 0001L01 50293
2 0002L01 128864
3 0003L01 172937
4 0004L01 12878
5 0005L01 demo
6 0004L01 12878
7 0004L01 12878
8 0005L01 demo
I would like the IDs to be the keys, with Pos and Name as the values. However, the number of Pos entries per ID varies. For instance, ID 0005L01 has Pos 8 and 5, whereas 0001L01 has only Pos 1. Is there a way to do that?
So far I have:
reader = csv.reader(open("sheet.csv", "rb"))
for row in reader:
    if any(row):
        dlist.append(row)
for k, g in groupby(zip(mylist, itertools.count()), key=lambda x: x[0][1]):
    map(lambda x: d[k].append((x[0][0], x[1], x[0][2])), g)
You can use the dict.setdefault method to build the expected dictionary:
import csv
d = {}
with open('my_file.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ')
    next(spamreader, None)  # skip the "Pos ID Name" header row
    for row in spamreader:
        try:
            Pos, ID, Name = row
            d.setdefault(ID, []).append([Pos, Name])
        except ValueError:
            # skip rows that do not have exactly three fields
            continue
result:
{'0001L01': [['1', '50293']],
'0003L01': [['3', '172937']],
'0002L01': [['2', '128864']],
'0005L01': [['5', 'demo'], ['8', 'demo']],
'0004L01': [['4', '12878'], ['6', '12878'], ['7', '12878']]}
As @tobias_k says, if your file has no Pos column, you can use enumerate to create it manually:
import csv
d = {}
with open('my_file.csv', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=' ')
    for Pos, row in enumerate(spamreader, 1):
        try:
            ID, Name = row
            d.setdefault(ID, []).append([Pos, Name])
        except ValueError:
            # skip rows that do not have exactly two fields
            continue
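Because the question asks specifically for a defaultdict(list), here is a sketch of the same grouping with collections.defaultdict instead of dict.setdefault. It assumes fields are separated by exactly one space and that the first line is a header; SAMPLE and the variable names are illustrative.

```python
import csv
import io
from collections import defaultdict

# In-memory copy of the question's data; io.StringIO stands in for the file.
SAMPLE = """Pos ID Name
1 0001L01 50293
2 0002L01 128864
3 0003L01 172937
4 0004L01 12878
5 0005L01 demo
6 0004L01 12878
7 0004L01 12878
8 0005L01 demo
"""

groups = defaultdict(list)  # a missing key starts out as an empty list
reader = csv.reader(io.StringIO(SAMPLE), delimiter=" ")
next(reader, None)  # skip the header row

for row in reader:
    if len(row) != 3:
        continue  # ignore blank or malformed lines
    pos, id_, name = row
    groups[id_].append((pos, name))

print(groups["0005L01"])  # [('5', 'demo'), ('8', 'demo')]
```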
I have been struggling with this for hours. I want to print line 2 and line 3, make a new line, and print line 3 and 4 + newline, next line 4 and 5 +\n ... actually the whole script does more things, but this is the key step I am struggling with.
This is my csv file
ids,CLSZ_0.7,CLID_0.7
ZINC04474603,48,45
ZINC12496548,48,45
ZINC12495776,48,45
ZINC04546442,48,45
ZINC28631806,48,45
This is my code:
ifile = 'bin_503_07.csv'
with open(ifile, 'rb') as f:
    object = csv.reader(f)
    object.next() #skips first line
    for row in object:
        next_row = object.next()
        print row
        print next_row
        print "\n"
This is what I get as a result (basically the original .csv, but with a blank line inserted after every two rows). Instead, I need every new pair of lines to begin with the second row of the previous pair.
['ZINC04474603', '48', '45']
['ZINC12496548', '48', '45']
['ZINC12495776', '48', '45']
['ZINC04546442', '48', '45']
['ZINC28631806', '48', '45']
['ZINC08860448', '48', '45']
['ZINC04655414', '48', '45']
['ZINC08860490', '48', '45']
Any help is greatly appreciated.
You can do it like this:
ifile = 'bin_503_07.csv'
with open(ifile, newline='') as f:
    reader = csv.reader(f)
    next(reader)  # skips the header line
    previous_row = next(reader)  # load the first actual line
    for row in reader:  # the for loop already calls next() for you
        print(previous_row)
        print(row)
        print()  # an empty print() already emits a newline
        previous_row = row  # "advance" by replacing previous with current
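Alternatively, use itertools.tee to walk over the file with two staggered iterators:
from itertools import tee
with open('path/to/file') as infile:
    a, b = tee(infile)
    next(b, None)  # advance the second iterator by one line
    for line1, line2 in zip(a, b):
        print(line1.rstrip('\n'))
        print(line2.rstrip('\n'))
        print()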
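The tee approach also works on parsed rows rather than raw lines. The sketch below wraps it in a small helper and skips the header first; the name pairwise is chosen for this example (Python 3.10 and later ship an equivalent itertools.pairwise), and SAMPLE holds only the first three data rows from the question.

```python
import csv
import io
from itertools import tee

# Header plus the first three data rows from the question.
SAMPLE = """ids,CLSZ_0.7,CLID_0.7
ZINC04474603,48,45
ZINC12496548,48,45
ZINC12495776,48,45
"""


def pairwise(iterable):
    """Yield overlapping pairs: (r0, r1), (r1, r2), (r2, r3), ..."""
    first, second = tee(iterable)
    next(second, None)  # shift the second iterator one item ahead
    return zip(first, second)


reader = csv.reader(io.StringIO(SAMPLE))
next(reader, None)  # skip the header row

# Keep just the id of each row so the overlap between pairs is easy to see.
pairs = [(a[0], b[0]) for a, b in pairwise(reader)]
print(pairs)
# [('ZINC04474603', 'ZINC12496548'), ('ZINC12496548', 'ZINC12495776')]
```

Each pair starts with the second row of the previous pair, which is the overlap the question asks for.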