In an application built with tkinter and the tksheet library that reads CSV files, each row retrieved from my spreadsheet is turned into its own list. I would like to group all of those lists into a single nested list.
I currently have the following code:
with open('my_csv_file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row != '' and row[0].isdigit():
            liste = [row[0], row[1], row[2], row[3], row[4]]
            print(liste)
Output:
['AAA', 'AAA', 'AAA', 'AAA', 'AAA']
['BBB', 'BBB', 'BBB', 'BBB', 'BBB']
['CCC', 'CCC', 'CCC', 'CCC', 'CCC']
What I want:
[['AAA', 'AAA', 'AAA', 'AAA', 'AAA'],
['BBB', 'BBB', 'BBB', 'BBB', 'BBB'],
['CCC', 'CCC', 'CCC', 'CCC', 'CCC']]
Create an empty list before the loop and append each row to it to build your nested list.
import csv
lst = []
with open('my_csv_file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        # row is a list, so test it for emptiness rather than comparing it with ''
        if row and row[0].isdigit():
            lst.append([row[0], row[1], row[2], row[3], row[4]])
print(lst)
You can also build the nested list with +=, which extends res with a one-element list holding the row:
with open('my_csv_file.csv') as csvfile:
    res = []
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        # row is a list, so test it for emptiness rather than comparing it with ''
        if row and row[0].isdigit():
            res += [[row[0], row[1], row[2], row[3], row[4]]]
Or you can use append:
with open('my_csv_file.csv') as csvfile:
    res = []
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row and row[0].isdigit():
            res.append([row[0], row[1], row[2], row[3], row[4]])
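The same nested list can also be built with a list comprehension. The following is only a minimal sketch: it assumes a semicolon-delimited file whose data rows start with a numeric ID, the sample data and the name collect_rows are made up for illustration, and an in-memory io.StringIO stands in for the real file.

```python
import csv
import io

# Hypothetical sample data standing in for 'my_csv_file.csv'.
SAMPLE = "id;a;b;c;d\n1;AAA;AAA;AAA;AAA\n2;BBB;BBB;BBB;BBB\n\n3;CCC;CCC;CCC;CCC\n"

def collect_rows(csvfile, width=5):
    """Return the first `width` fields of every row whose first field is numeric."""
    reader = csv.reader(csvfile, delimiter=';')
    # A blank line is read as an empty list, which is falsy and gets skipped.
    return [row[:width] for row in reader if row and row[0].isdigit()]

rows = collect_rows(io.StringIO(SAMPLE))
print(rows)
```

Slicing with row[:width] avoids spelling out each index and does not raise if a row happens to be shorter than expected.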
This is my current code:
import csv
data = {'name' : ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}
new_data = []
for row in data:
    new_row = {}
    for item in row:
        new_row[item['name']] = item['name']
    new_data.append(new_row)
print(new_data)
header = new_data[0].keys()
print(header)
with open('output.csv', 'w') as fh:
    csv_writer = csv.DictWriter(fh, header)
    csv_writer.writeheader()
    csv_writer.writerows(new_data)
What I am trying to achieve is that the dictionary keys are turned into the csv headers and the values turned into the rows.
But when running the code I get a TypeError: 'string indices must be integers' in line 21.
Problem
The issue here is for row in data. This is actually iterating over the keys of your data dictionary, and then you're iterating over the characters of the dictionary keys:
In [2]: data = {'name' : ['Dave', 'Dennis', 'Peter', 'Jess'],
   ...:         'language': ['Python', 'C', 'Java', 'Python']}
   ...:
   ...: new_data = []
   ...: for row in data:
   ...:     for item in row:
   ...:         print(item)
   ...:
n
a
m
e
l
a
n
g
u
a
g
e
Approach
What you actually need to do is use zip to capture both the name and favorite language of each person at the same time:
In [43]: for row in zip(*data.values()):
    ...:     print(row)
    ...:
('Dave', 'Python')
('Dennis', 'C')
('Peter', 'Java')
('Jess', 'Python')
Now, you need to zip those tuples with the keys from data:
In [44]: header = data.keys()
    ...: for row in zip(*data.values()):
    ...:     print(list(zip(header, row)))
    ...:
[('name', 'Dave'), ('language', 'Python')]
[('name', 'Dennis'), ('language', 'C')]
[('name', 'Peter'), ('language', 'Java')]
[('name', 'Jess'), ('language', 'Python')]
Solution
Now you can pass these tuples to the dict constructor to create your rowdicts which csv_writer.writerows requires:
header = data.keys()
new_data = []
for row in zip(*data.values()):
    new_data.append(dict(zip(header, row)))
with open("output.csv", "w+", newline="") as f_out:
    csv_writer = csv.DictWriter(f_out, header)
    csv_writer.writeheader()
    csv_writer.writerows(new_data)
Output in output.csv:
name,language
Dave,Python
Dennis,C
Peter,Java
Jess,Python
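Since zip already yields the rows in header order, the intermediate list of dictionaries is optional. As a rough sketch of that variation (not the approach above, which uses csv.DictWriter), the rows can go straight to a plain csv.writer. The function name columns_to_csv is hypothetical, and an io.StringIO buffer stands in for output.csv.

```python
import csv
import io

def columns_to_csv(data, out):
    """Write a dict of equal-length columns to `out` as CSV rows."""
    writer = csv.writer(out)
    writer.writerow(data.keys())           # dictionary keys become the header
    writer.writerows(zip(*data.values()))  # zip transposes columns into rows

data = {'name': ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}

buffer = io.StringIO()
columns_to_csv(data, buffer)
print(buffer.getvalue())
```

Note that zip stops at the shortest column, so this assumes every list in data has the same length.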
I have been stuck on this for a long time. Is there an easy way, using NumPy or pandas or by fixing my code, to keep only the unique values in each "|"-separated column of a row?
For example, given the data:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"
Output should be:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
My broken code:
import csv
import pprint
your_list = csv.reader(open('out.csv'))
your_list = list(your_list)
#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1
It outputs:
id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,
Thanks in advance!
numpy/pandas is a bit overkill for what you can achieve with csv.DictReader and csv.DictWriter plus a collections.OrderedDict, e.g.:
import csv
from collections import OrderedDict
# If using Python 2.x, use open('output.csv', 'wb') instead
with open('input.csv') as fin, open('output.csv', 'w', newline='') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        for k, v in row.items():
            # OrderedDict.fromkeys drops duplicates while keeping first-seen order
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)
Gives you:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
If you don't care about the order of the items separated by |, this will work:
lst = ["id","fname","lname","education","gradyear","attributes",
       "1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL",
       "2","john","doe","htw","2000","dev"]
def no_duplicate(string):
    # set() removes duplicates but does not preserve the original order
    return "|".join(set(string.split("|")))
# In Python 3, map() is lazy, so wrap it in list() before printing
result = list(map(no_duplicate, lst))
print(result)
Result (the order within each joined field may vary):
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes', '1', 'john', 'smith', 'ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', '2', 'john', 'doe', 'htw', '2000', 'dev']
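On Python 3.7 and later, a plain dict preserves insertion order, so dict.fromkeys can remove duplicates while keeping the first occurrence of each item. The following is a small sketch of that idea; the helper name dedupe_field is made up for illustration.

```python
def dedupe_field(field, sep="|"):
    """Drop repeated items from a separator-joined string, keeping first-seen order."""
    # dict keys are unique and (Python 3.7+) keep insertion order.
    return sep.join(dict.fromkeys(field.split(sep)))

row = ["1", "john", "smith", "mit|harvard|ft|ft|ft",
       "2003|207|212|212|212", "qa|admin,co|master|NULL|NULL"]
print([dedupe_field(value) for value in row])
```

Fields without the separator pass through unchanged, so the helper can be applied to every column without checking for "|" first.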
I've been trying to have a program print a list sorted by a requested column. When I read the rows from the CSV file, I'm not sure how to convert only two of the four values to integers. The numbers are treated as strings, so the list doesn't sort properly.
E.g.:
['Jess', 'F', '2009', '6302']
['Kat', 'F', '1999', '6000']
['Alexander', 'M', '1982', '50']
['Bill', 'M', '2006', '2000']
['Jack', 'M', '1998', '1500']
def sortD(choice):
    clear()
    csv1 = csv.reader(open('TestUnsorted.csv', 'r'), delimiter=',')
    sort = sorted(csv1, key=operator.itemgetter(choice))
    for eachline in sort:
        print (eachline)
    open('TestUnsorted.csv', 'r').close()
    #From here up is where I'm having difficulty
    with open('TestSorted.csv', 'w') as csvfile:
        fieldnames = ['Name', 'Gender', 'Year','Count']
        csv2 = csv.DictWriter(csvfile, fieldnames=fieldnames,
                              extrasaction='ignore', delimiter = ';')
        csv2.writeheader()
        for eachline in sort:
            csv2.writerow({'Name': eachline[0] ,'Gender': eachline[1],'Year':eachline[2],'Count':eachline[3]})
            List1.insert(0, eachline)
    open('TestSorted.csv', 'w').close
Here's what my TestUnsorted file looks like:
Jack,M,1998,1500
Bill,M,2006,2000
Kat,F,1999,6000
Jess,F,2009,6302
Alexander,M,1982,50
sort = sorted(csv1, key=lambda ch: (ch[0], ch[1], int(ch[2]), int(ch[3])))
That will sort the last two values as integers.
EDIT:
Upon further reading the question, I realize choice is the index of the list that you want to sort on. You could do this instead:
if choice < 2: # or however you want to determine whether to cast to int
    sort = sorted(csv1, key=operator.itemgetter(choice))
else:
    sort = sorted(csv1, key=lambda ch: int(ch[choice]))
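The two branches can be folded into a single key function. This is only a sketch under the assumption that columns 2 and 3 (Year and Count) always hold integers; the names NUMERIC_COLUMNS and sort_rows are hypothetical, and an in-memory buffer stands in for TestUnsorted.csv.

```python
import csv
import io

# Sample rows from the question; an in-memory buffer stands in for the file.
SAMPLE = ("Jack,M,1998,1500\nBill,M,2006,2000\nKat,F,1999,6000\n"
          "Jess,F,2009,6302\nAlexander,M,1982,50\n")

NUMERIC_COLUMNS = {2, 3}  # assumption: Year and Count hold integers

def sort_rows(rows, choice):
    """Sort rows by column `choice`, comparing numeric columns as integers."""
    def key(row):
        value = row[choice]
        return int(value) if choice in NUMERIC_COLUMNS else value
    return sorted(rows, key=key)

# A csv.reader can only be consumed once, so keep the rows in a list.
rows = list(csv.reader(io.StringIO(SAMPLE)))
for row in sort_rows(rows, 3):
    print(row)
```

The rows themselves still contain strings; only the comparison uses integers, so they can be written back to a CSV file unchanged.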
I have a CSV file which I am opening through this code:
open(file,"r")
When I read the file I get the output:
['hello', 'hi', 'bye']
['jelly', 'belly', 'heli']
['red', 'black', 'blue']
I want the output to look something like this:
{'hello': ['jelly', 'red'], 'hi': ['belly', 'black'], 'bye': ['heli', 'blue']}
but I have no idea how to do it.
You can use collections.defaultdict and csv.DictReader:
>>> import csv
>>> from collections import defaultdict
>>> with open('abc.csv') as f:
...     reader = csv.DictReader(f)
...     d = defaultdict(list)
...     for row in reader:
...         for k, v in row.items():
...             d[k].append(v)
...
>>> d
defaultdict(<type 'list'>,
{'hi': ['belly', 'black'],
 'bye': ['heli', 'blue'],
 'hello': ['jelly', 'red']})
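rows = [
    ['hello', 'hi', 'bye'],
    ['jelly', 'belly', 'heli'],
    ['red', 'black', 'blue'],
]
# Transpose rows into columns (a variable named csv would shadow the csv module)
columns = zip(*rows)
result = {}
for column in columns:
    result[column[0]] = column[1:]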
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        yourHash[line[0]] = line[1:]
This assumes that each key is unique to one line. If not, this would have to be modified to:
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        if line[0] in yourHash:
            yourHash[line[0]] += line[1:]
        else:
            yourHash[line[0]] = line[1:]
Of course, you can use csv, but I figured that someone would definitely post that, so I gave an alternative way to do it. Good luck!
You can use csv, read the first line to get the header, create the number of lists corresponding to the header and then create the dict:
import csv
with open(ur_csv) as fin:
    reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
    header=[[head] for head in next(reader)]
    for row in reader:
        for i, e in enumerate(row):
            header[i].append(e)
data={l[0]:l[1:] for l in header}
print(data)
# {'hi': ['belly', 'black'], 'bye': ['heli', 'blue'], 'hello': ['jelly', 'red']}
If you want something more terse, you can use Jon Clements' excellent solution:
with open(ur_csv) as fin:
    csvin = csv.reader(fin, quotechar="'", skipinitialspace=True)
    header = next(csvin, [])
    data=dict(zip(header, zip(*csvin)))
# {'bye': ('heli', 'blue'), 'hello': ('jelly', 'red'), 'hi': ('belly', 'black')}
But that will produce a dictionary of tuples, if that matters.
And if your CSV file is huge, you may want to rewrite this to generate a dictionary row by row (similar to DictReader):
import csv
def key_gen(fn):
    with open(fn) as fin:
        reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
        header=next(reader, [])
        for row in reader:
            yield dict(zip(header, row))
for e in key_gen(ur_csv):
    print(e)
# {'hi': 'belly', 'bye': 'heli', 'hello': 'jelly'}
# {'hi': 'black', 'bye': 'blue', 'hello': 'red'} etc...
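If lists are needed rather than tuples, the terse zip-based version can be wrapped in a dict comprehension. A minimal sketch, assuming an ordinary comma-separated file with a header row; the function name columns_as_lists is made up, and io.StringIO stands in for the file on disk.

```python
import csv
import io

# Sample content from the question, held in memory instead of a file.
SAMPLE = "hello,hi,bye\njelly,belly,heli\nred,black,blue\n"

def columns_as_lists(csvfile):
    """Map each header name to a list of that column's values."""
    reader = csv.reader(csvfile)
    header = next(reader, [])  # empty input gives an empty header
    # zip(*reader) transposes the remaining rows into columns (tuples).
    return {name: list(column) for name, column in zip(header, zip(*reader))}

data = columns_as_lists(io.StringIO(SAMPLE))
print(data)
```

Because zip(*reader) reads every remaining row at once, this suits files that fit in memory. A file with a header but no data rows yields an empty dictionary.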
I have a text file in the following format:
1,"20130219111529","90UP:34","0000","9999","356708","2"
"-2","20130219105824","0001:11","0000","","162_005",""
I want to compare row 1 and row 2 (In this case 1 and -2) for some purpose. To strip out all the quotes and parse this file I have the following code:
if os.path.exists(FileName):
    with open(FileName) as File:
        for row in csv.reader(File, delimiter= ',', skipinitialspace= True):
            print(row)
The following is the output:
['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
I want to iterate through the columns. For example, iterate through '1' then '-2' and so on.
How do I go about doing this?
Use zip(). It turns two iterables into one iterable of tuples, with elements coming from both lists.
l1 = ['1', '20130219111529', '90UP:34', '0000', '9999', '356708', '2']
l2 = ['-2', '20130219105824', '0001:11', '0000', '', '162_005', '']
for elem1, elem2 in zip(l1, l2):
    print("elem1 is {0} and elem2 is {1}.".format(elem1, elem2))
Perhaps the following:
if os.path.exists(FileName):
    with open(FileName) as File:
        lastRow = []
        # loop over the lines in the file
        for row in csv.reader(File, delimiter= ',', skipinitialspace= True):
            # saves the first row, for comparison below
            if lastRow == []:
                lastRow = row
                continue
            # loop over the columns, if all rows have the same number
            for colNum in range(len(row)):
                # compare row[colNum] and lastRow[colNum] as you wish
                pass
            # save this row, to compare with the next row in the loop
            lastRow = row
Just print the first element of each row:
for row in csv.reader(File, delimiter= ',', skipinitialspace= True):
    print(row[0])
EDIT
# csv.reader returns an iterator with no len(), so read the rows into a list first
rows = list(csv.reader(File, delimiter= ',', skipinitialspace= True))
print(len(rows)) # how many rows were read from the file
for row in rows:
    print(row[0])
If you want to iterate through the columns (as you said in the question, though I'm not sure that is what you meant), you can do the following:
if os.path.exists(file_name):
    with open(file_name) as csv_file:
        for columns in zip(*csv.reader(csv_file, delimiter=',', skipinitialspace=True)):
            print(columns)
This will output the following:
('1', '-2')
('20130219111529', '20130219105824')
('90UP:34', '0001:11')
('0000', '0000')
('9999', '')
('356708', '162_005')
('2', '')
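One caveat with zip is that it stops at the shortest row, so trailing columns are silently dropped when rows differ in length. If that can happen, itertools.zip_longest pads the missing cells instead. The sketch below uses shortened, made-up sample rows of unequal length, and the empty-string fill value is an assumption.

```python
import csv
import io
from itertools import zip_longest

# Two sample rows of unequal length (shortened from the question's data).
SAMPLE = '1,"20130219111529","90UP:34","0000"\n"-2","20130219105824","0001:11"\n'

rows = list(csv.reader(io.StringIO(SAMPLE), skipinitialspace=True))
# Pad short rows with '' so that no column is lost during the transpose.
columns = list(zip_longest(*rows, fillvalue=""))
for column in columns:
    print(column)
```

When every row is known to have the same number of fields, plain zip as shown above is sufficient.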