This is my current code:
import csv
data = {'name' : ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}
new_data = []
for row in data:
    new_row = {}
    for item in row:
        new_row[item['name']] = item['name']
    new_data.append(new_row)
print(new_data)
header = new_data[0].keys()
print(header)
with open('output.csv', 'w') as fh:
    csv_writer = csv.DictWriter(fh, header)
    csv_writer.writeheader()
    csv_writer.writerows(new_data)
What I am trying to achieve is that the dictionary keys become the CSV headers and the values become the rows.
But when I run the code I get TypeError: 'string indices must be integers' on the line new_row[item['name']] = item['name'].
Problem
The issue here is for row in data. This is actually iterating over the keys of your data dictionary, and then you're iterating over the characters of the dictionary keys:
In [2]: data = {'name' : ['Dave', 'Dennis', 'Peter', 'Jess'],
...:         'language': ['Python', 'C', 'Java', 'Python']}
...:
...: new_data = []
...: for row in data:
...:     for item in row:
...:         print(item)
...:
n
a
m
e
l
a
n
g
u
a
g
e
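To see exactly where the TypeError comes from, here is a small sketch that repeats the two loops by hand: iterating the dict yields key strings, iterating a key yields single characters, and indexing a character with 'name' fails just like the original loop body does. The variable names are only for illustration.

```python
data = {'name': ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}

keys = [row for row in data]        # iterating a dict yields its keys: ['name', 'language']
first_item = next(iter(keys[0]))    # iterating a str yields its characters: 'n'

try:
    first_item['name']              # str indexed by a str, as in item['name']
except TypeError as exc:
    error_message = str(exc)

print(keys, first_item, error_message)
```

The exact wording of the message differs slightly between Python versions, but it always starts with "string indices must be integers".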
Approach
What you actually need to do is use zip to capture both the name and favorite language of each person at the same time:
In [43]: for row in zip(*data.values()):
...:     print(row)
...:
('Dave', 'Python')
('Dennis', 'C')
('Peter', 'Java')
('Jess', 'Python')
Now, you need to zip those tuples with the keys from data:
In [44]: header = data.keys()
...: for row in zip(*data.values()):
...:     print(list(zip(header, row)))
...:
[('name', 'Dave'), ('language', 'Python')]
[('name', 'Dennis'), ('language', 'C')]
[('name', 'Peter'), ('language', 'Java')]
[('name', 'Jess'), ('language', 'Python')]
Solution
Now you can pass these tuples to the dict constructor to create your rowdicts which csv_writer.writerows requires:
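import csv

data = {'name': ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}

header = data.keys()
new_data = []
for row in zip(*data.values()):
    new_data.append(dict(zip(header, row)))

# newline="" is what the csv docs recommend; without it Windows gets an extra \r per row
with open("output.csv", "w", newline="") as f_out:
    csv_writer = csv.DictWriter(f_out, header)
    csv_writer.writeheader()
    csv_writer.writerows(new_data)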
Output in output.csv:
name,language
Dave,Python
Dennis,C
Peter,Java
Jess,Python
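Since zip(*data.values()) already yields each row as a tuple in column order, a plain csv.writer can skip the intermediate dicts entirely. A minimal sketch, assuming all column lists have the same length; columns_to_csv is a hypothetical helper name, and it writes to an in-memory io.StringIO buffer instead of output.csv so the result is easy to inspect:

```python
import csv
import io

def columns_to_csv(data):
    """Hypothetical helper: turn a dict of equal-length column lists into CSV text."""
    buf = io.StringIO()  # in-memory stand-in for output.csv
    writer = csv.writer(buf)
    writer.writerow(data.keys())           # header row from the dict keys
    writer.writerows(zip(*data.values()))  # one tuple per person, in column order
    return buf.getvalue()

data = {'name': ['Dave', 'Dennis', 'Peter', 'Jess'],
        'language': ['Python', 'C', 'Java', 'Python']}
print(columns_to_csv(data))
```

To write a real file, open it with open('output.csv', 'w', newline='') and pass that file object to csv.writer instead of the buffer.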
Related
I was wondering whether there is a way to read all columns in a row except the first one as ints using csv.DictReader, something like this:
filename = sys.argv[1]
database = []
with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        row[1:] = int(row[1:])
        database.append(row)
I know this isn't correct, since it raises an error about being unable to hash slices. I have a way to avoid doing this at all, but for future reference: with or without slices, can I selectively work with columns in a row without hardcoding each one?
You can do it by using the keys() dictionary method to get the keys of each row dictionary, and then slicing that list to pick the columns to convert:
import csv
from pprint import pprint
import sys

filename = sys.argv[1]
database = []
with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        for key in list(row.keys())[1:]:
            row[key] = int(row[key])
        database.append(row)
pprint(database)
Output:
[{'name': 'John', 'number1': 1, 'number2': 2, 'number3': 3, 'number4': 4},
{'name': 'Alex', 'number1': 4, 'number2': 3, 'number3': 2, 'number4': 1},
{'name': 'James', 'number1': 1, 'number2': 3, 'number3': 2, 'number4': 4}]
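A variation on the same idea, sketched under the assumption that the first header column is the only text column: DictReader exposes the header as reader.fieldnames, so the columns to convert can be picked once instead of re-slicing the keys of every row. read_rows_with_int_columns and the inline sample are hypothetical:

```python
import csv
import io

def read_rows_with_int_columns(text):
    """Hypothetical helper: convert every column except the first to int."""
    reader = csv.DictReader(io.StringIO(text))
    int_fields = set(reader.fieldnames[1:])  # header names, minus the first column
    return [{k: int(v) if k in int_fields else v for k, v in row.items()}
            for row in reader]

sample = "Letter,Num1,Num2\nA,123,456\nB,789,012\n"
print(read_rows_with_int_columns(sample))
```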
Use this:
import csv

filename = 'test.csv'
database = []
with open(filename) as file:
    reader = csv.DictReader(file)
    for row in reader:
        new_d = {}  # Create new dictionary to be appended into database
        for i, (k, v) in enumerate(row.items()):  # Loop through items of the row (i = index, k = key, v = value)
            new_d[k] = int(v) if i > 0 else v  # If i > 0, add int(v) to the dictionary, else add v
        database.append(new_d)  # Append to database
print(database)
test.csv:
Letter,Num1,Num2
A,123,456
B,789,012
C,345,678
D,901,234
E,567,890
Output:
[{'Letter': 'A', 'Num1': 123, 'Num2': 456},
{'Letter': 'B', 'Num1': 789, 'Num2': 12},
{'Letter': 'C', 'Num1': 345, 'Num2': 678},
{'Letter': 'D', 'Num1': 901, 'Num2': 234},
{'Letter': 'E', 'Num1': 567, 'Num2': 890}]
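If the column names are not needed in every row, the slice from the question does work with a plain csv.reader, because it yields lists rather than dicts. A minimal sketch; read_rows_as_lists and the sample text are hypothetical:

```python
import csv
import io

def read_rows_as_lists(text):
    """Hypothetical helper: keep column 0 as text, convert the rest to int."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)  # first line holds the column names
    # csv.reader yields lists, so row[1:] slicing works here
    rows = [[row[0]] + [int(v) for v in row[1:]] for row in reader]
    return header, rows

header, rows = read_rows_as_lists("Letter,Num1,Num2\nA,123,456\nB,789,012\n")
print(header, rows)
```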
As part of an application that reads CSV files, built with tkinter and the tksheet library, I would like the lines retrieved from my spreadsheet, each already converted into a list, to be grouped into a single nested list.
I currently have the following code:
with open('my_csv_file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row != '' and row[0].isdigit():
            liste = [row[0], row[1], row[2], row[3], row[4]]
            print(liste)
Output :
['AAA', 'AAA', 'AAA', 'AAA', 'AAA']
['BBB', 'BBB', 'BBB', 'BBB', 'BBB']
['CCC', 'CCC', 'CCC', 'CCC', 'CCC']
What I want:
[['AAA', 'AAA', 'AAA', 'AAA', 'AAA'],
['BBB', 'BBB', 'BBB', 'BBB', 'BBB'],
['CCC', 'CCC', 'CCC', 'CCC', 'CCC']]
Append each row to an outer list to build the nested list:
import csv

lst = []
with open('my_csv_file.csv') as csvfile:
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        # `row` is a list, so test it for emptiness instead of comparing it to ''
        if row and row[0].isdigit():
            lst.append([row[0], row[1], row[2], row[3], row[4]])
print(lst)
You can also extend the outer list with +=, wrapping each row in another list:
with open('my_csv_file.csv') as csvfile:
    res = []
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row and row[0].isdigit():
            res += [[row[0], row[1], row[2], row[3], row[4]]]
Or you can use append:
with open('my_csv_file.csv') as csvfile:
    res = []
    reader = csv.reader(csvfile, delimiter=';')
    for row in reader:
        if row and row[0].isdigit():
            res.append([row[0], row[1], row[2], row[3], row[4]])
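The same loop also fits in a single list comprehension, with row[:5] standing in for the five explicit indexes. A sketch using a made-up in-memory sample instead of my_csv_file.csv; rows_as_nested_list is a hypothetical name:

```python
import csv
import io

def rows_as_nested_list(text):
    """Hypothetical helper: collect matching rows into one nested list."""
    reader = csv.reader(io.StringIO(text), delimiter=';')
    # csv.reader yields [] for blank lines, which `row` filters out; row[:5] keeps five fields
    return [row[:5] for row in reader if row and row[0].isdigit()]

sample = "1;a;b;c;d;extra\n\nx;not;numeric\n2;e;f;g;h\n"
print(rows_as_nested_list(sample))
```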
I'm trying to append values to a CSV file by looping through a list and then through the dicts of some JSON data, but I get an IndexError:
import csv
data=json_resp.json()
with open('Meteroits.csv','w+') as file:
    writer=csv.DictWriter(file,fieldnames=['name','id','nametype','recclass','mass','fall','year','reclat','reclong','geolocation'])
    writer.writeheader()
    for i in range(len(data)):
        for x in data[i]:
            name=x[1]
            i_d=x[2]
            nametype=x[3]
            recclass=x[4]
            mass=x[5]
            fall=x[6]
            year=x[7]
            reclat=x[8]
            reclong=x[9]
            geolocation=x[10]
            writer.writerow({'name':name,'id':i_d,'nametype':nametype,'recclass':recclass,'mass':mass,'fall':fall,'year':year,'reclat':reclat,'reclong':reclong,'geolocation':geolocation})
I'm getting the error at index 4:
---> 12 recclass=x[4]
IndexError: string index out of range
And here's a sample of data:
{
'name': 'Abee',
'id': '6',
'nametype': 'Valid',
'recclass': 'EH4',
'mass': '107000',
'fall': 'Fell',
'year': '1952-01-01T00:00:00.000',
'reclat': '54.216670',
'reclong': '-113.000000',
'geolocation': {'latitude': '54.21667', 'longitude': '-113.0'}
}
The problem is:
i is an index
data[i] is one item, a dict
x is a key of the data[i] dict, the first one being 'name'
x[0] is the letter 'n', ..., x[3] is the letter 'e', so x[4] is out of range
Just write each dict as it is:
import csv

with open('Meteroits.csv', 'w', newline='') as file:
    writer = csv.DictWriter(file,
                            fieldnames=['name', 'id', 'nametype', 'recclass', 'mass',
                                        'fall', 'year', 'reclat', 'reclong', 'geolocation'])
    writer.writeheader()
    for item in data:
        writer.writerow(item)
Or use pandas:
import pandas as pd
df = pd.DataFrame(data)
df.to_csv('Meteroits.csv', index=False)
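One thing to watch with the DictWriter approach: by default writerow raises ValueError if a record has a key that is not in fieldnames, and a nested value such as geolocation is written as its str() form. A sketch, assuming some records may carry extra keys (the extra_key below is made up); records_to_csv is a hypothetical helper that writes to an io.StringIO buffer:

```python
import csv
import io

FIELDS = ['name', 'id', 'nametype', 'recclass', 'mass',
          'fall', 'year', 'reclat', 'reclong', 'geolocation']

def records_to_csv(records):
    """Hypothetical helper: write a list of dicts as CSV text."""
    buf = io.StringIO()
    # extrasaction='ignore' drops keys missing from FIELDS instead of raising ValueError
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(records)
    return buf.getvalue()

sample = [{'name': 'Abee', 'id': '6', 'nametype': 'Valid', 'recclass': 'EH4',
           'mass': '107000', 'fall': 'Fell', 'year': '1952-01-01T00:00:00.000',
           'reclat': '54.216670', 'reclong': '-113.000000',
           'geolocation': {'latitude': '54.21667', 'longitude': '-113.0'},
           'extra_key': 'not in FIELDS'}]  # hypothetical extra key
out = records_to_csv(sample)
print(out)
```

If latitude and longitude should be separate columns, the geolocation dict would need to be flattened into its own keys before writing.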
The error is that x is the key of the dict, and not the dict itself. It should be:
for i in range(len(data)):
    new_dict = {}
    for x, val in data[i].items():
        new_dict[x] = val
    writer.writerow(new_dict)
I've been stuck on this for a long time. Is there an easy way, using NumPy or pandas or by fixing my code, to keep only the unique values in each column of a row, where the values are separated by "|"?
For example, given this data:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL"
"2","john","doe","htw","2000","dev"
Output should be:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
My broken code:
import csv
import pprint
your_list = csv.reader(open('out.csv'))
your_list = list(your_list)
#pprint.pprint(your_list)
string = "|"
cols_no=6
for line in your_list:
    i=0
    for col in line:
        if i==cols_no:
            print "\n"
            i=0
        if string in col:
            values = col.split("|")
            myset = set(values)
            items = list()
            for item in myset:
                items.append(item)
            print items
        else:
            print col+",",
        i=i+1
It outputs:
id, fname, lname, education, gradyear, attributes, 1, john, smith, ['harvard', 'ft', 'mit']
['2003', '212', '207']
['qa', 'admin,co', 'NULL', 'master']
2, john, doe, htw, 2000, dev,
Thanks in advance!
NumPy/pandas is a bit overkill for this; csv.DictReader and csv.DictWriter plus a collections.OrderedDict are enough, e.g.:
import csv
from collections import OrderedDict

# If using Python 2.x, use `open('output.csv', 'wb')` instead
with open('input.csv') as fin, open('output.csv', 'w', newline='') as fout:
    csvin = csv.DictReader(fin)
    csvout = csv.DictWriter(fout, fieldnames=csvin.fieldnames, quoting=csv.QUOTE_ALL)
    csvout.writeheader()
    for row in csvin:
        for k, v in row.items():
            # OrderedDict.fromkeys drops repeated items but keeps first-seen order
            row[k] = '|'.join(OrderedDict.fromkeys(v.split('|')))
        csvout.writerow(row)
Gives you:
"id","fname","lname","education","gradyear","attributes"
"1","john","smith","mit|harvard|ft","2003|207|212","qa|admin,co|master|NULL"
"2","john","doe","htw","2000","dev"
If you don't care about the order when you have many items separated with |, this will work:
lst = ["id","fname","lname","education","gradyear","attributes",
       "1","john","smith","mit|harvard|ft|ft|ft","2003|207|212|212|212","qa|admin,co|master|NULL|NULL",
       "2","john","doe","htw","2000","dev"]

def no_duplicate(string):
    return "|".join(set(string.split("|")))

result = list(map(no_duplicate, lst))
print(result)
result (the order of items inside each deduplicated field comes from the set and may vary):
['id', 'fname', 'lname', 'education', 'gradyear', 'attributes', '1', 'john', 'smith', 'ft|harvard|mit', '2003|207|212', 'NULL|admin,co|master|qa', '2', 'john', 'doe', 'htw', '2000', 'dev']
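On Python 3.7 and later, plain dicts keep insertion order, so dict.fromkeys gives the order-preserving behaviour of the OrderedDict answer without the extra import, while avoiding the arbitrary ordering of the set-based version. A minimal sketch; dedupe_pipe_field is a hypothetical name:

```python
def dedupe_pipe_field(value):
    """Hypothetical helper: drop repeated '|'-separated items, keeping first occurrences."""
    # dict.fromkeys keeps one key per distinct item, in first-seen order (Python 3.7+)
    return '|'.join(dict.fromkeys(value.split('|')))

print(dedupe_pipe_field('mit|harvard|ft|ft|ft'))
```

It can replace the OrderedDict.fromkeys call inside the DictReader/DictWriter loop above.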
This question already has answers here:
Convert .csv table to dictionary [duplicate]
(4 answers)
Closed 9 years ago.
I have a CSV file which I am opening through this code:
open(file,"r")
When I read the file I get the output:
['hello', 'hi', 'bye']
['jelly', 'belly', 'heli']
['red', 'black', 'blue']
I want output like this:
{'hello': ['jelly', 'red'], 'hi': ['belly', 'black'], 'bye': ['heli', 'blue']}
but I have no idea how to do it.
You can use collections.defaultdict and csv.DictReader:
>>> import csv
>>> from collections import defaultdict
>>> with open('abc.csv') as f:
...     reader = csv.DictReader(f)
...     d = defaultdict(list)
...     for row in reader:
...         for k, v in row.items():
...             d[k].append(v)
...
>>> d
defaultdict(<type 'list'>,
{'hi': ['belly', 'black'],
'bye': ['heli', 'blue'],
'hello': ['jelly', 'red']})
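rows = [
    ['hello', 'hi', 'bye'],
    ['jelly', 'belly', 'heli'],
    ['red', 'black', 'blue'],
]
# zip(*rows) transposes rows into columns (naming this `csv` would shadow the csv module)
columns = zip(*rows)
result = {}
for column in columns:
    result[column[0]] = column[1:]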
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        yourHash[line[0]] = line[1:]
This assumes that each key is unique to one line. If not, this would have to be modified to:
yourHash = {}
with open(yourFile, 'r') as inFile:
    for line in inFile:
        line = line.rstrip().split(',')
        if line[0] in yourHash:
            yourHash[line[0]] += line[1:]
        else:
            yourHash[line[0]] = line[1:]
Of course, you can use csv, but I figured that someone would definitely post that, so I gave an alternative way to do it. Good luck!
You can use csv: read the first line to get the header, create one list per header column, and then build the dict:
import csv

with open(ur_csv) as fin:
    reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
    header=[[head] for head in next(reader)]
    for row in reader:
        for i, e in enumerate(row):
            header[i].append(e)
data={l[0]:l[1:] for l in header}
print(data)
# {'hi': ['belly', 'black'], 'bye': ['heli', 'blue'], 'hello': ['jelly', 'red']}
If you want something more terse, you can use Jon Clements's excellent solution:
with open(ur_csv) as fin:
    csvin = csv.reader(fin, quotechar="'", skipinitialspace=True)
    header = next(csvin, [])
    data=dict(zip(header, zip(*csvin)))
# {'bye': ('heli', 'blue'), 'hello': ('jelly', 'red'), 'hi': ('belly', 'black')}
But that will produce a dictionary of tuples if that matters...
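If lists are needed, a dict comprehension over the same zip(header, zip(*reader)) pairs can convert each column tuple. A minimal sketch, assuming a plain comma-separated file without the quoting options above; columns_as_lists and the inline sample are hypothetical:

```python
import csv
import io

def columns_as_lists(text):
    """Hypothetical helper: map each header to a list of its column values."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader, [])
    # zip(*reader) transposes the remaining rows into columns; list() turns each tuple into a list
    return {key: list(column) for key, column in zip(header, zip(*reader))}

print(columns_as_lists("hello,hi,bye\njelly,belly,heli\nred,black,blue\n"))
```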
And if your CSV file is huge, you may want to rewrite this as a generator that yields one dictionary per row (similar to DictReader):
import csv

def key_gen(fn):
    with open(fn) as fin:
        reader=csv.reader(fin, quotechar="'", skipinitialspace=True)
        header=next(reader, [])
        for row in reader:
            yield dict(zip(header, row))

for e in key_gen(ur_csv):
    print(e)
# {'hi': 'belly', 'bye': 'heli', 'hello': 'jelly'}
# {'hi': 'black', 'bye': 'blue', 'hello': 'red'} etc...
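For comparison, csv.DictReader itself already produces these per-row dictionaries from the header line, so for a plain comma-separated file the generator can often be replaced by it. A sketch using an in-memory sample; rows_as_dicts is a hypothetical name:

```python
import csv
import io

def rows_as_dicts(text):
    """Hypothetical helper: one dict per data row, keyed by the header line."""
    # csv.DictReader reads the first line as field names and yields a dict per later row
    return list(csv.DictReader(io.StringIO(text)))

print(rows_as_dicts("hello,hi,bye\njelly,belly,heli\nred,black,blue\n"))
```

The list() call is only there to make the result easy to print; for a huge file, iterate over the DictReader directly so rows are read lazily.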