Write Python dictionary to CSV where keys = columns, values = rows - python

I have a list of dictionaries that I want to be able to open in Excel, formatted correctly. This is what I have so far, using csv:
import csv
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'wb')
writer = csv.writer(out_file, dialect='excel')
for items in list_of_dicts:
    for k, v in items.items():
        writer.writerow([k, v])
Obviously, when I open the output in Excel, it's formatted like this:
key value
key value
What I want is this:
key key key
value value value
I can't figure out how to do this, so help would be appreciated. Also, I want the column names to be the dictionary keys instead of the default 'A, B, C' etc. Sorry if this is stupid.
Thanks

The csv module has a DictWriter class for this, which is covered quite nicely in another SO answer. The critical point is that you need to know all your column headings when you instantiate the DictWriter. You could construct the list of field names from your list_of_dicts; if so, your code becomes:
import csv
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'wb')
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
writer.writeheader() # Assumes Python >= 2.7
for row in list_of_dicts:
    writer.writerow(row)
out_file.close()
The way I've constructed fieldnames scans the entire list_of_dicts, so it will slow down as the size increases. You should instead construct fieldnames directly from the source of your data e.g. if the source of your data is also a csv file you can use a DictReader and use fieldnames = reader.fieldnames.
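As a rough illustration of the DictReader route, here is a minimal Python 3 sketch. The in-memory io.StringIO objects are stand-ins I made up for real input and output files, and the sample data is hypothetical:

```python
import csv
import io

# Hypothetical in-memory source standing in for an existing CSV file.
source = io.StringIO("hello,yes\ngoodbye,no\n")
reader = csv.DictReader(source)

# fieldnames comes from the header row, so the data is not scanned twice.
fieldnames = reader.fieldnames

# Stand-in for the output file; lineterminator is set only to keep the
# result easy to compare in this sketch.
out_file = io.StringIO()
writer = csv.DictWriter(out_file, fieldnames=fieldnames, lineterminator="\n")
writer.writeheader()
writer.writerows(reader)  # each row from the reader is already a dict

result = out_file.getvalue()
print(result)
```

With real files you would replace the StringIO objects with open(..., newline='') calls.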
You can also replace the for loop with a single call to writer.writerows(list_of_dicts) and use a with block to handle file closure, in which case your code would become
import csv
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
with open(out_path, 'wb') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(list_of_dicts)
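The code above targets Python 2 (note the 'wb' mode). Under Python 3 the file would be opened in text mode with newline='' instead. The following is only a sketch under that assumption; io.StringIO stands in for the real file, and dict.fromkeys is my own substitution for the sorted set, used to keep the keys in first-seen order:

```python
import csv
import io

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]

# Collect keys in first-seen order; dict.fromkeys drops duplicates.
fieldnames = list(dict.fromkeys(k for d in list_of_dicts for k in d))

# io.StringIO stands in for open(out_path, 'w', newline='').
out_file = io.StringIO()
writer = csv.DictWriter(out_file, fieldnames=fieldnames,
                        restval='', lineterminator='\n')
writer.writeheader()
writer.writerows(list_of_dicts)

csv_text = out_file.getvalue()
print(csv_text)
```

Note that DictWriter writes each dictionary as its own row, so a key missing from a dictionary shows up as an empty cell (the restval) in that row.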

You need to write 2 separate rows instead, one with the keys and one with the values:
writer = csv.writer(out_file, dialect='excel')
writer.writerow([k for d in list_of_dicts for k in d])
writer.writerow([v for d in list_of_dicts for v in d.itervalues()])
The two list comprehensions extract first all the keys, then all the values, from the dictionaries in your input list, combining these into one list to write to the CSV file.
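For Python 3, where itervalues() no longer exists, the same two-row idea might look like the sketch below. The io.StringIO buffer is an assumption standing in for the opened output file:

```python
import csv
import io

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]

# Flatten all keys into one row and all values into another.
keys = [k for d in list_of_dicts for k in d]
values = [v for d in list_of_dicts for v in d.values()]

buf = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buf, dialect='excel', lineterminator='\n')
writer.writerow(keys)
writer.writerow(values)

two_rows = buf.getvalue()
print(two_rows)
```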

I think the most useful approach is to write column by column, so that each key becomes a column (convenient for later data processing, e.g. for ML).
I had some trouble figuring this out yesterday, but I came up with a solution based on one I saw on another website. From what I can see, it is not possible to write the whole dictionary at once, so we have to split it into smaller dictionaries, one per row (my CSV file ended up with 20k rows: surveyed people, their data and their answers). I did it like this:
# Writing a dict of columns to CSV
# 'cleaned' is the output file (an open file object)
# 1. The header: the dict keys become the column names
# 2. Create the writer
writer = csv.DictWriter(cleaned, d.keys())
# 3. Write the header
writer.writeheader()
# 4. Write one small dictionary per row
for i in range(len(list(d.values())[0])):
    writer.writerow({key: d[key][i] for key in d.keys()})
I see my solution has one more for loop, but on the other hand I think it takes less memory (though I am not sure!).
Hope it helps somebody ;)
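For what it's worth, a dictionary of equal-length columns can also be written in one pass by transposing it with zip. This is just a sketch with hypothetical data (the names d and buf are mine), assuming Python 3 and that every column has the same length:

```python
import csv
import io

# Hypothetical column-oriented data: each key is a column name.
d = {'name': ['Ann', 'Bob'], 'age': [31, 27]}

buf = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(d.keys())           # header row from the keys
writer.writerows(zip(*d.values()))  # transpose columns into rows

columns_csv = buf.getvalue()
print(columns_csv)
```

zip yields rows lazily, so no intermediate list of row dictionaries is built.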

Related

Exporting a complicated dictionary into a csv file

I have a dictionary in which the values are lists. Here is an example:
d = {'20190606_CFMPB576run3_CF33456_12.RCC': [1.0354477611940298, '0.51'],
     '20190606_CFMPB576run3_CF33457_05.RCC': [1.0412757973733584, '1.09'],
     '20190606_CFMPB576run3_CF33505_06.RCC': [1.0531309297912714, '0.81']}
I am trying to export this dictionary into a CSV file. This is the expected output:
file_name,citeria1,criteria2
20190606_CFMPB576run3_CF33456_12.RCC,1.0354477611940298, 0.51
20190606_CFMPB576run3_CF33457_05.RCC,1.0412757973733584,1.09
20190606_CFMPB576run3_CF33505_06.RCC,1.0531309297912714,0.81
To do so, I wrote the following code:
import csv
with open('mycsvfile.csv', 'w') as f:
    header = ["file_name","citeria1","criteria2"]
    w = csv.DictWriter(f, my_dict.keys())
    w.writeheader()
    w.writerow(d)
But it does not return what I want. Do you know how to fix it?
Change as follows:
import csv
with open('mycsvfile.csv', 'w', newline='') as f:
    header = ["file_name", "citeria1", "criteria2"]
    w = csv.writer(f)
    w.writerow(header)
    # One row per entry: the key followed by the values in its list
    for key, lst in d.items():
        w.writerow([key] + lst)
A DictWriter is given the field/column names and assumes the rows to be provided as dictionaries with keys corresponding to the given field names. In your case, the data structure is different. You can use a simple csv.writer as your rows are a mixture of keys and values of your given dictionary.
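If you would still rather use a DictWriter, each entry first has to be turned into a dictionary keyed by the header names. A minimal sketch, assuming Python 3, with shortened hypothetical file names and an in-memory buffer in place of the real file:

```python
import csv
import io

# Hypothetical, shortened version of the data in the question.
d = {'a.RCC': [1.5, '0.51'], 'b.RCC': [2.25, '1.09']}
header = ["file_name", "citeria1", "criteria2"]

buf = io.StringIO()  # stand-in for open('mycsvfile.csv', 'w', newline='')
w = csv.DictWriter(buf, fieldnames=header, lineterminator='\n')
w.writeheader()
for file_name, values in d.items():
    # Pair each header name with the matching cell of this row.
    w.writerow(dict(zip(header, [file_name] + values)))

dict_csv = buf.getvalue()
print(dict_csv)
```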

writing to a single CSV file from multiple dictionaries

Background
I have multiple dictionaries of different lengths. I need to write the values of the dictionaries to a single CSV file. I figured I could loop through each dictionary one by one and write the data to the CSV. I ran into a small formatting issue.
Problem/Solution
I realized that after I loop through the first dictionary, the data from the second dictionary gets written starting at the row where the first dictionary ended, as displayed in the first image. Ideally, I want my data to be written as shown in the second image.
My Code
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for name in e:
            writer.writerow({'emp_name':name,'age':e.get(name)})
        for company in c:
            writer.writerow({'company_name':company,'size':c.get(company)})
writeData()
PS: I will have more than 2 dictionaries, so I am looking for a generic way to write the data from every dictionary starting on the row directly under the header. I am open to all solutions and suggestions.
If all dictionaries are of equal size, you could use zip to iterate over them in parallel. If they aren't of equal size, and you want the iteration to pad to the longest dict, you could use itertools.zip_longest.
For example:
import csv
from itertools import zip_longest
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip_longest(e.items(), c.items()):
            row = list(employee)
            row += list(company) if company is not None else ['', ''] # Write empty fields if no company
            writer.writerow(row)
writeData()
If the dicts are of equal size, it's simpler:
import csv
e = {'Jay':10,'Ray':40}
c = {'Google':5000, 'Yahoo': 3000}
def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip(e.items(), c.items()):
            writer.writerow(employee + company)
writeData()
A little side note: in Python 3.7 and later, dictionaries preserve insertion order. This isn't the case in Python 2, so there you should use collections.OrderedDict instead of the standard dictionary.
There might be a more Pythonic solution, but I'd do something like this.
I haven't used the csv writer before, so I just built the comma-separated output myself.
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
dict_list = [e,c] # add more dicts here.
max_dict_size = max(len(d) for d in dict_list)
output = ""
# Add header information here.
for i in range(max_dict_size):
    for j in range(len(dict_list)):
        key, value = dict_list[j].popitem() if len(dict_list[j]) else ("","")
        output += f"{key},{value},"
    output += "\n"
# Now output should contain the full text of the .csv file
# Do file manipulation here.
# You could also do it after each row,
# Where I currently have the output += "\n"
Edit: After a little more thinking, I found something that might polish this a bit. You could first turn each dictionary into a list of its keys by calling the .keys() method on it and appending the results to an empty list.
The advantage is that you'd be able to go "forward" instead of popping the dictionary items off the back. It also wouldn't destroy the dictionaries.
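One possible way to go "forward" through any number of dictionaries without emptying them is to combine the csv writer with itertools.zip_longest. This is only a sketch, assuming Python 3.7+ (for ordered dicts); the in-memory buffer is a stand-in for the real file:

```python
import csv
import io
from itertools import zip_longest

e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]  # add more dicts here

buf = io.StringIO()  # stand-in for the real output file
writer = csv.writer(buf, lineterminator='\n')
writer.writerow(['emp_name', 'age', 'company_name', 'size'])

# Walk all dicts in parallel; shorter ones are padded with empty pairs.
for items in zip_longest(*(d.items() for d in dict_list), fillvalue=('', '')):
    # Flatten [(key, value), (key, value), ...] into one row.
    writer.writerow([field for pair in items for field in pair])

padded_csv = buf.getvalue()
print(padded_csv)
```

Because only items() views are read, e and c are left unchanged afterwards.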

Convert list of dictionaries to csv file

I have a list of dictionaries of this kind:
[
    {'site1':'data1'},
    {'site2':'data2'}
]
What would be the proper way to generate a CSV file with the data in this order?
row 1 row2
site1 data1
site2 data2
Loop through the dictionaries and write them to the file.
list_of_dicts = [{'site1':'data1'},{'site2':'data2'}]
with open('sites.csv', 'w') as file:
    file.write('row1\trow2\n')
    for dictionary in list_of_dicts:
        file.write('\t'.join(list(dictionary.items())[0]) + '\n')
output:
row1 row2
site1 data1
site2 data2
Note that this requires each dictionary to have only one entry. If a dictionary has more, only one entry is written (the first in iteration order) and the others are ignored. There are many different ways to handle more than one entry per dictionary, so you would need to add the expected behaviour to the question for those cases to be accommodated.
This should do the trick :)
data = [ {'site1':'data1'}, {'site2':'data2'} ]
with open('list.csv', 'w') as f:
    for d in data:  # 'd' rather than 'dict', which would shadow the built-in
        for key, value in d.items():
            text = key+','+value+'\n'
            f.write(text)
I like to use a pandas DataFrame to build my data and write it to a CSV file:
import pandas as pd
a = [{'site1':'data1'},{'site2':'data2'}]
#Get each key and value from each dictionary in the list
keys = []
vals = []
for a1 in a:
    for k, v in a1.items():
        keys.append(k)
        vals.append(v)
#make the dataframe from the keys and values
result = pd.DataFrame({'row1': keys, 'row2':vals})
#write the data into csv, use index=False to not write the row numbers
result.to_csv("mydata.csv", index=False)
You should use a CSV writer to make sure that any embedded metacharacters such as commas and quotes are escaped properly; otherwise, data such as {'site3':'data, data and more data'} will corrupt the file.
import csv
my_list = [{'site1':'data1'}, {'site2':'data2'}]
with open('test.csv', 'w', newline='') as out_fp:
    writer = csv.writer(out_fp)
    for d in my_list:
        writer.writerows(d.items())
You could shorten that up a bit with itertools if you want:
import itertools
with open('test.csv', 'w', newline='') as out_fp:
    csv.writer(out_fp).writerows(itertools.chain.from_iterable(
        d.items() for d in my_list))
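To see the escaping in action, here is a small sketch that writes the problematic 'site3' value and reads it back. It assumes Python 3 and uses an in-memory buffer instead of test.csv:

```python
import csv
import io

my_list = [{'site1': 'data1'}, {'site3': 'data, data and more data'}]

buf = io.StringIO()  # stand-in for open('test.csv', 'w', newline='')
writer = csv.writer(buf, lineterminator='\n')
for d in my_list:
    writer.writerows(d.items())

text = buf.getvalue()
print(text)  # the value containing commas is wrapped in double quotes

# Reading it back recovers the original two-column rows.
rows = list(csv.reader(io.StringIO(text)))
print(rows)
```

A hand-built key+','+value line would have split that value into three extra columns.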

Storing list in to csv file using python

I'm trying to store a list-type variable in a CSV file using Python. Here is what I got after hours on StackOverflow and the Python documentation:
Code:
row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}
new_csv_file = open("new_file.csv",'wb')
ruleswriter = csv.writer(new_csv_file,dialect='excel',delimiter=',')
ruleswriter.writerows(row)
new_csv_file.close()
Result:
$ more new_file.csv
S,g,D,e,s,c,r,i,p,t,i,o,n
S,g,N,a,m,e
S,g,G,r,o,u,p,I,d
Can anyone please advise how to store the values in the file like this:
some group,sku,sg-abcdefgh
Thanks a ton!
writerows() expects a sequence of sequences, for example a list of lists. You're passing in a dict, and a dict happens to be iterable: iterating over it yields the keys of the dictionary. Each key -- a string -- happens to be iterable as well. So what you get is one element of each iterable per cell, which is a character. You got exactly what you asked for :-)
What you want to do is write one row with the keys in it, and then maybe another with the values, e.g.:
import csv
row = {
    'SgDescription': 'some group',
    'SgName': 'sku',
    'SgGroupId': u'sg-abcdefgh'
}
with open("new_file.csv",'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerows([row.keys(), row.values()])
If order is important, use collections.OrderedDict.
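The difference between the two calls can be reproduced in a few lines. This is a sketch assuming Python 3 (where the file would be opened in text mode rather than 'wb'); the buffers named broken and fixed are illustrative stand-ins for files:

```python
import csv
import io

row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': 'sg-abcdefgh'}

# Passing the dict itself: each key is treated as a row of characters.
broken = io.StringIO()
csv.writer(broken, lineterminator='\n').writerows(row)
first_broken_line = broken.getvalue().splitlines()[0]
print(first_broken_line)

# Passing a list of two sequences: one row of keys, one row of values.
fixed = io.StringIO()
csv.writer(fixed, lineterminator='\n').writerows([row.keys(), row.values()])
fixed_text = fixed.getvalue()
print(fixed_text)
```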
Extract your desired data before writing it to a CSV file:
row = [row['SgDescription'], row['SgName'], row['SgGroupId']] # ['some group', 'sku', u'sg-abcdefgh']
# write to a csv file
with open("new_file.csv",'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerow(row)
PS: if you don't care about the order, just use row.values().
Or use csv.DictWriter:
import csv
row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}
with open('new_file.csv', 'w') as csvfile:
    fieldnames = ['SgDescription', 'SgName', 'SgGroupId']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow(row)

Possibility of writing dictionary items in columns

I have a dictionary in which the keys are tuples and the values are lists, like:
{('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'): [5.998999999999998,0.0013169999,
4.0000000000000972], ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'): [7.89899999,
0.15647999999675390, 8.764380000972, 9.200000000]}
I want to write this dictionary to a csv file in the column format like:
('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df') ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')
5.998999999999998 7.89899999
0.0013169999 0.15647999999675390
4.0000000000000972 8.764380000972
9.200000000
I know how to write the same thing in row format using this code:
writer = csv.writer(open('dict.csv', 'wb'))
for key, value in mydict.items():
    writer.writerow([key, value])
How do I write the same thing in columns? Is it even possible? Thanks in advance.
I referred to the Python docs for csv here: http://docs.python.org/2/library/csv.html. There is no information on column-wise writing.
import csv
mydict = {('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'):
          [5.998999999999998, 0.0013169999, 4.0000000000000972],
          ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de'):
          [7.89899999, 0.15647999999675390, 8.764380000972, 9.200000000]}
with open('dict.csv', 'wb') as file:
    writer = csv.writer(file, delimiter='\t')
    writer.writerow(mydict.keys())
    for row in zip(*mydict.values()):
        writer.writerow(list(row))
Output file dict.csv:
('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df') ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')
5.998999999999998 7.89899999
0.0013169999 0.1564799999967539
4.000000000000097 8.764380000972
I am sure you can figure out the formatting:
>>> d.keys() #gives list of keys for first row
[('c4:7d:4f:53:24:be', 'ac:81:12:62:91:df'), ('a8:5b:4f:2e:fe:09', 'de:62:ef:4e:21:de')]
>>> for i in zip(*d.values()): #gives rows with tuple structure for columns
...     print i
(5.998999999999998, 7.89899999)
(0.0013169999, 0.1564799999967539)
(4.000000000000097, 8.764380000972)
For your code, do this:
writer = csv.writer(open('dict.csv', 'wb'))
writer.writerow(mydict.keys())
for values in zip(*mydict.values()):
    writer.writerow(values)
The ()'s and such will not be added to the csv file.
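Note that zip stops at the shortest list, which is why the trailing 9.2 from the question does not appear in the output above. If the extra values should be kept, itertools.zip_longest can pad the shorter columns. A minimal Python 3 sketch with simplified, hypothetical data and an in-memory buffer in place of dict.csv:

```python
import csv
import io
from itertools import zip_longest

# Hypothetical data with columns of unequal length.
mydict = {'a': [1, 2, 3], 'b': [4, 5, 6, 7]}

buf = io.StringIO()  # stand-in for open('dict.csv', 'w', newline='')
writer = csv.writer(buf, delimiter='\t', lineterminator='\n')
writer.writerow(mydict.keys())
# Pad the shorter columns with empty cells instead of truncating.
for row in zip_longest(*mydict.values(), fillvalue=''):
    writer.writerow(row)

column_text = buf.getvalue()
print(column_text)
```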
