writing to a single CSV file from multiple dictionaries - python

Background
I have multiple dictionaries of different lengths. I need to write the values of the dictionaries to a single CSV file. I figured I could loop through each dictionary one by one and write the data to CSV, but I ran into a small formatting issue.
Problem/Solution
I realized that after I loop through the first dictionary, the data of the second dictionary gets written starting at the row where the first dictionary ended, as displayed in the first image. I would ideally want my data to print as shown in the second image.
My Code
import csv

e = {'Jay':10,'Ray':40}
c = {'Google':5000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.DictWriter(csv_file, fieldnames=fieldnames)
        writer.writeheader()
        for name in e:
            writer.writerow({'emp_name':name,'age':e.get(name)})
        for company in c:
            writer.writerow({'company_name':company,'size':c.get(company)})

writeData()
PS: I will have more than 2 dictionaries, so I am looking for a generic way to write each dictionary's data starting at the row under the header. I am open to all solutions and suggestions.

If all dictionaries are of equal size, you could use zip to iterate over them in parallel. If they aren't of equal size, and you want the iteration to pad to the longest dict, you could use itertools.zip_longest
For example:
import csv
from itertools import zip_longest

e = {'Jay':10,'Ray':40}
c = {'Google':5000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name','age','company_name','size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip_longest(e.items(), c.items()):
            # zip_longest pads the shorter iterable with None,
            # so write empty fields for the missing side
            row = list(employee) if employee is not None else ['', '']
            row += list(company) if company is not None else ['', '']
            writer.writerow(row)

writeData()
If the dicts are of equal size, it's simpler:
import csv

e = {'Jay':10,'Ray':40}
c = {'Google':5000, 'Yahoo': 3000}

def writeData():
    with open('employee_file20.csv', mode='w') as csv_file:
        fieldnames = ['emp_name', 'age', 'company_name', 'size']
        writer = csv.writer(csv_file)
        writer.writerow(fieldnames)
        for employee, company in zip(e.items(), c.items()):
            writer.writerow(employee + company)

writeData()
A little side note: in Python 3.7+ (and CPython 3.6 as an implementation detail), dictionaries preserve insertion order. This isn't the case in Python 2, so if you use Python 2 you should use collections.OrderedDict instead of a standard dictionary.
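A quick sketch of that side note (the values are illustrative):

```python
from collections import OrderedDict

# In Python 3.7+ the insertion order of plain dicts is guaranteed.
d = {'Jay': 10, 'Ray': 40}
assert list(d) == ['Jay', 'Ray']

# On Python 2 (or 3.x before 3.7), OrderedDict gives the same guarantee explicitly.
od = OrderedDict([('Jay', 10), ('Ray', 40)])
assert list(od) == ['Jay', 'Ray']
```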

There might be a more pythonic solution, but I'd do something like this:
I haven't used your .csv writer thing before, so I just made my own comma separated output.
e = {'Jay':10,'Ray':40}
c = {'Google':5000}
dict_list = [e, c]  # add more dicts here.
max_dict_size = max(len(d) for d in dict_list)
output = ""
# Add header information here.
for i in range(max_dict_size):
    for j in range(len(dict_list)):
        key, value = dict_list[j].popitem() if len(dict_list[j]) else ("", "")
        output += f"{key},{value},"
    output += "\n"
# Now output should contain the full text of the .csv file.
# Do the file manipulation here.
# You could also do it after each row,
# where I currently have the output += "\n"
Edit: After a little more thinking I found something that might polish this a bit. You could first map each dictionary to a list of its keys by calling .keys() on it and appending the result to an empty list.
The advantage with that is that you'd be able to go "forward" instead of popping the dictionary items off the back. It also wouldn't destroy the dictionary.
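One way to sketch that non-destructive version is to snapshot each dictionary's items into a list first and then index forward (variable names are illustrative):

```python
e = {'Jay': 10, 'Ray': 40}
c = {'Google': 5000}
dict_list = [e, c]

# Snapshot each dict's items so we can index forward without mutating the dicts.
items_list = [list(d.items()) for d in dict_list]
max_size = max(len(items) for items in items_list)

rows = []
for i in range(max_size):
    row = []
    for items in items_list:
        # Pad with empty fields once a dict runs out of entries.
        key, value = items[i] if i < len(items) else ("", "")
        row.extend([key, value])
    rows.append(row)

# rows is now [['Jay', 10, 'Google', 5000], ['Ray', 40, '', '']]
# and e and c are untouched.
```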

Related

Output nested array with variable length to CSV

I've got a datasample that I would like to output in a CSV file. The data structure is a nested list of different german terms (dict) and the corresponding possible english translations (list):
all_terms = [{'Motor': ['engine', 'motor']},
{'Ziel': ['purpose', 'goal', 'aim', 'destination']}]
As you can see, one german term could hold variable quantities of english translations. I want to output each german term and each of its corresponding translations into separate columns in one row, so "Motor" is in column 1, "engine" in column 2 and "motor" in column 3.
I just don't know how to loop correctly through the data.
So far, my code to output:
import csv

with open(filename, 'a') as csv_file:
    writer = csv.writer(csv_file)
    # The for loop
    for x in all_terms:
        for i in x:
            for num in i:
                writer.writerow([i, x[i][num]])
But this error is thrown out:
writer.writerow([i, x[i][num]])
TypeError: list indices must be integers, not unicode
Any hint appreciated, and maybe there's even a smarter way than 3 nested for loops.
How about the following solution:
import csv

all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]

with open('test.csv', 'a') as csv_file:
    writer = csv.writer(csv_file)
    # The for loop
    for small_dict in all_terms:
        for key in small_dict:
            output = [key, *small_dict[key]]
            writer.writerow(output)
Output in test.csv:
Motor,engine,motor
Ziel,purpose,goal,aim,destination
I used the * operator to unpack all the items inside the dictionary's values to create a row for writerow to write. This also takes care of the case where a dictionary inside all_terms has multiple entries.
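A tiny illustration of that unpacking (the values are made up):

```python
key = 'Motor'
values = ['engine', 'motor']

# * splices the list's elements into the new list
row = [key, *values]
assert row == ['Motor', 'engine', 'motor']
```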
Here's a way to do it:
import csv

all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]

filename = 'translations.csv'
with open(filename, 'a', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for term in all_terms:
        word, translations = term.popitem()
        row = [word] + translations
        writer.writerow(row)
CSV file's contents afterwards:
Motor,engine,motor
Ziel,purpose,goal,aim,destination
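One caveat: popitem() removes the entry it returns, so after the loop all_terms is left holding empty dicts. If you still need the data afterwards, a non-destructive sketch of the same loop could look like this ('translations.csv' is an illustrative name):

```python
import csv

all_terms = [{'Motor': ['engine', 'motor']},
             {'Ziel': ['purpose', 'goal', 'aim', 'destination']}]

with open('translations.csv', 'w', newline='') as csv_file:
    writer = csv.writer(csv_file)
    for term in all_terms:
        for word, translations in term.items():  # read-only iteration
            writer.writerow([word] + translations)

# all_terms is unchanged afterwards
```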

Storing list in to csv file using python

I'm trying to store a list variable in a csv file using Python. Here is what I've got after hours on StackOverflow and in the Python documentation:
Code:
import csv

row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}

new_csv_file = open("new_file.csv", 'wb')
ruleswriter = csv.writer(new_csv_file, dialect='excel', delimiter=',')
ruleswriter.writerows(row)
new_csv_file.close()
Result:
$ more new_file.csv
S,g,D,e,s,c,r,i,p,t,i,o,n
S,g,N,a,m,e
S,g,G,r,o,u,p,I,d
Can anyone please advice how to store the values to the file like this:
some group,sku,sg-abcdefgh
Thanks a ton!
writerows() expects a sequence of sequences, for example a list of lists. You're passing in a dict, and a dict happens to be iterable: It returns the keys of the dictionary. Each key -- a string -- happens to be iterable as well. So what you get is an element of each iterable per cell, which is a character. You got exactly what you asked for :-)
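You can see both behaviours in a couple of lines:

```python
row = {'SgName': 'sku'}

# Iterating a dict yields its keys...
assert list(row) == ['SgName']

# ...and iterating a string (each key) yields its characters,
# which is why every CSV cell ended up holding a single letter.
assert list('SgName') == ['S', 'g', 'N', 'a', 'm', 'e']
```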
What you want to do is write one row, with the keys in it, and then maybe another with the values, eg:
import csv

row = {
    'SgDescription': 'some group',
    'SgName': 'sku',
    'SgGroupId': u'sg-abcdefgh'
}

with open("new_file.csv", 'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerows([row.keys(), row.values()])
If order is important, use collections.OrderedDict.
Extract your desired data before writing it to a csv file:
row = [row['SgDescription'], row['SgName'], row['SgGroupId']]  # ['some group', 'sku', u'sg-abcdefgh']

# write to a csv file
with open("new_file.csv", 'wb') as f:
    ruleswriter = csv.writer(f)
    ruleswriter.writerow(row)
PS: if you don't care about the order, just use row.values().
Or use csv.DictWriter:

import csv

row = {'SgDescription': 'some group', 'SgName': 'sku', 'SgGroupId': u'sg-abcdefgh'}

with open('new_file.csv', 'w') as csvfile:
    fieldnames = ['SgDescription', 'SgName', 'SgGroupId']
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writerow(row)

How to read a text file into a list or an array with Python

I am trying to read the lines of a text file into a list or array in python. I just need to be able to individually access any item in the list or array after it is created.
The text file is formatted as follows:
0,0,200,0,53,1,0,255,...,0.
Where the ... is above, the actual text file has hundreds or thousands more items.
I'm using the following code to try to read the file into a list:
text_file = open("filename.dat", "r")
lines = text_file.readlines()
print lines
print len(lines)
text_file.close()
The output I get is:
['0,0,200,0,53,1,0,255,...,0.']
1
Apparently it is reading the entire file into a list of just one item, rather than a list of individual items. What am I doing wrong?
You will have to split your string into a list of values using split().
So,
lines = text_file.read().split(',')
EDIT:
I didn't realise there would be so much traction to this. Here's a more idiomatic approach.
import csv

with open('filename.csv', 'r') as fd:
    reader = csv.reader(fd)
    for row in reader:
        pass  # do something with row
You can also use numpy loadtxt like
from numpy import loadtxt
lines = loadtxt("filename.dat", comments="#", delimiter=",", unpack=False)
So you want to create a list of lists... We need to start with an empty list
list_of_lists = []
next, we read the file content, line by line
with open('data') as f:
    for line in f:
        inner_list = [elt.strip() for elt in line.split(',')]
        # alternatively, if you need to use the file content as numbers:
        # inner_list = [int(elt.strip()) for elt in line.split(',')]
        list_of_lists.append(inner_list)
A common use case is that of columnar data, but our units of storage are the
rows of the file, that we have read one by one, so you may want to transpose
your list of lists. This can be done with the following idiom
by_cols = zip(*list_of_lists)
Another common use is to give a name to each column
col_names = ('apples sold', 'pears sold', 'apples revenue', 'pears revenue')
by_names = {}
for i, col_name in enumerate(col_names):
    by_names[col_name] = by_cols[i]
so that you can operate on homogeneous data items
mean_apple_prices = [money/fruits for money, fruits in
                     zip(by_names['apples revenue'], by_names['apples sold'])]
Most of what I've written can be sped up using the csv module from the standard library. Another option is the third-party module pandas, which lets you automate most aspects of a typical data analysis (but it has a number of dependencies).
Update: while in Python 2 zip(*list_of_lists) returns a different (transposed) list of lists, in Python 3 the situation has changed and zip(*list_of_lists) returns a zip object that is not subscriptable.
If you need indexed access you can use
by_cols = list(zip(*list_of_lists))
that gives you a list of lists in both versions of Python.
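For example:

```python
list_of_lists = [['1', '2'], ['3', '4'], ['5', '6']]

# Works the same in Python 2 and 3 once wrapped in list():
by_cols = list(zip(*list_of_lists))
assert by_cols == [('1', '3', '5'), ('2', '4', '6')]
assert by_cols[0] == ('1', '3', '5')  # indexed access is fine
```

(Note that each column comes back as a tuple rather than a list.)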
On the other hand, if you don't need indexed access and what you want is just to build a dictionary indexed by column names, a zip object is just fine...
file = open('some_data.csv')
names = get_names(next(file))
columns = zip(*((x.strip() for x in line.split(',')) for line in file))
d = {}
for name, column in zip(names, columns):
    d[name] = column
This question is asking how to read the comma-separated value contents from a file into an iterable list:
0,0,200,0,53,1,0,255,...,0.
The easiest way to do this is with the csv module as follows:
import csv

with open('filename.dat', newline='') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',')
Now, you can easily iterate over spamreader like this:
for row in spamreader:
    print(', '.join(row))
See documentation for more examples.
I'm a bit late, but you can also read the text file into a dataframe and then convert the corresponding column to a list.

import pandas as pd

lista = pd.read_csv('path_to_textfile.txt', sep=",", header=None)[0].tolist()
example.
lista=pd.read_csv('data/holdout.txt',sep=',',header=None)[0].tolist()
Note: the column names of the corresponding dataframe will be integers; I chose 0 because I was extracting only the first column.
Better this way:

def txt_to_lst(file_path):
    try:
        with open(file_path, "r") as stopword:
            lines = stopword.read().split('\n')
        return lines
    except Exception as e:
        print(e)

Write Python dictionary to CSV where where keys= columns, values = rows

I have a list of dictionaries that I want to be able to open in Excel, formatted correctly. This is what I have so far, using csv:
import csv

list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'wb')
writer = csv.writer(out_file, dialect='excel')
for items in list_of_dicts:
    for k, v in items.items():
        writer.writerow([k, v])
Obviously, when I open the output in Excel, it's formatted like this:
key value
key value
What I want is this:
key key key
value value value
I can't figure out how to do this, so help would be appreciated. Also, I want the column names to be the dictionary keys instead of the default 'A, B, C' etc. Sorry if this is a stupid question.
Thanks
The csv module has a DictWriter class for this, which is covered quite nicely in another SO answer. The critical point is that you need to know all your column headings when you instantiate the DictWriter. You could construct the list of field names from your list_of_dicts, if so your code becomes
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
out_file = open(out_path, 'wb')
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
writer.writeheader()  # Assumes Python >= 2.7
for row in list_of_dicts:
    writer.writerow(row)
out_file.close()
The way I've constructed fieldnames scans the entire list_of_dicts, so it will slow down as the size increases. You should instead construct fieldnames directly from the source of your data e.g. if the source of your data is also a csv file you can use a DictReader and use fieldnames = reader.fieldnames.
You can also replace the for loop with a single call to writer.writerows(list_of_dicts) and use a with block to handle file closure, in which case your code would become
list_of_dicts = [{'hello': 'goodbye'}, {'yes': 'no'}]
out_path = "/docs/outfile.txt"
fieldnames = sorted(list(set(k for d in list_of_dicts for k in d)))
with open(out_path, 'wb') as out_file:
    writer = csv.DictWriter(out_file, fieldnames=fieldnames, dialect='excel')
    writer.writeheader()
    writer.writerows(list_of_dicts)
You need to write 2 separate rows, one with the keys, one with the values, instead:
writer = csv.writer(out_file, dialect='excel')
writer.writerow([k for d in list_of_dicts for k in d])
writer.writerow([v for d in list_of_dicts for v in d.itervalues()])
The two list comprehensions extract first all the keys, then all the values, from the dictionaries in your input list, combining these into one list to write to the CSV file.
I think the most useful approach is to write the data column by column, so that each key becomes a column (good for later data processing, e.g. for ML).
I had some trouble figuring it out yesterday, but I came up with a solution I saw on another website. From what I can tell, it isn't possible to write the whole dictionary at once, so we have to split it into smaller dictionaries (my csv file had 20k rows at the end: surveyed people, their data and answers). I did it like this:
# writing dict to csv
# 'cleaned' is the output file object
# the dict's keys become the column names
writer = csv.DictWriter(cleaned, d.keys())
# attach the header
writer.writeheader()
# write separate dictionaries, one per row
for i in range(len(list(d.values())[0])):
    writer.writerow({key: d[key][i] for key in d.keys()})
I see my solution has one more for loop, but on the other hand I think it takes less memory (though I am not sure!).
Hope it helps somebody ;)

Is it possible to keep the column order using csv.DictReader?

For example, my csv has columns as below:
ID, ID2, Date, Job No, Code
I need to write the columns back in the same order. The dict jumbles the order immediately, so I believe it's more of a problem with the reader.
Python's dicts do NOT maintain order prior to 3.6 (but, regardless, in that version the csv.DictReader class was modified to return OrderedDicts).
However, the instance of csv.DictReader that you're using (after you've read the first row!-) does have a .fieldnames list of strings, which IS in order.
So,
for rowdict in myReader:
    print ['%s:%s' % (f, rowdict[f]) for f in myReader.fieldnames]
will show you that the order is indeed maintained (in .fieldnames of course, NEVER in the dict -- that's intrinsically impossible in Python!-).
So, suppose you want to read a.csv and write b.csv with the same column order. Using plain reader and writer is too easy, so you want to use the Dict varieties instead;-). Well, one way is...:
import csv

a = open('a.csv', 'r')
b = open('b.csv', 'w')
ra = csv.DictReader(a)
wb = csv.DictWriter(b, None)
for d in ra:
    if wb.fieldnames is None:
        # initialize and write b's headers
        dh = dict((h, h) for h in ra.fieldnames)
        wb.fieldnames = ra.fieldnames
        wb.writerow(dh)
    wb.writerow(d)
b.close()
a.close()
assuming you have headers in a.csv (otherwise you can't use a DictReader on it) and want just the same headers in b.csv.
Make an OrderedDict from each row dict sorted by DictReader.fieldnames.
import csv
from collections import OrderedDict

reader = csv.DictReader(open("file.csv"))
for row in reader:
    sorted_row = OrderedDict(sorted(row.items(),
                                    key=lambda item: reader.fieldnames.index(item[0])))
from csv import DictReader, DictWriter

with open("input.csv", 'r') as input_file:
    reader = DictReader(f=input_file)
    with open("output.csv", 'w') as output_file:
        writer = DictWriter(f=output_file, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            writer.writerow(row)
I know this question is old... but if you use DictReader, you can pass it an ordered list of the field names via the fieldnames param.
Edit: as of Python 3.6, dicts are ordered by insertion order, essentially making all dicts in Python ordered by default. That being said, the docs say don't rely on this behaviour because it may change. I will challenge that; let's see if it ever changes back :)
Unfortunately the default DictReader does not allow overriding the dict class, but a custom DictReader would do the trick:
import csv

class DictReader(csv.DictReader):
    def __init__(self, *args, **kwargs):
        self.dict_class = kwargs.pop('dict_class', dict)
        super(DictReader, self).__init__(*args, **kwargs)

    def __next__(self):
        ''' copied from python source '''
        if self.line_num == 0:
            # Used only for its side effect.
            self.fieldnames
        row = next(self.reader)
        self.line_num = self.reader.line_num
        # unlike the basic reader, we prefer not to return blanks,
        # because we will typically wind up with a dict full of None
        # values
        while row == []:
            row = next(self.reader)
        # using the customized dict_class
        d = self.dict_class(zip(self.fieldnames, row))
        lf = len(self.fieldnames)
        lr = len(row)
        if lf < lr:
            d[self.restkey] = row[lf:]
        elif lf > lr:
            for key in self.fieldnames[lr:]:
                d[key] = self.restval
        return d
use it like so
import collections
csv_reader = DictReader(f, dict_class=collections.OrderedDict)
# ...
I wrote a little tool to sort the order of CSV columns. I don't claim that it's great (I know little of Python), but it does the job:
import csv
import sys

with open(sys.argv[1], 'r') as infile:
    csvReader = csv.DictReader(infile)
    sorted_fieldnames = sorted(csvReader.fieldnames)
    writer = csv.DictWriter(sys.stdout, fieldnames=sorted_fieldnames)
    # reorder the header first
    writer.writeheader()
    for row in csvReader:
        # writes the reordered rows to the new file
        writer.writerow(row)
