Update: I do not want to use pandas because I have a list of dicts and want to write each one to disk as it comes in (part of a web-scraping workflow).
I have a dict that I'd like to write to a CSV file. I've come up with a solution, but I'd like to know if there's a more pythonic solution available. Here's what I envisioned (but it doesn't work):
import csv

test_dict = {"review_id": [1, 2, 3, 4],
             "text": [5, 6, 7, 8]}

with open('test.csv', 'w') as csvfile:
    fieldnames = ["review_id", "text"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(test_dict)
Which would ideally result in:
review_id text
1 5
2 6
3 7
4 8
The code above doesn't work the way I'd expect it to and throws a ValueError. So I've turned to the following solution (which does work, but seems verbose).
with open('test.csv', 'w') as csvfile:
    fieldnames = ["review_id", "text"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    response = test_dict
    cells = [{x: {key: val}} for key, vals in response.items()
             for x, val in enumerate(vals)]
    rows = {}
    for d in cells:
        for key, val in d.items():
            if key in rows:
                rows[key].update(d.get(key, None))
            else:
                rows[key] = d.get(key, None)
    for row in [val for _, val in rows.items()]:
        writer.writerow(row)
Again, to reiterate: the block of code directly above works (i.e., produces the desired result shown earlier in the post), but seems verbose. So, is there a more pythonic solution?
Thanks!
Your first example will work with minor edits. DictWriter expects a list of dicts rather than a dict of lists. Assuming you can't change the format of the test_dict:
import csv

test_dict = {"review_id": [1, 2, 3, 4],
             "text": [5, 6, 7, 8]}

def convert_dict(mydict, numentries):
    data = []
    for i in range(numentries):
        row = {}
        for k, l in mydict.items():  # iteritems() on Python 2
            row[k] = l[i]
        data.append(row)
    return data

with open('test.csv', 'w') as csvfile:
    fieldnames = ["review_id", "text"]
    writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(convert_dict(test_dict, 4))
Try using pandas. Here is a simple example:
import pandas as pd
test_dict = {"review_id": [1, 2, 3, 4],
"text": [5, 6, 7, 8]}
d1 = pd.DataFrame(test_dict)
d1.to_csv("output.csv")
Cheers
The built-in zip function can join together different iterables into tuples which can be passed to writerows. Try this as the last line:
writer.writerows(zip(test_dict["review_id"], test_dict["text"]))
You can see what it's doing by making a list:
>>> list(zip(test_dict["review_id"], test_dict["text"]))
[(1, 5), (2, 6), (3, 7), (4, 8)]
Edit: In this particular case, you probably want a regular csv.writer, since what you effectively have now is a list of rows rather than a list of dicts.
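Putting the zip approach together with a plain csv.writer, here is a minimal end-to-end sketch (it writes to an in-memory buffer so the example is self-contained; swap in a real file opened with newline='' as needed):

```python
import csv
import io

test_dict = {"review_id": [1, 2, 3, 4],
             "text": [5, 6, 7, 8]}

buf = io.StringIO()  # stands in for open('test.csv', 'w', newline='')
writer = csv.writer(buf)
writer.writerow(["review_id", "text"])  # header row
writer.writerows(zip(test_dict["review_id"], test_dict["text"]))

print(buf.getvalue())
```

This produces the review_id/text rows the original question asked for, one pair per line.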
If you don't mind using a 3rd-party package, you could do it with pandas.
import pandas as pd
pd.DataFrame(test_dict).to_csv('test.csv', index=False)
Update
So you have several dictionaries, and all of them seem to come from a scraping routine.
import pandas as pd
test_dict = {"review_id": [1, 2, 3, 4],
"text": [5, 6, 7, 8]}
pd.DataFrame(test_dict).to_csv('test.csv', index=False)
list_of_dicts = [test_dict, test_dict]
for d in list_of_dicts:
    pd.DataFrame(d).to_csv('test.csv', index=False, mode='a', header=False)
This time, you would be appending to the file and without the header.
The output is:
review_id,text
1,5
2,6
3,7
4,8
1,5
2,6
3,7
4,8
1,5
2,6
3,7
4,8
The problem is that with DictWriter.writerows() you are forced to have one dict per row. Instead, you can write the values directly by changing your csv creation:
with open('test.csv', 'w') as csvfile:
    fieldnames = test_dict.keys()
    fieldvalues = zip(*test_dict.values())
    writer = csv.writer(csvfile)
    writer.writerow(fieldnames)
    writer.writerows(fieldvalues)
You have two different problems in your question:
Create a csv file from a dictionary where the values are containers and not primitives.
For the first problem, the solution is generally to transform the container type into a primitive type. The most common method is creating a JSON string. For example:
>>> import json
>>> x = [2, 4, 6, 8, 10]
>>> json_string = json.dumps(x)
>>> json_string
'[2, 4, 6, 8, 10]'
So your data conversion might look like:
import json

def convert(datadict):
    '''Generator which converts a dictionary of containers into a dictionary of json-strings.

    args:
        datadict(dict): dictionary which needs conversion

    yields:
        tuple: key and json-string
    '''
    for key, value in datadict.items():
        yield key, json.dumps(value)
import csv

def dump_to_csv_using_dict(datadict, fields=None, filepath=None, delimiter=None):
    '''Dumps a list of dictionaries into csv

    args:
        datadict(list): list of dictionaries to dump
        fields(list): field sequence to use from the dictionary [default: sorted keys of the first dict]
        filepath(str): filepath to save to [default: 'tmp.csv']
        delimiter(str): delimiter to use in csv [default: '|']
    '''
    fieldnames = sorted(datadict[0].keys()) if fields is None else fields
    filepath = 'tmp.csv' if filepath is None else filepath
    delimiter = '|' if not delimiter else delimiter
    with open(filepath, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='ignore', delimiter=delimiter)
        writer.writeheader()
        for each_dict in datadict:
            writer.writerow(each_dict)
So the naive conversion looks like this:
# Conversion code
test_data = {
    "review_id": [1, 2, 3, 4],
    "text": [5, 6, 7, 8]
}
converted_data = dict(convert(test_data))
data_list = [converted_data]
dump_to_csv_using_dict(data_list)
Create a final value that is actually a merging of two disparate data sets.
To do this, you need to find a way to combine data from different keys. This is not an easy problem to generically solve.
That said, it's easy to combine two lists with zip.
>>> x = [2, 4, 6]
>>> y = [1, 3, 5]
>>> list(zip(y, x))
[(1, 2), (3, 4), (5, 6)]
In addition, in the event that your lists are not the same size, Python's itertools package provides zip_longest (named izip_longest in Python 2), which will yield the full zip even if one list is shorter than another. Note that zip_longest returns an iterator.
from itertools import zip_longest
>>> x = [2, 4]
>>> y = [1, 3, 5]
>>> z = zip_longest(y, x, fillvalue=None)  # default fillvalue is None
>>> list(z)  # z is an iterator
[(1, 2), (3, 4), (5, None)]
So we could add another function here:
from itertools import zip_longest

def combine(data, fields=None, default=None):
    '''Combines fields within data

    args:
        data(dict): a dictionary with lists as values
        fields(list): a list of keys to combine [default: all fields in arbitrary order]
        default: default fill value [default: None]

    yields:
        tuple: columns combined into rows
    '''
    fields = data.keys() if fields is None else fields
    columns = [data.get(field) for field in fields]
    for values in zip_longest(*columns, fillvalue=default):
        yield values
And now we can use this to update our original conversion.
def dump_to_csv(data, filepath=None, delimiter=None):
    '''Dumps rows into csv

    args:
        data(iterable): iterable of row tuples to dump
        filepath(str): filepath to save to [default: 'tmp.csv']
        delimiter(str): delimiter to use in csv [default: '|']
    '''
    filepath = 'tmp.csv' if filepath is None else filepath
    delimiter = '|' if not delimiter else delimiter
    with open(filepath, 'w') as csvfile:
        writer = csv.writer(csvfile, delimiter=delimiter)
        for each_row in data:
            writer.writerow(each_row)
# Conversion code
test_data = {
    "review_id": [1, 2, 3, 4],
    "text": [5, 6, 7, 8]
}
combined_data = combine(test_data)
dump_to_csv(combined_data)
Related
I am trying to put the following into a csv. Here is my code
import csv
data = [[1, 2, 3], 4, 5]
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    writer.writerows(data)
I am getting the following error:
_csv.Error: iterable expected, not int
When using writer.writerow instead, the code works but gives [1, 2, 3], 4 and 5 as the columns.
I want the columns to be 1, 2, 3, 4, 5
Any help on how I can do it?
writerow isn't equivalent to writerows:
>>> some_data = [[1,2,3],[4,5,6],[7,8,9]]
>>> writer.writerows(some_data)
1,2,3
4,5,6
7,8,9
>>> writer.writerow(some_data)
"[1, 2, 3]","[4, 5, 6]","[7, 8, 9]"
Try:
import csv
headers = [1,2,3,4,5]
some_data = ['Foo','Bar','Baz','Qux','Zoo']
more_data = [['d1','d2','d3'],['d4','d5','d6']]
with open('test.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(headers)     # takes an iterable of cells
    writer.writerow(some_data)
    writer.writerows(more_data)  # takes an iterable of iterables
And you'll get:
1,2,3,4,5
Foo,Bar,Baz,Qux,Zoo
d1,d2,d3
d4,d5,d6
import csv

data = [[1, 2, 3, 4], 5, 6]
print_data = []
with open('test.csv', 'w') as f:
    writer = csv.writer(f)
    # The following code flattens the list within a list, using the
    # temporary 'print_data' to collect values for writing to csv
    for item in data:
        if isinstance(item, list):
            print('list found')
            for val in item:
                print_data.append(val)
        else:
            print_data.append(item)
    writer.writerow(print_data)
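The manual flattening above can also be written with itertools.chain. A sketch under the same assumption as the code above (each top-level item is either a list or a scalar), again using an in-memory buffer so it is self-contained:

```python
import csv
import io
from itertools import chain

data = [[1, 2, 3, 4], 5, 6]

# Wrap scalars in one-element lists, then chain everything into one flat row
flat = list(chain.from_iterable(
    item if isinstance(item, list) else [item] for item in data))

buf = io.StringIO()  # stands in for the opened csv file
csv.writer(buf).writerow(flat)
print(buf.getvalue())
```

This writes 1,2,3,4,5,6 as a single row, matching the loop-based version.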
I am not familiar with exporting a list to a CSV in Python. Here is my code for one list:
import csv

X = ([1,2,3],[7,8,9])
Y = ([4,5,6],[3,4,5])
for x in range(0,2,1):
    csvfile = "C:/Temp/aaa.csv"
    with open(csvfile, "w") as output:
        writer = csv.writer(output, lineterminator='\n')
        for val in x[0]:
            writer.writerow([val])
And I want the lists laid out as columns in the result. How should I modify the code? (The main problem is how to change the column.)
To output multiple columns you can use zip() like:
Code:
import csv

x0 = [1, 2, 3]
y0 = [4, 5, 6]
x2 = [7, 8, 9]
y2 = [3, 4, 5]

csvfile = "aaa.csv"
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    writer.writerow(['x=0', None, None, 'x=2'])
    writer.writerow(['x', 'y', None, 'x', 'y'])
    for val in zip(x0, y0, [None] * len(x0), x2, y2):
        writer.writerow(val)
Results:
x=0,,,x=2
x,y,,x,y
1,4,,7,3
2,5,,8,4
3,6,,9,5
You could try:
import csv

with open('file.csv') as fin, open('out.csv', 'w', newline='') as fout:
    reader = csv.reader(fin)
    writer = csv.writer(fout)
    for r in reader:
        writer.writerow([r[0], r[1]])
If you need further help, leave a comment.
When dealing with CSV files you should really just use pandas. Put your header and data into a DataFrame, and then use the .to_csv method on that DataFrame. CSV can get tricky when you have strings that contain commas, etc.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_csv.html
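For reference on the commas point: the standard library's csv module also handles embedded commas, by quoting fields automatically. A minimal sketch, writing to an in-memory buffer:

```python
import csv
import io

rows = [["id", "text"],
        [1, "hello, world"]]  # the text field contains a comma

buf = io.StringIO()  # stands in for a file opened with newline=''
csv.writer(buf).writerows(rows)
print(buf.getvalue())
```

The field with the comma comes out quoted ("hello, world"), so a CSV reader will round-trip it correctly.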
out_gate,useless_column,in_gate,num_connect
a,u,b,1
a,s,b,3
b,e,a,2
b,l,c,4
c,e,a,5
c,s,b,5
c,s,b,3
c,c,a,4
d,o,c,2
d,l,c,3
d,u,a,1
d,m,b,2
Shown above is a given sample CSV file. First of all, my final goal is to get the answer in the form of a CSV file like below:
,a,b,c,d
a,0,4,0,0
b,2,0,4,0
c,9,8,0,0
d,1,2,5,0
I am trying to match each of (a,b,c,d) one by one to the in_gate; so, for example, when out_gate 'c' -> in_gate 'b', the number of connections is 8, and 'c' -> 'a' becomes 9.
I want to solve it with lists (or tuples, dictionaries, sets) or collections.defaultdict, WITHOUT USING PANDAS OR NUMPY, and I want a solution that can be applied to many gates (around 10 to 40) as well.
I understand there is a similar question, and it helped a lot, but I still have some trouble getting the code to run. Lastly, is there any way to do it with lists of columns and a for loop?
((ex) list1=[a,b,c,d], list2=[b,b,a,c,a,b,b,a,c,c,a,b])
what if there are some useless columns that are not related to the data but the final goal remains same?
thanks
I'd use a Counter for this task. To keep the code simple, I'll read the data from a string. And I'll let you figure out how to produce the output as a CSV file in the format of your choice.
import csv
from collections import Counter
data = '''\
out_gate,in_gate,num_connect
a,b,1
a,b,3
b,a,2
b,c,4
c,a,5
c,b,5
c,b,3
c,a,4
d,c,2
d,c,3
d,a,1
d,b,2
'''.splitlines()
reader = csv.reader(data)

# skip header
next(reader)

# A Counter to accumulate the data
counts = Counter()

# Accumulate the data
for ogate, igate, num in reader:
    counts[ogate, igate] += int(num)

# We could grab the keys from the data, but it's easier to hard-code them
keys = 'abcd'

# Display the accumulated data
for ogate in keys:
    print(ogate, [counts[ogate, igate] for igate in keys])
output
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
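As for producing the CSV matrix the question asks for, the accumulated counts can be written out with csv.writer. A sketch, with the counts from the answer above hard-coded so the snippet is self-contained:

```python
import csv
import io
from collections import Counter

# The counts accumulated above, reproduced here for a self-contained example
counts = Counter({('a', 'b'): 4, ('b', 'a'): 2, ('b', 'c'): 4,
                  ('c', 'a'): 9, ('c', 'b'): 8,
                  ('d', 'a'): 1, ('d', 'b'): 2, ('d', 'c'): 5})
keys = 'abcd'

buf = io.StringIO()  # stands in for the output file
writer = csv.writer(buf)
writer.writerow([''] + list(keys))  # header row: ,a,b,c,d
for ogate in keys:
    writer.writerow([ogate] + [counts[ogate, igate] for igate in keys])

print(buf.getvalue())
```

Missing (out, in) pairs default to 0 because Counter returns 0 for absent keys, which gives exactly the matrix shown in the question.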
If I understand your problem correctly, you could try and using a nested collections.defaultdict for this:
import csv
from collections import defaultdict

d = defaultdict(lambda: defaultdict(int))

with open('gates.csv') as in_file:
    csv_reader = csv.reader(in_file)
    next(csv_reader)
    for row in csv_reader:
        # assumes 3 columns; with the extra useless_column in the sample file,
        # unpack as: outs, _, ins, connect = row
        outs, ins, connect = row
        d[outs][ins] += int(connect)

gates = sorted(d)
for outs in gates:
    print(outs, [d[outs][ins] for ins in gates])
Which Outputs:
a [0, 4, 0, 0]
b [2, 0, 4, 0]
c [9, 8, 0, 0]
d [1, 2, 5, 0]
Is there a way to display zipped text vertically in a CSV? I tried many different kinds of '\n' and ',' but still can't get the array to be vertical.
if __name__ == '__main__':  # start of program
    master = Tk()
    newDirRH = "C:/VSMPlots"
    FileName = "J123"
    TypeName = "1234"
    Field = [1,2,3,4,5,6,7,8,9,10]
    Court = [5,4,1,2,3,4,5,1,2,3]
    for field, court in zip(Field, Court):
        stringText = ','.join((str(FileName), str(TypeName), str(Field), str(Court)))
        newfile = newDirRH + "/Try1.csv"
        text_file = open(newfile, "w")
        x = stringText
        text_file.write(x)
        text_file.close()
    print "Done"
This is the method I am looking for, but with your code I can't seem to add new columns, as all the columns repeat 10x.
You are not writing CSV data. You are writing Python string representations of lists. You are writing the whole Field and Court lists each iteration of your loop, instead of writing field and court, and Excel sees the comma in the Python string representation:
J123,1234,[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[5, 4, 1, 2, 3, 4, 5, 1, 2, 3]
J123,1234,[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],[5, 4, 1, 2, 3, 4, 5, 1, 2, 3]
etc.
while you wanted to write:
J123,1234,1,5
J123,1234,2,4
etc.
Use the csv module to produce CSV files:
import csv

with open(newfile, "wb") as csvfile:
    writer = csv.writer(csvfile)
    for field, court in zip(Field, Court):
        writer.writerow([FileName, TypeName, field, court])
Note the with statement; it takes care of closing the open file object for you. The csv module also makes sure everything is converted to strings.
If you want to write something only on the first row, keep a counter with your items; enumerate() makes that easy:
with open(newfile, "wb") as csvfile:
    writer = csv.writer(csvfile)
    # row of headers
    writer.writerow(['FileName', 'TypeName', 'field', 'court'])
    for i, (field, court) in enumerate(zip(Field, Court)):
        row = [FileName, TypeName] if i == 0 else ['', '']
        writer.writerow(row + [field, court])
I have a newbie question. I need help on separating a text file into columns and rows. Let's say I have a file like this:
1 2 3 4
2 3 4 5
and I want to put it into a 2d list called values = [[]]
I can get it to give me the rows OK, and this code works:
values = map(int, line.split(','))
I just don't know how to say the same thing for the columns, and the documentation doesn't make any sense.
cheers
f = open(filename,'rt')
a = [[int(token) for token in line.split()] for line in f.readlines()[::2]]
In your sample file above, you have an empty line between each data row - I took this into account, but you can drop the ::2 subscript if you didn't mean to have this extra line in your data.
Edit: added conversion to int - you can use map as well, but mixing list comprehensions and map seems ugly to me.
# Python 2 (uses print statements and itertools.izip)
import csv
import itertools

values = []
with open('text.file') as file_object:
    for line in csv.reader(file_object, delimiter=' '):
        values.append(map(int, line))

print "rows:", values
print "columns:"
for column in itertools.izip(*values):
    print column
Output is:
rows: [[1, 2, 3, 4], [2, 3, 4, 5]]
columns:
(1, 2)
(2, 3)
(3, 4)
(4, 5)
Get the data into your program by some method. Here's one:
f = open(textfile, 'r')
buffer = f.read()
f.close()
Parse the buffer into a table (note: strip() is used to clear any trailing whitespace):
table = [map(int, row.split()) for row in buffer.strip().split("\n")]
>>> print table
[[1, 2, 3, 4], [2, 3, 4, 5]]
Maybe it's ordered pairs you want instead, then transpose the table:
transpose = zip(*table)
>>> print transpose
[(1, 2), (2, 3), (3, 4), (4, 5)]
You could try the csv module. You can specify custom delimiters, so it might work.
If columns are separated by blanks
import re

A, B, C, D = [], [], [], []
pat = re.compile(r'([^ ]+)\s+([^ ]+)\s+([^ ]+)\s+([^ ]+)')
with open('try.txt') as f:
    for line in f:
        a, b, c, d = pat.match(line.strip()).groups()
        A.append(int(a)); B.append(int(b)); C.append(int(c)); D.append(int(d))
or with csv module
EDIT
A, B, C, D = [], [], [], []
with open('try.txt') as f:
    for line in f:
        a, b, c, d = line.split()
        A.append(int(a)); B.append(int(b)); C.append(int(c)); D.append(int(d))
But if there is more than one blank between elements of the data, this code will fail.
EDIT 2
Because the regex solution was called extremely hard to understand, it can be simplified as follows:
import re

A, B, C, D = [], [], [], []
pat = re.compile(r'\s+')
with open('try.txt') as f:
    for line in f:
        a, b, c, d = pat.split(line.strip())
        A.append(int(a)); B.append(int(b)); C.append(int(c)); D.append(int(d))