So I have a list:
>>> print(references)
['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ', 'Reference xxx-xxx-xxxx-00-398 ', 'Reference xxx-xxx-xxxx-00-399']
(The list is much longer than that)
I need to write a CSV file which would look like this:
Column 1:
Reference xxx-xxx-xxx-007
Reference xxx-xxx-xxx-001
[...]
I tried this:
c = csv.writer(open("file.csv", 'w'))
for item in references:
    c.writerows(item)
Or:
for i in range(0, len(references)):
    c.writerow(references[i])
But when I open the CSV file created, I get a window asking me to choose the delimiter, and no matter what I pick, I have something like:
R,e,f,e,r,e,n,c,e,...
writerows takes a sequence of rows, each of which is a sequence of columns, and writes them out.
But you only have a single list of values. So, you want:
for item in references:
    c.writerow([item])
Or, if you want a one-liner:
c.writerows([item] for item in references)
The point is, each row has to be a sequence; as it is, each row is just a single string.
So, why are you getting R,e,f,e,r,e,n,c,e,… instead of an error? Well, a string is a sequence of characters (each of which is itself a string). So, if you try to treat "Reference" as a sequence, it's the same as ['R', 'e', 'f', 'e', 'r', 'e', 'n', 'c', 'e'].
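A quick sketch of the difference, written to an io.StringIO buffer instead of a file so the result is easy to inspect:

```python
import csv
import io

references = ['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ']

# Wrong: a string is a sequence of characters, so each character
# becomes its own column.
buf = io.StringIO()
csv.writer(buf).writerow(references[0])
print(buf.getvalue()[:17])  # R,e,f,e,r,e,n,c,e

# Right: wrap each string in a one-item list, so each row has one column.
buf = io.StringIO()
writer = csv.writer(buf)
for item in references:
    writer.writerow([item])
print(buf.getvalue())
```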
In a comment, you asked:
Now what if I want to write something in the second column ?
Well, then each row has to be a list of two items. For example, let's say you had this:
references = ['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ']
descriptions = ['shiny thingy', 'dull thingy']
You could do this:
c.writerows(zip(references, descriptions))
Or, if you had this:
references = ['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ', 'Reference xxx-xxx-xxx-001 ']
descriptions = {'Reference xxx-xxx-xxx-007 ': 'shiny thingy',
                'Reference xxx-xxx-xxx-001 ': 'dull thingy'}
You could do this:
c.writerows((reference, descriptions[reference]) for reference in references)
The key is, find a way to create that list of lists—if you can't figure it out all in your head, you can print all the intermediate steps to see what they look like—and then you can call writerows. If you can only figure out how to create each single row one at a time, use a loop and call writerow on each row.
But what if you get the first column values, and then later get the second column values?
Well, you can't add a column to a CSV; you can only write by row, not column by column. But there are a few ways around that.
First, you can just write the table in transposed order:
c.writerow(references)
c.writerow(descriptions)
Then, after you import it into Excel, just transpose it back.
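If you'd rather undo the transposition in Python instead of Excel, zip(*rows) flips rows and columns. A sketch with made-up values, using an in-memory buffer:

```python
import csv
import io

# Write the table in transposed order, one list per row (made-up values).
transposed = io.StringIO()
w = csv.writer(transposed)
w.writerow(['ref-007', 'ref-001'])           # the references
w.writerow(['shiny thingy', 'dull thingy'])  # the descriptions

# Read it back and flip rows and columns with zip(*rows).
transposed.seek(0)
rows = list(csv.reader(transposed))
flipped = list(zip(*rows))
print(flipped)  # [('ref-007', 'shiny thingy'), ('ref-001', 'dull thingy')]
```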
Second, instead of writing the values as you get them, gather them up into a list, and write everything at the end. Something like this:
rows = [[item] for item in references]
# now rows is a 1-column table
# ... later
for i, description in enumerate(descriptions):
    rows[i].append(description)
# and now rows is a 2-column table
c.writerows(rows)
If worst comes to worst, you can always write the CSV, then read it back and write a new one to add the column:
with open('temp.csv', 'w') as temp:
    writer = csv.writer(temp)
    # write out the references
# later
with open('temp.csv') as temp, open('real.csv', 'w') as f:
    reader = csv.reader(temp)
    writer = csv.writer(f)
    writer.writerows(row + [description] for (row, description) in zip(reader, descriptions))
writerow writes the elements of an iterable to different columns. This means that if you provide a tuple, each element will go in one column. If you provide a string, each letter will go in one column. If you want each value in its own row, in a single column, wrap it in a one-item list:
c = csv.writer(open("file.csv", 'w'))
c.writerows([item] for item in references)
or
for item in references:
    c.writerow([item])
c = csv.writer(open("file.csv", 'w'))
c.writerows(["Reference"])
# cat file.csv
R,e,f,e,r,e,n,c,e
but
c = csv.writer(open("file.csv", 'w'))
c.writerow(["Reference"])
# cat file.csv
Reference
Would work as others have said.
My original answer was flawed due to confusing writerow and writerows.
Related
I have a .tsv file which I have attached along with this post. I have rows (cells) in the format A1, A2, A3 ... A12, B1 ... B12, ..., H1 ... H12. I need to re-arrange this to a format like A1, B1, C1, D1 ... H1, A2, B2, C2 ... H2, ..., A12, B12, C12 ... H12.
I need to do this using Python.
I have another .tsv file that allows me to compare it with this file. It is called flipped.tsv. The flipped.tsv file contains the accurate well values corresponding to the cells. In other words, I must map the well values to their accurate cell lines.
From what I have understood, the cell lines of the metadata are incorrectly arranged in column-major order, whereas they have to be arranged in row-major order, like in the flipped.tsv file.
For example :
"A2 of flipped_metadata.tsv has the same well values as that of B1 of metadata.tsv."
What is the logic that I can carry out to perform this in Python?
First .tsv file
flipped .tsv file
You could do the following:
import csv

# Read original file
with open("file.tsv", "r") as file:
    rows = list(csv.reader(file, delimiter="\t"))

# Key function for sorting
def key_func(row):
    """Transform a row into a sort key, e.g. ['A7', 1, 2] -> (7, 'A')"""
    return int(row[0][1:]), row[0][0]

# Write 'flipped' file
with open("file_flipped.tsv", "w") as file:
    csv.writer(file, delimiter="\t").writerows(
        row[:1] + flipped[1:]
        for row, flipped in zip(rows, sorted(rows, key=key_func))
    )
The flipping is done by sorting the original rows by
first the integer part of their first row entry int(row[0][1:]), and
then the character part of their first entry row[0][0].
If the effect of the sorting isn't obvious, take a look at the result of the same operation, just without the relabelling of the first column:
with open("file_flipped.tsv", "w") as file:
    csv.writer(file, delimiter="\t").writerows(
        sorted(rows, key=key_func)
    )
Output:
A1 26403 23273
B1 27792 8805
C1 5668 19510
...
F12 100 28583
G12 18707 14889
H12 13544 7447
The blocks are built based on the number part first, and within each block the lines run through the sorted characters.
This only works as long as the non-number part always has exactly one character.
If the non-number part always has exactly 2 characters, then the return of the key function has to be adjusted to int(row[0][2:]), row[0][:2] etc.
If there's more variability allowed, e.g. between 1 and 5 characters, then a regex approach would be more appropriate:
import re

re_key = re.compile(r"([a-zA-Z]+)(\d+)")

def key_func(row):
    """Transform a row into a sort key, e.g. ['Aa7', 10, 20] -> (7, 2, 'Aa')"""
    word, number = re_key.match(row[0]).group(1, 2)
    return int(number), len(word), word
And, depending on how the words have to be sorted, it might be necessary to include the length of the word into the sort key: Python sorts ['B', 'AA', 'A'] naturally into ['A', 'AA', 'B'] and not ['A', 'B', 'AA']. The addition of the length, like in the function, does achieve that.
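A quick illustration of what the length adds to the key:

```python
words = ['B', 'AA', 'A']

# Plain lexicographic sort: 'AA' comes before 'B'.
plain = sorted(words)
print(plain)  # ['A', 'AA', 'B']

# Adding the length to the key sorts all 1-character words first.
by_length = sorted(words, key=lambda w: (len(w), w))
print(by_length)  # ['A', 'B', 'AA']
```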
I have a large number of files that are named according to gradually more specific criteria.
Each part of the filename separated by '_' relates to a drilled-down categorization of that file.
The naming convention looks like this:
TEAM_STRATEGY_ATTRIBUTION_TIMEFRAME_DATE_FILEVIEW
What I am trying to do is iterate through all these files and then pull out a list of how many different occurrences of each naming convention exists.
So essentially this is what I've done so far: I iterated through all the files and made a list of each name. I then split each name on '_' and appended each part to its respective category list.
Now I'm trying to export them to a CSV file separated by columns, and this is where I'm running into problems:
L = [teams, strategies, attributions, time_frames, dates, file_types]
columns = zip(*L)
list(columns)
with open(_outputfolder_, 'w') as f:
    writer = csv.writer(f)
    for column in columns:
        print(column)
This is a rough estimation of the list I'm getting out:
[{'TEAM1'},
{'STRATEGY1', 'STRATEGY2', 'STRATEGY3', 'STRATEGY4', 'STRATEGY5', 'STRATEGY6', 'STRATEGY7', 'STRATEGY8', 'STRATEGY9', 'STRATEGY10','STRATEGY11', 'STRATEGY12', 'STRATEGY13', 'STRATEGY14', 'STRATEGY15'},
{'ATTRIBUTION1','ATTRIBUTION1','Attribution3','Attribution4','Attribution5', 'Attribution6', 'Attribution7', 'Attribution8', 'Attribution9', 'Attribution10'},
{'TIME_FRAME1', 'TIME_FRAME2', 'TIME_FRAME3', 'TIME_FRAME4', 'TIME_FRAME5', 'TIME_FRAME6', 'TIME_FRAME7'},
{'DATE1'},
{'FILE_TYPE1', 'FILE_TYPE2'}]
What I want the final result to look like is something like:
Team1 STRATEGY1 ATTRIBUTION1 TIME_FRAME1 DATE1 FILE_TYPE1
STRATEGY2 ATTRIBUTION2 TIME_FRAME2 FILE_TYPE2
... ... ...
etc. etc. etc.
But only the first line actually gets stored in the CSV file.
Can anyone help me understand how to iterate past just the first line? I'm sure this is happening because the Team category has only one option, but I don't want that to hinder it.
You have to transpose the result and use it; refer to the post below:
Python - Transposing a list (rows with different length) using numpy fails.
I have used natural sorting to sort the integers and padded the lists with blanks to get the expected outcome.
Natural sorting is slower for larger lists; you can also use third-party libraries:
Does Python have a built in function for string natural sort?
import csv
import re

def natural_sort(l):
    convert = lambda text: int(text) if text.isdigit() else text.lower()
    alphanum_key = lambda key: [convert(c) for c in re.split('([0-9]+)', key)]
    return sorted(l, key=alphanum_key)

res = [[] for _ in range(max(len(sl) for sl in columns))]
for sl in columns:
    sorted_sl = natural_sort(sl)
    for x, res_sl in zip(sorted_sl, res):
        res_sl.append(x)

# prepend a blank to every row except the first
count = 0
for result in res:
    if count > 0:
        result.insert(0, '')
    count = count + 1

with open("test.csv", 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerows(res)
The columns should be converted into lists before writing them to the CSV file. The writerows method can be leveraged to write multiple rows. You can find more information here: https://docs.python.org/2/library/csv.html
Output:
TEAM1,STRATEGY1,ATTRIBUTION1,TIME_FRAME1,DATE1,FILE_TYPE1
,STRATEGY2,Attribution3,TIME_FRAME2,FILE_TYPE2
,STRATEGY3,Attribution4,TIME_FRAME3
,STRATEGY4,Attribution5,TIME_FRAME4
,STRATEGY5,Attribution6,TIME_FRAME5
,STRATEGY6,Attribution7,TIME_FRAME6
,STRATEGY7,Attribution8,TIME_FRAME7
,STRATEGY8,Attribution9
,STRATEGY9,Attribution10
,STRATEGY10
,STRATEGY11
,STRATEGY12
,STRATEGY13
,STRATEGY14
,STRATEGY15
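For comparison, itertools.zip_longest can do the padding in one step instead of the manual blank-appending (a sketch with shortened stand-in lists; order within each category still needs sorting first):

```python
import csv
import io
from itertools import zip_longest

# Made-up, shortened stand-ins for the category collections from the question.
columns = [
    ['TEAM1'],
    ['STRATEGY1', 'STRATEGY2', 'STRATEGY3'],
    ['DATE1'],
]

# zip_longest pads the shorter columns with '' so no rows are dropped.
rows = zip_longest(*columns, fillvalue='')

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
# TEAM1,STRATEGY1,DATE1
# ,STRATEGY2,
# ,STRATEGY3,
```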
Hey, I'm having an issue creating a list of all the strings from my list that match a regex, together with the field names associated with the DictReader.
I am looping through an array of strings and trying to see if each string matches a pattern:
reader = csv.DictReader(file)
for mystr in reader:
    for i in range(len(mystr)):
        if re.search(pattern, list(mystr.values())[i]):
            data.append([list(reader.fieldnames)[i], list(mystr.values())[i]])
When a string matches the pattern, it appends the matched string and the csv field name to a list.
This works; however, it sometimes appends a seemingly random field name to the correct and expected matched regex value.
I.E, If my data was ordered
Names, Location, Price
Sometimes the if condition from the regex will append the field name Location to the numerical value associated with Price, and there seems to be no predictable pattern as to which values it associates...
The results:
[['firstitem'], ['seconditem'], ['thirditem'], ['fourthitem', '27'], ['fifthitem', '201']]
[['firstitem','1'], ['seconditem'], ['thirditem','12'], ['fourthitem'], ['fifthitem']]
etc..
The numbers all appear in the correct order; they just are not aligned in any pattern/order I can read, so I'm not sure why they appear somewhat random. Any help would be appreciated.
I think you can simplify your code like this:
reader = csv.DictReader(file)
for mystr in reader:
    for fieldname, value in mystr.items():
        if re.search(pattern, value):
            data.append([fieldname, value])
That way, it is easier to understand…
Given a completely contrived csv like the following (saved as 'test.csv'):
firstitem, seconditem, thirditem, fourthitem, fifthitem
first, price, 1, nothing, important
second, price, 2, over, here
Then the following should extract all columns with integers:
>>> def get_items(pattern, csv_file):
...     with open(csv_file) as file:
...         for entry in csv.DictReader(file):
...             for field_name, value in entry.items():
...                 if re.search(pattern, value):
...                     yield [field_name, value]
...
>>> data = list(get_items(r'\d+', 'test.csv'))
>>> data
[[' thirditem', ' 1'], [' thirditem', ' 2']]
Alternatively, you could use if value.strip().isdigit() as the conditional statement rather than having to use regex.
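The two conditions aren't quite equivalent, though; a quick comparison on some made-up values shows where they differ:

```python
import re

# Made-up values as they might come out of a csv row.
values = [' 1', 'price', 'room 5', '27 ']

# re.search matches digits anywhere in the value...
by_regex = [v for v in values if re.search(r'\d+', v)]
print(by_regex)    # [' 1', 'room 5', '27 ']

# ...while strip().isdigit() requires the whole trimmed value to be digits.
by_isdigit = [v for v in values if v.strip().isdigit()]
print(by_isdigit)  # [' 1', '27 ']
```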
I am extracting data from the Google Adwords Reporting API via Python. I can successfully pull the data and then hold it in a variable data.
>>> data = get_report_data_from_google()
>>> type(data)
<class 'str'>
Here is a sample:
data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
I need to process this data more, and ultimately output a processed flat file (Google Adwords API can return a CSV, but I need to pre-process the data before loading it into a database.).
If I try to turn data into a csv object, and try to print each line, I get one character per line like:
c = csv.reader(data, delimiter=',')
for i in c:
    print(i)
['I']
['D']
['', '']
['L']
['a']
['b']
['e']
['l']
['s']
['', '']
['D']
['a']
['t']
['e']
So, my idea was to process each column of each line into a list, then add that to a csv object. Trying that:
for line in data.splitlines():
    print(line)
3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016
What I actually find is that inside of the str, there is a list: "[""SKWS"",""Exact""]"
This value is a "label" (see the documentation).
This list is formatted a bit weird - it has numerous doubled quote characters in the value, so trying to use a quote char, like ", will return something like [ SKWS Exact ]. If I could get to [""SKWS"",""Exact""], that would be acceptable.
Is there a good way to extract a list object within a str? Is there a better way to process and output this data to a csv?
You need to split the string first. csv.reader expects something that provides a single line on each iteration, like a standard file object does. If you have a string with newlines in it, split it on the newline character with splitlines():
>>> import csv
>>> data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
>>> c = csv.reader(data.splitlines(), delimiter=',')
>>> for line in c:
... print(line)
...
['ID', 'Labels', 'Date', 'Year']
['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
This has to do with how csv.reader works.
According to the documentation:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
The issue here is that if you pass a string, it supports the iterator protocol, and returns a single character for each call to next. The csv reader will then consider each character as a line.
You need to provide a list of lines, one for each line of your CSV. For example:
c = csv.reader(data.splitlines(), delimiter=',')
for i in c:
    print(i)
(splitlines() is safer than a bare split(), which splits on any whitespace and would break fields containing spaces.)
# ['ID', 'Labels', 'Date', 'Year']
# ['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
# ['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
# ['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
Now, your list looks like a JSON list. You can use the json module to read it.
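A tiny sketch: once the csv module has collapsed the doubled quotes, the Labels field is valid JSON, and json.loads turns it into a Python list:

```python
import json

# The csv reader already turned "[""SKWS"",""Exact""]" into this string:
labels_field = '["SKWS","Exact"]'

labels = json.loads(labels_field)
print(labels)     # ['SKWS', 'Exact']
print(labels[1])  # Exact
```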
I have two lists in Python: 'away' and 'home'. I want to append them to an already existing CSV file such that I write a row solely of the 1st element of away, then the 1st element of home, then the 2nd element of away, then the 2nd element of home, etc., with empty spaces in between them, so it will be like this:
away1
home1
away2
home2
away3
home3
and so on and so on. The size of the away and home lists is the same, but might change day to day. How can I do this?
Thanks
Looks like you just want the useful and flexible zip built-in.
>>> away = ["away1", "away2", "away3"]
>>> home = ["home1", "home2", "home3"]
>>> list(zip(away, home))
[('away1', 'home1'), ('away2', 'home2'), ('away3', 'home3')]
import csv

away = ["away1", "away2", "away3"]
home = ["home1", "home2", "home3"]

record_list = [list(item) for item in zip(away, home)]
print(record_list)
# record_list = [['away1', 'home1'], ['away2', 'home2'], ['away3', 'home3']]

with open("sample.csv", "a") as fp:
    writer = csv.writer(fp)
    writer.writerows(record_list)
You should use the writerows method to write multiple lists at a time, one per row.
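Note the question's sample output puts every value on its own row. If that single-column layout is the goal, one way (a sketch using itertools.chain and an in-memory buffer) is to interleave the lists first:

```python
import csv
import io
from itertools import chain

away = ["away1", "away2", "away3"]
home = ["home1", "home2", "home3"]

# Interleave the two lists, then wrap each value in a one-item row.
rows = [[item] for item in chain.from_iterable(zip(away, home))]

buf = io.StringIO()
csv.writer(buf).writerows(rows)
print(buf.getvalue())
# away1
# home1
# away2
# home2
# away3
# home3
```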