Extract list from a string - python

I am extracting data from the Google Adwords Reporting API via Python. I can successfully pull the data and then hold it in a variable data.
data = get_report_data_from_google()
type(data)
str
Here is a sample:
data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
I need to process this data more, and ultimately output a processed flat file (Google Adwords API can return a CSV, but I need to pre-process the data before loading it into a database.).
If I try to turn data into a csv object, and try to print each line, I get one character per line like:
c = csv.reader(data, delimiter=',')
for i in c:
    print(i)
['I']
['D']
['', '']
['L']
['a']
['b']
['e']
['l']
['s']
['', '']
['D']
['a']
['t']
['e']
So, my idea was to process each column of each line into a list, then add that to a csv object. Trying that:
for line in data.splitlines():
    print(line)
3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016
What I actually find is that inside of the str, there is a list: "[""SKWS"",""Exact""]"
This value is a "label", as described in the documentation.
This list is formatted a bit oddly: the value contains many doubled quotation marks, so setting a quote character such as " returns something like this: [ SKWS Exact ]. If I could get to [""SKWS"",""Exact""], that would be acceptable.
Is there a good way to extract a list object within a str? Is there a better way to process and output this data to a csv?

You need to split the string first. csv.reader expects something that provides a single line on each iteration, like a standard file object does. If you have a string with newlines in it, split it on the newline character with splitlines():
>>> import csv
>>> data = 'ID,Labels,Date,Year\n3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n3282565342,"[""SKWS"",""Broad""]",2016-05-16,2016\n'
>>> c = csv.reader(data.splitlines(), delimiter=',')
>>> for line in c:
...     print(line)
...
['ID', 'Labels', 'Date', 'Year']
['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']

This has to do with how csv.reader works.
According to the documentation:
csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called
The issue here is that if you pass a string, it supports the iterator protocol, and returns a single character for each call to next. The csv reader will then consider each character as a line.
You need to provide a list of lines, one for each line of your csv. For example:
c = csv.reader(data.splitlines(), delimiter=',')
for i in c:
    print(i)
# ['ID', 'Labels', 'Date', 'Year']
# ['3179799191', '["SKWS","Exact"]', '2016-05-16', '2016']
# ['3179461237', '["SKWS","Broad"]', '2016-05-16', '2016']
# ['3282565342', '["SKWS","Broad"]', '2016-05-16', '2016']
Now, your list looks like a JSON list. You can use the json module to read it.
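A minimal sketch of that last step, assuming every value in the Labels column is a JSON array (a value that is not valid JSON would make json.loads raise an error). The names labels_col and rows are chosen here for illustration:

```python
import csv
import json

# Sample in the same shape as the report string from the question.
data = ('ID,Labels,Date,Year\n'
        '3179799191,"[""SKWS"",""Exact""]",2016-05-16,2016\n'
        '3179461237,"[""SKWS"",""Broad""]",2016-05-16,2016\n')

reader = csv.reader(data.splitlines())
header = next(reader)                  # first line holds the column names
labels_col = header.index('Labels')

rows = []
for row in reader:
    # csv has already un-doubled the quotes, so the field is now
    # '["SKWS","Exact"]', which json.loads turns into a Python list.
    row[labels_col] = json.loads(row[labels_col])
    rows.append(row)

print(rows[0])
# ['3179799191', ['SKWS', 'Exact'], '2016-05-16', '2016']
```

From here each row can be processed further and written back out with csv.writer.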

Related

CSV delimiter doesn't work properly [Python]

import csv
base='eest1#mail.ru,username1\
test2#gmail.com,username2\
test3#gmail.com,username3\
test4#rambler.ru,username4\
test5#ya.ru,username5'
parsed=csv.reader(base, delimiter=',')
for p in parsed:
    print p
Returns:
['e']
['e']
['s']
['t']
['1']
['#']
['m']
['a']
['i']
['l']
['.']
['r']
['u']
['', '']
etc...
How can I get the data separated by commas?
('test1#gmail.com', 'username1'),
('test2#gmail.com', 'username2'),
...
I think csv only works with file-like objects. You can use StringIO in this case.
import csv
import StringIO
base='''eest1#mail.ru,username
test2#gmail.com,username2
test3#gmail.com,username3
test4#rambler.ru,username4
test5#ya.ru,username5'''
parsed=csv.reader(StringIO.StringIO(base), delimiter=',')
for p in parsed:
    print p
OUTPUT
['eest1#mail.ru', 'username']
['test2#gmail.com', 'username2']
['test3#gmail.com', 'username3']
['test4#rambler.ru', 'username4']
['test5#ya.ru', 'username5']
Also, your example string does not have newlines, so you would get
['eest1#mail.ru', 'usernametest2#gmail.com', 'username2test3#gmail.com', 'username3test4#rambler.ru', 'username4test5#ya.ru', 'username5']
You can use the ''' like I did, or change your base like
base='eest1#mail.ru,username\n\
test2#gmail.com,username2\n\
test3#gmail.com,username3\n\
test4#rambler.ru,username4\n\
test5#ya.ru,username5'
EDIT
According to the docs, the argument can be either a file-like object OR a list. So this works too
parsed=csv.reader(base.splitlines(), delimiter=',')
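The StringIO code above is written for Python 2. On Python 3 the standalone StringIO module is gone and io.StringIO takes its place. A minimal sketch of the same idea, using a shortened copy of the sample data:

```python
import csv
import io

base = '''eest1#mail.ru,username
test2#gmail.com,username2
test3#gmail.com,username3'''

# io.StringIO wraps the string in a file-like object, so iterating
# over it yields one line at a time, which is what csv.reader needs.
parsed = list(csv.reader(io.StringIO(base), delimiter=','))

for p in parsed:
    print(p)
# ['eest1#mail.ru', 'username']
# ['test2#gmail.com', 'username2']
# ['test3#gmail.com', 'username3']
```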
Quoting official docs on csv module (emphasis mine):
csv.reader(csvfile, dialect='excel', **fmtparams)
Return a reader object which will iterate over lines in the given
csvfile. csvfile can be any object which supports the iterator
protocol and returns a string each time its __next__() method is
called — file objects and list objects are both suitable.
Strings support iteration, but iterating over one yields its characters one by one, not the lines of a multi-line string.
>>> s = "abcdef"
>>> i = iter(s)
>>> next(i)
'a'
>>> next(i)
'b'
>>> next(i)
'c'
So the task is to create an iterator that yields lines, not characters, on each iteration. Unfortunately, your string literal is not a multiline string.
base='eest1#mail.ru,username1\
test2#gmail.com,username2\
test3#gmail.com,username3\
test4#rambler.ru,username4\
test5#ya.ru,username5'
is equivalent to:
base = 'eest1#mail.ru,username1test2#gmail.com,username2test3#gmail.com,username3test4#rambler.ru,username4test5#ya.ru,username5'
Essentially, you do not have the information required to parse that string correctly. Try using a multiline string literal instead:
base='''eest1#mail.ru,username1
test2#gmail.com,username2
test3#gmail.com,username3
test4#rambler.ru,username4
test5#ya.ru,username5'''
After this change you can split your string on newline characters and everything should work fine:
parsed=csv.reader(base.splitlines(), delimiter=',')
for p in parsed:
    print(p)

Convert a Python list of lists to a string

How can I best convert a list of lists in Python to a (for example) comma-delimited-value, newline-delimited-row, string? Ideally, I'd be able to do something like this:
>>> import csv
>>> matrix = [["Frodo","Baggins","Hole-in-the-Ground, Shire"],["Sauron", "I forget", "Mordor"]]
>>> csv_string = csv.generate_string(matrix)
>>> print(csv_string)
Frodo,Baggins,"Hole-in-the-Ground, Shire"
Sauron,I forget,Mordor
I know that Python has a csv module, as seen in SO questions like this but all of its functions seem to operate on a File object. The lists are small enough that using a file is overkill.
I'm familiar with the join function, and there are plenty of SO answers about it. But this doesn't handle values that contain a comma, nor does it handle multiple rows unless I nest a join within another join.
Combine the csv-module with StringIO:
import io, csv
result = io.StringIO()
writer = csv.writer(result)
writer.writerow([5,6,7])
print(result.getvalue())
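Applied to the matrix from the question, the same combination might look like the sketch below. The helper name to_csv_string is made up for illustration, and lineterminator='\n' is a choice made here (the csv default is '\r\n'):

```python
import csv
import io

def to_csv_string(rows):
    """Return rows (a list of lists) as one CSV-formatted string."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator='\n')
    writer.writerows(rows)
    return buffer.getvalue()

matrix = [["Frodo", "Baggins", "Hole-in-the-Ground, Shire"],
          ["Sauron", "I forget", "Mordor"]]
csv_string = to_csv_string(matrix)
print(csv_string)
# Frodo,Baggins,"Hole-in-the-Ground, Shire"
# Sauron,I forget,Mordor
```

Only the field that contains a comma gets quoted, because csv.writer uses minimal quoting by default.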
The approach in the question you link to as a reference for join, combined with a nested join (what's wrong with that?), works as long as you can convert all of the objects contained in your list of lists to strings:
list_of_lists = [[1, 'a'], [2, 3, 'b'], ['c', 'd']]
joined = '\n'.join(','.join(map(str, row)) for row in list_of_lists)
print(joined)
Output:
1,a
2,3,b
c,d
EDIT:
If the string representation of your objects may contain commas, here are a couple of things you could do to achieve an output that can recover the original list of lists:
escape those commas, or
wrap said string representations in some flavor of quotes (then you have to escape the occurrences of that character inside your values). This is precisely what the combination of io.StringIO and csv does (see Daniel's answer).
To achieve the first, you could do
import re
def escape_commas(obj):
    # r'\\,' is the replacement template for a literal backslash followed by a comma
    return re.sub(',', r'\\,', str(obj))
joined = '\n'.join(','.join(map(escape_commas, row)) for row in list_of_lists)
For the second,
import re
def wrap_in_quotes(obj):
    # '\"' is just '"' in a Python string literal, so the backslash is written explicitly here
    return '"' + re.sub('"', r'\\"', str(obj)) + '"'
joined = '\n'.join(','.join(map(wrap_in_quotes, row)) for row in list_of_lists)
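For comparison, here is a sketch of the round trip through the csv module itself, which escapes a quote inside a value by doubling it rather than by adding a backslash. The sample values are invented for illustration, and splitting on lines assumes no value contains a newline:

```python
import csv
import io

list_of_lists = [[1, 'a,b'], ['say "hi"', 2.5]]

buffer = io.StringIO()
csv.writer(buffer, lineterminator='\n').writerows(list_of_lists)
text = buffer.getvalue()
print(text)
# 1,"a,b"
# "say ""hi""",2.5

# Reading the text back recovers the structure, but every value
# comes back as a string.
recovered = list(csv.reader(text.splitlines()))
print(recovered)
# [['1', 'a,b'], ['say "hi"', '2.5']]
```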

Split string by comma, ignoring comma inside string. Am trying CSV

I have a string like this:
s = '1,2,"hello, there"'
And I want to turn it into a list:
[1,2,"hello, there"]
Normally I'd use split:
my_list = s.split(",")
However, that doesn't work if there's a comma in a string.
So, I've read that I need to use csv, but I don't really see how. I've tried:
from csv import reader
s = '1,2,"hello, there"'
ll = reader(s)
print ll
for row in ll:
    print row
Which writes:
<_csv.reader object at 0x020EBC70>
['1']
['', '']
['2']
['', '']
['hello, there']
I've also tried with
ll = reader(s, delimiter=',')
It is that way because you provide the csv reader input as a string. If you do not want to use a file or a StringIO object just wrap your string in a list as shown below.
>>> import csv
>>> s = ['1,2,"hello, there"']
>>> ll = csv.reader(s, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
>>> list(ll)
[['1', '2', 'hello, there']]
It sounds like you probably want to use the csv module. To use the reader on a string, you want a StringIO object.
As an example:
>>> import csv, StringIO
>>> print list(csv.reader(StringIO.StringIO(s)))
[['1', '2', 'hello, there']]
To clarify, csv.reader expects an iterable that yields one line at a time, such as a file-like object, not a bare string. So StringIO does the trick. However, if you're reading this csv from a file object (a typical use case), you can just as easily give the file object to the reader and it'll work the same way.
It's usually easier to re-use than to reinvent the bicycle... You just need to use the csv library properly. If you can't for some reason, you can always check out the source code and learn how the parsing is done there.
Example for parsing a single string into a list. Notice that the string is wrapped in a list.
>>> import csv
>>> s = '1,2,"hello, there"'
>>> list(csv.reader([s]))[0]
['1', '2', 'hello, there']
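If the goal is the list from the question, with 1 and 2 as integers, one possible follow-up is to convert the numeric fields after parsing. A minimal sketch, where parse_line is a hypothetical helper name and only unsigned whole numbers are converted:

```python
import csv

def parse_line(line):
    """Parse one CSV line; turn purely numeric fields into ints."""
    fields = next(csv.reader([line]))
    # str.isdecimal() is true only for strings made of decimal digits,
    # so negative numbers and floats stay as strings in this sketch.
    return [int(f) if f.isdecimal() else f for f in fields]

result = parse_line('1,2,"hello, there"')
print(result)
# [1, 2, 'hello, there']
```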
You can split first on the quote character, then on commas for every even index (the pieces that are outside the quoted string):
import itertools
new_data = s.split('"')
for i in range(len(new_data)):
    if i % 2 == 1:  # odd indices were inside quotes: keep them whole
        new_data[i] = [new_data[i]]
    else:  # even indices were outside quotes: split on commas, dropping empty pieces
        new_data[i] = [part for part in new_data[i].split(",") if part]
data = list(itertools.chain(*new_data))
Which goes something like:
'1,2,"hello, there"'
['1,2,', 'hello, there', '']
[['1', '2'], ['hello, there'], []]
['1', '2', 'hello, there']
But it's probably better to use the csv library if that's what you're working with.
You could also use ast.literal_eval if you want to preserve the integers:
>>> from ast import literal_eval
>>> literal_eval('[{}]'.format('1,2,"hello, there"'))
[1, 2, 'hello, there']

Writing a csv file python

So I have a list:
>>> print references
['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ', 'Reference xxx-xxx-xxxx-00-398 ', 'Reference xxx-xxx-xxxx-00-399']
(The list is much longer than that)
I need to write a CSV file which would look like this:
Column 1:
Reference xxx-xxx-xxx-007
Reference xxx-xxx-xxx-001
[...]
I tried this :
c = csv.writer(open("file.csv", 'w'))
for item in references:
    c.writerows(item)
Or:
for i in range(0,len(references)):
    c.writerow(references[i])
But when I open the csv file created, I get a window asking me to choose the delimiter.
No matter what, I have something like
R,e,f,e,r,e,n,c,es
writerows takes a sequence of rows, each of which is a sequence of columns, and writes them out.
But you only have a single list of values. So, you want:
for item in references:
    c.writerow([item])
Or, if you want a one-liner:
c.writerows([item] for item in references)
The point is, each row has to be a sequence; as it is, each row is just a single string.
So, why are you getting R,e,f,e,r,e,n,c,e,… instead of an error? Well, a string is a sequence of characters (each of which is itself a string). So, if you try to treat "Reference" as a sequence, it's the same as ['R', 'e', 'f', 'e', 'r', 'e', 'n', 'c', 'e'].
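The difference is easy to see with an in-memory buffer. A small sketch, using io.StringIO as a stand-in for the file and a shortened value:

```python
import csv
import io

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator='\n')

writer.writerow("Ref")      # a bare string: each character becomes a column
writer.writerow(["Ref"])    # a one-item list: the whole string is one column

output = buffer.getvalue()
print(output)
# R,e,f
# Ref
```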
In a comment, you asked:
Now what if I want to write something in the second column ?
Well, then each row has to be a list of two items. For example, let's say you had this:
references = ['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ']
descriptions = ['shiny thingy', 'dull thingy']
You could do this:
c.writerows(zip(references, descriptions))
Or, if you had this:
references = ['Reference xxx-xxx-xxx-007 ', 'Reference xxx-xxx-xxx-001 ', 'Reference xxx-xxx-xxx-001 ']
descriptions = {'Reference xxx-xxx-xxx-007 ': 'shiny thingy',
                'Reference xxx-xxx-xxx-001 ': 'dull thingy'}
You could do this:
c.writerows((reference, descriptions[reference]) for reference in references)
The key is, find a way to create that list of lists—if you can't figure it out all in your head, you can print all the intermediate steps to see what they look like—and then you can call writerows. If you can only figure out how to create each single row one at a time, use a loop and call writerow on each row.
But what if you get the first column values, and then later get the second column values?
Well, you can't add a column to a CSV; you can only write by row, not column by column. But there are a few ways around that.
First, you can just write the table in transposed order:
c.writerow(references)
c.writerow(descriptions)
Then, after you import it into Excel, just transpose it back.
Second, instead of writing the values as you get them, gather them up into a list, and write everything at the end. Something like this:
rows=[[item] for item in references]
# now rows is a 1-column table
# ... later
for i, description in enumerate(descriptions):
    rows[i].append(description)
# and now rows is a 2-column table
c.writerows(rows)
If worst comes to worst, you can always write the CSV, then read it back and write a new one to add the column:
with open('temp.csv', 'w') as temp:
    writer=csv.writer(temp)
    # write out the references
# later
with open('temp.csv') as temp, open('real.csv', 'w') as f:
    reader=csv.reader(temp)
    writer=csv.writer(f)
    writer.writerows(row + [description] for (row, description) in zip(reader, descriptions))
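When both columns fit in memory, the gather-then-write option is usually the simplest. A sketch under that assumption, with io.StringIO standing in for the output file and shortened sample values:

```python
import csv
import io

references = ['Reference xxx-xxx-xxx-007', 'Reference xxx-xxx-xxx-001']
descriptions = ['shiny thingy', 'dull thingy']   # obtained later

# zip pairs the i-th reference with the i-th description,
# turning two columns into a sequence of rows.
rows = list(zip(references, descriptions))

buffer = io.StringIO()   # stand-in for open('real.csv', 'w', newline='')
csv.writer(buffer, lineterminator='\n').writerows(rows)
table = buffer.getvalue()
print(table)
# Reference xxx-xxx-xxx-007,shiny thingy
# Reference xxx-xxx-xxx-001,dull thingy
```

Note that zip stops at the shorter of the two lists, so a missing description silently drops the corresponding row.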
writerow writes the elements of an iterable in different columns. This means that if you provide a tuple, each element will go in one column. If you provide a string, each letter will go in one column. If you want all the content in the same column, do the following:
c = csv.writer(open("file.csv", 'w'))
c.writerows([item] for item in references)
or
for item in references:
    c.writerow([item])
c = csv.writer(open("file.csv", 'w'))
c.writerows(["Reference"])
# cat file.csv
R,e,f,e,r,e,n,c,e
but
c = csv.writer(open("file.csv", 'w'))
c.writerow(["Reference"])
# cat file.csv
Reference
Would work as others have said.
My original answer was flawed due to confusing writerow and writerows.

python parse csv to lists

I have a csv file whose data I want to parse into lists.
So I am using the Python csv module to read it,
basically as follows:
import csv
fin = csv.reader(open(path,'rb'),delimiter=' ',quotechar='|')
print fin[0]
#gives the following
['"1239","2249.00","1","3","2011-02-20"']
#lets say i do the following
ele = str(fin[0])
ele = ele.strip().split(',')
print ele
#gives me following
['[\'"1239"', '"2249.00"', '"1"', '"3"', '"2011-02-20"\']']
Now ele[0] gives me this output: ['"1239"
How do I get rid of that ['?
In the end, what I want to do is get 1239 and convert it to an integer.
Any clues why this is happening?
Thanks
Edit: Never mind, resolved thanks to the first comment.
Change your delimiter to ',' and you will get a list of those values from the csv reader.
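A minimal sketch of that suggestion on Python 3, using an in-memory list in place of the file (the sample line is copied from the question's output):

```python
import csv

# One line in the same shape as the file from the question.
lines = ['"1239","2249.00","1","3","2011-02-20"']

reader = csv.reader(lines, delimiter=',', quotechar='"')
row = next(reader)
print(row)
# ['1239', '2249.00', '1', '3', '2011-02-20']

first = int(row[0])   # the quotes are already stripped by the reader
print(first)
# 1239
```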
It's because you are converting a list to a string, there is no need to do this. Grab the first element of the list (in this case it is a string) and parse that:
>>> a = ['"1239","2249.00","1","3","2011-02-20"']
>>> a
['"1239","2249.00","1","3","2011-02-20"']
>>> a[0]
'"1239","2249.00","1","3","2011-02-20"'
>>> b = a[0].replace('"', '').split(',')
>>> b[-1]
'2011-02-20'
Of course, before you call the replace and split string methods, you should check that the value is a string, or handle the exception if it isn't.
Also, Blahdiblah is correct: your delimiter is probably wrong.
