Python CSV output - additional formatting

To start...Python noob...
My first goal is to read the first row of a CSV and output. The following code does that nicely.
import csv
csvfile = open('some.csv','rb')
csvFileArray = []
for row in csv.reader(csvfile, delimiter = ','):
    csvFileArray.append(row)
print(csvFileArray[0])
Output looks like...
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',......
My second and third tasks deal with formatting.
Thus, if I want the print(csvFileArray[0]) output to contain 'double quotes' around each value, how best can I handle that?
I'd like to see...
["Date","Time", "CPU001 User%", "CPU001 Sys%",......
I have played with formatting the csvFileArray field and all I can get it to do is to prefix or append data.
I have also looked into the 'dialect', 'quoting', etc., but am just all over the place.
My last task is to add text into each value (into the array).
Example:
["Test Date","New Time", "Red CPU001 User%", "Blue CPU001 Sys%",......
I've researched a number of methods to do this but am awash in the multiple ways.
Should I ditch the Array as this is too constraining?
Looking for direction not necessarily someone to write it for me.
Thanks.
OK.....refined the code a bit and am looking for direction, not direct solution (need to learn).
import csv
with open('ba200952fd69 - Copy.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
        break
The code nicely reads the first line of the CSV and outputs the first row as follows:
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',....
If I want to add formatting to each/any item within that row, would I be performing those actions within the quotes of the print command? Example: If I wanted each item to have double-quotes, or have a prefix of 'XXXX', etc.
I have read through examples of .join type commands, etc., and am sure that there are much easier ways to format print output than I'm aware of.
Again, looking for direction, not immediate solutions.

For your first task, I'd recommend using the next function to grab the first row rather than iterating through the whole csv. Also, it might be useful to take a look at with blocks as they are the standard way of dealing with opening and closing files.
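A minimal sketch combining both suggestions might look like this (reusing the filename from your question):
import csv

with open('some.csv', newline='') as csvfile:   # the with block closes the file for you
    reader = csv.reader(csvfile)
    first_row = next(reader)                    # grabs only the first row
print(first_row)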
For your second question, it looks like you want to change the format of the print statement. Note that it is printing strings, which is indicated by the single quotes around each element in the list. This has nothing to do with the csv module; it is simply because you are printing a list of strings. To print with double quotes, you would have to reformat the print statement. You could take a look at this for some ways of doing that.
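One possible way (just a sketch, reusing the header values from your output) is to build the quoted string yourself:
row = ['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%']
print('[' + ', '.join('"{}"'.format(item) for item in row) + ']')
# prints: ["Date", "Time", "CPU001 User%", "CPU001 Sys%"]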
For your last question, I'd recommend looking at list comprehensions. E.g.,
["Test " + word for word in words].
If words = ["word1", "word2"], then this would return ["Test word1", "Test word2"].
Edit: If you want to add a different value to each value in the array, you could do something similar. Let prefixes be an array of prefixes you want to add to the word in words at the same index location. You could then use the list comprehension:
[prefix + " " + word for prefix, word in zip(prefixes, words)]
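For example, with values along the lines of your expected output:
prefixes = ["Test", "New", "Red", "Blue"]
words = ["Date", "Time", "CPU001 User%", "CPU001 Sys%"]
print([prefix + " " + word for prefix, word in zip(prefixes, words)])
# ['Test Date', 'New Time', 'Red CPU001 User%', 'Blue CPU001 Sys%']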

Related

Writing strings to CSV causing issue where the string in CSV is separated by commas (Python)

I am facing an issue that I have not been able to resolve so far. I need to save a list of strings into a CSV file. I am able to do it, but the characters of each string end up separated by commas. Why is that, and what do I need to do to resolve it? Sorry if this is a simple question; I am new to programming. I know it has to be somehow related to string properties, where each digit is like an item in a list and is indexed, but I was not able to find the cause of this behavior or how to resolve it.
Here is the code:
import csv

duplicity = ['8110278643', '8110278830', '8110283186']
with open("duplicty.csv", "w", newline="") as duplicity_csv:
    output_writer = csv.writer(duplicity_csv)
    header = ["Number"]
    output_writer.writerow(header)
    for item in duplicity:
        output_writer.writerow(item)
The output of this code in CSV is following:
Number
8,1,1,0,2,7,8,6,4,3
8,1,1,0,2,7,8,8,3,0
8,1,1,0,2,8,3,1,8,6
The expected output should be:
Number
8110278643
8110278830
8110283186
Thanks a lot for your replies!
The writerow method takes an iterable of strings. Each item in your list is in fact itself an iterable -- namely a string. Each element from that iterable (in your case each character in the string) is therefore taken as its own element in a separate column.
You could just do this instead:
...
for item in duplicity:
    output_writer.writerow([item])
Use writerows, for example:
duplicity = ['8110278643', '8110278830', '8110283186']
with open("duplicty.csv", "w", newline="") as duplicity_csv:
output_writer = csv.writer(duplicity_csv)
header = ["Number"]
output_writer.writerows([row] for row in header + duplicity)
writerow() needs a list of items (even when the row has only a single item), but you passed a single string, so it treats it as a list of characters.
You need
output_writer.writerow([item])

Python split tabspaced bilingual txt to two separate txt files (list) with newlines separating strings

I have a bilingual corpus (EN-JP) from tatoeba and want to split this into two separate files. The corresponding strings have to stay on the same line in each file.
I need this for training an NMT in nmt-keras and training data has to be stored in separate files for each language. I tried several approaches, but since I'm an absolute beginner with python and coding in general I feel like I'm running in circles.
So far the best I managed was the following:
Source txt:
Go. 行け。
Go. 行きなさい。
Hi. やっほー。
Hi. こんにちは!
Code:
with open('jpns.txt', encoding="utf8") as f:
    columns = zip(*(l.split("\t") for l in f))
    list1 = list(columns)
    print(list1)
Result with my code:
[('Go.', 'Go.', 'Hi.', 'Hi.'), ('行け。\n', '行きなさい。\n', 'やっほー。\n', 'こんにちは!')]
English and Japanese get properly separated (into a tuple?) but I'm stuck at figuring out how to export only the English and only the Japanese to an output.en and an output.jp file respectively.
Expected result:
output.en
Go.
Go.
Hi.
Hi.
output.jp
行け。
行きなさい。
やっほー。
こんにちは!
Each outputted string should have a \n after it.
Please keep in mind that I'm a total beginner with coding, so I'm not exactly sure what I did after "zip" as I just found this here on stackoverflow. I'd be really grateful for a fully commented suggestion.
The first thing to be aware of is that iterating over a file retains the newlines. That means that in your two columns, the first has no newlines, while the second has newlines already appended to each line (except possibly the last).
Writing the second column is therefore trivial if you've already unpacked the generator columns:
with open('output.jp', 'w') as f:
    f.writelines(list1[-1])
But you still have to append newlines to the first column (and possibly others if you go full-on multilingual). One way would be to append newlines to all the columns but the last. Another would be to strip the columns from the last column and process all of them the same.
You can achieve the result you want with a small loop, and another call to zip:
langs = ('en', 'jp')
for index, (lang, data) in enumerate(zip(langs, columns)):
    with open('output.' + lang, 'w') as f:
        if index < len(langs) - 1:
            data = (line + '\n' for line in data)
        f.writelines(data)
This approach replaces the tuple data with a generator that appends newlines, unless we are at the last column.
There are a couple of ways to insert newlines between each line in the output files. The one I show uses a lazy generator to append to each line individually. This should save a little memory. If you don't care about memory savings, you can output the whole file as a single string:
joiner = '\n' if index < len(langs) - 1 else ''
f.write(joiner.join(data))
You can even write the loop yourself and print to the file:
for line in data:
    print(line, file=f, end='\n' if index < len(langs) - 1 else '')
Addendum
Let's also look at the line columns = zip(*(l.split("\t") for l in f)) in detail, since it is a very common Python idiom for transposing nested lists, and is the key to getting the result you want.
The generator expression l.split("\t") for l in f is pretty straightforward: it splits each line in the file around tabs, giving you two elements, one in English, and one in Japanese. Adding a * in front of the generator expands it so that each two-element row becomes a separate argument to zip. zip then re-combines the respective elements of each row, so you get a column of the English elements, and a column of the Japanese elements, effectively transposing your original "matrix".
The result is that columns is a generator over the columns. You can convert it to a list, but that is only necessary for viewing. The generator will work fine for the code shown above.
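A tiny standalone example of the same idiom (with two rows already split by hand) may make the transpose easier to see:
rows = [['Go.', '行け。\n'], ['Hi.', 'やっほー。\n']]
en, jp = zip(*rows)
# en is ('Go.', 'Hi.') and jp is ('行け。\n', 'やっほー。\n')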

Row in Excel to array?

I have lots of data in an Excel spreadsheet that I need to import using Python. I need each row to be read as an array so I can call on the first data point in a specified row, the second, the third, and so on.
This is my code so far:
from array import *
import csv
with open ('vals.csv', 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    reader_x = []
    reader_y = []
    reader_z = []
    row = next(reader)
    reader_x.append(row)
    row = next(reader)
    reader_y.append(row)
    row = next(reader)
    reader_z.append(row)
print reader_x
print reader_y
print reader_z
print reader_x[0]
It is definitely storing it as an array, I think. But I think it is storing the entire Excel row as one string instead of each cell being a separate data point, because when I tell Python to print an entire array it looks something like this (a shortened version, because there are like a thousand values in each row):
[['13,14,12']]
And when I tell it to print reader_x[0] (or any of the other two for that matter) it looks like this:
['13,14,12']
But when I tell it to print anything beyond the 0th thing in the array, it just gives me an error because it's out of range.
How can I fix this? How can I make it [13,14,12] instead of ['13,14,12'] so I can actually use these numbers in calculation? (I want to avoid downloading any more libraries if I can because this is for a school thing and I need to avoid that if possible.)
I have been stuck on this for several days and nothing I can find has worked for me and half of it I didn't even understand. Please try to explain simply if you can, as if you're talking to someone who doesn't even know how to print "Hello World".
You can use split to do this, using ',' as the separator.
For example:
row = '11,12,13'
row = row.split(',')
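With the example value above, that gives a list of strings, which you could then convert to numbers, for instance:
row = '11,12,13'.split(',')        # ['11', '12', '13']
numbers = [int(v) for v in row]    # [11, 12, 13]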
It is a CSV (comma-separated values) file, so try setting the delimiter to ','.
You don't need from array import * ... What the rest of the world calls an array is called a list in Python. The Python array is rather specialised and you are not actually using it so just delete that line of code.
As others have pointed out, you need incoming lines to be split. The csv default delimiter is a comma. Just let csv.reader do the job, something like this:
reader = csv.reader(csvfile)
data = [map(int, row) for row in reader]
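Once data has been built that way (in Python 2, where map returns a list, matching the rest of this snippet), each row is a list of integers and individual points can be pulled out by index, e.g.:
first_row = data[0]        # e.g. [13, 14, 12]
first_point = data[0][0]   # 13
second_point = data[0][1]  # 14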

Writing multiple values in single cell in csv

For each user I have the list of events in which he participated.
e.g. bob : [event1,event2,...]
I want to write it in csv file. I created a dictionary (key - user & value - list of events)
I wrote it in csv. The following is the sample output
username, frnds
"abc" ['event1','event2']
where username is the first column and frnds is the second column.
This is the code:
writer = csv.writer(open('eventlist.csv', 'ab'))
for key, value in evnt_list.items():
    writer.writerow([key, value])
When I am reading the CSV I am not getting the list back directly. Instead I am getting it in the following way:
['e','v','e','n','t','1','','...]
I also tried to write the list directly to the CSV, but while reading I am getting the same output.
What I want is multiple values in a single cell, so that when I read that column for a row I get a list of all the events.
e.g
colA colB
user1,event1,event2,...
I think it's not difficult but somehow I am not getting it.
Reading
I am reading it with the help of the following code:
reader = csv.reader(open("eventlist.csv"))
reader.next()
for row in reader:
    tmp = row[1]
    print tmp     # it is printing the whole list but
    print tmp[0]  # the output is [
    print tmp[1]  # output is 'e' it should have been 'event1'
    print tmp[2]  # output is 'v' it should have been 'event2'
you have to format your values into a single string:
with open('eventlist.csv', 'ab') as f:
    writer = csv.writer(f, delimiter=' ')
    for key, value in evnt_list.items():
        writer.writerow([key, ','.join(value)])
exports as
key1 val11,val12,val13
key2 val21,val22,val23
READING: Here you have to keep in mind that you converted your Python list into a formatted string. Therefore you cannot use standard csv tools to read it:
with open("eventlist.csv") as f:
    csvr = csv.reader(f, delimiter=' ')
    csvr.next()
    for rec in csvr:
        key, values_txt = rec
        values = values_txt.split(',')
        print key, values
This works as expected.
You seem to be saying that your evnt_list is a dictionary whose keys are strings and whose values are lists of strings. If so, then the CSV-writing code you've given in your question will write a string representation of a Python list into the second column. When you read anything in from CSV, it will just be a string, so once again you'll have a string representation of your list. For example, if you have a cell that contains "['event1', 'event2']" you will be reading in a string whose first character (at position 0) is [, second character is ', third character is e, etc. (I don't think your tmp[1] is right; I think it is really ', not e.)
It sounds like you want to reconstruct the Python object, in this case a list of strings. To do that, use ast.literal_eval:
import ast
cell_string_value = "['event1', 'event2']"
cell_object = ast.literal_eval(cell_string_value)
Incidentally, the reason to use ast.literal_eval instead of just eval is safety. eval allows arbitrary Python expressions and is thus a security risk.
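Putting that together with a reading loop might look roughly like this (a sketch, assuming the file was written as in the question; the next() call only applies if a header row was written):
import ast
import csv

with open("eventlist.csv") as f:
    reader = csv.reader(f)
    next(reader)                               # skip the header row, if one was written
    for key, value_text in reader:
        events = ast.literal_eval(value_text)  # back to a real Python list of events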
Also, what is the purpose of the CSV, if you want to get the list back as a list? Will people be reading it (in Excel or something)? If not, then you may want to simply save the evnt_list object using pickle or json, and not bother with the CSV at all.
Edit: I should have read more carefully; the data from evnt_list is being appended to the CSV, and neither pickle nor json is easily appendable. So I suppose CSV is a reasonable and lightweight way to accumulate the data. A full-blown database might be better, but that would not be as lightweight.

How to use python csv module for splitting double pipe delimited data

I have got data which looks like:
"1234"||"abcd"||"a1s1"
I am trying to read and write using Python's csv reader and writer.
As the csv module's delimiter is limited to a single character, is there any way to retrieve the data cleanly? I cannot afford to remove the empty columns afterwards, as it is a massively huge data set that has to be processed in a time-bound manner. Any thoughts will be helpful.
The docs and experimentation prove that only single-character delimiters are allowed.
Since csv.reader accepts any object that supports the iterator protocol, you can use a generator expression to replace ||-s with |-s, and then feed this generator to the reader:
def read_this_funky_csv(source):
    # be sure to pass a source object that supports
    # iteration (e.g. a file object, or a list of csv text lines)
    return csv.reader((line.replace('||', '|') for line in source), delimiter='|')
This code is pretty effective since it operates on one CSV line at a time, provided your CSV source yields lines that do not exceed your available RAM :)
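For example, with the sample line from the question in a file (the filename data.txt is just a placeholder):
with open('data.txt') as source:
    for row in read_this_funky_csv(source):
        print(row)   # e.g. ['1234', 'abcd', 'a1s1']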
>>> import csv
>>> reader = csv.reader(['"1234"||"abcd"||"a1s1"'], delimiter='|')
>>> for row in reader:
... assert not ''.join(row[1::2])
... row = row[0::2]
... print row
...
['1234', 'abcd', 'a1s1']
>>>
Unfortunately, delimiter is represented by a character in C. This means that it is impossible to have it be anything other than a single character in Python. The good news is that it is possible to ignore the values which are null:
reader = csv.reader(['"1234"||"abcd"||"a1s1"'], delimiter='|')
#iterate through the reader.
for x in reader:
#you have to use a numeric range here to ensure that you eliminate the
#right things.
for i in range(len(x)):
#Odd indexes will be discarded.
if i%2 == 0: x[i] #x[i] where i%2 == 0 represents the values you want.
There are other ways to accomplish this (a function could be written, for one), but this gives you the logic which is needed.
If your data literally looks like the example (the fields never contain '||' and are always quoted), and you can tolerate the quote marks, or are willing to slice them off later, just use .split
>>> '"1234"||"abcd"||"a1s1"'.split('||')
['"1234"', '"abcd"', '"a1s1"']
>>> list(s[1:-1] for s in '"1234"||"abcd"||"a1s1"'.split('||'))
['1234', 'abcd', 'a1s1']
The csv module is only needed if the delimiter can appear within the fields, or to strip the optional quotes around fields.
