To start...Python noob...
My first goal is to read the first row of a CSV and output. The following code does that nicely.
import csv
csvfile = open('some.csv','rb')
csvFileArray = []
for row in csv.reader(csvfile, delimiter = ','):
csvFileArray.append(row)
print(csvFileArray[0])
Output looks like...
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',......
My second and third tasks deal with formatting.
Thus, if I want the print(csvFileArray[0]) output to contain 'double quotes' for the delimiter how best can I handle that?
I'd like to see...
["Date","Time", "CPU001 User%", "CPU001 Sys%",......
I have played with formatting the csvFileArray field and all I can get it to do is to prefix or append data.
I have also looked into the 'dialect', 'quoting', etc., but am just all over the place.
My last task is to add text into each value (into the array).
Example:
["Test Date","New Time", "Red CPU001 User%", "Blue CPU001 Sys%",......
I've researched a number of methods to do this but am awash in the multiple ways.
Should I ditch the Array as this is too constraining?
Looking for direction not necessarily someone to write it for me.
Thanks.
OK.....refined the code a bit and am looking for direction, not direct solution (need to learn).
import csv
with open('ba200952fd69 - Copy.csv', newline='') as csvfile:
reader = csv.reader(csvfile)
for row in reader:
print (row)
break
The code nicely reads the first line of the CSV and outputs the first row as follows:
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',....
If I want to add formatting to each/any item within that row, would I be performing those actions within the quotes of the print command? Example: If I wanted each item to have double-quotes, or have a prefix of 'XXXX', etc.
I have read through examples of .join type commands, etc., and am sure that there are much easier ways to format print output than I'm aware of.
Again, looking for direction, not immediate solutions.
For your first task, I'd recommend using the next function to grab the first row rather than iterating through the whole csv. Also, it might be useful to take a look at with blocks as they are the standard way of dealing with opening and closing files.
For your second question, it looks like you want to change the format of the print statement. Note that it is printing strings, which is indicated by the single quotes around each element in the array. This has nothing to do with the csv module, but simply because you are print an array of strings. To print with double quotes, you would have to reformat the print statement. You could take a look at this for some ways on doing that.
For your last question, I'd recommend looking at list comprehensions. E.g.,
["Test " + word for word in words].
If words = ["word1", "word2"], then this would return ["Test word1", "Test word2"].
Edit: If you want to add a different value to each value in the array, you could do something similar. Let prefixes be an array of prefixes you want to add to the word in words at the same index location. You could then use the list comprehension:
[prefix + " " + word for prefix, word in zip(prefixes, words)]
I have the following structure in a dictionary:
self._results = {'key1':'1,2,3','key2':'5,6,7'}
I'm using csv writerows() to write the records from the dictionary into a file:
filewriter = csv.writer(
csvfile,
quoting=csv.QUOTE_NONE,
delimiter=',',
quotechar='',
escapechar=' ')
filewriter.writerows(self._results.items())
Unfortunately I get the following:
key1,1 ,2 ,3
key2,4 ,5 ,6
If I remove escapechar or change it to '' I get:
Error: need to escape, but no escapechar set
Any suggestions?
I can remove whitespaces by manipulating the CSV file afterwards, but ideally I would like to do all the text manipulation in one place.
You disabled quoting. So without an escape character, it can't output the commas; they'd be mistaken for field separators in the CSV format. You need to provide some sort of escape character, or allow quoting (allowing quoting with QUOTE_MINIMAL is the default).
If your goal was to have 1, 2 and 3 be separate fields in the result, you shouldn't be storing them as a pre-comma separated string in the dict, because that's csv's job, and trying to do its job for it just means fighting csv and risking logic errors that make it impossible to round trip the data correctly.
If they're a single logical field, allow quoting. If they're more than one field, build the dict as a dict of lists (defaultdict(list) is helpful if the values are being built piecemeal), then output a raw sequence, not partially preformatted text:
# Define values as lists, not strings
self._results = {'key1':[1,2,3],'key2':[5,6,7]}
# Wrap k in list and concatenate v to make four item list
filewriter.writerows([k] + v for k, v in self._results.items())
I'm looking for a way using python to copy the first column from a csv into an empty file. I'm trying to learn python so any help would be great!
So if this is test.csv
A 32
D 21
C 2
B 20
I want this output
A
D
C
B
I've tried the following commands in python but the output file is empty
f= open("test.csv",'r')
import csv
reader = csv.reader(f,delimiter="\t")
names=""
for each_line in reader:
names=each_line[0]
First, you want to open your files. A good practice is to use the with statement (that, technically speaking, introduces a context manager) so that when your code exits from the with block all the files are automatically closed
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
next you want a loop on the lines of the input file (note the indentation, we are inside the with block), line splitting is automatic when you read a text file with lines separated by newlines…
for line in inpfile:
each line is a string, but you think of it as two fields separated by white space — this situation is so common that strings have a method to deal with this situation (note again the increasing indent, we are in the for loop block)
fields = line.split()
by default .split() splits on white space, but you can use, e.g., split(',') to split on commas, etc — that said, fields is a list of strings, for your first record it is equal to ['A', '32'] and you want to output just the first field in this list… for this purpose a file object has the .write() method, that writes a string, just a string, to the file, and fields[0] IS a string, but we have to add a newline character to it because, in this respect, .write() is different from print().
outfile.write(fields[0]+'\n')
That's all, but if you omit my comments it's 4 lines of code
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
for line in inpfile:
fields = line.split()
outfile.write(fields[0]+'\n')
When you are done with learning (some) Python, ask for an explanation of this...
with open('test.csv') as ifl, open('out.csv', 'w') as ofl:
ofl.write('\n'.join(line.split()[0] for line in ifl))
Addendum
The csv module in such a simple case adds the additional conveniences of
auto-splitting each line into a list of strings
taking care of the details of output (newlines, etc)
and when learning Python it's more fruitful to see how these steps can be done using the bare language, or at least that it is my opinion…
The situation is different when your data file is complex, has headers, has quoted strings possibly containing quoted delimiters etc etc, in those cases the use of csv is recommended, as it takes into account all the gory details. For complex data analisys requirements you will need other packages, not included in the standard library, e.g., numpy and pandas, but that is another story.
This answer reads the CSV file, understanding a column to be demarked by a space character. You have to add the header=None otherwise the first row will be taken to be the header / names of columns.
ss is a slice - the 0th column, taking all rows as denoted by :
The last line writes the slice to a new filename.
import pandas as pd
df = pd.read_csv('test.csv', sep=' ', header=None)
ss = df.ix[:, 0]
ss.to_csv('new_path.csv', sep=' ', index=False)
import csv
reader = csv.reader(open("test.csv","rb"), delimiter='\t')
writer = csv.writer(open("output.csv","wb"))
for e in reader:
writer.writerow(e[0])
The best you can do is create a empty list and append the column and then write that new list into another csv for example:
import csv
def writetocsv(l):
#convert the set to the list
b = list(l)
print (b)
with open("newfile.csv",'w',newline='',) as f:
w = csv.writer(f, delimiter=',')
for value in b:
w.writerow([value])
adcb_list = []
f= open("test.csv",'r')
reader = csv.reader(f,delimiter="\t")
for each_line in reader:
adcb_list.append(each_line)
writetocsv(adcb_list)
hope this works for you :-)
I have got a file with the following lines
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f508f-e7c8-32b8-e044-0003ba298018","municipalityCode":"0766","municipalityName":"Hedensted","streetCode":"0072","streetName":"Værnegården","streetBuildingIdentifier":"13","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"8000","districtName":"Århus","presentationString":"Værnegården 13, 8000 Århus","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(553564 6179299)","x":553564,"y":6179299}]}
I want to transform every line into a csv readable file with headers. Like the following
status,message,data,addressAccessId,municipalityCode,municipalityName,streetCode,streetName,streetBuildingIdentifier,mailDeliverySublocationIdentifier,districtSubDivisionIdentifier,postCodeIdentifier,districtName,presentationString,addressSpecificCount,validCoordinates,geometryWkt,x,y
OK,OK,data:type,addressAccessType,0a3f508f-e7c8-32b8-e044-0003ba298018,0766,Hedensted,0072,Værnegården,13,,,8000,Århus,Værnegården 13, 8000 Århus,1,true,POINT553564 6179299,553564,6179299
How do I accomplish that? Code and explanation are very welcome. So far this is what I have come up with the following from this example:(How can I convert JSON to CSV?)
x = json.loads(x)
f = csv.writer(open('test.csv', 'wb+'))
# Write CSV Header, If you dont need that, remove this line
f.writerow(['status', 'message', 'type', 'addressAccessId', 'municipalityCode','municipalityName','streetCode','streetName','streetBuildingIdentifier','mailDeliverySublocationIdentifier','districtSubDivisionIdentifier','postCodeIdentifier','districtName','presentationString','addressSpecificCount','validCoordinates','geometryWkt','x','y'])
for x in x:
f.writerow([x['status'],
x['message'],
x['data']['type'],
x['data']['addressAccessId'],
x['data']['municipalityCode'],
x['data']['municipalityName'],
x['data']['streetCode'],
x['data']['streetName'],
x['data']['streetBuildingIdentifier'],
x['data']['mailDeliverySublocationIdentifier'],
x['data']['districtSubDivisionIdentifier'],
x['data']['postCodeIdentifier'],
x['data']['districtName'],
x['data']['presentationString'],
x['data']['addressSpecificCount'],
x['data']['validCoordinates'],
x['data']['geometryWkt'],
x['data']['x'],
x['data']['y']])
I have looked through and tried a lot of other solutions, including DictWriter, replace() and translate() to remove characthers but have not yet been able to transform the line to my need. The purpose being able to select the fields that are output into a new file, and transforming x and y to a new coordinate system. But for now Im just trying to parse the above line to a csv file. Can anyone offer code and explanation of their code? Thank you very much for your time.
Below are the first few lines of my addresses.txt
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5081-e039-32b8-e044-0003ba298018","municipalityCode":"0265","municipalityName":"Roskilde","streetCode":"0831","streetName":"Brønsager","streetBuildingIdentifier":"69","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Svogerslev","postCodeIdentifier":"4000","districtName":"Roskilde","presentationString":"Brønsager 69, 4000 Roskilde","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(690026 6169309)","x":690026,"y":6169309}]}
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5089-ecab-32b8-e044-0003ba298018","municipalityCode":"0461","municipalityName":"Odense","streetCode":"9505","streetName":"Vægtens Kvarter","streetBuildingIdentifier":"271","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Holluf Pile","postCodeIdentifier":"5220","districtName":"Odense SØ","presentationString":"Vægtens Kvarter 271, 5220 Odense SØ","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(592191 6135829)","x":592191,"y":6135829}]}
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f507c-adc3-32b8-e044-0003ba298018","municipalityCode":"0165","municipalityName":"Albertslund","streetCode":"0445","streetName":"Skyttehusene","streetBuildingIdentifier":"33","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"2620","districtName":"Albertslund","presentationString":"Skyttehusene 33, 2620 Albertslund","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(711079 6174741)","x":711079,"y":6174741}]}
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f509c-7f57-32b8-e044-0003ba298018","municipalityCode":"0851","municipalityName":"Aalborg","streetCode":"5205","streetName":"Løvstikkevej","streetBuildingIdentifier":"36","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"","postCodeIdentifier":"9000","districtName":"Aalborg","presentationString":"Løvstikkevej 36, 9000 Aalborg","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(552407 6322490)","x":552407,"y":6322490}]}
{"status":"OK","message":"OK","data":[{"type":"addressAccessType","addressAccessId":"0a3f5098-32a6-32b8-e044-0003ba298018","municipalityCode":"0779","municipalityName":"Skive","streetCode":"0462","streetName":"Landevejen","streetBuildingIdentifier":"52","mailDeliverySublocationIdentifier":"","districtSubDivisionIdentifier":"Håsum","postCodeIdentifier":"7860","districtName":"Spøttrup","presentationString":"Landevejen 52, 7860 Spøttrup","addressSpecificCount":1,"validCoordinates":true,"geometryWkt":"POINT(491515 6269739)","x":491515,"y":6269739}]}
Note that the data key holds a list of dictionaries. x['data']['type'] wouldn't work, but x['data'][0]['type'] would. There might be more than one such dictionary in that list, however. I'll assume you want a CSV row per x['data'] dictionary.
Next, it appears you have a UTF-8 BOM on every line; whatever wrote this was not using UTF-8 encoding correctly. We need to strip this marker, the first 3 characters.
Last, JSON strings are always Unicode data, and you have non-ASCII characters in your data, so you'll have to encode to bytestrings again before passing the data to the CSV writer object.
I'd use csv.DictWriter here, with a pre-defined list of field names:
import codecs
import csv
import json
fields = [
'status', 'message', 'type', 'addressAccessId', 'municipalityCode',
'municipalityName', 'streetCode', 'streetName', 'streetBuildingIdentifier',
'mailDeliverySublocationIdentifier', 'districtSubDivisionIdentifier',
'postCodeIdentifier', 'districtName', 'presentationString', 'addressSpecificCount',
'validCoordinates', 'geometryWkt', 'x', 'y']
with open('test.csv', 'wb') as csvfile, open('jsonfile', 'r') as jsonfile:
writer = csv.DictWriter(csvfile, fields)
writer.writeheader()
for line in jsonfile:
if line.startswith(codecs.BOM_UTF8):
line = line[3:]
entry = json.loads(line)
for item in entry['data']:
row = dict(item, status=entry['status'], message=entry['message'])
row = {k.encode('utf8'): unicode(v).encode('utf8') for k, v in row.iteritems()}
writer.writerow(row)
The row dictionary is basically a copy of each of the dictionaries in the entry['data'] list, with the status and message keys copied over separately. This makes row a flat dictionary instead.
I also read your input file line by line, as you say that each line contains a separate JSON entry.
Open the output file with cvs.DictWriter() and define the output header fields as you specified. Use extrasaction='ignore' and restval='' as options.
Look at Opening A large JSON file in Python with no newlines for csv conversion Python 2.6.6 for help with processing large files as I had a similar question Also look at the question that I link to.
I build a similar type of system from a JSON using appropriate loops.
for example,
def parse_row(currdata):
outx = {}
# currdata is defined earlier to point to the x['data'] dictionary
for eachx in currdata:
outx[eachx] = currdata[eachx]
return outx
where this is in a function with currdata as an argument and called with x['data'][row] as the input argument.
rows = len(x['data'])
for row in range(rows):
outx = parse_row(x['data'][row])
# process the row and create output
This should allow you to set up the parsing properly. I cannot copy the actual code into this answer but this should point you to a solution.
I have got data which looks like:
"1234"||"abcd"||"a1s1"
I am trying to read and write using Python's csv reader and writer.
As the csv module's delimiter is limited to single char, is there any way to retrieve data cleanly? I cannot afford to remove the empty columns as it is a massively huge data set to be processed in time bound manner. Any thoughts will be helpful.
The docs and experimentation prove that only single-character delimiters are allowed.
Since cvs.reader accepts any object that supports iterator protocol, you can use generator syntax to replace ||-s with |-s, and then feed this generator to the reader:
def read_this_funky_csv(source):
# be sure to pass a source object that supports
# iteration (e.g. a file object, or a list of csv text lines)
return csv.reader((line.replace('||', '|') for line in source), delimiter='|')
This code is pretty effective since it operates on one CSV line at a time, provided your CSV source yields lines that do not exceed your available RAM :)
>>> import csv
>>> reader = csv.reader(['"1234"||"abcd"||"a1s1"'], delimiter='|')
>>> for row in reader:
... assert not ''.join(row[1::2])
... row = row[0::2]
... print row
...
['1234', 'abcd', 'a1s1']
>>>
Unfortunately, delimiter is represented by a character in C. This means that it is impossible to have it be anything other than a single character in Python. The good news is that it is possible to ignore the values which are null:
reader = csv.reader(['"1234"||"abcd"||"a1s1"'], delimiter='|')
#iterate through the reader.
for x in reader:
#you have to use a numeric range here to ensure that you eliminate the
#right things.
for i in range(len(x)):
#Odd indexes will be discarded.
if i%2 == 0: x[i] #x[i] where i%2 == 0 represents the values you want.
There are other ways to accomplish this (a function could be written, for one), but this gives you the logic which is needed.
If your data literally looks like the example (the fields never contain '||' and are always quoted), and you can tolerate the quote marks, or are willing to slice them off later, just use .split
>>> '"1234"||"abcd"||"a1s1"'.split('||')
['"1234"', '"abcd"', '"a1s1"']
>>> list(s[1:-1] for s in '"1234"||"abcd"||"a1s1"'.split('||'))
['1234', 'abcd', 'a1s1']
csv is only needed if the delimiter is found within the fields, or to delete optional quotes around fields