I am saving a list to a CSV file using the writerow function from the csv module. Something went wrong when I opened the final file in MS Office Excel.
Before I ran into this issue, the main problem I was trying to solve was getting each item of the list saved to its own row. It was saving each line into a cell in row 1. I made some small changes, and now this happened. As a Python novice, I am very confused.
import csv
inputfile = open('small.csv', 'r')
header_list = []
header = inputfile.readline()
header_list.append(header)
input_lines = []
for line in inputfile:
    input_lines.append(line)
inputfile.close()
AA_list = []
for i in range(0,len(input_lines)):
    if (input_lines[i].split(',')[4]) == 'AA':  # column 4 has different names including 'AA'
        AA_list.append(input_lines[i])
full_list = header_list+AA_list
resultFile = open("AA2013.csv",'w+')
wr = csv.writer(resultFile, delimiter = ',')
wr.writerow(full_list)
Thanks!
UPDATE:
The full_list looks like this: ['1,2,3,"MEM",...]
UPDATE2(APR.22nd):
Now I got three cells of data(the header in A1 and the rest in A2 and A3 respectively) in the same row. Apparently, the newline signs are not working for three items in one big list. I think the more specific question now is how do I save a list of records with '\n' behind each record to csv.
UPDATE3(APR.23rd):
original file
Importing the csv module is not enough, you need to use it as well. Right now, you're appending each line as an entire string to your list instead of a list of fields.
Start with
# 'rb' is the Python 2 idiom; on Python 3 use open('small.csv', newline='') instead
with open('small.csv', 'rb') as inputfile:
    reader = csv.reader(inputfile, delimiter=",")
    header_list = next(reader)
    input_lines = list(reader)
Now header_list contains all the headers, and input_lines contains a nested list of all the rows, each one split into columns.
I think the rest should be pretty straightforward.
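A minimal sketch of how the remaining steps might look, assuming Python 3. The sample rows, the in-memory files standing in for small.csv and AA2013.csv, and the helper name filter_rows are all made up for illustration.

```python
import csv
import io

def filter_rows(in_file, out_file, column=4, wanted="AA"):
    """Copy the header plus every row whose `column` equals `wanted`."""
    reader = csv.reader(in_file)
    writer = csv.writer(out_file, lineterminator="\n")
    writer.writerow(next(reader))      # header row, already split into fields
    for row in reader:
        if row[column] == wanted:
            writer.writerow(row)       # one list of fields per output row

# Hypothetical sample data standing in for small.csv
sample = io.StringIO(
    "year,a,b,c,carrier\n"
    "2013,1,2,3,AA\n"
    "2013,4,5,6,UA\n"
    "2013,7,8,9,AA\n"
)
result = io.StringIO()
filter_rows(sample, result)
print(result.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with newline=''.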
append() adds a single object to the end of a list. So when you write header_list.append(header), the whole header line is added to header_list as one string, and writerow later treats it as a single field. You should write
headers = header.split(',')
header_list.append(headers)
This splits the header row on commas, so headers is the list of header words, and header_list then holds that list as one row of fields.
The same thing goes for AA_list.append(input_lines[i]).
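To make the difference concrete, here is a small sketch comparing append() on the raw line, append() on the split line, and extend(). The header text is a made-up example.

```python
header = "year,month,day\n"     # hypothetical header line as returned by readline()

as_string = []
as_string.append(header)                      # one element: the whole line, newline included

as_fields = []
as_fields.append(header.strip().split(","))   # one element: a list of three fields

flat = []
flat.extend(header.strip().split(","))        # three separate string elements

print(as_string)
print(as_fields)
print(flat)
```

Which shape is wanted depends on how the list is written later: a list of rows (as_fields) suits a loop of writerow calls, while a flat list (flat) is a single row.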
I figured it out.
The difference between [val], val, and val.split(",") as the writerow argument was:
[val]: the whole string is written as a single field, so each record takes up only one cell in Excel (header and "2013, 1, 2,..." in A1, B1, C1 and so on).
val: each character (letter, comma, or space) takes its own cell in Excel.
val.split(","): the string is split on commas, and each resulting piece goes into its own Excel cell.
Here is what I found out: 1. the right way to export the flat list line by line is to use the with syntax, 2. split each item when writing the row.
csvwriter.writerow(JD.split())
full_list = header_list+AA_list
with open("AA2013.csv",'w+') as resultFile:
    wr = csv.writer(resultFile, delimiter= ",", lineterminator = '\n')
    for val in full_list:
        wr.writerow(val.split(','))
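The three cases described above can be checked without Excel by writing to an in-memory buffer. This is only a sketch: the record in val and the helper name written are made up.

```python
import csv
import io

def written(row):
    """Return the text csv.writer produces for a single writerow(row) call."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerow(row)
    return buf.getvalue()

val = "2013,1,AA"                      # hypothetical record

one_cell = written([val])              # whole string quoted into a single field
per_char = written(val)                # a string is iterated character by character
per_field = written(val.split(","))    # one field per comma-separated value

print(repr(one_cell))
print(repr(per_char))
print(repr(per_field))
```

Only the last form produces one cell per value, because writerow treats whatever it receives as a sequence of fields.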
The wanted output
Please correct any terms or syntax I have used incorrectly! Thanks.
Related
I am trying to print the output of a webscrape project into a CSV file.
So for example I have this list of supplier names under a list called SUPP_NAME: (just an example, the actual list has 50 items inside it)
['"FULIAN\\u0020\\u0028M\\u0029\\u0020SENDIRIAN\\u0020BERHAD"', '"RISO\\u0020SEKKEN\\u0020SDN.\\u0020BHD."', '"NATURE\\u0020PROFUSION\\u0020SDN.\\u0020BHD."']
and a list of numbers indicating years, under a list called SUPP_YEARS:
['"9"', '"4"', '"1"', '"1"']
My plan is to put them into a CSV, and then read them back in as a pandas dataframe, then perform decoding to get a bunch of values.
Code so far:
import csv
with open('output3.csv' , 'w') as f:
    writer = csv.writer(f)
    headers = "Supplier_name,Years\n"
    f.write(headers)
    supp_names = re.findall(r'("supplierName"):("\w+.+")', results[17].text)
    supp_years = re.findall(r'("supplierYear"):("\d+")', results[17].text)
    SUPP_NAME = []
    for title, name in supp_names:
        print (name)
        SUPP_NAME.append(name)
        #f.write(name + "\n")
    SUPP_YEAR = []
    for year,number in supp_years:
        print (number)
        SUPP_YEAR.append(number)
        #f.write(number + "\n")
    writer.writerow([SUPP_NAME, SUPP_YEAR])
However, what I get is that under the Supplier_name and Years columns, one cell under each of these 2 columns is filled with a LONG list of items still contained in the list, instead of the items separated one by one.
What am I doing wrong? Thanks in advance for answering.
The two re.findall() calls are giving you lists of items (hopefully both the same length). The idea is then to extract an element from each and write the pair to your output file. Python has a useful function called zip() to do this. You give it both of your lists and the loop will give you an item from each on each iteration:
import csv
import re

with open('output3.csv', 'w', newline='') as f_output:
    writer = csv.writer(f_output)
    writer.writerow(["Supplier_name", "Years"])
    supp_names = re.findall(r'("supplierName"):("\w+.+")', results[17].text)
    supp_years = re.findall(r'("supplierYear"):("\d+")', results[17].text)
    # Each match is a (key, value) tuple because the pattern has two groups,
    # so keep only the value from each.
    for (_, name), (_, year) in zip(supp_names, supp_years):
        writer.writerow([name, year])
The csv.writer() object is designed to take a list of items and write them to your file with the desired (i.e. comma) delimiter automatically added between them.
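As a self-contained illustration of pairing two lists with zip() and writing one row per pair, here is a sketch that uses made-up tuples in place of the re.findall() results (with two groups in the pattern, findall returns a tuple per match). The supplier names are placeholders.

```python
import csv
import io

# Hypothetical stand-ins for the (key, value) tuples re.findall() returns
supp_names = [('"supplierName"', '"ACME"'), ('"supplierName"', '"GLOBEX"')]
supp_years = [('"supplierYear"', '"9"'), ('"supplierYear"', '"4"')]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["Supplier_name", "Years"])
for (_, name), (_, year) in zip(supp_names, supp_years):
    # strip the literal double quotes captured by the regular expression
    writer.writerow([name.strip('"'), year.strip('"')])

print(buf.getvalue())
```

Note that zip() stops at the shorter list, so any unmatched trailing items are silently dropped.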
I assume you are using Python 3.x. If not, open the file in binary mode instead:
with open('output3.csv', 'wb') as f_output:
I'm looking for a way using python to copy the first column from a csv into an empty file. I'm trying to learn python so any help would be great!
So if this is test.csv
A 32
D 21
C 2
B 20
I want this output
A
D
C
B
I've tried the following commands in python but the output file is empty
f= open("test.csv",'r')
import csv
reader = csv.reader(f,delimiter="\t")
names=""
for each_line in reader:
    names=each_line[0]
First, you want to open your files. A good practice is to use the with statement (that, technically speaking, introduces a context manager) so that when your code exits from the with block all the files are automatically closed
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
next you want a loop on the lines of the input file (note the indentation, we are inside the with block), line splitting is automatic when you read a text file with lines separated by newlines…
    for line in inpfile:
each line is a string, but you think of it as two fields separated by white space — this situation is so common that strings have a method to deal with this situation (note again the increasing indent, we are in the for loop block)
        fields = line.split()
by default .split() splits on white space, but you can use, e.g., split(',') to split on commas, etc — that said, fields is a list of strings, for your first record it is equal to ['A', '32'] and you want to output just the first field in this list… for this purpose a file object has the .write() method, that writes a string, just a string, to the file, and fields[0] IS a string, but we have to add a newline character to it because, in this respect, .write() is different from print().
        outfile.write(fields[0]+'\n')
That's all, but if you omit my comments it's 4 lines of code
with open('test.csv') as inpfile, open('out.csv', 'w') as outfile:
    for line in inpfile:
        fields = line.split()
        outfile.write(fields[0]+'\n')
When you are done with learning (some) Python, ask for an explanation of this...
with open('test.csv') as ifl, open('out.csv', 'w') as ofl:
    ofl.write('\n'.join(line.split()[0] for line in ifl))
Addendum
The csv module in such a simple case adds the additional conveniences of
auto-splitting each line into a list of strings
taking care of the details of output (newlines, etc)
and when learning Python it's more fruitful to see how these steps can be done using the bare language, or at least that it is my opinion…
The situation is different when your data file is complex, has headers, has quoted strings possibly containing quoted delimiters, and so on. In those cases the use of csv is recommended, as it takes into account all the gory details. For complex data analysis requirements you will need other packages, not included in the standard library, e.g., numpy and pandas, but that is another story.
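A short sketch of the kind of "gory detail" meant here: a quoted field that contains the delimiter. The sample record is made up.

```python
import csv
import io

line = 'A,"Smith, John",32\n'      # hypothetical record with a comma inside quotes

# Plain splitting cuts the quoted field in two and keeps the quote characters.
naive = line.rstrip("\n").split(",")

# csv.reader understands the quoting and returns three clean fields.
parsed = next(csv.reader(io.StringIO(line)))

print(naive)
print(parsed)
```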
This answer reads the CSV file, treating a space character as the column delimiter. You have to add header=None, otherwise the first row will be taken to be the header (the names of the columns).
ss is a slice - the 0th column, taking all rows as denoted by :
The last line writes the slice to a new filename.
import pandas as pd
df = pd.read_csv('test.csv', sep=' ', header=None)
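ss = df.iloc[:, 0]  # .ix was removed from recent pandas; .iloc selects by position
ss.to_csv('new_path.csv', sep=' ', index=False, header=False)  # header=False avoids writing the column label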
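import csv
# Python 2 file modes; on Python 3 use "r" and "w" with newline='' instead
reader = csv.reader(open("test.csv","rb"), delimiter='\t')
writer = csv.writer(open("output.csv","wb"))
for e in reader:
    writer.writerow([e[0]])  # wrap in a list so the value is written as one field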
One approach is to create an empty list, append the first column of each row to it, and then write that new list to another csv, for example:
import csv
def writetocsv(l):
    # make a list copy of the values
    b = list(l)
    print (b)
    with open("newfile.csv",'w',newline='',) as f:
        w = csv.writer(f, delimiter=',')
        for value in b:
            w.writerow([value])
adcb_list = []
f= open("test.csv",'r')
reader = csv.reader(f,delimiter="\t")
for each_line in reader:
    adcb_list.append(each_line[0])  # keep only the first column
f.close()
writetocsv(adcb_list)
hope this works for you :-)
I have a .csv formatted .txt file. I am deliberating over the best manner in which to .capitalize the text in the first column.
.capitalize() is a string method, so I considered the following: I would need to open the file, convert the data to a list of strings, capitalize the required word and finally write the data back to file.
To achieve this, I did the following:
newGuestList = []
with open("guestList.txt","r+") as guestFile :
    guestList = csv.reader(guestFile)
    for guest in guestList :
        for guestInfo in guest :
            capitalisedName = guestInfo.capitalize()
            newGuestList.append(capitalisedName)
Which gives the output:
['Peter', '35', ' spain', 'Caroline', '37', 'france', 'Claire', '32', ' sweden']
The problem:
Firstly; in order to write this new list back to file, I will need to convert it to a string. I can achieve this using the .join method. However, how can I introduce a newline, \n, after every third word (the country) so that each guest has their own line in the text file?
Secondly; this method, of nested for loops etc. seems highly convoluted, is there a cleaner way?
My .txt file:
peter, 35, spain\n
caroline, 37, france\n
claire, 32, sweden\n
You don't need to split the lines, since the first character of the first word is also the first character of the line:
with open("lst.txt","r") as guestFile :
    lines=guestFile.readlines()
newlines=[line.capitalize() for line in lines]
with open("lst.txt","w") as guestFile :
    guestFile.writelines(newlines)
You can just use a CSV reader and writer and access the element you want to capitalize from the list.
import csv
import os
inp = open('a.txt', 'r')
out = open('b.txt', 'w')
reader = csv.reader(inp)
writer = csv.writer(out)
for row in reader:
    row[0] = row[0].capitalize()
    writer.writerow(row)
inp.close()
out.close()
os.rename('b.txt', 'a.txt') # if you want to keep the same name
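The same reader/writer idea can be tried without touching any files. The sketch below assumes Python 3 and uses in-memory text with the guest data from the question; the function name capitalize_first_column is made up.

```python
import csv
import io

def capitalize_first_column(in_file, out_file):
    """Copy rows from in_file to out_file, capitalizing only the first field."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(in_file):
        if row:                          # skip blank lines
            row[0] = row[0].capitalize()
            writer.writerow(row)         # the writer adds the newline for each guest

guests = io.StringIO("peter, 35, spain\ncaroline, 37, france\nclaire, 32, sweden\n")
out = io.StringIO()
capitalize_first_column(guests, out)
print(out.getvalue())
```

Because each row is written as soon as it is read, there is no need to rebuild the line breaks from a flat list afterwards.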
This is my current code. The current issue I have is that search returns nothing. How do I get a string value for this variable?
count = 0
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        row_count = sum(1 for row in myFile)
        print("aba")
        for x in range(row_count):
            print("aaa")
            for row in myFile:
                search = row[count].readline
                print(search)
                if self.delName.get("1.0","end-1c") in search:
                    count = count + 1
                else:
                    newFile.write(row[count])
                    count = count + 1
The output is:
aba
aaa
aaa
So it runs through it twice, which is good as my userDatabase consists of two rows of data.
The file in question has this data:
"lukefinney","0000000","0000000","a"
"nictaylor","0000000","0000000","a"
You cannot just iterate over an open file more than once without rewinding the file object back to the start.
You'll need to add a file.seek(0) call to put the file reader back to the beginning each time you want to start reading from the first row again:
myFile.seek(0)
for row in myFile:
The rest of your code makes little sense; when iterating over a file you get individual lines from the file, so each row is a string object. Indexing into strings gives you new strings with just one character in it; 'foo'[1] is the character 'o', for example.
If you wanted to copy across rows that don't match a string, you don't need to know the row count up front at all. You are not handling a list of rows here, you can look at each row individually instead:
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if filter_string not in row:
                newFile.write(row)
This does a sub-string match. If you need to match whole columns, use the csv module to give you individual columns to match against. The module handles the quotes around column values:
import csv
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv", "r", newline='') as myFile:
    with open("newFile.csv", "w", newline='') as newFile:
        writer = csv.writer(newFile)
        for row in csv.reader(myFile):
            # row is now a list of strings, like ['lukefinney', '0000000', '0000000', 'a']
            if filter_string != row[0]:  # test against the first column
                # copied across if the first column does not match exactly.
                writer.writerow(row)
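The difference between the sub-string test and the exact column test can be seen on the two rows from the question. This is a sketch only: the search term "luke" and the helper name keep_rows are made up.

```python
import csv
import io

data = (
    '"lukefinney","0000000","0000000","a"\n'
    '"nictaylor","0000000","0000000","a"\n'
)

def keep_rows(text, name):
    """Return the parsed rows whose first column is not exactly `name`."""
    return [row for row in csv.reader(io.StringIO(text)) if row[0] != name]

# Sub-string match: "luke" occurs inside "lukefinney", so that line is dropped.
substring_kept = [line for line in data.splitlines() if "luke" not in line]

# Exact match on the first column: "luke" != "lukefinney", so both rows are kept.
exact_kept = keep_rows(data, "luke")

print(len(substring_kept), len(exact_kept))
```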
One problem is that row_count = sum(1 for row in myFile) consumes all rows from myFile. Subsequent reads on myFile will return an empty string which signifies end of file. This means that for loop later in your code where you execute for row in myFile: is not entered because all rows have already been consumed.
A way around this is to add a call to myFile.seek(0) just before for row in myFile:. This will reset the file pointer and the for loop should then work.
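The effect is easy to reproduce with an in-memory file, which iterates the same way as a file opened with open(). The two sample lines are made up.

```python
import io

my_file = io.StringIO("row one\nrow two\n")   # behaves like an open text file

row_count = sum(1 for _ in my_file)   # consumes every line
second_pass = list(my_file)           # nothing left to read

my_file.seek(0)                       # rewind to the start
third_pass = list(my_file)            # the lines are available again

print(row_count, second_pass, third_pass)
```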
It's not very clear from your code what it is that you are trying to do, but it kind of looks like you want to filter out rows that contain a certain string. Try this:
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if self.delName.get("1.0","end-1c") not in row:
                newFile.write(row)
I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    #load csv file
    file_ = open(name, 'rb')
    #init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    #LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        #Write row to new csv
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join((row[x] for x in range(i+1,i+5)))
                             ))
        #do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_"+name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from input csv
            for row in csv.reader(file_):
                # write 2 rows to new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
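The slicing can be checked against the example row from the question. This sketch assumes Python 3 and uses in-memory files; the function name split_rows is made up.

```python
import csv
import io

def split_rows(in_file, out_file):
    """Write two output rows for every input row, repeating the first three fields."""
    writer = csv.writer(out_file, lineterminator="\n")
    for row in csv.reader(in_file):
        writer.writerow(row[:8])             # A,R,T plus the first five values
        writer.writerow(row[:3] + row[8:])   # A,R,T plus the remaining values

src = io.StringIO("A,R,T,11,12,13,14,15,21,22,23,24,25\n")
dst = io.StringIO()
split_rows(src, dst)
print(dst.getvalue())
```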
writer.writerow expects an iterable and writes each item in it as one field, separated by the appropriate delimiter, to the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3" followed by the line terminator ("\r\n" by default) to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way either to escape the delimiter or to quote that item. For example,
writer.writerow([1, '2,3'])
doesn't just give "1,2,3\r\n", but '1,"2,3"\r\n': the string counts as one item in the output.
Therefore, if you don't want quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\r\n").
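The three behaviours (default quoting, escaping, and the error from the question) can be reproduced with a small sketch. The helper name render is made up, and lineterminator is set to "\n" only to keep the printed output short.

```python
import csv
import io

def render(row, **fmt):
    """Return the text csv.writer produces for one row with the given options."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n", **fmt).writerow(row)
    return buf.getvalue()

quoted = render([1, "2,3"])                                            # default: quote the field
escaped = render([1, "2,3"], quoting=csv.QUOTE_NONE, escapechar="/")   # escape the delimiter

try:
    render([1, "2,3"], quoting=csv.QUOTE_NONE)    # no escapechar: the writer refuses
    error = None
except csv.Error as exc:
    error = str(exc)

print(repr(quoted), repr(escaped), error)
```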
However, I think what you actually want to do is include all of those elements as separate items. Don't ",".join(...) them yourself; try:
writer.writerow((current_a, current_r, current_first_time)
                + tuple(row[i+1:i+5]))  # same indices as range(i+1, i+5)
to provide the relevant items from row as separate items in the tuple.