I am scraping data, and I want the CSV to be written to columns 2 through 12 (B-L) rather than 1-4. So far I have simply been writing langs_text to the column, but this is slow. Is there a faster method that lets me start at column 2?
I tried the code below, but it does not write any values to the CSV; the job simply continues.
E.g.:
import csv
import time
langs11 = ("potato")
langs11_text = []
langs11 = []
langs11_text = []
time.sleep(0)
FILE_LOCATION = 'C:\\Users\\Bain3\\Aperture.csv'
with open(FILE_LOCATION, 'a', newline='', encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs11_text, langs_text, langs11_text, langs11_text, langs11_text, langs11_text, langs1_text, langs2_text, elem_href, langs11_text):
        print(row)
        writer.writerow(row)
What you need is something like the following:
for row in zip(langs_text, langs2_text, langs3_text):
    data = [""] * 12      # one empty cell per column, A through L
    data[1] = row[0]      # column B
    data[4] = row[1]      # column E
    data[6] = href        # column G; href is assumed to be defined earlier
    data[7] = row[2]      # column H
    writer.writerow(data)
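The answer's padding idea can be packaged as a small helper that places each value at a fixed 0-based column index (A = 0, B = 1, ..., L = 11). This is only a sketch: the helper name `place` and the sample lists are hypothetical stand-ins for the scraped data. It also shows one likely reason the question's loop writes nothing: zip() stops at its shortest argument, and langs11_text is an empty list, so no rows are ever produced.

```python
import csv
import io

def place(values_by_col, width=12):
    """Build a row of `width` empty cells, then fill the given 0-based columns."""
    row = [""] * width
    for col, value in values_by_col.items():
        row[col] = value
    return row

# Hypothetical scraped lists standing in for langs_text, langs2_text and elem_href.
names = ["a", "b"]
prices = ["1", "2"]
links = ["http://x/1", "http://x/2"]

# zip() stops at its shortest input, so an empty placeholder list yields no rows at all.
print(list(zip([], names)))  # → []

out = io.StringIO()  # stands in for the output file
writer = csv.writer(out)
for name, price, link in zip(names, prices, links):
    # Column B is index 1, column E is index 4, column G is index 6.
    writer.writerow(place({1: name, 4: price, 6: link}))

print(out.getvalue())
```

Padding with empty strings instead of zipping in empty lists keeps every row, and building each row once is cheap compared with the scraping itself.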
I'm working on a project where I take text from a CSV file, translate it, and store the translation back into the same CSV file. The next open column is at index 10 (column K). I've been trying to write the data there, but I just can't get it to work.
Reading works fine. I tried to do all of this in a single while loop but couldn't get it to work. Sorry for any formatting errors!
from googletrans import Translator
import csv
translater = Translator()
f = open("#ElNuevoDia.csv", "r+")
csv_f = csv.reader(f)
csv_wf = csv.writer(f)
tmp = {}
x = 0
for row in csv_f:
    tmp[x] = translater.translate(row[4], dest="en")
    #print(tmp[x].text)
    #print("\n")
    #print(tmp[x].text)
    x = x + 1
x = 0
f.close()
csv_wf = csv.writer(f)
for row in csv_wf:
    csv_wf[10].writerow(tmp[x].text)
f.close()
You should update each row as you read it and then write the rows back (as you mentioned in the comment, a writer is not iterable). Something like this (part of your code):
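tmp = []
for row in csv_f:
    # Assumes every row already has an 11th field (index 10); use row.append(...) otherwise.
    row[10] = translater.translate(row[4], dest="en")
    tmp.append(row)
f.close()
# Reopen the file for writing; the closed handle cannot be reused.
f = open("#ElNuevoDia.csv", "w", newline="")
csv_wf = csv.writer(f)
for row in tmp:
    csv_wf.writerow(row)
f.close()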
Edit 1:
To store just the translated text, you can do this:
row[10] = translater.translate(row[4], dest="en").text
and you can write all the rows back in one step:
csv_wf.writerows(tmp)
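A minimal sketch of the whole read-translate-rewrite cycle, assuming the file fits in memory. Instead of calling googletrans directly it takes a `translate` function, so it can be tried offline; the function name `add_translation_column` and the demo file are hypothetical. Appending to each row also avoids the IndexError that `row[10] = ...` raises when a row has only 10 fields.

```python
import csv
import os
import tempfile

def add_translation_column(path, src_col, translate):
    """Append translate(row[src_col]) to every row, then overwrite `path`.

    All rows are read (and the file closed) before it is reopened for writing,
    so the reader and the writer never share one file handle.
    """
    with open(path, newline="", encoding="utf-8") as f:
        rows = list(csv.reader(f))
    for row in rows:
        row.append(translate(row[src_col]))
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

# Demo with str.upper as a stand-in translator. With googletrans it might be
# lambda text: translater.translate(text, dest="en").text
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "demo.csv")
with open(demo_path, "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows([["1", "hola"], ["2", "adiós"]])

add_translation_column(demo_path, 1, str.upper)

with open(demo_path, newline="", encoding="utf-8") as f:
    result = list(csv.reader(f))
print(result)
```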
I have a large CSV file that I cannot load into memory all at once. I also want to add some columns to the side of the CSV, one column at a time, because that does not use much memory. I use Python and pandas; what can I do?
Here's my code:
def toCsv(filepath,lists):
    i = 0
    with open(filepath,'r+') as f:
        reader = csv.reader(f)
        writer = csv.writer(f)
        for row in reader:
            print lists
            row.append(lists[i])
            writer.writerows(row)
            i = i+1
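Since a CSV row can only grow by rewriting the file, one low-memory option is to stream the file row by row into a temporary file and then swap it in, rather than reading and writing the same handle opened with 'r+'. This sketch uses the standard csv module instead of pandas (pandas can also work in pieces via read_csv(..., chunksize=...)); the function name and demo data are hypothetical, and the new values must arrive in the same order as the rows. Note that on POSIX, tempfile.mkstemp creates the file with owner-only permissions, which the replaced file inherits.

```python
import csv
import os
import tempfile

_MISSING = object()

def append_column_streaming(path, new_values):
    """Append one column to the CSV at `path` while holding only one row in memory.

    `new_values` yields one value per row, header row included. Output goes to a
    temporary file in the same directory, which then replaces the original.
    """
    values = iter(new_values)
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".csv")
    try:
        with os.fdopen(fd, "w", newline="") as dst, open(path, newline="") as src:
            writer = csv.writer(dst)
            for row in csv.reader(src):
                value = next(values, _MISSING)
                if value is _MISSING:
                    raise ValueError("fewer new values than rows")
                row.append(value)
                writer.writerow(row)
        os.replace(tmp_path, path)  # swap the rewritten file into place
    except BaseException:
        os.remove(tmp_path)
        raise

# Demo on a small file; a generator keeps the new column lazy as well.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "big.csv")
with open(demo_path, "w", newline="") as f:
    csv.writer(f).writerows([["id", "a"], ["1", "x"], ["2", "y"]])

append_column_streaming(demo_path, (v for v in ["b", "10", "20"]))

with open(demo_path, newline="") as f:
    result = list(csv.reader(f))
print(result)
```

Calling it once per new column keeps memory flat, at the cost of one full pass over the file per column.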
Right now, data and filenameVariable only hold the final row, but I need all rows. I tried .append, but that didn't work. What else could I use?
Here is the data I'm working with:
someCSVfile.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
someCSVfile1.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|
And here's the code so far:
import csv
reader = csv.reader(open('match_log.txt','rb'), dialect='excel', delimiter='|')
data = {}
for row in reader:
    filenameVariable = row[0]
    data = dict(item.split(',') for item in row[1:])
print data
print filenameVariable
#right now its taking the final row. I need all rows
The problem is that you overwrite data on every line of the CSV. Instead, use row[0] as a key in the data dict:
import csv
# Python 3
filenameVariable = []
data = {}
with open('match_log.txt', newline='') as f:
    for row in csv.reader(f, dialect='excel', delimiter='|'):
        filenameVariable.append(row[0])
        # "if item" skips the empty field left by the trailing "|" on each line
        data[row[0]] = dict(item.split(',') for item in row[1:] if item)
print(data)
print(filenameVariable)
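To check the loop against the two sample lines from the question without creating match_log.txt, the text can be fed through io.StringIO. This sketch shows why the empty-field filter matters: each sample line ends with "|", so csv.reader yields a trailing empty string, and `''.split(',')` returns a one-element list that dict() rejects with a ValueError.

```python
import csv
import io

# The two sample lines from the question; note the trailing "|" on each line.
sample = (
    "someCSVfile.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|\n"
    "someCSVfile1.csv|cust_no,0|streetaddr,1|city,2|state,3|zip,4|phone_home,5|firstname,6|lastname,7|status,9|\n"
)

data = {}
for row in csv.reader(io.StringIO(sample), delimiter="|"):
    # row[-1] is "" because of the trailing delimiter, so skip empty items.
    data[row[0]] = dict(item.split(",") for item in row[1:] if item)

print(sorted(data))                       # → ['someCSVfile.csv', 'someCSVfile1.csv']
print(data["someCSVfile.csv"]["status"])  # → 9
```

The values stay strings ("9", not 9); convert them with int() if the column indices are needed as numbers.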
I have the following code:
for i in self.jobs:
    with open('postcodes.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            if row[0] == self.jobs[i][3]:
                self.jobs[i].append((row[1],row[2]))
            else:
                self.jobs[i].append('lat & lng not available')
My problem is that this produces "lat & lng not available" for every row in the CSV file that doesn't match. I only want one result per job: if a row matches, give me the values from its two adjacent columns; if none matches, give me 'lat & lng not available'.
See http://pastebin.com/gX5HtJV4 for the full code.
An SSCCE could be as follows:
reader = [('HP2 4AA', '51.752927', '-0.470095'), ('NE33 3GA', '54.991663', '-1.414911'), ('CV1 1FL','52.409463', '-1.509234')]
selfjobs = ['NE33 3AA', 'CV1 1FL', 'HP2 4AA']
latlng = []
for row in reader:
    for i in selfjobs:
        if i in row[0]:
            latlng.append((row[1],row[2]))
        else:
            latlng.append(('not available','not available'))
print(latlng)
Following Martineau's help in the comments, this is the code I ended up with:
for i in self.jobs:
    job = self.jobs[i]
    postcode = job[3]
    home = (54.764919,-1.368824)
    with open('postcodes.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            postcode_csv = row[0]
            if postcode in postcode_csv:
                job.append((row[1], row[2]))
            else:
                job.append(home)
I think at least part of the problem is that you actually have the following in your pastebin code:
for i in self.jobs:
    with open('postcodes.csv', 'rb') as f:
        reader = csv.reader(f)
        for row in reader:
            if row[0] == self.jobs[i][3]:
                self.jobs[i].append((row[1], row[2]))
            elif self.jobs[i][3] != row[0]:
                self.jobs[i].append("nothing")
However, since the i in the for i in self.jobs loop is itself a list, it can't be used as an index into self.jobs like that. Instead, I think it would make more sense to do something like the following in the loop:
for job in self.jobs:
    with open('postcodes.csv', 'rb') as f:
        for row in csv.reader(f):
            if row[0] == job[3]:
                job.append((row[1], row[2]))
                break
        else:  # no match
            job.append("nothing")
...which only indexes the fields of data in the rows read in from the CSV file. For efficiency, it stops reading the file as soon as it finds a match. If it ever reads through the whole file without finding a match, it appends "nothing" to indicate this, which is what the else clause of the inner for loop does.
BTW, it also seems rather inefficient to open and potentially read through the entire postcodes.csv file for every entry in self.jobs, so you might want to consider reading the whole thing into a dictionary, once, before executing the for job in self.jobs: loop (assuming the file's not too large for that).
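A sketch of that dictionary suggestion: read the postcodes once into a dict, then do one lookup per job with a default for misses. It feeds the rows from the SSCCE through io.StringIO in place of postcodes.csv; the postcode at index 3 follows the question, but the job lists themselves are hypothetical. It uses an exact match, like row[0] == job[3], rather than the SSCCE's substring test.

```python
import csv
import io

# Stand-in for postcodes.csv: postcode, latitude, longitude (rows from the SSCCE).
postcodes_csv = io.StringIO(
    "HP2 4AA,51.752927,-0.470095\n"
    "NE33 3GA,54.991663,-1.414911\n"
    "CV1 1FL,52.409463,-1.509234\n"
)

# Read the file once and build a postcode -> (lat, lng) lookup table.
latlng_by_postcode = {row[0]: (row[1], row[2]) for row in csv.reader(postcodes_csv)}

# Hypothetical jobs: only index 3 (the postcode) matters here.
jobs = [
    ["job1", "", "", "NE33 3AA"],
    ["job2", "", "", "CV1 1FL"],
    ["job3", "", "", "HP2 4AA"],
]

for job in jobs:
    # Exactly one append per job: the coordinates, or "nothing" if there is no match.
    job.append(latlng_by_postcode.get(job[3], "nothing"))

for job in jobs:
    print(job[3], job[4])
```

Each job now gets exactly one result, and the file is read once no matter how many jobs there are.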
I am trying to create a clean CSV file by merging some of the variables from an old file and writing the results to a new CSV file.
I have no problem running it the first time; I get the output I want. But whenever I try to append the data for a new variable (i.e. a new column), it is appended to the bottom of the file and the output is wonky.
I have basically been running the same code for each variable, changing only groupvariables to the variables I want and opening the output with f2 = open('outputfile.csv', "ab"), i.e. with "ab" for append. Any help would be appreciated.
# csv_f (the open input file) and headers are assumed to be defined earlier (not shown).
groupvariables=['x','y']
f2 = open('outputfile.csv', "wb")
writer = csv.writer(f2, delimiter=",")
writer.writerow(("ID","Diagnosis"))
for line in csv_f:
    line = line.rstrip('\n')
    columns = line.split(",")
    tempname = columns[0]
    tempindvar = columns[1:]
    templist = []
    for j in groupvariables:
        tempvar=tempindvar[headers.index(j)]
        if tempvar != ".":
            templist.append(tempvar)
    newList = list(set(templist))
    if len(newList) > 1:
        output = 'nomatch'
    elif len(newList) == 0:
        output = "."
    else:
        output = newList[0]
    tempoutrow = (tempname,output)
    writer.writerow(tempoutrow)
f2.close()
CSV is a line-based file format, so the only way to add a column to an existing CSV file is to rewrite the whole file, adding the new column to each line: either read it into memory and overwrite it, or stream it line by line into a new file and then replace the original.
If all you want to do is add lines, though, appending will work fine.
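Since the question's runs differ only in groupvariables, one way to avoid appending columns afterwards is to compute every output column in a single pass and write each row once. This sketch keeps the question's merge rule (ignore ".", one distinct value wins, none gives ".", several give "nomatch"); the group names, the helper `merge` and the sample data are hypothetical.

```python
import csv
import io

def merge(values):
    """Collapse one group's values like the question's loop does."""
    distinct = set(v for v in values if v != ".")
    if len(distinct) > 1:
        return "nomatch"
    if not distinct:
        return "."
    return distinct.pop()

# Hypothetical input: an ID column followed by the variables x, y, z.
source = io.StringIO("ID,x,y,z\n1,a,a,.\n2,a,b,c\n3,.,.,.\n")
# Hypothetical output columns, each merged from a group of input variables.
groups = {"Diagnosis": ["x", "y"], "Other": ["y", "z"]}

reader = csv.reader(source)
headers = next(reader)[1:]  # variable names after the ID column
out = io.StringIO()         # stands in for outputfile.csv
writer = csv.writer(out)
writer.writerow(["ID"] + list(groups))
for row in reader:
    values = row[1:]
    merged = [merge(values[headers.index(v)] for v in group) for group in groups.values()]
    writer.writerow([row[0]] + merged)

print(out.getvalue())
```

Adding a variable then means adding one entry to `groups` and rerunning the whole script in "w" mode, instead of appending to the existing file.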
Here is something that might help. I assumed that the first field on each row of each CSV file is a primary key for the record and can be used to match rows between the two files. The code below reads the records from one file and stores them in a dictionary, then reads the records from the other file, appends their values to the matching dictionary entries, and writes out a new file. You can adapt this example to better fit your actual problem.
import csv
# using python3
db = {}
with open('t1.csv', newline='') as f:
    for key, *values in csv.reader(f):
        db[key] = values
with open('t2.csv', newline='') as f:
    for key, *values in csv.reader(f):
        # Append t2's fields to the matching t1 record, or start a new record.
        db.setdefault(key, []).extend(values)
# csv.writer handles quoting, so fields containing commas survive the round trip.
with open('combo.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    for key in sorted(db):
        writer.writerow([key] + db[key])