Hey, I'm working on a project where I take this text, translate it, and store it back into the same CSV file. The next open column is at index 10, or column K. I've been trying to write the data but I just can't get it to work.
Reading works fine. I tried to do all of this in a single while loop but I couldn't get it to work. Sorry for any formatting errors!
from googletrans import Translator
import csv

translater = Translator()
f = open("#ElNuevoDia.csv", "r+")
csv_f = csv.reader(f)
csv_wf = csv.writer(f)
tmp = {}
x = 0
for row in csv_f:
    tmp[x] = translater.translate(row[4], dest="en")
    #print(tmp[x].text)
    #print("\n")
    #print(tmp[x].text)
    x = x + 1
x = 0
f.close()
csv_wf = csv.writer(f)
for row in csv_wf:
    csv_wf[10].writerow(tmp[x].text)
f.close()
You should update the row in the reader and then write it back (as you mentioned in the comment, the writer is not iterable). Something like this (reusing part of your code):
for row in csv_f:
    row[10] = translater.translate(row[4], dest="en")
    tmp[x] = row
    x = x + 1
f.close()

# Reopen the file for writing; you cannot write through a closed handle.
f = open("#ElNuevoDia.csv", "w", newline="")
csv_wf = csv.writer(f)
for row in tmp.values():  # tmp is a dict, so iterate over its values
    csv_wf.writerow(row)
f.close()
Edit 1:
To store only the translated text, take the .text attribute:
row[10] = translater.translate(row[4], dest="en").text
and you can write it all back in one step:
csv_wf.writerows(tmp.values())
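Putting it all together, here is a minimal end-to-end sketch of the same idea (assuming, as in the question, that the source text is in column index 4, that every row already has a column at index 10, and that googletrans is installed):

import csv
from googletrans import Translator

translator = Translator()

# Read every row into memory first.
with open("#ElNuevoDia.csv", newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))

# Translate column index 4 (column E) into column index 10 (column K).
for row in rows:
    row[10] = translator.translate(row[4], dest="en").text

# Write the updated rows back to the same file.
with open("#ElNuevoDia.csv", "w", newline="", encoding="utf-8") as f:
    csv.writer(f).writerows(rows)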
Related
I have two csv files simulating patient data that I need to read in and compare.
Without using Pandas, I need to sort the second file by Subject_ID and append each patient's sex to the first CSV file. I don't know where to start. Any ideas?
So far my plan is to somehow work with a dictionary to try to re-group the second file.
with open('Patient_Sex.csv','r') as file_sex, open('Patient_FBG.csv','r') as file_fbg:
    patient_reader = csv.DictReader(file_sex)
    fbg_reader = csv.DictReader(file_fbg)
After this, it gets really muddy for me.
I think this is what you are looking for, assuming you are working with .csv files based on the data that you posted.
Basically you can parse each file into a list of dictionaries and then manipulate the data easily.
import csv

gender_data = []
full_data = []

with open("stack/new.csv", encoding="utf-8") as csvf:
    csvReader = csv.DictReader(csvf)
    for row in csvReader:
        gender_data.append(row)

with open("stack/info.csv", encoding="utf-8") as csvf:
    csvReader = csv.DictReader(csvf)
    for row in csvReader:
        full_data.append(row)

# Copy the SEX value onto every record with a matching SUBJECT_ID.
for x in gender_data:
    for y in full_data:
        if x["SUBJECT_ID"] == y["SUBJECT_ID"]:
            y["SEX"] = x["SEX"]

with open("stack/test.csv", "w", newline="") as outfile:
    f = csv.writer(outfile)
    f.writerow(["SUBJECT_ID", "YEAR_1", "YEAR_2", "YEAR_3", "SEX"])
    for x in full_data:
        f.writerow(
            [
                x["SUBJECT_ID"],
                x["YEAR_1"],
                x["YEAR_2"],
                x["YEAR_3"],
                x["SEX"] if "SEX" in x else "",
            ]
        )
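A follow-up thought (not from the original answer): since the question already mentions using a dictionary, the nested loop can be replaced by a lookup keyed on SUBJECT_ID, which avoids rescanning the full list for every patient. A minimal sketch under the same assumed file paths and column names:

import csv

# Build a SUBJECT_ID -> SEX lookup from the sex file.
with open("stack/new.csv", encoding="utf-8") as csvf:
    sex_by_id = {row["SUBJECT_ID"]: row["SEX"] for row in csv.DictReader(csvf)}

# Stream the main file and append the sex while writing the output.
with open("stack/info.csv", encoding="utf-8") as csvf, \
        open("stack/test.csv", "w", newline="", encoding="utf-8") as outf:
    writer = csv.writer(outf)
    writer.writerow(["SUBJECT_ID", "YEAR_1", "YEAR_2", "YEAR_3", "SEX"])
    for row in csv.DictReader(csvf):
        writer.writerow([row["SUBJECT_ID"], row["YEAR_1"], row["YEAR_2"],
                         row["YEAR_3"], sex_by_id.get(row["SUBJECT_ID"], "")])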
You can also do this without importing any modules by reading the CSV files as lists of lines and appending the sex to each line of the main file when the patient ID matches:
with open('test1.csv') as csvfile:
    main_csv = [i.rstrip() for i in csvfile.readlines()]
with open('test2.csv') as csvfile:
    second_csv = [i.rstrip() for i in csvfile.readlines()]

for n, i in enumerate(main_csv):
    if n == 0:
        main_csv[n] = main_csv[n] + ',SEX'
    else:
        patient = i.split(',')[0]
        hits = [line.split(',')[-1] for line in second_csv if line.startswith(patient)]
        if hits:
            main_csv[n] = main_csv[n] + ',' + hits[0]
        else:
            main_csv[n] = main_csv[n] + ','

with open('test.csv', 'w') as f:
    f.write('\n'.join(main_csv))
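One caveat worth adding (my note, not part of the original answer): a plain split(',') mishandles fields that contain a quoted comma, which is exactly the case the csv module exists for. A tiny illustration:

import csv
from io import StringIO

# A quoted field containing a comma: naive splitting breaks it apart,
# while csv.reader keeps it intact.
line = '"Doe, John",F'
print(line.split(','))                   # ['"Doe', ' John"', 'F']
print(next(csv.reader(StringIO(line))))  # ['Doe, John', 'F']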
I have a CSV file containing survey data on which I wish to perform a sentiment analysis.
So far this is what I have tried (thanks to Rupin from a previous question!):
import csv
from collections import Counter

with open('myfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    alist = []
    iterreader = iter(reader)
    next(iterreader, None)
    for row in iterreader:
        clean_rows = row[0].replace(",", " ").rsplit()
        alist.append(clean_rows)
        word_count = Counter(clean_rows)
        mostWcommon = word_count.most_common(3)
        print(mostWcommon)
The output is nearly okay; the only problem I have is that Python is counting each row of the list separately, so I get output like this:
[('experienced', 1)]
[('experienced', 1)]
[('experienced', 1)]
I want everything combined into a single count so that I can get the real word frequencies... Any suggestions?
Thanks!
You are creating a new Counter for each row and printing only that result. If you want a total count, you can create the counter outside the rows loop and update it with data from each row:
import csv
from collections import Counter

with open('myfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    alist = []
    iterreader = iter(reader)
    next(iterreader, None)  # skip the header row
    c = Counter()
    for row in iterreader:
        clean_rows = row[0].replace(",", " ").rsplit()
        alist.append(clean_rows)
        c.update(clean_rows)  # accumulate counts across all rows
    mostWcommon = c.most_common(3)
    print(mostWcommon)
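If the intermediate list is not needed, the same total can be built in one pass by feeding every word straight into a single Counter. A compact sketch under the same file-name and delimiter assumptions:

import csv
from collections import Counter

with open('myfile.csv', 'r') as f:
    reader = csv.reader(f, delimiter='\t')
    next(reader, None)  # skip the header row
    word_count = Counter(word for row in reader
                         for word in row[0].replace(",", " ").split())

print(word_count.most_common(3))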
I am scraping data, but I want the CSV writer to start at columns 2 through 12 (B through L) rather than 1 through 4. So far I have simply been scraping langs_text into the column, though this is slow. Is there a better method that does not take such a long time, so that I can start at column 2?
I have tried to include the code below, however it simply does not write any values to the CSV and the job just continues.
E.g.:
import csv
import time

# (langs_text, langs1_text, langs2_text and elem_href come from the scraping code)
langs11 = ("potato")
langs11_text = []
langs11 = []
langs11_text = []
time.sleep(0)

FILE_LOCATION = 'C:\\Users\\Bain3\\Aperture.csv'
with open(FILE_LOCATION, 'a', newline='', encoding="utf-8") as outfile:
    writer = csv.writer(outfile)
    for row in zip(langs11_text, langs_text, langs11_text, langs11_text, langs11_text,
                   langs11_text, langs1_text, langs2_text, elem_href, langs11_text):
        print(row)
        writer.writerow(row)
What you need is something like the code below:
for row in zip(langs_text, langs2_text, langs3_text):
    data = ["", "", "", "", "", "", "", "", "", "", "", ""]  # 12 empty cells, columns A-L
    data[1] = row[0]   # column B
    data[4] = row[1]   # column E
    data[6] = href     # column G
    data[7] = row[2]   # column H
    writer.writerow(data)
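A small follow-up on the same idea: if the goal is only to shift a row so that it starts at column B, you can prepend one empty cell per column to skip instead of filling a fixed-size list by index. A sketch, reusing the writer and lists assumed above:

# Start the row at column B by padding one empty cell in front of it.
for row in zip(langs_text, langs2_text, langs3_text):
    writer.writerow([""] + list(row))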
I have just started learning the csv module recently. Suppose we have this CSV file:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
And we want to read this file and create a dictionary containing the names (as keys) and the totals of each column (as values). So in this case we would end up with:
d = {"John" : 284, "Jeff" : 275, "Judy" : 258}
So I wrote this code, which apparently works, but I am not satisfied with it and was wondering if anyone knows of a better or more efficient/elegant way of doing this, because there are just too many lines in there :D (Or maybe a way we could generalize it a bit, i.e. for when we don't know how many fields there are.)
d = {}
import csv
with open("file.csv") as f:
    readObject = csv.reader(f)
    totals0 = 0
    totals1 = 0
    totals2 = 0
    totals3 = 0
    currentRowTotal = 0
    for row in readObject:
        currentRowTotal += 1
        if currentRowTotal == 1:
            continue
        totals0 += int(row[0])
        totals1 += int(row[1])
        totals2 += int(row[2])
        if row[3] == "":
            totals3 += 0
f.close()

with open(filename) as f:
    readObject = csv.reader(f)
    currentRow = 0
    for row in readObject:
        while currentRow <= 0:
            d.update({row[0] : totals0})
            d.update({row[1] : totals1})
            d.update({row[2] : totals2})
            d.update({row[3] : totals3})
            currentRow += 1
    return(d)
f.close()
Thanks very much for any answer :)
Not sure if you can use pandas, but you can get your dict as follows:
import pandas as pd
df = pd.read_csv('data.csv')
print(dict(df.sum()))
Gives:
{'Jeff': 275, 'Judy': 258, 'John': 284}
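Note (my addition, not from the answer): because each data line in the sample ends with a comma, read_csv will also pick up an empty unnamed column. If that shows up in the result, it can be dropped before summing, for example:

import pandas as pd

df = pd.read_csv('data.csv')
df = df.dropna(axis=1, how='all')  # drop the all-empty column created by the trailing commas
print(dict(df.sum()))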
Use the top row to figure out what the column headings are. Initialize a dictionary of totals based on the headings.
import csv

with open("file.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    while titles[-1] == '':
        titles.pop()
    num_titles = len(titles)
    totals = { title: 0 for title in titles }
    for row in reader:
        for i in range(num_titles):
            totals[titles[i]] += int(row[i])

print(totals)
Let me add that you don't have to close the file after the with block. The whole point of with is that it takes care of closing the file.
Also, let me mention that the data you posted appears to have four columns:
John,Jeff,Judy,
21,19,32,
178,182,169,
85,74,57,
That's why I did this:
while titles[-1] == '':
    titles.pop()
It's a little dirty, but try this (it assumes the empty last column has been stripped out):
#!/usr/bin/python
import csv
import numpy
from functools import reduce  # needed on Python 3

with open("file.csv") as f:
    reader = csv.reader(f)
    headers = next(reader)
    sums = reduce(numpy.add, [list(map(int, x)) for x in reader], [0] * len(headers))

for name, total in zip(headers, sums):
    print("{}'s total is {}".format(name, total))
Based on Michasel's solution, I would try it with less code, fewer variables and no dependency on NumPy:
import csv
from functools import reduce  # needed on Python 3

with open("so.csv") as f:
    reader = csv.reader(f)
    titles = next(reader)
    # Like the previous answer, this assumes the empty trailing column has been stripped.
    sum_result = reduce(lambda x, y: [int(a) + int(b) for a, b in zip(x, y)], list(reader))

print(dict(zip(titles, sum_result)))
I need a way to get a specific item (field) of a CSV. Say I have a CSV with 100 rows and 2 columns (comma separated). The first column holds emails, the second column passwords. For example, I want to get the password of the email in row 38, so I need only the item in the 2nd column of row 38...
Say I have a csv file:
aaaaa#aaa.com,bbbbb
ccccc#ccc.com,ddddd
How can I get only 'ddddd' for example?
I'm new to the language and tried some stuff with the csv module, but I don't get it...
import csv
mycsv = csv.reader(open(myfilepath))
for row in mycsv:
    text = row[1]
Following the comments to the SO question here, a better, more robust version of the code would be:
import csv
with open(myfilepath, newline='') as f:
    mycsv = csv.reader(f)
    for row in mycsv:
        text = row[1]
        ............
Update: If what the OP actually wants is the last string in the last row of the csv file, there are several approaches that do not necessarily need the csv module. For example,
fulltxt = open(myfilepath).read()
laststring = fulltxt.split(',')[-1]
This is not good for very big files because it loads the complete text into memory, but it could be OK for small files. Note that laststring could include a newline character, so strip it before use.
And finally if what the OP wants is the second string in line n (for n=2):
Update 2: This is now the same code as the one in the answer from J.F.Sebastian (the credit goes to him):
import csv
line_number = 2
with open(myfilepath, newline='') as f:
    mycsv = csv.reader(f)
    mycsv = list(mycsv)
    text = mycsv[line_number][1]
    ............
#!/usr/bin/env python
"""Print a field specified by row, column numbers from given csv file.

USAGE:
    %prog csv_filename row_number column_number
"""
import csv
import sys

filename = sys.argv[1]
row_number, column_number = [int(arg, 10) - 1 for arg in sys.argv[2:]]

with open(filename, newline='') as f:
    rows = list(csv.reader(f))
    print(rows[row_number][column_number])
Example
$ python print-csv-field.py input.csv 2 2
ddddd
Note: list(csv.reader(f)) loads the whole file into memory. To avoid that you could use itertools:
import itertools
# ...

with open(filename, newline='') as f:
    row = next(itertools.islice(csv.reader(f), row_number, row_number + 1))
    print(row[column_number])
import csv

def read_cell(x, y):
    with open('file.csv', 'r') as f:
        reader = csv.reader(f)
        y_count = 0
        for n in reader:
            if y_count == y:
                cell = n[x]
                return cell
            y_count += 1

print(read_cell(4, 8))
This example prints the cell at column index 4, row index 8, and works in Python 3.
There is an interesting point you need to catch about the csv.reader() object: it is not a list and is not subscriptable.
This works:
for r in csv.reader(file_obj):  # file not closed
    print(r)
This does not:
r = csv.reader(file_obj)
print(r[0])
So you first have to convert it to a list to make the above code work:
r = list(csv.reader(file_obj))
print(r[0])
Finally I got it!!!
import csv

def select_index(index):
    csv_file = open('oscar_age_female.csv', 'r')
    csv_reader = csv.DictReader(csv_file)
    for line in csv_reader:
        l = line['Index']
        if l == index:
            print(line[' "Name"'])

select_index('11')
"Bette Davis"
The following may be what you are looking for:
import pandas as pd
df = pd.read_csv("table.csv")
print(df["Password"][row_number])
#where row_number is 38 maybe
import csv
inf = csv.reader(open('yourfile.csv', 'r'))
for row in inf:
    print(row[1])