scenario:
I'm trying to extract tweets from Twitter, which is working fine.
Next, I'm trying to merge 10 files into one (call it QW).
import csv
import os

for line in file:                      # `file` and `writer` are defined earlier in the script
    my_row = [line]
    filename = line.rstrip() + "_tweets.csv"
    if os.path.exists(filename):
        with open(filename, "rt") as f:
            reader = csv.reader(f, delimiter="\t")
            for row in reader:         # `enumerate` here would append (index, row) tuples
                my_row.append(row)
        writer.writerow(my_row)
    else:
        print(filename + " doesn't exist")
My CSV file looks like this: (screenshot omitted)
and then I will process that one file.

Problem: I want to read a specific column of that CSV (QW) file. I tried row[0]:
for row in input_file:
    name_list = []
    score = 0
    name_list.append(row[0])
    print(name_list)
    for a in row:
        if a.find(skill_input) > 0:
            score = score + 1
    name_list.append(score)
    print(name_list)
    writer.writerow([name_list])
and at that point I get an error:
name_list.append(row[0])
IndexError: list index out of range
Try this:

for line in open("csvfile.csv"):
    csv_row = line.split(your_delimiter)   # returns a list such as ["1", "50", "60"]
    if len(csv_row) <= k:                  # skip rows too short to have column k
        continue
    name_list.append(csv_row[k])           # csv_row[k] is the k-th column
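If the csv module is already in use, the same length guard works on csv.reader rows, which also handles quoted fields correctly. A self-contained sketch; the sample data and the column index k are placeholders standing in for the merged QW file:

```python
import csv
import io

# hypothetical stand-in for the merged QW file (tab-separated)
sample = "alice\t10\t20\nbob\t30\t40\n\nshort\n"

k = 1  # index of the column to extract
name_list = []
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    if len(row) <= k:      # skip blank or short rows instead of raising IndexError
        continue
    name_list.append(row[k])

print(name_list)  # → ['10', '30']
```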
I am trying to get the unique values from a csv file. Here's an example of the file:
12,life,car,good,exellent
10,gift,truck,great,great
11,time,car,great,perfect
The desired output in the new file is this:
12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
Here is my code:
def attribute_values(in_file, out_file):
    fname = open(in_file)
    fout = open(out_file, 'w')
    # get the header line
    header = fname.readline()
    # get the attribute names
    attrs = header.strip().split(',')
    # get the distinct values for each attribute
    values = []
    for i in range(len(attrs)):
        values.append(set())
    # read the data
    for line in fname:
        cols = line.strip().split(',')
        for i in range(len(attrs)):
            values[i].add(cols[i])
    # write the distinct values to the file
    for i in range(len(attrs)):
        fout.write(attrs[i] + ',' + ','.join(list(values[i])) + '\n')
    fout.close()
    fname.close()
The code currently outputs this:
12,10
life,gift
car,truck
good,great
exellent,great
12,10,11
life,gift,time
car,car,truck
good,great
exellent,great,perfect
How can I fix this?
You could use zip to iterate over the columns of the input file and then eliminate the duplicates:
import csv

def attribute_values(in_file, out_file):
    with open(in_file, "r") as fin, open(out_file, "w") as fout:
        for column in zip(*csv.reader(fin)):
            items, row = set(), []
            for item in column:
                if item not in items:
                    items.add(item)
                    row.append(item)
            fout.write(",".join(row) + "\n")
Result for the example file:
12,10,11
life,gift,time
car,truck
good,great
exellent,great,perfect
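On Python 3.7+, where dicts preserve insertion order, the inner deduplication loop can also be collapsed with dict.fromkeys. A minimal sketch of the same idea, with the example rows inlined so it runs as-is:

```python
import csv
import io

# the three example rows, inlined to keep the sketch self-contained
data = "12,life,car,good,exellent\n10,gift,truck,great,great\n11,time,car,great,perfect\n"

lines = []
for column in zip(*csv.reader(io.StringIO(data))):
    # dict.fromkeys drops duplicates while keeping first-seen order
    lines.append(",".join(dict.fromkeys(column)))

print("\n".join(lines))
```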
I have a JSON file called myclass.json that looks like this: [{"ID": "12345", "Name":"John"}, {"ID":"45321", "Name":"Max"}...]. I used json.load to get the "ID" and "Name" values.
I have another .txt file with the content below. File name is list.txt:
Student,12345,Age 14
Student,45321,Age 15
.
.
.
I'm trying to create a Python script that compares the two files line by line and replaces the student ID with the student's name in the list.txt file, so the new file would be:
Student,John,Age 14
Student,Max,Age 15
.
.
Any ideas?
My code so far:
import json

with open('/myclass.json') as f:
    data = json.load(f)
    for key in data:
        x = key['Name']
        z = key['ID']

with open('/myclass.json', 'r') as file1:
    with open('/list.txt', 'r+') as file2:
        for line in file2:
            x = z
try this:
import json
import csv

with open('myclass.json') as f:
    data = json.load(f)

with open('list.txt', 'r') as f:
    reader = csv.reader(f)
    rows = list(reader)

def get_name(id_):
    for item in data:
        if item['ID'] == id_:
            return item["Name"]

with open('list.txt', 'w') as f:
    writer = csv.writer(f)
    for row in rows:
        name = get_name(id_=row[1])
        if name:
            row[1] = name
    writer.writerows(rows)
Keep in mind that this script does not replace the items in the list.txt file one by one; it reads the entire file in, then overwrites list.txt and reconstructs it from scratch. I suggest making a backup of list.txt, or writing to a differently named file, in case the program crashes on unexpected input.
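If the JSON list is large, the linear scan inside get_name can be replaced with a dictionary built once up front. A sketch with the sample data inlined; the file handling around it would stay the same:

```python
# sample data standing in for the parsed myclass.json
data = [{"ID": "12345", "Name": "John"}, {"ID": "45321", "Name": "Max"}]

# build the ID -> Name lookup once instead of scanning the list for every row
names = {item["ID"]: item["Name"] for item in data}

rows = [["Student", "12345", "Age 14"], ["Student", "45321", "Age 15"]]
for row in rows:
    row[1] = names.get(row[1], row[1])  # keep the ID in place if there is no match

print(rows)  # → [['Student', 'John', 'Age 14'], ['Student', 'Max', 'Age 15']]
```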
One option is to open each file individually in the appropriate mode while appending matched ID replacements to a list (note that this relies on the JSON entries appearing in the same order as the lines in list.txt):
import json

with open('myclass.json', 'r') as f_in:
    data = json.load(f_in)

j = 0
lis = []
with open('list.txt', 'r') as f_in:
    for line in f_in:
        if data[j]['ID'] == line.split(',')[1]:
            s = line.replace(line.split(',')[1], data[j]['Name'])
            lis.append(s)
        j += 1

with open('list.txt', 'w') as f_out:
    for i in lis:
        f_out.write(i)
I am making a Twitter sentiment analysis. After cleaning the tweets, when I try to write the tweets from the .csv into a .txt file, it writes only the first tweet to the text file, repeated until the end. Consider the following code:
import csv

f = open('PanamaCase.csv', 'r')
with f:
    reader = csv.DictReader(f)
    i = 0
    for row in reader:
        row = str(row['Tweets'])
        #print(type(row))
        print(clean(row))
        txt = open('cleanedTweets.txt', 'w')
        #line = 0
        with txt:
            reader2 = csv.DictReader(f)
            for line in reader2:
                txt.write(clean(row) + "\n")
I think your problem is that you are reading the input file twice in your code (actually once, plus once more for each line).
I suggest to try:
f = open('PanamaCase.csv', 'r')
with f:
    reader = csv.DictReader(f)   # read the input once
    txt = open('cleanedTweets.txt', 'w')
    with txt:
        for row in reader:
            row = str(row['Tweets'])
            print(clean(row))
            txt.write(clean(row) + "\n")
I'm trying to append a random number to every line of a CSV file as a third column. I read the information from the original file, then write it to a new CSV file with the random number appended, but I get the same random number on every line when I run the script.

The read file contains lines such as:

car,golf

When I write this data to the new CSV file and append the third column, I get the same number on every line:

car,golf,1777
car,bmw,1777
car,m3,1777

How can I fix this so that every line gets its own random number?
import csv
import random

data = []
with open("read.csv", "r") as the_file:
    sid_row = 5000
    for i in range(sid_row):
        line = str(random.randint(1, 5000))
        sid = line
    reader = csv.reader(the_file, delimiter=",")
    for row in reader:
        try:
            new_row = [row[0], row[1], sid]
            data.append(new_row)
        except IndexError as error:
            print(error)
            pass

with open("Random.csv", "w+") as Ran_file:
    writer = csv.writer(Ran_file, delimiter=",")
    for new_row in data:
        writer.writerow(new_row)
You need a new random number each time, for each row you're processing, something like:
import csv
import random

data = []
with open("read.csv", "r") as the_file:
    reader = csv.reader(the_file, delimiter=",")
    for row in reader:
        try:
            line = str(random.randint(1, 5000))
            sid = line
            new_row = [row[0], row[1], sid]
            data.append(new_row)
        except IndexError as error:
            print(error)
            pass

with open("Random.csv", "w+") as Ran_file:
    writer = csv.writer(Ran_file, delimiter=",")
    for new_row in data:
        writer.writerow(new_row)
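The same idea can be written as a single pass, with a length check replacing the try/except. A sketch using io.StringIO to stand in for the real files so it runs as-is:

```python
import csv
import io
import random

src = io.StringIO("car,golf\ncar,bmw\ncar,m3\n")  # stand-in for read.csv
out = io.StringIO()                               # stand-in for Random.csv

writer = csv.writer(out)
for row in csv.reader(src):
    if len(row) < 2:      # skip blank or short rows instead of catching IndexError
        continue
    # randint is called inside the loop, so every row gets a fresh number
    writer.writerow([row[0], row[1], random.randint(1, 5000)])

print(out.getvalue())
```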
I'm using this information (downloaded the file to my computer) http://www.genome.gov/admin/gwascatalog.txt
and wrote this
import csv
path = '/Users/mtchavez/Documents/ALS/Diseasome/GWAS.txt'
read_file = open(path)
reader = csv.reader(read_file, delimiter = '\t')
fieldnames = reader.next()
rows = list(reader)
read_file.close()
write_file = open('datatest.csv', 'wb')
writer = csv.writer(write_file, delimiter = '\t')
writer.writerow(('disease', 'genes'))
disease_gene = dict()
for row in rows:
    disease = row[7]
    reported_genes = row[13]
but I get an error message:
File "data1.py", line 18, in <module>
disease = row[7]
IndexError: list index out of range
There is an empty line at the end of this CSV file, and it produces an empty row. Delete the last line and the code works fine.
Try filtering for empty lines:
for row in rows:
    if not row: continue
    disease = row[7]
    reported_genes = row[13]
Or more specifically, filter for the desired length:
for row in rows:
    if len(row) != EXPECTED_LENGTH_OF_RECORD: continue
    disease = row[7]
    reported_genes = row[13]
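A tiny self-contained demonstration of why the guard helps; the sample rows are inlined, and the blank trailing line mimics the one at the end of the catalog file:

```python
import csv
import io

# two tab-separated data rows plus the trailing blank line
sample = "a\tb\tc\nd\te\tf\n\n"

kept = []
for row in csv.reader(io.StringIO(sample), delimiter="\t"):
    if not row:          # the blank last line parses as an empty list
        continue
    kept.append(row[2])  # safe now: every remaining row has index 2

print(kept)  # → ['c', 'f']
```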