I have this code:
with open("wordslist.txt") as f:
words_list = {word.removesuffix("\n") for word in f}
with open("neg.csv") as g:
for tweete in g:
for word in tweete.split():
if word not in words_list:
print(word)
and the output is like this:
gfg
best
gfg
I
am
I
two
two
three
..............
I want to remove the newlines so the output becomes one big sentence (there are 4500+ words). How do I join the words, replacing each newline with a space, so it becomes one big sentence?
I expected the output to be like this:
gfg best gfg I am I two two three..............
The end parameter of Python's print() function defaults to "\n", so each call starts a new line.
The default print() call:
print()  # equivalent to print(end="\n")
Set end to whatever separator you want. For example, with end="*" the printed values will be joined by *.
Solution:
with open("neg.csv") as g:
    for tweete in g:
        for word in tweete.split():
            if word not in words_list:
                print(word, end=" ")
You can append the words to a list and then do " ".join(your_list).
Or you can create an empty string x = "" and in your iteration do something like x += word + " ".
Here is an example of the first solution:
import csv

# Open the CSV file and read its contents
with open('file.csv', 'r') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # Skip the header row

    # Initialize an empty list to store the column values
    column_values = []

    # Retrieve the values from the specified column and append them to the list
    for row in reader:
        column_values.append(row[0])  # Replace 0 with the index of the desired column

# Create a sentence with whitespace between the words
sentence = ' '.join(column_values)
print(sentence)
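Applied directly to the loop from the question, a minimal sketch (reusing the words_list set already built from wordslist.txt) could look like this:
unmatched = []
with open("neg.csv") as g:
    for tweete in g:
        for word in tweete.split():
            if word not in words_list:
                unmatched.append(word)

# join the collected words with spaces into one big sentence
sentence = " ".join(unmatched)
print(sentence)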
I have two files (the fields on each line are separated by a space):
file1.txt
OTU0001 Archaea
OTU0002 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon
OTU0003 Archaea;Altiarchaeales;uncultured euryarchaeote
OTU0004 Archaea;Bathyarchaeota;uncultured archaeon
OTU0005 Archaea;Diapherotrites;uncultured euryarchaeote
OTU0006 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured
OTU0007 Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome
file2.txt
UniRef90_1 OTU0001 OTU0004 OTU0005 OTU0007
UniRef90_2 OTU0002 OTU0003 OTU0005
UniRef90_3 OTU0004 OTU0006 OTU0007
I would like, in the second file, to replace each OTUXXXX with its value from the first file, and I need to keep the UniRef90_X at the beginning of each line. It should look like this for the first line of the second file:
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
For the moment, I have created a dictionary for the second file, with the
UniRef90_X as keys and the OTUXXXX as values.
f1=open("file1.txt", "r")
f2=open("file2.txt", "r")
dict={}
for i in f2:
i=i.split(" ")
dict[i[0]]=i[1:]
for j in f1:
j=j.split(" ")
if j[0] in dict.values():
dico[i[0]]=j[1:]
But I don't know how to replace the OTUXXXX with the corresponding values from the first file. Any idea?
I would suggest putting the first file into a dictionary. That way, as you read file2, you can look up ids you captured from file1.
The way you have your loops set up, you will read the first record from file2 and enter it into a hash. The key will never match anything from file1. Then you read from file1 and do something there. The next time you read from file2, all of file1 will be exhausted from the first iteration of file2.
Here is an approach that reads file 1 into a dictionary, and when it finds matches in file 2, prints them out.
file1 = {}  # declare a dictionary
fin = open('f1.txt', 'r')
for line in fin:
    # strip the ending newline
    line = line.rstrip()
    # only split once:
    # first part into _id and second part into data
    _id, data = line.split(' ', 1)
    # data here is a single string possibly containing spaces
    # because we only split once (above)
    file1[_id] = data
fin.close()

fin = open('f2.txt', 'r')
for line in fin:
    uniref, *ids = line.split()  # ids is a list (because of the leading *)
    print(uniref, end='')
    for _id in ids:
        if _id in file1:
            print(' ' + file1[_id] + ' (#' + _id + ')', end='')
    print()
fin.close()
The printout is:
UniRef90_1 Archaea (#OTU0001) Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
UniRef90_2 Archaea;Aenigmarchaeota;Deep Sea Euryarchaeotic Group(DSEG);uncultured archaeon (#OTU0002) Archaea;Altiarchaeales;uncultured euryarchaeote (#OTU0003) Archaea;Diapherotrites;uncultured euryarchaeote (#OTU0005)
UniRef90_3 Archaea;Bathyarchaeota;uncultured archaeon (#OTU0004) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured (#OTU0006) Archaea;Euryarchaeota;Halobacteria;Halobacteriales;Halobacteriaceae;uncultured;marine metagenome (#OTU0007)
First of all, do not name your variables after built-in classes like dict. Ever. Use something like d2 instead.
Then, replace the [1] with [1:]
Then, after reading the first file into a dictionary just as you did with the second one - let's name it d1 - you can combine the values like this:
d3 = dict()
for e in d2:
    L = list()
    for otu in d2[e]:
        L.append(d1[otu])
    d3[e] = ' '.join(L)  # or format the list however you need
Finally, write the resulting strings back out to a file.
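Putting it all together, a minimal end-to-end sketch of that approach (the output file name result.txt is an assumption) could look like:
d1 = {}
with open("file1.txt") as f1:
    for line in f1:
        otu, data = line.rstrip("\n").split(" ", 1)  # split only once; data may contain spaces
        d1[otu] = data

d2 = {}
with open("file2.txt") as f2:
    for line in f2:
        parts = line.split()
        d2[parts[0]] = parts[1:]

with open("result.txt", "w") as out:  # assumed output file name
    for uniref, otus in d2.items():
        pieces = [d1[otu] + " (#" + otu + ")" for otu in otus if otu in d1]
        out.write(uniref + " " + " ".join(pieces) + "\n")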
I want to convert every standalone i (lower case) to its upper case form, I. I am merging 2 rows of a csv file and printing the result. I want to replace only the individual character i with its uppercase form I; this should not be applied to other words in the text file, like is, itself, it, in, etc. I have tried, but I am not getting the desired output. Any help is deeply appreciated.
import csv, string, re, nltk

def process_reqs():
    with open('res.csv') as f:
        reader = csv.reader(f)
        next(reader, None)
        global raw_text
        with open('raw_res.txt', 'w', encoding='utf-8') as f1:
            rows = ('"{}."'.format(' '.join(row)) for row in reader)
            raw_text = ', '.join(rows)
            for word in raw_text.split():
                if word == 'i':
                    raw_text = raw_text.replace(word, "I")
            f1.write(raw_text)
            print(raw_text)

process_reqs()
You can do a simple string replacement. Just don't forget the spaces, i.e. replace " i " with " I ".
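For example, a quick sketch of that replacement (the sample sentence is made up; the word-boundary regex shown second is an alternative that also handles an i at the very start or end of the text):
import re

raw_text = "i think i can do it, i am sure of it"

# simple space-padded replacement, as suggested above
padded = raw_text.replace(" i ", " I ")

# a word-boundary regex also catches an "i" at the start or end of the text
bounded = re.sub(r"\bi\b", "I", raw_text)

print(bounded)  # I think I can do it, I am sure of it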
I'm trying to create a list of rows that doesn't include 3 specific words.
import csv

word_list = ['Banned', 'On_Hold', 'Reviewing']
unknown_row = []

with open('UserFile.csv', newline='') as csvfile:
    user_reader = csv.reader(csvfile, delimiter='\t')
    for row in user_reader:
        joined = ','.join(row)
        for i in word_list:
            if i in joined:
                pass
            else:
                unknown_row.append(joined)

with open('unkown.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerows(x.split(',') for x in unknown_row)
Here's the thing: if only one word is in word_list, then it works. But if I include two or more words, then it doesn't.
Any ideas?
Thank you
The issue with your code is here:
for i in word_list:
    if i in joined:
        pass
    else:
        unknown_row.append(joined)
Right now, the row gets appended for every word from word_list that is not found in joined, so a row is only excluded when all of the words from word_list appear in it (and rows that should be kept get appended once per word). With a single word in word_list this happens to behave correctly, which is what you experienced. Instead, you want to skip the row as soon as any word from word_list is found in it.
You can make use of any here:
if not any(i in joined for i in word_list):
    unknown_row.append(joined)
This way, if a single word from word_list is found in joined, the row will not be added.
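Put back into your script, the whole loop might look like this (a sketch that keeps your original file names and delimiters):
import csv

word_list = ['Banned', 'On_Hold', 'Reviewing']
unknown_row = []

with open('UserFile.csv', newline='') as csvfile:
    user_reader = csv.reader(csvfile, delimiter='\t')
    for row in user_reader:
        joined = ','.join(row)
        # keep the row only if none of the flagged words appear in it
        if not any(i in joined for i in word_list):
            unknown_row.append(joined)

with open('unkown.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerows(x.split(',') for x in unknown_row)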
I have two files. One is a word list, say a.txt, and the other is a CSV file, say b.csv, whose second column contains words. I want to check if any word from a.txt is in the second column of b.csv and print only the lines that are unmatched. There are 3 columns in the CSV file.
What I have achieved so far is printing the lines which contain a word from the word list. But I want exactly the other lines. Here is my code:
import csv

reader = csv.reader(open('b.csv', 'rb'))
op = open('a.txt', 'r')
ol = op.readlines()
for row in reader:
    for word in ol:
        if word==row[1]:
            print row[0],row[1],row[2]
Now how do I make it print the lines which aren't matched?
Thanks!
The least intrusive solution (i.e. keeping your nested loop) would be something along the lines of
for row in reader:
    match = False
    for word in ol:
        if word==row[1]:
            match = True
            break
    if not match:
        print row[0],row[1],row[2]
Or using some more Python goodness:
for row in reader:
    for word in ol:
        if word==row[1]:
            break
    else:
        print row[0],row[1],row[2]
The else: bit is only executed if the preceding loop ended normally (without ever reaching break).
As suggested by thg435, it's even simpler:
for row in reader:
    if row[1] not in ol:
        print row[0],row[1],row[2]
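One caveat with all three variants: readlines() keeps the trailing newline on each word, so word==row[1] (or row[1] not in ol) may never match unless the lines are stripped. A small sketch that strips them and uses a set for faster membership tests:
import csv

with open('a.txt', 'r') as op:
    ol = set(line.strip() for line in op)  # strip newlines; a set gives fast lookups

reader = csv.reader(open('b.csv', 'rb'))
for row in reader:
    if row[1] not in ol:
        print row[0],row[1],row[2]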
I have a question about the best way to get word counts for items in a list.
I have 400+ items indexed in a list. They are of varying lengths. For example, if I enumerate, then I will get:
for index, items in enumerate(my_list):
    print index, items
0 fish, line, catch, hook
1 boat, wave, reel, line, fish, bait
.
.
.
Each item will get written into an individual row in a CSV file. I would like corresponding word counts to complement this text in the adjacent column. I can find word/token counts just fine in Excel, but I would like to be able to do this in Python so I don't have to keep going back and forth between programs to process my data.
I'm sure there are several ways to do this, but I can't seem to piece together a good solution. Any help would be appreciated.
As was posted in the comments, it's not really clear what your goal is here, but if it is to print a csv file that has one word per row along with each word's length,
import csv
with open(filename, 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for word in mylist:
        writer.writerow([word, str(len(word))])
If I'm misunderstanding here and actually what you have is a list of strings in which each string contains a list of comma-separated words, what you'd want to do instead is:
import csv
with open(filename, 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for line in mylist:
        for word in line.split(", "):
            writer.writerow([word, str(len(word))])
If I understand correctly, you are looking for:
import csv
words = {}
for items in my_list:
    for item in items.split(', '):
        words.setdefault(item, 0)
        words[item] += 1

with open('output.csv', 'w') as fopen:
    writer = csv.writer(fopen)
    for word, count in words.items():
        writer.writerow([word, count])
This will write a CSV with unique words in one column and the number of occurrences of that word in the next column.
Is this what you were asking for?
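If instead the goal is one row per list item with that item's token count in the adjacent column, as described in the question, a minimal sketch (the output file name items_with_counts.csv is an assumption) could be:
import csv

my_list = ["fish, line, catch, hook", "boat, wave, reel, line, fish, bait"]

with open('items_with_counts.csv', 'w') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Item', 'WordCount'])
    for item in my_list:
        # count the comma-separated tokens in each item
        writer.writerow([item, len(item.split(', '))])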