I'm trying to create a list of rows that don't include any of 3 specific words.
import csv

word_list = ['Banned', 'On_Hold', 'Reviewing']
unknown_row = []
with open('UserFile.csv', newline='') as csvfile:
    user_reader = csv.reader(csvfile, delimiter='\t')
    for row in user_reader:
        joined = ','.join(row)
        for i in word_list:
            if i in joined:
                pass
            else:
                unknown_row.append(joined)
with open('unkown.csv', 'w', newline='') as output:
    writer = csv.writer(output, delimiter=',')
    writer.writerows(x.split(',') for x in unknown_row)
Here's the thing: if word_list contains only one word, it works. But if I include two or more words, it doesn't.
Any ideas?
Thank you
The issue with your code is here:
for i in word_list:
    if i in joined:
        pass
    else:
        unknown_row.append(joined)
Right now, the row is appended once for every word in word_list that is not found in joined, so the row is still added (possibly several times) unless all the words from word_list are found in it. With a single "bad word" this makes no difference, which is why that case worked for you. Instead, you want to skip the row as soon as any word from word_list is found in it.
You can make use of any here:
if not any(i in joined for i in word_list):
    unknown_row.append(joined)
This way, if a single word from word_list is found in joined, the row will not be added.
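A minimal, self-contained sketch of the any() fix, assuming the same tab-delimited input and the three words from the question. The helper name filter_rows and the sample rows are hypothetical, and io.StringIO stands in for UserFile.csv so the example runs without files. Keeping each row as a list also avoids the join/split round trip before writing.

```python
import csv
import io

word_list = ['Banned', 'On_Hold', 'Reviewing']

def filter_rows(csvfile, banned_words):
    """Return the rows (as lists) that contain none of banned_words."""
    kept = []
    for row in csv.reader(csvfile, delimiter='\t'):
        joined = ','.join(row)
        # any() stops at the first banned word found in the row
        if not any(word in joined for word in banned_words):
            kept.append(row)
    return kept

# Hypothetical in-memory stand-in for UserFile.csv (tab-delimited)
sample = io.StringIO(
    "alice\tActive\n"
    "bob\tBanned\n"
    "carol\tOn_Hold\n"
    "dave\tActive\n"
)
unknown_rows = filter_rows(sample, word_list)
print(unknown_rows)  # [['alice', 'Active'], ['dave', 'Active']]

# Writing the result to a real file would then be, e.g.:
# with open('unkown.csv', 'w', newline='') as output:
#     csv.writer(output).writerows(unknown_rows)
```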
I have this code:
with open("wordslist.txt") as f:
    words_list = {word.removesuffix("\n") for word in f}
with open("neg.csv") as g:
    for tweete in g:
        for word in tweete.split():
            if word not in words_list:
                print(word)
and the output is like this:
gfg
best
gfg
I
am
I
two
two
three
..............
I want to remove the newlines so the output is one big sentence (there are 4500+ words). How can I join the words, replacing each newline with a space, so they become one big sentence?
I expected the output to be like this:
gfg best gfg I am I two two three..............
The end parameter of Python's print() function defaults to "\n", which is why each word is printed on its own line.
By default, print() behaves like this:
print()  # same as print(end="\n")
Set end to whatever you want printed after each item instead. For example, with end="*" the words are joined by *.
Solution:
with open("neg.csv") as g:
    for tweete in g:
        for word in tweete.split():
            if word not in words_list:
                print(word, end=" ")
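A small sketch of the end=" " approach that can be checked without the real files. It assumes the tweets are an iterable of lines and the known words a set, as in the question; the function name print_unknown_words and the sample data are hypothetical. It prints into an io.StringIO through print()'s file parameter so the result can be inspected, and one bare print() at the end finishes the line so later output does not continue on it.

```python
import io

def print_unknown_words(tweets, known_words, out):
    """Print each word that is not in known_words, all on one line."""
    for tweet in tweets:
        for word in tweet.split():
            if word not in known_words:
                # end=" " replaces the default "\n" with a space
                print(word, end=" ", file=out)
    # one bare print() finishes the line after the last word
    print(file=out)

# Hypothetical stand-ins for the lines of neg.csv and the set from wordslist.txt
tweets = ["gfg is best gfg\n", "I am I\n"]
known = {"is", "the"}

buf = io.StringIO()
print_unknown_words(tweets, known, buf)
print(repr(buf.getvalue()))  # 'gfg best gfg I am I \n'
```

Passing sys.stdout (the default when file is omitted) prints to the console. Note the trailing space before the newline; the " ".join approach avoids it.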
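You can append the words to a list and then do " ".join(your_list).
Or you can create an empty string x = "" and, in your loop, do something like x += word + " " (plain x += word would run the words together without spaces).
Here is an example of the first solution, joining the values of one CSV column: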
import csv
# Open the CSV file and read its contents
with open('file.csv', 'r') as csv_file:
    reader = csv.reader(csv_file)
    next(reader)  # Skip the header row
    # Initialize an empty list to store the column values
    column_values = []
    # Retrieve the values from the specified column and append them to the list
    for row in reader:
        column_values.append(row[0])  # Replace 0 with the index of the desired column
# Create a sentence with whitespace between the words
sentence = ' '.join(column_values)
print(sentence)
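The same join idea applied to the question's own loop, as a sketch: it assumes tweets is any iterable of lines (such as the open neg.csv file object) and known_words is the set read from wordslist.txt. The function name join_unknown_words and the sample data are hypothetical.

```python
def join_unknown_words(tweets, known_words):
    """Return the words not in known_words as one space-separated string."""
    unknown = []
    for tweet in tweets:
        # split() with no argument also discards the trailing "\n"
        for word in tweet.split():
            if word not in known_words:
                unknown.append(word)
    return " ".join(unknown)

# Hypothetical data shaped like the question's neg.csv lines
tweets = ["gfg is best gfg\n", "I am I\n", "two two three\n"]
sentence = join_unknown_words(tweets, {"is", "the"})
print(sentence)  # gfg best gfg I am I two two three
```

Building a list and joining once is usually preferred over repeated x += word + " ", and it leaves no trailing space.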
I sincerely apologize if this is the wrong way to ask my question. This is my first time posting on Stack Overflow.
My inFile contains six edited lines of the poem "Do Not Go Gentle into That Good Night". The program should write an outFile containing every line that has a duplicated word longer than 3 letters. For example, "rage rage against the dying of the light" would be written because of "rage".
Edit: When I run this, I get an error saying "i" is undefined.
Oh, and I can't use any modules.
Here is my code:
def duplicateWordLines(inFile,outFile):
    inFile=open(inFileName, "r")
    outFile=open(outFileName, "w")
    for line in inFile:
        words=line.split() #split the lines
        og=[] #original words
        dups=[] #duplicate words
        for word in words: #for each word in words
            if og.count(i)>0 and line not in dups: #if the word appears more than once and not already in duplicates
                dups.append(line) #add to duplicates
            else: #if not a duplicate
                og.append(i) #add to the original list - not to worry about it
        for line in dups: #for the newly appended lines
            outFile.write(line+'\n') #write in the outFile
#test case
inFileName="goodnightPoem.txt"
outFileName="goodnightPoemDUP.txt"
duplicateWordLines(inFileName,outFileName)
#should print
#rage rage against the dying of the light
#do not go gentle into that good good night
Thank you!
Try this out...
def duplicateWordLines(inFileName, outFileName):
    # with closes both files automatically when the block ends
    with open(inFileName, "r") as inFile, open(outFileName, "w") as outFile:
        for line in inFile:
            # split the line into words
            words = line.split()
            # keep only words longer than 3 characters
            words = [word for word in words if len(word) > 3]
            # make the list a set, so all duplicates are removed
            no_dups = set(words)
            # if there are more words in the list than in the
            # set, some word must be repeated, so write the line
            if len(words) > len(no_dups):
                # line usually already ends with '\n'; avoid doubling it
                outFile.write(line.rstrip('\n') + '\n')
#test case
inFileName="file.txt"
outFileName="file_1.txt"
duplicateWordLines(inFileName,outFileName)
Regarding the "i is undefined" error, let's look at your for loop:
for word in words: #for each word in words
    if og.count(i)>0 and line not in dups: #if the word appears more than once and not already in duplicates
        dups.append(line) #add to duplicates
    else: #if not a duplicate
        og.append(i) #add to the original list - not to worry about it
You don't actually define i anywhere; your loop defines word. You are mixing a for-each loop, i.e. for word in words, with an index-based loop like for i in range(0, len(words)). If we were to fix your loop, I think it would look something like this:
for word in words: #for each word in words
    if og.count(word)>0 and line not in dups: #if the word appears more than once and not already in duplicates
        dups.append(line) #add to duplicates
    else: #if not a duplicate
        og.append(word) #add to the original list - not to worry about it
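Putting those pieces together, a complete version of this seen-words approach could look like the sketch below. It uses no modules, as the question requires; it takes a list of lines instead of file names so it can be tried without files; and it reads "greater than 3 letters" as len(word) >= 4, a check the loop above does not yet make. The function name duplicate_word_lines and the min_length parameter are hypothetical. Punctuation is not stripped, so "night," and "night" count as different words.

```python
def duplicate_word_lines(lines, min_length=4):
    """Return the lines that repeat some word of at least min_length letters."""
    dups = []
    for line in lines:
        og = []  # long words seen so far on this line
        for word in line.split():
            if len(word) < min_length:
                continue  # ignore short words such as "the" or "of"
            if word in og:
                dups.append(line)
                break  # one repeated word is enough; move to the next line
            og.append(word)
    return dups

# Sample lines (hypothetical test data)
poem = [
    "rage rage against the dying of the light\n",
    "do not go gentle into that good good night\n",
    "old age should burn and rave at close of day\n",
]
for line in duplicate_word_lines(poem):
    print(line, end="")  # lines already end with "\n"
```

With real files, the same function can be used as outFile.writelines(duplicate_word_lines(inFile)) inside the usual with open(...) blocks.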
This is my current code. The issue I have is that search ends up empty. How do I get a string value for this variable?
count = 0
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        row_count = sum(1 for row in myFile)
        print("aba")
        for x in range(row_count):
            print("aaa")
            for row in myFile:
                search = row[count].readline
                print(search)
                if self.delName.get("1.0","end-1c") in search:
                    count = count + 1
                else:
                    newFile.write(row[count])
                    count = count + 1
The output is:
aba
aaa
aaa
So the outer loop runs twice, which is good, as my userDatabase consists of two rows of data.
The file in question has this data:
"lukefinney","0000000","0000000","a"
"nictaylor","0000000","0000000","a"
You cannot just iterate over an open file more than once without rewinding the file object back to the start.
You'll need to add a file.seek(0) call to put the file reader back to the beginning each time you want to start reading from the first row again:
myFile.seek(0)
for row in myFile:
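The rest of your code makes little sense; when iterating over a file you get individual lines from the file, so each row is a string object. Indexing into a string gives you a new string containing just one character; 'foo'[1] is the character 'o', for example. Strings also have no readline attribute, so row[count].readline would raise an AttributeError as soon as that loop body actually ran.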
If you want to copy across the rows that don't match a string, you don't need to know the row count up front at all. You are not handling a list of rows here; you can look at each row individually instead:
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if filter_string not in row:
                newFile.write(row)
This does a substring match. If you need to match whole columns, use the csv module to give you individual columns to match against. The module handles the quotes around column values:
import csv
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv", "r", newline='') as myFile:
    with open("newFile.csv", "w", newline='') as newFile:
        writer = csv.writer(newFile)
        for row in csv.reader(myFile):
            # row is now a list of strings, like ['lukefinney', '0000000', '0000000', 'a']
            if filter_string != row[0]:  # test against the first column
                # copied across if the first column does not match exactly.
                writer.writerow(row)
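One detail worth knowing about the csv version: csv.writer uses csv.QUOTE_MINIMAL by default, so the copied rows lose the double quotes the original file has (nictaylor,0000000,0000000,a). If the output should keep the original style, quoting=csv.QUOTE_ALL does that. The sketch below assumes the Tkinter self.delName lookup has already produced a plain string (passed in as name) and uses io.StringIO instead of real files; copy_without_user is a hypothetical name.

```python
import csv
import io

def copy_without_user(infile, outfile, name):
    """Copy CSV rows whose first column is not exactly name, quoting every field."""
    # QUOTE_ALL keeps the "..." around each value, matching userDatabase.csv;
    # the default QUOTE_MINIMAL only quotes fields that need it
    writer = csv.writer(outfile, quoting=csv.QUOTE_ALL)
    for row in csv.reader(infile):
        if row[0] != name:
            writer.writerow(row)

# In-memory stand-ins for userDatabase.csv and newFile.csv
src = io.StringIO('"lukefinney","0000000","0000000","a"\n'
                  '"nictaylor","0000000","0000000","a"\n')
dst = io.StringIO()
copy_without_user(src, dst, "lukefinney")
print(dst.getvalue())  # "nictaylor","0000000","0000000","a"
```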
One problem is that row_count = sum(1 for row in myFile) consumes all rows from myFile. Subsequent reads on myFile return an empty string, which signifies end of file. This means the later for row in myFile: loop in your code is never entered, because all rows have already been consumed.
A way around this is to add a call to myFile.seek(0) just before for row in myFile:. This will reset the file pointer and the for loop should then work.
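The effect of seek(0) can be seen in isolation with io.StringIO, which behaves like an open text file for this purpose; the data is the two rows from the question.

```python
import io

# io.StringIO behaves like an open text file here
my_file = io.StringIO('"lukefinney","0000000","0000000","a"\n'
                      '"nictaylor","0000000","0000000","a"\n')

row_count = sum(1 for row in my_file)  # reads every line; position is now at EOF
leftover = list(my_file)               # nothing left to read
my_file.seek(0)                        # rewind to the first line
rewound = list(my_file)

print(row_count, len(leftover), len(rewound))  # 2 0 2
```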
It's not very clear from your code what it is that you are trying to do, but it kind of looks like you want to filter out rows that contain a certain string. Try this:
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if self.delName.get("1.0","end-1c") not in row:
                newFile.write(row)
I have two files: a word list, say a.txt, and a CSV file, say b.csv, whose second column contains words. I want to check whether the word in the second column of b.csv appears in a.txt and print only the lines that don't match. The CSV file has 3 columns in total.
So far I've managed to print the lines whose word is in the word list, but I want exactly the other lines. Here is my code:
import csv
reader = csv.reader(open('b.csv', 'rb'))
op = open('a.txt', 'r')
ol = op.readlines()
for row in reader:
    for word in ol:
        if word==row[1]:
            print row[0],row[1],row[2]
Now, how do I make it print the lines that aren't matched?
Thanks!
The least intrusive solution (i.e. keeping your nested loop) would be something along the lines of:
for row in reader:
    match = False
    for word in ol:
        if word==row[1]:
            match = True
            break
    if not match:
        print row[0],row[1],row[2]
Or using some more Python goodness:
for row in reader:
    for word in ol:
        if word==row[1]:
            break
    else:
        print row[0],row[1],row[2]
The else: bit is only executed if the preceding loop ended normally (without ever reaching break).
As suggested by thg435, it's even simpler:
for row in reader:
    if row[1] not in ol:
        print row[0],row[1],row[2]
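The code in this thread is Python 2 (print statements, 'rb' mode for csv). Below is a Python 3 sketch of the same not in test. It also strips the words read from a.txt: readlines() keeps the trailing "\n" on every line, so 'apple\n' == 'apple' is False, and without stripping a row could be reported as unmatched even though its word is in the list. The function name unmatched_rows and the sample data are hypothetical.

```python
import csv
import io

def unmatched_rows(csv_file, word_file):
    """Yield the CSV rows whose second column is not one of the listed words."""
    # Each line read from the word file still ends with "\n", so strip it;
    # a set gives fast membership tests
    words = {line.strip() for line in word_file if line.strip()}
    for row in csv.reader(csv_file):
        if row[1] not in words:
            yield row

# In-memory stand-ins for a.txt and b.csv (hypothetical contents)
a_txt = io.StringIO("apple\nbanana\n")
b_csv = io.StringIO("1,apple,x\n2,cherry,y\n3,banana,z\n")
for row in unmatched_rows(b_csv, a_txt):
    print(row[0], row[1], row[2])  # 2 cherry y
```

With real files in Python 3, a.txt can be opened normally and b.csv with open('b.csv', newline=''), as the csv module documentation recommends.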
I have a question about the best way to get word counts for items in a list.
I have 400+ items indexed in a list. They are of varying lengths. For example, if I enumerate, then I will get:
for index, items in enumerate(my_list):
    print index, items
0 fish, line, catch, hook
1 boat, wave, reel, line, fish, bait
.
.
.
Each item will be written to its own row in a CSV file. I would like the corresponding word counts to appear in the adjacent column. I can find word/token counts just fine using Excel, but I would like to do this in Python so I don't have to keep going back and forth between programs to process my data.
I'm sure there are several ways to do this, but I can't seem to piece together a good solution. Any help would be appreciated.
As was posted in the comments, it's not really clear what your goal is here, but if it is to write a CSV file that has one word per row along with each word's length:
import csv
# newline='' stops the csv module from producing blank lines on Windows
with open(filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for word in mylist:
        writer.writerow([word, str(len(word))])
If I'm misunderstanding here and actually what you have is a list of strings in which each string contains a list of comma-separated words, what you'd want to do instead is:
import csv
with open(filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for line in mylist:
        for word in line.split(", "):
            writer.writerow([word, str(len(word))])
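The question can also be read as: keep each item in its own row and put the number of words in that item in the adjacent column, like an Excel word count. A sketch under that assumption, taking the items to be comma-separated strings like those shown in the question; write_item_word_counts and the column headers are hypothetical, and io.StringIO stands in for the output file.

```python
import csv
import io

def write_item_word_counts(items, outfile):
    """Write each item in one column and its number of words in the next."""
    writer = csv.writer(outfile)
    writer.writerow(['Item', 'Word count'])
    for item in items:
        # count the non-empty comma-separated pieces of the item
        words = [w for w in item.split(',') if w.strip()]
        writer.writerow([item, len(words)])

my_list = ["fish, line, catch, hook", "boat, wave, reel, line, fish, bait"]
out = io.StringIO()
write_item_word_counts(my_list, out)
print(out.getvalue())
```

Because each item contains commas, csv.writer wraps it in double quotes, so the item stays in a single column when the file is opened in Excel.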
If I understand correctly, you are looking for:
import csv
words = {}
for items in my_list:
    for item in items.split(', '):
        words.setdefault(item, 0)
        words[item] += 1
with open('output.csv', 'w', newline='') as fopen:
    writer = csv.writer(fopen)
    for word, count in words.items():
        writer.writerow([word, count])
This will write a CSV with unique words in one column and the number of occurrences of that word in the next column.
Is this what you were asking for?
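The dictionary-with-setdefault pattern above is what collections.Counter from the standard library does directly. An equivalent sketch, assuming the same comma-separated items; here each piece is split on ',' and stripped, which also tolerates missing spaces. count_words is a hypothetical name and io.StringIO stands in for output.csv.

```python
import csv
import io
from collections import Counter

def count_words(items):
    """Count how often each comma-separated word occurs across all items."""
    counts = Counter()
    for item in items:
        counts.update(word.strip() for word in item.split(','))
    return counts

my_list = ["fish, line, catch, hook", "boat, wave, reel, line, fish, bait"]
counts = count_words(my_list)
print(counts['fish'], counts['hook'])  # 2 1

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['Word', 'Count'])
writer.writerows(counts.most_common())  # most frequent words first
```

most_common() returns (word, count) pairs sorted by count, with ties kept in the order the words were first seen.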