How to check if a CSV column contains any of multiple words? - python

I found several similar questions online on how to find a word in a specific column of a CSV file; however, I didn't find any on checking for multiple words.
Here's the problem:
words = 'something'
for row in data:
    if words not in row['column header name']:
        writer.writerow(row)
The above code writes the rows to a CSV file, skipping any row whose 'column header name' value contains 'something'.
For example, each cell in that column contains a string such as 'something-asdas' or 'something-aaaa'. The check matches part of that string, so the row is skipped.
Goal:
words = ('something', 'word', 'dada')
for row in data:
    if words not in row['column header name']:
        writer.writerow(row)
When I try to do that, I get TypeError: 'in <string>' requires string as left operand, not tuple.
Any ideas on how to fix that problem? I tried iterating through the tuple, but then my CSV file ends up with duplicate rows.

You need the built-in any(), which checks whether any of the words in the tuple is present in the row's column:
words = ("something", "word", "dada")
for row in data:
    # skipping rows containing any of the "words"
    if not any(word in row["column header name"] for word in words):
        writer.writerow(row)
Of course, this assumes that row is a dictionary and you've used the DictReader.
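For reference, here is a minimal self-contained sketch of the same any() filter. The sample data, the column names id and label, and the in-memory io.StringIO buffers are assumptions made for illustration; they stand in for the real input and output files.

```python
import csv
import io

# Hypothetical sample data standing in for the asker's CSV file.
source = io.StringIO(
    "id,label\n"
    "1,something-asdas\n"
    "2,keep-me\n"
    "3,word-aaaa\n"
    "4,other\n"
)
words = ("something", "word", "dada")

reader = csv.DictReader(source)
output = io.StringIO()
writer = csv.DictWriter(output, fieldnames=reader.fieldnames, lineterminator="\n")
writer.writeheader()

kept = []
for row in reader:
    # Each row is tested once against all words, so it is written at most once.
    if not any(word in row["label"] for word in words):
        writer.writerow(row)
        kept.append(row["id"])

print(output.getvalue())
```

Because the loop runs over rows and any() runs over the words, a row is never written more than once, which avoids the duplicates that appear when the outer loop iterates over the tuple instead.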

Related

determine whether or not the second entry in each row converted to lower-case

I have been doing these tasks:
Write a script that reads in the data from the CSV file pastimes.csv located in the chapter 9 practice files folder, skipping over the header row
Display each row of data (except for the header row) as a list of strings
Add code to your script to determine whether or not the second entry in each row (the "Favorite Pastime") converted to lower-case includes the word "fighting" using the string methods find() and lower()
I have completed two of them, but I don't understand the third one. My English is not very good, and I can't work out what is being asked.
import csv
with open("pastimes.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    next(my_file_reader)
    for row in my_file_reader:
        print(row)
Output:
['Fezzik', 'Fighting']
['Westley', 'Winning']
['Inigo Montoya', 'Sword fighting']
['Buttercup', 'Complaining']
Headers which I skipped: Person, Favorite pastime
You need something like:
import csv
with open("pastimes.csv", "r") as my_file:
    my_file_reader = csv.reader(my_file)
    next(my_file_reader)
    for row in my_file_reader:
        print(row)
        if row[1].lower().find('fighting') >= 0:
            print('Second entry lowered contains "fighting"')
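To make the third task concrete, here is a small sketch of the check on its own, run against the four rows shown in the question. The helper name mentions_fighting is made up for illustration. str.find() returns the index of the first match, or -1 when the substring is absent, which is why the result is compared against -1.

```python
def mentions_fighting(pastime):
    """Return True if the lower-cased text contains 'fighting'."""
    # str.find() gives the index of the first match, or -1 if there is none.
    return pastime.lower().find("fighting") != -1


# The data rows from the question, with the header row already skipped.
rows = [
    ["Fezzik", "Fighting"],
    ["Westley", "Winning"],
    ["Inigo Montoya", "Sword fighting"],
    ["Buttercup", "Complaining"],
]

results = [(person, mentions_fighting(pastime)) for person, pastime in rows]
for person, found in results:
    print(person, found)
```

Lower-casing first is what lets "Fighting" and "Sword fighting" both match the lower-case search word.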

Iterate string.append over rows Python

I have a data set like this:
and the objective is to have each cell of the data frame in one list whereby each cell is a word in itself.
I have tried originally with:
string = []
for words in data:
    string.append(words)
but it gives me the wanted result for the first row only.
I have also tried iterating with iterrows, but that just creates pairs of values, which is not very useful.
How can I iterate the append function over all the rows and store the results in a single list?
I solved it with this:
string = []
for words in data.itertuples(index=False):
    string += words
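The reason this works can be sketched without pandas: += on a list extends it with every element of the right-hand iterable, whereas append would add the whole row as a single element. In the sketch below, plain tuples with invented values stand in for the rows that itertuples(index=False) would yield; itertools.chain.from_iterable is shown as an equivalent alternative.

```python
from itertools import chain

# Plain tuples standing in for the rows yielded by data.itertuples(index=False).
# The values are invented for illustration.
rows = [("fish", "line"), ("boat", "wave")]

flat = []
for words in rows:
    # list += tuple extends the list element by element.
    flat += words

# The same flattening in a single expression.
flat_chain = list(chain.from_iterable(rows))

print(flat)
```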

Python CSV output - additional formatting

To start...Python noob...
My first goal is to read the first row of a CSV and output. The following code does that nicely.
import csv
# 'rb' is the Python 2 idiom; Python 3 needs open('some.csv', newline='')
csvfile = open('some.csv', 'rb')
csvFileArray = []
for row in csv.reader(csvfile, delimiter=','):
    csvFileArray.append(row)
print(csvFileArray[0])
Output looks like...
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',......
My second and third tasks deal with formatting.
Thus, if I want the print(csvFileArray[0]) output to contain 'double quotes' for the delimiter how best can I handle that?
I'd like to see...
["Date","Time", "CPU001 User%", "CPU001 Sys%",......
I have played with formatting the csvFileArray field and all I can get it to do is to prefix or append data.
I have also looked into the 'dialect', 'quoting', etc., but am just all over the place.
My last task is to add text into each value (into the array).
Example:
["Test Date","New Time", "Red CPU001 User%", "Blue CPU001 Sys%",......
I've researched a number of methods to do this but am awash in the multiple ways.
Should I ditch the Array as this is too constraining?
Looking for direction not necessarily someone to write it for me.
Thanks.
OK.....refined the code a bit and am looking for direction, not direct solution (need to learn).
import csv
with open('ba200952fd69 - Copy.csv', newline='') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        print(row)
        break
The code nicely reads the first line of the CSV and outputs the first row as follows:
['Date', 'Time', 'CPU001 User%', 'CPU001 Sys%',....
If I want to add formatting to each/any item within that row, would I be performing those actions within the quotes of the print command? Example: If I wanted each item to have double-quotes, or have a prefix of 'XXXX', etc.
I have read through examples of .join type commands, etc., and am sure that there are much easier ways to format print output than I'm aware of.
Again, looking for direction, not immediate solutions.
For your first task, I'd recommend using the next function to grab the first row rather than iterating through the whole csv. Also, it might be useful to take a look at with blocks as they are the standard way of dealing with opening and closing files.
For your second question, it looks like you want to change the format of the printed output. Note that it is printing strings, which is indicated by the single quotes around each element in the array. This has nothing to do with the csv module; it is simply how Python prints a list of strings. To print with double quotes, you would have to build the output string yourself, for example with str.join or string formatting.
For your last question, I'd recommend looking at list comprehensions. E.g.,
["Test " + word for word in words].
If words = ["word1", "word2"], then this would return ["Test word1", "Test word2"].
Edit: If you want to add a different value to each value in the array, you could do something similar. Let prefixes be an array of prefixes you want to add to the word in words at the same index location. You could then use the list comprehension:
[prefix + " " + word for prefix, word in zip(prefixes, words)]
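Putting the three suggestions together, a minimal sketch might look like the following. The in-memory sample (including its invented data row) stands in for the real file, and the prefixes list is taken from the example in the question; both are assumptions for illustration.

```python
import csv
import io

# Hypothetical stand-in for the real CSV file; the data row is invented.
sample = io.StringIO(
    "Date,Time,CPU001 User%,CPU001 Sys%\n"
    "2020-01-01,00:00,1.0,2.0\n"
)

reader = csv.reader(sample)
header = next(reader)  # grab only the first row, no loop needed

# Build the display string by hand to get double quotes around each item.
quoted = "[" + ", ".join('"{}"'.format(name) for name in header) + "]"
print(quoted)

# Add a different prefix to each item, pairing them up by position.
prefixes = ["Test", "New", "Red", "Blue"]
renamed = [prefix + " " + name for prefix, name in zip(prefixes, header)]
print(renamed)
```

The list itself never contains quote characters; the quotes only appear when the list is turned into text, which is why the formatting happens at print time.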

Split method giving unexpected output

I read a line as below from a CSV file:
GsmUart: enabled 2015-08-13T16:57:14.558000 0.072651
Note, each of these entries in the line were delimited by '\t' when the CSV was written.
Problem: I want to extract the 0.072651.
I've tried:
print(str(line)) gives the entire line.
line.split('\t')[1].split('\t')[0] gives the timestamp in between.
line.split('\t')[1].split('\t')[1] gives IndexError: list index out of range.
The first line.split('\t') has already split on all the tabs, so there are none left for the second call to split on. The list returned by the second .split('\t') therefore has only one element, and indexing a non-existent second element raises the IndexError.
What you want is the last element from the first split to get 0.072651:
line.split('\t')[-1]
You could also use the csv module to read your file passing a tab as the delimiter:
import csv
with open("your_file", newline="") as f:
    r = csv.reader(f, delimiter="\t")
    # each row has three tab-separated fields; c is the last one
    for a, b, c in r:
        print(c)
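Both approaches can be tried on the sample line directly. The sketch below assumes the line has exactly three tab-separated fields, with "GsmUart: enabled" being a single field that contains a space; an io.StringIO buffer stands in for the file.

```python
import csv
import io

# Assumed layout: three fields separated by tabs.
line = "GsmUart: enabled\t2015-08-13T16:57:14.558000\t0.072651"

# One split is enough; the last element is the value of interest.
fields = line.split("\t")
last = fields[-1]
print(last)

# The same extraction with the csv module and a tab delimiter.
reader = csv.reader(io.StringIO(line + "\n"), delimiter="\t")
values = [float(value) for _name, _stamp, value in reader]
print(values)
```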

Python Extract Word/Token counts from items in a list?

I have a question about the best way to get word counts for items in a list.
I have 400+ items indexed in a list. They are of varying lengths. For example, if I enumerate, then I will get:
for index, items in enumerate(my_list):
    print(index, items)
0 fish, line, catch, hook
1 boat, wave, reel, line, fish, bait
.
.
.
Each item will get written into individual rows in an csv file. I would like corresponding word counts to complement this text in the adjacent column. I can find word/token counts just fine using Excel, but I would like to be able to do this in Python so I don't have to keep going back and forth between programs to process my data.
I'm sure there are several ways to do this, but I can't seem to piece together a good solution. Any help would be appreciated.
As was posted in the comments, it's not really clear what your goal is here, but if it is to write a CSV file that has one word per row along with each word's length:
import csv
# filename is a placeholder for your output path
with open(filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for word in my_list:
        writer.writerow([word, str(len(word))])
If I'm misunderstanding here and what you actually have is a list of strings in which each string contains comma-separated words, what you'd want to do instead is:
import csv
# filename is a placeholder for your output path
with open(filename, 'w', newline='') as outfile:
    writer = csv.writer(outfile)
    writer.writerow(['Word', 'Length'])
    for line in my_list:
        for word in line.split(", "):
            writer.writerow([word, str(len(word))])
If I understand correctly, you are looking for:
import csv
words = {}
for items in my_list:
    for item in items.split(', '):
        words.setdefault(item, 0)
        words[item] += 1
with open('output.csv', 'w', newline='') as fopen:
    writer = csv.writer(fopen)
    for word, count in words.items():
        writer.writerow([word, count])
This will write a CSV with unique words in one column and the number of occurrences of that word in the next column.
Is this what you were asking for?
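Another possible reading of the question is that each item should be written to its own row with the number of words it contains in the adjacent column. The sketch below assumes that reading and assumes the words are separated by ", " as in the sample; it uses the two items shown in the question and writes to an in-memory buffer instead of a file. collections.Counter is included as a shorter equivalent of the setdefault loop above.

```python
import csv
import io
from collections import Counter

# The two sample items from the question.
my_list = [
    "fish, line, catch, hook",
    "boat, wave, reel, line, fish, bait",
]

# Number of comma-separated words in each item.
per_item = [(item, len(item.split(", "))) for item in my_list]

# Occurrences of each word across all items.
totals = Counter(word for item in my_list for word in item.split(", "))

# Write the text and its word count side by side.
output = io.StringIO()
writer = csv.writer(output, lineterminator="\n")
writer.writerow(["Text", "Word count"])
writer.writerows(per_item)
print(output.getvalue())
```

Since each item contains commas, the csv writer quotes the text field automatically so that it stays in a single column.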
