I have a csv file with fields
joburl, jobtitle, totalviews
I am trying to retrieve values for each jobtitle separately. So far, I read the jobtitles of all joburls and place the unique jobtitles in a separate text file. Now I want to read through the text file, match each jobtitle in it against the jobtitle column in the csv file, and return the joburl and view values for that particular jobtitle.
The code is:
with open('Pagepath_detail.csv', 'rt') as f:
    with open('individual_jobtitle.txt') as title:
        for t in title:
            job = [row for row in csv.reader(f) if row[1] == t]
            print job
where pagepath_detail is the csv file I'm trying to extract the values from.
The code returns an empty list. But at the same time, if I try the direct approach, such as:
with open('Pagepath_detail.csv', 'rt') as f:
    job = [row for row in csv.reader(f) if row[1] == 'job1']
    print job
The above code works perfectly.
What am I doing wrong?
for t in title iterates through the lines of the file, but each line comes back with a newline character (\n) at the end. Assuming that the file was created by using print for each jobtitle, what you need to do is trim off the newline returned as part of each t:
with open('Pagepath_detail.csv', 'rt') as f:
    with open('individual_jobtitle.txt') as title:
        for t in title:
            t = t.rstrip()  # this line will convert 'job1\n' to 'job1'
            job = [row for row in csv.reader(f) if row[1] == t]
            print job
Note that the last line will sometimes be blank as well, but that only matters if one or more of your names are blank, too. Additionally, if there is no blank final line, then the last (non-blank) line will usually not have a newline at the end. That's OK, because rstrip will just quietly return the line intact.
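To see the effect, here is a small, self-contained sketch (the jobtitle values are made up):

```python
# Lines read from a text file keep their trailing newline,
# so a bare comparison against the jobtitle fails.
lines = ["job1\n", "job2\n", "job3"]  # hypothetical file contents; last line has no newline

print("job1" == lines[0])           # False -- 'job1' != 'job1\n'
print("job1" == lines[0].rstrip())  # True

# rstrip() leaves a line that has no trailing newline intact:
print(lines[2].rstrip())            # job3
```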
If you look at your code closely, there's a small issue with the looping: the placement of the two loops is interchanged. The csv file is opened once, outside the loop, so its reader is exhausted after the first title. Try the code below and it should work well for you.
with open('out.txt', 'r') as title:
    for t in title:
        with open('data.csv', 'r') as iFile:
            job = [row for row in csv.reader(iFile) if row[1].strip() == t.strip()]
            print job
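The reason the loop order matters: a file object is an iterator, so after csv.reader has consumed it for the first title, every later pass sees an exhausted file. A minimal sketch of that behaviour, using an in-memory file in place of the real csv:

```python
import csv
import io

data = "url1,job1,10\nurl2,job2,20\n"
f = io.StringIO(data)  # stands in for the open csv file

first_pass = [row for row in csv.reader(f)]
second_pass = [row for row in csv.reader(f)]  # same handle, already at end-of-file

print(len(first_pass))   # 2
print(len(second_pass))  # 0 -- this is why the original code printed an empty list
```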
I'm trying to write to a CSV file with output that looks like this:
14897,40.50891,-81.03926,168.19999
but the CSV writer keeps writing the output with quotes at beginning and end
'14897,40.50891,-81.03926,168.19999'
When I print the line normally, the output is correct, but I need to do line.split() or else the csv writer writes the output as 1,4,8,9,7, etc...
But when I do line.split() the output is then
['14897,40.50891,-81.03926,168.19999']
Which is written as '14897,40.50891,-81.03926,168.19999'
How do I make the quotes go away? I already tried csv.QUOTE_NONE but doesn't work.
with open(results_csv, 'wb') as out_file:
    writer = csv.writer(out_file, delimiter=',')
    writer.writerow(["time", "lat", "lon", "alt"])
    for f in file_directory:
        for line in open(f):
            print line
            line = line.split()
            writer.writerow(line)
With line.split(), you're not splitting on commas but on whitespace (spaces, linefeeds, tabs). Since there is none, you end up with only one item per row.
Since this one item contains commas, the csv module has to quote it to distinguish them from the actual separator (which is also a comma). You would need line.strip().split(",") for it to work, but...
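A short sketch of the difference, using the record from the question:

```python
import csv
import io

line = "14897,40.50891,-81.03926,168.19999\n"

# split() with no argument splits on whitespace, so the whole
# record stays together as a single item:
print(line.split())             # ['14897,40.50891,-81.03926,168.19999']

# splitting on the comma gives the four separate fields:
print(line.strip().split(","))  # ['14897', '40.50891', '-81.03926', '168.19999']

# writing the one-item list forces the writer to quote it:
buf = io.StringIO()
csv.writer(buf).writerow(line.split())
print(buf.getvalue())           # "14897,40.50891,-81.03926,168.19999"
```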
using csv to read your data would be a better idea to fix this:
replace that:
for line in open(some_file):
    print line
    line = line.split()
    writer.writerow(line)
by:
with open(some_file) as f:
    cr = csv.reader(f)  # default separator is comma already
    writer.writerows(cr)
You don't need to read the file manually. You can simply use csv reader.
Replace the inner for loop with:
# "with" ensures that the file handle is closed after the block executes
with open(some_file) as file:
    rows = csv.reader(file)  # read rows
    writer.writerows(rows)   # write multiple rows at once
Sorry, very much a beginner with Python and could really use some help.
I have a large CSV file, items separated by commas, that I'm trying to go through with Python. Here is an example of a line in the CSV.
123123,JOHN SMITH,SMITH FARMS,A,N,N,12345 123 AVE,CITY,NE,68355,US,12345 123 AVE,CITY,NE,68355,US,(123) 555-5555,(321) 555-5555,JSMITH#HOTMAIL.COM,15-JUL-16,11111,2013,22-DEC-93,NE,2,1
I'd like my code to scan each line and look at only the 9th item (the state). For every line whose 9th item matches my query, I'd like that entire line to be written to a CSV.
The problem I have is that my code will find every occurrence of my query throughout the entire line, instead of just the 9th item. For example, if I scan looking for "NE", it will write the above line in my CSV, but also one that contains the string "NEARY ROAD."
Sorry if my terminology is off, again, I'm a beginner. Any help would be greatly appreciated.
I've listed my coding below:
import csv
with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for line in f:
        if "NE" in line:
            print ('Found: []'.format(line))
            writer.writerow([line])
You're not actually using your reader to read the input CSV, you're just reading the raw lines from the file itself.
A fixed version looks like the following (untested):
import csv
with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        if row[8] == 'NE':
            print ('Found: {}'.format(row))
            writer.writerow(row)
The changes are as follows:
Instead of iterating over the input file's lines, we iterate over the rows parsed by the reader (each of which is a list of each of the values in the row).
We check to see if the 9th item in the row (i.e. row[8]) is equal to "NE".
If so, we output that row to the output file by passing it in, as-is, to the writer's writerow method.
I also fixed a typo in your print statement - the format method uses braces (not square brackets) to mark replacement locations.
This snippet should solve your problem:
import csv
with open('Sample.csv', 'rb') as f, open('NE_Sample.csv', 'wb') as outf:
    reader = csv.reader(f, delimiter=',')
    writer = csv.writer(outf)
    for row in reader:
        if "NE" in row:
            print ('Found: {}'.format(row))
            writer.writerow(row)
if "NE" in line in your code tries to find out whether "NE" is a substring of the string line, which does not work as intended. The lines are the raw lines of your input file.
If you use if "NE" in row: instead, where row is a parsed line of your input file, you are doing exact element matching.
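The difference in a nutshell (shortened, made-up row for illustration):

```python
line = '123123,JOHN SMITH,12345 NEARY ROAD,NE\n'          # raw text of a line
row = ['123123', 'JOHN SMITH', '12345 NEARY ROAD', 'NE']  # parsed by csv.reader

print("NE" in line)                  # True -- substring search over raw text
print("NE" in row)                   # True -- an element equals "NE" exactly
print("NE" in ['12345 NEARY ROAD'])  # False -- substring matches don't count
```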
This is my current code. The current issue I have is that search returns nothing. How do I get a string value into this variable?
count = 0
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        row_count = sum(1 for row in myFile)
        print("aba")
        for x in range(row_count):
            print("aaa")
            for row in myFile:
                search = row[count].readline
                print(search)
                if self.delName.get("1.0","end-1c") in search:
                    count = count + 1
                else:
                    newFile.write(row[count])
                    count = count + 1
The output is:
aba
aaa
aaa
So it runs through it twice, which is good as my userDatabase consists of two rows of data.
The file in question has this data:
"lukefinney","0000000","0000000","a"
"nictaylor","0000000","0000000","a"
You cannot just iterate over an open file more than once without rewinding the file object back to the start.
You'll need to add a file.seek(0) call to put the file reader back to the beginning each time you want to start reading from the first row again:
myFile.seek(0)
for row in myFile:
The rest of your code makes little sense; when iterating over a file you get individual lines from the file, so each row is a string object. Indexing into a string gives you a new string containing just one character; 'foo'[1] is the character 'o', for example.
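A quick illustration of both points, with made-up data standing in for the file:

```python
# Iterating over a file yields strings; this list stands in for the file's lines.
lines = ['"lukefinney","0000000","0000000","a"\n']
row = lines[0]

print(type(row).__name__)  # str -- each "row" is just a line of text
print('foo'[1])            # o  -- indexing a string gives one character
print(row[0])              # "  -- so row[count] is a single character, not a field
```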
If you wanted to copy across rows that don't match a string, you don't need to know the row count up front at all. You are not handling a list of rows here, you can look at each row individually instead:
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if filter_string not in row:
                newFile.write(row)
This does a sub-string match. If you need to match whole columns, use the csv module to give you individual columns to match against. The module handles the quotes around column values:
import csv
filter_string = self.delName.get("1.0","end-1c")
with open("userDatabase.csv", "r", newline='') as myFile:
    with open("newFile.csv", "w", newline='') as newFile:
        writer = csv.writer(newFile)
        for row in csv.reader(myFile):
            # row is now a list of strings, like ['lukefinney', '0000000', '0000000', 'a']
            if filter_string != row[0]:  # test against the first column
                # copied across if the first column does not match exactly
                writer.writerow(row)
One problem is that row_count = sum(1 for row in myFile) consumes all rows from myFile. Subsequent reads on myFile will return an empty string which signifies end of file. This means that for loop later in your code where you execute for row in myFile: is not entered because all rows have already been consumed.
A way around this is to add a call to myFile.seek(0) just before for row in myFile:. This will reset the file pointer and the for loop should then work.
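A minimal sketch of the exhaustion and the rewind, using an in-memory file in place of the real one:

```python
import io

my_file = io.StringIO("line1\nline2\n")  # stands in for the open csv file

row_count = sum(1 for row in my_file)    # consumes every line
exhausted = [row for row in my_file]     # nothing left to read
my_file.seek(0)                          # rewind to the start
rewound = [row for row in my_file]

print(row_count)   # 2
print(exhausted)   # []
print(rewound)     # ['line1\n', 'line2\n']
```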
It's not very clear from your code what it is that you are trying to do, but it kind of looks like you want to filter out rows that contain a certain string. Try this:
with open("userDatabase.csv","r") as myFile:
    with open("newFile.csv","w") as newFile:
        for row in myFile:
            if self.delName.get("1.0","end-1c") not in row:
                newFile.write(row)
I want to create a csv from an existing csv, by splitting its rows.
Input csv:
A,R,T,11,12,13,14,15,21,22,23,24,25
Output csv:
A,R,T,11,12,13,14,15
A,R,T,21,22,23,24,25
So far my code looks like:
def update_csv(name):
    # load csv file
    file_ = open(name, 'rb')
    # init first values
    current_a = ""
    current_r = ""
    current_first_time = ""
    file_content = csv.reader(file_)
    # LOOP
    for row in file_content:
        current_a = row[0]
        current_r = row[1]
        current_first_time = row[2]
        i = 2
        # Write row to new csv
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            writer.writerow((current_a,
                             current_r,
                             current_first_time,
                             ",".join(row[x] for x in range(i + 1, i + 5))
                             ))
        # do only one row, for debug purposes
        return
But the row contains double quotes that I can't get rid of:
A002,R051,02-00-00,"05-21-11,00:00:00,REGULAR,003169391"
I've tried to use writer = csv.writer(f,quoting=csv.QUOTE_NONE) and got a _csv.Error: need to escape, but no escapechar set.
What is the correct approach to delete those quotes?
I think you could simplify the logic to split each row into two using something along these lines:
def update_csv(name):
    with open(name, 'rb') as file_:
        with open("updated_" + name, 'wb') as f:
            writer = csv.writer(f)
            # read one row from the input csv
            for row in csv.reader(file_):
                # write 2 rows to the new csv
                writer.writerow(row[:8])
                writer.writerow(row[:3] + row[8:])
writer.writerow expects an iterable and writes each item within it as one field, separated by the appropriate delimiter, into the file. So:
writer.writerow([1, 2, 3])
would write "1,2,3\n" to the file.
Your call provides it with an iterable, one of whose items is a string that already contains the delimiter. It therefore needs some way to either escape the delimiter or a way to quote out that item. For example,
writer.writerow([1, '2,3'])
doesn't just give "1,2,3\n", but rather '1,"2,3"\n' - the string counts as one item in the output.
Therefore if you want to not have quotes in the output, you need to provide an escape character (e.g. '/') to mark the delimiters that shouldn't be counted as such (giving something like "1,2/,3\n").
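For completeness, a sketch of what QUOTE_NONE plus an escapechar actually produces (note the result is no longer standard csv):

```python
import csv
import io

buf = io.StringIO()
writer = csv.writer(buf, quoting=csv.QUOTE_NONE, escapechar='/')
writer.writerow([1, '2,3'])
print(buf.getvalue())  # 1,2/,3 -- the embedded comma is escaped, not quoted
```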
However, I think what you actually want is to include all of those elements as separate items. Don't ",".join(...) them yourself; try:
writer.writerow([current_a, current_r,
                 current_first_time] + row[i+1:i+5])
to provide the relevant items from row as separate items in the output row.
I am trying to remove a row from a csv file if the 2nd column matches a string. My csv file has the following information:
Name
15 Dog
I want the row with "Name" in it removed. The code I am using is:
import csv
reader = csv.reader(open("info.csv", "rb"), delimiter=',')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
But the "Name" row isn't removed. What am I doing wrong?
EDIT: I was using the wrong delimiter. Changing it to \t worked. Below is the code that works now.
import csv
reader = csv.reader(open("info.csv", "rb"), delimiter='\t')
f = csv.writer(open("final.csv", "wb"))
for line in reader:
    if "Name" not in line:
        f.writerow(line)
        print line
It seems that you are specifying the wrong delimiter (a comma) in csv.reader.
Each line yielded by reader is a list, split on your delimiter, which you specified as a comma. Are you sure that is the delimiter you want? Your sample is delimited by tabs.
Anyway, you want to check if 'Name' is in any element of a given line. So this will still work, regardless of whether your delimiter is correct:
for line in reader:
    if any('Name' in x for x in line):
        # write operation
Notice the difference: this version checks for 'Name' in each list element, while yours checks whether 'Name' is in the list itself. They are semantically different, because 'Name' in ['blah blah Name'] is False.
I would recommend first fixing the delimiter error. If you still have issues, use if any(...) as it is possible that the exact token 'Name' is not in your list, but something that contains 'Name' is.
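To make the distinction concrete (made-up row, as it would look after a tab split):

```python
line = ['15', 'Dog Name Farm']  # a parsed row whose element merely contains 'Name'

print('Name' in line)                  # False -- no element equals 'Name' exactly
print(any('Name' in x for x in line))  # True  -- substring match per element
```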