Python/Selenium Text File Output Loop - python

I'm having trouble with a loop writing to a text file. I'm trying to create a tab delimited text file that writes an ID, date, time, sequence number, and text from a transcript to a line, then starts a new line every time it reaches bold text.
When there is only 1 ID in my company_list, everything works great and it produces this example below:
However, as soon as I add an additional ID to the company_list, it produces this:
It looks like, once a second company ID is added, a new line is inserted after every ID for some unknown reason. Even weirder, when the loop reaches the last company ID in the list, that data is formatted correctly. No errors are produced at all. If anyone has any idea what is going on here, I would really appreciate it.
Code snippet below:
company_list = open('Company_List.txt')
for line in company_list:
    company_id = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.XPATH, "//*[@id='SearchTopBar']")))
    company_id.send_keys(line + Keys.ENTER)
    driver.implicitly_wait(10)
    driver.find_element_by_link_text("Transcripts").click()
    driver.implicitly_wait(10)
    driver.find_element_by_partial_link_text("Q1 2018").click()
    date = driver.find_element_by_xpath('//*[@id="ctl01__header__dateLabel"]').text
    struct_time = time.strptime(date, "%A, %B %d, %Y %I:%M %p")
    speaker = 1
    p_tag = driver.find_elements_by_tag_name('p')
    file = open("q1_2018.txt", "a", encoding='utf-8-sig')
    for i in range(3,len(p_tag) - 6):
        element = driver.find_element_by_xpath('//*[@id="ctl01__bodyRow"]/td/p[' + str(i) + ']')
        weight = int(element.value_of_css_property('font-weight'))
        if weight == 700:
            file.write('\n' + line + '\t' + str(struct_time[0]) + str(struct_time[1]) + str(struct_time[2]) + '\t' + str(struct_time[3]) + str(struct_time[4]) + '\t' + str(speaker) + '\t')
            file.write(driver.find_element_by_xpath('//*[@id="ctl01__bodyRow"]/td/p[' + str(i + 1) + ']').text + ' ')
            speaker = speaker + 1
        else:
            file.write(driver.find_element_by_xpath('//*[@id="ctl01__bodyRow"]/td/p[' + str(i) + ']').text + ' ')
    file.close()
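A likely cause, assuming Company_List.txt holds one ID per line: iterating over a file object yields each line with its trailing newline still attached. Every ID except possibly the last one (which has no newline if the file does not end with one) would then carry a '\n' into file.write, which would match the symptom described. A minimal sketch, where io.StringIO stands in for the real file and the IDs are made up:

```python
import io

# Stand-in for open('Company_List.txt'); the IDs are made-up examples.
company_list = io.StringIO("1001\n1002\n1003")

# Iterating over a file keeps the line terminator on each line.
raw_ids = [line for line in company_list]
print(raw_ids)

# Stripping whitespace removes the terminator before the ID is used.
company_list.seek(0)
clean_ids = [line.strip() for line in company_list]
print(clean_ids)
```

If this is indeed the cause, stripping line once at the top of the loop and using the stripped value in both send_keys and file.write should be enough.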

Related

Is this the right approach to time code speed?

I want to measure the speed of the primary function of my code, and I wrote the following code:
with open(filename + '.speed.csv', 'w') as f:
    f.write('result\ttime\ttext\n')
    for index, text in enumerate(textlines):
        #print(index)
        # 1. Start to time
        start = time.time()
        doc = bnlu.nlps(text, enable = enable)
        tags = defaultdict(set)
        if 'tag' in doc:
            for tag_id, tag_info_list in doc['wb_tag'].items():
                for tag_info in tag_info_list:
                    id = tag_id
                    name = tag_info[1]
                    weight = tag_info[2]
                    pattern_str = tag_info[4]
                    tag_str = '(' + id + ' ' + name + ' ' + pattern_str + ' ' + str(weight) + ')'
                    tags[id].add(tag_str)
        # 2. End to time
        end = time.time()
        time_1 = str(end-start)
        f.write(str(tags) + '\t' + time_1 + '\t' + text + '\n')
        # End to time 2
        end2 = time.time()
        time2 = end2-start
        print(time2)
The time-consuming part is this line:
doc = bnlu.nlps(text, enable = enable)
So time_1 measures this part, and time2 measures the same work plus writing the result of each line to the file, which shows whether the write adds much time compared with time_1.
Is this the right approach to measure code speed? In the resulting CSV file, the second column contains the time spent on each text line processed by the algorithm.
Also, time_1 and time2 are in milliseconds, right? In my output, these numbers are very small, as shown below:
0.002065896987915039
0.002288341522216797
0.0019719600677490234
0.002459287643432617
0.0019350051879882812
0.002561807632446289
0.0022737979888916016
0.0026137828826904297
0.0020627975463867188
0.01592111587524414
0.001967191696166992
0.009980916976928711
0.007891178131103516
0.0022401809692382812
0.0035669803619384766
0.0030107498168945312
0.002779722213745117
0.0027618408203125
0.0019371509552001953
0.0025129318237304688
0.0023632049560546875
0.0022687911987304688
Does that mean the last one took only 0.00226 milliseconds? That looks suspiciously fast. Or is the unit seconds?
Use timeit (https://docs.python.org/3/library/timeit.html), for example:
import timeit
def foo():
    return "hello"
timeit.timeit(foo, number=10)
Here number is how many times foo is executed; the return value is the total time for all executions, in seconds.
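On the units question: time.time() returns seconds as a float, so a value such as 0.00226 is about 2.26 milliseconds. A small sketch of timing a single call with time.perf_counter(), which is meant for measuring short intervals. The names work and time_call are made up, and work is only a stand-in for the expensive call:

```python
import time

def work(text):
    # Hypothetical stand-in for the expensive call being measured.
    return text.upper()

def time_call(func, *args):
    """Return (result, elapsed_seconds) for a single call of func."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start
    return result, elapsed

result, seconds = time_call(work, "some text")
# Multiply by 1000 to report milliseconds instead of seconds.
print(f"{seconds * 1000:.3f} ms")
```

Timing the file write separately, in the same way, would show directly how much it adds per line.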

Python replace str in list with new value

I'm writing a program that turns music albums into files that you can search for, and for that I need a string in the file whose value is only known after the list is complete. Can you go back into the list and replace a blank string with a new value?
I searched online and found something called words.replace, but it doesn't work; I get an AttributeError.
def create_album():
    global idnumber, current_information
    file_information = []
    if current_information[0] != 'N/A':
        save()
    file_information.append(idnumber)
    idnumber += 1
    print('Type c at any point to abort creation')
    for i in creation_list:
        value = input('\t' + i)
        if value.upper == 'C':
            menu()
        else:
            file_information.append('')  # -1
            file_information.append(value)
    file_information.append('Album created - ' + file_information[2] +'\nSongs:')
    file_information = [w.replace(file_information[1], str(file_information[0]) + '-' + file_information[2]) for w in file_information]  # -2
    current_information = file_information
    save_name = open(save_path + str(file_information[0]) + '-' + str(file_information[2]) + '.txt', 'w')
    for i in file_information:
        save_name.write(str(i) + '\n')
    current_files_ = open(information_file + 'files.txt', 'w')
    filenames.append(file_information[0])
    for i in filenames:
        current_files_.write(str(i) + '\n')
    id_file = open(information_file + 'albumid.txt', 'w')
    id_file.write(str(idnumber))
-1 is where I have set aside a blank row.
-2 is where I try to replace row 1 in the list with the values of row 0 and row 2.
The error message I receive is: 'int' object has no attribute 'replace'
Did you try wrapping the first argument in str() on the line marked -2?
file_information = [w.replace(str(file_information[1]), str(file_information[0]) + '-' + file_information[2]) for w in file_information]  # -2
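The AttributeError arises because the comprehension calls replace on every element of the list, including the int ID in row 0. Since only row 1 needs to change, a simpler route is to assign to that index directly. This is a sketch that assumes row 0 is the ID and row 2 the album name, as in the question; fill_placeholder and the sample data are made up:

```python
def fill_placeholder(rows):
    """Return a copy of rows with the blank row 1 set to '<row 0>-<row 2>'."""
    filled = list(rows)
    filled[1] = str(filled[0]) + '-' + str(filled[2])
    return filled

# Made-up sample data: an int ID, the blank placeholder, then an album name.
file_information = [7, '', 'Abbey Road']
file_information = fill_placeholder(file_information)
print(file_information)  # [7, '7-Abbey Road', 'Abbey Road']
```

Assigning by index also avoids a side effect of str.replace: replacing an empty string inserts the new value between every character of each element.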

Python Nested Loops - continue iterates first loop

I'm brand new to programming, but it's a very enjoyable challenge.
Here's a question which I suspect may be caused by a misunderstanding of Python loops.
System info: Using Notepad++ and IDLE Python 3.4.3 on Win 7 32-bit
My solution is to open one database, use it to look for the correct master entry in database 2, pull an index number (task_no), then write a third file identical to the first database, this time with the correct index number.
My problem is that it performs the 1st and 2nd loop correctly, then on the 2nd iteration of loop 1, it tries to perform a block in loop 2 while iterating through the rows of loop 1, not the task_rows of loop 2.
Footnote: Both files are quite large (several MB), so I'm not sure if storing them in memory is a good idea.
This was a relevant question that I found closest to this problem:
python nested loop using loops and files
Based on it, I tried moving the file opening inside the 1st loop, but the problem persists. Is it something to do with how I'm using the CSV reader?
I also have a sneaking suspicion that the root cause may lie in how I approached the problem, so I welcome suggestions for alternative ways to solve it.
Thanks in advance!
The gist:
for row in readerCurrentFile: #LOOP 1
    # iterates through readerCurrentFile to define search variables
    [...]
    for task_row in readerTaskHeader: #LOOP 2
        # searches each row iteratively through readerTaskHeader
        # Match compid
        #if no match, continue <<<- This is where it goes back to 1st loop
        [...]
        # Match task frequency
        #if no match, continue
        [...]
        # once both of the above matches check out, will grab data (task_no from task_row[0]
        task_no = ""
        task_no = task_row[0]
        if task_row:
            break
    [...]
    # writes PM code
    print("Successful write of PM schedule row")
    print(compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ": " + pmid + " " + task_no)
The entire code:
import csv
import re
#Writes schedule
csvNewPMSchedule = open('new_pm_schedule.csv', 'a', newline='')
writerNewPMSchedule = csv.writer(csvNewPMSchedule)
# Dictionaries of PM Frequency
def re_compile_dict(d,f):
    for k in d:
        d[k] = re.compile(d[k], flags=f)
dict_month = {60:'Quin',36:'Trien',24:'Bi-An',12:'Annual(?<!Bi-)(?<!Semi-)',6:'Semi-An',3:'Quart',2:'Bi-Month',1:'Month(?<!Bi-)'}
dict_week = {2:'Bi-Week',1:'Week(?<!Bi-)'}
dict_freq_names = {'60Months':'Quintennial','36Months':'Triennial','24Months':'Bi-Annual','12Months':'Annual','6Months':'Semi-Annual','3Months':'Quarterly','2Months':'Bi-Monthly','1Months':'Monthly','2Weeks':'Bi-Weekly','1Weeks':'Weekly'}
re_compile_dict(dict_month,re.IGNORECASE)
re_compile_dict(dict_week, re.IGNORECASE)
# Unique Task Counter
task_num = 0
total_lines = 0
#Error catcher
error_in_row = []
#Blank out all rows
pmid = 0
compid = 0
comp_desc = 0
pmfreqx = 0
pmfreq = 0
pmfreqtype = 0
# PM Schedule Draft (as provided by eMaint)
currentFile = open('pm_schedule.csv', encoding='windows-1252')
readerCurrentFile = csv.reader(currentFile)
# Loop 1
for row in readerCurrentFile:
    if row[0] == "pmid":
        continue
    #defines row items
    pmid = row[0]
    compid = row[1]
    comp_desc = row[2]
    #quantity of pm frequency
    pmfreqx_temp = row[3]
    #unit of pm frequency, choices are: Months, Weeks
    pmfreq = row[4]
    #pmfreqtype is currently only static not sure what other options we have
    pmfreqtype = row[5]
    #pmnextdate is the next scheduled due date from this one. we probably need logic later that closes out any past due date
    pmnextdate = row[6]
    # Task Number This is what we want to change
    # pass
    # We want to change this to task header's task_desc
    sched_task_desc = row[8]
    #last done date
    last_pm_date = row[9]
    #
    #determines frequency search criteria
    #
    try:
        pmfreqx = int(pmfreqx_temp)
    except (TypeError, ValueError):
        print("Invalid PM frequency data, Skipping row " + pmid)
        error_in_row.append(pmid)
        continue
    #
    #defines frequency search variable
    #
    freq_search_var = ""
    if pmfreq == "Weeks":
        freq_search_var = dict_week[pmfreqx]
    elif pmfreq == "Months":
        freq_search_var = dict_month[pmfreqx]
    if not freq_search_var:
        print("Error in assigning frequency" + compid + " " + str(pmfreqx) + " " + pmfreq)
        error_in_row.append(pmid)
        continue
    #defines Equipment ID Search Variable
    print(compid + " frequency found: " + str(pmfreqx) + " " + str(pmfreq))
    compid_search_var = re.compile(compid,re.IGNORECASE)
    #
    # Matching function - search taskHeader for data
    #
    #PM Task Header Reference
    taskHeader = open('taskheader.csv', encoding='windows-1252')
    readerTaskHeader = csv.reader(taskHeader)
    for task_row in readerTaskHeader:
        # task_row[0]: taskHeader pm number
        # task_row[1]: "taskHeader task_desc
        # task_row[2]: taskHeader_task_notes
        #
        # search for compid
        compid_match = ""
        compid_match = compid_search_var.search(task_row[1])
        if not compid_match:
            print(task_row[1] + " does not match ID for " + compid + ", trying next row.") #debug 2
            continue # <<< STOPS ITERATING RIGHT OVER HERE
        print("Found compid " + task_row[1]) # debug line
        #
        freq_match = ""
        freq_match = freq_search_var.search(task_row[1])
        if not freq_match:
            print(task_row[1] + " does not match freq for " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ", trying next row.") #debug line
            continue
        print("Frequency Match: " + compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)]) # freq debug line
        #
        task_no = ""
        print("Assigning Task Number to " + task_row[0])
        task_no = task_row[0]
        if task_row:
            break
    #
    #error check
    #
    if not task_no:
        print("ERROR IN SEARCH " + compid + " " + pmid)
        error_in_row.append(pmid)
        continue
    #
    # Writes Rows
    #
    writerNewPMSchedule.writerow([pmid,compid,comp_desc,pmfreqx,pmfreq,pmfreqtype,pmnextdate,task_no,sched_task_desc,last_pm_date])
    print("Successful write of PM schedule row")
    print(compid + " " + dict_freq_names[str(pmfreqx) + str(pmfreq)] + ": " + pmid + " " + task_no)
    print("==============")
# Error reporting lined out for now
# for row in error_in_row:
#     writerNewPMSchedule.writerow(["Error in row:",str(error_in_row[row])])
#     print("Error in row: " + str(error_in_row[row]))
print("Finished")
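One possible restructuring, offered as a sketch and not a diagnosis: a file of several MB usually fits in memory without trouble, so the task header could be read once into a list (for example with list(csv.reader(taskHeader))) instead of being reopened for every schedule row. Moving the search into a function that returns None when nothing matches also makes the no-match case explicit, so a task_no left over from an earlier row cannot be written by accident. The helper name find_task_no and the sample rows are made up for illustration:

```python
import re

def find_task_no(task_rows, compid_pattern, freq_pattern):
    """Return task_row[0] of the first row whose description matches both
    patterns, or None when no row matches."""
    for task_row in task_rows:
        desc = task_row[1]
        if not compid_pattern.search(desc):
            continue  # moves on to the next task_row, not the outer loop
        if not freq_pattern.search(desc):
            continue
        return task_row[0]
    return None

# Made-up rows in the layout used above: [task number, description, notes].
task_rows = [
    ["T001", "PUMP-01 Bi-Annual inspection", ""],
    ["T002", "PUMP-01 Annual inspection", ""],
    ["T003", "FAN-02 Monthly check", ""],
]

# A negative lookbehind has to sit before the word it guards.
annual = re.compile(r"(?<!Bi-)(?<!Semi-)Annual", re.IGNORECASE)
# re.escape assumes compid is a literal ID, not a regex pattern.
pump = re.compile(re.escape("PUMP-01"), re.IGNORECASE)

print(find_task_no(task_rows, pump, annual))  # T002
print(find_task_no(task_rows, re.compile("FAN-99"), annual))  # None
```

In the original dictionaries the lookbehinds come after the word (for example 'Annual(?<!Bi-)'), where they can never reject anything, because the text immediately before that point is always the end of the word itself. Placing them in front, as in the sketch, is what excludes "Bi-Annual".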

Twitter API 404 errors - Python (continued)

I'm trying to use GET users/lookup from the Twitter API in a Python script to identify screen names based on a list of user IDs. The script doesn't seem to be able to handle 404 errors: I assume the whole request with 100 user IDs is not found by Twitter, so the for loop doesn't even begin. How do I iterate over 100 user IDs at a time, while still respecting the rate limit, if one of the 100 IDs will cause the whole request to 404? Is there a way to handle this error while still getting the response for the other IDs in the same request? My experiments with "except ValueError" didn't seem to solve this...
I'd very much appreciate any advice or tips you can give!
while total_request_counter <= request_total:
    while list_counter_last <= len(list)+100: #while last group of 100 items is lower or equal than total number of items in list#:
        while current_request_counter < 180:
            response = twitter.users.lookup(include_entities="false",user_id=",".join(list[list_counter_first:list_counter_last])) #API query for 100 users#
            for item in list[list_counter_first:list_counter_last]: #parses twitter IDs in the list by groups of 100 to record results#
                try: #necessary for handling errors#
                    results = str(response)
                    results = results[results.index("u'screen_name': u'",results.index(item)) + 18:results.index("',",results.index("u'screen_name': u'",results.index(item)) + 18)]#looks for username section in the query output#
                    text_file = open("output_IDs.txt", "a") #opens current txt output / change path to desired output#
                    text_file.write(str(item) + "," + results + "\n") #adds twitter ID, info lookup result, and a line skip to the txt output#
                    text_file.close()
                    error_counter = error_counter + 1
                    print str(item) + " = " + str(results)
                except: #creates exception to handle errors#
                    text_file = open("output_IDs.txt", "a") #opens current txt output / change path to desired output#
                    text_file.write(str(item) + "," + "ERROR " + str(response.headers.get('h')) + "\n") #adds twitter ID, error code, and a line skip to the txt output#
                    text_file.close()
                    print str(item) + " = ERROR"
                    continue
            print "Succesfully processed " + str(error_counter) + " users of request number " + str(current_request_counter + 1) + ". " + str(179 - current_request_counter) + " requests left until break." #confirms latest batch processing#
            print ""
            list_counter_first = list_counter_first + 100 #updates list navigation to move on to next group of 100#
            list_counter_last = list_counter_last + 100
            error_counter = 0 # resets error counter for the next round#
            current_request_counter = current_request_counter + 1 #updates current request counter to stay below rate limit of 180#
        t = str(datetime.now())
        print ""
        print "Taking a break, back in 16 minutes. Approximate time left: " + str((len(list)/18000*15)-(15*total_request_counter)) + " minutes. Current timestamp: " + t
        print ""
        current_request_counter = 0
        total_request_counter = total_request_counter + 1
        time.sleep(960) #suspends activities for 16 minutes to respect rate limit#
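In the code above the request itself sits outside the try block, so an exception raised by the lookup call is never caught. One way to keep going, sketched here under assumptions, is to wrap the request and, when a batch fails, split it in half and retry each half until the failing IDs are isolated. The lookup argument is a hypothetical callable standing in for the real API call, fake_lookup exists only for the demonstration, and the code is written for Python 3:

```python
def chunks(items, size):
    """Yield consecutive slices of items with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

def lookup_with_bisect(ids, lookup):
    """Call lookup(ids); on failure, split the batch in half and retry.

    Returns (results, failed_ids). lookup is a hypothetical callable that
    returns a list of results or raises an exception for a bad batch.
    """
    try:
        return list(lookup(ids)), []
    except Exception:  # in real code, narrow this to the client's HTTP error
        if len(ids) == 1:
            return [], list(ids)
        mid = len(ids) // 2
        left_ok, left_bad = lookup_with_bisect(ids[:mid], lookup)
        right_ok, right_bad = lookup_with_bisect(ids[mid:], lookup)
        return left_ok + right_ok, left_bad + right_bad

# Fake lookup for demonstration only: fails if the batch contains "bad".
def fake_lookup(ids):
    if "bad" in ids:
        raise LookupError("404")
    return [i + "_name" for i in ids]

user_ids = ["1", "2", "bad", "4", "5"]
found, failed = [], []
for batch in chunks(user_ids, 2):  # the real batch size would be 100
    ok, bad = lookup_with_bisect(batch, fake_lookup)
    found.extend(ok)
    failed.extend(bad)
print(found, failed)
```

Every retry is an extra request, so the request counter used for the rate limit would need to count those as well.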

Add a column in the middle of a row from a .csv file with python

Hello, I have a Python script that changes a timestamp column in a .csv file from dot notation to "date time TSQL" notation.
One row looks like this before executing the code:
send,2007.10.04.10.11.11.669,Server,Data,Client,TYPE=STP,Length=329,Cnt=11
after executing the code it looks like this:
send,2007-10-04 10:11:11.669,Server,Data,Client,TYPE=STP,Length=329,Cnt=11
I want to insert the same time, in the new format, as an additional column after the first time column, so that it looks like this:
send,2007-10-04 10:11:11.669,2007-10-04 10:11:11.669,Server,Data,Client,TYPE=STP,Length=329,Cnt=11
Here is the Script:
import csv
cr = csv.reader(open("ActualTrace_01 - short2Times.csv", "rb"))
output = csv.writer(open("GermanygoalInputFormatActualTrace_01 - short.csv", "wb"))
for row in cr:
    dateTimeContentsSend = row[1].split(".")
    finishSend = dateTimeContentsSend[0] + "-" + dateTimeContentsSend[1] + "-" + dateTimeContentsSend[2] + " " + dateTimeContentsSend[3] + ":"
    finishSend+= dateTimeContentsSend[4] + ":" + dateTimeContentsSend[5] + "." + dateTimeContentsSend[6]
    row[1] = finishSend
    output.writerow(row)
None of the existing threads here were useful. If I simply write row[1] = finishSend + "," + finishSend,
both values end up in row[1], wrapped in quotes, like this:
send,"2007-10-04 10:11:11.669,2007-10-04 10:11:11.684",Server,Data,Client,TYPE=STP,Length=329,Cnt=11
Is this what you are after? Add it just after row[1] = finishSend:
row.insert(2, row[1])
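The quotes appear because the csv writer treats a value containing a comma as one field and quotes it; inserting a separate list element gives a separate column. A self-contained Python 3 sketch of the whole conversion, using in-memory stand-ins for the files (to_tsql is a made-up helper name):

```python
import csv
import io

def to_tsql(stamp):
    """Convert '2007.10.04.10.11.11.669' to '2007-10-04 10:11:11.669'."""
    year, month, day, hour, minute, second, millis = stamp.split(".")
    return f"{year}-{month}-{day} {hour}:{minute}:{second}.{millis}"

# In-memory stand-ins for the input and output files.
source = io.StringIO(
    "send,2007.10.04.10.11.11.669,Server,Data,Client,TYPE=STP,Length=329,Cnt=11\n"
)
target = io.StringIO()

writer = csv.writer(target, lineterminator="\n")
for row in csv.reader(source):
    row[1] = to_tsql(row[1])
    row.insert(2, row[1])  # duplicate the converted time as a new column
    writer.writerow(row)

print(target.getvalue())
```

With real files on Python 3, the csv documentation recommends opening them in text mode with newline='' in place of the "rb" and "wb" modes used in the Python 2 script.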
