Selenium in Jupyter Notebook, Different Outcome on Different Cells - python

I'm a student in Korea studying Python and web crawling.
I ran into something I can't explain. I'd like to know why it happens and how I can fix it.
It would be lovely if someone could help me.
Here is my situation:
This is my web crawling code. It contains some Korean words, but I don't think that matters.
from selenium import webdriver
zeropay_official = 'https://www.zeropay.or.kr/main.do?pgmId=PGM0081'
driver = webdriver.Chrome('./driver/chromedriver')
driver.get(zeropay_official)
driver.find_element_by_id('tryCode').click()
driver.find_element_by_id('tryCode').send_keys('서울특별시')
driver.find_element_by_id('skkCode').click()
driver.find_element_by_id('skkCode').send_keys('노원구')
driver.find_element_by_id('pobsAfstrName').send_keys('다마식당')
driver.find_element_by_xpath('//*[@id="form"]/div[2]/a').click()
test = driver.find_element_by_id('list_div')
test.text
Right below this Jupyter Notebook cell, I put the last line of the code,
test.text
in a cell of its own, to check what's happening.
However, the first cell's output is '' (an empty string), while the second cell's output is the string I wanted to get.
Why is this happening? And if I need to get the output string in the first cell, so that I can turn this code into a module my team can import, what should I do?
Please check the image if my explanation is unclear because of my poor English. (sob)

You can add some wait time.
import time
from selenium import webdriver
zeropay_official = 'https://www.zeropay.or.kr/main.do?pgmId=PGM0081'
driver = webdriver.Chrome('./driver/chromedriver')
driver.get(zeropay_official)
driver.find_element_by_id('tryCode').click()
driver.find_element_by_id('tryCode').send_keys('서울특별시')
driver.find_element_by_id('skkCode').click()
driver.find_element_by_id('skkCode').send_keys('노원구')
driver.find_element_by_id('pobsAfstrName').send_keys('다마식당')
driver.find_element_by_xpath('//*[@id="form"]/div[2]/a').click()
# time_in_seconds is a placeholder: a number of seconds you choose
time.sleep(time_in_seconds)
test = driver.find_element_by_id('list_div')
test.text
The wait is needed because the Korean result text takes some time to appear after the click.
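A fixed sleep can be too short on a slow connection and wastefully long on a fast one. As a minimal sketch, assuming the result list is filled in some time after the click, you could poll until the text is non-empty. The helper name wait_for_text and its default timings are made up for this illustration; Selenium also ships its own WebDriverWait class for the same purpose.

```python
import time


def wait_for_text(get_text, timeout=10.0, poll_interval=0.5):
    """Call get_text() repeatedly until it returns a non-empty string.

    get_text is any zero-argument callable. Raises TimeoutError if nothing
    non-empty comes back within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        text = get_text()
        if text:
            return text
        if time.monotonic() >= deadline:
            raise TimeoutError("no text appeared within %.1f s" % timeout)
        time.sleep(poll_interval)
```

In the notebook this would be called as wait_for_text(lambda: driver.find_element_by_id('list_div').text), so the first cell only continues once the element actually has text.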

Related

Python + Open AI/GPT3 question: Why is part of my prompt spilling into the responses I receive?

This happens to probably 10% of responses I get. For whatever reason, the last bits of my prompt somehow spill into it, at the start of it. Like there will be a period, or a question mark, or sometimes a few of the last letters from the prompt, that get removed from the prompt, and somehow find their way into BOTH the response that gets printed inside of the Visual Studio Code terminal, AND in the outputted version that gets written to a corresponding Excel spreadsheet.
Any reason why this might happen?
Some example responses:
.
Most apples are colored red.
Also
?
Most rocks are colored gray.
Another example:
for it.
Most oceans are colored blue.
The period, the question mark, " for it" somehow get transposed FROM the end of the prompt, and tacked onto the response. And they even get removed from the prompt that was originally in the Excel spreadsheet to begin with.
Could this be a bug with xlsxwriter? With OpenAI? Some combination of both?
Code here:
import xlsxwriter
import openpyxl
import os
import openai
filename = 'testing-openai-gpt3-requests-v1.xlsx'
wb = openpyxl.load_workbook(filename, read_only=False)
sheet = wb.active
# print("starting number of ideas is:")
# print(sheet.max_row)
for x in range(sheet.max_row):
    # Read the object name from column 1 of the current row
    c = sheet.cell(row = x+1, column = 1)
    # print(c.value)
    myCurrentText = c.value
    myCurrentPrompt = "What is the color of most of the following objects: " + myCurrentText
    openai.api_key = [none of your business]  # API key redacted by the asker
    response = openai.Completion.create(
        model = "text-davinci-003",
        prompt = myCurrentPrompt,
        max_tokens = 1000,
    )
    TheOutputtedSummary = response['choices'][0]['text']
    print(TheOutputtedSummary)
    # Write the completion to column 6 of the same row
    sheet.cell(row = x+1, column = 6).value = TheOutputtedSummary
wb.save(str(filename))
print('All finished!')
GPT-3 is a powerful language model capable of generating human-like text based on the input provided. However, it is important to ensure that the input text is clear and well-formatted in order to get the desired output.
One way to avoid issues with incomplete sentences is to ensure that your input text always ends with a full stop or other appropriate punctuation. This can help GPT-3 understand that the input text is complete, and prevent it from generating text that appears to be part of the input.
Here's how I solved the issue on a project I worked on. The project is a website featuring Amazon products, and it includes descriptions of the products based on reviews from purchasers.
It is important to be as clear as possible when generating a prompt for GPT-3, as this can help ensure that the model produces the desired output. Here's an example of a clear and well-formatted prompt:
"I will ask you a question and provide you with a list of objects.
Please tell me the colour of most of the following objects:
[dynamic text]
END OF THE LIST."
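As a minimal sketch of that idea, the prompt can be assembled by a small function that trims the cell value and always closes with the explicit end marker. The function name build_prompt is hypothetical, and this only illustrates the formatting advice above; it is not guaranteed to stop every stray fragment.

```python
def build_prompt(objects_text):
    """Wrap the dynamic text in a fixed instruction and an explicit end marker."""
    # Strip stray whitespace and newlines that spreadsheet cells often carry
    cleaned = objects_text.strip()
    return (
        "I will ask you a question and provide you with a list of objects.\n"
        "Please tell me the colour of most of the following objects:\n"
        f"{cleaned}\n"
        "END OF THE LIST."
    )
```

In the question's loop, the result of build_prompt(myCurrentText) would take the place of the concatenated string passed as prompt.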

How do you avoid errors due to an automatic grading script?

This is my second beginner Python course, so I'm learning! For most things I work in Python notebooks because I feel they offer more flexibility. The actual program is on Coursera. I recently transferred the following code from a Python notebook (where it worked fine) back to Coursera, and I'm getting syntax errors. It's not the first time. How do I avoid this? Looking for any advice.
"TabError: inconsistent use of tabs and spaces in indentation"
def email_list(domains):
    emails = []
    for provider, user in domains.items():
        for each_user in user:
            new_user=("{}@{} ".format(each_user,provider))
            emails.append(new_user)
    return(emails)
print(email_list({"gmail.com": ["clark.kent", "diana.prince", "peter.parker"], "yahoo.com": ["barbara.gordon", "jean.grey"], "hotmail.com": ["bruce.wayne"]}))
I have faced this problem in the past, and here is what I did. After pasting your code into Coursera, delete every space and tab used for indentation and retype them. For example, delete the whitespace after the def line until you have this line:
def email_list(domains):emails = []
Then press Return or Enter right after the colon. The editor will insert the correct indentation, and hopefully your code will run smoothly.
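If retyping by hand gets tedious, the same cleanup can be scripted before pasting. The following is a minimal sketch with hypothetical helper names: one function reports which lines are indented with tabs, the other replaces tabs in the leading whitespace with spaces. It assumes a tab width of 4, which may not match what your notebook used.

```python
def lines_with_tab_indent(source):
    """Return 1-based line numbers whose leading whitespace contains a tab."""
    flagged = []
    for number, line in enumerate(source.splitlines(), start=1):
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if "\t" in indent:
            flagged.append(number)
    return flagged


def tabs_to_spaces(source, tab_size=4):
    """Expand tabs in leading whitespace only; the rest of each line is untouched.

    Lines are re-joined with "\n", so a trailing newline is not preserved.
    """
    fixed = []
    for line in source.splitlines():
        stripped = line.lstrip(" \t")
        indent = line[: len(line) - len(stripped)]
        fixed.append(indent.expandtabs(tab_size) + stripped)
    return "\n".join(fixed)
```

Running lines_with_tab_indent on the code first shows whether tabs are the culprit at all; if it returns an empty list, the TabError comes from somewhere else.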

Entrez eFetch Accession Number

We are currently working on a project where we need to access the 'NP_' accession number from ClinVar. However, when we use the Entrez.eFetch( ) function, this information appears to be missing in the result. Here is a link to the website page where the NP_ number is listed:
https://www.ncbi.nlm.nih.gov/clinvar/variation/558834/
And here is the Python sample script code that fetches the XML result:
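from Bio import Entrez
Entrez.email = "you@example.com"  # placeholder; NCBI asks for a contact address
handle = Entrez.efetch(db="clinvar", id=558834, rettype='variation', retmode="text")
print(handle.read())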
Interestingly, this used to return the NP number in the results. However, it seems the website formatting/style has changed since we last developed our Python script, and we cannot figure out how to retrieve the NP number now.
Any help would be greatly appreciated! Thank you for your time and input!
You need to format the request as a new-style query, not an old-style one:
handle = Entrez.efetch(db="clinvar", id=558834, rettype='vcv', is_variationid="true", from_esearch="true")
print(handle.read())
See also: https://www.ncbi.nlm.nih.gov/clinvar/docs/maintenance_use/
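Once the response text is in hand, the accession still has to be picked out of it. A minimal sketch, assuming the NP_ number appears somewhere in the returned text in the usual RefSeq form (NP_, digits, optional version suffix): a regular expression can collect every such accession without depending on the exact XML layout. The function name is hypothetical and the sample strings in the usage note are invented, not real ClinVar output.

```python
import re

# RefSeq protein accessions: "NP_", digits, and an optional ".version" suffix
NP_PATTERN = re.compile(r"\bNP_\d+(?:\.\d+)?")


def find_np_accessions(text):
    """Return the distinct NP_ accessions in text, in order of first appearance."""
    seen = []
    for match in NP_PATTERN.findall(text):
        if match not in seen:
            seen.append(match)
    return seen
```

It would be called as find_np_accessions(handle.read()). If the list comes back empty, the accession is not in that response and a different rettype or record is needed.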

Basics of connecting python to the web and validating user input

I'm relatively new, and I'm just at a loss as to where to start. I don't expect detailed step-by-step responses (though, of course, those are more than welcome), but any nudges in the right direction would be greatly appreciated.
I want to use the Gutenberg python library to select a text based on a user's input.
Right now I have the code:
from gutenberg.acquire import load_etext
from gutenberg.cleanup import strip_headers
text = strip_headers(load_etext(11)).strip()
where the number represents the text (in this case 11 = Alice in Wonderland).
Then I have a bunch of code about what to do with the text, but I don't think that's relevant here. (If it is let me know and I can add it).
Basically, instead of hard-coding a text, I want to let the user choose it. I want to ask the user for an author and, if Project Gutenberg (PG) has pieces by that author, have them select from the list of book titles. If PG doesn't have anything by that author, the program should return a response along the lines of "sorry, don't have anything by $author_name, pick someone else." Once the user has decided on a book, the number corresponding to that book should be entered into the code.
I just have no idea where to start in this process. I know how to handle user input, but I don't know how to take that input and search for something online using it.
Ideally, I'd be able to handle things like spelling mistakes too, but that may be down the line.
I really appreciate any help anyone has the time to give. Thanks!
The gutenberg module includes facilities for searching for a text by metadata, such as author. The example from the docs is:
from gutenberg.query import get_etexts
from gutenberg.query import get_metadata
print(get_metadata('title', 2701)) # prints frozenset([u'Moby Dick; Or, The Whale'])
print(get_metadata('author', 2701)) # prints frozenset([u'Melville, Hermann'])
print(get_etexts('title', 'Moby Dick; Or, The Whale')) # prints frozenset([2701, ...])
print(get_etexts('author', 'Melville, Hermann')) # prints frozenset([2701, ...])
It sounds as if you already know how to read a value from the user into a variable, and replacing the literal author in the above would be as simple as doing something like:
author_name = my_get_input_from_user_function()
texts = get_etexts('author', author_name)
Note the following caveat from the same section of the docs:
Before you use one of the gutenberg.query functions you must populate the local metadata cache. This one-off process will take quite a while to complete (18 hours on my machine) but once it is done, any subsequent calls to get_etexts or get_metadata will be very fast. If you fail to populate the cache, the calls will raise an exception.
With that in mind, I haven't tried the code I've presented in this answer because I'm still waiting for my local cache to populate.
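To connect this to the menu described in the question, here is a minimal, untested-against-the-real-cache sketch. Both helper names are hypothetical. The gutenberg functions are passed in as arguments so the logic can be tried with stand-ins before the cache is populated, and the list of known author names for the spelling suggestion is something the caller has to supply; the answer above does not say whether the library can list authors.

```python
import difflib


def titles_for_author(author, get_etexts, get_metadata):
    """Map each text number by `author` to its sorted list of titles.

    get_etexts and get_metadata are expected to behave like the functions
    of the same name in gutenberg.query.
    """
    return {
        text_id: sorted(get_metadata("title", text_id))
        for text_id in sorted(get_etexts("author", author))
    }


def suggest_author(name, known_authors, cutoff=0.6):
    """Return up to three known author spellings closest to `name`."""
    return difflib.get_close_matches(name, known_authors, n=3, cutoff=cutoff)
```

An empty dictionary from titles_for_author is the cue to print the "sorry, don't have anything by ..." message, optionally followed by the suggestions from suggest_author. The chosen key is the number to pass to load_etext.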

Having trouble playing music using IPython

I have the lines of code
import IPython
IPython.display.Audio(url="http://www.1happybirthday.com/PlaySong/Anna",embed=True,autoplay=True)
And I'm not really sure what's wrong. I am using try.jupyter.org to run my code, and this is within if statements. The notebook is also taking in user inputs and printing outputs. It gives no error, but just doesn't show up/start playing. I'm not really sure what's wrong.
Any help would be appreciated. Thanks!
First, try it without the if statement, using just the two lines you mention above. That will still not work, because your URL points to an HTML page rather than a sound file. In your case the correct URL would be 'https://s3-us-west-2.amazonaws.com/1hbcf/Anna.mp3'.
The Audio object you are creating is only displayed automatically if it is the last expression in a notebook cell. See my Python intro for details. If you want to use it within an if clause, you can use IPython.display.display() like this:
import IPython.display
url = 'https://s3-us-west-2.amazonaws.com/1hbcf/Anna.mp3'
if 3 < 5:
    IPython.display.display(IPython.display.Audio(url=url, autoplay=True))
else:
    print('Hello!')
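A quick way to catch the first problem (a URL that is a web page, not a sound file) is to check the file extension before handing the URL to Audio. This is only a rough sketch with a hypothetical helper name: it guesses from the URL's extension using the standard mimetypes module and does not contact the server, so a URL without an extension is treated as "not audio" even if the server would in fact return sound.

```python
import mimetypes


def looks_like_audio(url):
    """Guess from the URL's file extension whether it points at an audio file."""
    guessed_type, _encoding = mimetypes.guess_type(url)
    return guessed_type is not None and guessed_type.startswith("audio/")
```

With the two URLs from this thread, the .mp3 address passes the check and the PlaySong page address does not.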
