finding HTTPS images with BeautifulSoup, python [closed] - python

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 4 years ago.
Improve this question
I'm iterating through all the img's in a request.POST to see if they are HTTPS (I'm using Beautiful Soup to help)
Here's my code:
content = request.POST['content']
print(content) #prints:
<p>test test test</p><br><p><img src="https://www.treefrogfarm.com/store/images/source/IFE_A-K/ClarySage2.jpg" alt=""></p><br><p>2nd 2nd</p><br><p><img src="https://www.treefrogfarm.com/store/images/source/IFE_A-K/ClarySage2.jpg" alt=""></p>
soup = BeautifulSoup(content, 'html.parser')
for image in soup.find_all('img'):
print('Source:', image.get('src')[:8]) #prints Source: https://
if image.get('src')[:7] == "https://":
print('HTTPS')
else:
print('Not HTTPS')
Even though image.get('src')[:7] == "https://", the code still prints Not HTTPS.
Any idea why?

Well for starters, 'https://' is 8 characters, so there's no way that a slice of 7 characters can match it.
Also, please make your question titles actually indicative of the problem you're having rather than unrelated accusations about the python operators.

to match the https:// string the appropriate slice would be :8 instead of :7
if image.get('src')[:8] == "https://":

Related

.remove function not working with an 'if' check in lists [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 3 years ago.
Improve this question
I'm fairly new to programming and python itself- I'm trying to use the .remove function to delete an item from a list if it exists in that list (as to not get the nameError).
q = ["cat","dog","fish","hamster","horse"]
#Request element name to delete from queue
removeElement = input("Please type in the element name to remove from the queue: ")
#Remove the given element from the list
q.remove(removeElement) if 'removeElement' in q else None
print(q)
Unfortunately, if I try and use the 'if' checker it the item isn't removed from my list- why is this and how can I fix this issue?
You have to use the variable's name not a string:
q.remove(removeElement) if removeElement in q else None

I have an unindent error while using geany with python [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 7 years ago.
Improve this question
I'm using geany and I get the following error
File "autoblog2.py", line 9
htmlfile = urllib.urlopen(url)
^
IndentationError: unindent does not match any outer indentation level
here is my code.
import urllib
import re
symbols_list = ["aapl","spy","goog","nflx"]
i = 0
while i<len(symbols_list):
url = 'https://uk.finance.yahoo.com/q?s='+symbols_list[i]
htmlfile = urllib.urlopen(url)
htmltext = htmlfile.read()
regex = '<span id="yfs_l84_aapl">(.+?)'+symbols_list[i]'</span>'
pattern = re.compile(regex)
price = re.findall(pattern,htmltext)
print 'the price of' +symbols_list[i]
i+=1
I don't get any errors when I run the same code on a single url. I've only had since trying it with a while loop, i'm using python 2
This can happen when you edit one script with two editors.
Your indent settings can differ from editor to editor.
Take a look at the script with another editor.
If the script has the same indents in other editors the only way is to remove all indents and add them again.
I would recommend the python-idle.
It should show the indents like the interpreter reads them.
Good Luck.

Displaying and iterating a list within a string? [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 5 months ago.
Improve this question
So I'm writing a program which part of it involves sending out an email.
Within the email body (which is a string), I want to display a list. However, I want to know how/or what the best way to do this because right now, it only displays the first element of the list (naturally). As you can see, the list I'm trying to put in at the bottom is:
formatted_times
Here's the code below:
FROM = gmail_user
TO = ['mcgoga12#wfu.edu']
SUBJECT = "StudyBug - Study Rooms for %s" % newdate
TEXT = """
This is an automated email from StudyBug.
We wanted to let you know that the following studyrooms were booked for %s:
%s
Thank You,
StudyBug
-----------------------------------------------------------------------------------------------
-----------------------------------------------------------------------------------------------
Check out the project page!: https://github.com/g12mcgov/StudyBug
""" % (newdate, formatted_times)
Just join the list on newlines (or commas, or whatever you like) as you format it:
"""…booked for %s:
%s
Thank You, …""" % (newdate, "\n".join(formatted_times))

unindent does not match any outer indentation level [duplicate_same] [closed]

Closed. This question is not reproducible or was caused by typos. It is not currently accepting answers.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Closed 8 years ago.
Improve this question
When i make error?
Don't see errors. Please, help me.
datafile = file('c:\\Users\\username\\Desktop\\test.txt')
for line in datafile:
if '5256' in line:
GLOBACCESS[jid]=100
reply('private', source, u'You license is valid!')
else:
reply('private', source, u'Incorrect password/jid')
Lines 2 and 3 are using spaces while everything else is using tabs. You always need to use the same, and should choose spaces per pep 8
Line 2 shouldn't be indented at all, with spaces or with tabs
for loop shouldn't be indented.
datafile = file('c:\\Users\\username\\Desktop\\test.txt')
for line in datafile:
if '5256' in line:
GLOBACCESS[jid]=100
reply('private', source, u'You license is valid!')
else:
reply('private', source, u'Incorrect password/jid')

Scraping in python using regex not giving any result? [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 8 years ago.
This question was caused by a typo or a problem that can no longer be reproduced. While similar questions may be on-topic here, this one was resolved in a way less likely to help future readers.
Questions asking for code must demonstrate a minimal understanding of the problem being solved. Include attempted solutions, why they didn't work, and the expected results. See also: Stack Overflow question checklist
Improve this question
I am using python 3 to scrape a website and print a value. Here is the code
import urllib.request
import re
url = "http://in.finance.yahoo.com/q?s=spy"
hfile = urllib.request.urlopen(url)
htext = hfile.read().decode('utf-8')
regex = '<span id="yfs_l84_SPY">(.+?)</span>'
code = re.compile(regex)
price = re.findall(code,htext)
print (price)
when i run this snippet, it prints an empty list, ie. [], but i am expecting a value e.g. 483.33.
What is the thing that i am getting wrong ? Help
I have to recommend that you not use regex to parse HTML, because HTML is not a regular language. Yes, you could use it here. It's not a good habit to get into.
The biggest issue I imagine that you're having is that the real id of the span you're looking for on that page is yfs_l84_spy. Note case.
That said, here is a quick implementation in BeautifulSoup.
import urllib.request
from bs4 import BeautifulSoup
url = "http://in.finance.yahoo.com/q?s=spy"
hfile = urllib.request.urlopen(url)
htext = hfile.read().decode('utf-8')
soup = BeautifulSoup(htext)
soup.find('span',id="yfs_l84_spy")
Out[18]: <span id="yfs_l84_spy">176.12</span>
And to get at that number:
found_tag = soup.find('span',id="yfs_l84_spy") #tag is a bs4 Tag object
found_tag.next #get next (i.e. only) element of the tag
Out[36]: '176.12'
You are not using the regex correctly, there are 2 ways of doing this:
1.
regex = '<span id="yfs_l84_spy">(.+?)</span>'
code = re.compile(regex)
price = code.findall(htext)
2.
regex = '<span id="yfs_l84_spy">(.+?)</span>'
price = re.findall(regex, htext)
It should be noted that the Python regex library does some caching internally so precaching has only limited effect.

Categories