How can I use Celery here so that the task's expire time is 60 seconds?
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

def weather(cities):
    # Scrape the current temperature for each city from Google's weather widget
    results = []
    for city in cities:
        res = requests.get(f'https://www.google.com/search?q={city} weather&oq={city} weather&aqs=chrome.0.35i39l2j0l4j46j69i60.6128j1j7&sourceid=chrome&ie=UTF-8', headers=headers)
        soup = BeautifulSoup(res.text, 'html.parser')
        temperature = soup.select('#wob_tm')[0].getText().strip()
        results.append({city: temperature})
    return results

cities = ["tehran", "Mashhad", "Shiraaz", "Semirom", "Ahvaz", "zahedan", "baghdad", "van", "herat", "sari"]
weather_data = weather(cities)
print(weather_data)

def temporary_city(city):
    # Fetch the raw search response for a single city
    res = requests.get(f'https://www.google.com/search?q={city} weather&oq={city} weather&aqs=chrome.0.35i39l2j0l4j46j69i60.6128j1j7&sourceid=chrome&ie=UTF-8', headers=headers)
    return res
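The question does not include any Celery code, so here is a minimal sketch of one way to do it. The Redis broker/result-backend URL and the task name fetch_weather are assumptions, not from the post: result_expires removes stored results after 60 seconds, and expires on apply_async discards the task message if no worker starts it within 60 seconds.

# Minimal sketch (assumed broker/backend URL and task name)
from celery import Celery

app = Celery('tasks',
             broker='redis://localhost:6379/0',    # assumed broker URL
             backend='redis://localhost:6379/0')   # assumed result backend
app.conf.result_expires = 60  # stored task results are removed after 60 seconds

@app.task
def fetch_weather(cities):
    # Reuses the weather() scraper defined above
    return weather(cities)

# Calling side: expires=60 drops the message if no worker starts it within 60 s
# result = fetch_weather.apply_async(args=[cities], expires=60)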
In an HTML file, I have a tag that includes <source type="audio/mpeg" src="/us/media in it. How can I extract the src attribute from it using bs4?
The following produces the desired output:
from bs4 import BeautifulSoup
import requests

headers = {
    'User-Agent': 'Mozilla/5.0 (iPad; CPU OS 12_2 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Mobile/15E148'}
res = requests.get('https://dictionary.cambridge.org/us/dictionary/english/vulnerable', headers=headers)
soup = BeautifulSoup(res.content, 'html.parser')
# CSS attribute selector: every <source> tag whose src contains "us/media"
srcs = soup.select('source[src*="us/media"]')
for src in srcs:
    try:
        print(src['src'])
    except KeyError:
        pass
Output:
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/uk_pron/u/ukv/ukvor/ukvorte027.mp3
/us/media/english/uk_pron_ogg/u/ukv/ukvor/ukvorte027.ogg
/us/media/english/us_pron/v/vul/vulne/vulnerable.mp3
/us/media/english/us_pron_ogg/v/vul/vulne/vulnerable.ogg
/us/media/english/us_pron/e/eus/eus74/eus74904.mp3
/us/media/english/us_pron_ogg/e/eus/eus74/eus74904.ogg
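The printed paths are relative. As a small follow-up sketch (not part of the original answer, and assuming the Cambridge site as the base URL), they can be turned into absolute, downloadable URLs with urljoin:

from urllib.parse import urljoin

base = 'https://dictionary.cambridge.org'  # assumed base URL for the relative paths
for src in srcs:
    print(urljoin(base, src['src']))  # e.g. https://dictionary.cambridge.org/us/media/...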
I am trying to scrape the Houzz website.
In the browser dev tools it shows HTML content, but when I scrape it with BeautifulSoup it returns something else together with some of the HTML. I do not have much knowledge of this.
A little part of what I get is as follows.
</div><style data-styled="true" data-styled-version="5.2.1">.fzynIk.fzynIk{box-sizing:border-box;margin:0;overflow:hidden;}/*!sc*/
.eiQuKK.eiQuKK{box-sizing:border-box;margin:0;margin-bottom:4px;}/*!sc*/
.chJVzi.chJVzi{box-sizing:border-box;margin:0;margin-left:8px;}/*!sc*/
.kCIqph.kCIqph{box-sizing:border-box;margin:0;padding-top:32px;padding-bottom:32px;border-top:1px solid;border-color:#E6E6E6;}/*!sc*/
.dIRCmF.dIRCmF{box-sizing:border-box;margin:0;display:-webkit-box;display:-webkit-flex;display:-ms-flexbox;display:flex;-webkit-box-pack:justify;-webkit-justify-content:space-between;-ms-flex-pack:justify;justify-content:space-between;-webkit-align-items:center;-webkit-box-align:center;-ms-flex-align:center;align-items:center;margin-bottom:16px;}/*!sc*/
.kmAORk.kmAORk{box-sizing:border-box;margin:0;margin-bottom:24px;}/*!sc*/
.bPERLb.bPERLb{box-sizing:border-box;margin:0;margin-bottom:-8px;}/*!sc*/
What should I do with this? Is this not achievable with BeautifulSoup?
Developer Tools operate on a live browser DOM; what you see when inspecting the page is not the original HTML but a modified version produced after the browser's clean-up and after executing JavaScript code.
Requests does not execute JavaScript, so the content can deviate slightly, but you can still scrape - just take a deeper look into your soup.
Example (project titles)
from bs4 import BeautifulSoup
import requests

url_news = "https://www.houzz.com.au/professionals/home-builders/turrell-building-pty-ltd-pfvwau-pf~1099128087"
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/62.0.3202.94 Safari/537.36'}

response = requests.get(url_news, headers=headers)
soup = BeautifulSoup(response.content, "html.parser")
# Project titles sit under the #projects section as <h3> elements
print([title.text for title in soup.select('#projects h3')])
Output
['Major Renovation & Master Wing',
'"The Italian Village" Private Residence',
'Country Classic',
'Residential Resort',
'Resort Style Extension, Stone and Timber',
'Old Northern Rd Estate']
I want to write Python code so that when the user enters search words in the IDE, it takes the user to the Google search results in the browser.
Try:
import webbrowser  # the module used to open a URL in the browser

sear = input("Enter your search: ").strip().replace(' ', '+')  # build the query string: spaces become +
URL = 'https://google.com/search?q=' + sear  # create the search URL
webbrowser.open_new_tab(URL)  # open the URL in a new browser tab
Sentences that include special characters, like 'Hams&Eggs', tend to cause problems, though. Here we use quote() from urllib to percent-encode the query:
import webbrowser  # the module used to open a URL in the browser
from urllib.parse import quote  # percent-encodes the query string

sear = input("Enter your search: ").strip()
URL = 'https://google.com/search?q=' + quote(sear)  # create the search URL with the query safely encoded
webbrowser.open_new_tab(URL)  # open the URL in a new browser tab
Now, to open the first result of the Google search:
import requests, webbrowser
from bs4 import BeautifulSoup

sear = input("Enter your search: ").strip().replace(' ', '+')
URL = 'https://google.com/search?q=' + sear
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
headers = {"user-agent": USER_AGENT}

resp = requests.get(URL, headers=headers)
soup = BeautifulSoup(resp.content, "html.parser")
# Each result sits in a div with class "r" (Google's markup may change over time)
webbrowser.open_new_tab(soup.find_all('div', class_='r')[0].find('a')['href'])
Now to open all result links:
import requests, webbrowser
from bs4 import BeautifulSoup

sear = input("Enter your search: ").strip().replace(' ', '+')
URL = 'https://google.com/search?q=' + sear
USER_AGENT = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:65.0) Gecko/20100101 Firefox/65.0"
headers = {"user-agent": USER_AGENT}

resp = requests.get(URL, headers=headers)
soup = BeautifulSoup(resp.content, "html.parser")
for i in soup.find_all('div', class_='r'):
    webbrowser.open_new_tab(i.find('a')['href'])
import webbrowser

url = 'https://www.google.com/search?q='  # base search query URL
search_words = input("Search your words: ")  # the user's search terms
webbrowser.open(url + search_words)  # open the search in the default web browser
A URL returns this JSON, which is not in standard format:
{}&& {identifier:'ID', label:'As at 08-03-2018 5:06 PM',items:[{ID:0,N:'2ndChance W200123',SIP:'',NC:'CDWW',R:'',I:'',M:'',LT:0.009,C:0.000,VL:108.200,BV:2149.900,B:'0.008',S:'0.009',SV:7218.300,O:0.009,H:0.009,L:0.008,V:873.700,SC:'5',PV:0.009,P:0.000,BL:'100',P_:'X',V_:''},{ID:1,N:'3Cnergy',SIP:'',NC:'502',R:'',I:'',M:'t',LT:0,C:0,VL:0.000,BV:50.000,B:'0.022',S:'0.025',SV:36.000,O:0,H:0,L:0,V:0.000,SC:'2',PV:0.021,P:0,BL:'100',P_:'X',V_:''},{ID:2,N:'3Cnergy W200528',SIP:'',NC:'1E0W',R:'',I:'',M:'t',LT:0,C:0,VL:0.000,BV:0,B:'',S:'0.004',SV:50.000,O:0,H:0,L:0,V:0.000,SC:'5',PV:0.002,P:0,BL:'100',P_:'X',V_:''}`
I want to put all the data into a list or a pandas DataFrame, starting from ID.
The prefix {}&& {identifier:'ID', label:'As at 08-03-2018 5:06 PM',items: is not wanted. This is how I request the URL:
import requests
from lxml import html

headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}
url = 'http://www.sgx.com/JsonRead/JsonstData?qryId=RAll'
page = requests.get(url, headers=headers)
alldata = html.fromstring(page.content)
However, I am unable to continue because the JSON format is not standard. How can I correct it?
import requests
import execjs

url = 'http://www.sgx.com/JsonRead/JsonstData?qryId=RAll'
headers = {'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/50.0.2661.102 Safari/537.36'}

page = requests.get(url, headers=headers)
# Strip the leading "{}&& " so only the JavaScript object literal remains
content = page.text[len('{}&& '):] if page.text.startswith('{}&& ') else page.text
# Evaluate the object literal with a JavaScript runtime and get a Python dict back
data = execjs.get().eval(content)
print(data)
The data is a JavaScript object in literal notation, not strict JSON.
We can use PyExecJS to evaluate it and get the corresponding Python dict.
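From there, a minimal follow-up sketch (not part of the original answer) shows one way to get the rows into pandas, assuming the evaluated object keeps the items key seen in the sample data:

import pandas as pd

# Assumes `data` is the dict returned by execjs above and that it contains
# an 'items' list of row objects, as in the sample in the question
df = pd.DataFrame(data['items'])
print(df.head())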