"No connection adapters were found for '%s'" % url - Python

Sorry to ask this! I'm a newbie, so feel free to teach me anything you know.
I'm building a scraping tool for marketing purposes that collects contact information from websites. I'm using Python 3.
This is my code:
import requests, bs4, os, codecs, csv
import pandas as pd
import sys

os.path.join('usr', 'bin', 'spam')
openFile = open('C:\\Users\\hdtra\\Desktop\\Test_1.csv', encoding='utf-8-sig')
read_test = csv.reader(openFile)
for link in read_test:
    res = requests.get(link)
    res.raise_for_status
    facebookSpider = bs4.BeautifulSoup(res.text)
    email = facebookSpider.select("._4-u2._3xaf._3-95._4-u8")
    helloFile = open('C:\\Users\\hdtra\\Desktop\\In processing\\information.txt', 'w')
    helloFile.write(str(email[3].encode('utf-8')) + '\n')
    helloFile.close()
I have no idea why it gives me something like this:
Traceback (most recent call last):
  File "C:\Users\hdtra\Desktop\In processing\Facebook_spider.py", line 12, in <module>
    res = requests.get(link)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "C:\Program Files\Python36\lib\site-packages\requests\sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '['http://www.facebook.com/D2Streetwear/?ref=br_rs']'
I know that get() only accepts a string, but I have no idea how to convert these links into strings. This is my csv file:
It has only one column with 5 rows:
http://www.facebook.com/D2Streetwear/?ref=br_rs
https://www.facebook.com/RealClothes/?ref=br_rs
https://www.facebook.com/Lecamelliaclothing/?ref=br_rs
https://www.facebook.com/TaTclothing-285844471884952/?ref=br_rs
https://www.facebook.com/Dai-Clothing-130675847640538/?ref=br_rs
I tried wrapping it as str(link()), but that does not work.

You should understand that csv.reader returns an iterator that yields one row at a time, with each row represented as a list of its columns:
csv.reader(csvfile, dialect='excel', **fmtparams)
    Return a reader object which will iterate over lines in the given csvfile. [...]
    Each row read from the csv file is returned as a list of strings.
Emphasis mine. Your CSV contains a single column, so you can access the first column of each row with link[0].
with open('test.csv') as f:
    reader = csv.reader(f)
    for row in reader:
        res = requests.get(row[0])
        ...
I consider it good practice to always use a with...as context manager when handling file I/O, as it automatically closes your file and results in cleaner code.
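
Putting it all together, a minimal sketch of the corrected scraper. The paths and the CSS selector come from the question; the 'html.parser' argument, the empty-row skip, the append mode, and get_text() are my assumptions to keep the sketch self-contained:

import csv
import bs4
import requests

with open('C:\\Users\\hdtra\\Desktop\\Test_1.csv', encoding='utf-8-sig') as f, \
        open('information.txt', 'a', encoding='utf-8') as out:
    for row in csv.reader(f):
        if not row:
            continue  # skip empty rows so row[0] never raises IndexError
        res = requests.get(row[0])
        res.raise_for_status()  # note the parentheses: without them the check never runs
        soup = bs4.BeautifulSoup(res.text, 'html.parser')
        for match in soup.select("._4-u2._3xaf._3-95._4-u8"):
            out.write(match.get_text() + '\n')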

Related

Import JSON in Python error BadStatusLine

I'm trying to import the JSON data generated by an Impinj R420 reader.
The code I use is:
# import urllib library
import urllib.request
from urllib.request import urlopen
# import json
import json
# store the URL in url as
# parameter for urlopen
url = "http://10.234.92.19:14150"
# store the response of URL
response = urllib.request.urlopen(url)
# storing the JSON response
# from url in data
data_json = json.loads(response())
# print the json response
print(data_json)
When I execute the program, it gives the following error:
Traceback (most recent call last):
  File "C:\Users\V\PycharmProjects\Stapelcontrole\main.py", line 13, in <module>
    response = urllib.request.urlopen(url)
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 519, in open
    response = self._open(req, data)
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 496, in _call_chain
    result = func(*args)
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 1377, in http_open
    return self.do_open(http.client.HTTPConnection, req)
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\urllib\request.py", line 1352, in do_open
    r = h.getresponse()
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 1374, in getresponse
    response.begin()
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 318, in begin
    version, status, reason = self._read_status()
  File "C:\Users\V\AppData\Local\Programs\Python\Python310\lib\http\client.py", line 300, in _read_status
    raise BadStatusLine(line)
http.client.BadStatusLine: {"epc":"3035307B2831B383E019E8EA","firstSeenTimestamp":"2022-04-11T11:24:23.592434Z","isHeartBeat":false}
Process finished with exit code 1
I know this is an error in the response, where it gets a faulty HTTP status line.
Yet I don't know how to fix the error. Could you advise me how to fix this?
The {"epc":"3035307B2831B383E019E8EA","firstSeenTimestamp":"2022-04-11T11:24:23.592434Z","isHeartBeat":false} is the answer I expect.
Thanks in advance.
Edit:
Even with
with urllib.request.urlopen(url) as f:
    data_json = json.load(f)
I get the same BadStatusLine error.
I can't set up the reader any differently; it can only send a JSON response through the IP address of the device. Is there a way to import the data without the HTTP protocol?
# store the response of URL
response = urllib.request.urlopen(url)
# storing the JSON response
# from url in data
data_json = json.loads(response())
Here you are actually calling response. I don't know what you want to achieve by that, but the examples in the urllib.request docs suggest that the object urllib.request.urlopen returns should be treated like a local file handle, so replace the above with:
with urllib.request.urlopen(url) as f:
    data_json = json.load(f)
Observe that I used json.load, not json.loads.
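
As a quick illustration of the difference (this snippet is mine, not part of the original answer): json.load reads from a file-like object, while json.loads parses a string or bytes you already hold in memory.

import io
import json

payload = '{"epc": "3035307B2831B383E019E8EA", "isHeartBeat": false}'

# json.loads parses a str (or bytes) already in memory.
from_string = json.loads(payload)

# json.load reads from any file-like object, e.g. an HTTP response or an open file.
from_file = json.load(io.StringIO(payload))

assert from_string == from_file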
EDIT: After "Is there a way to import the data without the HTTP Protocol?" I conclude that a more low-level solution is needed. Hopefully socket will let you get what you want. Using the echo client example as a starting point, I prepared the following code:
import socket

with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("10.234.92.19", 14150))
    s.sendall(b'GET / HTTP/1.1\r\n')
    data = s.recv(1024)
print(data)
If everything works as intended, you should get the first 1024 bytes of the answer printed. If so, change 1024 to a value that will always be bigger than the number of bytes in the response, and use json.loads(data) to parse the response.
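
If you would rather not guess a buffer size, a variant of this sketch (mine, with the IP and port taken from the question) keeps reading until the reader closes the connection and only then parses the buffer; if the device streams continuously and never closes, you would read up to a newline instead:

import json
import socket

chunks = []
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    s.connect(("10.234.92.19", 14150))
    while True:
        chunk = s.recv(4096)
        if not chunk:  # an empty bytes object means the peer closed the connection
            break
        chunks.append(chunk)

data_json = json.loads(b"".join(chunks))
print(data_json)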

How to handle the "Traceback" error in python?

Can anyone please tell me how I can remove this Traceback (most recent call last): error in Python? I am using Python 2.7.9.
Take a look at the code:
import requests
import optparse

parser = optparse.OptionParser()
parser.add_option("-f", '--filename', action="store", dest="filee")
options, args = parser.parse_args()
file = options.filee

fopen = open(file, 'r')

for x in fopen.readlines():
    print "Checking for Clickjacking vulnerability\n"
    url = x.strip('\n')
    req = requests.get(url)
    try:
        print "[-]Target:" + url + " Not vulnerable\n The targeted victim has %s header\n" % (req.headers['X-Frame-Options'])
    except Exception as e:
        print "[+] Target:" + url + " Vulnerable to clickjacking"
After running the code, I got this error at the end:
Traceback (most recent call last):
  File "C:\Python27\utkarsh3.py", line 17, in <module>
    req = requests.get(url)
  File "C:\Python27\lib\site-packages\requests\api.py", line 72, in get
    return request('get', url, params=params, **kwargs)
  File "C:\Python27\lib\site-packages\requests\api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 494, in request
    prep = self.prepare_request(req)
  File "C:\Python27\lib\site-packages\requests\sessions.py", line 437, in prepare_request
    hooks=merge_hooks(request.hooks, self.hooks),
  File "C:\Python27\lib\site-packages\requests\models.py", line 305, in prepare
    self.prepare_url(url, params)
  File "C:\Python27\lib\site-packages\requests\models.py", line 379, in prepare_url
    raise MissingSchema(error)
requests.exceptions.MissingSchema: Invalid URL '': No schema supplied. Perhaps you meant http://?
This really irritates me. I know many people have asked about this before, but I can't understand their answers, so I'm asking again.
Please tell me: how do we beginners handle these errors?
In ELI5 fashion: a traceback is a log of what the program was trying to do before the actual error happened. Your actual error is requests.exceptions.MissingSchema.
The text that follows, Invalid URL '': No schema supplied. Perhaps you meant http://?, describes the exact problem.
File "C:\Python27\utkarsh3.py", line 17, in <module>
req = requests.get(url)
These lines describe where the error started: if you go to line 17 of your program, you should see this exact same statement.
Putting those two things together, I gather that url is a string like example.com rather than http://example.com or something along those lines.
I can only speculate about what your code might be, but feel free to provide snippets to explain more.
Either way, I hope this helps you read future tracebacks.
Edit 1: Now that you have added the snippet, try printing url just before requests.get(url) to see exactly what you are trying to reach, and check whether the right schema is prepended.
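
The empty URL '' in the traceback usually means the input file contains a blank line. A minimal defensive sketch (Python 2 to match the question; the file name urls.txt and the scheme check are my additions, not part of the original answer):

import requests

# Hypothetical input file; the question reads the name from the -f option instead.
with open('urls.txt') as fopen:
    for line in fopen:
        url = line.strip()
        if not url:
            # A blank line would become the empty string '' and
            # trigger requests.exceptions.MissingSchema.
            continue
        if not url.startswith(('http://', 'https://')):
            url = 'http://' + url  # prepend a schema when it is missing
        print "Checking %s" % url
        req = requests.get(url)
        print req.status_code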

Requests: No connection adapters were found, error in Python 3 [duplicate]

This question already has answers here:
Python Requests - No connection adapters
(4 answers)
Closed 6 months ago.
import requests
import xml.etree.ElementTree as ET
import re

gen_news_list = []

r_milligenel = requests.get('http://www.milliyet.com.tr/D/rss/rss/Rss_4.xml')
root_milligenel = ET.fromstring(r_milligenel.text)

for entry in root_milligenel:
    for channel in entry:
        for item in channel:
            title = re.search(".*title.*", item.tag)
            if title:
                gen_news_list.append(item.text)
            link = re.search(".*link.*", item.tag)
            if link:
                gen_news_list.append(item.text)
                r = requests.get(item.text)
                print(r.text)
I have a list named gen_news_list, and I'm trying to append titles, summaries, links, etc. to it. But an error occurs when I try to request a link:
Traceback (most recent call last):
  File "/home/deniz/Masaüstü/Çalışmalar/Python/Bot/xmlcek.py", line 23, in <module>
    r = requests.get(item.text)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 55, in get
    return request('get', url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/api.py", line 44, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 456, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 553, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/lib/python3/dist-packages/requests/sessions.py", line 608, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for '
http://www.milliyet.com.tr/tbmm-baskani-cicek-programlarini/siyaset/detay/2037301/default.htm
'
The first link worked successfully, but the second one raised an error, so I can't add the content to the list. Is it a problem with my loop? What's wrong with the code?
If you add the line print(repr(item.text)) before the problematic line r = requests.get(item.text), you can see that, starting with the second URL, item.text has \n at the beginning and the end, which is not allowed in a URL:
'\nhttp://www.milliyet.com.tr/tbmm-baskani-cicek-programlarini/siyaset/detay/2037301/default.htm\n'
I use repr because it literally shows the newlines as \n in its output.
The solution is to call strip on item.text to remove those newlines:
r = requests.get(item.text.strip())
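
Applied to the script from the question, the corrected loop looks roughly like this (a sketch; the `or ""` guard against empty elements is my addition):

import re
import requests
import xml.etree.ElementTree as ET

gen_news_list = []
r_milligenel = requests.get('http://www.milliyet.com.tr/D/rss/rss/Rss_4.xml')
root_milligenel = ET.fromstring(r_milligenel.text)

for entry in root_milligenel:
    for channel in entry:
        for item in channel:
            text = (item.text or "").strip()  # drop the stray newlines around the URL
            if re.search(".*title.*", item.tag):
                gen_news_list.append(text)
            if re.search(".*link.*", item.tag):
                gen_news_list.append(text)
                r = requests.get(text)
                print(r.text)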

Unknown URL type using urllib2.urlopen()

I am trying to do the following:
open a CSV file containing a list with URLs (GET-Requests)
read the CSV file and write the entries to a list
open every single URL and read the answer
write the answers back to a new CSV file
I get the following error:
Traceback (most recent call last):
  File "C:\Users\l.buinui\Desktop\request2.py", line 16, in <module>
    req = urllib2.urlopen(url)
  File "C:\Python27\lib\urllib2.py", line 127, in urlopen
    return _opener.open(url, data, timeout)
  File "C:\Python27\lib\urllib2.py", line 404, in open
    response = self._open(req, data)
  File "C:\Python27\lib\urllib2.py", line 427, in _open
    'unknown_open', req)
  File "C:\Python27\lib\urllib2.py", line 382, in _call_chain
    result = func(*args)
  File "C:\Python27\lib\urllib2.py", line 1247, in unknown_open
    raise URLError('unknown url type: %s' % type)
URLError: <urlopen error unknown url type: ['http>
Here is the code I am using:
import urllib2
import urllib
import csv

# Open and read the source file and write entries to a list called link_list
source_file = open("source_new.csv", "rb")
source = csv.reader(source_file, delimiter=";")
link_list = [row for row in source]
source_file.close()

# Create an output file which contains the answer of the GET-Request
out = open("output.csv", "wb")
output = csv.writer(out, delimiter=";")

for row in link_list:
    url = str(row)
    req = urllib2.urlopen(url)
    output.writerow(req.read())

out.close()
What is going wrong there?
Thanks in advance for any hints.
Cheers
Using the row variable passes a whole list (containing a single element, the URL) to urlopen: str(row) turns that list into the string "['http://...']", which is not a valid URL. Passing row[0] instead passes the string containing just the URL.
csv.reader returns a list for each row it reads, no matter how many items the row contains.
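
To see the difference concretely (my illustration, not part of the original answer):

row = ['http://example.com/page']  # what csv.reader yields for a one-column row

print(str(row))  # ['http://example.com/page']  <- a stringified list, not a URL
print(row[0])    # http://example.com/page      <- the URL itself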
It's working now. If I directly reference row[0] in the loop, there are no problems.
import urllib2
import urllib
import csv

# Open and read the source file and write entries to a list called link_list
source_file = open("source.csv", "rb")
source = csv.reader(source_file, delimiter=";")
link_list = [row for row in source]
source_file.close()

# Create an output file which contains the answer of the GET-Request
out = open("output.csv", "wb")
output = csv.writer(out)

for row in link_list:
    req = urllib2.urlopen(row[0])
    answer = req.read()
    output.writerow([answer])

out.close()

Python: saving large web page to file

Let me start off by saying I'm not new to programming, but I am very new to Python.
I've written a program using urllib2 that requests a web page that I would then like to save to a file. The web page is about 300KB, which doesn't strike me as particularly large, but it seems to be enough to give me trouble, so I'm calling it 'large'.
I'm using a simple call to copy directly from the object returned by urlopen into the file:
file.write(webpage.read())
but it will just sit for minutes trying to write to the file, and I eventually receive the following:
Traceback (most recent call last):
  File "program.py", line 51, in <module>
    main()
  File "program.py", line 43, in main
    f.write(webpage.read())
  File "/usr/lib/python2.7/socket.py", line 351, in read
    data = self._sock.recv(rbufsize)
  File "/usr/lib/python2.7/httplib.py", line 541, in read
    return self._read_chunked(amt)
  File "/usr/lib/python2.7/httplib.py", line 592, in _read_chunked
    value.append(self._safe_read(amt))
  File "/usr/lib/python2.7/httplib.py", line 649, in _safe_read
    raise IncompleteRead(''.join(s), amt)
httplib.IncompleteRead: IncompleteRead(6384 bytes read, 1808 more expected)
I don't know why this should give the program so much grief.
EDIT: here is how I'm retrieving the page:
jar = cookielib.CookieJar()
cookie_processor = urllib2.HTTPCookieProcessor(jar)
opener = urllib2.build_opener(cookie_processor)
urllib2.install_opener(opener)

requ_login = urllib2.Request(LOGIN_PAGE,
                             data=urllib.urlencode({'destination': "", 'username': USERNAME, 'password': PASSWORD}))
requ_page = urllib2.Request(WEBPAGE)

try:
    # login
    urllib2.urlopen(requ_login)
    # get desired page
    portfolio = urllib2.urlopen(requ_page)
except urllib2.URLError as e:
    print e.code, ": ", e.reason
I'd use the handy file-object copier function provided by the shutil module. It worked on my machine :)
>>> import urllib2
>>> import shutil
>>> remote_fo = urllib2.urlopen('http://docs.python.org/library/shutil.html')
>>> with open('bigfile', 'wb') as local_fo:
...     shutil.copyfileobj(remote_fo, local_fo)
...
>>>
UPDATE: You may want to pass the third argument to copyfileobj, which controls the size of the internal buffer used to transfer the bytes.
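
For example, re-opening the URL and passing a 64 KB buffer (the figure is arbitrary; this line is my illustration, not part of the original session):

>>> remote_fo = urllib2.urlopen('http://docs.python.org/library/shutil.html')
>>> with open('bigfile', 'wb') as local_fo:
...     shutil.copyfileobj(remote_fo, local_fo, 64 * 1024)  # 64 KB buffer instead of the default 16 KB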
UPDATE 2: There's nothing fancy about shutil.copyfileobj. It simply reads a chunk of bytes from the source file object and writes it to the destination file object, repeatedly, until there's nothing more to read. Here's the actual source code, grabbed from the Python standard library:
def copyfileobj(fsrc, fdst, length=16*1024):
    """copy data from file-like object fsrc to file-like object fdst"""
    while 1:
        buf = fsrc.read(length)
        if not buf:
            break
        fdst.write(buf)
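
One more note, since the traceback in the question ends in httplib.IncompleteRead: the server appears to cut the chunked response short, and no copy helper can prevent that. A workaround I'd sketch (my assumption, not part of the original answer) is to catch the exception and keep its partial attribute, which holds whatever bytes did arrive:

import httplib
import urllib2

webpage = urllib2.urlopen(WEBPAGE)  # WEBPAGE as defined in the question
with open('page.html', 'wb') as f:
    while True:
        try:
            buf = webpage.read(16 * 1024)
        except httplib.IncompleteRead as e:
            f.write(e.partial)  # keep the bytes that arrived before the cut-off
            break
        if not buf:
            break
        f.write(buf)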
