I'm subscribed to Skillshare, but it turns out Skillshare's UI is a huge mess and unproductive for learning. So I am looking for a way to download an entire course (a single course) at once.
I found this GitHub repo:
https://github.com/crazygroot/skillsharedownloader
And it has a Google Colab link at the bottom as well:
https://colab.research.google.com/drive/1hUUPDDql0QLul7lB8NQNaEEq-bbayEdE#scrollTo=xunEYHutBEv%2F
I'm getting the below error:
Traceback (most recent call last):
  File "/root/Skillsharedownloader/ss.py", line 11, in <module>
    dl.download_course_by_url(course_url)
  File "/root/Skillsharedownloader/downloader.py", line 34, in download_course_by_url
    raise Exception('Failed to parse class ID from URL')
Exception: Failed to parse class ID from URL
This is the course link that I'm using:
https://www.skillshare.com/en/classes/React-JS-Learn-by-examples/1186943986/
I have encountered a similar issue. The problem was that the downloader requires the URL to match https://www.skillshare.com/classes/.*?/(\d+).
If you're copying the URL from the address bar, check it again and make sure it has that format. Your current one looks like https://www.skillshare.com/en/classes/xxxxx, so simply remove the /en.
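If you'd rather fix it in code, here is a minimal sketch (not part of the downloader itself; the helper name is mine) that strips the locale segment so the URL matches the pattern above:

import re

def normalize_skillshare_url(url):
    # Drop a locale segment such as /en/ so the URL matches
    # https://www.skillshare.com/classes/.*?/(\d+)
    return re.sub(r'skillshare\.com/[a-z]{2}(-[a-z]{2})?/classes/',
                  'skillshare.com/classes/', url)

print(normalize_skillshare_url(
    'https://www.skillshare.com/en/classes/React-JS-Learn-by-examples/1186943986/'))
# -> https://www.skillshare.com/classes/React-JS-Learn-by-examples/1186943986/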
I'm using the facebook_scraper library to get posts from a Facebook page with this code:
from facebook_scraper import get_posts

for post in get_posts('ThaiPBSFan', pages=50):
    print(post['text'][:100])
It works for a few posts, then errors like this:
Traceback (most recent call last):
  File ".\main.py", line 2, in <module>
    for post in get_posts('ThaiPBSFan', pages = 50):
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\facebook_scraper.py", line 75, in _get_posts
    yield _extract_post(article)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\facebook_scraper.py", line 102, in _extract_post
    text, post_text, shared_text = _extract_text(article)
  File "C:\Users\admin\AppData\Local\Programs\Python\Python37-32\lib\site-packages\facebook_scraper.py", line 137, in _extract_text
    nodes = article.find('p, header')
AttributeError: 'NoneType' object has no attribute 'find'
So what's the problem, and how can I fix it?
From the traceback, it seems that facebook_scraper is not returning a valid post; this may be because there are no further posts to find on the page.
Therefore, you could use a try/except block to catch this exception, i.e.:
try:
    for post in get_posts('ThaiPBSFan', pages=50):
        print(post['text'][:100])
except AttributeError:
    print("No more posts to get")
It's not ideal, as you would preferably get a more specific exception once there are no more posts to retrieve, but it should work in your case. Be careful with the code inside your try clause: if an AttributeError is raised anywhere else, you will miss it.
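If you want to narrow the scope of the except, one option (just a sketch, not part of facebook_scraper's API) is to advance the generator manually so only the fetching step is guarded:

from facebook_scraper import get_posts

posts = get_posts('ThaiPBSFan', pages=50)
while True:
    try:
        post = next(posts)          # only the fetch is inside the try
    except (StopIteration, AttributeError):
        print("No more posts to get")
        break
    print(post['text'][:100])       # errors in your own code are not swallowed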
I had the same issue, but only when using the most recent version of the package (0.1.12). Try an older version of the package; for example, version 0.1.4 worked well for me. To install it, run:
pip install facebook_scraper==0.1.4
in your terminal.
I have already seen the examples on here of using Python's os library to get a local file's timestamp by passing it a local path (i.e. /var/www/html/etc.../filename.txt), but when I try to pass getmtime a link, it cannot process it.
Here is what the code looks like:
import os
print(os.path.getmtime('https://www.sec.gov/Archives/edgar/data/1474439/000169655519000022/xslF345X03/wf-form4_156772823294389.xml'))
Here is the error I get:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python3.7/genericpath.py", line 55, in getmtime
    return os.stat(filename).st_mtime
FileNotFoundError: [Errno 2] No such file or directory: 'https://www.sec.gov/Archives/edgar/data/1474439/000169655519000022/xslF345X03/wf-form4_156772823294389.xml'
I know that this link exists.
So it obviously doesn't like me passing it a link. Is there another function I can pass a link to in order to get the last modification time of a remote file?
A URL is not necessarily a file. You can ask the remote server to tell you about the link, and the remote server may or may not provide a Last-Modified header, at its discretion. It could also lie, if so instructed. To do this, you need to make an HTTP request; the easiest way to do that from Python is the nice requests library.
import requests
import dateutil.parser

url = 'https://www.sec.gov/Archives/edgar/data/1474439/000169655519000022/xslF345X03/wf-form4_156772823294389.xml'

# A HEAD request fetches only the headers, not the body
response = requests.head(url)
last_modified = response.headers.get('Last-Modified')
if last_modified:
    # Parse the HTTP date string into a datetime object
    last_modified = dateutil.parser.parse(last_modified)
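If you want something shaped like os.path.getmtime, here is a small wrapper (the helper name is mine, just a sketch) that returns a Unix timestamp, or None when the server doesn't send the header:

import requests
import dateutil.parser

def remote_getmtime(url):
    # Last-Modified time of a remote resource as a Unix timestamp, or None
    response = requests.head(url)
    last_modified = response.headers.get('Last-Modified')
    if not last_modified:
        return None
    return dateutil.parser.parse(last_modified).timestamp()

print(remote_getmtime('https://www.sec.gov/Archives/edgar/data/1474439/000169655519000022/xslF345X03/wf-form4_156772823294389.xml'))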
I am trying to use the Google Drive API to download publicly available files; however, whenever I try to proceed I get an error.
For reference, I have successfully set up OAuth2 such that I have a client ID, a client secret, and a redirect URI; however, when I try setting it up I get an error saying the object has no attribute urlencode.
>>> from apiclient.discovery import build
>>> from oauth2client.client import OAuth2WebServerFlow
>>> flow = OAuth2WebServerFlow(client_id='not_showing_client_id', client_secret='not_showing_secret_id', scope='https://www.googleapis.com/auth/drive', redirect_uri='https://www.example.com/oauth2callback')
>>> auth_uri = flow.step1_get_authorize_url()
>>> code = '4/E4h7XYQXXbVNMfOqA5QzF-7gGMagHSWm__KIH6GSSU4#'
>>> credentials = flow.step2_exchange(code)
And then I get the error:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Library/Python/2.7/site-packages/oauth2client/util.py", line 137, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/Library/Python/2.7/site-packages/oauth2client/client.py", line 1980, in step2_exchange
    body = urllib.parse.urlencode(post_data)
AttributeError: 'Module_six_moves_urllib_parse' object has no attribute 'urlencode'
Any help would be appreciated. Also, would someone mind enlightening me as to how I instantiate a drive_file? According to https://developers.google.com/drive/web/manage-downloads, I need to instantiate one and I am unsure of how to do so.
Edit: So I figured out why I was getting the error above. If anyone else is having the same problem, try running:
sudo pip install -I google-api-python-client==1.3.2
However, I am still unclear about the drive instance, so any help with that would be appreciated.
Edit 2: Okay, so I figured out the answer to my whole question. The drive instance is just the metadata that results when we use the API to search for a file based on its ID.
So, as I said in my edits, try the sudo pip install; a file instance is just a dictionary of metadata.
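For anyone following along, here is a rough sketch of what that looks like with the old v2 client (assuming credentials is the object returned by step2_exchange above; the file ID is a placeholder):

import httplib2
from apiclient.discovery import build

# Authorize an HTTP object with the credentials from the OAuth2 flow
http = credentials.authorize(httplib2.Http())
service = build('drive', 'v2', http=http)

file_id = '0BwwA4oUTeiV1UVNwOHItT0xfa2M'  # placeholder ID
drive_file = service.files().get(fileId=file_id).execute()  # a dict of metadata
print(drive_file.get('title'), drive_file.get('downloadUrl'))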
I keep getting this exception from TweetStream 1.1.1: AuthenticationError("Access denied") (exception.code == 404). It worked last week and now it doesn't. I have tried different usernames and passwords. I can log into Twitter with my account information. I even deleted and reinstalled the module. What gives? Thanks for the help!
I try running this...
import tweetstream

stream = tweetstream.SampleStream("MY_USERNAME", "MY_PASSWORD")
for tweet in stream:
    print tweet
The error actually looks like this:
Traceback (most recent call last):
  File "<pyshell#28>", line 1, in <module>
    for tweet in stream:
  File "C:\Python27\lib\site-packages\tweetstream-1.1.1-py2.7.egg\tweetstream\streamclasses.py", line 165, in __iter__
    self._init_conn()
  File "C:\Python27\lib\site-packages\tweetstream-1.1.1-py2.7.egg\tweetstream\streamclasses.py", line 103, in _init_conn
    raise AuthenticationError("Access denied")
AuthenticationError: Access denied
Twitter released the next version of its API (1.1), and tweetstream doesn't support it yet. See the relevant issue on the tweetstream project issue tracker.
Had the same problem here, and I could not get the patched version mentioned on the project issue tracker (linked by @alecxe) to work either.
Twitter provides a list of libraries that should work with the newer API, at https://dev.twitter.com/docs/twitter-libraries
It lists many, including several for Python.
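For example, tweepy (one of the Python libraries on that list) can read the 1.1 streaming API with OAuth credentials from your app on dev.twitter.com. A rough sketch using the older tweepy 3.x-era API, with placeholder keys:

import tweepy

class PrintListener(tweepy.StreamListener):
    def on_status(self, status):
        print(status.text)

# Placeholder credentials from your app on dev.twitter.com
auth = tweepy.OAuthHandler('CONSUMER_KEY', 'CONSUMER_SECRET')
auth.set_access_token('ACCESS_TOKEN', 'ACCESS_TOKEN_SECRET')

stream = tweepy.Stream(auth, PrintListener())
stream.sample()  # roughly what tweetstream.SampleStream provided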
Hey guys, I am a little lost on how to get the auth token. Here is the code I am using on the return from authorizing my app:
client = gdata.service.GDataService()
gdata.alt.appengine.run_on_appengine(client)
sessionToken = gdata.auth.extract_auth_sub_token_from_url(self.request.uri)
client.UpgradeToSessionToken(sessionToken)
logging.info(client.GetAuthSubToken())
What gets logged is "None", so that doesn't seem right :-(
if I use this:
temp = client.upgrade_to_session_token(sessionToken)
logging.info(dump(temp))
I get this:
{'scopes': ['http://www.google.com/calendar/feeds/'], 'auth_header': 'AuthSub token=CNKe7drpFRDzp8uVARjD-s-wAg'}
So I can see that I am getting an AuthSub token, and I guess I could just parse that and grab the token, but that doesn't seem like the way things should work.
If I try to use AuthSubTokenInfo I get this:
Traceback (most recent call last):
  File "/Applications/GoogleAppEngineLauncher.app/Contents/Resources/GoogleAppEngine-default.bundle/Contents/Resources/google_appengine/google/appengine/ext/webapp/__init__.py", line 507, in __call__
    handler.get(*groups)
  File "controllers/indexController.py", line 47, in get
    logging.info(client.AuthSubTokenInfo())
  File "/Users/matthusby/Dropbox/appengine/projects/FBCal/gdata/service.py", line 938, in AuthSubTokenInfo
    token = self.token_store.find_token(scopes[0])
TypeError: 'NoneType' object is unsubscriptable
So it looks like my token_store is not getting filled in correctly; is that something I should be doing myself?
Also, I am using gdata 2.0.9.
Thanks
Matt
To answer my own question:
When you get the token, just call:
client.token_store.add_token(sessionToken)
and App Engine will store it in a new entity type for you. Then, when making calls to the calendar service, just don't set the AuthSub token; the token store will take care of that for you.
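Putting it together, the handler for the AuthSub return ends up looking roughly like this (a sketch against gdata 2.0.x inside the webapp handler's get(), as in the question; the event feed call is just an example):

import gdata.calendar.service
import gdata.alt.appengine
import gdata.auth

client = gdata.calendar.service.CalendarService()
gdata.alt.appengine.run_on_appengine(client)

# Pull the single-use token out of the redirect URL, upgrade it,
# and let the App Engine-backed token_store keep it
one_time_token = gdata.auth.extract_auth_sub_token_from_url(self.request.uri)
session_token = client.upgrade_to_session_token(one_time_token)
client.token_store.add_token(session_token)

# Later calls pick the stored token up automatically
feed = client.GetCalendarEventFeed()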