I am trying to download files from a website. The URL is in the page's HTML and I am able to find the right one, but the various videos come in different frames per second (fps). I had multiple try blocks, but they were difficult to follow, so I tried the loop seen here.
This is what I have:
import re
for [j] in range(23,24,25,26,27,28,29,30,31):
    try:
        result = re.search('"width":1280,"mime":"video/mp4","fps":[j],"url":(.*),', s)
        extractedword=result.group(1)
        commapos=extractedword.find(",")
        link=extractedword[:commapos].replace('"',"")
    except:
        pass
print(title)
print(link)
The error message says: range expected at most 3 arguments, got 9
Any advice on how I can search for the correct URL? I've been trying to get this done for days. Thank you!
EDIT:
I should add that, at the required resolution, each title's URL exists at only one FPS. Different titles use a variety of frame rates, but each title is available in only one FPS at that resolution. Some of the suggested solutions return "download error retrying" in a loop.
Use this code instead:
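import re

s = ""  # Put what s equals here (the page's HTML)
title = ""  # Also make sure to define title somewhere

for j in range(23, 32):
    try:
        # Insert the value of j into the pattern; a literal [j] would only match the letter "j"
        result = re.search('"width":1280,"mime":"video/mp4","fps":{},"url":(.*),'.format(j), s)
        extractedword = result.group(1)
        commapos = extractedword.find(",")
        link = extractedword[:commapos].replace('"', "")
    except AttributeError:  # re.search returned None: no match at this fps
        pass
    else:
        print(title)
        print(link)
        break  # each title exists at only one fps, so stop at the first match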
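range takes at most three arguments: start, stop, step. The stop value is exclusive, so range(23, 32) covers 23 through 31. So instead try this:
# other variables (s, title) go here
import re
for j in range(23, 32):
    try:
        # Build the pattern with the current value of j; "[j]" would match the letter "j"
        result = re.search('"width":1280,"mime":"video/mp4","fps":' + str(j) + ',"url":(.*),', s)
        extractedword = result.group(1)
        commapos = extractedword.find(",")
        link = extractedword[:commapos].replace('"', "")
    except AttributeError:  # no match at this fps, so result is None
        pass
    else:
        print(title)
        print(link)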
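A quick check of how range's bounds behave, since the stop value is easy to get wrong here: it is exclusive, so 31 is only included when stop is 32, and passing nine arguments fails outright.

```python
# range(start, stop[, step]) accepts at most three arguments, and stop is exclusive.
frame_rates = list(range(23, 32))
print(frame_rates)  # [23, 24, 25, 26, 27, 28, 29, 30, 31]

print(list(range(23, 31))[-1])  # 30: range(23, 31) never reaches 31

try:
    range(23, 24, 25, 26, 27, 28, 29, 30, 31)  # the call from the question
except TypeError as exc:
    print("TypeError:", exc)
```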
You really do not need range, or even a loop, here. There are several issues:
Not calling range correctly: it takes at most three arguments (start, stop, step)
Never using j in your regex: [j] inside the pattern is a character class that matches the letter "j", not the value of j
As I understand it, you are worried that the frame rate varies and that the regex will fail to match when it does. A simpler, more elegant solution is to update the regex to match all the frame rates that may be present:
2[3-9]|3[0-1]
The regex above matches any number from 23 to 31. If we wrap it in a capturing group, we can also read the frame rate from the match later if we want to store it.
import re
s = '"width":1280,"mime":"video/mp4","fps":25,"url":test.com'
result = re.search(r'"width":1280,"mime":"video/mp4","fps":(2[3-9]|3[0-1]),"url":(.*)', s)
result.group(1)
#'25'
result.group(2)
#'test.com'
From here you can modify the output however you want. For a more in-depth explanation of the regex, you can step through it on regex101.
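A sketch that combines the frame-rate alternation with the question's goal of getting a clean link. It assumes (the question doesn't confirm this) that the URL in the page source is a double-quoted JSON string right after "url":, so capturing [^"]* stops at the closing quote and the find(",") / replace('"', "") post-processing becomes unnecessary. The sample string and URL are made up.

```python
import re

# Made-up sample of the page source; assumes the URL is a quoted JSON string.
s = '{"width":1280,"mime":"video/mp4","fps":29,"url":"https://example.com/video.mp4","size":123}'

pattern = re.compile(
    r'"width":1280,"mime":"video/mp4",'
    r'"fps":(2[3-9]|3[01]),'   # any frame rate from 23 to 31
    r'"url":"([^"]*)"'         # everything up to the closing quote of the URL
)

match = pattern.search(s)
if match:
    fps, link = match.group(1), match.group(2)
    print(fps, link)  # 29 https://example.com/video.mp4
else:
    print("No 1280-wide MP4 found")
```

Because the frame rate is part of the single pattern, there is no loop to break out of and no try/except needed; a missing video shows up as match being None.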
I'm trying to create a Python script that checks data on clients' websites for errors. I want to use a .txt file containing the necessary URL endings and have the script test one line at a time.
This is the snippet from my script:
with open('numbers.txt') as numbers:
    for index, line in enumerate(numbers)
        def urlnumber():
            number = numbers
            url = "http://www.url.com/" + number
            print ("Processing: "+url)
            result = checkErr(url)
            print(result)
For reference, numbers.txt contains:
One
Two
Three
Four
Five
I want the script to check "url.com/one", then "url.com/two", and so on.
If this question has been asked before, please point me in that direction. I have looked at some similar questions, but the answers did not help me!
Thanks in advance for any help!
with open('numbers.txt') as f:
    # Read the file and split it into a list of lines called `numbers`
    numbers = f.read().splitlines()

for number in numbers:
    url = "http://www.url.com/" + number
    print("Processing: " + url)
    result = checkErr(url)
    print(result)
This should do the job more cleanly. I'd still recommend cleaning up the code inside the for loop, though.
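A variant of the loop above that strips whitespace, skips blank lines, and lower-cases each entry so that One becomes url.com/one as the question describes. The lower-casing is an assumption about the desired URLs, build_urls is a hypothetical helper, checkErr is the asker's own function (not reproduced), and io.StringIO stands in for numbers.txt so the sketch runs without the file.

```python
import io

def build_urls(lines, base="http://www.url.com/"):
    """Turn each non-blank line into a full URL (lower-cased, per the question's example)."""
    urls = []
    for line in lines:
        number = line.strip()  # drop the trailing newline and stray spaces
        if not number:
            continue  # skip blank lines, e.g. an empty last line in the file
        urls.append(base + number.lower())
    return urls

# Stand-in for open('numbers.txt'); iterating a real file object works the same way.
fake_file = io.StringIO("One\nTwo\nThree\nFour\nFive\n")
for url in build_urls(fake_file):
    print("Processing: " + url)
    # result = checkErr(url)  # the asker's own checking function
```

With a real file, `with open('numbers.txt') as f: urls = build_urls(f)` reads it line by line without loading the whole file first.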
Python is dynamically but strongly typed, so it won't implicitly convert an integer to a string when you try to concatenate them.
You have to either use string formatting (interpolation) or explicitly convert the value to a string:
for i in range(0, 10):
    url = "http://www.url.com/" + str(i)
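Both options side by side, assuming the loop variable is an integer as in the snippet above; the concatenation without str() is included to show the TypeError it raises.

```python
base = "http://www.url.com/"

urls_str = []
urls_fstring = []
for i in range(0, 3):
    urls_str.append(base + str(i))     # explicit conversion
    urls_fstring.append(f"{base}{i}")  # f-string interpolation (Python 3.6+)

print(urls_str)  # ['http://www.url.com/0', 'http://www.url.com/1', 'http://www.url.com/2']

try:
    base + 1  # no implicit int -> str conversion
except TypeError as exc:
    print("TypeError:", exc)
```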
I'm new to Python and having some trouble with an API scrape I'm attempting. I want to pull a list of book titles using this code:
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    for title in doc["sourceResource"]["title"]:
        print (title)
That works for pulling the titles, but most (not all) titles are printed one character per line. I've tried adding .splitlines(), but it doesn't fix the problem. Any advice would be appreciated!
The problem is that the response contains two types of title: some are plain strings ("Germain the wizard") and others are arrays of strings (['Joe Strong, the boy wizard : or, The mysteries of magic exposed /']). Looping over a plain string yields one character at a time, which is why those titles print one character per line. In this particular case all the lists seem to have length one, but I guess that will not always be so; to illustrate what you might need to do, I used a join here instead of just taking title[0].
import requests
import json
r = requests.get('https://api.dp.la/v2/items?q=magic+AND+wizard&api_key=09a0efa145eaa3c80f6acf7c3b14b588')
data = json.loads(r.text)
for doc in data["docs"]:
    title = doc["sourceResource"]["title"]
    if isinstance(title, list):
        print(" ".join(title))
    else:
        print(title)
In my opinion that should never happen: an API should return predictable types; otherwise handling the response gets messy on the user's side.
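A sketch that normalizes the field to a list before printing, so strings and lists go through the same code path. The as_list helper and the sample docs are hypothetical; the sample only mimics the shape described above (a plain string in one record, a list in another), so it runs without calling the API.

```python
def as_list(value):
    """Wrap a lone string in a list; return lists unchanged."""
    if isinstance(value, list):
        return value
    return [value]

# Made-up records shaped like data["docs"] in the response described above.
docs = [
    {"sourceResource": {"title": "Germain the wizard"}},
    {"sourceResource": {"title": ["Joe Strong, the boy wizard : or, The mysteries of magic exposed /"]}},
]

titles = []
for doc in docs:
    # .get() guards against a record with no title at all (an assumption, not seen in the question)
    for title in as_list(doc["sourceResource"].get("title", [])):
        titles.append(title)
        print(title)
```

Unlike the join above, this prints each title in a multi-title list on its own line; which one you want depends on how the data should be displayed.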
I am using a Magtek USB reader to read card information.
Right now, when I swipe a card, I get a long string of information in a Tkinter Entry textbox that looks like this:
%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?
All of the data has been randomized, but that's the format.
I've got a Tkinter button that gets the text from the entry box (in the format shown above) and runs this:
def printCD(self):
    print(self.carddata.get())
    self.card_data_get = self.carddata.get()
    self.creditnumber = self.card_data_get[self.card_data_get.find("B") + 1:
                                           self.card_data_get.find("^")]
    print(self.creditnumber)
    print(self.card_data_get.count("^"))
This outputs:
%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?
8954756016548963
This works, but now I want to get the next two variables, firstname and lastname.
I would need to reuse self.variable.find("^"), because in this format a ^ appears both before LAST and after INITIAL.
So far, when I've tried this, I haven't been able to get past the first "^".
Any takers on how I can split that string of text into individual variables:
Card Number
First Name
Last Name
Expiration Date
Regex will work for this. I didn't capture everything because you didn't specify which field is which, but here's an example of capturing the name:
import re
data = "%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?"
matches = re.search(r"\^(?P<name>.+)\^", data)
print(matches.group('name'))
# LAST/FIRST INITIAL
If you aren't familiar with regex, here's a way of testing pattern matching: https://regex101.com/r/lAARCP/1 and an intro tutorial: https://regexone.com/
Basically, I'm searching for one or more of any character (.+) between two carets (^).
Actually, since you mentioned wanting the first and last names separately, you'd use this regex:
\^(?P<last>.+)/(?P<first>.+)\^
This question may also interest you regarding finding something twice: Finding multiple occurrences of a string within a string in Python
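Extending the same idea to all four fields the question asks for. This assumes the standard ISO/IEC 7813 track 1 layout, where the four digits right after the second ^ are the expiration date as YYMM; the question doesn't confirm that and the sample data is randomized, so check it against your reader's real output before relying on it.

```python
import re

data = "%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?"

track1 = re.compile(
    r"%B(?P<number>\d+)\^"   # card number after the %B start sentinel and format code
    r"(?P<last>[^/^]*)/"     # last name, up to the slash
    r"(?P<first>[^^]*)\^"    # first name (plus any initial), up to the second caret
    r"(?P<expiry>\d{4})"     # assumed YYMM expiration date
)

m = track1.search(data)
if m:
    card = m.groupdict()
    print(card)
    # {'number': '8954756016548963', 'last': 'LAST', 'first': 'FIRST INITIAL', 'expiry': '1809'}
```

Using [^/^]* and [^^]* instead of .+ keeps each group from running past its delimiter, so the pattern doesn't depend on greedy matching happening to stop in the right place.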
If you find regex difficult, you can divide the problem into smaller pieces and tackle them one at a time:
data = '%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?'
pieces = data.split('^')  # Divide into pieces, one of which contains the name
for piece in pieces:
    if '/' in piece:
        last, the_rest = piece.split('/')
        first, initial = the_rest.split()
        print('Name:', first, initial, last)
    elif piece.startswith('%B'):
        print('Card no:', piece[2:])
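Another option, closer to the find() code in the question: str.find accepts an optional start index, so the second ^ can be located by searching again from just past the first one. A sketch with the same caveats as above: the field positions come from the sample string, and treating the four digits after the second ^ as YYMM is an assumption.

```python
data = "%B8954756016548963^LAST/FIRST INITIAL^180912345678912345678901234?;8954756016548963=180912345678912345678901234?"

first_caret = data.find("^")
second_caret = data.find("^", first_caret + 1)  # start searching after the first caret

card_number = data[data.find("B") + 1:first_caret]
name = data[first_caret + 1:second_caret]
last_name, _, first_name = name.partition("/")  # partition never raises, unlike unpacking split()
expiry = data[second_caret + 1:second_caret + 5]  # assumed YYMM

print(card_number, last_name, first_name, expiry)
# 8954756016548963 LAST FIRST INITIAL 1809
```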
I want to crawl a webpage for some information. What I've done so far works, but I need to make a request to another URL from the website. I'm trying to format that URL, but it's not working. This is what I have so far:
import requests
from lxml import html

name = input("> ")
page = requests.get("http://www.mobafire.com/league-of-legends/champions")
tree = html.fromstring(page.content)
for index, champ in enumerate(champ_list):
    if name == champ:
        y = tree.xpath(".//*[@id='browse-build']/a[{}]/@href".format(index + 1))
        print(y)
        guide = requests.get("http://www.mobafire.com{}".format(y))
        builds = html.fromstring(guide.content)
        print(builds)
        for title in builds.xpath(".//table[@class='browse-table']/tr[2]/td[2]/div[1]/a/text()"):
            print(title)
The user enters a name; if it matches one in a list (champ_list), the script prints a URL and then formats it into the guide request to get more information, but I'm getting errors such as "invalid IPv6".
This is the output URL (one of them, but they're all similar): ['/league-of-legends/champion/ivern-133']
I tried slicing, but it doesn't do anything; probably I'm using it wrong, or it doesn't work in this case. I also tried replace, but it doesn't work on lists, so I tried using it like this:
y = [y.replace("'", "") for y in y] to see whether it would at least remove the quotes, but that didn't work either. What other approach could format this properly?
I take it y is the list you want to insert into the string?
Try this:
"http://www.mobafire.com{}".format('/'.join(y))
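Why the original request failed: xpath() returns a list, and formatting a list into a string inserts its repr, brackets and quotes included. The host part of the resulting URL then contains an unmatched [, which urllib.parse rejects as an invalid IPv6 address. A sketch that uses the string inside the list instead; build_guide_url is a hypothetical helper, and the network call is left out so it runs offline.

```python
from urllib.parse import urlsplit

def build_guide_url(hrefs, base="http://www.mobafire.com"):
    """Build a full URL from the list returned by tree.xpath(...)."""
    if not hrefs:
        raise ValueError("XPath found no matching link")
    return base + hrefs[0]  # use the string inside the list, not the list itself

y = ['/league-of-legends/champion/ivern-133']  # what the XPath returned in the question

broken_url = "http://www.mobafire.com{}".format(y)
print(broken_url)  # http://www.mobafire.com['/league-of-legends/champion/ivern-133']
try:
    urlsplit(broken_url)
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)  # Invalid IPv6 URL

print(build_guide_url(y))  # http://www.mobafire.com/league-of-legends/champion/ivern-133
```

For a single-element list, '/'.join(y) and y[0] give the same result; y[0] makes the intent explicit, and the empty-list check covers a champion whose link the XPath doesn't find.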
I have an app that shows images from reddit. Some image links come like this, http://imgur.com/Cuv9oau, when I need them to look like this, http://i.imgur.com/Cuv9oau.jpg: just add "i." at the beginning of the host and ".jpg" at the end.
You can use a string replace:
s = "http://imgur.com/Cuv9oau"
s = s.replace("//imgur", "//i.imgur")+(".jpg" if not s.endswith(".jpg") else "")
This sets s to:
'http://i.imgur.com/Cuv9oau.jpg'
This function should do what you need. I expanded on @jh314's answer, made the code a little less compact, and checked that the URL starts with http://imgur.com, because that code would cause issues with other URLs, like the Google search I included. It also replaces only the first instance, since replacing every instance could cause issues.
def fixImgurLinks(url):
    if url.lower().startswith("http://imgur.com"):
        url = url.replace("http://imgur", "http://i.imgur", 1)  # Only replace the first instance.
        if not url.endswith(".jpg"):
            url += ".jpg"
    return url

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print(fixImgurLinks(u))
Gives:
http://i.imgur.com/Cuv9oau.jpg
http://www.google.com/search?q=http://imgur
You could also use Python's regular expressions to insert the "i.". As for the ".jpg", you can just append it.
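A regex version of that idea, using re.sub with a pattern anchored to the whole string so other URLs (like the Google search above) are left alone. Like the answers above, it assumes every imgur link should become a .jpg, which may not hold for PNGs or GIFs; fix_imgur_link is a hypothetical name.

```python
import re

# http:// or https://, a bare imgur.com host, the image id, and an optional existing .jpg
IMGUR = re.compile(r"^(https?://)imgur\.com/(\w+)(?:\.jpg)?$")

def fix_imgur_link(url):
    """Rewrite imgur.com/<id> to i.imgur.com/<id>.jpg; leave anything else unchanged."""
    return IMGUR.sub(r"\1i.imgur.com/\2.jpg", url)

for u in ["http://imgur.com/Cuv9oau", "http://www.google.com/search?q=http://imgur"]:
    print(fix_imgur_link(u))
# http://i.imgur.com/Cuv9oau.jpg
# http://www.google.com/search?q=http://imgur
```

Anchoring with ^ and $ does the same job as the startswith check in the function above, and the optional (?:\.jpg)? group avoids appending a second extension.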