Pass statement if previous magnet URL is the same - python

I'm writing a simple script in Python 2.7 that receives multiple data sets every second over UDP, places each data set into its own magnet URL, and opens it.
Many times, a data set can be the same as the previous one(s), and therefore I don't want to open the same magnet URL multiple times.
Here is a portion of my code:
while True:
    var = s.recv(30)
    url = "magnet://myhost.com/{0}".format(var)
    os.startfile(url)
As an example, I can receive the following data sets:
a
a
a
b
b
a
a
e
e
e
Essentially, if two data sets are the same, then the same magnet URLs are produced. In the example above, I would like it to open the first magnet URL (a), but skip (pass) the next two a's. Then open the first b URL but skip the next b. If data set a is sent again, then open the first one but skip the following a's. So on and so forth.
I'm guessing that I could use an if/else and a pass statement for this, but I'm not sure how. Any ideas?

OK, if you only need to skip a value when it is the same as the previous one, just use a simple variable to keep track of it:
old = None
while True:
    var = s.recv(30)
    if var != old:
        old = var
        url = "magnet://myhost.com/{0}".format(var)
        os.startfile(url)

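You can construct a set of previously seen items. Note that this opens each distinct URL only once overall, so an a that arrives again after a b is skipped too: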
seen = set()
while True:
    var = s.recv(30)
    if var not in seen:
        url = "magnet://myhost.com/{0}".format(var)
        os.startfile(url)
        seen.add(var)
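To see how the two answers differ, here is a minimal sketch that runs both strategies over the example sequence from the question instead of a live socket. The function names skip_repeats and skip_seen are made up for this illustration; the real loop would call os.startfile where this sketch yields a value.

```python
def skip_repeats(items):
    """Yield an item only when it differs from the one just before it."""
    previous = object()  # sentinel that never equals a received value
    for item in items:
        if item != previous:
            previous = item
            yield item


def skip_seen(items):
    """Yield an item only the first time it ever appears."""
    seen = set()
    for item in items:
        if item not in seen:
            seen.add(item)
            yield item


# The example data sets from the question, in arrival order.
received = ["a", "a", "a", "b", "b", "a", "a", "e", "e", "e"]

print(list(skip_repeats(received)))  # ['a', 'b', 'a', 'e']
print(list(skip_seen(received)))     # ['a', 'b', 'e']
```

The first variant matches the behaviour described in the question (a is opened again after b); the second suits the case where a URL should never be opened twice.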

Related

Replace multiple try codes with loop function - searching in HTML

I am trying to download files from a website. The URL is in the HTML of the page and I am able to find the right one, but the various videos have different frame rates (fps). I had multiple try blocks, but that was difficult to follow, so I tried a loop instead.
This is what I have:
import re
for [j] in range(23,24,25,26,27,28,29,30,31):
    try:
        result = re.search('"width":1280,"mime":"video/mp4","fps":[j],"url":(.*),', s)
        extractedword=result.group(1)
        commapos=extractedword.find(",")
        link=extractedword[:commapos].replace('"',"")
    except:
        pass
print(title)
print(link)
The error message is: range expected at most 3 arguments, got 9
Any advice how I can search for the correct URL? I've been trying to get this done for days. Thank you!
EDIT:
I should add that the URL for a given title exists in only one FPS at the set resolution. The various titles exist in a variety of FPS, but each title is only available in one FPS for the required resolution. Some of the suggested solutions return "download error retrying" in a loop.
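Use this code instead:
import re
s = ""  # Put what s equals here
for j in range(23, 32):
    try:
        # Insert the current frame rate into the pattern; "[j]" would only match a literal "j"
        pattern = '"width":1280,"mime":"video/mp4","fps":{},"url":(.*),'.format(j)
        result = re.search(pattern, s)
        extractedword = result.group(1)
        commapos = extractedword.find(",")
        link = extractedword[:commapos].replace('"', "")
    except AttributeError:
        # re.search returned None: no match for this frame rate
        pass
    else:
        print(title)  # Also make sure to define title somewhere
        print(link)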
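range takes at most three arguments: start, stop, step, and stop is exclusive. So instead try this:
# other variables
import re
for j in range(23, 32):  # 23 through 31 inclusive
    try:
        # Insert the current frame rate into the pattern; "[j]" would only match a literal "j"
        pattern = '"width":1280,"mime":"video/mp4","fps":{},"url":(.*),'.format(j)
        result = re.search(pattern, s)
        extractedword = result.group(1)
        commapos = extractedword.find(",")
        link = extractedword[:commapos].replace('"', "")
    except AttributeError:
        # re.search returned None: no match for this frame rate
        pass
    else:
        print(title)
        print(link)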
You really do not need range or even a loop here. There are several issues:
- Not calling range correctly; its arguments are start, stop, step
- Never using j as part of your regex ("[j]" matches the literal character j)
You are concerned that frame rates vary and that the regex may fail to match when they do. A simpler, more elegant solution is to update your regex to match all the frame rates that may be present.
2[3-9]|3[0-1]
The above regex will match all numbers from 23 to 31. If we specify this as a capturing group, we can also access this via our search later if we want to store the frame rate.
import re
s = '"width":1280,"mime":"video/mp4","fps":25,"url":test.com'
# Raw string; "/" needs no escaping in a Python regex
result = re.search(r'"width":1280,"mime":"video/mp4","fps":(2[3-9]|3[0-1]),"url":(.*)', s)
result.group(1)
#'25'
result.group(2)
#'test.com'
From here you can modify the output however you want. For a more in-depth explanation of the regex, see the step-by-step breakdown at regex101.
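As a sketch of how this could be wrapped up for the original use case, the helper below returns both the frame rate and the link. It assumes the URL in the page data is a quoted string (the question's code strips quotes, which suggests this, but the real page is not shown); find_video and the sample text are made up for the illustration.

```python
import re

# Assumed shape of the page data: "url" holds a double-quoted string.
VIDEO_PATTERN = re.compile(
    r'"width":1280,"mime":"video/mp4","fps":(2[3-9]|3[01]),"url":"([^"]*)"'
)


def find_video(s):
    """Return (fps, url) for the 1280-wide MP4 entry, or None if absent."""
    match = VIDEO_PATTERN.search(s)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


# Hypothetical sample input, not taken from the real site.
sample = '{"width":1280,"mime":"video/mp4","fps":24,"url":"https://example.com/v.mp4","id":7}'
print(find_video(sample))  # (24, 'https://example.com/v.mp4')
```

Matching the URL with [^"]* stops at the closing quote, so there is no need to search for the next comma afterwards.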

Can you explain how lines 8 and 9 of the code worked?

I do not understand how lines 8 and 9 in the code below work. If someone could describe these two lines, the code would be easy for me to understand.
Below is the code:
import requests
from bs4 import BeautifulSoup
session = requests.session()
form_page = session.get('http://www.educationboardresults.gov.bd')
form = BeautifulSoup(form_page.content, 'lxml')
#Line 8:
captcha = eval(form.form.table.table.find_all('tr')[6].find_all('td')[1].get_text())
#Line 9:
data = dict(sr=3,et=0,exam='ssc', year='2011', board="comilla", roll="16072541", reg="8718001254", value_s=captcha)
An HTML table is built like this:
a bunch of rows <tr>, and each row has some columns <td>.
What the captcha line does is:
1) find_all('tr'): get all rows (<tr>)
2) [6]: get the 7th row specifically
3) find_all('td'): inside that row, get all the columns (<td>)
4) [1]: get the second column specifically
We now have a table cell with a single value in it.
5) get_text(): get the actual text content of that cell.
You can read the dots "x.y" as "return y from x"
Now, eval() will execute this table cell value as if it was a part of the code. Whatever value that execution returns is stored in the captcha variable.
eval("print('hello')") is the same as print('hello')
The data line just builds a dictionary. I'm not sure I understand the names used, but you can call members by name with a dictionary, like data['sr'] which will then return 3.
data['value_s'] stores the value of captcha
How line 8 works is that it allows the owner of the resource you are reading (at http://www.educationboardresults.gov.bd) to execute arbitrary code on your machine.
For example, if the owner were to put __import__("shutil").rmtree("/", True) in the table, then they've just managed to toast every file you have permission to delete.
So, you may wish to consider rewriting line 8 entirely.
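One possible rewrite, as a sketch only: if the cell holds a small arithmetic puzzle such as 5 + 3 (an assumption, since the cell's contents are not shown here), the text can be parsed with the standard ast module and evaluated by hand, so that anything other than numbers and basic operators is rejected. The name safe_arithmetic is made up for this example.

```python
import ast
import operator

# Only these binary operators are accepted.
_ALLOWED_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
}


def safe_arithmetic(text):
    """Evaluate a small arithmetic expression without running arbitrary code."""

    def _eval(node):
        if isinstance(node, ast.Expression):
            return _eval(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _ALLOWED_OPS:
            return _ALLOWED_OPS[type(node.op)](_eval(node.left), _eval(node.right))
        # Function calls, names, attribute access and so on all end up here.
        raise ValueError("unsupported expression: {!r}".format(text))

    return _eval(ast.parse(text.strip(), mode="eval"))


print(safe_arithmetic("5 + 3"))  # 8
```

With this in place, line 8 could pass the cell text to safe_arithmetic instead of eval; a malicious cell such as the rmtree example raises ValueError rather than running.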

Python - Thread, Sending 20 request - First to arrive, First to serve?

I want to send, say, 20 requests to a site, and as soon as the first one returns a value, use that value and continue with my code. In other words, whenever a value arrives (a string or whatever), the code should carry on. However, I got stuck. So far I have only been able to send one request:
my_key = load["MyKey"]  # My own key for the website.
website_key = webkey
url = webUrl
client = MyClient(my_key)
task = Task(url, website_key)
values = client.createTask(task)
values.join()
value = values.get_response()
print(value)
So basically .join() waits for the value from the website, and get_response() then returns it once it is ready. However, when I do this, it only does one request and then ends.
What I want is to send, say, 25 of them, and as soon as one gets the value, stop the others and continue (or end the program).
What would be the best solution for that?
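One way to approach this, sketched with the standard concurrent.futures module: submit every request to a thread pool and take whichever finishes first. The MyClient/Task API from the question is not known here, so fake_request below is a hypothetical stand-in; in real code the worker would wrap createTask, join and get_response. first_result is also a made-up name.

```python
import concurrent.futures
import time


def first_result(worker, args_list):
    """Run worker(arg) for every arg in parallel threads and return the
    result of the first call that finishes without raising."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=len(args_list))
    futures = [pool.submit(worker, arg) for arg in args_list]
    try:
        for future in concurrent.futures.as_completed(futures):
            if future.exception() is None:
                return future.result()
        raise RuntimeError("every request failed")
    finally:
        # Cancel calls that have not started; do not block on running ones.
        for future in futures:
            future.cancel()
        pool.shutdown(wait=False)


def fake_request(delay):
    """Hypothetical stand-in for one createTask/join/get_response round trip."""
    time.sleep(delay)
    return delay


print(first_result(fake_request, [0.5, 0.05, 0.3]))  # 0.05
```

Threads that are already running cannot be killed, so the slower requests still finish in the background (and the interpreter waits for them at exit); the main code simply continues as soon as the first value is in.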

Python - Format a string from a list not working

I want to crawl a webpage for some information. What I've done so far works, but I need to make a request to another URL on the website. I'm trying to format that URL but it's not working. This is what I have so far:
name = input("> ")
page = requests.get("http://www.mobafire.com/league-of-legends/champions")
tree = html.fromstring(page.content)
for index, champ in enumerate(champ_list):
    if name == champ:
        y = tree.xpath(".//*[@id='browse-build']/a[{}]/@href".format(index + 1))
        print(y)
        guide = requests.get("http://www.mobafire.com{}".format(y))
        builds = html.fromstring(guide.content)
        print(builds)
        for title in builds.xpath(".//table[@class='browse-table']/tr[2]/td[2]/div[1]/a/text()"):
            print(title)
From the input, the user enters a name; if the name matches one from a list (champ_list), it prints a URL, formats that URL into the request for the guide variable, and fetches more information from there. But I'm getting errors such as invalid ipv6.
This is the output URL (one of them, but they're all similar anyway): ['/league-of-legends/champion/ivern-133']
I tried using slicing but it doesn't do anything; probably I'm using it wrong or it doesn't work in this case. I tried using replace as well, but it doesn't work on lists. I tried using it as:
y = [y.replace("'", "") for y in y] so I could see if it at least removed the quotes, but it didn't work either. What would be another approach to format this properly?
I take it y is the list you want to insert into the string?
Try this:
"http://www.mobafire.com{}".format('/'.join(y))

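To make the problem with the question's formatting visible without any network access, here is a small sketch using the output value quoted in the question. Formatting the list itself puts its bracketed repr into the URL, which is most likely what triggers the invalid ipv6 error, since square brackets in the host part are reserved for IPv6 addresses. The names base, broken and fixed are made up for the illustration.

```python
base = "http://www.mobafire.com{}"

# xpath() returns a list, even when there is only one match.
y = ['/league-of-legends/champion/ivern-133']

broken = base.format(y)     # the list's repr, brackets and quotes included
fixed = base.format(y[0])   # the first (here: only) matching href

print(broken)  # http://www.mobafire.com['/league-of-legends/champion/ivern-133']
print(fixed)   # http://www.mobafire.com/league-of-legends/champion/ivern-133
```

For a one-element list, '/'.join(y) and y[0] give the same string; y[0] makes it explicit that a single href is expected.
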
Python - How to check if the name from file is used?

I have a small scraping script. I have a file with 2000 names and I use these names to search for video IDs on YouTube. Because of the amount, it takes a pretty long time to get all the IDs, so I can't do it in one go. What I want is to find where my last scrape ended and then start from that position. What is the best way to do this? I was thinking about adding each used name to a list and then checking whether a name is in the list before scraping it, but maybe there's a better way to do this? (I hope so.)
This is the part that takes a name from the file and scrapes the ID. What I want is that when I quit scraping and start again later, it continues from the point where it ended last time instead of from the beginning:
index = 0
for name in itertools.islice(f, index, None):
    parameters = {'key': api_key, 'q': name}
    request_url = requests.get('https://www.googleapis.com/youtube/v3/search?part=snippet&maxResults=1&type=video&fields=items%2Fid', params = parameters)
    videoid = json.loads(request_url.text)
    if 'error' in videoid:
        pass
    else:
        index += 1
        id_file.write(videoid['items'][0]['id']['videoId'] + '\n')
        print videoid['items'][0]['id']['videoId']
You could just remember the index number of the last scraped entry. Every time you finish scraping one entry, increment a counter, then assuming the entries in your text file don't change order, just pick up again at that number?
The simplest answer here is probably mitim's answer. Just keep a file that you rewrite with the last-processed index after each line. For example:
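import itertools
import os

savepath = os.path.expanduser('~/.myprogram.lines')
skiplines = 0
try:
    with open(savepath) as f:
        skiplines = int(f.read())
except (IOError, ValueError):
    pass  # no usable progress file yet, so start from the first line
with open('names.txt') as f:
    for linenumber, line in itertools.islice(enumerate(f), skiplines, None):
        do_stuff(line)
        # Save the number of lines finished so far, so that the next run
        # skips this line too instead of processing it again
        with open(savepath, 'w') as progress:
            progress.write(str(linenumber + 1))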
However, there are other ways you could do this that might make more sense for your use case.
For example, you could rewrite the "names" file after each name is processed to remove the first line. Or, maybe better, preprocess the list into an anydbm (or even sqlite3) database, so you can more easily remove (or mark) names once they're done.
Or, if you might run against different files, and need to keep a progress for each one, you could store a separate .lines file for each one (probably in a ~/.myprogram directory, rather than flooding the top-level home directory), or use an anydbm mapping pathnames to lines done.
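As a rough sketch of the sqlite3 idea mentioned above: load the names into a table once, then mark each one as done after it has been scraped, so that a later run only sees what is still pending. The table layout and the function names (open_progress_db, add_names, pending_names, mark_done) are all made up for this illustration.

```python
import sqlite3


def open_progress_db(path):
    """Open (or create) the progress database at the given path."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS names ("
        "name TEXT PRIMARY KEY, done INTEGER NOT NULL DEFAULT 0)"
    )
    return conn


def add_names(conn, names):
    """Insert names that are not in the table yet; existing rows keep their state."""
    conn.executemany(
        "INSERT OR IGNORE INTO names (name) VALUES (?)", [(n,) for n in names]
    )
    conn.commit()


def pending_names(conn):
    """Return the names not yet marked done, in insertion order."""
    rows = conn.execute("SELECT name FROM names WHERE done = 0 ORDER BY rowid")
    return [row[0] for row in rows]


def mark_done(conn, name):
    """Record that a name has been processed."""
    conn.execute("UPDATE names SET done = 1 WHERE name = ?", (name,))
    conn.commit()


conn = open_progress_db(":memory:")  # use a file path to persist between runs
add_names(conn, ["alpha", "beta", "gamma"])
mark_done(conn, "alpha")
print(pending_names(conn))  # ['beta', 'gamma']
```

Unlike the line counter, this keeps working if the names file is reordered or extended, because progress is tracked per name rather than per position.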
