Python - Put a list in a threading module

I want to put a list into my threading script, but I am facing a problem.
Contents of list file (example):
http://google.com
http://yahoo.com
http://bing.com
http://python.org
My script:
import codecs
import threading
import sys
import requests
from time import time as timer
from timeout import timeout
import time

try:
    with codecs.open(sys.argv[1], mode='r', encoding='ascii', errors='ignore') as iiz:
        iiz = iiz.read().splitlines()
except IOError:
    pass

oz = list(iiz)

def nnn(url):
    hzz = {'param1': sys.argv[2], 'param2': sys.argv[3]}
    po = requests.post(url, data=hzz)
    if po:
        print("ok \n")

if __name__ == '__main__':
    threads = []
    for i in range(1):
        t = threading.Thread(target=nnn, args=(oz,))
        threads.append(t)
        t.start()

Can you please elaborate on exactly what you're trying to achieve?
I'm guessing that you're trying to request the URLs so they load in a web browser or in the terminal...
Also, you shouldn't need to put the URLs into another list: when you read the file, splitlines() already turned its contents into a list. In other words, the contents of iiz are already in list form.
Personally, I haven't worked much with the modules you're using (apart from time), but I'll try my best to help you and hopefully other users will try and help you too.
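If the goal is to POST to each URL concurrently, a minimal sketch (one thread per URL, reusing your nnn() worker and the same sys.argv payload; purely illustrative, not a drop-in fix) could look like this:

import sys
import threading
import requests

def nnn(url):
    # POST the two command-line parameters to a single URL
    hzz = {'param1': sys.argv[2], 'param2': sys.argv[3]}
    po = requests.post(url, data=hzz)
    if po:
        print("ok")

if __name__ == '__main__':
    # read the URL list file given as the first argument
    with open(sys.argv[1], encoding='ascii', errors='ignore') as f:
        urls = f.read().splitlines()

    threads = []
    for url in urls:
        t = threading.Thread(target=nnn, args=(url,))  # one URL per thread
        threads.append(t)
        t.start()

    for t in threads:
        t.join()  # wait for all the requests to finish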

Related

Save live data from data logger to csv file by python

I have a data logger to record the temperature. I want to save these readings together with the epoch time in a csv file. I tried the following code; there is no error reported, but the csv file stays empty. Can anyone help me figure out the problem?
import board
import busio
import adafruit_mcp9600
import time

i2c = busio.I2C(board.SCL, board.SDA, frequency=100000)
mcp = adafruit_mcp9600.MCP9600(i2c, 0x60, tctype="J")

with open("/home/pi/Documents/test.csv", "a") as log:
    while True:
        temp = mcp.temperature
        temptime = time.time()
        log.write("{0},{1}\n".format(str(temptime), str(temp)))
        time.sleep(1)
Assuming those libraries you're importing are working correctly, I think this happens because the writer is not flushing the buffer, so it looks as if nothing is being written.
The solution would be to call log.flush() after each write.
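Applied to the loop from the question (same sensor setup as above), the fix is a single extra line; this is just a sketch of the idea:

with open("/home/pi/Documents/test.csv", "a") as log:
    while True:
        temp = mcp.temperature
        temptime = time.time()
        log.write("{0},{1}\n".format(temptime, temp))
        log.flush()  # push the buffered line to disk immediately
        time.sleep(1)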
Try a simpler example:
A)
import time

def go():
    i = 0
    with open("/home/some/dir/test.csv", "a") as nice:
        while True:
            nice.write(f"hello,{i},{time.time()}\n")
            i += 1
            time.sleep(5)

if __name__ == "__main__":
    go()
versus
B)
import time

def go():
    i = 0
    while True:
        with open("/home/some/dir/test.csv", "a") as nice:
            nice.write(f"hello,{i},{time.time()}\n")
        i += 1
        time.sleep(5)

if __name__ == "__main__":
    go()
When I refresh the file in case A, new rows do not appear to be written. They are in case B, though.
If I modify case A) and add nice.flush() after each write, it fixes the issue.
The above two blocks are just to demonstrate what you're seeing. I'm not suggesting you do one or the other. Ultimately, I would not suggest doing anything like this, and I would instead use the logging package and configure a proper logger if you're indeed trying to create log files.
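As a rough illustration of that suggestion, here is a minimal sketch with the standard logging module (the file path and format are placeholders, and the temperature value stands in for mcp.temperature):

import logging
import time

# a basic file logger; the file handler flushes each record as it is emitted
logging.basicConfig(
    filename="/home/pi/Documents/test.csv",
    format="%(message)s",
    level=logging.INFO,
)

while True:
    temp = 23.5  # placeholder for mcp.temperature
    logging.info("%s,%s", time.time(), temp)
    time.sleep(1)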

Python 3 urllib: 530 too many connections, in loop

I am retrieving data files from an FTP server in a loop with the following code:
response = urllib.request.urlopen(url)
data = response.read()
response.close()
compressed_file = io.BytesIO(data)
gin = gzip.GzipFile(fileobj=compressed_file)
Retrieving and processing the first few works fine, but after a few requests I am getting the following error:
530 Maximum number of connections exceeded.
I tried closing the connection (see code above) and using a sleep() timer, but neither worked. What am I doing wrong here?
Trying to make urllib do FTP properly makes my brain hurt. By default, it creates a new connection for each file, apparently without ensuring that the connections actually get closed.
ftplib is more appropriate I think.
Since I happen to be working on the same data you are(were)... Here is a very specific answer decompressing the .gz files and passing them into ish_parser (https://github.com/haydenth/ish_parser).
I think it is also clear enough to serve as a general answer.
import ftplib
import io
import gzip

import ish_parser  # from: https://github.com/haydenth/ish_parser

ftp_host = "ftp.ncdc.noaa.gov"
parser = ish_parser.ish_parser()

# identifies what data to get
USAF_ID = '722950'
WBAN_ID = '23174'
YEARS = range(1975, 1980)

with ftplib.FTP(host=ftp_host) as ftpconn:
    ftpconn.login()

    for year in YEARS:
        ftp_file = "pub/data/noaa/{YEAR}/{USAF}-{WBAN}-{YEAR}.gz".format(USAF=USAF_ID, WBAN=WBAN_ID, YEAR=year)
        print(ftp_file)

        # read the whole file and save it to a BytesIO (stream)
        response = io.BytesIO()
        try:
            ftpconn.retrbinary('RETR ' + ftp_file, response.write)
        except ftplib.error_perm as err:
            if str(err).startswith('550 '):
                print('ERROR:', err)
            else:
                raise

        # decompress and parse each line
        response.seek(0)  # jump back to the beginning of the stream
        with gzip.open(response, mode='rb') as gzstream:
            for line in gzstream:
                parser.loads(line.decode('latin-1'))
This does read the whole file into memory, which could probably be avoided using some clever wrappers and/or yield or something... but works fine for a year's worth of hourly weather observations.
Probably a pretty nasty workaround, but this worked for me. I made a script (here called test.py) which does the request (see code above). The code below is used in the loop I mentioned and calls test.py:
from subprocess import call

with open('log.txt', 'a') as f:
    call(['python', 'test.py', args[0], args[1]], stdout=f)

How to decode python string

I have some code which I would like to decode, but I'm not having much luck guessing which codepage is being used, if any. Any help would be much appreciated.
I am using the Python command line on a Windows 7 PC. If any Python guru can guide me on how to decrypt and see the code, that would be appreciated.
exec("import re;import base64");exec((lambda p,y:(lambda o,b,f:re.sub(o,b,f))(r"([0-9a-f]+)",lambda m:p(m,y),base64.b64decode("NTQgYgo1NCA3CjU0IDMKNTQgMWUKNTQgOQo1NCAxOAozZiAgICAgICA9IGIuMTAoKQoxNiAgID0gIjQzOi8vMTIuM2QvNGMvMWQuMjUuZi00ZC4zZSIKYSA9ICIxZC4yNS5mIgoyYSA4KDYpOgoJMzMgMy5jKCczYy4yZSglNTIpJyAlIDYpID09IDEKCjJhIDE1KDM1KToKCTUgPSAzLjQoMWUuNS4xYygnMTM6Ly8yZC8xZicsJzMwJykpCgkyMyA1CgkyMSA9IDcuMTQoKQoJMjEuMzgoIjEwIDI4IiwiMjAgMTAuLiIsJycsICczNiA0MCcpCgkxMT0xZS41LjFjKDUsICdlLjNlJykKCTM5OgoJCTFlLjFhKDExKQoJMWI6CgkJMmMKCQk5LmUoMzUsIDExLCAyMSkKCQkyID0gMy40KDFlLjUuMWMoJzEzOi8vMmQnLCcxZicpKQoJCTIzIDIKCQkyMS4zNCgwLCIiLCAiM2IgNDciKQoJCTE4LjQ4KDExLDIsMjEpCgkJCgkJMy41MygnMjIoKScpOyAKCQkzLjUzKCcyNigpJyk7CgkJMy41MygiNDUuZCgpIik7IAoJCTE5PTcuMzcoKTsgMTkuNTAoIjMyISIsIjJmIDNhIDQ5IDQxIDI5IiwiICAgWzI0IDQ2XTMxIDUxIDRhIDRlIDE3LjNkWy8yNF0iKQoJCSIiIgoJCTM5OgoJCQkxZS4xYSgxMSkKCQkxYjoKCQkJMmMKCQkJIzI3KCkKCQk0Mjo0NCgpCgkJIiIiCgoyYSAyYigpOgoJNGYgNGIgOChhKToKCQkxNSgxNikKCQoKCjJiKCk=")))(lambda a,b:b[int("0x"+a.group(1),16)],"0|1|addonfolder|xbmc|translatePath|path|script_name|xbmcgui|script_chk|downloader|scriptname|xbmcaddon|getCondVisibility|UpdateLocalAddons|download|supermax|Addon|lib|supermaxwizard|special|DialogProgress|INSTALL|website|SuperMaxWizard|extract|dialog|remove|except|join|plugin|os|addons|Installing|dp|UnloadSkin|print|COLOR|video|ReloadSkin|FORCECLOSE|Installer|Installed|def|Main|pass|home|HasAddon|SuperMax|packages|Brought|Success|return|update|url|Please|Dialog|create|try|Wizard|Nearly|System|com|zip|addon|Wait|been|else|http|quit|XBMC|gold|Done|all|has|You|not|sm|MP|By|if|ok|To|s|executebuiltin|import".split("|")))
The code is obfuscated. You can deobfuscate it yourself by evaluating the contents of the exec(...) call in your Python shell:
import re
import base64
print ((lambda p,y.....split("|")))
EDIT: As snakecharmerb says, it is generally not safe to execute unknown code. I analysed the code to find that running the insides of exec will only decrypt, and leaving off the exec itself will just result in a string. This procedure ("execute stuff inside exec") is by no means a generally safe method to decrypt uglified code, and you need to actually analyse what it does. But, at this point, I was asking you to trust my judgement, which, if it is wrong, theoretically could expose you to an attack. In addition, it seems you have problems getting it to run on your Python; so here's what I'm getting from the above:
import xbmcaddon
import xbmcgui
import xbmc
import os
import downloader
import extract

addon = xbmcaddon.Addon()
website = "http://supermaxwizard.com/sm/plugin.video.supermax-MP.zip"
scriptname = "plugin.video.supermax"

def script_chk(script_name):
    return xbmc.getCondVisibility('System.HasAddon(%s)' % script_name) == 1

def INSTALL(url):
    path = xbmc.translatePath(os.path.join('special://home/addons', 'packages'))
    print path
    dp = xbmcgui.DialogProgress()
    dp.create("Addon Installer", "Installing Addon..", '', 'Please Wait')
    lib = os.path.join(path, 'download.zip')
    try:
        os.remove(lib)
    except:
        pass
    downloader.download(url, lib, dp)
    addonfolder = xbmc.translatePath(os.path.join('special://home', 'addons'))
    print addonfolder
    dp.update(0, "", "Nearly Done")
    extract.all(lib, addonfolder, dp)

    xbmc.executebuiltin('UnloadSkin()');
    xbmc.executebuiltin('ReloadSkin()');
    xbmc.executebuiltin("XBMC.UpdateLocalAddons()");
    dialog = xbmcgui.Dialog(); dialog.ok("Success!", "SuperMax Wizard has been Installed", " [COLOR gold]Brought To You By SuperMaxWizard.com[/COLOR]")
    """
    try:
        os.remove(lib)
    except:
        pass
        #FORCECLOSE()
    else:quit()
    """

def Main():
    if not script_chk(scriptname):
        INSTALL(website)

Main()

Python URLLib does not work with PyQt + Multiprocessing

A simple code as such:
import urllib2
import requests
from PyQt4 import QtCore
import multiprocessing
import time

data = (
    ['a', '2'],
)

def mp_worker((inputs, the_time)):
    r = requests.get('http://www.gpsbasecamp.com/national-parks')
    request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
    response = urllib2.urlopen(request)

def mp_handler():
    p = multiprocessing.Pool(2)
    p.map(mp_worker, data)

if __name__ == '__main__':
    mp_handler()
Basically, if I import PyQt4 and make a urllib request (I believe this is used by almost all web-extraction libraries such as BeautifulSoup, Requests or PyQuery), it crashes with a cryptic log on my Mac.
This is exactly true. It always fails on Mac; I have wasted days trying to fix this, and honestly there is no fix as of now. The best workaround is to use threads instead of processes, and it will work like a charm.
By the way -
r = requests.get('http://www.gpsbasecamp.com/national-parks')
and
request = urllib2.Request("http://www.gpsbasecamp.com/national-parks")
response = urllib2.urlopen(request)
do one and the same thing. Why are you doing it twice?
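As a rough sketch of the thread-based suggestion: multiprocessing.dummy provides the same Pool API backed by threads, so the change is mostly the import (the worker below is simplified to Python 3 style and only keeps the requests call from the question):

from multiprocessing.dummy import Pool  # same interface as multiprocessing.Pool, but thread-based
import requests

data = (
    ['a', '2'],
)

def mp_worker(args):
    inputs, the_time = args
    # the request now runs in a thread of the main process,
    # so the fork-related crash on macOS is avoided
    r = requests.get('http://www.gpsbasecamp.com/national-parks')
    return r.status_code

def mp_handler():
    p = Pool(2)
    print(p.map(mp_worker, data))

if __name__ == '__main__':
    mp_handler()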
This may be due to _scproxy.get_proxies() not being fork-safe on Mac.
This is raised here https://bugs.python.org/issue33725#msg329926
_scproxy has been known to be problematic for some time, see for instance Issue31818. That issue also gives a simple workaround: setting urllib's "no_proxy" environment variable to "*" will prevent the calls to the System Configuration framework.
This is something urllib may be attempting to do, which would cause the failure when multiprocessing is involved.
There is a workaround, which is to set the environment variable no_proxy to *.
E.g. export no_proxy=*
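If you prefer to set it from inside the script rather than in the shell (just a sketch of the same idea), set it before any workers are spawned:

import os

# disable the macOS System Configuration proxy lookup that is not fork-safe
os.environ["no_proxy"] = "*"

# ... then create the multiprocessing.Pool and issue the urllib/requests calls as usual ...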

Requests with multiple connections

I use the Python Requests library to download a big file, e.g.:
r = requests.get("http://bigfile.com/bigfile.bin")
content = r.content
The big file downloads at +- 30 Kb per second, which is a bit slow. Every connection to the bigfile server is throttled, so I would like to make multiple connections.
Is there a way to make multiple connections at the same time to download one file?
You can use the HTTP Range header to fetch just part of the file (already covered for Python here).
Just start several threads and fetch different range with each and you're done ;)
import threading
import urllib2

chunk_size = 1024 * 1024  # bytes per request (value is illustrative)

def download(url, start):
    req = urllib2.Request(url)
    req.headers['Range'] = 'bytes=%s-%s' % (start, start + chunk_size - 1)
    f = urllib2.urlopen(req)
    parts[start] = f.read()

threads = []
parts = {}
url = 'http://www.python.org/'

# Initialize threads
for i in range(0, 10):
    t = threading.Thread(target=download, args=(url, i * chunk_size))
    t.start()
    threads.append(t)

# Join threads back (order doesn't matter, you just want them all)
for i in threads:
    i.join()

# Sort parts and you're done
result = ''.join(parts[i] for i in sorted(parts.keys()))
Also note that not every server supports the Range header (and servers where a PHP script is responsible for serving the data, in particular, often don't implement it).
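Since the question uses Requests, roughly the same idea with requests instead of urllib2 might look like this (only a sketch; the URL is the placeholder from the question and the chunk size is arbitrary):

import threading
import requests

url = "http://bigfile.com/bigfile.bin"
chunk_size = 1024 * 1024  # 1 MiB per request, illustrative
parts = {}

def download(start):
    headers = {'Range': 'bytes=%d-%d' % (start, start + chunk_size - 1)}
    r = requests.get(url, headers=headers)
    parts[start] = r.content

# fetch the first 10 chunks concurrently
threads = [threading.Thread(target=download, args=(i * chunk_size,)) for i in range(10)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# reassemble the chunks in order
result = b''.join(parts[start] for start in sorted(parts))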
Here's a Python script that saves a given URL to a file and uses multiple threads to download it:
#!/usr/bin/env python
import sys
from functools import partial
from itertools import count, izip
from multiprocessing.dummy import Pool  # use threads
from urllib2 import HTTPError, Request, urlopen

def download_chunk(url, byterange):
    req = Request(url, headers=dict(Range='bytes=%d-%d' % byterange))
    try:
        return urlopen(req).read()
    except HTTPError as e:
        return b'' if e.code == 416 else None  # treat range error as EOF
    except EnvironmentError:
        return None

def main():
    url, filename = sys.argv[1:]
    pool = Pool(4)  # define number of concurrent connections
    chunksize = 1 << 16
    ranges = izip(count(0, chunksize), count(chunksize - 1, chunksize))
    with open(filename, 'wb') as file:
        for s in pool.imap(partial(download_chunk, url), ranges):
            if not s:
                break  # error or EOF
            file.write(s)
            if len(s) != chunksize:
                break  # EOF (servers with no Range support end up here)

if __name__ == "__main__":
    main()
The end of file is detected if the server returns an empty body or a 416 HTTP code, or if the response size is not exactly chunksize.
It supports servers that don't understand the Range header (in that case everything is downloaded in a single request; to support large files, change download_chunk() to save to a temporary file and return the filename to be read in the main thread instead of the file content itself).
It lets you independently change the number of concurrent connections (pool size) and the number of bytes requested in a single HTTP request.
To use multiple processes instead of threads, change the import:
from multiprocessing.pool import Pool # use processes (other code unchanged)
This solution requires the Linux utility "aria2c", but it has the advantage of easily resuming downloads.
It also assumes that all the files you want to download are listed in the HTTP directory listing for the location MY_HTTP_LOC. I tested this script against an instance of the lighttpd/1.4.26 HTTP server, but you can easily modify it to work with other setups.
#!/usr/bin/python
import os
import urllib
import re
import subprocess

MY_HTTP_LOC = "http://AAA.BBB.CCC.DDD/"

# retrieve webpage source code
f = urllib.urlopen(MY_HTTP_LOC)
page = f.read()
f.close()

# extract relevant URL segments from source code
rgxp = '(\<td\ class="n"\>\<a\ href=")([0-9a-zA-Z\(\)\-\_\.]+)(")'
results = re.findall(rgxp, str(page))
files = []
for match in results:
    files.append(match[1])

# download (using aria2c) files
for afile in files:
    if os.path.exists(afile) and not os.path.exists(afile + '.aria2'):
        print 'Skipping already-retrieved file: ' + afile
    else:
        print 'Downloading file: ' + afile
        subprocess.Popen(["aria2c", "-x", "16", "-s", "20", MY_HTTP_LOC + str(afile)]).wait()
You could use a module called pySmartDL for this; it uses multiple threads, can do a lot more, and shows a download progress bar by default.
For more info, check this answer.
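A minimal usage sketch, assuming pySmartDL is installed (pip install pySmartDL); the URL and destination path are placeholders:

from pySmartDL import SmartDL

url = "http://bigfile.com/bigfile.bin"  # placeholder URL from the question
dest = "./bigfile.bin"                  # local destination path

obj = SmartDL(url, dest)  # splits the download across several threads by default
obj.start()               # blocks until the download finishes, showing a progress bar

print(obj.get_dest())     # path of the downloaded file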
