How to append to a list using map function? - python

I'm trying to use map to apply the ping_all function to a list of hosts.
The problem I'm having is that inside the ping_all function I'm trying to append all failed hosts to a list. Normally I would call ping_all, pass in an empty list as an argument, and return the modified list, but since I'm using map here I'm not sure how to achieve that.
import os
import argparse
import json
import requests
from subprocess import check_output
from multiprocessing import Pool

parser = argparse.ArgumentParser(description='test')
args = parser.parse_args()

dead_hosts = []

def gather_hosts():
    """ Returns all environments from opsnode and puts them in a dict """
    host_list = []
    url = 'http://test.com/hosts.json'
    opsnode = requests.get(url)
    content = json.loads(opsnode.text)
    for server in content["host"]:
        if server.startswith("ip-10-12") and server.endswith(".va.test.com"):
            host_list.append(str(server))
    return host_list

def try_ping(hostnames):
    try:
        hoststatus = check_output(["ping", "-c 1", hostnames])
        print "Success:", hostnames
    except:
        print "\033[1;31mPing Failed:\033[1;m", hostnames
        global dead_hosts
        dead_hosts.append(hostnames)

def show_dead_hosts(dead_hosts):
    print '\033[1;31m******************* Following Hosts are Unreachable ******************* \n\n\033[1;m'
    for i in dead_hosts:
        print '\033[1;31m{0} \033[1;m'.format(i)

if __name__ == '__main__':
    hostnames = gather_hosts()
    pool = Pool(processes=30)  # process per core
    pool.map(try_ping, hostnames, dead_hosts)
    show_dead_hosts(dead_hosts)
I tried passing dead_hosts as a second argument to map, but after running this script dead_hosts remains an empty list; it does not appear that the failed hosts are being appended to the list.
What am I doing wrong?

There are several issues with your code:
The third argument to Pool.map is the chunksize, so passing dead_hosts (a list) is definitely incorrect.
You can't usefully update globals from a multiprocessing Pool, because each task in the pool runs in a separate process with its own copy of the module's globals (see the short demo after this list, and Python multiprocessing global variable updates not returned to parent for more details).
Related to the previous point, Pool.map should return a result list (since global side-effects will be mostly invisible). Right now you're just calling it and throwing away the result.
Your format codes weren't properly clearing in my terminal, so everything was turning bold+red...
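As a quick illustration of the second point, here's a tiny standalone demo (separate from your script) showing that a list appended to inside Pool workers stays empty in the parent process, while return values do come back:
from multiprocessing import Pool

results = []

def work(x):
    results.append(x)   # lands in the worker process's copy of the list
    return x * 2

if __name__ == '__main__':
    pool = Pool(2)
    print(pool.map(work, range(4)))   # [0, 2, 4, 6] - the return values come back
    print(results)                    # [] - the parent's list was never modified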
Here's a version that I've updated and tested—I think it does what you want:
import os
import argparse
import json
import requests
from subprocess import check_output
from multiprocessing import Pool

parser = argparse.ArgumentParser(description='test')
args = parser.parse_args()

def gather_hosts():
    """ Returns all environments from opsnode and puts them in a dict """
    host_list = []
    url = 'http://test.com/hosts.json'
    opsnode = requests.get(url)
    content = json.loads(opsnode.text)
    for server in content["host"]:
        if server.startswith("ip-10-12") and server.endswith(".va.test.com"):
            host_list.append(str(server))
    return host_list

def try_ping(host):
    try:
        hoststatus = check_output(["ping", "-c 1", "-t 1", host])
        print "Success:", host
        return None
    except:
        print "\033[1;31mPing Failed:\033[0m", host
        return host

def show_dead_hosts(dead_hosts):
    print '\033[1;31m******************* Following Hosts are Unreachable ******************* \n\n\033[0m'
    for x in dead_hosts:
        print '\033[1;31m{0} \033[0m'.format(x)

def main():
    hostnames = gather_hosts()
    pool = Pool(processes=30)  # process per core
    identity = lambda x: x
    dead_hosts = filter(identity, pool.map(try_ping, hostnames))
    show_dead_hosts(dead_hosts)

if __name__ == '__main__':
    main()
The main change that I've made is that try_ping either returns None on success, or the host's name on failure. The pings are done in parallel by your task pool, and the results are aggregated into a new list. I run a filter over the list to get rid of all of the None values (None is "falsey" in Python), leaving only the hostnames that failed the ping test.
You'll probably want to get rid of the print statements in try_ping. I'm assuming you just had those for debugging.
You could also consider using imap and ifilter if you need more asynchrony.
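For example, a rough sketch of that variant (Python 2 itertools names, reusing gather_hosts, try_ping and show_dead_hosts from the code above): imap hands results back lazily as workers finish, and ifilter drops the None values without building an intermediate list.
from itertools import ifilter

def main():
    hostnames = gather_hosts()
    pool = Pool(processes=30)
    dead_hosts = list(ifilter(None, pool.imap(try_ping, hostnames)))
    show_dead_hosts(dead_hosts)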

Your try_ping function doesn't actually return anything. If I were you, I wouldn't keep dead_hosts outside the function; I'd build the list inside try_ping and then return it.
I'm not familiar with the modules you're using, so I don't know whether pool.map can take lists.
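For what it's worth, a minimal sketch of that idea without multiprocessing (an interpretation of the suggestion, not the original code): the list lives inside the function and is returned to the caller.
from subprocess import check_output, CalledProcessError

def ping_all(hostnames):
    dead_hosts = []                        # local list instead of a global
    for host in hostnames:
        try:
            check_output(["ping", "-c", "1", host])
        except CalledProcessError:
            dead_hosts.append(host)
    return dead_hosts                      # the caller receives the failures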

Related

How to know how many threads / workers from a pool in multiprocessing (Python module) has been completed?

I am using impala-shell to compute some stats over a text file containing table names.
I am using the Python multiprocessing module to pool the processes.
The thing is, the task is very time consuming, so I need to keep track of how many files have been completed to see the job's progress.
So let me give you some ideas about the functions that I am using.
job_executor is the function that takes a list of tables and performs the task.
main() is the function that takes the file location and the number of executors (pool_workers), converts the file of table names into a list of tables, and does the multiprocessing.
I want to see the progress, i.e. how much of the file has been processed by job_executor, but I can't find a solution. Using a counter also doesn't work.
def job_executor(text):
    impala_cmd = "impala-shell -i %s -q 'compute stats %s.%s'" % (impala_node, db_name, text)
    impala_cmd_res = os.system(impala_cmd)  # runs impala command
    # checks for execution type (success or fail)
    if impala_cmd_res == 0:
        print ("invalidated the metadata.")
    else:
        print("error while performing the operation.")

def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)
    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()  # this will return list of all the tables in the file.
    process_pool = Pool(NUM_OF_EXECUTORS)
    try:
        process_pool.map(job_executor, text_file_rows)
        process_pool.close()
        process_pool.join()
    except Exception:
        process_pool.terminate()
        process_pool.join()

def parse_args():
    """
    function to take scraping arguments from test_hr.sh file
    """
    parser = argparse.ArgumentParser(description='Main Process file that will start the process and session too.')
    parser.add_argument("text_file_path",
                        help='provide text file path/location to be read. ')  # text file path
    parser.add_argument("pool_executors",
                        help='please provide pool executors as an initial argument')  # pool_executor path
    return parser.parse_args()  # returns list/tuple of all arguments.

if __name__ == "__main__":
    mail_message_start()
    main(parse_args())
    mail_message_end()
If you insist on needlessly doing it via multiprocessing.pool.Pool(), the easiest way to keep track of what's going on is to use a non-blocking mapping (i.e. multiprocessing.pool.Pool.map_async()):
def main(args):
    text_file_path = args.text_file_path
    NUM_OF_EXECUTORS = int(args.pool_executors)
    with open(text_file_path, 'r') as text_file_reader:
        text_file_rows = text_file_reader.read().splitlines()
    total_processes = len(text_file_rows)  # keep the number of lines for reference
    process_pool = Pool(NUM_OF_EXECUTORS)
    try:
        print('Processing {} lines.'.format(total_processes))
        processing = process_pool.map_async(job_executor, text_file_rows)
        processes_left = total_processes  # number of processing lines left
        while not processing.ready():  # start a loop to wait for all to finish
            if processes_left != processing._number_left:
                processes_left = processing._number_left
                print('Processed {} out of {} lines...'.format(
                    total_processes - processes_left, total_processes))
            time.sleep(0.1)  # let it breathe a little, don't forget to `import time`
        print('All done!')
        process_pool.close()
        process_pool.join()
    except Exception:
        process_pool.terminate()
        process_pool.join()
This will check every 100 ms whether some of the processes have finished, and if anything changed since the last check it will print the number of lines processed so far. If you need more insight into what's going on with your subprocesses, you can use shared structures like multiprocessing.Queue() or multiprocessing.Manager() to report directly from within your processes.
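If you do want reports straight from the workers, here is a rough sketch (an assumption on my part, not part of the code above; the job body is a placeholder) using a multiprocessing.Manager() queue that each worker writes to when it finishes a table:
from multiprocessing import Pool, Manager

def job_executor(args):
    table, progress = args
    # ... run the real impala-shell command for `table` here ...
    progress.put(table)                 # tell the parent this table is done

def run_all(tables, num_executors=4):
    manager = Manager()
    progress = manager.Queue()          # proxy queue, safe to pass to pool workers
    pool = Pool(num_executors)
    result = pool.map_async(job_executor, [(t, progress) for t in tables])
    done = 0
    while done < len(tables):           # assumes every job reports exactly once
        progress.get()                  # blocks until a worker reports in
        done += 1
        print('Processed {} out of {} tables'.format(done, len(tables)))
    result.wait()
    pool.close()
    pool.join()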

Loop to check if a variable has changed in Python

I have just learned the basics of Python, and I am trying to make a few projects so that I can increase my knowledge of the programming language.
Since I am rather paranoid, I created a script that uses PycURL to fetch my current IP address every x seconds, for VPN security. Here is my code [EDITED]:
import requests

enterIP = str(input("What is your current IP address?"))

def getIP():
    while True:
        try:
            result = requests.get("http://ipinfo.io/ip")
            print(result.text)
        except KeyboardInterrupt:
            print("\nProccess terminated by user")
            return result.text

def checkIP():
    while True:
        if enterIP == result.text:
            pass
        else:
            print("IP has changed!")

getIP()
checkIP()
Now I would like to expand the idea, so that the script asks the user to enter their current IP, saves that IP as a string, then uses a loop to keep checking it against the PycURL function to make sure that their IP hasn't changed. The only problem is that I am completely stumped; I cannot come up with a function that would take the output of PycURL and compare it to a string. How could I achieve that?
As #holdenweb explained, you do not need pycurl for such a simple task, but nevertheless, here is a working example:
import pycurl
import time
from StringIO import StringIO

def get_ip():
    buffer = StringIO()
    c = pycurl.Curl()
    c.setopt(pycurl.URL, "http://ipinfo.io/ip")
    c.setopt(c.WRITEDATA, buffer)
    c.perform()
    c.close()
    return buffer.getvalue()

def main():
    initial = get_ip()
    print 'Initial IP: %s' % initial
    try:
        while True:
            current = get_ip()
            if current != initial:
                print 'IP has changed to: %s' % current
            time.sleep(300)
    except KeyboardInterrupt:
        print("\nProccess terminated by user")

if __name__ == '__main__':
    main()
As you can see, I moved the logic of getting the IP into a separate function, get_ip, and added a few missing things, like capturing the response into a string buffer and returning it. Otherwise it is pretty much the same as the first example in the pycurl quickstart.
The main function is called at the bottom, when the script is run directly (not imported).
First it calls get_ip to get the initial IP, then runs the while loop, which checks whether the IP has changed and lets you know if so.
EDIT:
Since you changed your question, here is your new code in a working example:
import requests

def getIP():
    result = requests.get("http://ipinfo.io/ip")
    return result.text

def checkIP():
    initial = getIP()
    print("Initial IP: {}".format(initial))
    while True:
        current = getIP()
        if initial == current:
            pass
        else:
            print("IP has changed!")

checkIP()
As I mentioned in the comments above, you do not need two loops; one is enough. You don't even need two functions, but it's better to have them: one for getting the data and one for the loop. In the latter, first get the initial value, then run the loop, inside which you check whether the value has changed.
It seems, from reading the pycurl documentation, that you would find it easier to solve this problem using the requests library. Curl is more oriented toward file transfer, so the library expects you to provide a file-like object into which it writes the contents. This would greatly complicate your logic.
requests allows you to access the text of the server's response directly:
>>> import requests
>>> result = requests.get("http://ipinfo.io/ip")
>>> result.text
'151.231.192.8\n'
As #PeterWood suggested, a function would be more appropriate than a class for this - or if the script is going to run continuously, just a simple loop as the body of the program.
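A minimal sketch of that simple-loop idea with requests (the one-minute poll interval is an arbitrary choice):
import time
import requests

initial = requests.get("http://ipinfo.io/ip").text.strip()
print("Initial IP: {}".format(initial))
while True:
    current = requests.get("http://ipinfo.io/ip").text.strip()
    if current != initial:
        print("IP has changed to: {}".format(current))
    time.sleep(60)   # poll once a minute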

Return a non-repeating, random result from a list inside a function

I am writing a Python program that responds to requests and vocalizes a response back to the user. Below is a sample of two functions. How can I do this without using a global variable and still get back a non-repeating, random response?
# stores prior response
website_result = 'first_response.wav'

def launch_website():
    # if service is offline return with default msg otherwise launch service
    if is_connected() == 'FALSE':
        arg = 'this_service_is_offline.wav'
        return arg
    else:
        site = 'http://www.somesite.com'
        launch_it(site)
        return launch_website_response()

def launch_website_response():
    # using the global variable inside function
    global website_result
    # possible responses
    RESPONSES = ['first_response.wav', 'second_response.wav', 'third_response.wav']
    # ensures a non-repeating response
    tmp = random.choice(RESPONSES)
    while website_result == tmp:
        tmp = random.choice(RESPONSES)
    website_result = tmp
    return website_result
Your website_result variable indicates that you need some sort of persistent state. Maybe you could consider storing it in a text file, reading it every time you need it, and changing it afterward (this works if you don't make too many calls; otherwise you will run into I/O limitations).
I don't know the specifics of your application, but you might also be able to make your two functions take website_result as an argument, as suggested by #JGut.
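For example, a rough sketch of the argument-passing idea (an interpretation, not your original code): the caller holds on to the last response and hands it to the function, which hands back a different one.
import random

RESPONSES = ['first_response.wav', 'second_response.wav', 'third_response.wav']

def launch_website_response(previous):
    choice = random.choice(RESPONSES)
    while choice == previous:            # re-draw until it differs from last time
        choice = random.choice(RESPONSES)
    return choice

# the caller keeps the state between calls:
last_response = launch_website_response(None)
last_response = launch_website_response(last_response)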

Does python fabric support dynamic set env.hosts?

I want to change env.hosts dynamically because sometimes I want to deploy to one machine first, check that it's OK, and then deploy to many machines.
Currently I need to set env.hosts up front; how can I set env.hosts inside a function rather than globally at script start?
Yes you can set env.hosts dynamically. One common pattern we use is:
from fabric.api import env

def staging():
    env.hosts = ['XXX.XXX.XXX.XXX', ]

def production():
    env.hosts = ['YYY.YYY.YYY.YYY', 'ZZZ.ZZZ.ZZZ.ZZZ', ]

def deploy():
    # Do something...
    pass
You would use this to chain the tasks such as fab staging deploy or fab production deploy.
Kind of late to the party, but I achieved this with EC2 like so. Note that in EC2 you generally do not know what the IP/hostname will be, so you almost have to go dynamic to account for how the environment/systems could come up (another option would be to use dyndns, but this would still be useful then):
from fabric.api import *
import datetime
import time
import urllib2
import ConfigParser
from platform_util import *

config = ConfigParser.RawConfigParser()

@task
def load_config(configfile=None):
    '''
    ***REQUIRED*** Pass in the configuration to use - usage load_config:</path/to/config.cfg>
    '''
    if configfile != None:
        # Load up our config file
        config.read(configfile)

        # Key/secret needed for aws interaction with boto
        # (anyone help figure out a better way to do this with sub modules, please don't say classes :-) )
        global aws_key
        global aws_sec

        aws_key = config.get("main","aws_key")
        aws_sec = config.get("main","aws_sec")

        # Stuff for fabric
        env.user = config.get("main","fabric_ssh_user")
        env.key_filename = config.get("main","fabric_ssh_key_filename")
        env.parallel = config.get("main","fabric_default_parallel")

        # Load our role definitions for fabric
        for i in config.sections():
            if i != "main":
                hostlist = []
                if config.get(i,"use-regex") == 'yes':
                    for x in get_running_instances_by_regex(aws_key,aws_sec,config.get(i,"security-group"),config.get(i,"pattern")):
                        hostlist.append(x.private_ip_address)
                    env.roledefs[i] = hostlist
                else:
                    for x in get_running_instances(aws_key,aws_sec,config.get(i,"security-group")):
                        hostlist.append(x.private_ip_address)
                    env.roledefs[i] = hostlist

                if config.has_option(i,"base-group"):
                    if config.get(i,"base-group") == 'yes':
                        print "%s is a base group" % i
                        print env.roledefs[i]
                        # env["basegroups"][i] = True
where get_running_instances and get_running_instances_by_regex are utility functions that make use of boto (http://code.google.com/p/boto/)
ex:
import logging
import re
from boto.ec2.connection import EC2Connection
from boto.ec2.securitygroup import SecurityGroup
from boto.ec2.instance import Instance
from boto.s3.key import Key

########################################
# B-O-F get_instances
########################################
def get_instances(access_key=None, secret_key=None, security_group=None):
    '''
    Get all instances. Only within a security group if specified; their state (running/stopped/etc) doesn't matter.
    '''
    logging.debug('get_instances()')
    conn = EC2Connection(aws_access_key_id=access_key, aws_secret_access_key=secret_key)

    if security_group:
        sg = SecurityGroup(connection=conn, name=security_group)
        instances = sg.instances()
        return instances
    else:
        instances = conn.get_all_instances()
        return instances
Here is a sample of what my config looked like:
# Config file for fabric toolset
#
# This specific configuration is for <whatever> related hosts
#
#
[main]
aws_key = <key>
aws_sec = <secret>
fabric_ssh_user = <your_user>
fabric_ssh_key_filename = /path/to/your/.ssh/<whatever>.pem
fabric_default_parallel = 1
#
# Groupings - Fabric knows them as roledefs (check env dict)
#
# Production groupings
[app-prod]
security-group = app-prod
use-regex = no
pattern =
[db-prod]
security-group = db-prod
use-regex = no
pattern =
[db-prod-masters]
security-group = db-prod
use-regex = yes
pattern = mysql-[d-s]01
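For completeness, a short sketch (the task name is made up, not part of the original setup) of how a task can then target one of those groups once load_config has populated env.roledefs:
from fabric.api import task, roles, run

@task
@roles('app-prod')                 # hosts come from env.roledefs['app-prod']
def check_uptime():
    run('uptime')

# usage: fab load_config:/path/to/config.cfg check_uptime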
Yet another new answer to an old question. :) But I just recently found myself attempting to dynamically set hosts, and really have to disagree with the main answer. My idea of dynamic, or at least what I was attempting to do, was take an instance DNS-name that was just created by boto, and access that instance with a fab command. I couldn't do fab staging deploy, because the instance doesn't exist at fabfile-editing time.
Fortunately, fabric does support truly dynamic host assignment with execute. (It's possible this didn't exist when the question was first asked, of course, but now it does.) execute allows you to define both a function to be called and the env.hosts it should use for that command. For example:
def create_EC2_box(data=fab_base_data):
    conn = boto.ec2.connect_to_region(region)
    reservations = conn.run_instances(image_id=image_id, ...)
    ...
    return instance.public_dns_name

def _ping_box():
    run('uname -a')
    run('tail /var/log/cloud-init-output.log')

def build_box():
    box_name = create_EC2_box(fab_base_data)
    new_hosts = [box_name]
    # new_hosts = ['ec2-54-152-152-123.compute-1.amazonaws.com'] # testing
    execute(_ping_box, hosts=new_hosts)
Now I can do fab build_box, and it will fire one boto call that creates an instance, and another fabric call that runs on the new instance - without having to define the instance-name at edit-time.

Dnspython: Setting query timeout/lifetime

I have a small script that checks a large list of domains for their MX records. Everything works fine, but when the script finds a domain with no record, it takes quite a long time to skip to the next one.
I have tried adding:
query.lifetime = 1.0
or
query.timeout = 1.0
but this doesn't seem to do anything. Does anyone know how this setting is configured?
My script is below, thanks for your time.
import dns.resolver
from dns.exception import DNSException
import dns.query
import csv

domains = csv.reader(open('domains.csv', 'rU'))
output = open('output.txt', 'w')

for row in domains:
    try:
        domain = row[0]
        query = dns.resolver.query(domain,'MX')
        query.lifetime = 1.0
    except DNSException:
        print "nothing here"

    for rdata in query:
        print domain, " ", rdata.exchange, 'has preference', rdata.preference
        output.writelines(domain)
        output.writelines(",")
        output.writelines(rdata.exchange.to_text())
        output.writelines("\n")
You're setting the timeout after you've already performed the query. So that's not gonna do anything!
What you want to do instead is create a Resolver object, set its timeout, and then call its query() method. dns.resolver.query() is just a convenience function that instantiates a default Resolver object and invokes its query() method, so you need to do that manually if you don't want a default Resolver.
resolver = dns.resolver.Resolver()
resolver.timeout = 1
resolver.lifetime = 1
Then use this in your loop:
try:
    domain = row[0]
    query = resolver.query(domain,'MX')
except:
    # etc.
You should be able to use the same Resolver object for all queries.
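Putting it together, here is a sketch of how your loop could look with a single shared Resolver (keeping the Python 2 prints and file handling from your script):
import csv
import dns.resolver
from dns.exception import DNSException

resolver = dns.resolver.Resolver()
resolver.timeout = 1
resolver.lifetime = 1

domains = csv.reader(open('domains.csv', 'rU'))
output = open('output.txt', 'w')

for row in domains:
    domain = row[0]
    try:
        query = resolver.query(domain, 'MX')
    except DNSException:
        print "nothing here"
        continue                     # give up on this domain after about a second
    for rdata in query:
        print domain, " ", rdata.exchange, 'has preference', rdata.preference
        output.writelines(domain)
        output.writelines(",")
        output.writelines(rdata.exchange.to_text())
        output.writelines("\n")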
