Nail down system requirements: MIN_CPU_COUNT, MIN_RAM - python

I develop a server component with Python.
I want to nail down the system requirements:
MIN_CPU_COUNT
MIN_RAM
...
Is there a way (maybe in setup.py) to define something like this?
My software needs at least N CPUs and M RAM?
Why? Because we had trouble in the past because operators moved the server component to a less capable server and we could not ensure the service-level agreement.

I implemented it this way; feedback is welcome:
from django.conf import settings
import psutil
import humanfriendly
from djangotools.utils import testutils
class Check(testutils.Check):

    @testutils.skip_if_not_prod
    def test_min_cpu_count(self):
        min_cpu_count = getattr(settings, 'MIN_CPU_COUNT', None)
        self.assertIsNotNone(min_cpu_count, 'settings.MIN_CPU_COUNT not set. Please supply a value.')
        self.assertLessEqual(min_cpu_count, psutil.cpu_count())

    @testutils.skip_if_not_prod
    def test_min_physical_memory(self):
        min_physical_memory_orig = getattr(settings, 'MIN_PHYSICAL_MEMORY', None)
        self.assertIsNotNone(min_physical_memory_orig, "settings.MIN_PHYSICAL_MEMORY not set. Please supply a value. Example: MIN_PHYSICAL_MEMORY='4G'")
        min_physical_memory_bytes = humanfriendly.parse_size(min_physical_memory_orig)
        self.longMessage = False
        self.assertLessEqual(min_physical_memory_bytes, psutil.virtual_memory().total,
                             'settings.MIN_PHYSICAL_MEMORY=%r is not satisfied. Total physical memory of current hardware: %r' % (
                                 min_physical_memory_orig, humanfriendly.format_size(psutil.virtual_memory().total)))
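For readers without the in-house djangotools helpers, the same checks can be sketched as a plain unittest (the thresholds here are hypothetical, not from the original code):

import unittest

import humanfriendly
import psutil

MIN_CPU_COUNT = 4             # hypothetical thresholds
MIN_PHYSICAL_MEMORY = '4G'

class SystemRequirementsCheck(unittest.TestCase):
    def test_min_cpu_count(self):
        self.assertLessEqual(MIN_CPU_COUNT, psutil.cpu_count())

    def test_min_physical_memory(self):
        required = humanfriendly.parse_size(MIN_PHYSICAL_MEMORY)
        self.assertLessEqual(required, psutil.virtual_memory().total)

if __name__ == '__main__':
    unittest.main()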

Related

How to see the "Bundle" output of Hypothesis Python library? (Stateful testing)

When using the hypothesis library and performing stateful testing, how can I see or output the Bundle "services" the library is trying on my code?
Example
import hypothesis.strategies as st
from hypothesis.strategies import integers
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule, precondition
class test_servicediscovery(RuleBasedStateMachine):
    services = Bundle('services')

    @rule(target=services, s=st.integers(min_value=0, max_value=2))
    def add_service(self, s):
        return s
The question is: how do I print / see the Bundle "services" variable, generated by the library?
In the example you've given, the services bundle isn't actually being tried on your code: you're adding things to it, but never using them as inputs to another rule.
If you do consume from the bundle, running Hypothesis in verbose mode will show all inputs as they happen; even in normal mode, failing examples will print all the values used.
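For instance (a sketch; the consuming rule and class names are mine, not the asker's), drawing from the bundle in a second rule and enabling verbose mode makes Hypothesis print each step it tries:

from hypothesis import settings, Verbosity
from hypothesis.stateful import Bundle, RuleBasedStateMachine, rule
import hypothesis.strategies as st

class ServiceMachine(RuleBasedStateMachine):
    services = Bundle('services')

    @rule(target=services, s=st.integers(min_value=0, max_value=2))
    def add_service(self, s):
        return s

    @rule(s=services)  # hypothetical rule that consumes values from the bundle
    def use_service(self, s):
        print('drew from bundle:', s)

# Attach verbose settings so each step is printed as it runs.
TestServices = ServiceMachine.TestCase
TestServices.settings = settings(verbosity=Verbosity.verbose)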

Get full computer name from a network drive letter in python

I am using Python to populate a table with the file paths of a number of stored files. However, the path needs to contain the full network drive computer name, not just the drive letter, i.e.
//ComputerName/folder/subfolder/file
not
P:/folder/subfolder/file
I have investigated the win32api, win32file, and os.path modules, but nothing looks like it's able to do it. I need something like win32api.GetComputerName(), but with the ability to pass a known drive letter as an argument and have it return the computer name the letter is mapped to.
So, is there any way in Python to look up a drive letter and get back the computer name?
Network drives are mapped using the Windows Networking API that's exported by mpr.dll (multiple provider router). You can create a network drive via WNetAddConnection2. To get the remote path that's associated with a local device, call WNetGetConnection. You can do this using ctypes as follows:
import ctypes
from ctypes import wintypes
mpr = ctypes.WinDLL('mpr')
ERROR_SUCCESS = 0x0000
ERROR_MORE_DATA = 0x00EA
wintypes.LPDWORD = ctypes.POINTER(wintypes.DWORD)
mpr.WNetGetConnectionW.restype = wintypes.DWORD
mpr.WNetGetConnectionW.argtypes = (wintypes.LPCWSTR,
                                   wintypes.LPWSTR,
                                   wintypes.LPDWORD)

def get_connection(local_name):
    length = (wintypes.DWORD * 1)()
    result = mpr.WNetGetConnectionW(local_name, None, length)
    if result != ERROR_MORE_DATA:
        raise ctypes.WinError(result)
    remote_name = (wintypes.WCHAR * length[0])()
    result = mpr.WNetGetConnectionW(local_name, remote_name, length)
    if result != ERROR_SUCCESS:
        raise ctypes.WinError(result)
    return remote_name.value
For example:
>>> subprocess.call(r'net use Y: \\live.sysinternals.com\tools')
The command completed successfully.
0
>>> print(get_connection('Y:'))
\\live.sysinternals.com\tools
I think you just need to look at more of pywin32... As you can see here, there is already an API that converts local drive names to full UNC paths.
For completeness, here is some code that works for me.
import win32wnet
import sys
print(win32wnet.WNetGetUniversalName(sys.argv[1], 1))
And this gives me something like this when I run it:
C:\test>python get_unc.py i:\some\path
\\machine\test_share\some\path
You could run net use and parse the output.
I'm posting this from my mobile, but I'll improve this answer when I'm in front of a real computer.
Here are some links that can help in the meantime:
https://docs.python.org/2/library/subprocess.html#module-subprocess
https://technet.microsoft.com/en-us/library/gg651155.aspx
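In the meantime, here's a rough sketch of that approach (the helper name is mine, and the parsing is naive: it assumes English-locale net use output on Windows):

import subprocess

def unc_for_drive(letter):
    """Naive 'net use' parser (hypothetical helper; assumes English output)."""
    drive = letter.rstrip(':').upper() + ':'
    output = subprocess.check_output(['net', 'use']).decode('mbcs', 'replace')
    for line in output.splitlines():
        parts = line.split()
        # A typical data row: "OK    P:    \\server\share    Microsoft Windows Network"
        if len(parts) >= 3 and parts[1].upper() == drive:
            return parts[2]
    return None

# e.g. unc_for_drive('P')  ->  '\\\\server\\share'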
My answer to a similar question:
Here's how to do it in Python ≥ 3.4, with no dependencies!*
from pathlib import Path
def unc_drive(file_path):
    return str(Path(file_path).resolve())
*Note: I just found a situation in which this method fails. One of my company's network shares has permissions set up such that this method raises a PermissionError. In that case, win32wnet.WNetGetUniversalName is a suitable fallback.
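Putting the two approaches together, a small sketch (assuming pywin32 is installed) that tries pathlib first and falls back to win32wnet on PermissionError:

from pathlib import Path

import win32wnet

def unc_drive(file_path):
    """Resolve a mapped path to UNC; fall back to the Win32 API on PermissionError."""
    try:
        return str(Path(file_path).resolve())
    except PermissionError:
        return win32wnet.WNetGetUniversalName(str(file_path), 1)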
If you just need the hostname, you can use the socket module:
socket.gethostname()
or you may want to use the os module:
os.uname()[1]
os.uname() returns a 5-tuple containing (sysname, nodename, release, version, machine). Note that os.uname() is only available on Unix-like systems, so it won't help on the Windows machines this question is about; socket.gethostname() works on both.

Most straightforward way to cache geocoding data

I am using geopy to get lat/long coordinates for a list of addresses. All the documentation points to limiting server queries by caching (many questions here, in fact), but few actually give practical solutions.
What is the best way to accomplish this?
This is for a self-contained data processing job I'm working on ... no app platform involved. Just trying to cut down on server queries as I run through data that I will have seen before (very likely, in my case).
My code looks like this:
from geopy import geocoders
def geocode(address):
    # address ~= "175 5th Avenue NYC"
    g = geocoders.GoogleV3()
    cache = addressCached(address)
    if cache != False:
        # We have seen this exact address before;
        # return the saved location
        return cache
    # Otherwise, get a new location from the geocoder
    location = g.geocode(address)
    saveToCache(address, location)
    return location

def addressCached(address):
    # What does this look like?

def saveToCache(address, location):
    # What does this look like?
How exactly you want to implement your cache is really dependent on what platform your Python code will be running on.
You want a pretty persistent "cache" since addresses' locations are not going to change often:-), so a database (in a key-value mood) seems best.
So in many cases I'd pick sqlite3, an excellent, very lightweight SQL engine that's part of the Python standard library -- unless perhaps I preferred, e.g., a MySQL instance that I need to have running anyway. One advantage of the latter is that it would allow multiple applications running on different nodes to share the "cache"; other DBs, both SQL and non-SQL, would also serve for that, depending on your constraints and preferences.
But if I were running on, e.g., Google App Engine, then I'd use the datastore it includes instead -- unless I had specific reasons to want to share the "cache" among multiple disparate applications, in which case I might consider alternatives such as Google Cloud SQL and Google Storage, or yet another alternative: a dedicated "cache server" GAE app of my own serving RESTful results (maybe with Endpoints?). The choice is, again, very dependent on your constraints and preferences (latency, queries-per-second sizing, etc.).
So please clarify what platform you are in, and what other constraints and preferences you have for your databasey "cache", and then the very simple code to implement that can easily be shown. But showing half a dozen different possibilities before you clarify would not be very productive.
Added: since the comments suggest sqlite3 may be acceptable, and there are a few important details best shown in code (such as, how to serialize and deserialize an instance of geopy.location.Location into/from a sqlite3 blob -- similar issues may well arise with other underlying databases, and the solutions are similar), I decided a solution example may be best shown in code. So, as the "geo cache" is clearly best implemented as its own module, I wrote the following simple geocache.py...:
import geopy
import pickle
import sqlite3
class Cache(object):
    def __init__(self, fn='cache.db'):
        self.conn = conn = sqlite3.connect(fn)
        cur = conn.cursor()
        cur.execute('CREATE TABLE IF NOT EXISTS '
                    'Geo ( '
                    'address STRING PRIMARY KEY, '
                    'location BLOB '
                    ')')
        conn.commit()

    def address_cached(self, address):
        cur = self.conn.cursor()
        cur.execute('SELECT location FROM Geo WHERE address=?', (address,))
        res = cur.fetchone()
        if res is None:
            return False
        return pickle.loads(res[0])

    def save_to_cache(self, address, location):
        cur = self.conn.cursor()
        cur.execute('INSERT INTO Geo(address, location) VALUES(?, ?)',
                    (address, sqlite3.Binary(pickle.dumps(location, -1))))
        self.conn.commit()

if __name__ == '__main__':
    # run a small test in this case
    import pprint

    cache = Cache('test.db')
    address = '1 Murphy St, Sunnyvale, CA'
    location = cache.address_cached(address)
    if location:
        print('was cached: {}\n{}'.format(location, pprint.pformat(location.raw)))
    else:
        print('was not cached, looking up and caching now')
        g = geopy.geocoders.GoogleV3()
        location = g.geocode(address)
        print('found as: {}\n{}'.format(location, pprint.pformat(location.raw)))
        cache.save_to_cache(address, location)
        print('... and now cached.')
I hope the ideas illustrated here are clear enough -- there are alternatives on each design choice, but I've tried to keep things simple (in particular, I'm using a simple example-cum-mini-test when this module is run directly, in lieu of a proper suite of unit-tests...).
For the bit about serializing to/from blobs, I've chosen pickle with the "highest protocol" (-1) -- cPickle of course would be just as good in Python 2 (and faster:-), but these days I try to write code that runs equally well under Python 2 and 3, unless I have specific reasons to do otherwise:-). And of course I'm using a different filename, test.db, for the sqlite database used in the test, so you can wipe it out with no qualms to test some variation, while the default filename meant to be used in "production" code stays intact (it is quite a dubious design choice to use a relative filename -- meaning "in the current directory" -- but the appropriate way to decide where to place such a file is quite platform-dependent, and I didn't want to get into such esoterica here:-).
If any other question is left, please ask (perhaps best on a separate new question since this answer has already grown so big!-).
How about creating a list or dict in which all geocoded addresses are stored? Then you could simply check:
if address in cached:
    # skip the lookup and use the cached value
This cache will live from the moment the module is loaded and won't be saved after you finish using this module. You'll probably want to save it into a file with pickle or into a database and load it next time you load the module.
from geopy import geocoders

cache = {}

def geocode(address):
    # address ~= "175 5th Avenue NYC"
    g = geocoders.GoogleV3()
    location = addressCached(address)
    if location is not None:
        # We have seen this exact address before;
        # return the saved location
        return location
    # Otherwise, get a new location from the geocoder
    location = g.geocode(address)
    saveToCache(address, location)
    return location

def addressCached(address):
    if address in cache:
        return cache[address]
    return None

def saveToCache(address, location):
    cache[address] = location
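A minimal sketch of the pickle-to-file idea mentioned above (the file name is arbitrary): load the dict at import time and dump it back out when you're done.

import os
import pickle

CACHE_FILE = 'geocache.pkl'  # arbitrary file name

def load_cache():
    # Reload the previously pickled dict, or start fresh.
    if os.path.exists(CACHE_FILE):
        with open(CACHE_FILE, 'rb') as f:
            return pickle.load(f)
    return {}

def dump_cache():
    # Call this when you're done geocoding (or after every save).
    with open(CACHE_FILE, 'wb') as f:
        pickle.dump(cache, f, -1)

cache = load_cache()  # replaces the bare `cache = {}` above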
Here is a simple implementation that uses the python shelve package for transparent and persistent caching:
import geopy
import shelve
import time
class CachedGeocoder:
    def __init__(self, source="Nominatim", geocache="geocache.db"):
        self.geocoder = getattr(geopy.geocoders, source)()
        self.db = shelve.open(geocache, writeback=True)
        self.ts = time.time() + 1.1

    def geocode(self, address):
        if address not in self.db:
            # Throttle to at most one request per second (Nominatim's policy).
            time.sleep(max(1 - (time.time() - self.ts), 0))
            self.ts = time.time()
            self.db[address] = self.geocoder.geocode(address)
        return self.db[address]

geocoder = CachedGeocoder()
print(geocoder.geocode("San Francisco, USA"))
It stores a timestamp to ensure that requests are not issued more frequently than once per second (a requirement for Nominatim). One weakness is that it doesn't deal with timed-out responses from Nominatim.
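One way to paper over that weakness (a sketch; the helper name and retry policy are mine) is to catch geopy's GeocoderTimedOut and retry a few times:

from geopy.exc import GeocoderTimedOut

def geocode_with_retry(geocoder, address, retries=3):
    # Retry a few times on timeout before giving up (the policy is arbitrary).
    for attempt in range(retries):
        try:
            return geocoder.geocode(address, timeout=10)
        except GeocoderTimedOut:
            if attempt == retries - 1:
                raise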
The easiest way to cache geocoding requests is probably to use requests-cache:
import geocoder
import requests_cache
requests_cache.install_cache('geocoder_cache')
g = geocoder.osm('Mountain View, CA') # <-- This request will go to OpenStreetMap server.
print(g.latlng)
# [37.3893889, -122.0832101]
g = geocoder.osm('Mountain View, CA') # <-- This request should be cached and return immediately
print(g.latlng)
# [37.3893889, -122.0832101]
requests_cache.uninstall_cache()
Just for debugging purposes, you could check if requests are indeed cached:
import geocoder
import requests_cache
def debug(response):
    print(type(response))
    return True
requests_cache.install_cache('geocoder_cache2', filter_fn=debug)
g = geocoder.osm('Mountain View, CA')
# <class 'requests.models.Response'>
# <class 'requests.models.Response'>
g = geocoder.osm('Mountain View, CA')
# <class 'requests_cache.models.response.CachedResponse'>
requests_cache.uninstall_cache()

Programmatically detect system-proxy settings on Windows XP with Python

I develop a critical application used by a multi-national company. Users in offices all around the globe need to be able to install this application.
The application is actually a plugin for Excel, and we have an automatic installer based on Setuptools' easy_install that ensures that all of a project's dependencies are automatically installed or updated any time a user starts Excel. It all works very elegantly, as users are seldom aware of the installation, which occurs entirely in the background.
Unfortunately we are expanding and opening new offices which all have different proxy settings. These settings seem to change from day to day so we cannot keep up with the outsourced security guys who change stuff without telling us. It sucks but we just have to work around it.
I want to programmatically detect the system-wide proxy settings on the Windows workstations our users run:
Everybody in the organisation runs Windows XP and Internet Explorer. I've verified that everybody can download our stuff from IE without problems, regardless of where they are in the world.
So all I need to do is detect what proxy settings IE is using and make Setuptools use those settings. Theoretically, all of this information should be in the Registry, but is there a better way to find it that is guaranteed not to change when people upgrade IE? For example, is there a Windows API call I can use to discover the proxy settings?
In summary:
We use Python 2.4.4 on Windows XP
We need to detect the Internet Explorer proxy settings (e.g. host, port and Proxy type)
I'm going to use this information to dynamically re-configure easy_install so that it can download the egg files via the proxy.
UPDATE0:
I forgot one important detail: each site has an auto-config "pac" file.
There's a key, Software\Microsoft\Windows\CurrentVersion\Internet Settings\AutoConfigURL, which points to an HTTP document on a local server containing what looks like a JavaScript file.
The PAC script is basically a series of nested if-statements that compare URLs against regexes and eventually return the hostname of the chosen proxy server. The script is a single JavaScript function called FindProxyForURL(url, host).
The challenge is therefore to find out, for any given server, which proxy to use. The only 100% guaranteed way to do this is to look up the PAC file and call the JavaScript function from Python.
Any suggestions? Is there a more elegant way to do this?
Here's a sample that should create a green bullet (proxy enabled) or a red one (proxy disabled) in your systray.
It shows how to read from and write to the Windows registry.
It uses GTK.
#!/usr/bin/env python
import gobject
import gtk
from _winreg import *

class ProxyNotifier:
    def __init__(self):
        self.trayIcon = gtk.StatusIcon()
        self.updateIcon()
        # set callback on right click to on_right_click
        self.trayIcon.connect('popup-menu', self.on_right_click)
        gobject.timeout_add(1000, self.checkStatus)

    def isProxyEnabled(self):
        aReg = ConnectRegistry(None, HKEY_CURRENT_USER)
        aKey = OpenKey(aReg, r"Software\Microsoft\Windows\CurrentVersion\Internet Settings")
        subCount, valueCount, lastModified = QueryInfoKey(aKey)
        for i in range(valueCount):
            try:
                n, v, t = EnumValue(aKey, i)
                if n == 'ProxyEnable':
                    return v and True or False
            except EnvironmentError:
                break
        CloseKey(aKey)

    def invertProxyEnableState(self):
        aReg = ConnectRegistry(None, HKEY_CURRENT_USER)
        aKey = OpenKey(aReg, r"Software\Microsoft\Windows\CurrentVersion\Internet Settings", 0, KEY_WRITE)
        if self.isProxyEnabled():
            val = 0
        else:
            val = 1
        try:
            SetValueEx(aKey, "ProxyEnable", 0, REG_DWORD, val)
        except EnvironmentError:
            print "Encountered problems writing into the Registry..."
        CloseKey(aKey)

    def updateIcon(self):
        if self.isProxyEnabled():
            icon = gtk.STOCK_YES
        else:
            icon = gtk.STOCK_NO
        self.trayIcon.set_from_stock(icon)

    def checkStatus(self):
        self.updateIcon()
        return True

    def on_right_click(self, data, event_button, event_time):
        self.invertProxyEnableState()
        self.updateIcon()

if __name__ == '__main__':
    proxyNotifier = ProxyNotifier()
    gtk.main()
As far as I know, in a Windows environment, if no proxy environment variables are set, proxy settings are obtained from the registry's Internet Settings section.
Isn't that enough?
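For instance, urllib's proxy detection already reads those registry settings on Windows (shown here in Python 2 style, matching the question's Python 2.4 target):

import urllib

# On Windows this consults the registry's Internet Settings
# (see urllib.getproxies_registry); elsewhere it reads the
# *_proxy environment variables.
print urllib.getproxies()
# e.g. {'http': 'http://proxy.example.com:8080'}  (illustrative output)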
Or you can get some useful info from the registry:
HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Internet Settings\ProxyServer
Edit:
Since I don't know how to format source code in a comment, I'm reposting it here:
>>> import win32com.client
>>> js = win32com.client.Dispatch('MSScriptControl.ScriptControl')
>>> js.Language = 'JavaScript'
>>> js.AddCode('function add(a, b) {return a+b;}')
>>> js.Run('add', 1, 2)
3
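Combining the two ideas from this thread, here is a rough, untested sketch (Python 2; the function name is mine) that reads the AutoConfigURL the asker mentions, downloads the PAC script, and evaluates FindProxyForURL through the same script control. Be aware that real PAC files usually call helpers such as shExpMatch or isInNet, which a bare script engine doesn't provide, so those would need to be stubbed in:

import _winreg
import urllib2
import win32com.client

def find_proxy_for(url, host):
    # Read the PAC URL from the registry (raises WindowsError if absent).
    key = _winreg.OpenKey(
        _winreg.HKEY_CURRENT_USER,
        r"Software\Microsoft\Windows\CurrentVersion\Internet Settings")
    try:
        pac_url, _ = _winreg.QueryValueEx(key, "AutoConfigURL")
    finally:
        _winreg.CloseKey(key)
    pac_source = urllib2.urlopen(pac_url).read()
    # Evaluate the PAC's FindProxyForURL via the Windows script control.
    js = win32com.client.Dispatch('MSScriptControl.ScriptControl')
    js.Language = 'JavaScript'
    js.AddCode(pac_source)
    return js.Run('FindProxyForURL', url, host)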

Unique session id in python

How do I generate a unique session id in Python?
UPDATE: 2016-12-21
A lot has happened in the last ~5 years. /dev/urandom has been updated and is now considered a high-entropy source of randomness on modern Linux kernels and distributions. In the last 6 months we've seen entropy starvation on a Linux 3.19 kernel using Ubuntu, so I don't think this issue is "resolved", but it's now sufficiently difficult to end up with low-entropy randomness when asking the OS for any amount of randomness.
I hate to say this, but none of the other solutions posted here are correct with regards to being a "secure session ID."
# pip install M2Crypto
import base64, M2Crypto
def generate_session_id(num_bytes=16):
    return base64.b64encode(M2Crypto.m2.rand_bytes(num_bytes))
Neither uuid() nor os.urandom() is a good choice for generating session IDs. Both may produce random-looking results, but random does not mean secure, due to poor entropy. See "How to Crack a Linear Congruential Generator" by Haldir or NIST's resources on Random Number Generation. If you still want to use a UUID, then use one generated with a good initial random number:
import uuid, M2Crypto
uuid.UUID(bytes=M2Crypto.m2.rand_bytes(16))
# UUID('5e85edc4-7078-d214-e773-f8caae16fe6c')
or:
# pip install pyOpenSSL
import uuid, OpenSSL
uuid.UUID(bytes = OpenSSL.rand.bytes(16))
# UUID('c9bf635f-b0cc-d278-a2c5-01eaae654461')
M2Crypto is the best OpenSSL API in Python at the moment, as pyOpenSSL appears to be maintained only to support legacy applications.
You can use the uuid library like so:
import uuid
my_id = uuid.uuid1() # or uuid.uuid4()
Python 3.6 makes most of the other answers here a bit outdated. Python 3.6 and later include the secrets module, which is designed for precisely this purpose.
If you need to generate a cryptographically secure string for any purpose on the web, refer to that module.
https://docs.python.org/3/library/secrets.html
Example:
import secrets
def make_token():
    """
    Creates a cryptographically secure, URL-safe string.
    """
    return secrets.token_urlsafe(16)
In use:
>>> make_token()
'B31YOaQpb8Hxnxv1DXG6nA'
import os, base64
def generate_session():
    return base64.b64encode(os.urandom(16))
It can be as simple as creating a random number. Of course, you'd have to store your session IDs in a database or something and check each one you generate to make sure it's not a duplicate, but the odds are it never will be if the numbers are large enough.
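A quick sketch of that idea, reusing the secrets module from the earlier answer, with an in-memory set standing in for the database:

import secrets

issued = set()  # stand-in for a database table of issued IDs

def unique_session_id():
    # Loop until we draw a token we haven't handed out before;
    # with 128 bits of randomness a collision is vanishingly unlikely.
    while True:
        token = secrets.token_urlsafe(16)
        if token not in issued:
            issued.add(token)
            return token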
