I'm trying to combine two separate scripts into one program. I need to pass one string from the first part to the second.
First:
import boto3

if __name__ == "__main__":
    bucket = 'BUCKET-NAME'
    collectionId = 'COLLECTION-ID'
    fileName = 'input.jpg'
    threshold = 70
    maxFaces = 1

    client = boto3.client('rekognition')
    response = client.search_faces_by_image(CollectionId=collectionId,
                                            Image={'S3Object': {'Bucket': bucket, 'Name': fileName}},
                                            FaceMatchThreshold=threshold,
                                            MaxFaces=maxFaces)

    faceMatches = response['FaceMatches']
    for match in faceMatches:
        print(match['Face']['FaceId'])
Second:
import boto3
from boto3.dynamodb.conditions import Key, Attr

dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('faces')

response = table.scan(
    FilterExpression=Attr('faceid').eq('FaceId')
)
items = response['Items']
print(items)
I need to pass the ID printed by print(match['Face']['FaceId']) in the first code to FaceId in the second code.
I tried to define a variable, assign the value to it, and then read it later, but I could not get it to work correctly.
Typically, you'd write your first block of code as a library/module with a function that does some unit of work and returns the result. Then the second block of code would import the first and call the function.
# lib.py
def SomeFunction(inputs):
    output = doSomething(inputs)
    return output

# main.py
import lib

data = ...
result = lib.SomeFunction(data)
moreWork(result)
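Applied to your two snippets, a rough sketch might look like this; the file and function names are just examples, while the bucket, collection and table names are copied from your code:

# rekognition_search.py
import boto3

def find_face_id(bucket, collection_id, file_name, threshold=70, max_faces=1):
    # Return the first matching FaceId, or None if there is no match
    client = boto3.client('rekognition')
    response = client.search_faces_by_image(
        CollectionId=collection_id,
        Image={'S3Object': {'Bucket': bucket, 'Name': file_name}},
        FaceMatchThreshold=threshold,
        MaxFaces=max_faces)
    matches = response['FaceMatches']
    return matches[0]['Face']['FaceId'] if matches else None

# main.py
import boto3
from boto3.dynamodb.conditions import Attr
from rekognition_search import find_face_id

face_id = find_face_id('BUCKET-NAME', 'COLLECTION-ID', 'input.jpg')
if face_id is not None:
    dynamodb = boto3.resource('dynamodb')
    table = dynamodb.Table('faces')
    response = table.scan(FilterExpression=Attr('faceid').eq(face_id))
    print(response['Items'])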
If you want two separate programs that run independently and share data, you want Inter-process communication. You can get processes to share information with each other via: a file/fifo in the filesystem; a network socket; shared memory; and STDIO (and probably more). However, IPC is definitely more work than synchronous library calls.
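For example, the simplest file-based variant could look like this (face_id.txt is just an illustrative filename):

# at the end of the first program
with open('face_id.txt', 'w') as f:
    f.write(match['Face']['FaceId'])

# at the start of the second program
with open('face_id.txt') as f:
    face_id = f.read().strip()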
I will be attaching a client to listen in on an OPC server comprising thousands of nodes that read mechanical data from various sensors. The code for the client will be written in some version of Python 3.
So essentially, the task comes down to:
Connect to the server endpoint
Iterate through all the nodes and pick up its values from the server
Store the read values in some format (not decided yet)
I've written a basic sample code, just to add some reference:
import sys
import time
import datetime
import asyncio
import json
from opcua import Client

endpoint = "opc.tcp://some-server-endpoint:portid/foobar"

# In actual DEV code, there will be about 100_000 nodes.
# Will make reading the nodes into a separate loop function
# to avoid hard coding all the node IDs.
nodes = [
    "i=1001",
    "i=1002",
    "i=1003"
]

# Connect to Endpoint (Probably will wrap this into a
# function in DEV as well but need to look at how the
# library dependencies handle the connection)
try:
    server = Client(endpoint)
    server.connect()
    print("Connection Success.")
except Exception as err:
    print("Connection Error:", err)
    sys.exit(1)

# Function to return node values, nodeID and timestamp
async def read_node(arg) -> str:
    node_conn = server.get_node(arg)
    node_value = node_conn.get_value()
    output = {
        "nodeID": str(arg),
        "value": str(node_value),
        "timestamp": str(datetime.datetime.now().strftime("%Y%m%d%H%M%S"))
    }
    global output_log
    output_log = json.dumps(output)
    print(output_log)

# Returns output to JSON file (For the sake of practice,
# just made it to dump all outputs into a single JSON
# file for now)
def json_convert(data) -> str:
    with open("opc_data.json", "a") as outfile:
        json.dump(data, outfile, indent=2)

# Loops through nodes list, outputs to json file
async def main() -> None:
    while True:
        for i in range(len(nodes)):
            async_read = await read_node(nodes[i])
            json_convert(output_log)
        time.sleep(1)

if __name__ == "__main__":
    asyncio.run(main())
I am not looking for a code review (although all suggestions are welcome!!).
My primary question is how to loop through thousands of items as efficiently as possible when the data needs to be read consistently and constantly at intervals of one second or less.
In the above code, I used async so that each batch of data can be stored independently and together in various storage locations, but there seem to be other options, such as multiprocessing or threading, available as well.
Each option seems to have its pros and cons, and I am not sure how many cores I will end up being able to use for this job, which could make something like multiprocessing less efficient.
Currently, the only other thing I can think of is using a library like numpy to take advantage of something like np.array.
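For instance, a thread-pool variant of the blocking reads used above might be sketched as follows; it reuses the get_node/get_value calls from the snippet, and whether a single Client connection can safely be shared across threads is an assumption that would need checking against the opcua library:

from concurrent.futures import ThreadPoolExecutor

def read_one(server, node_id):
    # Blocking read of a single node, same calls as in the snippet above
    node = server.get_node(node_id)
    return node_id, node.get_value()

def read_all(server, node_ids, workers=32):
    # Fan the blocking per-node reads out over a pool of threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda nid: read_one(server, nid), node_ids))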
I have the following function (shortened for readability), which I parallelize using Python's (3.5) multiprocessing module:
def evaluate_prediction(enumeration_tuple):
    i = enumeration_tuple[0]
    logits_pred = enumeration_tuple[1]
    print("This prints succesfully")
    print("This never gets printed: ")
    print(enumeration_tuple[0])
    filename = sample_names_test[i]
    onehots_pred = logits_to_onehots(logits_pred)
    np.save("/media/nfs/7_raid/ebos/models/fcn/" + channels + "/test/ndarrays/" + filename, onehots_pred)
However, this function hangs whenever I attempt to read its input argument. Execution gets past the logits_pred = enumeration_tuple[1] line, as evidenced by a print statement printing a simple string, but it halts whenever I print(logits_pred). So apparently, whenever I actually need the passed value, the process stops. I do not get an exception or error message. When using either Python's built-in map() function or a for loop, the function finishes successfully. I should have sufficient memory and computing power available. All processes are writing to different files. enumerate(predictions) yields correct index-value pairs, as expected. I call this function using Pool.map():
pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
Why is it hanging? And how can I get an exception, so I know what's going wrong?
UPDATE: After moving the mapped function to another module, importing it from there, and adding __init__.py to my directory, I manage to print the first item in the tuple, but not the second.
I had a similar issue before, and a solution that worked for me was to put the function you want to parallelize in a separate module and then import it.
import multiprocessing
from eval_prediction import evaluate_prediction

pool = multiprocessing.Pool()
file_results = pool.map(evaluate_prediction, enumerate(predictions))
I assume you will save the function definition in a file named eval_prediction.py in the same directory. Make sure you have an __init__.py in that directory as well.
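For reference, a minimal layout for that file could look like the following sketch; the helper names (sample_names_test, logits_to_onehots, channels) come from your snippet, and the my_helpers module they are imported from here is purely hypothetical:

# eval_prediction.py
import numpy as np
from my_helpers import sample_names_test, logits_to_onehots, channels  # hypothetical module holding your globals

def evaluate_prediction(enumeration_tuple):
    i, logits_pred = enumeration_tuple
    filename = sample_names_test[i]
    onehots_pred = logits_to_onehots(logits_pred)
    np.save("/media/nfs/7_raid/ebos/models/fcn/" + channels
            + "/test/ndarrays/" + filename, onehots_pred)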
I want to use different API keys for data scraping each time my program is run.
For instance, I have the following 2 keys:
apiKey1 = "123abc"
apiKey2 = "345def"
and the following URL:
myUrl = "http://myurl.com/key=..."
When the program is run, I would like myUrl to be using apiKey1. Once it is run again, I would then like it to use apiKey2 and so forth... i.e:
First Run:
url = "http://myurl.com/key=" + apiKey1
Second Run:
url = "http://myurl.com/key=" + apiKey2
Sorry if this doesn't make sense, but does anyone know a way to do this? I have no idea.
EDIT:
To avoid confusion, I've had a look at this answer, but it doesn't answer my query. My goal is to cycle between the keys across separate executions of my script.
I would use a persistent dictionary (it's like a database but more lightweight). That way you can easily store the options and the one to visit next.
The standard library already provides such a persistent dictionary: shelve:
import shelve

filename = 'target.shelve'

def get_next_target():
    with shelve.open(filename) as db:
        if not db:
            # Not created yet, initialize it:
            db['current'] = 0
            db['options'] = ["123abc", "345def"]
        # Get the current option
        nxt = db['options'][db['current']]
        db['current'] = (db['current'] + 1) % len(db['options'])  # increment with wraparound
        return nxt
And each call to get_next_target() will return the next option - no matter if you call it several times in the same execution or once per execution.
The logic could be simplified if you never have more than 2 options:
db['current'] = 0 if db['current'] == 1 else 1
But I thought it might be worthwhile to have a way that can easily handle multiple options.
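Usage then stays the same no matter how many keys you store; building the URL from your question just becomes:

url = "http://myurl.com/key=" + get_next_target()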
Here is an example of how you can do it with automatic file creation if no such file exists:
import os

if not os.path.exists('Checker.txt'):
    '''here you check whether the file exists
    if not this bit creates it
    if file exists nothing happens'''
    with open('Checker.txt', 'w') as f:
        # so if the file doesn't exist this will create it
        f.write('0')

myUrl = 'http://myurl.com/key='
apiKeys = ["123abc", "345def"]

with open('Checker.txt', 'r') as f:
    data = int(f.read())  # read the contents of the file and turn it into an int

myUrl = myUrl + apiKeys[data]  # pick the apiKey by index

with open('Checker.txt', 'w') as f:
    # rewriting the file and swapping values
    if data == 1:
        f.write('0')
    else:
        f.write('1')
I would rely on an external store to hold which key was used last time,
or, even simpler, I would count executions of the script and use one key when the execution count is odd and the other key when it is even.
So I would introduce something like redis, which will also help a lot with other (future?) features you may want to add to your project. redis is one of those tools that gives benefits at almost no cost; it is very practical to be able to rely on external permanent storage, and it can serve many purposes.
So here is how I would do it:
first, make sure redis-server is running (it can be started automatically as a daemon, depending on your system)
install the Python redis module
then, here is some Python code for inspiration:
import redis

db = redis.Redis()

if db.hincrby('execution', 'count', 1) % 2:
    key = apiKey1
else:
    key = apiKey2
That's it!
So, my first two functions, sqlPull() and dupCatch() work perfectly, but when I try to pass new_data (the unique MySQL tuple rows) to the post() function, nothing happens. I am not getting errors, and it continues to run. Normally if I were to execute a static post request I would see it instantaneously in Google Analytics, but nothing is appearing, so I know something is wrong with the function. I assume the error lies in the for loop within the post() function, but I am not sure what about it. Maybe I can't unpack the variables like I am currently doing because of what I did to them in the previous function?
import mysql.connector
import datetime
import requests
import time

def sqlPull():
    connection = mysql.connector.connect(user='xxxxx', password='xxxxx', host='xxxxx', database='MeshliumDB')
    cursor = connection.cursor()
    cursor.execute("SELECT TimeStamp, MAC, RSSI FROM wifiscan ORDER BY TimeStamp DESC LIMIT 20;")
    data = cursor.fetchall()
    connection.close()
    time.sleep(5)
    return data

seen = set()

def dupCatch():
    data = sqlPull()
    new_data = []
    for (TimeStamp, MAC, RSSI) in data:
        if (TimeStamp, MAC, RSSI) not in seen:
            seen.add((TimeStamp, MAC, RSSI))
            new_data.append((TimeStamp, MAC, RSSI))
    return new_data

def post():
    new_data = dupCatch()
    for (TimeStamp, MAC, RSSI) in new_data:
        requests.post("http://www.google-analytics.com/collect",
                      data="v=1&tid=UA-22560594-2&cid={}&t=event&ec={}&ea=InStore&el=RSSI&ev={}&pv=SipNSiz_Store".format(
                          MAC,
                          RSSI,
                          TimeStamp)
                      )

while run is True:
    sqlPull()
    dupCatch()
    post()
Your post function calls dupCatch(). But you also call dupCatch in your main run loop, right before calling post.
Similarly, your dupCatch function calls sqlPull(), but you also call sqlPull in your main run loop.
Those extra calls mean you end up throwing away 2 batches of data for each batch you process.
You could restructure your code so your functions take their values as arguments, like this:
while run is True:
    data = sqlPull()
    newdata = dupCatch(data)
    post(newdata)
… and then change dupCatch and post so they use those arguments, instead of calling the functions themselves.
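For example, the reworked functions might look roughly like this (same bodies as in the question, just taking their input as a parameter):

def dupCatch(data):
    new_data = []
    for (TimeStamp, MAC, RSSI) in data:
        if (TimeStamp, MAC, RSSI) not in seen:
            seen.add((TimeStamp, MAC, RSSI))
            new_data.append((TimeStamp, MAC, RSSI))
    return new_data

def post(new_data):
    for (TimeStamp, MAC, RSSI) in new_data:
        requests.post("http://www.google-analytics.com/collect",
                      data="v=1&tid=UA-22560594-2&cid={}&t=event&ec={}&ea=InStore&el=RSSI&ev={}&pv=SipNSiz_Store".format(
                          MAC, RSSI, TimeStamp))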
Alternatively, you could just remove the extra calls in the main run loop.
I've been playing around with the pybluez module recently to scan for nearby Bluetooth devices. What I want to do now is extend the program to also find nearby WiFi client devices.
The WiFi client scanner will need a while True loop to continually monitor the airwaves. If I were to write this as a straight-up, single-file program, it would be easy.
import ...
while True:
    client = scan()
    print client['mac']
What I want, however, is to make this a module. I want to be able to reuse it later and, possible, have others use it too. What I can't figure out is how to handle the loop.
import mymodule
scan()
Assuming the first example code was 'mymodule', this program would simply print out the data to stdout. I would want to be able to use this data in my program instead of having the module print it out...
How should I code the module?
I think the best approach is going to be to have the scanner run on a separate thread from the main program. The module should have methods that start and stop the scanner, and another that returns the current access point list (using a lock to synchronize). See the threading module.
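A rough sketch of that interface (the names start, stop and get_clients are just illustrative, and scan() is assumed to be your existing blocking scan function):

# mymodule.py
import threading

_clients = []
_lock = threading.Lock()
_stop = threading.Event()
_thread = None

def _scan_loop():
    # Background loop: keep scanning until stop() is called
    while not _stop.is_set():
        client = scan()  # your existing scan function
        with _lock:
            _clients.append(client['mac'])

def start():
    global _thread
    _stop.clear()
    _thread = threading.Thread(target=_scan_loop)
    _thread.daemon = True
    _thread.start()

def stop():
    _stop.set()
    _thread.join()

def get_clients():
    # Snapshot of everything seen so far
    with _lock:
        return list(_clients)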
How about something pretty straightforward like:
mymodule.py
import ...

def scanner():
    while True:
        client = scan()
        yield client['mac']
othermodule.py
import mymodule

for mac in mymodule.scanner():
    print mac
If you want something more useful than that, I'd also suggest a background thread as #kindall did.
Two interfaces would be useful.
scan() itself, which returned a list of found devices, such that I could call it to get an instantaneous snapshot of available bluetooth. It might take a max_seconds_to_search or a max_num_to_return parameter.
A "notify on found" function that accepted a callback. For instance (maybe typos, i just wrote this off the cuff).
def find_bluetooth(callback_func, time_to_search=5.0):
    already_found = []
    start_time = time.clock()
    while 1:
        if time.clock() - start_time > time_to_search:
            break
        found = scan()
        for entry in found:
            if entry not in already_found:
                callback_func(entry)
                already_found.append(entry)
which would be used by doing this:
def my_callback(new_entry):
    print new_entry  # or something more interesting...

find_bluetooth(my_callback)
If I get your question, you want scan() in a separate file, so that it can be reused later.
Create utils.py
def scan():
    # write code for scan here
    pass
Create WiFi.py
import utils

def scan_wifi():
    while True:
        cli = utils.scan()
        ...
    return