Why does my geopy loop always end in Killed: 9? - python

I have a list of addresses, and I keep getting a Killed: 9 error when I try to add coordinates to each one.
Is it timing out? I added sleep times to prevent that, but I still get this error: Killed: 9
import time
from geopy.geocoders import Nominatim
from geopy.exc import GeocoderTimedOut

def do_geocode(Nominatim, address):
    time.sleep(3)
    try:
        return Nominatim.geocode(address)
    except GeocoderTimedOut:
        return do_geocode(Nominatim, address)

def addCoordinates(businessList):
    businessList[0] = ["pageNum", "entryNum", "name", "address", "tagOne", "tagTwo", "tagThree",
                       "geoAddress", "appendedLocation", "latitude", "longitude", "key"]
    geolocator = Nominatim(timeout=None)
    z = 0
    i = 1
    while i < len(businessList):
        longitude = ""
        latitude = ""
        geoLocation = ""
        geoAddress = ""
        entry = []
        appendedLocation = businessList[i][3] + ", San Francisco"
        geoLocation = do_geocode(geolocator, appendedLocation)
        if geoLocation is not None:
            geoAddress = geoLocation.address
            latitude = geoLocation.latitude
            longitude = geoLocation.longitude
        entry = [geoAddress, appendedLocation, str(latitude), str(longitude)]
        j = 0
        while j < len(entry):
            businessList[i] += [entry[j]]
            j += 1
        print("coordinates added")
        z += 1
        print(z)
        i += 1

Killed: 9 probably means that your Python script has been terminated by something in your OS (perhaps the OOM killer?). Make sure your script doesn't consume all of the machine's available memory.
For geopy specifically, I'd suggest taking a look at the RateLimiter class. Also note that you need to specify a unique User Agent when using Nominatim (this is explained in the Nominatim class docs). You'd get something like this:
from geopy.geocoders import Nominatim
from geopy.extra.rate_limiter import RateLimiter

def addCoordinates(businessList):
    businessList[0] = ["pageNum", "entryNum", "name", "address", "tagOne", "tagTwo", "tagThree",
                       "geoAddress", "appendedLocation", "latitude", "longitude", "key"]
    geolocator = Nominatim(user_agent="specify_your_app_name_here", timeout=20)
    geocode = RateLimiter(
        geolocator.geocode,
        min_delay_seconds=3.0,
        error_wait_seconds=3.0,
        swallow_exceptions=False,
        max_retries=10,
    )
    z = 0
    i = 1
    while i < len(businessList):
        longitude = ""
        latitude = ""
        geoLocation = ""
        geoAddress = ""
        entry = []
        appendedLocation = businessList[i][3] + ", San Francisco"
        geoLocation = geocode(appendedLocation)
        if geoLocation is not None:
            geoAddress = geoLocation.address
            latitude = geoLocation.latitude
            longitude = geoLocation.longitude
        entry = [geoAddress, appendedLocation, str(latitude), str(longitude)]
        j = 0
        while j < len(entry):
            businessList[i] += [entry[j]]
            j += 1
        print("coordinates added")
        z += 1
        print(z)
        i += 1
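If the machine really is running out of memory, another option is to stream results to disk instead of accumulating them in RAM. A minimal sketch of that idea, assuming each businessList row is a plain list of strings (the output filename is just a placeholder):

import csv

def addCoordinatesStreaming(businessList, outPath="geocoded.csv"):
    geolocator = Nominatim(user_agent="specify_your_app_name_here", timeout=20)
    geocode = RateLimiter(geolocator.geocode, min_delay_seconds=3.0)
    with open(outPath, "w", newline="") as f:
        writer = csv.writer(f)
        for row in businessList[1:]:  # skip the header row
            location = geocode(row[3] + ", San Francisco")
            if location is not None:
                # write the enriched row out immediately instead of keeping it in memory
                writer.writerow(row + [location.address, str(location.latitude), str(location.longitude)])
            else:
                writer.writerow(row + ["", "", ""])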

Related

Python and MQTT optimization

I have the following code that I've been asked to 'tidy up'. At the moment I think it's not optimal, so I'd like some advice on how to improve it, please.
I have 4 voltages received as MQTT messages (power1, power2, etc.), corresponding to 4 different measuring stations; each value is appended to a new array, power. If a value lies between Vtrig and Vlow, it is appended to another array, flags; when that array's length exceeds a certain value (Flag_length), it triggers a flag I can send out as an MQTT message, so that if I get multiple values outside the required range, I'm notified. Otherwise, the array is emptied and we start again.
Here's what I wrote so far:
import smbus
import time
import csv
from datetime import datetime
import paho.mqtt.client as paho

MQTT_HOST = '10.10.20.122'
MQTT_PORT = 1883
MQTT_CLIENT_ID = 'lowerStation'
TOPIC = 'pwrTest/testData'
TOPIC_TOM = 'pwrTest/Error'
TOPICeval = 'pwrTest/testEval'
Vtrig = 12
Vlow = 0.1
Flag_length = 7
flags0 = []
flags1 = []
flags2 = []
flags3 = []
client = paho.Client(MQTT_CLIENT_ID)
client.connect(MQTT_HOST, MQTT_PORT)

# Serial numbers of the Volito connected to each measuring station, stations 1 to 4
SN = [109, 78, 86, 60]

def on_connect(client, userdata, flags, rc):
    if rc == 0:
        print('connected')
    else:
        print('Bad connection code =', rc)

broker = '10.10.20.122'
client.connect(broker)          # connect to broker
client.on_connect = on_connect  # bind callback function
print('Connecting to broker', broker)
But this is really the relevant part.
payload = str(power1) + "," + "Station1," + "SN" + str(SN[0])
client.publish(TOPIC, payload)
payload = str(power2) + "," + "Station2," + "SN" + str(SN[1])
client.publish(TOPIC, payload)
payload = str(power3) + "," + "Station3," + "SN" + str(SN[2])
client.publish(TOPIC, payload)
payload = str(power4) + "," + "Station4," + "SN" + str(SN[3])
client.publish(TOPIC, payload)

power = []
power.append(datetime.now().strftime("%d/%m/%Y %H:%M:%S"))
power.append(power1)
power.append(power2)
power.append(power3)
power.append(power4)

for x in power[1:2]:
    if x < Vtrig and x > Vlow:
        flags0.append(x)
    elif x >= Vtrig:
        flags0 = []
    if len(flags0) > Flag_length:
        payload = '5.0,' + "Station1," + "SN" + str(SN[0])
        flags0 = []
        client.publish(TOPICeval, payload)

for x in power[2:3]:
    if x < Vtrig and x > Vlow:
        flags1.append(x)
    elif x >= Vtrig:
        flags1 = []
    if len(flags1) > Flag_length:
        payload = '5.0,' + "Station2," + "SN" + str(SN[1])
        flags1 = []
        client.publish(TOPICeval, payload)

for x in power[3:4]:
    if x < Vtrig and x > Vlow:
        flags2.append(x)
    elif x >= Vtrig:
        flags2 = []
    if len(flags2) > Flag_length:
        payload = '5.0,' + "Station3," + "SN" + str(SN[2])
        flags2 = []
        client.publish(TOPICeval, payload)

for x in power[4:5]:
    if x < Vtrig and x > Vlow:
        flags3.append(x)
    elif x >= Vtrig:
        flags3 = []
    if len(flags3) > Flag_length:
        payload = '5.0,' + "Station4," + "SN" + str(SN[3])
        flags3 = []
        client.publish(TOPICeval, payload)

print('running')
time.sleep(10)
As you can see, I repeat the same code for each station; is there a better way of writing this with a for loop?
Nice spot on the simplification. I think you could do it like this.
Even better, you should probably start using functions, or debugging your code will be a nightmare in the future.
buffer = {}  # a dict keyed by station index; initialise it once, outside the measurement loop
for index, value in enumerate(power[1:]):  # skip the timestamp at power[0]
    buffer.setdefault(index, [])  # make sure this station's list exists before appending
    if value < Vtrig and value > Vlow:  # your conditional checks
        buffer[index].append(value)  # append to this station's flag list
    elif value >= Vtrig:  # conditional check
        buffer[index] = []  # clear the 'flag' list for this station
    if len(buffer[index]) > Flag_length:  # your conditional check
        payload = '5.0,Station{},SN{}'.format(index + 1, SN[index])
        # index starts from 0, so + 1 here makes the station and SN numbers line up
        buffer[index] = []  # clear again
        client.publish(TOPICeval, payload)  # publish whatever you wanted
On another note, consider creating a class with methods; it would make your code much more readable.
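For instance, here is a rough, untested sketch of what such a class might look like; the names Station and check are just placeholders, and it reuses the globals (Vtrig, Vlow, Flag_length, TOPICeval, SN, power, client) from the question:

class Station:
    """Tracks out-of-range readings for one measuring station (sketch only)."""

    def __init__(self, number, serial):
        self.number = number
        self.serial = serial
        self.flags = []

    def check(self, value, client):
        # Same logic as the loop above, wrapped per station.
        if Vlow < value < Vtrig:
            self.flags.append(value)
        elif value >= Vtrig:
            self.flags = []
        if len(self.flags) > Flag_length:
            payload = '5.0,Station{},SN{}'.format(self.number, self.serial)
            self.flags = []
            client.publish(TOPICeval, payload)

stations = [Station(i + 1, sn) for i, sn in enumerate(SN)]
for station, value in zip(stations, power[1:]):
    station.check(value, client)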

I am getting an 'index out of bounds' error when reading from CSV in pandas, but not when I extract the data via the API. What could be the reason?

So for my bot, I first extract data via an API and store it in a CSV file. When I run my for loop on the data straight from the API, it gives no error and runs smoothly.
But when the CSV file is read back in and the loop is run, it gives an out of bounds error.
This is my function to generate data:
import pandas as pd

full_list = pd.DataFrame(columns=("date", "open", "high", "low", "close", "volume", "ticker", "RSI", "ADX", "20_sma", "max_100"))

def stock_data(ticker):
    create_data = fetchOHLC(ticker, 'minute', 60)
    create_data["ticker"] = ticker
    create_data["RSI"] = round(rsi(create_data, 25), 2)
    create_data["ADX"] = round(adx(create_data, 14), 2)
    create_data["20_sma"] = round(create_data.close.rolling(10).mean().shift(), 2)
    create_data["max_100"] = create_data.close.rolling(100).max().shift()
    create_data.dropna(inplace=True, axis=0)
    create_data.reset_index(inplace=True)
    return create_data

stocklist = open("stocklist.txt", "r+")
tickers = stocklist.readlines()
for x in tickers:
    try:
        full_list = full_list.append(stock_data(x.strip()))
    except:
        print(f'{x.strip()} did not work')
full_list.to_csv("All_Data")
full_list
So when I run the code below on the dataframe created via the API, I get no error. But when I run the same code on the dataframe read from the CSV file, I get the out of bounds error.
list_tickers = full_list["ticker"].unique()
for y in list_tickers[:2]:
    main = full_list[full_list["ticker"] == y]
    pos = 0
    num = 0
    tick = y
    signal_time = 0
    signal_rsi = 0
    signal_adx = 0
    buy_time = 0
    buy_price = 0
    sl = 0
    # to add trailing sl in this.
    for x in main.index:
        maxx = main.iloc[x]["max_100"]
        rsi = main.iloc[x]["RSI"]
        adx = main.iloc[x]["ADX"]
        sma = main.iloc[x]["20_sma"]
        close = main.iloc[x]["close"]
        high = main.iloc[x]["high"]
        if rsi > 80 and adx > 35 and close > maxx:
            if pos == 0:
                buy_price = main.iloc[x+1]["open"]
                buy_time = main.iloc[x+1]["date"]
                pos = 1
                signal_time = main.iloc[x]["date"]
                signal_rsi = main.iloc[x]["RSI"]
                signal_adx = main.iloc[x]["ADX"]
        elif close < sma:
            if pos == 1:
                sell_time = main.iloc[x]["date"]
                sell_price = sma * .998
                pos = 0
                positions.loc[positions.shape[0]] = [y, signal_time, signal_rsi, signal_adx, buy_time, buy_price, sell_time, sell_price]
Any idea why?
Here is the cleanup and file-reading code:
full_list = pd.read_csv("All_data")
full_list.dropna(inplace=True, axis=0)
full_list.drop(labels="Unnamed: 0", axis=1)  # index of the previous dataframe
full_list.head(5)
Thanks
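A likely cause, in case it helps: after main = full_list[full_list["ticker"] == y], main keeps the row labels of the full CSV index, while iloc is purely positional, so main.iloc[x] with x drawn from main.index runs past the end for any ticker whose rows don't start at label 0. When the dataframe is built by appending per-ticker frames that were each reset_index-ed, the labels happen to line up, which would explain why the API path works. A minimal sketch of the fix (same loop, one added reset_index):

for y in list_tickers[:2]:
    main = full_list[full_list["ticker"] == y].reset_index(drop=True)
    # main.index is now 0..len(main)-1, so positional main.iloc[x] is valid;
    # stopping one row early also guards the main.iloc[x+1] reads in the buy branch.
    for x in main.index[:-1]:
        maxx = main.iloc[x]["max_100"]
        # ... rest of the loop unchanged ...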

QUERY_EXCEEDED_MAX_MATCHES_ALLOWED error on Kaltura API (Python)

I'm unable to list all the entries in Kaltura: an ApiException with the message "Unable to generate list. max matches value was reached" (error QUERY_EXCEEDED_MAX_MATCHES_ALLOWED) gets triggered.
I tried to work around the issue by setting my sessionPrivileges to disableentitlement:
class class_chk_integrity():
    client = None
    pagesize = 0

    def __init__(self, worker_num, progress):
        self.pagesize = 30
        self.worker_num = worker_num
        self.progress = progress
        config = KalturaConfiguration(2723521)
        config.serviceUrl = "https://www.kaltura.com/"
        self.client = KalturaClient(config)
        ks = self.client.session.start("KALTURA_ADMIN_SECRET",
                                       "email@email.com",
                                       KalturaPluginsCore.KalturaSessionType.ADMIN,
                                       "KALTURA_PARTNER_ID",
                                       432000,
                                       "disableentitlement")
        self.client.setKs(ks)
I also tried to filter based on the entry ids. However, I can't manage to get filter.idNotIn to work properly.
def get_total_reg(self, cont, lastEntryIds, lastEntryCreatedAt):
    filter = KalturaPluginsCore.KalturaBaseEntryFilter()
    if lastEntryIds != "":
        filter.idNotIn = lastEntryIds
    filter.orderBy = KalturaBaseEntryOrderBy.CREATED_AT_DESC
    pager = KalturaPluginsCore.KalturaFilterPager()
    pageIndex = 1
    entriesGot = 0
    pager.pageSize = self.pagesize
    pager.pageIndex = pageIndex
    result = self.client.baseEntry.list(filter, pager)
    totalCount = result.totalCount
    if totalCount > 10000:
        totalCount = 9970
    if totalCount <= 0:
        cont = False
    while entriesGot < totalCount:
        pager.pageSize = self.pagesize
        pageIndex += 1
        pager.pageIndex = pageIndex
        result = self.client.baseEntry.list(filter, pager)
        entriesGot += len(result.objects)
        for e in result.objects:
            if lastEntryIds == "":
                lastEntryIds.append(e.id)
            else:
                lastEntryIds.append(e.id)
            lastEntryCreatedAt = e.createdAt
    return result.totalCount, self.pagesize, cont, lastEntryIds, lastEntryCreatedAt
This is how I'm calling the functions:
if __name__ == '__main__':
    try:
        log = _ServiceUtils.log()
        log.setup('all', 'integrity')
        cont = True
        lastEntryIds = []
        lastEntryCreatedAt = 0
        while cont is True:
            kmc = class_chk_integrity(0, 0)
            kmc_total_reg, kmc_page_size, cont, lastEntryIds, lastEntryCreatedAt = kmc.get_total_reg(cont, lastEntryIds, lastEntryCreatedAt)
            interval = 10
            max_threads = math.ceil(kmc_total_reg / (interval * kmc_page_size))
            # max_threads = 1
            threads_list = []
            print('TOTAL REG : %s | PAGE_SIZE : %s | INTERVAL : %s | THREADS : %s' % (kmc_total_reg, kmc_page_size, interval, max_threads))
            progress = class_progress_thread(max_threads)
            for index in range(0, max_threads):
                page_ini = index * interval
                page_end = index * interval + interval
                progress.add_worker_progress(index, datetime.now())
                threads_list.append(threading.Thread(target=thread_chk_integrity, args=(index, log, index * interval + 1, index * interval + interval, progress)))
            threads_list.append(threading.Thread(target=thread_output_progress, args=(progress, max_threads)))
            for thread in threads_list:
                thread.start()
            for thread in threads_list:
                thread.join()
            while not progress.stop():
                time.sleep(30)
    except KeyboardInterrupt:
        try:
            sys.exit(0)
        except SystemExit:
            os._exit(0)
I'd appreciate any help with this.
Thank you for your attention.
if totalCount > 10000:
    totalCount = 9970

I'm curious to know why you are changing the totalCount this way.
Short answer: paging works only as long as the result set is up to 10K matches.
To work around that, sort the results by creation date (as you did), and when you reach 10K, start a new search whose created_at filter value is the last value you got from the previous search. Reset your paging, of course.
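A minimal sketch of that windowed scan, meant to run inside the class from the question; it assumes the same KalturaPluginsCore naming and the createdAtLessThanOrEqual field on KalturaBaseEntryFilter (entries sharing the boundary timestamp may be revisited, so deduplicate by id if that matters):

filter = KalturaPluginsCore.KalturaBaseEntryFilter()
filter.orderBy = KalturaBaseEntryOrderBy.CREATED_AT_DESC
pager = KalturaPluginsCore.KalturaFilterPager()
pager.pageSize = 500

lastCreatedAt = None
while True:
    if lastCreatedAt is not None:
        # continue the scan at/below the last timestamp from the previous window
        filter.createdAtLessThanOrEqual = lastCreatedAt
    pager.pageIndex = 1  # reset paging for every new window
    got = 0
    while True:
        result = self.client.baseEntry.list(filter, pager)
        if not result.objects:
            break
        for e in result.objects:
            lastCreatedAt = e.createdAt
            # ... process entry e ...
        got += len(result.objects)
        if got >= 9500:  # stay safely under the 10K match window
            break
        pager.pageIndex += 1
    if not result.objects:
        break  # the last window was exhausted; we're done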

Python multiprocessing doesn't wait for all elements to be done

I have the following code
import multiprocessing as mp
import pandas as pd
import awswrangler as wr

global total_pds
total_pds = []

ksplit = wr.s3.list_objects(pred_path)
ksplit = list(ksplit)

def process(x):
    dk = wr.s3.read_parquet(path=pred_path + x, dataset=False)
    return dk

def log_result(result):
    print(len(total_pds), end=' ')
    total_pds.append(result)

def error_back(error):
    print('error', error)

pool = mp.Pool(processes=4, maxtasksperchild=10)
dcms_info = [pool.apply_async(process, args=(spl,), callback=log_result, error_callback=error_back) for spl in ksplit]
for x in dcms_info:
    x.wait()
pool.close()
pool.join()
dataset = pd.concat(total_pds, ignore_index=True)
The last element throws this error:
error 'i' format requires -2147483648 <= number <= 2147483647
Thank you
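For context, an educated guess rather than anything confirmed by the question: this struct error usually means multiprocessing tried to pickle a result larger than about 2 GiB through the pool's pipe, which its 32-bit ('i') length header cannot describe. One hedged workaround sketch is to avoid shipping the big DataFrames back at all: have each worker read its piece and write it to local disk, returning only a small string path (pred_path and ksplit are assumed to exist as in the question):

import multiprocessing as mp
import pandas as pd
import awswrangler as wr

def process_to_file(x):
    # Read the piece in the worker and write it locally, returning only a path
    # (a small, picklable string) instead of a multi-GiB DataFrame.
    dk = wr.s3.read_parquet(path=pred_path + x, dataset=False)
    out = f"/tmp/{x.replace('/', '_')}.parquet"
    dk.to_parquet(out)
    return out

if __name__ == '__main__':
    with mp.Pool(processes=4, maxtasksperchild=10) as pool:
        paths = pool.map(process_to_file, ksplit)
    dataset = pd.concat((pd.read_parquet(p) for p in paths), ignore_index=True)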

More idiomatic way of extracting column values and assigning them back to a DF (Pandas)

I have an existing routine that is running just fine, but since I'm new to Python I find my code ugly and I'm looking for ways to improve it.
My program works this way: I have created a class that takes a complete address string which I need to process. This class has 4 attributes: address, state, city and zipcode.
This is the said class:
class Address:
    def __init__(self, fulladdress):
        self.fulladdress = fulladdress.split(",")
        self.address = self.get_address()
        self.city = self.get_city()
        stateandzip = str(self.fulladdress[-1]).strip()
        self.statezip = stateandzip.split(" ")
        self.state = self.get_state()
        self.zipcode = self.get_zipcode()

    def get_address(self):
        len_address = len(self.fulladdress)
        if len_address == 3:
            return self.fulladdress[0].strip()
        elif len_address == 4:
            return self.fulladdress[0].strip() + ", " + self.fulladdress[1].strip()
        elif len_address > 5:
            temp_address = self.fulladdress[0]
            for ad in self.fulladdress[0:-3]:
                temp_address = temp_address + ", " + ad.strip()
            return temp_address
        else:
            return ''

    def get_city(self):
        if len(self.fulladdress) > 0:
            address = self.fulladdress[-2]
            return address.strip()
        else:
            return ''

    def get_state(self):
        if len(self.fulladdress) > 0:
            return self.statezip[0]
        else:
            return ''

    def get_zipcode(self):
        if len(self.fulladdress) > 0:
            return self.statezip[1]
        else:
            return ''
Now my existing routine needs to append these results to my dataframe based on the address column. To parse the address data I use df.iterrows(), since I don't know how I can use the Address class with the df.apply method.
Here is the routine:
import pandas as pd
import datahelper as dh
import address as ad

# Find the header name of the Address column
address_header = dh.findcolumn('Address', list(df.columns))
header_loc = df.columns.get_loc(address_header)
address = []
city = []
state = []
zipcode = []
for index, row in df.iterrows():
    if not row[address_header]:
        address.append('')
        city.append('')
        state.append('')
        zipcode.append('')
        continue
    # extract details from the address
    address_data = ad.Address(row[address_header])
    address.append(address_data.address)
    city.append(address_data.city)
    state.append(address_data.state)
    zipcode.append(address_data.zipcode)
df[address_header] = address
df.insert(header_loc + 1, 'City', city)
df.insert(header_loc + 2, 'State', state)
df.insert(header_loc + 3, 'Zip Code', zipcode)
I would really appreciate it if someone could point me in the right direction. Thank you in advance.
By the way, dh is a datahelper module where I keep all my helper functions:
def findcolumn(searchstring, list):
    if searchstring in list:
        return searchstring
    else:
        try:
            return [i for i in list if searchstring in i][0]
        except ValueError:
            return None
        except IndexError:
            return None
And here is my desired output, given this sample data in the Address column:
df = pd.DataFrame({'Address': ['Rubin Center Dr Ste, Fort Mill, SC 29708', 'Miami, FL 33169']})
The output should be:
Address             | City      | State | Zip Code
--------------------|-----------|-------|---------
Rubin Center Dr Ste | Fort Mill | SC    | 29708
                    | Miami     | FL    | 33169
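A hedged sketch of the df.apply route the question asks about; it reuses the Address class above and assumes df, address_header and header_loc exist as in the routine (untested):

import pandas as pd
import address as ad

def split_address(value):
    # Return the four parts as a Series so apply() expands them into columns.
    if not value:
        return pd.Series(['', '', '', ''])
    parsed = ad.Address(value)
    return pd.Series([parsed.address, parsed.city, parsed.state, parsed.zipcode])

parts = df[address_header].apply(split_address)
parts.columns = [address_header, 'City', 'State', 'Zip Code']
df[address_header] = parts[address_header]
for offset, col in enumerate(['City', 'State', 'Zip Code'], start=1):
    df.insert(header_loc + offset, col, parts[col])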
