String to array in Python (boto3)

I have code that produces multiple string objects, and I want to collect them into a list. The end result currently looks like this
Queue1
Queue2
Queue3
but I need it like this:
[Queue1, Queue2, Queue3]
P.S. I am new to programming
import boto3
import numpy

rg = boto3.client('resource-groups')
cloudwatch = boto3.client('cloudwatch')

#def queuenames(rg):
response = rg.list_group_resources(
    Group='env_prod'
)
resources = response.get('Resources')
for idents in resources:
    identifier = idents.get('Identifier')
    resourcetype = identifier.get('ResourceType')
    if resourcetype == 'AWS::SQS::Queue':
        RArn = identifier.get('ResourceArn')
        step0 = RArn.split(':')
        step1 = step0[5]
        print(step1)

To convert a string to a list, do this:
arr = 'Queue1 Queue2 Queue3'.split(' ')
# Result:
['Queue1', 'Queue2', 'Queue3']

You have a loop where on each step you print a string. Try creating a list before the loop and adding each string to it inside the loop, like this (I'm not fluent in Python, please excuse me if there is something wrong in the syntax):
import boto3
import numpy

rg = boto3.client('resource-groups')
cloudwatch = boto3.client('cloudwatch')

#def queuenames(rg):
response = rg.list_group_resources(
    Group='env_prod'
)
resources = response.get('Resources')
myArray = []
for idents in resources:
    identifier = idents.get('Identifier')
    resourcetype = identifier.get('ResourceType')
    if resourcetype == 'AWS::SQS::Queue':
        RArn = identifier.get('ResourceArn')
        step0 = RArn.split(':')
        step1 = step0[5]
        print(step1)
        myArray.append(step1)
The code above will not change how your output is displayed, but it builds the list you need. You can remove the print line and print the list after the loop instead.
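A tiny runnable illustration of that suggestion (the loop here is just a stand-in for the SQS resource loop above):
myArray = []
for step1 in ['Queue1', 'Queue2', 'Queue3']:   # stand-in for the resource loop
    myArray.append(step1)
print(myArray)   # ['Queue1', 'Queue2', 'Queue3']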

Related

Kill dask worker when no result/fail

I am running a dask.distributed Client that gets data from an API with several parameters, parses results and joins/aggregates on each result. This is done with client.map()
Sometimes the API call gives an empty string because the specific combination of input parameters doesn't exist. It doesn't make sense to continue with computations and I would like to just kill that worker (without passing on e.g. a None).
How do I tell Dask to kill a worker if its result is None/error and exclude that future from the following operations?
Please let me know if you need more details.
Thanks.
EDIT:
Added a minimal working example to show the logic: the first map produces a lot of "useless" workers that I would like to kill.
Please note that this is not my actual use case: I am querying an InfluxDB database via HTTP requests, but the general structure of the code is the same. I am open to any comments on how to do this faster/more efficiently.
import requests
import numpy as np
import pandas as pd
from dask.distributed import Client, LocalCluster, as_completed
import dask.dataframe as dd

def fetch_html(pair):
    req_string = 'https://www.bitstamp.net/api/v2/order_book/{currency_pair}/'
    response = requests.get(req_string.format(currency_pair=pair))
    try:
        result = response.json()
        return result
    except Exception as e:
        print('Error: {}\nMessage: {}'.format(e, response.reason))
        return None

def parse_result(result):
    if result:
        data = {}
        data['prices'] = [e[0] for e in result['bids']]
        data['vols'] = [e[1] for e in result['bids']]
        data['index'] = [result['timestamp'] for i in data['prices']]
        df = pd.DataFrame.from_dict(data).set_index('index')
        return df
    else:
        return pd.DataFrame()

def other_calcs(result):
    if not result.empty:
        # something
        return result
    else:
        return pd.DataFrame()

def aggregator(res1, res2):
    if (not res1.empty) and (not res2.empty):
        # something
        return res1
    elif not res2.empty:
        # something
        return res2
    elif not res1.empty:
        return res1
    else:
        return pd.DataFrame()

if __name__ == '__main__':
    pairs = [
        # legit params (100s of these):
        'btcusd',
        'btceur',
        'btcgbp',
        'bateur',
        'batbtc',
        'umausd',
        'xrpusdt',
        'eurteur',
        'eurtusd',
        'manausd',
        'sandeur',
        'storjusd',
        'storjeur',
        'adausd',
        'adaeur',
        # bad params resulting in error / empty result (100s of these)
        'foobar',
        'foobaz',
        'foousd',
        'barbaz',
        'bazbar',
    ]
    cluster = LocalCluster(n_workers=16, threads_per_worker=1)
    client = Client(cluster)
    futures_list = client.map(fetch_html, pairs)
    futures_list = client.map(parse_result, futures_list)
    futures_list = client.map(other_calcs, futures_list)
    seq = as_completed(futures_list)
    while seq.count() > 1:
        f1 = next(seq)
        f2 = next(seq)
        new = client.submit(aggregator, f1, f2, priority=1)
        seq.add(new)
    final = next(seq)
    final = final.result()
    print(final.head())
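A note that is not part of the original post: in Dask, results belong to futures (tasks), not to workers, so rather than killing a worker one common approach is to drop the futures whose results are empty before aggregating. A minimal runnable sketch of that idea, with a stand-in function instead of the real fetch/parse pipeline:
import pandas as pd
from dask.distributed import Client, LocalCluster, as_completed

def parse(x):
    # stand-in for fetch_html + parse_result: "bad" inputs yield an empty frame
    return pd.DataFrame({'v': [x]}) if x % 2 == 0 else pd.DataFrame()

if __name__ == '__main__':
    client = Client(LocalCluster(n_workers=2, threads_per_worker=1))
    futures = client.map(parse, range(10))
    # Keep only futures whose result is non-empty; the survivors can feed
    # the pairwise aggregation loop from the question unchanged.
    useful = [f for f in as_completed(futures) if not f.result().empty]
    print(len(useful), 'useful futures')
    client.close()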

How to convert Hive schema to Bigquery schema using Python?

What I get from the API:
"name":"reports"
"col_type":"array<struct<imageUrl:string,reportedBy:string>>"
So in the Hive schema I got:
reports array<struct<imageUrl:string,reportedBy:string>>
Note: I get the Hive array schema as a string from the API.
My target:
bigquery.SchemaField("reports", "RECORD", mode="NULLABLE",
fields=(
bigquery.SchemaField('imageUrl', 'STRING'),
bigquery.SchemaField('reportedBy', 'STRING')
)
)
Note: I would like to create universal code that can handle any number of struct fields inside the array.
Any tips are welcome.
I tried creating a script that parses your input, which is reports array<struct<imageUrl:string,reportedBy:string>>. It converts your input into a dictionary that can be used as the schema when creating a table. The main idea of the approach is that, instead of using SchemaField(), you build a plain dictionary, which is much easier than constructing SchemaField() objects with parameters from your example input.
NOTE: The script has only been tested against your input, and it can parse additional fields added inside struct<.
import re
from google.cloud import bigquery

def is_even(number):
    if (number % 2) == 0:
        return True
    else:
        return False

def clean_string(str_value):
    return re.sub(r'[\W_]+', '', str_value)

def convert_to_bqdict(api_string):
    """
    This only works for a struct with multiple fields
    This could give you an idea on constructing a schema dict for BigQuery
    """
    num_even = True
    main_dict = {}
    struct_dict = {}
    field_arr = []
    schema_arr = []
    # Hard coded this since not sure what the string will look like if there are more inputs
    init_struct = api_string.split(' ')
    main_dict["name"] = init_struct[0]
    main_dict["type"] = "RECORD"
    main_dict["mode"] = "NULLABLE"
    cont_struct = init_struct[1].split('<')
    num_elem = len(cont_struct)
    # parse fields inside of struct<
    for i in range(0, num_elem):
        num_even = is_even(i)
        # fields are seen on even indices
        if num_even and i != 0:
            temp = list(filter(None, cont_struct[i].split(',')))  # remove blank elements
            for elem in temp:
                fields = list(filter(None, elem.split(':')))
                struct_dict["name"] = clean_string(fields[0])
                # "type" works for STRING as of the moment; refer to
                # https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types
                # for the accepted data types
                struct_dict["type"] = clean_string(fields[1]).upper()
                struct_dict["mode"] = "NULLABLE"
                field_arr.append(struct_dict)
                struct_dict = {}
    main_dict["fields"] = field_arr  # assign dict to array of fields
    schema_arr.append(main_dict)
    return schema_arr

sample = "reports array<struct<imageUrl:string,reportedBy:string,newfield:bool>>"
bq_dict = convert_to_bqdict(sample)

client = bigquery.Client()
project = client.project
dataset_ref = bigquery.DatasetReference(project, '20211228')
table_ref = dataset_ref.table("20220203")
table = bigquery.Table(table_ref, schema=bq_dict)
table = client.create_table(table)
Output: the schema dict returned by convert_to_bqdict for the sample string is
[{'name': 'reports', 'type': 'RECORD', 'mode': 'NULLABLE',
  'fields': [{'name': 'imageUrl', 'type': 'STRING', 'mode': 'NULLABLE'},
             {'name': 'reportedBy', 'type': 'STRING', 'mode': 'NULLABLE'},
             {'name': 'newfield', 'type': 'BOOL', 'mode': 'NULLABLE'}]}]
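If you need to handle arbitrarily nested structs (the "universal" case the question asks about), a recursive parser is one option. The sketch below is an illustration added by the editor, not part of the answer above: the Hive-to-BigQuery type map is deliberately partial, and it marks array<...> fields as REPEATED rather than the NULLABLE shown in the target schema; adjust to taste.
HIVE_TO_BQ = {'string': 'STRING', 'int': 'INTEGER', 'bigint': 'INTEGER',
              'double': 'FLOAT', 'boolean': 'BOOL'}  # partial map; extend as needed

def split_top_level(s):
    # Split on commas that are not nested inside <...>.
    parts, depth, cur = [], 0, ''
    for ch in s:
        if ch == '<':
            depth += 1
        elif ch == '>':
            depth -= 1
        if ch == ',' and depth == 0:
            parts.append(cur)
            cur = ''
        else:
            cur += ch
    parts.append(cur)
    return parts

def parse_type(name, hive_type):
    hive_type = hive_type.strip()
    mode = 'NULLABLE'
    if hive_type.startswith('array<') and hive_type.endswith('>'):
        mode = 'REPEATED'                        # Hive arrays become repeated fields
        hive_type = hive_type[len('array<'):-1]
    if hive_type.startswith('struct<') and hive_type.endswith('>'):
        inner = hive_type[len('struct<'):-1]
        fields = [parse_type(*part.split(':', 1)) for part in split_top_level(inner)]
        return {'name': name, 'type': 'RECORD', 'mode': mode, 'fields': fields}
    return {'name': name, 'type': HIVE_TO_BQ.get(hive_type, 'STRING'), 'mode': mode}

print(parse_type('reports', 'array<struct<imageUrl:string,reportedBy:string>>'))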

TypeError: list indices must be integers or slices, not Tag. One of my loops isn't working

The ultimate goal of this is to output select data columns to a .csv. I had it working once, but it only got the first table on the page and I needed both. Now it raises the error above. I'm quite new to Python and don't know how I got to this point in the first place. I needed both the call and put tables, but on the web page the calls come first, and when I used .find I only got the calls. I am working on this with a friend, and he put in the last two functions. He could get the columns I wanted, but now we only get the calls. I tried to fix it, and now it raises the error in the title.
import bs4
import requests
import pandas as pd
import csv
from bs4 import BeautifulSoup

#sets desired ticker. in the future you could make this long
def ticker():
    ticker = ['GME', 'NYMT']
    return ticker

#creates list of urls for scraper to grab
def ticker_site():
    ticker_site = ['https://finance.yahoo.com/quote/'+x+'/options?p='+x for x in ticker()]
    return ticker_site

optionRows = []
for i in range(len(ticker_site())):
    optionRows.append([])

def ticker_gets():
    option_page = ticker_site()
    requested_page = requests.get(option_page[i])
    ticker_soup = BeautifulSoup(requested_page.text, 'html.parser')
    return ticker_soup

def soup_search():
    table = ticker_gets()
    both_tables = table.find_all('table')
    call_table = both_tables[0]
    put_table = both_tables[1]
    call_rows = call_table.find('tr')
    put_rows = put_table.find('tr')
    #makes the call table
    for call in call_rows:
        whole_call_table = call.find_all('td')
        call_row = [y.text for y in whole_call_table]
        optionRows[call].append(call_row)
    #makes the put table
    for put in put_rows:
        whole_put_table = put.find_all('td')
        put_row = [z.text for z in whole_put_table]
        optionRows[put].append(put_row)
    for i in range(len(optionRows)):
        optionRows[i] = optionRows[i][1:len(optionRows[i])]
    return optionRows

def getColumns(columnIndexes=[2, 4, 5]):
    newList = []
    for tickerIndex in range(len(soup_search())):
        newList.append([])
        indexCount = 0
        for j in soup_search()[tickerIndex]:
            newList[tickerIndex].append([])
            for i in columnIndexes:
                newList[tickerIndex][indexCount].append(j[i])
            indexCount += 1
    return newList

def csvOutputer():
    rows = getColumns()
    fields = ["Ticker", "Strike", "Bid", "Ask"]
    with open('newcsv', 'w') as f:
        write = csv.writer(f)
        write.writerow(fields)
        for i in range(len(ticker())):
            for j in rows[i]:
                j.insert(0, ticker()[i])
                write.writerow(j)

csvOutputer()
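An editor's note, not part of the original post: the TypeError comes from optionRows[call].append(...) and optionRows[put].append(...), which index the list with a BeautifulSoup Tag instead of an integer. A minimal, self-contained sketch of the usual fix is to index by the ticker's integer position (and to iterate find_all('tr') rather than find('tr')):
from bs4 import BeautifulSoup

html = "<table><tr><td>1</td><td>2</td></tr><tr><td>3</td><td>4</td></tr></table>"
soup = BeautifulSoup(html, "html.parser")

optionRows = [[]]        # one sub-list per ticker, as in the question
ticker_index = 0         # an integer index, not a Tag
for row in soup.find_all("tr"):            # find_all, so every row is visited
    cells = [td.text for td in row.find_all("td")]
    optionRows[ticker_index].append(cells)  # not optionRows[row]
print(optionRows)        # [[['1', '2'], ['3', '4']]]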

Data output is not the same as inside function

I am currently having an issue where I am trying to store data in a list (using dataclasses). When I print the data inside the list from within the function (PullIncursionData()), it shows a certain set of numbers (never the same, which is expected given its nature). But when I print it after storing the function's return value in a variable, it somehow prints only the same number repeated.
I cannot share the numbers, as they update with EVE Online's API, so the only way is to run it locally and read the first list yourself.
The repository is here: https://github.com/AtherActive/EVEAPI-Demo
Heads up! Inside main.py (the file with the issue; a snippet of code is down below) there are more functions. All functions from line 90 onward are important; the rest can be ignored for this question, as they do not interact with the other functions.
def PullIncursionData():
    #Pulls data from URL and converts it into JSON
    url = 'https://esi.evetech.net/latest/incursions/?datasource=tranquility'
    data = rq.get(url)
    jsData = data.json()
    #Init var to store incursions
    incursions = []
    #Set length for loop. yay
    length = len(jsData)
    # Every loop incursion data will be read by __parseIncursionData(). It then gets added to var Incursions.
    for i in range(length):
        # Add data to var Incursion.
        incursions.append(__parseIncursionData(jsData, i))
        # If Dev mode, print some debug. Can be toggled in settings.py
        if settings.developerMode == 1:
            print(incursions[i].constellation_id)
    return incursions

# Basically parses the input data in a decent manner. No comments needed really.
def __parseIncursionData(jsData, i):
    icstruct = stru.Incursion
    icstruct.constellation_id = jsData[i]['constellation_id']
    icstruct.constellation_name = 'none'
    icstruct.staging = jsData[i]['staging_solar_system_id']
    icstruct.region_name = ResolveSystemNames(icstruct.constellation_id, 'con-reg')
    icstruct.status = jsData[i]['state']
    icstruct.systems_id = jsData[i]['infested_solar_systems']
    icstruct.systems_names = ResolveSystemNames(jsData[i]['infested_solar_systems'], 'system')
    return icstruct

# Resolves names for systems, regions and constellations. Still WIP.
def ResolveSystemNames(id, mode='constellation'):
    #init value
    output_name = 'none'
    # If constellation, pull data and find region name.
    if mode == 'con-reg':
        url = 'https://www.fuzzwork.co.uk/api/mapdata.php?constellationid={}&format=json'.format(id)
        data = rq.get(url)
        jsData = data.json()
        output_name = jsData[0]['regionname']
    # Pulls system name from Fuzzwork.co.uk.
    elif mode == 'system':
        #Convert output to a list.
        output_name = []
        length = len(id)
        # Pulls system name from Fuzzwork. Not that hard.
        for i in range(length):
            url = 'https://www.fuzzwork.co.uk/api/mapdata.php?solarsystemid={}&format=json'.format(id[i])
            data = rq.get(url)
            jsData = data.json()
            output_name.append(jsData[i]['solarsystemname'])
    return output_name

icdata = PullIncursionData()
print('external data check:')
length = len(icdata)
for i in range(length):
    print(icdata[i].constellation_id)
structures.py (custom file)
#dataclass
class Incursion:
    constellation_id = int
    constellation_name = str
    staging = int
    staging_name = str
    systems_id = list
    systems_names = list
    region_name = str
    status = str

    def ___init___(self):
        self.constellation_id = -1
        self.constellation_name = 'undefined'
        self.staging = -1
        self.staging_name = 'undefined'
        self.systems_id = []
        self.systems_names = []
        self.region_name = 'undefined'
        self.status = 'unknown'
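An editor's note, not part of the original post: a likely culprit is that __parseIncursionData assigns to the class itself (icstruct = stru.Incursion, with no parentheses), so every list entry points at one shared class object whose attributes get overwritten on each loop; the ___init___ method (three underscores, not two) is also never called. A minimal sketch of the difference:
class Incursion:
    def __init__(self):
        self.constellation_id = -1

# Assigning the class itself: every "copy" is the same object.
a = Incursion
a.constellation_id = 1
b = Incursion
b.constellation_id = 2
print(a.constellation_id)          # 2 -- a and b are both the class

# Creating instances: each one keeps its own attributes.
c, d = Incursion(), Incursion()
c.constellation_id, d.constellation_id = 1, 2
print(c.constellation_id, d.constellation_id)  # 1 2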

How do I join results of looping script into a single variable?

I have a looping script that returns different filtered results. I can make this data return as an array for each of the different filter classes; however, I am unsure of the best method to join all of these arrays together.
import mechanize
import urllib
import json
import re
import random
import datetime
from sched import scheduler
from time import time, sleep
from sets import Set

##### Code to loop the script and set up scheduling time
s = scheduler(time, sleep)
random.seed()

##### Code to stop duplicates part 1
userset = set()

def run_periodically(start, end, interval, func):
    event_time = start
    while event_time < end:
        s.enterabs(event_time, 0, func, ())
        event_time += interval + random.randrange(-5, 10)
    s.run()

##### Code to get the data required from the URL desired
def getData():
    post_url = "URL OF INTEREST"
    browser = mechanize.Browser()
    browser.set_handle_robots(False)
    browser.addheaders = [('User-agent', 'Firefox')]
    ##### These are the parameters you've got from checking with the aforementioned tools
    parameters = {'page': '1',
                  'rp': '250',
                  'sortname': 'race_time',
                  'sortorder': 'asc'
                  }
    ##### Encode the parameters
    data = urllib.urlencode(parameters)
    trans_array = browser.open(post_url, data).read().decode('UTF-8')
    xmlload1 = json.loads(trans_array)
    pattern2 = re.compile('/control/profile/view/(.*)\' title=')
    pattern4 = re.compile('title=\'posted: (.*) strikes:')
    pattern5 = re.compile('strikes: (.*)\'><img src=')
    for row in xmlload1['rows']:
        cell = row["cell"]
        ##### defining the Keys (key is the area from which data is pulled in the XML) for use in the pattern finding/regex
        user_delimiter = cell['username']
        selection_delimiter = cell['race_horse']
        user_numberofselections = float(re.findall(pattern4, user_delimiter)[0])
        user_numberofstrikes = float(re.findall(pattern5, user_delimiter)[0])
        strikeratecalc1 = user_numberofstrikes / user_numberofselections
        strikeratecalc2 = strikeratecalc1 * 100
        userid_delimiter_results = (re.findall(pattern2, user_delimiter)[0])
        ##### Code to stop duplicates throughout the day part 2 (skips if the id is already in the userset)
        if userid_delimiter_results in userset:
            continue
        userset.add(userid_delimiter_results)
        arraym = ""
        arrayna = ""
        if strikeratecalc2 > 50 and strikeratecalc2 < 100:
            arraym0 = "System M"
            arraym1 = "user id = ", userid_delimiter_results
            arraym2 = "percentage = ", strikeratecalc2, "%"
            arraym3 = ""
            arraym = [arraym0, arraym1, arraym2, arraym3]
        if strikeratecalc2 > 0 and strikeratecalc2 < 50:
            arrayna0 = "System NA"
            arrayna1 = "user id = ", userid_delimiter_results
            arrayna2 = "percentage = ", strikeratecalc2, "%"
            arrayna3 = ""
            arrayna = [arrayna0, arrayna1, arrayna2, arrayna3]

getData()
run_periodically(time() + 5, time() + 1000000, 10, getData)
What I want to be able to do is return both 'arraym' and 'arrayna' as one final array. However, due to the looping nature of the script, each run of the loop overwrites the old 'arraym'/'arrayna', and my attempts to yield one array containing all of the data have resulted in only the last user id for 'System M' and the last user id for 'System NA'. This is obviously because each run of the loop overwrites the old 'arraym' and 'arrayna', but I do not know of a way around this so that all of my data can accumulate in one array. Please note, I have been coding for cumulatively two weeks now, so there may well be some simple function to overcome this problem.
Kind regards AEA
Without looking at that huge code segment, typically you can do something like:
my_array = []  # Create an empty list
for <some loop>:
    my_array.append(some_value)
# At this point, my_array is a list containing some_value for each loop iteration
print(my_array)
Look into Python's list.append().
So your code might look something like:
#...
arraym = []
arrayna = []
for row in xmlload1['rows']:
    #...
    if strikeratecalc2 > 50 and strikeratecalc2 < 100:
        arraym.append("System M")
        arraym.append("user id = %s" % userid_delimiter_results)
        arraym.append("percentage = %s%%" % strikeratecalc2)
        arraym.append("")
    if strikeratecalc2 > 0 and strikeratecalc2 < 50:
        arrayna.append("System NA")
        arrayna.append("user id = %s" % userid_delimiter_results)
        arrayna.append("percentage = %s%%" % strikeratecalc2)
        arrayna.append("")
#...
