Problems querying with python to BigQuery (Python String Format) - python

I am trying to make a query to BigQuery in order to modify all the values of a row (in python). When I use a simple string to query, I have no problems. Nevertheless, when I introduce the string formatting the query does not work. As follows I'm presenting the same query, but diminishing the number of columns that I am modifying.
I already made the connection to BigQuery, by defining the Client, etc (and works properly).
I tried:
"UPDATE `riscos-dev.survey_test.data-test-bdrn` SET informaci_meteorol_gica = {inf}, risc = {ri} WHERE objectid = {obj_id}".format(inf = df.informaci_meteorol_gica[index], ri = df.risc[index], obj_id = df.objectid[index])
To specify the input values in format:
df.informaci_meteorol_gica[index] = 'Neu' , also a string for df.risc[index] and df.objectid[index] = 3
I am obtaining the following error message:
BadRequest: 400 Braced constructors are not supported at [1:77]

Instead of using format method of string, I propose you another approach with the f string formating in Python :
def build_query():
inf = "'test_inf'"
ri = "'test_ri'"
obj_id = "'test_obj_id'"
return f"UPDATE `riscos-dev.survey_test.data-test-bdrn` SET informaci_meteorol_gica = {inf}, risc = {ri} WHERE objectid = {obj_id}"
if __name__ == '__main__':
query = build_query()
print(query)
The result is :
UPDATE `riscos-dev.survey_test.data-test-bdrn` SET informaci_meteorol_gica = 'test_inf', risc = 'test_ri' WHERE objectid = 'test_obj_id'
I mocked the query params in my example with :
inf = "'test_inf'"
ri = "'test_ri'"
obj_id = "'test_obj_id'"

Related

How to convert Hive schema to Bigquery schema using Python?

What i get from api:
"name":"reports"
"col_type":"array<struct<imageUrl:string,reportedBy:string>>"
So in hive schema I got:
reports array<struct<imageUrl:string,reportedBy:string>>
Note: I got hive array schema as string from api
My target:
bigquery.SchemaField("reports", "RECORD", mode="NULLABLE",
fields=(
bigquery.SchemaField('imageUrl', 'STRING'),
bigquery.SchemaField('reportedBy', 'STRING')
)
)
Note: I would like to create universal code that can handle when i receive any number of struct inside of the array.
Any tips are welcome.
I tried creating a script that parses your input which is reports array<struct<imageUrl:string,reportedBy:string>>. This converts your input to a dictionary that could be used as schema when creating a table. The main idea of the apporach is instead of using SchemaField(), you can create a dictionary which is much easier than creating SchemaField() objects with parameters using your example input.
NOTE: The script is only tested based on your input and it can parse more fields if added in struct<.
import re
from google.cloud import bigquery
def is_even(number):
if (number % 2) == 0:
return True
else:
return False
def clean_string(str_value):
return re.sub(r'[\W_]+', '', str_value)
def convert_to_bqdict(api_string):
"""
This only works for a struct with multiple fields
This could give you an idea on constructing a schema dict for BigQuery
"""
num_even = True
main_dict = {}
struct_dict = {}
field_arr = []
schema_arr = []
# Hard coded this since not sure what the string will look like if there are more inputs
init_struct = sample.split(' ')
main_dict["name"] = init_struct[0]
main_dict["type"] = "RECORD"
main_dict["mode"] = "NULLABLE"
cont_struct = init_struct[1].split('<')
num_elem = len(cont_struct)
# parse fields inside of struct<
for i in range(0,num_elem):
num_even = is_even(i)
# fields are seen on even indices
if num_even and i != 0:
temp = list(filter(None,cont_struct[i].split(','))) # remove blank elements
for elem in temp:
fields = list(filter(None,elem.split(':')))
struct_dict["name"] = clean_string(fields[0])
# "type" works for STRING as of the moment refer to
# https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types
# for the accepted data types
struct_dict["type"] = clean_string(fields[1]).upper()
struct_dict["mode"] = "NULLABLE"
field_arr.append(struct_dict)
struct_dict = {}
main_dict["fields"] = field_arr # assign dict to array of fields
schema_arr.append(main_dict)
return schema_arr
sample = "reports array<struct<imageUrl:string,reportedBy:string,newfield:bool>>"
bq_dict = convert_to_bqdict(sample)
client = bigquery.Client()
project = client.project
dataset_ref = bigquery.DatasetReference(project, '20211228')
table_ref = dataset_ref.table("20220203")
table = bigquery.Table(table_ref, schema=bq_dict)
table = client.create_table(table)
Output:

How to pass variable in find() method in mongo db using python

TPID=[318,205,2624,2635]
Tid= len(TPID)
try:
mclient = MongoClient(host="tgl-mongodb22.rctanalytics.com", port=27017)
Db = mclient['sitereft4']
Db.authenticate('st_sitereference', 'rlQ2YnPKNlS0')
coll = Db['shopper_journey_sitedata']
for i in range(Tid):
data = coll.find({"third_party_site_id":318})
for datas in data:
None
print(datas["st_site_id"])
In place of 318 i need to pass the variable "Tid" so that it should run for all values.
how to do it ?
i tried below one it didn't worked:
data = coll.find({"third_party_site_id":Tid[i]})
If your goal is to find each site id in TPID=[318,205,2624,2635]
Structure the for loop logic as:
for i in TPID:
data = coll.find({"third_party_site_id":i})
print(data)

Python JSON scraping - how can I handle missing values?

I'm pretty new to coding, so I'm learning a lot as I go. This problem got me stumped, and even though I can find several similar questions on here, I can't find one that works or has a recognizable syntax to me.
I'm trying to scrape various user data from a JSON API, og then store those values in a MySQL database I've set up.
The code seems to run fine for the most part, but some users does not have the attributes I'm trying to scrape in the JSON, and thus I'm left with Nonetype errors that I cant seem to foil.
If possible I'd like to just store "0" in the database where the json does not contain the attribute.
In the m/snippet below this works fine for users that has a job, but users without a job returns Nonetype on jobposition and apparently breaks the loop.
response = requests.get("URL")
json_obj = json.loads(response.text)
timer = json_obj['timestamp']
jobposition = json_obj['job']['position']
query = "INSERT INTO users (timer, jobposition) VALUES (%s, %s)"
values = (timer, jobposition)
cursor = db.cursor()
cursor.execute(query, values)
db.commit()
Thanks in advance!
You can use for that the get() method of the dictionary as follow
timer = json_obj.get('timestamp', 0)
0 is the default value and in case there is no 'timestamp' attribute it will return 0.
For job position, you can do
jobposition = json_obj['job'].get('position', 0) if 'job' in json_obj else 0
Try this
try:
jobposition = json_obj['job']['position']
except:
jobposition = 0
You can more clearly declare the data schema using dataclasses:
from dataclasses import dataclass
from validated_dc import ValidatedDC
#dataclass
class Job(ValidatedDC):
name: str
position: int = 0
#dataclass
class Workers(ValidatedDC):
timer: int
job: Job
input_data = {
'timer': 123,
'job': {'name': 'driver'}
}
workers = Workers(**input_data)
assert workers.job.position == 0
https://github.com/EvgeniyBurdin/validated_dc

I have an Error with python flask cause of an API result (probably cause of my list) and my Database

I use flask, an api and SQLAlchemy with SQLite.
I begin in python and flask and i have problem with the list.
My application work, now i try a news functions.
I need to know if my json informations are in my db.
The function find_current_project_team() get information in the API.
def find_current_project_team():
headers = {"Authorization" : "bearer "+session['token_info']['access_token']}
user = requests.get("https://my.api.com/users/xxxx/", headers = headers)
user = user.json()
ids = [x['id'] for x in user]
return(ids)
I use ids = [x['id'] for x in user] (is the same that) :
ids = []
for x in user:
ids.append(x['id'])
To get ids information. Ids information are id in the api, and i need it.
I have this result :
[2766233, 2766237, 2766256]
I want to check the values ONE by One in my database.
If the values doesn't exist, i want to add it.
If one or all values exists, I want to check and return "impossible sorry, the ids already exists".
For that I write a new function:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(
login=session['login'], project_session=test
).first()
I have absolutely no idea to how check values one by one.
If someone can help me, thanks you :)
Actually I have this error :
sqlalchemy.exc.InterfaceError: (InterfaceError) Error binding
parameter 1 - probably unsupported type. 'SELECT user.id AS user_id,
user.login AS user_login, user.project_session AS user_project_session
\nFROM user \nWHERE user.login = ? AND user.project_session = ?\n
LIMIT ? OFFSET ?' ('my_tab_login', [2766233, 2766237, 2766256], 1, 0)
It looks to me like you are passing the list directly into the database query:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(login=session['login'], project_session=test).first()
Instead, you should pass in the ID only:
def test():
test = find_current_project_team()
for find_team in test:
find_team_db = User.query.filter_by(login=session['login'], project_session=find_team).first()
Asides that, I think you can do better with the naming conventions though:
def test():
project_teams = find_current_project_team()
for project_team in project_teams:
project_team_result = User.query.filter_by(login=session['login'], project_session=project_team).first()
All works thanks
My code :
project_teams = find_current_project_team()
for project_team in project_teams:
project_team_result = User.query.filter_by(project_session=project_team).first()
print(project_team_result)
if project_team_result is not None:
print("not none")
else:
project_team_result = User(login=session['login'], project_session=project_team)
db.session.add(project_team_result)
db.session.commit()

Filtering objects in Django based on optional arguments

Many times I find myself writing code similar to:
query = MyModel.objects.all()
if request.GET.get('filter_by_field1'):
query = query.filter(field1 = True)
if request.GET.get('filter_by_field2'):
query = query.filter(field2 = False)
field3_filter = request.GET.get('field3'):
if field3_filter is not None:
query = query.filter(field3 = field3_filter)
if field4_filter:
query = query.filter(field4 = field4_filter)
# etc...
return query
Is there a better, more generic way of building queries such as the one above?
If the only things that are ever going to be in request GET are potential query arguments, you could do this:
query = MyModel.objects.filter(**request.GET)

Categories