Python: Convert specific value in dictionary from string to int

Python: Convert specific value in dictionary from string to int - python

i want to convert specific value in dictionary from string to int.
This is the dictionary:
for r in repo_tags:
name, tag = r.split(":")
image = {"repo": name, "tag": tag, "id": img_id[1][:12], "full_name": r}
print(image["repo"])
images.append(image)
print(image)
This is the output:
{'repo': 'python', 'tag': 'latest', 'id': '254d4a8a8f31', 'full_name': 'python:latest'}
nginx
{'repo': 'nginx', 'tag': 'latest', 'id': '35c43ace9216', 'full_name': 'nginx:latest'}
hello-world
{'repo': 'hello-world', 'tag': 'latest', 'id': 'bf756fb1ae65', 'full_name': 'hello-world:latest'}
The id is a string, but i want this to be an int to make it a primary key in database.
Is this possible to change only this value to an int?

Related

How can I use the batchupdate function of the google form api with python to modify the url and answers?

Here is an example of my form. How can I use python to modify the url and answer for the first question, as I am not familiar with using batchupdate?
I can use "get" to retrieve information from the form.
{'formId': '1q4pJMDtiLxQ2cjmLXxowqJ5VPfI68bUUo',
'info': {'title': 'PIXEL ', 'documentTitle': 'daily'},
'settings': {'quizSettings': {'isQuiz': True}},
'revisionId': '00000067',
'responderUri': 'https://docs.google.com/forms/d/e/1FAIpQLScap6ZdpOnWIyxWqZNXjlfWW9DgPe-Wv_CUtziWw/viewform',
'items': [{'itemId': '7c0ddb37', 'pageBreakItem': {}},
{'itemId': '2870b06c', 'videoItem': {'video': {'youtubeUri': 'www.youtube.com/watch?v=Lt5HqPvM-eI', 'properties': {'alignment': 'LEFT', 'width': 320}}}},
{'itemId': '381aedf6', 'questionGroupItem': {'questions': [{'questionId': '4d7f011e', 'required': True, 'rowQuestion': {'title': 'pick'}}], 'grid': {'columns': {'type': 'RADIO', 'options': [{'value': '1'}, {'value': '2'}, {'value': '3'}]}}}, 'title': 'pay'},
{'itemId': '0f9dc00b', 'title': 'number', 'questionItem': {'question': {'questionId': '39523976', 'required': True,
'grading': {'correctAnswers': {'answers': [{'value':'1115'}]}}, 'textQuestion': {}}}},
{'itemId': '0a12a42e', 'pageBreakItem': {}},
{'itemId': '19640fea', 'videoItem': {'video': {'youtubeUri': 'www.youtube.com/watch?v=Lt5HqPvM-eI', 'properties': {'alignment': 'LEFT', 'width': 320}}}},
{'itemId': '685ba545', 'questionGroupItem': {'questions': [{'questionId': '044f9f9b', 'required': True, 'rowQuestion': {'title': 'pick'}}], 'grid': {'columns': {'type': 'RADIO', 'options': [{'value': '1'}, {'value': '2'}, {'value': '3'}]}}}, 'title': 'pay'},
{'itemId': '6a9d1b88', 'title': 'number', 'questionItem': {'question': {'questionId': '2199beb0', 'required': True,
'grading': {'correctAnswers': {'answers': [{'value': '1115'}]}}, 'textQuestion': {}}}}]}
The official documentation has too few examples for me to understand how to apply it to my form.
update = {
"requests": [{
"updateItem": {
"item": {
"title": "Homework video",
"description": "Quizzes in Google Forms",
"videoItem": {
"video": {
"youtubeUri": "https://www.youtube.com/watch?v=Lt5HqPvM-eI"
}
}
},"location": {
"index": 0},
"updateMask": "description,youtubeUri"
}
}]
}
question_setting = service.forms().batchUpdate(
formId=form_id, body=update).execute()

From your following reply,
I want to update the youtubeUri item and use a new URL. How can I do this? i have two question use the video,how do i update the first question URL ?
I understood your question is as follows.
You want to update youtubeUri of 1st question in Google Forms using googleapis for python.
In this case, how about the following sample script?
Sample script:
service = # Please use your client
formId = "###" # Please set your Google Form ID.
after = "https://www.youtube.com/watch?v=###" # Please set YouTube URL you want to replace. In this sample, the existing URL is changed to this URL.
res = service.forms().get(formId=formId).execute()
itemIds = [[i, e["itemId"]] for i, e in enumerate(res.get("items")) if "videoItem" in e]
topItem = itemIds[0] # From your question, `youtubeUri` of the 1st question.
req = {
"requests": [
{
"updateItem": {
"item": {
"itemId": topItem[1],
"videoItem": {
"video": {
"youtubeUri": after,
}
},
},
"location": {"index": topItem[0]},
"updateMask": "videoItem.video.youtubeUri",
}
}
]
}
service.forms().batchUpdate(formId=formId, body=req).execute()
When this script is run, first, all items are retrieved. And, the item IDs including youtubeUri are retrieved. And, using the 1st item ID, the value of youtubeUri is changed to the value of after you set.
Note:
In this sample script, it supposes that you have already been able to get and out values to Google Form using Google Form API. Please be careful about this.
Reference:
Method: forms.batchUpdate

Parse JSON value into key-value attributes

I got the following output from my API:
{
'Type': 'Notification',
'MessageId': 'xxx',
'TopicArn': 'xxx',
'Subject': 'xxx',
'Message': 'EventType=Delete, FriendlyType=was deleted, '
'Timestamp=2021-11-08T15:30:45Z, UserId=1111, UserName=me#me.com, '
'IPAddr=(empty), AccountId=22222, AccountName=test-account, '
'ProjectId=test-project',
'Timestamp': '2021-11-08T15:30:46.214Z',
'SignatureVersion': '1'
}
Now I want to access the "Message" variable - once I am in, and get the following output (as already visible in the previous mentioned JSON):
EventType=Delete, FriendlyType=was deleted, Timestamp=2021-11-08T15:30:45Z, UserId=1111, UserName=me#me.com, IPAddr=(empty), AccountId=22222, AccountName=test-account, ProjectId=test-project
How can I now access the keys like EventType, FriendlyType, etc.? I assume that I have to convert this output at first to a valid JSON, but I am currently baffled.

In case that you are not able to receive the Message data as a JSON, a way to handle the situation is convert the message string into a dict <key>:<value>:
message_as_dict = dict(map(lambda var: var.strip().split("=") ,message.split(",")))
NOTICE the .strip() in order to remove the spaces on the beginning of the key.
That shoud create a dictionary with the following structure:
{'EventType': 'Delete', 'FriendlyType': 'was deleted', 'Timestamp': '2021-11-08T15:30:45Z', 'UserId': '1111', 'UserName': 'me#me.com', 'IPAddr': '(empty)', 'AccountId': '22222', 'AccountName': 'test-account', 'ProjectId': 'test-project'}
Then you can access to the values with, for example:
print(message_as_dict["UserName"])
> me#me.com

you can parse your string spliting and then use it to create a dict. Maybe it's not the best solution but it's a simple one.
response = {
'Type': 'Notification',
'MessageId': 'xxx',
'TopicArn': 'xxx',
'Subject': 'xxx',
'Message': 'EventType=Delete, FriendlyType=was deleted, Timestamp=2021-11-08T15:30:45Z, UserId=1111, UserName=me#me.com, IPAddr=(empty), AccountId=22222, AccountName=test-account, ProjectId=test-project',
'Timestamp': '2021-11-08T15:30:46.214Z',
'SignatureVersion': '1'
}
keyVals = [el.split('=') for el in response['Message'].split(', ')]
subdict = {}
for key,val in keyVals:
subdict[key] = val

As mentioned in one of the answer, you can parse your Message string, but I too feel it won't be the best solution. What I noticed is that your JSON is not in proper format. See below for proper JSON you should be getting from your API.
{
"Type": "Notification",
"MessageId": "xxx",
"TopicArn": "xxx",
"Subject": "xxx",
"Message": {
"EventType": "Delete",
"FriendlyType": "was deleted",
"Timestamp": "2021-11-08T15:30:45Z",
"UserId": "1111",
"UserName": "me#me.com",
"IPAddr": "(empty)",
"AccountId": "22222",
"AccountName": "test-account",
"ProjectId": "test-project"
},
"SignatureVersion": "1"
}
Once you are able to get this output, you may further access nested objects. For example, to access FriendlyType from Message, you can simply say, body.Message.FriendlyType. body here means your entire JSON object.

You could do it by splitting the 'Message' string up into (key, value) pairs and constructing a dictionary from them:
from pprint import pprint
output = {'Type': 'Notification',
'MessageId': 'xxx',
'TopicArn': 'xxx',
'Subject': 'xxx',
'Message': 'EventType=Delete, FriendlyType=was deleted, '
'Timestamp=2021-11-08T15:30:45Z, UserId=1111, UserName=me#me.com, '
'IPAddr=(empty), AccountId=22222, AccountName=test-account, '
'ProjectId=test-project',
'Timestamp': '2021-11-08T15:30:46.214Z',
'SignatureVersion': '1'}
msg_dict = dict(pair.split('=') for pair in output['Message'].split(', '))
pprint(msg_dict, sort_dicts=False)
Output:
{'EventType': 'Delete',
'FriendlyType': 'was deleted',
'Timestamp': '2021-11-08T15:30:45Z',
'UserId': '1111',
'UserName': 'me#me.com',
'IPAddr': '(empty)',
'AccountId': '22222',
'AccountName': 'test-account',
'ProjectId': 'test-project'}

Create a new dictionary from a nested JSON output after parsing

In python3 I need to get a JSON response from an API call,
and parse it so I will get a dictionary That only contains the data I need.
The final dictionary I ecxpt to get is as follows:
{'Severity Rules': ('cc55c459-eb1a-11e8-9db4-0669bdfa776e', ['cc637182-eb1a-11e8-9db4-0669bdfa776e']), 'auto_collector': ('57e9a4ec-21f7-4e0e-88da-f0f1fda4c9d1', ['0ab2470a-451e-11eb-8856-06364196e782'])}
the JSON response returns the following output:
{
'RuleGroups': [{
'Id': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rules',
'Order': 1,
'Enabled': True,
'Rules': [{
'Id': 'cc637182-eb1a-11e8-9db4-0669bdfa776e',
'Name': 'Severity Rule',
'Description': 'Look for default severity text',
'Enabled': False,
'RuleMatchers': None,
'Rule': '\\b(?P<severity>DEBUG|TRACE|INFO|WARN|ERROR|FATAL|EXCEPTION|[I|i]nfo|[W|w]arn|[E|e]rror|[E|e]xception)\\b',
'SourceField': 'text',
'DestinationField': 'text',
'ReplaceNewVal': '',
'Type': 'extract',
'Order': 21520,
'KeepBlockedLogs': False
}],
'Type': 'user'
}, {
'Id': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c',
'Name': 'auto_collector',
'Order': 4,
'Enabled': True,
'Rules': [{
'Id': '2d6bdc1d-4064-11eb-8856-06364196e782',
'Name': 'auto_collector',
'Description': 'DO NOT CHANGE!! Created via API coralogix-blocker tool',
'Enabled': False,
'RuleMatchers': None,
'Rule': 'AUTODISABLED',
'SourceField': 'subsystemName',
'DestinationField': 'subsystemName',
'ReplaceNewVal': '',
'Type': 'block',
'Order': 1,
'KeepBlockedLogs': False
}],
'Type': 'user'
}]
}
I was able to create a dictionary that contains the name and the RuleGroupsID, like that:
response = requests.get(url,headers=headers)
output = response.json()
outputlist=(output["RuleGroups"])
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
# Create a dictionary of NAME + ID
ruleDic = {}
for key in groupRuleName:
for value in groupRuleID:
ruleDic[key] = value
groupRuleID.remove(value)
break
Which gave me a simple dictionary:
{'Severity Rules': 'cc55c459-eb1a-11e8-9db4-0669bdfa776e', 'Rewrites': 'ddbaa27e-1747-11e9-9db4-0669bdfa776e', 'Extract': '0cb937b6-2354-d23a-5806-4559b1f1e540', 'auto_collector': '4f6fa7c6-d60f-49cd-8c3d-02dcdff6e54c'}
but when I tried to parse it as nested JSON things just didn't work.

In the end, I managed to create a function that returns this dictionary,
I'm doing it by breaking the JSON into 3 lists by the needed elements (which are Name, Id, and Rules from the first nest), and then create another list from the nested JSON ( which listed everything under Rule) which only create a list from the keyword "Id".
Finally creating a dictionary using a zip command on the lists and dictionaries created earlier.
def get_filtered_rules() -> List[dict]:
groupRuleName = [li['Name'] for li in outputlist]
groupRuleID = [li['Id'] for li in outputlist]
ruleIDList = [li['Rules'] for li in outputlist]
ruleIDListClean = []
ruleClean = []
for sublist in ruleIDList:
try:
lstRule = [item['Rule'] for item in sublist]
ruleClean.append(lstRule)
ruleContent=list(zip(groupRuleName, ruleClean))
ruleContentDictionary = dict(ruleContent)
lstID = [item['Id'] for item in sublist]
ruleIDListClean.append(lstID)
# Create a dictionary of NAME + ID + RuleID
ruleDic = dict(zip(groupRuleName, zip(groupRuleID, ruleIDListClean)))
except Exception as e: print(e)
return ruleDic

How to flatten nested dict formatted '_source' column of csv, into dataframe

I have a csv with 500+ rows where one column "_source" is stored as JSON. I want to extract that into a pandas dataframe. I need each key to be its own column. #I have a 1 mb Json file of online social media data that I need to convert the dictionary and key values into their own separate columns. The social media data is from Facebook,Twitter/web crawled... etc. There are approximately 528 separate rows of posts/tweets/text with each having many dictionaries inside dictionaries. I am attaching a few steps from my Jupyter notebook below to give a more complete understanding. need to turn all key value pairs for dictionaries inside dictionaries into columns inside a dataframe
Thank you so much this will be a huge help!!!
I have tried changing it to a dataframe by doing this
source = pd.DataFrame.from_dict(source, orient='columns')
And it returns something like this... I thought it might unpack the dictionary but it did not.
#source.head()
#_source
#0 {'sub_organization_id': 'default', 'uid': 'aba...
#1 {'sub_organization_id': 'default', 'uid': 'ab0...
#2 {'sub_organization_id': 'default', 'uid': 'ac0...
below is the shape
#source.shape (528, 1)
below is what the an actual "_source" row looks like stretched out. There are many dictionaries and key:value pairs where each key needs to be its own column. Thanks! The actual links have been altered/scrambled for privacy reasons.
{'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
before you post make sure the actual code works for the data attached. Thanks!
The below code I tried but it did not work there was a syntax error that I could not figure out.
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
pd.io.json.json_normalize(source_data.[_source].apply(json.loads))
^
SyntaxError: invalid syntax
Whoever can help me with this will be a saint!

I had to do something like that a while back. Basically I used a function that completely flattened out the json to identify the keys that would be turned into the columns, then iterated through the json to reconstruct a row and append each row into a "results" dataframe. So with the data you provided, it created 52 column row and looking through it, looks like it included all the keys into it's own column. Anything nested, for example: 'meta': {'rule_matcher':[{'atribs': {'website': ...]} should then have a column name meta.rule_matcher.atribs.website where the '.' denotes those nested keys
data_source = {'sub_organization_id': 'default',
'uid': 'ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b',
'project_veid': 'default',
'campaign_id': 'default',
'organization_id': 'default',
'meta': {'rule_matcher': [{'atribs': {'website': 'github.com/res',
'source': 'Explicit',
'version': '1.1',
'type': 'crawl'},
'results': [{'rule_type': 'hashtag',
'rule_tag': 'Far',
'description': None,
'project_veid': 'A7180EA-7078-0C7F-ED5D-86AD7',
'campaign_id': '2A6DA0C-365BB-67DD-B05830920',
'value': '#Far',
'organization_id': None,
'sub_organization_id': None,
'appid': 'ray',
'project_id': 'CDE2F42-5B87-C594-C900E578C',
'rule_id': '1838',
'node_id': None,
'metadata': {'campaign_title': 'AF',
'project_title': 'AF '}}]}],
'render': [{'attribs': {'website': 'github.com/res',
'version': '1.0',
'type': 'Page Render'},
'results': [{'render_status': 'success',
'path': 'https://east.amanaws.com/rays-ime-store/renders/b/b/70f7dffb8b276f2977f8a13415f82c.jpeg',
'image_hash': 'bb7674b8ea3fc05bfd027a19815f82c',
'url': 'https://discooprdapp.com/',
'load_time': 32}]}]},
'norm_attribs': {'website': 'github.com/res',
'version': '1.1',
'type': 'crawl'},
'project_id': 'default',
'system_timestamp': '2019-02-22T19:04:53.569623',
'doc': {'appid': 'subtter',
'links': [],
'response_url': 'https://discooprdapp.com',
'url': 'https://discooprdapp.com/',
'status_code': 200,
'status_msg': 'OK',
'encoding': 'utf-8',
'attrs': {'uid': '2ab8f2651cb32261b911c990a8b'},
'timestamp': '2019-02-22T19:04:53.963',
'crawlid': '7fd95-785-4dd259-fcc-8752f'},
'type': 'crawl',
'norm': {'body': '\n',
'domain': 'discordapp.com',
'author': 'crawl',
'url': 'https://discooprdapp.com',
'timestamp': '2019-02-22T19:04:53.961283+00:00',
'id': '7fc5-685-4dd9-cc-8762f'}}
Code:
def flatten_json(y):
out = {}
def flatten(x, name=''):
if type(x) is dict:
for a in x:
flatten(x[a], name + a + '_')
elif type(x) is list:
i = 0
for a in x:
flatten(a, name + str(i) + '_')
i += 1
else:
out[name[:-1]] = x
flatten(y)
return out
flat = flatten_json(data_source)
import pandas as pd
import re
results = pd.DataFrame()
special_cols = []
columns_list = list(flat.keys())
for item in columns_list:
try:
row_idx = re.findall(r'\_(\d+)\_', item )[0]
except:
special_cols.append(item)
continue
column = re.findall(r'\_\d+\_(.*)', item )[0]
column = re.sub(r'\_\d+\_', '.', column)
row_idx = int(row_idx)
value = flat[item]
results.loc[row_idx, column] = value
for item in special_cols:
results[item] = flat[item]
Output:
print (results.to_string())
atribs_website atribs_source atribs_version atribs_type results.rule_type results.rule_tag results.description results.project_veid results.campaign_id results.value results.organization_id results.sub_organization_id results.appid results.project_id results.rule_id results.node_id results.metadata_campaign_title results.metadata_project_title attribs_website attribs_version attribs_type results.render_status results.path results.image_hash results.url results.load_time sub_organization_id uid project_veid campaign_id organization_id norm_attribs_website norm_attribs_version norm_attribs_type project_id system_timestamp doc_appid doc_response_url doc_url doc_status_code doc_status_msg doc_encoding doc_attrs_uid doc_timestamp doc_crawlid type norm_body norm_domain norm_author norm_url norm_timestamp norm_id
0 github.com/res Explicit 1.1 crawl hashtag Far NaN A7180EA-7078-0C7F-ED5D-86AD7 2A6DA0C-365BB-67DD-B05830920 #Far NaN NaN ray CDE2F42-5B87-C594-C900E578C 1838 NaN AF AF github.com/res 1.0 Page Render success https://east.amanaws.com/rays-ime-store/render... bb7674b8ea3fc05bfd027a19815f82c https://discooprdapp.com/ 32.0 default ac0fafe9ba98327f2d0c72ddc365ffb76336czsa13280b default default default github.com/res 1.1 crawl default 2019-02-22T19:04:53.569623 subtter https://discooprdapp.com https://discooprdapp.com/ 200 OK utf-8 2ab8f2651cb32261b911c990a8b 2019-02-22T19:04:53.963 7fd95-785-4dd259-fcc-8752f crawl \n discordapp.com crawl https://discooprdapp.com 2019-02-22T19:04:53.961283+00:00 7fc5-685-4dd9-cc-8762f

Trouble accessing a value from a dictionary

I have the following dictionary
Dict = {'Manu':{u'ID0020879.07': [{'ID': u'ID0020879.07', 'log': u'log-123-56', 'owner': [Manu], 'item': u'WRAITH', 'main_id': 5013L, 'status': u'noticed', 'serial': u'89980'}]}}
How can I access the serial from this dictionary?
I tried Dict['Manu']['serial'], But its not working as expected..
Guys any idea?

Your dictionary is very nested one.Try like this.
In [1]: Dict['Manu']['ID0020879.07'][0]['serial']
Out[1]: u'89980'

Here is the restructured dictionary.
{
'Manu': {
u'ID0020879.07': [{
'ID': u'ID0020879.07',
'log': u'log-123-56',
'owner': [Manu],
'item': u'WRAITH',
'main_id': 5013L,
'status': u'noticed',
'serial': u'89980'
}]
}
}
Now, you can see where the serial key is located more clearly (not under Manu)...
It is instead
Dict['Manu']['ID0020879.07'][0]['serial']
I suggest you fix that data source to not make ID0020879.07 a key of the data (because it is duplicated in the ID key of that object in the list).
Perhaps fix like so where the Manu key maps to a list of "accounts", each with an ID and other fields
{
'Manu': [{
'ID': u'ID0020879.07',
'log': u'log-123-56',
'owner': [Manu],
'item': u 'WRAITH',
'main_id': 5013L,
'status': u'noticed',
'serial': u'89980'
}]
}
And then you could do
Dict['Manu'][0]['serial']
Or loop the list to get all the serial keys
for item in Dict['Manu']:
print(item['serial'])

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.

Python: Convert specific value in dictionary from string to int - python

Related

How can I use the batchupdate function of the google form api with python to modify the url and answers?

Parse JSON value into key-value attributes

Create a new dictionary from a nested JSON output after parsing

How to flatten nested dict formatted '_source' column of csv, into dataframe

Trouble accessing a value from a dictionary

Categories

Resources