Python Parsing Trace File - python

I have a trace file downloaded from chrome://bookmarks and it looks like this
{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":13059,"ts":0},
{"args":{"name":"Compositor"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":42243,"ts":0},
{"args":{"name":"CompositorTileWorker1"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":23555,"ts":0},
{"args":{"name":"CompositorTileWorker2"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":40963,"ts":0},
{"args":{"name":"ThreadPoolForegroundWorker"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":12035,"ts":0},
{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":17240,"tid":13315,"ts":0},
{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":17247,"tid":13059,"ts":0},
{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":17244,"tid":18435,"ts":0},
I would like to read this file and load it into a dictionary so that I can get the traces whose name is equal to Chrome_ChildIOThread, and so on. How can I do that?

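Let's assume that the data you've shown as input is in a file called bookmarks.txt; then this should suffice:
import json

D = {'data': []}
with open('bookmarks.txt') as bm:
    for line in bm:
        try:
            # keep everything up to the last closing brace (drops the trailing comma)
            i = line.rindex('}') + 1
            # json.loads is safer than eval() on text read from a file
            j = json.loads(line[:i])
            if j['args']['name'] == 'Chrome_ChildIOThread':
                D['data'].append(j)
        except (ValueError, KeyError):
            # skip lines that are not a JSON object or have no args/name
            pass
print(D)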
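Since the question asks for Chrome_ChildIOThread "and so on", one possible extension is to group every event by its thread name in a single pass. The following is only a sketch: the function name group_by_thread_name is hypothetical, and it assumes each event sits on its own line with a trailing comma, as in the sample shown in the question.

```python
import json
from collections import defaultdict


def group_by_thread_name(lines):
    """Group trace events by their args.name value."""
    groups = defaultdict(list)
    for line in lines:
        # Assumption: one JSON object per line, followed by a trailing comma.
        text = line.strip().rstrip(',')
        if not text:
            continue
        try:
            event = json.loads(text)
            groups[event['args']['name']].append(event)
        except (ValueError, KeyError, TypeError):
            # Skip lines that are not JSON objects or have no args.name.
            continue
    return dict(groups)


sample = [
    '{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":13059,"ts":0},',
    '{"args":{"name":"Compositor"},"cat":"__metadata","name":"thread_name","ph":"M","pid":24050,"tid":42243,"ts":0},',
    '{"args":{"name":"Chrome_ChildIOThread"},"cat":"__metadata","name":"thread_name","ph":"M","pid":17240,"tid":13315,"ts":0},',
]
by_name = group_by_thread_name(sample)
print(len(by_name['Chrome_ChildIOThread']))
```

To use it on the file, pass the open file object instead of the sample list, since iterating a file yields its lines.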

Related

How do I extract just a single data element using usaddress?

I'm attempting to access one data element from usaddress. For example, PlaceName is the city field of the address. usaddress returns an ordered dictionary. I'm just trying to extract one value from the ordered dictionary.
import usaddress
temp = usaddress.parse("ZENIA, CA 95595")
print(temp)
try:
    print(temp.get['PlaceName'])
except AttributeError:
    print("ERROR")
Results:
[('ZENIA,', 'PlaceName'), ('CA', 'StateName'), ('95595', 'ZipCode')]
ERROR
I wanted just ZENIA.
If you get the data in the form of a list, I think you can create a simple function to extract the info as follows:
import re
data = [('ZENIA,', 'PlaceName'), ('CA', 'StateName'), ('95595', 'ZipCode')]
def get_place_name(data):
    flag = False
    for info in data:
        if 'PlaceName' in info:
            # strip punctuation such as the trailing comma
            return re.sub(r"[^a-zA-Z0-9]+", '', info[0])
    # no PlaceName entry was found
    return flag
Result:
res = get_place_name(data)
# 'ZENIA'
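As a variation on the same idea, the (token, label) pairs shown in the question's output can be turned into a dictionary keyed by label, so any field can be looked up by name. This is a sketch that works on the literal list from the question and does not call usaddress itself; labels_to_dict is a hypothetical helper name.

```python
def labels_to_dict(parsed):
    """Turn [(token, label), ...] into {label: token}, trimming trailing commas."""
    result = {}
    for token, label in parsed:
        cleaned = token.rstrip(',')
        # If a label repeats (for example a two-word city), join the tokens.
        result[label] = f"{result[label]} {cleaned}" if label in result else cleaned
    return result


parsed = [('ZENIA,', 'PlaceName'), ('CA', 'StateName'), ('95595', 'ZipCode')]
address = labels_to_dict(parsed)
print(address.get('PlaceName'))
```

Using .get() here returns None instead of raising when an address has no PlaceName component.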
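In Python 3:
import usaddress
addr = "ZENIA, CA 95595"
# tag() returns a tuple: (tagged address mapping, address type)
parsed_addr = usaddress.tag(addr)
print(parsed_addr)
try:
    place_name = parsed_addr[0]['PlaceName']
    print(place_name)
except KeyError as e:
    # a missing label raises KeyError, not AttributeError
    print(e)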
Try this:
import usaddress
temp = dict(usaddress.tag('ZENIA, CA 95595')[0])
print(temp['PlaceName'])
Your output would be:
ZENIA
For printing everything, just try:
print(temp)
Output is:
{'PlaceName': 'ZENIA', 'StateName': 'CA', 'ZipCode': '95595'}

Check that a key from json output exists

I keep getting the following error when trying to parse some json:
Traceback (most recent call last):
  File "/Users/batch/projects/kl-api/api/helpers.py", line 37, in collect_youtube_data
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
KeyError: 'brandingSettings'
How do I make sure that I check my JSON output for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value. Code below:
try:
    channel_id = channel_id_response_data['items'][0]['id']
    channel_info_url = YOUTUBE_URL + '/channels/?key=' + YOUTUBE_API_KEY + '&id=' + channel_id + '&part=snippet,contentDetails,statistics,brandingSettings'
    print('Querying:', channel_info_url)
    channel_info_response = requests.get(channel_info_url)
    channel_info_response_data = json.loads(channel_info_response.content)
    no_of_videos = int(channel_info_response_data['items'][0]['statistics']['videoCount'])
    no_of_subscribers = int(channel_info_response_data['items'][0]['statistics']['subscriberCount'])
    no_of_views = int(channel_info_response_data['items'][0]['statistics']['viewCount'])
    avg_views = round(no_of_views / no_of_videos, 0)
    photo = channel_info_response_data['items'][0]['snippet']['thumbnails']['high']['url']
    description = channel_info_response_data['items'][0]['snippet']['description']
    start_date = channel_info_response_data['items'][0]['snippet']['publishedAt']
    title = channel_info_response_data['items'][0]['snippet']['title']
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except Exception as e:
    raise Exception(e)
You can either wrap the assignment in something like
try:
    keywords = channel_info_response_data['items'][0]['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = "default value"
or test for the key first with the in operator (dict.has_key() exists only in Python 2 and was removed in Python 3). IMHO, in your case the first solution is preferable.
Suppose you have a dict. There are two options for handling a missing key:
1) Get the key with a default value:
d = {}
val = d.get('k', 10)
val will be 10, since there is no key named k.
2) Use try-except:
d = {}
try:
    val = d['k']
except KeyError:
    val = 10
This way is far more flexible, since you can do anything in the except block, even ignore the error with a pass statement if you really don't care about it.
How do I make sure that I check my JSON output
At this point your "JSON output" is just a plain native Python dict
for a key before assigning it to a variable? If a key isn’t found, then I just want to assign a default value
Now that you know you have a dict, browsing the official documentation for dict methods should answer the question:
https://docs.python.org/3/library/stdtypes.html#dict.get
get(key[, default])
Return the value for key if key is in the dictionary, else default. If default is not given, it defaults to None, so that this method never raises a KeyError.
so the general case is:
var = data.get(key, default)
Now if you have deeply nested dicts/lists where any key or index could be missing, catching KeyErrors and IndexErrors can be simpler:
try:
    var = data[key1][index1][key2][index2][keyN]
except (KeyError, IndexError):
    var = default
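If the same pattern is needed in many places, the try/except can be wrapped in a small helper. The following is a sketch under the assumption that the data is made of plain dicts and lists; deep_get is a hypothetical name and not part of the standard library.

```python
def deep_get(data, path, default=None):
    """Follow a sequence of keys/indexes through nested dicts and lists."""
    current = data
    for step in path:
        try:
            current = current[step]
        except (KeyError, IndexError, TypeError):
            # Missing key, index out of range, or a non-container in the way.
            return default
    return current


response = {'items': [{'snippet': {'title': 'Demo'}}]}
title = deep_get(response, ['items', 0, 'snippet', 'title'], default='')
keywords = deep_get(response, ['items', 0, 'brandingSettings', 'channel', 'keywords'], default='')
print(repr(title), repr(keywords))
```

Catching TypeError as well covers the case where an intermediate value is None or a number rather than a dict or list.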
As a side note: your code snippet is filled with repeated channel_info_response_data['items'][0]['statistics'] and channel_info_response_data['items'][0]['snippet'] expressions. Using intermediate variables will make your code more readable, easier to maintain, AND a bit faster too:
# always set a timeout if you don't want the program to hang forever
channel_info_response = requests.get(channel_info_url, timeout=30)
# always check the response status - having a response doesn't
# mean you got what you expected. Here we use the `raise_for_status()`
# shortcut, which raises an exception for 4xx and 5xx responses.
channel_info_response.raise_for_status()
# requests knows how to deal with json:
channel_info_response_data = channel_info_response.json()
# we assume that the response MUST have `['items'][0]`,
# and that this item MUST have "statistics" and "snippet"
item = channel_info_response_data['items'][0]
stats = item["statistics"]
snippet = item["snippet"]
no_of_videos = int(stats.get('videoCount', 0))
no_of_subscribers = int(stats.get('subscriberCount', 0))
no_of_views = int(stats.get('viewCount', 0))
# guard against a channel with no videos to avoid dividing by zero
avg_views = round(no_of_views / no_of_videos, 0) if no_of_videos else 0
try:
    photo = snippet['thumbnails']['high']['url']
except KeyError:
    photo = None
description = snippet.get('description', "")
start_date = snippet.get('publishedAt', None)
title = snippet.get('title', "")
try:
    keywords = item['brandingSettings']['channel']['keywords']
except KeyError:
    keywords = ""
You may also want to learn about string formatting (concatenating strings is quite error-prone and barely readable), and how to pass arguments to requests.get().
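To illustrate the point about building the URL, here is a sketch that assembles the query string from a dict with the standard library instead of concatenating strings. The base URL, API key, and channel id values are placeholders, and build_channel_info_url is a hypothetical helper. With requests, the same dict can be passed as requests.get(url, params=params), which encodes it in a similar way.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_channel_info_url(base_url, api_key, channel_id):
    """Build the channel-info URL from its parts, letting urlencode escape values."""
    params = {
        'key': api_key,
        'id': channel_id,
        'part': 'snippet,contentDetails,statistics,brandingSettings',
    }
    return '{}/channels/?{}'.format(base_url, urlencode(params))


# Placeholder values, not real endpoints or credentials.
url = build_channel_info_url('https://example.invalid/youtube', 'API_KEY', 'abc 123')
query = parse_qs(urlsplit(url).query)
print(query['id'])
```

Letting the library escape the values avoids broken URLs when an id or key contains spaces or other reserved characters.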

adding to nested dictionaries in python

I have a nested dictionary in the form of:
self.emoji_per_word = {0: {'worte': 0, 'emojis': 0, '#': 0}}
Now I need to add more sub-dictionaries to this as my program runs. I do this:
worte = 0
emojis = 0
# some code that assigns values to the 2 variables and creates the time_stamp variable
if time_stamp in self.emoji_per_word:
    self.emoji_per_word[time_stamp]['worte'] = self.emoji_per_word[time_stamp]['worte'] + worte
    self.emoji_per_word[time_stamp]['emojis'] = self.emoji_per_word[time_stamp]['emojis'] + emojis
else:
    self.emoji_per_word[time_stamp]['worte'] = worte
    self.emoji_per_word[time_stamp]['emojis'] = emojis
As you can see, I try to test whether the key time_stamp already exists and, if it does, update the value with the new data. If not, I want to create the key time_stamp and assign it an initial value. However, I'm getting a KeyError once the program goes past the initial value (see top).
Exception in thread Video 1:
Traceback (most recent call last):
  File "C:\Anaconda\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "C:\MA\Code\jsonparser_v2\jsonparser_v2.py", line 418, in run
    self.process_json()
  File "C:\MA\Code\jsonparser_v2\jsonparser_v2.py", line 201, in process_json
    self.emoji_per_word[time_stamp]['worte'] = worte
KeyError: 1
What I want in the end is something like this:
self.emoji_per_word = {0: {'worte': 20, 'emojis': 5, '#':0.25}, 1: {'worte': 20, 'emojis': 5, '#':0.25}}
What am I doing wrong here?
You're getting the error because self.emoji_per_word[time_stamp] doesn't exist when time_stamp != 0, so you need to create the inner dictionary first before assigning values to it, like so:
else:
    # create the sub-dictionary for this time_stamp before filling it
    self.emoji_per_word[time_stamp] = {}
    self.emoji_per_word[time_stamp]['worte'] = worte
    self.emoji_per_word[time_stamp]['emojis'] = emojis
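An alternative that removes the if/else entirely is collections.defaultdict, which creates the inner dictionary on first access. This is a sketch only: it uses a module-level dict instead of self.emoji_per_word, add_counts is a hypothetical helper, and it assumes '#' means emojis per word, which matches the 5/20 = 0.25 figures in the desired output.

```python
from collections import defaultdict

# Every new time_stamp starts from a zeroed sub-dictionary.
emoji_per_word = defaultdict(lambda: {'worte': 0, 'emojis': 0, '#': 0})


def add_counts(time_stamp, worte, emojis):
    """Add word and emoji counts to the entry for time_stamp."""
    entry = emoji_per_word[time_stamp]  # created automatically if missing
    entry['worte'] += worte
    entry['emojis'] += emojis
    # Assumption: '#' is the emoji-per-word ratio.
    entry['#'] = entry['emojis'] / entry['worte'] if entry['worte'] else 0


add_counts(0, 20, 5)
add_counts(1, 10, 2)
add_counts(1, 10, 3)
print(dict(emoji_per_word))
```

With a plain dict, self.emoji_per_word.setdefault(time_stamp, {...}) gives the same create-if-missing behaviour.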

Iterating collections inside collections in Firestore

If I iterate over every user.id inside the user collection, I get every user.id printed out correctly:
user_ref = db.collection(u'users')
for user_collection in user_ref.get():
    print(user_collection.id, file = sys.stderr)
Now, when I try to iterate a collection inside each of the documents in the user collection, the original iteration that prints user.id does not run to completion:
user_ref = db.collection(u'users')
for user_collection in user_ref.get():
    print(user_collection.id, file = sys.stderr)
    s2_ref = user_ref.document(user_collection.id).collection(u'preferences')
    for s2 in s2_ref.get():
        try:
            print(s2.id, file = sys.stderr)
        except google.cloud.exceptions.NotFound:
            pass
I have included an exception to bypass empty collections.
How can I complete the iteration correctly?
I just had to create a list from the first set of results, and then iterate over each id separately:
user_id_array = []
for user_collection in user_ref.get():
    user_id_array.append(user_collection.id)
for user_id in user_id_array:
    try:
        # doc_ref is not defined in this snippet; it is presumably a collection reference like user_ref
        suscription_ref = doc_ref.document(user_id).collection(u'suscriptions').document(user_id).get()
        print(suscription_ref.id,file = sys.stderr)
    except google.cloud.exceptions.NotFound:
        pass
It takes more time, but it'll get you there.

Loading function parameters from a text file

I have the following function:
def request( url, type, headers, simulate = False, data = {}):
I want to be able to load the parameters from a text file and pass them to the function. I tried using the evil eval below:
if execute_recovery:
    for command in content:
        logger.debug("Executing: "+command)
        try:
            result = eval(utilities.request("{0}").format(command))
            if not result["Success"]:
                continue_recovery = utilities.query_yes_no("Warning: Previous recovery command failed, attempt to continue recovery?\n")
                if not continue_recovery:
                    break
            else:
                logger.debug("Command executed successfully...")
        except Exception as e:
            logger.debug( "Recovery: Eval Error, %s" % str(e) )
Where command would be a line in a text file like:
"http://192.168.1.1/accounts/1/users/1",delete,headers,simulate=False,data={}
This throws me the following error:
'request() takes at least 3 arguments (1 given)'
So presumably this means that it is interpreting the command as a single string instead of different parameters.
Does anybody know how to solve this?
I can't understand what you are trying to do there with eval or format. For one thing, you've put eval around the call to request itself, so it will evaluate the return value rather than call it with some dynamic value.
But you don't need eval at all. You just need to pass the arguments using the * and ** operators:
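import ast

args = []
kwargs = {}
# note: a plain split breaks if a value itself contains a comma
for arg in command.split(','):
    if '=' in arg:
        # split only on the first '=' so the value may contain '='
        k, v = arg.split('=', 1)
        kwargs[k] = ast.literal_eval(v)
    else:
        args.append(arg)
result = utilities.request(*args, **kwargs)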
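The approach above can be tried end to end with a stand-in function. This is a sketch: the request function below is a hypothetical stub that only echoes its arguments (the real utilities.request is not shown in the question), and parse_command is a hypothetical helper that also strips the quotes around positional values.

```python
import ast


def request(url, type, headers, simulate=False, data=None):
    """Hypothetical stand-in for utilities.request; just echoes its arguments."""
    return {"Success": True, "url": url, "type": type, "headers": headers,
            "simulate": simulate, "data": data if data is not None else {}}


def parse_command(command):
    """Split an 'a,b,key=value' line into positional and keyword arguments."""
    args, kwargs = [], {}
    # Limitation: breaks if a value contains a comma or a URL contains '='.
    for part in command.split(','):
        part = part.strip()
        if '=' in part:
            key, value = part.split('=', 1)
            kwargs[key.strip()] = ast.literal_eval(value.strip())
        else:
            # Remove surrounding double quotes from quoted positional values.
            args.append(part.strip('"'))
    return args, kwargs


command = '"http://192.168.1.1/accounts/1/users/1",delete,headers,simulate=False,data={}'
args, kwargs = parse_command(command)
result = request(*args, **kwargs)
print(result["Success"], args, kwargs)
```

Note that the third positional value arrives as the literal string 'headers', not as a headers object, so a real caller would still need to substitute the actual headers.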
Using @BurhanKhalid's suggestion, I decided to store the parameters as a JSON object and load them at run time like so:
Store parameters here:
def request( url, type, headers, simulate = False, data = {}):
    if simulate:
        recovery_command = {"url":url, "type" : type, "data" : data}
        recovery.add_command(json.dumps(recovery_command))
    ...
Load parameters here:
def recovery():
    ...
    if execute_recovery:
        for command in content:
            logger.debug("Executing: "+command)
            try:
                recovery_command = json.loads(command)
                result = utilities.request(url = recovery_command["url"], type = recovery_command["type"], headers = headers, simulate = False, data = recovery_command["data"])
                if not result["Success"]:
                    continue_recovery = utilities.query_yes_no("Warning: Previous recovery command failed, attempt to continue recovery?\n")
                    if not continue_recovery:
                        break
                else:
                    logger.debug("Command executed successfully...")
            except Exception as e:
                logger.debug( "Recovery: Eval Error, %s" % str(e) )
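The JSON round trip can be checked in isolation with a stub. In this sketch, request is a hypothetical stand-in for utilities.request and the headers value is an assumed placeholder; because the stored keys match the parameter names, the loaded dict can be unpacked with ** instead of naming each field.

```python
import json


def request(url, type, headers, simulate=False, data=None):
    """Hypothetical stand-in for utilities.request."""
    return {"Success": True, "url": url, "type": type, "data": data or {}}


# What the simulate branch would store, one JSON object per command.
stored = json.dumps({"url": "http://192.168.1.1/accounts/1/users/1",
                     "type": "delete", "data": {}})

# At recovery time: load the command and unpack it into keyword arguments.
recovery_command = json.loads(stored)
headers = {"Accept": "application/json"}  # assumed placeholder, supplied at run time
result = request(headers=headers, simulate=False, **recovery_command)
print(result["Success"], result["type"])
```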
