Extract a part of a dictionary file - python

Before anyone says I didn't search for an answer: I did, and even though I'm not a Python expert, I didn't find any explicit answer.
To be clear, I'd like to extract two pieces of information ("name" and "fame") from a specific "clan".
In the extracted JSON file, the data is under [items], then in [0], [1], [2], [3] and [4], and within each of those under [standings]. My issue is with the next dictionary: the clan I want can be in [0], [1], [2], [3] or [4]. I don't know how to filter, for example by using something like "filter with tag = #9VL9L9Y".
Here is my code:
data = json.loads(response)
for item in data["items"]:
    for p in item["standings"]:
        for q in p["clan"]["participants"]:
            if (p["clan"] = '#9VL9L9YQ'):
                print("%s %s" % (
                    q["name"],
                    q["fame"],
                ))
I know my line "if (p["clan"] = '#9VL9L9YQ'):" is not correct, but this is what I'd like to do.
What the JSON file looks like:
Thanks for your help!

Reorder the logic a bit:
data = json.loads(response)
for item in data["items"]:
    for p in item["standings"]:
        clan = p["clan"]
        # check the tag first, then loop over that clan's participants
        if clan["tag"] == '#9VL9L9YQ':  # no parentheses needed around the condition
            for q in clan["participants"]:
                print("%s %s" % (q["name"], q["fame"]))

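There is a syntax error in your code; simply correct it:
Replace:
if (p["clan"] = '#9VL9L9YQ'):
With:
if (p["clan"]["tag"] == '#9VL9L9YQ'):
Note: your syntax was almost correct; you made the small and common mistake of using a single "=" (assignment) instead of a double "==" (comparison). Also, p["clan"] is a dictionary, so the comparison has to be made against its "tag" entry rather than against the dictionary itself.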
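To see the filtering end to end, here is a minimal sketch. The sample payload is hypothetical: it only assumes the items / standings / clan / participants nesting implied by the question's code, and the helper name participants_for_clan is made up for illustration.

```python
import json

# Hypothetical sample response; the real payload is assumed to have this shape.
response = json.dumps({
    "items": [
        {"standings": [
            {"clan": {"tag": "#9VL9L9YQ", "participants": [
                {"name": "alice", "fame": 1200},
                {"name": "bob", "fame": 800},
            ]}},
            {"clan": {"tag": "#OTHER", "participants": [
                {"name": "carol", "fame": 500},
            ]}},
        ]},
    ]
})


def participants_for_clan(data, tag):
    """Return (name, fame) pairs for every participant of the clan with `tag`."""
    found = []
    for item in data["items"]:
        for standing in item["standings"]:
            clan = standing["clan"]
            # Only descend into the participants of the matching clan.
            if clan["tag"] == tag:
                for participant in clan["participants"]:
                    found.append((participant["name"], participant["fame"]))
    return found


data = json.loads(response)
result = participants_for_clan(data, "#9VL9L9YQ")
for name, fame in result:
    print("%s %s" % (name, fame))
```

Returning a list instead of printing inside the loops makes it easy to reuse the result, and a tag that never appears simply yields an empty list.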

Related

How do I split and reconstruct a variable name while holding its original value

Is it possible to split variables that have already been assigned values, and re-piece them back together to hold those same previous values?
For Example:
URLs.QA.Signin = 'https://qa.test.com'
TestEnvironment = 'QA'
CurrentURL = 'URLs.' + TestEnvironment + '.Signin'
print(CurrentURL)
Outputs as: 'URLs.QA.Signin'
but I would like it to:
Output as: 'https://qa.test.com'
The purpose is so I can plug any value into my 'TestEnvironment' variable and thus access any of my massive list of URLs with ease =P
I am green with Python. Your time and efforts are greatly appreciated! =)
Based upon evanrelf's answer, I tried and loved the following code!
This is exactly what I'm looking for. I might be overcomplicating it; any suggestions to clean up the code?
urls = {}
environment = 'qa'
district = 'pleasanthill'
url = environment + district
urls[url] = 'https://' + environment + '.' + district + '.test.com'
print(urls[url])
Output is: https://qa.pleasanthill.test.com
I would recommend you look into Python's dictionaries.
urls = {}
urls['qa'] = 'https://qa.test.com'
test_environment = 'qa'
print(urls[test_environment])
# => https://qa.test.com
As far as I understand, you are trying to pass in a string and get a new string (the URL) back. The simplest approach I can think of is to use a dictionary, for example:
URLS = {'sheep' : 'wool.com', 'cows' : 'beef.com'}
Either that, or use two arrays and reference a common index, but who wants to do that :p
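For the original URLs.QA.Signin idea, a nested dictionary keyed first by environment and then by page is one possible sketch. The table contents (including the staging entry) and the helper name get_url are hypothetical and only illustrate the lookup.

```python
# Hypothetical URL table; environment and page names are illustrative only.
URLS = {
    "qa": {"signin": "https://qa.test.com"},
    "staging": {"signin": "https://staging.test.com"},
}


def get_url(environment, page):
    """Look up a URL by environment and page name, ignoring case."""
    return URLS[environment.lower()][page.lower()]


test_environment = "QA"
current_url = get_url(test_environment, "Signin")
print(current_url)
```

An unknown environment or page raises KeyError, which is usually preferable to silently building a wrong URL.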

JSON.LOADS is picking only 2 resultset

I am trying to use JSON to search through the Google Maps API. I give the location "Plymouth"; the API shows 6 results, but when I parse the response as JSON, I get a length of only 2. I tried multiple cities too, but I always get a length of 2.
What is wrong below?
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1))
for result in js1:
    location = js1["results"][count]["formatted_address"]
    lat = js1["results"][count]["geometry"]["location"]["lat"]
    lng = js1["results"][count]["geometry"]["location"]["lng"]
    count = count + 1
    print ('lat',lat,'lng',lng)
    print (location)
Simply replace for result in js1: with for result in js1['results']:
By the way, as posted in a comment in the question, no need to use a counter. You can rewrite your for loop as:
for result in js1['results']:
    location = result["formatted_address"]
    lat = result["geometry"]["location"]["lat"]
    lng = result["geometry"]["location"]["lng"]
    print('lat',lat,'lng',lng)
    print(location)
If you look at the JSON that comes in, you'll see that it's a single dict with two items ("results" and "status"). Add print('result:', result) to the top of your for loop and it will print result: status and result: results, because all you are iterating over is the keys of that outer dict. That's a general debugging trick in Python: if you aren't getting the stuff you want, put in a print statement to see what you got.
The results (not surprisingly) are in a list under js1["results"]. In your for loop, you ignore the variable you are iterating over and go back to the original js1 for its data. This is unnecessary and, in your case, it hid the error. Had you tried to reference cities off of result, you would have gotten an error, and it may have been easier to see that result was "status", not the array you were after.
Now a few tweaks fix the problem
import urllib.request as UR
import urllib.parse as URP
import json
url = "http://maps.googleapis.com/maps/api/geocode/json?address=Plymouth&sensor=false"
uh = UR.urlopen(url)
data = uh.read()
count = 0
js1 = json.loads(data.decode('utf-8') )
print ("Length: ", len(js1))
for result in js1["results"]:
    location = result["formatted_address"]
    lat = result["geometry"]["location"]["lat"]
    lng = result["geometry"]["location"]["lng"]
    count = count + 1
    print ('lat',lat,'lng',lng)
    print (location)
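The difference between iterating over the outer dict and over its "results" list can be reproduced offline. This is a small sketch with a hypothetical, trimmed-down payload that only mimics the top-level shape described above ("results" plus "status"); the addresses are made up.

```python
import json

# Hypothetical payload with the same top-level shape: a dict with two keys.
payload = json.dumps({
    "results": [
        {"formatted_address": "Plymouth, UK"},
        {"formatted_address": "Plymouth, MA, USA"},
        {"formatted_address": "Plymouth, MN, USA"},
    ],
    "status": "OK",
})
js1 = json.loads(payload)

# Iterating over a dict yields its keys, so len(js1) counts keys, not results.
top_level_keys = [key for key in js1]
print(len(js1), top_level_keys)

# The actual results live in the list stored under the "results" key.
addresses = [result["formatted_address"] for result in js1["results"]]
print(len(js1["results"]), addresses)
```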

for loop adding same value together and make JSON format

test=[]
sites = sel.css(".info")
for site in sites:
    money = site.xpath("./h2[@class='money']/text()").extract()
    people = site.xpath("//p[@class='poeple']/text()").extract()
    test.append('{"money":'+str(money[0])+',"people":'+str(people[0])+'}')
My result test is:
['{"money":1,"people":23}',
'{"money":3,"people":21}',
'{"money":12,"people":82}',
'{"money":1,"people":54}' ]
I am stuck on two things:
First, the type of each element in test is string, so it is not really JSON.
Second, the money value 1 appears twice, so I need to add the people values together.
The final format I want is:
[
{"money":1,"people":77},
{"money":3,"people":21},
{"money":12,"people":82}
]
How can I do this?
I'd collect the money entries in a dict and add up the people as values. The output to JSON should indeed be done with a JSON library (I've not tested the code, but it should give you an idea of how to approach the problem):
import json

money_map = {}
sites = sel.css(".info")
for site in sites:
    # convert to int so the JSON output contains numbers, as in the desired format
    money = int(site.xpath("./h2[@class='money']/text()").extract()[0])
    people = int(site.xpath("//p[@class='poeple']/text()").extract()[0])
    # start a new total the first time a money value is seen
    if money not in money_map:
        money_map[money] = 0
    money_map[money] += people

output = [{'money': key, 'people': value} for key, value in money_map.items()]
json_output = json.dumps(output)
Basically this:
import json
foo = ['{"money":1,"people":23}',
       '{"money":3,"people":21}',
       '{"money":12,"people":82}',
       '{"money":1,"people":54}' ]
bar = []
for i in foo:
    j = json.loads(i)  # JSON string to dict
    # if j['money'] is not in bar:
    bar.append(j)
    # else:
    #     find index of duplicate and add j['people']
The above is an incomplete solution; you still have to implement the 'duplicate check and add' step.
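One way to fill in the 'duplicate check and add' step is to key a dict on the money value instead of searching a list. This is a sketch that works on the string list from the question rather than on the scraper output; the names totals and merged are illustrative.

```python
import json

foo = ['{"money":1,"people":23}',
       '{"money":3,"people":21}',
       '{"money":12,"people":82}',
       '{"money":1,"people":54}']

# money value -> summed people; dicts keep insertion order in Python 3.7+
totals = {}
for raw in foo:
    entry = json.loads(raw)  # JSON string to dict
    totals[entry["money"]] = totals.get(entry["money"], 0) + entry["people"]

# Rebuild the list of objects and serialize it once at the end.
merged = [{"money": money, "people": people} for money, people in totals.items()]
json_output = json.dumps(merged)
print(json_output)
```

Using the dict as the accumulator makes the duplicate check a constant-time lookup, so no index search through the output list is needed.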

Print only not null values

I am trying to print only not null values but I am not sure why even the null values are coming up in the output:
Input:
from lxml import html
import requests
import linecache
i=1
read_url = linecache.getline('stocks_url',1)
while read_url != '':
    page = requests.get(read_url)
    tree = html.fromstring(page.text)
    percentage = tree.xpath('//span[@class="grnb_20"]/text()')
    if percentage != None:
        print(percentage)
    i = i + 1
    read_url = linecache.getline('stocks_url',i)
Output:
$ python test_null.py
['76%']
['76%']
['80%']
['92%']
['77%']
['71%']
[]
['50%']
[]
['100%']
['67%']
You are getting empty lists, not None objects. You are testing for the wrong thing here; you see [], while if a Python null was being returned you'd see None instead. The Element.xpath() method will always return a list object, and it can be empty.
Use a boolean test:
percentage = tree.xpath('//span[@class="grnb_20"]/text()')
if percentage:
    print(percentage[0])
Empty lists (and None) test as false in a boolean context. I opted to print out the first element from the XPath result, you appear to only ever have one.
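The difference between the two checks can be seen without any network access. In this sketch the list results is a hypothetical stand-in for what successive xpath() calls might return.

```python
# Hypothetical XPath results: each is a list, and some of them are empty.
results = [["76%"], [], ["50%"]]

# An empty list is not None, so a None check lets every result through.
passes_none_check = [r for r in results if r is not None]

# An empty list is falsy, so a plain truth test filters it out.
passes_truth_check = [r[0] for r in results if r]

print(passes_none_check)
print(passes_truth_check)
```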
Note that linecache is primarily aimed at caching Python source files; it is used to present tracebacks when an error occurs, and when you use inspect.getsource(). It isn't really meant to be used to read a file. You can just use open() and loop over the file without ever having to keep incrementing a counter:
with open('stocks_url') as urlfile:
    for url in urlfile:
        # strip the trailing newline before requesting the URL
        page = requests.get(url.strip())
        tree = html.fromstring(page.content)
        percentage = tree.xpath('//span[@class="grnb_20"]/text()')
        if percentage:
            print(percentage[0])
Change this in your code and it should work:
if percentage != []:

How do I create a list of timedeltas in python?

I've been searching through this website and have seen multiple references to time deltas, but haven't quite found what I'm looking for.
Basically, I have a list of messages that are received by a comms server and I want to calculate the latency between each message out and its matching message in. It looks like this:
161336.934072 - TMsg out: [O] enter order. RefID [123] OrdID [4568]
161336.934159 - TMsg in: [A] accepted. ordID [456] RefNumber [123]
Mixed in with these messages are other messages as well, however, I only want to capture the difference between the Out messages and in messages with the same RefID.
So far, to pick the T messages out of the main log, I've been doing this, but it's really inefficient. I shouldn't need to create new files every time:
big_file = open('C:/Users/kdalton/Documents/Minicomm.txt', 'r')
small_file1 = open('small_file1.txt', 'w')
for line in big_file:
    if 'T' in line: small_file1.write(line)
big_file.close()
small_file1.close()
How do I calculate the time deltas between the two messages and sort out these messages from the main log?
First of all, don't write out the raw log lines. Secondly, use a dict.
tdeltas = {}  # this is an empty dict
if "T" in line:
    # get the Refid number and the timestamp from the line (parsing not shown)
    if Refid in tdeltas:
        tdeltas[Refid] = timestamp - tdeltas[Refid]
    else:
        tdeltas[Refid] = timestamp
Then at the end, sort the keys and print:
allRefids = sorted(tdeltas.keys())
for k in allRefids:
    # format with % so a numeric delta does not have to be concatenated to a string
    print("%s: %s secs" % (k, tdeltas[k]))
You may want to convert your dates into time objects from the datetime module and then use timedelta objects to store in the dict. Probably not worth it for this task but it is worthwhile to learn how to use the datetime module.
Also, I have glossed over parsing the Refid from the input string, and the possible issue of converting the times from string to float and back.
Actually, just storing deltas will cause confusion if you ever have a Refid that is not accepted. If I were doing this for real, I would store a tuple in the value with the start datetime, end datetime and the delta. For a new record it would look like this: (161336.934072,0,0) and after the acceptance was detected it would look like this: (161336.934072,161336.934159,.000087). If the logging activity was continuous, say a global ecommerce site running 24x7, then I would periodically scan the dict for any entries with a non-zero delta, report them, and delete them. Then I would take the remaining values, sort them on the start datetime, then report and delete any where the start datetime is too old because that indicates failed transactions that will never complete.
Also, in a real ecommerce site, I might consider using something like Redis or Memcache as an external dict so that reporting and maintenance can be done by another server/application.
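The parsing that was glossed over, together with the datetime suggestion, could be sketched as follows. The assumptions are mine, not confirmed by the question: the leading field is taken to be a wall-clock time in HHMMSS.ffffff form, "RefID [n]" on out messages is matched against "RefNumber [n]" on in messages, and the names LINE_RE, parse_time and latency_by_ref are hypothetical.

```python
import re
from datetime import datetime

# Assumption: timestamp is HHMMSS.ffffff, followed by " - TMsg out:" or " - TMsg in:".
LINE_RE = re.compile(
    r"^(\d{6}\.\d{6}) - TMsg (out|in): .*?Ref(?:ID|Number) \[(\d+)\]"
)


def parse_time(stamp):
    """Turn 'HHMMSS.ffffff' into a datetime (the date part is a dummy default)."""
    return datetime.strptime(stamp, "%H%M%S.%f")


def latency_by_ref(lines):
    """Map each reference id to the timedelta between its out and in messages."""
    pending = {}  # ref id -> datetime of the out message
    deltas = {}   # ref id -> timedelta once the in message arrives
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # not a TMsg line, ignore it
        stamp, direction, ref_id = match.groups()
        if direction == "out":
            pending[ref_id] = parse_time(stamp)
        elif ref_id in pending:
            deltas[ref_id] = parse_time(stamp) - pending.pop(ref_id)
    return deltas


sample = [
    "161336.934072 - TMsg out: [O] enter order. RefID [123] OrdID [4568]",
    "161336.934159 - TMsg in: [A] accepted. ordID [456] RefNumber [123]",
    "161337.000000 - some unrelated message",
]
deltas = latency_by_ref(sample)
for ref_id, delta in deltas.items():
    print("%s: %f secs" % (ref_id, delta.total_seconds()))
```

Keeping unanswered out messages in a separate pending dict avoids the confusion mentioned above between a stored start time and a computed delta. Note that this sketch ignores messages that straddle midnight.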
This generator function returns a tuple containing the id and the difference in timestamps between the out and in messages. (If you want to do something more complex with the time difference, check out datetime.timedelta). Note that this assumes out messages always appear before in messages.
def get_time_deltas(infile):
    # use the infile argument rather than a global file name
    entries = (line.split() for line in open(infile, "r"))
    ts = {}
    for e in entries:
        if len(e) == 11 and " ".join(e[2:5]) == "TMsg out: [O]":
            ts[e[8]] = e[0]  # store timestamp for id
        elif len(e) == 10 and " ".join(e[2:5]) == "TMsg in: [A]":
            in_ts, ref_id = e[0], e[9]
            # Raises KeyError if out msg not seen yet. Handle if required.
            out_ts = ts.pop(ref_id)  # get ts for this id
            yield (ref_id[1:-1], float(in_ts) - float(out_ts))
You can now get a list out of it:
>>> INFILE = 'C:/Users/kdalton/Documents/Minicomm.txt'
>>> list(get_time_deltas(INFILE))
[('123', 8.699999307282269e-05), ('1233', 0.00028700000257231295)]
Or write it to a file:
>>> with open("out.txt", "w") as outfile:
...     for id, td in get_time_deltas(INFILE):
...         outfile.write("Msg %s took %f seconds\n" % (id, td))
Or chain it into a more complex workflow.
Update:
(in response to looking at the actual data)
Try this instead:
def get_time_deltas(infile):
    # use the infile argument rather than a global file name
    entries = (line.split() for line in open(infile, "r"))
    ts = {}
    for e in entries:
        if " ".join(e[2:5]) == "OuchMsg out: [O]":
            ts[e[8]] = e[0]  # store timestamp for id
        elif " ".join(e[2:5]) == "OuchMsg in: [A]":
            in_ts, ref_id = e[0], e[7]
            out_ts = ts.pop(ref_id, None)  # get ts for this id
            if out_ts is None:
                continue  # no matching out message seen for this id; skip it
            yield (ref_id[1:-1], float(in_ts) - float(out_ts))
INFILE = 'C:/Users/kdalton/Documents/Minicomm.txt'
print(list(get_time_deltas(INFILE)))
Changes in this version:
The number of fields is not as stated in the sample input posted in the question, so the check based on the number of fields was removed.
The ordID of the in messages is the one that matches the RefID of the out messages.
OuchMsg is used instead of TMsg.
Update 2
To get an average of the deltas:
deltas = [d for _, d in get_time_deltas(INFILE)]
average = sum(deltas) / len(deltas)
Or, if you have previously generated a list containing all the data, we can reuse it instead of reparsing the file:
data = list(get_time_deltas(INFILE))
# ... use data for some other operation ...
# calculate average using the list
average = sum(d for _, d in data) / len(data)
