looping through json python is very slow

looping through json python is very slow - python

Can someone help me understand what I'm doing wrong in the following code:
def matchTrigTohost(gtriggerids,gettriggers):
mylist = []
for eachid in gettriggers:
gtriggerids['params']['triggerids'] = str(eachid)
hgetjsonObject = updateitem(gtriggerids,processor)
hgetjsonObject = json.dumps(hgetjsonObject)
hgetjsonObject = json.loads(hgetjsonObject)
hgetjsonObject = eval(hgetjsonObject)
hostid = hgetjsonObject["result"][0]["hostid"]
hname = hgetjsonObject["result"][0]["name"]
endval = hostid + "--" + hname
mylist.append(endval)
return(hgetjsonObject)
The variable gettriggers contain a lot of ids (~3500):
[ "26821", "26822", "26810", ..... ]
I'm looping through the ids in the variable and assigning them to a json object.
gtriggerids = {
"jsonrpc": "2.0",
"method": "host.get",
"params": {
"output": ["hostid", "name"],
"triggerids": "26821"
},
"auth": mytoken,
"id": 2
}
When I run the code against the above json variable, it is very slow. It is taking several minutes to check each ID. I'm sure I'm doing many things wrong here or at least not in the pythonic way. Can anyone help me speed this up? I'm very new to python.
NOTE:
The dump() , load(), eval() were used to convert the str produced to json.

You asked for help knowing what you're doing wrong. Happy to oblige :-)
At the lowest level—why your function is running slowly—you're running many unnecessary operations. Specifically, you're moving data between formats (python dictionaries and JSON strings) and back again which accomplishes nothing but wasting CPU cycles.
You mentioned this is only way you could get the data in the format you needed. That brings me to the second thing you're doing wrong.
You're throwing code at the wall instead of understanding what's happening.
I'm quite sure (and several of your commenters appear to agree) that your code is not the only way to arrange your data into a usable structure. What you should do instead is:
Understand as much as you can about the data you're being given. I suspect the output of updateitem() should be your first target of learning.
Understand the right/typical way to interact with that data. Your data doesn't have to be a dictionary before you can use it. Maybe it's not the best approach.
Understand what regularities and irregularities the data may have. Part of your problem may not be with types or dictionaries, but with an unpredictable/dirty data source.
Armed with all this new knowledge, manipulate your as simply as you can.
I can pretty much guarantee the result will run faster.
More detail! Some things you wrote suggest misconceptions:
I'm looping through the ids in the variable and assigning them to a json object.
No, you can't assign to a JSON object. In python, JSON data is always a string. You probably mean that you're assigning to a python dictionary, which (sometimes!) can be converted to a JSON object, represented as a string. Make sure you have all those concepts clear before you move forward.
The dump() , load(), eval() were used to convert the str produced to json.
Again, you don't call dumps() on a string. You use that to convert a python object to a string. Run this code in a REPL, go step by step, and inspect or play with each output to understand what it is.

Related

Write a custom JSON interpreter for a file that looks like json but isnt using Python

What I need to do is to write a module that can read and write files that use the PDX script language. This language looks alot like json but has enough differences that a custom encoder/decoder is needed to do anything with those files (without a mess of regex substitutions which would make maintenance hell). I originally went with just reading them as txt files, and use regex to find and replace things to convert it to valid json. This lead me to my current point, where any additions to the code requires me to write far more code than I would want to, just to support some small new thing. So using a custom json thing I could write code that shows what valid key:value pairs are, then use that to handle the files. To me that will be alot less code and alot easier to maintain.
So what does this code look like? In general it looks like this (tried to put all possible syntax, this is not an example of a working file):
#key = value # this is the definition for the scripted variable
key = {
# This is a comment. No multiline comments
function # This is a single key, usually optimize_memory
# These are the accepted key:value pairs. The quoted version is being phased out
key = "value"
key = value
key = #key # This key is using a scripted variable, defined either in the file its in or in the `scripted_variables` folder. (see above for example on how these are initially defined)
# type is what the key type is. Like trigger:planet_stability where planet_stability is a trigger
key = type:key
# Variables like this allow for custom names to be set. Mostly used for flags and such things
[[VARIABLE_NAME]
math_key = $VARIABLE_NAME$
]
# this is inline math, I dont actually understand how this works in the script language yet as its new. The "<" can be replaced with any math symbol.
# Valid example: planet_stability < #[ stabilitylevel2 + 10 ]
key < #[ key + 10 ]
# This is used alot to handle code blocks. Valid example:
# potential = {
# exists = owner
# owner = {
# has_country_flag = flag_name
# }
# }
key = {
key = value
}
# This is just a list. Inline brackets are used alot which annoys me...
key = { value value }
}
The major differences between json and PDX script is the nearly complete lack of quotations, using an equals sign instead of a colon for separation and no comma's at the end of the lines. Now before you ask me to change the PDX code, I cant. Its not mine. This is what I have to work with and cant make any changes to the syntax. And no I dont want to convert back and forth as I have already mentioned this would require alot of work. I have attempted to look for examples of this, however all I can find are references to convert already valid json to a python object, which is not what I want. So I cant give any examples of what I have already done, as I cant find anywhere to even start.
Some additional info:
Order of key:value pairs does not technically matter, however it is expected to be in a certain order, and when not in that order causes issues with mods and conflict solvers
bool properties always use yes or no rather than true or false
Lowercase is expected and in some cases required
Math operators are used as separators as well, eg >=, <= ect
The list of syntax is not exhaustive, but should contain most of the syntax used in the language
Past work:
My last attempts at this all revolved around converting it from a text file to a json file. This was alot of work just to get a small piece of this to work.
Example:
potential = {
exists = owner
owner = {
is_regular_empire = yes
is_fallen_empire = no
}
NOR = {
has_modifier = resort_colony
has_modifier = slave_colony
uses_habitat_capitals = yes
}
}
And what i did to get most of the way to json (couldnt find a way to add quotes)
test_string = test_string.replace("\n", ",")
test_string = test_string.replace("{,", "{")
test_string = test_string.replace("{", "{\n")
test_string = test_string.replace(",", ",\n")
test_string = test_string.replace("}, ", "},\n")
test_string = "{\n" + test_string + "\n}"
# Replace the equals sign with a colon
test_string = test_string.replace(" =", ":")
This resulted in this:
{
potential: {
exists: owner,
owner: {
is_regular_empire: yes,
is_fallen_empire: no,
},
NOR: {
has_modifier: resort_colony,
has_modifier: slave_colony,
uses_habitat_capitals: yes,
},
}
}
Very very close yes, but in no way could I find a way to add the quotations to each word (I think I did try a regex sub, but wasnt able to get it to work, since this whole thing is just one unbroken string), making this attempt stuck and also showing just how much work is required just to get a very simple potential block to mostly work. However this is not the method I want anymore, one because its alot of work and two because I couldnt find anything to finish it. So a custom json interpreter is what I want.

The classical approach (potentially leading to more code, but also more "correctness"/elegance) is probably to build a "recursive descent parser", from a bunch of conditionals/checks, loops and (sometimes recursive?) functions/handlers to deal with each of the encountered elements/characters on the input stream. An implicit parse/call tree might be sufficient if you directly output/print the JSON equivalent, or otherwise you could also create a representation/model in memory for later output/conversion.
Related book recommendation could be "Language Implementation Patterns" by Terence Parr, me avoiding to promote my own interpreters and introductory materials :-) In case you need further help, maybe write me?

What is equivalent of Perl DB_FILE module in Python?

I was asked by my supervisor to convert some Perl scripts into Python language. I'm baffled by few lines of code and I am also relatively inexperienced with Python as well. I'm an IT intern, so this was something of a challenge.
Here are the lines of code:
my %sybase;
my $S= tie %sybase, "DB_File", $prismfile, O_RDWR|O_CREAT, 0666, $DB_HASH or die "Cannot open: $!\n";
$DB_HASH->{'cachesize' } = $cache;
I'm not sure what is the equivalent of this statement in Python? DB_FILE is a Perl module. DB_HASH is a database type that allows arbitrary keys/values to be stored in data file, at least that's according to Perl documentation.
After that, the next lines of code also got me stumped on how to convert this to the equivalent in Python as well.
$scnt=0;
while(my $row=$msth->fetchrow_arrayref()) {
$scnt++;
$srow++;
#if ($scnt <= 600000) {
$S->put(join('#',#{$row}[0..5]),join('#',#{$row}[6..19]));
perf(500000,'sybase') ;#if $VERBOSE ;
# }
}
I'll probably use fetchall() in Python to store the entire result dataset in it, then work through it row by row. But I'm not sure how to implement join() correctly in Python, especially since these lines use range within the row index elements -- [0..5]. Also it seems to write the output to data file (look at put()). I'm not sure what perf() does, can anyone help me out here?
I'd appreciate any kind of help here. Thank you very much.

Check if JSON var has nullable key (Twitter Streaming API)

I'm downloading tweets from Twitter Streaming API using Tweepy. I manage to check if downloaded data has keys as 'extended_tweet', but I'm struggling with an specific key inside another key.
def on_data(self, data):
savingTweet = {}
if not "retweeted_status" in data:
dataJson = json.loads(data)
if 'extended_tweet' in dataJson:
savingTweet['text'] = dataJson['extended_tweet']['full_text']
else:
savingTweet['text'] = dataJson['text']
if 'coordinates' in dataJson:
if 'coordinates' in dataJson['coordinates']:
savingTweet['coordinates'] = dataJson['coordinates']['coordinates']
else:
savingTweet['coordinates'] = 'null'
I'm checking 'extended_key' propertly, but when I try to do the same with ['coordinates]['coordinates] I get the following error:
TypeError: argument of type 'NoneType' is not iterable
Twitter documentation says that key 'coordinates' has the following structure:
"coordinates":
{
"coordinates":
[
-75.14310264,
40.05701649
],
"type":"Point"
}
I achieved to solve it by just putting the conflictive check in a try, except, but I think this is not the most suitable approach to the problem. Any other idea?

So the twitter API docs are probably lying a bit about what they return (shock horror!) and it looks like you're getting a None in place of the expected data structure. You've already decided against using try, catch, so I won't go over that, but here are a few other suggestions.
Using dict get() default
There are a couple of options that occur to me, the first is to make use of the default ability of the dict get command. You can provide a fall back if the expected key does not exist, which allows you to chain together multiple calls.
For example you can achieve most of what you are trying to do with the following:
return {
'text': data.get('extended_tweet', {}).get('full_text', data['text']),
'coordinates': data.get('coordinates', {}).get('coordinates', 'null')
}
It's not super pretty, but it does work. It's likely to be a little slower that what you are doing too.
Using JSONPath
Another option, which is likely overkill for this situation is to use a JSONPath library which will allow you to search within data structures for items matching a query. Something like:
from jsonpath_rw import parse
matches = parse('extended_tweet.full_text').find(data)
if matches:
print(matches[0].value)
This is going to be a lot slower that what you are doing, and for just a few fields is overkill, but if you are doing a lot of this kind of work it could be a handy tool in the box. JSONPath can also express much more complicated paths, or very deeply nested paths where the get method might not work, or would be unweildy.
Parse the JSON first!
The last thing I would mention is to make sure you parse your JSON before you do your test for "retweeted_status". If the text appears anywhere (say inside the text of a tweet) this test will trigger.
JSON parsing with a competent library is usually extremely fast too, so unless you are having real speed problems it's not necessarily worth worrying about.

Python- Insert new values into 'nested' list?

What I'm trying to do isn't a huge problem in php, but I can't find much assistance for Python.
In simple terms, from a list which produces output as follows:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765}...
All I want to do is add in an extra field, containing the 'runner name' in the data set below, into each of the 'runners' sub lists from the initial data set, based on selection_id=selectionId.
So initially I iterate through the full dataset, and then create a separate list to get the runner name from the runner id (I should point out that runnerId===selectionId===selection_id, no idea why there are multiple names are used), this works fine and the code is shown below:
for market_book in market_books:
market_catalogues = trading.betting.list_market_catalogue(
market_projection=["RUNNER_DESCRIPTION", "RUNNER_METADATA", "COMPETITION", "EVENT", "EVENT_TYPE", "MARKET_DESCRIPTION", "MARKET_START_TIME"],
filter=betfairlightweight.filters.market_filter(
market_ids=[market_book.market_id],
),
max_results=100)
data = []
for market_catalogue in market_catalogues:
for runner in market_catalogue.runners:
data.append(
(runner.selection_id, runner.runner_name)
)
So as you can see I have the data in data[], but what I need to do is add it to the initial data set, based on the selection_id.
I'm more comfortable with Php or Javascript, so apologies if this seems a bit simplistic, but the code snippets I've found on-line only seem to assist with very simple Python lists and nothing 'nested' (to me the structure seems similar to a nested array).
As per the request below, here is the full list:
{"marketId":"1.130856098","totalAvailable":null,"isMarketDataDelayed":null,"lastMatchTime":null,"betDelay":0,"version":2576584033,"complete":true,"runnersVoidable":false,"totalMatched":null,"status":"OPEN","bspReconciled":false,"crossMatching":false,"inplay":false,"numberOfWinners":1,"numberOfRunners":10,"numberOfActiveRunners":8,"runners":[{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":2.8,"size":34.16},{"price":2.76,"size":200},{"price":2.5,"size":237.85}],"availableToLay":[{"price":2.94,"size":6.03},{"price":2.96,"size":10.82},{"price":3,"size":33.45}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832765},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":20,"size":3},{"price":19.5,"size":26.36},{"price":19,"size":2}],"availableToLay":[{"price":21,"size":13},{"price":22,"size":2},{"price":23,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832767},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":11,"size":9.75},{"price":10.5,"size":3},{"price":10,"size":28.18}],"availableToLay":[{"price":11.5,"size":12},{"price":13.5,"size":2},{"price":14,"size":7.75}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832766},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":48,"size":2},{"price":46,"size":5},{"price":42,"size":5}],"availableToLay":[{"price":60,"size":7},{"price":70,"size":5},{"price":75,"size":10}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832769},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":18.5,"size":28.94},{"price":18,"size":5},{"price":17.5,"size":3}],"availableToLay":[{"price":21,"size":20},{"price":23,"size":2},{"price":24,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832768},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":4.3,"size":9},{"price":4.2,"size":257.98},{"price":4.1,"size":51.1}],"availableToLay":[{"price":4.4,"size":20.97},{"price":4.5,"size":30},{"price":4.6,"size":16}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832771},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":24,"size":6.75},{"price":23,"size":2},{"price":22,"size":2}],"availableToLay":[{"price":26,"size":2},{"price":27,"size":2},{"price":28,"size":2}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":12832770},{"status":"ACTIVE","ex":{"tradedVolume":[],"availableToBack":[{"price":5.7,"size":149.33},{"price":5.5,"size":29.41},{"price":5.4,"size":5}],"availableToLay":[{"price":6,"size":85},{"price":6.6,"size":5},{"price":6.8,"size":5}]},"sp":{"nearPrice":null,"farPrice":null,"backStakeTaken":[],"layLiabilityTaken":[],"actualSP":null},"adjustmentFactor":null,"removalDate":null,"lastPriceTraded":null,"handicap":0,"totalMatched":null,"selectionId":10064909}],"publishTime":1551612312125,"priceLadderDefinition":{"type":"CLASSIC"},"keyLineDescription":null,"marketDefinition":{"bspMarket":false,"turnInPlayEnabled":false,"persistenceEnabled":false,"marketBaseRate":5,"eventId":"28180290","eventTypeId":"2378961","numberOfWinners":1,"bettingType":"ODDS","marketType":"NONSPORT","marketTime":"2019-03-29T00:00:00.000Z","suspendTime":"2019-03-29T00:00:00.000Z","bspReconciled":false,"complete":true,"inPlay":false,"crossMatching":false,"runnersVoidable":false,"numberOfActiveRunners":8,"betDelay":0,"status":"OPEN","runners":[{"status":"ACTIVE","sortPriority":1,"id":10064909},{"status":"ACTIVE","sortPriority":2,"id":12832765},{"status":"ACTIVE","sortPriority":3,"id":12832766},{"status":"ACTIVE","sortPriority":4,"id":12832767},{"status":"ACTIVE","sortPriority":5,"id":12832768},{"status":"ACTIVE","sortPriority":6,"id":12832770},{"status":"ACTIVE","sortPriority":7,"id":12832769},{"status":"ACTIVE","sortPriority":8,"id":12832771},{"status":"LOSER","sortPriority":9,"id":10317013},{"status":"LOSER","sortPriority":10,"id":10317010}],"regulators":["MR_INT"],"countryCode":"GB","discountAllowed":true,"timezone":"Europe\/London","openDate":"2019-03-29T00:00:00.000Z","version":2576584033,"priceLadderDefinition":{"type":"CLASSIC"}}}

i think i understand what you are trying to do now
first hold your data as a python object (you gave us a json object)
import json
my_data = json.loads(my_json_string)
for item in my_data['runners']:
item['selectionId'] = [item['selectionId'], my_name_here]
the thing is that my_data['runners'][i]['selectionId'] is a string, unless you want to concat the name and the id together, you should turn it into a list or even a dictionary
each item is a dicitonary so you can always also a new keys to it
item['new_key'] = my_value

So, essentially this works...with one exception...I can see from the print(...) in the loop that the attribute is updated, however what I can't seem to do is then see this update outside the loop.
mkt_runners = []
for market_catalogue in market_catalogues:
for r in market_catalogue.runners:
mkt_runners.append((r.selection_id, r.runner_name))
for market_book in market_books:
for runner in market_book.runners:
for x in mkt_runners:
if runner.selection_id in x:
setattr(runner, 'x', x[1])
print(market_book.market_id, runner.x, runner.selection_id)
print(market_book.json())
So the print(market_book.market_id.... displays as expected, but when I print the whole list it shows the un-updated version. I can't seem to find an obvious solution, which is odd, as it seems like a really simple thing (I tried messing around with indents, in case that was the problem, but it doesn't seem to be, its like its not refreshing the market_book list post update of the runners sub list)!

Better way to insert values in xml python 2.7

I've been looking around for a better solution that what I'm already doing. I need to construct an xml to send to a SOAP service but I need to generate the XML dinamically. The problem or may not be a problem, is that I find this really long and I think there must be a better way of achiving this. I'm using python 2.7.5, this is my XML (kind of, it's actually larger):
SINGLE_PAYMENT = '''...
<v1:Shipping>
<v1:Type>%s</v1:Type>
<v1:Address1>%s</v1:Address1>
<v1:Address2>%s</v1:Address2>
<v1:City>%s</v1:City>
<v1:Country>%s</v1:Country>
<v1:Items>%s</v1:Items>
<v1:State>%s</v1:State>
<v1:Carrier>%s</v1:Carrier>
<v1:Weight>%s</v1:Weight>
<v1:Total>%s</v1:Total>
</v1:Shipping>
....'''
Then I do
SoapMessage = SINGLE_PAYMENT%...and replace here with variables passed to this function
Is there any better way of doing this?.Thanks!

You could use a proper XML library, or you could just write a method that generates a single "shipping" element using templates, for example:
def generate_tag(name, values, version='1'):
return '\n'.join([
'<v{0}:{1}>'.format(version, name),
'\n'.join(
'\t<v{2}:{0}>{1}</v{2}:{0}>'.format(k, v, version) for k, v in values.iteritems()
),
'</v{0}:{1}>'.format(version, name)
])
# Output:
>>> print generate_tag('Shipping', {"Total": "100", "Type": "A"})
<v1:Shipping>
<v1:Total>100</v1:Total>
<v1:Type>A</v1:Type>
</v1:Shipping>
But, as you can see, that gets messy quickly.
Using the libraries (such as ElementTree or MiniDom) would be more to your benefit, though, as they will take care of any needed escaping and provide a better interface should you need to make changes later.

We Keep Coding

Python is a programming language that lets you work quickly and integrate systems more effectively.