How do I remove an extra square bracket from JSON in Python?

I have JSON of the following form:
{"blah":
    [
        [
            {"first":
                {"something":"that","something":"else","another":"thing","key":"value"}...[etc.]
            }
        ]
    ]
}
that I'm trying to parse in Python. I've imported json (or simplejson, depending on what version of Python you're using) and everything goes pretty well until I get to this block:
for result in page['blah']:
    that = result['first']
    a_list.append(that)
which throws the error "TypeError: list indices must be integers, not str".
I'm pretty sure this error is due to the extra pair of square brackets that makes the JSON inside look like a list.
My question, assuming that's the case, is: how do I remove it and still have valid JSON to parse as dictionaries?
Other workarounds welcome. If I need to supply more info, let me know. Thanks!
(Added the missing curly bracket and changed a couple of confusing terms--I was trying to come up with generic terms on the fly, sorry for any confusion.)

If there's always exactly one "extra" set of array brackets:
for result in page['blah']:
    that = result[0]['this']
    a_list.append(that)

There is no need to remove the brackets from the JSON string. I think making sure to remove only the right ones is not worth the effort. Just figure out the right way to access the values you want.
The "extra" brackets are not the only problem: 'this' is a property of an object, which is the value of 'first'. So to access 'this', you'd have to write
that = result[0]['first']['this']
Whether this always works or not depends on the JSON data you left out.
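A minimal sketch of that indexing path, assuming the structure from the question with 'this' as the inner key (the sample values are made up). `page['blah']` is the outer list and each of its elements is an inner list; `itertools.chain.from_iterable` flattens that extra level so the loop body only deals with the dicts.

```python
import json
from itertools import chain

# Hypothetical sample shaped like the question's JSON:
# an object, then two nested lists, then the dicts you actually want.
raw = '{"blah": [[{"first": {"this": "that", "key": "value"}}]]}'
page = json.loads(raw)

# Direct indexing: 'blah' -> outer list -> inner list -> dict -> 'first' -> 'this'
print(page['blah'][0][0]['first']['this'])  # that

a_list = []
# chain.from_iterable walks each inner list in turn, yielding the dicts inside.
for result in chain.from_iterable(page['blah']):
    a_list.append(result['first']['this'])

print(a_list)  # ['that']
```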

First, you are right - your error is related to the incorrect use of the Python data types and JSON output.
Second, don't use list or other Python built-in names (or reserved words) as variable names.
Finally, if you simply want to get all results for 'this' inner key, you can try using the following code:
data = {"blah":
    [
        [
            {"first":
                {"this":"that",
                 "something":"else","another":"thing","key":"value"}}
        ]
    ]
}
outres = []
for k, v in data.items():  # iterate over top dictionary ('blah', ...)
    for sv in v:  # iterate through first list
        for tv in sv:  # iterate through second list
            for fk, fv in tv.items():  # iterate through each dictionary from second list
                if 'this' in fv:
                    outres.append(fv['this'])
print(outres)
Please note that my sample is based on your data sample, so if there are any additional levels in your data structure, or if any other rules should be applied, the code will need to be modified.
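If the nesting depth is not fixed, a recursive generator avoids hard-coding one loop per level. This is a sketch rather than the answer's code: `find_key` is a hypothetical helper name, and the sample data mirrors the answer's.

```python
import json


def find_key(obj, key):
    """Yield every value stored under `key`, at any depth of dicts and lists."""
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k == key:
                yield v
            # Keep searching inside the value too, in case of deeper nesting.
            yield from find_key(v, key)
    elif isinstance(obj, list):
        for item in obj:
            yield from find_key(item, key)


data = json.loads('{"blah": [[{"first": {"this": "that", "something": "else"}}]]}')
outres = list(find_key(data, 'this'))
print(outres)  # ['that']
```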

Related

Using nested for loops to iterate through JSON file of tweets in Python

So I am new to Python, but I know what I am trying to accomplish. Basically, I have the output of tweets from twitter in a JSON file loaded into Python. What I need to do is iterate through the tweets to access the "text" key, that has the text of each tweet, because that's what I'm going to use to do topic modeling. So, I have discovered that "text" is triple nested in this data structure, and it's been really difficult to find the correct way to write the for loop code in order to iterate through the dataset and pull the "text" from every tweet.
Here is a look at what the JSON structure is like: https://pastebin.com/fUH5MTMx
So, I have figured out that the "text" key that I want is within [hits][hits][_source]. What I can't figure out is the appropriate for loop to iterate through _source and pull those texts. Here is my code so far (again, I'm a beginner, so sorry if my code is way off):
for hits in tweets["hits"]["hits"]:
    for _source in hits:
        for text in _source:
            for item in text:
                print(item)
I also tried this:
for item in tweets['hits']["hits"]["_source"]:
    print(item['text'])
But I keep getting syntax errors for the first one and "TypeError: list indices must be integers or slices, not str" for the second one. I understand that I need to indicate somehow that I am accessing a list, and that I'm missing something to show that it's a list and that I am not looking for integers as the output of the iterations... (I am using the json module in Python for this, on a Mac with Python 3 in Spyder.)
Any insight would be greatly appreciated! This multiple nesting is confusing me a lot.
['hits']["hits"] is not dictionary with ["_source"]
but a list with one or many items which have ["_source"]
it means
tweets['hits']["hits"][0]["_source"]
tweets['hits']["hits"][1]["_source"]
tweets['hits']["hits"][2]["_source"]
So this should work
for item in tweets['hits']["hits"]:
print(item["_source"]['text'])
Not sure if you realize it, but the top-level JSON object is transformed into a Python dictionary, not a list. Anyway, let's get into this nest.
tweets['hits'] will give you another dict.
tweets['hits']['hits'] will give you a list (notice the brackets).
This is apparently a list of dictionaries, and in this case (not sure if it always will be) the dict with the "_source" key you are looking for is the first one, so:
tweets['hits']['hits'][0] will give you the dict you want. Then, finally:
tweets['hits']['hits'][0]['_source']['text'] should give you the text.
The value of the second "hits" is a list.
Try:
for hit in tweets["hits"]["hits"]:
    print(hit["_source"]["text"])
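A self-contained sketch of that loop, collecting the texts into a list for topic modeling instead of printing them. The response below is a made-up, trimmed-down stand-in for the pastebin data, assuming only the shape described in the question (`tweets["hits"]["hits"]` is a list of dicts with a `_source` dict). The `if` clause skips any hit whose `_source` has no "text" key instead of raising `KeyError`.

```python
import json

# Hypothetical, trimmed-down response with the shape described in the question.
raw = """
{"hits": {"total": 3, "hits": [
    {"_id": "1", "_source": {"text": "first tweet"}},
    {"_id": "2", "_source": {"text": "second tweet"}},
    {"_id": "3", "_source": {"user": "no text here"}}
]}}
"""
tweets = json.loads(raw)

# One loop over the list is enough; each hit is a dict, so index into it by key.
texts = [hit["_source"]["text"]
         for hit in tweets["hits"]["hits"]
         if "text" in hit["_source"]]
print(texts)  # ['first tweet', 'second tweet']
```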

json strip multiple lists [duplicate]

This question already has answers here:
Why can't Python parse this JSON data? [closed]
Closed 5 years ago.
I am looking for more info regarding this issue I have. So far I have checked the JSON encoding/decoding but it was not precisely what I was looking for.
I am looking for some way to strip this kind of list quite easily:
//response
{
    "age":[
        {"#":"1","age":10},
        {"#":"2","age":12},
        {"#":"3","age":16},
        {"#":"4","age":3}
    ],
    "age2":[
        {"#":"1","age":10},
        {"#":"2","age":12},
        {"#":"3","age":16},
        {"#":"4","age":3}
    ],
    "days_month":31,
    "year":2017
}
So how do I easily extract the data? For example, I want to get the age of the person in age2 whose "#" is "3".
To get year/days_month I found this solution via Google:
j = json.loads(r.content)
print(j['year'])
Probably I have missed something somewhere on the internet, but I could not find a solution for this specific case.
I think this is what @Jean-François Fabre tried to indicate:
import json
response = """
{
"age":[
{"#":"1","age":10},
{"#":"2","age":12},
{"#":"3","age":16},
{"#":"4","age":3}
],
"age2":[
{"#":"1","age":10},
{"#":"2","age":12},
{"#":"3","age":16},
{"#":"4","age":3}
],
"days_month":31,
"year":2017
}
"""
j = json.loads(response)
# note that the [2] means the third element in the "age2" list-of-dicts
print(j['age2'][2]['#']) # -> 3
print(j['age2'][2]['age']) # -> 16
json.loads() converts a string in JSON format into a Python object. In particular, it converts JSON objects into Python dictionaries and JSON arrays into Python lists. This means you can access the contents of the result (stored in the variable j in this case) just as you would a native mixture of those Python datatypes, which would look very similar to what is shown in the response.
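A quick way to see that mapping for yourself, using a small made-up document: `json.loads` turns objects into `dict`, arrays into `list`, strings into `str`, integers into `int`, reals into `float`, `true`/`false` into `True`/`False`, and `null` into `None`.

```python
import json

j = json.loads('{"year": 2017, "ratio": 0.5, "ok": true, "missing": null, '
               '"age2": [{"#": "3", "age": 16}]}')

print(type(j).__name__, type(j['age2']).__name__)    # dict list
print(j['year'], j['ratio'], j['ok'], j['missing'])  # 2017 0.5 True None
# Index the list by position, then the inner dict by key.
print(j['age2'][0]['age'])  # 16
```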
As the search criterion you are looking for is not contained in the indices of the respective datastructures, I would do it using a list comprehension. For your example, this would be
[person['age'] for person in j['age2'] if person['#'] == u'3'][0]
This iterates through all the items in the list under 'age2', and puts all the items where the number is '3' into a list. The [0] selects the first entry of the list.
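Note that the trailing [0] raises IndexError when nothing matches. A variant (a swapped-in technique, not the answer's) uses next() on a generator expression, which stops at the first match and returns a default when there is none. `age_for` is a hypothetical helper name; the data is the question's "age2" list.

```python
import json

j = json.loads('{"age2": [{"#": "1", "age": 10}, {"#": "2", "age": 12}, '
               '{"#": "3", "age": 16}, {"#": "4", "age": 3}]}')


def age_for(people, number, default=None):
    # Stop at the first person whose "#" matches; note that "#" holds strings.
    return next((p['age'] for p in people if p['#'] == number), default)


print(age_for(j['age2'], '3'))   # 16
print(age_for(j['age2'], '99'))  # None
```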
However, this scans the whole list in pure Python on every lookup. If you have large datasets, you might want to have a look at pandas:
import pandas
df = pandas.DataFrame(j['age2'])
df[df['#'] == '3']['age']
which uses vectorized column operations and is typically much faster, as long as your data can be represented as a series or table.
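If you need many repeated lookups by "#", another option (plain Python, not what the answer suggests) is to build a dict keyed on "#" once; every lookup after that is a hash lookup instead of a scan. This sketch assumes the "#" values are unique; if they repeat, later entries overwrite earlier ones.

```python
import json

response = ('{"age2": [{"#": "1", "age": 10}, {"#": "2", "age": 12}, '
            '{"#": "3", "age": 16}, {"#": "4", "age": 3}]}')
j = json.loads(response)

# Build the index once: "#" value -> age.
age_by_number = {person['#']: person['age'] for person in j['age2']}

print(age_by_number['3'])       # 16
print(age_by_number.get('99'))  # None
```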

Python sorting deep in a datastructure

I'm in pretty far over my head right now for Python scripting and don't really understand what I'm doing.
I have a dictionary where the keys are strings and the values are lists of strings. I need to sort the strings within the list alphanumerically such that
"role_powerdns": [
"name-2.example.com",
"name-1.example.com",
"name-3.example.com"
],
looks like this:
"role_powerdns": [
"name-1.example.com",
"name-2.example.com",
"name-3.example.com"
],
The full original code I'm working off of is here for reference. https://github.com/AutomationWithAnsible/ansible-dynamic-inventory-chef/blob/master/chef_inventory.py
I don't fully understand myself how the code I'm working off of works, I just know the data structure it's returning.
My local additions to that base code are filtering out IPs and inserting .sub into the strings. I've tried reusing the comprehension syntax below (which I use for modifying the strings) to sort the strings. Could someone provide an example of how to iterate through a nested structure like this? Alternatively, if doing this sort of sorting so late in the script is not appropriate, when should it generally be handled?
import re
def filterIP(fullList):
    # escape the dots so they match a literal "." rather than any character
    regexIP = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')
    return filter(lambda i: not regexIP.search(i), fullList)
groups = {key : [domain.replace('sub.example.com', 'example.com') for domain in filterIP(list(set(items)))] for (key, items) in groups.iteritems() }
print(self.json_format_dict(groups, pretty=True))
If d is your dictionary, you can sort all its values by
for _, l in d.items():
    l.sort()
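A small check of why this works: list.sort() sorts each list in place, and the dict holds references to those same list objects, so the dict's contents end up sorted. Since the key is unused, iterating over d.values() is enough. The sample data is taken from the question.

```python
d = {
    "role_powerdns": ["name-2.example.com", "name-1.example.com", "name-3.example.com"],
}

# d.values() yields the list objects stored in the dict (not copies),
# so sorting them in place changes what d holds.
for hosts in d.values():
    hosts.sort()

print(d["role_powerdns"])
# ['name-1.example.com', 'name-2.example.com', 'name-3.example.com']
```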
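fuglede's answer almost worked for me. What I needed to do instead was use iteritems. I believed items was making a local copy that was thrown away once the loop finished, per this question (in Python 2, items() does return a new list of pairs, but the values in it are the same list objects, so sorting them in place should behave the same either way):
Iterating over dictionaries using 'for' loops
for key, value in groups.iteritems():
    value.sort()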
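On the follow-up question of where to sort: one option is to sort while building the dict, by wrapping the comprehension's inner expression in sorted(). Plain string sorting puts "name-10" before "name-2", so this sketch also uses a natural-sort key. It is written for Python 3 (`.items()`, `yield`-free comprehensions); `natural_key` and `filter_ip` are hypothetical helper names, the IP regex is the question's with the dots escaped, and the sample hosts are made up.

```python
import re

IP_RE = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$')


def filter_ip(full_list):
    # Drop entries that look like IPv4 addresses.
    return [i for i in full_list if not IP_RE.search(i)]


def natural_key(s):
    # "name-10.example.com" -> ['name-', 10, '.example.com'], so the numeric
    # parts compare as numbers rather than as text.
    return [int(part) if part.isdigit() else part for part in re.split(r'(\d+)', s)]


groups = {
    "role_powerdns": ["name-10.sub.example.com", "name-2.sub.example.com",
                      "10.0.0.1", "name-1.sub.example.com"],
}

# Filter, rewrite and sort in one pass while the dict is being built.
groups = {
    key: sorted(
        (domain.replace('sub.example.com', 'example.com')
         for domain in filter_ip(set(items))),
        key=natural_key,
    )
    for key, items in groups.items()
}
print(groups["role_powerdns"])
# ['name-1.example.com', 'name-2.example.com', 'name-10.example.com']
```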

how to pass list values through url

I have written a Python script where I have collected some values in a list. I need to pass these values to a URL in a loop, picking a different value each time.
I.e., I want to achieve this:
http://www.abc.com/xyz/pqr/symbol=something[i]
Here "something" is a list and I have verified that it contains the proper values. However, when I pass the values to the URL, I am not getting the desired results. I have tried URL-encoding something[i], but it still does not give me proper results. Can someone help me?
EDIT: My example script at the moment is:
import json
script=["Linux","Windows"]
for i in xrange(len(script)):
    site="abc.com/pqr/xyz/symbol=json.dumps(script[i])";
    print site
I think the problem is your approach to formatting. You don't really need json if you have a list already and are just trying to modify a URL...
script = ["Linux", "Windows"]
something = ["first", "second"]
for i, j in zip(script, something):
    site = "http://abc.com/pqr/xyz/symbol={0}".format(j)
    print(i, site)
This uses the str.format() method, which substitutes the values in parentheses into the string at the positions marked with {}. You could just concatenate the strings if the value always goes at the end. You could also use the older % operator instead. It does pretty much the same thing, but in this case it inserts the string j at the position marked by %s:
site = "http://abc.com/pqr/xyz/symbol=%s" % j
Side note: I slightly prefer % because once you learn it, the same printf-style syntax can be used in other programming languages, but .format() has more options and is the recommended way to do it since Python 2.6.
Output:
Linux http://abc.com/pqr/xyz/symbol=first
Windows http://abc.com/pqr/xyz/symbol=second
You should be able to get close to what you want from this starting point, but if this is nothing like your desired output, then you need to clarify in your question...
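If the list values can contain spaces or other characters that are not safe in a URL, percent-encode each one before inserting it. This sketch uses `urllib.parse.quote` from the Python 3 standard library (Python 2 had the equivalent `urllib.quote`) and assumes the symbol goes into the path exactly as in the question; abc.com is the question's placeholder host and the sample values are made up.

```python
from urllib.parse import quote

symbols = ["Linux", "Windows", "Mac OS"]

# quote() percent-encodes unsafe characters (a space becomes %20);
# safe='' makes it encode "/" inside a value as well.
urls = ["http://abc.com/pqr/xyz/symbol={}".format(quote(s, safe='')) for s in symbols]

for url in urls:
    print(url)
```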

Python: Linking to a dictionary through a text string

I'm trying to create a program module that contains data structures (dictionaries) and text strings that describe those data structures. I want to import these (dictionaries and descriptions) into a module that is feeding a GUI interface. One of the displayed lines is the contents contained in the first dictionary with one field that contains all possible values contained in another dictionary. I'm trying to avoid 'hard-coding' this relationship and would like to pass a link to the second dictionary (containing all possible values) to the string describing the first dictionary. An abstracted example would be:
dict1 = {
    "1":["dog","cat","fish"],
    "2":["alpha","beta","gamma","epsilon"]
}
string="parameter1,parameter2,dict1"
# Silly example starts here
#
string=string.split(",")
print string[2]["2"]
(I'd like to get: ["alpha","beta","gamma","epsilon"].)
But of course this doesn't work.
Does anyone have a clever solution to this problem?
Generally, this kind of dynamic name lookup is a bad idea. It leads to code that is very difficult to read and maintain. However, if you must, you can use globals() for this:
globals()[string[2]]["2"]
A better solution would be to put dict1 into a dictionary in the first place:
dict1 = ...
namespace = {'dict1': dict1}
string = ...
namespace[string[2]]["2"]
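A runnable version of the dictionary-namespace approach, filling in the placeholders with the question's data. The variable name `description` is used instead of `string` for readability; the parameter names inside it are the question's placeholders.

```python
dict1 = {
    "1": ["dog", "cat", "fish"],
    "2": ["alpha", "beta", "gamma", "epsilon"],
}

# Explicit registry: only names listed here can be looked up from text,
# unlike globals(), which exposes every module-level name.
namespace = {"dict1": dict1}

description = "parameter1,parameter2,dict1"
fields = description.split(",")

print(namespace[fields[2]]["2"])  # ['alpha', 'beta', 'gamma', 'epsilon']
```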
