How to select particular JSON object with specific value? - python

I have a list of dictionaries (loaded from JSON). Given a value, I want to fetch the JSON object that contains that value. For example:
[{'content_type': 'Press Release',
  'content_id': '1',
  'Author': 'John'},
 {'content_type': 'editorial',
  'content_id': '2',
  'Author': 'Harry'},
 {'content_type': 'Article',
  'content_id': '3',
  'Author': 'Paul'}]
I want to fetch the complete object where the author is Paul.
This is the code I have made so far.
import json
newJson = "testJsonNewInput.json"
ListForNewJson = []
def testComparision(newJson, oldJson):
    with open(newJson, mode='r') as fp_n:
        json_data_new = json.load(fp_n)
    for jData_new in json_data_new:
        # the key is 'Author' (capitalised) in the sample data
        ListForNewJson.append(jData_new['Author'])
If any other information is required, please ask.

Case 1
One time access
It is perfectly alright to read your data and iterate over it, returning the first match found.
import json
def access(file, author):
    with open(file) as f:
        data = json.load(f)
    for d in data:
        if d['Author'] == author:
            return d
    # only reached when no record matched
    return 'Not Found'
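The same one-time lookup can also be sketched with next() and a generator expression, which stops at the first match. The names find_by_author and records are illustrative only, and the sample data is copied from the question.

```python
# Sample records, copied from the question.
records = [
    {'content_type': 'Press Release', 'content_id': '1', 'Author': 'John'},
    {'content_type': 'editorial', 'content_id': '2', 'Author': 'Harry'},
    {'content_type': 'Article', 'content_id': '3', 'Author': 'Paul'},
]

def find_by_author(records, author):
    """Return the first record whose 'Author' equals author, or None."""
    return next((r for r in records if r.get('Author') == author), None)

print(find_by_author(records, 'Paul'))   # the full 'Paul' record
print(find_by_author(records, 'Ringo'))  # no match, so the default is returned
```

Returning None rather than a 'Not Found' string lets the caller tell "no match" apart from real data with a simple `is None` check.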
Case 2
Repeated access
In this instance, it would be wise to reshape your data in such a way that accessing objects by author names is much faster (think dictionaries!).
For example, one possible option would be:
with open(file) as f:
    data = json.load(f)
newData = {}
for d in data:
    newData[d['Author']] = d
Now, define a function and pass your pre-loaded data along with a list of author names.
def access(myData, author_list):
    for a in author_list:
        yield myData.get(a)
The function is called like this:
for i in access(newData, ['Paul', 'John', ...]):
    print(i)
Alternatively, store the results in a list r. The list(...) call is necessary because a function that uses yield returns a generator object, which produces its values only when you iterate over it.
r = list(access(newData, [...]))
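One caveat with a plain dict keyed by author: if an author appears more than once, only the last record survives. A minimal sketch of one way around that, assuming you want every match, is to group records in a defaultdict of lists. The names by_author and access_all, and the second 'Paul' record, are made up for illustration.

```python
from collections import defaultdict

# Sample records; the second 'Paul' entry is a hypothetical addition.
records = [
    {'content_type': 'Press Release', 'content_id': '1', 'Author': 'John'},
    {'content_type': 'Article', 'content_id': '3', 'Author': 'Paul'},
    {'content_type': 'editorial', 'content_id': '4', 'Author': 'Paul'},
]

# Build the index once: author name -> list of all matching records.
by_author = defaultdict(list)
for record in records:
    by_author[record['Author']].append(record)

def access_all(index, author_list):
    """Yield (author, matching records) pairs; unknown authors get []."""
    for author in author_list:
        # .get() does not create new keys, unlike index[author]
        yield author, index.get(author, [])

for author, matches in access_all(by_author, ['Paul', 'Ringo']):
    print(author, len(matches))
```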

Why not do something like this? It should be fast, and you will not have to load authors that are never searched for.
alreadyknown = {}
list_of_obj = [{'content_type': 'Press Release',
                'content_id': '1',
                'Author': 'John'},
               {'content_type': 'editorial',
                'content_id': '2',
                'Author': 'Harry'},
               {'content_type': 'Article',
                'content_id': '3',
                'Author': 'Paul'}]
def func(author):
    # cache each lookup so repeated queries skip the scan
    if author not in alreadyknown:
        obj = get_obj(author)
        alreadyknown[author] = obj
    return alreadyknown[author]
def get_obj(auth):
    # compare with ==, not is: identity checks on strings are unreliable
    return [obj for obj in list_of_obj if obj['Author'] == auth]
print(func('Paul'))

Related

Read List from stringname append

I have the following problem: I want to look up a variable from a string so that I can retrieve a list.
I pass the user into the function def fetch(user), e.g. name1.
From name1 I would like to read the list name1_skiplist,
or from name2 read name2_skiplist.
name1_skiplist = [('home', '/pic'), ('home', '/jpg'),]
name2_skiplist = [('etc', '/pic'), ('etc', '/jpg'),]
name3_skiplist = [('tmp', '/pic'), ('tmp', '/jpg'),]
def fetch(user):
    joinedlist = []
    joinedlist = user + '_skiplist'
    if joinedlist:
        ....
A dict is better suited to your use case: it lets you retrieve a list by its key.
data = {'name1_skiplist': [('home', '/pic'), ('home', '/jpg'), ],
        'name2_skiplist': [('etc', '/pic'), ('etc', '/jpg'), ],
        'name3_skiplist': [('tmp', '/pic'), ('tmp', '/jpg'), ]}
def fetch(user):
    joinedlist = user + '_skiplist'
    result = data.get(joinedlist)
    return result
Organize related information in collections -- data structures like dicts, lists,
tuples, namedtuples, dataclasses, etc. In your case, assuming I understand
your goal, a dict is probably a decent choice. For example:
skips = {
'home': [('home', '/pic'), ('home', '/jpg')],
'etc': [('etc', '/pic'), ('etc', '/jpg')],
'tmp': [('tmp', '/pic'), ('tmp', '/jpg')],
}
An illustrated usage:
for name in skips:
    sks = skips[name]
    print(name, sks)
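To tie this back to the original fetch(user) function, here is a small sketch that keys the dict directly by user name, so no '_skiplist' string has to be built. The dict name skiplists and the empty-list fallback for unknown users are assumptions, not something the question specifies.

```python
# Hypothetical layout: one dict keyed directly by user name.
skiplists = {
    'name1': [('home', '/pic'), ('home', '/jpg')],
    'name2': [('etc', '/pic'), ('etc', '/jpg')],
    'name3': [('tmp', '/pic'), ('tmp', '/jpg')],
}

def fetch(user):
    """Return the skip list for user, or an empty list if the user is unknown."""
    return skiplists.get(user, [])

print(fetch('name1'))
print(fetch('nobody'))
```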

How to structure a list with JSON objects in Python?

I got a list in Python with Twitter user information and exported it with Pandas to an Excel file.
One row is one Twitter user with nearly all information of the user (name, #-tag, location etc.)
Here is my code to create the list and fill it with the user data:
def get_usernames(userids, api):
    fullusers = []
    u_count = len(userids)
    try:
        for i in range(int(u_count/100) + 1):
            end_loc = min((i + 1) * 100, u_count)
            fullusers.extend(
                api.lookup_users(user_ids=userids[i * 100:end_loc])
            )
        print('\n' + 'Done! We found ' + str(len(fullusers)) + ' follower in total for this account.' + '\n')
        return fullusers
    except:
        import traceback
        traceback.print_exc()
        print ('Something went wrong, quitting...')
The only problem is that every row is a JSON object and therefore one long comma-separated string. I would like to create headers (no problem with Pandas) and write only parts of the string (e.g. ID or name) to columns.
Here is an example of a row from my output.xlsx:
User(_api=<tweepy.api.API object at 0x16898928>, _json={'id': 12345, 'id_str': '12345', 'name': 'Jane Doe', 'screen_name': 'jdoe', 'location': 'Nirvana, NI', 'description': 'Just some random descrition')
I have two ideas, but I don't know how to implement them due to my lack of skills and experience with Python.
Create a loop which saves certain parts ('id', 'name' etc.) from the JSON string in columns.
Cut off the User(_api=<tweepy.api.API object at 0x16898928>, _json={ at the beginning and ) at the end, so that I can export the file as CSV.
Could anyone help me out with one of my two solutions or suggest a "simple" way to do this?
fyi: I want to do this to gather data for my thesis.
Try the Python json library:
import json
# json.loads needs valid JSON: double-quoted strings and a closing brace
jsonstring = '{"id": 12345, "id_str": "12345", "name": "Jane Doe", "screen_name": "jdoe", "location": "Nirvana, NI", "description": "Just some random descrition"}'
jsondict = json.loads(jsonstring)
# jsondict is now a dict
Now you can just extract the data you want from it:
user_id = jsondict["id"]  # avoid shadowing the built-in id()
name = jsondict["name"]
newdict = {"id": user_id, "name": name}
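For the CSV export the question asks about, a rough sketch with the standard csv module might look like this. It assumes each user's data is already available as a plain dict (the output in the question suggests the _json attribute holds one). The second user, the chosen fields and the function name users_to_csv are made up for illustration.

```python
import csv
import io

# Stand-ins for the per-user dicts; the second entry is hypothetical.
users = [
    {'id': 12345, 'id_str': '12345', 'name': 'Jane Doe',
     'screen_name': 'jdoe', 'location': 'Nirvana, NI'},
    {'id': 67890, 'id_str': '67890', 'name': 'John Roe',
     'screen_name': 'jroe', 'location': 'Elsewhere'},
]
fields = ['id', 'name', 'screen_name']

def users_to_csv(users, fields):
    """Return CSV text with a header row and only the selected fields."""
    buffer = io.StringIO()
    # extrasaction='ignore' drops keys that are not listed in fields
    writer = csv.DictWriter(buffer, fieldnames=fields, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(users)
    return buffer.getvalue()

print(users_to_csv(users, fields))
```

Writing to a real file instead of a StringIO only needs open(path, 'w', newline='') in place of the buffer.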

Best way to add dictionary entry and append to JSON file in Python

I have a need to add entries to a dictionary with the following keys:
name
element
type
I want each entry to append to a JSON file, where I will access them for another piece of the project.
What I have below technically works, but there are a couple of things (at least) wrong with it.
First, it doesn't prevent duplicates being entered. For example I can have 'xyz', '4444' and 'test2' appear as JSON entries multiple times. Is there a way to correct this?
Is there a cleaner way to write the actual data entry piece so when I am entering these values into the dictionary it's not directly there in the parentheses?
Finally, is there a better place to put the JSON piece? Should it be inside the function?
Just trying to clean this up a bit. Thanks
import json
element_dict = {}
def add_entry(name, element, type):
    element_dict["name"] = name
    element_dict["element"] = element
    element_dict["type"] = type
    return element_dict
#add entry
entry = add_entry('xyz', '4444', 'test2')
#export to JSON
with open('elements.json', 'a', encoding="utf-8") as file:
    x = json.dumps(element_dict, indent=4)
    file.write(x + '\n')
There are several questions here. The main points worth mentioning:
You can use a list to hold your arguments and use *args to unpack them when you pass them to add_entry.
To check for and avoid duplicates, you can use a set to track items already added.
For writing to JSON, now that you have a list, you can simply iterate over it and write everything in one function at the end.
Putting these aspects together:
import json
res = []
seen = set()
def add_entry(res, name, element, type):
    # check if in seen set
    if (name, element, type) in seen:
        return res
    # add to seen set
    seen.add((name, element, type))
    # append to results list
    res.append({'name': name, 'element': element, 'type': type})
    return res
args = ['xyz', '4444', 'test2']
res = add_entry(res, *args) # add entry - SUCCESS
res = add_entry(res, *args) # try to add again - FAIL
args2 = ['wxy', '3241', 'test3']
res = add_entry(res, *args2) # add another - SUCCESS
Result:
print(res)
[{'name': 'xyz', 'element': '4444', 'type': 'test2'},
{'name': 'wxy', 'element': '3241', 'type': 'test3'}]
Writing to JSON via a function:
def write_to_json(lst, fn):
    with open(fn, 'a', encoding='utf-8') as file:
        for item in lst:
            x = json.dumps(item, indent=4)
            file.write(x + '\n')
#export to JSON
write_to_json(res, 'elements.json')
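One thing to keep in mind when reading the file back: several indented objects appended one after another do not form a single valid JSON document, so json.load on the whole file raises an error. A possible workaround, sketched here under the assumption that one compact object per line is acceptable, is the "JSON Lines" layout. The helper names and the .jsonl file name are made up.

```python
import json
import os
import tempfile

def append_json_lines(items, path):
    """Append each item as one compact JSON object per line."""
    with open(path, 'a', encoding='utf-8') as file:
        for item in items:
            file.write(json.dumps(item) + '\n')

def read_json_lines(path):
    """Parse the file line by line, skipping blank lines."""
    with open(path, encoding='utf-8') as file:
        return [json.loads(line) for line in file if line.strip()]

entries = [
    {'name': 'xyz', 'element': '4444', 'type': 'test2'},
    {'name': 'wxy', 'element': '3241', 'type': 'test3'},
]

# Use a temporary directory so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), 'elements.jsonl')
append_json_lines(entries[:1], path)  # first run
append_json_lines(entries[1:], path)  # a later run appends more
loaded = read_json_lines(path)
print(loaded)
```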
You can try it this way:
import json
import hashlib
def add_entry(name, element, type):
    # key each entry by a hash of its fields, so a duplicate overwrites itself
    key = hashlib.md5((name + element + type).encode('utf-8')).hexdigest()
    return {key: {"name": name, "element": element, "type": type}}
#add entry
entry = add_entry('xyz', '4444', 'test2')
#Update to JSON
with open('elements.json', 'r') as f:
    json_data = json.load(f)
print(json_data.values()) # View Previous entries
json_data.update(entry)
with open('elements.json', 'w') as f:
    f.write(json.dumps(json_data))

Looking for a better data structure in python

I have some basic data that I want to store and I'm looking for a better solution than what I've come up with.
I have multiple entries of data with 4 fields per entry: name, url, currYear, availYears.
I can solve this with a simple array of arrays like so:
data = [
    ['test-name', ['http://example.com', '2015', '2015,2014']],
    ['next-name', ['http://example.org', '1999', '1999']]
]
But this gets messy when trying to access data in each array. I end up with a for loop like this
for each in data:
    name = each[0]
    url = each[1][0]
    currYear = each[1][1]
I'd prefer to do something similar to a dict where I can reference what I want by a key name. This isn't valid syntax, but hopefully it gets the point across.
data = {'entry1': {'name': 'test-name'}, {'url': 'http://example.com'}, {'currYear': '2015'}, {'availYears': '2015,2014'}}
Then I could pull the url data for entryX.
EDIT:
Several good responses. I decided to go with creating a class since 1) it satisfies my need 2) helps clean up the code by segregating functionality and 3) learn how packages, modules and classes work compared to Java (which I'm more familiar with).
In addition to creating the class, I also created getters and setters.
class SchoolSiteData(object):
    def __init__(self, name, url, currYear, availYears):
        self.name = name
        self.url = url
        self.currYear = currYear
        self.availYears = availYears
    def getName(self):
        return self.name
    def getURL(self):
        return self.url
    def getCurrYear(self):
        return self.currYear
    def getAvailYears(self):
        return self.availYears
    def setName(self, name):
        self.name = name
    def setURL(self, url):
        self.url = url
    def setCurrYear(self, currYear):
        self.currYear = currYear
    def setAvailYears(self, availYears):
        self.availYears = availYears
A class may make this easier to use, e.g.:
class Entry(object):
    def __init__(self, name, url, currYear, availYears):
        self.name = name
        self.url = url
        self.currYear = currYear
        self.availYears = availYears
entry1 = Entry('test-name', 'http://example.com', '2015', '2015,2014')
entry2 = Entry('next-name', 'http://example.org', '1999', '1999')
data = [entry1, entry2]
for entry in data:
    print(entry.name)
    print(entry.url)
    print(entry.currYear)
    print(entry.availYears)
    print()
Use the names as the keys in a dictionary:
data = {'test-name':
{'url': 'http://example.com',
'currYear': '2015',
'availYears': '2015,2014'
}
}
Access like so:
data['test-name']['url']
You seem to have needlessly complicated things with the list-in-list solution. If you keep it a little flatter, you can just unpack the rows into variables:
data = [
['test-name', 'http://example.com', '2015', '2015,2014'],
['next-name', 'http://example.org', '1999', '1999']
]
for name, url, currYear, availYears in data:
    ....
The most light-weight solution for what you want is probably a namedtuple.
>>> from collections import namedtuple
>>> mytuple = namedtuple("mytuple", field_names="url currYear availYears")
>>> data = {'test-name': mytuple('http://example.com', '2015', '2015,2014'), ...
... }
>>> print(data['test-name'])
mytuple(url='http://example.com', currYear='2015', availYears='2015,2014')
You can access members by numerical index or by name:
>>> x = data['test-name']
>>> print(x.currYear)
2015
>>> print(x[1])
2015
Alternatively, use a list of dictionaries:
data = [
    {'name': 'test-name', 'url': 'http://example.com', 'currYear': '2015', 'availYears': '2015,2014'},
    {'name': 'next-name', 'url': 'http://example.org', 'currYear': '1999', 'availYears': '1999'}
]
for each in data:
    name = each['name']
    url = each['url']
    currYear = each['currYear']
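Another option, not mentioned in the answers above, is a dataclass (Python 3.7+). It gives attribute access like the hand-written class, without the boilerplate __init__ or the getters and setters. This is only a sketch; the class name Entry and the lookup dict keyed by name are illustrative choices.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    # Field names follow the question; all values are kept as strings.
    name: str
    url: str
    currYear: str
    availYears: str

entries = [
    Entry('test-name', 'http://example.com', '2015', '2015,2014'),
    Entry('next-name', 'http://example.org', '1999', '1999'),
]

# Optional: index the entries by name for direct lookup.
by_name = {entry.name: entry for entry in entries}

print(by_name['test-name'].url)
print(by_name['next-name'])  # dataclasses get a readable repr for free
```

Attributes can be read and assigned directly (entry.url = ...), which is the usual Python style in place of Java-like accessors.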

Python Dictionaries & CSV Values | Check CSV

The csv file works fine. So does the dictionary but I can't seem to check the values in the csv file to make sure I'm not adding duplicate entries. How can I check this? The code I tried is below:
def write_csv():
    csvfile = csv.writer(open("address.csv", "a"))
    check = csv.reader(open("address.csv"))
    for item in address2:
        csvfile.writerow([address2[item]['address']['value'],address2[item]['address']['count'],item, datetime.datetime.now()])
def check_csv():
    check = csv.reader(open("address.csv"))
    csvfile = csv.writer(open("address.csv", "a"))
    for stuff in address2:
        address = address2[str(stuff)]['address']['value']
        for sub in check:
            if sub[0] == address:
                print "equals"
                try:
                    address2[stuff]['delete'] = True
                except:
                    address2[stuff]['delete'] = True
            else:
                csvfile.writerow([address2[stuff]['address']['value'], address2[stuff]['address']['count'], stuff, datetime.datetime.now()])
Any ideas?
Your CSV and dict structures are a little wonky - I'd love to know if that is set or if you can change them to be more useful. Here is an example that does basically what you want -- you'll have to change some things to fit your format. The most important change is probably not writing to a file that you are reading - that is going to lead to headaches.
This does what you asked with the delete flag -- is there an external need for this? If not there is almost certainly a better way (removing the bad rows, saving the good rows somewhere else, etc - depends on what you are doing).
Anyway, here is the example. I used just the commented block to create the csv file in the first place, then added the new address to the list and ran the rest. Instead of looping through the file over and over it makes a lookup dict by address and stores the row number, which it then uses to update the delete flag if it is found when it reads the csv file. You'll want to take the prints out and uncomment the last line to actually write the new rows.
import csv, datetime
addresses = [
    {'address': {'value': '123 road', 'count': 1}, 'delete': False},
    {'address': {'value': '456 road', 'count': 1}, 'delete': False},
    {'address': {'value': '789 road', 'count': 1}, 'delete': False},
    {'address': {'value': '1 new road', 'count': 1}, 'delete': False},
]
now = datetime.datetime.now()
### create the csv
##with open('address.csv', 'wb') as csv_file:
##    writer = csv.writer(csv_file)
##    for row in addresses:
##        writer.writerow([ row['address']['value'], row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])
# make lookup keys for the dict
address_lookup = {}
for i in range(len(addresses)):
    address_row = addresses[i]
    address_lookup[address_row['address']['value']] = i
# read csv once
with open('address.csv', 'rb') as csv_file:
    reader = csv.reader(csv_file)
    for row in reader:
        print row
        # if address is found in the dict, set delete flag to true
        if row[0] in address_lookup:
            print 'flagging address as old: %s' % row[0]
            addresses[ address_lookup[row[0]] ]['delete'] = True
with open('address.csv', 'ab') as csv_file:
    # go back through addresses and add any that shouldnt be deleted to the csv
    writer = csv.writer(csv_file)
    for address_row in addresses:
        if address_row['delete'] is False:
            print 'adding row: '
            print address_row
            # note: use address_row here, not row (row is the last line read from the csv)
            #writer.writerow([ address_row['address']['value'], address_row['address']['count'], now.strftime('%Y-%m-%d %H:%M:%S') ])
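If the delete flag is not needed, the same idea can be sketched more compactly for Python 3: read the existing addresses into a set once, close the file, then append only the rows whose address is not in the set. The function names and the temporary file are assumptions made for this example.

```python
import csv
import datetime
import os
import tempfile

def load_known_addresses(path):
    """Return the set of addresses (first column) already in the CSV."""
    if not os.path.exists(path):
        return set()
    with open(path, newline='') as csv_file:
        return {row[0] for row in csv.reader(csv_file) if row}

def append_new_addresses(path, addresses):
    """Append only unseen addresses; return the list of values added."""
    known = load_known_addresses(path)  # the read finishes before we write
    now = datetime.datetime.now().strftime('%Y-%m-%d %H:%M:%S')
    added = []
    with open(path, 'a', newline='') as csv_file:
        writer = csv.writer(csv_file)
        for entry in addresses:
            value = entry['address']['value']
            if value in known:
                continue  # duplicate, skip it
            writer.writerow([value, entry['address']['count'], now])
            known.add(value)
            added.append(value)
    return added

# Demo against a temporary file rather than a real address.csv.
path = os.path.join(tempfile.mkdtemp(), 'address.csv')
first = append_new_addresses(path, [
    {'address': {'value': '123 road', 'count': 1}},
    {'address': {'value': '456 road', 'count': 1}},
])
second = append_new_addresses(path, [
    {'address': {'value': '456 road', 'count': 1}},
    {'address': {'value': '1 new road', 'count': 1}},
])
print(first, second)
```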
