Looking for a better data structure in python - python

I have some basic data that I want to store and I'm looking for a better solution than what I've come up with.
I have multiple entries of data with 4 fields per entry: name, url, currYear, availYears.
I can solve this with a simple array of arrays like so:
data = [
    ['test-name', ['http://example.com', '2015', '2015,2014']],
    ['next-name', ['http://example.org', '1999', '1999']]
]
But this gets messy when trying to access data in each array. I end up with a for loop like this:
for each in data:
    name = each[0]
    url = each[1][0]
    currYear = each[1][1]
I'd prefer to do something similar to a dict where I can reference what I want by a key name. This isn't valid syntax, but hopefully it gets the point across.
data = {'entry1': {'name': 'test-name'}, {'url': 'http://example.com'}, {'currYear': '2015'}, {'availYears': '2015,2014'}}
Then I could pull the url data for entryX.
EDIT:
Several good responses. I decided to go with creating a class since 1) it satisfies my need 2) helps clean up the code by segregating functionality and 3) learn how packages, modules and classes work compared to Java (which I'm more familiar with).
In addition to creating the class, I also created getters and setters.
class SchoolSiteData(object):
    def __init__(self, name, url, currYear, availYears):
        self.name = name
        self.url = url
        self.currYear = currYear
        self.availYears = availYears
    def getName(self):
        return self.name
    def getURL(self):
        return self.url
    def getCurrYear(self):
        return self.currYear
    def getAvailYears(self):
        return self.availYears
    def setName(self, name):
        self.name = name
    def setURL(self, url):
        self.url = url
    def setCurrYear(self, currYear):
        self.currYear = currYear
    def setAvailYears(self, availYears):
        self.availYears = availYears

A class may make this easier to use, e.g.:
class Entry(object):
    def __init__(self, name, url, currYear, availYears):
        self.name = name
        self.url = url
        self.currYear = currYear
        self.availYears = availYears
entry1 = Entry('test-name', 'http://example.com', '2015', '2015,2014')
entry2 = Entry('next-name', 'http://example.org', '1999', '1999')
data = [entry1, entry2]
for entry in data:
    print(entry.name)
    print(entry.url)
    print(entry.currYear)
    print(entry.availYears)
    print()
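If you are on Python 3.7 or later, a dataclass is one way to get the same kind of class with less boilerplate. This is only a sketch: the fields mirror the ones in the question, and `find_by_name` is a hypothetical helper that does not appear in the original code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Entry:
    # Field names follow the question; all values are kept as strings.
    name: str
    url: str
    currYear: str
    availYears: str


def find_by_name(entries: List[Entry], name: str) -> Optional[Entry]:
    """Hypothetical helper: return the first entry with this name, or None."""
    for entry in entries:
        if entry.name == name:
            return entry
    return None


data = [
    Entry('test-name', 'http://example.com', '2015', '2015,2014'),
    Entry('next-name', 'http://example.org', '1999', '1999'),
]

match = find_by_name(data, 'next-name')
print(match.url)
```

The generated `__init__`, `__repr__` and `__eq__` cover what the hand-written class does, and attributes are still read and written directly (`entry.url`), so separate getters and setters are usually not needed.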

Use the names as the keys in a dictionary:
data = {'test-name':
            {'url': 'http://example.com',
             'currYear': '2015',
             'availYears': '2015,2014'
            }
       }
Access like so:
data['test-name']['url']

You seem to have needlessly complicated things with the list-in-list solution. If you keep it a little flatter, you can just unpack the rows into variables:
data = [
    ['test-name', 'http://example.com', '2015', '2015,2014'],
    ['next-name', 'http://example.org', '1999', '1999']
]
for name, url, currYear, availYears in data:
    ....

The most light-weight solution for what you want is probably a namedtuple.
>>> from collections import namedtuple
>>> mytuple = namedtuple("mytuple", field_names="url currYear availYears")
>>> data = {'test-name': mytuple('http://example.com', '2015', '2015,2014'), ...
... }
>>> print(data['test-name'])
mytuple(url='http://example.com', currYear='2015', availYears='2015,2014')
You can access members by numerical index or by name:
>>> x = data['test-name']
>>> print(x.currYear)
2015
>>> print(x[1])
2015
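As a variation, the name can be kept as a fourth field so that a plain list of rows can be converted directly. This is a minimal sketch: `Site`, `rows`, `sites` and `by_name` are illustrative names, not from the answer above.

```python
from collections import namedtuple

# "Site" is an illustrative name; the fields are the four from the question.
Site = namedtuple('Site', ['name', 'url', 'currYear', 'availYears'])

rows = [
    ['test-name', 'http://example.com', '2015', '2015,2014'],
    ['next-name', 'http://example.org', '1999', '1999'],
]

# Build one namedtuple per row, then index them by name for direct lookup.
sites = [Site(*row) for row in rows]
by_name = {site.name: site for site in sites}

for site in sites:
    print(site.name, site.url, site.currYear)

# availYears is a comma-separated string in the question; split it when needed.
years = by_name['test-name'].availYears.split(',')
print(years)
```

Keep in mind that namedtuples are immutable, so this fits data that is read more often than it is changed.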

data = [
    {'name': 'test-name', 'url': 'http://example.com', 'currYear': '2015', 'availYears': '2015,2014'},
    {'name': 'next-name', 'url': 'http://example.org', 'currYear': '1999', 'availYears': '1999'}
]
for each in data:
    name = each['name']
    url = each['url']
    currYear = each['currYear']

Related

Read List from stringname append

I have the following problem: I want to look up a list by a name built from a string.
I pass the user into the function fetch(user), e.g. name1.
From name1 I would like to read the list name1_skiplist,
or from name2 read name2_skiplist.
name1_skiplist = [('home', '/pic'), ('home', '/jpg'),]
name2_skiplist = [('etc', '/pic'), ('etc', '/jpg'),]
name3_skiplist = [('tmp', '/pic'), ('tmp', '/jpg'),]
def fetch(user):
    joinedlist = []
    joinedlist = user + '_skiplist'
    if joinedlist:
        ....
A dict is better suited to your use case, since it lets you retrieve a list based on a key.
data = {'name1_skiplist': [('home', '/pic'), ('home', '/jpg'), ],
        'name2_skiplist': [('etc', '/pic'), ('etc', '/jpg'), ],
        'name3_skiplist': [('tmp', '/pic'), ('tmp', '/jpg'), ]}
def fetch(user):
    joinedlist = user + '_skiplist'
    result = data.get(joinedlist)
    return result
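Note that `data.get(joinedlist)` returns `None` for an unknown user. A small variation, assuming an empty list is an acceptable fallback (the question does not say), keys the dict by the user name itself. The name `skiplists` is an assumption made for this sketch.

```python
# Keyed by user name, so no string concatenation is needed at lookup time.
skiplists = {
    'name1': [('home', '/pic'), ('home', '/jpg')],
    'name2': [('etc', '/pic'), ('etc', '/jpg')],
    'name3': [('tmp', '/pic'), ('tmp', '/jpg')],
}


def fetch(user):
    """Return the skiplist for user, or an empty list if the user is unknown."""
    return skiplists.get(user, [])


print(fetch('name2'))
print(fetch('nobody'))
```

Returning an empty list lets callers loop over the result without first checking for `None`.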
Organize related information in collections -- data structures like dicts, lists,
tuples, namedtuples, dataclasses, etc. In your case, assuming I understand
your goal, a dict is probably a decent choice. For example:
skips = {
    'home': [('home', '/pic'), ('home', '/jpg')],
    'etc': [('etc', '/pic'), ('etc', '/jpg')],
    'tmp': [('tmp', '/pic'), ('tmp', '/jpg')],
}
An illustrated usage:
for name in skips:
    sks = skips[name]
    print(name, sks)

Additional key: values not being added to Python Dictionary

Summary problem: Building an API endpoint and trying to push new key:values to the existing API. Not knowing if I am correctly adding key value pairs or not. Familiar with Ruby but first time Python user!
Context:
I currently have a method that will format given information into a Python Dictionary to be used as a JSON for my API. I have one method that pushes information to this method but another that is not functioning. Can anybody spot why?
Things I've tried:
Feature test using command line environment
Getting visibility - Printing
Environment:
MacOS 11.1, python 3.9.1, VSCode
Code:
METHOD THAT IS FORMATTING
from uuid import uuid4

class Database(object):
    def __init__(self):
        self.data = {}
    def insert_entity(self, kind, entity):
        kind_dict = self.data.get(kind, {})
        entity_id = entity.get('id', str(uuid4()))
        if not isinstance(entity_id, str):
            raise Exception('Entity `id` must be a string')
        entity['id'] = entity_id
        kind_dict[entity_id] = entity
        self.data[kind] = kind_dict
        return entity
    def get_entity(self, kind, entity_id):
        entities = self.data.get(kind, {})
        return entities.get(entity_id, None)
    def get_all_entities(self, kind):
        return list(self.data.get(kind, {}).values())
METHOD THAT IS WORKING:
import random

def initialise_user_data():
    first_names = ['Ron', 'Paul', 'Simon', 'David', 'Phil', 'Ada', 'Julia']
    last_names = [
        'Legend', 'Mac', 'Stuartson', 'Sili', 'Word', 'Nine',
        'Smith'
    ]
    for index in range(len(first_names)):
        first_name = first_names[index]
        last_name = last_names[index]
        email = str(random.randint(0, 9999)) + "@email.com"
        user_data = {
            'firstName': first_name,
            'lastName': last_name,
            'email': email
        }
        database.insert_entity('User', user_data)
METHOD THAT IS NOT WORKING:
def initialise_event_data():
    users = database.get_all_entities('User')
    for user in users:
        for _ in range(random.randint(0, 10)):
            database.insert_entity(
                'Event', {
                    'userId': user['id'],
                    'points': 100,
                    'eventName': 'levels_completed'
                })
ALL METHODS ARE INVOKED AS SUCH:
database = InMemoryDatabase()
initialise_data()
def initialise_data():
    initialise_user_data()
    initialise_event_data()
    initialise_follow_data()

How to select particular JSON object with specific value?

I have a list of dictionaries (loaded from JSON). Given a value, I want to fetch the JSON object that contains that value. For example:
[{'content_type': 'Press Release',
  'content_id': '1',
  'Author': 'John'},
 {'content_type': 'editorial',
  'content_id': '2',
  'Author': 'Harry'
 },
 {'content_type': 'Article',
  'content_id': '3',
  'Author': 'Paul'}]
I want to fetch the complete object where the author is Paul.
This is the code I have written so far.
import json
newJson = "testJsonNewInput.json"
ListForNewJson = []
def testComparision(newJson, oldJson):
    with open(newJson, mode='r') as fp_n:
        json_data_new = json.load(fp_n)
    for jData_new in json_data_new:
        ListForNewJson.append(jData_new['Author'])
If any other information is required, please ask.
Case 1
One time access
It is perfectly alright to read your data and iterate over it, returning the first match found.
def access(file, author):
    with open(file) as f:
        data = json.load(f)
    for d in data:
        if d['Author'] == author:
            return d
    # No record matched the requested author.
    return 'Not Found'
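The same first-match search can also be written with `next()` over a generator expression. The sketch below uses an inline JSON string in place of the file, and returns `None` rather than a string when nothing matches; both choices, and the name `find_by_author`, are assumptions made for this example.

```python
import json

# Inline sample standing in for the JSON file from the question.
raw = '''[
  {"content_type": "Press Release", "content_id": "1", "Author": "John"},
  {"content_type": "editorial", "content_id": "2", "Author": "Harry"},
  {"content_type": "Article", "content_id": "3", "Author": "Paul"}
]'''


def find_by_author(records, author):
    """Return the first record whose 'Author' matches, or None if none does."""
    return next((r for r in records if r.get('Author') == author), None)


records = json.loads(raw)
print(find_by_author(records, 'Paul'))
print(find_by_author(records, 'Nobody'))
```

Using `r.get('Author')` instead of `r['Author']` means a record without that key is skipped rather than raising a `KeyError`.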
Case 2
Repeated access
In this instance, it would be wise to reshape your data in such a way that accessing objects by author names is much faster (think dictionaries!).
For example, one possible option would be:
with open(file) as f:
    data = json.load(f)
newData = {}
for d in data:
    newData[d['Author']] = d
Now, define a function and pass your pre-loaded data along with a list of author names.
def access(myData, author_list):
    for a in author_list:
        yield myData.get(a)
The function is called like this:
for i in access(newData, ['Paul', 'John', ...]):
print(i)
Alternatively, store the results in a list r. The list(...) is necessary because a function that uses yield returns a generator, which only produces its values as you iterate over it.
r = list(access(newData, [...]))
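One caveat with the author index above: if an author appears more than once, later records overwrite earlier ones. A sketch that keeps all of them, assuming duplicates are possible in your data (the question does not say), groups records into lists. `index_by_author` is a hypothetical name, and the fourth record is invented purely to show the duplicate case.

```python
from collections import defaultdict

records = [
    {'content_type': 'Press Release', 'content_id': '1', 'Author': 'John'},
    {'content_type': 'editorial', 'content_id': '2', 'Author': 'Harry'},
    {'content_type': 'Article', 'content_id': '3', 'Author': 'Paul'},
    # Extra record added here only to show the duplicate-author case.
    {'content_type': 'Article', 'content_id': '4', 'Author': 'Paul'},
]


def index_by_author(items):
    """Map each author name to the list of records written by that author."""
    index = defaultdict(list)
    for item in items:
        index[item['Author']].append(item)
    # Convert back to a plain dict so missing authors are not created on lookup.
    return dict(index)


by_author = index_by_author(records)
paul_ids = [r['content_id'] for r in by_author.get('Paul', [])]
print(paul_ids)
```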
Why not do something like this? It should be fast, and you will not have to load the authors that won't be searched.
alreadyknown = {}
list_of_obj = [{'content_type': 'Press Release',
                'content_id': '1',
                'Author': 'John'},
               {'content_type': 'editorial',
                'content_id': '2',
                'Author': 'Harry'
               },
               {'content_type': 'Article',
                'content_id': '3',
                'Author': 'Paul'}]
def func(author):
    if author not in alreadyknown:
        obj = get_obj(author)
        alreadyknown[author] = obj
    return alreadyknown[author]
def get_obj(auth):
    # Compare with ==, not `is`: `is` tests object identity, not string equality.
    return [obj for obj in list_of_obj if obj['Author'] == auth]
print(func('Paul'))

Python Praw ways to store data for calling later?

Is a dictionary the correct way to be doing this? Ideally this will be more than 5 levels deep. Sorry, my only language experience is PowerShell; there I would just make an array of objects. I'm not looking for someone to write the code, I just want to know if there is a better way.
Thanks
Cody
My Powershell way:
[$title1,$title2,$title3]
$titleX.comment = "comment here"
$titleX.comment.author = "bob"
$titleX.comment.author.karma = "200"
$titleX.comment.reply = "Hey Bob love your comment."
$titleX.comment.reply.author = "Alex"
$titleX.comment.reply.reply = "I disagree"
#
Python code (broken):
import praw
d = {}
reddit = praw.Reddit(client_id='XXXX',
                     client_secret='XXXX',
                     user_agent='android:com.example.myredditapp:'
                                'v1.2.3 (by /u/XXX)')
for submission in reddit.subreddit('redditdev').hot(limit=2):
    d[submission.id] = {}
    d[submission.id]['comment'] = {}
    d[submission.id]['title'] = {}
    d[submission.id]['comment']['author'] = {}
    d[submission.id]['title'] = submission.title
    mySubmission = reddit.submission(id=submission.id)
    mySubmission.comments.replace_more(limit=0)
    for comment in mySubmission.comments.list():
        d[submission.id]['comment'] = comment.body
        d[submission.id]['comment']['author'] = comment.author.name
        print(submission.title)
        print(comment.body)
        print(comment.author.name)
print(d)
File "C:/git/tensorflow/Reddit/pull.py", line 23, in <module>
d[submission.id]['comment']['author'] = comment.author.name
TypeError: 'str' object does not support item assignment
#
{'6xg24v': {'comment': 'Locking this version. Please comment on the [original post](https://www.reddit.com/r/changelog/comments/6xfyfg/an_update_on_the_state_of_the_redditreddit_and/)!', 'title': 'An update on the state of the reddit/reddit and reddit/reddit-mobile repositories'}}
I think your approach using a dictionary is okay, but you might also solve this by using a data structure for your posts: Instead of writing
d[submission.id] = {}
d[submission.id]['comment'] = {}
d[submission.id]['title']= {}
d[submission.id]['comment']['author']={}
d[submission.id]['title'] = submission.title
you could create a class Submission like this:
class Submission(object):
    def __init__(self, id, author, title, content):
        self.id = id
        self.author = author
        self.title = title
        self.content = content
        self.subSubmissions = {}
    def addSubSubmission(self, submission):
        self.subSubmissions[submission.id] = submission
    def getSubSubmission(self, id):
        return self.subSubmissions[id]
Using this class, you could change your code to this:
submissions = {}
for sm in reddit.subreddit('redditdev').hot(limit=2):
    submissions[sm.id] = Submission(sm.id, sm.author, sm.title, sm.selftext)
    # I am not quite sure what these lines are supposed to do, so you might be able to improve these, too
    mySubmission = reddit.submission(id=sm.id)
    mySubmission.comments.replace_more(limit=0)
    for cmt in mySubmission.comments.list():
        # Comments have no title, so None is passed for that argument.
        submissions[sm.id].addSubSubmission(Submission(cmt.id, cmt.author, None, cmt.body))
With this approach you can also move the code that reads the comments/subSubmissions into a separate function that calls itself recursively, so that you can read comments nested to any depth.
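The recursive idea can be sketched without PRAW. The `Node` class and `walk` function below are hypothetical stand-ins for the `Submission` class above, and the sample comments are taken from the PowerShell example in the question.

```python
class Node(object):
    """A post or comment with any number of nested replies."""

    def __init__(self, author, body):
        self.author = author
        self.body = body
        self.replies = []

    def add_reply(self, node):
        """Attach a reply and return it, so replies can be chained."""
        self.replies.append(node)
        return node


def walk(node, depth=0):
    """Yield (depth, node) for the node and all replies below it, depth first."""
    yield depth, node
    for reply in node.replies:
        # Recurse so that nesting of any depth is handled the same way.
        for item in walk(reply, depth + 1):
            yield item


root = Node('bob', 'comment here')
first = root.add_reply(Node('Alex', 'Hey Bob love your comment.'))
# The question gives no author for this reply, so a placeholder is used.
first.add_reply(Node('anonymous', 'I disagree'))

for depth, node in walk(root):
    print('  ' * depth + '{}: {}'.format(node.author, node.body))
```

Because each node holds its own list of replies, no fixed number of dictionary levels has to be created up front, which avoids the kind of overwrite that caused the `TypeError` in the question.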

Use Json data to initialize an object in python?

Here is what I got right now.
import urllib2
import json
from pprint import pprint
response = urllib2.urlopen('http://census.soe.com/get/ps2:v2/weapon_datasheet?c:start=0&c:limit=1&c:show=capacity,clip_size,damage,fire_rate_ms,item_id,reload_ms')
response1 = urllib2.urlopen('http://census.soe.com/get/ps2:v2/item?c:start=0&c:limit=1&c:show=name.en,description.en,item_id')
data = json.load(response)
data1 = json.load(response1)
pprint(data)
pprint(data1)
class Weapon(object):
    """Creates a PlanetSide2 Weapon"""
    def __init__(self, capacity, clip_size, damage, fire_rate_ms, itemd_id,
                 reload_ms, description, name):
        self.capacity = capacity
        self.clip_size = clip_size
        self.damage = damage
        self.fire_rate_ms = fire_rate_ms
        self.item_id = item_id
        self.reload_ms = reload_ms
        self.description = description
        self.name = name
Right now my data looks like this:
{u'returned': 1,
u'weapon_datasheet_list': [{u'capacity': u'210',
u'clip_size': u'30',
u'damage': u'143',
u'fire_rate_ms': u'75',
u'item_id': u'73',
u'reload_ms': u'2455'}]}
{u'item_list': [{u'description': {u'en': u"The New Conglomerate's Mag-Cutter features a powerful electromagnet capable of cutting through enemy body armor."},
u'item_id': u'1',
u'name': {u'en': u'Mag-Cutter'}}],
u'returned': 1}
Is there a way to use the data from the JSON to initialize a Weapon object named after the weapon?
For example: Mag-Cutter = Weapon(data from json file)
How would I go about setting the Weapon class variables from the JSON data?
Sure, use the first element of data['weapon_datasheet_list'] plus some data from the first element of data1['item_list']:
name = data1['item_list'][0]['name']['en']
description = data1['item_list'][0]['description']['en']
mag_cutter = Weapon(name=name, description=description,
**data['weapon_datasheet_list'][0])
This applies all of the first weapon_datasheet_list item as keyword arguments to the Weapon() constructor, matching keys from that dictionary to the argument names of the constructor. The remaining two items, name and description, I supplied manually.
This does mean you need to correct a typo in the Weapon.__init__ signature; itemd_id should be spelled item_id to match the JSON structure.
Demo:
>>> import urllib2
>>> import json
>>> from pprint import pprint
>>> response = urllib2.urlopen('http://census.soe.com/get/ps2:v2/weapon_datasheet?c:start=0&c:limit=1&c:show=capacity,clip_size,damage,fire_rate_ms,item_id,reload_ms')
>>> response1 = urllib2.urlopen('http://census.soe.com/get/ps2:v2/item?c:start=0&c:limit=1&c:show=name.en,description.en,item_id')
>>> data = json.load(response)
>>> data1 = json.load(response1)
>>> class Weapon(object):
...     """Creates a PlanetSide2 Weapon"""
...     def __init__(self, capacity, clip_size, damage, fire_rate_ms, item_id,
...                  reload_ms, description, name):
...         self.capacity = capacity
...         self.clip_size = clip_size
...         self.damage = damage
...         self.fire_rate_ms = fire_rate_ms
...         self.item_id = item_id
...         self.reload_ms = reload_ms
...         self.description = description
...         self.name = name
...
>>> name = data1['item_list'][0]['name']['en']
>>> description = data1['item_list'][0]['description']['en']
>>> mag_cutter = Weapon(name=name, description=description,
... **data['weapon_datasheet_list'][0])
>>> pprint(vars(mag_cutter))
{'capacity': u'210',
'clip_size': u'30',
'damage': u'143',
'description': u"The New Conglomerate's Mag-Cutter features a powerful electromagnet capable of cutting through enemy body armor.",
'fire_rate_ms': u'75',
'item_id': u'73',
'name': u'Mag-Cutter',
'reload_ms': u'2455'}
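The demo above needs Python 2 and a live network call. As a rough Python 3 sketch of the same `**` unpacking, the two responses can be replaced with inline JSON strings copied from the output in the question. The `from_json` classmethod and the `weapons` dict are assumptions added for illustration; since `Mag-Cutter` is not a valid Python identifier, the object is stored under its name as a dictionary key instead of in a variable of that name.

```python
import json


class Weapon(object):
    """Creates a PlanetSide2 Weapon"""

    def __init__(self, capacity, clip_size, damage, fire_rate_ms, item_id,
                 reload_ms, description, name):
        self.capacity = capacity
        self.clip_size = clip_size
        self.damage = damage
        self.fire_rate_ms = fire_rate_ms
        self.item_id = item_id
        self.reload_ms = reload_ms
        self.description = description
        self.name = name

    @classmethod
    def from_json(cls, datasheet, item):
        """Hypothetical helper: build a Weapon from one datasheet and one item."""
        # The datasheet keys match the constructor arguments, so ** unpacks them.
        return cls(name=item['name']['en'],
                   description=item['description']['en'],
                   **datasheet)


# Inline stand-ins for the two HTTP responses shown in the question.
datasheet_json = '''{"capacity": "210", "clip_size": "30", "damage": "143",
 "fire_rate_ms": "75", "item_id": "73", "reload_ms": "2455"}'''
item_json = '''{"item_id": "1", "name": {"en": "Mag-Cutter"},
 "description": {"en": "The New Conglomerate's Mag-Cutter features a powerful electromagnet capable of cutting through enemy body armor."}}'''

weapon = Weapon.from_json(json.loads(datasheet_json), json.loads(item_json))

# Store weapons under their display name instead of creating variables.
weapons = {weapon.name: weapon}
print(weapons['Mag-Cutter'].damage)
```

As in the original demo, the `item_id` comes from the datasheet entry, and all values stay as strings because that is how the JSON delivers them.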
