Merging list objects in Python

I have a pickle file containing many objects. I need to build one combined object from all the objects in the file. How can I do that? I tried several approaches, but none seems to work.
import pickle

objs = []
with open(picklename, "rb") as f:
    while True:
        try:
            objs.append(pickle.load(f))
        except EOFError:
            break
Like the one shown above.
The stored objects look like this:
<nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7fb172819198>
<nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7fb1719ce4a8>
<nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7fb1723caeb8>
<nltk.classify.naivebayes.NaiveBayesClassifier object at 0x7fb172113588>

Assuming each pickle.load(f) call returns a list of objects, you should use .extend() so that all of its items are appended to objs:
objs.extend(pickle.load(f))
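The difference between .append() and .extend() can be sketched with an in-memory buffer standing in for the pickle file (the two dumped lists here are hypothetical stand-ins for the real objects):

```python
import pickle
import io

# A buffer standing in for the pickle file, with two lists
# dumped one after the other, as in the question.
buf = io.BytesIO()
pickle.dump([1, 2], buf)
pickle.dump([3, 4], buf)
buf.seek(0)  # rewind so the loads start at the beginning

objs = []
while True:
    try:
        # extend() flattens each loaded list into objs;
        # append() would instead give a list of lists.
        objs.extend(pickle.load(buf))
    except EOFError:
        break

print(objs)  # [1, 2, 3, 4]
```

With .append() the result would be [[1, 2], [3, 4]]; .extend() merges everything into one flat list.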

Related

pickle.load() adds a None object at the end

Hi there, I'm learning Python and trying to create a command-line Address Book as a dictionary, which is saved to a file using pickle (that's the briefing).
I've coded the add_contact and browse_contact functions, and they both work well. I have added a few contacts (whose names are simply "Test One", "Test Two" ... "Test Ten") plus their emails and phone numbers to the Address Book.
However, when I code the search_contact and modify_contact functions, I'll need to load the file back into a dictionary using pickle.load().
The problem is, as each contact was added one by one to the Address Book, if I simply use the following code, it will only return the first object in the Address Book:
with open("addressbook.data", "rb") as f:
    loaded_contacts = pickle.load(f)
    print(loaded_contacts)
Output:
{'Test One': ('test.one@jdsofj.com', '39893849')}
That's because "Pickle serializes a single object at a time, and reads back a single object." Based on the solution suggested here, I've changed to the following code, so I can load back all the objects into a dictionary called loaded_contacts:
with open("addressbook.data", "rb") as f:
    while True:
        try:
            loaded_contacts = print(pickle.load(f))
        except EOFError:
            break
That seems to work, but the loaded dictionary from the file will have an extra None object at the end, as shown below once loaded_contacts is printed out:
{'Test One': ('test.one@jdsofj.com', '39893849')}
{'Test Two': ('test.two@clajdf.com', '93294798374')}
.
.
.
{'Test Ten': ('test.ten@oajfd.com', '79854399')}
None
Consequently, when I try to search for a name like "Test One" and try to retrieve its value, I will get TypeError: 'NoneType' object is not subscriptable because there is a None object at the end.
Here's the code (it needs more work but you'll get the idea):
with open("addressbook.data", "rb") as f:
    while True:
        try:
            loaded_contacts = print(pickle.load(f))
        except EOFError:
            break

search_name = input("Please enter a name: ")
print(loaded_contacts[search_name])
Here's the error message after I enter a name:
TypeError Traceback (most recent call last)
/var/.../x.py in <module>
11
12 search_name = input("Please enter a name: ")
---> 13 print(loaded_contacts[search_name])
14
TypeError: 'NoneType' object is not subscriptable
Not sure what I did wrong, or why there's an extra None object at the end of the loaded dictionary (or how to remove it). I know I could use a list to store all the values as strings, but the briefing is to create a dictionary to hold all the values and use the dictionary built-in methods to add, delete and modify the contacts--and hence my question here.
Edited to update:
Answer to the question (thanks @JohnGordon and @ewong):
In loaded_contacts = print(pickle.load(f)), print() doesn't return anything, so assigning its result to a variable leaves loaded_contacts set to None.
However, simply removing print() won't fully work: it solves the None problem, but loaded_contacts will only hold the value from the last iteration of pickle.load(f).
Here's how to iterate over all the objects in the file (if they were pickled one by one) and store all of them in a dictionary:
loaded_contacts = {}  # create an empty dictionary
with open("addressbook.data", "rb") as f:  # open the file
    while True:  # loop over pickle.load(f)
        try:
            # add each object to the dictionary
            loaded_contacts.update(pickle.load(f))
        except EOFError:
            break  # stop looping at the end of the file
loaded_contacts = print(pickle.load(f))
This is your problem. You're assigning loaded_contacts to the return value of print(), and print() always returns None.
Do this instead:
loaded_contacts = pickle.load(f)
print(loaded_contacts)
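Putting the fix together, the full load-and-search flow can be sketched with an in-memory buffer standing in for addressbook.data (the contact values are taken from the question):

```python
import pickle
import io

# Stand-in for addressbook.data: contacts pickled one at a time,
# exactly as the question's add_contact function would do.
buf = io.BytesIO()
pickle.dump({'Test One': ('test.one@jdsofj.com', '39893849')}, buf)
pickle.dump({'Test Two': ('test.two@clajdf.com', '93294798374')}, buf)
buf.seek(0)

loaded_contacts = {}
while True:
    try:
        # merge each pickled dict into the combined dictionary
        loaded_contacts.update(pickle.load(buf))
    except EOFError:
        break

# now lookups work without a stray None
print(loaded_contacts['Test One'])  # ('test.one@jdsofj.com', '39893849')
```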

TypeError: 'open' object does not support indexing, but can still be iterated over. How to get the first n results of an Open object?

I can iterate over an Open object using this code
with jsonl.open("train.dataset", gzip=True) as train_file:
    for entry in train_file:
        print(entry["summary"], entry["text"])
But say that I only want the first 10 results. This code
with jsonl.open("train.dataset", gzip=True) as train_file:
    for i in range(0, 10):
        print(train_file[i]["summary"], train_file[i]["text"])
results in
TypeError: 'open' object does not support indexing
If an object can be iterated over, why can't it also be indexed to access parts directly? And is there an alternative way to get the data at a particular index, and/or retrieve only the first n results?
If train_file is a list, you can use a slice:
with jsonl.open("train.dataset", gzip=True) as train_file:
    for entry in train_file[0:10]:
        print(entry["summary"], entry["text"])
If train_file is an iterable, you can use itertools.islice:
import itertools
with jsonl.open("train.dataset", gzip=True) as train_file:
    for entry in itertools.islice(train_file, 10):
        print(entry["summary"], entry["text"])
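The islice approach works on any iterable, not just this jsonl reader. A minimal sketch with a plain generator standing in for train_file (the entries here are made-up placeholders):

```python
import itertools

# Hypothetical generator standing in for the jsonl reader:
# it can be iterated but not indexed, just like the Open object.
def entries():
    for i in range(100):
        yield {"summary": f"s{i}", "text": f"t{i}"}

# islice lazily takes the first 10 items without indexing
first_ten = list(itertools.islice(entries(), 10))

print(len(first_ten))            # 10
print(first_ten[0]["summary"])   # s0
```

The key point is that islice consumes the iterator lazily, so the remaining 90 entries are never read at all.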

python csv TypeError: unhashable type: 'list'

Hi, I'm trying to compare two CSV files and get the difference. However, I get the above-mentioned error. Could someone kindly lend a helping hand? Thanks.
import csv

f = open('ted.csv', 'r')
psv_f = csv.reader(f)
attendees1 = []
for row in psv_f:
    attendees1.append(row)
f.close()

f = open('ted2.csv', 'r')
psv_f = csv.reader(f)
attendees2 = []
for row in psv_f:
    attendees2.append(row)
f.close()

attendees11 = set(attendees1)
attendees12 = set(attendees2)
print(attendees12.difference(attendees11))
When you iterate over a csv reader you get lists, so when you do
for row in psv_f:
    attendees2.append(row)
row is actually a list instance, and attendees1 / attendees2 end up as lists of lists.
When you convert one to set(), it needs to ensure that no item appears more than once, and set() relies on the hash of each item. You get the error because the conversion tries to hash a list, and lists are not hashable.
You will get the same exception if you do something like this:
set([1, 2, [1,2] ])
More in sets: https://docs.python.org/2/library/sets.html
Happened on the line
attendees11 = set(attendees1)
didn't it? You are trying to make a set from a list of lists, but that's impossible because a set may only contain hashable types, and list is not one of them. You can convert the lists to tuples instead:
attendees1.append(tuple(row))
The cause is that you created a list of lists:
attendees1.append(row)
Likewise:
attendees2.append(row)
Then, when you do:
attendees11 = set(attendees1)
the error is thrown.
What you should do instead is:
attendees2.append(tuple(row))
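The whole comparison can be sketched end to end with in-memory CSV data (io.StringIO stands in for ted.csv and ted2.csv; the attendee rows are made up):

```python
import csv
import io

# In-memory stand-ins for the two CSV files
ted = io.StringIO("alice,1\nbob,2\n")
ted2 = io.StringIO("alice,1\nbob,2\ncarol,3\n")

# Convert each row (a list) to a tuple so the rows are hashable
attendees1 = {tuple(row) for row in csv.reader(ted)}
attendees2 = {tuple(row) for row in csv.reader(ted2)}

# Rows present in the second file but not the first
print(attendees2.difference(attendees1))  # {('carol', '3')}
```

Tuples are immutable and therefore hashable, which is exactly what set membership requires.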

TypeError: string indices must be integers when parsing JSON

This code throws a TypeError: string indices must be integers when parsing JSON. Why is that?
ids = []
for line in f:
    k = json.loads(line)
    if "person" in k:
        for id in ids:
            if k["iden"] == id[0]:  # Exception raised here
                # Do some processing
            else:
                ids += [(k["iden"], 1)]
json.loads(line) gives you a string (in your case), not a dictionary. If you have a dictionary, you can use dictionary['whatever'], but your_str['other_str'] won't work. Check your JSON file; it may contain some unexpected data.
Here's the documentation on json.loads(s):
Deserialize s (a str or unicode instance containing a JSON document) to a Python object.
In your case, that Python object is a string, not a dictionary.
I'll guess f is your file.readlines(), and you have something like this beforehand:
my_json_file = open('/path/to/file.json')
f = my_json_file.readlines()
my_json_file.close()
Instead of calling readlines(), try passing the file object directly to json.load:
my_json_file = open('/path/to/file.json')
k = json.load(my_json_file)
my_json_file.close()
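A common way to end up with a string from json.loads is double-encoded JSON: the file holds a JSON string that itself contains a JSON document. A minimal sketch of that situation (the "iden" field is borrowed from the question; the value is made up):

```python
import json

# A JSON *string* whose content is itself a JSON object
# (i.e. the document was encoded twice)
line = '"{\\"iden\\": 7}"'

k = json.loads(line)
print(type(k).__name__)  # str  -- not a dict, so k["iden"] would raise
                         # TypeError: string indices must be integers

# Decoding a second time recovers the inner object
k = json.loads(k)
print(k["iden"])  # 7
```

If this is what your data looks like, the real fix is to stop encoding it twice when the file is written, rather than decoding twice when it is read.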

Sum JSON objects and get a JSON file in Python

I have a JSON file with many objects. I want to filter it to discard all the objects that do not have a specific field called 'id'. I developed a piece of code, but it does not work:
import json

b = open("all.json", "r")
sytems_objs = json.loads(b.read())

flag = 0
for i in range(len(sytems_objs)):
    if sytems_objs[i]["id"] <> None:
        if flag == 0:
            total = sytems_objs[i]
            flag = 1
        else:
            total = total + sytems_objs[i]

file1 = open("filtered.json", "w+")
json.dump(total, file1)

c = open("filtered.json", "r")
sytems_objs2 = json.loads(b.read())
I get an error: ValueError: No JSON object could be decoded
What am I doing wrong?
I'm assuming that system_objs is originally an array of objects:
system_objs = json.loads(b.read())

# create a list that only keeps dicts that have the property 'id'
# and where id is not null
system_objs = [o for o in system_objs if 'id' in o and o['id'] is not None]

# how many dicts have 'id'
print len(system_objs)

# write the system objs to a json file
with open('filtered.json', 'w') as f:
    f.write(json.dumps(system_objs))
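The same round trip (filter, write, read back) can be sketched in Python 3 with an in-memory file, so it is easy to verify; the sample objects below are made-up stand-ins for the contents of all.json:

```python
import json
import io

# Stand-in for all.json: some objects lack 'id' or have it set to null
raw = json.dumps([{"id": 1, "x": "a"}, {"x": "b"}, {"id": None}, {"id": 2}])

system_objs = json.loads(raw)
# keep only objects with a non-null 'id' (dict.get avoids KeyError)
filtered = [o for o in system_objs if o.get("id") is not None]

# stand-in for filtered.json
out = io.StringIO()
json.dump(filtered, out)
out.seek(0)  # rewind before reading back -- reading the exhausted
             # original handle is what caused the ValueError

reloaded = json.load(out)
print(reloaded)  # [{'id': 1, 'x': 'a'}, {'id': 2}]
```

Note the seek(0): the question's code reads b.read() a second time on an already-exhausted handle, which returns an empty string and triggers "No JSON object could be decoded".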
