Remove multiple keys from a dictionary in a txt file using Python

So I have a txt file with these reddit comments:
There's a lot of information on each line of the dict, and I only want two elements from it: author and body. I'm trying to iterate over each line of the file to remove the unnecessary information and keep only those two. I searched a lot and didn't find anything that helped me.
The output should be a new filename.txt with only author and body in the dict for each line.
I just realized that it is in JSON format, so I tried this:
The problem is that when I remove the unnecessary elements, it also removes their values.
import json

listcomments = []
for line in open('RC_2009-01.json', 'r'):
    listcomments.append(json.loads(line))
#res = dict([(key, val) for key, val in comments.items() if key not in rem_list])
#print(res)
for line in listcomments:
    rem_list = ['subreddit_id', 'name', 'author_flair_text', 'link_id', 'score_hidden', 'retrieved_on', 'controversiality',
                'parent_id', 'subreddit', 'author_flair_css_class', 'created_utc', 'gilded', 'archived', 'distinguished',
                'id', 'edited', 'score', 'downs', 'ups']
    # iterating over a dict yields only its keys, so list1 holds key names without their values
    list1 = [ele for ele in line if ele not in rem_list]
    # reopening with "w" on every iteration overwrites what the previous iteration wrote
    out_file = open("teste2.json", "w")
    json.dump(list1, out_file, indent=4)

The example in the image is in JSON format. You have to parse the JSON and pick the keys you need from the parsed result.
See the following link:
https://www.w3schools.com/python/python_json.asp
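A minimal sketch of that idea applied to the question's file, assuming it is JSON Lines (one JSON object per line, which is what the question's json.loads(line) loop implies). The helper name keep_fields and the output layout (again one object per line) are illustrative choices, not from the question:

```python
import json

KEEP = ("author", "body")

def keep_fields(in_path, out_path, keep=KEEP):
    """Copy a JSON Lines file, keeping only the keys listed in `keep`."""
    with open(in_path, "r", encoding="utf-8") as src, \
         open(out_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            comment = json.loads(line)
            # a dict comprehension keeps each key together with its value
            slim = {k: comment[k] for k in keep if k in comment}
            dst.write(json.dumps(slim) + "\n")
```

Usage would be keep_fields("RC_2009-01.json", "teste2.json"). Opening the output file once, outside the loop, avoids overwriting it on every line.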

You can do it like this.
Say you have a dictionary like the one below.
a={chr(i):j for i,j in zip(range(65,91),range(1,27))}
'''a={'A': 1, 'B': 2, 'C': 3, 'D': 4, 'E': 5, 'F': 6, 'G': 7, 'H': 8, 'I': 9,
'J': 10, 'K': 11, 'L': 12, 'M': 13, 'N': 14, 'O': 15, 'P': 16, 'Q': 17, 'R': 18,
'S': 19, 'T': 20, 'U': 21, 'V': 22, 'W': 23, 'X': 24, 'Y': 25, 'Z': 26}'''
And you want to extract only 'A' and 'C'.
wanted_key=['A','C']
res={key:a.get(key) for key in wanted_key}
print(res)
Output:
{'A': 1, 'C': 3}
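Two small variations on the same comprehension, as a sketch with hypothetical function names: a.get(key) returns None for a key that is missing, so a membership check skips it instead, and the question's original idea of dropping unwanted keys works as long as the comprehension keeps key/value pairs rather than bare keys:

```python
def keep_keys(d, wanted):
    # membership check: keys missing from d are skipped instead of becoming None
    return {k: d[k] for k in wanted if k in d}

def drop_keys(d, unwanted):
    unwanted = set(unwanted)  # set lookups are O(1) on average
    # keep key/value pairs, not just keys, so the values survive
    return {k: v for k, v in d.items() if k not in unwanted}

a = {chr(i): j for i, j in zip(range(65, 91), range(1, 27))}
print(keep_keys(a, ['A', 'C', 'AA']))  # {'A': 1, 'C': 3}
print(len(drop_keys(a, ['A', 'C'])))   # 24
```

For a single reddit comment dict, drop_keys(comment, rem_list) does what the question's list comprehension was aiming at.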

Related

Is there another way to extract info with complex/unstructured nested dict format in Python?

Suppose I have an unstructured nested dict as follows:
{
'A_brand': {'score1': {'A': 13, 'K': 50}},
'B_brand': {'before_taste': {'score2': {'A': 43, 'D': 23}}, 'after_taste': {'score3': {'H': 36, 'J': 34}}},
'Score4': {'G': 2, 'W': 19}
}
How can I get/show info such as which letter gets the highest score in each score dict?
Like this:
{'key':'value',
'A_brand/score1':'K',
'B_brand/before_taste/score2':'A',
'B_brand/after_taste/score3':'H',
'Score4':'W'}
What I did was the brute-force way: I created a new dict, accessed each path by hand, sorted its items by value, selected the first item, and added it to the new dict.
For example:
new_csv = {'key': 'value'}
first = data['A_brand']['score1']
first_new = sorted(first.items(), key=lambda x: x[1], reverse=True)
new_csv['A_brand/score1'] = first_new[0][0]
second = data['B_brand']['before_taste']['score2']
second_new = sorted(second.items(), key=lambda x: x[1], reverse=True)
new_csv['B_brand/before_taste/score2'] = second_new[0][0]
...
Is there a faster or more automatic way to do this?
You can use a generator with recursion:
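data = {'A_brand': {'score1': {'A': 13, 'K': 50}}, 'B_brand': {'before_taste': {'score2': {'A': 43, 'D': 23}}, 'after_taste': {'score3': {'H': 36, 'J': 34}}}, 'Score4': {'G': 2, 'W': 19}}

def get_max(d, c=[]):
    # c is the list of keys leading to d; it is never mutated (c+[a] builds a new list),
    # so the mutable default argument is safe here
    for a, b in d.items():
        if all(not isinstance(i, dict) for i in b.values()):
            # b is an innermost dict of letter -> score: yield its path and best letter
            yield ('/'.join(c+[a]), max(b, key=lambda x: b[x]))
        else:
            # b still contains nested dicts: recurse with the extended path
            yield from get_max(b, c+[a])

print(dict(get_max(data)))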
Output:
{'A_brand/score1': 'K', 'B_brand/before_taste/score2': 'A', 'B_brand/after_taste/score3': 'H', 'Score4': 'W'}
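If the data may also contain empty score dicts or non-dict values, one option (a sketch with hypothetical helper names) is to separate the walk over the nested dict from the max() step and skip anything that is not a dict:

```python
def leaf_dicts(d, path=()):
    """Yield ('a/b/c', innermost_dict) for each dict that holds no nested dicts."""
    for key, value in d.items():
        if not isinstance(value, dict):
            continue  # ignore stray non-dict values at this level
        new_path = path + (key,)
        if any(isinstance(v, dict) for v in value.values()):
            yield from leaf_dicts(value, new_path)
        else:
            yield "/".join(new_path), value

def best_letters(data):
    # max(scores, key=scores.get) returns the letter with the largest score;
    # empty score dicts are skipped because max() of an empty dict raises ValueError
    return {path: max(scores, key=scores.get)
            for path, scores in leaf_dicts(data) if scores}

data = {'A_brand': {'score1': {'A': 13, 'K': 50}},
        'B_brand': {'before_taste': {'score2': {'A': 43, 'D': 23}},
                    'after_taste': {'score3': {'H': 36, 'J': 34}}},
        'Score4': {'G': 2, 'W': 19}}
print(best_letters(data))
# {'A_brand/score1': 'K', 'B_brand/before_taste/score2': 'A', 'B_brand/after_taste/score3': 'H', 'Score4': 'W'}
```

Keeping the traversal in its own generator also makes it easy to swap max() for another per-leaf summary.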

Replacing each letter with its number

I want to take each letter of a word (a is 1, b is 2, etc.), then add them all together to find the sum of all the numbers. For example, “apple” would be 50. I have this code:
conversions = {
'a': 1,
'b': 2,
'c': 3,
'd': 4,
'e': 5,
'f': 6,
'g': 7,
'h': 8,
'i': 9,
'j': 10,
'k': 11,
'l': 12,
'm': 13,
'n': 14,
'o': 15,
'p': 16,
'q': 17,
'r': 18,
's': 19,
't': 20,
'u': 21,
'v': 22,
'w': 23,
'x': 24,
'y': 25,
'z': 26
}
def conversion(word):
    for letter in word:
        word.replace(letter, str(conversions[letter]))
    word = list(word)
    for number in word:
        number = int(number)
    return sum(word)
However, this returns the following error:
invalid literal for int() with base 10
I’ve probably made some dumb mistake, but I can’t figure out what the problem is. Any help would be much appreciated.
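Strings are immutable. word.replace() returns the modified string; it doesn't update word. Because word is unchanged, list(word) is ['a', 'p', 'p', 'l', 'e'], and int('a') raises the invalid literal error. And even if replace did update word, you couldn't turn the result back into the right numbers: "apple" would become "11616125", and since some letters map to two-digit numbers there is no way to tell where each one starts.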
You don't need to use replace at all, just add up the conversions.
def conversion(word):
    return sum(conversions[letter] for letter in word)
You are definitely making things unnecessarily complicated. The lowercase letters a to z have consecutive character codes, so get each letter's code with ord(), subtract the code of 'a', add 1, and sum the results:
def conversion(word):
    return sum(ord(x) - ord('a') + 1 for x in word)

conversion('apple')
#50
Beware that this code will not handle upper-case letters or punctuation correctly.
Try this (it handles both upper- and lowercase):
def conversion(word):
    return sum(ord(x.lower()) - 96 for x in word)

>>> conversion("AppLe")
50
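A sketch that avoids typing the table by hand and also covers the punctuation caveat: build the letter-to-number map from string.ascii_lowercase and give any other character a value of 0. Treating non-letters as 0 is an assumption of this sketch; raising an error instead would also be reasonable:

```python
import string

# 'a' -> 1, 'b' -> 2, ..., 'z' -> 26, built once instead of typed by hand
LETTER_VALUES = {ch: i for i, ch in enumerate(string.ascii_lowercase, start=1)}

def conversion(word):
    """Sum letter values, ignoring case; characters outside a-z count as 0."""
    return sum(LETTER_VALUES.get(ch, 0) for ch in word.lower())

print(conversion("apple"))   # 50
print(conversion("AppLe!"))  # 50
print(conversion("it's"))    # 48
```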

Queryset filter: retrieve manytomany field as list of each object

I want to retrieve every object of a model with only its id and the list of ids in its many-to-many field.
My models:
from django.db import models

class Word(models.Model):
    word = models.CharField(max_length=256)

class Wordlist(models.Model):
    words = models.ManyToManyField(Word)
I have this code:
from django.db.models import F

list(Wordlist.objects.all().annotate(w=F('words')).values('pk', 'w'))
And it gives me this:
[{'pk': 1, 'w': 7},
{'pk': 1, 'w': 13},
{'pk': 1, 'w': 17},
{'pk': 2, 'w': 29},
{'pk': 1, 'w': 42},
{'pk': 3, 'w': 52},
{'pk': 2, 'w': 65}
...
]
What I want is:
[{'pk': 1, 'w': [7, 13, 17, 42,...]},
{'pk': 2, 'w': [29, 65,...]},
{'pk': 3, 'w': [52,...]},
...
]
A simple solution would be to combine the dicts based on their ids, but I don't think that's good practice or very efficient, since the result could contain tens of thousands of dicts.
I also wonder whether it's possible to do the opposite in a single query: for each Word, retrieve the list of Wordlists it belongs to.
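If you use PostgreSQL, you can use the ArrayAgg aggregate (PostgreSQL's array_agg), which is implemented in the django.contrib.postgres package. Your query will look like this:
from django.contrib.postgres.aggregates import ArrayAgg

Wordlist.objects.annotate(arr=ArrayAgg('words')).values_list('id', 'arr')
The reverse direction works the same way through the default reverse query name: Word.objects.annotate(arr=ArrayAgg('wordlist')).values_list('id', 'arr').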
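On databases other than PostgreSQL, the grouping described in the question can still be done in a single pass in Python instead of repeatedly searching the list. A minimal sketch, assuming rows shaped like {'pk': 1, 'w': 7} as in the question's output; the function name group_ids is hypothetical:

```python
from collections import defaultdict

def group_ids(rows, key="pk", value="w"):
    """Collapse rows like {'pk': 1, 'w': 7} into {'pk': 1, 'w': [7, ...]}."""
    grouped = defaultdict(list)
    for row in rows:
        ids = grouped[row[key]]  # creates an empty list the first time a pk is seen
        if row[value] is not None:  # a wordlist without words may come back as None
            ids.append(row[value])
    # dicts keep insertion order, so pks appear in the order they were first seen
    return [{key: pk, value: ids} for pk, ids in grouped.items()]

rows = [{'pk': 1, 'w': 7}, {'pk': 1, 'w': 13}, {'pk': 1, 'w': 17},
        {'pk': 2, 'w': 29}, {'pk': 1, 'w': 42}, {'pk': 3, 'w': 52},
        {'pk': 2, 'w': 65}]
print(group_ids(rows))
# [{'pk': 1, 'w': [7, 13, 17, 42]}, {'pk': 2, 'w': [29, 65]}, {'pk': 3, 'w': [52]}]
```

This is linear in the number of rows, so tens of thousands of rows are not a problem by themselves; the rows do not need to be sorted by pk first.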

Python: Importing a graph

So I have this code here, which runs Dijkstra's algorithm on a graph in Python and then prints the distance and the path it takes to get from start to end. My problem is that I can't figure out how to import a graph from a .txt file using sys.argv and have the script read it and run the algorithm on it. Here is my code, with a graph and start 'a' and end 'b' already filled in (it should work):
import sys

def shortestpath(graph, start, end, visited=None, distances=None, predecessors=None):
    """Find the shortest path between start and end nodes in a graph"""
    # create fresh containers on the first call; mutable default arguments
    # would otherwise keep their contents between separate calls
    if visited is None:
        visited = []
    if distances is None:
        distances = {}
    if predecessors is None:
        predecessors = {}
    # detect if it's the first time through, set current distance to zero
    if not visited:
        distances[start] = 0
    # we've found our end node, now find the path to it, and return
    if start == end:
        path = []
        while end is not None:
            path.append(end)
            end = predecessors.get(end, None)
        return distances[start], path[::-1]
    # process neighbors as per algorithm, keep track of predecessors
    for neighbor in graph[start]:
        if neighbor not in visited:
            neighbordist = distances.get(neighbor, sys.maxsize)
            tentativedist = distances[start] + graph[start][neighbor]
            if tentativedist < neighbordist:
                distances[neighbor] = tentativedist
                predecessors[neighbor] = start
    # neighbors processed, now mark the current node as visited
    visited.append(start)
    # finds the closest unvisited node to the start
    unvisiteds = dict((k, distances.get(k, sys.maxsize)) for k in graph if k not in visited)
    closestnode = min(unvisiteds, key=unvisiteds.get)
    # now we can take the closest node and recurse, making it current
    return shortestpath(graph, closestnode, end, visited, distances, predecessors)

if __name__ == "__main__":
    graph = {'a': {'w': 14, 'x': 7, 'y': 9},
             'b': {'w': 9, 'z': 6},
             'w': {'a': 14, 'b': 9, 'y': 2},
             'x': {'a': 7, 'y': 10, 'z': 15},
             'y': {'a': 9, 'w': 2, 'x': 10, 'z': 11},
             'z': {'b': 6, 'x': 15, 'y': 11}}
    print(shortestpath(graph, 'a', 'b'))
    # Result:
    # (20, ['a', 'y', 'w', 'b'])
Here is the graph that I am trying to import; it is called sample-map.txt:
{'a': {'b': 5, 'c': 8},
'b': {'a': 5, 'd': 6},
'c': {'a': 8, 'd': 2},
'd': {'b': 6, 'c': 2, 'e': 12, 'f': 2},
'e': {'d': 12, 'g': 3},
'f': {'d': 2, 'g': 7},
'g': {'e': 3, 'f':7}}
I just need to figure out how to load it using sys.argv and have it replace the graph that is hard-coded in the .py file. Being able to pass the start and end points through sys.argv would be nice too, in a format like:
python file.py start end sample-map.txt
where
sys.argv[0] is file.py,
sys.argv[1] is start,
sys.argv[2] is end,
and sys.argv[3] is the graph file I want to import. Thank you!
If sys.argv[3] is the name of the file containing the graph you want to import, you can use ast.literal_eval:
import ast
import sys

with open(sys.argv[3], "r") as f:
    graph = ast.literal_eval(f.read())
# If you don't trust your users, check that graph is indeed a dict
# and that it has the right structure
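Following that comment, a sketch of a loader that reads the file with ast.literal_eval and validates its structure before the search runs. The function name load_graph and the specific checks (every neighbor must itself be a node; weights must be non-negative numbers, since Dijkstra's algorithm assumes non-negative edge weights) are assumptions of this sketch:

```python
import ast

def load_graph(path):
    """Read a graph written as a Python dict literal (like sample-map.txt)
    and check it has the shape {node: {neighbor: weight, ...}, ...}."""
    with open(path, "r") as f:
        graph = ast.literal_eval(f.read())  # parses literals only, never runs code
    if not isinstance(graph, dict):
        raise ValueError("graph file must contain a dict")
    for node, edges in graph.items():
        if not isinstance(edges, dict):
            raise ValueError(f"neighbors of {node!r} must be a dict")
        for neighbor, weight in edges.items():
            if neighbor not in graph:
                raise ValueError(f"{node!r} links to unknown node {neighbor!r}")
            if not isinstance(weight, (int, float)) or weight < 0:
                raise ValueError(f"weight {node!r}->{neighbor!r} must be a non-negative number")
    return graph
```

In the script's main block it could be used as graph = load_graph(sys.argv[3]), followed by print(shortestpath(graph, sys.argv[1], sys.argv[2])).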
You have a few options:
JSON encoder/decoder: Python dictionaries look very similar to JSON, so you can store the graph as JSON and load it into a Python dictionary. That could work in your case, although JSON requires double-quoted keys, so the single quotes in sample-map.txt would have to change. Have a close look at the json.load(file) function: it takes an open file object rather than a file name, so something like json.load(open(sys.argv[3])) could be worth trying. Have a look at http://docs.python.org/2/library/json.html
You can alternatively read the entire file using readlines and manually convert the lines into a dictionary. More cumbersome, but doable.
[EDIT] I just saw a comment mentioning ast.literal_eval. Since sample-map.txt contains a valid Python dict literal, ast.literal_eval(f.read()) returns a dict directly, so no extra conversion code is needed.
I think you want to load the content of the file "sample-map.txt" into a dict. If you can ensure that the file content follows Python syntax, you can use the code below to do the job (eval runs arbitrary code, so only use it on files you trust):
import sys

# sys.argv[3] should be 'sample-map.txt'
with open(sys.argv[3], 'r') as f:
    graph = eval(f.read())
This version will work:
import sys
import json

def shortestpath(graph,start,end,visited=[],distances={},predecessors={}):
    ...
    return shortestpath(graph,closestnode,end,visited,distances,predecessors)

if __name__ == "__main__":
    start_node = sys.argv[1]
    end_node = sys.argv[2]
    filename = sys.argv[3]
    # here we load the file's text content
    with open(filename, "r") as f:
        cont = f.read()
    # here we convert the text content to a dict
    graph = json.loads(cont)
    print(shortestpath(graph, start_node, end_node))
You have to use the " character instead of the ' character to quote the keys in the "sample-map.txt" file (to respect JSON syntax).
Like this:
{ "a": {"b": 5, "c": 8},
"b": {"a": 5, "d": 6},
"c": {"a": 8, "d": 2},
"d": {"b": 6, "c": 2, "e": 12, "f": 2},
"e": {"d": 12, "g": 3},
"f": {"d": 2, "g": 7},
"g": {"e": 3, "f":7}
}
If you respect the JSON syntax in the graph text file, then when you call your program from a terminal:
python shortestpath.py a b sample-map.txt
you will get the right answer:
(5, ['a', 'b'])

Creating instance names from a list (Python)

I've got a function that builds a random number of object instances. For the sake of demonstrating the general idea, we're going to pretend that this is an algorithm to build a series of NetHack-like rooms.
The requirements are such that I won't know how many instances there will be in advance; these are generated randomly and on-the-fly.
As a brief note: I am fully aware that the following code is nonfunctional, but it should (hopefully!) demonstrate my aims.
import random

class Levelbuild(object):
    def __init__(self):
        self.l1 = dict({0:'a',1:'b',2:'c',3:'d',4:'e',5:'f',6:'g',7:'h',8:'i'})
        # Pick a random number between 4 and 9.
        for i in range(random.randint(4,9)):
            self.l1[i] = Roombuilder()
If we assume that the chosen random integer is 5, the ideal result would be 5 Roombuilder() instances; labeled a, b, c, d, and e, respectively.
Is there a simple way of doing this? Is there a way to do this at all?
--Edit--
A giant "thank you" to Nick ODell for his answer. This isn't a straight copy/paste, but it's a variation that works for what I need:
import random

class Room(object):
    def __init__(self):
        self.size = (5,5)

class Level(object):
    def __init__(self):
        roomnames = ['a','b','c','d','e','f','g','h','i']
        self.rooms = {}
        for i in range(random.randint(4, 9)):
            self.rooms[roomnames[i]] = Room()
Rather than build each "room" by hand, I can now...
test = Level()
print(test.rooms['a'].size)
# prints (5, 5)
import string
import random

class Levelbuild(object):
    def __init__(self, min_room_count, max_room_count):
        # build a random number of rooms, then pair them with the letters a, b, c, ...
        rooms_temp = [Roombuilder() for _ in range(random.randint(min_room_count, max_room_count))]
        self.l1 = dict(zip(string.ascii_lowercase, rooms_temp))
Note: zip stops at the shorter of its inputs, so if more than 26 rooms are built, the extra rooms are silently dropped.
You're pretty close, actually. You don't need a dict; just use a list. (And, by the way, {...} is already a dict, so wrapping it in dict() doesn't do anything.)
roomnames = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i']
self.rooms = []
for i in range(random.randint(4, 9)):
    self.rooms.append(Roombuilder(roomnames[i]))
For what it's worth, putting builder in the name of a class is kind of funky. The objects are rooms, right? So the type should just be Room.
As another answer offering a more general solution (mainly as a companion to Nick ODell's answer): if you want to handle any number of names, there's a pretty simple solution with an infinite generator:
import string
import itertools

def generate_names(characters):
    # yield every name of length 1, then every name of length 2, and so on, forever
    for i in itertools.count(1):
        for name in itertools.product(characters, repeat=i):
            yield "".join(name)
You can then use it as you would any other generator:
>>> print(dict(zip(generate_names(string.ascii_lowercase), range(30))))
{'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 4, 'f': 5, 'g': 6, 'h': 7, 'i': 8, 'j': 9, 'k': 10, 'l': 11, 'm': 12, 'n': 13, 'o': 14, 'p': 15, 'q': 16, 'r': 17, 's': 18, 't': 19, 'u': 20, 'v': 21, 'w': 22, 'x': 23, 'y': 24, 'z': 25, 'aa': 26, 'ab': 27, 'ac': 28, 'ad': 29}
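Putting the pieces together, a sketch of a level that can name any number of rooms by taking the first count names from the generator with itertools.islice. The Room class taking a name and the min_rooms/max_rooms parameters are illustrative assumptions, not from the question:

```python
import itertools
import random
import string

def generate_names(characters):
    # a, b, ..., z, aa, ab, ..., zz, aaa, ... (never runs out)
    for length in itertools.count(1):
        for combo in itertools.product(characters, repeat=length):
            yield "".join(combo)

class Room:
    def __init__(self, name):
        self.name = name  # hypothetical: the room remembers its own label
        self.size = (5, 5)

class Level:
    def __init__(self, min_rooms=4, max_rooms=9):
        count = random.randint(min_rooms, max_rooms)
        # islice takes exactly `count` names, so there is no 26-room limit
        names = itertools.islice(generate_names(string.ascii_lowercase), count)
        self.rooms = {name: Room(name) for name in names}
```

With 30 rooms, for example, the keys run from 'a' to 'z' and then 'aa' to 'ad'.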
If you need to generate actual names, then you have a few choices. If you need a real word, pick from a dictionary file (most Linux users have one in /usr/share/dict). If you need word-like things, then I have actually written a Python script to generate such 'words' using Markov Chains, which is available under the GPL. Naturally, you'd have to check these for uniqueness.
