I have a list of about 50 strings, each paired with an integer representing how often it occurs in a text document. I have already formatted the list as shown below, and I am trying to create a dictionary from it, with the first word being the value and the number beside it being the key.
string = [('limited', 1), ('all', 16), ('concept', 1), ('secondly', 1)]
The code I have so far:
my_dict = {}
for pairs in string:
    for int in pairs:
        my_dict[pairs] = int
Like this, Python's dict() function is perfectly designed for converting a list of tuples, which is what you have:
>>> string = [('limited', 1), ('all', 16), ('concept', 1), ('secondly', 1)]
>>> my_dict = dict(string)
>>> my_dict
{'all': 16, 'secondly': 1, 'concept': 1, 'limited': 1}
Just call dict():
>>> string = [('limited', 1), ('all', 16), ('concept', 1), ('secondly', 1)]
>>> dict(string)
{'limited': 1, 'all': 16, 'concept': 1, 'secondly': 1}
The string variable is a list of pairs, which means you can do something similar to this:
string = [...]
my_dict = {}
for k, v in string:
    my_dict[k] = v
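If the counts really should be the keys, as the question's wording suggests, keep in mind that several words share the same count, so a plain dict would keep only one word per count. A minimal sketch, assuming you want every word kept (the names `word_to_count` and `by_count` are made up for illustration):

```python
pairs = [('limited', 1), ('all', 16), ('concept', 1), ('secondly', 1)]

# Word -> count: each word is unique, so dict() loses nothing.
word_to_count = dict(pairs)

# Count -> words: counts repeat, so collect the words in a list per count.
by_count = {}
for word, count in pairs:
    by_count.setdefault(count, []).append(word)

print(word_to_count)
print(by_count)
```

Which direction is right depends on how you plan to look things up later; looking up a word's frequency is the more common case.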
Make a pair of 2 lists and convert them to dict()
list_1 = [1,2,3,4,5]
list_2 = [6,7,8,9,10]
your_dict = dict(zip(list_1, list_2))
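For reference, a short sketch of what the zip() approach produces. zip() pairs items by position and stops at the shorter input, so extra items in a longer list are silently dropped (the uneven `short` list is an illustrative assumption, not part of the question):

```python
list_1 = [1, 2, 3, 4, 5]
list_2 = [6, 7, 8, 9, 10]

# Items are paired by position: list_1 supplies keys, list_2 supplies values.
your_dict = dict(zip(list_1, list_2))
print(your_dict)  # {1: 6, 2: 7, 3: 8, 4: 9, 5: 10}

# With inputs of different lengths, zip() stops at the shorter one.
short = [6, 7, 8]
truncated = dict(zip(list_1, short))
print(truncated)  # {1: 6, 2: 7, 3: 8}
```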
I currently have the dictionary:
matrix = {(0, 0): 1, (1, 1): 1, (2, 2): 1}
And I need to create a single string that displays:
(0,0,1),(1,1,1),(2,2,1)
How can I join the dictionary keys (tuples) and value together into one string?
I was thinking of putting the keys into a list and then adding the value to each key, but I am not entirely sure how to add it in the right order given that tuples are immutable.
result = []
for i, j in matrix.items():
    result.append(i)
for i in result:
The basic operation you're looking for is:
[k + (v,) for k, v in matrix.items()]
To print that in your specific way, you probably want:
print(', '.join(str(k + (v,)) for k, v in matrix.items()))
Note that dictionaries were unordered before Python 3.7, so on older versions the order of the result is undefined. From 3.7 on, items come back in insertion order.
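If the output order matters, one option is to sort the items before joining. This is only a sketch: it assumes that sorting by key is the order you want, and it formats each triple by hand so the result has no spaces, matching the string in the question.

```python
matrix = {(0, 0): 1, (1, 1): 1, (2, 2): 1}

# Sort the (key, value) pairs so the output order does not depend on
# how the dictionary happens to be ordered.
triples = [k + (v,) for k, v in sorted(matrix.items())]

# Format without spaces to match the requested "(0,0,1),(1,1,1),(2,2,1)".
result = ','.join('({},{},{})'.format(*t) for t in triples)
print(result)  # (0,0,1),(1,1,1),(2,2,1)
```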
Your approach will keep them in the right order. The only thing it looks like you need to change in your code is splitting the tuple into its sub-parts:
>>> result = []
>>> for (x, y), z in matrix.items():
...     result.append((x, y, z))
...
>>> print(result)
[(0, 0, 1), (1, 1, 1), (2, 2, 1)]
I don't know whether this is the most pythonic solution but try this:
str(tuple(_ + (matrix[_],) for _ in matrix))
Or use %s:
print(','.join(["(%s,%s,%s)"%(*k,v) for k,v in matrix.items()]))
Or:
print(','.join([str((*k,v)) for k,v in matrix.items()]))
Or, on Python versions before 3.5 (note the parentheses around k+(v,), since % binds more tightly than +):
print(','.join(["(%s,%s,%s)"%(k+(v,)) for k,v in matrix.items()]))
Or:
print(','.join([str(k+(v,)) for k,v in matrix.items()]))
Building off your work so far:
lst = []
for i, j in matrix.items():
    lst.append((i[0], i[1], j))
result = ",".join(repr(e) for e in lst)
Please note that your keys are tuples, which are immutable and cannot be modified in place, so you need to unpack them first and then combine them with the dictionary value.
You can use,
matrix = {(0, 0): 1, (1, 1): 1, (2, 2): 1}
# *k unpack keys
print(','.join("({},{},{})".format(*k,v) for k,v in matrix.items()))
# output,
# (0,0,1),(1,1,1),(2,2,1)
I would like to reformat the following list containing tuples with integers (shared between some tuples) and strings (idiosyncratic to each tuple)
mylist = [(8, 'dddd'), (8, '33333'), (8, 'fdsss'), (9, 'fsfjs'),(10, 'dddd'), (10, '33333'), (12, 'fdsss'), (12, 'fsfjs')]
so that each tuple contains an integer and a concatenated string of all strings belonging to it, like so:
mynewlist = [(8, 'dddd, 33333, fdsss'), (9, 'fsfjs'), (10, 'dddd, 33333'), (12, 'fdsss, fsfjs')]
After some deliberation, the most parsimonious solution I've come up with is to simply loop across all tuples and concatenate strings until the integer doesn't match the next one:
mynewlist = []
label = ''
for i in range(len(mylist)-1):
    if mylist[i][0] != mylist[i+1][0]:
        mynewlist.append(tuple([mylist[i][0], label + mylist[i][1]]))
        label = ''
    else:
        label = label + mylist[i][1] + ','
This works fine. However, I'd like to know if there's a more efficient/Pythonic way of producing the list. I considered using a list comprehension, but this wouldn't allow me to select the strings without going through the whole list many times over; the list comprehension would need to be run for each unique integer, which seems wasteful. I also considered pre-selecting the strings associated with a unique integer through indexing, but this appears quite un-Pythonic to me.
Advice is very appreciated. Thanks!
You could use itertools.groupby() to do the grouping here:
from itertools import groupby
from operator import itemgetter
mynewlist = [
    (key, ', '.join([s for num, s in group]))
    for key, group in groupby(mylist, itemgetter(0))]
This uses list comprehensions to process each group and extract the strings from the grouped tuples for concatenation. The operator.itemgetter() object tells groupby() to group the input on the first element:
>>> from itertools import groupby
>>> from operator import itemgetter
>>> mylist = [(8, 'dddd'), (8, '33333'), (8, 'fdsss'), (9, 'fsfjs'),(10, 'dddd'), (10, '33333'), (12, 'fdsss'), (12, 'fsfjs')]
>>> [(key, ', '.join([s for num, s in group])) for key, group in groupby(mylist, itemgetter(0))]
[(8, 'dddd, 33333, fdsss'), (9, 'fsfjs'), (10, 'dddd, 33333'), (12, 'fdsss, fsfjs')]
Note that the groupby() iterator groups only consecutive matching elements. That means if your input is not sorted, then tuples with the same initial element are not necessarily going to always be put together either. If your input is not sorted and you need all tuples with the same starting element to be grouped regardless of where they are in the input sequence, use a dictionary to group the elements first:
grouped = {}
for key, string in mylist:
    grouped.setdefault(key, []).append(string)
# Each group is already a list of strings, so it can be joined directly.
mynewlist = [(key, ', '.join(group)) for key, group in grouped.items()]
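To make the point about consecutive elements concrete, here is a small sketch with a deliberately unsorted input (the `unsorted` list is made up for illustration). It also shows a third option, sorting on the key before calling groupby():

```python
from itertools import groupby
from operator import itemgetter

unsorted = [(8, 'dddd'), (9, 'fsfjs'), (8, '33333')]

# Without sorting, the two 8-keyed tuples are not adjacent, so they
# end up in separate groups.
split = [(k, ', '.join(s for _, s in g))
         for k, g in groupby(unsorted, itemgetter(0))]

# Sorting on the key first makes equal keys adjacent; sorted() is stable,
# so strings keep their original relative order within each group.
merged = [(k, ', '.join(s for _, s in g))
          for k, g in groupby(sorted(unsorted, key=itemgetter(0)), itemgetter(0))]

print(split)   # [(8, 'dddd'), (9, 'fsfjs'), (8, '33333')]
print(merged)  # [(8, 'dddd, 33333'), (9, 'fsfjs')]
```

Sorting costs O(n log n), while the dictionary approach is a single pass, so the dictionary is probably the better fit for large unsorted inputs.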
A defaultdict would do the trick:
from collections import defaultdict
dct = defaultdict(list)
for k, v in mylist:
    dct[k].append(v)
# items() works on both Python 2 and 3; iteritems() exists only on Python 2.
mynewlist = [(k, ','.join(v)) for k, v in dct.items()]
You can do it with a custom dict subclass:
class mydict(dict):
    def __setitem__(self, key, val):
        self.setdefault(key, []).append(val)
>>> mylist = [(8, 'dddd'), (8, '33333'), (8, 'fdsss'),
... (9, 'fsfjs'),(10, 'dddd'), (10, '33333'),
... (12, 'fdsss'), (12, 'fsfjs')]
>>> d = mydict()
>>> for key, val in mylist:
...     d[key] = val
...
Now d contains something like
{8: ['dddd', '33333', 'fdsss'], 9: ['fsfjs'], 10: ['dddd', '33333'], 12: ['fdsss', 'fsfjs']}
(to within order of items), and you can easily massage this into the form you want:
result = [(key, ', '.join(d[key])) for key in d]
a = 1
b = 2
I want to insert a:b into an empty Python list
list = []
as
a:b
what is the proper syntax for this, to result in
[(a:b), (c:d)]
?
This is just so I can sort the list by value, from least to greatest, later.
How does one insert a key value pair into a python list?
You can't. What you can do is "imitate" this by appending tuples of 2 elements to the list:
a = 1
b = 2
some_list = []
some_list.append((a, b))
some_list.append((3, 4))
print(some_list)
# Output: [(1, 2), (3, 4)]
But the correct/best way would be using a dictionary:
some_dict = {}
some_dict[a] = b
some_dict[3] = 4
print(some_dict)
# Output: {1: 2, 3: 4}
Note:
Before using a dictionary you should read the Python documentation, some tutorial or some book, so you get the full concept.
Don't name your list list, because that shadows the built-in list type. Name it something else, like some_list, L, ...
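A quick sketch of what goes wrong when the name list is reused. The function `shadowing_demo` is hypothetical and exists only to keep the shadowing local, so the rest of the program is unaffected:

```python
def shadowing_demo():
    list = []  # shadows the built-in list type inside this function
    try:
        list('abc')  # now calls the empty list object, not the type
    except TypeError as exc:
        return str(exc)

message = shadowing_demo()
print(message)  # 'list' object is not callable
```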
Let's assume your data looks like this:
a: 15
c: 10
b: 2
There are several ways to have your data sorted. This key/value data is best stored as a dictionary, like so:
data = {
    'a': 15,
    'c': 10,
    'b': 2,
}
# Values, sorted by key:
print([v for (k, v) in sorted(data.items())])
# Output: [15, 2, 10]
# Keys, sorted by value:
from operator import itemgetter
print([k for (k, v) in sorted(data.items(), key=itemgetter(1))])
# Output: ['b', 'c', 'a']
If you store the data as a list of tuples:
data = [
    ('a', 15),
    ('c', 10),
    ('b', 2),
]
data.sort()  # Sorts the list in-place
print(data)
# Output: [('a', 15), ('b', 2), ('c', 10)]
print([x[1] for x in data])
# Output: [15, 2, 10]
# Sort by value:
from operator import itemgetter
data = sorted(data, key=itemgetter(1))
print(data)
# Output: [('b', 2), ('c', 10), ('a', 15)]
print([x[1] for x in data])
# Output: [2, 10, 15]
Overview: In this code sample I demonstrate how to tokenize a list of sentences and then build a list of dictionaries, one per sentence, each holding the sentence's key and a mapping from each token to its occurrence count in that sentence.
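import numpy as np
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize

# `sentences` and `keys` are assumed to be equal-length lists defined elsewhere.
index = np.arange(0, len(sentences))
bagList = []
for i in index:
    sentence = sentences[i]
    keyId = keys[i]
    if len(sentence) > 0:
        tokens = word_tokenize(sentence)
        words_frequency = FreqDist(tokens)  # NLTK's ready-made count (unused below)
        # Count the tokens by hand for this sentence.
        wordfreq = {}
        for token in tokens:
            if token not in wordfreq:
                wordfreq[token] = 1
            else:
                wordfreq[token] += 1
        bagList.append({'keyId': keyId, 'words': wordfreq})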
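To iterate over the list of dictionaries, use:
for dictionaryValues in bagList:
    print(dictionaryValues['keyId'])
    print(dictionaryValues['words'])
To get the (word, count) pairs back, use items(). The loop below counts how many sentences each word appears in and keeps the words found in at most three: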
uncommon = dict()
for dictionaryItem in bagList:
    words = dictionaryItem['words']
    for word, count in words.items():
        if word in uncommon:
            uncommon[word] += 1
        else:
            uncommon[word] = 1
uncommon = {key: value for key, value in uncommon.items() if value <= 3}
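For comparison, here is a rough standard-library sketch of the same idea using collections.Counter. It is not the code above: str.split() stands in for NLTK's word_tokenize(), the sample `sentences` and `keys` are invented, and the names `bag_list` and `doc_freq` are my own.

```python
from collections import Counter

# Made-up sample data standing in for the undefined `sentences` and `keys`.
sentences = ["the cat sat", "the dog sat down", ""]
keys = ["s1", "s2", "s3"]

bag_list = []
for key_id, sentence in zip(keys, sentences):
    if sentence:
        # str.split() is a crude stand-in for nltk's word_tokenize().
        bag_list.append({'keyId': key_id, 'words': Counter(sentence.split())})

# Number of sentences each word appears in (document frequency).
doc_freq = Counter()
for bag in bag_list:
    doc_freq.update(bag['words'].keys())

# Keep only words that appear in at most 3 sentences, as in the filter above.
uncommon = {word: n for word, n in doc_freq.items() if n <= 3}
print(uncommon)
```

Counter removes the manual "if key in dict" bookkeeping, and zip() avoids indexing both lists by position.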
In a script I have an OrderedDict groups that gets fed key/value pairs alphabetically.
In another part of the script, I'm checking against files that have the same name as the key, like so:
for (key, value) in groups.items():
    file = open(key, 'r')
    # do stuff
Stuff happens just fine, part of which is printing a status line for each file, but how can I get Python to iterate through groups alphabetically, or at least numerically as they are ordered (since they are being entered in alphabetical order anyways)?
The whole point of an OrderedDict is that you can iterate through it normally in the order that keys were entered:
>>> from collections import OrderedDict
>>> d = OrderedDict()
>>> d[1] = 2
>>> d[0] = 3
>>> d[9] = 2
>>> for k, v in d.items():
...     print(k, v)
...
1 2
0 3
9 2
Just make sure you don't initialize OrderedDict(...) from a plain dictionary on Python versions before 3.7, where dicts do not preserve insertion order; otherwise it starts off in arbitrary order.
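One way to sidestep the initialization issue on any Python version is to pass a list of (key, value) pairs, which always has a definite order. A minimal sketch (the `pairs` data is invented for illustration):

```python
from collections import OrderedDict

# A list of (key, value) pairs has a definite order, so OrderedDict
# preserves exactly this sequence.
pairs = [('b', 2), ('a', 1), ('c', 3)]
d = OrderedDict(pairs)

order = list(d)
print(order)  # ['b', 'a', 'c']
```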
If all you want to do is iterate through a dictionary in order of the keys, you can use a regular dictionary and sorted():
>>> d = dict(s=5,g=4,a=6,j=10)
>>> d
{'g': 4, 's': 5, 'j': 10, 'a': 6}
>>> for k in sorted(d):
...     print(k, ':', d[k])
...
a : 6
g : 4
j : 10
s : 5
>>>
(pardon the python3 print())
If you really want to stick with the ordered dict, then read the documentation which shows an example of reordering an OrderedDict:
>>> # regular unsorted dictionary
>>> d = {'banana': 3, 'apple':4, 'pear': 1, 'orange': 2}
>>> # dictionary sorted by key
>>> OrderedDict(sorted(d.items(), key=lambda t: t[0]))
OrderedDict([('apple', 4), ('banana', 3), ('orange', 2), ('pear', 1)])
If you really entered them into an OrderedDict alphabetically in the first place, then I'm not sure why you're having trouble.
I am wondering how I would take a string, such as "hello there hi there", turn it into a dictionary that counts how many times each word occurs, and return the counts in alphabetical order. So in this case it would return:
[('hello', 1), ('hi', 1), ('there', 2)]
Any help would be appreciated.
>>> from collections import Counter
>>> text = "hello there hi there"
>>> sorted(Counter(text.split()).items())
[('hello', 1), ('hi', 1), ('there', 2)]
class collections.Counter([iterable-or-mapping])
A Counter is a dict subclass for counting hashable objects. It is an unordered collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The Counter class is similar to bags or multisets in other languages.
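A few Counter behaviours that follow from that description, shown on the sentence from the question. This is just an illustrative sketch; the variable names are my own.

```python
from collections import Counter

counts = Counter("hello there hi there".split())

there = counts['there']      # count of a word that is present
missing = counts['missing']  # absent keys count as zero, no KeyError
top = counts.most_common(1)  # the single most frequent (word, count) pair

print(there)    # 2
print(missing)  # 0
print(top)      # [('there', 2)]
```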
jamylak did fine with Counter. This is a solution without importing Counter:
text = "hello there hi there"
dic = dict()
for w in text.split():
    if w in dic.keys():
        dic[w] = dic[w]+1
    else:
        dic[w] = 1
gives
>>> dic
{'hi': 1, 'there': 2, 'hello': 1}
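To get from that dictionary to the alphabetical list the question asks for, sort the items. A slightly shorter sketch of the same no-import approach, using dict.get() with a default of 0 in place of the if/else:

```python
text = "hello there hi there"

dic = {}
for w in text.split():
    # get() returns 0 for a word that has not been seen yet.
    dic[w] = dic.get(w, 0) + 1

# Tuples sort by their first element, so this orders the pairs by word.
result = sorted(dic.items())
print(result)  # [('hello', 1), ('hi', 1), ('there', 2)]
```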