I've got
desc = ['(4,1);(1,4)', '(2,3);(3,2)', '(4,2);(2,4);(1,3);(3,1)', '(1,2);(2,1);(4,3);(3,4)']
and I want the output to be
[[(4, 1), (1, 4)], [(2, 3), (3, 2)], [(4, 2), (2, 4), (1, 3), (3, 1)], [(1, 2), (2, 1), (4, 3), (3, 4)]]
So far I've tried:
for x in range(len(desc)):
    desc[x] = desc[x].split(';')
        for y in range(len(desc[x])):
            desc[x][y] = eval(desc[x][y])
but I get a syntax error saying 'unexpected EOF while parsing'. How do I fix my code?
In the last two lines of my code I was just trying to extract the tuples from the strings containing them. Is there anything else I could use besides eval()?
Unexpected EOF is caused by the indentation of the second for loop.
for x in range(len(desc)):
    desc[x] = desc[x].split(';')
        for y in range(len(desc[x])): # this has one tab too many
            desc[x][y] = eval(desc[x][y])
This is how it should look:
for x in range(len(desc)):
    desc[x] = desc[x].split(';')
    for y in range(len(desc[x])):
        desc[x][y] = eval(desc[x][y])
You want to split each item of your list on the separator ';'. Iterate over the list with for element in desc and split each element on that separator with temps = element.split(';'). You can then add the list [temps[0], temps[1]] to your output list:
desc = ['(4,1);(1,4)', '(2,3);(3,2)', '(4,2);(2,4);(1,3);(3,1)', '(1,2);(2,1);(4,3);(3,4)']
output = []
for element in desc:
    temps = element.split(";")
    output.append([temps[0], temps[1]])
print(output)
# [['(4,1)', '(1,4)'], ['(2,3)', '(3,2)'], ['(4,2)', '(2,4)'], ['(1,2)', '(2,1)']]
To get rid of the quotes, you have to convert each item into an actual tuple of integers:
desc = ['(4,1);(1,4)', '(2,3);(3,2)', '(4,2);(2,4);(1,3);(3,1)', '(1,2);(2,1);(4,3);(3,4)']
output = []
for element in desc:
    temps = element.split(";")
    tuples_to_add = []
    for i in temps:
        # strip the parentheses, then split on the comma so multi-digit numbers also work
        tuples_to_add.append(tuple(int(n) for n in i.strip('()').split(',')))
    output.append(tuples_to_add)
print(output)
# [[(4, 1), (1, 4)], [(2, 3), (3, 2)], [(4, 2), (2, 4), (1, 3), (3, 1)], [(1, 2), (2, 1), (4, 3), (3, 4)]]
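On the question of an alternative to eval(): one option is ast.literal_eval from the standard library, which only accepts Python literals (numbers, strings, tuples, lists and so on) and raises an error for anything else. A minimal sketch, assuming every item in desc is a well-formed tuple literal:

```python
from ast import literal_eval

desc = ['(4,1);(1,4)', '(2,3);(3,2)', '(4,2);(2,4);(1,3);(3,1)', '(1,2);(2,1);(4,3);(3,4)']

# Split each string on ';' and parse every piece as a tuple literal.
output = [[literal_eval(item) for item in element.split(';')] for element in desc]
print(output)
```

Unlike eval(), this will not execute arbitrary expressions, which matters if the strings come from outside your program.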
I'm trying to make a simple positional index but I'm having some problems getting the correct output.
Given a list of strings (sentences), I want to use each string's position in the list as the document id, then iterate over the words in the sentence and use each word's index in the sentence as its position. Then I want to update a dictionary of words with a tuple of the doc id and the word's position in the doc.
Code:
main func -
def doc_pos_index(alist):
    inv_index= {}
    words = [word for line in alist for word in line.split(" ")]
    for word in words:
        if word not in inv_index:
            inv_index[word]=[]
    for item, index in enumerate(alist): # find item and it's index in list
        for item2, index2 in enumerate(alist[item]): # for words in string find word and it's index
            if item2 in inv_index:
                inv_index[i].append(tuple(index, index2)) # if word in index update it's list with tuple of doc index and position
    return inv_index
example list:
doc_list= [
'hello Delivered dejection necessary objection do mr prevailed',
'hello Delivered dejection necessary objection do mr prevailed',
'hello Delivered dejection necessary objection do mr prevailed',
'hello Delivered dejection necessary objection do mr prevailed',
'hello Delivered dejection necessary objection do mr prevailed'
]
desired output:
{'Delivered': [(0,1),(1,1),(2,1),(3,1),(4,1)],
'necessary': [(0,3),(1,3),(2,3),(3,3),(4,3)],
'dejection': [(0,2),(1,2),(2,2),(3,2),(4,2)],
etc...}
Current output:
{'Delivered': [],
'necessary': [],
'dejection': [],
'do': [],
'objection': [],
'prevailed': [],
'mr': [],
'hello': []}
As an FYI, I do know about the collections library and NLTK, but I'm mainly doing this for learning/practice reasons.
Check this:
>>> result = {}
>>> for doc_id,doc in enumerate(doc_list):
...     for word_pos,word in enumerate(doc.split()):
...         result.setdefault(word,[]).append((doc_id,word_pos))
...
>>> result
{'Delivered': [(0, 1), (1, 1), (2, 1), (3, 1), (4, 1)], 'necessary': [(0, 3), (1, 3), (2, 3), (3, 3), (4, 3)], 'dejection': [(0, 2), (1, 2), (2, 2), (3, 2), (4, 2)], 'do': [(0, 5), (1, 5), (2, 5), (3, 5), (4, 5)], 'objection': [(0, 4), (1, 4), (2, 4), (3, 4), (4, 4)], 'prevailed': [(0, 7), (1, 7), (2, 7), (3, 7), (4, 7)], 'mr': [(0, 6), (1, 6), (2, 6), (3, 6), (4, 6)], 'hello': [(0, 0), (1, 0), (2, 0), (3, 0), (4, 0)]}
>>>
You seem to be confused about what enumerate does. The first item returned by enumerate() is the index, and the second item is the value. You seem to have it reversed.
You are further confused with your second use of enumerate():
for item2, index2 in enumerate(alist[item]): # for words in string find word and it's index
First of all, you don't need to do alist[item]. You already have the value of that line in the index variable (again, you are perhaps confused since you have the variable names backwards). Second, you seem to think that enumerate() will split a line into individual words. It won't. Instead it will just iterate over every character in the string (I'm confused why you thought this, since you demonstrated earlier that you know how to split a string on spaces).
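To make those two points concrete, here is a minimal sketch of the question's function with the enumerate() order corrected and each line split into words before enumerating. It assumes words are separated by whitespace; the variable names doc_id, line and position are my own. Note also that tuple(index, index2) in the original would raise a TypeError, because tuple() accepts a single iterable; a literal (doc_id, position) is used instead.

```python
def doc_pos_index(alist):
    inv_index = {}
    # enumerate() yields (index, value): the index comes first
    for doc_id, line in enumerate(alist):
        # split the line into words first, then enumerate the words
        for position, word in enumerate(line.split()):
            if word not in inv_index:
                inv_index[word] = []
            inv_index[word].append((doc_id, position))
    return inv_index


docs = ['hello Delivered dejection', 'hello Delivered dejection']
index = doc_pos_index(docs)
print(index['Delivered'])
```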
As an additional tip, you don't need to do this:
for word in words:
    if word not in inv_index:
        inv_index[word]=[]
First of all, since you're just initializing a dict you don't need the if statement. Just
for word in words:
    inv_index[word] = []
will do. If the word is already in the dictionary this will make an unnecessary assignment, true, but it's still an O(1) operation so there's no harm. However, you don't even need to do this. Instead you can use collections.defaultdict:
from collections import defaultdict
inv_index = defaultdict(list)
Then you can just do inv_index[word].append(...). If word is not already in inv_index it will add it and initialize its value to an empty list. Otherwise it will just append to the existing list.
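Put together, a defaultdict-based version of the whole function could look like the sketch below. The function name build_index and the sample data are hypothetical, and it again assumes whitespace-separated words.

```python
from collections import defaultdict


def build_index(lines):
    inv_index = defaultdict(list)
    for doc_id, line in enumerate(lines):
        for position, word in enumerate(line.split()):
            # a missing key is created with an empty list automatically
            inv_index[word].append((doc_id, position))
    # convert back to a plain dict so later lookups of unknown words raise KeyError
    return dict(inv_index)


sample = ['hello mr prevailed', 'mr hello']
positions = build_index(sample)
print(positions['mr'])
```

Returning a plain dict is a design choice: a defaultdict would silently insert an empty list whenever you look up a word that was never indexed.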
# And the algorithm for the following: {term: [df, tf, {doc1: [tf, [offsets], doc2...}]]
InvertedIndex = {}
from TextProcessing import *
for i in range(len(listaDocumentos)):
    docTokens = tokenization(listaDocumentos[i], NLTK=True)
    for token in docTokens:
        if token in InvertedIndex:
            # the token was already seen in document i, nothing to update
            if i in InvertedIndex[token][1]:
                pass
            else:
                # first time the token appears in document i: bump DF, record the doc id
                InvertedIndex[token][0] += 1
                InvertedIndex[token][1].append(i)
        else:
            DF = 1
            ListOfDOCIDs = [i]
            InvertedIndex[token] = [DF, ListOfDOCIDs]
Output
I'm trying to dig into some code I found online here to better understand Python.
This is the code fragment I'm trying to get a feel for:
from itertools import chain, product
def generate_groupings(word_length, glyph_sizes=(1,2)):
    cartesian_products = (
        product(glyph_sizes, repeat=r)
        for r in range(1, word_length + 1)
    )
Here, word_length is 3.
I'm trying to evaluate the contents of the cartesian_products generator. From what I can gather after reading the answer at this SO question, generators do not iterate (and thus, do not yield a value) until they are called as part of a collection, so I've placed the generator in a list:
list(cartesian_products)
Out[6]:
[<itertools.product at 0x1025d1dc0>,
<itertools.product at 0x1025d1e10>,
<itertools.product at 0x1025d1f50>]
Obviously, I now see inside the generator, but I was hoping to get more specific information than the raw details of the itertools.product objects. Is there a way to accomplish this?
If you don't care about exhausting the generator, you can use:
list(map(list,cartesian_products))
You will get the following for word_length = 3
Out[1]:
[[(1,), (2,)],
[(1, 1), (1, 2), (2, 1), (2, 2)],
[(1, 1, 1),
(1, 1, 2),
(1, 2, 1),
(1, 2, 2),
(2, 1, 1),
(2, 1, 2),
(2, 2, 1),
(2, 2, 2)]]
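The "exhausting" caveat is worth seeing in action. In the sketch below, make_products is a hypothetical helper that mirrors the generator expression from the question; once its contents have been expanded into lists, iterating it a second time yields nothing.

```python
from itertools import product


def make_products(word_length, glyph_sizes=(1, 2)):
    # same generator expression as in the question's fragment
    return (product(glyph_sizes, repeat=r) for r in range(1, word_length + 1))


gen = make_products(3)
expanded = [list(p) for p in gen]  # consumes the outer generator and every product
leftover = list(gen)               # the generator is now exhausted

print(expanded[1])
print(leftover)
```

If you need the values more than once, keep the expanded list around or call the helper again to build a fresh generator.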
I'm trying to include multiple operations in the lambda function with variables that have different lengths, i.e. something like:
$ serial_result = map(lambda x,y:(x**2,y**3), range(20), range(10))
but this doesn't work. Could someone tell me how to get around this?
I understand that:
$ serial_result = map(lambda x,y:(x**2,y**3), range(0,20,2), range(10))
works because the arrays of "x" and "y" have the same length.
If you want the Cartesian product of the two ranges, you can use itertools.product:
>>> from itertools import product
>>> serial_result = map(lambda x:(x[0]**2,x[1]**3), product(range(20), range(10)))
If you want to pass the pairs to the lambda as in your second case, you can use itertools.zip_longest (izip_longest in Python 2) and pass a fillvalue to fill in the missing items:
>>> from itertools import zip_longest
>>> serial_result = map(lambda x:(x[0]**2,x[1]**3), zip_longest(range(20), range(10),fillvalue=1))
Note that if you are in Python 2, the lambda can unpack a tuple argument directly (this syntax was removed in Python 3):
>>> serial_result = map(lambda (x,y):(x**2,y**3), product(range(20), range(10)))
See the difference between zip_longest and product in the following example:
>>> list(product(range(5),range(3)))
[(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2), (2, 0), (2, 1), (2, 2), (3, 0), (3, 1), (3, 2), (4, 0), (4, 1), (4, 2)]
>>> list(zip_longest(range(5),range(3)))
[(0, 0), (1, 1), (2, 2), (3, None), (4, None)]
>>> list(zip_longest(range(5),range(3),fillvalue=1))
[(0, 0), (1, 1), (2, 2), (3, 1), (4, 1)]
It sounds like you may be confused as to how exactly you want to use all the values in these two variables. There are several ways to combine them:
If you want a result for every combination of an element in a and an element in b: itertools.product(a, b).
If you want to stop once you get to the end of the shorter: zip(a, b).
If you want to continue on until you've used all of the longest: itertools.zip_longest(a, b) (izip_longest in Python 2). Once the shorter one runs out of elements, its slots will be filled in with None, or with a default you provide via fillvalue.
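The three options can be compared side by side. This is a small Python 3 sketch with made-up ranges a and b; the fillvalue of 1 is only an illustrative choice.

```python
from itertools import product, zip_longest

a = range(5)
b = range(3)

every_combination = list(product(a, b))               # len(a) * len(b) pairs
stop_at_shorter = list(zip(a, b))                     # len(b) pairs
pad_to_longer = list(zip_longest(a, b, fillvalue=1))  # len(a) pairs

# Apply the two operations from the question to each padded pair.
serial_result = [(x ** 2, y ** 3) for x, y in pad_to_longer]
print(serial_result)
```

A list comprehension that unpacks x and y is used here instead of map with a lambda, since a Python 3 lambda cannot unpack a tuple parameter.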
I am currently working with a script that has lists that looks like this:
example = [ ((2,1),(0,1)), ((0,1),(2,1)), ((2,1),(0,1)) ]
Now turning this list to a set returns:
set( [ ((2,1),(0,1)), ((0,1),(2,1)) ] )
For my purposes I need these two tuples to be recognized as equal as well. I don't care about retaining the order. All the solutions I can think of are really messy, so if anyone has any ideas I would be grateful.
It sounds like you may be better off using frozensets instead of tuples.
>>> x = [((2, 1), (0, 1)), ((0, 1), (2, 1)), ((2, 1), (0, 1))]
>>> x
[((2, 1), (0, 1)), ((0, 1), (2, 1)), ((2, 1), (0, 1))]
>>> set(frozenset(ts) for ts in x)
set([frozenset([(0, 1), (2, 1)])])
In [10]: set(tuple(sorted(elt)) for elt in example)
Out[10]: set([ ((0, 1), (2, 1)) ])
First transform each element into a frozenset (a plain set is not hashable, so it cannot be a member of another set). Then make a set of the whole list.
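For reference, both suggestions can be written in Python 3 syntax as below. This is a sketch using the example list from the question; the variable names are my own.

```python
example = [((2, 1), (0, 1)), ((0, 1), (2, 1)), ((2, 1), (0, 1))]

# Option 1: order-insensitive containers. frozenset is hashable, set is not.
as_frozensets = {frozenset(pair) for pair in example}

# Option 2: normalize the order inside each pair, keep tuples.
as_sorted_tuples = {tuple(sorted(pair)) for pair in example}

print(as_frozensets)
print(as_sorted_tuples)
```

One difference to keep in mind: a frozenset collapses a pair whose two members are identical, such as ((2, 1), (2, 1)), into a single element, while the sorted-tuple version keeps both.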