Python: list index out of range - while/for loop - python

I have a list
abc = ['date1','sentence1','date2','sentence2'...]
I want to do sentiment analysis on the sentences. After that I want to store the results in a list that looks like:
xyz =[['date1','sentence1','sentiment1'],['date2','sentence2','sentiment2']...]
For this I have tried following code:
def result(doc):
    x = 2
    i = 3
    for lijn in doc:
        sentiment = classifier.classify(word_feats_test(doc[i]))
        xyz.extend([doc[x], doc[i], sentiment])
        x = x + 2
        i = i + 2
The len(abc) is about 7500. I start out with x as 2 and i as 3, as I don't want to use the first two elements of the list.
I keep on getting the error 'list index out of range', no matter what I try (while, for loops...)
Can anybody help me out? Thank you!

As the comments mentioned, we can't pinpoint the error in your code without a stack trace. But your problem is easy to solve like this:
xyz = []
def result(abc):
    # step through the list two items at a time (Python 3; use xrange in Python 2)
    for item in range(0, len(abc) - 1, 2):
        # sentiment = classifier.classify(word_feats_test(abc[item + 1]))
        sentiment = "sentiment" + str(item // 2 + 1)
        xyz.append([abc[item], abc[item + 1], sentiment])
You might want to read up on the built-in functions that make a programmer's life easier. (Why increment a counter by hand when range already does it?)
#output
[['date1', 'sentence1', 'sentiment1'],
['date2', 'sentence2', 'sentiment2'],
['date3', 'sentence3', 'sentiment3'],
['date4', 'sentence4', 'sentiment4'],
['date5', 'sentence5', 'sentiment5']]
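For what it's worth, the original error can be reproduced without the classifier. This is a minimal sketch, assuming the loop body is as posted in the question: `for lijn in doc` runs once per element, while `i` advances by two on every pass, so `i` runs past the end of the list roughly halfway through. The list `doc` below is made-up sample data.

```python
# Hypothetical sample data standing in for the real 7500-element list.
doc = ["skip", "skip", "date1", "sentence1", "date2", "sentence2"]

i = 3
passes = 0
try:
    for _ in doc:      # one pass per element: 6 passes requested
        doc[i]         # same indexing as the posted loop
        i += 2         # but the index moves two steps per pass
        passes += 1
except IndexError as exc:
    error = str(exc)

print(passes, i, error)
```

Only two passes complete before `doc[7]` is requested from a six-element list, which is why stepping the loop itself by 2 avoids the problem.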

Try this:
# step by 2 so each pass handles one date/sentence pair (use xrange in Python 2)
for i in range(0, len(doc) - 1, 2):
    date = doc[i]
    sentence = doc[i + 1]
    sentiment = classifier.classify(word_feats_test(sentence))
    xyz.append([date, sentence, sentiment])
You only need one index. The important thing is knowing when to stop.
Also, check out the difference between extend and append.
Finally, I would suggest storing your data as a list of dictionaries rather than a list of lists. That lets you access the items by field name rather than by index, which makes for cleaner code.
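A short sketch of both points, using made-up sample values (the `"pos"` label is a placeholder, not real classifier output): `append` adds its argument as a single element, `extend` adds each item of its argument, and a dictionary lets each field be read by name.

```python
# Made-up sample data; "pos" is a placeholder sentiment label.
rows = []
rows.append(["date1", "sentence1", "pos"])   # adds ONE element: the inner list

flat = []
flat.extend(["date1", "sentence1", "pos"])   # adds THREE elements: the items

print(rows)
print(flat)

# The same record as a dictionary, so fields are read by name, not position.
records = [{"date": "date1", "sentence": "sentence1", "sentiment": "pos"}]
print(records[0]["sentiment"])
```

With `extend`, the result is one flat list, which is not the nested shape the question asks for.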

If you want two elements from your list at a time, you can use a generator then pass the element/s to your classifier:
abc = ["ignore", "ignore", 'date1', 'sentence1', 'date2', 'sentence2']
from itertools import islice
def iter_doc(doc, skip=0):
    it = iter(doc)
    if skip:  # if skip is set, start from doc[skip:]
        it = islice(it, skip, None)
    # the "" default ends the loop cleanly when the list runs out
    date, sent = next(it, ""), next(it, "")
    while date and sent:
        yield date, sent
        date, sent = next(it, ""), next(it, "")
for d, sen in iter_doc(abc, 2):  # skip set to 2 so we ignore the first two elements
    print(d, sen)
date1 sentence1
date2 sentence2
So to create your list of lists xyz you can use a list comprehension:
xyz = [ [d,sen,classifier.classify(word_feats_test(sen))] for d, sen in iter_doc(abc, 2)]

It's simple. You can try this (// keeps the numbering an integer in Python 3):
>>> abc = ['date1','sentence1','date2','sentence2'...]
>>> xyz = [[ abc[i], abc[i+1], "sentiment"+ str(i//2 + 1)] for i in range(0, len(abc), 2) ]
>>> xyz
output : [['date1', 'sentence1', 'sentiment1'], ['date2', 'sentence2', 'sentiment2'], .....]

Related

split or chunk dynamic string into specific parts and merging in python

Is there a way to split or chunk a dynamic string into fixed-size parts? Let me explain.
Suppose:
name = 'Natalie'
Family = 'David12'
length = len(name) # 7 characters
length = len(Family) # 7 characters
I want to split name and Family into chunks and merge them as:
result = 'NaDatavilid1e2'
and then split the result again to extract the two strings:
x = 'Natalie'
y = 'David12'
another Example:
Name = john
Family= mark
split and merging:
result= jomahnrk
and again split and extract the the 2 string as
x=john
y= mark
Remember that name and Family have a different length every time; it is not fixed. I hope my question is clear. I have seen some related solutions, but none of them do what I'm looking for. Any suggestions? Thanks.
I'm using Spyder with Python 3.6.4.
I have tried this code to split the data into two parts:
def split(data):
    indices = list(int(x) for x in data[-1:])
    data = data[:-1]
    rv = []
    for i in indices[::-1]:
        rv.append(data[-i:])
        data = data[:-i]
    rv.append(data)
    return rv[::-1]
data = 'Natalie'
x, c = split(str(data))
print(x)
print(c)
Given that you have stated the names will always be of equal length, you could use wrap to split them into 2-character pairs and then zip and chain to join them up. In the split part you can again use wrap to split into 2-character pairs, but if the number of pairs is odd you need to split the last pair into 2 single entries. Something like:
from textwrap import wrap
from itertools import chain
def merge_names(name, family):
    name_split = wrap(name, 2)
    family_split = wrap(family, 2)
    return "".join(chain(*zip(name_split, family_split)))
def split_names(merged_name):
    names = ["", ""]
    char_pairs = wrap(merged_name, 2)
    if len(char_pairs) % 2:
        char_pairs.append(char_pairs[-1][1])
        char_pairs[-2] = char_pairs[-2][0]
    for index, chars in enumerate(char_pairs):
        pos = 1 if index % 2 else 0
        names[pos] += chars
    return names
print(merge_names("john", "mark"))
print(split_names("jomahnrk"))
print(merge_names("stephen", "natalie"))
print(split_names("stnaeptaheline"))
print(merge_names("Natalie", "David12"))
print(split_names("NaDatavilid1e2"))
OUTPUT
jomahnrk
['john', 'mark']
stnaeptaheline
['stephen', 'natalie']
NaDatavilid1e2
['Natalie', 'David12']
Something like:
a = "Eleonora"
b = "James"
l = max(len(a), len(b))
a = a.lower() + " " * (l-len(a))
b = b.lower() + " " * (l-len(b))
n = 2
a = [a[i:i+n] for i in range(0, len(a), n)]
b = [b[i:i+n] for i in range(0, len(b), n)]
ans = "".join(map(lambda xy: "".join(xy), zip(a, b))).replace(" ", "")
Giving for this example:
eljaeomenosra

How to get portion of string from 2 different strings and concat?

I have 2 strings, a and b, with - as the delimiter. I want to build a third string by concatenating two pieces: the substring of a up to the last % (one-two-three-%whatever% in the example below), and what is left of b after dropping everything up to the same number of dashes as that first piece contains (4 in the example below, which leaves bar-bazz). This is what I have so far; is there a better way?
>>> a='one-two-three-%whatever%-foo-bar'
>>> b='1one-2two-3three-4four-bar-bazz'
>>> k="%".join(a.split('%')[:-1]) + '%-'
>>> k
'one-two-three-%whatever%-'
>>> k.count('-')
4
>>> y=b.split("-",k.count('-'))[-1]
>>> y
'bar-bazz'
>>> k+y
'one-two-three-%whatever%-bar-bazz'
>>>
An alternative approach using Regex:
import re
a = 'one-two-three-%whatever%-foo-bar'
b = '1one-2two-3three-4four-bar-bazz'
part1 = re.findall(r".*%-",a)[0] # one-two-three-%whatever%-
num = part1.count("-") # 4
part2 = re.findall(r"\w+",b) # ['1one', '2two', '3three', '4four', 'bar', 'bazz']
part2 = '-'.join(part2[num:]) # bar-bazz
print(part1+part2) # one-two-three-%whatever%-bar-bazz
For the first substring obtained from a, you can use rsplit():
k = a.rsplit('%', 1)[0] + '%-'
The rest looks good to me.
Maybe a little shorter?
a = 'one-two-three-%whatever%-foo-bar'
b = '1one-2two-3three-4four-bar-bazz'
def merge(a, b):
    res = a[:a.rfind('%') + 1] + '-'
    return res + "-".join(b.split("-")[res.count('-'):])
print(merge(a, b) == 'one-two-three-%whatever%-bar-bazz')
I personally get nervous when I need to manually increment indexes or concatenate bare strings.
This answer is pretty similar to hingev's, just without the additional concat/addition operators.
t = "-"
ix = list(reversed(a)).index("%")  # distance of the last % from the end of a
s = a[:len(a) - ix]  # a up to and including the last %
t.join([s] + b.split(t)[len(s.split(t)):])
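Yet another possible answer:
import itertools
def custom_merge(a, b):
    result = []
    idx = 0  # 0 = take the piece from a, 1 = take the piece from b
    for x in itertools.zip_longest(a.split('-'), b.split('-')):
        result.append(x[idx])
        # switch to b once the %...% segment of a has been copied
        if x[0] and x[0][0] == '%' == x[0][-1]:
            idx = 1
    return "-".join(result)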
Your question is specific enough that you might be optimizing the wrong thing (a smaller piece of a bigger problem). That being said, one way that feels easier to follow, and avoids some of the repeated linear traversals (splits and joins and counts) would be this:
def funky_percent_join(a, b):
    split_a = a.split('-')
    split_b = b.split('-')
    breakpoint = 0 # len(split_a) if all of a should be used on no %
    for neg, segment in enumerate(reversed(split_a)):
        if '%' in segment:
            breakpoint = len(split_a) - neg
            break
    return '-'.join(split_a[:breakpoint] + split_b[breakpoint:])
and then
>>> funky_percent_join('one-two-three-%whatever%-foo-bar', '1one-2two-3three-4four-bar-bazz')
'one-two-three-%whatever%-bar-bazz'
print(f"one-two-three-%{''.join(a.split('%')[1])}%")
That would work for the first part, and you could do the same for the second. When you're ready to concatenate, you can do:
part1 = f"one-two-three-%{''.join(a.split('%')[1])}%"
part2 = f"-{''.join(b.split('-')[-2])}-{''.join(b.split('-')[-1])}"
result = part1 + part2
This way it will grab whatever you set the a/b variables to, provided they follow the same format.
But then again, why not do something like:
result = a[:-8] + b[22:]

Convert integers inside a list into strings and then a date in python 3.x

I've just started studying Python in college and I have a problem with this exercise:
Basically I have to take a list of integers, for example [10,2,2013,11,2,2014,5,23,2015], turn the elements needed to form each date into a string, like ['1022013','1122014','5232015'], and then put a / between the parts so I end up with ['10/2/2013', '11/2/2014', '05/23/2015']. It needs to be a function, and the length of the list is assumed to be a multiple of 3. How do I go about doing this?
I wrote this code to start:
def convert(lst):
    for element in lst:
        result = str(element)
        return result
but for the list [1,2,3] it only returns '1'.
To split your list into chunks of size 3, use a range with a step of 3:
for i in range(0, len(l), 3):
    print(l[i:i+3])
And joining the pieces with / is as simple as:
'/'.join([str(x) for x in l[i:i+3]])
Throwing it all together into a function:
def make_times(l):
    results = []
    for i in range(0, len(l), 3):
        results.append('/'.join([str(x) for x in l[i:i+3]]))
    return results
testList = [10,2,2013,11,2,2014,5,23,2015]
def convert(inputList):
    tempList = []
    for i in range(0, len(inputList), 3): # Steps through the list 3 elements at a time
        newDate = str(inputList[i])+"/"+str(inputList[i+1])+"/"+str(inputList[i+2]) # Joins everything together
        tempList.append(newDate)
    return tempList
print(convert(testList))
Use datetime to build the date and strftime to format it:
from datetime import datetime
dates = [10,2,2013,11,2,2014,5,23,2015]
for i in range(0, len(dates), 3):
    d = datetime(dates[i+2], dates[i], dates[i+1])
    print(d.strftime("%m/%d/%y"))
OUTPUT
10/02/13
11/02/14
05/23/15
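If the four-digit year from the question is wanted, `%Y` can be used in place of `%y`. The sketch below reuses the same sample list and collects the results in a list rather than printing them. Note that `strftime` zero-pads the month and day, so this yields `10/02/2013` rather than `10/2/2013`.

```python
from datetime import datetime

dates = [10, 2, 2013, 11, 2, 2014, 5, 23, 2015]

# Each group of three is month, day, year; datetime expects year, month, day.
formatted = [
    datetime(dates[i + 2], dates[i], dates[i + 1]).strftime("%m/%d/%Y")
    for i in range(0, len(dates), 3)
]
print(formatted)  # ['10/02/2013', '11/02/2014', '05/23/2015']
```

A side benefit of going through datetime is that an impossible date in the input raises a ValueError instead of being formatted silently.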
Something like this would work:
def convert(lst):
    string = ''
    new_lst = []
    for x in lst:
        if len(str(x)) < 4:
            string += str(x)+'/'
        else:
            string += str(x)
            new_lst.append(string)
            string = ''
    return(new_lst)
lst = [10,2,2013,11,2,2014,5,23,2015]
lst = convert(lst)
print(lst)
#output
['10/2/2013', '11/2/2014', '5/23/2015']
So create a placeholder string and a new list. Then loop through each element in your list. If the element is not a year, then add it to the string with a '/'. If it is a year, add the string to the new list and clear the string.

formatting list to convert into string

Here is my question
count += 1
num = 0
num = num + 1
obs = obs_%d%(count)
mag = mag_%d%(count)
while num < 4:
obsforsim = obs + mag
mylist.append(obsforsim)
for index in mylist:
print index
The above code gives the following results
obs1 = mag1
obs2 = mag2
obs3 = mag3
and so on.
obsforrbd = parentV = {0},format(index)
cmds.dynExpression(nPartilce1,s = obsforrbd,c = 1)
However, when I run the code above it only gives me
parentV = obs3 = mag3
not the whole list. It only gives me the last element of the list. Why is that?
Thanks.
I'm having difficulty interpreting your question, so I'm just going to base this on the question title.
Let's say you have a list of items (they could be anything: numbers, strings, characters, etc.):
myList = [1,2,3,4,"abcd"]
If you do something like:
for i in myList:
    print(i)
you will get:
1
2
3
4
abcd
If you want to convert this to a string:
myString = ' '.join(str(item) for item in myList)
and then
print(myString)
gives:
1 2 3 4 abcd
Now for some explanation:
' ' is a string in Python, and strings have certain methods associated with them (functions that can be applied to strings). In this instance, we're calling the .join() method. This method takes an iterable of strings and joins its elements using ' ' as the separator. It does not convert the elements for you, which is why each item is passed through str() first; joining a list that contains integers directly raises a TypeError. If you wanted a comma-separated representation, just replace ' ' with ','.
I think your indentation is wrong. It should be:
while num < 4:
    obsforsim = obs + mag
    mylist.append(obsforsim)
for index in mylist:
but I'm not sure whether that's your problem or not.
The reason it did not work before is:
while num < 4:
    obsforsim = obs + mag
# all iterations of the loop finish before this line
mylist.append(obsforsim) # appends only the last value
The usual Pythonic way to spit out a list of numbered items would be the range function:
results = []
for item in range(1, 4):
    results.append("obs%i = mag_%i" % (item, item))
> ['obs1 = mag_1', 'obs2 = mag_2', 'obs3 = mag_3']
and so on (note that in this example you have to pass in the item variable twice to get it to register twice).
If that's to be formatted into something like an expression you could use
'\n'.join(results)
as in the other example to create a single string with the obs = mag pairs on their own lines.
Finally, you can do all that in one line with a list comprehension.
'\n'.join([ "obs%i = mag_%i" % (item, item) for item in range (1, 4)])
As other people have pointed out, while loops are easy to get wrong; it's easier to use range.

Python: How to replace N random string occurrences in text?

Say that I have 10 different tokens, "(TOKEN)" in a string. How do I replace 2 of those tokens, chosen at random, with some other string, leaving the other tokens intact?
>>> import random
>>> text = '(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)'
>>> token = '(TOKEN)'
>>> replace = 'foo'
>>> num_replacements = 2
>>> num_tokens = text.count(token) #10 in this case
>>> points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
>>> replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
'(TOKEN)__(TOKEN)__(TOKEN)__(TOKEN)__foo__(TOKEN)__foo__(TOKEN)__(TOKEN)__(TOKEN)'
In function form:
>>> def random_replace(text, token, replace, num_replacements):
...     num_tokens = text.count(token)
...     points = [0] + sorted(random.sample(range(1,num_tokens+1),num_replacements)) + [num_tokens+1]
...     return replace.join(token.join(text.split(token)[i:j]) for i,j in zip(points,points[1:]))
>>> random_replace('....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....','(TOKEN)','FOO',2)
'....FOO....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....(TOKEN)....FOO....'
Test:
>>> for i in range(0,9):
...     print(random_replace('....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....','(0)','(%d)'%i,i))
....(0)....(0)....(0)....(0)....(0)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(1)....(0)....(0)....(0)....
....(0)....(0)....(0)....(0)....(0)....(2)....(2)....(0)....
....(3)....(0)....(0)....(3)....(0)....(3)....(0)....(0)....
....(4)....(4)....(0)....(0)....(4)....(4)....(0)....(0)....
....(0)....(5)....(5)....(5)....(5)....(0)....(0)....(5)....
....(6)....(6)....(6)....(0)....(6)....(0)....(6)....(6)....
....(7)....(7)....(7)....(7)....(7)....(7)....(0)....(7)....
....(8)....(8)....(8)....(8)....(8)....(8)....(8)....(8)....
If you need exactly two, then:
Detect the tokens (keep some links to them, like index into the string)
Choose two at random (random.sample, so the same one is not picked twice)
Replace them
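The three steps above can be sketched as follows. This is only one possible reading of them: the function name `replace_n_random` and the use of `re.finditer` to locate the tokens are assumptions made for illustration, not something the answer specifies.

```python
import random
import re

def replace_n_random(text, token, replacement, n):
    # 1. Detect the tokens: record where each occurrence starts and ends.
    spans = [m.span() for m in re.finditer(re.escape(token), text)]
    # 2. Choose n distinct occurrences at random.
    chosen = random.sample(spans, n)
    # 3. Replace from right to left so the earlier offsets stay valid.
    for start, end in sorted(chosen, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text

text = "__".join(["(TOKEN)"] * 10)
result = replace_n_random(text, "(TOKEN)", "foo", 2)
print(result)
```

`random.sample` raises a ValueError if `n` is larger than the number of tokens found, which is probably the behaviour you want when exactly two replacements are required.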
What are you trying to do, exactly? A good answer will depend on that...
That said, a brute-force solution that comes to mind is to:
Store the 10 tokens in an array, such that tokens[0] is the first token, tokens[1] is the second, ... and so on
Create a dictionary to associate each unique "(TOKEN)" with two numbers: start_idx, end_idx
Write a little parser that walks through your string and looks for each of the 10 tokens. Whenever one is found, record the start/end indexes (as start_idx, end_idx) in the string where that token occurs.
Once done parsing, generate a random number in the range [0,9]. Lets call this R
Now, your random "(TOKEN)" is tokens[R];
Use the dictionary in step (3) to find the start_idx, end_idx values in the string; replace the text there with "some other string"
My solution in code:
import random
s = "(TOKEN)test(TOKEN)fgsfds(TOKEN)qwerty(TOKEN)42(TOKEN)(TOKEN)ttt"
replace_from = "(TOKEN)"
replace_to = "[REPLACED]"
amount_to_replace = 2
def random_replace(s, replace_from, replace_to, amount_to_replace):
    parts = s.split(replace_from)
    # pick which token positions to replace (Python 3; use xrange in Python 2)
    indices = random.sample(range(len(parts) - 1), amount_to_replace)
    replaced_s_parts = list()
    for i in range(len(parts)):
        replaced_s_parts.append(parts[i])
        if i < len(parts) - 1:
            if i in indices:
                replaced_s_parts.append(replace_to)
            else:
                replaced_s_parts.append(replace_from)
    return "".join(replaced_s_parts)
#TEST
for i in range(5):
    print(random_replace(s, replace_from, replace_to, 2))
Explanation:
Splits string into several parts using replace_from
Chooses indexes of tokens to replace using random.sample. This returned list contains unique numbers
Build a list for string reconstruction, replacing tokens with generated index by replace_to.
Concatenate all list elements into single string
Try this solution:
import random
def replace_random(tokens, eqv, n):
    # list() is needed in Python 3, where keys() is a view that cannot be shuffled
    random_tokens = list(eqv.keys())
    random.shuffle(random_tokens)
    for i in range(n):
        t = random_tokens[i]
        tokens = tokens.replace(t, eqv[t])
    return tokens
Assuming that a string with tokens exists, and a suitable equivalence table can be constructed with a replacement for each token:
tokens = '(TOKEN1) (TOKEN2) (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) (TOKEN9) (TOKEN10)'
equivalences = {
'(TOKEN1)' : 'REPLACEMENT1',
'(TOKEN2)' : 'REPLACEMENT2',
'(TOKEN3)' : 'REPLACEMENT3',
'(TOKEN4)' : 'REPLACEMENT4',
'(TOKEN5)' : 'REPLACEMENT5',
'(TOKEN6)' : 'REPLACEMENT6',
'(TOKEN7)' : 'REPLACEMENT7',
'(TOKEN8)' : 'REPLACEMENT8',
'(TOKEN9)' : 'REPLACEMENT9',
'(TOKEN10)' : 'REPLACEMENT10'
}
You can call it like this:
replace_random(tokens, equivalences, 2)
> '(TOKEN1) REPLACEMENT2 (TOKEN3) (TOKEN4) (TOKEN5) (TOKEN6) (TOKEN7) (TOKEN8) REPLACEMENT9 (TOKEN10)'
There are lots of ways to do this. My approach would be to write a function that takes the original string, the token string, and a function that returns the replacement text for an occurrence of the token in the original:
def strByReplacingTokensUsingFunction(original, token, function):
    outputComponents = []
    matchNumber = 0
    unexaminedOffset = 0
    while True:
        matchOffset = original.find(token, unexaminedOffset)
        if matchOffset < 0:
            matchOffset = len(original)
        outputComponents.append(original[unexaminedOffset:matchOffset])
        if matchOffset == len(original):
            break
        unexaminedOffset = matchOffset + len(token)
        replacement = function(original=original, offset=matchOffset, matchNumber=matchNumber, token=token)
        outputComponents.append(replacement)
        matchNumber += 1
    return ''.join(outputComponents)
(You could certainly change this to use shorter identifiers. My style is somewhat more verbose than typical Python style.)
Given that function, it's easy to replace two random occurrences out of ten. Here's some sample input:
sampleInput = 'a(TOKEN)b(TOKEN)c(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)g(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k'
The random module has a handy method for picking random items from a population (not picking the same item twice):
import random
replacementIndexes = random.sample(range(10), 2)
Then we can use the function above to replace the randomly-chosen occurrences:
sampleOutput = strByReplacingTokensUsingFunction(sampleInput, '(TOKEN)',
    (lambda matchNumber, token, **keywords:
        'REPLACEMENT' if (matchNumber in replacementIndexes) else token))
print(sampleOutput)
And here's some test output:
a(TOKEN)b(TOKEN)cREPLACEMENTd(TOKEN)e(TOKEN)fREPLACEMENTg(TOKEN)h(TOKEN)i(TOKEN)j(TOKEN)k
Here's another run:
a(TOKEN)bREPLACEMENTc(TOKEN)d(TOKEN)e(TOKEN)f(TOKEN)gREPLACEMENTh(TOKEN)i(TOKEN)j(TOKEN)k
from random import sample
mystr = 'adad(TOKEN)hgfh(TOKEN)hjgjh(TOKEN)kjhk(TOKEN)jkhjk(TOKEN)utuy(TOKEN)tyuu(TOKEN)tyuy(TOKEN)tyuy(TOKEN)tyuy(TOKEN)'
def replace(mystr, substr, n_repl, replacement='XXXXXXX', tokens=10, index=0):
    # positions (0-based) of the occurrences to replace
    choices = sorted(sample(range(tokens), n_repl))
    for i in range(choices[-1] + 1):
        # index ends up one past the start of the i-th occurrence
        index = mystr.index(substr, index) + 1
        if i in choices:
            mystr = mystr[:index-1] + mystr[index-1:].replace(substr, replacement, 1)
    return mystr
print(replace(mystr, '(TOKEN)', 2))
