Iteration issue in python

[Code below question]
The idea is to expand on Python's built-in split() function. This function takes two strings: the one to be split, and a string of characters to split at (and omit from the result). The code has worked before, but for some reason, with this input, it does not pick up anything past the last comma. In other words, no matter the input in this format, it won't append anything after the final comma. I can't figure out why. I have gone through this code line by line and I can't find where I am losing it.
Why is my code not iterating through any characters past the last comma?
def split_string(source,splitlist):
    ## Variables ##
    output = []
    start, start_pos , tracker = 0 , 0 , 0
    ## Iterations ##
    for char in source:
        start = source.find(char,start)
        if char in splitlist:
            tracker += 1
            if tracker <= 1:
                end_pos = source.find(char, start)
                output.append(source[start_pos:end_pos])
                start_pos = end_pos + 1
            else:
                start_pos+=1
        else:
            tracker = 0
    return output
out = split_string("First Name,Last Name,Street Address,City,State,Zip Code",",")
print(out)

Because your code has nothing that appends the text between the last comma and the end of the string. The only append happens inside the separator branch, and it always stops at a separator:
end_pos = source.find(char, start)
output.append(source[start_pos:end_pos])
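You need to append one final slice, from just after the last comma to the end of the string.
Add the following after the loop ends, just before return output:
    output.append(source[end_pos+1:])
For this input, source[start_pos:] picks up the same text, and it also works when the string contains no separator at all (end_pos is then never assigned).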
Modified code:
http://ideone.com/9Khu4g
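For comparison, here is a minimal sketch of the same idea that builds up the current piece character by character, flushes it at each separator, and flushes it once more after the loop; that final flush is the step the original code was missing. It assumes, as the tracker logic suggests, that a run of consecutive separators should not produce empty strings (the same way str.split() with no argument treats runs of whitespace).

```python
def split_string(source, splitlist):
    """Split source at any character in splitlist, skipping empty pieces."""
    output = []
    current = []  # characters of the piece currently being built
    for char in source:
        if char in splitlist:
            # A separator ends the current piece, if it has any text.
            if current:
                output.append(''.join(current))
                current = []
        else:
            current.append(char)
    # Flush the text that follows the last separator.
    if current:
        output.append(''.join(current))
    return output


print(split_string("First Name,Last Name,Street Address,City,State,Zip Code", ","))
```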

Related

How do I read a set of characters from a specific position in a string in python

So I am writing my own cypher and decoder for a bit of fun but I've gotten stuck on my decipher program where I need to move up by 1 character in my string in order to decipher the next "block" of characters.
cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"
#989E86
Above is the string I am trying to move through. Essentially I am trying to move on from my first base 6 string to the second, which in this case would be ",#808000", and so on, in increments of 10 characters from each comma.
I've tried a few ideas, such as adding 1 character to account for the space between each base 6 string, but I wasn't able to figure it out. I have also started working on a variable initialized to 0 that would increment by 10 after each string is converted back into hexadecimal.
But that hasn't produced any results yet, so I'm hoping someone here can stop me from going overboard when a much simpler solution exists.
while str(cyinput2) != "":
    S = 0
    S1 = 9
    hexstart = ','
    hexend = ':'
    start_index = cyinput2.find(hexstart) + len(hexstart)
    end_index = cyinput2.find(hexend)
    block = cyinput2[start_index:end_index]
    print(block)
You could simply convert your string to a list using split().
If you want clean blocks, you can use replace() first:
# Removing useless chars in order to have a clean split
cyinput2 = cyinput2.replace(',', '')
cyinput2 = cyinput2.replace(':', '')
# From : ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"
# to : "#96c8a2 #808000 #96c8a2 #e1a95f #808000 #6f00ff"
# Split
cyinput2 = cyinput2.split(' ')
# From : "#96c8a2 #808000 #96c8a2 #e1a95f #808000 #6f00ff"
# to : ["#96c8a2", "#808000", "#96c8a2", "#e1a95f", "#808000", "#6f00ff"]
for block in cyinput2:
    print(block)
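You can also use this while loop. It slices the first 10 characters off on each pass; removing the block with replace() instead would delete every copy of a repeated colour at once, so the duplicates would go missing:
cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff: "
while len(cyinput2) > 9:
    print(cyinput2[1:10])
    cyinput2 = cyinput2[10:]
Output
#96c8a2:
#808000:
#96c8a2:
#e1a95f:
#808000:
#6f00ff: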
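Since every block has the same layout, the position-based idea from the question also works without modifying the string: step through it with range() in strides of 10 and slice each colour out. A regular expression that looks for # followed by six hex digits is an alternative that does not depend on the exact spacing. Both are sketches that assume the ",#rrggbb: " layout shown above; blocks_by_stride and blocks_by_regex are made-up names for illustration.

```python
import re

cyinput2 = ",#96c8a2: ,#808000: ,#96c8a2: ,#e1a95f: ,#808000: ,#6f00ff:"


def blocks_by_stride(s, width=10):
    # Assumes each block is laid out as ",#rrggbb: " (10 characters), so block k
    # starts at index k * width; the last block may lack its trailing space.
    # Within a block, index 1 is '#' and indices 2-7 are the six hex digits.
    return [s[i + 1:i + 8] for i in range(0, len(s), width)]


def blocks_by_regex(s):
    # Picks out every '#' followed by exactly six hex digits, wherever it sits.
    return re.findall(r'#[0-9a-fA-F]{6}', s)


print(blocks_by_stride(cyinput2))
print(blocks_by_regex(cyinput2))
```

The stride version returns garbage if a block ever has different spacing, which is the main reason to prefer the regex when the format is not guaranteed.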

Replace space-char by newline-char within max-line-length in python

I am trying to write a Python script that breaks a continuous string into lines when max_line_length has been exceeded. It must not break words, so it searches for the last occurrence of a whitespace character and replaces it with a newline character.
For some reason it does not break within the specified limit. For example, with max_line_length = 80, the text sometimes breaks at 82 or 83 characters.
I have been trying to fix this for quite some time, but I feel like I have tunnel vision and can't see the problem:
#!/usr/bin/python
import sys
if len(sys.argv) < 3:
    print('usage: $ python3 breaktext.py <max_line_length> <file>')
    print('example: $ python3 breaktext.py 80 infile.txt')
    exit()
filename = str(sys.argv[2])
with open(filename, 'r') as file:
    text_str = file.read().replace('\n', '')
m = int(sys.argv[1]) # max_line_length
text_list = list(text_str) # convert string to list
l = 0; # line_number
i = m+1 # line_character_index
index = m+1 # total_list_index
while index < len(text_list):
    while text_list[l * m + i] != ' ':
        i -= 1
        pass
    text_list[l * m + i] = '\n'
    l += 1
    i = m+1
    index += m+1
    pass
text_str = ''.join(text_list)
print(text_str)
I guess we'll take this from the top.
text_str = file.read().replace('\n', '')
Here's one assumption about the input data that I'm not sure holds. You're replacing all the newline characters with nothing; unless there were spaces next to them, the words on either side of each newline get glued together, so the code below can never break the lines at those places.
text_list = list(text_str) # convert string to list
This splits the input file into single character strings. I guess you might have done so to make it mutable, such that you can replace individual characters, but it's a very expensive operation and loses all the features of a string. Python is a high level language that would allow you to split into e.g. words instead.
index = m+1 # total_list_index
while index < len(text_list):
    #...
    index += m+1
Let's consider what this means. We're not entering into the loop if index exceeds the text_list length. But index is advancing in steps of m+1. So we're splitting math.floor(len(text)/(max_line_length+1)) times. Unless every line is exactly max_line_length characters, not counting its space we replace with a newline, that's too few times. Too few times means too long lines, at least at the end.
l = 0; # line_number
i = m+1 # line_character_index
#loop:
    while text_list[l * m + i] != ' ':
        i -= 1
    text_list[l * m + i] = '\n'
    l += 1
    i = m+1
This is making things difficult with index math. Quite clearly the one index we ever use is l * m + i. This moves in a quite odd way; it searches backwards for a space, then leaps forward as l increments and i resets. Whatever position it had reversed to is lost as all the leaps are in steps of m.
Let's apply m=5 to the string "Fee fie faw fum who did you see now". For the first iteration, 0 * 5 + 5+1 hits the second word, and i seeks back to the first space. The first line then is "Fee", as expected. The second search starts at 1*5 + 5+1, which is a space, and the second line becomes "fie faw", which already exceeds our limit of 5! The reason is that l * m isn't the beginning of the line; it's actually in the middle of "fie", a discrepancy which can only grow as you continue through the file. It grows whenever you split off a line that is shorter than m.
The solution involves remembering where you did your split. That could be as simple as replacing l * m with index, and updating it by index += i instead of m+1.
Another odd effect happens if you ever encounter a word that exceeds the maximum line length. Beyond meaning a line is longer than the limit, i will still search backwards until it finds a space; that space could then be in an earlier line altogether, producing extra short lines as well as too long ones. That's a result of handling the entire text as one array and not limiting which section we're looking at.
Personally I'd much rather use Python's built in methods, such as str.rindex, which can find a particular character in a given region within a string:
s = "Fee fie faw fum who did you see now"
maxlen = 5
start = 8
end = s.rindex(' ', start, start+maxlen)
print(s[start:end])
start = end + 1
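Repeating that step until the rest of the text fits gives a complete line breaker. This is a sketch rather than the answerer's code, with a few assumptions: break_lines is a hypothetical name; the search window ends at start + maxlen + 1, because with start + maxlen as the exclusive end a line could never reach exactly maxlen characters; rfind() is used instead of rindex() so a missing space returns -1 instead of raising ValueError; and a word longer than maxlen is kept whole on its own line, which is only one possible policy.

```python
def break_lines(text, maxlen):
    """Greedily replace spaces with newlines so lines stay within maxlen where possible."""
    lines = []
    start = 0
    while len(text) - start > maxlen:
        # Last space in text[start : start + maxlen + 1]; a space sitting at
        # start + maxlen still allows a line of exactly maxlen characters.
        end = text.rfind(' ', start, start + maxlen + 1)
        if end == -1:
            # Hypothetical policy: a word longer than maxlen stays whole on its own line.
            end = text.find(' ', start + maxlen)
            if end == -1:
                break  # no space left at all; the remainder becomes the last line
        lines.append(text[start:end])
        start = end + 1  # the next line begins right after the replaced space
    lines.append(text[start:])
    return '\n'.join(lines)


print(break_lines("Fee fie faw fum who did you see now", 7))
```

Because start always moves to just past the space that was replaced, each search begins at the real start of the line, which is exactly what l * m in the question's code failed to track.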
We also, as PaulMcG pointed out, can go full "batteries included" and use the standard library textwrap module for the entire task.
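With textwrap the core of the script reduces to a single textwrap.fill() call. In this sketch, wrap_text is a made-up wrapper name; fill() converts every whitespace character, including the file's own newlines, into a space before re-wrapping, so the replace('\n', '') step is no longer needed. break_long_words=False keeps an over-long word whole instead of cutting it, and break_on_hyphens=False makes it break only at whitespace, as the original script intended.

```python
import textwrap


def wrap_text(text, width):
    # Whitespace (including existing newlines) becomes single spaces, then the
    # words are greedily re-wrapped so that lines fit within width where possible.
    return textwrap.fill(text, width=width,
                         break_long_words=False, break_on_hyphens=False)


print(wrap_text("Fee fie faw fum who did you see now", 7))
```

In the question's script, everything from text_list = list(text_str) onward could then become print(wrap_text(file.read(), m)).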

Splice a string, multiple times, using a keyword

I am trying to break a string apart by removing segments that occur between two words.
Example:
AGCGUGUGAGAGCUCCGA
I will remove the parts that occur between: GUGU and AGAG
So, the new string will be:
AGCCUCCGA
I wrote a code that utilises while loop to keep 'splicing' a string over and over till it can't find the GUGU and AGAG in the string. The process works, most of the time.
I encountered one case where the 'input' is extremely long and then my code is stuck in an infinite loop and I don't understand why that is the case.
I was hoping that someone could review it and help me improve on what I am doing.
def splice(strand):
    while True:
        initial = strand.find('GUGU')
        final = strand.find('AGAG')
        if initial == -1:
            break
        if final == -1:
            break
        strand = strand[:initial] + strand[final+4:]
    return strand
if __name__ == "__main__":
    strand = input("Input strand: ")
    print()
    spliced = splice(strand)
    print("Output is {}".format(spliced))
The case where it is failing is:
GUGUAGAGGUCACAGUGUAAAAGCUCUAGAGCAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG
Expected result:
GUCACACAGACAGAUGUAGAGCAAAGGCAACA
I haven't encountered any other cases where the code will not work.
Your code breaks when an AGAG comes before the next GUGU; on this input it sits immediately in front of it. After the first two iterations on that input, the value of strand is
GUCACACAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG
Then initial is 21 and final is 17, so you do:
strand = strand[:21] + strand[21:]
which just sets strand back to the same value, so you get stuck in a loop.
The string.find() method has an optional start argument, so you can tell it to start looking for AGAG after initial:
final = strand.find("AGAG", initial+4)
You can also do the whole thing with a regexp substitution:
import re
strand = re.sub(r'GUGU(.*?)AGAG', '', strand)
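Here is the original loop with that find() fix applied, as a sketch; splice_re simply wraps the re.sub() line above so the two can be compared. Each pass removes at least the 8 characters of a GUGU...AGAG pair, so the loop always terminates. The loop rescans from the start after every cut, so in principle it could match a GUGU formed across a cut that re.sub() would not; on both strings from the question they agree with the expected output.

```python
import re


def splice(strand):
    """Repeatedly cut out everything from a GUGU up to and including the next AGAG."""
    while True:
        initial = strand.find('GUGU')
        if initial == -1:
            break
        # Only look for AGAG after this GUGU, so an earlier AGAG is never used.
        final = strand.find('AGAG', initial + 4)
        if final == -1:
            break
        strand = strand[:initial] + strand[final + 4:]
    return strand


def splice_re(strand):
    # The non-greedy .*? stops at the first AGAG after each GUGU.
    return re.sub(r'GUGU.*?AGAG', '', strand)


for s in ('AGCGUGUGAGAGCUCCGA',
          'GUGUAGAGGUCACAGUGUAAAAGCUCUAGAGCAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG'):
    print(splice(s), splice_re(s))
```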
Alternatively, collect the text in front of each GUGU with findall() and append whatever follows the last AGAG:
import re
pattern = '(.*?)GUGU.*?AGAG'
s1 = 'AGCGUGUGAGAGCUCCGA'
s2 = 'GUGUAGAGGUCACAGUGUAAAAGCUCUAGAGCAGACAGAUGUAGAGGUGUUGUGUAACCCGUAGAGCAAAGGCAACAGUGUGUAAAGAGGUGUAAAGAG'
print(''.join(re.findall(pattern, s1)) + s1[s1.rfind('AGAG')+4:])
print(''.join(re.findall(pattern, s2)) + s2[s2.rfind('AGAG')+4:])
Output:
AGCCUCCGA
GUCACACAGACAGAUGUAGAGCAAAGGCAACA

Returning every instance of whatever's between two strings in a file [Python 3]

What I'm trying to do is open a file, then find every instance of '[\x06I"' and '\x06;', then return whatever is between the two.
Since this is not a standard text file (it's map data from RPG maker) readline() will not work for my purposes, as the file is not at all formatted in such a way that the data I want is always neatly within one line by itself.
What I'm doing right now is loading the file into a list with read(), then simply deleting characters from the very beginning until I hit the string '[\x06I'. Then I scan ahead to find '\x06;', store what's between them as a string, append said string to a list, then resume at the character after the semicolon I found.
It works, and I ended up with pretty much exactly what I wanted, but I feel like that's the worst possible way to go about it. Is there a more efficient way?
My relevant code:
while eofget == 0:
    savor = 0
    while savor == 0 or eofget == 0:
        if line[0:4] == '[\x06I"':
            x = 4
            spork = 0
            while spork == 0:
                x += 1
                if line[x] == '\x06':
                    if line[x+1] == ';':
                        spork = x
                        savor = line[5:spork] + "\n"
                        line = line[x+1:]
                        linefinal[lineinc] = savor
                        lineinc += 1
                elif line[x:x+7] == '#widthi':
                    print("eof reached")
                    spork = 1
                    eofget = 1
                    savor = 0
        elif line[x:x+7] == '#widthi':
            print("finished map " + mapname)
            eofget = 1
            savor = 0
            break
        else:
            line = line[1:]
You can just ignore the variable names. I just name things the first thing that comes to mind when I'm doing one-offs like this. And yes, I am aware a few things in there don't make any sense, but I'm saving cleanup for when I finalize the code.
When eofget gets flipped on this subroutine terminates and the next map is loaded. Then it repeats. The '#widthi' check is basically there to save time, since it's present in every map and indicates the beginning of the map data, AKA data I don't care about.
I feel this is a natural case to use regular expressions. Using the findall method:
>>> s = 'testing[\x06I"text in between 1\x06;filler text[\x06I"text in between 2\x06;more filler[\x06I"text in between \n with some line breaks \n included in the text\x06;ending'
>>> import re
>>> p = re.compile(r'\[\x06I"(.+?)\x06;', re.DOTALL)
>>> print(p.findall(s))
['text in between 1', 'text in between 2', 'text in between \n with some line breaks \n included in the text']
The regex string r'\[\x06I"(.+?)\x06;' can be interpreted as follows (the r prefix stops Python from treating \[ as a string escape; the regex engine still reads \x06 as the control character):
Match as few as possible (denoted by ?) of one or more arbitrary characters (denoted by .+) between '[\x06I"' and '\x06;', and return only the enclosed text (denoted by the parentheses around .+?).
Passing re.DOTALL to compile() makes . match line breaks as well, so multi-line text is captured.
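Applied to the actual map file, the same findall() approach can run on the raw bytes, which avoids decoding a file that is not really text. A sketch under that assumption: extract_between and extract_from_file are made-up names, and re.escape() turns the markers into literal patterns so that [ needs no hand-escaping. Reading the file with one read() call and scanning it once replaces the character-by-character deletion loop from the question.

```python
import re


def extract_between(data, start, end):
    """Return every piece of data between start and the next end (str or bytes)."""
    # The pattern must be the same type (str or bytes) as the data it searches.
    middle = b'(.*?)' if isinstance(data, bytes) else '(.*?)'
    pattern = re.compile(re.escape(start) + middle + re.escape(end), re.DOTALL)
    return pattern.findall(data)


def extract_from_file(path):
    # Assumption: the map file is binary, so read raw bytes rather than text.
    with open(path, 'rb') as f:
        data = f.read()
    # Each result is bytes; decode it afterwards if the encoding is known.
    return extract_between(data, b'[\x06I"', b'\x06;')
```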
I would use split():
fulltext = 'adsfasgaseg[\x06I"thisiswhatyouneed\x06;sdfaesgaegegaadsf[\x06I"this is the second what you need \x06;asdfeagaeef'
parts = fulltext.split('[\x06I"') # split by first label
results = []
for part in parts:
    if '\x06;' in part: # if second label exists in part
        results.append(part.split('\x06;')[0]) # get the part until the second label
print(results)
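The split() idea can also be written as a scan with find() and a start offset. That is close to the question's own approach (find the opening marker, scan ahead to the closing one, resume after it), but find() does the scanning instead of deleting characters one at a time. A minimal sketch; between_all is a hypothetical name, and it works on str and bytes alike:

```python
def between_all(text, start_marker, end_marker):
    """Yield every substring that sits between start_marker and the next end_marker."""
    pos = 0
    while True:
        begin = text.find(start_marker, pos)
        if begin == -1:
            return
        begin += len(start_marker)
        end = text.find(end_marker, begin)
        if end == -1:
            return  # an unterminated final block is ignored
        yield text[begin:end]
        pos = end + len(end_marker)  # resume right after the closing marker


fulltext = 'adsfasgaseg[\x06I"thisiswhatyouneed\x06;sdfaesgaegegaadsf[\x06I"this is the second what you need \x06;asdfeagaeef'
print(list(between_all(fulltext, '[\x06I"', '\x06;')))
```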

Python: split line by comma, then by space

I'm using Python 3 and I need to parse a line like this
-1 0 1 0 , -1 0 0 1
I want to split this into two lists using Fraction so that I can also parse entries like
1/2 17/12 , 1 0 1 1
My program uses a structure like this
from sys import stdin
...
functions'n'stuff
...
for line in stdin:
and I'm trying to do
for line in stdin:
    X = [str(elem) for elem in line.split(" , ")]
    num = [Fraction(elem) for elem in X[0].split()]
    den = [Fraction(elem) for elem in X[1].split()]
but all I get is a "list index out of range" error:
den = [Fraction(elem) for elem in X[1].split()]
IndexError: list index out of range
I don't get it. I get a string from line. I split that string into two strings at " , " and should get one list X containing two strings. These I split at the whitespace into two separate lists while converting each element into Fraction. What am I missing?
I also tried adding X[-1] = X[-1].strip() to get rid of \n that I get from ending the line.
The problem is that your file has a line without a " , " in it, so the split doesn't return 2 elements.
I'd use split(',') instead, and then use strip to remove the leading and trailing blanks. Note that str(...) is redundant, split already returns strings.
X = [elem.strip() for elem in line.split(",")]
You might also have a blank line at the end of the file, which would still only produce one result for split, so you should have a way to handle that case.
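Putting the split(','), strip and blank-line advice together, a per-line parser could look like the following sketch. parse_line is a hypothetical helper; returning None for lines that do not contain exactly one comma is one way to handle them. Fraction() accepts strings such as '-1' and '17/12' directly, and split() with no argument also discards the trailing newline.

```python
from fractions import Fraction


def parse_line(line):
    """Parse 'a b c , d e f' into two lists of Fractions, or None if the line doesn't fit."""
    parts = line.split(',')
    if len(parts) != 2:
        # Blank lines and lines without exactly one comma are rejected
        # instead of raising IndexError later.
        return None
    left, right = parts
    return [Fraction(x) for x in left.split()], [Fraction(x) for x in right.split()]


print(parse_line("-1 0 1 0 , -1 0 0 1\n"))
print(parse_line("1/2 17/12 , 1 0 1 1"))
print(parse_line("\n"))
```

In the stdin loop, call parsed = parse_line(line), continue when it is None, and otherwise unpack num, den = parsed.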
With valid input, your code actually works.
You probably have an invalid line, for example one with different spacing around the comma, or an empty line. So, as the first thing inside the loop, print the line (print(repr(line)) also makes stray whitespace visible). Then you can see, right above the error message, which line caused the problem.
Or maybe you're not using stdin right. Write the input lines in a file, make sure you only have valid lines (especially no empty lines). Then feed it into your script:
python myscript.py < test.txt
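How about this one (each half is split on whitespace before conversion, since Fraction() accepts only one number per string, and lines without exactly one comma are skipped):
from fractions import Fraction
from sys import stdin
pairs = [line.split(",") for line in stdin]
num = [[Fraction(x) for x in pair[0].split()] for pair in pairs if len(pair) == 2]
den = [[Fraction(x) for x in pair[1].split()] for pair in pairs if len(pair) == 2]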
