how to delete a certain line in a string - python

I'm having real trouble finding the relevant information for this problem. So far I have this code:
test = ("")
file_content = ("")
reset = 0
import os
file = open("repeated names.txt",'r', encoding="utf-8-sig")
from collections import Counter
wordcount = Counter(file.read().split())
for item in wordcount.items():
test = ("{}, = {}".format(*item))
short_string = test
file_content+=short_string
file_content+=("\n")
try:
f = open ("storage.xls", 'a') # storing data befor change
f.write (file_content)
f.close()
except PermissionError:
print ("NOTICE:")
print ("""this file cannot be opened/save, due to it being open at the moment, please close the file before running the programe again, thank-you.""")
time.sleep (1)
find = (" = 4")
test3 = ("")
test5 = ("")
try:
from StringIO import StringIO
except ImportError:
from io import StringIO
for line in StringIO(file_content):
test2 = (line.strip())
if find in test2:
test3 +=test2
test3 +=("\n")
print ("these are the names that have appered more than four times")
print (test3)
file_content_V3 = ("")
file_content_V4 = ("")
with open ("repeated results.txt", 'r') as r: #the results include a name with a ',' after it like this: 'name,'
for line in sorted(r):
if "..." in line:
pass
else :
short_string = (line)
file_content_V3+=short_string
file_content_V3+=("...")
file_content_V3+=("\n")
test4 = test3.replace(' = 4','')
print (test4)
# i would only like to have the first 3 results, so i have kept it to 4, so it will delete the last one.
one = 0
two = 1
three = 0
for line in StringIO(file_content_V3):
one = one+two
for line in StringIO(test4):
three = three+two
one = (one - two)
print (one)
print (three)
num = 0
numV2 = 0
yes = 0
delete = ("")
while True:
manager = (file_content_V3.splitlines()[num])
print (manager)
while True:
if reset == 0:
findV2 = (test4.splitlines()[numV2])
print (findV2)
if findV2 in manager:
print ("yes")
yes = (yes+1)
# code that is suppose to go here:
# code that deletes the line 'numV2' from string 'test4'.
# code that deletes the line 'num' from string 'file_content_V3'.
delete += str(num)
delete+= ("\n")
reset = 1
num = (num+1)
else:
print (numV2)
numV2 = (numV2 + 1)
print ("reset intiates")
reset = 1
if numV2 == three:
reset = 1
numV2 = 0
num = (num + 1)
print ("hard reset initiated")
if reset == 1:
print ("working")
reset = 0
if num == one:
reset = 2
break
if reset == 2:
break
break
if reset == 2:
break
print ("i am out of all loops")
print (num, "", one)
print (yes)
print (delete)
print (test4)
# it may be hard to understand at this point, as I have a lot of testing to do to make sure it works stably (had some flaws here and there).
The code sorts results alphabetically and keeps only the three latest results. So far I have got it to say 'yes' whenever it finds a repeat, but I would then like code that deletes that line (or, more generally, code that deletes given lines from a string).
I am trying to figure out how to delete a line from a string by its line number (from a string, not a file, as the data has been passed through many strings since the original file). After a dozen searches, I am nowhere near an answer.
It would be really helpful if someone could show me code that deletes a line from a string, given the number of that line. Note: I cannot install third-party modules, so I am limited to the modules that ship with Python.
Can anyone help me out? Thank you. Links to other web pages would also be helpful.

I'll call the string content; generalize as needed. Since strings are immutable, you'll have to rebuild the string without the chosen line in the middle. Let n be the 1-based number of the line you want to delete (this assumes 1 <= n <= the number of lines).
front = content.split('\n', n-1) # Split just before the doomed line; front[-1] starts with it.
rear = front[-1].split('\n', 1)[1:] # Drop the doomed line and keep whatever follows it.
content = '\n'.join(front[:-1] + rear) # Rejoin without the deleted line.

I have found another way to do this: convert the string into a list of lines, delete the unwanted line from the list, and then join the list back into a string.
test5 = list(StringIO(test4)) # one list item per line, newlines kept
del test5[1] # 1 is the zero-based index of the line to delete; any integer variable works here.
test4 = ''.join(test5) # rebuild the string without the deleted line
The only problem is that when I use this, my strings somehow get rearranged, which means it may need more work.
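Since the question collects several line numbers to delete, here is a minimal sketch of removing them all in one pass, which avoids the index shifting you get when deleting one line at a time. The function name delete_lines and the sample data are made up for illustration; only the standard library is used.

```python
def delete_lines(text, line_numbers):
    """Return text without the lines whose zero-based indices are in line_numbers."""
    doomed = set(line_numbers)
    # keepends=True keeps each line's newline, so the remaining lines join back unchanged.
    kept = [line for index, line in enumerate(text.splitlines(keepends=True))
            if index not in doomed]
    return "".join(kept)

sample = "alice,\nbob,\ncarol,\ndave,\n"   # made-up data in the 'name,' style
print(delete_lines(sample, [1, 3]))        # removes the 'bob,' and 'dave,' lines
```

Because the indices are checked against the original line positions, the order of the surviving lines is never changed.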

Related

Have user input within for loop print something, given a certain input, but not pass to the next file

So I'm still new to coding and I've got a mess of for loops, conditional if statements, and a couple of while loops. My code loops over files and depending on my input, it moves the files to a location matching my input. However, I would like to be able to prompt the code to simply print a list but not move onto the next file. I've tried placing it into a while loop but whenever the while loop is satisfied, it passes onto the next file.
while True:
try:
if "print df" in answer:
subset_folders_list = list()
for folder in all_folders_list:
if folder.startswith('A'):
subset_folders_list.append(folder)
df = pd.DataFrame(subset_folders_list, columns=['Folders'])
print(df)
But upon an input of "print-folders", it will print my dataframe and move onto the next file because the condition of the while loop is met. How can I get this code to print this dataframe without moving onto the next file. Note that this while loop is nested inside of another while loop inside of a function that is called inside of another loop that is inside of a function. But I think this is the only chunk of code I need to fix in order to implement this feature.
EDIT: other relevant code:
for filename in files_to_move:
counter += 1
matching_folders = list()
iterating = True
item_words = set(re.split('[. ,_-]', filename.lower()))
source_file_path = os.path.join(paths[0], filename)
all_folders_list = [g for g in os.listdir(paths[12]) if not g.startswith('.')]
#Matching the filename with a folder
for folder in all_folders_list:
count = 0
folder_words = folder.lower().split(' ')
for word in item_words:
if word in folder_words:
count += 1
if count >= 2:
matching_folders.append(folder)
#Multiple matching folders
if len(matching_folders) >= 2:
print("\n" + f"There is MORE than one folder for {filename}")
if not filename in files_to_move:
continue
while iterating:
try:
pass
answer_2 = input("\n" + f"MOVE IT TO ONE OR ELSEWHERE (type name of folder or print-subset for subset list)?: ")
item_words = answer_2.lower().split(' ')
if len(item_words) >= 2:
#Moving file to a matching or input folder
for folder in all_folders_list:
count = 0
folder_words = folder.lower().split(' ')
for word in item_words:
if word in folder_words:
count += 1
if count == 2:
folder_path = os.path.join(paths[12], folder)
destination_file_path = os.path.join(folder_path, filename)
shutil.move(source_file_path, destination_file_path)
print("\n" + f"File moved to {folder}")
print(FTG)
iterating = False
break
if len(item_words) < 2:
file_mover(source_file_path, paths, filename, answer_2, iterating, all_folders_list, matching_folders, x, counter)
iterating = False
break
So my function file_mover() has many other conditional if statements in it, I will skip them because they work, but the part I want to add is to be able to print the subset without moving onto the next file. Here is file_mover:
def file_mover(source_file_path, paths, filename, answer_2, iterating, all_folders_list, matching_folders, x, counter):
FTG = str("\n" + str(x - counter) + " files to go")
while iterating:
if "exit" in answer_2:
print()
iterating = False
sys.exit()
if "pass" in answer_2:
print("Moving on to NEXT file" + "\n")
print(FTG)
iterating = False
pass
if "del" in answer_2:
shutil.move(source_file_path, os.path.join(paths[15], filename))
print(f"File moved to DELETE folder" + "\n")
print(FTG)
iterating = False
#This is where I want the ability to just print something given the input "print-subset", or something like that, and have it re-prompt me for an input.
else:
break
The reason I want to add this is because there is a subset of non-matching folders that sometimes I want to move the files to, and I don't always remember what they are or I type them in wrong. Now I've got an error catcher in my main while loop, so that could easily function as a workaround for typing in the folder name wrong, but I'm new to coding and this would be good practice. I've tried a lot of things and I can get it to print over and over again endlessly or I can get it to print one time but it moves onto the next file. I don't want either of those to happen. I'd like the subset to print once and have it prompt me for an input again for the same file. I can also try to implement this in the len(item_words) >= 2 section and do the input as "print subset", but I might also run into the same problems.
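One possible shape for the "print once, then ask again about the same file" behaviour is a small prompt loop that uses continue for the print command and only returns for real answers. This is a sketch under assumptions: choose_action, its ask parameter, and the folder names are hypothetical and not part of the code above.

```python
def choose_action(filename, subset_folders, ask=input):
    """Keep asking about one file until the answer is something other than 'print-subset'."""
    while True:
        answer = ask(f"Where should {filename} go? ").strip().lower()
        if answer == "print-subset":
            # Show the list, then loop again so the same file is asked about once more.
            for folder in subset_folders:
                print(folder)
            continue
        return answer

# Scripted answers stand in for keyboard input so the sketch runs unattended.
scripted = iter(["print-subset", "pass"])
print(choose_action("report.txt", ["Archive", "Admin"], ask=lambda prompt: next(scripted)))
```

The caller only moves on to the next file after choose_action returns, so printing the subset never advances the outer for loop.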

Python start from beginning of file if first file is not last line

with open(args.identfile) as identifierfile, \
        open(args.elementtxt) as elementfile:
    for identifier_line, element_line in izip(identifierfile, elementfile):
        ident_split = identifier_line.split(".")
        el_split = element_line.split(".")
        print ident_split[0]
        print ident_split[1]
        print el_split[0] # print for debug, bad programming practice apparently. I know.
        print el_split[1]
        if el_split is None: # tried to use this to start from the beginning of the file and continue the loop? I don't know if it's valid.
            el_split.seek(0)
So I tried to read and process these two files. Where the print statements are, I was going to put some code that combines the contents of the two files and writes the result to an output file. The element file has fewer lines than the identifier file, so I would like to start again from the beginning of the element file every time it reaches the end. I'm not sure how to go about this; I tried .seek, but that's not working.
How do I keep looping over the identifier file while restarting from the beginning of the element file?
You can try using itertools.cycle:
from itertools import izip, cycle
with open(args.identfile) as identifierfile, \
        open(args.elementtxt) as elementfile:
    # cycle() replays the element lines from the start once they run out.
    for identifier_line, element_line in izip(identifierfile, cycle(elementfile)):
        ident_split = identifier_line.split(".")
        el_split = element_line.split(".")
        print ident_split[0]
        print ident_split[1]
        print el_split[0]
        print el_split[1]
I think the code below will do what you want. Read both files into lists of lines and get the length of the element list. Advance a counter on every pass of the for loop. Once the counter reaches the last index of the element list (its length minus 1, because list indices start at 0), reset it to 0 so that reading restarts from the beginning of the element lines while the loop keeps moving through the identifier lines.
identifier_lines = identifierfile.readlines() # file objects have no len(), so read the lines into lists first
element_lines = elementfile.readlines()
count = 0
elementLength = len(element_lines)
for i in range(len(identifier_lines)):
    ident_split = identifier_lines[i].split(".")
    el_split = element_lines[count].split(".")
    print ident_split[0]
    print ident_split[1]
    print el_split[0]
    print el_split[1]
    if count < (elementLength-1):
        count = count + 1
    else:
        count = 0
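For anyone on Python 3, where izip is gone and the built-in zip is already lazy, the itertools.cycle idea might look roughly like this. It is only a sketch: pair_with_cycling is a made-up name and the io.StringIO objects are in-memory stand-ins for the two real files.

```python
import io
from itertools import cycle

def pair_with_cycling(identifier_lines, element_lines):
    """Pair every identifier line with an element line, restarting the elements when they run out."""
    pairs = []
    # zip stops when the identifier lines end; cycle keeps replaying the element lines.
    for identifier_line, element_line in zip(identifier_lines, cycle(element_lines)):
        pairs.append((identifier_line.strip().split("."), element_line.strip().split(".")))
    return pairs

# Made-up contents: three identifier lines, but only two element lines.
identifier_file = io.StringIO("a.1\nb.2\nc.3\n")
element_file = io.StringIO("x.9\ny.8\n")
for ident_split, el_split in pair_with_cycling(identifier_file, element_file):
    print(ident_split, el_split)
```

Note that cycle keeps a copy of every element line in memory, which is fine for a small element file but worth knowing for a very large one.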

Python Iterator strangely skipping items

As part of a program that decodes a communication protocol (EDIFACT MSCONS) I have a class that gives me the next 'segment' of the message. The segments are delimited by an apostrophe "'". There may be newlines after the "'" or not.
Here's the code for that class:
import sys
class SegmentGenerator:
    def __init__(self, filename):
        try:
            fh = open(filename)
        except IOError:
            print ("Error: file " + filename + " not found!")
            sys.exit(2)
        lines=[]
        for line in fh:
            line = line.rstrip()
            lines.append(line)
        if len(lines) == 1:
            msg = lines[0]
        else:
            msg = ''
            for line in lines:
                msg = msg + line.rstrip()
        self.segments=msg.split("'")
        self.iterator=iter(self.segments)
    def next(self):
        try:
            return next(self.iterator)
        except StopIteration:
            return None
if __name__ == '__main__': #testing only
    sg = SegmentGenerator('MSCONS_21X000000001333E_20X-SUD-STROUM-M_20180807_000026404801.txt')
    for i in range(210436):
        if i > 8940:
            break
        print(sg.next())
To give an idea what the file looks like here's an excerpt of it:
UNB+UNOC:3+21X000000001333E:020+20X-SUD-STROUM-M:020+180807:1400+000026404801++TL'UNH+000026404802+MSCONS:D:04B:UN:1.0'BGM+7+000026404802+9'DTM+137:201808071400:203'RFF+AGI:6HYR67925RZUD_000000257860_00_E27'NAD+MS+21X000000001333E::020'NAD+MR+20X-SUD-STROUM-M::020'UNS+D'NAD+DP'LOC+172+LU0000010496200000000000050287886::89'DTM+163:201701010000?+01:303'DTM+164:201702010000?+01:303'LIN+1'PIA+5+1-1?:1.29.0:SRW'QTY+220:9.600'DTM+163:201701010000?+01:303'DTM+164:201701010015?+01:303'QTY+220:10.400'DTM+163:201701010015?+01:303'DTM+164:201701010030?+01:303'QTY+220:10.400'DTM+163:201701010030?+01:303'DTM+164:201701010045?+01:303'QTY+220:10.400'DTM+163:201701010045?+01:303'DTM+164:201701010100?+01:303'QTY+220:10.400'DTM+163:201701010100?+01:303'DTM+164:201701010115?+01:303'QTY+220:10.400'DTM+163:201701010115?+01:303'DTM+164:201701010130?+01:303'QTY+220:10.400'DTM+163:201701010130?+01:303'DTM+164:201701010145?+01:303'QTY+220:10.400'DTM+163:201701010145?+01:303'DTM+164:201701010200?+01:303'QTY+220:11.200'DTM+163:201701010200?+01:303' ...
The file I have a problem with has 210000 of those segments. I tested the code and everything works fine. The list of segments is complete and I get one segment after the other correctly until the end of the list.
I use the segments as input to a statemachine that gets new segments from an instance of SegmentGenerator.
Here's an excerpt:
def DTMstarttransition(self,segment):
match=re.search('DTM\+(.*?):(.*?):(.*?)($|\+.*|:.*)',segment)
if match:
if match.group(1) == '164':
self.currentendtime=self.dateConvert(match.group(2),match.group(3))
return('DTMend',self.sg.next())
return('Error',segment + "\nExpected DTM segment didn't match")
The method returns the name of the next state and the next segment sg.next(), sg being an instance of SegmentGenerator.
However, at the 8942nd segment the call to sg.next() doesn't give me the next segment but the second-to-last segment of the list!
I traced the function calls (with the autologging module):
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'DTM+164:201702010000?+01:303'
TRACE:__main__.MSCONSparser:QTYtransition:RETURN ('DTMstart', 'DTM+164:201702010000?+01:303')
TRACE:__main__.MSCONSparser:DTMstarttransition:CALL *('DTM+164:201702010000?+01:303',) **{}
TRACE:__main__.MSCONSparser:dateConvert:CALL *('201702010000?+01', '303') **{}
TRACE:__main__.MSCONSparser:dateConvert:RETURN datetime.datetime(2017, 2, 1, 0, 0)
TRACE:segmentgenerator.SegmentGenerator:next:CALL *() **{}
TRACE:segmentgenerator.SegmentGenerator:next:RETURN 'UNT+17872+000026404802'
TRACE:__main__.MSCONSparser:DTMstarttransition:RETURN ('DTMend', 'UNT+17872+000026404802')
TRACE:__main__.MSCONSparser:DTMendtransition:CALL *('UNT+17872+000026404802',) **{}
UNT+... isn't the next segment; it should be a LIN segment.
But how is this possible? Why does SegmentGenerator work when I test it with the main function in its module and doesn't work correctly after thousands of calls from the other module?
All the segments are there from beginning to end. I can verify this from the interpreter, since the list sg.segments stays available after program stop. len(sg.segments) is 210435 but my program stops after 8942. So it is clearly a problem with the iterator.
The files (3 python files and data example) can be found on Github in branch 'next' if you like to test the whole thing.
I think it's possible there is a double apostrophe '' in your data file, near the 8942nd apostrophe.
In that case your SegmentGenerator would still read the whole file and split it into all 210435 segments, but one of those segments would be an empty string.
If you have a condition that tests the result of sg.next(), it would be falsy on the 8942nd iteration, and I'm guessing this is causing your program to abort.
eg:
while sg.next():
# some processing here
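To illustrate the guess above: splitting on "'" turns a doubled apostrophe into an empty string, and an empty string is falsy, so a truthiness test stops early while an explicit "is None" test does not. The message below is made-up data and both helper names are hypothetical.

```python
message = "UNH+1'BGM+7''DTM+137'"  # made-up message with a doubled apostrophe
segments = message.split("'")

def read_while_truthy(items):
    """Mimic `while sg.next():` by stopping at the first falsy value, including ''."""
    iterator = iter(items)
    seen = []
    while True:
        value = next(iterator, None)
        if not value:
            break
        seen.append(value)
    return seen

def read_until_none(items):
    """Stop only when the items are really exhausted (value is None)."""
    iterator = iter(items)
    seen = []
    while True:
        value = next(iterator, None)
        if value is None:
            break
        seen.append(value)
    return seen

print(segments)
print(len(read_while_truthy(segments)), len(read_until_none(segments)))
```

If the two counts differ on the real file, an empty segment is cutting the loop short.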
If I'm completely wrong then I'd be interested in seeing the behaviour of this: - where len and iterations should equal.
if __name__ == '__main__':
    fn = sys.argv[1]
    sg = SegmentGenerator(fn)
    print("Num segments:", len(sg.segments))
    i = 0
    value = 'x'
    while value:
        value = sg.next()
        i += 1
        print(i, value)
    print("Num iterations:", i)
It turned out that a segment 'DTM+164:201702010000?+01:303' existed a second time further down in the file, and that one is indeed followed by a UNT segment. So the problem is with the protocol states themselves, and the iterator was working correctly.
Sorry that I bothered you with my wrong assumption. Thanks for wanting to help!

Python: How to compare string from two text files and retrieve an additional line of one in case of match

I have found so much information from previous search on this website but I seem to be stuck on the following issue.
I have two text files that looks like this
Inter.txt ( n-lines but only showed 4 lines,you get the idea)
7275
30000
6693
855
....
rules.txt (2n-lines)
7275
8500
6693
7555
....
3
1000
8
5
....
I want to compare each line of Inter.txt with the lines of rules.txt and, in case of a match, jump n lines ahead in rules.txt to get the score of that line. (E.g. 7275 matches, so I jump n lines to get the score 3.)
I produced the following code, but for some reason I only get the output for the first line, when I should get one for each match from my first file. With the previous example, I should also get 8 as an output for 6693.
import linecache
inter = open("Inter.txt", "r")
rules = open("rules.txt", "r")
iScore = 0
jump = 266
i=0
for lineInt in inter:
#i = i+1
#print(i)
for lineRul in rules:
i = i+1
#print(i)
if lineInt == lineRul:
print("Match")
inc = linecache.getline("rules.txt", i + jump)
#print(inc)
iScore = iScore + int(inc)
print(iScore)
#break
else:
continue
All the print(i) are there because I checked that all the lines were read. I am a novice in Python.
To sum up, I don't understand why I only have one output. Thanks in advance !
OK, I think the main thing blocking you is that a for loop over a file moves the file pointer to the end of the file, and the pointer is not reset when you start the loop again.
So when you open rules.txt only once and use that instance in the inner loop, it goes through all the lines only during the first iteration of the outer loop. The second time, it tries to go over the remaining lines, of which there are none.
The solution is to reopen the file before each run of the inner loop and close it afterwards.
This code worked for me.
import linecache
inter = open("Inter.txt", "r")
iScore = 0
jump = 4
for lineInt in inter:
    i=0
    #i = i+1
    #print(i)
    rules = open("rules.txt", "r")
    for lineRul in rules:
        i = i+1
        #print(i)
        if lineInt == lineRul:
            print("Match")
            inc = linecache.getline("rules.txt", i + jump)
            #print(inc)
            iScore = iScore + int(inc)
            print(iScore)
            #break
        else:
            continue
    rules.close()
I also moved the line that sets i to 0 to the beginning of the outer loop, but I guess you'd have found that yourself.
And I changed jump to 4 to fit the example files you gave :p
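The exhaustion behaviour described above can be seen in isolation with a tiny experiment. This is just a sketch: io.StringIO is an in-memory stand-in for rules.txt, and the contents are made up. It also shows that seek(0) is an alternative to closing and reopening.

```python
import io

rules = io.StringIO("7275\n8500\n3\n1000\n")     # in-memory stand-in for rules.txt
first_pass = [line.strip() for line in rules]    # reads every line
second_pass = [line.strip() for line in rules]   # pointer is at the end: nothing left
rules.seek(0)                                    # rewind to the beginning
third_pass = [line.strip() for line in rules]    # reads every line again
print(first_pass, second_pass, third_pass)
```

A real file object opened with open() behaves the same way with respect to iteration and seek(0).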
Can you please try this solution:
def get_rules_values(rules_file):
    with open(rules_file, "r") as rules:
        return list(map(int, rules.readlines()))
def get_rules_dict(rules_values):
    half = len(rules_values) // 2 # first half holds the values to match, second half their scores
    return dict(zip(rules_values[:half], rules_values[half:]))
def get_inter_values(inter_file):
    with open(inter_file, "r") as inter:
        return list(map(int, inter.readlines()))
rules_dict = get_rules_dict(get_rules_values("rules.txt"))
inter_values = get_inter_values("Inter.txt")
for inter_value in inter_values:
    if inter_value in rules_dict: # skip values that have no matching rule
        print(inter_value, rules_dict[inter_value])
Hope it's working for you!

Returning every instance of whatever's between two strings in a file [Python 3]

What I'm trying to do is open a file, then find every instance of '[\x06I"' and '\x06;', then return whatever is between the two.
Since this is not a standard text file (it's map data from RPG maker) readline() will not work for my purposes, as the file is not at all formatted in such a way that the data I want is always neatly within one line by itself.
What I'm doing right now is loading the file into a list with read(), then simply deleting characters from the very beginning until I hit the string '[\x06I'. Then I scan ahead to find '\x06;', store what's between them as a string, append said string to a list, then resume at the character after the semicolon I found.
It works, and I ended up with pretty much exactly what I wanted, but I feel like that's the worst possible way to go about it. Is there a more efficient way?
My relevant code:
while eofget == 0:
savor = 0
while savor == 0 or eofget == 0:
if line[0:4] == '[\x06I"':
x = 4
spork = 0
while spork == 0:
x += 1
if line[x] == '\x06':
if line[x+1] == ';':
spork = x
savor = line[5:spork] + "\n"
line = line[x+1:]
linefinal[lineinc] = savor
lineinc += 1
elif line[x:x+7] == '#widthi':
print("eof reached")
spork = 1
eofget = 1
savor = 0
elif line[x:x+7] == '#widthi':
print("finished map " + mapname)
eofget = 1
savor = 0
break
else:
line = line[1:]
You can just ignore the variable names. I just name things the first thing that comes to mind when I'm doing one-offs like this. And yes, I am aware a few things in there don't make any sense, but I'm saving cleanup for when I finalize the code.
When eofget gets flipped on this subroutine terminates and the next map is loaded. Then it repeats. The '#widthi' check is basically there to save time, since it's present in every map and indicates the beginning of the map data, AKA data I don't care about.
I feel this is a natural case to use regular expressions. Using the findall method:
>>> s = 'testing[\x06I"text in between 1\x06;filler text[\x06I"text in between 2\x06;more filler[\x06I"text in between \n with some line breaks \n included in the text\x06;ending'
>>> import re
>>> p = re.compile(r'\[\x06I"(.+?)\x06;', re.DOTALL)
>>> print(p.findall(s))
['text in between 1', 'text in between 2', 'text in between \n with some line breaks \n included in the text']
The regex string r'\[\x06I"(.+?)\x06;' can be interpreted as follows:
Match as little as possible (denoted by ?) of one or more arbitrary characters (denoted by .+) surrounded by '[\x06I"' and '\x06;', and only return the enclosed text (denoted by the parentheses around .+?)
Adding re.DOTALL in the compile makes the . match line breaks as well, allowing multi-line text to be captured.
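If the two markers might change, the same findall idea can be wrapped in a small helper that builds the pattern with re.escape, so special characters such as '[' do not have to be escaped by hand. A sketch only: find_between is a made-up name and the sample string is invented.

```python
import re

def find_between(text, start, end):
    """Return every shortest stretch of text found between start and end."""
    # re.escape makes the markers literal; (.+?) is the non-greedy capture in between.
    pattern = re.compile(re.escape(start) + "(.+?)" + re.escape(end), re.DOTALL)
    return pattern.findall(text)

sample = 'junk[\x06I"first\x06;more junk[\x06I"second\nline\x06;tail'
print(find_between(sample, '[\x06I"', '\x06;'))
```

For a real map file you would pass in the string returned by read() instead of sample.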
I would use split():
fulltext = 'adsfasgaseg[\x06I"thisiswhatyouneed\x06;sdfaesgaegegaadsf[\x06I"this is the second what you need \x06;asdfeagaeef'
parts = fulltext.split('[\x06I"') # split by first label
results = []
for part in parts:
    if '\x06;' in part: # if second label exists in part
        results.append(part.split('\x06;')[0]) # get the part until the second label
print(results)
