Problem with replacing a word in a file, using Python - python
I have a .txt file containing data like this:
1,Rent1,Expense,16/02/2010,1,4000,4000
1,Car Loan1,Expense,16/02/2010,2,4500,9000
1,Flat Loan1,Expense,16/02/2010,2,4000,8000
0,Rent2,Expense,16/02/2010,1,4000,4000
0,Car Loan2,Expense,16/02/2010,2,4500,9000
0,Flat Loan2,Expense,16/02/2010,2,4000,8000
I want to replace the first item. If it is 1, it should remain the same, but if it is 0, I want to change it to 1. So I tried the following code:
import fileinput
for line in fileinput.FileInput("sample.txt",inplace=1):
    s=line.split(",")
    print a
    print ','.join(s)
But after the program ran successfully, my .txt file looks like this:
1,Rent1,Expense,16/02/2010,1,4000,4000
1,Car Loan1,Expense,16/02/2010,2,4500,9000
1,Flat Loan1,Expense,16/02/2010,2,4000,8000
0,Rent2,Expense,16/02/2010,1,4000,4000
0,Car Loan2,Expense,16/02/2010,2,4500,9000
0,Flat Loan2,Expense,16/02/2010,2,4000,8000
Now I want to remove the empty line. Is it possible, or is there any other way to replace the 0's?
print adds an extra newline after the input and you already have one newline there. You should either strip the existing newline (line.rstrip("\n")) or use sys.stdout.write() instead.
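Here is a minimal Python 3 sketch of the sys.stdout.write() option. The function name flip_leading_zero is made up for illustration, and it assumes every line starts with the flag followed by a comma, as in the sample data.

```python
import fileinput
import sys


def flip_leading_zero(path):
    """Rewrite the file at `path` in place, turning a leading '0,' into '1,'."""
    # With inplace=True, fileinput redirects sys.stdout into the file.
    with fileinput.input(path, inplace=True) as f:
        for line in f:
            if line.startswith("0,"):
                line = "1," + line[2:]
            # The line still carries its own newline, so write it as-is;
            # print() would append a second newline and create blank lines.
            sys.stdout.write(line)
```

Because the newline that was read from the file is written back unchanged, no blank lines appear in the result.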
import fileinput
import re
p = re.compile(r'^0,')
for line in fileinput.FileInput("sample.txt",inplace=1):
    print p.sub('1,', line.strip())
The existing code you have doesn't actually change the lines like you want; print a doesn't do anything if a isn't actually defined! So you end up just printing a blank line (the print a bit) and then printing the existing line, hence why you get a file that's unaltered except for the addition of some blank lines.
Either use rstrip to remove the trailing new lines before printing or use sys.stdout.write instead of print.
Also, if you only need to modify the first element, there is no need to split the entire line and join it again. You only need to split on the first comma:
line.split(',', 1)
If you want even better performance you could also just test the value of line[0] directly.
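As a rough sketch of the "split only on the first comma" idea (Python 3; force_first_field_to_one is a hypothetical helper name, and it assumes each line contains at least one comma):

```python
def force_first_field_to_one(line):
    """Return `line` with a first field of '0' changed to '1'."""
    # maxsplit=1 splits off only the first field; the rest of the line,
    # including its trailing newline, is left untouched.
    first, rest = line.split(",", 1)
    if first == "0":
        first = "1"
    return first + "," + rest
```

Comparing the whole first field, rather than only line[0], avoids changing values such as 05 or 0.5 by accident.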
fixed = []
for l in open('sample.txt'):
    parts = l.split(',', 1)
    if parts[0] == '0':
        # change a leading 0 to 1; the rest of the line is left untouched
        parts[0] = '1'
    fixed.append(','.join(parts))
outp = open('sample.txt', 'w')
for f in fixed:
    outp.write(f)
outp.close()
This is untested, but it should get you most of the way there.
Good luck
import fileinput
for line in fileinput.FileInput("sample.txt",inplace=1):
    s=line.rstrip().split(",")
    print a
    print ','.join(s)
You have to use a comma at the end of your print so that it doesn't add a newline. Like so:
print "Hello",
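The trailing comma is Python 2 syntax. In Python 3, print is a function and the equivalent is the end argument. A small sketch follows; the echo_lines name and the out parameter are only for illustration:

```python
import io


def echo_lines(lines, out):
    """Write each line to `out` without adding an extra newline."""
    for line in lines:
        # end="" suppresses the newline that print() normally appends,
        # so lines that already end in "\n" are not double-spaced.
        print(line, end="", file=out)


buffer = io.StringIO()
echo_lines(["1,Rent1\n", "1,Rent2\n"], buffer)
```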
This is what I came up with:
input = open('file.txt', 'r')
output = open('output.txt', 'w')
for line in input:
    values = line.split(',')
    if (values[0] == '0'):
        values[0] = '1'
    output.write(','.join(values))
If you want a better csv handling library you might want to use this instead of split.
The cleanest way to do it is to use the CSV parser:
import fileinput
import csv
f = fileinput.FileInput("test.txt",inplace=1)
fichiercsv = csv.reader(f, delimiter=',')
for line in fichiercsv:
    line[0] = "1"
    print ",".join(line)
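For comparison, here is a Python 3 sketch of the csv approach that reads from one file-like object and writes to another instead of editing in place. The name rewrite_first_column and the use of io.StringIO are assumptions made to keep the example self-contained.

```python
import csv
import io


def rewrite_first_column(src, dst):
    """Copy CSV rows from `src` to `dst`, changing a first field of '0' to '1'."""
    reader = csv.reader(src, delimiter=",")
    # lineterminator="\n" keeps plain newlines; the csv default is "\r\n".
    writer = csv.writer(dst, delimiter=",", lineterminator="\n")
    for row in reader:
        if row and row[0] == "0":
            row[0] = "1"
        writer.writerow(row)


source = io.StringIO("1,Rent1,Expense\n0,Car Loan2,Expense\n")
target = io.StringIO()
rewrite_first_column(source, target)
```

With real files, open both with newline="" as the csv documentation recommends.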
Related
python open csv search for pattern and strip everything else
I have a CSV file 'svclist.csv' which contains a single-column list as follows:
pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1
pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs
I need to strip everything from each line except the PL5 directory and the two digits in the last directory, so the result should look like this:
PL5,00
PL5,01
I started the code as follows:
clean_data = []
with open('svclist.csv', 'rt') as f:
    for line in f:
        if line.__contains__('profile'):
            print(line, end='')
and I'm stuck here. Thanks in advance for the help.
You can use the regular expression (PL5)[^/].{0,}([0-9]{2,2}). For an explanation, copy the regex and paste it into https://regexr.com; it shows how the regex works, and you can make the required changes there.
import re
test_string_list = ['pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1', 'pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs']
regex = re.compile("(PL5)[^/].{0,}([0-9]{2,2})")
result = []
for test_string in test_string_list:
    matchArray = regex.findall(test_string)
    result.append(matchArray[0])
with open('outfile.txt', 'w') as f:
    for row in result:
        f.write(','.join(row) + '\n')
In the above code, I create an empty list to hold the tuples and then write them to the file. Each match is a tuple such as ('PL5', '00'), so I join its items with a comma before writing the row to 'outfile.txt'. (Slicing with str(row)[1:-1] would only remove the parentheses and leave the quotes in the output.)
You can use a regex for this (in general, when trying to extract a pattern, this might be a good option):
import re
pattern = re.compile(r"pf=/usr/sap/PL5/SYS/profile/PL5_.*(\d{2})")
with open('svclist.csv', 'rt') as f:
    for line in f:
        if 'profile' in line:
            last_two_numbers = pattern.findall(line)[0]
            print(f'PL5,{last_two_numbers}')
This code goes over each line, checks whether "profile" is in the line (this is the same as __contains__), then extracts the last two digits according to the pattern. Note that the pattern has to be compiled with re.compile before findall can be called on it.
I made the assumption that the number is always between the two underscores. You could run something similar to this within your for-loop.
test_str = "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1"
test_list = test_str.split("_")  # splits the string at the underscores
output = test_list[1].strip(
    "abcdefghijklmnopqrstuvwxyz" + str.swapcase("abcdefghijklmnopqrstuvwxyz"))  # strips leading and trailing letters
try:
    int(output)  # checks that only digits are left
    print(f"PL5,{output}")
except ValueError:
    print(f'Something went wrong! Output is PL5,{output}')
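Putting the regex idea into a reusable function might look like the sketch below. The pattern and the name extract_sid_and_number are my own assumptions: it expects the last directory to look like SID, underscore, optional letters, two digits, underscore, and it returns None for lines that do not fit.

```python
import re

# Assumed layout: .../profile/<SID>_<letters><two digits>_<host>
PROFILE_RE = re.compile(r"/profile/(?P<sid>[A-Z0-9]+)_[A-Za-z]*(?P<num>\d{2})_")


def extract_sid_and_number(line):
    """Return 'SID,NN' for a profile path, or None if the line does not match."""
    match = PROFILE_RE.search(line)
    if match is None:
        return None
    return "{},{}".format(match.group("sid"), match.group("num"))


rows = [
    "pf=/usr/sap/PL5/SYS/profile/PL5_D00_s4prd1",
    "pf=/usr/sap/PL5/SYS/profile/PL5_ASCS01_s4prdascs",
]
cleaned = [extract_sid_and_number(row) for row in rows]
```

Taking the system ID from the path rather than hard-coding PL5 means the same function also works for other system IDs, as long as the layout assumption holds.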
Enumerate using python
I'm a new coder and am currently trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on. I've opened the file and stripped it of all its commas. I found online that you can use a function called enumerate() to get the line number. However, when I run the code, instead of getting numbers like 1, 2, 3 I get information like: 0x113a2cff0. Any idea how to fix this problem, or what the actual problem is? The code showing how I used enumerate is below.
my_document = open("data.txt")
readDocument = my_document.readlines()
invalidData = []
for data in readDocument:
    stripDocument = data.strip()
    if stripDocument.isnumeric() == False:
        data = (enumerate(stripDocument))
        invalidData.append(data)
First of all, start by opening the document and reading its content. It's good practice to use with, as it closes the document after use. The readlines function gathers all the lines (this assumes the data.txt file is in the same folder as your .py file):
with open("data.txt") as f:
    lines = f.readlines()
Next, use enumerate to add an index to the lines, so you can read them, use them, or even save the indexes:
for index, line in enumerate(lines):
    print(index, line)
As a last point, if you have line breaks in your data.txt, the lines will contain a \n, and you can remove them with line.strip() if you need to. The full code would be:
with open("data.txt") as f:
    lines = f.readlines()
for index, line in enumerate(lines):
    print(index, line.strip())
Taking your problem statement: "trying to write a piece of code that, from an opened txt document, will print out the line number that each piece of information is on". You're using enumerate incorrectly, as @roganjosh was trying to explain:
with open("data.txt") as my_document:
    for i, data in enumerate(my_document):
        print(i, data)
The way you're doing it now, you're not removing the commas. The strip() method without arguments only removes leading and trailing whitespace from the line. If you only want the data, this would work (the line is stripped first, because a trailing \n would make isnumeric() return False for every line):
invalidData = []
for row_number, data in enumerate(readDocument):
    stripped_line = ''.join(data.strip().split(','))
    if not stripped_line.isnumeric():
        invalidData.append((row_number, data))
You can use the enumerate() function to enumerate a list. It yields tuples containing the index first, then the line string, like this: (0, 'first line'). Printing the enumerate object itself only shows its memory address, which is where the 0x113a2cff0 comes from; you have to loop over it. Your readDocument is a list of the lines, so it might be a good idea to name it accordingly.
lines = my_document.readlines()
for i, line in enumerate(lines):
    print(i, line)
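To tie the answers together, here is a small Python 3 sketch that returns the 1-based line numbers of the non-numeric lines. The function name invalid_line_numbers and the sample list are made up for illustration.

```python
def invalid_line_numbers(lines):
    """Return the 1-based numbers of lines that are not purely numeric."""
    invalid = []
    # start=1 makes enumerate count from 1, matching how people number lines.
    for number, line in enumerate(lines, start=1):
        cleaned = line.strip().replace(",", "")
        if not cleaned.isnumeric():
            invalid.append(number)
    return invalid


sample = ["1,2,3\n", "abc\n", "42\n", "4,x\n"]
bad = invalid_line_numbers(sample)
```

The key difference from the question's code is that enumerate wraps the whole list of lines in the for statement, instead of being called on each individual line.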
Read/Write text file
I am trying to change some lines in a text file without affecting the other lines. This is what's inside the text file called "text.txt":
this is a test1|number1
this is a test2|number2
this is a test3|number2
this is a test4|number3
this is a test5|number3
this is a test6|number4
this is a test7|number5
this is a test8|number5
this is a test9|number5
this is a test10|number5
My objective is to change line 4 and line 5 but keep the rest the same.
mylist1=[]
for lines in open('test','r'):
    a=lines.split('|')
    b=a[1].strip()
    if b== 'number3':
        mylist1.append('{}|{} \n'.format('this is replacement','number7'))
    else:
        mylist1.append('{}|{} \n'.format(a[0],a[1].strip()))
myfile=open('test','w')
myfile.writelines(mylist1)
Even though the code works, I am wondering if there is a better and more efficient way to do it. Is it possible to read the file just by line number?
There is not much you can improve. But you have to write all lines to a new file, either changed or unchanged. Minor improvements would be:
using the with statement;
avoiding storing lines in a list;
writing lines without formatting in the else clause (if applicable).
Applying all of the above:
import shutil
with open('test') as old, open('newtest', 'w') as new:
    for line in old:
        if line.rsplit('|', 1)[-1].strip() == 'number3':
            new.write('this is replacement|number7\n')
        else:
            new.write(line)
shutil.move('newtest', 'test')
import fileinput
for lines in fileinput.input('test', inplace=True):
    # inplace=True moves the original to a backup file and redirects stdout
    # to a new file with the original name. This is more efficient because
    # it doesn't load the whole file into memory
    a = lines.split('|')
    b = a[1].strip()
    if b == 'number3':
        print '{}|{} '.format('this is replacement', 'number7')
    else:
        print '{}|{} '.format(a[0], a[1].strip())
No. Files are byte-oriented, not line-oriented, and changing the length of a line will not advance the following bytes.
Try this solution (open() has no inplace argument, so fileinput is used for the in-place rewrite):
import fileinput
for line in fileinput.input('test', inplace=True):
    if line.rsplit('|', 1)[-1].strip() == 'number3':
        print '{}|{}'.format('this is replacement', 'number7')
    else:
        print line,
It's not wholly clear whether your intent is to identify the lines to be replaced by their value, or by their line number. If the former is your intent, you can get a list of lines like this:
with open('test','r') as f:
    oldlines = f.read().splitlines()
If there's a danger of trailing whitespace, you could also: Then you can process them like this:
newlines = [ line if not line.strip().endswith('|number3') else 'this is replacement|number7' for line in oldlines]
Open the destination file (I'm assuming you want to overwrite the original, here), and write all the lines:
with open('test','w') as f:
    f.write("\n".join(newlines))
This is a general pattern that's useful for any kind of simple line-filtering. If you meant to identify the lines by number, you could just alter the 'newlines' line:
newlines = [ line if i not in (3, 4) else 'this is replacement|number7' for i, line in enumerate(oldlines)]
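For the "by line number" case, a Python 3 sketch with 1-based numbering could look like this. The helper name replace_by_number and the sample data are assumptions; reading and writing the file would wrap around it as in the answers above.

```python
def replace_by_number(lines, numbers, replacement):
    """Return a copy of `lines` with the given 1-based line numbers replaced."""
    return [
        replacement if number in numbers else line
        for number, line in enumerate(lines, start=1)
    ]


old_lines = ["this is a test{}|number{}".format(i, i) for i in range(1, 7)]
new_lines = replace_by_number(old_lines, {4, 5}, "this is replacement|number7")
```

Counting from 1 avoids the off-by-one step of translating "line 4 and line 5" into the indexes 3 and 4.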
Splitting lines in python based on some character
Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
The problem I am getting is that the output looks like this:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
    if '!' in i:
        if counter == 1:
            file_open2.write('\n')
            counter= counter -1
        counter= counter +1
    file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
    data=f.read().replace("\r\n","") #replace the newlines with ""
    #the newline can be "\n" in your system instead of "\r\n"
    ans=filter(None,data.split("!")) #split the data at '!', then filter out empty lines
    for x in ans:
        print "!"+x #or write to some other file
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. These are almost the lines you want to write; the only difference is that they don't have trailing newlines and they don't have '!' at the start. We can put those in easily with string formatting, e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
if you find that you're getting more newlines than you wanted in the output. A few other points: when opening files, it's nice to use a context manager, which makes sure that the file is closed properly:
with open('inputfile') as fin:
    lines = fin.read().split('!')
with open('outputfile','w') as fout:
    fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines)
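A compact Python 3 sketch of the split-based approach, with the wrapped line breaks removed first and the empty piece before the first '!' dropped. The name split_records and the shortened sample string are made up for illustration.

```python
def split_records(text, start="!"):
    """Split wrapped text into records that each begin with `start`."""
    # Remove the line breaks that the fixed-width wrapping introduced.
    flat = text.replace("\r", "").replace("\n", "")
    # split() leaves an empty string before the first marker; skip it.
    return [start + chunk for chunk in flat.split(start) if chunk]


wrapped = "!,A,1,+0013!,A,2,+00\n13!,A,3"
records = split_records(wrapped)
```

Writing the result is then a matter of fout.write("\n".join(records) + "\n") inside a with block.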
Another option, using replace instead of split, since you know the starting and ending characters of each line: In [14]: data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1 2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000. 0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W 55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56 281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34 :18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '') In [15]: print data.replace('+0013!', "+0013\n!") !,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013 !,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
    for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
        outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation and what it matches can be found here: http://regex101.com/r/aK6aV4 After we have a match, we strip out the newlines from the match and write it to the file.
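Let's remove the existing line breaks, add a \n before every "!", and then let Python's splitlines do the work :-) : file_read.replace("\n", "").replace("!", "\n!").splitlines() Note that the first element of the result is an empty string when the data starts with "!".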
I would actually implement this as a generator so that you can work on the data stream rather than the entire content of the file. This is quite memory-friendly when working with huge files.
>>> def split_on_stream(it,sep="!"):
        prev = ""
        for line in it:
            line = (prev + line.strip()).split(sep)
            for parts in line[:-1]:
                yield parts
            prev = line[-1]
        yield prev
>>> with open("test.txt") as fin:
        for parts in split_on_stream(fin):
            print parts
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.
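A Python 3 variant of the same streaming idea that puts the '!' back on each record and skips the empty piece before the first one. The name iter_records is hypothetical, and the sketch assumes the stream begins with the separator.

```python
def iter_records(chunks, sep="!"):
    """Yield records starting with `sep` from an iterable of wrapped lines."""
    pending = ""  # tail of the previous line that may continue on the next
    for chunk in chunks:
        pieces = (pending + chunk.rstrip("\r\n")).split(sep)
        # Everything but the last piece is complete; the last may be cut off.
        for piece in pieces[:-1]:
            if piece:
                yield sep + piece
        pending = pieces[-1]
    if pending:
        yield sep + pending


stream = ["!,A,1!,A,2,12/1\n", "2/19!,A,3\n"]
parts = list(iter_records(stream))
```

Because only one wrapped line plus the pending tail is held at a time, a file object can be passed in directly in place of the list.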
Trouble sorting a list with python
I'm somewhat new to Python. I'm trying to sort a list of strings and integers. The list contains some symbols that need to be filtered out (i.e. ro!ad should end up as road). Also, they are all on one line separated by a space. So I need to use 2 arguments: one for the input file and one for the output file. It should be sorted with numbers first and then the words, without the special characters, each on a different line. I've been looking at loads of list functions but am having some trouble putting this together, as I've never had to do anything like this. Any takers? So far I have the basic stuff:
#!/usr/bin/python
import sys
try:
    infilename = sys.argv[1]
    #outfilename = sys.argv[2]
except:
    print "Usage: ",sys.argv[0], "infile outfile"; sys.exit(1)
ifile = open(infilename, 'r')
#ofile = open(outfilename, 'w')
data = ifile.readlines()
r = sorted(data, key=lambda item: (int(item.partition(' ')[0]) if item[0].isdigit() else float('inf'), item))
ifile.close()
print '\n'.join(r)
#ofile.writelines(r)
#ofile.close()
The output shows exactly what was in the file, exactly as the file is written, and not sorted at all. The goal is to take a file (arg1.txt), sort it, and make a new file (arg2.txt); both names will be command-line arguments. I used print in this case to speed up the editing but need to have it write to a file. That's why the output file areas are commented out, but feel free to tell me I'm stupid if I screwed that up, too! Thanks for any help!
When you have an issue like this, it's usually a good idea to check your data at various points throughout the program to make sure it looks the way you want it to. The issue here seems to be in the way you're reading in the file. data = ifile.readlines() is going to read in the entire file as a list of lines. But since all the entries you want to sort are on one line, this list will only have one entry. When you try to sort the list, you're passing a list of length 1, which is going to just return the same list regardless of what your key function is. Try changing the line to data = ifile.readlines()[0].split() You may not even need the key function any more since numbers are placed before letters by default. I don't see anything in your code to remove special characters though.
Since they are on the same line, you don't really need readlines:
with open('some.txt') as f:
    data = f.read() #now data = "item 1 item2 etc..."
You can use re to filter out unwanted characters:
import re
data = "ro!ad"
fixed_data = re.sub("[!?#$]","",data)
partition may be overkill:
data = "hello 23frank sam wilbur"
my_list = data.split() # ["hello","23frank","sam","wilbur"]
print sorted(my_list)
However, you will need to do more to force the numbers to sort numerically, maybe something like:
numbers = [x for x in my_list if x[0].isdigit()]
strings = [x for x in my_list if not x[0].isdigit()]
sorted_list = sorted(numbers,key=lambda x:int(re.sub("[^0-9]","",x))) + sorted(strings)
"Also, they are all on one line separated by a space."
So your file contains a single line?
data = ifile.readlines()
This makes data into a list of the lines in your file. All 1 of them.
r = sorted(...)
This makes r the sorted version of that list. To get the words from the line, you can .read() the entire file as a single string, and .split() it (by default, it splits on whitespace).
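Putting the pieces from these answers together, a Python 3 sketch might look like the following. The name clean_and_sort is made up, and it assumes that "special characters" means anything other than ASCII letters and digits, and that a token counts as a number only if it consists entirely of digits after cleaning.

```python
import re


def clean_and_sort(text):
    """Split on whitespace, drop special characters, sort numbers before words."""
    tokens = [re.sub(r"[^A-Za-z0-9]", "", token) for token in text.split()]
    tokens = [token for token in tokens if token]  # drop tokens that became empty
    # Numbers are compared by value so that 2 sorts before 10.
    numbers = sorted((t for t in tokens if t.isdigit()), key=int)
    words = sorted(t for t in tokens if not t.isdigit())
    return numbers + words


ordered = clean_and_sort("ro!ad 10 app#le 2 ze?bra")
```

Writing "\n".join(ordered) to the output file then puts each item on its own line.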