Delete spaces from the lines of a file - python

I have a file with several lines, and some of them have empty spaces.
x=20
y=3
z = 1.5
v = 0.1
I want to delete those spaces and get each line into a dictionary, where the element before the '=' sign will be the key, and the element after the '=' sign will be its value.
However, my code is not working, at least the "delete empty spaces" part. Here's the code:
def copyFile(filename):
    """
    function's contract
    """
    with open(filename, 'r') as inFile:
        for line in inFile:
            cleanedLine = line.strip()
            if cleanedLine:
                firstPart, secondPart = line.split('=')
                dic[firstPart] = float(secondPart)
        inFile.close()
    return dic
After removing the spaces, my file should look like this:
x=20
y=3
z=1.5
v=0.1
But it is not working. What am I doing wrong?

You need to strip after splitting the string. That's assuming that the only unwanted spaces are around the = or before or after the contents of the line.
from ast import literal_eval
def copyFile(filename):
    with open(filename, 'r') as inFile:
        split_lines = (line.split('=', 1) for line in inFile)
        d = {key.strip(): literal_eval(value.strip()) for key, value in split_lines}
    return d
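The comprehension above fails with a ValueError on a blank line, because splitting "\n" at '=' yields one item instead of two. A variant that skips blank lines might look like this minimal sketch, assuming every non-blank line contains an '='; parse_assignments is a hypothetical name, and it accepts any iterable of lines so a list can stand in for the file.

```python
from ast import literal_eval


def parse_assignments(lines):
    """Build a dict from lines such as 'z = 1.5' (hypothetical helper)."""
    result = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines instead of failing to unpack them
        # Split on the first '=' only, then strip spaces around each side.
        key, value = line.split("=", 1)
        # literal_eval turns '20' into an int and '1.5' into a float.
        result[key.strip()] = literal_eval(value.strip())
    return result


print(parse_assignments(["x=20\n", "y=3\n", "\n", "z = 1.5\n", "v = 0.1\n"]))
# {'x': 20, 'y': 3, 'z': 1.5, 'v': 0.1}
```

With a real file, pass the open file object: with open(filename) as f: d = parse_assignments(f).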

There are a few issues with your code.
For one, you never define dic so when you try to add keys to it you get a NameError.
Second, you don't need inFile.close(), because you're opening the file in a with statement, which closes it automatically when the block exits.
Third, your function and variable names don't follow PEP 8 naming conventions.
Fourth, you need to strip each part.
Here's some code that works and looks nice:
def copy_file(filename):
    """
    function's contract
    """
    dic = {}
    with open(filename, 'r') as in_file:
        for line in in_file:
            cleaned_line = line.strip()
            if cleaned_line:
                first_part, second_part = line.split('=')
                dic[first_part.strip()] = float(second_part.strip())
    return dic

You have two problems:
The reason you're not removing the white space is that you're calling .strip() on the entire line. strip() removes white space at the beginning and end of the string, not in the middle. Instead, call .strip() on firstPart and secondPart.
That will fix the in-memory dictionary that you're creating but it won't make any changes to the file since you're never writing to the file. You'll want to create a second copy of the file into which you write your strip()ed values and then, at the end, replace the original file with the new file.
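Writing the stripped values to a second file and then replacing the original might look like the following sketch. It assumes every non-blank line contains an '='; clean_file_in_place is a hypothetical name, while tempfile.mkstemp and os.replace are standard library calls. The temporary copy is created in the same directory so os.replace can swap it in with a single rename; note that the replaced file ends up with the temporary file's permissions.

```python
import os
import tempfile


def clean_file_in_place(filename):
    """Rewrite lines like 'z = 1.5' as 'z=1.5' (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary copy next to the original so os.replace can swap them.
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as out_file, open(filename) as in_file:
            for line in in_file:
                if not line.strip():
                    continue  # drop blank lines
                key, value = line.split("=", 1)
                out_file.write(key.strip() + "=" + value.strip() + "\n")
        # Both files are closed here; replace the original with the clean copy.
        os.replace(tmp_path, filename)
    except BaseException:
        os.remove(tmp_path)  # don't leave a half-written copy behind
        raise
```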

To remove the whitespace, try .replace(" ", "") instead of .strip().
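The difference is easy to see on a single line. Note that replace(" ", "") removes only the space character: tabs and the trailing newline stay (so a final strip() is still useful), and spaces inside a key or value disappear as well.

```python
line = "z = 1.5\n"

# strip() only trims the ends, so the spaces around '=' survive.
print(repr(line.strip()))                   # 'z = 1.5'
# replace(" ", "") removes every space; strip() then drops the newline.
print(repr(line.replace(" ", "").strip()))  # 'z=1.5'
```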

Related

How do I sort a text file after the last instance of a character?

Goal: Sort the text file alphabetically based on the characters that appear AFTER the final slash. Note that there are random numbers right before the final slash.
Contents of the text file:
https://www.website.com/1939332/delta.html
https://www.website.com/2237243/alpha.html
https://www.website.com/1242174/zeta.html
https://www.website.com/1839352/charlie.html
Desired output:
https://www.website.com/2237243/alpha.html
https://www.website.com/1839352/charlie.html
https://www.website.com/1939332/delta.html
https://www.website.com/1242174/zeta.html
Code Attempt:
i = 0
for line in open("test.txt").readlines(): #reading text file
    List = line.rsplit('/', 1) #splits by final slash and gives me 4 lists
    dct = {list[i]:list[i+1]} #tried to use a dictionary
    sorted_dict=sorted(dct.items()) #sort the dictionary
textfile = open("test.txt", "w")
for element in sorted_dict:
    textfile.write(element + "\n")
textfile.close()
Code does not work.
I would pass a different key function to the sorted function. For example:
with open('test.txt', 'r') as f:
    lines = f.readlines()
lines = sorted(lines, key=lambda line: line.split('/')[-1])
with open('test.txt', 'w') as f:
    f.writelines(lines)
See here for a more detailed explanation of key functions.
Before you run this, make sure test.txt ends with a newline. Otherwise the last line, which has no trailing \n, gets joined to whichever line is sorted after it (here, "combining the second and third lines").
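One way to avoid depending on a trailing newline is to drop the line endings before sorting and add them back afterwards. A sketch, assuming the whole file fits in memory; sort_urls_by_filename is a hypothetical name, and it sorts on the text after the final slash, as the question asks.

```python
def sort_urls_by_filename(text):
    """Sort URL lines by the part after the final '/' (hypothetical helper)."""
    # splitlines() drops the line endings, so a missing newline on the
    # last line no longer matters; blank lines are discarded.
    lines = [line for line in text.splitlines() if line.strip()]
    # Sort on the text after the final slash, e.g. 'alpha.html'.
    lines.sort(key=lambda line: line.rsplit("/", 1)[-1])
    # Re-add exactly one newline per line.
    return "".join(line + "\n" for line in lines)


sample = (
    "https://www.website.com/1939332/delta.html\n"
    "https://www.website.com/2237243/alpha.html\n"
    "https://www.website.com/1242174/zeta.html\n"
    "https://www.website.com/1839352/charlie.html"  # no trailing newline
)
print(sort_urls_by_filename(sample), end="")
```

To apply it to the file, read it with f.read(), pass the text in, and write the returned string back with f.write().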
If you really want to use a dictionary:
dct = {}
i=0
with open("test.txt") as textfile:
    for line in textfile.readlines():
        mylist = line.rsplit('/',1)
        dct[mylist[i]] = mylist[i+1]
sorted_dict=sorted(dct.items(), key=lambda item: item[1])
with open("test.txt", "w") as textfile:
    for element in sorted_dict:
        textfile.write(element[i] + '/' +element[i+1])
What you did wrong
In the first line, you name your variable List, and in the second you access it using list.
List = line.rsplit('/', 1)
dct = {list[i]:list[i+1]}
Variable names are case sensitive, so you need to use the same capitalisation each time. Furthermore, Python already has a built-in list class. It can be shadowed, but I would not recommend naming your variables list, dict, etc.
(In Python 3.9 and later, list[i] will actually just create a types.GenericAlias object, which is a type hint, something completely different from a list, and not what you want at all.)
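A quick way to see the difference between the variable and the built-in, assuming Python 3.9 or later (where subscripting built-in types such as list creates a types.GenericAlias):

```python
import types

line = "https://www.website.com/1939332/delta.html\n"
parts = line.rsplit("/", 1)  # the list the question meant to index
i = 0
print(parts[i])              # https://www.website.com/1939332

# The lowercase name is the built-in type, so this builds a type hint, not a lookup.
alias = list[i]
print(isinstance(alias, types.GenericAlias))  # True
```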
You also wrote
dct = {list[i]:list[i+1]}
which repeatedly creates a new dictionary in each loop iteration, overwriting whatever was stored in dct previously. You should instead create an empty dictionary before the loop, and assign values to its keys every time you want to update it, as I have done.
You're calling sorted in every iteration of the loop; you should call it only once, after the loop is done. After all, you only want to sort your dictionary once.
You also open the file twice, and although you close it at the end, I would suggest using a context manager and the with statement as I have done, so that file closing is automatically handled.
My code
sorted(dct.items(), key=lambda item: item[1])
means that the sorted() function uses the second element in the item tuple (the dictionary item) as the 'metric' by which to sort.
`textfile.write(element[i] + '/' +element[i+1])`
is necessary because rsplit('/', 1) removed the final / from each line; you need to add it back and reconstruct the string from the element tuple before you write it.
You don't need + \n in textfile.write since readlines() preserves the \n. That's why you should end text files with a newline: so that you don't have to treat the last line differently.
def sortFiles(item):
    return item.split("/")[-1]
FILENAME = "test.txt"
with open(FILENAME, "r") as infile:
    contents = [line for line in infile if line.strip()]  # skip blank lines
contents.sort(key=sortFiles)
with open(FILENAME, "w") as outfile:
    outfile.writelines(contents)

importing from a text file to a dictionary

filename: dictionary.txt
YAHOO:YHOO
GOOGLE INC:GOOG
Harley-Davidson:HOG
Yamana Gold:AUY
Sotheby’s:BID
inBev:BUD
code:
infile = open('dictionary.txt', 'r')
content= infile.readlines()
infile.close()
counters ={}
for line in content:
    counters.append(content)
print(counters)
I am trying to import the contents of dictionary.txt into the dictionary. I have searched through Stack Overflow, but please give an answer in a simple way (not with open...).
First off, instead of opening and closing the file explicitly, you can use the with statement, which closes the file automatically at the end of the block.
Secondly, since file objects are iterators (one-shot iterables), you can loop over the lines and split each one at the ':' character. You can do all of this with a generator expression passed to the dict function:
with open('dictionary.txt') as infile:
    my_dict = dict(line.strip().split(':') for line in infile)
I assume that you don't have colons in your keys.
In that case you should:
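#read lines from your file (splitlines() avoids an empty last item from a trailing newline)
lines = open('dictionary.txt').read().splitlines()
#create an empty dictionary (avoid naming it dict, which shadows the built-in)
companies = {}
#split every line at the first ':' and use the left element as a key for the right value
for l in lines:
    if not l.strip():
        continue  # skip blank lines
    content = l.split(':', 1)
    companies[content[0]] = content[1]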
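A sketch that combines both answers and tolerates blank lines, assuming UTF-8 text (the sample contains a curly apostrophe in Sotheby’s). load_symbols is a hypothetical name; str.partition splits at the first ':' only and never raises, so a ':' inside the ticker is kept, and a line with no ':' becomes a key with an empty value.

```python
import io


def load_symbols(lines):
    """Map company names to ticker symbols (hypothetical helper)."""
    symbols = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines, e.g. a trailing empty line
        # Split at the first ':' only; the separator itself is discarded.
        name, _, ticker = line.partition(":")
        symbols[name] = ticker
    return symbols


# io.StringIO stands in for open('dictionary.txt', encoding='utf-8').
sample = io.StringIO("YAHOO:YHOO\nGOOGLE INC:GOOG\nHarley-Davidson:HOG\n\n")
print(load_symbols(sample))
# {'YAHOO': 'YHOO', 'GOOGLE INC': 'GOOG', 'Harley-Davidson': 'HOG'}
```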

Appending lines to a file, then reading them

I want to append or write multiple lines to a file. I believe the following code appends one line:
with open(file_path,'a') as file:
    file.write('1')
My first question: if I do this:
with open(file_path,'a') as file:
    file.write('1')
    file.write('2')
    file.write('3')
Will it create a file with the following content?
1
2
3
Second question: if I later do:
with open(file_path,'r') as file:
    first = file.read()
    second = file.read()
    third = file.read()
Will that read the content into the variables so that first will be 1, second will be 2, etc.? If not, how do I do it?
Question 1: No.
file.write simply writes whatever you pass to it at the current position in the file. file.write("Hello "); file.write("World!") will produce a file with the contents "Hello World!".
You can write a whole line either by appending a newline character ("\n") to each string you write, or by using the print function's file keyword argument (which I find a bit cleaner):
with open(file_path, 'a') as f:
    print('1', file=f)
    print('2', file=f)
    print('3', file=f)
N.B. print adds the newline only because its end argument defaults to '\n', whether or not you pass file. print('1', file=f, end='') is equivalent to f.write('1').
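The effect of the end argument can be checked without touching the disk by printing into an in-memory io.StringIO, which here stands in for the real file:

```python
import io

buf = io.StringIO()           # stands in for open(file_path, 'a')
print('1', file=buf)          # writes '1\n' because end defaults to '\n'
print('2', file=buf, end='')  # writes just '2', like buf.write('2')
buf.write('3')                # writes '3', no newline
print(repr(buf.getvalue()))   # '1\n23'
```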
Question 2: No.
file.read() reads the whole file, not one line at a time. If the file was written with the three print calls above, you'll get
first == "1\n2\n3\n"
second == ""
third == ""
This is because after the first call to file.read(), the pointer is set to the end of the file. Subsequent calls try to read from the pointer to the end of the file. Since they're in the same spot, you get an empty string. A better way to do this would be:
with open(file_path, 'r') as f: # avoid `file` as a name: in Python 2 it shadows the built-in file type
    lines = f.readlines()
first = lines[0]
second = lines[1]
third = lines[2]
Or:
with open(file_path, 'r') as f:
    first, second, third = f.readlines() # fails if there aren't exactly 3 lines
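If you want to keep the shape of the question's three separate calls, readline() returns one line per call and advances the file position past it; once the end is reached it returns an empty string. A sketch with io.StringIO standing in for the file, assuming it holds the three lines written with newlines:

```python
import io

f = io.StringIO("1\n2\n3\n")  # stands in for open(file_path, 'r')
first = f.readline()          # '1\n'
second = f.readline()         # '2\n'
third = f.readline()          # '3\n'
rest = f.readline()           # '' because the end of the file was reached
print(first.rstrip("\n"), second.rstrip("\n"), third.rstrip("\n"))  # 1 2 3
```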
The answer to the first question is no. You're writing individual characters. You would have to read them out individually.
Also, note that file.read() returns the full contents of the file.
If you wrote individual characters and you want to read individual characters, process the result of file.read() as a string.
text = open(file_path).read()
first = text[0]
second = text[1]
third = text[2]
As for the second question, you should write newline characters, '\n', to terminate each line that you write to the file.
with open(file_path, 'w') as out_file:
    out_file.write('1\n')
    out_file.write('2\n')
    out_file.write('3\n')
To read the lines, you can use file.readlines().
lines = open(file_path).readlines()
first = lines[0] # -> '1\n'
second = lines[1] # -> '2\n'
third = lines[2] # -> '3\n'
If you want to get rid of the newline character at the end of each line, use strip(), which removes all whitespace at the beginning and end of a string. For example:
first = lines[0].strip() # -> '1'
Better yet, you can use map to apply strip() to every line.
lines = list(map(str.strip, open(file_path).readlines()))
first = lines[0] # -> '1'
second = lines[1] # -> '2'
third = lines[2] # -> '3'
Writing multiple lines to a file
This will depend on how the data is stored. For writing individual values, your current example is:
with open(file_path,'a') as file:
    file.write('1')
    file.write('2')
    file.write('3')
The file will contain the following:
123
It will also contain whatever contents it had previously, since it was opened for appending. To get separate lines, you must add the newlines explicitly; writelines(), which takes an iterable of strings, does not add them for you either.
Also, I don't recommend using file as a variable name, since in Python 2 it shadows the built-in file type, so I will use f from here on out.
For instance, here is an example where you have a list of values that you write using write() and explicit newline characters:
my_values = ['1', '2', '3']
with open(file_path,'a') as f:
    for value in my_values:
        f.write(value + '\n')
But a better way would be to use writelines(). To add the newlines, you can build the lines with a list comprehension:
my_values = ['1', '2', '3']
with open(file_path,'a') as f:
    f.writelines([value + '\n' for value in my_values])
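writelines() does not insert separators, which is why the newlines have to be added by hand. An in-memory check, with io.StringIO standing in for the file:

```python
import io

values = ['1', '2', '3']

plain = io.StringIO()                       # stands in for the opened file
plain.writelines(values)                    # no separators are inserted
print(repr(plain.getvalue()))               # '123'

lined = io.StringIO()
lined.writelines(v + '\n' for v in values)  # a generator works as well as a list
print(repr(lined.getvalue()))               # '1\n2\n3\n'
```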
If you are looking for printing a range of numbers, you could use a for loop with range (or xrange if using Python 2.x and printing a lot of numbers).
Reading individual lines from a file
To read individual lines from a file, you can also use a for loop:
my_list = []
with open(file_path,'r') as f:
    for line in f:
        my_list.append(line.strip()) # strip out newline characters
This iterates over the file's lines one at a time, so you can also process each line as you read it, which is useful for large files.

Splitting lines in python based on some character

Input:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
'!' is the starting character and +0013 should be the ending of each line (if present).
The problem I am getting:
The output looks like this:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
Any help would be highly appreciated...!!!
My code:
file_open= open('sample.txt','r')
file_read= file_open.read()
file_open2= open('output.txt','w+')
counter =0
for i in file_read:
    if '!' in i:
        if counter == 1:
            file_open2.write('\n')
            counter= counter -1
        counter= counter +1
    file_open2.write(i)
You can try something like this:
with open("abc.txt") as f:
    data = f.read().replace("\r", "").replace("\n", "")  # remove every line break
ans = filter(None, data.split("!"))  # split the data at '!', then filter out empty strings
for x in ans:
    print("!" + x)  # or write to some other file
Output:
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Could you just use str.split?
lines = file_read.split('!')
Now lines is a list which holds the split data. These are almost the lines you want to write. The differences are that they don't have trailing newlines, they don't have '!' at the start, and the first element is an empty string (the data begins with '!'). We can add the '!' and the newline easily with string formatting, e.g. '!{0}\n'.format(line). Then we can put that whole thing in a generator expression, skipping empty elements, which we'll pass to file.writelines to put the data in a new file:
file_open2.writelines('!{0}\n'.format(line) for line in lines if line)
You might need:
file_open2.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)
if you find that you're getting more newlines than you wanted in the output.
One other point: when opening files, it's good to use a context manager, which makes sure the file is closed properly:
with open('inputfile') as fin:
    lines = fin.read().split('!')
with open('outputfile','w') as fout:
    fout.writelines('!{0}\n'.format(line.replace('\n','')) for line in lines if line)
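The same split-based idea, wrapped in a function that can be tried on the first two wrapped lines of the question's input. A sketch, assuming every record starts with '!' and '!' never appears inside a record; split_records is a hypothetical name.

```python
def split_records(raw, start="!"):
    """Re-split records that were wrapped at arbitrary points (hypothetical helper)."""
    # Remove every hard line break, including the carriage return of Windows line endings.
    joined = raw.replace("\r", "").replace("\n", "")
    # Splitting on the start marker leaves an empty piece before the first '!'.
    pieces = joined.split(start)
    # Put the marker back on every non-empty piece.
    return [start + piece for piece in pieces if piece]


raw = (
    "!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1\n"
    "2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.\n"
)
for record in split_records(raw):
    print(record)
```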
Another option, using replace instead of split, since you know the starting and ending characters of each line:
data = """!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1
2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.
0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W
55.576,+0013!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013!,A,56
281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34
:18,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:19,000.0,0,37N22.""".replace('\n', '')
print(data.replace('+0013!', "+0013\n!"))
!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
!,A,56281,12/12/19,19:34:19,000.0,0,37N22.
Just for some variance, here is a regular expression answer:
import re
outputFile = open('output.txt', 'w+')
with open('sample.txt', 'r') as f:
    for line in re.findall("!.+?(?=!|$)", f.read(), re.DOTALL):
        outputFile.write(line.replace("\n", "") + '\n')
outputFile.close()
It will open the output file, get the contents of the input file, and loop through all the matches using the regular expression !.+?(?=!|$) with the re.DOTALL flag. The regular expression explanation & what it matches can be found here: http://regex101.com/r/aK6aV4
After we have a match, we strip out the new lines from the match, and write it to the file.
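The regular expression can be checked in memory on the first two wrapped lines of the question's input. A sketch using the same pattern, compiled once with re.DOTALL so that '.' also matches the hard line breaks inside a record:

```python
import re

# '!' plus the shortest run of characters (newlines included, thanks to
# re.DOTALL) up to the next '!' or the end of the text.
RECORD = re.compile(r"!.+?(?=!|$)", re.DOTALL)

raw = (
    "!,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/1\n"
    "2/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013!,A,56281,12/12/19,19:34:14,000.\n"
)
records = [match.replace("\n", "") for match in RECORD.findall(raw)]
for record in records:
    print(record)
```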
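Another approach: remove the existing line breaks, add a \n before every "!", and then let Python's splitlines() do the rest :-) . The first element is empty because the data starts with "!", so filter it out:
[l for l in file_read.replace("\n", "").replace("!", "\n!").splitlines() if l]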
I would implement this as a generator, so that you can work on the data as a stream rather than loading the entire file. This is much more memory-friendly when working with huge files:
def split_on_stream(it, sep="!"):
    prev = ""
    for line in it:
        # Join the leftover from the previous line with this one, then split.
        line = (prev + line.strip()).split(sep)
        for parts in line[:-1]:
            if parts:  # skip the empty string before the first separator
                yield parts
        # The last piece may continue on the next line, so keep it for later.
        prev = line[-1]
    yield prev

with open("test.txt") as fin:
    for parts in split_on_stream(fin):
        print(parts)
,A,56281,12/12/19,19:34:12,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:13,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:14,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:15,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:16,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:17,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:18,000.0,0,37N22.714,121W55.576,+0013
,A,56281,12/12/19,19:34:19,000.0,0,37N22.

Stripping line endings before appending to a list?

Ok I am writing a program that reads text files and goes through the different lines, the problem that I have encountered however is line endings (\n). My aim is to read the text file line by line and write it to a list and remove the line endings before it is appended to the list.
I have tried this:
thelist = []
inputfile = open('text.txt','rU')
for line in inputfile:
    line.rstrip()
    thelist.append(line)
Strings are immutable in Python. All string methods return new strings, and don't modify the original one, so the line
line.rstrip()
effectively does nothing. You can use a list comprehension to accomplish this:
with open("text.txt") as f:  # universal newlines are the default in Python 3; the 'U' mode flag was removed in 3.11
    lines = [line.rstrip("\n") for line in f]
Also note that it is strongly recommended to use the with statement to open (and implicitly close) files.
with open('text.txt') as f: # Use with block to close file on block exit
    thelist = [line.rstrip() for line in f]
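The immutability point from the first answer, shown on a single string: calling rstrip() without using its result changes nothing, while rebinding the name keeps the stripped copy.

```python
line = "some text\n"
line.rstrip("\n")         # returns a new string, which is discarded here
print(repr(line))         # 'some text\n'  (the original is unchanged)
line = line.rstrip("\n")  # keep the result by rebinding the name
print(repr(line))         # 'some text'
```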
rstrip doesn't change the string it is called on; it returns a modified copy. That's why you must write:
thelist.append(line.rstrip())
But you can write your code more simply:
with open('text.txt') as inputfile:
    thelist = [x.rstrip() for x in inputfile]
Use rstrip('\n') on each line before appending to your list.
I think you need something like this.
s = s.strip(' \t\n\r')
This will strip whitespace from both the beginning and the end of your string.
In Python, strings are immutable, which means that operations return a new string and don't modify the existing one. I.e., you've got it right, but you need to reassign the result (or give it a new name): line = line.rstrip().
rstrip returns a new string. It should be line = line.rstrip(). However, the whole code could be shorter:
thelist = list(map(str.rstrip, open('text.txt')))
Update: note that calling rstrip() with no argument trims all trailing whitespace, not just the newline. There is a concise way to remove only the line endings, too:
thelist = open('text.txt').read().splitlines()
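The difference between the three options matters when lines end in spaces or tabs. An in-memory comparison, with io.StringIO standing in for the file; note that splitlines() also treats a few other characters (such as \f and \v) as line boundaries.

```python
import io

text = "first line  \nsecond\tline\t\nlast line"

# rstrip() removes all trailing whitespace: spaces and tabs too.
a = [line.rstrip() for line in io.StringIO(text)]
# rstrip('\n') removes only the newline, keeping other trailing whitespace.
b = [line.rstrip("\n") for line in io.StringIO(text)]
# splitlines() on the whole text also removes only the line endings.
c = text.splitlines()
print(a)       # ['first line', 'second\tline', 'last line']
print(b)       # ['first line  ', 'second\tline\t', 'last line']
print(c == b)  # True
```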
