Related
Goal: Sort the text file alphabetically based on the characters that appear AFTER the final slash. Note that there are random numbers right before the final slash.
Contents of the text file:
https://www.website.com/1939332/delta.html
https://www.website.com/2237243/alpha.html
https://www.website.com/1242174/zeta.html
https://www.website.com/1839352/charlie.html
Desired output:
https://www.website.com/2237243/alpha.html
https://www.website.com/1839352/charlie.html
https://www.website.com/1939332/delta.html
https://www.website.com/1242174/zeta.html
Code Attempt:
i = 0
for line in open("test.txt").readlines(): #reading text file
List = line.rsplit('/', 1) #splits by final slash and gives me 4 lists
dct = {list[i]:list[i+1]} #tried to use a dictionary
sorted_dict=sorted(dct.items()) #sort the dictionary
textfile = open("test.txt", "w")
for element in sorted_dict:
textfile.write(element + "\n")
textfile.close()
Code does not work.
I would pass a different key function to the sorted function. For example:
with open('test.txt', 'r') as f:
lines = f.readlines()
lines = sorted(lines, key=lambda line: line.split('/')[-1])
with open('test.txt', 'w') as f:
f.writelines(lines)
See here for a more detailed explanation of key functions.
Before you run this, I am assuming you have a newline at the end of your test.txt. This will fix "combining the second and third lines".
If you really want to use a dictionary:
dct = {}
i=0
with open("test.txt") as textfile:
for line in textfile.readlines():
mylist = line.rsplit('/',1)
dct[mylist[i]] = mylist[i+1]
sorted_dict=sorted(dct.items(), key=lambda item: item[1])
with open("test.txt", "w") as textfile:
for element in sorted_dict:
textfile.write(element[i] + '/' +element[i+1])
What you did wrong
In the first line, you name your variable List, and in the second you access it using list.
List = line.rsplit('/', 1)
dct = {list[i]:list[i+1]}
Variable names are case sensitive so you need use the same capitalisation each time. Furthermore, Python already has a built-in list class. It can be overridden, but I would not recommend naming your variables list, dict, etc.
( list[i] will actually just generate a types.GenericAlias object, which is a type hint, something completely different from a list, and not what you want at all.)
You also wrote
dct = {list[i]:list[i+1]}
which repeatedly creates a new dictionary in each loop iteration, overwriting whatever was stored in dct previously. You should instead create an empty dictionary before the loop, and assign values to its keys every time you want to update it, as I have done.
You're calling sort in each iteration in the loop; you should only call once it after the loop is done. After all, you only want to sort your dictionary once.
You also open the file twice, and although you close it at the end, I would suggest using a context manager and the with statement as I have done, so that file closing is automatically handled.
My code
sorted(dct.items(), key=lambda item: item[1])
means that the sorted() function uses the second element in the item tuple (the dictionary item) as the 'metric' by which to sort.
`textfile.write(element[i] + '/' +element[i+1])`
is necessary, since, when you did rsplit('/',1), you removed the /s in your data; you need to add them back and reconstruct the string from the element tuple before you write it.
You don't need + \n in textfile.write since readlines() preserves the \n. That's why you should end text files with a newline: so that you don't have to treat the last line differently.
def sortFiles(item):
return item.split("/")[-1]
FILENAME = "test.txt"
contents = [line for line in open(FILENAME, "r").readlines() if line.strip()]
contents.sort(key=sortFiles)
with open(FILENAME, "w") as outfile:
outfile.writelines(contents)
Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.
I have the following problem. I am supposed to open a CSV file (its an excel table) and read it without using any library.
I tried already a lot and have now the first row in a tuple and this in a list. But only the first line. The header. But no other row.
This is what I have so far.
with open(path, 'r+') as file:
results=[]
text = file.readline()
while text != '':
for line in text.split('\n'):
a=line.split(',')
b=tuple(a)
results.append(b)
return results
The output should: be every line in a tuple and all the tuples in a list.
My question is now, how can I read the other lines in python?
I am really sorry, I am new to programming all together and so I have a real hard time finding my mistake.
Thank you very much in advance for helping me out!
This problem was many times on Stackoverflow so you should find working code.
But much better is to use module csv for this.
You have wrong indentation and you use return results after reading first line so it exits function and it never try read other lines.
But after changing this there are still other problems so it still will not read next lines.
You use readline() so you read only first line and your loop will works all time with the same line - and maybe it will never ends because you never set text = ''
You should use read() to get all text which later you split to lines using split("\n") or you could use readlines() to get all lines as list and then you don't need split(). OR you can use for line in file: In all situations you don't need while
def read_csv(path):
with open(path, 'r+') as file:
results = []
text = file.read()
for line in text.split('\n'):
items = line.split(',')
results.append(tuple(items))
# after for-loop
return results
def read_csv(path):
with open(path, 'r+') as file:
results = []
lines = file.readlines()
for line in lines:
line = line.rstrip('\n') # remove `\n` at the end of line
items = line.split(',')
results.append(tuple(items))
# after for-loop
return results
def read_csv(path):
with open(path, 'r+') as file:
results = []
for line in file:
line = line.rstrip('\n') # remove `\n` at the end of line
items = line.split(',')
results.append(tuple(items))
# after for-loop
return results
All this version will not work correctly if you will '\n' or , inside item which shouldn't be treated as end of row or as separtor between items. These items will be in " " which also can make problem to remove them. All these problem you can resolve using standard module csv.
Your code is pretty well and you are near goal:
with open(path, 'r+') as file:
results=[]
text = file.read()
#while text != '':
for line in text.split('\n'):
a=line.split(',')
b=tuple(a)
results.append(b)
return results
Your Code:
with open(path, 'r+') as file:
results=[]
text = file.readline()
while text != '':
for line in text.split('\n'):
a=line.split(',')
b=tuple(a)
results.append(b)
return results
So enjoy learning :)
One caveat is that the csv may not end with a blank line as this would result in an ugly tuple at the end of the list like ('',) (Which looks like a smiley)
To prevent this you have to check for empty lines: if line != '': after the for will do the trick.
Let's say I have a text file full of nicknames. How can I delete a specific nickname from this file, using Python?
First, open the file and get all your lines from the file. Then reopen the file in write mode and write your lines back, except for the line you want to delete:
with open("yourfile.txt", "r") as f:
lines = f.readlines()
with open("yourfile.txt", "w") as f:
for line in lines:
if line.strip("\n") != "nickname_to_delete":
f.write(line)
You need to strip("\n") the newline character in the comparison because if your file doesn't end with a newline character the very last line won't either.
Solution to this problem with only a single open:
with open("target.txt", "r+") as f:
d = f.readlines()
f.seek(0)
for i in d:
if i != "line you want to remove...":
f.write(i)
f.truncate()
This solution opens the file in r/w mode ("r+") and makes use of seek to reset the f-pointer then truncate to remove everything after the last write.
The best and fastest option, rather than storing everything in a list and re-opening the file to write it, is in my opinion to re-write the file elsewhere.
with open("yourfile.txt", "r") as file_input:
with open("newfile.txt", "w") as output:
for line in file_input:
if line.strip("\n") != "nickname_to_delete":
output.write(line)
That's it! In one loop and one only you can do the same thing. It will be much faster.
This is a "fork" from #Lother's answer (which I believe that should be considered the right answer).
For a file like this:
$ cat file.txt
1: october rust
2: november rain
3: december snow
This fork from Lother's solution works fine:
#!/usr/bin/python3.4
with open("file.txt","r+") as f:
new_f = f.readlines()
f.seek(0)
for line in new_f:
if "snow" not in line:
f.write(line)
f.truncate()
Improvements:
with open, which discard the usage of f.close()
more clearer if/else for evaluating if string is not present in the current line
The issue with reading lines in first pass and making changes (deleting specific lines) in the second pass is that if you file sizes are huge, you will run out of RAM. Instead, a better approach is to read lines, one by one, and write them into a separate file, eliminating the ones you don't need. I have run this approach with files as big as 12-50 GB, and the RAM usage remains almost constant. Only CPU cycles show processing in progress.
I liked the fileinput approach as explained in this answer:
Deleting a line from a text file (python)
Say for example I have a file which has empty lines in it and I want to remove empty lines, here's how I solved it:
import fileinput
import sys
for line_number, line in enumerate(fileinput.input('file1.txt', inplace=1)):
if len(line) > 1:
sys.stdout.write(line)
Note: The empty lines in my case had length 1
If you use Linux, you can try the following approach.
Suppose you have a text file named animal.txt:
$ cat animal.txt
dog
pig
cat
monkey
elephant
Delete the first line:
>>> import subprocess
>>> subprocess.call(['sed','-i','/.*dog.*/d','animal.txt'])
then
$ cat animal.txt
pig
cat
monkey
elephant
Probably, you already got a correct answer, but here is mine.
Instead of using a list to collect unfiltered data (what readlines() method does), I use two files. One is for hold a main data, and the second is for filtering the data when you delete a specific string. Here is a code:
main_file = open('data_base.txt').read() # your main dataBase file
filter_file = open('filter_base.txt', 'w')
filter_file.write(main_file)
filter_file.close()
main_file = open('data_base.txt', 'w')
for line in open('filter_base'):
if 'your data to delete' not in line: # remove a specific string
main_file.write(line) # put all strings back to your db except deleted
else: pass
main_file.close()
Hope you will find this useful! :)
I think if you read the file into a list, then do the you can iterate over the list to look for the nickname you want to get rid of. You can do it much efficiently without creating additional files, but you'll have to write the result back to the source file.
Here's how I might do this:
import, os, csv # and other imports you need
nicknames_to_delete = ['Nick', 'Stephen', 'Mark']
I'm assuming nicknames.csv contains data like:
Nick
Maria
James
Chris
Mario
Stephen
Isabella
Ahmed
Julia
Mark
...
Then load the file into the list:
nicknames = None
with open("nicknames.csv") as sourceFile:
nicknames = sourceFile.read().splitlines()
Next, iterate over to list to match your inputs to delete:
for nick in nicknames_to_delete:
try:
if nick in nicknames:
nicknames.pop(nicknames.index(nick))
else:
print(nick + " is not found in the file")
except ValueError:
pass
Lastly, write the result back to file:
with open("nicknames.csv", "a") as nicknamesFile:
nicknamesFile.seek(0)
nicknamesFile.truncate()
nicknamesWriter = csv.writer(nicknamesFile)
for name in nicknames:
nicknamesWriter.writeRow([str(name)])
nicknamesFile.close()
In general, you can't; you have to write the whole file again (at least from the point of change to the end).
In some specific cases you can do better than this -
if all your data elements are the same length and in no specific order, and you know the offset of the one you want to get rid of, you could copy the last item over the one to be deleted and truncate the file before the last item;
or you could just overwrite the data chunk with a 'this is bad data, skip it' value or keep a 'this item has been deleted' flag in your saved data elements such that you can mark it deleted without otherwise modifying the file.
This is probably overkill for short documents (anything under 100 KB?).
I like this method using fileinput and the 'inplace' method:
import fileinput
for line in fileinput.input(fname, inplace =1):
line = line.strip()
if not 'UnwantedWord' in line:
print(line)
It's a little less wordy than the other answers and is fast enough for
Save the file lines in a list, then remove of the list the line you want to delete and write the remain lines to a new file
with open("file_name.txt", "r") as f:
lines = f.readlines()
lines.remove("Line you want to delete\n")
with open("new_file.txt", "w") as new_f:
for line in lines:
new_f.write(line)
here's some other method to remove a/some line(s) from a file:
src_file = zzzz.txt
f = open(src_file, "r")
contents = f.readlines()
f.close()
contents.pop(idx) # remove the line item from list, by line number, starts from 0
f = open(src_file, "w")
contents = "".join(contents)
f.write(contents)
f.close()
You can use the re library
Assuming that you are able to load your full txt-file. You then define a list of unwanted nicknames and then substitute them with an empty string "".
# Delete unwanted characters
import re
# Read, then decode for py2 compat.
path_to_file = 'data/nicknames.txt'
text = open(path_to_file, 'rb').read().decode(encoding='utf-8')
# Define unwanted nicknames and substitute them
unwanted_nickname_list = ['SourDough']
text = re.sub("|".join(unwanted_nickname_list), "", text)
Do you want to remove a specific line from file so use this snippet short and simple code you can easily remove any line with sentence or prefix(Symbol).
with open("file_name.txt", "r") as f:
lines = f.readlines()
with open("new_file.txt", "w") as new_f:
for line in lines:
if not line.startswith("write any sentence or symbol to remove line"):
new_f.write(line)
To delete a specific line of a file by its line number:
Replace variables filename and line_to_delete with the name of your file and the line number you want to delete.
filename = 'foo.txt'
line_to_delete = 3
initial_line = 1
file_lines = {}
with open(filename) as f:
content = f.readlines()
for line in content:
file_lines[initial_line] = line.strip()
initial_line += 1
f = open(filename, "w")
for line_number, line_content in file_lines.items():
if line_number != line_to_delete:
f.write('{}\n'.format(line_content))
f.close()
print('Deleted line: {}'.format(line_to_delete))
Example output:
Deleted line: 3
Take the contents of the file, split it by newline into a tuple. Then, access your tuple's line number, join your result tuple, and overwrite to the file.
Question:
How can I open a file in python that contains one integer value per line. Make python read the file, store data in a list and then print the list?
I have to ask the user for a file name and then do everything above. The file entered by the user will be used as 'alist' in the function below.
Thanks
def selectionSort(alist):
for index in range(0, len(alist)):
ismall = index
for i in range(index,len(alist)):
if alist[ismall] > alist[i]:
ismall = i
alist[index], alist[ismall] = alist[ismall], alist[index]
return alist
I think this is exactly what you need:
file = open('filename.txt', 'r')
lines = [int(line.strip()) for line in file.readlines()]
print(lines)
I didn't use a with statement here, as I was not sure whether or not you intended to use the file further in your code.
EDIT: You can just assign an input to a variable...
filename = input('Enter file path: ')
And then the above stuff, except open the file using that variable as a parameter...
file = open(filename, 'r')
Finally, submit the list lines to your function, selectionSort.
selectionSort(lines)
Note: This will only work if the file already exists, but I am sure that is what you meant as there would be no point in creating a new one as it would be empty. Also, if the file specified is not in the current working directory you would need to specify the full path- not just the filename.
Easiest way to open a file in Python and store its contents in a string:
with open('file.txt') as f:
contents = f.read()
for your problem:
with open('file.txt') as f:
values = [int(line) for line in f.readlines()]
print values
Edit: As noted in one of the other answers, the variable f only exists within the indented with-block. This construction automatically handles file closing in some error cases, which you would have to do with a finally-construct otherwise.
You can assign the list of integers to a string or a list
file = open('file.txt', mode = 'r')
values = file.read()
values will have a string which can be printed directly
file = open('file.txt', mode = 'r')
values = file.readlines()
values will have a list for each integer but can't be printed directly
f.readlines() read all the lines in your file, but what if your file contains a lot of lines?
You can try this instead:
new_list = [] ## start a list variable
with open('filename.txt', 'r') as f:
for line in f:
## remove '\n' from the end of the line
line = line.strip()
## store each line as an integer in the list variable
new_list.append(int(line))
print new_list