I'm trying to remove all (non-space) whitespace characters from a file and replace all spaces with commas. Here is my current code:
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()

content = file_get_contents('file.txt')
content = content.split
content = str(content).replace(' ',',')
with open('file.txt', 'w') as f:
    f.write(content)
When this is run, it replaces the contents of the file with:
<built-in,method,split,of,str,object,at,0x100894200>
The main issue you have is that you're assigning the method content.split to content, rather than calling it and assigning its return value. If you print out content after that assignment, it will be: <built-in method split of str object at 0x100894200> which is not what you want. Fix it by adding parentheses, to make it a call of the method, rather than just a reference to it:
content = content.split()
I think you might still have an issue after fixing that, though. str.split returns a list, which you're then turning back into a string using str (before trying to substitute commas for spaces). That's going to give you square brackets and quotation marks, which you probably don't want, and you'll get a bunch of extra commas. Instead, I suggest using the str.join method like this:
content = ",".join(content) # joins all members of the list with commas
I'm not exactly sure if this is what you want though. Using split is going to replace all the newlines in the file, so you're going to end up with a single line with many, many words separated by commas.
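To see concretely what the split/join pair does to a multi-line file, here is a small sketch on an in-memory string. The sample text and the variable names are made up for illustration; the last expression is only one possible literal reading of the question (spaces become commas, all other whitespace is dropped).

```python
# Sample text standing in for the file contents (made up for illustration).
sample = "alpha beta\ngamma\tdelta\n"

# split() with no arguments splits on any run of whitespace,
# so newlines and tabs disappear along with the spaces.
words = sample.split()
joined = ",".join(words)

# A literal reading of the question: only spaces become commas,
# every other whitespace character is simply removed.
literal = "".join(
    "," if ch == " " else ch
    for ch in sample
    if ch == " " or not ch.isspace()
)

print(words)
print(joined)
print(literal)
```

The two results differ wherever a newline or tab separated two words, so it is worth deciding which behaviour is actually wanted.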
When you split the content, you forgot to call the function. Also, once you split, it's a list, not a string, so you need to join its items back together.
def file_get_contents(filename):
    with open(filename) as f:
        return f.read()

content = file_get_contents('file.txt')
content = content.split()  # <- HERE: call the method
# split() has already removed every space, so the commas go in at the join
content = ",".join(content)
with open('file.txt', 'w') as f:
    f.write(content)
If you are looking to replace characters, I think you would be better off using Python's re module for regular expressions. Sample code would be as follows:
import re

def file_get_contents(filename):
    with open(filename) as f:
        return f.read()

if __name__ == '__main__':
    content = file_get_contents('file.txt')
    # First replace any spaces with commas, then remove any other whitespace
    new_content = re.sub(r'\s', '', re.sub(' ', ',', content))
    with open('new_file.txt', 'w') as f:
        f.write(new_content)
It's more succinct than splitting all the time and gives you a little more flexibility. Also be careful with how large a file you are opening and reading with your code: you may want to consider iterating over the file line by line instead of reading all of its contents at once.
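A line-by-line version could look roughly like the sketch below. It assumes the output goes to a separate file, as in the code above; convert_lines is a hypothetical helper name, and io.StringIO objects stand in for real files so the example is self-contained.

```python
import io

def convert_lines(src, dst):
    """Copy src to dst one line at a time, turning spaces into commas
    and dropping every other whitespace character (newlines included)."""
    for line in src:
        with_commas = line.replace(" ", ",")
        # split() with no arguments splits on the remaining whitespace;
        # joining with "" glues the pieces back together without it.
        dst.write("".join(with_commas.split()))

# In-memory stand-ins for the input and output files (made-up content).
src = io.StringIO("a b\nc\td e\n")
dst = io.StringIO()
convert_lines(src, dst)
result = dst.getvalue()
print(result)
```

With real files, the same function can be called inside a with block that opens the input for reading and a different file for writing, so only one line is held in memory at a time.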
Simple inquiry here that I can't seem to figure out. I've written a line of code to add commas to the end of a list of several hundred URLs as so:
with open('bdall.txt') as f:
    lines = f.read().splitlines()
new_line = ', '.join(lines)
But how do I modify this (or write new code) to also put all of the URLs from that text file into quotes so that I can paste them all into a program I am writing?
The output of the above is:
http://www.metrolyrics.com/til-i-fell-in-love-with-you-lyrics-bob-dylan.html,
http://www.metrolyrics.com/i-heard-that-lonesome-whistle-lyrics-bob-dylan.html,
etc.
But I need
"http://www.metrolyrics.com/til-i-fell-in-love-with-you-lyrics-bob-dylan.html", "http://www.metrolyrics.com/i-heard-that-lonesome-whistle-lyrics-bob-dylan.html",
etc.
Instead of directly passing "lines" as parameter for the ".join()" method, you can turn it into a list comprehension where you just need to format each line like this:
with open('bdall.txt') as f:
    lines = f.read().splitlines()
new_line = ', '.join([f'"{line}"' for line in lines])
With this solution, the possibilities are endless.
I hope that was helpful
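As a quick check of what the quoted join produces, here is the same expression run on a short in-memory list. The example.com URLs are placeholders, not the ones from the question.

```python
# Placeholder URLs standing in for the lines read from the text file.
lines = ["http://example.com/a.html", "http://example.com/b.html"]

# Wrap each line in double quotes, then join the quoted items with ", ".
quoted = ', '.join(f'"{line}"' for line in lines)
print(quoted)
```

Note that join only puts the separator between items, so there is no trailing comma after the last URL.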
Suppose I have a very large integer (say, of 1000+ digits) that I've saved into a text file named 'large_number.txt'. But the problem is the integer has been divided into multiple lines in the file, i.e. like the following:
47451445736001306439091167216856844588711603153276
70386486105843025439939619828917593665686757934951
62176457141856560629502157223196586755079324193331
64906352462741904929101432445813822663347944758178
92575867718337217661963751590579239728245598838407
58203565325359399008402633568948830189458628227828
80181199384826282014278194139940567587151170094390
35398664372827112653829987240784473053190104293586
86515506006295864861532075273371959191420517255829
71693888707715466499115593487603532921714970056938
54370070576826684624621495650076471787294438377604
Now, I want to read this number from the file and use it as a regular integer in my program. I tried the following but I can't.
My Try (Python):
with open('large_number.txt') as f:
    data = f.read().splitlines()
Is there any way to do this properly in Python 3.6? Or what is the best that can be done in this situation?
Just replace the newlines with nothing, then parse:
with open('large_number.txt') as f:
    data = int(f.read().replace('\n', ''))
If you might have arbitrary (ASCII) whitespace and you want to discard all of it, switch to:
import string

killwhitespace = str.maketrans(dict.fromkeys(string.whitespace))
with open('large_number.txt') as f:
    data = int(f.read().translate(killwhitespace))
Either way that's significantly more efficient than processing line-by-line in this case (because you need all the lines to parse, any line-by-line solution would be ugly), both in memory and runtime.
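The translate approach can be tried without a file. The sketch below uses a short made-up number with stray spaces, a tab and mixed line endings in place of the contents of large_number.txt.

```python
import string

# Translation table mapping every ASCII whitespace character to None,
# which tells str.translate to delete it.
kill_whitespace = str.maketrans(dict.fromkeys(string.whitespace))

# A short made-up number broken across lines, with extra whitespace.
raw = "12345 \n67890\t\r\n 11121\n"

cleaned = raw.translate(kill_whitespace)
number = int(cleaned)
print(number)
```

Python integers have arbitrary precision, so the same call works unchanged for a number with a thousand or more digits.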
You can use this code:
with open('large_number.txt', 'r') as myfile:
    data = myfile.read().replace('\n', '')
number = int(data)
You can use str.rstrip to remove the trailing newline characters and use str.join to join the lines into one string:
with open('large_number.txt') as file:
    data = int(''.join(line.rstrip() for line in file))
Using the python open built-in function in this way:
with open('myfile.csv', mode='r') as rows:
    for r in rows:
        print(r.__repr__())
I obtain this output:
'col1,col2,col3\n'
'fst,snd,trd\n'
'1,2,3\n'
I don't want the \n character. Do you know some efficient way to remove that char (in place of the obvious r.replace('\n',''))?
If you are trying to read and parse a CSV file, Python's csv module might serve you better:
import csv

with open('myfile.csv', 'r', newline='') as f:
    reader = csv.reader(f)
    for row in reader:
        print(', '.join(row))
You cannot change the line terminator for the reader (it ignores the lineterminator setting), but it recognises either '\r' or '\n' as the end of a row, which works for your case.
https://docs.python.org/3/library/csv.html#csv.Dialect.lineterminator
Again, in most cases I don't think you need to parse a CSV file manually. The csv module makes several awkward cases easier to handle: fields containing the separator, fields containing a newline character, fields containing a quote character, etc.
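To illustrate those cases, the sketch below parses a small made-up CSV text in which one field contains a comma and another contains a newline. io.StringIO stands in for an open file.

```python
import csv
import io

# Made-up CSV text: the second row has a comma and a newline
# inside quoted fields.
text = 'col1,col2,col3\n"a,b","line one\nline two",3\n'

# newline='' is what the csv documentation recommends when opening
# real files; io.StringIO accepts the same argument.
rows = list(csv.reader(io.StringIO(text, newline='')))
for row in rows:
    print(row)
```

Splitting the same text on commas and newlines by hand would break both quoted fields apart, whereas the reader returns them as single values with no trailing '\n' on the rows.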
You can use str.strip(), which (with no arguments) removes any whitespace from the start and end of a string:
for r in rows:
    print(r.strip())
If you want to remove only newlines, you can pass that character as an argument to strip:
for r in rows:
    print(r.strip('\n'))
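The difference between the two calls only shows up when a line has other whitespace at its edges. A small sketch with a made-up row:

```python
# A made-up row with leading spaces, a trailing space and a newline.
row = "  fst,snd,trd \n"

stripped_all = row.strip()          # removes all leading/trailing whitespace
stripped_newline = row.strip('\n')  # removes only leading/trailing newlines
right_only = row.rstrip('\n')       # removes only trailing newlines

print(repr(stripped_all))
print(repr(stripped_newline))
print(repr(right_only))
```

If the spaces around fields are meaningful, the versions that take '\n' as an argument are the safer choice.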
For a clean solution, you could use a generator to wrap open, like this:
def open_no_newlines(*args, **kwargs):
    with open(*args, **kwargs) as f:
        for line in f:
            yield line.strip('\n')
You can then use open_no_newlines like this:
for line in open_no_newlines('myfile.csv', mode='r'):
    print(line)
EDIT: See bottom of post for the entire code
I am new to this forum and I have an issue that I would be grateful for any help solving.
Situation and goal:
- I have a list of strings. Each string is one word, like this: ['WORD', 'LINKS', 'QUOTE' ...] and so on.
- I would like to write this list of words (strings) on separate lines in a new text file.
- One would think the way to do this would be by appending the '\n' to every item in the list, but when I do that, I get a blank line between every list item. WHY?
Please have a look at this simple function:
def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', mode='wt') as output_file:
        for linje in input_list:
            output_file.write(linje + '\n')
This produces a file that looks like this:
WORD

LINKS

QUOTE
If I remove the '\n', then the file looks like this:
WORDLINKSQUOTE
Instead, the file should look like this:
WORD
LINKS
QUOTE
I am obviously doing something wrong, but after a lot of experimenting and reading around the web, I can't seem to get it right.
Any help would be deeply appreciated, thank you!
Response to link to thread about write() vs. writelines():
Writelines() doesn't fix this by itself, it produces the same result as write() without the '\n'. Unless I add a newline to every list item before passing it to the writelines(). But then we're back at the first option and the blank lines...
I tried to use one of the answers in the linked thread, using '\n'.join() and then write(), but I still get the blank lines.
It comes down to this: For some reason, I get two newlines for every '\n', no matter how I use it. I am .strip()'ing the list items of newline characters to be sure, and without the newline everything is just one massive block of text anyway.
On using another editor: I tried opening the txt file in Windows Notepad and in Notepad++. Is there any reason why these programs wouldn't display it correctly?
EDIT: This is the entire code. Sorry for the Norwegian naming. The purpose of the program is to read and clean up a text file and return the words first as a list and ultimately as a new file with each word on a new line. The text file is a list of Scrabble-words, so it's rather big (9 mb or something). PS: I don't advocate Scrabble-cheating, this is just a programming exercise :)
def renskriv(opprinnelig_ord):
    nytt_ord = ''
    for bokstav in opprinnelig_ord:
        if bokstav.isupper() == True:
            nytt_ord = nytt_ord + bokstav
    return nytt_ord

def skriv_ny_fil(ny_liste):
    with open('NSF\\NSF_ny.txt', 'w') as f:
        for linje in ny_liste:
            f.write(linje + '\n')

def behandle_kildefil():
    innfil = open('NSF\\NSF_full.txt', 'r')
    f = innfil.read()
    kildeliste = f.split()
    ny_liste = []
    for item in kildeliste:
        nytt_ord = renskriv(item)
        nytt_ord = nytt_ord.strip('\n')
        ny_liste.append(nytt_ord)
    skriv_ny_fil(ny_liste)
    innfil.close()

def main():
    behandle_kildefil()

if __name__ == '__main__':
    main()
I think there must be some '\n' among your lines; try skipping empty lines.
I suggest this code:
def write_new_file(input_list):
    with open('TEKST\\TEKST_ny.txt', 'w') as output_file:
        for linje in input_list:
            if not linje.startswith('\n'):
                output_file.write(linje.strip() + '\n')
You've said in the comments that Python is writing two carriage return ('\r') characters for each line feed ('\n') character you write. It's a bit bizarre that Python is replacing each line feed with two carriage returns, but this is a feature of opening a file in text mode (normally the translation would be to something more useful). If you instead open your file in binary mode, this translation will not be done and the file should display as you wish in Notepad++. NB: Using binary mode may cause problems if you need characters outside the ASCII range -- ASCII is basically just Latin letters (no accents), digits and a few symbols.
For Python 2, try:
filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write("first line")
    outfile.write("\n")
    outfile.write("second line")
Python 3 will be a bit more tricky. Each string literal you wish to write must be prefixed with a b (for bytes). Each string you don't have immediate access to, or don't wish to change to a bytes literal, must be encoded using the encode() method on the string, e.g.
filename = "somefile.txt"
with open(filename, mode="wb") as outfile:
    outfile.write(b"first line")
    outfile.write(b"\n")
    some_text = "second line"
    outfile.write(some_text.encode())
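If the doubled line endings do come from text-mode newline translation, as this answer assumes, Python 3 also lets you stay in text mode and switch the translation off with the newline argument of open. The sketch below writes to a throwaway temporary file (the path and the word list are made up for illustration) and reads the raw bytes back to check them.

```python
import os
import tempfile

words = ["WORD", "LINKS", "QUOTE"]

# Throwaway location for the demonstration file.
path = os.path.join(tempfile.mkdtemp(), "words.txt")

# newline="\n" tells text mode to write "\n" exactly as given instead of
# translating it to the platform's default line ending.
with open(path, "w", newline="\n", encoding="utf-8") as f:
    for word in words:
        f.write(word + "\n")

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)
```

This keeps ordinary str handling and an explicit encoding, so non-ASCII words need no manual encode() calls.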
How would you get only the first line of a file as a string with Python?
Use the .readline() method:
with open('myfile.txt') as f:
    first_line = f.readline()
Note that unless it is the only line in the file, the string returned from f.readline() will contain a trailing newline. You may wish to use
with open('myfile.txt') as f:
    first_line = f.readline().strip('\n')
instead, to remove the newline.
infile = open('filename.txt', 'r')
firstLine = infile.readline()
fline=open("myfile").readline().rstrip()
To go back to the beginning of an open file and then return the first line, do this:
my_file.seek(0)
first_line = my_file.readline()
This should do it:
f = open('myfile.txt')
first = f.readline()
first_line = next(open(filename))
Lots of other answers here, but to answer precisely the question you asked (before @MarkAmery went and edited the original question and changed the meaning):
>>> f = open('myfile.txt')
>>> data = f.read()
>>> # I'm assuming you had the above before asking the question
>>> first_line = data.split('\n', 1)[0]
In other words, if you've already read in the file (as you said), and have a big block of data in memory, then to get the first line from it efficiently, do a split() on the newline character, once only, and take the first element from the resulting list.
Note that this does not include the \n character at the end of the line, but I'm assuming you don't want it anyway (and a single-line file may not even have one). Also note that although it's pretty short and quick, it does make a copy of the data, so for a really large blob of memory you may not consider it "efficient". As always, it depends...
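The single split can be checked on an in-memory string. The sketch below also shows str.partition, which is swapped in here as an alternative that stops at the first newline in the same way; the sample data is made up.

```python
# Made-up block of data standing in for the file contents already read.
data = "first line\nsecond line\nthird line\n"

# Split once on the first newline and keep the part before it.
first_line = data.split('\n', 1)[0]

# Alternative: partition returns (before, separator, after).
first_via_partition = data.partition('\n')[0]

# A single-line string without a trailing newline is returned whole.
single = "only line".split('\n', 1)[0]

print(first_line)
print(first_via_partition)
print(single)
```

Both forms look only as far as the first newline, though the split version still builds a second string holding the rest of the data.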
If you want to read file.txt
line1 helloworld
import linecache
# read first line
print(linecache.getline('file.txt', 1))
>helloworld
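A self-contained way to try linecache is sketched below. It writes a small temporary file first (the path and contents are made up for illustration), then asks for line 1 and for a line that does not exist.

```python
import linecache
import os
import tempfile

# Create a small throwaway file to read from.
path = os.path.join(tempfile.mkdtemp(), "file.txt")
with open(path, "w") as f:
    f.write("helloworld\nsecond line\n")

# Line numbers start at 1; the returned string keeps its trailing newline.
first = linecache.getline(path, 1)

# Asking for a line past the end returns an empty string, not an error.
missing = linecache.getline(path, 99)

print(repr(first))
print(repr(missing))
```

Because the trailing newline is kept, a call to rstrip('\n') is still needed if only the text of the line is wanted.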