I'm having trouble, it seems, with reading in lines from a text file. When I use f.readline() I can save the result to a string and then print the correct text; however, when I go to print, say, the first or second character of the string I just made, it prints a strange dot-checker-pattern character instead of the correct letter.
Edit: OK, so when I try alfasin's method I seem to get the correct length of each line, except for the first line that is read in. If I'm reading in, say, 5 lines and looking for a space, the first line will find the first space at position 13 when it should find it at position 8. The following lines, however, all produce the correct length and location of the space.
Edit2: Also the text file I am reading in is UTF-8.
Edit3: Definitely was an issue with the encoding of the text file. I changed it to ANSI and everything started working as it should.
Try the following:
with open('filename.txt') as file:
    for line in file:
        print(line)
        # and if you want to break it down to characters:
        print(list(line))
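Given the later edits, the stray character was most likely the UTF-8 byte order mark at the start of the file. A minimal sketch, assuming Python 3 and the same 'filename.txt' stand-in as above, that strips a BOM if one is present:

# 'utf-8-sig' decodes UTF-8 and silently drops a leading BOM if there is one
with open('filename.txt', encoding='utf-8-sig') as file:
    for line in file:
        print(line.rstrip('\n'))
        print(list(line))  # inspect the individual characters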
I am writing a Streamlit app that takes in tensor output data from a .txt file, formats it, and both shows information on the data and prints the formatted data back to a new .txt file for later use.
After uploading the txt file to Streamlit and decoding it to a single long string, I alter the string and write it to a new txt file. When I open that txt file, the line spacings are huge; it looks like extra newlines have been put in, but when you highlight the text it is just large line spacing.
As well as this, when I use splitlines() on the string, the array that is returned is empty. This is the case even though the string is not empty and does contain newlines - I think it is to do with the large line spacings, but I am not sure.
The program is split into modules, but the code that is meant to format the file is in just two functions. One adds delimiters and works like this (with Streamlit as st):
def delim(file):
    # read the selected file and write it to variable elems as a string
    elems = file.decode('utf-8')
    # replace the applicable parts of variable elems with the delimiters
    elems = elems.replace('e+002', 'e+002, ')
    elems = elems.replace('e+003', 'e+003, ')
    elems = elems.replace('e+004', 'e+004, ')
    elems = elems.replace('e+005', 'e+005, ')
    elems = elems.replace('e+006', 'e+006, ')
    elems = elems.replace('e+007', 'e+007, ')
    elems = elems.replace('e+008', 'e+008, ')
    elems = elems.replace('e+009', 'e+009, ')
    with open('final_file.txt', 'w') as magma_file:
        # write a txt file with the stored, altered text in variable elems
        magma_file.write(elems)
        # close the writeable file to be safe
        magma_file.close()
    st.success('Delimiters successfully added')
The second part, where I am getting the empty array, is in a second function. The whole function is not necessary to see the issue, but the part that is not working is here:
def addElem(file):
    # create counting variables
    counter = 0
    linecount = 1
    # put file as string in variable checks
    checks = file.decode('utf-8')
    checks.splitlines()
    # check to see if the start of the file is formatted correctly. This is the part giving me strife
    if checks[0].rstrip().endswith('5'):
        with open('final_file.txt', 'w') as ff:
            # iterate through the lines in the file
            for line in checks:
                counter += 1
                # and so on, not relevant to the problem
The variable checks does contain a string after decoding the file, but when I use splitlines() then look inside checks[0], checks[1] etc., they are all empty. I tried commenting out other code, the conditional statement, removing the rstrip() and just seeing what was in the checks array after splitting the string, but it was still nothing. I tried changing splitlines() to split() using various delimiters including \n, but the array remained empty.
This program logic worked perfectly when I was running it locally using a console application interacting directly with the file system, so probably the problem is something to do with how a Streamlit "file like object" works. I read through the docs at Streamlit, but it doesn't give much detail on this.
This program is not for my use, so I can't keep it as a console app. I did ask about this on the Streamlit community a month ago, but so far no one has answered and I am not sure whether it is an unusual problem or just a terrible question.
I am wondering if there is a better way to decode the file to a string, but decoding to unicode doesn't explain the line spacings so I think something else is going on.
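As a side note on the splitlines() behaviour described above (independent of Streamlit): str.splitlines() returns a new list and leaves the original string untouched, so its result has to be assigned. A minimal sketch with a made-up string:

text = "1.0e+002, 2.0e+003\r\n3.0e+004, 4.0e+005\r\n"

lines = text.splitlines()   # capture the return value; `text` itself is unchanged
print(lines[0])             # '1.0e+002, 2.0e+003'
print(text[0])              # still just the first character of the original string: '1'

The doubled line spacing may be a separate effect: if the decoded upload already contains Windows '\r\n' endings and the string is then written through a file opened in text mode on Windows, each '\n' is translated to '\r\n' again, producing '\r\r\n', which many editors display as an extra blank line.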
I used truncate(0) in python to overwrite data on my file. Part of the code is like this:
l = f.readlines()
f.truncate(0)
for w in l:
    data = w.split()
    if 105 == int(data[0]):
        f.write('%s %s\n' % (str(data[0]), str(np.mean(b))))
    else:
        f.write('%s %s\n' % (str(data[0]), data[1]))
The code works fine and the output is correct, but when I open the output file (which is in txt format) I get "invalid characters" errors. At the head of the output file, I have this extra data:
\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00
The bad-character errors are for this extra data. After these invalid characters comes the output data written by Python, which is correct.
Why does this happen, and how can I fix it?
Because after the f.readlines() call, the file position is still set to the original end of file. truncate doesn't change the file position, so when you write your string, it pads out to the old end of file with zeros. Just do f.seek(0) before you truncate.
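A minimal sketch of that fix, with the file name made up and the np.mean() branch left out for brevity:

with open('data.txt', 'r+') as f:
    l = f.readlines()
    f.seek(0)       # rewind: readlines() left the position at the end of the file
    f.truncate(0)   # truncate does not move the position, so seek first
    for w in l:
        # assumes each line has at least two whitespace-separated fields
        data = w.split()
        f.write('%s %s\n' % (data[0], data[1]))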
I am using the following code to upload a file on server using FTP after editing it:
import fileinput

file = open('example.php', 'rb+')

for line in fileinput.input('example.php'):
    if 'Original' in line:
        file.write(line.replace('Original', 'Replacement'))

file.close()
There is one problem: instead of replacing the text in its original place, the code adds the replaced text at the end, and the text in the original place is unchanged.
Also, instead of just the replaced text, it prints out the whole line. Could anyone please tell me how to resolve these two errors?
1) The code adds the replaced text at the end and the text in original place is unchanged.
You can't replace text in the middle of the file just by opening it with the + flag:
file = open('example.php','rb+')
With this mode, whatever you write goes to the file's current write position rather than on top of the text you matched, which is why the replacement ends up after the existing content. Opening the file like this is only useful if you want to append, or overwrite from a position you control.
To bypass this you may use seek() to navigate to the specific line and replace it. Or create 2 files: an input_file and an output_file.
2) Also, instead of just the replaced text, it prints out the whole line.
It's because you're using:
file.write( line.replace('Original', 'Replacement'))
Example code:
I've split it into two files, an input file and an output file.
First it opens the input file and saves all its lines in a list called lines.
Second, it reads through those lines, and if 'Original' is present, it replaces it.
After the replacement, each line is written to the output file (lines without a match are written unchanged).
import os

ifile = 'example.php'
ofile = 'example_edited.php'

# open in text mode so 'Original' (a str) can be matched against each line
with open(ifile, 'r') as f:
    lines = f.readlines()

with open(ofile, 'w') as g:
    for line in lines:
        if 'Original' in line:
            g.write(line.replace('Original', 'Replacement'))
        else:
            g.write(line)  # keep lines without the target text unchanged
Then, if you want to, you can remove the non-edited file with os.remove(ifile).
More Info: Tutorials Point: Python Files I/O
The second issue is simply how the replace() method works: it returns the entire input string, with only the specified substring replaced.
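A quick illustration of that behaviour, with a made-up line:

line = "echo 'Original text';"
print(line.replace('Original', 'Replacement'))
# echo 'Replacement text';   (the whole line comes back, not just the replaced word)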
To write to a specific place in the file, you should seek() to the right position first.
I think this issue has been asked about before in several places; a quick search of Stack Overflow should turn up similar questions.
Replacing text in a file in place only works well if the original and the replacement have the same size in bytes; then you can do:
with open('example.php', 'rb+') as f:
    pos = f.tell()
    line = f.readline()
    if b'Original' in line:
        f.seek(pos)
        f.write(line.replace(b'Original', b'Replacement'))
(In this case b'Original' and b'Replacement' do not have the same size so your file will look funny after this)
Edit:
If original and replacement are not the same size, there are different possibilities like adding bytes to fill the hole or moving everything after the line.
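A rough sketch of the "filling the hole" idea when the replacement is shorter than the original (the strings here are made up so that the sizes work out):

line = b"$title = 'OriginalValue';\n"     # made-up line from a file
old = b'OriginalValue'                    # 13 bytes
new = b'NewValue'.ljust(len(old))         # pad to 13 bytes: b'NewValue     '
assert len(new) == len(old)
patched = line.replace(old, new)          # same overall length, safe to write back in place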
I wrote a Python script on Windows 8.1 using the Sublime Text editor, and I just tried to run it from the terminal in OS X Yosemite, but I get an error.
My error occurs when parsing the first line of a .CSV file. This is the relevant slice of the code:
lines is an array where each element is one line of the file, read in as a string.
We split each string by the desired delimiter.
We skip the first line because that is the header information (else condition).
For the last index in the for loop, i = numlines - 1 = the number of lines in the file - 2.
We only add one to the value of i because the last line in the file is blank.
for i in range(numlines):
    if i == numlines-1:
        dataF = lines[i+1].split(',')
    else:
        dataF = lines[i+1].split(',')
    dataF1 = list(dataF[3])
    del(dataF1[len(dataF1)-1])
    del(dataF1[len(dataF1)-1])
    del(dataF1[0])
    f[i] = ''.join(dataF1)
return f
All the lines in the csv file looks like this (with the exception of the header line):
"08/06/2015","19:00:00","1","410"
So it saves the single line into an array where each element corresponds to one of the 4 values separated by commas in a line of the CSV file. Then we take the element at index 3 in the array, "410", and create a list that should look like
['"','4','1','0','"','\n']
(and it does when run on Windows)
but it instead looks like
['"','4','1','0','"','\r','\n']
and so when I concatenate this list based off the above code I do not get a clean 410 the way I do on Windows.
My question is: where did the '\r' come from? It is non-existent in the original files when the script is run on a Windows machine. At first I thought it was the text format, so I saved the CSV file as UTF-8; that didn't work. I tried changing the tab size from 4 to 8 spaces; that didn't work either. I'm running out of ideas now. Any help would be greatly appreciated.
Thanks
The "\r" is the line separator. The "\r\n" is also a line separator. Different platforms have different line separators.
A simple fix: if you read a line from a file yourself, then line.rstrip() will remove the whitespace from the line end.
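A small illustration, using one of the fields from the question:

field = '"410"\r\n'
print(repr(field.rstrip()))        # '"410"', both the \r and the \n are gone
print(field.rstrip().strip('"'))   # 410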
A proper fix: use Python's standard csv reader. It properly handles quoted fields, delimiters inside quotes, line endings, and so on.
Also, when working with long lists, it helps to stop thinking about them as index-addressed 'arrays' and use the 'stream' or 'sequential reading' metaphor.
So the typical way of handling a CSV file is something like:
import csv

with open('myfile.csv', newline='') as f:
    reader = csv.reader(f)
    # the sample file has 4 columns; adjust to taste
    for (date_field, time_field, third_field, fourth_field) in reader:
        # do something with the field values of the current line here
        pass
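For instance, feeding the reader one of the sample lines from the question (here as an in-memory list rather than a file) shows that the quoting and the '\r\n' are taken care of:

import csv

sample = ['"08/06/2015","19:00:00","1","410"\r\n']
print(next(csv.reader(sample)))
# ['08/06/2015', '19:00:00', '1', '410']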
The Task
I am writing a program in Python that runs the SAP2000 program by importing a new .s2k file into SAP2000 each time; a new file is then generated from the results of the previous run by exporting the data.
The file is about 1,500 lines containing arbitrary words and numbers. (For a better understanding, see this: http://pastebin.com/8ptYacJz, which is the file I am dealing with.)
I'm required to replace one number in the file.
That number is somewhere in the middle of line 800.
The Question
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
What I've Tried
Regular expressions did not work, because there can be more than one instance of the same number.
So I came up with the solution of templating the file and writing a new file each time with the number to be changed as a template parameter.
This solution does work but the person insists that I can move the file pointer down to line 800, then over to the middle of the line to replace the number.
Here is the only code I have for the problem; it takes the file buffer down to a line, but then jumps back to the beginning when I try to seek over.
import sys
import os

# open file
f = open("output.$2k")

# this will go to line 883 in the text file
count = 0
while count < 883:
    line = f.readline()
    count = count + 1

# this would seek over to the middle of the file DOESN'T WORK
f.seek(0, 0)
line = f.readline()
print(line)
f.close()
Yes and no. Consider:
f = open('output.$2k', 'r+')
f.seek(300)
f.write('\n')
f.close()
This script just overwrites the byte at offset 300 in your ASCII file with a newline. Now the tricky part is that there is no way to know the length of a line in an ASCII file short of reading until you get to a newline. So, locating the particular character in the file at the middle of the 800th line is non-trivial. However, if you can make guarantees (due to the way the file was written) about the line length, you can calculate the position without any problem. Also note that replacing 1 with 100 won't work here: you need to replace 1 character with 1 character.
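A sketch of that calculation, assuming every line has exactly the same length in bytes; the widths and offsets below are invented for illustration, not taken from the .s2k format:

LINE_WIDTH = 80      # assumed fixed line length, newline included
TARGET_LINE = 800    # 1-based line number
COLUMN = 40          # 0-based byte offset of the number within that line
FIELD_WIDTH = 10     # assumed width of the number field

offset = (TARGET_LINE - 1) * LINE_WIDTH + COLUMN
replacement = ('%*.3f' % (FIELD_WIDTH, 1.234)).encode('ascii')
assert len(replacement) == FIELD_WIDTH   # must match exactly, byte for byte

with open('output.$2k', 'rb+') as f:
    f.seek(offset)
    f.write(replacement)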
And just for all the other *NIX users in the world ... please don't put $ in your filename. That's just a nightmare...
OK, I'm not a professional programmer, but my (stupid) approach would be: if it's always line 800, read the file line by line while tracking the line numbers, writing each line directly to a new file. Read line 800, change it, write it. Then write the rest. Dumb and not elegant, but it should work, unless I'm missing something, which I probably am. And there goes my meager reputation :D
No. Read in the line, manipulate it, then write it out to the new file you've previously opened for writing (and have been writing the other lines to, unmodified).
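A sketch of that approach, with the file names, line number and replacement value hard-coded for illustration:

with open('output.$2k') as src, open('output_new.$2k', 'w') as dst:
    for lineno, line in enumerate(src, start=1):
        if lineno == 800:
            # replace the one number on this line; how you locate it
            # (split, slice, replace) depends on the file format
            line = line.replace('1.234', '5.678', 1)
        dst.write(line)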
A first thing:
#this would seek over to middle of file DOESN'T WORK
f.seek(0,0)
this is not true. This seeks to the beginning of the file.
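For reference, seek() takes an offset and an optional "whence" argument; a quick illustration (binary mode; text-mode files only support seeking to offsets previously returned by tell()):

with open('output.$2k', 'rb') as f:
    f.seek(0, 0)    # absolute: go to the very beginning of the file
    f.seek(10, 0)   # absolute: go to byte 10 from the start
    f.seek(5, 1)    # relative: move 5 bytes forward from the current position
    f.seek(0, 2)    # relative to the end: jump to the end of the file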
To your actual question:
Does anyone know an efficient way to move down to the middle of line 800 in a file, in order to replace one number?
In general, no. You'd need to rewrite the file. For example like this:
# open the file in read-and-update mode
with open("file", 'r+') as f:
    # read all lines
    lines = f.readlines()
    # update the 800th line
    my_line = lines[799].split()
    my_line[5] = "%s" % my_number  # TODO: put in index of number and updated number
    lines[799] = " ".join(my_line)
    # rewind first: readlines() left the position at the end and truncate() does not move it
    f.seek(0)
    # truncate and rewrite file
    f.truncate(0)
    f.writelines(lines)
You can do it, if the starting position of the number in the file is predictable (e.g. number_starting_pos = 1234 from the beginning of the file) and the size of the string representation is also predictable (e.g. 20).
Then you could rewrite the number and make sure you fill up the padding with whitespace again to overwrite any content of the previous entry.
Similar to this:
with open("file", 'r+') as f:
# seek to the number starting position
f.seek(number_starting_pos, 0)
# update number field, assuming width (20), arbitrary space-padding allowed
my_number_string = "%19s " % my_number
# make sure the string is indeed exactly of the specific size (it may be longer)
assert len(my_number_string) == 20, "file writing would fail! aborting!"
f.write(my_number_string)
For this to work, you'd need to have a look at the docs of your SAP tool and check that extra whitespace really does not matter.
However, both approaches are based on a lot of assumptions. Depending on your use case they may easily break your code, e.g. if a line, or even a single character, is inserted before the number field.