Expected text file lines after saving with "writelines" Python method - python

I am trying to split a large CSV file into several smaller files with Python.
As a first try, I read the first 261579 lines from the CSV dataset file using this part of the code:
for c in range(261579):
    line = datasetFile.readline()
    if len(line) == 0:
        print("empty line detected at : ", c)
    lines.append(line)
print("SAVING LINES ......")
split = open(outputDirectoryName + "spilt" + str(x+1) + ".csv", "w")
split.writelines(lines)
print("SPLIT " + str(x+1) + " END with ", str(len(lines)), "lines .")
OK, for the moment the code runs and shows me
"SPLIT 1 END with 261579 lines."
But when I open my file "Split1.csv" with Notepad++, I find only 261575 lines instead of 261579, so 4 lines have been lost somewhere in the file.
Given this, I want to know what exactly happens in the "file.writelines(lines)" method when I use it to save my data to a split file.

I had the same issue and found out that I should have closed my file. Writes are buffered, so the last lines may not reach the disk until the file is closed. For you:
split.close()
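The same effect can be had without remembering to call close(). A minimal sketch, assuming a hypothetical helper name write_split and an illustrative temporary path (neither is from the question); the with statement flushes and closes the file when the block ends, even if an error occurs:

```python
import os
import tempfile


def write_split(path, lines):
    """Write lines to path and return how many were written."""
    with open(path, "w") as split:
        split.writelines(lines)
    # The file is closed here, so all buffered data has been flushed to disk.
    return len(lines)


# Hypothetical data and location, for illustration only.
demo_lines = ["row %d\n" % i for i in range(1000)]
demo_path = os.path.join(tempfile.mkdtemp(), "split1.csv")
print(write_split(demo_path, demo_lines), "lines written")
```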

Related

Minor bug in code written to format a text file (incorrect spacing) (Python 3)

New to coding so sorry if this is a silly question.
I have some text that I'm attempting to format to make it more pleasant to read, so I tried my hand at writing a short program in Python to do it for me. I initially removed extra paragraph breaks in MS-Word using the find-and-replace option. The input text looks something like this:
This is a sentence. So is this one. And this.
(empty line)
This is the next line
(empty line)
and some lines are like this.
I want to eliminate all empty lines, so that there is no spacing between lines, and ensure no sentences are left hanging mid-way like in the bit above. All new lines should begin with 2 (two) empty spaces, represented by the $ symbol below. So after formatting it should look something like this:
$$This is a sentence. So is this one. And this.
$$This is the next line and some lines are like this.
I wrote the following script:
import os
directory = "C:/Users/DELL/Desktop/"
filename = "test.txt"
path = os.path.join(directory, filename)
with open(path,"r") as f_in, open(directory+"output.txt","w+") as f_out:
    temp = "  "
    for line in f_in:
        curr_line = line.strip()
        temp += curr_line
        #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
        if curr_line:
            if temp[-1]==".": #check if sentence is complete
                f_out.write(temp)
                temp = "\n  " #two blank spaces here
It eliminates all blank lines, indents new lines by two spaces, and joins hanging sentences, but it doesn't insert the necessary blank space, so the output currently looks like this (missing space between the words "line" and "and"):
$$This is a sentence. So is this one. And this.
$$This is the next lineand some lines are like this.
I tried to fix this by changing the following lines of code to read as follows:
temp += " " + curr_line
temp = "\n " #one space instead of two
and that doesn't work, and I'm not entirely sure why. It might be an issue with the text itself but I'll check on that.
Any advice would be appreciated, and if there is a better way to do what I want than this convoluted mess that I wrote, then I would like to know that as well.
EDIT: I seem to have fixed it. In my text (very long so I didn't notice it at first) there were two lines separated by 2 (two) empty lines, and so my attempt at fixing it didn't work. I moved one line a bit further below to give the following loop, which seems to have fixed it:
for line in f_in:
    curr_line = line.strip()
    #print("Current line:\n%s\n\ntemp line: %s" % (curr_line, temp))
    if curr_line:
        temp += " " + curr_line
        if temp[-1]==".": #check if sentence is complete
            f_out.write(temp)
            temp = "\n "
I also saw that an answer below initially had a bit of Regex in it, I'll have to learn that at some point in the future I suppose.
Thanks for the help everyone.
This should work. It's effectively the same as yours but a bit more efficient: instead of repeated string concatenation with + and += (which is slow), it saves the pieces of an incomplete sentence in a list. It then writes 2 spaces, the pieces joined by spaces, and a newline. Writing only when a sentence is complete keeps the logic simple.
temp = []
with open(path_in, "r") as f_in, open(path_out, "w") as f_out:
    for line in f_in:
        curr_line = line.strip()
        if curr_line:
            temp.append(curr_line)
            if curr_line.endswith('.'): # write our line
                f_out.write('  ')
                f_out.write(' '.join(temp))
                f_out.write('\n')
                temp.clear() # reset temp
outputs
  This is a sentence. So is this one. And this.
  This is the next line and some lines are like this.
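One edge case worth noting: if the input ends with a fragment that has no closing period, the loop in the answer never writes it. A sketch of the same list-based approach as a function that also emits any leftover fragment; the name reflow and the decision to keep a trailing fragment are assumptions, not part of the answer:

```python
def reflow(lines, indent=" " * 2):
    """Join stripped, non-empty lines into indented sentences ending in '.'."""
    out, temp = [], []
    for line in lines:
        curr_line = line.strip()
        if not curr_line:
            continue  # skip blank lines
        temp.append(curr_line)
        if curr_line.endswith("."):
            out.append(indent + " ".join(temp))
            temp.clear()
    if temp:  # trailing fragment without a final period
        out.append(indent + " ".join(temp))
    return out


# The sample text from the question.
sample = [
    "This is a sentence. So is this one. And this.\n",
    "\n",
    "This is the next line\n",
    "\n",
    "and some lines are like this.\n",
]
print("\n".join(reflow(sample)))
```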

Python: Formatting the way my program writes arrays to txt file

I am trying to get my program to write the first item of each array to the first line of a text file, then the second item of each array to the second line, and so on.
The code I have now writes all the info on a single line of text.
def write():
    outFile=open("Inventory.txt","w")
    for i in range(0,len(clothesItem)):
        outFile.write(format(clothesItem[i],ITEM_FORMAT)+format(clothesColor[i],COLOR_FORMAT)+format(clothesAmount[i],AMOUNT_FORMAT))
    outFile.close()
Change this line:
outFile.write(format(clothesItem[i],ITEM_FORMAT)+format(clothesColor[i],COLOR_FORMAT)+format(clothesAmount[i],AMOUNT_FORMAT))
To the following:
outFile.write(format(clothesItem[i], ITEM_FORMAT) + format(clothesColor[i], COLOR_FORMAT) + format(clothesAmount[i], AMOUNT_FORMAT) + "\n")
Note the + "\n" added onto the end.
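For reference, a sketch of the whole function with the newline in place. The format specs and the function names format_rows and write_inventory are hypothetical, since the question does not show how ITEM_FORMAT, COLOR_FORMAT, and AMOUNT_FORMAT are defined; zip walks the three lists together and the with statement closes the file:

```python
# Hypothetical format specs; the originals are not shown in the question.
ITEM_FORMAT = "<12"
COLOR_FORMAT = "<10"
AMOUNT_FORMAT = ">5"


def format_rows(items, colors, amounts):
    """Return one formatted, newline-terminated string per inventory entry."""
    return [
        format(item, ITEM_FORMAT)
        + format(color, COLOR_FORMAT)
        + format(amount, AMOUNT_FORMAT)
        + "\n"
        for item, color, amount in zip(items, colors, amounts)
    ]


def write_inventory(path, items, colors, amounts):
    """Write each entry on its own line."""
    with open(path, "w") as out_file:
        out_file.writelines(format_rows(items, colors, amounts))
```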

Gathering data from huge text files

I have a text file composed of several consecutive tables. I need to get certain values from certain tables and save them in an output file. Every table has a header which contains a string that can be used to find specific tables. The size of these text files can vary from tens of MB to several GB. I have written the following script to do the job:
string = 'str'
index = 20
n = 2
in_file = open('file.txt')
out_file = open("out.txt", 'w')
current_line = 0
for i in range(-index,index+1):
    for j in range(-index,index+1):
        for line in in_file:
            if string in line:
                En = line.split().pop(4)
                for line in in_file:
                    current_line += 1
                    if current_line == 2*(n+1)+2:
                        x = line.split().pop(10)
                    elif current_line == 3*(n+1)+2:
                        y = line.split().pop(10)
                    elif current_line == 4*(n+1)+2:
                        z = line.split().pop(10)
                        current_line = 0
                        break
                print(i, j, En, x, y, z)
                data = "%d %d %s %s %s %s\n" % (i,j,En,x,y,z)
                out_file.write(data)
                break
in_file.close()
out_file.close()
The script reads the file line by line, searching for the specified string ('str' in this example). When it is found, the script extracts a value from the line containing the string and continues reading the lines that form the data table itself. Since all the tables in the file have the same number of lines and columns, I've used the variable current_line to keep track of which line is being read and to specify which line contains the data I need. The first two for-loops are just there to generate a pair of indexes that I need printed in the output file (in this case they run from -20 to 20).
The script works fine. But since I've been learning Python by myself for about one month, and the files I have to handle can be very big, I'm asking for advice on how to make the script more efficient and, overall, better.
Also, since the tables are regular, I know beforehand which lines contain the values I need. So I was wondering: instead of reading all the lines in the file, is it possible to specify which lines have to be read and then jump directly between them?
Sample input file
Here's a sample input file. I've included just some tables so you can get an idea of how it's organized. This file is composed of two blocks with three tables each. In this sample file, the string "table #" is what is used to find the data to be extracted.
Sample output file
And here's a sample output file. Keep in mind that these two files are not equivalent! This output was created by my script using an input file containing 1681 blocks of 16 tables. Each table had 13 lines just as in the sample input file.
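On the question of jumping directly to known lines: a text file is addressed by byte offset, so in general the lines before a target still have to be read unless every line has the same byte length. What can be avoided is processing them. A sketch, assuming the layout described above (a header line containing a marker string, followed by a fixed number of table lines); extract_rows and its parameter names are hypothetical, and itertools.islice is used to consume each table in one step:

```python
from itertools import islice


def extract_rows(lines, marker, table_len, wanted):
    """Yield (header, picked_lines) for each table found.

    lines:     iterable of text lines, for example an open file
    marker:    substring that identifies a table header
    table_len: number of lines in each table after its header
    wanted:    1-based offsets (after the header) of the lines to keep
    """
    it = iter(lines)
    for line in it:
        if marker in line:
            body = list(islice(it, table_len))  # consume one whole table
            yield line, [body[k - 1] for k in wanted if k <= len(body)]
```

Because an open file is itself an iterator over lines, passing the file object keeps only one table in memory at a time.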

Read lines from .txt and if the first and last char equals to X and Y, add some text after that string

I'm trying to solve a problem that consists of the following steps:
open a .txt input file
read the lines of that .txt
store the line values in a list
check if lin.startswith("2") and lin.endswith("|")
if that is true, then lin2 = lin + "ISENTO"
write the edited lines to an output .txt file
Here's what I've got so far...
def editTxt():
    #open .txt file and read the lines
    filename = askopenfilename(filetypes=[("Text files","*.txt"), ("Text files","*.TXT")])
    infile = open(filename, 'r')
    infile.read()
    #save each line in a list called "linhas" outside the editTxt function
    with open(filename, 'r') as f:
        linhas = f.readlines()
    #create the output file
    outfile = open(filename + "_edit.txt", 'w')
    #checking the condition and writing the edited lines
    for linha in linhas:
        if linha.startswith("2") and linha.endswith("|"):
            linha = linha + "ISENTO"
        outfile.write(linha)
    #close files
    outfile.close()
    infile.close()
The problem is that my output file is exactly the same as my input file...
I've already tried to use if linha[0] == "2" and linha[len(linha)-1] == "|"
but then I figured out that f.readlines() keeps the \n at the end of each string...
so I tried if linha[0] == "2" and linha[len(linha)-3] == "|"
but that didn't work either...
Some people told me that I should use the replace function, but I couldn't figure out how.
The real file example:
lin1: 10|1,00|55591283000185|02/03/2015|31/03/2015
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA
MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|
lin3: 30|1103|DAT 05MM - 5.102||PC|1,0000|19,9000|19,90|090|0,00|0,00|0,00
I just need to change lin2, because it starts with "2" and ends with "|".
What I need after running the editTxt function:
lin2: 20|I||VENDA|0|9977|02/03/2015 00:00:00|02/03/2015 11:48:00|1|5102|||07786273000152|OBSERVE SEGURANCA LTDA|RUA
MARINGA,|2150||BOA VISTA|RIBEIRAO PRETO|SP|14025560||39121530|ISENTO
Please, Python experts, show me an easier way to do this with different code or, preferably, explain to me what's wrong with mine.
Thanks!
You were very close with your last attempt.
The '\n' line terminator is not literally the characters '\' and 'n'. It's a special character that is represented, for convenience, by '\n'. So it's only one character in your string, not two.
Hopefully, that should give you enough of a hint to figure out how to change your code :)
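A sketch of one way to apply that hint: strip the line ending before testing, then put it back. The helper name add_isento is hypothetical and this is not the asker's code; it takes a list such as the one returned by readlines():

```python
def add_isento(lines):
    """Append 'ISENTO' to lines that start with '2' and end with '|'."""
    edited = []
    for linha in lines:
        body = linha.rstrip("\r\n")   # line content without its line ending
        ending = linha[len(body):]    # the original line ending, if any
        if body.startswith("2") and body.endswith("|"):
            body += "ISENTO"
        edited.append(body + ending)
    return edited
```

The result can then be passed to outfile.writelines(), since each line keeps its own line ending.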

Calculating size of content

Below is the code snippet. I have a file.
f = open(self.reportSavePath,'w')
self.test = '';
for file in file_sorted:
    f.write(str(os.path.getmtime(file)) + "|" + file + "\r\n")
    self.test = self.test + str(os.path.getmtime(file)) + "|" + file + "\r\n"
f.close()
print("Size:",os.path.getsize(self.reportSavePath)) #Without opening file
print("Calculated size:",len(self.test.encode())) #After reading the contents
My question is: why do the last two lines give different output? Should they not be the same?
If there is a reason, how can I edit the line with the comment #Without opening file so that its output matches the line commented #After reading the contents?
You're comparing apples and oranges.
os.path.getsize returns the file size in bytes. len(some_string) returns the length of the string in characters, regardless of encoding; the number of bytes that end up on disk depends on how the string is encoded when it is written.
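In the question the string is already encoded before len is taken, so a remaining mismatch most likely comes from how the file was opened. In text mode Python translates each "\n" written to the platform line separator (so "\r\n" is stored as "\r\r\n" on Windows) and uses the platform default encoding unless one is given, while str.encode() defaults to UTF-8. A sketch, assuming a hypothetical sample line and temporary paths, that reproduces both cases on any platform:

```python
import os
import tempfile

# Hypothetical line with the same shape as the ones written in the question.
text = "1425300000.0|report.txt\r\n"
folder = tempfile.mkdtemp()

# newline="" disables newline translation; the encoding is stated explicitly.
exact = os.path.join(folder, "exact.txt")
with open(exact, "w", newline="", encoding="utf-8") as f:
    f.write(text)

# newline="\r\n" mimics Windows text mode: each "\n" written becomes "\r\n".
translated = os.path.join(folder, "translated.txt")
with open(translated, "w", newline="\r\n", encoding="utf-8") as f:
    f.write(text)

expected = len(text.encode("utf-8"))
print(os.path.getsize(exact) - expected)       # 0
print(os.path.getsize(translated) - expected)  # 1, "\r\n" was stored as "\r\r\n"
```

Opening the report file with newline="" and an explicit encoding, and encoding the in-memory string the same way, should make the two numbers agree.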
