Lines were dropped when merging two text files using the write() method - python

I started with relevant_files, which was a list of paths of two CSV files. I then attempted to create a file with path output_filename, using the following block of code.
new_file = open(output_filename, 'w+')
for x in relevant_files:
    for line in open(x):
        new_file.write(line)
The code looks perfectly reasonable, but on a whim I decided to check the line counts before and after the merge. file_1 had 6,740,108 lines and file_2 had 4,938,459 lines. Those sum to 11,678,567. However, the new file has 11,678,550 lines, which is 17 lines shorter than the combined length of the two source files. I then checked the CSV files by hand: indeed, it was exactly the final 17 lines of the 2nd text file (i.e., the 2nd entry in relevant_files) that had been dropped.
What went wrong? Is there a maximum file length or something?

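I'm not sure exactly what is wrong with your script, but the likely culprit is that new_file is never closed: writes are buffered, so if you inspect the output before the file has been closed or flushed, the last chunk of lines may not have reached the disk yet. It's good practice to use with statements when working with files in Python. They close the file for you when the block ends, which it seems you haven't done here.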
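with open(output_filename, 'w') as f:
    for path in relevant_files:
        # The nested with closes each input file once it has been copied.
        with open(path, 'r') as src:
            for line in src:
                f.write(line)
# Leaving the outer with block closes f, which flushes any buffered lines.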
This is what I would use to complete your task.
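To check that nothing is lost, the line counts can be compared after the output file has been closed. The following is a minimal sketch under assumptions: count_lines and merge_files are hypothetical helper names, and the two small CSV files are created in a temporary folder purely for illustration.

```python
import os
import tempfile


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)


def merge_files(paths, out_path):
    """Concatenate the files in paths into out_path, line by line."""
    with open(out_path, "w") as out:
        for path in paths:
            with open(path) as src:
                for line in src:
                    out.write(line)
    # out is closed here, so everything has been flushed to disk.


# Throwaway input files (made-up data, 5 and 3 lines).
workdir = tempfile.mkdtemp()
relevant_files = []
for name, n in (("file_1.csv", 5), ("file_2.csv", 3)):
    path = os.path.join(workdir, name)
    with open(path, "w") as f:
        f.writelines(f"{name},{i}\n" for i in range(n))
    relevant_files.append(path)

output_filename = os.path.join(workdir, "merged.csv")
merge_files(relevant_files, output_filename)

expected = sum(count_lines(p) for p in relevant_files)
actual = count_lines(output_filename)
print(expected, actual)
```

Counting only after the with block has finished matters here: a count taken while the output file is still open may miss lines that are still sitting in the write buffer.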

Related

How to remove the first n lines from multiple files to a single output file using any windows program E.g Python

I use the Python script below to remove the first n lines of all text files in a folder. I want the deleted lines to be sent to a single output file.
Here is my code:
import glob
myfiles = glob.glob("*.txt")
for file in myfiles:
    lines = open(file).readlines()
    open(file, 'w').writelines(lines[4:])
I'm not providing a full written programming answer, because there are plenty of those to be found on Stack Exchange if you look there. Instead, here are some hints to get you to where you hopefully need to be.
First, select the lines you want to save (the ones being removed) from the file after you read it.
mylines = []  # Collects the removed lines.
for myline in lines:  # For each line, stored as myline,
    # It's up to you to figure out how to only do the first 4...
    mylines.append(myline)  # mylines now contains the lines you want.
# removed_lines is the path of the output file; define it before this point.
with open(removed_lines, 'a') as out:
    out.writelines(mylines)  # Save the 'removed lines'.
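Putting the hints together, one possible sketch looks like this. It is an illustration rather than the answerer's own solution: strip_head is a hypothetical helper name, the demo files are created in a temporary folder, and the output file is deliberately given a name that "*.txt" does not match so that it is not processed itself.

```python
import glob
import os
import tempfile


def strip_head(paths, n, removed_path):
    """Drop the first n lines of each file and append them to removed_path."""
    with open(removed_path, "a") as removed:
        for path in paths:
            with open(path) as src:
                lines = src.readlines()
            removed.writelines(lines[:n])   # the lines being cut off
            with open(path, "w") as dst:
                dst.writelines(lines[n:])   # what remains in the file


# Demo on throwaway files so nothing real is modified.
workdir = tempfile.mkdtemp()
for name in ("a.txt", "b.txt"):
    with open(os.path.join(workdir, name), "w") as f:
        f.writelines(f"{name} line {i}\n" for i in range(6))

paths = sorted(glob.glob(os.path.join(workdir, "*.txt")))
removed_path = os.path.join(workdir, "removed.log")  # not *.txt, so the glob skips it
strip_head(paths, 4, removed_path)

with open(removed_path) as f:
    removed_lines = f.readlines()
with open(paths[0]) as f:
    remaining = f.readlines()
print(len(removed_lines), remaining)
```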

IndexError when printing with readlines()

I keep encountering an index error when trying to print a line from a text file. I'm new to python and I'm still trying to learn so I'd appreciate if you can try to be patient with me; if there is something else needed from me, please let me know!
The traceback reads as
...
print(f2.readlines()[1]):
IndexError: list index out of range
When trying to print line 2 (...[1]), I am getting this out of range error.
Here's the current script.
with open("f2.txt", "r") as f2:
    print(f2.readlines()[1])
There are 3 lines with text in the file.
contents of f2.txt
peaqwenasd
lasnebsat
kikaswmors
It seems that f2.seek(0) was necessary here to solve the issue.
with open("f2.txt", "r") as f2:
    f2.seek(0)
    print(f2.readlines()[1])
You haven't given all the code needed to solve your problem, but your given symptoms point to multiple calls to readlines.
Read the documentation: readlines() reads the entire file and returns a list of the contents. As a consequence, the file pointer is at the end of the file. If you call readlines() again at this point, it returns an empty list.
You apparently have a readlines() call before the code you gave us. seek(0) resets the file pointer to the start of the file, and you're reading the entire file a second time.
There are many tutorials that show you canonical ways to iterate through the contents of a file. I strongly recommend that you use one of those. For instance:
with open("f2.txt", "r") as f2:
    for line in f2.readlines():
        # Here you can work with the lines in sequence
        print(line, end="")
If you need to deal with the lines in non-sequential order, then
with open("f2.txt", "r") as f2:
    content = f2.readlines()  # readlines() already returns a list
# Now you can access content[2], content[1], etc.
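The file-pointer behaviour described in this answer can be reproduced with a short experiment. This is a sketch that writes the three lines from the question to a temporary copy of f2.txt (the temporary location is an assumption made for the demo), then calls readlines() twice before and once after seek(0).

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "f2.txt")
with open(path, "w") as f:
    f.write("peaqwenasd\nlasnebsat\nkikaswmors\n")

with open(path, "r") as f2:
    first = f2.readlines()    # reads everything; the pointer is now at the end
    second = f2.readlines()   # nothing left to read, so this is an empty list
    f2.seek(0)                # rewind to the start of the file
    third = f2.readlines()    # the full contents again

print(len(first), second, third[1].strip())
```

Indexing second[1] here would raise the same IndexError as in the question, because second is an empty list.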

Attempting to merge several CSV files, program hangs on seemingly arbitrary file

I have a bunch of CSV files with common columns but different rows. I want to merge them all into one CSV file. Here is the script I wrote to do that
import glob, os
os.chdir("./data")
fout = open("merged.csv", "a")
lout = open("merger_log", "a")
for fname in glob.glob("*.csv*"):
    with open(fname) as f:
        # exclude header for all but the first csv file.
        if os.stat("merged.csv").st_size > 0:
            next(f)
        fout.writelines(f)
        log = "Appended %s \n" % fname
        print(log)
        lout.write(log)
fout.close()
lout.close()
When I run this script, it successfully appends the first few files but gets stuck on one file every time. And by stuck it seems to be adding bits from said file to the output file without moving on to the next file. There's nothing special about the file it stops on, it's about the same size as the rest of them and is not malformed. In fact, I removed that file from the data set and the program hung on a different file. Not sure what is wrong with this script.
If anyone has a better way to merge a bunch of CSV files, I'm all ears.
Thanks!
EDIT: I should mention this script works perfectly fine with just two files.
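One possible explanation, offered as a guess rather than a confirmed diagnosis: merged.csv is created inside ./data before glob runs, and the pattern "*.csv*" matches it, so at some point the script reads merged.csv while also appending to it. The sketch below avoids that by leaving the output file out of the input list and by skipping the header of every file except the first. merge_csv is a hypothetical helper name, the demo files are made up, and each input file is assumed to end with a newline.

```python
import glob
import os
import tempfile


def merge_csv(folder, out_name="merged.csv"):
    """Merge every CSV in folder into out_name, keeping only the first header."""
    out_path = os.path.join(folder, out_name)
    # Build the input list first and leave the output file out of it.
    inputs = sorted(
        p for p in glob.glob(os.path.join(folder, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    with open(out_path, "w") as fout:
        for i, fname in enumerate(inputs):
            with open(fname) as f:
                if i > 0:
                    next(f, None)   # skip the header of every file but the first
                fout.writelines(f)
    return out_path


# Two small made-up CSV files sharing the header "x,y".
folder = tempfile.mkdtemp()
for name, rows in (("a.csv", ["1,2\n"]), ("b.csv", ["3,4\n", "5,6\n"])):
    with open(os.path.join(folder, name), "w") as f:
        f.writelines(["x,y\n"] + rows)

with open(merge_csv(folder)) as f:
    merged = f.readlines()
print(merged)
```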

Replace string in specific line of nonstandard text file

Similar to the posting "Replace string in a specific line using python"; however, results were not forthcoming in my slightly different instance.
I am working with Python 3 on Windows 7. I am attempting to batch edit some files in a directory. They are basically text files with a .LIC extension. I'm not sure if that is relevant to my issue here. I am able to read the file into Python without issue.
My aim is to replace a specific string on a specific line in this file.
import os
import re
groupname = 'Oldtext'
aliasname = 'Newtext'
with open('filename') as f:
    data = f.readlines()
    data[1] = re.sub(groupname,aliasname, data[1])
    f.writelines(data[1])
print(data[1])
print('done')
When running the above code I get an UnsupportedOperation: not writable error, so I am having trouble writing the changes back to the file. Based on suggestions in other posts, I added the w option, i.e. open('filename', "w"). This causes all text in the file to be deleted.
Based on suggestion, the r+ option was tried. This leads to successful editing of the file, however, instead of editing the correct line, the edited line is appended to the end of the file, leaving the original intact.
Writing a changed line into the middle of a text file is not going to work unless it's exactly the same length as the original - which is the case in your example, but you've got some obvious placeholder text there so I have no idea if the same is true of your actual application code. Here's an approach that doesn't make any such assumption:
with open('filename', 'r') as f:
    data = f.readlines()
data[1] = re.sub(groupname,aliasname, data[1])
with open('filename', 'w') as f:
    f.writelines(data)
EDIT: If you really wanted to write only the single line back into the file, you'd need to use f.tell() BEFORE reading the line, to remember its position within the file, and then f.seek() to go back to that position before writing.
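The tell()/seek() idea from the EDIT could look roughly like this. It is a sketch under assumptions: replace_on_line is a hypothetical helper name, the file is opened in binary mode so that positions are plain byte offsets, a simple bytes.replace stands in for re.sub, and the example file contents are invented. It only works when the old and new text have the same length in bytes.

```python
import os
import tempfile


def replace_on_line(path, line_index, old, new):
    """Overwrite old with new on one line in place; both must be the same size."""
    old_b, new_b = old.encode("utf-8"), new.encode("utf-8")
    if len(old_b) != len(new_b):
        raise ValueError("in-place editing needs equal-length strings")
    with open(path, "r+b") as f:       # binary mode keeps byte offsets exact
        for _ in range(line_index):
            f.readline()               # skip the lines before the target
        pos = f.tell()                 # remember where the target line starts
        line = f.readline()
        f.seek(pos)                    # go back to that position before writing
        f.write(line.replace(old_b, new_b))


# Made-up example file; only line index 1 contains the text to replace.
path = os.path.join(tempfile.mkdtemp(), "example.LIC")
with open(path, "w") as f:
    f.write("header\nGROUP Oldtext 1\nfooter\n")

replace_on_line(path, 1, "Oldtext", "Newtext")
with open(path) as f:
    result = f.read()
print(result)
```

If the replacement can be longer or shorter than the original, rewriting the whole file as in the answer above is the safer route.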

How to read this particular file format?

I have the following text in a csv file:
b'DataMart\n\nDate/Time Generated,11/7/16 8:54 PM\nReport Time Zone,America/New_York\nAccount ID,8967\nDate Range,10/8/16 - 11/6/16\n\nReport Fields\nSite (DCM),Creative\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter\nGlobest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter'
Essentially, the file contains multiple newline characters rather than being one single long string, so you can picture the same text as follows:
DataMart
Date/Time Generated,11/7/16 8:54 PM
Report Time Zone,America/New_York
Account ID,8967
Date Range,10/8/16 - 11/6/16
Report Fields
Site (DCM),Creative
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter
I need to grab the last two lines, which is basically the data. I tried doing a for loop:
with open('file.csv','r') as f:
    for line in f:
        print(line)
It instead prints the entire line again with \n.
Just read the file and get the last two lines:
my_file = open("/path/to/file").read()  # file() only exists in Python 2; open() works in both
print(my_file.splitlines()[-2:])
The [-2:] is known as slicing: it creates a slice, starting from the second to last element, going to the end.
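As a quick illustration of that slice, here is a sketch that applies splitlines() and [-2:] to the sample text from the question. It assumes the text has already been read into a regular string with real newline characters; the variable names raw, lines and last_two are made up for the example.

```python
raw = (
    "DataMart\n\nDate/Time Generated,11/7/16 8:54 PM\n"
    "Report Time Zone,America/New_York\nAccount ID,8967\n"
    "Date Range,10/8/16 - 11/6/16\n\nReport Fields\n"
    "Site (DCM),Creative\n"
    "Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter\n"
    "Globest.com,2016-08_CB_018_1040x320_Globe St_16_PropertyFilter"
)

lines = raw.splitlines()   # split on the newline characters
last_two = lines[-2:]      # second-to-last element through the end
for row in last_two:
    print(row.split(","))  # each data row has a site field and a creative field
```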
OK, after struggling for a bit, I found out that I need to decode the file's contents from binary to 'utf-8' text, and then I can apply the split functions. The problem was that the split functions are not applicable to the binary file.
This is the actual code that seems to be working for me now:
with open('BinaryFile.csv','rb') as f1:
    data=f1.read()
    text=data.decode('utf-8')
with open('TextFile.csv', 'w') as f2:
    f2.write(text)
with open('TextFile.csv','r') as f3:
    for line in f3:
        print(line.split('\\n')[9:])
thanks for your help guys
