Creating text files, appending them to a zip, then deleting them - python

I am trying to get the code below to read the file raw.txt, split it by lines and save every individual line as a .txt file. I then want to append every text file to splits.zip, and delete them after appending so that the only thing remaining when the process is done is the splits.zip, which can then be moved elsewhere to be unzipped. With the current code, I get the following error:
Traceback (most recent call last):
  File "/Users/Simon/PycharmProjects/text-tools/file-splitter-txt.py", line 13, in <module>
    z.write(new_file)
  File "/usr/local/Cellar/python/2.7.12_2/Frameworks/Python.framework/Versions/2.7/lib/python2.7/zipfile.py", line 1123, in write
    st = os.stat(filename)
TypeError: coercing to Unicode: need string or buffer, file found
My code:
import zipfile
import os
z = zipfile.ZipFile("splits.zip", "w")
count = 0
with open('raw.txt', 'r') as infile:
    for line in infile:
        print line
        count += 1
        with open(str(count) + '.txt', 'w') as new_file:
            new_file.write(str(line))
        z.write(new_file)
        os.remove(new_file)

You could simply use writestr to write a string directly into the ZipFile. For example:
zf.writestr(str(count) + '.txt', str(line), compress_type=...)

Use the file name as shown below. The write method expects a filename, and os.remove expects a path, but you passed in the file object (new_file) itself:
z.write(str(count) + '.txt')
os.remove(str(count) + '.txt')
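Putting the two answers together, here is a minimal sketch of the whole split-and-zip flow using writestr, which skips the temporary files entirely (the demo contents of raw.txt are an assumption; in the question the file already exists):

```python
import zipfile

# Demo input -- in the question, raw.txt already exists.
with open("raw.txt", "w") as f:
    f.write("alpha\nbeta\ngamma\n")

# Write each line straight into the archive: nothing to delete afterwards.
with zipfile.ZipFile("splits.zip", "w") as z:
    with open("raw.txt") as infile:
        for count, line in enumerate(infile, start=1):
            z.writestr(str(count) + ".txt", line)
```

The only artifact left on disk besides the input is splits.zip, which can be moved elsewhere and unzipped.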

Related

How can I simultaneously sort and unique files from a directory using Python?

I am trying to sort and de-duplicate the contents of 30 files of different sizes into one single file.
Each file contains plain text, one entry per line, separated by newlines.
Here is what I tried to attempt:
lines_seen = set()  # holds lines already seen
outfile = open('out.txt', "w")
for line in open('d:\\testing\\*', "r"):
    if line not in lines_seen:  # not a duplicate
        outfile.write(line)
        lines_seen.add(line)
outfile.close()
The folder name is testing and it contains 30 different files, which I am trying to combine into file out.txt. The output will be the sorted and unique text, written on each line of the output file.
Well, I thought it would be easy: writing d:\\testing\\* would read all the files from the folder. But I got this error:
Traceback (most recent call last):
  File "sort and unique.py", line 3, in <module>
    for line in open('d:\\testing\\*', "r"):
OSError: [Errno 22] Invalid argument: 'd:\\testing\\*'
I would like to know how I can get rid of this error and process all my files efficiently into one single output file without failures.
Please note: RAM is 8 GB and the folder size is about 10 GB.
You just need to loop over all files using os.listdir. Something like this:
import os

lines_seen = set()  # holds lines already seen
outfile = open('out.txt', "w")
path = r'd:\testing'
for file in os.listdir(path):  # added this line
    current_file = os.path.join(path, file)
    for line in open(current_file, "r"):
        if line not in lines_seen:  # not a duplicate
            outfile.write(line)
            lines_seen.add(line)
outfile.close()
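Since the question also asks for sorted output, one way (a sketch under the assumption that the unique lines fit in RAM, which with 8 GB of RAM and 10 GB of input holds only if there is heavy duplication) is to collect the set first and write it out sorted at the end; the "testing" directory and its contents below are demo assumptions standing in for d:\testing:

```python
import os

def merge_sorted_unique(src_dir, out_path):
    """Collect every distinct line from all files in src_dir,
    then write them out once, in sorted order."""
    lines_seen = set()
    for name in os.listdir(src_dir):
        with open(os.path.join(src_dir, name)) as f:
            lines_seen.update(f)  # adds each line of the file to the set
    with open(out_path, "w") as outfile:
        outfile.writelines(sorted(lines_seen))

# Demo setup -- the question's folder was d:\testing with 30 files.
os.makedirs("testing", exist_ok=True)
with open(os.path.join("testing", "a.txt"), "w") as f:
    f.write("pear\napple\n")
with open(os.path.join("testing", "b.txt"), "w") as f:
    f.write("apple\nmango\n")

merge_sorted_unique("testing", "out.txt")
```

If the unique lines do not fit in memory, an external merge sort (e.g. sorting chunks to temporary files and merging with heapq.merge) would be needed instead.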

TypeError - What does this error mean?

So, I've been writing a program that takes an HTML file, replaces some text, and writes the result to a different file in a different directory.
This error happened.
Traceback (most recent call last):
  File "/Users/Glenn/jack/HTML_Task/src/HTML Rewriter.py", line 19, in <module>
    with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/posixpath.py", line 89, in join
    genericpath._check_arg_types('join', a, *p)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/genericpath.py", line 143, in _check_arg_types
    (funcname, s.__class__.__name__)) from None
TypeError: join() argument must be str or bytes, not 'TextIOWrapper'
Below is my code. Has anyone got any solutions I could implement, or should I kill it with fire.
import re
import os

os.mkdir("dest")
file = open("2016-06-06_UK_BackToSchool.html").read()
text_filtered = re.sub(r'http://', '/', file)
print(text_filtered)
with open("2016-06-06_UK_BackToSchool.html", "wt") as out_file:
    print("testtesttest")
    with open(os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
        out_file.write(text_filtered)
os.rename("/Users/Glenn/jack/HTML_Task/src/2016-06-06_UK_BackToSchool.html", "/Users/Glenn/jack/HTML_Task/src/dest/2016-06-06_UK_BackToSchool.html")
with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
with open (os.path.join("/Users/Glenn/jack/HTML_Task/src", out_file)):
Here out_file is a TextIOWrapper, not a string; os.path.join takes strings as its arguments.
Also, do not shadow built-in names with your variables: file is a built-in name.
And do not put a space between the function name and the call parentheses: write os.mkdir("dest").
Try changing this:
with open("2016-06-06_UK_BackToSchool.html", "wt") as out_file:
to this:
with open("2016-06-06_UK_BackToSchool.html", "w") as out_file:
or this:
with open("2016-06-06_UK_BackToSchool.html", "wb") as out_file:
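Putting those points together, here is a sketch of the whole read-rewrite-save flow that avoids joining a file object into a path at all (the file name comes from the question; the sample HTML content is an assumption so the sketch can run anywhere):

```python
import os
import re

src_name = "2016-06-06_UK_BackToSchool.html"  # file name from the question

# Demo setup so the sketch is self-contained.
with open(src_name, "w") as f:
    f.write('<a href="http://example.com/page">link</a>')

os.makedirs("dest", exist_ok=True)
with open(src_name) as src:                    # no shadowing of built-in 'file'
    text_filtered = re.sub(r"http://", "/", src.read())

# Write the rewritten HTML straight into the destination directory,
# joining strings -- not a TextIOWrapper -- into the path.
with open(os.path.join("dest", src_name), "w") as out_file:
    out_file.write(text_filtered)
```

This also removes the need for the later os.rename step, since the output is created in dest/ to begin with.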

Using os.path.join with os.path.getsize, returning FileNotFoundError

In conjunction with my last question, I'm now printing the filenames with their sizes next to them in a sort of list. Basically I am reading filenames from one file (which are added by the user), then joining each filename onto the working directory's path so I can print its size one by one. However, I'm having an issue with the following block:
print("\n--- Stats ---\n")
with open('userdata/addedfiles', 'r') as read_files:
    file_lines = read_files.readlines()
    # get path for each file and find in trackedfiles
    # use path to get size
    print(len(file_lines), "files\n")
    for file_name in file_lines:
        # the actual files should be in the same working directory
        cwd = os.getcwd()
        fpath = os.path.join(cwd, file_name)
        fsize = os.path.getsize(fpath)
        print(file_name.strip(), "-- size:", fsize)
which is returning this error:
tolbiac wpm-public → ./main.py --filestatus
--- Stats ---
1 files
Traceback (most recent call last):
  File "./main.py", line 332, in <module>
    main()
  File "./main.py", line 323, in main
    parseargs()
  File "./main.py", line 317, in parseargs
    tracking()
  File "./main.py", line 204, in tracking
    fsize = os.path.getsize(fpath)
  File "/usr/lib/python3.4/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/home/tolbiac/code/wpm-public/file.txt\n'
tolbiac wpm-public →
So it looks like something is adding a \n to the end of file_name. I'm not sure if that's something done inside getsize; I tried os.stat instead, but it did the same thing.
Any suggestions? Thanks.
When you're reading in a file, you need to be aware of how the data is separated. In this case, the read-in file has one filename per line, each terminated by a \n newline character. You need to strip that off before you use the name.
for file_name in file_lines:
    file_name = file_name.strip()
    # rest of for loop
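A minimal, self-contained sketch of the corrected loop (the file names and contents below are demo assumptions standing in for userdata/addedfiles):

```python
import os

# readlines() keeps the trailing "\n", so the joined path would end in a
# newline and getsize would raise FileNotFoundError -- strip each name first.
with open("addedfiles", "w") as f:   # demo list of tracked files
    f.write("file.txt\n")
with open("file.txt", "w") as f:     # the tracked file itself
    f.write("hello")

sizes = {}
with open("addedfiles") as read_files:
    for file_name in read_files.readlines():
        file_name = file_name.strip()            # drop the newline
        fpath = os.path.join(os.getcwd(), file_name)
        sizes[file_name] = os.path.getsize(fpath)
        print(file_name, "-- size:", sizes[file_name])
```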

Reading in a filename, manipulating it then saving it

Basically I am reading in a file referenced by Sounding, whose name is '12142014_2345.csv', and I want to save the output as
'12142014_2345_Averaged.csv'
Below is the code I have leading up to it.
basename = os.path.basename(Sounding)
basename, ext = os.path.split(basename)
with open(os.path.join(basename + '_Averaged' + ext)) as f:
    w = csv.DictWriter(f, rows_1[0].keys())
    w.writeheader()
And this is my error.
Traceback (most recent call last):
  File "C:\Users\Bud\Desktop\OWLES RECENT\Moving Average.py", line 151, in <module>
    with open(os.path.join(basename+'_Averaged'+ext)) as f:
IOError: [Errno 2] No such file or directory: u'_AveragedUIllinois_20131207Singular.csv'
I'm not exactly sure what I am doing wrong with it.
You want to use os.path.splitext (to split the extension) instead of split (that splits last path element).
And don't forget to open the file in write mode (and check your indentation):
basename, ext = os.path.splitext(basename)
with open(os.path.join(basename + '_Averaged' + ext), 'w') as f:
    w = csv.DictWriter(f, rows_1[0].keys())
    w.writeheader()
Looking at the file name in the trace, you can see '_Averaged' coming before the full file name. os.path.split looks for the directory divider (usually a slash) and splits off the last path element. What you want is os.path.splitext, which splits the string into the root and the extension. Since you already have just the basename, the root will be the filename without the extension.
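A quick demonstration of the difference, using the filename from the question:

```python
import os.path

name = "12142014_2345.csv"

# os.path.split separates the last path component -- for a bare file
# name the "directory" part comes back empty, which is not useful here:
head, tail = os.path.split(name)

# os.path.splitext separates the extension, which lets us splice
# "_Averaged" in before it:
base, ext = os.path.splitext(name)
new_name = base + "_Averaged" + ext
print(new_name)
```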

Use codecs to read file with correct encoding: TypeError

I need to read from a file, line by line. I also need to make sure the encoding is correctly handled.
I wrote the following code:
#!/bin/bash
import codecs

filename = "something.x10"
f = open(filename, 'r')
fEncoded = codecs.getreader("ISO-8859-15")(f)
totalLength = 0
for line in fEncoded:
    totalLength += len(line)
print("Total Length is "+totalLength)
This code does not work on all files, on some files I get a
Traceback (most recent call last):
  File "test.py", line 11, in <module>
    for line in fEncoded:
  File "/usr/lib/python3.2/codecs.py", line 623, in __next__
    line = self.readline()
  File "/usr/lib/python3.2/codecs.py", line 536, in readline
    data = self.read(readsize, firstline=True)
  File "/usr/lib/python3.2/codecs.py", line 480, in read
    data = self.bytebuffer + newdata
TypeError: can't concat bytes to str
I'm using Python 3.3, and the script must work with this Python version.
What am I doing wrong? I was not able to work out which files fail and which do not; even some plain ASCII files fail.
You are opening the file in non-binary mode. If you read from it, you get a string decoded according to your default encoding (http://docs.python.org/3/library/functions.html?highlight=open%20builtin#open).
codec's StreamReader needs a bytestream (http://docs.python.org/3/library/codecs#codecs.StreamReader)
So this should work:
import codecs

filename = "something.x10"
f = open(filename, 'rb')
f_decoded = codecs.getreader("ISO-8859-15")(f)
total_length = 0
for line in f_decoded:
    total_length += len(line)
print("Total Length is " + str(total_length))
or you can use the encoding parameter on open:
f_decoded = open(filename, mode='r', encoding='ISO-8859-15')
The reader returns decoded data, so I fixed your variable name. Also, consider pep8 as a guide for formatting and coding style.
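A self-contained sketch of the simpler open()-with-encoding route (the file name is from the question; its contents here are a demo assumption, including a euro sign to exercise a byte that plain ASCII lacks):

```python
# Demo file: the euro sign is encoded as 0xA4 in ISO-8859-15.
with open("something.x10", "wb") as f:
    f.write("5 \u20ac\n".encode("iso-8859-15"))

# Let open() do the decoding -- no codecs.getreader needed in Python 3.
total_length = 0
with open("something.x10", encoding="iso-8859-15") as f:
    for line in f:
        total_length += len(line)
print("Total Length is", total_length)
```

len(line) counts decoded characters (including the newline), not raw bytes, which is usually what you want when measuring text.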
