I'm very new to Python. So far I have written the code below, which lets me search for text files in a folder, read all the lines from each one, open an Excel file, and save the read lines into it. (I'm still unsure whether this does it for all the text files one by one.)
Having run this, I only see one file's text data being read and saved into the Excel file (first column). Or it could be that it is overwriting the data from multiple text files into the same column until it finishes.
Could anyone point me in the right direction on how to get it to write the stripped data to the next available column in Excel for each text file?
import os
import glob

list_of_files = glob.glob('./*.txt')
for fileName in list_of_files:
    fin = open(fileName, "r")
    data_list = fin.readlines()
    fin.close()  # closes file
    del data_list[0:17]
    del data_list[1:27]  # [*:*]
    fout = open("stripD.xls", "w")
    fout.writelines(data_list)
    fout.flush()
    fout.close()
This can be condensed into:
import glob

list_of_files = glob.glob('./*.txt')
with open("stripD.xls", "w") as fout:
    for fileName in list_of_files:
        data_list = open(fileName, "r").readlines()
        fout.write(data_list[17])
        fout.writelines(data_list[44:])
Are you aware that writelines() doesn't introduce newlines? readlines() keeps the newlines as it reads, so there are newlines present in the elements of data_list written to the file by writelines(), but the latter doesn't introduce any newlines itself.
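A quick way to see this for yourself (demo.txt is just a throwaway name for illustration):

lines = ["first", "second", "third"]
with open("demo.txt", "w") as f:
    f.writelines(lines)  # demo.txt now contains firstsecondthird
with open("demo.txt", "w") as f:
    f.writelines(s + "\n" for s in lines)  # now one line per element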
You may like to check this, and for simple needs also the csv module.
These lines are "interesting":
del data_list[0:17]
del data_list[1:27] # [*:*]
You are deleting as many of the first 17 lines of your input file as exist, keeping the 18th (if it exists), deleting another 26 (if they exist), and keeping any following lines. This is a very unusual procedure, and is not mentioned at all in your description of what you are trying to do.
Secondly, you are writing the output lines (if any) from each input file to the same output file. At the end of the script, the output file will contain data from only the last input file. Don't change your code to use append mode ... opening and closing the same file all the time just to append records is very wasteful, and only justified if you have a real need to make sure that the data is flushed to disk in case of a power or other failure. Open your output file once, before you start reading files, and close it once when you have finished with all the input files.
Thirdly, any old arbitrary text file doesn't become an "Excel file" just because you have named it "something.xls". You should write it with the csv module and name it "something.csv". If you want more control over how Excel will interpret it, write an xls file using xlwt.
Fourthly, you mention "column" several times, but as you have not given any details about how your input lines are to be split into "columns", it is rather difficult to guess what you mean by "next available column". It is even possible to suspect that you are confusing columns and rows ... assuming fewer than 45 lines in each input file, the 18th ROW of the last input file will be all you will see in the output file.
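To make those suggestions concrete, here is a rough sketch (not a drop-in solution: the slice indices mirror the original code, the output file name is mine, and whether each input file should become a row or a column is exactly the ambiguity noted above):

import csv
import glob

with open("stripD.csv", "w", newline="") as out:
    writer = csv.writer(out)
    for file_name in glob.glob("./*.txt"):
        with open(file_name) as fin:
            data_list = fin.readlines()
        # the 18th line plus line 45 onwards, as in the original slicing
        kept = data_list[17:18] + data_list[44:]
        # one row per input file; each kept line becomes one cell
        writer.writerow([line.rstrip("\n") for line in kept])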
Related
I'm trying to add the same text at the beginning of all the txt files in a folder.
With this code I can do it, but there is a problem: I don't know why it overwrites part of the text at the beginning of each txt file.
import glob
import io
import ntpath
import os

output_dir = "output"
if not os.path.exists(output_dir):
    os.makedirs(output_dir)

for f in glob.glob("*.txt"):
    with open(f, 'r', encoding="utf8") as inputfile:
        with open('%s/%s' % (output_dir, ntpath.basename(f)), 'w', encoding="utf8") as outputfile:
            for line in inputfile:
                outputfile.write(line.replace(line, "more_text" + line + "text_that_is_overwrited"))
            outputfile.seek(0, io.SEEK_SET)
            outputfile.write('text_that_overwrite')
            outputfile.seek(0, io.SEEK_END)
            outputfile.write("more_text")
The content of the txt files that I'm trying to edit starts like this (each line begins with 4 spaces):
    text_line_1
    text_line_2
The result is:
On file1.txt: text_that_overwriteited
On file1.txt: text_that_overwriterited
Your mental model of how writing a file works seems to be at odds with what's actually happening here.
If you seek back to the beginning of the file, you will start overwriting all of the file. There is no such thing as writing into the middle of a file. A file - at the level of abstraction where you have open and write calls - is just a stream; seeking back to the beginning of the stream (or generally, seeking to a specific position in the stream) and writing replaces everything which was at that place in the stream before.
Granted, there is a lower level where you could actually write new bytes into a block on the disk whilst that block still remains the storage for a file which can then be read as a stream. With most modern file systems, the only way to make this work is to replace that block with exactly the same amount of data, which is very rarely feasible. In other words, you can't replace a block containing 1024 bytes with data which isn't also exactly 1024 bytes. This is so marginally useful that it's simply not an operation which is exposed to the higher level of the file system.
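A tiny experiment makes this concrete (demo.txt is a throwaway name):

with open("demo.txt", "w") as f:
    f.write("abcdefghij")
with open("demo.txt", "r+") as f:
    f.seek(0)
    f.write("XY")  # overwrites "ab" in place; nothing is pushed along
with open("demo.txt") as f:
    print(f.read())  # XYcdefghij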
With that out of the way, the proper way to "replace lines" is to not write those lines at all. Instead, write the replacement, followed by whichever lines were in the original file.
It's not clear from your question what exactly you want overwritten, so this is just a sketch with some guesses around that part.
import glob
import os

output_dir = "output"
# prefer exist_ok=True over if not os.path.exists()
os.makedirs(output_dir, exist_ok=True)

for f in glob.glob("*.txt"):
    # use a single with statement
    # prefer os.path.basename over ntpath.basename; use os.path.join
    with open(f, 'r', encoding="utf8") as inputfile, \
            open(os.path.join(output_dir, os.path.basename(f)), 'w', encoding="utf8") as outputfile:
        for idx, line in enumerate(inputfile):
            if idx == 0:
                outputfile.write("more text")
                outputfile.write(line.rstrip('\n'))
                outputfile.write("text that is overwritten\n")
                continue
            # else:
            outputfile.write(line)
        outputfile.write("more_text\n")
Given an input file like
here is some text
here is some more text
this will create an output file like
more texthere is some texttext that is overwritten
here is some more text
more_text
where the first line is a modified version of the original first line, and a new line is appended after the original file's contents.
I found this elsewhere on Stack Overflow: Why does my text file keep overwriting the data in it?
Essentially, the w mode is meant to overwrite text.
Also, you seem to be writing a sitemap manually. If you are using a web framework like Flask or Django, they have plugin or built-in support for auto-generated sitemaps; you should use that instead. Alternatively, you could create an XML template for the sitemap using Jinja or DTL. Templates are not just for HTML files.
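If you go the template route, a minimal sketch with Jinja might look like this (the URL list and output file name are made up for illustration):

from jinja2 import Template

SITEMAP = Template("""<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
{% for url in urls %}  <url><loc>{{ url }}</loc></url>
{% endfor %}</urlset>
""")

with open("sitemap.xml", "w", encoding="utf8") as f:
    f.write(SITEMAP.render(urls=["https://example.com/", "https://example.com/about"]))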
Every time, all the operations take place only on the first file in the glob folder.
The code is not updating the files from glob; it is processing only the first text file.
The code is shown in the image.
There is no error message. I want it to perform the same operations on all the text files matched by glob.
From what I understood, you need to first open your writer, and then read what you need to read and write what you read in a loop (then close your writer if you need to). Not specifically your answer, but it should be something similar to this:
import csv
import glob

with open('some.csv', 'w', newline='') as somefile:
    writer = csv.writer(somefile)  # create the writer once, not per row
    for filename in glob.glob('path'):
        with open(filename, 'r') as file_to_read:
            some_data = file_to_read.readlines()
            # if you need to loop over the data again
            # or do whatever you want
            for data in some_data:
                writer.writerow([filename, data])
Is there a way I can extract multiple pieces of data from a text file in Python and save them as a row in a new .csv file? I need to do this for multiple input files and save the output as a single .csv file for all of the input files.
I have never used Python before, so I am quite clueless. I have used MATLAB before, and I know how I would do it in MATLAB if it was numbers (but unfortunately it is text, which is why I am trying Python). So to be clear, I need a new line in the .csv output file for each "id" in the input files.
An example of the data is shown below (2 separate files):
EXAMPLE DATA - FILE 1:
id,ARI201803290
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/29
id,ARI201803300
data,er,corbp001,2
version,2
info,visteam,COL
info,hometeam,ARI
info,site,PHO01
info,date,2018/03/30
data,er,delaj001,0
EXAMPLE DATA - FILE 2:
id,NYN201803290
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,usedh,false
info,date,2018/03/29
data,er,famij001,0
id,NYN201803310
version,2
info,visteam,SLN
info,hometeam,NYN
info,site,NYC20
info,date,2018/03/31
data,er,gselr001,0
I'm hoping to get the data in .csv format with all the details from one "id" on one line. There are multiple "id"s per text file and there are multiple files. I want to repeat this process for multiple text files so the outputs end up in the same .csv output file. I want the output to look as follows in the .csv file, with each piece of info as a new cell:
ARI201803290 COL ARI PHO01 2018/03/29 2
ARI201803300 COL ARI PHO01 2018/03/30 0
NYN201803290 SLN NYN NYC20 2018/03/29 0
NYN201803310 SLN NYN NYC20 2018/03/31 0
If I was doing it in MATLAB I'd use a for loop and an if statement, something like:
j=1;
k=1;
for i=1:size(myMatrix, 1)
    if file1(i,1)==id
        output(k,1)=file1(i,2);
        k=k+1;
    elseif file1(i,1)==info
        output(j,2)=file1(i,3);
        j=j+1;
    etc.....
However, I obviously can't do this in MATLAB because I have comma-separated text files, not a matrix. Does anyone have any suggestions on how I can translate my idea to Python code? Or any other suggestion; I am super new to Python, so I am willing to try anything that might work.
Thank you very much in advance!
Python is very flexible and can do these jobs very easily.
There are a lot of csv tools/modules in Python to handle pretty much all types of csv and Excel files. However, I prefer to handle a csv the same as a text file, because a csv is simply a text file with comma-separated text, and simple is better than complicated.
Below is the code, with comments to explain most of it; you can tweak it to match your needs exactly.
import os

input_folder = 'myfolder/'  # path of the folder containing the text files on your disk

# create a list of file names with their full paths using a list comprehension
data_files = [os.path.join(input_folder, file) for file in os.listdir(input_folder)]

# open our csv file for writing
csv = open('myoutput.csv', 'w')  # better to open files with a context manager like below, but I am trying to show you different methods

def write_to_csv(line):
    print(line)
    csv.write(line)

# loop through your text files
for file in data_files:
    with open(file, 'r') as f:  # use a context manager to open files (best practice)
        buff = []
        for line in f:
            line = line.strip()  # remove spaces and newlines
            line = line.split(',')  # split the line into a list of values
            if buff and line[0] == 'id':  # hit another 'id'
                write_to_csv(','.join(buff) + '\n')
                buff = []
            buff.append(line[-1])  # add the last word in the line
        write_to_csv(','.join(buff) + '\n')

csv.close()  # must close any file handle opened manually, i.e. without a context manager
output:
ARI201803290,2,COL,ARI,PHO01,2018/03/29,2
ARI201803300,2,COL,ARI,PHO01,2018/03/30,0
NYN201803290,2,SLN,NYN,NYC20,false,2018/03/29,0
NYN201803310,2,SLN,NYN,NYC20,2018/03/31,0
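For comparison, a variant that leans on the csv module for both parsing and writing might look like this (same grouping logic; the folder and output file names are the same assumptions as above, and note that, as in the output shown, rows can have different lengths when a file contains extra info lines such as usedh):

import csv
import glob

with open('myoutput.csv', 'w', newline='') as out:
    writer = csv.writer(out)
    for file_name in glob.glob('myfolder/*.txt'):
        with open(file_name, newline='') as f:
            row = []
            for record in csv.reader(f):
                if not record:
                    continue  # skip blank lines
                if row and record[0] == 'id':  # a new id starts a new row
                    writer.writerow(row)
                    row = []
                row.append(record[-1])  # keep the last field of each line
            if row:
                writer.writerow(row)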
I've noticed a really weird bug and didn't know if anyone else had seen this / knows how to stop it.
I'm writing to a CSV file using this:
def write_to_csv_file(self, object, string):
    with open('data_model_1.csv', 'a') as f:
        writer = csv.writer(f)
        writer.writerow([object, string])
and then write to the file:
self.write_to_csv_file(self.result['outputLabel'], string)
If I open the CSV file to look at the results, the next time I write to the file, it will start in column 3 of the last line (column 1 is object, column 2 is string).
If I run self.write_to_csv_file(self.result['outputLabel'], string) multiple times without manually opening the file (obviously I open the file in the Python script), everything is fine.
It's only when I open the file that I get the issue of starting in column 3.
Any thoughts on how to fix this?
You're opening the file in append mode, so the data is appended to the end of the file. If the file doesn't end in a newline, rows may get concatenated. Try writing a newline to the file before appending new rows:
with open("data_model_1.csv", "a") as f:
    f.write("\n")
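That unconditional write will add a blank row whenever the file already ends with a newline, though. A more careful variant (the helper name is mine) only appends the newline when it is actually missing:

import os

def ensure_trailing_newline(path):
    # append a newline only if the file exists, is non-empty,
    # and its last byte is not already a newline
    if os.path.exists(path) and os.path.getsize(path) > 0:
        with open(path, "rb+") as f:
            f.seek(-1, os.SEEK_END)
            if f.read(1) != b"\n":
                f.write(b"\n")

ensure_trailing_newline("data_model_1.csv")  # call before appending new rows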
Here is my code:
file_handle = open("/var/www/transactions.csv", "a")
c = csv.writer(file_handle)
oldamount = amount / 1.98
file_handle.seek(0)
c.writerow([addre, oldamount, "win"])
I wish to write [addre, oldamount, "win"] to the start of my CSV file; however, it's not working. It's still going to the bottom.
You are opening the file in append ("a") mode. The documentation for open() points out this behavior explicitly: "all writes append to the end of the file regardless of the current seek position".
It isn't possible to "just insert" text at the beginning of a file like you want to. You can either read the whole file, add your data in the front, and write it back out, or you live with the fact that the data goes at the end.
Example for rewriting:
with open("/var/www/transactions.csv", "r+") as f:
    olddata = f.read()
    f.seek(0)
    c = csv.writer(f)
    c.writerow([addre, oldamount, "win"])
    f.write(olddata)
Note that this can corrupt your file if something goes wrong while writing. If you want to minimize that possibility, write to a new file, then os.rename() it to overwrite the old one.
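A sketch of that safer variant, using os.replace() (the cross-platform spelling of the rename trick; the temporary file name is mine, and addre and oldamount are assumed to exist as in the question):

import csv
import os

tmp_path = "/var/www/transactions.csv.tmp"
with open("/var/www/transactions.csv", "r") as old, open(tmp_path, "w", newline="") as tmp:
    writer = csv.writer(tmp)
    writer.writerow([addre, oldamount, "win"])  # new first row
    tmp.write(old.read())  # then the old contents
os.replace(tmp_path, "/var/www/transactions.csv")  # atomic swap on the same filesystem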