I have a bunch of CSV files with common columns but different rows. I want to merge them all into one CSV file. Here is the script I wrote to do that:
import glob, os
os.chdir("./data")
fout = open("merged.csv", "a")
lout = open("merger_log", "a")
for fname in glob.glob("*.csv*"):
    with open(fname) as f:
        # exclude header for all but the first csv file.
        if os.stat("merged.csv").st_size > 0:
            next(f)
        fout.writelines(f)
        log = "Appended %s \n" % fname
        print(log)
        lout.write(log)
fout.close()
lout.close()
When I run this script, it successfully appends the first few files but gets stuck on one file every time. By stuck I mean it seems to keep adding bits from that file to the output file without moving on to the next file. There's nothing special about the file it stops on: it's about the same size as the rest of them and is not malformed. In fact, I removed that file from the data set and the program hung on a different file. I'm not sure what is wrong with this script.
If anyone has a better way to merge a bunch of CSV files, I'm all ears.
Thanks!
EDIT: I should mention this script works perfectly fine with just two files.
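One likely cause, offered as a guess since the contents of the data folder aren't shown: the pattern "*.csv*" also matches merged.csv itself, which sits in the same folder. Once the loop reaches it, the script is reading from the very file it is appending to, so the file keeps growing and the read never reaches the end. Below is a minimal sketch that leaves the output file out of the input list and writes the header only once. The name merge_csvs and its parameters are made up for illustration, and it assumes every input file ends with a newline.

```python
import glob
import os


def merge_csvs(src_dir, out_name="merged.csv"):
    """Merge every *.csv in src_dir into out_name, keeping one header."""
    out_path = os.path.join(src_dir, out_name)
    # Build the input list up front and leave the output file out of it.
    sources = sorted(
        p for p in glob.glob(os.path.join(src_dir, "*.csv"))
        if os.path.abspath(p) != os.path.abspath(out_path)
    )
    with open(out_path, "w", newline="") as fout:
        for index, path in enumerate(sources):
            with open(path, newline="") as fin:
                header = next(fin, None)  # None if the file is empty
                if index == 0 and header is not None:
                    fout.write(header)  # keep the header from the first file only
                fout.writelines(fin)  # copy the remaining rows as they are
    return out_path
```

Note that this opens the output in "w" mode, so each run rebuilds merged.csv from scratch instead of appending to whatever an earlier run left behind.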
Every time I run it, all the operations take place only on the first file in the glob folder.
The code is not moving on to the other files matched by glob. It only processes the first text file.
The code is in the image.
There is no error message. I want it to perform the same operations on all the text files matched by glob.
From what I understood, you need to open your writer first, then read what you need in a loop and write what you read as you go (then close your writer if you need to). This isn't your exact answer, but it should look something like this:
import csv
import glob

with open('some.csv', 'w', newline='') as somefile:
    writer = csv.writer(somefile)  # create the writer once, not once per row
    for filename in glob.glob('path'):
        with open(filename, 'r') as file_to_read:
            some_data = file_to_read.readlines()
        # loop over the data again if you need to,
        # or do whatever you want with it
        for data in some_data:
            writer.writerow([filename, data])
I started with relevant_files, which was a list of paths of two CSV files. I then attempted to create a file with path output_filename, using the following block of code.
new_file = open(output_filename, 'w+')
for x in relevant_files:
    for line in open(x):
        new_file.write(line)
The code looks perfectly reasonable, but I totally randomly decided to check the lengths, before and after the merge. file_1 had length 6,740,108 and file_2 had length 4,938,459. Those sum to 11,678,567. However, the new file has length 11,678,550, which is 17 lines shorter than the combined length of the two source files. I then checked the CSV files by hand -- indeed, it was exactly the final 17 lines of the 2nd text file (i.e., 2nd entry in relevant_files) that had gotten dropped.
What went wrong? Is there a maximum file length or something?
I'm not sure exactly what is wrong with your script, but it's good practice to use with statements when working with files in Python. They close the file for you when the block ends, which your code never does for new_file.
with open(output_file, 'w') as f:
    lines = []
    for file in relevant_files:
        with open(file, 'r') as source:
            # strip each newline here so join() puts exactly one between lines
            lines.extend(line.rstrip('\n') for line in source)
    f.write('\n'.join(lines) + '\n')
This is what I would use to complete your task.
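A probable explanation for the 17 missing lines (an inference, not something confirmed in the thread): writes go through a buffer, and since new_file is never closed or flushed, the last buffered chunk had not reached the disk when the length was checked. There is no maximum file length involved. For files of several million lines it may also be worth streaming instead of collecting everything in a list first. A minimal sketch, with concat_files and count_lines as illustrative names:

```python
def concat_files(sources, output_filename):
    """Copy every line of each source file into output_filename, in order."""
    with open(output_filename, "w") as out:
        for path in sources:
            with open(path) as src:
                for line in src:  # one line in memory at a time
                    out.write(line)
    # Leaving the with block closes the output, which flushes its buffer.


def count_lines(path):
    """Return the number of lines in a text file."""
    with open(path) as f:
        return sum(1 for _ in f)
```

After concat_files(relevant_files, output_filename) returns, count_lines(output_filename) should equal the sum of count_lines over the source files.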
I've just managed to run my Python code on Ubuntu, and all seems to be going well. My Python script writes out a .csv file every hour, but I can't seem to find the .csv file.
Having the .csv file is important as I need to do research on the data. I am also using FileZilla, and I would have thought the .csv would show up there.
import csv
import time
from datetime import datetime  # needed for datetime.now() below

# data comes from elsewhere in the script (not shown here)
collectionTime = datetime.now().strftime('%Y-%m-%d %H:%M:%S')
mylist = [d['Spaces'] for d in data]
mylist.append(collectionTime)
print(mylist)
with open("CarparkData.csv", "a", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(mylist)
In short, your code writes to whatever path you pass to open() on this line:
with open("CarparkData.csv","a",newline="") as f:
You can change this filename to wherever you'd like the file to be written. For example, data/CarparkData.csv if you had a folder named data/ within your application dedicated to holding data files.
As written in your code, writer.writerow passes each row to the file object that open("CarparkData.csv", ...) returned. Python buffers the text in memory and writes it to the file itself (in this case, CarparkData.csv), at the latest when the with block closes the file.
The way your code is structured, it won't be creating a new .csv every hour because it is using a static filename. If a file with this name did not exist at time of opening, it will create one, and if it did, it will continue to append new lines to the existing file.
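A relative name like "CarparkData.csv" is resolved against the current working directory of the process, which on a server is often not the folder the script lives in. Here is a small sketch that writes to an explicit folder and reports the absolute path, so you know where to look. The function append_row and its directory parameter are hypothetical names, not part of your script.

```python
import csv
import os


def append_row(row, filename="CarparkData.csv", directory=None):
    """Append one row to the CSV and return the absolute path written to."""
    # Fall back to the current working directory, which is what a bare
    # relative filename would use anyway.
    directory = directory or os.getcwd()
    os.makedirs(directory, exist_ok=True)
    path = os.path.abspath(os.path.join(directory, filename))
    with open(path, "a", newline="") as f:
        csv.writer(f).writerow(row)
    return path
```

Printing the return value once, e.g. print(append_row(mylist)), shows the full location of the file on the machine.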
I have two Python files, both in the same folder. The main file runs the whole program and makes it do what I want. The other file writes data to a text file.
However, there's one issue with writing data to the text file: instead of adding new lines to the existing text each time, it completely overwrites the whole file.
File responsible for writing data (writefile.py):
import codecs
def start(text):
    codecs.open('D:\\Program Files (x86)\\Python342\\passguess.txt', 'a', 'utf-8')
    with open('D:\\Program Files (x86)\\Python342\\passguess.txt', 'w') as file:
        file.write(text + '\n')
I've tried out a couple of things, such as .join(text) or running the code from writefile.py in the main file. Nothing seems to work.
The problem lies with this line:
with open('D:\\Program Files (x86)\\Python342\\passguess.txt', 'w') as file:
It opens the file in write mode, which empties the file every time it is opened. To append, you want the 'a' option, so just change it to
with open('D:\\Program Files (x86)\\Python342\\passguess.txt', 'a') as file:
and you should be fine.
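To see the difference in isolation, here is a minimal sketch with a made-up helper, append_line. The codecs.open(...) call in the original function opens a second handle that is never used, so it can go; in Python 3 the built-in open() accepts an encoding argument directly.

```python
def append_line(path, text):
    """Add one line to the end of path, creating the file if it is missing."""
    # 'a' positions every write at the end of the file; 'w' would empty it first.
    with open(path, "a", encoding="utf-8") as file:
        file.write(text + "\n")
```

Calling append_line twice on the same path leaves both lines in the file, while the same two calls with 'w' would leave only the second.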
I'm very new to Python. So far I have written the code below, which lets me search for text files in a folder, read all the lines from each one, open an Excel file and save the lines I read into it. (I'm still unsure whether it does this for all the text files one by one.)
Having run this, I only see the data from one text file being saved into the Excel file (first column). Or it could be that it is overwriting the data from multiple text files into the same column until it finishes.
Could anyone point me in the right direction on how to get it to write the stripped data from each text file to the next available column in Excel?
import os
import glob
list_of_files = glob.glob('./*.txt')
for fileName in list_of_files:
    fin = open( fileName, "r" )
    data_list = fin.readlines()
    fin.close() # closes file
    del data_list[0:17]
    del data_list[1:27] # [*:*]
    fout = open("stripD.xls", "w")
    fout.writelines(data_list)
    fout.flush()
    fout.close()
This can be condensed to:
import glob
list_of_files = glob.glob('./*.txt')
with open("stripD.xls", "w") as fout:
    for fileName in list_of_files:
        with open(fileName, "r") as fin:
            data_list = fin.readlines()
        # slices keep this safe for files with fewer than 18 lines
        fout.writelines(data_list[17:18])
        fout.writelines(data_list[44:])
Are you aware that writelines() doesn't add newlines? readlines() keeps the newline at the end of each line it reads, so the elements of data_list already contain newlines when writelines() writes them to the file, but writelines() doesn't add any itself.
You may like to check this and, for simple needs, also csv.
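A quick way to check the writelines() behaviour without touching a real file is an in-memory text buffer. This is only an illustration; written_text is a made-up helper name.

```python
import io


def written_text(items):
    """Return exactly what writelines() would put in a file for items."""
    buffer = io.StringIO()
    buffer.writelines(items)  # no separator is inserted between the items
    return buffer.getvalue()
```

With items that carry no newline, such as ["a", "b"], everything ends up on one line; with ["a\n", "b\n"], as readlines() would return them, each item stays on its own line.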
These lines are "interesting":
del data_list[0:17]
del data_list[1:27] # [*:*]
You are deleting as many of the first 17 lines of your input file as exist, keeping the 18th (if it exists), deleting another 26 (if they exist), and keeping any following lines. This is a very unusual procedure, and is not mentioned at all in your description of what you are trying to do.
Secondly, you are writing the output lines (if any) from each input file to the same output file, reopening it in "w" mode each time. At the end of the script, the output file will contain data from only the last input file. Don't change your code to use append mode ... opening and closing the same file all the time just to append records is very wasteful, and only justified if you have a real need to make sure that the data is flushed to disk in case of a power or other failure. Open your output file once, before you start reading files, and close it once when you have finished with all the input files.
Thirdly, any old arbitrary text file doesn't become an "excel file" just because you have named it "something.xls". You should write it with the csv module and name it "something.csv". If you want more control over how Excel will interpret it, write an xls file using xlwt.
Fourthly, you mention "column" several times, but as you have not given any details about how your input lines are to be split into "columns", it is rather difficult to guess what you mean by "next available column". It is even possible to suspect that you are confusing columns and rows ... assuming less than 43 lines in each input file, the 18th ROW of the last input file will be all you will see in the output file.
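Putting the second and third points together, a rough sketch could look like the following. Since the question doesn't say how lines map to columns, it assumes one CSV row per kept line, with the source file name in the first column; that layout and the names kept_lines and strip_to_csv are illustrative guesses, not what the asker necessarily wants.

```python
import csv
import glob


def kept_lines(lines):
    """Keep line 18 and everything from line 45 on, like the two del statements."""
    return lines[17:18] + lines[44:]


def strip_to_csv(pattern, out_path):
    """Write the kept lines of every matching file to one CSV, opened once."""
    with open(out_path, "w", newline="") as fout:
        writer = csv.writer(fout)
        for file_name in sorted(glob.glob(pattern)):
            with open(file_name) as fin:
                for line in kept_lines(fin.readlines()):
                    # One row per kept line: source file, then the line's text.
                    writer.writerow([file_name, line.rstrip("\n")])
```

For the layout in the question this would be called as strip_to_csv('./*.txt', 'stripD.csv'), and Excel can open the resulting .csv directly.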