Convert txt files in a folder to rows in csv file - python

I have 100 txt files in a folder. I would like to create a csv file in which the content of each text file becomes a single row (actually, a single cell in a row) in this csv file. So, the result would be a csv file with 100 rows.
I tried the following code:
import glob
read_files = glob.glob('neg/*')
with open("neg.csv", "wb") as outfile:
    for f in read_files:
        with open(f, "rb") as infile:
            for line in infile:
                outfile.write(line)
This creates a CSV with thousands of rows, because each txt file contains multiple paragraphs. Any suggestions?

Try:
import glob
import csv
read_files = glob.glob('neg/*')
# Python 3: the csv module writes str, so open the output in text mode with newline=""
with open("neg.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    for f in read_files:
        with open(f, "r") as infile:
            w.writerow([line for line in infile])
That makes each line a cell in the output and each file a row.
If you want each cell to be the entire contents of the file, try:
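import glob
import csv
read_files = glob.glob('neg/*')
with open("neg.csv", "w", newline="") as outfile:
    w = csv.writer(outfile)
    for f in read_files:
        with open(f, "r") as infile:
            # wrap the joined text in a list so it is written as one cell;
            # a bare string would make every character its own cell
            w.writerow([" ".join([line for line in infile])])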

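Before writing each line, call line.replace('\n', ' ') to replace newline characters with spaces.
If you read the files in binary mode, adjust the newline character for your OS (for example '\r\n' on Windows); in text mode Python already translates line endings to '\n'.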
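Putting the two answers together, a minimal Python 3 sketch might look like the following. The function name `files_to_csv_rows` and its parameters are made up for illustration, and it assumes the files can be read with the platform's default encoding. Note that it collapses every run of whitespace (not only newlines) into a single space.

```python
import csv
import glob


def files_to_csv_rows(pattern, out_path):
    """Write one CSV row per matching file; each file's text goes into one cell."""
    paths = sorted(glob.glob(pattern))  # sorted so the row order is predictable
    with open(out_path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            with open(path, "r") as infile:
                text = infile.read()
            # split() with no arguments breaks on any whitespace, including
            # newlines, so joining with " " yields a single-line cell
            writer.writerow([" ".join(text.split())])
    return len(paths)
```

Calling `files_to_csv_rows('neg/*', 'neg.csv')` would then produce one row per file in the neg folder.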


Converting .txt file to CSV

I need code that can read a .txt file and output the data inside into a CSV file. The .txt file would have data in this form:
Jeff/Terry/01-10-2020/1-2
+Tom/02-10-2020
-Jeff/03-10-2020
And I need to write the data into a CSV file where the string is split and separated every time it encounters a "/". So the CSV file would look something like this:
Column A    Column B      Column C      Column D
Jeff        Terry         01-10-2020    1-2
+Tom        02-10-2020
-Jeff       03-10-2020
Furthermore I would also need code to write this CSV file data back into another .txt file in the same format, with the "/" separating the data of the cells.
I currently have this:
import csv
with open ("data.txt", "r") as in_file:
    stripped = (line.strip() for line in in_file)
    lines = (line.split("/") for line in stripped if line
    open ("data.csv", "w") as out_file:
        writer = csv.writer(out_file)
        writer.writerow(('first', 'second'))
        writer.writerows(lines)
Currently it is nowhere near the functionality that I need, and it is also giving me syntax errors on the "o" of open on the 4th last line, open ("data.csv", "w") as out_file:.
There are a few syntax errors here.
Line 4 has missing closing parenthesis: lines = (line.split("/") for line in stripped if line).
Line 5 should start with with: with open("data.csv", "w") as out_file:.
Also, it's better to place the 5th line outside the with block on line 2.
Lines 6 through 8 should have one indent, not two.
Try the following (see the documentation):
import csv
with open("data.txt", "r", newline='') as in_file:
    # split each input line on "/"
    lines = list(csv.reader(in_file, delimiter="/"))
with open("data.csv", "w", newline='') as out_file:
    # the default delimiter is ",", which is what the CSV output needs
    writer = csv.writer(out_file)
    writer.writerow(('first', 'second'))
    writer.writerows(lines)
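The question also asks for the reverse direction. A possible sketch of both conversions is shown below; the function names `slash_txt_to_csv` and `csv_to_slash_txt` are hypothetical, no header row is written, and it assumes no field itself contains a "/" or a comma that would need quoting.

```python
import csv


def slash_txt_to_csv(txt_path, csv_path):
    """Read "/"-separated lines and write them as comma-separated CSV rows."""
    with open(txt_path, "r", newline="") as src, open(csv_path, "w", newline="") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src, delimiter="/"):
            if row:  # csv.reader yields [] for blank lines; skip them
                writer.writerow(row)


def csv_to_slash_txt(csv_path, txt_path):
    """Read CSV rows and write them back as "/"-separated lines."""
    with open(csv_path, "r", newline="") as src, open(txt_path, "w", newline="") as dst:
        writer = csv.writer(dst, delimiter="/", lineterminator="\n")
        for row in csv.reader(src):
            if row:
                writer.writerow(row)
```

Rows may have different lengths (as in the sample data); neither function pads them, so a round trip should reproduce the original lines.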

Python - replace the startswith character

I want to replace the first character in each line from the text file.
2 1.510932 0.442072 0.978141 0.872182
5 1.510932 0.442077 0.978141 0.872181
Above is my text file.
import sys
import glob
import os.path
list_of_files = glob.glob('/path/txt/23.txt')
for file_name in list_of_files:
    f= open(file_name, 'r')
    lst = []
    for line in f:
        f = open(file_name , 'w')
        if line.startswith("2 "):
            line = line.replace("2 ","7")
        f.write(line)
    f.close()
What I want:
If a line starts with 2, I want to change that leading 2 to 7. The problem is that the same character appears several times in the line, so when I replace the starting character and save, every matching occurrence changes.
Thanks
The proper solution is (pseudocode):
open sourcefile for reading as input
open temporaryfile for writing as output
for each line in input:
    fix the line
    write it to output
close input
close output
replace sourcefile with temporaryfile
We use a temporary file and write line by line, so the whole file never has to be held in memory.
I leave it up to you to translate this to Python (hint: it's quite straightforward).
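One way the pseudocode above could be translated to Python is sketched here. The function name `replace_leading_token` and its `old`/`new` parameters are illustrative assumptions, not part of the answer. Only the prefix of a matching line is changed, so other occurrences of the digit in the line are left alone.

```python
import os
import tempfile


def replace_leading_token(path, old="2", new="7"):
    """Rewrite path so lines whose first field is old start with new instead."""
    directory = os.path.dirname(os.path.abspath(path))
    # create the temporary file next to the source so the final rename
    # stays on the same filesystem
    fd, tmp_path = tempfile.mkstemp(dir=directory, text=True)
    try:
        with os.fdopen(fd, "w") as output, open(path, "r") as source:
            for line in source:
                if line.startswith(old + " "):
                    # touch only the prefix; the rest is copied verbatim
                    line = new + line[len(old):]
                output.write(line)
        os.replace(tmp_path, path)  # replace sourcefile with temporaryfile
    except BaseException:
        os.remove(tmp_path)  # do not leave a stray temporary file behind
        raise
```

For the file in the question this would be called as `replace_leading_token('/path/txt/23.txt')`.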
This is one approach.
Ex:
for file_name in list_of_files:
    data = []
    with open(file_name) as infile:
        for line in infile:
            line = line.rstrip("\n")    # Drop the newline; it is added back on write
            if line.startswith("2 "):    # Check line
                line = " ".join(['7'] + line.split()[1:])    # Update line
            data.append(line)
    with open(file_name, "w") as outfile:    # Write back to file
        for line in data:
            outfile.write(line+"\n")

Errors when reading column name from csv files and saving as list

I have a folder that has over 15,000 CSV files. They have different numbers of columns.
Most files have the column names (attributes of the data) as their first row, like this:
Name Date Contact Email
a b c d
a2 b2 c2 d2
What I want to do is read first row of all files, store them as a list, and write that list as new csv file.
Here is what I have done so far :
import csv
import glob
list=[]
files=glob.glob('C:/example/*.csv')
for file in files :
    f = open(file)
    a=[file,f.readline()]
    list.append(a)
with open('test.csv', 'w') as testfile:
    csv_writer = csv.writer(testfile)
    for i in list:
        csv_writer.writerow(i)
When I try this code, the result comes out like this:
[('C:/example\\example.csv', 'Name,Date,Contact,Email\n'), ('C:/example\\example2.csv', 'Address,Date,Name\n')]
Therefore, in the resulting CSV, all attributes of each file go into the second column, and for some reason there is an empty row between entries:
[Image: new CSV file made]
Moreover, when going through the files, I encountered another error:
UnicodeDecodeError: 'cp949' codec can't decode byte 0xed in position 6: illegal multibyte sequence
So I added this code at the top, but it didn't work; it said the files are invalid.
import codecs
files=glob.glob('C:/example/*.csv')
fileObj = codecs.open( files, "r", "utf-8" )
I read answers on Stack Overflow, but I couldn't find one related to my problem. I appreciate your answers.
Ok, so
import csv
import glob
list=[]
files=glob.glob('C:/example/*.csv')
for file in files :
    f = open(file)
    a=[file,f.readline()]
    list.append(a)
Here you're opening the file and then creating a list with the file name and the column headers as a single string (note that means they'll look like "Column1,Column2"). So you get [("Filename", "Column1,Column2")].
So you're going to need to split that string on the ',' like:
for file in files :
    f = open(file)
    a=[file, f.readline().split(',')]
Now we have:
["filename", ("Column1", "Column2")]
So it's still going to print to the file wrong. We need to concatenate the lists.
a=[file] + f.readline().split(',')
So we get:
["filename", "Column1", "Column2"]
And you should close each file after you open it with f.close(), or use a context manager inside your loop like:
for file in files :
    with open(file) as f:
        a=[file] + f.readline().split(',')
    list.append(a)
Better solution and how I would write it:
import csv
import glob
files = glob.glob('mydir/*.csv')
lst = list()
for file in files:
    with open(file) as f:
        reader = csv.reader(f)
        lst.append(next(reader))
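For the UnicodeDecodeError, pass an explicit encoding and fall back to a second one if decoding fails:
try:
    with open(file, 'r', encoding='utf8') as f:
        pass  # do things
except UnicodeError:
    # assumption: cp949 as the fallback, since it is the codec named in the question's error
    with open(file, 'r', encoding='cp949') as f:
        pass  # do things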
A little bit of tidying, proper context managing, and using csv.reader:
import csv
import glob
files=glob.glob('C:/example/*.csv')
# newline='' stops the csv module from writing blank rows between records on Windows
with open('test.csv', 'w', newline='') as testfile:
    csv_writer = csv.writer(testfile)
    for file in files:
        with open(file, 'r') as infile:
            reader = csv.reader(infile)
            headers = next(reader)
            lst = [file] + headers
            csv_writer.writerow(lst)
This will write a new CSV with one row per input file, each row being filename, column1, column2, ...
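The header collection and the encoding problem can be handled together. The sketch below is one possible arrangement, not the answerers' code: `read_header` and `collect_headers` are hypothetical names, and the encoding list ("utf-8", then "cp949") is an assumption based on the error message in the question.

```python
import csv
import glob


def read_header(path, encodings=("utf-8", "cp949")):
    """Return the first CSV row of path, trying each encoding in turn."""
    for encoding in encodings:
        try:
            with open(path, "r", encoding=encoding, newline="") as infile:
                # next(..., []) gives an empty header for an empty file
                return next(csv.reader(infile), [])
        except UnicodeDecodeError:
            continue  # wrong guess, try the next encoding
    raise ValueError(f"could not decode {path!r} with any of {encodings}")


def collect_headers(pattern, out_path):
    """Write one row per matching file: the file name, then its column names."""
    paths = sorted(glob.glob(pattern))  # list inputs before creating the output
    with open(out_path, "w", encoding="utf-8", newline="") as outfile:
        writer = csv.writer(outfile)
        for path in paths:
            writer.writerow([path] + read_header(path))
```

If the output file lives in the same folder as the inputs and already exists from an earlier run, the pattern will match it too, so writing it to a different folder is the simpler choice.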

Appending output of a for loop for Python to a csv file

I have a folder with .txt files in it. My code finds the line count and character count of each file and saves the results for all files in a single CSV file, Linecount.csv, in a different directory. For some reason, every row of the CSV file repeats the line and character counts of the last file, although the print statement produces the correct output for each file.
import glob
import os
import csv
os.chdir('c:/Users/dasa17/Desktop/sample/Upload')
for file in glob.glob("*.txt"):
    chars = lines = 0
    with open(file,'r')as f:
        for line in f:
            lines+=1
            chars += len(line)
    a=file
    b=lines
    c=chars
    print(a,b,c)
d=open('c:/Users/dasa17/Desktop/sample/Output/LineCount.csv', 'w')
writer = csv.writer(d,lineterminator='\n')
for a in os.listdir('c:/Users/dasa17/Desktop/sample/Upload'):
    writer.writerow((a,b,c))
d.close()
Please check your indentation.
You are looping through each file using for file in glob.glob("*.txt"):
This stores the last result in a,b, and c. It doesn't appear to write it anywhere.
You then loop through each item using for a in os.listdir('c:/Users/dasa17/Desktop/sample/Upload'):, and store a from this loop (the filename), and the last value of b and c from the initial loop.
I haven't run this, but reordering as follows may solve the issue:
import glob
import os
import csv
os.chdir('c:/Users/dasa17/Desktop/sample/Upload')
d=open('c:/Users/dasa17/Desktop/sample/Output/LineCount.csv', 'w')
writer = csv.writer(d,lineterminator='\n')
for file in glob.glob("*.txt"):
    chars = lines = 0
    with open(file,'r') as f:
        for line in f:
            lines+=1
            chars += len(line)
    a=file
    b=lines
    c=chars
    print(a,b,c)
    writer.writerow((a,b,c))
d.close()
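The same idea can be written without os.chdir and with the output file managed by a with block, so it is closed even if a read fails. This is only a sketch; the function name `write_line_counts` and its two path parameters are hypothetical.

```python
import csv
import glob
import os


def write_line_counts(input_dir, output_csv):
    """Write (file name, line count, character count) for each .txt file in input_dir."""
    with open(output_csv, "w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(glob.glob(os.path.join(input_dir, "*.txt"))):
            lines = chars = 0
            with open(path, "r") as f:
                for line in f:
                    lines += 1
                    chars += len(line)  # includes the trailing newline, as in the code above
            # write the row inside the loop, while the counts belong to this file
            writer.writerow((os.path.basename(path), lines, chars))
```

The key point is unchanged: the row is written inside the loop over files, so each file's own counts are recorded.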

Start reading and writing on specific line on CSV with Python

I have a CSV file that looks like this:
COL_A,COL_B
12345,A=1$B=2$C=3$
How do I read that file and write it back to a new file, but with just the second row (line)?
I want the output file to contain:
12345,A=1$B=2$C=3$
Thanks!
The following reads your csv, extracts the second row, then writes that second row to another file.
with open('file.csv') as file:
    second_line = list(file)[1]
with open('out.csv', mode = 'w') as file:
    file.write(second_line)
outfile = open(outfilename,"w")
with open(filename) as f:
    for line in f:
        # Python 3 form of the Python 2 statement "print >> outfile, ..."
        print(line.split()[-1], file=outfile)
outfile.close()
As long as the lines actually look like the line you posted in the OP.
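For larger files, the header can be skipped without loading every line into a list. A small sketch using itertools.islice follows; the function name `copy_from_line` and the `start_line` parameter are made up for this example.

```python
from itertools import islice


def copy_from_line(src_path, dst_path, start_line=2):
    """Copy src_path to dst_path, starting at the 1-based line number start_line."""
    # newline="" keeps the original line endings untouched
    with open(src_path, "r", newline="") as src, open(dst_path, "w", newline="") as dst:
        # islice skips the first start_line - 1 lines while streaming the file
        dst.writelines(islice(src, start_line - 1, None))
```

With the default `start_line=2`, `copy_from_line('file.csv', 'out.csv')` writes everything after the header row.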
