What I'm trying to do is basically read a csv file into a list (one column only). Then I need to take 11 elements at a time (one element = a 9-digit number) from the list and write them, comma-separated, as one row ending with a newline. Everything goes into another text file. 11 elements in a row match an A4 sheet. Then iterate over the remaining elements in the list. I can't figure out how. Below is the code I'm working on:
count = 0
textfile = 'texstf.txt'
filepath = 'testfile2.csv'
with open(filepath, "r") as f:
    lines = [str(line.rstrip()) for line in f]
for key in lines:
    while(count < 11):
        with open(textfile, "w") as myfile:
            myfile.write(','.join(lines))
        count += 1
csv sample:
6381473
6381783
6381814
...
expected output to file sample:
6381473,6381783,6381814
I ran your code and it looks like it is working. If you could provide more context about what specifically is not working, such as error messages, that would be helpful. Make sure you have the correct file path for each file you are trying to read and write.
Here is an alternative way to do this, based on this similar question:
import csv
import os

textfile = os.path.join(os.getcwd(), 'texstf.txt')
filepath = os.path.join(os.getcwd(), 'testfile2.csv')

with open(textfile, "w") as my_output_file:
    with open(filepath, "r") as my_input_file:
        for row in csv.reader(my_input_file):
            my_output_file.write(" ".join(row) + ',')
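Note that this writes all values onto a single comma-joined line rather than 11 per row. Here is a minimal sketch of the grouping you describe (assuming Python 3 and the same file names as above), slicing the list in steps of 11:
# Sketch: read the single column, then write 11 comma-separated values
# per line (11 values per line matches one A4 row, as described).
chunk = 11

with open('testfile2.csv') as f:
    values = [line.strip() for line in f if line.strip()]

with open('texstf.txt', 'w') as out:
    for i in range(0, len(values), chunk):
        out.write(','.join(values[i:i + chunk]) + '\n')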
I am new to Python. I am looking for the number of occurrences of a text string across all the text files in a given folder, i.e. the total count of that particular string.
def errors():
    errors = 0
    file = open("\\d:\\myfolder\\*.txt", "r")
    data = file.read()
    errors = data.count("errors")
    return errors

print("Errors:", errors)
Your code doesn't make any sense, but if I understand what you want to do, then here's some pseudo-code to get you going:
from glob import glob

text_file_paths = glob(r"d:\myfolder\*.txt")
error_counting = 0
for file_path in text_file_paths:
    with open(file_path, 'r') as f:
        all_file_lines = f.readlines()
    error_counting += sum(line.count('errors') for line in all_file_lines)
print(error_counting)
Does that help?
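For comparison, a shorter variant (a sketch, same assumed folder path) reads each file in one go instead of line by line:
from glob import glob

# Count occurrences of the substring across every .txt file in the folder.
total = 0
for file_path in glob(r"d:\myfolder\*.txt"):
    with open(file_path) as f:
        total += f.read().count("errors")
print(total)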
I am trying to save the output from several .txt files into a single .txt file.
The output file should look like the output shown in the picture below.
What this program actually does is read a couple of .txt files with tons of data, which I filter using a regex.
My source code:
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                print(values)
Thank you for your time! :)
Then you just need to open a file and write values to it. Try this (you might need to tweak the formatting; I cannot test it since I don't have your text files). I am assuming the output you have in values is correct. Keep in mind that this opens the file in append mode, so if you run it more than once you will get duplicates.
import os, glob
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

outF = open("myOutFile.txt", "a")
for filename in glob.glob(os.path.join(folder_path, '*.txt')):
    with open(filename) as lines:
        for line in lines:
            match = values_re.search(line)
            if match:
                values = match.group(0).split('\t')
                assert values[0] == ''
                values = values[1:]
                # write() needs a string, not a list, so rejoin the values
                outF.write('\t'.join(values) + '\n')
                print(values)
outF.close()
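A slightly safer variant (a sketch, same regex and folder path) opens the output file once with with, so it is closed automatically, and uses "w" so reruns overwrite rather than duplicate:
import glob
import os
import re

folder_path = r"C:\Users\yokay\Desktop\DMS\Messdaten_DMT"
values_re = re.compile(r'\t\d+\t-?\d+,?\d*(\t-?\d+,?\d+){71}')

# "w" truncates the file on each run, so rerunning does not append duplicates.
with open("myOutFile.txt", "w") as out_f:
    for filename in glob.glob(os.path.join(folder_path, '*.txt')):
        with open(filename) as lines:
            for line in lines:
                match = values_re.search(line)
                if match:
                    # drop the empty first field produced by the leading tab
                    values = match.group(0).split('\t')[1:]
                    out_f.write('\t'.join(values) + '\n')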
I've been moving my file.write expression in, out, and around the loop, but no matter what, I don't get the full list, only the last line. This problem has been addressed in Python: Only writes last line of output, but that thread doesn't give a clear solution.
I've tried with "w" and "a" but was not successful with either. I also tried with "for line in f".
The other issue is that what I want to output is a tuple, and when I print it, I get exactly what I want it to look like, i.e.
14_sec.wav 14
16_sec.wav 16
but I've been converting it to a str in order to satisfy the write() requirement and so it doesn't appear that way in the output file.
My code is like this:
path="/Volumes/LaCie/VoicemailDownload/test"
res_file = "fileLengths.csv"
for filename in os.listdir(path):
if filename.endswith(".wav"):
with open (filename, 'r') as f:
f.seek(28)
a = f.read(4)
byteRate=0
for i in range(4):
byteRate=byteRate + ord(a[i])*pow(256,i)
fileSize=os.path.getsize(filename)
ms=((fileSize-44)*1000)/byteRate
sec = ms/1000
res = filename, sec
print filename, sec
with open (res_file, "w") as results:
#for line in f:
results.write(str(res))
I solved it by importing pandas and then exporting as a csv file, which worked great:
import os
import pandas

data = pandas.DataFrame([])
counter = 0
for filename in os.listdir(path):
    if filename.endswith(".wav"):
        f = open(filename, 'r')
        f.seek(28)
        a = f.read(4)
        f.close()
        byteRate = 0
        for i in range(4):  # all four bytes of the little-endian byte rate
            byteRate = byteRate + ord(a[i]) * pow(256, i)
        fileSize = os.path.getsize(filename)
        ms = ((fileSize - 44) * 1000) / byteRate
        sec = ms / 1000
        counter += 1
        data = data.append(pandas.DataFrame({'Filename': filename, 'Sec': sec}, index=[counter]), ignore_index=True)

newdata = data.to_csv(index=False)
with open(res_file, 'w') as results:
    results.write(str(newdata))
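For reference, pandas isn't strictly needed here: the usual fix for "only the last line is written" is to open the results file once, before the loop, and keep writing to the same handle; opening it with "w" inside the loop truncates it on every iteration. A sketch (assuming Python 3 and the paths from the question; the WAV header is read in binary mode):
import os

path = "/Volumes/LaCie/VoicemailDownload/test"
res_file = "fileLengths.csv"

# Open the output file once, outside the loop, so every row is kept.
with open(res_file, "w") as results:
    for filename in os.listdir(path):
        if not filename.endswith(".wav"):
            continue
        filepath = os.path.join(path, filename)
        with open(filepath, "rb") as f:  # binary mode for raw header bytes
            f.seek(28)                   # byte-rate field of a canonical WAV header
            byte_rate = int.from_bytes(f.read(4), "little")
        sec = (os.path.getsize(filepath) - 44) // byte_rate
        results.write("%s %d\n" % (filename, sec))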
I have a simple text file which contains numbers in ASCII text separated by spaces as per this example.
150604849
319865.301865 5810822.964432 -96.425797 -1610
319734.172256 5810916.074753 -52.490280 -122
319730.912949 5810918.098465 -61.864395 -171
319688.240891 5810889.851608 -0.339890 -1790
*<continues like this for millions of lines>*
Basically I want to copy the first line as-is; then, for all following lines, I want to offset the first value (x), offset the second value (y), leave the third value unchanged, and offset and halve the last number.
I've cobbled together the following code as a Python learning experience (apologies if it's crude and offensive, truly I mean no offence) and it works OK. However, the input file I'm using it on is several GB in size and I'm wondering if there are ways to speed up the execution. Currently a 740 MB file takes 2 minutes 21 seconds.
import glob

# offset values
offsetx = -306000
offsety = -5806000

files = glob.glob('*.pts')
for file in files:
    currentFile = open(file, "r")
    out = open(file[:-4] + "_RGB_moved.pts", "w")
    firstline = str(currentFile.readline())
    out.write(str(firstline.split()[0]))
    while 1:
        lines = currentFile.readlines(100000)
        if not lines:
            break
        for line in lines:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx), str(float(words[1]) + offsety), str(float(words[2])), str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
Many thanks
Don't use .readlines(). Use the file directly as an iterator:
for file in files:
    with open(file, "r") as currentfile, open(file[:-4] + "_RGB_moved.pts", "w") as out:
        firstline = next(currentfile)
        out.write(firstline.split(None, 1)[0])
        for line in currentfile:
            out.write('\n')
            words = line.split()
            newwords = [str(float(words[0]) + offsetx), str(float(words[1]) + offsety), words[2], str((int(words[3]) + 2050) / 2)]
            out.write(" ".join(newwords))
I also added a few Python best-practices, and you don't need to turn words[2] into a float, then back to a string again.
You could also look into using the csv module; it can handle splitting and rejoining lines in C code:
import csv

for file in files:
    with open(file, "rb") as currentfile, open(file[:-4] + "_RGB_moved.pts", "wb") as out:
        reader = csv.reader(currentfile, delimiter=' ', quoting=csv.QUOTE_NONE)
        writer = csv.writer(out, delimiter=' ', quoting=csv.QUOTE_NONE)
        # writerow expects a sequence, so keep the first field as a one-element list
        writer.writerow(next(reader)[:1])
        for row in reader:
            newrow = [str(float(row[0]) + offsetx), str(float(row[1]) + offsety), row[2], str((int(row[3]) + 2050) / 2)]
            writer.writerow(newrow)
Use the csv package. It may be more optimized than your script and will simplify your code.
I've got a text file that is tab delimited and I'm trying to figure out how to search for a value in a specific column in this file.
I think I need to use the csv module but have been unsuccessful so far. Can someone point me in the right direction?
Thanks!
**Update**
Thanks for everyone's updates. I know I could probably use awk for this but simply for practice, I am trying to finish it in python.
I am getting the following error now:
if row.split(' ')[int(searchcolumn)] == searchquery:
IndexError: list index out of range
And here is the snippet of my code:
#open the directory and find all the files
for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        f = open(file, 'r')
        lines = f.readlines()
        for line in lines:
            #the first 4 lines of the file are crap, skip them
            if linescounter > startfromline:
                with open(file) as infile:
                    for row in infile:
                        if row.split(' ')[int(searchcolumn)] == searchquery:
                            rfile = open(resultsfile, 'a')
                            rfile.writelines(line)
                            rfile.write("\r\n")
                            print "Writing line -> " + line
                            resultscounter += 1
            linescounter += 1
        f.close()
I am taking both searchcolumn and searchquery as raw_input from the user. I'm guessing the reason I am getting the list index out of range now is that it's not parsing the file correctly?
Thanks again.
You can also use the sniffer (example taken from http://docs.python.org/library/csv.html)
import csv

csvfile = open("example.csv", "rb")
dialect = csv.Sniffer().sniff(csvfile.read(1024))
csvfile.seek(0)
reader = csv.reader(csvfile, dialect)
Yes, you'll want to use the csv module, and you'll want to set delimiter to '\t':
import csv

spamReader = csv.reader(open('spam.csv', 'rb'), delimiter='\t')
After that you should be able to iterate:
for row in spamReader:
    print row[n]
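Putting it together with the question's variables (searchcolumn, searchquery, and resultsfile are assumed from the question; data.txt is a placeholder), a sketch that also skips the four junk header lines and guards against short rows:
import csv

searchcolumn = 3           # hypothetical: column index entered by the user
searchquery = "myvalue"    # hypothetical: value entered by the user
resultsfile = "results.txt"

with open("data.txt") as infile, open(resultsfile, "a") as rfile:
    reader = csv.reader(infile, delimiter="\t")
    for _ in range(4):     # the first 4 lines of the file are junk
        next(reader, None)
    for row in reader:
        # guard against short rows to avoid the IndexError from the question
        if len(row) > searchcolumn and row[searchcolumn] == searchquery:
            rfile.write("\t".join(row) + "\n")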
This prints all rows in filename with 'myvalue' in the fourth tab-delimited column:
with open(filename) as infile:
    for row in infile:
        if row.split('\t')[3] == 'myvalue':
            print row
Replace 3, 'myvalue', and print as appropriate.