writing .csv files from for loops and lists - python

I am new to Python, but I have searched Stack Overflow, Google, and Codecademy for an answer or inspiration for my obviously very simple problem. I thought a simple example where a for loop is used to save every iteration would be easy to find, but I've either missed it or don't have the vocabulary to ask the right question. So please don't sigh loudly in front of your monitor at this simple question. Thanks.
I would like to write a CSV file with the two values printed on each iteration of the code below in separate columns, so the output might look like:
##################
andy.dat, 8
brett.dat, 9
candice.dat, 11
#################
the code I have so far is:
import sys
import os.path
image_path = "C:\\"
for filename in os.listdir (image_path):
    print filename
    print len(filename)
If I try to do
x = filename
then I only get the last iteration of the loop written to x. How do I write all of them to x using a for loop? Also, how do I write it as a column in a CSV with the result of len(filename) next to it? Thanks.

Although you don't strictly need it for this task, I would take advantage of standard library modules such as csv when you can. Try something like this:
import os
import csv
# 'wb' is the Python 2 idiom; on Python 3 use open('outputFileName.csv', 'w', newline='')
csvfile = open('outputFileName.csv', 'wb')
writer = csv.writer(csvfile)
for filename in os.listdir('/'):  # or 'C:\\' if on Windows
    writer.writerow([filename, len(filename)])
csvfile.close()

I'd probably change this:
for filename in os.listdir (image_path):
    print filename
    print len(filename)
To something like
lines = list()
for filename in os.listdir(image_path):
    lines.append("%s, %d" % (filename, len(filename)))
My version creates a python list, then on each iteration of your for loop, appends an entry to it.
After you're done, you could print the lines with something like:
for line in lines:
    print(line)
Alternatively, you could initially create a list of tuples in the first loop, then format the output in the second loop. This approach might look like:
lines = list()
# Populate list
for filename in os.listdir(image_path):
    lines.append((filename, len(filename)))
# Print list
for line in lines:
    print("%s, %d" % (line[0], line[1]))
# Or more simply
for line in lines:
    print("%s, %d" % line)
Lastly, you don't really need to store the filename length explicitly; you could just calculate it and display it on the fly. In fact, you don't even need to create a list and use two loops.
Your code could be as simple as
import os
image_path = "C:\\"
for filename in os.listdir(image_path):
    print("%s, %d" % (filename, len(filename)))

import os
image_path = "C:\\"
output = open("output.csv", "a")  # "a" opens the file for appending
for filename in os.listdir(image_path):
    output.write("%s,%d\n" % (filename, len(filename)))  # one row per line
output.close()
The "a" passed to open() (spelled file() in Python 2) opens the file for appending; you can read more about the different modes in which a file object can be used here.

Try this:
# Part 1
import csv
import os
# Part 2
image_path = "C:\\"
# Part 3
li = []  # empty list
for filename in os.listdir(image_path):
    li.append((filename, len(filename)))  # populating the list
# Part 4
with open('test.csv', 'w') as f:
    f.truncate()  # not strictly needed: mode 'w' already empties the file
    writer = csv.writer(f)
    writer.writerows(li)
Explanation:
In Part 1, we import the os and csv modules.
In Part 2, we declare image_path.
In Part 3, we declare an empty list (li), then loop over every item in image_path and populate the list with each item's name and its length.
In Part 4, we write the CSV file: inside the with statement, all the data from li is written to the file.
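For anyone on Python 3, the answers above can be folded into one small function. This is only a sketch: the function name write_name_lengths and its parameters are made up for illustration, and it assumes you want one row per directory entry, sorted by name.

```python
import csv
import os


def write_name_lengths(directory, csv_path):
    """Write one CSV row per entry in `directory`: the name and its length."""
    # Build every row first; sorted() just makes the output order predictable.
    rows = [(name, len(name)) for name in sorted(os.listdir(directory))]
    # newline="" is what the csv documentation recommends for files given to csv.writer.
    with open(csv_path, "w", newline="") as f:
        csv.writer(f).writerows(rows)
    return rows
```

You would call it as write_name_lengths("C:\\", "output.csv"). Returning the rows as well means the same list is still available in memory, which covers the "how do I keep all of them in x" part of the question.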

Related

Extracting a differentiating numerical value from multiple files - PowerShell/Python

I have multiple text files containing different text.
They all contain a single appearance of the same 2 lines I am interested in:
================================================================
Result: XX/100
I am trying to write a script to collect all those XX values (numerical values between 0 and 100), and paste them in a CSV file with the text file name in column A and the numerical value in column B.
I have considered using Python or PowerShell for this purpose.
How can I identify the line where "Result" appears under the "===.." string, collect its content up to the '\n', and then strip "Result: " and "/100" from it?
"Result" and other numerical values could appear elsewhere in the files, but never in the quoted format directly below the "=====" line, like the line I'm interested in.
Thank you!
Edit: I have written this poor naive attempt to collect the numerical values.
import os
dir_path = os.path.dirname(os.path.realpath(__file__))
for filename in os.listdir(dir_path):
    if filename.endswith(".txt"):
        # join with dir_path so this works from any working directory
        with open(os.path.join(dir_path, filename), "r") as f:
            lineFound = False
            for index, line in enumerate(f):
                if lineFound:
                    line = line.replace("Result: ", "")
                    line = line.replace("/100", "")
                    grade = line.strip()  # strip() returns a new string
                    lineFound = False
                    print(grade)
                    continue
                if index > 3:
                    if "================================================================" in line:
                        lineFound = True
I'd still be happy to learn if there's a simple way to do this with PowerShell tbh
For the output, I used csv writer to append the results to a file one by one.
So there's two steps involved here, first is to get a list of files. There's a ton of answers for that one on stackoverflow, but this one is stupidly complete.
Once you have the list of files, you can simply just load the files themselves one by one, and then do some simple string.split() to get the value you want.
Finally, write the results into a CSV file. Since the CSV file is a simple one, you don't need to use the CSV library for this.
See the code example below. Note that I copied/pasted the function for generating the list of files from my personal github repo. I reuse that one a lot.
import os
def get_files_from_path(path: str = ".", ext=None) -> list:
    """Find files in path and return them as a list.
    Gets all files in folders and subfolders
    See the answer on the link below for a ridiculously
    complete answer for this.
    https://stackoverflow.com/a/41447012/9267296
    Args:
        path (str, optional): Which path to start on.
            Defaults to '.'.
        ext (str/list, optional): Optional file extension.
            Defaults to None.
    Returns:
        list: list of file paths
    """
    result = []
    for subdir, dirs, files in os.walk(path):
        for fname in files:
            filepath = f"{subdir}{os.sep}{fname}"
            if ext is None:
                result.append(filepath)
            elif isinstance(ext, str) and fname.lower().endswith(ext.lower()):
                result.append(filepath)
            elif isinstance(ext, list):
                for item in ext:
                    if fname.lower().endswith(item.lower()):
                        result.append(filepath)
    return result
filelist = get_files_from_path("path/to/files/", ext=".txt")
split1 = "================================================================\nResult: "
split2 = "/100"
with open("output.csv", "w") as outfile:
    outfile.write('filename, value\n')
    for filename in filelist:
        with open(filename) as infile:
            # text between "Result: " and "/100"
            value = infile.read().split(split1)[1].split(split2)[0]
        print(value)
        outfile.write(f'"{filename}", {value}\n')
You could try this.
In this example the filename written to the CSV will be its full (absolute) path. You may just want the base filename.
It uses the same, albeit seemingly unnecessary, mechanism for deriving the source directory as your attempt. It would be unusual to have your Python script in the same directory as your data.
import os
import glob
equals = '=' * 64
dir_path = os.path.dirname(os.path.realpath(__file__))
outfile = os.path.join(dir_path, 'foo.csv')
with open(outfile, 'w') as csv:
    print('A,B', file=csv)
    for file in glob.glob(os.path.join(dir_path, '*.txt')):
        prev = ''  # previous line; empty so startswith() is safe on the first line
        with open(file) as indata:
            for line in indata:
                t = line.split()
                if len(t) == 2 and t[0] == 'Result:' and prev.startswith(equals):
                    v = t[1].split('/')
                    if len(v) == 2 and v[1] == '100':
                        print(f'{file},{v[0]}', file=csv)
                        break
                prev = line
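Another way to express the "line of equals signs, then Result: XX/100" rule is a single regular expression. The sketch below is not taken from either answer: the names RESULT_RE, extract_result and collect_results are invented for illustration, and it assumes the files are small enough to read whole and that the score is a plain integer.

```python
import csv
import re
from pathlib import Path

# A line starting with '=' characters, immediately followed by "Result: NN/100".
RESULT_RE = re.compile(r"^=+[ \t]*\r?\nResult:[ \t]*(\d{1,3})/100", re.MULTILINE)


def extract_result(text):
    """Return the score as a string, or None if the two-line pattern is absent."""
    match = RESULT_RE.search(text)
    return match.group(1) if match else None


def collect_results(folder, csv_path):
    """Write (file name, score) rows for every .txt file directly inside `folder`."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["filename", "value"])
        for path in sorted(Path(folder).glob("*.txt")):
            score = extract_result(path.read_text())
            if score is not None:  # skip files without the pattern
                writer.writerow([path.name, score])
```

Because the pattern requires the "=" line directly above, a stray "Result: 5/100" elsewhere in a file is ignored. Here path.name writes only the base file name; use str(path) if you prefer the full path.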

Adding commas in between JSON objects while writing

I am parsing an extremely large JSON file using IJSON and then writing the contents to a temp file. Afterwards, I overwrite the original file with the contents of the temp file.
import json
import os
import tempfile
import ijson
FILE_NAME = 'file-name'
DIR_PATH = 'path'
#Generator function that yields dictionary objects.
def constructDictionary():
    data = open(os.path.join(DIR_PATH, FILE_NAME + ".json"), "rb")
    row = ijson.items(data,'item')
    for record in row:
        yield record
    data.close()
def writeToTemp(row, temp):
    #Needs to add a comma
    json.dump(row, temp)
def writeTempToFile(temp):
    temp.seek(0)
    data = open(os.path.join(DIR_PATH, FILE_NAME + ".json"), "wb")
    data.write(b'[')
    for line in temp:
        data.write(line.encode('utf-8'))
    data.write(b']')
    data.close()
if __name__ == "__main__":
    temp = tempfile.NamedTemporaryFile(mode = 'r+')
    for row in constructDictionary():
        writeToTemp(row,temp)
    writeTempToFile(temp)
    temp.close()
My issue is that I end up with the JSON objects being written without commas between them. I can't parse the file again to add the missing commas, as that would take far too long. Ideally, while writing I would be able to add a comma at the end of each json.dump(). But how would I handle the final entry?
Is there some way to determine when the generator function has reached the end of the file? Then I could use a flag or pass a variable so that it wouldn't write the final comma.
Or I could use file.seek() to go to the character before the final character and remove it. But that doesn't sound good.
I would appreciate any suggestions, thank you.
Ideally, while writing I would be able to add a comma at the end of each json.dump(). But how would I handle the final entry?
I suggest taking a different view: rather than writing a comma after every element but the last, write a comma before every element but the first. That way it is enough to call next() once before using the generator in the normal way. Consider the following simple example: I want to print A 10 times, separated by *, so I can do:
import itertools
a10 = itertools.repeat("A", 10)
print(next(a10), end='')
for i in a10:
    print('*', end='')
    print(i, end='')
output:
A*A*A*A*A*A*A*A*A*A
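Applied to the JSON case, the same "comma before every element but the first" idea might look like the sketch below. The function name write_json_array is invented here, and it uses enumerate rather than a separate next() call, which is just a different way of spotting the first element. It assumes every record is something json.dump can serialize.

```python
import io
import json


def write_json_array(records, out):
    """Stream `records` (any iterable) to the text file `out` as one JSON array."""
    out.write("[")
    for index, record in enumerate(records):
        if index:  # every record except the first gets a leading comma
            out.write(",")
        json.dump(record, out)
    out.write("]")


# Small demonstration with an in-memory buffer instead of a real file.
demo = io.StringIO()
write_json_array(({"id": i} for i in range(3)), demo)
```

Since the brackets and commas are written as the records stream through, there is no need to know in advance which record is the last one, and the result can be written straight to the final file.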
Have you tried json.dump(row, temp, indent=4)?

Why does Python give 3 different lines from outputWriter instead of overwriting the lines?

I am learning how to import information into .csv files with Python. I have the following code:
import csv
outputFile = open('output.csv','w',newline = '')
outputWriter = csv.writer(outputFile)
outputWriter.writerow(['spam','eggs','bacon','ham'])
outputWriter.writerow(['Hello World!','eggs','bacon','ham'])
outputWriter.writerow([1,2,3.141592,4])
outputFile.close()
My csv file looks like this:
Why does it output in 3 individual rows instead of overwriting the 1st row each time? How would I get it to overwrite if I wanted to?
Thank you for your insight from a beginner.
You can use seek() to go back to the beginning of the file before writing each row.
import csv
outputFile = open('output.csv','w',newline = '')
outputWriter = csv.writer(outputFile)
outputWriter.writerow(['spam','eggs','bacon','ham'])
outputFile.seek(0)
outputWriter.writerow(['Hello World!','eggs','bacon','ham'])
outputFile.seek(0)
outputWriter.writerow([1,2,3.141592,4])
outputFile.truncate() # clear out any remnants of previous lines.
outputFile.close()
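To see why the seek(0) and truncate() combination leaves only the last row, here is a small sketch that uses an in-memory io.StringIO in place of a real file (that substitution, and the function name last_row_only, are mine). Each writerow() writes at the current position, so without seek(0) the rows simply follow one another.

```python
import csv
import io


def last_row_only(rows):
    """Write each row over the previous one and return what is left in the buffer."""
    buf = io.StringIO(newline="")  # in-memory stand-in for the output file
    writer = csv.writer(buf)
    for row in rows:
        buf.seek(0)           # go back to the start before every write
        writer.writerow(row)
        buf.truncate()        # cut off anything left over from a longer earlier row
    return buf.getvalue()


remaining = last_row_only([
    ['spam', 'eggs', 'bacon', 'ham'],
    ['Hello World!', 'eggs', 'bacon', 'ham'],
    [1, 2, 3.141592, 4],
])
```

Here remaining holds just the third row. Without the truncate() call, the tail of the longer second row would still be sitting after it.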
I believe that behavior comes from the csv library. If you want to write everything on one line, you could try this:
with open("output.csv","w") as writer:
    writer.write(",".join(['spam','eggs','bacon','ham']))
    writer.write(",".join(['Hello World!','eggs','bacon','ham']))
    writer.write(",".join(['1','2','3.141592','4']))
Additionally, if you want to use CSV library you could try to do this:
import csv
# Create 1 list from the other 3
list1 = ['spam','eggs','bacon','ham']
list2 = ['Hello World!','eggs','bacon','ham']
list3 = [1,2,3.141592,4]
completeList = list1 + list2 + list3
# Write that list into one line
outputFile = open('output.csv','w',newline = '')
outputWriter = csv.writer(outputFile)
outputWriter.writerow(completeList)
outputFile.close()
Edit:
This is the result I got from both approaches.
If you are not getting this result, it is probable that some of your list elements contain a line break ("\n"). I am not sure where you are reading your lists from; if you have any sample input, it would be worth posting it as well.

Nested with blocks in Python, level of nesting variable

I would like to combine columns of various CSV files into one CSV file, with new headings, concatenated horizontally. I want to select only certain columns, chosen by heading. There are different columns in each of the files to be combined.
Example input:
freestream.csv:
static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02
fan.csv:
static pressure,static temperature,mass flow
0.9e5,301,72.9
exhaust.csv:
static pressure,static temperature,mass flow
1.7e5,432,73.1
Desired output:
combined.csv:
P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1
Possible call to the function:
reorder_multiple_CSVs(["freestream.csv","fan.csv","exhaust.csv"],
"combined.csv",["static pressure,relative Mach number",
"static pressure,mass flow","mass flow"],
"P_amb,M0,Ps_fan,W_fan,W_exh")
Here is a previous version of the code, with only one input file allowed. I wrote this with help from write CSV columns out in a different order in Python:
import csv
import operator
def reorder_CSV(infilename,outfilename,oldheadings,newheadings):
    with open(infilename) as infile:
        with open(outfilename,'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            readnames = next(reader)  # header row; next() works on Python 2.6+ and 3
            name2index = dict((name,index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                if isinstance(towrite,str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
So what I have figured out, in order to adapt this to multiple files, is:
- I need infilename, oldheadings, and newheadings to be lists now (all of the same length)
- I need to iterate over the list of input files to make a list of readers
- readnames can also be a list, built by iterating over the readers
- which means I can make name2index a list of dictionaries
One thing I don't know how to do, is use the keyword with, nested n-levels deep, when n is known only at run time. I read this: How can I open multiple files using "with open" in Python? but that seems to only work when you know how many files you need to open.
Or is there a better way to do this?
I am quite new to python so I appreciate any tips you can give me.
I am only replying to the part about opening multiple files with with, where the number of files is not known in advance. It shouldn't be too hard to write your own context manager, something like this (completely untested):
from contextlib import contextmanager
@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
Which you would use like this:
innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many_files(innames) as infiles, open(outname, 'w') as outfile:
    for infile in infiles:
        do_stuff(infile)
There is also a function that does something similar, but it is deprecated.
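On Python 3.3 and later, the standard library's contextlib.ExitStack handles a number of context managers that is only known at run time. A minimal sketch, assuming plain text files; the helper name read_first_lines is made up for illustration:

```python
from contextlib import ExitStack


def read_first_lines(filenames):
    """Open every file in `filenames` at once and return each one's first line."""
    with ExitStack() as stack:
        # enter_context() registers each file with the stack, so all of them are
        # closed when the with block exits, even if a later open() fails.
        files = [stack.enter_context(open(name)) for name in filenames]
        return [f.readline().rstrip("\n") for f in files]
```

One difference from the hand-written context manager above: if opening the third file fails, the first two are still closed, because they were already registered with the stack.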
I am not sure if this is the correct way to do this, but I wanted to expand on Bas Swinckels' very helpful answer and give the complete code that worked for me.
Here is what I did, and it worked.
from contextlib import contextmanager
import csv
import operator
import itertools as IT
@contextmanager
def open_many_files(filenames):
    files=[open(filename,'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
def reorder_multiple_CSV(infilenames,outfilename,oldheadings,newheadings):
    with open_many_files(filter(None,infilenames.split(','))) as handles:
        with open(outfilename,'w') as outfile:
            readers=[csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            reorderfunc=[]
            for i, reader in enumerate(readers):
                readnames = next(reader)  # header row of each file
                name2index = dict((name,index) for index, name in enumerate(readnames))
                writeindices = [name2index[name] for name in filter(None,oldheadings[i].split(","))]
                reorderfunc.append(operator.itemgetter(*writeindices))
            writer.writerow(filter(None,newheadings.split(",")))
            # izip_longest is the Python 2 name; on Python 3 it is itertools.zip_longest
            for rows in IT.izip_longest(*readers,fillvalue=['']*2):
                towrite=[]
                for i, row in enumerate(rows):
                    picked = reorderfunc[i](row)
                    # itemgetter returns a bare string when only one column is selected
                    if isinstance(picked,str):
                        towrite.append(picked)
                    else:
                        towrite.extend(picked)
                writer.writerow(towrite)
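For comparison, here is a Python 3 sketch of the same task built on csv.DictReader, so columns are looked up by heading rather than by index. It is not the code from either answer: the name combine_columns is invented, and it takes lists (file names, one list of old headings per file, and the new headings) rather than comma-separated strings.

```python
import csv
from contextlib import ExitStack
from itertools import zip_longest


def combine_columns(infilenames, outfilename, oldheadings, newheadings):
    """Pick the columns named in oldheadings[i] from file i and write them side by side."""
    with ExitStack() as stack:
        # One DictReader per input file; ExitStack closes every file at the end.
        readers = [csv.DictReader(stack.enter_context(open(name, newline="")))
                   for name in infilenames]
        out = stack.enter_context(open(outfilename, "w", newline=""))
        writer = csv.writer(out)
        writer.writerow(newheadings)
        # Shorter files are padded with empty dicts, which become empty cells.
        for rows in zip_longest(*readers, fillvalue={}):
            writer.writerow([row.get(col, "")
                             for row, cols in zip(rows, oldheadings)
                             for col in cols])
```

With the example files from the question, the call would be combine_columns(["freestream.csv", "fan.csv", "exhaust.csv"], "combined.csv", [["static pressure", "relative Mach number"], ["static pressure", "mass flow"], ["mass flow"]], ["P_amb", "M0", "Ps_fan", "W_fan", "W_exh"]).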

python beginner - how to read contents of several files into unique lists?

I'd like to read the contents from several files into unique lists that I can call later - ultimately, I want to convert these lists to sets and perform intersections and subtraction on them. This must be an incredibly naive question, but after poring over the iterators and loops sections of Lutz's "Learning Python," I can't seem to wrap my head around how to approach this. Here's what I've written:
#!/usr/bin/env python
import sys
OutFileName = 'test.txt'
OutFile = open(OutFileName, 'w')
FileList = sys.argv[1: ]
Len = len(FileList)
print Len
for i in range(Len):
    sys.stderr.write("Processing file %s\n" % (i))
    FileNum = i
for InFileName in FileList:
    InFile = open(InFileName, 'r')
    PathwayList = InFile.readlines()
    print PathwayList
    InFile.close()
With a couple of simple test files, I get output like this:
Processing file 0
Processing file 1
['alg1\n', 'alg2\n', 'alg3\n', 'alg4\n', 'alg5\n', 'alg6']
['csr1\n', 'csr2\n', 'csr3\n', 'csr4\n', 'csr5\n', 'csr6\n', 'csr7\n', 'alg2\n', 'alg6']
These lists are correct, but how do I assign each one to a unique variable so that I can call them later (for example, by including the index # from range in the variable name)?
Thanks so much for pointing a complete programming beginner in the right direction!
#!/usr/bin/env python
import sys
FileList = sys.argv[1: ]
PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()
Assuming you read in two files, the following will do a line by line comparison (it won't pick up any extra lines in the longer file, but then they'd not be the same if one had more lines than the other ;)
for i, s in enumerate(zip(PathwayList[0], PathwayList[1]), 1):
    if s[0] == s[1]:
        print i, 'match', s[0]
    else:
        print i, 'non-match', s[0], '!=', s[1]
For what you're wanting to do, you might want to take a look at the difflib module in Python. For sorting, look at Mutable Sequence Types, someListVar.sort() will sort the contents of someListVar in place.
You could do it like this if you don't need to remember where the contents come from:
PathwayList = []
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList.append(InFile.readlines())
    InFile.close()
for contents in PathwayList:
    # do something with contents which is a list of strings
    print contents
Or, if you want to keep track of the file names, you could use a dictionary:
PathwayList = {}
for InFileName in FileList:
    sys.stderr.write("Processing file %s\n" % InFileName)
    InFile = open(InFileName, 'r')
    PathwayList[InFileName] = InFile.readlines()  # key by file name, not file object
    InFile.close()
for filename, contents in PathwayList.items():
    # do something with contents which is a list of strings
    print filename, contents
You might want to check out Python's fileinput module, which is a part of the standard library and allows you to process multiple files at once.
Essentially, you have a list of files and you want to turn it into a list of the lines of those files.
There are several ways:
result = [ list(open(n)) for n in sys.argv[1:] ]
This would give you a result like [['alg1\n', 'alg2\n', 'alg3\n'], ['csr1\n', 'csr2\n', ...]] (each line keeps its trailing newline). result[0] would then be the list of lines from the first file.
Somewhat better might be dictionary:
result = dict( (n, list(open(n))) for n in sys.argv[1:] )
If you want to just concatenate, you would just need to chain it:
import itertools
result = list(itertools.chain.from_iterable(open(n) for n in sys.argv[1:]))
# -> ['alg1', 'alg2', 'alg3', 'csr1', 'csr2'...
These one-liners are not ideal for a beginner; however, working out what is going on in them would be a good exercise :)
You need to dynamically create the variable name for each file 'number' that you're reading. (I'm being deliberately vague; knowing how to build variables like this is quite valuable and more readily remembered if you discover it yourself.)
Something like this will give you a start.
You need a list which holds your PathwayList lists, that is a list of lists.
One remark: it is quite uncommon to use capitalized variable names. There is no strict rule for that, but by convention most people only use capitalized names for classes.
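Since the end goal in the question is set intersection and subtraction, a dictionary of sets keyed by file name may be the most direct structure. This is a sketch in Python 3 rather than the Python 2 used above; the function names read_sets and common_lines are made up, and it assumes blank lines and surrounding whitespace should be ignored.

```python
def read_sets(filenames):
    """Map each file name to the set of its stripped, non-empty lines."""
    sets = {}
    for name in filenames:
        with open(name) as f:
            sets[name] = {line.strip() for line in f if line.strip()}
    return sets


def common_lines(sets):
    """Return the lines that appear in every file (empty set if there are no files)."""
    return set.intersection(*sets.values()) if sets else set()
```

With sets = read_sets(sys.argv[1:]), the expression sets[a] - sets[b] gives the lines found in file a but not in file b, and common_lines(sets) gives the lines shared by all files.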
