Assign variables to all files in a folder for access - python

I have a folder that contains '.pkl' files. I want to access the data inside those files to plot my results.
I am getting an error when I try to do it with a for loop. All my .pkl files have numbered filenames like meta_room_1_reg.pkl, meta_room_2_reg.pkl, and so on, and I want to assign each one to its own variable.
Currently I am doing that with an if and a find() statement, but that is not ideal.
My code is here:
all_files = glob.glob('D:/Master Thesis/MCDM Combined code/results/*.pkl')
for i, curr_file in enumerate(all_files):
    with open(curr_file, 'rb') as f:
        mydata = pickle.load(f)
        if curr_file.find('1'):
            myfile = mydata
Please guide.

I'm assuming that you want to collect all the data into a list or something like that, right?
Use the .append method instead of a plain assignment. Note also that curr_file.find('1') is misleading here: find() returns -1 (truthy) when '1' is absent and 0 (falsy) when it sits at index 0, so a membership test is safer.
myfiles = []
all_files = glob.glob('D:/Master Thesis/MCDM Combined code/results/*.pkl')
for i, curr_file in enumerate(all_files):
    with open(curr_file, 'rb') as f:
        mydata = pickle.load(f)
        if '1' in curr_file:
            myfiles.append(mydata)
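If you really need each room's data under its own name, a dictionary keyed by the room number is the usual substitute for per-file variables. A minimal, self-contained sketch, where the temp directory and the {'room': n} payloads are stand-ins for your real results folder and pickles:

```python
import glob
import os
import pickle
import re
import tempfile

# Stand-in for the real results folder: write a few sample pickles
# (hypothetical payloads; point tmpdir at your own path instead).
tmpdir = tempfile.mkdtemp()
for n in (1, 2, 3):
    with open(os.path.join(tmpdir, 'meta_room_%d_reg.pkl' % n), 'wb') as f:
        pickle.dump({'room': n}, f)

# Key each file's data by the room number parsed from its filename,
# instead of binding one variable per file.
room_data = {}
for curr_file in glob.glob(os.path.join(tmpdir, '*.pkl')):
    match = re.search(r'meta_room_(\d+)_reg', os.path.basename(curr_file))
    if match:
        with open(curr_file, 'rb') as f:
            room_data[int(match.group(1))] = pickle.load(f)
```

room_data[1], room_data[2], ... then play the role of the individual variables when plotting.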


Adding text files content into a list of lists

I want to read the content of the many txt files I have in a directory into a list, where every object in the list is itself a list.
I'd like to be able to access each file's content by index, in order to later train an NLP model with it. That's also why I use line.strip(): I need each file's content split into stripped "lines".
Here is the code I tried; however, I get this error:
IndexError: list index out of range
os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')
ent_docs = []
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
I think the problem is that I'm trying to address a list index that hasn't been created yet.
I'm sure there must be a simple way to do it, though I can't find it.
I'd be glad for any help!
The error occurs because you don't have any inner list to insert into. I would fix it like so:
for i in ent_txts:
    with open(i, 'r') as f:
        file_lines = [line.strip() for line in f]
        ent_docs.append(file_lines)
Alternatively, a defaultdict lets your original indexed loop work unchanged, since missing keys are created as empty lists on first access:
from collections import defaultdict

os.chdir(r'C:\Users\User1\Article\BBC\bbc\entertainment')
ent_txts = glob.glob('*.txt')
ent_docs = defaultdict(list)
d = 0
for i in ent_txts:
    with open(i, 'r') as f:
        for line in f:
            ent_docs[d].append(line.strip())
    d += 1
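Either approach can be condensed with pathlib. A self-contained sketch, where the temp directory and the two sample files stand in for your entertainment folder:

```python
import tempfile
from pathlib import Path

# Stand-in for the real directory of .txt files.
tmpdir = Path(tempfile.mkdtemp())
(tmpdir / 'a.txt').write_text('first line\nsecond line\n')
(tmpdir / 'b.txt').write_text('only line\n')

# One inner list of stripped lines per file; ent_docs[i] is file i's content.
ent_docs = [[line.strip() for line in p.read_text().splitlines()]
            for p in sorted(tmpdir.glob('*.txt'))]
```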

How to create a new python dictionary from loop results without overwriting it

I get stuck when I try to create a new dictionary with the results from each file.
I have a bunch of files which I read using glob and json, and extracting the values from each file works fine: it displays each file's content with the different information, which is expected and good.
But now I want to know how to create a new dictionary (new_dictonary = {} in my code) from the variables I've got, get_hostname, get_fs_type and get_uptime, without overwriting it on each iteration. Below is my code.
import json
import glob

test_inventory = glob.glob('inventory/facts_am4/*')
new_dictonary = {}
for files in test_inventory:
    with open(files, 'r') as jsonfile:
        myfile = json.load(jsonfile)
    get_hostname = myfile['ansible_facts']['facter_networking']['fqdn']
    get_fs_type = myfile['ansible_facts']['facter_filesystems']
    get_uptime = myfile['ansible_facts']['facter_system_uptime']['uptime']
    print('Hostname: ' + get_hostname)
    print('FS Type: ' + get_fs_type)
    print('Uptime: ' + get_uptime)
    # Here I need something which grabs the variables and creates
    # a new dictionary, without overwriting it.
I have really tried a lot of things. I'm learning Python, and I came here to kindly request your help.
You can either:
make a list of dictionaries, and add a new one to it for each file, or
make a nested dictionary, where each "info-dict" is keyed by the filename.
Using a list:
data_list = []
for filename in test_inventory:
    with open(filename, 'r') as file_obj:
        # read the data
        data = {'Hostname': get_hostname,
                'FS Type': get_fs_type,
                'Uptime': get_uptime}
        data_list.append(data)
# Now data_list has all your data, accessible as data_list[0], data_list[1], etc.
Using a dictionary:
data_dict = {}
for filename in test_inventory:
    with open(filename, 'r') as file_obj:
        # as above
        data_dict[filename] = data
# Now data_dict has each file's data accessible as data_dict[filename]
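Putting the nested-dictionary suggestion together with the question's json loop, here is a runnable sketch; the two fact files and their contents are made up to mirror the structure shown in the question:

```python
import glob
import json
import os
import tempfile

# Stand-in inventory: two minimal fact files with the question's structure.
tmpdir = tempfile.mkdtemp()
for host in ('web1', 'web2'):
    facts = {'ansible_facts': {
        'facter_networking': {'fqdn': host + '.example.com'},
        'facter_filesystems': 'ext4',
        'facter_system_uptime': {'uptime': '10 days'}}}
    with open(os.path.join(tmpdir, host + '.json'), 'w') as f:
        json.dump(facts, f)

new_dictionary = {}
for path in glob.glob(os.path.join(tmpdir, '*.json')):
    with open(path) as jsonfile:
        myfile = json.load(jsonfile)
    facts = myfile['ansible_facts']
    hostname = facts['facter_networking']['fqdn']
    # One entry per file, keyed by hostname, so nothing is overwritten.
    new_dictionary[hostname] = {
        'FS Type': facts['facter_filesystems'],
        'Uptime': facts['facter_system_uptime']['uptime']}
```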

python: create a list of strings

I have a number of files that I am reading in, and I would like a list that contains the file contents. I read each whole file into a string.
I need a list that looks like this:
["Content of the first file", "content of the second file", ...]
I have tried various ways like append, extend and insert, but they all seem to expect a list as a parameter rather than a str, so I end up getting this:
[["Content of the first file"], ["content of the second file"], ...]
How can I get a list that contains strings, and then add strings to it without turning it into a list of lists?
EDIT
Some more code
linesNeg = []
linesPos = []
for file in os.listdir("neg"):
    with open("neg\\" + file, 'r', encoding="utf-8") as f:
        linesNeg.append(f.read().splitlines())
for file in os.listdir("pos"):
    with open("pos\\" + file, 'r', encoding="utf-8") as f:
        linesPos.append(f.read().splitlines())
listTotal = linesNeg + linesPos
f.read() already gives you the whole file as one string, so append that directly (it is splitlines() that turns it into a list):
contents_list = []
for filename in filename_list:
    with open(filename) as f:
        contents_list.append(f.read())
There's definitely more than one way to do it. Assuming you have the opened files as file objects f1 and f2:
alist = []
alist.extend([f1.read(), f2.read()])
or
alist = [f.read() for f in (f1, f2)]
Personally I'd do something like this, but there's more than one way to skin this cat.
file_names = ['foo.txt', 'bar.txt']

def get_string(file_name):
    with open(file_name, 'r') as fh:
        contents = fh.read()
    return contents

strings = [get_string(f) for f in file_names]
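The same idea in one line with pathlib, shown self-contained; the temp directory and the two sample files are stand-ins for your neg/pos folders:

```python
import tempfile
from pathlib import Path

# Stand-in folder with two throwaway files.
folder = Path(tempfile.mkdtemp())
(folder / 'a.txt').write_text('Content of the first file')
(folder / 'b.txt').write_text('content of the second file')

# read_text() returns one str per file, so the list stays flat.
contents_list = [p.read_text(encoding='utf-8')
                 for p in sorted(folder.glob('*.txt'))]
```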

Nested with blocks in Python, level of nesting variable

I would like to combine columns from various csv files into one csv file, concatenated horizontally with a new heading. I want to select only certain columns, chosen by heading; the columns differ between the files to be combined.
Example input:
freestream.csv:
static pressure,static temperature,relative Mach number
1.01e5,288,5.00e-02
fan.csv:
static pressure,static temperature,mass flow
0.9e5,301,72.9
exhaust.csv:
static pressure,static temperature,mass flow
1.7e5,432,73.1
Desired output:
combined.csv:
P_amb,M0,Ps_fan,W_fan,W_exh
1.01e5,5.00e-02,0.9e5,72.9,73.1
Possible call to the function:
reorder_multiple_CSVs(["freestream.csv", "fan.csv", "exhaust.csv"],
                      "combined.csv",
                      ["static pressure,relative Mach number",
                       "static pressure,mass flow",
                       "mass flow"],
                      "P_amb,M0,Ps_fan,W_fan,W_exh")
Here is a previous version of the code, with only one input file allowed. I wrote this with help from write CSV columns out in a different order in Python:
def reorder_CSV(infilename, outfilename, oldheadings, newheadings):
    with open(infilename) as infile:
        with open(outfilename, 'w') as outfile:
            reader = csv.reader(infile)
            writer = csv.writer(outfile)
            readnames = reader.next()
            name2index = dict((name, index) for index, name in enumerate(readnames))
            writeindices = [name2index[name] for name in oldheadings.split(",")]
            reorderfunc = operator.itemgetter(*writeindices)
            writer.writerow(newheadings.split(","))
            for row in reader:
                towrite = reorderfunc(row)
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
So what I have figured out, in order to adapt this to multiple files, is:
- I need infilename, oldheadings, and newheadings to be lists now (all of the same length)
- I need to iterate over the list of input files to make a list of readers
- readnames can also be a list, built by iterating over the readers
- which means I can make name2index a list of dictionaries
One thing I don't know how to do is use the keyword with, nested n levels deep, when n is known only at run time. I read this: How can I open multiple files using "with open" in Python? but that seems to work only when you know how many files you need to open.
Or is there a better way to do this?
I am quite new to python so I appreciate any tips you can give me.
I am only replying to the part about opening multiple files with with, where the number of files is unknown before. It shouldn't be too hard to write your own contextmanager, something like this (completely untested):
from contextlib import contextmanager

@contextmanager
def open_many_files(filenames):
    files = [open(filename) for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()
Which you would use like this:
innames = ['file1.csv', 'file2.csv', 'file3.csv']
outname = 'out.csv'
with open_many(innames) as infiles, open(outname, 'w') as outfile:
    for infile in infiles:
        do_stuff(in_file)
There is also a standard-library function that did something similar (contextlib.nested), but it is deprecated.
I am not sure if this is the correct way to do this, but I wanted to expand on Bas Swinckels' answer. He had a couple of small inconsistencies in his very helpful answer, and I wanted to give the correct code.
Here is what I did, and it worked.
from contextlib import contextmanager
import csv
import operator
import itertools as IT

@contextmanager
def open_many_files(filenames):
    files = [open(filename, 'r') for filename in filenames]
    try:
        yield files
    finally:
        for f in files:
            f.close()

def reorder_multiple_CSV(infilenames, outfilename, oldheadings, newheadings):
    with open_many_files(filter(None, infilenames.split(','))) as handles:
        with open(outfilename, 'w') as outfile:
            readers = [csv.reader(f) for f in handles]
            writer = csv.writer(outfile)
            reorderfunc = []
            for i, reader in enumerate(readers):
                readnames = reader.next()
                name2index = dict((name, index) for index, name in enumerate(readnames))
                writeindices = [name2index[name] for name in filter(None, oldheadings[i].split(","))]
                reorderfunc.append(operator.itemgetter(*writeindices))
            writer.writerow(filter(None, newheadings.split(",")))
            for rows in IT.izip_longest(*readers, fillvalue=[''] * 2):
                towrite = []
                for i, row in enumerate(rows):
                    towrite.extend(reorderfunc[i](row))
                if isinstance(towrite, str):
                    writer.writerow([towrite])
                else:
                    writer.writerow(towrite)
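The answers above are Python 2 (reader.next(), itertools.izip_longest). On Python 3, the standard library covers the "n files under one with" need directly via contextlib.ExitStack; a minimal sketch with three throwaway files standing in for the real csv inputs:

```python
import os
import tempfile
from contextlib import ExitStack

# Stand-in input files, each with a header line and one value.
tmpdir = tempfile.mkdtemp()
names = []
for i in range(3):
    path = os.path.join(tmpdir, 'file%d.csv' % i)
    with open(path, 'w') as f:
        f.write('col\n%d\n' % i)
    names.append(path)

# ExitStack closes every registered file on exit, however many were opened.
with ExitStack() as stack:
    handles = [stack.enter_context(open(n)) for n in names]
    first_lines = [h.readline().strip() for h in handles]
```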

writing .csv files from for loops and lists

I am new to python, but I have searched Stack Overflow, Google, and Codecademy for an answer or inspiration for my obviously very simple problem. I thought a simple example where a for loop is used to save every iteration would be easy to find, but I've either missed it or don't have the vocabulary to ask the right question. So please don't loudly sigh in front of your monitor at this simple question. Thanks.
I would simply like to write a csv file with each iteration of the two print lines in the code below in a separate column, so an output example might look like:
##################
andy.dat, 8
brett.dat, 9
candice.dat, 11
#################
The code I have so far is:
import sys
import os.path

image_path = "C:\\"
for filename in os.listdir(image_path):
    print filename
    print len(filename)
If I try to do x = filename, then I only get the last iteration of the loop written to x. How do I write all of them to x using a for loop? Also, how do I write it as a column in a csv with the result of len(filename) next to it? Thanks.
Although this task doesn't strictly require it, I would take advantage of standard-library modules like csv when you can. Try something like this:
import os
import csv

csvfile = open('outputFileName.csv', 'wb')
writer = csv.writer(csvfile)
for filename in os.listdir('/'):  # or C:\\ if on Windows
    writer.writerow([filename, len(filename)])
csvfile.close()
I'd probably change this:
for filename in os.listdir(image_path):
    print filename
    print len(filename)
To something like
lines = list()
for filename in os.listdir(image_path):
    lines.append("%s, %d" % (filename, len(filename)))
My version creates a python list, then on each iteration of your for loop, appends an entry to it.
After you're done, you could print the lines with something like:
for line in lines:
    print(line)
Alternatively, you could initially create a list of tuples in the first loop, then format the output in the second loop. This approach might look like:
lines = list()
# Populate list
for filename in os.listdir(image_path):
    lines.append((filename, len(filename)))
# Print list
for line in lines:
    print("%s, %d" % (line[0], line[1]))
# Or more simply
for line in lines:
    print("%s, %d" % line)
Lastly, you don't really need to explicitly store the filename length, you could just calculate it and display it on the fly. In fact, you don't even really need to create a list and use two loops.
Your code could be as simple as
import sys, os

image_path = "C:\\"
for filename in os.listdir(image_path):
    print("%s, %d" % (filename, len(filename)))
import sys
import os

image_path = "C:\\"
output = file("output.csv", "a")
for filename in os.listdir(image_path):
    output.write("%s,%d\n" % (filename, len(filename)))
output.close()
Where "a" in the file constructor opens the file for appending; you can read more about the different modes in which you can use file objects here.
Try this:
# Part 1
import csv
import os

# Part 2
image_path = "C:\\"

# Part 3
li = []  # empty list
for filename in os.listdir(image_path):
    li.append((filename, len(filename)))  # populating the list

# Part 4
with open('test.csv', 'w') as f:
    f.truncate()
    writer = csv.writer(f)
    writer.writerows(li)
Explanation:
In Part 1, we import the csv and os modules.
In Part 2, we declare image_path.
In Part 3, we declare an empty list (li), then go into a for loop in which we populate the list with every item in image_path and its length.
In Part 4, we move on to writing the csv file: inside the with statement, we write all the data from li into the file.
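For reference, the same end-to-end flow on Python 3 uses newline='' rather than the 'wb' mode shown earlier; a self-contained sketch where three throwaway .dat files stand in for the real image folder:

```python
import csv
import os
import tempfile

# Stand-in for image_path: a few empty .dat files.
image_path = tempfile.mkdtemp()
for name in ('andy.dat', 'brett.dat', 'candice.dat'):
    open(os.path.join(image_path, name), 'w').close()

# Write one "filename, length" row per .dat file.
out_path = os.path.join(image_path, 'output.csv')
with open(out_path, 'w', newline='') as f:
    writer = csv.writer(f)
    for filename in sorted(os.listdir(image_path)):
        if filename.endswith('.dat'):
            writer.writerow([filename, len(filename)])

# Read it back to check the rows.
with open(out_path, newline='') as f:
    rows = list(csv.reader(f))
```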
