I'm trying to append multiple CSV files from one directory into a single file in another directory. When I run this code it appears to compile successfully, but it has no effect: the combined.csv file remains empty after each run. There are also no errors in the console. I tried this in multiple IDEs (VS Code, PyCharm, and Spyder).
import os
import glob
import pandas

def concatenate(indir="/directoryA/directoryB/csvFile_directoryC",
                outfile="/directoryA/directoryB/combine.csv"):
    os.chdir(indir)
    filelist=glob.glob("*.csv")
    dfList=[]
    colnames=["c1","c2","c3","c4"]
    for filename in filelist:
        print(filename)
        df=pandas.read_csv(filename,header=None)
        dfList.append(df)
    concatDf=pandas.concat(dfList,axis=0)
    concatDf.columns=colnames
    concatDf.to_csv(outfile,index=None)
Well, it's not going to print anything if you didn't call it ;)
I think you just forgot to call your function in your program. That's why it compiles without errors: defining a function is valid on its own, but since concatenate() is never called, nothing is printed and nothing is written.
If it's not what's been stated above (i.e. you need to call the function), you may not actually be finding any *.csv files in that directory. Since it's a for loop over filelist, an empty filelist leaves dfList empty as well, and pandas.concat raises a ValueError ("No objects to concatenate") when given an empty list, so in that case you would see an error rather than a silently empty file.
If it's not the fact that the function is never called, try printing the result from os.listdir() to see what's in there for glob to check against, and check that your filelist isn't an empty list.
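Both suggestions above can be sketched together: call the function, and fail loudly when nothing matches. This is a minimal sketch rather than the asker's code. It swaps pandas for the standard-library csv module so that it runs without extra packages, and the name concatenate_csv and the throwaway demo files are made up for illustration.

```python
import csv
import glob
import os
import tempfile


def concatenate_csv(indir, outfile):
    """Append the rows of every *.csv in indir to outfile; return the file count."""
    paths = sorted(glob.glob(os.path.join(indir, "*.csv")))
    if not paths:
        # An empty match is the silent failure mode, so report it explicitly.
        raise FileNotFoundError("no .csv files found in " + indir)
    with open(outfile, "w", newline="") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="") as src:
                writer.writerows(csv.reader(src))
    return len(paths)


# Demo on throwaway files (hypothetical data), ending with the step the
# question was missing: actually calling the function.
with tempfile.TemporaryDirectory() as tmp:
    indir = os.path.join(tmp, "in")
    os.mkdir(indir)
    for name, row in [("a.csv", "1,2,3,4\n"), ("b.csv", "5,6,7,8\n")]:
        with open(os.path.join(indir, name), "w") as f:
            f.write(row)
    outfile = os.path.join(tmp, "combined.csv")
    print(concatenate_csv(indir, outfile), "files merged")
```

Building full paths with os.path.join avoids the os.chdir call, so the output path does not depend on the current working directory.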
Related
I am attempting to find every file in a directory that contains the file extension: '.py'
I tried doing:
if str(file.contains('.py')):
    pass
I also thought of doing a for loop and going through each character in the filename but thought better of it, thinking that it would be too intense, and concluded that there would be an easier way to do things.
Below is an example of what I would want my code to look like, obviously replacing line 4 with an appropriate answer.
def find_py_in_dir():
    for file in os.listdir():
        #This next line is me guessing at some method
        if str(file.contains('.py')):
            pass
Ideally, you'd use endswith('.py') for actual file extensions rather than a substring check (which you'd do with the in operator).
But you can skip the if statement entirely and let glob do the matching:
https://docs.python.org/3/library/glob.html
import glob
for pyfile in glob.glob('*.py'):
    print(pyfile)
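Strings have no .contains method, so file.contains raises an AttributeError, and wrapping the result in str() would make the condition always true anyway, because any non-empty string is truthy. Remove the str and replace .contains with .__contains__:
if file.__contains__('.py'):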
You can also do:
if '.py' in file:
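The difference between the substring test and a real extension test shows up as soon as a name contains ".py" somewhere other than the end. A minimal sketch, keeping the question's function name but adding a directory parameter and throwaway demo files that are assumptions for illustration:

```python
import os
import tempfile


def find_py_in_dir(directory="."):
    """Return the names in directory that end with the .py extension."""
    return sorted(name for name in os.listdir(directory) if name.endswith(".py"))


# Demo on throwaway files: only script.py has a real .py extension.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("script.py", "notes.py.bak", "data.csv"):
        open(os.path.join(tmp, name), "w").close()
    print(find_py_in_dir(tmp))
    # The substring test also accepts notes.py.bak.
    print(sorted(n for n in os.listdir(tmp) if ".py" in n))
```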
I've searched for an answer for this but the answers still gave me an error message and I wasn't allowed to ask there because I had to make a new question. So here it goes...
I need my python script to use the latest file in a folder.
I tried several things, currently the piece of code looks like this:
list_of_files = glob.glob('/my/path/*.csv')
latest_file = max(list_of_files, key=os.path.getmtime)
But the code fails with the following error:
ValueError: max() arg is an empty sequence
Does anyone have an idea why?
It should be ok if the list is not empty, but it seems to be. So first check if the list isn't empty by printing it or something similar.
I tested this code and it worked fine:
import os
import glob
mypath = "C:/Users/<Your username>/Downloads/*.*"
print(min(glob.glob(mypath), key=os.path.getmtime))
print(max(glob.glob(mypath), key=os.path.getmtime))
glob.glob does not match files whose names start with a dot (.) unless the pattern itself starts with a dot.
So, if you want to match these files, this is what you should do (assuming the directory contains .picture.png):
import glob
glob.glob('.p*') #assuming you're already in the directory
It is also a good idea to check how many files matched before operating on them, since max() raises the ValueError above when the list is empty.
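The empty-list check can be folded into the max() call itself. A minimal sketch, assuming Python 3.4 or later (where max() accepts a default argument); the function name latest_file and the throwaway demo files are made up for illustration:

```python
import glob
import os
import tempfile


def latest_file(pattern):
    """Return the most recently modified path matching pattern, or None."""
    matches = glob.glob(pattern)
    # default=None avoids "ValueError: max() arg is an empty sequence".
    return max(matches, key=os.path.getmtime, default=None)


with tempfile.TemporaryDirectory() as tmp:
    print(latest_file(os.path.join(tmp, "*.csv")))  # nothing matches yet
    for age, name in enumerate(["old.csv", "new.csv"]):
        path = os.path.join(tmp, name)
        open(path, "w").close()
        os.utime(path, (1000 + age, 1000 + age))  # set explicit mtimes
    print(os.path.basename(latest_file(os.path.join(tmp, "*.csv"))))
```

If the function returns None, the pattern or the directory is wrong, which is usually quicker to diagnose than a traceback from max().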
Using python 2.7
I have a list of *.tar.gz files on a Linux box. Using Python, I want to loop through the files and extract them to a different location, each under its own folder.
For example: if my file name is ~/TargetData/zip/1440198002317590001.tar.gz
then I want to untar and ungzip this file in a different location under its
respective folder name i.e. ~/TargetData/unzip/1440198002317590001.
I have written some code but I am not able to loop through the files. On the command line I am able to untar using the $ tar -czf 1440198002317590001.tar.gz 1440198002317590001 command, but I want to be able to loop through the .tar.gz files. The code is below. Here, I'm not able to loop over just the files or print only the files. Can you please help?
import os
inF = []
inF = str(os.system('ls ~/TargetData/zip/*.tar.gz'))
#print(inF)
if inF is not None:
    for files in inF[:-1]:
        print files
        """
        os.system('tar -czf files /unzip/files[:-7]')
        # This is what i am expecting here files = "1440198002317590001.tar.gz" and files[:-7]= "1440198002317590001"
        """
Have you ever worked on this type of use case? Your help is greatly appreciated!! Thank you!
I think you misunderstood os.system(). It does run the command, but its return value is not what you expected: it returns the exit status (0 on success), not the command's output, so you cannot assign the file listing to a variable this way. You could capture the output with the subprocess module instead (see its documentation). However, I DO NOT recommend listing files that way, because the output comes back as one string instead of a list.
The best way, I think, is the glob module (see its documentation). glob.glob(pattern) returns a list of all paths that match the pattern, which you can then loop over easily.
Of course, if you are familiar with the os module, you can also use os.listdir(), os.path.join(), and os.path.expanduser() to do this. (Unlike glob, os.listdir() returns bare filenames without the directory, so you need to rebuild the full path yourself.)
By the way, for your purpose here there is no need to declare an empty list first (i.e. inF = []).
For the extraction part, you can use os.system, but I recommend the subprocess module instead; the subprocess documentation explains why.
Do NOT read the following code until you have really tried and cannot solve this by yourself.
import os
import glob
import subprocess

# glob does not expand "~", so expand it explicitly first
inF = glob.glob(os.path.expanduser('~/TargetData/zip/*.tar.gz'))
if inF:
    for files in inF:
        # .../zip/1440198002317590001.tar.gz -> .../unzip/1440198002317590001
        unzip_name = files.replace('/zip/', '/unzip/')[:-7]
        # tar -C needs the target directory to exist, so create it first
        if not os.path.isdir(unzip_name):
            os.makedirs(unzip_name)
        # subprocess.call() instead of os.system: each argument is its own
        # list item, and -x extracts (-c would create an archive)
        subprocess.call(['tar', '-xzf', files, '-C', unzip_name])
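The same loop can also be written without shelling out to tar at all. This is a sketch of a swapped-in technique, not what the answer above uses: it relies on the standard-library tarfile module, is written for Python 3 (os.makedirs with exist_ok is not available in 2.7), and the function name extract_all is made up for illustration.

```python
import glob
import os
import tarfile


def extract_all(zip_dir, unzip_dir):
    """Extract each *.tar.gz in zip_dir into unzip_dir/<archive name without .tar.gz>."""
    extracted = []
    pattern = os.path.join(os.path.expanduser(zip_dir), "*.tar.gz")
    for archive in sorted(glob.glob(pattern)):
        # "1440198002317590001.tar.gz" -> "1440198002317590001"
        name = os.path.basename(archive)[:-len(".tar.gz")]
        target = os.path.join(os.path.expanduser(unzip_dir), name)
        os.makedirs(target, exist_ok=True)
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(target)
        extracted.append(target)
    return extracted
```

With the paths from the question, the call would be extract_all("~/TargetData/zip", "~/TargetData/unzip"). Only extract archives you trust, since a tar file can contain entries that point outside the target directory.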
If I am to read a number of files in Python 3.2, say 30-40, and I want to keep the file references in a list
(all the files are in a common folder)
Is there any way I can open all the files to their respective file handles in the list, without having to open every file individually via the open() function?
This is simple, just use a list comprehension based on your list of file paths. Or if you only need to access them one at a time, use a generator expression to avoid keeping all forty files open at once.
list_of_filenames = ['/foo/bar', '/baz', '/tmp/foo']
open_files = [open(f) for f in list_of_filenames]
If you want handles on all the files in a certain directory, use the os.listdir function:
import os
open_files = [open(f) for f in os.listdir(some_path)]
I've assumed a simple, flat directory here, but note that os.listdir returns a list of paths to all file objects in the given directory, whether they are "real" files or directories. So if you have directories within the directory you're opening, you'll want to filter the results using os.path.isfile:
import os
open_files = [open(f) for f in os.listdir(some_path) if os.path.isfile(f)]
Also, os.listdir only returns the bare filename, rather than the whole path, so if the current working directory is not some_path, you'll want to make absolute paths using os.path.join.
import os
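# build the full path once so that both isfile() and open() see the same file
open_files = [open(os.path.join(some_path, f)) for f in os.listdir(some_path)
              if os.path.isfile(os.path.join(some_path, f))]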
With a generator expression:
import os
all_files = (open(f) for f in os.listdir(some_path)) # note () instead of []
for f in all_files:
    pass # do something with the open file here.
In all cases, make sure you close the files when you're done with them. If you can upgrade to Python 3.3 or higher, I recommend you use contextlib.ExitStack for one more level of convenience.
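A minimal sketch of the ExitStack approach, assuming Python 3.3 or later. The function name read_all and its return shape are made up for illustration; some_path is the same placeholder directory as in the snippets above.

```python
import os
from contextlib import ExitStack


def read_all(some_path):
    """Open every regular file in some_path at once and return {name: contents}."""
    names = sorted(
        name for name in os.listdir(some_path)
        if os.path.isfile(os.path.join(some_path, name))
    )
    with ExitStack() as stack:
        # enter_context registers each handle, so every file is closed when
        # the with block exits, even if one of the reads raises.
        handles = [
            stack.enter_context(open(os.path.join(some_path, name)))
            for name in names
        ]
        return {name: handle.read() for name, handle in zip(names, handles)}
```

All handles stay open together inside the with block, which is the point of the original question, and none of them has to be closed by hand.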
The os library (and listdir in particular) should provide you with the basic tools you need:
import os
print("\n".join(os.listdir())) # returns all of the files (& directories) in the current directory
Obviously you'll want to call open with them, but this gives you the files in an iterable form (which I think is the crux of the issue you're facing). At this point you can just do a for loop and open them all (or some of them).
Quick caveat: Jon Clements pointed out in the comments of Henry Keiter's answer that you should watch out for directories, which will show up in os.listdir along with files.
Additionally, this is a good time to write in some filtering statements to make sure you only try to open the right kinds of files. You might be thinking you'll only ever have .txt files in a directory now, but someday your operating system (or users) will have a clever idea to put something else in there, and that could throw a wrench in your code.
Fortunately, a quick filter can do that, and you can do it a couple of ways (I'm just going to show a regex filter):
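import os, re
scripts = re.compile(r".*\.py$")  # raw string, so \. reaches the regex engine unchanged
files = [open(x, 'r') for x in os.listdir() if os.path.isfile(x) and scripts.match(x)]
contents = [f.read() for f in files]
for f in files:
    f.close()  # close the handles once their contents have been read
print("\n".join(contents))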
Note that I'm not checking things like whether I have permission to access the file, so if I have the ability to see the file in the directory but not permission to read it then I'll hit an exception.
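One way to handle that case is to catch the exception per file and keep going. A minimal sketch; the function name read_readable and its return shape are made up for illustration:

```python
def read_readable(paths):
    """Return ({path: text}, [skipped paths]) without raising on an unreadable file."""
    contents, skipped = {}, []
    for path in paths:
        try:
            with open(path, "r") as handle:
                contents[path] = handle.read()
        except OSError:
            # OSError covers PermissionError, FileNotFoundError and
            # IsADirectoryError, which are all subclasses of it.
            skipped.append(path)
    return contents, skipped
```

Returning the skipped paths instead of silently dropping them makes it possible to log or report what could not be read.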
I'm trying to get a homemade path navigation function working - basically I need to go through one folder, and explore every folder within it, running a function within each folder.
I reach a problem when I try to change directories within a for loop. I've got this "findDirectories" function:
def findDirectories(list):
    for files in os.listdir("."):
        print (files)
        list.append(files)
        os.chdir("y")
That last line causes the problems. If I remove it, the function just compiles a list with all the folders in that folder. Unfortunately, this means I have to run this each time I go down a folder, I can't just run the whole thing once. I've specified the folder "y" as that's a real folder, but the program crashes upon opening even with that. Doing os.chdir("y") outside of the for loop has no issues at all.
I'm new to Python, but not to programming in general. How can I get this to work, or is there a better way? The final result I need is running a Function on each single "*Response.xml" file that exists within this folder, no matter how deeply nested it is.
Well, you don't post the traceback of the actual error but clearly it doesn't work as you have specified y as a relative path.
Thus it may be able to change to y in the first iteration of the loop, but in the second it will be trying to change to a subdirectory of y that is also called y, which you probably do not have.
You want to be doing something like
import os
for dirName, subDirs, fileNames in os.walk(rootPath):
    # it's not clear which files you want, I assume anything that ends with Response.xml?
    for f in fileNames:
        if f.endswith("Response.xml"):
            # this is the path you will want to use
            filePath = os.path.join(dirName, f)
            # now do something with it!
            doSomethingWithFilePath(filePath)
That's untested, but you get the idea.
As Dan said, os.walk would be better. See the example there.
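A self-contained version of the os.walk approach, written as a generator so the caller decides what to do with each match. This is a sketch: the name find_responses and its suffix parameter are made up for illustration, and no os.chdir is needed because os.walk reports each directory's path.

```python
import os


def find_responses(root_path, suffix="Response.xml"):
    """Yield the path of every file under root_path whose name ends with suffix."""
    for dir_name, _sub_dirs, file_names in os.walk(root_path):
        for name in file_names:
            if name.endswith(suffix):
                # dir_name already contains the route down from root_path
                yield os.path.join(dir_name, name)
```

Usage would look like: for path in find_responses("."): process(path), where process stands in for whatever function needs to run on each file.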