I have a folder with many .csv files in it with the following format:
FGS07_NAV_26246_20210422_86oylt.xls
FGS07_NAV_26246_ is always the same; 20210422 is the date and the most important part for picking the file; _86oylt also changes but is not important at all.
I need to read one csv file with the same date as the operation date.
Let's think of y as our date part. I tried this code, but it doesn't give me the right name:
file2 = r'C:/Users/name/Finance/LOF_PnL/FGS07_NAV_26246_' + y + '*.xls'
df2 = pd.read_excel(file2)
How should I fix this?
If you want just the specific file, you could try this one:
import os
xls_file = [file for file in os.listdir(r"C:/Users/name/Finance/LOF_PnL")
            if file.endswith("xls") and y in file][0]
You can use the glob module:
import glob
file2 = glob.glob(file2)[0]
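A fuller sketch of the glob approach (the folder and file names below are hypothetical stand-ins for the ones in the question; the matched path would then be passed to pd.read_excel):

```python
import glob
import os
import tempfile

# Hypothetical stand-in for the question's folder and files
folder = tempfile.mkdtemp()
for name in ("FGS07_NAV_26246_20210422_86oylt.xls",
             "FGS07_NAV_26246_20210301_a1b2c3.xls"):
    open(os.path.join(folder, name), "w").close()

y = "20210422"  # the operation date
pattern = os.path.join(folder, "FGS07_NAV_26246_" + y + "*.xls")
matches = glob.glob(pattern)
# matches[0] holds the full path; it can then go to pd.read_excel(matches[0])
print(os.path.basename(matches[0]))
```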
import os
all_files = os.listdir(r'C:/Users/name/Finance/LOF_PnL')
filtered_files = list(filter(lambda x : 'FGS07_NAV_26246_' + y in x, all_files))
and now filtered_files is a list of the names of all files containing 'FGS07_NAV_26246_' + y in their file names. You can prepend the folder path to these names if you want the absolute path. You can also use the re module for a fancier pattern lookup than the `in` operator.
Maybe you can try to use "".join() or os.path.join(), which are more standard.
"".join([str1, str2])
os.path.join(path_to_file, filename)
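For instance, a small sketch of the difference between the two (the path and file name are hypothetical):

```python
import os

# hypothetical path and file name
path_to_file = os.path.join("home", "user", "data")
filename = "report.csv"

joined = "".join([path_to_file, filename])   # naive concatenation, no separator
full = os.path.join(path_to_file, filename)  # inserts the platform's separator

print(joined)
print(full)
```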
I hope this is helpful. Maybe also double-check the file's extension.
How do you read a file for each name defined in a list?
First, I defined a method to import a csv file.
def f_read_csv(tgrt_csv):
    trgt_csv_temp = '%s.csv' % (tgrt_csv)
    tgrt_tbl = pd.read_csv("".join([get_csv_path, trgt_csv_temp]))
    return tgrt_tbl
Second, using a for loop, I tried to pass in the name of each file in the list.
for name in read_csv_list:
    f_read_csv('%s' % name)
How can I get each csv file by the names in the list?
I often use list comprehension to generate dataframes.
dfs = [f_read_csv(name) for name in read_csv_list]
and use pd.concat to concat them
df = pd.concat(dfs)
Also, you can use glob to generate the file list:
files = glob.glob("/path/to/target/files/*.csv")
I’m having trouble understanding your question. I think you’re asking how to append '.csv' to the elements of the list. Is that correct?
If so then you can use the map() function to achieve it:
z = list(map(lambda ip: ip + '.csv', tgrt_csv))
Then you can load the contents using pd.read_csv() method.
I've changed your function to get the names out, because I am not sure it works as posted; some parts of the puzzle are missing :-).
But I kept some of its characteristics, like checking for the file extension.
In order to use os.listdir we import the os module.
import os
I've defined my path as a folder in my python working directory and stored 3 csv files in there.
path = 'somedir/'
Here is your new function:
def f_read_csv(tgrt_csv):
    tgrt_tbl = []  # to store file names
    for file in os.listdir(tgrt_csv):  # access the directory
        if file.endswith('.csv'):  # check for files with a .csv extension
            name_file = os.path.join(tgrt_csv, file)
            tgrt_tbl.append(name_file)
    return tgrt_tbl
Then you call the function f_read_csv and pass the path:
names = f_read_csv("somedir/")
The output, if you print it:
['somedir/file 1.csv', 'somedir/file 2.csv', 'somedir/file 3.csv']
If you want them as strings you can get them out of the list:
for name in names:
    print(name)
somedir/file 1.csv
somedir/file 2.csv
somedir/file 3.csv
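Putting the pieces together, a self-contained version of the above (using a temporary directory as a stand-in for somedir/):

```python
import os
import tempfile

def f_read_csv(tgrt_csv):
    tgrt_tbl = []  # to store file names
    for file in os.listdir(tgrt_csv):
        if file.endswith('.csv'):
            tgrt_tbl.append(os.path.join(tgrt_csv, file))
    return tgrt_tbl

# hypothetical stand-in for 'somedir/' with three csv files in it
somedir = tempfile.mkdtemp()
for i in (1, 2, 3):
    open(os.path.join(somedir, 'file %d.csv' % i), 'w').close()

names = sorted(f_read_csv(somedir))
for name in names:
    print(name)
```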
Do you mean reading a list of filenames with pandas and transforming them into DataFrames? If all the csv files have the same format, you can use dask.dataframe:
import dask.dataframe as dd
ddf = dd.read_csv(f"{get_csv_path}*.csv")
df = ddf.compute()
Note that dask may be incompatible with some pandas versions, so check your installed versions.
I am trying to use pandas.read_csv to read files whose names contain a date. I used the code below to do the job. The problem is that the file names are not consistent, since the changing dates alter the pattern. Is there a way to have the code read a file when part of its name is a date?
for x in range(0, 10):
    dat = 20170401 + x
    dat2 = dat + 15
    file_name = 'JS_ALL_V.' + str(dat) + '_' + str(dat2) + '.csvp.gzip'
    df = pd.read_csv(file_name, compression='gzip', delimiter='|')
You can use the glob library to match file names with Unix-style wildcards.
Below is its hello world:
import glob
for name in glob.glob('dir/*'):
    print(name)
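Applied to the file names in the question, a sketch (the folder and dates below are hypothetical; each matched path could then be passed to pd.read_csv with compression='gzip'):

```python
import glob
import os
import tempfile

# hypothetical files in the question's naming scheme
folder = tempfile.mkdtemp()
for dat in (20170401, 20170405):
    name = 'JS_ALL_V.%d_%d.csvp.gzip' % (dat, dat + 15)
    open(os.path.join(folder, name), 'w').close()

# match any start/end date pair without computing the dates first
matches = sorted(glob.glob(os.path.join(folder, 'JS_ALL_V.*.csvp.gzip')))
print([os.path.basename(m) for m in matches])
```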
An alternative to glob.glob() (since it seems not to be working for you) is os.listdir(), as explained in this question, in order to get a list of all the elements (or just the files) in your path.
I have an image sequence path that is as follows : /host_server/master/images/set01a/env_basecolor_default_v001/basecolor_default.*.jpg
In a pythonic way, is it possible for me to code it so that it reads the first file of the file path given above?
If not, can I have it list the entire sequence, but only of that naming? Assume that there is another sequence called basecolor_default_beta.*.jpg in the same directory.
For #2, if I use os.listdir('/host_server/master/images/set01a/env_basecolor_default_v001'), it will list the files of both image sequences.
The simplest solution seems to be to use several functions.
1) To get ALL of the full filepaths, use
main_path = "/host_server/master/images/set01a/env_basecolor_default_v001/"
all_files = [os.path.join(main_path, filename) for filename in os.listdir(main_path)]
2) To choose only those of a certain kind, use a filter.
beta_files = list(filter(lambda x: "beta" in x, all_files))
beta_files.sort()
read the first file based on the above file path given?
With glob.iglob(pathname, recursive=False), which is efficient if you just need the name/path of the first found file:
import glob
path = '/host_server/master/images/set01a/env_basecolor_default_v001/basecolor_default.*.jpg'
it = glob.iglob(path)
first = next(it)
glob.iglob() - Return an iterator which yields the same values as
glob() without actually storing them all simultaneously.
Try using glob. Something like:
import glob
import os
path = '/host_server/master/images/set01a/env_basecolor_default_v001'
pattern = 'basecolor_default.*.jpg'
filenames = glob.glob(os.path.join(path, pattern))
# read filenames[0]
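One caveat worth adding: glob does not guarantee any ordering, so sort the matches before taking the "first" frame. A runnable sketch, with a hypothetical temporary directory standing in for the server path:

```python
import glob
import os
import tempfile

# hypothetical stand-in for the directory holding both image sequences
path = tempfile.mkdtemp()
for i in (3, 1, 2):
    open(os.path.join(path, 'basecolor_default.%04d.jpg' % i), 'w').close()
open(os.path.join(path, 'basecolor_default_beta.0001.jpg'), 'w').close()

# the literal '.' in the pattern excludes the '_beta' sequence;
# sort, because glob's result order is not guaranteed
filenames = sorted(glob.glob(os.path.join(path, 'basecolor_default.*.jpg')))
first = filenames[0]
print(os.path.basename(first))
```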
I have 13 csv files in a folder called data, and I want to export those csv files in numerical order (1.csv, 2.csv ... 13.csv) to an Excel file with each sheet named (1, 2, 3, 4, 5 ... 13). I tried something like this:
from pandas.io.excel import ExcelWriter
import pandas
ordered_files = ['1.csv', '2.csv','3.csv','4.csv', '5.csv','6.csv','7.csv', '8.csv','9.csv','10.csv', '11.csv','12.csv','13.csv']
with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_files:
        pandas.read_csv(csv_file).to_excel(
            ew, index=False, sheet_name=csv_file, encoding='utf-8')
And I have two problems with this:
As you see in my list, I can't import the files directly from my data folder. If I try:
ordered_files = ['data/1.csv']
it won't find a valid csv.
If I use that list method, my sheet will be named, for example, 3.csv instead of just 3.
A side question: coming from csv, I saw some columns that should be int numbers formatted as strings with a ' in front.
Thank you so much for your time! I use python 3!
If all that concerns you is removing the last four characters from the sheet names, just use sheet_name=csv_file[:-4] in your call to to_excel. The comment from @pazqo shows you how to generate the correct path to find the CSV files in your data directory.
More generally, suppose you wanted to process all the CSV files on a given path, there are several ways to do this. Here's one straightforward way.
import os
from glob import glob

def process(path, ew):
    os.chdir(path)  # note this is a process-wide change
    for csv_file in glob('*.csv'):
        pandas.read_csv(csv_file).to_excel(ew,
                                           index=False,
                                           sheet_name=csv_file[:-4],
                                           encoding='utf-8')

with ExcelWriter('my_excel.xlsx') as ew:
    process("data", ew)
You might also consider generating the filenames using glob(os.path.join(path, "*.csv")) but that would also require you to remove the leading path from the sheet names - possibly worthwhile to avoid the os.chdir call, which is a bit ugly.
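A sketch of that chdir-free variant (the directory below is a hypothetical stand-in for data; the commented line marks where the to_excel call would go):

```python
import os
import tempfile
from glob import glob

# hypothetical stand-in for the data directory
path = tempfile.mkdtemp()
for i in (1, 2, 13):
    open(os.path.join(path, '%d.csv' % i), 'w').close()

sheet_names = []
for csv_file in sorted(glob(os.path.join(path, '*.csv'))):
    # strip the directory and the extension to get a clean sheet name
    sheet_names.append(os.path.splitext(os.path.basename(csv_file))[0])
    # pandas.read_csv(csv_file).to_excel(ew, index=False, sheet_name=sheet_names[-1])

print(sheet_names)
```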
Concerning your first question you could write relative path as follows:
"data/1.csv" or 'data//1.csv'.
About your second point: your sheet is named like this because you are looping over your csv names, so the values you use for the sheet names are '1.csv', ...
In my opinion you could write this instead:
from pandas.io.excel import ExcelWriter
import pandas
ext = '.csv'
n_files = 13
with ExcelWriter('data//my_excel.xlsx') as ew:
    for i in range(1, n_files + 1):
        pandas.read_csv('data//' + str(i) + ext).to_excel(
            ew, index=False, sheet_name=str(i), encoding='utf-8')
Because you have 13 files, named from 1 to 13, you can loop over them with range(1, n_files + 1); the range function generates the n_files integers starting from 1.
Hope it helps.
For importing files, the path is relative to the current working directory; if you use the absolute path it should work (such as "C:\data\1.csv" on Windows, or "/home/user/data/1.csv" in a Linux or Unix environment).
To remove the extension from the sheet name, list the file names without the .csv (such as orderedlist = range(1, 14)), then:
pandas.read_csv(<directory> + str(csv_file) + '.csv').to_excel(
which might be:
pandas.read_csv('/home/user/data/' + str(csv_file) + '.csv').to_excel(
Alternatively, keep the list as is and change the sheet name to
sheet_name=csv_file.split('.')[0]
to only return the portion of csv_file prior to the '.' .
I wanted to know the easiest way to rename multiple files using the re module in Python, if it is at all possible.
In my directory there are 25 files, all with file names in the format 'a unique name followed by the same 20 characters.mkv'.
What I want is to delete those 20 characters.
How can I do this using Python, if it is at all possible? :)
To get the new name:
>>> re.sub(r'.{20}(\.mkv)', r'\1', 'unique12345678901234567890.mkv')
'unique.mkv'
Or without regex:
>>> 'unique12345678901234567890.mkv'[:-24] + '.mkv'
'unique.mkv'
To rename the file use os.rename(old, new): http://docs.python.org/library/os.html#os.rename
To get a list of the files to rename use glob.glob('*.mkv'): http://docs.python.org/library/glob.html#glob.glob
Putting that all together we get:
for filename in glob.glob('*.mkv'):
    if len(filename) > 24:
        os.rename(filename, filename[:-24] + '.mkv')
Since you are cutting out a specific number of characters from an easily identified point in the string, the re module is somewhat overkill. You can prepare the new filename as:
new_name = old_name.rsplit('.', 1)[0][:-20] + '.mkv'
To find the files, look up os.listdir (or, if you want to look into directories recursively, os.walk), and to rename them, see os.rename.
The re module would be useful if there are other .mkv's in the directory that you don't want to rename, so that you need to do more careful checking to identify the "target" filenames.
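A runnable sketch of that more careful approach (the temporary directory and file names are hypothetical), using re to rename only the files that actually have the 20 extra characters:

```python
import os
import re
import tempfile

path = tempfile.mkdtemp()
open(os.path.join(path, 'unique12345678901234567890.mkv'), 'w').close()
open(os.path.join(path, 'short.mkv'), 'w').close()  # too short: left alone

# rename only files that have at least 20 characters before '.mkv'
for filename in os.listdir(path):
    m = re.fullmatch(r'(.+).{20}\.mkv', filename)
    if m:
        os.rename(os.path.join(path, filename),
                  os.path.join(path, m.group(1) + '.mkv'))

print(sorted(os.listdir(path)))
```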
Use glob to find the filenames, slice the strings, and use os.rename() to rename them.
Something like:
>>> import os
>>> doIt = False
>>> for filename in (x for x in os.listdir('.') if x.endswith('.mkv')):
...     newname = filename[:-24] + filename[-4:]
...     if doIt:
...         os.rename(filename, newname)
...         print("Renaming {0} to {1}".format(filename, newname))
...     else:
...         print("Would rename {0} to {1}".format(filename, newname))
...
When manipulating files, always do a dry run first. Change doIt to True to actually rename the files.
You need to use glob and os.rename. The rest is for you to figure out!
And yes, this is entirely possible and easy to do in Python.