I have 13 CSV files in a folder called data, and I want to export those files in numerical order (1.csv, 2.csv, ..., 13.csv) to an Excel file with each sheet named (1, 2, 3, 4, 5, ..., 13). I tried something like this:
from pandas.io.excel import ExcelWriter
import pandas

ordered_files = ['1.csv', '2.csv', '3.csv', '4.csv', '5.csv', '6.csv', '7.csv', '8.csv', '9.csv', '10.csv', '11.csv', '12.csv', '13.csv']

with ExcelWriter('my_excel.xlsx') as ew:
    for csv_file in ordered_files:
        pandas.read_csv(csv_file).to_excel(
            ew, index=False, sheet_name=csv_file, encoding='utf-8')
And I have two problems with this:
As you can see from my list, I can't read the files directly from my data folder. If I try:
ordered_files = ['data/1.csv']
it won't find a valid CSV.
And with that list approach, my sheet is named 3.csv, for example, instead of just 3.
A side question: coming from CSV, I see some columns that should be integer numbers formatted as strings with a ' in front.
Thank you so much for your time! I use Python 3!
If all that concerns you is removing the last four characters from the sheet names, just use sheet_name=csv_file[:-4] in your call to to_excel. The comment from @pazqo shows you how to generate the correct path to find the CSV files in your data directory.
More generally, suppose you wanted to process all the CSV files on a given path; there are several ways to do this. Here's one straightforward way.
import os
from glob import glob

import pandas
from pandas.io.excel import ExcelWriter

def process(path, ew):
    os.chdir(path)  # note this is a process-wide change
    for csv_file in glob('*.csv'):
        pandas.read_csv(csv_file).to_excel(ew,
                                           index=False,
                                           sheet_name=csv_file[:-4],
                                           encoding='utf-8')

with ExcelWriter('my_excel.xlsx') as ew:
    process("data", ew)
You might also consider generating the filenames using glob(os.path.join(path, "*.csv")) but that would also require you to remove the leading path from the sheet names - possibly worthwhile to avoid the os.chdir call, which is a bit ugly.
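As a sketch of that alternative (function names here are mine, not from the question): glob the joined pattern, then strip both the directory and the extension when naming each sheet, which avoids os.chdir entirely. Note that recent pandas versions dropped the encoding argument to to_excel, so it is omitted here.

```python
import os
from glob import glob

import pandas
from pandas import ExcelWriter

def sheet_name(csv_path):
    # Strip the directory and the '.csv' extension,
    # so 'data/10.csv' becomes '10' -- no os.chdir needed
    return os.path.splitext(os.path.basename(csv_path))[0]

def process(path, ew):
    # Search inside `path` without changing the working directory
    for csv_file in glob(os.path.join(path, '*.csv')):
        pandas.read_csv(csv_file).to_excel(
            ew, index=False, sheet_name=sheet_name(csv_file))

# Usage, assuming a data/ folder next to the script:
# with ExcelWriter('my_excel.xlsx') as ew:
#     process('data', ew)
```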
Concerning your first question, you could write a relative path as follows:
"data/1.csv" or 'data//1.csv'.
About your second point, your sheets are named like this because you are looping over your CSV file names, so the values you use for your sheet names are '1.csv', ...
In my opinion you should have written this instead:
from pandas.io.excel import ExcelWriter
import pandas
ext = '.csv'
n_files = 13

with ExcelWriter('data//my_excel.xlsx') as ew:
    for i in range(1, n_files + 1):
        pandas.read_csv('data//' + str(i) + ext).to_excel(
            ew, index=False, sheet_name=str(i), encoding='utf-8')
Because you have 13 files, named 1 through 13, you should loop over them with range(1, n_files + 1); the range function generates the integers from 1 to n_files.
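To make the loop bounds concrete, here is a minimal check of what range(1, n_files + 1) produces (using 3 instead of 13 to keep the output short):

```python
n_files = 3  # 13 in the original question; 3 keeps the example short
print(list(range(1, n_files + 1)))  # [1, 2, 3]
```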
Hope it helps.
For importing files, the path is relative to the current working directory. If you use the absolute path it should work (such as "C:\data\1.csv" on Windows, or "/home/user/data/1.csv" in a Linux or Unix environment).
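If juggling slashes across platforms gets error-prone, one hedged alternative is the standard library's pathlib, which joins path parts portably on both Windows and Unix:

```python
from pathlib import Path

data_dir = Path('data')        # relative to the current working directory
csv_path = data_dir / '1.csv'  # the '/' operator joins path parts portably
print(csv_path.name)           # 1.csv
```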
To remove the extension from the sheet name, list the file names without the .csv (such as orderedlist = list(range(1, 14))), then:
pandas.read_csv(<directory> + str(csv_file) + '.csv').to_excel(
which might be:
pandas.read_csv('/home/user/data/' + str(csv_file) + '.csv').to_excel(
Alternatively, keep the list as is and change the sheet name to
sheet_name=csv_file.split('.')[0]
to return only the portion of csv_file before the '.'.
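One caveat with split('.'): if a filename ever contains an extra dot, everything after the first dot is lost. os.path.splitext removes only the final extension, which is a safer sketch:

```python
import os

name = 'my.data.file.csv'
print(name.split('.')[0])         # my            (truncated at the first dot)
print(os.path.splitext(name)[0])  # my.data.file  (only the extension removed)
```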
Related
This might be asking too much, but I am hoping to easily manipulate .xlsm or .xlsx files within a specific directory.
Is there a way to:
make a list of files in a directory (a la os.listdir or something similar)
choose a file from that list according to index (i.e., typing in '2' to retrieve xyz.xlsm from the list below)
tyc.xlsm
abc.xlsm
xyz.xlsm
gyf.xlsm
txz.xlsm
and then execute pandas.read_excel to convert to .csv for easy import into JMP
Pieces of the puzzle I am stuck on:
A. generating that list in Step 2 above with specific index positions
B. defining the io for pandas.read_excel as the output from that file name selection in Step 2 above.
Below is the code I have so far; I am able to list the .xlsm file and create the .csv from the specific sheet, but not sure how to do it in a folder of multiple Excel files.
import pandas as pd
import numpy as np
import os
for f_name in os.listdir('.'):
    if f_name.endswith('.xlsm'):
        print(f_name)

data_xls = pd.read_excel('example.xlsm', 'Sheet2', dtype=str, index_col=None)
data_xls.to_csv('csvfile.csv', encoding='utf-8', index=False)
Many thanks in advance!
You're on the right track;
import pandas as pd
import numpy as np
import os
# Makes a list of files in directory
files = []
directory = '.'
for f_name in os.listdir(directory):
    if f_name.endswith(".xlsm"):
        files.append(os.path.join(directory, f_name))

# Lists possible files
for i, file in enumerate(files):
    print(i, file)

# Prompts user to pick a file
while True:
    index = input('Pick a file by index: ')
    try:
        index = int(index)
        if index in range(len(files)):
            break
    except ValueError:
        pass
    print('Incorrect Input, Try Again.')

# Converts chosen file to csv
df = pd.read_excel(files[index])
df.to_csv(os.path.splitext(files[index])[0] + '.csv')
I would suggest you add the names to a list. Put the "Excel to CSV" process into a function with two arguments: your list of names and the index of the item you want. Call the function with those two arguments from the CLI via sys.argv if you wish.
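A minimal sketch of that suggestion (the function names pick_file and excel_to_csv are mine, not from the question): a helper that validates the index, a converter, and a sys.argv entry point:

```python
import os
import sys

import pandas as pd

def pick_file(files, index):
    # Validate the chosen index and return the matching path
    if not 0 <= index < len(files):
        raise IndexError('index {} is out of range'.format(index))
    return files[index]

def excel_to_csv(path):
    # Convert one Excel file to a CSV file next to it
    out = os.path.splitext(path)[0] + '.csv'
    pd.read_excel(path).to_csv(out, index=False)
    return out

if __name__ == '__main__' and len(sys.argv) == 2:
    # e.g.  python convert.py 2
    files = sorted(f for f in os.listdir('.') if f.endswith('.xlsm'))
    print(excel_to_csv(pick_file(files, int(sys.argv[1]))))
```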
Good day
We need to copy data from one server to another server for a migration. I have received an Excel list in which I have the following columns.
All contained files must be copied. Unfortunately, the new path where the documents should be copied should also be written into the DataFrame and finally the whole thing should be exported as CSV.
The export is no problem.
But I have problems with the loop.
In my imagination:
I have a basic destination path
- I work per line
- I copy the file using the file path
- I add the "destination path + file name" in new path in the data frame
- Repeat on the next line
So I started with:
import os
import glob
import shutil
import numpy as np
import pandas as pd
Docdf = pd.read_excel(r'S:\Test_MSC.xlsx')
destpath = 'S:\\Test_dest\\'

for f in Docdf:
    [...] *problem*

Docdf.to_csv("enchanced_file.csv", sep=";", encoding="utf-8")
How do I best build the loop?
Many thanks for the support
for f in Docdf: is wrong; it will iterate over the column names. You need to iterate over the rows, or just use the apply method:
from shutil import copyfile

def copying(row):
    oldpath = row.iloc[4]             # column E
    newpath = destpath + row.iloc[1]  # column B
    copyfile(oldpath, newpath)
    return newpath

Docdf['new_path'] = Docdf.apply(copying, axis=1)
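If the spreadsheet's headers are known, iterating with itertuples and named columns reads more clearly than positional indexes. The column names filepath and filename below are assumptions, stand-ins for whatever columns E and B are actually called:

```python
import shutil

import pandas as pd

def copy_rows(df, destpath):
    # Copy each file and collect the new locations in a 'new_path' column.
    # 'filepath' and 'filename' are hypothetical column names.
    new_paths = []
    for row in df.itertuples(index=False):
        new_path = destpath + row.filename
        shutil.copyfile(row.filepath, new_path)
        new_paths.append(new_path)
    df = df.copy()
    df['new_path'] = new_paths
    return df
```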
I have a list of dataframes (n = 275). I'd like to save each one of them as a separate CSV file in the same directory on my PC, and I'd like to write a function to do that automatically. Maybe someone could give advice on how I can do that?
Can anybody assist me with this:
dframes_list - list of dataframe names
df_00001 - example of a dataframe name that I have now and that I expect.
Thank you in advance.
(does not address OP; leaving here for historical purposes)
You can do something simple by looping over the list and calling the DataFrame.to_csv method:
import os

folderpath = "your/folder/path"
for i, df in enumerate(dframes_list, 1):
    filename = "df_{}".format(i)
    filepath = os.path.join(folderpath, filename)
    df.to_csv(filepath)
I think the following code will do what you want. Putting it here for others who may want to write separate CSV files from a list of dataframes in Python.
This is an update to the answer provided by @CrepeGoat:
import os

folderpath = "your/folder/path-where-to-save-files"
csv = 'csv'  # output file type
for i, df in enumerate(dflist, 1):
    filename = "df_{}.{}".format(i, csv)
    filepath = os.path.join(folderpath, filename)
    df.to_csv(filepath)
I have a folder that contains a variable number of files, and each file has a variable string in the name. For example:
my_file V1.csv
my_file V2.csv
my_file something_else.csv
I would need to:
Load all the files whose names start with "my_file"
Concatenate all of them in a single dataframe
Right now I am doing this with an individual pd.read_csv call for each file and then merging them with a concatenate.
This is not optimal, because every time the files in the source folder change, I need to modify the script.
Is it possible to automate this process, so that it works even if the source files change?
You can combine glob, pandas.concat and pandas.read_csv fairly easily. Assuming the CSV files are in the same folder as your script:
import glob
import pandas as pd
df = pd.concat([pd.read_csv(f) for f in glob.glob('my_file*.csv')])
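If you later need to know which rows came from which file, a common sketch is to tag each frame with its source name before concatenating (the column name source is my choice, not from the question):

```python
import glob

import pandas as pd

files = sorted(glob.glob('my_file*.csv'))
if files:  # guard: pd.concat raises on an empty list
    df = pd.concat(
        # assign() adds a 'source' column holding each file's name
        (pd.read_csv(f).assign(source=f) for f in files),
        ignore_index=True,
    )
```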
import os

directory = '.'
for filename in os.listdir(directory):
    if filename.startswith("my_file") and filename.endswith(".csv"):
        # do some stuff here
        continue
    else:
        continue
I have a folder with .exp files. They're basically .csv files but with a .exp extension (just the format of files exported from the instrument). I know because changing .exp to .csv still allows opening them in Excel as CSV files. Example here: https://uowmailedu-my.sharepoint.com/personal/tonyd_uow_edu_au/Documents/LAB/MC-ICPMS%20solution/Dump%20data%20here?csf=1
In Python, I want to read the data from each file into a dataframe (one per file). I've tried the following code, but it builds the list dfs with all the files, and:
(i) I don't know how to access the contents of the list dfs and turn them into several dataframes
(ii) it looks like the columns in the original .exp files were lost.
import os
import glob
import pandas as pd

# change directory
os.chdir('..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()

# get data file names
filenames = glob.glob(path + "/*.csv")

dfs = []
for filename in filenames:
    dfs.append(pd.read_csv(filename))
Do you guys have any ideas how I could read these files into dataframes, so I can easily access their content?
I found this post: Storing csv file's contents into data Frames [Python Pandas], but it wasn't too helpful in my case.
Thanks
I would recommend you switch to using an absolute path to your folder. It is also safer to use os.path.join() when combining file parts (better than string concatenation).
To make things easier to understand, I suggest that rather than just creating a list of dataframes, you create a list of tuples containing the filename and the dataframe; that way you will know which is which.
In your code, you are currently searching for .csv files, not .exp files.
The following creates the list of dataframes, with each entry also storing the corresponding filename. At the end, it cycles through all of the entries and displays the data.
Lastly, it shows how you would, for example, display just the first entry.
import pandas as pd
import glob
import os

# change directory
os.chdir('..\LAB\MC-ICPMS solution\Dump data here')
path = os.getcwd()

# get data file names
dfs = []
for filename in glob.glob(os.path.join(path, "*.exp")):
    dfs.append((filename, pd.read_csv(filename)))

print("Found {} exp files".format(len(dfs)))

# display each of your dataframes
for filename, df in dfs:
    print(filename)
    print(df)

# To display just the first entry:
print("Filename:", dfs[0][0])
print(dfs[0][1])
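As a variation on the list of tuples, a dict keyed by the bare filename makes individual frames easy to look up; the key 'run1' below is only an illustration, not a file from the question:

```python
import glob
import os

import pandas as pd

path = os.getcwd()  # or the 'Dump data here' folder from the answer

# Map each bare filename (no directory, no extension) to its dataframe
dfs = {
    os.path.splitext(os.path.basename(f))[0]: pd.read_csv(f)
    for f in glob.glob(os.path.join(path, '*.exp'))
}

print(sorted(dfs))  # the names that were loaded
# dfs['run1']       # then one frame is a plain dict lookup
```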