Using a value as a name for csv file - python

I'm working with a lot of different files, and after I am done editing I want to save the result as a new csv file.
print(files[0])
mhofmanmusselsT1_1L.raw
# I want the csv file for this dataset to be namend waves_T1_1L.csv,
# but if I select a files[3] it would be waves_T3_2S.csv
t = files[0]
testtype = t[14:19]        # "T1_1L"
name = "waves_" + testtype
Using the to_csv code, it uses the df name as the file name. I'm quite new to Python, so it might be something obvious, but is there a way to use
print(name)
waves_T1_1L
name = pd.DataFrame(df)
# Here "name" is meant to act like the string printed above,
# so it would automatically update if a different files[n] were used.
# Unfortunately it won't let me do that.
UPDATE
I have figured it out; it took more steps than I expected.
Name = files[0]
testtype = Name[14:19]
filename = "waves_" + testtype
wavedata.to_csv('out.csv')
old_name = r"workdirectory/out.csv"
new_name = r"workdirectory/" + filename + ".csv"
os.rename(old_name, new_name)
The output file will be changed from out.csv to waves_T1_1L.csv and it is updated if a different file is selected as input.
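A shorter variant (just a sketch, reusing the same slice positions and the wavedata dataframe from above) skips the rename step by passing the constructed name straight to to_csv:
Name = files[0]
testtype = Name[14:19]                   # "T1_1L" for "mhofmanmusselsT1_1L.raw"
filename = "waves_" + testtype + ".csv"
wavedata.to_csv(filename)                # writes waves_T1_1L.csv directly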

Passing a filename as the first argument of pandas.DataFrame.to_csv saves the dataframe under that name.
df.to_csv('NewName.csv')
If you want to save it to a new folder, create the folder first with the os library:
import os
os.makedirs('folder/subfolder', exist_ok=True)
df.to_csv('folder/subfolder/out.csv')
Relevant Documentation: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.to_csv.html
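If the target folder and the generated name both vary, joining them with os.path.join keeps the path handling in one place; this is a sketch reusing the name variable built in the question:
import os
out_dir = 'folder/subfolder'
os.makedirs(out_dir, exist_ok=True)
df.to_csv(os.path.join(out_dir, name + '.csv'))  # e.g. folder/subfolder/waves_T1_1L.csv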

Related

Edit CSV filename in Python to append to the current filename

I'm trying to change the name of my csv file with Python. I know that when I look up the filename it gives a path, for instance
C:/user/desktop/somefolder/[someword].csv
so what I want to do is to change this file name to something like
C:/user/desktop/somefolder/[someword] [somenumber].csv
But I don't know the number or the word in advance: the word is generated by other code I don't have access to, and the number is generated by the Python code I have. So I just want to change the file name to include the [someword] and the [somenumber] before the .csv.
I have the os library available in case that's a good library to use for this.
Here is the solution (no extra libs needed):
import os

somenumber = 1  # use the number generated in your code
fpath = "C:/user/desktop/somefolder"

for full_fname in os.listdir(fpath):
    # `someword` is the file name without its extension in this context
    someword, fext = os.path.splitext(full_fname)
    old_fpath = os.path.join(fpath, full_fname)
    new_fpath = os.path.join(fpath, f"{someword} {somenumber}{fext}")
    os.rename(old_fpath, new_fpath)
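Note that the loop above renames every file in the folder. If only one specific file should get the number appended, the same splitext/join pattern works on a single path (a sketch; the concrete file name is a placeholder, since [someword] comes from code the asker has no access to):
import os

somenumber = 1  # use the number generated in your code
old_fpath = "C:/user/desktop/somefolder/someword.csv"  # placeholder for the real file
folder, full_fname = os.path.split(old_fpath)
someword, fext = os.path.splitext(full_fname)
new_fpath = os.path.join(folder, f"{someword} {somenumber}{fext}")
os.rename(old_fpath, new_fpath)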

pandas to_csv - can't save a named csv into a specific path

Trying the simple task of converting a df to a csv, then saving it locally to a specific path. I'll censor the file name with "..." because I'm not sure about the privacy policy.
I checked many topics and I still run into errors, so I'll detail all the steps I went through, with the output:
Attempt #1:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, 'myDataFrame.csv')
TypeError: "delimiter" must be a 1-character string
Attempt #2: After searching about "delimiter" issues, I discovered it was linked to the sep argument, so I used a solution from this link to set a proper separator and encoding.
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path, "myDataFrame.csv", sep=b'\t', encoding='utf-8')
TypeError: to_csv() got multiple values for argument 'sep'
After playing with different arguments, it seems like my code uses 'myDataFrame.csv' as the sep argument, because when I remove it the code does not raise an error; but no file is created at the specified path, or anywhere else (I checked the desktop, my documents, and the default "Downloads" folder).
I checked the documentation for pandas.DataFrame.to_csv and I didn't find a parameter that would let me set a specific name for the csv I want to create locally. I've been running in circles, trying things that lead me to the same errors, without finding a way to save anything locally.
Even the simple code below does not create any csv, and I really don't understand why:
import pandas as pd
file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
df.to_csv("myDataFrame.csv")
I just want to simply save a dataframe into a csv, give it a name, and specify the path to where this csv is saved on my hard drive.
You receive the error because you are passing too many values. The path and the file name must be provided as one string:
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(path + '/myDataFrame.csv')
This saves your file at the path you like.
However, the file from your last attempt should have been saved in the current working directory of your script. Make sure to check there; the file should exist.
There is no separate argument for the filename.
Provide the name of the file as part of the path.
Change the code to this
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp\myDataFrame.csv'
df.to_csv(path)
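Either way, building the full path with os.path.join avoids having to remember the separator between the folder and the file name; a sketch with the same folder and data as above:
import os
import pandas as pd

file_name = "https://raw.githubusercontent.com/.../titanic.csv"
df = pd.read_csv(file_name)
path = r'D:\PROJECTS\DATA_SCIENCE\UDEMY_DataAnalysisBootcamp'
df.to_csv(os.path.join(path, 'myDataFrame.csv'))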

How can I make changes to a path string that exists within a variable (as part of a loop)?

I'm currently developing a loop where, for each csv in the directory, I am re-sampling and then saving as a new csv. However, I would like to retain only part of the original path string contained within the variable, so that I can add an identifier for the new file.
For example, the file picked up through the loop may be:
'...\folder1\101_1000_RoomTemperatures.csv'
But I would like the new saved file to look like:
'...\folder2\101_1000_RoomTemperatures_Rounded.csv'
I have noticed SQL- and C-related posts about this issue; however, I suspect those solutions are not relevant within the Python environment. Using the code below I can rename the outputs to enable differentiation, but it's not ideal!
for filename in os.listdir(directory):
    if filename.endswith('.csv'):
        # Pull in the file
        df = pd.read_csv(filename)
        # actions occur here
        # Export the file
        df.to_csv('{}_rounded.csv'.format(str(filename)))
The output using this code is:
'...\folder1\101_1000_RoomTemperatures.csv_rounded.csv'
A simple solution would be to split by dots and omit the part after the last dot to get the base filename (without file extension):
>>> filename = r'...\folder1\101_1000_RoomTemperatures.csv'
>>> filename
'...\\folder1\\101_1000_RoomTemperatures.csv'
>>> base_filename = '.'.join(filename.split('.')[:-1])
>>> base_filename
'...\\folder1\\101_1000_RoomTemperatures'
Then use this base_filename to give the new name:
df.to_csv('{}_rounded.csv'.format(base_filename))
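os.path.splitext does the same split without counting dots by hand, and the output folder can be swapped at the same time (a sketch reusing the filename and df variables from above; it assumes the folder2 target from the question already exists):
import os

base_filename, _ = os.path.splitext(filename)                 # '...\folder1\101_1000_RoomTemperatures'
base_filename = base_filename.replace('folder1', 'folder2')   # optional: write into the other folder
df.to_csv('{}_rounded.csv'.format(base_filename))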

how to get the name of an unknown .XLS file into a variable in Python 3.7

I'm using Python 3.7.
I have to download an excel file (.xls) that has a unique filename every time I download it into a specific downloads folder location.
Then with Python and Pandas, I then have to open the excel file and read/convert it to a dataframe.
I want to automate the process, but I'm having trouble telling Python to get the full name of the XLS file as a variable, which will then be used by pandas:
# add dependencies and set location for downloads folder
import os
import glob
import pandas as pd
download_dir = '/Users/Aaron/Downloads/'
# change working directory to download directory
os.chdir(download_dir)
# get filename of excel file to read into pandas
excel_files = glob.glob('*.xls')
blah = str(excel_files)
blah
So then for example, the output for "blah" is:
"['63676532355861.xls']"
I have also tried just using "blah = print(excel_files)" for the above block, instead of the "str" method, and assigning that to a variable, which still doesn't work.
And then the rest of the process would do the following:
# open excel (XLS) file with unknown filename in pandas as a dataframe
data_df = pd.read_excel('WHATEVER.xls', sheet_name=None)
And then after I convert it to a data frame, I want to DELETE the excel file.
So far, I have spent a lot of time reading about fnames, io, open, os.path, and other libraries.
I still don't know how to get the name of the unknown .XLS file into a variable, and then later deleting that file.
Any suggestions would be greatly appreciated.
This code finds an .xls file in your specified path, reads it, and deletes it. If your directory contains more than one .xls file, it reads the last one. You can perform whatever operation you want if you find more than one .xls file.
import os
import pandas as pd

for filename in os.listdir(os.getcwd()):
    if filename.endswith(".xls"):
        print(filename)
        # do your operation
        data_df = pd.read_excel(filename, sheet_name=None)
        os.remove(filename)
Check this,
lst = os.listdir()
matching = [s for s in lst if '.xls' in s]
matching will hold the list of all Excel files.
Since you have only one Excel file, you can save it in a variable like file_name = matching[0].
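A compact version of the same idea (a sketch assuming exactly one .xls file sits in the downloads folder):
import glob
import os
import pandas as pd

download_dir = '/Users/Aaron/Downloads/'
xls_files = glob.glob(os.path.join(download_dir, '*.xls'))
file_name = xls_files[0]                             # the single .xls file, as a plain string
data_df = pd.read_excel(file_name, sheet_name=None)
os.remove(file_name)                                 # delete the file once it is loaded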

Run only if "if" statement is true!

So I have a question. I'm reading a fits file and then using information from the header of the fits to define the other files which are related to the original fits file. But for some of the fits files, the other files (blaze_file, bis_file, ccf_table) are not available, and because of that my code gives the pretty obvious error "No such file or directory".
import pandas as pd
import sys, os
import numpy as np
from glob import glob
from astropy.io import fits

PATH = os.path.join("home", "Desktop", "2d_spectra")

for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        e2ds_hdu = fits.open(filename)
        e2ds_header = e2ds_hdu[0].header
        date = e2ds_header['DATE-OBS']
        date2 = date = date[0:19]
        blaze_file = e2ds_header['HIERARCH ESO DRS BLAZE FILE']
        bis_file = glob('HARPS.' + date2 + '*_bis_G2_A.fits')
        ccf_table = glob('HARPS.' + date2 + '*_ccf_G2_A.tbl')
        if not all(file in os.listdir(PATH) for file in [blaze_file, bis_file, ccf_table]):
            continue
So what I want to do is make my code run only if all the files are available, and otherwise skip that fits file. But the problem is that I'm defining the other files as variables inside the for loop, since I'm using the header information. So how can I define them before the for loop and then use something like an existence check?
Can anyone help me out with this?
The filenames returned by os.listdir() are always relative to the path given there.
In order to be used, they have to be joined with this path.
Example:
PATH = os.path.join("home", "Desktop", "2d_spectra")

for filename in os.listdir(PATH):
    if filename.endswith("_e2ds_A.fits"):
        filepath = os.path.join(PATH, filename)
        e2ds_hdu = fits.open(filepath)
        …
Let the filenames be ['a', 'b', 'a_e2ds_A.fits', 'b_e2ds_A.fits']. The code now excludes the first two names and then prepends the file path to the remaining two.
a_e2ds_A.fits becomes /home/Desktop/2d_spectra/a_e2ds_A.fits and
b_e2ds_A.fits becomes /home/Desktop/2d_spectra/b_e2ds_A.fits.
Now they can be accessed from everywhere, not just from the given file path.
I should become accustomed to reading a question in full before trying to answer it.
The problem I mentioned only arises if you start the script from a path outside the said directory. Nevertheless, applying the fix will make your code much more consistent.
Your real problem, however, lies somewhere else: you examine a file and then, after checking its contents, want to read files whose names depend on information from that first file.
There are several ways to accomplish your goal:
1. Just extend your loop with the proper tests.
Pseudo code:
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if all files exist:
            proceed
or
for file in files:
    if file.endswith("fits"):
        open file
        read date from header
        create file names depending on date
        if not all files exist:
            continue  # actual keyword, not pseudo code!
        proceed
2. Put some functionality into functions (a variation of 1.).
3. Create a loop in a generator function which yields the "interesting information" of one fits file (or alternatively nothing) and have another loop run over the yielded items to actually work with the data (a sketch follows below).
If I am still missing some points or am not detailed enough, please let me know.
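For option 3, a minimal sketch of the generator variant, reusing the imports, header keys and glob patterns from the question (complete_fits_sets is just an illustrative name):
def complete_fits_sets(path):
    # Yield one tuple per fits file whose companion files all exist.
    for filename in os.listdir(path):
        if not filename.endswith("_e2ds_A.fits"):
            continue
        filepath = os.path.join(path, filename)
        e2ds_header = fits.open(filepath)[0].header
        date2 = e2ds_header['DATE-OBS'][0:19]
        blaze_file = os.path.join(path, e2ds_header['HIERARCH ESO DRS BLAZE FILE'])
        bis_files = glob(os.path.join(path, 'HARPS.' + date2 + '*_bis_G2_A.fits'))
        ccf_tables = glob(os.path.join(path, 'HARPS.' + date2 + '*_ccf_G2_A.tbl'))
        if os.path.isfile(blaze_file) and bis_files and ccf_tables:
            yield filepath, blaze_file, bis_files[0], ccf_tables[0]

for filepath, blaze_file, bis_file, ccf_table in complete_fits_sets(PATH):
    # proceed: every required file for this observation exists
    ...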
Since you have to read the fits file to know the names of the other dependent files, there's no way to avoid reading the fits file first. The only thing you can do is test for the dependent files' existence before trying to read them, and skip the rest of the loop (using continue) if any are missing.
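A minimal sketch of that check, meant to slot into the body of the question's for loop right after the glob calls (an empty glob result counts as "missing"):
blaze_path = os.path.join(PATH, blaze_file)
bis_file = glob(os.path.join(PATH, 'HARPS.' + date2 + '*_bis_G2_A.fits'))
ccf_table = glob(os.path.join(PATH, 'HARPS.' + date2 + '*_ccf_G2_A.tbl'))
if not (os.path.isfile(blaze_path) and bis_file and ccf_table):
    continue  # a dependent file is missing, skip this fits file
# proceed with blaze_path, bis_file[0] and ccf_table[0]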
Edit this line
e2ds_hdu = fits.open(filename)
And replace with
e2ds_hdu = fits.open(os.path.join(PATH, filename))
