I'm using xlwings to pull in an Excel file from a shared drive.
The file names change daily based on the date, e.g.:
dailysummary_20220429.xlsx
dailysummary_20220428.xlsx
dailysummary_20220427.xlsx
dailysummary_20220426.xlsx
I'm trying to make the code dynamic so that it pulls in today's file each day, but I'm struggling with the syntax to make this work. Any help would be much appreciated. So far I have:
from datetime import date
workbook = xw.Book(r'I:\Analytics\dailysummary_{date.today()}.xlsx')
sheet1 = workbook.sheets['OutputTable'].used_range.value
dailydata = pd.DataFrame(sheet1)
Thanks so much!
As MattR suggested above, you need to format the date the way you want. Your approach will work, but you are using the wrong type of string literal for your purposes.
workbook = xw.Book(rf'I:\Analytics\dailysummary_{date.today().strftime("%Y%m%d")}.xlsx')
An f-string lets you do the interpolation. A raw string (prefixed with an r) only stops backslashes from being treated as escapes; on its own it does no interpolation at all. Combining the two prefixes as rf gives you both, which is what a Windows path with a placeholder needs.
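To make the difference between the prefixes concrete, here is a minimal sketch. The helper name daily_summary_name and the fixed date are made up for illustration; the folder is the one from the question.

```python
from datetime import date


def daily_summary_name(day: date) -> str:
    # rf'...' = raw (backslashes stay literal) + f-string (braces are evaluated).
    # The format spec after the colon is passed to the date's strftime.
    return rf'I:\Analytics\dailysummary_{day:%Y%m%d}.xlsx'


day = date(2022, 4, 29)

# Raw string only: the braces are kept as plain text, nothing is substituted.
raw_only = r'I:\Analytics\dailysummary_{day}.xlsx'

# Raw f-string: the date is substituted and the backslashes are untouched.
raw_and_f = daily_summary_name(day)

print(raw_only)
print(raw_and_f)
```

In real use you would call daily_summary_name(date.today()) and hand the result to xw.Book.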
I like breaking things up a little more, might look like overkill though. It allows for easier refactoring. The pathlib module will help you in the future if files start to move, or you get into wanting to use the pathlib.Path.cwd() or .home() to get base paths without needing to change the code all of the time.
The today_str allows you to override the date if you need an old one or something. Just pass '20220425' or whatever.
import datetime as dt
import pathlib
import pandas as pd
import xlwings as xw
def get_dailydata_df(today_str: str = None) -> pd.DataFrame:
    base_path = pathlib.Path('I:/Analytics')
    if today_str is None:
        today_str = dt.datetime.now().strftime('%Y%m%d')
    file_str = 'dailysummary_'
    file_str = file_str + today_str + '.xlsx'
    today_path = pathlib.Path(base_path, file_str)
    wb = xw.Book(today_path)
    sheet1 = wb.sheets['OutputTable'].used_range.value
    dailydata = pd.DataFrame(sheet1)
    return dailydata
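If today's file might not exist yet (say, on a weekend), one possible extension is to fall back to the newest file in the folder. This is only a sketch: the helper latest_summary is hypothetical, and it assumes every file follows the dailysummary_YYYYMMDD.xlsx pattern from the question. A temporary folder stands in for the shared drive.

```python
import pathlib
import tempfile


def latest_summary(folder: pathlib.Path) -> pathlib.Path:
    """Return the dailysummary_YYYYMMDD.xlsx file with the newest date in its name."""
    candidates = sorted(folder.glob('dailysummary_*.xlsx'))
    if not candidates:
        raise FileNotFoundError(f'no dailysummary_*.xlsx files in {folder}')
    # YYYYMMDD sorts chronologically as plain text, so the last name is the newest.
    return candidates[-1]


# Demo in a throwaway folder standing in for the shared drive.
with tempfile.TemporaryDirectory() as tmp:
    folder = pathlib.Path(tmp)
    for stamp in ('20220426', '20220428', '20220427'):
        (folder / f'dailysummary_{stamp}.xlsx').touch()
    newest_name = latest_summary(folder).name

print(newest_name)
```

Sorting by name rather than by modification time keeps the result stable even if an old file is re-saved.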
I've searched for about an hour for an answer to this and none of the solutions I've found are working. I'm trying to get a folder full of CSVs into a single dataframe, to output to one big csv. Here's my current code:
import os
sourceLoc = "SOURCE"
destLoc = sourceLoc + "MasterData.csv"
masterDF = pd.DataFrame([])
for file in os.listdir(sourceLoc):
    workingDF = pd.read_csv(sourceLoc + file)
    print(workingDF)
    masterDF.append(workingDF)
print(masterDF)
The SOURCE is a folder path but I've had to remove it as it's a work network path. The loop is reading the CSVs to the workingDF variable as when I run it it prints the data into the console, but it's also finding 349 rows for each file. None of them have that many rows of data in them.
When I print masterDF it prints Empty DataFrame Columns: [] Index: []
My code is from this solution but that example is using xlsx files and I'm not sure what changes, if any, are needed to get it to work with CSVs. The Pandas documentation on .append and read_csv is quite limited and doesn't indicate anything specific I'm doing wrong.
Any help would be appreciated.
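There are a couple of things wrong with your code, but the main thing is that DataFrame.append returns a new dataframe instead of modifying the original in place. (Note that DataFrame.append was removed in pandas 2.0, so on current versions use pd.concat instead.) With append, you would have to do: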
masterDF = masterDF.append(workingDF)
I also like the approach taken by I_Al-thamary - concat will probably be faster.
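A minimal sketch of the collect-then-concat pattern, using two made-up in-memory CSVs instead of real files so it runs anywhere pandas is installed. The key point is the same as with append: pd.concat returns a new dataframe, so the result has to be assigned.

```python
import io

import pandas as pd

# Made-up CSV contents standing in for the files in the source folder.
csv_texts = [
    "id,value\n1,10\n2,20\n",
    "id,value\n3,30\n",
]

# Read every CSV first, then combine once at the end.
frames = [pd.read_csv(io.StringIO(text)) for text in csv_texts]

# concat returns a new DataFrame; ignore_index renumbers the rows 0..n-1.
master = pd.concat(frames, ignore_index=True)

print(master)
```

Combining once at the end avoids copying the growing dataframe on every loop iteration.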
One last thing I would suggest: instead of using glob, check out pathlib.
import pandas as pd
from pathlib import Path
path = Path("your path")
df = pd.concat(map(pd.read_csv, path.rglob("*.csv")))
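One thing to be aware of with the pathlib version: rglob searches subfolders recursively, while Path.glob with a plain "*.csv" pattern only looks at the top level. The sketch below shows the difference in a temporary folder; the file and folder names are made up.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / 'sub').mkdir()
    (root / 'top.csv').touch()
    (root / 'sub' / 'nested.csv').touch()

    # glob('*.csv') matches only files directly inside root.
    top_only = sorted(p.name for p in root.glob('*.csv'))

    # rglob('*.csv') also descends into subfolders.
    recursive = sorted(p.name for p in root.rglob('*.csv'))

print(top_only)
print(recursive)
```

If the master CSV is written into the same folder tree, either pattern will pick it up on the next run, so it may be worth writing the output elsewhere.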
You can use glob:
import glob
import pandas as pd
import os
path = "your path"
df = pd.concat(map(pd.read_csv, glob.glob(os.path.join(path,'*.csv'))))
print(df)
You can store them all in a list and pd.concat them at the end.
dfs = [
    pd.read_csv(os.path.join(sourceLoc, file))
    for file in os.listdir(sourceLoc)
]
masterDF = pd.concat(dfs)
Is it possible to search/ parse through two columns in excel (let's say columns C & D) and find only the fields with underscores by using python?
Maybe a code like this? Not too sure..:
Import xl.range
Columns = workbook.get("C:D"))
Extract = re.findall(r'\(._?)\', str(Columns)
Please let me know if my code can be further improved on! :)
For those who need an answer, I solved it using this code:
import os
import re
from os.path import join
from openpyxl.reader.excel import load_workbook
dict_folder = "C:/...../abc"
fieldset = set()  # unique underscore fields found across all workbooks
for file in os.listdir(dict_folder):
    if file.endswith(".xlsx"):
        wb1 = load_workbook(join(dict_folder, file), data_only=True)
        ws = wb1.active
        # ws["C":"D"] returns columns C and D, each as a tuple of cells
        for rowofcellobj in ws["C":"D"]:
            for cellobj in rowofcellobj:
                data = re.findall(r"\w+_.*?\w+", str(cellobj.value))
                if data != []:
                    fields = data[0]
                    fieldset.add(fields)
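The matching step can be tried on its own without any workbook. The sketch below is an assumption-laden variant, not the code above: it uses a simpler pattern that grabs whole words containing an underscore, skips empty cells, and keeps every match per cell rather than only the first. The helper name underscore_fields and the sample values are made up.

```python
import re

# A whole word (letters, digits, underscores) that contains at least one underscore.
UNDERSCORE_TOKEN = re.compile(r"\b\w*_\w*\b")


def underscore_fields(values):
    """Collect every underscore-containing word from an iterable of cell values."""
    found = set()
    for value in values:
        if value is None:  # empty cells come back as None from openpyxl
            continue
        found.update(UNDERSCORE_TOKEN.findall(str(value)))
    return found


sample_cells = ["cust_id", "plain", None, 42, "see order_date_utc here"]
fields = underscore_fields(sample_cells)
print(sorted(fields))
```

In the loop above, the same function could be fed cellobj.value for each cell in columns C and D.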
Yes, it is indeed possible. The main library you'll reach for is pandas. With it installed (instructions here), after installing Python of course, you could do something along the lines of:
import pandas as pd
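# Read only Excel columns C and D of the worksheet into a pandas.DataFrame
sheet_path = 'C:\\Path\\to\\excel\\sheet.xlsx'
df = pd.read_excel(sheet_path, usecols='C:D')
# Select the columns by position (their labels come from the sheet's header row)
# and cast to str so that empty or numeric cells don't break .str.contains
col_c = df.iloc[:, 0].astype(str)
col_d = df.iloc[:, 1].astype(str)
# Keep the rows where either column contains an underscore
underscored = df[col_c.str.contains('_') | col_d.str.contains('_')]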
And that'd do it for columns C and D within your worksheet.
The pandas documentation is extensive, but for what you're after, the read_excel function documentation (which has examples) will suffice, along with some general Python material if needed.
I am relatively new to Python and I'm trying to write a script that finds files (photos) that were created between two dates and puts them into a folder.
For that, I need to get the creation date of the files somehow (I'm on Windows).
I already have everything else coded; I just need to get the date of each picture. It would also be interesting to see in which form the date is returned. Ideally it would be m/d/y or d/m/y (d = day, m = month, y = year).
Thank you all in advance! I am new to this forum.
I imagine you are already listing the files somehow. If so, use
os.stat(path).st_ctime to get the creation time on Windows, and then format it as a string with the datetime module.
https://docs.python.org/2/library/stat.html#stat.ST_CTIME
https://stackoverflow.com/a/39359270/928680
This example shows how to convert the mtime (modified time), but the same applies to the ctime (creation time).
Once you have the ctime, it's relatively simple to check whether it falls within a range:
https://stackoverflow.com/a/5464465/928680
You will need to do your date logic before converting to a string.
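Put together, the order of operations might look like the sketch below: read st_ctime, turn it into a datetime, compare, and only then format. The helper names created_between and as_mdy are made up for illustration. Keep in mind that st_ctime is the creation time on Windows, but on Unix it is the time of the last metadata change.

```python
import os
import tempfile
from datetime import datetime, timedelta


def created_between(path, start, end):
    """True if the file's st_ctime falls inside [start, end] (datetime objects)."""
    created = datetime.fromtimestamp(os.stat(path).st_ctime)
    return start <= created <= end


def as_mdy(moment):
    """Format a datetime as m/d/y for display, after all comparisons are done."""
    return moment.strftime('%m/%d/%Y')


# Demo with a freshly created file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, 'photo.jpg')
    open(sample, 'w').close()
    upper = datetime.now() + timedelta(minutes=1)
    just_created = created_between(sample, datetime(2000, 1, 1), upper)
    too_old = created_between(sample, datetime(1990, 1, 1), datetime(1999, 1, 1))

formatted = as_mdy(datetime(2017, 5, 31))
print(just_created, too_old, formatted)
```

For d/m/y output, swap the format string to '%d/%m/%Y'.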
One of the solutions; not very efficient, just to show one way this can be done.
import os
from datetime import datetime
def filter_files(path, start_date, end_date, date_format="%Y"):
    result = []
    # Parse the range boundaries into datetime objects
    start_time_obj = datetime.strptime(start_date, date_format)
    end_time_obj = datetime.strptime(end_date, date_format)
    for file in os.listdir(path):
        # Stat the full path; a bare file name only resolves inside the current directory
        full_path = os.path.join(path, file)
        c_time = datetime.fromtimestamp(os.stat(full_path).st_ctime)
        if start_time_obj <= c_time <= end_time_obj:
            result.append("{}, {}".format(full_path, c_time))
    return result
if __name__ == "__main__":
    print("\n".join(filter_files("/Users/Jagadish/Desktop", "2017-05-31", "2017-06-02", "%Y-%m-%d")))
cheers!
See the Python os package for basic system commands, including directory listings with options. You'll be able to extract the file date. See the Python datetime package for date manipulation.
Also, check the available Windows commands on your version: most of them have search functions with a date parameter; you could simply have an OS system command return the needed file names.
You can use subprocess to run a shell command on a file to get meta_data of that file.
import re
from subprocess import check_output
meta_data = check_output('wmic datafile where Name="C:\\\\Users\\\\username\\\\Pictures\\\\xyz.jpg"', shell=True)
# Note that you have to use '\\\\' instead of '\\' for specifying path of the file
pattern = re.compile(r'\b(\d{14})\b.*')
re.findall(pattern,meta_data.decode())
=> ['20161007174858'] # This is the created date of your file in format - YYYYMMDDHHMMSS
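Since the value comes back as a YYYYMMDDHHMMSS string, it can be turned into a datetime for comparisons or reformatting. A minimal sketch, using the sample value shown above:

```python
from datetime import datetime

raw_stamp = '20161007174858'  # sample value from the wmic output above

# Parse the 14-digit string into a datetime object.
created = datetime.strptime(raw_stamp, '%Y%m%d%H%M%S')

# Reformat as d/m/y for display.
created_dmy = created.strftime('%d/%m/%Y')

print(created, created_dmy)
```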
Here is my solution. The Pillow Image module can access the metadata of the image file. Then we read tag 36867 of that metadata, which is DateTimeOriginal. Finally, I convert the returned string to a datetime object, which gives you the flexibility to do whatever you need with it. Here is the code.
from PIL import Image
from datetime import datetime
# Get the creationTime
creationTime = Image.open('myImage.PNG')._getexif()[36867]
# Convert creationTime to datetime object
creationTime = datetime.strptime(creationTime, '%Y:%m:%d %H:%M:%S')
I can easily fill a column with formulas on Excel using VBA and the range.autofill method:
Range("A2").AutoFill Destination:=Range("A2:A10"), Type:=xlFillDefault
This will take the formula/content on cell A2 (or a range) and expand it to A10.
Looking at the MSDN help I see: https://msdn.microsoft.com/en-us/library/office/ff195345.aspx
and: https://msdn.microsoft.com/en-us/library/office/ff838605.aspx
On xlwings I can do:
import xlwings as xw
rp = xw.Book(myFile)
rp.sheets('mySheet').range('A2').api.autofill(range, 0)
But I do not know how to pass range. I cannot simply type "A2:A10", I need to pass a range object.
I tried to do: rp.sheets('mySheet').range('A2').api.autofill(rp.sheets('mySheet').range('A2:A10'), 0) but this simply blew Python!
Any ideas? Thanks!
You need to use the underlying Range objects via api both times. Assuming you are on Windows, this will work:
import xlwings as xw
from xlwings.constants import AutoFillType
wb = xw.Book('Book1')
sheet = wb.sheets(1)
sheet.range('A2').api.AutoFill(sheet.range("A2:A10").api,
                               AutoFillType.xlFillDefault)
This will work on Mac:
import xlwings as xw
from xlwings.constants import AutoFillType
wb = xw.Book('euromillions.csv')
ws = wb.sheets('euromillions')
ws.range("A2").api.autofill(destination = ws.range("A2:A5").api, type = AutoFillType.xlFillDefault)
Thanks, Felix Zumstein, for the provided answers. Let me add what worked on my Windows system:
sheet.range("A1").api.AutoFill(sheet.range("A1:A10").api, 0)
AutoFillType.xlFillDefault was not recognized in my case, replacing this value by 0 did the job.
I asked this question before, but some people pointed me in the wrong direction and I haven't got the right answer yet.
I know how to rename a file, but I am struggling to add the date and time to the new file name.
Can you please guide me on how I can do that?
import os
os.rename('mark.txt', 'steve.txt')
Try this:
import os
import time
timestamp = time.strftime('%H%M-%Y%m%d')
os.rename('oldname.txt', 'oldname_%s.txt' % (timestamp))
This appends the timestamp to the file name. You can expand on this example to do whatever you need. This is a better way than using datetime.datetime.now() directly because, unformatted, that string contains a space, which is not recommended on Linux, and colons, which are not allowed in Windows file names.
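The same idea can be wrapped in a small helper that keeps the extension intact. This is a sketch only: the function name timestamped_name and the stamp layout are assumptions, and it only builds the new name, so nothing on disk is touched until you call rename yourself.

```python
from datetime import datetime
from pathlib import Path


def timestamped_name(path, moment):
    """Return path with _YYYYMMDD-HHMMSS inserted before the extension."""
    path = Path(path)
    # Explicit format: no spaces and no colons, so the name is safe on Windows and Linux.
    stamp = moment.strftime('%Y%m%d-%H%M%S')
    return path.with_name(f'{path.stem}_{stamp}{path.suffix}')


new_path = timestamped_name('mark.txt', datetime(2015, 1, 2, 16, 17, 41))
print(new_path)

# To rename for real:
# Path('mark.txt').rename(timestamped_name('mark.txt', datetime.now()))
```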
I think this will help you
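import datetime
import os
print('renaming archive...')
# Format explicitly: str(datetime.datetime.now()) contains a space and colons,
# and Windows rejects colons in file names
dt = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
newname = 'danish_' + dt + '.txt'
os.rename('danish.txt', newname)
print('renaming complete...')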
from datetime import datetime
import os
current_time = str(datetime.utcnow())
current_time = "_".join(current_time.split()).replace(":","-")
current_time = current_time[:-7]
os.rename('orfile.txt', 'orfile_'+current_time+'.txt')
This will rename the file so that its name carries the timestamp, down to the second:
orfile_2015-01-02_16-17-41.txt
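One caveat with cutting off the last seven characters: it assumes the microseconds are always present. When the microsecond part happens to be zero, str() omits it and the slice eats into the time instead. The sketch below compares the slicing approach with an explicit strftime format; the function names are made up for illustration.

```python
from datetime import datetime


def sliced_stamp(moment):
    # Same steps as above: join date and time, swap colons, drop the last 7 characters.
    text = "_".join(str(moment).split()).replace(":", "-")
    return text[:-7]


def formatted_stamp(moment):
    # An explicit format never includes microseconds, so nothing needs to be cut off.
    return moment.strftime('%Y-%m-%d_%H-%M-%S')


with_micro = datetime(2015, 1, 2, 16, 17, 41, 123456)
without_micro = datetime(2015, 1, 2, 16, 17, 41)

print(sliced_stamp(with_micro), formatted_stamp(with_micro))
print(sliced_stamp(without_micro), formatted_stamp(without_micro))
```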
Please use appropriate variable names; it is a bad habit to give variables names that don't make sense.
import datetime
import os
# Format the timestamp explicitly so the name has no spaces or colons
current_time = datetime.datetime.now().strftime('%Y-%m-%d_%H-%M-%S')
os.rename('mark.txt', 'mark_' + current_time + '.txt')