I'm using the openpyxl library in Python and I'm trying to read in the value of a cell. The cell's value is a date in the format MM/DD/YYYY. I would like the value to be read into my script simply as a string (i.e. "8/6/2014"), but instead Python is automatically reading it as a date object (the result is "2014-08-06 00:00:00"). I don't know if this is something I need to fix in Excel or Python, but how do I get the string I'm looking for?
I would suggest changing it in your Excel if you want to preserve what is being read in by openpyxl. That said, when a cell has been formatted to a date in Excel, it becomes altered to fit a specified format so you've lost the initial string format in either case.
For example, if the user enters the date 1/1/2018 into a cell that is formatted MM/DD/YYYY, Excel will display it as 01/01/2018 and you will lose the original string that was entered.
If you only care to see data of the form MM/DD/YYYY, an alternative is to format the datetime with date_cell.strftime("%m/%d/%Y"), which gives zero-padded output such as "08/06/2014".
I found out how to fix it with these lines of code:
from datetime import datetime
dateString = str(ws.cell(row=row, column=column).value.date())
newDate = datetime.strptime(dateString, "%Y-%m-%d").strftime("%m/%d/%Y")
The string newDate then holds the date in MM/DD/YYYY form, for example "08/06/2014" (strftime zero-pads the month and day).
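If the unpadded form from the question ("8/6/2014") is what's needed, strftime("%m/%d/%Y") on its own won't produce it, because %m and %d are zero-padded. A minimal sketch of one portable way around that is below. The helper name to_short_us_date is made up for illustration, and the datetime literal stands in for whatever ws.cell(...).value returns for a date cell.

```python
from datetime import datetime


def to_short_us_date(value):
    """Return M/D/YYYY without zero padding (hypothetical helper)."""
    return f"{value.month}/{value.day}/{value.year}"


# Stand-in for the datetime that openpyxl hands back for a date cell.
cell_value = datetime(2014, 8, 6)

padded = cell_value.strftime("%m/%d/%Y")   # zero-padded month and day
unpadded = to_short_us_date(cell_value)    # no padding

print(padded)    # 08/06/2014
print(unpadded)  # 8/6/2014
```

Building the string from the month, day and year attributes avoids platform-specific strftime flags for dropping the padding.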
I have a dataset with invalid dates in one column. They are formatted yyyymmdd and I need them reformatted to mm/dd/yyyy. I tried coercing the values, but that doesn't meet the requirement: the data still needs to be printed out even though it is not a valid date.
Here's a sample of the data in CSV:
The data has a day of '00', and since day 0 does not exist, this produces errors when printing the dataframe.
I tried replacing errors='coerce' with errors='ignore' just to see if it would get through, but it doesn't.
I want to print/reformat the invalid data without coercing the value. Is there any way around this?
Here is my line of code for that:
df['charge_off_date'] = pd.to_datetime(hals2['charge_off_date'], format='%Y%m%d', errors='ignore')
df['charge_off_date'] = df['charge_off_date'].dt.strftime('%m/%d/%Y')
If it's invalid you cannot format it as a date imho. You can treat it as a string though and knowing that it's yyyymmdd format you can just format a string in a custom function and apply it to your column.
def format_invalid_date(d: int) -> str:
    d = str(d)
    return f"{d[:4]}/{d[4:6]}/{d[6:]}"

df['charge_off_date'] = df['charge_off_date'].apply(format_invalid_date)
That should convert 19000100 to 1900/01/00, which is still invalid as a date, but looks like a date format.
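Since the question asked for mm/dd/yyyy rather than yyyy/mm/dd, the same string-slicing idea can be rearranged to put the month first. This is only a sketch: the function name yyyymmdd_to_us and the sample values are assumptions for illustration, and leaving unexpected values untouched is just one possible choice.

```python
def yyyymmdd_to_us(value):
    """Rearrange a yyyymmdd value into mm/dd/yyyy without validating the date."""
    text = str(value).strip()
    # Anything that is not exactly eight digits is returned unchanged.
    if len(text) != 8 or not text.isdigit():
        return text
    return f"{text[4:6]}/{text[6:]}/{text[:4]}"


samples = [19000100, "20140806", "n/a"]
converted = [yyyymmdd_to_us(s) for s in samples]
print(converted)  # ['01/00/1900', '08/06/2014', 'n/a']
```

On a dataframe this would be applied the same way as above, e.g. df['charge_off_date'].apply(yyyymmdd_to_us).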
I save my DataFrame as CSV and try to open it in Excel. The problem is that Excel converts some of my float data to date format. I use Excel 2016.
This is how my DataFrame looks in Excel.
Does anyone have an idea how to stop this?
You have to select the required column, press Ctrl + 1, and then select the correct format. Because you are saving the file as CSV, you have to repeat this every time you open the file: CSV doesn't store formatting information, and by default Excel reads everything in the General format.
If you use Excel to open a CSV file it will attempt to interpret each cell. Something that resembles a date will be formatted as a date. Excel has the same behaviour if you type or paste something that resembles a date into a cell formatted as General.
However, if you paste the same data into a cell that has already been formatted other than General it will no longer be re-interpreted.
Format a blank Excel sheet as you expect the data to appear. Open the CSV file in a text editor such as Notepad. Copy the data then paste it into the Excel sheet.
If you aren't sure how the data should appear, for example because you aren't sure about the number of columns, you can format all of the cells as Text. That will suppress interpretation but you can change the formatting afterwards.
Incidentally, I discovered a bug in Excel that relates to this. When you add a new row to the bottom of a table it inherits the formatting of the row above; however, Excel does this in the wrong order. To see this, format a table column as Text. In the row below the last row of the table, formatted General, type '1/1/2022'. Excel misinterprets this as 44562. That is because it interpreted 1/1/2022 as a date and then changed the formatting to Text to match the row above.
Consequently, when applying the initial formatting you should select at least as many rows as in your CSV file. The easiest way to achieve this is simply to format entire columns.
In your particular case you probably want to pre-format certain columns as Number.
The two photos that I've attached below show a dataframe table and a table that was exported out to csv file. I'm wondering if there is any command that can modify the date so that the dates shown on both files would be the same.
In the dataframe the date is 2017-08-01, but after exporting it becomes 2017/8/1 (instead of 2017/08/01).
Does anyone know how this can be done, or can I only edit the cell format manually?
pandas.DataFrame.to_csv
When you make the call to the to_csv function, you can supply it the parameter date_format='%Y-%m-%d'.
Check out the documentation. One of the parameters that you can pass to to_csv is date_format, which lets you control the format of your date-like columns. The format codes are the same as for datetime.strftime.
df.to_csv(file_path, date_format="%Y-%m-%d")
The form YYYY-MM-DD should be the default output date format for to_csv().
It seems like you are opening the output CSV in a program that may be applying its own style/formatting to the dates. Try opening it in a text editor to confirm.
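To check what to_csv actually writes, independent of how Excel later displays it, the CSV text can be inspected directly. A small sketch follows, assuming a column that is already datetime64; the column names and values are made up for illustration.

```python
import pandas as pd

# Hypothetical data: one datetime column and one float column.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2017-08-01", "2017-12-25"]),
        "value": [1.5, 2.0],
    }
)

# With no path given, to_csv returns the CSV text as a string.
default_text = df.to_csv(index=False)
slash_text = df.to_csv(index=False, date_format="%Y/%m/%d")

print(default_text.splitlines()[1])  # 2017-08-01,1.5
print(slash_text.splitlines()[1])    # 2017/08/01,1.5
```

If the text looks right here but Excel still shows 2017/8/1, the difference comes from Excel's display format for the cell rather than from the file contents.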
Sometimes the date_format approach from the other answers doesn't appear to work, even though it is the right way.
In that case you should just change the cell format in Excel:
Right-click => Format Cells => Category => Custom => Type: yyyy-mm-dd
I am using xlrd to read a spreadsheet and write to a database. However, there is a cell value which needs to be written to a date column in the database.
The cell is a string, and I am reading it and trying to convert it to MON-YY as follows.
sales_month_val = curr_sheet.cell(1,5).value
print sales_month_val
current_sales_month = datetime.strptime(sales_month_val,'%MMM%-%YY%')
But I keep getting the conversion failed error message. Is the above conversion to datetime correct to convert to MON-YY format?
Thanks,
bee
You should take a look at this strftime reference.
The format you are looking for is:
%b-%y
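As a rough illustration of that format string, here is a sketch that parses a MON-YY value and formats it back. The sample value "Jan-14" is an assumption about what the cell contains, and %b relies on English month abbreviations, which is what the default C locale provides.

```python
from datetime import datetime

# Assumed cell contents; the real value comes from curr_sheet.cell(1, 5).value
sales_month_val = "Jan-14"

# %b = abbreviated month name, %y = two-digit year
current_sales_month = datetime.strptime(sales_month_val, "%b-%y")
print(current_sales_month)  # 2014-01-01 00:00:00

# Formatting with the same codes gives the MON-YY text back.
round_trip = current_sales_month.strftime("%b-%y")
print(round_trip)  # Jan-14
```

strptime fills in the day as 1 because the input carries no day, so the resulting datetime can be written to a date column as the first of the month.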
I'm using Python-Excel xlwt to create a blank Excel spreadsheet for filling out in a spreadsheet. I would like to specify that a certain range of cells should be date formatted. I'm doing something like:
datestyle = xlwt.XFStyle()
datestyle.num_format_str = "YYYY-MM-DD"
ws.write(row, column, "", datestyle)
but that's a bit over-prescriptive. People may be pasting in data, and that means that if the format doesn't match exactly then there will be problems. Spreadsheets are generally good at spotting and understanding dates pasted in in various formats. I want the spreadsheet to be able to do this without the restriction of a specific input format.
I just want to say 'this cell is a date' and not impose a format. Is this doable?
You can't specify that a cell is a date and not impose a format, not with xlwt and not with anything else, including Excel itself. Two reasons:
(1) You can't specify that a cell is any type. It is whatever the user types or pastes in. You can format it as a date but they can type in text.
(2) "date" is not a data type in Excel. All Excel knows about is text, floating point numbers, booleans (TRUE/FALSE), errors (#DIV/0 etc), and "blank" (formatting but no data). A date cell is just a number cell with a date format.
A general answer to "Can I do X with xlwt?" questions: Firstly try doing X with Excel / OpenOffice Calc / Gnumeric. If you can't, then neither can xlwt.
The format you're prescribing defines how the date will be displayed, but it won't affect how Excel interprets entered dates. If your format is YYYY-MM-DD, the user can still enter 5/21/2008, and the string will be converted to its date value (39589) and then displayed in your specified format: "2008-05-21".
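To make the "a date cell is just a number cell" point concrete, here is a sketch of the arithmetic behind those serial numbers. It assumes Excel's default 1900 date system and dates from March 1900 onward (earlier serials are off by one because Excel treats 1900 as a leap year). The constant and function names are made up for illustration.

```python
from datetime import datetime, timedelta

# Assumption: 1900 date system. Counting from 1899-12-30 compensates
# for Excel's phantom 29 February 1900.
EXCEL_EPOCH = datetime(1899, 12, 30)


def excel_serial_to_datetime(serial):
    """Convert an Excel serial day number to a datetime (hypothetical helper)."""
    return EXCEL_EPOCH + timedelta(days=serial)


def datetime_to_excel_serial(value):
    """Convert a datetime to its whole-day Excel serial (hypothetical helper)."""
    return (value - EXCEL_EPOCH).days


print(excel_serial_to_datetime(39589).date())          # 2008-05-21
print(datetime_to_excel_serial(datetime(2022, 1, 1)))  # 44562
```

The number format only decides how such a serial is displayed; the stored value stays the same whichever date format is applied.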