I save my DataFrame as a CSV file and try to open it in Excel. The problem is that Excel converts some of my float data to dates. I use Excel 2016.
This is what my DataFrame looks like in Excel.
Does anyone have an idea how to stop this?
You have to select the required column, press Ctrl + 1, and then select the correct format. Because you are saving the file as CSV, you have to repeat this every time you open the file: CSV files don't store formatting information, and by default Excel reads everything in the General format.
If you use Excel to open a CSV file it will attempt to interpret each cell. Something that resembles a date will be formatted as a date. Excel has the same behaviour if you type or paste something that resembles a date into a cell formatted as General.
However, if you paste the same data into a cell that has already been formatted other than General it will no longer be re-interpreted.
Format a blank Excel sheet as you expect the data to appear. Open the CSV file in a text editor such as Notepad. Copy the data then paste it into the Excel sheet.
If you aren't sure how the data should appear, for example because you aren't sure about the number of columns, you can format all of the cells as Text. That will suppress interpretation but you can change the formatting afterwards.
Incidentally, I discovered a bug in Excel that relates to this. When you add a new row to the bottom of a table it inherits the formatting of the row above, but Excel does this in the wrong order. To see this, format a table column as Text. In the row below the last row of the table, in a cell formatted as General, type '1/1/2022'. Excel shows this as 44562. That is because it first interpreted 1/1/2022 as a date and only then changed the formatting to Text to match the row above.
Consequently, when applying the initial formatting you should select at least as many rows as in your CSV file. The easiest way to achieve this is simply to format entire columns.
In your particular case you probably want to pre-format certain columns as Number.
Related
I have a standard pandas DataFrame made of ID and Text columns. I would like to get just one column, so I use the typical df["columnname"]. But at that point it becomes a pandas Series. Is there a way to make a new DataFrame with just that single column?
I'm asking because if I cast the pandas Series to a string (columnname = columnname.astype("string")) and save it to a text file, I see that it only saves the first sentence of each line and not the entire text, as I would like.
If there is any other solution, I'm open to learning :)
Try this: pd.DataFrame(dfname["columnname"])
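A minimal sketch of the options, using a made-up two-row frame (the column names ID and Text come from the question; the row contents are invented for illustration). Double brackets and Series.to_frame() should give the same single-column DataFrame as the pd.DataFrame(...) call above:

```python
import pandas as pd

# Hypothetical data standing in for the asker's frame
df = pd.DataFrame(
    {
        "ID": [1, 2],
        "Text": ["First sentence. Second sentence.", "Another row."],
    }
)

series = df["Text"]                  # single brackets -> pandas Series
frame_a = df[["Text"]]               # double brackets -> one-column DataFrame
frame_b = df["Text"].to_frame()      # Series -> DataFrame, keeps the column name
frame_c = pd.DataFrame(df["Text"])   # the approach suggested above

print(type(series).__name__, type(frame_a).__name__)
```

All three frames hold the full text of every row; none of these conversions shortens the strings.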
I have a dataframe with a column of user ids converted from int to string
df['uid'] = df['uid'].astype(str)
However, when I write to CSV and open the file in Excel, the column is shown rounded in a format like 1E+12 (the value is still correct when you select the cell).
But to_excel outputs the column correctly. Can someone explain why?
Thank you!
CSV doesn't have data types. Excel has no way of knowing what you want, so it tries to interpret the values. If you are using Excel, go to the Data tab and import the file with the text/CSV import option, where you can specify the column data types while reading it.
Otherwise, open the CSV file in Notepad and you'll see that the data is there.
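To see this from the pandas side, here is a small sketch with invented uid values that writes the CSV to an in-memory buffer instead of a file. The text contains the full digits and no type information, so whatever reads it back has to guess unless it is told the type:

```python
import io

import pandas as pd

# Hypothetical user ids, stored as strings as in the question
df = pd.DataFrame({"uid": [1234567890123, 9876543210987]})
df["uid"] = df["uid"].astype(str)

# Write to an in-memory buffer so the raw CSV text can be inspected
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)  # plain digits; nothing says "this column is text"

# Without a dtype, the reader infers integers from the digits
guessed = pd.read_csv(io.StringIO(csv_text))

# With an explicit dtype, the ids stay strings
kept = pd.read_csv(io.StringIO(csv_text), dtype={"uid": str})
print(guessed["uid"].dtype, kept["uid"].tolist())
```

Excel does the same kind of guessing when it opens a CSV directly, which is why the import dialog is needed to override it.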
I'm using the openpyxl library in Python and I'm trying to read in the value of a cell. The cell's value is a date in the format MM/DD/YYYY. I would like the value to be read into my script simply as a string (i.e. "8/6/2014"), but instead Python is somehow automatically reading it as a date object (the result is "2014-08-06 00:00:00"). I don't know if this is something I need to fix in Excel or Python, but how do I get the string I'm looking for?
I would suggest changing it in your Excel if you want to preserve what is being read in by openpyxl. That said, when a cell has been formatted to a date in Excel, it becomes altered to fit a specified format so you've lost the initial string format in either case.
For example, let's say that the user enters the date 1/1/2018 into a cell that is formatted MM/DD/YYYY, Excel will change the data to 01/01/2018 and you will lose the original string that was entered.
If you only care to see data of the form MM/DD/YYYY, an alternate solution would be to cast the date with date_cell.strftime("%m/%d/%Y")
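I found out how to fix it with these lines of code:
from datetime import datetime
# .value is a datetime for a date-formatted cell; .date() drops the time part
dateString = str(ws.cell(row=row, column=column).value.date())
newDate = datetime.strptime(dateString, "%Y-%m-%d").strftime("%m/%d/%Y")
The string newDate is then in MM/DD/YYYY format, for example "08/06/2018" (strftime zero-pads the month and day).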
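Note that %m and %d always zero-pad, so strftime alone does not produce the unpadded "8/6/2014" from the question. A minimal sketch of one way to get it, assuming the cell value is already a datetime (a hand-built datetime stands in for the openpyxl value here, and format_us_date is a hypothetical helper name):

```python
from datetime import datetime


def format_us_date(value: datetime) -> str:
    """Format as M/D/YYYY without leading zeros (hypothetical helper)."""
    return f"{value.month}/{value.day}/{value.year}"


# Stand-in for the datetime that a date-formatted cell yields
cell_value = datetime(2014, 8, 6)

padded = cell_value.strftime("%m/%d/%Y")  # zero-padded month and day
unpadded = format_us_date(cell_value)     # no leading zeros

print(padded, unpadded)
```

Building the string from the month, day, and year attributes avoids platform-specific strftime flags for removing padding.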
I have an Excel file in which one row of the column Model has the value "9-3", which is a string. I double-checked that the column's data type in the Excel file is plain text rather than Date. But when I use read_excel to load it into a DataFrame, the value is still shown as 2017-09-03 00:00:00 instead of the string "9-3".
Here is how I read the excel file:
table = pd.read_excel('ManualProfitAdjustmentUpdates.xlsx' , header=0, converters={'Model': str})
Any idea on why pandas is not treating value as string even when I set the converters as str?
The plain text setting in the Excel file affects only how the data is shown in Excel.
The str setting in the converter affects only how pandas treats the data that it gets.
To force the excel file to return the data as string, the cell's first character should be an apostrophe.
Change "9-3" to "'9-3".
The problem may be with Excel. Make sure the entire column is stored as text and not just the single value you are talking about. If Excel had the column saved as a date at any point, it will store a year in that cell no matter what is shown or what the data type is changed to. Pandas is going to read the entire column as one data type, so if you have dates above 9-3 it will be converted. Changing dates to strings without years can be tricky. It may be better to save the Excel sheet as a CSV once it is in the format you like and then use pandas pd.read_csv(). I made a test Excel workbook "book1.xlsx":
9-3 1 Hello
12-1 2 World
1-8 3 Test
Then ran
import pandas as pd
df = pd.read_excel('book1.xlsx',header=0)
print(df)
and got back my data frame correctly. Thus, I am led to believe the problem is in Excel. Sorry this isn't the best answer, but I don't believe it is a pandas error.
I'm using Python-Excel xlwt to create a blank Excel spreadsheet for people to fill out. I would like to specify that a certain range of cells should be date formatted. I'm doing something like:
datestyle = xlwt.XFStyle()
datestyle.num_format_str = "YYYY-MM-DD"
ws.write(row, column, "", datestyle)
but that's a bit over-prescriptive. People may be pasting in data, and that means that if the format doesn't match exactly then there will be problems. Spreadsheets are generally good at spotting and understanding dates pasted in in various formats. I want the spreadsheet to be able to do this without the restriction of a specific input format.
I just want to say 'this cell is a date' and not impose a format. Is this doable?
You can't specify that a cell is a date and not impose a format, not with xlwt and not with anything else, including Excel itself. Two reasons:
(1) You can't specify that a cell is any type. It is whatever the user types or pastes in. You can format it as a date but they can type in text.
(2) "date" is not a data type in Excel. All Excel knows about is text, floating point numbers, booleans (TRUE/FALSE), errors (#DIV/0! etc), and "blank" (formatting but no data). A date cell is just a number cell with a date format.
A general answer to "Can I do X with xlwt?" questions: Firstly try doing X with Excel / OpenOffice Calc / Gnumeric. If you can't, then neither can xlwt.
The format you're prescribing defines how the date will be displayed, but it won't affect how Excel interprets entered dates. If your format is YYYY-MM-DD, the user can still enter 5/21/2008, and the string will be converted to its date value (39589) and then displayed in your specified format: "2008-05-21"
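The number behind a date cell can be checked outside Excel. This is a rough sketch assuming the default 1900 date system and dates after February 1900, for which day 0 is effectively 30 December 1899; EXCEL_EPOCH, serial_to_date, and date_to_serial are names chosen here for illustration, not part of xlwt or Excel:

```python
from datetime import date, timedelta

# Assumed epoch for the 1900 date system (valid for dates from 1 March 1900 on)
EXCEL_EPOCH = date(1899, 12, 30)


def serial_to_date(serial: int) -> date:
    """Convert a whole-number Excel serial to a calendar date."""
    return EXCEL_EPOCH + timedelta(days=serial)


def date_to_serial(d: date) -> int:
    """Convert a calendar date to its whole-number Excel serial."""
    return (d - EXCEL_EPOCH).days


print(serial_to_date(39589))             # the serial for 5/21/2008
print(date_to_serial(date(2008, 5, 21)))
```

The cell stores only the number; the YYYY-MM-DD format string decides how that number is rendered.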