Pandas `to_csv` String column got converted - python

I have a dataframe with a column of user ids converted from int to string
df['uid'] = df['uid'].astype(str)
However when I write to csv, the column got rounded to the nearest integer in format 1E+12 (the value is still correct when you select the cell).
But to_excel outputs the column correctly, can someone explain a bit?
Thank you!

CSV doesn't have data types. Excel has no way of knowing what you want, so it tries to interpret it. If you are using Excel, click the data tab and 'from csv' and you can specify dtypes on reading it.
Otherwise open the csv file in notepad and you'll see that the data is there.

Related

Pandas Read csv just read a line of a row

I have a classic panda data frame made of ID and Text. I would like to get just one column and therefore i use the typical df["columnname"]. But at this point it becomes a Pandas Series. Is there a way to make a new dataframe with just that single column?
I'm asking this is because if I cast the Pandas series in a string (columnname = columnname.astype ("string")) and I save it in a text file, I see that it only saves the first sentence of each line and not the entire textual content, as I would like.
If there are any other solution, I'm open to learn :)
Try this: pd.DataFrame(dfname["columnname"])

How do I stop Excel from converting numbers to date?

I save my DataFrame as csv and try to open it in excel, problem is that excel converts some of my float data to date format. I use excel 2016.
This is how my DataFrame looks like in excel.
Does anyone have an idea how to stop this ?
You have to select the required column and then press CNT + 1 and then select the correct format. As you are saving the file as CSV, you have to repeat this action every time you open the file as CSV don't save such information and by default excel reads everything as generic format. You can find more details here
If you use Excel to open a CSV file it will attempt to interpret each cell. Something that resembles a date will be formatted as a date. Excel has the same behaviour if you type or paste something that resembles a date into a cell formatted as General.
However, if you paste the same data into a cell that has already been formatted other than General it will no longer be re-interpreted.
Format a blank Excel sheet as you expect the data to appear. Open the CSV file in a text editor such as Notepad. Copy the data then paste it into the Excel sheet.
If you aren't sure how the data should appear, for example because you aren't sure about the number of columns, you can format all of the cells as Text. That will suppress interpretation but you can change the formatting afterwards.
Incidentally, I discovered a bug in Excel that relates to this. When you add a new row to the bottom of a table it inherits the formatting of the row above, however Excel does this in the wrong order. To see this, format a table column as Text. In the row below the last row of the table, formatted General, type '1/1/2022'. Excel misinterprets this as 44562. That is because it interpreted 1/1/2022 as a date then changed the formatting to Text to match the row above.
Consequently, when applying the initial formatting you should select at least as many rows as in your CSV file. The easiest way to achieve this is simply to format entire columns.
In your particular case you probably want to pre-format certain columns as Number.

Replacing dot with comma from a dataframe using Python

I have a dataframe for example df :
I'm trying to replace the dot with a comma to be able to do calculations in excel.
I used :
df = df.stack().str.replace('.', ',').unstack()
or
df = df.apply(lambda x: x.str.replace('.', ','))
Results :
Nothing changes but I receive his warning at the end of an execution without errors :
FutureWarning: The default value of regex will change from True to
False in a future version. In addition, single character regular
expressions willnot be treated as literal strings when regex=True.
View of what I have :
Expected Results :
Updated Question for more information thanks to #Pythonista anonymous:
print(df.dtypes)
returns :
Date object
Open object
High object
Low object
Close object
Adj Close object
Volume object
dtype: object
I'm extracting data with the to_excel method:
df.to_excel()
I'm not exporting the dataframe in a .csv file but an .xlsx file
Where does the dataframe come from - how was it generated? Was it imported from a CSV file?
Your code works if you apply it to columns which are strings, as long as you remember to do
df = df.apply() and not just df.apply() , e.g.:
import pandas as pd
df = pd.DataFrame()
df['a'] =['some . text', 'some . other . text']
df = df.apply(lambda x: x.str.replace('.', ','))
print(df)
However, you are trying to do this with numbers, not strings.
To be precise, the other question is: what are the dtypes of your dataframe?
If you type
df.dtypes
what's the output?
I presume your columns are numeric and not strings, right? After all, if they are numbers they should be stored as such in your dataframe.
The next question: how are you exporting this table to Excel?
If you are saving a csv file, pandas' to_csv() method has a decimal argument which lets you specify what should be the separator for the decimals (tyipically, dot in the English-speaking world and comma in many countries in continental Europe). Look up the syntax.
If you are using the to_excel() method, it shouldn't matter because Excel should treat it internally as a number, and how it displays it (whether with a dot or comma for decimal separator) will typically depend on the options set in your computer.
Please clarify how you are exporting the data and what happens when you open it in Excel: does Excel treat it as a string? Or as a number, but you would like to see a different separator for the decimals?
Also look here for how to change decimal separators in Excel: https://www.officetooltips.com/excel_2016/tips/change_the_decimal_point_to_a_comma_or_vice_versa.html
UPDATE
OP, you have still not explained where the dataframe comes from. Do you import it from an external source? Do you create it/ calculate it yourself?
The fact that the columns are objects makes me think they are either stored as strings, or maybe some rows are numeric and some are not.
What happens if you try to convert a column to float?
df['Open'] = df['Open'].astype('float64')
If the entire column should be numeric but it's not, then start by cleansing your data.
Second question: what happens when you use Excel to open the file you have just created? Excel displays a comma, but what character Excel sues to separate decimals depends on the Windows/Mac/Excel settings, not on how pandas created the file. Have you tried the link I gave above, can you change how Excel displays decimals? Also, does Excel treat those numbers as numbers or as strings?

How do I prevent a value from converting to a date or executing as division?

I have a column in a dataframe that has values in the format XX/XX (Ex: 05/23, 4/22, etc.) When I convert it to a csv, it converts to a date. How do I prevent this from happening?
I tried putting an equals sign in front but then it executes like division (Ex: =4/20 comes out to 0.5).
df['unique_id'] = '=' + df['unique_id']
I want the output to be in the original format XX/XX (Ex: 5/23 stays 5/23 in the csv file in Excel).
Check the datatypes of your dataframe with df.dtypes. I assume your column is interpreted as date. Then you can do df[col] = df[col].astype(np_type_you_want)
If that doenst bring the wished result, check why the column is interpreted as date when creating the df. Solution depends on where you get the data from.
The issue is not an issue with python or pandas. The issue is that excel thinks its clever and assumes it knows your data type. you were close with trying to put an = before your data but your data needs to be wrapped in qoutes and prefixed with an =. I cant claim to have come up with this answer myself. I obtained it from this answer
The following code will allow you to write a CSV file that will then open in excel without any formating trying to convert to date or executing division. However it shoudl be noted that this is only really a strategy if you will only be opening the CSV in excel. as you are wrapping formating info around your data which will then be stripped out by excel. If you are using this csv in any other software you might need to rethink about it.
import pandas as pd
import csv
data = {'key1': [r'4/5']}
df = pd.DataFrame.from_dict(data)
df['key1'] = '="' + df['key1'] + '"'
print(df)
print(df.dtypes)
with open(r'C:\Users\cd00119621\myfile.csv', 'w') as output:
df.to_csv(output)
RAW OUTPUT in file
,key1
0,"=""4/5"""
EXCEL OUTPUT

Column value is read as date instead of string - Pandas

I am having an excel file and in that one row of column Model is having value "9-3" which is a string value. I double-checked the excel file to have the column datatype as Plain string instead of Date. But still When I use read_excel and convert it into a data frame, the value is shown as 2017-09-03 00:00:00 instead of string "9-3".
Here is how I read the excel file:
table = pd.read_excel('ManualProfitAdjustmentUpdates.xlsx' , header=0, converters={'Model': str})
Any idea on why pandas is not treating value as string even when I set the converters as str?
The Plain string setting in the excel file affects only how the data is shown in Excel.
The str setting in the converter affects only how it treats the data that it gets.
To force the excel file to return the data as string, the cell's first character should be an apostrophe.
Change "9-3" to "'9-3".
The problem may be with excel. Make sure the entire column is stored as text and not just the singular value you are talking about. If excel had the column saved as a data at any point it will store a year in that cell no matter what is shown or what the datatype is changed too. Pandas is going to read the entire column as one data type so if you have dates above 9-3 it will be converted. Changing dates to strings without years can be tricky. It may be better to save the excel sheet as a csv once it is in the proper format you like and then use pandas pd.read_csv(). I made a test excel workbook "book1.xlsx"
9-3 1 Hello
12-1 2 World
1-8 3 Test
Then ran
import pandas as pd
df = pd.read_excel('book1.xlsx',header=0)
print(df)
and got back my data frame correctly. Thus, I am led to believe it is excel. Sorry is isn't the best answer but I don't believe it is a pandas error.

Categories