Excel converts a string to scientific notation - Python

Is there a way to prevent data exported from Python from being converted to scientific notation in Excel? My data looks like this:
ID
1E1
2E9
3E4
After exporting to CSV format, I am getting:
ID
1.00E+01
2.00E+09
3.00E+04
I found a similar thread, but none of the answers had a clear explanation, and the links were broken.

This is not a case of Python writing the wrong value to the CSV file. If you open the CSV file in a plain text editor, you will see that the values are written exactly as you exported them (1E1, 2E9, 3E4). If that is not the case, please provide your code and sample data.
Assuming the values are written correctly by Python, the conversion happens when Excel opens the file: it reads values such as 1E1 as numbers in scientific notation. Look into making Excel treat that column as text instead of a number.
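One common workaround, if the CSV only needs to be opened in Excel, is to write each ID as a text formula of the form ="1E1", which Excel evaluates to the literal string. The sketch below is a minimal illustration under that assumption; the helper name as_excel_text and the in-memory buffer are hypothetical stand-ins, not part of the original question.

```python
import csv
import io


def as_excel_text(value: str) -> str:
    """Wrap a value as ="..." so Excel evaluates it as a text formula."""
    escaped = value.replace('"', '""')  # double any embedded quotes
    return f'="{escaped}"'


ids = ["1E1", "2E9", "3E4"]

# io.StringIO stands in for open("ids.csv", "w", newline="") in real use.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["ID"])
for value in ids:
    writer.writerow([as_excel_text(value)])

csv_text = buffer.getvalue()
print(csv_text)
```

The trade-off is that the file becomes Excel-specific: any other CSV reader will see the literal ="1E1" text. If the file must stay clean, the alternative is to import it through Excel's text/CSV import and set the column type to Text.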

Related

Encoding two different languages using pd.read_csv problem

I am building a Neural Machine Automatic Translator from German to Arabic. I am reading a CSV file containing German sentences and their corresponding Arabic translations. I want to read both languages at the same time using pd.read_csv. I have tried all the encodings listed in the Python documentation, but none of them worked.
The only thing that worked best for me is this:
df = pd.read_csv("DLATS.csv", encoding ='windows-1256')
'windows-1256' is the encoding alias for the Arabic language. The problem is that it doesn't handle German special characters such as ä; it converts them into question marks (?). So the word drängte became dr?ngte.
Can anyone please help me solve this problem or work around it? I have thought of splitting the German and Arabic sentences into separate CSV files so that each file contains one column only, and then combining them in the Python code. But it seems that pd.read_csv requires at least two columns in the CSV file to work.
Update: I have noticed that the original CSV file already contains these problems for the German text. I finally solved my problem by reading the Excel file directly instead of the CSV, since the original file is in Excel. I used pd.read_excel without any encoding argument and it worked well. I didn't know before that pandas has pd.read_excel.
In my case, a plain read_csv works:
import pandas as pd
df = pd.read_csv('download.csv')
print(df)
german arabic
0 drängte حث
If you get bad results, it is possible that the data was not saved properly in the CSV file.
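To illustrate why the encoding of the saved file matters, here is a small standard-library sketch. It assumes the sample row from the output above; UTF-8 can represent both German and Arabic, while windows-1256 has no code point for ä and substitutes a question mark when asked to replace unencodable characters. With pandas, the equivalent read would be pd.read_csv(path, encoding="utf-8"), provided the file was actually saved as UTF-8.

```python
import csv
import io

rows = [["german", "arabic"], ["drängte", "حث"]]

# Write the rows as CSV text, then encode to UTF-8 bytes (as a file on disk would be).
text = io.StringIO()
csv.writer(text).writerows(rows)
utf8_bytes = text.getvalue().encode("utf-8")

# Decoding with the same encoding restores both languages intact.
decoded = list(csv.reader(io.StringIO(utf8_bytes.decode("utf-8"))))
print(decoded)

# windows-1256 cannot represent 'ä', so a lossy export replaces it with '?'.
lossy = "drängte".encode("windows-1256", errors="replace").decode("windows-1256")
print(lossy)
```

Once a character has been replaced by ? in the file itself, no choice of encoding at read time can bring it back, which matches the update in the question.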

Make hash on excel data to detect data changed with openpyxl

I have an Excel file with a lot of sheets (100+). Each sheet is independent. I would like to know whether the data in a specific sheet has been altered since it was last opened. At the moment, my solution is a for loop over all the relevant cells that calculates a checksum. If the checksum differs, the sheet has been changed. The problem is that I need to access a lot of cells, and Python is notoriously slow at that kind of task.
My question is: does anyone have a better, more efficient solution than my very naive one?
I am using openpyxl. I could use another library for this specific task, but it must be a Python library.
The data is not of a single kind: there is a mix of numbers and strings in each sheet. But every sheet is formatted with the same pattern. (i.e. always the same data type at a given coordinate)
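For reference, the checksum approach described above can be sketched with hashlib by hashing whole rows instead of individual cells. This is only an illustration: sheet_digest is a hypothetical helper, and the tuples below stand in for real sheet data. With openpyxl, the rows could come from ws.iter_rows(values_only=True) on a workbook loaded with read_only=True, which may reduce per-cell overhead, though that is an assumption to measure rather than a guarantee.

```python
import hashlib


def sheet_digest(rows) -> str:
    """Return a SHA-256 hex digest over an iterable of row tuples."""
    h = hashlib.sha256()
    for row in rows:
        # repr() keeps types distinct, so the number 1 and the string "1" hash differently.
        h.update(repr(tuple(row)).encode("utf-8"))
        h.update(b"\n")  # row separator
    return h.hexdigest()


# Stand-in data; in real use these rows would be read from one worksheet.
before = [("id", "value"), (1, 2.5), (2, "text")]
after = [("id", "value"), (1, 2.5), (2, "TEXT")]

print(sheet_digest(before) == sheet_digest(after))
```

Storing one digest per sheet and comparing it on the next run then detects a change without keeping a copy of the old data.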

Using pandas.DataFrame to get format of excel cell?

TLDR: I am loading an existing Excel file into a pandas DataFrame using df = pd.read_excel("file.xlsx"). I am currently unable to find any way to get the cell format (in Excel terms, i.e. General, Number, Currency, etc.) from the DataFrame df. Does anyone have any suggestions?
Associated Topics: I know this is possible in PHP and C#, but I would prefer to stay in Python for simplicity.
You can set a style really easily in pandas, but I can't find any documentation which shows how to get a style for a particular item in the DataFrame.

What is the difference between save a pandas dataframe to pickle and to csv?

I am learning python pandas.
I see a tutorial which shows two ways to save a pandas dataframe.
df.to_csv('sub.csv'), reopened with pd.read_csv('sub.csv')
df.to_pickle('sub.pkl'), reopened with pd.read_pickle('sub.pkl')
The tutorial says to_pickle saves the dataframe to disk. I am confused about this, because when I use to_csv, I do see a CSV file appear in the folder, which I assume is also saved to disk, right?
In general, why would we want to save a dataframe using to_pickle rather than saving it to CSV, TXT, or another format?
csv
✅human readable
✅cross platform
⛔slower
⛔more disk space
⛔doesn't preserve types in some cases
pickle
✅fast saving/loading
✅less disk space
⛔non human readable
⛔python only
Also take a look at parquet format (to_parquet, read_parquet)
✅fast saving/loading
✅less disk space than pickle
✅supported by many platforms
⛔non human readable
Pickle is a serialized way of storing a pandas dataframe. Basically, you are writing the exact representation of the dataframe to disk. This means the column types and the indices are preserved when you load it back. If you simply save a file as CSV, you are just storing it as a comma-separated list of text. Depending on your data set, some information will be lost when you load it back up.
You can read more about the pickle library in the Python documentation.
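The type-loss point can be seen with a small standard-library sketch. It uses a plain dict as a stand-in for a dataframe row (an assumption made to keep the example dependency-free); the same idea applies to DataFrame.to_pickle versus DataFrame.to_csv, where dtypes have to be re-inferred when a CSV is read back.

```python
import csv
import io
import pickle
from datetime import date

record = {"id": 7, "amount": 0.5, "due": date(2020, 1, 31)}

# Pickle round trip: values come back with their original Python types.
restored_pickle = pickle.loads(pickle.dumps(record))

# CSV round trip: every value is written as text and read back as a string.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(record))
writer.writeheader()
writer.writerow(record)
buffer.seek(0)
restored_csv = next(csv.DictReader(buffer))

print(restored_pickle)
print(restored_csv)
```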

Pandas - Decimal format when writing to_csv instead of scientific

I've just started using Pandas and I'm trying to export my dataset using the to_csv function on my dataframe fp_df.
One column (fp_df['Amount Due']) has values with multiple decimal places (for example 0.000042), but when using to_csv it is output in scientific notation, which the receiving system will be unable to read. It needs to be output as '0.000042'.
What is the easiest way to do this? The other answers I've found seem overly complex and I don't understand how or why they work.
(Apologies if any of my terminology is off, I'm still learning)
Check the documentation for to_csv(); you'll find a parameter called float_format:
df.to_csv(..., float_format='%.6f')
The format string here uses Python's printf-style (%) formatting, so you can choose whatever precision you need.
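To see what the '%.6f' format string does to a single value, here is a plain-Python sketch. As far as I understand, to_csv applies the float_format string to each float in this way; the variable names are illustrative only.

```python
amount = 0.000042

# Default float-to-text conversion switches to scientific notation for small values.
default_text = str(amount)

# '%.6f' forces fixed-point notation with six digits after the decimal point.
fixed_text = "%.6f" % amount

print(default_text)  # 4.2e-05
print(fixed_text)    # 0.000042
```

Note that a fixed precision rounds: with '%.6f', a value such as 0.0000004 would be written as 0.000000, so the number of decimals should cover the smallest value you need to keep.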
