I have to import this Excel file in code, and I would like to unify the multi-level header into a single column level. I would also like to delete the unnamed columns and merge everything into one header row. I don't know if it's possible.
I have tried the following, and it imports, but the output is not as expected. I add the code here too:
import pandas as pd
import numpy as np
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1,3,4])
macro = macro[macro.columns[1:]]  # drop the first (unnamed) column
macro
One way to solve it is to assign a new header of the same length as the existing one:
cols = [...]
if len(df1.columns) == len(cols):
    df1.columns = cols
else:
    print("error")
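Alternatively, the levels of the multi-level header can be joined into a single flat name. A minimal sketch with made-up column tuples (the real ones come from `header=[1,3,4]`), skipping the autogenerated "Unnamed" labels that pandas creates for blank header cells:

```python
import pandas as pd

# made-up MultiIndex header similar to what header=[1, 3, 4] produces,
# including an autogenerated "Unnamed" label for a blank cell
df1 = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples(
        [("A", "x", "u"), ("B", "Unnamed: 1_level_1", "v")]
    ),
)

# join the levels of each column into one flat name, dropping "Unnamed" parts
df1.columns = [
    "_".join(part for part in col if not part.startswith("Unnamed"))
    for col in df1.columns
]
print(df1.columns.tolist())  # ['A_x_u', 'B_v']
```

Iterating over a MultiIndex yields one tuple per column, so a plain list comprehension is enough to flatten it.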
I'm trying to set up a data quality check for the numeric columns in a DataFrame. I want to run describe() to produce stats on each numeric column. How can I filter out the other columns before producing the stats? See the code I'm using below.
import pandas as pd

df1 = pd.read_csv("D:/dc_Project/loans.csv")
print(df1.describe(include=sorted(df1)))
I went with the following, from a teammate:
import pandas as pd
import numpy as np
df1 = pd.read_csv("D:/dc_Project/loans.csv")
df2 = df1.select_dtypes(include=np.number)
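A small self-contained sketch of the same idea, using made-up loan-like columns instead of the local CSV; note that describe() can also do the filtering itself via include=np.number:

```python
import numpy as np
import pandas as pd

# stand-in for the loans data (the real file lives on the asker's disk)
df1 = pd.DataFrame({
    "amount": [1000.0, 2500.0, 400.0],
    "term": [36, 60, 36],
    "grade": ["A", "B", "A"],  # non-numeric, should be excluded from the stats
})

# keep only numeric columns, then describe them
df2 = df1.select_dtypes(include=np.number)
print(df2.describe())  # stats for 'amount' and 'term' only

# equivalently, let describe() do the filtering itself
print(df1.describe(include=np.number))
```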
I have just started to learn to use Jupyter notebook. I have a data file called 'Diseases'.
Opening data file
import pandas as pd
df = pd.read_csv('Diseases.csv')
Choosing data from a column named 'DIABETES', i.e. choosing subject IDs that have diabetes (yes is 1 and no is 0):
df[df.DIABETES >1]
Now I want to export this cleaned data (that has fewer rows)
df.to_csv('diabetes-filtered.csv')
This exports the original data file, not the filtered df with fewer rows.
I saw in another question that the inplace argument needs to be used. But I don't know how.
You forgot to assign the filtered DataFrame back to a variable, here df1:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df1 = df[df.DIABETES >1]
df1.to_csv('diabetes-filtered.csv')
Or you can chain filtering and exporting to file:
import pandas as pd
df = pd.read_csv('Diseases.csv')
df[df.DIABETES >1].to_csv('diabetes-filtered.csv')
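Putting it together on a tiny made-up Diseases-like frame; note that with 0/1 coding the filter is usually == 1 rather than > 1 (a strict > 1 would match nothing), and index=False keeps the row labels out of the file:

```python
import pandas as pd

# tiny stand-in for the Diseases data (IDs plus a 0/1 diabetes flag)
df = pd.DataFrame({"ID": [101, 102, 103], "DIABETES": [1, 0, 1]})

# with 0/1 coding, "has diabetes" is DIABETES == 1
df1 = df[df.DIABETES == 1]
df1.to_csv("diabetes-filtered.csv", index=False)  # index=False drops row labels

print(len(df1))  # 2 rows kept
```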
I want to accomplish the same result as:
import pandas as pd
pd.read_csv("data.csv", index_col=0)
But in a faster way, using the datatable library in Python and converting the result to a pandas DataFrame. This is what I am currently doing:
import datatable as dt
datatable = dt.fread("data.csv")
dataframe = datatable.to_pandas().set_index('C0')
Is there any faster way to do it?
I would like to have a parameter that allows me to use a column as the row labels of the Frame, like index_col=0 in pandas.read_csv(). Also, why does datatable.fread create a 'C0' column?
I have a pandas DataFrame with multiple columns whose types are either float64 or strings. I'm trying to use to_csv to write the DataFrame to an output file. However, it outputs big numbers in scientific notation. For example, if the number is 1344154454156.992676, it's saved in the file as 1.344154e+12.
How can I suppress scientific notation in to_csv and keep the numbers as they are in the output file? I have tried the float_format parameter of to_csv, but it broke since there are also string columns in the DataFrame.
Here are some example codes:
import pandas as pd
import numpy as np
import os
df = pd.DataFrame({'names': ['a', 'b', 'c'],
                   'values': np.random.rand(3) * 100000000000000})
df.to_csv('example.csv')
os.system("cat example.csv")
,names,values
0,a,9.41843213808e+13
1,b,2.23837359193e+13
2,c,9.91801198906e+13
# if I set float_format:
df.to_csv('example.csv', float_format='{:f}'.format)
ValueError: Unknown format code 'f' for object of type 'str'
How can I get the data saved in the csv without scientific notation, like below?
names values
0 a 94184321380806.796875
1 b 22383735919307.046875
2 c 99180119890642.859375
The float_format argument should be a printf-style format string; use this instead:
df.to_csv('example.csv', float_format='%f')
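To see that the printf-style string is only applied to float columns (string columns pass through untouched), here is a quick check with fixed values instead of random ones; passing no path makes to_csv return the CSV text:

```python
import pandas as pd

df = pd.DataFrame({"names": ["a", "b"],
                   "values": [94184321380806.8, 22383735919307.05]})

csv = df.to_csv(float_format="%f")  # no path: return the CSV as a string
print(csv)
# the 'values' column is written without scientific notation,
# and the 'names' column is untouched
```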
Alternatively, try setting the display option like this (note this changes how floats are printed, not what to_csv writes):
pd.set_option('display.float_format', '{:f}'.format)
For Python 3.x (tested on 3.6.5 and 3.7):
Options and Settings
For visualization of the dataframe, use pandas.set_option:
import pandas as pd  # import pandas package
import numpy as np   # needed for the random data below
# for visualisation of the float data once we read it:
pd.set_option('display.html.table_schema', True)  # to see the dataframe/table as html
pd.set_option('display.precision', 5)  # set the display precision; here it is 5
df = pd.DataFrame({'names': ['a', 'b', 'c'],
                   'values': np.random.rand(3) * 100000000000000})  # generate a random dataframe
Output of the data:
df.dtypes # check datatype for columns
[output]:
names object
values float64
dtype: object
Dataframe:
df # output of the dataframe
[output]:
names values
0 a 6.56726e+13
1 b 1.63821e+13
2 c 7.63814e+13
And now write to_csv using the float_format='%.13f' parameter
df.to_csv('estc.csv',sep=',', float_format='%.13f') # write with precision .13
file output:
,names,values
0,a,65672589530749.0703125000000
1,b,16382088158236.9062500000000
2,c,76381375369817.2968750000000
And now write to_csv using the float_format='%f' parameter
df.to_csv('estc.csv', sep=',', float_format='%f')  # '%f' defaults to 6 decimal places, trimming the extra zeros
file output:
,names,values
0,a,65672589530749.070312
1,b,16382088158236.906250
2,c,76381375369817.296875
For more details, check the pandas.DataFrame.to_csv documentation.
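When different columns need different treatment (or only some float columns should be reformatted), one more option is to convert those columns to strings before writing; the column name and precision here are just for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'names': ['a', 'b', 'c'],
                   'values': np.random.rand(3) * 100000000000000})

out = df.copy()
# format just this one column; any other columns keep their own formatting
out['values'] = out['values'].map('{:.6f}'.format)
out.to_csv('estc.csv', index=False)
```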