How to drop the first row number column pandas? - python

This question may sound similar to other questions posted, but I'm posting this after searching long for this exact solution.
So, I've a JSON from which I'm creating a pandas dataframe:
col_list = ["allocation","completion_date","has_expanded_access"]
final_data = dict((k,d[k]) for k in (col_list) if k in d)
a = json_normalize(final_data)
And then this:
I tried saving with:
df = df.reset_index(drop=True)
And
df = df.rename_axis(None)
As suggested on few answers, but of no use, when I try to save it, this default first column containing row index comes with header as blank (null), even if I try to drop, it doesn't work. Any help?

Try
df.to_csv('df_name.csv', sep = ';', encoding = 'cp1251', index = False)
to save df without indices.
Or change index column with
df.set_index('col_name')

If you want to save the dataframe as csv file then you can do this:
df.to_csv(filename, index=False)

Related

Csv column unnamed headers being written. How do I stop that, it adds the row number on the left whenever I run the program, offsetting index writing

I am trying to replace a certain cell in a csv but for some reason the code keeps adding this to the csv:
,Unnamed: 0,User ID,Unnamed: 1,Unnamed: 2,Balance
0,0,F7L3-2L3O-8ASV-1CG4,,,5.0
1,1,YP2V-9ERY-6V3H-UG1A,,,4.0
2,2,9FPM-879N-3BKG-ZBX8,,,0.0
3,3,1CY4-47Y8-6317-UQTK,,,5.0
4,4,H9BP-5N77-7S2T-LLMG,,,100.0
It should look like this:
User ID,,,Balance
F7L3-2L3O-8ASV-1CG4,,,5.0
YP2V-9ERY-6V3H-UG1A,,,4.0
9FPM-879N-3BKG-ZBX8,,,0.0
1CY4-47Y8-6317-UQTK,,,5.0
H9BP-5N77-7S2T-LLMG,,,100.0
My code is:
equations_reader = pd.read_csv("bank.csv")
equations_reader.to_csv('bank.csv')
add_e_trial = equations_reader.at[bank_indexer_addbalance, 'Balance'] = read_balance_add + coin_amount
In summary, I want to open the CSV file, make a change and save it again without Pandas adding an index and without it modifying empty columns.
Why is it doing this? How do I fix it?
Pandas as you have seen will allocate Unnamed:xxx column names to empty column headers. These columns can either be removed or renamed.
When saving, by default Pandas will add a numbered index column, this is optional and can be removed by adding an index=False parameter.
For example:
import pandas as pd
df = pd.read_csv("bank.csv")
# Rename any unnamed columns
df = df.rename(columns=lambda x: '' if x.startswith('Unnamed') else x)
# Remove any unnamed columns
# df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
# << update cells >>
df.to_csv('bank2.csv', index=False)
This would rename any column names that start Unnamed to an empty string. This approach should result in bank.csv only having your updated cells applied.

Summary Row for a pd.DataFrame with multiindex

I have a multiIndex dataframe created with pandas similar to this one:
nest = {'A1': dfx[['aa','bb','cc']],
'B1':dfx[['dd']],
'C1':dfx[['ee', 'ff']]}
reform = {(outerKey, innerKey): values for outerKey, innerDict in nest.items() for innerKey, values in innerDict.items()}
dfzx = pd.DataFrame(reform)
What I am trying to achieve is to add a new row at the end of the dataframe that contains a summary of the total for the three categories represented by the new index (A1, B1, C1).
I have tried with df.loc (what I would normally use in this case) but I get error. Similarly for iloc.
a1sum = dfzx['A1'].sum().to_list()
a1sum = sum(a1sum)
b1sum = dfzx['B1'].sum().to_list()
b1sum = sum(b1sum)
c1sum = dfzx['C1'].sum().to_list()
c1sum = sum(c1sum)
totalcat = a1sum, b1sum, c1sum
newrow = ['Total', totalcat]
newrow
dfzx.loc[len(dfzx)] = newrow
ValueError: cannot set a row with mismatched columns
#Alternatively
newrow2 = ['Total', a1sum, b1sum, c1sum]
newrow2
dfzx.loc[len(dfzx)] = newrow2
ValueError: cannot set a row with mismatched columns
How can I fix the mistake? Or else is there any other function that would allow me to proceed?
Note: the DF is destined to be moved on an Excel file (I use ExcelWriter).
The type of results I want to achieve in the end is this one (gray row "SUM"
I came up with a sort of solution on my own.
I created a separate DataFrame in Pandas that contains the summary.
I used ExcelWriter to have both dataframes on the same excel worksheet.
Technically It would be then possible to style and format data in Excel (xlsxwriter or framestyle seem to be popular modules to do so). Alternatively one should be doing that manually.

How to store the string of the column in excel using python

I have attached a screenshot of my excel sheet. I want to store the length of every string in SUPPLIER_id Length column. But when I run my code, CSV columns are blanks.
And when I use this same code in different CSV, it works well.
I am using following code but not able to print the data.
I have attached the snippet of csv. Can somebody tell me why is this happening:
import pandas as pd
data = pd.read_csv(r'C:/Users/patesari/Desktop/python work/nba.csv')
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
data.dropna(inplace = True)
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
data
print(df)
data.to_csv("C:/Users/patesari/Desktop/python work/nba.csv")
I faced a similar problem in the past.
Instead of:
df = pd.DataFrame(data, columns= ['SUPPLIER_ID','ACTION'])
Type this:
data.columns=['SUPPLIER_ID','ACTION']
Also, I don't understand why did you create DataFrame df. It was unnecessary in my opinion.
Aren't you getting a SettingWithCopyWarning from pandas? I would imagine (haven't ran this code) that these lines
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data['SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data['SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)
would not do anything, and should be replaced with
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(str)
data.loc[:, 'SUPPLIER_ID LENGTH']= data['SUPPLIER_ID'].str.len()
data.loc[:, 'SUPPLIER_ID']= data['SUPPLIER_ID'].astype(float)

Need help to solve the Unnamed and to change it in dataframe in pandas

how set my indexes from "Unnamed" to the first line of my dataframe in python
import pandas as pd
df = pd.read_excel('example.xls','Day_Report',index_col=None ,skip_footer=31 ,index=False)
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.

Removing index column in pandas when reading a csv

I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency" the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post but that results in "KeyError: 'index' "
When writing to and reading from a CSV file include the argument index=False and index_col=False, respectively. Follows an example:
To write:
df.to_csv(filename, index=False)
and to read from the csv
df.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index().
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
You can set one of the columns as an index in case it is an "id" for example.
In this case the index column will be replaced by one of the columns you have chosen.
df.set_index('id', inplace=True)
If your problem is same as mine where you just want to reset the column headers from 0 to column size. Do
df = pd.DataFrame(df.values);
EDIT:
Not a good idea if you have heterogenous data types. Better just use
df.columns = range(len(df.columns))
you can specify which column is an index in your csv file by using index_col parameter of from_csv function
if this doesn't solve you problem please provide example of your data
One thing that i do is df=df.reset_index()
then df=df.drop(['index'],axis=1)
To remove or not to create the default index column, you can set the index_col to False and keep the header as Zero. Here is an example of how you can do it.
recording = pd.read_excel("file.xls",
sheet_name= "sheet1",
header= 0,
index_col= False)
The header = 0 will make your attributes to headers and you can use it later for calling the column.
It works for me this way:
Df = data.set_index("name of the column header to start as index column" )

Categories