Merging a mix of Excel columns into one column with pandas - Python

I have to import this Excel file in code, and I would like to unify the multi-level header into a single column level: delete the unnamed columns and merge everything into one row of names. I don't know if it's possible.
I have tried the following, and it imports, but the output is not as expected. I add the code here too:
import pandas as pd
import numpy as np
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1,3,4])
macro = macro[macro.columns[1:]]
macro

The way I solved it was to build a replacement header of the same length as the existing one and assign it:
cols = [...]
if len(df1.columns) == len(cols):
    df1.columns = cols
else:
    print("error")
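Instead of hand-writing the full replacement list, the levels of the multi-level header can also be joined into single names automatically. A minimal sketch, using a toy two-row frame in place of the Excel file (the group/sub-column names are illustrative):

```python
import pandas as pd

# Toy frame with a three-level header, mimicking what
# read_excel(header=[1, 3, 4]) returns for a merged-cell layout
df = pd.DataFrame(
    [[1, 2], [3, 4]],
    columns=pd.MultiIndex.from_tuples(
        [("Group A", "Unnamed: 0_level_1", "x"), ("Group A", "Sub", "y")]
    ),
)

# Join the levels of each column into one name, skipping the
# auto-generated "Unnamed" placeholders pandas inserts for merged cells
df.columns = [
    " ".join(level for level in col if not str(level).startswith("Unnamed"))
    for col in df.columns
]
print(df.columns.tolist())  # ['Group A x', 'Group A Sub y']
```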

Related

CSV written with unnamed header columns - how do I stop pandas adding the row number on the left every time I run the program, offsetting the index?

I am trying to replace a certain cell in a csv but for some reason the code keeps adding this to the csv:
,Unnamed: 0,User ID,Unnamed: 1,Unnamed: 2,Balance
0,0,F7L3-2L3O-8ASV-1CG4,,,5.0
1,1,YP2V-9ERY-6V3H-UG1A,,,4.0
2,2,9FPM-879N-3BKG-ZBX8,,,0.0
3,3,1CY4-47Y8-6317-UQTK,,,5.0
4,4,H9BP-5N77-7S2T-LLMG,,,100.0
It should look like this:
User ID,,,Balance
F7L3-2L3O-8ASV-1CG4,,,5.0
YP2V-9ERY-6V3H-UG1A,,,4.0
9FPM-879N-3BKG-ZBX8,,,0.0
1CY4-47Y8-6317-UQTK,,,5.0
H9BP-5N77-7S2T-LLMG,,,100.0
My code is:
equations_reader = pd.read_csv("bank.csv")
equations_reader.to_csv('bank.csv')
add_e_trial = equations_reader.at[bank_indexer_addbalance, 'Balance'] = read_balance_add + coin_amount
In summary, I want to open the CSV file, make a change and save it again without Pandas adding an index and without it modifying empty columns.
Why is it doing this? How do I fix it?
As you have seen, pandas allocates Unnamed: xxx column names to empty column headers. These columns can either be removed or renamed.
When saving, pandas by default also writes a numbered index column; this is optional and can be suppressed by passing the index=False parameter.
For example:
import pandas as pd
df = pd.read_csv("bank.csv")
# Rename any unnamed columns
df = df.rename(columns=lambda x: '' if x.startswith('Unnamed') else x)
# Remove any unnamed columns
# df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
# << update cells >>
df.to_csv('bank2.csv', index=False)
This renames any column names that start with "Unnamed" to an empty string. With index=False, bank2.csv should contain only your updated cells, with no extra index column.
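To see why the extra column kept appearing: a round trip with index=False leaves the file unchanged, while the default writes the row index as a new unnamed column on each save. A small sketch with toy in-memory data (the IDs are placeholders):

```python
import pandas as pd
from io import StringIO

# Toy CSV standing in for bank.csv
df = pd.read_csv(StringIO("User ID,Balance\nF7L3,5.0\nYP2V,4.0"))

# With index=False the file round-trips cleanly: no index column is
# written, so re-reading it produces the same columns as before
out = df.to_csv(index=False)
df2 = pd.read_csv(StringIO(out))
print(df2.columns.tolist())  # ['User ID', 'Balance']
```

Without index=False, each save/load cycle would prepend another "Unnamed: 0" column, which is exactly the drift shown in the question.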

Pandas DataFrame: keep only some columns when importing an Excel file

I want to import an Excel file and keep only some of its columns.
This is my code:
df=pd.read_excel(file_location_PDD)
col=df[['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','shkzg','usname','sname','dmsol','dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns I want, but two of them, 'usname' and 'sname', do not appear in every file I have to import.
Because of that I received the error ['usname', 'sname'] not in index.
How can I handle this?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
You need to use df.reindex instead of df[[...]]: reindex creates any missing columns (filled with NaN) instead of raising a KeyError. I have also changed 'excel.xlsx' to the raw string r'excel.xlsx' so that backslashes in a Windows path are not treated as escape sequences.
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','usname','sname','dmsol','dmhab'])
print(col)
col.to_excel("output.xlsx")
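The key behavior is that reindex tolerates missing labels where plain bracket selection does not. A minimal sketch with a toy frame (only 'hkont' and 'dmbtr' exist; 'usname' stands in for a column absent from the file):

```python
import pandas as pd

# Toy frame: only two of the wanted columns actually exist
df = pd.DataFrame({"hkont": [100, 200], "dmbtr": [1.5, 2.5]})

# df[['hkont', 'dmbtr', 'usname']] would raise a KeyError here;
# reindex keeps the existing columns and creates the missing one as NaN
col = df.reindex(columns=["hkont", "dmbtr", "usname"])
print(col.columns.tolist())  # ['hkont', 'dmbtr', 'usname']
print(col["usname"].isna().all())  # True
```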

Applying a condition to data in a pandas DataFrame while ignoring the headers

I have data in this format in a CSV file. I want an Excel file where all values greater than 0 are replaced with 1. I have tried the code below, but either I lose the header row (the years, e.g. 1960/1961) or I get an error when I try to ignore it.
Here is my code trail.
import pandas as pd
data = pd.read_csv("first.csv")
data1 = data.apply(pd.to_numeric,errors='coerce')
data1 = (data1>0).astype(int)
data2 = data1.combine_first(data)
print(data2)
I want the output to look like this:
Here is the URL to the CSV file; you can download it to run the code above:
https://gofile.io/?c=eWd049
NumPy has a ceil function that rounds up and a floor function that rounds down:
numpy.ceil()
numpy.floor()
So it should be something like this (once you fix the year/year column titles):
import numpy as np
for column in data.columns:
    data[column] = data[column].apply(lambda x: np.ceil(x) if x < 1 else np.floor(x))
For the column-title issues, specify the dtype and check the separator when reading the file.
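An alternative that keeps the header row intact is to restrict the replacement to the numeric columns only, so text columns and the year titles are never touched. A sketch with toy data (the 'Country' column and year values are illustrative):

```python
import pandas as pd

# Toy frame mimicking the CSV: one text column plus year columns
df = pd.DataFrame({
    "Country": ["A", "B", "C"],
    "1960/1961": [0.0, 12.5, 0.0],
    "1961/1962": [3.0, 0.0, 7.0],
})

# Only touch the numeric columns; gt(0) marks values greater than 0,
# astype(int) turns the booleans into 1/0
num = df.select_dtypes("number").columns
df[num] = df[num].gt(0).astype(int)
print(df["1960/1961"].tolist())  # [0, 1, 0]
```

Unlike the ceil/floor loop, this maps every positive value (including ones greater than 2) to exactly 1.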

Need help replacing the "Unnamed" column names in a pandas DataFrame

How do I set my column names from "Unnamed" to the first row of my DataFrame in Python?
import pandas as pd
df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skipfooter=31)
df = df.dropna(how='all',axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
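Putting the two steps together, here is a minimal sketch on a toy frame read without a usable header (the 'date'/'total' names are illustrative):

```python
import pandas as pd

# Toy frame where the real header ended up as the first data row,
# as happens when pandas invents "Unnamed" / numeric column names
df = pd.DataFrame([["date", "total"], ["2020-01-01", 10]])

df.columns = df.loc[0, :].values          # promote the first row to column names
df = df.drop(0).reset_index(drop=True)    # drop that row and renumber
print(df.columns.tolist())  # ['date', 'total']
```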

Change one column of a DataFrame only

I'm using pandas with Python 3. I have a DataFrame with a bunch of columns, but I only want to change the data type of all the values in one of the columns and leave the others alone. The only way I could find to accomplish this is to edit the column, remove the original column, and then merge the edited one back. I would like to edit the column without having to remove and merge, leaving the rest of the DataFrame unaffected. Is this possible?
Here is my solution now:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
def make_float(var):
    var = float(var)
    return var

# create a new series with the value types I want
df2 = df1['column'].apply(make_float)
# remove the original column
df3 = df1.drop('column', axis=1)
# merge the dataframes
df1 = pd.concat([df3, df2], axis=1)
It also doesn't work to apply the function to the dataframe directly. For example:
df1['column'].apply(make_float)
print(type(df1.iloc[1]['column']))
yields:
<class 'str'>
df1['column'] = df1['column'].astype(float)
It will raise an error if conversion fails for some row.
Apply does not work inplace, but rather returns a series that you discard in this line:
df1['column'].apply(make_float)
Apart from Yakym's solution, you can also do this (note it only coerces an already-numeric column to float; it will not convert strings):
df['column'] += 0.0
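The astype approach above can be sketched end to end on a toy frame (the column names are illustrative); assigning the converted Series back in place leaves every other column untouched:

```python
import pandas as pd

# Toy frame: 'column' holds numeric strings, 'other' holds text
df1 = pd.DataFrame({"column": ["1.5", "2.0"], "other": ["a", "b"]})

# astype returns a new Series; assigning it back replaces only that column
df1["column"] = df1["column"].astype(float)
print(df1["column"].dtype)  # float64
print(df1["other"].dtype)   # object
```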
