I have to import this Excel file in code, and I would like to unify the multi-index header into a single column name per column. I would also like to delete the unnamed columns. I don't know if it's possible.
I have tried the following, and it imports, but the output is not as expected. I add the code here too:
import pandas as pd
import numpy as np

# header=[1, 3, 4] reads three header rows, producing a MultiIndex of columns
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1, 3, 4])
macro = macro[macro.columns[1:]]  # keep every column except the first
macro
One way to solve it is to build a replacement header of the same length as the existing one and assign it:
cols = [...]

if len(df1.columns) == len(cols):
    df1.columns = cols
else:
    print("error")
I am trying to replace a certain cell in a CSV, but for some reason the code keeps adding this to the CSV:
,Unnamed: 0,User ID,Unnamed: 1,Unnamed: 2,Balance
0,0,F7L3-2L3O-8ASV-1CG4,,,5.0
1,1,YP2V-9ERY-6V3H-UG1A,,,4.0
2,2,9FPM-879N-3BKG-ZBX8,,,0.0
3,3,1CY4-47Y8-6317-UQTK,,,5.0
4,4,H9BP-5N77-7S2T-LLMG,,,100.0
It should look like this:
User ID,,,Balance
F7L3-2L3O-8ASV-1CG4,,,5.0
YP2V-9ERY-6V3H-UG1A,,,4.0
9FPM-879N-3BKG-ZBX8,,,0.0
1CY4-47Y8-6317-UQTK,,,5.0
H9BP-5N77-7S2T-LLMG,,,100.0
My code is:
equations_reader = pd.read_csv("bank.csv")
equations_reader.to_csv('bank.csv')
add_e_trial = equations_reader.at[bank_indexer_addbalance, 'Balance'] = read_balance_add + coin_amount
In summary, I want to open the CSV file, make a change and save it again without Pandas adding an index and without it modifying empty columns.
Why is it doing this? How do I fix it?
As you have seen, pandas assigns Unnamed: xxx column names to empty column headers. These columns can either be removed or renamed.
When saving, pandas by default also writes a numbered index column; this is optional and can be suppressed by passing index=False to to_csv.
For example:
import pandas as pd
df = pd.read_csv("bank.csv")
# Rename any unnamed columns
df = df.rename(columns=lambda x: '' if x.startswith('Unnamed') else x)
# Remove any unnamed columns
# df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
# << update cells >>
df.to_csv('bank2.csv', index=False)
This renames any column names that start with Unnamed to an empty string. Combined with index=False, this approach should result in the saved CSV having only your updated cells applied.
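If a stray index column has already been written into the file, one way to stop it from multiplying on each save is to read it back as the index (a sketch, assuming the extra numbered column is the first one in the file):
import pandas as pd

# Read the leftover numbered column as the index instead of as data
df = pd.read_csv("bank.csv", index_col=0)
# << update cells >>
df.to_csv("bank.csv", index=False)  # drop the index for good on the next save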
I want to import an Excel file where I want to keep just some columns.
This is my code:
df = pd.read_excel(file_location_PDD)
col = df[['hkont', 'dmbtr', 'belnr', 'monat', 'gjahr', 'budat', 'shkzg',
          'usname', 'sname', 'dmsol', 'dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns I want, but two of the columns, 'usname' and 'sname', don't appear in every file I have to import.
Because of that I receive the error ['usname', 'sname'] not in index.
How can I do this?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
You need to use df.reindex instead of df[[...]]; reindex inserts any missing labels as new columns filled with NaN instead of raising a KeyError. I also changed 'excel.xlsx' to the raw string r'excel.xlsx' so that backslashes in a Windows path are not treated as escape characters.
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont', 'dmbtr', 'belnr', 'monat', 'gjahr', 'budat',
                          'shkzg', 'usname', 'sname', 'dmsol', 'dmhab'])
print(col)
col.to_excel("output.xlsx")
I have data in this format in a CSV file. I want to produce an Excel file where all the values greater than 0 are replaced with 1. I have tried the code below, but the problem is that I either lose the headers (the years, e.g. 1960/1961) or I get an error when I ignore them.
Here is my code so far:
import pandas as pd

data = pd.read_csv("first.csv")
data1 = data.apply(pd.to_numeric, errors='coerce')
data1 = (data1 > 0).astype(int)
data2 = data1.combine_first(data)
print(data2)
I want the output to look like the input, but with every value greater than 0 replaced by 1 and the year headers kept.
Here is the URL to csv file, you can download to run the given code.
https://gofile.io/?c=eWd049
numpy has a .ceil method that rounds up and a .floor method that rounds down:
numpy.ceil()
numpy.floor()
So it should be something like this (once you change year/year to a column title):
import numpy as np

for column in data.columns:
    data[column] = data[column].apply(lambda x: np.ceil(x) if x < 1 else np.floor(x))
For the column-title issues: specify the dtype and check the separator when reading the file.
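An alternative sketch that keeps the year headers, in the spirit of the question's own combine_first attempt (assuming the year labels sit in the file's first row, which read_csv uses as the header by default):
import pandas as pd

data = pd.read_csv("first.csv")                # the first row becomes the header
numeric = data.apply(pd.to_numeric, errors='coerce')
# Replace every value greater than 0 with 1, leaving non-numeric cells as they were
result = numeric.gt(0).astype(int).where(numeric.notna(), data)
result.to_excel("first.xlsx", index=False)     # to_excel needs openpyxl installed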
How do I set my indexes from "Unnamed" to the first line of my dataframe in Python?
import pandas as pd

df = pd.read_excel('example.xls', 'Day_Report', index_col=None, skip_footer=31, index=False)
df = df.dropna(how='all', axis=1)
df = df.dropna(how='all')
df = df.drop(2)
To set the column names (assuming that's what you mean by "indexes") to the first row, you can use
df.columns = df.loc[0, :].values
Following that, if you want to drop the first row, you can use
df.drop(0, inplace=True)
Edit
As coldspeed correctly notes below, if the source of this is reading a CSV, then adding the skiprows=1 parameter is much better.
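Put together, a minimal sketch (the toy frame stands in for the Excel import, with the real headers sitting in row 0):
import pandas as pd

# Toy frame standing in for the import: the real headers landed in row 0
df = pd.DataFrame([['User', 'Balance'], ['F7L3', 5.0]],
                  columns=['Unnamed: 0', 'Unnamed: 1'])

df.columns = df.loc[0, :].values  # promote the first row to column names
df.drop(0, inplace=True)          # then drop that row from the data
print(df)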
I'm using pandas with Python 3. I have a dataframe with a bunch of columns, but I only want to change the data type of all the values in one of the columns and leave the others alone. The only way I could find to accomplish this is to edit the column, remove the original column, and then merge the edited one back. I would like to edit the column without having to remove and merge, leaving the rest of the dataframe unaffected. Is this possible?
Here is my solution now:
import numpy as np
import pandas as pd
from pandas import Series,DataFrame
def make_float(var):
var = float(var)
return var
#create a new dataframe with the value types I want
df2 = df1['column'].apply(make_float)
#remove the original column
df3 = df1.drop('column',1)
#merge the dataframes
df1 = pd.concat([df3,df2],axis=1)
It also doesn't work to apply the function to the column directly without assigning the result. For example:
df1['column'].apply(make_float)
print(type(df1.iloc[1]['column']))
yields:
<class 'str'>
df1['column'] = df1['column'].astype(float)
It will raise an error if conversion fails for some row.
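If some rows may fail to parse, a hedged variant is pd.to_numeric, which can coerce failures to NaN instead of raising (the toy data below is illustrative):
import pandas as pd

df1 = pd.DataFrame({'column': ['1.5', 'oops', '3']})  # toy data with a bad row
# errors='coerce' turns unparseable values into NaN instead of raising
df1['column'] = pd.to_numeric(df1['column'], errors='coerce')
print(df1['column'].dtype)  # float64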
apply does not work in place; it returns a new Series, which you discard in this line:
df1['column'].apply(make_float)
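Assigning the result back fixes it:
# assign the returned Series back to the column
df1['column'] = df1['column'].apply(make_float)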
Apart from Yakym's solution, you can also do this:
df['column'] += 0.0
Note that this only upcasts a column that is already numeric (e.g. int to float); it will not parse strings.