I want to import an Excel file and keep just some of its columns.
This is my code:
df = pd.read_excel(file_location_PDD)
col = df[['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','usname','sname','dmsol','dmhab']]
print(col)
col.to_excel("JETNEW.xlsx")
I selected all the columns I want, but two of them, 'usname' and 'sname', do not appear in every file I have to import.
Because of that I get the error: ['usname', 'sname'] not in index.
How can I handle this?
Thanks
Source -- https://stackoverflow.com/a/38463068/14515824
You need to use df.reindex instead of df[[...]]. I have also changed 'excel.xlsx' to r'excel.xlsx', making it a raw string so that backslashes in the file path are not treated as escape characters.
An example:
df.reindex(columns=['a','b','c'])
Which in your code would be:
file_location_PDD = r'excel.xlsx'
df = pd.read_excel(file_location_PDD)
col = df.reindex(columns=['hkont','dmbtr','belnr','monat','gjahr','budat','shkzg','usname','sname','dmsol','dmhab'])
print(col)
col.to_excel("output.xlsx")
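For context: unlike df[[...]], reindex does not raise on missing labels; it creates the missing columns filled with NaN. A minimal sketch with toy data (not from the original files):
import pandas as pd

df = pd.DataFrame({'hkont': [1, 2], 'dmbtr': [3, 4]})

# 'usname' does not exist here, so reindex adds it as a NaN column
out = df.reindex(columns=['hkont', 'dmbtr', 'usname'])
print(out)
#    hkont  dmbtr  usname
# 0      1      3     NaN
# 1      2      4     NaN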
Related
I have to import this Excel file in code, and I would like to unify the multi-level header into a single row of column names. I would also like to delete the unnamed columns and unify everything into one. I don't know if it's possible.
I have tried the following, and it imports, but the output is not as expected. I add the code here too:
import pandas as pd
import numpy as np
macro = pd.read_excel(nameExcel, sheet_name=nameSheet, skiprows=3, header=[1,3,4])
macro = macro[macro.columns[1:]]
macro
One way to solve it is to assign a new flat header of the same length as the existing one:
cols = [...]
if len(macro.columns) == len(cols):
    macro.columns = cols
else:
    print("error")
I am trying to replace a certain cell in a CSV file, but for some reason the code keeps adding this to the CSV:
,Unnamed: 0,User ID,Unnamed: 1,Unnamed: 2,Balance
0,0,F7L3-2L3O-8ASV-1CG4,,,5.0
1,1,YP2V-9ERY-6V3H-UG1A,,,4.0
2,2,9FPM-879N-3BKG-ZBX8,,,0.0
3,3,1CY4-47Y8-6317-UQTK,,,5.0
4,4,H9BP-5N77-7S2T-LLMG,,,100.0
It should look like this:
User ID,,,Balance
F7L3-2L3O-8ASV-1CG4,,,5.0
YP2V-9ERY-6V3H-UG1A,,,4.0
9FPM-879N-3BKG-ZBX8,,,0.0
1CY4-47Y8-6317-UQTK,,,5.0
H9BP-5N77-7S2T-LLMG,,,100.0
My code is:
equations_reader = pd.read_csv("bank.csv")
equations_reader.to_csv('bank.csv')
add_e_trial = equations_reader.at[bank_indexer_addbalance, 'Balance'] = read_balance_add + coin_amount
In summary, I want to open the CSV file, make a change and save it again without Pandas adding an index and without it modifying empty columns.
Why is it doing this? How do I fix it?
As you have seen, Pandas assigns Unnamed: xxx column names to empty column headers. These columns can be either removed or renamed.
When saving, Pandas adds a numbered index column by default; this is optional and can be suppressed by passing index=False.
For example:
import pandas as pd
df = pd.read_csv("bank.csv")
# Rename any unnamed columns
df = df.rename(columns=lambda x: '' if x.startswith('Unnamed') else x)
# Remove any unnamed columns
# df = df.loc[:, ~df.columns.str.contains('^Unnamed')]
# << update cells >>
df.to_csv('bank2.csv', index=False)
This renames any column whose name starts with Unnamed to an empty string. With this approach, bank2.csv should contain only your updated cells.
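Putting it together with the update step, note that the order matters: change the cell first, then write the file (the question's code saved before editing). A sketch, where the row label 0 and the amount 5.0 are placeholders for bank_indexer_addbalance and the coin amount from the question:
import pandas as pd

df = pd.read_csv("bank.csv")
# Update the cell first, then save without the extra index column
df.at[0, 'Balance'] = df.at[0, 'Balance'] + 5.0
df.to_csv('bank.csv', index=False)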
I want to split the rows while maintaining the values.
How can I split the rows like that?
The data frame below is an example, along with the output that I want to see.
You can use pd.melt(). Read the documentation for more information: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I tried working on your problem.
import pandas as pd
melted_df = data.melt(id_vars=['value'], var_name="ToBeDropped", value_name="ID1")
This would show a warning because of the ambiguity in the string passed for the value_name argument (it matches an existing column name). It also creates a new variable column, which I have already named; the new column will be called 'ToBeDropped'. The code below will remove that column for you:
df = melted_df.drop(columns=['ToBeDropped'])
'df' will be your desired output.
via wide_to_long:
df = pd.wide_to_long(df, stubnames='ID', i='value',
                     j='ID_number').reset_index(0)
via set_index and stack:
df = df.set_index('value').stack().reset_index(name='IDs').drop(columns='level_1')
via melt:
df = df.melt(id_vars='value', value_name="ID1").drop(columns='variable')
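As a minimal sketch of the melt route on a toy frame (the column names 'value', 'ID1' and 'ID2' are assumptions, since the original example was only shown as an image):
import pandas as pd

df = pd.DataFrame({'value': ['a', 'b'], 'ID1': [1, 3], 'ID2': [2, 4]})

# value_name must not collide with an existing column, hence 'IDs'
out = df.melt(id_vars='value', value_name='IDs').drop(columns='variable')
print(out)
#   value  IDs
# 0     a    1
# 1     b    3
# 2     a    2
# 3     b    4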
Can anyone tell me how I should select one column with 'loc' in a dataframe using Dask?
As a side note, when I load the dataframe using dd.read_csv with header=None, the column names run from 0 to 131094. When I try to select the last column, named 131094, I get the error below.
code:
import dask.dataframe as dd
df = dd.read_csv('filename.csv', header=None)
y = df.loc['131094']
error:
File "/usr/local/dask-2018-08-22/lib/python2.7/site-packages/dask-0.5.0-py2.7.egg/dask/dataframe/core.py", line 180, in _loc
"Can not use loc on DataFrame without known divisions")
ValueError: Can not use loc on DataFrame without known divisions
Based on this guideline, http://dask.pydata.org/en/latest/dataframe-indexing.html#positional-indexing, my code should work, but I don't know what causes the problem.
If you have a named column, then use df.loc[:, 'col_name'].
But if you have a positional column, as in your example where you want the last column, then use the answer by @user1717828.
I tried this on a dummy CSV and it worked. I can't help you for sure without seeing the file that is giving you problems. That said, you might be picking rows, not columns.
Instead, try this.
import dask.dataframe as dd
df = dd.read_csv('filename.csv', header=None)
y = df[df.columns[-1]]
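One follow-up note: Dask evaluates lazily, so the selection is not actually read from disk until you ask for a result, for example:
result = y.compute()  # materializes the last column as a pandas Series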
I have the following code which imports a CSV file. There are 3 columns and I want to set the first two of them to variables. When I set the second column to the variable "efficiency", the index column is also tacked on. How can I get rid of the index column?
df = pd.DataFrame.from_csv('Efficiency_Data.csv', header=0, parse_dates=False)
energy = df.index
efficiency = df.Efficiency
print efficiency
I tried using
del df['index']
after I set
energy = df.index
which I found in another post, but that results in KeyError: 'index'.
When writing to and reading from a CSV file, include the arguments index=False and index_col=False, respectively. For example:
To write:
df.to_csv(filename, index=False)
and to read from the CSV:
pd.read_csv(filename, index_col=False)
This should prevent the issue so you don't need to fix it later.
df.reset_index(drop=True, inplace=True)
DataFrames and Series always have an index. Although it displays alongside the column(s), it is not a column, which is why del df['index'] did not work.
If you want to replace the index with simple sequential numbers, use df.reset_index(drop=True).
To get a sense for why the index is there and how it is used, see e.g. 10 minutes to Pandas.
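A minimal sketch of the difference, with toy data assumed:
import pandas as pd

df = pd.DataFrame({'a': [10, 20]}, index=['x', 'y'])

df.reset_index()           # moves the old index into a new 'index' column
df.reset_index(drop=True)  # discards it and renumbers from 0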
You can set one of the columns as the index, in case it is an "id", for example.
In this case the index column will be replaced by the column you have chosen.
df.set_index('id', inplace=True)
If your problem is the same as mine, where you just want to reset the column headers to 0 through the number of columns, do
df = pd.DataFrame(df.values)
EDIT:
Not a good idea if you have heterogeneous data types. Better to just use
df.columns = range(len(df.columns))
You can specify which column is the index in your CSV file by using the index_col parameter of the from_csv function.
If this doesn't solve your problem, please provide an example of your data.
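For example, a sketch using pd.read_csv, since DataFrame.from_csv has been removed from current pandas:
import pandas as pd

# Treat the first column of the file as the index
df = pd.read_csv('Efficiency_Data.csv', index_col=0)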
One thing that I do is df = df.reset_index(),
then df = df.drop(['index'], axis=1).
To avoid creating the default index column, you can set index_col to False and keep the header as 0. Here is an example of how you can do it:
recording = pd.read_excel("file.xls",
sheet_name= "sheet1",
header= 0,
index_col= False)
header=0 turns the first row into your column headers, which you can use later for calling a column.
It works for me this way:
df = data.set_index("name of the column header to use as the index column")