Delete empty dataframes from a list with dataframes - python

This is a list of dataframes.
import pandas as pd
data=[pd.DataFrame([1,2,3],columns=['a']),pd.DataFrame([]),pd.DataFrame([]),
pd.DataFrame([3,4,5,6,7],columns=['a'])]
I am trying to delete the empty dataframes from the above list that contains dataframes.
Here is what I have tried:
for i in data:
    del i.empty()
data
which gives:
  File "<ipython-input-33-d07b32efe793>", line 2
    del i.empty()
        ^
SyntaxError: cannot delete function call
Important: the result needs to be stored in the data variable as well.

Try this:
import pandas as pd
data = [pd.DataFrame([1, 2, 3], columns=['a']), pd.DataFrame([]),
        pd.DataFrame([]),
        pd.DataFrame([3, 4, 5, 6, 7], columns=['a'])]
# Walk the indices from last to first, down to and including index 0
for i in range(len(data) - 1, -1, -1):
    if data[i].empty:
        del data[i]
print(data)
The problem with your code is that df.empty is a property that returns True or False, and del cannot be applied to it. What you want is to delete the item i from the list when i.empty is True.
Note that the loop uses a reversed range. Deleting while iterating forward shifts the remaining items, so some items would be skipped or a list index out of range error would be raised.
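As an alternative to managing indices by hand, here is a minimal sketch that filters the list in place with slice assignment. The name alias is hypothetical and only there to show that the same list object is kept.

```python
import pandas as pd

data = [pd.DataFrame([1, 2, 3], columns=['a']), pd.DataFrame([]),
        pd.DataFrame([]), pd.DataFrame([3, 4, 5, 6, 7], columns=['a'])]
alias = data  # hypothetical second reference to the same list object

# Slice assignment replaces the contents of the existing list in place,
# so no index bookkeeping is needed and every reference sees the result.
data[:] = [df for df in data if not df.empty]

print(len(data))      # 2
print(alias is data)  # True
```

This keeps the result in the data variable, as the question requires, without deleting items during iteration.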

We can use filter
data = list(filter(lambda df: not df.empty, data))
or a list comprehension
data = [df for df in data if not df.empty]
print(data)
[   a
0  1
1  2
2  3,    a
0  3
1  4
2  5
3  6
4  7]

You can do this:
data = [i for i in data if len(i) > 0]
Output:
[   a
0  1
1  2
2  3,    a
0  3
1  4
2  5
3  6
4  7]
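For the list in the question, len(i) > 0 and not i.empty select the same frames, but the two tests are not identical in general. The sketch below uses two hypothetical frames (no_rows and no_cols) to show where they differ.

```python
import pandas as pd

no_rows = pd.DataFrame(columns=['a'])   # zero rows, one column
no_cols = pd.DataFrame(index=[0, 1])    # two rows, zero columns

# len() counts rows only; .empty is True when either axis has length 0
print(len(no_rows), no_rows.empty)  # 0 True
print(len(no_cols), no_cols.empty)  # 2 True
```

If a frame with an index but no columns should also count as empty, the .empty test is probably the safer choice.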

Related

Insert Row in Dataframe at certain place

I have the following Dataframe:
Now I want to insert an empty row after every time the column "Zweck" equals 7.
So for example the third row should be an empty row.
import numpy as np
import pandas as pd
df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5], 'f': [1, 7, 3, 4, 7]})
ren_dict = {i: df.columns[i] for i in range(len(df.columns))}
ind = df[df['f'] == 7].index
df = pd.DataFrame(np.insert(df.values, ind, values=[33], axis=0))
df.rename(columns=ren_dict, inplace=True)
ind_empt = df['a'] == 33
df[ind_empt] = ''
print(df)
Output
   a  b  f
0  1  1  1
1
2  2  2  7
3  3  3  3
4  4  4  4
5
6  5  5  7
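Here the dataframe is rebuilt rather than appended to, because repeated append operations would be resource intensive. np.insert first adds placeholder rows filled with the value 33, which is necessary because np.insert cannot put string values into the integer array. Note that np.insert places each new row before the given index, so the empty rows appear before the rows where f equals 7; passing ind + 1 instead would place them after. The columns are then renamed back to their original names with df.rename. Finally, the rows where df['a'] == 33 are found and set to empty strings.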
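A minimal sketch of a variant that puts the empty row after each match, as the question asks. It assumes the default RangeIndex and converts the values to object dtype so that empty strings can be inserted directly, without the 33 placeholder; the names pos and result are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4, 5], 'b': [1, 2, 3, 4, 5], 'f': [1, 7, 3, 4, 7]})

# Positions of the rows where f == 7
pos = np.flatnonzero(df['f'].to_numpy() == 7)

# np.insert places new rows before the given positions,
# so shift by one to land after each matching row.
values = np.insert(df.to_numpy().astype(object), pos + 1, '', axis=0)
result = pd.DataFrame(values, columns=df.columns)
print(result)
```

The resulting columns have object dtype, so this is mainly useful for display or export rather than further numeric work.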

What is the best way to write a dictionary of column connection with column2?

How can I create a dictionary that maps each value in one column to the list of connected values in another column?
data:
No_1 C_N
1 1
1 2
1 7
2 13
2 6
The desired output should look like {1:[1,2,7],2:[13,6]}
I have tried this, but it doesn't seem to work.
import pandas as pd
df = pd.read_csv('connection.csv' ,sep=',')
for i, j in zip(range(166), df['No_1']):
    if i == j:
        print(df['C_N'])
Use df.groupby('No_1')['C_N'].apply(list).to_dict().
This gives you {1: [1, 2, 7], 2: [13, 6]}.
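A self-contained version of that one-liner, using the sample data from the question built inline instead of reading connection.csv (the variable name connections is illustrative):

```python
import pandas as pd

# Sample data from the question, constructed inline
df = pd.DataFrame({'No_1': [1, 1, 1, 2, 2], 'C_N': [1, 2, 7, 13, 6]})

# Group the C_N values by No_1, collect each group into a list,
# then turn the resulting Series into a plain dictionary.
connections = df.groupby('No_1')['C_N'].apply(list).to_dict()
print(connections)  # {1: [1, 2, 7], 2: [13, 6]}
```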

Change all index of pandas series to one value

I'm trying to change all index values of a pandas Series to one value. I have 200k+ rows and the index is a number from 0 to 200k+. I want the index to be a single string, for example 'Token'. Is this possible with pandas? I've tried reindex, but that doesn't seem to work. I think it would only work if I gave it a 200k-element list of 'token' as the argument, which is not what I want to do.
Use insert and set_index, as in the example below:
import pandas as pd
df = pd.DataFrame({'B': [1, 2, 3], 'C': [4, 5, 6]})
df
Out:
   B  C
0  1  4
1  2  5
2  3  6
idx = 0
index_string = 'token'
df.insert(loc=idx, column='A', value=index_string)
df.set_index('A', inplace=True)
df
Out:
       B  C
A
token  1  4
token  2  5
token  3  6
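Since the question is about a Series rather than a DataFrame, here is a minimal sketch that assigns the repeated label to the index directly. The Series s is a small stand-in for the 200k-row one.

```python
import pandas as pd

s = pd.Series([10, 20, 30])  # stand-in for the real Series

# Build the repeated label from the Series length and assign it as the new index
s.index = ['Token'] * len(s)

print(s.index.tolist())  # ['Token', 'Token', 'Token']
```

The list of labels is still created, but it is derived from len(s), so nothing has to be written out by hand.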

Get the column names of a python numpy array

I have a csv data file with a header indicating the column names.
xy wz hi kq
0 10 5 6
1 2 4 7
2 5 2 6
I run:
X = np.array(pd.read_csv('gbk_X_1.csv').values)
I want to get the column names:
['xy', 'wz', 'hi', 'kq']
I read this post but the solution provides me with None.
Use the following code:
import re
with open('f.csv', 'r') as f:
    alllines = f.readlines()
columns = re.sub(' +', ' ', alllines[0])  # collapse repeated spaces into one
columns = columns.strip().split(' ')  # split on the single space
print(columns)
This assumes the CSV file is space-separated, like this:
xy wz hi kq
0 10 5 6
1 2 4 7
2 5 2 6

Let's assume your csv file looks like
xy,wz,hi,kq
0,10,5,6
1,2,4,7
2,5,2,6
Then use pd.read_csv to load the file into a dataframe
df = pd.read_csv('gbk_X_1.csv')
The dataframe now looks like
df
   xy  wz  hi  kq
0   0  10   5   6
1   1   2   4   7
2   2   5   2   6
Its three main components are the
data which you can access via the values attribute
df.values
array([[ 0, 10,  5,  6],
       [ 1,  2,  4,  7],
       [ 2,  5,  2,  6]])
index which you can access via the index attribute
df.index
RangeIndex(start=0, stop=3, step=1)
columns which you can access via the columns attribute
df.columns
Index(['xy', 'wz', 'hi', 'kq'], dtype='object')
If you want the columns as a list, use the tolist method
df.columns.tolist()
['xy', 'wz', 'hi', 'kq']
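Putting the pieces together, here is a minimal sketch that keeps the column names next to the NumPy array. It reads the sample data from an in-memory string instead of gbk_X_1.csv so that it runs on its own; csv_text and column_names are illustrative names.

```python
import io
import pandas as pd

# Sample data from the question, held in memory instead of a file
csv_text = "xy,wz,hi,kq\n0,10,5,6\n1,2,4,7\n2,5,2,6\n"

df = pd.read_csv(io.StringIO(csv_text))
column_names = df.columns.tolist()  # read the names before dropping to NumPy
X = df.to_numpy()                   # plain array, which carries no column names

print(column_names)  # ['xy', 'wz', 'hi', 'kq']
print(X.shape)       # (3, 4)
```

A plain NumPy array does not store column names, so they have to be taken from the dataframe before or alongside the conversion.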

Selecting rows from pandas DataFrame using a list

I have a list of lists as below
[[1, 2], [1, 3]]
The DataFrame is similar to
   A  B  C
0  1  2  4
1  0  1  2
2  1  3  0
I would like a DataFrame containing only the rows where the value in column A equals the first element of one of the nested lists and the value in column B of the same row equals the second element of that same nested list.
Thus the resulting DataFrame should be
   A  B  C
0  1  2  4
2  1  3  0
The code below does what you need:
import pandas
# Create your list and your dataframe
tmp_list = [[1, 2], [1, 3]]
tmp_df = pandas.DataFrame([[1, 2, 4], [0, 1, 2], [1, 3, 0]], columns=['A', 'B', 'C'])
# This function filters the dataframe column by column and keeps only
# the rows whose i-th column equals the i-th value in cond
def pass_true_df(df, cond):
    for i, c in enumerate(cond):
        df = df[df.iloc[:, i] == c]
    return df
# Go through your list and collect the rows you want to keep
tmp_filter = pandas.concat([pass_true_df(tmp_df, i) for i in tmp_list])  # The dataframe you want

import pandas
df = pandas.DataFrame([[1,2,4],[0,1,2],[1,3,0],[0,2,5],[1,4,0]],
columns = ['A','B','C'])
filt = pandas.DataFrame([[1, 2], [1, 3],[0,2]],
columns = ['A','B'])
accum = []
# group the data to filter by column A
data_g = df.groupby('A')
for k2, v2 in data_g:
    accum.append(v2[v2.B.isin(filt.B[filt.A == k2])])
print(pandas.concat(accum))
result:
   A  B  C
3  0  2  5
0  1  2  4
2  1  3  0
(I made the data and filter a little more complicated as a test.)
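Another possible approach, sketched here on the data from the question, is to treat each (A, B) pair as a single key and test membership with a MultiIndex. The names pairs, wanted and mask are illustrative, and this is one option rather than a replacement for the answers above.

```python
import pandas as pd

pairs = [[1, 2], [1, 3]]
df = pd.DataFrame([[1, 2, 4], [0, 1, 2], [1, 3, 0]], columns=['A', 'B', 'C'])

# MultiIndex.isin expects tuples, so convert the nested lists first
wanted = [tuple(p) for p in pairs]

# Build a MultiIndex from columns A and B and mark the rows whose pair is wanted
mask = pd.MultiIndex.from_frame(df[['A', 'B']]).isin(wanted)
result = df[mask]

print(result.index.tolist())  # [0, 2]
```

Because the rows are selected with a boolean mask, the original index labels and row order are preserved.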
