Selecting a Range of Adjacent Columns for Dataframe - python

I am not understanding how to essentially say columns=[0:6, 12:15].
When I try this I get invalid syntax at the :
import pandas as pd
data = pd.read_excel(r'C:\Users\dusti\Desktop\bulk export.xlsx',
                     sheet_name=1,
                     header=None)
df = pd.DataFrame(data,
                  columns=[0, 1, 2, 3, 4, 5, 6, 12, 13, 14, 15])
df.to_csv(r'C:\Users\dusti\Desktop\bulk export1.csv',
          header=False,
          index=False)
print(df)

What you are trying is slicing, which is used to select a subset of a list.
You can use the range function to create the numbers and convert them to a list with the list function:
list(range(0, 6 + 1)) + list(range(12, 15 + 1))
# output:
[0, 1, 2, 3, 4, 5, 6, 12, 13, 14, 15]
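That list can then be passed straight to read_excel via its usecols parameter, so only those columns are loaded in the first place. A minimal sketch, reusing the file path and sheet from the question:

import pandas as pd

# columns 0-6 and 12-15, built with range() as shown above
cols = list(range(0, 6 + 1)) + list(range(12, 15 + 1))

df = pd.read_excel(r'C:\Users\dusti\Desktop\bulk export.xlsx',
                   sheet_name=1,
                   header=None,
                   usecols=cols)   # keep only the wanted column positions
df.to_csv(r'C:\Users\dusti\Desktop\bulk export1.csv',
          header=False,
          index=False)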

Related

transpose multiple data in pandas

I have raw data containing the number of stores spread over numerous pages, with no headers or columns.
Please see the sample below.
I want to transpose the data to this.
Can anyone help me figure out how to get the results I want?
import pandas as pd

# Creating the DataFrame
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [7, 2, 54, 3, None],
                   "C": [20, 16, 11, 3, 8],
                   "D": [14, 3, None, 2, 6]})
index_ = ['Row_1', 'Row_2', 'Row_3', 'Row_4', 'Row_5']
df.index = index_

# Print the DataFrame
print(df)

# Return the transpose
result = df.transpose()

# Print the result
print(result)
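As a side note, df.T is a shorthand property for df.transpose(), so the last two steps can also be written as:

result = df.T   # .T is equivalent to .transpose()
print(result)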

Can DataFrame use np.select after two DataFrames are combined?

I can use np.select to insert a new column and set its value for one DataFrame.
But when I combine both DataFrames, np.select no longer works; it seems to be an index error.
import pandas as pd
import numpy as np

df = pd.DataFrame([[3, 2, 1], [4, 5, 6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])
df2 = pd.DataFrame([[14, 15, 16], [17, 16, 15]], columns=['col1', 'col2', 'col3'], index=['c', 'e'])
count = df.append(df2)
print(count)

conditions = [
    (df["col1"] >= df["col2"]) & (df["col2"] >= df["col3"]),
]
choices = [100]
count["col4"] = np.select(conditions, choices, default='WHAT')
count
This succeeds for a single DataFrame.
After combining, it fails with the error:
ValueError: Length of values does not match length of index
I think there is a typo in your code when it comes to count vs df. The following code works just fine.
import pandas as pd
import numpy as np

df = pd.DataFrame([[3, 2, 1], [4, 5, 6]], columns=['col1', 'col2', 'col3'], index=['a', 'b'])
df2 = pd.DataFrame([[14, 15, 16], [17, 16, 15]], columns=['col1', 'col2', 'col3'], index=['c', 'e'])
count = df.append(df2)
print(count)

conditions = [
    (count["col1"] >= count["col2"]) & (count["col2"] >= count["col3"]),
]
print(conditions)
choices = [100]
count["col4"] = np.select(conditions, choices, default='WHAT')
count
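Note that DataFrame.append has since been deprecated and was removed in pandas 2.0, so on recent versions the combine step would be written with pd.concat instead; the rest of the answer stays the same:

count = pd.concat([df, df2])   # replacement for df.append(df2) on pandas >= 2.0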

python dataframe how to convert set column to list

I tried to convert a set column to a list in a Python dataframe, but failed. I'm not sure what the best way to do this is. Thanks.
Here is the example:
I tried to create a 'c' column that converts the 'b' set column to a list, but 'c' is still a set.
import pandas as pd

data = [{'a': [1, 2, 3], 'b': {11, 22, 33}}, {'a': [2, 3, 4], 'b': {111, 222}}]
tdf = pd.DataFrame(data)
tdf['c'] = list(tdf['b'])
tdf
a b c
0 [1, 2, 3] {33, 11, 22} {33, 11, 22}
1 [2, 3, 4] {222, 111} {222, 111}
You could do:
import pandas as pd
data = [{'a': [1,2,3], 'b':{11,22,33}},{'a':[2,3,4],'b':{111,222}}]
tdf = pd.DataFrame(data)
tdf['c'] = [list(e) for e in tdf.b]
print(tdf)
Use apply:
tdf['c'] = tdf['b'].apply(list)
Calling list directly operates on the whole column at once, not on each element one by one.
Or do:
tdf['c'] = tdf['b'].map(list)
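To check that the conversion actually happened, you can inspect the element types; a quick sketch using the tdf built above, after applying one of the conversions:

print(type(tdf.loc[0, 'b']))   # <class 'set'>
print(type(tdf.loc[0, 'c']))   # <class 'list'>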

Using pandas string selection after sort_index

I can't figure out why un-commenting ts = ts.sort_index() in the code below throws a KeyError:
import datetime
import pandas as pd

df = pd.DataFrame({
    'x': [2, 1, 3],
    'd': [
        datetime.datetime(2018, 5, 21),
        datetime.datetime(2018, 5, 20),
        datetime.datetime(2018, 5, 22)
    ]
})
ts = df.set_index('d')
#ts = ts.sort_index()
ts['2018-05-21']
My assumption is that sort_index somehow generates a new index and therefore breaks the string selection, but I can't find any evidence of it.
To provide some context, I want to sort this time series in order to select a time range (e.g., ts['2018-05-21':]). If I don't sort it, the selection works for the example above but not for a time range.
I would recommend using .loc
#ts = df.set_index('d')
#ts = ts.sort_index()
ts.loc['2018-05-21':,:]
Out[102]:
x
d
2018-05-21 2
2018-05-22 3
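For the original goal of selecting a time range, a minimal sketch (reusing the df from the question) is to sort the index first and then slice with .loc:

ts = df.set_index('d').sort_index()
print(ts.loc['2018-05-21':])   # partial-string slicing works on the sorted DatetimeIndex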

Drop Columns that start with any of a list of strings in Pandas

I'm trying to drop all columns from a df that start with any of a list of strings. I needed to copy these columns to their own dfs, and now want to drop them from a copy of the main df to make it easier to analyze.
df.columns = ["AAA1234", "AAA5678", "BBB1234", "BBB5678", "CCC123", "DDD123"...]
I ran some code that gave me dataframes with these columns:
aaa.columns = ["AAA1234", "AAA5678"]
bbb.columns = ["BBB1234", "BBB5678"]
I did get the final df that I wanted, but my code felt rather clunky:
droplist_cols = [aaa, bbb]
droplist = []
for x in droplist_cols:
    for col in x.columns:
        droplist.append(col)
df1 = df.drop(labels=droplist, axis=1)
Columns of final df:
df1.columns = ["CCC123", "DDD123"...]
Is there a better way to do this?
--Edit for sample data--
df = pd.DataFrame([[1, 2, 3, 4, 5], [1, 3, 4, 2, 1], [4, 6, 9, 8, 3], [1, 3, 4, 2, 1], [3, 2, 5, 7, 1]], columns=["AAA1234", "AAA5678", "BBB1234", "BBB5678", "CCC123"])
Desired result:
CCC123
0 5
1 1
2 3
3 1
4 1
IIUC (if I understand correctly).
Let's begin with a dataframe:
df = pd.DataFrame({"A": [0]})
Modify the dataframe to include your columns:
df2 = df.reindex(columns=["AAA1234", "AAA5678", "BBB1234", "BBB5678", "CCC123", "DDD123"], fill_value=0)
Drop all columns starting with A:
df3 = df2.loc[:, ~df2.columns.str.startswith('A')]
If you need to drop, say, A or B, I would do:
df3 = df2.loc[:, ~(df2.columns.str.startswith('A') | df2.columns.str.startswith('B'))]
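For an arbitrary list of prefixes, an alternative sketch is to filter the column names with plain str.startswith, which accepts a tuple of prefixes (reusing the sample df from the question's edit):

prefixes = ("AAA", "BBB")                                       # prefixes to drop
keep = [c for c in df.columns if not c.startswith(prefixes)]    # column names to keep
df1 = df[keep]
print(df1)                                                      # only CCC123 remains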
