I'm using a pandas DataFrame to read a CSV that has data points for machine learning. I'm trying to find a way to index the DataFrame so that I get the row at a given index plus the next N rows. I don't want to group the DataFrame into non-overlapping bins (i.e. index 0:4, 4:8, etc.). What I do want is overlapping windows like this: index 0:4, 1:5, 2:6, etc. How would this be done?
Maybe you can create a list of DataFrames, like:
import pandas as pd
import numpy as np
nrows = 7
group_size = 5
df = pd.DataFrame({'col1': np.random.randint(0, 10, nrows)})
print(df)
grp = [df.iloc[x:x + group_size] for x in range(df.shape[0] - group_size + 1)]
print(grp[1])
Original DataFrame:
col1
0 2
1 6
2 6
3 5
4 3
5 3
6 8
2nd DataFrame from the list of DataFrames:
col1
1 6
2 6
3 5
4 3
5 3
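The same idea can be wrapped in a small helper. This is only a minimal sketch: the name sliding_windows and its size parameter are made up for illustration and are not part of pandas.

```python
import pandas as pd


def sliding_windows(df, size):
    """Return a list of overlapping row windows, each `size` rows long.

    Hypothetical helper (not a pandas API): window i covers rows i..i+size-1.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    # The last valid start position is len(df) - size; max() guards against
    # frames shorter than one window, which yield an empty list.
    n_windows = max(len(df) - size + 1, 0)
    return [df.iloc[start:start + size] for start in range(n_windows)]


df = pd.DataFrame({"col1": [2, 6, 6, 5, 3, 3, 8]})
windows = sliding_windows(df, 5)
print(len(windows))                 # 3
print(windows[1]["col1"].tolist())  # [6, 6, 5, 3, 3]
```

Each window is a slice of the original frame, so the original row labels are kept.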
I need to count how many different elements are in my DataFrame (df).
My df has the day of the month (as a number: 1, 2, 3 ... 31) on which a certain variable was measured. There are 3 columns that describe the number of the day. There are multiple measurements in one day, so my columns have repeated values. I need to know on how many days in a month that variable was measured, ignoring how many times a day the measurement was done. So I was thinking of counting the days while ignoring repeated values.
As an example the data of my df would look like this:
col1 col2 col3
2 2 2
2 2 3
3 3 3
3 4 8
I need an output that tells me that in that DataFrame the numbers are 2, 3, 4 and 8.
Thanks!
Just do:
df=pd.DataFrame({"col1": [2,2,3,3], "col2": [2,2,3,4], "col3": [2,3,3,8]})
df.stack().unique()
Outputs:
[2 3 4 8]
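If only the number of distinct days is needed, the same stacked Series can be counted directly. A minimal sketch, assuming the example data from the question (the variable name distinct_days is just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 2, 3, 3], "col2": [2, 2, 3, 4], "col3": [2, 3, 3, 8]})

# stack() flattens all columns into a single Series, so nunique()
# counts distinct values across the whole frame.
distinct_days = df.stack().nunique()
print(distinct_days)  # 4
```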
You can use the drop_duplicates method on your DataFrame (note that it removes duplicate rows, not duplicate values), like:
import pandas as pd
df = pd.DataFrame({'a':[2,2,3], 'b':[2,2,3], 'c':[2,2,3]})
a b c
0 2 2 2
1 2 2 2
2 3 3 3
df = df.drop_duplicates()
print(df['a'].count())
out: 2
Or you can use numpy to get the unique values in the dataframe:
import pandas as pd
import numpy as np
df = pd.DataFrame({'X' : [2, 2, 3, 3], 'Y' : [2,2,3,4], 'Z' : [2,3,3,8]})
df_unique = np.unique(np.array(df))
print(df_unique)
#Output [2 3 4 8]
#for the count of days:
print(len(df_unique))
#Output 4
How about:
Assuming this is your initial df:
col1 col2 col3
0 2 2 2
1 2 2 2
2 3 3 3
Then:
count_df = pd.DataFrame()
for i in df.columns:
    df2 = df[i].value_counts()
    count_df = pd.concat([count_df, df2], axis=1)
final_df = count_df.sum(axis=1)
final_df = pd.DataFrame(data=final_df, columns=['Occurrences'])
print(final_df)
Occurrences
2 6
3 3
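A loop-free variant of the same count is possible by stacking all columns first. This is a sketch under the assumption that every column holds the same kind of value (day numbers here); the name occurrences is illustrative.

```python
import pandas as pd

df = pd.DataFrame({"col1": [2, 2, 3], "col2": [2, 2, 3], "col3": [2, 2, 3]})

# Flatten every column into one Series, then count each value once
# across the whole frame.
occurrences = df.stack().value_counts().rename("Occurrences").to_frame()
print(occurrences)
```

The length of the result, len(occurrences), is then the number of distinct values.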
You can use pandas.unique() like so:
pd.unique(df.to_numpy().flatten())
In some basic benchmarking that I ran, this method appeared to be the fastest.
I have 2 dataframes that look like this:
Index1 Games1
1 1
2 5
3 10
Index2 Games2
4 2
5 4
6 6
How can I combine them to make it like this:
Index Games
1 1
2 5
3 10
4 2
5 4
6 6
Thank you!
Try this:
import pandas as pd
import numpy
# Assuming your dataframes are named df1, and df2
new_frame = pd.DataFrame(numpy.vstack((df1.values, df2.values)))
print(new_frame)
This creates a new DataFrame by stacking the two underlying arrays with numpy.vstack.
vstack concatenates the arrays vertically, in sequence, so the row order is preserved. Note that the original column names are not carried over to the new DataFrame.
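A pandas-only alternative is pd.concat. The sketch below assumes that Index and Games are ordinary columns (the question does not say whether Index is the frame's index) and rebuilds the two frames from the question for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"Index1": [1, 2, 3], "Games1": [1, 5, 10]})
df2 = pd.DataFrame({"Index2": [4, 5, 6], "Games2": [2, 4, 6]})

# Give both frames the same column names so concat lines them up
# instead of creating separate, NaN-filled columns.
columns = ["Index", "Games"]
combined = pd.concat(
    [df1.set_axis(columns, axis=1), df2.set_axis(columns, axis=1)],
    ignore_index=True,
)
print(combined)
```

Unlike the vstack approach, this keeps named columns in the result.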
I just need one column of my dataframe, but in the original order. When I extract it, it is sorted by the values, and I can't understand why. I tried different ways to pick out one column, but every time it was sorted by the values.
this is my code:
import pandas
data = pandas.read_csv('/data.csv', sep=';')
longti = data.iloc[:,4]
Your code should work for returning a single column. For example, to return the first column:
import pandas as pd
df = pd.DataFrame(dict(A=[1,2,3,4,5,6], B=['A','B','C','D','E','F']))
first = df.iloc[:, 0]  # keep df intact so other columns can still be selected
Out:
0 1
1 2
2 3
3 4
4 5
5 6
If you want to return the second column, you can use the following:
second = df.iloc[:, 1]
Out:
0 A
1 B
2 C
3 D
4 E
5 F
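Selecting with iloc does not reorder rows by itself, which can be checked on a small file. In this sketch the CSV text and its column names are hypothetical stand-ins for the file in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for '/data.csv'; the fifth column is unsorted.
csv_text = "a;b;c;d;lon\n1;2;3;4;9.5\n1;2;3;4;7.1\n1;2;3;4;8.3\n"
data = pd.read_csv(io.StringIO(csv_text), sep=";")

lon = data.iloc[:, 4]
print(lon.tolist())  # [9.5, 7.1, 8.3], same order as in the file
```

If the values still come out sorted, that suggests the ordering is already present in the file or is introduced by a later step, rather than by the column selection.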
I have a pandas data frame that looks something like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
And I would like to add row 0 to the end of the data frame and to get a new data frame that looks like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 1 2 3
What can I do in pandas to do this?
You can try the following (DataFrame.append was removed in pandas 2.0, so pd.concat is used instead):
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)
If you are inserting data from a list, this might help. Note that it places the new row at the top of the frame:
import pandas as pd
df = pd.DataFrame( [ [1,2,3], [2,5,7], [7,8,9]], columns=['A', 'B', 'C'])
print(df)
df.loc[-1] = [1,2,3] # list you want to insert
df.index = df.index + 1 # shifting index
df = df.sort_index() # sorting by index
print(df)
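To put a list at the end of the frame instead, as the question asks, label-based enlargement with loc is one option. A minimal sketch, assuming the frame has the default 0..n-1 integer index:

```python
import pandas as pd

df = pd.DataFrame([[1, 2, 3], [4, 5, 6], [7, 8, 9]], columns=["A", "B", "C"])

# With a default integer index, len(df) is the next unused label,
# so assigning to it adds a new last row.
df.loc[len(df)] = df.iloc[0].tolist()
print(df)
```

With a non-default index, len(df) may collide with an existing label and overwrite that row, so this only fits the simple case shown.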
Let's say I have a data frame with 4 rows, 3 columns. I'd like to stack the rows horizontally so that I get one row with 12 columns. How to do it and how to handle colliding column names?
You can achieve this by stacking the frame to produce a Series of all the values, converting that back to a DataFrame with to_frame, calling reset_index(drop=True) to discard the index levels, and then transposing with .T:
In [2]:
df = pd.DataFrame(np.random.randn(4,3), columns=list('abc'))
df
Out[2]:
a b c
0 -1.744219 -2.475923 1.794151
1 0.952148 -0.783606 0.784224
2 0.386506 -0.242355 -0.799157
3 -0.547648 -0.139976 -0.717316
In [3]:
df.stack().to_frame().reset_index(drop=True).T
Out[3]:
0 1 2 3 4 5 6 \
0 -1.744219 -2.475923 1.794151 0.952148 -0.783606 0.784224 0.386506
7 8 9 10 11
0 -0.242355 -0.799157 -0.547648 -0.139976 -0.717316
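The result above uses plain integers as column names. To keep the original names without collisions, one option is to combine each column name with its row label. This is a sketch; the "a_0" naming scheme is an assumption chosen for illustration, and integer data is used so the values are easy to follow.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=list("abc"))

stacked = df.stack()  # MultiIndex of (row label, column name)
# Combine column name and row label, e.g. "a_0", so every name is unique.
stacked.index = [f"{col}_{row}" for row, col in stacked.index]
wide = stacked.to_frame().T
print(wide.columns.tolist())
```

The values stay in row-major order, so a_0, b_0, c_0 come from the first row, followed by a_1, b_1, c_1, and so on.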