I have a dataframe that is 100,227 records long.
I would like to export this dataframe into 3 equally sized CSV files.
I tried the following, but I'm getting an error. Is there perhaps a simpler approach?
df_seen = pd.read_csv("data.csv")
df1 = df_seen.shape.iloc[:, :33409]
df2 = df_seen.shape.iloc[33410:, 66818:]
df3 = df_seen.shape.iloc[66819:, 100227]
df1.to_csv('data1.csv', index=False)
df2.to_csv('data2.csv',index = False)
df3.to_csv('data3.csv',index = False)
For example, the data looks similar to the sample below, where I would take the single column of 15 numbers and split it into 3 columns of 5 each:
Numbers
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
Desired output (note that each column represents a new CSV with a single column):
1 6 11
2 7 12
3 8 13
4 9 14
5 10 15
Try using numpy.array_split:
import numpy as np
df1, df2, df3 = np.array_split(df_seen, 3)
To save each DataFrame to a separate file, you could do:
for i, df in enumerate(np.array_split(df_seen, 3)):
    df.to_csv(f"data{i+1}.csv", index=False)
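If you would rather not go through NumPy, the same split can be sketched with plain iloc slicing. This is only an illustration: the 15-row frame stands in for df_seen, and n_parts, bounds and chunks are names made up for this example.

```python
import pandas as pd

# Stand-in for df_seen; the real frame would come from pd.read_csv("data.csv").
df_seen = pd.DataFrame({"Numbers": range(1, 16)})

n_parts = 3  # hypothetical name for the number of output files

# Row positions where each chunk starts and ends, e.g. [0, 5, 10, 15].
# Integer arithmetic keeps chunk sizes within one row of each other.
bounds = [len(df_seen) * i // n_parts for i in range(n_parts + 1)]

# Slice the rows (not the columns) between consecutive boundaries.
chunks = [df_seen.iloc[bounds[i]:bounds[i + 1]] for i in range(n_parts)]

for i, chunk in enumerate(chunks, start=1):
    print(f"chunk {i}: {len(chunk)} rows")
```

Each element of chunks is an ordinary DataFrame, so it can be written out with to_csv(..., index=False) in the same way as the loop above.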
I'm reading a CSV of data points for machine learning into a pandas dataframe. I'm trying to come up with a way to index the dataframe so that I get a given row plus the next N rows. I don't want to group the dataframe into bins with no overlap (e.g. index 0:4, 4:8, etc.). What I do want is overlapping windows like this: index 0:4, 1:5, 2:6, etc. How would this be done?
Maybe you can create a list of DataFrames, like:
import pandas as pd
import numpy as np
nrows = 7
group_size = 5
df = pd.DataFrame({'col1': np.random.randint(0, 10, nrows)})
print(df)
grp = [df.iloc[x:x + group_size] for x in range(df.shape[0] - group_size + 1)]
print(grp[1])
Original DataFrame:
col1
0 2
1 6
2 6
3 5
4 3
5 3
6 8
2nd DataFrame from the list of DataFrames:
col1
1 6
2 6
3 5
4 3
5 3
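If only the values are needed (not the row labels), a NumPy-based variant might also work. The sketch below assumes NumPy 1.20 or later, where sliding_window_view is available, and hard-codes the sample values printed above instead of random ones.

```python
import numpy as np
import pandas as pd
from numpy.lib.stride_tricks import sliding_window_view

group_size = 5
# Same values as the sample output above, fixed here for reproducibility.
df = pd.DataFrame({'col1': [2, 6, 6, 5, 3, 3, 8]})

# Each row of `windows` is one overlapping window of group_size values.
windows = sliding_window_view(df['col1'].to_numpy(), group_size)

print(windows.shape)  # (number of windows, group_size)
print(windows[1])     # values of rows 1..5
```

This returns a read-only view on the underlying array rather than a list of DataFrames, so it suits numeric feature extraction more than label-based indexing.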
I have two data frames here
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'id':[1,2,3,2,5], 'grade':[3,5,3,2,1]})
df2 = pd.DataFrame({'id':[1,2,3], 'final':[6,4,2]})
Now I want to take final column from df2 and add to df1 based on the id column. Here is the desired output
output = pd.DataFrame({'id':[1,2,3,2,5],'grade':[3,5,3,2,1], 'final':[6,4,2,4,np.nan]})
What approach can I try?
One way to do it is by using map
df1['final'] = df1['id'].map(df2.set_index('id')['final'])
#result
id grade final
0 1 3 6.0
1 2 5 4.0
2 3 3 2.0
3 2 2 4.0
4 5 1 NaN
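Another option is a left merge, sketched here with the same two frames from the question. The name merged is just for illustration.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({'id': [1, 2, 3, 2, 5], 'grade': [3, 5, 3, 2, 1]})
df2 = pd.DataFrame({'id': [1, 2, 3], 'final': [6, 4, 2]})

# how='left' keeps every row of df1 in its original order;
# ids that do not appear in df2 (here id 5) get NaN in 'final'.
merged = df1.merge(df2, on='id', how='left')
print(merged)
```

Unlike map, merge returns a new DataFrame and leaves df1 unchanged. If df2 contained repeated ids, the merge would produce one output row per match.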
I have 2 dataframes that look like this:
Index1 Games1
1 1
2 5
3 10
Index2 Games2
4 2
5 4
6 6
How can I combine them to make it like this:
Index Games
1 1
2 5
3 10
4 2
5 4
6 6
Thank you!
Try this:
import pandas as pd
import numpy
# Assuming your dataframes are named df1, and df2
new_frame = pd.DataFrame(numpy.vstack((df1.values, df2.values)))
print(new_frame)
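This method creates a new dataframe by stacking the underlying arrays with numpy.vstack.
vstack concatenates the arrays vertically, one after the other, preserving their row order. Because df1.values and df2.values are plain NumPy arrays, the new dataframe gets default integer column labels rather than the original column names.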
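A pandas-only alternative is pd.concat. The sketch below assumes that Index1/Games1 and Index2/Games2 are ordinary columns (not the index) and renames them to the shared names from the desired output before concatenating.

```python
import pandas as pd

# Assumed reconstruction of the two frames from the question.
df1 = pd.DataFrame({"Index1": [1, 2, 3], "Games1": [1, 5, 10]})
df2 = pd.DataFrame({"Index2": [4, 5, 6], "Games2": [2, 4, 6]})

# Give both frames the same column names so the rows line up.
left = df1.rename(columns={"Index1": "Index", "Games1": "Games"})
right = df2.rename(columns={"Index2": "Index", "Games2": "Games"})

# ignore_index=True renumbers the rows 0..n-1 in the result.
combined = pd.concat([left, right], ignore_index=True)
print(combined)
```

Without the renaming step, pd.concat would align on column names and produce four columns padded with NaN.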
I have a pandas DataFrame in python with one column A with numerical values:
A
11
12
13
12
14
I want to add a column that contains a counter that counts the number of elements per group up to that index in column A, like this:
A B
11 1
12 1
13 1
12 2
14 1
How do I create column B?
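One possible approach is groupby with cumcount, sketched here on the sample data from the question.

```python
import pandas as pd

df = pd.DataFrame({"A": [11, 12, 13, 12, 14]})

# cumcount numbers the rows within each group starting at 0,
# so add 1 to get a counter that starts at 1.
df["B"] = df.groupby("A").cumcount() + 1
print(df)
```

The counter follows the existing row order, so the second occurrence of 12 gets 2 while every first occurrence gets 1.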
I have a pandas data frame that looks something like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
I would like to add row 0 to the end of the data frame, to get a new data frame that looks like this:
A B C
0 1 2 3
1 4 5 6
2 7 8 9
3 1 2 3
What can I do in pandas to do this?
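You can try:
df = df.append(df.iloc[0], ignore_index=True)
On pandas 2.0 and later, where DataFrame.append has been removed, the equivalent is:
df = pd.concat([df, df.iloc[[0]]], ignore_index=True)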
If you are inserting data from a list at the top of the data frame, this might help:
import pandas as pd
df = pd.DataFrame( [ [1,2,3], [2,5,7], [7,8,9]], columns=['A', 'B', 'C'])
print(df)
df.loc[-1] = [1,2,3] # list you want to insert
df.index = df.index + 1 # shifting index
df = df.sort_index() # sorting by index
print(df)