I have a portion of my dataframe here:
days = [1, 2, 3, 4, 5]
time = [2, 4, 2, 4, 2, 4, 2, 4, 2]
df1 = pd.DataFrame(days)
df2 = pd.Series(time)
df2 = df2.transpose()
df3 = df1*df2
df4 = df1.dot(df2.to_frame().T)
df4 =
0 1 2 3 4 5 6 7 8
0 2 4 2 4 2 4 2 4 2
1 4 8 4 8 4 8 4 8 4
2 6 12 6 12 6 12 6 12 6
3 8 16 8 16 8 16 8 16 8
4 10 20 10 20 10 20 10 20 10
I have a loop that creates a single-row dataframe which looks like:
df_new =
0 1 2 3 4 5 6 7 8
0 2 4 2 4 2 4 2 4 2
I need to be able to loop through and add this row to the end of the larger dataframe a handful of times so the end result looks like this:
df_final =
0 1 2 3 4 5 6 7 8
0 2 4 2 4 2 4 2 4 2
1 4 8 4 8 4 8 4 8 4
2 6 12 6 12 6 12 6 12 6
3 8 16 8 16 8 16 8 16 8
4 10 20 10 20 10 20 10 20 10
5 2 4 2 4 2 4 2 4 2
6 5 6 7 8 9 8 7 6 5
I have tried both append and concat to add the new dataframe to the existing one, but I receive errors either way: indexing errors or looping issues. I think I need either a better understanding of why the row cannot be added to the end of the dataframe, or an idea for a workaround. The real loop has 25 iterations (I only added two here, but the idea is the same): each iteration produces a new row as a single-row dataframe, and I need to add that row's data, without the column headers, to the end of the final dataframe. I am willing to update my question as soon as I get a better idea of how this can work; it does not seem like a difficult task, so I am sure I am asking the wrong thing.
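For reference, a minimal sketch of the concat route, using my own toy data rather than the real frames: pd.concat with ignore_index=True renumbers the rows, which avoids the index collisions that usually cause errors like the ones described.

```python
import pandas as pd

# small stand-in for the larger dataframe (values are illustrative)
df_final = pd.DataFrame([[2, 4, 2], [4, 8, 4]])

# two single-row dataframes, as produced by the loop
new_rows = [pd.DataFrame([[2, 4, 2]]), pd.DataFrame([[5, 6, 7]])]

# inside the loop, concatenate each single-row dataframe onto the end;
# ignore_index=True renumbers the rows so the indices do not collide
for df_new in new_rows:
    df_final = pd.concat([df_final, df_new], ignore_index=True)

print(df_final)
```

The column headers of the single-row frame are ignored here because both frames use the same default integer columns; if the new row had named columns, they would need to be dropped or renamed first.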
Consider a DataFrame with only one column named values.
data_dict = {'values': [5, 4, 3, 8, 6, 1, 2, 9, 2, 10]}
df = pd.DataFrame(data_dict)
display(df)
The output will look something like:
values
0 5
1 4
2 3
3 8
4 6
5 1
6 2
7 9
8 2
9 10
I want to generate a new column that holds the trailing high (running maximum) of the values column.
Expected Output:
values trailing_high
0 5 5
1 4 5
2 3 5
3 8 8
4 6 8
5 1 8
6 2 8
7 9 9
8 2 9
9 10 10
Right now I am using a for loop over df.iterrows() and calculating the value at each row. Because of this, the code is very slow.
Can anyone share the vectorization approach to increase the speed?
Use .cummax:
df["trailing_high"] = df["values"].cummax()
print(df)
Output
values trailing_high
0 5 5
1 4 5
2 3 5
3 8 8
4 6 8
5 1 8
6 2 8
7 9 9
8 2 9
9 10 10
So I have a column in a CSV file that I would like to gather data on. It is full of integers, and I would like to bar-graph the top 5 most-frequent ("mode") numbers within that column. Is there any way to do this?
Assuming you have a big list of integers in the form of a pandas Series s:
s.value_counts().head(5).plot.bar() should do it (value_counts sorts by frequency, so head(5) keeps the top five).
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
You can use .value_counts().head().plot(kind='bar').
For example:
df = pd.DataFrame({'a':[1,1,2,3,5,8,1,5,6,9,8,7,5,6,7],'b':[1,1,2,3,3,3,4,5,6,7,7,7,7,8,2]})
df
a b
0 1 1
1 1 1
2 2 2
3 3 3
4 5 3
5 8 3
6 1 4
7 5 5
8 6 6
9 9 7
10 8 7
11 7 7
12 5 7
13 6 8
14 7 2
df.b.value_counts().head() # count values of column 'b' and show only top 5 values
7 4
3 3
2 2
1 2
8 1
Name: b, dtype: int64
df.b.value_counts().head().plot(kind='bar') #create bar plot for top values
I would like to append rows to a dataframe using a loop, but I can't figure out how not to overwrite the previously appended rows.
Example of starting dataframe
print df
quantity cost
0 1 30
1 1 5
2 2 10
3 4 8
4 5 2
My goal is
quantity cost
0 1 30
1 1 5
2 2 10
3 4 8
4 5 2
5 2 10
6 4 8
7 4 8
8 4 8
9 5 2
10 5 2
11 5 2
12 5 2
My current code is incorrect (df_new ends up holding only the rows appended for quantity == 5, because it is overwritten on every iteration), but I can't figure out how to fix it.
for x in xrange(2, 6):
    data = df['quantity'] == x
    data = df[data]
    df_new = df.append([data] * (x - 1), ignore_index=True)
Any advice would be awesome, thank you!
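One way to avoid the overwrite (a sketch in Python 3 syntax; note DataFrame.append has since been deprecated in favour of pd.concat) is to collect the pieces in a list and concatenate once at the end, so each iteration builds on the previous ones instead of restarting from df:

```python
import pandas as pd

df = pd.DataFrame({'quantity': [1, 1, 2, 4, 5],
                   'cost':     [30, 5, 10, 8, 2]})

pieces = [df]  # start from the original rows
for x in range(2, 6):
    rows = df[df['quantity'] == x]
    # append x - 1 extra copies of the matching rows
    pieces.extend([rows] * (x - 1))

# a single concat at the end; ignore_index renumbers rows 0..n-1
df_new = pd.concat(pieces, ignore_index=True)
print(df_new)
```

The bug in the original loop is that df_new = df.append(...) always starts from the unmodified df, so each pass discards the rows added by the previous pass; accumulating in a list sidesteps that.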
I am trying to build an algorithm for finding the number of clusters. I need to assign random points from a data set as initial means.
I first tried the following code:
mu = random.sample(df, 10)
but it gave an "index out of range" error.
I then converted the dataframe into a numpy array and did
mu = random.sample(np.array(df).tolist(), 10)
but instead of giving 10 values to use as means, it gives me 10 arrays of values.
How can I get 10 values from the dataframe to initialise the means for 10 clusters?
Use numpy.random.choice
df.iloc[np.random.choice(np.arange(len(df)), 10, False)]
Or numpy.random.permutation
df.loc[np.random.permutation(df.index)[:10]]
a b c
11 2 9 9
1 7 7 0
16 5 1 8
15 0 8 2
17 1 5 4
19 5 0 9
10 7 7 0
8 4 4 3
6 6 2 4
14 7 6 2
I think you need DataFrame.sample:
mu = df.sample(10)
Sample:
np.random.seed(100)
df = pd.DataFrame(np.random.randint(10, size=(20,3)), columns=list('abc'))
print (df)
a b c
0 8 8 3
1 7 7 0
2 4 2 5
3 2 2 2
4 1 0 8
5 4 0 9
6 6 2 4
7 1 5 3
8 4 4 3
9 7 1 1
10 7 7 0
11 2 9 9
12 3 2 5
13 8 1 0
14 7 6 2
15 0 8 2
16 5 1 8
17 1 5 4
18 2 8 3
19 5 0 9
mu = df.sample(10)
print (mu)
a b c
11 2 9 9
1 7 7 0
8 4 4 3
5 4 0 9
2 4 2 5
19 5 0 9
13 8 1 0
14 7 6 2
0 8 8 3
9 7 1 1
I have a dataframe that looks like this:
test_data = pd.DataFrame(np.array([np.arange(10)]*3).T, columns =['issuer_id','winner_id','gov'])
issuer_id winner_id gov
0 0 0 0
1 1 1 1
2 2 2 2
3 3 3 3
4 4 4 4
5 5 5 5
6 6 6 6
7 7 7 7
8 8 8 8
9 9 9 9
and a list of two-tuples consisting of a dataframe and a label encoding 'gov' (perhaps a label:dataframe dict would be better). In test_out below the two labels are 2 and 7.
test_out = [(pd.DataFrame(np.array([np.arange(10)]*2).T, columns =['id','partition']),2),(pd.DataFrame(np.array([np.arange(10)]*2).T, columns =['id','partition']),7)]
[( id partition
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9, 2), ( id partition
0 0 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9, 7)]
I want to add two columns to the test_data dataframe: issuer_partition and winner_partition
test_data['issuer_partition']=''
test_data['winner_partition']=''
and I would like to fill in these values from the test_out list, where the entry in the gov column determines which labeled dataframe in test_out to draw from. I then look up winner_id and issuer_id in that id/partition dataframe and write the corresponding partitions into test_data.
Put another way: I have a list of labeled dataframes that I would like to loop through to conditionally fill in data in a primary dataframe.
Is there a clever way to use merge in this scenario?
*edit - added another sentence and fixed test_out code
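One way merge can work here (a sketch, not the only approach): stack the labeled frames into a single lookup table with an explicit gov column, then left-merge once per id column. The lookup name and the rename calls are my own choices, not from the question.

```python
import numpy as np
import pandas as pd

test_data = pd.DataFrame(np.array([np.arange(10)] * 3).T,
                         columns=['issuer_id', 'winner_id', 'gov'])
test_out = [(pd.DataFrame(np.array([np.arange(10)] * 2).T,
                          columns=['id', 'partition']), 2),
            (pd.DataFrame(np.array([np.arange(10)] * 2).T,
                          columns=['id', 'partition']), 7)]

# flatten the list of (dataframe, label) pairs into one lookup table,
# tagging each block of rows with its gov label
lookup = pd.concat([d.assign(gov=label) for d, label in test_out],
                   ignore_index=True)

# left-merge twice: once keyed on issuer_id, once on winner_id;
# rows whose gov label has no frame in test_out come back as NaN
out = (test_data
       .merge(lookup.rename(columns={'id': 'issuer_id',
                                     'partition': 'issuer_partition'}),
              on=['gov', 'issuer_id'], how='left')
       .merge(lookup.rename(columns={'id': 'winner_id',
                                     'partition': 'winner_partition'}),
              on=['gov', 'winner_id'], how='left'))
```

With how='left', test_data keeps all ten rows; only the rows whose gov value is one of the labels (2 and 7 here) get partitions filled in, which matches the "conditionally fill" requirement without any explicit loop.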