Updating values in a particular column of a database using python - python

I have a column called id in a dataframe called _newdata that looks like this. Note that this is only part of the column's values, not the whole column.
1
1
1
2
2
2
2
2
4
4
4
4
4
5
5
5
5
7
7
7
7
7
8
8
8
8
10
10
10
What I want to do is renumber the 'id' values so that they form running numbers with no gaps. That is, I want the column to look like this:
1
1
1
2
2
2
2
2
3
3
3
3
3
4
4
4
4
5
5
5
5
5
6
6
6
6
7
7
7
I tried using this but it didn't seem to do anything to the file. Could someone tell me where I went wrong or suggest a method to do what I want it to do?
count = 1  # values start at 1
for i, row in _newdata.iterrows():
    if row['id'] == count or row['id'] == count + 1:
        pass
    else:
        count += 1
        row['id'] = count

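Your loop changes nothing because iterrows() generally yields a copy of each row, so assigning to row['id'] never reaches _newdata. You can use rank() with method='dense' instead (df here stands for your _newdata):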
df['id'] = df['id'].rank(method='dense').astype(int)
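A minimal sketch of the dense-rank approach on a shortened, hypothetical sample of the ids from the question. Dense ranking follows the sorted order of the values, so equal ids always receive the same new number regardless of where they appear in the column.

```python
import pandas as pd

# Hypothetical sample: a shortened version of the ids shown in the question.
df = pd.DataFrame({"id": [1, 1, 2, 2, 4, 4, 5, 7, 8, 10]})

# Dense ranking gives equal values the same rank and leaves no gaps
# between consecutive ranks. rank() returns floats, hence astype(int).
df["id"] = df["id"].rank(method="dense").astype(int)

print(df["id"].tolist())
```

The gaps at 3, 6 and 9 disappear while repeated ids stay grouped together.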

Related

How to create a column to store trailing high value in Pandas DataFrame?

Consider a DataFrame with only one column named values.
import pandas as pd
data_dict = {'values': [5, 4, 3, 8, 6, 1, 2, 9, 2, 10]}
df = pd.DataFrame(data_dict)
display(df)
The output will look something like:
values
0 5
1 4
2 3
3 8
4 6
5 1
6 2
7 9
8 2
9 10
I want to generate a new column that holds the trailing high, that is, the highest value seen so far in the values column.
Expected Output:
values trailing_high
0 5 5
1 4 5
2 3 5
3 8 8
4 6 8
5 1 8
6 2 8
7 9 9
8 2 9
9 10 10
Right now I am using a for loop over df.iterrows() and calculating the value at each row. Because of this, the code is very slow.
Can anyone share a vectorized approach to increase the speed?
Use .cummax:
df["trailing_high"] = df["values"].cummax()
print(df)
Output
values trailing_high
0 5 5
1 4 5
2 3 5
3 8 8
4 6 8
5 1 8
6 2 8
7 9 9
8 2 9
9 10 10
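As a rough check, the sketch below computes the running maximum twice on the question's data: once with an explicit Python loop (written here only for comparison, not taken from the question) and once with cummax(). The two results should agree, and only the second avoids row-by-row iteration.

```python
import pandas as pd

df = pd.DataFrame({"values": [5, 4, 3, 8, 6, 1, 2, 9, 2, 10]})

# Loop version, for comparison only: carry the highest value seen so far.
running_max = []
current = float("-inf")
for v in df["values"]:
    current = max(current, v)
    running_max.append(current)

# Vectorized version: cummax() returns the cumulative maximum of the column.
df["trailing_high"] = df["values"].cummax()

print(df["trailing_high"].tolist() == running_max)
```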

Create pandas.DataFrame from PostgreSQL and convert multiple rows and one column to one row and multiple columns

I have a PostgreSQL db that looks like this:
price
1
2
4
9
7
8
3
7
5
3
7
and I want it to look like this:
1 2 4 9 7 8 3 7 5 3 7
I'm reading it using pandas.read_sql().
Now I want to convert the DataFrame from 11 rows and one column to 1 row and 11 columns. From what I understand, I need to use the pandas.melt() function, but I don't understand how.
You don't need melt() here; transposing the DataFrame with .T is enough:
df.T
Out[7]:
0 1 2 3 4 5 6 7 8 9 10
price 1 2 4 9 7 8 3 7 5 3 7
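A small self-contained sketch of the transpose. The DataFrame is built by hand as a stand-in for the result of pandas.read_sql(), since the actual query and connection are not shown in the question.

```python
import pandas as pd

# Stand-in for the DataFrame returned by pandas.read_sql();
# the real query and connection are assumptions left out here.
df = pd.DataFrame({"price": [1, 2, 4, 9, 7, 8, 3, 7, 5, 3, 7]})

# .T swaps rows and columns: the single 'price' column becomes a single row,
# and the old row labels 0..10 become the column labels.
wide = df.T

print(wide.shape)  # (1, 11)
print(wide)
```

If the column labels 0 to 10 are not wanted, they can be renamed afterwards by assigning to wide.columns.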

How to plot/graph top modes through pandas in Python

I have a column in a CSV file that I would like to gather data on. It is full of integers, and I would like to bar-graph the top 5 "modes" (most frequently occurring numbers) within that column. Is there any way to do this?
Assuming you have the integers in a pandas Series s, s.value_counts().plot.bar() should do it.
This plots every distinct value, sorted from most to least frequent; insert .head(5) before .plot.bar() to keep only the five most frequent.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.value_counts.html
You can use .value_counts().head().plot(kind='bar'), for example:
df = pd.DataFrame({'a':[1,1,2,3,5,8,1,5,6,9,8,7,5,6,7],'b':[1,1,2,3,3,3,4,5,6,7,7,7,7,8,2]})
df
a b
0 1 1
1 1 1
2 2 2
3 3 3
4 5 3
5 8 3
6 1 4
7 5 5
8 6 6
9 9 7
10 8 7
11 7 7
12 5 7
13 6 8
14 7 2
df.b.value_counts().head() # count values of column 'b' and show only top 5 values
7 4
3 3
2 2
1 2
8 1
Name: b, dtype: int64
df.b.value_counts().head().plot(kind='bar') #create bar plot for top values
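The same idea wrapped in two small helpers, as a sketch. The function names top_modes and plot_top_modes are hypothetical, introduced only for illustration; the plotting helper assumes matplotlib is installed and is defined but not called here.

```python
import pandas as pd

def top_modes(series, n=5):
    """Return the n most frequent values with their counts, most frequent first."""
    # value_counts() sorts by count in descending order by default.
    return series.value_counts().head(n)

def plot_top_modes(series, n=5):
    """Bar-plot the n most frequent values (requires matplotlib)."""
    return top_modes(series, n).plot(kind="bar")

# Column 'b' from the example above.
df = pd.DataFrame({"b": [1, 1, 2, 3, 3, 3, 4, 5, 6, 7, 7, 7, 7, 8, 2]})
top = top_modes(df["b"])
print(top)
```

Calling plot_top_modes(df["b"]) would draw the chart. For a column read from a CSV file, pass the corresponding column of the DataFrame returned by pd.read_csv instead.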

Delete the other rows in a data frame except the values in a list in python

I have a data frame Input with many variables, and a list k whose values are names of variables in that data frame. I am trying to include only the values in the list and create a separate data frame.
k = [IN_15M, IN_9M, IN_6M]
Input:
ID OUT_3M OUT_6M OUT_9M OUT_15M IN_3M IN_6M IN_9M IN_15M
A 2 3 4 6 2 3 4 6
B 3 3 5 7 3 3 5 7
C 2 3 6 6 2 3 6 6
D 3 3 7 7 3 3 7 7
Output:
ID OUT_3M OUT_6M OUT_9M OUT_15M IN_3M
A 2 3 4 6 2
B 3 3 5 7 3
C 2 3 6 6 2
D 3 3 7 7 3
I have tried the following code and got an error. Can anyone help me solve this error?
Output = Input[K]
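Quote the column names so that they are strings, then use isin on the columns and negate the mask with ~ to drop the columns listed in k, which matches your expected output: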
k = ['IN_15M', 'IN_9M', 'IN_6M']
df.loc[:,~df.columns.isin(k)]
Out[122]:
ID OUT_3M OUT_6M OUT_9M OUT_15M IN_3M
0 A 2 3 4 6 2
1 B 3 3 5 7 3
2 C 2 3 6 6 2
3 D 3 3 7 7 3
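A self-contained sketch on the question's data showing both directions: dropping the columns named in k (with the isin mask, or equivalently with drop), and keeping only those columns. The variable names dropped, same and kept are introduced here for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "ID": ["A", "B", "C", "D"],
    "OUT_3M": [2, 3, 2, 3],
    "OUT_6M": [3, 3, 3, 3],
    "OUT_9M": [4, 5, 6, 7],
    "OUT_15M": [6, 7, 6, 7],
    "IN_3M": [2, 3, 2, 3],
    "IN_6M": [3, 3, 3, 3],
    "IN_9M": [4, 5, 6, 7],
    "IN_15M": [6, 7, 6, 7],
})

# Column names must be strings, so they need quotes.
k = ["IN_15M", "IN_9M", "IN_6M"]

# Drop the columns in k: boolean mask over the column labels, negated with ~.
dropped = df.loc[:, ~df.columns.isin(k)]

# Equivalent result using drop().
same = df.drop(columns=k)

# The opposite selection: keep only the columns in k.
kept = df[k]

print(dropped)
```

Note that Python names are case-sensitive, so a list defined as k cannot be referenced as K.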

Python append loop issue

I would like to append rows to a dataframe using a loop, but I can't figure out how not to overwrite the previously appended rows.
Example of starting dataframe
print df
quantity cost
0 1 30
1 1 5
2 2 10
3 4 8
4 5 2
My goal is
quantity cost
0 1 30
1 1 5
2 2 10
3 4 8
4 5 2
5 2 10
6 4 8
7 4 8
8 4 8
9 5 2
10 5 2
11 5 2
12 5 2
My current code is incorrect (only appending rows with quantity==5), but I can't figure out how to fix it.
for x in xrange(2,6):
    data = df['quantity'] == x
    data = df[data]
    df_new = df.append([data]*(x-1),ignore_index=True)
Any advice would be awesome, thank you!
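One possible fix, sketched under the assumption that each row should gain quantity - 1 extra copies, which is what the goal table shows. In the loop above, every iteration appends to the original df and rebinds df_new, so only the last iteration (quantity == 5) survives. The sketch avoids the loop by repeating index labels, and uses pd.concat because DataFrame.append was removed in pandas 2.0. The names extra and df_new's construction here are illustrative, not from the question.

```python
import pandas as pd

df = pd.DataFrame({"quantity": [1, 1, 2, 4, 5],
                   "cost": [30, 5, 10, 8, 2]})

# Number of extra copies per row: quantity - 1 (never negative).
n_extra = (df["quantity"] - 1).clip(lower=0).to_numpy()

# Repeat each index label that many times and select those rows.
extra = df.loc[df.index.repeat(n_extra)]

# Original rows first, then the extra copies, with a fresh 0..n-1 index.
df_new = pd.concat([df, extra], ignore_index=True)

print(df_new)
```

If the loop is preferred, collecting the pieces in a list and calling pd.concat once after the loop avoids the overwrite.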
