I created these lists to store the values I generate in a for loop, which is shown a bit further down.
neutralScore = []
lightPosScore = []
middlePosScore = []
heavyPosScore = []
lightNegScore = []
middleNegScore = []
heavyNegScore = []
Here is the loop:
convertStringToInt = get_sent_score_comp_df.apply(lambda x: float(x))
for score in convertStringToInt:
    if score <= 0.09 and score >= -0.09:
        neutralScore.append(score)
    elif score >= 0.091 and score <= 0.49:
        lightPosScore.append(score)
I want to save my sentiment scores into those lists, then join them so I can convert them into a DataFrame and store them in a MySQL DB.
Is there an elegant way to do this?
scores = pandas.DataFrame(data=neutralScore).append(lightPosScore, middlePosScore, heavyPosScore).append(
lightNegScore,
middleNegScore,
heavyNegScore).columns = ['heavyPosScore', 'middlePosScore', 'lightPosScore', 'neutralScore', 'lightNegScore', 'middleNegScore', 'heavyNegScore']
I know that declaring the list of columns needs to be done separately, but for now the code looks like that.
So far I tried this and it doesn't work since it returns:
Can only merge Series or DataFrame objects, a <class 'list'> was passed
which is understandable, but I can't think of a way to solve the problem right now.
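For reference, one way to sidestep that error entirely is to build the frame from the lists in a single step; since the lists will usually have different lengths, wrapping each in a pd.Series pads the shorter columns with NaN. A minimal sketch using the list names above:

import pandas as pd

# One column per list; pd.Series pads shorter columns with NaN.
scoreLists = {'heavyPosScore': heavyPosScore, 'middlePosScore': middlePosScore,
              'lightPosScore': lightPosScore, 'neutralScore': neutralScore,
              'lightNegScore': lightNegScore, 'middleNegScore': middleNegScore,
              'heavyNegScore': heavyNegScore}
scores = pd.DataFrame({name: pd.Series(vals) for name, vals in scoreLists.items()})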
It is not entirely clear from the question what you want your final dataframe to look like, but I would do something like:
>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.normal(0, 1, size=10)
>>> df = pd.DataFrame(data, columns=['score'])
>>> df
| | score |
|---:|----------:|
| 0 | -0.440304 |
| 1 | -0.597293 |
| 2 | -1.80229 |
| 3 | -1.65654 |
| 4 | -1.14571 |
| 5 | -0.760086 |
| 6 | 0.244437 |
| 7 | 0.828856 |
| 8 | -0.136325 |
| 9 | -0.325836 |
>>> df['neutralScore'] = (df.score <= 0.09) & (df.score >= -0.09)
>>> df['lightPosScore'] = (df.score >= 0.091) & (df.score <= 0.49)
And similarly for the columns 'heavyPosScore', 'middlePosScore', 'lightNegScore', 'middleNegScore', and 'heavyNegScore'.
>>> df
| | score | neutralScore | lightPosScore |
|---:|-----------:|:---------------|:----------------|
| 0 | -0.475571 | False | False |
| 1 | 0.109076 | False | True |
| 2 | 0.809947 | False | False |
| 3 | 0.595088 | False | False |
| 4 | 0.832727 | False | False |
| 5 | -1.30049 | False | False |
| 6 | 0.245578 | False | True |
| 7 | 0.0998278 | False | True |
| 8 | 0.20592 | False | True |
| 9 | 0.372493 | False | True |
You can then easily filter on the type of score like this:
>>> df[df.lightPosScore]
| | score | neutralScore | lightPosScore |
|---:|---------:|:---------------|:----------------|
| 0 | 0.415629 | False | True |
| 2 | 0.104852 | False | True |
| 4 | 0.39739 | False | True |
Edit
To have one rating column, first define a function to assign your ratings and apply it to your score column:
>>> def get_rating(score):
...     if score <= 0.09 and score >= -0.09:
...         return 'neutralScore'
...     elif score >= 0.091 and score <= 0.49:
...         return 'lightPosScore'
...     else:
...         return 'to be implemented'
>>> df['rating'] = df['score'].apply(get_rating)
>>> df
| | score | rating |
|---:|-----------:|:------------------|
| 0 | -0.190816 | to be implemented |
| 1 | 0.495197 | to be implemented |
| 2 | -1.20576 | to be implemented |
| 3 | -0.711516 | to be implemented |
| 4 | -0.0606396 | neutralScore |
| 5 | 0.0452575 | neutralScore |
| 6 | 0.154754 | lightPosScore |
| 7 | -0.506285 | to be implemented |
| 8 | -0.896066 | to be implemented |
| 9 | 0.523198 | to be implemented |
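As an aside, once all the bands are pinned down, the whole rating column can be assigned in one vectorized call with pd.cut. The inner bin edges below simply extend the 0.09/0.49 cut-offs symmetrically and are an assumption, so adjust them to your real thresholds:

import numpy as np
import pandas as pd

# Assumed, symmetric bin edges; replace with the real cut-offs.
bins = [-np.inf, -0.79, -0.49, -0.09, 0.09, 0.49, 0.79, np.inf]
labels = ['heavyNegScore', 'middleNegScore', 'lightNegScore', 'neutralScore',
          'lightPosScore', 'middlePosScore', 'heavyPosScore']
df['rating'] = pd.cut(df['score'], bins=bins, labels=labels)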
Related
I have the following pandas dataframe, where the column id is the dataframe index
+----+-----------+------------+-----------+------------+
| | price_A | amount_A | price_B | amount_b |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
And I want to convert this dataframe into a multi-column dataframe that looks like this:
+----+-----------+------------+-----------+------------+
| | A | B |
+----+-----------+------------+-----------+------------+
| id | price | amount | price | amount |
|----+-----------+------------+-----------+------------|
| 0 | 0.652826 | 0.941421 | 0.823048 | 0.728427 |
| 1 | 0.400078 | 0.600585 | 0.194912 | 0.269842 |
| 2 | 0.223524 | 0.146675 | 0.375459 | 0.177165 |
| 3 | 0.330626 | 0.214981 | 0.389855 | 0.541666 |
| 4 | 0.578132 | 0.30478 | 0.789573 | 0.268851 |
| 5 | 0.0943601 | 0.514878 | 0.419333 | 0.0170096 |
| 6 | 0.279122 | 0.401132 | 0.722363 | 0.337094 |
| 7 | 0.444977 | 0.333254 | 0.643878 | 0.371528 |
| 8 | 0.724673 | 0.0632807 | 0.345225 | 0.935403 |
| 9 | 0.905482 | 0.8465 | 0.585653 | 0.364495 |
+----+-----------+------------+-----------+------------+
I've tried transforming my old pandas dataframe into a dict this way:
dict = {"A": df[["price_a","amount_a"]], "B":df[["price_b", "amount_b"]]}
df = pd.DataFrame(dict, index=df.index)
But I had no success. How can I do that?
Try renaming columns manually:
df.columns = pd.MultiIndex.from_tuples([x.split('_')[::-1] for x in df.columns])
df.index.name='id'
Output:
A B b
price amount price amount
id
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
You can split the column names on the underscore and convert to a tuple. Once you map each split column name to a tuple, pandas will convert the Index to a MultiIndex for you. From there we just need to call swaplevel to get the letter level to come first and reassign to the dataframe.
Note: in my input dataframe I replaced the column name "amount_b" with "amount_B"; since that lined up with your expected output, I assumed it was a typo.
df.columns = df.columns.str.split("_", expand=True).swaplevel()
print(df)
A B
price amount price amount
0 0.652826 0.941421 0.823048 0.728427
1 0.400078 0.600585 0.194912 0.269842
2 0.223524 0.146675 0.375459 0.177165
3 0.330626 0.214981 0.389855 0.541666
4 0.578132 0.304780 0.789573 0.268851
5 0.094360 0.514878 0.419333 0.017010
6 0.279122 0.401132 0.722363 0.337094
7 0.444977 0.333254 0.643878 0.371528
8 0.724673 0.063281 0.345225 0.935403
9 0.905482 0.846500 0.585653 0.364495
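Either way, once the columns are a MultiIndex, selecting by level works as you would expect; a small usage sketch:

df['A']                           # the price/amount sub-frame for A
df[('A', 'price')]                # a single column, as a Series
df.xs('price', axis=1, level=1)   # the 'price' column for both A and B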
I asked a question at this link: SUMIFS in python jupyter.
However, I just realized that the solution didn't work because they can switch in and switch out on different dates. So basically they have to switch out first before they can switch in.
Here is the dataframe (sorted based on the date):
+---------------+--------+---------+-----------+--------+
| Switch In/Out | Client | Quality | Date | Amount |
+---------------+--------+---------+-----------+--------+
| Out | 1 | B | 15-Aug-19 | 360 |
| In | 1 | A | 16-Aug-19 | 180 |
| In | 1 | B | 17-Aug-19 | 180 |
| Out | 1 | A | 18-Aug-19 | 140 |
| In | 1 | B | 18-Aug-19 | 80 |
| In | 1 | A | 19-Aug-19 | 60 |
| Out | 2 | B | 14-Aug-19 | 45 |
| Out | 2 | C | 15-Aug-20 | 85 |
| In | 2 | C | 15-Aug-20 | 130 |
| Out | 2 | A | 20-Aug-19 | 100 |
| In | 2 | A | 22-Aug-19 | 30 |
| In | 2 | B | 23-Aug-19 | 30 |
| In | 2 | C | 23-Aug-19 | 40 |
+---------------+--------+---------+-----------+--------+
I would then create a new column and divide them into different transactions.
+---------------+--------+---------+-----------+--------+------+
| Switch In/Out | Client | Quality | Date | Amount | Rows |
+---------------+--------+---------+-----------+--------+------+
| Out | 1 | B | 15-Aug-19 | 360 | 1 |
| In | 1 | A | 16-Aug-19 | 180 | 1 |
| In | 1 | B | 17-Aug-19 | 180 | 1 |
| Out | 1 | A | 18-Aug-19 | 140 | 2 |
| In | 1 | B | 18-Aug-19 | 80 | 2 |
| In | 1 | A | 19-Aug-19 | 60 | 2 |
| Out | 2 | B | 14-Aug-19 | 45 | 3 |
| Out | 2 | C | 15-Aug-20 | 85 | 3 |
| In | 2 | C | 15-Aug-20 | 130 | 3 |
| Out | 2 | A | 20-Aug-19 | 100 | 4 |
| In | 2 | A | 22-Aug-19 | 30 | 4 |
| In | 2 | B | 23-Aug-19 | 30 | 4 |
| In | 2 | C | 23-Aug-19 | 40 | 4 |
+---------------+--------+---------+-----------+--------+------+
With this, I can apply the pivot formula and take it from there.
However, how do I do this in Python? In Excel, I can just use multiple SUMIFS and compare in and out, but I don't see how to do the same in Python.
Thank you!
One simple solution is to iterate and apply a check (a function) over each element, with the result becoming a new column: in other words, map.
Using df.index.map we get the index of each item to pass as an argument, so we can look up and compare values. In your case, the aim is to identify each change to "Out" while keeping a counter.
import pandas as pd

switchInOut = ["Out", "In", "In", "Out", "In", "In",
               "Out", "Out", "In", "Out", "In", "In", "In"]
df = pd.DataFrame(switchInOut, columns=['Switch In/Out'])

counter = 1

def changeToOut(i):
    global counter
    # A new transaction starts whenever an "Out" follows an "In".
    if df["Switch In/Out"].get(i) == "Out" and df["Switch In/Out"].get(i - 1) == "In":
        counter += 1
    return counter

rows = df.index.map(changeToOut)
df["Rows"] = rows
df
Result:
+----+-----------------+--------+
| | Switch In/Out | Rows |
|----+-----------------+--------|
| 0 | Out | 1 |
| 1 | In | 1 |
| 2 | In | 1 |
| 3 | Out | 2 |
| 4 | In | 2 |
| 5 | In | 2 |
| 6 | Out | 3 |
| 7 | Out | 3 |
| 8 | In | 3 |
| 9 | Out | 4 |
| 10 | In | 4 |
| 11 | In | 4 |
| 12 | In | 4 |
+----+-----------------+--------+
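As an editorial aside, the same grouping can be computed without a global counter, using a vectorized shift/cumsum; a sketch equivalent to the loop above:

s = df["Switch In/Out"]
# A new transaction starts at every "Out" that follows an "In".
df["Rows"] = (s.eq("Out") & s.shift().eq("In")).cumsum() + 1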
I'm very new to using Pandas DataFrames and was seeking help on creating a new column which is the absolute value of the minimum of zero and an existing column, something like...
df["B"] = abs(min(0, df["A"]))
...in other words, if A is greater than 0 then B equals 0,
otherwise B equals -A
Use pandas.DataFrame.apply
df['B'] = df.apply(lambda x: 0 if x['A'] > 0 else -x['A'], axis=1)
I guess this is what your data looks like:
+---+-----+
| | A |
+---+-----+
| 0 | -10 |
| 1 | 20 |
| 2 | 50 |
| 3 | -25 |
| 4 | 8 |
+---+-----+
import numpy as np
df['B'] = np.where(df['A'] > 0, 0,-df['A'])
This will give the following result
+---+-----+----+
| | A | B |
+---+-----+----+
| 0 | -10 | 10 |
| 1 | 20 | 0 |
| 2 | 50 | 0 |
| 3 | -25 | 25 |
| 4 | 8 | 0 |
+---+-----+----+
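For what it's worth, the same result can also be had without np.where by clipping the negated column; a one-line sketch using standard pandas:

df['B'] = (-df['A']).clip(lower=0)   # -A when A < 0, otherwise 0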
I have a table something like the one below:
esn_missing_in_DF_umts
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| cell_name | n_cell_name | source_vendor | target_vendor | source_rnc | target_rnc |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 1 | 8 | x | y | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 2 | 5 | x | x | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 3 | 6 | x | x | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 4 | 9 | x | y | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 5 | 10 | x | y | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 6 | 11 | x | y | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 7 | 12 | x | y | | |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
Now two columns, source_rnc and target_rnc, are empty in SQL Server (and in the dataframe).
Here are the two other tables I want to update those columns from:
esn_umts_intra_sho
|---------------------|------------------|------------------|
| ucell | urelation | ucell_rnc |
|---------------------|------------------|------------------|
| 13 | 5 | abc567 |
|---------------------|------------------|------------------|
| 8 | 6 | abc568 |
|---------------------|------------------|------------------|
| 14 | 8 | abc569 |
|---------------------|------------------|------------------|
| 7 | 9 | abc570 |
|---------------------|------------------|------------------|
| 16 | 10 | abc571 |
|---------------------|------------------|------------------|
| 5 | 11 | abc572 |
|---------------------|------------------|------------------|
| 17 | 12 | abc573 |
|---------------------|------------------|------------------|
| 10 | 9 | abc574 |
|---------------------|------------------|------------------|
| 9 | 17 | abc575 |
|---------------------|------------------|------------------|
| 12 | 11 | abc576 |
|---------------------|------------------|------------------|
| 11 | 12 | abc577 |
|---------------------|------------------|------------------|
df_umts_carrier
|---------------------|------------------|
| cell_name_umts | rnc |
|---------------------|------------------|
| 1 | xyz123 |
|---------------------|------------------|
| 2 | xyz124 |
|---------------------|------------------|
| 3 | xyz125 |
|---------------------|------------------|
| 4 | xyz126 |
|---------------------|------------------|
| 5 | xyz127 |
|---------------------|------------------|
| 6 | xyz128 |
|---------------------|------------------|
| 7 | xyz129 |
|---------------------|------------------|
So now I want to update source_rnc and target_rnc from those two tables, esn_umts_intra_sho and df_umts_carrier.
I imagine the query could be something like this:
UPDATE [toolDB].[dbo].[esn_missing_in_DF_umts]
SET [toolDB].[dbo].[esn_missing_in_DF_umts].[target_rnc] = CASE WHEN [toolDB].[dbo].[esn_missing_in_DF_umts].[target_vendor] = 'HUA' THEN [toolDB].[dbo].[df_umts_carrier].[rnc]
FROM [toolDB].[dbo].[esn_missing_in_DF_umts]
INNER JOIN [toolDB].[dbo].[df_umts_carrier]
ON [n_cell_name] = [cell_name_umts]
ELSE
UPDATE [toolDB].[dbo].[esn_missing_in_DF_umts]
SET [toolDB].[dbo].[esn_missing_in_DF_umts].[target_rnc] = [toolDB].[dbo].[esn_umts_intra_sho].[ucell_rnc]
From [toolDB].[dbo].[esn_missing_in_DF_umts] INNER JOIN [toolDB].[dbo].[esn_umts_intra_sho]
ON [n_cell_name] = [ucell]
I want the final output to be something like this:
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| cell_name | n_cell_name | source_vendor | target_vendor | source_rnc | target_rnc |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 1 | 8 | x | y | xyz123 | abc568 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 2 | 5 | x | x | xyz124 | xyz127 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 3 | 6 | x | x | xyz125 | xyz128 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 4 | 9 | x | y | xyz126 | abc575 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 5 | 10 | x | y | xyz127 | abc574 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 6 | 11 | x | y | xyz128 | abc576 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
| 7 | 12 | x | y | xyz129 | abc577 |
|---------------------|------------------|---------------------|------------------|------------------|------------------|
I even tried with pandas, but it doesn't work. I hope someone can help me.
The best approach is to write the query as if you were writing a SELECT statement with the CASE clause in it. Once it works as expected, you can amend it into your UPDATE.
So in this example: if the main table's column equals 'bla', the data comes from the first joined table, otherwise from the other one.
Quick amendment: make sure it really is all rows you are happy to update; otherwise remember to put in a WHERE clause. That's why it's best to work out your logic in a SELECT first and move on from there.
I think you want something like this:
UPDATE UMT
SET UMT.target_rnc = CASE WHEN UMT.target_vendor = 'HUA' THEN carrier.rnc ELSE SHO.ucell_rnc END
FROM [toolDB].[dbo].[esn_missing_in_DF_umts] UMT
LEFT JOIN [toolDB].[dbo].[df_umts_carrier] carrier ON UMT.n_cell_name = carrier.cell_name_umts
LEFT JOIN [toolDB].[dbo].[esn_umts_intra_sho] SHO ON UMT.n_cell_name = SHO.ucell
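Since the question mentions trying pandas as well, here is a hedged pandas sketch of the same CASE logic, assuming the three tables are already loaded as DataFrames named after themselves; the lookups mirror the ON clauses above, and duplicate keys are dropped defensively:

# Assumes the tables are loaded as same-named DataFrames.
# Lookup Series: cell name -> rnc.
carrier_map = df_umts_carrier.drop_duplicates('cell_name_umts').set_index('cell_name_umts')['rnc']
sho_map = esn_umts_intra_sho.drop_duplicates('ucell').set_index('ucell')['ucell_rnc']

df = esn_missing_in_DF_umts
df['source_rnc'] = df['cell_name'].map(carrier_map)
# CASE WHEN target_vendor = 'HUA' THEN carrier.rnc ELSE SHO.ucell_rnc END
df['target_rnc'] = df['n_cell_name'].map(carrier_map).where(
    df['target_vendor'].eq('HUA'),
    df['n_cell_name'].map(sho_map),
)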
I have the following sheetinfo model with the following data:
| Trav | Group | Subgroup | Sheet_num | T_val |
| SAT123A01 | SAT123 | A | 1 | 400 |
| SAT123A02 | SAT123 | A | 2 | 0 |
| SAT123A03 | SAT123 | A | 3 | 0 |
| SAT123A04 | SAT123 | A | 4 | 0 |
| SAT123A05 | SAT123 | A | 5 | 500 |
| SAT123B05 | SAT123 | B | 5 | 400 |
| SAT123B04 | SAT123 | B | 4 | 0 |
| SAT123B03 | SAT123 | B | 3 | 0 |
| SAT123B02 | SAT123 | B | 2 | 500 |
| SAT124A01 | SAT124 | A | 1 | 400 |
| SAT124A02 | SAT124 | A | 2 | 0 |
| SAT124A03 | SAT124 | A | 3 | 0 |
| SAT124A04 | SAT124 | A | 4 | 475 |
I would like to interpolate and update the table with the correct T_val.
The formula is:
new_t_val = delta / (cnt - 1) * (sheet_num - 1) + min_tvc_of_subgroup
where delta is the subgroup's maximum T_val minus its minimum and cnt is the number of sheets in the subgroup. For SAT123/A: delta = 500 - 400 = 100 and cnt = 5, so sheet 3 gets 100 / 4 * 2 + 400 = 450.
For instance, the top 5 rows:
| Trav | Group | Subgroup | Sheet_num | T_val |
| SAT123A01 | SAT123 | A | 1 | 400 |
| SAT123A02 | SAT123 | A | 2 | 425 |
| SAT123A03 | SAT123 | A | 3 | 450 |
| SAT123A04 | SAT123 | A | 4 | 475 |
| SAT123A05 | SAT123 | A | 5 | 500 |
I have a Django query that updates the data, but it is SLOW and stops after a while (due to type errors, etc.).
My question: is there a way to accomplish this in SQL?
The ability to do this as one database call doesn't exist in stock Django (at least before Django 2.2, which added a built-in QuerySet.bulk_update). Third-party packages exist though: https://github.com/aykut/django-bulk-update
Example of how that package works:
rows = Model.objects.all()
for row in rows:
    # Modify rows as appropriate.
    row.T_val = delta / (cnt - 1) * (row.sheet_num - 1) + min_tvc_of_subgroup
Model.objects.bulk_update(rows)
For datasets up to the 1,000,000 range, this should have reasonable performance. Most of the bottleneck in iterating through and .save()-ing each object is the overhead of a database call per row; the Python part is reasonably fast. The example above issues only two database calls, so it should be perhaps an order of magnitude or two faster.
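As a hedged sketch of how delta, cnt, and the subgroup minimum might be computed before that loop — assuming a model named Sheetinfo with fields group, subgroup, sheet_num, and t_val, that a t_val of 0 means "still to be filled", and Django >= 2.2 for the built-in bulk_update:

from django.db.models import Count, Max, Min, Q

# Model and field names here are assumed from the table above.
# Per-(group, subgroup) stats; ignore the 0 placeholders when taking
# min/max, but count every sheet in the subgroup.
stats = {
    (s['group'], s['subgroup']): s
    for s in Sheetinfo.objects.values('group', 'subgroup').annotate(
        cnt=Count('pk'),
        min_t=Min('t_val', filter=Q(t_val__gt=0)),
        max_t=Max('t_val', filter=Q(t_val__gt=0)),
    )
}

rows = list(Sheetinfo.objects.all())
for row in rows:
    s = stats[(row.group, row.subgroup)]
    delta = s['max_t'] - s['min_t']
    row.t_val = delta / (s['cnt'] - 1) * (row.sheet_num - 1) + s['min_t']
Sheetinfo.objects.bulk_update(rows, ['t_val'])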