There is a series like the one below:
s = pd.Series([25, 33, 39])
0 25
1 33
2 39
dtype: int64
and a list
list = [5, 10, 20, 26, 30, 31, 32, 35, 40]
[5, 10, 20, 26, 30, 31, 32, 35, 40]
I'd like to find the nearest number in the list and **change the number** in the series.
For example,
the first number in the series is 25,
but the list is [5, 10, 20, 26, 30, 31, 32, 35, 40],
so the nearest number (corresponding to 25 in the series)
is 20 (actually 26 is the nearest number, but I need a number less than 25),
and then the second number is 31, the third is 35.
After finding the numbers and changing them in the series,
the desired output s is
0 20
1 31
2 35
Please give me advice. It's a very important task for me.
If possible, without a for loop please.
Find the nearest number (not exceeding it) in the list and change the numbers in the series (Python)
You are looking for merge_asof:
s = pd.Series([25, 33, 39], name="s")
l = pd.Series([5, 10, 20, 26, 30, 31, 32, 35, 40], name="l")
pd.merge_asof(s, l, left_on="s", right_on="l")
A few notes:
There is a bug in your expected output. The closest number to 33 is 32.
Don't name your variable list. It shadows the built-in list type, so later calls such as list(...) will fail.
Make sure l is sorted.
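To make the result concrete, here is a minimal sketch of the same call with the matched values written back into a series. Wrapping both Series in DataFrames with to_frame() and the names matched and result are my own choices for illustration, not part of the original answer.

```python
import pandas as pd

s = pd.Series([25, 33, 39], name="s")
l = pd.Series([5, 10, 20, 26, 30, 31, 32, 35, 40], name="l")

# merge_asof needs both keys sorted; the default direction="backward"
# picks the last value in l that is <= each value in s.
matched = pd.merge_asof(s.to_frame(), l.to_frame(), left_on="s", right_on="l")

# Use the matched values as the new series.
result = matched["l"].rename("s")
print(result.tolist())  # [20, 32, 35]
```

By default an exact match is allowed (a value of 26 in s would map to 26). If the match has to be strictly smaller, passing allow_exact_matches=False to merge_asof should do that.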
I was trying to understand numpy masks better and decided to try a simple fizzbuzz exercise (since np arrays are homogeneous, 9993 is "fizz", 9995 is "buzz", 9998 is "fizzbuzz"). However, I noticed behavior I cannot understand and was hoping that someone could explain.
In the first case, I created my masks like this:
In:
a = np.arange(32)
a[(a % 3 == 0) & (a % 5 == 0)] = 9998
a
Out:
array([9998, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 9998, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 9998, 31])
In:
a[a % 3 == 0] = 9993
a[a % 5 == 0] = 9995
a
Out:
array([9998, 1, 2, 9993, 4, 9995, 9993, 7, 8, 9993, 9995,
11, 9993, 13, 14, 9998, 16, 17, 9993, 19, 9995, 9993,
22, 23, 9993, 9995, 26, 9993, 28, 29, 9998, 31])
Notice that 9998 has not been overwritten by the subsequent steps, as expected (it divides by neither 3 nor 5). So far so good. However, then I tried to be clever and name my masks:
In:
a = np.arange(32)
fizz = (a % 3 == 0)
buzz = (a % 5 == 0)
fizzbuzz = fizz & buzz
a[fizzbuzz] = 9998
a
Out:
array([9998, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 9998, 16, 17, 18, 19, 20, 21,
22, 23, 24, 25, 26, 27, 28, 29, 9998, 31])
In:
a[fizz] = 9993
a[buzz] = 9995
a
Out:
array([9995, 1, 2, 9993, 4, 9995, 9993, 7, 8, 9993, 9995,
11, 9993, 13, 14, 9995, 16, 17, 9993, 19, 9995, 9993,
22, 23, 9993, 9995, 26, 9993, 28, 29, 9995, 31])
From what I could grasp, it would appear that at the "fizzbuzz = fizz & buzz" step, I create a mask such that it provides me with a copy of the array when applied over it. This is in contrast to just writing the mask out, which appears to work as intended and modify the array directly (15 & 30 remain 9998 even after the % 3 and % 5 masks are applied).
My question is why does this happen? From my perspective the logic is absolutely the same in both cases. Writing it as "a[fizz & buzz]" instead of "a[fizzbuzz]" did not help.
I think the problem is this: in the first step you generate a = [0, 1, 2, 3, ..., 31], and the comparisons in your first snippet are evaluated against the values currently in the array, not against the indices. After the first replacement you have a = [9998, 1, 2, ..., 14, 9998, 16, ...]. The next replacements build their masks from those values, so at index 15 the test 9998 % 3 == 0 is False and 9998 % 5 == 0 is also False.
In the second case you index a with boolean arrays that were computed before any replacement, so they select by position. The value currently stored at that position doesn't matter.
If you want the same behavior in both cases, create fizz and buzz after the first replacement:
a = np.arange(32)
fizzbuzz = (a % 3 == 0) & (a % 5 == 0)
a[fizzbuzz] = 9998
print(a)
fizz = (a % 3 == 0)
buzz = (a % 5 == 0)
a[fizz] = 9993
a[buzz] = 9995
print(a)
So the point is that fizz and buzz are computed from different array contents in the two cases.
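A small sketch of the same point: a stored mask is a snapshot of the values at the time it was computed, while a mask written inline is recomputed from the current values. The names fresh_fizz, b and labelled are mine, and the nested np.where at the end is just one possible way to make the result independent of assignment order, not something from the question.

```python
import numpy as np

a = np.arange(32)
fizz = a % 3 == 0   # snapshot taken from the original values 0..31
buzz = a % 5 == 0

a[fizz & buzz] = 9998

# The stored mask still marks index 15; a freshly computed one does not,
# because 9998 is not divisible by 3.
fresh_fizz = a % 3 == 0
print(fizz[15], fresh_fizz[15])  # True False

# Order-independent alternative: decide every label from the original masks.
b = np.arange(32)
labelled = np.where(fizz & buzz, 9998,
                    np.where(fizz, 9993,
                             np.where(buzz, 9995, b)))
print(labelled[[0, 3, 5, 15]].tolist())  # [9998, 9993, 9995, 9998]
```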
So I have an array of 5 integers v and another of 10 integers u.
I have a 5 by 10 matrix P that I would want to fill so that (P)ij = v[i] + u[j]
I tried:
P = np.empty((len(asset_grid),len(asset_grid)))
for i in range(asset_grid):
    for j in range(asset_grid):
        P[i,j] = asset_grid[i] + asset_grid[j]
but it gives me an error
TypeError: only integer arrays with one element can be converted to an index
How can I do this in Python? I apologize if my approach is too naive; I am used to Matlab and am now slowly learning Python. Any help is appreciated.
Broadcasting is what you want to do. Although for small arrays such as yours, it doesn't make a difference, it makes a significant difference with larger arrays:
>>> arr1 = np.arange(5)
>>> arr2 = np.arange(10,20)
>>> arr1[:,None] + arr2
array([[10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
[11, 12, 13, 14, 15, 16, 17, 18, 19, 20],
[12, 13, 14, 15, 16, 17, 18, 19, 20, 21],
[13, 14, 15, 16, 17, 18, 19, 20, 21, 22],
[14, 15, 16, 17, 18, 19, 20, 21, 22, 23]])
Generally with numpy you want to avoid iteration over rows and columns and use vectorized/broadcasted operations. This is where speed improvements actually come from.
So, elaborating based on your comment:
Say P_ij is ith element of x raised to the 4th power minus jth element of y raised to 2nd power
In general, NumPy supports most arithmetic operations you would want in a vectorized way, using the usual Python operators:
>>> arr1[:, None]**4 - arr2**2
array([[-100, -121, -144, -169, -196, -225, -256, -289, -324, -361],
[ -99, -120, -143, -168, -195, -224, -255, -288, -323, -360],
[ -84, -105, -128, -153, -180, -209, -240, -273, -308, -345],
[ -19, -40, -63, -88, -115, -144, -175, -208, -243, -280],
[ 156, 135, 112, 87, 60, 31, 0, -33, -68, -105]])
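For comparison, here is a sketch that puts a corrected version of the loop next to the broadcast expression. The arrays v and u are stand-ins for the 5-element and 10-element arrays from the question, and P_loop is a name I made up. The loop in the question fails because range() is given the array itself instead of its length.

```python
import numpy as np

v = np.arange(5)        # stand-in for the 5-element array
u = np.arange(10, 20)   # stand-in for the 10-element array

# Loop version: range() needs an integer length, not the array itself.
P_loop = np.empty((len(v), len(u)), dtype=v.dtype)
for i in range(len(v)):
    for j in range(len(u)):
        P_loop[i, j] = v[i] + u[j]

# Broadcast version: shapes (5, 1) and (10,) combine into (5, 10).
P = v[:, None] + u

print(P.shape)                                 # (5, 10)
print(np.array_equal(P, P_loop))               # True
print(np.array_equal(P, np.add.outer(v, u)))   # True
```

np.add.outer is an equivalent spelling for the plain sum. The v[:, None] form generalizes more easily to expressions such as the power example above.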
Code Start
from collections import Counter
import numpy as np
List = [[7,12,17,26,29,31],\
[4,9,11,17,26,27],\
[5,6,8,21,31,33],\
[3,17,21,23,27,28],\
[4,10,18,19,25,27],\
[5,8,13,19,27,28],\
[15,16,21,22,27,33],\
[11,12,13,14,18,33],\
[2,8,10,18,20,33],\
[2,7,10,20,27,29],\
]
for i in List:
    print(i, List.count(i), (List.count(i)/len(List)))
Code End
Result
[7, 12, 17, 26, 29, 31] 1 0.1
[4, 9, 11, 17, 26, 27] 1 0.1
[5, 6, 8, 21, 31, 33] 1 0.1
[3, 17, 21, 23, 27, 28] 1 0.1
[4, 10, 18, 19, 25, 27] 1 0.1
[5, 8, 13, 19, 27, 28] 1 0.1
[15, 16, 21, 22, 27, 33] 1 0.1
[11, 12, 13, 14, 18, 33] 1 0.1
[2, 8, 10, 18, 20, 33] 1 0.1
[2, 7, 10, 20, 27, 29] 1 0.1
Question
How can I get a result like this? Count the occurrences of every element, one element per line.
2 2 0.2
3 1 0.1
4 2 0.2
5 2 0.2
...
33 4 0.4
I tried many different ways but always get the same result.
As I am a newbie to Python, I hope someone can help me figure this out.
BTW, if there is any book that explains openpyxl, list manipulation and computational science clearly, please recommend it.
Thank you very much in advance.
What you are doing right now is counting the occurrences of each row (as a whole list) within the outer list. Since each row appears only once, you get a count of 1.
First we have to get all the numbers that appear in the array:
unique = np.unique(List)
Then we can loop over the rows and count in how many rows each number appears:
counts = {u: 0 for u in unique}
List = np.asarray(List)
for i in unique:
    for row in List:
        if i in row:
            counts[i] += 1
Lastly, if you want to print the results:
for k, v in counts.items():
    print(k, v, v/len(List))
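Since the question already imports Counter, an alternative sketch is to flatten the rows and let Counter do the counting. This counts every occurrence rather than the rows containing a number, which gives the same result here because no number repeats within a row. The use of itertools.chain is my choice, not part of the answer above.

```python
from collections import Counter
from itertools import chain

List = [[7, 12, 17, 26, 29, 31],
        [4, 9, 11, 17, 26, 27],
        [5, 6, 8, 21, 31, 33],
        [3, 17, 21, 23, 27, 28],
        [4, 10, 18, 19, 25, 27],
        [5, 8, 13, 19, 27, 28],
        [15, 16, 21, 22, 27, 33],
        [11, 12, 13, 14, 18, 33],
        [2, 8, 10, 18, 20, 33],
        [2, 7, 10, 20, 27, 29]]

# Flatten the rows into one stream of numbers and count each one.
counts = Counter(chain.from_iterable(List))

# One line per number: value, count, count divided by the number of rows.
for number in sorted(counts):
    print(number, counts[number], counts[number] / len(List))
# first line: 2 2 0.2
# last line:  33 4 0.4
```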
I want to get rid of the outer "pts" key, so that "2" is the key for loc, pts and imsize, and loc, pts and imsize are the keys for their values.
This is my list:
test = [{'pts': u"""{"2": {"loc": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32], "pts": [[
238.12358231315642, 253.66068427891196], [458.64277680287756,
241.96368624623565], [697.01528407931528, 227.18853083653903],
[958.16615594570135, 201.82451404989325], [1246.281686434784,
203.42515588594387], [1548.4572965158027, 241.5523054071067],
[1892.7592776185272, 342.33495115591734], [2254.5289081772476,
445.98514873992008], [2656.9656149224697, 571.79649071207928],
[2971.1562661999892, 867.70244034080304], [3068.3911866286853,
1286.0266095582174], [2929.8340389691793, 1672.0031179683222],
[2613.8132245402226, 1903.4008185146297], [2238.0791358590532,
1946.1114436655755], [1891.3179056028878, 1862.0534199001079],
[1575.3878471688531, 1818.865481764926], [1287.8256402921395,
1766.8583248583836], [1026.4040596301347, 1702.4873909751091],
[783.93932060128668, 1640.5323348318664], [560.42180223554942,
1588.6583330557301], [354.57960965335764, 1540.1880782707833],
[164.40489058630092, 1498.9624158157519]], "imsize": [3264, 2448]},
"43": {"loc": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
"pts": [[986.9522062723704, 697.0146815449186], [1178.2569415526625,
664.0692929800059], [1360.425560676298, 662.1313289467757],
[1526.8136155293448, 681.7878212838245], [1683.2349982114938,
697.2915335496658], [1827.4748926847676, 710.8572817822769],
[1962.0249669918903, 720.2702499436805], [2086.054665118621,
725.8072900386238], [2203.7167671361667, 730.7906261240727],
[2313.903865025539, 730.7906261240728], [2417.1696627962324,
733.2822941667973], [2513.2373084434994, 760.4137906320195],
[2603.7679139958227, 795.2971432301624], [2689.5920354674445,
829.0730878093167], [2769.3254128346284, 857.0351402887804],
[2840.4763780546505, 917.1120253189156], [2882.55788277622,
1023.4231951418275]], "imsize": [3264, 2448]},
"47": {"loc": [34, 35, 36], "pts": [[1393.0609259457722,
1700.979369842461], [1193.0180580859501, 1746.2349694566501],
[957.55776444111029, 1801.984621155289]],
"imsize": [3264, 2448]}}"""}]
I tried this:
test = test[0]
a = test[0].pts
print test
print a #not print a
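If you want to turn the value of pts into a dict that you can work with, try the line below. Note that eval only works here because this JSON text also happens to be a valid Python literal; for JSON from an untrusted source, json.loads is the safer tool.
points = eval(test[0]['pts'])
This makes points equal to: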
{'47': {'loc': [34, 35, 36],
'pts': [[1393.0609259457722, 1700.979369842461], [1193.01805808595, 1746.23496945665], [957.5577644411103, 1801.984621155289]],
'imsize': [3264, 2448]},
'2': {'loc': [11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32],
'pts': [[238.12358231315642, 253.66068427891196], [458.64277680287756, 241.96368624623565], [697.0152840793153, 227.18853083653903], [958.1661559457013, 201.82451404989325], [1246.281686434784, 203.42515588594387], [1548.4572965158027, 241.5523054071067], [1892.7592776185272, 342.33495115591734], [2254.5289081772476, 445.9851487399201], [2656.9656149224697, 571.7964907120793], [2971.156266199989, 867.702440340803], [3068.3911866286853, 1286.0266095582174], [2929.8340389691793, 1672.0031179683222], [2613.8132245402226, 1903.4008185146297], [2238.079135859053, 1946.1114436655755], [1891.3179056028878, 1862.0534199001079], [1575.387847168853, 1818.865481764926], [1287.8256402921395, 1766.8583248583836], [1026.4040596301347, 1702.487390975109], [783.9393206012867, 1640.5323348318664], [560.4218022355494, 1588.65833305573], [354.57960965335764, 1540.1880782707833], [164.40489058630092, 1498.962415815752]],
'imsize': [3264, 2448]},
'43': {'loc': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
'pts': [[986.9522062723704, 697.0146815449186], [1178.2569415526625, 664.0692929800059],[1360.425560676298, 662.1313289467757], [1526.8136155293448, 681.7878212838245],[1683.2349982114938, 697.2915335496658], [1827.4748926847676, 710.8572817822769],[1962.0249669918903, 720.2702499436805], [2086.054665118621, 725.8072900386238],[2203.7167671361667, 730.7906261240727], [2313.903865025539, 730.7906261240728],[2417.1696627962324, 733.2822941667973], [2513.2373084434994, 760.4137906320195],[2603.7679139958227, 795.2971432301624], [2689.5920354674445, 829.0730878093167],[2769.3254128346284, 857.0351402887804], [2840.4763780546505, 917.1120253189156],[2882.55788277622, 1023.4231951418275]],
'imsize': [3264, 2448]}
}
You can then get each of those dicts by the keys points['47'],points['2'], or points['43'].
Instead, try
print(test[0]['pts'])
as you need to look the value up by its key here; a dict has no .pts attribute.
I think it's a little unclear what you're asking.
It looks like you have a list of dictionaries, and you only care about the first element of the list, whose key is pts. Then you want to do something with the value stored under that key. That value appears to be a JSON string, so you'll need to decode it. Try the following to get started:
from pprint import pprint
import json
test = [{'pts': u"""{"2": {"loc": [11, 12, 13, 14, 15, 16, 17, 18, 19, 20,
21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32], "pts": [[
238.12358231315642, 253.66068427891196], [458.64277680287756,
241.96368624623565], [697.01528407931528, 227.18853083653903],
[958.16615594570135, 201.82451404989325], [1246.281686434784,
203.42515588594387], [1548.4572965158027, 241.5523054071067],
[1892.7592776185272, 342.33495115591734], [2254.5289081772476,
445.98514873992008], [2656.9656149224697, 571.79649071207928],
[2971.1562661999892, 867.70244034080304], [3068.3911866286853,
1286.0266095582174], [2929.8340389691793, 1672.0031179683222],
[2613.8132245402226, 1903.4008185146297], [2238.0791358590532,
1946.1114436655755], [1891.3179056028878, 1862.0534199001079],
[1575.3878471688531, 1818.865481764926], [1287.8256402921395,
1766.8583248583836], [1026.4040596301347, 1702.4873909751091],
[783.93932060128668, 1640.5323348318664], [560.42180223554942,
1588.6583330557301], [354.57960965335764, 1540.1880782707833],
[164.40489058630092, 1498.9624158157519]], "imsize": [3264, 2448]},
"43": {"loc": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17],
"pts": [[986.9522062723704, 697.0146815449186], [1178.2569415526625,
664.0692929800059], [1360.425560676298, 662.1313289467757],
[1526.8136155293448, 681.7878212838245], [1683.2349982114938,
697.2915335496658], [1827.4748926847676, 710.8572817822769],
[1962.0249669918903, 720.2702499436805], [2086.054665118621,
725.8072900386238], [2203.7167671361667, 730.7906261240727],
[2313.903865025539, 730.7906261240728], [2417.1696627962324,
733.2822941667973], [2513.2373084434994, 760.4137906320195],
[2603.7679139958227, 795.2971432301624], [2689.5920354674445,
829.0730878093167], [2769.3254128346284, 857.0351402887804],
[2840.4763780546505, 917.1120253189156], [2882.55788277622,
1023.4231951418275]], "imsize": [3264, 2448]},
"47": {"loc": [34, 35, 36], "pts": [[1393.0609259457722,
1700.979369842461], [1193.0180580859501, 1746.2349694566501],
[957.55776444111029, 1801.984621155289]],
"imsize": [3264, 2448]}}"""}]
# Print the original input (it's a list of dictionaries that map keys to JSON strings)
pprint(test)
pts = json.loads(test[0]['pts'])
# Print the decoded JSON.
pprint(pts)
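Once the string is decoded, the outer "pts" wrapper is gone and the nested values can be reached with ordinary dict and list indexing. The sketch below uses a shortened, made-up stand-in for the data in the question (fewer points, rounded coordinates), and the name decoded is mine.

```python
import json

# Shortened stand-in for the question's data: same structure, fewer points.
test = [{"pts": '{"2": {"loc": [11, 12], '
                '"pts": [[238.1, 253.7], [458.6, 242.0]], '
                '"imsize": [3264, 2448]}, '
                '"47": {"loc": [34], "pts": [[1393.1, 1701.0]], '
                '"imsize": [3264, 2448]}}'}]

# Decoding the JSON string leaves a dict keyed by "2", "47", ...
decoded = json.loads(test[0]["pts"])

print(sorted(decoded))           # ['2', '47']
print(decoded["2"]["loc"])       # [11, 12]
print(decoded["47"]["imsize"])   # [3264, 2448]

# Pair each loc value with its point, entry by entry.
for key, entry in decoded.items():
    for loc, (x, y) in zip(entry["loc"], entry["pts"]):
        print(key, loc, x, y)
```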