I know there are a bunch of questions on here about the pandas rolling_mean function, but I can't get it to work.
I'm new to Python and to the numpy and pandas packages.
I have a list of DateTime objects and a list of simple integers. Plotting them works without a problem, but I can't add a moving average line to the graph!
I'm trying to use the pandas docs to understand it, but I still can't get it to work. This is what I have tried:
# code before this simply reads a file and converts the dates into datetime objects
# and the values into a list
x1 = date_object  # list of datetime objects
y1 = values1      # list of integer values

plot(x1, y1)  # works fine

t = pd.date_range(date_object[0].strftime('%m/%d/%Y'),
                  date_object[-1].strftime('%m/%d/%Y'), freq='W')
ts = pd.Series(y1, t)
ts_movavg = pd.rolling_mean(ts, 10)
plot(ts_movavg)
When I run this, I get the following error:
ValueError: setting an array element with a sequence.
As you can probably quickly tell, I'm very confused. I think I'm missing the point of the Series object.
EDIT: (full traceback)
ValueError Traceback (most recent call last)
<ipython-input-228-2247062d3126> in <module>()
33 ts = pd.Series(y1, x1)
34
---> 35 ts_movavg = pd.rolling_mean(ts,10)
36
37 ts_movavg.head()
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in f(arg, window, min_periods, freq, center, time_rule, **kwargs)
507 return _rolling_moment(arg, window, call_cython, min_periods,
508 freq=freq, center=center,
--> 509 time_rule=time_rule, **kwargs)
510
511 return f
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in _rolling_moment(arg, window, func, minp, axis, freq, center, time_rule, **kwargs)
278 arg = _conv_timerule(arg, freq, time_rule)
279 calc = lambda x: func(x, window, minp=minp, **kwargs)
--> 280 return_hook, values = _process_data_structure(arg)
281 # actually calculate the moment. Faster way to do this?
282 if values.ndim > 1:
C:\Users\****\Anaconda\lib\site-packages\pandas\stats\moments.py in _process_data_structure(arg, kill_inf)
326
327 if not issubclass(values.dtype.type, float):
--> 328 values = values.astype(float)
329
330 if kill_inf:
ValueError: setting an array element with a sequence.
Could someone show me how to plot a moving average line using the rolling_mean function?
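To be concrete, here is a self-contained sketch of the kind of plot I'm after (synthetic data, and I'm assuming one plain integer value per date, which may not be how my file actually comes in):

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

dates = pd.date_range('1/5/2014', periods=52, freq='W')
values = np.random.randint(0, 100, size=len(dates))

ts = pd.Series(values, index=dates, dtype=float)
ts_movavg = pd.rolling_mean(ts, 10)   # ts.rolling(10).mean() in newer pandas

plt.plot(ts.index, ts.values)
plt.plot(ts_movavg.index, ts_movavg.values)
plt.show()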
Related
I am following the sample source code from this link: Simplified detection of urban types. After updating the GeoPandas (0.11.0) and GDAL (3.5.0) packages, I started getting this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 profile = momepy.StreetProfile(streets, buildings)
File ~/opt/anaconda3/envs/PythonEnv/lib/python3.9/site-packages/momepy/dimension.py:590, in StreetProfile.__init__(self, left, right, heights, distance, tick_length, verbose)
587 ticks.append([line_end_1, pt])
588 ticks.append([line_end_2, pt])
--> 590 ticks = pygeos.linestrings(ticks)
592 inp, res = right.sindex.query_bulk(ticks, predicate="intersects")
593 intersections = pygeos.intersection(ticks[inp], right.geometry.values.data[res])
File ~/opt/anaconda3/envs/PythonEnv/lib/python3.9/site-packages/pygeos/decorators.py:80, in multithreading_enabled.<locals>.wrapped(*args, **kwargs)
78 for arr in array_args:
79 arr.flags.writeable = False
---> 80 return func(*args, **kwargs)
81 finally:
82 for arr, old_flag in zip(array_args, old_flags):
File ~/opt/anaconda3/envs/PythonEnv/lib/python3.9/site-packages/pygeos/creation.py:119, in linestrings(coords, y, z, indices, out, **kwargs)
117 coords = _xyz_to_coords(coords, y, z)
118 if indices is None:
--> 119 return lib.linestrings(coords, out=out, **kwargs)
120 else:
121 return simple_geometries_1d(coords, indices, GeometryType.LINESTRING, out=out)
ValueError: linestrings: Input operand 0 does not have enough dimensions (has 1, gufunc core with signature (i, d)->() requires 2)
The minimal source code that reproduces the error:
import geopandas
import libpysal
import momepy
import osmnx
import pandas
place = 'Znojmo, Czechia'
buildings = osmnx.geometries.geometries_from_place(place, tags={'building':True})
osm_graph = osmnx.graph_from_place(place, network_type='drive')
#osm_graph = osmnx.projection.project_graph(osm_graph, to_crs=local_crs)
streets = osmnx.graph_to_gdfs(
    osm_graph,
    nodes=False,
    edges=True,
    node_geometry=False,
    fill_edge_geometry=True
)
profile = momepy.StreetProfile(streets, buildings)
Is anyone aware of changes in the new packages that would result in this error?
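For context, here is a small sketch of my own (not taken from momepy) showing the coordinate shape I believe pygeos.linestrings expects, going by the gufunc signature quoted in the error; I still don't know why the ticks list inside StreetProfile ends up 1-dimensional:

import numpy as np
import pygeos

# (n_lines, n_points_per_line, 2): two dimensions per geometry,
# as the "(i, d)" core signature in the error message requires
coords = np.array([[(0.0, 0.0), (1.0, 1.0)],
                   [(1.0, 1.0), (2.0, 2.0)]])
print(pygeos.linestrings(coords))  # two LINESTRING geometries

# a 1-d input (e.g. an empty or ragged list of ticks) raises the same
# "does not have enough dimensions" ValueError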
I'm trying to use a binary integer linear program to assign members of my staff to different shifts. I have a 16x9 matrix of preferences for my staff in a csv (16 staff members, 9 slots to fill), and I used the following code to try and assign them:
import cvxpy as cvx
import numpy as np
import pandas as pd

weights = pd.read_csv("holiday_green day.csv", index_col=0)
weights = weights.to_numpy().astype(float)
assignments = cvx.Variable((9, 16), boolean=True)
row_sum_vector = np.ones((16, 1)).astype(float)
result_constraint = np.ones((9, 1)).astype(float) * 2
objective = cvx.Minimize(cvx.trace(weights @ assignments))
prob = cvx.Problem(objective, [assignments @ row_sum_vector == result_constraint])
prob.solve()
When I try running this, I get the error TypeError: G must be a 'd' matrix and I don't know where to start debugging. I looked at this post, but it wasn't helpful. Can someone help me figure out what G is and what it means by a 'd' matrix? It's my first time actually using CVXPY and I'm very lost.
Full Stack Trace:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-23-d07ad22cbc25> in <module>()
6 objective = cvx.Minimize(cvx.atoms.affine.trace.trace(weights @ assignments))
7 prob = cvx.Problem(objective, [assignments @ row_sum_vector == result_constraint])
----> 8 prob.solve()
3 frames
/usr/local/lib/python3.7/dist-packages/cvxpy/problems/problem.py in solve(self, *args, **kwargs)
288 else:
289 solve_func = Problem._solve
--> 290 return solve_func(self, *args, **kwargs)
291
292 @classmethod
/usr/local/lib/python3.7/dist-packages/cvxpy/problems/problem.py in _solve(self, solver, warm_start, verbose, parallel, gp, qcp, **kwargs)
570 self._intermediate_problem)
571 solution = self._solving_chain.solve_via_data(
--> 572 self, data, warm_start, verbose, kwargs)
573 full_chain = self._solving_chain.prepend(self._intermediate_chain)
574 inverse_data = self._intermediate_inverse_data + solving_inverse_data
/usr/local/lib/python3.7/dist-packages/cvxpy/reductions/solvers/solving_chain.py in solve_via_data(self, problem, data, warm_start, verbose, solver_opts)
194 """
195 return self.solver.solve_via_data(data, warm_start, verbose,
--> 196 solver_opts, problem._solver_cache)
/usr/local/lib/python3.7/dist-packages/cvxpy/reductions/solvers/conic_solvers/glpk_mi_conif.py in solve_via_data(self, data, warm_start, verbose, solver_opts, solver_cache)
73 data[s.B],
74 set(int(i) for i in data[s.INT_IDX]),
---> 75 set(int(i) for i in data[s.BOOL_IDX]))
76 results_dict = {}
77 results_dict["status"] = results_tup[0]
TypeError: G must be a 'd' matrix
Edit: I tried casting all the NumPy arrays to float, as suggested in a different post. It didn't work.
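In case it helps with diagnosing, this is the check I can run to confirm the weights really come out of the CSV as a clean numeric (16, 9) array; a NaN, for example, would survive the .astype(float) cast (file name and layout as in my snippet above):

import numpy as np
import pandas as pd

weights = pd.read_csv("holiday_green day.csv", index_col=0)
print(weights.dtypes)                # an 'object' column would point at non-numeric cells
weights = weights.to_numpy().astype(float)
print(weights.dtype, weights.shape)  # expecting float64 and (16, 9)
print(np.isnan(weights).any())       # NaNs survive the cast and might be what trips the solver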
My previous script using OSMnx from Geoff Boeing is not working anymore since I did a conda update; it used to run before. This is the bare part of the script that gives the error message:
import osmnx as ox
import pandas as pd
import geopandas as gpd
import networkx as nx
import numpy as np
# Set place and language; it must return a POLYGON/POLYLINE, not a POINT, so you might have to play with it a little, or set which_result below accordingly
place='ALmere, Netherlands'
# note the which_result parameter, as per comment above. Default which_result=1. For places like Utrecht changing it gives a different result
G = ox.graph_from_place(place, network_type='all', which_result=1)
# For the colouring, we take the attributes from each edge found, extract the road name, and use the function above to create the colour array
edge_attributes = ox.graph_to_gdfs(G, nodes=False)
This gives the error message:
TypeError: unhashable type: 'dict'
I seem to have the latest version of OSMnx (conda list shows 0.16.1). I did find this question, but can't translate it to my code: TypeError: unhashable type: 'dict' in Networkx random walk code that was previously working
And this one: https://github.com/gboeing/osmnx/issues/372. My Python version is 3.8.5.
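For completeness, this is how I'm checking which OSMnx the interpreter actually imports, and from where, in case the conda update left a stale copy somewhere on the path:

import osmnx as ox
import networkx as nx

# confirm that the version and location Python actually imports
# match what `conda list` reports
print(ox.__version__, ox.__file__)
print(nx.__version__)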
Traceback below:
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-1-6a868604da3b> in <module>
10
11 # note the which_result parameter, as per comment above. Default which_result=1. For places like Utrecht changing it gives a different result
---> 12 G = ox.graph_from_place(place, network_type='all', which_result=1)
13
14 # For the colouring, we take the attributes from each edge found extract the road name, and use the function above to create the colour array
~\Anaconda3\lib\site-packages\osmnx\core.py in graph_from_place(query, network_type, simplify, retain_all, truncate_by_edge, name, which_result, buffer_dist, timeout, memory, max_query_area_size, clean_periphery, infrastructure, custom_filter)
1443 max_query_area_size=max_query_area_size,
1444 clean_periphery=clean_periphery, infrastructure=infrastructure,
-> 1445 custom_filter=custom_filter)
1446
1447 log('graph_from_place() returning graph with {:,} nodes and {:,} edges'.format(len(list(G.nodes())), len(list(G.edges()))))
~\Anaconda3\lib\site-packages\osmnx\core.py in graph_from_polygon(polygon, network_type, simplify, retain_all, truncate_by_edge, name, timeout, memory, max_query_area_size, clean_periphery, infrastructure, custom_filter)
1319 G_buffered = create_graph(response_jsons, name=name, retain_all=True,
1320 bidirectional=network_type in settings.bidirectional_network_types)
-> 1321 G_buffered = truncate_graph_polygon(G_buffered, polygon_buffered, retain_all=True, truncate_by_edge=truncate_by_edge)
1322
1323 # simplify the graph topology
~\Anaconda3\lib\site-packages\osmnx\core.py in truncate_graph_polygon(G, polygon, retain_all, truncate_by_edge, quadrat_width, min_num, buffer_amount)
731
732 # find all the nodes in the graph that lie outside the polygon
--> 733 points_within_geometry = intersect_index_quadrats(gdf_nodes, polygon, quadrat_width=quadrat_width, min_num=min_num, buffer_amount=buffer_amount)
734 nodes_outside_polygon = gdf_nodes[~gdf_nodes.index.isin(points_within_geometry.index)]
735
~\Anaconda3\lib\site-packages\osmnx\core.py in intersect_index_quadrats(gdf, geometry, quadrat_width, min_num, buffer_amount)
678 # drop duplicate points, if buffered poly caused an overlap on point(s)
679 # that lay directly on a quadrat line
--> 680 points_within_geometry = points_within_geometry.drop_duplicates(subset='node')
681 else:
682 # after simplifying the graph, and given the requested network type,
~\Anaconda3\lib\site-packages\pandas\core\frame.py in drop_duplicates(self, subset, keep, inplace, ignore_index)
5106
5107 inplace = validate_bool_kwarg(inplace, "inplace")
-> 5108 duplicated = self.duplicated(subset, keep=keep)
5109
5110 result = self[-duplicated]
~\Anaconda3\lib\site-packages\pandas\core\frame.py in duplicated(self, subset, keep)
5245
5246 vals = (col.values for name, col in self.items() if name in subset)
-> 5247 labels, shape = map(list, zip(*map(f, vals)))
5248
5249 ids = get_group_index(labels, shape, sort=False, xnull=False)
~\Anaconda3\lib\site-packages\pandas\core\frame.py in f(vals)
5220 def f(vals):
5221 labels, shape = algorithms.factorize(
-> 5222 vals, size_hint=min(len(self), _SIZE_HINT_LIMIT)
5223 )
5224 return labels.astype("i8", copy=False), len(shape)
~\Anaconda3\lib\site-packages\pandas\core\algorithms.py in factorize(values, sort, na_sentinel, size_hint)
676
677 codes, uniques = _factorize_array(
--> 678 values, na_sentinel=na_sentinel, size_hint=size_hint, na_value=na_value
679 )
680
~\Anaconda3\lib\site-packages\pandas\core\algorithms.py in _factorize_array(values, na_sentinel, size_hint, na_value, mask)
499 table = hash_klass(size_hint or len(values))
500 uniques, codes = table.factorize(
--> 501 values, na_sentinel=na_sentinel, na_value=na_value, mask=mask
502 )
503
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.factorize()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
TypeError: unhashable type: 'dict'
I'm trying to use the (relatively new) Python AutoImpute package, but I keep getting a shape mismatch error when trying to use a particular column as a predictor.
This is what my pandas dataframe looks like
I can impute using the 'sex', 'group', and 'binned_age' columns, but not using the 'experiment' column. When I try doing that, I get this error:
ValueError: shapes (9,) and (4,13) not aligned: 9 (dim 0) != 4 (dim 0)
This is my code for actually fitting and running the imputer:
cat_predictors = ['experiment', 'sex', 'group', 'binned_age']
si = SingleImputer(
    strategy={'FSIQ': 'default predictive'},
    predictors={'FSIQ': cat_predictors},
)
imputed_data = si.fit_transform(df2)
In trying to diagnose the problem, I found that if I reduce the number of unique strings in the 'experiment' column to 3 or fewer, the problem goes away for some reason. But I don't want to do that and lose some of my data. Any help?
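For reference, this is how I'm counting the unique values in each categorical predictor (df2 and the column names as above), since the error only appears once 'experiment' has more than 3 levels:

cat_predictors = ['experiment', 'sex', 'group', 'binned_age']
print(df2[cat_predictors].nunique())
print(df2['experiment'].unique())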
Full trace below:
ValueError Traceback (most recent call last)
<ipython-input-11-3d4388ba92e4> in <module>
1 si = SingleImputer(
2 strategy={'FSIQ': 'pmm'}, imp_kwgs={'pmm': {'tune': 10000, 'sample':10000}})
----> 3 data_imputed_once = si.fit_transform(df2)
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/dataframe/single_imputer.py in fit_transform(self, X, y)
288 X (pd.DataFrame): imputed in place or copy of original.
289 """
--> 290 return self.fit(X, y).transform(X)
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
59 err = f"Neither {d_err} nor {a_err} are of type pd.DataFrame"
60 raise TypeError(err)
---> 61 return func(d, *args, **kwargs)
62 return wrapper
63
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
124
125 # return func if no missingness violations detected, then return wrap
--> 126 return func(d, *args, **kwargs)
127 return wrapper
128
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/utils/checks.py in wrapper(d, *args, **kwargs)
171 err = f"All values missing in column(s) {nc}. Should be removed."
172 raise ValueError(err)
--> 173 return func(d, *args, **kwargs)
174 return wrapper
175
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/dataframe/single_imputer.py in transform(self, X, imp_ixs)
274
275 # perform imputation given the specified imputer and value for x_
--> 276 X.loc[imp_ix, column] = imputer.impute(x_)
277 return X
278
/om/user/agupta81/anaconda/envs/myenv/lib/python3.8/site-packages/autoimpute/imputations/series/pmm.py in impute(self, X)
187 # imputed values are actual y vals corresponding to nearest neighbors
188 # therefore, this is a form of "hot-deck" imputation
--> 189 y_pred_bayes = alpha_bayes + beta_bayes.dot(X.T)
190 n_ = self.neighbors
191 if X.columns.size == 1:
ValueError: shapes (9,) and (4,13) not aligned: 9 (dim 0) != 4 (dim 0)
I have a pandas data set where I group the data by day. I would like to take this data and plot histograms for each day on the same plot, but offset to the day on which the data occurred. I researched this and someone stated that you need to use pcolor, which is a nice alternative.
Here is a link to some example data:
http://pastebin.com/rKzj5Qzf
I attempted to use the lambda function from the post below, which creates a Series. pcolor does not like this Series and says it needs more than 1 value to unpack.
stackoverflow.com/questions/17050202/plot-timeseries-of-histograms-in-python
Does anyone know what I am doing wrong?
EDIT:
So the Series 'df' comes from running the following code snippet:
# x1 holds the example data linked above; bins is a 1-d array of histogram bin edges (defined earlier, not shown)
daily = x1.groupby(x1.date).price
f = lambda x: pd.Series(np.histogram(x, bins=bins)[0], index=bins[:-1])
df = daily.apply(f)
Once I do this, I attempt to pass it to matplotlib:
import matplotlib.pyplot as plt
plt.pcolor(df.T)
This is where I get the problem. I clearly have three items: date, price, and count.
EDIT: Traceback
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-2-460b943e4ead> in <module>()
----> 1 plt.pcolor(df.T)
/usr/lib/pymodules/python2.7/matplotlib/pyplot.pyc in pcolor(*args, **kwargs)
2926 ax.hold(hold)
2927 try:
-> 2928 ret = ax.pcolor(*args, **kwargs)
2929 draw_if_interactive()
2930 finally:
/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in pcolor(self, *args, **kwargs)
7543 shading = kwargs.pop('shading', 'flat')
7544
-> 7545 X, Y, C = self._pcolorargs('pcolor', *args, allmatch=False)
7546 Ny, Nx = X.shape
7547
/usr/lib/pymodules/python2.7/matplotlib/axes.pyc in _pcolorargs(funcname, *args, **kw)
7339 if len(args) == 1:
7340 C = args[0]
-> 7341 numRows, numCols = C.shape
7342 if allmatch:
7343 X, Y = np.meshgrid(np.arange(numCols), np.arange(numRows))
ValueError: need more than 1 value to unpack
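For clarity, here is a self-contained sketch (synthetic prices and made-up bin edges) of the kind of 2-D day-by-bin array I think pcolor wants, as opposed to the Series my groupby/apply seems to be handing it:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

np.random.seed(0)
dates = pd.date_range('2013-06-01', periods=5, freq='D')
prices = pd.DataFrame({'date': np.repeat(dates, 200),
                       'price': np.random.normal(100, 10, size=5 * 200)})

bins = np.linspace(60, 140, 21)
# one column of counts per day, one row per price bin
counts = pd.DataFrame({day: np.histogram(grp, bins=bins)[0]
                       for day, grp in prices.groupby('date').price},
                      index=bins[:-1])

plt.pcolor(counts.values)  # a genuine 2-D array, which pcolor can unpack into rows and columns
plt.show()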