What is the "invalid index to scalar variable" error in Python?

I am quite new to Python, so please bear with me.
Currently, this is my code:
import statistics
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
from datetime import datetime
df = pd.read_csv(r"/Users/aaronhuang/Documents/Desktop/ffp/exfileCLEAN2.csv", skiprows=[1]) # replace this with wherever the file is.
start_time = datetime.now()
magnitudes = df['Magnitude '].values
times = df['Time '].values
average = statistics.mean(magnitudes)
sd = statistics.stdev(magnitudes)
below = sd*3
i = 0
while(i < len(df['Magnitude '])):
    if(abs(df['Magnitude '][i]) <= (average - below)):
        print(df['Time '][i])
        outlier_indicies=(df['Time '][i])
    i += 1
window = 2
num = 1
x = times[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
y = magnitudes[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
plt.plot(x, y)
plt.xlabel('Time (units)')
plt.ylabel('Magnitude (units)')
plt.show()
fig = plt.figure()
It outputs this:
/Users/aaronhuang/.conda/envs/EXTTEst/bin/python "/Users/aaronhuang/PycharmProjects/EXTTEst/Code sandbox.py"
2456116.494
2456116.535
2456116.576
2456116.624
2456116.673
2456116.714
2456116.799
2456123.527
2456166.634
2456570.526
2456595.515
2457485.722
2457497.93
2457500.674
2457566.874
2457567.877
Traceback (most recent call last):
File "/Users/aaronhuang/PycharmProjects/EXTTEst/Code sandbox.py", line 38, in <module>
x = times[outlier_indicies[num]-window:outlier_indicies[num]+window+1]
IndexError: invalid index to scalar variable.
Process finished with exit code 1
How can I solve this error? I would like my code to take the "time" values printed, and graph them to their "magnitude" values. If there are any questions please leave a comment.
Thank you

It's hard to tell exactly what you are trying to do. The slice you wrote should evaluate to something like times[10:20], that is, elements 10 up to (but not including) 20 of times. The problem, I'm guessing, is that the numbers you put in there aren't ints but timestamps.
Maybe you want something like:
mask = (times > outlier_indicies[num-window]) & (times < outlier_indicies[num+window+1])
x = times[mask]
y = magnitudes[mask]
But I'm really just guessing, since I can't see your data.
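A further point visible in the posted loop: outlier_indicies is reassigned to a single time value on every pass, so after the loop it is one NumPy scalar rather than a list, and indexing a scalar with [num] is what raises "invalid index to scalar variable". The sketch below shows one possible way around this: collect the integer positions of the outliers and slice a window around each. The data is synthetic, and the threshold is simplified to "more than three standard deviations below the mean" (the abs() from the question is dropped); both are assumptions, not the asker's actual setup.

```python
import numpy as np

# Synthetic stand-in data (an assumption; the asker's CSV is not available).
times = 2456116.0 + np.arange(100)        # 100 evenly spaced fake timestamps
magnitudes = np.zeros(100)
magnitudes[[30, 70]] = -10.0              # two artificial dips

average = magnitudes.mean()
sd = magnitudes.std(ddof=1)               # sample standard deviation, as statistics.stdev computes

# Integer positions of the outliers, not their time values.
outlier_indices = np.flatnonzero(magnitudes <= average - 3 * sd)

window = 2
segments = []
for idx in outlier_indices:
    lo = max(idx - window, 0)             # clamp so the slice never wraps around
    hi = idx + window + 1
    segments.append((times[lo:hi], magnitudes[lo:hi]))
```

Each (x, y) pair in segments can then be passed to plt.plot(x, y) the same way the question does.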

Related

Sequence item 0: expected str instance, numpy.int64 found

I keep getting this error when I try to plot p-values. I don't understand what "sequence item 0" refers to. I found a couple of similar questions, but I still don't understand what is causing the error in my code below or how to fix it.
from statannotations.Annotator import Annotator
cluster_0_wmd = Hub_all_data.loc[(Hub_all_data.Module_ID == 0), "Within_module_degree"].values
cluster_1_wmd = Hub_all_data.loc[(Hub_all_data.Module_ID == 1), "Within_module_degree"].values
cluster_2_wmd = Hub_all_data.loc[(Hub_all_data.Module_ID == 2), "Within_module_degree"].values
with sns.plotting_context('notebook', font_scale=1.4):
    # Plot with seaborn
    sns.violinplot(**plotting_parameters)
    stat_results = [mannwhitneyu(cluster_0_wmd, cluster_1_wmd, alternative="two-sided"),
                    mannwhitneyu(cluster_0_wmd, cluster_2_wmd, alternative="two-sided"),
                    mannwhitneyu(cluster_1_wmd, cluster_2_wmd, alternative="two-sided")]
    pvalues = [result.pvalue for result in stat_results]
    xval = [0,1,2]
    plotting_parameters = {
        'data': Hub_all_data,
        'x': 'Module_ID',
        'y': 'Within_module_degree',
        'palette': my_col}
    pairs = [('cluster_0_wmd', 'cluster_1_wmd'),
             ('cluster_0_wmd', 'cluster_2_wmd'),
             ('cluster_1_wmd', 'cluster_2_wmd')]
    pairs2 = [(0,1), (0,2), (1,2)]
    formatted_pvalues = [f"p={p:.2e}" for p in pvalues]
    annotator = Annotator(ax, pairs2, **plotting_parameters)
    annotator.set_custom_annotations(formatted_pvalues)
    annotator.annotate()
    plt.show()
The error is raised on the annotator.annotate() line. Here is the full output:
runcell(27, '/Users/albitcabanmurillo/N5_nwxwryan.py')
p-value annotation legend:
ns: p <= 1.00e+00
*: 1.00e-02 < p <= 5.00e-02
**: 1.00e-03 < p <= 1.00e-02
***: 1.00e-04 < p <= 1.00e-03
****: p <= 1.00e-04
Traceback (most recent call last):
File "/Users/albitcabanmurillo/N5_nwxwryan.py", line 421, in <module>
annotator.annotate()
File "/Users/albitcabanmurillo/opt/anaconda3/envs/caiman2/lib/python3.7/site-packages/statannotations/Annotator.py", line 222, in annotate
orig_value_lim=orig_value_lim)
File "/Users/albitcabanmurillo/opt/anaconda3/envs/caiman2/lib/python3.7/site-packages/statannotations/Annotator.py", line 506, in _annotate_pair
annotation.print_labels_and_content()
File "/Users/albitcabanmurillo/opt/anaconda3/envs/caiman2/lib/python3.7/site-packages/statannotations/Annotation.py", line 43, in print_labels_and_content
for struct in self.structs])
TypeError: sequence item 0: expected str instance, numpy.int64 found
Here is some code to try to reproduce your issue using some dummy data.
It seems statannotations doesn't like the numeric values in 'Module_ID'. You could try to change them to strings (also using them for pairs2). Depending on when you change them, you might also need to change the numeric values to strings in Hub_all_data.loc[(Hub_all_data.Module_ID == "0"), ...].
Note that in your example code, you used plotting_parameters before assigning a value (I assume you just want to move the assignment to before the call to sns.violinplot). The code also uses an unknown ax as the first parameter to Annotator(...); here I assume ax is the return value of sns.violinplot.
from matplotlib import pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
from statannotations.Annotator import Annotator
# first create some dummy data for testing
Hub_all_data = pd.DataFrame({'Module_ID': np.repeat(np.arange(3), 100),
                             'Within_module_degree': np.random.randn(300).cumsum()})
# change Module_ID from numeric to string
Hub_all_data['Module_ID'] = Hub_all_data['Module_ID'].astype(str)
plotting_parameters = {'data': Hub_all_data,
                       'x': 'Module_ID',
                       'y': 'Within_module_degree'}
with sns.plotting_context('notebook', font_scale=1.4):
    ax = sns.violinplot(**plotting_parameters)
    # also change these values to strings
    pairs2 = [("0", "1"), ("0", "2"), ("1", "2")]
    # just use some dummy data, as we don't have the original data nor the functions
    pvalues = [0.001, 0.002, 0.03]
    formatted_pvalues = [f"p={p:.2e}" for p in pvalues]
    annotator = Annotator(ax, pairs2, **plotting_parameters)
    annotator.set_custom_annotations(formatted_pvalues)
    annotator.annotate()
sns.despine()
plt.tight_layout() # make all labels fit nicely
plt.show()
You could also update statannotations to version 0.5+, where it was fixed.
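For context on the message itself: this exact TypeError is what Python's str.join raises when an element of the sequence is not a string. The sketch below reproduces it with NumPy integers standing in for the Module_ID labels. That statannotations joins the group labels internally is only an inference from the traceback, not something confirmed here.

```python
import numpy as np

labels = [np.int64(0), np.int64(1)]   # numeric group labels, like Module_ID

try:
    " vs. ".join(labels)              # str.join accepts only str items
    message = None
except TypeError as exc:
    message = str(exc)

# Converting each label to str first avoids the error.
joined = " vs. ".join(str(label) for label in labels)
```

This is consistent with the suggestion above to convert Module_ID to strings before plotting.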

How to grab data based on two parameters using a function

I am attempting to create a few plots of some wind data, but I am having trouble selecting specific data using two parameters: the hour of day and the month. I am trying to use a function to grab the specific data, but instead I get this error:
Traceback (most recent call last):
File "/Users/Cpower18/Documents/Tryong_again.py", line 47, in <module>
plt.plot(hr, hdh(hr, mn2))
File "/Users/Cpower18/Documents/Tryong_again.py", line 37, in hdh
for n, k in hr, mn2:
ValueError: too many values to unpack (expected 2)
I am currently using dataframes to sort the data based on date and a function to grab the specific data. I have managed to do so with only one variable, that being the hour of the day, however, not for two variables.
import csv
import warnings
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
warnings.simplefilter(action='ignore', category=FutureWarning)
data = pd.read_csv('merged_1.csv')
df = pd.DataFrame(data)
df['Wind Spd (km/h)'] = pd.to_numeric(df['Wind Spd (km/h)'], errors ='coerce')
df['Date/Time'] = pd.to_datetime(df['Date/Time'], errors = 'coerce')
df = df.set_index(pd.DatetimeIndex(df['Date/Time']))
df['hour'] = df.index.hour
df['month'] = df.index.month
mn1 = np.linspace(1, 2, 2)
mn2 = np.linspace(3, 5, 3)
mn3 = np.linspace(6, 8, 3)
mn4 = np.linspace(9, 11, 3)
mn5 = np.linspace(12)
hr = np.linspace(0, 23, 24)
def hdh(hr, mn2):
    out = []
    for n, k in hr, mn2:
        t = (df['hour'] == n) & (df['month'] == k)
        s = t['Wind Spd (km/h)'].mean(axis = 0) / 3.6
        out.append(s)
    return out
plt.plot(hr, hdh(hr, mn2))
plt.xlabel('Hour')
plt.ylabel('Wind Speed (m/s)')
plt.xlim(0, 24)
plt.ylim(2.85, 4.75)
plt.title('ShearENV Anual Average Hourly Wind Speed')
plt.grid(which = 'both', axis='both')
plt.show()
The expected result should be a list of the data for a specific hour (for example 01:00) and a specific season (for example months 3 to 5). As of now, I am only getting errors. Thank you for any help.
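On the error itself: for n, k in hr, mn2 iterates over the two-element tuple (hr, mn2) and tries to unpack its first element, the 24-value array hr, into n and k, hence "too many values to unpack". One possible way to get an hourly mean restricted to a season is sketched below with isin and groupby. The data is synthetic and the function name hourly_mean_speed is hypothetical; only the column names are taken from the question.

```python
import numpy as np
import pandas as pd

# Synthetic hourly data standing in for merged_1.csv (an assumption).
index = pd.date_range("2020-01-01", periods=366 * 24, freq=pd.Timedelta(hours=1))
df = pd.DataFrame({"Wind Spd (km/h)": np.full(len(index), 36.0)}, index=index)
df["hour"] = df.index.hour
df["month"] = df.index.month


def hourly_mean_speed(frame, months):
    """Mean wind speed in m/s for each hour of day, restricted to the given months."""
    season = frame[frame["month"].isin(months)]          # keep only the season's rows
    return season.groupby("hour")["Wind Spd (km/h)"].mean() / 3.6


spring = hourly_mean_speed(df, [3, 4, 5])                # one value per hour, 0 to 23
```

The result is a Series indexed by hour, so plt.plot(spring.index, spring.values) would draw one point per hour of day.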

index out of bounds// Python, dataframe, plot

I want to plot the point with max value from dataframe.
import pandas as pd
import matplotlib.pyplot as plt
dane = pd.read_table('C:\\xxx.txt', names=('rok', 'kroliki', 'lisy', 'marchewki'))
df = pd.DataFrame(dane)
data = df[1:]
data=data.astype(float)
x = int(data['kroliki'].max())
y = int(data['lisy'].max())
z = int(data['marchewki'].max())
p= data['rok'].where(data['kroliki'] == x)
q = data['rok'].where(data['lisy'] == y)
r = data['rok'].where(data['marchewki'] == z)
p1 = int(p[p.notnull()])
q1 = int(q[q.notnull()])
r1 = int(r[r.notnull()])
point = pd.DataFrame({'x':[p1],'y':[q1],'z':[r1]})
point.plot((p1,x),(q1,y),(r1,z))
I get this error:
IndexError: index 1993 is out of bounds for axis 0 with size 4
Does anybody know what is wrong with this code?
Thanks
I think that when you plot through pandas, integer arguments are treated as column positions rather than as values.
So, in your case, when you do:
point.plot(p1,x)
pandas looks for the column at position 1993, so the DataFrame would need at least 1994 columns for that call to work.
I tried to reproduce your problem as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(10, 4)), columns=('rok', 'kroliki', 'lisy', 'marchewki'))
data = df[1:]
data=data.astype(float)
x = int(data['kroliki'].max())
y = int(data['lisy'].max())
z = int(data['marchewki'].max())
p = data['rok'].where(data['kroliki'] == x)
q = data['rok'].where(data['lisy'] == y)
r = data['rok'].where(data['marchewki'] == z)
p1 = int(p[p.notnull()])
q1 = int(q[q.notnull()])
r1 = int(r[r.notnull()])
point = pd.DataFrame({'x':[p1],'y':[q1],'z':[r1]})
point.plot((p1,x),(q1,y),(r1,z))
I get the following error:
>>> AttributeError: 'tuple' object has no attribute 'lower'
And when I run each point separately:
>>> IndexError: index 85 is out of bounds for axis 0 with size 3
To solve it:
import matplotlib.pyplot as plt
plt.plot((point.x, point.y, point.z), (x,y,z),'ko')
And I got the following result:
Hope it helps.
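As a side note, the year of each maximum can also be looked up with idxmax instead of where plus notnull. This is only a sketch on made-up numbers; the column names come from the question, the values do not.

```python
import pandas as pd

# Made-up values for illustration (an assumption; the real file is not available).
data = pd.DataFrame({
    "rok": [1990.0, 1991.0, 1992.0, 1993.0],
    "kroliki": [10.0, 40.0, 25.0, 30.0],
    "lisy": [5.0, 6.0, 9.0, 7.0],
    "marchewki": [100.0, 80.0, 90.0, 120.0],
})

peaks = {}
for column in ["kroliki", "lisy", "marchewki"]:
    row = data[column].idxmax()                       # index label of the first maximum
    peaks[column] = (data.loc[row, "rok"], data.loc[row, column])
```

Each entry of peaks is a (year, maximum) pair that can be handed to plt.plot as coordinates.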

labeled intervals in matplotlib

I'm referring to the question "Plotting labeled intervals in matplotlib/gnuplot". The problem with the solution given there is that it doesn't work when the file contains only one line of data. This is the code I'm trying:
#!/usr/bin/env python
#
import matplotlib.pyplot as plt
from matplotlib.dates import DateFormatter, MinuteLocator, SecondLocator
import numpy as np
from StringIO import StringIO
import datetime as dt
a=StringIO("""MMEX 2016-01-29T12:38:22 2016-01-29T12:39:03 SUCCESS
""")
#Converts str into a datetime object.
conv = lambda s: dt.datetime.strptime(s, '%Y-%m-%dT%H:%M:%S')
#Use numpy to read the data in.
data = np.genfromtxt(a, converters={1: conv, 2: conv},
names=['caption', 'start', 'stop', 'state'], dtype=None)
cap, start, stop = data['caption'], data['start'], data['stop']
#Check the status, because we paint all lines with the same color
#together
is_ok = (data['state'] == 'SUCCESS')
not_ok = np.logical_not(is_ok)
#Get unique captions and their indices and the inverse mapping
captions, unique_idx, caption_inv = np.unique(cap, 1, 1)
#Build y values from the number of unique captions.
y = (caption_inv + 1) / float(len(captions) + 1)
#Plot function
def timelines(y, xstart, xstop, color='b'):
    """Plot timelines at y from xstart to xstop with given color."""
    plt.hlines(y, xstart, xstop, color, lw=4)
    plt.vlines(xstart, y+0.005, y-0.005, color, lw=2)
    plt.vlines(xstop, y+0.005, y-0.005, color, lw=2)
#Plot ok tl black
timelines(y[is_ok], start[is_ok], stop[is_ok], 'k')
#Plot fail tl red
timelines(y[not_ok], start[not_ok], stop[not_ok], 'r')
#Setup the plot
ax = plt.gca()
ax.xaxis_date()
myFmt = DateFormatter('%Y-%m-%dT%H:%M:%S')
ax.xaxis.set_major_formatter(myFmt)
ax.xaxis.set_major_locator(SecondLocator(interval=3600)) # used to be SecondLocator(0, interval=20)
#To adjust the xlimits a timedelta is needed.
delta = (stop.max() - start.min())/10
plt.yticks(y[unique_idx], captions)
plt.ylim(0,1)
plt.xlim(start.min()-delta, stop.max()+delta)
plt.xlabel('Time')
plt.xticks(rotation=70)
plt.show(block=True)
When I try this code, I get the following error:
Traceback (most recent call last):
File "./testPlot.py", line 49, in <module>
timelines(y[is_ok], start[is_ok], stop[is_ok], 'k')
ValueError: boolean index array should have 1 dimension
Also, when I add a dummy line to the data, let's say "MMEX 2016-01-01T00:00:00 2016-01-01T00:00:00 SUCCESS", the plot works but doesn't look good.
Any suggestions? I wanted to ask this on the original post where I found the solution, but I don't have enough reputation...
Thanks in advance
The issue is that when np.genfromtxt reads only one row, it produces 0-dimensional arrays (scalars). We need them to be at least 1D.
You can add these lines just above the definition of your timelines function, and then everything works.
This uses the NumPy function np.atleast_1d() to turn the scalars into 1D NumPy arrays.
#Check the dimensions are at least 1D (for 1-item data input)
if start.ndim < 1:
    start = np.atleast_1d(start)
if stop.ndim < 1:
    stop = np.atleast_1d(stop)
if is_ok.ndim < 1:
    is_ok = np.atleast_1d(is_ok)
if not_ok.ndim < 1:
    not_ok = np.atleast_1d(not_ok)
The output:
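As a separate, minimal illustration of why the fix works: np.atleast_1d turns a 0-dimensional array into a one-element 1D array, after which boolean indexing behaves as usual. The values here are invented for the demonstration.

```python
import numpy as np

scalar = np.array(5.0)            # 0-dimensional, like a field read from a single row
flag = np.array(True)             # 0-dimensional boolean, like is_ok for a single row

fixed = np.atleast_1d(scalar)     # shape (1,)
mask = np.atleast_1d(flag)        # shape (1,)
selected = fixed[mask]            # ordinary 1D boolean indexing now works

already_1d = np.atleast_1d(np.array([1.0, 2.0]))   # inputs with ndim >= 1 keep their shape
```

Because inputs that already have one or more dimensions keep their shape, the ndim checks are a safeguard and calling np.atleast_1d unconditionally should be equivalent.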

TypeError: only length-1 arrays can be converted to Python scalars (for loop)

I am trying to code an analytical solution to a dam break in a rectangular channel. The idea is to have water at 4 m on one side of the dam and no water on the downstream side, then to remove the dam and see how the water evolves over time. I have the following code, but I'm having issues with the "for i in range(x):" line. I will paste my code and the error I get. Can anyone explain why I get this error and suggest possible solutions? Thank you
__author__="A.H"
__date__ ="$04-Aug-2014 13:46:59$"
import numpy as np
import matplotlib.pyplot as plt
import math
import sys
from math import sqrt
import decimal
nx, ny = (69,69)
x5 = np.linspace(0,20.1,nx)
y5 = np.linspace(0,20.1,ny)
xv,yv = np.meshgrid(x5,y5)
x = np.arange(0,20.3956,0.2956)
y = np.arange(0,20.3956,0.2956)
t59=np.arange (1,4761,1)
h0=4.0
g=9.81
t=1
xa=10.5-(t*math.sqrt(g*h0))
xb=10.5+(2*t*math.sqrt(g*h0))
h=np.zeros(len(x))
for i in range(x):
    if x[i]<=xa:
        h=h0
    elif xa<=x[i]<=xb:
        h=(4.0/(9.0*g))*(((math.sqrt(g*h0))-(x[i]/(2.0*t)))**2.0)
    else:
        h=0
f = open(r'C:\opentelemac\bluetang\examples\telemac2d\dambreak\D1.i3s', 'r')
while True:
    line = f.readline()
    if line[0] not in [':','#']: break
ran = int(line.split()[0])
length = np.zeros(ran)
wse = np.zeros(ran)
for i in range (ran):
    fields = f.readline().split()
    length[i] = float(fields[0])
    wse[i] = float(fields[2])
    all =[length[i],wse[i]]
plt.figure(2)
plt.plot(length,h)
plt.plot(length,wse)
plt.legend(['Analytical solution','Model'], loc='upper right')
plt.show()
When I run this code I get the error:
Traceback (most recent call last):
File "D:\Work\NetBeansProjects\Ritter_test_2\src\ritter_test_2.py", line 29, in
for i in range(x):
TypeError: only length-1 arrays can be converted to Python scalars
The second half of the code, where I read in the text file, works fine. I believe it's just the for loop and the if statements that have issues, but I may be wrong. Thanks for any help.
Others have pointed out that you most probably meant range(len(x)).
You can do it more simply by just iterating over the items in the array x:
for value in x:
    if value <= xa:
        h = h0
    elif xa <= value <= xb:
        h = (4.0/(9.0*g))*(((math.sqrt(g*h0))-(value/(2.0*t)))**2.0)
    else:
        h = 0
There's also a small issue with the condition in the elif: value can never equal xa there, because that case has already been caught by the preceding if value <= xa. Just thought I'd point it out in case it matters.
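Two further observations, offered as a sketch rather than a definitive fix. First, range() needs a single integer, so range(x) tries to convert the whole array x into one, which is what the TypeError reports. Second, both loops above rebind h to a scalar on every pass, so h would end up as one number instead of a profile; writing into h[i] keeps it an array that can be plotted. The constants and the formula are copied from the question as posted and have not been checked against the physics.

```python
import math
import numpy as np

# Constants copied from the question.
h0, g, t = 4.0, 9.81, 1
x = np.arange(0, 20.3956, 0.2956)
xa = 10.5 - t * math.sqrt(g * h0)
xb = 10.5 + 2 * t * math.sqrt(g * h0)

h = np.zeros(len(x))                      # one water depth per position
for i, value in enumerate(x):
    if value <= xa:
        h[i] = h0
    elif value <= xb:                     # value > xa is already guaranteed here
        h[i] = (4.0 / (9.0 * g)) * (math.sqrt(g * h0) - value / (2.0 * t)) ** 2
    # otherwise h[i] stays at 0.0
```

Note that plt.plot(length, h) would still require length and h to have the same number of elements.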
