Astroquery (Python): querying NED with a list of objects

I have extracted a list of Simbad names from a VizieR catalog and would like to get the axis ratio of each object from the NED diameters table. My code is below:
import numpy as np
from astropy.table import Table,Column
from astroquery.vizier import Vizier
from astroquery.ned import Ned
v = Vizier(columns = ['SimbadName','W50max'])
catalist = v.find_catalogs('VIII/73')
v.ROW_LIMIT = -1
a = v.get_catalogs(catalist.keys())
filter = a[0]['W50max'] > 500
targets = a[0][filter]
print targets
simName = targets['SimbadName']
W50max = targets['W50max']
counter = 1
for objects in simName:
    result_table = Ned.get_table(objects, table='diameters')
    ## find where Axis Ratio results are not masked
    notMasked = (np.where(result_table['NED Axis Ratio'].mask == False))
    ## calculate average value of Axis Ratio
    print counter, np.sum(result_table['NED Axis Ratio'])/np.size(notMasked)
    counter += 1
The fourth object in simName has no diameters table, so the query raises an error:
File "/home/tom/VizRauto.py", line 40, in <module>
result_table = Ned.get_table(objects, table='diameters')
File "/usr/local/lib/python2.7/dist-packages/astroquery/ned/core.py", line 505, in get_table
result = self._parse_result(response, verbose=verbose)
File "/usr/local/lib/python2.7/dist-packages/astroquery/ned/core.py", line 631, in _parse_result
raise RemoteServiceError("The remote service returned the following error message.\nERROR: {err_msg}".format(err_msg=err_msg))
RemoteServiceError: The remote service returned the following error message.
ERROR: Unknown error
So I tried:
counter = 1
for objects in simName:
    try:
        result_table = Ned.get_table(objects, table='diameters')
        ## find where Axis Ratio results are not masked
        notMasked = (np.where(result_table['NED Axis Ratio'].mask == False))
        ## calculate average value of Axis Ratio
        print counter, np.sum(result_table['NED Axis Ratio'])/np.size(notMasked)
    except RemoteServiceError:
        continue
    counter += 1
which produces:
Traceback (most recent call last):
File "/home/tom/Dropbox/AST03CosmoLarge/Project/scripts/VizRauto.py", line 57, in <module>
except RemoteServiceError:
NameError: name 'RemoteServiceError' is not defined
So the RemoteServiceError raised in core.py is evidently not recognized in my script. What is the best way to handle this, or is there a better method for querying NED with a list of objects?
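The NameError appears because RemoteServiceError is defined inside astroquery but never imported into the script. The sketch below assumes it can be imported from astroquery.exceptions (worth checking against your installed version); the fallback class only exists so the sketch runs without astroquery. It wraps the per-object query so that an object without a diameters table is reported as None instead of stopping the loop. The names mean_axis_ratios and get_table are hypothetical; passing the query in as a callable just keeps the loop testable without network access.

```python
import numpy as np

# Assumption: astroquery exposes the exception in astroquery.exceptions.
# The fallback class is only a stand-in so this sketch runs without astroquery.
try:
    from astroquery.exceptions import RemoteServiceError
except ImportError:
    class RemoteServiceError(Exception):
        """Stand-in used only when astroquery is not installed."""


def mean_axis_ratios(names, get_table, column='NED Axis Ratio'):
    """Return a list of (name, mean axis ratio or None), one entry per name.

    get_table is any callable mapping an object name to a table. A name for
    which it raises RemoteServiceError (e.g. no diameters table) gets None,
    and so does a name whose axis-ratio column is entirely masked.
    """
    results = []
    for name in names:
        try:
            table = get_table(name)
        except RemoteServiceError:
            results.append((name, None))
            continue
        values = np.ma.asarray(table[column])  # masked entries are ignored below
        if values.count() == 0:
            results.append((name, None))
        else:
            results.append((name, float(values.mean())))
    return results
```

With the question's variables this would be called as mean_axis_ratios(simName, lambda name: Ned.get_table(name, table='diameters')). The mean of the masked column matches the original np.sum(...)/np.size(notMasked), since both skip masked entries.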

Related

Too many values to unpack in multi dictionary

I'm importing data from CSV files and creating many data dictionaries. My code is based on someone else's work with a dataset that has substantially fewer columns than mine. Below are her code, my modification, and the error I'm receiving.
Original Code:
import csv
capacitya = open('C:/Users/Nafiseh/Desktop/Book chapter-code/arc-s.csv', 'r')
csv_capacitya = csv.reader(capacitya)
mydict_capacitya = {}
for row in csv_capacitya:
    mydict_capacitya[(row[0], row[1],row[2])] = float(row[3])
My modification:
# arc capacity
capacitya = open('C:/Users/Emma/Documents/2021-2022/Thesis/Data/arcs.csv', 'r')
csv_capacitya = csv.reader(capacitya)
mydict_capacitya = {}
for row in csv_capacitya:
    mydict_capacitya[(row[0], row[1],row[2])] = list(row[3:22])
When I run this later segment of code:
import csv
from gurobipy import multidict, tuplelist  # assumption: these come from gurobipy
# arc capacity
capacitya = open('C:/Users/Emma/Documents/2021-2022/Thesis/Data/arcs.csv', 'r')
csv_capacitya = csv.reader(capacitya)
mydict_capacitya = {}
for row in csv_capacitya:
    mydict_capacitya[(row[0], row[1],row[2])] = list(row[3:22])
#print(mydict_capacitya)
capacityaatt = open('C:/Users/Emma/Documents/2021-2022/Thesis/Data/distarc.csv', 'r')
csv_capacityaatt = csv.reader(capacityaatt)
mydict_capacityaatt = {}
for row in csv_capacityaatt:
    mydict_capacityaatt[(row[0], row[1],row[2])] = float(row[3])
attarc, capacityatt= multidict(mydict_capacityaatt)
attarc = tuplelist(attarc)
arc, capacitya = multidict(mydict_capacitya)
Error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-29-66e3074f2135> in <module>
120 attarc, capacityatt= multidict(mydict_capacityaatt)
121 attarc = tuplelist(attarc)
--> 122 arc, capacitya = multidict(mydict_capacitya)
123
ValueError: too many values to unpack (expected 2)
If it helps: in both the original code and my modification, columns 0 to 2 (row[0], row[1], row[2]) represent [k, i, j]. In the original dataset, column 3 held the single value. In the updated dataset, columns 3 to 21 (row[3:22]) hold values on the new index g; for example, column 4 holds the values for g = 2.
Thanks!
Edit: Added more relevant segments of code
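Assuming multidict is gurobipy.multidict (tuplelist suggests Gurobi), it returns the list of keys followed by one dictionary per position in the value lists. With 19-element lists it therefore returns 20 objects, which is why unpacking into two names fails; the values are also still strings, since csv.reader does not convert them. One way to keep the two-name unpacking is to move g into the key, as in this sketch. capacity_by_g is a hypothetical helper, and first_g=1 assumes column 3 is g = 1 so that column 4 is g = 2 as described above.

```python
import csv
import io


def capacity_by_g(lines, first_g=1):
    """Map (k, i, j, g) -> float for rows laid out as k, i, j, value_g1, value_g2, ...

    first_g=1 assumes column 3 holds g = 1 (so column 4 holds g = 2).
    """
    capacity = {}
    for row in csv.reader(lines):
        if not row:
            continue  # skip blank lines
        k, i, j = row[0], row[1], row[2]
        for offset, value in enumerate(row[3:22]):
            capacity[(k, i, j, first_g + offset)] = float(value)
    return capacity


# Tiny in-memory stand-in for arcs.csv (hypothetical values).
sample = io.StringIO("a,n1,n2,10,20,30\n")
cap = capacity_by_g(sample)
# With scalar values, multidict returns two objects (keys and one dict):
# arc, capacitya = multidict(cap)
```

With the real file, capacity_by_g(open('.../arcs.csv')) would replace the loop that builds mydict_capacitya, and arc would then be a list of (k, i, j, g) tuples.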

Vector autoregressive (VAR) model fitting with different lag operator

I am a second-year master's (M2) student in computational neuroscience.
I'm at the very end of my analysis and I have a problem fitting a VAR (vector autoregressive) model.
It is a fairly complex problem and concerns testing different lag orders on the data. The problem appears when the Cholesky factorization is computed on a covariance matrix with negative numbers.
I may have found a solution, but I can't work it into the Python function that fits the model (VAR). If someone has ten minutes to help me, please get in touch. Thanks for your attention :)
import numpy as np
import pandas as pd
from statsmodels.tsa.api import VAR
for i in [1,2,3,4,6,8,9,10,12,13,14,15,16,17,18,19,20]:
    print(i)
    df_entropie_G1_w_diff = df_entropie_G1_w.iloc[i,2145:].diff()
    df_RMSE_G1_w_diff = df_g1_RMSE_w.iloc[i,2145:].diff()
    df_var_G1_w_diff = df_var_G1_w.iloc[i,2145:].diff()
    df_data = pd.concat([df_entropie_G1_w_diff,df_RMSE_G1_w_diff,df_var_G1_w_diff],axis = 1)
    df_data = df_data.diff().dropna()
    df_data = df_data.T
    df_data = df_data.reset_index()
    del df_data['index']
    df_data = df_data.T
    df_data['Time'] = pd.to_timedelta(np.arange(537), unit='s')
    df_data.index = df_data['Time']
    del df_data['Time']
    # Arrange names of columns
    df_data_T = df_data.T
    df_data_T = df_data_T.reset_index()
    del df_data_T['index']
    df_data_T = df_data_T.T
    df_data = df_data_T.rename(columns={0:'Entropie',1:'RMSE',2:'Var'})
    model = VAR(df_data)
    liste_aic = []
    liste_bic = []
    liste_fpe = []
    liste_hqic = []
    for a in range(0,25,1):
        result = model.fit(a)
        print('Lag Order =', a)
        print('AIC : ', result.aic)
        print('BIC : ', result.bic)
        print('FPE : ', result.fpe)
        print('HQIC: ', result.hqic, '\n')
        liste_aic.append(result.aic)
        liste_bic.append(result.bic)
        liste_fpe.append(result.fpe)
        liste_hqic.append(result.hqic)
1
Lag Order = 0
AIC : -59.6358271069015
BIC : -59.61188298346849
FPE : 1.260344786813777e-26
HQIC: -59.626460351200464
Lag Order = 1
/opt/anaconda3/lib/python3.8/site-packages/statsmodels/tsa/base/tsa_model.py:578: ValueWarning: An unsupported index was provided and will be ignored when e.g. forecasting.
warnings.warn('An unsupported index was provided and will be'
Traceback (most recent call last):
File "", line 139, in
print('AIC : ', result.aic)
File "/opt/anaconda3/lib/python3.8/site-packages/statsmodels/base/wrapper.py", line 34, in getattribute
obj = getattr(results, attr)
File "/opt/anaconda3/lib/python3.8/site-packages/statsmodels/tsa/vector_ar/var_model.py", line 2139, in aic
return self.info_criteria['aic']
File "pandas/_libs/properties.pyx", line 33, in pandas._libs.properties.CachedProperty.get
File "/opt/anaconda3/lib/python3.8/site-packages/statsmodels/tsa/vector_ar/var_model.py", line 2120, in info_criteria
ld = logdet_symm(self.sigma_u_mle)
File "/opt/anaconda3/lib/python3.8/site-packages/statsmodels/tools/linalg.py", line 28, in logdet_symm
c, _ = linalg.cho_factor(m, lower=True)
File "/opt/anaconda3/lib/python3.8/site-packages/scipy/linalg/decomp_cholesky.py", line 152, in cho_factor
c, lower = _cholesky(a, lower=lower, overwrite_a=overwrite_a, clean=False,
File "/opt/anaconda3/lib/python3.8/site-packages/scipy/linalg/decomp_cholesky.py", line 37, in _cholesky
raise LinAlgError("%d-th leading minor of the array is not positive "
LinAlgError: 3-th leading minor of the array is not positive definite
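Negative entries are allowed in a covariance matrix; what the traceback actually reports is that the residual covariance sigma_u_mle is not positive definite, so the Cholesky factorization used to compute its log-determinant for the information criteria fails. A minimal NumPy-only check that could be run on result.sigma_u_mle before reading result.aic; is_positive_definite is a hypothetical helper name:

```python
import numpy as np


def is_positive_definite(matrix):
    """Return True if `matrix` is square, symmetric and positive definite.

    Uses the same criterion as the failing code: a Cholesky factorization
    succeeds only for positive definite matrices.
    """
    m = np.asarray(matrix, dtype=float)
    if m.ndim != 2 or m.shape[0] != m.shape[1]:
        return False
    if not np.allclose(m, m.T):
        return False  # np.linalg.cholesky would silently use one triangle only
    try:
        np.linalg.cholesky(m)
    except np.linalg.LinAlgError:
        return False
    return True
```

Inside the lag loop, something like if not is_positive_definite(result.sigma_u_mle): continue placed before the print calls would skip the failing lag orders instead of crashing. That only sidesteps the symptom; one possible cause worth checking is that the three series are (nearly) linearly dependent, for example by comparing np.linalg.matrix_rank(df_data.values) with the number of columns.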

OneHotEncoding error when applying to an empty field

The code applies one-hot encoding to two fields of a binetflow file: Proto and State. I have to do this for 5 files. The code below worked perfectly on the first two, but on the third it throws the error:
TypeError: '<' not supported between instances of 'str' and 'float'.
I'm sure the error comes from the line 0.000000,icmp,,60,60.0,0 of that file, in which the State field is empty.
For such rows I want to simply skip the one-hot encoding, copy the State field as it is (empty), and move on to the next line.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
df = opendataset()
df['State2'] = df['State']
df['Proto2'] = df['Proto']
df['Dur'] = df.Dur.apply(lambda n: '%.6f' % n)
le = LabelEncoder()
dfle = df
dfle.State = le.fit_transform(dfle.State)
X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()
OnehotX = ohe.fit_transform(X).toarray()
OnehotY = ohe.fit_transform(Y).toarray()
dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)
dfle['State'] = (dx[dx.columns[0:]].apply(lambda x:''.join(x.dropna().astype(int).astype(str)), axis=1))
dfle['Proto'] = (dy[dy.columns[0:]].apply(lambda y:''.join(y.dropna().astype(int).astype(str)), axis=1))
08-03 Edit
Below is the traceback I get when I run the code above. As you can see, the error is raised at dfle.State = le.fit_transform(dfle.State), and consequently at OnehotX = ohe.fit_transform(X).toarray().
Traceback (most recent call last):
  File "C:/Users/V/PycharmProjects/PreProcess/testfile.py", line 39, in <module>
    dfle.State = le.fit_transform(dfle.State)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 236, in fit_transform
    self.classes_, y = _encode(y, encode=True)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 108, in _encode
    return _encode_python(values, uniques, encode)
  File "C:\Users\V\PycharmProjects\PreProcess\venv\lib\site-packages\sklearn\preprocessing\label.py", line 63, in _encode_python
    uniques = sorted(set(values))
TypeError: '<' not supported between instances of 'str' and 'float'
NEW CODE:
I tried what Hemerson Tacon suggested and wrapped the lines where the traceback reports an error in try/except blocks, but it still reports an error and then throws a different one.
le = LabelEncoder()
dfle = df
try:
    dfle.State = le.fit_transform(dfle.State)
except TypeError:
    pass
X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()
try:
    OnehotX = ohe.fit_transform(X).toarray()
except ValueError:
    pass
OnehotY = ohe.fit_transform(Y).toarray()
dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)
dfle['State'] = (dx[dx.columns[0:]].apply(lambda x:''.join(x.dropna().astype(int).astype(str)), axis=1))
dfle['Proto'] = (dy[dy.columns[0:]].apply(lambda y:''.join(y.dropna().astype(int).astype(str)), axis=1))
NEW ERROR:
Traceback (most recent call last):
  File "C:/Users/V/PycharmProjects/PreProcess/testfile.py", line 53, in <module>
    dx = pd.DataFrame(data=OnehotX)
NameError: name 'OnehotX' is not defined
LAST EDIT 09/03
The solution was simply to add a replace() call, so that when the file is read, the NaN values in State are replaced with the string "empty" before encoding, which fixes the problem:
dfle['State'] = dfle['State'].replace(np.nan, "empty")
import numpy as np
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
df = opendataset()
df['State2'] = df['State']
df['Proto2'] = df['Proto']
df['Dur'] = df.Dur.apply(lambda n: '%.6f' % n)
le = LabelEncoder()
dfle = df
# replace empty State fields (read as NaN) before label encoding
dfle['State'] = dfle['State'].replace(np.nan, "empty")
dfle.State = le.fit_transform(dfle.State)
X = dfle[['State']].values
Y = dfle[['Proto']].values
ohe = OneHotEncoder()
OnehotX = ohe.fit_transform(X).toarray()
OnehotY = ohe.fit_transform(Y).toarray()
dx = pd.DataFrame(data=OnehotX)
dy = pd.DataFrame(data=OnehotY)
You could put the code in question inside a try block and catch the TypeError exception, then check whether this is the case where the State field is empty: if so, ignore it as you said; if not, raise the error again.
If you had posted the actual code that applies the one-hot encoding to your data, it would be easier to answer you and to provide some code in the answer.
Edit
The OnehotX variable is assigned only inside the try block, so when the exception is raised it is never defined. Define it before that block (something like OnehotX = None) to fix this NameError. Also, as I said before, it is good practice in the except block to check whether the exception is actually caused by the problem you identified, that is, whether the State field is empty.
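The answer's pattern (catch the TypeError, check that the empty State field is the cause, otherwise re-raise) can be combined with the "empty" placeholder from the asker's final edit. A minimal sketch; fit_transform_allowing_empty is a hypothetical helper, and encoder is assumed to be anything with a fit_transform method, such as the question's LabelEncoder. Because the result is returned on every successful path, there is no variable that can be left undefined as OnehotX was.

```python
import pandas as pd


def fit_transform_allowing_empty(encoder, series: pd.Series, placeholder: str = "empty"):
    """Call encoder.fit_transform(series).

    If that raises TypeError and the column contains missing values (an empty
    State field is read as NaN), retry with those values replaced by
    `placeholder`. Any other TypeError is re-raised unchanged.
    """
    try:
        return encoder.fit_transform(series)
    except TypeError:
        if not series.isna().any():
            raise  # not the empty-field case: let the original error surface
        return encoder.fit_transform(series.fillna(placeholder))


# Usage with the question's names (LabelEncoder from sklearn.preprocessing):
# dfle['State'] = fit_transform_allowing_empty(LabelEncoder(), dfle['State'])
```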

Error in scikit code

I am new to machine learning and am trying the Titanic problem from Kaggle. I have written the code below, which uses a decision tree on the data, but there is an error I am unable to get rid of.
Code:
#!/usr/bin/env python
from __future__ import print_function
import pandas as pd
import numpy as np
from sklearn import tree
train_uri = './titanic/train.csv'
test_uri = './titanic/test.csv'
train = pd.read_csv(train_uri)
test = pd.read_csv(test_uri)
# print(train[train["Sex"] == 'female']["Survived"].value_counts(normalize=True))
train['Child'] = float('NaN')
train.loc[train['Age'] < 18, 'Child'] = 1
train.loc[train['Age'] >= 18, 'Child'] = 0
# print(train[train['Child'] == 1]['Survived'].value_counts(normalize=True))
# print(train['Embarked'][train['Embarked'] == 'C'].value_counts())
# print(train.shape)
## Fill empty 'Embarked' values with 'S'
train['Embarked'] = train['Embarked'].fillna('S')
## Convert Embarked classes to integers (.loc avoids chained assignment)
train.loc[train['Embarked'] == 'S', 'Embarked'] = 0
train.loc[train['Embarked'] == 'C', 'Embarked'] = 1
train.loc[train['Embarked'] == 'Q', 'Embarked'] = 2
train.loc[train['Sex'] == 'male', 'Sex'] = 0
train.loc[train['Sex'] == 'female', 'Sex'] = 1
target = train['Survived'].values
features_a = train[['Pclass', 'Sex', 'Age', 'Fare']].values
tree_a = tree.DecisionTreeClassifier()
##### Line With Error #####
tree_a = tree_a.fit(features_a, target)
# print(tree_a.feature_importances_)
# print(tree_a.score(features_a, target))
Error:
Traceback (most recent call last):
File "titanic.py", line 40, in <module>
tree_a = tree_a.fit(features_a, target)
File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 739, in fit
X_idx_sorted=X_idx_sorted)
File "/usr/local/lib/python2.7/dist-packages/sklearn/tree/tree.py", line 122, in fit
X = check_array(X, dtype=DTYPE, accept_sparse="csc")
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 407, in check_array
_assert_all_finite(array)
File "/usr/local/lib/python2.7/dist-packages/sklearn/utils/validation.py", line 58, in _assert_all_finite
" or a value too large for %r." % X.dtype)
ValueError: Input contains NaN, infinity or a value too large for dtype('float32').
This error does not occur when I run the code on the DataCamp server, only when I run it locally. I don't understand why it comes up: I have checked the data, and neither features_a nor target contains NaN or very large values.
Try each feature one by one and you will probably find that one of them has some nulls. Note that you do not check whether Sex has nulls.
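Rather than testing features one at a time, isna().sum() counts the missing values in every column at once. A minimal sketch on a hypothetical three-row stand-in for train.csv; with the real data you would use the DataFrame from pd.read_csv(train_uri). In the Kaggle training file, Age is one of the columns with missing values, and Age is part of features_a.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv (after the question's Sex encoding).
train = pd.DataFrame({
    'Pclass': [3, 1, 3],
    'Sex': [0, 1, 1],
    'Age': [22.0, np.nan, 26.0],
    'Fare': [7.25, 71.28, 7.93],
})

features = ['Pclass', 'Sex', 'Age', 'Fare']
null_counts = train[features].isna().sum()  # missing values per feature column
print(null_counts[null_counts > 0])          # only the columns that need attention
```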
Also, coding each categorical variable manually makes mistakes easy, for example misspelling one of the categories. Instead, you can use df = pd.get_dummies(df), which automatically encodes all the categorical variables for you, with no need to specify each category manually.
You can also use the pandas dropna() function to drop all rows of the dataset that contain invalid values such as NaN.
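The get_dummies and dropna suggestions can be combined into one preparation step. A sketch under assumptions: prepare_features is a hypothetical helper, and the toy DataFrame stands in for train.csv with Sex left as raw strings. Dropping rows is only an option for the training data; on test.csv every row needs a prediction, so missing values there would typically have to be filled (for example with fillna) instead.

```python
import numpy as np
import pandas as pd


def prepare_features(df, feature_cols, target_col):
    """Drop rows with missing values in the used columns, then one-hot encode
    the remaining string (categorical) feature columns."""
    subset = df[feature_cols + [target_col]].dropna()
    X = pd.get_dummies(subset[feature_cols])  # e.g. Sex -> Sex_female, Sex_male
    y = subset[target_col].values
    return X, y


# Hypothetical stand-in for train.csv.
train = pd.DataFrame({
    'Survived': [0, 1, 1],
    'Pclass': [3, 1, 3],
    'Sex': ['male', 'female', 'female'],
    'Age': [22.0, np.nan, 26.0],
    'Fare': [7.25, 71.28, 7.93],
})
X, y = prepare_features(train, ['Pclass', 'Sex', 'Age', 'Fare'], 'Survived')
```

X and y can then be passed to tree_a.fit(X, y) in place of features_a and target.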

Index Error after for loop has already completed one loop

I'm trying to plot the last 30 days of SST data using a for loop. The code runs through the first iteration fine but then gives this error on the second:
Traceback (most recent call last):
File "sstt.py", line 20, in <module>
Temp = Temp[i,:,:]
IndexError: too many indices for array
It doesn't matter which index I start on; the second iteration always gives this error. If I start at -29, then -28 fails; if I start at -28, -27 fails, and so on.
Code:
import numpy as np
import math as m
import urllib2
from pydap.client import open_url
from pydap.proxy import ArrayProxy
data_url_mean = 'http://www.esrl.noaa.gov/psd/thredds/dodsC/Datasets/noaa.oisst.v2.highres/sst.day.mean.2015.v2.nc'
dataset1 = open_url(data_url_mean)
# Daily Mean
Temp = dataset1['sst']
timestep = [-29,-28,-27,-26,-25,-24,-23,-22,-21,-20,-19,-18,-17,-16,-15,-14,-13,-12,-11,-10,-9,-8,-7,-6,-5,-4,-3,-2,-1]
for i in timestep:
    # Daily Mean
    Temp = Temp[i,:,:]
    Temp = Temp.array[:]
    Temp = Temp * (9./5.) + 32.
    Temp = Temp.squeeze()
    print i
You're assigning all of your values to the same variable. After the first pass through the loop, Temp no longer refers to the dataset but to a single squeezed 2-D day of data, so indexing it again as Temp[i,:,:] with three indices fails.
You need to use new names for the variables that hold each day's values, and keep Temp pointing at the dataset.
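A minimal sketch of the renaming fix, assuming the dataset indexes like a 3-D array ordered (time, lat, lon); a NumPy array stands in for dataset1['sst'] so it runs offline. daily_fahrenheit is a hypothetical helper. With pydap you would keep the question's extraction step but assign it to the new name, e.g. day = Temp[i,:,:].array[:], instead of using np.asarray.

```python
import numpy as np


def daily_fahrenheit(sst, steps):
    """Yield (step, 2-D Fahrenheit field) for each time index in steps.

    sst is assumed to index like a 3-D array ordered (time, lat, lon). It is
    never reassigned, so every iteration slices the full dataset.
    """
    for i in steps:
        day_c = np.asarray(sst[i, :, :], dtype=float)  # new name; sst untouched
        day_f = day_c * (9.0 / 5.0) + 32.0
        yield i, day_f.squeeze()


# Stand-in for dataset1['sst']: 30 days of a 2 x 3 grid, all 0 degrees C.
fake_sst = np.zeros((30, 2, 3))
days = list(daily_fahrenheit(fake_sst, range(-29, 0)))  # same steps as timestep
```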
