Issue appending info from networkx dict to a numpy array - python

Trying to take strings from my networkx dict and append them to a certain place within an array:
def sameSports(node1, node2, G):
    A = [G.node[node1]['sport1'], G.node[node1]['sport2'], G.node[node1]['sport3']]
    B = [G.node[node2]['sport1'], G.node[node2]['sport2'], G.node[node2]['sport3']]
    sharedSports = np.intersect1d(A,B)
    sharedSports = np.delete(sharedSports, np.where(sharedSports == 'N/A'))
    return sharedSports
def meetingDay(sharedSports,rightAxis):
    meetingDays = [[] for i in range (len(sharedSports))]
    columns=['Monday','Tuesday','Wednesday','Thursday','Friday']
    for i in range (0,5):
        for t in sharedSports:
            n=np.where(sharedSports==t)
            meetingDays[n].append(t)
            if t in rightAxis[i]:
                meetingDays[n].append(columns[i])
    return meetingDays
The G.node[n]['sport'] dicts return single letter strings 'A'-'W', and the rightAxis array is a 5-section array containing some of the same letters. Getting this error:
Traceback (most recent call last):
File ".\main.py", line 18, in <module>
a,b=add_sport_to_edges(G , rightAxis)
File "C:\Users\rjtkr\OneDrive\Documents\Work\Summer 2020 Research\epidemic\Athletics Network\sportsnetwork.py", line 99, in add_sport_to_edges
temp = meetingDay(sharedSports, rightAxis)
File "C:\Users\rjtkr\OneDrive\Documents\Work\Summer 2020 Research\epidemic\Athletics Network\sportsnetwork.py", line 90, in meetingDay
meetingDays[n].append(t)
TypeError: list indices must be integers or slices, not tuple
Unsure if it's 'n' or 't' that it doesn't like, but fairly sure that neither is a tuple. Hopefully I included all relevant code. The other lines showing up in the error come after the attached code, and the error remains the same when they are removed.
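np.where called with only a condition returns a tuple of index arrays, so n in meetingDay is a tuple even when there is exactly one match, and a list cannot be indexed with it. A minimal sketch that avoids the lookup by using enumerate, assuming each shared sport should be recorded once, followed by the days it meets (meeting_day and the sample data are illustrative, not the original code):

```python
import numpy as np

COLUMNS = ['Monday', 'Tuesday', 'Wednesday', 'Thursday', 'Friday']

def meeting_day(shared_sports, right_axis):
    """Return one list per shared sport: the sport, then the days it meets."""
    # Assumption: the sport name is stored once per entry, not once per day.
    meeting_days = [[str(sport)] for sport in shared_sports]
    for n, sport in enumerate(shared_sports):   # n is a plain int position
        for i, day in enumerate(COLUMNS):
            if sport in right_axis[i]:
                meeting_days[n].append(day)
    return meeting_days

# Illustrative data: two shared sports and a 5-section schedule.
shared = np.array(['A', 'C'])
right_axis = [['A', 'B'], ['C'], ['A'], [], ['C', 'D']]

# np.where with a single argument yields a tuple of index arrays.
print(type(np.where(shared == 'A')))
print(meeting_day(shared, right_axis))
```

If the position of a single match is still needed, np.where(sharedSports == t)[0][0] pulls the first integer index out of that tuple.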

Related

index out of range for splitting file names

I have a list of ~1,000 data files, and at the beginning of my program I split each file name into its component parts in a for loop. The code runs for about 726 of the files and then I get the error:
Traceback (most recent call last):
File "c:\Py Practice\ABBRH.py", line 58, in <module>
tube2 = sub3[2]
IndexError: list index out of range
I understand what the error means, but when I look at the file name it stops on, the index does exist. In fact, when I run that single file by itself through the same code, it works. The files are somewhat different, but they should all work the same.
The file it stops on is "opimized.new.12_12_40-90-12_12_40.Ni00Nj00.lammps".
The numbers are arranged as opimized.new.n1_m1_l1-angle-n2_m2_l2.
sub1=(file.split('.'))
sub2=sub1[2]
sub3=(sub2.split('-'))
tube1 = sub3[0]
nma1 = (tube1.split('_'))
n1 = int(nma1[0])
m1 = int(nma1[1])
tube2 = sub3[2]
nma2 = (tube2.split('_'))
n2 = int(nma2[0])
m2 = int(nma2[1])
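One way to narrow this down is to check each split before indexing into it and to report the offending name, since the name the loop actually fails on may not be the one that was inspected. A minimal sketch under that assumption; parse_tube_name is a hypothetical helper, and it expects exactly the n1_m1_l1-angle-n2_m2_l2 layout described above:

```python
def parse_tube_name(filename):
    """Extract (n1, m1, n2, m2) from 'prefix.new.n1_m1_l1-angle-n2_m2_l2...'."""
    parts = filename.split('.')
    if len(parts) < 3:
        raise ValueError("too few '.'-separated parts in %r" % filename)
    fields = parts[2].split('-')
    if len(fields) != 3:
        # Names with a missing or extra '-' end up here instead of raising
        # a bare IndexError further down.
        raise ValueError("expected tube1-angle-tube2 in %r, got %r" % (filename, fields))
    tube1, _angle, tube2 = fields
    n1, m1 = (int(v) for v in tube1.split('_')[:2])
    n2, m2 = (int(v) for v in tube2.split('_')[:2])
    return n1, m1, n2, m2

print(parse_tube_name("opimized.new.12_12_40-90-12_12_40.Ni00Nj00.lammps"))

# A name that does not follow the pattern is reported with its own text.
try:
    parse_tube_name("opimized.new.12_12_40.lammps")
except ValueError as err:
    print(err)
```

Calling this inside the loop and printing the file name on failure shows which of the ~1,000 names breaks the pattern.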

list index out of range reading files

I am a beginner in Python and would appreciate some help. I have run into a problem when trying to manipulate some .dat files.
I have created 159 .dat files (refitted_to_digit0_0.dat, refitted_to_digit0_1.dat, ..., refitted_to_digit0_158.dat), each containing time series data in two columns (timestep, value) and 2999 rows. In my Python program I have created a list of these files and an empty list for the results:
filelist_refit_0=glob.glob('refitted_to_digit0_*')
plist_refit_0=[]
I now try to load the second column of each of the 159 files into plist_refit_0, so that each entry in the list contains an array of 2999 values (the second column) that I will use for further manipulation. I have created a for loop for this and use len(filelist_refit_0) as the range for the loop. The length is 159 (number of files: 0-158).
However, when I run this I get an error message: list index out of range.
I have tried a lower range for the for loop, and it seems to work up to range 66 but not above that. filelist_refit_0[66] refers to the file refitted_to_digit0_158.dat, filelist_refit_0[67] refers to refitted_to_digit0_16.dat, and filelist_refit_0[158] refers to refitted_to_digit0_99.dat. Instead of being sorted in ascending order by the numeric value 0->158, I think filelist_refit_0 has the files in ascending order by their digits: refitted_to_digit0_0.dat first, then refitted_to_digit0_1.dat, then refitted_to_digit0_10.dat, then refitted_to_digit0_100.dat, then refitted_to_digit0_101.dat, which puts refitted_to_digit0_158.dat at position 66 in the list.
However, I still don't understand why the interpreter reports the index as out of range above 66 when the length of filelist_refit_0 is 159 and there really are 159 files, no matter the order. If anyone can explain this and has some advice on how to solve this problem, I would highly appreciate it! Thanks for your help.
I have tried the following to understand the sorting:
print len(filelist_refit_0) => 159
print filelist_refit_0[66] => refitted_to_digit0_158.dat
print filelist_refit_0[67] => refitted_to_digit0_16.dat
print filelist_refit_0[158] => refitted_to_digit0_99.dat
print filelist_refit_0[0] => refitted_to_digit0_0.dat
I have "manually" loaded the files and it seems to work for most index e.g.
t, p = loadtxt(filelist_refit_0[65], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
t, p = loadtxt(filelist_refit_0[67], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
print plist_refit_0[0]
print plist_refit_0[1]
BUT it does not work for index 66:
t, p = loadtxt(filelist_refit_0[66], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
Then I get the error: list index out of range.
As can be seen above, it refers to refitted_to_digit0_158.dat, which is the last file. I have looked into the file and it looks exactly the same as all the other files, with the same number of columns and rows (2999). Why is this entry different?
Python 2:
filelist_refit_0 = glob.glob('refitted_to_digit0_*')
plist_refit_0 = []
for i in range(len(filelist_refit_0)):
    t, p = loadtxt(filelist_refit_0[i], usecols=(0, 1), unpack=True)
    plist_refit_0.append(p)
Traceback (most recent call last):
File "test.py", line 107, in <module>
t,p=loadtxt(filelist_refit_0[i],usecols=(0,1),unpack=True)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1092, in loadtxt
for x in read_data(_loadtxt_chunksize):
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1012, in read_data
vals = [vals[j] for j in usecols]
IndexError: list index out of range
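The traceback ends inside loadtxt at vals[j] for j in usecols, which suggests that some line of the file being read has fewer fields than usecols asks for, rather than filelist_refit_0 itself being indexed out of range. A minimal sketch for locating such a line, assuming whitespace-separated columns; find_short_rows and natural_key are hypothetical helper names, not part of NumPy:

```python
import re

def natural_key(name):
    """Split a name into text and integer chunks so '_16' sorts before '_158'."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'(\d+)', name)]

def find_short_rows(lines, min_cols=2):
    """Return (line_number, text) for data lines with fewer than min_cols fields."""
    bad = []
    for lineno, line in enumerate(lines, start=1):
        text = line.strip()
        # Blank lines and '#' comments are skipped here as well.
        if text and not text.startswith('#') and len(text.split()) < min_cols:
            bad.append((lineno, text))
    return bad

# Illustrative data: the third line is truncated to a single column.
sample_lines = ['0 1.5\n', '1 2.5\n', '2\n', '\n']
print(find_short_rows(sample_lines))

names = ['refitted_to_digit0_158.dat', 'refitted_to_digit0_16.dat',
         'refitted_to_digit0_2.dat']
print(sorted(names, key=natural_key))
```

On a real file, passing the open file object (for example find_short_rows(open(path))) would list any truncated lines, such as a last row that was still being written.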

How to fix "TypeError: 'int' object is not iterable" error in concurrent.futures threading?

My goal is to scrape some links, using threads to do it faster.
When I try to create the threads, it raises TypeError: 'int' object is not iterable.
Here is our script:
import requests
import pandas
import json
import concurrent.futures
from collections.abc import Iterable
# our profiles that we will scrape
profile = ['kaid_329989584305166460858587','kaid_896965538702696832878421','kaid_1016087245179855929335360','kaid_107978685698667673890057','kaid_797178279095652336786972','kaid_1071597544417993409487377','kaid_635504323514339937071278','kaid_415838303653268882671828','kaid_176050803424226087137783']
# lists of the data that we are going to fill up with each profile
total_project_votes=[]
def scraper(kaid):
    data = requests.get('https://www.khanacademy.org/api/internal/user/scratchpads?casing=camel&kaid={}&sort=1&page=0&limit=40000&subject=all&lang=en&_=190425-1456-9243a2c09af3_1556290764747'.format(kaid))
    sum_votes=[]
    try:
        data=data.json()
        for item in data['scratchpads']:
            try :
                sum_votes=item['sumVotesIncremented']
            except KeyError:
                pass
        sum_votes=map(int,sum_votes) # change all items of the list in integers
        print(isinstance(sum_votes, Iterable)) #to check if it is an iterable element
        print(isinstance(sum_votes, int)) # to check if it is a int element
        sum_votes=list(sum_votes) # transform into a list
        sum_votes=map(abs,sum_votes) # change all items in absolute value
        sum_votes=list(sum_votes) # transform into a list
        sum_votes=sum(sum_votes) # sum all items in the list
        sum_votes=str(sum_votes) # transform into a string
        total_project_votes=sum_votes
    except json.decoder.JSONDecodeError:
        total_project_votes='NA'
    return total_project_votes
with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    future_kaid = {executor.submit(scraper, kaid): kaid for kaid in profile}
    for future in concurrent.futures.as_completed(future_kaid):
        kaid = future_kaid[future]
        results = future.result()
        # print(results) why printing only one of them and then stops?
        total_project_votes.append(results[0])
# write into a dataframe and print it:
d = {'total_project_votes':total_project_votes}
dataframe = pandas.DataFrame(data=d)
print(dataframe)
I expected to get this output:
total_project_votes
0 0
1 2353
2 41
3 0
4 0
5 12
6 5529
7 NA
8 2
But instead I get this error:
TypeError: 'int' object is not iterable
I don't really understand what this error means. What is wrong in my script? How can I solve it?
When I look at Traceback it looks like this is where the issue is coming from:
sum_votes=map(int,sum_votes).
Some additional information is below.
Traceback:
Traceback (most recent call last):
File "toz.py", line 91, in <module>
results = future.result()
File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 425, in result
return self.__get_result()
File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\_base.py", line 384, in __get_result
raise self._exception
File "C:\Users\*\AppData\Local\Programs\Python\Python37-32\lib\concurrent\futures\thread.py", line 57, in run
result = self.fn(*self.args, **self.kwargs)
File "my_scrap.py", line 71, in scraper
sum_votes=map(int,sum_votes) # change all items of the list in integers
TypeError: 'int' object is not iterable
I found my error:
I should have put:
sum_votes.append(item['sumVotesIncremented'])
Instead of:
sum_votes=item['sumVotesIncremented']
Also, scraper returns only one value, total_project_votes, so results is that string itself rather than a tuple.
That causes a second problem: results[0] does not give the whole value.
It gives the first character of the string instead of the whole total_project_votes. (For example, "Hello" becomes "H".)
And if total_project_votes were an int instead of a string, results[0] would raise a different error (TypeError: 'int' object is not subscriptable).
To solve this, I can append results directly, or have scraper return a tuple (for instance by returning a second value) so that results[0] is the whole first item.
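A minimal sketch of both fixes together, with the network call replaced by a hypothetical in-memory dictionary (FAKE_RESPONSES, total_votes and collect are illustrative names, and the sample votes are made up) so that it runs without contacting the API:

```python
import concurrent.futures

# Hypothetical stand-in for the JSON the API would return, keyed by profile id.
FAKE_RESPONSES = {
    'kaid_a': {'scratchpads': [{'sumVotesIncremented': 3},
                               {'sumVotesIncremented': -2},
                               {}]},  # entry without the key is skipped
    'kaid_b': {'scratchpads': []},
}

def total_votes(kaid):
    """Sum the absolute vote counts for one profile and return them as a string."""
    data = FAKE_RESPONSES[kaid]
    votes = []
    for item in data['scratchpads']:
        try:
            votes.append(item['sumVotesIncremented'])  # append, do not overwrite
        except KeyError:
            pass
    return str(sum(abs(int(v)) for v in votes))

def collect(profiles):
    """Run total_votes in a thread pool; results come back in input order."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        return list(executor.map(total_votes, profiles))

totals = collect(['kaid_a', 'kaid_b'])
print(totals)
```

executor.map is used here instead of as_completed because it yields results in the order of the input list, which keeps each total aligned with its profile; the whole returned string is stored, not results[0].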

Python appending a list in a for loop with numpy array data

I am writing a program that will append a list with a single element pulled from a 2 dimensional numpy array. So far, I have:
# For loop to get correlation data of selected (x,y) pixel for all bands
zdata = []
for n in d.bands:
    cor_xy = np.array(d.bands[n])
    zdata.append(cor_xy[y,x])
Every time I run my program, I get the following error:
Traceback (most recent call last):
File "/home/sdelgadi/scr/plot_pixel_data.py", line 36, in <module>
cor_xy = np.array(d.bands[n])
TypeError: only integer arrays with one element can be converted to an index
My method works when I try it from the python interpreter without using a loop, i.e.
>>> zdata = []
>>> a = np.array(d.bands[0])
>>> zdata.append(a[y,x])
>>> a = np.array(d.bands[1])
>>> zdata.append(a[y,x])
>>> print(zdata)
[0.59056658, 0.58640128]
What is different about creating a for loop and doing this manually, and how can I get my loop to stop causing errors?
You're treating n as if it's an index into d.bands when it's an element of d.bands:
zdata = []
for n in d.bands:
    cor_xy = np.array(n)
    zdata.append(cor_xy[y,x])
You say a = np.array(d.bands[0]) works. The first n should be exactly the same thing as d.bands[0]. If so then np.array(n) is all you need.
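To make the difference concrete, here is a small sketch with a hypothetical stand-in for d.bands (a plain list of 2-D arrays; the real object may behave differently). Iterating over the elements and iterating over integer positions pick out the same pixel values:

```python
import numpy as np

# Hypothetical stand-in for d.bands: two 2x3 "bands".
bands = [np.arange(6).reshape(2, 3), np.arange(6, 12).reshape(2, 3)]
y, x = 1, 2

# Iterate over the elements themselves (what "for n in d.bands" gives you).
by_element = [np.array(band)[y, x] for band in bands]

# Iterate over integer positions (what the interactive session did by hand).
by_index = [np.array(bands[i])[y, x] for i in range(len(bands))]

print(by_element, by_index)
```

If the band number is needed as well, enumerate(bands) yields the position and the element together.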

Tuples Vs List Vs Numpy Arrays for Plotting a Boxplot in Python

I am trying to plot a boxplot for a column in several csv files (without the header row of course), but running into some confusion around tuples, lists and arrays. Here is what I have so far
#!/usr/bin/env python
import csv
from numpy import *
import pylab as p
import matplotlib
#open one file, until boxplot-ing works
f = csv.reader (open('2-node.csv'))
#get all the columns in the file
timeStamp,elapsed,label,responseCode,responseMessage,threadName,dataType,success,bytes,Latency = zip(*f)
#Make list out of elapsed to pop the 1st element -- the header
elapsed_list = list(elapsed)
elapsed_list.pop(0)
#Turn list back to a tuple
elapsed = tuple(elapsed_list)
#Turn list to an numpy array
elapsed_array = array(elapsed_list)
#Elapsed Column statically entered into an array
data_array = ([4631, 3641, 1902, 1937, 1745, 8937] )
print data_array #prints in this format: ([xx,xx,xx,xx]), .__class__ is list ... ?
print elapsed #prints in this format: ('xx','xx','xx','xx'), .__class__ is tuple
print elapsed_list # #print in this format: ['xx', 'xx', 'xx', 'xx', 'xx'], .__class__ is list
print elapsed_array #prints in this format: ['xx' 'xx' 'xx' 'xx' 'xx'] -- notice no commas, .__class__ is numpy.ndarray
p.boxplot (data_array) #works
p.boxplot (elapsed) # does not work, error below
p.boxplot (elapsed_list) #does not work
p.boxplot (elapsed_array) #does not work
p.show()
For boxplots, the 1st argument is "an array or a sequence of vectors", so I would think elapsed_array would work ... ? But yet data_array, a "list," works ... but elapsed_list, also a "list", does not ... ? Is there a better way to do this ... ?
I am fairly new to Python, and I would like to understand what about the differences among a tuple, list, and numpy array prevents this boxplot from working.
Example error message is:
Traceback (most recent call last):
File "../pullcol.py", line 32, in <module>
p.boxplot (elapsed_list)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/pyplot.py", line 1962, in boxplot
ret = ax.boxplot(x, notch, sym, vert, whis, positions, widths, patch_artist, bootstrap)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/axes.py", line 5383, in boxplot
q1, med, q3 = mlab.prctile(d,[25,50,75])
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 946, in prctile
return _interpolate(values[ai],values[bi],frac)
File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/matplotlib/mlab.py", line 920, in _interpolate
return a + (b - a)*fraction
TypeError: unsupported operand type(s) for -: 'numpy.ndarray' and 'numpy.ndarray'
elapsed contains strings. Matplotlib needs integers or floats to plot something. Try converting each value of elapsed to an integer. You can do this like so:
elapsed = tuple([int(i) for i in elapsed])
or as FredL commented below:
elapsed_list = array(elapsed_list, dtype=float)
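A small sketch of that conversion, assuming a CSV with a header row and an elapsed column (the in-memory sample below is made up and stands in for 2-node.csv); it shows that csv.reader yields strings and that passing dtype=float to numpy.array turns them into numbers that can be plotted:

```python
import csv
import io

import numpy as np

# Made-up sample standing in for the real file.
sample = io.StringIO("timeStamp,elapsed\n1,4631\n2,3641\n3,1902\n")

rows = list(csv.reader(sample))
header, body = rows[0], rows[1:]          # drop the header row
col = header.index('elapsed')

elapsed_strings = [row[col] for row in body]           # csv gives strings
elapsed_values = np.array(elapsed_strings, dtype=float)

print(np.array(elapsed_strings).dtype.kind)   # 'U' means unicode strings
print(elapsed_values)
```

An array like elapsed_values holds numbers rather than strings, which is what the subtraction inside the percentile calculation in the traceback needs.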
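I'm not familiar with numpy or matplotlib, but just from the description and what's working, it might seem that it is looking for a nested sequence of sequences. Note, however, that data_array is not a tuple containing a list: parentheses without a trailing comma do not create a tuple, so ([4631, 3641, ...]) is a plain list, one layer deep like your other inputs. What sets it apart is that it holds numbers, whereas the others hold strings.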
As for the differences, a list is a mutable sequence of objects, a tuple is an immutable sequence of objects and an array is a mutable sequence of bytes, ints, chars (basically 1, 2, 4 or 8 byte values).
Here's a link to the Python docs about 5.6. Sequence Types, from there you can jump to more detailed info about lists, tuples, arrays or any of the other sequence types in Python.
