I am a beginner in Python and need some help. I have run into a problem when trying to manipulate some .dat files.
I have created 159 .dat files (refitted_to_digit0_0.dat, refitted_to_digit0_1.dat, ..., refitted_to_digit0_158.dat), each containing time series data in two columns (timestep, value) with 2999 rows. In my Python program I have created a list of these files with filelist_refit_0 = glob.glob('refitted_to_digit0_*')
plist_refit_0=[]
I now try to load the second column of each of the 159 files into plist_refit_0, so that each entry in the list holds an array of 2999 values (the second column) that I will use for further manipulation. I have created a for loop for this and use len(filelist_refit_0) as the range for the loop, the length being 159 (files 0-158).
However, when I run this I get an error message: list index out of range.
I have tried a lower range for the for loop and it seems to work up to range 66 but not above that. filelist_refit_0[66] refers to refitted_to_digit0_158.dat, filelist_refit_0[67] refers to refitted_to_digit0_16.dat, and filelist_refit_0[158] refers to refitted_to_digit0_99.dat. So instead of being sorted in ascending numeric order (0 -> 158), I think filelist_refit_0 holds the files in ascending string order based on the digits: refitted_to_digit0_0.dat first, then refitted_to_digit0_1.dat, then refitted_to_digit0_10.dat, then refitted_to_digit0_100.dat, then refitted_to_digit0_101.dat, which puts refitted_to_digit0_158.dat at position 66 in the list. However, I still don't understand why the index is reported as out of range above 66 when len(filelist_refit_0) is 159 and there really are 159 files, whatever the order. If anyone can explain this and offer some advice on how to solve it, I would highly appreciate it. Thanks for your help.
I have tried the following to understand the sorting:
print len(filelist_refit_0) => 159
print filelist_refit_0[66] => refitted_to_digit0_158.dat
print filelist_refit_0[67] => refitted_to_digit0_16.dat
print filelist_refit_0[158] => refitted_to_digit0_99.dat
print filelist_refit_0[0] => refitted_to_digit0_0.dat
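(An aside on the ordering itself: glob.glob makes no promise about order, and here it has evidently returned plain string order, in which '158' sorts before '16'. A minimal sketch of forcing numeric order, assuming every filename really ends in _<number>.dat:

import glob
import re

filelist_refit_0 = glob.glob('refitted_to_digit0_*')
# Sort on the trailing number rather than on the raw string.
filelist_refit_0.sort(key=lambda name: int(re.search(r'_(\d+)\.dat$', name).group(1)))

Sorting alone does not explain the IndexError, though: indexing past 66 is still valid for a 159-element list regardless of order.)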
I have "manually" loaded the files and it seems to work for most indices, e.g.
t, p = loadtxt(filelist_refit_0[65], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
t, p = loadtxt(filelist_refit_0[67], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
print plist_refit_0[0]
print plist_refit_0[1]
BUT it does not work for index 66:
t, p = loadtxt(filelist_refit_0[66], usecols=(0, 1), unpack=True)
plist_refit_0.append(p)
Then I get the error: list index out of range.
As can be seen above, index 66 refers to refitted_to_digit0_158.dat, which is the last file. I have looked into the file and it looks exactly the same as all the other files, with the same number of columns and rows (2999). Why is this entry different?
Python 2:
import glob
from numpy import loadtxt

filelist_refit_0 = glob.glob('refitted_to_digit0_*')
plist_refit_0 = []
for i in range(len(filelist_refit_0)):
    t, p = loadtxt(filelist_refit_0[i], usecols=(0, 1), unpack=True)
    plist_refit_0.append(p)
Traceback (most recent call last):
  File "test.py", line 107, in <module>
    t,p=loadtxt(filelist_refit_0[i],usecols=(0,1),unpack=True)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1092, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/numpy/lib/npyio.py", line 1012, in read_data
    vals = [vals[j] for j in usecols]
IndexError: list index out of range
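For what it's worth, the traceback points inside numpy's loadtxt at vals = [vals[j] for j in usecols]; with usecols=(0, 1) that typically fails when some row of the file yields fewer than two fields (for example a truncated or stray line that is easy to miss by eye). A small, hypothetical check to locate such a row in the suspect file might look like:

# Hypothetical check (not part of the original script): find rows that split
# into fewer than two whitespace-separated fields, which is what makes
# "vals[j] for j in usecols" fail for usecols=(0, 1).
with open(filelist_refit_0[66]) as f:
    for lineno, line in enumerate(f, 1):
        if len(line.split()) < 2:
            print("line %d: %r" % (lineno, line))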
I ran this Huffman coding code in Python 2 and it works smoothly. However, in Python 3 it gives me an unorderable-types TypeError. I know that the types are not the same (and hence not comparable), but how should I fix this?
Note that the error points specifically at q.put((kiri[0]+kanan[0],node)); I believe the issue lies in the comparison made inside the priority queue.
Example of input that causes the error:
3
1
2
3
The first line refers to the number of characters. The next lines show the frequency of the first character, second character and so on.
Note that the code somehow runs if the first number is less than 3. For example:
2
1
2
works just fine
Any help is greatly appreciated. Thank you!
n = list(map(int, input().split()))
n = n[0]
li = [None] * n
for i in range(n):
    inp = list(map(int, input().split()))
    li[i] = inp[0]

char = [None] * n
index = 1
for i in range(n):
    char[i] = index
    index += 1

freq = list(zip(li, char))

import queue

class Tree:
    def __init__(self, kanan, kiri):   # kanan = right child, kiri = left child
        self.kanan = kanan
        self.kiri = kiri

    def anak(self):                    # anak = children
        return int((self.kanan, self.kiri))

q = queue.PriorityQueue()
for nilai in freq:                     # nilai = a (frequency, character index) pair
    q.put(nilai)

size = q.qsize()
for i in range(size, 1, -1):
    kanan = q.get()
    kiri = q.get()
    node = Tree(kanan, kiri)
    q.put((kiri[0] + kanan[0], node))

huffmantree = q.get()

def traverse(huffmantree, st, pref):
    if isinstance(huffmantree[1].kanan[1], Tree):
        traverse(huffmantree[1].kanan, st, pref + "0")
    else:
        st[huffmantree[1].kanan[1]] = pref + "0"
    if isinstance(huffmantree[1].kiri[1], Tree):
        traverse(huffmantree[1].kiri, st, pref + "1")
    else:
        st[huffmantree[1].kiri[1]] = pref + "1"
    return st

binarystring = traverse(huffmantree, {}, "")
for i in freq:
    print(binarystring[i[1]])
The basic problem is that you're pushing entries of different types into your priority queue. The second element of your ordered pair (2-tuple) is an integer as long as you have a simple tree (only leaf entries). When you get to a child (grandchild) node, however, you use a Tree object as the second element.
Sorting within the priority queue requires a well-defined ordering, and in Python 3 a Tree and an int have no such ordering.
Very simply, you must decide what the second element should be, and keep its type consistent in your data handling. Here's some basic debugging instrumentation:
for nilai in freq:
    print("nilai insert", nilai)
    q.put(nilai)

size = q.qsize()
for i in range(size, 1, -1):
    kanan = q.get()
    kiri = q.get()
    node = Tree(kanan, kiri)
    new_val = kiri[0] + kanan[0]
    print("1st", new_val)
    print("2nd", node)
    q.put((new_val, node))
Execution for your 3-1-2-3 case:
$ python3 so.py
3
1
2
3
nilai insert (1, 1)
nilai insert (2, 2)
nilai insert (3, 3)
1st 3
2nd <__main__.Tree object at 0x7efd60c9a160>
Traceback (most recent call last):
  File "so.py", line 40, in <module>
    q.put((new_val,node))
  File "/usr/lib64/python3.4/queue.py", line 146, in put
    self._put(item)
  File "/usr/lib64/python3.4/queue.py", line 230, in _put
    heappush(self.queue, item)
TypeError: unorderable types: Tree() < int()
The critical difference is between these two put calls:
q.put(nilai) # Simple pair, such as (1, 1)
and
q.put((new_val,node))
Where the second element is a Tree.
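One minimal way to avoid the error while keeping the rest of the code unchanged is to give Tree an arbitrary but consistent ordering, so that when two queue entries have equal weight the tie-break comparison between a Tree and an int (or another Tree) is defined instead of raising. This is only a sketch; the more conventional fix is to add an insertion counter as a tie-breaking middle element in each tuple, but that would mean re-indexing the traversal code.

class Tree:
    def __init__(self, kanan, kiri):
        self.kanan = kanan
        self.kiri = kiri

    # Arbitrary but consistent tie-break: a Tree never sorts below whatever
    # it is compared against, so heap comparisons no longer raise TypeError.
    def __lt__(self, other):
        return False

    def __gt__(self, other):
        return True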
I am not a Python programmer, but I need to use a method from the SciPy library. I just want to repeat an inner loop a number of times, each time with a different index into the data. Here is my code for now:
from scipy.stats import pearsonr
fileName = open('ILPDataset.txt', 'r')
attributeValue, classValue = [], []
for index in range(0, 10, 1):
    for line in fileName.readlines():
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))
And I am getting the following output:
0
(-0.13735062681256097, 0.0008840631556260505)
1
(-0.13735062681256097, 0.0008840631556260505)
2
(-0.13735062681256097, 0.0008840631556260505)
3
(-0.13735062681256097, 0.0008840631556260505)
4
(-0.13735062681256097, 0.0008840631556260505)
5
(-0.13735062681256097, 0.0008840631556260505)
6
(-0.13735062681256097, 0.0008840631556260505)
7
(-0.13735062681256097, 0.0008840631556260505)
8
(-0.13735062681256097, 0.0008840631556260505)
9
(-0.13735062681256097, 0.0008840631556260505)
As you can see, index is changing, but the result of the function is always as if the index were 0.
When I run the script several times, each time hard-coding a different index value like this:
attributeValue.append(float(data[0]))
attributeValue.append(float(data[1]))
...
attributeValue.append(float(data[9]))
everything is OK and I get the correct results, but I can't do it in a single loop. What am I doing wrong?
EDIT:
Test file:
62,1,6.8,3,542,116,66,6.4,3.1,0.9,1
40,1,1.9,1,231,16,55,4.3,1.6,0.6,1
63,1,0.9,0.2,194,52,45,6,3.9,1.85,2
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,4.1,2,289,875,731,5,2.7,1.1,1
34,1,6.2,3,240,1680,850,7.2,4,1.2,1
20,1,1.1,0.5,128,20,30,3.9,1.9,0.95,2
84,0,0.7,0.2,188,13,21,6,3.2,1.1,2
57,1,4,1.9,190,45,111,5.2,1.5,0.4,1
52,1,0.9,0.2,156,35,44,4.9,2.9,1.4,1
57,1,1,0.3,187,19,23,5.2,2.9,1.2,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
38,0,2.6,1.2,410,59,57,5.6,3,0.8,2
30,1,1.3,0.4,482,102,80,6.9,3.3,0.9,1
17,0,0.7,0.2,145,18,36,7.2,3.9,1.18,2
46,0,14.2,7.8,374,38,77,4.3,2,0.8,1
Expected results of pearsonr for the 10 separate script runs:
data[0] (0.06050513030608389, 0.8238536636813034)
data[1] (-0.49265895172303803, 0.052525691067199995)
data[2] (-0.5073312383613632, 0.0448647312201305)
data[3] (-0.4852842899321005, 0.056723468068371544)
data[4] (-0.2919584357031029, 0.27254138535817224)
data[5] (-0.41640591455640696, 0.10863082761524119)
data[6] (-0.46954072465442487, 0.0665061785375443)
data[7] (0.08874739193909209, 0.7437895010751641)
data[8] (0.3104260624799073, 0.24193152445774302)
data[9] (0.2943030868699842, 0.26853066217221616)
Turn each line of the file into a list of floats
data = []
with open('ILPDataset.txt') as fileName:
    for line in fileName:
        line = line.strip()
        line = line.split(',')
        line = [float(item) for item in line[:11]]
        data.append(line)
Transpose the data so that each list in data has the column values from the original file. data --> [[column 0 items], [column 1 items],[column 2 items],...]
data = zip(*data) # for Python 2.7x
#data = list(zip(*data)) # for python 3.x
Correlate:
for n in [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]:
    corr = pearsonr(data[n], data[10])
    print('data[{}], {}'.format(n, corr))
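For completeness, the reason the original loop printed the same pair ten times: fileName.readlines() consumes the whole file on the first pass of the outer loop, so on later passes the inner loop body never executes, and attributeValue/classValue are never reset, leaving the index-0 correlation in place. A minimal sketch that keeps the original structure (read the lines once, reset the lists for each column):

from scipy.stats import pearsonr

with open('ILPDataset.txt') as fileName:
    lines = fileName.readlines()          # read once, reuse for every index

for index in range(10):
    attributeValue, classValue = [], []   # reset for each column
    for line in lines:
        data = line.split(',')
        attributeValue.append(float(data[index]))
        classValue.append(float(data[10]))
    print(index)
    print(pearsonr(attributeValue, classValue))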
#wwii 's answer is very good.
Only one suggestion: list(zip(*data)) seems a bit overkill to me. zip is really meant for composing lists of possibly different types and lengths into tuples, which then have to be turned back into a list (in this case with list()).
So why not just use the simple transpose operation, which is what this really is?
import numpy
# ...
data = numpy.transpose(data)

which does the same job, probably faster (not measured) and more deterministically.