Error importing data from a file in Python

I'm having some issues importing data from a file in Python. I am quite new to Python, so my error is probably quite simple.
I am reading in three-column, tab-delimited text files with no headers. I am creating three instances of the Minimal class, each reading a different data file.
I can see that each object is referencing a different memory location, so they are separate.
When I look at the data stored in each instance, each instance has the same contents, consisting of the three datafiles appended to each other.
What have I done wrong?
The class to read in the data is:
class Minimal:
    def __init__(self, data=[]):
        self.data = data
    def readFile(self, filename):
        f = open(filename, 'r')
        for line in f:
            line = line.strip()
            columns = line.split()
            #creates a list of angle, intensity and error and appends it to the diffraction pattern
            self.data.append( [float(columns[0]), float(columns[1]), float(columns[2])] )
        f.close()
    def printData(self):
        for dataPoint in self.data:
            print str(dataPoint)
The datafiles look like:
1 4 2
2 5 2.3
3 4 2
4 6 2.5
5 8 5
6 10 3
The program I am using to actually create the instances of Minimal is:
from minimal import Minimal
d1 = Minimal()
d1.readFile("data1.xye")
d2 = Minimal()
d2.readFile("data2.xye")
d3 = Minimal()
d3.readFile("data3.xye")
print "Data1"
print d1
d1.printData()
print "\nData2"
print d2
d2.printData()
print "\nData3"
print d3
d3.printData()
The output is:
Data1
<minimal.Minimal instance at 0x016A35F8>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
Data2
<minimal.Minimal instance at 0x016A3620>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
Data3
<minimal.Minimal instance at 0x016A3648>
[1.0, 4.0, 2.0]
[2.0, 5.0, 2.3]
[3.0, 4.0, 2.0]
[4.0, 6.0, 2.5]
[5.0, 8.0, 5.0]
[6.0, 10.0, 3.0]
[2.0, 4.0, 2.0]
[3.0, 5.0, 2.3]
[4.0, 4.0, 2.0]
[5.0, 6.0, 2.5]
[6.0, 8.0, 5.0]
[7.0, 10.0, 3.0]
[3.0, 4.0, 2.0]
[4.0, 5.0, 2.3]
[5.0, 4.0, 2.0]
[6.0, 6.0, 2.5]
[7.0, 8.0, 5.0]
[8.0, 10.0, 3.0]
Tool completed successfully

The default value of data is evaluated only once, when the def statement runs, so the data attributes of all Minimal instances created without an argument reference the same list.
>>> class Minimal:
...     def __init__(self, data=[]):
...         self.data = data
...
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
True
Replace it as follows:
>>> class Minimal:
...     def __init__(self, data=None):
...         self.data = data or []
...
>>> a1 = Minimal()
>>> a2 = Minimal()
>>> a1.data is a2.data
False
See “Least Astonishment” in Python: The Mutable Default Argument.
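A minimal Python 3 sketch of the whole class with the fix applied, assuming the same three-column files. It uses an explicit `is None` test instead of `data or []`, so that an empty list deliberately passed by the caller is kept rather than silently replaced, and a `with` block so the file is closed even if a line fails to parse. The blank-line check is an added assumption, not something the question required.

```python
class Minimal:
    def __init__(self, data=None):
        # A new list is created on every call, so instances no longer share one
        self.data = [] if data is None else data

    def readFile(self, filename):
        # `with` closes the file even if float() raises on a malformed line
        with open(filename) as f:
            for line in f:
                columns = line.split()
                if columns:  # skip blank lines (assumption)
                    # angle, intensity, error
                    self.data.append([float(c) for c in columns[:3]])

    def printData(self):
        for dataPoint in self.data:
            print(dataPoint)
```

With this version, each of `d1`, `d2` and `d3` in the driver script holds only the rows of its own file.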

Consider the following:
def d():
    print("d() invoked")
    return 1
def f(p=d()):
    pass
print("Start")
f()
f()
It prints:
d() invoked
Start
and not:
Start
d() invoked
d() invoked
Why? Because default arguments are evaluated once, when the def statement runs, and the resulting objects are stored on the function object itself (in f.__defaults__) for reuse on every subsequent call. They are not re-evaluated on each call.
In other words, they behave more or less like:
_f_p_default = d()
def f(p=None):
    if p is None: p = _f_p_default
    pass
Make the above substitution in your code, and you will understand the problem immediately.
The correct form for your code was already provided by @falsetru. I'm just trying to explain the rationale.
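To see where the shared object actually lives, here is a small sketch with a hypothetical function `append_to` (not from the question) whose mutable default is inspected through `__defaults__` after two calls:

```python
def append_to(item, bucket=[]):  # hypothetical example with a mutable default
    # `bucket` is the same list object on every call that omits it
    bucket.append(item)
    return bucket

first = append_to(1)
second = append_to(2)
print(first, second, first is second)  # [1, 2] [1, 2] True
print(append_to.__defaults__)          # ([1, 2],)
```

Both calls return the one list stored in the function's defaults, which is exactly what happened to `Minimal.__init__`'s `data`.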

Related

How to use an index of a dataframe to assign values to a row of a new column?

I have a dataset that consists of ID (participant), run, indexnumber (that is, an index number of a slalom turn) and performance (that could be velocity or time). In addition, I have information for each id and run where in the slalom turn (that is, the index) they actually start to turn.
My goal is to create a new column in the dataframe that contains 0 if the id has not started to turn and 1 if they have started to turn. This column could be called phase.
For example:
For ID1, the point where this skier starts to turn is index 4 for the first run and 9 for the second run. Therefore, I want all rows in the new column to contain 0s until index 4 and 1s thereafter (for the first run). For the second run I want all rows to contain 0s until index 9 and 1s thereafter.
Is there a simple way to do this with pandas or vanilla python?
example = [[1.0, 1.0, 1.0, 0.6912982024915187],
[1.0, 1.0, 2.0, 0.16453900411106737],
[1.0, 1.0, 3.0, 0.11362801727310845],
[1.0, 1.0, 4.0, 0.587778444335624],
[1.0, 1.0, 5.0, 0.8455388913351765],
[1.0, 1.0, 6.0, 0.5719366584505648],
[1.0, 1.0, 7.0, 0.4665520044952449],
[1.0, 1.0, 8.0, 0.9105152709573275],
[1.0, 1.0, 9.0, 0.4600099001744885],
[1.0, 1.0, 10.0, 0.8577060884077763],
[1.0, 2.0, 1.0, 0.11550722410813963],
[1.0, 2.0, 2.0, 0.5729090378222077],
[1.0, 2.0, 3.0, 0.43990164344919824],
[1.0, 2.0, 4.0, 0.595242293948498],
[1.0, 2.0, 5.0, 0.443684017624451],
[1.0, 2.0, 6.0, 0.3608135854303052],
[1.0, 2.0, 7.0, 0.28525404982906766],
[1.0, 2.0, 8.0, 0.11561422303194391],
[1.0, 2.0, 9.0, 0.8579134051748011],
[1.0, 2.0, 10.0, 0.540598113345226],
[2.0, 1.0, 1.0, 0.4058570295736075],
[2.0, 1.0, 2.0, 0.9422426000325298],
[2.0, 1.0, 3.0, 0.7918655742964762],
[2.0, 1.0, 4.0, 0.4145753321336241],
[2.0, 1.0, 5.0, 0.5256388261997529],
[2.0, 1.0, 6.0, 0.8140335187050629],
[2.0, 1.0, 7.0, 0.12134416740848841],
[2.0, 1.0, 8.0, 0.9016748379372173],
[2.0, 1.0, 9.0, 0.462241316800442],
[2.0, 1.0, 10.0, 0.7839715857746699],
[2.0, 2.0, 1.0, 0.5300527244824904],
[2.0, 2.0, 2.0, 0.8784844676567194],
[2.0, 2.0, 3.0, 0.14395673182343738],
[2.0, 2.0, 4.0, 0.7606405990262495],
[2.0, 2.0, 5.0, 0.5123048342846208],
[2.0, 2.0, 6.0, 0.25608277502943655],
[2.0, 2.0, 7.0, 0.4264542956426933],
[2.0, 2.0, 8.0, 0.9144976708651866],
[2.0, 2.0, 9.0, 0.875888479621729],
[2.0, 2.0, 10.0, 0.3428732760552141]]
turnPhaseId1 = [4,9] #the index number when ID1 starts to turn in run 1 and run 2, respectively
turnPhaseId2 = [2,5] #the index number when ID2 starts to turn in run 1 and run 2, respectively
import pandas as pd
df = pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])
I believe it is a better idea to turn the turnPhase lists into a dictionary keyed by id, and then use apply:
turn_dict = {1: [4, 9],
2: [2, 5]}
We also need to change the column types, since the values are used as dictionary keys and list indexes, which are ints:
df['id'] = df['id'].astype(int)
df['index'] = df['index'].astype(int)
Finally, apply:
df['new_column'] = df.apply(lambda x: 0 if x['index'] < turn_dict[x['id']][int(x['run'] -1)] else 1 , axis=1)
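As an alternative to a row-wise `apply` (a swapped-in vectorized technique, not the answerer's), one can look up each row's turn-start index first and then compare whole columns at once. The sample rows below are made up for illustration, and the convention that the start index itself counts as 1 follows the `<` test in the apply answer above.

```python
import pandas as pd

# Made-up sample rows in the question's layout: id, run, index, performance
example = [[1.0, 1.0, 3.0, 0.1], [1.0, 1.0, 4.0, 0.2],
           [1.0, 2.0, 8.0, 0.3], [1.0, 2.0, 9.0, 0.4],
           [2.0, 1.0, 1.0, 0.5], [2.0, 1.0, 2.0, 0.6],
           [2.0, 2.0, 4.0, 0.7], [2.0, 2.0, 5.0, 0.8]]
df = pd.DataFrame(example, columns=['id', 'run', 'index', 'performance'])

# Turn-start index per id: run 1 at position 0, run 2 at position 1
turn_dict = {1: [4, 9], 2: [2, 5]}

# Start index for every row, aligned with the frame's own index
start = pd.Series([turn_dict[int(i)][int(r) - 1]
                   for i, r in zip(df['id'], df['run'])], index=df.index)

# 0 before the turn starts, 1 from the start index onwards
df['phase'] = (df['index'] >= start).astype(int)
print(df)
```

Only the dictionary lookup runs in Python; the comparison itself is a single column operation, which tends to matter once the frame has many rows.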

The function keeps unwanted information

For some reason, when I call the function "novalinhasubtraida",
it keeps its information and affects its next calls.
matriz = [[1.0, 7.0, 9.0, 5.0],
          [1.125, 1.0, 0.25, 0.875],
          [0.4, 0.6, 1.0, 0.2]]
result = list()
def subrairlinhas(matriz,linhasubtraida,linhasubtraiadora):
    result.clear()
    for item1, item2 in zip(matriz[linhasubtraida], matriz[linhasubtraiadora]):
        item = item1 - item2*(matriz[linhasubtraida][linhasubtraiadora])
        #print(f'item:{item}')
        result.append(item)
    return result
#novalinhasubtraida calls subrairlinhas
def novalinhasubtraida(matriz,linhatransformada,linhado1):
    result = subrairlinhas(matriz,linhatransformada,linhado1)
    #print(result)
    matriz.remove(matriz[linhatransformada])
    matriz.insert(linhatransformada,result)
    return matriz
For example:
INPUT:
novalinhasubtraida(matriz,1,0)
print(matriz)
novalinhasubtraida(matriz,2,0)
print(matriz)
Output:
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.4, 0.6, 1.0, 0.2]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -2.2, -2.6, -1.8], [0.0, -2.2, -2.6, -1.8]]
When I instead run this:
INPUT:
novalinhasubtraida(matriz,2,0)
print(matriz)
novalinhasubtraida(matriz,1,0)
print(matriz)
OUTPUT:
[[1.0, 7.0, 9.0, 5.0], [1.125, 1.0, 0.25, 0.875], [0.0, -2.2, -2.6, -1.8]]
[[1.0, 7.0, 9.0, 5.0], [0.0, -6.875, -9.875, -4.75], [0.0, -6.875, -9.875, -4.75]]
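The symptom comes from the module-level `result` list: every call clears and refills that one object, and it is inserted into `matriz` each time, so two rows end up being the same list. A minimal sketch of a fix, keeping the question's function names, builds a new list on every call. The variable `fator` is a hypothetical name introduced here, and index assignment replaces `remove()`/`insert()` because `remove()` deletes the first row that compares equal, which is not necessarily the row at that index.

```python
def subrairlinhas(matriz, linhasubtraida, linhasubtraiadora):
    # Multiplier taken from the row being transformed (hypothetical name `fator`)
    fator = matriz[linhasubtraida][linhasubtraiadora]
    # A brand-new list per call, so rows written by earlier calls are never
    # cleared or overwritten later
    return [item1 - item2 * fator
            for item1, item2 in zip(matriz[linhasubtraida], matriz[linhasubtraiadora])]

def novalinhasubtraida(matriz, linhatransformada, linhado1):
    # Replace the row in place by index instead of remove() + insert()
    matriz[linhatransformada] = subrairlinhas(matriz, linhatransformada, linhado1)
    return matriz

matriz = [[1.0, 7.0, 9.0, 5.0],
          [1.125, 1.0, 0.25, 0.875],
          [0.4, 0.6, 1.0, 0.2]]
novalinhasubtraida(matriz, 1, 0)
novalinhasubtraida(matriz, 2, 0)
print(matriz)
```

Row 1 now keeps its first-step values after the second call, regardless of the order in which the calls are made.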

How to convert a list containing an even number of floats into a string divided by lists whose size is half of that even number? [duplicate]

This question already has answers here:
Split list into smaller lists (split in half)
(21 answers)
Closed 1 year ago.
Say you have this list of floats, with an even number of entries:
[1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
How could one turn this into this string:
[1.0, 2.0, 3.0][4.0, 5.0, 6.0]
If we had 10 elements we would have:
[1.0, 2.0, 3.0, 4.0, 5.0][6.0, 7.0, 8.0, 9.0, 10.0]
etc.
I tried:
list_to_str = ' '.join([str(e) for e in total_list])
final_str = '[' + list_to_str + ']'
But with this, the '[' and ']' appear only at the beginning and at the end of the string; the ones in the middle are missing.
Try this:
total_list = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
str(total_list[:len(total_list)//2]) + str(total_list[len(total_list)//2:])
#'[1.0, 2.0, 3.0][4.0, 5.0, 6.0]'
You can try a list comprehension:
''.join(map(str, [total_list[i: i+len(total_list)//2] for i in range(0, len(total_list), len(total_list)//2)]))
'[1.0, 2.0, 3.0][4.0, 5.0, 6.0]'
I'd try something like this.
yourLst = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
middle = len(yourLst)//2
lsts = [yourLst[:middle],yourLst[middle:]]
yourString = ''.join(str(lst) for lst in lsts)
Output:
[1.0, 2.0, 3.0][4.0, 5.0, 6.0]
And for those who prefer a one-liner:
yourLst = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
yourString = ''.join(str(lst) for lst in [yourLst[:len(yourLst)//2],yourLst[len(yourLst)//2:]])
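The slicing approach from the answers can be wrapped in a small helper that also enforces the "even number of entries" assumption from the question. The name `halves_to_string` is hypothetical, and raising `ValueError` on odd lengths is a design choice made here, not something the question specifies.

```python
def halves_to_string(values):
    # Hypothetical helper: split an even-length list in the middle and
    # concatenate the str() of each half with no separator between them
    if len(values) % 2:
        raise ValueError("expected an even number of entries")
    middle = len(values) // 2
    return str(values[:middle]) + str(values[middle:])

print(halves_to_string([1.0, 2.0, 3.0, 4.0, 5.0, 6.0]))
# [1.0, 2.0, 3.0][4.0, 5.0, 6.0]
```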

Loading a matrix of float numbers from a file [duplicate]

This question already has answers here:
How do I clone a list so that it doesn't change unexpectedly after assignment?
(24 answers)
How to to read a matrix from a given file?
(8 answers)
Closed 4 years ago.
I have some data like this:
1 2 3 4
3 5 6 7
2 8 9 10
and my code is
#!/usr/bin/python
file=open("fer.txt","r")
M=[]
P=[]
K=[]
def particle(M,P,K):
    for line in file:
        y=line.split()
        M.append(y)
    for i in range(3):
        for j in range(4):
            P.append(float(M[i][j]))
        K.append(P)
    return K
print(particle(M,P,K))
and then I got
[[1.0, 2.0, 3.0, 4.0, 3.0, 5.0, 6.0, 7.0, 2.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0, 4.0, 3.0, 5.0, 6.0, 7.0, 2.0, 8.0, 9.0, 10.0], [1.0, 2.0, 3.0, 4.0, 3.0, 5.0, 6.0, 7.0, 2.0, 8.0, 9.0, 10.0]]
but I want something like this:
[[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], [2.0, 8.0, 9.0, 10.0]]
Edit for duplicate flag: I don't see how this is a duplicate of my question. I am asking something different, and I am also trying to understand why my code doesn't work.
Iterate over the lines, and use map to convert every element of each line to float.
Ex:
res = []
with open("fer.txt") as infile:
    for line in infile:
        line = line.strip()
        # list() is needed on Python 3, where map() returns a lazy iterator
        res.append(list(map(float, line.split())))
print(res)
Output:
[[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], [2.0, 8.0, 9.0, 10.0]]
The first answer is correct, but if you want a one-liner:
#sample.txt content:
1 2 3 4
3 5 6 7
2 8 9 10
#code:
In [12]: d = open('sample.txt').readlines()
    ...: result = [list(map(float, x.split())) for x in d]
output:
In [13]: result
Out[13]: [[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], [2.0, 8.0, 9.0, 10.0]]
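As for why the original code misbehaves: `P` is created once, outside the function, and `K.append(P)` runs three times, so `K` holds three references to the same list, which by then contains all 12 values. A minimal sketch that keeps the question's `particle` and `K` names but builds a new row list per line (passing the lines in as an argument is an assumption made here so the function can be tested without a file; a file object such as `open("fer.txt")` works the same way, since iterating it yields lines):

```python
def particle(lines):
    K = []
    for line in lines:
        values = line.split()
        if values:  # skip blank lines
            # A new list per line, so the rows of K are distinct objects
            K.append([float(v) for v in values])
    return K

print(particle(["1 2 3 4", "3 5 6 7", "2 8 9 10"]))
# [[1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 6.0, 7.0], [2.0, 8.0, 9.0, 10.0]]
```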

How to use pandas rows to form new columns

I have the following:
import numpy as np
import pandas as pd
pa = pd.DataFrame({'a':np.array([[1.,4.],[2.,3.3],[3.,4.,5.]], dtype=object),
                   'b':np.array([[2.,5.],[3., 6.],[4.,5.,6.]], dtype=object)})
This will yield:
a b
0 [1.0, 4.0] [2.0, 5.0]
1 [2.0, 3.3] [3.0, 6.0]
2 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]
I have tried various techniques to concatenate items of each array into a new array.
Something in this fashion:
a b c
0 [1.0, 4.0] [2.0, 5.0] [1.0, 2.0]
1 [1.0, 4.0] [2.0, 5.0] [4.0, 5.0]
2 [2.0, 3.3] [3.0, 6.0] [2.0, 3.0]
3 [2.0, 3.3] [3.0, 6.0] [3.3, 6.0]
4 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [3.0, 4.0]
5 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [4.0, 5.0]
6 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] [5.0, 6.0]
If there are other columns, I can fill their values into the newly created rows, but I'm stuck getting to this point.
Can anyone please help out?
By using zip with an unnesting method:
pa['New']=[list(zip(x,y)) for x, y in zip(pa.a,pa.b)]
s=pa.New.str.len()
df=pd.DataFrame({'a':pa['a'].repeat(s),'b':pa['b'].repeat(s),'New':list(map(list,pa.New.sum()))})
df
New a b
0 [1.0, 2.0] [1.0, 4.0] [2.0, 5.0]
0 [4.0, 5.0] [1.0, 4.0] [2.0, 5.0]
1 [2.0, 3.0] [2.0, 3.3] [3.0, 6.0]
1 [3.3, 6.0] [2.0, 3.3] [3.0, 6.0]
2 [3.0, 4.0] [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]
2 [4.0, 5.0] [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]
2 [5.0, 6.0] [3.0, 4.0, 5.0] [4.0, 5.0, 6.0]
IIUC, you need something like this?
def f(row):
    return pd.Series(list(zip(row["a"], row["b"])))
mod = pa.apply(f, axis=1).stack()
mod.index = mod.index.get_level_values(0)
pa.merge(mod.to_frame('c'), left_index=True, right_index=True)
a b c
0 [1.0, 4.0] [2.0, 5.0] (1.0, 2.0)
0 [1.0, 4.0] [2.0, 5.0] (4.0, 5.0)
1 [2.0, 3.3] [3.0, 6.0] (2.0, 3.0)
1 [2.0, 3.3] [3.0, 6.0] (3.3, 6.0)
2 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (3.0, 4.0)
2 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (4.0, 5.0)
2 [3.0, 4.0, 5.0] [4.0, 5.0, 6.0] (5.0, 6.0)
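On newer pandas (assuming version 0.25 or later, where `DataFrame.explode` exists), the same unnesting can be sketched more directly: build the list of pairs per row, then let `explode` give each pair its own row while repeating `a`, `b` and any other columns. This is a swapped-in technique, not either answerer's code; the pairs are converted to lists to match the desired output in the question.

```python
import pandas as pd

pa = pd.DataFrame({'a': [[1., 4.], [2., 3.3], [3., 4., 5.]],
                   'b': [[2., 5.], [3., 6.], [4., 5., 6.]]})

# Pair the i-th element of 'a' with the i-th element of 'b' in every row
pa['c'] = [[list(pair) for pair in zip(x, y)] for x, y in zip(pa['a'], pa['b'])]

# One output row per pair; the other columns are repeated alongside it
out = pa.explode('c').reset_index(drop=True)
print(out)
```

Because `explode` copies every other column automatically, extra columns in the real data need no special handling.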
