I have code like this:
import pandas as pd
import random
x = 5
table = []
row = []
for i in range(x):
    for j in range(x):
        if i == j:
            row.append(0)
        else:
            row.append(random.randint(0, 1))
    table.append(row)
    row = []
df = pd.DataFrame(table)
df
This produces a table of 0s and 1s.
How can I build a graph from this table?
I want the graph output as a list of edges like this: [(0,1), (0,2), (1,0), (1,2), (1,4), (2,3), (2,4), (3,0), (3,1), (3,2), (3,4), (4,0)]
IIUC, replace 0 by NA, stack (which drops the NA by default), and convert the index to list:
df.replace(0, pd.NA).stack().index.to_list()
output:
[(0, 3), (0, 4), (1, 0), (1, 2), (1, 3), (2, 0), (2, 1), (4, 0), (4, 3)]
matching input:
   0  1  2  3  4
0  0  0  0  1  1
1  1  0  1  1  0
2  1  1  0  0  0
3  0  0  0  0  0
4  1  0  0  1  0
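As a self-contained check of the same idea, here is a small sketch that hard-codes the matching input above instead of generating it randomly. It swaps in NumPy's `np.nonzero` rather than `replace`/`stack`, which is just another way to read off the non-zero cells. The names `rows`, `cols`, and `edges` are illustrative, and the sketch assumes the default integer row and column labels, so positions and labels coincide.

```python
import numpy as np
import pandas as pd

# Fixed adjacency matrix copied from the "matching input" table.
df = pd.DataFrame([
    [0, 0, 0, 1, 1],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [1, 0, 0, 1, 0],
])

# np.nonzero returns the row and column positions of every non-zero cell,
# in row-major order.
rows, cols = np.nonzero(df.to_numpy())

# Assumption: default RangeIndex labels, so positions can be used as node ids.
edges = [(int(r), int(c)) for r, c in zip(rows, cols)]
print(edges)
```

Each tuple is a directed edge (row, column) for a cell that holds 1.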
I have a list and dataframe (example below).
0 1
0 ((test1, AA), (1, 1)) 1
1 ((test2, BB), (1, 1)) 2
2 ((test1, CC), (1, 1)) 3
3 ((test1, DD), (2, 1)) 8
4 ((test3, EE), (3, 1)) 9
I need to keep only the rows where the first element of the first tuple is test1 AND the first element of the second tuple is 1. Could you please help?
Expected output:
0 1
0 ((test1, AA), (1, 1)) 1
2 ((test1, CC), (1, 1)) 3
You can use boolean indexing:
v = df[0].apply(lambda i: i[0][0] == 'test1' and i[1][0] == 1)
df = df[v]
print(df)
Output
0 1
0 ((test1, AA), (1, 1)) 1
2 ((test1, CC), (1, 1)) 3
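For reference, a runnable sketch of the same boolean-indexing approach with the example data rebuilt by hand. The column layout (nested tuples in column 0, integers in column 1) is assumed from the printed table, and `mask` and `filtered` are illustrative names.

```python
import pandas as pd

# Example frame rebuilt from the question; column 0 holds nested tuples.
df = pd.DataFrame({
    0: [(("test1", "AA"), (1, 1)),
        (("test2", "BB"), (1, 1)),
        (("test1", "CC"), (1, 1)),
        (("test1", "DD"), (2, 1)),
        (("test3", "EE"), (3, 1))],
    1: [1, 2, 3, 8, 9],
})

# True where the first tuple starts with 'test1' and the second starts with 1.
mask = df[0].apply(lambda t: t[0][0] == "test1" and t[1][0] == 1)
filtered = df[mask]
print(filtered)
```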
I am trying to make a beginner-level Enigma machine, and I have a problem with the rotors: they do not increase after the first run-through with a letter. Below is the function with the problem:
def rotating(rotate_a, rotate_b, rotate_c, add_amount):
    rotate_a += 1
    print("")
    if rotate_a == 27:
        rotate_a = 0
        rotate_b += 1
        add_amount += 1
        rotate_a_b_c(b1, add_amount)
    else:
        rotate_a_b_c(a1, add_amount)
        add_amount += 1
        rotate_a += 1
    if rotate_b == 27:
        rotate_b = 0
        rotate_c += 1
        add_amount += 1
        rotate_a_b_c(c1, add_amount)
    else:
        add_amount += 1
    if rotate_c == 27:
        rotate_c = 0
        add_amount += 1
    else:
        add_amount += 1
    print(rotate_a, rotate_b, rotate_c)
The way I want this to work is that another function does the encrypting, and once it encrypts one letter it moves the rotors (add_amount is a minor feature that is not important here). At the bottom of the original code you can see what prints all the 1s and 0s:
C:\Users\jgola\AppData\Local\Programs\Python\Python37-32\python.exe "C:/Users/jgola/PycharmProjects/Game/enigma matchine.py"
word to be coded(NO CAPS): abcd efgh ijkl mnop qrst uvwx yz ab cder ghij klmn opqr stuv wxwz
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
1 0 0
bxtu disw amlj gvqy rcze okfh npbx tudc swam ljgv qyrc zeok fhfp
Process finished with exit code 0
This is the output of the Enigma machine (we are focusing on the ones and zeros). Every time a new set of numbers is printed, the numbers should increase, but they don't. Why exactly is it doing that, and what can I do to fix it?
def coding(a, b, c, d):
    add_amount = 0
    roa = 0
    rob = 0
    roc = 0
    final = str()
    var = input("word to be coded(NO CAPS): ")
    var = word_preparing(var)
    var_len = int(len(var) / 4)
    for z in range(0, var_len):
        x = z * 4
        for l in range(0, 4):
            start = ord(var[l+x])-97
            for i in range(0, 26):
                if a[start][1] == b[i][0]:
                    start = i
                    break
            for i in range(0, 26):
                if b[start][1] == c[i][0]:
                    start = i
                    break
            for i in range(0, 26):
                if c[start][1] == d[i][0]:
                    start = i
                    break
            for i in range(0, 26):
                if d[start][1] == c[i][0]:
                    start = i
                    break
            for i in range(0, 26):
                if c[start][1] == b[i][0]:
                    start = i
                    break
            for i in range(0, 26):
                if b[start][1] == a[i][0]:
                    start = i
                    end = chr(a[start][1]+96)
                    final += end
                    rotating(roa, rob, roc, add_amount)
                    a = [(a1[0], 22), (a1[1], 24), (a1[2], 15), (a1[3], 8), (a1[4], 10), (a1[5], 9), (a1[6], 26),
                         (a1[7], 16), (a1[8], 2), (a1[9], 21), (a1[10], 20),
                         (a1[11], 1), (a1[12], 6), (a1[13], 18), (a1[14], 25), (a1[15], 19), (a1[16], 5), (a1[17], 14),
                         (a1[18], 17), (a1[19], 3), (a1[20], 23),
                         (a1[21], 7), (a1[22], 13), (a1[23], 4), (a1[24], 12), (a1[25], 11)]
                    b = [(b1[0], 15), (b1[1], 5), (b1[2], 24), (b1[3], 16), (b1[4], 13), (b1[5], 4), (b1[6], 3),
                         (b1[7], 20), (b1[8], 25), (b1[9], 6), (b1[10], 23),
                         (b1[11], 21), (b1[12], 9), (b1[13], 14), (b1[14], 26), (b1[15], 19), (b1[16], 17), (b1[17], 1),
                         (b1[18], 11), (b1[19], 8), (b1[20], 12),
                         (b1[21], 7), (b1[22], 22), (b1[23], 2), (b1[24], 18), (b1[25], 10)]
                    c = [(c1[0], 9), (c1[1], 11), (c1[2], 6), (c1[3], 14), (c1[4], 13), (c1[5], 8), (c1[6], 22),
                         (c1[7], 26), (c1[8], 16), (c1[9], 2), (c1[10], 5),
                         (c1[11], 25), (c1[12], 23), (c1[13], 21), (c1[14], 17), (c1[15], 24), (c1[16], 20), # the reason this is all derpy is because I copied this from mobile to comuter
                         (c1[17], 18), (c1[18], 3), (c1[19], 19), (c1[20], 15),
                         (c1[21], 1), (c1[22], 10), (c1[23], 12), (c1[24], 4), (c1[25], 7)]
                    break
        final += " "
    print(final)
You are passing the values of the variables to the function, not the variables themselves. These copies only exist while the function is running, and changing them does not affect the values of the original variables, even though they have the same name.
Try adding
global rotate_a, rotate_b, rotate_c
at the top of the function, right after the def statement.
Also remove the variables from the def statement and all calls to the function.
def rotating(add_amount):
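As an alternative to `global`, one possible sketch is to have the function return the updated counters and reassign them in the caller. This is a different technique from the one suggested above, not the original code: `advance_rotors` and its `size` parameter are hypothetical names, the wrap-around at 26 is an assumption (the question's code compares against 27), and the `add_amount` and `rotate_a_b_c` parts are left out.

```python
def advance_rotors(rotate_a, rotate_b, rotate_c, size=26):
    """Step rotor A once and carry into B and C, like an odometer.

    Integers are rebound locally inside a function, so the new values
    must be returned for the caller to see them.
    """
    rotate_a += 1
    if rotate_a == size:
        rotate_a = 0
        rotate_b += 1
    if rotate_b == size:
        rotate_b = 0
        rotate_c += 1
    if rotate_c == size:
        rotate_c = 0
    return rotate_a, rotate_b, rotate_c


roa = rob = roc = 0
for _ in range(30):
    # Reassigning here is what makes the counters actually advance.
    roa, rob, roc = advance_rotors(roa, rob, roc)
print(roa, rob, roc)  # 4 1 0
```

In `coding`, the call would then look like `roa, rob, roc = advance_rotors(roa, rob, roc)` instead of discarding the result.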
I am trying to produce a bigram word co-occurrence matrix, indicating how many times one word follows another in a corpus.
As a test, I wrote the following (which I gathered from other SE questions):
from sklearn.feature_extraction.text import CountVectorizer
test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']
bigram_vec = CountVectorizer(ngram_range=(1,2))
X = bigram_vec.fit_transform(test_sent)
Xc = (X.T * X)
print Xc
This should give the correct output. The matrix Xc is output like so:
(0, 0) 1
(1, 1) 2
(2, 2) 2
(3, 3) 1
(4, 4) 1
I have no idea how to interpret this. I attempted to make it dense to help with my interpretation using Xc.todense(), which got this:
[[1 0 0 0 0]
[0 2 0 0 0]
[0 0 2 0 0]
[0 0 0 1 0]
[0 0 0 0 1]]
Neither of these gives the correct word co-occurrence matrix, which should show how many times the row word follows the column word.
Could someone please explain how I can interpret/use the output? Why is it like that?
Addition to question
Here is another possible output with a different example using ngram_range=(2,2):
from sklearn.feature_extraction.text import CountVectorizer
test_sent = ['hello biggest awesome biggest biggest awesome today lively splendid awesome today']
bigram_vec = CountVectorizer(ngram_range=(2,2))
X = bigram_vec.fit_transform(test_sent)
print bigram_vec.get_feature_names()
Xc = (X.T * X)
print Xc
print ' '
print Xc.todense()
(4, 0) 1
(2, 0) 2
(0, 0) 1
(3, 0) 1
(1, 0) 2
(7, 0) 1
(5, 0) 1
(6, 0) 1
(4, 1) 2
(2, 1) 4
(0, 1) 2
(3, 1) 2
(1, 1) 4
(7, 1) 2
(5, 1) 2
(6, 1) 2
(4, 2) 2
(2, 2) 4
(0, 2) 2
(3, 2) 2
(1, 2) 4
(7, 2) 2
(5, 2) 2
(6, 2) 2
(4, 3) 1
: :
(6, 4) 1
(4, 5) 1
(2, 5) 2
(0, 5) 1
(3, 5) 1
(1, 5) 2
(7, 5) 1
(5, 5) 1
(6, 5) 1
(4, 6) 1
(2, 6) 2
(0, 6) 1
(3, 6) 1
(1, 6) 2
(7, 6) 1
(5, 6) 1
(6, 6) 1
(4, 7) 1
(2, 7) 2
(0, 7) 1
(3, 7) 1
(1, 7) 2
(7, 7) 1
(5, 7) 1
(6, 7) 1
[[1 2 2 1 1 1 1 1]
[2 4 4 2 2 2 2 2]
[2 4 4 2 2 2 2 2]
[1 2 2 1 1 1 1 1]
[1 2 2 1 1 1 1 1]
[1 2 2 1 1 1 1 1]
[1 2 2 1 1 1 1 1]
[1 2 2 1 1 1 1 1]]
This one seems to tokenize by bigrams, since calling bigram_vec.get_feature_names() gives
[u'awesome biggest', u'awesome today', u'biggest awesome', u'biggest biggest', u'hello biggest', u'lively splendid', u'splendid awesome', u'today lively']
Some help interpreting this would be great. It's a symmetric matrix, so I'm thinking it might just be the number of occurrences?
First you need to check out the feature names which the CountVectorizer is using.
Do this:
bigram_vec.get_feature_names()
# Out: [u'am', u'dont', u'hello', u'to', u'want']
You see that the word "i" is not present. That's because the default tokenizer uses a pattern:
token_pattern : string
Regular expression denoting what constitutes a “token”, only used if
analyzer == 'word'. The default regexp select tokens of 2 or more
alphanumeric characters (punctuation is completely ignored and always
treated as a token separator).
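The effect of that default pattern can be reproduced with the standard `re` module alone. This is only an illustration of the regular expression, not of CountVectorizer's full pipeline; `kept` and `vocabulary` are illustrative names.

```python
import re

# Default token_pattern documented for CountVectorizer (analyzer == 'word'):
# a token is a run of two or more word characters.
DEFAULT_TOKEN_PATTERN = r"(?u)\b\w\w+\b"

test_sent = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']

# Tokens found in each one-word "document"; 'i' is too short to match.
kept = [re.findall(DEFAULT_TOKEN_PATTERN, doc) for doc in test_sent]
vocabulary = sorted({tok for toks in kept for tok in toks})
print(vocabulary)  # ['am', 'dont', 'hello', 'to', 'want']
```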
The actual output of X should then be interpreted as one row per input string and one column per feature name:
         [u'am', u'dont', u'hello', u'to', u'want']
'hello' [[ 0 0 1 0 0]
'i'      [ 0 0 0 0 0]
'am'     [ 1 0 0 0 0]
'hello'  [ 0 0 1 0 0]
'i'      [ 0 0 0 0 0]
'dont'   [ 0 1 0 0 0]
'want'   [ 0 0 0 0 1]
'to'     [ 0 0 0 1 0]
'i'      [ 0 0 0 0 0]
'dont'   [ 0 1 0 0 0]]
Now when you do X.T * X this should be interpreted as:
           u'am' u'dont' u'hello' u'to' u'want'
u'am'    [[1 0 0 0 0]
u'dont'   [0 2 0 0 0]
u'hello'  [0 0 2 0 0]
u'to'     [0 0 0 1 0]
u'want'   [0 0 0 0 1]]
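Why the product is diagonal can be checked with plain NumPy. Every input string is a single word, so each row of X has at most one 1, and X.T * X simply counts how often each word occurs. The sketch below builds X by hand from the feature names rather than calling scikit-learn.

```python
import numpy as np

vocab = ['am', 'dont', 'hello', 'to', 'want']
docs = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']

# One row per document, one column per vocabulary word.
# 'i' is not in the vocabulary, so its rows are all zeros.
X = np.array([[int(doc == word) for word in vocab] for doc in docs])

# Entry (p, q) counts documents containing both word p and word q.
# With one word per document, only the diagonal can be non-zero.
Xc = X.T @ X
print(Xc)
```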
If you are expecting anything else, then you should add the details in the question.
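If the goal is the one stated in the question, counting how often one word directly follows another, a minimal sketch without CountVectorizer could look like this. It assumes the corpus is already a flat list of tokens; `follows`, `index`, and `pair_counts` are illustrative names, and the orientation (rows are the first word, columns are the word that follows) is a choice made here.

```python
from collections import Counter

import numpy as np

tokens = ['hello', 'i', 'am', 'hello', 'i', 'dont', 'want', 'to', 'i', 'dont']

vocab = sorted(set(tokens))
index = {word: k for k, word in enumerate(vocab)}

# Count each adjacent (first, second) pair in the token stream.
pair_counts = Counter(zip(tokens, tokens[1:]))

# follows[r, c] = number of times vocab[c] comes directly after vocab[r].
follows = np.zeros((len(vocab), len(vocab)), dtype=int)
for (first, second), n in pair_counts.items():
    follows[index[first], index[second]] = n

print(vocab)
print(follows)
```

Transposing `follows` gives the other convention, where the row word follows the column word.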
I have a dataframe:
>>> df = pd.DataFrame(np.random.random((3,3)))
>>> df
0 1 2
0 0.732993 0.611314 0.485260
1 0.935140 0.153149 0.065653
2 0.392037 0.797568 0.662104
What is the easiest way for me to convert each entry to a 2-tuple, with the first element from the current dataframe and the second element from the last column ('2')?
i.e. I want the final results to be:
0 1 2
0 (0.732993, 0.485260) (0.611314, 0.485260) (0.485260, 0.485260)
1 (0.935140, 0.065653) (0.153149, 0.065653) (0.065653, 0.065653)
2 (0.392037, 0.662104) (0.797568, 0.662104) (0.662104, 0.662104)
As of pandas version 0.20, you can use df.transform:
In [111]: df
Out[111]:
0 1 2
0 1 3 4
1 2 4 5
2 3 5 6
In [112]: df.transform(lambda x: list(zip(x, df[2])))
Out[112]:
0 1 2
0 (1, 4) (3, 4) (4, 4)
1 (2, 5) (4, 5) (5, 5)
2 (3, 6) (5, 6) (6, 6)
Or, another solution using df.apply:
In [113]: df.apply(lambda x: list(zip(x, df[2])))
Out[113]:
0 1 2
0 (1, 4) (3, 4) (4, 4)
1 (2, 5) (4, 5) (5, 5)
2 (3, 6) (5, 6) (6, 6)
You can also use dict comprehension:
In [126]: pd.DataFrame({i : df[[i, 2]].apply(tuple, axis=1) for i in df.columns})
Out[126]:
0 1 2
0 (1, 4) (3, 4) (4, 4)
1 (2, 5) (4, 5) (5, 5)
2 (3, 6) (5, 6) (6, 6)
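The snippets above are interactive-session output. A self-contained sketch of the same pairing, written as a dict comprehension over `zip` without `apply`, might look like this; `last` and `paired` are illustrative names, and the small integer frame is the one shown in the session above.

```python
import pandas as pd

df = pd.DataFrame([[1, 3, 4], [2, 4, 5], [3, 5, 6]])

# Column whose values become the second element of every tuple.
last = df[2]

# Pair each column's values with the last column, row by row.
paired = pd.DataFrame(
    {col: list(zip(df[col], last)) for col in df.columns},
    index=df.index,
)
print(paired)
```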
I agree with Corley's comment that you are better off leaving the data in its current format and changing your algorithm to process data explicitly from the second column.
However, to answer your question, you can define a function that does what's desired and call it using apply.
I don't like this answer: it is ugly, and "apply" is syntactic sugar for a for loop, so you are definitely better off not using this:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.random((3,3)))
df
0 1 2
0 0.847380 0.897275 0.462872
1 0.161202 0.852504 0.951304
2 0.093574 0.503927 0.986476
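def make_tuple(row):
    # Pair every value in the row with the row's last value.
    last = row.iloc[-1]
    # Returning a Series keeps the result a DataFrame on recent pandas versions.
    return pd.Series([(x, last) for x in row], index=row.index)
df.apply(make_tuple, axis=1)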
0 (0.847379908309, 0.462871875315) (0.897274903359, 0.462871875315)
1 (0.161202442072, 0.951303842798) (0.852504052133, 0.951303842798)
2 (0.0935742441563, 0.986475692614) (0.503927404884, 0.986475692614)
2
0 (0.462871875315, 0.462871875315)
1 (0.951303842798, 0.951303842798)
2 (0.986475692614, 0.986475692614)
I have a list of columns that I need to concatenate. An example table would be:
import random
import numpy as np
import pandas as pd
cats1=['T_JW', 'T_BE', 'T_FI', 'T_DE', 'T_AP', 'T_KI', 'T_HE']
data=np.array([random.sample(list(range(0,2))*7,7)]*3)  # list(...) is needed on Python 3
df_=pd.DataFrame(data, columns=cats1)
So I need to get the concatenation of each line (if it's possible with a blank space between each value). I tried:
listaFin=['']*1000
for i in cats1:
    lista=list(df_[i])
    listaFin=zip(listaFin,lista)
But I get a list of tuples:
listaFin:
[((((((('', 0), 0), 1), 0), 1), 0), 1),
((((((('', 0), 0), 1), 0), 1), 0), 1),
((((((('', 0), 0), 1), 0), 1), 0), 1)]
And I need to get something like
[0 0 1 0 1 0 1,
0 0 1 0 1 0 1,
0 0 1 0 1 0 1]
How can I do this using only one loop or fewer (I don't want to use a double loop)?
Thanks.
I don't think you can have a list of space delimited integers in Python without them being in a string (I might be wrong). Having said that, the answer I have is:
output = []
for i in range(0,df_.shape[0]):
    output.append(' '.join(str(x) for x in list(df_.loc[i])))
print(output)
output looks like this:
['1 0 0 0 1 0 1', '1 0 0 0 1 0 1', '1 0 0 0 1 0 1']
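For a version with no explicit loop, one possible sketch is to convert the frame to strings and join each row with `apply`. The fixed data below replaces the random sample so that the result is reproducible, and `joined` is an illustrative name.

```python
import pandas as pd

cats1 = ['T_JW', 'T_BE', 'T_FI', 'T_DE', 'T_AP', 'T_KI', 'T_HE']

# Fixed rows instead of random.sample, so the output is predictable.
df_ = pd.DataFrame([[0, 0, 1, 0, 1, 0, 1]] * 3, columns=cats1)

# Cast every cell to str, then join each row's values with a space.
joined = df_.astype(str).apply(" ".join, axis=1).tolist()
print(joined)  # ['0 0 1 0 1 0 1', '0 0 1 0 1 0 1', '0 0 1 0 1 0 1']
```

The result is still a list of strings, for the same reason given above.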