How to print a specific key from a string of dictionaries. - python

I want to get a one-hot encoding based on the elements of each inner list when using sklearn's transform.
Code:
from sklearn.feature_extraction.text import CountVectorizer
from itertools import chain

x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

vector = CountVectorizer(token_pattern=r".+", min_df=1, max_df=1.0,
                         lowercase=False, max_features=None)
vec = [xxx for xx in x for xxx in xx]
vector.fit(chain.from_iterable([vec]))
print(vector.get_feature_names())

new = []
for xx in x:
    new.append(vector.transform(xx))
for x in new:
    for xx in x.toarray():
        print(xx)
Current output:
['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
[0 1 0 0 0 0 0 0 0 0]
[0 0 0 1 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 1 0]
[0 0 0 0 0 0 0 0 0 1]
[0 0 0 0 0 0 1 0 0 0]
[1 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 1 0 0]
[0 0 1 0 0 0 0 0 0 0]
[0 0 0 0 0 1 0 0 0 0]
[0 0 0 0 1 0 0 0 0 0]
My expected output:
['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
[0 1 0 1 0 0 0 0 1 1]
[1 0 0 0 0 0 1 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 1 1 0 0 0 0]
Is there a way to do it using my code? I have tried changing it many times, but with no luck. Somehow, my brain has stopped processing anything at this point.

You shouldn't need explicit for loops for this task. You can use MultiLabelBinarizer instead, also from the sklearn library. It doesn't handle empty lists, so just filter those out first.
Here's an example with Pandas:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

s = pd.Series(list(filter(None, L)))

mlb = MultiLabelBinarizer()
res = pd.DataFrame(mlb.fit_transform(s),
                   columns=mlb.classes_,
                   index=s.index)
print(res)
   1  1234  3  5678  6  7  8  9  910  baba
0  0     1  0     1  0  0  0  0    1     1
1  1     0  0     0  0  0  1  0    0     0
2  0     0  1     0  0  0  0  1    0     0
3  0     0  0     0  1  1  0  0    0     0
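If you also want all-zero rows for the empty lists, so the result lines up with the original 8 entries, one way (a sketch built on top of the answer above, not part of it) is to keep the original positions and reindex afterwards:
import pandas as pd
from sklearn.preprocessing import MultiLabelBinarizer

L = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

s = pd.Series(L)
nonempty = s[s.map(len) > 0]                 # keep the original positions
mlb = MultiLabelBinarizer()
res = pd.DataFrame(mlb.fit_transform(nonempty),
                   columns=mlb.classes_,
                   index=nonempty.index)
res = res.reindex(s.index, fill_value=0)     # empty lists become all-zero rows
print(res)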

You can try using set intersection together with np.isin.
The intersection gives the elements the list has in common with the full feature list, and np.isin turns that into a boolean mask over the feature list.
import numpy as np

mask = ['1', '1234', '3', '5678', '6', '7', '8', '9', '910', 'baba']
for xx in x:
    if len(xx) > 1:
        print(np.isin(mask, np.array(list(set(xx).intersection(set(mask))))).astype(int))
Out:
[0 1 0 1 0 0 0 0 1 1]
[1 0 0 0 0 0 1 0 0 0]
[0 0 1 0 0 0 0 1 0 0]
[0 0 0 0 1 1 0 0 0 0]
Flattening the lists
# if you have big lists of elements, you can flatten them with
sum(x, [])
Out:
['1234', '5678', '910', 'baba', '8', '1', '9', '3', '7', '6']
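If the lists are long, itertools.chain is an alternative way to flatten (my own note, not part of the answer); it avoids building the intermediate lists that sum(x, []) creates:
from itertools import chain

flat = list(chain.from_iterable(x))
print(flat)
# ['1234', '5678', '910', 'baba', '8', '1', '9', '3', '7', '6']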

For future readers:
I somehow solved it in a SUPER NAIVE way.
Here is the code:
from sklearn.feature_extraction.text import CountVectorizer
from itertools import chain

x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

vector = CountVectorizer(token_pattern=r"\S*\d+\S*", min_df=1, max_df=1.0,
                         lowercase=False, max_features=None)
vec = [xxx for xx in x for xxx in xx]
vector.fit(chain.from_iterable([vec]))
print(vector.get_feature_names())

new = []
for xx in x:
    new.append(" ".join(xx))
neww = vector.transform(new)
print(neww.toarray())
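A variant of the same idea, as a sketch of my own rather than part of the original post: join each inner list into one space-separated document and use a whitespace token pattern, so purely alphabetic tokens such as 'baba' are kept as features too:
from sklearn.feature_extraction.text import CountVectorizer

x = [['1234', '5678', '910', 'baba'], ['8', '1'],
     [], ['9', '3'], [], ['7', '6'], [], []]

# r"\S+" tokenizes on whitespace, so tokens without digits are kept as well
vector = CountVectorizer(token_pattern=r"\S+", lowercase=False)
docs = [" ".join(xx) for xx in x]
vector.fit(doc for doc in docs if doc)   # skip the empty documents when fitting
print(vector.get_feature_names())        # get_feature_names_out() on newer scikit-learn
print(vector.transform(docs).toarray())  # empty lists come out as all-zero rows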

Related

Replace all columns in dataframe from index 1 onwards using conditions

I want to replace the columns of a dataframe after the first column, based on the first column. Suppose we have:
import pandas as pd

df = {'Z': ['1', '0', '1', '1', '0'],
      'A': ['1', '1', '1', '0', '0'],
      'B': ['0', '0', '1', '0', '0'],
      'C': ['1', '0', '0', '0', '1']}
df = pd.DataFrame(df, columns=['Z', 'A', 'B', 'C'])
df
I want to replace each value with 1 IF it equals the value in column Z, ELSE 0.
The desired outcome is the following:
df2 = {'Z': ['1', '0', '1', '1', '0'],
       'A': ['1', '0', '1', '0', '1'],
       'B': ['0', '1', '1', '0', '1'],
       'C': ['1', '1', '0', '0', '0']}
df2 = pd.DataFrame(df2, columns=['Z', 'A', 'B', 'C'])
df2
The problem is that I have 60 columns (A,B,C,D,.....) and I want to be able to do them at the same time.
Use numpy broadcasting:
# Z column
z = df.iloc[:, 0].values
# rest of columns
rest = df.iloc[:, 1:].values
# do comparison and set values
df.iloc[:, 1:] = (z[:, None] == rest).astype(int)
print(df)
Output
   Z  A  B  C
0  1  1  0  1
1  0  0  1  1
2  1  1  1  0
3  1  0  0  0
4  0  1  1  0
If you need a new DataFrame, do the following:
z = df.iloc[:, 0].values
rest = df.iloc[:, 1:].values
df2 = pd.DataFrame(data=(z[:, None] == rest).astype(int), columns=df.columns[1:], index=df['Z']).reset_index()
print(df2)
Output
   Z  A  B  C
0  1  1  0  1
1  0  0  1  1
2  1  1  1  0
3  1  0  0  0
4  0  1  1  0
You can use DataFrame.eq along axis=0 to compare column Z with the rest of the columns, then join the resulting dataframe back with column Z and mask the NaN values:
df[['Z']].join(df.drop('Z', axis=1).eq(df['Z'], axis=0).astype(int)).mask(df.isna())
   Z  A  B  C
0  1  1  0  1
1  0  0  1  1
2  1  1  1  0
3  1  0  0  0
4  0  1  1  0
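The same pipeline split into steps, just as a readability sketch of the one-liner above:
eq = df.drop('Z', axis=1).eq(df['Z'], axis=0).astype(int)  # compare each column with Z
out = df[['Z']].join(eq)                                    # put Z back in front
out = out.mask(df.isna())                                   # restore any missing values
print(out)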
I think there's an easy way to do that by checking for equality and converting to integer.
z = df["Z"]
others = [c for c in df.columns if c != "Z"] # all columns but 'Z'
df[others] = df[others].transform(lambda x : x.eq(z).astype(int))
Output:
   Z  A  B  C
0  1  1  0  1
1  0  0  1  1
2  1  1  1  0
3  1  0  0  0
4  0  1  1  0
Note that there is a way to keep the NAs, but you must use pandas nullable data types; see the docs on nullable integer data types and text data types.
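A minimal sketch of that idea, assuming the dataframe contains missing values (which the example data above does not): cast the comparison to the nullable "Int64" dtype so missing cells stay <NA> instead of becoming 0.
import numpy as np
import pandas as pd

df = pd.DataFrame({'Z': [1, 0, 1], 'A': [1.0, np.nan, 1.0], 'B': [0.0, 0.0, np.nan]})
others = [c for c in df.columns if c != 'Z']
res = df[others].eq(df['Z'], axis=0).astype('Int64').mask(df[others].isna())
print(df[['Z']].join(res))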

Pandas crosstab - How to print rows/columns for values that don't exist in the data sets?

I am a beginner with pandas at best and I couldn't find a solution to this problem anywhere.
Let's say I have two variables: variable1, variable2.
They can have the following predefined values:
variable1 = ['1', '4', '9', '15', '20']
variable2 = ['2', '5', '6']
However, the current data set only has some of those values:
df = pd.DataFrame({'variable1': ['1', '9', '20'],
                   'variable2': ['2', '2', '6']})
When crossing the variables:
pd.crosstab(df.variable1, df.variable2)
I get:
variable2  2  6
variable1      
1          1  0
20         0  1
9          1  0
Is there a way to put all the possible categorical values in both the columns and the rows, even if the current data set does not contain all of them? The goal is to get a table of the same size when running the script on an updated data set that may include values that were not present in the previous data set.
Use DataFrame.reindex:
variable1 = ['1', '4', '9', '15', '20']
variable2 = ['2', '5', '6']
df = pd.DataFrame({'variable1': ['1', '9', '20'],
                   'variable2': ['2', '2', '6']})
print (df)
  variable1 variable2
0         1         2
1         9         2
2        20         6
df = pd.crosstab(df.variable1, df.variable2)
df = df.reindex(index=variable1, columns=variable2, fill_value=0)
print (df)
variable2  2  5  6
variable1         
1          1  0  0
4          0  0  0
9          1  0  0
15         0  0  0
20         0  0  1
from collections import OrderedDict

valuelabels = OrderedDict([('S8', [['1', 'Medical oncology'],
                                   ['2', 'Hematology'],
                                   ['3', 'Hematology/Oncology'],
                                   ['4', 'Other']]),
                           ('S9', [['1', 'Academic / Teaching Hospital'],
                                   ['2', 'Community-Based Solo Private Practice'],
                                   ['3', 'Community-Based Group Private Practice (record practice size )'],
                                   ['4', 'Community Non-Teaching Hospital'],
                                   ['5', 'Comprehensive Cancer Center'],
                                   ['6', 'Other (specify)']])])
#print (valuelabels)

df = pd.DataFrame({'variable1': ['1', '2', '4'],
                   'variable2': ['2', '3', '1']})

table = pd.crosstab(df.variable1, df.variable2)
print (table)
variable2  1  2  3
variable1         
1          0  1  0
2          0  0  1
4          1  0  0
d1 = dict(list(zip([a[0] for a in valuelabels['S8']], [a[1] for a in valuelabels['S8']])))
print (d1)
{'4': 'Other', '1': 'Medical oncology', '2': 'Hematology', '3': 'Hematology/Oncology'}
d2 = dict(list(zip([a[0] for a in valuelabels['S9']], [a[1] for a in valuelabels['S9']])))
print (d2)
{'1': 'Academic / Teaching Hospital',
'3': 'Community-Based Group Private Practice (record practice size )',
'4': 'Community Non-Teaching Hospital',
'6': 'Other (specify)',
'2': 'Community-Based Solo Private Practice',
'5': 'Comprehensive Cancer Center'}
table = table.reindex(index=[a[0] for a in valuelabels['S8']],
                      columns=[a[0] for a in valuelabels['S9']],
                      fill_value=0)
print (table)
variable2  1  2  3  4  5  6
variable1                  
1          0  1  0  0  0  0
2          0  0  1  0  0  0
3          0  0  0  0  0  0
4          1  0  0  0  0  0
table.index = table.index.to_series().map(d1).values
table.columns = table.columns.to_series().map(d2).values
print (table)
Academic / Teaching Hospital \
Medical oncology 0
Hematology 0
Hematology/Oncology 0
Other 1
Community-Based Solo Private Practice \
Medical oncology 1
Hematology 0
Hematology/Oncology 0
Other 0
Community-Based Group Private Practice (record practice size ) \
Medical oncology 0
Hematology 1
Hematology/Oncology 0
Other 0
Community Non-Teaching Hospital \
Medical oncology 0
Hematology 0
Hematology/Oncology 0
Other 0
Comprehensive Cancer Center Other (specify)
Medical oncology 0 0
Hematology 0 0
Hematology/Oncology 0 0
Other 0 0
You can use reindex:
ct = pd.crosstab(df.variable1, df.variable2)
ct.reindex(index=variable1, columns=variable2).fillna(0).astype('int')
Out:
variable2  2  5  6
variable1         
1          1  0  0
4          0  0  0
9          1  0  0
15         0  0  0
20         0  0  1
import pandas

def TargetPercentByNominal(targetVar,    # target variable
                           predictor):   # nominal predictor
    # frequency table with row/column totals in the 'All' margins
    countTable = pandas.crosstab(index=predictor, columns=targetVar,
                                 margins=True, dropna=True)
    # row percentages: divide each row by its total, excluding the 'All' column
    x = countTable.drop('All', axis=1)
    percentTable = countTable.div(x.sum(1), axis='index') * 100
    print("Frequency Table: \n")
    print(countTable)
    print()
    print("Percent Table: \n")
    print(percentTable)
    return
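A hypothetical usage sketch for the function above (the data and column names are my own, not from the post):
df = pandas.DataFrame({'target': ['y', 'n', 'y', 'y', 'n'],
                       'group':  ['a', 'a', 'b', 'b', 'b']})
TargetPercentByNominal(df['target'], df['group'])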

How to apply logical operator OR in some of the list item?

I want to know if it is possible to include the logical operator OR in a list item. For example:
CHARS = ['X','Y','Z']
I would like to change this line of code to something like the following (I know this is not the correct way):
CHARS = ['X','Y','Z','X OR Y','Y OR Z','X OR Z']
Can anyone help me?
Example code:
import numpy as np

seqs = ["XYZXYZ", "YZYZYZ"]
CHARS = ['X', 'Y', 'Z']
CHARS_COUNT = len(CHARS)
maxlen = max(map(len, seqs))

res = np.zeros((len(seqs), CHARS_COUNT * maxlen), dtype=np.uint8)
for si, seq in enumerate(seqs):
    seqlen = len(seq)
    arr = np.chararray((seqlen,), buffer=seq)
    for ii, char in enumerate(CHARS):
        res[si][ii*seqlen:(ii+1)*seqlen][arr == char] = 1
print res
It scans through the sequence to detect X first; wherever X occurs, a 1 is assigned, then it does the same for Y and finally Z.
Output:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1]]
Expected output after including the logical OR:
[[1 0 0 1 0 0 0 1 0 0 1 0 0 0 1 0 0 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 1 1 0 1]
[0 0 0 0 0 0 1 0 1 0 1 0 0 1 0 1 0 1 1 0 1 0 1 0 1 1 1 1 1 1 0 1 0 1 0 1]]
The example below is a bit contrived, but using itertools.combinations would be a way to generate combinations of size n for a given list. Combine this with str.join() and you'd be able to generate strings as exemplified in the first part of your question:
import itertools
CHARS = ['X','Y','Z']
allCombinations = [" OR ".join(x) for i in range(1,len(CHARS)) for x in itertools.combinations(CHARS, i)]
print repr(allCombinations)
Output:
['X', 'Y', 'Z', 'X OR Y', 'X OR Z', 'Y OR Z']
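To actually turn those OR combinations into extra one-hot columns, a sketch of my own (not from the answer) is to test membership against each character set with np.isin instead of comparing with a single character:
import itertools
import numpy as np

seqs = ["XYZXYZ", "YZYZYZ"]
CHARS = ['X', 'Y', 'Z']
# singletons first, then the pairs: ('X',), ('Y',), ('Z',), ('X', 'Y'), ('X', 'Z'), ('Y', 'Z')
groups = [c for i in range(1, len(CHARS)) for c in itertools.combinations(CHARS, i)]

maxlen = max(map(len, seqs))
res = np.zeros((len(seqs), len(groups) * maxlen), dtype=np.uint8)
for si, seq in enumerate(seqs):
    arr = np.array(list(seq))
    for gi, group in enumerate(groups):
        # a position gets a 1 if its character is any member of the group
        res[si][gi * len(seq):(gi + 1) * len(seq)][np.isin(arr, group)] = 1
print(res)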

python dict timing mystery

I'm doing sequence alignment, and have run into a rather mysterious timing issue related to the origin of my dict data structure.
Basically, I have a function alignment(s1, s2, scores)
which takes in two strings s1 and s2, and a scoring matrix (as a Python dict) for each possible pair of the 20 amino acids and a gap '-'. So scores has 440 keys (char1, char2), with integer values.
Here is the mystery: If I read scores from a text file (call it scores1) and run
alignment(s1, s2, scores1)
for some 1000-ish long strings s1, s2 of amino acids I get the following timing (using cProfile and not showing the function output):
2537776 function calls in 11.796 seconds
Now if I create the exactly same dict in my file (call it scores2) and run
alignment(s1, s2, scores2)
I get the same output results but in 3 times less time:
2537776 function calls in 4.263 seconds
The output in both cases is identical, it is just the timing that is different.
Running print scores1 == scores2 results in True, so they contain identical information.
I verified that using an arbitrary function (instead of alignment) that accesses the dict
many times yields the same factor of 3 timing discrepancy in the two cases.
There must be some metadata related to where the dicts originated from that is slowing down my function (when from a file), even though in both cases I actually read in the file.
I tried creating a new dict object for each via scores1 = dict(scores1) etc., but the same timing discrepancy persists. Quite confusing, but I'm pretty sure there will be a good lesson in this if I can figure it out.
scores1 = create_score_dict_from_file('lcs_scores.txt')
scores2 = create_score_dict(find_alp(s1, s2), match=1, mismatch=0, indel=0)
print scores1 == scores2 # True
alignment(s1, s2, scores1) # gives right answer in about 12s
alignment(s1, s2, scores2) # gives right answer in about 4s
EDIT: Added code and results below:
Here is a simplified version of the code:
import numpy as np
from time import time

def create_scores_from_file(score_file, sigma=0):
    """
    Creates a dict of the scores for each pair in an alphabet,
    as well as each indel (an amino acid, paired with '-'), which is scored -sigma.
    """
    f = open(score_file, 'r')
    alp = f.readline().strip().split()
    scores = []
    for line in f:
        scores.append(map(int, line.strip().split()[1:]))
    f.close()
    scores = np.array(scores)
    score_dict = {}
    for c1 in range(len(alp)):
        score_dict[(alp[c1], '-')] = -sigma
        score_dict[('-', alp[c1])] = -sigma
        for c2 in range(len(alp)):
            score_dict[(alp[c1], alp[c2])] = scores[c1, c2]
    return score_dict

def score_matrix(alp=('A', 'C', 'G', 'T'), match=1, mismatch=0, indel=0):
    score_dict = {}
    for c1 in range(len(alp)):
        score_dict[(alp[c1], '-')] = indel
        score_dict[('-', alp[c1])] = indel
        for c2 in range(len(alp)):
            score_dict[(alp[c1], alp[c2])] = match if c1 == c2 else mismatch
    return score_dict

def use_dict_in_function(n, d):
    start = time()
    count = 0
    for i in xrange(n):
        for k in d.keys():
            count += d[k]
    print "Time: ", time() - start
    return count

def timing_test():
    alp = tuple('A C D E F G H I K L M N P Q R S T V W Y'.split())
    scores1 = create_scores_from_file('lcs_scores.txt')
    scores2 = score_matrix(alp, match=1, mismatch=0, indel=0)
    print type(scores1), id(scores1)
    print type(scores2), id(scores2)
    print repr(scores1)
    print repr(scores2)
    print type(list(scores1)[0][0])
    print type(list(scores2)[0][0])
    print scores1 == scores2
    print repr(scores1) == repr(scores2)
    n = 10000
    use_dict_in_function(n, scores1)
    use_dict_in_function(n, scores2)

if __name__ == "__main__":
    timing_test()
The results are:
<type 'dict'> 140309927965024
<type 'dict'> 140309928036128
{('S', 'W'): 0, ('G', 'G'): 1, ('E', 'M'): 0, ('P', '-'): 0,... (440 key: values)
{('S', 'W'): 0, ('G', 'G'): 1, ('E', 'M'): 0, ('P', '-'): 0,... (440 key: values)
<type 'str'>
<type 'str'>
True
True
Time: 1.51075315475
Time: 0.352770090103
Here is the contents of the file lcs_scores.txt:
A C D E F G H I K L M N P Q R S T V W Y
A 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
C 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
D 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
E 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
F 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
G 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
H 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
I 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
K 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
L 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
M 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
N 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
P 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
Q 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
R 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
S 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
T 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
V 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
W 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0
Y 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
Which version of Python? And print the repr() of each dict to ensure they really are the same (not just that they compare equal). Can't guess. For example, perhaps you're using Python 2, and in one case your char1 and char2 are plain strings but in the other case they're Unicode strings. Then comparison would say they're the same, but repr() will show the difference:
>>> d1 = {"a": 1}
>>> d2 = {u"a": 1}
>>> d1 == d2
True
>>> print repr(d1), repr(d2)
{'a': 1} {u'a': 1}
In any case, in CPython there is absolutely no internal "metadata" recording where any object came from.
EDIT - something to try
Wonderful job whittling down the problem! This is becoming a pleasure :-) I'd like you to try something. First comment out this line:
scores = np.array(scores)
Then change this line:
score_dict[(alp[c1], alp[c2])] = scores[c1, c2]
to:
score_dict[(alp[c1], alp[c2])] = scores[c1][c2]
^^^^^^
When I do that, the two methods return essentially identical times. I'm not a numpy expert, but my guess is that your "from file" code is using a machine-native numpy integer type for the dict values, and that there's substantial overhead to convert those into Python integers whenever the values are used.
Or maybe not - but that's my guess for now, and I'm sticking to it ;-)
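One way to check that guess (a sketch of my own, not from the answer): look at the type of the stored values, and convert the numpy scalars to plain Python ints when building the dict:
import numpy as np

scores = np.array([[1, 0], [0, 1]])
d_numpy = {('A', 'A'): scores[0, 0]}        # value is a numpy integer scalar
d_plain = {('A', 'A'): int(scores[0, 0])}   # value is a plain Python int

print type(d_numpy[('A', 'A')]), type(d_plain[('A', 'A')])
# summing plain ints avoids converting a numpy scalar on every access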

Read each entire Column of CSV file using python (preferably by help of pandas )

I have some data in Microsoft Excel that I save as a CSV file for ease of use. The data structure is like this:
MS Excel format:
L1
0 1 0 0 0 1 1
0 0 1 0 0 1 0
0 0 0 1 0 0 1
0 0 0 0 1 0 0
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
1 1 1 1 1 1 1
CSV format
L1,,,,,,,,,,,,,,
0,1,0,0,0,1,1,
0,0,1,0,0,1,0,
0,0,0,1,0,0,1,
0,0,0,0,1,0,0,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
As you can see, only the first column has a label. Now I want to read the CSV file (or, if it's easier, the Excel file) to get each column and do some bit-manipulation operations on it. How can I achieve this? I have read a bit about pandas, but I can't find anything useful for fetching each column.
Given the .csv file temp.csv
L1x,,,,,,,
0,1,0,0,0,1,1,
0,0,1,0,0,1,0,
0,0,0,1,0,0,1,
0,0,0,0,1,0,0,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
read it in as follows:
import pandas
a = pandas.read_csv('temp.csv', names = ["c%d" % i for i in range(8)], skiprows = 1)
a
Output:
   c0  c1  c2  c3  c4  c5  c6  c7
0   0   1   0   0   0   1   1 NaN
1   0   0   1   0   0   1   0 NaN
2   0   0   0   1   0   0   1 NaN
3   0   0   0   0   1   0   0 NaN
4   1   1   1   1   1   1   1 NaN
5   1   1   1   1   1   1   1 NaN
6   1   1   1   1   1   1   1 NaN
7   1   1   1   1   1   1   1 NaN
The NaNs in the last column come from the pesky trailing commas. The 8 in the range needs to match the number of columns. To access the columns of a, use either
a.c3
or
a['c3']
both of which result in
0 0
1 0
2 1
3 0
4 1
5 1
6 1
7 1
Name: c3
The cool thing about pandas is that if you want to XOR two columns, you can do it very simply.
a.c0^a.c2
Output
0 0
1 1
2 0
3 0
4 0
5 0
6 0
7 0
Name: c0
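If the trailing NaN column bothers you, one option (a sketch, assuming the same temp.csv as above) is to tell read_csv which columns to keep, so the empty trailing field is never read:
import pandas
# keep only the first 7 columns; the trailing empty field is dropped
a = pandas.read_csv('temp.csv', names=["c%d" % i for i in range(7)],
                    skiprows=1, usecols=range(7))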
Assume you have data which you can save into a CSV file that looks like this:
L1,,,
L2,0,10,20
L3,1,11,21
L4,2,12,22
L5,3,13,23
L6,4,14,24
L7,5,15,25
L8,6,16,26
L9,7,17,27
L10,8,18,28
To get any single column, use the csv reader and transpose with zip:
import csv

with open('test.csv', 'rU') as fin:
    reader = csv.reader(fin)
    data = list(reader)
print 'data:', data
# data: [['L1', '', '', ''], ['L2', '0', '10', '20'], ['L3', '1', '11', '21'], ['L4', '2', '12', '22'], ['L5', '3', '13', '23'], ['L6', '4', '14', '24'], ['L7', '5', '15', '25'], ['L8', '6', '16', '26'], ['L9', '7', '17', '27'], ['L10', '8', '18', '28']]
Notice the data is a list of rows. You can transpose that List of Lists using zip to get a list of columns:
trans=zip(*data)
print 'trans:',trans
# trans: [('L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L7', 'L8', 'L9', 'L10'), ('', '0', '1', '2', '3', '4', '5', '6', '7', '8'), ('', '10', '11', '12', '13', '14', '15', '16', '17', '18'), ('', '20', '21', '22', '23', '24', '25', '26', '27', '28')]
Then just index to get a specific column:
print trans[0]
# ('L1', 'L2', 'L3', 'L4', 'L5', 'L6', 'L7', 'L8', 'L9', 'L10')
Of course if you want to do arithmetic on the cells, you will need to convert the string to ints or floats as appropriate.
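For example, a small sketch of that conversion (my own, based on the trans list above): skip the empty header cell and turn the rest of a column into ints:
col = [int(v) for v in trans[1] if v != '']
print col
# [0, 1, 2, 3, 4, 5, 6, 7, 8]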
import pandas as pd
pd.read_excel("foo.xls", "Sheet 1",
              names=["c%d" % i for i in range(7)])
Output:
   c0  c1  c2  c3  c4  c5  c6
0   0   1   0   0   0   1   1
1   0   0   1   0   0   1   0
2   0   0   0   1   0   0   1
3   0   0   0   0   1   0   0
4   1   1   1   1   1   1   1
5   1   1   1   1   1   1   1
6   1   1   1   1   1   1   1
7   1   1   1   1   1   1   1
Sample code that returns a column as an array:
input = """L1,,,,,,,,,,,,,,
0,1,0,0,0,1,1,
0,0,1,0,0,1,0,
0,0,0,1,0,0,1,
0,0,0,0,1,0,0,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
1,1,1,1,1,1,1,
"""
def getColumn(data,column_number):
dump_array=[]
lines=data.split("\n")
for line in lines:
tmp_cell = line.split(",")
dump_array.append(tmp_cell[3])
return dump_array
#for ex. get column 3
getColumn(3,input)
This may give you an idea of how to manipulate your grid...
Note: I don't have an interpreter for testing the code right now, so sorry if there are typos...
