trace the column following the string column for given dataset

I am looking for generic logic that can be applied to any dataset to find the column immediately after the string column.
Problem:
The 5th column should be the sum of the first column and the column that follows the string-typed column.
{
[10, 20, 'tyh', 30],
[66, 88, 'ltk', 99],
[41, 31, 'qed', 11]
}
Expected output:
{
[10, 20, 'tyh', 30, 40],
[66, 88, 'ltk', 99, 165],
[41, 31, 'qed', 11 , 52]
}
The logic should also work for the DataFrame below:
{
[111, 'tfy', 122, 133],
[167, 'elt', 187, 197],
[143, 'xqe', 132, 112]
}
expected output :
{
[111, 'tfy', 122, 133 , 244],
[167, 'elt', 187, 197 , 364],
[143, 'xqe', 132, 112, 255]
}
What I did so far:
data[4] = data[0] + data[3]
But this is hard-coded, so it won't work for the second DataFrame.

str_col_no = df.columns[df.dtypes==object][0]
df[4] = df[0] + df[str_col_no+1]
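The snippet above relies on the column labels being consecutive integers and on string columns having the `object` dtype. A position-based variant might look like the sketch below; the function name `add_first_plus_after_string` is made up for illustration, and "string column" is assumed to mean the first non-numeric column.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype


def add_first_plus_after_string(df):
    """Append a column: first column + the column right after the first non-numeric one."""
    out = df.copy()
    # Position (not label) of the first non-numeric column, assumed to be the string column.
    str_pos = next(i for i, dtype in enumerate(out.dtypes) if not is_numeric_dtype(dtype))
    # Assumption: the new column is labelled with the current column count (4 for a 4-column frame).
    out[len(out.columns)] = out.iloc[:, 0] + out.iloc[:, str_pos + 1]
    return out


df1 = pd.DataFrame([[10, 20, 'tyh', 30], [66, 88, 'ltk', 99], [41, 31, 'qed', 11]])
df2 = pd.DataFrame([[111, 'tfy', 122, 133], [167, 'elt', 187, 197], [143, 'xqe', 132, 112]])
print(add_first_plus_after_string(df1))
print(add_first_plus_after_string(df2))
```

For the first frame this gives 40, 165 and 52. For the second frame it gives 233, 354 and 275, because the column after the string column holds 122, 187 and 132. The values listed in the question for the second frame (244, 364, 255) equal the first column plus the last column, so it is worth checking which rule is actually intended.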

Related

Time series with matrix

I have data in a .mat file and want to separate out the values for every second. The data is a 3-dimensional time-series matrix of shape (7,5,2500), and I want to extract the slices (7,5,1) ... (7,5,2500) separately and save them.
For example:
array([155, 33, 129,167,189,63,35
161, 218, 6,58,36,25,3
89,63,36,25,78,95,21
78,52,36,56,25,15,68
]],
[215, 142, 235,
143, 249, 164],
[221, 71, 229,
56, 91, 120],
[236, 4, 177,
171, 105, 40])
To get one part of this data, for example this matrix:
[215, 142, 235,
143, 249, 164]
what should I do?

a = [[155, 33, 129, 161, 218, 6],
[215, 142, 235, 143, 249, 164],
[221, 71, 229, 56, 91, 120],
[236, 4, 177, 171, 105, 40]]
print(a[1])

Assuming you have your data saved in a numpy array you could use slicing to extract the sub-matrices you need. Here is an example with a (3,5,3) matrix (but the example could be applied to any dimension):
import numpy

A = numpy.array([[[1,1,1],
                  [2,2,2],
                  [3,3,3],
                  [4,4,4],
                  [5,5,5]],
                 [[11,11,11],
                  [21,21,21],
                  [31,31,31],
                  [41,41,41],
                  [51,51,51]],
                 [[12,12,12],
                  [22,22,22],
                  [32,32,32],
                  [42,42,42],
                  [52,52,52]]])
sub_matrix_1 = A[:,:,0]
print(sub_matrix_1)
Will produce:
[[ 1 2 3 4 5]
[11 21 31 41 51]
[12 22 32 42 52]]
EDIT: it is also possible to iterate over the array to get the 3rd dimension array:
for i in range(A.shape[-1]):
    print(A[:,:,i])
    # Your submatrix is A[:,:,i], you can directly manipulate it
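For the (7,5,2500) shape from the question, the same idea can be sketched as follows. It assumes the data has already been loaded into a NumPy array (for a .mat file, `scipy.io.loadmat` is one common way to do that); the array `data` below is only a stand-in with arbitrary values.

```python
import numpy as np

# Stand-in for the real data: shape (7, 5, 2500), values are arbitrary.
data = np.arange(7 * 5 * 2500).reshape(7, 5, 2500)

# Move the time axis to the front so that iterating yields one (7, 5) matrix per time step.
frames = np.moveaxis(data, -1, 0)

for t, frame in enumerate(frames):
    if t < 2:  # print only the first two time steps to keep the output short
        print(t, frame.shape)
```

Each `frame` is a view of `data[:, :, t]`, so it could be written to disk one at a time (for example with `np.save`) without copying the whole array first.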

Numpy modify each array in multidimensional array with arange [duplicate]

This question already has answers here:
Vectorized NumPy linspace for multiple start and stop values
I have a dataset that contains a multidimensional array of shape (2400, 2).
I want to be able to take each of these 2400 rows, and modify them to be a range from the start and end points (the two elements in each of the 2400 rows). The range is always the same length (in my case, a length of 60).
For example, if I have something like this:
array([[ 78, 82],
[ 90, 94],
[ 102, 106]])
My output should be something like this:
array([[ 78, 79, 80, 81, 82],
[ 90, 91, 92, 93, 94],
[ 102, 103, 104, 105, 106]])
The only way I have been able to do this is with a for loop, but I am trying to avoid looping through each row as the dataset can get very large.
Thanks!

Since by necessity all of the aranges need to be equally long, we can create an arange along the first entry and then replicate it for the others.
For example:
x = np.array([[78, 82],
[90, 94],
[102, 106]])
>>> x[:, :1] + np.arange(0, 1 + x[0, 1] - x[0, 0])
# array([[ 78,  79,  80,  81,  82],
#        [ 90,  91,  92,  93,  94],
#        [102, 103, 104, 105, 106]])

If the difference between the second column and first column is always 4, then you can extract the first column and add an array of [0,1,2,3,4] to it:
arr = np.array([[ 78, 82],
[ 90, 94],
[ 102, 106]])
arr[:,:1] + np.arange(5)
Out[331]:
array([[ 78, 79, 80, 81, 82],
[ 90, 91, 92, 93, 94],
[102, 103, 104, 105, 106]])
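Both answers depend on every row spanning the same distance. A small helper that checks this before broadcasting might look like the sketch below; the name `expand_ranges` is hypothetical and the end point is assumed to be inclusive, as in the question's example.

```python
import numpy as np


def expand_ranges(bounds):
    """Turn rows of [start, stop] into rows start..stop (inclusive) without a Python loop."""
    bounds = np.asarray(bounds)
    starts, stops = bounds[:, 0], bounds[:, 1]
    lengths = stops - starts + 1
    # Broadcasting only works if every row has the same length.
    if not (lengths == lengths[0]).all():
        raise ValueError("all rows must span the same number of values")
    # (n, 1) column of starts + (length,) offsets broadcasts to an (n, length) array.
    return starts[:, None] + np.arange(lengths[0])


x = np.array([[78, 82], [90, 94], [102, 106]])
print(expand_ranges(x))
```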

How to use numpy to flip this array?

Suppose I have an array like this:
a = array([[[ 29, 29, 27],
[ 36, 38, 40],
[ 86, 88, 89]],
[[200, 200, 198],
[199, 199, 197],
[194, 194, 194]]])
and I want to flip the 3rd element from left to right in the list-of-lists so it will become like this:
b = array([[[ 29, 29, 89], # 27 became 89
[ 36, 38, 40],
[ 86, 88, 27]], # 89 became 27
[[200, 200, 194], # 198 became 194
[199, 199, 197],
[194, 194, 198]]]) # 194 became 198
I looked up the NumPy manual but I still cannot figure out a solution. .flip and .fliplr look suitable in this case, but how do I use them?

Index the array to select the sub-array, using:
> a[:,:,-1]
array([[ 27,  40,  89],
       [198, 197, 194]])
This selects the last element along the 3rd dimension of a. The sub-array is of shape (2,3). Then reverse the selection using:
a[:,:,-1][:,::-1]
The second slice, [:,::-1], takes everything along the first dimension as-is ([:]), and all of the elements along the second dimension, but reversed ([::-1]). The slice syntax is basically saying: start at the first element and go to the last element ([:]), but do it in reverse order ([::-1]). In pseudo-code you could write it as [start here : end here : use this step size]. The -1 tells it to walk backwards.
Finally, assign the result to the first slice of the original array. This updates/overwrites the original values of a:
a[:,:,-1] = a[:,:,-1][:,::-1]
> a
array([[[ 29, 29, 89],
[ 36, 38, 40],
[ 86, 88, 27]],
[[200, 200, 194],
[199, 199, 197],
[194, 194, 198]]])
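Since the question asks about `.flip` specifically, the same update can be sketched with `np.flip`. The explicit `.copy()` is a cautious choice made here so that the right-hand side does not share memory with the slice being overwritten.

```python
import numpy as np

a = np.array([[[ 29,  29,  27],
               [ 36,  38,  40],
               [ 86,  88,  89]],
              [[200, 200, 198],
               [199, 199, 197],
               [194, 194, 194]]])

# Last element of every row, one row of the result per sub-array: shape (2, 3).
last = a[:, :, -1]

# Reverse within each sub-array (axis 1 of the 2-D selection) and write it back.
a[:, :, -1] = np.flip(last, axis=1).copy()
print(a)
```

For a 2-D selection like `last`, `np.fliplr(last)` reverses the same axis and could be used instead.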

Group Python lists based on repeated items

This question is very similar to "Group Python list of lists into groups based on overlapping items"; in fact, it could be called a duplicate.
Basically, I have a list of sub-lists where each sub-list contains some number of integers (this number is not the same among sub-lists). I need to group all sub-lists that share one integer or more.
The reason I'm asking a new separate question is because I'm attempting to adapt Martijn Pieters' great answer with no luck.
Here's the MWE:
def grouper(sequence):
    result = []  # will hold (members, group) tuples
    for item in sequence:
        for members, group in result:
            if members.intersection(item):  # overlap
                members.update(item)
                group.append(item)
                break
        else:  # no group found, add new
            result.append((set(item), [item]))
    return [group for members, group in result]

gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]
for i, group in enumerate(grouper(gr)):
    print('g{}:'.format(i), group)
and the output I get is:
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
g4: [[86, 84, 81, 82, 83, 85]]
The last group g4 should have been merged with g3, since the lists inside them share the items 81, 84 and 86, and even a single repeated element should be enough for them to be merged.
I'm not sure if I'm applying the code wrong, or if there's something wrong with the code.

You can describe the merge you want to do as a set consolidation or as a connected-components problem. I tend to use an off-the-shelf set consolidation algorithm and then adapt it to the particular situation. For example, IIUC, you could use something like
def consolidate(sets):
    # http://rosettacode.org/wiki/Set_consolidation#Python:_Iterative
    setlist = [s for s in sets if s]
    for i, s1 in enumerate(setlist):
        if s1:
            for s2 in setlist[i+1:]:
                intersection = s1.intersection(s2)
                if intersection:
                    s2.update(s1)
                    s1.clear()
                    s1 = s2
    return [s for s in setlist if s]

def wrapper(seqs):
    consolidated = consolidate(map(set, seqs))
    groupmap = {x: i for i, seq in enumerate(consolidated) for x in seq}
    output = {}
    for seq in seqs:
        target = output.setdefault(groupmap[seq[0]], [])
        target.append(seq)
    return list(output.values())
which gives
>>> for i, group in enumerate(wrapper(gr)):
... print('g{}:'.format(i), group)
...
g0: [[29, 27, 26, 28]]
g1: [[31, 11, 10, 3, 30]]
g2: [[71, 51, 52, 69]]
g3: [[78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81], [86, 68, 67, 84]]
(Order not guaranteed because of the use of the dictionaries.)
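The connected-components view mentioned at the start of this answer can also be sketched with a small union-find structure. This is an alternative to the set-consolidation code, not the method used above; the function name `group_by_shared_items` is made up for illustration and every sub-list is assumed to be non-empty.

```python
def group_by_shared_items(seqs):
    """Group sub-lists that are linked, directly or indirectly, by a shared item."""
    parent = {}

    def find(x):
        # Return the representative of x's component, registering x on first sight.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Every item in a sub-list belongs to the same component as its first item.
    for seq in seqs:
        for item in seq[1:]:
            union(seq[0], item)

    # Collect the sub-lists by the representative of their first item.
    groups = {}
    for seq in seqs:
        groups.setdefault(find(seq[0]), []).append(seq)
    return list(groups.values())


gr = [[29, 27, 26, 28], [31, 11, 10, 3, 30], [71, 51, 52, 69],
      [78, 67, 68, 39, 75], [86, 84, 81, 82, 83, 85], [84, 67, 78, 77, 81],
      [86, 68, 67, 84]]
for i, group in enumerate(group_by_shared_items(gr)):
    print('g{}:'.format(i), group)
```

Because all unions are done before any grouping, the order in which the sub-lists arrive does not affect which ones end up together.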

This sounds like set consolidation. Turn each sub-list into a set: you are interested in the contents, not the order, so sets are the best choice of data structure. See: http://rosettacode.org/wiki/Set_consolidation

2-dimensional Array decomposition in Python

I would appreciate help translating some code to Python 3. It decomposes an input array of any size into smaller square arrays of side length 4.
I have tried chunks and the array functions in numpy, but they did not work for this.
Here is my Perl code, which works well, but I want to compare it with Python (in terms of efficiency).
sub make_array {
    my $input = shift;
    my $result;
    my @parts = split '-', $input;
    $result = [];
    # Test for valid number of lines in inputs
    my $lines = scalar @parts;
    if($lines % $width){
        die "Invalid line count $lines not divisible by $width" ;
        # Or could pad here by adding an entire row of '0'.
    }
    # Chunk input lines into NxN subarrays
    # loop across all input lines in steps of N lines
    my $line_width = 0;
    for (my $nn=0;$nn<$lines;$nn+=$width){
        # make a temp array to handle $width rows of input
        my @temp = (0..$width-1);
        for my $ii (0..$width-1){
            my $p = $parts[$nn+$ii];
            my $padding_needed = length($p) % $width;
            if($padding_needed != 0) {
                print "'$p' is not divisible by correct width of $width, Adding $padding_needed zeros\n";
                for my $pp (0..$padding_needed){
                    $p .= "0";
                }
            }
            if($line_width == 0){
                $line_width = length($p);
            }
            $temp[$ii] = $p;
        }
        # now process temp array left to right, creating keys
        my $chunks = ($line_width/$width);
        if($DEBUG) { print "chunks: $chunks\n"; }
        for (my $zz =0;$zz<$chunks;$zz++){
            if($DEBUG) { print "zz:$zz\n"; }
            my $key;
            for (my $yy=0;$yy<$width;$yy++){
                my $qq = $temp[$yy];
                $key .= substr($qq,$zz*$width, $width) . "-";
            }
            chop $key; # lose the trailing '-'
            if($DEBUG) { print "Key: $key\n"; }
            push @$result, $key;
        }
    }
    if($DEBUG){
        print "Reformatted input:";
        print Dumper $result;
        my $count = scalar @$result;
        print "There are $count keys to check against the lookup table\n";
    }
    return $result;
}
As an example, I have the following 8 x 12 matrix (8 rows of 12 characters):
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
000011110011
and I want it decomposed into 6 square submatrices of side length 4:
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
0000 1111 0011
The original matrix comes from a file (the program should read it from a text file) in the following format:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011
So the program needs to split it by hyphens and take each chunk as a row of the large matrix. The 6 submatrices should come in the same input format, hence the first one would be:
0000,0000,0000,0000
The program should decompose any input matrix into square matrices of side length j, say 4. If the size of the original matrix is not a multiple of 4, it should disregard the leftover chunks that cannot form a 4x4 matrix.
Several large matrices of different sizes could appear in the original input file, separated by line breaks. For example, the original large matrix together with another matrix would look like the following in a text file:
000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011,000011110011\n
0101,0101,0101,0101
The program should then retrieve the 2 sets of subarrays: 6 arrays of 4x4 for the first matrix and a single 4x4 array for the second. A solution for the single-matrix case is of course fine too.

This is easy with numpy. Suppose we have a 12x12 array:
In [1]: import numpy as np
In [2]: a = np.arange(144).reshape([-1,12])
In [3]: a
Out[3]:
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
[ 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23],
[ 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35],
[ 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47],
[ 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59],
[ 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71],
[ 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83],
[ 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95],
[ 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107],
[108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119],
[120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131],
[132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143]])
To select the top-left 4x4 array, use slicing:
In [4]: a[0:4,0:4]
Out[4]:
array([[ 0, 1, 2, 3],
[12, 13, 14, 15],
[24, 25, 26, 27],
[36, 37, 38, 39]])
The bottom-right sub-array is:
In [7]: a[8:12,8:12]
Out[7]:
array([[104, 105, 106, 107],
[116, 117, 118, 119],
[128, 129, 130, 131],
[140, 141, 142, 143]])
You can guess the rest...
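For completeness, the remaining blocks can be collected by looping over the row and column offsets. The sketch below is one possible way to do it, not a definitive implementation: the helper names `parse_matrix`, `square_blocks` and `format_block` are made up for illustration, the separator is assumed to be a comma as in the sample file, and all rows of a matrix are assumed to have the same length.

```python
import numpy as np


def parse_matrix(text, sep=","):
    """Turn '0000,1111,...' into a 2-D array of single characters (sep is an assumption)."""
    return np.array([list(row) for row in text.strip().split(sep)])


def square_blocks(a, size=4):
    """Return the size x size blocks of a, row band by row band; leftovers are dropped."""
    a = np.asarray(a)
    n_rows, n_cols = a.shape[0] // size, a.shape[1] // size
    return [a[r * size:(r + 1) * size, c * size:(c + 1) * size]
            for r in range(n_rows) for c in range(n_cols)]


def format_block(block, sep=","):
    """Write a block back in the same one-line format as the input."""
    return sep.join("".join(row) for row in block)


text = ",".join(["000011110011"] * 8)  # the 8-row sample matrix from the question
for block in square_blocks(parse_matrix(text)):
    print(format_block(block))
```

A file containing several matrices could be handled by calling these helpers once per line.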
