Understanding Matrix to List of List and then Numpy Array - python

I want to construct a matrix like:
      Col1  Col2  Col3  Coln
row1     1     2     4     2
row2     3     8     3     3
row3     8     7     7     3
rown     n     n     n     n
I have yet to find anything in the Python documentation that states how a list of lists should be assembled. Is it like this:
a = [[1,2,4,2],[3,8,3,3],[8,7,7,3],[n,n,n,n]]
where each row is a list item, or should each column be a list item:
b = [[1,3,8,n],[2,8,7,n],[4,3,7,n],[2,3,3,n]]
I would think that this would be a common question but I can't seem to find a straight answer.
Based on the documentation I'm guessing that I can convert this to a numpy array by simply:
np.array(a)
Can anyone help?

You want the first version:
a = [[1,2,4,2],[3,8,3,3],[8,7,7,3],[n,n,n,n]]
When accessing an element in a matrix, you typically use matrix[row][col], so with the above Python list format a[i] would give you row i, and a[i][j] would give you the jth element from the ith row.
To convert it to a numpy array, np.array(a) is the correct method.
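The row-first convention can be sketched with built-in lists alone. This is only an illustration: 0 stands in for the placeholder n from the question, and zip(*a) is one common way to get at the columns; neither comes from the answer above.

```python
# Each inner list is one row, so a[i][j] is row i, column j.
# The last row uses 0 as a stand-in for the question's placeholder n.
a = [[1, 2, 4, 2],
     [3, 8, 3, 3],
     [8, 7, 7, 3],
     [0, 0, 0, 0]]

second_row = a[1]      # the whole of row 1 (counting from 0)
element = a[1][2]      # row 1, column 2

# Columns are not stored directly; zip(*a) regroups the rows into columns.
columns = [list(col) for col in zip(*a)]
third_column = columns[2]
```

With this layout a row is a single lookup, while a column has to be assembled from every row.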

This:
a = [[1,2,4,2],[3,8,3,3],[8,7,7,3],[n,n,n,n]]
will create the list you want, and yes, np.array(a) will convert it to a numpy array.
Also, this is the 'pythonish' way of creating an array with m rows and n columns (and setting all the elements to 0):
a = [[0 for i in range(n)] for j in range(m)]

Since you mention "matrix" let me also add that you have the np.matrix() option as well.
For example: You can use
A = [[1,2,3],[4,5,6],[7,8,9]]
to create a list (of lists), with each inner list representing a row.
Then
AA = np.array(A)
will create a 2D array with the appearance of a matrix, but not all the properties of a matrix.
Whereas
AM = np.matrix(A)
will create a matrix.
If you perform arithmetic operations on these two then you'll see the difference. For example
AA**2
will square each element in the 2D array. However
AM**2
will perform matrix multiplication of AM by itself.
BTW. The above usage assumes "import numpy as np" of course.
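The difference described above can be checked with a short sketch. It assumes NumPy is installed; the variable names elementwise, matrix_square and same_as_matrix are made up for illustration. The @ line is an addition, included because the NumPy documentation suggests regular arrays over np.matrix for new code.

```python
import numpy as np

A = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
AA = np.array(A)     # plain 2D array
AM = np.matrix(A)    # matrix object

elementwise = AA ** 2       # squares each entry individually
matrix_square = AM ** 2     # matrix product of AM with itself

# For plain arrays, the @ operator gives the same matrix product.
same_as_matrix = AA @ AA
```

So the choice between the two types mostly affects what * and ** mean; the stored data is the same.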

Use the first convention. If you need the transpose:
>>> a = [[1,2,4,2],[3,8,3,3],[8,7,7,3],['n','n','n','n']]
>>> trans=[]
>>> for i in range(len(a[0])):
...     trans.append([row[i] for row in a])
...
>>> trans
[[1, 3, 8, 'n'], [2, 8, 7, 'n'], [4, 3, 7, 'n'], [2, 3, 3, 'n']]
An element is then a[row][col] versus trans[col][row] (with respect to the a in your example).
Python itself uses the first convention, and the reason to prefer it is easy to see when the list is laid out one row per line:
a = [[1,2,4,2],
     [3,8,3,3],
     [8,7,7,3],
     ['n','n','n','n']]
Certainly when you use numpy, use the first convention since that is used by numpy:
>>> np.array(a)
array([['1', '2', '4', '2'],
       ['3', '8', '3', '3'],
       ['8', '7', '7', '3'],
       ['n', 'n', 'n', 'n']],
      dtype='|S1')
>>> np.array(trans)
array([['1', '3', '8', 'n'],
       ['2', '8', '7', 'n'],
       ['4', '3', '7', 'n'],
       ['2', '3', '3', 'n']],
      dtype='|S1')
Note: numpy converts the ints to strings because of the 'n' in the final row/col.
When you actually start to print that table, here is one way:
def pprint_table(table):
    def format_field(field, fmt='{:,.0f}'):
        if type(field) is str: return field
        if type(field) is tuple: return field[1].format(field[0])
        return fmt.format(field)
    def get_max_col_w(table, index):
        return max([len(format_field(row[index])) for row in table])
    col_paddings=[get_max_col_w(table, i) for i in range(len(table[0]))]
    for i,row in enumerate(table):
        # left col
        row_tab=[row[0].ljust(col_paddings[0])]
        # rest of the cols
        row_tab+=[format_field(row[j]).rjust(col_paddings[j]) for j in range(1,len(row))]
        print(' '.join(row_tab))
pprint_table([
    ['','Col 1', 'Col 2', 'Col 3', 'Col 4'],
    ['row 1', '1','2','4','2'],
    ['row 2','3','8','3','3'],
    ['row 3','8','7','7','3'],
    ['row 4', 'n','n','n','n']])
Prints:
      Col 1 Col 2 Col 3 Col 4
row 1     1     2     4     2
row 2     3     8     3     3
row 3     8     7     7     3
row 4     n     n     n     n

Related

series of list to multidimensional np array

I have a pandas dataframe df.
One column holds strings of numbers (as characters) separated by blank spaces.
I need to convert it to a multidimensional numpy array.
I thought that:
df.A.apply(lambda x: np.array(x.split(" "))).values
would do the trick.
Actually it returns an array of arrays:
array([array(['70', '80', '82', ..., '106', '109', '82'], dtype='<U3'),
       array(['151', '150', '147', ..., '193', '183', '184'], dtype='<U3'),
This does not seem to be what I am looking for, which should rather look like:
array([[[['70', '80', '82', ..., '106', '109', '82'],['151', '150', '147', ..., '193', '183', '184']....
First: what should I do to get my data in the second format?
Second: I am actually a bit confused about the difference between the two data structures. At the end of the day a multidimensional array is an array of arrays. From this perspective it would seem that the two are the same structure. But I am sure I am missing something.
EXAMPLE:
df=pd.DataFrame({"A":[0,1,2,3],"B":["1 2 3 4","5 6 7 8","9 10 11 12","13 14 15 16"]})
   A              B
0  0      "1 2 3 4"
1  1      "5 6 7 8"
2  2   "9 10 11 12"
3  3  "13 14 15 16"
This command
df.B.apply(lambda x: np.array(x.split(" "))).values
gives:
array([array(['1', '2', '3', '4'], dtype='<U1'),
       array(['5', '6', '7', '8'], dtype='<U1'),
       array(['9', '10', '11', '12'], dtype='<U2'),
       array(['13', '14', '15', '16'], dtype='<U2')], dtype=object)
instead of
array([['1', '2', '3', '4'],
       ['5', '6', '7', '8'],
       ['9', '10', '11', '12'],
       ['13', '14', '15', '16']], dtype='<U2')
Question 1: How do I get this last structure?
Question 2: What is the difference between the two? Technically both are arrays of arrays...
You can do it by using str.split on df.A directly with the parameter expand=True, and then taking values:
df = pd.DataFrame({'A':['70 80 82','151 150 147']})
print (df.A.str.split(' ',expand=True).values)
array([['70', '80', '82'],
       ['151', '150', '147']], dtype=object)
With your method, if all the strings contain the same number of values, you can still use np.stack to get the same result:
print (np.stack(df.A.apply(lambda x: np.array(x.split(" "))).values))
EDIT: As for the difference, I am not sure I can explain it well enough, but I will try. Let's define
arr1 = df.A.str.split(' ',expand=True).values
arr2 = df.A.apply(lambda x: np.array(x.split(" "))).values
First you can notice that the shape is not the same:
print(arr1.shape)
(2, 3)
print(arr2.shape)
(2,)
So I would say one difference is that arr2 is a 1D array whose elements happen to be 1D arrays themselves. When you construct arr2 with values, it builds a 1D array from the Series df.A.apply(lambda x: np.array(x.split(" "))) without looking at the type of the elements in that Series. For arr1, the difference is that df.A.str.split(' ',expand=True) is not a Series but a DataFrame, so using values constructs a 2D array with shape (number of rows, number of columns). In both cases you use values, but having an array in each cell of a Series (as created by your method) will not create a 2D array.
Then, if you want to access an element (such as the second element of the first row), you can use arr1[0,1], while arr2[0,1] will raise an error because that structure is not a 2D array. arr2[0][1] gives the right answer, because you access the second element [1] of the first 1D array [0] in arr2.
I hope this gives some explanation.
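The shape difference can be reproduced without pandas. The sketch below assumes NumPy is installed and builds the object array by hand to mimic what the apply(...).values call in the question returns; two_index_ok is a made-up flag for illustration.

```python
import numpy as np

# A 1D object array whose two elements are themselves 1D arrays,
# mimicking the result of Series.apply(...).values in the question.
arr2 = np.empty(2, dtype=object)
arr2[0] = np.array(['70', '80', '82'])
arr2[1] = np.array(['151', '150', '147'])

# Stacking the inner arrays gives a genuine 2D array.
arr1 = np.stack(list(arr2))

# Two-index access works on arr1 but not on arr2.
try:
    arr2[0, 1]
    two_index_ok = True
except IndexError:
    two_index_ok = False
```

Here arr2.shape is (2,) and arr1.shape is (2, 3), which is why only arr1 accepts arr1[0, 1].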

How to read lines from file starting from arbitrary newline in Python

I have a file formatted in a way that lines are separated with a new line, like the following
1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3
I would like to read the lines separately starting, for example, from the second one and save them in an array. I think I can manage the last part, but I can't figure out how to read starting from the nth newline of the file.
Any idea how I can do it?
Thanks.
Best regards.
As files are iterable in Python, you can call next on the file object to skip the first line, for example:
with open('data.txt', 'r') as data:
    next(data)
    for line in data:
        print(line.split())
Would yield:
['2', '2', '2', '2', '2', '2', '2']
['3', '3', '3', '3', '3', '3']
References:
next
str.split
You can use itertools.islice for this, e.g.:
from itertools import islice
with open('filename') as fin:
    wanted = islice(fin, 1, None) # change 1 to the number of lines to skip
    data = [line.split() for line in wanted]
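The islice approach can be wrapped in a small helper. This is a sketch only: read_rows_from, start and the temporary sample file are hypothetical names introduced here so the example runs on its own.

```python
from itertools import islice
import os
import tempfile

def read_rows_from(path, start):
    """Return the lines of `path` as lists of tokens, skipping the first `start` lines."""
    with open(path) as fin:
        return [line.split() for line in islice(fin, start, None)]

# Write the sample file from the question to a temporary location.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('1 1 1 1\n2 2 2 2 2 2 2\n3 3 3 3 3 3\n')
    sample_path = tmp.name

rows = read_rows_from(sample_path, 1)   # start from the second line
os.remove(sample_path)
```

The skipped lines are still read from disk, but islice discards them without building a list of the whole file.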
Well, you could do something like this:
n1, n2 = 0, 2
with open('filename.txt') as f:
    print('\n'.join(f.read().split('\n')[n1:n2+1]))
This would produce (as per the contents in the file you've posted) the output like this:
1 1 1 1
2 2 2 2 2 2 2
3 3 3 3 3 3
EDIT 1:
@mic-tiz According to the comment you posted below, I understand that you wish to have all the numbers in your text file in a single array.
with open('filename.txt') as f:
    array = [i for i in f.read() if not i == ' ']
This code, as you mentioned, would produce a list:
array = ['1', '1', '1', '1', '\n', '2', '2', '2', '2', '2', '2', '2', '\n', '3', '3', '3', '3', '3', '3']
Then, you can print the elements by splitting it on the occurrence of \n character.
EDIT 2:
You can save those numbers in a dictionary using the code below
d = {}
with open('filename.txt') as f:
    array = f.read().split('\n')
    for i in range(len(array)):
        d['l%r'%i] = [int(j) for j in array[i] if not j == ' ']
This will produce d = {'l2': [3, 3, 3, 3, 3, 3], 'l0': [1, 1, 1, 1], 'l1': [2, 2, 2, 2, 2, 2, 2]}
lines = open('test.txt', 'r').readlines()
# n is your desired line
for lineno in range(n-1, len(lines)):
    print(list(lines[lineno].strip()))
You cannot jump directly to a specific line. You have to read the first n lines:
n = 1
with open('data.txt', 'r') as data:
    for idx, _ in enumerate(data):
        if idx == n:
            break
    for line in data:
        print(line.split())

Python: how to convert a str to an array or list of integer [duplicate]

How do I make something like
x = '1 2 3 45 87 65 6 8'
>>> foo(x)
[1,2,3,45,87,65,6,8]
I'm completely stuck. If I do it by index, the numbers with more than one digit get broken up. Help please.
The simplest solution is to use .split() to create a list of strings:
x = x.split()
Alternatively, you can use a list comprehension in combination with the .split() method:
x = [int(i) for i in x.split()]
You could even use map as a third option:
x = list(map(int, x.split()))
The last two options create a list of ints, if you want integers.
No need to worry, because Python provides the split() method to turn a string into a list.
x='1 2 3 4 67 8 9'
x.split()
['1', '2', '3', '4', '67', '8', '9']
Or, if you want the output as integers, you can use the map function (wrapped in list() on Python 3):
list(map(int, x.split(' ')))
[1, 2, 3, 4, 67, 8, 9]
If the input has spaces at the beginning or end of the string, or an uneven number of spaces between the items, s.split(' ') also returns empty items:
>>> s=' 1 2  3 4 67 8 9 '
>>> list(s.split(' '))
['', '1', '2', '', '3', '4', '67', '8', '9', '']
It's better to avoid specifying a delimiter:
>>> list(s.split())
['1', '2', '3', '4', '67', '8', '9']
If the optional second argument sep is absent or None, the words are separated by arbitrary strings of whitespace characters (space, tab, newline, return, formfeed).
If you want to split only at spaces, empty strings can be easily filtered:
>>> [item for item in s.split(' ') if item]
['1', '2', '3', '4', '67', '8', '9']
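Both behaviours can be combined in one small function. This is a sketch under stated assumptions: parse_ints and its sep parameter are hypothetical names, and the input is assumed to contain only integers and separators.

```python
def parse_ints(text, sep=None):
    """Split `text` on `sep` (any run of whitespace when None) and convert the pieces to int."""
    # `if piece` drops the empty strings that an explicit separator can produce.
    return [int(piece) for piece in text.split(sep) if piece]

s = ' 1 2  3 4 67 8 9 '
with_default = parse_ints(s)            # split() with no argument
with_space = parse_ints(s, ' ')         # split(' ') plus filtering
with_comma = parse_ints('1,2,,45', ',') # another delimiter
```

Filtering the empty pieces makes the explicit-separator form behave like the default one for this kind of input.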
A simple one-liner can be:
print (map(int, x.split()))
As someone wisely corrected me, in Python 3 and later it becomes:
print(list(map(int,x.split())))
This form can also be used in earlier versions.
Just to give a clear explanation:
You can use the string method str.split(), which splits the string into a list. You can learn more about this method here.
Example:
def foo(x):
    x = x.split() # x is now ['1','2','3','45', ..]; the spaces are removed.
    for i, v in enumerate(x): # loop through the list
        x[i] = int(v) # convert each element of x to an integer
    return x
That should do it!
>>> foo('1 2 3 45 87 65 6 8')
[1, 2, 3, 45, 87, 65, 6, 8]
Assuming you only have digits in your input, you can do something like the following (on Python 3, wrap map in list() to get a list back):
>>> x = '1 2 3 45 87 65 6 8'
>>> num_x = list(map(int, filter(None, x.split(' '))))
>>> num_x
[1, 2, 3, 45, 87, 65, 6, 8]
This also handles the case where the digits are separated by more than one space character, or where there are space characters at the front or rear of the input. For example:
>>> x = ' 1 2  3 4 '
>>> num_x = list(map(int, filter(None, x.split(' '))))
>>> num_x
[1, 2, 3, 4]
You can change the argument of x.split(' ') to match other delimiters as well, e.g. , or ; etc.
x = '1 2 3 45 87 65 6 8'
new_list = []
for i in x.split(" "):
    new_list.append(int(i))
Output:
>>> new_list
[1, 2, 3, 45, 87, 65, 6, 8]
If you want to create a list starting from the zeroth position:
x = '1 2 3 4 5 6 7'
result = x.split(" ")[0:]
print(result)
The result will be:
['1', '2', '3', '4', '5', '6', '7']

Problems with iterating over a string

I have a string of numbers I'm trying to iterate through. Say for example the string is 20 characters long, I'm trying to find the product of the first 5 numbers, then the second 5, the third, and so on.
So far I have converted the number to a string, then used an iterating index to produce the numbers I want to find the product of as strings.
I've then split the strings of numbers into an array of characters, then converted the characters to integers. I've then used a function to find the product of those numbers, then add it to an array.
The idea is that once I have the full array, I can find the largest of the products.
The problem I'm having is that after the first iteration, the product is coming back as 0, when it should be much higher.
My code looks like this:
def product(list):
    p = 1
    for i in list:
        p *= i
    return p
products = []
count = 1
testno = 73167176531330624919225119674426574742355349194934969835203127745063262395783180169848018694788518438586156078911294949545950173795833195285320880551112540698747158523863050715693290
startno = 0
endno = 13
end = (len(str(testno)))-1
print("the end is",end)
while count < 4:
    teststring = (str(testno))[startno:endno]
    print("teststring is", teststring)
    strlist = (list(teststring))
    print("strlist is", strlist)
    numlist = list(map(int, strlist))
    print("numlist is",numlist)
    listproduct = (product(numlist))
    print("listproduct is",listproduct)
    products.append(listproduct)
    print("products is now",products)
    startno = startno + 1
    endno = endno + 1
    print("startno is now", startno)
    print("endno is now", endno)
    count += 1
print("the list of products is", products)
print("the biggest product is", max(products))
I have not done this as elegantly as I wanted to, perhaps because I don't properly understand the problem.
The offending output I'm getting looks like this:
the end is 999
teststring is 7316717653133
strlist is ['7', '3', '1', '6', '7', '1', '7', '6', '5', '3', '1', '3', '3']
numlist is [7, 3, 1, 6, 7, 1, 7, 6, 5, 3, 1, 3, 3]
listproduct is 5000940
products is now [5000940]
startno is now 1
endno is now 14
teststring is 3167176531330
strlist is ['3', '1', '6', '7', '1', '7', '6', '5', '3', '1', '3', '3', '0']
numlist is [3, 1, 6, 7, 1, 7, 6, 5, 3, 1, 3, 3, 0]
listproduct is 0
products is now [5000940, 0]
startno is now 2
endno is now 15
teststring is 1671765313306
strlist is ['1', '6', '7', '1', '7', '6', '5', '3', '1', '3', '3', '0', '6']
numlist is [1, 6, 7, 1, 7, 6, 5, 3, 1, 3, 3, 0, 6]
listproduct is 0
products is now [5000940, 0, 0]
startno is now 3
endno is now 16
the list of products is [5000940, 0, 0]
the biggest product is 5000940
I would be most grateful if someone could explain to me what is going wrong, how I can rectify it, and if there are any more elegant ways I could solve this problem.
Many thanks in advance for your help!
@Axtract, just modify your product function as below.
def product(list):
    p = 1
    for i in list:
        if i == 0: # Just use this if check here
            pass
        else:
            p *= i
    return p
You have zeros in your products. The first one happens not to contain a zero, but all the others do.
So, your function is working properly --- just a problem with the input data.
The product of zero and any number is always zero.
Notice that when your numlist has a zero, the product is zero.
Your first iteration doesn't have a zero, which is why you have a nonzero product.
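As for a more compact way to write the search, one possible sketch is below. It is not the asker's code: window_products is a hypothetical helper, math.prod needs Python 3.8 or later, and the digit string is just the first 14 digits of testno. Windows that contain a 0 still come out as 0, which is the correct product.

```python
from math import prod  # available from Python 3.8

def window_products(digits, width):
    """Return the product of every run of `width` consecutive digits in `digits`."""
    nums = [int(ch) for ch in digits]
    return [prod(nums[i:i + width]) for i in range(len(nums) - width + 1)]

# The first 14 digits of testno give two windows of width 13.
sample = window_products('73167176531330', 13)
best = max(sample)
```

Unlike the loop in the question, which stops after three windows because of count < 4, this covers every window in the string.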
