replacing special characters in a numpy array with blanks - python

I have a list of lists (see below) which has ? where a value is missing:
([[1,2,3,4],
[5,6,7,8],
[9,?,11,12]])
I want to convert this to a numpy array using np.array(test); however, the ? value is causing an issue. What I want to do is replace the ? with an empty string '' and then convert to a numpy array, so that I end up with the following array:
([[1,2,3,4],
[5,6,7,8],
[9,,11,12]])

Use a list comprehension:
matrix = ...
new_matrix = [["" if not isinstance(x, int) else x for x in sublist] for sublist in matrix]
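Converting the cleaned list with np.array works, but be aware of what comes out: because '' is a string, NumPy stores every element as a string, not as a number. A minimal sketch, assuming the missing value arrives as the string "?" (a bare ? is not valid Python). The float/np.nan variant at the end is an alternative of my own, useful only if the array needs to stay numeric.

```python
import numpy as np

# Assumption: the missing value is the string "?" (a bare ? is not valid Python).
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, "?", 11, 12]]

# Replace every non-int element with an empty string, as in the answer above.
new_matrix = [["" if not isinstance(x, int) else x for x in sublist] for sublist in matrix]

# Mixing ints and "" makes NumPy pick a Unicode string dtype for the whole array.
str_array = np.array(new_matrix)
print(str_array.dtype.kind)  # U

# Alternative (not from the question): keep the array numeric by marking gaps with NaN.
num_array = np.array(
    [[x if isinstance(x, int) else np.nan for x in sublist] for sublist in matrix],
    dtype=float,
)
print(num_array[2, 1])  # nan
```

With the string array, arithmetic such as str_array + 1 will fail; with the float array, np.nanmean and similar NaN-aware functions can skip the gap.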

Python has no literal ? value. Check this:
a = ?
print(type(a))
The code above raises a SyntaxError, so the missing value must actually be the string "?".
If that is the case, you can use:
list1 = [[1,2,3,4],
         [5,6,7,8],
         [9,"?",11,12]]
for i1, ele in enumerate(list1):
    for i2, x in enumerate(ele):
        if x == "?":
            list1[i1][i2] = ""
print(list1)
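The same replacement can also be done after conversion, on the NumPy array itself. A minimal sketch, assuming the missing value is the string "?": np.array turns the mixed int/str rows into a string array, and a boolean mask then replaces every "?" in one step instead of with nested loops.

```python
import numpy as np

list1 = [[1, 2, 3, 4],
         [5, 6, 7, 8],
         [9, "?", 11, 12]]

# With a mix of ints and "?", np.array produces a Unicode string array.
arr = np.array(list1)

# The boolean mask selects every "?" cell; assigning "" replaces them all at once.
arr[arr == "?"] = ""
print(arr)
```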

This approach uses loops to find elements that can't be turned into integers and replaces them with empty strings.
import numpy as np
preArray = [[1,2,3,4],
            [5,6,7,8],
            [9,'?',11,12]]
newPreArray = []
for row in preArray:
    newRow = []
    for val in row:
        try:
            int(val)
            newRow.append(val)
        except (TypeError, ValueError):
            # val cannot be converted to an int, so blank it out
            newRow.append('')
    newPreArray.append(newRow)
array = np.array(newPreArray)

For a single list you can do something like:
>>> myList = [4, 5, '?', 6]
>>> myNewList = [i if str(i).isdigit() else '' for i in myList]
>>> myNewList
[4, 5, '', 6]
Then apply the same idea to each sublist to make it work with a list of lists.
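Applied to a list of lists, the same comprehension just runs once per row. A minimal sketch, assuming the missing value is the string "?":

```python
matrix = [[1, 2, 3, 4],
          [5, 6, 7, 8],
          [9, "?", 11, 12]]

# Run the single-list comprehension on each row.
new_matrix = [[i if str(i).isdigit() else "" for i in row] for row in matrix]
print(new_matrix)  # [[1, 2, 3, 4], [5, 6, 7, 8], [9, '', 11, 12]]
```

One caveat: str(-3).isdigit() is False (the minus sign is not a digit), so negative numbers would also be blanked with this test.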

Related

Remove NaN from lists in python

I have converted DataFrame rows to lists, and in those lists there are NaN values which I would like to remove.
This is my attempt, but the NaN values are not removed:
import pandas as pd
df = pd.read_excel('MainFile.xlsx', dtype = str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)
for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I found a solution using these lines (but I welcome any other ideas):
for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
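The x == x trick works because NaN is the only value that compares unequal to itself. A small sketch on a hand-made row (the row contents are illustrative, not from the question's file):

```python
nan = float("nan")

# NaN is never equal to anything, including itself.
print(nan == nan)  # False

row = ["a", nan, "b", nan]
# x == x is False exactly for NaN, so every other value is kept.
cleaned = [x for x in row if x == x]
print(cleaned)  # ['a', 'b']
```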
It is not working because you are comparing the values to the string 'nan'.
If an Excel cell is empty, pandas returns it as a NaN (float) value, not as a string.
Comparing with np.nan does not help either: NaN is unequal to everything, so x != np.nan is always True and nothing gets removed. Use pandas.notna instead:
import pandas as pd
for l in df_list:
    newlist = [x for x in l if pd.notna(x)]
    print(newlist)
EDIT:
If you want to get all the values from the DataFrame that are not NaN, you can just do:
df.stack().tolist()
If you want to print the values with a loop (as in your example), you can do this (one list per column):
for l in df.columns:
    print(list(df[l][df[l].notna()]))
To create a nested list with a loop (one list per row):
main = []
for l in df.T.columns:
    new_list = list(df.T[l][df.T[l].notna()])
    main.append(new_list)
print(main)
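To see what the df.T loop produces without an Excel file, here is a sketch on a small hand-built DataFrame (hypothetical data standing in for MainFile.xlsx). It also shows a row-wise equivalent using iterrows and Series.dropna, which gives the same nested list.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data: a small frame with missing cells.
df = pd.DataFrame({"a": ["x", np.nan, "z"], "b": ["1", "2", np.nan]})

# The df.T loop from the answer: each column of df.T is one row of df.
main = []
for l in df.T.columns:
    new_list = list(df.T[l][df.T[l].notna()])
    main.append(new_list)

# Equivalent row-wise version: Series.dropna removes the NaN cells of each row.
rows = [row.dropna().tolist() for _, row in df.iterrows()]

print(main)  # [['x', '1'], ['2'], ['z']]
print(rows)  # [['x', '1'], ['2'], ['z']]
```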
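You can also try the approach proposed here, checking each element with np.isnan. Only floats can be NaN, and np.isnan raises a TypeError on strings, so check the type first:
import numpy as np
for l in df_list:
    newlist = [x for x in l if not (isinstance(x, float) and np.isnan(x))]
    print(newlist)
I hope this helps.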

Shedding list layers - Python

How could you turn:
[[["hi"], ["hello"]]]
into:
[["hi"], ["hello"]]
while also handling [] as an input?
You can use the pop method to take the first item out of the list.
>>> a = [["hi"]]
>>> a = a.pop(0)
>>> a
['hi']
Or you can also do something like:
>>> a = [["hi"]]
>>> a = a[0]
>>> a
['hi']
Since you edited your question:
>>> a = [[["hi"], ["hello"]]]
>>> a = a.pop(0)
>>> a
[['hi'], ['hello']]
If you have an empty list:
a = []
try:
    a = a.pop(0)  # try to take the first item out
except IndexError:
    pass  # the list is empty, so leave it unchanged
So, with the code above, an empty list does not raise an error.
If you want it to be simpler:
a = []
if a:  # only unwrap when the list is non-empty
    a = a[0]
print(a)
You can also make a function to make it simple to use with empty or non-empty lists:
def get(lst):
    # lst is used instead of list to avoid shadowing the built-in
    return lst if len(lst) == 0 else lst[0]
Testing our function:
>>> get([])
[]
>>> get([["Hi","Bye"]])
['Hi', 'Bye']
You can use a nested list comprehension to flatten it:
list_of_lists = [['hi'],['hello']]
flattened = [val for sublist in list_of_lists for val in sublist]
print(flattened)
This also works if you have multiple sublists in your list.
You can read more about this in Python's documentation.
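The same one-level flattening comprehension also answers the edited question directly, and it handles [] without any special case. A sketch, with shed_one_layer as a hypothetical helper name:

```python
def shed_one_layer(lst):
    # Hypothetical helper: removes exactly one level of nesting.
    return [val for sublist in lst for val in sublist]

print(shed_one_layer([[["hi"], ["hello"]]]))  # [['hi'], ['hello']]
print(shed_one_layer([]))                     # []
```

Unlike the pop/[0] answers, this concatenates all sublists when the outer list has more than one, rather than keeping only the first.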
Use itertools
import itertools
ip = [["hi"]]
op = list(itertools.chain.from_iterable(ip))
# op == ['hi']
Or a numpy solution using ravel
import numpy as np
ip = [["hi"]]
ip = np.array(ip)
op = list(ip.ravel())
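Since you edited your question and you only want n-1 dimensions, you could use np.transpose (ip must first be converted to a NumPy array, because a plain list has no .transpose method):
import numpy as np
ip = [[["hi"], ["hello"]]]
op = np.array(ip).transpose().reshape(2, 1).tolist()
# op == [['hi'], ['hello']]
You could check explicitly whether your input is empty ([]) before doing any operations on it, to avoid errors:
if ip:
    # do some operations on ip, e.g. remove the outer layer
    ip = ip[0]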

Finding indices of items from a list in another list even if they repeat

This answer works very well for finding indices of items from a list in another list, but the problem is that it gives each index only once. I would like my list of indices to have the same length as the searched-for list.
Here is an example:
thelist = ['A','B','C','D','E'] # the list whose indices I want
Mylist = ['B','C','B','E'] # my list of values that I am searching in the other list
ilist = [i for i, x in enumerate(thelist) if any(thing in x for thing in Mylist)]
With this solution, ilist = [1,2,4], but what I want is ilist = [1,2,1,4] so that len(ilist) = len(Mylist). Each matching index is reported only once, so when items repeat in Mylist, I don't get the duplicate indices.
thelist = ['A','B','C','D','E']
Mylist = ['B','C','B','E']
ilist = [thelist.index(x) for x in Mylist]
print(ilist) # [1, 2, 1, 4]
Basically, "for each element of Mylist, get its position in thelist."
This assumes that every element in Mylist exists in thelist. If the element occurs in thelist more than once, it takes the first location.
UPDATE
For substrings:
thelist = ['A','boB','C','D','E']
Mylist = ['B','C','B','E']
ilist = [next(i for i, y in enumerate(thelist) if x in y) for x in Mylist]
print(ilist) # [1, 2, 1, 4]
UPDATE 2
Here's a version that does substrings in the other direction using the example in the comments below:
thelist = ['A','B','C','D','E']
Mylist = ['Boo','Cup','Bee','Eerr','Cool','Aah']
ilist = [next(i for i, y in enumerate(thelist) if y in x) for x in Mylist]
print(ilist) # [1, 2, 1, 4, 2, 0]
The code below works:
ilist = [thelist.index(i) for i in Mylist]
Make a reverse lookup from strings to indices:
string_indices = {c: i for i, c in enumerate(thelist)}
ilist = [string_indices[c] for c in Mylist]
This avoids the quadratic behaviour of repeated .index() lookups.
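One subtlety of the reverse lookup: if a value occurs more than once in thelist, the dict comprehension keeps the last index, whereas .index() returns the first. A sketch that keeps the first index (matching .index()) by filling the dict with setdefault:

```python
thelist = ['A', 'B', 'C', 'D', 'E']
Mylist = ['B', 'C', 'B', 'E']

# Build the lookup once, in a single pass over thelist.
string_indices = {}
for i, c in enumerate(thelist):
    # setdefault only stores i the first time c is seen, so the FIRST
    # index of a repeated value wins, as with list.index.
    string_indices.setdefault(c, i)

# Average O(1) per lookup: O(len(thelist) + len(Mylist)) overall.
ilist = [string_indices[c] for c in Mylist]
print(ilist)  # [1, 2, 1, 4]
```

Like .index(), this raises (a KeyError here) if an element of Mylist is missing from thelist.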
If your data can be implicitly converted to an ndarray, as your example implies, you could use numpy_indexed (disclaimer: I am its author) to perform this kind of operation in an efficient (fully vectorized and N log N) manner.
import numpy_indexed as npi
ilist = npi.indices(thelist, Mylist)
npi.indices is essentially the array-generalization of list.index. Also, it has a kwarg to give you control over how to deal with missing values and such.
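If you would rather not add a dependency, a similar vectorized lookup can be sketched with plain NumPy: sort thelist once with np.argsort, then binary-search every query with np.searchsorted. This is a swapped-in technique, not the numpy_indexed API, and it assumes every value in Mylist exists in thelist (the check below raises otherwise).

```python
import numpy as np

thelist = np.array(['A', 'B', 'C', 'D', 'E'])
Mylist = np.array(['B', 'C', 'B', 'E'])

# Sort once; a stable sort keeps duplicates in original order, so the
# leftmost match corresponds to the first occurrence (like list.index).
order = np.argsort(thelist, kind="stable")
sorted_vals = thelist[order]

# Binary-search all queries at once: O(M log N).
pos = np.searchsorted(sorted_vals, Mylist)

# Assumption check: every query must actually be present in thelist.
pos_clipped = np.clip(pos, 0, len(thelist) - 1)
if not (sorted_vals[pos_clipped] == Mylist).all():
    raise ValueError("some values in Mylist are missing from thelist")

ilist = order[pos]
print(ilist.tolist())  # [1, 2, 1, 4]
```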

Python 2d array iteration

I have two lists, one with a parameter name and one with a pin name, and I am trying to combine the two lists into a 2d matrix but I cannot get the syntax right.
For example:
list1 = [parm1,parm2,parm3]
list2 = [end1,end2,end3]
and I want the matrix to be:
matrix1 = [[parm1+end1, parm1+end2, parm1+end3],
           [parm2+end1, parm2+end2, parm2+end3],
           [parm3+end1, parm3+end2, parm3+end3]]
Right now my code is:
for i in range(len(parm_name)):
    for j in range(len(end_name)):
        pin_name[i][j] = parm_name[i] + end_name[j]
and it's not working.
Instead of reassigning elements of a preinitialized list, simply create a new one:
list1 = [parm1,parm2,parm3]
list2 = [end1,end2,end3]
matrix1 = [[p+e for e in list2] for p in list1]
That last line can be expanded into the following equivalent code:
matrix1 = []
for p in list1:
    result = []
    for e in list2:
        result.append(p+e)
    matrix1.append(result)
You can create matrix1 with the following:
matrix1 = [[p_name + e_name for e_name in list2] for p_name in list1]
You don't give much code, so it's difficult to say why yours isn't working. I suspect that you don't initialize your matrix appropriately. But you don't need to initialize and then assign; you can do it all in one step with a list comprehension.
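If you do want to keep the index-based loop from the question, the matrix has to be preallocated with a separate inner list for each row. A sketch with hypothetical string values standing in for parm1..parm3 and end1..end3; it also shows a common initialization pitfall that may explain code like the question's misbehaving:

```python
# Hypothetical string values standing in for parm1..parm3 and end1..end3.
list1 = ["parm1", "parm2", "parm3"]
list2 = ["end1", "end2", "end3"]

# Preallocate with a fresh inner list per row, then fill by index.
pin_name = [[None] * len(list2) for _ in range(len(list1))]
for i in range(len(list1)):
    for j in range(len(list2)):
        pin_name[i][j] = list1[i] + list2[j]

# Pitfall: this repeats ONE inner list three times (aliasing),
# so writing to one row changes every row.
bad = [[None] * 3] * 3
bad[0][0] = "x"
print(bad)  # [['x', None, None], ['x', None, None], ['x', None, None]]
```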

python if statement to check a column for a value and do a command

for z in range(0,countD.shape[0]):
    if countD[z,0] in background_low1[:,0]:
        background_lowCountD.append(countD[z,:])
    else:
        background_goodCountD.append(countD[z,:])
I'm using the above code and getting a "list indices must be integers, not tuple" error message. I have two uneven arrays (countD and background_low1). If a value is present in column 0 of both arrays in any row, I want to move that row to a new array; if it's only present in one, I want that row moved to a second new array.
You are getting this error message because one of your objects (most likely background_low1, since countD has a .shape attribute) is a plain Python list, and lists are one-dimensional: an index like [:,0] is a tuple, which list indexing does not accept. A list can contain other lists, though, so you can build a multidimensional list. Elements are accessed with an integer index between brackets; for nested lists, just use multiple brackets one after the other:
>>> a = ['a','b','c']
>>> b = [1,2,a]
>>> print(b[2])
['a', 'b', 'c']
>>> print(b[2][0])
a
So, to answer your question, try something like this:
list1 = [[1,2,3,4],
         [5,6,7,8],
         [9,10,11,12]]
list2 = [[1,4,5,6],
         [7,6,7,8],
         [9,1,2,3]]
newList = []
otherList = []
# you need to make sure both lists are the same size
if len(list1) == len(list2):
    for i in range(len(list1)):
        if list1[i][0] == list2[i][0]:
            newList.append(list1[i])
        else:
            otherList.append(list1[i])
# the lists should now look like this
>>> print(newList)
[[1, 2, 3, 4], [9, 10, 11, 12]]
>>> print(otherList)
[[5, 6, 7, 8]]
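The loop above compares rows at the same position, while the question asks whether a column-0 value appears in any row of the other array, which also covers arrays with different numbers of rows. A sketch with np.isin, assuming countD is a 2D NumPy array (it has .shape) and background_low1 is a list of equal-length rows; the data below is hypothetical:

```python
import numpy as np

# Hypothetical data: countD is a NumPy array, background_low1 a plain list of lists
# with a different number of rows.
countD = np.array([[1, 2, 3, 4],
                   [5, 6, 7, 8],
                   [9, 10, 11, 12]])
background_low1 = [[1, 4, 5],
                   [9, 1, 2]]

# Convert the list so that [:, 0] column indexing works.
low = np.asarray(background_low1)

# True where a row's column-0 value appears anywhere in low's column 0.
mask = np.isin(countD[:, 0], low[:, 0])

background_lowCountD = countD[mask]    # value present in both arrays
background_goodCountD = countD[~mask]  # value present only in countD
```

If the rows of background_low1 have different lengths, np.asarray cannot build a 2D array; in that case pass [row[0] for row in background_low1] to np.isin instead.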
