appending strings to a new list under if statement - python

After reading the file directory using this line,
x = glob.glob('**/*.txt', recursive = True)
I got this output,
doping_center9_2.txt
doping_center9_3.txt
doping_center9_4.txt
doping_center9_5.txt
n_eff_doping_center1_1.txt
n_eff_doping_center1_2.txt
n_eff_doping_center1_3.txt
n_eff_doping_center1_4.txt
Now I would like to create another list and append to it only the strings that start with n_eff. I tried this:
n_eff = []
for i in range(len(x)):
    if x[i] == x[i].startswith("n_eff"):
        n_eff.append(x[i])
Unfortunately, nothing is happening there, not even an error.

The problem is that startswith returns a boolean (True or False). Your condition then compares x[i], which is a string, with that boolean. A string never equals a boolean, so the comparison is always False and nothing is appended.
Changing the condition should help:
if x[i].startswith("n_eff"):
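One caveat worth checking: with recursive=True, glob returns paths relative to the starting directory, so a matching file inside a subdirectory would come back with its directory in front and a plain startswith on the whole path would miss it. A minimal sketch that tests only the file name; the helper name select_by_prefix and the data/ entry are made up for illustration:

```python
import os

def select_by_prefix(paths, prefix):
    """Return the paths whose file name (not directory part) starts with prefix."""
    return [p for p in paths if os.path.basename(p).startswith(prefix)]

# Sample list standing in for glob.glob('**/*.txt', recursive=True);
# the 'data/' entry is a hypothetical file in a subdirectory.
x = [
    "doping_center9_2.txt",
    "n_eff_doping_center1_1.txt",
    "data/n_eff_doping_center1_2.txt",
]
n_eff = select_by_prefix(x, "n_eff")
print(n_eff)
```

If all files sit in the current directory, the simpler x[i].startswith("n_eff") check is enough.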

If the output is a single multiline string, you first need to split it into lines, for example with split('\n') or splitlines().
"<string>".startswith(<arg>) returns True if <string> starts with the substring <arg>, so you can use it to test each file name.
Using a list comprehension it can be written as:
new_list = [name for name in output if name.startswith('n_eff')]
(Edited to treat output as a list.)

In [51]: x = '''doping_center9_2.txt
...: doping_center9_3.txt
...: doping_center9_4.txt
...: doping_center9_5.txt
...: n_eff_doping_center1_1.txt
...: n_eff_doping_center1_2.txt
...: n_eff_doping_center1_3.txt
...: n_eff_doping_center1_4.txt
...: '''.splitlines()
In [52]: x
Out[52]:
['doping_center9_2.txt',
'doping_center9_3.txt',
...
'n_eff_doping_center1_4.txt']
Are you trying to do something like this?
In [53]: for i in x:
    ...:     if i.startswith('n_eff'):
    ...:         print(i)
    ...:
n_eff_doping_center1_1.txt
n_eff_doping_center1_2.txt
n_eff_doping_center1_3.txt
n_eff_doping_center1_4.txt

Related

Can I include multiple statements when creating a one-line for loop?

I have an array I want to iterate through. The array consists of strings consisting of numbers and signs.
like this: €110.5M
I want to loop over it and remove all Euro sign and also the M and return that array with the strings as ints.
How would I do this knowing that the array is a column in a table?
You could just strip the characters,
>>> x = '€110.5M'
>>> x.strip('€M')
'110.5'
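If the suffix should also be expanded (M for millions, k for thousands), a helper function can do the conversion:
def sanitize_string(ss):
    # drop currency signs and lower-case so 'M' and 'm' are treated alike
    ss = ss.replace('$', '').replace('€', '').lower()
    if 'm' in ss:
        res = float(ss.replace('m', '')) * 1000000  # millions
    elif 'k' in ss:
        res = float(ss.replace('k', '')) * 1000  # thousands
    else:
        res = float(ss)  # no suffix: plain number (otherwise res would be undefined)
    return int(res)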
This can be applied to a list as follows:
>>> ls = [sanitize_string(x) for x in ["€3.5M", "€15.7M" , "€167M"]]
>>> ls
[3500000, 15700000, 167000000]
If you want to apply it to the column of a table instead:
dataFrame['price'] = dataFrame['price'].apply(sanitize_string)  # Assuming you're using pandas DataFrames and the column is called 'price'
You can use a list comprehension:
a = ['€110.5M', '€210.5M', '€310.5M']  # example input matching the output below
numbers = [float(p.replace('€','').replace('M','')) for p in a]
which gives:
[110.5, 210.5, 310.5]
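You can use a list comprehension to construct one list from another:
foo = ["€13.5M", "€15M", "€167M"]
foo_cleaned = [value.translate(str.maketrans('', '', '€M')) for value in foo]
In Python 3, str.translate takes a translation table; str.maketrans('', '', '€M') builds one that deletes every occurrence of the characters in its third argument. The two-argument form value.translate(None, "€M") is the Python 2 signature and raises a TypeError on Python 3.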
Try this
arr = ["€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M","€110.5M"]
f = [x.replace("€","").replace("M","") for x in arr]
You can call .replace() on a string as often as you like. An initial solution could be something like this:
my_array = ['€110.5M', '€111.5M', '€112.5M']
my_cleaned_array = []
for elem in my_array:
    my_cleaned_array.append(elem.replace('€', '').replace('M', ''))
At this point, you still have strings in your array. Note that int() cannot parse a string containing a decimal point (int('110.5') raises ValueError), so to get ints you have to go through float first: int(float(elem.replace('€', '').replace('M', ''))). Be aware that you will then lose everything after the decimal point, i.e. you will end up with [110, 111, 112].
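If the M suffix is meant as millions, truncating to 110 is probably not what you want. Here is a hedged sketch of a conversion that scales by the suffix and uses Decimal to sidestep float rounding. The name parse_amount and the suffix table are assumptions for illustration, not part of the question:

```python
from decimal import Decimal

# Assumed meaning of the suffixes; adjust to your data.
SUFFIXES = {"K": 1_000, "M": 1_000_000}

def parse_amount(text):
    """Convert a string such as '€110.5M' to an int number of euros."""
    cleaned = text.strip().lstrip("€$")          # drop a leading currency sign
    multiplier = 1
    if cleaned and cleaned[-1].upper() in SUFFIXES:
        multiplier = SUFFIXES[cleaned[-1].upper()]
        cleaned = cleaned[:-1]                   # drop the suffix letter
    return int(Decimal(cleaned) * multiplier)

my_array = ['€110.5M', '€111.5M', '€112.5M']
amounts = [parse_amount(elem) for elem in my_array]
print(amounts)  # [110500000, 111500000, 112500000]
```

For a table column, the same function could be passed to the column's apply method, assuming the table is a pandas DataFrame.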
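You can use a regular expression to do that.
import re
text = "€110.5M"  # avoid naming a variable str, which shadows the built-in type
# optional minus sign, digits, optional decimal part (so "€15M" also matches)
x = re.findall(r"-?\d+(?:\.\d+)?", text)
print(x)  # prints ['110.5']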
I didn't quite understand the second part of the question.

How can I delete items from a list in Python and replace them with new items?

I am having a problem with replacing a full list with another list.
For example, let's say I have a list that contains the words ['han','san'];
how can I have another list that has [1,2] send its values to replace han and san in that list?
i = [1,2]
p = ['han', 'san']
I want to have han and san replaced with 1 and 2.
In [1]: i = [1,2]
In [2]: p = ['han', 'san']
In [3]: i[:] = p # replace all content of i with content of p
In [4]: i
Out[4]: ['han', 'san']
If you have a longer list and want just the first two replaced:
In [5]: i = [1,2,3,4]
In [6]: p = ['han', 'san']
In [7]: i[:2] = p # replace just first two elements with contents of p
In [8]: i
Out[8]: ['han', 'san', 3, 4]
The i[:] syntax selects the whole list, so i[:] = p replaces every element of i with the elements of p, in place. If you used i = p instead, i would become another reference to p, so any change to p would be reflected in i: i is p, and both names point to the same object in memory.
With i[:2] we select only the first two elements of the longer version of i and replace them with the contents of p. The lengths need not match: if p had ten elements, those two elements would be replaced by ten and the list would grow. When assigning to a slice, the right-hand side must always be an iterable.
In [9]: id(i)
Out[9]: 140380204622192
In [10]: i[:] = p
In [11]: id(i) # still same object
Out[11]: 140380204622192
In [12]: i = p
In [13]: id(i) # now i is p
Out[13]: 140380204431624
In [14]: id(p)
Out[14]: 140380204431624
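The question reads as if the values of i should end up inside p, which is the same slice assignment in the other direction. A short sketch under that reading; alias and longer are illustrative names:

```python
p = ['han', 'san']
i = [1, 2]

alias = p                 # a second name for the same list object
p[:] = i                  # overwrite the contents of p in place
print(p, alias is p)      # [1, 2] True

longer = [1, 2, 3, 4]
longer[:2] = ['a', 'b', 'c']   # two elements replaced by three: the list grows
print(longer)             # ['a', 'b', 'c', 3, 4]
```

Because p is modified in place, every other reference to the same list (alias here) sees the new contents, while i stays a separate object.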
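list(map(lambda x: p[x-1], i))
is one way: it looks up the element of p for each 1-based index in i. In Python 3, map returns an iterator, so wrap it in list() to get a list.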
Using a plain assignment such as:
p = i
would work for your example, but it rebinds the name p to the other list rather than replacing each item with the other list's items. You could try Joran Beasley's method mentioned earlier:
list(map(lambda x: p[x - 1], i))
Or you could define a function:
def replace_list(old_list, new_list):
    for x in range(len(old_list)):
        old_list[x] = new_list[x]
As you can see, there are various methods for replacing the items in a list, and your choice really depends on your program. Hope this helped!

list to tuple return in python

I am new to python. I am trying to create a function which takes string and list as arguments and returns a boolean value for every list element found (this should be returned as tuple) in the string. I have tried the following code
def my_check(str1, list1):
    words = str1.split()
    x = 1
    for l in range(len(list1)):
        for i in range(len(words)):
            if list1[l] == words[i]:
                x = x+1
        if (x > 1):
            print(True)
            x = 1
        else:
            print(False)
output = my_check('my name is ide3', ['is', 'my', 'no'])
print(output)
This code outputs
True
True
False
How can i return this value as a tuple with
>>> output
(True, True, False)
Any idea is appreciated.
If you want to modify any code that prints things into code that returns things, you have to:
Create an empty collection at the top.
Replace every print call with a call that adds the value to the collection instead.
Return the collection.
So:
def my_check(str1, list1):
    result = () # create an empty collection
    words = str1.split()
    x = 1
    for l in range(len(list1)):
        for i in range(len(words)):
            if list1[l] == words[i]:
                x = x+1
        if (x > 1):
            result += (True,) # add the value instead of printing
            x = 1
        else:
            result += (False,) # add the value instead of printing
    return result # return the collection
This is a bit awkward with tuples, but it works. You might instead want to consider using a list, because that's less awkward (and you can always return tuple(result) at the end if you really need to convert it).
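The list-based variant mentioned above could look like the following sketch. It also replaces the counter x with a direct membership test, which is a simplification on my part rather than something the original code does:

```python
def my_check(str1, list1):
    words = str1.split()
    result = []                        # collect the answers in a list
    for item in list1:
        result.append(item in words)   # True if item is one of the words
    return tuple(result)               # convert once at the end

output = my_check('my name is ide3', ['is', 'my', 'no'])
print(output)  # (True, True, False)
```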
Generators to the rescue (edited: got it backwards the first time)
def my_check(str1, list1):
    return tuple(w in str1.split() for w in list1)
For efficiency, we could build a set from str1.split() first, because a membership test on a set is much faster than on a list:
def my_check(str1, list1):
    # build a set from the list str1.split() first
    wordsSet = set(str1.split())
    # build a tuple from the boolean generator
    return tuple((word in wordsSet) for word in list1)
You can also test for a string directly within another string, without split(). Note that this is a substring test, so 'is' would also be found inside 'this'. If that is acceptable, this works too:
def my_check(str1, list1):
    return tuple(w in str1 for w in list1)
    # return [w in str1 for w in list1] # Much faster than creating tuples
However, since returning a tuple rather than a list isn't often needed, you should be able to just use the list comprehension in the comment above (you can always convert the list to a tuple downstream if you have to).
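The difference between testing against the whole string and testing against its words only shows up when a search term is part of a longer word. A small sketch to make that visible; the function names check_substrings and check_words are illustrative:

```python
def check_substrings(text, items):
    """True for each item that occurs anywhere in text, even inside a word."""
    return tuple(item in text for item in items)

def check_words(text, items):
    """True for each item that equals one of the whitespace-separated words."""
    words = set(text.split())
    return tuple(item in words for item in items)

print(check_substrings('this name', ['is', 'name']))  # (True, True)
print(check_words('this name', ['is', 'name']))       # (False, True)
```

For the example in the question both give (True, True, False), so which one is right depends on whether partial matches should count.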
python results:
In [117]: %timeit my_check_wtuple('my name is ide3', ['is', 'my', 'no'])
100000 loops, best of 3: 2.31 µs per loop
In [119]: %timeit my_check_wlist('my name is ide3', ['is', 'my', 'no'])
1000000 loops, best of 3: 614 ns per loop

Python: Compare more numbers

I would like to search for repeated numbers in an existing list. If one of the numbers is repeated, I want to set a variable's value to true and break out of the for loop.
list = [3, 5, 3]  # numbers in list
So if the function finds the same number twice, it should break out of the loop; in this case 3 is repeated.
How to do that?
First, don't name your list list. That is a Python built-in, and using it as a variable name can give undesired side effects. Let's call it L instead.
You can solve your problem by comparing the list to a set version of itself.
Edit: You want true when there is a repeat, not the other way around. Code edited.
def testlist(L):
    return sorted(set(L)) != sorted(L)
You could look into sets. You loop through your list, and either add the number to a support set, or break out the loop.
>>> l = [3, 5, 3]
>>> s = set()
>>> s
set([])
>>> for x in l:
...     if x not in s:
...         s.add(x)
...     else:
...         break
You could also take a step further and make a function out of this code, returning the first duplicated number you find (or None if the list doesn't contain duplicates):
def get_first_duplicate(l):
    s = set()
    for x in l:
        if x not in s:
            s.add(x)
        else:
            return x
get_first_duplicate([3, 5, 3])
# returns 3
Otherwise, if you want to get a boolean answer to the question "does this list contain duplicates?", you can return it instead of the duplicate element:
def has_duplicates(l):
    s = set()
    for x in l:
        if x not in s:
            s.add(x)
        else:
            return True
    return False
has_duplicates([3, 5, 3])
# returns True
senderle pointed out:
there's an idiom that people sometimes use to compress this logic into a couple of lines. I don't necessarily recommend it, but it's worth knowing:
s = set(); has_dupe = any(x in s or s.add(x) for x in l)
You can use collections.Counter() and any():
>>> from collections import Counter
>>> lis=[3,5,3]
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # True means yes some value is repeated
True
>>> lis=range(10)
>>> c=Counter(lis)
>>> any(x>1 for x in c.values()) # False means all values only appeared once
False
or use sets and match lengths:
In [5]: lis=[3,3,5]
In [6]: not (len(lis)==len(set(lis)))
Out[6]: True
In [7]: lis=range(10)
In [8]: not (len(lis)==len(set(lis)))
Out[8]: False
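Building on the Counter approach above, the same counts can also tell you which values repeat, not just whether any do. A minimal sketch; the function name repeated_values is made up for illustration:

```python
from collections import Counter

def repeated_values(values):
    """Return the values that occur more than once, in first-seen order."""
    counts = Counter(values)
    return [value for value, n in counts.items() if n > 1]

print(repeated_values([3, 5, 3]))    # [3]
print(repeated_values(range(10)))    # []
```

An empty result means there are no duplicates, so bool(repeated_values(lis)) answers the original yes/no question as well.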
You should never give the name list to a variable - list is a type in Python, and you can give yourself all kinds of problems masking built-in names like that. Give it a descriptive name, like numbers.
That said ... you could use a set to keep track of which numbers you've already seen:
def first_double(seq):
    """Return the first item in seq that appears twice."""
    found = set()
    for item in seq:
        if item in found:
            return item
            # return will terminate the function, so no need for 'break'.
        else:
            found.add(item)
numbers = [3, 5, 3]
number = first_double(numbers)
Without additional memory (at the cost of quadratic time, since count scans the whole list for every element):
any(l.count(x) > 1 for x in l)

String Manipulation of All Elements of an Array in Python

I'm trying to isolate a substring of each string element of an array such that it is the string until the last period. For example I would want to have:
input = 'A.01.0'
output = 'A.01'
or
input = 'A.0'
output = 'A'
And I want to do this for all elements of an array.
Use some rsplit magic:
x=["123","456.678","abc.def.ghi"]
[y.rsplit(".",1)[0] for y in x]
This is one way to produce the wanted output format. You need to alter this to suit your needs.
output = input[:input.rindex('.')]
For the entire array:
arr = ['A.01.0', 'A.0']
arr = [x[:x.rindex('.')] for x in arr]
Hope that helps :-)
Something like this?
>>> i = ['A.01.0', 'A.0']
>>> [x[:x.rfind('.')] for x in i]
['A.01', 'A']
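The three approaches above agree when every element contains a period, but they behave differently when one does not, which is worth knowing before picking one. A small sketch; the helper name drop_last_segment and the sample strings 'A' and 'A0' are illustrative:

```python
def drop_last_segment(s):
    """Return s up to (not including) its last period; s unchanged if it has none."""
    return s.rsplit('.', 1)[0]

names = ['A.01.0', 'A.0', 'A']
print([drop_last_segment(n) for n in names])   # ['A.01', 'A', 'A']

# For comparison, on a string without a period:
no_dot = 'A0'
print(no_dot[:no_dot.rfind('.')])   # 'A': rfind returns -1, so the last character is cut off
try:
    no_dot.rindex('.')
except ValueError:
    print('rindex raises ValueError when there is no period')
```

So rsplit is the forgiving choice, rindex fails loudly, and rfind silently truncates.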
