Can this code be rewritten to be more pythonic? - python

I read values from a file. One line has this format:
size = 'GG|0,WQ|3,EW|8,RE|23'
I want it to be a list of dictionaries.
Right now I use this code, which works perfectly, but it seems like there has to be a cleaner way to do it.
>>> size = 'GG|0,WQ|3,EW|8,RE|23'
>>> a = [{i.split('|')[0]:i.split('|')[1]} for i in size.split(',')]
>>> a
[{'GG': '0'}, {'WQ': '3'}, {'EW': '8'}, {'RE': '23'}]
>>>

size = 'GG|0,WQ|3,EW|8,RE|23'
elements = size.split(',')
a = [dict([x.split('|')]) for x in elements]
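A minimal sketch of a variant, assuming you also want the numbers as ints (the code above keeps them as strings). str.partition splits each item exactly once, so the '|' split is not repeated the way it is in the question's version:

```python
size = 'GG|0,WQ|3,EW|8,RE|23'

# partition('|') returns a (key, '|', value) tuple, splitting each item only once
pairs = [item.partition('|') for item in size.split(',')]

# one single-key dict per item, with the value converted to int
a = [{key: int(value)} for key, _, value in pairs]
print(a)  # [{'GG': 0}, {'WQ': 3}, {'EW': 8}, {'RE': 23}]
```

If the keys are unique, a single dict such as dict(item.split('|') for item in size.split(',')) may be easier to query than a list of one-entry dicts.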

Alternatively, you could use a regular expression with Python's re module; the pattern can be extended as needed:
import re
size = 'GG|0,WQ|3,EW|8,RE|23'
pattern = re.compile(r'[A-Z][A-Z]\|[0-9]+')
my_list = pattern.findall(size)
my_dic = [dict([item.split('|')]) for item in my_list]
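Taking the regex idea a step further, capture groups let re return each key and value directly, so no second split is needed. A sketch; the pattern [A-Z]+ assumes the keys consist of uppercase letters only:

```python
import re

size = 'GG|0,WQ|3,EW|8,RE|23'

# two capture groups: the letter code and the number after '|'
pattern = re.compile(r'([A-Z]+)\|(\d+)')

# with groups in the pattern, findall returns a list of (key, value) tuples
my_dic = [{key: value} for key, value in pattern.findall(size)]
print(my_dic)  # [{'GG': '0'}, {'WQ': '3'}, {'EW': '8'}, {'RE': '23'}]
```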


Python: How to find arrays that have a certain element efficiently

Given several lists (an element can appear in more than one list) and a string, I want to find the names of all lists that contain the given string.
I could simply go through all the lists with if statements, but I feel there must be a more efficient way.
Any suggestions and advice would be appreciated. Thank you.
Example of the simple method I came up with:
arrayA = ['1','2','3','4','5']
arrayB = ['3','4','5']
arrayC = ['1','3','5']
arrayD = ['7']
def find_arrays(givenString):  # hypothetical wrapper name; the snippet ends in a return
    foundArrays = []
    # test each list in turn and record its name if it contains givenString
    if givenString in arrayA:
        foundArrays.append('arrayA')
    if givenString in arrayB:
        foundArrays.append('arrayB')
    if givenString in arrayC:
        foundArrays.append('arrayC')
    if givenString in arrayD:
        foundArrays.append('arrayD')
    return foundArrays
Membership testing on a list is O(n), because each element may have to be compared; a set uses hashing, so the same test is O(1) on average.
Let's define your data as a dict of sets:
data = { # a dict of sets
"a": {1, 2, 3, 4, 5},
"b": {3, 4, 5},
"c": {1, 3, 5},
"d": {7}
}
Then we can search like this:
search_for = 3 # for example
in_which = {label for label,values in data.items() if search_for in values}
# -> in_which = {'a', 'b', 'c'}
If you are going to run this search often, it may be worth pre-processing your data into a reverse index (value -> set of labels):
from collections import defaultdict
lookup = defaultdict(set)
for label, values in data.items():
    for v in values:
        lookup[v].add(label)
Now each search is a single dictionary lookup:
in_which = lookup[search_for] # -> {'a', 'b', 'c'}
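Applied to the question's own string data, the same reverse index might look like the sketch below. It assumes the lists can be stored in a dict keyed by name, and find_arrays is a hypothetical helper name. It uses .get because indexing a defaultdict with a missing value silently inserts an empty set for it:

```python
from collections import defaultdict

# the question's lists, keyed by name instead of separate variables
data = {
    'arrayA': ['1', '2', '3', '4', '5'],
    'arrayB': ['3', '4', '5'],
    'arrayC': ['1', '3', '5'],
    'arrayD': ['7'],
}

# build the reverse index once: value -> set of list names containing it
lookup = defaultdict(set)
for label, values in data.items():
    for v in values:
        lookup[v].add(label)

def find_arrays(given_string):
    # .get does not insert a key for values that were never seen
    return lookup.get(given_string, set())

print(sorted(find_arrays('3')))  # ['arrayA', 'arrayB', 'arrayC']
print(find_arrays('9'))          # set()
```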
The simple one-liner is:
result = [lst for lst in [arrayA, arrayB, arrayC, arrayD] if givenString in lst]
or, if you prefer a more functional style (in Python 3, filter returns an iterator, hence the list()):
result = list(filter(lambda lst: givenString in lst, [arrayA, arrayB, arrayC, arrayD]))
Note that neither of these gives you the name of the list, only the list itself. You shouldn't normally need the name, though.
Array names?
If you really need the names, you can try something like this with eval(), although eval() is generally best avoided because it executes whatever code the string contains:
arrayA = [1,2,3,4,5,'x']
arrayB = [3,4,5]
arrayC = [1,3,5]
arrayD = [7,'x']
array_names = ['arrayA', 'arrayB', 'arrayC', 'arrayD']
givenString = 'x'
result = [arr for arr in array_names if givenString in eval(arr)]
print(result)
['arrayA', 'arrayD']
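A sketch of an eval()-free alternative, assuming you can keep the lists in a dict keyed by name (the dict name arrays is an assumption) instead of in separate variables. The names then become ordinary keys:

```python
arrays = {
    'arrayA': [1, 2, 3, 4, 5, 'x'],
    'arrayB': [3, 4, 5],
    'arrayC': [1, 3, 5],
    'arrayD': [7, 'x'],
}
givenString = 'x'

# iterate over (name, list) pairs; no eval() is needed to reach each list
result = [name for name, arr in arrays.items() if givenString in arr]
print(result)  # ['arrayA', 'arrayD']
```

Since Python 3.7, dicts preserve insertion order, so the names come back in the order the lists were defined.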

CSV string to decimal in python list

I am using an API that returns what appears to be a CSV string. I need to parse it for the two decimal numbers (ignoring the timestamp at the end) and append those numbers, as decimals, to two separate lists:
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []
Processing time matters here, so what is the fastest way to do this?
Split the string by comma:
>>> string_values = returned_string_from_API.split(',')
>>> string_values
['0', 'F', 'F', '1.139520', '1.139720', '0', '0', '20160608163132000']
Take the two decimal fields (indices 3 and 4):
>>> string_values[3:5]
['1.139520', '1.139720']
Convert to float:
>>> decimal_values = [float(val) for val in string_values[3:5]]
>>> decimal_values
[1.13952, 1.13972]
Append the min and max to the appropriate lists:
>>> decimal_lowest_in_string = []
>>> decimal_highest_in_string = []
>>> decimal_lowest_in_string.append(min(decimal_values))
>>> decimal_lowest_in_string
[1.13952]
>>> decimal_highest_in_string.append(max(decimal_values))
>>> decimal_highest_in_string
[1.13972]
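The question asks for decimal numbers; if exact decimal values matter (a float cannot represent 1.13952 exactly), the same steps work with decimal.Decimal. A sketch, at some cost in speed since Decimal is generally slower than float:

```python
from decimal import Decimal

returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []

# fields 3 and 4 hold the two decimals; Decimal parses the text exactly
decimal_values = [Decimal(val) for val in returned_string_from_API.split(',')[3:5]]

decimal_lowest_in_string.append(min(decimal_values))
decimal_highest_in_string.append(max(decimal_values))
print(decimal_lowest_in_string)   # [Decimal('1.139520')]
print(decimal_highest_in_string)  # [Decimal('1.139720')]
```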
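1) A version that does not rely on the csv module:
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
def isfloat(value):
    # True if the string can be parsed as a float
    try:
        float(value)
        return True
    except ValueError:
        return False
# '0' and the timestamp also parse as floats, so keep only fields with a decimal point
float_numbers = [float(v) for v in returned_string_from_API.split(',') if '.' in v and isfloat(v)]
2) Alternatively, try the pandas package.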
The fastest way is probably a regular expression, though readability suffers:
import re
returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
decimal_lowest_in_string = []
decimal_highest_in_string = []
re_check = re.compile(r"[0-9]+\.\d*")
# findall returns strings; convert them so min/max compare numerically, not alphabetically
m = [float(x) for x in re_check.findall(returned_string_from_API)]
decimal_lowest_in_string.append(min(m))
decimal_highest_in_string.append(max(m))
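Whether a regular expression actually beats a plain split is easy to check with timeit on your own data. A sketch; parse_split and parse_regex are hypothetical helper names, and the timings will vary by machine and Python version:

```python
import re
import timeit

returned_string_from_API = '0,F,F,1.139520,1.139720,0,0,20160608163132000'
re_check = re.compile(r"[0-9]+\.\d*")

def parse_split(s):
    # fixed positions: fields 3 and 4 hold the two decimals
    values = [float(v) for v in s.split(',')[3:5]]
    return min(values), max(values)

def parse_regex(s):
    # any token containing a decimal point
    values = [float(v) for v in re_check.findall(s)]
    return min(values), max(values)

# both approaches must agree before their speed is worth comparing
assert parse_split(returned_string_from_API) == parse_regex(returned_string_from_API)

for func in (parse_split, parse_regex):
    seconds = timeit.timeit(lambda: func(returned_string_from_API), number=10_000)
    print(f'{func.__name__}: {seconds:.4f} s per 10,000 calls')
```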

How to encode categorical values in Python

Given a vocabulary ["NY", "LA", "GA"],
how can one encode it in such a way that it becomes:
"NY" = 100
"LA" = 010
"GA" = 001
So if I do a lookup on "NY GA", I get 101
You can use numpy.in1d (in newer NumPy versions, numpy.isin does the same and in1d is deprecated):
>>> import numpy as np
>>> xs = np.array(["NY", "LA", "GA"])
>>> ''.join('1' if f else '0' for f in np.in1d(xs, 'NY GA'.split(' ')))
'101'
or:
>>> ''.join(np.where(np.in1d(xs, 'NY GA'.split(' ')), '1', '0'))
'101'
vocab = ["NY", "LA", "GA"]
# strings are immutable, so build the result as a list of characters
categorystring = ['0'] * len(vocab)
selectedVocabs = 'NY GA'
for sel in selectedVocabs.split():
    categorystring[vocab.index(sel)] = '1'
categorystring = ''.join(categorystring)
This is the end result of my own testing. It turns out Python doesn't support item assignment on strings (somehow I thought it did), hence the list.
Personally, I think behzad's solution is better; NumPy does a better job and is faster.
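Or you can do:
vocabulary = ["NY","LA","GA"]
i = pow(10, len(vocabulary)-1)
dictVocab = dict()
for word in vocabulary:
    dictVocab[word] = i
    i //= 10  # integer division keeps i an int in Python 3
yourStr = "NY LA"
result = 0
for word in yourStr.split():
    result += dictVocab[word]
# result is the int 110; leading zeros are lost (e.g. "GA" alone gives 1)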
Another solution using NumPy. It looks like you're trying to binary-encode a dictionary, so the code below feels natural to me.
import numpy as np
def to_binary_representation(your_str="NY LA"):
    xs = np.array(["NY", "LA", "GA"])
    ys = 2**np.arange(3)[::-1]  # bit weights 4, 2, 1
    lookup_table = dict(zip(xs, ys))
    return bin(np.sum([lookup_table[k] for k in your_str.split()]))
NumPy isn't strictly needed here, though it is probably faster if you have large arrays to work on. Without it, np.sum can be replaced by the built-in sum, and xs and ys by plain lists.
To create a lookup dictionary, reverse the vocabulary, enumerate it, and take the power of 2:
>>> vocabulary = ["NY", "LA", "GA"]
>>> d = dict((word, 2 ** i) for i, word in enumerate(reversed(vocabulary)))
>>> d
{'GA': 1, 'LA': 2, 'NY': 4}
To query the dictionary:
>>> query = "NY GA"
>>> sum(code for word, code in d.items() if word in query.split())
5
If you want it formatted to binary:
>>> '{0:b}'.format(5)
'101'
edit: if you want a 'one liner':
>>> '{0:b}'.format(
...     sum(2 ** i
...         for i, word in enumerate(reversed(vocabulary))
...         if word in query.split()))
'101'
edit2: if you want padding, for example with six 'bits':
>>> '{0:06b}'.format(5)
'000101'
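Putting the pieces together, here is a sketch that pads to the vocabulary length automatically, so a query containing only the last word gives 001 rather than 1. encode is a hypothetical helper name; it sets bits with a bitwise OR, so a word repeated in the query is not counted twice, and an unknown word raises KeyError:

```python
def encode(query, vocabulary):
    # bit position per word: the first vocabulary word is the most significant bit
    width = len(vocabulary)
    positions = {word: width - 1 - i for i, word in enumerate(vocabulary)}
    value = 0
    for word in query.split():
        value |= 1 << positions[word]  # set the bit for each word present
    # zero-pad to one binary digit per vocabulary word
    return format(value, '0{}b'.format(width))

vocabulary = ["NY", "LA", "GA"]
print(encode("NY GA", vocabulary))  # 101
print(encode("GA", vocabulary))     # 001
```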

string into a list in Python

I have a str that contains a list of numbers and I want to convert it to a list. Right now I can only get the entire list in the 0th entry of the list, but I want each number to be an element of a list. Does anyone know of an easy way to do this in Python?
for i in in_data.splitlines():
    print(i.split('Counter32: ')[1].strip().split())
My current result, which is not what I want:
['12576810']\n['1917472404']\n['3104185795']
My data:
IF-MIB::ifInOctets.1 = Counter32: 12576810
IF-MIB::ifInOctets.2 = Counter32: 1917472404
IF-MIB::ifInOctets.3 = Counter32: 3104185795
The result I want:
['12576810','1917472404','3104185795']
Given your data as
>>> data="""IF-MIB::ifInOctets.1 = Counter32: 12576810
IF-MIB::ifInOctets.2 = Counter32: 1917472404
IF-MIB::ifInOctets.3 = Counter32: 3104185795"""
You can use a regex, which makes the intent clearer:
>>> import re
>>> [re.findall(r"\d+$", e)[0] for e in data.splitlines()]
['12576810', '1917472404', '3104185795']
or, as @jamylak pointed out:
re.findall(r"\d+$", data, re.MULTILINE)
Or use str.rsplit, which should have an edge in performance (maxsplit=1 stops after the last field):
>>> [e.rsplit(None, 1)[-1] for e in data.splitlines()]
['12576810', '1917472404', '3104185795']
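If you also need to know which interface each counter belongs to, capture groups can pull both numbers from every line. A sketch, assuming every line follows exactly the `IF-MIB::ifInOctets.N = Counter32: value` format shown above:

```python
import re

data = """IF-MIB::ifInOctets.1 = Counter32: 12576810
IF-MIB::ifInOctets.2 = Counter32: 1917472404
IF-MIB::ifInOctets.3 = Counter32: 3104185795"""

# group 1: interface index after 'ifInOctets.', group 2: the counter value
pattern = re.compile(r'ifInOctets\.(\d+) = Counter32: (\d+)$', re.MULTILINE)

octets = {int(index): int(value) for index, value in pattern.findall(data)}
print(octets)  # {1: 12576810, 2: 1917472404, 3: 3104185795}

# the flat list of strings asked for in the question, in line order
print([value for _, value in pattern.findall(data)])
```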
You are already quite far. Based on the code you have, try this:
result = []
for i in in_data.splitlines():
    result.append(i.split('Counter32: ')[1].strip())
print(result)
You could also do:
result = [i.split('Counter32: ')[1].strip() for i in in_data.splitlines()]
Then you can also look at what @Abhijit and @KurzedMetal are doing with regular expressions. In general that would be the way to go, but I really like how you avoided them with a simple split.
My best try with the info you gave:
>>> data = r"['12576810']\n['1917472404']\n['3104185795']"
>>> import re
>>> re.findall(r"\d+", data)
['12576810', '1917472404', '3104185795']
You could even convert the results to int with map() if necessary (in Python 3, int has arbitrary precision, so the separate long type from Python 2 is not needed):
>>> list(map(int, re.findall(r"\d+", data)))
[12576810, 1917472404, 3104185795]
This is how I'd do it.
data = """IF-MIB::ifInOctets.1 = Counter32: 12576810
IF-MIB::ifInOctets.2 = Counter32: 1917472404
IF-MIB::ifInOctets.3 = Counter32: 3104185795"""
[x.split()[-1] for x in data.split("\n")]
with open('in.txt') as f:
    numbers = [y.split()[-1] for y in f]
print(numbers)
['12576810', '1917472404', '3104185795']
or:
with open('in.txt') as f:
    numbers = []
    for x in f:
        x = x.split()
        numbers.append(x[-1])
print(numbers)
['12576810', '1917472404', '3104185795']
result = [(item[(item.rfind(' ')):]).strip() for item in list_of_data]
A variant using a list comprehension: iterate over all lines of data, find the index of the last space, slice from that position to the end of the line, strip the resulting string (removing the leading space) and collect the results in a new list.
data = """IF-MIB::ifInOctets.1 = Counter32: 12576810
IF-MIB::ifInOctets.2 = Counter32: 1917472404
IF-MIB::ifInOctets.3 = Counter32: 3104185795"""
result = [(item[(item.rfind(' ')):]).strip() for item in data.splitlines()]
print(result)
Result:
['12576810', '1917472404', '3104185795']

editing List content in Python

I have a variable data:
data = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
I need to convert it so that the b prefix is removed from every item in the list.
How can I do that?
Not sure whether it helps, but it works with your sample in Python 2:
initList = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
resultList = [str(x) for x in initList]  # Python 2 only: in Python 3, str(b'script') gives "b'script'"
Or in Python 3:
resultList = [x.decode("utf-8") for x in initList]  # where utf-8 is the encoding used
See the documentation of bytes.decode for more.
Also you may want to take a look into the following related SO thread.
In Python 2, where bytes and str are the same type:
>>> a = [b'script', b'-compiler', b'123cds', b'-algo', b'timing']
>>> map(str, a)
['script', '-compiler', '123cds', '-algo', 'timing']
strin = "[b'script', b'-compiler', b'123cds', b'-algo', b'timing']"
arr = strin.strip('[]').split(', ')
res = [part[2:-1] for part in arr]  # drop the leading b' and trailing '; strip("b'") would also eat b's inside words like b'bob'
>>> res
['script', '-compiler', '123cds', '-algo', 'timing']
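If the data really is the text representation of a list of bytes, ast.literal_eval can parse it safely instead of cutting characters off by hand, and each item can then be decoded. A sketch, assuming the bytes are UTF-8:

```python
import ast

strin = "[b'script', b'-compiler', b'123cds', b'-algo', b'timing']"

# literal_eval parses Python literals only (no arbitrary code), giving a list of bytes
items = ast.literal_eval(strin)

# decode each bytes object to str
res = [item.decode('utf-8') for item in items]
print(res)  # ['script', '-compiler', '123cds', '-algo', 'timing']
```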
