I have a large CSV file with comma-separated lines of varying length. When sorting another set of data, I used split(',') in a loop to separate the fields, but this method requires each line to have the same number of entries. Is there a way I can look at a line and, independent of the total number of entries, just pull the Nth item? For reference, the method I was using only works with a line that looks like AAA,BBB,CCC,DDD:
entry = 'A,B,C,D'
(a,b,c,d) = entry.split(',')
print a,b,c,d
But I would like to pull A and C even if it looks like A,B,C,D,E,F or A,B,C.
Use a list instead of separate variables.
values = entry.split(',')
print values[0], values[2]
Just use a list:
xyzzy = entry.split(",")
print xyzzy[0], xyzzy[2]
But be aware that, once you allow the possibility of variable element counts, you had better allow for too few as well:
entry = 'A,B'
xyzzy = entry.split(",")
(a,c) = ('?','?')
if len(xyzzy) > 0: a = xyzzy[0]
if len(xyzzy) > 2: c = xyzzy[2]
print a, c
If you don't want to index the results, it's not difficult to write your own function to deal with the situation where there are either too few or too many values. Although it requires a few more lines of code to set up, an advantage is that you can give the results meaningful names instead of anonymous ones like results[0] and results[2].
def splitter(s, take, sep=',', default=None):
    r = s.split(sep)
    if len(r) < take:
        # pad with the default value until there are `take` items
        r.extend(default for _ in xrange(take - len(r)))
    return r[:take]
entry = 'A,B,C'
a,b,c,d = splitter(entry, 4)
print a,b,c,d # --> A B C None
entry = 'A,B,C,D,E,F'
a,b,c,d = splitter(entry, 4)
print a,b,c,d # --> A B C D
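Since the data comes from a CSV file, another option is the standard library's csv module, which returns each line as a list of whatever length it has and also handles quoted fields that contain commas. The following is a minimal Python 3 sketch, assuming you only ask for non-negative positions; the helper name pick_fields is made up for illustration.

```python
import csv
import io


def pick_fields(lines, positions, default=None):
    """Yield a tuple of the requested fields for every CSV line.

    `positions` is assumed to contain only non-negative indexes.
    """
    # csv.reader yields one list per line, whatever its length,
    # and also copes with quoted fields that contain commas
    for row in csv.reader(lines):
        # Substitute `default` for any position past the end of the row
        yield tuple(row[p] if p < len(row) else default for p in positions)


# io.StringIO stands in for an open file here
sample = io.StringIO('A,B,C,D,E,F\nA,B,C\nA,B\n')
for a, c in pick_fields(sample, (0, 2)):
    print(a, c)
```

With a real file you would pass the open file object instead, e.g. one opened with open(path, newline=''), as the csv documentation recommends.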
I'm trying to store a bidirectional relationship in a database, and to avoid storing two records per relationship I'm trying to find a way to take two UUIDs in either order and return the same unique ID regardless of which UUID was supplied first.
F(a,b) should return the same value as F(b,a)
Examples of ShortUUID output:
wpsWLdLt9nscn2jbTD3uxe
vytxeTZskVKR7C7WgdSP3d
Could something like this work for you?
The function takes two strings as input, orders them, concatenates them into one string, encodes that string and finally returns the hashed result.
import hashlib

def F(a, b):
    data = ''.join(sorted([a, b])).encode()
    return hashlib.sha1(data).hexdigest()
The output is:
>>> a = 'string_1'
>>> b = 'string_2'
>>> print(F(a, b))
376598c12bb7949427f4c037070fff76fe932a66
>>> print(F(b, a))
376598c12bb7949427f4c037070fff76fe932a66
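Because the result is meant to be an ID, a variant of the same sort-then-hash idea is the standard library's uuid.uuid5(), which deterministically derives a UUID from a string. The sketch below assumes the IDs never contain the separator '|'; the separator only matters if the inputs can differ in length (fixed-length ShortUUIDs can't collide this way), and the choice of uuid.NAMESPACE_OID is arbitrary. The resulting UUID could then be converted back to ShortUUID form with the library you already use.

```python
import uuid


def pair_id(a, b, sep='|'):
    """Return the same UUID for (a, b) and (b, a)."""
    # Sorting first makes the result independent of argument order
    first, second = sorted([a, b])
    # The separator (assumed never to occur inside an ID) keeps pairs
    # such as ('ab', 'c') and ('a', 'bc') from producing the same key
    key = first + sep + second
    # uuid5 is deterministic: the same key always yields the same UUID
    return uuid.uuid5(uuid.NAMESPACE_OID, key)


a = 'wpsWLdLt9nscn2jbTD3uxe'
b = 'vytxeTZskVKR7C7WgdSP3d'
print(pair_id(a, b) == pair_id(b, a))  # prints True
```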
Interesting! What do you think of this? It retains your ShortUUID format:
def F(a, b):
    l = (len(a) // 2) + 1
    each_half = zip(a[:l], b[:l]) if a < b else zip(b[:l], a[:l])
    return ''.join([x + y for x, y in each_half])[:len(a)]
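The first line ensures that F also works if you change your ShortUUID to have an odd length.
The second line zips together, one character at a time, the first halves of a and b, with the smaller string first.
The last line returns the joined string, truncated to the length of a. Note that only the first halves of the two UUIDs contribute, so two pairs whose first halves coincide will produce the same value.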
Just tried:
a = 'wpsWLdLt9nscn2jbTD3uxe'
b = 'vytxeTZskVKR7C7WgdSP3d'
assert F(a,b) == F(b,a)
print(F(a,b)) # vwyptsxWeLTdZLstk9VnKs
I want to find the row index of the value max_sw in the list sw_col.
I tried this and some other variation, but nothing worked.
print(i for i in range(len(sw_col)) if sw_col[i]== max_sw)
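The line you have is almost there; as written, though, print() receives the generator object itself, so it prints something like <generator object <genexpr> at 0x...> instead of the index. If you turn it into a list and take the element at index zero, you get the correct answer: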
sw_col = ['a','b','c']
max_sw = 'c'
print([i for i in range(len(sw_col)) if sw_col[i]== max_sw][0]) # prints 2
A more concise solution would be to look up the item directly in the list, like so:
sw_col = ['a','b','c']
max_sw = 'c'
print(sw_col.index(max_sw)) # prints 2
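Keep in mind that list.index() raises a ValueError when the item is missing, and the list-comprehension version raises an IndexError. A small sketch that keeps your original generator but feeds it to next() stops at the first match and can return a default instead of raising; the helper name find_index is made up for illustration.

```python
def find_index(items, target, default=None):
    """Return the index of the first element equal to target, else default."""
    # next() pulls the first match from the generator and stops there;
    # `default` is returned instead of raising when nothing matches
    return next((i for i, value in enumerate(items) if value == target), default)


sw_col = ['a', 'b', 'c']
max_sw = 'c'
print(find_index(sw_col, max_sw))  # prints 2
print(find_index(sw_col, 'z'))     # prints None
```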
Let's say I know beforehand that the string
"key1:key2[]:key3[]:key4" should map to "newKey1[]:newKey2[]:newKey3"
then given "key1:key2[2]:key3[3]:key4",
my method should return "newKey1[2]:newKey2[3]:newKey3"
(the order of numbers within the square brackets should stay, like in the above example)
My solution looks like this:
import re

predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}

def transform(parent_key, parent_key_with_index):
    indexes_in_parent_key = re.findall(r'\[(.*?)\]', parent_key_with_index)
    target_list = predefined_mapping[parent_key].split(":")
    t = []
    i = 0
    for elem in target_list:
        try:
            sub_result = re.subn(r'\[(.*?)\]', '[{}]'.format(indexes_in_parent_key[i]), elem)
            if sub_result[1] > 0:
                i += 1
            new_elem = sub_result[0]
        except IndexError as e:
            new_elem = elem
        t.append(new_elem)
    print ":".join(t)
transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4")
prints newKey1[2]:newKey2[3]:newKey3 as the result.
Can someone suggest a better, more elegant solution (especially around the use of regex)?
Thanks!
You can do it a bit more elegantly by simply splitting the mapped structure on [], then interspersing the indexes from the actual data and, finally, joining everything together:
import itertools
import re

# split the map immediately on [] so that you don't have to split each time on transform
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3".split("[]")}

def transform(key, source):
    mapping = predefined_mapping.get(key, None)
    if not mapping:  # no mapping for this key found, return unaltered
        return source
    indexes = re.findall(r'\[.*?\]', source)  # get individual indexes
    return "".join(i for e in itertools.izip_longest(mapping, indexes) for i in e if i)

print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
NOTE: On Python 3 use itertools.zip_longest() instead.
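A different way to handle the regex side, sketched here for Python 3, is to let re.sub() fill each empty [] in the target template by calling a function that hands out the next index from the source. This keeps the mapping unsplit and needs no zipping; it assumes the indexes themselves never contain a closing bracket.

```python
import re

# Same sample mapping as in the question; templates keep their empty "[]" slots
predefined_mapping = {"key1:key2[]:key3[]:key4": "newKey1[]:newKey2[]:newKey3"}


def transform(key, source):
    template = predefined_mapping.get(key)
    if template is None:  # no mapping for this key, return unaltered
        return source
    # Bracketed indexes from the source, in the order they appear
    indexes = iter(re.findall(r'\[[^\]]*\]', source))
    # Each "[]" in the template is replaced by the next index; once the
    # indexes run out, next() returns '[]' so the slot is left as it was
    return re.sub(r'\[\]', lambda match: next(indexes, '[]'), template)


print(transform("key1:key2[]:key3[]:key4", "key1:key2[2]:key3[3]:key4"))
# newKey1[2]:newKey2[3]:newKey3
```

When re.sub() is given a function, its return value is inserted literally, so backslashes in the indexes would not be treated as escape sequences.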
I still think you're over-engineering this and that there is probably a much more elegant and far less error-prone approach to the whole problem. I'd advise stepping back and looking at the bigger picture instead of hammering out this particular solution just because it seems to be addressing the immediate need.
Suppose I have the following function:
import numpy as np

def function3(start, end):
    """Read MO information."""
    # v is assumed to be defined elsewhere in the script
    config_found = False
    var = []
    for line in v['molecular orbital primitive coefficients']:
        if line.strip() == end:
            config_found = False
        elif config_found:
            i = line.rstrip()
            var.append(i)
        elif line.strip() == start:
            config_found = True
    var1 = [elem.strip() for elem in var]
    var2 = var1[1:-1]
    var3 = np.array([line.split() for line in var2])
    var3 = np.asarray([list(map(float, item)) for item in var3])
    return var3
And suppose I store its output in variables like so:
monumber1=function3('1','2')
monumber2=function3('2','3')
monumber3=function3('3','4')
etc.
Is there a way for me to execute this function a set number of times and store the output in a set number of variables without manually setting the variable name and function arguments every time? Maybe using a for loop? This is my attempt, but I'm struggling to make it functional:
for i in xrange(70):
    monumber[x] = function3([i],[i+1])
Thank you!
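The problem is your use of square brackets: [i] wraps the number in a one-element list, and monumber[x] assigns into a list that doesn't exist yet (using x, which is never defined). Here is code that should work:
monumber = []  # make it an empty list
for i in xrange(70):
    monumber.append(function3(str(i), str(i+1)))  # function3 compares against strings, so convert the integers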
For the more Pythonic one-liner, you can use a list comprehension:
monumber = [function3(str(i),str(i+1)) for i in xrange(70)]
Now that the monumber variable has been created, you can access the element at any given index i using the syntax monumber[i]. Some examples:
first = monumber[0]  # gets the first element of monumber
last = monumber[-1]  # gets the last element of monumber
for i in xrange(10, 20):  # starts at i = 10 and ends at i = 19
    print(monumber[i])  # print the i-th element of monumber
You've almost got it, except that you should use i on the left-hand side, too:
monumber[i] = function3([i],[i+1])
Now, this is the basic idea, but the code will only work if monumber is already a list with enough elements; otherwise an IndexError will occur.
Instead of creating a list and filling it with placeholders in advance, we can dynamically append new values to it:
monumber = []
for i in xrange(70):
    monumber.append(function3([i],[i+1]))
Another problem is that you seem to be confusing the types of arguments your function works with. In the function body, start and end look like strings, but in your code you give it two lists with one integer each. Without changing the function, you can do:
monumber = []
for i in xrange(70):
    monumber.append(function3(str(i), str(i+1)))
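One detail worth checking: the question's own examples start at monumber1 = function3('1','2'), while xrange(70) starts at 0 and calls function3('0','1'). If your MO numbers really start at 1, a dictionary keyed by the MO number keeps the names from the question lined up. The sketch below replaces function3 with a hypothetical stub so it runs on its own.

```python
def function3(start, end):
    # Hypothetical stand-in: the real function3 parses the MO block
    # between the lines `start` and `end` and returns a NumPy array
    return (start, end)


# One entry per MO; monumber[n] plays the role of the variable "monumber<n>"
monumber = {n: function3(str(n), str(n + 1)) for n in range(1, 71)}

print(monumber[1])  # prints ('1', '2')
```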
[Python 3.1]
Edit: mistake in the original code.
I need to print a table. The first row should be a header, which consists of column names separated by tabs. The following rows should contain the data (also tab-separated).
To clarify, let's say I have the columns "speed", "power", and "weight". I originally wrote the following code, with help from a related question I asked earlier:
column_names = ['speed', 'power', 'weight']

def f(row_number):
    # some calculations here to populate variables speed, power, weight
    # e.g., power = retrieve_avg_power(row_number) * 2.5
    # e.g., speed = math.sqrt(power) / 2
    # etc.
    locals_ = locals()
    return {x : locals_[x] for x in column_names}

def print_table(rows):
    print(*column_names, sep = '\t')
    for row_number in range(rows):
        row = f(row_number)
        print(*[row[x] for x in component_names], sep = '\t')
But then I learned that I should avoid using locals() if possible.
Now I'm stuck. I don't want to type the list of all the column names more than once. I don't want to rely on the fact that every dictionary I create inside f() is likely to iterate through its keys in the same order. And I don't want to use locals().
Note that the functions print_table() and f() do a lot of other stuff; so I have to keep them separate.
How should I write the code?
import math

class Columns:
    pass

def f(row_number):
    c = Columns()
    c.power = retrieve_avg_power(row_number) * 2.5
    c.speed = math.sqrt(c.power) / 2
    return c.__dict__
This also lets you specify which of the variables are meant as columns, rather than temporary variables inside the function.
You could use an OrderedDict to fix the order of the dictionaries. But as I see it that isn't even necessary. You are always taking the keys from the column_names list (except in the last line, I assume that is a typo), so the order of the values will always be the same.
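Along the same lines, a collections.namedtuple (available in Python 3.1) fixes the column order in one place and lets the header be generated from the same definition. The sketch below is only an illustration: retrieve_avg_power and the weight calculation are hypothetical placeholders for the real computations, and the table is produced line by line so the formatting can be checked separately from printing.

```python
import math
from collections import namedtuple

# The column names are written once, here; their order is the field order
Row = namedtuple('Row', ['speed', 'power', 'weight'])


def retrieve_avg_power(row_number):
    # Hypothetical stand-in for the real data source
    return 4.0 * (row_number + 1)


def f(row_number):
    power = retrieve_avg_power(row_number) * 2.5
    speed = math.sqrt(power) / 2
    weight = 100 + row_number  # hypothetical placeholder calculation
    # Keyword arguments make the mapping from variables to columns explicit
    return Row(speed=speed, power=power, weight=weight)


def table_lines(rows):
    # Header and values both follow Row._fields, so they always line up
    yield '\t'.join(Row._fields)
    for row_number in range(rows):
        yield '\t'.join(str(value) for value in f(row_number))


for line in table_lines(2):
    print(line)
```

Unlike a plain dict, a namedtuple iterates in field order by definition, so print_table() never has to rely on dictionary ordering.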
An alternative to locals() would be to use the inspect module:
import inspect

def f(row_number):
    # some calculations here to populate variables speed, power, weight
    # e.g., power = retrieve_avg_power(row_number) * 2.5
    # e.g., speed = math.sqrt(power) / 2
    # etc.
    locals_ = inspect.currentframe().f_locals
    return {x : locals_[x] for x in column_names}