In python I have a list like below
in_list =[u'test_1,testing_1', u'test_2,testing_2', u'test_3,testing_3']
I want to print the values in this list in a loop
for test, testing in input:
    print test, testing
I get this error:
ValueError: too many values to unpack
What is the correct method?
Your loop tries to unpack each element of the list into two variables, but each element is a single string such as u'test_1,testing_1'. Python unpacks a string character by character, so it finds far more than two values. Assigning a sequence to a series of variables like this is called "unpacking", and it only works when the number of values matches the number of variables exactly.
I think what you're trying to do is to iterate through comma-separated value pairs. Try something like the code below. Iterate through the three strings in your input list (use a different variable name: input is a built-in function). For each string, split it at the comma. This gives you a list of two values ... and those you can unpack.
for pair in input_list:  # "input" is a built-in function; use a different name
    test, testing = pair.split(',')
    # continue with your coding
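For reference, here is a minimal Python 3 sketch of the same idea (print is a function there, and the u prefix is optional). The name pairs is introduced only for illustration, and the maxsplit argument of 1 is an extra safeguard that assumes any further commas belong to the second value.

```python
in_list = ['test_1,testing_1', 'test_2,testing_2', 'test_3,testing_3']

# Split each string once at the first comma so every item yields exactly two parts.
pairs = [tuple(item.split(',', 1)) for item in in_list]

# Each element of pairs is now a 2-tuple, so unpacking into two names works.
for test, testing in pairs:
    print(test, testing)
```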
I'm learning to access dictionary keys-values and work with list comprehensions. My assignment asks me to:
"Use a while loop that prints only variant names located in chromosomes that do not have numbers (e.g., X)."
And I'm working with this dictionary of lists, where the keys are variant names. In each list value, element [0] is a string whose part to the left of the colon is the chromosome name and whose part to the right is the chromosome location, and element [1] is the gene name.
cancer_variations={"rs13283416": ["9:116539328-116539328+","ASTN2"],\
"rs17610181":["17:61590592-61590592+","NACA2"],\
"rs1569113445":["X:12906527-12906527+","TLR8TLR8-AS1"],\
"rs143083812":["7:129203569-129203569+","SMO"],\
"rs5009270":["7:112519123-112519123+","IFRD1"],\
"rs12901372":["15:67078168-67078168+","SMAD3"],\
"rs4765540":["12:124315096-124315096+","FAM101A"],\
"rs3815148":["CHR_HG2266_PATCH:107297975-107297975+","COG5"],\
"rs12982744":["19:2177194-2177194+","DOT1L"],\
"rs11842874":["13:113040195-113040195+","MCF2L"]}
I have found how to print the variant names based on the length of the zeroth element in the lists (the chromosome names):
for rs, info in cancer_variations.items():
    tmp_info=info[0].split(":")
    if (len(tmp_info[0])>3):
        print(rs)
But I'm having trouble printing the key values, the variant names, based on the TYPE of the chromosome name, the zeroth element in the list values. To that end, I've devised this code, but I'm not sure how to phrase the Boolean values to print only if the chromosome name is one particular type, (Str) or (int).
for rs, info in cancer_variations.items():
    tmp_info=info[0].split(":")
    if tmp_info[0] = type.str
        print(rs)
I am not sure exactly what I'm not seeing here with my syntax.
Any help will be greatly appreciated.
If I understand you right, you want to check if the first part before : contains a number or not.
You can iterate the string character-by-character and use str.isnumeric() to check if the character is number or not. If any character is a number, continue to next item:
cancer_variations = {
"rs13283416": ["9:116539328-116539328+", "ASTN2"],
"rs17610181": ["17:61590592-61590592+", "NACA2"],
"rs1569113445": ["X:12906527-12906527+", "TLR8TLR8-AS1"],
"rs143083812": ["7:129203569-129203569+", "SMO"],
"rs5009270": ["7:112519123-112519123+", "IFRD1"],
"rs12901372": ["15:67078168-67078168+", "SMAD3"],
"rs4765540": ["12:124315096-124315096+", "FAM101A"],
"rs3815148": ["CHR_HG2266_PATCH:107297975-107297975+", "COG5"],
"rs12982744": ["19:2177194-2177194+", "DOT1L"],
"rs11842874": ["13:113040195-113040195+", "MCF2L"],
}
for k, (v, *_) in cancer_variations.items():
    if not any(ch.isnumeric() for ch in v.split(":")[0]):
        print(k)
Prints:
rs1569113445
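Since the assignment wording asks for a while loop, the same check can be driven by an index instead of a for loop. This is only a sketch: the shortened dictionary variants and the names items, index, and no_digit_variants are made up for illustration.

```python
# Shortened, hypothetical copy of the dictionary from the question.
variants = {
    "rs13283416": ["9:116539328-116539328+", "ASTN2"],
    "rs1569113445": ["X:12906527-12906527+", "TLR8TLR8-AS1"],
    "rs3815148": ["CHR_HG2266_PATCH:107297975-107297975+", "COG5"],
}

items = list(variants.items())  # a list can be indexed, unlike a dict view
no_digit_variants = []
index = 0
while index < len(items):
    rs, info = items[index]
    chromosome = info[0].split(":")[0]
    # Keep the variant only if no character of the chromosome name is numeric.
    if not any(ch.isnumeric() for ch in chromosome):
        no_digit_variants.append(rs)
        print(rs)
    index += 1
```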
You need to look up how to determine your desired classification of the data. In this case, all you need is to differentiate alphabetic data from numeric:
if tmp_info[0].isalpha():
    print(rs)
Should get you on your way.
First you need to make sure what you want to do.
If what you want is to distinguish a numeric string from any other string, note that a numeric string consists only of numeric characters; if it contains any other character, Python does not consider it numeric. You can check this with a quick experiment:
print('23123'.isnumeric())
print('2312ds3'.isnumeric())
Results in:
True
False
Numeric strings are what you want to exclude; any other string will qualify, if I'm understanding you correctly.
So, in that manner, we are going to iterate over the dict, using the loop you've made:
for rs, info in cancer_variations.items():
    tmp_info=info[0].split(":")
    if not tmp_info[0].isnumeric():
        print(rs)
Which results in:
rs1569113445
rs3815148
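The three checks suggested in these answers do not agree on every chromosome name, which is why the outputs differ. A small sketch comparing them side by side; the helper has_no_digits and the sample list chromosomes are introduced here purely for illustration:

```python
chromosomes = ["9", "17", "X", "CHR_HG2266_PATCH"]

def has_no_digits(name):
    """True when no single character of the name is a digit."""
    return not any(ch.isdigit() for ch in name)

# For each name: (purely alphabetic, not purely numeric, contains no digit at all)
comparison = {
    name: (name.isalpha(), not name.isnumeric(), has_no_digits(name))
    for name in chromosomes
}

for name, flags in comparison.items():
    print(name, flags)
```

Only the "not purely numeric" test accepts CHR_HG2266_PATCH, because that name mixes letters, digits, and underscores. Which test is right depends on how "does not have numbers" is meant to be read.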
I need to split a string containing several numbers into separate strings, one per number, and then split each of those into individual digits. That would let me test whether each of the numbers entered is a harshad number, without using for, else, while, or if.
So far i'm able to split the input string:
a = input("Multiple numbers separated by a ,: ")
a.split(",")
I think I then need to split again, probably with the map function. Any idea how to go further?
The python builtin functions map, filter, and reduce are going to be your friend when you are working in a more functional style.
map
The map function lets you transform each item in an iterable (list, tuple, etc.) by passing it to a function and using the return value as a new value in a new iterable*.
The non-functional approach would use a for ... in construct:
numbers_as_strings = ["1", "12", "13"]
numbers_as_ints = []
for number in numbers_as_strings:
    numbers_as_ints.append(int(number))
or more concisely a list comprehension
numbers_as_ints =[int(number) for number in numbers_as_strings]
Since you are eschewing for there is another way
numbers_as_ints = map(int, numbers_as_strings)
But you don't just want your strings mapped to integers, you want to test them for harshadiness. Since we're doing the functional thing let's create a function to do this for us.
def is_harshad(number_as_string):
    return # do your harshad test here
Then you can map your numbers through this function
list(map(is_harshad, numbers_as_strings))  # wrap in list() to resolve the returned map object.
>>> [True, True, False]
But maybe you want the results as a sequence of harshady number strings? Well check out filter
filter
The filter function lets you choose which items from an iterable you want to keep in a new iterable. You give it a function that operates on an single item and returns True for a keeper or False for a rejection. You also give it an iterable of items to test.
A non-functional way to do this is with a for loop
harshady_numbers = []
for number in numbers_as_strings:
    if is_harshad(number):
        harshady_numbers.append(number)
Or more concisely and nicely, with a list comprehension
harshady_numbers = [number for number in numbers_as_strings if is_harshad(number)]
But, since we're getting functional, we'll use filter
harshady_numbers = filter(is_harshad, numbers_as_strings)
That's about it. Apply the same functional thinking to complete the is_harshad function and you're done.
* map() can take more than one iterable argument, and in Python 3 it returns an iterator, not a list.
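One way the placeholder could be filled in without for, if, or while is sketched below. It assumes every entry is a positive whole number written in decimal (a harshad number being one that is divisible by the sum of its digits); the sample input string and the names results and harshad_only are made up for illustration.

```python
def is_harshad(number_as_string):
    """Return True when the number is divisible by the sum of its digits.

    Assumes a positive whole number; "0" would divide by zero.
    """
    digits = number_as_string.strip()   # tolerate spaces around the commas
    digit_sum = sum(map(int, digits))   # map turns each character into an int
    return int(digits) % digit_sum == 0

# Stand-in for the text the user would type at the input() prompt.
numbers_as_strings = "1, 12, 13, 18, 19".split(",")

# list() consumes the lazy iterators that map and filter return in Python 3.
results = list(map(is_harshad, numbers_as_strings))
harshad_only = list(map(str.strip, filter(is_harshad, numbers_as_strings)))

print(results)
print(harshad_only)
```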
I have 2 arrays like:
['16.37.235.200','17.37.235.200','16.37.235.200', '18.37.235.200']
['17.37.235.200','17.37.235.200','16.37.235.200', '17.37.235.200']
And I want to map (injective) every IP address to an integer value.
Like for that instance above, eg.:
[0,1,0,3]
[1,1,0,1]
Is there an existing function (in NumPy or anything else) for that?
OK, I found this solution for mapping the lists separately:
Python Map List of Strings to Integer List
It works the way I want for mapping the 2 lists separately.
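If a single mapping shared by both lists is wanted instead, a plain dictionary can hand out the next free integer the first time each address is seen. This is a sketch using only the standard library, not an existing NumPy function; encode_shared and codes are hypothetical names, and the integers it assigns follow order of first appearance, so they need not match the example values in the question.

```python
def encode_shared(*lists):
    """Map every distinct item to an integer, using one mapping for all lists."""
    codes = {}
    # setdefault returns the existing code, or stores len(codes) as a new one.
    encoded = [[codes.setdefault(item, len(codes)) for item in lst] for lst in lists]
    return encoded, codes

first = ['16.37.235.200', '17.37.235.200', '16.37.235.200', '18.37.235.200']
second = ['17.37.235.200', '17.37.235.200', '16.37.235.200', '17.37.235.200']

(encoded_first, encoded_second), codes = encode_shared(first, second)
print(encoded_first)
print(encoded_second)
```

Because each distinct address gets its own integer, the mapping is injective, and the same address receives the same code in both lists.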
I would like to compare a column from several pairs of pandas dataframes and write the shared values to an empty list. I have written a function that can do this with a single pair of dataframes, but I cannot seem to scale it up.
def parser(dataframe1,dataframe2,emptylist):
    for i1 in dataframe1['POS']:
        for i2 in dataframe2['POS']:
            if i1 == i2:
                emptylist.append(i1)
Where 'POS' is a column header in the two pandas dataframes.
I have made a list of variable names for each input value of this function, eg.
dataframe1_names=['name1','name2',etc...]
dataframe2_names=['name1','name2',etc...]
emptylist_names=['name1','name2',etc...]
Where each element of the list is a string containing the name of a variable (either a pandas dataframe in the case of the first two, or an empty list in the case of the last).
I have tried to iterate through these lists using the following code:
import itertools
for a, b, c in zip(range(len(dataframe1_names)), range(len(dataframe2_names)), range(len(emptylist_names))):
    parser(dataframe1_names[a],dataframe2_names[b],emptylist_names[c])
But this returns TypeError: string indices must be integers.
I believe that this error is coming from passing the function a string containing the variable name instead of the variable name itself. Is there another way to pass multiple variables to a function in an automated way?
Thanks for your help!
Do you have to use strings of object names, instead of just the objects themselves? If you do
dataframes1=[name1,name2,...]
dataframes2=[name1,name2,...]
emptylists=[name1,name2,...]
Then you can just do
for a,b,c in zip( dataframes1, dataframes2, emptylists ):
    parser(a,b,c)
The way you do this is quite circuitous and unpythonic, by the way, so I've changed it a bit. Rather than building lists of indexes for the for statement, I just iterate through the lists (and thus the objects) themselves. This is much more compact and easier to understand. For that matter, do you need to pass in the empty list as an argument (e.g., perhaps the lists aren't always empty)?
Your parser code, while correct, doesn't take advantage of pandas at all and will be very slow. To compare columns, you can simply do dataframe1['COL'] == dataframe2['COL'], which gives you a boolean Series marking where the values are equal. Note that this compares the two columns row by row, so it expects both to have the same index; if you want values that appear anywhere in the other column, as your nested loops do, dataframe1['COL'].isin(dataframe2['COL']) gives the corresponding mask. You can then use the mask to index the dataframe and get the shared values. The result comes out as a dataframe or Series, but it's easy enough to convert to a list. Thus, your parser function can be reduced to the following, if you don't need to create the "empty list" elsewhere first:
def parser( df1, df2 ):
    return list( df1['COL'][ df1['COL']==df2['COL'] ] )
This will be much, much faster, though as it returns the list, you'll have to do something with it, so in your case, you'd do something like:
sharedlists = [ parser(a,b) for a,b in zip( dataframes1, dataframes2 ) ]
If you must use variable names, the following very unsafe sort of code will convert your lists of names into lists of objects (you'll need to do this for each list):
dataframes1 = [ eval(name) for name in dataframe1_names ]
If this is just for numerical work you're doing in an interpreter, eval is alright, but for any code you're releasing, it's very insecure: it will evaluate whatever code is in the string passed into it, thus allowing arbitrary code execution.
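A safer alternative to eval, when names really are needed, is to keep the objects in a dictionary keyed by name and look them up. The sketch below is only an illustration: registry and lookup_all are hypothetical names, and plain lists stand in for the DataFrames so that it runs without pandas.

```python
# Hypothetical registry: each name maps to the object itself.
# Plain lists stand in for the 'POS' columns of real DataFrames.
registry = {
    "name1": [101, 205, 309],
    "name2": [205, 309, 412],
    "name3": [309, 500],
}

def lookup_all(names, table):
    """Return the objects registered under each name, in order."""
    return [table[name] for name in names]

dataframe1_names = ["name1", "name2"]
dataframe2_names = ["name2", "name3"]

pairs = zip(lookup_all(dataframe1_names, registry),
            lookup_all(dataframe2_names, registry))

# Shared values for each pair; set intersection replaces the nested loops.
shared = [sorted(set(a) & set(b)) for a, b in pairs]
print(shared)
```

An unknown name raises KeyError here instead of executing arbitrary code, which is the main reason to prefer a lookup table over eval.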
This sounds like a use case of .query()
A use case for query() is when you have a collection of DataFrame
objects that have a subset of column names (or index levels/names) in
common. You can pass the same query to both frames without having to
specify which frame you’re interested in querying
map(lambda frame: frame.query(expr), [df, df2])
What kind of output are you looking for in the case where you have more than two DataFrame objects? In the case of just two, the following line would accomplish what your parser function does:
common = df1[df1["fieldname"] == df2["fieldname"]]["fieldname"]
except that common would be a pandas Series rather than a list, but you can easily get a list from it by doing list(common).
If you're looking for a function that takes any number of DataFrames and returns a list of common values in some field for each pair, you could do something like this:
from itertools import combinations
def common_lists(field, *dfs):
    return [df1[df1[field] == df2[field]][field] for df1, df2 in combinations(dfs, 2)]
The same deal about getting a list from a Series applies here, since you'll be getting a list of Series objects.
As far as this bit:
import itertools
for a, b, c in zip(range(len(dataframe1_names)), range(len(dataframe2_names)), range(len(emptylist_names))):
    parser(dataframe1_names[a],dataframe2_names[b],emptylist_names[c])
What you're doing is creating a list that looks something like this:
[(0,0,0), (1,1,1), ... (n,n,n)]
where n is one less than the length of the shortest of dataframe1_names, dataframe2_names, and emptylist_names. So on the first iteration of the loop, you have a == b == c == 0, and you're using these values to index into your lists of data frame variable names, so you're calling parser("name1", "name1", "name1"), passing it strings instead of pandas DataFrame objects. Your parser function is expecting DataFrame objects, so it barfs when you try to evaluate dataframe1["POS"] where dataframe1 is the string "name1".
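The behaviour described here can be reproduced without pandas. In this sketch the three name lists are shortened stand-ins for the ones in the question, and index_triples, first_argument, and error_name are illustrative names:

```python
dataframe1_names = ["name1", "name2"]
dataframe2_names = ["name1", "name2"]
emptylist_names = ["name1", "name2"]

# zip over three ranges yields matching index triples, nothing more.
index_triples = list(zip(range(len(dataframe1_names)),
                         range(len(dataframe2_names)),
                         range(len(emptylist_names))))
print(index_triples)

# Indexing the name list returns a plain string, not a DataFrame.
first_argument = dataframe1_names[index_triples[0][0]]

try:
    first_argument["POS"]  # what parser effectively does with its first argument
    error_name = None
except TypeError as exc:
    error_name = type(exc).__name__
    print(error_name, exc)
```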
I'm extracting features from a specific class of objects I have and decided to build a method that extracts all features at once, i.e. calls all feature extraction methods and returns their results in a tuple, as shown below.
def extractFeatures(self):
    if self.getLength()<=10:
        return ()
    else:
        return (self.getMean(), # a number
                self.getStd(), # a number
                self.getSkew(), # a number
                self.getKurt(), # a number
                # Many other methods here, such as:
                self.getACF(), # which returns a TUPLE of numbers...
                )
Nevertheless, I have some methods returning tuples with numbers instead of individual numbers, and since I'm still doing some tests and varying the length in each one of these tuples, hard typing self.getACF()[0], self.getACF()[1], self.getACF()[2], ... is not a good idea.
Is there a pythonic way of getting these values already "unpacked" so that I can return a tuple of only numbers instead of numbers and maybe nested tuples of indefinite size?
You could build a list of the values to return, then convert to a tuple at the end. This lets you use append for single values and extend for tuples:
def extractFeatures(self):
    if self.getLength() > 10:
        out = [self.getMean(), self.getStd(), self.getSkew()]
        out.append(self.getKurt())  # single value
        out.extend(self.getACF())  # multiple values
        return tuple(out)
Note that this will implicitly return None if self.getLength() is 10 or less.
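A runnable sketch of the append/extend approach, next to an alternative that relies on iterable unpacking inside a tuple (available from Python 3.5). The Signal class, its method names, and the placeholder get_acf are all hypothetical stand-ins for the asker's class; both variants return an empty tuple explicitly for short inputs.

```python
import statistics

class Signal:
    """Hypothetical stand-in for the class in the question."""

    def __init__(self, values):
        self.values = list(values)

    def get_mean(self):
        return statistics.mean(self.values)

    def get_std(self):
        return statistics.pstdev(self.values)

    def get_acf(self):
        # Placeholder tuple of arbitrary length, not a real autocorrelation.
        return tuple(self.values[:3])

    def extract_with_list(self):
        if len(self.values) <= 10:
            return ()
        out = [self.get_mean(), self.get_std()]  # single values
        out.extend(self.get_acf())               # flatten the tuple into the list
        return tuple(out)

    def extract_with_star(self):
        if len(self.values) <= 10:
            return ()
        # The star spreads the inner tuple's items into the outer tuple.
        return (self.get_mean(), self.get_std(), *self.get_acf())

signal = Signal(range(12))
features = signal.extract_with_list()
print(features)
```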
However, bear in mind that your calling function now needs to know exactly what numbers are coming and in what order. An alternative in this case is to return a dictionary:
return {'mean': self.getMean(), ... 'ACF': self.getACF()}
Now the calling function can easily access the features required by key, and you can pass these as keyword arguments to other functions with dictionary unpacking:
def func_uses_mean_and_std(mean=None, std=None, **kwargs):
    ...
features = instance.extractFeatures()
result = func_uses_mean_and_std(**features)
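To make the dictionary unpacking concrete, here is a small self-contained sketch. The feature values are invented, and the function body (a difference plus the names of the unused features) exists only so there is something to inspect:

```python
def func_uses_mean_and_std(mean=None, std=None, **kwargs):
    # mean and std are picked out by name; everything else lands in kwargs.
    return mean - std, sorted(kwargs)

# Invented feature values standing in for the result of extractFeatures().
features = {"mean": 5.0, "std": 1.5, "skew": 0.2, "ACF": (1.0, 0.4)}

result = func_uses_mean_and_std(**features)
print(result)
```

The caller does not need to know the order of the features, only their keys, and extra features are ignored by functions that do not name them.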