I have a data set with keyword, rank_organic and document columns. Each keyword should have a document for rank_organic = 1, 2, 3, 4 and 5, but for some keywords some rank_organic values are missing.
For example, keyword A has rank_organic = 1, 2, 4, 5, and 3 is missing. I want to create a list of documents of length 5 in which the position for rank_organic = 3 holds null or a space and the other positions hold the documents.
Below is the code I am using, but it raises an error. Please help me achieve this.
def key_doc(data):
    lis = []
    for i in pd.unique(data['keyword']):
        a = data.loc[data['keyword'].isin([i])]
        j = i.replace(" ", "_")
        j = Node(i, parent=Testing,
                 documents=[(a.loc[(a['rank_organic']==1)])['vocab'].tolist()[0]
                            ,(a.loc[(a['rank_organic']==2)])['vocab'].tolist()[0]
                            ,(a.loc[(a['rank_organic']==3)])['vocab'].tolist()[0]
                            ,(a.loc[(a['rank_organic']==4)])['vocab'].tolist()[0]
                            ,(a.loc[(a['rank_organic']==5)])['vocab'].tolist()[0]])
        # print j.name, len(j.documents)
        lis.append(j)
    return lis
ERROR:
,(a.loc[(a['rank_organic']==3)])['vocab'].tolist()[0]
IndexError: list index out of range
I recommend you use a list or dictionary comprehension for this and use next to retrieve the first element. next also takes an optional default argument, here [] (an empty list), which is returned when there are no elements to extract.
docs = [next(iter(a.loc[a['rank_organic'] == i, 'vocab'].tolist()), [])
        for i in range(1, 6)]
Then feed docs as your class instance argument.
Below is a minimal example of how the next(iter(lst), ...) method works:
lst = [[1, 2, 3], [4], [], [3, 5]]
res = [next(iter(i), []) for i in lst]
# [1, 4, [], 3]
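The same fill-the-gaps idea can be sketched without pandas. This is only an illustration, assuming the rows are plain dictionaries; the sample rows and the docs_for_keyword helper are hypothetical, and None stands in for a missing rank.

```python
# Hypothetical sample rows: keyword "A" has no document for rank_organic == 3.
rows = [
    {"keyword": "A", "rank_organic": 1, "vocab": "doc1"},
    {"keyword": "A", "rank_organic": 2, "vocab": "doc2"},
    {"keyword": "A", "rank_organic": 4, "vocab": "doc4"},
    {"keyword": "A", "rank_organic": 5, "vocab": "doc5"},
]

def docs_for_keyword(rows, keyword, ranks=range(1, 6)):
    """Return one document per rank, with None where a rank is missing."""
    return [
        # next() returns the first matching document, or the default None.
        next((r["vocab"] for r in rows
              if r["keyword"] == keyword and r["rank_organic"] == rank), None)
        for rank in ranks
    ]

docs = docs_for_keyword(rows, "A")
print(docs)  # ['doc1', 'doc2', None, 'doc4', 'doc5']
```

The list always has one entry per requested rank, so it can be passed on without a length check.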
I'm trying to add the name of each column in a pandas data frame to every element in that column. I added the column names to a list and collected the rows into a list of lists, so now I need to do something like this:
names=['a','c']
list_of_lists = [[1, 2], [3, 4]]
for nlist in list_of_lists:
    for element in nlist:
        print(f"{i}_{list_of_lists[element]}" for i in range( len(name)))
What I need it to print is [[a_1,c_2],[a_3,c_4]], but this gave [[a_1,a_2],[c_3,c_4]].
You can write a nested list comprehension that first loops over your list of lists, then zips that against the name list.
>>> [[f'{i}_{j}' for i,j in zip(names, sub)] for sub in list_of_lists]
[['a_1', 'c_2'], ['a_3', 'c_4']]
Trying to more closely match what was requested (i.e. no quotation marks):
class C():
    # Wrapper whose repr is the bare label, so a printed list shows no quotation marks
    def __repr__(self):
        return self.s
    def __init__(self, c, n):
        self.s = ("%s_%d" % (c, n))
print( [[C(*x) for x in zip(names,lol)] for lol in list_of_lists] )
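If only the printed text matters, a custom class is not strictly needed. Here is a minimal sketch, assuming the same names and list_of_lists as in the question, that builds the output string with str.join instead:

```python
names = ['a', 'c']
list_of_lists = [[1, 2], [3, 4]]

# Format each row as "[a_1, c_2]", then wrap all rows in outer brackets.
rows = ["[" + ", ".join(f"{n}_{v}" for n, v in zip(names, sub)) + "]"
        for sub in list_of_lists]
text = "[" + ", ".join(rows) + "]"
print(text)  # [[a_1, c_2], [a_3, c_4]]
```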
It seems there is a small mistake in your code: the last line uses name, which was never defined.
Do you only want to print the table with the new element names, or do you want to modify the table itself?
If you only want to print a str, I agree with Cory Kramer, but if you want to change your table, I propose this solution:
column_names = ['a', 'c']
columns = [[1, 2], [3, 4]]
for column_counter in range(len(column_names)):
    for row_counter in range(len(columns[column_counter])):
        element = columns[column_counter][row_counter]
        # Replace the element with its column name as a prefix
        columns[column_counter][row_counter] = column_names[column_counter] + '_' + str(element)
print(columns)
The output:
[['a_1', 'a_2'], ['c_3', 'c_4']]
For example, if I have a list containing integers
arr = [1,2,3,4,5,6]
I would like to split this list into two lists based on specific indexes.
If I specify the indexes 0, 1 and 3, it should return the old list with those items removed and a new list containing only the specified items.
arr = [1,2,3,4,5,6]
foo(arr, "013") # returns -> [3,5,6] and [1,2,4]
Here's one way using a generator function, by popping the elements from the input list, while they are yielded from the function.
Because items are removed from the list while iterating, the indices must be processed in reverse sorted order, so that removing a value never shifts the positions of the values still to be removed.
def foo(l, ix):
    for i in sorted(list(ix), reverse=True):
        yield l.pop(int(i))
By calling the function we get the values that have been removed:
arr = [1,2,3,4,5,6]
list(foo(arr, "013"))[::-1]
# [1, 2, 4]
And these have been removed from the original list:
print(arr)
# [3, 5, 6]
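To see why the reverse sort matters, here is a small sketch comparing the two orders. The pop_in_order helper is hypothetical and works on a copy, so the input list is left untouched.

```python
def pop_in_order(values, indexes):
    """Pop each index from a copy of values, in the order given (illustrative helper)."""
    remaining = list(values)
    popped = [remaining.pop(i) for i in indexes]
    return popped, remaining

arr = [1, 2, 3, 4, 5, 6]

# Ascending order: every pop shifts the later elements one place to the left,
# so indexes 1 and 3 no longer point at the values 2 and 4.
wrong_popped, wrong_remaining = pop_in_order(arr, [0, 1, 3])
print(wrong_popped, wrong_remaining)  # [1, 3, 6] [2, 4, 5]

# Descending order: elements before the popped index keep their positions.
right_popped, right_remaining = pop_in_order(arr, [3, 1, 0])
print(right_popped, right_remaining)  # [4, 2, 1] [3, 5, 6]
```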
Hi, you should look at the pop() function.
This function modifies the list directly.
The code should look like:
def foo(arr, indexes):
    res = []
    # process indexes in descending order so earlier pops do not shift later ones
    for i in sorted(indexes, reverse=True):
        res.append(arr.pop(i))
    # res was collected from the highest index down, so reverse it
    return arr, res[::-1]
Thus foo(arr, [0,1,3]) returns: [3,5,6], [1,2,4]
I created a solution based on "How to remove multiple indexes from a list at the same time?" that does not use yield.
arr = [1,2,3,4,5,6]
indexes = [0,1,3]
def foo(arr, indexes):
    temp = []
    for index in sorted(indexes, reverse=True):
        temp.append(arr.pop(index))
    return arr, temp # returns -> [3, 5, 6] and [4, 2, 1]
This is what you need:
def foo(arr, idxstr):
    out = []          # elements selected by the indexes in idxstr
    left = list(arr)  # copy of arr that ends up holding the remaining elements
    for index in idxstr:  # each character of idxstr is one index
        out.append(arr[int(index)])
    # delete by position, highest index first, so equal values elsewhere in arr are untouched
    for i in sorted({int(index) for index in idxstr}, reverse=True):
        del left[i]
    return out, left
Given a list of strings, say mystr = ["State0", "State1", "State2", "State5", "State8"].
I need to find the missing states (here "State3", "State4", "State6", "State7"). Is there a way to find them?
Desired output: mylist = ["State3", "State4", "State6", "State7"]
Assuming the highest number in the list might not be known, one way is to extract the numerical part in each string, take the set.difference with a range up to the highest value and create a new list using a list comprehension:
import re
ints = [int(re.search(r'\d+', i).group(0)) for i in mystr]
# [0, 1, 2, 5, 8]
missing = set(range(max(ints))) - set(ints)
# {3, 4, 6, 7}
[f'State{i}' for i in sorted(missing)]  # sorted, because set iteration order is not guaranteed
# ['State3', 'State4', 'State6', 'State7']
Simple enough with list comprehensions and f-strings:
mystr = ["State0", "State1", "State2", "State5", "State8"]
highest = max(int(state.split('State')[-1]) for state in mystr)
mylist = [f"State{i}" for i in range(highest) if f"State{i}" not in mystr]
print(mylist)
Output:
['State3', 'State4', 'State6', 'State7']
Note that this solution is general: it works even if the last element in the original list is, for example, "State1024", and even if the original list is not sorted.
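As a quick check of that claim, here is a hedged sketch that wraps the same idea in a function and feeds it an unsorted list. The missing_states name and its prefix parameter are hypothetical, and a set is used for the membership test.

```python
def missing_states(states, prefix="State"):
    """Return the labels missing between 0 and the highest number present."""
    present = {int(s[len(prefix):]) for s in states}
    return [f"{prefix}{i}" for i in range(max(present)) if i not in present]

print(missing_states(["State8", "State0", "State5", "State2", "State1"]))
# ['State3', 'State4', 'State6', 'State7']
```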
I am not sure what you are asking. I think this is what you expect.
mystr= ["State0", "State1", "State2", "State5", "State8"]
print(['State'+str(p) for p in range(8) if 'State'+str(p) not in mystr ])
You can use the following solution:
lst = ["State0", "State1", "State2", "State5", "State8"]
states = set(lst)
len_states = len(states)
missing = []
num = 0
while len_states:
    state = f'State{num}'
    if state in states:
        len_states -= 1
    else:
        missing.append(state)
    num += 1
print(missing)
Output:
['State3', 'State4', 'State6', 'State7']
I have a python script that imports a CSV file and based on the file imported, I have a list of the indexes of the file.
I am trying to match the indexes in FILESTRUCT to the CSV file and then replace the data in those columns with newly generated data. Here is a code snippet:
This is just a parsed CSV file returned from my fileParser method:
PARSED = fileParser()
This is a list of CSV column positions:
FILESTRUCT = [6,7,8,9,47]
This is the script that is in question:
def deID(PARSED, FILESTRUCT):
    for item in PARSED:
        for idx, lis in enumerate(item):
            if idx == FILESTRUCT[0]:
                lis = dataGen.firstName()
            elif idx == FILESTRUCT[1]:
                lis = dataGen.lastName()
            elif idx == FILESTRUCT[2]:
                lis = dataGen.email()
            elif idx == FILESTRUCT[3]:
                lis = dataGen.empid()
            elif idx == FILESTRUCT[4]:
                lis = dataGen.ssnGen()
            else:
                continue
    return(PARSED)
I have verified that it is correctly matching the indices (idx) with the integers in FILESTRUCT by adding a print statement at the end of each if statement. That works perfectly.
The problem is that return(PARSED) does not return the newly generated values; it returns the original PARSED input values. I assume I am misusing enumerate in my second loop, but I do not understand enumerate well enough to know what I am doing wrong.
You can use
item[idx] = dataGen.firstName()
to modify the underlying item. The reason is that enumerate() yields (index, value) tuples, so lis is just a local name bound to each value; assigning to lis rebinds that name and leaves item unchanged.
Given your example above you may not even need enumerate, because you never use lis. So you could also just do
for i in range(len(item)):
    # your if .. elif statements go here ...
    item[i] = dataGen.firstName()
On a side-note, the elif statements in your code will become unwieldy once you start adding more conditions and columns. Maybe consider making FILESTRUCT a dictionary like:
FILESTRUCT = {
    6: dataGen.firstName,
    7: dataGen.lastName,
    ....
}
...
for idx in range(len(item)):
    if idx in FILESTRUCT:
        item[idx] = FILESTRUCT[idx]()
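A self-contained sketch of the dictionary approach follows. The fake_first_name and fake_last_name functions are hypothetical stand-ins for the dataGen functions, the column positions are chosen only for the demo, and each row is assumed to be a mutable list.

```python
# Hypothetical stand-ins for the dataGen functions in the question.
def fake_first_name():
    return "Jane"

def fake_last_name():
    return "Doe"

# Map column position -> generator function (positions chosen for this demo only).
FILESTRUCT = {1: fake_first_name, 2: fake_last_name}

def de_id(parsed, filestruct):
    """Overwrite the mapped columns of every row in place and return the rows."""
    for item in parsed:
        for idx, generate in filestruct.items():
            if idx < len(item):  # skip positions a short row does not have
                item[idx] = generate()
    return parsed

rows = [["1", "Alice", "Smith", "NY"], ["2", "Bob", "Jones", "LA"]]
result = de_id(rows, FILESTRUCT)
print(result)
# [['1', 'Jane', 'Doe', 'NY'], ['2', 'Jane', 'Doe', 'LA']]
```

Iterating over the dictionary rather than over every column means adding a new column only requires a new dictionary entry.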
So PARSED is an iterable, and item is an element of it and is also an iterable, and you want to make changes to PARSED by changing elements of item.
So let's do a test.
a = [1, 2, 3]
print('Before:')
print(a)
for i, e in enumerate(a):
    e += 10
print('After:')
print(a)
for e in a:
    e += 10
print('Again:')
print(a)
a[0] += 10
print('Finally:')
print(a)
The results are:
Before:
[1, 2, 3]
After:
[1, 2, 3]
Again:
[1, 2, 3]
Finally:
[11, 2, 3]
As we can see, a is not changed by modifying the loop variable; only assigning through an index (a[0] += 10) changes the list.
You aren't returning a changed variable: you never change PARSED. Instead, build a new list as you loop through PARSED and return that new list.
You can't change the values in a loop like that. It is like expecting this to print all x's:
demo_data = "A string with some words"
for letter in demo_data:
    letter = "x"
print(demo_data)
It won't; it will print: "A string with some words"
I am having trouble with list comprehension in Python
Basically I have code that looks like this
output = []
for i, num in enumerate(test):
    loss_ = do something
    test_ = do something else
    output.append(sum(loss_*test_)/float(sum(loss_)))
How can I write this using a list comprehension such as:
[sum(loss_*test_)/float(sum(loss_)) for i, num in enumerate(test)]
However, I don't know how to assign the values of loss_ and test_.
You can use a nested list comprehension to define those values:
output = [sum(loss_*test_)/float(sum(loss_))
          for loss_, test_ in ((do something, do something else)
                               for i, num in enumerate(test))]
Of course, whether that's any more readable is another question.
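To make the nested form concrete, here is a runnable sketch. The make_loss and make_test helpers are hypothetical stand-ins for "do something" and "do something else", and plain lists with zip replace the element-wise loss_*test_ product, which in the question presumably relies on array types.

```python
# Hypothetical stand-ins for "do something" and "do something else".
def make_loss(num):
    return [num, num + 1]

def make_test(num):
    return [num * 2, num * 3]

test = [1, 2, 3]

# The inner generator computes both values once per item; the outer
# comprehension unpacks them into loss_ and test_.
output = [
    sum(l * t for l, t in zip(loss_, test_)) / float(sum(loss_))
    for loss_, test_ in ((make_loss(num), make_test(num)) for num in test)
]

# Equivalent explicit loop, for comparison.
expected = []
for num in test:
    loss_ = make_loss(num)
    test_ = make_test(num)
    expected.append(sum(l * t for l, t in zip(loss_, test_)) / float(sum(loss_)))

print(output == expected)  # True
```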
As Yaroslav mentioned in the comments, list comprehensions don't allow you to save a value into a variable directly.
However, they do allow you to call functions.
I've made a very basic example (because the sample you provided is too incomplete to test), but it should show how you can still execute code in a list comprehension.
def loss():
    print("loss")
    return 1
def test():
    print("test")
    return 5
output = [loss()*test() for i in range(10)]
print(output)
which in this case will result in the list [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
I hope this somehow shows how you could end up with the behaviour that you were looking for.
ip_list = string.split(" ")  # split the string into a list using the space separator
for i in range(len(ip_list)):  # len(ip_list) is the number of items in the list, here 4
    # range(4) yields 0, 1, 2, 3
    if i % 2 == 0:
        ip_list[i] += "-"  # i is even: append a hyphen to the current IP string
    else:
        ip_list[i] += ","  # otherwise append a comma
print("".join(ip_list)[:-1])  # "".join(ip_list) joins the list back into a string;
                              # [:-1] trims the last character of the result (the extra comma)
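For a version that runs on its own, here is a sketch of the same idea. The input string is hypothetical (four addresses separated by single spaces), and a comprehension replaces the in-place loop.

```python
# Hypothetical input: four addresses separated by single spaces.
string = "10.0.0.1 10.0.0.5 10.0.1.1 10.0.1.9"

ip_list = string.split(" ")
# Even positions start a range (suffix "-"), odd positions end one (suffix ",").
parts = [ip + ("-" if i % 2 == 0 else ",") for i, ip in enumerate(ip_list)]
result = "".join(parts)[:-1]  # drop the trailing comma
print(result)  # 10.0.0.1-10.0.0.5,10.0.1.1-10.0.1.9
```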