Split array based on value

I have an array:
foo = ['1', '2', '', '1', '2', '3', '', '1', '', '2']
Is there an efficient way to split this array into sub-arrays using '' as the separator?
I want to get:
foo = [['1', '2'], ['1', '2', '3'], ['1'], ['2']]

In one line (after importing itertools):
import itertools
[list(g) for k, g in itertools.groupby(foo, lambda x: x == '') if not k]
Edit:
From the official documentation:
groupby
generates a break or new group every time the value of the key
function changes (which is why it is usually necessary to have sorted
the data using the same key function).
The key I generate is either True or False, and it changes each time we reach or leave an empty-string element. When the key is False, g is an iterable over the run of elements between two empty strings, so I convert that iterable to a list. I keep a group only when its key is False, which drops the groups that contain the separators themselves.
Don't know how to explain it better, sorry :/ Hope it helped
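To make the grouping visible, here is a small sketch that prints every key together with its group before filtering. The names pairs, is_sep and result are only illustrative and do not appear in the answer above.

```python
from itertools import groupby

foo = ['1', '2', '', '1', '2', '3', '', '1', '', '2']

# Pair each key with the items of its group to see where groupby breaks.
pairs = [(is_sep, list(group))
         for is_sep, group in groupby(foo, key=lambda x: x == '')]
for is_sep, items in pairs:
    print(is_sep, items)

# Keep only the groups whose key is False (the runs between separators).
result = [items for is_sep, items in pairs if not is_sep]
print(result)  # [['1', '2'], ['1', '2', '3'], ['1'], ['2']]
```

Each separator shows up as its own group with key True, which is why the one-liner filters on "if not k".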

Create a list containing a single list.
output = [[]]
Now, iterate over your input list. If the item is not '', append it to the last element of output. If it is, add an empty list to output.
for item in foo:
    if item == '':
        output.append([])
    else:
        output[-1].append(item)
At the end of this, you have your desired output
[['1', '2'], ['1', '2', '3'], ['1'], ['2']]
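The groupby version and the explicit loop agree on the sample input, but they may differ when separators are adjacent or sit at the ends of the list. The sketch below wraps both in hypothetical helper functions (split_groupby and split_loop are names chosen here for illustration) so the difference can be checked.

```python
from itertools import groupby

def split_groupby(items, sep=''):
    """Split with groupby; runs of separators produce no empty sub-lists."""
    return [list(g)
            for is_sep, g in groupby(items, key=lambda x: x == sep)
            if not is_sep]

def split_loop(items, sep=''):
    """Split with an explicit loop; every separator starts a new sub-list."""
    output = [[]]
    for item in items:
        if item == sep:
            output.append([])
        else:
            output[-1].append(item)
    return output

sample = ['1', '', '', '2', '']
print(split_groupby(sample))  # [['1'], ['2']]
print(split_loop(sample))     # [['1'], [], ['2'], []]
```

Which behaviour is wanted depends on whether empty sub-lists carry meaning in your data.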

Related

How to select elements of lists in a list group, if the elements(string) startswith a letter/number?

Here I want to select the elements in each list that start with '6', but I haven't found a way to do it.
The lists are converted from a dataframe:
import pandas as pd
d = {'c1': ['64774', '60240', '60500', '19303', '38724', '11402'],
     'c2': ['', '95868', '95867', '60271', '60502', '19125'],
     'c3': ['', '', '', '', '95867', '60500']}
df = pd.DataFrame(data=d)
df
   c1     c2     c3
64774
60240  95868
60500  95867
19303  60271
38724  60502  95867
11402  19125  60500
list = df.values.tolist()
list = str(list)
list
[['64774', '', ''],
['60240', '95868', ''],
['60500', '95867', ''],
['19303', '60271', ''],
['38724', '60502', '95867'],
['11402', '19125', '60500']]
I tried the code like:
[x for x in list if x.startswith('6')]
However, it only returned '6' for the elements that meet the condition:
['6', '6', '6', '6', '6', '6', '6', '6', '6']
What I'm looking for is a group of lists like:
"[['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]"

When you do list = str(list) you're converting your list to a string representation, i.e. list becomes
"[['64774', '', ''], ['60240', '95868', ''], ['60500', '95867', ''], ['19303', '60271', ''], ['38724', '60502', '95867'], ['11402', '19125', '60500']]"
You then loop through the string with the list comprehension
[x for x in list if x.startswith('6')]
This yields each individual character of the string, so you simply find every occurrence of 6 in the string, hence your result of
['6', '6', '6', '6', '6', '6', '6', '6', '6']
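A short sketch of the difference, using a two-row subset of the data; the names rows, as_text, chars and matches are illustrative only.

```python
rows = [['64774', '', ''], ['60240', '95868', '']]

# str() turns the whole nested list into one string.
as_text = str(rows)

# Iterating over a string yields single characters, so only '6' can match.
chars = [ch for ch in as_text if ch.startswith('6')]
print(chars)    # ['6', '6', '6']

# Iterating over the nested list yields the original strings.
matches = [[x] for row in rows for x in row if x.startswith('6')]
print(matches)  # [['64774'], ['60240']]
```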
Sidenote: don't use variable names that shadow built-ins such as list and dict; it will almost certainly cause issues down the line.
I'm not sure whether there is a specific reason to use a dataframe/pandas for your question. If not, you could simply use a list comprehension:
d = {
    'c1': ['64774', '60240', '60500', '19303', '38724', '11402'],
    'c2': ['', '95868', '95867', '60271', '60502', '19125'],
    'c3': ['', '', '', '', '95867', '60500'],
}
d2 = [[x] for v in d.values() for x in v if x.startswith('6')]
# d2: [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]

You don't need to convert your list with str(list), since its elements are already strings.
lst = df.values.tolist()
lst = [[i] for l in lst for i in l if i.startswith('6') ]
print(lst)
Result:
[['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']]

Try this:
flatten = lambda l: [[item] for sublist in l for item in sublist]
print( flatten([ df[col][df[col].str.startswith("6") ].tolist() for col in df]))
Here, I used a list comprehension that collects all matching cells in a list while iterating over the columns; this yields [['64774', '60240', '60500'], ['60271', '60502'], ['60500']]. To get to your desired output, I defined a function flatten which flattens that list and wraps each item in its own list, giving [['64774'], ['60240'], ['60500'], ['60271'], ['60502'], ['60500']].

How to remove the first occurrence of an integer in a list

This is my code:
positions = []
for i in lines[2]:
    if i not in positions:
        positions.append(i)
print(positions)
print(lines[1])
print(lines[2])
the output is:
['1', '2', '3', '4', '5']
['is', 'the', 'time', 'this', 'ends']
['1', '2', '3', '4', '1', '5']
I want the variable positions to be ['2', '3', '4', '1', '5'],
so instead of dropping the second occurrence of a duplicate in lines[2], it should drop the first.

You can reverse your list, build positions, and then reverse the result, as mentioned by @tobias_k in the comments:
lst = ['1', '2', '3', '4', '1', '5']
positions = []
for i in reversed(lst):
    if i not in positions:
        positions.append(i)
list(reversed(positions))
# ['2', '3', '4', '1', '5']

You'll need to first detect which values are duplicated before you can build positions. Use a collections.Counter() object to test if a value has been seen more than once:
from collections import Counter
counts = Counter(lines[2])
positions = []
for i in lines[2]:
    counts[i] -= 1
    if counts[i] == 0:
        # only add if this is the 'last' value
        positions.append(i)
This'll work for any number of repetitions of values; only the last value to appear is ever used.
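As a self-contained sketch of the counting idea, the function below counts every value with the standard-library collections.Counter and keeps an item only when no later copy remains. The function name keep_last_occurrences is hypothetical, chosen here for illustration.

```python
from collections import Counter

def keep_last_occurrences(values):
    """Return values with only the last occurrence of each item kept."""
    remaining = Counter(values)   # how many copies are still ahead of us
    result = []
    for value in values:
        remaining[value] -= 1
        if remaining[value] == 0:  # no later copy exists, so keep this one
            result.append(value)
    return result

print(keep_last_occurrences(['1', '2', '3', '4', '1', '5']))
# ['2', '3', '4', '1', '5']
```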
You could also reverse the list, and track what you have already seen with a set, which is faster than testing against the list:
positions = []
seen = set()
for i in reversed(lines[2]):
    if i not in seen:
        # only add if this is the first time we see the value
        positions.append(i)
        seen.add(i)
positions = positions[::-1]  # reverse the output list
Both approaches need an extra pass: the first to build the counts mapping, the second to reverse the output list. Which is faster will depend on the size of lines[2] and the number of duplicates in it, and whether or not you are using Python 3 (where Counter performance was significantly improved).

You can use a dictionary to record the last position of each element, then build a new list from that information:
>>> data=['1', '2', '3', '4', '1', '5']
>>> temp={ e:i for i,e in enumerate(data) }
>>> sorted(temp, key=lambda x:temp[x])
['2', '3', '4', '1', '5']
>>>
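The sketch below spells out what the dictionary holds and, as a separate alternative that is not part of the answer above, uses dict.fromkeys on the reversed list. That alternative assumes Python 3.7 or later, where dictionaries preserve insertion order.

```python
data = ['1', '2', '3', '4', '1', '5']

# Later indices overwrite earlier ones, so each key ends with its last index.
last_index = {value: index for index, value in enumerate(data)}
print(last_index)  # {'1': 4, '2': 1, '3': 2, '4': 3, '5': 5}

by_last_index = sorted(last_index, key=last_index.get)
print(by_last_index)  # ['2', '3', '4', '1', '5']

# Alternative (assumes Python 3.7+): dict keys keep first-seen order,
# so deduplicating the reversed list keeps the last occurrences.
via_fromkeys = list(dict.fromkeys(reversed(data)))[::-1]
print(via_fromkeys)  # ['2', '3', '4', '1', '5']
```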

Remove odd-indexed elements from list in Python

I'm trying to remove the odd-indexed elements from my list (where zero is considered even) but removing them this way won't work because it throws off the index values.
lst = ['712490959', '2', '623726061', '2', '552157404', '2', '1285252944', '2', '1130181076', '2', '552157404', '3', '545600725', '0']
def remove_odd_elements(lst):
    i = 0
    for element in lst:
        if i % 2 == 0:
            pass
        else:
            lst.remove(element)
        i = i + 1
How can I iterate over my list and cleanly remove those odd-indexed elements?

You can delete all odd-indexed items in one go using a slice:
del lst[1::2]
Demo:
>>> lst = ['712490959', '2', '623726061', '2', '552157404', '2', '1285252944', '2', '1130181076', '2', '552157404', '3', '545600725', '0']
>>> del lst[1::2]
>>> lst
['712490959', '623726061', '552157404', '1285252944', '1130181076', '552157404', '545600725']
You should not delete elements from a list while you iterate over it, because the list iterator doesn't adjust as you delete items. See Loop "Forgets" to Remove Some Items for what happens when you try.
An alternative would be to build a new list object to replace the old, using a list comprehension with enumerate() providing the indices:
lst = [v for i, v in enumerate(lst) if i % 2 == 0]
This keeps the even-indexed elements, rather than removing the odd-indexed ones.
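To see the iterator problem concretely, here is a sketch on a short list of letters (illustrative data, not the list from the question), followed by the slice-based alternatives.

```python
items = ['a', 'b', 'c', 'd', 'e', 'f']

# Buggy pattern: removing while iterating shifts the remaining items left,
# so the iterator skips over some of them.
buggy = list(items)
for index, element in enumerate(buggy):
    if index % 2 == 1:
        buggy.remove(element)
print(buggy)  # ['a', 'c', 'd', 'f'], not the intended ['a', 'c', 'e']

# In-place fix: delete every second element starting at index 1.
in_place = list(items)
del in_place[1::2]
print(in_place)  # ['a', 'c', 'e']

# Copying fix: a slice with step 2 keeps the even-indexed elements.
copied = items[::2]
print(copied)  # ['a', 'c', 'e']
```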

Since you want to eliminate the odd-indexed items and keep the even-indexed ones, you can filter on the index supplied by enumerate():
>>> filtered_lst = [x for i, x in filter(lambda pair: pair[0] % 2 == 0, enumerate(lst))]
This approach has the overhead of creating a new list.

Reading both numbers in an integer instead of the first when sorting

I'm trying to sort data from a text file and show it in python.
So far I have:
text_file = open("Class1.txt", "r")
data = text_file.read().splitlines()
namelist, scorelist = [], []
for li in data:
    namelist.append(li.split(":")[0])
    scorelist.append(li.split(":")[1])
scorelist.sort()
print(scorelist)
text_file.close()
It sorts the data, however it only reads the first digit:
['0', '0', '10', '3', '3', '5']
It reads 10 as "1"
This is what my text file looks like:
Harry:3
Jarrod:10
Jacob:0
Harold:5
Charlie:3
Jj:0

It's sorting lexicographically. If you need integer sorting, convert the split value to an int before appending:
scorelist.append(int(li.split(":")[1]))

Since scorelist is a list of strings, "10" shows up before "3" because the first character in "10" is less than the first character in "3" (lexicographic sorting -- like words in a dictionary). The trick here is to tell python to sort integers. You can do that as the other answers point out by sorting a list of integers rather than a list of strings, OR you could use a key function to sort:
scorelist.sort(key=int)
This tells python to sort the items as integers rather than as strings. The nice thing here is that you don't need to change the data at all. You still end up with a list of strings rather than a list of integers -- you just tell python to change how it compares the strings. Neat.
demo:
>>> scorelist = ['3', '10', '0', '5', '3', '0']
>>> scorelist_int = [int(s) for s in scorelist]
>>>
>>> scorelist.sort(key=int)
>>> scorelist
['0', '0', '3', '3', '5', '10']
>>>
>>> scorelist_int.sort()
>>> scorelist_int
[0, 0, 3, 3, 5, 10]
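The question keeps names and scores in two separate lists, so sorting scorelist alone loses the pairing. A possible sketch, assuming the file lines are already read into a list (the lines and records names are illustrative), sorts (name, score) pairs with a key function instead:

```python
# Sample lines as they appear in the question's text file.
lines = ["Harry:3", "Jarrod:10", "Jacob:0", "Harold:5", "Charlie:3", "Jj:0"]

records = []
for line in lines:
    name, score = line.split(":")
    records.append((name, score))  # score stays a string

# Compare by the integer value of the score; ties keep their file order
# because Python's sort is stable.
records.sort(key=lambda record: int(record[1]))
print(records)
# [('Jacob', '0'), ('Jj', '0'), ('Harry', '3'), ('Charlie', '3'),
#  ('Harold', '5'), ('Jarrod', '10')]
```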

The data are actually strings. The sort is done like in a dictionary.
You should convert scores into int:
scorelist.append(int(li.split(":")[1]))

Python: Very Basic, Can't figure out why it is not splitting into the larger number listed but rather into individual integers

Really quick question here, some other people helped me on another problem but I can't get any of their code to work because I don't understand something very fundamental here.
8000.5 16745 0.1257
8001.0 16745 0.1242
8001.5 16745 0.1565
8002.0 16745 0.1595
8002.5 16745 0.1093
8003.0 16745 0.1644
I have a data file as such, and when I type
import sys
f1 = open(sys.argv[1], 'rt')
for line in f1:
    fields = line.split()
    print list(fields[0])
I get the output
['1', '6', '8', '2', '5', '.', '5']
['1', '6', '8', '2', '6', '.', '0']
['1', '6', '8', '2', '6', '.', '5']
['1', '6', '8', '2', '7', '.', '0']
['1', '6', '8', '2', '7', '.', '5']
['1', '6', '8', '2', '8', '.', '0']
['1', '6', '8', '2', '8', '.', '5']
['1', '6', '8', '2', '9', '.', '0']
Whereas I would have expected from trialling stuff like print list(fields) to get something like
[16825.5, 162826.0 ....]
What obvious thing am I missing here?
thanks!

Remove the list; .split() already returns a list.
You are turning the first element of the fields into a list:
>>> fields = ['8000.5', '16745', '0.1257']
>>> fields[0]
'8000.5'
>>> list(fields[0])
['8', '0', '0', '0', '.', '5']
If you want to have the first column as a list, you can build a list as you go:
myfirstcolumn = []
for line in f1:
    fields = line.split()
    myfirstcolumn.append(fields[0])
This can be simplified into a list comprehension:
myfirstcolumn = [line.split()[0] for line in f1]

The last command is the problem.
print list(fields[0]) takes the zeroth item from your split list and converts it into a list.
Since you already have a list of strings, ['8000.5', '16745', '0.1257'], the zeroth item is a string, and list() turns a string into a list of its individual characters.

Your first problem is that you apply list to a string:
list("123") == ["1", "2", "3"]
Secondly, you print once per line in the file, but it seems you want to collect the first item of each line and print them all at once.
Third, in Python 2, there's no 't' mode in the call to open (text mode is the default).
I think what you want is:
with open(sys.argv[1], 'r') as f:
    print [line.split()[0] for line in f]

The problem was you were converting the first field which you correctly extracted into a list.
Here's a solution to print the first column:
with open(sys.argv[1]) as f1:
    first_col = []
    for line in f1:
        fields = line.split()
        first_col.append(fields[0])
print first_col
gives:
['8000.5', '8001.0', '8001.5', '8002.0', '8002.5', '8003.0']
Rather than f1 = open(sys.argv[1], 'rt'), consider using with, which closes the file when you are done or if an exception occurs. Also, I left off rt since open() defaults to read and text mode.
Finally, this could also be written using a list comprehension:
with open(sys.argv[1]) as f1:
    first_col = [line.split()[0] for line in f1]
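The question expected numbers rather than strings. A minimal Python 3 sketch of that last step follows, using io.StringIO as a stand-in for the real file so the example runs on its own; the sample variable is illustrative.

```python
import io

# Stand-in for the opened data file from the question.
sample = io.StringIO(
    "8000.5 16745 0.1257\n"
    "8001.0 16745 0.1242\n"
    "8001.5 16745 0.1565\n"
)

# Split each non-empty line, take the first field, and convert it to a float.
first_col = [float(line.split()[0]) for line in sample if line.strip()]
print(first_col)  # [8000.5, 8001.0, 8001.5]
```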

Others have already done a great job answering this question. The behavior you're seeing is because you're calling list on a string. list takes any object that you can iterate over and turns it into a list, one element at a time. This isn't really surprising, except that the object doesn't even need an __iter__ method (which is the case for strings in Python 2). There are a number of posts on SO about __iter__, so I won't focus on that part.
In any event, try the following code and see what it prints out:
>>> def enlighten_me(obj):
...     print (list(obj))
...     print (hasattr(obj, '__iter__'))
...
>>> enlighten_me("Hello World")
>>> enlighten_me( (1,2,3,4) )
>>> enlighten_me( {'red':'wagon',1:5} )
Of course, you can try the example with sets, lists, generators ... Anything you can iterate over.
Levon posted a nice answer about how to create a column while reading your file. I will demonstrate the same thing using the built-in zip function.
rows = []
for row in myfile:
    rows.append(row.split())
# now rows is stored as [ [col1,col2,...] , [col1,col2,...], ... ]
At this point we could get the first column by (Levon's answer):
column1 = []
for row in rows:
    column1.append(row[0])
or more succinctly:
column1=[row[0] for row in rows] #<-- This is called a list comprehension
But what if you want all the columns? (and what if you don't know how many columns there are?). This is a job for zip.
zip takes iterables as input and matches them up. In other words:
zip(iter1,iter2)
will take iter1[0] and match it with iter2[0], and match iter1[1] with iter2[1] and so on -- kind of like a zipper if you think about it. But, zip can take more than just 2 arguments ...
zip(iter1,iter2,iter3) #results in [ (iter1[0],iter2[0],iter3[0]) , (iter1[1],iter2[1],iter3[1]), ... ]
Now, the last piece of the puzzle that we need is argument unpacking with the star operator.
If I have a function:
def foo(a,b,c):
    print a
    print b
    print c
I can call that function like this:
A=[1,2,3]
foo(A[0],A[1],A[2])
Or, I can call it like this:
foo(*A)
Hopefully this makes sense -- the star takes each element in the list and "unpacks" it before passing it to foo.
So, putting the pieces together (remember back to the list of rows), we can unpack the list of rows and pass it to zip which will match corresponding indices in each row (i.e. columns).
columns=zip(*rows)
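Now to get the first column, we just do the following (in Python 3, wrap the zip call in list() first, since zip returns an iterator there):
columns[0] #first column
For lists of lists, I like to think of zip(*list_of_lists) as a sort of poor-man's transpose.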
Hopefully this has been helpful.
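For reference, a Python 3 sketch of the same transpose idea on two rows of the sample data; the raw_lines name is illustrative, and list() is needed because zip returns an iterator in Python 3.

```python
raw_lines = ["8000.5 16745 0.1257", "8001.0 16745 0.1242"]

# Each row is a list of the whitespace-separated fields of one line.
rows = [line.split() for line in raw_lines]

# Unpack the rows into zip so that matching indices are grouped together.
columns = list(zip(*rows))

print(columns[0])  # ('8000.5', '8001.0')
print(columns[2])  # ('0.1257', '0.1242')
```

Note that each column comes back as a tuple rather than a list.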
