Data and string merging using python - python

I found an interesting function in Julia called zip. zip orders the calls to its subiterators in such a way that stateful iterators will not advance when another iterator finishes in the current iteration.
I would like to create a similar kind of code that gives output similar to Julia's zip.
For example, say a=1:5 and b=["e","d","b","c","a"], I would like to have an output where each value of both datasets is selected like this:
(1,"e"),(2,"d"), (3,"b") and so on.
Is there any possible way to do this in Python?

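This is done by the zip() function in Python.
Here is some documentation about it. The description below comes from the Python 2 documentation; in Python 3, zip() returns a lazy iterator rather than a list, so wrap it in list() to see all the tuples. It says: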
Returns a list of tuples, where the i-th tuple contains the i-th element from each of the argument sequences or iterables.
The returned list is truncated in length to the length of the shortest argument sequence. When there are multiple arguments which are all of the same length, zip() is similar to map() with an initial argument of None. With a single sequence argument, it returns a list of 1-tuples. With no arguments, it returns an empty list.
The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using zip(*[iter(s)]*n).
And here are a few examples :
>>> list(zip('foo', 'bar'))
[('f', 'b'), ('o', 'a'), ('o', 'r')]
>>> list(zip((1, 1), (2, 4)))
[(1, 2), (1, 4)]
>>> list(zip((1, 2, 3), (4, 5)))
[(1, 4), (2, 5)]
>>> list(zip(range(1, 6), ['a', 'b', 'c', 'f', 'k']))
[(1, 'a'), (2, 'b'), (3, 'c'), (4, 'f'), (5, 'k')]
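Applied to the data from the question, a minimal Python 3 sketch might look like the following. It also illustrates the zip(*[iter(s)]*n) grouping idiom mentioned in the documentation; the helper name grouped is hypothetical and only used for illustration.

```python
a = range(1, 6)
b = ["e", "d", "b", "c", "a"]

# zip() is lazy in Python 3, so list() is used to materialize the pairs.
pairs = list(zip(a, b))
print(pairs)  # [(1, 'e'), (2, 'd'), (3, 'b'), (4, 'c'), (5, 'a')]


def grouped(s, n):
    """Cluster s into n-length tuples; leftover items are dropped."""
    # The same iterator object is repeated n times, so each output tuple
    # pulls n consecutive items from it.
    return list(zip(*[iter(s)] * n))


print(grouped([1, 2, 3, 4, 5, 6, 7], 3))  # [(1, 2, 3), (4, 5, 6)]
```

Note that, like zip() itself, the grouping idiom silently discards a trailing group that is shorter than n.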

Related

Why aren't two `zip` objects equal if the underlying data is equal?

Suppose we create two zips from lists and tuples, and compare them, like so:
>>> x1=[1,2,3]
>>> y1=[4,5,6]
>>> x2=(1,2,3)
>>> y2=(4,5,6)
>>> w1=zip(x1,y1)
>>> w2=zip(x2,y2)
>>> w1 == w2
False
But using list on each zip shows the same result:
>>> list(w1)
[(1, 4), (2, 5), (3, 6)]
>>> list(w2)
[(1, 4), (2, 5), (3, 6)]
Why don't they compare equal, if the contents are equal?
The two zip objects don't compare equal because the zip class doesn't define any logic for comparison, so it uses the default object logic that only cares about object identity. In this case, an object can only ever compare equal to itself; the object contents don't matter.
So, the zip objects will not compare equal even if they are constructed the same way, from the same immutable data:
>>> x = (1, 2, 3)
>>> y = (4, 5, 6)
>>> zip(x, y) == zip(x, y) # separate objects, therefore not equal
False
That said: zip objects don't "contain" the values in the iteration, which is why they can't be reused. The only robust way to verify that they'll give the same results when iterated, is to do that iteration.
Internally, the zip object just has some iterators over other data. One might think, why not compare the iterators, to see if they "point at" the same position in the same underlying data? But that cannot work, either: there are arbitrarily many ways to implement the iterator, plus the iterator doesn't in general know anything about what it's iterating over. Many iterators won't "know where" they are in the underlying data. Many iterators aren't at a position in underlying data, but instead they calculate values on the fly.
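If the goal is to check that two zip objects would produce the same items, one possible sketch is to iterate both and compare the results. The helper name same_items is hypothetical; it assumes both inputs are finite, and it consumes them in the process.

```python
def same_items(it1, it2):
    """Return True if both iterables yield equal items in the same order.

    Note: this consumes both arguments if they are iterators.
    """
    return list(it1) == list(it2)


x1, y1 = [1, 2, 3], [4, 5, 6]
x2, y2 = (1, 2, 3), (4, 5, 6)

print(zip(x1, y1) == zip(x2, y2))            # False: identity comparison
print(same_items(zip(x1, y1), zip(x2, y2)))  # True: contents compared
```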

join two lists which don't have same length, repeating shortest

I have two lists:
l1 = [1,2,3,4,5]
l2 = ["a","b","c"]
My expected output:
l3 = [(1,"a"),(2,"b"),(3,"c"),(4,"a"),(5,"b")]
So basically I'm looking to join two lists, and when they are not the same length I have to fill in items from the shorter list by repeating it from the start.
I tried:
using zip(), but it doesn't work for this case because it stops at the length of the shorter list:
>>> list(zip(l1,l2))
[(1, 'a'), (2, 'b'), (3, 'c')]
You can use itertools.cycle so that zip aggregates elements both from l1 and a cycling version of l2:
from itertools import cycle
list(zip(l1, cycle(l2)))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'a'), (5, 'b')]
cycle is very useful when the iterable you are cycling over is zipped with other iterables, because iteration stops as soon as one of the other iterables is exhausted. Otherwise it keeps cycling indefinitely (when the cycle is used on its own, or when all the other iterables are also infinite, as @chepner points out).
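When it isn't known in advance which list is shorter, one possible sketch is to cycle whichever one has fewer items. The helper name zip_cycle is hypothetical, and it assumes both inputs are sequences that support len().

```python
from itertools import cycle, islice


def zip_cycle(first, second):
    """Pair items, repeating the shorter sequence from its start."""
    if len(first) >= len(second):
        return list(zip(first, cycle(second)))
    return list(zip(cycle(first), second))


l1 = [1, 2, 3, 4, 5]
l2 = ["a", "b", "c"]
print(zip_cycle(l1, l2))
# [(1, 'a'), (2, 'b'), (3, 'c'), (4, 'a'), (5, 'b')]

# cycle() alone never ends, so bound it explicitly, e.g. with islice.
print(list(islice(cycle(l2), 7)))
# ['a', 'b', 'c', 'a', 'b', 'c', 'a']
```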

In Python, if I print a zip object casted to a set (print(set(zip_object))) twice in a row I get => set( ) as the second result. Why?

Given a zip object created from two lists:
print(set(zip_object))
print(set(zip_object))
Yields two different results. The second result is: set( ). Why?
In learning about Python's zip function I followed the two examples:
https://www.w3schools.com/python/ref_func_zip.asp
https://www.geeksforgeeks.org/zip-in-python/
One example shows the results of zip by casting the object to a tuple, the other by casting to a set. I noticed when I attempted to print the casted zip_object twice in a row, I got two different results.
Similar "errors" occur whether I cast to tuple , set, or list, so which data type I'm casting to doesn't seem to matter.
If I store the casted results in a new variable (a = set(zip_object)) instead of printing directly, then
print(a)
print(a)
produces identical results, as expected. So this error may have to do with the zip object being overwritten in memory?
Directly printing a casted, non-zipped list, tuple or set, twice, produces expected results. So it has something to do with the zip function.
Given the code:
courses = ['History', 'Math', 'Physics', 'CompSci']
period = [1, 2, 3, 4]
schedule = zip(period, courses)
print(set(schedule))
print(set(schedule))
Expected:
=> {(4, 'CompSci'), (1, 'History'), (2, 'Math'), (3, 'Physics')}
=> {(4, 'CompSci'), (1, 'History'), (2, 'Math'), (3, 'Physics')}
Actual:
=> {(4, 'CompSci'), (1, 'History'), (2, 'Math'), (3, 'Physics')}
=> set( )
Why do we get set( )?
Thanks for the help!
You are exhausting the iterator; it doesn't reset between calls to set. Effectively, set calls next on the iterator until StopIteration is raised. The next call to set starts with an iterator on which the first call to next raises StopIteration.
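The behaviour can be reproduced with the data from the question, together with two common ways around it. This is a minimal sketch; the variable names first, second and pairs are only for illustration.

```python
courses = ["History", "Math", "Physics", "CompSci"]
period = [1, 2, 3, 4]

schedule = zip(period, courses)
first = set(schedule)   # consumes every pair
second = set(schedule)  # nothing left to consume
print(len(first), second)  # 4 set()

# Option 1: materialize once and reuse the stored result.
pairs = list(zip(period, courses))
print(set(pairs) == set(pairs))  # True

# Option 2: build a fresh zip object each time it is needed.
print(set(zip(period, courses)) == first)  # True
```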

What does the code zip( *sorted( zip(units, errors) ) ) do?

For my application units and errors are always lists of numerical values. I tried googling what each part does and figured out the first part of zip. It seems
ziped_list = zip(units, errors)
simply pairs the units and errors to produce a list as [...,(unit, error),...]. Then it's passed to sorted, which sorts the elements. Since I did not provide the key argument, it compares the elements directly, as the documentation implies:
The default value is None (compare the elements directly).
Since the ziped_list is a list of tuples of integers, then it seems that it makes a comparison between tuples directly. From a small example in my terminal (python 3) it seems it compares based on the first element (even though the documentation implies the comparison is element wise):
>>> (1,None) < (2,None)
True
>>> (2,None) < (1,None)
False
The last bit, the unpacking and then the second zip, still remains a mystery, and I have not been able to figure out what they do. I understand that * unpacks to positional arguments, but using * on its own doesn't let me see exactly what it's doing if I try it in the command line. What further confuses me is why zip needs to be passed an unpacked list such as *sorted(...) if its signature, zip(*iterable), already takes a variable called iterable. It just seems confusing (to me) why we would need to unpack something for a function that accepts a list of iterables as input.
If you don't unpack the list, it is passed as a single argument, so zip can't aggregate elements from each of the iterables.
For example:
a = [3, 2, 1,]
b = ['a', 'b', 'c']
ret = zip(a, b)
the_list = sorted(ret)
the_list >> [(1, 'c'), (2, 'b'), (3, 'a')]
zip(*the_list) is equal to zip((1, 'c'), (2, 'b'), (3, 'a'))
output (wrapped in list() on Python 3): [(1, 2, 3), ('c', 'b', 'a')]
If you just use zip(the_list), it is equal to zip([(1, 'c'), (2, 'b'), (3, 'a')],)
output (wrapped in list() on Python 3): [((1, 'c'),), ((2, 'b'),), ((3, 'a'),)]
You can also see What does ** (double star) and * (star) do for Python parameters?
Seems you've already figured out what zip does.
When you sort the zipped list, sorted compares the first element of each tuple, and sorts the list. If the first elements are equal, the order is determined by the second element.
The * operator then unpacks the sorted list.
Finally, the second zip recombines the output.
So what you end up with is two tuples. The first is units, sorted from smallest to largest. The second is the corresponding errors.
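A small worked example may make the whole chain concrete. The values below are made up purely for illustration, and the names sorted_units and sorted_errors are hypothetical.

```python
units = [3, 1, 2]
errors = [0.3, 0.1, 0.2]

# Step 1: pair each unit with its error, then sort the pairs.
pairs = sorted(zip(units, errors))
print(pairs)  # [(1, 0.1), (2, 0.2), (3, 0.3)]

# Step 2: unpack the sorted pairs and zip them back into two tuples.
sorted_units, sorted_errors = zip(*pairs)
print(sorted_units)   # (1, 2, 3)
print(sorted_errors)  # (0.1, 0.2, 0.3)

# Ties on the first element fall back to the second element.
print(sorted([(1, 9), (1, 2), (0, 5)]))  # [(0, 5), (1, 2), (1, 9)]
```

Unpacking into two names like this assumes the input lists are non-empty; with empty lists, zip(*pairs) yields nothing and the assignment raises ValueError.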

What exactly are tuples in Python?

I'm following a couple of Python exercises and I'm stumped at this one.
# C. sort_last
# Given a list of non-empty tuples, return a list sorted in increasing
# order by the last element in each tuple.
# e.g. [(1, 7), (1, 3), (3, 4, 5), (2, 2)] yields
# [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
# Hint: use a custom key= function to extract the last element from each tuple.
def sort_last(tuples):
    # +++your code here+++
    return
What is a Tuple? Do they mean a List of Lists?
The tuple is the simplest of Python's sequence types. You can think about it as an immutable (read-only) list:
>>> t = (1, 2, 3)
>>> print(t[0])
1
>>> t[0] = 2
TypeError: 'tuple' object does not support item assignment
Tuples can be turned into new lists by just passing them to list() (like any iterable), and any iterable can be turned into a new tuple by passing it to tuple():
>>> list(t)
[1, 2, 3]
>>> tuple(["hello", []])
('hello', [])
Hope this helps. Also see what the tutorial has to say about tuples.
Why are there separate tuple and list data types? (Python FAQ)
Python Tuples are Not Just Constant Lists
Understanding tuples vs. lists in Python
A tuple and a list are very similar. The main difference (as a user) is that a tuple is immutable (can't be modified).
In your example:
[(2, 2), (1, 3), (3, 4, 5), (1, 7)]
This is a list of tuples
[...] is the list
(2,2) is a tuple
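With that in mind, one possible way to fill in the exercise is sketched below. It follows the hint about a custom key= function and is only one of several reasonable solutions.

```python
def sort_last(tuples):
    """Return the tuples sorted in increasing order by their last element."""
    # t[-1] is the last element of each tuple, whatever its length.
    return sorted(tuples, key=lambda t: t[-1])


print(sort_last([(1, 7), (1, 3), (3, 4, 5), (2, 2)]))
# [(2, 2), (1, 3), (3, 4, 5), (1, 7)]
```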
A tuple is an immutable sequence of Python objects. Tuples are sequences, just like lists. The differences between tuples and lists are that tuples cannot be changed, unlike lists, and that tuples use parentheses, whereas lists use square brackets.
Creating a tuple is as simple as putting different comma-separated values. Optionally you can put these comma-separated values between parentheses also. For example −
tup1 = ('Dog', 'Cat', 2222, 555555)
tup2 = (10, 20, 30, 40, 50)
tup3 = "a", "b", "c", "d"
The empty tuple is written as two parentheses containing nothing −
tup1 = ()
To write a tuple containing a single value you have to include a comma, even though there is only one value −
tup1 = (45,)
Like string indices, tuple indices start at 0, and they can be sliced, concatenated, and so on.
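A short sketch of these operations, using made-up values and illustrative variable names:

```python
tup = (10, 20, 30, 40, 50)

print(tup[0])       # 10
print(tup[1:3])     # (20, 30)
print(tup + (60,))  # (10, 20, 30, 40, 50, 60)

# The trailing comma is what makes a one-element tuple.
single = (45,)
not_a_tuple = (45)
print(type(single).__name__)       # tuple
print(type(not_a_tuple).__name__)  # int

# Immutability: item assignment raises TypeError.
try:
    tup[0] = 99
except TypeError as exc:
    print("TypeError:", exc)
```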
The best summary of the differences between lists and tuples I have read is this:
One common summary of these more interesting, if subtle, differences is that tuples are heterogeneous and lists are homogeneous. In other words: Tuples (generally) are sequences of different kinds of stuff, and you deal with the tuple as a coherent unit. Lists (generally) are sequences of the same kind of stuff, and you deal with the items individually.
From: http://news.e-scribe.com/397
Tuples are used to group related variables together. It's often more convenient to use a tuple rather than writing yet another single-use class. Granted, accessing their content by index is more obscure than a named member variable, but it's always possible to use 'tuple unpacking':
def returnTuple(a, b):
    return (a, b)
a, b = returnTuple(1, 2)
In Python programming, a tuple is similar to a list. The difference between them is that we cannot change the elements of a tuple once it is assigned whereas in a list, elements can be changed.
data-types-in-python
