Iterate through a string in chunks of different sizes python - python

So I am working with files in Python that are like CSV files but with no separator (I feel like there is a name for them, but I'm not sure what it is). In my file I have lots of lines of data where the first 7 characters are an ID number, the next 5 are something else, and so on. I want to go through the file, reading each line, splitting it up, and storing the pieces in a list. Here is an example:
From the file: "0030108102017033119080001010048000000"
These are the chunks I would like to split the string into: [7, 2, 8, 6, 2, 2, 5, 5] Each number represents the length of each chunk.
First I tried this:
n = [7, 2, 8, 6, 2, 2, 5, 5]
for i in range(0, 37, n):
    print(i)
Naturally this didn't work, so now I've started thinking about possible methods and they all seem quite complex. I looked around online and couldn't seem to find anything, only even sized chunks. So any input?
EDIT: The answer I'm looking for should in this case look like this:
['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
Where each value in the list n represents the length of each chunk.

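If these are ASCII strings (or rather, one byte per character), I might use struct.unpack for this. The session below is Python 2; in Python 3, struct.unpack needs a bytes object (e.g. s.encode('ascii')) and returns bytes, so each field has to be decoded afterwards.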
>>> import struct
>>> sizes = [7, 2, 8, 6, 2, 2, 5, 5]
>>> struct.unpack(''.join("%ds" % x for x in sizes), "0030108102017033119080001010048000000")
('0030108', '10', '20170331', '190800', '01', '01', '00480', '00000')
>>>
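For Python 3, the same struct.unpack idea might look like the sketch below. It assumes the line is pure ASCII and that its length is exactly the sum of the sizes (struct.unpack raises struct.error otherwise); the names line, fmt and fields are only illustrative.

```python
import struct

sizes = [7, 2, 8, 6, 2, 2, 5, 5]
line = "0030108102017033119080001010048000000"

# Build a format such as "7s2s8s6s2s2s5s5s": one fixed-length byte string per field.
fmt = "".join("%ds" % size for size in sizes)

# struct.unpack works on bytes in Python 3, so encode first and decode each field.
fields = [raw.decode("ascii") for raw in struct.unpack(fmt, line.encode("ascii"))]
print(fields)
```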
Otherwise, you can construct the necessary slice objects from partial sums of the sizes, which is simple to do if you are using Python 3:
>>> import itertools
>>> s = "0030108102017033119080001010048000000"
>>> psums = list(itertools.accumulate([0] + sizes))
>>> [s[slice(*i)] for i in zip(psums, psums[1:])]
['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
accumulate can be implemented in Python 2 with something like
def accumulate(itr):
    total = 0
    for x in itr:
        total += x
        yield total

from itertools import accumulate, chain
s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
ranges = list(accumulate(n))
list(map(lambda i: s[i[0]:i[1]], zip(chain([0], ranges), ranges)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']

Could you try this?
for line in file:
    n = [7, 2, 8, 6, 2, 2, 5, 5]
    total = 0
    for i in n:
        print(line[total:total+i])
        total += i
This is how I might have done it. The code iterates through each line in the file and, for each line, iterates through the list n of lengths you need to pull out. You can change it to do something other than print, but the idea is that each pass takes a slice from the line. The total variable keeps track of how far into the line we are.
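If you want the pieces stored in a list rather than printed, the same running-total idea could be wrapped in a small function. This is only a sketch: split_fixed_width and sample_lines are made-up names, and sample_lines stands in for an open file.

```python
def split_fixed_width(line, sizes):
    """Return consecutive slices of line whose lengths are given by sizes."""
    fields = []
    total = 0  # how far into the line we are
    for size in sizes:
        fields.append(line[total:total + size])
        total += size
    return fields

sizes = [7, 2, 8, 6, 2, 2, 5, 5]
sample_lines = ["0030108102017033119080001010048000000\n"]  # stand-in for a file object

# Strip the trailing newline, then split every line into its fields.
rows = [split_fixed_width(line.rstrip("\n"), sizes) for line in sample_lines]
print(rows)
```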

Here's a generator that yields the chunks by iterating through the characters of the string and forming substrings from them. You can use this to process any iterable of characters in this fashion:
def chunks(s, sizes):
    it = iter(s)
    for size in sizes:
        l = []
        try:
            for _ in range(size):
                l.append(next(it))
        finally:
            yield ''.join(l)
s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
print(list(chunks(s, n)))
# ['0030108', '10', '20170331', '190800', '01', '01', '00480', '00000']
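One thing to watch with the generator above: if the input is shorter than the sum of the sizes, next(it) raises StopIteration inside the generator, which Python 3.7 and later turn into a RuntimeError (PEP 479). A possible variant, swapping the manual next() loop for itertools.islice, which simply stops early when the input runs out:

```python
from itertools import islice

def chunks(iterable, sizes):
    """Yield successive chunks of the given sizes; short input gives short or empty chunks."""
    it = iter(iterable)
    for size in sizes:
        # islice takes up to `size` items and never raises when the input is exhausted
        yield ''.join(islice(it, size))

s = "0030108102017033119080001010048000000"
n = [7, 2, 8, 6, 2, 2, 5, 5]
result = list(chunks(s, n))
print(result)
```

With a short input such as chunks("abcde", [2, 2, 2]) this yields 'ab', 'cd' and 'e' instead of failing.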

Related

Hardest python generator interview question - python generator object of ranges of numbers

Got this one in a Python course, still can't figure it out:
Input - a string of start and end points of the desired ranges.
Output - a generator of numbers containing all the numbers in all of the ranges.
The problem: write the function using only two generator expressions (no for loops).
Example:
Input:
list(parse_ranges("1-2,4-4,8-10"))
Desired output:
[1, 2, 4, 8, 9, 10]
what I've come to so far:
def parse_ranges(ranges_string):
    first_generator = ([int(i[0]),int(i[-1])] for i in ranges_string.split(','))
    second_generator = (range(j[0],j[1]) for j in first_generator)
    return second_generator
my output:
[range(1, 2), range(4, 4), range(8, 0)]
Well, that does it, but I wouldn't recommend writing such unreadable code...
def parse_ranges(string):
    ranges = (tuple(map(int, s.split('-'))) for s in string.split(','))
    return (x for r in ranges for x in range(r[0], r[1]+1))
list(parse_ranges("1-2,4-4,8-10"))
# [1, 2, 4, 8, 9, 10]
My two cents:
s = "1-2,4-4,8-10"
def parse_ranges(s):
    ranges = ((int(start), int(stop) + 1) for start, stop in (chunk.split('-') for chunk in s.split(',')))
    yield from (i for start, end in ranges for i in range(start, end))
print(list(parse_ranges(s)))
Output
[1, 2, 4, 8, 9, 10]

Create a list of random integers and then put all numbers that are the same right beside each other

def run():
    lst=[]
    for i in range(0,20):
        ran = random.randint(1,10)
        lst.append(ran)
    return lst
So far I have created a list of 20 random integers from 1 to 10. How can I incorporate a swapping method so that values that are the same but not next to each other end up next to each other?
Thanks
You can build your own sorting criterion by passing each value's first index in the list as the key argument.
import random
def run():
    lst=[]
    for i in range(0,20):
        ran = random.randint(1,10)
        lst.append(ran)
    return lst
lst = run()
print(lst)
#[5, 10, 5, 1, 8, 10, 10, 6, 4, 9, 3, 9, 6, 9, 2, 9, 9, 1, 7, 8]
result = sorted(lst, key = lambda x: lst.index(x))
print(result)
#[5, 5, 10, 10, 10, 1, 1, 8, 8, 6, 6, 4, 9, 9, 9, 9, 9, 3, 2, 7]
Perhaps just sort the list:
lst = sorted(lst)
import random
# This is the function you gave, with small edits so you can see the list
# before and after the process.
def run():
    lst=[]
    for i in range(0,20):
        ran = random.randint(1,10)
        lst.append(ran)
    print(lst)
    swap(lst)
    print(lst)
    return lst
# This uses the index of every element and checks it against every later element of the list.
# This swap function also works for lists whose elements are strings.
def swap(lst):
    for i in range(len(lst)):
        nu_m=lst[i]
        x=i+1
        while x<len(lst):
            dump=i+1
            acc=lst[i+1]
            if(lst[i]==lst[x]):
                lst[dump]=lst[x]
                lst[x]=acc
            x=x+1
x=run()
First let's create another list to keep the order of the unique numbers (like a set, but not sorted).
unsorted_set = []
for nb in lst:
    if nb not in unsorted_set:
        unsorted_set.append(nb)
Now that we have this list, let's build a final list that follows its order but repeats each number n times, where n is the number of occurrences of that number in the original list. We will do this with lst.count().
final_list = []
for nb in unsorted_set:
    for _ in range(lst.count(nb)):
        final_list.append(nb)
Note that this code can be simplified a lot with list comprehensions.
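As a rough idea of what that simplification might look like, here is a sketch that uses dict.fromkeys for the ordered unique values (dicts keep insertion order in Python 3.7 and later) and a nested list comprehension for the repeats. The sample list is the one printed in the earlier answer; unique_in_order is just an illustrative name.

```python
lst = [5, 10, 5, 1, 8, 10, 10, 6, 4, 9, 3, 9, 6, 9, 2, 9, 9, 1, 7, 8]

# Unique values in first-seen order (relies on dict insertion order, Python 3.7+).
unique_in_order = list(dict.fromkeys(lst))

# Repeat each unique value as many times as it occurs in the original list.
final_list = [nb for nb in unique_in_order for _ in range(lst.count(nb))]
print(final_list)
# [5, 5, 10, 10, 10, 1, 1, 8, 8, 6, 6, 4, 9, 9, 9, 9, 9, 3, 2, 7]
```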

Finding if a number is a perfect square using maths trickery

First post here;
I'm trying to find out if an inputted number is a perfect square. This is what I've come up with (I'm a complete first-timer noob):
import math
num = int(input("enter the number:"))
square_root = math.sqrt(num)
perfect_square = list[1, 4, 5, 6, 9, 00]
ldigit = num%10
if ldigit in perfect_square:
    print(num, "Is perfect square")
The list contains the digits such that, if the integer ends in one of them, it will be a perfect square.
perfect_square = list[1, 4, 5, 6, 9, 00]
TypeError: 'type' object is not subscriptable
Never seen this before (surprise). Apologies if it's a total mess of logic and understanding.
You have an error in your code:
perfect_square = list[1, 4, 5, 6, 9, 00]
Should be:
perfect_square = ['1', '4', '5', '6', '9', '00']
Secondly, written as ints, 00 is just 0, so it cannot stand for a two-digit ending; instead, convert the number to a string with str to do the check, and back to an int with int afterwards.
Personally I'd rather go with another approach:
import math
num = int(15)
square_root = math.sqrt(num)
if square_root == int(square_root):
    print(f"{num} is a perfect square")
else:
    print(f"{num} is not a perfect square")
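Because math.sqrt works in floating point, the comparison above can be thrown off by rounding for very large integers. A minimal integer-only sketch, assuming Python 3.8 or later (where math.isqrt is available); is_perfect_square is a made-up helper name:

```python
import math

def is_perfect_square(num):
    """Return True if num is the square of a non-negative integer."""
    if num < 0:
        return False  # math.isqrt raises ValueError for negative input
    root = math.isqrt(num)  # exact integer square root, rounded down
    return root * root == num

for candidate in (15, 16, 10**30):
    print(candidate, is_perfect_square(candidate))
```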
You create a list without writing list in front of it, like this:
perfect_square = [1, 4, 5, 6, 9, 00]
We don't need the name list in order to create a list object in Python.
list is a Python built-in type. List literals are written within square brackets [].
For example:
squares = [1, 4, 9, 16]
squares is a list here.
Ashutosh

In Python how can I change the values in a list to meet certain criteria

In Python, I have several lists that look like variations of:
[X,1,2,3,4,5,6,7,8,9,X,11,12,13,14,15,16,17,18,19,20]
[X,1,2,3,4,5,6,7,8,9,10,X,12,13,14,15,16,17,18,19,20]
[0,X,2,3,4,5,6,7,8,9,10,11,X,13,14,15,16,17,18,19,20]
The X can fall anywhere. There are criteria where I put an X, but it's not important for this example. The numbers are always contiguous around/through the X.
I need to renumber these lists to meet a certain criteria - once there is an X, the numbers need to reset to zero. Each X == a reset. Each X needs to become a zero, and counting resumes from there to the next X. Results I'd want:
[0,1,2,3,4,5,6,7,8,9,0,1,2,3,4,5,6,7,8,9,10]
[0,1,2,3,4,5,6,7,8,9,10,0,1,2,3,4,5,6,7,8,9]
Seems like a list comprehension of some type or a generator could help me here, but I can't get it right.
I'm new and learning - your patience and kindness are appreciated. :-)
EDIT: I'm getting pummeled with downvotes, like I've reposted on reddit or something. I want to be a good citizen - what is getting me down arrows? I didn't show code? Unclear question? Help me be better. Thanks!
Assuming the existing values don't matter, this would work:
def fixList(inputList, splitChar='X'):
    outputList = inputList[:]
    x = None
    for i in range(len(outputList)):  # range instead of xrange, so it also runs on Python 3
        if outputList[i] == splitChar:
            outputList[i] = x = 0
        elif x is None:
            continue
        else:
            outputList[i] = x
        x += 1
    return outputList
eg
>>> a = ['X',1,2,3,4,5,6,7,8,9,'X',11,12,13,14,15,16,17,18,19,20]
>>> fixList(a)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
>>> b = ['y',1,2,3,4,5,6,7,8,9,10,'y',12,13,14,15,16,17,18,19,20]
>>> fixList(b, splitChar='y')
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
EDIT: fixed to account for the instances where list does not start with either X or 0,1,2,...
Using the string 'X' as X and the_list as list:
[0 if i == 'X' else i for i in the_list]
This returns a new list with every 'X' replaced by 0; it does not renumber the values that follow each 'X'.
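Since the question mentions a generator, here is one possible sketch that resets the count at every marker in a single pass. It assumes, as the first answer does, that the existing values after a marker do not matter; renumber and marker are made-up names.

```python
def renumber(values, marker='X'):
    """Yield values unchanged until the first marker, then count up from 0 after each marker."""
    count = None  # None means no marker has been seen yet
    for value in values:
        if value == marker:
            count = 0
        elif count is not None:
            count += 1
        yield value if count is None else count

a = ['X', 1, 2, 3, 4, 5, 6, 7, 8, 9, 'X', 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
renumbered = list(renumber(a))
print(renumbered)
```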

python: splitting a long string at distinct places in one run

I am entirely new to programming and just yesterday started learning python for scientific purposes.
Now, I would like to split a single very long string (174 chars) into several smaller as follows:
string = 'AA111-99XYZ '
split = ('AA', 11, 1, -99, 'XYZ')
Right now, the only thing I can think of is to use the slice syntax x-times, but maybe there is a more elegant way? Is there a way to use a list of integers to indicate the positions of where to split, e.g.
split_at = (2, 4, 5, 8, 11)
split = function(split_at, string)
I hope my question is not too silly - I couldn't find a similar example, but maybe I just don't know what I'm looking for?
Thanks,
Jan
Like this:
>>> string = 'AA111-99XYZ '
>>> split_at = [2, 4, 5, 8, 11]
>>> [string[i:j] for i, j in zip([0]+split_at, split_at+[None])]
['AA', '11', '1', '-99', 'XYZ', ' ']
def split_string(string, points):
    for left, right in zip(points, points[1:]):
        yield string[left:right]
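Note that split_string pairs up neighbouring entries of points, so it only yields the pieces between the listed positions. To get the first and last piece as well, the list needs to start at 0 and end at len(string). A short usage sketch under that assumption (the padded points list is my addition):

```python
def split_string(string, points):
    # Pair each split point with the next one and slice between them.
    for left, right in zip(points, points[1:]):
        yield string[left:right]

string = 'AA111-99XYZ '
split_at = [2, 4, 5, 8, 11]

# Pad the split points so the first and last pieces are included.
points = [0] + split_at + [len(string)]
pieces = list(split_string(string, points))
print(pieces)
# ['AA', '11', '1', '-99', 'XYZ', ' ']
```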
To avoid redundancy, you could take ATOzTOA's nice solution and put it in a lambda function:
st = 'AA111-99XYZ '
sa = [2, 4, 5, 8, 11]
res = lambda string,split_at:[string[i:j] for i, j in zip([0]+split_at, split_at+[None])]
print(res(st,sa))
Being relatively new to Python myself, I took the approach of a complete beginner here just to help guide someone who isn't yet familiar with the power of Python.
string = 'AA111-99XYZ '
split_at = [2, 4, 5, 8, 11]
for i in range(len(split_at)):
    if i == 0:
        print(string[:split_at[i]])
    if i < len(split_at)-1:
        print(string[split_at[i]:split_at[i+1]])
    if i == len(split_at)-1:
        print(string[split_at[i]:])
