I have two lists of lists.
The first, list1, is:
[['1', '2', '*', '2', '1', '0.8'],
['1', '2', '3', '1', '1', '0.7'],
['*', '*', '3', '4', '1', '0.5'],
['1', '2', '*', '1', '1', '0.3'],
['2', '2', '*', '2', '1', '0.1']]
And list2 is :
[['3', '*', '1', '4', '1', '0.9'],
['1', '2', '2', '2', '1', '0.4'],
['2', '2', '*', '2', '1', '0.1']]
Now I want the union of these two lists of lists as a third list of lists. Since ['2', '2', '*', '2', '1', '0.1'] appears in both of them, it should appear only once in the final list.
I am doing:
final_list=list1 + list2
The final list this produces is:
[['3', '*', '1', '4', '1', '0.9'],
['1', '2', '*', '2', '1', '0.8'],
['1', '2', '3', '1', '1', '0.7'],
['*', '*', '3', '4', '1', '0.5'],
['1', '2', '2', '2', '1', '0.4'],
['1', '2', '*', '1', '1', '0.3'],
['2', '2', '*', '2', '1', '0.1'],
['2', '2', '*', '2', '1', '0.1']]
My desired outcome is :
[['3', '*', '1', '4', '1', '0.9'],
['1', '2', '*', '2', '1', '0.8'],
['1', '2', '3', '1', '1', '0.7'],
['*', '*', '3', '4', '1', '0.5'],
['1', '2', '2', '2', '1', '0.4'],
['1', '2', '*', '1', '1', '0.3'],
['2', '2', '*', '2', '1', '0.1']]
If you don't care about order, you can convert them to sets and union (|) them:
list1 = [['1', '2', '*', '2', '1', '0.8'], ['1', '2', '3', '1', '1', '0.7'], ['*', '*', '3', '4', '1', '0.5'], ['1', '2', '*', '1', '1', '0.3'], ['2', '2', '*', '2', '1', '0.1']]
list2 = [['3', '*', '1', '4', '1', '0.9'], ['1', '2', '2', '2', '1', '0.4'], ['2', '2', '*', '2', '1', '0.1']]
output = set(map(tuple, list1)) | set(map(tuple, list2))
print(output)
# {('1', '2', '*', '1', '1', '0.3'),
# ('2', '2', '*', '2', '1', '0.1'),
# ('1', '2', '3', '1', '1', '0.7'),
# ('1', '2', '*', '2', '1', '0.8'),
# ('3', '*', '1', '4', '1', '0.9'),
# ('1', '2', '2', '2', '1', '0.4'),
# ('*', '*', '3', '4', '1', '0.5')}
If you want to have a list of lists (instead of a set of tuples), add the following:
output = list(map(list, output))
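If the order does matter, one possible variant is to deduplicate with a dict instead of a set. This is only a sketch, assuming Python 3.7 or later (where dicts keep insertion order); the name `ordered` is made up for illustration:

```python
list1 = [['1', '2', '*', '2', '1', '0.8'],
         ['1', '2', '3', '1', '1', '0.7'],
         ['*', '*', '3', '4', '1', '0.5'],
         ['1', '2', '*', '1', '1', '0.3'],
         ['2', '2', '*', '2', '1', '0.1']]
list2 = [['3', '*', '1', '4', '1', '0.9'],
         ['1', '2', '2', '2', '1', '0.4'],
         ['2', '2', '*', '2', '1', '0.1']]

# Tuples are hashable, so they can be dict keys. dict.fromkeys keeps the
# first occurrence of each key, in insertion order (Python 3.7+).
ordered = [list(t) for t in dict.fromkeys(map(tuple, list1 + list2))]
print(len(ordered))
```

The result keeps the rows of list1 first, followed by the rows of list2 that were not already present.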
While this has been answered, here's a pretty simple way of doing it: create a third list and append an element to it only if that element isn't in it yet.
>>> from pprint import pprint as pp
>>> results = []
>>> for sublist in list1 + list2:
...     if sublist not in results:
...         results.append(sublist)
...
>>> pp(results)
[['1', '2', '*', '2', '1', '0.8'],
['1', '2', '3', '1', '1', '0.7'],
['*', '*', '3', '4', '1', '0.5'],
['1', '2', '*', '1', '1', '0.3'],
['2', '2', '*', '2', '1', '0.1'],
['3', '*', '1', '4', '1', '0.9'],
['1', '2', '2', '2', '1', '0.4']]
>>>
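You should be able to create a set, add each element of each list to that set, then create a list from that set. Because the inner lists are not hashable, convert each one to a tuple first, something like:
new_set = set()
for item in list1:
    new_set.add(tuple(item))
for item in list2:
    new_set.add(tuple(item))
final_list = [list(item) for item in new_set]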
It won't be guaranteed to be in order but it will have only unique elements. Of course, if you don't care about order, you may as well be using sets exclusively so you don't have to concern yourself with duplicates.
If you want to maintain order, you just maintain both a final list and the set, and only add to the final list if it's not already in the set:
dupe_set = set()
final_list = []
for item in list1:
    key = tuple(item)  # lists are unhashable, so use a tuple as the set key
    if key not in dupe_set:
        dupe_set.add(key)
        final_list.append(item)
for item in list2:
    key = tuple(item)
    if key not in dupe_set:
        dupe_set.add(key)
        final_list.append(item)
Whatever method you choose, this is the sort of thing that cries out for a function/method to do the heavy lifting, since it's quite likely you may want to do this more than once in your code (sometimes, it's worth a separate method even if only done once, as that can aid readability).
For example, these are the equivalent methods for the two options shown above, able to handle any number of input lists:
# Call with x = join_lists_uniq_any_order([list1, list2, ...])
def join_lists_uniq_any_order(list_of_lists):
    # Process each list, adding each item (as a hashable tuple) to an initially empty set.
    new_set = set()
    for one_list in list_of_lists:
        for item in one_list:
            new_set.add(tuple(item))
    return [list(item) for item in new_set]
# Call with x = join_lists_uniq_keep_order([list1, list2, ...])
def join_lists_uniq_keep_order(list_of_lists):
    dupe_set = set()
    final_list = []
    # Process each list, adding each item if not yet seen.
    for one_list in list_of_lists:
        for item in one_list:
            key = tuple(item)
            if key not in dupe_set:
                dupe_set.add(key)
                final_list.append(item)
    return final_list
I have a list of digit characters, but because it was built by appending in a for loop, it also contains '\n' entries, and I don't know how to remove them.
The list looks like this:
['3', '7', '4', '5', '5', '9', '2', '2', '7', '\n', '4', '3', '7', '1', '5', '9', '4', '3', '0', '\n', '3', '7', '2', '4', '1', '0', '2', '7', '5', '\n', '7', '8', '4', '5', '1', '6', '2', '5', '7', '\n', '2', '8', '0', '6', '6', '1', '1', '2', '3', '\n', '9', '3', '5', '6', '8', '3', '8', '7', '1', '\n', '6', '7', '5', '5', '4', '7', '4', '8', '6']
I want to remove the ' ' and '\n' entries and join the digits of each group, so that it looks like this:
[374559227,437159430,372410275,784516257,280661123,935683871,675547486]
Join to a string and split the newlines:
l = [
'3', '7', '4', '5', '5', '9', '2', '2', '7', '\n', '4', '3', '7', '1', '5',
'9', '4', '3', '0', '\n', '3', '7', '2', '4', '1', '0', '2', '7', '5', '\n',
'7', '8', '4', '5', '1', '6', '2', '5', '7', '\n', '2', '8', '0', '6', '6',
'1', '1', '2', '3', '\n', '9', '3', '5', '6', '8', '3', '8', '7', '1', '\n',
'6', '7', '5', '5', '4', '7', '4', '8', '6'
]
print([int(x) for x in ''.join(l).split('\n')])
# [374559227, 437159430, 372410275, 784516257, 280661123, 935683871, 675547486]
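One caveat with splitting on '\n' explicitly: if the list ends with a '\n' (or contains ' ' entries), the split leaves an empty or padded string that int() may reject. A small sketch of a more forgiving variant, using a shortened, made-up sample list:

```python
# Shortened, hypothetical sample: two numbers, a stray space and a trailing newline.
l = ['3', '7', '\n', '4', '2', ' ', '\n']

# str.split() with no argument splits on any run of whitespace and drops
# empty strings, so trailing newlines and stray spaces do not cause errors.
numbers = [int(x) for x in ''.join(l).split()]
print(numbers)
```

With no argument, split() treats spaces and newlines alike, so this only fits if a space is also meant to separate numbers.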
You can use itertools.groupby:
>>> from itertools import groupby
>>> lst = ['3', '7', '4', '5', '5', '9', '2', '2', '7', '\n', '4', '3', '7', '1', '5', '9', '4', '3', '0', '\n', '3', '7', '2', '4', '1', '0', '2', '7', '5', '\n', '7', '8', '4', '5', '1', '6', '2', '5', '7', '\n', '2', '8', '0', '6', '6', '1', '1', '2', '3', '\n', '9', '3', '5', '6', '8', '3', '8', '7', '1', '\n', '6', '7', '5', '5', '4', '7', '4', '8', '6']
>>> [int(''.join(digits)) for is_number, digits in groupby(lst, lambda x: x != '\n') if is_number]
[374559227, 437159430, 372410275, 784516257, 280661123, 935683871, 675547486]
You can use the reduce function:
from functools import reduce
lst = ['3', '7', '4', '5', '5', '9', '2', '2', '7', '\n', '4', '3', '7', '1', '5', '9', '4', '3', '0', '\n', '3', '7', '2', '4', '1', '0', '2', '7', '5', '\n', '7', '8', '4', '5', '1', '6', '2', '5', '7', '\n', '2', '8', '0', '6', '6', '1', '1', '2', '3', '\n', '9', '3', '5', '6', '8', '3', '8', '7', '1', '\n', '6', '7', '5', '5', '4', '7', '4', '8', '6']
lst_result = [int(n) for n in reduce(lambda x, y: f"{x}{y}", lst).split('\n')]
Output:
[374559227, 437159430, 372410275, 784516257, 280661123, 935683871, 675547486]
I'm wondering if it is possible to split a record into specific groups so that I can place them in a table later on.
This is the output that I need to group. I converted it into a list so that I could divide it up more easily for a table.
f=open("sample1.txt", "r")
f.read()
Here's the output:
'0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL +99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430 31558 63001 10214 20197 40117 52014 70544 82108 333 20211 55062 56999 59012 82820 86280 555 60973=\n'
Here's what I have done already. I have managed to change it into a list which resulted in this output:
with open('sample1.txt', 'r') as file:
    data = file.read().replace('\n', '')
print(list(data))
The Output:
['0', '2', '4', '5', '9', '8', '4', '3', '0', '0', '9', '9', '9', '9', '9', '2', '0', '1', '8', '0', '1', '0', '1', '0', '0', '0', '0', '4', '+', '1', '4', '6', '5', '0', '+', '1', '2', '1', '0', '5', '0', 'F', 'M', '-', '1', '2', '+', '0', '0', '4', '6', '9', '9', '9', '9', '9', 'V', '0', '2', '0', '3', '0', '0', '1', 'N', '0', '0', '1', '0', '1', '0', '9', '0', '0', '0', '1', 'C', 'N', '0', '0', '8', '0', '0', '0', '1', '9', '9', '+', '0', '2', '1', '4', '1', '+', '0', '1', '9', '7', '1', '1', '0', '1', '1', '7', '1', 'A', 'D', 'D', 'A', 'Y', '1', '4', '1', '0', '2', '1', 'A', 'Y', '2', '4', '1', '0', '2', '1', 'G', 'A', '1', '0', '2', '1', '+', '0', '0', '6', '0', '0', '1', '0', '8', '1', 'G', 'A', '2', '0', '6', '1', '+', '0', '9', '0', '0', '0', '1', '0', '2', '1', 'G', 'E', '1', '9', 'M', 'S', 'L', ' ', ' ', ' ', '+', '9', '9', '9', '9', '9', '+', '9', '9', '9', '9', '9', 'G', 'F', '1', '0', '6', '9', '9', '1', '0', '2', '1', '9', '9', '9', '0', '0', '6', '0', '0', '1', '9', '9', '9', '9', '9', '9', 'K', 'A', '1', '1', '2', '0', 'N', '+', '0', '2', '1', '1', '1', 'M', 'D', '1', '2', '1', '0', '1', '4', '1', '+', '9', '9', '9', '9', 'M', 'W', '1', '0', '5', '1', 'R', 'E', 'M', 'S', 'Y', 'N', '1', '0', '4', '9', '8', '4', '3', '0', ' ', '3', '1', '5', '5', '8', ' ', '6', '3', '0', '0', '1', ' ', '1', '0', '2', '1', '4', ' ', '2', '0', '1', '9', '7', ' ', '4', '0', '1', '1', '7', ' ', '5', '2', '0', '1', '4', ' ', '7', '0', '5', '4', '4', ' ', '8', '2', '1', '0', '8', ' ', '3', '3', '3', ' ', '2', '0', '2', '1', '1', ' ', '5', '5', '0', '6', '2', ' ', '5', '6', '9', '9', '9', ' ', '5', '9', '0', '1', '2', ' ', '8', '2', '8', '2', '0', ' ', '8', '6', '2', '8', '0', ' ', '5', '5', '5', ' ', '6', '0', '9', '7', '3', '=']
My goal is to group them into something like this:
0245,984300,99999,2018,01,01,0000,4,+1....
The number of digits belonging to each column is predetermined, for example there are always 4 digits for the first column and 6 for the second, and so on.
I was thinking of concatenating them. But I'm not sure if it would be possible.
You can use operator.itemgetter
from operator import itemgetter
g = itemgetter(slice(0, 4), slice(4, 10))
with open('sample1.txt') as file:
    for line in file:
        print(g(line))
Or, even better, you can build the slices dynamically from the column widths using itertools.accumulate:
from itertools import accumulate
indexes = [4, 6, ...]  # the width of each column, in order
g = itemgetter(*map(slice, accumulate([0] + indexes), accumulate(indexes)))
Then proceed as before.
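As a concrete illustration of the width-based slicing, here is a minimal sketch. The widths are read off the example grouping in the question (0245,984300,99999,2018,01,01,0000,4) and only cover the start of the record; `widths`, `starts`, `ends` and `get_fields` are illustrative names:

```python
from itertools import accumulate
from operator import itemgetter

# Column widths taken from the grouping shown in the question (first 8 columns only).
widths = [4, 6, 5, 4, 2, 2, 4, 1]

# Running totals give the end offset of each column; each start is the previous end.
ends = list(accumulate(widths))
starts = [0] + ends[:-1]

# itemgetter called with several slices returns a tuple with one substring per slice.
get_fields = itemgetter(*(slice(s, e) for s, e in zip(starts, ends)))

line = "0245984300999992018010100004"  # start of the sample record
fields = get_fields(line)
print(",".join(fields))  # 0245,984300,99999,2018,01,01,0000,4
```

Computing the offsets from the widths keeps the column specification in one list, so adding a column only means appending its width.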
I would recommend naming everything if you actually want to use this data, and double checking that all the lengths make sense. So to start you do
with open('sample1.txt', 'r') as file:
    data = file.read().rstrip('\n"')
first, second, *rest = data.split()
if len(first) != 163:
    raise ValueError(f"The first part should be 163 characters long, but it's {len(first)}")
# NOTE: 163 is carried over from the check above; set it to the real expected length of the second part.
if len(second) != 163:
    raise ValueError(f"The second part should be 163 characters long, but it's {len(second)}")
So now you have 3 variables
first is "0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL"
second is "+99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430"
rest is ['31558', '63001', '10214', '20197', '40117', '52014', '70544', '82108', '333', '20211', '55062', '56999', '59012', '82820', '86280', '555', '60973']
And then repeat that idea:
date, whatever, whatever2, whatever3, *others = first.split('+')
Then, for parsing the first part, I would just slice out each field, like
something = date[0:4]
something_else = date[4:10]
third_thing = date[10:15]
year = date[15:19]
month = date[19:21]
day = date[21:23]
and so on. And then you can use all these variables in the code that analyzes them.
If this is some sort of standard, you should look for a library that parses strings like that or write one yourself.
Obviously name the variables better
I have a function called "organizer" which takes two values, a list and an index (a number), reorganizes the list, and returns it.
I want to execute this function 10 times to create 10 lists and add these lists to a single list of lists for later use. The program does this, but it repeats the last list 10 times instead of adding the other lists.
I have included the "organizer" function here because I am unable to find where the problem lies, so more information may be needed.
I have been using print inside the function to see where it fails, and it creates the lists as I want. The problem is that the final list ends up containing the last created list, repeated once per loop iteration.
Here is the code:
number_list = ["12","11","10","9","8","7","6","5","4","3","2","1"]
indexes = [0,5,6,8,10,4,5,2,1,9]
def organizer(list,index): # Takes a list and reorganize it by a number which is an index.
    number1 = index - 1
    count_to_add = -1
    list_to_add = []
    for item in range(number1):
        count_to_add += 1
        a = list[count_to_add]
        list_to_add.append(a)
    number2 = index - 1
    for item1 in range(number2):
        del list[0]
    list.extend(list_to_add)
    return list
def lists_creator(list_indexes): # Create a list of 10 lists ,with 12 elements each.
    final_list = []
    count = -1
    for item in list_indexes:
        count += 1
        result = organizer(number_list, list_indexes[count])
        final_list.append(result)
    return final_list
final_output = lists_creator(indexes)
print(final_output)
Here is the result (the same list 10 times):
[['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8'], ['7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8']]
Note: if I change line 28 in the lists_creator function from
final_list.append(result)
to
final_list.extend(result)
the result is:
['12', '11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1', '8', '7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '3', '2', '1', '12', '11', '10', '9', '8', '7', '6', '5', '4', '8', '7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1', '12', '8', '7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '4', '3', '2', '1', '12', '11', '10', '9', '8', '7', '6', '5', '3', '2', '1', '12', '11', '10', '9', '8', '7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8', '7', '6', '5', '4', '7', '6', '5', '4', '3', '2', '1', '12', '11', '10', '9', '8']
Which is the result that I desire but is not a group of lists.
You are using the same list every time, so the result actually consists of one list repeated.
In your lists_creator function, you have to pass organizer a copy of your original list number_list; otherwise you just mutate the same list over and over again:
def lists_creator(list_indexes): # Create a list of 10 lists ,with 12 elements each.
    final_list = []
    count = -1
    for item in list_indexes:
        count += 1
        result = organizer(list(number_list), list_indexes[count])
        final_list.append(result)
    return final_list
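Note that passing a fresh copy makes every call start again from the original list. If the rotations are meant to build on each other, as the flattened output in the question suggests, another option is to keep mutating one list and append a snapshot of it each time. The sketch below uses a simplified, hypothetical `rotate_in_place` helper rather than the original organizer, just to show the aliasing:

```python
def rotate_in_place(items, count):
    """Move the first `count` items to the end of `items`, mutating it."""
    moved = items[:count]
    del items[:count]
    items.extend(moved)
    return items  # the very same list object that was passed in

numbers = ["4", "3", "2", "1"]
aliased = []    # collects references to the one shared list
snapshots = []  # collects independent copies of each intermediate state

for count in [1, 2]:
    result = rotate_in_place(numbers, count)
    aliased.append(result)          # same object every time
    snapshots.append(list(result))  # copy taken before the next mutation

print(aliased)
print(snapshots)
```

Every entry of `aliased` is the same object, so it shows only the final state, while `snapshots` keeps each intermediate rotation.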
def lists_creator(list_indexes): # Create a list of 10 lists ,with 12 elements each.
    final_list = []
    list = []
    for i in range(len(list_indexes)):
        result = organizer(number_list, list_indexes[i])
        list.extend(result)
    n = 0
    m = 12
    for n_times in range(len(list_indexes)):
        final_list.append(list[n:m])
        n = m
        m += 12
    return final_list
I'm using the following script to grab all the files in a directory, then filtering them based on their modified date.
dir = '/tmp/whatever'
dir_files = os.listdir(dir)
dir_files.sort(key=lambda x: os.stat(os.path.join(dir, x)).st_mtime)
files = []
for f in dir_files:
    t = os.path.getmtime(dir + '/' + f)
    c = os.path.getctime(dir + '/' + f)
    mod_time = datetime.datetime.fromtimestamp(t)
    created_time = datetime.datetime.fromtimestamp(c)
    if mod_time >= form.cleaned_data['start'].replace(tzinfo=None) and mod_time <= form.cleaned_data['end'].replace(tzinfo=None):
        files.append(f)
return by_hour
I need to go one step further and group the files by the hour in which they were modified. Does anyone know how to do this off the top of their head?
UPDATE: I'd like to have them in a dictionary ({date,hour,files})
UPDATED:
Thanks for all your replies!. I tried using the response from david, but when I output the result it looks like below (ie. it's breaking up the filename):
defaultdict(<type 'list'>, {datetime.datetime(2013, 1, 9, 15, 0): ['2', '8', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '1', '8', '4', '3', '.', 'a', 'v', 'i', '2', '9', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '2', '0', '2', '4', '.', 'a', 'v', 'i', '3', '0', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '3', '8', '5', '9', '.', 'a', 'v', 'i', '3', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '4', '1', '2', '4', '.', 'a', 'v', 'i', '3', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '3', '1', '0', '.', 'a', 'v', 'i', '3', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '5', '5', '8', '.', 'a', 'v', 'i'], datetime.datetime(2013, 1, 9, 19, 0): ['6', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '1', '1', '8', '.', 'a', 'v', 'i', '6', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '6', '3', '1', '.', 'a', 'v', 'i', '6', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '1', '4', '1', '5', '.', 'a', 'v', 'i', '6', '4', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '2', '2', '3', '3', '.', 'a', 'v', 'i']})
I was hoping to get it to store the complete file names. Also how would I loop over it and grab the files in each hour and the hour they belong to?
I managed to sort the above out by just changing it to append. However it's not sorted from the oldest hour to the most recent.
Many thanks,
Ben
You can truncate a datetime object to the hour with the line:
mod_hour = datetime.datetime(*mod_time.timetuple()[:4])
(This works because mod_time.timetuple()[:4] returns a tuple like (2013, 1, 8, 21).) You can then use a collections.defaultdict to keep a dictionary of lists:
import collections
import datetime
import os
by_hour = collections.defaultdict(list)
for f in dir_files:
    t = os.path.getmtime(dir + '/' + f)
    mod_time = datetime.datetime.fromtimestamp(t)
    # timetuple()[:4] is, for example, (2013, 1, 8, 21)
    mod_hour = datetime.datetime(*mod_time.timetuple()[:4])
    by_hour[mod_hour].append(f)
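To loop over the groups from the oldest hour to the most recent, you can iterate over the sorted keys, since datetime objects compare chronologically. A minimal sketch, where the modification times paired with the file names are made up for illustration instead of being read from disk:

```python
import collections
import datetime

# Hypothetical (file name, modification time) pairs standing in for
# os.listdir() plus os.path.getmtime().
samples = [
    ("61-20130109190118.avi", datetime.datetime(2013, 1, 9, 19, 1, 18)),
    ("28-20130109151843.avi", datetime.datetime(2013, 1, 9, 15, 18, 43)),
    ("29-20130109152024.avi", datetime.datetime(2013, 1, 9, 15, 20, 24)),
]

by_hour = collections.defaultdict(list)
for name, mod_time in samples:
    # Truncate to the hour; same effect as datetime.datetime(*mod_time.timetuple()[:4]).
    mod_hour = mod_time.replace(minute=0, second=0, microsecond=0)
    by_hour[mod_hour].append(name)  # append keeps the whole file name as one item

# Sorting the keys orders the hours from oldest to newest.
ordered_hours = sorted(by_hour)
for hour in ordered_hours:
    print(hour, by_hour[hour])
```

Using append rather than extend matters here: extend would add each character of the file name as a separate list item.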
import os, datetime, operator
dir = "Your_dir_path"
by_hour =sorted([(f,datetime.datetime.fromtimestamp(os.path.getmtime(os.path.join(dir , f)))) for f in os.listdir(dir)],key=operator.itemgetter(1), reverse=True)
The above code sorts the files by their full modification time (year, month, day, hour, minute, second), newest first.
Building on David's excellent answer, you can use itertools.groupby to simplify the work a little bit:
import os, itertools, datetime
dir = '/tmp/whatever'
mtime = lambda f : datetime.datetime.fromtimestamp(os.path.getmtime(dir + '/' + f))
mtime_hour = lambda f: datetime.datetime(*mtime(f).timetuple()[:4])
dir_files = sorted(os.listdir(dir), key=mtime)
dir_files = filter(lambda f: datetime.datetime(2012,1,2,4) < mtime(f) < datetime.datetime(2012,12,1,4), dir_files)
by_hour = dict((k,list(v)) for k,v in itertools.groupby(dir_files, key=mtime_hour)) #python 2.6
#by_hour = {k:list(v) for k,v in itertools.groupby(dir_files, key=mtime_hour)} #python 2.7
Build entries lazily, use the UTC timezone, and read the modification time only once:
#!/usr/bin/env python
import os
from collections import defaultdict
from datetime import datetime
HOUR = 3600 # seconds in an hour
dirpath = "/path/to/dir"
start, end = datetime(...), datetime(...)
# get full paths for all entries in dirpath
entries = (os.path.join(dirpath, name) for name in os.listdir(dirpath))
# add modification time truncated to hour
def date_and_hour(path):
    return datetime.utcfromtimestamp(os.path.getmtime(path) // HOUR * HOUR)
entries = ((date_and_hour(path), path) for path in entries)
# filter by date range: [start, end)
entries = ((mtime, path) for mtime, path in entries if start <= mtime < end)
# group by hour
result = defaultdict(list)
for dt, path in entries:
    result[dt].append(path)
from pprint import pprint
pprint(dict(result))