I'm using the following script to grab all the files in a directory and then filter them by their modification date.
dir = '/tmp/whatever'
dir_files = os.listdir(dir)
dir_files.sort(key=lambda x: os.stat(os.path.join(dir, x)).st_mtime)
files = []
for f in dir_files:
    t = os.path.getmtime(dir + '/' + f)
    c = os.path.getctime(dir + '/' + f)
    mod_time = datetime.datetime.fromtimestamp(t)
    created_time = datetime.datetime.fromtimestamp(c)
    if mod_time >= form.cleaned_data['start'].replace(tzinfo=None) and mod_time <= form.cleaned_data['end'].replace(tzinfo=None):
        files.append(f)
return files
I need to go one step further and group the files by the hour in which they were modified. Does anyone know how to do this off the top of their head?
UPDATE: I'd like to have them in a dictionary ({date,hour,files})
UPDATED:
Thanks for all your replies! I tried using the response from David, but when I output the result it looks like the below (i.e. it's breaking up the filename):
defaultdict(<type 'list'>, {datetime.datetime(2013, 1, 9, 15, 0): ['2', '8', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '1', '8', '4', '3', '.', 'a', 'v', 'i', '2', '9', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '2', '0', '2', '4', '.', 'a', 'v', 'i', '3', '0', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '3', '8', '5', '9', '.', 'a', 'v', 'i', '3', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '4', '1', '2', '4', '.', 'a', 'v', 'i', '3', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '3', '1', '0', '.', 'a', 'v', 'i', '3', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '5', '5', '5', '5', '8', '.', 'a', 'v', 'i'], datetime.datetime(2013, 1, 9, 19, 0): ['6', '1', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '1', '1', '8', '.', 'a', 'v', 'i', '6', '2', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '0', '6', '3', '1', '.', 'a', 'v', 'i', '6', '3', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '1', '4', '1', '5', '.', 'a', 'v', 'i', '6', '4', '-', '2', '0', '1', '3', '0', '1', '0', '9', '1', '9', '2', '2', '3', '3', '.', 'a', 'v', 'i']})
I was hoping to get it to store the complete file names. Also, how would I loop over it and grab the files in each hour and the hour they belong to?
I managed to sort the above out by just changing it to append. However, it's not sorted from the oldest hour to the most recent.
Many thanks,
Ben
You can truncate a datetime object to the start of its hour with the line:
mod_hour = datetime.datetime(*mod_time.timetuple()[:4])
(This works because mod_time.timetuple()[:4] returns a tuple like (2013, 1, 8, 21).) You can then use a collections.defaultdict to keep a dictionary of lists:
import collections
by_hour = collections.defaultdict(list)
for f in dir_files:
    t = os.path.getmtime(dir + '/' + f)
    mod_time = datetime.datetime.fromtimestamp(t)
    mod_hour = datetime.datetime(*mod_time.timetuple()[:4])
    # for example, (2013, 1, 8, 21)
    by_hour[mod_hour].append(f)
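To address the follow-up in the question: the keys of by_hour are datetime objects, so you can walk the groups from the oldest hour to the most recent simply by sorting the keys. A minimal sketch:
for hour in sorted(by_hour):
    print(hour.strftime('%Y-%m-%d %H:00'), by_hour[hour])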
import os, datetime, operator
dir = "Your_dir_path"
by_hour = sorted([(f, datetime.datetime.fromtimestamp(os.path.getmtime(os.path.join(dir, f)))) for f in os.listdir(dir)], key=operator.itemgetter(1), reverse=True)
The code above sorts the files by modification time (year, month, day, hour, minute, second), newest first because of reverse=True.
Building on David's excellent answer, you can use itertools.groupby to simplify the work a little bit:
import os, itertools, datetime
dir = '/tmp/whatever'
mtime = lambda f : datetime.datetime.fromtimestamp(os.path.getmtime(dir + '/' + f))
mtime_hour = lambda f: datetime.datetime(*mtime(f).timetuple()[:4])
dir_files = sorted(os.listdir(dir), key=mtime)
dir_files = filter(lambda f: datetime.datetime(2012,1,2,4) < mtime(f) < datetime.datetime(2012,12,1,4), dir_files)
by_hour = dict((k,list(v)) for k,v in itertools.groupby(dir_files, key=mtime_hour)) #python 2.6
#by_hour = {k:list(v) for k,v in itertools.groupby(dir_files, key=mtime_hour)} #python 2.7
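If you are on Python 3, note that filter returns a lazy iterator rather than a list; a sketch of the same idea using a list comprehension and the dict comprehension (assuming the same dir, mtime and mtime_hour helpers defined above):
dir_files = sorted(os.listdir(dir), key=mtime)
dir_files = [f for f in dir_files if datetime.datetime(2012, 1, 2, 4) < mtime(f) < datetime.datetime(2012, 12, 1, 4)]
by_hour = {k: list(v) for k, v in itertools.groupby(dir_files, key=mtime_hour)}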
Build the entries lazily, use the UTC timezone, and read the modification time only once:
#!/usr/bin/env python
import os
from collections import defaultdict
from datetime import datetime
HOUR = 3600 # seconds in an hour
dirpath = "/path/to/dir"
start, end = datetime(...), datetime(...)
# get full paths for all entries in dirpath
entries = (os.path.join(dirpath, name) for name in os.listdir(dirpath))
# add modification time truncated to hour
def date_and_hour(path):
    return datetime.utcfromtimestamp(os.path.getmtime(path) // HOUR * HOUR)
entries = ((date_and_hour(path), path) for path in entries)
# filter by date range: [start, end)
entries = ((mtime, path) for mtime, path in entries if start <= mtime < end)
# group by hour
result = defaultdict(list)
for dt, path in entries:
result[dt].append(path)
from pprint import pprint
pprint(dict(result))
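If you prefer the {date, hour, files} layout mentioned in the question's update, a small sketch that reshapes the result dict in chronological order (the key names here are just illustrative):
rows = [{'date': dt.date(), 'hour': dt.hour, 'files': paths}
        for dt, paths in sorted(result.items())]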
I am cleaning up data in a .txt file. I use 3 different .txt files to analyze and clean up three different constructs. The .txt files all have 10 respondents; the first and the last have 17 answers per respondent, while the middle one has 16 answers per respondent. The problem I'm facing right now is that the first and last work, but the middle one with 16 questions has problems with the index. All three pieces of code look almost identical.
The error code:
Traceback (most recent call last):
File "main.py", line 161, in <module>
itemF = dataF[row,column]
IndexError: index 16 is out of bounds for axis 1 with size 16
Sample input:
['N.v.t.', '0', '1', '2', '1', 'N.v.t.', '0', '0', '2', '0', '0', '3', '2', '3', '1', '1']
['N.v.t.', '1', 'N.v.t.', '0', '0', 'N.v.t.', '2', '0', 'N.v.t.', '1', '0', '1', '1', '2', '0', '1']
['N.v.t.', '0', 'N.v.t.', '0', '0', 'N.v.t.', '0', '0', 'N.v.t.', '0', '0', '3', '0', '3', '0', '0']
['2', '2', 'N.v.t.', '1', '3', '1', '2', '1', '1', '3', '2', '2', '3', '1', '2', '3']
['1', '2', 'N.v.t.', '0', '0', 'N.v.t.', '2', '2', '0', '2', '1', '2', '2', '3', '1', '2']
['N.v.t.', '0', 'N.v.t.', '1', '0', 'N.v.t.', '1', '2', 'N.v.t.', '1', '0', '3', '1', '3', '2', '2']
['0', '3', 'N.v.t.', '0', '2', '3', '2', '1', '3', '2', '2', '2', '2', '3', '0', '1']
['1', '3', 'N.v.t.', '0', '2', 'N.v.t.', '0', '2', 'N.v.t.', '0', '1', '1', '0', '2', '2', '1']
['1', '2', '2', '2', '3', '3', '0', '2', '2', '2', '2', '2', '2', '2', '2', '1']
['1', '2', 'N.v.t.', '0', '2', 'N.v.t.', '1', '3', '2', '2', '1', '3', '2', '2', '2', '2']
The code:
import numpy
dataF = numpy.loadtxt("answersFEAR.txt", dtype = str, delimiter = ", ")
shapeF = dataF.shape
(shapeF[0] == 5)
print(dataF)
for i in range(0, shape[0]):
    str1 = dataF[i, 0]
    str2 = dataF[i, -1]
    dataF[i, 0] = str1.replace('[', '')
    dataF[i, -1] = str2.replace(']', '')

for column in range(0, shape[1]):
    for row in range(0, shape[0]):
        itemF = dataF[row,column]
        dataF[rij,kolom] = itemF.replace("'", '')
dataF[dataF == 'N.v.t.'] = numpy.nan
print("DATA FEAR")
print(dataD)
scoresF = dataF[:,1:17]
scoresF = scoresF.astype(float)
average_score_fear = numpy.nanmean(scoresF, axis = 1)
print("")
print("AVERAGE SCORE FEAR")
print(average_score_fear)
The expected outcome should look like this (this is just one result):
["['1'" "'2'" "'2'" "'2'" "'3'" "'3'" "'0'" "'2'" "'2'" "'2'" "'2'" "'2'" '2'" "'2'" "'2'" "'1']"]
DATA FEAR
[['1', '2', '2', '2', '3', '3', '0', '2', '2', '2', '2', '2', '2', '2', '2', '1']]
AVERAGE SCORE FEAR
After a lengthy operation I get a defaultdict like the following:
l = defaultdict(list)
l = [('S', ['(', 'Num']), ('Num',['Sign', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])]
print(l)
How can I now update all the values for each key when some of those values are themselves other keys?
Expected result:
Step 1:
l_new = [('S', ['(', 'Sign', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Num',['-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])]
print(l)
Step 2:
l_new = [('S', ['(', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Num',['-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])]
print(l)
You can use a one-liner like so:
l = dict([('S', ['(', 'Num']), ('Num',['Sign', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])])
l_new = {
k: [val for subl in (l.get(el, [el]) for el in v) for val in subl]
for k, v in l.items()
}
print(l_new)
# {'S': ['(', 'Sign', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
# 'Num': ['-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0'],
# 'Op': ['+', '-', '*', '/'],
# 'Sign': ['-']}
If you need to repeat the operation more than once, then just run it again. (Reassign to l and compute another l_new.)
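For example, a small sketch that keeps re-applying the same comprehension until nothing changes any more (it assumes the grammar has no cycles, otherwise this would loop forever):
while True:
    l_new = {
        k: [val for subl in (l.get(el, [el]) for el in v) for val in subl]
        for k, v in l.items()
    }
    if l_new == l:
        break
    l = l_new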
You can use a recursive generator function:
l = [('S', ['(', 'Num']), ('Num',['Sign', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])]
l_d = dict(l) #convert l to a dictionary for faster lookup
def get_keys(d, c = []):
    if d not in l_d:
        yield d
    elif d not in c:
        yield from [i for k in l_d[d] for i in get_keys(k, c+[d])]
r = [(a, [j for k in b for j in get_keys(k)]) for a, b in l]
Output:
[('S', ['(', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Num', ['-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']), ('Op', ['+', '-', '*', '/']), ('Sign', ['-'])]
With recursion, you do not need to manually run a single comprehension multiple times to replace key tokens in the value lists of l: the generator keeps traversing l, replacing key tokens with the non-key tokens they eventually expand to.
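For instance, expanding a single symbol with the generator above yields the fully substituted tokens directly:
print(list(get_keys('S')))
# ['(', '-', '1', '2', '3', '4', '5', '6', '7', '8', '9', '0']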
gene = []
x = 1
path = '/content/drive/MyDrive/abc.txt'
file = open(path, 'r').readlines()
for i in file[x]:
    i.split('\t')[0]
    gene.append(i)
    x += 1
print(gene)
My output is:
['A', 'H', 'Y', '3', '9', '2', '7', '8', '\t', '1', '4', '5', '.', '5', '4', '4', '\t', '1', '3', '5', '.', '2', '4', '\t', '5', '6', '.', '5', '1', '3', '8']
but I want it to split at '\t', taking the first element in the list, which is AHY39278, and append it to the list gene:
['AHY39278', 'AHY39278']
Any idea how to join the elements together?
Add this code to your project:
string = ''
for i in range(len(gene)):
    string += gene[i]
You can then get your data from string.
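If the goal is to collect the accession IDs themselves, a minimal sketch that splits each line on the tab character and keeps only the first field (it assumes the same file path, and that the first line should be skipped, as the original x = 1 suggests):
gene = []
path = '/content/drive/MyDrive/abc.txt'
with open(path, 'r') as handle:
    lines = handle.readlines()
for line in lines[1:]:
    gene.append(line.split('\t')[0])
print(gene)  # e.g. ['AHY39278', ...]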
I'm wondering if it is possible to convert the listing into specific groups, which I could then place into a table format later on.
This is the output that I need to group. I converted it into a list so that I could easily divide it in a table-like manner.
f=open("sample1.txt", "r")
f.read()
Here's the output:
'0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL +99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430 31558 63001 10214 20197 40117 52014 70544 82108 333 20211 55062 56999 59012 82820 86280 555 60973=\n'
Here's what I have done already. I have managed to change it into a list which resulted in this output:
with open('sample1.txt', 'r') as file:
    data = file.read().replace('\n', '')
print(list(data))
The Output:
['0', '2', '4', '5', '9', '8', '4', '3', '0', '0', '9', '9', '9', '9', '9', '2', '0', '1', '8', '0', '1', '0', '1', '0', '0', '0', '0', '4', '+', '1', '4', '6', '5', '0', '+', '1', '2', '1', '0', '5', '0', 'F', 'M', '-', '1', '2', '+', '0', '0', '4', '6', '9', '9', '9', '9', '9', 'V', '0', '2', '0', '3', '0', '0', '1', 'N', '0', '0', '1', '0', '1', '0', '9', '0', '0', '0', '1', 'C', 'N', '0', '0', '8', '0', '0', '0', '1', '9', '9', '+', '0', '2', '1', '4', '1', '+', '0', '1', '9', '7', '1', '1', '0', '1', '1', '7', '1', 'A', 'D', 'D', 'A', 'Y', '1', '4', '1', '0', '2', '1', 'A', 'Y', '2', '4', '1', '0', '2', '1', 'G', 'A', '1', '0', '2', '1', '+', '0', '0', '6', '0', '0', '1', '0', '8', '1', 'G', 'A', '2', '0', '6', '1', '+', '0', '9', '0', '0', '0', '1', '0', '2', '1', 'G', 'E', '1', '9', 'M', 'S', 'L', ' ', ' ', ' ', '+', '9', '9', '9', '9', '9', '+', '9', '9', '9', '9', '9', 'G', 'F', '1', '0', '6', '9', '9', '1', '0', '2', '1', '9', '9', '9', '0', '0', '6', '0', '0', '1', '9', '9', '9', '9', '9', '9', 'K', 'A', '1', '1', '2', '0', 'N', '+', '0', '2', '1', '1', '1', 'M', 'D', '1', '2', '1', '0', '1', '4', '1', '+', '9', '9', '9', '9', 'M', 'W', '1', '0', '5', '1', 'R', 'E', 'M', 'S', 'Y', 'N', '1', '0', '4', '9', '8', '4', '3', '0', ' ', '3', '1', '5', '5', '8', ' ', '6', '3', '0', '0', '1', ' ', '1', '0', '2', '1', '4', ' ', '2', '0', '1', '9', '7', ' ', '4', '0', '1', '1', '7', ' ', '5', '2', '0', '1', '4', ' ', '7', '0', '5', '4', '4', ' ', '8', '2', '1', '0', '8', ' ', '3', '3', '3', ' ', '2', '0', '2', '1', '1', ' ', '5', '5', '0', '6', '2', ' ', '5', '6', '9', '9', '9', ' ', '5', '9', '0', '1', '2', ' ', '8', '2', '8', '2', '0', ' ', '8', '6', '2', '8', '0', ' ', '5', '5', '5', ' ', '6', '0', '9', '7', '3', '=']
My goal is to group them into something like these:
0245,984300,99999,2018,01,01,0000,4,+1....
The number of digits belonging to each column is predetermined, for example there are always 4 digits for the first column and 6 for the second, and so on.
I was thinking of concatenating them. But I'm not sure if it would be possible.
You can use operator.itemgetter
from operator import itemgetter
g = itemgetter(slice(0, 4), slice(4, 10))
with open('sample1.txt') as file:
    for line in file:
        print(g(line))
Or, even better, you can make the slices dynamically using zip and itertools.accumulate:
from itertools import accumulate

indexes = [4, 6, ...]  # the column widths
bounds = [0] + list(accumulate(indexes))
g = itemgetter(*(slice(start, stop) for start, stop in zip(bounds, bounds[1:])))
Then proceed as before
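For example, with the two slices defined above and the start of the sample line from the question, g picks out the first two fields as a tuple:
line = '0245984300999992018010100004+14650+121050FM-12'
g = itemgetter(slice(0, 4), slice(4, 10))
print(g(line))  # ('0245', '984300')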
I would recommend naming everything if you actually want to use this data, and double-checking that all the lengths make sense. So to start you would do:
with open('sample1.txt', 'r') as file:
    data = file.read().rstrip('\n"')

first, second, *rest = data.split()

if len(first) != 163:
    raise ValueError(f"The first part should be 163 characters long, but it's {len(first)}")
if len(second) != 163:
    raise ValueError(f"The second part should be 163 characters long, but it's {len(second)}")
So now you have 3 variables
first is "0245984300999992018010100004+14650+121050FM-12+004699999V0203001N00101090001CN008000199+02141+01971101171ADDAY141021AY241021GA1021+006001081GA2061+090001021GE19MSL"
second is "+99999+99999GF106991021999006001999999KA1120N+02111MD1210141+9999MW1051REMSYN10498430"
rest is ['31558', '63001', '10214', '20197', '40117', '52014', '70544', '82108', '333', '20211', '55062', '56999', '59012', '82820', '86280', '555', '60973']
And then repeat that idea
date, whatever, whatever2, whatever3 = first.split('+')
and then for parsing the first part I would just have assignments like
something = date[0:4]
something_else = date[4:10]
third_thing = date[10:15]
year = date[15:19]
month = date[19:21]
day = date[21:23]
and so on. And then you can use all these variables in the code that analyzes them.
If this is some sort of standard, you should look for a library that parses strings like that or write one yourself.
Obviously name the variables better
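If you do end up writing one yourself, a minimal sketch of such a fixed-width parser might look like this (the field names and widths are partly made up; substitute the real specification):
def parse_fixed_width(line, spec):
    # spec is a list of (name, width) pairs; fields are cut out left to right
    fields = {}
    pos = 0
    for name, width in spec:
        fields[name] = line[pos:pos + width]
        pos += width
    return fields

# hypothetical layout for the first few columns (4, 6, 5, 4, 2, 2 characters)
spec = [('col_a', 4), ('col_b', 6), ('col_c', 5), ('year', 4), ('month', 2), ('day', 2)]
record = parse_fixed_width('0245984300999992018010100004', spec)
print(record)
# {'col_a': '0245', 'col_b': '984300', 'col_c': '99999', 'year': '2018', 'month': '01', 'day': '01'}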
I am reading a CSV file with pandas, where I have a column of (3, 3)-shaped lists.
An example list is as follows.
[[45.70345721, -0.00014686, -1.679e-05], [-0.00012219, 45.70271889, 0.00012527], [-1.161e-05, 0.00013083, 45.70306778]]
I tried to convert this list to a numpy float array with np.array(arr).astype(np.float), but it gives the following error:
ValueError: could not convert string to float:
When I searched for the root cause, I observed that this list is actually stored entirely as a string. print [i for i in arr] gives the following, where every element is a single character:
['[', '[', '4', '5', '.', '7', '0', '3', '4', '5', '7', '2', '1', ',', ' ', '-', '0', '.', '0', '0', '0', '1', '4', '6', '8', '6', ',', ' ', '-', '1', '.', '6', '7', '9', 'e', '-', '0', '5', ']', ',', ' ', '[', '-', '0', '.', '0', '0', '0', '1', '2', '2', '1', '9', ',', ' ', '4', '5', '.', '7', '0', '2', '7', '1', '8', '8', '9', ',', ' ', '0', '.', '0', '0', '0', '1', '2', '5', '2', '7', ']', ',', ' ', '[', '-', '1', '.', '1', '6', '1', 'e', '-', '0', '5', ',', ' ', '0', '.', '0', '0', '0', '1', '3', '0', '8', '3', ',', ' ', '4', '5', '.', '7', '0', '3', '0', '6', '7', '7', '8', ']', ']']
How do I convert this list to a numpy float array?
EDIT
Here is a snap of a part of my data frame.
When loaded, the data frame is in the below format. df here is a small example data frame.
df = pd.DataFrame(columns=["e_total"], data=[[['[', '[', '4', '5', '.', '7', '0', '3', '4', '5', '7', '2', '1', ',', ' ', '-', '0', '.', '0', '0', '0', '1', '4', '6', '8', '6', ',', ' ', '-', '1', '.', '6', '7', '9', 'e', '-', '0', '5', ']', ',', ' ', '[', '-', '0', '.', '0', '0', '0', '1', '2', '2', '1', '9', ',', ' ', '4', '5', '.', '7', '0', '2', '7', '1', '8', '8', '9', ',', ' ', '0', '.', '0', '0', '0', '1', '2', '5', '2', '7', ']', ',', ' ', '[', '-', '1', '.', '1', '6', '1', 'e', '-', '0', '5', ',', ' ', '0', '.', '0', '0', '0', '1', '3', '0', '8', '3', ',', ' ', '4', '5', '.', '7', '0', '3', '0', '6', '7', '7', '8', ']', ']']]])
Could someone give it a try and help me convert this to a float array?
You can probably use eval() to turn the entire string into an actual list. eval() is generally not good to use, but in this case it might be your best bet.
What you listed as your "example" is not correct. You are listing the result of your print statement and list comprehension. What is being stored as an entry for that column is a string.
You should be able to simply take each item and wrap it in eval:
eval(arr)
That should return a (3, 3)-shaped Python list. From there you can convert it to a numpy array as necessary and change the types.
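A safer alternative to eval (not what this answer proposes, just a common substitute) is ast.literal_eval, which only accepts Python literals. A sketch using the example string from the question:
import ast
import numpy as np

s = '[[45.70345721, -0.00014686, -1.679e-05], [-0.00012219, 45.70271889, 0.00012527], [-1.161e-05, 0.00013083, 45.70306778]]'
arr = np.array(ast.literal_eval(s), dtype=float)
print(arr.shape)  # (3, 3)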
Aren't the numbers in the lists already floats? If that is the case, just making the list an np.array will do what you are asking. You only need to do:
np.array(list)
If the numbers are actually strings, as you are showing in the second part, you will have to go through the list and convert each number individually, using either a nested loop or a nested list comprehension.
The loop looks like this:
for i in range(len(list)):
    for j in range(len(list[i])):
        list[i][j] = np.float(list[i][j])  # assign back into the nested list so the conversion sticks
The list comprehension looks like:
new_list = [[np.float(j) for j in i] for i in list]
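To tie this back to the original data frame: assuming each cell of the e_total column holds the raw string (as it would when read straight from the CSV, rather than the character list shown in the example df), a small sketch converts the whole column in one pass:
import ast
import numpy as np

df['e_total'] = df['e_total'].apply(lambda s: np.array(ast.literal_eval(s), dtype=float))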