I'm writing a script in Python 3, where I go through a file, and collect information about the duration of various tasks. I need to maintain a list of summations of these durations (in the form of datetime.timedelta objects), split by date and which task was done. Each task is identified by an ID string.
This means that while going through the file I build a list of records, where each record consist of a date, an ID string and a duration. When adding a new record I first check if the date and ID string combination is already present in the list. If it is I add the new duration to the current duration in the list. If the date and ID string combination doesn't exist, I append the record to the list.
I don't know in advance how many different combinations of date and ID string there is, so I can't pre-allocate them.
At the end I would like to be able to sort the list on date and ID string before printing it to standard out.
I tried doing it in a list of tuples, but tuples are immutable, so I can't add a new duration to an existing duration I found.
If pressed I could create a new ID string by concatenating a string representation of the date and the ID string. But I would really prefer to keep those two values separate.
Is this possible? And if so: How?
I wouldn't use a list in this case, but rather a dict. Here's a simple example:
import datetime

data = {}
with open("myfile.txt") as file:
    for line in file:
        # Parse the line for the following:
        #   tid: The task ID we read
        #   date: The date we read
        #   duration: The duration we read (a datetime.timedelta)
        # Once the data has been parsed out, store it:
        data.setdefault((date, tid), datetime.timedelta())
        data[(date, tid)] += duration

Note the default has to be an empty timedelta rather than 0, since adding an int to a timedelta raises a TypeError.
After parsing the file you can get the keys to the dict (data.keys()), sort them, and print out the results.
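For example, with a few made-up records standing in for the parsing step (the file format isn't specified here):

```python
import datetime

# Hypothetical parsed records: (date, task ID, duration) triples.
records = [
    ("2016-06-13", "taskA", datetime.timedelta(minutes=30)),
    ("2016-06-13", "taskA", datetime.timedelta(minutes=15)),
    ("2016-06-12", "taskB", datetime.timedelta(hours=1)),
]

data = {}
for date, tid, duration in records:
    data.setdefault((date, tid), datetime.timedelta())
    data[(date, tid)] += duration

# Sorting the (date, ID) tuples sorts on date first, then ID,
# because tuples compare element-wise:
for date, tid in sorted(data):
    print(date, tid, data[(date, tid)])
```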
I'm wondering if it's possible to query 2 indices in Elasticsearch and display the results mixed together in 1 table. For example:
Indices:
food-american-burger
food-italian-pizza
food-japanese-ramen
food-mexican-burritos
#query here for burger and pizza, and display the results in a csv file
#i.e. if there was a timestamp field, display results starting from the most recent
I know you can do a query for food-*, but it would give 2 indices that I wouldn't want.
I looked up the multisearch module for Elasticsearch DSL, but the documentation shows only an example of a single-index query:
ms = MultiSearch(index='blogs')
ms = ms.add(Search().filter('term', tags='python'))
ms = ms.add(Search().filter('term', tags='elasticsearch'))
Part 1:
Is it possible to use this for multiple indices? Ultimately, I would like to query for x number of indices and display all the data in a single human-readable format (csv, json, etc.), but I'm not sure how to perform a single query for only the indices I want.
I currently have the functionality to perform queries and write out the data, but each data file would only consist of that index I queried for. I would like to display all the data into one file.
Part 2:
The data is stored in a dictionary, and then I am writing it to a csv. It is currently being ordered by timestamp. The code:
sorted_rows = sorted(rows, key=lambda x: x['#timestamp'], reverse=True)
for row in sorted_rows:
    writer.writerow(row.values())
When writing to the csv, the timestamp field is not the first column. I'm storing the fields in a dictionary, and updating that dictionary for every Elasticsearch hit, then writing it to the csv. Is there a way to move the timestamp field to the first column?
Thanks!
According to the Elasticsearch docs, you can query a single index (e.g. food-american-burger), multiple comma-separated indices (e.g. food-american-burger,food-italian-pizza), or all indices using the _all keyword.
I haven't personally used the Python client, but this is an API convention and should apply to any of the official Elasticsearch clients.
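So for part 1, joining the names you want with commas should do it. A quick sketch (the index names come from your example; the commented Search usage is the elasticsearch-dsl style and is not executed here, since it needs a live cluster):

```python
# The indices we actually want, leaving out the rest of food-*:
indices = ["food-american-burger", "food-italian-pizza"]

# The comma-separated convention from the docs:
index_spec = ",".join(indices)

# Hypothetical elasticsearch-dsl usage (requires a running cluster):
# from elasticsearch_dsl import Search
# s = Search(index=index_spec).sort("-@timestamp")
# for hit in s.scan():
#     ...  # write each hit out to your csv
```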
For part 2, you should probably submit it as a separate question to keep each post to a single topic, since the two issues are not directly related.
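That said, one quick pointer on the column order: csv.DictWriter writes columns in whatever order the fieldnames list gives, so you can simply put the timestamp first. A small sketch with invented field names:

```python
import csv
import io

# Invented rows standing in for your Elasticsearch hits:
rows = [
    {"host": "a", "#timestamp": "2016-06-13T10:00:00", "status": "ok"},
    {"host": "b", "#timestamp": "2016-06-13T09:00:00", "status": "ok"},
]

buf = io.StringIO()
# Put "#timestamp" first; the remaining fields follow in a fixed order.
fieldnames = ["#timestamp"] + sorted(k for k in rows[0] if k != "#timestamp")
writer = csv.DictWriter(buf, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
```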
So, I have a large data frame with customer names. I used the phone number and email combined to create a unique ID key for each customer. But, sometimes there will be a typo in the email so it will create two keys for the same customer.
Like so:
Key                         | Order #
555261andymiller@gmail.com  | 901345
555261andymller@gmail.com   | 901345
I'm thinking of combining all the keys based on the phone number (partial string) and then assigning all the keys within each group to the first key in every group. How would I go about doing this in Pandas? I've tried iterating over the rows and I've also tried the groupby method by partial string, but I can't seem to assign new values using this method.
If you really don't care what the new ID is, you can group by the first characters of the string (which represent the phone number).
For example:
df.groupby(df.Key.str[:6]).first()
This will result in a dataframe indexed by the phone-number prefix, where each row holds the first entry of the customer records in that group. This assumes that the phone number will always be correct, though it sounds like that should not be an issue.
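A small worked example (the column names and values here are invented to match the question; requires pandas):

```python
import pandas as pd

df = pd.DataFrame({
    "Key": ["555261andymiller@gmail.com", "555261andymller@gmail.com",
            "444123bob@example.com"],
    "Order": [901345, 901345, 900001],
})

# Group on the 6-digit phone prefix and keep the first row of each group,
# collapsing the typo'd duplicate key into one record.
dedup = df.groupby(df.Key.str[:6]).first()
```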
I have a dictionary with some share related data:
share_data = {'2016-06-13': {'open': 2190, 'close': 2200},
              '2015-09-10': {'open': 2870, 'close': 2450}}  # and so on, circa 1,500 entries
Is there a way of iterating over the dictionary in order, so the oldest date is retrieved first, then the next one after it, and so on?
Thanks!
Sure: your date strings are in ISO YYYY-MM-DD format, so their lexicographical order is also chronological order. That makes it very easy:
for key in sorted(share_data.keys()):
    # do something with share_data[key]
This post has some nice examples of custom sorting on dictionaries.
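For instance, with a trimmed-down version of your dict:

```python
share_data = {
    '2016-06-13': {'open': 2190, 'close': 2200},
    '2015-09-10': {'open': 2870, 'close': 2450},
}

# Oldest date first, since ISO date strings sort chronologically.
oldest_first = [(date, share_data[date]) for date in sorted(share_data)]
```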
EDIT: Sorry for the confusion; I'll explain what the program is for. It's to keep track of a user's new weight records. The file only updates when they exceed their previous record, along with a timestamp. I want the user to be able to see a timeline of their progress for each lift using the timestamp. That's why I was using lift['key'] = {date: weight}, so they can reference each lift type and query the date, for example lift['snatch']['5/25'], which tells them what they maxed that day. But I can't seem to write this to a csv file properly. Thank you for your time! Happy Friday!
I've been researching for days and can't seem to figure out how to add data to a specific fieldname, which is the highest-level key in my dict.
The data I want to add is a dict in its own right.
How I envision it looking in the CSV file:
snatch <> squat <> jerk
10/25:150lbs <> 10/25:200lbs <> 10/25:0lbs
So this is how it would look when they first create the file. How am I able to update just one field?
Say the user only squatted that day and wants to append data to that field.
What I have so far:
import time
import csv

csv_file = 'lifts.csv'
csv_columns = ['snatch', 'squat', 'jerk']
creation = time.strftime('%M:%S', time.localtime())

lifts = {}
lifts['snatch'] = {creation: '150lbs'}
lifts['squat'] = {creation: '200lbs'}
lifts['jerk'] = {creation: '0lbs'}

try:
    with open(csv_file, 'w') as csvfile:
        writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
        writer.writeheader()
        for data in lifts:
            writer.writerow(lifts)
except IOError:
    print("Error")
-> One of my issues is that when it writes to the csv file, it writes the row three times. Not quite sure why. It's the format I want, but it's there three times.
-> I also want to implement this next code and write to the specific column, but when I do so it writes null or blanks in the other columns.
lifts['jerk'].update({time.strftime('%M:%S', time.localtime()): '160lbs'})
Then outputting:
snatch <> squat <> jerk
10/25:150lbs <> 10/25:200lbs <> 10/25:0lbs 10/26:160lbs
Sorry, I'm new to Python and not quite sure how to use this editor. I want that result to land under {10/25:0lbs}, just like it would show in Excel.
It's important to keep track of what's going on here: lifts is a dictionary with strings for keys ("snatch", "squat", "jerk") and whose values are also dictionaries. This second level of dictionaries has timestamp strings for keys and strings as values.
I suspect that when you want to update the lifts['jerk'] dictionary, you don't use the same key (timestamp) as the existing entry.
It doesn't seem like you need a dictionary for the second level; consider using a list instead. If you must keep it, you can access the first entry like so: lifts['jerk'][next(iter(lifts['jerk']))] (in Python 3, keys() is a view and can't be indexed directly), which is rather hamfisted - again, consider either using a different data type for the values of your lifts dictionary or using keys that are easier to reference than timestamps.
EDIT: You could do something like lifts['jerk'] = {'timestamp':creation,'weight':'165lbs'} which requires some restructuring of your data.
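Extending that idea, here is a minimal sketch of the restructuring, with each lift holding a list of entries so appending a new record is a plain list append (the field names and padding scheme are my own suggestion, not from your code):

```python
import csv
import io

# Each lift maps to a *list* of {'timestamp': ..., 'weight': ...} entries.
lifts = {
    'snatch': [{'timestamp': '10/25', 'weight': '150lbs'}],
    'squat': [{'timestamp': '10/25', 'weight': '200lbs'}],
    'jerk': [{'timestamp': '10/25', 'weight': '0lbs'}],
}
# The user only jerked on 10/26, so only that history grows:
lifts['jerk'].append({'timestamp': '10/26', 'weight': '160lbs'})

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['snatch', 'squat', 'jerk'])
writer.writeheader()
# One csv row per history slot; lifts with shorter histories get blanks.
depth = max(len(entries) for entries in lifts.values())
for i in range(depth):
    row = {}
    for lift, entries in lifts.items():
        if i < len(entries):
            row[lift] = '{}:{}'.format(entries[i]['timestamp'],
                                       entries[i]['weight'])
    writer.writerow(row)
```

This way each row is written exactly once, and the new jerk entry lands under the 10/25 one, as in the Excel-style layout you described.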
I have code that looks like the following, which executes every minute:
huge_list = query_results() # Returns a long list of dictionaries.
db.objects.insert(huge_list)
I need the current datetime appended to each object in the list before insertion. Is there some way I can modify the insert command so it also appends a 'datetime' field? If not, what would be the most efficient way of doing this? There are several thousand records in the list, so I feel like visiting each element and adding a field may not be the most efficient method.
Later I will need to be able to query for records individually from the whole group, and also for records within a specific datetime range.
Thanks in advance!
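No answer was posted here, but as a sketch of the usual approach: as far as I know there is no insert option that adds a field for you, so stamping each document in a single pass before the insert is the standard way, and looping over a few thousand dicts is cheap compared to the network round trip of the insert itself. The db.objects handle below mirrors the snippet above and is not executed here:

```python
import datetime

def stamp(records):
    """Attach one shared insertion datetime to every record in place."""
    now = datetime.datetime.now(datetime.timezone.utc)
    for record in records:
        record['datetime'] = now
    return records

# Hypothetical usage, mirroring the code above (needs a live MongoDB):
# huge_list = query_results()
# db.objects.insert(stamp(huge_list))
```

Sharing one timestamp across the batch also makes "records from the same run" trivial to query later; for the datetime-range queries, an index on that field (create_index('datetime') in PyMongo) would help.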