Hiya, so I have a data frame which has the time something occurs in one column and the time that it ends in the next column. I need to find the time difference between the two, but they're both strings (in the format HH:MM:SS), so it won't let me simply subtract them. Is there a way I can change them to ints? I found a way to split them using .split (I've put what I did for the first time column below; I could then do the same for the second column and work the difference out from there), but I was wondering if there was an easier way?
...
TIA!
q = 0
for int in range(long):
    intel = df_data_bad_time1.loc[q,'Time']
    H_M_S = intel.split(':')
    df_data_bad_time1.loc[q,'Hours'] = H_M_S[0]
    df_data_bad_time1.loc[q,'Mins'] = H_M_S[1]
    df_data_bad_time1.loc[q,'Secs'] = H_M_S[2]
    q = q + 1
df_data_bad_time1['Hours'] = pd.to_numeric(df_data_bad_time1['Hours'], errors='coerce').astype('Int64')
df_data_bad_time1['Mins'] = pd.to_numeric(df_data_bad_time1['Mins'], errors='coerce').astype('Int64')
df_data_bad_time1['Secs'] = pd.to_numeric(df_data_bad_time1['Secs'], errors='coerce').astype('Int64')
df_data_bad_time1.head(15)
Here's a simple function I wrote, you can take a look at it and tell me if you don't understand anything:
https://repl.it/#d4nieldev/subtract-time
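For anyone who can't open the link, here is a minimal sketch of one way to do it with only the standard library. It assumes both strings are always in HH:MM:SS form and that the end time falls on the same day as the start. The names hms_to_seconds and duration_seconds are made up for this example and are not taken from the linked code.

```python
def hms_to_seconds(hms):
    """Convert an 'HH:MM:SS' string to a number of seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def duration_seconds(start, end):
    """Seconds elapsed between two same-day 'HH:MM:SS' strings."""
    return hms_to_seconds(end) - hms_to_seconds(start)


print(duration_seconds("09:15:30", "10:45:00"))  # 5370
```

In pandas itself, pd.to_timedelta applied to each column should parse HH:MM:SS strings too, so subtracting the two converted columns gives the difference without an explicit loop.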
I'm trying to find a substring within a dataframe object. I'm turning the dataframe object into a string before I do this.
Even though I know for sure that the substring exists in the dataframe object, the "in" operator keeps returning False.
I've spent hours trying to figure out how else I can do this. I've also tried using
df1.str.contains
but it raises an error.
Can someone please let me know what I'm doing wrong? I'm willing to try different approaches if necessary.
Here is the code I'm using:
changesfull = r'C:\Users\user\Desktop\changes\changes.xlsx'
locations = pd.read_excel(changesfull)
for change in changes:
    y = y+1
    z = z+1
    g = g+1
    df1 = locations.iloc[y:z,2:3]
    newframe = "product" in str(df1)
    print(newframe)
When I print str(df1), I get this output for the dataframe object:
"1 Effective July 15, 2018 - the following produc..."
Could it be that it's unable to find the word "product" because it doesn't actually print the full text? E.g. it should be " - the following product is available".
In case this actually helps anyone in the future:
The issue was with the iloc I was using.
I changed that portion to the following and it worked:
df1 = locations.iloc[y,2]
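A likely explanation, sketched here under assumptions: str() of a DataFrame gives its display form, and pandas shortens long cell values there (the display.max_colwidth option, 50 characters by default), so the word can be cut off. A scalar pulled out with iloc[row, col] is the full string. The sample text and the column name "Notes" are made up for illustration.

```python
import pandas as pd

text = "Effective July 15, 2018 - please note that the following product is available"
locations = pd.DataFrame({"Notes": [text]})

as_frame = locations.iloc[0:1, 0:1]   # a 1x1 DataFrame
as_value = locations.iloc[0, 0]       # the cell itself, a plain str

print("product" in str(as_frame))  # False with default display settings
print("product" in as_value)       # True

# .str lives on a Series (one column), not on a DataFrame
matches = locations["Notes"].str.contains("product")
print(matches.iloc[0])             # True
```

This would also explain the df1.str.contains error: the .str accessor is only available on a single column (a Series), not on a DataFrame slice.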
I've googled around a bit and it seems like nobody has had this problem before so here we go:
Function:
def split_time(time_string):
    time = time_string.split('T')
    time_array = time[-1]
    return time_array
Call of Function:
class Entry():
    def __init__(self,start,end,summary,description):
        self.start_date = split_time(start)
        self.end_date = split_time(end)
        self.summary = summary
        self.description = description
My function receives a string containing a date-time like this: 2018-03-17T09:00:00+01:00
I want to cut it at 'T', so I used time = time_string.split('T'), which worked just fine!
The output of time is ['2018-05-08', '12:00:00+02:00'].
So now I wanted to split it some more and ran into the following error:
While I can access time[0], which gives 2018-05-08, I can't access time[1]; I just get an "index out of range" error.
To me it seems that time does contain a list with two strings inside, judging by its output, so I'm really at a loss right now.
Any help would be appreciated =)
(and an explanation too!)
Use item[-1] to access the last item in the list.
Still unsure why item[1] would throw an error for a list with two items in it.
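One guess at the cause, since the failing input isn't shown: some of the strings passed in may not contain a 'T' at all (a date-only value such as 2018-05-08, for example). In that case split('T') returns a one-item list, so time[1] raises IndexError while time[-1] still works. A minimal sketch that makes that case explicit using str.partition; returning None when there is no time part is my own choice here, not something from the question:

```python
def split_time(time_string):
    """Return the part after 'T', or None if there is no time part."""
    _date_part, separator, time_part = time_string.partition("T")
    return time_part if separator else None


print(split_time("2018-03-17T09:00:00+01:00"))  # 09:00:00+01:00
print(split_time("2018-05-08"))                 # None
```

If the strings are always ISO 8601, datetime.datetime.fromisoformat (Python 3.7+) can parse the whole value, UTC offset included.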
So I want to pick some data out of a text file, which looks like this:
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
I'm writing a function which uses the difficulty ('EASY', 'HARD', etc.) to return the list that follows it. My current code looks like this:
def setAI(difficulty):  # difficulty = 'EASY' or 'HARD' or ... etc.
    configFile = open('AISettings.txt')
    config = configFile.read()
    # Print the chunk between the difficulty and the next closing square bracket after it
    print(config[config.find(difficulty):config.find(']', config.find(difficulty))])
This produces the following output:
>>> HARD:[20,1000
I tried fixing it like this:
print(config[(config.find(difficulty)+2):(config.find(']',(config.find(difficulty)+2))+1)])
which returns:
>>>RD:[20,1000]
The issue I'm trying to address is that I want it to start after the colon. I am aware that I could use the length of the difficulty string to solve this, but is there a simpler way of getting the position of the end of the match when using .find()?
P.S.: I couldn't find any duplicates for this, but it is a slightly odd question, so sorry if it's already on here somewhere. Thanks in advance.
EDIT: Thanks for the replies. I think you basically all solved the problem, but I chose the accepted answer because I like the line-by-line iteration idea. Cheers guys :)
Well, if the file looks like this, why not just iterate line by line and do something like:
def setAI(difficulty):  # difficulty = 'EASY' or 'HARD' or ... etc.
    configFile = open('AISettings.txt')
    config = configFile.readlines()
    for line in config:
        if line.startswith(difficulty.upper()):
            print(line[len(difficulty) + 1:])
find returns the index where the match starts. Slices, however, exclude their end index, so just add one to the end.
config = """
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
"""
difficulty = 'HARD'
begin = config.find(difficulty)
end = config.find(']', begin)
print(config[begin:end+1])
The function find will always give you the position of the first letter of the string. Also consider that the notation string[start:end] will give you the substring including the character at start but excluding the character at end. Therefore you could use something like the following:
def setAI(difficulty):
    configFile = open('AISettings.txt')
    config = configFile.read()
    start = config.find(difficulty) + len(difficulty) + 1
    end = config.find(']', start) + 1
    print(config[start:end])
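Putting the line-by-line idea together with parsing the bracketed values, here is a sketch that returns the list itself rather than printing text. It assumes each setting sits on its own line as NAME:[a,b]. read_setting is a hypothetical helper name, and the config text is inlined so the example runs without AISettings.txt.

```python
import ast

CONFIG = """\
##After some other stuff which could change
EASY:[5,500]
MEDIUM:[10,100]
HARD:[20,1000]
EXPERT:[30,2000]
EXTREME:[50,5000]
"""


def read_setting(config_text, difficulty):
    """Return the list stored after 'DIFFICULTY:', or None if absent."""
    prefix = difficulty.upper() + ":"
    for line in config_text.splitlines():
        if line.startswith(prefix):
            # Everything after the colon is a Python-style list literal.
            return ast.literal_eval(line[len(prefix):].strip())
    return None


print(read_setting(CONFIG, "hard"))  # [20, 1000]
```

Matching on the name plus the colon avoids picking up a longer name that merely starts with the same letters.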
I'm just starting out with Python and wondering how I would go about sorting this
list from the earliest time to the latest.
('5:00PM','2:00PM','7:00AM','8:45PM','12:00PM')
Any help is appreciated.
In Python 3, with the standard library only:
import time
hours = ('5:00PM','2:00PM','7:00AM','8:45PM','12:00PM')
format = '%I:%M%p'
time_hours = [time.strptime(t, format) for t in hours]
result = [time.strftime(format, h) for h in sorted(time_hours)]
assert result == ['07:00AM', '12:00PM', '02:00PM', '05:00PM', '08:45PM']
I recommend that you install the PyPI DateTime package and use those facilities for whatever manipulation you desire. The problem at hand would look something like:
stamps = ('5:00PM','2:00PM','7:00AM','8:45PM','12:00PM')
DT_stamps = [DateTime(s) for s in stamps]
DT_stamps.sort()
Implementation details are left as an exercise for the student. :-)
If the times are always going to be in that format, you could split the times into subsections.
x = "12:30PM"
# Use Python's string slicing to split off the last two characters
time, day_half = x[:-2], x[-2:]
# Use str.split() to separate the hours from the minutes
# Because "11" < "2" for strings, we need to convert them to integers
hour, minute = [int(t) for t in time.split(":")]
# Take the remainder because 12 should actually be 0
hour = hour % 12
# Output it as a tuple, which sorts based on each element from left to right
sortable = (day_half, hour, minute)
#: ("PM", 0, 30)
To wrap it all up, use something like:
def sortable_time(time_str):
    time, day_half = time_str[:-2], time_str[-2:]
    hour, minute = [int(t) for t in time.split(":")]
    hour = hour % 12
    return day_half, hour, minute
# When sorting, use `key` to define the method we're sorting with
# (The returned list however, will be filled with the original strings)
result = sorted(your_time_list, key=sortable_time)
#: ['7:00AM', '12:00PM', '2:00PM', '5:00PM', '8:45PM']
If you're not guaranteed to have the two letters at the end, or the colon in the middle, you're best off using a library like what is suggested by Prune.
What you're showing isn't a list of times, it's a tuple of strings. Tuples are immutable, so they have no in-place .sort() method (sorted() does accept a tuple, and returns a new list). If you want to sort in place, first convert your tuple to a list:
times = ['5:00PM','2:00PM','7:00AM','8:45PM','12:00PM']
You could try sorting this list now, but the strings won't sort the way you expect. Instead, you need to create a custom sort function that will temporarily convert the values in the list to struct_time objects and sort using those.
import time
time_format = '%I:%M%p' # match hours, minutes and AM/PM
def compare_as_time(time_str1, time_str2):
    # parse time strings to time objects
    time1 = time.strptime(time_str1, time_format)
    time2 = time.strptime(time_str2, time_format)
    # return comparison, sort expects -1, 1 or 0 to determine order
    if time1 < time2:
        return -1
    elif time1 > time2:
        return 1
    else:
        return 0
Now you can call sorted() and pass in your list and your custom comparison function and you'll get a list of strings back, sorted by the time in those strings:
sorted_times = sorted(times, compare_as_time)
Note for Python 3: The previous example assumes Python 2. If you're using Python 3, you'll need to convert the comparison function to a key function. This can be done using functools.cmp_to_key() as follows:
from functools import cmp_to_key
sorted_times = sorted(times, key=cmp_to_key(compare_as_time))
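In Python 3 the comparison function can also be skipped altogether by parsing inside a key function. A short sketch using datetime.strptime with the same format string; it assumes every entry matches %I:%M%p, and as_time is just an illustrative name:

```python
from datetime import datetime

times = ['5:00PM', '2:00PM', '7:00AM', '8:45PM', '12:00PM']


def as_time(time_str):
    """Parse a 12-hour clock string such as '8:45PM' for use as a sort key."""
    return datetime.strptime(time_str, '%I:%M%p')


sorted_times = sorted(times, key=as_time)
print(sorted_times)  # ['7:00AM', '12:00PM', '2:00PM', '5:00PM', '8:45PM']
```

The key is only used for ordering, so the result still holds the original strings.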
I have a database where each case holds info about handwritten digits, eg:
Digit1Seq : when in the sequence of 12 digits the "1" was drawn
Digit1Ht: the height of the digit "1"
Digit1Width: its width
Digit2Seq: same info for digit "2"
on up to digit "12"
I find I now need the information organized a little differently as well. In particular, I want new variables with the height and width of the first digit written, then the height and width of the second, etc., as SPSS variables:
FirstDigitHt
FirstDigitWidth ...
TwelvthDigitWidth
Here's a Python program I wrote to do within SPSS what ought to be a very simple computation, but it runs into a sort of namespace problem:
BEGIN PROGRAM PYTHON.
import spss
indices = ["1", "2", "3","4","5", "6", "7", "8", "9", "10", "11", "12"]
seq=0
for i in indices:
    spss.Submit("COMPUTE seq = COMDigit" + i + "Seq.")
    spss.Submit("EXECUTE.")
    spss.Submit("COMPUTE COM" + indices[seq] + "thWidth = COMDigit" + i + "Width.")
    spss.Submit("COMPUTE COM" + indices[seq] + "thHgt = COMDigit" + i + "Hgt.")
    spss.Submit("EXECUTE.")
END PROGRAM.
It's clear what's wrong here: the value of seq in the first COMPUTE command doesn't get back to Python, so the right thing can't happen in the next two COMPUTE commands. Python's value of seq doesn't change, so I end up with SPSS code that gives me only two variables (COM1thWidth and COM1thHgt), into which COMDigit1Width, COMDigit2Width, etc. get written.
Is there any way to get Python to access SPSS's value of seq each time so that the string concatenation will create the correct COMPUTE? Or am I just thinking about this incorrectly?
Have googled extensively, but find no way to do this.
As I'm new to using Python in SPSS (and not all that much of a wiz with SPSS), there may well be a far easier way to do this.
All suggestions most welcome.
Probably the easiest way to get your SPSS variable data into Python variables for manipulation is with the spss.Dataset class.
To do this, you will need:
1.) the dataset name of your SPSS Dataset
2.) either the name of the variable you want to pull data from or its index in your dataset.
If the name of the variable you want to extract data from is named 'seq' (as I believe it was in your question), then you can use something like:
BEGIN PROGRAM PYTHON.
from __future__ import with_statement
import spss
with spss.DataStep():
    # The lines below create references to your dataset,
    # to its variable list, and to its case data
    lv_dataset = spss.Dataset(name = <name of your SPSS dataset>)
    lv_caseData = lv_dataset.cases
    lv_variables = lv_dataset.varlist
    # To read from an SPSS cases object, specify in square brackets which rows and which variables to extract, such as:
    # lv_theData = lv_caseData[rowStartIndex:rowEndIndex, columnStartIndex:columnEndIndex]
    # Each requested row is returned as a list of values, one value per requested variable.
    # So data for one variable across many rows comes back as one single-item list per row,
    # which is why the line below takes the first element of each list.
    # The line below extracts all the data from the SPSS variable named 'seq' into a Python list
    lv_variableData = [itm[0] for itm in lv_caseData[0:len(lv_caseData), lv_variables['seq'].index]]
END PROGRAM.
There are lots of ways to process the case data held by Statistics via Python, but the case data has to be read explicitly using the spss.Cursor, spssdata.Spssdata, or spss.Dataset class. It does not live in the Python namespace.
In this case the simplest thing to do would be to just substitute the formula for seq into the later references. There are many other ways to tackle this.
Also, get rid of those EXECUTE calls. They just force unnecessary data passes. Statistics will automatically pass the data when it needs to based on the command stream.
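To illustrate the substitution idea: rather than reading seq back into Python, the test on each digit's Seq variable can be written straight into the generated syntax. The sketch below only builds the command strings, so it runs without SPSS. The variable naming follows the question's code (COMDigit<d>Seq, COM<p>thWidth, and so on), and build_commands is a hypothetical helper, so treat it as a starting point rather than tested SPSS syntax for your data. Inside a BEGIN PROGRAM block, each string would then be passed to spss.Submit.

```python
def build_commands(n_digits=12):
    """Build one SPSS IF command per (position, digit, measure) combination."""
    commands = []
    for position in range(1, n_digits + 1):
        for digit in range(1, n_digits + 1):
            for measure in ("Width", "Hgt"):
                # Copy digit d's measurement into slot p when d was drawn p-th
                commands.append(
                    "IF (COMDigit{d}Seq = {p}) COM{p}th{m} = COMDigit{d}{m}.".format(
                        d=digit, p=position, m=measure
                    )
                )
    return commands


commands = build_commands()
print(len(commands))  # 288
print(commands[0])    # IF (COMDigit1Seq = 1) COM1thWidth = COMDigit1Width.
```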
Hi, I just stumbled across this, and you've probably moved on, but it might help other folks. I don't think you actually need to have Python access the SPSS values. I think something like this might work:
BEGIN PROGRAM PYTHON.
import spss
for i in range(1,13):
    k = "COMPUTE seq = COMDigit" + str(i) + "Seq."
    l = "Do if seq = " + str(i)+ "."
    m = "COMPUTE COM" + str(i) + "thWidth = COMDigit" + str(i) + "Width."
    n = "COMPUTE COM" + str(i) + "thHgt = COMDigit" + str(i) + "Hgt."
    o = "End if."
    print k
    print l
    print m
    print n
    print o
    spss.Submit(k)
    spss.Submit(l)
    spss.Submit(m)
    spss.Submit(n)
    spss.Submit(o)
    spss.Submit("EXECUTE.")
END PROGRAM.
But I'd have to see the data to make sure I'm understanding your problem correctly. Also, the print stuff makes the code look ugly, but it's the only way I can keep a handle on what's going on under the hood. Cheerio!