How to parse text to dataframe? - python

Let's imagine that I have a text like this:
if last_n_input_time <= 7463.0:
else: if last_n_input_time > 7463.0
else: if passwd_input_time > 27560.0
else: if secret_answ_input_time > 7673.5
if first_n_input_time <= 4054.5:
if passwd_input_time <= 5041.0:
else: if passwd_input_time > 5041.0
else: if first_n_input_time > 4054.5
return [[ 1.01167029]]
....
And I have a dataframe with columns such as passwd_input_time, first_n_input_time and others, named the same way as the variables in the text.
The question is: how can I search for, say, first_n_input_time and, if I find it in the text, move on to the next token, see whether it is > or <=, then cut out the value that follows the operator and add it to a cell of my dataframe?
df['first_n_input_time'] = 4079 should be the result of my function.
I understand how to find the word, but I don't know how to cut the lines so that I get, for example, "secret_answ_input_time <= 7673.5" and can then operate on it.
Example:
I want to cut out "if passwd_input_time <= 47635.5" first. Then I check whether "password_input_time" belongs to the list of column names of my dataframe (yes, it does). Then I need to move on and check which operator follows, "<=" or ">". If it's "<=", I take the value 47635.5 and write it to the cell of df['password_input_time_1']. If it's ">", I write the value to another column, df['password_input_time_2'].
Here are some pieces of code I wrote trying to implement this, but I'm a bit stuck because I don't know how to move to the next word in the text:
def to_dataframe(i, str_):
    for word in str_.split():
        if any(word in s for s in cols_list):
            #move to the next word somehow
            #I will call it next_word later on for simplicity
            col_name = word #save the value to refer to it later
            if next_word == "<=":
                col_name += '_1'
                #move to the value somehow
                #I will call it 'value' later on
                df.loc[i, col_name] = value
            if next_word == ">":
                col_name += '_2'
                #move to the value somehow
                #I will call it 'value' later on
                df.loc[i, col_name] = value
Where cols_list is the list of column names of my dataframe.
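One possible approach, sketched here under assumptions, is to skip the word-by-word walk and let a regular expression capture the name, the operator and the number in one go. The helper name extract_thresholds and the pattern CONDITION are hypothetical, and the sketch assumes every condition has the shape "name <= number" or "name > number". It keeps only the first threshold seen for each key and uses only the standard library.

```python
import re

# Hypothetical pattern (not from the original post): captures
# "<name> <operator> <number>" where the operator is "<=" or ">".
CONDITION = re.compile(r"(\w+)\s*(<=|>)\s*(-?\d+(?:\.\d+)?)")

def extract_thresholds(text, cols_list):
    """Map '<column>_1' (for <=) or '<column>_2' (for >) to its threshold."""
    result = {}
    for name, op, value in CONDITION.findall(text):
        if name not in cols_list:
            continue  # skip names that are not dataframe columns
        suffix = "_1" if op == "<=" else "_2"
        # setdefault keeps the first threshold seen for each key
        result.setdefault(name + suffix, float(value))
    return result

rules = """if last_n_input_time <= 7463.0:
else: if last_n_input_time > 7463.0
else: if passwd_input_time > 27560.0
if first_n_input_time <= 4054.5:"""

cols_list = ["passwd_input_time", "first_n_input_time"]
thresholds = extract_thresholds(rules, cols_list)
print(thresholds)
```

With a pandas dataframe, each key/value pair of the resulting dictionary could then be written to row i with df.loc[i, key] = value.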

Related

Dataframe Is No Longer Accessible

I am trying to make my code look better by creating functions that do all the work from a single call, but it is not working as intended. I am currently pulling data from a table in a PDF into a pandas dataframe. From there I have 4 functions, each calling the next and finally returning the updated dataframe. I can see that it is fully updated when I print it in the last function. However, I am unable to access and use that updated dataframe, even after I return it.
My code is as follows:
def data_cleaner(dataFrame):
    #removing random rows
    removed = dataFrame.drop(columns=['Unnamed: 1','Unnamed: 2','Unnamed: 4','Unnamed: 5','Unnamed: 7','Unnamed: 9','Unnamed: 11','Unnamed: 13','Unnamed: 15','Unnamed: 17','Unnamed: 19'])
    #call next method
    col_combiner(removed)

def col_combiner(dataFrame):
    #Grabbing first and second row of table to combine
    first_row = dataFrame.iloc[0]
    second_row = dataFrame.iloc[1]
    #List to combine columns
    newColNames = []
    #Run through each row and combine them into one name
    for i,j in zip(first_row,second_row):
        #Check to see if they are not strings, if they are not convert it
        if not isinstance(i,str):
            i = str(i)
        if not isinstance(j,str):
            j = str(j)
        newString = ''
        #Check for double NAN case and change it to Expenses
        if i == 'nan' and j == 'nan':
            i = 'Expenses'
            newString = newString + i
        #Check for leading NAN and remove it
        elif i == 'nan':
            newString = newString + j
        else:
            newString = newString + i + ' ' + j
        newColNames.append(newString)
    #Now update the dataframes column names
    dataFrame.columns = newColNames
    #Remove the name rows since they are now the column names
    dataFrame = dataFrame.iloc[2:,:]
    #Going to clean the values in the DF
    clean_numbers(dataFrame)

def clean_numbers(dataFrame):
    #Fill NAN values with 0
    noNan = dataFrame.fillna(0)
    #Pull each column, clean the values, then put it back
    for i in range(noNan.shape[1]):
        colList = noNan.iloc[:,i].tolist()
        #calling to clean the column so that it is all ints
        col_checker(colList)
        noNan.iloc[:,i] = colList
    return noNan

def col_checker(col):
    #Going through, checking and cleaning
    for i in range(len(col)):
        #print(type(colList[i]))
        if isinstance(col[i],str):
            col[i] = col[i].replace(',','')
            if col[i].isdigit():
                #print('not here')
                col[i] = int(col[i])
            #If it is not a number then make it 0
            else:
                col[i] = 0
Then when I run this:
doesThisWork = data_cleaner(cleaner)
type(doesThisWork)
I get NoneType. I might be doing this the long way as I am new to this, so any advice is much appreciated!

The reason you are getting NoneType is that your function does not have a return statement, so when it finishes executing it automatically returns None. It is the return value of a function that is assigned to the variable var in a statement like this:
var = fun(x)
A different matter entirely is whether your dataframe cleaner will be changed by the function data_cleaner, which can happen because dataframes are mutable objects in Python.
In other words, your function can read your dataframe and change it, so that after the function call cleaner is different from before. At the same time, your function can return a value (which it currently doesn't), and this value will be assigned to doesThisWork.
Usually, you should prefer that a function does only one thing, so a function that both changes its argument and returns a value is usually bad practice.
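To make the point about return values concrete, here is a minimal sketch of a chain of helpers in which every function returns what the next one returns. It is a simplified stand-in that uses plain lists instead of a dataframe. The function bodies are illustrative assumptions, not the asker's real cleaning logic, and data_cleaner_without_return is a hypothetical name added only for contrast.

```python
def clean_numbers(rows):
    # Replace missing values with 0 and hand the new table back to the caller.
    return [[0 if value is None else value for value in row] for row in rows]

def col_combiner(rows):
    # Drop the two header rows, then *return* what the next step returns.
    return clean_numbers(rows[2:])

def data_cleaner(rows):
    # Without this `return`, the caller would receive None.
    return col_combiner(rows)

def data_cleaner_without_return(rows):
    # The result is computed and then discarded, as in the question.
    col_combiner(rows)

table = [["a", "b"], ["c", "d"], [1, None], [None, 4]]
cleaned = data_cleaner(table)
lost = data_cleaner_without_return(table)
print(cleaned)
print(lost)
```

Because each step builds and returns a new table, the input list is left untouched, which fits the advice to let a function either modify its argument or return a value, not both.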

Include a header from Excel in a for loop with openpyxl

I am trying to include a header when printing data in a column.
Issue
But when I try it an error comes up:
TypeError: '<' not supported between instances of 'int' and 'str'
Code
def pm1():
    for cell in all_columns[1]:
        power = (cell.value)
        if x < power < y:
            print(f"{power}")
        else:
            print("Not steady")
pm1()
I know you cannot compare a string with a number using these operators.
How can I include the header while looping throughout the entire column?

Based on what I understand from your comments, this may work for you.
def pm1():
    for cell in all_columns[1]:
        for thing in cell:
            # in openpyxl you can call on .row or .column to get the location of your cell
            # you said you wanted to print the header (row 1), a string
            if thing.row == 1:
                print(thing.value)
            else:
                # you said that the values under the header will be a digit
                # so now you should be safe to set your variable and make a comparison
                power = thing.value
                if x < power < y:
                    print(f"{power}")
                else:
                    print("Not steady")

So you are looping through all cells of a column, here given by all_columns[1].
Assume the first cell of each column might contain a header whose value is of type string (type(cell.value) == str).
Then you have two possibilities:
If the first cell of each column (in row 1) is a header, take advantage of that position.
If all other cells contain numerical values, you can treat only the str values differently, as presumed headers.
def power_of(value):
    # either define boundaries x,y here or global
    power = float(value) # defensive conversion, some values might erroneously be stored as text in Excel
    if x < power < y:
        return f"{power}"
    return "Not steady" # default return instead else

def pm1():
    for cell in all_columns[1]:
        if (cell.row == 1): # assume the header is always in first row
            print(cell.value) # print header
        else:
            print(power_of(cell.value))
pm1()
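The code above follows the first possibility (the header is in row 1). The second possibility, treating any str value as a presumed header, might look like the sketch below. It works on a plain list of values standing in for the cell.value of each cell in the column, so it runs without a workbook. The function name describe, the sample values and the bounds are all assumptions made for illustration.

```python
def describe(values, low, high):
    """Return one output line per value; strings are treated as headers."""
    lines = []
    for value in values:
        if isinstance(value, str):
            lines.append(value)            # presumed header: pass through
        elif low < value < high:
            lines.append(f"{value}")       # numeric value within bounds
        else:
            lines.append("Not steady")     # numeric value out of bounds
    return lines

# Stand-in for [cell.value for cell in all_columns[1]]
column_values = ["Power", 12, 50, 7.5]
report = describe(column_values, 10, 20)
for line in report:
    print(line)
```

Note that this variant would also pass through numbers that Excel stored as text, so it only fits sheets where every non-header cell is truly numeric.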

Assigning a variable from a different file with a string

In the file values there are a bunch of lists, and I want to assign gun_type to one of them depending on what my current_gun_name is plus the string _Iron.
How do I do this? This is my current code, but it doesn't work:
current_gun_name = string_assigned_above
iron = win32api.GetAsyncKeyState(0x67)
if iron < KeyMin and win32api.GetAsyncKeyState(0x11) < 0:
    gun_type = values.(current_gun_name + "_Iron")
So, on the last line I am trying to pull a list from another file called values. But the list that I am trying to pull depends on the current_gun_name string. For example:
current_gun_name = "test"
list_from_values = values.(current_gun_name + "ing")
print(list_from_values)
In this code it should find a list in the file called values. The list it finds and prints should be the one called "testing", since I am asking it to use the variable plus "ing". Except this doesn't work.
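One common way to look up an attribute whose name is built at runtime is the built-in getattr, which also works on imported modules. The sketch below uses types.SimpleNamespace as a stand-in for the values module so that it runs on its own. The attribute names AK_Iron and testing and the gun name "AK" are made up for illustration.

```python
from types import SimpleNamespace

# Stand-in for `import values`; in the real program this would be the module.
values = SimpleNamespace(AK_Iron=[1, 2, 3], testing=["a", "b"])

current_gun_name = "AK"  # hypothetical gun name
# getattr(obj, "name") is the dynamic equivalent of obj.name
gun_type = getattr(values, current_gun_name + "_Iron")

# A third argument supplies a default instead of raising AttributeError.
missing = getattr(values, "Unknown_Iron", None)

list_from_values = getattr(values, "test" + "ing")
print(gun_type, missing, list_from_values)
```

If the lists are only ever looked up by name, storing them in a dictionary keyed by that name inside values may be a simpler design than relying on attribute lookup.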

Python Selenium - replacing the location['x'] value within an iteration followed by click()

In order to automate some tasks in the browser with Selenium, I need to identify a certain field on a website and click it. Because the values I'm using to identify the correct field can be displayed multiple times, I'm iterating through the matches and applying several conditions. The code may be written inefficiently, but the conditions, whose goal is to locate the correct x and y coordinates, are working. I'd like to know whether I can somehow modify a location['x'] value in order to execute a click command.
# finding the X Value
tempmatchesx = driver.find_elements_by_xpath("//*[text()='" + tempoa + "']")
tempmatchesxVal =''
if indicator == '1':
    for i in tempmatchesx:
        if (i.location['x'] >= temptype['x']) and (i.location['y'] >= temptype['y']) and (i.location['x'] < temptypeoppo['x']):
            tempmatchesxVal = i.location['x']
            break
elif indicator == '2':
    for i in tempmatchesx:
        if (i.location['x'] >= temptype['x']) and (i.location['y'] >= temptype['y']) and (i.location['x'] > temptypeoppo['x']):
            tempmatchesxVal = i.location['x']
            break
# finding the Y Value
tempmatchesy = driver.find_elements_by_xpath("//*[text()='" + tempgoals + "']")
tempmatchesyVal =''
if indicator == '1':
    for i in tempmatchesy:
        if (i.location['x'] >= temptype['x']) and (i.location['y'] >= temptype['y']) and (i.location['x'] < temptypeoppo['x']):
            i.location['x'] = tempmatchesxVal
            i.click()
            break
elif indicator == '2':
    for i in tempmatchesy:
        if (i.location['x'] >= temptype['x']) and (i.location['y'] >= temptype['y']) and (i.location['x'] > temptypeoppo['x']):
            i.location['x'] = tempmatchesxVal
            i.click()
So basically the part my question is referring to is the following:
i.location['x'] = tempmatchesxVal
i.click()
Within an iteration, is it somehow possible to replace the location-X value with the before identified x value (tempmatchesxVal)?
Or could the way I did it work and the failure (without error code) might be somewhere else?
For now, no item gets clicked.
Update:
The purpose of all this is to click an element whose content I don't know in advance, so I can't simply search for it. Instead I am identifying the column and row where the element is located.
The two "find_elements_by_xpath" calls use different inputs: the first uses tempoa to identify the column (x-value) and the second uses tempgoals for the row (y-value).
Apparently I can't modify i.location[coordinate]. How can I click that element, then?

I've solved it on my own. Instead of going with the initial idea of modifying the coordinates (thanks to the answers, I now know that's not possible), I implemented an additional condition:
for i in tempOdds:
    if (i.location['y'] == tempmatchesyVal) and (i.location['x'] >= tempmatchesxVal):
So basically I only allow elements with the same y coordinate and an x coordinate that is equal or larger; the first element that matches is the correct one.

It is not fully clear to me what your conditions are, or what the purpose of some of the code is, but what seems very wrong is assigning a new value to i.location[something].
The i values are HTML elements; you can read their location, but I don't think setting it has any effect.
Update: an element's location is its position in pixels from the top left corner of the page. You mention columns and rows: if there is some kind of table on the page, I doubt i.location would help you identify the columns and rows you are looking for. If you would like to work with pixel offsets, though, you can look at action_chains for moving the mouse and clicking. https://seleniumhq.github.io/selenium/docs/api/py/webdriver/selenium.webdriver.common.action_chains.html

How can I slice a string from the index of the first occurrence of a sub string to the second occurrence of a sub string in Python?

I'm working on a homework assignment where I have to figure out how to slice a string starting at the index of the first occurrence of a sub string to the index of the second occurrence of that sub string. The problem is specifically trying to slice the string "juxtaposition" from the first "t" to the second "t" and we are supposed to use .find() to do so, but I haven't been able to figure it out.
I've tried to use WHILE to create a loop to find the index of the different occurrences of the sub string and then use that to slice the string, but I haven't been able to make it work.
This is what I've been able to come up with so far:
long_word = "juxtaposition"
location = long_word.find("t")
start_index = location
stop_index = 0
while location != -1:
    stop_index + long_word.find("t",location + 1)
print(long_word[int(start_index),int(stop_index)])
When I ran this it didn't show an error message but it doesn't show an output either, and in order to edit the cell again I have to interrupt the kernel.
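A likely reason for the hang is that location is never updated inside the loop, so the condition location != -1 stays true forever. In addition, stop_index + ... computes a value without assigning it, and a slice needs a colon, not a comma. A minimal sketch of a while-based version that avoids these problems is shown below, assuming the substring occurs at least twice. The list name indices is introduced here for illustration.

```python
long_word = "juxtaposition"

indices = []                      # positions of the first two "t" characters
location = long_word.find("t")
while location != -1 and len(indices) < 2:
    indices.append(location)
    # Reassign location so the loop makes progress and can terminate.
    location = long_word.find("t", location + 1)

# Assumes two occurrences were found; unpacking fails otherwise.
start_index, stop_index = indices
# Slices use a colon; +1 includes the second "t" in the result.
result = long_word[start_index:stop_index + 1]
print(result)
```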

There are a million ways to approach this. One, which is a bit ugly but interesting for learning, is to slice your string as mystring[start:stop], where the start point is the result of the first .find() and the stop point is the result of the second .find().
The stop point is interesting, because you're passing .find()+1 as the start point of .find() so it skips the first instance of the letter. The final +1 is to include the 't' in the output if you want it.
Typically in Python this would be frowned upon because it's unnecessarily unreadable, but I thought I'd post it to give you an idea of how flexible you can be in solving these problems:
long_word[long_word.find('t'):long_word.find('t',long_word.find('t')+1)+1]
Output
'taposit'

def find_all(a_str, sub):
    start = 0
    while True:
        start = a_str.find(sub, start)
        if start == -1: return
        yield start
        start += len(sub) # use start += 1 to find overlapping matches

long_word = "juxtaposition"
location = "t"
locations = (list(find_all(long_word, location)))
substr = long_word[locations[0]:locations[1]+1]
print (substr)
output:
taposit

The find method on strings in Python accepts a second parameter for the index in the string to begin searching. In order to find the second occurrence of the substring's index, provide the first occurrence's index + 1 as the second parameter to find:
def get_substring(long_word, search):
    first_occurence_idx = long_word.find(search)
    if first_occurence_idx == -1:
        return
    # for second call of `find`, provide start param so that it only searches
    # after the first occurence
    second_occurence_idx = long_word.find(search, first_occurence_idx + 1)
    if second_occurence_idx == -1:
        return
    return long_word[first_occurence_idx:second_occurence_idx + len(search)]

# example provided
assert get_substring('juxtaposition', 't') == 'taposit'
# case where search occurs once in long_word
assert get_substring('juxtaposition', 'x') is None
# case where search is not in long_word
assert get_substring('juxtaposition', 'z') is None
# case where len(search) > 1 and search occurs twice
assert get_substring('juxtaposition justice', 'ju') == 'juxtaposition ju'

