How to extract numbers from excel cell - python

I want to get only the numbers from a cell (excel). I tried the following:
uzemelteto = first_sheet.cell(17, 11)
res = [int(i) for i in uzemelteto.split() if i.isdigit()]
print res
But it gives an error like: AttributeError: 'Cell' object has no attribute 'split'
How can I modify it, to be able to get only digits?

worksheet.cell() returns an object, namely an instance of the class Cell (docs).
A Cell object has a property value, so instead of
uzemelteto.split()
use
uzemelteto.value.split()
or, to be super safe, because the type of cell.value may vary based on the content, you can use
str(uzemelteto.value).split()
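Once the cell's value is available as a string, pulling out the digits is a separate step. Below is a minimal sketch, assuming the cell holds free text such as an address. The helper name extract_numbers and the sample strings are made up for illustration, and re.findall is swapped in for split()/isdigit() so that digits glued to other characters are also found.

```python
import re


def extract_numbers(value):
    """Return every run of digits in a cell value as a list of ints."""
    # str() guards against non-string cell values (floats, None, ...)
    return [int(match) for match in re.findall(r"\d+", str(value))]


# Hypothetical cell contents
print(extract_numbers("Budapest 1117, Fo utca 12/b"))
print(extract_numbers(None))
```

One caveat: a numeric cell that is read back as the float 42.0 becomes the string "42.0", so this sketch would return [42, 0] for it. If the cell can be numeric, check its type before converting.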

Related

Attribute error even after adding the enumerate

I keep getting an AttributeError such as 'tuple' object has no attribute 'split'.
At first I did not have enumerate in this code, and I got an AttributeError saying that 'int' object has no attribute 'split'. So I added enumerate, following this question.
But I still get the same error.
released_date = []
released_country = []
released_year = []
for i in enumerate(df['released']):
    date = i.split("(")[0]
    country = i.split("(")[1].replace(')','')
    released_date.append(date)
    released_country.append(country)
df['released_country'] = released_country
df['released_date'] = released_date
df['released_date'] = pd.to_datetime(df['released_date'])
df['released_year'] = df['released_date'].dt.year
df['released_month'] = df['released_date'].dt.month
#drop the unnecessary columns --yearcorrect was created by accident so we'll delete that as well
df.drop(['year','released','released_date','yearcorrect'], axis=1, inplace=True)
df['logbudget'] = np.log(df['budget'])
df['loggross'] = np.log(df['gross'])
df.head(3)
When you call this:
for i in enumerate(df['released']):
enumerate(x) yields a tuple of two elements on each iteration: first the index of the current item in x, and second the item itself. So i is that whole tuple, and a tuple has no split method. To call .split on the string, unpack the tuple:
for i, string in enumerate(df['released']):
Now each iteration gives you the two elements separately, and you can call string.split().
Try:
for i in enumerate(df['released']):
    print(i)
And:
for i, string in enumerate(df['released']):
    print(i)
    print(string)
to see the difference.
EDITED: As @flakes suggests, it seems you don't actually need enumerate. A Python for loop iterates directly over the elements of a list, array, or Series, so no index is needed: for elem in df['released']: would be enough.
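The difference between the tuple from enumerate and the string itself can be sketched without pandas. The sample values below are hypothetical stand-ins for df['released'], and str.partition is used in place of the two split calls only to keep the example short.

```python
# Hypothetical stand-ins for the values in df['released']
released = [
    "June 13, 1980 (United States)",
    "July 2, 1980 (France)",
]

# enumerate yields (index, value) tuples, which is why .split failed on them
first_pair = next(iter(enumerate(released)))

released_date = []
released_country = []
for text in released:  # iterate over the strings directly, no index needed
    date_part, _, rest = text.partition("(")
    released_date.append(date_part.strip())
    released_country.append(rest.rstrip(")").strip())

print(first_pair)
print(released_date)
print(released_country)
```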

'tuple' object has no attribute 'startswith' when stripping first and last quote from the string in pandas

I am trying to get rid of the single quotes around nested dictionaries in a pandas data frame (the first and last character of each object). I am looping through each row in the column metadata.
Example of the nested dictionary that is hidden inside of the quotes is below:
'{"dek": "<p>Don\'t forget to buy a card</p>", "links": {"edit": {"dev": "//patty-menshealth.feature.hearstapps.net/en/content/edit/76517422-96ad-4b5c-a24a-c080c58bce0c", "prod": "//patty-menshealth.prod.com/en/content/edit/76517422-96ad-4b5c-a24a-c080c58bce0c"}}}'
I tried the following:
def string_format(df):
    for text in df.iteritems():
        if text.startswith("'") and text.endswith("'"):
            text = text[1:-1]
    return text
string_format(df["metadata"])
Returns AttributeError: 'tuple' object has no attribute 'startswith'
You're using pandas.Series.iteritems, which iterates over (index, value) tuples. To make your code work, unpack the tuple in the loop like this:
for label, text in df.iteritems():
    # process text
Note that iteritems was removed in pandas 2.0; Series.items is the equivalent there.
But I suggest checking out the pandas documentation on working with text. For example, you can operate on the strings of a Series directly via the .str accessor.
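The quote stripping itself can be written as a small per-value helper. This is only a sketch: the name strip_outer_quotes and the sample values are invented for illustration, and it assumes the quotes are literally the first and last characters of the string.

```python
def strip_outer_quotes(text):
    """Drop one pair of surrounding single quotes, if present."""
    if (
        isinstance(text, str)
        and len(text) >= 2
        and text.startswith("'")
        and text.endswith("'")
    ):
        return text[1:-1]
    # Leave non-strings and unquoted strings unchanged
    return text


# Hypothetical stand-ins for values in the metadata column
values = ["'{\"dek\": \"card\"}'", '{"links": {}}', None]
cleaned = [strip_outer_quotes(v) for v in values]
print(cleaned)
```

With pandas, the same helper could presumably be applied to the whole column with df["metadata"].map(strip_outer_quotes), which avoids the explicit loop and the early return in the original function.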

How To Iterate Through Objects (Saved in List) Using For Loop?

I am calling an API to grab some information from a website, and I am storing the information in a list '[]'. How can I run a for loop through this information to:
1) Iterate through the list of objects in a for loop (specifically comparing one object's text)
2) If one value of the object equals one particular word, save the whole object into a new list
I have tried running a for loop through the list of objects but get the error ''list' object is not callable'
tree = Et.fromstring(response.content)
for child in tree.findall('meterConsumption'):
    for audit in child.findall('audit'):
        for creator in audit.findall('createdBy'):
    for ID in child.findall('id'):
        print ('Entry ID: ',ID.text)
    for use in child.findall('usage'):
        print ('Use: ',use.text)
    for cost in child.findall('cost'):
        print ('Cost: ',cost.text)
    for startdate in child.findall('startDate'):
        print ('Startdate: ',startdate.text)
    for enddate in child.findall('endDate'):
        print ('Enddate: ',enddate.text)
    #save object to list
    allentries.append(Entry(ID.text,
                            use.text,
                            cost.text,
                            startdate.text,
                            enddate.text,
                            creator.text))
for x in allentries():
    print (x.entryid)
I am looking to get a list of all key-value pairs in each object. For example, it would look like:
Id[1], use[1], cost[1], startdate[1], enddate[1], creator[1]
Id[2], use[2], cost[2], startdate[2], enddate[2], creator[2]
Id[3], use[3], cost[3], startdate[3], enddate[3], creator[3]
Then, from this, if creator == "human", append all the info from this object to a new list of objects.
A for creator loop with no indented body, followed directly by the for loop over child.findall('id'), raises an IndentationError when the code is compiled.
for child in tree.findall('meterConsumption'):
    for audit in child.findall('audit'):
        for creator in audit.findall('createdBy'):
        # indentation error: the loop above has no body
    for ID in child.findall('id'):
        print ('Entry ID: ', ID.text)
'list' object is not callable means you are trying to call a list object.
allentries is a list, and you are trying to call it by using ().
Remove the ().
You'll get the 'list' object is not callable error when you try to call a list like you would call a function. That's happening in your second-to-last line:
for x in allentries():
    print (x.entryid)
Since allentries is your list, tacking on a () at the end of it is the syntax for "calling" it, which doesn't make sense for objects that aren't functions. So the correct syntax is:
for x in allentries:
    print (x.entryid)
Per your second question, I'd suggest looking into pandas DataFrames as handy ways to work with tabular data in Python. Since child.findall() gives you a list of objects that you're extracting text from with .text, you can pass a dictionary of lists to the DataFrame constructor like so:
import pandas as pd
# Collect one small DataFrame per entry; DataFrame.append does not modify in place
frames = []
for child in tree.findall('meterConsumption'):
    for audit in child.findall('audit'):
        for creator in audit.findall('createdBy'):
            # Also touched up your indentation
            for ID in child.findall('id'):
                print('Entry ID: ', ID.text)
            for use in child.findall('usage'):
                print('Use: ', use.text)
            for cost in child.findall('cost'):
                print('Cost: ', cost.text)
            for startdate in child.findall('startDate'):
                print('Startdate: ', startdate.text)
            for enddate in child.findall('endDate'):
                print('Enddate: ', enddate.text)
            # Use list comprehensions to extract text attributes
            frames.append(pd.DataFrame({
                'ID': [ID.text for ID in child.findall('id')],
                'use': [use.text for use in child.findall('usage')],
                'cost': [cost.text for cost in child.findall('cost')],
                'startdate': [startdate.text for startdate in child.findall('startDate')],
                'enddate': [enddate.text for enddate in child.findall('endDate')]
            }))
# Combine everything and display the final dataframe
allentries = pd.concat(frames, ignore_index=True)
print(allentries)
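The second part of the question, keeping only the entries created by "human", can also be done with plain lists. The sketch below rests on several assumptions: the XML sample is invented (the real API response is not shown), the Entry class is a guess at what the asker defined, and each meterConsumption element is assumed to hold exactly one of each child tag.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample data; the real API response is not shown in the question
SAMPLE_XML = """
<response>
  <meterConsumption>
    <id>1</id><usage>120</usage><cost>30.5</cost>
    <startDate>2019-01-01</startDate><endDate>2019-01-31</endDate>
    <audit><createdBy>human</createdBy></audit>
  </meterConsumption>
  <meterConsumption>
    <id>2</id><usage>95</usage><cost>24.0</cost>
    <startDate>2019-02-01</startDate><endDate>2019-02-28</endDate>
    <audit><createdBy>system</createdBy></audit>
  </meterConsumption>
</response>
"""


@dataclass
class Entry:
    """Assumed shape of the asker's Entry class."""
    entryid: str
    use: str
    cost: str
    startdate: str
    enddate: str
    creator: str


def parse_entries(xml_text):
    """Build one Entry per meterConsumption element."""
    tree = ET.fromstring(xml_text.strip())
    entries = []
    for child in tree.findall("meterConsumption"):
        entries.append(Entry(
            entryid=child.findtext("id"),
            use=child.findtext("usage"),
            cost=child.findtext("cost"),
            startdate=child.findtext("startDate"),
            enddate=child.findtext("endDate"),
            creator=child.findtext("audit/createdBy"),
        ))
    return entries


allentries = parse_entries(SAMPLE_XML)

# Keep the whole object whenever its creator matches the word of interest
human_entries = [entry for entry in allentries if entry.creator == "human"]

for x in allentries:  # no parentheses: allentries is a list, not a function
    print(x.entryid)
print(human_entries)
```

Using findtext with one call per field avoids the nested for loops entirely, which also sidesteps the indentation problem.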

remove spaces from csv field but ignoring datetime object

I have a CSV that is the result of a DB2 query.
For some reason the CSV is created like this:
"filed1 ", "field2, ","2017-11-24"
I'm able to remove the white space inside the fields with this:
for result in results:
    result = [x.strip(' ') for x in result]
    csvwriter.writerow(result)
but the date field is <type 'datetime.date'>, so I get the error
AttributeError: 'datetime.date' object has no attribute 'strip'
How can I apply the strip function only to string objects? Or can I convert the datetime.date object to a str object?
Thanks very much
You could change your list comprehension as follows:
result = [str(x).strip() for x in result]
This will first convert all the cells to a string and then apply the strip() on that. Or more directly as follows:
csvwriter.writerow([str(x).strip() for x in result])
Just check the type before stripping:
if isinstance(x, str):
    ...
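The two suggestions can be combined into one list comprehension that strips only the strings and passes everything else through. This is a sketch under a few assumptions: the row below is a hypothetical stand-in for one DB2 result row, the helper name strip_strings is invented, and the code targets Python 3 (on Python 2, where the question's `<type ...>` output comes from, the check would likely need basestring instead of str).

```python
import csv
import datetime
import io


def strip_strings(row):
    """Strip whitespace from string fields; leave other types untouched."""
    return [x.strip() if isinstance(x, str) else x for x in row]


# Hypothetical stand-in for one row returned by the query
row = ["filed1 ", "field2, ", datetime.date(2017, 11, 24)]
cleaned = strip_strings(row)

# csv.writer converts non-string fields with str(), so the date needs no special handling
buffer = io.StringIO()
csv.writer(buffer).writerow(cleaned)
print(buffer.getvalue())
```

Compared with str(x).strip(), this keeps the date as a datetime.date object until the moment it is written, which matters if the row is used for anything else first.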

Node object is not iterable

I get this error message, when I try to parse the result set, returned by MATCH query. What I want is to somehow convert the resultset to a dictionary. I should say that I know how to access particular fields of the result set - like row['a']['name'], but what I do not like is that I can not convert the whole row['a'] to a dictionary or to get something like row['a'].keys().
So, this is what I tried:
res = graph.cypher.execute("MATCH (a:Object {id: 1}) return a")
for r in res:
    print r['a']['id'] # this works
for r in res:
    print r['a'].keys() # this does not
    #what I want is something like
    {x:y for (x,y) in zip(r['a'].keys(), r['a'].values())}
From the documentation, it looks like execute is returning a py2neo.cypher.RecordList of py2neo.cypher.Record objects, which can then be iterated over:
for r in res:
    for v in r['a']:
        # do something with v
Unfortunately, looking at the source code, there doesn't seem to be an obvious way to access the column name, without doing a dir(r) and filtering the results, e.g. [c for c in dir(r) if not c.startswith('_')].
Edit: Looking at it again, I guess r is the Record while r['a'] is something else. You'll have to see what kind of object r['a'] is using type(r['a']), and then see if there's a way to access the keys.
The accessors directly attached to the Node object are a shortcut to the properties attribute. Therefore you will want to iterate through r["a"].properties in the same way you would any other dictionary.
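What iterating through the properties attribute looks like can be sketched without a running database. The StandInNode class below is purely hypothetical: it is not part of py2neo and only mimics the two behaviours described in this thread, item access and a dict-like properties attribute.

```python
class StandInNode:
    """Hypothetical stand-in for a py2neo Node; not the real class."""

    def __init__(self, **properties):
        # Assumed to behave like a plain dictionary, as the answer describes
        self.properties = dict(properties)

    def __getitem__(self, key):
        # Mirrors the shortcut access, e.g. node['id']
        return self.properties[key]


node = StandInNode(id=1, name="example")

# Convert the whole node to a dictionary in one step
as_dict = dict(node.properties)
keys = sorted(node.properties.keys())

print(node["id"])
print(as_dict)
print(keys)
```

With a real result row, the equivalent would presumably be dict(r['a'].properties), assuming the installed py2neo version exposes properties as described above.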
