It's probably something very stupid, but I can't find a way to stop the index from being printed when I run the code. My code goes:
import pandas as pd
# Reading the excel file and choosing a specific component
df = pd.read_excel('Components.xlsx')
component_name = 'Name'
# Forcing the index to be a certain column
df = df.set_index(['TECHNICAL DATA'])
# Selecting data in a cell with df.loc
component_lifetime = df.loc[['Life time of Full unit'], component_name]
print(component_lifetime)
What I get is:
TECHNICAL DATA
Life time of Full unit 20
Is it possible to hide all the index data and only print 20? Thank you ^^
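Passing a list of labels (['Life time of Full unit']) to loc returns a Series, which is why the index label is printed along with the value. Use pd.DataFrame.at for scalar access by label: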
res = df.at['Life time of Full unit', 'Name']
A short guide to indexing:
Use iat / at for scalar access / setting by integer position or label respectively.
Use iloc / loc for non-scalar access / setting by integer position or label respectively.
You can also extract the NumPy array via values, but this is rarely necessary.
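A minimal sketch of the difference, using a small stand-in DataFrame (the rows and values are illustrative, chosen to mirror the question's labels): a list of row labels in loc gives back a Series, while a single label in loc or at gives back the bare value.

```python
import pandas as pd

# Illustrative stand-in for the spreadsheet; 'TECHNICAL DATA' becomes the row labels.
df = pd.DataFrame({
    "TECHNICAL DATA": ["Life time of Full unit", "Weight"],
    "Name": [20, 150],
}).set_index("TECHNICAL DATA")

# A list of row labels -> a Series, so printing it also prints the index.
as_series = df.loc[["Life time of Full unit"], "Name"]

# A single row label -> a scalar, via loc or the scalar-only accessor at.
as_scalar_loc = df.loc["Life time of Full unit", "Name"]
as_scalar_at = df.at["Life time of Full unit", "Name"]

print(as_series)
print(as_scalar_at)  # prints just the value: 20
```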
I am trying to use a dictionary value to define the slice ranges for the iloc function, but I keep getting the error: Can only index by location with a [integer, integer slice (START point is INCLUDED, END point is EXCLUDED), listlike of integers, boolean array]. The Excel sheet is built for visual information and not in any kind of real table format (not mine, so I can’t change it), so I have to slice the specific ranges without column labels.
Tried code (got the error):
cr_dict= {'AA':'[42:43,32:65]', 'BB':'[33:34, 32:65]'}
df = my_df.iloc[cr_dict['AA']]
the results I want would be similar to
df = my_df.iloc[42:43,32:65]
I know I could change the dictionary and use the following, but it looks convoluted and is not as easy to read. Is there a better way?
cr_dict = {'AA': [42, 43, 32, 65], 'BB': [33, 34, 32, 65]}
df = my_df.iloc[cr_dict['AA'][0]:cr_dict['AA'][1], cr_dict['AA'][2]:cr_dict['AA'][3]]
Define your dictionaries slightly differently.
cr_dict = {'AA': ([42], list(range(32, 65))),
           'BB': ([33], list(range(32, 65)))}
Each value is a (row positions, column positions) pair, so you can slice your DataFrame like so:
>>> my_df.iloc[cr_dict["AA"]]  # same rows and columns as my_df.iloc[42:43, 32:65]
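Another option keeps each region's row and column ranges together, as in the question's original dictionary, by storing Python slice objects instead of strings. This is a different technique from the answer above. The small DataFrame below is a hypothetical stand-in for my_df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for my_df: 50 rows x 70 columns, no meaningful labels.
my_df = pd.DataFrame(np.arange(50 * 70).reshape(50, 70))

# Each entry is (row slice, column slice); slice(42, 43) means the same as 42:43.
cr_dict = {
    "AA": (slice(42, 43), slice(32, 65)),
    "BB": (slice(33, 34), slice(32, 65)),
}

# Passing the (rows, cols) tuple is the same as writing my_df.iloc[42:43, 32:65].
aa = my_df.iloc[cr_dict["AA"]]
bb = my_df.iloc[cr_dict["BB"]]
print(aa.shape)  # (1, 33)
```

Because the END point is excluded, slice(42, 43) selects a single row, exactly like 42:43.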
I have a spreadsheet with fields containing a body of text.
I want to calculate the Gunning-Fog score for each row and write the value back to the same Excel file as a new column. To do that, I first need to calculate the score for each row. The code below works if I hard-code the text into the df variable. However, it does not work when I read the field from the sheet (i.e., rfds) and pass it to my r variable. I get the following error, even though the two fields I am testing contain 3,896 and 4,843 words respectively:
readability.exceptions.ReadabilityException: 100 words required.
Am I missing something obvious? Disclaimer: I am very new to Python and coding in general! Any help is appreciated.
from readability import Readability
import pandas as pd
df = pd.read_excel(r"C:/Users/name/edgar/test/item1a_sandbox.xls")
rfd = df["Item 1A"]
rfds = rfd.to_string() # to fix "TypeError: expected string or buffer"
r = Readability(rfds)
fog = r.gunning_fog()
print(fog.score)
TL;DR: You need to pass the cell value and are currently passing a column of cells.
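The line rfd = df["Item 1A"] returns the whole column (a pandas Series), not the text of a single cell. Passing that Series straight to Readability is what caused the original TypeError, since Readability expects a string. rfd.to_string() does return a string, but it is pandas' printed representation of the column (index labels plus cell values, with long values possibly shortened for display), not the text itself, so Readability never sees your thousands of words.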
Rather than taking a column and going down it, approach it from the other direction. Take the rows and then pull out the column:
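for index, row in df.iterrows():
    print(row.iloc[2])
The [2] is the column's integer position; adjust it to match where "Item 1A" sits, or use row["Item 1A"] to select the cell by label.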
Now that you have the value of a single cell, it can be passed to the Readability calculator:
r = Readability(row.iloc[2])
fog = r.gunning_fog()
print(fog.score)
Note that these can be combined together into one command:
print(Readability(row.iloc[2]).gunning_fog().score)
This shows how calls can be chained together; which style you find easier is up to you. Chaining is handy when you pass the whole computation to something like apply or applymap (renamed DataFrame.map in pandas 2.1).
Putting the whole thing together (the step by step way):
from readability import Readability
import pandas as pd
df = pd.read_excel(r"C:/Users/name/edgar/test/item1a_sandbox.xls")
for index, row in df.iterrows():
    r = Readability(row.iloc[2])
    fog = r.gunning_fog()
    print(fog.score)
Or the clever way:
from readability import Readability
import pandas as pd
df = pd.read_excel(r"C:/Users/name/edgar/test/item1a_sandbox.xls")
print(df["Item 1A"].apply(lambda x: Readability(x).gunning_fog().score))
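The question also asked for the score to be written back to the spreadsheet as a new column. Below is a minimal sketch, assuming the scorer is passed in as a function so the logic can be exercised without the readability package installed. The helper add_score_column, the "Gunning Fog" column name, and the output file name are hypothetical. In the real script you would pass lambda text: Readability(text).gunning_fog().score. Recent pandas versions can no longer write the legacy .xls format, so the sketch assumes an .xlsx output file.

```python
import pandas as pd


def add_score_column(df, text_col, new_col, score_fn):
    """Return a copy of df with score_fn applied to every cell of text_col.

    Hypothetical helper: score_fn is any callable that takes one cell's text
    and returns a number.
    """
    out = df.copy()
    # apply() hands each cell's value (a string) to score_fn, not the whole column.
    out[new_col] = out[text_col].apply(score_fn)
    return out


# Usage with the real scorer (needs the readability package; not run here):
#   from readability import Readability
#   df = pd.read_excel(r"C:/Users/name/edgar/test/item1a_sandbox.xls")
#   scored = add_score_column(df, "Item 1A", "Gunning Fog",
#                             lambda text: Readability(text).gunning_fog().score)
#   scored.to_excel(r"C:/Users/name/edgar/test/item1a_scored.xlsx", index=False)  # hypothetical path

# Stand-in demo: a word count instead of the Gunning-Fog score.
demo = pd.DataFrame({"Item 1A": ["one two three", "four five"]})
scored_demo = add_score_column(demo, "Item 1A", "Words", lambda text: len(text.split()))
print(scored_demo)
```

Cells with fewer than 100 words will still raise ReadabilityException, so very short entries may need to be filtered out or handled before scoring.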
Let's assume we have a simple dataframe like this:
df = pd.DataFrame({'col1':[1,2,3], 'col2':[10,20,30]})
Then I can select elements like this
df.col2[0] or df.col2[1]
But if I want to select the last element with df.col2[-1] it results in the error message:
KeyError: -1
I know there are workarounds, for example df.col2[len(df)-1] or df.iloc[-1,1]. But why isn't the much simpler version, indexing directly with -1, allowed? Am I missing another simple way to select by -1? Tnx
The index labels of your DataFrame are [0, 1, 2]. Your code df.col2[1] is equivalent to using loc, as in df['col2'].loc[1] (or df.col2.loc[1]). Your index does not contain the label -1, which is why you get the KeyError.
For positional indexing you need iloc (available on both Series and DataFrame), so you could write df['col2'].iloc[-1] (or df.col2.iloc[-1]).
As you can see, you can combine label-based ('col2') and position-based (-1) indexing; you don't have to use a single style for both axes, as in df.iloc[-1,1] or df.col2[len(df)-1] (the latter is equivalent to df.loc[len(df)-1, 'col2']).
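A short sketch contrasting label-based and position-based access to the last value, using the DataFrame from the question. iat is the scalar-only counterpart of iloc.

```python
import pandas as pd

df = pd.DataFrame({'col1': [1, 2, 3], 'col2': [10, 20, 30]})

# Label-based: the index holds the labels 0, 1, 2, so -1 is not a label.
try:
    df.col2[-1]
    label_lookup_failed = False
except KeyError:
    label_lookup_failed = True

# Position-based: negative positions count from the end, like Python lists.
last_iloc = df['col2'].iloc[-1]
last_iat = df['col2'].iat[-1]
# Mixing a column label with a row position on the DataFrame itself:
last_mixed = df.iloc[-1, df.columns.get_loc('col2')]

print(label_lookup_failed, last_iloc, last_iat, last_mixed)
```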
I am trying to read a Google Sheet in Python using the gspread library.
The initial authentication setup is done and I am able to read the sheet.
However, when I call
sheet.get_all_records()
columns containing numeric-like values (e.g. 0001, 0002, 1000) are converted to numbers, so the leading zeros are stripped. How can I prevent this?
You can prevent gspread from casting values to numbers by passing the numericise_ignore parameter to the get_all_records() method.
You can disable it for a specific list of indices in the row:
# Disable casting for columns 1, 2 and 4 (1 indexed):
sheet.get_all_records(numericise_ignore=[1, 2, 4])
Or disable it for every column by setting numericise_ignore to ['all']:
sheet.get_all_records(numericise_ignore=['all'])
How about this answer? As one of several workarounds, it uses get_all_values() instead of get_all_records(). get_all_values() returns every cell as a string, so the leading zeros are kept; the resulting list of rows is then converted into a list of dictionaries keyed by the header row, the same shape get_all_records() returns.
Sample script:
values = worksheet.get_all_values()
head = values.pop(0)
result = [{head[i]: col for i, col in enumerate(row)} for row in values]
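The conversion can be checked without a spreadsheet. A minimal sketch, assuming values has the shape get_all_values() returns (a list of rows, each a list of strings, header row first); the sample rows are made up. For rows the same length as the header, dict(zip(head, row)) builds the same record as the enumerate version.

```python
# Made-up stand-in for worksheet.get_all_values(): header row, then data rows, all strings.
values = [
    ["id", "name"],
    ["0001", "alpha"],
    ["0002", "beta"],
    ["1000", "gamma"],
]

head = values.pop(0)  # the first row becomes the dictionary keys
# zip pairs each header with the cell in the same column position.
result = [dict(zip(head, row)) for row in values]
print(result[0])  # {'id': '0001', 'name': 'alpha'}
```

Since nothing is converted to a number, "0001" stays a string with its leading zeros.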
Reference:
get_all_values()
If this was not the direction you want, I apologize.
Here's what I have in my dataframe:
RecordType Latitude Longitude Name
L 28.2N 70W Jon
L 34.3N 56W Dan
L 54.2N 72W Rachel
Note: The dtype of all the columns is object.
Now, in my final dataframe, I only want to include those rows in which the Latitude and Longitude fall in a certain range (say 24 < Latitude < 30 and 79 < Longitude < 87).
My idea is to apply a function to all the values in the Latitude and Longitude columns to first get float values like 28.2, and then compare those values to see if they fall into my range. So I wrote the following:
def numbers(value):
    return float(value[:-1])
result[u'Latitude'] = result[u'Latitude'].apply(numbers)
result[u'Longitude'] = result[u'Longitude'].apply(numbers)
But I get the following warning:
Warning:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
I'm having a hard time understanding this since I'm new to Pandas. What's the best way to do this?
If you don't want to modify df, I would suggest getting rid of the apply and vectorising this. One option is using eval.
u = df.assign(Latitude=df['Latitude'].str[:-1].astype(float))
u['Longitude'] = df['Longitude'].str[:-1].astype(float)
df[u.eval("24 < Latitude < 30 and 79 < Longitude < 87")]
Series.between gives you another option:
u = df['Latitude'].str[:-1].astype(float)
v = df['Longitude'].str[:-1].astype(float)
# inclusive="neither" gives strict bounds (pandas 1.3+; older versions used inclusive=False)
df[u.between(24, 30, inclusive="neither") & v.between(79, 87, inclusive="neither")]
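A self-contained version of the between approach, using the sample rows from the question. The helper name in_box is hypothetical, and inclusive="neither" (strict bounds, matching 24 < Latitude < 30) assumes pandas 1.3 or later. Like the question's numbers function, it ignores the N/S/E/W letter. With the question's ranges none of the sample rows match, so a second, illustrative range is shown as well.

```python
import pandas as pd

df = pd.DataFrame({
    "RecordType": ["L", "L", "L"],
    "Latitude": ["28.2N", "34.3N", "54.2N"],
    "Longitude": ["70W", "56W", "72W"],
    "Name": ["Jon", "Dan", "Rachel"],
})


def in_box(frame, lat_range, lon_range):
    """Return rows strictly inside both ranges (hypothetical helper)."""
    # Drop the trailing hemisphere letter and parse the rest as a float.
    lat = frame["Latitude"].str[:-1].astype(float)
    lon = frame["Longitude"].str[:-1].astype(float)
    mask = (lat.between(*lat_range, inclusive="neither")
            & lon.between(*lon_range, inclusive="neither"))
    # Boolean indexing returns a new DataFrame; frame itself is left unchanged.
    return frame[mask]


print(in_box(df, (24, 30), (79, 87)))  # empty: no sample longitude lies between 79 and 87
print(in_box(df, (25, 35), (65, 75)))  # only Jon (28.2N, 70W)
```

Because the original string columns are never overwritten, this route sidesteps the SettingWithCopyWarning entirely.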
As for why Pandas threw that particular A value is trying to be set on a copy of a slice... warning and how to avoid it:
First, this syntax may make the warning go away:
result.loc[:,'Latitude'] = result['Latitude'].apply(numbers)
Pandas gave you the warning because result was most likely created by slicing another DataFrame (for example, result = df[some_condition]), so pandas cannot tell whether assigning to result['Latitude'] should modify only result or is meant to change the original DataFrame as well. The article you referenced (http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy) gives examples of why this ambiguity can cause unexpected results.
Pandas recommends syntax that makes your intent explicit. Writing result.loc[:, 'Latitude'] = ... is one such form, but if result is itself a slice of another DataFrame the warning can still appear. The most reliable fix is to make result an independent copy when you create it, e.g. result = df[some_condition].copy(); after that, ordinary column assignment no longer triggers the warning.
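A minimal sketch of the explicit-copy route, assuming result was produced by filtering a larger DataFrame (the data and the filter are hypothetical). Calling .copy() makes result its own object, so the later column assignment is unambiguous; the warnings filter turns any warning into an error to show that none is raised.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"Latitude": ["28.2N", "34.3N"], "Name": ["Jon", "Dan"]})

# Hypothetical filter; .copy() makes result independent of df.
result = df[df["Name"] != "Dan"].copy()

with warnings.catch_warnings():
    warnings.simplefilter("error")  # any warning, SettingWithCopyWarning included, becomes an error
    result["Latitude"] = result["Latitude"].str[:-1].astype(float)

print(result)
print(df["Latitude"].tolist())  # the original df keeps its strings
```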