passing a column as parameter in python function


I have been trying to store the unique values of a column from a pandas data frame, using the following code, for further use in a function. Code snippet:
def mergeFields(data,field):
    oldlist = pd.unique(data[field])
data_rcvd_from_satish.tags = mergeFields("data_rcvd_from_satish","tags")
Error:
TypeError: string indices must be integers, not list
I know the error I am getting is similar to those in many other questions, but I am still not able to resolve it. Please do not treat this as a duplicate, and please answer.

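data is a string, not a DataFrame. Please review the arguments passed to mergeFields: you passed the name of the DataFrame in quotes, so data[field] is effectively:
"data_rcvd_from_satish"["tags"]
which is invalid. Pass the DataFrame variable itself, without quotes. Note also that mergeFields as posted has no return statement, so it returns None.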
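A minimal sketch of the corrected call, assuming data_rcvd_from_satish is a pandas DataFrame with a tags column. The sample values are made up for illustration, and returning the unique values is an assumption about what the function is meant to do:

```python
import pandas as pd

def mergeFields(data, field):
    # data must be a DataFrame; field is the column name as a string
    return pd.unique(data[field])

# Hypothetical sample data standing in for the real DataFrame
data_rcvd_from_satish = pd.DataFrame({"tags": ["a", "b", "a", "c"]})

# Pass the variable itself, not the string "data_rcvd_from_satish"
unique_tags = mergeFields(data_rcvd_from_satish, "tags")
print(list(unique_tags))
```

The column name stays a string because it is a label to look up; only the DataFrame argument loses its quotes.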

Related

Pandas function to add field to dataframe does not work

I have some code which I want to use in a dynamic python function. This code adds a field to an existing dataframe and does some adjustments to it. However, I got the error "TypeError: string indices must be integers". What am I doing incorrectly?
See below the function plus the code for calling the function.
import pandas as pd
#function
def create_new_date_column_in_df_based_on_other_date_string_column(df,df_field_existing_str,df_field_new):
    df[df_field_new] = df[df_field_existing_str]
    df[df_field_new] = df[df_field_existing_str].str.replace('12:00:00 AM','')
    df[df_field_new] = df[df_field_existing_str].str.strip()
    df[df_field_new]=pd.to_datetime(df[df_field_existing_str]).dt.strftime('%m-%d-%Y')
    return df[df_field_new]
#calling the function
create_new_date_column_in_df_based_on_other_date_string_column(df='my_df1',df_field_existing_str='existingfieldname',df_field_new='newfieldname')

The df argument you are passing to the function is of type str, and so is df_field_existing_str.
So df[df_field_existing_str] tries to index a string (via [] or the .__getitem__() method) with another string, which is impossible: a string can only be indexed with integers or slices.
You are not using a DataFrame here, only strings, and that is why you get this TypeError. Pass the DataFrame object itself (my_df1, without quotes) as df.
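A sketch of how the call could look once a real DataFrame is passed. The shorter function name, the sample rows and the month/day/year input format are assumptions for illustration only. It also chains the cleaning steps, because in the posted function each line reads from the existing column again and so discards the result of the previous line:

```python
import pandas as pd

def add_formatted_date_column(df, existing_col, new_col):
    # Remove the midnight time suffix, then trim whitespace, chaining the
    # steps so each one works on the previous result
    cleaned = df[existing_col].str.replace("12:00:00 AM", "", regex=False).str.strip()
    # The input format is an assumption about the data (month/day/year)
    df[new_col] = pd.to_datetime(cleaned, format="%m/%d/%Y").dt.strftime("%m-%d-%Y")
    return df[new_col]

# Hypothetical sample data standing in for my_df1
my_df1 = pd.DataFrame(
    {"existingfieldname": ["1/15/2021 12:00:00 AM", "12/3/2020 12:00:00 AM"]}
)

# df is the DataFrame object; the two column names remain strings
result = add_formatted_date_column(my_df1, "existingfieldname", "newfieldname")
print(list(result))
```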

Python / Pyspark Indexing and Slicing issue on Databricks

I'm not entirely sure if I need to index or slice to retrieve elements from an output in Python.
For example, the variable "Ancestor" produces the following output.
Out[30]: {'ancestorPath': '/mnt/lake/RAW/Internal/origination/dbo/xpd_opportunitystatushistory/1/Year=2022/Month=11/Day=29/Time=05-11',
'dfConfig': '{"sparkConfig":{"header":"true"}}',
'fileFormat': 'SQL'}
The element "xpd_opportunitystatushistory" is a table and I would like to retrieve "xpd_opportunitystatushistory" from the output.
I was thinking of something like:
table = Ancestor[:6]
But it fails.
Any thoughts?
I have been working on this while waiting for help.
The following
Ancestor['ancestorPath']
gives me
Out[17]: '/mnt/lake/RAW/Internal/origination/dbo/xpd_opportunitystatushistory/1/Year=2022/Month=11/Day=29/Time=05-11'
If someone could help with the remaining code to pull out 'xpd_opportunitystatushistory' that would be most helpful
ta

Ancestor is a dictionary (key-value pairs), so it cannot be sliced with Ancestor[:6]; it has to be accessed using a key, which in this case is ancestorPath.
I assigned a value similar to yours and was able to retrieve ancestorPath, as you have figured out.
To get xpd_opportunitystatushistory you can use the following code. Since the value of Ancestor['ancestorPath'] is a string, you can split it on "/" and then extract the required value from the resulting list. The path starts with "/", so element 0 is an empty string and the table name is at index 7:
req_array = Ancestor['ancestorPath'].split("/")
print(req_array)
print(req_array[7])
If you want to retrieve complete path until xpd_opportunitystatushistory, then you can use the following instead:
req_array = Ancestor['ancestorPath'].split("/")
print(req_array)
print('/'.join(req_array[:8]))
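A fixed index such as 7 breaks if the path depth ever changes. As an alternative sketch, assuming the table name always follows the schema segment dbo (an assumption based only on the one path shown), the position can be looked up instead of hard-coded:

```python
Ancestor = {
    "ancestorPath": "/mnt/lake/RAW/Internal/origination/dbo/xpd_opportunitystatushistory/1/Year=2022/Month=11/Day=29/Time=05-11",
    "dfConfig": '{"sparkConfig":{"header":"true"}}',
    "fileFormat": "SQL",
}

parts = Ancestor["ancestorPath"].split("/")

# Assumption: the table name is the segment right after the schema name "dbo"
table = parts[parts.index("dbo") + 1]
print(table)  # xpd_opportunitystatushistory
```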

TypeError: list indices must be integers or slices, not str in Global terrorism dataset

filterYear = data['Year'] == 1970
I am getting an error of
1 filterYear = data['Year'] == 1970
TypeError: list indices must be integers or slices, not str
I checked the data type of the series and it is numeric.
I am at a loss.

Without a fuller selection of code that you're trying (unsuccessfully) to run, it's hard to be sure of just where your bug lies, but I suspect your data is a list, of the type:
data = [thing1, thing2, thing3, etc.]
If so, then to retrieve thing2 from data, you must index it using an integer representing the position of thing2 in data, like so: data[1].
The string 'Year' is not an integer, and so cannot be used to identify which member of the list data you are trying to retrieve - which leads to the error you've encountered.
If you wished to store data with labels, you might want to consider using a dict:
dict1 = {'label1' : 'thing1', 'label2' : 'thing2'}
Set up in this way, if you invoked dict1['label1'], you would obtain 'thing1'.
Like I said, a fuller description of both your code, and precisely what you're trying to achieve, would help formulate a more helpful response.
Best of luck.
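To illustrate the point above, here is a small sketch with made-up records (the real structure of data is not shown in the question). It reproduces the error and then filters the list by looking up the key on each element:

```python
# Hypothetical records; the real dataset's structure is not shown in the question
data = [
    {"Year": 1970, "Country": "A"},
    {"Year": 1971, "Country": "B"},
    {"Year": 1970, "Country": "C"},
]

try:
    data["Year"]  # a list cannot be indexed with a string
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Look up the key on each element of the list instead
rows_1970 = [row for row in data if row["Year"] == 1970]
print(rows_1970)
```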

It looks like your index is not a DatetimeIndex. Check with print(data.index). You can set the index with data = data.set_index('name_of_column_with_data'). Please show a sample of your DataFrame for further help.
(assuming you are using pandas, not clear from your question)

This type of error occurs when indexing a list with anything other than integers or slices, as the error message says. When the index is a number stored as a string, the most straightforward fix is to convert it to an integer with the int() function. Something else to watch out for is accidentally using the list's values, rather than their positions, to index the list, which raises the same TypeError.
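A short sketch of the int() conversion, using made-up values:

```python
items = ["first", "second", "third"]
position = "1"  # e.g. read from user input or a file, so it is a str

# items[position] would raise TypeError; convert to int first
selected = items[int(position)]
print(selected)  # second
```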

ValueError: too many values to unpack (expected 2) due to entrance value in python

I am trying to call the function add_site. But each time I try, the Python console shows:
ValueError: too many values to unpack (expected 2)
My code is shown below. Each print function is verified with success. Instance and solution are dictionaries and the others are array or list.
[Image: call to the function]
[Image: definition of the function add_site]
How can I solve this problem? Thank you for your help.

The reason is that, when the first if condition is True, you return a single value instead of the two that the caller tries to unpack.
Also, you should add your code as text instead of images so it's easier for us to have a look.
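Since the original code was only posted as images, the following is a hypothetical sketch of the pattern, not the asker's add_site. One branch returns a single list, and unpacking that list into two names fails because it has more than two items:

```python
def find_site(instance, site_id):
    # Hypothetical function: both branches are supposed to return two values
    if site_id not in instance:
        # Bug: returns ONE list of three items instead of two values
        return [None, 0, "missing"]
    return instance[site_id], len(instance)

instance = {"s1": "depot"}

try:
    site, count = find_site(instance, "s2")  # takes the buggy branch
except ValueError as exc:
    error_message = str(exc)
    print(error_message)

# An existing key takes the branch that returns exactly two values
site, count = find_site(instance, "s1")
print(site, count)
```

Making every return statement produce the same number of values removes the error.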

Why is "numpy.int32" not able to be printed here? (Using geopandas + python 3.9.5)

Here is the relevant code:
import geopandas as gpd
#A shape file (.shp) is imported here, contents do not matter, since the "size()" function gets the size of the contents
shapefile = 'Data/Code_Specific/ne_50m_admin_1_states_provinces/ne_50m_admin_1_states_provinces.shp'
gdf = gpd.read_file(shapefile)[['admin', 'adm0_a3', 'postal', 'geometry']]
#size
#Return an int representing the number of elements in this object.
print(gdf.size())
I am getting an error for the last line of code,
TypeError: 'numpy.int32' object is not callable
The main purpose of this is that I am trying to integrate gdf.size() into a for loop:
for index in range(gdf.size()):
    print("test", index)
    #if Australia, remove
    if gdf.get('adm0_a3')[index] == "AUS":
        gdf = gdf.drop(gdf.index[index])
I have absolutely no clue what to do here, this is my first post on this site ever. Hope I don't get guilded with a badge of honor for how stupid or simple this is, I'm stumped.

gpd.read_file will return either a GeoDataFrame or a DataFrame object, both of which have the attribute size, which is an integer. The attribute is accessed simply as gdf.size; adding parentheses tries to call that integer, which causes your error.
size is also the wrong attribute to use here, because for a table it is the number of rows times the number of columns. At first glance the following should work
for index in gdf.index:
...
but you're modifying the length of an iterable while iterating over it. This can throw everything out of sync and cause a KeyError if you drop an index before you try to access it. Since all you want to do is filter out some rows, simply use
gdf = gdf[gdf['adm0_a3'] != 'AUS']
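A small sketch of the boolean-mask filter. It uses a plain pandas DataFrame with made-up rows as a stand-in for the GeoDataFrame, on the assumption that the two behave the same for this operation:

```python
import pandas as pd

# Hypothetical stand-in for the GeoDataFrame read from the shapefile
gdf = pd.DataFrame({
    "admin": ["Australia", "Austria", "Australia", "Brazil"],
    "adm0_a3": ["AUS", "AUT", "AUS", "BRA"],
})

# size is an attribute, not a method: 4 rows * 2 columns
size_before = gdf.size
print(size_before)

# Keep every row whose country code is not "AUS"; no loop or drop needed
gdf = gdf[gdf["adm0_a3"] != "AUS"]
remaining = gdf["adm0_a3"].tolist()
print(remaining)
```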

I think the function you are looking for is,
gdf.shape[0]
or
len(gdf.index)
I think the first option is more readable but the second one is faster.
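For comparison, a sketch showing that both expressions give the row count while size gives rows times columns, again with a plain pandas DataFrame and made-up rows as a stand-in:

```python
import pandas as pd

# Hypothetical stand-in for the GeoDataFrame: 3 rows, 2 columns
gdf = pd.DataFrame({
    "adm0_a3": ["AUS", "AUT", "BRA"],
    "postal": ["A", "B", "C"],
})

rows_from_shape = gdf.shape[0]    # shape is a (rows, columns) tuple
rows_from_index = len(gdf.index)  # length of the row index
total_cells = gdf.size            # rows * columns

print(rows_from_shape, rows_from_index, total_cells)
```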
