The unique values of the Shelf column are 1, 2 and 3. Write a function that changes the values 1, 2, 3 to “one”, “two”, “three” respectively, and apply it to the Shelf column using the DataFrame's .apply() method. Do not create an extra column; override the existing Shelf column. The Shelf column's unique values should now be “one”, “two”, “three”.
The DataFrame is shown in the picture below.
In your future questions, always include sample data in a format that is easy to copy. Also describe what you have tried and which part is causing you trouble. Otherwise no one will hand you the answer to a more advanced problem, you will learn nothing, and you will fail at your next task.
You could do it like this:
values_map = {
    1: 'one',
    2: 'two',
    3: 'three',
}
# integer keys, since the column's unique values are the integers 1, 2 and 3
df['Shelf'] = df['Shelf'].apply(lambda x: values_map[x])
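For a plain dictionary lookup like this, pandas' built-in Series.map does the same job without a lambda (and yields NaN for any value missing from the mapping, which makes typos easy to spot):
df['Shelf'] = df['Shelf'].map(values_map)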
It seems like a simple question, but I'm new to Python.
I have 10 variables (named A through J); each is a float32 NumPy array. I want to apply the following command:
variable = variable*mask[0,:,:]; variable[variable==0] = np.nan
to all of the variables in a single line rather than writing 10 lines, while keeping the variable names the same.
Pseudocode example:
FOR all variables A-J
    variable = variable*mask[0,:,:]; variable[variable==0] = np.nan
ENDFOR
You can do something like this:
variables = [a, b, c]
for i in range(len(variables)):
    masked = variables[i] * mask[0, :, :]
    masked[masked == 0] = np.nan   # blank the zeros in the new array, not the old one
    variables[i] = masked
Note: this only updates the items in the list; the original names a, b and c still point to the old arrays.
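If the ten arrays can live in one container from the start, a dict keyed by name keeps each one addressable while letting a single loop rebind them all. A self-contained sketch with made-up shapes (the 4x4 arrays and the random mask are assumptions, not the asker's data):
import numpy as np

mask = np.random.randint(0, 2, size=(1, 4, 4)).astype(np.float32)
arrays = {name: np.random.rand(4, 4).astype(np.float32) for name in 'ABCDEFGHIJ'}

for name, arr in arrays.items():
    out = arr * mask[0, :, :]      # apply the mask slice
    out[out == 0] = np.nan         # then blank out the zeros in the result
    arrays[name] = out             # rebind under the same name

# results stay reachable as arrays['A'], arrays['B'], ...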
a = [5,2,7]
If 2 is the latest element added to a, then return 2.
You can use this syntax:
>>> list_ = [0, 1, 2]
>>> list_[-1]
2
There is no direct way of knowing how a list was modified; Python does not keep track of this information. This means you would have to keep a copy of the list before updating it and run something like:
a = [5, 2, 7]
old_a = a.copy()   # snapshot before the update
a[1] = 0
# the previous values at every position that changed
[old_a[i] for i, v in enumerate(a) if old_a[i] != v]
However, if you are able to keep track of this, you are certainly able to keep track of the added value itself, and to run your tests before adding it to the list. In short, the design of what you are doing should probably be reconsidered.
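If you do control every insertion, one hedged option is a thin list subclass (the name TrackedList is made up for this sketch) that records the last value it received:
class TrackedList(list):
    """A list that remembers the most recently added value."""
    def append(self, value):
        super().append(value)
        self.last_added = value

    def insert(self, index, value):
        super().insert(index, value)
        self.last_added = value

a = TrackedList([5, 7])
a.insert(1, 2)        # a is now [5, 2, 7]
print(a.last_added)   # 2, no matter where it was inserted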
Problem Statement
I have the CSV data as shown in the image. From this, I have to keep only RegionName, State and the quarterly mean values from 2000-2016. I also want to use a multi-index of [State, RegionName].
I am working on the CSV file with pandas in Python, as shown in the screenshot.
Thank you in advance.
Right before the troublesome for year in range(...) loop, you did:
house_data.columns = pd.to_datetime(house_data.columns).to_period('M')
That means your columns are no longer strings. So inside the for loop:
house_data[str(year)+'q2'] = house_data[[str(year)+'-04',...]].mean(axis=1)
would fail and throw that error, since there are no columns with string names. To fix this, do this instead:
house_data.columns = pd.to_datetime(house_data.columns).to_period('M').strftime('%Y-%m')
However, you are better off doing:
house_data.columns = pd.to_datetime(house_data.columns).to_period('Q')
house_data.groupby(level=0, axis=1).mean()
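Putting both parts together, a sketch under the question's assumptions (monthly price columns spanning 2000-2016, with State and RegionName still ordinary columns):
house_data = house_data.set_index(['State', 'RegionName'])
house_data.columns = pd.to_datetime(house_data.columns).to_period('Q')
quarterly = house_data.groupby(level=0, axis=1).mean()
# note: axis=1 in groupby is deprecated in pandas >= 2.1; the equivalent there is
# quarterly = house_data.T.groupby(level=0).mean().T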
At every step, should I introduce a new variable name, or can I keep using the same name? What is the best practice, and why?
df1 = df.withColumn('last_insert_timestamp', lit(datetime.now()))
df2 = df1.withColumn('process_date', lit(rundate))
Versus
df = df.withColumn('last_insert_timestamp', lit(datetime.now()))
df = df.withColumn('process_date', lit(rundate))
There is no single best practice for this; it depends on what you want to do.
In Python, variables are just labels bound to objects. If you want the name to follow the updated DataFrame through your code, reassign it to the newly generated DataFrame.
If, on the other hand, you need to keep the first DataFrame for other processing later in the code, assign the result to a new name.
You might find more explanations here: Reassigning Variables in Python
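A tiny runnable illustration of the rebinding point (the toy DataFrame here is an assumption, just to make the example self-contained):
from datetime import datetime
from pyspark.sql import SparkSession
from pyspark.sql.functions import lit

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, 'a')], ['id', 'val'])

df_raw = df  # keep a second name bound to the original
df = df.withColumn('last_insert_timestamp', lit(datetime.now()))

# Spark DataFrames are immutable: withColumn returned a new object,
# so the original is still reachable through df_raw
print(df_raw.columns)  # ['id', 'val']
print(df.columns)      # ['id', 'val', 'last_insert_timestamp']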
You can also chain the calls:
df = df.withColumn('last_insert_timestamp', lit(datetime.now())) \
       .withColumn('process_date', lit(rundate))
Please help me reduce the time complexity of the nested loop in Python
df is a DataFrame with, say, 3 columns: name, city and date.
The rep DataFrame holds the means grouped by the two columns name and city from df. I need to reattach the mean from rep to df.
k = 0  # k is read before it is assigned below, so it must be initialised
for i in range(0, len(rep)):
    for j in range(k, len(df)):
        if df["X"][j] == rep["X"][i]:
            df["Mean"][j] = rep["Mean"][i]
        else:
            k = j
            break
What you want is something like:
df.set_index('X').join(rep.set_index('X'))
Setting the join keys as the index makes the join much faster. After the join, you can drop the old mean column (with the DataFrame drop method) and filter out any values you don't want.
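Equivalently, pandas.merge attaches the means in one vectorised pass without touching the index; a sketch assuming the real key columns are name and city (the question's snippet calls the key "X"):
df = df.merge(rep[['name', 'city', 'Mean']], on=['name', 'city'], how='left')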