How to apply Python code to rows that are the same? - python

I have an Excel file containing some data. One column contains names, some of which repeat.
I want to create a loop in Python in order to apply an operation to each group of rows with the same name.
I tried this, but it didn't help because it selects only one group of my data:
df.loc[df['name'] == 'sofia']
DATA
This is my data; it represents the data of one cell of a network.
In this very long file (about 19,000 rows) I have the data of each cell for two years, 2020 and 2021, and I have to predict the data_trend for the year 2022. I applied the Holt-Winters method to one cell only, and I want to apply it to all the cells. That is my problem. I am a beginner in Python, so I would be very grateful if you could help me out.
Thanks in advance.

Can you give us some example data and describe what you want to do with the rows?
Off the top of my head, it sounds like you should do something like:
def your_func(x):
    # x is the Series of 'col_to_upd' values for one name group;
    # put your operations here and return the transformed values
    return x

# transform keeps the original row alignment when assigning back
df['col_to_upd'] = df.groupby('name')['col_to_upd'].transform(your_func)

This page may be relevant for looking up the Pandas functions. As erikheld writes, it is difficult to give an exact solution to your problem with no data sample or example of the operation you want to perform on each group.
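Since Holt-Winters is mentioned, here is a minimal sketch of fitting it per cell, assuming a 'name' column identifying the cell, a 'data_trend' column with monthly values, and statsmodels installed (the file name and the additive/monthly settings are assumptions; reuse whatever worked for your single-cell model):

import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

df = pd.read_excel('cells.xlsx')  # placeholder file name

forecasts = {}
for name, group in df.groupby('name'):
    series = group['data_trend'].reset_index(drop=True)
    # Additive trend/seasonality with a 12-month season is an assumption;
    # match these settings to your single-cell model.
    model = ExponentialSmoothing(
        series, trend='add', seasonal='add', seasonal_periods=12
    ).fit()
    forecasts[name] = model.forecast(12)  # 12 months of 2022

predictions = pd.DataFrame(forecasts)

Each column of predictions then holds one cell's 2022 forecast.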

Related

How do I create a column with certain conditions built in (not the same as a conditional column) in python

I've attached a screenshot of a table in Excel, but I'm doing this in Python:
[screenshot of the table]
I'm trying to recreate the "predict" column in Python; I already have the other columns. I am trying to get the first row of "predict" to be equal to the first row of "ytd", and then every value following that one should be the "nc" value multiplied by the previous value in the "predict" column. It doesn't have to be done in this particular order or in this way; I just want that to be the end result, and any clear help to achieve that would be much appreciated. I feel like there should be a way to do this with conditionals, but I am struggling to find the right combination of information.
Have you got any code in Python? Is the data already loaded, or are you reading it from the Excel file and then printing it out or saving it back to the file? I didn't quite understand the question.
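A minimal sketch of one way to build that column, assuming the columns are named ytd and nc as in the screenshot (the sample values below are made up):

import pandas as pd

df = pd.DataFrame({'ytd': [100.0, 110.0, 120.0],
                   'nc': [1.00, 1.05, 1.02]})  # made-up sample values

# predict[0] = ytd[0]; predict[i] = nc[i] * predict[i-1]
# Every later value is ytd[0] times the running product of nc,
# so a cumulative product avoids an explicit Python loop.
factors = df['nc'].copy()
factors.iloc[0] = 1.0  # the first row uses ytd directly, not nc
df['predict'] = df['ytd'].iloc[0] * factors.cumprod()
print(df)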

How to group specific number of records into subsets to be proceeded in Spark?

I'd like to start by saying that I'm completely new to Spark so this may be a trivial question but I appreciate any feedback.
What would I like to do?
I want to process data in the following way:
Read data using the Spark Structured Streaming component.
Compute statistics, i.e. mean and median, on 10 consecutive records in each column (6 columns) and on all 10 records across all columns.
Write this data into a SQL table.
What is the issue?
My problem mainly concerns points 2 and 3. I do not know how to group the records into subsets (chunks) of 10 records and process them per column and across all 10 records simultaneously. Do you have any suggestions?
The second question is about writing to a SQL table, because I don't know exactly which Spark functionality to use for this. Could you give me any hints about it?
If any more detail is needed, feel free to ask; I would be very glad to add it.
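A minimal batch-mode sketch of the chunking idea, assuming a column (here called 'timestamp') that defines the record order, and a JDBC-reachable database (the URL, table, column names, and credentials below are all placeholders). On a streaming DataFrame the same logic would go inside foreachBatch, since a row_number window over a non-time ordering is not supported directly in Structured Streaming:

from pyspark.sql import SparkSession, Window
from pyspark.sql import functions as F

spark = SparkSession.builder.appName('chunk-stats').getOrCreate()
df = spark.read.csv('records.csv', header=True, inferSchema=True)  # placeholder source

# Number the rows, then map every 10 consecutive records to one chunk id.
# A single global window pulls all rows into one partition; fine for a
# sketch, but partition the window if the data is large.
w = Window.orderBy('timestamp')  # assumes a column defining record order
chunked = df.withColumn('chunk', F.floor((F.row_number().over(w) - 1) / 10))

# Per-chunk statistics for one column; repeat for the other five,
# or build the aggregation list in a loop over the column names.
stats = chunked.groupBy('chunk').agg(
    F.mean('col1').alias('col1_mean'),
    F.expr('percentile_approx(col1, 0.5)').alias('col1_median'),
)

# Write the per-chunk statistics to a SQL table over JDBC.
(stats.write.format('jdbc')
    .option('url', 'jdbc:postgresql://host:5432/dbname')
    .option('dbtable', 'chunk_stats')
    .option('user', 'user')
    .option('password', 'password')
    .mode('append')
    .save())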

Manipulating data for network analysis

I am trying to manipulate my dataframe before I conduct network analysis using networkx.
Here is a sample of the data I've got:
sample data
I am trying to use the title and cast columns and turn them into something like this:
ideal format
The ideal result is to have one row for each individual actor and the movie/show that he/she is in. If an actor is in more than one show/movie, I want a separate row for each of those as well.
Could someone please advise me on how to make it happen? Thank you!!
To use pandas, you first import the data into a dataframe. Let's call it "f".
import pandas
f = pandas.read_csv('path/to/csv')
After that you can access individual columns by doing:
f['title']
similar to a dictionary. If you want both in the same dataframe, pass in a list of columns like so:
f[['title', 'cast']]
That is as much as I can provide without knowing the extent of the project.
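A sketch of the reshaping step itself, assuming cast is a comma-separated string as the screenshot suggests (the sample frame below is made up):

import pandas as pd

df = pd.DataFrame({
    'title': ['Movie A', 'Show B'],
    'cast': ['Actor 1, Actor 2', 'Actor 2, Actor 3'],
})  # stand-in for the real data

# Split the comma-separated cast string into a list, then give each
# actor their own row, repeating the title for each one.
edges = (df.assign(cast=df['cast'].str.split(', '))
           .explode('cast')
           .reset_index(drop=True))
print(edges)

The result is one (title, actor) row per appearance, which can go straight into networkx as an edge list via nx.from_pandas_edgelist(edges, 'title', 'cast').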

Simple DataFrame question from a beginner

I'm learning dataframes now. I've been stuck on how to get a subset of a dataframe or table by its label index. I know it's a very simple question, but I couldn't find the solution in the pandas documentation. I hope someone can help me. I appreciate your help.
So, I have a dataframe named df_teams like below:
[screenshot of df_teams]
If I want to get a subtable for a specific team, 'Warriors', I can use df_teams[df_teams['nickname']=='Warriors'], resulting in a single row in the form of a dataframe. My question is: what if I want to get a subtable of more teams, say both 'Warriors' and 'Hawks', to form a new table? Can I do something similar using a logical index and finish in one line of code?
You could do a bitwise or on the two conditions using the '|' character.
df_teams[(df_teams['nickname']=='Warriors')|(df_teams['nickname']=='Hawks')]
Alternatively, if you have a list of values you want to check against, you can use the isin method to return the rows whose value is present in the list, e.g.:
df_teams[df_teams['nickname'].isin(['Warriors','Hawks'])]

Pandas and complicated filtering and merge/join multiple sub-data frames

I have a seemingly complicated problem, and I have a general idea of how I should solve it, but I am not sure if it is the best way to go about it. I'll give the scenario and would appreciate any help on how to break it down. I'm fairly new to Pandas, so please excuse my ignorance.
The Scenario
I have a CSV file that I import as a dataframe. The example I am working through contains 2742 rows × 136 columns. The rows vary but the columns are fixed. I have a set of 23 lookup tables (also CSV files), one per year and quarter (the range is 2015 Q1 through 2020 Q3). The lookup files are named like PPRRVU203.csv, which contains the values for the 3rd quarter of 2020. Rows are matched against a lookup table on two columns ('Code' and 'Mod'), and I use three values associated with the match.
I am trying to filter sections of my data frame, pull the correct values from the matching lookup file, merge back into the original subset, and then replace into the original dataframe.
Thoughts
I can probably abstract this and wrap it in a function, but I am not sure how to place the results back in. My question, for those who understand Pandas better than I do: what is the best method to filter, replace the values, and write the file back out?
The straightforward solution would be to filter the original dataframe into 23 separate dataframes, do the merge on each individual one, then concat into a new dataframe and output to CSV, but this seems highly inefficient.
I can post code, but I am looking more for high-level thoughts.
I'm not sure exactly what your DataFrame looks like, but the Pandas query() method may prove useful for selecting the data.
name = df.query('columnname == "something"')
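A minimal sketch of the groupby-merge-concat approach, assuming the main frame carries 'year' and 'quarter' columns from which the lookup file name can be derived (both column names and the name-building rule are assumptions based on the PPRRVU203.csv example):

import pandas as pd

df = pd.read_csv('main.csv')  # the 2742 x 136 frame

def lookup_name(year, quarter):
    # e.g. (2020, 3) -> 'PPRRVU203.csv', per the naming scheme described above
    return f"PPRRVU{str(year)[2:]}{quarter}.csv"

merged_parts = []
for (year, quarter), part in df.groupby(['year', 'quarter']):
    lookup = pd.read_csv(lookup_name(year, quarter))
    # Pull the three associated values by matching on 'Code' and 'Mod'.
    merged_parts.append(part.merge(lookup, on=['Code', 'Mod'], how='left'))

result = pd.concat(merged_parts, ignore_index=True)
result.to_csv('output.csv', index=False)

This is still 23 separate merges under the hood, but groupby keeps it to one loop and one concat, and for 23 quarters the overhead is negligible.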
