enter image description here
I am facing problem in converting DataFrame A to DataFrame B. I have tried using the .transpose() method. However, it did not work. Please help if you can. I cannot share the code, as it is confidential.
Please check these functions:
melt
explode
stack
Also, converting your df (or parts of it) to dictionary may be useful: to_dict.
Related
I am trying to format a pandas DataFrame value representation.
Basically, all I want is to get the "Thousand" separator on my values.
I managed to do it using the pd.style.format function. It does the job, but also "breaks" all my table original design.
here is an example of what is going on:
Is there anything I can do to avoid doing it? I want to keep the original table format, only changing the format of the value.
PS: Don't know if it makes any difference, but I am using Google Colab.
In case anyone is having the same problem as I was using Colab, I have found a solution:
.set_table_attributes('class="dataframe"') seems to solve the problem
More infos can be found here: https://github.com/googlecolab/colabtools/issues/1687
For this case you could do:
pdf.assign(a=pdf['a'].map("{:,.0f}".format))
I'm new to coding and having a hard time expressing/searching for the correct terms to help me along with this task. In my work I get some pretty large excel-files from people out in the field monitoring birds. The results need to be prepared for databases, reports, tables and more. I was hoping to use Python to automate some tasks for this.
How can I use Python (pandas?) to find certain rows/columns based on a common name/ID but with a unique suffix , and aggregate/sum the results that belongs together under that common name? As an example in the table provided I need get all the results from sub-localities e.g. AA3_f, AA3_lf and AA3_s expressed as the sum (total of gulls for each species) of the subs in a new row for the main Locality AA3.
Can someone please provide some code for this task, or help me in some other way? I have searched and watched many tutorials on python, numpy, pandas and also matplotlib .. still clueless on how to set this up
any help appreciated
Thanks!
Update:
#Harsh Nagouda, thanks for your reply. I tried your example using groupby function, but I having trouble dividing into correct groups. The "Locality" column has only unique values/ID because they all have a suffix (they are sub categories).
I tried to solve this by slicing the strings:
eng.Locality.str.slice(0,4,1)
i managed to slice off the suffices so that the remainders = AA3_ , AA4_ and so on.
Then i tried to do this slicing in the groupby function. That failed. Then I tried to slice using pandas.Dataframe.apply(). That failed as well.
eng["Locality"].apply(eng.Locality.str.slice(0,4,1))
sum = eng.groupby(["Locality"].str.slice(0,4,1)).sum()
Any more help out there? As you can see above - I need it :-)
In your case, the pd.groupby option seems to be a good fit for the problem. The groupby function does exactly what it means, it groups parts of the dataframe you like it to.
Since you mentioned a case based on grouping by localities and finding the sum of those values, this snippet should help you out:
sum = eng.groupby(["Locality"]).sum()
Additional commands and sorting styles can be found here:
https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.groupby.html
I finally figured out a way to get it done. Maybe not the smoothest way, but at least I get the end result I need:
Edited the Locality-ID to remove suffix:eng["Locality"]=eng["Locality].str.slice(0,4,1)
Used the groupby function:sum = eng.groupby(["Locality"]).sum()
End result:
Table
Here are my data and index value image :
As in the snap pandas Dataframe returning two values. What could be possibly wrong? I am beginner, sorry for the bad editing.
I think I see the issue.
data['Title'].iloc[0]
Try something like this. I think the .head() portion of the code is causinng you issues
I'm learning dataframe now. I've been stuck in how to get a subset of a dataframe or table with its label index. I know it's a very simple question but I couldn't find the solution in pandas documentation. Hope someone could help me. Appreciate your help.
So, I have a dataframe named df_teams like below:
enter image description here
If I want to get a subtable of a specific team 'Warriors', I can use df_teams[df_teams['nickname']=='Warriors'], resulting a row in the form of dataframe. My question is, what if I want to get a subtable of more teams, say I want information of both 'Warriors' and 'Hawks' to form a new table? Can I do something similar by using logical index and finishing in one line of code?
You could do a bitwise or on the two conditions using the '|' character.
df_teams[(df_teams['nickname']=='Warriors')|(df_teams['nickname']=='Hawks')]
Alternatively if you have a list of values you want to check against you could instead use the isin method to return rows that have one of the values present in the list.
E.g
df_teams[df_teams['nickname'].isin(['Warriors','Hawks'])]
I'm using the function del df['column name'] to delete the column in Pandas but there is the error as the attached picture. I have no idea why it does not work. Much appreciated for any help to solve the problem.
You should use the drop method instead.
df.drop(columns='column_name')
And if you want to chage the original Dataframe you should add the inplace=True as an argument to the method.
Also, avoid posting pictures if possible. Posting the written code is often more usufel and makes it easier for someone to help you!