I have a dataframe like this
I would like to convert this to a flat table, as below
You can use pd.DataFrame.stack(), which moves the column labels into the index and returns one value per (row, column) pair:
df.stack().reset_index()
Out:
Maybe pandas.DataFrame.to_numpy does the trick?
Or, to keep the index, use pandas.DataFrame.to_records
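A minimal sketch comparing the three suggestions above. The original data is not shown, so the small frame and the column names "row", "column" and "value" here are assumptions made up for illustration.

```python
import pandas as pd

# Hypothetical wide table standing in for the data in the question.
df = pd.DataFrame({"x": [1, 2], "y": [3, 4]}, index=["a", "b"])

# stack() moves the column labels into the index: one value per (row, column) pair.
flat = df.stack().reset_index()
flat.columns = ["row", "column", "value"]  # illustrative names only

# to_numpy() drops all labels and returns just the values as a 2-D array.
values = df.to_numpy()

# to_records() keeps the index as the first field of each record.
records = df.to_records()

print(flat)
print(values)
print(records)
```

Which one fits depends on whether the row and column labels need to survive: stack() keeps both, to_records() keeps the index, and to_numpy() keeps neither.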
Related
I can't seem to find a way to split all of the array values from the column of a dataframe.
I have managed to get all the array values using this code:
The dataframe is as follows:
I want to use value_counts() on the dataframe, and I get this:
I want the array values that are clubbed together to be split so that I can get the accurate count of every value.
Thanks in advance!
You could try .explode(), which would create a new row for every value in each list.
df_mentioned_id_exploded = pd.DataFrame(df_mentioned_id.explode('entities.user_mentions'))
With the above code you would create a new dataframe df_mentioned_id_exploded with a single column entities.user_mentions, which you could then use .value_counts() on.
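As a rough illustration of the explode-then-count idea, here is a small sketch. The list values are invented; only the column name entities.user_mentions comes from the question.

```python
import pandas as pd

# Hypothetical data: each cell holds a list of mentioned user ids.
df_mentioned_id = pd.DataFrame(
    {"entities.user_mentions": [[1, 2], [2], [3, 2]]}
)

# explode() creates one row per list element, repeating the original index.
exploded = df_mentioned_id.explode("entities.user_mentions")

# Each id is now counted on its own instead of as part of a list.
counts = exploded["entities.user_mentions"].value_counts()
print(counts)
```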
I am trying to convert a Pandas DataFrame to a dictionary. I would like to have pid be the key and the remaining two columns be values within the tuples.
I have tried aggregated_events.set_index('pid').to_dict('list') and aggregated_events.set_index('pid').to_dict(), but I know I am missing something. Any help would be greatly appreciated!
Original Dataframe
You can first set pid as the index and transpose the dataframe, so that the pid values become the new column names, something like this:
df = df.set_index('pid').T
Then you can use to_dict to convert a dataframe to a dictionary.
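A possible end-to-end version of this, as a sketch. The original dataframe is not shown, so the columns start and end are assumed names. Note that to_dict('list') gives lists, so a small extra step is needed if tuples are required.

```python
import pandas as pd

# Hypothetical stand-in for aggregated_events; "start" and "end" are assumed names.
aggregated_events = pd.DataFrame(
    {"pid": [1, 2], "start": [10, 20], "end": [15, 25]}
)

# After set_index + transpose, each pid is a column, so
# to_dict("list") maps pid -> list of that row's remaining values.
as_lists = aggregated_events.set_index("pid").T.to_dict("list")

# Convert the lists to tuples, as asked for in the question.
as_tuples = {pid: tuple(values) for pid, values in as_lists.items()}
print(as_tuples)
```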
I have a Pandas DataFrame in this form:
How can I transform this into a new DataFrame with this form:
I am beginning to use Seaborn and Plotly for plotting, and it seems like they prefer data to be formatted in the second way.
Let's try set_index(), unstack(), reset_index() and rename(columns=...)
`df.set_index('Date').unstack().reset_index().rename(columns={'level_0':'Name',0:'Score'})`
How it works
df.set_index('Date')  # sets Date as the index
df.set_index('Date').unstack()  # pivots the column labels into the index, giving a long Series
d = df.set_index('Date').unstack().reset_index()  # resets the index: the former index levels become columns (level_0 and Date) and the values column is named 0
d.rename(columns={'level_0': 'Name', 0: 'Score'})  # renames the columns
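The steps above can be run on a toy frame to see the intermediate column names. The dates and scores below are made up; only the names Date, Andy, Barry and Cathy come from the answers in this thread.

```python
import pandas as pd

# Hypothetical scores in the wide form described in the question.
df = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-01-02"],
        "Andy": [1, 2],
        "Barry": [3, 4],
        "Cathy": [5, 6],
    }
)

stacked = df.set_index("Date").unstack()  # Series indexed by (column name, Date)
d = stacked.reset_index()                 # columns: level_0, Date, 0
long_df = d.rename(columns={"level_0": "Name", 0: "Score"})
print(long_df)
```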
Use melt function in pandas
df.melt(id_vars="Date", value_vars=["Andy", "Barry", "Cathy"], var_name="Name", value_name="Score")
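For reference, a small runnable sketch of the melt call on invented data. If value_vars is left out, melt uses every column that is not listed in id_vars, so the shorter call should give the same result here.

```python
import pandas as pd

# Hypothetical scores; the values are made up for illustration.
df = pd.DataFrame(
    {
        "Date": ["2020-01-01", "2020-01-02"],
        "Andy": [1, 2],
        "Barry": [3, 4],
        "Cathy": [5, 6],
    }
)

# Explicit form, as in the answer above.
melted = df.melt(
    id_vars="Date",
    value_vars=["Andy", "Barry", "Cathy"],
    var_name="Name",
    value_name="Score",
)

# Shorter form: value_vars defaults to all non-id columns.
melted_default = df.melt(id_vars="Date", var_name="Name", value_name="Score")
print(melted)
```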
This should work :
df.set_index('Date').stack().reset_index(level=1).rename(columns={'level_1': 'Name', 0: 'Score'})
I'm afraid this might not be the right way of doing this, so any other ideas are welcome. I have a function that takes a DataFrame to do some calculation. However, now I need to iterate over the rows of another DataFrame and pass each row to that function, but as a one-row DataFrame:
I've tried:
for u in df.iterrows():
    foo(u)
But u is a tuple... I can do several steps and convert it to a DataFrame (maybe), but is there a clean way (or at least a better way than iterating over rows) to get each row as a one-line dataframe?
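It is usually more concise to use the apply function than to iterate through the rows of a dataframe by hand.
Assuming that df is your dataframe, you can write this (to_frame().T turns each row Series into a one-row DataFrame; pd.DataFrame(x) alone would give a single column instead):
df.apply(lambda x: foo(x.to_frame().T), axis=1)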
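Here is a hedged sketch of passing each row to a function as a one-row DataFrame. The function foo below is a hypothetical stand-in for the real calculation, and the data is invented. Slicing with df.iloc[[i]] is shown as an alternative, since a list selector returns a DataFrame directly and keeps the column dtypes.

```python
import pandas as pd

df = pd.DataFrame({"A": [10, 11, 12], "B": [100, 110, 120]})


def foo(frame):
    # Hypothetical stand-in: expects a one-row DataFrame and sums A and B.
    return int(frame["A"].iloc[0] + frame["B"].iloc[0])


# Option 1: convert each row Series to a one-row DataFrame inside apply.
via_apply = df.apply(lambda row: foo(row.to_frame().T), axis=1)

# Option 2: slice with a list of positions, which yields a one-row DataFrame.
via_iloc = [foo(df.iloc[[i]]) for i in range(len(df))]

print(via_apply.tolist())
print(via_iloc)
```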
Try this:
import pandas as pd
df = pd.DataFrame([{'A':10, 'B':100}, {'A':11,'B':110}, {'A':12,'B':120}])
for index, row in df.iterrows():
    tpl = tuple([row['A'], row['B']])
I would like to collect the values of the first column of a pandas dataframe into an array. How can I accomplish this?
So far I have tried this:
first_column_values = df.iloc[:,[0]]
But it is not the result I wish to have.
You're close. It should be:
first_column_values = df.iloc[:, 0].values
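To make the difference concrete, a short sketch on made-up data. A list selector such as [0] keeps the result as a DataFrame, while a scalar 0 returns a Series that can be converted to an array. The sketch uses to_numpy(), which the pandas documentation recommends over .values; both should give the same array here.

```python
import pandas as pd

# Hypothetical data for illustration.
df = pd.DataFrame({"A": [10, 11, 12], "B": [100, 110, 120]})

as_frame = df.iloc[:, [0]]           # list selector: still a DataFrame, shape (3, 1)
as_array = df.iloc[:, 0].to_numpy()  # scalar selector: a Series, then a 1-D array

print(type(as_frame).__name__, as_frame.shape)
print(type(as_array).__name__, as_array.shape)
```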