How to transpose a dataframe in pandas [duplicate] - python

This question already has answers here:
How do I melt a pandas dataframe?
(3 answers)
Closed 7 months ago.
I have a table in csv that looks like this:
I want to transpose it to look like this, where the columns are now rows of a new column called ACCOUNTLABEL, and the values are in a corresponding column called VALUE:
Any help? Thanks!

You might want to look at the pandas.melt function: https://pandas.pydata.org/docs/reference/api/pandas.melt.html
I wouldn't call that a 'transposition' but rather 'unpivoting' a table.
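A minimal sketch of the melt approach. The original table isn't shown, so the ID column and the REVENUE/COST account columns here are hypothetical stand-ins; only ACCOUNTLABEL and VALUE come from the question. id_vars, var_name and value_name are standard pandas.melt parameters.

```python
import pandas as pd

# Hypothetical wide table: one ID column plus one column per account
wide = pd.DataFrame({
    "ID": [1, 2],
    "REVENUE": [100, 200],
    "COST": [40, 70],
})

# Unpivot: every non-ID column name becomes a row value in ACCOUNTLABEL,
# and the corresponding cell values go into VALUE
long = wide.melt(id_vars="ID", var_name="ACCOUNTLABEL", value_name="VALUE")
print(long)
```

Each original account column contributes one block of rows, so a table with n IDs and k account columns becomes n * k rows.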

Edit: I just noticed that your question has nothing to do with transposing a DataFrame, but I will leave this here, in case it helps.
Use df.T for this. It is shorthand for the DataFrame.transpose() method.
I didn't downvote your question, but someone did, probably because the linked documentation is the first search result if you google 'transpose pandas dataframe'.
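For comparison, a small sketch of what an actual transpose does (the frame below is made up for illustration): the index labels become column labels and vice versa, which is different from the melt reshaping the question asks for.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["x", "y"])

# .T is equivalent to df.transpose(): rows become columns and columns become rows
t = df.T
print(t)
```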

Related

PANDAS: How to rename a column but not lose its previous text in pandas? [duplicate]

This question already has answers here:
Pandas read in table without headers
(5 answers)
Closed 1 year ago.
Okay, so I was reading in a text file using .read_csv() and ended up with this dataframe:
But the problem is that the "im feeling rather rotten..." text ended up as a column name rather than as a value in the dataframe, and when I try to rename the column I lose that text altogether, so the dataframe effectively starts at the 2nd value:
EDIT:
This is how I read in the text file.
Any answers or comments are gratefully accepted.
The final solution (credit to @luigigi) is:
pd.read_csv("emotions.txt", sep=";", header=None)
Thanks!
You can also predefine the column names with:
df = pd.read_csv('emotions.txt', sep=';', names=['TEXT', 'EMOTION'], header=None)
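A self-contained sketch of the difference, using io.StringIO as a hypothetical stand-in for emotions.txt (the two sample lines are invented). Without header=None the first line is consumed as column names; with header=None plus names= it stays as the first row of data.

```python
import io

import pandas as pd

# Hypothetical contents of emotions.txt: no header row, ';'-separated
text = "im feeling rather rotten;sadness\ni feel great;joy\n"

# Default behaviour: the first line becomes the header, so it is "lost" as data
bad = pd.read_csv(io.StringIO(text), sep=";")

# header=None keeps the first line as data; names= supplies the column labels
df = pd.read_csv(io.StringIO(text), sep=";", header=None, names=["TEXT", "EMOTION"])
print(df)
```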

How do you remove every second row in a pandas dataframe? [duplicate]

This question already has answers here:
pandas read_csv remove blank rows
(4 answers)
Closed 1 year ago.
I’ve read a file into a dataframe, and every second row is n/a. How do I remove the offending blank rows?
There are probably many ways to do this, but I just use iloc:
df = df.iloc[::2,:]
Try it and let me know if it worked for you.
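A sketch comparing the positional slice with a content-based alternative, on made-up data where every second row is all NaN. df.iloc[::2, :] only works if the blank rows are strictly at odd positions; dropna(how="all") drops any row whose values are all missing, wherever it appears.

```python
import numpy as np
import pandas as pd

# Hypothetical data where every second row is entirely NaN
df = pd.DataFrame({"a": [1, np.nan, 3, np.nan], "b": ["x", np.nan, "z", np.nan]})

# Positional approach: keep rows 0, 2, 4, ...
by_position = df.iloc[::2, :]

# Content-based approach: drop rows where every value is missing
by_content = df.dropna(how="all")
print(by_content)
```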

Python Grouping on one column and detailing min and max alphabetical values from another column from a number of rows [duplicate]

This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
I am fairly new to Python; coming from SQL, I have been using pandas to build reports from CSV files with reasonable success. I have been able to answer most of my questions thanks mainly to this site, but I don't seem to be able to find an answer to this one:
I have a dataframe with 2 columns. I want to group on the first column and display the lowest and highest alphabetical values from the second column, concatenated into a third column. I could do this fairly easily in SQL, but as I say, I am struggling to get my head around it in Python/pandas.
example:
source data:
LINK_NAME,CITY_NAME
Linka,Citya
Linka,Cityz
Linkb,Cityx
Linkb,Cityc
Desired output:
LINK_NAME,LINKID
Linka,CityaCityz
Linkb,CitycCityx
Edit:
Sorry for missing part of your question.
To sort the strings within each group alphabetically, you could define a function to apply to the grouped items:
def first_and_last_alpha(series):
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])
df.groupby("LINK_NAME")["CITY_NAME"].apply(first_and_last_alpha)
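A runnable sketch on the question's sample data, adding reset_index(name="LINKID") so the result matches the desired two-column output. The second variant, concatenating the group-wise min and max strings, is a swapped-in alternative that gives the same result without a custom function.

```python
import pandas as pd

df = pd.DataFrame({
    "LINK_NAME": ["Linka", "Linka", "Linkb", "Linkb"],
    "CITY_NAME": ["Citya", "Cityz", "Cityx", "Cityc"],
})

def first_and_last_alpha(series):
    # Sort the group's strings and join the alphabetically first and last
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])

# reset_index(name=...) turns the grouped Series back into a DataFrame
# and names the result column LINKID
out = (
    df.groupby("LINK_NAME")["CITY_NAME"]
    .apply(first_and_last_alpha)
    .reset_index(name="LINKID")
)

# Alternative: alphabetical min + max, concatenated with string addition
g = df.groupby("LINK_NAME")["CITY_NAME"]
alt = (g.min() + g.max()).reset_index(name="LINKID")
print(out)
```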
Original:
Your question seems to be a duplicate of this one.
The same effect, with your data, is achieved by:
df.groupby("LINK_NAME")["CITY_NAME"].apply(lambda x: "".join(x))
where df is your pandas.DataFrame object.
In future, it's good to provide a reproducible example, including anything you've attempted before posting. For example, the output from df.to_dict() would allow me to recreate your example data instantly.
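A quick sketch of that round trip (the two-row frame is just an example): to_dict() with its default orient produces a plain {column: {index: value}} dict that can be pasted into a question, and pd.DataFrame() rebuilds the same data from it.

```python
import pandas as pd

df = pd.DataFrame({"LINK_NAME": ["Linka", "Linkb"], "CITY_NAME": ["Citya", "Cityx"]})

# Default orient="dict": {column -> {index -> value}}
d = df.to_dict()
print(d)

# Anyone can recreate the same DataFrame from the pasted dict
rebuilt = pd.DataFrame(d)
```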

Pandas: Dictionary of Dataframes [duplicate]

This question already has answers here:
How can you dynamically create variables? [duplicate]
(8 answers)
Closed 5 years ago.
I have a function that I made to analyze experimental data (each in an individual .txt file).
This function outputs a dictionary ({}) of pandas DataFrames.
Is there an efficient way to iterate over this dictionary and output individual DataFrames?
Let's say my dictionary is called analysisdict
for key in analysisdict.keys():
    dfx = pd.concat([analysisdict[key]['X'], analysisdict[key]['Y']], axis=1)
Where dfx would be an individual dataframe. (I'm guessing a second loop might be required? Perhaps I should iterate through a list of df names?)
The output would be df1...dfn
EDIT: I initially misread your question, and thought you wanted to concatenate all the DataFrames into one. This does that:
dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)
(Thanks to @paul-h for the ignore_index=True tip.)
Reading your question more carefully, I realized that you're asking how to assign each DataFrame in your dictionary to its own variable, resulting in separate DataFrames named df1, df2, ..., dfn. In my experience, dynamically creating variables this way is an anti-pattern; the DataFrames are better kept in a dictionary. Check out the discussion here: How can you dynamically create variables via a while loop?
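A sketch of the dictionary-based alternative, assuming (hypothetically) that each value in analysisdict is a DataFrame with at least 'X' and 'Y' columns and that the keys are file names. The X/Y frame for each file is stored under its key instead of in variables df1...dfn; passing the dict to pd.concat stacks them and records the key as an outer index level.

```python
import pandas as pd

# Hypothetical stand-in for the function's output: one DataFrame per file
analysisdict = {
    "run1.txt": pd.DataFrame({"X": [0, 1], "Y": [10, 11], "Z": [5, 6]}),
    "run2.txt": pd.DataFrame({"X": [0, 1], "Y": [20, 21], "Z": [7, 8]}),
}

# One X/Y DataFrame per key, instead of separate variables df1 ... dfn
xy_frames = {
    key: pd.concat([frame["X"], frame["Y"]], axis=1)
    for key, frame in analysisdict.items()
}

# Look one up by key, much as you would have used a variable name
first = xy_frames["run1.txt"]

# Or stack them all; the dict keys become the outer "source" index level
combined = pd.concat(xy_frames, names=["source", "row"])
print(combined)
```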

Copy Pandas DataFrame using '=' trick [duplicate]

This question already has an answer here:
pandas dataframe, copy by value
(1 answer)
Closed 5 years ago.
I have a pandas DataFrame, sdm. I want to create a copy of that DataFrame and work on it, and later create another copy from sdm for a different analysis. However, when I create a new DataFrame like this,
new_df = sdm
It appears to create a copy; however, when I alter new_df, the changes also show up in my old DataFrame sdm. How can I handle this without using =?
In Python, assignment never copies an object: new_df = sdm just binds a second name to the same DataFrame. Try this:
new_df = sdm.copy()
I think you should have searched more; I am sure there are lots of questions on this topic!
You need to use new_df = sdm.copy() instead, which is described in the official documentation. new_df = sdm doesn't work because the assignment copies nothing: it makes new_df refer to the same object as sdm, which means, in a nutshell, that both new_df and sdm reference the same data in memory.
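A small sketch of the difference on a made-up one-column frame: modifying through the plain alias changes sdm, while a DataFrame.copy() (deep=True by default) can be modified independently.

```python
import pandas as pd

sdm = pd.DataFrame({"a": [1, 2, 3]})

# Plain assignment: alias and sdm are the same object
alias = sdm
alias.loc[0, "a"] = 100  # this change is visible through sdm too

# .copy() creates a new DataFrame with its own data
new_df = sdm.copy()
new_df.loc[1, "a"] = 200  # sdm is unaffected
print(sdm)
```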
