This question already has answers here:
Pandas Merging 101
(8 answers)
Closed 2 years ago.
I have two data frames. One is called top_10_unique_users and looks like this:
The other, called artists, looks like this:
I am trying to do an inner join based on artistID by writing:
import pandas as pd
top_10_unique_users.join(artists, on=top_10_unique_users.artistID)
However, when I do that, the join is clearly not working properly: it pairs rows with different IDs instead of matching each artistID to the same ID in the artists table, as shown below:
You can use the merge function; it lets you specify different column names in the two dataframes:
import pandas as pd
pd.merge(top_10_unique_users, artists, how='left', left_on='artistID', right_on='id')
I cannot test the code, since you only provided screenshots rather than actual code, but that should work.
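Since there are no sample frames to run against, here is a minimal sketch with invented values (the column names artistID and id follow the screenshots as described); note it uses how='inner' to give the inner join the question asked about:

import pandas as pd

# Hypothetical stand-ins for the screenshotted frames
top_10_unique_users = pd.DataFrame({'artistID': [1, 2], 'plays': [100, 50]})
artists = pd.DataFrame({'id': [1, 2], 'name': ['Artist A', 'Artist B']})

# left_on/right_on lets the key columns have different names in each frame
merged = pd.merge(top_10_unique_users, artists,
                  how='inner', left_on='artistID', right_on='id')
print(merged)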
This question already has answers here:
Pandas groupby with delimiter join
(2 answers)
Closed 8 months ago.
I have a CSV file called Project.csv
I am reading this file using pandas: df = pd.read_csv('Project.csv', low_memory=False)
Inside this CSV file there are rows with duplicate Project ID and Name, but the other columns' data are unique. I am looking for a way to find duplicate rows based on Project ID and merge the unique values of the other columns with ','.
Project.csv
I am looking to store this record in a data frame and filter it to make it look like this:
A simple groupby will do the job:
after_df = your_df.groupby(['Project Id'])[['Project Name','Project Types','Owner']].agg(set)
This will give you a result similar to what you want. If you want to strip the set braces and quote characters so you get a nice-looking string, do this:
after_df.astype(str).replace(r'{|}|\'','',regex=True)
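Alternatively, since the goal is a comma-separated string, you could join each group's unique values directly and skip the cleanup step. A sketch with invented rows mirroring the structure described above:

import pandas as pd

# Hypothetical rows with a duplicated Project Id
your_df = pd.DataFrame({
    'Project Id': [1, 1, 2],
    'Project Name': ['Alpha', 'Alpha', 'Beta'],
    'Project Types': ['A', 'B', 'C'],
    'Owner': ['Sam', 'Alex', 'Sam'],
})

# join the unique values of each column with ',' per Project Id
after_df = (your_df.groupby('Project Id')[['Project Name', 'Project Types', 'Owner']]
            .agg(lambda col: ','.join(col.unique())))
print(after_df)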
This question already has answers here:
Import multiple CSV files into pandas and concatenate into one DataFrame
(20 answers)
How do I combine two dataframes?
(8 answers)
Closed 8 months ago.
I am trying to join a lot of CSV files into a single dataframe after doing some conversions and filters. When I use the append method for the sn2 dataframe, the exported CSV contains all the data I want. However, when I build the sn3 dataframe, only the data from the last CSV is exported. What am I missing?
import os
import pandas as pd

sn2 = pd.DataFrame()
sn3 = pd.DataFrame()
files = os.listdir(load_path)
for file in files:
    df_temp = pd.read_csv(load_path + file)
    df_temp['Date'] = file.split('.')[0]
    df_temp['Date'] = pd.to_datetime(df_temp['Date'], format='%Y%m%d%H%M')
    filter1 = df_temp['Name'] == 'Atribute1'
    temp1 = df_temp[filter1]
    sn2 = sn2.append(temp1)
    filter2 = df_temp['Name'] == 'Atribute2'
    temp2 = df_temp[filter2]
    sn3 = pd.concat([temp2])
You have to pass all the dataframes that you want to concatenate to concat:
sn3 = pd.concat([sn3, temp2])
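As a side note, DataFrame.append was deprecated in pandas 1.4 and removed in 2.0, so a common pattern now is to collect the filtered frames in lists and call concat once after the loop. A minimal sketch under the same assumptions as your loop (load_path defined, 'Name' column present):

import os
import pandas as pd

sn2_parts = []
sn3_parts = []
for file in os.listdir(load_path):
    df_temp = pd.read_csv(load_path + file)
    df_temp['Date'] = pd.to_datetime(file.split('.')[0], format='%Y%m%d%H%M')
    sn2_parts.append(df_temp[df_temp['Name'] == 'Atribute1'])
    sn3_parts.append(df_temp[df_temp['Name'] == 'Atribute2'])

# one concat per output frame, after the loop
sn2 = pd.concat(sn2_parts, ignore_index=True)
sn3 = pd.concat(sn3_parts, ignore_index=True)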
This question already has answers here:
Joining pandas DataFrames by Column names
(3 answers)
Pandas Merging 101
(8 answers)
Closed last year.
I am following this article, but I was only able to get it to work when the column titles matched. Both files contain computer names, but the columns are titled differently. How can I modify my command so that it still joins on the same underlying column? Is that possible?
lj_df2 = pd.merge(d2, d3, on="PrimaryUser", how="left")
For example, I have the command above, but in my other CSV the column is Employee # rather than PrimaryUser.
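If I understand the setup correctly, merge accepts left_on/right_on, so the key columns don't need matching names. A hedged sketch with hypothetical frames, assuming the second file's column is literally titled 'Employee #':

import pandas as pd

# Hypothetical frames: the key column is named differently in each
d2 = pd.DataFrame({'PrimaryUser': ['alice', 'bob'], 'Computer': ['PC1', 'PC2']})
d3 = pd.DataFrame({'Employee #': ['alice', 'bob'], 'Dept': ['IT', 'HR']})

# left_on/right_on pair up differently named key columns
lj_df2 = pd.merge(d2, d3, left_on='PrimaryUser', right_on='Employee #', how='left')
print(lj_df2)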
This question already has answers here:
Concatenate strings from several rows using Pandas groupby
(8 answers)
Closed 3 years ago.
I am fairly new to Python and, having come from SQL, I have been using pandas to build reports from CSV files with reasonable success. I have been able to answer most of my questions thanks mainly to this site, but I don't seem to be able to find an answer to this one:
I have a dataframe with two columns. I want to group on the first column and display the lowest and highest alphabetical values from the second column, concatenated into a third column. I could do this fairly easily in SQL, but I am struggling to get my head around it in Python/pandas.
example:
source data:
LINK_NAME,CITY_NAME
Linka,Citya
Linka,Cityz
Linkb,Cityx
Linkb,Cityc
Desired output:
LINK_NAME,LINKID
Linka,CityaCityz
Linkb,CitycCityx
Edit:
Sorry for missing part of your question.
To sort the strings within each group alphabetically, you could define a function to apply to the grouped items:
def first_and_last_alpha(series):
    # sort alphabetically, then join the first and last entries
    sorted_series = series.sort_values()
    return "".join([sorted_series.iloc[0], sorted_series.iloc[-1]])
df.groupby("LINK_NAME")["CITY_NAME"].apply(first_and_last_alpha)
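A quick check against the example data from the question (assuming first_and_last_alpha is defined as above):

import pandas as pd

df = pd.DataFrame({
    'LINK_NAME': ['Linka', 'Linka', 'Linkb', 'Linkb'],
    'CITY_NAME': ['Citya', 'Cityz', 'Cityx', 'Cityc'],
})

print(df.groupby('LINK_NAME')['CITY_NAME'].apply(first_and_last_alpha))
# Linka    CityaCityz
# Linkb    CitycCityx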
Original:
Your question seems to be a duplicate of this one.
The same effect, with your data, is achieved by:
df.groupby("LINK_NAME")["CITY_NAME"].apply(lambda x: "".join(x))
where df is your pandas.DataFrame object.
In the future, it's good to provide a reproducible example, including anything you've attempted, when posting. For example, the output from df.to_dict() would allow me to recreate your example data instantly.
This question already has answers here:
How can you dynamically create variables? [duplicate]
(8 answers)
Closed 5 years ago.
I have a function that I made to analyze experimental data (each experiment is an individual .txt file).
This function outputs a dictionary ({}) of pandas DataFrames.
Is there an efficient way to iterate over this dictionary and output individual dataframes?
Let's say my dictionary is called analysisdict
for key in analysisdict.keys():
    dfx = pd.concat([analysisdict[key]['X'], analysisdict[key]['Y']], axis=1)
Where dfx would be an individual dataframe. (I'm guessing a second loop might be required? Perhaps I should iterate through a list of df names?)
The output would be df1...dfn
EDIT: I initially misread your question, and thought you wanted to concatenate all the DataFrames into one. This does that:
dfx = pd.concat([df for df in analysisdict.values()], ignore_index=True)
(Thanks to @paul-h for the ignore_index=True tip)
I read your question more carefully and realized that you're asking how to assign each DataFrame in your dictionary to its own variable, resulting in separate DataFrames named df1, df2, ..., dfn. Everything in my experience says that dynamically creating variables this way is an anti-pattern, and that the job is best left to dictionaries. Check out the discussion here: How can you dynamically create variables via a while loop?
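If the aim is simply to work with each frame on its own, a plain dictionary keeps them addressable by name without dynamic variables. A minimal sketch, with a hypothetical analysisdict standing in for yours and assuming each stored frame has 'X' and 'Y' columns as in your loop:

import pandas as pd

# Hypothetical dictionary of DataFrames, shaped like the function's output
analysisdict = {'run1': pd.DataFrame({'X': [1, 2], 'Y': [3, 4], 'Z': [5, 6]}),
                'run2': pd.DataFrame({'X': [7, 8], 'Y': [9, 0], 'Z': [1, 2]})}

# build one combined X/Y frame per key, still looked up by name
xy_frames = {key: pd.concat([d['X'], d['Y']], axis=1)
             for key, d in analysisdict.items()}

# access any individual result by its original key
print(xy_frames['run1'])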