Duplicate of: Pandas Merging 101 (8 answers). Closed 3 years ago.
I have 2 DataFrames as shown below:
df1:
OSIED geometry
257005 POLYGON ((311852.712 178933.993, 312106.023 17...
017049 POLYGON ((272943.107 137755.159, 272647.627 13...
017032 POLYGON ((276637.425 146141.397, 276601.509 14.
df2:
small_area Median_BER
2570059001 212.9
017049002 212.9
217112003 212.9
I need to search for each df1.OSIED value inside df2.small_area using "contains" logic, and when it matches, get all the columns from both dataframes:
osied   geometry                 small_area  ber
257005  POLYGON ((311852.71 ...  2570059001  212.9
I am new to Python; which function does this? The isin function isn't useful here.
Updated:
Try this:
if any(df1.col1.isin(df2.col1)):
    pd.concat([df1, df2], axis=1)
I think what you are probably looking for is some kind of merge. You can do:
df2.merge(df1, left_on='col2', right_on='col1', how='inner')
or change the 'how' argument based on what you're looking for.
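A plain merge needs an exact key on both sides, which the question doesn't have. One hedged option, assuming df2.small_area always starts with the six-digit OSIED code (an assumption based on the sample data), is to derive a join key first:

```python
import pandas as pd

# Stand-in frames mimicking the question's data; the geometry
# strings are placeholders.
df1 = pd.DataFrame({
    "OSIED": ["257005", "017049"],
    "geometry": ["POLYGON ((311852.712 ...", "POLYGON ((272943.107 ..."],
})
df2 = pd.DataFrame({
    "small_area": ["2570059001", "017049002", "217112003"],
    "Median_BER": [212.9, 212.9, 212.9],
})

# Derive the join key: the first six characters of small_area.
# Adjust the slice length if the real code length differs.
df2["OSIED"] = df2["small_area"].str[:6]

merged = df2.merge(df1, on="OSIED", how="inner")
print(merged)
```

Only the two small areas whose prefix matches an OSIED code survive the inner join; 217112003 is dropped.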
Duplicate of: Filter pandas DataFrame by substring criteria (17 answers). Closed 2 years ago.
I have 2 dataframes df1 and df2.
I would like to get all rows in df1 that have an exact string match in column B of df2.
This is df1:
df1={"columnA":['apple,cherry','pineple,lemon','banana, pear','cherry, pear, lemon']}
df1=pd.DataFrame(df1)
This is df2:
df2={"columnB":['apple','cherry']}
df2=pd.DataFrame(df2)
The code below outputs an incorrect result:
df1[df1['columnA'].str.contains('|'.join(df2['columnB'].values))]
"pineapple" is not supposed to appear, as it is not an exact match.
How can I get a result like this:
Without actual reproducible code it's harder to help you, but I think this should work:
words = [rf'\b{string}\b' for string in df2.columnB]
df1[df1['columnA'].str.contains('|'.join(words))]
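Put together with the question's frames, the word-boundary pattern keeps only rows containing a whole-word match:

```python
import pandas as pd

df1 = pd.DataFrame({"columnA": ['apple,cherry', 'pineple,lemon',
                                'banana, pear', 'cherry, pear, lemon']})
df2 = pd.DataFrame({"columnB": ['apple', 'cherry']})

# \b anchors each search term at word boundaries, so a term can
# only match as a complete word, never as a substring of a longer one.
words = [rf'\b{w}\b' for w in df2.columnB]
result = df1[df1['columnA'].str.contains('|'.join(words))]
print(result)
```

Rows 0 and 3 survive, matching the expected answer shown below.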
df1={"columnA":['apple,cherry','pineple,lemon','banana, pear','cherry, pear, lemon']}
df1=pd.DataFrame(df1)
df2={"columnB":['apple','cherry']}
df2=pd.DataFrame(df2)
A longer way of doing it, but correct and simple:
list1 = []
for i in range(0, len(df1)):
    for j in range(0, len(df2)):
        if df2["columnB"][j] in df1["columnA"][i]:
            list1.append(i)
            break
df = df1.loc[list1]
Answer:
               columnA
0         apple,cherry
3  cherry, pear, lemon
You were very close, but you need to apply the word-boundary operator of regex (as a raw string, otherwise Python interprets \b as a backspace character):
df1[df1['columnA'].str.contains(r"\b(" + '|'.join(df2['columnB'].values) + r")\b")]
This will look for the complete words.
Duplicate of: Pandas Merging 101 (8 answers). Closed 3 years ago.
So here's my daily challenge:
I have an Excel file containing a list of streets, and some of those streets will be doubled (or tripled) based on their road type. For instance:
In another Excel file, I have the street names (without duplicates) and their mean distances between features, such as this:
Both Excel files have been converted to pandas dataframes like so:
duplicates_df = pd.DataFrame()
duplicates_df['Street_names'] = street_names
dist_df=pd.DataFrame()
dist_df['Street_names'] = names_dist_values
dist_df['Mean_Dist'] = dist_values
dist_df['STD'] = std_values
I would like to find a way to append the mean distance and STD values multiple times in duplicates_df whenever a street has more than one occurrence, but I am struggling with the proper syntax. This is probably an easy fix, but I've never done this before.
The desired output would be:
Any help would be greatly appreciated!
Thanks again!
pd.merge(duplicates_df, dist_df, on="Street_names")
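A minimal sketch with made-up street data (the question's actual Excel contents aren't shown) illustrating why this works: the merge repeats the distance and STD values once for every occurrence of a street in duplicates_df.

```python
import pandas as pd

# Hypothetical data standing in for the two Excel files.
duplicates_df = pd.DataFrame({
    "Street_names": ["Main St", "Main St", "Oak Ave"],
})
dist_df = pd.DataFrame({
    "Street_names": ["Main St", "Oak Ave"],
    "Mean_Dist": [12.5, 8.1],
    "STD": [1.2, 0.9],
})

# Each row of duplicates_df keeps its own copy of the matched
# Mean_Dist and STD values from dist_df.
merged = pd.merge(duplicates_df, dist_df, on="Street_names")
print(merged)
```

"Main St" appears twice in the input, so its distance and STD rows appear twice in the output.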
Duplicate of: Pandas: Selecting DataFrame rows between two dates (Datetime Index) (3 answers); Select rows between two DatetimeIndex dates (2 answers). Closed 4 years ago.
I've got a data frame of weekly stock price returns that are indexed by date, as follows.
FTSE_350 SP_500
2005-01-14 -0.004498 -0.001408
2005-01-21 0.001287 -0.014056
2005-01-28 0.011469 0.002988
2005-02-04 0.016406 0.027037
2005-02-11 0.015315 0.001887
I would like to return a data frame of rows where the index is in some interval, let's say all dates in January 2005. I'm aware that I could do this by turning the index into a "Date" column, but I was wondering if there's any way to do this directly.
Yup, there is, and it's even simpler than creating a column!
Using the .loc accessor, just slice the dates out, like:
print(df.loc['2005-01-01':'2005-01-31'])
Output:
FTSE_350 SP_500
2005-01-14 -0.004498 -0.001408
2005-01-21 0.001287 -0.014056
2005-01-28 0.011469 0.002988
Btw, if the index entries are strings (object dtype), do:
df.index = pd.to_datetime(df.index)
before everything.
As @Peter mentioned, the best is:
print(df.loc['2005-01'])
Also outputs:
FTSE_350 SP_500
2005-01-14 -0.004498 -0.001408
2005-01-21 0.001287 -0.014056
2005-01-28 0.011469 0.002988
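Both styles can be confirmed with a self-contained sketch, assuming the index has been parsed to datetimes as the answer advises:

```python
import pandas as pd

# Rebuild the question's frame with a proper DatetimeIndex.
df = pd.DataFrame(
    {"FTSE_350": [-0.004498, 0.001287, 0.011469, 0.016406, 0.015315],
     "SP_500":   [-0.001408, -0.014056, 0.002988, 0.027037, 0.001887]},
    index=pd.to_datetime(["2005-01-14", "2005-01-21", "2005-01-28",
                          "2005-02-04", "2005-02-11"]),
)

# Explicit date-range slice...
jan_slice = df.loc["2005-01-01":"2005-01-31"]
# ...or partial-string indexing on the whole month.
jan_partial = df.loc["2005-01"]
```

Both return the same three January rows; partial-string indexing is just the shorter spelling.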
Duplicate of: Pandas Merging 101 (8 answers). Closed 4 years ago.
I have a dictionary of pandas dataframes; each frame contains timestamps and the market caps corresponding to those timestamps. The keys are:
coins = ['dashcoin','litecoin','dogecoin','nxt']
I would like to create a new key 'merged' in the dictionary and, using the pd.merge method, merge the 4 existing dataframes according to their timestamp (I want complete rows, so the 'inner' join method is appropriate).
Sample of one of the data frames:
data2['nxt'].head()
Out[214]:
timestamp nxt_cap
0 2013-12-04 15091900
1 2013-12-05 14936300
2 2013-12-06 11237100
3 2013-12-07 7031430
4 2013-12-08 6292640
I'm currently getting a result using this code:
data2['merged'] = data2['dogecoin']
for coin in coins:
    data2['merged'] = pd.merge(left=data2['merged'], right=data2[coin], left_on='timestamp', right_on='timestamp')
but this repeats 'dogecoin' in 'merged'. However, if data2['merged'] is not initialized to data2['dogecoin'] (or some similar frame), the merge won't work, as 'merged' would have no values to merge on.
EDIT: my desired result is to create one merged dataframe, stored as a new element data2['merged'] in the dictionary, containing the merged data frames from the other elements of data2.
Try replacing the generalized pd.merge() with the named DataFrame's own .merge(), but you must seed the result with an actual first frame:
data2['merged'] = data2['dashcoin']
# leave out the first element
for coin in coins[1:]:
    data2['merged'] = data2['merged'].merge(data2[coin], on='timestamp')
Since you've already made coins a list, why not just do something like
data2['merged'] = data2[coins[0]]
for coin in coins[1:]:
    data2['merged'] = pd.merge(....
Unless I'm misunderstanding, this question isn't specific to dataframes, it's just about how to write a loop when the first element has to be treated differently to the rest.
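The same seed-then-loop pattern can also be written without special-casing the first element at all, using functools.reduce. This is a sketch with minimal stand-in frames (the question's real data isn't fully shown):

```python
from functools import reduce

import pandas as pd

# Minimal stand-in for data2: two tiny frames sharing a timestamp column.
data2 = {
    "dashcoin": pd.DataFrame({"timestamp": ["2013-12-04", "2013-12-05"],
                              "dash_cap": [100, 110]}),
    "nxt": pd.DataFrame({"timestamp": ["2013-12-04", "2013-12-05"],
                         "nxt_cap": [15091900, 14936300]}),
}
coins = ["dashcoin", "nxt"]

# reduce folds the list of frames into one inner-joined frame,
# merging each frame onto the accumulated result in turn.
data2["merged"] = reduce(
    lambda left, right: left.merge(right, on="timestamp", how="inner"),
    (data2[coin] for coin in coins),
)
print(data2["merged"])
```

reduce treats the first frame as the initial accumulator, which is exactly the "seed with the first element" step the loop versions do by hand.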
Duplicate of: How to access pandas groupby dataframe by key (6 answers). Closed 8 years ago.
I want to group a dataframe by a column, called 'A', and inspect a particular group.
grouped = df.groupby('A', sort=False)
However, I don't know how to access a group, for example, I expect that
grouped.first()
would give me the first group
Or
grouped['foo']
would give me the group where A=='foo'.
However, Pandas doesn't work like that.
I couldn't find a similar example online.
Try grouped.get_group('foo'); that is what you need.
from io import StringIO  # in Python 2: from StringIO import StringIO
import pandas
data = pandas.read_csv(StringIO("""\
area,core,stratum,conc,qual
A,1,a,8.40,=
A,1,b,3.65,=
A,2,a,10.00,=
A,2,b,4.00,ND
A,3,a,6.64,=
A,3,b,4.96,=
"""), index_col=[0,1,2])
groups = data.groupby(level=['area', 'stratum'])
groups.get_group(('A', 'a')) # make sure it's a tuple
conc qual
area core stratum
A 1 a 8.40 =
2 a 10.00 =
3 a 6.64 =