Reindexing and filling on one level of a hierarchical index in pandas - python

I have a pandas dataframe with a two level hierarchical index ('item_id' and 'date'). Each row has columns for a variety of metrics for a particular item in a particular month. Here's a sample:
                    total_annotations  unique_tags
date       item_id
2007-04-01 2                       30           14
2007-05-01 2                       32           16
2007-06-01 2                       36           19
2008-07-01 2                       81           33
2008-11-01 2                       82           34
2009-04-01 2                       84           35
2010-03-01 2                       90           35
2010-04-01 2                      100           36
2010-11-01 2                      105           40
2011-05-01 2                      106           40
2011-07-01 2                      108           42
2005-08-01 3                      479          200
2005-09-01 3                      707          269
2005-10-01 3                      980          327
2005-11-01 3                     1176          373
2005-12-01 3                     1536          438
2006-01-01 3                     1854          497
2006-02-01 3                     2206          560
2006-03-01 3                     2558          632
2007-02-01 3                     5650         1019
As you can see, each item does not have observations for every consecutive month. What I want to do is reindex the dataframe such that each item has rows for each month in a specified range. Now, this is easy to accomplish for any given item. So, for item_id 99, for example:
baseDateRange = pd.date_range('2005-07-01','2013-01-01',freq='MS')
data.xs(99,level='item_id').reindex(baseDateRange,method='ffill')
But with this method, I'd have to iterate through all the item_ids, then merge everything together, which seems woefully over-complicated.
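For illustration, the per-item loop I'm trying to avoid would look roughly like this (just a sketch, using data and baseDateRange from above):
filled = {}
for item in data.index.get_level_values('item_id').unique():
    # reindex each item's slice onto the full monthly range and forward-fill
    filled[item] = (data.xs(item, level='item_id')
                        .reindex(baseDateRange, method='ffill'))
filled = pd.concat(filled, names=['item_id', 'date'])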
So how can I apply this to the full dataframe, ffill-ing the observations (but also the item_id index) such that each item_id has properly filled rows for all the dates in baseDateRange?

Essentially for each group you want to reindex and ffill. The apply gets passed a data frame that has the item_id and date still in the index, so reset, then set and reindex with filling.
idx is your baseDateRange from above.
In [33]: df.groupby(level='item_id').apply(
             lambda x: x.reset_index().set_index('date').reindex(idx, method='ffill')).head(30)
Out[33]:
                    item_id  annotations  tags
item_id
2       2005-07-01      NaN          NaN   NaN
        2005-08-01      NaN          NaN   NaN
        2005-09-01      NaN          NaN   NaN
        2005-10-01      NaN          NaN   NaN
        2005-11-01      NaN          NaN   NaN
        2005-12-01      NaN          NaN   NaN
        2006-01-01      NaN          NaN   NaN
        2006-02-01      NaN          NaN   NaN
        2006-03-01      NaN          NaN   NaN
        2006-04-01      NaN          NaN   NaN
        2006-05-01      NaN          NaN   NaN
        2006-06-01      NaN          NaN   NaN
        2006-07-01      NaN          NaN   NaN
        2006-08-01      NaN          NaN   NaN
        2006-09-01      NaN          NaN   NaN
        2006-10-01      NaN          NaN   NaN
        2006-11-01      NaN          NaN   NaN
        2006-12-01      NaN          NaN   NaN
        2007-01-01      NaN          NaN   NaN
        2007-02-01      NaN          NaN   NaN
        2007-03-01      NaN          NaN   NaN
        2007-04-01        2           30    14
        2007-05-01        2           32    16
        2007-06-01        2           36    19
        2007-07-01        2           36    19
        2007-08-01        2           36    19
        2007-09-01        2           36    19
        2007-10-01        2           36    19
        2007-11-01        2           36    19
        2007-12-01        2           36    19
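Note that item_id ends up both as the outer index level and as a column in this output. If you want a tidy (item_id, date) MultiIndex back, a possible clean-up step (a sketch, not part of the original answer) is to drop the duplicated column and name the index levels:
result = df.groupby(level='item_id').apply(
    lambda x: x.reset_index().set_index('date').reindex(idx, method='ffill'))
result = result.drop('item_id', axis=1)     # the value is already in the index
result.index.names = ['item_id', 'date']    # name both levels explicitly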

Building on Jeff's answer, I consider this to be somewhat more readable. It is also considerably more efficient, since only the droplevel and reindex methods are used.
df = df.set_index(['item_id', 'date'])

def fill_missing_dates(x, idx=all_dates):
    x.index = x.index.droplevel('item_id')
    return x.reindex(idx, method='ffill')

filled_df = (df.groupby('item_id')
               .apply(fill_missing_dates))
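One detail: all_dates is not defined in the snippet above; it is assumed to be the baseDateRange from the question. The resulting index levels can also be named afterwards (a sketch):
# assumed to be the monthly range from the question
all_dates = pd.date_range('2005-07-01', '2013-01-01', freq='MS')

# after the groupby/apply above, name the resulting index levels
filled_df.index.names = ['item_id', 'date']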

Related

Continuous dates for products in Pandas

I have started to work with Pandas and I have some issues that I don't really know how to solve.
I have a dataframe with date, product, stock and sales. Some dates and products are missing. I would like to get a time series for each product over a range of dates.
For example:
            product  udsStock  udsSales
date
2019-12-26       14       161       848
2019-12-27       14      1340       914
2019-12-30       14       856         0
2019-12-25        4      3132       439
2019-12-27        4      3177       616
2020-01-01        4       500       883
It has to be the same range for all products, even if a product doesn't appear on some of the dates in the range.
If I want the range 2019-12-25 to 2020-01-01, the final dataframe should be like this one:
            product  udsStock  udsSales
date
2019-12-25       14       NaN       NaN
2019-12-26       14       161       848
2019-12-27       14      1340       914
2019-12-28       14       NaN       NaN
2019-12-29       14       NaN       NaN
2019-12-30       14       856         0
2019-12-31       14       NaN       NaN
2020-01-01       14       NaN       NaN
2019-12-25        4      3132       439
2019-12-26        4       NaN       NaN
2019-12-27        4      3177       616
2019-12-28        4       NaN       NaN
2019-12-29        4       NaN       NaN
2019-12-30        4       NaN       NaN
2019-12-31        4       NaN       NaN
2020-01-01        4       500       883
I have tried to reindex by the range, but it doesn't work because there are duplicate index values.
idx = pd.date_range('25-12-2019', '01-01-2020')
df = df.reindex(idx)
I have also tried indexing by date and product and then reindexing, but I don't know how to fill in the products that are missing.
Any more ideas?
Thanks in advance
We can use pd.date_range together with a groupby and reindex to achieve your result:
date_range = pd.date_range(start='2019-12-25', end='2020-01-01', freq='D')
df = df.groupby('product', sort=False).apply(lambda x: x.reindex(date_range))
df['product'] = df.groupby(level=0)['product'].ffill().bfill()
df = df.droplevel(0)
            product  udsStock  udsSales
2019-12-25     14.0       NaN       NaN
2019-12-26     14.0     161.0     848.0
2019-12-27     14.0    1340.0     914.0
2019-12-28     14.0       NaN       NaN
2019-12-29     14.0       NaN       NaN
2019-12-30     14.0     856.0       0.0
2019-12-31     14.0       NaN       NaN
2020-01-01     14.0       NaN       NaN
2019-12-25      4.0    3132.0     439.0
2019-12-26      4.0       NaN       NaN
2019-12-27      4.0    3177.0     616.0
2019-12-28      4.0       NaN       NaN
2019-12-29      4.0       NaN       NaN
2019-12-30      4.0       NaN       NaN
2019-12-31      4.0       NaN       NaN
2020-01-01      4.0     500.0     883.0
Convert the index to datetime:
df2.index = pd.to_datetime(df2.index)
Create unique combinations of date and product:
import itertools
idx = pd.date_range("25-12-2019", "01-01-2020")
product = df2["product"].unique()
temp = itertools.product(idx, product)
temp = pd.MultiIndex.from_tuples(temp, names=["date", "product"])
temp
MultiIndex([('2019-12-25', 14),
('2019-12-25', 4),
('2019-12-26', 14),
('2019-12-26', 4),
('2019-12-27', 14),
('2019-12-27', 4),
('2019-12-28', 14),
('2019-12-28', 4),
('2019-12-29', 14),
('2019-12-29', 4),
('2019-12-30', 14),
('2019-12-30', 4),
('2019-12-31', 14),
('2019-12-31', 4),
('2020-01-01', 14),
('2020-01-01', 4)],
names=['date', 'product'])
Reindex the dataframe:
df2.set_index("product", append=True).reindex(temp).sort_index(
level=1, ascending=False
).reset_index(level="product")
            product  udsStock  udsSales
date
2020-01-01       14       NaN       NaN
2019-12-31       14       NaN       NaN
2019-12-30       14     856.0       0.0
2019-12-29       14       NaN       NaN
2019-12-28       14       NaN       NaN
2019-12-27       14    1340.0     914.0
2019-12-26       14     161.0     848.0
2019-12-25       14       NaN       NaN
2020-01-01        4     500.0     883.0
2019-12-31        4       NaN       NaN
2019-12-30        4       NaN       NaN
2019-12-29        4       NaN       NaN
2019-12-28        4       NaN       NaN
2019-12-27        4    3177.0     616.0
2019-12-26        4       NaN       NaN
2019-12-25        4    3132.0     439.0
In R, specifically the tidyverse, this can be achieved with the complete function. In Python, the pyjanitor package has something similar, but a few kinks remain to be ironed out (a PR has already been submitted for this).
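For reference, the same full grid can also be built in plain pandas with pd.MultiIndex.from_product instead of itertools.product (a sketch, reusing idx and df2 from the answer above):
temp = pd.MultiIndex.from_product([idx, df2['product'].unique()],
                                  names=['date', 'product'])
out = (df2.set_index('product', append=True)   # (date, product) index
          .reindex(temp)
          .reset_index(level='product'))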

Locating columns values in pandas dataframe with conditions

We have a dataframe (df_source):
Unnamed: 0 DATETIME DEVICE_ID COD_1 DAT_1 COD_2 DAT_2 COD_3 DAT_3 COD_4 DAT_4 COD_5 DAT_5 COD_6 DAT_6 COD_7 DAT_7
0 0 200520160941 002222111188 35 200408100500.0 12 200408100400 16 200408100300 11 200408100200 19 200408100100 35 200408100000 43
1 19 200507173541 000049000110 00 190904192701.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 20 200507173547 000049000110 00 190908185501.0 08 190908185501 NaN NaN NaN NaN NaN NaN NaN NaN NaN
3 21 200507173547 000049000110 00 190908205601.0 08 190908205601 NaN NaN NaN NaN NaN NaN NaN NaN NaN
4 22 200507173547 000049000110 00 190909005800.0 08 190909005800 NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
159 775 200529000843 000049768051 40 200529000601.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
160 776 200529000843 000049015792 00 200529000701.0 33 200529000701 NaN NaN NaN NaN NaN NaN NaN NaN NaN
161 779 200529000843 000049180500 00 200529000601.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
162 784 200529000843 000049089310 00 200529000201.0 03 200529000201 61 200529000201 NaN NaN NaN NaN NaN NaN NaN
163 786 200529000843 000049768051 40 200529000401.0 NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
We calculated values_cont, a Series of value counts, for a subset of the columns:
v_subset = ['COD_1', 'COD_2', 'COD_3', 'COD_4', 'COD_5', 'COD_6', 'COD_7']
values_cont = pd.value_counts(df_source[v_subset].values.ravel())
We obtained as a result (value, count):
00 134
08 37
42 12
40 12
33 3
11 3
03 2
35 2
43 2
44 1
61 1
04 1
12 1
60 1
05 1
19 1
34 1
16 1
Now, the question is:
How do we locate the values in the columns corresponding to each count? For instance, how to locate:
df['DEVICE_ID'] # corresponding to value '00' and count 134
df['DEVICE_ID'] # corresponding to value '08' and count 37
...
df['DEVICE_ID'] # corresponding to value '16' and count 1
I believe you need DataFrame.melt, with an aggregating join for the IDs and GroupBy.size for the counts.
This implementation results in a dataframe with a column (value) for the codes, all the associated DEVICE_IDs, and the count of IDs associated with each code.
This is an alternative to values_cont in the question.
v_subset = ['COD_1', 'COD_2', 'COD_3', 'COD_4', 'COD_5', 'COD_6', 'COD_7']
df = (df_source.melt(id_vars='DEVICE_ID', value_vars=v_subset)
               .dropna(subset=['value'])
               .groupby('value')
               .agg(DEVICE_ID=('DEVICE_ID', ','.join), count=('value', 'size'))
               .reset_index())
print(df)
value DEVICE_ID count
0 00 000049000110,000049000110,000049000110,0000490... 7
1 03 000049089310 1
2 08 000049000110,000049000110,000049000110 3
3 11 002222111188 1
4 12 002222111188 1
5 16 002222111188 1
6 19 002222111188 1
7 33 000049015792 1
8 35 002222111188,002222111188 2
9 40 000049768051,000049768051 2
10 43 002222111188 1
11 61 000049089310 1
# print DEVICE_ID for CODES == '03'
print(df.DEVICE_ID[df.value == '03'])
[out]:
1 000049089310
Name: DEVICE_ID, dtype: object
Given that the question relates to df_source, to select specific parts of the dataframe, use pandas boolean indexing:
# to return all rows where COD_1 is '00'
df_source[df_source.COD_1 == '00']
# to return only the DEVICE_ID column where COD_1 is '00'
df_source['DEVICE_ID'][df_source.COD_1 == '00']
You can use df.loc to search out the rows that match based on column values. Then from those rows you can select the column of interest and output it. There may be a more pythonic way to do this.
df2 = df.loc[df['COD_1'] == '00']
df3 = df2.loc[df2['DAT_1'] == 134]
df_out = df3['DEVICE_ID']
Here's more info on indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html

Need to iterate over row to check conditions and retrieve values from different columns if the conditions are met

I have daily price data for a stock. I am pasting the last 31 rows of the data as an example dataset below:
Date RSI Smooth max min
110 2019-02-13 38.506874 224.006543 NaN NaN
111 2019-02-14 39.567068 227.309923 NaN NaN
112 2019-02-15 43.774479 229.830776 NaN NaN
113 2019-02-18 43.651440 231.690179 NaN NaN
114 2019-02-19 43.467237 232.701976 NaN NaN
115 2019-02-20 44.370123 233.526131 NaN NaN
116 2019-02-21 45.605073 233.834988 233.834988 NaN
117 2019-02-22 46.837518 232.335179 NaN NaN
118 2019-02-25 42.087860 229.570711 NaN NaN
119 2019-02-26 39.008014 226.379526 NaN NaN
120 2019-02-27 39.542339 225.607475 NaN 225.607475
121 2019-02-28 39.051104 228.305615 NaN NaN
122 2019-03-01 48.191687 232.544289 NaN NaN
123 2019-03-05 51.909527 237.063534 NaN NaN
124 2019-03-06 52.988668 240.243201 NaN NaN
125 2019-03-07 54.205990 242.265173 NaN NaN
126 2019-03-08 54.967076 243.912033 NaN NaN
127 2019-03-11 58.080738 244.432163 244.432163 NaN
128 2019-03-12 55.587328 243.573710 NaN NaN
129 2019-03-13 51.714123 241.191933 NaN NaN
130 2019-03-14 48.948075 238.470485 NaN NaN
131 2019-03-15 46.615111 236.144640 NaN NaN
132 2019-03-18 48.219815 233.588265 NaN NaN
133 2019-03-19 41.866898 230.271903 NaN 230.271903
134 2019-03-20 34.818844 239.457110 NaN NaN
135 2019-03-22 42.167870 246.824173 NaN NaN
136 2019-03-25 60.228588 255.294124 NaN NaN
137 2019-03-26 66.896640 267.069173 NaN NaN
138 2019-03-27 68.823285 278.222343 NaN NaN
139 2019-03-28 63.654023 289.042091 289.042091 NaN
I am trying to develop the logic below:
if max > 0, then search for the previous non-zero max value and assign it to max2. Also, assign the RSI corresponding to that previous non-zero max as RSI2.
Desired output:
For line 139 in the data set, max2 will be 244.432163 and RSI2 will be 58.080738
For line 138 in the data set, max2 will be 0 and RSI2 will be 0, and so on...
I tried different approaches but was unsuccessful at getting any output, so I do not have sample code to paste.
I also tried using if statements in a loop but am unable to make it work. I am very new to programming.
First you will need to iterate over the dataframe.
Then you will need to store the previous values so they can be written on the next hit. Since you are always going back to the previous max, you can reuse it as you loop through.
Something like this (not tested, just for an idea):
last_max = 0
last_rsi = 0
for index, row in df.iterrows():
    if row['max'] > 0:
        # write via .at so the values persist in the dataframe
        # (assigning to `row` inside iterrows() does not modify df)
        df.at[index, 'max2'] = last_max
        df.at[index, 'RSI2'] = last_rsi
        last_max = row['max']  # store this max/RSI for next time
        last_rsi = row['RSI']
The right answer is to add a line of code as below:
df[['max2', 'RSI2']] = df[['max', 'RSI']].dropna(subset=['max']).shift(1).fillna(0)
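Broken down, that single line does roughly the following (intermediate steps shown for clarity, same column names as above):
# keep only the rows where a max was actually recorded
peaks = df[['max', 'RSI']].dropna(subset=['max'])
# shift within that subset so each peak row sees the previous peak's values
prev = peaks.shift(1).fillna(0)
# assignment aligns on the original index, so only those rows get max2/RSI2
df[['max2', 'RSI2']] = prev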

How to read and write a table with extra information as a dataframe and add new columns from the information

I have a file-like object generated from StringIO which is a table with lines of information ahead of the table (see below, starting from #TIMESTAMP).
I want to add extra columns to the existing table using the information "Date" and "UTCOffset - Time (subtraction)" from #TIMESTAMP and "ZenAngle" from #GLOBAL_SUMMARY.
I used the pd.read_csv command to read it, but it only worked when I skipped the first 8 rows, which include the information I need. Also, the error "TypeError: data argument can't be an iterator" was reported when I tried to import the object below as a dataframe.
#TIMESTAMP
UTCOffset,Date,Time
+00:30:32,2011-09-05,08:32:21
#GLOBAL_SUMMARY
Time,IntACGIH,IntCIE,ZenAngle,MuValue,AzimAngle,Flag,TempC,O3,Err_O3,SO2,Err_SO2,F324
08:32:21,7.3576,52.758,59.109,1.929,114.427,000000,24,291,1,,,91.9
#GLOBAL
Wavelength,S-Irradiance,Time
290.0,0.000e+00
290.5,0.000e+00
291.0,4.380e-06
291.5,2.234e-05
292.0,2.102e-05
292.5,2.204e-05
293.0,2.453e-05
293.5,2.256e-05
294.0,3.088e-05
294.5,4.676e-05
295.0,3.384e-05
295.5,3.582e-05
296.0,4.298e-05
296.5,3.774e-05
297.0,4.779e-05
297.5,7.399e-05
298.0,9.214e-05
298.5,1.080e-04
299.0,2.143e-04
299.5,3.180e-04
300.0,3.337e-04
300.5,4.990e-04
301.0,8.688e-04
301.5,1.210e-03
302.0,1.133e-03
I think you can first use read_csv to create 3 DataFrames:
import pandas as pd
import io
temp=u"""#TIMESTAMP
UTCOffset,Date,Time
+00:30:32,2011-09-05,08:32:21
#GLOBAL_SUMMARY
Time,IntACGIH,IntCIE,ZenAngle,MuValue,AzimAngle,Flag,TempC,O3,Err_O3,SO2,Err_SO2,F324
08:32:21,7.3576,52.758,59.109,1.929,114.427,000000,24,291,1,,,91.9
#GLOBAL
Wavelength,S-Irradiance,Time
290.0,0.000e+00
290.5,0.000e+00
291.0,4.380e-06
291.5,2.234e-05
292.0,2.102e-05
292.5,2.204e-05
293.0,2.453e-05
293.5,2.256e-05
294.0,3.088e-05
294.5,4.676e-05
295.0,3.384e-05
295.5,3.582e-05
296.0,4.298e-05
296.5,3.774e-05
297.0,4.779e-05
297.5,7.399e-05
298.0,9.214e-05
298.5,1.080e-04
299.0,2.143e-04
299.5,3.180e-04
300.0,3.337e-04
300.5,4.990e-04
301.0,8.688e-04
301.5,1.210e-03
302.0,1.133e-03
"""
df1 = pd.read_csv(io.StringIO(temp), skiprows=9)
print (df1)
Wavelength S-Irradiance Time
0 290.0 0.000000 NaN
1 290.5 0.000000 NaN
2 291.0 0.000004 NaN
3 291.5 0.000022 NaN
4 292.0 0.000021 NaN
5 292.5 0.000022 NaN
6 293.0 0.000025 NaN
7 293.5 0.000023 NaN
8 294.0 0.000031 NaN
9 294.5 0.000047 NaN
10 295.0 0.000034 NaN
11 295.5 0.000036 NaN
12 296.0 0.000043 NaN
13 296.5 0.000038 NaN
14 297.0 0.000048 NaN
15 297.5 0.000074 NaN
16 298.0 0.000092 NaN
17 298.5 0.000108 NaN
18 299.0 0.000214 NaN
19 299.5 0.000318 NaN
20 300.0 0.000334 NaN
21 300.5 0.000499 NaN
22 301.0 0.000869 NaN
23 301.5 0.001210 NaN
24 302.0 0.001133 NaN
df2 = pd.read_csv(io.StringIO(temp), skiprows=1, nrows=1)
print (df2)
UTCOffset Date Time
0 +00:30:32 2011-09-05 08:32:21
df3 = pd.read_csv(io.StringIO(temp), skiprows=5, nrows=1)
print (df3)
Time IntACGIH IntCIE ZenAngle MuValue AzimAngle Flag TempC O3 \
0 08:32:21 7.3576 52.758 59.109 1.929 114.427 0 24 291
Err_O3 SO2 Err_SO2 F324
0 1 NaN NaN 91.9
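From here, one way to attach the header information to the spectral table could be the following sketch (df1, df2, df3 as read above; parsing Time and UTCOffset for the subtraction the question mentions would be a further step):
# broadcast the single-row metadata onto every row of the spectral table
df1['Date'] = df2.loc[0, 'Date']
df1['UTCOffset'] = df2.loc[0, 'UTCOffset']
df1['ZenAngle'] = df3.loc[0, 'ZenAngle']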

Python CSV joining columns

I am trying to make a new column with conditional statements, using Pandas version 0.17.1. I have two CSVs, both about 100 MB in size.
What I have:
CSV1:
Index TC_NUM
1241 1105.0017
1242 1105.0018
1243 1105.0019
1244 1105.002
1245 1105.0021
1246 1105.0022
CSV2:
KEYS TC_NUM
UXS-689 3001.0045
FIT-3015 1135.0027
FIT-2994 1140.0156
FIT-2991 1910, 1942.0001, 3004.0004, 3004.0020, 3004.0026, 3004.0063, 3004.0065, 3004.0079, 3004.0084, 3004.0091, 2101.0015, 2101.0016, 2101.0017, 2101.0018, 2101.0050, 2101.0052, 2101.0054, 2101.0055, 2101.0071, 2101.0074, 2101.0075, 2206.0001, 2103.0001, 2103.0002, 2103.0009, 2103.0011, 3000.0004, 3000.0030, 1927.0020
FIT-2990 2034.0002, 3004.0035, 3004.0084, 2034.0001
FIT-2918 3001.0039, 3004.0042
What I want:
Index TC_NUM Matched_Keys
1241 1105.0017 FIT-3015
1242 1105.0018 UXS-668
1243 1105.0019 FIT-087
1244 1105.002 FIT-715
1245 1105.0021 FIT-910
1246 1105.0022 FIT-219
If a TC_NUM in CSV2 matches a TC_NUM from CSV1, the corresponding key should be written to a new column in CSV1.
Code:
dftakecolumns = pd.read_csv('JiraKeysEnv.csv')
dfmergehere = pd.read_csv('output2.csv')
s = dftakecolumns['KEYS']
a = dftakecolumns['TC_NUM']
d = dfmergehere['TC_NUM']
for crows in a:
    for toes in d:
        if toes == crows:
            print toes
dfmergehere['Matched_Keys'] = dftakecolumns.apply(toes, axis=None, join_axis=None, join='outer')
You can try this solution:
Notice - I changed the value in the first (1105.0017) and fourth (1105.0022) rows of df2 to test the merge.
print df1
Index TC_NUM
0 1241 1105.0017
1 1242 1105.0018
2 1243 1105.0019
3 1244 1105.0020
4 1245 1105.0021
5 1246 1105.0022
print df2
KEYS TC_NUM
0 UXS-689 1105.0017
1 FIT-3015 1135.0027
2 FIT-2994 1140.0156
3 FIT-2991 1105.0022, 1942.0001, 3004.0004, 3004.0020, 30...
4 FIT-2990 2034.0002, 3004.0035, 3004.0084, 2034.0001
5 FIT-2918 3001.0039, 3004.0042
#convert string column TC_NUM to dataframe df3
df3 = pd.DataFrame([ x.split(',') for x in df2['TC_NUM'].tolist() ])
#convert string df3 to float df3
df3 = df3.astype(float)
print df3
0 1 2 3 4 5 \
0 1105.0017 NaN NaN NaN NaN NaN
1 1135.0027 NaN NaN NaN NaN NaN
2 1140.0156 NaN NaN NaN NaN NaN
3 1105.0022 1942.0001 3004.0004 3004.0020 3004.0026 3004.0063
4 2034.0002 3004.0035 3004.0084 2034.0001 NaN NaN
5 3001.0039 3004.0042 NaN NaN NaN NaN
6 7 8 9 ... 19 20 \
0 NaN NaN NaN NaN ... NaN NaN
1 NaN NaN NaN NaN ... NaN NaN
2 NaN NaN NaN NaN ... NaN NaN
3 3004.0065 3004.0079 3004.0084 3004.0091 ... 2101.0074 2101.0075
4 NaN NaN NaN NaN ... NaN NaN
5 NaN NaN NaN NaN ... NaN NaN
21 22 23 24 25 26 27 \
0 NaN NaN NaN NaN NaN NaN NaN
1 NaN NaN NaN NaN NaN NaN NaN
2 NaN NaN NaN NaN NaN NaN NaN
3 2206.0001 2103.0001 2103.0002 2103.0009 2103.0011 3000.0004 3000.003
4 NaN NaN NaN NaN NaN NaN NaN
5 NaN NaN NaN NaN NaN NaN NaN
28
0 NaN
1 NaN
2 NaN
3 1927.002
4 NaN
5 NaN
[6 rows x 29 columns]
#concat column KEYS to df3
df2 = pd.concat([df2['KEYS'], df3], axis=1)
#stack - rows to one column for merging
df2 = df2.set_index('KEYS').stack().reset_index(level=1,drop=True).reset_index(name='TC_NUM')
print df2
KEYS TC_NUM
0 UXS-689 1105.0017
1 FIT-3015 1135.0027
2 FIT-2994 1140.0156
3 FIT-2991 1105.0022
4 FIT-2991 1942.0001
5 FIT-2991 3004.0004
6 FIT-2991 3004.0020
7 FIT-2991 3004.0026
8 FIT-2991 3004.0063
9 FIT-2991 3004.0065
10 FIT-2991 3004.0079
11 FIT-2991 3004.0084
12 FIT-2991 3004.0091
13 FIT-2991 2101.0015
14 FIT-2991 2101.0016
15 FIT-2991 2101.0017
16 FIT-2991 2101.0018
17 FIT-2991 2101.0050
18 FIT-2991 2101.0052
19 FIT-2991 2101.0054
20 FIT-2991 2101.0055
21 FIT-2991 2101.0071
22 FIT-2991 2101.0074
23 FIT-2991 2101.0075
24 FIT-2991 2206.0001
25 FIT-2991 2103.0001
26 FIT-2991 2103.0002
27 FIT-2991 2103.0009
28 FIT-2991 2103.0011
29 FIT-2991 3000.0004
30 FIT-2991 3000.0030
31 FIT-2991 1927.0020
32 FIT-2990 2034.0002
33 FIT-2990 3004.0035
34 FIT-2990 3004.0084
35 FIT-2990 2034.0001
36 FIT-2918 3001.0039
37 FIT-2918 3004.0042
#merge on column TC_NUM
print pd.merge(df1, df2, on=['TC_NUM'])
Index TC_NUM KEYS
0 1241 1105.0017 UXS-689
1 1246 1105.0022 FIT-2991
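For reference, on a much newer pandas than the 0.17.1 in the question (0.25 or later), the split/stack steps can be condensed with Series.explode; a sketch with the same df1 and df2:
# split the comma-separated TC_NUM strings and expand to one value per row
df2_long = (df2.assign(TC_NUM=df2['TC_NUM'].astype(str).str.split(','))
               .explode('TC_NUM'))
df2_long['TC_NUM'] = df2_long['TC_NUM'].astype(float)

# merge back onto df1 to pick up the matching KEYS
print(pd.merge(df1, df2_long, on='TC_NUM'))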
