How do I create a DataFrame with multi-level columns? - python
An existing question, Creating a Pandas Dataframe with Multi Column Index, deals with a very "regular" DataFrame where the row and column indexes are full Cartesian products and all data is present.
My situation is, alas, different. I have this kind of data:
[{"street": "Euclid", "house":42, "area":123, (1,"bedrooms"):1, (1,"bathrooms"):4},
{"street": "Euclid", "house":19, "area":234, (2,"bedrooms"):3, (2,"bathrooms"):3},
{"street": "Riemann", "house":42, "area":345, (1,"bedrooms"):5,
(1,"bathrooms"):2, (2,"bedrooms"):12, (2, "bathrooms"):17},
{"street": "Riemann", "house":19, "area":456, (1,"bedrooms"):7, (1,"bathrooms"):1}]
and I want this sort of DataFrame with both rows and columns having multi-level indexes:
                area        1                   2
street  house       bedrooms bathrooms bedrooms bathrooms
Euclid  42       123        1         4
Euclid  19       234                         3         3
Riemann 42       345        5         2       12        17
Riemann 19       456        7         1
So, the row index should be
MultiIndex([("Euclid",42),("Euclid",19),("Riemann",42),("Riemann",19)],
names=["street","house"])
and the columns index should be
MultiIndex([("area",None),(1,"bedrooms"),(1,"bathrooms"),(2,"bedrooms"),(2,"bathrooms")],
names=["floor","entity"])
and I see no way to generate these indexes from the list of dictionaries I have.
I feel there should be something better than this; hopefully someone on SO posts something much better. For now:
Create a function to process each dictionary in the list:
import pandas as pd

def process(entry):
    # Read the entry in and transpose so its keys become the column names
    m = pd.DataFrame.from_dict(entry, orient='index').T
    # Build the row MultiIndex from street and house
    m = m.set_index(['street', 'house'])
    # Split each remaining key into the two column-index levels;
    # plain keys such as "area" get None as the second level
    col1 = [ent[0] if isinstance(ent, tuple) else ent for ent in m.columns]
    col2 = [ent[-1] if isinstance(ent, tuple) else None for ent in m.columns]
    # Assign the two-level column index to m
    m.columns = [col1, col2]
    return m
Apply the function above to the data (I wrapped the list of dictionaries in a data variable):
res = [process(entry) for entry in data]
Concatenate to get the final output:
pd.concat(res)
               area        1                   2
                NaN bedrooms bathrooms bedrooms bathrooms
street  house
Euclid  42      123        1         4      NaN       NaN
        19      234      NaN       NaN        3         3
Riemann 42      345        5         2       12        17
        19      456        7         1      NaN       NaN
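The concatenated result still lacks the level names the question asks for. If wanted, they can be attached afterwards; a small sketch building on the res list above (the out name is just for illustration):

out = pd.concat(res)
# The row levels are already named by set_index; name the column levels
# to match the requested MultiIndex
out.columns.names = ["floor", "entity"]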
Related
Why am I unable to add an existing data into a new dataframe column?
I have a dataset containing 250 employee names, gender and their salary. I am trying to create a new dataframe to simply 'extract' the salary for males and females respectively. This dataframe would have 2 columns, one with Male Salaries and another with Female Salaries. From this dataframe, I would like to create a side-by-side boxplot with matplotlib to analyse if there is any gender wage gap.

# Import libraries
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

df = pd.read_csv("TMA_Data.csv")
df.head()

#Filter out female employees
df_female = df[(df.Gender == "F")]
df_female.head()

#Filter out male employees
df_male = df[(df.Gender == "M ")]
df_male.head()

#Create new dataframe with only salaries
df2 = pd.DataFrame(columns = ["Male Salaries", "Female Salaries"])
print(df2)

#Assign Male Salaries column
df2["Male Salaries"] = df_male["Salary"]
df2.head() #This works

Output:

    Male Salaries  Female Salaries
3           93046              NaN
7           66808              NaN
10          46998              NaN
16          74312              NaN
17          50178              NaN

#Assign Female Salaries column (THIS IS WHERE THE PROBLEM LIES)
df2["Female Salaries"] = df_female["Salary"]
df2.head()

Output:

    Male Salaries  Female Salaries
3           93046              NaN
7           66808              NaN
10          46998              NaN
16          74312              NaN
17          50178              NaN

How come I am unable to add the values for female salaries (nothing seems to be added)? Also, given that my eventual goal is to create two side-by-side boxplots, feel free to suggest if I can do this in a completely different way. Thank you very much!

Edit: Dataset preview:
Solution: use .reset_index:

df2 = pd.DataFrame(columns = ["Male Salaries", "Female Salaries"])
df2["Male Salaries"] = df_male["Salary"].reset_index(drop=True)
df2["Female Salaries"] = df_female["Salary"].reset_index(drop=True)

Explanation: when setting values of a column of a dataframe, they are set at their respective indices. And your Male and Female indices are obviously different, since they came from different rows of the initial dataframe.

Example:

df = pd.DataFrame([[1], [2], [3]])
df
   0
0  1
1  2
2  3

Works as you expected:

df[1] = [4, 5, 6]
df
   0  1
0  1  4
1  2  5
2  3  6

Works NOT as you expected:

df[2] = pd.Series([4, 5, 6], index=[1, 0, 999])
df
   0  1    2
0  1  4  5.0
1  2  5  4.0
2  3  6  NaN
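Since the eventual goal is just the two side-by-side boxplots, building df2 can be skipped entirely; pandas can group the plot by the Gender column directly. A sketch, assuming the Gender and Salary column names from the question:

import matplotlib.pyplot as plt
import pandas as pd

df = pd.read_csv("TMA_Data.csv")
# One box per gender value, drawn side by side
df.boxplot(column="Salary", by="Gender")
plt.show()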
How can I filter a table based on two values at the same time? [duplicate]
This question already has answers here: Pandas Merging 101 (8 answers). Closed 2 years ago.

I have two dataframes and I want to filter one dataframe by whether a pair of values is present in the other dataframe. Both dataframes share the same column names. For example, dataframe A has:

col1  col2
 1     5
-10   15
 6     7

and dataframe B has:

col1  col2
 6     7
-10   15
-1     5

So in this example, I would like to pick each value pair in A and see if it is present in B. The first row of A has the value pair 1,5, and since 1,5 is not present in B, that row would be excluded from A. The second and third rows of A have the value pairs -10,15 and 6,7, and since both are present in B I would like to keep them. So the desired output of the filtered table A would be:

col1  col2
-10   15
 6     7

How can I achieve this?

EDIT: One of the first things I tried was a merge, but the resulting dataframe was actually bigger than the original. Since merge and the Merging 101 topic were suggested, I will add the real dataframes here. Dataframe A has id, latitude and longitude columns (id is not the index). It has 363 rows:

      id        lat        lon
0      0 -33.252192 -70.765291
1      1 -33.224300 -70.780249
2      2 -33.251651 -70.797289
3      3 -33.298574 -70.770133
4      4 -33.214315 -70.787822
..   ...        ...        ...
358  499 -33.227614 -70.770126
359  501 -33.299217 -70.770685
360  502 -33.191476 -70.801492
361  503 -33.239037 -70.780278
362  504 -33.263893 -70.762674

Dataframe B has 73096 rows and also has id, latitude and longitude columns. I'm putting here only lat and lon:

             lat        lon
1     -33.260415 -70.713767
2     -33.461718 -70.853525
3     -33.258741 -70.638032
4     -33.544858 -70.578624
8     -33.535512 -70.574188
...          ...        ...
97724 -33.451817 -70.847999
97725 -33.452225 -70.846520
97726 -33.450841 -70.841494
97729 -33.461407 -70.856090
97730 -33.457633 -70.822085

So I want to see if the lat,lon pair in A is present in B and, if not, exclude it from A. When I do A.merge(B) I get a dataframe that is 1108 rows long.
You can try pandas.merge. Something like:

df1.merge(df2, how='inner', left_on=['col1','col2'], right_on=['col1','col2'])

(To help you remember, the naming of these arguments comes from an inner join in database terminology.)
A simple merge will do:

df_out = dfA.merge(dfB)

Output:

   col1  col2
0   -10    15
1     6     7

df.merge does an inner join by default.
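Note that the asker's edit reports A.merge(B) returning more rows than A itself; that is what happens when B contains duplicate key pairs, since an inner merge produces one output row per matching pair. De-duplicating B's key columns first keeps the result no larger than A. A sketch, assuming the frames are named A and B with the lat/lon columns from the question (and bearing in mind this still matches on exact float equality):

# Keep each (lat, lon) pair from B only once before merging,
# so every row of A is matched at most once
keys = B[["lat", "lon"]].drop_duplicates()
A_filtered = A.merge(keys, on=["lat", "lon"], how="inner")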
How to find out the difference between two dataframes irrespective of index? [duplicate]
I have two data frames df1 and df2, where df2 is a subset of df1. How do I get a new data frame (df3) which is the difference between the two data frames? In other words, a data frame that has all the rows/columns in df1 that are not in df2?
By using drop_duplicates:

pd.concat([df1,df2]).drop_duplicates(keep=False)

Update: the above method only works for those data frames that don't already have duplicates themselves. For example:

df1=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})
df2=pd.DataFrame({'A':[1],'B':[2]})

It will output like below, which is wrong.

Wrong output:

pd.concat([df1, df2]).drop_duplicates(keep=False)
Out[655]:
   A  B
1  2  3

Correct output:

Out[656]:
   A  B
1  2  3
2  3  4
3  3  4

How to achieve that?

Method 1: using isin with tuple:

df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))]
Out[657]:
   A  B
1  2  3
2  3  4
3  3  4

Method 2: merge with indicator:

df1.merge(df2,indicator = True, how='left').loc[lambda x : x['_merge']!='both']
Out[421]:
   A  B     _merge
1  2  3  left_only
2  3  4  left_only
3  3  4  left_only
For rows, try this, where Name is the joint index column (it can be a list for multiple common columns, or specify left_on and right_on):

m = df1.merge(df2, on='Name', how='outer', suffixes=['', '_'], indicator=True)

The indicator=True setting is useful as it adds a column called _merge, with all changes between df1 and df2 categorized into 3 possible kinds: "left_only", "right_only" or "both".

For columns, try this:

set(df1.columns).symmetric_difference(df2.columns)
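To materialize just the rows unique to df1 from that merge, one can filter on the _merge flag; a small sketch building on the m snippet above:

# Rows present in df1 but not in df2, with the merge bookkeeping dropped
only_in_df1 = m.loc[m['_merge'] == 'left_only'].drop(columns='_merge')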
Method 1 of the accepted answer will not work for data frames with NaNs inside, as np.nan != np.nan. I am not sure if this is the best way, but it can be avoided by:

df1[~df1.astype(str).apply(tuple, 1).isin(df2.astype(str).apply(tuple, 1))]

It's slower, because it needs to cast the data to string, but thanks to this casting str(np.nan) == str(np.nan).

Let's go through the code. First we cast the values to string, and apply the tuple function to each row:

df1.astype(str).apply(tuple, 1)
df2.astype(str).apply(tuple, 1)

Thanks to that, we get a pd.Series object with a list of tuples. Each tuple contains a whole row from df1/df2. Then we apply the isin method on df1 to check if each tuple "is in" df2. The result is a pd.Series of bool values: True if the tuple from df1 is in df2. In the end, we negate the result with the ~ sign and apply the filter on df1. Long story short, we get only those rows from df1 that are not in df2.

To make it more readable, we may write it as:

df1_str_tuples = df1.astype(str).apply(tuple, 1)
df2_str_tuples = df2.astype(str).apply(tuple, 1)
df1_values_in_df2_filter = df1_str_tuples.isin(df2_str_tuples)
df1_values_not_in_df2 = df1[~df1_values_in_df2_filter]
import pandas as pd

# given
df1 = pd.DataFrame({'Name':['John','Mike','Smith','Wale','Marry','Tom','Menda','Bolt','Yuswa',],
                    'Age':[23,45,12,34,27,44,28,39,40]})
df2 = pd.DataFrame({'Name':['John','Smith','Wale','Tom','Menda','Yuswa',],
                    'Age':[23,12,34,44,28,40]})

# find elements in df1 that are not in df2
df_1notin2 = df1[~(df1['Name'].isin(df2['Name']) & df1['Age'].isin(df2['Age']))].reset_index(drop=True)

# output:
print('df1\n', df1)
print('df2\n', df2)
print('df_1notin2\n', df_1notin2)

# df1
#     Age   Name
# 0    23   John
# 1    45   Mike
# 2    12  Smith
# 3    34   Wale
# 4    27  Marry
# 5    44    Tom
# 6    28  Menda
# 7    39   Bolt
# 8    40  Yuswa

# df2
#     Age   Name
# 0    23   John
# 1    12  Smith
# 2    34   Wale
# 3    44    Tom
# 4    28  Menda
# 5    40  Yuswa

# df_1notin2
#     Age   Name
# 0    45   Mike
# 1    27  Marry
# 2    39   Bolt
Perhaps a simpler one-liner, with identical or different column names. It worked even when df2['Name2'] contained duplicate values:

newDf = (df1.set_index('Name1')
            .drop(df2['Name2'], errors='ignore')
            .reset_index(drop=False))
Edit 2: I figured out a new solution without the need of setting an index:

newdf = pd.concat([df1,df2]).drop_duplicates(keep=False)

Okay, I found that the highest-voted answer already contains what I figured out. Yes, we can only use this code on the condition that there are no duplicates in either of the two dfs.

I have a tricky method. First we set 'Name' as the index of the two dataframes given by the question. Since we have the same 'Name' values in the two dfs, we can just drop the 'smaller' df's index from the 'bigger' df. Here is the code:

df1.set_index('Name',inplace=True)
df2.set_index('Name',inplace=True)
newdf=df1.drop(df2.index)
Pandas now offers a new API to do data frame diffs: pandas.DataFrame.compare

df.compare(df2)
  col1       col3
  self other self other
0    a     c  NaN   NaN
2  NaN   NaN  3.0   4.0
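One caveat worth knowing: compare only works on identically-labeled dataframes (same shape, same row and column labels), so for the index-independent diff asked about here the frames must be aligned first. A sketch under that assumption (the aligned2 name is just for illustration):

# compare raises a ValueError unless both frames share shape and labels;
# reindex_like conforms df2 to df's labels, filling gaps with NaN
aligned2 = df2.reindex_like(df)
df.compare(aligned2)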
In addition to the accepted answer, I would like to propose one more, wider solution that can find a 2D set difference of two dataframes with any index/columns (they might not coincide for both dataframes). The method also allows setting a tolerance for float elements in the dataframe comparison (it uses np.isclose):

import numpy as np
import pandas as pd

def get_dataframe_setdiff2d(df_new: pd.DataFrame,
                            df_old: pd.DataFrame,
                            rtol=1e-03, atol=1e-05) -> pd.DataFrame:
    """Returns set difference of two pandas DataFrames"""
    union_index = np.union1d(df_new.index, df_old.index)
    union_columns = np.union1d(df_new.columns, df_old.columns)

    new = df_new.reindex(index=union_index, columns=union_columns)
    old = df_old.reindex(index=union_index, columns=union_columns)

    mask_diff = ~np.isclose(new, old, rtol, atol)

    df_bool = pd.DataFrame(mask_diff, union_index, union_columns)

    df_diff = pd.concat([new[df_bool].stack(),
                         old[df_bool].stack()], axis=1)
    df_diff.columns = ["New", "Old"]

    return df_diff

Example:

In [1]
df1 = pd.DataFrame({'A':[2,1,2],'C':[2,1,2]})
df2 = pd.DataFrame({'A':[1,1],'B':[1,1]})

print("df1:\n", df1, "\n")
print("df2:\n", df2, "\n")

diff = get_dataframe_setdiff2d(df1, df2)

print("diff:\n", diff, "\n")

Out [1]
df1:
   A  C
0  2  2
1  1  1
2  2  2

df2:
   A  B
0  1  1
1  1  1

diff:
     New  Old
0 A  2.0  1.0
  B  NaN  1.0
  C  2.0  NaN
1 B  NaN  1.0
  C  1.0  NaN
2 A  2.0  NaN
  C  2.0  NaN
As mentioned here, df1[~df1.apply(tuple,1).isin(df2.apply(tuple,1))] is a correct solution, but it will produce wrong output if:

df1=pd.DataFrame({'A':[1],'B':[2]})
df2=pd.DataFrame({'A':[1,2,3,3],'B':[2,3,4,4]})

In that case the above solution will give an empty DataFrame. Instead, you should use the concat method after removing duplicates from each dataframe.

Use concat with drop_duplicates:

df1=df1.drop_duplicates(keep="first")
df2=df2.drop_duplicates(keep="first")
pd.concat([df1,df2]).drop_duplicates(keep=False)
I had issues with handling duplicates when there were duplicates on one side and at least one on the other side, so I used collections.Counter to do a better diff, ensuring both sides have the same count. This doesn't return duplicates, and it won't return any row if both sides have the same count.

from collections import Counter

def diff(df1, df2, on=None):
    """
    :param on: same as pandas.df.merge(on) (a list of columns)
    """
    on = on if on else df1.columns
    df1on = df1[on]
    df2on = df2[on]
    # multiset difference in both directions, by row tuples
    c1 = Counter(df1on.apply(tuple, 'columns'))
    c2 = Counter(df2on.apply(tuple, 'columns'))
    c1c2 = c1 - c2
    c2c1 = c2 - c1
    df1ondf2on = pd.DataFrame(list(c1c2.elements()), columns=on)
    df2ondf1on = pd.DataFrame(list(c2c1.elements()), columns=on)
    df1df2 = df1.merge(df1ondf2on).drop_duplicates(subset=on)
    df2df1 = df2.merge(df2ondf1on).drop_duplicates(subset=on)
    return pd.concat([df1df2, df2df1])

> df1 = pd.DataFrame({'a': [1, 1, 3, 4, 4]})
> df2 = pd.DataFrame({'a': [1, 2, 3, 4, 4]})
> diff(df1, df2)
   a
0  1
0  2
There is a newer method in pandas, DataFrame.compare, that compares two different dataframes and returns which values changed in each column for the data records.

Example

First dataframe:

Id Customer Status      Date
1      ABC    Good  Mar 2023
2      BAC    Good  Feb 2024
3      CBA     Bad  Apr 2022

Second dataframe:

Id Customer Status      Date
1      ABC     Bad  Mar 2023
2      BAC    Good  Feb 2024
5      CBA    Good  Apr 2024

Comparing the dataframes:

print("Dataframe difference -- \n")
print(df1.compare(df2))

print("Dataframe difference keeping equal values -- \n")
print(df1.compare(df2, keep_equal=True))

print("Dataframe difference keeping same shape -- \n")
print(df1.compare(df2, keep_shape=True))

print("Dataframe difference keeping same shape and equal values -- \n")
print(df1.compare(df2, keep_shape=True, keep_equal=True))

Result:

Dataframe difference --

    Id       Status            Date
  self other   self other      self     other
0  NaN   NaN   Good   Bad       NaN       NaN
2  3.0   5.0    Bad  Good  Apr 2022  Apr 2024

Dataframe difference keeping equal values --

    Id       Status            Date
  self other   self other      self     other
0    1     1   Good   Bad  Mar 2023  Mar 2023
2    3     5    Bad  Good  Apr 2022  Apr 2024

Dataframe difference keeping same shape --

    Id       Customer       Status            Date
  self other     self other   self other      self     other
0  NaN   NaN      NaN   NaN   Good   Bad       NaN       NaN
1  NaN   NaN      NaN   NaN    NaN   NaN       NaN       NaN
2  3.0   5.0      NaN   NaN    Bad  Good  Apr 2022  Apr 2024

Dataframe difference keeping same shape and equal values --

    Id       Customer       Status            Date
  self other     self other   self other      self     other
0    1     1      ABC   ABC   Good   Bad  Mar 2023  Mar 2023
1    2     2      BAC   BAC   Good  Good  Feb 2024  Feb 2024
2    3     5      CBA   CBA    Bad  Good  Apr 2022  Apr 2024
A slight variation of liangli's nice solution that does not require changing the index of the existing dataframes:

# Join finds the df1 rows whose Name matches df2 (keeping df1's index),
# and those row labels are then dropped from df1
newdf = df1.drop(df1.join(df2.set_index('Name'), on='Name', how='inner', rsuffix='_2').index)
Finding the difference by index, assuming df2 is a subset of df1 and the indexes are carried forward when subsetting:

df1.loc[set(df1.index).symmetric_difference(set(df2.index))].dropna()

# Example
df1 = pd.DataFrame({"gender":np.random.choice(['m','f'],size=5),
                    "subject":np.random.choice(["bio","phy","chem"],size=5)},
                   index = [1,2,3,4,5])
df2 = df1.loc[[1,3,5]]

df1
  gender subject
1      f     bio
2      m    chem
3      f     phy
4      m     bio
5      f     bio

df2
  gender subject
1      f     bio
3      f     phy
5      f     bio

df3 = df1.loc[set(df1.index).symmetric_difference(set(df2.index))].dropna()
df3
  gender subject
2      m    chem
4      m     bio
Defining our dataframes:

df1 = pd.DataFrame({
    'Name': ['John','Mike','Smith','Wale','Marry','Tom','Menda','Bolt','Yuswa'],
    'Age': [23,45,12,34,27,44,28,39,40]
})
df2 = df1[df1.Name.isin(['John','Smith','Wale','Tom','Menda','Yuswa'])]

df1
    Name  Age
0   John   23
1   Mike   45
2  Smith   12
3   Wale   34
4  Marry   27
5    Tom   44
6  Menda   28
7   Bolt   39
8  Yuswa   40

df2
    Name  Age
0   John   23
2  Smith   12
3   Wale   34
5    Tom   44
6  Menda   28
8  Yuswa   40

The difference between the two would be:

df1[~df1.isin(df2)].dropna()

    Name   Age
1   Mike  45.0
4  Marry  27.0
7   Bolt  39.0

Where:

df1.isin(df2) returns a boolean mask marking the values of df1 that also appear in df2 at the same row label and column (it aligns on the index, which is why df2 keeping its original labels matters here).
~ (element-wise logical NOT) in front of the expression negates the results, so we get the elements in df1 that are NOT in df2, i.e. the difference between the two.
.dropna() drops the rows containing NaN, presenting the desired output.

Note: this only works if len(df1) >= len(df2). If df2 is longer than df1 you can reverse the expression: df2[~df2.isin(df1)].dropna()
I found the deepdiff library is a wonderful tool that also extends well to dataframes if different detail is required or ordering matters. You can experiment with diffing to_dict('records'), to_numpy(), and other exports:

import pandas as pd
from deepdiff import DeepDiff

df1 = pd.DataFrame({
    'Name': ['John','Mike','Smith','Wale','Marry','Tom','Menda','Bolt','Yuswa'],
    'Age': [23,45,12,34,27,44,28,39,40]
})
df2 = df1[df1.Name.isin(['John','Smith','Wale','Tom','Menda','Yuswa'])]

DeepDiff(df1.to_dict(), df2.to_dict())
# {'dictionary_item_removed': [root['Name'][1], root['Name'][4], root['Name'][7],
#                              root['Age'][1], root['Age'][4], root['Age'][7]]}
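If row order should not matter, deepdiff's documented ignore_order flag pairs naturally with the records form; a small sketch building on the frames above:

# Diff the row dicts as an unordered collection
DeepDiff(df1.to_dict('records'), df2.to_dict('records'), ignore_order=True)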
Symmetric difference

If you are interested in the rows that are only in one of the dataframes but not both, you are looking for the symmetric difference:

pd.concat([df1,df2]).drop_duplicates(keep=False)

⚠️ Only works if both dataframes do not contain any duplicates.

Set difference / relational algebra difference

If you are interested in the relational algebra difference / set difference, i.e. df1-df2 or df1\df2:

pd.concat([df1,df2,df2]).drop_duplicates(keep=False)

⚠️ Only works if both dataframes do not contain any duplicates.
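A tiny worked example of the second trick, to show why stacking df2 twice yields df1-df2: every row of df1 that also appears in df2 then occurs at least three times, so keep=False drops all of its copies, and rows only in df2 occur twice and are dropped as well:

import pandas as pd

df1 = pd.DataFrame({'a': [1, 2, 3]})
df2 = pd.DataFrame({'a': [2, 3, 4]})
print(pd.concat([df1, df2, df2]).drop_duplicates(keep=False))
#    a
# 0  1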
Another possible solution is to use numpy broadcasting:

df1[np.all(~np.all(df1.values == df2.values[:, None], axis=2), axis=0)]

Output:

    Name  Age
1   Mike   45
4  Marry   27
7   Bolt   39
Using a lambda function you can filter the rows with a _merge value of "left_only" to get all the rows in df1 which are missing from df2:

df3 = df1.merge(df2, how='outer', indicator=True).loc[lambda x: x['_merge']=='left_only']
Try this one:

df_new = df1.merge(df2, how='outer', indicator=True).query('_merge == "left_only"').drop(columns='_merge')

It will result in a new dataframe with the differences: the values that exist in df1 but not in df2.
Python Pandas: Forming a matrix (2D array) from the values in a dataframe (ignoring NaN values)
I have a dataframe with 12 columns (drug categories), where identical values (drug category names) can appear across the different columns:

                         DRG01                    DRG02  ...  DRG11  DRG12
0      AMOXYCILLIN ORAL SOLIDS  AMOEBICIDES ORAL SOLIDS  ...    NaN    NaN
1                VITAMIN DROPS                      NaN  ...    NaN    NaN
2      AMOXYCILLIN ORAL SOLIDS  ANTIHISTAMINES ORAL LIQ  ...    NaN    NaN
3      AMOEBICIDES ORAL LIQUID                      NaN  ...    NaN    NaN
...                        ...                      ...  ...    ...    ...
81531                      NaN                      NaN  ...    NaN    NaN

[81532 rows x 12 columns]

My objective is to create a matrix (2D array) with rows and columns consisting of the unique drug category names (ignoring/dropping the NaN values). The value of each cell would be the number of times that pair of drug category names appears together in a row. Essentially I'm trying to achieve something like this:

                         AMOXYCILLIN ORAL SOLIDS  AMOEBICIDES ORAL SOLIDS  ANTIHISTAMINES ORAL LIQ  VITAM..
AMOXYCILLIN ORAL SOLIDS                        0                        1                        1        0
AMOEBICIDES ORAL SOLIDS                        1                        1                        0        0
ANTIHISTAMINES ORAL LIQ                        1                        0                        0        0
VITAMIN DROPS                                  0                        0                        0        1
.....
.....
Like this?

from collections import Counter
from collections import defaultdict as dd
import pandas as pd

# count, for every drug, the times it appears with every other drug
connection_counter = dd(lambda: Counter())

def to_counter(row):
    # send each row to the connection_counter and add a connection
    # for each value in the row with all other drugs in the row
    for drug_name in row:
        connection_counter[drug_name].update(row)
        connection_counter[drug_name].pop(drug_name, None)  # so it won't count an appearance with itself

df.apply(lambda x: to_counter(x), axis=1)  # df is the table you have

# the table you want (DataFrame.append was removed in pandas 2.0,
# so the per-drug rows are collected and concatenated instead)
df1 = pd.concat([pd.DataFrame(connection_counter[drug_name], index=[drug_name])
                 for drug_name in connection_counter])
Using itertools.combinations and a few pandas functions you can do it quite nicely:

from itertools import combinations
import pandas as pd

# pairs_df gets a row for every pair of drugs (in columns 0, 1);
# NaN cells are dropped per row first so sorted() only compares strings
pairs = df.apply(lambda x: pd.Series(map(sorted, combinations(x.dropna(), 2))), axis=1)
pairs_df = pd.DataFrame(pairs.stack().to_list())
pairs_df["occurrences"] = 1
# Group identical combinations and count occurrences
pairs_df = pairs_df.groupby([0, 1]).sum()
# Pivot to create the requested shape
result_df = pairs_df.reset_index(level=1).pivot(columns=1)
How can I create a new column in a pandas pivot table with only matching values of populated columns?
I have a pandas pivot table that lists individuals in rows and data sources across the columns. There are hundreds of individuals down the rows and hundreds of sources along the columns:

          Desired_Value  Source_1  Source_2  Source_3  ...  Source_50
person1              20        20        20        20                
person2               5         5                   5             5
person3          Review         3         4         4             4
...
person50              1         1                                 1

What I want to do is create the Desired_Value column above. I want to pull in a value so long as it matches across all populated values (ignoring blank fields). If the values do not match, I want to show Review. I currently use this pandas command to print my df to Excel (without any Desired_Value column):

df13 = df12.pivot_table(index='person', columns='source_name', values='actual_data', aggfunc='first')

I'm new to Python, so apologies if this is a silly question.
This is one method to do it:

import numpy as np

df = df13.copy()
df = df.astype('Int64')  # So NaN and Int values can coexist

# Create the new column at the front of the data frame
df['Desired_Value'] = np.nan
cols = df.columns.tolist()
cols = cols[-1:] + cols[:-1]
df = df[cols]

# Loop over all rows; keep the value when all populated sources
# agree, otherwise flag the row for review
for idx, row in df.iterrows():
    val = row.dropna().unique()
    if len(val) == 1:
        df.loc[idx, 'Desired_Value'] = val
    else:
        df.loc[idx, 'Desired_Value'] = 'Review'

print(df)

         Desired_Value  Source_1  Source_2  Source_3  Source_50
person1             20        20        20       NaN         20
person2              5         5       NaN         5          5
person3         Review         3         4         4          4
person50             1         1       NaN       NaN          1
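For hundreds of rows the Python-level loop is fine, but a vectorized sketch of the same idea avoids it (my own sketch, assuming every person has at least one populated source; nunique counts distinct non-null values per row):

import numpy as np

# 1 distinct non-null value per row means all populated sources agree
nuniq = df13.nunique(axis=1)
# First populated value in each row, via a backfill along the columns
first = df13.bfill(axis=1).iloc[:, 0]
# Keep the agreed value where the sources match, else flag for review
df13.insert(0, 'Desired_Value', first.where(nuniq == 1, 'Review'))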