I have a dataframe:
dataframe = pd.DataFrame()
dataframe['column'] = [10,20,30,40]
I want to effectively duplicate each element 3 times so it becomes the equivalent of:
dataframe['column'] = [10,10,10,20,20,20,30,30,30,40,40,40]
I need a solution that will work for a df of any size. I also need the index to stay 1, 2, 3, 4, etc.
magic_list = [10,20,30,40]
dataframe['column'] = [x for y in [[a for i in range(3)] for a in magic_list] for x in y]
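A shorter sketch that works for a df of any size uses `Index.repeat` plus `reset_index`, so the index stays sequential instead of repeating 0, 0, 0, 1, 1, 1, ...:

```python
import pandas as pd

dataframe = pd.DataFrame({'column': [10, 20, 30, 40]})

# Repeat each row 3 times, then rebuild a sequential index
result = dataframe.loc[dataframe.index.repeat(3)].reset_index(drop=True)
print(result['column'].tolist())
# [10, 10, 10, 20, 20, 20, 30, 30, 30, 40, 40, 40]
```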
I have converted data frame rows to lists and in those list there are NaN values which I would like to remove.
This is my attempt but the NaN values are not removed
import pandas as pd
df = pd.read_excel('MainFile.xlsx', dtype = str)
df_list = df.values.tolist()
print(df_list)
print('=' * 50)
for l in df_list:
    newlist = [x for x in l if x != 'nan']
    print(newlist)
Here's a snapshot of the original data
I could find a solution using these lines (but I welcome any ideas)
for l in df_list:
    newlist = [x for x in l if x == x]
    print(newlist)
It is not working because you are comparing to the string 'nan'. If an Excel cell is empty, pandas returns it as a NaN value (a float), not the string 'nan'. Note that NaN compares unequal to everything, including itself, so a filter like x != np.nan is always True and removes nothing. Use pd.isna instead:
for l in df_list:
    newlist = [x for x in l if not pd.isna(x)]
    print(newlist)
EDIT:
If you want to get all values from the dataframe, which are not NaN, you can just do:
df.stack().tolist()
If you want to print values with the loop (as in your example), you can do:
for l in df.columns:
    print(list(df[l][df[l].notna()]))
To create a nested list with a loop:
main = []
for l in df.T.columns:
    new_list = list(df.T[l][df.T[l].notna()])
    main.append(new_list)
print(main)
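A minimal runnable sketch of the per-row version; the toy frame below is hypothetical, standing in for the Excel data:

```python
import numpy as np
import pandas as pd

# Hypothetical frame standing in for the Excel data; NaN marks empty cells
df = pd.DataFrame({'A': ['x', np.nan, 'z'], 'B': ['u', 'v', np.nan]})

# One list per row, with NaNs dropped
main = [[v for v in row if pd.notna(v)] for row in df.values.tolist()]
print(main)  # [['x', 'u'], ['v'], ['z']]
```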
You can always try a variant of the approach proposed here. Since the frame was read with dtype=str, guard against non-float values before calling np.isnan (it raises a TypeError on strings):
import numpy as np
for l in df_list:
    newlist = [x for x in l if not (isinstance(x, float) and np.isnan(x))]
    print(newlist)
I hope that this will help.
I have a function:
def func(x):
    y = pd.read_csv(x)
    return y
If I loop this function over several inputs, I expect the output to be a combined dataframe built from all those inputs:
list = ["a.csv", "b.csv", "d.csv", "e.csv"]
for i in list:
    m = func(i)
How can I get m as the combined dataframe from all the input files?
To combine DataFrames, use concat with a list comprehension.
Notice: don't use list as a variable name, because it shadows the Python built-in.
L = ["a.csv", "b.csv", "d.csv", "e.csv"]
df1 = pd.concat([func(i) for i in L])
Or in loop:
out = []
for i in L:
    m = func(i)
    out.append(m)
df1 = pd.concat(out)
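A self-contained sketch of the same pattern; StringIO buffers stand in for the hypothetical CSV files so it runs without any files on disk:

```python
import pandas as pd
from io import StringIO

def func(x):
    return pd.read_csv(x)

# StringIO buffers stand in for the hypothetical CSV files
L = [StringIO("a,b\n1,2\n"), StringIO("a,b\n3,4\n")]

# ignore_index rebuilds a clean 0..n-1 index on the combined frame
df1 = pd.concat([func(f) for f in L], ignore_index=True)
print(df1.to_dict('list'))  # {'a': [1, 3], 'b': [2, 4]}
```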
I have a df something like this:
lst = [[30029509,37337567,41511334,41511334,41511334]]
lst2 = [35619048]
lst3 = [[41511334,37337567,41511334]]
lst4 = [[37337567,41511334]]
df = pd.DataFrame()
df['0'] = lst, lst2, lst3, lst4
I need to count how many times '41511334' appears in every row.
I do this code:
df['new'] = '41511334' in str(df['0'])
And I got True in every row, but that is wrong for the second row.
What's wrong?
Thanks
str(df['0']) gives a string representation of column 0 and so includes all the data. You will then see that
'41511334' in str(df['0'])
gives True, and you assign this to every row of the 'new' column. You are looking for something like
df['new'] = df['0'].apply(lambda x: '41511334' in str(x))
or
df['new'] = df['0'].astype(str).str.contains('41511334')
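Since the question says "count how many times", not just whether the value appears, `str.count` gives per-row counts; a sketch using the question's data:

```python
import pandas as pd

lst = [[30029509, 37337567, 41511334, 41511334, 41511334]]
lst2 = [35619048]
lst3 = [[41511334, 37337567, 41511334]]
lst4 = [[37337567, 41511334]]
df = pd.DataFrame()
df['0'] = lst, lst2, lst3, lst4

# Per-row membership test and per-row occurrence count
df['new'] = df['0'].astype(str).str.contains('41511334')
df['count'] = df['0'].astype(str).str.count('41511334')
print(df['new'].tolist())    # [True, False, True, True]
print(df['count'].tolist())  # [3, 0, 2, 1]
```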
I have the following list of combinations:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
I want to create a new combination list from these two lists according to the following criterion: the second element of a tuple in list a equals the first element of a tuple in list b. Each matching pair of tuples from list a and list b combines into a new tuple:
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)]
All tuples in either list that do not satisfy the criterion are ignored.
QUESTIONS
Can I do this criteria based combination using itertools?
What is the most elegant way to solve this problem with/without using modules?
How can one display the output in an Excel sheet, with each element of a tuple in list d in a separate column, so that
d = [(1,10,21),(2,8,28),(2,8,15),(300,28,34)] is displayed in Excel as:
Col A = [1, 2, 2, 300]
Col B = [10,8,8,28]
Col C = [21,28,15,34]
pandas works like a charm for excel.
Here is the code:
a = [(1,10),(2,8),(300,28),(413,212)]
b = [(8,28), (8,15),(10,21),(28,34),(413,12)]
c = [(x, y, t) for x, y in a for z, t in b if y == z]
import pandas as pd
df = pd.DataFrame(c)
df.to_excel('MyFile.xlsx', header=False, index=False)
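To answer the itertools question: yes, the same join can be expressed with itertools.product; a minimal sketch:

```python
from itertools import product

a = [(1, 10), (2, 8), (300, 28), (413, 212)]
b = [(8, 28), (8, 15), (10, 21), (28, 34), (413, 12)]

# Pair every tuple of a with every tuple of b, keep the chained ones
d = [(x, y, t) for (x, y), (z, t) in product(a, b) if y == z]
print(d)  # [(1, 10, 21), (2, 8, 28), (2, 8, 15), (300, 28, 34)]
```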
Suppose I have three lists where one contains NaN's (I think they're 'NaNs', they get printed as '--' from a previous masked array operation):
a = [1,2,3,4,5]
b = [6,7,--,9,--]
c = [6,7,8,9,10]
I'd like to perform an operation that iterates through b and deletes the index from all lists wherever b[i] is NaN. I'm thinking something like this:
for i in range(len(b)):
    if b[i] is NaN:
        del a[i]  # etc.
b is generated by masking c under some condition earlier on in my code, something like this:
b = np.ma.MaskedArray(c, condition)
Thanks!
This is easy to do using numpy:
import numpy as np
a = np.array([1,2,3,4,5])
b = np.array([6,7,np.nan,9,np.nan])
c = np.array([6,7,8,9,10])
where_are_nans = np.isnan(b)
filtered_array = a[~where_are_nans] #note the ~ negation
print(filtered_array)
And as you can easily see, it returns:
[1 2 4]
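The same boolean mask can filter all three arrays at once, which keeps them aligned as the question asked; a minimal sketch:

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
b = np.array([6, 7, np.nan, 9, np.nan])
c = np.array([6, 7, 8, 9, 10])

# One mask, applied to every array, drops the same positions everywhere
keep = ~np.isnan(b)
a, b, c = a[keep], b[keep], c[keep]
print(a.tolist(), b.tolist(), c.tolist())
# [1, 2, 4] [6.0, 7.0, 9.0] [6, 7, 9]
```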