The objective of the code below is to create another identical pandas dataframe, where all values are replaced with zero.
input numpy as np
import pandas as pd
#Given preexisting dataframe
len(df) #Returns 1502
def zeroCreator(data):
zeroFrame = pd.DataFrame(np.zeros(len(data),1))
return zeroFrame
print(zeroCreator(df)) #Returns a TypeError: data type not understood
How do I work around this TypeError?
Edit: Thank you for all your clarifications, it appears that I hadn't entered the dataframe parameters correctly into np.zeros (missing a pair of parentheses), although a simpler solution does exist.
Just clone a new df and assign 0 to it
zero_df = df.copy()
zero_df[:] = 0
I am trying to convert a formatted string into a pandas data frame.
[['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
I am trying to use the pandas.DataFrame method to do so but the result is that this whole string is placed inside one element in the DataFrame.
Best approach would be to split the string with the '],[' delimeter and then convert to df.
import numpy as np
import pandas as pd
def stringToDF(s):
array = s.split('],[')
# Adjust the constructor parameters based on your string
df = pd.DataFrame(data=array,
#index=array[1:,0],
#columns=array[0,1:]
)
print(df)
return df
stringToDF(s)
Good luck!
Is this what you mean?
import pandas as pd
list_of_lists = [['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
result = pd.DataFrame({'result': list_of_lists})
I have a dataframe that looks something like this:
Now I simply want to return the headers of the columns that have the string "worked" to a list.
So that in this case the list only includes lst=["OBE"]
You can obtain it like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'OBE': ['Worked', 'Worked', np.nan, 'Uploaded'],
'TDG': ['Uploaded']*4,
'TMA':[np.nan]*4, 'TMCZ': ['Uploaded']*4})
columns_with_worked = (df == 'Worked').any(axis=0)
columns_with_worked[columns_with_worked].index.tolist()
['OBE']
So the solutions construct a boolean Series of which columns contain the term "Worked". Then, we only get the portion of the series related to the true label, select the labels by invoking index and return that object as a list
Let us consider the following pandas dataframe:
df = pd.DataFrame([[1,np.array([6,7])],[4,np.array([8,9])]], columns = {'A','B'})
where the B column is composed by two numpy arrays.
If we save the dataframe and the load it again, the numpy array is converted into a string.
df.to_csv('test.csv', index = False)
df.read_csv('test.csv')
Is there any simple way of solve this problem? Here is the output of the loaded dataframe.
you can pickle the data instead.
df.to_pickle('test.csv')
df = pd.read_pickle('test.csv')
This will ensure that the format remains the same. However, it is not human readable
If human readability is an issue, I would recommend converting it to a json file
df.to_json('abc.json')
df = pd.read_json('abc.json')
Use the following function to format each row.
def formatting(string_numpy):
"""formatting : Conversion of String List to List
Args:
string_numpy (str)
Returns:
l (list): list of values
"""
list_values = string_numpy.split(", ")
list_values[0] = list_values[0][2:]
list_values[-1] = list_values[-1][:-2]
return list_values
Then use the following apply function to convert it back into numpy arrays.
df[col] = df.col.apply(formatting)
I'm attempting to create a DataFrame with the following:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
# The inital set of baby names and birth rates
names =['Bob','Jessica','Mary','John','Mel']
births = [968, 155, 77, 578, 973]
#Now we wil zip them together
BabyDataSet = zip(names,births)
##we have to add the 'list' for version 3.x
print (list(BabyDataSet))
#create the DataFrame
df = DataFrame(BabyDataSet, columns = ['Names', 'Births'] )
print (df)
when I run the program I get the following error: 'data type can't be an iterator'
I read the following, 'What does the "yield" keyword do in Python?', but I do not understand how that applies to what I'm doing. Any help and further understanding would be greatly appreciated.
In python 3, zip returns an iterator, not a list like it does in python 2. Just convert it to a list as you construct the DataFrame, like this.
df = DataFrame(list(BabyDataSet), columns = ['Names', 'Births'] )
You can also create the dataframe using an alternate syntax that avoids the zip/generator issue entirely.
df = DataFrame({'Names': names, 'Births': births})
Read the documentation on initializing dataframes. Pandas simply takes the dictionary, creates one column for each entry with the key as the name and the value as the value.
Dict can contain Series, arrays, constants, or list-like objects