Python set to array and dataframe - python

Interpretation by a friendly editor:
I have data in the form of a set.
import numpy as n , pandas as p
s={12,34,78,100}
print(n.array(s))
print(p.DataFrame(s))
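The numpy line runs without an error, but it does not really convert anything: np.array(s) does not unpack a set, it wraps the whole set in a 0-dimensional object array, so the output just shows the set again.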
But when I try to create a DataFrame from it I get the following error:
ValueError: DataFrame constructor not properly called!
So is there any way to convert a python set/nested set into a numpy array/dictionary so I can create a DataFrame from it?
Original Question:
I have data in the form of a set.
Code
import numpy as n , pandas as p
s={12,34,78,100}
print(n.array(s))
print(p.DataFrame(s))
The above code returns the same set for the numpy array, and "DataFrame constructor not properly called!" in the output for the DataFrame. So is there any way to convert a Python set or nested set into a numpy array and a dictionary?

Pandas can't deal with sets (dicts are fine; you can use pd.DataFrame.from_dict(d) for those).
What you need to do is to convert your set into a list and then convert to DataFrame:
import pandas as pd
s = {12,34,78,100}
s = list(s)
print(pd.DataFrame(s))
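A minimal sketch of the list-conversion idea applied to both targets from the question. Sets have no defined order, so this assumes sorting the values is acceptable: `sorted()` gives a reproducible order, whereas `list()` follows whatever order the set happens to iterate in. The column name `value` is only illustrative.

```python
import numpy as np
import pandas as pd

s = {12, 34, 78, 100}

# Without conversion, numpy wraps the whole set in a 0-d object array
wrapped = np.array(s)

# Sorting turns the unordered set into an ordered list first
values = sorted(s)

arr = np.array(values)                        # 1-D integer array
df = pd.DataFrame(values, columns=["value"])  # one row per set element

print(wrapped.ndim)  # 0
print(arr)
print(df)
```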

You can use list(s):
import pandas as p
s = {12,34,78,100}
df = p.DataFrame(list(s))
print(df)

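Why do you want to convert it to a list first? The DataFrame() constructor accepts iterable data, and sets are iterable. Whether it accepts a set directly depends on the pandas version, though: the version in the question raises the error shown above, so list(yourSet) is the safer choice there.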
dataFrame = pandas.DataFrame(yourSet)
This will create a column header 0, which you can rename like so:
dataFrame.columns = ['columnName']
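Instead of renaming afterwards, the column name can also be passed through the constructor's `columns` parameter. This sketch uses `list(yourSet)` so it does not depend on the constructor accepting a set directly; `columnName` is the placeholder name from the answer above.

```python
import pandas as pd

yourSet = {12, 34, 78, 100}

# Name the column at construction time
dataFrame = pd.DataFrame(list(yourSet), columns=["columnName"])

# The rename-afterwards variant gives the same result
renamed = pd.DataFrame(list(yourSet))
renamed.columns = ["columnName"]

print(dataFrame.equals(renamed))  # True
```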

import numpy as n , pandas as p
s={12,34,78,100}
#Create DataFrame directly from the set (some pandas versions reject a set here, as the question shows)
df = p.DataFrame(s)
#Can also create key/value pairs (dictionaries) and then create the DataFrame;
#this is useful because the key is used as the column header and the values as the data
df1 = p.DataFrame({'Values': data} for data in s)
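The question also mentions nested sets. A Python set cannot contain ordinary sets (its elements must be hashable), so this sketch assumes the inner sets are frozensets of equal size and turns each one into a row. Sorting both levels is an assumption made only to get a reproducible layout, and the column names `first` and `second` are hypothetical.

```python
import pandas as pd

# Inner sets must be frozensets, because set elements must be hashable
nested = {frozenset({1, 2}), frozenset({3, 4})}

# Sort each inner set, then sort the rows, for a deterministic order
rows = sorted(sorted(inner) for inner in nested)

df = pd.DataFrame(rows, columns=["first", "second"])
print(df)
```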

Related

Finding the Length of a Pandas Dataframe Within a Function

The objective of the code below is to create another identical pandas dataframe, where all values are replaced with zero.
import numpy as np
import pandas as pd
#Given preexisting dataframe
len(df) #Returns 1502
def zeroCreator(data):
    zeroFrame = pd.DataFrame(np.zeros(len(data),1))
    return zeroFrame
print(zeroCreator(df)) #Returns a TypeError: data type not understood
How do I work around this TypeError?
Edit: Thank you for all your clarifications. It appears that I hadn't passed the shape to np.zeros correctly (it was missing a pair of parentheses), although a simpler solution does exist.
Just copy the DataFrame and assign 0 to every cell:
zero_df = df.copy()
zero_df[:] = 0
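For comparison, here is the fix the question's edit describes (passing the shape to `np.zeros` as one tuple) next to a constructor-based alternative: `pd.DataFrame(0, index=..., columns=...)` broadcasts the scalar over the given labels. The small `df` below is made-up sample data standing in for the question's 1502-row frame.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's preexisting DataFrame
df = pd.DataFrame({"a": [1.5, 2.5, 3.5], "b": [4, 5, 6]})

def zeroCreator(data):
    # The shape is ONE tuple; np.zeros(n, 1) would treat 1 as the dtype
    return pd.DataFrame(np.zeros((len(data), 1)))

# Same index and columns as df, with every cell set to 0
zero_like = pd.DataFrame(0, index=df.index, columns=df.columns)

print(zeroCreator(df))
print(zero_like)
```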

Convert string formatted as Pandas DataFrame into an actual DataFrame

I am trying to convert a formatted string into a pandas data frame.
[['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
I am trying to use the pandas.DataFrame method to do so but the result is that this whole string is placed inside one element in the DataFrame.
The best approach would be to split the string on the '],[' delimiter and then convert the result to a DataFrame.
import numpy as np
import pandas as pd
def stringToDF(s):
    array = s.split('],[')
    # Adjust the constructor parameters based on your string
    df = pd.DataFrame(data=array,
                      #index=array[1:,0],
                      #columns=array[0,1:]
                      )
    print(df)
    return df
stringToDF(s)
Good luck!
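Splitting on `'],['` leaves brackets and quotes inside each piece and misses list boundaries that fall across a line break. An alternative, assuming the string is a valid Python list literal like the one in the question, is `ast.literal_eval` from the standard library, which parses literals into real lists. The shortened string below is sample data in the same format; shorter rows end up padded with missing values (None).

```python
import ast

import pandas as pd

# Shortened sample in the question's format, including a line break
s = """[['CD_012','JM_022','PT_011'],['NL_003','AM_006','MB_021'],
['JU_006']]"""

# literal_eval only evaluates Python literals, never arbitrary code
list_of_lists = ast.literal_eval(s)

# One row per inner list; shorter rows are padded with None
df = pd.DataFrame(list_of_lists)
print(df)
```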
Is this what you mean?
import pandas as pd
list_of_lists = [['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
result = pd.DataFrame({'result': list_of_lists})
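If each code should get its own row rather than a whole list per cell, `DataFrame.explode` (available since pandas 0.25) unpacks the list column into long format and repeats the original row label. The two short lists are sample data.

```python
import pandas as pd

list_of_lists = [['CD_012', 'JM_022'], ['NL_003']]
result = pd.DataFrame({'result': list_of_lists})

# Each list element becomes its own row; the index shows which list it came from
long_form = result.explode('result')
print(long_form)
```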

Check Dataframe for certain string and return the column headers of the columns that string is found in

I have a dataframe that looks something like this:
Now I simply want to return the headers of the columns that contain the string "worked" as a list,
so that in this case the list is lst = ["OBE"].
You can obtain it like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'OBE': ['Worked', 'Worked', np.nan, 'Uploaded'],
'TDG': ['Uploaded']*4,
'TMA':[np.nan]*4, 'TMCZ': ['Uploaded']*4})
columns_with_worked = (df == 'Worked').any(axis=0)
print(columns_with_worked[columns_with_worked].index.tolist())  # ['OBE']
So the solution constructs a boolean Series indicating which columns contain the term "Worked". Then it keeps only the entries that are True, takes their labels via .index, and returns them as a list.
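The answer above matches cells equal to `'Worked'` exactly. If the search should ignore case or match substrings (the question writes "worked" in lower case), one option is `Series.str.contains` with `case=False`. Converting with `astype(str)` first is an assumption that keeps all-NaN columns such as `TMA` from breaking the `.str` accessor; `na=False` treats any remaining missing values as non-matches.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'OBE': ['Worked', 'Worked', np.nan, 'Uploaded'],
                   'TDG': ['Uploaded'] * 4,
                   'TMA': [np.nan] * 4, 'TMCZ': ['Uploaded'] * 4})

# One boolean per column: does any cell contain "worked", ignoring case?
has_worked = df.astype(str).apply(
    lambda col: col.str.contains('worked', case=False, regex=False, na=False).any())

lst = has_worked[has_worked].index.tolist()
print(lst)  # ['OBE']
```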

Save pandas dataframe with numpy arrays column

Let us consider the following pandas dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])
where the B column is composed of two numpy arrays.
If we save the dataframe and then load it again, the numpy arrays are converted into strings.
df.to_csv('test.csv', index=False)
df = pd.read_csv('test.csv')
Is there any simple way to solve this problem? Here is the output of the loaded dataframe.
You can pickle the data instead:
df.to_pickle('test.pkl')
df = pd.read_pickle('test.pkl')
This ensures that the format remains the same. However, the file is not human readable.
If human readability is an issue, I would recommend converting it to a JSON file instead (the arrays come back as plain Python lists, which np.array can turn back into arrays):
df.to_json('abc.json')
df = pd.read_json('abc.json')
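A quick way to confirm that the pickle route keeps the arrays intact is to round-trip a small frame and check the type of a cell. This sketch writes to a temporary directory (an assumption, so no file is left behind) and uses a `.pkl` extension.

```python
import os
import tempfile

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'test.pkl')
    df.to_pickle(path)
    loaded = pd.read_pickle(path)

# The B cells come back as numpy arrays, not strings
print(type(loaded.loc[0, 'B']))
```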
Use the following function to format each row.
def formatting(string_numpy):
    """formatting : Conversion of String List to List

    Args:
        string_numpy (str)
    Returns:
        l (list): list of values
    """
    # Assumes text such as "['6', '7']": split on ", " and strip the
    # two-character "['" prefix and "']" suffix
    list_values = string_numpy.split(", ")
    list_values[0] = list_values[0][2:]
    list_values[-1] = list_values[-1][:-2]
    return list_values
Then use the following apply call to convert each cell back into a list of values (wrap the result in np.array if you need arrays).
df[col] = df[col].apply(formatting)
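If the data has to stay in CSV, another option is to parse the column while reading it, through the `converters` argument of `pd.read_csv`. This sketch assumes the arrays were written in NumPy's default `str()` form, such as `[6 7]`, which holds for short integer arrays; NumPy abbreviates long arrays with `...`, which this parser cannot recover. The helper name `parse_array` and the `int` dtype are assumptions.

```python
import io

import numpy as np
import pandas as pd

def parse_array(text):
    # Hypothetical helper: "[6 7]" -> array([6, 7]); assumes whitespace-separated ints
    return np.array(text.strip('[]').split(), dtype=int)

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

# With no path, to_csv returns the CSV text; read it back from an in-memory buffer
csv_text = df.to_csv(index=False)
loaded = pd.read_csv(io.StringIO(csv_text), converters={'B': parse_array})

print(loaded.loc[0, 'B'])  # [6 7]
```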

Python 3.4, error when creating a DataFrame with panda

I'm attempting to create a DataFrame with the following:
from pandas import DataFrame, read_csv
import matplotlib.pyplot as plt
import pandas as pd
import sys
# The initial set of baby names and birth rates
names =['Bob','Jessica','Mary','John','Mel']
births = [968, 155, 77, 578, 973]
#Now we will zip them together
BabyDataSet = zip(names,births)
##we have to add the 'list' for version 3.x
print (list(BabyDataSet))
#create the DataFrame
df = DataFrame(BabyDataSet, columns = ['Names', 'Births'] )
print (df)
When I run the program I get the following error: 'data type can't be an iterator'
I read the following, 'What does the "yield" keyword do in Python?', but I do not understand how that applies to what I'm doing. Any help and further understanding would be greatly appreciated.
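In Python 3, zip returns an iterator, not a list as it does in Python 2. Convert it to a list once and reuse that list. Note that print(list(BabyDataSet)) already consumes the iterator, so calling list(BabyDataSet) again afterwards would give an empty list:
BabyDataSet = list(zip(names, births))
print(BabyDataSet)
df = DataFrame(BabyDataSet, columns = ['Names', 'Births'] )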
You can also create the dataframe using an alternate syntax that avoids the zip/generator issue entirely.
df = DataFrame({'Names': names, 'Births': births})
Read the documentation on initializing DataFrames. Pandas simply takes the dictionary and creates one column per entry, with the key as the column name and the value as the column data.
Dict can contain Series, arrays, constants, or list-like objects
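As a rough check that the row-oriented (zipped pairs) and column-oriented (dict) constructions give the same frame, this sketch builds both and compares them with `pandas.testing.assert_frame_equal`, which raises if they differ. The explicit `columns` list in the dict version only pins the column order.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

names = ['Bob', 'Jessica', 'Mary', 'John', 'Mel']
births = [968, 155, 77, 578, 973]

# Row-oriented: a list of (name, births) tuples
from_pairs = pd.DataFrame(list(zip(names, births)), columns=['Names', 'Births'])

# Column-oriented: one dict entry per column
from_dict = pd.DataFrame({'Names': names, 'Births': births}, columns=['Names', 'Births'])

assert_frame_equal(from_pairs, from_dict)  # raises AssertionError if they differ
print(from_dict)
```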
