Convert string formatted as Pandas DataFrame into an actual DataFrame - python

I am trying to convert a formatted string into a pandas data frame.
[['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
I am trying to use the pandas.DataFrame constructor to do so, but the whole string ends up inside a single cell of the DataFrame.

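The best approach would be to split the string on the '],[' delimiter and then convert the pieces to a DataFrame.
import numpy as np
import pandas as pd

def stringToDF(s):
    # Each piece is still one string, e.g. "'NL_003','AM_006','MB_021'";
    # the first and last pieces also keep the outer brackets
    array = s.split('],[')
    # Adjust the constructor parameters based on your string
    df = pd.DataFrame(data=array,
                      #index=array[1:,0],
                      #columns=array[0,1:]
                      )
    print(df)
    return df

stringToDF(s)  # s is the string from the question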
Good luck!
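Splitting on '],[' leaves quotes and brackets inside every cell. Because the question's string is a valid Python literal, a different technique from the one above is to parse it with ast.literal_eval from the standard library and pass the resulting list of lists straight to the constructor. A minimal sketch, using a shortened copy of the question's string:

```python
import ast

import pandas as pd

# Shortened copy of the question's string (same format, fewer lists)
s = "[['CD_012','JM_022','PT_011'],['NL_003','AM_006','MB_021'],['JU_006']]"

# literal_eval safely turns the text into a real list of lists
list_of_lists = ast.literal_eval(s)

# One row per inner list; shorter rows are padded with missing values
df = pd.DataFrame(list_of_lists)
print(df)
```

Unlike eval, literal_eval only accepts literals (strings, numbers, lists, dicts and so on), so it will not execute arbitrary code hidden in the string.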

Is this what you mean?
import pandas as pd
list_of_lists = [['CD_012','JM_022','PT_011','CD_012','JM_022','ST_049','MB_021','MB_021','CB_003'
,'FG_031','PC_004'],['NL_003','AM_006','MB_021'],
['JA_012','MB_021','MB_021','MB_021'],['JU_006'],
['FG_002','FG_002','CK_055','ST_049','NM_004','CD_012','OP_002','FG_002','FG_031',
'TG_005','SP_014'],['FG_002','FG_031'],['MD_010'],
['JA_012','MB_021','NL_003','MZ_020','MB_021'],['MB_021'],['PC_004'],
['MB_021','MB_021'],['AM_006','NM_004','TB_006','MB_021']]
result = pd.DataFrame({'result': list_of_lists})
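That answer keeps each inner list in a single cell of a 'result' column. If a long format with one code per row is more convenient, DataFrame.explode can unpack that column. A short sketch on a shortened copy of list_of_lists (the 'list_id' column name is only illustrative):

```python
import pandas as pd

# Shortened copy of the question's data
list_of_lists = [['CD_012', 'JM_022'], ['NL_003', 'AM_006', 'MB_021'], ['JU_006']]
result = pd.DataFrame({'result': list_of_lists})

# explode repeats the original row label once per list element;
# naming the index and resetting it keeps track of which list each code came from
long_df = result.explode('result').rename_axis('list_id').reset_index()
print(long_df)
```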

Related

How do I extract the date from a column in a csv file using pandas?

This is the 'aired' column in the csv file:
Link to the csv file:
https://drive.google.com/file/d/1w7kIJ5O6XIStiimowC5TLsOCUEJxuy6x/view?usp=sharing
I want to extract the date and the month (in words) from the date that follows the word 'from' and store them in separate columns in another CSV file. The 'from' key is the obstacle: if the cell held only the date, it could easily be parsed into a timestamp.
You are starting from a string and want to break out the data within it. The single quotes are a clue that this is a Python dict in string form. The Python standard library includes the ast (Abstract Syntax Trees) module, whose literal_eval function can read such a string into a dict, as described in this SO answer: Convert a String representation of a Dictionary to a dictionary?
You want to apply that to your column to get the dict, at which point you expand it into separate columns using .apply(pd.Series), based on this SO answer: Splitting dictionary/list inside a Pandas Column into Separate Columns
Try the following
import pandas as pd
import ast
df = pd.read_csv('AnimeList.csv')
# turn the pd.Series of strings into a pd.Series of dicts
aired_dict = df['aired'].apply(ast.literal_eval)
# turn the pd.Series of dicts into a pd.Series of pd.Series objects
aired_df = aired_dict.apply(pd.Series)
# pandas automatically translates that into a pd.DataFrame
# concatenate the remainder of the dataframe with the new data
df_aired = pd.concat([df.drop(['aired'], axis=1), aired_df], axis=1)
# convert the date strings to datetime values
df_aired['aired_from'] = pd.to_datetime(df_aired['from'])
df_aired['aired_to'] = pd.to_datetime(df_aired['to'])
import pandas as pd
file = pd.read_csv('file.csv')
result = []
for cell in file['aired']:
    # fixed character offsets: this assumes every cell has exactly the same layout
    date = cell[8:22]
    date_ts = pd.to_datetime(date, format='%Y-%m-%d')
    result.append((date_ts.month_name(), date_ts))
df = pd.DataFrame(result, columns=['month', 'date'])
df.to_csv('result_file.csv')
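To try the parsing steps without the linked file, here is a self-contained sketch on a two-row hypothetical sample. It assumes the cells are dict strings like the ones the first answer describes, follows that answer's literal_eval route, and adds the month in words with .dt.month_name():

```python
import ast

import pandas as pd

# Hypothetical sample: each 'aired' cell holds a dict literal stored as text
df = pd.DataFrame({'aired': ["{'from': '2012-01-08', 'to': '2012-03-25'}",
                             "{'from': '2015-10-03', 'to': None}"]})

# Parse every cell into a dict, then let the constructor spread the keys into columns
aired = pd.DataFrame(df['aired'].apply(ast.literal_eval).tolist(), index=df.index)

df['aired_from'] = pd.to_datetime(aired['from'])
df['month'] = df['aired_from'].dt.month_name()  # month in words, e.g. 'January'
df['day'] = df['aired_from'].dt.day
print(df[['aired_from', 'month', 'day']])
```

Writing just the new columns to another file is then df[['month', 'day']].to_csv('result_file.csv', index=False).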

Python set to array and dataframe

Interpretation by a friendly editor:
I have data in the form of a set.
import numpy as n , pandas as p
s={12,34,78,100}
print(n.array(s))
print(p.DataFrame(s))
The above code converts the set without a problem into a numpy array.
But when I try to create a DataFrame from it I get the following error:
ValueError: DataFrame constructor not properly called!
So is there any way to convert a python set/nested set into a numpy array/dictionary so I can create a DataFrame from it?
Original Question:
I have data in the form of a set.
Code
import numpy as n , pandas as p
s={12,34,78,100}
print(n.array(s))
print(p.DataFrame(s))
The above code returns the same set for the NumPy array, and "DataFrame constructor not properly called!" for the DataFrame. So is there any way to convert a Python set or nested set into a NumPy array and a dictionary?
The DataFrame constructor can't take a set here (dicts are fine; you can use p.DataFrame.from_dict(d) for those).
What you need to do is to convert your set into a list and then convert to DataFrame:
import pandas as pd
s = {12,34,78,100}
s = list(s)
print(pd.DataFrame(s))
You can use list(s):
import pandas as p
s = {12,34,78,100}
df = p.DataFrame(list(s))
print(df)
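Why do you want to convert it to a list first? The DataFrame() constructor accepts many kinds of iterable data, and sets are iterable:
dataFrame = pandas.DataFrame(yourSet)
Whether a set is accepted directly depends on the pandas version, though; the question shows it being rejected, so list(yourSet) is the portable choice.
This creates a column header 0, which you can rename like so:
dataFrame.columns = ['columnName']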
import numpy as n , pandas as p
s={12,34,78,100}
#Create a DataFrame directly from the set (some pandas versions reject this; list(s) always works)
df = p.DataFrame(s)
#You can also build key/value pairs (dictionaries) and create the DataFrame from them;
#this is useful because the key is used as the column header and the values as data
df1 = p.DataFrame({'Values': data} for data in s)
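A set has no defined order, so the row order produced by list(s) is arbitrary. A small sketch that sorts first for a reproducible order and names the column up front. The column name 'value' and the dict-of-sets example are illustrative assumptions; a set cannot contain other sets (sets are unhashable), so "nested" data is shown here as a dict whose values are sets:

```python
import pandas as pd

s = {12, 34, 78, 100}

# sorted() returns a list, which the constructor accepts, and fixes the row order
df = pd.DataFrame(sorted(s), columns=['value'])
print(df)

# "Nested" case: a dict of equally sized sets becomes one column per key
nested = {'a': {1, 2, 3}, 'b': {4, 5, 6}}
df2 = pd.DataFrame({key: sorted(values) for key, values in nested.items()})
print(df2)
```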

Check Dataframe for certain string and return the column headers of the columns that string is found in

I have a dataframe that looks something like this:
Now I simply want to return, as a list, the headers of the columns that contain the string "worked", so that in this case the list is lst=["OBE"].
You can obtain it like this:
import pandas as pd
import numpy as np
df = pd.DataFrame({'OBE': ['Worked', 'Worked', np.nan, 'Uploaded'],
'TDG': ['Uploaded']*4,
'TMA':[np.nan]*4, 'TMCZ': ['Uploaded']*4})
columns_with_worked = (df == 'Worked').any(axis=0)
columns_with_worked[columns_with_worked].index.tolist()
['OBE']
This solution builds a boolean Series indicating which columns contain the term "Worked". It then keeps only the entries that are True, takes their labels via .index, and returns them as a list.
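The comparison above is exact and case-sensitive: it matches only cells equal to "Worked", whereas the question spells it "worked". If case should be ignored, or the word may appear inside longer text, Series.str.contains with case=False is a possible variant. A sketch on the same example frame:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'OBE': ['Worked', 'Worked', np.nan, 'Uploaded'],
                   'TDG': ['Uploaded'] * 4,
                   'TMA': [np.nan] * 4,
                   'TMCZ': ['Uploaded'] * 4})

# Case-insensitive substring test per cell; astype(str) lets the .str accessor
# run on the all-NaN column 'TMA' too (NaN becomes the text 'nan')
hits = df.apply(lambda col: col.astype(str).str.contains('worked', case=False, regex=False))

# True for every column with at least one hit
has_hit = hits.any(axis=0)
lst = has_hit[has_hit].index.tolist()
print(lst)  # ['OBE']
```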

Save pandas dataframe with numpy arrays column

Let us consider the following pandas dataframe:
import numpy as np
import pandas as pd
df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])
where the B column is composed of two numpy arrays.
If we save the dataframe and then load it again, the numpy arrays are converted into strings.
df.to_csv('test.csv', index=False)
df = pd.read_csv('test.csv')
Is there any simple way to solve this problem? Here is the output of the loaded dataframe.
You can pickle the data instead.
df.to_pickle('test.pkl')
df = pd.read_pickle('test.pkl')
This will ensure that the format remains the same. However, it is not human readable.
If human readability is an issue, I would recommend converting it to a JSON file instead (the arrays come back as plain lists):
df.to_json('abc.json')
df = pd.read_json('abc.json')
Use the following function to format each row.
def formatting(string_numpy):
    """formatting : Conversion of String List to List
    Args:
        string_numpy (str)
    Returns:
        list_values (list): list of values
    """
    # assumes comma-separated values with two wrapper characters at each end
    list_values = string_numpy.split(", ")
    list_values[0] = list_values[0][2:]
    list_values[-1] = list_values[-1][:-2]
    return list_values
Then use apply to turn each string back into a list of values (wrap each result in np.array if you need numpy arrays):
# col is the name of the column to convert, e.g. 'B'
df[col] = df[col].apply(formatting)
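For the question's integer arrays specifically, to_csv writes each array as its str(), which is space-separated (e.g. "[6 7]") rather than comma-separated. A sketch that round-trips through an in-memory CSV and parses that form back into arrays; parse_array is a hypothetical helper name, and it assumes short 1-D integer arrays (str() abbreviates arrays longer than NumPy's print threshold, 1000 elements by default, with "...", which cannot be parsed back):

```python
import io

import numpy as np
import pandas as pd

df = pd.DataFrame([[1, np.array([6, 7])], [4, np.array([8, 9])]], columns=['A', 'B'])

# Round-trip through an in-memory CSV; each array is written as its str(), e.g. "[6 7]"
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
loaded = pd.read_csv(buf)

def parse_array(text):
    """Turn a NumPy-style string such as '[6 7]' back into a 1-D integer array."""
    return np.array([int(x) for x in text.strip('[]').split()])

loaded['B'] = loaded['B'].apply(parse_array)
print(loaded['B'].iloc[0])
```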

Transposing a pandas dataframe with multiple columns

I have a dataframe that currently looks like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-13','2017-03-14','2017-03-15'],'SP':[35.6,56.7,41,41],'1M':[-7.8,56,56,-3.4],'3M':[24,-31,53,5]}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','SP','1M','3M'])
print(df)
I would like to transpose it so that all the value fields end up in a single Value column, with the date repeated on each row, and the name of each value column becomes an entry in the Desc (description) column. That is, the resulting DataFrame should look like this:
import numpy as np
raw_data = {'Series_Date':['2017-03-10','2017-03-10','2017-03-10','2017-03-13','2017-03-13','2017-03-13','2017-03-14','2017-03-14','2017-03-14','2017-03-15','2017-03-15','2017-03-15'],'Value':[35.6,-7.8,24,56.7,56,-31,41,56,53,41,-3.4,5],'Desc':['SP','1M','3M','SP','1M','3M','SP','1M','3M','SP','1M','3M']}
import pandas as pd
df = pd.DataFrame(raw_data,columns=['Series_Date','Value','Desc'])
print(df)
Could someone please help how I can flip and transpose my DataFrame this way?
Use pd.melt to transform DF from a wide format to a long one:
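idx = "Series_Date"  # identifier variable
out = (pd.melt(df, id_vars=idx, var_name="Desc", value_name="Value")
       .sort_values(idx, kind="mergesort")  # stable sort keeps the SP, 1M, 3M order within each date
       .reset_index(drop=True))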
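A different reshaping route that needs no sort is DataFrame.stack: with Series_Date as the index, stacking emits the values row by row, so each date's SP, 1M and 3M entries stay together in their original column order. A sketch that rebuilds the question's frame; the rename relies on pandas naming the unnamed stacked level 'level_1' on reset_index:

```python
import pandas as pd

raw_data = {'Series_Date': ['2017-03-10', '2017-03-13', '2017-03-14', '2017-03-15'],
            'SP': [35.6, 56.7, 41, 41],
            '1M': [-7.8, 56, 56, -3.4],
            '3M': [24, -31, 53, 5]}
df = pd.DataFrame(raw_data, columns=['Series_Date', 'SP', '1M', '3M'])

long_df = (df.set_index('Series_Date')           # keep the date as the row key
             .stack()                              # one entry per (date, column) pair
             .reset_index(name='Value')            # unnamed stacked level becomes 'level_1'
             .rename(columns={'level_1': 'Desc'})
             [['Series_Date', 'Value', 'Desc']])   # column order of the desired output
print(long_df)
```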
