I want to clean up a source column of my dataframe. In the end I only want to keep the part stored under 'name'.
What is the best way to do this?
For example:
row 1, column 1:
{'id': 'rtl-nieuws', 'name': 'RTL Nieuws'}
row 2, column 1:
{'id': 'none', 'name': 'www.ad.nl'}
Desired outcome:
row 1, column 1:
RTL Nieuws
row 2, column 1:
www.ad.nl
Is this what you are trying to do? In the future, please consider providing a working example to start from.
import pandas as pd

data = pd.DataFrame({
    "id": ["rtl-nieuws", "none"],
    "name": ["RTL Nieuws", "www.ad.nl"]
}, index=[0, 1])
data.drop("id", axis=1)
# name
# 0 RTL Nieuws
# 1 www.ad.nl
Since each cell seems to contain the string representation of a dictionary, you can use ast.literal_eval() to parse it and access the value at the 'name' key.
import ast
current_cell = "{'id': 'rtl-nieuws', 'name': 'RTL Nieuws'}"
name = ast.literal_eval(current_cell)['name']
print(name)
>>> RTL Nieuws
I have a dataframe with a column called "Spl" containing the values below. I am trying to extract the strings next to 'name': (some rows have multiple values), but the new column ends up containing a function's memory address instead. I used the code below to extract them. Any help extracting the values after the "name:" string is much appreciated.
Column values:
'name': 'Chirotherapie', 'name': 'Innen Medizin'
'name': 'Manuelle Medizin'
'name': 'Akupunktur', 'name': 'Chirotherapie', 'name': 'Innen Medizin'
Code:
df['Spl'] = lambda x: len(x['Spl'].str.split("'name':"))
Output:
<function <lambda> at 0x0000027BF8F68940>
Simply do:
df['Spl']=df['Spl'].str.split("'name':").str.len()
Or just count the occurrences:
df['Spl'] = df['Spl'].str.count("'name':")+1
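If you actually want the names themselves rather than a count, a regex with str.findall can pull them out. This is a sketch assuming the values are always single-quoted, as in your sample:

```python
import pandas as pd

df = pd.DataFrame({"Spl": ["'name': 'Chirotherapie', 'name': 'Innen Medizin'",
                           "'name': 'Manuelle Medizin'"]})
# capture the quoted value that follows each 'name':
df["names"] = df["Spl"].str.findall(r"'name':\s*'([^']*)'")
print(df["names"].tolist())
# [['Chirotherapie', 'Innen Medizin'], ['Manuelle Medizin']]
```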
Using this view.py query, my output looks like the JSON below. As you can see, the 'choices' field contains multiple arrays, so I want to normalize them row by row. Here is my JSON:
{"pages":[{"name":"page1","title":"SurveyWindow Pvt. Ltd. Customer Feedback","description":"Question marked * are compulsory.",
"elements":[{"type":"radiogroup","name":"question1","title":"Do you like our product? *","isRequired":true,
"choices":[{"value":"Yes","text":"Yes"},{"value":"No","text":"No"}]},{"type":"checkbox","name":"question2","title":"Please Rate Our PM Skill","isRequired":false,"choices":[{"value":"High","text":"High"},{"value":"Low","text":"Low"},{"value":"Medium","text":"Medium"}]},{"type":"radiogroup","name":"question3","title":"Do you like our services? *","isRequired":true,"choices":[{"value":"Yes","text":"Yes"},{"value":"No","text":"No"}]}]}]}
This is my view.py:
jsondata=SurveyMaster.objects.all().filter(survey_id='1H2711202014572740')
q = jsondata.values('survey_json_design')
qs_json = pd.DataFrame.from_records(q)
datatotable = pd.json_normalize(qs_json['survey_json_design'], record_path=['pages','elements'])
qs_json = datatotable.to_html()
Based on your comments and picture, here's what I would do to go from the picture to something more SQL-friendly (what you refer to as "normalization"), but keep in mind this might blow up if you don't have sufficient memory.
Create a new list to fill with the new data, then iterate over the pandas table's rows, and within that over every item in your list column. In every iteration of the inner loop, use the data from the row (minus the column you're iterating over). For convenience I added it as the last element.
# Example data
import pandas as pd

df = pd.DataFrame({"choices": [[{"text": "yes", "value": "yes"},
                                {"text": "no", "value": "no"}],
                               [{"ch1": 1, "ch2": 2}, {"ch3": "ch3"}]],
                   "name": ["kostas", "rajesh"]})
data = []
for i, row in df.iterrows():
    for val in row["choices"]:
        data.append((*row.drop("choices").values, val))
df = pd.DataFrame(data, columns=["names", "choices"])
print(df)
names choices
0 kostas {'text': 'yes', 'value': 'yes'}
1 kostas {'text': 'no', 'value': 'no'}
2 rajesh {'ch1': 1, 'ch2': 2}
3 rajesh {'ch3': 'ch3'}
This is where I guess you want to go. All that's left is to just modify the column / variable names with your own data.
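As a side note, pandas 0.25+ can do the same flattening without an explicit loop via DataFrame.explode. A sketch on the same example data:

```python
import pandas as pd

df = pd.DataFrame({"choices": [[{"text": "yes", "value": "yes"},
                                {"text": "no", "value": "no"}],
                               [{"ch1": 1, "ch2": 2}, {"ch3": "ch3"}]],
                   "name": ["kostas", "rajesh"]})
# one output row per element of each 'choices' list; other columns repeat
out = df.explode("choices").reset_index(drop=True)
print(out)
```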
I have a column in a pandas data frame that contains strings in the following format, for example:
fullyRandom=true+mapSizeDividedBy64=51048
mapSizeDividedBy16000=9756+fullyRandom=false
qType=MpmcArrayQueue+qCapacity=822398+burstSize=664
count=11087+mySeed=2+maxLength=9490
capacity=27281
capacity=79882
The first row, for example, contains 2 parameters separated by '+'; within each parameter, '=' separates the parameter name from its value.
As output, I'm asking if there is a Python script that either extracts the parameters and returns a list of unique parameter names like the following
[fullyRandom, mapSizeDividedBy64, mapSizeDividedBy16000, qType, qCapacity, burstSize, count, mySeed, maxLength, capacity]
Notice that the previous list contains only the unique parameter names without their values.
Or, if it's not too difficult, an extended pandas data frame where the column is parsed into many columns, one per parameter, each storing its value.
Try this, it will store the values in a list.
data = []
with open('<your text file>', 'r') as file:
    content = file.readlines()
for row in content:
    if '+' in row:
        sub_row = row.strip('\n').split('+')
        for r in sub_row:
            data.append(r)
    else:
        data.append(row.strip('\n'))
print(data)
Output:
['fullyRandom=true', 'mapSizeDividedBy64=51048', 'mapSizeDividedBy16000=9756', 'fullyRandom=false', 'qType=MpmcArrayQueue', 'qCapacity=822398', 'burstSize=664', 'count=11087', 'mySeed=2', 'maxLength=9490', 'capacity=27281', 'capacity=79882']
To convert to a list of dicts that could be used in pandas:
dict_list = []
for item in data:
    entry = {
        item.split('=')[0]: item.split('=')[1]
    }
    dict_list.append(entry)
print(dict_list)
Output:
[{'fullyRandom': 'true'}, {'mapSizeDividedBy64': '51048'}, {'mapSizeDividedBy16000': '9756'}, {'fullyRandom': 'false'}, {'qType': 'MpmcArrayQueue'}, {'qCapacity': '822398'}, {'burstSize': '664'}, {'count': '11087'}, {'mySeed': '2'}, {'maxLength': '9490'}, {'capacity': '27281'}, {'capacity': '79882'}]
To get just the parameter names instead, append only the key in the loop:
dict_list.append(item.split('=')[0])
Output:
['fullyRandom', 'mapSizeDividedBy64', 'mapSizeDividedBy16000', 'fullyRandom', 'qType', 'qCapacity', 'burstSize', 'count', 'mySeed', 'maxLength', 'capacity', 'capacity']
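A pandas-only alternative (a sketch, assuming the column is available as a Series `s`) gets both the unique names and the wide one-column-per-parameter table in a few lines:

```python
import pandas as pd

s = pd.Series(["fullyRandom=true+mapSizeDividedBy64=51048",
               "count=11087+mySeed=2+maxLength=9490",
               "capacity=27281"])
# unique parameter names: split on '+', explode to one row per piece,
# then keep the part before '='
params = s.str.split("+").explode().str.split("=").str[0]
print(params.unique().tolist())
# one column per parameter: build a dict per row, then a DataFrame
wide = pd.DataFrame(dict(p.split("=") for p in line.split("+")) for line in s)
print(wide)
```

Rows that lack a parameter get NaN in that column, which keeps the result rectangular.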
I'm trying to select an entry from a DataFrame based on some conditions. It works, as long as there is an entry that fits the query (first example in the code shown below).
But sometimes there is no matching entry, and then the return value should be an empty string (not an error message). This I cannot achieve.
from pandas import DataFrame
data = {'ID': [1, 1, 2, 2],
        'Key': ['Key1', 'Key2', 'Key1', 'Key2'],
        'Value': ['Value1', 'Value2', 'Value3', 'Value4']
        }
df = DataFrame(data, columns=['ID', 'Key', 'Value'])
print(df.loc[(df['ID']==1) & (df['Key'] == 'Key2')].iloc[-1,-1])
# shows "Value2" (correct)
print(df.loc[(df['ID']==1) & (df['Key'] == 'Key3')].iloc[-1,-1])
# should show "" (desired output)
Thanks in advance.
The following does what you are looking for:
keyname = 'Key2' # Key3
try:
    print(df.loc[(df['ID']==1) & (df['Key'] == keyname)].iloc[-1,-1])
except IndexError:
    print("")
Alternatively, do this a bit more explicitly using a mask that you can inspect:
mask = (df['ID']==1) & (df['Key'] == 'Key3')
if mask.any():
    print(df[mask].Value.iloc[0])
else:
    print("")
I am using the Facebook API (v2.10), from which I've extracted the data I need; 95% of it is perfect. My problem is the 'actions' metric, which returns as a dictionary within a list within another dictionary.
At present, all the data is in a DataFrame; however, the 'actions' column is a list of dictionaries containing each individual action for that day.
{
"actions": [
{
"action_type": "offsite_conversion.custom.xxxxxxxxxxx",
"value": "7"
},
{
"action_type": "offsite_conversion.custom.xxxxxxxxxxx",
"value": "3"
},
{
"action_type": "offsite_conversion.custom.xxxxxxxxxxx",
"value": "144"
},
{
"action_type": "offsite_conversion.custom.xxxxxxxxxxx",
"value": "34"
}]}
All this appears in one cell (row) within the DataFrame.
What is the best way to:
Get the action type, create a new column, and use "action_type" as the column name?
List the correct value under this column?
It looks like JSON, but when I check the type, it's a pandas Series (stored as an object).
For those willing to help (thank you, I greatly appreciate it): you can either point me toward the right material and I will read it and work it out on my own (I'm not entirely sure what to look for), or, if you decide this is an easy problem, explain how and why you solved it this way. I don't just want the answer.
I have tried the following (with help from a friend) and it kind of works, but I have issues running it in my script, i.e. if it runs within a bigger code block, I get the following error:
for i in range(df.shape[0]):
    line = df.loc[i, 'Conversions']
    L = ast.literal_eval(line)
    for l in L:
        cid = l['action_type']
        value = l['value']
        df.loc[i, cid] = value
If I save the DF as a csv, call it using pd.read_csv...it executes properly, but not within the script. No idea why.
Error:
ValueError: malformed node or string: [{'value': '1', 'action_type': 'offsite_conversion.custom.xxxxx}]
Any help would be greatly appreciated.
Thanks,
Adrian
You can use json_normalize:
In [11]: d # e.g. dict from json.load OR instead pass the json path to json_normalize
Out[11]:
{'actions': [{'action_type': 'offsite_conversion.custom.xxxxxxxxxxx',
'value': '7'},
{'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '3'},
{'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '144'},
{'action_type': 'offsite_conversion.custom.xxxxxxxxxxx', 'value': '34'}]}
In [12]: pd.json_normalize(d, record_path="actions")
Out[12]:
action_type value
0 offsite_conversion.custom.xxxxxxxxxxx 7
1 offsite_conversion.custom.xxxxxxxxxxx 3
2 offsite_conversion.custom.xxxxxxxxxxx 144
3 offsite_conversion.custom.xxxxxxxxxxx 34
You can use df.join(pd.DataFrame(df['Conversions'].tolist()).pivot(columns='action_type', values='value').reset_index(drop=True)).
Explanation:
df['Conversions'].tolist() returns a list of dictionaries. This list is then transformed into a DataFrame using pd.DataFrame. Then, you can use the pivot function to pivot the table into the shape that you want.
Lastly, you can join the table with your original DataFrame. Note that this only works if your DataFrame's index is the default (i.e., integers starting from 0). If this is not the case, you can do this instead:
df2 = pd.DataFrame(df['Conversions'].tolist()).pivot(columns='action_type', values='value').reset_index(drop=True)
for col in df2.columns:
    df[col] = df2[col]
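Here is a minimal runnable sketch of that join-and-pivot approach. It assumes, as this answer does, that each cell in 'Conversions' holds a single action dict; the action-type names are shortened placeholders, not real API values:

```python
import pandas as pd

# hypothetical data: one action dict per row
df = pd.DataFrame({"date": ["2017-01-01", "2017-01-02"],
                   "Conversions": [{"action_type": "conv_a", "value": "7"},
                                   {"action_type": "conv_b", "value": "3"}]})
# expand the dicts into columns, pivot action types into headers, re-join
wide = (pd.DataFrame(df["Conversions"].tolist())
          .pivot(columns="action_type", values="value")
          .reset_index(drop=True))
out = df.join(wide)
print(out)
```

Rows that lack a given action type end up with NaN in that action's column.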