How to split a column twice using Pandas?

I have a column additional_data whose values look like {'duration': 0, 'is_incoming': False}.
I want to extract 0 and False from it. How do I split it using Python (Pandas)?
I tried: data["additional_data"] = data["additional_data"].apply(lambda x :",".join(x.split(":")[:-1]))
I want two columns, Duration and Incoming_Time.
How do I do this?

You can try converting those strings to actual dicts:
import pandas as pd
from ast import literal_eval
Then build a DataFrame from the parsed dicts:
out = pd.DataFrame(df['additional_data'].astype(str).map(literal_eval).tolist())
Printing out now shows the duration and is_incoming values as separate columns.
If needed, attach them to the original DataFrame with the join() method:
df = df.join(out)
Printing df now shows the original column alongside the two new ones.
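
A minimal end-to-end sketch of the approach above, assuming additional_data holds string representations of dicts. The two sample rows and the final renaming step are illustrative assumptions, not taken from the asker's data.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample rows: each cell is the string form of a dict.
df = pd.DataFrame({
    "additional_data": [
        "{'duration': 0, 'is_incoming': False}",
        "{'duration': 12, 'is_incoming': True}",
    ]
})

# Parse every string into a real dict, then expand the dicts into columns.
parsed = df["additional_data"].map(literal_eval)
out = pd.DataFrame(parsed.tolist(), index=df.index)

# Rename to the column names asked for in the question and attach them.
out = out.rename(columns={"duration": "Duration", "is_incoming": "Incoming_Time"})
df = df.join(out)

print(df[["Duration", "Incoming_Time"]])
```

Passing index=df.index keeps the new columns aligned with the original rows even when the DataFrame does not use the default 0..n-1 index.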

If your column additional_data contains real dicts (for example, parsed JSON), you can use the string accessor .str[] directly to get the dict values by key:
data['Duration'] = data['additional_data'].str['duration']
data['Incoming_Time'] = data['additional_data'].str['is_incoming']
If your column additional_data contains string representations of dicts (each dict wrapped in a pair of single or double quotes), you need to convert each string to a dict first:
from ast import literal_eval
data['Duration'] = data['additional_data'].map(literal_eval).str['duration']
data['Incoming_Time'] = data['additional_data'].map(literal_eval).str['is_incoming']
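
If you are not sure which of the two cases applies, one option is to parse only the cells that are still strings. The sketch below assumes that idea; extract_fields is a hypothetical helper name and the one-row Series are made-up sample data.

```python
from ast import literal_eval

import pandas as pd


def extract_fields(col: pd.Series) -> pd.DataFrame:
    """Return Duration / Incoming_Time columns from dicts or dict-like strings."""
    # Parse only the cells that are still strings; real dicts pass through.
    dicts = col.map(lambda v: literal_eval(v) if isinstance(v, str) else v)
    return pd.DataFrame({
        "Duration": dicts.str["duration"],
        "Incoming_Time": dicts.str["is_incoming"],
    })


# Hypothetical inputs covering both cases described above.
real = pd.Series([{"duration": 0, "is_incoming": False}])
text = pd.Series(["{'duration': 0, 'is_incoming': False}"])

print(extract_fields(real))
print(extract_fields(text))
```

A quick type(data['additional_data'].iloc[0]) on your own data tells you which case you are in before you pick an approach.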

Related

Convert JSON dictionary to Pandas Dataframe

I need to convert a JSON dictionary to a Pandas DataFrame, but the embedding is tripping me up.
Here is basically what the JSON dict looks like.
{
"report": "{'name':{'data':[{'key 1':'value 1','key 2':'value 2'},{'key 1':'value 1','key 2':'value 2'}]}}"
}
In the DataFrame, I want the keys to be the column headers and values in the rows below them.
The extra layer of embedding is throwing me off somewhat from all the usual methods of doing this.
One tricky part is that 'name' will change each time I get this JSON dict, so I can't use an exact string value for 'name'.

Your JSON looks a bit odd. The value under report looks more like a Python dict converted to a string, so you can use ast.literal_eval (from the standard library) to convert it to a real dict, and then use pd.json_normalize to turn it into a DataFrame:
import ast
import pandas as pd
j = ...
parsed_json = ast.literal_eval(j['report'])
# list(parsed_json)[0] is the single top-level key, whatever 'name' happens to be
df = pd.json_normalize(parsed_json, record_path=[list(parsed_json)[0], 'data'])
Output:
>>> df
     key 1    key 2
0  value 1  value 2
1  value 1  value 2

The error suggests that you're trying to index the strings (because the value under report is a string) using another string.
You just need ast.literal_eval to parse the string and the DataFrame constructor. If the "name" key is unknown, you can iterate over the values() of the dict after you parse the string.
import ast
import pandas as pd
out = pd.DataFrame(y for x in ast.literal_eval(your_data['report']).values() for y in x['data'])
Output:
     key 1    key 2
0  value 1  value 2
1  value 1  value 2
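
For reference, here is a self-contained sketch that runs both suggestions on the sample payload from the question and compares the results. The variable names top_key, df_normalized, records and df_records are made up for this illustration.

```python
import ast

import pandas as pd

# The sample payload from the question: the value under "report" is a string.
your_data = {
    "report": "{'name':{'data':[{'key 1':'value 1','key 2':'value 2'},"
              "{'key 1':'value 1','key 2':'value 2'}]}}"
}

parsed = ast.literal_eval(your_data["report"])

# Approach 1: look up the single top-level key, then normalize its "data" list.
top_key = next(iter(parsed))
df_normalized = pd.json_normalize(parsed, record_path=[top_key, "data"])

# Approach 2: ignore the key names and collect the records from every value.
records = [row for section in parsed.values() for row in section["data"]]
df_records = pd.DataFrame(records)

print(df_normalized)
print(df_records)
```

The second approach also copes with several top-level keys, since it gathers the records under each of them, while the first reads only the first key.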

How to convert a string type column to list type in pandas dataframe?

I have a dataframe as below,
df = pd.DataFrame({'URL_domains':[['wa.me','t.co','goo.gl','fb.com'],['tinyurl.com','bit.ly'],['test.in']]})
Here the column URL_domains has 3 observations, each holding a list of domains.
I would like to get the length of each observation's URL domain list with:
df['len_of_url_list'] = df['URL_domains'].map(len)
This works without issues when the column holds real lists.
In my case, however, the lists are stored as strings, and running the same code on the string-typed column does not give the list lengths.
How do I convert the column from string to list in pandas?

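Use ast.literal_eval rather than eval, because eval executes arbitrary code and is bad practice on untrusted data:
import ast
df['URL_domains'] = df['URL_domains'].map(ast.literal_eval)
df['len_of_url_list'] = df['URL_domains'].map(len)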

df["URL_domains"] = df["URL_domains"].apply(eval)
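
A small sketch of the conversion, assuming the lists arrived as strings (for example after a round trip through CSV). The sample frame mirrors the question's data but is typed in by hand here, and the char_counts and rejected names are only for illustration.

```python
import ast

import pandas as pd

# Hypothetical version of the question's frame with the lists stored as strings.
df = pd.DataFrame({
    "URL_domains": [
        "['wa.me', 't.co', 'goo.gl', 'fb.com']",
        "['tinyurl.com', 'bit.ly']",
        "['test.in']",
    ]
})

# On strings, len counts characters rather than list items.
char_counts = df["URL_domains"].map(len).tolist()

# literal_eval only accepts Python literals, so each string becomes a list.
df["URL_domains"] = df["URL_domains"].map(ast.literal_eval)
df["len_of_url_list"] = df["URL_domains"].map(len)

# Unlike eval, literal_eval refuses anything that is not a plain literal.
try:
    ast.literal_eval("__import__('os').getcwd()")
    rejected = False
except ValueError:
    rejected = True

print(char_counts)
print(df["len_of_url_list"].tolist(), rejected)
```

The last step is why literal_eval is usually preferred over eval here: a cell containing a function call raises an error instead of being executed.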

String manipulation pandas

I have a column named Timestamp of type str, and I would like to change the values of the column to a more appropriate format, i.e. 353 to 3:53 pm.
How can I do this using pandas or appropriate string manipulation?
c = pd.DataFrame({"Timestamp":x,"Latitude":y,"Longitude":z})
c.head()

This will also work:
from datetime import datetime
c['Timestamp'].apply(lambda x: datetime.strptime(x.rjust(4, '0'), '%H%M').strftime('%H:%M'))

You can call apply on the column and pass a function that will split each string and insert a colon:
c['Timestamp'].apply(lambda x: x[0:-2] + ':' + x[-2:])

(just wanted to get this in here)
As @ChrisA mentioned in the comments, you can also do this:
c['Timestamp'] = pd.to_datetime(c['Timestamp'], format='%H%M').dt.time
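
The slicing and parsing ideas above can be combined with zero-padding so that short values are handled as well. This is a sketch on made-up sample values; the sliced and parsed column names are hypothetical, and it assumes the strings are hour and minute digits with no AM/PM information.

```python
import pandas as pd

# Hypothetical sample values; real data may contain other shapes.
c = pd.DataFrame({"Timestamp": ["353", "1207", "45"]})

# Left-pad to four digits so hours and minutes always occupy two characters each.
padded = c["Timestamp"].str.zfill(4)

# Option 1: plain string slicing.
c["sliced"] = padded.str[:2] + ":" + padded.str[2:]

# Option 2: parse as a time, then format it back to text.
c["parsed"] = pd.to_datetime(padded, format="%H%M").dt.strftime("%H:%M")

print(c)
```

Parsing has the side benefit of rejecting impossible values such as 2575, which slicing would silently turn into 25:75.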

DataFrame to JSON format, using column value as value

I want to produce JSON output in the following format:
{"553588913747808256":"rumour","524949003834634240":"rumour","553221281181859841":"rumour","580322346508124160":"non-rumour","544307417677189121":"rumour"}
I have a DataFrame df_prediction_with_id whose index is set to id_str using set_index:
df_prediction_with_id
                    rumor_or_not
id_str
552800070199148544    non-rumour
544388259359387648    non-rumour
552805970536333314    non-rumour
525071376084791297        rumour
498355319979143168    non-rumour
What I've tried is to use DataFrame.to_json.
json = df_prediction_with_id.to_json(orient='index')
What I've got is:
{"552813420136128513":{"rumor_or_not":"non-rumour"},"544340943965409281":{"rumor_or_not":"non-rumour"}}
Is there any way that I could directly use the value in the column as the value? Thanks.

You can simply select the column and call .to_json():
print(df_prediction_with_id["rumor_or_not"].to_json())
Prints:
{"552800070199148544":"non-rumour","544388259359387648":"non-rumour","552805970536333314":"non-rumour","525071376084791297":"rumour","498355319979143168":"non-rumour"}
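
To see why selecting the column changes the shape of the output, here is a sketch comparing the two calls. The two-row frame is made up, and storing the ids as strings is an assumption of this example.

```python
import json

import pandas as pd

# Hypothetical two-row frame; the ids are kept as strings in this sketch.
df_prediction_with_id = pd.DataFrame(
    {"rumor_or_not": ["non-rumour", "rumour"]},
    index=pd.Index(["552800070199148544", "525071376084791297"], name="id_str"),
)

# Whole frame: each id maps to a nested {column: value} object.
nested = json.loads(df_prediction_with_id.to_json(orient="index"))

# Single column (a Series): each id maps straight to the value.
flat = json.loads(df_prediction_with_id["rumor_or_not"].to_json())

print(nested)
print(flat)
```

A DataFrame row can hold several columns, so to_json has to nest them, whereas a Series has exactly one value per index label.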

Changing string values

I have the following list:
weather = ['sunny', 'foggy', 'cloudy', 22]
I would like to use it in the schema string ('sche') of a Spark DataFrame in the following way:
sche = "f'status_{weather[0]}_today' string, f'temprature_{weather[3]_today}' int"
So that at the end I get 2 columns in my new dataframe as following:
First column: status_sunny_today
Second column: temprature_22_today
But when I run the code, it returns an error and does not recognize the format in sche above. If I just print sche, it returns: f'status_{weather[0]}_today' string, f'temprature_{weather[3]_today}' int

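This is the correct way to use f-strings in Python:
sche = f"'status_{weather[0]}_today' string, 'temprature_{weather[3]}_today' int"
Put the f before the whole string, not inside it. Also note that the closing brace belongs right after weather[3], so it is {weather[3]}_today rather than {weather[3]_today}.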
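
The difference is easy to check in plain Python, without Spark. In this sketch the names not_formatted, status_col and temperature_col are made up, and the inner single quotes from the question are left out; that choice is an assumption about the schema format, not something the thread confirms.

```python
weather = ["sunny", "foggy", "cloudy", 22]

# Without the f prefix on the outer string, the braces are kept literally.
not_formatted = "f'status_{weather[0]}_today' string"

# With the prefix on the outer string, each {...} expression is evaluated.
status_col = f"status_{weather[0]}_today"
temperature_col = f"temprature_{weather[3]}_today"  # spelling kept from the question
sche = f"{status_col} string, {temperature_col} int"

print(not_formatted)
print(sche)
```

Building the column names first keeps each f-string short and makes a misplaced brace easier to spot.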
