Convert a dataframe column into a list of objects

I am using pandas to read a CSV that contains a phone_number field (string). I need to convert this field into the JSON format below,
[{'phone_number':'+01 373643222'}], and put it in a new column called phone_numbers. How can I do that?
I searched online, but the examples I found convert all the columns into JSON using to_json(), which does not solve my case.
Below is an example:
import pandas as pd
df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice'],
'phone_number': ['+1 569-483-2388', '+1 555-555-1212', '+1 432-867-5309']})

Use the map function like this:
df["phone_numbers"] = df["phone_number"].map(lambda x: [{"phone_number": x}])
display(df)  # display() is available in notebooks; use print(df) in a plain script
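
A minimal, self-contained sketch of the same idea, using the sample data from the question. The extra phone_numbers_json column is an assumption added for illustration: it is only needed if the new column should hold JSON text rather than Python lists.

```python
import json

import pandas as pd

df = pd.DataFrame({
    "user": ["Bob", "Jane", "Alice"],
    "phone_number": ["+1 569-483-2388", "+1 555-555-1212", "+1 432-867-5309"],
})

# Wrap each phone number in a one-element list of dicts.
df["phone_numbers"] = df["phone_number"].map(lambda x: [{"phone_number": x}])

# Optional (hypothetical column name): store the same structure as a JSON string.
df["phone_numbers_json"] = df["phone_numbers"].map(json.dumps)

print(df[["user", "phone_numbers"]])
```

The first column keeps real Python objects, which is convenient for further processing; the JSON string version is what you would typically write back to a CSV or send to an API.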

Related

ParserError: unable to convert txt file to df due to json format and delimiter being the same

I'm fairly new to dealing with .txt files that have a dictionary in them. I'm trying to use pd.read_csv to create a DataFrame in pandas, but I get the error: Error tokenizing data. C error: Expected 4 fields in line 2, saw 11. I believe I found the root problem: the file is difficult to read because each row contains a dict whose key-value pairs are separated by commas, which is also the delimiter.
Data (store.txt)
id,name,storeid,report
11,JohnSmith,3221-123-555,{"Source":"online","FileFormat":0,"Isonline":true,"comment":"NAN","itemtrack":"110", "info": {"haircolor":"black", "age":53}, "itemsboughtid":[],"stolenitem":[{"item":"candy","code":1},{"item":"candy","code":1}]}
35,BillyDan,3221-123-555,{"Source":"letter","FileFormat":0,"Isonline":false,"comment":"this is the best store, hands down and i will surely be back...","itemtrack":"110", "info": {"haircolor":"black", "age":21},"itemsboughtid":[1,42,465,5],"stolenitem":[{"item":"shoe","code":2}]}
64,NickWalker,3221-123-555, {"Source":"letter","FileFormat":0,"Isonline":false, "comment":"we need this area to be fixed, so much stuff is everywhere and i do not like this one bit at all, never again...","itemtrack":"110", "info": {"haircolor":"red", "age":22},"itemsboughtid":[1,2],"stolenitem":[{"item":"sweater","code":11},{"item":"mask","code":221},{"item":"jack,jill","code":001}]}
How would I read this CSV file and create new columns from the key-value pairs? Also, what if other data contains more key-value pairs, for example more than 11 keys within the dictionary?
Is there an efficient way to create a DataFrame from the example above?
My code when trying to read it as CSV:
df = pd.read_csv('store.txt', header=None)
I tried to import json and use a converter, but it did not work, even after converting all the commas to a |:
import json
df = pd.read_csv('store.txt', converters={'report': json.loads}, header=0, sep="|")
In addition, I also tried:
import pandas as pd
import json
df = pd.read_csv('store.txt', converters={'report': json.loads}, header=0, quotechar="'")
I was also thinking of adding a quote at the beginning and the end of the dictionary to make it a string, but finding the closing brackets seemed too tedious.

I think adding quotes around the dictionaries is the right approach. You can use regex to do so and use a different quote character than " (I used § in my example):
from io import StringIO
import re
import json
import pandas as pd
# Wrap everything from the first "{" to the last "}" on each line in § quotes.
with open("store.txt", "r") as f:
    csv_content = re.sub(r"(\{.*})", r"§\1§", f.read())
df = pd.read_csv(StringIO(csv_content), skipinitialspace=True, quotechar="§", engine="python")
# Parse each report string and expand its top-level keys into columns.
df_out = pd.concat([
    df[["id", "name", "storeid"]],
    pd.DataFrame(df["report"].apply(lambda x: json.loads(x)).values.tolist())
], axis=1)
print(df_out)
Note: the very last value in your csv isn't valid json: "code":001. It should either be "code":"001" or "code":1
Output:
id name storeid Source ... itemtrack info itemsboughtid stolenitem
0 11 JohnSmith 3221-123-555 online ... 110 {'haircolor': 'black', 'age': 53} [] [{'item': 'candy', 'code': 1}, {'item': 'candy...
1 35 BillyDan 3221-123-555 letter ... 110 {'haircolor': 'black', 'age': 21} [1, 42, 465, 5] [{'item': 'shoe', 'code': 2}]
2 64 NickWalker 3221-123-555 letter ... 110 {'haircolor': 'red', 'age': 22} [1, 2] [{'item': 'sweater', 'code': 11}, {'item': 'ma...
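
In the output above, the nested info dictionary still sits in a single column. If separate columns are wanted for its keys, pandas.json_normalize can flatten nested dicts. The sketch below is an illustration only: the reports list is a hand-abridged version of two report values from the question, not the full file.

```python
import json

import pandas as pd

# Abridged report strings taken from the sample rows in the question.
reports = [
    '{"Source":"online","info":{"haircolor":"black","age":53},"itemsboughtid":[]}',
    '{"Source":"letter","info":{"haircolor":"red","age":22},"itemsboughtid":[1,2]}',
]

# json_normalize expands nested dicts into dotted column names such as "info.age".
flat = pd.json_normalize([json.loads(r) for r in reports])

print(flat)
```

Lists such as itemsboughtid are left as list objects in a single column; only nested dictionaries are expanded. Rows with extra keys simply add more columns, with missing values where a row lacks the key.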

How to get rid of Series heading (column heading) using Pandas Library in Python

Using the pandas library, I built a list of dictionaries from the file "german_words.csv".
(For info: "german_words.csv" is a file of German words and their English translations.)
german_words.csv (this is just a sample; the actual file contains thousands of words):
Deutsch,English
Gedanken,thought
Stadt,city
Baum,tree
überqueren,cross
Bauernhof,farm
schwer,hard
Beginn,start
Macht,might
Geschichte,story
Säge,saw
weit,far
Meer,sea
Here's the code:
import pandas
import random
word_data = pandas.read_csv("./data/german_words.csv")
word_data_list = word_data.to_dict(orient="records")
print(random.choice(word_data_list))
It then prints a random dictionary from that list.
The list looks like this:
[{'Deutsch': 'Gedanken', 'English': 'thought'}, {'Deutsch': 'Stadt', 'English': 'city'}, {'Deutsch': 'Baum', 'English': 'tree'}, ....]
Here's the sample output:
{'Deutsch': 'Küste', 'English': 'coast'}
The problem is that I don't want the column headings in the dictionaries.
I want the list of dictionaries to look like this:
[{'Gedanken': 'thought'}, {'Stadt': 'city'}, {'Baum': 'tree'} ...]

Create a Series with the Deutsch column as the index, select the English column, and then convert it to a dictionary:
print (word_data.set_index('Deutsch')['English'].to_dict())
Or, if the DataFrame has only two columns, you can use:
print (dict(word_data.to_numpy()))
EDIT: For a list of dictionaries, use:
print([{x["Deutsch"]: x["English"]} for x in word_data.to_dict(orient="records")])
[{'Gedanken': 'thought'}, {'Stadt': 'city'}, {'Baum': 'tree'},
{'überqueren': 'cross'}, {'Bauernhof': 'farm'}, {'schwer': 'hard'},
{'Beginn': 'start'}, {'Macht': 'might'}, {'Geschichte': 'story'},
{'Säge': 'saw'}, {'weit': 'far'}, {'Meer': 'sea'}]
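
As a variation on the list comprehension above, the sketch below pairs the two columns with zip instead of going through to_dict(orient="records"), and then picks a random entry as in the question. The three-word DataFrame stands in for the CSV file, and the names word_pairs and card are made up for this example.

```python
import random

import pandas as pd

# Small stand-in for german_words.csv.
word_data = pd.DataFrame({
    "Deutsch": ["Gedanken", "Stadt", "Baum"],
    "English": ["thought", "city", "tree"],
})

# One single-pair dictionary per row: {German word: English word}.
word_pairs = [{de: en} for de, en in zip(word_data["Deutsch"], word_data["English"])]

# Pick a random entry and unpack its only key-value pair.
card = random.choice(word_pairs)
german, english = next(iter(card.items()))
print(german, "->", english)
```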

import pandas as pd
word_data = pd.DataFrame(
    data={
        "Deutsch": ["Gedanken", "Stadt", "Baum"],
        "English": ["thought", "city", "tree"],
    }
)
print({d["Deutsch"]: d["English"] for d in word_data.to_dict(orient="records")})
# {'Gedanken': 'thought', 'Stadt': 'city', 'Baum': 'tree'}

how to open the dict like csv document in Python

So the problem is that it is a csv file and when I open it with pandas, it looks like this:
data=pd.read_csv('test.csv', sep=',', usecols=['properties'])
data.head()
Each row looks like a dictionary, and I am confused about how to open it correctly with gender, document_type, etc. as columns:
{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}
{'gender': 'Female', 'document_type': 'driving_licence', 'date_of_expiry': '2023-02-28', 'issuing_country': 'GBR'}
{'gender': 'Male', 'nationality': 'ITA', 'document_type': 'passport', 'date_of_expiry': '2018-06-09', 'issuing_country': 'ITA'}

It looks like the CSV file is not formatted in a way that the default pandas reader can split into columns, so you will need to create the columns yourself. Each cell is read as a string, so evaluate it into a dict first:
import ast
parsed = data['properties'].apply(ast.literal_eval)
data['gender'] = parsed.apply(lambda d: d.get('gender'))
Repeat this for each of the fields in the dictionary.
If there are too many columns, loop over the keys instead, like this:
for key in parsed.iloc[0].keys():
    data[key] = parsed.apply(lambda d: d.get(key))
This should build your DataFrame just fine, as long as the first row contains every key you need.
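
An alternative sketch, assuming each cell of the properties column holds a string that looks like a Python dict literal, as in the three rows shown in the question. It parses every row and lets pandas build all the columns at once, so keys that are missing from some rows (nationality in the second row) simply become NaN. The name expanded is made up for this example.

```python
import ast

import pandas as pd

# The three sample rows from the question, as they would arrive from read_csv (strings).
data = pd.DataFrame({"properties": [
    "{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}",
    "{'gender': 'Female', 'document_type': 'driving_licence', 'date_of_expiry': '2023-02-28', 'issuing_country': 'GBR'}",
    "{'gender': 'Male', 'nationality': 'ITA', 'document_type': 'passport', 'date_of_expiry': '2018-06-09', 'issuing_country': 'ITA'}",
]})

# Safely evaluate each string into a dict, then build one column per key.
parsed = data["properties"].apply(ast.literal_eval)
expanded = pd.DataFrame(parsed.tolist(), index=data.index)

print(expanded)
```

ast.literal_eval only accepts Python literals, so it is a safer choice than eval for text that comes from a file.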

Import raw data from web page as a dataframe

I'm trying to import some data from a webpage into a dataframe.
Data: a block of text in the following format
[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}, ... ]
I am successfully making the request to the server and can return the data in text form, but cannot seem to convert this to a DataFrame.
E.g
r = requests.get(url)
people = r.text
print(people)
So from this point, I am a bit confused on how to structure this text as a DataFrame. Most tutorials online seem to demonstrate importing csv, excel or html etc.

If people is a string containing a list of dicts, you can use json.loads to convert it to a list of dicts and then create a DataFrame from it:
>>> import json
>>> import pandas as pd
>>> people='[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}]'
>>> json.loads(people)
[{'ID': 0, 'Name': 'John', 'Location': 'Chicago', 'Created': '2017-04-23'}]
>>>
>>> data=json.loads(people)
>>> pd.DataFrame(data)
Created ID Location Name
0 2017-04-23 0 Chicago John
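
The same steps as a short script, using the single sample record from the question in place of r.text. Converting the Created column to datetimes is an extra step assumed here for illustration. With the requests library, r.json() returns the parsed list directly, so the json.loads call can be skipped in that case.

```python
import json

import pandas as pd

# Stand-in for r.text: the sample record from the question.
people = '[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}]'

# Parse the JSON text into a list of dicts and build the DataFrame.
df = pd.DataFrame(json.loads(people))

# Optional: turn the date strings into real datetime values.
df["Created"] = pd.to_datetime(df["Created"])

print(df.dtypes)
```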

How to aggregate after fetching result using groupby using itertools

I have the following list:
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
I fetched this list using itertools groupby.
I want to build a new list by adding up the quant, amt and current fields for entries with the same name and inv_name, something like: [{'name':'xyz','inv_name':'asd','quant':500,'amt':22000,'current':33000}]
Any suggestions on how to achieve this?

If you are happy using a 3rd party library, pandas accepts a list of dictionaries:
import pandas as pd
a = [{'name': 'xyz', 'inv_name': 'asd', 'quant': 300, 'amt': 20000, 'current': 30000},
     {'name': 'xyz', 'inv_name': 'asd', 'quant': 200, 'amt': 2000, 'current': 3000}]
df = pd.DataFrame(a)
res = df.groupby(['name', 'inv_name'], as_index=False).sum().to_dict(orient='records')
# [{'amt': 22000,
# 'current': 33000,
# 'inv_name': 'asd',
# 'name': 'xyz',
# 'quant': 500}]
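
If you would rather stay with the standard library, the same aggregation can be sketched with itertools.groupby, which the question already uses. Note that groupby only merges adjacent items, so the list is sorted by the grouping key first. The names key, sum_fields and result are made up for this example.

```python
from itertools import groupby
from operator import itemgetter

a = [{'name': 'xyz', 'inv_name': 'asd', 'quant': 300, 'amt': 20000, 'current': 30000},
     {'name': 'xyz', 'inv_name': 'asd', 'quant': 200, 'amt': 2000, 'current': 3000}]

key = itemgetter('name', 'inv_name')      # fields that define a group
sum_fields = ('quant', 'amt', 'current')  # fields to add up within each group

result = []
# groupby only groups consecutive items, so sort by the same key first.
for (name, inv_name), rows in groupby(sorted(a, key=key), key=key):
    rows = list(rows)
    totals = {field: sum(row[field] for row in rows) for field in sum_fields}
    result.append({'name': name, 'inv_name': inv_name, **totals})

print(result)
```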
