Convert a dataframe column into a list of objects - python

I am using pandas to read a CSV that contains a phone_number field (string). I need to convert this field into the JSON format below,
[{'phone_number':'+01 373643222'}], and put it in a new column called phone_numbers. How can I do that?
I searched online, but the examples I found convert all the columns to JSON with to_json(), which apparently cannot solve my case.
Below is an example:
import pandas as pd
df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice'],
'phone_number': ['+1 569-483-2388', '+1 555-555-1212', '+1 432-867-5309']})

Use the map function like this:
df["phone_numbers"] = df["phone_number"].map(lambda x: [{"phone_number": x}])
print(df)  # or display(df) in a Jupyter notebook
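The map call above stores Python objects (a one-element list holding a dict) in each cell. If the new column should instead hold actual JSON text, a minimal sketch is to serialize those objects with json.dumps; this assumes a JSON string is what the downstream consumer expects (the question shows Python-style single quotes, so either could be meant). The column name phone_numbers_json is hypothetical.

```python
import json

import pandas as pd

df = pd.DataFrame({'user': ['Bob', 'Jane', 'Alice'],
                   'phone_number': ['+1 569-483-2388', '+1 555-555-1212', '+1 432-867-5309']})

# Python objects: each cell holds a one-element list containing a dict
df["phone_numbers"] = df["phone_number"].map(lambda x: [{"phone_number": x}])

# JSON text: the same structure serialized to a string (double quotes)
# phone_numbers_json is a hypothetical column name for illustration
df["phone_numbers_json"] = df["phone_numbers"].map(json.dumps)

print(df.loc[0, "phone_numbers"])       # [{'phone_number': '+1 569-483-2388'}]
print(df.loc[0, "phone_numbers_json"])  # [{"phone_number": "+1 569-483-2388"}]
```

Keeping both columns makes the difference visible: the first is convenient for further work in pandas, the second is ready to be written out or sent as JSON.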


ParserError: unable to convert txt file to df due to json format and delimiter being the same

I'm fairly new to dealing with .txt files that contain a dictionary. I'm trying to use pd.read_csv to create a DataFrame in pandas, but I get the error "Error tokenizing data. C error: Expected 4 fields in line 2, saw 11". I believe I found the root problem: the file is difficult to read because each row contains a dict whose key-value pairs are separated by commas, and the comma is also the delimiter.
Data (store.txt):
id,name,storeid,report
11,JohnSmith,3221-123-555,{"Source":"online","FileFormat":0,"Isonline":true,"comment":"NAN","itemtrack":"110", "info": {"haircolor":"black", "age":53}, "itemsboughtid":[],"stolenitem":[{"item":"candy","code":1},{"item":"candy","code":1}]}
35,BillyDan,3221-123-555,{"Source":"letter","FileFormat":0,"Isonline":false,"comment":"this is the best store, hands down and i will surely be back...","itemtrack":"110", "info": {"haircolor":"black", "age":21},"itemsboughtid":[1,42,465,5],"stolenitem":[{"item":"shoe","code":2}]}
64,NickWalker,3221-123-555, {"Source":"letter","FileFormat":0,"Isonline":false, "comment":"we need this area to be fixed, so much stuff is everywhere and i do not like this one bit at all, never again...","itemtrack":"110", "info": {"haircolor":"red", "age":22},"itemsboughtid":[1,2],"stolenitem":[{"item":"sweater","code":11},{"item":"mask","code":221},{"item":"jack,jill","code":001}]}
How would I read this CSV file and create new columns based on the key-value pairs? And what if other data has more key-value pairs, for example more than 11 keys in the dictionary?
Is there an efficient way to create a DataFrame from the example above?
My code when trying to read it as CSV:
df = pd.read_csv('store.txt', header=None)
I tried importing json and using a converter, but it did not work, even after converting all the commas to |:
import json
df = pd.read_csv('store.txt', converters={'report': json.loads}, header=0, sep="|")
I also tried:
import pandas as pd
import json
df=pd.read_csv('store.txt', converters={'report':json.loads}, header=0, quotechar="'")
I was also thinking of adding a quote at the beginning and end of each dictionary to make it a string, but finding the closing brackets seemed too tedious.
I think adding quotes around the dictionaries is the right approach. You can use a regex to do so, with a quote character other than " (I used § in my example):
from io import StringIO
import re
import json
import pandas as pd
with open("store.txt", "r") as f:
    # wrap each {...} in § so the commas inside it are not treated as delimiters
    csv_content = re.sub(r"(\{.*})", r"§\1§", f.read())
df = pd.read_csv(StringIO(csv_content), skipinitialspace=True, quotechar="§", engine="python")
# keep the plain columns and expand each parsed report dict into its own columns
df_out = pd.concat([
    df[["id", "name", "storeid"]],
    pd.DataFrame(df["report"].apply(json.loads).tolist())
], axis=1)
print(df_out)
Note: the very last value in your CSV isn't valid JSON: "code":001. It should be either "code":"001" or "code":1.
Output:
id name storeid Source ... itemtrack info itemsboughtid stolenitem
0 11 JohnSmith 3221-123-555 online ... 110 {'haircolor': 'black', 'age': 53} [] [{'item': 'candy', 'code': 1}, {'item': 'candy...
1 35 BillyDan 3221-123-555 letter ... 110 {'haircolor': 'black', 'age': 21} [1, 42, 465, 5] [{'item': 'shoe', 'code': 2}]
2 64 NickWalker 3221-123-555 letter ... 110 {'haircolor': 'red', 'age': 22} [1, 2] [{'item': 'sweater', 'code': 11}, {'item': 'ma...
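An alternative that avoids inserting quote characters, sketched under the assumption that report is always the last column and that id, name and storeid never contain commas: split each line on the first three commas only (str.split with maxsplit=3), parse the remainder with json.loads, and let pd.json_normalize flatten nested dicts such as info into info.haircolor and info.age. This handles rows with any number of keys; keys missing from a row become NaN. The inline sample is a trimmed copy of store.txt with "code":001 replaced by 1, and read_store is a hypothetical helper name.

```python
import io
import json

import pandas as pd

# Trimmed sample in the same layout as store.txt ("code":001 replaced by 1,
# because a number with a leading zero is not valid JSON).
raw = """id,name,storeid,report
11,JohnSmith,3221-123-555,{"Source":"online","info":{"haircolor":"black","age":53},"stolenitem":[{"item":"candy","code":1}]}
64,NickWalker,3221-123-555, {"Source":"letter","info":{"haircolor":"red","age":22},"stolenitem":[{"item":"jack,jill","code":1}]}
"""


def read_store(f):
    """Hypothetical helper: parse lines of 'id,name,storeid,<json>'."""
    header = next(f).strip().split(",")
    rows = []
    for line in f:
        line = line.strip()
        if not line:
            continue
        # maxsplit=3: only the first three commas act as delimiters,
        # so commas inside the JSON report are left untouched.
        id_, name, storeid, report = line.split(",", 3)
        record = {header[0]: int(id_), header[1]: name, header[2]: storeid}
        record.update(json.loads(report))  # leading whitespace is allowed
        rows.append(record)
    # Flatten nested dicts (e.g. "info") into dotted column names;
    # lists such as "stolenitem" stay as Python lists in their cells.
    return pd.json_normalize(rows)


df = read_store(io.StringIO(raw))
print(df.columns.tolist())
```

With the real file, pass the open file object instead: with open("store.txt") as f: df = read_store(f).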

How to get rid of Series heading (column heading) using Pandas Library in Python

Using the pandas library, I built a list of dictionaries from the file “german_words.csv”.
(For info: “german_words.csv” is a file of German words and their English translations.)
german_words.csv (this is just a sample; the actual file contains thousands of words):
Deutsch,English
Gedanken,thought
Stadt,city
Baum,tree
überqueren,cross
Bauernhof,farm
schwer,hard
Beginn,start
Macht,might
Geschichte,story
Säge,saw
weit,far
Meer,sea
Here's the code:
import pandas
import random
word_data = pandas.read_csv("./data/german_words.csv")
word_data_list = word_data.to_dict(orient="records")
print(random.choice(word_data_list))
This prints a random dictionary from that list.
The list looks like this:
[{'Deutsch': 'Gedanken', 'English': 'thought'}, {'Deutsch': 'Stadt', 'English': 'city'}, {'Deutsch': 'Baum', 'English': 'tree'}, ....]
Here's the sample output:
{'Deutsch': 'Küste', 'English': 'coast'}
But the problem is, I don't want the column heading in the dictionaries.
I want these dictionaries in list as follows:
[{'Gedanken': 'thought'}, {'Stadt': 'city'}, {'Baum': 'tree'} ...]
Create a Series with the Deutsch column as the index, select the English column, and then convert it to a dictionary:
print(word_data.set_index('Deutsch')['English'].to_dict())
Or, if the DataFrame has only 2 columns, you can use:
print(dict(word_data.to_numpy()))
EDIT: For a list of dictionaries, use:
print([{x["Deutsch"]: x["English"]} for x in word_data.to_dict(orient="records")])
[{'Gedanken': 'thought'}, {'Stadt': 'city'}, {'Baum': 'tree'},
{'überqueren': 'cross'}, {'Bauernhof': 'farm'}, {'schwer': 'hard'},
{'Beginn': 'start'}, {'Macht': 'might'}, {'Geschichte': 'story'},
{'Säge': 'saw'}, {'weit': 'far'}, {'Meer': 'sea'}]
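The same list can also be built without the intermediate records by zipping the two columns directly. A minimal sketch, assuming a three-row DataFrame as a stand-in for german_words.csv (read the real file with read_csv as in the question); word_list is a hypothetical name.

```python
import random

import pandas as pd

# Small stand-in for german_words.csv
word_data = pd.DataFrame({"Deutsch": ["Gedanken", "Stadt", "Baum"],
                          "English": ["thought", "city", "tree"]})

# One single-entry dict per row, keyed by the German word
word_list = [{de: en} for de, en in zip(word_data["Deutsch"], word_data["English"])]
print(word_list)  # [{'Gedanken': 'thought'}, {'Stadt': 'city'}, {'Baum': 'tree'}]

# Pick a random card, as in the original code
print(random.choice(word_list))
```

Iterating over two Series with zip avoids creating a full dict per row, which may matter for a file with thousands of words.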
import pandas as pd
word_data = pd.DataFrame(
data={
"Deutsch": ["Gedanken", "Stadt", "Baum"],
"English": ["thought", "city", "tree"],
}
)
print({d["Deutsch"]: d["English"] for d in word_data.to_dict(orient="records")})
# {'Gedanken': 'thought', 'Stadt': 'city', 'Baum': 'tree'}

How to open a dict-like CSV document in Python

So the problem is that it is a csv file and when I open it with pandas, it looks like this:
data=pd.read_csv('test.csv', sep=',', usecols=['properties'])
data.head()
Each row contains what looks like a dictionary, and I'm confused about how to open it correctly so that gender, document_type, etc. become columns:
{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}
{'gender': 'Female', 'document_type': 'driving_licence', 'date_of_expiry': '2023-02-28', 'issuing_country': 'GBR'}
{'gender': 'Male', 'nationality': 'ITA', 'document_type': 'passport', 'date_of_expiry': '2018-06-09', 'issuing_country': 'ITA'}
It looks like the CSV file is not formatted in a way that pandas' default reader can split into columns, so you will need to create the columns yourself. Each value in properties is read as a string, so first evaluate it into a real dictionary:
import ast
data['properties'] = data['properties'].apply(ast.literal_eval)
data['gender'] = data['properties'].str['gender']
Do this for each of the fields in the dictionary.
If there are too many columns, loop over the keys of the evaluated dictionary instead (this takes the keys from the first row, so it assumes that row contains every field):
for key in data['properties'].iloc[0].keys():
    data[key] = data['properties'].str[key]
This should build your DataFrame just fine.
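A shorter route, sketched under the assumption that each properties cell holds a quoted Python dict literal (the inline csv_text below is a hypothetical stand-in for test.csv): evaluate every cell and hand the resulting list of dicts to the DataFrame constructor, which creates one column per key across all rows and fills keys missing from a row with NaN.

```python
import ast
import io

import pandas as pd

# Hypothetical stand-in for test.csv: one quoted dict literal per row
csv_text = '''properties
"{'gender': 'Male', 'nationality': 'IRL', 'document_type': 'passport', 'date_of_expiry': '2019-08-12', 'issuing_country': 'IRL'}"
"{'gender': 'Female', 'document_type': 'driving_licence', 'date_of_expiry': '2023-02-28', 'issuing_country': 'GBR'}"
'''

data = pd.read_csv(io.StringIO(csv_text), usecols=['properties'])

# Parse each string into a dict, then build one column per key;
# a key absent from a row (nationality in row 1) becomes NaN.
props = pd.DataFrame(data['properties'].apply(ast.literal_eval).tolist())
print(props)
```

Unlike looping over the first row's keys, this picks up keys that appear only in later rows. ast.literal_eval is used rather than json.loads because the cells use single quotes, which JSON does not allow.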

Import raw data from web page as a dataframe

I'm trying to import some data from a webpage into a dataframe.
Data: a block of text in the following format
[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}, ... ]
I am successfully making the request to the server and can return the data in text form, but cannot seem to convert this to a DataFrame.
E.g.:
r = requests.get(url)
people = r.text
print(people)
So from this point, I am a bit confused about how to structure this text as a DataFrame. Most tutorials online seem to demonstrate importing CSV, Excel, HTML, etc.
If people is a list of dicts in string form, you can use json.loads to convert it to a list of dicts and then easily create a DataFrame:
>>> import json
>>> import pandas as pd
>>> people='[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}]'
>>> json.loads(people)
[{'ID': 0, 'Name': 'John', 'Location': 'Chicago', 'Created': '2017-04-23'}]
>>>
>>> data=json.loads(people)
>>> pd.DataFrame(data)
Created ID Location Name
0 2017-04-23 0 Chicago John
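pandas can also parse the JSON text itself with pd.read_json. A minimal sketch, using the sample string in place of r.text so it runs without a network call; wrapping the text in io.StringIO avoids the deprecation warning that newer pandas versions emit for literal JSON strings. With requests, r.json() already returns the parsed list, so pd.DataFrame(r.json()) is another option.

```python
import io
import json

import pandas as pd

# Stand-in for r.text
people = '[{"ID":0,"Name":"John","Location":"Chicago","Created":"2017-04-23"}]'

# Option 1: let pandas parse the JSON array of records
df1 = pd.read_json(io.StringIO(people), orient="records")

# Option 2: parse with the json module, then build the frame
df2 = pd.DataFrame(json.loads(people))

print(df1)
print(df2)
```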

How to aggregate after fetching result using groupby using itertools

I have a list:
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
I got this list using itertools.groupby.
I want to add up the quant, amt and current fields for entries with the same name and inv_name, and create a list like: [{'name':'xyz','inv_name':'asd','quant':500,'amt':22000,'current':33000}]
Any suggestions on how to achieve this?
If you are happy using a 3rd party library, pandas accepts a list of dictionaries:
import pandas as pd
a=[{'name': 'xyz','inv_name':'asd','quant':300,'amt':20000, 'current':30000},
{'name': 'xyz','inv_name':'asd','quant':200,'amt':2000,'current':3000}]
df = pd.DataFrame(a)
res = df.groupby(['name', 'inv_name'], as_index=False).sum().to_dict(orient='records')
# [{'amt': 22000,
# 'current': 33000,
# 'inv_name': 'asd',
# 'name': 'xyz',
# 'quant': 500}]
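Without a third-party library, the same totals can be computed with itertools.groupby from the standard library, which the question already uses. A minimal sketch, assuming the fields to sum are exactly quant, amt and current; groupby only merges adjacent items, so the list is sorted by the same key first.

```python
from itertools import groupby
from operator import itemgetter

a = [{'name': 'xyz', 'inv_name': 'asd', 'quant': 300, 'amt': 20000, 'current': 30000},
     {'name': 'xyz', 'inv_name': 'asd', 'quant': 200, 'amt': 2000, 'current': 3000}]

key = itemgetter('name', 'inv_name')   # grouping key
fields = ('quant', 'amt', 'current')   # fields to add up

res = []
# sort first: groupby only groups consecutive items with equal keys
for (name, inv_name), group in groupby(sorted(a, key=key), key=key):
    group = list(group)
    totals = {f: sum(d[f] for d in group) for f in fields}
    res.append({'name': name, 'inv_name': inv_name, **totals})

print(res)  # [{'name': 'xyz', 'inv_name': 'asd', 'quant': 500, 'amt': 22000, 'current': 33000}]
```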
