Read CSV with OOP Python
I'm new to OOP in Python, and I want to read CSV files using an OOP approach.
I have a CSV file with 5 columns separated by commas.
I want to read that CSV file and store each of its columns in a column of a new dataframe.
So, suppose I have data like this:

```
1434,"2021-08-13 06:31:59",unread,082196788998,kuse.hamdy@gmail.com
1433,"2021-08-13 06:09:41",unread,081554220007,ritaambarwati1@umsida.ac.id
1432,"2021-08-13 05:35:07",unread,081911075017,rifqinaufalfayyadh@gmail.com
```

I want the OOP code to read that CSV file and store it in a new table like this:

```
id      date                 status   number        email
1434    2021-08-13 06:31:59  unread   082196788998  kuse.hamdy@gmail.com
1433    2021-08-13 06:09:41  unread   081554220007  ritaambarwati1@umsida.ac.id
1432    2021-08-13 05:35:07  unread   081911075017  rifqinaufalfayyadh@gmail.com
```
I tried this code:
```python
import csv

class Complete_list:
    def __init__(self, row, header, list_):
        self.__dict__ = dict(zip(header, row))
        self.list_ = list_

    def __repr__(self):
        return self.list_

data = list(csv.reader(open("complete_list.csv")))
instances = [Complete_list(a, data[1], "date_{}".format(i+1)) for i, a in enumerate(data[1:])]

for i in instances:
    j = i.list_.split(',')
    print(j)
```
Somehow, I could not access the values of each row separated by commas and put them into a new dataframe with multiple columns. Instead, I got a result like this:

```
['date_1']
['date_2']
['date_3']
```
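The reason every instance prints as `['date_1']` and so on is that `__repr__` returns the `list_` label you passed in, not the row data; and `csv.reader` has already split each line into fields, so there is no comma-joined string left to split. A minimal sketch of the idea (renamed `CompleteList` here, and assuming the file has no header row, so the column names are supplied by hand):

```python
import csv

class CompleteList:
    def __init__(self, row, header):
        # each column name becomes an attribute holding that field's value
        self.__dict__ = dict(zip(header, row))

    def __repr__(self):
        return str(vars(self))

header = ['id', 'date', 'status', 'number', 'email']
with open("complete_list.csv", newline='') as f:
    instances = [CompleteList(row, header) for row in csv.reader(f)]

for inst in instances:
    print(inst.id, inst.date, inst.email)
```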
To be honest, you are better off using libraries like pandas, but this is how I would approach it.
```python
class complete_list:
    def __init__(self, path, header=None):
        self.data = path
        self.header = header

    def read(self):
        with open(self.data, 'r') as f:
            # strip the trailing newline before splitting, or it ends up in the last field
            data = [x.strip().split(',') for x in f.readlines()]
        return data

    def printer(self):
        if self.header:
            a, b, c, d, e = self.header
            yield f'{a:^10} {b:^15} {c:^25} {d:^10} {e:^10}'
        for i in self.read():
            yield f'{i[0]:^10}| {i[1]:^10} | {i[2]:^10} | {i[3]:^10} | {i[4]:^10}'

headers = ['id', 'date', 'status', 'number', 'email']
data_frame = complete_list('yes.txt', header=headers).printer()

# printer() is a generator, so iterate it to actually produce the lines
for line in data_frame:
    print(line)
```
Output:

```
    id          date               status         number       email
   1434   | "2021-08-13 06:31:59" |   unread   | 082196788998 | kuse.hamdy@gmail.com
   1433   | "2021-08-13 06:09:41" |   unread   | 081554220007 | ritaambarwati1@umsida.ac.id
   1432   | "2021-08-13 05:35:07" |   unread   | 081911075017 | rifqinaufalfayyadh@gmail.com
```
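Note that the quotation marks around the date survive into the output, because `str.split` knows nothing about CSV quoting. A small variation on the class above (my assumption, not part of the original answer) is to let the `csv` module do the splitting in `read`:

```python
import csv

class complete_list_csv(complete_list):  # hypothetical subclass, reusing printer()
    def read(self):
        # csv.reader understands quoted fields and strips the surrounding quotes
        with open(self.data, newline='') as f:
            return list(csv.reader(f))
```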
The pandas library is the perfect tool for that:
```python
import pandas as pd

df = pd.read_csv("data.csv", sep=",", names=['id', 'date', 'status', 'number', 'email'])
print(df)
```
```
     id                 date  status       number                         email
0  1434  2021-08-13 06:31:59  unread  82196788998          kuse.hamdy@gmail.com
1  1433  2021-08-13 06:09:41  unread  81554220007   ritaambarwati1@umsida.ac.id
2  1432  2021-08-13 05:35:07  unread  81911075017  rifqinaufalfayyadh@gmail.com
```
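One thing to watch: pandas inferred `number` as an integer column, so the leading zeros were dropped (82196788998 instead of 082196788998). If they matter, as they usually do for phone numbers, force the column to string; this is an addition of mine, not part of the answer above:

```python
import pandas as pd

# dtype pins the 'number' column to str so leading zeros survive
df = pd.read_csv("data.csv", sep=",",
                 names=['id', 'date', 'status', 'number', 'email'],
                 dtype={'number': str})
```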
Related
pyspark streaming dataframe write to different path depending on column values
In a Databricks notebook, I am reading JSON files with readStream. The JSON has, for example, this structure:

```
id  entityType  eventId
1   person      123
2   employee    234
3   client      687
4   client      687
```

My code:

```python
cloudfile = {
    "cloudFiles.format": "json",
    "cloudFiles.schemaLocation": SCHEMA_LOCATION,
    "cloudFiles.useNotifications": True,
}

df = (spark.readStream
      .format('cloudfiles')
      .options(**cloudfile)
      .load(SOURCE_PATH)
)
```

How can I write it using writeStream to different folders, depending on column values? Example output: mainPath/{entityType}/{eventId}/data.json, i.e. the entity with id = 1 goes to mainPath/person/123/data.json, the entity with id = 2 to mainPath/employee/234/data.json, the entity with id = 3 to mainPath/client/687/data.json, and so on.
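One way to fan a stream out by column values is `partitionBy` on the streaming writer; a minimal sketch, assuming Spark's key=value directory naming (entityType=person/eventId=123) is acceptable instead of the bare person/123 form, and with `CHECKPOINT_PATH` and `MAIN_PATH` as assumed names:

```python
# Sketch: partitionBy writes each micro-batch into
# MAIN_PATH/entityType=<value>/eventId=<value>/part-*.json
(df.writeStream
   .format("json")
   .option("checkpointLocation", CHECKPOINT_PATH)  # assumed variable
   .partitionBy("entityType", "eventId")
   .start(MAIN_PATH))  # assumed variable
```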
Extract nested values from data frame using python
I've extracted the data from an API response and created a dictionary function:

```python
def data_from_api(a):
    dictionary = dict(
        data=a['number'],
        created_by=a['opened_by'],
        assigned_to=a['assigned'],
        closed_by=a['closed'],
    )
    return dictionary
```

and then built a dataframe from it (around 1k records):

```python
raw_data = []
for k in data['resultsData']:
    records = data_from_api(k)
    raw_data.append(records)
```

I would like to create a function that extracts the nested `{display_value}` fields from the columns of the dataframe. I need only the names, like John Snow, etc. How can I create a function that extracts the display values for those fields? I've tried something like:

```python
df = pd.DataFrame.from_records(raw_data)

def get_nested_fields(nested):
    if isinstance(nested, dict):
        return nested['display_value']
    else:
        return ''

df['created_by'] = df['opened_by'].apply(get_nested_fields)
df['assigned_to'] = df['assigned'].apply(get_nested_fields)
df['closed_by'] = df['closed'].apply(get_nested_fields)
```

but I'm getting an error: KeyError: 'created_by'. Could you please help me?
You can use `.str` and `get()` like below. If the key isn't there, it'll write None.

```python
df = pd.DataFrame({'data': [1234, 5678, 5656],
                   'created_by': [{'display_value': 'John Snow', 'link': 'a.com'},
                                  {'display_value': 'John Dow'},
                                  {'my_value': 'Jane Doe'}]})
df['author'] = df['created_by'].str.get('display_value')
```

Output:

```
   data                                       created_by     author
0  1234  {'display_value': 'John Snow', 'link': 'a.com'}  John Snow
1  5678                    {'display_value': 'John Dow'}   John Dow
2  5656                         {'my_value': 'Jane Doe'}       None
```
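The same extraction spelled out with `apply` and `dict.get`, closer to the helper-function attempt in the question (just an equivalent sketch):

```python
df['author'] = df['created_by'].apply(
    lambda d: d.get('display_value') if isinstance(d, dict) else None
)
```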
How can I split a batch string message received from Azure Service Bus row by row?
I'm a beginner in Python. I have an Azure function that runs on a time trigger. This function reads a batch of raw JSON data, in string format, from an Azure Service Bus; in reality I receive about 50 such messages continuously. Now I want to split each message row by row and then archive it to Azure Storage. The message looks like the sample below (a concatenation of rows, shown here one object per line):

```
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}
{"Name":"","Seri":4560000,"SiName":"","As":"","PId":2107401,"ICheck":0,"SeeNum":40509704561424,"Type":0,"Counter":34,"PaId":0,"MeType":31,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.0254566,\"10\":-0.054562772}},\"NUMBER_TAG\":\"2145600\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":1,"id":"074222a38-2816-42c7-b95c-6644448ba9d","t":-2}
```

Row 1 is:

```
{"Name":"","Seri":21000000,"SiName":"","As":"","PId":21070101,"ICheck":0,"SeeNum":405097041391424,"Type":0,"Counter":33,"PaId":0,"MeType":30,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTACHED_DEVICE_SERIAL_NUMBER_TAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-1}
```

Row 2 is:

```
{"Name":"","Seri":4560000,"SiName":"","As":"","PId":2107401,"ICheck":0,"SeeNum":40509704561424,"Type":0,"Counter":34,"PaId":0,"MeType":31,"RecTime":"2021-10-21T09:04:41.0151Z","ReaTime":null,"Cape":"2021-10-21T09:04:40.644","Status":0,"text":"{\"TYPE_TAG\":\"00\",\"ENSORAG\":{\"date_time\":\"2021-10-21 09:04:40.644\",\"seber\":10,\"seqmber\":405097041391424,\"lo_name\":\"\",\"accati\":{\"0\":0.0,\"1\":-0.037665367,\"2\":-0.033863068,\"3\":-0.026795387,\"4\":-0.03757,\"5\":-0.02809906,\"6\":-0.016090393,\"7\":-0.040496826,\"8\":-0.05318451,\"9\":-0.025012016,\"10\":-0.057872772}},\"ATTTAG\":\"21000000\",\"error\":{}}","CerId":null,"Id":null,"Asse":null,"Id":0,"id":"075f0a38-2816-42c7-b95c-66c425b8ba9d","t":-2}
```

In my opinion, I should first split each row, then create a data frame and insert each value into the related column, and after that append it to a blob. Is that right? How can I do it? What is your suggested solution?

Edited: my code for reading from the Service Bus:

```python
from azure.servicebus import ServiceBusClient, ServiceBusMessage

connection_str = "**"
topic_name = "***"
subscription_name = "***"

servicebus_client = ServiceBusClient.from_connection_string(
    conn_str=connection_str, logging_enable=True)

with servicebus_client:
    # get the Subscription Receiver object for the subscription
    receiver = servicebus_client.get_subscription_receiver(
        topic_name=topic_name,
        subscription_name=subscription_name,
    )
    with receiver:
        for msg in receiver:
            print("Received: " + str(msg))
            # complete the message so that the message is removed from the subscription
            receiver.complete_message(msg)
```
Since the messages are sent individually, you can process them individually. There is no need to concatenate them into a string; just keep appending them to a data frame. The sample below is for a queue, but you can extend it to a topic/subscription. The results are included to show what the output looks like.

```python
from azure.servicebus import ServiceBusClient
import pandas as pd
import json
from pandas import json_normalize

CONNECTION_STR = 'Endpoint=sb://xxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
QUEUE_NAME = 'xxxxxxxxxxx'

servicebus_client = ServiceBusClient.from_connection_string(conn_str=CONNECTION_STR)

with servicebus_client:
    receiver = servicebus_client.get_queue_receiver(queue_name=QUEUE_NAME)
    # create an empty DataFrame object
    df = pd.DataFrame()
    with receiver:
        received_msgs = receiver.receive_messages(max_message_count=10, max_wait_time=5)
        for msg in received_msgs:
            # each message body is one JSON object; flatten it into one row
            msg_dict = json.loads(str(msg))
            df2 = json_normalize(msg_dict)
            df = df.append(df2, ignore_index=True)
            receiver.complete_message(msg)
    print(df)

print("Receive is done.")
```

Output:

```
  Name      Seri SiName As  ... Id  Asse                                    id   t
0       21000000            ...  0  None  075f0a38-2816-42c7-b95c-66c425b8ba9d  -1
1        4560000            ...  0  None  075f0a38-2816-42c7-b95c-66c425b8ba9d  -2

[2 rows x 21 columns]
Receive is done.
```
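One caveat for newer environments: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so on current pandas the loop above fails. The equivalent idiom is to collect the per-message frames and concatenate once; a drop-in sketch for the receive loop above:

```python
frames = []
for msg in received_msgs:
    # flatten each message into a one-row frame, then combine at the end
    frames.append(json_normalize(json.loads(str(msg))))
    receiver.complete_message(msg)

df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()
```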
Consider sample data with three rows:

```python
data = '{"Name": "Hassan", "code":"12"}{"Name": "Jack", "code":"345"}{"Name": "Jack", "code":"345"}'
```

Here is how you can get a dataframe from this data:

```python
from ast import literal_eval

data = [literal_eval(d + '}') for d in data.split('}')[0:-1]]
df = pd.DataFrame.from_records(data)
```

Output:

```
     Name code
0  Hassan   12
1    Jack  345
2    Jack  345
```
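A caveat on the split: it only works when no `}` appears inside a value, and the real messages in this question embed escaped JSON (with braces) in their `text` field. A more robust sketch uses `json.JSONDecoder.raw_decode`, which parses one object at a time and reports where it ended:

```python
import json

def split_concatenated_json(blob):
    # raw_decode returns (object, end_index), so braces inside
    # string values cannot confuse the split
    decoder = json.JSONDecoder()
    idx, objects = 0, []
    while idx < len(blob):
        obj, end = decoder.raw_decode(blob, idx)
        objects.append(obj)
        idx = end
        while idx < len(blob) and blob[idx].isspace():
            idx += 1  # tolerate whitespace between objects
    return objects

# e.g. pd.DataFrame.from_records(split_concatenated_json(message_body))
```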
read multiple nested json file with python pandas
I'm a beginner with Python. I want to read the JSON file data1 shown below. I have tried to read all the columns in the file, but I can only read the "data" nest; I don't know how to read the columns in both the "data" and "quote" nests. Can you please help me? Thank you.

My code:

```python
import json
import pandas as pd

data = json.load(open('C:/JSON_IMPORT/data1.json'))
df = pd.DataFrame(data["data"])
print(df)
```

JSON file:

```json
{"status": {"timestamp":"2021-03-16T19:27:55.404Z","error_code":0,"error_message":null,"elapsed":173,"credit_count":22,"notice":null,"total_count":4368},
 "data":[{"id":1,
   "name":"Bitcoin",
   "symbol":"BTC",
   "slug":"bitcoin",
   "num_market_pairs":9862,
   "date_added":"2013-04-28T00:00:00.000Z",
   "tags":["mineable","pow","sha-256","store-of-value","state-channels","coinbase-ventures-portfolio","three-arrows-capital-portfolio","polychain-capital-portfolio"],
   "max_supply":21000000,
   "circulating_supply":18655725,
   "total_supply":18655725,
   "platform":null,
   "cmc_rank":1,
   "last_updated":"2021-03-16T19:26:11.000Z",
   "quote":{
     "USD":{
       "price":55643.86231386882,
       "volume_24h":57006039705.56386,
       "percent_change_1h":-0.22948654,
       "percent_change_24h":-0.66133846,
       "percent_change_7d":3.26713607,
       "percent_change_30d":14.24843475,
       "percent_change_60d":54.21680422,
       "percent_change_90d":168.83609047,
       "market_cap":1038076593265.4004,
       "last_updated":"2021-03-16T19:26:11.000Z"}}
 }]}
```
Here you go. You should use pd.json_normalize and concatenate that with the dataframe made from data['status']:

```python
df = pd.concat([pd.DataFrame(data['status'], index=[0]),
                pd.json_normalize(data, record_path=['data'])],
               axis=1)
print(df)
```

Output (one wide row):

```
  timestamp error_code error_message elapsed credit_count notice total_count id name symbol slug num_market_pairs date_added tags max_supply circulating_supply total_supply platform cmc_rank last_updated quote.USD.price quote.USD.volume_24h quote.USD.percent_change_1h quote.USD.percent_change_24h quote.USD.percent_change_7d quote.USD.percent_change_30d quote.USD.percent_change_60d quote.USD.percent_change_90d quote.USD.market_cap quote.USD.last_updated
0 2021-03-16T19:27:55.404Z 0 null 173 22 null 4368 1 Bitcoin BTC bitcoin 9862 2013-04-28T00:00:00.000Z ['mineable', 'pow', 'sha-256', 'store-of-value', 'state-channels', 'coinbase-ventures-portfolio', 'three-arrows-capital-portfolio', 'polychain-capital-portfolio'] 21000000 18655725 18655725 null 1 2021-03-16T19:26:11.000Z 55643.86231386882 57006039705.56386 -0.22948654 -0.66133846 3.26713607 14.24843475 54.21680422 168.83609047 1038076593265.4004 2021-03-16T19:26:11.000Z
```
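An alternative with a single `json_normalize` call, pulling selected status fields through the `meta` parameter (a sketch of mine, not part of the answer above):

```python
df = pd.json_normalize(
    data,
    record_path=['data'],
    meta=[['status', 'timestamp'], ['status', 'total_count']],  # nested meta paths
)
```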
Django - Saving one form multiple times
I have a Django view that uses one form multiple times. The form saves a relationship: a Subgroup id as a foreign key and a Student id as a foreign key. The problem I'm having is that when I try to save the information to the database, only the last record is saved. For example (database model):

```
1 | 858 | Pump       | Iron
2 |  78 | Madagaskar | Thomas
```

If I try to split them into separate groups, only Madagaskar's data is saved:

```
| id | timestamp                  | student_Id_id | subgroup_Id_id |
+----+----------------------------+---------------+----------------+
| 62 | 2016-05-06 10:54:49.022000 | 2             | 91             |
```

The form looks like this:

```python
class ApplicationFormaFull1(MultiModelForm):
    form_classes = {
        'sub1': FormSubgroup,
        'sub2': FormSubgroup,
        'stud_sub': FormStudent_in_Subgroup
    }
```

and my view:

```python
sub = form['sub1'].save(commit=False)
sub.student_group = StudentGroup.objects.get(id=element)
sub.number = 1
sub.type = 'Other'
sub.student_count = firstSubgroup
sub.save()

sub1 = form['sub2'].save(commit=False)
sub1.student_group = StudentGroup.objects.get(id=element)
sub1.number = 2
sub1.type = 'Others'
sub1.student_count = secondSubgroup
sub1.save()

if counter % 2 == 1:
    stud_sub = form['stud_sub'].save(commit=True)
    stud_sub.subgroup_Id = sub
    stud_sub.student_Id = Student.objects.get(id=student)
    stud_sub.save()
else:
    stud_sub = form['stud_sub'].save(commit=True)
    stud_sub.subgroup_Id = sub1
    stud_sub.student_Id = Student.objects.get(id=student)
    stud_sub.save()
```

So to sum up, I want every form to save its information multiple times (dynamically). Maybe the solution is to store the instances in a list and, after all forms are added, save them one by one?

```python
pending = []
stud_sub = form['stud_sub'].save(commit=False)
stud_sub.subgroup_Id = sub
stud_sub.student_Id = Student.objects.get(id=student)
pending.append(stud_sub)
# ... repeat for the other students ...
for i in pending:
    i.save()
```

Another solution would be to use a formset:

```python
ArticleFormSet = formset_factory(ArticleForm, extra=2)
formset = ArticleFormSet(initial=[
    {'title': 'Django is now open source',
     'pub_date': datetime.date.today()}
])
```

However, I don't know how to change title and pub_date, or how to add everything to the formset dynamically.
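For the formset route, the initial data is just a list of dicts, so it can be built dynamically before constructing the formset. A sketch assuming the ArticleForm from the snippet above (the save loop further assumes it is a ModelForm):

```python
import datetime
from django.forms import formset_factory

# one initial dict per form, however many are needed at runtime
initial = [
    {'title': f'Article {i}', 'pub_date': datetime.date.today()}
    for i in range(num_articles)  # num_articles: hypothetical count
]

ArticleFormSet = formset_factory(ArticleForm, extra=0)
formset = ArticleFormSet(initial=initial)

# on POST, rebind and save each entry:
# formset = ArticleFormSet(request.POST)
# if formset.is_valid():
#     for form in formset:
#         form.save()  # assumes ArticleForm is a ModelForm
```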