best way to write dictionary data into csv or excel - python

I am trying to write dictionary data into a CSV file.
Keys:
['file_name', 'candidate_skills', 'SF_name', 'RB_name', 'mb_number', 'email']
Dictionary:
{'file_name': 'Aarti Banarashi.docx', 'candidate_skills': ['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS'], 'SF_name': None, 'RB_name': 'aarti banarashi\t\t\t', 'mb_number': ['+918108493333'], 'email': 'aartisingh271294#gmail.com'}
I want each dictionary to be written as one row, with each value in its own column under these headers:
'file_name' 'candidate_skills' 'SF_name' 'RB_name' 'mb_number' 'email'
Instead, I am getting one row per key, with the key and its value side by side:
file_name,Aarti Banarashi.docx
candidate_skills,"['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS']"
SF_name,
RB_name,aarti banarashi
mb_number,['+918108493333']
email,aartisingh271294#gmail.com
Can you please help me write it correctly? Also, when I add new records, they should be appended to the file.
My code:
import csv

with open('dict.csv', 'wb') as csv_file:
    writer = csv.writer(csv_file)
    for key, value in res.items():
        writer.writerow([key, value])

Whenever you work with tables, I recommend pandas.
Here is a pandas solution:
import pandas as pd
d = {'file_name': 'Aarti Banarashi.docx', 'candidate_skills': ['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS'], 'SF_name': None, 'RB_name': 'aarti banarashi\t\t\t', 'mb_number': ['+918108493333'], 'email': 'aartisingh271294#gmail.com'}
df = pd.DataFrame.from_dict(d, orient='index').T
df.to_csv("output.csv",index=False)
Output:
file_name,candidate_skills,SF_name,RB_name,mb_number,email
Aarti Banarashi.docx,"['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS']",,aarti banarashi ,['+918108493333'],aartisingh271294#gmail.com
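Both answers here store list values such as candidate_skills and mb_number as their Python repr (the "['JQuery', ' ', ...]" text above). If you would rather have plain text in those cells, one option is to flatten each record before writing it. This is a minimal sketch using a hypothetical helper, flatten_record, and it assumes that list items are strings, that whitespace-only entries like ' ' and '\n' should be dropped, and that ', ' is an acceptable separator:

```python
def flatten_record(record, sep=', '):
    """Return a copy of record with list values joined into a single string.

    Hypothetical helper: assumes every list item is a string.
    """
    flat = {}
    for key, value in record.items():
        if isinstance(value, list):
            # Strip each item and drop whitespace-only entries such as ' ' and '\n'.
            flat[key] = sep.join(item.strip() for item in value if item.strip())
        elif isinstance(value, str):
            # Remove stray surrounding whitespace, e.g. the trailing tabs in RB_name.
            flat[key] = value.strip()
        else:
            # None and other non-string values pass through unchanged.
            flat[key] = value
    return flat


# Shortened version of the record from the question.
res = {'file_name': 'Aarti Banarashi.docx',
       'candidate_skills': ['JQuery', ' ', 'Bootstrap', '\n', 'PHP'],
       'SF_name': None,
       'RB_name': 'aarti banarashi\t\t\t',
       'mb_number': ['+918108493333']}
print(flatten_record(res))
```

The flattened dict can then be passed to pd.DataFrame or to csv.DictWriter.writerow() exactly like the original one.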

Your script iterates over each key/value pair in the dictionary and calls writerow() once per pair. Each writerow() call produces a new row, so calling it this way gives you one row per pair.
res only contains the data for a single row of your CSV file. With a csv.DictWriter, a single writerow() call converts all the dictionary entries into one output row:
import csv
res = {'file_name': 'Aarti Banarashi.docx', 'candidate_skills': ['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS'], 'SF_name': None, 'RB_name': 'aarti banarashi\t\t\t', 'mb_number': ['+918108493333'], 'email': 'aartisingh271294#gmail.com'}
fieldnames = ['file_name', 'candidate_skills', 'SF_name', 'RB_name', 'mb_number', 'email']
# Python 3: open in text mode with newline='' ('wb' only works on Python 2)
with open('dict.csv', 'w', newline='') as f_file:
    csv_writer = csv.DictWriter(f_file, fieldnames=fieldnames)
    csv_writer.writeheader()
    csv_writer.writerow(res)
This gives you the following dict.csv file:
file_name,candidate_skills,SF_name,RB_name,mb_number,email
Aarti Banarashi.docx,"['JQuery', ' ', 'Bootstrap', 'codeigniter', '\n', 'Javascript', 'Analysis', 'Ajax', 'HTML', 'Html5', 'SQL', 'MySQL', 'PHP', 'CSS']",,aarti banarashi ,['+918108493333'],aartisingh271294#gmail.com
Passing fieldnames explicitly forces the columns to appear in the order you provide. If the ordering is not important, you can instead use fieldnames=res.keys().
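Neither answer covers the second part of the question, appending new records. A minimal sketch, assuming Python 3 and the same fieldnames: open the file in append mode ('a') and write the header only when the file does not exist yet or is empty. append_record is a hypothetical helper name:

```python
import csv
import os

FIELDNAMES = ['file_name', 'candidate_skills', 'SF_name', 'RB_name', 'mb_number', 'email']


def append_record(path, record, fieldnames=FIELDNAMES):
    """Append one dict as a CSV row; write the header only for a new or empty file."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    # 'a' appends to the end of the file; newline='' lets csv control line endings.
    with open(path, 'a', newline='', encoding='utf-8') as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        if write_header:
            writer.writeheader()
        writer.writerow(record)
```

Calling append_record('dict.csv', res) once per parsed document builds up a single file with one header line and one row per record.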

Related

TypeError: list indices must be integers or slices, not str - Python ZabbixAPI

I'm trying to remove values from a list as I have done previously, but I'm running into the above error when running it.
import requests
import json
import pandas as pd
import csv
import numpy
url = 'http://XXX/api_jsonrpc.php'
payload = '{"jsonrpc": "2.0", "method": "event.get", "params": {"output": "extend", "selectAcknowledges": "extend", "selectTags": "extend", "selectSuppressionData": "extend", "selectHosts": ["hostid", "host", "name"], "recent": "true", "sortorder": "DESC"}, "auth": "XXX", "id": 1 }'
headers = {'content-type': 'application/json-rpc'}
r = requests.post(url, data=payload, headers=headers, )
geteventlist = r.json()['result']
print(type(geteventlist))
for i in geteventlist:
    i['host'] = i['hosts']['host']
    i['hostid'] = i['hosts']['hostid']
    i['location'] = i['hosts']['name']
    del i['hosts']

with open('event.csv', 'w+', newline='', encoding="utf_8") as file:
    header = ['hosts', 'eventid', 'userid', 'acknowledged', 'opdata', 'object', 'name', 'suppressed', 'c_eventid', 'clock', 'source', 'objectid', 'severity', 'urls', 'r_eventid', 'value', 'ns', 'suppression_data', 'correlationid', 'tags']
    writer = csv.DictWriter(file, fieldnames = header)
    writer.writeheader()
    writer.writerows(geteventlist)
This raises the following error:
TypeError: list indices must be integers or slices, not str
The data that should be output is the following:
[{'hostid': '10519', 'proxy_hostid': '0', 'host': 'XXX', 'status': '0', 'lastaccess': '0', 'ipmi_authtype': '-1', 'ipmi_privilege': '2', 'ipmi_username': '', 'ipmi_password': '', 'maintenanceid': '0', 'maintenance_status': '0', 'maintenance_type': '0', 'maintenance_from': '0', 'name': 'XXX', 'flags': '0', 'templateid': '0', 'description': '', 'tls_connect': '1', 'tls_accept': '1', 'tls_issuer': '', 'tls_subject': '', 'proxy_address': '', 'auto_compress': '1', 'custom_interfaces': '0', 'uuid': '', 'inventory_mode': '1'}]
I understand that the error comes from a list, but previously I was able to specify the dict to bypass this; now I can't get around the error.
Thanks in advance for the help.
After some further reading, I have also tried using range(len()) like this:
for i in range(len(geteventlist)):
    i['host'] = i['hosts']['host']
    i['hostid'] = i['hosts']['hostid']
    i['location'] = i['hosts']['name']
    del i['hosts']
However, this also produces an error: "TypeError: 'int' object is not subscriptable"
And if I remove len() from the range() call:
for i in range(geteventlist):
    i['host'] = i['hosts']['host']
    i['hostid'] = i['hosts']['hostid']
    i['location'] = i['hosts']['name']
    del i['hosts']
I get error - "TypeError: 'list' object cannot be interpreted as an integer"
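Judging by the traceback and the data printed above, each event's 'hosts' value is a list of host dicts, not a dict, so it must be indexed by position first (for example i['hosts'][0]['host']). The range() attempts fail for a different reason: there i is an integer index, so the event would be geteventlist[i]. Once the loop works, csv.DictWriter will still raise ValueError, because the new keys host, hostid and location are not in header and DictWriter's default extrasaction is 'raise'. A minimal sketch, assuming only the first host of each event matters, with hypothetical sample data standing in for the API call and flatten_hosts as a hypothetical helper name:

```python
import csv
import io


def flatten_hosts(events):
    """Copy fields of each event's first host onto the event and drop 'hosts'."""
    for event in events:
        hosts = event.pop('hosts', [])      # a list of host dicts, not a dict
        first = hosts[0] if hosts else {}   # assumption: only the first host matters
        event['host'] = first.get('host', '')
        event['hostid'] = first.get('hostid', '')
        event['location'] = first.get('name', '')
    return events


# Hypothetical sample shaped like the event.get result (most fields omitted).
geteventlist = [
    {'eventid': '1', 'name': 'Disk full',
     'hosts': [{'hostid': '10519', 'host': 'XXX', 'name': 'XXX'}]},
]

# The header must list the new keys; extrasaction='ignore' skips any other keys.
header = ['eventid', 'name', 'host', 'hostid', 'location']
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=header, extrasaction='ignore')
writer.writeheader()
writer.writerows(flatten_hosts(geteventlist))
print(out.getvalue())
```

To write to disk instead, replace io.StringIO() with open('event.csv', 'w', newline='', encoding='utf-8') inside a with block.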

Match string value to dataframe value and add to string

I have a string of column names and their data types, called cols:
_LOAD_DATETIME datetime,
_LOAD_FILENAME string,
_LOAD_FILE_ROW_NUMBER int,
_LOAD_FILE_TIMESTAMP datetime,
ID int
Next, I build a DataFrame from a Google Sheet I'm reading:
import pandas as pd
output = [['table_name', 'schema_name', 'column_name', 'data_type', 'null?', 'default', 'kind', 'expression', 'comment', 'database_name', 'autoincrement', 'DateTime Comment Added'], ['ACCOUNT', 'SO', '_LOAD_DATETIME', '{"type":"TIMESTAMP_LTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', '', 'V'], ['ACCOUNT', 'SO', '_LOAD_FILENAME', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'VE'], ['B_ACCOUNT', 'SO', '_LOAD_FILE_ROW_NUMBER', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":true,"fixed":false}', 'TRUE', '', 'COLUMN', '', '', 'V'], ['ACCOUNT', 'SO', '_LOAD_FILE_TIMESTAMP', '{"type":"TIMESTAMP_NTZ","precision":0,"scale":9,"nullable":true}', 'TRUE', '', 'COLUMN', '', 'TEST', 'VE', '', '2022-02-16'], ['ACCOUNT', 'SO', 'ID', '{"type":"TEXT","length":16777216,"byteLength":16777216,"nullable":false,"fixed":false}', 'NOT_NULL', '', 'COLUMN', '', 'ID of Account', 'V', '', '2022-02-16'],]
df = pd.DataFrame(output)
df.columns = df.iloc[0]
df = df[1:]
last_2_days = '2022-02-15'
query_list = []
for index, row in df.iterrows():
    if row['comment'] is not None and row['comment'] != '' and (row['DateTime Comment Added'] >= last_2_days):
        comment_data = row['column_name'], row['comment']
        query_list.append(comment_data)
When I print query_list, it looks like this, which is the correct data, since I only want the column_name and comment where the DateTime Comment Added column falls within the last 2 days:
[('_LOAD_FILE_TIMESTAMP', 'TEST'), ('ID', 'ID of Account')]
What I want to do next (and I'm having trouble figuring out how) is take my cols string from earlier, add each comment from query_list to the matching column name, and put the word COMMENT before the actual comment.
So cols should then look like this:
_LOAD_DATETIME datetime,
_LOAD_FILENAME string,
_LOAD_FILE_ROW_NUMBER int,
_LOAD_FILE_TIMESTAMP datetime COMMENT 'TEST',
ID int COMMENT 'ID of Account'
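One way to do this, sketched under the assumption that cols holds one column per line in the form NAME type with an optional trailing comma: turn query_list into a dict and rebuild each line. add_comments is a hypothetical helper name, and doubling any single quotes inside a comment (SQL-style escaping) is an added assumption, not part of the question:

```python
def add_comments(cols, query_list):
    """Append COMMENT '<text>' to each line of cols whose column has a comment."""
    comments = dict(query_list)  # column_name -> comment
    new_lines = []
    for line in cols.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # skip blank lines
        has_comma = stripped.endswith(',')
        body = stripped.rstrip(',').rstrip()  # e.g. "ID int"
        column_name = body.split()[0]         # first token is the column name
        if column_name in comments:
            escaped = comments[column_name].replace("'", "''")  # assumed SQL-style escaping
            body += " COMMENT '{}'".format(escaped)
        new_lines.append(body + (',' if has_comma else ''))
    return '\n'.join(new_lines)


cols = """_LOAD_DATETIME datetime,
_LOAD_FILENAME string,
_LOAD_FILE_ROW_NUMBER int,
_LOAD_FILE_TIMESTAMP datetime,
ID int"""
query_list = [('_LOAD_FILE_TIMESTAMP', 'TEST'), ('ID', 'ID of Account')]

cols = add_comments(cols, query_list)
print(cols)
```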

Reading whitespace in heading of csv file using pandas

I need to read a CSV file whose header names contain spaces, and I need help fixing my code. I tried different approaches, such as delimiter=' ' and delim_whitespace=True. Here is how I'm writing the code:
df = pd.read_csv(
    d,
    dtype = 'str',
    usecols=[
        'Owner First Name',
        'Owner Last Name',
        'StreetNumber',
        'StreetName',
        'State',
        'Zip Code',
        'Bdrms',
        'Legal Description',
        'Sq Ftg',
        'Address',
        'Orig Ln Amt',
        'Prop Value'
    ],
    names=[
        'Owner_FirstName',
        'Owner_LastName',
        'StreetNumber',
        'StreetName',
        'State',
        'ZipCode',
        'Bdrms',
        'Legal_Description',
        'Sq_Ftg',
        'Address',
        'Orig_Ln_Amt',
        'Prop_Value'
    ],
    skipinitialspace=True
)
Along with your existing options, you can use the python engine and pass a regular-expression separator in the sep argument that matches whitespace (\s), commas and tabs (\t):
pd.read_csv(d, engine='python', sep=r'\s+|,|\t')
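If the file is comma-separated and only the header names contain spaces, another option is to leave the separator alone and rename the columns after reading. As I read the pandas documentation, passing names= makes pandas treat the file as having no header row (unless header=0 is also given), so the usecols strings in the question are matched against the new names rather than the file's own header. A sketch, assuming a comma-separated file; the sample data and the three-entry rename_map are illustrative only:

```python
import io

import pandas as pd

# Hypothetical sample: header names with spaces, plus one column we don't need.
sample = io.StringIO(
    "Owner First Name,Owner Last Name,Zip Code,Unused\n"
    "Ann, Lee, 12345,x\n"
)

# Map the header names as they appear in the file to the names you want.
rename_map = {
    'Owner First Name': 'Owner_FirstName',
    'Owner Last Name': 'Owner_LastName',
    'Zip Code': 'ZipCode',
}

df = pd.read_csv(
    sample,
    dtype=str,
    usecols=list(rename_map),  # match the header exactly as written in the file
    skipinitialspace=True,     # drop spaces that follow each comma
).rename(columns=rename_map)
print(df)
```

Keeping the mapping in one dict means usecols and the new names cannot drift out of sync.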

can someone explain how i can pull specific fields from a dict in python

I'm using Python 2.7 and the eBay SDK v2.
I have a dict stored and I'm trying to write it to a CSV file. The problem is that I only want certain fields; I don't want to have to write out every column.
Here is my dict:
{'itemSearchURL': 'http://www.ebay.co.uk/sch/i.html?LH_ItemCondition=1&_nkw=OMP+OD%2F1989&_ddo=1&_ipg=1&_pgn=1', 'paginationOutput': {'totalPages': '187', 'entriesPerPage': '1', 'pageNumber': '1', 'totalEntries': '187'}, 'ack': 'Success', 'timestamp': '2016-11-15T15:52:01.356Z', 'searchResult': {'item': [{'itemId': '322324027874', 'subtitle': '100% GENUINE OMP STEERING WHEEL - NOT A CHEAP FAKE COPY', 'globalId': 'EBAY-GB', 'title': 'OD/1989/NN OMP TRECENTO UNO SPORTS STEERING WHEEL 300mm in BLACK POLYURETHANE', 'country': 'GB', 'primaryCategory': {'categoryId': '40195', 'categoryName': 'Steering Wheels & Boss Kits'}, 'autoPay': 'false', 'galleryURL': 'http://thumbs3.ebaystatic.com/m/miOhEO1pDb2cff4pPcZpwIQ/140.jpg', 'shippingInfo': {'shippingType': 'Free', 'shipToLocations': ['AU', 'Americas', 'Europe', 'Asia'], 'shippingServiceCost': {'_currencyId': 'GBP', 'value': '0.0'}}, 'location': 'United Kingdom', 'topRatedListing': 'false', 'viewItemURL': 'http://www.ebay.co.uk/itm/OD-1989-NN-OMP-TRECENTO-UNO-SPORTS-STEERING-WHEEL-300mm-BLACK-POLYURETHANE-/322324027874', 'sellingStatus': {'currentPrice': {'_currencyId': 'GBP', 'value': '78.28'}, 'timeLeft': 'P25DT0H21M7S', 'convertedCurrentPrice': {'_currencyId': 'GBP', 'value': '78.28'}, 'sellingState': 'Active'}, 'paymentMethod': 'PayPal', 'isMultiVariationListing': 'false', 'condition': {'conditionId': '1000', 'conditionDisplayName': 'New'}, 'listingInfo': {'listingType': 'FixedPrice', 'gift': 'false', 'bestOfferEnabled': 'false', 'startTime': '2016-11-10T16:13:08.000Z', 'buyItNowAvailable': 'false', 'endTime': '2016-12-10T16:13:08.000Z'}}], '_count': '1'}, 'version': '1.13.0'}
Here is the section of my code that's not working:
def WriteDictToCSV(csv_file,csv_columns,dict_data):
    try:
        with open(csv_file, 'w') as csvfile:
            writer = csv.DictWriter(csvfile, fieldnames=csv_columns)
            writer.writeheader()
            for data in dict_data:
                writer.writerow(data)
    except IOError as (errno, strerror):
        print("I/O error({0}): {1}".format(errno, strerror))
    return
csv_columns = ['itemId','title','subtitle','viewItemURL']
currentPath = os.getcwd()
csv_file = currentPath + "/csv/items.csv"
WriteDictToCSV(csv_file,csv_columns,response.dict())
This is my error:
Traceback (most recent call last):
File "/home/richard/workspace/ebay title search by csv/learning2.py", line 56, in <module>
WriteDictToCSV(csv_file,csv_columns,response.dict())
File "/home/richard/workspace/ebay title search by csv/learning2.py", line 45, in WriteDictToCSV
writer.writerow(data)
File "/usr/lib64/python2.7/csv.py", line 148, in writerow
return self.writer.writerow(self._dict_to_list(rowdict))
File "/usr/lib64/python2.7/csv.py", line 144, in _dict_to_list
", ".join(wrong_fields))
ValueError: dict contains fields not in fieldnames: i, t, e, m, S, e, a, r, c, h, U, R, L
I understand that I'm missing the column names, but I really don't know how to parse the bits I want out of the dict. I was thinking about converting the dict to XML and using etree, as I've seen some guides online showing how to do that, but I would really like to learn how to work with the dict as it is.
I also tried a for loop, but that just gave me errors saying there is no value:
for item in response.dict()['searchResult']['item']:
    print "ItemID: %s" % item['itemId'].value
    print "Title: %s" % item['title'].value
    print "CategoryID: %s" % item['primaryCategory']['categoryId'].value
I'm struggling to get my head around the container.
If I loop through the dict, I can see the keys and values:
mydata = response.dict()
for key, value in mydata.items():
    print key, value
That gives me output key by key.
I was thinking of enumerating them and extracting a single key, but I read that dictionaries are not ordered, so it would come out with different results. Is this true?
You are iterating over the single dictionary you passed in:
for data in dict_data:
    writer.writerow(data)
This passes in each key from the dictionary as a separate row; each row is supposed to be a dictionary and the csv module uses iteration over the row to test for extra keys. This is why you see each individual letter from the first key (itemSearchURL here).
Pass in that list of dictionaries buried under the ['searchResult']['item'] keys:
WriteDictToCSV(csv_file, csv_columns, response.dict()['searchResult']['item'])
Now dict_data is a sequence of dict objects, and data will be one of the dictionaries in the list.
Next, these dictionaries contain more keys than what you told the DictWriter instance about, so you'll still get an error:
ValueError: dict contains fields not in fieldnames: 'globalId', 'country', 'primaryCategory', 'autoPay', 'galleryURL', 'shippingInfo', 'location', 'topRatedListing', 'sellingStatus', 'paymentMethod', 'isMultiVariationListing', 'condition', 'listingInfo'
Tell DictWriter to ignore these with extrasaction='ignore':
writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore')
Last but not least, you don't have to do any of that looping yourself; just use writer.writerows() (plural, note the s at the end) to write a list of rows in one go:
writer = csv.DictWriter(csvfile, fieldnames=csv_columns, extrasaction='ignore')
writer.writeheader()
writer.writerows(dict_data)
Demo:
>>> from cStringIO import StringIO
>>> import csv
>>> response_dict = {'itemSearchURL': 'http://www.ebay.co.uk/sch/i.html?LH_ItemCondition=1&_nkw=OMP+OD%2F1989&_ddo=1&_ipg=1&_pgn=1', 'paginationOutput': {'totalPages': '187', 'entriesPerPage': '1', 'pageNumber': '1', 'totalEntries': '187'}, 'ack': 'Success', 'timestamp': '2016-11-15T15:52:01.356Z', 'searchResult': {'item': [{'itemId': '322324027874', 'subtitle': '100% GENUINE OMP STEERING WHEEL - NOT A CHEAP FAKE COPY', 'globalId': 'EBAY-GB', 'title': 'OD/1989/NN OMP TRECENTO UNO SPORTS STEERING WHEEL 300mm in BLACK POLYURETHANE', 'country': 'GB', 'primaryCategory': {'categoryId': '40195', 'categoryName': 'Steering Wheels & Boss Kits'}, 'autoPay': 'false', 'galleryURL': 'http://thumbs3.ebaystatic.com/m/miOhEO1pDb2cff4pPcZpwIQ/140.jpg', 'shippingInfo': {'shippingType': 'Free', 'shipToLocations': ['AU', 'Americas', 'Europe', 'Asia'], 'shippingServiceCost': {'_currencyId': 'GBP', 'value': '0.0'}}, 'location': 'United Kingdom', 'topRatedListing': 'false', 'viewItemURL': 'http://www.ebay.co.uk/itm/OD-1989-NN-OMP-TRECENTO-UNO-SPORTS-STEERING-WHEEL-300mm-BLACK-POLYURETHANE-/322324027874', 'sellingStatus': {'currentPrice': {'_currencyId': 'GBP', 'value': '78.28'}, 'timeLeft': 'P25DT0H21M7S', 'convertedCurrentPrice': {'_currencyId': 'GBP', 'value': '78.28'}, 'sellingState': 'Active'}, 'paymentMethod': 'PayPal', 'isMultiVariationListing': 'false', 'condition': {'conditionId': '1000', 'conditionDisplayName': 'New'}, 'listingInfo': {'listingType': 'FixedPrice', 'gift': 'false', 'bestOfferEnabled': 'false', 'startTime': '2016-11-10T16:13:08.000Z', 'buyItNowAvailable': 'false', 'endTime': '2016-12-10T16:13:08.000Z'}}], '_count': '1'}, 'version': '1.13.0'}
>>> csv_columns = ['itemId','title','subtitle','viewItemURL']
>>> out = StringIO()
>>> writer = csv.DictWriter(out, fieldnames=csv_columns, extrasaction='ignore')
>>> writer.writeheader()
>>> writer.writerows(response_dict['searchResult']['item'])
>>> print out.getvalue()
itemId,title,subtitle,viewItemURL
322324027874,OD/1989/NN OMP TRECENTO UNO SPORTS STEERING WHEEL 300mm in BLACK POLYURETHANE,100% GENUINE OMP STEERING WHEEL - NOT A CHEAP FAKE COPY,http://www.ebay.co.uk/itm/OD-1989-NN-OMP-TRECENTO-UNO-SPORTS-STEERING-WHEEL-300mm-BLACK-POLYURETHANE-/322324027874
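For reference, here is a Python 3 sketch of the same idea (cStringIO and the print statement are Python 2 only). It also shows how to pull a nested value such as primaryCategory's categoryId into its own column: in the dict shown in the question the leaf values are plain strings, so they are read directly, without .value. The sample is trimmed from the question's response dict, and write_items_csv is a hypothetical helper name. On the ordering question: dict order is arbitrary in Python 2.7 (insertion order is only guaranteed from Python 3.7), but either way DictWriter takes the column order from fieldnames, not from the dict.

```python
import csv
import io


def write_items_csv(f, items, columns):
    """Write the selected keys of each item dict to the open file f as CSV."""
    writer = csv.DictWriter(f, fieldnames=columns, extrasaction='ignore')
    writer.writeheader()
    writer.writerows(items)


# Trimmed from the response dict in the question.
response_dict = {
    'ack': 'Success',
    'searchResult': {'item': [{
        'itemId': '322324027874',
        'title': 'OD/1989/NN OMP TRECENTO UNO SPORTS STEERING WHEEL 300mm in BLACK POLYURETHANE',
        'globalId': 'EBAY-GB',
        'primaryCategory': {'categoryId': '40195',
                            'categoryName': 'Steering Wheels & Boss Kits'},
    }], '_count': '1'},
}

items = response_dict['searchResult']['item']
# Copy the nested categoryId up to the top level so it can become a column.
rows = [dict(item, categoryId=item['primaryCategory']['categoryId']) for item in items]

out = io.StringIO()
write_items_csv(out, rows, ['itemId', 'title', 'categoryId'])
print(out.getvalue())
```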

Python script reading from a csv file [duplicate]

This question already has answers here:
How do I read and write CSV files with Python?
(7 answers)
Closed 3 months ago.
"Type","Name","Description","Designation","First-term assessment","Second-term assessment","Total"
"Subject","Nick","D1234","F4321",10,19,29
"Unit","HTML","D1234-1","F4321",18,,
"Topic","Tags","First Term","F4321",18,,
"Subtopic","Review of representation of HTML",,,,,
All the above values come from an Excel sheet, which was converted to the CSV shown above.
As you can see, the header contains seven columns; the data below it varies.
I use the following Python script to read these rows:
from django.db import transaction
import sys
import csv
import StringIO
file = sys.argv[1]
no_cols_flag=0
flag=0
header_arr=[]
print file
f = open(file, 'r')
while (f.readline() != ""):
    for i in [line.split(',') for line in open(file)]: # split on the separator
        print "==========================================================="
        row_flag=0
        row_d=""
        for j in i: # for each token in the split string
            row_flag=1
            print j
            if j:
                no_cols_flag=no_cols_flag+1
                data=j.strip()
                print j
    break
How can I modify the above script so that it tells me which column header each value belongs to?
Thanks.
You're importing the csv module but never using it. Why?
If you do
import csv
reader = csv.reader(open(file, "rb"), dialect="excel") # Python 2.x
# Python 3: reader = csv.reader(open(file, newline=""), dialect="excel")
you get a reader object that will contain all you need; the first row will contain the headers, and the subsequent rows will contain the data in the corresponding places.
Even better might be (if I understand you correctly):
import csv
reader = csv.DictReader(open(file, "rb"), dialect="excel") # Python 2.x
# Python 3: reader = csv.DictReader(open(file, newline=""), dialect="excel")
This DictReader can be iterated over, returning a sequence of dicts that use the column headers as keys and the corresponding data as values, so
for row in reader:
print(row)
will output
{'Name': 'Nick', 'Designation': 'F4321', 'Type': 'Subject', 'Total': '29', 'First-term assessment': '10', 'Second-term assessment': '19', 'Description': 'D1234'}
{'Name': 'HTML', 'Designation': 'F4321', 'Type': 'Unit', 'Total': '', 'First-term assessment': '18', 'Second-term assessment': '', 'Description': 'D1234-1'}
{'Name': 'Tags', 'Designation': 'F4321', 'Type': 'Topic', 'Total': '', 'First-term assessment': '18', 'Second-term assessment': '', 'Description': 'First Term'}
{'Name': 'Review of representation of HTML', 'Designation': '', 'Type': 'Subtopic', 'Total': '', 'First-term assessment': '', 'Second-term assessment': '', 'Description': ''}
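A self-contained Python 3 version, using a with block so the file is closed afterwards and reading from an in-memory copy of the question's CSV (with a real file, use open(file, newline='') instead of io.StringIO). Because each row is a dict keyed by the header, a value can be looked up by its column name:

```python
import csv
import io

# In-memory copy of the CSV from the question.
data = '''"Type","Name","Description","Designation","First-term assessment","Second-term assessment","Total"
"Subject","Nick","D1234","F4321",10,19,29
"Unit","HTML","D1234-1","F4321",18,,
"Topic","Tags","First Term","F4321",18,,
"Subtopic","Review of representation of HTML",,,,,
'''

with io.StringIO(data) as f:
    reader = csv.DictReader(f, dialect='excel')
    rows = list(reader)

for row in rows:
    # Look up individual values by their column header.
    print(row['Type'], row['Name'], row['Total'])
```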
