I'm experiencing a problem when I dump JSON data into a CSV file. Typically a block of JSON data is missing from the CSV file, but it appears if I print the JSON to the console or to a file.
Essentially I am calling a service twice and receiving back two JSON responses that I parse and dump into a CSV file. The service can only be called in 7-day increments (Unix time), so I have implemented logic to call it repeatedly, one increment at a time, over a longer period.
I'm using the vanilla Python json and csv libraries.
First the CSV is created with headers:
with open('history_' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + '.csv', 'wb') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(["Column1", "Column2", "Column3", "Column4", "Column5",
                     "Column6"])
Then, a counter calls the service twice per iteration, fifty times (following the opening of the CSV file):
while y < 50:
    jsonResponseOne = getJsonOne(7)
    jsonResponseTwo = getJsonTwo(7)
Example JSON response:
{"Value":
[
{"ExampleName": "Test",
"ExampleNameTwo": "Test2",
"ExampleDate": "1436103790",
"ExampleCode": 00000001,
"ExampleofExample": "abcd",
"AnotherExample": "hello"},
{"ExampleName": "Test2",
"ExampleNameTwo": "Test3",
"ExampleDate": "1436103790",
"ExampleCode": 00000011,
"ExampleofExample": "abcd",
"AnotherExample": "hello2"},
]
}
The CSV output columns would look like:
ExampleName ExampleNameTwo ExampleDate ExampleCode ExampleofExample AnotherExample
Finally, the CSV is written as follows:
for item in jsonResponseOne['Value']:
    row = []
    row.append(str(item['ExampleName'].encode('utf-8')))
    if item.get("ExampleNameTwo"):
        row.append(str(item["ExampleNameTwo"]))
    else:
        row.append("None")
    row.append(str(item['ExampleDate']))
    row.append(str(item['ExampleCode'].encode('utf-8')))
    row.append(str(item['ExampleofExample'].encode('utf-8')))
    row.append(str(item['AnotherExample'].encode('utf-8')))
    writer.writerow(row)
for item in jsonResponseTwo['Value']:
    anotherRow = []
    anotherRow.append(str(item['ExampleName'].encode('utf-8')))
    if item.get("ExampleNameTwo"):
        anotherRow.append(str(item["ExampleNameTwo"]))
    else:
        anotherRow.append("None")
    anotherRow.append(str(item['ExampleDate']))
    anotherRow.append(str(item['ExampleCode'].encode('utf-8')))
    anotherRow.append(str(item['ExampleofExample'].encode('utf-8')))
    anotherRow.append(str(item['AnotherExample'].encode('utf-8')))
    writer.writerow(anotherRow)
Why could my CSV output be missing an entire row of data (a block of data from the JSON response)?
Resolved.
The Python script had an indentation issue in one of the while loops, causing some data to be skipped over and not written to the CSV file.
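For reference, a minimal sketch of the corrected structure, assuming the getJsonOne/getJsonTwo helpers from the question: both service calls and both write loops must sit at the same indentation level inside the while loop, which itself sits inside the with block so the writer is still open.
import csv
import datetime

with open('history_' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + '.csv', 'wb') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(["Column1", "Column2", "Column3", "Column4", "Column5", "Column6"])
    y = 0
    while y < 50:
        jsonResponseOne = getJsonOne(7)  # hypothetical helpers from the question
        jsonResponseTwo = getJsonTwo(7)
        for response in (jsonResponseOne, jsonResponseTwo):
            # This loop is indented under the while, so it runs on every iteration
            for item in response['Value']:
                row = [str(item.get(k, "None")) for k in
                       ("ExampleName", "ExampleNameTwo", "ExampleDate",
                        "ExampleCode", "ExampleofExample", "AnotherExample")]
                writer.writerow(row)
        y += 1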
I am stuck on a task where my requirement is combining multiple JSON files into a single JSON file and compressing it in an S3 folder.
I got partway there, but the JSON contents are being merged as dictionary keys. I used a dictionary to load the JSON content from the files because when I tried loading it as a list it threw JSONDecodeError: Extra data: line 1 column 432 (char 431).
my files look like this:
file1 (the files have no .json extension):
{"abc":"bcd","12354":"31354321"}
file 2
{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}
my code:
import json
import boto3

s3_client = boto3.client('s3')
bucket_name = '<my bucket>'

def lambda_handler(event, context):
    key = '<Bucket key>'
    jsonfilesname = ['<name of the json files which stored in list>']
    result = []
    json_data = {}
    for f in range(len(jsonfilesname)):
        s3_client.download_file(bucket_name, key + jsonfilesname[f], '/tmp/' + key + jsonfilesname[f])
        infile = open('/tmp/' + jsonfilesname[f]).read()
        json_data[infile] = result
    with open('/tmp/merged_file', 'w') as outfile:
        json.dump(json_data, outfile)
the output written to the outfile by the above code is:
{
"{"abc":"bcd","12354":"31354321"}: []",
"{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"} :[]"
}
my expectation is:
{"abc":"bcd","12354":"31354321"},{"abc":"bcd","12354":"31354321":"hqeddeqf":"5765354"}
Please can someone advise what needs to be done to get my expected output?
First of all:
file2 is not a valid JSON file; correctly, it should be:
{
    "abc": "bcd",
    "12354": "31354321",
    "hqeddeqf": "5765354"
}
Also, the output is not valid JSON: what you would expect after merging 2 JSON files is an array of JSON objects:
[
    {
        "abc": "bcd",
        "12354": "31354321"
    },
    {
        "abc": "bcd",
        "12354": "31354321",
        "hqeddeqf": "5765354"
    }
]
Knowing this, we could write a Lambda to merge the JSON files:
import json
import boto3

s3 = boto3.client('s3')

def lambda_handler(event, context):
    bucket = '...'
    jsonfilesname = ['file1.json', 'file2.json']
    result = []
    for key in jsonfilesname:
        data = s3.get_object(Bucket=bucket, Key=key)
        content = json.loads(data['Body'].read().decode("utf-8"))
        result.append(content)
    # Do something with the merged content
    print(json.dumps(result))
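To persist the merged array instead of just printing it, a short sketch; gzip is used because the question asks for a compressed object, and the merged.json.gz key name is an assumption:
import gzip
import json
import boto3

s3 = boto3.client('s3')

def write_merged(bucket, result):
    # Serialize the merged list, gzip it, and upload it back to S3.
    # 'merged.json.gz' is a hypothetical key name.
    body = gzip.compress(json.dumps(result).encode('utf-8'))
    s3.put_object(Bucket=bucket, Key='merged.json.gz', Body=body)
Inside the Lambda above, a call like write_merged(bucket, result) would replace the print.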
If you are using AWS, I would recommend S3DistCp for JSON file merging, as it provides a fault-tolerant, distributed way that can keep up with large files as well by leveraging MapReduce. However, it does not seem to support in-place merging.
I am trying to make a script that will delete everything within abc2. But right now, it just deletes all the json code.
The JSON code is located in a file named "demo".
There are multiple JSON objects in the file, one per line.
Python:
import json

with open('demo.json', 'w') as dest_file:
    with open('demo.json', 'r') as source_file:
        for parameters in source_file:
            element = json.loads(parameters.strip())
            if 'abc1' in element:
                del element['abc1']
            dest_file.write(json.dumps(element))
snippet of Json:
{
    "parameters": [{
        "abc1": {
            "type": "string",
            "defaultValue": "HELLO1"
        },
        "abc2": {
            "type": "string",
            "defaultValue": "HELLO2"
        }
    }]
}
When opening a file with w it clears it, so do it in 2 steps:
read the content, keep what you need, delete what you need
write the new content
import json

to_keep = []
with open('demo.json') as file:
    content = json.load(file)
    for parameter in content['parameters']:
        print(parameter)
        if 'abc1' in parameter:
            del parameter['abc1']
        to_keep.append(parameter)

with open('demo.json', 'w') as file:
    json.dump({'parameters': to_keep}, file, indent=4)
Opening the file for writing is truncating the file before you can read it.
You should read the entire file into memory, then you can overwrite the file.
You also need to loop through the parameters list and delete the abc2 properties from its elements. And when you write the JSON back to the file, you need to separate each object with a newline (but it's generally a bad idea to put multiple JSON strings in a single file; it would be better to collect them all in a list and load and dump it all at once).
import json

with open('demo.json', 'r+') as source_file:
    lines = source_file.readlines()
    source_file.seek(0)  # rewind so we overwrite the file
    for parameters in lines:
        element = json.loads(parameters.strip())
        for param in element['parameters']:
            if 'abc2' in param:
                del param['abc2']
        source_file.write(json.dumps(element) + '\n')
    source_file.truncate()
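A short sketch of the list-based alternative mentioned above: collect every object into one JSON array and load and dump it in a single step.
import json

# Read all the line-delimited objects once
with open('demo.json') as f:
    elements = [json.loads(line) for line in f if line.strip()]

# Strip abc2 from each parameters entry
for element in elements:
    for param in element['parameters']:
        param.pop('abc2', None)

# Dump everything back as one JSON array
with open('demo.json', 'w') as f:
    json.dump(elements, f, indent=4)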
I am currently conducting a data scraping project with Python 3 and am attempting to write the scraped data to a CSV file. My current process to do it is this:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
for each in data:
    scrapedData = scrap(each)
    outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
Once this script is finished, however, the CSV file is blank. If I just run:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
a CSV file is produced containing the headers:
header1,header2,..
If I just scrape one item in data, for example:
outputFile.writerow(['header1', 'header2'...])
scrapedData = scrap(data[0])
outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
a CSV file will be created including both the headers and the data for data[0]:
header1,header2,..
header1 data for data[0], header2 data for data[0]
Why is this the case?
When you open a file with w, it erases the previous data
From the docs
w: open for writing, truncating the file first
So when you open the file with w after writing the scraped data, you just get a blank file, and then you write the header to it, so you only see the header. Try replacing w with a. The new call to open the file would look like:
outputFile = csv.writer(open('myFilepath', 'a'))
You can find more information about the modes for opening a file here.
Ref: How do you append to a file?
Edit after DYZ's comment:
You should also be closing the file after you are done appending. I would suggest using the file like this:
with open('path/to/file', 'a') as file:
    outputFile = csv.writer(file)
    # Do your work with the file
This way you don't have to worry about remembering to close it. Once the code exits the with block, the file will be closed.
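Applied to the code in the question, a minimal sketch, where scrap() and data are the hypothetical helper and list from the question:
import csv

with open('myFilepath', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['header1', 'header2'])
    for each in data:  # data and scrap() come from the question's scraping code
        scrapedData = scrap(each)
        writer.writerow([scrapedData.get('header1', 'header 1 NA'),
                         scrapedData.get('header2', 'header 2 NA')])
# The file is flushed and closed when the with block exits,
# so no buffered rows are lost even without an explicit close()
Mode 'w' is fine here because the whole run writes the file once inside a single with block; use 'a' as above if you reopen the file across runs.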
I would use Pandas for this:
import pandas as pd
headers = ['header1', 'header2', ...]
scraped_df = pd.DataFrame(data, columns=headers)
scraped_df.to_csv('filepath.csv')
Here I'm assuming your data object is a list of lists.
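One small note: to_csv writes the DataFrame index as an extra first column by default; pass index=False if you only want the scraped columns in the file.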
I'm new to Python and I've got a large JSON file that I need to convert to CSV - below is a sample:
{ "status": "success","Name": "Theresa May","Location": "87654321","AccountCategory": "Business","AccountType": "Current","TicketNo": "12345-12","AvailableBal": "12775.0400","BookBa": "123475.0400","TotalCredit": "1234567","TotalDebit": "0","Usage": "5","Period": "May 11 2014 to Jul 11 2014","Currency": "GBP","Applicants": "Angel","Signatories": [{"Name": "Not Available","BVB":"Not Available"}],"Details": [{"PTransactionDate":"24-Jul-14","PValueDate":"24-Jul-13","PNarration":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"24-Jul-14","PValueDate":"23-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"22-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"21-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"20-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"}]}
I need this to show up as
name, status, location, accountcategory, accounttype, availablebal, totalcredit, totaldebit, etc. as columns,
with pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and ptest taking new values on each row, as the JSON file shows.
I've managed to put the script below together from looking online, but it's giving me an empty CSV file at the end. What have I done wrong? The online JSON-to-CSV converters work, but as these are sensitive files I'm hoping to write/manage this with my own script so I can see exactly how it works. Please see my Python script below - can I have some advice on what to change? Thanks.
import csv
import json

infile = open("BankStatementJSON1.json", "r")
outfile = open("testing.csv", "w")

writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.writerow(row)
import csv, json, sys

# if you are not using utf-8 files, remove the next line
sys.setdefaultencoding("UTF-8")  # set the encoding to utf8

# check if you pass the input file and output file
if sys.argv[1] is not None and sys.argv[2] is not None:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]
    inputFile = open("BankStatementJSON1.json", "r")  # open json file
    outputFile = open("testing2.csv", "w")  # load csv file
    data = json.load("BankStatementJSON1.json")  # load json content
    inputFile.close()  # close the input file
    output = csv.writer("testing.csv")  # create a csv.writer
    output.writerow(data[0].keys())  # header row
    for row in data:
        output.writerow(row.values())  # values row
This works for the JSON example you posted. The issue is that you have a nested dict, so you can't create sub-headers and sub-rows for pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and ptest the way you want.
You can use csv.DictWriter:
import csv
import json

with open("BankStatementJSON1.json", "r") as inputFile:  # open json file
    data = json.loads(inputFile.read())  # load json content

with open("testing.csv", "w") as outputFile:  # open csv file
    output = csv.DictWriter(outputFile, data.keys())  # create a writer
    output.writeheader()
    output.writerow(data)
Make sure you're closing the output file at the end as well.
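If you do want one row per entry in Details, with the top-level fields repeated on each row, a minimal sketch along these lines could work; the field names are taken from the posted sample, so treat them as assumptions about the full file:
import csv
import json

with open("BankStatementJSON1.json") as f:
    data = json.load(f)

# One row per transaction: merge the top-level scalar fields
# into each entry of the nested "Details" list.
top = {k: v for k, v in data.items() if not isinstance(v, (list, dict))}
rows = [dict(top, **detail) for detail in data["Details"]]

# Collect every key that appears in any row so the header is complete
# (some entries use PNarration, others PTest, in the sample)
fieldnames = []
for r in rows:
    for k in r:
        if k not in fieldnames:
            fieldnames.append(k)

with open("testing.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)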
I am working on my first python project and am in over my head.
I am trying to collect bike share data from an xml feed.
I want to capture this data every n minutes, let's say 5 for now, then create a CSV file of that data.
Additionally, I want to create a second file that the data is appended to, so I have a historical database.
The headers in the csv are:
[ "id", "name", "terminalName", "lastCommWithServer", "lat", "long", "installed", "locked", "installDate", "removalDate", "temporary", "public", "nbBikes", "nbEmptyDocks", "latestUpdateTime" ]
I've made a start, but I'm getting nowhere fast! Any help would be appreciated.
This is what I have, but the CSV writing is a mess.
import urllib
import csv

url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
connection = urllib.urlopen(url)
data = connection.read()

with open('statuslog.csv', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(data)
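Passing the raw XML string to writerow treats it as a sequence of characters, one column per character; the feed needs to be parsed first. A minimal sketch of one way to do that with xml.etree.ElementTree, written Python 2 style to match the urllib.urlopen call above, and assuming each station in the feed is a <station> element whose child tags are named like the headers (check the real feed layout before relying on this):
import csv
import urllib
import xml.etree.ElementTree as ET

HEADERS = ["id", "name", "terminalName", "lastCommWithServer", "lat", "long",
           "installed", "locked", "installDate", "removalDate", "temporary",
           "public", "nbBikes", "nbEmptyDocks", "latestUpdateTime"]

url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
data = urllib.urlopen(url).read()
root = ET.fromstring(data)

with open('statuslog.csv', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(HEADERS)
    for station in root.iter('station'):
        # findtext returns the child element's text, or '' if it is missing
        wr.writerow([station.findtext(h, '') for h in HEADERS])
For the historical file, the same rows could be appended to a second file opened with mode 'ab' instead of 'wb'.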