I am working on my first Python project and am in over my head.
I am trying to collect bike share data from an XML feed.
I want to capture this data every n minutes, let's say 5 for now, then create a CSV file of that data.
Additionally, I want a second file to which each new capture is appended, so I have a historical database.
The headers in the csv are:
[ "id", "name", "terminalName", "lastCommWithServer", "lat", "long", "installed", "locked", "installDate", "removalDate", "temporary", "public", "nbBikes", "nbEmptyDocks", "latestUpdateTime" ]
I made a start but I'm going nowhere slowly! Any help would be appreciated.
This is what I have, but the CSV writing is a mess.
import urllib
import csv
url = 'http://www.capitalbikeshare.com/data/stations/bikeStations.xml'
connection = urllib.urlopen(url)
data = connection.read()
with open('statuslog.csv', 'wb') as myfile:
    wr = csv.writer(myfile, quoting=csv.QUOTE_ALL)
    wr.writerow(data)
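One possible direction, as a minimal sketch rather than a definitive answer: parse the XML into one dict per station, write a fresh snapshot file, and append the same rows to a history file. It assumes Python 3 and assumes the feed is a list of `<station>` elements whose child tags match the header names above. `SAMPLE_XML`, the station names in it, and the helper names `parse_stations`, `write_snapshot` and `append_history` are made up for illustration; in the real script the text would come from the downloaded feed.

```python
import csv
import os
import xml.etree.ElementTree as ET

FIELDS = ["id", "name", "terminalName", "lastCommWithServer", "lat", "long",
          "installed", "locked", "installDate", "removalDate", "temporary",
          "public", "nbBikes", "nbEmptyDocks", "latestUpdateTime"]

# Hypothetical sample standing in for the downloaded feed (structure is assumed)
SAMPLE_XML = """<stations>
  <station><id>1</id><name>Station A</name><nbBikes>5</nbBikes><nbEmptyDocks>6</nbEmptyDocks></station>
  <station><id>2</id><name>Station B</name><nbBikes>3</nbBikes><nbEmptyDocks>8</nbEmptyDocks></station>
</stations>"""


def parse_stations(xml_text):
    """Return one dict per <station>, with '' for any field that is missing."""
    root = ET.fromstring(xml_text)
    return [{field: station.findtext(field, default="") for field in FIELDS}
            for station in root.iter("station")]


def write_snapshot(rows, path):
    """Overwrite `path` with the header and the current rows."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
        writer.writeheader()
        writer.writerows(rows)


def append_history(rows, path):
    """Append the rows to `path`, writing the header only for a new or empty file."""
    is_new = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS, quoting=csv.QUOTE_ALL)
        if is_new:
            writer.writeheader()
        writer.writerows(rows)
```

To capture every 5 minutes, these helpers could be called from a loop that sleeps for 300 seconds between runs, or from a scheduled job.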
I have a JSON file, say 1.control_file.txt, and I need to change its values every now and then. For example, for a ticket Sample1 I need to change the date fields, and sometimes other fields as well.
Suppose I fetch the ticket number as user input: how do I change the ticket field together with a given start and end date?
The exporter names tag in the JSON file should also be changeable.
Can anyone suggest how to do that using shell or Python?
The fields I am taking as user input are user, ticket, start_date, end_date and sample_names:
{
    "user": "dexter",
    "ticket": "Sample1",
    "start_date": "2018-07-02",
    "end_date": "2019-07-02",
    "sample_names": [
        "Demo1exp1",
        "Demo2exp2",
        "Demo3exp3",
        "Demo4exp4",
        "Demo5exp5",
        "Demo6exp6",
        "Demo7exp7",
        "Demo8exp8",
        "Demo9exp9"
    ]
}
This snippet of code is probably what you need
import json
with open('data.txt') as json_file:
    data = json.load(json_file)
data['start_date'] = "2018-07-03"
with open('data.txt', 'w') as outfile:
    json.dump(data, outfile)
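The same idea can be extended to several fields at once. The following is a minimal sketch, assuming the control file holds a single JSON object like the one in the question; `update_control_file` is a hypothetical helper name, and the values passed in would come from your own user input.

```python
import json


def update_control_file(path, **changes):
    """Load the JSON object at `path`, overwrite the given keys, and save it back."""
    with open(path) as json_file:
        data = json.load(json_file)
    data.update(changes)  # keys that are not mentioned keep their old values
    with open(path, "w") as outfile:
        json.dump(data, outfile, indent=4)
    return data


# Example call (values are placeholders for whatever the user typed in):
# update_control_file("1.control_file.txt",
#                     ticket="Sample2",
#                     start_date="2018-07-03",
#                     end_date="2019-07-03",
#                     sample_names=["Demo1exp1", "Demo2exp2"])
```

Passing the new values as keyword arguments keeps the function independent of which fields happen to change on a given run.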
I am currently conducting a data scraping project with Python 3 and am attempting to write the scraped data to a CSV file. My current process to do it is this:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
for each in data:
    scrapedData = scrap(each)
    outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
Once this script is finished, however, the CSV file is blank. If I just run:
import csv
outputFile = csv.writer(open('myFilepath', 'w'))
outputFile.writerow(['header1', 'header2'...])
a CSV file is produced containing the headers:
header1,header2,..
If I just scrape 1 in data, for example:
outputFile.writerow(['header1', 'header2'...])
scrapedData = scrap(data[0])
outputFile.writerow([scrapedData.get('header1', 'header 1 NA'), ...])
a CSV file will be created including both the headers and the data for data[0]:
header1,header2,..
header1 data for data[0], header1 data for data[0]
Why is this the case?
When you open a file with w, it erases the previous data.
From the docs:
w: open for writing, truncating the file first
So if you open the file with w again after writing the scraped data, you get a blank file; you then write the header to it, so you only see the header. Try replacing w with a. The new call to open the file would look like this:
outputFile = csv.writer(open('myFilepath', 'a'))
You can find more information about the modes for opening a file in the documentation for open().
Ref: How do you append to a file?
Edit after DYZ's comment:
You should also close the file once you are done appending. I would suggest opening the file like this:
with open('path/to/file', 'a') as file:
    outputFile = csv.writer(file)
    # Do your work with the file
This way you don't have to remember to close it. Once the code exits the with block, the file is closed.
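Putting the two points together (append mode plus a with block), here is a minimal sketch for Python 3. `append_rows` is a hypothetical helper name; it writes the header only when the file is new or empty, so repeated runs do not duplicate it.

```python
import csv
import os


def append_rows(path, header, rows):
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    write_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "a", newline="") as f:
        writer = csv.writer(f)
        if write_header:
            writer.writerow(header)
        writer.writerows(rows)
    # Leaving the with block closes the file, which flushes the rows to disk
```

Because the file is closed as soon as the block ends, the rows are on disk before the script moves on, which also avoids the empty-file symptom that an unclosed writer can produce.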
I would use Pandas for this:
import pandas as pd
headers = ['header1', 'header2', ...]
scraped_df = pd.DataFrame(data, columns=headers)
scraped_df.to_csv('filepath.csv')
Here I'm assuming your data object is a list of lists.
I'm new to Python and I've got a large JSON file that I need to convert to CSV. Below is a sample:
{ "status": "success","Name": "Theresa May","Location": "87654321","AccountCategory": "Business","AccountType": "Current","TicketNo": "12345-12","AvailableBal": "12775.0400","BookBa": "123475.0400","TotalCredit": "1234567","TotalDebit": "0","Usage": "5","Period": "May 11 2014 to Jul 11 2014","Currency": "GBP","Applicants": "Angel","Signatories": [{"Name": "Not Available","BVB":"Not Available"}],"Details": [{"PTransactionDate":"24-Jul-14","PValueDate":"24-Jul-13","PNarration":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"24-Jul-14","PValueDate":"23-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"22-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"21-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"},{"PTransactionDate":"25-Jul-14","PValueDate":"20-Jul-14","PTest":"Cash Deposit","PCredit":"0.0000","PDebit":"40003.0000","PBalance":"40003.0000"}]}
I need this to show up with
name, status, location, accountcategory, accounttype, availablebal, totalcredit, totaldebit, etc. as columns,
and with pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and 'ptest' taking new values on each row, as the JSON file shows.
I've managed to put the script below together by looking online, but it gives me an empty CSV file at the end. What have I done wrong? I have used online JSON-to-CSV converters and they work; however, as these are sensitive files, I'm hoping to write and manage my own script so I can see exactly how it works. Please see my Python script below. Can I have some advice on what to change? Thanks.
import csv
import json
infile = open("BankStatementJSON1.json","r")
outfile = open("testing.csv","w")
writer = csv.writer(outfile)
for row in json.loads(infile.read()):
    writer.writerow(row)
import csv, json, sys
# use the file names passed on the command line, otherwise fall back to the defaults
if len(sys.argv) >= 3:
    fileInput = sys.argv[1]
    fileOutput = sys.argv[2]
else:
    fileInput = "BankStatementJSON1.json"
    fileOutput = "testing.csv"
# if you are not using utf-8 files, change the encoding argument
with open(fileInput, "r", encoding="utf-8") as inputFile: # open json file
    data = json.load(inputFile) # load json content
with open(fileOutput, "w", newline="", encoding="utf-8") as outputFile: # open csv file (Python 3)
    output = csv.writer(outputFile) # create a csv.writer
    output.writerow(data.keys()) # header row
    output.writerow(data.values()) # values row (the sample is a single JSON object)
This works for the JSON example you posted. The remaining issue is that the data contains nested dicts, so this approach cannot create the sub-headers and sub-rows for pcredit, pdebit, pbalance, ptransactiondate, pvaluedate and ptest that you want.
You can use csv.DictWriter:
import csv
import json
with open("BankStatementJSON1.json", "r") as inputFile: # open json file
    data = json.loads(inputFile.read()) # load json content
with open("testing.csv", "w") as outputFile: # open csv file
    output = csv.DictWriter(outputFile, data.keys()) # create a writer
    output.writeheader()
    output.writerow(data)
Make sure you're closing the output file at the end as well.
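To get one CSV row per transaction, with the account-level fields repeated on every row, the nested list has to be flattened first. The following is a rough sketch for Python 3, assuming the transactions live under the "Details" key as in the sample; `flatten_statement` and `write_rows` are hypothetical helper names, and other nested lists such as "Signatories" are simply skipped.

```python
import csv


def flatten_statement(statement, nested_key="Details"):
    """Return one flat dict per nested record, repeating the top-level scalar fields."""
    top = {k: v for k, v in statement.items() if not isinstance(v, (list, dict))}
    return [{**top, **detail} for detail in statement.get(nested_key, [])]


def write_rows(rows, path):
    """Write the dicts to a CSV, using the union of their keys as the header."""
    # Records may have different keys (e.g. "PNarration" in one transaction,
    # "PTest" in another), so collect every key in first-seen order.
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
        writer.writeheader()
        writer.writerows(rows)
```

A typical use would be to load the file with json.load, pass the result to flatten_statement, and hand the returned list to write_rows. Cells for keys that a given transaction lacks are left empty.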
After struggling for a long time, I've found a way to get the data from nba.com as comma-separated values.
This is the result: http://stats.nba.com/stats/leaguedashplayerstats?DateFrom=&DateTo=&GameScope=&GameSegment=&LastNGames=15&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2015-16&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&VsConference=&VsDivision=
How do I get that into a nice CSV or Excel file?
Or, even better if possible, how can I query this data automatically, the way an Excel web query pulls in a table?
The following should get you started:
import requests
import csv
url = "http://stats.nba.com/stats/leaguedashplayerstats?DateFrom=&DateTo=&GameScope=&GameSegment=&LastNGames=15&LeagueID=00&Location=&MeasureType=Advanced&Month=0&OpponentTeamID=0&Outcome=&PaceAdjust=N&PerMode=Totals&Period=0&PlayerExperience=&PlayerPosition=&PlusMinus=N&Rank=N&Season=2015-16&SeasonSegment=&SeasonType=Regular+Season&StarterBench=&VsConference=&VsDivision="
data = requests.get(url)
entries = data.json()
with open('output.csv', 'wb') as f_output:  # Python 2; on Python 3 use open('output.csv', 'w', newline='')
    csv_output = csv.writer(f_output)
    csv_output.writerow(entries['resultSets'][0]['headers'])
    csv_output.writerows(entries['resultSets'][0]['rowSet'])
This would produce an output.csv file starting as follows:
PLAYER_ID,PLAYER_NAME,TEAM_ID,TEAM_ABBREVIATION,AGE,GP,W,L,W_PCT,MIN,OFF_RATING,DEF_RATING,NET_RATING,AST_PCT,AST_TO,AST_RATIO,OREB_PCT,DREB_PCT,REB_PCT,TM_TOV_PCT,EFG_PCT,TS_PCT,USG_PCT,PACE,PIE,FGM,FGA,FGM_PG,FGA_PG,FG_PCT,CFID,CFPARAMS
201166,Aaron Brooks,1610612741,CHI,31.0,13,6,7,0.462,17.5,105.8,106.8,-0.9,0.243,2.4,25.9,0.015,0.077,0.046,10.8,0.5,0.511,0.198,95.84,0.065,36,85,2.8,6.5,0.424,5,"201166,1610612741"
203932,Aaron Gordon,1610612753,ORL,20.0,15,3,12,0.2,23.0,98.9,106.4,-7.5,0.1,1.91,15.7,0.089,0.228,0.158,8.2,0.575,0.608,0.151,94.16,0.124,46,87,3.1,5.8,0.529,5,"203932,1610612753"
1626151,Aaron Harrison,1610612766,CHA,21.0,7,3,4,0.429,4.2,103.3,95.4,7.9,0.0,0.0,0.0,0.08,0.08,0.08,16.7,0.0,0.0,0.095,100.22,-0.032,0,5,0.0,0.7,0.0,5,"1626151,1610612766"
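For reference, the writing step can be tried without any network access. This is a small sketch for Python 3, assuming a payload shaped like the response above (a "resultSets" list whose entries have "headers" and "rowSet"); `result_set_to_csv` is a hypothetical helper name, and `sample` is a cut-down payload built from values in the output shown above.

```python
import csv
import io


def result_set_to_csv(payload, out, index=0):
    """Write payload['resultSets'][index] (headers, then rows) to the open text file `out`."""
    result = payload["resultSets"][index]
    writer = csv.writer(out)
    writer.writerow(result["headers"])
    writer.writerows(result["rowSet"])


# Cut-down payload with the same shape as the real response
sample = {"resultSets": [{"headers": ["PLAYER_ID", "PLAYER_NAME", "CFPARAMS"],
                          "rowSet": [[201166, "Aaron Brooks", "201166,1610612741"]]}]}
buf = io.StringIO()
result_set_to_csv(sample, buf)
print(buf.getvalue())
```

With a real file, `out` would be the object returned by open('output.csv', 'w', newline=''). The csv module quotes the last field automatically because it contains a comma.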
I'm experiencing a problem when I dump JSON data into a CSV file. There is typically a block of JSON data that is missing from the CSV file, but that can be seen if I print the JSON to the console or to a file.
Essentially, I am calling a service twice and receiving back two JSON responses that I parse and dump into a CSV file. The service can only be called for 7-day increments (Unix time), so I have implemented logic to call the service for this increment repeatedly over a period of time.
I'm using Python's standard json and csv libraries.
First the CSV is created with headers:
with open('history_' + datetime.datetime.now().strftime("%Y-%m-%d_%H-%M-%S") + '.csv', 'wb') as outcsv:
    writer = csv.writer(outcsv)
    writer.writerow(["Column1", "Column2", "Column3", "Column4", "Column5",
                     "Column6"])
Then, I have a counter that calls the service twice, fifty times (following the open of the CSV file):
while y < 50:
    jsonResponseOne = getJsonOne(7)
    jsonResponseTwo = getJsonTwo(7)
Example json response:
{"Value":
    [
        {"ExampleName": "Test",
         "ExampleNameTwo": "Test2",
         "ExampleDate": "1436103790",
         "ExampleCode": "00000001",
         "ExampleofExample": "abcd",
         "AnotherExample": "hello"},
        {"ExampleName": "Test2",
         "ExampleNameTwo": "Test3",
         "ExampleDate": "1436103790",
         "ExampleCode": "00000011",
         "ExampleofExample": "abcd",
         "AnotherExample": "hello2"}
    ]
}
The CSV output columns would look like:
ExampleName ExampleNameTwo ExampleDate ExampleCode ExampleofExample AnotherExample
Finally, the CSV is written as follows:
for item in jsonResponseOne['Value']:
    row = []
    row.append(str(item['ExampleName'].encode('utf-8')))
    if item.get("ExampleNameTwo"):
        row.append(str(item["ExampleNameTwo"]))
    else:
        row.append("None")
    row.append(str(item['ExampleDate']))
    row.append(str(item['ExampleCode'].encode('utf-8')))
    row.append(str(item['ExampleofExample'].encode('utf-8')))
    row.append(str(item['AnotherExample'].encode('utf-8')))
    writer.writerow(row)
for item in jsonResponseTwo['Value']:
    anotherRow = []
    anotherRow.append(str(item['ExampleName'].encode('utf-8')))
    if item.get("ExampleNameTwo"):
        anotherRow.append(str(item["ExampleNameTwo"]))
    else:
        anotherRow.append("None")
    anotherRow.append(str(item['ExampleDate']))
    anotherRow.append(str(item['ExampleCode'].encode('utf-8')))
    anotherRow.append(str(item['ExampleofExample'].encode('utf-8')))
    anotherRow.append(str(item['AnotherExample'].encode('utf-8')))
    writer.writerow(anotherRow)
Why could my CSV output be missing an entire row of data (a block of data from the JSON response)?
Resolved.
The Python script had an indentation issue in one of the while loops, which caused some data to be skipped and never written to the CSV file.
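The exact mistake is not shown, but one plausible form of it is a write loop that ended up outside the while loop, so that only the last response is ever written. The sketch below illustrates that case for Python 3; `fake_service`, `buggy` and `fixed` are made-up names, and `fake_service` stands in for the real service calls.

```python
import csv
import io


def fake_service(call_number):
    """Hypothetical stand-in for the real service; returns one item per call."""
    return {"Value": [{"ExampleName": "item%d" % call_number}]}


def buggy(n_calls, out):
    writer = csv.writer(out)
    y = 0
    while y < n_calls:
        response = fake_service(y)
        y += 1
    # BUG: this loop sits outside the while, so it only sees the last response
    for item in response["Value"]:
        writer.writerow([item["ExampleName"]])


def fixed(n_calls, out):
    writer = csv.writer(out)
    y = 0
    while y < n_calls:
        response = fake_service(y)
        # Indented under the while: every response is written before the next call
        for item in response["Value"]:
            writer.writerow([item["ExampleName"]])
        y += 1
```

Calling both functions with the same number of calls and comparing the row counts makes the missing blocks easy to spot.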