I am running a Python script and trying to extract part of its output, convert it to a JSON file, and pass it to Google Visualization.
I have included my Python script, its output, and detailed information below.
What I am doing:
I am trying to run this Python script from AlchemyAPI
https://github.com/AlchemyAPI/alchemyapi-twitter-python
My output is as follows:
I am getting the stats from the output.
##########################################################
# The Tweets #
##########################################################
#uDiZnoGouD
Date: Mon Apr 07 05:07:19 +0000 2014
To enjoy in case you win!
To help you sulk in case you loose!
#IndiavsSriLanka #T20final http://t.co/hRAsIa19zD
Document Sentiment: positive (Score: 0.261738)
##########################################################
# The Stats #
##########################################################
Document-Level Sentiment:
Positive: 3 (60.00%)
Negative: 1 (20.00%)
Neutral: 1 (20.00%)
Total: 5 (100.00%)
This is the sample code for the stats:
import sys

def stats(tweets):
    """
    Calculate and print out some basic summary statistics

    INPUT:
    tweets -> an array containing the analyzed tweets
    """
    # init
    data = {}
    data['doc'] = {}
    data['doc']['positive'] = 0
    data['doc']['negative'] = 0
    data['doc']['neutral'] = 0
    data['doc']['total'] = 0
    data['entity'] = {}
    data['entity']['positive'] = 0
    data['entity']['negative'] = 0
    data['entity']['neutral'] = 0
    data['entity']['total'] = 0

    # loop through the tweets and count up the positives, negatives and neutrals
    for tweet in tweets:
        if 'entity' in tweet['sentiment']:
            data['entity'][tweet['sentiment']['entity']['type']] += 1
            data['entity']['total'] += 1
        if 'doc' in tweet['sentiment']:
            data['doc'][tweet['sentiment']['doc']['type']] += 1
            data['doc']['total'] += 1

    # make sure there are some analyzed tweets
    if data['doc']['total'] == 0 and data['entity']['total'] == 0:
        print 'No analysis found for the Tweets'
        sys.exit()

    # print the stats
    print ''
    print ''
    print '##########################################################'
    print '# The Stats #'
    print '##########################################################'
    print ''
    print ''
    if data['entity']['total'] > 0:
        print 'Entity-Level Sentiment:'
        print 'Positive: %d (%.2f%%)' % (data['entity']['positive'], 100.0*data['entity']['positive']/data['entity']['total'])
        print 'Negative: %d (%.2f%%)' % (data['entity']['negative'], 100.0*data['entity']['negative']/data['entity']['total'])
        print 'Neutral: %d (%.2f%%)' % (data['entity']['neutral'], 100.0*data['entity']['neutral']/data['entity']['total'])
        print 'Total: %d (%.2f%%)' % (data['entity']['total'], 100.0*data['entity']['total']/data['entity']['total'])
        print ''
        print ''
    if data['doc']['total'] > 0:
        print 'Document-Level Sentiment:'
        print 'Positive: %d (%.2f%%)' % (data['doc']['positive'], 100.0*data['doc']['positive']/data['doc']['total'])
        print 'Negative: %d (%.2f%%)' % (data['doc']['negative'], 100.0*data['doc']['negative']/data['doc']['total'])
        print 'Neutral: %d (%.2f%%)' % (data['doc']['neutral'], 100.0*data['doc']['neutral']/data['doc']['total'])
        print 'Total: %d (%.2f%%)' % (data['doc']['total'], 100.0*data['doc']['total']/data['doc']['total'])
Problem Statement:
I would like to get the positive, negative, and neutral sentiment counts in JSON format so that I can pass them to Google Visualization. How can I build a JSON file that contains my final stats (positive, negative, and neutral)?
import json
json_data = json.dumps(data)

If there is something like a datetime object, make sure to convert it to str first so that it is JSON serializable.
FYI: json.loads is used to load it back from a JSON string.
Here, build a dict, list, or plain Python object:

data_list = []
temp = 'Positive: %d (%.2f%%)' % (data['entity']['positive'], 100.0*data['entity']['positive']/data['entity']['total'])
data_list.append(temp)
temp = 'Total: %d (%.2f%%)' % (data['entity']['total'], 100.0*data['entity']['total']/data['entity']['total'])
data_list.append(temp)

Do the same for the other values. Now you can dump the data:

json.dumps(data_list)

Note: I think you can do all of this formatting after responding in JSON format. The dict data already contains all the information; here you are just formatting it. Instead, you can dump data as-is and do the formatting on the client side that receives the JSON response:

json.dumps(data)
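As a minimal sketch of the file step (assuming the data dict built by stats() above, and Python 3 syntax), you could dump just the document-level counts to a file that the Google Visualization page then fetches. The rows layout here is only one plausible shape; the exact format depends on how the page builds its DataTable.

```python
import json

# Assumed shape of the stats dict built by stats() above (hypothetical values).
data = {'doc': {'positive': 3, 'negative': 1, 'neutral': 1, 'total': 5}}

# One label/count pair per row, a shape Google Charts can turn into a DataTable.
rows = [[label, data['doc'][label]] for label in ('positive', 'negative', 'neutral')]

with open('stats.json', 'w') as f:
    json.dump({'sentiment': rows}, f)

print(json.dumps({'sentiment': rows}))
# → {"sentiment": [["positive", 3], ["negative", 1], ["neutral", 1]]}
```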
Related
I have not used Python in years and am trying to get back into it. I have an input file (.csv) that I want to parse, storing the output in an output .csv or .txt file.
I have managed to parse the .csv file using the code below, and for the most part it works, but I can't get it to save to a file (issue 1) without getting the error below (error 1).
import csv
import re
import itertools

file_name = 'PhoneCallData1.txt'
try:
    lol = list(csv.reader(open(file_name, 'r'), delimiter=' '))
    count = 0
except:
    print('File cannot be opened:', file_name)
    exit()
try:
    fout = open('output.txt', 'w')
except:
    Print("File cannot be written to:", "OutputFile")
    exit()
d = dict()
for item in itertools.chain(lol):  # Lists all items (field) in the CSV file.
    count += 1  # counter to keep track of row im looping through
    if lol[count][3] is None:
        print("value is not blank")
        count += 1
    else:
        try:
            check_date = re.search(r'(\d+/\d+/\d+)', lol[count][3])  # check to determine if date is a date
        except:
            continue
        check_cost = re.compile(r'($+\d*)', lol[count][9])  # check to determine if value is a cost
        if check_date == TRUE:
            try:
                key = lol[count][3]  # If is a date value, store key
            except ValueError:
                continue
        if check_cost == TRUE:
            value = lol[count][9]  # if is a cost ($) store value
            d[key] = value
            print (d[key])
            # fout.write((d[key])
# What if there is no value in the cell?
# I keep getting "IndexError: list index out of range", anyone know why?
# Is there a better way to do this?
# I only want to store the destination and the charge
And now comes the complicated part: the file I need to parse has a number of irrelevant rows of data before and in between the required data.
Data Format
What I want to do:
I want to iterate over two columns of data and only store the rows that have a date or cost in them, discarding the rest of the data.
import csv
import re
import itertools

lol = list(csv.reader(open('PhoneCallData1.txt', 'r'), delimiter=' '))
count = 0
d = dict()
for item in itertools.chain(lol):  # Lists all items (field) in the CSV file.
    count += 1  # counter to keep track of row im looping through
    check_date = re.search(r'(\d+/\d+/\d+)', lol[count][3])  # check to determine
    check_cost = re.compile(r'($+\d*)', lol[count][9])  # check to determine if value is a cost
    if check_date == TRUE:
        key = lol[count][3]  # If is a date value, store key
    if check_cost == TRUE:
        value = lol[count][9]  # if is a cost ($) store value
        d[key] = value
        print (d[key])
# What if there is no value in the cell?
# I keep getting "IndexError: list index out of range", anyone know why?
# Is there a better way to do this?
# I only want to store the destination and the charges
What I have tried:
I tried to index the data after I loaded it, but that didn't seem to work.
I created this to only look at rows that were more than a certain length, but it's terrible code. I was hoping for something more practical and reusable.
import re

with open('PhoneCallData1.txt', 'r') as f, open('sample_output.txt', 'w') as fnew:
    for line in f:
        if len(line) > 50:
            print(line)
            fnew.write(line + '\n')

import csv
lol = list(csv.reader(open('PhoneCallData1.txt', 'rb'), delimiter='\t'))
# d = dict()
# key = lol[5][0]    # cell A7
# value = lol[5][3]  # cell D7
# d[key] = value     # add the entry to the dictionary
I keep getting index-out-of-bounds errors.

import re
import csv

match = re.search(r'(\d+/\d+/\d+)', 'testing date 11/12/2017')
print match.group(1)

Here I am trying to use a regex to search for the date in the first column of data.
NOTE: I wanted to try Pandas, but I feel I need to start here. Any help would be awesome.
Whether the next record needs to be parsed has to be decided by specific rules. I have answered a similar question in the same way: a finite-state machine may help.
The main code is:
state = 'init'
output = []
# for line loop:
if state == 'init':  # seek for start parsing
    # check if start parsing
    state = 'start'
elif state == 'start':  # start parsing now
    # parsing
    # check if need to end parsing
    state = 'init'
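A minimal, self-contained sketch of the state-machine idea above, with hypothetical "BEGIN"/"END" marker lines standing in for whatever rules mark the start and end of the wanted data:

```python
lines = ['header junk', 'BEGIN', '01/02/2017 $1.50', '03/04/2017 $2.25', 'END', 'footer junk']

state = 'init'
output = []
for line in lines:
    if state == 'init':
        if line == 'BEGIN':      # start parsing from the next line
            state = 'start'
    elif state == 'start':
        if line == 'END':        # stop parsing again
            state = 'init'
        else:
            output.append(line)  # this line is part of the wanted data

print(output)  # → ['01/02/2017 $1.50', '03/04/2017 $2.25']
```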
import csv
import re
import itertools
import timeit

start_time = timeit.default_timer()
# code you want to evaluate
file_name = 'PhoneCallData.txt'
try:
    lol = list(csv.reader(open(file_name, 'r'), delimiter=' '))
except:
    print('File cannot be opened:', file_name)
    exit()
try:
    fout = open('output.txt', 'w')
except:
    print("File cannot be written to:", "OutputFile")
    exit()
# I could assign key/value pairs and store them in a dictionary, then print, search, etc. on the dictionary. Version 2
# d = dict()
count = 0
total = 0
for row in lol:  # Lists all items (fields) in the CSV file.
    # print(len(row))
    count += 1  # counter to keep track of the row I'm looping through
    if len(row) == 8:
        if row[2].isdigit():
            # Remove the $ and convert to float
            cost = re.sub('[$]', '', row[7])
            # Assign total value
            try:
                # Calculate total for verification purposes
                total = total + float(cost)
                total = round(total, 2)
            except:
                continue
            string = str(row[2] + " : " + (row[7]) + " : " + str(total) + "\n")
            print(string)
            fout.write(string)
    if len(row) == 9:
        if row[2].isdigit():
            # Remove the $ and convert to float
            cost = re.sub('[$]', '', row[8])
            # Assign total value
            try:
                # Calculate total for verification purposes
                total = total + float(cost)
                total = round(total, 2)
            except:
                continue
            string = str(row[2] + " : " + row[8] + " : " + str(total) + "\n")
            print(string)
            fout.write(string)
    if len(row) == 10:
        # print(row[2] + ":" + row[9])
        # Remove the $ and convert to float
        cost = re.sub('[$]', '', row[9])
        # Assign total value
        try:
            # Calculate total for verification purposes
            total = total + float(cost)
            total = round(total, 2)
        except:
            continue
        string = str(row[2] + " : " + row[9] + " : " + str(total) + "\n")
        print(string)
        fout.write(string)

# Convert to string so I can print and store in file
count_string = str(count)
total_string = str(total)
total_string.split('.', 2)
# Write to screen
print(total_string + " Total\n")
print("Rows parsed: " + count_string)
# write to file
fout.write(count_string + " Rows were parsed\n")
fout.write(total_string + " Total")
# Calculate time spent on task
elapsed = timeit.default_timer() - start_time
round_elapsed = round(elapsed, 2)
string_elapsed = str(round_elapsed)
fout.write(string_elapsed)
print(string_elapsed + " seconds")
fout.close()
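For what it's worth, the three near-identical branches above differ only in which column holds the cost. If the cost is always the last field of a relevant row (my assumption about the data layout), a shorter sketch in Python 3 syntax could index from the end:

```python
import re

# Hypothetical rows mimicking the 8/9/10-column layouts handled above;
# the assumption is that the cost is always the last field.
rows = [
    ['x', 'y', '123', 'a', 'b', 'c', 'd', '$1.50'],       # 8 columns
    ['x', 'y', '456', 'a', 'b', 'c', 'd', 'e', '$2.25'],  # 9 columns
    ['junk row'],                                         # irrelevant row, skipped
]

total = 0.0
for row in rows:
    if len(row) >= 8 and row[2].isdigit():
        cost = re.sub('[$]', '', row[-1])  # last field, $ stripped
        try:
            total = round(total + float(cost), 2)
        except ValueError:
            continue

print(total)  # → 3.75
```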
I wrote this program in IDLE for Python 2.7 by mistake (I am a beginner).
Now I am trying to run it in 3.4 and I get errors. I went and made changes, but I am still not able to run it. Any help?
Yes, the code might not even be great, but I am still working on it, so any help will be greatly appreciated.
For me, I thought parentheses were the only major difference between the two versions.
# Convert a decimal to a hex as a string
def decimalToHex(decimalValue):
    hex = ""
    while decimalValue != 0:
        hexValue = int(decimalValue) % 16
        hex = toHexChar(hexValue) + hex
        decimalValue = int(decimalValue) // 16
    return hex

def printRect(row_count, col_count):
    row = []
    column = []
    for r in range(row_count):
        row = []
        column = []
        end_row_flag = 'False'
        for c in range(col_count):
            if r % (row_count) == 0:
                if c % (col_count-1) == 0:
                    row.append('+')
                else:
                    row.append('-')
                end_row_flag = 'True'
            if end_row_flag == 'True':
                end_row = row
            if c % (col_count-1) == 0:
                column.append('|')
            else:
                column.append(' ')
        if row:
            print (row)
        print (column)
    print (end_row)

def charASCII(letter):
    return (ord(letter))

# Convert an integer to a single hex digit in a character
def toHexChar(hexValue):
    if 0 <= hexValue <= 9:
        return chr(hexValue + ord('0'))
    else:  # 10 <= hexValue <= 15
        return chr(hexValue - 10 + ord('A'))

def main():
    # Prompt the user to enter a decimal integer
    data_file = []
    char_file = []
    ascii_file = []
    hex_key = []
    decimal_key = []
    nonkey_val = 32
    data_file.append(' Dec Hex Char ')
    data_file.append('+---------------+')
    for i in range(nonkey_val):
        a_char = chr(i)
        hex_convert = decimalToHex(i)
        if i < 10:
            decimal_key = '0%s' % i
        else:
            decimal_key = '%s' % i
        if i <= 15:
            hex_key = '0%s' % hex_convert
        else:
            hex_key = hex_convert
        data_file.append('| %s %s %s |' % (decimal_key.strip(), hex_key.strip(), a_char))
        # data_file.append('%s' % (a_char))
    with open('sample_file.txt', 'r') as f:
        data = f.readlines()
        for character in data:
            print ('character is %s' % character)
            decimalValue = charASCII(character[0])
            hex_convert = decimalToHex(decimalValue)
            print ('decimalValue is %s' % decimalValue)
            print ('The hex number for decimal %s is %s' % (decimalValue, hex_convert)
            data_file.append('| %s %s %s |' % (decimalValue, hex_convert.strip(), character.strip())))
    data_file.append('+---------------+')
    print data_file
    f.close()
    with open('output_file.txt', 'w+') as o:
        for line in data_file:
            o.write('%s\n' % line)
    o.close

main()  # Call the main function
rows = input("Enter the numer of rows: ")
columns = input("Enter the number of columns: ")
printRect(rows, columns)
You had three typos which, had you provided the traceback and the relevant piece of code it happened in, would have been more obvious:
with open('sample_file.txt', 'r') as f:
    data = f.readlines()
    for character in data:
        print ('character is %s' % character)
        decimalValue = charASCII(character[0])
        hex_convert = decimalToHex(decimalValue)
        print ('decimalValue is %s' % decimalValue)
        print ('The hex number for decimal %s is %s' % (decimalValue, hex_convert))  # missing paren here
        data_file.append('| %s %s %s |' % (decimalValue, hex_convert.strip(), character.strip()))  # had an extra paren here
data_file.append('+---------------+')
print(data_file)  # missing parens
with closes your files, so you don't need to do it manually; but if you were closing them yourself, it would be o.close(), not o.close.
I would also use an actual boolean, not the string 'True':

end_row_flag = True
if end_row_flag:
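Beyond the typos, parentheses are not the only 2-to-3 change that bites this script: in Python 3, input() always returns a string (Python 2's input() evaluated what you typed), so printRect(rows, columns) would fail inside range() without an explicit conversion. A minimal sketch:

```python
# Python 2: input() evaluated the typed text, so digits came back as int.
# Python 3: input() returns str, so convert before using the value in range().
typed = "3"              # stands in for input("Enter the number of rows: ")
rows = int(typed)
print(list(range(rows)))  # → [0, 1, 2]
```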
I have a Python script which, amongst other things, parses a value from an HTML table using BeautifulSoup.
The value being returned (outTempReal) seems to have some whitespace after it. I know this because of the prints I use...
print "Temp 1 =", avgtemperatures[0],
print "Temp 2 =", avgtemperatures[1],
print "Temp 3 =", avgtemperatures[2],
print "Temp 4 =", avgtemperatures[3],
print "Temp 5 =", avgtemperatures[4],
print "Outside Temp =", outTempReal,
print "METAR Temp =", currentTemp,
print "Plant Room Temp =", avgtemperatures[5],
print "Flow Temp =", avgtemperatures[6],
print "Return Temp =", avgtemperatures[7]
Which returns the following...
Temp 1 = 79.625 Temp 2 = 79.1456666667 Temp 3 = 31.229 Temp 4 = 28.125 Temp 5 = 27.2706666667 Outside Temp = 4.8 METAR Temp = 5 Plant Room Temp = 16.7913333333 Flow Temp = 13.875 Return Temp = 18.312
You can see that after the Outside Temp = 4.8 there is whitespace before the next print value.
This is the code used to get the value in the first place...
table = soup.find('table')
for row in table.findAll('tr')[1:]:
    col = row.findAll('td')
    if len(col) >= 2:
        time = col[1].string
        temp = col[2].string
        outTempReal = re.sub(r'[^0-9\-\d.\s+]', ' ', temp)
I have tried the following two methods to remove the whitespace but no joy...
outTempReal.strip()
re.sub('\s+',' ',outTempReal)
I really need this value to be just the decimal number because it is used to update a RRD.
Can anyone help?
.rstrip() will trim any trailing whitespace:
>>> thing = 'value '
>>> thing
'value '
>>> thing.rstrip()
'value'
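One likely reason the earlier attempts appeared to do nothing: strings are immutable, so strip()/rstrip() (and re.sub) return a new string instead of modifying the original, and the result must be assigned back:

```python
outTempReal = '4.8 '
outTempReal.rstrip()                # returns '4.8' but the result is discarded
print(repr(outTempReal))            # → '4.8 '
outTempReal = outTempReal.rstrip()  # assign the stripped value back
print(repr(outTempReal))            # → '4.8'
```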
Please see the code below -
from sys import argv
from urllib2 import urlopen
from os.path import exists

script, to_file = argv
url = "http://numbersapi.com/random"
fact = 0
number = 0
print "Top 5 Facts of The World"
while fact < 5:
    response = urlopen(url)
    data = response.read()
    fact += 1
    number += 1
    print
    print "%s). %s " % (str(number), data)
print "Now, let us save the facts to a file for future use."
print "Does the output file exist? %r" % exists(to_file)
print "When you are ready, simply hit ENTER"
raw_input()
out_file = open(to_file, 'w')
out_file.write(data)
print "Alright, facts are saved in the repo."
out_file.close()
The problem with the above code is that when I open file1.txt, I see only one fact printed. As a variation, I moved everything inside the while loop; it leads to the same problem. I believe it writes one fact, then overwrites it with the next and the next, until only the last fact is saved.
What am I doing wrong?
"data" holds only the last value assigned to it.
from sys import argv

script, to_file = argv
fact = 0
number = 0
out_file = open(to_file, 'w')
while fact < 5:
    data = str(fact)
    out_file.write(str(data) + '\n')
    fact += 1
    number += 1
    print
    print "%s). %s " % (str(number), data)
out_file.close()
You overwrite data with every loop iteration. Try this:

out_file = open(to_file, 'w')
while fact < 5:
    response = urlopen(url)
    data = response.read()
    fact += 1
    number += 1
    print
    print "%s). %s " % (str(number), data)
    out_file.write(data)
    out_file.write('\n')  # one fact per line
out_file.close()
It seems you are overwriting data in the loop, so at the end you have only the last value. Try changing it to something like this:

[...]
final_data = ''
while fact < 5:
    response = urlopen(url)
    data = response.read()
    fact += 1
    number += 1
    print
    print "%s). %s " % (str(number), data)
    final_data += data
[...]
out_file.write(final_data)
The issue is that you are writing to the file after the loop, so data points at the last URL data fetched. To fix this, store the facts in a list, and then write everything from the list like so:

for fact in data:
    out_file.write(fact + '\n')

You'll need to append each fetched fact like so:

data.append(response.read())
with open(to_file, 'wb') as out_file:
    while fact < 5:
        response = urlopen(url)
        data = response.read()
        if should_write:
            out_file.write(data + '\n')
        fact += 1
        number += 1
        print
        print "%s). %s " % (str(number), data)
I have a BLAST output file in XML format. It contains 22 query sequences with 50 hits reported for each, and I want to extract all 22x50 hits. This is the code I currently have, but it only extracts the 50 hits from the first query.
from Bio.Blast import NCBIXML

blast_records = NCBIXML.parse(result_handle)
blast_record = blast_records.next()
save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        save_file.write('>%s\n' % (alignment.title,))
save_file.close()
Does anybody have any suggestions on how to extract all the hits? I guess I have to use something other than alignments.
Hope this was clear. Thanks!
Jon
This should get all records. The novelty compared with the original is the
for blast_record in blast_records:
which is a Python idiom for iterating through the items of a "list-like" object, such as blast_records (checking the NCBIXML module documentation shows that parse() indeed returns an iterator).
from Bio.Blast import NCBIXML

blast_records = NCBIXML.parse(result_handle)
save_file = open("/Users/jonbra/Desktop/my_fasta_seq.fasta", 'w')
for blast_record in blast_records:
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            save_file.write('>%s\n' % (alignment.title,))
    # here possibly output something to the file between each blast_record
save_file.close()
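The underlying point is general: parse() returns an iterator, so a single next() call consumes only the first record, while a for loop drains them all. A minimal sketch with a plain generator standing in for the parser (no Biopython needed):

```python
def parse_mock(n):
    # stand-in for NCBIXML.parse(): yields one "record" per query
    for i in range(n):
        yield 'record_%d' % i

records = parse_mock(3)
first = next(records)        # like blast_records.next(): only the first query
rest = [r for r in records]  # the for loop gets everything that remains
print(first)  # → record_0
print(rest)   # → ['record_1', 'record_2']
```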
I used this code to extract all the results:

from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("rpoD.xml")):
    print "QUERY: %s" % record.query
    for align in record.alignments:
        print " MATCH: %s..." % align.title[:60]
        for hsp in align.hsps:
            print " HSP, e=%f, from position %i to %i" \
                % (hsp.expect, hsp.query_start, hsp.query_end)
            if hsp.align_length < 60:
                print " Query: %s" % hsp.query
                print " Match: %s" % hsp.match
                print " Sbjct: %s" % hsp.sbjct
            else:
                print " Query: %s..." % hsp.query[:57]
                print " Match: %s..." % hsp.match[:57]
                print " Sbjct: %s..." % hsp.sbjct[:57]
print "Done"
or, for less detail:

from Bio.Blast import NCBIXML

for record in NCBIXML.parse(open("NC_003197.xml")):
    # We want to ignore any queries with no search results:
    if record.alignments:
        print "QUERY: %s..." % record.query[:60]
        for align in record.alignments:
            for hsp in align.hsps:
                print " %s HSP, e=%f, from position %i to %i" \
                    % (align.hit_id, hsp.expect, hsp.query_start, hsp.query_end)
print "Done"
I used this site
http://www2.warwick.ac.uk/fac/sci/moac/currentstudents/peter_cock/python/rpsblast/