How can I get Python to append var without changing it? - python

import twitter
import unicodedata
import string

def get_tweets(user):
    resultado = []
    temp = []
    api = twitter.Api()
    statuses = api.GetUserTimeline(user)
    for tweet in statuses:
        var = unicodedata.normalize('NFKD', tweet.text).encode('utf-8', 'replace')
        print var  # Horóscopo when I don't append it
        resultado.append(var)
    print resultado  # Horo\xcc\x81scopo, mie\xcc\x81rcoles: I get these when I append them

get_tweets('HoroscopoDeHoy')

I assume you want to put the Unicode strings themselves into the list, and keep the UTF-8 encoded versions in a separate list:
var = unicodedata.normalize('NFKD', tweet.text)
resultado.append( var )
temp.append(var.encode('utf-8', 'ignore'))

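I think the issue is with the print statement. Printing a list converts it to a string, and that conversion calls repr() on each element, which escapes any non-ASCII bytes (for example as \xcc\x81) before writing them to standard output. The data in the list is unchanged; only its display differs. If you want to display each item on the same line, I would suggest:
for item in resultado:
    print item,
This prints each item directly and bypasses the repr() conversion the list applies to its elements.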
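The effect is easy to reproduce. Below is a minimal sketch in Python 3 (this thread mixes Python 2 and 3), where bytes objects behave like Python 2 byte strings: printing the list shows escape sequences, while decoding and printing each item shows the accented text. The sample words are assumptions chosen to match the question's output.

```python
import unicodedata

# Sample words (assumption: chosen to match the question's output)
words = ["Hor\u00f3scopo", "mi\u00e9rcoles"]

# NFKD splits "ó" into "o" + U+0301 COMBINING ACUTE ACCENT,
# which UTF-8 encodes as the two bytes \xcc\x81
encoded = [unicodedata.normalize("NFKD", w).encode("utf-8") for w in words]

# Printing the list calls repr() on each element, so the bytes show as escapes
print(encoded)  # [b'Horo\xcc\x81scopo', b'mie\xcc\x81rcoles']

# Decoding each item and printing it directly shows the accented text
decoded = [b.decode("utf-8") for b in encoded]
print(" ".join(decoded))
```

The bytes never change; only the way they are rendered does.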
Sources:
http://docs.python.org/reference/simple_stmts.html#the-print-statement
http://docs.python.org/reference/expressions.html#string-conversions

Related

Filter specific word in string using python

I have some data like this:
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3
I want to use Python to put every URL that ends in "jpg" or "png" into a list,
like ["https.....jpg", "https......jpg", "https........png"].
But I have no idea how. Any suggestions?
Try:
s = """https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3"""
for url in s.split("http"):
    if url.endswith(("jpg", "png")):
        print("http" + url)
Prints:
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpg
https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg
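Since the question asks for a list rather than printed lines, a regular expression can collect the matches directly. This is a minimal sketch, assuming every URL starts with http:// or https:// and the URLs are concatenated with nothing in between. The sample string is shortened from the question and reordered so the .mp3 URL sits between two images. Splitting on "http" also breaks a URL whose path contains "http"; the regex below only breaks if a path contains "http://" or "https://".

```python
import re

# Shortened, reordered sample of the question's data (assumption: URLs are
# concatenated directly, each starting with "http://" or "https://")
data = (
    "https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg"
    "https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpg"
    "https://www.travel.taipei/streams/scenery_file_audio/c03.mp3"
    "https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg"
)

# Match from "http(s)://" up to the nearest ".jpg"/".png", never crossing the
# start of the next URL (the negative lookahead), so the .mp3 URL cannot be
# merged into the image URL that follows it
IMAGE_URL = re.compile(r"https?://(?:(?!https?://).)*?\.(?:jpg|png)")

image_urls = IMAGE_URL.findall(data)
print(image_urls)
```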
Alternatively, use replace and split (note that this returns every URL, including the .mp3 one):
strs ="https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpghttps://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpghttps://www.travel.taipei/streams/scenery_file_audio/c03.mp3"
strs =strs.replace("jpg", 'jpg ')
strs =strs.replace("png", 'png ')
print(strs.split())
Output:
['https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d756/e285/f317/2ece2309-3d1c-49da-8d3a-32e0227e7732.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d379/e118/f25/554586cb-cf2d-40ef-9b6a-55fcf8d9e598.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d856/e130/f366/21ed2d17-7610-4ad2-b517-5b1b0007612a.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000360.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c1/d356/e17/f185/b1a2de52-4110-4355-a9fb-bf1d0eb627c9.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/image/a0/b0/c0/d593/e103/f285/1633c311-e148-4d03-bb43-292d816951d2.jpg', 'https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg', 'https://www.travel.taipei/streams/scenery_file_audio/c03.mp3']
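As the output shows, the replace-and-split result still contains the .mp3 URL. Below is a small follow-up sketch that filters the split result by extension. The shortened string is a sample taken from the question, and like the original approach it assumes "jpg" and "png" appear only as file extensions.

```python
# Shortened sample from the question
strs = (
    "https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000358.jpg"
    "https://www.travel.taipei/d_upload_ttn/sceneadmin/pic/11000359.jpg"
    "https://www.travel.taipei/streams/scenery_file_audio/c03.mp3"
)

# Insert a space after each image extension, then split on whitespace
for ext in ("jpg", "png"):
    strs = strs.replace(ext, ext + " ")

# Keep only the pieces that end with an image extension
images = [url for url in strs.split() if url.endswith((".jpg", ".png"))]
print(images)
```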

Python put string into dictionary

I want to convert a string into a dictionary. I previously saved this dictionary in a text file.
The problem is that I don't know what the keys look like. The dictionary was created with Counter(dictionaryName), and it is very large, so I cannot check every key by hand.
The keys can contain single quotes ('), double quotes ("), commas, and maybe other characters. Is there any way to convert the string back into a dictionary?
For example, this is stored in the file:
Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23,...})
I found earlier solutions using json, for example, but the double quotes cause problems, and I cannot simply split on commas.
If you trust the source, import Counter (from collections import Counter) and eval() the string.
How about something like:
>>> from collections import Counter
>>> line = '''Counter({'element0':512, "'4,5'element1":50, '4:55foobar':23})'''
>>> D = eval(line)
>>> D
Counter({'element0': 512, "'4,5'element1": 50, '4:55foobar': 23})
You could remove the Counter( and ) parts, then parse the rest with ast.literal_eval as long as it only involves basic Python data types:
import ast
from collections import Counter

def parse_Counter_string(s):
    s = s.strip()
    if not (s.startswith('Counter(') and s.endswith(')')):
        raise ValueError('String does not match expected format')
    # Drop the leading 'Counter(' (8 characters) and the trailing ')'
    s = s[len('Counter('):-1]
    return Counter(ast.literal_eval(s))
In the future, I recommend picking a different way to serialize your data.
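One such option, offered as a sketch rather than the answerer's specific recommendation, is to save the counts as JSON and rebuild the Counter when loading. JSON escapes quotes and commas inside keys, so they round-trip safely. The file name counts.json is a placeholder, and this assumes all keys are strings (JSON object keys are always strings).

```python
import json
from collections import Counter

PATH = "counts.json"  # placeholder file name

counts = Counter({"element0": 512, "'4,5'element1": 50, "4:55foobar": 23})

# Write: Counter is a dict subclass, so json.dump can serialize it directly
with open(PATH, "w", encoding="utf-8") as f:
    json.dump(counts, f)

# Read: json.load returns a plain dict, which Counter() wraps again
with open(PATH, encoding="utf-8") as f:
    restored = Counter(json.load(f))

print(restored == counts)
```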
You can use the demjson library for this. If you have the text directly in your program:
import demjson
counter = demjson.decode("enter your text here")
If it is in a file, you can do the following:
from os.path import dirname, realpath, join
import demjson

WD = dirname(realpath(__file__))
with open(join(WD, "filename"), "r") as file:
    counter = demjson.decode(file.read())

Can't get rid of hex characters

This program builds a list of verbs read from a text file:
file = open("Verbs.txt", "r")
data = str(file.read())
table = eval(data)
num_table = len(table)
new_table = []
for x in range(0, num_table):
    newstr = table[x].replace(")", "")
    split = newstr.rsplit("(")
    numx = len(split)
    for y in range(0, numx):
        split[y] = split[y].split(",", 1)[0]
        new_table.append(split[y])
num_new_table = len(new_table)
for z in range(0, num_new_table):
    print(new_table[z])
However, the text itself contains hex escape sequences, such as:
('a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]', ':', 17.6044921875)('A\\xc4\\x9fr\\xc4\\xb1[Noun]+[Prop]+[A3sg]+[Pnon]+[Nom]', ':', 11.5615234375)
I'm trying to get rid of those. How am I supposed to do that?
I've looked pretty much everywhere, and decode() raises an error (even after importing codecs).
You could use parse, a Python module that lets you search a string for regularly formatted components. From the components it returns, you can extract the corresponding hex values and substitute them back into the original string.
For example (untested alert!):
import parse

# Find every "\x" escape in the string. {:w} matches word characters
# greedily (e.g. "9fr" in "\x9fr"), so keep only the first two hex digits.
for hex_item in parse.findall("\\x{:w}", your_string):
    hex_digits = hex_item[0][:2]
    # Replace the whole escape ("\x" plus two digits) with the value
    # converted from base 16 to a decimal string
    your_string = your_string.replace("\\x" + hex_digits, str(int(hex_digits, 16)))
Obs: instead of "int", you could convert the found hex-like values to characters using chr, as in:
chr(int(hex_item[0][:2], 16))
Note that here each escape is one byte of a UTF-8 sequence (\xc4\x9f is "ğ"), so chr() on single bytes will not produce the intended letters.
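If the goal is to turn the escapes back into the letters they stand for, rather than replacing them with numbers, a different technique (not from the answers above) is to decode them. This is a minimal sketch, assuming each \xNN escape is one byte of UTF-8 text, which fits \xc4\x9f decoding to "ğ". One caveat: unicode_escape also interprets other backslash sequences, such as \n.

```python
def decode_escapes(s):
    """Turn literal backslash-x escapes (UTF-8 bytes) back into characters."""
    # 1. str -> bytes; the escapes are plain ASCII, so latin-1 is lossless
    # 2. unicode_escape turns each "\xNN" into the code point U+00NN
    # 3. latin-1 maps those code points back to the raw bytes 0xNN
    # 4. the raw bytes are UTF-8, so decode them as UTF-8
    return (
        s.encode("latin-1")
        .decode("unicode_escape")
        .encode("latin-1")
        .decode("utf-8")
    )

item = "a\\xc4\\x9fr\\xc4\\xb1[Verb]+[Pos]+[Imp]+[A2sg]"
print(decode_escapes(item))  # ağrı[Verb]+[Pos]+[Imp]+[A2sg]
```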

List index out of range when parsing a website using .lower()

I'm parsing a website in order to count the number of lines in which a keyword is mentioned. Everything runs fine with the following code:
import time
import urllib2
from urllib2 import urlopen
import datetime

website = 'http://www.dailyfinance.com/2014/11/13/market-wrap-seventh-dow-record-in-eight-days/#!slide=3077515'
topSplit = 'NEW YORK -- '
bottomSplit = "<div class=\"knot-gallery\""

# Count mentions on newlines
def main():
    try:
        x = 0
        sourceCode = urllib2.urlopen(website).read()
        sourceSplit = sourceCode.split(topSplit)[1].split(bottomSplit)[0]
        content = sourceSplit.split('\n')  # provides an array
        for line in content:
            if 'gain' in line:
                x += 1
        print x
    except Exception, e:
        print 'Failed in the main loop'
        print str(e)

main()
However, I'd like to count all mentions of a particular keyword (in this case 'gain' or 'Gain'), so I added .lower() when reading in the source code:
sourceCode = urllib2.urlopen(website).read().lower()
Yet this gives me the error:
Failed in the main loop
list index out of range
If .lower() is what throws off the indices, why does this happen?
After .lower(), the source string is entirely lowercase, but you still split it on topSplit = 'NEW YORK -- ', which is uppercase. That separator never occurs in the lowercased text, so split() returns a list with a single item (the whole string).
You then access index 1 of that list, which always fails:
sourceCode.split(topSplit)[1]
To handle both cases, split with a case-insensitive regular expression from the re module, for example:
>>> import re
>>> string = "some STRING lol"
>>> re.split("string", string, flags=re.IGNORECASE)
['some ', ' lol']
>>> re.split("STRING", string, flags=re.IGNORECASE)
['some ', ' lol']
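Another option that avoids regular expressions is to split on the original, case-sensitive markers first and lowercase only for the keyword test. Below is a minimal Python 3 sketch; the function name count_keyword_lines and the sample text are hypothetical, and fetching the page is left out.

```python
def count_keyword_lines(source, top, bottom, keyword):
    """Count lines between the top and bottom markers that mention keyword."""
    # Cut out the article body using the original (case-sensitive) markers
    body = source.split(top, 1)[1].split(bottom, 1)[0]
    # Lowercase each line only for the comparison, so 'gain' and 'Gain' match
    return sum(1 for line in body.split("\n") if keyword.lower() in line.lower())

# Hypothetical page text standing in for the downloaded source
sample = (
    "header\nNEW YORK -- Stocks gain.\nGains continue.\nflat day\n"
    '<div class="knot-gallery">footer gain'
)
print(count_keyword_lines(sample, "NEW YORK -- ", '<div class="knot-gallery"', "gain"))
```

The text after the bottom marker is excluded, so the final "footer gain" is not counted.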

Parse a json file and add the strings to a URL

How do I parse JSON output, get only the list under "data", and then append each string in that list to a URL, e.g. google.com/Confidential and so on for the other strings?
I'll call my JSON output "text":
text = '{"success":true,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}'
I only want the list under "data", but so far my script gives me the entire JSON output:
json.loads(text)
print text
output = urllib.urlopen("http://google.com" % text)
print output.geturl()
print output.read()
jsonobj = json.loads(text)
print jsonobj['data']
This prints the list in the data section of your JSON.
If you want to open each item as a path under google.com, you could try this:
def processlinks(text):
    output = urllib.urlopen('http://google.com/%s' % text)
    print output.geturl()
    print output.read()

for item in jsonobj['data']:
    processlinks(item)
info = json.loads(text)
json_text = json.dumps(info["data"])
json.dumps converts the Python data structure returned by json.loads back into JSON text.
You can then use json_text wherever you used text before; it contains only the value of the selected key, in your case "data".
Perhaps something like this, where result is your parsed JSON data:
from itertools import product
base_domains = ['http://www.google.com', 'http://www.example.com']
result = {"success":True,"code":200,"data":["Confidential","L1","Secret","Secret123","foobar","maret1","maret2","posted","rontest"],"errs":[],"debugs":[]}
for path in product(base_domains, result['data']):
    print '/'.join(path)  # do whatever
http://www.google.com/Confidential
http://www.google.com/L1
http://www.google.com/Secret
http://www.google.com/Secret123
http://www.google.com/foobar
http://www.google.com/maret1
http://www.google.com/maret2
http://www.google.com/posted
http://www.google.com/rontest
http://www.example.com/Confidential
http://www.example.com/L1
http://www.example.com/Secret
http://www.example.com/Secret123
http://www.example.com/foobar
http://www.example.com/maret1
http://www.example.com/maret2
http://www.example.com/posted
http://www.example.com/rontest
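For Python 3, here is a sketch of the whole flow from the question: load the JSON text, take only the "data" list, and build one URL per entry. The base URL is the question's placeholder, the data list is shortened, and build_urls is a hypothetical helper. urllib.parse.quote percent-encodes characters that are not safe in a URL path.

```python
import json
from urllib.parse import quote

# The question's JSON as a string (json.loads needs text), with "data" shortened
text = '{"success":true,"code":200,"data":["Confidential","L1","Secret"],"errs":[],"debugs":[]}'

def build_urls(json_text, base="http://google.com"):
    """Return one URL per entry in the "data" list (base is a placeholder)."""
    items = json.loads(json_text)["data"]
    # quote() percent-encodes characters that are unsafe in a URL path
    return [base.rstrip("/") + "/" + quote(item) for item in items]

urls = build_urls(text)
for url in urls:
    print(url)
```

Opening each URL (for example with urllib.request.urlopen) is left out so the sketch runs without network access.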
