I have lists of strings. Some are hashtags, like #rabbitsarecool; others are short pieces of prose, like "My rabbit's name is Fred."
I have written a program to separate them:
def seperate_hashtags_from_prose(*strs):
    props = []
    hashtags = []
    for x in strs:
        if x[0]=="#" and x.find(' ')==-1:
            hashtags += x
        else:
            prose += x
    return hashtags, prose
seperate_hashtags_from_prose(["I like cats","#cats","Rabbits are the best","#Rabbits"])
This program does not work. When I debug the example above, it tells me that on the first iteration of the loop:
x=["I like cats","#cats","Rabbits are the best","#Rabbits"]
This is not what I expected. My intuition is that something about the way the loop over the optional arguments is constructed is causing an error, but I can't see why.
There are several issues.
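The most obvious is switching between props and prose: the list is created as props but used as prose, so the code you posted does not run.
As others have commented, if you use * in the function definition, you should not call the function with a single list, because the whole list then arrives as one argument. You could call seperate_hashtags_from_prose("I like cats","#cats","Rabbits are the best","#Rabbits") instead, or unpack an existing list with * at the call site.
The line hashtags += x does not do what you think it does. Using += on a list extends it with every element of the right-hand iterable, and the elements of a string are its characters, so "#cats" would be added as '#', 'c', 'a', 't', 's'. You probably meant hashtags.append(x) instead.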
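Putting the three points together, a minimal sketch of a working version could look like this. The corrected spelling separate_hashtags_from_prose and the use of startswith are my own choices, not part of the original code:

```python
def separate_hashtags_from_prose(*strs):
    """Split the arguments into hashtags and prose."""
    hashtags = []
    prose = []
    for x in strs:
        # A hashtag starts with "#" and contains no spaces.
        # startswith() also copes with empty strings, unlike x[0].
        if x.startswith("#") and " " not in x:
            hashtags.append(x)
        else:
            prose.append(x)
    return hashtags, prose


items = ["I like cats", "#cats", "Rabbits are the best", "#Rabbits"]
# Unpack the list with * so each string becomes its own argument.
hashtags, prose = separate_hashtags_from_prose(*items)
print(hashtags)  # ['#cats', '#Rabbits']
print(prose)     # ['I like cats', 'Rabbits are the best']
```

If you would rather pass the list itself, drop the * from the definition and call the function with the list directly.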
I am able to convert Hindi text written in Latin (English) script back to Devanagari:
import codecs,string
from indic_transliteration import sanscript
from indic_transliteration.sanscript import SchemeMap, SCHEMES, transliterate
def is_hindi(character):
    maxchar = max(character)
    if u'\u0900' <= maxchar <= u'\u097f':
        return character
    else:
        print(transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI))
character = 'bakrya'
is_hindi(character)
Output:
बक्र्य
But if I try something like this, I don't get any conversion:
character = 'Bakrya विकणे आहे'
is_hindi(character)
Output:
Bakrya विकणे आहे
Expected Output:
बक्र्य विकणे आहे
I also tried the library Polyglot but I am getting similar results with it.
Preface: I know nothing of Devanagari, so you will have to bear with me.
First, consider your function. It can return two things, character or None (print just outputs something, it doesn't actually return a value). That makes your first output example originate from the print function, not Python evaluating your last statement.
Then, when you consider your second test string, it will see that there's some Devanagari text and just return the string back. What you have to do, if this transliteration works as I think it does, is to apply this function to every word in your text.
I modified your function to:
def is_hindi(character):
    maxchar = max(character)
    if u'\u0900' <= maxchar <= u'\u097f':
        return character
    else:
        return transliterate(character, sanscript.ITRANS, sanscript.DEVANAGARI)
and modified your call to
' '.join(map(is_hindi, character.split()))
Let me explain, from right to left. First, I split your test string into the separate words with .split(). Then, I map (i.e., apply the function to every element) the new is_hindi function to this new list. Last, I join the separate words with a space to return your converted string.
Output:
'बक्र्य विकणे आहे'
If I may suggest, I would place this splitting/mapping functionality into another function, to make things easier to apply.
Edit: I had to modify your test string from 'Bakrya विकणे आहे' to 'bakrya विकणे आहे' because B wasn't being converted. This can be fixed in a generic text with character.lower().
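Following the suggestion to move the splitting and mapping into its own function, here is a rough sketch. The names is_devanagari and convert_mixed_text are hypothetical, and the conversion function is passed in as a parameter so the sketch does not depend on any particular library. I also check every character with any() instead of max(), which is my own variation on the original test:

```python
def is_devanagari(word):
    """Return True if the word contains at least one Devanagari code point."""
    return any('\u0900' <= ch <= '\u097f' for ch in word)


def convert_mixed_text(text, convert_word):
    """Apply convert_word to each whitespace-separated word that is not
    already written in Devanagari, and leave the other words untouched."""
    return ' '.join(
        word if is_devanagari(word) else convert_word(word)
        for word in text.split()
    )


# With the library from the question, the call would presumably be:
# convert_mixed_text(text, lambda w: transliterate(w, sanscript.ITRANS, sanscript.DEVANAGARI))
# Here str.upper stands in for the converter, purely for demonstration.
print(convert_mixed_text('bakrya विकणे आहे', str.upper))
```

Note that the words are rejoined with single spaces, so any other whitespace in the input is not preserved.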
I don't know if I am doing something wrong with the way I am laying out the functions or what. Basically, I am trying to send a list of tickers to a database. I am taking the difference of two lists so that the list I send doesn't contain the same tickers as the previous send. Can anyone help me?
tickList=[]
compareList=[]
newestList=[]
def searchTwit():
    tweets = api.search("#stocks",count=100)
    return tweets
def Diff(li1, li2):
    return (list(list(set(li1)-set(li2)) + list(set(li2)-set(li1))))
def getTicker(tweets,list1,list2,anyList):
    for tweet in tweets:
        if "$" in tweet.text:
            x = tweet.text.split()
            for i in x:
                if i.startswith("$") and i[1].isalpha():
                    i.strip(".")
                    i.upper()
                    list1.append(i)
    anyList = Diff(list1,list2)
    #print(newestList)
    list2=list1
    #print(list2)
    list1.clear()
    #print(list1)
    return(anyList)
def retrieveTickers(list):
    for i in list:
        cursor.execute('INSERT INTO master.dbo.TickerTable (TickerName) VALUES (?);', (i))
    conn.commit()
while True:
    sleep(60 - time() %60)
    print(f'This is ticklist {tickList}')
    print(f'This is compareList{compareList}')
    print(f'This is newestList {newestList}')
    full_tweets=searchTwit()
    getTicker(full_tweets,tickList,compareList,newestList)
    retrieveTickers(newestList)
There are a few places that seem incorrect.
Python strings are immutable so this code block isn't doing what you think.
i.strip(".")
i.upper()
list1.append(i)
The string manipulation functions return new strings so you need to do this:
list1.append(i.strip(".").upper())
It's very unclear what you want retrieveTickers to do, since it is actually writing to a database. The verb in a function name should match the action.
The diff function is convoluted. It can be simplified to
(set(li1) | set(li2)) - (set(li1) & set(li2))
but I don't see where you need these to be ordered, so I would recommend standardizing on set as the type. Then diff can be written as:
li1.symmetric_difference(li2)
which probably doesn't need to be a function.
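To check that the set method really matches the original Diff, here is a small comparison. The ticker values are made up for illustration:

```python
def diff(li1, li2):
    # The original approach: elements only in li1 plus elements only in li2.
    return list(set(li1) - set(li2)) + list(set(li2) - set(li1))


a = ["$AAPL", "$TSLA", "$MSFT"]
b = ["$TSLA", "$GOOG"]

via_method = set(a).symmetric_difference(b)
via_operator = set(a) ^ set(b)  # ^ is the operator form of the same operation

# All three contain the same elements; only the container type differs.
print(set(diff(a, b)) == via_method == via_operator)  # True
```

Keep in mind that a symmetric difference also contains tickers that were in the old list but are missing from the new one. If you only want tickers you have not sent before, a plain set difference (new - old) may be closer to what you need.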
This block:
full_tweets=searchTwit()
getTicker(full_tweets,tickList,compareList,newestList)
retrieveTickers(newestList)
is also really hard to follow. I think you are trying to manipulate the lists within the getTicker function, and I strongly recommend not doing this. Instead, return the information you need, and if you need more than one data structure, return them as a tuple. If I understand this correctly, your call should probably look more like this:
new_tickers, already_seen_data = \
getTicker(full_tweets, already_seen_data)
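A sketch of what such a function could look like. Everything here is an assumption on my part: the names extract_tickers and get_new_tickers are hypothetical, the function takes plain strings instead of tweet objects so it can run on its own, and "new" is taken to mean "not seen in any earlier batch":

```python
def extract_tickers(texts):
    """Collect normalized $TICKER symbols from an iterable of strings."""
    tickers = set()
    for text in texts:
        for word in text.split():
            # len(word) > 1 guards against a bare "$", where word[1] would fail.
            if word.startswith("$") and len(word) > 1 and word[1].isalpha():
                # strip() and upper() return new strings, so use their results.
                tickers.add(word.strip(".").upper())
    return tickers


def get_new_tickers(texts, already_seen):
    """Return (tickers not seen before, updated set of all seen tickers)."""
    found = extract_tickers(texts)
    new_tickers = found - already_seen
    return new_tickers, already_seen | found


already_seen = {"$TSLA"}
texts = ["Buy $aapl. now", "$TSLA and $5 off"]
new_tickers, already_seen = get_new_tickers(texts, already_seen)
print(new_tickers)  # {'$AAPL'}
```

Because nothing is mutated inside the function, the loop only has to carry already_seen from one iteration to the next.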
def Change(_text):
    L = len(_text)
    _i = 2
    _text[_i] = "*"
    _i += 2
    print(_text)
How can I add a mark, e.g. *, at every second index in a string?
Why are you using _ in your variables? If it is for any of these reasons then you are OK; if it is a made-up convention, try not to use it, as it might cause unnecessary confusion.
As for your code, try:
def change_text(text):
    for i in range(len(text)):
        if i % 2 == 0:  # check if i is even (not odd)
            print(text[:i] + "*" + text[i+1:])
When you run change_text("tryout string") the output will look like:
*ryout string
tr*out string
tryo*t string
tryout*string
tryout s*ring
tryout str*ng
tryout strin*
If you meant something else, give an example input and the desired output.
See How to create a Minimal, Complete, and Verifiable example
PS: Please realize that strings are immutable in Python, so you cannot actually change a string, only create new ones from it. If you want to change it in place, you might be better off storing it as a list, for example, like they have done here.
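To illustrate the list approach: this sketch assumes you want every character at an even index replaced in a single result, which may or may not be what the question meant. The name mark_every_second is hypothetical:

```python
def mark_every_second(text, mark="*"):
    """Return a copy of text with every even-indexed character replaced by mark."""
    chars = list(text)            # lists are mutable, strings are not
    for i in range(0, len(chars), 2):
        chars[i] = mark           # item assignment works on a list
    return "".join(chars)         # build a new string from the list


print(mark_every_second("tryout string"))  # *r*o*t*s*r*n*
```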
Are you trying to separate every two letters with an asterisk?
testtesttest
te*st*te*st*te*st
You could do this using itertools.zip_longest to split the string up, and '*'.join to rebuild it with the markers inserted
from itertools import zip_longest
def add_marker(s):
    return '*'.join([''.join(x) for x in zip_longest(*[iter(s)]*2, fillvalue='')])
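If the zip_longest idiom looks too cryptic, slicing gives the same result. This is only an alternative sketch; add_marker_sliced and its parameters are names I made up:

```python
from itertools import zip_longest


def add_marker(s):
    # Pair up characters two at a time, then join the pairs with "*".
    return '*'.join(''.join(pair) for pair in zip_longest(*[iter(s)] * 2, fillvalue=''))


def add_marker_sliced(s, step=2, mark='*'):
    # Cut the string into chunks of `step` characters and join them with the marker.
    return mark.join(s[i:i + step] for i in range(0, len(s), step))


print(add_marker("testtesttest"))         # te*st*te*st*te*st
print(add_marker_sliced("testtesttest"))  # te*st*te*st*te*st
```

The sliced version also makes it easy to change the chunk size or the marker.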
I've found how to split a delimited string into key:value pairs in a dictionary elsewhere, but I have an incoming string that also includes two parameters that amount to dictionaries themselves: parameters with one or three key:value pairs inside:
clientid=b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0&keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234,optdata:75BCD15&md=AMT-Cam:avatar&playbackmode=st&ver=6&sessionid=&mk=PC&junketid=1342177342&version=6.7.8.9012
Obviously these are dummy parameters to obfuscate proprietary code, here. I'd like to dump all this into a dictionary with the userdata and md keys' values being dictionaries themselves:
requestdict = {'clientid' : 'b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0', 'keyid' : '987654321', 'userdata' : {'ip' : '192.168.10.10', 'deviceid' : '1234', 'optdata' : '75BCD15'}, 'md' : {'Cam' : 'avatar'}, 'playbackmode' : 'st', 'ver' : '6', 'sessionid' : '', 'mk' : 'PC', 'junketid' : '1342177342', 'version' : '6.7.8.9012'}
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries? What would the syntax be? If not, I suppose I'll have to split by & and then check & handle splits that contain : but even then I can't figure out the syntax. Can someone help? Thanks!
I basically took Kyle's answer and made it more future-friendly:
def dictelem(input):
    parts = input.split('&')
    listing = [part.split('=') for part in parts]
    result = {}
    for entry in listing:
        head, tail = entry[0], ''.join(entry[1:])
        if ':' in tail:
            entries = tail.split(',')
            result.update({ head : dict(e.split(':') for e in entries) })
        else:
            result.update({head: tail})
    return result
Here's a two-liner that does what I think you want:
dictelem = lambda x: x if ':' not in x[1] else [x[0],dict(y.split(':') for y in x[1].split(','))]
a = dict(dictelem(x.split('=')) for x in input.split('&'))
Can I take the slick two-level delimitation parsing command that I've found:
requestDict = dict(line.split('=') for line in clientRequest.split('&'))
and add a third level to it to handle & preserve the 2nd-level dictionaries?
Of course you can, but (a) you probably don't want to, because nested comprehensions beyond two levels tend to get unreadable, and (b) this super-simple syntax won't work for cases like yours, where only some of the data can be turned into a dict.
For example, what should happen with 'PC'? Do you want to make that into {'PC': None}? Or maybe the set {'PC'}? Or the list ['PC']? Or just leave it alone? You have to decide, and write the logic for that, and trying to write it as an expression will make your decision very hard to read.
So, let's put that logic in a separate function:
def parseCommasAndColons(s):
    bits = [bit.split(':') for bit in s.split(',')]
    try:
        return dict(bits)
    except ValueError:
        return bits
This will return a dict like {'ip': '192.168.10.10', 'deviceid': '1234', 'optdata': '75BCD15'} or {'AMT-Cam': 'avatar'} for cases where each comma-separated component has exactly one colon inside it, but a list of lists like [['1342177342']] for cases where any of them don't.
Even this may be a little too clever; I might make the "is this in dictionary format" check more explicit instead of just trying to convert the list of lists and see what happens.
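For comparison, a sketch of the more explicit check. The names parse_value and parse_request are hypothetical, and I am assuming that values which are not in key:value format should simply stay plain strings, which matches the output the question asked for:

```python
def parse_value(value):
    """Return a dict if every comma-separated part has exactly one colon,
    otherwise return the value unchanged."""
    parts = value.split(',')
    if all(part.count(':') == 1 for part in parts):
        return dict(part.split(':') for part in parts)
    return value


def parse_request(request):
    """Split on '&', then on the first '=', and parse each value."""
    result = {}
    for pair in request.split('&'):
        key, _, value = pair.partition('=')
        result[key] = parse_value(value)
    return result


request = ("keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234,optdata:75BCD15"
           "&md=AMT-Cam:avatar&sessionid=&mk=PC")
print(parse_request(request))
```

The exactly-one-colon rule is deliberately strict: a value such as a timestamp or an IPv6 address contains several colons and would be left alone as a string.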
Either way, how would you put that back into your original comprehension?
Well, you want to call it on the value in the line.split('='). So let's add a function for that:
def parseCommasAndColonsForValue(keyvalue):
    if len(keyvalue) == 2:
        return keyvalue[0], parseCommasAndColons(keyvalue[1])
    else:
        return keyvalue
requestDict = dict(parseCommasAndColonsForValue(line.split('='))
                   for line in clientRequest.split('&'))
One last thing: Unless you need to run on older versions of Python, you shouldn't often be calling dict on a generator expression. If it can be rewritten as a dictionary comprehension, it will almost certainly be clearer that way, and if it can't be rewritten as a dictionary comprehension, it probably shouldn't be a 1-liner expression in the first place.
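As a sketch, the same thing written as a dictionary comprehension could look like this. The snake_case names and the shortened request string are mine, and split('=', 1) is an addition so that a value containing '=' does not break the unpacking:

```python
def parse_commas_and_colons(s):
    bits = [bit.split(':') for bit in s.split(',')]
    try:
        return dict(bits)      # works only if every bit split into exactly two parts
    except ValueError:
        return bits            # otherwise hand back the list of lists


client_request = "keyid=987654321&userdata=ip:192.168.10.10,deviceid:1234&sessionid=&mk=PC"

request_dict = {
    key: parse_commas_and_colons(value)
    for key, value in (line.split('=', 1) for line in client_request.split('&'))
}
print(request_dict)
```

The key/value unpacking in the comprehension states the intent more directly than passing pairs to dict().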
Of course breaking expressions up into separate expressions, turning some of them into statements or even functions, and naming them does make your code longer—but that doesn't necessarily mean worse. About half of the Zen of Python (import this) is devoted to explaining why. Or one quote from Guido: "Python is a bad language for code golf, on purpose."
If you really want to know what it would look like, let's break it into two steps:
>>> {k: [bit2.split(':') for bit2 in v.split(',')] for k, v in (bit.split('=') for bit in s.split('&'))}
{'clientid': [['b59694bf-c7c1-4a3a-8cd5-6dad69f4abb0']],
'junketid': [['1342177342']],
'keyid': [['987654321']],
'md': [['AMT-Cam', 'avatar']],
'mk': [['PC']],
'playbackmode': [['st']],
'sessionid': [['']],
'userdata': [['ip', '192.168.10.10'],
['deviceid', '1234'],
['optdata', '75BCD15']],
'ver': [['6']],
'version': [['6.7.8.9012']]}
That illustrates why you can't just add a dict call for the inner level—because most of those things aren't actually dictionaries, because they had no colons. If you changed that, then it would just be this:
{k: dict(bit2.split(':') for bit2 in v.split(',')) for k, v in (bit.split('=') for bit in s.split('&'))}
I don't think that's very readable, and I doubt most Python programmers would. Reading it 6 months from now and trying to figure out what I meant would take a lot more effort than writing it did.
And trying to debug it will not be fun. What happens if you run that on your input, with missing colons? ValueError: dictionary update sequence element #0 has length 1; 2 is required. Which sequence? No idea. You have to break it down step by step to see what doesn't work. That's no fun.
So, hopefully that illustrates why you don't want to do this.
I have written a little program that parses log files of anywhere between a few thousand lines to a few hundred thousand lines. For this, I have a function in my code which parses every line, looks for keywords, and returns the keywords with the associated values.
These log files consist of little sections. Each section has some values I'm interested in and want to store as a dictionary.
I have simplified the sample below, but the idea is the same.
My original function looked like this. It gets called between 100 and 10000 times per run, so you can understand why I want to optimize it:
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
        elif 'apples' in line:
            d['apples'] = True
        elif 'bananas' in line:
            d['bananas'] = True
        elif line.startswith('End of section'):
            return d
f = open('fruit.txt','r')
d = parse_txt(f)
print(d)
The problem I run into is that I have a lot of conditionals in my program, because it checks for a lot of different things and stores the values for them. When every line is checked against anywhere between 0 and 30 keywords, this gets slow fast. I don't want to do that, because I'm not interested in everything every time I run the program: I'm only ever interested in 5-6 keywords, but I'm parsing every line for 30 or so.
In order to optimize it, I wrote the following by using exec on a string:
def make_func(args):
    func_str = """
def parse_txt(f):
    d = {}
    for line in f:
        if not line:
            pass
"""
    if 'apples' in args:
        func_str += """
        elif 'apples' in line:
            d['apples'] = True
"""
    if 'bananas' in args:
        func_str += """
        elif 'bananas' in line:
            d['bananas'] = True
"""
    func_str += """
        elif line.startswith('End of section'):
            return d"""
    print(func_str)
    namespace = {}
    exec(func_str, namespace)  # explicit namespace, so this also works on Python 3
    return namespace['parse_txt']
args = ['apples','bananas']
fun = make_func(args)
f = open('fruit.txt','r')
d = fun(f)
print(d)
This solution works great, because it speeds up the program by an order of magnitude and it is relatively simple. Depending on the arguments I put in, it will give me the first function, but without checking for all the stuff I don't need.
For example, if I give it args=['bananas'], it will not check for 'apples', which is exactly what I want to do.
This makes it much more efficient.
However, I do not like this solution very much, because it is not very readable, it is difficult to change, and it is very error-prone whenever I modify something. Besides that, it feels a little bit dirty.
I am looking for alternative or better ways to do this. I have tried using a set of functions to call on every line, and while this worked, it did not offer me the speed increase that my current solution gives me, because it adds a few function calls for every line. My current solution doesn't have this problem, because it only has to be called once at the start of the program. I have read about the security issues with exec and eval, but I do not really care about that, because I'm the only one using it.
EDIT:
I should add that, for the sake of clarity, I have greatly simplified my function. From the answers I understand that I didn't make this clear enough.
I do not check for keywords in a consistent way. Sometimes I need to check for 2 or 3 keywords in a single line, sometimes just for 1. I also do not treat the result in the same way. For example, sometimes I extract a single value from the line I'm on, sometimes I need to parse the next 5 lines.
I would try defining a list of keywords you want to look for ("keywords") and doing this:
for word in keywords:
    if word in line:
        d[word] = True
Or, using a list comprehension:
dict([(word,True) for word in keywords if word in line])
Unless I'm mistaken this shouldn't be much slower than your version.
No need to use eval here, in my opinion. You're right in that an eval based solution should raise a red flag most of the time.
Edit: as you have to perform a different action depending on the keyword, I would just define function handlers and then use a dictionary like this:
def keyword_handler_word1(line):
    (...)
(...)
def keyword_handler_wordN(line):
    (...)
keyword_handlers = { 'word1': keyword_handler_word1, (...), 'wordN': keyword_handler_wordN }
Then, in the actual processing code:
for word in keywords:
    # keyword_handlers[word] is a function
    keyword_handlers[word](line)
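A runnable sketch of the handler idea. The handlers, the 'count=' keyword, and the result dictionary passed to each handler are all invented for illustration, and I added an explicit "word in line" test before dispatching, which I assume is what you would want:

```python
def handle_apples(line, result):
    result['apples'] = True


def handle_count(line, result):
    # Example of a handler that extracts a value instead of setting a flag.
    result['count'] = int(line.split('=')[1])


KEYWORD_HANDLERS = {'apples': handle_apples, 'count=': handle_count}


def parse_section(lines, keywords):
    """Run only the handlers for the requested keywords on each line."""
    result = {}
    # Look the handlers up once, outside the per-line loop.
    active = [(word, KEYWORD_HANDLERS[word]) for word in keywords]
    for line in lines:
        if line.startswith('End of section'):
            break
        for word, handler in active:
            if word in line:
                handler(line, result)
    return result


lines = ["red apples here", "count=3", "bananas", "End of section", "apples again"]
print(parse_section(lines, ['count=']))            # {'count': 3}
print(parse_section(lines, ['apples', 'count=']))  # {'apples': True, 'count': 3}
```

Whether this is fast enough compared with the exec version is something you would have to measure on your real log files.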
Use regular expressions. Something like the following:
>>> import re
>>> lookup = {'a': 'apple', 'b': 'banane'} # keyword: characters to look for
>>> pattern = '|'.join('(?P<%s>%s)' % (key, val) for key, val in lookup.items())
>>> re.search(pattern, 'apple aaa').groupdict()
{'a': 'apple', 'b': None}
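Applied to the log-file case, one possible sketch is to compile a single pattern from only the keywords you care about in this run. The name make_matcher is hypothetical, and this assumes the keywords are plain strings that do not overlap each other:

```python
import re


def make_matcher(keywords):
    """Build a function that returns the set of keywords found in a line."""
    # re.escape keeps characters like "." or "$" in a keyword from being
    # interpreted as regex syntax.
    pattern = re.compile('|'.join(re.escape(word) for word in keywords))

    def find_keywords(line):
        return set(pattern.findall(line))

    return find_keywords


find = make_matcher(['apples', 'bananas'])
print(find("cherries only"))  # set()
```

The pattern is compiled once, so each line costs a single scan regardless of how many keywords were selected.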
def create_parser(fruits):
    def parse_txt(f):
        d = {}
        for line in f:
            if not line:
                pass
            elif line.startswith('End of section'):
                return d
            else:
                for testfruit in fruits:
                    if testfruit in line:
                        d[testfruit] = True
    return parse_txt  # hand the configured parser back to the caller
This is what you want - create a test function dynamically.
Depending on what you really want to do, it is, of course, possible to remove one level of complexity and define
def parse_txt(f, fruits):
    [...]
or
def parse_txt(fruits, f):
    [...]
and work with functools.partial.
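A short sketch of the functools.partial variant, using the second signature so the keywords can be bound as the first positional argument. The function body is my own filling-in of the elided part, and the final return d for input without an end marker is an addition:

```python
from functools import partial


def parse_txt(fruits, f):
    d = {}
    for line in f:
        if line.startswith('End of section'):
            return d
        for fruit in fruits:
            if fruit in line:
                d[fruit] = True
    return d  # reached only if there is no 'End of section' line


# Bind the keyword list once; the result takes just the file (or any iterable of lines).
parse_bananas = partial(parse_txt, ['bananas'])

lines = ["apples\n", "bananas\n", "End of section\n", "bananas again\n"]
print(parse_bananas(lines))  # {'bananas': True}
```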
You can use the set structure, like this:
fruit = set(['cocos', 'apple', 'lime'])
need = set(['cocos', 'pineapple'])
need.intersection(fruit)
This returns a set containing only 'cocos'.
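For the log-file question, this could be applied per line roughly as follows. This is only a sketch and it assumes each keyword appears as a whole whitespace-separated token, which a substring test like 'apples' in line does not require. The name keywords_in_line is hypothetical:

```python
def keywords_in_line(line, need):
    """Return the wanted keywords that occur as whole words in the line."""
    # intersection() accepts any iterable, so the split list can be passed directly.
    return need.intersection(line.split())


need = {'apples', 'bananas'}
print(keywords_in_line("we have apples and pears", need))  # {'apples'}
```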