I'm trying to produce this with Python,
in a for loop (i from 1 to 10):
text i+(i*(i+2.5)) text [i+(i*(i+2.5))] text
Results:
text 1+(1*(1+2.5)) text 4.5 text
text 2+(2*(2+2.5)) text 11 text
text 3+(3*(3+2.5)) text 19.5 text
etc.
Every i must have a different formatting
All the code must fit on a single command line.
This is what I have created
python -c "import locale; locale.setlocale(locale.LC_ALL, ''); print('\n'.join(locale.format_string('%.1f', i+(i*(i+2.5))) for i in range(1,10,1)))"
How can I add the rest of the text?
UPDATE:
With the help of Tanmaya Meher's answer I created this command:
python -c "import sys,locale; [sys.stdout.write('text1 ' + '%.2f+(%.2f*(%.2f+2.5))'%(i,i,i) + ' text2 ' + '%.2f'%(i+(i*(i+2.5))) + ' text3' + '\n') for i in range(1,11,1)]"
but I still don't know where to apply the locale formatting.
I think this will help. Let me know if anything needs to be added.
python -c "import locale; locale.setlocale(locale.LC_ALL, ''); print('\n'.join('text1 ' + locale.format_string('%.1f+(%.1f*(%.1f+2.5))',(i,i,i), grouping = True) + ' text2 ' + locale.format_string('%.1f',i+(i*(i+2.5)), grouping = True) + ' text3' for i in range(1,1111)))"
Output:
text1 1.0+(1.0*(1.0+2.5)) text2 4.5 text3
text1 2.0+(2.0*(2.0+2.5)) text2 11.0 text3
text1 3.0+(3.0*(3.0+2.5)) text2 19.5 text3
text1 4.0+(4.0*(4.0+2.5)) text2 30.0 text3
text1 5.0+(5.0*(5.0+2.5)) text2 42.5 text3
text1 6.0+(6.0*(6.0+2.5)) text2 57.0 text3
text1 7.0+(7.0*(7.0+2.5)) text2 73.5 text3
text1 8.0+(8.0*(8.0+2.5)) text2 92.0 text3
text1 9.0+(9.0*(9.0+2.5)) text2 112.5 text3
text1 10.0+(10.0*(10.0+2.5)) text2 135.0 text3
...
...
text1 1,109.0+(1,109.0*(1,109.0+2.5)) text2 12,33,762.5 text3
text1 1,110.0+(1,110.0*(1,110.0+2.5)) text2 12,35,985.0 text3
If you don't want digit grouping (the commas in 12,35,985.0), just remove grouping = True from the code. Where the commas go depends on your locale's grouping rules.
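If the one-line constraint is ever lifted, the same output is easier to read as a small script. This is a minimal sketch assuming Python 3; format_line is a hypothetical helper name, and it uses locale.format_string because locale.format was deprecated and then removed in Python 3.12.

```python
import locale

def format_line(i):
    """Build one output line for i using the current locale's number formatting."""
    value = i + (i * (i + 2.5))
    expr = locale.format_string('%.1f+(%.1f*(%.1f+2.5))', (i, i, i), grouping=True)
    result = locale.format_string('%.1f', value, grouping=True)
    return 'text1 ' + expr + ' text2 ' + result + ' text3'

if __name__ == '__main__':
    try:
        # '' picks up the user's default locale from the environment (LANG, LC_ALL, ...)
        locale.setlocale(locale.LC_ALL, '')
    except locale.Error:
        pass  # that locale is not installed: keep the default "C" locale
    for i in range(1, 11):
        print(format_line(i))
```

Keeping the formatting in one function means the locale is applied in exactly one place, which is the part the one-liner makes hard to see.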
My df looks like this:
import pandas as pd
record = {
    'text_con_RT_t' : ['RT #Blanc_: ramdon text #hashtag quiere ', '#GonM ramdon text', 'RT #IvEc: #GonzM ramdon text', 'hOLA ramdon text ' ],
    'rt' : ['RT', '' , 'RT','' ]
}
# create a dataframe
dataframe2 = pd.DataFrame(record,
                          columns = ['text_con_RT_t', 'rt'])
I would like to get something like this:
text_con_RT_t                              rt   usr_rt
RT #Blanc_: ramdon text #hashtag quiere    RT   blanc
#GonM ramdon text
RT #IvEc: #GonzM ramdon text               RT   ivec
hOLA ramdon text
But I haven't succeeded: in cases where the text starts with a mention but is not a retweet, my results look like this:
text_con_RT_t                              rt   usr_rt
RT #Blanc_: ramdon text #hashtag quiere    RT   blanc
#GonM ramdon text                               gonm ramdon text
RT #IvEc: #GonzM ramdon text               RT   ivec
hOLA ramdon text                                NaN
I have tried with this:
try:
    dataframe2["usr_rt"] = dataframe2.text_con_RT_t.str.lower().str.split(':').str[0].str.split('#').str[1]
except dataframe2["rt"]==None: # complicated failed
    dataframe2["usr_rt"] = ""
Also with this:
if dataframe2["rt"] == "RT":
    return (dataframe2["usr_rt"] == dataframe2.text_con_RT_t.str.split(':').str[0].str.split('#').str[1])
What am I missing? Thanks.
You can use numpy.where to keep the extracted value only where rt is 'RT', and an empty string everywhere else:
import numpy as np
dataframe2['usr_rt'] = np.where(
    dataframe2.rt == 'RT',
    dataframe2.text_con_RT_t.str.extract(r'#(\w+)', expand=False).str.lower(),
    ''
)
dataframe2
text_con_RT_t rt usr_rt
0 RT #Blanc_: ramdon text #hashtag quiere RT blanc_
1 #GonM ramdon text
2 RT #IvEc: #GonzM ramdon text RT ivec
3 hOLA ramdon text
Or, if retweets always start with RT, you can use the regex RT.*?#(\w+) on its own (rows without a match then get NaN rather than an empty string):
dataframe2['usr_rt'] = dataframe2.text_con_RT_t.str.extract(r'RT.*?#(\w+)', expand=False).str.lower()
dataframe2
text_con_RT_t rt usr_rt
0 RT #Blanc_: ramdon text #hashtag quiere RT blanc_
1 #GonM ramdon text NaN
2 RT #IvEc: #GonzM ramdon text RT ivec
3 hOLA ramdon text NaN
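If you'd rather avoid numpy, the regex can be anchored and wrapped in a small function that is applied row by row. This is a sketch: rt_user is a hypothetical helper, and the pattern assumes a retweet always looks like "RT #name:" at the very start of the text, as in the sample data. Like the numpy.where version, it keeps underscores (blanc_) and returns an empty string for non-retweets.

```python
import re

# Anchored at the start: "RT", a space, "#", the username, then ":"
RT_PATTERN = re.compile(r'^RT #(\w+):')

def rt_user(text):
    """Return the lower-cased retweeted username, or '' if text is not a retweet."""
    match = RT_PATTERN.match(text)
    return match.group(1).lower() if match else ''

# With pandas this could be applied as:
# dataframe2['usr_rt'] = dataframe2.text_con_RT_t.map(rt_user)
```

Anchoring the pattern is what keeps "#GonM ramdon text" from being treated as a retweet, which was the failure in the question.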
Personally, I would find it easier to create the values for the new column from record. If you add them to record, you don't need to change the DataFrame afterwards (which I prefer, since I'm not great with numpy and would just end up extracting the column as a list and doing what I've done below anyway).
# allowedChars = '' # '+-.' # add allowed characters
record['usr_rt'] = ['' if not rt == 'RT' else ''.join(
c for c in txt.split('#', 1)[-1].split(':')[0].lower()
if c.isalpha() or c.isdigit() # or c in allowedChars
) for txt, rt in zip(record['text_con_RT_t'], record['rt'])]
The condition c.isalpha() or c.isdigit() keeps only letters and digits; remove or c.isdigit() if you also want to strip digits from the username, and uncomment allowedChars and or c in allowedChars if you want to allow some special characters (that would include spaces, though I don't think usernames can contain any).
Now pd.DataFrame(record) returns a DataFrame that looks like this:
text_con_RT_t                              rt   usr_rt
RT #Blanc_: ramdon text #hashtag quiere    RT   blanc
#GonM ramdon text
RT #IvEc: #GonzM ramdon text               RT   ivec
hOLA ramdon text
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
if(text1 in animals or text2 in animals or text3 in animals):
    print(text2) # because it was met in the if/else statement!
I simplified the example, but the animals string will be updated every time.
What is the best and easiest way to achieve this without so many if/else statements in my code?
You can use regex.
import re
pattern = '|'.join([text1, text2, text3])
# pattern -> 'brown dog|white cat|fat cow'
res = re.findall(pattern, animals)
print(res)
# ['white cat']
ANY time you have a set of variables of the form xxx1, xxx2, and xxx3, you need to convert that to a list.
animals = 'silly monkey small bee white cat'
text = [
'brown dog',
'white cat',
'fat cow'
]
for t in text:
    if t in animals:
        print("Found",t)
Use a loop to check each case:
animals = 'silly monkey small bee white cat'
text1 = 'brown dog'
text2 = 'white cat'
text3 = 'fat cow'
for text in [text1, text2, text3]:
    if text in animals:
        print(text)
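If you need all the matches as a list (for example, to check whether anything matched at all), the loop condenses into a list comprehension. A minimal sketch; note that in does plain substring matching, so a candidate like 'cat' would also match inside 'cats'.

```python
animals = 'silly monkey small bee white cat'
texts = ['brown dog', 'white cat', 'fat cow']

# Keep every candidate that occurs as a substring of animals
found = [t for t in texts if t in animals]
print(found)

if found:
    print('At least one match:', found[0])
```

Because the candidates live in one list, adding a fourth animal means adding one list entry rather than another condition.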
I have this list:
lst = [' SOME TEXT\nSOME TEXT\nFTY = 1', 'A|1\nB|5\nC|3\n \nD|0\nE|0', 'D|4\nE|1\nG|1', '\nblah blah', '\n--- HHGTY',
'SOME TEXT\nFTY = 1\nA|3\nB|2\nC|8\nD|6\nE|9\nF|3', '', 'blah blah\n \nblah blah',
'--- HHGTY'
]
and I want to print only the elements containing | or HHGTY. I'm using the code below, but it prints
SOME TEXT and FTY = 1 too. What is wrong? Thanks
>>> for s in lst:
...     if ("|" in s) or ("HHGTY" in s):
...         print(s)
...
A|1
B|5
C|3
D|0
E|0
D|4
E|1
G|1
--- HHGTY
SOME TEXT
FTY = 1
A|3
B|2
C|8
D|6
E|9
F|3
--- HHGTY
>>>
I think what you want is:
for s in lst:
    for subs in s.split('\n'):
        if ("|" in subs) or ("HHGTY" in subs):
            print(subs)
Your code is doing exactly what you wrote: SOME TEXT and FTY = 1 are part of the element 'SOME TEXT\nFTY = 1\nA|3\nB|2\nC|8\nD|6\nE|9\nF|3', and since that element contains '|', the whole element is printed, including those two lines.
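Building on the per-line idea from the first answer, here is a sketch that collects the matching lines into a list instead of printing them. It uses str.splitlines() in place of split('\n'); the only difference here is that an empty element produces no lines at all instead of one empty string.

```python
lst = [' SOME TEXT\nSOME TEXT\nFTY = 1', 'A|1\nB|5\nC|3\n \nD|0\nE|0', 'D|4\nE|1\nG|1',
       '\nblah blah', '\n--- HHGTY',
       'SOME TEXT\nFTY = 1\nA|3\nB|2\nC|8\nD|6\nE|9\nF|3', '', 'blah blah\n \nblah blah',
       '--- HHGTY']

# Flatten every element into its lines, then keep only the lines we want
matches = [line
           for element in lst
           for line in element.splitlines()
           if '|' in line or 'HHGTY' in line]

print('\n'.join(matches))
```

The test is now applied to each line rather than each element, so lines like FTY = 1 are dropped even when they share an element with A|3.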
Say I have a list and a string:
l=['hello my name is michael',
'hello michael is my name',
'hello michaela is my name',
'hello my name is michelle',
"hello i'm Michael",
'hello my lastname is michael',
'hello michael',
'hello my name is michael brown']
s="hello my name is michael"
I want to search for each word of the string and count how many of those words appear in each list element (ignoring case):
hello my name is michael: 5
hello michael is my name: 5 (all words are present)
hello michaela is my name: 5 (extra characters at the end of a word are OK)
hello my name is michelle: 4
hello i'm Michael: 2
hello my lastname is michael: 4 (extra characters at the start of a word are not OK)
hello michael: 2
hello my name is michael brown: 5
Finally, I want to return all matches sorted with the highest counts first, so the output would be:
hello my name is michael: 5
hello michael is my name: 5
hello michaela is my name: 5
hello my name is michael brown: 5
hello my name is michelle: 4
hello my lastname is michael: 4
hello i'm Michael: 2
hello michael: 2
This is essentially a regex matching and sorting problem, but I'm in over my head on this one. Any advice on how to proceed with any or all of the steps?
I don't understand your expected output. Do you mean something like this?
import re
l = ['hello my name is michael',
'hello michael is my names',
'hello michaela is my name',
'hello my name is michelle',
'hello i am Michael',
'hello my lastname is michael',
'hello michael',
'hello my name is michael brown']
s = "Hello my name is Michael"
s = s.lower().split()
for item in l:
    d = item.lower().split()
    count = 0
    for ss in s:
        # exact word, or a word that starts with ss followed by extra characters
        match = re.search(ss + r"\w+", item.lower())
        if ss in d or (match and match.group() in d):
            count += 1
    print(item, count)
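The code above prints counts but doesn't sort them. A sketch that does both, assuming the rules from the question (case is ignored, extra characters are allowed at the end of a word but not at the start) and that every query word starts with a letter or digit. count_matching_words and rank are hypothetical helper names.

```python
import re

def count_matching_words(query, text):
    """Count the query words that begin some word of text, ignoring case."""
    # \b only anchors the start of the match, so 'michael' matches 'michaela'
    # but 'name' does not match inside 'lastname'
    return sum(
        1 for word in query.split()
        if re.search(r'\b' + re.escape(word), text, re.IGNORECASE)
    )

def rank(items, query):
    scored = [(item, count_matching_words(query, item)) for item in items]
    # sorted() is stable, so items with equal counts keep their original order
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

l = ['hello my name is michael',
     'hello michael is my name',
     'hello michaela is my name',
     'hello my name is michelle',
     "hello i'm Michael",
     'hello my lastname is michael',
     'hello michael',
     'hello my name is michael brown']
s = "hello my name is michael"

for item, count in rank(l, s):
    print(f'{item}: {count}')
```

Relying on the stable sort is what reproduces the exact tie order in the expected output.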
I'm just learning Python and regular expressions, and I've run into a problem I can't solve. Any help or recommendations would be appreciated.
import re
m = " text1 \abs{foo} text2 text3 \L{E.F} text4 "
def separate_mycmd(cmd,m):
    math_cmd = re.compile(cmd)
    math = math_cmd.findall(m)
    text = math_cmd.split(m)
    return(math,text)
(math,text) = separate_mycmd(r'\abs',m)
print math # ['\x07bs']
print text # [' text1 ', '{foo} text2 text3 \\L{E.F} text4 ']
(math,text) = separate_mycmd(r'\L',m)
print math # Question: why just ['L'] and not ['\\L'] or ['\L']?
print text # [' text1 \x07bs{foo} text2 text3 \\', '{E.F} text4 ']
# Question: why the \\ after text3?
I don't understand the output from the last call. My related questions are in the comments.
Thanks in advance,
Ulrich
You want to match on the pattern \\L (an escaped backslash followed by L), not \L, giving you:
> (math,text) = separate_mycmd(r'\\L',m)
> print math
['\\L']
> print text
[' text1 \x07bs{foo} text2 text3 ', '{E.F} text4 ']
You probably wanted \\abs as the first pattern as well. And you probably also wanted to escape the backslashes in the string you're searching, since "\a" in a normal string literal is the bell character \x07, giving:
m = " text1 \\abs{foo} text2 text3 \\L{E.F} text4 "
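The code above is Python 2. Recent Python 3 versions reject a pattern like r'\L' outright (re.error, bad escape), so a safer approach is to let re.escape() build the pattern from the literal command text. A minimal Python 3 sketch, assuming m is written as a raw string so that \abs and \L reach the program as a backslash followed by letters:

```python
import re

def separate_mycmd(cmd, m):
    # re.escape() backslash-escapes the special characters in cmd,
    # so the command is matched literally, backslash included
    math_cmd = re.compile(re.escape(cmd))
    return math_cmd.findall(m), math_cmd.split(m)

# Raw string: \a here is a backslash plus 'a', not the bell character \x07
m = r" text1 \abs{foo} text2 text3 \L{E.F} text4 "

math, text = separate_mycmd(r'\L', m)
print(math)  # ['\\L']
print(text)  # [' text1 \\abs{foo} text2 text3 ', '{E.F} text4 ']
```

The doubled backslashes in the printed lists are only how Python displays a single backslash inside a string's repr.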