I am trying to extract ticker symbols, i.e. the text that starts with a $ sign, from a spreadsheet.
I have isolated the column that contains the data, but what I am trying to do is extract any and all symbols that follow a $ sign.
For example:
$AAPL, $LOW, $TSLA and so on from the entire dataset, but I don't need or want $1000, $600 and so on. I only want symbols made of letters, which are followed by either a period or a space; just the characters a-z are what I am trying to get.
I haven't managed to extract everything, and my code is starting to get messy, so here is the code that retrieves the data so you can see it for yourself. I am using Jupyter Notebook.
import pandas
googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
    googleSheedID,
    worksheetName
)
df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
# Drop tweets containing "RT"; the filtered frame has to be assigned back to df
df = df[~df["TWEET"].str.contains("RT", na=False)]
print(df)
Not sure if I understand what you want correctly, but the following code collects every element that starts with $ and ends at the next blank space or newline, skipping tweets that contain "RT" and dollar amounts that start with a digit.
import pandas
googleSheedID = '15fhpxqWDRWkNtEFhi9bQyWUg8pDn4B-R2N18s1xFYTU'
worksheetName = 'Sheet1'
URL = 'https://docs.google.com/spreadsheets/d/{0}/gviz/tq?tqx=out:csv&sheet={1}'.format(
    googleSheedID,
    worksheetName
)
df = pandas.read_csv(URL)
del df['DATE']
del df['USERNAME']
del df['LINK']
del df['LINK2']
unique_results = []
for i in range(len(df['TWEET'])):
    tweet = df['TWEET'][i]
    if 'RT' in tweet:
        continue
    for j in range(len(tweet) - 1):
        if tweet[j] == '$':
            # Skip dollar amounts such as $1000 or $600
            if tweet[j + 1] in '0123456789':
                continue
            start = j
            # The symbol ends at the next space or newline
            for k in range(start, len(tweet)):
                if tweet[k] == ' ' or tweet[k] == '\n':
                    end = k
                    break
            results = tweet[start:end]
            if results not in unique_results:
                unique_results.append(results)
print(unique_results)
edit: fixed the code
The output is:
['$GME', '$SNDL', '$FUBO', '$AMC', '$LOTZ', '$CLOV', '$USAS', '$AIHS', '$PLM', '$LODE', '$TTNP', '$IMTE', '', '$NAK.', '$NAK', '$CRBP', '$AREC', '$NTEC', '$NTN', '$CBAT', '$ZYNE', '$HOFV', '$GWPH', '$KERN', '$ZYNE,', '$AIM', '$WWR', '$CARV', '$VISL', '$SINO', '$NAKD', '$GRPS', '$RSHN', '$MARA', '$RIOT', '$NXTD', '$LAC', '$BTC', '$ITRM', '$CHCI', '$VERU', '$GMGI', '$WNBD', '$KALV', '$EGOC', '$Veru', '$MRNA', '$PVDG', '$DROP', '$EFOI', '$LLIT', '$AUVI', '$CGIX', '$RELI', '$TLRY', '$ACB', '$TRCH', '$TRCH.', '$TSLA', '$cciv', '$sndl', '$ANCN', '$TGC', '$tlry', '$KXIN', '$AMZN', '$INFI', '$LMND', '$COMS', '$VXX', '$LEDS', '$ACY', '$RHE', '$SINO.', '$GPL', '$SPCE', '$OXY', '$CLSN', '$FTFT', '$FTFT.....', '$BIEI', '$EDRY', '$CLEU', '$FSR', '$SPY', '$NIO', '$LI', '$XPEV,', '$UL', '$RGLG', '$SOS', '$QS', '$THCB', '$SUNW', '$MICT', '$BTC.X', '$T', '$ADOM', '$EBON', '$CLPS', '$HIHO', '$ONTX', '$WNRS', '$SOLO', '$Mara,', '$Riot,', '$SOS,', '$GRNQ,', '$RCON,', '$FTFT,', '$BTBT,', '$MOGO,', '$EQOS,', '$CCNC', '$CCIV', '$tsla', '$fsr', '$wkhs', '$ride', '$nio', '$NETE', '$DPW', '$MOSY', '$SSNT', '$PLTR', '$GSAH:', '$EQOS', '$MTSL', '$CMPS', '$CHIF', '$MU', '$HST', '$SNAP', '$CTXR', '$acy', '$FUBOTV', '$DPBE', '$HYLN', '$SPOT', '$NSAV', '$HYLN,', '$aabb', '$AAL', '$BBIG', '$ITNS', '$CTIB', '$AMPG', '$ZI', '$NUVI', '$INTC', '$TSM', '$AAPL', '$MRJT', '$RCMT', '$IZEA', '$BBIG,', '$ARKK', '$LIAUTO', '$MARA:', '$SOS:', '$XOM', '$ET', '$BRNW', '$SYPR', '$LCID', '$QCOM', '$FIZZ', '$TRVG', '$SLV', '$RAFA', '$TGCTengasco,', '$BYND', '$XTNT', '$NBY', '$sos', '$KMPH', '$', '$(0.60)', '$(0.64)', '$BIDU', '$rkt', '$GTT', '$CHUC', '$CLF', '$INUV', '$RKT', '$COST', '$MDCN', '$HCMC', '$UWMC', '$riot', '$OVID', '$HZON', '$SKT', '$FB', '$PLUG', '$BA', '$PYPL', '$PSTH.', '$NVDA', '$AMPG.', '$aese.', '$spy', '$pltr', '$MSFT', '$AMD', '$QQQ', '$LTNC', '$WKHS', '$EYES', '$RMO', '$GNUS', '$gme', '$mdmp', '$kern', '$AEI', '$BABA', '$YALA', '$TWTR', '$WISH', '$GE', '$ORCL', '$JUPW', '$TMBR', '$SSYS', 
'$NKE', '$AMPGAmpliTech', '$$$', '$$', '$RGLS', '$HOGE', '$GEGR', '$nclh', '$IGAC', '$FCEL', '$TKAT', '$OCG', '$YVR', '$IPDN.', '$IPDN', "$SINO's", '$WIMI', '$TKAT.', '$BAC', '$LZR', '$LGHL', '$F', '$GM', '$KODK', '$atvk', '$ATVK', '$AIKI', '$DS', '$AI', '$WTII', '$oxy', '$DYAI', '$DSS', '$ZKIN', '$MFH', '$WKEY', '$MKGI', '$DLPN', '$PSWW', '$SNOW', '$ALYA', '$AESE', '$CSCW', '$CIDM', '$HOFV.', '$LIVX', '$FNKO', '$HPR', '$BRQS', '$GIGM', '$APOP', '$EA', '$CUEN', '$TMBR?', '$FLNT,', '$APPS', '$METX', '$STG', '$WSRC', '$AMHC', '$VIAC', '$MO', '$UAVL', '$CS', '$MDT', '$GYST', '$CBBT', '$ASTC', '$AACG', '$WAFU.', '$WAFU', '$CASI', '$mmmw', '$MVIS', '$SNOA', '$C', '$KR', '$EWZ', '$VALE', '$EWZ.', '$CSCO', '$PINS', '$XSPA', '$VPRX', '$CEMI', '$M', '$BMRA', '$SPX', '$akt', '$SURG', '$NCLH', '$ARSN', '$ODT', '$SGBX', '$CRWD.', '$TGRR', '$PENN', '$BB', '$XOP', '$XL', '$FREQ', '$IDRA', '$DKNG', '$COHN', '$ADHC', '$ISWH', '$LEGO', '$OTRA', '$NAAC', '$HCAR', '$PPGH', '$SDAC', '$PNTM', '$OUST', '$IO', '$HQGE', '$HENC', '$KYNC', '$ATNF', '$BNSO', '$HDSN', '$AABB', '$SGH', '$BMY', '$VERY', '$EARS', '$ROKU', '$PIXY', '$APRE', '$SFET', '$SQ', '$EEIQ', '$REDU', '$CNWT', '$NFLX', '$RGBPP', '$RGBP', '$SHOP', '$VITL', '$RAAS', '$CPNG', '$JKS', '$COMP', '$NAFS']
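The character-scanning approach keeps trailing punctuation (for example '$NAK.', '$ZYNE,' and '$GSAH:') and also returns stray entries such as '', '$', '$$' and '$(0.60)'. As a minimal sketch, assuming the tweets are available as plain strings, one way to clean this up is to split each tweet on whitespace, strip trailing punctuation from each $-token, and keep only tokens whose remainder is letters; `clean_tickers` is a hypothetical helper name, not part of the answer above.

```python
def clean_tickers(tweets):
    """Return unique $TICKER tokens (letters only) in first-seen order."""
    seen = set()
    ordered = []
    for tweet in tweets:
        # str.split() with no argument splits on spaces, tabs and newlines
        for token in tweet.split():
            if not token.startswith('$'):
                continue
            # Drop trailing punctuation such as '.', ',', ':' or '?'
            body = token[1:].rstrip('.,:;!?')
            # Keep only ASCII letters: rejects '$1000', '$(0.60)', '$$' and a bare '$'
            if body.isascii() and body.isalpha():
                ticker = '$' + body
                if ticker not in seen:
                    seen.add(ticker)
                    ordered.append(ticker)
    return ordered


print(clean_tickers(["$NAK. up today", "$ZYNE, $1000 and $(0.60)", "$NAK again"]))
# ['$NAK', '$ZYNE']
```

Tokens such as '$BTC.X' or "$SINO's" are rejected by this sketch because their remainder is not purely letters; whether that is desirable depends on the data.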
You can use a regular expression. The following pattern matches a $ followed by one or more letters:
\$[a-zA-Z]+
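To see what the pattern matches, here is a quick check on a made-up sample string. The match stops at the first non-letter, so a trailing period or comma is not included, and '$FOO6BAR' yields '$FOO' rather than nothing. The `\b` version is a possible variant, not part of the answer: it requires the letters to end at a word boundary, so such mixed tokens are rejected entirely.

```python
import re

# The pattern from the answer: a literal $ followed by one or more ASCII letters
TICKER = re.compile(r"\$[a-zA-Z]+")
# Possible variant: \b requires the letters to end at a word boundary,
# so a token like $FOO6BAR produces no match at all
WORD_TICKER = re.compile(r"\$[a-zA-Z]+\b")

sample = "Buying $AAPL. Selling $TSLA, not $600 or $6FOO; $FOO6BAR"
print(TICKER.findall(sample))       # ['$AAPL', '$TSLA', '$FOO']
print(WORD_TICKER.findall(sample))  # ['$AAPL', '$TSLA']
```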
After reading the DataFrame, run the code below:
import re
# Create empty lists for the results
results = []
final_results = []
for row_num in range(len(df['TWEET'])):
    string_to_check = df['TWEET'][row_num]
    # Check for RT at the beginning of the string only.
    # if 'RT' in df["TWEET"][row_num] would have found "RT" anywhere in the string.
    if re.match(r"^RT", string_to_check):
        continue
    else:
        # Find all words that start with $ followed by letters.
        # This finds $FOOBAR and the $FOO part of $FOO6BAR, but nothing in $600 or $6FOOBAR
        rel_text_l = re.findall(r"\$[a-zA-Z]+", string_to_check)
        # Skip empty lists
        if rel_text_l:
            # Add the elements of this list to the results list directly
            results.extend(rel_text_l)
# Convert to a set and back to a list to remove duplicates
final_results = list(set(results))
print(results)
print(final_results)
The results are:
['$GME', '$FOOBAR', '$FOO', '$SNDL', '$FUBO', '$AMC', '$GME', '$LOTZ', '$CLOV', '$USAS', '$GOBLIN', '$LTNC']
['$LTNC', '$GOBLIN', '$AMC', '$FOO', '$FOOBAR', '$LOTZ', '$CLOV', '$SNDL', '$GME', '$USAS', '$FUBO']
Notice that the duplicate $GME appears only once in final_results.
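If you don't need to skip tweets that start with RT, all of this can be done in one line. Join the column values into one string rather than calling str() on the Series, because pandas may truncate the printed form of a long column or a long tweet:
direct_result = list(set(re.findall(r"\$[a-zA-Z]+", " ".join(df['TWEET'].dropna().astype(str)))))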
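The approaches above keep tickers written in different cases, such as $TSLA and $tsla, as separate entries, and `list(set(...))` loses the original order. A minimal sketch that addresses both, assuming the column values are available as a list of strings (with pandas, something like `df['TWEET'].tolist()`); `unique_tickers` and `skip_retweets` are hypothetical names chosen for illustration:

```python
import re

# $ followed by letters that end at a word boundary (rejects $600 and $FOO6BAR)
TICKER = re.compile(r"\$([a-zA-Z]+)\b")


def unique_tickers(tweets, skip_retweets=True):
    """Return unique, upper-cased $TICKER symbols in first-seen order."""
    found = {}  # dicts keep insertion order (Python 3.7+), so this acts as an ordered set
    for tweet in tweets:
        if not isinstance(tweet, str):
            continue  # skip missing values such as NaN
        if skip_retweets and tweet.startswith("RT"):
            continue
        for letters in TICKER.findall(tweet):
            found["$" + letters.upper()] = None
    return list(found)


tweets = ["$TSLA to the moon", "RT $GME", "$tsla and $NIO.", float("nan"), "$600 $GME"]
print(unique_tickers(tweets))  # ['$TSLA', '$NIO', '$GME']
```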
I am trying to decrypt a file in Python that I encrypted with another program. Some letters are decrypted correctly while others are not, and I am not sure what is going on. Essentially, all I did was reverse the mapping from the encryption code. I think it has to do with the way the code iterates through the text, but I am not sure how to fix it.
Here is my decryption code:
decryption_library = {'%':'A','9':'a','#':'B','#':'b','1':'C','2':'c','3':'D','4':'d',
'5':'E','6':'e','7':'F','8':'f','0':'G','}':'g','{':'H',']':'h','[':'I',',':'i',
'.':'J','>':'j','<':'K','/':'k','0':'L','\-':'l','\"':'M',':':'m',';':'N',
'+':'n','$':'O','-':'o','$':'Q','%':'q','^':'R','&':'r','*':'S',
'(':'s',')':'T','~':'t','`':'U','5':'u','\\':'V','+':'v','=':'W','7':'w',
'~':'X',')':'x','2':'Y','*':'y',']':'Z','8':'z'}
orig_file = open('ENCRYPTED_Plain_Text_File.txt','r')
file_read = orig_file.read()
orig_file.close()
encrypt_file = open('DECRYPTED_Plain_Text_File.txt','w')
for ch in file_read:
    if ch in decryption_library:
        encrypt_file.write(decryption_library[ch])
    else:
        encrypt_file.write(ch)
encrypt_file.close()
encrypt_file = open('ENCRYPTED_Plain_Text_File.txt','r')
file_read = encrypt_file.read()
encrypt_file.close()
codes_items = decryption_library.items()
for ch in file_read:
    if not ch in decryption_library.values() or ch == '.' or ch == ',' or ch == '!':
        print(ch)
    else:
        for k,v in codes_items:
            if ch == v and ch != '.':
                print(k,end='')
Here is the encrypted text:
)]6 ^-94 ;-~ )9/6+
#2 ^$#5^) 7^$*)
)7- &-94( 4,+6&}64 ,+ 9 *6\-\--7 7--4,
%+4 (-&&* [ 2-5\-4 +-~ ~&9+6\- #-~]
%+4 #6 -+6 ~&9+6\-6&, \--+} [ (~--4
%+4 \---/64 4-7+ -+6 9( 89& 9( [ 2-5\-4
)- 7]6&6 ,~ #6+~ ,+ ~]6 5+46&}&-7~];
Here is what it should be:
The Road Not Taken
BY ROBERT FROST
Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;
Here is what it decrypts to:
xZe Road NoX xakev
BY RQBuRx wRQyx
xwo roads diverged iv a yeVoVoow woodi
qvd sorry I YouVod voX XraveVo boXZ
qvd be ove XraveVoeri Voovg I sXood
qvd Voooked dowv ove as zar as I YouVod
xo wZere iX bevX iv XZe uvdergrowXZN
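Your decryption_library is not correct. For example, the key ')' appears twice, once with the value 'T' and once with 'x'. Python does not complain about a repeated key in a dict literal; it keeps only the last value, so ')' always decodes to 'x' (hence "xZe" instead of "The"). Several other keys, such as ']', '%' and '$', are repeated in the same way.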
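The snippet below first confirms that only the last value of a repeated key survives, then sketches a safer way to build the decryption table: invert the encryption table and stop if two letters share a symbol, since such a cipher cannot be decrypted unambiguously. It assumes the encryption program used a plain letter-to-symbol dict; the names `invert_mapping`, `decrypt` and the small `encryption_library` are hypothetical.

```python
# A repeated key in a dict literal is not an error: the last value wins.
broken = {')': 'T', ')': 'x'}
print(broken)  # {')': 'x'}


def invert_mapping(encryption_library):
    """Turn a letter -> symbol table into symbol -> letter, rejecting collisions."""
    decryption_library = {}
    for letter, symbol in encryption_library.items():
        if symbol in decryption_library:
            raise ValueError(
                f"symbol {symbol!r} is used for both "
                f"{decryption_library[symbol]!r} and {letter!r}"
            )
        decryption_library[symbol] = letter
    return decryption_library


def decrypt(text, decryption_library):
    """Replace each mapped symbol; other characters pass through unchanged."""
    return "".join(decryption_library.get(ch, ch) for ch in text)


# Hypothetical, collision-free encryption table covering a few letters
encryption_library = {'T': ')', 'h': ']', 'e': '6'}
table = invert_mapping(encryption_library)
print(decrypt(')]6', table))  # The
```

The collision check only covers letters. In the output above, ',' is both the cipher symbol for 'i' and ordinary punctuation, which is why "wood," comes out as "woodi"; punctuation that doubles as a cipher symbol cannot be told apart either.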
I'm having problems writing some Gerrit (http://code.google.com/p/gerrit/) hooks.
http://gerrit.googlecode.com/svn/documentation/2.2.0/config-hooks.html
When I parse the command line of the patchset-created hook, which is called as
patchset-created --change --change-url --project --branch --uploader --commit --patchset
using the following code:
import getopt
import sys
def main():
    if (len(sys.argv) < 2):
        showUsage()
        exit()
    if (sys.argv[1] == 'update-projects'):
        updateProjects()
        exit()
    need = ['action=', 'change=', 'change-url=', 'commit=', 'project=', 'branch=', 'uploader=',
            'patchset=', 'abandoner=', 'reason=', 'submitter=', 'comment=', 'CRVW=', 'VRIF=' , 'patchset=' , 'restorer=', 'author=']
    print sys.argv[1:]
    print '-----'
    optlist, args = getopt.getopt(sys.argv[1:], '', need)
    id = url = hash = who = comment = reason = codeReview = verified = restorer = ''
    print optlist
    for o, a in optlist:
        if o == '--change': id = a
        elif o == '--change-url': url = a
        elif o == '--commit': hash = a
        elif o == '--action': what = a
        elif o == '--uploader': who = a
        elif o == '--submitter': who = a
        elif o == '--abandoner': who = a
        elif o == '--author' : who = a
        elif o == '--branch': branch = a
        elif o == '--comment': comment = a
        elif o == '--CRVW' : codeReview = a
        elif o == '--VRIF' : verified = a
        elif o == '--patchset' : patchset = a
        elif o == '--restorer' : who = a
        elif o == '--reason' : reason = a
Command line input:
--change I87f7802d438d5640779daa9ac8196aeb3eec8c2a
--change-url http://<hostname>:8080/308
--project private/bar
--branch master
--uploader xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)
--commit 49aae9befaf27a5fede51b498f0660199f47b899 --patchset 1
print sys.argv[1:]
['--action', 'new',
'--change','I87f7802d438d5640779daa9ac8196aeb3eec8c2a',
'--change-url',
'http://<hostname>:8080/308',
'--project', 'private/bar',
'--branch', 'master',
'--uploader', 'xxxxxxx-xxxxx', 'xxxxxxx', '(xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)',
'--commit', '49aae9befaf27a5fede51b498f0660199f47b899',
'--patchset', '1']
print optlist
[('--action', 'new'),
('--change', 'I87f7802d438d5640779daa9ac8196aeb3eec8c2a'),
('--change-url', 'http://<hostname>:8080/308'),
('--project', 'private/bar'),
('--branch', 'master'),
('--uploader', 'xxxxxxx-xxxxx')]
I don't know why the script receives
'--uploader', 'xxxxxxx-xxxxx', 'xxxxxxx', '(xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)'
and not
'--uploader', 'xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)'
As a result, the script doesn't parse --commit, --patchset and so on.
When I parse the comment-added hook, everything works:
Command line input:
--change I87f7802d438d5640779daa9ac8196aeb3eec8c2a
--change-url http://<hostname>.intra:8080/308
--project private/bar
--branch master
--author xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)
--commit 49aae9befaf27a5fede51b498f0660199f47b899
--comment asdf
--CRVW 0
--VRIF 0
print sys.argv[1:]
['--action', 'comment',
'--change', 'I87f7802d438d5640779daa9ac8196aeb3eec8c2a',
'--change-url',
'http://<hostname>:8080/308',
'--project', 'private/bar',
'--branch', 'master',
'--author', 'xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)', <<< That's right!
'--commit', '49aae9befaf27a5fede51b498f0660199f47b899',
'--comment', 'asdf',
'--CRVW', '0',
'--VRIF', '0']
Because option names and values are separated by spaces, you have to put a value in quotes if it contains spaces itself.
If you write --uploader xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx), only xxxxxxx-xxxxx is taken as the value of --uploader. The remaining two strings are not associated with any option, so in the line
optlist, args = getopt.getopt(sys.argv[1:], '', need)
getopt stops at the first of them and returns it, together with everything after it (including --commit and --patchset), in args.
You should quote an argument if it contains spaces, as with any command-line tool:
--uploader "xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)"
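The effect of quoting can be reproduced without Gerrit. In this sketch, `shlex.split` stands in for a POSIX shell's word splitting (assuming the hook's arguments pass through a shell), and the option list is trimmed to three options with made-up values:

```python
import getopt
import shlex

# Trimmed, hypothetical option list
LONG_OPTS = ['uploader=', 'commit=', 'patchset=']

# shlex.split splits the way a POSIX shell would, honouring quotes
unquoted = shlex.split('--uploader jane doe (jane#example) --commit 49aae9b --patchset 1')
quoted = shlex.split('--uploader "jane doe (jane#example)" --commit 49aae9b --patchset 1')

# Unquoted: getopt.getopt stops at the first non-option word ("doe"),
# so --commit and --patchset are left unparsed in args.
opts, args = getopt.getopt(unquoted, '', LONG_OPTS)
print(opts)  # [('--uploader', 'jane')]
print(args)  # ['doe', '(jane#example)', '--commit', '49aae9b', '--patchset', '1']

# Quoted: the whole value is one argv element, so every option is parsed.
opts, args = getopt.getopt(quoted, '', LONG_OPTS)
print(opts)  # [('--uploader', 'jane doe (jane#example)'), ('--commit', '49aae9b'), ('--patchset', '1')]
print(args)  # []
```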
You may also consider using gnu_getopt() as it would allow you to mix option and non-option arguments.
From the documentation:
The getopt() function stops processing options as soon as a non-option argument is encountered
If you use gnu_getopt, the remaining options, namely --commit and --patchset, will still be parsed correctly even though the uploader value is not quoted; the stray words of the uploader value end up in args instead.
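A short sketch of the difference, using the same kind of unquoted, made-up argument list as a shell would pass it:

```python
import getopt

# Hypothetical unquoted argv: the uploader value has been split into three words
argv = ['--uploader', 'jane', 'doe', '(jane#example)',
        '--commit', '49aae9b', '--patchset', '1']

# gnu_getopt keeps scanning past non-option words and collects them in args
# (unless the POSIXLY_CORRECT environment variable is set).
opts, args = getopt.gnu_getopt(argv, '', ['uploader=', 'commit=', 'patchset='])
print(opts)  # [('--uploader', 'jane'), ('--commit', '49aae9b'), ('--patchset', '1')]
print(args)  # ['doe', '(jane#example)']
```

The uploader value is still cut at the first space, so quoting remains the proper fix; gnu_getopt only keeps the later options from being lost.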