How to create groups from a string? [duplicate] - python
This question already has answers here:
How do I split a list into equally-sized chunks?
(66 answers)
Closed 2 years ago.
I have a string that I split, which gives me 320 elements. I need to create 4 groups: each group should have 100 elements, and the last group must contain the last 20 elements. Finally, every group must be a string, not a list. How can I do that?
I can do that if I know how many elements I have:
s_l = 'AA AAL AAOI ACLS ADT ADTX ADVM AEL AIG ALEC ALLY ALT AMCX ANGI APA APDN APLS APPS APRN AQUA ARMK ARNC ARVN ATNM ATOM ATRA ATSG AVCT AVT AX AXDX BCLI BE BEAM BJRI BKE BKU BLDR BLNK BMRA BOOT BXS BYD CAKE CALX CAPR CARG CARR CARV CATM CC CCL CELH CEQP CFG CHEF CHRS CIT CLDX CLR CLSK CNK CNST CODX COLB COOP CPE CRS CTVA CUK CVET CVI CVM CYTK DAL DBX DCP DDS DEI DISCA DISCK DK DNB DRNA DVAX DXC ECOM EIGR ELAN ELF ELY ENVA EQ EQT EXEL FE FHI FIXX FL FLWS FMCI FORM FOX FOXA FRTA FUN GBX GIII GM GNMK GOCO GPRE GRAF GRPN GRWG GTHX GWB HALO HCC HCSG HEAR HFC HGV HIBB HMSY HOG HOME HP HSC HTH HWC IMUX IMVT INO INOV INSG INSM INT IOVA IRDM ITCI JELD JWN KMT KODK KPTI KSS KTB KTOS KURA LAKE LB LCA LL LPI LPRO LSCC LYFT MAXR MBOT MCRB MCS MD MDP MGM MGNX MIC MLHR MOS MRSN MTOR MXL MYGN NCLH NCR NK NKTR NLS NMIH NOVA NTLA NTNX NUAN NVST NXTC ODP OFC OKE OMER OMF OMI ONEM OSPN OSUR OXY OZK PACW PD PDCE PDCO PEAK PGNY PLAY PLCE PLT PLUG PPBI PRPL PRTS PRVB PS PSNL PSTX PSXP PTGX PVAC RCUS REAL REZI RKT RMBL RPAY RRGB RRR RVLV RVP RXN SANM SAVE SBGI SC SCPL SEAS SEM SFIX SFM SGMS SGRY SHLL SHOO SHYF SIX SKX SLQT SMCI SNAP SNDX SNV SONO SPAQ SPCE SPR SPWH SPWR SRG SRNE SSNT SSSS STOR SUM SUN SUPN SVMK SWBI SYF SYRS TBIO TCDA TCF TCRR TDC TEX TFFP TGTX THC TMHC TRGP TRIP TSE TUP TVTY UBX UCBI UCTT UFS UNFI UONE UPWK URBN USFD VCRA VERI VIAC VIRT VIVO VREX VSLR VSTO VXRT WAFD WBS WFC WHD WIFI WKHS WORK WORX WRK WRTC WW WWW WYND XEC XENT XPER XRX YELP ZGNX ZUMZ ZYXI'
split_s_l = s_l.split(" ")
part_1 = ' '.join(split_s_l[:100])
part_2 = ' '.join(split_s_l[100:200])
part_3 = ' '.join(split_s_l[200:300])
part_4 = ' '.join(split_s_l[300:])
for part in (part_1, part_2, part_3, part_4):
    print(part)
But I don't know how to do that when the list can have any number of elements.
For a variable number of items, you can loop using:
sep = ' '
num = 100
split_s_l = s_l.split(sep)
for i in range(0, len(split_s_l), num):
    part = sep.join(split_s_l[i : i+num])
    print(part)
Note that for the last slice in the example ([300:400]), it does not matter that there are only 320 elements: slicing past the end of a list is not an error, so only the last 20 items are included.
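To check the claim about the last slice, here is a small sketch that runs the same loop on 320 placeholder words (hypothetical data standing in for the tickers in s_l) and keeps the group strings in a list called parts (an illustrative name) instead of printing them:

```python
# 320 placeholder words standing in for the split ticker string (hypothetical data)
split_s_l = [f"T{i}" for i in range(320)]
sep = ' '
num = 100

# Same loop as above, but keep each joined group instead of printing it
parts = [sep.join(split_s_l[i:i + num]) for i in range(0, len(split_s_l), num)]

# Number of words in each group string
print([len(part.split(sep)) for part in parts])  # [100, 100, 100, 20]

# A slice that runs past the end is simply truncated, not an error
print(len(split_s_l[300:400]))  # 20
```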
Something like this?
def break_up(s, nwords, separator):
    words = s.split()
    return [separator.join(words[n:n+nwords]) for n in range(0, len(words), nwords)]
print(break_up('a b c d e f g h', 3, ' '))
Result:
['a b c', 'd e f', 'g h']
For the question's data, you would call it as print(break_up(s_l, 100, ' ')).
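If you would rather not index into the list at all, a generator built on itertools.islice produces the same groups from any iterable of words. This is a sketch (chunk_words is a hypothetical name); on Python 3.12 and later, itertools.batched(words, 100) yields the same groups as tuples, which you would still need to join.

```python
from itertools import islice


def chunk_words(text, size, sep=" "):
    """Yield successive groups of `size` words from `text`, each joined into one string."""
    it = iter(text.split())
    while True:
        batch = list(islice(it, size))  # take up to `size` words from the iterator
        if not batch:                   # iterator exhausted: stop
            return
        yield sep.join(batch)


print(list(chunk_words("a b c d e f g h", 3)))  # ['a b c', 'd e f', 'g h']
```

Because it is a generator, you can also loop over it directly (for part in chunk_words(s_l, 100): ...) without building the whole list first.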
Below is a version that first computes how many full groups of 100 there are:
s_l = 'AA AAL AAOI ACLS ADT ADTX ADVM AEL AIG ALEC ALLY ALT AMCX ANGI APA APDN APLS APPS APRN AQUA ARMK ARNC ARVN ATNM ATOM ATRA ATSG AVCT AVT AX AXDX BCLI BE BEAM BJRI BKE BKU BLDR BLNK BMRA BOOT BXS BYD CAKE CALX CAPR CARG CARR CARV CATM CC CCL CELH CEQP CFG CHEF CHRS CIT CLDX CLR CLSK CNK CNST CODX COLB COOP CPE CRS CTVA CUK CVET CVI CVM CYTK DAL DBX DCP DDS DEI DISCA DISCK DK DNB DRNA DVAX DXC ECOM EIGR ELAN ELF ELY ENVA EQ EQT EXEL FE FHI FIXX FL FLWS FMCI FORM FOX FOXA FRTA FUN GBX GIII GM GNMK GOCO GPRE GRAF GRPN GRWG GTHX GWB HALO HCC HCSG HEAR HFC HGV HIBB HMSY HOG HOME HP HSC HTH HWC IMUX IMVT INO INOV INSG INSM INT IOVA IRDM ITCI JELD JWN KMT KODK KPTI KSS KTB KTOS KURA LAKE LB LCA LL LPI LPRO LSCC LYFT MAXR MBOT MCRB MCS MD MDP MGM MGNX MIC MLHR MOS MRSN MTOR MXL MYGN NCLH NCR NK NKTR NLS NMIH NOVA NTLA NTNX NUAN NVST NXTC ODP OFC OKE OMER OMF OMI ONEM OSPN OSUR OXY OZK PACW PD PDCE PDCO PEAK PGNY PLAY PLCE PLT PLUG PPBI PRPL PRTS PRVB PS PSNL PSTX PSXP PTGX PVAC RCUS REAL REZI RKT RMBL RPAY RRGB RRR RVLV RVP RXN SANM SAVE SBGI SC SCPL SEAS SEM SFIX SFM SGMS SGRY SHLL SHOO SHYF SIX SKX SLQT SMCI SNAP SNDX SNV SONO SPAQ SPCE SPR SPWH SPWR SRG SRNE SSNT SSSS STOR SUM SUN SUPN SVMK SWBI SYF SYRS TBIO TCDA TCF TCRR TDC TEX TFFP TGTX THC TMHC TRGP TRIP TSE TUP TVTY UBX UCBI UCTT UFS UNFI UONE UPWK URBN USFD VCRA VERI VIAC VIRT VIVO VREX VSLR VSTO VXRT WAFD WBS WFC WHD WIFI WKHS WORK WORX WRK WRTC WW WWW WYND XEC XENT XPER XRX YELP ZGNX ZUMZ ZYXI'
words = s_l.split(" ")
num_of_100_groups = len(words) // 100  # number of full groups of 100
groups = []
for i in range(num_of_100_groups):
    groups.append(words[i * 100 : (i + 1) * 100])
# add the remaining words (if any) as a last, shorter group
if len(words) % 100:
    groups.append(words[num_of_100_groups * 100:])
for sub_group in groups:
    print(' '.join(sub_group))
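The same idea can be wrapped in a reusable function. This sketch (the name group_words is made up for illustration) uses divmod to get the number of full groups and the leftover count in one step, and only appends a last group when there are leftover words; with exactly 300 words, appending words[300:] unconditionally would add an empty string as a fourth group.

```python
def group_words(text, size=100, sep=" "):
    """Split text into groups of `size` words, each group joined back into a string."""
    words = text.split(sep)
    full, rest = divmod(len(words), size)  # number of full groups, number of leftover words
    groups = [sep.join(words[i * size:(i + 1) * size]) for i in range(full)]
    if rest:  # only add a shorter last group when there are leftover words
        groups.append(sep.join(words[full * size:]))
    return groups


print(len(group_words(" ".join(["X"] * 320))))  # 4
print(len(group_words(" ".join(["X"] * 300))))  # 3
```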
Related
How to retrieve sector and industry for a list of tickers with Python?
I have a list of tickers (tick1 below) that comes from the earnings report. I would like to add the "shortname", "sector" and "industry" next to each ticker while creating a DataFrame. Unfortunately, the columns always get shuffled a bit and are not matched properly. For instance, I get VFC --> sector: Technology; industry: Semiconductors, which is wrong. It should be sector: Consumer Cyclical; industry: Apparel Manufacturing. Here is my code; can you please help me adjust it?
---tickers to be read---
import yfinance as yf
with open("/Users/Doc/AB/Earnings/tickers.txt") as fh:
    tick1 = fh.read().split()
Tickers in the txt file:
ABOS ACRX ADI ADMP ADOCY AER AGYS AINV ALBO ALLT AMAT AMPS AOZOY ARCO AREC ARZGY ATAI AUTO AVAL AXDX BAH BBAR BBWI BHIL BJ BKYI BLBX BPCGY BPTH BRDS BZFD CAAP CAE CALT CCHWF CCSI CELC CFRHF CGEN CINT CLSN CMRX CRLBF CRXT CSCO CSWI CVSI CWBHF CWBR DAC DADA DE DECK DESP DLO DOYU DTST DUOT EAST EBR EBR.B EDAP ENJY EVTV EXP FATH FL FLO FSI FTK FUV FXLV GAN GBOX GDS GLBE GLOB GNLN GOED GOGL GRAB GRAMF GRCL HD HOOK HPK HUYA HWKN HYRE IBEX IGIC IKT IMPL INLB INLX INVO IONM IONQ IPW IPWR ISUN ITCTY JBI JD JHX JMIA KALA KBNT KEYS KMDA KORE KSLLF KSS KULR LOW LTRY LUNA LVLU MARK MBT MCG MCLD MDWD MDWT MIGI MIRO MNDY MNMD MNRO MSADY MSGM MUFG MVST NEXCF NGS NNOX NOVN NRDY NRGV NU NXGN OBSV OEG OMQS ONON PANW PASG PCYG PEAR PLNHF PLX PTE PTN PXS QIPT QRHC QTEK QUIK RCRT RDY REE REED REKR RKLB RMED RMTI ROST RSKD RYAAY SANW SCVL SDIG SE SHLS SHPW SHWZ SLGG SNPS SPRO SQM SRAD SSYS SUNL SUNW SUPV SYN SYRS TCEHY TCRT TCS TGI TGT THBRF TJX TKOMY TLLTF TME TRMR TSEM TSHA TTWO TXMD USWS VBLT VERB VEV VFC VIPS VJET VOXX VTRU VVOS VWE VYGVF VYNT WEBR WEDXF WEJO WIX WMS WMT WRBY WYY YALA YOU ZIM
---adding the shortname, sector, industry---
from yahooquery import Ticker
import pandas as pd
symbols = tick1
tickers = Ticker(symbols, asynchronous=True)
datasi = tickers.get_modules("summaryProfile quoteType")
dfsi = pd.DataFrame.from_dict(datasi).T
dataframes = [pd.json_normalize([x for x in dfsi[module] if isinstance(x, dict)]) for module in ['summaryProfile', 'quoteType']]
dfsi = pd.concat(dataframes, axis=1)
dfsi
Try the following: set the 'symbol' column as the index, then select the rows in the order of your ticker list with .loc[symbols], so each row is labeled with its own ticker.
import pandas as pd
from yahooquery import Ticker
symbols = ['TSHA', 'GRAMF', 'VFC', 'ABOS', 'INLX', 'INVO', 'IONM', 'IONQ']
tickers = Ticker(symbols, asynchronous=True)
datasi = tickers.get_modules("summaryProfile quoteType")
dfsi = pd.DataFrame.from_dict(datasi).T
dataframes = [pd.json_normalize([x for x in dfsi[module] if isinstance(x, dict)]) for module in ['summaryProfile', 'quoteType']]
dfsi = pd.concat(dataframes, axis=1)
dfsi = dfsi.set_index('symbol')
dfsi = dfsi.loc[symbols]
print(dfsi[['industry', 'sector']])
Output:
symbol  industry                                sector
TSHA    Biotechnology                           Healthcare
GRAMF   Drug Manufacturers—Specialty & Generic  Healthcare
VFC     Apparel Manufacturing                   Consumer Cyclical
ABOS    Biotechnology                           Healthcare
INLX    Software—Application                    Technology
INVO    Medical Devices                         Healthcare
IONM    Medical Care Facilities                 Healthcare
IONQ    Computer Hardware                       Technology
Again, you need to check the result on your side. I have run the ticker 'VFC' several times and always got industry: Apparel Manufacturing, sector: Consumer Cyclical.
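One likely cause of the mismatch in the question's code: the isinstance(x, dict) filter runs separately for each module, so if a ticker has no summaryProfile, that module's list is one row shorter, and pd.concat(..., axis=1) then pairs rows by position (0, 1, 2, ...) instead of by ticker. A sketch that keeps every value keyed by its ticker, assuming get_modules() returns a dict of {symbol: {module_name: {...}}} with a plain string for symbols that failed. The sample dict below is hand-written, not real API output, and the field name shortName is assumed from Yahoo's quoteType module.

```python
import pandas as pd

# Hand-written sample shaped like the assumed get_modules() result (hypothetical data)
datasi = {
    "VFC": {
        "summaryProfile": {"sector": "Consumer Cyclical", "industry": "Apparel Manufacturing"},
        "quoteType": {"symbol": "VFC", "shortName": "sample name 1"},
    },
    "ZZZZ": "No data found for this symbol",  # a failed lookup returned as a string
    "IONQ": {
        "quoteType": {"symbol": "IONQ", "shortName": "sample name 2"},  # no summaryProfile
    },
    "TSHA": {
        "summaryProfile": {"sector": "Healthcare", "industry": "Biotechnology"},
        "quoteType": {"symbol": "TSHA", "shortName": "sample name 3"},
    },
}

rows = {}
for symbol, modules in datasi.items():
    if not isinstance(modules, dict):
        continue  # skip symbols that came back with an error message
    profile = modules.get("summaryProfile") or {}
    quote = modules.get("quoteType") or {}
    rows[symbol] = {
        "shortName": quote.get("shortName"),
        "sector": profile.get("sector"),
        "industry": profile.get("industry"),
    }

# Build one row per ticker with the ticker as the index, so values cannot shift between rows
df = pd.DataFrame(list(rows.values()), index=list(rows))
print(df)
```

With real data you would replace the sample dict with datasi = tickers.get_modules("summaryProfile quoteType") from the question; it is worth printing a couple of entries first to confirm the shape matches this assumption.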
Problems in trying to decode through statistics
My goal is to decode a message using the frequency of letters in the French language. To do that, I wrote a first function that builds a string of the text's letters in decreasing order of occurrence. Then I match characters by index and replace each character with the letter at the same rank in French. The problem is that the text comes out unchanged in the console. I am a beginner in programming, so any help with my mistakes or inaccuracies is appreciated. Thanks!
import string
texte = f"""iwnspa rynjjdj arg sj hjuajhask awaigkhihaj ag sj ongyaonghihaj ps vvha rhaiwa, rdsqajg idjrhpaka idooa wa xaka pa wn gyadkha pa w’hjbdkonghdj. hw arg ja wa 30 nqkhw 1916 n xagdrlaf, pnjr wa ohiyhunj. rdj xaka arg sj csua, ag rn oaka arg wa xkdqhrask ps wfiaa pa unfwdkp, sja nsgka qhwwa ps ohiyhunj. hw agspha n w’sjhqakrhga ps ohiyhunj ds hw rshg sj pdszwa iskrsr aj awaigkhihga ag aj ongyaonghmsar. hw dzghajg sja whiajia pnjr iar pasv phrihxwhjar aj 1936, nqnjg pa xdskrshqka rar agspar ns kaxsga ohg (onrrniysraggr hjrghgsga db gaiyjdwduf ). hw rdsghajg aj 1940 sja gyara pa pdigdkng aj ongyaonghmsar (awwa xdkgnhg rsk par nxxwhinghdjr par ongyaonghmsar n wn uajaghmsa) ag sj oaodhka pa onrgak aj awaigkhihga. rh wn xkaohaka bsg wnkuaoajg hujdkaa, wn raidjpa,msh avxwhmsa idooajg sghwhrak war nwuazkar pa zddwa xdsk w’njnwfra par rhujnsv awaigkhmsar, arg kargaa iawazka. aj 1941, hw arg aoznsiya pnjr war wnzdkngdhkar pa wn idoxnujha pa gawaxydja ngg zaww. hooaphngaoajg, hw gknqnhwwa rsk par xkdcagr aj whnhrdj nqai war rakqhiar raikagr, msh wsh bdjg jdgnooajg nzdkpak par msarghdjr pa ikfxgduknxyha. aj 1949, hw ra onkha; xnk wn rshga, hw nskn gkdhr ajbnjgr. rynjjdj gknqnhwwa nsv wnzdkngdhkar pa zaww csrms’aj 1971. xnknwwawaoajg n iawn, hw arg nsrrh xkdbarrask ns ohg pa 1958 n 1978.
rynjjdj idoxkajpmsa gdsga pdjjaa, oaoa wn qdhv ds par honuar, xasg ra gknjroaggka n w’nhpa p’sja rshga pa 0 ag pa 1 (war zhgr), dsqknjg wn qdha nsv idoosjhinghdjr jsoakhmsar ag jdj xwsr njnwduhmsar. hw odjgka nsrrh idooajg wa bnhg p’ncdsgak iakgnhjr zhgr n sj oarrnua xasg xakoaggka pa qakhbhak msa war nsgkar djg aga idkkaigaoajg gknjrohr (dj xnkwa pa idpa idkkaigask p’akkaskr). hw n kais pa jdozkasv ydjjaskr, pdjg wn oapnhwwa jnghdjnwa par rihajiar par onhjr ps xkarhpajg cdyjrdj aj 1966, ag wa xkhv lfdgd aj 1985. n wn bhj pa rn qha, hw rdsbbka pa wn onwnpha p’nwtyahoak, ia msh wa idjpshg pnjr sja onhrdj pa kaxdr ps onrrniysraggr. hw f paiapa wa 24 baqkhak 2001, n w’nua pa 84 njr."""
def count(texte):
    ordre_txt = {}
    order = ""
    texte.lower()
    alfb = list(string.ascii_lowercase)
    for i in range(len(alfb)):
        ordre_txt[alfb[i]] = texte.count(alfb[i])
    ordre_txt = sorted(ordre_txt.items(), key = lambda x: x[1], reverse = True)
    for elt in ordre_txt:
        order += elt[0]
    return order
def trad2(texte):
    ordre_fr = 'esaitnruolhdcgmpvfqbjxzykw'
    ordre_txt = count(texte)
    for i in range(26):
        texte.replace(ordre_txt[i], ordre_fr[i])
    return texte
print(trad2(texte))
str.lower(): "Return a copy of the string with all the cased characters converted to lowercase." So use texte = texte.lower() instead of texte.lower() on the 3rd line of def count(texte):.
The same applies to str.replace(): it also returns a new string, so texte.replace(...) in your loop has no effect. Even texte = texte.replace(...) would be wrong here, because later replacements would overwrite letters that were already substituted. str.translate() maps every character in a single pass, so apply str.translate(table) in def trad2(texte), e.g. as follows:
def trad2(texte):
    ordre_fr = 'esaitnruolhdcgmpvfqbjxzykw'
    ordre_txt = count(texte)
    tr_table = str.maketrans(ordre_txt, ordre_fr)
    return texte.translate(tr_table)
Note the str.maketrans() static method, which "returns a translation table usable for str.translate(). … If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y."
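For comparison, here is a compact sketch of the whole decoder. It swaps in collections.Counter for the question's count() function (my substitution, not the asker's code); letters that never occur get a count of 0, and because sorted() is stable, ties keep alphabetical order. The last two lines show why a chain of replace() calls breaks a letter swap while translate() does not. Keep in mind that frequency analysis on a short text rarely yields a perfect decryption; it only gives a starting point.

```python
import string
from collections import Counter


def letter_order(text):
    """Return all 26 lowercase letters sorted by decreasing frequency in text."""
    counts = Counter(c for c in text.lower() if c in string.ascii_lowercase)
    # Missing letters count as 0; sorted() is stable, so ties stay in alphabetical order
    return "".join(sorted(string.ascii_lowercase, key=lambda c: -counts[c]))


def decode(text, target_order="esaitnruolhdcgmpvfqbjxzykw"):
    """Map the i-th most frequent letter of text to the i-th letter of target_order."""
    table = str.maketrans(letter_order(text), target_order)
    return text.lower().translate(table)


# Chained replace() re-touches letters that were already substituted
print("ab".replace("a", "b").replace("b", "a"))   # aa
# translate() applies the whole mapping in one pass
print("ab".translate(str.maketrans("ab", "ba")))  # ba
```

With the question's data you would call print(decode(texte)).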
Normalizing words for sentiment analysis
I'm currently doing sentiment analysis and have a problem. I have a large normalization dictionary for words, and I want to normalize the text before tokenizing it, as in this example:
data                          normal
kamu knp sayang               kamu kenapa sayang
drpd sedih mending belajar    dari pada sedih mending belajar
dmna sekarang                 di mana sekarang
where
knp: kenapa
drpd: dari pada
dmna: di mana
This is my code:
import pandas as pd
slang = pd.DataFrame({'before': ['knp', 'dmna', 'drpd'], 'after': ['kenapa', 'di mana', 'dari pada']})
df = pd.DataFrame({'data': ['kamu knp sayang', 'drpd sedih mending bermain']})
normalisasi = {}
for index, row in slang.iterrows():
    if row[0] not in normalisasi:
        normalisasi[row[0]] = row[1]
def normalized_term(document):
    return [normalisasi[term] if term in normalisasi else term for term in document]
df['normal'] = df['data'].apply(normalized_term)
df
But the result does not match the example table, and I want it to look like the example.
There is a utility named str.replace in pandas that lets you replace a substring with another one, or even find and replace regex patterns (see the pandas Series.str.replace documentation).
UPDATE
There were two things wrong with the answer:
1. You must replace whole words only, not subwords.
2. After each entry in the slang file, you must keep the changes instead of discarding them.
So it would be like this:
import pandas as pd
df = pd.read_excel('data bersih.xlsx')
slang = pd.read_excel('slang.xlsx')
# start from the original text, then apply each slang rule on top of the previous result
df['normal'] = df.text
for idx, row in slang.iterrows():
    # \b ... \b restricts the match to whole words
    df['normal'] = df.normal.str.replace(r"\b"+row['before']+r"\b", row['after'], regex=True)
Output:
text \
0 hari ini udh mulai ppkm yaa
1 mohon info apakah pgs pasar turi selama ppkm b...
2 di rumah aja soalnya lagi ppkm entah bakal nga...
3 pangkal penanganan pandemi di indonesia yang t...
4 ppkm mikro anjingggggggg
... ...
9808 drpd nonton sinetron mending bagi duit kayak g...
9809 ppkm pelan pelan kalau masukin
9810 masih ada kepala desa camat bahkan kepala daer...
9811 aku suka ppkm tapi tanpa pp di depannya
9812 menteri ini perlu tidak dibayarkan gajinya set...

normal
0 hari ini sudah mulai ppkm yaa
1 mohon informasi apakah pgs pasar turi selama p...
2 di rumah saja soalnya lagi ppkm entah bakal se...
3 pangkal penanganan pandemi di indonesia yang t...
4 ppkm mikro anjingggggggg
... ...
9808 dari pada nonton sinema elektronik lebih baik ...
9809 ppkm pelan pelan kalau masukkan
9810 masih ada kepala desa camat bahkan kepala daer...
9811 aku suka ppkm tapi tanpa pulang pergi di depannya
9812 menteri ini perlu tidak dibayarkan gajinya set...

[9813 rows x 2 columns]
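A possible variation, assuming the slang table fits in a plain dict: combine all terms into one whole-word regex and substitute in a single pass with a replacement function. Escaping each term with re.escape guards against slang that contains regex metacharacters, and because each matched word is looked up exactly once, a replacement can never be rewritten by a later rule. The three entries below are the ones from the question; normalize is an illustrative name.

```python
import re

slang = {"knp": "kenapa", "dmna": "di mana", "drpd": "dari pada"}

# One alternation of all slang terms (longest first, so multi-word entries win
# over their shorter prefixes), matched only as whole words
pattern = re.compile(
    r"\b(" + "|".join(re.escape(k) for k in sorted(slang, key=len, reverse=True)) + r")\b"
)


def normalize(text):
    # Each match is replaced by its dict value in a single left-to-right pass
    return pattern.sub(lambda m: slang[m.group(1)], text)


print(normalize("kamu knp sayang"))             # kamu kenapa sayang
print(normalize("drpd sedih mending belajar"))  # dari pada sedih mending belajar
```

With pandas, this could be applied as df['normal'] = df['data'].apply(normalize), after building the dict from the Excel file, e.g. dict(zip(slang_df['before'], slang_df['after'])).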
Python error: string indices must be integers
I need to work with a small JSON data object in Python, but when I use this code it doesn't work. What am I doing wrong? This is for the newest version of Python.
import urllib, json
import requests
import json
with open('locaties.json') as json_file:
    data = json.load(json_file)
    for parkeerlocaties in data['parkeerlocaties']:
        for locatie in parkeerlocaties['parkeerlocatie']:
            for title in locatie['title']:
                print("Hello World")
The JSON file (locaties.json):
{"parkeerlocaties":[{"parkeerlocatie":{"title":"Fietsenstalling Tolhuisplein","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9032801,52.3824545]}","type":"Fietspunt","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/fiets\/fietsparkeren\/gemeentelijke\/","urltitle":"www.amsterdam.nl\/fiets","adres":"Buiksloterweg 3","postcode":"1031 CC","woonplaats":"Amsterdam","opmerkingen":"Alleen toegankelijk voor abonnementhouders van Tolhuisplein, automatische stalling"}},{"parkeerlocatie":{"title":"Fietsenstalling Paradiso","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8833735,52.3621851]}","type":"Fietspunt","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/fiets\/fietsparkeren\/gemeentelijke\/","urltitle":"www.amsterdam.nl\/fiets","adres":"Weteringschans 4 A","postcode":"1017 SG","woonplaats":"Amsterdam","opmerkingen":"Maximale parkeerduur 28 dagen, stalling met toezicht"}},{"parkeerlocatie":{"title":"Fietsenstalling Zuidplein","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8719467,52.3398642]}","type":"Fietspunt","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/fiets\/fietsparkeren\/gemeentelijke\/","urltitle":"www.amsterdam.nl\/fiets","adres":"Zuidplein 5","postcode":"1077 XV","woonplaats":"Amsterdam","opmerkingen":"Maximale parkeerduur 28 dagen, stalling met toezicht"}},{"parkeerlocatie":{"title":"Fietsenstalling Station Rai (gesloten tot februari 
2019)","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8905079,52.339392]}","type":"Fietspunt","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/fiets\/fietsparkeren\/gemeentelijke\/","urltitle":"www.amsterdam.nl\/fiets","adres":"Europaboulevard 4","postcode":"1083 AD","woonplaats":"Amsterdam","opmerkingen":"Sluit voor renovatie op 21 juli 2018. Er zijn rond het station extra parkeerplekken voor fiets gemaakt."}},{"parkeerlocatie":{"title":"P+R Zeeburg","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9607015,52.3719632]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#h4f9f93f8-875b-4d18-936a-c1eba9d6f198","urltitle":"www.amsterdam.nl\/penr ","adres":"Zuiderzeeweg 46 a","postcode":"1095KJ","woonplaats":"Amsterdam","opmerkingen":"","OV_bus":"bus 37 Noord - Amstelstation vv","OV_tram":"tram 26 Ijburg - Centraal Station vv","OV":"tram;GVB_26_1;08240, bus;GVB_37_2;08134"}},{"parkeerlocatie":{"title":"Weekend P+R VUmc","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8611063,52.3361167]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#hdce18cfd-fc8f-4728-be57-2d9a23b494d9","urltitle":"www.amsterdam.nl\/penr","adres":"Gustav Mahlerlaan 3004","postcode":"1081 LA","woonplaats":"Amsterdam","opmerkingen":"","OV_metro":"metro 51 Isolatorweg - Centraal Station vv (maart 2019 t\/m eind 2020), metro 50 met overstap Overamstel op 51 Centraal Station","OV_tram":"tram 24 VU medisch centrum - Centraal Station vv, tram 5 Amstelveen - Van Hallstraat vv","OV":"metro;GVB_50_1;07343;09563, tram;GVB_24_1;07350, tram;GVB_5_1;07410"}},{"parkeerlocatie":{"title":"P+R Bos en Lommer","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8453671,52.379131]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#h9434503d-d323-4331-b792-5210ce062c42","urltitle":"www.amsterdam.nl\/penr ","adres":"Leeuwendalersweg 23 
b","postcode":"1055JE","woonplaats":"Amsterdam","opmerkingen":"","OV_bus":"bus 21 Geuzenveld - Centraal Station vv","OV_tram":"tram 7 Slotermeer - Azartplein vv","OV":"bus;GVB_21_1;03060, tram;GVB_7_1;03167"}},{"parkeerlocatie":{"title":"P+R Sloterdijk","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8384209,52.3900128]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#h628fb483-dec3-4a9d-9d52-50136e9639ec","urltitle":"www.amsterdam.nl\/penr","adres":"Piarcoplein 1","postcode":"1043DW","woonplaats":"Amsterdam","opmerkingen":"","OV_bus":"bus 22 Station Sloterdijk - Muiderpoortstation vv","OV_metro":"metro 50 Isolatorweg - Gein vv, overstap 51 op Station Zuid \/ Station RAI \/ Overamstel","OV_tram":"tram 19 Station Sloterdijk - Diemen vv","OV_trein":"Treinen tussen station Sloterdijk en de stations CS, Muiderpoort en Amstel (GVB P+R-kaart niet geldig)","OV":"tram;GVB_19_1;02361;00014, metro;GVB_50_1;02295;09563, metro;GVB_51_1;*09563, bus;GVB_22_1;02367;00001"}},{"parkeerlocatie":{"title":"P+R Olympisch Stadion","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8539215,52.3440266]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#h4567b083-9fea-4848-882a-280b6abc7853","urltitle":"www.amsterdam.nl\/penr ","adres":"Olympisch Stadion 44","postcode":"1076DE","woonplaats":"Amsterdam","opmerkingen":"","OV_tram":"tram 24 VU medisch centrum - Centraal Station vv","OV":"tram;GVB_24_1;07121"}},{"parkeerlocatie":{"title":"P+R Johan Cruijff ArenA","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9405734,52.3137551]}","type":"P+R","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeren-reizen\/#h1dfa5189-98e8-42ce-8119-ce74f2451969","urltitle":"www.amsterdam.nl\/penr","adres":"Burgemeester Stramanweg 130","postcode":"1101EP","woonplaats":"Amsterdam","opmerkingen":"","OV_metro":"metro 54 Gein - Centraal Station vv","OV_trein":"Treinen tussen station Bijlmer Arena en stations Amstel, Muiderpoort en 
Centraal Station (GVB P+R-kaart niet geldig)","OV":"metro;GVB_54_1;09522"}},{"parkeerlocatie":{"title":"Amsterdamse Poort (P21 t\/m 24)","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9626214,52.3192019]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/amsterdamse-poort-p21\/","urltitle":"Amsterdamse Poort P21","adres":"Bijlmerdreef 700","postcode":"1103DS","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P18 HES\/ ROC","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9466199,52.3152543]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/garage-p18-hes-roc\/","urltitle":"Bekijk P18 HES\/ ROC op www.amsterdam.nl\/parkeergarages","adres":"Fraijlemaborg 131","postcode":"1102CV","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P1 ArenA","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9405851,52.3137433]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/parkeergarage-p1\/","urltitle":"Bekijk P1 ArenA op www.amsterdam.nl\/parkeergarages","adres":"Burgemeester Stramanweg 130","postcode":"1101EP","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P10 Plaza ArenA","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9409531,52.3080762]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/p10-plaza-arena\/","urltitle":"Bekijk P10 Plaza ArenA op www.amsterdam.nl\/parkeergarages","adres":"Herikerbergweg 288","postcode":"1101CT","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P 3 Mikado","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9413266,52.3103066]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/garage-p3-mikado\/","urltitle":"Bekijk P3 Mikado op www.amsterdam.nl\/parkeergarages","adres":"De entree 
228","postcode":"1101EE","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"RAI Parking","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8921615,52.3383996]}","type":"CommercieleParkeergarage","url":"https:\/\/www.rai.nl\/nl\/contact-bereikbaarheid-en-parkeren\/parkeren-bij-rai-amsterdam\/","urltitle":"Rai Parking","adres":"Europaboulevard 24","postcode":"1078GZ","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Eurocenter","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8888094,52.3358123]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/eurocenter\/","urltitle":"Qpark Eurocenter ","adres":"Barbara Strozzilaan 342","postcode":"1083HN","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Mahler","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8723915,52.3377672]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/mahler\/","urltitle":"Qpark Mahler","adres":"Claude Debussylaan 42","postcode":"1082MD","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Olympisch Stadion","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8539215,52.3440266]}","type":"CommercieleParkeergarage","url":"http:\/\/www.p1.nl\/parkeren\/parkeergarage-olympisch-stadion\/","urltitle":"P1 Parkeergarage Olympisch Stadion ","adres":"Olympisch Stadion 44","postcode":"1076DE","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Interparking Oranjekwartier Amsterdam","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.839149,52.3546448]}","type":"CommercieleParkeergarage","url":"http:\/\/www.interparking.nl\/nl-NL\/find-parking\/Oranjekwartier\/","urltitle":"Interparking Oranjekwartier Amsterdam","adres":"Carnapstraat 200","postcode":"1062KZ","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Bomengarage P2 (Boven 't IJ) 
","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9388323,52.3994577]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/garage-p2bomengarage\/","urltitle":"Bekijk Bomengarage P2 op www.amsterdam.nl\/parkeergarages","adres":"Buikslotermeerplein 237","postcode":"1025 XB","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Westergasfabriek","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8662388,52.3847072]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/westergasfabriek\/","urltitle":"Qpark Amsterdam Westergasfabriek","adres":"Van Bleiswijkstraat 8","postcode":"1051DG","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Europarking","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8766781,52.3699218]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/europarking\/","urltitle":"Qpark Europarking","adres":"Marnixstraat 250","postcode":"1016TL","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Byzantium","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8793897,52.3618422]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/byzantium\/","urltitle":"Qpark Amsterdam Byzantium","adres":"Tesselschadestraat 1","postcode":"1054ET","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Piet Heingarage","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9173751,52.3773883]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/parkeergarage-piet\/","urltitle":"Bekijk Piet Heingarage op www.amsterdam.nl\/parkeergarages","adres":"Piet Heinkade 59","postcode":"1019GM","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Parking Centrum 
Oosterdok","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9092051,52.3761913]}","type":"CommercieleParkeergarage","url":"http:\/\/www.parkingcentrumoosterdok.nl\/","urltitle":"Parking Centrum Oosterdok","adres":"Oosterdoksstraat 150","postcode":"1011AD","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Markenhoven","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.908618,52.3696328]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/garage-markenhoven\/","urltitle":"Bekijk Markenhoven op www.amsterdam.nl\/parkeergarages","adres":"Anne Frankstraat 220","postcode":"1011 MP","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"(P1) Parking Waterlooplein","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9043352,52.3689665]}","type":"CommercieleParkeergarage","url":"http:\/\/www.parkereninwaterlooplein.nl\/","urltitle":"Parkeergarage Waterlooplein ","adres":"Valkenburgerstraat 238","postcode":"1011ND","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Stadhuis - Muziektheater","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9018035,52.3670615]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/garage-stadhuis\/","urltitle":" Bekijk Stadhuis-Muziektheater op www.amsterdam.nl\/parkeergarages","adres":"Waterlooplein 28","postcode":"1011PG","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Parkeergarage Prins & Keizer","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.891798,52.3622906]}","type":"CommercieleParkeergarage","url":"http:\/\/www.apcoa.nl\/parkeren-in\/amsterdam\/apcoa-parking-prins-keizer.html","urltitle":"Apcoa Parking Prins & Keizer","adres":"Prinsengracht 927","postcode":"1017HL","woonplaats":"Amsterdam","aantal":"140","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark De 
Bijenkorf","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.895162,52.373881]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/de-bijenkorf\/","urltitle":"Qpark De Bijenkorf","adres":"Beursplein 15","postcode":"1012JW","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Nieuwendijk","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8944693,52.3764423]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/nieuwendijk\/","urltitle":"Qpark Nieuwendijk","adres":"Nieuwezijds Kolk 18","postcode":"1012PV","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"(P1) Parking Amsterdam Centre","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8970068,52.3785141]}","type":"CommercieleParkeergarage","url":"http:\/\/www.p1.nl\/parkeren\/p1-parking-amsterdam-centre\/","urltitle":"P1 Parking Amsterdam Centre ","adres":"Prins Hendrikkade 20 a","postcode":"1012TL","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Parkeergarage Apcoa Heinekenplein","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8924871,52.3571537]}","type":"CommercieleParkeergarage","url":"http:\/\/www.apcoa.nl\/parkeren-in\/amsterdam\/apcoa-parking-heinekenplein.html","urltitle":"Apcoa garage Heinekenplein ","adres":"Eerste Van der Helststraat 6","postcode":"1072NV","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"Qpark Museumplein","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.8798246,52.3571347]}","type":"CommercieleParkeergarage","url":"https:\/\/www.q-park.nl\/nl-nl\/parkeren\/amsterdam\/museumplein\/","urltitle":"Qpark Museumplein ","adres":"Van Baerlestraat 33 B","postcode":"1071AP","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P4 en P5 Villa 
ArenA","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9389632,52.3118578]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/","urltitle":"Bekijk P4 en P5 Villa ArenA op www.amsterdam.nl\/parkeergarages ","adres":"De entree 7","postcode":"1101BH","woonplaats":"Amsterdam","opmerkingen":""}},{"parkeerlocatie":{"title":"P4 en P5 Villa ArenA","Locatie":"{\"type\":\"Point\",\"coordinates\":[4.9389632,52.3118578]}","type":"Parkeergarage","url":"https:\/\/www.amsterdam.nl\/parkeren-verkeer\/parkeergarages\/parkeergarages\/","urltitle":"Bekijk P4 en P5 Villa ArenA op www.amsterdam.nl\/parkeergarages","adres":"De entree 7","postcode":"1101BH","woonplaats":"Amsterdam","opmerkingen":""}}
The current error message is "TypeError: string indices must be integers", but I expect the code to print the titles of all the parkeerlocatie entries.
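parkeerlocaties['parkeerlocatie'] is not a list, it's a dictionary. Iterating over a dictionary yields its keys, so in your inner loop locatie is a string such as "title", and locatie['title'] tries to index a string with a string, which raises "TypeError: string indices must be integers". You should use parkeerlocaties['parkeerlocatie']['title'] instead. The title is also a string, so there's no reason to iterate over it (unless you want to process it character by character for some reason).
import json
with open('locaties.json') as json_file:
    data = json.load(json_file)
for parkeerlocaties in data['parkeerlocaties']:
    print('Title: ', parkeerlocaties['parkeerlocatie']['title'])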
>>> type(data)
<class 'dict'>
>>> type(data['parkeerlocaties'])
<class 'list'>
So the code could be:
import json
with open('locaties.json') as json_file:
    data = json.load(json_file)
parkeerlocaties = data['parkeerlocaties']
for parkeerlocatie in parkeerlocaties:
    print(parkeerlocatie['parkeerlocatie']['title'])
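One more detail in this file: the "Locatie" field is itself a JSON document stored as a string, so getting the coordinates needs a second json.loads(). A self-contained sketch using a trimmed copy of two entries from the question (the inline raw string stands in for locaties.json):

```python
import json

# Trimmed sample with the same shape as locaties.json (two entries from the question).
# A raw string keeps the \" escapes, so the nested "Locatie" JSON survives as text.
raw = r'''{"parkeerlocaties": [
  {"parkeerlocatie": {"title": "Fietsenstalling Tolhuisplein",
                      "Locatie": "{\"type\":\"Point\",\"coordinates\":[4.9032801,52.3824545]}"}},
  {"parkeerlocatie": {"title": "P+R Zeeburg",
                      "Locatie": "{\"type\":\"Point\",\"coordinates\":[4.9607015,52.3719632]}"}}
]}'''

data = json.loads(raw)  # for the real file: data = json.load(json_file)

titles = []
coords = []
# data["parkeerlocaties"] is a list; each item is a dict with a single key, "parkeerlocatie"
for item in data["parkeerlocaties"]:
    locatie = item["parkeerlocatie"]        # the dict holding the actual fields
    titles.append(locatie["title"])
    point = json.loads(locatie["Locatie"])  # decode the nested JSON string
    coords.append(point["coordinates"])     # GeoJSON order: [longitude, latitude]

for title, (lon, lat) in zip(titles, coords):
    print(title, lon, lat)
```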
Python file parsing: can't catch strings on a new line
I'm parsing a large text file with 56,900 book titles, each with its authors and an etext number, and I'm trying to extract the authors. The file looks like this:
TITLE and AUTHOR  ETEXT NO.

Aspects of plant life; with special reference to the British flora,  56900
 by Robert Lloyd Praeger

The Vicar of Morwenstow, by Sabine Baring-Gould  56899
 [Subtitle: Being a Life of Robert Stephen Hawker, M.A.]

Raamatun tutkisteluja IV, mennessä Charles T. Russell  56898
 [Subtitle: Harmagedonin taistelu]
 [Language: Finnish]

Raamatun tutkisteluja III, mennessä Charles T. Russell  56897
 [Subtitle: Tulkoon valtakuntasi]
 [Language: Finnish]

Tom Thatcher's Fortune, by Horatio Alger, Jr.  56896

A Yankee Flier in the Far East, by Al Avery  56895
 and George Rutherford Montgomery
 [Illustrator: Paul Laune]

Nancy Brandon's Mystery, by Lillian Garis  56894

Nervous Ills, by Boris Sidis  56893
 [Subtitle: Their Cause and Cure]

Pensées sans langage, par Francis Picabia  56892
 [Language: French]

Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss  56891
 [Subtitle: A picture of Judaism, in the century which preceded the advent of our Savior]

Fra Tommaso Campanella, Vol. 1, di Luigi Amabile  56890
 [Subtitle: la sua congiura, i suoi processi e la sua pazzia]
 [Language: Italian]

The Blue Star, by Fletcher Pratt  56889

Importanza e risultati degli incrociamenti in avicoltura,  56888
 di Teodoro Pascal
 [Language: Italian]

The Junior Classics, Volume 3: Tales from Greece and Rome, by Various  56887

~ ~ ~ ~ Posting Dates for the below eBooks: 1 Mar 2018 to 31 Mar 2018 ~ ~ ~ ~

TITLE and AUTHOR  ETEXT NO.

The American Missionary, Volume 41, No. 1, January, 1887, by Various  56886

Morganin miljoonat, mennessä Sven Elvestad  56885
 [Author a.k.a. Stein Riverton]
 [Subtitle: Salapoliisiromaani]
 [Language: Finnish]

"Trip to the Sunny South" in March, 1885, by L. S. D  56884

Balaam and His Master, by Joel Chandler Harris  56883
 [Subtitle: and Other Sketches and Stories]

Susien saaliina, mennessä Jack London  56882
 [Language: Finnish]

Forged Egyptian Antiquities, by T. G. Wakeling  56881

The Secret Doctrine, Vol. 3 of 4, by Helena Petrovna Blavatsky  56880
 [Subtitle: Third Edition]

No Posting  56879

The author name usually starts after "by"; when there is no "by" in the line, the author name starts after a comma ",". However, the "," can also be part of the title if the line has a "by". So I parse for "by" first and then for the comma. Here is what I tried:
def search_by_author():
    fhand = open('GUTINDEX.ALL')
    print("Search by Author:")
    for line in fhand:
        if not line.startswith(" [") and not line.startswith("TITLE"):
            if not line.startswith("~"):
                words = line.rstrip()
                words = line.lstrip()
                words = words[:-6]
                if ", by" in words:
                    words = words[words.find(', by'):]
                    words = words[5:]
                    print (words)
                else:
                    words = words[words.find(', '):]
                    words = words[5:]
                    if "," in words:
                        words = words[words.find(', '):]
                        if words.startswith(','):
                            words =words[words.find(','):]
                            print (words)
                        else:
                            print (words)
                    else:
                        print (words)
                if " by" in words:
                    words = words[words.find('by')]
                    print(words)
search_by_author()
However, it can't find the author name for entries where the author is on the next line, such as:
Aspects of plant life; with special reference to the British flora,  56900
 by Robert Lloyd Praeger
As your file shows, the information about one book can be spread across multiple lines, and there is a blank line after each book's info. I used that to gather all the info about a book and then parse it to get the author.
import re
def search_by_author():
    with open('GUTINDEX.ALL') as fhand:
        book_info = ''
        for line in fhand:
            line = line.rstrip()
            if line.startswith('TITLE') or line.startswith('~'):
                continue
            if len(line) == 0:
                # remove info in square brackets from book_info
                book_info = re.sub(r'\[.*$', '', book_info)
                if 'by ' in book_info:
                    tokens = book_info.split('by ')
                else:
                    tokens = book_info.split(',')
                if len(tokens) > 1:
                    authors = tokens[-1].strip()
                    print(authors)
                book_info = ''
            else:
                # remove ETEXT NO. from line
                line = re.sub(r'\d+$', '', line)
                book_info += ' ' + line.rstrip()
search_by_author()
Output:
Robert Lloyd Praeger
Sabine Baring-Gould
mennessä Charles T. Russell
mennessä Charles T. Russell
Horatio Alger, Jr.
Al Avery and George Rutherford Montgomery
Lillian Garis
Boris Sidis
par Francis Picabia
Frederick Strauss
di Luigi Amabile
Fletcher Pratt
di Teodoro Pascal
Various
Various
mennessä Sven Elvestad
L. S. D
Joel Chandler Harris
mennessä Jack London
T. G. Wakeling
Helena Petrovna Blavatsky
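The approach above splits on 'by ', so for the French, Italian and Finnish entries it returns 'par ...', 'di ...' or 'mennessä ...', and an entry at the very end of the file is only handled if a blank line follows it. A possible extension: treat those words as author markers too, take the text after the last marker (falling back to the last comma), and flush the final entry. The marker list (by, par, di, mennessä) is an assumption read off the sample above and will need extending for other languages; iter_entries and author_of are illustrative names, and the sample string is a trimmed copy of the question's listing.

```python
import io
import re

# Words that introduce the author in this sample; an assumption, extend as needed.
# The greedy ".*" makes the match use the LAST marker in the entry.
AUTHOR_RE = re.compile(r"^.*\b(?:by|par|di|mennessä)\s+(.*)$")


def iter_entries(lines):
    """Yield one string per book entry; entries are separated by blank lines."""
    parts = []
    for line in lines:
        line = line.strip()
        if line.startswith(("TITLE", "~", "[")):
            continue  # header rows, posting-date banners, [Subtitle: ...] notes
        if not line:
            if parts:
                yield " ".join(parts)
            parts = []
            continue
        parts.append(re.sub(r"\s*\d+$", "", line))  # drop the trailing ETEXT NO.
    if parts:  # the last entry may not be followed by a blank line
        yield " ".join(parts)


def author_of(entry):
    """Text after the last author marker, else after the last comma, else None."""
    match = AUTHOR_RE.match(entry)
    if match:
        return match.group(1).strip()
    if "," in entry:
        return entry.rsplit(",", 1)[1].strip()
    return None


sample = io.StringIO(
    "TITLE and AUTHOR  ETEXT NO.\n"
    "\n"
    "Aspects of plant life; with special reference to the British flora,  56900\n"
    " by Robert Lloyd Praeger\n"
    "\n"
    "Raamatun tutkisteluja IV, mennessä Charles T. Russell  56898\n"
    " [Subtitle: Harmagedonin taistelu]\n"
    " [Language: Finnish]\n"
    "\n"
    "No Posting  56879\n"
)

for entry in iter_entries(sample):
    print(author_of(entry))
```

To run it on the real file, something like with open('GUTINDEX.ALL', encoding='utf-8') as fh: followed by the same for loop over iter_entries(fh) should work, assuming the file is UTF-8 encoded.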