How to load a CSV with nested arrays - python
I came across a dataset of Twitter users (Kaggle source), but it has a rather strange format: a row of column headers, followed by rows in which some fields are essentially JSON arrays. The dataset is also quite large, which makes converting the entire file into JSON objects impractical.
What is a good way to load this data into Python, preferably into a pandas DataFrame?
Example of Data
id,screenName,tags,avatar,followersCount,friendsCount,lang,lastSeen,tweetId,friends
"1969527638","LlngoMakeEmCum_",[ "#nationaldogday" ],"http://pbs.twimg.com/profile_images/534286217882652672/FNmiQYVO_normal.jpeg",319,112,"en",1472271687519,"769310701580083200",[ "1969574754", "1969295556", "1969284056", "1969612214", "1970067476", "1969797386", "1969430539", "1969840064", "1969698176", "1970005154", "283011644", "1969901029", "1969563175", "1969302314", "1969978662", "1969457936", "1969667533", "1969547821", "1969943478", "1969668032", "283006529", "1969809440", "1969601096", "1969298856", "1969331652", "1969385498", "1969674368", "1969565263", "1970144676", "1969745390", "1969947438", "1969734134", "1969801326", "1969324008", "1969259820", "1969535827", "1970072989", "1969771688", "1969437804", "1969507394", "1969509972", "1969751588", "283012808", "1969302888", "1970224440", "1969603532", "283011244", "1969501046", "1969887518", "1970153138", "1970267527", "1969941955", "1969421654", "1970013110", "1969544905", "1969839590", "1969876500", "1969674625", "1969337952", "1970046536", "1970090934", "1969419133", "1969517215", "1969787869", "1969298065", "1970149771", "1969422638", "1969504268", "1970025554", "1969776001", "1970138611", "1969316186", "1969547558", "1969689272", "283009727", "283015491", "1969526874", "1969662210", "1969536164", "1969320008", "1969893793", "1970158393", "1969365936", "1970194418", "1969942094", "1969631580", "1969704756", "1969920092", "1969712882", "1969791680", "1969408164", "1969754851", "1970205480", "1969840267", "1969443211", "1969706762", "1969692698", "1969751576", "1969486796", "1969286630", "1969686674", "1969833492", "1969294814", "1969472719", "1969685018", "283008559", "283011243", "1969680078", "1969545697", "1969646412", "1969442725", "1969692529" ]
"51878493","_notmichelle",[ "#nationaldogday" ],"http://pbs.twimg.com/profile_images/761977602173046786/4_utEHsD_normal.jpg",275,115,"en",1472270622663,"769309490038439936",[ "60789485", "2420931980", "2899776756", "127410795", "38747286", "1345516880", "236076395", "1242946609", "2567887488", "280777286", "2912446303", "1149916171", "3192577639", "239569380", "229974168", "389097282", "266336410", "1850301204", "2364414805", "812302213", "2318240348", "158634793", "542282350", "569664772", "766573472", "703551325", "168564432", "261054460", "402980453", "562547390", "539630318", "165167145", "22216387", "427568285", "61033129", "213519434", "373092437", "170762012", "273601960", "322108757", "1681816280", "357843027", "737471496", "406541143", "1084122632", "633477616", "537821327", "793079732", "2386380799", "479015607", "783354019", "365171478", "625002575", "2326207404", "1653286842", "1676964216", "2296617326", "1583692190", "1315393903", "377660026", "2235123476", "792779641", "351222527", "444993309", "588396446", "377629159", "469383424", "1726612471", "415230430", "942443390", "360924168", "318593248", "565022085", "319679735", "632508305", "377638254", "1392782078", "584483723", "377703135", "180463340", "564978577", "502517645", "1056960042", "285097108", "410245879", "159121042", "570399371", "502348447", "960927356", "377196638", "478142245", "335043809", "73546116", "11348282", "901302409", "53255593", "515983155", "391774800", "62351523", "724792351", "346296289", "152520627", "559053427", "508019115", "349996133", "378859519", "65120103", "190070557", "339868374", "417355200", "256729771", "16171898", "45266183", "16143507", "165258639" ]
We could start with something like this. (We might need to rethink the use of | as the separator, though; something more exotic like ╡ would be safer if | can ever appear in the data.)
import pandas as pd
import io
import json
data = '''\
id,screenName,tags,avatar,followersCount,friendsCount,lang,lastSeen,tweetId,friends
"1969527638","LlngoMakeEmCum_",[ "#nationaldogday" ],"http://pbs.twimg.com/profile_images/534286217882652672/FNmiQYVO_normal.jpeg",319,112,"en",1472271687519,"769310701580083200",[ "1969574754", "1969295556", "1969284056", "1969612214", "1970067476", "1969797386", "1969430539", "1969840064", "1969698176", "1970005154", "283011644", "1969901029", "1969563175", "1969302314", "1969978662", "1969457936", "1969667533", "1969547821", "1969943478", "1969668032", "283006529", "1969809440", "1969601096", "1969298856", "1969331652", "1969385498", "1969674368", "1969565263", "1970144676", "1969745390", "1969947438", "1969734134", "1969801326", "1969324008", "1969259820", "1969535827", "1970072989", "1969771688", "1969437804", "1969507394", "1969509972", "1969751588", "283012808", "1969302888", "1970224440", "1969603532", "283011244", "1969501046", "1969887518", "1970153138", "1970267527", "1969941955", "1969421654", "1970013110", "1969544905", "1969839590", "1969876500", "1969674625", "1969337952", "1970046536", "1970090934", "1969419133", "1969517215", "1969787869", "1969298065", "1970149771", "1969422638", "1969504268", "1970025554", "1969776001", "1970138611", "1969316186", "1969547558", "1969689272", "283009727", "283015491", "1969526874", "1969662210", "1969536164", "1969320008", "1969893793", "1970158393", "1969365936", "1970194418", "1969942094", "1969631580", "1969704756", "1969920092", "1969712882", "1969791680", "1969408164", "1969754851", "1970205480", "1969840267", "1969443211", "1969706762", "1969692698", "1969751576", "1969486796", "1969286630", "1969686674", "1969833492", "1969294814", "1969472719", "1969685018", "283008559", "283011243", "1969680078", "1969545697", "1969646412", "1969442725", "1969692529" ]
"51878493","_notmichelle",[ "#nationaldogday" ],"http://pbs.twimg.com/profile_images/761977602173046786/4_utEHsD_normal.jpg",275,115,"en",1472270622663,"769309490038439936",[ "60789485", "2420931980", "2899776756", "127410795", "38747286", "1345516880", "236076395", "1242946609", "2567887488", "280777286", "2912446303", "1149916171", "3192577639", "239569380", "229974168", "389097282", "266336410", "1850301204", "2364414805", "812302213", "2318240348", "158634793", "542282350", "569664772", "766573472", "703551325", "168564432", "261054460", "402980453", "562547390", "539630318", "165167145", "22216387", "427568285", "61033129", "213519434", "373092437", "170762012", "273601960", "322108757", "1681816280", "357843027", "737471496", "406541143", "1084122632", "633477616", "537821327", "793079732", "2386380799", "479015607", "783354019", "365171478", "625002575", "2326207404", "1653286842", "1676964216", "2296617326", "1583692190", "1315393903", "377660026", "2235123476", "792779641", "351222527", "444993309", "588396446", "377629159", "469383424", "1726612471", "415230430", "942443390", "360924168", "318593248", "565022085", "319679735", "632508305", "377638254", "1392782078", "584483723", "377703135", "180463340", "564978577", "502517645", "1056960042", "285097108", "410245879", "159121042", "570399371", "502348447", "960927356", "377196638", "478142245", "335043809", "73546116", "11348282", "901302409", "53255593", "515983155", "391774800", "62351523", "724792351", "346296289", "152520627", "559053427", "508019115", "349996133", "378859519", "65120103", "190070557", "339868374", "417355200", "256729771", "16171898", "45266183", "16143507", "165258639" ]'''
# Turn the first 9 commas of each row into a new separator (|) so that the
# trailing friends list keeps its internal commas
data = '\n'.join(['|'.join(row.split(',', 9)) for row in data.split('\n')])
# REPLACE WITH THIS FOR A REAL FILE
#with open('path/to/file') as f:
#    data = '\n'.join(['|'.join(row.split(',', 9)) for row in f.read().split('\n')])
# Read the dataframe using | as the separator
df = pd.read_csv(io.StringIO(data), sep='|')
# Convert the JSON-like strings to Python objects with the json module:
df['friends'] = df['friends'].apply(json.loads)
df['tags'] = df['tags'].apply(json.loads)
A safer approach, which skips the intermediate separator entirely:
import pandas as pd
import json
with open('path/to/file') as f:
    # Split each row at the first 9 commas; the first row becomes the column names
    columns, *rows = [row.split(',', 9) for row in f.read().split('\n') if row]
df = pd.DataFrame(rows, columns=columns)
# Convert the JSON-like strings to Python objects with the json module:
df['friends'] = df['friends'].apply(json.loads)
df['tags'] = df['tags'].apply(json.loads)
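Since the file is described as quite large, here is a minimal sketch of a streaming variant that parses one line at a time instead of reading the whole file with f.read(). Like the code above, it assumes the first nine fields contain no embedded commas, and 'path/to/file' stands in for the real path:
import json
import pandas as pd

def iter_records(path):
    with open(path) as f:
        # The first line is the header; split it into column names
        header = next(f).rstrip('\n').split(',')
        for line in f:
            line = line.rstrip('\n')
            if not line:
                continue
            # Split only the first 9 commas so the friends list stays intact,
            # then strip the surrounding quotes from the scalar fields
            fields = [v.strip('"') for v in line.split(',', 9)]
            record = dict(zip(header, fields))
            record['tags'] = json.loads(record['tags'])
            record['friends'] = json.loads(record['friends'])
            yield record

df = pd.DataFrame(list(iter_records('path/to/file')))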
Related
File name grouping and indexing
I have a folder with txt files like below, and I have used os.listdir to generate a file list, ['acc_exp01_user01.txt', 'acc_exp02_user01.txt', 'acc_exp03_user02.txt', 'acc_exp04_user02.txt', 'acc_exp05_user03.txt', 'acc_exp06_user03.txt', 'acc_exp07_user04.txt', 'acc_exp08_user04.txt', 'acc_exp09_user05.txt', 'acc_exp10_user05.txt', 'acc_exp11_user06.txt', 'acc_exp12_user06.txt', 'acc_exp13_user07.txt', 'acc_exp14_user07.txt', 'acc_exp15_user08.txt', 'acc_exp16_user08.txt', 'acc_exp17_user09.txt', 'acc_exp18_user09.txt', 'acc_exp19_user10.txt', 'acc_exp20_user10.txt', 'acc_exp21_user10.txt', 'acc_exp22_user11.txt', 'acc_exp23_user11.txt', 'acc_exp24_user12.txt', 'acc_exp25_user12.txt', 'acc_exp26_user13.txt', 'acc_exp27_user13.txt', 'acc_exp28_user14.txt', 'acc_exp29_user14.txt', 'acc_exp30_user15.txt', 'acc_exp31_user15.txt', 'acc_exp32_user16.txt', 'acc_exp33_user16.txt', 'acc_exp34_user17.txt', 'acc_exp35_user17.txt', 'acc_exp36_user18.txt', 'acc_exp37_user18.txt', 'acc_exp38_user19.txt', 'acc_exp39_user19.txt', 'acc_exp40_user20.txt', 'acc_exp41_user20.txt', 'acc_exp42_user21.txt', 'acc_exp43_user21.txt', 'acc_exp44_user22.txt', 'acc_exp45_user22.txt', 'acc_exp46_user23.txt', 'acc_exp47_user23.txt', 'acc_exp48_user24.txt', 'acc_exp49_user24.txt', 'acc_exp50_user25.txt', 'acc_exp51_user25.txt', 'acc_exp52_user26.txt', 'acc_exp53_user26.txt', 'acc_exp54_user27.txt', 'acc_exp55_user27.txt', 'acc_exp56_user28.txt', 'acc_exp57_user28.txt', 'acc_exp58_user29.txt', 'acc_exp59_user29.txt', 'acc_exp60_user30.txt', 'acc_exp61_user30.txt', 'gyro_exp01_user01.txt', 'gyro_exp02_user01.txt', 'gyro_exp03_user02.txt', 'gyro_exp04_user02.txt', 'gyro_exp05_user03.txt', 'gyro_exp06_user03.txt', 'gyro_exp07_user04.txt', 'gyro_exp08_user04.txt', 'gyro_exp09_user05.txt', 'gyro_exp10_user05.txt', 'gyro_exp11_user06.txt', 'gyro_exp12_user06.txt', 'gyro_exp13_user07.txt', 'gyro_exp14_user07.txt', 'gyro_exp15_user08.txt', 'gyro_exp16_user08.txt', 'gyro_exp17_user09.txt', 'gyro_exp18_user09.txt', 'gyro_exp19_user10.txt', 'gyro_exp20_user10.txt', 'gyro_exp21_user10.txt', 'gyro_exp22_user11.txt', 'gyro_exp23_user11.txt', 'gyro_exp24_user12.txt', 'gyro_exp25_user12.txt', 'gyro_exp26_user13.txt', 'gyro_exp27_user13.txt', 'gyro_exp28_user14.txt', 'gyro_exp29_user14.txt', 'gyro_exp30_user15.txt', 'gyro_exp31_user15.txt', 'gyro_exp32_user16.txt', 'gyro_exp33_user16.txt', 'gyro_exp34_user17.txt', 'gyro_exp35_user17.txt', 'gyro_exp36_user18.txt', 'gyro_exp37_user18.txt', 'gyro_exp38_user19.txt', 'gyro_exp39_user19.txt', 'gyro_exp40_user20.txt', 'gyro_exp41_user20.txt', 'gyro_exp42_user21.txt', 'gyro_exp43_user21.txt', 'gyro_exp44_user22.txt', 'gyro_exp45_user22.txt', 'gyro_exp46_user23.txt', 'gyro_exp47_user23.txt', 'gyro_exp48_user24.txt', 'gyro_exp49_user24.txt', 'gyro_exp50_user25.txt', 'gyro_exp51_user25.txt', 'gyro_exp52_user26.txt', 'gyro_exp53_user26.txt', 'gyro_exp54_user27.txt', 'gyro_exp55_user27.txt', 'gyro_exp56_user28.txt', 'gyro_exp57_user28.txt', 'gyro_exp58_user29.txt', 'gyro_exp59_user29.txt', 'gyro_exp60_user30.txt', 'gyro_exp61_user30.txt', 'labels.txt'] but I want to now group into a indexing list like this, how can I realise it?
You can use glob to find the files that match a pattern under a path and then create the required DataFrame:
from glob import glob
import os
import pandas as pd
exp_path = "Your Path Here"
acc_pattern = "acc_exp*.txt"
gyro_pattern = "gyro_exp*.txt"
# sorted() keeps the acc and gyro lists aligned by experiment number
acc_files = sorted(glob(os.path.join(exp_path, acc_pattern)))
gyro_files = sorted(glob(os.path.join(exp_path, gyro_pattern)))
Once you have all the required files, we can create the DataFrame:
df = pd.DataFrame()
df['acc'] = [os.path.basename(x) for x in acc_files]
df['gyro'] = [os.path.basename(x) for x in gyro_files]
df['experiment'] = df['acc'].apply(lambda x: x[7:9])
df['userId'] = df['acc'].apply(lambda x: x[14:16])
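If the positions of the digit groups ever shift in the filenames, a hedged alternative to the fixed slices is to extract the numbers by pattern; a minimal sketch using pandas' str.extract, reusing df from above (my addition, not part of the original answer):
# Pull the digits that follow "exp" and "user" regardless of their position in the name
df['experiment'] = df['acc'].str.extract(r'exp(\d+)', expand=False)
df['userId'] = df['acc'].str.extract(r'user(\d+)', expand=False)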
Error in retrieving financial data for large list of tickers from yahoo finance into a dataframe using for loop
In this particular problem, I have a very long list of tickers for which I want to retrieve some of the financial information from yahoo finance website using python: here is the list: tickers = ["OMKAR.BO", "KCLINFRA.BO", "MERMETL.BO", "PRIMIND.BO", "VISIONCO.BO", "PANAFIC.BO", "KARANWO.BO", "SOURCEIND.BO", "WELCURE.BO", "NAVKETAN.BO", "CUBIFIN.BO", "IMPEXFERRO.BO", "MISHTANN.BO", "SUMERUIND.BO", "MISHTANN.BO", "MADHUVEER.BO", "TNTELE.BO", "JMGCORP.BO", "GSLSEC.BO", "DEVKI.BO", "MINAXI.BO", "INNOCORP.BO", "SURYACHAKRA.BO", "ANKITMETAL.BO", "HAVISHA.BO", "SHIVA.BO", "COMFINTE.BO", "KONNDOR.BO", "PAZEL.BO", "SHARPINV.BO", "MIDINFRA.BO", "UNIVPRIM.BO", "ATHARVENT.BO", "FGP.BO", "BKV.BO", "VIVIDHA.BO", "FISCHER.BO", "ADITRI.BO", "GLFL.BO", "RAJOIL.BO", "ALFL.BO", "PURITY.BO", "ARCEEIN.BO", "INTECH.BO", "MIDEASTP.BO", "STANCAP.BO", "OCTAVE.BO", "TRIJAL.BO", "SREEJAYA.BO", "4THGEN.BO", "RICHIRICH.BO", "VIRTUALS.BO", "SAVINFOCO.BO", "TTIENT.BO", "OONE.BO", "TILAK.BO", "XTGLOBAL.BO", "MANGIND.BO", "ROYALIND.BO", "ASHUTPM.BO", "SMPL.BO", "BGPL.BO", "NYSSACORP.BO", "BILENERGY.BO", "YOGISUNG.BO", "DOLPHMED.BO", "PRATIK.BO", "IPOWER.BO", "BIHSPONG.BO", "CAPFIN.BO", "MCLTD.BO", "KGL.BO", "OMNIAX.BO", "HEERAISP.BO", "VISIONCINE.BO", "SWORDEDGE.BO", "AARVINFRA.BO", "ADVENT.BO", "UVDRHOR.BO", "SUNGOLD.BO", "USHDI.BO", "HINDAPL.BO", "IMEC.BO", "ARAVALIS.BO", "SERVOTEACH.BO", "SCAGRO.BO", "UMESLTD.BO", "CHARMS.BO", "NCLRESE.BO", "SYMBIOX.BO", "PRADIP.BO", "INTEGFD.BO", "CLIOINFO.BO", "RRSECUR.BO", "MUKATPIP.BO", "SYNCOMF.BO", "DYNAMICP.BO", "TRABI.BO", "RADAAN.BO", "KIRANSY-B.BO", "RAMSARUP.BO", "UNIMOVR.BO", "MELSTAR.BO", "OMANSH.BO", "VERTEX.BO", "VENTURA.BO", "GEMSPIN.BO", "EXPLICITFIN.BO", "PASARI.BO", "BABA.BO", "MAHAVIRIND.BO", "BAMPSL.BO", "GAJRA.BO", "SUNRAJDI.BO", "ACCEL.BO", "SIMPLXPAP.BO", "PHARMAID.BO", "JATALIA.BO", "TWINSTAR.BO", "CINDRELL.BO", "SHRGLTR.BO", "EUROMULTI.BO", "CRESSAN.BO", "SEVENHILL.BO", "QUADRANT.BO", "PHTRADING.BO", "SIPTL.BO", "HOTELRUGBY.BO", "KAUSHALYA.BO", "YASHRAJC.BO", "ASHAI.BO", "BERYLSE.BO", "LLOYDSTEEL.BO", "SCANPRO.BO", "HBLEAS.BO", "ASHCAP.BO", "SUNSHINE.BO", "AREALTY.BO", "MSCTC.BO", "HARIAEXPO.BO", "CNIRESLTD.BO", "KABRADG.BO", "CLFL.BO", "TRANSASIA.BO", "KACL.BO", "JAIHINDS.BO", "SANBLUE.BO", "DHENUBUILD.BO", "DHENUBUILD.BO", "ODYCORP.BO", "SAWABUSI.BO", "KAKTEX.BO", "GANONPRO.BO", "GENUSPRIME.BO", "EUREKAI.BO", "CHROMATIC.BO", "ISHWATR.BO", "INTEGRA.BO", "KACL.BO", "SSLFINANCE.BO", "ORIENTTR.BO", "ZHINUDYP.BO", "SWADEIN.BO", "SHKALYN.BO", "BAPACK.BO", "MARUTISE.BO", "PMTELELIN.BO", "SPARCSYS.BO", "GOLKONDA.BO", "DECPO.BO", "NATHUEC.BO", "INDOCITY.BO", "IOSYSTEM.BO", "ADVIKCA.BO", "JRFOODS.BO", "INFOMEDIA.BO", "INDRANIB.BO", "REGTRUS.BO", "RAGHUNAT.BO", "DCMFINSERV.BO", "RRIL.BO", "FILATFASH.BO", "ISWL.BO", "ASINPET.BO", "KORE.BO", "UNIOFFICE.BO", "GUJINV.BO", "QUEST.BO", "GLITTEKG.BO", "AMFORG.BO", "LGBFORGE.BO", "MAL.BO", "CYBERMAT.BO", "AGRIMONY.BO", "METKORE.BO", "SKYLMILAR.BO", "KIRANPR.BO", "RAJSPTR.BO", "SHVFL.BO", "MPFSL.BO", "AMITINT.BO", "KREONFIN.BO", "GRAVITY.BO", "KACHCHH.BO", "STELLANT.BO", "DEVINE.BO", "ICSL.BO", "STELLAR.BO", "CORAGRO.BO", "ARCFIN.BO", "GAMMNINFRA.BO", "EMMSONS.BO", "OSCARGLO.BO", "HARIAAPL.BO", "CORNE.BO", "FACORALL.BO", "KANELIND.BO", "INDOASIAF.BO", "BHANDHOS.BO", "GAGANPO.BO", "SELMCL.BO", "VENLONENT.BO", "KBSINDIA.BO", "RAMAPETRO.BO", "UTIQUE.BO", "GUJSTATFIN.BO", "COUNCODOS.BO", "JDORGOCHEM.BO", "ANSHNCO.BO", "SILVERO.BO", "CONSTRONIC.BO", "SIPIND.BO", "ESARIND.BO", "GUJCOTEX.BO", "HILIKS.BO", "MINFY.BO", "LEENEE.BO", 
"DUGARHOU.BO", "JHACC.BO", "CINERAD.BO", "GCMCAPI.BO", "GCMCOMM.BO", "CHENFERRO.BO", "MANCREDIT.BO", "TRICOMFRU.BO", "VEGETABLE.BO", "JSHL.BO", "HATHWAYB.BO", "JAYIND.BO", "ROYALCU.BO", "DHANADACO.BO", "ELCIDIN.BO", "RAGHUTOB.BO", "GISOLUTION.BO", "RAGHUTOB.BO", "CONTICON.BO", "NETWORK.BO", "BANASFN.BO", "CRANESSOFT.BO", "RSCINT.BO", "JPTRLES.BO", "ALOKTEXT.BO", "PRAGBOS.BO", "WELTI.BO", "EKAMLEA.BO", "MASL.BO", "SAFFRON.BO", "SRDAPRT.BO", "FFPL.BO", "RITESHIN.BO", "BLOIN.BO", "YARNSYN.BO", "OISL.BO", "POLYTEX.BO", "SPSINT.BO", "GCMCOMM.BO", "FRONTCAP.BO", "SEZAL.BO", "CITYMAN.BO", "AJEL.BO", "ESCORTSFIN.BO", "ABHIINFRA.BO", "PRATIKSH.BO", "JCTLTD.BO", "GENESIS.BO", "HINDSECR.BO", "GKCONS.BO", "MODWOOL.BO", "ROHITFERRO.BO", "NMSRESRC.BO", "VARIMAN.BO", "WAGEND.BO", "INDLEASE.BO", "APOORVA.BO", "HITTCO.BO", "PREMPIPES.BO", "SRMENERGY.BO", "KEDIACN.BO", "TOYAMIND.BO", "EPSOMPRO.BO", "RICHUNV.BO", "CITYONLINE.BO", "ELANGO.BO", "AMITSEC.BO", "CTL.BO", "LPDC.BO", "CONTCHM.BO", "NTL.BO", "SYBLY.BO", "ELEFLOR.BO", "KMFBLDR.BO", "TRIVIKRAMA.BO", "RUCHINFRA.BO", "PROMACT.BO", "USHAKIRA.BO", "ARUNAHTEL.BO", "CIL.BO", "MOUNTSHIQ.BO", "SPTRSHI.BO", "SEATV.BO", "SWASTIVI.BO", "SUNDARAM.BO", "CREATIVEYE.BO", "EUROASIA.BO", "ANJANIFIN.BO", "ADARSH.BO", "GLOBALCA.BO", "INDERGR.BO", "USGTECH.BO", "RASIELEC.BO", "SHEETAL.BO", "SYLPH.BO", "GOYALASS.BO", "KANSAFB.BO", "ANERI.BO", "DRL.BO", "OSWALOR.BO", "SWAGRUHA.BO", "SARTHAKIND.BO", "GALADA.BO", "OSWAYRN.BO", "TRINITYLEA.BO", "GOLCA.BO", "SODFC.BO", "LEADFIN.BO", "KAYPOWR.BO", "PANELEC.BO", "TARAI.BO", "SANJIVIN.BO", "MKTCREAT.BO", "ECOBOAR.BO", "SUNRINV.BO", "MAYURFL.BO", "GARWAMAR.BO", "SURYAKR.BO", "BESTAGRO.BO", "INDCEMCAP.BO", "EASTSILK.BO", "MPAGI.BO", "HRMNYCP.BO", "RUBRAME.BO", "INCON.BO", "AMRAPLIN.BO", "RESPONSINF.BO", "BACPHAR.BO", "KRISHNACAP.BO", "SHBHAWPA.BO", "TOWASOK.BO", "PADMALAYAT.BO", "MHSGRMS.BO", "JMTAUTOLTD.BO", "WELCON.BO", "UNITEDTE.BO", "MNPLFIN.BO", "PARSHINV.BO", "UNISHIRE.BO", "RAJINFRA.BO", "MMLF.BO", "ALCHCORP.BO", "CHMBBRW.BO", "NOGMIND.BO", "SHRMFGC.BO", "SAMTEX.BO", "SUPERTEX.BO", "JAIHINDPRO.BO", "CENTEXT.BO", "BCG.BO", "GENNEX.BO", "EDUCOMP.BO", "SHIVAGR.BO", "ADINATH.BO", "MINID.BO", "SURANAT&P.BO", "GYANDEV.BO", "AVTIL.BO", "ZSWASTSA.BO", "JINDCAP.BO", "NBFOOT.BO", "SHESHAINDS.BO", "UTLINDS.BO", "MADHUSE.BO", "THAMBBI.BO", "KKPLASTICK.BO", "VAGHANI.BO", "SOLIDCO.BO", "HIMFIBP.BO", "KKFIN.BO", "CSL.BO", "GOPAIST.BO", "BALTE.BO", "ETIL.BO", "PAOS.BO", "RAINBOWDQ.BO", "JAGSONFI.BO", "REGENTRP.BO", "AFEL.BO", "BRIPORT.BO", "SURATEX.BO", "INFRAIND.BO", "SPENTEX.BO", "TITANSEC.BO", "ALPSINDUS.BO", "UNISTRMU.BO", "SPECMKT.BO", "SAENTER.BO", "TOKYOFIN.BO", "TRANSFD.BO", "BSELINFRA.BO", "WELSPLSOL.BO", "SONALAD.BO", "CRIMSON.BO", "UNITY.BO", "VIKASPROP.BO", "VELHO.BO", "SYNCOM.BO", "CYBELEIND.BO", "VANICOM.BO", "THAKRAL.BO", "INDOEURO.BO", "ALAN SCOTT.BO", "SALSTEEL.BO", "ADITYA.BO", "HASTIFIN.BO", "NIBE.BO", "JOINTECAED.BO", "GANGAPHARM.BO", "SBECSUG.BO", "EASTBUILD.BO", "LORDSHOTL.BO", "IYKOTHITE.BO", "URJAGLOBA.BO", "DHRUVCA.BO", "RAP.BO", "LAHL.BO", "MONNETIN.BO", "SETUINFRA.BO", "RRMETAL.BO", "GTLINFRA.BO", "ECOM.BO", "TTML.BO", "ARNOLD.BO", "FLORATX.BO", "GARODCH.BO", "PUROHITCON.BO", "KAMRLAB.BO", "MILESTONE.BO", "NETLINK.BO", "MARSONS.BO", "SESL.BO", "OBRSESY.BO", "VRWODAR.BO", "NUWAY.BO", "CJGEL.BO", "REDEXPR.BO", "AISHWARYA.BO", "PICTUREHS.BO", "BAGFILMS.BO", "WOODSVILA.BO", "MEHSECU.BO", "MBPARIKH.BO", "SICLTD.BO", "GITARENEW.BO", "DESHRAK.BO", "SENINFO.BO", "TELECANOR.BO", "STLSTRINF.BO", "JRELTD.BO", "OROSMITHS.BO", 
"MUNOTHFI.BO", "AVAILFC.BO", "NITINFIRE.BO", "PIFL.BO", "BLBLIMITED.BO", "SRECR.BO", "NAGTECH.BO", "ARISE.BO", "FRONTBUSS.BO", "PAEL.BO", "ROLLT.BO", "VALLABH.BO", "RANASUG.BO", "STRATMONT.BO", "SANTOSHF.BO", "SVAINDIA.BO", "PARKERAC.BO", "VSFPROJ.BO", "AUROCOK.BO", "HKG.BO", "CASTEXTECH.BO", "HOWARHO.BO", "RTNPOWER.BO", "SHRIBCL.BO", "GARWSYN.BO", "MEHSECU.BO", "PRAVEG.BO", "MEHTAHG.BO", "RTNINFRA.BO", "MMWL.BO", "GAGAN.BO", "WWALUM.BO", "HEMANG.BO", "DOLAT.BO", "SUPTANERY.BO", "EUROCERA.BO", "SURFI.BO", "TTIL.BO", "VARDHMAN.BO", "SUPERBAK.BO", "ESHAMEDIA.BO", "CONTILI.BO", "CESL.BO", "DAULAT.BO", "RAJATH.BO", "SURYVANSP.BO", "KUWERIN.BO", "SVARTCORP.BO", "SKRABUL.BO", "WSIND.BO", "DELTA.BO", "SIPL.BO", "RMCHEM.BO", "STDBAT.BO", "PICCASUG.BO", "AGIOPAPER.BO", "SHREYASI.BO", "CCCL.BO", "GAL.BO", "GOLECHA.BO", "RAAJMEDI.BO", "KINETRU.BO", "ZKHANDEN.BO", "LAKHOTIA.BO", "SANINFRA.BO", "KABSON.BO", "ENTRINT.BO", "SIROHIA.BO", "3IINFOTECH.BO", "MEHIF.BO", "BASANTGL.BO", "MAITRI.BO", "CEENIK.BO", "MAXIMAA.BO", "STANPACK.BO", "CRANEINFRA.BO", "CHITRTX.BO", "CAPRICORN.BO", "TAVERNIER.BO", "JPPOWER.BO", "PATIDAR.BO", "BANSTEA.BO", "NEWMKTADV.BO", "DANUBE.BO", "MAHALXSE.BO", "SARDAPPR.BO", "KZLFIN.BO", "ABHIFIN.BO", "AVI.BO", "GAYATRIBI.BO", "VXLINSTR.BO", "ADITYASP.BO", "OMKARPH.BO", "ESSARSEC.BO", "SALSAIN.BO", "NDASEC.BO", "PARABDRUGS.BO", "EPIC.BO", "HIGHSTREE.BO", "TRIMURTHI.BO", "DBSTOCKBRO.BO", "ADARSHPL.BO", "SONAL.BO", "FRASER.BO", "BRIDGESE.BO", "GBGLOBAL.BO", "UNRYLMA.BO", "ANNAINFRA.BO", "RTEXPO.BO", "FUNDVISER.BO", "LIBORD.BO", "HYPERSOFT.BO", "JTAPARIA.BO", "ANUBHAV.BO", "MEGFI.BO", "ACTIONFI.BO", "BCLENTERPR.BO", "RAMSONS.BO", "GUJARATPOLY.BO", "SBFL.BO", "CHDCHEM.BO", "MONEYBOXX.BO", "ALSL.BO", "DEVHARI.BO", "NARPROP.BO", "PIONAGR.BO", "JAYBHCR.BO", "QGO.BO", "KRIFILIND.BO", "GOLDCOINHF.BO", "GALLOPENT.BO", "MIC.BO", "INTELLCAP.BO", "ABIRAFN.BO", "OLPCL.BO", "ZSHERAPR.BO", "CELLA.BO", "ZSANMCOM.BO", "STEELCO.BO", "VFL.BO", "MODAIRY.BO", "ZSANMCOM.BO", "STEELCO.BO", "SHUKJEW.BO", "JAYKAY.BO", "MIC.BO", "MODAIRY.BO", "RGIL.BO", "GSBFIN.BO", "OLPCL.BO", "HINDMOTORS.BO", "GAJANANSEC.BO", "MKEXIM.BO", "BERLDRG.BO", "KUBERJI.BO", "ADDIND.BO", "INDOSOLAR.BO", "GOLDCOINHF.BO", "ACIIN.BO", "UNITINT.BO", "SDC.BO", "RAJKSYN.BO", "CHAMAK.BO", "BHILSPIN.BO", "PANORAMA.BO", "REGAL.BO", "KRRAIL.BO", "AMS.BO", "PARIKSHA.BO", "SURYAINDIA.BO", "ADHARSHILA.BO", "AMARNATH.BO", "JAYATMA.BO", "CANOPYFIN.BO", "FMEC.BO", "CITL.BO", "DAL.BO", "YORKEXP.BO", "MEWATZI.BO"] and then what I am doing is as below in which I want to get Market Capitalization for each of the tickers in the above list: from pandas_datareader import data import pandas as pd tickers = tick[0:30] dat = data.get_quote_yahoo(tickers)['marketCap'] print(dat) I am able to fetch 20-30 tickers using above code, but if I try to pull all, the code throws an error "request timed out" and "list out of range" etc. Then I tried to fetch data one by one using for loop as below: f = pd.DataFrame(columns=["Ticker", "MarketCapINR"]) columns = list(df) for tick in ticks: dat = data.get_quote_yahoo(tick)['marketCap'] zipped = zip(columns, dat) a_dictionary = dict(zipped) df = df.append(a_dictionary, ignore_index=True) This returned me two errors, one of these is list out of bound and another (when I tried to shorten the length of list using 'slicing'), request timed out. Is there a way out, to get all the data (ticker names in first column and MarketCap values in second column of a pandas dataframe) ?? ..
Here's a solution using a package called yahooquery. Disclaimer: I am the author of the package.
from yahooquery import Ticker
import pandas as pd
tickers = [...]  # Use your list above
t = Ticker(tickers)
data = t.quotes
df = pd.DataFrame(data).T
df['marketCap']
OMKAR.BO         4750000
KCLINFRA.BO     26331000
MERMETL.BO      11472136
PRIMIND.BO      22697430
VISIONCO.BO     14777874
                  ...
CANOPYFIN.BO    93859304
FMEC.BO         10542380
CITL.BO              NaN
YORKEXP.BO      57503880
MEWATZI.BO      51200000
Name: marketCap, Length: 632, dtype: object
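If you would rather stick with pandas_datareader, one way around the timeouts is to request the tickers in small batches. A rough sketch (the batch size and pause are guesses, not documented limits, and get_quote_yahoo is assumed to behave as in the question's snippet):
import time
import pandas as pd
from pandas_datareader import data

def market_caps(tickers, batch_size=20, pause=1.0):
    frames = []
    for i in range(0, len(tickers), batch_size):
        batch = tickers[i:i + batch_size]
        try:
            frames.append(data.get_quote_yahoo(batch)['marketCap'])
        except Exception:
            # A batch can still fail (delisted ticker, timeout); skip it and move on
            continue
        time.sleep(pause)  # small pause between requests
    caps = pd.concat(frames)
    return caps.rename_axis('Ticker').reset_index(name='MarketCapINR')

# df = market_caps(tickers)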
How does Python convert date value from excel
I am reading a csv file with a CDATE column. The structure of the column is: |CDATE | |08/28/2018| |08/28/2018| |08/29/2018| |08/30/2018| |09/02/2018| |09/04/2018| ... |04/10/2019| As you can see there is duplicate date as well as missing dates in this column, and I would like to find the missing dates and add them to my dataframe. My code is: import matplotlib.pyplot as plt warnings.filterwarnings("ignore") plt.style.use('fivethirtyeight') import pandas as pd df = pd.read_csv("XXX.csv") dateCol = df['CDATE'].values.tolist() dates = pd.to_datetime(dateCol, format='%m/%d/%Y') startDate = dates.min() endDate = dates.max() df = df.sort_values('CDATE') df_plastic = df['PLASTIC'].unique() dateRange = pd.date_range(startDate, endDate) df_date = df['CDATE'].unique() for cursorDate in dateRange: if (cursorDate in df_date) is False: print('Data is missing date {} from range {}'.format(cursorDate, df_date)) But the output is: Data is missing date 2019-02-21 00:00:00 from ['01/01/2019' '01/02/2019' '01/03/2019' '01/04/2019' '01/05/2019' '01/07/2019' '01/08/2019' '01/09/2019' '01/10/2019' '01/11/2019' '01/12/2019' '01/14/2019' '01/15/2019' '01/16/2019' '01/17/2019' '01/18/2019' '01/19/2019' '01/21/2019' '01/22/2019' '01/23/2019' '01/24/2019' '01/25/2019' '01/26/2019' '01/28/2019' '01/29/2019' '01/30/2019' '01/31/2019' '02/01/2019' '02/02/2019' '02/04/2019' '02/05/2019' '02/06/2019' '02/07/2019' '02/08/2019' '02/09/2019' '02/11/2019' '02/12/2019' '02/13/2019' '02/14/2019' '02/15/2019' '02/16/2019' '02/19/2019' '02/20/2019' '02/21/2019' '02/22/2019' '02/23/2019' '02/25/2019' '02/26/2019' '02/27/2019' '02/28/2019' '03/01/2019' '03/02/2019' '03/03/2019' '03/04/2019' '03/05/2019' '03/06/2019' '03/07/2019' '03/08/2019' '03/09/2019' '03/11/2019' '03/12/2019' '03/13/2019' '03/14/2019' '03/15/2019' '03/16/2019' '03/18/2019' '03/19/2019' '03/20/2019' '03/21/2019' '03/22/2019' '03/23/2019' '03/25/2019' '03/26/2019' '03/27/2019' '03/28/2019' '03/29/2019' '03/30/2019' '04/01/2019' '04/02/2019' '04/03/2019' '04/04/2019' '04/05/2019' '04/06/2019' '04/08/2019' '04/09/2019' '04/10/2019' '05/29/2018' '05/30/2018' '05/31/2018' '06/01/2018' '06/02/2018' '06/04/2018' '06/05/2018' '06/06/2018' '06/07/2018' '06/08/2018' '06/09/2018' '06/11/2018' '06/12/2018' '06/13/2018' '06/14/2018' '06/15/2018' '06/16/2018' '06/18/2018' '06/19/2018' '06/20/2018' '06/21/2018' '06/22/2018' '06/23/2018' '06/25/2018' '06/26/2018' '06/27/2018' '06/28/2018' '06/29/2018' '06/30/2018' '07/03/2018' '07/04/2018' '07/05/2018' '07/06/2018' '07/07/2018' '07/09/2018' '07/10/2018' '07/11/2018' '07/12/2018' '07/13/2018' '07/14/2018' '07/16/2018' '07/17/2018' '07/18/2018' '07/19/2018' '07/20/2018' '07/21/2018' '07/23/2018' '07/24/2018' '07/25/2018' '07/26/2018' '07/27/2018' '07/28/2018' '07/30/2018' '07/31/2018' '08/01/2018' '08/02/2018' '08/03/2018' '08/04/2018' '08/07/2018' '08/08/2018' '08/09/2018' '08/10/2018' '08/11/2018' '08/13/2018' '08/14/2018' '08/15/2018' '08/16/2018' '08/17/2018' '08/18/2018' '08/20/2018' '08/21/2018' '08/22/2018' '08/23/2018' '08/24/2018' '08/25/2018' '08/27/2018' '08/28/2018' '08/29/2018' '08/30/2018' '08/31/2018' '09/01/2018' '09/04/2018' '09/05/2018' '09/06/2018' '09/07/2018' '09/08/2018' '09/10/2018' '09/11/2018' '09/12/2018' '09/13/2018' '09/14/2018' '09/15/2018' '09/17/2018' '09/18/2018' '09/19/2018' '09/20/2018' '09/21/2018' '09/22/2018' '09/24/2018' '09/25/2018' '09/26/2018' '09/27/2018' '09/28/2018' '09/29/2018' '10/01/2018' '10/02/2018' '10/03/2018' '10/04/2018' '10/05/2018' '10/06/2018' '10/09/2018' '10/10/2018' 
'10/11/2018' '10/12/2018' '10/13/2018' '10/15/2018' '10/16/2018' '10/17/2018' '10/18/2018' '10/19/2018' '10/20/2018' '10/22/2018' '10/23/2018' '10/24/2018' '10/25/2018' '10/26/2018' '10/29/2018' '10/30/2018' '10/31/2018' '11/01/2018' '11/02/2018' '11/03/2018' '11/05/2018' '11/06/2018' '11/07/2018' '11/08/2018' '11/09/2018' '11/10/2018' '11/13/2018' '11/14/2018' '11/15/2018' '11/16/2018' '11/18/2018' '11/19/2018' '11/20/2018' '11/21/2018' '11/22/2018' '11/23/2018' '11/24/2018' '11/26/2018' '11/27/2018' '11/28/2018' '11/29/2018' '11/30/2018' '12/01/2018' '12/03/2018' '12/04/2018' '12/05/2018' '12/06/2018' '12/07/2018' '12/08/2018' '12/09/2018' '12/10/2018' '12/11/2018' '12/12/2018' '12/13/2018' '12/14/2018' '12/15/2018' '12/17/2018' '12/18/2018' '12/19/2018' '12/20/2018' '12/21/2018' '12/22/2018' '12/24/2018' '12/25/2018' '12/27/2018' '12/28/2018' '12/29/2018' '12/31/2018'] Somehow the data type of cursorDate is changed to Timestamp, making the value comparison not work. How is it converting the datetime formats?
Building on my comment above, change the last line before your loop to this:
df_date = df['CDATE'].apply(pd.to_datetime).unique()
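For the underlying task of finding the missing dates, a compact alternative sketch (assuming CDATE is always MM/DD/YYYY and "XXX.csv" is the question's placeholder filename) is to let pandas compute the set difference directly:
import pandas as pd

df = pd.read_csv("XXX.csv")
dates = pd.to_datetime(df['CDATE'], format='%m/%d/%Y')
# Every calendar day between the first and last observed date, minus the observed ones
missing = pd.date_range(dates.min(), dates.max()).difference(dates.unique())
print(missing)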
Python - How to count specific section in a list
I'm brand new to python and I'm struggling how to add certain sections of a cvs file in python. I'm not allowed to use "import cvs" I'm importing the TipJoke CVS file from https://vincentarelbundock.github.io/Rdatasets/datasets.html This is the only code I have so far that worked and I'm at a total loss on where to go from here. if __name__ == '__main__': from pprint import pprint from string import punctuation f = open("TipJoke.csv", "r") tipList = [] for line in f: #deletes the quotes line = line.replace('"', '') tipList.append(line) pprint(tipList[]) Output: [',Card,Tip,Ad,Joke,None\n', '1,None,1,0,0,1\n', '2,Joke,1,0,1,0\n', '3,Ad,0,1,0,0\n', '4,None,0,0,0,1\n', '5,None,1,0,0,1\n', '6,None,0,0,0,1\n', '7,Ad,0,1,0,0\n', '8,Ad,0,1,0,0\n', '9,None,0,0,0,1\n', '10,None,0,0,0,1\n', '11,None,1,0,0,1\n', '12,Ad,0,1,0,0\n', '13,None,0,0,0,1\n', '14,Ad,1,1,0,0\n', '15,Joke,1,0,1,0\n', '16,Joke,0,0,1,0\n', '17,Joke,1,0,1,0\n', '18,None,0,0,0,1\n', '19,Joke,0,0,1,0\n', '20,None,0,0,0,1\n', '21,Ad,1,1,0,0\n', '22,Ad,1,1,0,0\n', '23,Ad,0,1,0,0\n', '24,Joke,0,0,1,0\n', '25,Joke,1,0,1,0\n', '26,Joke,0,0,1,0\n', '27,None,1,0,0,1\n', '28,Joke,1,0,1,0\n', '29,Joke,1,0,1,0\n', '30,None,1,0,0,1\n', '31,Joke,0,0,1,0\n', '32,None,1,0,0,1\n', '33,Joke,1,0,1,0\n', '34,Ad,0,1,0,0\n', '35,Joke,0,0,1,0\n', '36,Ad,1,1,0,0\n', '37,Joke,0,0,1,0\n', '38,Ad,0,1,0,0\n', '39,Joke,0,0,1,0\n', '40,Joke,0,0,1,0\n', '41,Joke,1,0,1,0\n', '42,None,0,0,0,1\n', '43,None,0,0,0,1\n', '44,Ad,0,1,0,0\n', '45,None,0,0,0,1\n', '46,None,0,0,0,1\n', '47,Ad,0,1,0,0\n', '48,Joke,0,0,1,0\n', '49,Joke,1,0,1,0\n', '50,None,1,0,0,1\n', '51,None,0,0,0,1\n', '52,Joke,1,0,1,0\n', '53,Joke,1,0,1,0\n', '54,Joke,0,0,1,0\n', '55,None,1,0,0,1\n', '56,Ad,0,1,0,0\n', '57,Joke,0,0,1,0\n', '58,None,0,0,0,1\n', '59,Ad,0,1,0,0\n', '60,Joke,1,0,1,0\n', '61,Ad,0,1,0,0\n', '62,None,1,0,0,1\n', '63,Joke,0,0,1,0\n', '64,Ad,0,1,0,0\n', '65,Joke,0,0,1,0\n', '66,Ad,0,1,0,0\n', '67,Ad,0,1,0,0\n', '68,Ad,0,1,0,0\n', '69,None,0,0,0,1\n', '70,Joke,1,0,1,0\n', '71,None,1,0,0,1\n', '72,None,0,0,0,1\n', '73,None,0,0,0,1\n', '74,Joke,0,0,1,0\n', '75,Ad,1,1,0,0\n', '76,Ad,0,1,0,0\n', '77,Ad,1,1,0,0\n', '78,Joke,0,0,1,0\n', '79,Joke,0,0,1,0\n', '80,Ad,1,1,0,0\n', '81,Ad,0,1,0,0\n', '82,None,0,0,0,1\n', '83,Ad,0,1,0,0\n', '84,Joke,0,0,1,0\n', '85,Joke,0,0,1,0\n', '86,Ad,1,1,0,0\n', '87,None,1,0,0,1\n', '88,Joke,1,0,1,0\n', '89,Ad,0,1,0,0\n', '90,None,0,0,0,1\n', '91,None,0,0,0,1\n', '92,Joke,0,0,1,0\n', '93,Joke,0,0,1,0\n', '94,Ad,0,1,0,0\n', '95,Ad,0,1,0,0\n', '96,Ad,0,1,0,0\n', '97,Joke,1,0,1,0\n', '98,None,0,0,0,1\n', '99,None,0,0,0,1\n', '100,None,1,0,0,1\n', '101,Joke,0,0,1,0\n', '102,Joke,0,0,1,0\n', '103,Ad,1,1,0,0\n', '104,Ad,0,1,0,0\n', '105,Ad,0,1,0,0\n', '106,Ad,1,1,0,0\n', '107,Ad,0,1,0,0\n', '108,None,0,0,0,1\n', '109,Ad,0,1,0,0\n', '110,Joke,1,0,1,0\n', '111,None,0,0,0,1\n', '112,Ad,0,1,0,0\n', '113,Ad,0,1,0,0\n', '114,None,0,0,0,1\n', '115,Ad,0,1,0,0\n', '116,None,0,0,0,1\n', '117,None,0,0,0,1\n', '118,Ad,0,1,0,0\n', '119,None,1,0,0,1\n', '120,Ad,1,1,0,0\n', '121,Ad,0,1,0,0\n', '122,Ad,1,1,0,0\n', '123,None,0,0,0,1\n', '124,None,0,0,0,1\n', '125,Joke,1,0,1,0\n', '126,Joke,1,0,1,0\n', '127,Ad,0,1,0,0\n', '128,Joke,0,0,1,0\n', '129,Joke,0,0,1,0\n', '130,Ad,0,1,0,0\n', '131,None,0,0,0,1\n', '132,None,0,0,0,1\n', '133,None,0,0,0,1\n', '134,Joke,1,0,1,0\n', '135,Ad,0,1,0,0\n', '136,None,0,0,0,1\n', '137,Joke,0,0,1,0\n', '138,Ad,0,1,0,0\n', '139,Ad,0,1,0,0\n', '140,None,0,0,0,1\n', '141,Joke,0,0,1,0\n', '142,None,0,0,0,1\n', '143,Ad,0,1,0,0\n', 
'144,None,1,0,0,1\n', '145,Joke,0,0,1,0\n', '146,Ad,0,1,0,0\n', '147,Ad,0,1,0,0\n', '148,Ad,0,1,0,0\n', '149,Joke,1,0,1,0\n', '150,Ad,1,1,0,0\n', '151,Joke,1,0,1,0\n', '152,None,0,0,0,1\n', '153,Ad,0,1,0,0\n', '154,None,0,0,0,1\n', '155,None,0,0,0,1\n', '156,Ad,0,1,0,0\n', '157,Ad,0,1,0,0\n', '158,Joke,0,0,1,0\n', '159,None,0,0,0,1\n', '160,Joke,1,0,1,0\n', '161,None,1,0,0,1\n', '162,Ad,1,1,0,0\n', '163,Joke,0,0,1,0\n', '164,Joke,0,0,1,0\n', '165,Ad,0,1,0,0\n', '166,Joke,1,0,1,0\n', '167,Joke,1,0,1,0\n', '168,Ad,0,1,0,0\n', '169,Joke,1,0,1,0\n', '170,Joke,0,0,1,0\n', '171,Ad,0,1,0,0\n', '172,Joke,0,0,1,0\n', '173,Joke,0,0,1,0\n', '174,Ad,0,1,0,0\n', '175,None,0,0,0,1\n', '176,Joke,1,0,1,0\n', '177,Ad,0,1,0,0\n', '178,Joke,0,0,1,0\n', '179,Joke,0,0,1,0\n', '180,None,0,0,0,1\n', '181,None,0,0,0,1\n', '182,Ad,0,1,0,0\n', '183,None,0,0,0,1\n', '184,None,0,0,0,1\n', '185,None,0,0,0,1\n', '186,None,0,0,0,1\n', '187,Ad,0,1,0,0\n', '188,None,1,0,0,1\n', '189,Ad,0,1,0,0\n', '190,Ad,0,1,0,0\n', '191,Ad,0,1,0,0\n', '192,Joke,1,0,1,0\n', '193,Joke,0,0,1,0\n', '194,Ad,0,1,0,0\n', '195,None,0,0,0,1\n', '196,Joke,1,0,1,0\n', '197,Joke,0,0,1,0\n', '198,Joke,1,0,1,0\n', '199,Ad,0,1,0,0\n', '200,None,0,0,0,1\n', '201,Joke,1,0,1,0\n', '202,Joke,0,0,1,0\n', '203,Joke,0,0,1,0\n', '204,Ad,0,1,0,0\n', '205,None,0,0,0,1\n', '206,Ad,0,1,0,0\n', '207,Ad,0,1,0,0\n', '208,Joke,0,0,1,0\n', '209,Ad,0,1,0,0\n', '210,Joke,0,0,1,0\n', '211,None,0,0,0,1\n'] I'm currently trying to find the Total number of entries of the specified card type and the Percentage of tips given for the specified card type with two decimal places of precision. The tip column is the 0 or 1 right after the card type (None, Ad, Joke).
If you are allowed to use the pandas library, then:
import pandas as pd
df = pd.read_csv("TipJoke.csv")
df is a pandas DataFrame object on which you can perform all kinds of filtering. For example, if you want the data for Joke you can filter like this:
print(df[df["Card"] == "Joke"])
Though I'm just providing you the direction, not the whole logic for your question.
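Since the question ultimately asks for the number of entries per card type and the tip percentage with two decimals, a pandas sketch along those lines (assuming the column names Card and Tip from the file's header row) could be:
import pandas as pd

df = pd.read_csv("TipJoke.csv")
# count = number of entries per card type, sum = number of tips given
summary = df.groupby("Card")["Tip"].agg(["count", "sum"])
summary["tip_pct"] = (100 * summary["sum"] / summary["count"]).round(2)
print(summary)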
This works:
counts = {"Joke": 0, "Ad": 0, "None": 0}
with open("TipJoke.csv", "r") as f:
    for line in f:
        line_clean = line.replace('"', "").replace("\n", "").split(",")
        try:
            counts[line_clean[1]] += int(line_clean[2])
        except (KeyError, ValueError, IndexError):
            # skip the header row and any malformed lines
            pass
print(counts)
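To also report the total entries per card type and the tip percentage the question asks for (still without importing csv), the same loop can keep two counters per card type; a minimal sketch:
counts = {"Joke": [0, 0], "Ad": [0, 0], "None": [0, 0]}  # [entries, tips]
with open("TipJoke.csv", "r") as f:
    next(f)  # skip the header row
    for line in f:
        fields = line.replace('"', "").strip().split(",")
        card = fields[1]
        if card in counts:
            counts[card][0] += 1
            counts[card][1] += int(fields[2])
for card, (entries, tips) in counts.items():
    pct = 100 * tips / entries if entries else 0.0
    print(f"{card}: {entries} entries, {pct:.2f}% tipped")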
How to add hyphens in between a string using Python
import csv x=[] y=[] with open ('x_wind.txt','r') as csvfile: plots= csv.reader(csvfile, delimiter=',') for row in plots: x.append(int(row[0])) y.append(float(row[1])) I have wrote above code to extract data from a file now , i want my data should print as 2018-06-14:1 sample data is 2018061402,6.8750 2018061403,8.0000 2018061404,7.7500 2018061405,7.3750 2018061406,6.7500 2018061407,6.1250 2018061408,5.7500 2018061409,5.6250 2018061410,5.5000 2018061411,5.5000 2018061412,5.3750 2018061413,5.1250 2018061414,4.6250 2018061415,3.8750 2018061416,3.5000 2018061417,3.1250 2018061418,3.6250 2018061419,4.2500 2018061420,4.7500 2018061421,5.8750 2018061422,6.2500 2018061423,6.6250 2018061500,6.7500 2018061501,6.7500 2018061502,7.3750 2018061503,7.1250 2018061504,6.1250 2018061505,5.2500 2018061506,4.7500 2018061507,4.1250 2018061508,4.0000 2018061509,3.8750 2018061510,3.8750 2018061511,4.1250 2018061512,4.5000 2018061513,4.3750 2018061514,3.5000 2018061515,3.1250 2018061516,3.1250 2018061517,3.0000 2018061518,3.0000 2018061519,3.5000 2018061520,3.8750 2018061521,4.1250 2018061522,4.3750 2018061523,4.6250 2018061600,5.1250 2018061601,4.8750 2018061602,6.0000 2018061603,5.5000 2018061604,4.7500 2018061605,3.8750 2018061606,3.3750 2018061607,2.7500 2018061608,2.3750 2018061609,2.5000 2018061610,2.7500 2018061611,3.1250 2018061612,3.3750 2018061613,3.6250 2018061614,3.2500 2018061615,2.7500 2018061616,3.1250 2018061617,2.8750 2018061618,1.5000 2018061619,1.5000 2018061620,1.6250 2018061621,1.8750 2018061622,2.6250 2018061623,3.3750 2018061700,4.1250 2018061701,4.7500 2018061702,6.1250 2018061703,6.1250 2018061704,5.5000 2018061705,5.0000 2018061706,4.2500 2018061707,4.0000 2018061708,3.8750 2018061709,4.0000 2018061710,4.3750 2018061711,4.5000 2018061712,4.5000 2018061713,4.0000 2018061714,3.5000 2018061715,3.0000 2018061716,2.7500 2018061717,2.2500 2018061718,0.3750 2018061719,1.5000 2018061720,2.1250 2018061721,2.1250 2018061722,2.2500 2018061723,3.1250 2018061800,4.2500 2018061801,5.5000 2018061802,7.1250 2018061803,7.1250 2018061804,6.3750 2018061805,5.7500 2018061806,5.3750 2018061807,5.0000 2018061808,5.1250 2018061809,5.0000 2018061810,5.0000 2018061811,4.7500 2018061812,4.6250 2018061813,4.5000 2018061814,4.2500 2018061815,3.7500 2018061816,3.3750 2018061817,3.2500 2018061818,3.1250 2018061819,3.0000 2018061820,3.1250 2018061821,3.2500 2018061822,3.3750 2018061823,3.3750 2018061900,3.5000 2018061901,3.3750 2018061902,4.3750 2018061903,5.8750 2018061904,5.8750 2018061905,5.3750 2018061906,4.7500 2018061907,3.6250 2018061908,3.5000 2018061909,2.6250 2018061910,3.0000 2018061911,2.5000 2018061912,2.0000 2018061913,1.3750 2018061914,0.5000 2018061915,-0.2500 2018061916,-0.5000 2018061917,0.6250 2018061918,2.2500 2018061919,2.0000 2018061920,2.1250 2018061921,2.1250 2018061922,2.7500 2018061923,3.1250 2018062000,3.1250 2018062001,3.0000 2018062002,3.5000 2018062003,5.0000 2018062004,5.1250 2018062005,4.5000 2018062006,3.7500 2018062007,3.2500 2018062008,3.3750 2018062009,2.8750 2018062010,2.7500 2018062011,2.5000 2018062012,1.8750 2018062013,1.2500 2018062014,0.3750 2018062015,0.1250 2018062016,0.5000 2018062017,1.8750
It looks like you have a list of date strings. You can use the datetime module to convert them to your required format. Ex:
import datetime
data = "2018061402"
print(datetime.datetime.strptime(data[:-2], "%Y%m%d").strftime("%Y-%m-%d"))
Output:
2018-06-14
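If the trailing two digits are an hour you want to keep (the question's target format looks like date:hour), a sketch of the same idea that parses the full 10-digit stamp could be:
import datetime

data = "2018061402"
dt = datetime.datetime.strptime(data, "%Y%m%d%H")
# Prints "2018-06-14:2" (unpadded hour); use dt.strftime("%H") instead of dt.hour to keep the leading zero
print("{}:{}".format(dt.strftime("%Y-%m-%d"), dt.hour))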