nltk.Text: not getting desired result - python

I am trying to extract keywords from some text:
t="""Two Travellers, walking in the noonday sun, sought the shade of a widespreading tree to rest. As they lay looking up among the pleasant leaves, they saw that it was a Plane Tree.
"How useless is the Plane!" said one of them. "It bears no fruit whatever, and only serves to litter the ground with leaves."
"Ungrateful creatures!" said a voice from the Plane Tree. "You lie here in my cooling shade, and yet you say I am useless! Thus ungratefully, O Jupiter, do men receive their blessings!"
Our best blessings are often the least appreciated."""
text = nltk.Text(word.lower() for word in t.split(" "))
print text.similar('tree')
but I get
None
Why is that?
Edit
As per Andy's suggestion, I tried:
import nltk
import nltk.text
import nltk.corpus
t="""Two Travellers, walking in the noonday sun, sought the shade of a widespreading tree to rest As they lay looking up among the pleasant leaves, they saw that it was a Plane Tree
How useless is the Plane!" said one of them. "It bears no fruit whatever, and only serves to litter the ground with leaves."
"Ungrateful creatures!" said a voice from the Plane Tree. "You lie here in my cooling shade, and yet you say I am useless! Thus ungratefully, O Jupiter, do men receive their blessings!"
Our best blessings are often the least appreciated"""
idx = nltk.text.ContextIndex([word.lower() for word in t.split(" ")])
print idx.tokens()
idx.similar_words('tree')
for word in nltk.word_tokenize("tree"):
    print idx.similar_words(word)
but this gives me
['two', 'travellers,', 'walking', 'in', 'the', 'noonday', 'sun,', 'sought', 'the', 'shade', 'of', 'a', 'widespreading', 'tree', 'to', 'rest', 'as', 'they', 'lay', 'looking', 'up', 'among', 'the', 'pleasant', 'leaves,', 'they', 'saw', 'that', 'it', 'was', 'a', 'plane', 'tree', '\nhow', 'useless', 'is', 'the', 'plane!"', 'said', 'one', 'of', 'them.', '"it', 'bears', 'no', 'fruit', 'whatever,', 'and', 'only', 'serves', 'to', 'litter', 'the', 'ground', 'with', 'leaves."\n\n"ungrateful', 'creatures!"', 'said', 'a', 'voice', 'from', 'the', 'plane', 'tree.', '"you', 'lie', 'here', 'in', 'my', 'cooling', 'shade,', 'and', 'yet', 'you', 'say', 'i', 'am', 'useless!', 'thus', 'ungratefully,', 'o', 'jupiter,', 'do', 'men', 'receive', 'their', 'blessings!"\n\nour', 'best', 'blessings', 'are', 'often', 'the', 'least', 'appreciated']
[]
So the word 'tree' is in the tokens list. Why do I not get any output from the similar function?
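Two things are worth knowing here. First, Text.similar() prints its results and returns None, which is why `print text.similar('tree')` shows None. Second, similar_words looks for other words that occur in the same (left, right) context as the target; in a text this short, no other token shares a context with 'tree', so the result is empty. A stdlib-only sketch of the idea (my own illustration, not NLTK's actual implementation):

```python
from collections import defaultdict

def similar_words(tokens, target):
    # Map each (left, right) context to the set of words seen between them
    contexts = defaultdict(set)
    for left, word, right in zip(tokens, tokens[1:], tokens[2:]):
        contexts[(left, right)].add(word)
    # Collect other words that share at least one context with the target
    similar = set()
    for words in contexts.values():
        if target in words:
            similar |= words - {target}
    return sorted(similar)

tokens = "the tall tree and the tall bush and the red tree".split()
print(similar_words(tokens, 'tree'))   # ['bush'] - shares the ('tall', 'and') context
```

Run over the fable text above, a measure like this finds no other token sharing a context with 'tree', which is consistent with the empty list ContextIndex reports.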

Related

Why does it only get part of the text when using split in Python?

I wrote some code to read a long text file; it has 10000 English words in a txt file. I want to use split() to get all the words to train on. The code is like this:
with open('/train.txt', 'r') as fin:
    text = fin.read()
len(text)          # result is 10000
len(text.split())  # result is 2800
It only gets 2800 words when using split(), but I think it should be the whole text, and both len() results should be the same 10000.
Why? Is it due to my computer's limits, or is there a problem with my text?
The content of the text variable is as follows:
'My mother drove me to the airport with the windows rolled down It was seventy-five degrees in Phoenix the sky a perfect cloudless blue I was wearing my favorite shirtsleeve less white eyelet lace I was wearing it as a farewell gesture My carry-on item was a parka In the Olympic Peninsula of northwest Washington State a small town named Forks exists under a near-constant cover of clouds It rains on this inconsequential town more than any other place in the United States of America It was from this town and its gloomy omnipresent shade that my mother escaped with me when I was only a few months old It was in this town that I would been compelled to spend a month every summer until I was fourteen That was the year I finally put my foot down these past three summers my dad Charlie vacationed with me in California for two weeks instead It was to Forks that I now exiled myself an action that I took with great horror I detested Forks I loved Phoenix I loved the sun and the blistering heat I loved the vigorous sprawling cityBella my mom said to me the last of a thousand times before I got on the plane You don have to do this My mom looks like me except with short hair and laugh lines I felt a spasm of panic as I stared at her wide childlike eyes How could I leave my loving erratic hare-brained mother to fend for herself Of course she had Phil now so the bills would probably get paid there would be food in the refrigerator gas in her car and someone to call when she got lost but still I want to go I lied I would always been a bad liar but I would been saying this lie so frequently lately that it sounded almost convincing now Tell Charlie I said hi I willI ll see you soon she insisted You can come home whenever you want I ll come right back as soon as you need me But I could see the sacrifice in her eyes behind the promise Don worry about me I urged It ll be great I love you MomShe hugged my tightly for a minute and then I got on the plane and she was gone It a four-hour 
flight from Phoenix to Seattle another hour in a small plane up to Port Angeles and then an hour drive back down to Forks Flying did bother me the hour in the car with Charlie though I was a little worried about Charlie had really been fairly nice about the whole thing He seemed genuinely please that I was combing to live with him for the first time with any degree of permanence He would already gotten me registered for high school and was going to help me get a car But it was sure to be awkward with Charlie Neither of us was what anyone would call verbose and I did know what there was to say regardless I knew he was more than a little confused by my decision like my mother before me I had made a secret of my distaste for Forks When I landed in Port Angeles it was raining I did see it as an omen just unavoidable I would already said my goodbyes to the sun Charlie was waiting for me with the cruiser This I was expecting too Charlie is Police Chief Swan to the good people of Forks My primary motivation behind buying a car despite the scarcity of my funds was that I refused to be driven around town in a car with red and blue lights on top Nothing slows traffic down like a cop Charlie gave me an awkward one-armed hug when I stumbled my way off the plane It good to see you Bells he said smiling as he automatically caught and steadied me You haven changed much How Renee Mom fine It good to see you re too Dad I was allowed to call him Charlie to his face I had only a few bags Most of my Arizona clothes were too permeable for Washington My mom and I had pooled our resources to supplement my winter wardrobe but it was still scanty It all fit easily into the trunk of the cruiser I found a good car for you really cheap he announced when we were strapped in What kind of car I was suspicious of the way he said good car for you as opposed to just good car Well it a truck actually a Chevy Where did you find it Do you remember Billy Black from La Push La Push is the tiny Indian 
reservation on the coastNo He used to go fishing with us during the summer Charlie promptedThat would explain why I did remember him I do a good job of blocking painful unnecessary things from my memory He in a wheelchair now Charlie continued when I did respond so he can drive anymore and he offered to sell me his truck cheap What year is it I could see from his change of expression that this was the question he was hoping I would ask Well Billy done a lot of work on the engine it only a few years old really I hope he did think so little of me as to believe I would give up that easily When did he buy it He bought it in I think Did he buy it new Well no I think it was new in the early sixties or late fifties at the earliest he admitted sheepishly Dad I don really know anything about cars I would be able to fix it if anything went wrong and I could not afford a mechanic Really Bella the thing runs great They don build them like that anymore The thing I at the very least How cheap is cheap After all that was the part I could not compromise on Well honey I kind of already bought it for you As a homecoming gift Charlie peeked sideways at me with a hopeful expression Wow Free You did need to do that Dad I was going to buy myself a car I don mind I want you to be happy here He was looking ahead at the road when he said this Charlie was comfortable with expressing his emotions out loud I inherited that from him So I was looking straight ahead as I responded That really nice Dad Thanks I really appreciate it No need to add that my being happy in Forks is He did need to suffer along with me And I never looked a free truck in the mouth or engine Well now you re welcome he mumbled embarrassed by my thanksWe exchanged a few more comments on the weather which was wet and that was pretty much it for conversation We stared out the windows in silence It was better because it was raining yet though the clouds were dense and opaque It was easier because I knew what to expect of my 
day Mike came to sit by me in English and walked me to my next class with ChessClub Eric glaring at him all the while that was nattering People did look at me quite as much as they had yesterday I sat with a big group at lunch that included Mike Eric Jessica and several other people whose names and faces I now remembered I began to feel like I was treading water instead of drowning in it It was worse because I was tired I still could not sleep with the wind echoing around the house It was worse because Mr Varner called on me inTrig when my hand was raised and I had the wrong answer It was miserable because I had to play volleyball and the one time I did cringe out of the way of the ball I hit my teammate in the head with itAnd it was worse because Edward Cullen was in school at all All morning I was dreading lunch fearing his bizarre glares Part of mew anted to confront him and demand to know what his problem was While I was lying sleepless in my bed I even imagined what I would say But I knew myself too well to think I would really have the guts to do it I made the Cowardly Lion look like the terminator But when I walked into the cafeteria with Jessica trying to keep my eyes from sweeping the place for him and failing entirely I saw that his four siblings of sorts were sitting together at the same table and he was not with them Mike intercepted8 us and steered9 us to his table Jessica seemed elated by the attention and her friends quickly joined us But as I tried to listen to their easy chatter I was terribly uncomfortable waiting nervously for the moment he would arrive I hoped that he would simply ignore me when he came and prove my suspicions false He did come and as time passed I grew more and more tense I walked to Biology with more confidence when by the end of lunch he still had showed Mike who was taking on the qualities of a golden retriever walked faithfully by my side to class I held my breath at the door but Edward Cullen was there either I exhaled and 
went to my seat Mike followed talking about an upcoming trip to the beach He lingered by my desk till the bell rang Then he smiled at me wistfully and went to sit by a girl with braces and a bad perm It looked like I was going to have to do something about Mike and it would be easy Ina town like this where everyone lived on top of everyone else diplomacy was essential I had never been enormously tactful I had no practice dealing with overly friendly boys I was relieved that I had the desk to myself that Edward was absent I told myself that repeatedly But I could get rid of the nagging suspicion that I was the reason he was there It was ridiculous and egotistical to think that I could affect anyone that strongly It was impossible And yet I could not stop worrying that it was true When the school day was finally done and the blush was fading out of my cheeks from the volleyball incident I changed quickly back into my jeans and navy blue sweater I hurried from the girls locker room pleased to find that I had successfully evaded my retriever friend for the moment I walked swiftly out to the parking lot It was crowded now with fleeing students I got in my truck and dug through my bag to make sure I had what I needed Last night I would discovered that Charlie could not cook much besides fried eggs and bacon So I requested that I be assigned kitchen detail for the duration of my stay He was willing enough to hand over the keys to the banquet hall I also found out that he had no food in the house So I had my shopping list and the cash from the jar in the cupboard labeled FOOD\\\\x0cMONEY and I was on my way to the Thrift way I gunned my deafening engine to life ignoring the heads that turned in my direction and backed carefully into a place in the line of cars that were waiting to exit the parking lot As I waited trying to pretend that-the earsplitting rumble was coming from someone else car I saw the twoCullens and the Hale twins getting into their car It was the shiny 
newVolvo Of course I had noticed their clothes before I would been too mesmerized by their faces Now that I looked it was obvious that they were all dressed exceptionally well simply but in clothes that subtly hinted at designer origins With their remarkable good looks the style with which they carried themselves they could have worn dishrags and pulled it off It seemed excessive for them to have both looks and money But as far as I could tell life worked that way most of the time It did look as if it bought them any acceptance here No I did fully believe that The isolation must be their desire I could not imagine any door that would be opened by that degree of beauty They looked at my noisy truck as I passed them just like everyone else I kept my eyes straight forward and was relieved when I finally was free of the school grounds The Thrift way was not far from the school just a few streets south off the highway It was nice to be inside the supermarket it felt normal Idid the shopping at home and I fell into the pattern of the familiar task gladly The store was big enough inside that I could not hear the tapping of the rain on the roof to remind me where I was When I got home I unloaded all the groceries stuffing them in whereverI could find an open space I hoped Charlie would mind I wrapped potatoes in foil and stuck them in the oven to bake covered a steak in marinade and balanced it on top of a carton of eggs in the fridge When I was finished with that I took my book bag upstairs Before starting my homework I changed into a pair of dry sweats pulled my damp hair up into a pony-tail and checked my e-mail for the first time I had three messages Bella my mom wrote…Write me as soon as you get in Tell me how your flight was Is it raining I miss you already I am almost finished packing for Florida butI can find my pink blouse Do you know where I put it Phil says hi Mom I sighed and went to the next It was sent eight hours after the first Bella she wrote…Why haven you 
e-mailed me yet What are you waiting for Mom The last was from this morning Isabella If I haven heard from you by pm today I am calling Charlie I checked the clock I still had an hour but my mom was well known for the gun MomCalm down I am writing right now Don do anything rash Bella I sent that and began again MomEverything is great Of course it raining I was waiting for something to write about School isn bad just a little repetitive I met some nice kids who sit by me at lunch Your blouse is at the dry cleaners you were supposed to pick it upFriday Charlie bought me a truck can you believe it I love it It old but really sturdy which is good you know for me I miss you too I ll write again soon but I am not going to check my e-mail every five minutes Relax breathe I love you Bella I had decided to read Withering Heights the novel we were currently studying in English yet again for the fun of it and that what I was doing when Charlie came home I would lost track of the time and I hurried downstairs to take the potatoes out and put the steak in to broil Bella my father called out when he heard me on the stairs Who else I thought to myself Hey Dad welcome home Thanks He hung up his gun belt and stepped out of his boots as I bustled about the kitchen As far as I was aware he would never shot the gun-on the job But he kept it ready When I came here as a child he would always remove the bullets as soon as he walked in the door I guess he considered me old enough now not to shoot myself by accident and not depressed enough to shoot myself on purpose What for dinner" he asked warily My mother was an imaginative cook and her experiments were always edible I was surprised and sad that he seemed to remember that far back Steak and potatoes I answered and he looked relieved He seemed to feel awkward standing in the kitchen doing nothing he lumbered into the living room to watch TV while I worked We were both more comfortable that way I made a salad while the steaks cooked and 
set the table I called him in when dinner was ready and he sniffed appreciatively as he walked into the room'
The result of text.split() is as follows:
['My',
'mother',
'drove',
'me',
'to',
'the',
'airport',
'with',
'the',
'windows',
'rolled',
'down',
'It',
'was',
'seventy-five',
'degrees',
'in',
'Phoenix',
'the',
'sky',
'a',
'perfect',
'cloudless',
'blue',
'I',
'was',
'wearing',
'my',
'favorite',
'shirtsleeve',
'less',
'white',
'eyelet',
'lace',
'I',
'was',
'wearing',
'it',
'as',
'a',
'farewell',
'gesture',
'My',
'carry-on',
'item',
'was',
'a',
'parka',
'In',
'the',
'Olympic',
'Peninsula',
'of',
'northwest',
'Washington',
'State',
'a',
'small',
'town',
'named',
'Forks',
'exists',
'under',
'a',
'near-constant',
'cover',
'of',
'clouds',
'It',
'rains',
'on',
'this',
'inconsequential',
'town',
'more',
'than',
'any',
'other',
'place',
'in',
'the',
'United',
'States',
'of',
'America',
'It',
'was',
'from',
'this',
'town',
'and',
'its',
'gloomy',
'omnipresent',
'shade',
'that',
'my',
'mother',
'escaped',
'with',
'me',
'when',
'I',
'was',
'only',
'a',
'few',
'months',
'old',
'It',
'was',
'in',
'this',
'town',
'that',
'I',
'would',
'been',
'compelled',
'to',
'spend',
'a',
'month',
'every',
'summer',
'until',
'I',
'was',
'fourteen',
'That',
'was',
'the',
'year',
'I',
'finally',
'put',
'my',
'foot',
'down',
'these',
'past',
'three',
'summers',
'my',
'dad',
'Charlie',
'vacationed',
'with',
'me',
'in',
'California',
'for',
'two',
'weeks',
'instead',
'It',
'was',
'to',
'Forks',
'that',
'I',
'now',
'exiled',
'myself',
'an',
'action',
'that',
'I',
'took',
'with',
'great',
'horror',
'I',
'detested',
'Forks',
'I',
'loved',
'Phoenix',
'I',
'loved',
'the',
'sun',
'and',
'the',
'blistering',
'heat',
'I',
'loved',
'the',
'vigorous',
'sprawling',
'cityBella',
'my',
'mom',
'said',
'to',
'me',
'the',
'last',
'of',
'a',
'thousand',
'times',
'before',
'I',
'got',
'on',
'the',
'plane',
'You',
'don',
'have',
'to',
'do',
'this',
'My',
'mom',
'looks',
'like',
'me',
'except',
'with',
'short',
'hair',
'and',
'laugh',
'lines',
'I',
'felt',
'a',
'spasm',
'of',
'panic',
'as',
'I',
'stared',
'at',
'her',
'wide',
'childlike',
'eyes',
'How',
'could',
'I',
'leave',
'my',
'loving',
'erratic',
'hare-brained',
'mother',
'to',
'fend',
'for',
'herself',
'Of',
'course',
'she',
'had',
'Phil',
'now',
'so',
'the',
'bills',
'would',
'probably',
'get',
'paid',
'there',
'would',
'be',
'food',
'in',
'the',
'refrigerator',
'gas',
'in',
'her',
'car',
'and',
'someone',
'to',
'call',
'when',
'she',
'got',
'lost',
'but',
'still',
'I',
'want',
'to',
'go',
'I',
'lied',
'I',
'would',
'always',
'been',
'a',
'bad',
'liar',
'but',
'I',
'would',
'been',
'saying',
'this',
'lie',
'so',
'frequently',
'lately',
'that',
'it',
'sounded',
'almost',
'convincing',
'now',
'Tell',
'Charlie',
'I',
'said',
'hi',
'I',
'willI',
'll',
'see',
'you',
'soon',
'she',
'insisted',
'You',
'can',
'come',
'home',
'whenever',
'you',
'want',
'I',
'll',
'come',
'right',
'back',
'as',
'soon',
'as',
'you',
'need',
'me',
'But',
'I',
'could',
'see',
'the',
'sacrifice',
'in',
'her',
'eyes',
'behind',
'the',
'promise',
'Don',
'worry',
'about',
'me',
'I',
'urged',
'It',
'll',
'be',
'great',
'I',
'love',
'you',
'MomShe',
'hugged',
'my',
'tightly',
'for',
'a',
'minute',
'and',
'then',
'I',
'got',
'on',
'the',
'plane',
'and',
'she',
'was',
'gone',
'It',
'a',
'four-hour',
'flight',
'from',
'Phoenix',
'to',
'Seattle',
'another',
'hour',
'in',
'a',
'small',
'plane',
'up',
'to',
'Port',
'Angeles',
'and',
'then',
'an',
'hour',
'drive',
'back',
'down',
'to',
'Forks',
'Flying',
'did',
'bother',
'me',
'the',
'hour',
'in',
'the',
'car',
'with',
'Charlie',
'though',
'I',
'was',
'a',
'little',
'worried',
'about',
'Charlie',
'had',
'really',
'been',
'fairly',
'nice',
'about',
'the',
'whole',
'thing',
'He',
'seemed',
'genuinely',
'please',
'that',
'I',
'was',
'combing',
'to',
'live',
'with',
'him',
'for',
'the',
'first',
'time',
'with',
'any',
'degree',
'of',
'permanence',
'He',
'would',
'already',
'gotten',
'me',
'registered',
'for',
'high',
'school',
'and',
'was',
'going',
'to',
'help',
'me',
'get',
'a',
'car',
'But',
'it',
'was',
'sure',
'to',
'be',
'awkward',
'with',
'Charlie',
'Neither',
'of',
'us',
'was',
'what',
'anyone',
'would',
'call',
'verbose',
'and',
'I',
'did',
'know',
'what',
'there',
'was',
'to',
'say',
'regardless',
'I',
'knew',
'he',
'was',
'more',
'than',
'a',
'little',
'confused',
'by',
'my',
'decision',
'like',
'my',
'mother',
'before',
'me',
'I',
'had',
'made',
'a',
'secret',
'of',
'my',
'distaste',
'for',
'Forks',
'When',
'I',
'landed',
'in',
'Port',
'Angeles',
'it',
'was',
'raining',
'I',
'did',
'see',
'it',
'as',
'an',
'omen',
'just',
'unavoidable',
'I',
'would',
'already',
'said',
'my',
'goodbyes',
'to',
'the',
'sun',
'Charlie',
'was',
'waiting',
'for',
'me',
'with',
'the',
'cruiser',
'This',
'I',
'was',
'expecting',
'too',
'Charlie',
'is',
'Police',
'Chief',
'Swan',
'to',
'the',
'good',
'people',
'of',
'Forks',
'My',
'primary',
'motivation',
'behind',
'buying',
'a',
'car',
'despite',
'the',
'scarcity',
'of',
'my',
'funds',
'was',
'that',
'I',
'refused',
'to',
'be',
'driven',
'around',
'town',
'in',
'a',
'car',
'with',
'red',
'and',
'blue',
'lights',
'on',
'top',
'Nothing',
'slows',
'traffic',
'down',
'like',
'a',
'cop',
'Charlie',
'gave',
'me',
'an',
'awkward',
'one-armed',
'hug',
'when',
'I',
'stumbled',
'my',
'way',
'off',
'the',
'plane',
'It',
'good',
'to',
'see',
'you',
'Bells',
'he',
'said',
'smiling',
'as',
'he',
'automatically',
'caught',
'and',
'steadied',
'me',
'You',
'haven',
'changed',
'much',
'How',
'Renee',
'Mom',
'fine',
'It',
'good',
'to',
'see',
'you',
're',
'too',
'Dad',
'I',
'was',
'allowed',
'to',
'call',
'him',
'Charlie',
'to',
'his',
'face',
'I',
'had',
'only',
'a',
'few',
'bags',
'Most',
'of',
'my',
'Arizona',
'clothes',
'were',
'too',
'permeable',
'for',
'Washington',
'My',
'mom',
'and',
'I',
'had',
'pooled',
'our',
'resources',
'to',
'supplement',
'my',
'winter',
'wardrobe',
'but',
'it',
'was',
'still',
'scanty',
'It',
'all',
'fit',
'easily',
'into',
'the',
'trunk',
'of',
'the',
'cruiser',
'I',
'found',
'a',
'good',
'car',
'for',
'you',
'really',
'cheap',
'he',
'announced',
'when',
'we',
'were',
'strapped',
'in',
'What',
'kind',
'of',
'car',
'I',
'was',
'suspicious',
'of',
'the',
'way',
'he',
'said',
'good',
'car',
'for',
'you',
'as',
'opposed',
'to',
'just',
'good',
'car',
'Well',
'it',
'a',
'truck',
'actually',
'a',
'Chevy',
'Where',
'did',
'you',
'find',
'it',
'Do',
'you',
'remember',
'Billy',
'Black',
'from',
'La',
'Push',
'La',
'Push',
'is',
'the',
'tiny',
'Indian',
'reservation',
'on',
'the',
'coastNo',
'He',
'used',
'to',
'go',
'fishing',
'with',
'us',
'during',
'the',
'summer',
'Charlie',
'promptedThat',
'would',
'explain',
'why',
'I',
'did',
'remember',
'him',
'I',
'do',
'a',
'good',
'job',
'of',
'blocking',
'painful',
'unnecessary',
'things',
'from',
'my',
'memory',
'He',
'in',
'a',
'wheelchair',
'now',
'Charlie',
'continued',
'when',
'I',
'did',
'respond',
'so',
'he',
'can',
'drive',
'anymore',
'and',
'he',
'offered',
'to',
'sell',
'me',
'his',
'truck',
'cheap',
'What',
'year',
'is',
'it',
'I',
'could',
'see',
'from',
'his',
'change',
'of',
'expression',
'that',
'this',
'was',
'the',
'question',
'he',
'was',
'hoping',
'I',
'would',
'ask',
'Well',
'Billy',
'done',
'a',
'lot',
'of',
'work',
'on',
'the',
'engine',
'it',
'only',
'a',
'few',
'years',
'old',
'really',
'I',
'hope',
'he',
'did',
'think',
'so',
'little',
'of',
'me',
'as',
'to',
'believe',
'I',
'would',
'give',
'up',
'that',
'easily',
'When',
'did',
'he',
'buy',
'it',
'He',
'bought',
'it',
'in',
'I',
'think',
'Did',
'he',
'buy',
'it',
'new',
'Well',
'no',
'I',
'think',
'it',
'was',
'new',
'in',
'the',
'early',
'sixties',
'or',
'late',
'fifties',
'at',
'the',
'earliest',
'he',
'admitted',
'sheepishly',
'Dad',
'I',
'don',
'really',
'know',
'anything',
'about',
'cars',
'I',
'would',
'be',
'able',
'to',
'fix',
'it',
'if',
'anything',
'went',
'wrong',
'and',
'I',
'could',
'not',
'afford',
'a',
'mechanic',
'Really',
'Bella',
'the',
'thing',
'runs',
'great',
'They',
'don',
'build',
'them',
'like',
'that',
'anymore',
'The',
'thing',
'I',
'at',
'the',
'very',
'least',
'How',
'cheap',
'is',
'cheap',
...]
len(text) is the total number of characters in the file 'train.txt' (assuming ASCII text, this will be the same as your file size).
len(text.split()) is the total number of tokens in the file, as determined by your delimiter; with no argument, split() splits on runs of whitespace.
Sidenote: if your delimiter is \n, you can cross-verify this on Unix with wc -l < train.txt.
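To make the character/token distinction concrete (my own illustrative snippet, not from the question):

```python
text = "My mother drove me to the airport"

# len(text) counts every character, including the spaces
print(len(text))          # 33

# len(text.split()) counts whitespace-separated tokens
print(len(text.split()))  # 7
```

Since the average English word plus its trailing space is several characters long, a character count several times larger than the token count is exactly what you should expect.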

Does the HTTP response object change after read().decode('utf-8') with Python's urlopen?

I am currently learning Python 3 and was playing around with urlopen when I noticed that after read().decode('utf-8') the length of my HTTP response object became zero, and I can't figure out why it behaves like that.
from urllib.request import urlopen

story = urlopen('http://sixty-north.com/c/t.txt')
print(story.read().decode('utf-8'))
story_words = []
for line in story:
    line_words = line.decode('utf-8').split()
    for word in line_words:
        story_words.append(word)
story.close()
print(story_words)
On execution of the print command, the length of the HTTP response in story changes from 593 to 0, and an empty array is printed for story_words. If I remove the print command, the story_words array is populated.
Output with read().decode()
It was the best of times
it was the worst of times
it was the age of wisdom
it was the age of foolishness
it was the epoch of belief
it was the epoch of incredulity
it was the season of Light
it was the season of Darkness
it was the spring of hope
it was the winter of despair
we had everything before us
we had nothing before us
we were all going direct to Heaven
we were all going direct the other way
in short the period was so far like the present period that some of
its noisiest authorities insisted on its being received for good or for
evil in the superlative degree of comparison only
[]
Output without it -
['It', 'was', 'the', 'best', 'of', 'times', 'it', 'was', 'the', 'worst', 'of', 'times', 'it', 'was', 'the', 'age', 'of', 'wisdom', 'it', 'was', 'the', 'age', 'of', 'foolishness', 'it', 'was', 'the', 'epoch', 'of', 'belief', 'it', 'was', 'the', 'epoch', 'of', 'incredulity', 'it', 'was', 'the', 'season', 'of', 'Light', 'it', 'was', 'the', 'season', 'of', 'Darkness', 'it', 'was', 'the', 'spring', 'of', 'hope', 'it', 'was', 'the', 'winter', 'of', 'despair', 'we', 'had', 'everything', 'before', 'us', 'we', 'had', 'nothing', 'before', 'us', 'we', 'were', 'all', 'going', 'direct', 'to', 'Heaven', 'we', 'were', 'all', 'going', 'direct', 'the', 'other', 'way', 'in', 'short', 'the', 'period', 'was', 'so', 'far', 'like', 'the', 'present', 'period', 'that', 'some', 'of', 'its', 'noisiest', 'authorities', 'insisted', 'on', 'its', 'being', 'received', 'for', 'good', 'or', 'for', 'evil', 'in', 'the', 'superlative', 'degree', 'of', 'comparison', 'only']
Calling urlopen returns a file-like buffer object. With read you can get the response up to a given number of bytes, or the whole response until EOF when no argument is passed. After reading, the buffer is empty, so you need to save the returned value in a variable before printing.
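The same exhaustion can be reproduced without the network using an in-memory buffer (illustration only; a real HTTP response is not seekable, so there the only option is to save the data):

```python
import io

buf = io.BytesIO(b"It was the best of times\nit was the worst of times\n")

print(buf.read().decode('utf-8'))  # consumes the whole buffer
print(buf.read())                  # b'' - nothing left to read

# BytesIO can be rewound; an HTTP response cannot, so save the
# payload once and reuse the variable instead.
buf.seek(0)
data = buf.read().decode('utf-8')
words = data.split()
print(words[:4])   # ['It', 'was', 'the', 'best']
```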

Python3 - how can I sort the list by frequency of its elements? [duplicate]

This question already has answers here:
Sort list by frequency
(8 answers)
Closed 4 years ago.
I'm working on code that analyzes input text.
One of the functions I would like to ask for help with makes a list of the words used, in descending order of frequency.
By referring to similar topics on Stack Overflow, I was able to retain only alphanumeric characters (removing all quotation marks, punctuation, etc.) and put each word into a list.
Here is the list I have now (a variable called word_list):
['Hi', 'beautiful', 'creature', 'Said', 'by', 'Rothchild', 'the',
'biggest', 'enemy', 'of', 'Zun', 'Zun', 'started', 'get', 'afraid',
'of', 'him', 'As', 'her', 'best', 'friend', 'Lia', 'can', 'feel',
'her', 'fear', 'Why', 'the', 'the', 'hell', 'you', 'are', 'here']
(FYI, text file is just random fanfiction I found from the web)
However, I'm having trouble modifying this into a list ordered by descending frequency - for example, there are 3 occurrences of 'the' in that list, so 'the' becomes the first element; the next element would be 'of', which occurs 2 times.
I tried several things that looked similar to my case, but they keep raising errors (Counter, sorted).
Can someone teach me how to sort the list?
In addition, after sorting the list, how can I retain only one copy of repeated items? (My current idea is using a for loop and indexing - compare with the previous index, and remove it if it's the same.)
Thank you.
You can use a collections.Counter for your sorting in different ways:
from collections import Counter
lst = ['Hi', 'beautiful', 'creature', 'Said', 'by', 'Rothchild', 'the', 'biggest', 'enemy', 'of', 'Zun', 'Zun', 'started', 'get', 'afraid', 'of', 'him', 'As', 'her', 'best', 'friend', 'Lia', 'can', 'feel', 'her', 'fear', 'Why', 'the', 'the', 'hell', 'you', 'are', 'here']
c = Counter(lst) # mapping: {item: frequency}
# now you can use the counter directly via most_common (1.)
lst = [x for x, _ in c.most_common()]
# or as a sort key (2.)
lst = sorted(set(lst), key=c.get, reverse=True)
# ['the', 'Zun', 'of', 'her', 'Hi', 'hell', 'him', 'friend', 'Lia',
# 'get', 'afraid', 'Rothchild', 'started', 'by', 'can', 'Why', 'fear',
# 'you', 'are', 'biggest', 'enemy', 'Said', 'beautiful', 'here',
# 'best', 'creature', 'As', 'feel']
These approaches use either the Counter keys (1.) or set for the removal of duplicates.
However, if you want the sort to be stable with respect to the original list (keeping order of first occurrence for items of equal frequency), you can follow the collections.OrderedDict-based recipe for duplicate removal:
from collections import OrderedDict
lst = sorted(OrderedDict.fromkeys(lst), key=c.get, reverse=True)
# ['the', 'of', 'Zun', 'her', 'Hi', 'beautiful', 'creature', 'Said',
# 'by', 'Rothchild', 'biggest', 'enemy', 'started', 'get', 'afraid',
# 'him', 'As', 'best', 'friend', 'Lia', 'can', 'feel', 'fear', 'Why',
# 'hell', 'you', 'are', 'here']

From a list of strings, how do you get a tuple/list of words which contain 3 or more characters and are evenly spaced in Python? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Checking if a string's characters are ascending alphabetically and its ascent is evenly spaced python
I have a list of strings/words:
mylist = ['twas', 'brillig', 'and', 'the', 'slithy', 'toves', 'did', 'gyre', 'and', 'gimble', 'in', 'the', 'wabe', 'all', 'mimsy', 'were', 'the', 'borogoves', 'and', 'the', 'mome', 'raths', 'outgrabe', '"beware', 'the', 'jabberwock', 'my', 'son', 'the', 'jaws', 'that', 'bite', 'the', 'claws', 'that', 'catch', 'beware', 'the', 'jubjub', 'bird', 'and', 'shun', 'the', 'frumious', 'bandersnatch', 'he', 'took', 'his', 'vorpal', 'sword', 'in', 'hand', 'long', 'time', 'the', 'manxome', 'foe', 'he', 'sought', 'so', 'rested', 'he', 'by', 'the', 'tumtum', 'tree', 'and', 'stood', 'awhile', 'in', 'thought', 'and', 'as', 'in', 'uffish', 'thought', 'he', 'stood', 'the', 'jabberwock', 'with', 'eyes', 'of', 'flame', 'came', 'whiffling', 'through', 'the', 'tulgey', 'wood', 'and', 'burbled', 'as', 'it', 'came', 'one', 'two', 'one', 'two', 'and', 'through', 'and', 'through', 'the', 'vorpal', 'blade', 'went', 'snicker-snack', 'he', 'left', 'it', 'dead', 'and', 'with', 'its', 'head', 'he', 'went', 'galumphing', 'back', '"and', 'has', 'thou', 'slain', 'the', 'jabberwock', 'come', 'to', 'my', 'arms', 'my', 'beamish', 'boy', 'o', 'frabjous', 'day', 'callooh', 'callay', 'he', 'chortled', 'in', 'his', 'joy', '`twas', 'brillig', 'and', 'the', 'slithy', 'toves', 'did', 'gyre', 'and', 'gimble', 'in', 'the', 'wabe', 'all', 'mimsy', 'were', 'the', 'borogoves', 'and', 'the', 'mome', 'raths', 'outgrabe']
First I need to keep only the words which have 3 or more characters in them; I assume a for loop for that or something.
Then I need to get a list of the words whose letters increase from left to right alphabetically and are a fixed number apart (e.g. ('ace', 2) or ('ceg', 2); the step does not have to be 2). The list also has to be sorted in alphabetical order, and each element should be a tuple consisting of the word and the character difference.
I think I have to use a for loop, but I'm not sure how to use it in this case, and I'm not sure how to do the second part.
For the list above the answer I should get is:
([])
I do not have the newest version of python.
Any help is greatly appreciated.
You should probably start by learning how to use a for loop. A for loop will get things out of a collection and assign them to a variable:
for letter in "strings are collections":
    print letter
Or..
for thing in ['this is a list', 'of', 4, 'things']:
    if thing == 4:
        print '4 is in the list'
If you're able to do more than this, then try something, figure out where you get stuck, and ask more specifically what you need help with.
Take this problem in steps
To filter words with length >= 3
[w for w in mylist if len(w) >= 3]
To see if the letters increase at a regular interval, calculate the difference between consecutive letters, create a set of those differences, and check if its length == 1:
diff = lambda word: len({ord(n) - ord(c) for n, c in zip(word[1:], word)}) == 1
Now use this new function to filter the remaining words
[w for w in mylist if len(w) >= 3 and diff(w)]
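Putting the two steps together, here is a sketch that returns the sorted (word, step) tuples the question asks for; the names `step` and `evenly_spaced` are mine, not from the question, and the check `d > 0` enforces the "increasing from left to right" requirement that the bare set-length test misses:

```python
def step(word):
    # set of differences between consecutive letters, e.g. 'ace' -> {2}
    diffs = {ord(b) - ord(a) for a, b in zip(word, word[1:])}
    if len(diffs) == 1:
        d = diffs.pop()
        if d > 0:        # letters must increase left to right
            return d
    return None          # not evenly spaced (or not increasing)

def evenly_spaced(words):
    # unique words of length >= 3 whose letters ascend by a fixed amount,
    # as alphabetically sorted (word, step) tuples
    return sorted((w, step(w)) for w in set(words)
                  if len(w) >= 3 and step(w) is not None)

print(evenly_spaced(['ace', 'abc', 'the', 'cat']))
# [('abc', 1), ('ace', 2)]
```

Run over `mylist` from the question this yields an empty list, which matches the expected answer the asker gives.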

Technique to remove common words(and their plural versions) from a string

I am attempting to find tags(keywords) for a recipe by parsing a long string of text. The text contains the recipe ingredients, directions and a short blurb.
What do you think would be the most efficient way to remove common words from the tag list?
By common words, I mean words like: 'the', 'at', 'there', 'their' etc.
I have 2 methodologies I can use, which do you think is more efficient in terms of speed and do you know of a more efficient way I could do this?
Methodology 1:
- Determine the number of times each word occurs (using Counter from the collections library)
- Have a list of common words and remove every common word from the Counter object by attempting to delete that key if it exists.
- Therefore the speed will be determined by the length of the variable delims
from collections import Counter

delims = ['there', "there's", 'theres', 'they', "they're"]
# the above will end up being a really long list!
word_freq = Counter(recipe_str.lower().split())
for delim in set(delims):
    del word_freq[delim]
return word_freq.most_common()
Methodology 2:
- For common words that can be plural, look at each word in the recipe string and check whether it contains the non-plural version of a common word. E.g., for the string "There's a test", check each word to see if it contains "there" and delete it if it does.
delims = ['this', 'at', 'them']        # words that can't be plural
partial_delims = ['there', 'they']     # words that could occur in many forms
word_freq = Counter(recipe_str.lower().split())
for delim in set(delims):
    del word_freq[delim]
# really slow
for delim in set(partial_delims):
    for word in list(word_freq):       # iterate over a copy so we can delete
        if word.find(delim) != -1:
            del word_freq[word]
return word_freq.most_common()
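For comparison, both methodologies can be wrapped up as runnable functions; this is a sketch under the assumption that `recipe_str` is a plain string, and the function names and the tiny delimiter sets are mine, for illustration only:

```python
from collections import Counter

DELIMS = {'this', 'at', 'them'}        # exact-match common words
PARTIAL_DELIMS = {'there', 'they'}     # drop any word containing these

def tags_exact(recipe_str):
    """Methodology 1: count words, then delete exact common words."""
    word_freq = Counter(recipe_str.lower().split())
    for delim in DELIMS:
        del word_freq[delim]           # Counter ignores missing keys on del
    return word_freq.most_common()

def tags_partial(recipe_str):
    """Methodology 2: additionally drop words containing a partial delim."""
    word_freq = Counter(recipe_str.lower().split())
    for delim in DELIMS:
        del word_freq[delim]
    for word in list(word_freq):       # copy: can't delete while iterating
        if any(d in word for d in PARTIAL_DELIMS):
            del word_freq[word]
    return word_freq.most_common()
```

Note that `Counter.__delitem__`, unlike a plain dict's, does not raise `KeyError` for a missing key, so Methodology 1 needs no existence check.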
I'd just do something like this:
from nltk.corpus import stopwords

s = set(stopwords.words('english'))
txt = "a long string of text about him and her"
print filter(lambda w: w not in s, txt.split())
which prints
['long', 'string', 'text']
and in terms of complexity should be O(n) in number of words in the string, if you believe the hashed set lookup is O(1).
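If NLTK isn't installed, the same O(n) filter works against any hand-rolled set; the tiny stopword set below is only a stand-in for NLTK's list, just enough to reproduce the example:

```python
# Minimal stand-in for nltk's English stopword list; extend as needed.
STOPWORDS = {'a', 'of', 'about', 'and', 'the', 'him', 'her'}

def content_words(txt):
    # set membership is O(1) on average, so the filter is O(n) in word count
    return [w for w in txt.lower().split() if w not in STOPWORDS]

print(content_words("a long string of text about him and her"))
# ['long', 'string', 'text']
```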
FWIW, my version of NLTK defines 127 stopwords:
'all', 'just', 'being', 'over', 'both', 'through', 'yourselves', 'its', 'before', 'herself', 'had', 'should', 'to', 'only', 'under', 'ours', 'has', 'do', 'them', 'his', 'very', 'they', 'not', 'during', 'now', 'him', 'nor', 'did', 'this', 'she', 'each', 'further', 'where', 'few', 'because', 'doing', 'some', 'are', 'our', 'ourselves', 'out', 'what', 'for', 'while', 'does', 'above', 'between', 't', 'be', 'we', 'who', 'were', 'here', 'hers', 'by', 'on', 'about', 'of', 'against', 's', 'or', 'own', 'into', 'yourself', 'down', 'your', 'from', 'her', 'their', 'there', 'been', 'whom', 'too', 'themselves', 'was', 'until', 'more', 'himself', 'that', 'but', 'don', 'with', 'than', 'those', 'he', 'me', 'myself', 'these', 'up', 'will', 'below', 'can', 'theirs', 'my', 'and', 'then', 'is', 'am', 'it', 'an', 'as', 'itself', 'at', 'have', 'in', 'any', 'if', 'again', 'no', 'when', 'same', 'how', 'other', 'which', 'you', 'after', 'most', 'such', 'why', 'a', 'off', 'i', 'yours', 'so', 'the', 'having', 'once'
obviously you can provide your own set; I'm in agreement with the comment on your question that it's probably easiest (and fastest) to just provide all the variations you want to eliminate up front, unless you want to eliminate a lot more words than this; at that point it becomes more a question of spotting interesting ones than eliminating spurious ones.
Your problem domain is "Natural Language Processing".
If you don't want to reinvent the wheel, use NLTK, search for stemming in the docs.
Given that NLP is one of the hardest subjects in computer science, reinventing this wheel is a lot of work...
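The stemming this answer points to can be approximated, for plurals only, with a crude suffix stripper; the sketch below is my own illustration and handles only a few regular English patterns (NLTK's `PorterStemmer` or a lemmatizer is the real tool):

```python
def naive_singular(word):
    """Very rough plural stripper; real code should use nltk.stem."""
    if word.endswith('ies') and len(word) > 4:
        return word[:-3] + 'y'             # berries -> berry
    if word.endswith('es') and word[:-2].endswith(('ch', 'sh', 's', 'x', 'z')):
        return word[:-2]                   # dishes -> dish
    if word.endswith('s') and not word.endswith('ss'):
        return word[:-1]                   # eggs -> egg, but glass stays
    return word
```

Normalizing every word with something like this before counting means the delimiter list only needs singular forms, which is the point of stemming here.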
You ask about speed, but you should be more concerned with accuracy. Both your suggestions will make a lot of mistakes, removing either too much or too little (for example, there are a lot of words that contain the substring "at"). I second the suggestion to look into the nltk module. In fact, one of the early examples in the NLTK book involves removing common words until the most common remaining ones reveal something about the genre. You'll get not only tools, but instruction on how to go about it.
Anyway you'll spend much longer writing your program than your computer will spend executing it, so focus on doing it well.
