Does anyone know how to add row numbers? - python

I can open this file directly from the net, and I want to add a row number to each line based on a rule: if the user wants the header row numbered, numbering starts from 1 at the header; if not, numbering starts from the next line. This is my code; I tried a lot but it doesn't work. My output looks like the attached picture. Does anyone know how to solve this problem? Thanks in advance!
import sys

class Main:
    def task1(self):
        print('*' * 30, 'Task')
        import urllib.request
        # url
        url = 'http://www.born.nhely.hu/group_list.txt'
        # Initiate a request to get a response
        while True:
            try:
                response = urllib.request.urlopen(url)
            except Exception as e:
                print('An error has occurred, the request is being made again, the error message is as follows:', e)
            else:
                break
        # Print all student information
        content = response.read().decode('utf-8')
        # add row number
        header_row = input("Do you want to know header_row numbers? Y OR N?")
        if header_row == 'Y':
            for i, line in enumerate(content, start=1):
                print(f'{i},{line}')
        else:
            for i, line in enumerate(content, start=0):
                print('{},{}'.format(i, line.strip()))

    def start(self):
        self.task1()

Main().start()

Have a look at the data you are downloading:
Name;Short name;Email;Country;Other spoken languages
ABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?
AGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English
...
Now look at the results you are getting:
1,N
2,a
3,m
4,e
5,;
6,S
7,h
8,o
...
It should be apparent that you are looping character by character; not line by line.
When you have:
for i, line in enumerate(content, start=1):
    print(f'{i},{line}')
content is a string -- not a list of lines -- so you will loop over the string character by character with the for loop.
So to fix, do:
for i, line in enumerate(content.splitlines(), start=1):
    print(f'{i},{line}')
Or, you can change how you read from the server, so you get a list of lines instead of one long string:
content = response.readlines()
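Note that response.readlines() gives you bytes objects, not strings, so each line would still need decoding. A minimal sketch of that path (using io.BytesIO as a stand-in for the real urlopen response, which also yields bytes):

```python
import io

# Stand-in for the urlopen response; the real one also yields bytes lines
response = io.BytesIO(b'Name;Short name;Email\r\nABOUELHASSAN Shehab;?;dwedar909#gmail.com\r\n')

for i, raw in enumerate(response.readlines(), start=1):
    # each element is a bytes object ending in b'\r\n'
    line = raw.decode('utf-8').strip()
    print(f'{i},{line}')
```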

You're reading the .txt content into one big string... if you use .readlines() instead of .read(), you can achieve what you want.
You should modify this:
# Print all student information
content = response.read().decode('utf-8')
To this:
# Print all student information
content = response.readlines()
You can use repr() to take a look at your data:
print(repr(content))
'Name;Short name;Email;Country;Other spoken languages\r\nABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?\r\nAGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English\r\nAMIN Asjad;?;;?;?\r\nATILA Arda Burak;Arda;arda_atila#hotmail.com;Turkey;English\r\nBELTRAN CASTRO Carlos Ricardo;Ricardo;crbeltrancas#gmail.com;Colombia;English, Chinese\r\nBhatti Muhammad Hasan;?;;?;?\r\nCAKIR Alp Hazar;Alp;alphazarc#gmail.com;Turkey;English\r\nDENG Zhihui;Deng;dzhfalcon0727#gmail.com;China;English\r\nDURUER Ahmet Enes;Ahmet / kahverengi;hello#ahmetduruer.com;Turkey;English\r\nENKHZAYA Jagar;Jager;japman2400#gmail.com;Mongolia;English\r\nGHAIBAH Sanaa;Sanaa;sanaagheibeh12#gmail.com;Syria;English\r\nGUO Ruizheng;?;ruizhengguo#gmail.com;China;English\r\nGURBANZADE Gurban;Qurban;gurbanzade01#gmail.com;Azeribaijan;English, Russian, Turkish\r\nHASNAIN Syed Muhammad;Hasnain;syedhasnainhijazy313#gmail.com;Pakistan;?\r\nISMAYILOV Firdovsi;Firi;firiisi#gmail.com;Azeribaijan ?;English,Russian,Turkish\r\nKINGRANI Muskan;Muskan;muskankingrani4#gmail.com;India;English\r\nKOKO Susan Kekeli Ruth;Susan;susankoko3#gmail.com;Ghana;N/A\r\nKOLA-OLALEYE Adeola Damilola;Adeola;inboxadeola#gmail.com;Nigeria;French\r\nLEWIS Madison Buse;?;madisonbuse#yahoo.com;Turkey;Turkish\r\nLI Ting;Ting;514053044#qq.com;China;English\r\nMARUSENKO Svetlana;Svetlana;svetlana.maru#gmail.com;Russia;English, German\r\nMOHANTY Cyrus;cyrus;cyrusmohanty5261#gmail.com;India;English\r\nMOTHOBI Thabo Emmanuel;thabo;thabomothobi#icloud.com;South Africa;English\r\nNayudu Yashmit Vinay;?;;?;?\r\nPurevsuren Davaadorj;?;Purevsuren.davaadorj99#gmail.com;Mongolia ?;English\r\nSAJID Anoosha;Anoosha;anooshasajid12#gmail.com;Pakistan;English\r\nSHANG Rongxiang;Xiang;1074482757#qq.com;China;English\r\nSU Haobo;Su;2483851740#qq.com;China;English\r\nTAKEUCHI ROSSMAN Elly;Elly;elliebanana10th#gmail.com;Japan;English\r\nULUSOY Nedim Can;Nedim;nedimcanulusoy#gmail.com;Turkey;English, Hungarian\r\nXuan 
Qijian;Xuan;xjwjadon#gmail.com;China ?;?\r\nYUAN Gaopeng;Yuan;1277237374#qq.com;China;English\r\n'
vs
print(repr(content))
[b'Name;Short name;Email;Country;Other spoken languages\r\n', b'ABOUELHASSAN Shehab Ibrahim Adbelazin;?;dwedar909#gmail.com;?;?\r\n', b'AGHAEI HOSSEIN ABADI Mohammad Mehdi;Matt;mahdiaghaei355#gmail.com;Iran;English\r\n', b'AMIN Asjad;?;;?;?\r\n', b'ATILA Arda Burak;Arda;arda_atila#hotmail.com;Turkey;English\r\n', b'BELTRAN CASTRO Carlos Ricardo;Ricardo;crbeltrancas#gmail.com;Colombia;English, Chinese\r\n', b'Bhatti Muhammad Hasan;?;;?;?\r\n', b'CAKIR Alp Hazar;Alp;alphazarc#gmail.com;Turkey;English\r\n', b'DENG Zhihui;Deng;dzhfalcon0727#gmail.com;China;English\r\n', b'DURUER Ahmet Enes;Ahmet / kahverengi;hello#ahmetduruer.com;Turkey;English\r\n', b'ENKHZAYA Jagar;Jager;japman2400#gmail.com;Mongolia;English\r\n', b'GHAIBAH Sanaa;Sanaa;sanaagheibeh12#gmail.com;Syria;English\r\n', b'GUO Ruizheng;?;ruizhengguo#gmail.com;China;English\r\n', b'GURBANZADE Gurban;Qurban;gurbanzade01#gmail.com;Azeribaijan;English, Russian, Turkish\r\n', b'HASNAIN Syed Muhammad;Hasnain;syedhasnainhijazy313#gmail.com;Pakistan;?\r\n', b'ISMAYILOV Firdovsi;Firi;firiisi#gmail.com;Azeribaijan ?;English,Russian,Turkish\r\n', b'KINGRANI Muskan;Muskan;muskankingrani4#gmail.com;India;English\r\n', b'KOKO Susan Kekeli Ruth;Susan;susankoko3#gmail.com;Ghana;N/A\r\n', b'KOLA-OLALEYE Adeola Damilola;Adeola;inboxadeola#gmail.com;Nigeria;French\r\n', b'LEWIS Madison Buse;?;madisonbuse#yahoo.com;Turkey;Turkish\r\n', b'LI Ting;Ting;514053044#qq.com;China;English\r\n', b'MARUSENKO Svetlana;Svetlana;svetlana.maru#gmail.com;Russia;English, German\r\n', b'MOHANTY Cyrus;cyrus;cyrusmohanty5261#gmail.com;India;English\r\n', b'MOTHOBI Thabo Emmanuel;thabo;thabomothobi#icloud.com;South Africa;English\r\n', b'Nayudu Yashmit Vinay;?;;?;?\r\n', b'Purevsuren Davaadorj;?;Purevsuren.davaadorj99#gmail.com;Mongolia ?;English\r\n', b'SAJID Anoosha;Anoosha;anooshasajid12#gmail.com;Pakistan;English\r\n', b'SHANG Rongxiang;Xiang;1074482757#qq.com;China;English\r\n', b'SU Haobo;Su;2483851740#qq.com;China;English\r\n', b'TAKEUCHI 
ROSSMAN Elly;Elly;elliebanana10th#gmail.com;Japan;English\r\n', b'ULUSOY Nedim Can;Nedim;nedimcanulusoy#gmail.com;Turkey;English, Hungarian\r\n', b'Xuan Qijian;Xuan;xjwjadon#gmail.com;China ?;?\r\n', b'YUAN Gaopeng;Yuan;1277237374#qq.com;China;English\r\n']
Also, instead of hard-coding the charset as utf-8, you can use response.headers.get_content_charset()
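Putting the two fixes together, here is a minimal sketch; the function names (fetch_text, number_lines) are my own, chosen so the header option from the question is easy to test, not names from the original code:

```python
import urllib.request

def fetch_text(url):
    # Prefer the charset the server declares; fall back to utf-8
    response = urllib.request.urlopen(url)
    charset = response.headers.get_content_charset() or 'utf-8'
    return response.read().decode(charset)

def number_lines(content, include_header=True):
    # Header row numbered 1 when requested; otherwise numbering starts at 0,
    # matching the else branch of the original code
    start = 1 if include_header else 0
    return [f'{i},{line}' for i, line in enumerate(content.splitlines(), start=start)]

# Usage (hits the network):
# for row in number_lines(fetch_text('http://www.born.nhely.hu/group_list.txt')):
#     print(row)
```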

Related

Get specific value within a file in python

Hello, I have a file such as:
line .....
line ...content....
SEs for parameters:
0.290391 0.273460 0.236199 0.177329 0.205789 0.221322 0.283763 0.133840 0.119349 0.161495 0.166068 0.340432 0.267828 0.211030 0.175328 0.201448 0.172427 0.244625 0.118869 0.070389 0.085757 0.121992 0.295142 0.371023 0.286122 0.114233 0.191837 0.086125 0.119095 0.061429 0.116536 0.030760 0.018447
contennn
llinnee
some stuf ...
and I would like to get the last value after the "SEs for parameters:" match (0.018447)
and save it into a variable called Number, so that I should get:
print(Number)
0.018447
Does someone have an idea using python3?
Well, I found it by using:
import re

with open("path/file.txt", "r") as ifile:
    for line in ifile:
        if line.startswith("SEs for parameters:"):
            SE = next(ifile, ' ').strip()
            Number = re.split(r'\s+', SE)[-1]
            print(Number)
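A slightly more defensive variant of the same idea, wrapped in a function for reuse (the name last_se is mine, not from the question):

```python
import re

def last_se(path):
    """Return the last SE value on the line following 'SEs for parameters:'."""
    with open(path) as f:
        for line in f:
            if line.startswith("SEs for parameters:"):
                # The values live on the next line; split it on whitespace
                values = re.split(r'\s+', next(f, '').strip())
                if values and values[-1]:
                    return float(values[-1])
    # No match found
    return None
```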

Python parse large db over 4gb

I'm trying to parse a db file with Python that is over 4 GB.
Example from the db file:
% Tags relating to '217.89.104.48 - 217.89.104.63'
% RIPE-USER-RESOURCE
inetnum: 194.243.227.240 - 194.243.227.255
netname: PRINCESINDUSTRIEALIMENTARI
remarks: INFRA-AW
descr: PRINCES INDUSTRIE ALIMENTARI
descr: Provider Local Registry
descr: BB IBS
country: IT
admin-c: DUMY-RIPE
tech-c: DUMY-RIPE
status: ASSIGNED PA
notify: order.manager2#telecomitalia.it
mnt-by: INTERB-MNT
changed: unread#ripe.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
% Tags relating to '194.243.227.240 - 194.243.227.255'
% RIPE-USER-RESOURCE
inetnum: 194.16.216.176 - 194.16.216.183
netname: SE-CARLSTEINS
descr: CARLSTEINS TRAFIK AB
org: ORG-CTA17-RIPE
country: SE
admin-c: DUMY-RIPE
tech-c: DUMY-RIPE
status: ASSIGNED PA
notify: mntripe#telia.net
mnt-by: TELIANET-LIR
changed: unread#ripe.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
I want to parse each block starting with % Tags relating to
and out of the block I want to extract the inetnum and first descr
This is what I have so far (updated):
import re

with open('test.db', "r") as f:
    content = f.read()
    r = re.compile(r'descr:\s+(.*?)\n', re.IGNORECASE)
    res = r.findall(content)
    print res
As it's a file of over 4 GB, you don't want to read it all at once with f.read();
use the file object as an iterator instead (when you iterate over a file you get one line after the other).
The following generator should do the job:
def parse(filename):
    current = None
    for l in open(filename):
        if l.startswith("% Tags relating to"):
            if current is not None:
                yield current
            current = {}
        elif l.startswith("inetnum:"):
            current["inetnum"] = l.split(":", 1)[1].strip()
        elif l.startswith("descr") and "descr" not in current:
            current["descr"] = l.split(":", 1)[1].strip()
    if current is not None:
        yield current
and you can use it as follows:
for record in parse("test.db"):
    print(record)
result on the test file:
{'inetnum': '194.243.227.240 - 194.243.227.255', 'descr': 'PRINCES INDUSTRIE ALIMENTARI'}
{'inetnum': '194.16.216.176 - 194.16.216.183', 'descr': 'CARLSTEINS TRAFIK AB'}
If you only want the first descr:
r = re.compile(r'descr:\s+(.*?)\n(?:descr:.*\n)*', re.IGNORECASE)
If you want inetnum and the first descr:
r = re.compile(r'(?:descr:\s+(.*?)\n(?:descr:.*\n)*)|(?:inetnum:\s+(.*?)\n)', re.IGNORECASE)
res = [a + b for (a, b) in r.findall(content)]
I must admit I made no use of "% Tags relating to", and I assume that all descr lines are consecutive.
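As a quick sanity check, here is the alternating pattern run over two blocks trimmed from the sample above; each findall tuple has exactly one non-empty group, so concatenating the pair keeps whichever side matched:

```python
import re

content = (
    "inetnum:         194.243.227.240 - 194.243.227.255\n"
    "descr:           PRINCES INDUSTRIE ALIMENTARI\n"
    "descr:           Provider Local Registry\n"
    "inetnum:         194.16.216.176 - 194.16.216.183\n"
    "descr:           CARLSTEINS TRAFIK AB\n"
)
r = re.compile(r'(?:descr:\s+(.*?)\n(?:descr:.*\n)*)|(?:inetnum:\s+(.*?)\n)', re.IGNORECASE)
# every tuple from findall has one empty group, so a + b keeps the non-empty one
print([a + b for (a, b) in r.findall(content)])
```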

BeautifulSoup in Python not parsing right

I am running Python 2.7.5 and using the built-in html parser for what I am about to describe.
The task I am trying to accomplish is to take a chunk of html that is essentially a recipe. Here is an example.
html_chunk = "<h1>Miniature Potato Knishes</h1><p>Posted by bettyboop50 at recipegoldmine.com May 10, 2001</p><p>Makes about 42 miniature knishes</p><p>These are just yummy for your tummy!</p><p>3 cups mashed potatoes (about<br> 2 very large potatoes)<br>2 eggs, slightly beaten<br>1 large onion, diced<br>2 tablespoons margarine<br>1 teaspoon salt (or to taste)<br>1/8 teaspoon black pepper<br>3/8 cup Matzoh meal<br>1 egg yolk, beaten with 1 tablespoon water</p><p>Preheat oven to 400 degrees F.</p><p>Sauté diced onion in a small amount of butter or margarine until golden brown.</p><p>In medium bowl, combine mashed potatoes, sautéed onion, eggs, margarine, salt, pepper, and Matzoh meal.</p><p>Form mixture into small balls about the size of a walnut. Brush with egg yolk mixture and place on a well-greased baking sheet and bake for 20 minutes or until well browned.</p>"
The goal is to separate out the header, junk, ingredients, instructions, serving, and number of ingredients.
Here is my code that accomplishes that
from bs4 import BeautifulSoup

def list_to_string(list):
    joined = ""
    for item in list:
        joined += str(item)
    return joined

def get_ingredients(soup):
    for p in soup.find_all('p'):
        if p.find('br'):
            return p

def get_instructions(p_list, ingredient_index):
    instructions = []
    instructions += p_list[ingredient_index+1:]
    return instructions

def get_junk(p_list, ingredient_index):
    junk = []
    junk += p_list[:ingredient_index]
    return junk

def get_serving(p_list):
    for item in p_list:
        item_str = str(item).lower()
        if ("yield" or "make" or "serve" or "serving") in item_str:
            yield_index = p_list.index(item)
            del p_list[yield_index]
            return item

def ingredients_count(ingredients):
    ingredients_list = ingredients.find_all(text=True)
    return len(ingredients_list)

def get_header(soup):
    return soup.find('h1')

def html_chunk_splitter(soup):
    ingredients = get_ingredients(soup)
    if ingredients == None:
        error = 1
        header = ""
        junk_string = ""
        instructions_string = ""
        serving = ""
        count = ""
    else:
        p_list = soup.find_all('p')
        serving = get_serving(p_list)
        ingredient_index = p_list.index(ingredients)
        junk_list = get_junk(p_list, ingredient_index)
        instructions_list = get_instructions(p_list, ingredient_index)
        junk_string = list_to_string(junk_list)
        instructions_string = list_to_string(instructions_list)
        header = get_header(soup)
        error = ""
        count = ingredients_count(ingredients)
    return (header, junk_string, ingredients, instructions_string,
            serving, count, error)
It works well except in situations where I have chunks that contain strings like "Sauté", because soup = BeautifulSoup(html_chunk) causes "Sauté" to turn into "SautÃ©". This is a problem because I have a huge csv file of recipes like the html_chunk and I'm trying to structure all of them nicely and then get the output back into a database. I tried checking whether "Sauté" comes out right using an html previewer and it still comes out as "SautÃ©". I don't know what to do about this.
What's stranger is that when I do what BeautifulSoup's documentation shows
BeautifulSoup("Sacré bleu!")
# <html><head></head><body>Sacré bleu!</body></html>
I get
# SacrÃ© bleu!
But my colleague tried that on his Mac, running from terminal, and he got exactly what the documentation shows.
I really appreciate all your help. Thank you.
This is not a parsing problem; it is about encoding, rather.
Whenever working with text which might contain non-ASCII characters (or in Python programs which contain such characters, e.g. in comments or docstrings), you should put a coding cookie in the first or - after the shebang line - second line:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
... and make sure this matches your file encoding (with vim: :set fenc=utf-8).
BeautifulSoup tries to guess the encoding and sometimes gets it wrong; however, you can specify the encoding by adding the from_encoding parameter, for example:
soup = BeautifulSoup(html_text, from_encoding="UTF-8")
The encoding is usually available in the header of the webpage
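For reference, the mangling described above is plain mojibake: UTF-8 bytes read back as Latin-1. A stdlib-only demonstration of both the breakage and the repair:

```python
# 'é' is the two bytes 0xC3 0xA9 in UTF-8; decoded as Latin-1 those
# bytes become the two characters 'Ã' and '©'
mangled = 'Sauté'.encode('utf-8').decode('latin-1')
print(mangled)   # SautÃ©

# Decoding with the encoding the bytes were actually written in undoes it
restored = mangled.encode('latin-1').decode('utf-8')
print(restored)  # Sauté
```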

Script like google suggest in python

I am writing a script that works like Google Suggest. The problem is that I am trying to get suggestions for the next 2 most likely words.
The example uses a txt file, working_bee.txt. When typing the text "mis" I should get suggestions like "Miss Mary, Miss Taylor, ...", but I only get "Miss, ...". I suspect the Ajax responseText method gives only a single word?
Any ideas what is wrong?
# Something that looks like Google suggest
def count_words(xFile):
    frequency = {}
    words = []
    for l in open(xFile, "rt"):
        l = l.strip().lower()
        for r in [',', '.', "'", '"', "!", "?", ":", ";"]:
            l = l.replace(r, " ")
        words += l.split()
    for i in range(len(words)-1):
        frequency[words[i]+" "+words[i+1]] = frequency.get(words[i]+" "+words[i+1], 0) + 1
    return frequency

# read valid words from file
ws = count_words("c:/mod_python/working_bee.txt").keys()

def index(req):
    req.content_type = "text/html"
    return '''
<script>
function complete(q) {
    var xhr, ws, e
    e = document.getElementById("suggestions")
    if (q.length == 0) {
        e.innerHTML = ''
        return
    }
    xhr = new XMLHttpRequest()
    xhr.open('GET', 'suggest_from_file.py/complete?q=' + q, true)
    xhr.onreadystatechange = function() {
        if (xhr.readyState == 4) {
            ws = eval(xhr.responseText)
            e.innerHTML = ""
            for (i = 0; i < ws.length; i++)
                e.innerHTML += ws[i] + "<br>"
        }
    }
    xhr.send(null)
}
</script>
<input type="text" onkeyup="complete(this.value)">
<div id="suggestions"></div>
'''

def complete(req, q):
    req.content_type = "text"
    return [w for w in ws if w.startswith(q)]
txt file:
IV. Miss Taylor's Working Bee
"So you must. Well, then, here goes!" Mr. Dyce swung her up to his shoulder and went, two steps at a time, in through the crowd of girls, so that he arrived there first when the door was opened. There in the hall stood Miss Mary Taylor, as pretty as a pink.
"I heard there was to be a bee here this afternoon, and I've brought Phronsie; that's my welcome," he announced.
"See, I've got a bag," announced Phronsie from her perch, and holding it forth.
So the bag was admired, and the girls trooped in, going up into Miss Mary's pretty room to take off their things. And presently the big library, with the music-room adjoining, was filled with the gay young people, and the bustle and chatter began at once.
"I should think you'd be driven wild by them all wanting you at the same minute." Mr. Dyce, having that desire at this identical time, naturally felt a bit impatient, as Miss Mary went about inspecting the work, helping to pick out a stitch here and to set a new one there, admiring everyone's special bit of prettiness, and tossing a smile and a gay word in every chance moment between.
"Oh, no," said Miss Mary, with a little laugh, "they're most of them my Sunday- school scholars, you know."
Looking at your code, I believe you are not sending the correct thing to Apache: you are sending a list, while Apache is expecting a string. I would suggest changing your return to JSON:
import json

def complete(req, q):
    req.content_type = "text"
    return json.dumps([w for w in ws if w.startswith(q)])
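For what it's worth, the bigram counting and prefix filtering can be checked on their own, without mod_python. This sketch condenses count_words into a function I've called count_bigrams (my name, not from the question) and runs it on a snippet of the sample text:

```python
def count_bigrams(text):
    # lower-case, strip punctuation, then count adjacent word pairs
    for r in [',', '.', "'", '"', "!", "?", ":", ";"]:
        text = text.replace(r, " ")
    words = text.lower().split()
    frequency = {}
    for a, b in zip(words, words[1:]):
        frequency[f"{a} {b}"] = frequency.get(f"{a} {b}", 0) + 1
    return frequency

ws = count_bigrams('There in the hall stood Miss Mary Taylor. "So you must," said Miss Mary.')
# suggestions for the prefix "mis" are two-word phrases, as the asker wanted
print(sorted(w for w in ws if w.startswith("mis")))
```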

Parsing files (ics/ icalendar) using Python

I have a .ics file in the following format. What is the best way to parse it? I need to retrieve the Summary, Description, and Time for each of the entries.
BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
PRODID:-//Lotus Development Corporation//NONSGML Notes 8.0//EN
METHOD:PUBLISH
BEGIN:VTIMEZONE
TZID:India
BEGIN:STANDARD
DTSTART:19500101T020000
TZOFFSETFROM:+0530
TZOFFSETTO:+0530
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
DTSTART;TZID="India":20100615T111500
DTEND;TZID="India":20100615T121500
TRANSP:OPAQUE
DTSTAMP:20100713T071035Z
CLASS:PUBLIC
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n
UID:12D3901F0AD9E83E65257743001F2C9A-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:12D3901F0AD9E83E65257743001F2C9A
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100628T130000
DTEND;TZID="India":20100628T133000
TRANSP:OPAQUE
DTSTAMP:20100628T055408Z
CLASS:PUBLIC
DESCRIPTION:
SUMMARY:smart energy management
LOCATION:8778/92050462
UID:07F96A3F1C9547366525775000203D96-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-NOTICETYPE:A
X-LOTUS-APPTTYPE:3
X-LOTUS-CHILD_UID:07F96A3F1C9547366525775000203D96
END:VEVENT
BEGIN:VEVENT
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT
The icalendar package looks nice.
For instance, to write a file:
from icalendar import Calendar, Event
from datetime import datetime
from pytz import UTC # timezone
cal = Calendar()
cal.add('prodid', '-//My calendar product//mxm.dk//')
cal.add('version', '2.0')
event = Event()
event.add('summary', 'Python meeting about calendaring')
event.add('dtstart', datetime(2005,4,4,8,0,0,tzinfo=UTC))
event.add('dtend', datetime(2005,4,4,10,0,0,tzinfo=UTC))
event.add('dtstamp', datetime(2005,4,4,0,10,0,tzinfo=UTC))
event['uid'] = '20050115T101010/27346262376#mxm.dk'
event.add('priority', 5)
cal.add_component(event)
f = open('example.ics', 'wb')
f.write(cal.to_ical())
f.close()
Tadaaa, you get this file:
BEGIN:VCALENDAR
PRODID:-//My calendar product//mxm.dk//
VERSION:2.0
BEGIN:VEVENT
DTEND;VALUE=DATE:20050404T100000Z
DTSTAMP;VALUE=DATE:20050404T001000Z
DTSTART;VALUE=DATE:20050404T080000Z
PRIORITY:5
SUMMARY:Python meeting about calendaring
UID:20050115T101010/27346262376#mxm.dk
END:VEVENT
END:VCALENDAR
But what lies in this file?
g = open('example.ics', 'rb')
gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    print(component.name)
g.close()
You can see it easily:
>>>
VCALENDAR
VEVENT
>>>
What about parsing the data about the events:
g = open('example.ics', 'rb')
gcal = Calendar.from_ical(g.read())
for component in gcal.walk():
    if component.name == "VEVENT":
        print(component.get('summary'))
        print(component.get('dtstart'))
        print(component.get('dtend'))
        print(component.get('dtstamp'))
g.close()
Now you get:
>>>
Python meeting about calendaring
20050404T080000Z
20050404T100000Z
20050404T001000Z
>>>
You could probably also use the vobject module for this: http://pypi.python.org/pypi/vobject
If you have a sample.ics file you can read its contents like so:
import vobject

# read the data from the file
data = open("sample.ics").read()
# parse the top-level event with vobject
cal = vobject.readOne(data)
# Get Summary
print('Summary: ', cal.vevent.summary.valueRepr())
# Get Description
print('Description: ', cal.vevent.description.valueRepr())
# Get Time
print('Time (as a datetime object): ', cal.vevent.dtstart.value)
print('Time (as a string): ', cal.vevent.dtstart.valueRepr())
New to python; the above comments were very helpful so wanted to post a more complete sample.
# ics to csv example
# dependency: https://pypi.org/project/vobject/
import vobject
import csv
with open('sample.csv', mode='w') as csv_out:
    csv_writer = csv.writer(csv_out, delimiter=',', quotechar='"', quoting=csv.QUOTE_MINIMAL)
    csv_writer.writerow(['WHAT', 'WHO', 'FROM', 'TO', 'DESCRIPTION'])
    # read the data from the file
    data = open("sample.ics").read()
    # iterate through the contents
    for cal in vobject.readComponents(data):
        for component in cal.components():
            if component.name == "VEVENT":
                # write to csv
                csv_writer.writerow([component.summary.valueRepr(),
                                     component.attendee.valueRepr(),
                                     component.dtstart.valueRepr(),
                                     component.dtend.valueRepr(),
                                     component.description.valueRepr()])
Four years later and understanding ICS format a bit better, if those were the only fields I needed, I'd just use the native string methods:
import io

# Probably not a valid .ics file, but we don't really care for the example;
# it works fine regardless
file = io.StringIO('''
BEGIN:VCALENDAR
X-LOTUS-CHARSET:UTF-8
VERSION:2.0
DESCRIPTION:Emails\nDarlene\n Murphy\nDr. Ferri\n
SUMMARY:smart energy management
LOCATION:8778/92050462
DTSTART;TZID="India":20100629T110000
DTEND;TZID="India":20100629T120000
TRANSP:OPAQUE
DTSTAMP:20100713T071037Z
CLASS:PUBLIC
SUMMARY:meeting
UID:6011DDDD659E49D765257751001D2B4B-Lotus_Notes_Generated
X-LOTUS-UPDATE-SEQ:1
X-LOTUS-UPDATE-WISL:$S:1;$L:1;$B:1;$R:1;$E:1;$W:1;$O:1;$M:1
X-LOTUS-NOTESVERSION:2
X-LOTUS-APPTTYPE:0
X-LOTUS-CHILD_UID:6011DDDD659E49D765257751001D2B4B
END:VEVENT
'''.strip())

parsing = False
for line in file:
    field, _, data = line.partition(':')
    if field in ('SUMMARY', 'DESCRIPTION', 'DTSTAMP'):
        parsing = True
        print(field)
        print('\t' + '\n\t'.join(data.split('\n')))
    elif parsing and not data:
        print('\t' + '\n\t'.join(field.split('\n')))
    else:
        parsing = False
Storing the data and parsing the datetime is left as an exercise for the reader (it's always UTC)
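The datetime half of that exercise can be sketched with strptime, assuming the trailing Z always means UTC (as the stamps above do):

```python
from datetime import datetime, timezone

# DTSTAMP value taken from the sample event above
stamp = '20100713T071037Z'
# strptime has no directive for the literal Z, so match it and attach UTC
dt = datetime.strptime(stamp, '%Y%m%dT%H%M%SZ').replace(tzinfo=timezone.utc)
print(dt.isoformat())  # 2010-07-13T07:10:37+00:00
```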
old answer below
You could use a regex:
import re

text = ...  # your text
print(re.search("SUMMARY:.*?:", text, re.DOTALL).group())
print(re.search("DESCRIPTION:.*?:", text, re.DOTALL).group())
print(re.search("DTSTAMP:.*?:", text, re.DOTALL).group())
I'm sure it may be possible to skip the first and last words; I'm just not sure how to do it with regex. You could do it this way, though:
print(' '.join(re.search("SUMMARY:.*?:", text, re.DOTALL).group().replace(':', ' ').split()[1:-1]))
In case anyone else is looking at this, the ics package seems like it's updated better than any others mentioned in the thread. https://pypi.org/project/ics/
Here's some sample code I'm using:
from ics import Calendar, Event

with open(in_file, 'r') as file:
    ics_text = file.read()

c = Calendar(ics_text)
for e in c.events:
    print(e.name)
I'd parse line by line and search for your terms, then take the index of each match and extract that plus however many characters you think you'll need; then parse that much smaller string to get what you need.
