bibtexparser - pyparsing.ParseException: Expected end of text - python

I'm using bibtexparser to parse a bibtex file.
import bibtexparser

with open('MetaGJK12842.bib', 'r') as bibfile:
    bibdata = bibtexparser.load(bibfile)
While parsing I get the error message:
Could not parse properly, starting at
@article{Frenn:EvidenceBasedNursing:1999,
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/pyparsing.py", line 3183, in parseImpl
raise ParseException(instring, loc, self.errmsg, self)
pyparsing.ParseException: Expected end of text (at char 5773750),
(line:47478, col:1)
The line refers to the following bibtex entry:
@article{Frenn:EvidenceBasedNursing:1999,
author = {Frenn, M.},
title = {A Mediterranean type diet reduced all cause and cardiac mortality after a first myocardial infarction [commentary on de Lorgeril M, Salen P, Martin JL, et al. Mediterranean dietary pattern in a randomized trial: prolonged survival and possible reduced cancer rate. ARCH INTERN MED 1998;158:1181-7]},
journal = {Evidence Based Nursing},
uuid = {15A66A61-0343-475A-8700-F311B08BB2BC},
volume = {2},
number = {2},
pages = {48-48},
address = {College of Nursing, Marquette University, Milwaukee, WI},
year = {1999},
ISSN = {1367-6539},
url = {},
keywords = {Treatment Outcomes;Mediterranean Diet;Mortality;France;Neoplasms -- Prevention and Control;Phase One Excluded - No Assessment of Vegetable as DV;Female;Phase One - Reviewed by Hao;Myocardial Infarction -- Diet Therapy;Diet, Fat-Restricted;Phase One Excluded - No Fruit or Vegetable Study;Phase One Excluded - No Assessment of Fruit as DV;Male;Clinical Trials},
tags = {Phase One Excluded - No Assessment of Vegetable as DV;Phase One Excluded - No Fruit or Vegetable Study;Phase One - Reviewed by Hao;Phase One Excluded - No Assessment of Fruit as DV},
accession_num = {2000008864. Language: English. Entry Date: 20000201. Revision Date: 20130524. Publication Type: journal article},
remote_database_name = {rzh},
source_app = {EndNote},
EndNote_reference_number = {4413},
Secondary_title = {Evidence Based Nursing},
Citation_identifier = {Frenn 1999a},
remote_database_provider = {EBSCOhost},
publicationStatus = {Unknown},
abstract = {Question: text.},
notes = {(0) abstract; commentary. Journal Subset: Core Nursing; Europe; Nursing; Peer Reviewed; UK \& Ireland. No. of Refs: 1 ref. NLM UID: 9815947.}
}
What is wrong with this entry?

It seems that the issue has been addressed and resolved in the project repository (see Issue 147). Until the next release, installing the library directly from the git repository can serve as a temporary fix:
pip install --upgrade git+https://github.com/sciunto-org/python-bibtexparser.git#master

I had this same error and found an entry near the line mentioned in the error that contained a field like this:
...
year = {1959},
month =
}
When I removed the empty month field, the file parsed for me.
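If the file contains many such entries, a small preprocessing pass can strip value-less fields before handing the text to bibtexparser. This is only a sketch, assuming the offending fields sit alone on a line like month = with nothing after the equals sign; the regex is an assumption, not part of the library:
import re
import bibtexparser

with open('MetaGJK12842.bib', 'r') as bibfile:
    raw = bibfile.read()

# Drop field lines that have no value at all, e.g. "month =" followed only
# by an optional comma before the next field or the closing brace.
cleaned = re.sub(r'^\s*\w+\s*=\s*,?\s*$\n', '', raw, flags=re.MULTILINE)

bibdata = bibtexparser.loads(cleaned)
print(len(bibdata.entries))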

Related

Extract data using regex between specified strings

Question 1: I want to extract the data between "Target Information" and the line before "Group Information" and store it in a variable or something appropriate.
Question 2: Next, I want to extract the data from "Group Information" till the end of the file and store it in a variable or something appropriate.
Question 3: In both of the above cases, I then want to extract the line just after the line that starts with "Name".
With the code below I was able to get the information between "Target Information" and "Group Information" and captured it in the required_lines variable.
Next, I am trying to get the line after the line containing "Name", but this fails. Also, can the logic be implemented using a regex call?
# Extract the lines between
with open('showrcopy.txt', 'r') as f:
    file = f.readlines()

required_lines1 = []
required_lines = []
inRecordingMode = False
for line in file:
    if not inRecordingMode:
        if line.startswith('Target Information'):
            inRecordingMode = True
    elif line.startswith('Group Information'):
        inRecordingMode = False
    else:
        required_lines.append(line.strip())
print(required_lines)

# Extract the line after the line "Name"
def gen():
    for x in required_lines:
        yield x

for line in gen():
    if "Name" in line:
        print(next(gen()))
showrcopy.txt
root@gnodee184119:/home/usr/redsuren# date; showrcopy -qw
Tue Aug 24 00:20:38 PDT 2021
Remote Copy System Information
Status: Started, Normal
Target Information
Name ID Type Status Policy QW-Server QW-Ver Q-Status Q-Status-Qual ATF-Timeout
s2976 4 IP ready mirror_config https://10.157.35.148:8443 4.0.007 Re-starting Quorum not stable 10
Link Information
Target Node Address Status Options
s2976 0:9:1 192.168.20.21 Up -
s2976 1:9:1 192.168.20.22 Up -
receive 0:9:1 192.168.10.21 Up -
receive 1:9:1 192.168.10.22 Up -
Group Information
Name Target Status Role Mode Options
SG_hpux_vgcgloack.r518634 s2976 Started Primary Sync auto_recover,auto_failover,path_management,auto_synchronize,active_active
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
vgcglock_SG_cluster 13496 vgcglock_SG_cluster 28505 Synced NA
Name Target Status Role Mode Options
aix_rcg1_AA.r518634 s2976 Started Primary Sync auto_recover,auto_failover,path_management,auto_synchronize,active_active
LocalVV ID RemoteVV ID SyncStatus LastSyncTime
tpvvA_aix_r.2 20149 tpvvA_aix.2 41097 Synced NA
tpvvA_aix_r.3 20150 tpvvA_aix.3 41098 Synced NA
tpvvA_aix_r.4 20151 tpvvA_aix.4 41099 Synced NA
tpvvA_aix_r.5 20152 tpvvA_aix.5 41100 Synced NA
tpvvA_aix_r.6 20153 tpvvA_aix.6 41101 Synced NA
tpvvA_aix_r.7 20154 tpvvA_aix.7 41102 Synced NA
tpvvA_aix_r.8 20155 tpvvA_aix.8 41103 Synced NA
tpvvA_aix_r.9 20156 tpvvA_aix.9 41104 Synced NA
tpvvA_aix_r.10 20157 tpvvA_aix.10 41105 Synced NA
Here's a regex solution to pull the target info and group info:
import re

with open("./showrcopy.txt", "r") as f:
    text = f.read()

target_info_pattern = re.compile(r"Target Information([.\s\S]*)Group Information")
group_info_pattern = re.compile(r"Group Information([.\s\S]*)")
target_info = target_info_pattern.findall(text)[0].strip().split("\n")
group_info = group_info_pattern.findall(text)[0].strip().split("\n")
target_info_line_after_name = target_info[1]
group_info_line_after_name = group_info[1]
And the lines you're interested in:
>>> target_info_line_after_name
's2976 4 IP ready mirror_config https://10.157.35.148:8443 4.0.007 Re-starting Quorum not stable 10'
>>> group_info_line_after_name
'SG_hpux_vgcgloack.r518634 s2976 Started Primary Sync auto_recover,auto_failover,path_management,auto_synchronize,active_active'
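For Question 3, note that the original attempt fails because every call to gen() builds a brand-new generator, so next(gen()) always yields the first stored line again. If you need the line after every header line starting with "Name" (the group section contains several), a small pass over the group_info list from above does it; this is just a sketch reusing that variable:
# Collect the line that follows every header line starting with "Name".
lines_after_name = [
    group_info[i + 1]
    for i, line in enumerate(group_info)
    if line.startswith("Name") and i + 1 < len(group_info)
]
print(lines_after_name)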

Index Error: list index out of range using filepath

I have this script that tries to find Style + Year on Discogs and write them to the mp3 file.
It used to work before.
But now I get this error:
IndexError: list index out of range
I've uninstalled Python 3.7.2 and reinstalled 3.6.3, since I have chat logs showing that it was working on that version back then.
When I set the sys.argv index to 0, there is no error, but nothing is executed either.
for mp3file in Path(sys.argv[1]).glob('**/*.mp3'):
    print(mp3file)
    artist, title = return_tag_data(mp3file)
    artist_c = clean(artist)
    title_c = clean(title)
    style = get_style(artist_c, title_c, artist, title)
    year = get_year(artist_c, title_c, artist, title)
    if style != None:
        print("Artist : {}\nTitle : {}\nStyle : {}\nYear : {}\n".format(artist, title, style, year))
        write_tag_data(mp3file, style, year)
It should show this:
c:\users\useraccount\music\My Music Collection\2-Step & Garage\187 Lockdown - Gunman (God Remix).mp3
Artist : 187 Lockdown
Title : Gunman (God Remix)
Style : Drum n Bass, Speed Garage
Year : 1997
But instead it throws this:
E:\test-mp3>python styleyear-mp3.py "e:\test-mp3"
e:\test-mp3\187 Lockdown - Gunman (GOD remix).mp3
Traceback (most recent call last):
File "styleyear-mp3.py", line 111, in <module>
style = get_style(artist_c, title_c, artist, title)
File "styleyear-mp3.py", line 50, in get_style
new_artist = y('spanitemprop title="(.+?)"')[0].strip()
IndexError: list index out of range
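The IndexError means the lookup on line 50 found no matches, so there is nothing at index [0]; most likely Discogs changed its page markup or returned no result for that artist/title, rather than anything about the Python version. A minimal guard, reusing the selector call from the traceback inside get_style (the fallback value is illustrative):
# Inside get_style: guard against an empty result before indexing.
matches = y('spanitemprop title="(.+?)"')
if not matches:
    return None  # no usable match; the caller already checks style != None
new_artist = matches[0].strip()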

getting error with default examples - imdbpy

I installed imdbpy and tried running the default examples mentioned at http://imdbpy.sourceforge.net/support.html, but I am unable to run them successfully.
Program:
import imdb

# Create the object that will be used to access the IMDb's database.
ia = imdb.IMDb()  # by default access the web.

# Search for a movie (get a list of Movie objects).
s_result = ia.search_movie('The Untouchables')

# Print the long imdb canonical title and movieID of the results.
for item in s_result:
    print item['long imdb canonical title'], item.movieID

# Retrieves default information for the first result (a Movie object).
the_unt = s_result[0]
ia.update(the_unt)

# Print some information.
print the_unt['runtime']
print the_unt['rating']
director = the_unt['director']  # get a list of Person objects.

# Get the first item listed as a "goof".
ia.update(the_unt, 'goofs')
print the_unt['goofs'][0]

# The first "trivia" for the first director.
b_depalma = director[0]
ia.update(b_depalma)
print b_depalma['trivia'][0]
Error:
/System/Library/Frameworks/Python.framework/Versions/2.7/bin/python2.7 /Users/karan/PycharmProjects/IMDBParser/extractor/imdbpy_tester.py
Untouchables, The (1987) 0094226
Untouchables (in development), The (????) 1987680
"Untouchables, The" (1959) 0052522
"Untouchables, The" (1993) 0106165
Intouchables, The (2011) 1675434
Untouchables, The (2017) 1877895
Untouchable (I) (2016) 5509634
Untouchable 2, The (2001) (VG) 0287778
"Untouchable" (2015) 4191792
Untouchable, The (1997) (VG) 0287779
Untouchable (2012) 2266916
"Untouchable" (2017) (mini) 5220680
Untouchable (I) (2010) 1590231
Untouchable (III) (2013) 3001590
Untouchables, The (1991) (VG) 0335509
"Frontline" The Untouchables (2013) 2620144
"DVD_TV: Enhanced Version" The Untouchables (2005) 1088289
"Real Story, The" The Untouchables (2009) 2760176
"Harbour Lights" The Untouchables (1999) 0596815
"Bill, The" The Untouchables (2000) 0525783
[u'119']
7.9
Traceback (most recent call last):
File "/Users/karan/PycharmProjects/IMDBParser/extractor/imdbpy_tester.py", line 27, in <module>
print the_unt['goofs'][0]
File "/Library/Python/2.7/site-packages/IMDbPY-5.0-py2.7-macosx-10.11-intel.egg/imdb/utils.py", line 1469, in __getitem__
rawData = self.data[key]
KeyError: 'goofs'
Process finished with exit code 1
I found something similar posted by another user here (https://bitbucket.org/alberanid/imdbpy/issues/42/examples-not-functional), but I do not know how to fix this problem.
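The traceback shows that the 'goofs' key is simply absent from that Movie's data after the update, so the bare dictionary access raises KeyError. A defensive sketch using only the accesses already in the example (kept in Python 2 syntax to match it; the fallback message is illustrative):
# Guard the 'goofs' access: not every title carries goof data.
ia.update(the_unt, 'goofs')
try:
    print the_unt['goofs'][0]
except KeyError:
    print 'No goofs recorded for this title.'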

Python parse large db over 4gb

I'm trying to parse a db file with Python that is over 4 GB.
Example from the db file:
% Tags relating to '217.89.104.48 - 217.89.104.63'
% RIPE-USER-RESOURCE
inetnum: 194.243.227.240 - 194.243.227.255
netname: PRINCESINDUSTRIEALIMENTARI
remarks: INFRA-AW
descr: PRINCES INDUSTRIE ALIMENTARI
descr: Provider Local Registry
descr: BB IBS
country: IT
admin-c: DUMY-RIPE
tech-c: DUMY-RIPE
status: ASSIGNED PA
notify: order.manager2@telecomitalia.it
mnt-by: INTERB-MNT
changed: unread@ripe.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
% Tags relating to '194.243.227.240 - 194.243.227.255'
% RIPE-USER-RESOURCE
inetnum: 194.16.216.176 - 194.16.216.183
netname: SE-CARLSTEINS
descr: CARLSTEINS TRAFIK AB
org: ORG-CTA17-RIPE
country: SE
admin-c: DUMY-RIPE
tech-c: DUMY-RIPE
status: ASSIGNED PA
notify: mntripe@telia.net
mnt-by: TELIANET-LIR
changed: unread@ripe.net 20000101
source: RIPE
remarks: ****************************
remarks: * THIS OBJECT IS MODIFIED
remarks: * Please note that all data that is generally regarded as personal
remarks: * data has been removed from this object.
remarks: * To view the original object, please query the RIPE Database at:
remarks: * http://www.ripe.net/whois
remarks: ****************************
I want to parse each block starting with % Tags relating to, and out of each block I want to extract the inetnum and the first descr.
This is what I got so far (updated):
import re

with open('test.db', "r") as f:
    content = f.read()

r = re.compile(r'descr:\s+(.*?)\n', re.IGNORECASE)
res = r.findall(content)
print res
Since the file is over 4 GB, you don't want to read it all at once with f.read(). Instead, use the file object as an iterator (when you iterate over a file, you get one line after the other). The following generator should do the job:
def parse(filename):
    current = None
    for l in open(filename):
        if l.startswith("% Tags relating to"):
            if current is not None:
                yield current
            current = {}
        elif l.startswith("inetnum:"):
            current["inetnum"] = l.split(":", 1)[1].strip()
        elif l.startswith("descr") and "descr" not in current:
            current["descr"] = l.split(":", 1)[1].strip()
    if current is not None:
        yield current
You can use it as follows:
for record in parse("test.db"):
    print(record)
Result on the test file:
{'inetnum': '194.243.227.240 - 194.243.227.255', 'descr': 'PRINCES INDUSTRIE ALIMENTARI'}
{'inetnum': '194.16.216.176 - 194.16.216.183', 'descr': 'CARLSTEINS TRAFIK AB'}
If you only want to get the first descr:
r = re.compile(r'descr:\s+(.*?)\n(?:descr:.*\n)*',
               re.IGNORECASE)
If you want both inetnum and the first descr:
r = re.compile(r'(?:descr:\s+(.*?)\n(?:descr:.*\n)*)|(?:inetnum:\s+(.*?)\n)',
               re.IGNORECASE)
result = [a + b for (a, b) in r.findall(content)]
I must admit I make no use of % Tags relating to, and I assume that all descr lines of a block are consecutive.

How to extract data from text files?

So I have a set of files that I need to extract data from and write to a new txt file, and I am not sure how to do this with Python. Below is sample data. I am trying to extract the NSF Org, File and Abstract parts.
Title : CRB: Genetic Diversity of Endangered Populations of Mysticete Whales:
Mitochondrial DNA and Historical Demography
Type : Award
NSF Org : DEB
Latest
Amendment
Date : August 1, 1991
File : a9000006
Award Number: 9000006
Award Instr.: Continuing grant
Prgm Manager: Scott Collins
DEB DIVISION OF ENVIRONMENTAL BIOLOGY
BIO DIRECT FOR BIOLOGICAL SCIENCES
Start Date : June 1, 1990
Expires : November 30, 1992 (Estimated)
Expected
Total Amt. : $179720 (Estimated)
Investigator: Stephen R. Palumbi (Principal Investigator current)
Sponsor : U of Hawaii Manoa
2530 Dole Street
Honolulu, HI 968222225 808/956-7800
NSF Program : 1127 SYSTEMATIC & POPULATION BIOLO
Fld Applictn: 0000099 Other Applications NEC
61 Life Science Biological
Program Ref : 9285,
Abstract :
Commercial exploitation over the past two hundred years drove the great
Mysticete whales to near extinction. Variation in the sizes of populations
prior to exploitation, minimalpopulation size during exploitation and
current population sizes permit analyses of the effects of differing levels
of exploitation on species with different biogeographical distributions and
life-history characteristics.
You're not giving me much to go on, but here is what I do to read input from a txt file. This is in Java; hopefully you'll know how to store it in an array of some sort.
import java.util.Scanner;
import java.io.*;

public class ClockAngles {
    public static void main(String[] args) throws IOException {
        Scanner reader = null;
        String input = "";
        try {
            reader = new Scanner(new BufferedReader(new FileReader("FilePath")));
            while (reader.hasNext()) {
                input = reader.next();
                System.out.print(input);
            }
        }
        finally {
            if (reader != null) {
                reader.close();
            }
        }
    }
}
Python code
#!/bin/env python2.7

# Change this to the file with the time input
filename = "filetext"
storeData = []

class Whatever:
    def __init__(self, time_str):
        times_list = time_str.split('however you want input to be read')
        self.a = int(times_list[0])
        self.b = int(times_list[1])
        self.c = int(times_list[2])

    # prints the data
    def __str__(self):
        return str(self.a) + " " + str(self.b) + " " + str(self.c)
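Since the question specifically asks for NSF Org, File and the Abstract from files in the format shown above, here is a minimal Python sketch. The field labels come from the sample data, but the input/output file names and the output layout are assumptions:
import re

def extract_fields(path):
    """Pull NSF Org, File and the Abstract text out of one award file."""
    with open(path, 'r') as f:
        text = f.read()

    # Single-line fields: label, colon, value on the same line.
    nsf_org = re.search(r'^NSF Org\s*:\s*(.+)$', text, re.MULTILINE)
    file_id = re.search(r'^File\s*:\s*(.+)$', text, re.MULTILINE)

    # The abstract is everything after the "Abstract :" label.
    abstract = re.search(r'^Abstract\s*:\s*(.*)', text, re.MULTILINE | re.DOTALL)

    return {
        'NSF Org': nsf_org.group(1).strip() if nsf_org else '',
        'File': file_id.group(1).strip() if file_id else '',
        'Abstract': ' '.join(abstract.group(1).split()) if abstract else '',
    }

# Append the extracted parts of one input file to a combined output file
# (file names here are hypothetical).
fields = extract_fields('a9000006.txt')
with open('extracted.txt', 'a') as out:
    out.write('{NSF Org}\t{File}\t{Abstract}\n'.format(**fields))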
