Extracting move information from a PGN file in Python

How do I go about extracting move information from a PGN file in Python? I'm new to programming and any help would be appreciated.

Try pgnparser.
Example code:
import pgn
import sys

with open(sys.argv[1]) as f:
    pgn_text = f.read()

games = pgn.loads(pgn_text)
for game in games:
    print(game.moves)

I like what Dennis Golomazov did above. To add to it: if you want to extract move information from more than one game in a PGN file, say the games in a chess-database PGN file, use chess.pgn.
import chess.pgn

pgn_file = open('sample.pgn')
current_game = chess.pgn.read_game(pgn_file)
pgn_text = str(current_game.mainline_moves())
The read_game() method acts like an iterator, so calling it again will grab the next game in the PGN.
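Since read_game() returns one game per call and None once the input is exhausted, a loop like the following sketch walks every game in a multi-game PGN. An in-memory PGN string stands in here for a real file handle:

```python
import io
import chess.pgn

# Two headerless games in one PGN stream (a stand-in for open('games.pgn')).
pgn_data = "1. e4 e5 2. Nf3 Nc6 *\n\n1. d4 d5 *\n"
pgn_file = io.StringIO(pgn_data)

move_texts = []
while True:
    game = chess.pgn.read_game(pgn_file)
    if game is None:  # read_game() yields None at end of input
        break
    move_texts.append(str(game.mainline_moves()))

print(move_texts)
```

The same loop works unchanged on a real file object opened with open().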

I can't give you any Python-specific directions, but I wrote a PGN converter in Java recently, so I'll try to offer some advice. The main disadvantage of Miku's link is that the site doesn't allow for variance in .pgn files, and every site seems to vary slightly in the exact format.
Some .pgn files have the move number attached to the move itself (1.e4 instead of 1. e4), so if you tokenise the string you can check the placement of the dot, since it only occurs in move numbers.
Work out all the different move combinations you can have. If a move is 5 characters long it could be 0-0-0 (queenside castling), Nge2+ (knight from the g-file to e2 with check (+) or checkmate (#)), or Rexb5 (rook on the e-file takes b5).
The longest string a move can be is 7 characters (when you must specify the origin rank AND file AND a capture AND check). The shortest is 2 characters (a pawn advance).
Plan early for castling and en passant moves. You may realise too late that the way you have built your program doesn't easily adapt for them.
The details given at the start (Elo ratings, location, etc.) vary from file to file.
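The tokenising advice above can be sketched with a small regex. This is a rough sketch, not a full parser: it assumes standard SAN movetext and simply strips move numbers (handling both "1. e4" and the attached "1.e4" form) and game-result markers, leaving the move tokens:

```python
import re

def san_tokens(movetext):
    """Split a movetext string into SAN move tokens.

    Removes move numbers like "1." or "23..." whether or not a space
    follows the dot, then drops game-result markers.
    """
    stripped = re.sub(r"\d+\.(\.\.)?\s*", " ", movetext)
    results = ("1-0", "0-1", "1/2-1/2", "*")
    return [tok for tok in stripped.split() if tok not in results]

print(san_tokens("1.e4 e5 2. Nf3 Nc6 3.Bb5 a6 1-0"))
```

Castling tokens like 0-0-0 survive this pass because they contain no dot after the digits, which is the property the dot-placement advice above relies on.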

I don't have a PGN parser for Python, but you can get the source code of a PGN parser for Xcode from this place; it may be of assistance.

How to separate words into single letters from a text file in Python

How do I separate words from a text file into single letters?
I'm given a text where I have to calculate the frequency of the letters. However, I can't seem to figure out how to separate the words into single letters so I can count the unique elements and from there determine their frequency.
I apologize for not having the text in a text file, but this is the text I'm given:
alice was beginning to get very tired of sitting by her sister on the bank, and of having nothing to do: once or twice she had peeped into the book her sister was reading, but it had no pictures or conversations in it, and what is the use of a book,' thought alice without pictures or conversation?'
so she was considering in her own mind (as well as she could, for the hot day made her feel very sleepy and stupid), whether the pleasure of making a daisy- chain would be worth the trouble of getting up and picking the daisies, when suddenly a white rabbit with pink eyes ran close by her.
there was nothing so very remarkable in that; nor did alice think it so very much out of the way to hear the rabbit say to itself, `oh dear! oh dear! i shall be late!' (when she thought it over afterwards, it occurred to her that she ought to have wondered at this, but at the time it all seemed quite natural); but when the rabbit actually took a watch out of its waistcoat- pocket, and looked at it, and then hurried on, alice started to her feet, for it flashed across her mind that she had never before seen a rabbit with either a waistcoat-pocket, or a watch to take out of it, and burning with curiosity, she ran across the field after it, and fortunately was just in time to see it pop down a large rabbit-hole under the hedge.
in another moment down went alice after it, never once considering how in the world she was to get out again.
the rabbit-hole went straight on like a tunnel for some way, and then dipped suddenly down, so suddenly that alice had not a moment to think about stopping herself before she found herself falling down a very deep well.
I'm supposed to end up with 26 variables a-z and then determine their frequencies.
This is the code I have so far:
# Check where the current file you are working in, is saved.
import os
os.getcwd()
#print(os.getcwd())
# 1. Change the current working directory to the place where you have saved the file.
os.chdir('C:/Users/Annik/Desktop/DTU/02633 Introduction to programming/Datafiles')
os.getcwd()
#print(os.chdir('C:/Users/Annik/Desktop/DTU/02633 Introduction to programming/Datafiles'))
# 2. Listing the content of current working directory type
os.listdir(os.getcwd())
#print(os.listdir(os.getcwd()))
#importing the file
filein = open("small_text.txt", "r") #opens the file for reading
lines = filein.readlines() #reads all lines into an array
smalltxt = "".join(lines) #Joins the lines into one big string.
import numpy as np
def letterFrequency(filename):
    # counts the frequency of letters in a text
    unique_elems, counts = np.unique(separate_words, return_counts=True)
    return unique_elems
I just don't know how to separate the letters in the text, so I can count the unique elements.
You can use collections.Counter to get the frequencies directly from the text.
Then just select the 26 keys you are interested in, because the counter will also include whitespace and other characters.
from collections import Counter
[...]
with open("small_text.txt", "r") as file:
    text = file.read()

keys = "abcdefghijklmnopqrstuvwxyz"
c = Counter(text.lower())

# initialize occurrence with zeros to have all keys present
occurrence = dict.fromkeys(keys, 0)
occurrence.update({k: v for k, v in c.items() if k in keys})

total = sum(occurrence.values())
frequency = {k: v / total for k, v in occurrence.items()}
[...]
To handle upper case, str.lower might be useful as well.
"how I separate the words into single letters" - since you want to count the characters, you can use a Counter from the collections module.
For example
import collections
import pprint
...
...
file_input = input('File_Name: ')
with open(file_input, 'r') as info:
    count = collections.Counter(info.read().upper())  # reading file
value = pprint.pformat(count)
print(value)
...
...
This reads your file and outputs the count of each character present.

Create word lists from medical journals

I have been asked to compile a crossword for a surgeon's publication, which comes out quarterly. I need to make it medically oriented, preferably using words from different specialties, e.g. some orthopaedics, some cardiac surgery, some human anatomy, etc.
I can get surgical journals online.
I want to create word lists for each specialty and use them in the compiler (I will use Crossword Compiler).
I can use journal articles on the web or downloaded PDFs. I am a surgeon and use pandas for data analysis, but my Python skills are a bit primitive, so I need relatively simple solutions. How can I create the specific word lists for each surgical specialty?
They don't need to be very specific words, so e.g. I thought I could scrape a journal volume for words, compare them to a list of common words, and delete the common ones, leaving me with a technical list. It may require some trial and error. I haven't used Beautiful Soup before but am willing to try it.
Alternatively I could skip the Beautiful Soup step and use EndNote to download a few hundred journal articles and export them to txt.
It's the extraction and list-making that I am mainly struggling to conceptualise.
I created this program that you can use to parse a .txt file and find the most common words. I also included a block of code that will help you convert a .pdf file to .txt. I hope my approach to the solution helps, and good luck with your crossword for the surgeon's publication!
'''
Find the most common words in a txt file
'''
import collections
# The re module provides regular expression matching operations
import re
'''
Use this if you would like to convert a PDF to a txt file
'''
# import PyPDF2
# pdffileobj=open('textFileName.pdf','rb')
# pdfreader=PyPDF2.PdfFileReader(pdffileobj)
# x=pdfreader.numPages
# pageobj=pdfreader.getPage(x-1)
# text=pageobj.extractText()
# file1=open(r"(folder path)\\textFileName.txt","a")
# file1.writelines(text)
# file1.close()
words = re.findall(r'\w+', open('textFileName.txt').read().lower())
most_common = collections.Counter(words).most_common(10)
print(most_common)
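The "compare against a list of common words" step from the question could be sketched like this. The stopword set here is a tiny stand-in (in practice you would load a proper common-English word list from a file), and technical_terms is a hypothetical helper name:

```python
import collections
import re

# Tiny stand-in for a real common-English word list loaded from a file.
COMMON_WORDS = {
    "the", "of", "and", "a", "to", "in", "is", "was", "for", "with",
    "on", "that", "by", "this", "be", "are", "from", "or", "an", "as",
}

def technical_terms(text, min_count=2):
    """Return uncommon words that occur at least min_count times."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = collections.Counter(w for w in words if w not in COMMON_WORDS)
    return [w for w, c in counts.most_common() if c >= min_count]

sample = ("The osteotomy was performed and the osteotomy site was fixed "
          "with a plate. Arthroplasty of the hip and arthroplasty of the knee.")
print(technical_terms(sample))
```

Requiring a minimum count (min_count) filters out one-off words, which should cut down on the trial and error the question anticipates.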

Output more than one instrument using Music 21 and Python

I am using Python and music21 to code an algorithm that composes melodies from input music files of violin pieces accompanied by piano. My problem is that when I input a MIDI file that has two instruments, the output is only in one instrument. I can currently change the output instrument to a guitar, trumpet, etc., even though those instruments are not present in my original input files. I would like to know whether I could write some code that identifies the instruments in the input files and outputs those specific instruments. Alternatively, is there any way I could code for two output instruments rather than one? I have tried to copy the existing code with another instrument, but the algorithm only outputs the last instrument detected in the code. Below is my current running code:
def convert_to_midi(prediction_output):
    offset = 0
    output_notes = []
    # Create note and chord objects based on the values generated by the model
    for pattern in prediction_output:
        # Pattern is a chord
        if ('.' in pattern) or pattern.isdigit():
            notes_in_chord = pattern.split('.')
            notes = []
            for current_note in notes_in_chord:
                output_notes.append(instrument.Guitar())
                cn = int(current_note)
                new_note = note.Note(cn)
                notes.append(new_note)
            new_chord = chord.Chord(notes)
            new_chord.offset = offset
            output_notes.append(new_chord)
        # Pattern is a note
        else:
            output_notes.append(instrument.Guitar())
            new_note = note.Note(pattern)
            new_note.offset = offset
            output_notes.append(new_note)
Instrument objects go directly into the Stream object, not on a Note, and each Part can have only one Instrument object active at a time.

Pulling Vars from lists and applying them to a function

I'm working on a project at work using purely Python 3.
I work in inventory, and anything I scan with my scanner goes into a text doc. I scan the location ("117"), and then I scan each device in that location (the proceeding lines in the text doc, e.g. "100203"). When I run the script, it plugs "117" into the search on our database and changes each of the scanned devices (whether they were previously assigned to that location or not) to that location, validating that those devices are in location "117".
My main question concerns the third objective in the list below, the one without "Done" after it.
Objective:
Pull strings from a text document, convert it into a dictionary. = (Text_Dictionary) **Done**
Assign the first var in the dictionary to a separate var. = (First_Line) **Done**
All vars after the first one in the dictionary should be passed into a function individually. = (Proceeding_Lines)
Side note: the code should loop in a fashion that should (.pop) the var from the dictionary/list, but I'm open to alternatives. (Not mandatory)
What I already have is:
Project.py:
1 import re
2 import os
3 import time
4 import sys
5
6 with open(r"C:\Users\...\text_dictionary.txt") as f:
7 Text_Dictionary = [line.rstrip('\n') for line in
8 open(r"C:\Users\...\text_dictionary.txt")]
9
10 Text_Dict = (Text_Dictionary)
11 First_Line = (Text_Dictionary[0])
12
13 print("The first line is: ", First_Line)
14
15 end = (len(Text_Dictionary) + 1)
16 i = (len(Text_Dictionary))
17
What I have isn't much on the surface, but I have another *.py file full of code that I am going to copy in for the action I wish to perform on each of the vars in Text_Dictionary.txt. Lines 15-16 were me messing with what I thought might solve this.
In the imported text document, the vars look very close to this (same length, all digits):
Text_Dictionary.txt:
117
23000
53455
23454
34534
...
Note: these values will change each time the code is run; someone will type/scan in these lines of digits each time.
Explained concept:
Ideally, I would like the first line to point to a destination, and the rest of the digits would follow; however, each one (e.g. '53455') needs to be run separately, one after the next, and '117' (in this example) is where '53455' goes. You could say the first line is static throughout the code, unless otherwise changed in Text_Dictionary.txt. '117' is used in conjunction with each iteration that follows it.
Background:
This is for inventory management in my office. I am in no way paid for doing this, but it would make my job a heck of a lot easier. Also, I know enough basic Python to get myself around, but this one stumped me. Thank you to whoever answers!
I've no clue what you're asking, but I'm going to take a guess. Before I do so, your code was annoying me:
with open("file.txt") as f:
    product_ids = [line.strip() for line in f if not line.isspace()]
There. That's all you need. This way it protects against blank lines in the file and weird invisible spaces too. I decided to leave the data as strings because it probably represents an inventory ID, and in the future that might be upgraded to "53455-b.42454#62dkMlwee".
I'm going to hazard a guess that you want to run different code depending on the number at the top. If so, you can use a dictionary containing functions. You said that you wanted to run code from another file, so this is another_file.py:
__all__ = ["dispatch_whatever"]

dispatch_whatever = {}

def descriptive_name_for_117(product_id):
    pass

dispatch_whatever["117"] = descriptive_name_for_117
And back in main_program.py, which is stored in the same directory:
from another_file import dispatch_whatever
for product_id in product_ids[1:]:
    dispatch_whatever[product_ids[0]](product_id)
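Putting the pieces together, the overall pattern the question describes (first line is the location, every following line is processed against it) might look like the sketch below. assign_to_location is a hypothetical stand-in for the real database update:

```python
def assign_to_location(location, device_id):
    # Hypothetical placeholder for the real database call that moves
    # a device into the scanned location.
    return "device %s -> location %s" % (device_id, location)

def process_scan_file(lines):
    """First non-blank line is the location; the rest are device IDs."""
    entries = [ln.strip() for ln in lines if ln.strip()]
    location, device_ids = entries[0], entries[1:]
    return [assign_to_location(location, d) for d in device_ids]

print(process_scan_file(["117\n", "23000\n", "53455\n"]))
```

In the real script, the lines list would come straight from the scanner's text doc via open(...).readlines().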

renumber residues in a protein structure file (pdb)

Hi
I am currently involved in making a website aimed at combining all papillomavirus information in a single place.
As part of the effort we are curating all known files on public servers (e.g. genbank)
One of the issues I ran into was that many (~50%) of all solved structures are not numbered according to the protein.
I.e. a subdomain was crystallized (amino acids 310-450), but the crystallographer deposited it as residues 1-140.
I was wondering whether anyone knows of a way to renumber the entire PDB file. I have found ways to renumber the sequence (identified by SEQRES), but this does not update the helix and sheet information.
I would appreciate it if you had any suggestions…
Thanks
I'm the maintainer of pdb-tools - which may be a tool that can assist you.
I have recently modified the residue-renumber script within my application to provide more flexibility. It can now renumber hetatms and specific chains, and either force the residue numbers to be continuous or just add a user-specified offset to all residues.
Please let me know if this assists you.
I frequently encounter this problem too. After abandoning an old Perl script I had for this, I've been experimenting with some Python instead. This solution assumes you have Biopython, ProDy (http://www.csb.pitt.edu/ProDy/#prody) and EMBOSS (http://emboss.sourceforge.net/) installed.
I used one of the papillomavirus PDB entries here.
from Bio import AlignIO,SeqIO,ExPASy,SwissProt
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
from Bio.Alphabet import IUPAC
from Bio.Emboss.Applications import NeedleCommandline
from prody.proteins.pdbfile import parsePDB, writePDB
import os
oneletter = {
'ASP':'D','GLU':'E','ASN':'N','GLN':'Q',
'ARG':'R','LYS':'K','PRO':'P','GLY':'G',
'CYS':'C','THR':'T','SER':'S','MET':'M',
'TRP':'W','PHE':'F','TYR':'Y','HIS':'H',
'ALA':'A','VAL':'V','LEU':'L','ILE':'I',
}
# Retrieve pdb to extract sequence
# Can probably be done with Bio.PDB but being able to use the vmd-like selection algebra is nice
pdbname="2kpl"
selection="chain A"
structure=parsePDB(pdbname)
pdbseq_str=''.join([oneletter[i] for i in structure.select("protein and name CA and %s"%selection).getResnames()])
alnPDBseq=SeqRecord(Seq(pdbseq_str,IUPAC.protein),id=pdbname)
SeqIO.write(alnPDBseq,"%s.fasta"%pdbname,"fasta")
# Retrieve reference sequence
accession="Q96QZ7"
handle = ExPASy.get_sprot_raw(accession)
swissseq = SwissProt.read(handle)
refseq=SeqRecord(Seq(swissseq.sequence,IUPAC.protein),id=accession)
SeqIO.write(refseq, "%s.fasta"%accession,"fasta")
# Do global alignment with needle from EMBOSS, stores entire sequences which makes numbering easier
needle_cli = NeedleCommandline(asequence="%s.fasta"%pdbname,bsequence="%s.fasta"%accession,gapopen=10,gapextend=0.5,outfile="needle.out")
needle_cli()
aln = AlignIO.read("needle.out", "emboss")
os.remove("needle.out")
os.remove("%s.fasta"%pdbname)
os.remove("%s.fasta"%accession)
alnPDBseq = aln[0]
alnREFseq = aln[1]
# Initialize per-letter annotation for pdb sequence record
alnPDBseq.letter_annotations["resnum"]=[None]*len(alnPDBseq)
# Initialize annotation for reference sequence, assume first residue is #1
alnREFseq.letter_annotations["resnum"]=range(1,len(alnREFseq)+1)
# Set new residue numbers in alnPDBseq based on alignment
reslist = [[i,alnREFseq.letter_annotations["resnum"][i]] for i in range(len(alnREFseq)) if alnPDBseq[i] != '-']
for [i,r] in reslist:
    alnPDBseq.letter_annotations["resnum"][i]=r
# Set new residue numbers in the structure
newresnums=[i for i in alnPDBseq.letter_annotations["resnum"][:] if i != None]
resindices=structure.select("protein and name CA and %s"%selection).getResindices()
resmatrix = [[newresnums[i],resindices[i]] for i in range(len(newresnums)) ]
for [newresnum,resindex] in resmatrix:
    structure.select("resindex %d"%resindex).setResnums(newresnum)
writePDB("%s.renumbered.pdb"%pdbname,structure)
pdb-tools
Phenix pdb-tools
BioPython or Bio3D
Check the first one - it should fit your needs