I have an MSI file from which I'm trying to extract some of the parameters shown in the Details tab of the file properties.
I found msilib, where SummaryInformation.GetProperty(field) looks like the way to go, but I don't understand how to use it. How do I 'connect' it to the existing MSI file rather than one that is being created?
The MSI file contains both cab files and information in a database format.
See this link for more info about its structure and how to view it: MSI structure answer.
I never used the Python msilib, but from reading the documentation my guess is this:
To get the db object, use something like:
dbobject = msilib.OpenDatabase(path, msilib.MSIDBOPEN_READONLY)
If you want something from the summary info, you can do something like:
info = dbobject.GetSummaryInformation(1)
prop = info.GetProperty(field)
If the information you need is in one of the db tables, then you should run a SQL query against it:
view = dbobject.OpenView(sql)
view.Execute(params)
rec = view.Fetch()
str_val = rec.GetString(field)
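Putting it together, here is a minimal sketch (untested; the path, the Subject property, and the ProductVersion query are assumptions for illustration) that reads both a summary property and a table value from an existing MSI:

import msilib

path = r"C:\path\to\installer.msi"  # hypothetical path
db = msilib.OpenDatabase(path, msilib.MSIDBOPEN_READONLY)

# Summary information (the "Details" tab); 0 = we will not update any properties.
info = db.GetSummaryInformation(0)
print(info.GetProperty(msilib.PID_SUBJECT))

# Regular database table query.
view = db.OpenView("SELECT Value FROM Property WHERE Property='ProductVersion'")
view.Execute(None)
rec = view.Fetch()
print(rec.GetString(1))  # record fields are 1-indexed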
I'm trying to upload a .txt file to a MongoDB database collection using PyCharm, but nothing is appearing inside the collection. Here's the script I'm using at the moment:
from pymongo import MongoClient

client = MongoClient()
db = client.memorizer_data  # use a database called "memorizer_data"
collection = db.english     # and inside that DB, a collection called "english"

name = '7_1_1.txt'
with open(name, 'r') as f:
    text = f.read()  # read the txt file

# build a document to be inserted
text_file_doc = {"file_name": name, "contents": text}
# insert the document into the "english" collection
collection.insert_one(text_file_doc)
PyCharm gets through the script with no errors, I've also tried printing the acknowledged attribute just to see what comes up:
result = collection.insert_one(text_file_doc)
print(result.acknowledged)
Which gives me True. I wasn't sure if I was actually connecting to my database, so I tried db.list_collection_names() and my collection 'english' is in the list, so as far as I can tell I am connecting to it.
I'm a newbie to MongoDB so I realize I've probably gone about things the wrong way. At the moment I'm just trying to get the script working for a single .txt file before uploading everything my project is using to the db.
What makes you think there's nothing in the collection? Two ways to check:
In your pymongo code, add a final debug line:
print(collection.find_one())
Or, in the mongodb shell:
use memorizer_data
db.english.findOne()
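If you prefer a numeric check, counting the documents from pymongo works too (a one-line sketch reusing the collection object from the question):

print(collection.count_documents({}))  # 0 would mean nothing was inserted here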
I am currently working on a project in which I must retrieve a document uploaded on a MongoDB database using GridFS and store it in my local directory.
Up to now I have written these lines of code:
if not fs.exists({'filename': 'my_file.txt'}):
    CRAWLED_FILE = os.path.join(SAVING_FOLDER, 'new_file.txt')
else:
    file = fs.find_one({'filename': 'my_file.txt'})
    CRAWLED_FILE = os.path.join(SAVING_FOLDER, 'new_file.txt')
    with open(CRAWLED_FILE, 'wb') as f:
        f.write(file.read())
I believe that find_one doesn't allow me to write the content of the file previously stored in the database into a new file. f.write(file.read()) writes into the file just created (new_file.txt) the directory in which (new_file.txt) is stored! So I get a txt completely different from the one I uploaded to the database, and the only line in the txt is: E:\\my_folder\\sub_folder\\my_file.txt
It's kind of weird, and I don't even know why it is happening.
I thought it could work if I used the fs.get(ObjectId(ID)) method, which, according to the official documentation of PyMongo and GridFS, provides a file-like interface for reading. However, I only know the name of the txt saved in the database; I have no clue what its ObjectId is, and I cannot use a list or dict to store all the IDs of my documents since it wouldn't be worth it. I have checked many posts here on StackOverflow, and everyone suggests using subscripting. Basically you create a cursor using fs.find(), then you iterate over the cursor, for example like this:
for x in fs.find({'filename': 'my_file.txt'}):
    ID = x['_id']
See, many answers here suggest doing the following; the only problem is that the objects yielded by the cursor are not subscriptable, and I have no clue how to resolve this issue.
I must find a way to get the document '_id' given the filename of the document, so I can later use it combined with fs.get(ObjectId(ID)).
Hope you can help me, thank you a lot!
Matteo
You can just access it like this:
ID = x._id
But "_" is a protected member in Python, so I was looking around for other solutions (could not find much). For getting just the ID, you could do:
for ID in fs.find({'filename': 'my_file.txt'}).distinct('_id'):
    # do something with ID
Since that only gets the IDs, you would probably need to do:
query = fs.find({'filename': 'my_file.txt'}).limit(1)  # equivalent to find_one
content = next(query, None)  # iterate the GridOutCursor; yields one element or None
if content:
    ID = content._id
    ...
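To tie this back to the original goal, a minimal sketch (reusing fs and SAVING_FOLDER from the question, both assumed to already exist) that looks the file up by name, grabs its _id, and writes the contents to disk via fs.get():

import os

grid_out = fs.find_one({'filename': 'my_file.txt'})  # returns a GridOut or None
if grid_out is not None:
    ID = grid_out._id  # the ObjectId you were after
    CRAWLED_FILE = os.path.join(SAVING_FOLDER, 'new_file.txt')
    with open(CRAWLED_FILE, 'wb') as f:
        f.write(fs.get(ID).read())  # fs.get() returns a file-like GridOut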
I am using the application Zim Wiki (cross-platform, FOSS) to keep a personal wiki with lots of data coming from tables, copy-and-pasting, my own writing, and downloading and attaching .png and .html files for viewing and offline access.
The data that is not written or pasted can be stored in tables in the form of names, url addresses, and the names and locations of images and other attachments.
To insert into Zim, I can use the front end with its WYSIWYG editor, or, to make the skeleton of each entry, I could modify a template text entry. If I do this, nothing matters except the location and identity of each character in each line.
By supplying the text in this image (DandelionDemo source text), I can make this entry for Dandelion (DandelionDemo Wiki).
So, I can generate and name the wiki entry in Zim, which creates the .txt file for me and inserts the time stamp and title. The template for this type of entry, without the pasted fields, would be:
**Full Scientific Name: **[[|]]**[syn]**
**Common Name(s): **
===== =====
**USDA PLANTS entry for Code:** [[https://plants.usda.gov/core/profile?symbol=|]] **- CalPhotos available images for:** [[https://calphotos.berkeley.edu/cgi/img_query?query_src=photos_index&where-taxon=|]]
**---**
**From - Wikipedia **[[wp?]] **- **[[/Docs/Plants/]]
{{/Docs/Plants/?height=364}}{{/Docs/Plants/?height=364}}
**()** //,// [[|(source)]]
**()** //// [[|(source)]]
**Wikipedia Intro: **////
---
So on the first line with content, after the 31st character (which is a tab), you paste "http... {etc}". Then the procedure would insert "Taraxacum officinale... {etc}" after the "|", which was the 32nd character, and so on. This data could come from "table1" and "table2", or from combining the tables into an un-normalized "table1table2", where each row could be converted to text or a .csv or... I don't know, what do you think?
Is there a way, in LibreOffice, to do this? I have used LibreOffice Base to generate a "book" form that populated fields, but that was much less complex data, without wiki linking and drag-and-drop pasting of images and attachments. So maybe the answer is to go simpler? The tables are not currently part of a registered database, but I could do that once I have decided on the method.
I am ultimately looking for a "way", hopefully an "easy" way. However, that may not be in LibreOffice. If not, I know that I could do this in Python, but I haven't learned much about Python yet. If it involves a language, that is the first and only one I don't know that I will invest in learning for this project. If you know a "way" to do this in Python, let me know, and my first project and way of framing my study process will be learning the methods that you share.
If you know of some other Linux GUI, I am definitely interested, but only in active free and open source builds that involve minimal/no compiling. I know the basics of SQL and DBMSs. In the past, I have gotten Microsoft SQL Server lite to work, but not DBeaver, yet. If you know of a CLI way, also let me know, but I am a self-taught, outdoors-loving Linux newb and mostly know how to tweak little settings in programs, how to use moderately easy programs like ImageMagick, and I have built a few LAMP stacks for Drupal and WordPress (no Bash, etc.).
Thank you very much!
Ok, since you want to learn some Python, let me propose a way to do this. First you need a template engine, like jinja2 (there are many others); a data source, in our example a .csv file (it could be something else, like a db); and finally some code that reads the csv line by line and mixes the content with the template.
Sample CSV file:
1;sample
2;dandelion
3;just for fun
Sample template:
**Full Scientific Name: **[[|]]**[syn]**
**Common Name(s): *{{name}}*
===== =====
USDA PLANTS entry for Code: *{{symbol}}*
---
Sample code:
#!/usr/bin/env python
#
# Using the file system loader.
#
# We assume we have a file in the same dir as this one called
# test_template.zim
#
from jinja2 import Environment, FileSystemLoader
import csv
import os

# Capture our current directory
THIS_DIR = os.path.dirname(os.path.abspath(__file__))

def print_zim_doc():
    # Create the jinja2 environment.
    # Notice the use of trim_blocks, which greatly helps control whitespace.
    j2_env = Environment(loader=FileSystemLoader(THIS_DIR),
                         trim_blocks=True)
    template = j2_env.get_template('test_template.zim')
    with open('example.csv') as csv_file:
        reader = csv.reader(csv_file, delimiter=';')
        for row in reader:
            result = template.render(symbol=row[0], name=row[1])
            # save the rendered entry, named after the first column
            with open(row[0] + ".txt", "wt") as fh:
                fh.write(result)

if __name__ == '__main__':
    print_zim_doc()
The code is pretty simple: it reads the template located in the same folder as the Python code, opens the csv file (also located in the same place), iterates over each line of the csv, renders the template using the values of the csv columns to fill the {{var_name}} placeholders, and finally saves the rendered result in a new file named after one of the csv column values. This sample will generate 3 files (1.txt, 2.txt, 3.txt). From here you can extend and improve the code to get your desired results.
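For instance, with the sample csv and template above, the rendered 2.txt should come out like this:

**Full Scientific Name: **[[|]]**[syn]**
**Common Name(s): *dandelion*
===== =====
USDA PLANTS entry for Code: *2*
---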
I have a JSON file that looks like this:
I have a list of device IDs, and I'd like to search my JSON for a specific value of the id, to get the name.
The data that is now in JSON format used to be in XML format, for which I used to do this:
device = xml.find("devices/device[@id='%s']" % someDeviceID)
deviceName = device.attrib['name']
So far, based on answers online, I have managed to search the JSON for a key, but I haven't yet managed to search for a value.
Personally, to read a json file I use the jsondatabase module. Using this module I would use the following code:
from jsondb.db import Database

db = Database('PATH/TO/YOUR/JSON/FILE')
for device in db['devices']:
    if device['id'] == 'SEARCHEDID':
        print(device['name'])
Of course, when your json is online you could fetch it with the requests module and then pass it to the jsondatabase module.
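If you would rather not add a dependency, the same lookup works with the standard library json module; a minimal sketch, assuming the file is named devices.json and holds a top-level "devices" list whose items carry "id" and "name" keys:

import json

with open('devices.json') as f:  # hypothetical file name
    data = json.load(f)

searched_id = 'SEARCHEDID'  # placeholder, as in the snippet above
device_name = next(
    (d['name'] for d in data['devices'] if d['id'] == searched_id),
    None,  # returned when no device matches
)
print(device_name)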
Given a bibTeX file, I need to add the respective fields (author, title, journal, etc.) to a table in a MySQL database (with a custom schema).
After doing some initial research, I found that there exists Bibutils which I could use to convert a bib file to xml. My initial idea was to convert it to XML and then parse the XML in python to populate a dictionary.
My main questions are:
Is there a better way I could do this conversion?
Is there a library which directly parses a bibTex and gives me the fields in python?
(I did find bibliography.parsing, which uses bibutils internally, but there is not much documentation on it and I am finding it tough to get it to work.)
Old question, but I am doing the same thing at the moment using the Pybtex library, which has an inbuilt parser:
from pybtex.database.input import bibtex

# open a bibtex file
parser = bibtex.Parser()
bibdata = parser.parse_file("myrefs.bib")

# loop through the individual references
for bib_id in bibdata.entries:
    b = bibdata.entries[bib_id].fields
    try:
        # change these lines to create a SQL insert
        print(b["title"])
        print(b["journal"])
        print(b["year"])
        # deal with multiple authors
        for author in bibdata.entries[bib_id].persons["author"]:
            print(author.first(), author.last())
    # field may not exist for a reference
    except KeyError:
        continue
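As a hedged illustration of the "create a SQL insert" comment above, the print calls could be replaced with something like this (the bibliography table, the refs database, and the connection details are all assumptions; any DB-API driver such as MySQLdb/mysqlclient would do):

import MySQLdb

# connection details are hypothetical
conn = MySQLdb.connect(host="localhost", user="user", passwd="secret", db="refs")
cur = conn.cursor()

# inside the try block of the loop above, instead of the print calls:
cur.execute(
    "INSERT INTO bibliography (title, journal, year) VALUES (%s, %s, %s)",
    (b["title"], b["journal"], b["year"]),
)

# after the loop finishes:
conn.commit()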
My workaround is to use bibtexparser to export relevant fields to a csv file:
import bibtexparser
import pandas as pd

with open("../../bib/small.bib") as bibtex_file:
    bib_database = bibtexparser.load(bibtex_file)

df = pd.DataFrame(bib_database.entries)
selection = df[['doi', 'number']]
selection.to_csv('temp.csv', index=False)
And then write the csv to a table in the database, and delete the temp.csv.
This avoids some complication with pybtex I found.
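For the "write the csv to a table" step, a hedged sketch using pandas.DataFrame.to_sql through SQLAlchemy (the connection string, the refs database, and the bibliography table name are all assumptions):

import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("mysql://user:secret@localhost/refs")  # hypothetical credentials
pd.read_csv('temp.csv').to_sql('bibliography', engine, if_exists='append', index=False)
os.remove('temp.csv')  # delete temp.csv, as described above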
You can also use Python BibtexParser: https://github.com/sciunto/python-bibtexparser
Documentation: https://bibtexparser.readthedocs.org
It's very straightforward (I use it in production).
For the record, I am not the developer of this library.
Converting to XML is a fine idea.
XML exists as an application-independent data format, so that you can parse it with readily-available libraries; using it as an intermediary has no particular drawbacks. In fact, you can usually import XML into a database without even going through a programming language such as Python (although the amount of Python you'd have to write for a task like this is trivial).
So far as I know, there is no direct, mature bibTeX reader for Python.
You could use the Perl package Bib2ML (a.k.a. Bib2HTML). It contains a bib2sql tool that generates a SQL database from a BibTeX database.
An alternative tool: bibsql and bibtosql.
Then you can feed it to your schema by writing some SQL conversion queries.