I am using tabula-py to extract some text from a PDF.
For my program I need to know the total number of pages. Is it possible to find this out with tabula-py, or do I need another module for it? If so, can you suggest the easiest method, ideally without any additional module or with a built-in one?
Note: I don't need to read all the pages, so I am not using pages='all'.
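A minimal sketch of one way to get the count, assuming an additional module (PyPDF2) is acceptable; as far as I know tabula-py itself does not expose the page count, and the file name here is a placeholder:

```python
# Count the pages of a PDF with PyPDF2 (an extra module, not part of tabula-py).
from PyPDF2 import PdfReader

reader = PdfReader("document.pdf")   # hypothetical file name
num_pages = len(reader.pages)
print(num_pages)
```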
The website "Download the GLEIF Golden Copy and Delta Files"
has buttons that download data that I want to retrieve automatically with a Python script. Usually when I want to download a file, I use mget or similar, but that will not work here (at least I don't think it will).
For some reason I cannot fathom, the producers of the data seem to want to force one to download the files manually. I really need to automate this to reduce the number of steps for my users (and frankly for me), since there are a great many files in addition to these and I want to automate as many of them as possible (ideally all of them).
So my question is this: is there some kind of Python package for doing this sort of thing? If not a Python package, is there perhaps some other tool that would be useful for it? I have to believe this is a common annoyance.
Yup, you can use BeautifulSoup to scrape the file URLs from the page and then download them with requests.
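A rough sketch of that approach. The page URL and the link-filtering rule are assumptions; if the download buttons are generated by JavaScript rather than plain anchor links, you may need the site's API or a browser-automation tool instead:

```python
import requests
from bs4 import BeautifulSoup
from urllib.parse import urljoin

page_url = "https://www.gleif.org/en/lei-data/gleif-golden-copy/download-the-golden-copy"  # assumed URL

resp = requests.get(page_url)
resp.raise_for_status()

soup = BeautifulSoup(resp.text, "html.parser")
for a in soup.find_all("a", href=True):
    href = urljoin(page_url, a["href"])
    if href.endswith((".zip", ".csv")):          # keep only links that look like data files
        filename = href.rsplit("/", 1)[-1]
        with requests.get(href, stream=True) as r:
            r.raise_for_status()
            with open(filename, "wb") as f:
                for chunk in r.iter_content(chunk_size=8192):
                    f.write(chunk)
```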
I'm trying to convert a .txt data file to an STDF (ATE Standard Test Data Format, commonly used in semiconductor testing) file.
Is there any way to do that?
Are there any libraries in Python which would help in cases like this?
Thanks!
You can try the Semi-ATE STDF library.
It supports only STDF version 4. You can install it from conda-forge or PyPI.
It is of course possible, since Python is Turing complete. However, you should use one of the available open-source or commercial libraries to handle the STDF writing if you are not familiar with STDF. Even one misplaced byte in the binary output will wreck your file.
It is impossible to say whether an existing tool can do this for you because a text file can have anything in it. Your text file will need to adhere to the tool's expectations of where the necessary header data (lot id, program name, etc.), test names and numbers, part identifiers, test results and so on will be in the text file.
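To illustrate why the byte layout matters, here is a tiny sketch that hand-packs just the STDF v4 FAR (File Attributes Record) with struct; the field values (CPU_TYPE = 2 for little-endian, STDF_VER = 4) follow my reading of the spec, and a real converter should rely on a tested STDF library rather than code like this:

```python
import struct

REC_LEN = 2                 # bytes of record data after the 4-byte header
REC_TYP, REC_SUB = 0, 10    # type/subtype that identify the FAR record
CPU_TYPE, STDF_VER = 2, 4   # assumed: 2 = little-endian host, STDF version 4

# Header (REC_LEN as u2, REC_TYP/REC_SUB as u1) followed by the two data bytes.
far = struct.pack("<HBBBB", REC_LEN, REC_TYP, REC_SUB, CPU_TYPE, STDF_VER)

with open("out.stdf", "wb") as f:   # hypothetical output file
    f.write(far)
```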
If I have a list of URLs, is it possible to parse them in Python, capture the server calls' key/values without having to open any browser manually, and save them to a local file?
The only library I found for the CSV part is pandas, but I haven't found anything for the first part. Any example would be perfect for me.
You can investigate one of the built-in or third-party libraries that let Python perform the browser-like operations and record the results, filter them, and then use the built-in csv library to write the output.
You will probably need one of the lower-level libraries:
urllib/urllib2/urllib3
And you may need to override one or more of their methods to record the transaction data you are looking for.
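A minimal sketch of the idea, assuming the "key/values" you want are the HTTP response headers; the URL list and output file name are placeholders:

```python
import csv
import urllib.request

urls = ["https://example.com/page1", "https://example.com/page2"]  # placeholder list of URLs

with open("results.csv", "w", newline="") as out:
    writer = csv.writer(out)
    writer.writerow(["url", "key", "value"])
    for url in urls:
        with urllib.request.urlopen(url) as resp:
            for key, value in resp.getheaders():   # record each response header as a key/value row
                writer.writerow([url, key, value])
```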
I'm using urllib to open a site and get some information from it.
Is there a way to "open" only the part of the site I need and discard the rest (by discard I mean don't download/load the rest)?
I'm not sure what you are trying to do. If you are simply trying to parse the site to find the useful "information", then I recommend using the library BeautifulSoup. That library makes it easy to keep certain parts of the site while discarding the rest.
If, however, you are trying to save download bandwidth by downloading only a piece of the site, then you will need to do a lot more work. If that is the case, please say so in your question and I'll update the answer.
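A small sketch of the first interpretation: download the whole page, then keep only the part you care about with BeautifulSoup. The URL and the element id "content" are assumptions:

```python
import urllib.request
from bs4 import BeautifulSoup

with urllib.request.urlopen("https://example.com") as resp:   # placeholder URL
    html = resp.read()

soup = BeautifulSoup(html, "html.parser")
wanted = soup.find("div", id="content")      # keep just this element, discard the rest
print(wanted.get_text(strip=True) if wanted else "section not found")
```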
You should be able to call read(n) instead of read(); this reads n bytes instead of everything. Append each chunk to the already-downloaded bytes and check whether it contains what you're looking for. Then you can stop the download with .close().
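A sketch of that partial-download idea with urllib: read in chunks and stop as soon as the marker text appears. The URL, chunk size, and marker are placeholders:

```python
import urllib.request

marker = b"</title>"                 # whatever you are looking for
data = b""
resp = urllib.request.urlopen("https://example.com")   # placeholder URL
while True:
    chunk = resp.read(4096)          # read 4096 bytes at a time instead of the whole page
    if not chunk:
        break
    data += chunk
    if marker in data:
        break
resp.close()                         # stop downloading the rest
print(len(data), "bytes downloaded")
```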
Hello, I want to compare two webpages using a Python script.
How can I achieve this? Thanks in advance!
First, you want to retrieve both webpages. You can use wget, urlretrieve, etc.:
wget Vs urlretrieve of python
Second, you want to "compare" the pages. You can use a "diff" tool as Chinmay noted. You can also do a keyword analysis of the two pages:
1. Parse all keywords from each page, e.g. How do I extract keywords used in text?
2. Optionally take the "stem" of the words with something like: http://pypi.python.org/pypi/stemming/1.0
3. Use some math to compare the two pages' keywords, e.g. term frequency–inverse document frequency: http://en.wikipedia.org/wiki/Tf%E2%80%93idf with some of the Python tools out there, like these: http://wiki.python.org/moin/InformationRetrieval (a rough sketch follows after this list).
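A rough standard-library sketch of the keyword comparison (no stemming); the two URLs are placeholders, and the score is a plain cosine similarity over term counts rather than full tf-idf:

```python
import math
import re
import urllib.request
from collections import Counter

def keywords(url):
    """Fetch a page and return a Counter of its lowercase words (crude HTML stripping)."""
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="ignore")
    text = re.sub(r"<[^>]+>", " ", text)          # remove tags the simple way
    return Counter(re.findall(r"[a-z]{3,}", text.lower()))

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    common = set(a) & set(b)
    num = sum(a[w] * b[w] for w in common)
    den = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return num / den if den else 0.0

a = keywords("https://example.com/page1")   # placeholder URLs
b = keywords("https://example.com/page2")
print("similarity:", round(cosine(a, b), 3))
```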
What do you mean by compare? If you just want to find the differences between two files, try difflib, which is part of the standard Python library.
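A minimal difflib sketch: fetch both pages and print a unified diff of their lines. The URLs are placeholders:

```python
import difflib
import urllib.request

def fetch_lines(url):
    """Download a page and return its text split into lines."""
    with urllib.request.urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="ignore").splitlines()

old = fetch_lines("https://example.com/old")   # placeholder URLs
new = fetch_lines("https://example.com/new")

for line in difflib.unified_diff(old, new, fromfile="old", tofile="new", lineterm=""):
    print(line)
```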