I would like to generate a geoiplist.acl file from a CSV file. The ACL file format is:
acl "A1" {
31.14.133.39/32;
37.221.172.0/23;
acl "A2" {
5.145.149.142/32;
57.72.6.0/24;
......
The CSV file: http://download.db-ip.com/free/dbip-country-2016-09.csv.gz
Here are sample lines from the CSV file, with IP_Start, IP_End and Country columns:
"0.0.0.0","0.255.255.255","US"
"1.0.0.0","1.0.0.255","AU"
"1.0.1.0","1.0.3.255","CN"
"1.0.4.0","1.0.7.255","AU"
"1.0.8.0","1.0.15.255","CN"
"1.0.16.0","1.0.31.255","JP"
"1.0.32.0","1.0.63.255","CN"
"1.0.64.0","1.0.127.255","JP"
"1.0.128.0","1.0.255.255","TH"
"1.1.0.0","1.1.0.255","CN"
I got some references from here: http://geoip.site/ but their ACLs don't have a complete list.
Can anyone help me do this in bash, please? Thanks in advance.
The issue here is that DB-IP provide the start and end of each range in human-readable IP address format. Why they have done this I'm not sure, because the more universal (and easier to process) format is to simply present these values as integers.
In any case, I have modified the Python script on http://geoip.site/ to handle this and included the DB-IP database URL within the script. The ACL file generated from their CSV file is now also available to download from http://geoip.site/download/DB-IP/GeoIP.acl
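For anyone wanting to roll their own, a rough sketch of the conversion in Python (this is not the actual geoip.site script; file names are placeholders and it assumes the CSV has already been gunzipped) might look like this:

import csv
import ipaddress
from collections import defaultdict

def csv_to_acl(csv_path, acl_path):
    # country code -> list of CIDR strings
    blocks = defaultdict(list)
    with open(csv_path, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) < 3:
                continue
            start, end, country = row[0], row[1], row[2]
            try:
                first = ipaddress.ip_address(start)
                last = ipaddress.ip_address(end)
                # Collapse the start/end pair into the minimal set of CIDR blocks.
                nets = ipaddress.summarize_address_range(first, last)
                blocks[country].extend(str(net) for net in nets)
            except (ValueError, TypeError):
                # Malformed addresses or rows mixing IPv4 and IPv6: skip them.
                continue
    with open(acl_path, "w") as out:
        for country in sorted(blocks):
            out.write('acl "%s" {\n' % country)
            for net in blocks[country]:
                out.write("    %s;\n" % net)
            out.write("};\n")

csv_to_acl("dbip-country-2016-09.csv", "GeoIP.acl")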
Note that I have already identified some issues with this database:
1. The entry "::","2001:1ff:ffff:ffff:ffff:ffff:ffff:ffff","US" exists in it. This is obviously complete rubbish; it broke the Python script (I've improved the error detection code to handle this) and is also one of the reasons for point 4 below.
2. "224.0.0.0","255.255.255.255","CH" is an interesting entry. I'm not entirely sure how they have deemed the entire multicast block of the IPv4 address space to be delegated to Switzerland, or why it exists in their database at all.
3. The statistical analysis (available on http://geoip.site/) suggests that their DB/CSV file spans 100% of the IPv4 address space outside 224.0.0.0/3 (multicast). That's 3,758,096,384 addresses. But we already know that several address blocks should not exist in there, the obvious ones being 10.0.0.0/8, 172.16.0.0/12 and 192.168.0.0/16 (and indeed others; further investigation reveals the entry "192.168.0.0","192.169.31.255","US", which covers 192.168.0.0/16 and beyond). So this result looks questionable.
4. The statistical analysis also reports that their DB/CSV spans 100% of the IPv6 address space. This is primarily because they have mapped 3000::/4 (and various other smaller address blocks) to the US, which is wrong (see http://www.iana.org/assignments/ipv6-unicast-address-assignments/ipv6-unicast-address-assignments.xhtml, where 3000::/4 is listed as RESERVED). This mapping originates from the entry "2c10::","ffff:ffff:ffff:ffff:ffff:ffff:ffff:ffff","US". The other two databases on http://geoip.site/ are nowhere near this magnitude of coverage across the IPv6 address space (both are currently less than 0.1%), so this result also looks questionable.
Given all of the above, I would question the accuracy of their database and contact them about it. But feel free to download the http://geoip.site/download/DB-IP/GeoIP.acl file if you wish to use it.
Finally, I would not have even attempted this in bash. The conversions required to produce this file from their CSV file are far easier in a more advanced language like Python; bash just wouldn't cut it (well, not my bash).
I hope this has helped resolve your query/problem.
UPDATE
As of the December 2016 version of their database, DB-IP have introduced a ZZ acl to cover IPv4 networks that are outside the realm of being mapped to any specific country. This certainly resolves some of the issues I raised above.
acl ZZ {
0.0.0.0/8;
10.0.0.0/8;
100.64.0.0/10;
127.0.0.0/8;
169.254.0.0/16;
172.16.0.0/12;
192.0.0.8/29;
192.0.0.16/28;
192.0.0.32/27;
192.0.0.64/26;
192.0.0.128/25;
192.0.2.0/24;
192.88.99.0/24;
192.168.0.0/16;
198.18.0.0/15;
198.51.100.0/24;
203.0.113.0/24;
};
I recently managed to restore an old COLX wallet by importing a mobile backup file into the colx-qt core %appdata folder.
After resyncing the blockchain I changed the rpcpassword and the wallet's encryption password, tested both of these and everything was in good order.
For testing purposes, I had decided to dump another encrypted backup file (wallet.dat) into the %appdata folder without removing the restored one first, resynced the blockchain, tried the passwords and noted they were now both incorrect.
I've removed both backups, resynced the blockchain anew, set the rpcpassword and wallet unlock password to those associated with the first backup, and imported said backup again, but noticed that both passwords were now incorrect, thereby losing my access to that mobile backup as well.
Since I happen to have an older wallet.json file containing multiple addresses, several pubkeys and ckeys as well as the mkey (in the format {nID=1; encrypted_key:"the encrypted key"; nDerivationIterations:"62116 or some similar number"; version:""; etc etc}), could I use this information to restore my access to the wallet?
If YES, how exactly must I go about doing it?
Thank you in advance for your assistance!
I haven't tried anything else so far because I am trying to understand how this happened, what caused the change, and how I will need to fix it, before I go ahead and add even more muddy details to the issue.
I'm working on a model of the universe, for which I'm using data available on the Sloan Digital Sky Survey site. The problem is that some files are more than 4 GB in size (more than 50 GB in total), and while I know those files contain a lot of data columns, I only want data from a few of them. I had heard about web scraping, so I searched for how to do it, but that didn't help, as all the tutorials explained how to download the whole file using Python. I want to know whether there is any way I can extract only a few columns from such a file, so that I only get the data I need and don't have to download the whole large file just for a small fraction of its data.
Sorry, my question is just words and no code, because I'm not that proficient in Python. I just searched online and learned how to do basic web scraping, but it didn't solve my problem.
It would be even more helpful if you could suggest some other ways to reduce the amount of data I'll have to download.
Here is the URL to download FITS files: https://data.sdss.org/sas/dr12/boss/lss/
I only want to extract the columns that contain coordinates (ra, dec), distance, velocity and redshift from the files.
Also, is there a way to do the same thing with CSV files or a general way to do it with any file?
I'm afraid what you're asking is generally not possible, at least not without significant effort and software support on both the client and server side.
First of all, FITS tables are stored in a row-oriented binary format, meaning that if you want to stream a portion of a FITS table you can read it one row at a time. But to read individual columns you need to make partial reads of each row, for every single row in the table. Some web servers support what are called "range requests", meaning you can request only a few ranges of bytes from a file instead of the whole file. The web server has to have this enabled, and not all servers do. If FITS tables were stored column-oriented this could be feasible, as you could download just the header of the file to determine the byte ranges of the columns, and then download just those ranges.
Unfortunately, since FITS tables are row-oriented, if you wanted to load, say, 3 columns from a table containing a million rows, that would involve 3 million range requests, which would likely involve enough overhead that you wouldn't gain anything from it (and I'm honestly not sure what limits web servers place on how many ranges you can request in a single request, but I suspect most won't allow something so extreme).
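If you want to check whether a given server even honours byte ranges, here is a quick sketch (the URL is just the directory listing from your question; point it at an actual file):

import urllib.request

url = "https://data.sdss.org/sas/dr12/boss/lss/"  # placeholder; use a real file URL
req = urllib.request.Request(url, headers={"Range": "bytes=0-1023"})
with urllib.request.urlopen(req) as resp:
    # 206 Partial Content means the server honoured the range;
    # 200 means it ignored the header and sent the whole resource.
    print(resp.status, resp.headers.get("Content-Range"))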
There are other astronomy data formats (e.g. I think CASA Tables) that can store tables in a column-oriented format, and so are more feasible for this kind of use case.
Further, even if the HTTP limitations could be overcome, you would need software support for loading the file in this manner. This has been discussed to a limited extent here but for the reasons discussed above it would mostly be useful for a limited set of cases, such as loading one HDU at a time (not so helpful in your case if the entire table is in one HDU) or possibly some other specialized cases such as sections of tile-compressed images.
As mentioned elsewhere, Dask supports loading binary arrays from various cloud-based filesystems, but when it comes to streaming data from arbitrary HTTP servers it runs into similar limitations.
Worse still, I looked at the link you provided and all the files there are gzip-compressed, so they are especially difficult to deal with, since you can't know which ranges to request without decompressing them first.
As an aside, since you asked, you will have the same problem with CSV, only worse since CSV fields are not typically in fixed-width format, so there is no way to know how to extract individual columns without downloading the whole file.
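Once a CSV file is on disk, though, pandas can at least discard the unwanted columns while parsing, so you don't pay the memory cost; a tiny sketch with made-up file and column names:

import pandas as pd

# Parse only the columns of interest from an already-downloaded CSV.
df = pd.read_csv("catalog.csv", usecols=["ra", "dec", "z"])
df.to_csv("catalog_subset.csv", index=False)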
For FITS, maybe it would be helpful to develop a web service capable of serving arbitrary extracts from larger FITS files. Whether such a thing already exists I don't know, but I don't think it does in any very general sense. So this would a) have to be developed, and b) require you to ask whoever hosts the files you want to access to run such a service.
Your best bet is to just download the whole file, extract the data you need from it, and delete the original file, assuming you no longer need it. It's also possible the information you need is already accessible through some online database (for example, the SDSS SkyServer/CasJobs SQL interfaces).
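A rough sketch of that "download, extract, delete" approach with astropy is shown below; the file name is just one entry from that directory and the column names are assumptions, so check hdul[1].columns to see what is really there:

import os
import urllib.request
from astropy.io import fits
from astropy.table import Table

# Example file from the BOSS LSS directory (adjust to the file you actually need).
url = "https://data.sdss.org/sas/dr12/boss/lss/galaxy_DR12v5_CMASS_North.fits.gz"
local = "galaxy_DR12v5_CMASS_North.fits.gz"

urllib.request.urlretrieve(url, local)       # download the whole file once

with fits.open(local) as hdul:
    data = hdul[1].data                      # the table is usually in the first extension
    # Column names are assumptions; print(hdul[1].columns) to list the real ones.
    subset = Table({name: data[name] for name in ("RA", "DEC", "Z")})

subset.write("boss_subset.fits", overwrite=True)
os.remove(local)                             # reclaim the disk space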
My place of work receives sets of pipe delimited files from many different clients that we use Visual Studio Integration Services projects to import into tables in our MS SQL 2008 R2 server for later processing - specifically with Data Flow Tasks containing Flat File Source to OLE DB Destination steps. Each data flow task has columns that are specifically mapped to columns in our tables, but the chances of a column addition in any file from any client are relatively high (and we are rarely warned that there will be changes), which is becoming tedious as I currently need to...
Run a python script that uses pyodbc to grab the columns contained in the destination tables and compare them to the source files to find out if there is a difference in columns
Execute the necessary SQL to add the columns to the destination tables
Open the corresponding VS Solution, refresh the columns in the flat file sources that have new columns and manually map each new column to the newly created columns in the OLE DB Destination
We are quickly getting more and more sites that I have to do this with, and I desperately need to find a way to automate this. The VS project can easily be automated if we could depend on the changes being accounted for, but as of now this needs to be a manual process to ensure we load all the data properly. Things I've thought about but have been unable to execute...
Using an XML parser - combined with the output of the python script mentioned above - to append new column mappings to the source/destination objects in the VS Package.dtsx.xml. I hit a dead end when I could not find more information about creating a valid "DTS:DTSID" for a new column mapping, and the file became corrupted whenever I edited it. This also seemed like a very unstable option.
Finding any built-in event handler in Visual Studio to throw an error if the flat file has a new, un-mapped column - I would be fine with this as a solution, because we could confidently schedule the import projects to run automatically and only worry about changing the mapping for projects that failed. I could not find a built-in feature that does this. I'm also aware I could do this with a python script similar to the one mentioned above that fails if there are differences, but this would be extremely tedious to implement due to file-naming conventions and the fact that there are 50+ clients with more on the way.
I am open to any type of solution, even if it's just an idea. As this is my first question on Stack Overflow, I apologize if this was asked poorly and ask for feedback if the question could be improved. Thanks in advance to those that take the time to read!
Edit:
@Larnu stated that SSIS by default throws an error when unrecognized columns are found in the files. This however does not currently happen with our Visual Studio Integration Services projects and our team would certainly resist a conversion of all packages to SSIS at this point. It would be wonderful if someone could provide insight as to how to ensure the package would fail if there were new columns - in VS. If this isn't possible, I may have to pursue the difficult route as mentioned by @Dave Cullum, though I don't think I get paid enough for that!
Also, talking sense into the clients has proven to be impossible - the addition of columns will always be a crapshoot!
Using a script task you can read your file and record how many pipes are in a line:
// Count the columns in the first line of the flat file:
// (number of pipe characters) + 1 = number of columns.
int ColumnCount;
using (System.IO.StreamReader sr = new System.IO.StreamReader(path))
{
    string line = sr.ReadLine();
    ColumnCount = line.Length - line.Replace("|", "").Length + 1;
}
I assume you know how to set that to a variable.
Now add an Execute SQL Task and store the result as another variable:
SELECT COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'YourDestinationTable'
Coming out of the Execute SQL Task, add a conditional precedence constraint (arrow) and compare the two numbers. If they are equal, continue your process. If they are not equal, go ahead and send an email (or some other type of notification).
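If you would rather run the same check outside SSIS (you mentioned already having a pyodbc script), a rough sketch of the comparison might look like the following; the connection string, file path and table name are placeholders:

import pyodbc

def column_counts_match(file_path, table_name, conn_str):
    # Count pipe-delimited columns in the first line of the flat file.
    with open(file_path, encoding="utf-8") as fh:
        header = fh.readline().rstrip("\r\n")
    file_cols = header.count("|") + 1

    # Count the columns on the destination table.
    conn = pyodbc.connect(conn_str)
    try:
        cur = conn.cursor()
        cur.execute(
            "SELECT COUNT(*) FROM INFORMATION_SCHEMA.COLUMNS WHERE TABLE_NAME = ?",
            table_name,
        )
        table_cols = cur.fetchone()[0]
    finally:
        conn.close()

    return file_cols == table_cols

# Placeholders for illustration only:
# if not column_counts_match(r"\\server\share\client_file.txt", "ClientTable",
#                            "DSN=MyWarehouse;Trusted_Connection=yes"):
#     ...  # send a notification instead of running the load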
I recently started coding, but then took a brief break. I started a new job and I'm under some confidentiality restrictions. I need to make sure Python and pandas are secure before I do this (I'll also be talking with IT on Monday).
I was wondering whether pandas in Python is a purely local library, or whether the data gets sent to or from somewhere else. If I write something in pandas, will the data be stored somewhere by pandas?
The best example of what I'm doing is found in a Medium article about scraping data from tables that don't have CSV exports.
https://medium.com/@ageitgey/quick-tip-the-easiest-way-to-grab-data-out-of-a-web-page-in-python-7153cecfca58
Creating a DataFrame out of a dict, doing vectorized operations on its rows, printing out slices of it, etc. are all completely local. I'm not sure why this matters. Is your IT department going to say, "Well, this looks fishy—but some random guy on the internet says it's safe, so forget our policies, we'll allow it"? But, for what it's worth, you have this random guy on the internet saying it's safe.
However, Pandas can be used to make network requests. Some of the IO functions can take a URL instead of a filename or file object. Some of them can also use another library that does so; e.g., if you have lxml installed, read_html will pass the filename to lxml to open, and if that filename is an HTTP URL, lxml will go fetch it.
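For example, the first call below is purely local, while the second only touches the network because the "filename" happens to be a URL (the names are made up):

import pandas as pd

# Purely local: the file is read from disk and nothing leaves your machine.
df = pd.read_csv("local_file.csv")

# This would make a network request, simply because the path is a URL.
# df = pd.read_csv("https://example.com/data.csv")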
This is rarely a concern, but if you want to get paranoid, you could imagine ways in which it might be.
For example, let's say your program is parsing user-supplied CSV files and doing some data processing on them. That's safe; there's no network access at all.
Now you add a way for the user to specify CSV files by URL, and you pass them into read_csv and go fetch them. Still safe; there is network access, but it's transparent to the end user and obviously needed for the user's task; if this weren't appropriate, your company wouldn't have asked you to add this feature.
Now you add a way for CSV files to reference other CSV files: if column 1 is #path/to/other/file, you recursively read and parse path/to/other/file and embed it in place of the current row. Now, what happens if I can give one of your users a CSV file where, buried at line 69105, there's #http://example.com/evilendpoint?track=me (an endpoint which does something evil, but then returns something that looks like a perfectly valid thing to insert at line 69105 of that CSV)? Now you may be facilitating my hacking of your employees, without even realizing it.
Of course this is a more limited version of exactly the same functionality that's in every web browser with HTML pages. But maybe your IT department has gotten paranoid and clamped down security on browsers and written an application-level sniffer to detect suspicious followup requests from HTML, and haven't thought to do the same thing for references in CSV files.
I don't think that's a problem a sane IT department should worry about. If your company doesn't trust you to think about these issues, they shouldn't hire you and assign you to write software that involves scraping the web. But then not every IT department is sane about what they do and don't get paranoid about. ("Sure, we can forward this under-1024 port to your laptop for you… but you'd better not install a newer version of Firefox than 16.0…")
I have a text file that contains something like this:
host host_name {
    # comment (optional)
    hardware ethernet 01:22:85:EA:A8:5D;
    fixed-address 192.168.107.210;
}
host another_host_name {
    # comment (optional)
    hardware ethernet 01:22:85:EA:A8:5D;
    fixed-address 192.168.107.210;
}
I want my program to detect the line with 'host' and then modify the content of that block according to what I type.
When I provide the following (for example via request.form.get('name') in Flask):
#random inputs
host = name2
comment = nothing
hardware = 00:00:00:00:00:00
address = 192.168.101.123
I would like to have:
host host_name {
    # comment (optional)
    hardware ethernet 01:22:85:EA:A8:5D;
    fixed-address 192.168.107.210;
}
#after the change
host name2 {
    # nothing
    hardware ethernet 00:00:00:00:00:00;
    fixed-address 192.168.101.123;
}
My problem isn't with the regex itself but rather with how to structure the program to achieve this, so how can I do it?
If you start to code the way you are currently thinking about your problem, you will likely end up with a complete and utter mess no one can change or maintain, even if it works at first.
You have several different small tasks, and you are thinking of them "as one thing".
No. You are using Flask to provide you with an easy and lightweight web interface. That is OK. You already know how to get a text block from it. You don't need to ask anything about Flask now, nor should you put any further code in the same place as the code that gets data from the web.
Instead just write some plain Python functions that will get your textual data as parameters, and then update the configuration files for you.
And while you're at it: if you can pick a special template and create a new config file each time, instead of trying to parse an existing file and update your desired values in place, then this is something you can achieve.
Parsing a "real world" config file in place and updating it live is not an easy task. Actually, it can be so complicated that most Linux distributions avoided even trying it for more than 10 years.
And then you have a further complication you don't mention: you probably want to keep any configuration you are not changing in the file. I was going to advise you to keep a template of the file and fill in your data, creating a new file on each run, but that would require you to keep all the other config data in some other format, which would basically duplicate your problem.
So, OK, your idea of "getting data from the original file" with regular expressions might be the way to go. But still, keep it separate from writing the file back. And don't think in terms of "lines" if the file is structured in blocks.
One feasible approach would be to read the file and get the data you are interested in into a Python data structure (for example, a list of dictionaries, each one having your host_name, comment, ethernet and ip fields). Then, in a second pass with the same regex, replace all those values with placeholders, so that the file contents can be filled back in by a call to the .format method, or using Flask's Jinja2 templating.
Separating the above into two functions will even allow you to present all the configured hosts on your web interface, so the user can edit them individually without having to type Ethernet addresses by hand.
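Just to illustrate the shape of that two-function split, here is a rough sketch (not a finished solution: the regex, the file name and the assumption that the comment line is always present are all guesses about your exact format):

import re

BLOCK_RE = re.compile(
    r"host\s+(?P<name>\S+)\s*\{\s*"
    r"#\s*(?P<comment>[^\n]*)\n\s*"
    r"hardware ethernet\s+(?P<mac>[0-9A-Fa-f:]+);\s*"
    r"fixed-address\s+(?P<ip>[\d.]+);\s*\}"
)

TEMPLATE = (
    "host {name} {{\n"
    "    # {comment}\n"
    "    hardware ethernet {mac};\n"
    "    fixed-address {ip};\n"
    "}}\n"
)

def parse_hosts(text):
    """Return a list of dicts, one per host block."""
    return [m.groupdict() for m in BLOCK_RE.finditer(text)]

def render_hosts(hosts):
    """Render the whole file back from the list of dicts."""
    return "\n".join(TEMPLATE.format(**h) for h in hosts)

with open("dhcpd.conf") as fh:               # file name is an assumption
    hosts = parse_hosts(fh.read())

# e.g. apply the values the user submitted through the Flask form
hosts[0].update(name="name2", comment="nothing",
                mac="00:00:00:00:00:00", ip="192.168.101.123")

with open("dhcpd.conf", "w") as fh:
    fh.write(render_hosts(hosts))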
Sorry, but I won't be writing all of this code for you. I hope the above helps you think about a proper approach. If you later come up with other questions, with some code from your attempts, we can help you further.