Exporting all fields of a pcap file into CSV in Python

I want to convert a pcap file into CSV using Python, but the problem is that I have to specify precisely which fields to export when using tshark. I want to export all fields. When I don't specify the fields, a blank file is exported. A sample command is shown below:
tshark -r /root to the file/test1.pcap -T fields -e ip.src > test1.csv
I want to remove the specific fields so that ALL fields are exported, and then access the fields in Python (using a library like pandas, in dictionary style, e.g. df["Source"]).
Any help appreciated!

I want to remove the specific fields so that ALL fields are exported;
This is not possible with the CSV (fields) format
$tshark -r trace.pcap -T fields
tshark: "-Tfields" was specified, but no fields were specified with "-e".
An alternative solution is to use one of the JSON formats (-T ek|json|jsonraw) or the XML format (-T pdml).
then access the fields in Python (using a library like pandas, in dictionary style, e.g. df["Source"])
In Python you can parse the JSON using json.loads() and get a dictionary. See https://www.w3schools.com/python/python_json.asp
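For illustration, here is a minimal sketch (my own, not from the answer above), assuming the capture is first exported with tshark -r test1.pcap -T json > test1.json; each packet's dissected fields then sit under its "_source"/"layers" keys:
import json

# Load the JSON export produced by: tshark -r test1.pcap -T json > test1.json
with open('test1.json') as f:
    packets = json.load(f)

for pkt in packets:
    layers = pkt['_source']['layers']
    # Every dissected field is available under its tshark name, e.g. 'ip.src'
    print(layers.get('ip', {}).get('ip.src'))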

Related

Select specific fields from DB export that includes Json

I have an export from a DB table that has the following columns:
name|value|age|external_atributes
The external_atributes column is in JSON format, so the export looks like this:
George|10|30|{"label1":1,"label2":2,"label3":3,"label4":4,"label5":5,"label6":"6","label7":"7","label8":"8"}
What is the most efficient way (since the export has more than 1M lines) to keep only the name and the values of label2, label5 and label6? For example, from the above export I would like to keep only:
George|2|5|6
Edit: I am not sure about the order of the fields/variables in the JSON part. The data could also be, for example:
George|10|30|{"label2":2,"label1":1,"label4":4,"label3":3,"label6":6,"label8":"8","label7":"7","label5":"5"}
Also, the fact that some of the values are double-quoted while some are not is intentional (this is how they appear in the export).
My understanding so far is that I have to use something with a JSON parser, like Python or jq.
This is what I created in Python, and it seems to work as expected:
from __future__ import print_function
import sys, json

with open(sys.argv[1], 'r') as file:
    for line in file:
        fields = line.split('|')
        print(fields[0], json.loads(fields[3])['label2'], json.loads(fields[3])['label5'], json.loads(fields[3])['label6'], sep='|')
output:
George|2|5|6
Since I am looking for the most efficient way to do this, any comment is more than welcome.
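For what it is worth, here is a small variant of the script above (my own sketch, same assumptions about the input) that calls json.loads only once per line, which should help over a million records:
from __future__ import print_function
import sys, json

with open(sys.argv[1], 'r') as f:
    for line in f:
        fields = line.split('|')
        attrs = json.loads(fields[3])  # parse the JSON column once per line
        print(fields[0], attrs['label2'], attrs['label5'], attrs['label6'], sep='|')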
Even if the data is easy to parse, I'd advise using a JSON parser like jq to extract your JSON data:
<file jq -rR '
split("|")|[ .[0], (.[3]|fromjson|(.label2,.label5,.label6)|tostring)]|join("|")'
The options -R and -r allow jq to read and write raw strings as input and output (instead of JSON data).
The split function turns each line into an array of fields that can be indexed by number, e.g. .[0] and .[3].
The JSON field (.[3]) is then parsed with fromjson so that the wanted labels can be extracted.
All wanted fields are put into an array and joined together with the | delimiter.
You could split with multiple delimiters using a character class.
The following prints the desired result:
awk 'BEGIN { FS = "[|,:]";OFS="|"} {gsub(/"/,"",$15)}{print $1,$7,$13,$15}'
The above solution assumes that the input data is structured.
Since this is about record-based text edits, awk is probably the best tool for the task. However, here is a sed solution:
sed 's/\([^|]*\).*label2[^0-9]*\([0-9]*\).*label5[^0-9]*\([0-9]*\).*label6[^0-9]*\([0-9]*\).*/\1|\2|\3|\4/' inputFile

How to specify length prefix in bcp command?

I'm using the bcp tool to import a CSV into a SQL Server table. I'm using Python's subprocess to execute the bcp command. My sample bcp command is shown below:
bcp someDatabase.dbo.sometable IN myData.csv -n -t , -r \n -S mysqlserver.com -U myusername -P 'mypassword'
The command executes and says:
0 rows copied.
Even if I remove the -t or -n option, the message is still the same. I read in the SQL Server docs that there is something called a length prefix (if the bcp tool is used in -n (native) mode).
How can I specify that length prefix with the bcp command?
My goal is to import a CSV into a SQL Server table using the bcp tool. I first create my table according to the data in the CSV file, and I don't create a format file for bcp. I want all my data to be inserted correctly (according to the data types I have specified in my table).
If it is a CSV file then do not use the -n, -t or -r options. Use -e errorFileName to catch the error(s) you may be encountering. You can then take the appropriate steps.
It is a very common practice with ETL tasks to first load text files into a "load" table that has all varchar/char data types. This avoids any possible implied data conversion errors that are more difficult/time-consuming to troubleshoot via BCP. Just pass the character data in the text file into character datatype columns in SQL Server. Then you can move data from the "load" table into your final destination table. This will allow you to use the MUCH more functional T-SQL commands to handle transformation of data types. Do not force BCP/SQL Server to transform your data-types for you by going from text file directly into your final table via BCP.
I would also suggest visually inspecting your incoming data file to confirm it is formatted as specified. I often see mixups between \n and \r\n for the line terminator.
Last, when loading the data, you should also use the -e option as Neeraj has stated. This will capture "data" errors (it does not report command/syntax errors; just data/formatting errors). Since your incoming file is an ASCII text file, you DO want to use the -c option for loading into the all-varchar "load" table.
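As a rough illustration of the advice above, here is a minimal Python sketch (my own, reusing the same hypothetical server, file and table names as the question) that runs bcp in character mode and captures data errors with -e:
import subprocess

cmd = [
    'bcp', 'someDatabase.dbo.sometable', 'IN', 'myData.csv',
    '-c',                    # character mode for a plain-text CSV
    '-t', ',',               # field terminator
    '-r', r'\n',             # row terminator (check whether the file really uses \n or \r\n)
    '-e', 'bcp_errors.txt',  # data/formatting errors are written here
    '-S', 'mysqlserver.com', '-U', 'myusername', '-P', 'mypassword',
]
result = subprocess.run(cmd, capture_output=True, text=True)
print(result.stdout)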

Manipulating a file with nested tags and key-value pairs

I have a need to work with configuration files that use nested HTML-style tags of key-value pairs using equals signs.
I'd like a Python approach that would allow me to read such files, add, delete or modify sections, and write the updated file.
The files look like:
<tag1>
key1=value1
key2=value2
<tag2>
key3=value3
</tag2>
<tag2>
key3=value four
</tag2>
</tag1>
So it's not quite an HTML or XML file, and not a Windows INI file either. There are no spaces surrounding the equals signs, there are a few random blank lines in the files that seem to be ignored, and values in the key-value pairs don't use quote marks and may have embedded spaces.
I could not find a definition or name for this exact file structure but I found it hard to focus the search so I may have missed something obvious.
Is this a recognized standard file structure? If so what is it called?
I'd appreciate any pointers on what libraries can be coerced into working with this structure and maybe some examples if they are not readily available in the docco.
Thanks.
For keeping configuration files, you can use the configparser module in Python. This makes it very easy to read config information in your app.
For a config file like this:
[installation]
library=%(prefix)s/lib
include=%(prefix)s/include
bin=%(prefix)s/bin
prefix=/usr/local
[debug]
log_errors=true
show_warnings=False
[server]
port: 8080
nworkers: 32
pid-file=/tmp/spam.pid
root=/www/root
You can read this configuration file as shown below:
>>> from configparser import ConfigParser
>>> cfg = ConfigParser()
>>> cfg.read('config.ini')
['config.ini']
>>> cfg.sections()
['installation', 'debug', 'server']
>>> cfg.get('installation', 'library')
'/usr/local/lib'
>>> cfg.getboolean('debug', 'log_errors')
True
>>> cfg.getint('server', 'port')
8080
>>> cfg.getint('server', 'nworkers')
32
>>> cfg.get('server', 'pid-file')
'/tmp/spam.pid'
If you want to stick with that HTML-like kind of configuration, check out the xml.etree module. It offers a wide range of functions.
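Since the question's format is neither INI nor well-formed XML, here is a rough hand-rolled sketch (my own, not configparser or xml.etree, and the file name is hypothetical) that reads the nested <tag>/key=value structure into nested dictionaries:
import re

def parse_config(text):
    root = {}
    stack = [root]
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                                   # blank lines are ignored
        opening = re.fullmatch(r'<(\w+)>', line)
        if opening:                                    # start a nested section
            section = {}
            stack[-1].setdefault(opening.group(1), []).append(section)
            stack.append(section)
        elif re.fullmatch(r'</(\w+)>', line):          # end of the current section
            stack.pop()
        else:                                          # plain key=value pair
            key, _, value = line.partition('=')
            stack[-1][key] = value                     # values may contain spaces

with open('settings.cfg') as f:                        # hypothetical file name
    config = parse_config(f.read())
print(config['tag1'][0]['key1'])                       # -> value1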

Malformed CSV quoting

I pass data from SAS to Python using CSV format. I have a problem with the quoting format SAS uses. Strings like "480 КЖИ" ОАО aren't quoted, but Python's csv module thinks they are.
import csv

dat = ['18CA4,"480 КЖИ" ОАО', '1142F,"""Росдорлизинг"" Российская дор,лизинг,компания"" ОАО"']
for i in csv.reader(dat):
    print(i)
>>['18CA4', '480 КЖИ ОАО']
>>['1142F', '"Росдорлизинг" Российская дор,лизинг,компания" ОАО']
The 2nd string is fine, but I need the 480 КЖИ ОАО string to be "480 КЖИ" ОАО. I can't find such an option in the csv module. Maybe it's possible to force proc export to quote all " chars?
UPD: Here's a similar problem Python CSV : field containing quotation mark at the beginning
UPD2: @Quentin has asked for details. Here they are: I have SAS 8.2 connected to a 9.1 server. I download custom format data from the server side with proc format cntlout=..; proc download... So I get a dictionary-like dataset <key>, <value>. Then I pass this dataset in CSV format using proc export via the DDE interface to Python. But proc export quotes only strings which include the delimiter (comma), as I understand it. So I think I need SAS to quote quotation marks too, or Python to unquote only those strings which include commas.
UPDATE: switching from proc export via DDE to direct reading of the dataset with a modified SAS7BDAT Python module hugely improved performance, and I got rid of the problem above.
SAS will add extra quotes if the value has quotes in it already.
data _null_;
  file log dsd;
  string='"480 КЖИ" ОАО';
  put string;
run;
Generates this result:
"""480 КЖИ"" ОАО"
Perhaps the quotes are being removed at some other point in the flow from SAS to Python? Try saving the CSV file to a disk and having Python read from the disk file.
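As a quick check (my own addition), the doubly quoted form SAS produces above round-trips cleanly through Python's csv module:
import csv

line = '18CA4,"""480 КЖИ"" ОАО"'
print(next(csv.reader([line])))   # -> ['18CA4', '"480 КЖИ" ОАО']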

Python script to convert po file to localized json

I need a Python script to convert a .po file to localized JSON.
You might start here: http://docs.python.org/library/gettext.html and here: http://docs.python.org/library/json.html
http://jsgettext.berlios.de/ contains a .po to JSON converter (written in Perl). For Python, you can use polib to access the .po file contents and transform them as desired.
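For example, a minimal sketch along those lines (my own, assuming a hypothetical messages.po input) using polib:
import json
import polib

po = polib.pofile('messages.po')
# Keep only translated entries, keyed by their msgid
translations = {entry.msgid: entry.msgstr for entry in po if entry.msgstr}
with open('messages.json', 'w', encoding='utf-8') as f:
    json.dump(translations, f, ensure_ascii=False, indent=2)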
