Accessing *.mdb file with Pandas [duplicate] - python

I have 7 tables that I want to read from an Access file (.mdb). I then need to change the values using pandas DataFrames and save them to a new Access file. Do you have any suggestions on how to do that?
I am relatively new to Python, and any support is highly appreciated.

This may be some help: https://pypi.python.org/pypi/pandas_access
Everything should be straightforward once you're able to load the tables into pandas DataFrames. Then do the data manipulations you need and send the results back to Access.
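A minimal sketch of that round trip, assuming pyodbc and the Microsoft Access ODBC driver are installed (Windows), and that the target file already contains empty tables with matching columns, for example a structural copy of the original .mdb. The helper names (`access_conn_str`, `fetch_table`, `write_table`, `round_trip`) are illustrative only and are not part of pandas_access or pyodbc.

```python
def access_conn_str(path):
    """Build an ODBC connection string for an Access file (.mdb or .accdb)."""
    return r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};Dbq=" + path + ";"


def fetch_table(conn, table):
    """Return (column_names, rows) for one table using any DB-API connection."""
    cur = conn.cursor()
    cur.execute("SELECT * FROM [{}]".format(table))
    columns = [d[0] for d in cur.description]
    rows = [tuple(r) for r in cur.fetchall()]
    cur.close()
    return columns, rows


def write_table(conn, table, columns, rows):
    """Insert rows into an existing table with a parameterised INSERT."""
    col_list = ", ".join("[{}]".format(c) for c in columns)
    marks = ", ".join("?" for _ in columns)
    sql = "INSERT INTO [{}] ({}) VALUES ({})".format(table, col_list, marks)
    cur = conn.cursor()
    cur.executemany(sql, rows)
    conn.commit()
    cur.close()


def round_trip(src_path, dst_path, tables, transform):
    """Read each table into a DataFrame, apply transform(df), write to dst_path."""
    import pandas as pd  # imported here so the helpers above work without pandas
    import pyodbc

    src = pyodbc.connect(access_conn_str(src_path))
    dst = pyodbc.connect(access_conn_str(dst_path))
    try:
        for table in tables:
            columns, rows = fetch_table(src, table)
            df = transform(pd.DataFrame.from_records(rows, columns=columns))
            write_table(dst, table, list(df.columns),
                        list(df.itertuples(index=False, name=None)))
    finally:
        src.close()
        dst.close()
```

With hypothetical paths and table names, a call could look like `round_trip(r"C:\data\old.mdb", r"C:\data\new.mdb", ["Table1", "Table2"], lambda df: df)`, where the lambda is replaced by whatever changes the DataFrame needs.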

I think you should check this.
https://pypi.python.org/pypi/pyodbc/
Also, to read data from Access Table, try something like this.
# -*- coding: utf-8 -*-
import pypyodbc
pypyodbc.lowercase = False
conn = pypyodbc.connect(
    r"Driver={Microsoft Access Driver (*.mdb, *.accdb)};" +
    r"Dbq=C:\Users\Public\Database1.accdb;")
cur = conn.cursor()
cur.execute("SELECT CreatureID, Name_EN, Name_JP FROM Creatures")
while True:
    row = cur.fetchone()
    if row is None:
        break
    print(u"Creature with ID {0} is {1} ({2})".format(
        row.get("CreatureID"), row.get("Name_EN"), row.get("Name_JP")))
cur.close()
conn.close()
Or . . . just use VBA, if you are already using Access.
Dim outputFileName As String
outputFileName = CurrentProject.Path & "\Export_" & Format(Date, "yyyyMMdd") & ".xls"
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel9, "Table1", outputFileName , True
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel9, "Table2", outputFileName , True
This could be an option too . . .
strPath = "V:\Reports\Worklist_Summary.xlsx"
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12, "qryEscByDate", strPath
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12, "qryCreatedByDate", strPath
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12, "qryClosedByDate", strPath
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12, "qryCreatedByUsers", strPath
DoCmd.TransferSpreadsheet acExport, acSpreadsheetTypeExcel12, "qrySummaries", strPath
Or . . . run some VBA scripts . . .
Option Compare Database
Option Explicit
Private Sub Command2_Click()
    Dim strFile As String
    Dim varItem As Variant
    strFile = InputBox("Designate the path and file name to export to...", "Export")
    If (strFile = vbNullString) Then Exit Sub
    For Each varItem In Me.List0.ItemsSelected
        DoCmd.TransferSpreadsheet transferType:=acExport, _
            spreadsheetType:=acSpreadsheetTypeExcel9, _
            tableName:=Me.List0.ItemData(varItem), _
            fileName:=strFile
    Next
    MsgBox "Process complete.", vbOKOnly, "Export"
End Sub
Private Sub Form_Open(Cancel As Integer)
    Dim strTables As String
    Dim tdf As TableDef
    For Each tdf In CurrentDb.TableDefs
        If (Left(tdf.Name, 4) <> "MSys") Then
            strTables = strTables & tdf.Name & ","
        End If
    Next
    strTables = Left(strTables, Len(strTables) - 1)
    Me.List0.RowSource = strTables
End Sub
When all the data is exported, do your transformations and load the results (back to Access or another destination).
I'll bet you don't even need the export step. You can probably do everything you need to do in Access, all by itself.

Related

encoding issue when exporting from dataframe in python to MS access

Dear all,
I have a Python script that runs a query against a database, stores the result in a DataFrame, and then exports it to MS Access.
In the loop, it divides the result into 3 files (one file per month).
The issue is in the column LI_DESC: it contains Arabic letters that show correctly in Jupyter but appear as incorrect characters when exported to Access.
Here are the columns showing correctly in Jupyter:
Here are the columns as shown in the Access file:
Python code:
import cx_Oracle
import os
import accessdb
import pandas as pd
dsn_tns = cx_Oracle.makedsn('10.112.**.****', '1521', service_name='cdwn10g.hq')
conn = cx_Oracle.connect(user='BI', password='BI', dsn=dsn_tns , encoding='utf-8')
sql_query= pd.read_sql_query("""SELECT MONTH1,LI_DESC,PORT,REGS_NUM,REG_DT,CTRY_CD,TAR_CD,UNS_QTY,UN_CD,KGN,KGG,CIF_AMT,CURCY_CD,CURCY_RT
FROM STTS.CDS
WHERE SUBSTR(REG_DT_G,1,6) BETWEEN to_number(extract(year from add_months(sysdate,-3)) || '' || to_char(add_months(sysdate,-3), 'MM')) AND to_number(extract(year from add_months(sysdate,-1)) || '' || to_char(add_months(sysdate,-1), 'MM'))
ORDER BY PORT, REGS_NUM, REG_DT""",conn)
df = pd.DataFrame(sql_query)
from datetime import datetime
today = datetime.now()
if not os.path.exists(r'C:\Users\nalkar\Documents\Python Scripts\RUNDATE'+today.strftime('%Y%m%d')):
    os.makedirs(r'C:\Users\nalkar\Documents\Python Scripts\RUNDATE'+today.strftime('%Y%m%d'))
    months = df['MONTH1'].unique().tolist()
    for month in months:
        mydf = df.loc[df.MONTH1 == month]
        mydf.to_accessdb(r"C:\Users\nalkar\Documents\Python Scripts\RUNDATE"+today.strftime('%Y%m%d')+"\%s.accdb" %month, "Data")
    print('done')
else:
    print(r'directory already exist')

Executing VBAscript from Python

I am in need of some help in regards to win32com.client. I have the code working as far as creating the macro from Python and using Excel but I would like this code to also run the vbascript.
Thank you guys for all of your wonderful feedback!
import pyodbc
import win32com.client as win32
xl = win32.gencache.EnsureDispatch('Excel.Application')
xl.Visible = True
ss = xl.Workbooks.Add()
sh = ss.ActiveSheet
xlmodule = ss.VBProject.VBComponents.Add(1) # vbext_ct_StdModule
sCode = '''Sub Download_Standard_BOM()
'Initializes variables
Dim cnn As New ADODB.Connection
Dim rst As New ADODB.Recordset
Dim ConnectionString As String
Dim StrQuery As String
ConnectionString = "Provider=SQLOLEDB; Network Library=dbmssocn;Password=********;User ID=*******;Initial Catalog=**;Data Source=*************;"
cnn.Open ConnectionString
cnn.CommandTimeout = 900
StrQuery = "SELECT * FROM car_search WHERE shop_id = *******"
rst.Open StrQuery, cnn
Sheets(1).Range("A2").CopyFromRecordset rst
End Sub'''
xlmodule.CodeModule.AddFromString(sCode)
You should be able to use Excel's Application.Run method:
xl.Run("Download_Standard_BOM")
EDIT
If you need to refer to ADO, then you can either use late-binding, like this:
Dim cnn As Object 'ADODB.Connection
Dim rst As Object 'ADODB.Recordset
Set cnn = CreateObject("ADODB.Connection")
Set rst = CreateObject("ADODB.Recordset")
Or, use early binding and add a reference to the VBA Project:
ss.VBProject.References.AddFromGuid("{2A75196C-D9EB-4129-B803-931327F72D5C}", 2, 8)

using declare cursor with cx_oracle

I am trying to parse an SQL script file based on the delimiter ";" and then call cx_Oracle to connect and execute the statements on the DB server. I have run into an issue with a cursor-related block of code. My call structure is as follows:
ScriptHandle = open(filepath)
SqlScript = ScriptHandle.read()
SqlCommands = SqlScript.split(';')
for sqlcommand in SqlCommands:
    print sqlcommand,'\n'*3
    if sqlcommand:
        ODBCCon.ExecuteWithCx_Oracle(cursor, sqlcommand)
The problem I have comes with the following sql block:
DECLARE CURSOR date_cur IS (select calendar_date
    from cg_calendar dates
    where dates.calendar_date between '30-Jun-2014' and '31-Jul-2015'
    and global_business_or_holiday = 'B');
BEGIN
    FOR date_rec in date_cur LOOP
        insert into fc_pos
        SELECT PP.acctid,
            PP.mgrid,
            PP.activitydt,
            PP.secid,
            PP.shrparamt,
            PP.lclmktval,
            PP.usdmktval,
            SM.asset_name_1,
            SM.asset_name_2,
            SM.cg_sym,
            SM.fc_local_crncy_cd,
            SM.fc_local_crncy_id,
            SM.fc_trade_cd,
            substr(SM.asset_name_1, 17,3) as against_crncy_cd
        FROM FC_acct_mgr AM, asset SM, ma_mktval PP
        WHERE PP.dw_asset_id = SM.dw_asset_id
        AND PP.secid = SM.asset_id
        AND PP.activitydt = date_rec.calendar_date
        AND AM.acctid = PP.acctid
        AND AM.mgrid = PP.mgrid
        AND SM.asset_categ_cd = 'FC';
    END LOOP;
END;
The Python parse step above splits this block at every ";" delimiter, whereas I need to treat it as one block starting at DECLARE and ending at END;
How can I accomplish this from the Python end? I have been unable to make any headway on this, and it is a legacy process flow that I am automating.
Thanks in advance.
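One possible approach, sketched under stated assumptions, is to split the script line by line and keep PL/SQL blocks whole. `split_sql_script` is a hypothetical helper, not part of cx_Oracle. It assumes a block starts on a line beginning with DECLARE or BEGIN, that it ends on the line whose `END;` closes the outermost BEGIN, and that no ";" or keyword appears inside string literals or comments. Plain statements are returned without the trailing ";", while blocks keep their final `END;`, which is the form cx_Oracle expects for each.

```python
import re

BLOCK_START = re.compile(r"^\s*(DECLARE|BEGIN)\b", re.IGNORECASE)
BEGIN_WORD = re.compile(r"\bBEGIN\b", re.IGNORECASE)
# Matches "END;" but not "END LOOP;" or "END IF;"
BLOCK_END = re.compile(r"\bEND\s*;", re.IGNORECASE)


def split_sql_script(script):
    """Split a script into statements, keeping DECLARE/BEGIN ... END; blocks intact."""
    statements = []
    buffer = []
    depth = 0
    in_block = False
    for line in script.splitlines():
        if not in_block and not buffer and BLOCK_START.match(line):
            in_block = True
        if in_block:
            buffer.append(line)
            depth += len(BEGIN_WORD.findall(line))
            depth -= len(BLOCK_END.findall(line))
            if depth == 0 and BLOCK_END.search(line):
                # Outermost END; reached: emit the whole block, semicolon included
                statements.append("\n".join(buffer).strip())
                buffer = []
                in_block = False
            continue
        # Plain SQL: split on ';' and drop the delimiter
        parts = line.split(";")
        for part in parts[:-1]:
            buffer.append(part)
            stmt = "\n".join(buffer).strip()
            if stmt:
                statements.append(stmt)
            buffer = []
        if parts[-1].strip():
            buffer.append(parts[-1])
    tail = "\n".join(buffer).strip()
    if tail:
        statements.append(tail)
    return statements
```

Each returned string could then be passed to the existing execute call in place of the pieces produced by `SqlScript.split(';')`.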

Python: converting a tab-delimited file into CSV

I basically want to convert tab delimited text file http://www.linux-usb.org/usb.ids into a csv file.
I tried importing using Excel, but it is not optimal, it turns out like:
8087 Intel Corp.
0020 Integrated Rate Matching Hub
0024 Integrated Rate Matching Hub
What I want, for easy searching, is:
8087 Intel Corp. 0020 Integrated Rate Matching Hub
8087 Intel Corp. 0024 Integrated Rate Matching Hub
Is there any way I can do this in Python?
$ListDirectory = "C:\USB_List.csv"
Invoke-WebRequest 'http://www.linux-usb.org/usb.ids' -OutFile $ListDirectory
$pageContents = Get-Content $ListDirectory | Select-Object -Skip 22
"vendor`tvendor_name`tproduct`tproduct_name`r" > $ListDirectory
#Variables and Flags
$currentVid
$currentVName
$currentPid
$currentPName
$vendorDone = $TRUE
$interfaceFlag = $FALSE
$nextline
$tab = "`t"
foreach($line in $pageContents){
    if($line.StartsWith("`#")){
        continue
    }
    elseif($line.length -eq 0){
        exit
    }
    if(!($line.StartsWith($tab)) -and ($vendorDone -eq $TRUE)){
        $vendorDone = $FALSE
    }
    if(!($line.StartsWith($tab)) -and ($vendorDone -eq $FALSE)){
        $pos = $line.IndexOf(" ")
        $currentVid = $line.Substring(0, $pos)
        $currentVName = $line.Substring($pos+2)
        "$currentVid`t$currentVName`t`t`r" >> $ListDirectory
        $vendorDone = $TRUE
    }
    elseif ($line.StartsWith($tab)){
        if ($interfaceFlag -eq $TRUE){
            $interfaceFlag = $FALSE
        }
        $nextline = $line.TrimStart()
        if ($nextline.StartsWith($tab)){
            $interfaceFlag = $TRUE
        }
        if ($interfaceFlag -eq $FALSE){
            $pos = $nextline.IndexOf(" ")
            $currentPid = $nextline.Substring(0, $pos)
            $currentPName = $nextline.Substring($pos+2)
            "$currentVid`t$currentVName`t$currentPid`t$currentPName`r" >> $ListDirectory
            Write-Host "$currentVid`t$currentVName`t$currentPid`t$currentPName`r"
            $interfaceFlag = $FALSE
        }
    }
}
I know the ask is for Python, but I built this PowerShell script to do the job. It takes no parameters. Just run it as admin from the directory where you want to store the script. The script collects everything from the http://www.linux-usb.org/usb.ids page, parses the data and writes it to a tab-delimited file. You can then open the file in Excel as a tab-delimited file. Ensure the columns are read as "text" and not "general" and you're good to go. :)
Parsing this page is tricky because the script has to be contextually aware of every VID-Vendor line preceding a series of PID-Product lines. I also forced the script to ignore the commented description section, the interface-interface_name lines, the random comments that the maintainer inserted throughout the USB list (sigh) and everything from "#List of known device classes, subclasses and protocols" onwards, which is out of scope for this request.
I hope this helps!
You just need to write a little program that scans in the data a line at a time. Then it should check to see if the first character is a tab ('\t'). If not then that value should be stored. If it does start with tab then print out the value that was previously stored followed by the current line. The result will be the list in the format you want.
Something like this would work:
import csv
lines = []
with open("usb.ids.txt") as f:
    reader = csv.reader(f, delimiter="\t")
    device = ""
    for line in reader:
        # Ignore empty lines and comments
        if len(line) == 0 or (len(line[0]) > 0 and line[0][0] == "#"):
            continue
        if line[0] != "":
            device = line[0]
        elif line[1] != "":
            lines.append((device, line[1]))
print(lines)
You basically need to loop through each line and, if it's a device line, remember it for the following lines. This only works for two columns, and you would then need to write them all to a CSV file, but that's easy enough.
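As a rough sketch of the remaining step, the following splits IDs from names and writes four CSV columns. It assumes the usb.ids layout in which an ID and its name are separated by two spaces, product lines start with one tab, interface lines start with two tabs, and the vendor list ends at the first unindented line that does not begin with a four-digit hex ID. The function names and the column headers are illustrative choices, not part of any library.

```python
import csv
import io
import re

VENDOR_LINE = re.compile(r"^[0-9a-fA-F]{4}\s")


def usb_ids_to_rows(lines):
    """Return (vendor_id, vendor_name, product_id, product_name) tuples."""
    rows = []
    vendor_id = vendor_name = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line.strip() or line.startswith("#"):
            continue  # blank lines and comments
        if line.startswith("\t\t"):
            continue  # interface lines are out of scope here
        if line.startswith("\t"):
            if vendor_id is None:
                continue  # indented line outside the vendor list
            product_id, _, product_name = line.strip().partition("  ")
            rows.append((vendor_id, vendor_name, product_id, product_name.strip()))
        elif VENDOR_LINE.match(line):
            vendor_id, _, vendor_name = line.partition("  ")
            vendor_name = vendor_name.strip()
        else:
            vendor_id = None  # e.g. the device class sections after the vendor list
    return rows


def write_csv(rows, fileobj):
    """Write the rows with a header line to an open text file object."""
    writer = csv.writer(fileobj)
    writer.writerow(["vendor", "vendor_name", "product", "product_name"])
    writer.writerows(rows)
```

Typical use would be to open usb.ids for reading, pass the file object to `usb_ids_to_rows`, then open the output with `newline=""` and pass it to `write_csv`.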

How to parse nagios status.dat file?

I'd like to parse status.dat file for nagios3 and output as xml with a python script.
The XML part is the easy one, but how do I go about parsing the file? Should I use a multi-line regex?
The file may be large, since many hosts and services are monitored. Would loading the whole file into memory be wise?
I only need to extract the services that are in a critical state and the hosts they belong to.
Any help and pointing in the right direction will be highly appreciated.
Later edit: here's how the file looks:
########################################
# NAGIOS STATUS FILE
#
# THIS FILE IS AUTOMATICALLY GENERATED
# BY NAGIOS. DO NOT MODIFY THIS FILE!
########################################
info {
created=1233491098
version=2.11
}
program {
modified_host_attributes=0
modified_service_attributes=0
nagios_pid=15015
daemon_mode=1
program_start=1233490393
last_command_check=0
last_log_rotation=0
enable_notifications=1
active_service_checks_enabled=1
passive_service_checks_enabled=1
active_host_checks_enabled=1
passive_host_checks_enabled=1
enable_event_handlers=1
obsess_over_services=0
obsess_over_hosts=0
check_service_freshness=1
check_host_freshness=0
enable_flap_detection=0
enable_failure_prediction=1
process_performance_data=0
global_host_event_handler=
global_service_event_handler=
total_external_command_buffer_slots=4096
used_external_command_buffer_slots=0
high_external_command_buffer_slots=0
total_check_result_buffer_slots=4096
used_check_result_buffer_slots=0
high_check_result_buffer_slots=2
}
host {
host_name=localhost
modified_attributes=0
check_command=check-host-alive
event_handler=
has_been_checked=1
should_be_scheduled=0
check_execution_time=0.019
check_latency=0.000
check_type=0
current_state=0
last_hard_state=0
plugin_output=PING OK - Packet loss = 0%, RTA = 3.57 ms
performance_data=
last_check=1233490883
next_check=0
current_attempt=1
max_attempts=10
state_type=1
last_state_change=1233489475
last_hard_state_change=1233489475
last_time_up=1233490883
last_time_down=0
last_time_unreachable=0
last_notification=0
next_notification=0
no_more_notifications=0
current_notification_number=0
notifications_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_host=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
service {
host_name=gateway
service_description=PING
modified_attributes=0
check_command=check_ping!100.0,20%!500.0,60%
event_handler=
has_been_checked=1
should_be_scheduled=1
check_execution_time=4.017
check_latency=0.210
check_type=0
current_state=0
last_hard_state=0
current_attempt=1
max_attempts=4
state_type=1
last_state_change=1233489432
last_hard_state_change=1233489432
last_time_ok=1233491078
last_time_warning=0
last_time_unknown=0
last_time_critical=0
plugin_output=PING OK - Packet loss = 0%, RTA = 2.98 ms
performance_data=
last_check=1233491078
next_check=1233491378
current_notification_number=0
last_notification=0
next_notification=0
no_more_notifications=0
notifications_enabled=1
active_checks_enabled=1
passive_checks_enabled=1
event_handler_enabled=1
problem_has_been_acknowledged=0
acknowledgement_type=0
flap_detection_enabled=1
failure_prediction_enabled=1
process_performance_data=1
obsess_over_service=1
last_update=1233491098
is_flapping=0
percent_state_change=0.00
scheduled_downtime_depth=0
}
It can have any number of hosts and a host can have any number of services.
Pfft, get yerself mk_livestatus. http://mathias-kettner.de/checkmk_livestatus.html
Nagiosity does exactly what you want:
http://code.google.com/p/nagiosity/
Having shamelessly stolen from the above examples,
here's a version built for Python 2.4 that returns a dict containing arrays of nagios sections.
import re
def parseConf(source):
    conf = {}
    patID=re.compile(r"(?:\s*define)?\s*(\w+)\s+{")
    patAttr=re.compile(r"\s*(\w+)(?:=|\s+)(.*)")
    patEndID=re.compile(r"\s*}")
    for line in source.splitlines():
        line=line.strip()
        matchID = patID.match(line)
        matchAttr = patAttr.match(line)
        matchEndID = patEndID.match( line)
        if len(line) == 0 or line[0]=='#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.setdefault(cur[0],[]).append(cur[1])
            del cur
    return conf
To get the names of all hosts whose contact_groups begin with 'devops':
nagcfg=parseConf(stringcontaingcompleteconfig)
hostlist=[host['host_name'] for host in nagcfg['host']
          if host['contact_groups'].startswith('devops')]
Don't know nagios and its config file, but the structure seems pretty simple:
# comment
identifier {
    attribute=
    attribute=value
}
which can simply be translated to
<identifier>
    <attribute name="attribute-name">attribute-value</attribute>
</identifier>
all contained inside a root-level <nagios> tag.
I don't see line breaks in the values. Does nagios have multi-line values?
You need to take care of equal signs within attribute values, so set your regex to non-greedy.
You can do something like this:
import re
def parseConf(filename):
    conf = []
    with open(filename, 'r') as f:
        for i in f.readlines():
            if i[0] == '#': continue
            matchID = re.search(r"([\w]+) {", i)
            matchAttr = re.search(r"[ ]*([\w]+)=([\w\d]*)", i)
            matchEndID = re.search(r"[ ]*}", i)
            if matchID:
                identifier = matchID.group(1)
                cur = [identifier, {}]
            elif matchAttr:
                attribute = matchAttr.group(1)
                value = matchAttr.group(2)
                cur[1][attribute] = value
            elif matchEndID:
                conf.append(cur)
    return conf
def conf2xml(filename):
    conf = parseConf(filename)
    xml = ''
    for ID in conf:
        xml += '<%s>\n' % ID[0]
        for attr in ID[1]:
            xml += '\t<attribute name="%s">%s</attribute>\n' % \
                (attr, ID[1][attr])
        xml += '</%s>\n' % ID[0]
    return xml
Then try to do:
print(conf2xml('conf.dat'))
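The memory concern from the question and the equal-sign caveat can both be handled without regular expressions. The sketch below reads one line at a time and splits each attribute on its first "=" only. It assumes critical services are marked by `current_state=2` and that service blocks are named either `service` (as in the sample above) or `servicestatus`; both function names are hypothetical.

```python
def iter_sections(lines):
    """Yield (section_name, attributes) pairs one block at a time."""
    name = None
    attrs = None
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if name is None:
            if line.endswith("{"):
                name = line[:-1].strip()
                attrs = {}
        elif line == "}":
            yield name, attrs
            name = None
        else:
            # Split on the first '=' only, so values may contain '=' themselves
            key, _, value = line.partition("=")
            attrs[key] = value


def critical_services(lines, section_names=("service", "servicestatus")):
    """Yield (host_name, service_description) for services in state 2."""
    for name, attrs in iter_sections(lines):
        if name in section_names and attrs.get("current_state") == "2":
            yield attrs.get("host_name"), attrs.get("service_description")
```

Because both functions are generators, an open file object can be passed directly, for example `critical_services(open('status.dat'))`, and only one block is held in memory at a time.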
If you slightly tweak Andrea's solution you can use that code to parse both the status.dat as well as the objects.cache
import re
def parseConf(source):
    conf = []
    for line in source.splitlines():
        line=line.strip()
        matchID = re.match(r"(?:\s*define)?\s*(\w+)\s+{", line)
        matchAttr = re.match(r"\s*(\w+)(?:=|\s+)(.*)", line)
        matchEndID = re.match(r"\s*}", line)
        if len(line) == 0 or line[0]=='#':
            pass
        elif matchID:
            identifier = matchID.group(1)
            cur = [identifier, {}]
        elif matchAttr:
            attribute = matchAttr.group(1)
            value = matchAttr.group(2).strip()
            cur[1][attribute] = value
        elif matchEndID and cur:
            conf.append(cur)
            del cur
    return conf
It is a little puzzling why nagios chose to use two different formats for these files, but once you've parsed them both into some usable python objects you can do quite a bit of magic through the external command file.
If anybody has a solution for getting this into a real XML DOM, that'd be awesome.
Over the last several months I've written and released a tool that parses the Nagios status.dat and objects.cache and builds a model that allows for some really useful manipulation of Nagios data. We use it to drive an internal operations dashboard that is a simplified 'mini' Nagios. It's under continual development and I've neglected testing and documentation, but the code isn't too crazy and, I feel, fairly easy to follow.
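One way to get an element tree, offered as a sketch only: feed the list of `[identifier, attributes]` pairs returned by the parser above into the standard library's `xml.etree.ElementTree`. The function name `sections_to_xml` and the `<nagios>` root tag are assumptions for illustration, following the layout suggested earlier in this thread.

```python
import xml.etree.ElementTree as ET


def sections_to_xml(sections):
    """Build an element tree from [identifier, {attribute: value}] pairs."""
    root = ET.Element("nagios")
    for name, attrs in sections:
        node = ET.SubElement(root, name)
        for key, value in attrs.items():
            child = ET.SubElement(node, "attribute", name=key)
            child.text = value  # special characters are escaped on serialisation
    return root
```

The tree can then be queried with `root.findall(...)` or serialised with `ET.tostring(root)`.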
Let me know what you think...
https://github.com/zebpalmer/NagParser
