move pdf and prc files to another location Python [duplicate]

I'm trying to detect files with a list of extensions.
ext = [".3g2", ".3gp", ".asf", ".asx", ".avi", ".flv", \
".m2ts", ".mkv", ".mov", ".mp4", ".mpg", ".mpeg", \
".rm", ".swf", ".vob", ".wmv"]
if file.endswith(ext): # how to use the list ?
command 1
elif file.endswith(""): # it should be a folder
command 2
elif file.endswith(".other"): # not a video, not a folder
command 3

Use a tuple for it.
>>> ext = [".3g2", ".3gp", ".asf", ".asx", ".avi", ".flv", \
".m2ts", ".mkv", ".mov", ".mp4", ".mpg", ".mpeg", \
".rm", ".swf", ".vob", ".wmv"]
>>> ".wmv".endswith(tuple(ext))
True
>>> ".rand".endswith(tuple(ext))
False
Instead of converting it every time, just convert it to a tuple once.

Couldn't you have just made it a tuple in the first place? Why do you have to do:
>>> ".wmv".endswith(tuple(ext))
Couldn't you just do:
>>> ext = (".3g2", ".3gp", ".asf", ".asx", ".avi", ".flv", \
".m2ts", ".mkv", ".mov", ".mp4", ".mpg", ".mpeg", \
".rm", ".swf", ".vob", ".wmv")

Related

File name grouping and indexing

I have a folder with .txt files like below,
and I have used os.listdir to generate a file list:
['acc_exp01_user01.txt', 'acc_exp02_user01.txt', 'acc_exp03_user02.txt', 'acc_exp04_user02.txt', 'acc_exp05_user03.txt', 'acc_exp06_user03.txt', 'acc_exp07_user04.txt', 'acc_exp08_user04.txt', 'acc_exp09_user05.txt', 'acc_exp10_user05.txt', 'acc_exp11_user06.txt', 'acc_exp12_user06.txt', 'acc_exp13_user07.txt', 'acc_exp14_user07.txt', 'acc_exp15_user08.txt', 'acc_exp16_user08.txt', 'acc_exp17_user09.txt', 'acc_exp18_user09.txt', 'acc_exp19_user10.txt', 'acc_exp20_user10.txt', 'acc_exp21_user10.txt', 'acc_exp22_user11.txt', 'acc_exp23_user11.txt', 'acc_exp24_user12.txt', 'acc_exp25_user12.txt', 'acc_exp26_user13.txt', 'acc_exp27_user13.txt', 'acc_exp28_user14.txt', 'acc_exp29_user14.txt', 'acc_exp30_user15.txt', 'acc_exp31_user15.txt', 'acc_exp32_user16.txt', 'acc_exp33_user16.txt', 'acc_exp34_user17.txt', 'acc_exp35_user17.txt', 'acc_exp36_user18.txt', 'acc_exp37_user18.txt', 'acc_exp38_user19.txt', 'acc_exp39_user19.txt', 'acc_exp40_user20.txt', 'acc_exp41_user20.txt', 'acc_exp42_user21.txt', 'acc_exp43_user21.txt', 'acc_exp44_user22.txt', 'acc_exp45_user22.txt', 'acc_exp46_user23.txt', 'acc_exp47_user23.txt', 'acc_exp48_user24.txt', 'acc_exp49_user24.txt', 'acc_exp50_user25.txt', 'acc_exp51_user25.txt', 'acc_exp52_user26.txt', 'acc_exp53_user26.txt', 'acc_exp54_user27.txt', 'acc_exp55_user27.txt', 'acc_exp56_user28.txt', 'acc_exp57_user28.txt', 'acc_exp58_user29.txt', 'acc_exp59_user29.txt', 'acc_exp60_user30.txt', 'acc_exp61_user30.txt', 'gyro_exp01_user01.txt', 'gyro_exp02_user01.txt', 'gyro_exp03_user02.txt', 'gyro_exp04_user02.txt', 'gyro_exp05_user03.txt', 'gyro_exp06_user03.txt', 'gyro_exp07_user04.txt', 'gyro_exp08_user04.txt', 'gyro_exp09_user05.txt', 'gyro_exp10_user05.txt', 'gyro_exp11_user06.txt', 'gyro_exp12_user06.txt', 'gyro_exp13_user07.txt', 'gyro_exp14_user07.txt', 'gyro_exp15_user08.txt', 'gyro_exp16_user08.txt', 'gyro_exp17_user09.txt', 'gyro_exp18_user09.txt', 'gyro_exp19_user10.txt', 'gyro_exp20_user10.txt', 'gyro_exp21_user10.txt', 'gyro_exp22_user11.txt', 'gyro_exp23_user11.txt', 'gyro_exp24_user12.txt', 'gyro_exp25_user12.txt', 'gyro_exp26_user13.txt', 'gyro_exp27_user13.txt', 'gyro_exp28_user14.txt', 'gyro_exp29_user14.txt', 'gyro_exp30_user15.txt', 'gyro_exp31_user15.txt', 'gyro_exp32_user16.txt', 'gyro_exp33_user16.txt', 'gyro_exp34_user17.txt', 'gyro_exp35_user17.txt', 'gyro_exp36_user18.txt', 'gyro_exp37_user18.txt', 'gyro_exp38_user19.txt', 'gyro_exp39_user19.txt', 'gyro_exp40_user20.txt', 'gyro_exp41_user20.txt', 'gyro_exp42_user21.txt', 'gyro_exp43_user21.txt', 'gyro_exp44_user22.txt', 'gyro_exp45_user22.txt', 'gyro_exp46_user23.txt', 'gyro_exp47_user23.txt', 'gyro_exp48_user24.txt', 'gyro_exp49_user24.txt', 'gyro_exp50_user25.txt', 'gyro_exp51_user25.txt', 'gyro_exp52_user26.txt', 'gyro_exp53_user26.txt', 'gyro_exp54_user27.txt', 'gyro_exp55_user27.txt', 'gyro_exp56_user28.txt', 'gyro_exp57_user28.txt', 'gyro_exp58_user29.txt', 'gyro_exp59_user29.txt', 'gyro_exp60_user30.txt', 'gyro_exp61_user30.txt', 'labels.txt']
but I now want to group them into an index list like this;
how can I realise it?
You can use glob to find files matching a pattern under a path and then create the required DataFrame.
from glob import glob
import os
import pandas as pd

exp_path = "Your Path Here"
acc_pattern = "acc_exp*.txt"     # the files in the question are .txt, not .csv
gyro_pattern = "gyro_exp*.txt"
# glob returns results in arbitrary order, so sort to keep acc and gyro rows aligned
acc_files = sorted(glob(os.path.join(exp_path, acc_pattern)))
gyro_files = sorted(glob(os.path.join(exp_path, gyro_pattern)))
Once you have all the required files, we can create the DataFrame:
df = pd.DataFrame()
df['acc'] = [os.path.basename(x) for x in acc_files]
df['gyro'] = [os.path.basename(x) for x in gyro_files]
df['experiment'] = df['acc'].apply(lambda x: x[7:9])    # e.g. 'acc_exp01_user01.txt' -> '01'
df['userId'] = df['acc'].apply(lambda x: x[14:16])      # e.g. 'acc_exp01_user01.txt' -> '01'
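If the file names ever deviate from those fixed widths, a regex is a bit more robust than fixed slicing. A small sketch; the helper parse_name is invented here for illustration and is not part of the answer above:
import re

def parse_name(name):
    # Pull the experiment and user numbers out of a name such as 'acc_exp01_user01.txt'.
    m = re.match(r"(?:acc|gyro)_exp(\d+)_user(\d+)\.txt$", name)
    if m is None:
        return None, None
    return m.group(1), m.group(2)    # (experiment, user)

df['experiment'], df['userId'] = zip(*df['acc'].map(parse_name))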

Extracting numbers in text file

I have a text file which came from Excel. I don't know how to take five digits after a specific character.
I want to take only the five digits after #ACA in the text file.
My text is like:
ERROR_MESSAGE
(((#ACA16018)|(#ACA16019))&(#AQV71767='')&(#AQV71765='2'))?1:((#AQV71765='4')?1:((#AQV71767$'')?(((#AQV71765='1')|(#AQV71765='3'))?1:'Hasar veya Lehe Hukuk seçebilirsiniz'):'Rücu sıra numarasını yazıp Hasar veya Lehe Hukuk seçebilirsiniz'))
Rücu Oranı Girilmesi Zorunludur...'
#ACA17660
#ACA16560
#ACA15623
#ACA17804
BU ALANI BOŞ GEÇEMEZSİNİZ.EKSPER RAPORU GELMEDEN DY YE GERİ GÖNDEREMEZSİNİZ. PERT İHBARI VARSA PERT ÇALINMA OPERASYONU AKTİVİTESİ OLUŞTURULMALIDIR.
(#TSC[T008UNSMAS;FIRM_CODE=2 AND UNIT_TYPE='SG' AND UNIT_NO=#AQV71830]>0)?1:'Girdiğiniz değer fihristte yoktur'
#ACA17602
#ACA17604
#ACA56169
BU ALANI BOŞ GEÇEMEZSİNİZ
#ACA17606
#ACA17608
(#AQV71835='')?'Boş geçilemez':1
Lütfen Gönderilecek Kişinin Mail Adresini Giriniz ! '
LÜTFEN RED NEDENİNİ GİRİNİZ.
EKSİK BİLGİ / BELGE ALANINA GİRMİŞ OLDUĞUNUZ DEĞER YANLIŞ VEYA GEÇERŞİZDİR!!! LÜTFEN KONTROL EDİP TEKRAR DENEYİNİZ.'
BU ALAN BOŞ GEÇİLEMEZ. ÖDEME YAPILMADAN EK ÖDEME SÜRECİNİ BAŞLATAMAZSINIZ.
ONAYLANDI VE REDDEDİLDİ SEÇENEKLERİNİ KULLANAMAZSINIZ
BU ALAN BOŞ GEÇİLEMEZ.EVRAKLARINIZI , VARSA EKSPER RAPORUNU VE MUALLAĞI KONTROL EDİNİZ.
Muallak Tutarını kontrol ediniz.
'OTO BRANŞINDA REDDEDİLDİ NEDENİ SEÇMELİSİNİZ'
'OTODIŞI BRANŞINDA REDDEDİLDİ NEDENİ SEÇMELİSİNİZ'
(#AQV70003$'')?((#TSC[T001HASIHB;FIRM_CODE=#FP10100 AND COMPANY_CODE=2 AND CLAIM_NO=#AQV70003]$0)?1:'Bu dosya sistemde bulunmamaktadır'):'Bu alan boş geçilemez'
(#AQV70503='')?'Bu alan boş geçilemez.':((#ACA18635=1)?1:'Mağdura ait uygun kriterli ödeme kaydı mevcut değildir.')
(#AQV71809=0)?'Boş geçilemez':1
(#FD101AQV71904_AFDS<0)?'Tarih bugünün tarihinden büyük olamaz
I want to take the 5 digits that come after every #ACA, so:
16018, 16019, 17660, etc.
grep -oP '#ACA\K[0-9]{5}' file.txt
#ACA\K matches #ACA but does not include it in the printed output
[0-9]{5} matches the five digits following #ACA
If a variable number of digits is needed, use
grep -oP '#ACA\K[0-9]+' file.txt
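For a Python equivalent of the grep above, re's lookbehind plays the same role as \K. A small sketch, assuming the text lives in file.txt:
import re

with open("file.txt", encoding="utf-8") as f:
    text = f.read()

# (?<=#ACA) requires #ACA before the match but leaves it out of the result,
# just like \K in the grep command.
print(re.findall(r"(?<=#ACA)\d{5}", text))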
If you don't know or don't like regular expressions, you can do this, although the code is a bit longer:
if __name__ == '__main__':
    pattern = '#ACA'
    filename = 'yourfile.txt'
    res = list()
    with open(filename, 'rb') as f:            # open 'yourfile.txt' in byte-reading mode
        for line in f:                         # for each line in the file
            for s in line.split(pattern)[1:]:  # split the line on '#ACA'
                try:
                    nb = int(s[:5])            # take the first 5 characters after it as an int
                    res.append(nb)             # add it to the list of numbers we found
                except (NameError, ValueError):  # if conversion fails, that wasn't an int
                    pass
    print res            # if you want them in the same order as in the file
    print sorted(res)    # if you want them in ascending order
This should do it, assuming you have the whole text in the variable str_var:
import re
print(re.findall(r"#ACA(\d+)", str_var))
Output:
['16018', '16019', '17660', '16560', '15623', '17804', '17602', '17604', '56169', '17606', '17608', '18635']
re.findall(r'#ACA(\d{5})', str_var)
[x[:5] for x in content.split("#ACA")[1:]]
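The split-based one-liner above assumes the whole file has already been read into content; for example (the file name is just a placeholder):
with open("yourfile.txt", encoding="utf-8") as f:
    content = f.read()

# Take the 5 characters immediately following each '#ACA'.
print([x[:5] for x in content.split("#ACA")[1:]])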
PowerShell solution:
$contet = Get-Content -Raw 'your_file'
$match = [regex]::Matches($contet, '#ACA(\d{5})')
$match | ForEach-Object {
$_.Groups[1].Value
}
Output:
16018
16019
17660
16560
15623
17804
17602
17604
56169
17606
17608
18635

Iterating through a yaml file

I am trying to iterate through a YAML file. I want to extract the contents of
ipv6: "2031:31:31:31:: 2032:32:32:32:: 2033:33:33:33:: 2034:34:34:34:: 2035:35:35:35::"
Below is my code. I get the error below:
for x in self.dhcp_dict['subnets'][sub]['ipv4'].split():
TypeError: string indices must be integers, not str
Can anyone tell me where I am going wrong? Thanks,
Jessi
Code:
dict = yaml.load(fd)
self.server_dict = dict['server_configs']
self.interface_dict = self.server_dict['interface']
self.dhcp_dict = self.server_dict['dhcp_config']

def configureDhcpv6(self):
    pdb.set_trace()
    log.info("Writing the dhcp.conf file")
    infile = open('v6.txt', 'w+')
    self.lease_time = self.dhcp_dict['lease_time']
    infile.write("default-lease-time %s; \n" % (self.lease_time))
    infile.write("preferred-lifetime 604800;\noption dhcp-renewal-time 604800;\noption dhcp-rebinding-time 7200;\noption dhcp6.domain-search cisco.com;\noption dhcp6.preference 255;\noption dhcp6.rapid-commit;\noption dhcp6.info-refresh-time 21600;\ndhcpv6-lease-file-name /var/lib/dhcpd/dhcpd6.leases;\nauthoritative;\nlog-facility local7;\n\n")
    for sub in self.dhcp_dict['subnets']:
        if (sub == 'relay'):
            for x in self.dhcp_dict['subnets'][sub]['ipv6'].split():
                range6 = sub + "11" + " " + sub + "254"
                infile.write("Subnet 6 %s/64 {\n" % (sub))
                infile.write(" range6 %s;\n}\n\n" % (range6))
    infile.close()
YAML file:
dhcp_config:
  lease_time: "300"
  relay_server: "5.5.5.0"
  subnets:
    relay:
      ipv4: "30.30.30.0 31.31.31.0 32.32.32.0 33.33.33.0 34.34.34.0 35.35.35.0"
      ipv6: "2031:31:31:31:: 2032:32:32:32:: 2033:33:33:33:: 2034:34:34:34:: 2035:35:35:35::"
      smart_relay: "31.1.1.0 32.1.1.0 33.1.1.0 34.1.1.0 35.1.1.0"
      snoop: "36.36.36.0 37.37.37.0 38.38.38.0 39.39.39.0 30.30.30.0"

custom sorting for find command output

I'm trying to get a sorted directory/file listing with the Unix "find" command.
# find .
.
./bin
./data
./data/disks
./inc
./inc/calls
./inc/calls/show
./inc/calls/show/system
./inc/calls/show/cli
./inc/calls/show/network
./inc/calls/show/stats
./inc/calls/services
./inc/calls/services/ntp
./inc/calls/services/tsa
./inc/calls/services/webgui
./inc/calls/services/engine
./inc/calls/system
./inc/calls/change
./inc/calls/change/password
./inc/calls/change/network
./inc/calls/disk
./inc/calls/disk/encr
./inc/etc
I want to sort it like:
./inc/calls/show/system \
./inc/calls/show/cli \
./inc/calls/show/network \
./inc/calls/show/stats \
./inc/calls/services/ntp \
./inc/calls/services/tsa \
./inc/calls/services/webgui \
./inc/calls/services/engine \
./inc/calls/change/password \
./inc/calls/change/network \
./inc/calls/disk/encr \
./inc/calls/system \
./inc/calls/change \
./inc/calls/services \
./inc/calls/disk \
./inc/calls/show \
./inc/calls \
./data/disks \
./inc/etc \
./bin \
./data \
./inc
The node (directory or file) with more children (directories/files) should come first. I want to do it with bash or Python. What is the best way to do that?
Match lines containing /, prepend the number of fields to each line using / as the separator, sort on the number of fields, and remove the count.
$ awk -F/ '/\//{print NF,$0}' file | sort -nrk1 | cut -d' ' -f2-
./inc/calls/show/system
./inc/calls/show/stats
./inc/calls/show/network
./inc/calls/show/cli
./inc/calls/services/webgui
./inc/calls/services/tsa
./inc/calls/services/ntp
./inc/calls/services/engine
./inc/calls/disk/encr
./inc/calls/change/password
./inc/calls/change/network
./inc/calls/system
./inc/calls/show
./inc/calls/services
./inc/calls/disk
./inc/calls/change
./inc/etc
./inc/calls
./data/disks
./inc
./data
./bin
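Since the question also allows Python, roughly the same ordering can be produced there. A sketch, assuming the find output has been saved to a file named file:
with open("file") as f:
    paths = [line.rstrip("\n") for line in f if "/" in line]

# Deepest paths (most '/' separators) first; ties fall back to reverse
# alphabetical order, mirroring the awk | sort | cut pipeline above.
for p in sorted(paths, key=lambda p: (p.count("/"), p), reverse=True):
    print(p)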
I would use Python and try to convert:
a/b
a/c
b/e/f
b/e/g
in something like:
{'a': {'b': {}, 'c': {}},
'b': {'e': {'f': {}, 'g': {}}},
}
To achieve this:
def add_list_to_dict(lst, d):
    key, lst = lst[0], lst[1:]
    if key not in d:
        d[key] = {}
    if lst:
        add_list_to_dict(lst, d[key])

d = {}
for path in paths:
    add_list_to_dict(path.split('/'), d)
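For example, feeding it a few of the paths from the question (with the leading './' stripped) gives the nested structure:
paths = ["inc/calls/show/system", "inc/calls/show/cli", "inc/calls/services/ntp"]

d = {}
for path in paths:
    add_list_to_dict(path.split('/'), d)

print(d)
# -> {'inc': {'calls': {'show': {'system': {}, 'cli': {}}, 'services': {'ntp': {}}}}}
# (key order may vary on older Python versions)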

Python + getopt - Problems with parsing

I have some problems writing some Gerrit (http://code.google.com/p/gerrit/) hooks:
http://gerrit.googlecode.com/svn/documentation/2.2.0/config-hooks.html
If I parse the command line for
patchset-created --change --change-url --project --branch --uploader --commit --patchset
def main():
    if (len(sys.argv) < 2):
        showUsage()
        exit()
    if (sys.argv[1] == 'update-projects'):
        updateProjects()
        exit()
    need = ['action=', 'change=', 'change-url=', 'commit=', 'project=', 'branch=', 'uploader=',
            'patchset=', 'abandoner=', 'reason=', 'submitter=', 'comment=', 'CRVW=', 'VRIF=',
            'patchset=', 'restorer=', 'author=']
    print sys.argv[1:]
    print '-----'
    optlist, args = getopt.getopt(sys.argv[1:], '', need)
    id = url = hash = who = comment = reason = codeReview = verified = restorer = ''
    print optlist
    for o, a in optlist:
        if o == '--change': id = a
        elif o == '--change-url': url = a
        elif o == '--commit': hash = a
        elif o == '--action': what = a
        elif o == '--uploader': who = a
        elif o == '--submitter': who = a
        elif o == '--abandoner': who = a
        elif o == '--author': who = a
        elif o == '--branch': branch = a
        elif o == '--comment': comment = a
        elif o == '--CRVW': codeReview = a
        elif o == '--VRIF': verified = a
        elif o == '--patchset': patchset = a
        elif o == '--restorer': who = a
        elif o == '--reason': reason = a
Command line input:
--change I87f7802d438d5640779daa9ac8196aeb3eec8c2a
--change-url http://<hostname>:8080/308
--project private/bar
--branch master
--uploader xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)
--commit 49aae9befaf27a5fede51b498f0660199f47b899 --patchset 1
print sys.argv[1:]
['--action', 'new',
'--change','I87f7802d438d5640779daa9ac8196aeb3eec8c2a',
'--change-url',
'http://<hostname>:8080/308',
'--project', 'private/bar',
'--branch', 'master',
'--uploader', 'xxxxxxx-xxxxx', 'xxxxxxx', '(xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)',
'--commit', '49aae9befaf27a5fede51b498f0660199f47b899',
'--patchset', '1']
print optlist
[('--action', 'new'),
('--change', 'I87f7802d438d5640779daa9ac8196aeb3eec8c2a'),
('--change-url', 'http://<hostname>:8080/308'),
('--project', 'private/bar'),
('--branch', 'master'),
('--uploader', 'xxxxxxx-xxxxx')]
I don't know why the script generates
'--uploader', 'xxxxxxx-xxxxx', 'xxxxxxx', '(xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)'
and not
'--uploader', 'xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)'
and as a result the script doesn't parse --commit, --patchset, ...
When I parse comment-added, everything works:
Command line input:
--change I87f7802d438d5640779daa9ac8196aeb3eec8c2a
--change-url http://<hostname>.intra:8080/308
--project private/bar
--branch master
--author xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)
--commit 49aae9befaf27a5fede51b498f0660199f47b899
--comment asdf
--CRVW 0
--VRIF 0
print sys.argv[1:]
['--action', 'comment',
'--change', 'I87f7802d438d5640779daa9ac8196aeb3eec8c2a',
'--change-url',
'http://<hostname>:8080/308',
'--project', 'private/bar',
'--branch', 'master',
'--author', 'xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)', <<< That's right!
'--commit', '49aae9befaf27a5fede51b498f0660199f47b899',
'--comment', 'asdf',
'--CRVW', '0',
'--VRIF', '0']
As the option names and values are space-separated, you have to put a value in quotes if it contains spaces itself.
If you write --uploader xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx), the last two strings will actually end up in args from the line
optlist, args = getopt.getopt(sys.argv[1:], '', need)
as they are not associated with --uploader
You should quote an argument if it contains spaces, as with any command-line tool:
--uploader "xxxxxxx-xxxxx xxxxxxx (xxxxxxxxxxxxx.xxxxxxx#xxx-xxxx.xx)"
You may also consider using gnu_getopt(), as it allows you to mix option and non-option arguments.
From the documentation:
The getopt() function stops processing options as soon as a non-option argument is encountered.
If you use gnu_getopt, the rest of the options, namely --commit and --patchset, will still be parsed correctly even though the uploader argument is missing its quotes.
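A minimal sketch of that variant, reusing the need list from the question (nothing else in the option loop needs to change):
import getopt
import sys

need = ['action=', 'change=', 'change-url=', 'commit=', 'project=', 'branch=',
        'uploader=', 'patchset=', 'abandoner=', 'reason=', 'submitter=',
        'comment=', 'CRVW=', 'VRIF=', 'restorer=', 'author=']

# gnu_getopt keeps scanning for options after a non-option argument, so
# --commit and --patchset are still picked up even when the unquoted
# --uploader value is split into extra words; those stray words end up in args.
optlist, args = getopt.gnu_getopt(sys.argv[1:], '', need)
print(optlist)   # the recognised (option, value) pairs
print(args)      # the leftover words that did not belong to any option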
