First off, Python 2.7.11.
Overview: I'm gathering the directory names in a given path and passing each one into a subprocess command. I then iterate over that subprocess's output line by line; the directory name is the key and the lines from subprocess.stdout are the values.
What I need is to keep the key the same but collect only the unique values and add them to a dict so I can write them to a CSV later.
Snippet of code showing two methods I have already tried (one is commented out). Both overwrite the existing key:value in the dict.
data = []
for dname in listdir(path):
    header = dname
    if isfile:
        entrydict = dict()
        cmd = "ct lsh -fmt \"%u \\n\" -since 01-Oct-2015 -all " + dname
        # output of cmd is "name \r\n"
        p1 = subp.Popen(cmd, stdout=subp.PIPE, stderr=subp.PIPE)
        usr = []
        for name in iter(p1.stdout.readline, ''):
            if name.rstrip() not in usr:
                usr.append(name.rstrip())
            else:
                entrydict[header] = usr
        for n in usr:
            entrydict[header] = n
        data.append(entrydict)
Thanks!
Yes, you could collect all of the unique values in a list like names = ['f0', 'f1', 'f2'] and then assign it to your dict with the header as the key, like
entrydict[header] = names
Just make sure that all of the headers are different.
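A minimal sketch of how that could look inside the loop from the question, assuming path is defined and subprocess is imported as subp as in the original code (a single dict keyed by directory name is used here instead of a list of one-entry dicts):

import subprocess as subp
from os import listdir

data = {}  # directory name -> list of unique user names

for dname in listdir(path):
    cmd = "ct lsh -fmt \"%u \\n\" -since 01-Oct-2015 -all " + dname
    p1 = subp.Popen(cmd, stdout=subp.PIPE, stderr=subp.PIPE)

    usr = []
    for name in iter(p1.stdout.readline, ''):
        name = name.rstrip()
        if name and name not in usr:
            usr.append(name)

    data[dname] = usr  # assigned once per directory, so nothing gets overwritten

Each directory name then maps to its own list of unique users, and data.items() can be handed to csv.writer row by row when you get to the CSV step.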
I have many configuration files containing many stanzas. I would like to search each stanza for a missing key-value pair and, if it is missing, insert it at the end of that stanza. The configuration files can contain 1 or 1000 stanzas depending on the file.
The configuration files look like this:
[stanza name]
key = value
key2 = value
...
[another stanza]
setting = value
setting2 = value
...
For each of the stanzas, if key_something does not exist, append it to the bottom of the stanza.
As a nice-to-have, an option to filter for stanzas containing key_something_else = value_something_else and append the same missing key-value pair to those would be awesome.
I am not even sure where to start. I attempted to google an answer, but I am either not searching for the correct terms or there is no example I can find. Unfortunately, I do not know what I do not know.
Expected output would look like:
#good stanza
[stanza name]
key = value
key2 = value
requiredKey = requiredValue
key_something_else = value_something_else
# stanza missing "requiredKey = requiredValue". Need to append "requiredKey = requiredValue" to stanza
[another stanza]
setting = value
setting2 = value
#stanza missing "requiredKey = requiredValue" but does contain "key_something_else = value_something_else". Need to append "requiredKey = requiredValue". (The purpose of "key_something_else = value_something_else" is so I can build on it as time goes by)
[third stanza]
key = value
key2 = value
key_something_else = value_something_else
One way to approach this problem is to use a Python script that parses the configuration files and looks for the missing key-value pair in each stanza. The built-in configparser module can read the files, check each section (stanza) for the key, and write the result back out.
import configparser

required_key = 'requiredKey'
required_value = 'requiredValue'
optional_key = 'key_something_else'
optional_value = 'value_something_else'

config = configparser.ConfigParser()
config.read(['file1.cfg', 'file2.cfg', ...])

for section in config.sections():
    if not config.has_option(section, required_key):
        config.set(section, required_key, required_value)
    elif config.has_option(section, optional_key) and config.get(section, optional_key) == optional_value:
        config.set(section, required_key, required_value)

with open('file1.cfg', 'w') as configfile:
    config.write(configfile)

with open('file2.cfg', 'w') as configfile:
    config.write(configfile)
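One caveat worth noting: config.read() with several files merges them all into one ConfigParser object, so the two write calls above would write the same merged result into both files, and configparser also discards comments and lowercases option names by default when it rewrites a file. If each file should keep only its own stanzas, a per-file variant (a sketch using the same key names as above) could look like this:

import configparser

required_key = 'requiredKey'
required_value = 'requiredValue'

# Handle each configuration file on its own so stanzas are not merged across files.
for path in ['file1.cfg', 'file2.cfg']:
    config = configparser.ConfigParser()
    config.optionxform = str  # keep option names case-sensitive (the default lowercases them)
    config.read(path)

    for section in config.sections():
        # Append the required pair to any stanza that is missing it
        if not config.has_option(section, required_key):
            config.set(section, required_key, required_value)

    with open(path, 'w') as configfile:
        config.write(configfile)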
Is something like this possible? I'd like to use a dictionary or set as the source of keywords for my file renamer. I have a lot of keywords that I'd like to filter out of the file names, but the only way I've found to do it so far is to search for a single string such as key720 = "720". That makes it function correctly but creates bloat: I have to have a version of the code at the bottom for each keyword I want to remove.
How do I get the list to work as the keywords in the search?
I tried to take the list and make it a string with:
str1 = ""
keyres = (str1.join(keys))
This was closer, but I think it makes one string out of all the entries, and it didn't pick up any keywords.
So I've come to this at the moment:
keys = ["720p", "720", "1080p", "1080"]

for filename in os.listdir(dirName):
    if keys in filename:
        filepath = os.path.join(dirName, filename)
        newfilepath = os.path.join(dirName, filename.replace(keys, ""))
        os.rename(filepath, newfilepath)
Is there maybe a way to go by index and increment it one at a time? Would that allow the strings in the list to be used as individual search strings?
What I'm trying to do is take a file name and rename it by removing all occurrences of the keywords.
How about using regular expressions, specifically the re.sub function?
from re import sub
KEYS = ["720p", "720", "1080p", "1080"]
old_filename = "filename1080p.jpg"
new_filename = sub('|'.join(KEYS),'',old_filename)
print(new_filename)
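To plug that into the renaming loop from the question, a sketch along these lines should work, assuming dirName is set as in your code (note that the longer keys such as "720p" are listed before the shorter "720", so the alternation strips the full token):

import os
from re import sub

KEYS = ["720p", "720", "1080p", "1080"]
pattern = '|'.join(KEYS)  # "720p|720|1080p|1080"

for filename in os.listdir(dirName):
    new_filename = sub(pattern, '', filename)
    if new_filename != filename:  # only rename when a keyword was actually removed
        os.rename(os.path.join(dirName, filename),
                  os.path.join(dirName, new_filename))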
I have some files in a directory and I want them matched against a list of strings and collected into dictionaries.
The files in dir look like the following:
DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz
DB_ABC_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz
DB_DEF_S1_001_MM_R1.faq.gz
DB_DEF_S1_001_MM_R2.faq.gz
The list contains parts of the filenames, like:
ABC
DEF
So here is what I tried:
import os
import re

dir = '/user/home/files'
list = '/user/home/list'

samp1 = {}
samp2 = {}

FH_sample = open(list, 'r')
for line in FH_sample:
    samp1[line.strip().split('\n')[0]] = []
    samp2[line.strip().split('\n')[0]] = []
FH_sample.close()

for file in os.listdir(dir):
    m1 = re.search('(.*)_R1', file)
    m2 = re.search('(.*)_R2', file)
    if m1 and m1.group(1) in samp1:
        samp1[m1.group(1)].append(file)
    if m2 and m2.group(1) in samp2:
        samp2[m2.group(1)].append(file)
I wanted the above script to find the matches from m1 and m2 and collect them in the dictionaries samp1 and samp2. But the script is not finding the matches inside the if blocks, so samp1 and samp2 end up empty.
This is what the output should look like for samp1 and samp2:
{'ABC': ['DB_ABC_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz', 'DB_ABC_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz'], 'DEF': ['DB_DEF_S1_001_MM_R1.faq.gz', 'DB_DEF_S1_001_MM_R2.faq.gz']}
Any help would be greatly appreciated
A lot of this code you probably don't need. You could just check whether each substring you have from list appears in the file names from dir.
The code below reads in the data as lists. You seem to have already done this, so it will simply be a matter of replacing files with the file names you read in from dir and replacing my_strings with the substrings from list (which you shouldn't use as a variable name, since list is already a built-in in Python).
files = ["BSSE_QGF_1987_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz",
         "BSSE_QGF_1967_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz",
         "BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R1_001_MM_1.faq.gz",
         "BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R2_001_MM_1.faq.gz"]

my_strings = ["MOHUA", "MSJLF"]

res = {s: [] for s in my_strings}

for k in my_strings:
    for file in files:
        if k in file:
            res[k].append(file)

print(res)
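If you prefer, the same collection step can be collapsed into a single dict comprehension:

res = {s: [f for f in files if s in f] for s in my_strings}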
You can pass the Python script the directory, use id_list to provide the dict keys, and then append each fastq if the dict key is in the fastq filename:
import os
import sys

dir_path = sys.argv[1]

fastqs = []
for x in os.listdir(dir_path):
    if x.endswith(".faq.gz"):
        fastqs.append(x)

id_list = ['MOHUA', 'MSJLF']

sample_dict = dict((sample, []) for sample in id_list)
print(sample_dict)

for k in sample_dict:
    for z in fastqs:
        if k in z:
            sample_dict[k].append(z)

print(sample_dict)
to run:
python3.6 fq_finder.py /path/to/fastqs
output from above to show what is going on:
{'MOHUA': [], 'MSJLF': []} # first print creates dict with empty list as vals for keys
{'MOHUA': ['BSSE_QGF_1987_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R1_001_MM_1.faq.gz', 'BSSE_QGF_1967_HJUS_1_MOHUA_2_T_bR_r1_v1_0_S1_R2_001_MM_1.faq.gz'], 'MSJLF': ['BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R2_001_MM_1.faq.gz', 'BSSE_QGF_18565_H33HLAFXY_1_MSJLF_T_bulk_RNA_S1_R1_001_MM_1.faq.gz']}
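If the IDs live in a file (as with the question's /user/home/list) rather than being hardcoded, they can be read in first; a small sketch, assuming one ID per line:

id_list = []
with open('/user/home/list') as fh:  # path taken from the question; adjust as needed
    for line in fh:
        line = line.strip()
        if line:  # skip blank lines
            id_list.append(line)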
I am trying to write to a text file (download.txt) the lines from open.txt that do not share the same 'id' and are not in the exceptions (idexception, classexception). I have managed to write the 'ids' that are not repeated while honouring idexception.
MY QUESTION is how to add the 'classexception' condition; I tried it but could not get it to work. Any idea which dictionaries/conditionals I have to use?
c = open('open.txt', 'r')  # structure: name:xxx; id:xxxx; class:xxxx; name:xxx; id:xxxx; class:xxxx etc
t = c.read()
d = open('download.txt', 'a')

allLines = t.split("\n")
lines = {}

classes = [s[10:-1] for s in t.split() if s.startswith("class")]  # 'class' itself is a reserved word

for line in allLines:
    idPos = line.find("id:")
    colPos = line.find(";", idPos)
    if idPos > -1:
        id = line[idPos+4: colPos if colPos > -1 else None]
        if id not in idexception:
            lines.setdefault(id, line)

for l in lines:
    d.write(lines[l] + '\n')

c.close()
d.close()
Generally you are quite unclear, but if I understand correctly, here is my approach to your problem, with a lot of comments inside:
import re

id_exceptions = ['id_ex_1', 'id_ex_2']
class_exceptions = ['class_ex_1', 'class_ex_2']

# Values to be written to the download.txt file.
# Since ids need to be unique, the structure of this dict should be:
# {<id value>: {'name': xxx, 'class': xxx}}
unique_values = dict()

# All files should be opened using the 'with' statement
with open('open.txt') as source:
    # Read the whole file into one single long string
    all_lines = source.read().replace('\n', '')

    # Prepare a regular expression for getting the name, id and class values as a dict.
    # See https://regex101.com/r/Kby3fY/1 for extra explanation of what it does.
    reg_exp = re.compile(r'name:(?P<name>[a-zA-Z0-9_-]*);\sid:(?P<id>[a-zA-Z0-9_-]*);\sclass:(?P<class>[a-zA-Z0-9_-]*);')

    # Scan the single long string for every match of the above regular expression
    for match in reg_exp.finditer(all_lines):
        # This produces a single dict {'name': xxx, 'id': xxx, 'class': xxx}
        single_line = match.groupdict()
        # Now we check all conditions at once and
        # only add the entry when every one of them holds
        if (single_line['id'] not in unique_values and        # not present already
                single_line['id'] not in id_exceptions and    # not in id exceptions
                single_line['class'] not in class_exceptions):  # not in class exceptions
            # Add the values under this unique id
            unique_values[single_line['id']] = {'name': single_line['name'],
                                                'class': single_line['class']}

# Now we just need to write it to the download.txt file
with open('download.txt', 'w') as destination:
    for key, value in unique_values.items():  # In Python 2.x use unique_values.iteritems()
        line = "id:{}; name:{}; class:{}".format(key, value['name'], value['class'])
        destination.write(line + '\n')
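For illustration, with a made-up open.txt containing the single line below (all names, ids and classes are invented), only the first entry ends up in download.txt: the second match is dropped as a duplicate id and the third because its class is in class_exceptions.

name:foo; id:A1; class:c1; name:bar; id:A1; class:c2; name:baz; id:B2; class:class_ex_1;

download.txt would then contain:

id:A1; name:foo; class:c1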
def data_entry(categories):
    pass

# these are the values within categories
data_entry(['movies', 'sports', 'actors', 'tv', 'games',
            'activities', 'musicians', 'books'])

# for each different value in categories I need it to open a different txt file,
# for example:
#   when categories[0] == 'movies'  ->  filename = 'movies.txt'
#   when categories[1] == 'sports'  ->  filename = 'sports.txt'

How would I write this in code?
Write a dict that maps a category name to a filename.
Loop over your list of categories, and retrieve the filename by indexing into the dict using the category name.
Use open() with the filename.
Example:
categories = ["movies", "tv"]

# long winded:
filenames = {
    "movies": "movies.txt",
    "tv": "television.txt",
    # ...
}

# alternatively:
filenames = dict([(x, x + ".txt") for x in categories])

for category in categories:
    with open(filenames[category], 'rb'):
        pass
If the names of the text files are always going to be <categoryname>.txt, I would simply do:
for category in categories:
    with open(category + ".txt", 'r') as f:
        # Do whatever you need to here...
        pass
This of course does not take directories or anything else into account. If the names of the files for each category are likely to change, then I'd suggest using a dictionary.
You probably want a dictionary/hash:
dic = {'movies': 'movies.txt', 'xxx': 'xxx.txt'}
for key, value in dic.items():
    print(key, value)