Splitting string in Python with multiple occurrence of split keyword - python

I want to split the filter and update parameters for MongoDB's replace_one method, which is available in the PyMongo library.
{filter}, {update}
are passed to me from a file, one pair per line.
Eg: {"k1":"v1"}, {"k1":"v1", "k2":"v2", "k3":"v3"}
What do I want to do?
Split them so that I get two dict variables:
filter = {"k1":"v1"}
update = {"k1":"v1", "k2":"v2", "k3":"v3"}
What have I tried?
The problem is that I don't want to change the original format. If I split on ",", the split may land in the wrong place, and I can't rely on splitting at the first occurrence of "," either, because the filter part itself might contain multiple commas.
import getpass
from ast import literal_eval

import pymongo

def data_replace_one(host, port, dbname, coll_name, file_path, authdb):
    if LOCALHOST:  # flag defined elsewhere in the script
        client = pymongo.MongoClient(host, port)
    else:
        print("Enter credentials:")
        uname = input("Username: ")
        pwd = getpass.getpass()
        client = pymongo.MongoClient(
            host, port, username=uname, password=pwd, authSource=authdb)
    db = client[dbname]
    coll = db[coll_name]
    with open(file_path) as in_file:
        list_dict_queries = [line.strip() for line in in_file]
    list_dict_queries = list(filter(None, list_dict_queries))
    for query in list_dict_queries:
        query_list = query.split("|")
        query_list[0] = query_list[0].strip()
        query_list[1] = query_list[1].strip()
        #print(literal_eval(query_list[0]), literal_eval(query_list[1]))
        coll.replace_one(literal_eval(
            query_list[0]), literal_eval(query_list[1]))

I think it would be simplest to add square brackets around each line and then interpret it as JSON, assuming that your input format is guaranteed to be JSON compliant.
Something like:
import json
with open(file_path) as in_file:
    list_dict_queries = [('[' + line.strip() + ']') for line in in_file]
query_list = [json.loads(n) for n in list_dict_queries]
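The bracket-wrapping idea can be sketched for a single line as follows. This is only a minimal sketch: it assumes every line holds exactly two JSON objects separated by a comma, and the helper name parse_pair is hypothetical.

```python
import json

def parse_pair(line):
    """Parse '{...}, {...}' into (filter_doc, update_doc) by wrapping it in [ ]."""
    # Wrapping the line in square brackets turns it into a JSON array
    # of two objects, so commas inside either object are handled by the parser.
    filter_doc, update_doc = json.loads("[" + line.strip() + "]")
    return filter_doc, update_doc

line = '{"k1":"v1"}, {"k1":"v1", "k2":"v2", "k3":"v3"}'
filter_doc, update_doc = parse_pair(line)
print(filter_doc)
print(update_doc)
```

Both results are ordinary dicts, so they could be passed straight to replace_one.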

If you do not have braces (curly brackets) anywhere else, you can use the following.
>>> import re
>>> filter, update = re.findall(r'{.*?}', '{"k1":"v1"}, {"k1":"v1", "k2":"v2", "k3":"v3"}')
>>> filter
'{"k1":"v1"}'
>>> update
'{"k1":"v1", "k2":"v2", "k3":"v3"}'
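If the documents may contain nested braces, the non-greedy regex above would cut them short. One possible alternative (a different technique from the regex, shown only as a sketch) is to let the standard library's json.JSONDecoder.raw_decode consume one object at a time. The function name split_json_objects is hypothetical, and the input is assumed to be valid JSON objects separated by commas.

```python
import json

def split_json_objects(line):
    """Return the JSON values found in a comma-separated line, in order."""
    decoder = json.JSONDecoder()
    docs = []
    pos = 0
    while pos < len(line):
        # Skip the separators (commas and whitespace) between objects.
        if line[pos] in ", \t\r\n":
            pos += 1
            continue
        # raw_decode returns the parsed value and the index where it ended.
        doc, pos = decoder.raw_decode(line, pos)
        docs.append(doc)
    return docs

docs = split_json_objects('{"k1": {"nested": "v1"}}, {"k1":"v1", "k2":"v2"}')
print(docs)
```

Unlike the regex, this returns parsed dicts rather than strings.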

Related

How do I split a combo list in a large text file?

My problem is that I have a very large text file of emails and passwords, and I need to load it into a MySQL database.
The .txt file format is something like this:
emailnumberone#gmail.com:password1
emailnumbertwo#gmail.com:password2
emailnumberthree#gmail.com:password3
emailnumberfour#gmail.com:password4
emailnumberfive#gmail.com:password5
My idea is to write a loop that reads each line into a variable, finds the ":", sends the text before it to the db, and then does the same with the text after it. How do I do this?
Short program with some error handling:
Create demo data file:
t = """
emailnumberone#gmail.com:password1
emailnumbertwo#gmail.com:password2
emailnumberthree#gmail.com:password3
emailnumberfour#gmail.com:password4
emailnumberfive#gmail.com:password5
k
: """
with open("f.txt","w") as f: f.write(t)
Parse data / store:
def store_in_db(email,pw):
    # replace with db access code
    # see http://bobby-tables.com/python
    # for parametrized db code in python (or the API of your choice)
    print("stored: ", email, pw)
with open("f.txt") as r:
    for line in r:
        if line.strip(): # weed out empty lines
            try:
                email, pw = line.split(":",1) # even if : in pw: only split at 1st :
                if email.strip() and pw.strip(): # only if both filled
                    store_in_db(email,pw)
                else:
                    raise ValueError("Something is empty: '"+line+"'")
            except Exception as ex:
                print("Error: ", line, ex)
Output:
stored: emailnumberone#gmail.com password1
stored: emailnumbertwo#gmail.com password2
stored: emailnumberthree#gmail.com password3
stored: emailnumberfour#gmail.com password4
stored: emailnumberfive#gmail.com password5
Error: k
not enough values to unpack (expected 2, got 1)
Error: : Something is empty: ': '
Edit: According to "What characters are allowed in an email address?", a ':' may be part of the first part of an email address if it is quoted.
This would theoretically allow inputs such as
`"Cool:Emailadress#google.com:coolish_password"`
which would cause errors with this code. See Talip Tolga Sans answer for how to break down the splitting differently to avoid this problem.
This can be done with the simple split() method of Python strings.
>>> a = 'emailnumberone#gmail.com:password1'
>>> b = a.split(':')
>>> b
['emailnumberone#gmail.com', 'password1']
To accommodate @PatrickArtner's complex password failure case, this can be done:
a = 'emailnumberone#gmail.com:com:plex:PassWord:fail'
atLocation = a.find('#')
realSeperator = atLocation + a[atLocation:].find(':')
emailName = a[0:atLocation]
emailDomain = a[atLocation:realSeperator]
email = emailName + emailDomain
password = a[realSeperator + 1:]
print(email, password)
Output:
emailnumberone#gmail.com com:plex:PassWord:fail
str.find() returns the index of the first occurrence of the given substring in the string. Emails can have : in their name field, but they cannot have #. So locating the # first and then locating the first : after it gives you the correct separation point. After that, splitting the string is a piece of cake.
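The same idea can be wrapped in a small function. This is a minimal sketch, not the answerer's code: the name split_credentials and its at_sign parameter are hypothetical, and it assumes the at-sign appears as "#" exactly as in the sample data.

```python
def split_credentials(line, at_sign="#"):
    """Split 'email:password' at the first ':' that follows the at-sign."""
    at_pos = line.find(at_sign)
    if at_pos == -1:
        raise ValueError("no %r found in %r" % (at_sign, line))
    # Start searching at the at-sign so a ':' in the name part is ignored.
    sep_pos = line.find(":", at_pos)
    if sep_pos == -1:
        raise ValueError("no ':' after %r in %r" % (at_sign, line))
    return line[:sep_pos], line[sep_pos + 1:]

email, password = split_credentials('"Cool:Email"#google.com:cool:ish_password')
print(email, password)
```

Raising ValueError for malformed lines lets the caller decide whether to skip or log them.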
Open the file with a context manager (with open(...)). You can then iterate over the lines with a for loop, match each line with a regex (re module) or simply split on ":", and use sqlite3 to insert your values into the DB.
So the file:
with open("file.txt", "r") as f:
    for line in f:
        pass #manipulation
Sqlite3 Docs: https://docs.python.org/2/library/sqlite3.html
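To make the sqlite3 suggestion concrete, here is a minimal sketch using an in-memory database. The table name accounts, its columns, and the sample lines list are assumptions for illustration; a MySQL driver would follow a similar pattern but with its own placeholder style.

```python
import sqlite3

# Stand-in for the lines of the text file.
lines = [
    "emailnumberone#gmail.com:password1",
    "emailnumbertwo#gmail.com:password2",
    "k",
]

conn = sqlite3.connect(":memory:")  # in-memory database, for illustration only
conn.execute("CREATE TABLE accounts (email TEXT, password TEXT)")

for line in lines:
    # partition splits at the first ':' and never raises on a missing separator.
    email, sep, password = line.strip().partition(":")
    if not (sep and email and password):
        continue  # skip malformed lines
    # "?" placeholders let the driver handle quoting of the values.
    conn.execute(
        "INSERT INTO accounts (email, password) VALUES (?, ?)",
        (email, password),
    )
conn.commit()

rows = conn.execute("SELECT email, password FROM accounts ORDER BY rowid").fetchall()
print(rows)
```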

get wanted data from a text file with python without using splits

Hello, I have this file:
WORKERS = yovel:10.0.0.6,james:10.0.0.7
BLACKLIST = 92.122.197.45:ynet,95.1.2.2:twitter
I'm trying to write a function in Python that takes a worker's IP and returns the worker's name, like this:
workername = getName(ip)
The only method I thought of is using splits (.split(":"), .split(","), etc.), but the code would be very long and not smart.
Is there a shorter way to do it?
You can use re:
import re
def getName(ip, filename='filename.txt'):
    with open(filename) as f:
        content = f.read()
    # escape the dots in the IP so they match literally
    _r = re.findall(r'\w+(?=:{})'.format(re.escape(ip)), content)
    return _r[0] if _r else None
print(getName('10.0.0.6'))
Output:
yovel
Note, however, it is slightly more robust to use split:
def getName(ip):
    lines = dict(i.strip('\n').split(' = ') for i in open('filename.txt'))
    d = {b:a for a, b in map(lambda x:x.split(':'), lines['WORKERS'].split(','))}
    return d.get(ip)
Using split() doesn't look too bad here:
def getName(ip_address, filename='file.txt', line_type='WORKERS'):
    with open(filename) as in_file:
        for line in in_file:
            name, info = [x.strip() for x in line.strip().split('=')]
            if name == line_type:
                info = [x.split(':') for x in info.split(',')]
                lookup = {ip: name for name, ip in info}
                return lookup.get(ip_address)
Which works as follows:
>>> getName('10.0.0.6')
'yovel'
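If several lookups are needed, it may be convenient to parse the file once and reuse the result. The following is a sketch under that assumption; parse_mapping_file and get_name are hypothetical names, and the text is passed in as a string rather than read from disk so the example is self-contained.

```python
def parse_mapping_file(text):
    """Turn 'NAME = a:b,c:d' lines into {NAME: [(a, b), (c, d)]}."""
    sections = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # skip blank or malformed lines
        name, _, info = line.partition("=")
        # Keep only items that actually contain a ':' separator.
        pairs = [tuple(item.strip().split(":", 1))
                 for item in info.split(",") if ":" in item]
        sections[name.strip()] = pairs
    return sections

def get_name(ip, sections):
    # WORKERS entries are stored as name:ip, so invert them for the lookup.
    lookup = {addr: name for name, addr in sections.get("WORKERS", [])}
    return lookup.get(ip)

text = (
    "WORKERS = yovel:10.0.0.6,james:10.0.0.7\n"
    "BLACKLIST = 92.122.197.45:ynet,95.1.2.2:twitter\n"
)
sections = parse_mapping_file(text)
print(get_name("10.0.0.6", sections))
```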

Regular expression in Python issue

I have the below code in one of my configuration files:
appPackage_name = sqlncli
appPackage_version = 11.3.6538.0
The left side is the key and the right side is the value.
Now I want to be able to replace the value part with something else, given a key, in Python.
import re
Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
searchstr = re.escape(key) + " = [\da-zA-Z]+"
replacestr = re.escape(key) + " = " + re.escape(value)
filedata = ""
with open(Filepath,'r') as File:
    filedata = File.read()
File.close()
print ("Before change:",filedata)
re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
I assume there is something wrong with the regex I am using, but I am not able to figure out what. Can someone please help me?
Use the following fix:
import re
#Filepath = r"C:\Users\bhatsubh\Desktop\Everything\Codes\Python\OO_CONF.conf"
key = "appPackage_name"
value = "Subhayan"
#searchstr = re.escape(key) + " = [\da-zA-Z]+"
#replacestr = re.escape(key) + " = " + re.escape(value)
searchstr = r"({} *= *)[\da-zA-Z.]+".format(re.escape(key))
replacestr = r"\1{}".format(value)
filedata = "appPackage_name = sqlncli"
#with open(Filepath,'r') as File:
# filedata = File.read()
#File.close()
print ("Before change:",filedata)
filedata = re.sub(searchstr,replacestr,filedata)
print ("After change:",filedata)
There are several issues. You should not escape the replacement pattern, only the literal user-defined values in the regex pattern. You can use a capturing group (a pair of unescaped (...)) and a backreference (here, \1, since it is the only group in the pattern) to restore the part of the matched string you need to keep, rather than building that replacement string dynamically. As the version value contains dots, you should add a . to the character class: [\da-zA-Z.]. You also need to assign the result of re.sub back to the variable, because re.sub returns a new string and does not modify its argument.
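One caveat with a "\1" + value replacement string is that a value starting with a digit (or containing backslashes) would be misread as part of the backreference. A possible way around this, sketched here with a hypothetical helper name set_config_value, is to pass a function as the replacement. Note that this sketch replaces everything after the "=" up to the end of the line, which is an assumption that differs from the character class used above.

```python
import re

def set_config_value(text, key, value):
    """Replace the value of a 'key = value' line, keeping the original spacing."""
    pattern = r"^({}[ \t]*=[ \t]*).*$".format(re.escape(key))
    # A function replacement inserts value literally, so digits and
    # backslashes in it are never interpreted as backreferences.
    return re.sub(pattern, lambda m: m.group(1) + value, text, flags=re.MULTILINE)

conf = "appPackage_name = sqlncli\nappPackage_version = 11.3.6538.0\n"
updated = set_config_value(conf, "appPackage_version", "12.0")
print(updated)
```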

Array/List from txt file in Python

I am trying to read values from a .txt file into an array/list in Python.
Let's say I have this data in user.txt :
ghost:001
ghost:002
ghost:003
When I want to output it as:
'ghost:001','ghost:002','ghost:003'
I use this function
def readFromFile(filename, use_csv):
    userlist = ''
    userlist_b = ''
    print ("Fetching users from '%s'"% filename)
    f = open (filename,"r")
    for line in f:
        userlist+=str(line)
    userlist = "','".join(userlist.split("\n"))
    userlist = "'" + userlist + "'"
    userlist = "(%s)" %userlist
    return userlist
My question is, how could I do this:
I want to print a specific user. Something like
idx = 2
print("User[%s] : %s" % (idx, array[idx]))
Output:
User[2] : ghost:003
How do I form the array?
Could anyone help me?
I would store the users in a dict where the keys increment for each user:
d = {}
with open("in.txt") as f:
    user = 1
    for line in f:
        d[user]= line.rstrip()
        user += 1
print(d)
{1: 'ghost:001', 2: 'ghost:002', 3: 'ghost:003'}
If you just want a list of users that you can access by index:
with open("in.txt") as f:
    users = f.readlines()
print("User {}".format(users[0]))
User ghost:001
Look into loading dictionaries. This code should help you.
import json
import pickle
d = { 'field1': 'value1', 'field2': 2, }
with open("testjson.txt", "w") as f:
    json.dump(d, f)
with open("testjson.txt", "r") as f:
    print(json.load(f))
# pickle reads and writes bytes, so the file must be opened in binary mode
with open("testpickle.txt", "wb") as f:
    pickle.dump(d, f)
with open("testpickle.txt", "rb") as f:
    print(pickle.load(f))
If you want the file (one big string) split into smaller strings, don't build up a new string and then split it apart again. Just append each line to a list:
def readFromFile(filename, use_csv=False):
    userlist = []
    print ("Fetching users from '%s'"% filename)
    with open(filename,"r") as f:
        for line in f:  # iterate over lines, not over the characters of f.read()
            userlist.append(line.rstrip("\n"))
    return userlist
array = readFromFile('somefile')
idx = 2
print("User[%s] : %s" % (idx, array[idx]))
Not sure about the User['idx'] part of your request.
Try using list comprehensions.
Use indexing rather than dictionaries if that's all you need. (I can add a dict version if the second part of the line is really the index you are looking up.)
# read the file and use strip to remove trailing \n
with open(filename) as f:
    User = [line.strip() for line in f]
# your output
print("User[2] : %s" % User[2])
# the commented line is clearer
#print(','.join(User))
# but this use of repr adds the single quotes you showed
print(','.join(repr(user) for user in User))
output:
User[2] : ghost:003
'ghost:001','ghost:002','ghost:003'
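The list-based answers above can be combined into one small sketch. The helper name read_users is hypothetical; it accepts any iterable of lines, so an open file object would work in place of the sample list used here.

```python
def read_users(lines):
    """Return the non-empty lines with surrounding whitespace removed."""
    return [line.strip() for line in lines if line.strip()]

# Stand-in for iterating over open("user.txt").
users = read_users(["ghost:001\n", "ghost:002\n", "\n", "ghost:003\n"])

# Access by index, as asked in the question.
for idx, user in enumerate(users):
    print("User[%s] : %s" % (idx, user))

# repr() adds the single quotes shown in the desired output.
quoted = ",".join(repr(user) for user in users)
print(quoted)
```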

Find all lines that match regex pattern and grab part of string

f = open("machinelist.txt", 'r')
lines = f.readlines()
for host in lines:
    hostnames = host.strip()
    print(hostnames)
Returns:
\\TESTHOSTDEV01
\\TESTHOSTDEVDB01
\\TESTHOSTDEVDBQA
\\TESTHOSTDEVQA02
\\BTLCMOODY01 MRA Server
\\BTLCSTG05 StG Server
\\BTLCWEB02
\\BTLCWSUS01 Test Update Server
\\HIMSAPP01
\\SLVAPP01
\\TORAAPP01
\\HNSVAPP01
\\TESAPP01
I am curious whether there is a way to use re.findall() to grab all lines that begin with "\\". However, I only want to capture the hostnames, not the "\\" or the comments after the host such as "MRA Server" (example: BTLCMOODY01).
You can do something like this (no regex needed):
Use str.startswith to check if a line starts with '\\':
>>> strs = "\\BTLCMOODY01 MRA Server\n"
>>> strs.startswith('\\')
True
Then use a combination of str.split and str.lstrip to get the first word:
>>> strs.split(None, 1)
['\\BTLCMOODY01', 'MRA Server\n']
#apply str.lstrip on the first item
>>> strs.split(None, 1)[0].lstrip('\\')
'BTLCMOODY01'
Code:
>>> with open('abc1') as f:
...     for line in f:
...         if line.startswith('\\'):  # check if the line starts with `\`
...             print(line.split(None, 1)[0].lstrip('\\'))
...
TESTHOSTDEV01
TESTHOSTDEVDB01
TESTHOSTDEVDBQA
TESTHOSTDEVQA02
BTLCMOODY01
BTLCSTG05
BTLCWEB02
BTLCWSUS01
HIMSAPP01
SLVAPP01
TORAAPP01
HNSVAPP01
TESAPP01
An approach using regular expression:
import re
f = open("machinelist.txt", 'r')
lines = f.readlines()
for host in lines:
    hostnames = host.strip()
    if hostnames.startswith('\\'):
        print(re.match(r'\\\\(\S+)',hostnames).group(1))
It yields:
TESTHOSTDEV01
TESTHOSTDEVDB01
TESTHOSTDEVDBQA
TESTHOSTDEVQA02
BTLCMOODY01
BTLCSTG05
BTLCWEB02
BTLCWSUS01
HIMSAPP01
SLVAPP01
TORAAPP01
HNSVAPP01
TESAPP01
import re
# a backslash followed by the hostname (letters and digits): \HOSTNAME
pattern = re.compile(r"\\([a-z0-9]+)", re.I)
with open("file.txt", "r") as fh:
    for x in fh:
        match = pattern.search(x)
        if match:
            print(match.group(1))
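Since the question asked specifically about re.findall(), here is a minimal sketch of that variant. It assumes the whole file has been read into one string (the sample text below stands in for it) and that hostnames are preceded by two literal backslashes, as in the printed listing.

```python
import re

# Stand-in for open("machinelist.txt").read()
text = (
    "\\\\TESTHOSTDEV01\n"
    "\\\\BTLCMOODY01 MRA Server\n"
    "not a host line\n"
    "\\\\BTLCWEB02\n"
)

# With re.MULTILINE, ^ anchors at the start of every line; \\\\ matches two
# literal backslashes; (\S+) captures the hostname up to the first whitespace.
hostnames = re.findall(r"^\\\\(\S+)", text, flags=re.MULTILINE)
print(hostnames)
```

Because the pattern has exactly one capturing group, findall returns just the captured hostnames, without the backslashes or the trailing comments.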
