how to join various bits of string and data together using python

Python newbie here. I've been working my way through this code to create a string that includes a date. I have parts of the code working and returning the data I want, but I need help formatting the string so that it ties the data together.
This is what I have so far:
def get_rectype_count(filename, rectype):
    return int(subprocess.check_output('''zcat %s | '''
                                       '''awk 'BEGIN {FS=";"};{print $6}' | '''
                                       '''grep -i %r | wc -l''' %
                                       (filename, rectype), shell=True))
str = "MY VALUES ("
rectypes = 'click', 'bounce'
for myfilename in glob.iglob('*.gz'):
    #print (rectypes)
    print str.join(rectypes)
    print (timestr)
    print([get_rectype_count(myfilename, rectype)
           for rectype in rectypes])
My output looks like this:
clickMY VALUES (bounce
'2015-07-01'
[222, 0]
I'm trying to create this output file:
MY VALUES ('2015-07-01', click, 222)
MY VALUES ('2015-07-01', bounce, 0)

When you call join on a string it joins together everything in the sequence passed to it, using itself as the separator.
>>> '123'.join(['click', 'bounce'])
'click123bounce'
Python supports formatting strings using replacement fields:
>>> values = "MY VALUES ('{date}', {rec}, {rec_count})"
>>> values.format(date='2015-07-01', rec='click', rec_count=222)
"MY VALUES ('2015-07-01', click, 222)"
With your code:
for myfilename in glob.iglob('*.gz'):
    for rec in rectypes:
        rec_count = get_rectype_count(myfilename, rec)
        print values.format(date=timestr, rec=rec, rec_count=rec_count)
edit:
If you want to use join, you can join a newline, \n:
>>> print '\n'.join(['line1', 'line2'])
line1
line2
Putting it together:
print '\n'.join(values.format(date=timestr,
                              rec=rec,
                              rec_count=get_rectype_count(filename, rec))
                for filename in glob.iglob('*.gz')
                for rec in rectypes)
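The format-then-join pattern above can be tried without any .gz files. This is a minimal sketch in Python 3 syntax; build_lines and sample_counts are names invented for illustration, and the counts are hard-coded stand-ins for what get_rectype_count() would return.

```python
# Template with named replacement fields, as in the answer above.
TEMPLATE = "MY VALUES ('{date}', {rec}, {rec_count})"


def build_lines(date, counts):
    """Return one formatted line per (record type, count) pair."""
    return [TEMPLATE.format(date=date, rec=rec, rec_count=n)
            for rec, n in counts]


# Hypothetical stand-in for the values returned by get_rectype_count().
sample_counts = [("click", 222), ("bounce", 0)]
lines = build_lines("2015-07-01", sample_counts)

# join() puts the separator between the items, so "\n" gives one line each.
print("\n".join(lines))
```

Keeping the template separate from the loop means the output layout can change without touching the code that collects the counts.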

Try this:
str1 = "MY VALUES ("
rectypes = ['click', 'bounce']
K=[]
for myfilename in glob.iglob('*.gz'):
    #print (rectypes)
    #print str.join(rectypes)
    #print (timestr)
    k=([get_rectype_count(myfilename, rectype)
        for rectype in rectypes])
    for i in range(0,len(rectypes)):
        print str1+str(timestr)+","+rectypes[i]+","+str(k[i])+")"

Related

Removing Single Quotes from a String Stored in an Array

I wrote code to append a json response into a list for some API work I am doing, but it stores the single quotes around the alphanumerical value I desire. I would like to get rid of the single quotes. Here is what I have so far:
i = 0
deviceID = []
while i < deviceCount:
    deviceID.append(devicesRanOn['resources'][i])
    deviceID[i] = re.sub('[\W_]', '', deviceID[i])
    i += 1
    if i >= deviceCount:
        break
if (deviceCount == 1):
    print ('Device ID: ', deviceID)
elif (deviceCount > 1):
    print ('Device IDs: ', deviceID)
The input looks like this:
Input Device IDs:
['14*************************00b29', '58*************************c3df4']
Output:
['14*************************00b29', '58*************************c3df4']
Desired Output:
[14*************************00b29, 58*************************c3df4]
As you can see, I am trying to use a regex to match non-alphanumeric characters and replace them with nothing. It does not raise an error, but it does not perform the action I am looking for either. Does anyone have a recommendation on how to fix this?
Thank you,
xOm3ga

You won't be able to use the default print. You'll need to use your own means of making a representation for the list. But this is easy with string formatting.
'[' + ', '.join(f'{id!s}' for id in ids) + ']'
The f'{id!s}' is an f-string which formats the variable id using its __str__ method. If you're on a version before 3.6, which doesn't support f-strings, you can also use
'%s' % id
'{!s}'.format(id)
PS:
You can simplify your code significantly by using a list comprehension and custom formatting instead of regexes.
ids = [device for device in devicesRanOn['resources'][:deviceCount]]
if deviceCount == 1:
    label = 'Device ID:'
elif deviceCount > 1:
    label = 'Device IDs:'
print(label, '[' + ', '.join(f'{id!s}' for id in ids) + ']')
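The quotes in the question's output are not part of the strings; print() on a list shows the repr() of each element, which adds them. A small sketch of the idea, where format_ids is a hypothetical helper name and the device IDs are made up:

```python
def format_ids(ids):
    """Render a list of strings as [a, b] without the quotes repr() adds."""
    return "[" + ", ".join(str(item) for item in ids) + "]"


# Hypothetical device IDs, used only for illustration.
device_ids = ["14ab", "58cd"]
label = "Device ID:" if len(device_ids) == 1 else "Device IDs:"
print(label, format_ids(device_ids))

# The stored string has no quotes; only its repr() does.
print(repr(device_ids[0]), "vs", str(device_ids[0]))
```

This is also why the regex in the question changes nothing: there are no quote characters in the stored values to remove.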

Python 2.7: how to append lines which contain both string & numeric values to text file?

I want to append following line to my text file:
Degree of polarization is 8.23 % and EVPA is 45.03 degree.
i.e. I want both string and numeric values to be appended.
I want to append above line with different numeric values after each run of my python code.
Any help will be appreciated.
For example
>>> a = 10.5
>>> with open("myfile.txt","a") as f:
...     f.write(a)
gives me an error.

Do you mean something like this?
while True:
    # getPolarization() and getEvpa() are placeholders for however you compute the values
    polarization = getPolarization()
    evpa = getEvpa()
    # the trailing "\n" puts each appended result on its own line
    my_text = "Degree of polarization is {} % and EVPA is {} degree.\n".format(polarization, evpa)
    with open("test.txt", "a") as myfile:
        myfile.write(my_text)
You should also describe what you have tried so far and which problems or errors occurred.

You can only write strings to files.
Strings can be concatenated:
>>> 'a' + 'b'
'ab'
Numbers can be converted to strings:
>>> str(4)
'4'
>>> str(5.6)
'5.6'
You should be able to get started with that.
Also, Python's string formatting will automatically do this for you:
>>> '{} % and {} degree'.format(6.7, 8.9)
'6.7 % and 8.9 degree'
Or with a more readable format using keywords:
>>> '{polarization} % and {evpa} degree'.format(polarization=6.7, evpa=8.9)
'6.7 % and 8.9 degree'
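Putting the pieces together, here is a hedged sketch of appending one formatted line per run. The function name append_result and the temporary output path are assumptions made for the example, and it uses Python 3 syntax.

```python
import os
import tempfile


def append_result(path, polarization, evpa):
    """Append one formatted result line to a text file."""
    # {:.2f} converts each number to text with two decimal places.
    line = "Degree of polarization is {:.2f} % and EVPA is {:.2f} degree.\n".format(
        polarization, evpa)
    # Mode "a" appends, so earlier runs are preserved.
    with open(path, "a") as f:
        f.write(line)


# Hypothetical output path in a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "results.txt")
append_result(path, 8.23, 45.03)
append_result(path, 7.9, 44.1)

with open(path) as f:
    contents = f.read()
print(contents, end="")
```

The explicit "\n" matters: write() does not add a line ending the way print does.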

how to manipulate SREC file

I have an S19 file looking something like below:
S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8
I want to separate the first two characters and also the next two characters, and so on... I want it to look like below (last two characters are also to be separated for each line):
S0, 03, 0000, FC
S3, 0D, 0003C000, 0F00000000000000, 20
S3, FD, 00000000, 782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, ED, 000000F8, 3D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B0000, 3D
S3, 15, 00000400, FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF, 7D
S3, FD, 00000410, 10B5DFF828000468012147F22C10C4F20300016047F22010C4F20300, 00
S7, 05, 00008EB4, B8
How can I do this in Python?
I have something like this:
#!/usr/bin/python
import string,os,sys,re,fileinput
print "hi"
inputfile = "k60.S19"
outputfile = "k60_out.S19"
# open the source file and read it
fh = file(inputfile, 'r')
subject = fh.read()
fh.close()
# create the pattern object. Note the "r". In case you're unfamiliar with Python
# this is to set the string as raw so we don't have to escape our escape characters
pattern2 = re.compile(r'S3')
pattern3 = re.compile(r'S7')
pattern1 = re.compile(r'S0')
# do the replace
result1 = pattern1.sub("S0, ", subject)
result2 = pattern2.sub("S3, ", subject)
result3 = pattern3.sub("S7, ", subject)
# write the file
f_out = file(outputfile, 'w')
f_out.write(result1)
f_out.write(result2)
f_out.write(result3)
f_out.close()
#EoF
but it is not working the way I want. Can someone help me come up with the proper regular expression for this?

Try the bincopy package; it may be what you need.
bincopy - Interpret strings as packed binary data
Mangling of various file formats that conveys binary information (Motorola S-Record, Intel HEX and binary files).
import bincopy
f = bincopy.BinFile()
f.add_srec_file("path/to/your/s19/file.s19")
f.as_binary() # returns the S19 contents as binary data
Or you can simply use open() and slice each line:
with open("path/to/your/s19/file.s19") as s19:
    for line in s19:
        line = line.strip()    # drop the newline so the last two characters are the checksum
        rec_type = line[0:2]   # renamed from "type" to avoid shadowing the builtin
        count = line[2:4]
        address = line[4:12]   # 8 hex digits: right for S3/S7; S0/S1 use 4 and S2 uses 6
        data = line[12:-2]
        crc = line[-2:]
        print rec_type + ", " + count + ", " + address + ", " + data + ", " + crc
Hope it helps.
Motorola S-record file format
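The slicing approach needs one adjustment to reproduce the desired output: the address field is 4, 6, or 8 hex digits wide depending on the record type, so a fixed slice of 8 only suits S3 and S7 records. The sketch below (Python 3 syntax) looks the width up per type; split_srec and ADDRESS_DIGITS are names invented here, and the checksum is copied through without being verified.

```python
# Address width in hex digits for each S-record type.
ADDRESS_DIGITS = {"S0": 4, "S1": 4, "S2": 6, "S3": 8,
                  "S5": 4, "S6": 6, "S7": 8, "S8": 6, "S9": 4}


def split_srec(line):
    """Split one S-record into type, count, address, data and checksum."""
    line = line.strip()
    rec_type, count = line[0:2], line[2:4]
    addr_end = 4 + ADDRESS_DIGITS[rec_type]
    address = line[4:addr_end]
    data = line[addr_end:-2]
    checksum = line[-2:]
    # Records such as S7 carry no data, so skip empty fields.
    fields = [rec_type, count, address, data, checksum]
    return ", ".join(field for field in fields if field)


# Two of the lines from the question.
for record in ["S0030000FC", "S70500008EB4B8"]:
    print(split_srec(record))
```

To process a whole file, the same function can be applied to each line read from open(), and the results written to the output file with a trailing newline.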

You can do it using a callback function as replacement with re.sub:
#!/usr/bin/python
import re
data = r'''S0030000FC
S30D0003C0000F0000000000000020
S3FD00000000782EFF1FB58E00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S3ED000000F83D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D2B00003D
S31500000400FFFFFFFFFFFFFFFFFFFFFFFF7EF9FFFF7D
S3FD0000041010B5DFF828000468012147F22C10C4F20300016047F22010C4F2030000
S70500008EB4B8'''
pattern = re.compile(r'^(..)(..)((?:.{4}){1,2})(.*)(?=..)', re.M)
def repl(m):
    repstr = ''
    for g in m.groups():
        if (g):
            repstr += g + ', '
    return repstr
print re.sub(pattern, repl, data)
However, as Mark Setchell points out, there is probably a neat way to do it with slicing.

I know you are thinking Python and regexes, but this was made for awk and the following will maybe help you work out the way to do it using slicing:
awk '{r=length($0);print substr($0,1,2),substr($0,3,2),substr($0,5,8),substr($0,13,r-14),substr($0,r-1)}' OFS=, k60.s19
That says "get the length of the line in variable r, then print the first two characters, the next two characters, the next 8 characters and so on... and use a comma as the field separator".
EDITED
Here are a few more hints to get you started...
If you want to avoid printing line 1, you can do
awk 'FNR==1{next} ...rest of awk script above ... '
If you want to only process lines longer than 40 characters, you can do
awk 'length($0)>40 {print}' yourfile
If you only want to process lines where the second field is "xx", you can do
awk '$2 ~ "xx" {print}' yourfile

python Printing One result per line

Below is part of a working script that checks Twitter accounts. It prints the results I want side by side, but I want them one per line, each with a label.
For example, the first three results are the number of followers, the number of accounts being followed, and the number of lists the user is in. The script prints them all on one line, something like this:
1350 257 27 and I want it to be as follows
Followers:1,350
Following: 257
Number of lists present: 27
I tried using ";", commas, and "/n", but each either does not work or gives me a 500 error.
Here is the script.
Any help would be appreciated.
Thank you
................
details = twitter.show_user(screen_name='xxxxxx')
print "content-type: text/html;charset=utf-8"
print
print"<html><head></head><body>"
print (details['followers_count']) #followers
print (details['friends_count'])# following
print (details['listed_count'])# In how many lists
... ....

Instead of the last three print lines, use string formatting to pass in the values.
print "Followers:{}\nFollowing: {}\nNumber of lists present: {}".format(
    details['followers_count'], details['friends_count'], details['listed_count']
)

Take a look at the print statement. You can write multiple values on one space-separated line like this:
print details['followers_count'], details['friends_count'], details['listed_count']
If you want more control over what you print, use the join function:
# Add the parts you want to show
stringParts = []
for part in ['followers_count','friends_count','listed_count']:
    stringParts.append( part + " = " + str(details[part]) )
separator = "," # This will give you a comma-separated string
print separator.join( stringParts )

You can use the % operator
print 'Followers: %s \nFollowing: %s \nNumber of lists present: %s' % (
details['followers_count'], details['friends_count'],
details['listed_count'])
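The desired output also shows a thousands separator (1,350), which the format specifier "," provides. Here is a sketch under stated assumptions: the details dictionary is a hard-coded stand-in for the result of twitter.show_user(), and format_stats and LABELS are names invented for the example (Python 3 syntax).

```python
# Display label paired with the key it reads from the user details.
LABELS = [
    ("Followers", "followers_count"),
    ("Following", "friends_count"),
    ("Number of lists present", "listed_count"),
]


def format_stats(details, line_break="\n"):
    """Return one 'Label: value' entry per statistic, joined by line_break."""
    # The "," format specifier inserts thousands separators: 1350 -> 1,350.
    return line_break.join(
        "{}: {:,}".format(label, details[key]) for label, key in LABELS)


# Hypothetical stand-in for the dictionary returned by twitter.show_user().
details = {"followers_count": 1350, "friends_count": 257, "listed_count": 27}

text_output = format_stats(details)
# Browsers collapse plain newlines in HTML, so a page needs <br> tags.
html_output = format_stats(details, line_break="<br>\n")
print(text_output)
```

Because the script in the question sends text/html, the html_output variant is probably the one that shows separate lines in a browser.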

Use grep on file in Python

I have searched the grep answers on here and cannot find an answer. They all seem to search for a single string in a file, not for a list of strings read from a file. I already have a search function that works, but grep does it WAY faster. I have a list of strings in a file sn.txt (one string per line, no delimiters). I want to search another file (Merge_EXP.exp) for lines that match and write them out to a new file. The file I am searching has half a million lines, so searching it for a few thousand strings takes hours without grep.
When I run it from the Windows command prompt, it finishes in minutes:
grep --file=sn.txt Merge_EXP.exp > Merge_EXP_Out.exp
How can I call this same process from Python? I don't really want alternatives in Python because I already have one that works but takes a while. Unless you think you can significantly improve the performance of that:
def match_SN(serialnumb, Exp_Merge, output_exp):
    fout = open(output_exp,'a')
    f = open(Exp_Merge,'r')
    # skip first line
    f.readline()
    for record in f:
        record = record.strip().rstrip('\n')
        if serialnumb in record:
            fout.write (record + '\n')
    f.close()
    fout.close()
def main(Output_CSV, Exp_Merge, updated_exp):
    # create a blank output
    fout = open(updated_exp,'w')
    # copy header records
    f = open(Exp_Merge,'r')
    header1 = f.readline()
    fout.write(header1)
    header2 = f.readline()
    fout.write(header2)
    fout.close()
    f.close()
    f_csv = open(Output_CSV,'r')
    f_csv.readline()
    for rec in f_csv:
        rec_list = rec.split(",")
        sn = rec_list[2]
        sn = sn.strip().rstrip('\n')
        match_SN(sn,Exp_Merge,updated_exp)

Here is an optimized version in pure Python:
def main(Output_CSV, Exp_Merge, updated_exp):
    output_list = []
    # copy header records
    records = open(Exp_Merge,'r').readlines()
    output_list = records[0:2]
    serials = open(Output_CSV,'r').readlines()
    serials = [x.split(",")[2].strip().rstrip('\n') for x in serials]
    for s in serials:
        items = [x for x in records if s in x]
        output_list.extend(items)
    open(updated_exp, "w").write("".join(output_list))
main("sn.txt", "merge_exp.exp", "outx.txt")
Input
sn.txt:
x,y,0011
x,y,0002
merge_exp.exp:
Header1
Header2
0011abc
0011bcd
5000n
5600m
6530j
0034k
2000lg
0002gg
Output
Header1
Header2
0011abc
0011bcd
0002gg
Try this out and see how much time it takes...
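A variation worth timing is to scan the records once and test each line against all serial numbers at the same time, which is closer to what grep --file does. The sketch below is an alternative technique, not the answer's code: it combines the serials into one compiled regular expression. filter_records is a hypothetical name, the data is the sample input above, and whether it is actually faster on half a million lines would need to be measured.

```python
import re


def filter_records(serials, records, header_lines=2):
    """Keep the header lines plus every record containing any serial number."""
    output = list(records[:header_lines])
    serials = [s for s in serials if s]
    if not serials:
        return output
    # One alternation pattern; re.escape keeps each serial a literal string.
    pattern = re.compile("|".join(re.escape(s) for s in serials))
    output.extend(rec for rec in records[header_lines:] if pattern.search(rec))
    return output


# Sample data from the answer above.
serials = ["0011", "0002"]
records = ["Header1", "Header2", "0011abc", "0011bcd", "5000n",
           "5600m", "6530j", "0034k", "2000lg", "0002gg"]
result = filter_records(serials, records)
print("\n".join(result))
```

Unlike the loop over serials, this keeps matching lines in file order and writes each line at most once, which is also how grep behaves.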

When I used the full path to the grep executable, it worked (I pass it grep_loc, Serial_List, and Export):
import os
Export_Dir = os.path.dirname(Export)
Export_Name = os.path.basename(Export)
Output = Export_Dir + "\\Output_" + Export_Name
print "\nOutput: " + Output + "\n"
cmd = grep_loc + " --file=" + Serial_List + " " + Export + " > " + Output
print "grep usage: \n" + cmd + "\n"
os.system(cmd)
print "Output created\n"
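The same call can be made with the subprocess module, which avoids building a shell command string and therefore copes with spaces in paths. This is a sketch assuming Python 3.5 or later and a grep executable at the path you pass in; build_grep_command and run_grep are hypothetical helper names, and nothing is executed until run_grep is called.

```python
import subprocess


def build_grep_command(grep_path, pattern_file, search_file):
    """Build the argument list for grep --file=PATTERNS FILE."""
    return [grep_path, "--file=" + pattern_file, search_file]


def run_grep(grep_path, pattern_file, search_file, output_file):
    """Run grep and write the matching lines to output_file."""
    cmd = build_grep_command(grep_path, pattern_file, search_file)
    # Redirect stdout to the file in Python instead of using shell ">".
    with open(output_file, "w") as out:
        result = subprocess.run(cmd, stdout=out)
    # grep exits with 0 on a match, 1 on no match, and 2 or more on error.
    if result.returncode > 1:
        raise RuntimeError("grep failed with exit code %d" % result.returncode)
    return result.returncode


print(build_grep_command("grep", "sn.txt", "Merge_EXP.exp"))
```

A call such as run_grep(grep_loc, Serial_List, Export, Output) would then replace the os.system line.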

I think you have not chosen the right title for your question: What you want to do is the equivalent of a database JOIN. You can use grep for that in this particular instance, because one of your files only has keys and no other information. However, I think it is likely (but of course I don't know your case) that in the future your sn.txt may also contain extra information.
So I would solve the generic case. There are multiple solutions:
import all data into a database, then do a LEFT JOIN (in sql) or equivalent
use a python large data tool
For the latter, you could try numpy or, recommended because you are working with strings, pandas. Pandas has an optimized merge routine, which is very fast in my experience (uses cython under the hood).
Here is pandas PSEUDO code to solve your problem. It is close to real code, but I need to know the names of the columns that you want to match on. I assumed here that the one column in sn.txt is called key, and the matching column in merge_exp is called sn. I also see you have two header lines in merge_exp; read the docs for that.
# PSEUDO CODE (but close)
import pandas
left = pandas.read_csv('sn.txt')
right = pandas.read_csv('merge_exp.exp')
out = pandas.merge(left, right, left_on="key", right_on="sn", how='left')
out.to_csv("outx.txt")
