I want to align all the strings in each column of a file using a Python script. I have described the problem and the desired outcome with a sample scenario.
#sample.txt
start() "hello"
appended() "fly"
instantiated() "destination"
do() "increment"
logging_sampler() "dummy string"
Desired output:
#sample.txt(indented)
start()           "hello"
appended()        "fly"
instantiated()    "destination"
do()              "increment"
logging_sampler() "dummy string"
So is there any Python library that can process a file and produce the above indentation?
Is there a general solution, so that if I have a file with more than 2 columns I can still indent all the columns in the same manner?
So is there any Python library that can process a file and produce the above indentation? No.
Is this possible? Yes.
You need a way to parse your lines and then display them in a formatted manner.
In your particular case, parsing is straightforward: you just need to split each string at the first space, which str.partition does easily. In other cases you may need more elaborate parsing logic, for which you might turn to regular expressions.
Formatting is even simpler if you know the Format String Syntax.
Demo (st holds the contents of sample.txt as a string):
>>> for e in st.splitlines():
...     left, _, right = e.partition(' ')
...     print("{:<20}{:<20}".format(left, right))
...
start()             "hello"
appended()          "fly"
instantiated()      "destination"
do()                "increment"
logging_sampler()   "dummy string"
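A variant of the demo that measures the widest left-hand entry instead of hard-coding a width of 20 might look like the sketch below. The helper name align_two_columns is hypothetical, and it assumes, as in the sample, that the first space separates the two columns.

```python
def align_two_columns(lines):
    """Align two space-separated columns (hypothetical helper).

    Each line is split at its first space with str.partition; the left
    column is padded to the width of its longest entry plus one space.
    """
    # partition returns (head, sep, tail); [::2] keeps head and tail
    pairs = [line.partition(" ")[::2] for line in lines]
    width = max(len(left) for left, _ in pairs) + 1
    # "{:<{w}}" is a nested format spec: left-align in a field of width w
    return ["{:<{w}}{}".format(left, right, w=width) for left, right in pairs]


sample = [
    'start() "hello"',
    'appended() "fly"',
    'instantiated() "destination"',
    'do() "increment"',
    'logging_sampler() "dummy string"',
]
for row in align_two_columns(sample):
    print(row)
```

Computing the width from the data means a longer first column never overflows a fixed field.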
Adapted from this one, here's a function that takes a list of string lists and returns a list of formatted lines:
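def table(lines, delim='\t'):
    # width of each column = length of its longest cell
    lens = [len(max(col, key=len)) for col in zip(*lines)]
    # one left-aligned field per column, e.g. '{:17}\t{:14}'
    fmt = delim.join('{:' + str(x) + '}' for x in lens)
    return [fmt.format(*line) for line in lines]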
The rest is trivial:
import re
# the demo reads its own source file; use 'sample.txt' for your data
with open(__file__) as fp:
    lines = [re.split(r' ', s.strip(), maxsplit=1) for s in fp]
print('\n'.join(table(lines)))
http://ideone.com/9WucPj
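One limitation of table() for the "more than 2 columns" case: zip(*lines) stops at the shortest row, so any extra columns in longer rows are silently dropped. A sketch that pads with itertools.zip_longest instead, assuming (hypothetically) that a cell is either a double-quoted string or a run of non-space characters; align_columns and CELL are illustrative names:

```python
import re
from itertools import zip_longest

# Hypothetical cell syntax: a double-quoted string or a run of non-whitespace.
CELL = re.compile(r'"[^"]*"|\S+')


def align_columns(text, gap=1):
    """Align every column of whitespace-separated text (sketch)."""
    rows = [CELL.findall(line) for line in text.splitlines() if line.strip()]
    # zip_longest pads short rows with "" so no column is dropped
    widths = [max(len(cell) for cell in col)
              for col in zip_longest(*rows, fillvalue="")]
    aligned = []
    for row in rows:
        cells = [cell.ljust(width) for cell, width in zip(row, widths)]
        # rstrip removes the padding after the last cell of each row
        aligned.append((" " * gap).join(cells).rstrip())
    return aligned


text = 'start() "hello"\nlogging_sampler() "dummy string" extra\ndo() "increment"\n'
print("\n".join(align_columns(text)))
```

The regex keeps "dummy string" together as one cell; a plain str.split() would break it in two.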
You can use the tab character ("\t") when printing, but it's not clear to me how you are printing sample.txt.
print(string1 + "\t" + string2)
See here for more details
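One thing to keep in mind with tabs: a tab only advances to the next tab stop (str.expandtabs defaults to every 8 columns, which matches most terminals), so the columns line up only while every left-hand entry ends before the same stop. A small sketch using str.expandtabs to show where the right column lands for two of the sample rows:

```python
rows = [("start()", '"hello"'), ("logging_sampler()", '"dummy string"')]

# expandtabs(8) replaces each tab with spaces up to the next multiple of 8
lines = [(left + "\t" + right).expandtabs(8) for left, right in rows]
for line in lines:
    print(line, "| right column starts at", line.index('"'))
```

"start()" reaches the stop at column 8, but "logging_sampler()" is 17 characters long and jumps to column 24, so the rows no longer align; computing column widths as in the other answers avoids this.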
I'm very new to programming and am working on some code to extract data from a bunch of text files. I've been able to do this; however, the data is not useful to me in Excel. Therefore, I would like to print it all on a single line, separated by a special character, which I can then use as the delimiter in Excel.
Here is my code:
import os
data=['Find me','find you', 'find us']
with open('C:\\Users\\Documents\\File.txt', 'r') as inF:
    for line in inF:
        for a in data:
            string=a
            if string in line:
                print (line,end='*') #print on same line
inF.close()
So basically what I'm doing is finding if a keyword is on that line and then printing that line if it is.
Even though I have print(..., end='*'), I don't get the output on a single line. It outputs:
Find me
*find you
*find us
Where is the problem? (I'm using Python 3.5.1)
Your immediate problem is that you're not removing the newline characters from your lines before printing them. The usual way to do this is with strip(), e.g.:
print(line.strip(), end='*')
You'll also print multiple copies of the line if more than one of your special phrases appears in it. To avoid that, add a break statement after your print, or (better, but a more advanced construct that might not make sense until you're used to generator expressions) use if any(keyword in line for keyword in data):
You also don't need to explicitly close the input file - the point of the with open(...) as ...: context manager is that it closes the file when exiting it.
And I would avoid using string as a variable name: it doesn't tell anyone what the variable is used for, and it can cause confusion if you end up using the standard-library string module for anything. It's not as bad as shadowing a built-in type like list, but it's worth avoiding. Especially since it does nothing for you here: you can just use if a in line: if you don't want to use the any() version above.
In addition to all that, if your data is not extremely large (and I hope it's not, if you're trying to fit it all on one line), you'll get tidier code and avoid the trailing delimiter by using the .join() method on strings, e.g. something like:
data = ['Find me', 'find you', 'find us']
with open('C:\\Users\\Documents\\File.txt', 'r') as inF:
    # join the stripped matching lines with '*', so there is no trailing delimiter
    print("*".join(line.strip() for line in inF if any(keyword in line for keyword in data)))
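Since the output is headed for Excel, another option is the standard csv module, which takes care of quoting a matched line that itself contains the delimiter. A sketch, in which matches_as_row is a hypothetical helper name and '*' is kept as the delimiter from the question; to write a file rather than a string, you would pass a file opened with newline='' to csv.writer instead of io.StringIO:

```python
import csv
import io


def matches_as_row(lines, keywords, delimiter="*"):
    """Return the matching lines as one delimited text row (sketch).

    any() stops at the first keyword found, so a line that contains
    several keywords is still kept only once.
    """
    row = [line.strip() for line in lines if any(k in line for k in keywords)]
    buf = io.StringIO()
    # csv.writer quotes a field if it happens to contain the delimiter
    csv.writer(buf, delimiter=delimiter).writerow(row)
    return buf.getvalue()


data = ['Find me', 'find you', 'find us']
sample_lines = ["Find me\n", "nothing here\n", "find us and find you\n"]
print(matches_as_row(sample_lines, data))
```

Note that csv.writer ends each row with "\r\n" by default.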
I have a letter in LaTeX format. I'd like to write a short script in python that takes one argument (the addressee) and creates a .tex file with the general letter format and the addressee.
from sys import argv
script, addressee = argv
file = open('newletter.tex', 'w')
file.write("\begin{Document} Dear " + addressee + ", \n Greetings, how are you? Sincerely, Me \end{Document}")
file.close()
Is there a better function to write out large blocks of text? Also, you can see that the .tex file will contain programming syntax - will python disregard this as long as it is coerced to a string? Do I need to coerce a large block to string? Thanks in advance!
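If you directly enter print "\begin..." into your interpreter, you will notice that the output omits the \b at the front of the string. This is because, inside a normal string literal, \b is an escape sequence that Python turns into a single backspace character when it reads the literal; when that character is printed, the terminal renders it as a backspace. The print statement (or function, if you're on 3.x) is not what interprets it.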
To avoid this confusion, you can use a "raw string", which in Python is denoted by prepending an 'r':
>>> a = "\begin"
>>> b = r"\begin"
>>> print(a)
egin
>>> print(b)
\begin
>>>
Typically, when working with strings that represent file paths, or anything else that may contain a \ character, you should use a raw string.
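To see that the escape is resolved when Python reads the literal (not by print), compare the lengths and repr() of the two strings; this is a small illustration, not part of the original answer:

```python
normal = "\begin"   # "\b" is an escape: one backspace character (U+0008)
raw = r"\begin"     # raw literal: backslash and "b" stay as two characters

print(len(normal), repr(normal))   # 5 '\x08egin'
print(len(raw), repr(raw))         # 6 '\\begin'
```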
As far as inserting information into a template, I would recommend the str.format() method rather than string concatenation. Note that inside a raw string \n is no longer a newline escape, so a triple-quoted raw string with a real line break works better here (also, the LaTeX environment name is lowercase: document). Your string would look like this:
r"""\begin{{document}} Dear {},
Greetings, how are you? Sincerely, Me \end{{document}}""".format(addressee)
The argument to format() (in this case addressee) replaces the {} placeholder in the string. For this reason, curly braces that should be interpreted literally must be escaped by doubling them.
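For the "large blocks of text" part of the question, a triple-quoted raw template keeps the whole letter readable in one place. A sketch, where write_letter and LETTER_TEMPLATE are hypothetical names and the \documentclass{article} line is an assumption (the original template shows only the document environment, but a compilable .tex file needs a document class):

```python
# Raw triple-quoted template: backslashes such as \begin stay literal and the
# line breaks are real newlines. Literal braces are doubled for str.format().
LETTER_TEMPLATE = r"""\documentclass{{article}}
\begin{{document}}
Dear {addressee},

Greetings, how are you?

Sincerely, Me
\end{{document}}
"""


def write_letter(addressee, path="newletter.tex"):
    """Fill in the template and write it to path (hypothetical helper)."""
    text = LETTER_TEMPLATE.format(addressee=addressee)
    with open(path, "w") as f:
        f.write(text)
    return text
```

From the script in the question, you would call write_letter(addressee) after unpacking argv. A named placeholder such as {addressee} stays readable even when the template grows.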
I'd take the approach of creating the template first as letter.tex, with the addressee set to a placeholder such as QXQ_ADDRESSEE_QXQ.
Then in the Python script I'd read the entire file into memory. Text read from a file is not subject to escape-sequence processing (that only happens to string literals in your source code), so sequences like \begin come through intact, just as with a raw string.
with open('letter.tex', 'r') as f:
    raw_letter = f.read()
Then just do a substitution and write the resulting string to a file (str.replace returns a new string, so keep its result):
letter = raw_letter.replace("QXQ_ADDRESSEE_QXQ", newname)
with open('newletter.tex', 'w') as f:
    f.write(letter)
I am new to programming and have already checked other people's questions to make sure that I am using a good method to replace tabs with spaces, that my regex is correct, and that I understand what exactly my error is ("unhashable type: 'list'"). Even so, I'm at a loss as to what to do. Any help would be great!
I have a large file that I have broken up into lines. Ultimately I will need to access the first 3 elements of each line. Currently when I print a line, without the additional re.sub line of code, I get something like this: ['blah\tblah\tblah'], when I want ['blah blah blah'].
My code to do this is
import re
f = open('text.txt')
raw = f.read()
raw = raw.lower()
lines = raw.splitlines()
lines = re.sub(r'\t', lines, '\s')
print lines[0:2] #just to see the first few examples
f.close()
When I print the first few lines without the re.sub line, it works fine. When I add that line to try to change the lines, I get the error. I understand that lists are mutable and thus can't be hashed... but I'm not trying to work with a hash. I'm just trying to replace \t with \s in a large text file to make the program easier to work with. I don't think there is a problem with how I am changing \t's to \s's, because according to this error, any way I change it will break my code. What do I do?! Any help is super appreciated. :')
You need to change the order of the arguments you pass to re.sub; its signature is re.sub(pattern, repl, string). Also note that you can't use the regex \s as the second argument: the replacement is a template string, not a pattern, so use a literal space ' ' instead.
lines = raw.splitlines()
lines = [re.sub(r'\t', ' ', line) for line in lines]
raw.splitlines() returns a list, which you then assigned to the variable lines. re.sub works on a single string, not on a list, so you need to apply it to each item in the list.
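Two hedged alternatives, since the stated goal is the first 3 elements of each line: splitting each line on "\t" gets those elements directly without any substitution, and if you do want spaces, re.sub (or plain str.replace("\t", " "), which needs no regex) can be run once on the whole text before splitlines(). The function names below are hypothetical:

```python
import re


def first_three_fields(text):
    """Split each tab-separated line and keep its first three fields."""
    return [line.split("\t")[:3] for line in text.lower().splitlines()]


def tabs_to_spaces(text):
    """re.sub works on one string, so run it on the whole text before splitting."""
    return re.sub(r"\t", " ", text).splitlines()


sample = "Blah\tblah\tblah\textra\nfoo\tbar\tbaz"
print(first_three_fields(sample))
print(tabs_to_spaces(sample))
```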
Caveat emptor: I can spell p-y-t-h-o-n and that's pretty much all there is to my knowledge. I tried to take some online classes, but after about 20 lectures without learning much, I gave up a long time ago. So what I am going to ask is very simple, but I need help:
I have a file with the following structure:
object_name_here:
  object_owner:
    - me#my.email.com
    - user#another.email.com
  object_id: some_string_here
  identification: some_other_string_here
And this block repeats itself hundreds of times in the same file.
Other than object_name_here, which is unique and required, all other lines may or may not be present, and there can be anywhere from zero to 10+ different email addresses.
What I want to do is export this information into a flat file, like /etc/passwd, with a twist.
For instance, I want the block above to yield a line like this:
object_name_here:object_owner=me#my.email.com,user#another.email.com:object_id=some_string_here:identification=some_other_string_here
Again, the number of fields and the length of the field contents are not fixed by any means. I am sure this is a pretty easy task to accomplish with Python, but I don't know how. I don't even know where to start.
Final edit: Okay, I am able to write a shell script (bash, ksh, etc.) to parse the information, but when I asked this question originally, I was under the impression that Python had a simpler way of handling uniform or semi-uniform data structures such as this one. That impression turned out to be not very accurate. Sorry for wasting your time.
As jaypb points out, regular expressions are a good idea here. If you're interested in some Python 101, I'll give you some simple code to get you started on your own solution.
The following code is a quick and dirty way to lump every six lines of a file into one line of a new file:
# open some files to read and write
oldfile = open("oldfilename", "r")
newfile = open("newfilename", "w")
# initiate variables and iterate over the input file
count = 0
outputLine = ""
for line in oldfile:
    # we're going to append lines in the file to the variable outputLine
    # iterating over a file yields one line at a time as a string
    # str.strip() will remove whitespace at the beginning and end of a string
    outputLine = outputLine + line.strip()
    # increment the counter
    count = count + 1
    # you know your interesting stuff is six lines long, so
    # reset the output string and write it to file every six lines
    if count % 6 == 0:
        newfile.write(outputLine + "\n")
        outputLine = ""
# clean up
oldfile.close()
newfile.close()
This isn't exactly what you want to do, but it gets you close. For instance, if you want to get rid of the "- " at the beginning of the email addresses and replace it with "=", instead of just appending to outputLine you'd do something like
if line.strip().startswith("- "):
    outputLine = outputLine + '=' + line.strip()[2:]
That last bit is a Python slice: [2:] means "give me everything after the second element," and it works for things like strings or lists.
That'll get you started. Use Google and the Python docs (for instance, googling "python strip" takes you to the built-in types page for Python 2.7.10) to understand every line above, then change things around to get what you need.
Since you are replacing text substrings with different text substrings, this is a pretty natural place to use regular expressions.
Python, fortunately, has an excellent regular expressions library called re.
You will probably want to heavily utilize
re.sub(pattern, repl, string)
Look at the documentation here:
https://docs.python.org/3/library/re.html
Update: Here's an example of how to use the regular expression library:
#!/usr/bin/env python
import re
body = None
with open("sample.txt") as f:
    body = f.read()
# Replace emails followed by other emails
body = re.sub(" * - ([a-zA-Z.#]*)\n * -", r"\1,", body)
# Replace declarations of object properties
body = re.sub(" +([a-zA-Z_]*): *[\n]*", r"\1=", body)
# Strip newlines
body = re.sub(":?\n", ":", body)
print (body)
Example output:
$ python example.py
object_name_here:object_owner=me#my.email.com, user#another.email.com:object_id=some_string_here:identification=some_other_string_here
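As an alternative to chained regular expressions, a line-by-line parser can group each block explicitly, which copes with optional fields and any number of email addresses. This is a sketch under an assumed layout (unindented "name:" header, indented "key: value" properties, indented "- item" list entries belonging to the property above them); flatten_blocks is a hypothetical name:

```python
def flatten_blocks(text):
    """Turn indented 'name:' blocks into 'name:key=value:...' lines (sketch).

    Assumed layout: a block starts with an unindented 'name:' line,
    properties are indented 'key: value' lines, and list entries are
    indented '- item' lines that belong to the property above them.
    """
    records = []  # one (name, [(key, [values])]) entry per block
    for raw in text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue  # skip blank lines
        if not raw[0].isspace():
            # unindented line: start a new block
            records.append((stripped.rstrip(":"), []))
        elif stripped.startswith("- "):
            # list entry: add it to the most recent property of this block
            records[-1][1][-1][1].append(stripped[2:])
        else:
            key, _, value = stripped.partition(":")
            value = value.strip()
            records[-1][1].append((key, [value] if value else []))

    lines = []
    for name, props in records:
        fields = ["{}={}".format(key, ",".join(values)) for key, values in props]
        lines.append(":".join([name] + fields))
    return lines


sample = ("object_name_here:\n"
          "  object_owner:\n"
          "    - me#my.email.com\n"
          "    - user#another.email.com\n"
          "  object_id: some_string_here\n"
          "  identification: some_other_string_here\n"
          "second_object:\n"
          "  object_id: abc\n")
print("\n".join(flatten_blocks(sample)))
```

A list entry that appears before any property would raise an IndexError here; real input may need more checks.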
I'm a Python beginner and just ran into a simple problem: I have a list of names (designators) and a very simple piece of code that reads lines in a csv file and prints the csv lines that have a name in the first column (row[0]) in common with my "designator list". So:
import csv
DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
with open('DesignatorFile.csv','rb') as FileReader:
    for row in csv.reader(FileReader, delimiter=';'):
        if row[0] in DesignatorList:
            print row
My csv file is only a list of names, like this:
AAX-435
AAX-961
HHX-58
HHX-9387
I would like to be able to use wildcards like * and '.'. For example, let's say that I put this in my csv file:
AAX*
H.X-9387
*58
I need my code to be able to interpret those wildcards/control characters, printing the following:
every line that starts with "AAX";
every line that starts with "H", then any single character, and then ends with "X-9387";
every line that ends with "58".
Thank you!
EDIT: For future reference (in case somebody runs into the same problem), this is how I solved my problem following Roman's advice:
import csv
import re
DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
with open('DesignatorFile.txt','rb') as FileReader:
    for row in csv.reader(FileReader, delimiter=';'):
        designator_col0 = row[0]
        designator_col0_re = re.compile("^" + ".*".join(re.escape(i) for i in designator_col0.split("*")) + "$")
        for d in DesignatorList:
            if designator_col0_re.match(d):
                print d
Try the re module.
You need to turn each rule into a regular expression (regex): replace '*' with '.*' and add ^ (beginning of the string) and $ (end of the string) at the start and end of the expression. In addition, you need to escape everything else with the re.escape function (that is, the escape function from the re module).
If you don't have any other "control characters" (as you call them), it is enough to split the string on "*", escape each piece, and join the pieces with ".*".
For example,
import re
def make_rule(rule):  # where rule is, for example, "H*X-9387"
    return re.compile("^" + ".*".join(re.escape(i) for i in rule.split("*")) + "$")
Then you can match (I'm guessing your rule is the first column of row):
...
rule_re = make_rule(row[0])
for d in DesignatorList:
    if rule_re.match(d):
        print(row)  # or maybe print(d)
(I understood that the rules come from the CSV file while the designators come from a list. It's easy to do it the other way around.)
The snippets above are only examples; you still need to adapt them to your program.
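One caveat about the split-and-join approach: re.escape also escapes ".", so a rule like H.X-9387 only matches the literal text H.X-9387, while the question wants "." to stand for any single character. A sketch that translates the rule character by character (wildcard_to_regex is a hypothetical name):

```python
import re


def wildcard_to_regex(rule):
    """Compile a rule where '*' = any run of characters and '.' = one character.

    Every other character is escaped, so it is matched literally.
    """
    parts = []
    for ch in rule:
        if ch == "*":
            parts.append(".*")
        elif ch == ".":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("^" + "".join(parts) + "$")


DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
rules = ["AAX*", "H.X-9387", "*58"]  # the example rules from the question
compiled = [wildcard_to_regex(r) for r in rules]
matches = [d for d in DesignatorList if any(rx.match(d) for rx in compiled)]
print(matches)
```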
Python's string object does have a startswith and an endswith method, which you could use here if you only had a small number of rules. The most general way to go with this, since you seem to have fairly simple patterns, is regular expressions. That way you can encode those rules as patterns.
import re
rules = ['^AAX.*$',      # starts with AAX
         '^H.*X-9387$',  # starts with H, ends with X-9387
         '^.*58$']       # ends with 58
for row in reader:  # reader is the csv.reader from the question; each row is a list
    if any(re.match(rule, row[0]) for rule in rules):
        print(row)
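If the rule file can use ? instead of . for a single character, the standard fnmatch module already implements shell-style wildcards, so no hand-written regex is needed. fnmatchcase is used here because fnmatch.fnmatch normalizes case on case-insensitive platforms such as Windows:

```python
from fnmatch import fnmatchcase

DesignatorList = ["AAX-435", "AAX-961", "HHX-9387", "HHX-58", "K-58", "K-14", "K-78524"]
# Shell-style patterns: '*' = any run of characters, '?' = exactly one character
patterns = ["AAX*", "H?X-9387", "*58"]

matches = [d for d in DesignatorList if any(fnmatchcase(d, p) for p in patterns)]
print(matches)
```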