Python Output not Aligned Using Space and Tab - python

My code is here:
days = int(input("How many days did you work? : "))
totalSalary = 0
print("Day", "\tDaily Salary", "\tTotal Salary")
for day in range(days):
    daily = 2**day
    totalSalary += daily
    print(day+1, "\t ", daily, "\t\t ", totalSalary)
When I enter 6 as input, here is the output:
Day Daily Salary Total Salary
1 1 1
2 2 3
3 4 7
4 8 15
5 16 31
6 32 63
Why are the last 2 lines not aligned?
Edit: I forgot to say that I know there are better solutions, like using format, but I just wanted to understand why there is a problem with tabs and spaces.
Edit2: The visualization of tabstops in Jason Yang's answer satisfied me.

For the statement
print(day+1, "\t ", daily, "\t\t ", totalSalary)
each '\t' advances to the next tab stop, and tab stops sit at columns 1, 9, 17, ..., that is, every 8 characters.
So the output looks like this:
1=------____=1=-........____=1
2=------____=2=-........____=3
3=------____=4=-........____=7
4=------____=8=-........____=15
5=------____=16=--------........=31
6=------____=32=--------........=63
12345678123456781234567812345678 <--- Tab stop before each 1
Here:
= is the separator space that print inserts between two arguments
- is the space generated by a TAB that is not the last one
_ is a space you specified in your print call
. is the space generated by the last TAB
This shows why the columns stop at different positions.
Try adding the option sep='' to your print call, or change the number of spaces you added:
print(day+1, "\t ", daily, "\t\t ", totalSalary, sep='')
Then it will be fine.
How many days did you work? : 6
Day Daily Salary Total Salary
1 1 1
2 2 3
3 4 7
4 8 15
5 16 31
6 32 63

When the 2nd column has double-digit values, the rest of the tabs and the 3rd column get shifted. For correctly aligned columns, pad the values to a fixed width, for example with zero padding.
If your Python version is 3.6+ and you want 2-digit values, you can call print(f'{n:02}'), which prints 01 when n is 1.
For Python 2.7+, you could use format like so: print('{:02d}'.format(n)).
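Zero padding is one option; a plain width in the format spec pads with spaces instead, which keeps the columns aligned without depending on tab stops. A minimal sketch, assuming column widths of 5 and 14 characters (arbitrary picks) and a hypothetical helper named salary_table:

```python
def salary_table(days):
    """Return the salary table as a list of fixed-width lines."""
    # Header: "Day" left-aligned in 5 columns, the others right-aligned in 14
    lines = [f"{'Day':<5}{'Daily Salary':>14}{'Total Salary':>14}"]
    total = 0
    for day in range(days):
        daily = 2 ** day
        total += daily
        # Same widths as the header, so every row has the same length
        lines.append(f"{day + 1:<5}{daily:>14}{total:>14}")
    return lines


for line in salary_table(6):
    print(line)
```

Because every field has a fixed width, a two-digit value such as 16 no longer pushes the next column to a different tab stop.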

A tab is not just a collection of white spaces.
Any characters before the tab fill up part of the space it would otherwise take.
>>> print("A\tZ")
A Z
>>> print("AB\tZ")
AB Z
>>> print("ABC\tZ")
ABC Z
and if there is no room left before the tab stop, the tab moves on to the next one
>>> print("ABCDEFGH\tZ")
ABCDEFGH Z
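The same behaviour can be made visible with str.expandtabs, which replaces each tab with the spaces needed to reach the next tab stop. A small sketch, assuming 8-column tab stops (the usual terminal default) and a hypothetical helper show_tabs that draws the padding as dots:

```python
def show_tabs(text, tabsize=8):
    """Expand tabs to spaces, then mark every space with a dot."""
    expanded = text.expandtabs(tabsize)
    return expanded.replace(" ", ".")


for sample in ["A\tZ", "AB\tZ", "ABC\tZ", "ABCDEFGH\tZ"]:
    print(show_tabs(sample))
# A.......Z
# AB......Z
# ABC.....Z
# ABCDEFGH........Z
```

In the first three lines Z always lands in column 9; in the last line the text already fills the first 8 columns, so the tab jumps to the following stop.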

I suppose your question is due to a misunderstanding of what a tab character is and how it behaves:
A tab character should advance to the next tab stop. Historically tab stops were every 8th character, although smaller values are in common use today and most editors can be configured. Source:
How many spaces for tab character(\t)?
Try this and see:
print('123456789')
print('1\t1')
print('12\t1')
print('123\t1')

I think that you added too many spaces in print(day+1, "\t ", daily, "\t\t ", totalSalary).
When you remove the spaces after the tabs, you will not get the "not aligned" problem.

Compute the spacing dynamically, based on the length of the largest number:
days = int(input("How many days did you work? : "))
totalSalary = 0
day_data = []
for day in range(days):
    daily = 2**day
    totalSalary += daily
    day_data.append([day+1,daily,totalSalary])
# width of the largest total, plus 2 spaces of padding
num_space = len(str(day_data[-1][-1]))+2
f_space_len, s_space_len = 5+num_space, 9+num_space
print(f"Day{num_space*' '}Daily Salary{num_space*' '}Total Salary")
for i in day_data:
    day, daily, totalSalary = map(str, i)
    print(day, (f_space_len-len(day)+1)*' ', daily,(s_space_len-len(daily)+1)*' ', totalSalary)

Related

Credit Card Number Sensor Challenge

So I am trying to write a program that censors the first 12 digits of a 'credit card number' and then prints it with the last 4 digits still there (4444444444444444=************4444). I am having a problem making the loop stop at just the first 12 digits. The output is always 12 *'s. Here is the code:
credit=""
while len(credit)<16 or len(credit)>16:
    credit=input("Please input credit card number: ")
x=1
k="*"
while x<12:
    for ele in credit:
        if ele.isdigit() and x<12:
            credit=credit.replace(ele, k)
            x=x+1
print(credit)
Without the re module you could do this.
First of all, ensure that the input is exactly 16 characters long and consists entirely of digits.
Then print 12 asterisks (a constant) followed by the last 4 characters of the input, which can be obtained with ordinary slicing.
while len(ccn := input('Please input credit card number: ')) != 16 or not ccn.isdigit():
    print('Must be 16 digits. Try again')
print('*'*12, ccn[-4:], sep='')
import re
print(re.sub(r'\d{12}', '*'*12, credit))
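The loop in the question misbehaves because str.replace swaps every occurrence of a digit at once, not just the one at the current position. A minimal sketch of the slicing approach wrapped in a function, so it can be checked without typing input; the name mask_card and its parameters are hypothetical:

```python
def mask_card(number, visible=4, mask="*"):
    """Mask all but the last `visible` characters of a 16-digit card number."""
    if len(number) != 16 or not number.isdigit():
        raise ValueError("expected exactly 16 digits")
    # Build the mask from the length, then append the untouched tail
    return mask * (len(number) - visible) + number[-visible:]


print(mask_card("4444444444444444"))  # ************4444
print(mask_card("1234567890123456"))  # ************3456
```

Slicing works on positions rather than on digit values, so repeated digits in the number do not affect the result.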

Use regular expression to extract numbers before specific words

Goal
Extract number before word hours, hour, day, or days
How to use | to match the words?
s = '2 Approximately 5.1 hours 100 ays 1 s'
re.findall(r"([\d.+-/]+)\s*[days|hours]", s) # note I do not know whether string s contains hours or days
returns
['5.1', '100', '1']
Since 100 and 1 are not before the exact word hours, they should not show up. Expected
5.1
How to extract the first number from the matched result
s1 = '2 Approximately 10.2 +/- 30hours'
re.findall(r"([\d. +-/]+)\s*hours|\s*hours", s1)
returns
['10.2 +/- 30']
Expected
10.2
Note that the special characters +/-. are optional. When . appears inside a number such as 1.3, the whole 1.3 needs to show up, including the dot. But for 1 +/- 0.5, only 1 needs to be extracted and none of the +/- part should be.
I know I could probably do a split and then take the first number
str(re.findall(r"([\d. +-/]+)\s*hours", s1)[0]).split(" ")[1]
Gives
'10.2'
But some of the results only return one number so a split will cause an error. Should I do this with another step or could this be done in one step?
Please note that these strings s1, s2 are the values in a dataframe. Therefore, iteration using function like apply and lambda will be needed.
In fact, I would use re.findall here:
import re
units = ["hours", "hour", "days", "day"] # the order matters here: put plurals first
regex = r'(?:' + '|'.join(units) + r')'
s = '2 Approximately 5.1 hours 100 ays 1 s'
values = re.findall(r'\b(\d+(?:\.\d+)?)\s+' + regex, s)
print(values) # prints ['5.1']
If you want to also capture the units being used, then make the units alternation capturing, i.e. use:
regex = r'(' + '|'.join(units) + r')'
Then the output would be:
[('5.1', 'hours')]
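To see why the pattern in the question also returned 100 and 1: [days|hours] is a character class, so it matches a single character out of that set rather than one of the two words. A short comparison sketch, using a simplified number pattern of my own choosing:

```python
import re

s = '2 Approximately 5.1 hours 100 ays 1 s'

# [days|hours] is a character class: it matches ONE character out of
# d, a, y, s, |, h, o, u, r, so "ays" and "s" qualify as well.
char_class = re.findall(r"(\d+(?:\.\d+)?)\s*[days|hours]", s)

# (?:days|day|hours|hour) is a non-capturing group with alternation:
# it matches one of the whole words.
alternation = re.findall(r"(\d+(?:\.\d+)?)\s*(?:days|day|hours|hour)", s)

print(char_class)   # ['5.1', '100', '1']
print(alternation)  # ['5.1']
```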
Code
import re
units = '|'.join(["hours", "hour", "hrs", "days", "day", "minutes", "minute", "min"]) # possible units
number = r'\d+[.,]?\d*' # pattern for number
plus_minus = r'\+\/\-' # plus minus (not used in the pattern below)
cases = fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'
pattern = re.compile(cases)
Tests
print(pattern.findall('2 Approximately 5.1 hours 100 ays 1 s'))
# Output: ['5.1']
print(pattern.findall('2 Approximately 10.2 +/- 30hours'))
# Output: ['10.2']
print(pattern.findall('The mean half-life for Cetuximab is 114 hours (range 75-188 hours).'))
# Output: ['114', '75']
print(pattern.findall('102 +/- 30 hours in individuals with rheumatoid arthritis and 68 hours in healthy adults.'))
# Output: ['102', '68']
print(pattern.findall("102 +/- 30 hrs"))
# Output: ['102']
print(pattern.findall("102-130 hrs"))
# Output: ['102']
print(pattern.findall("102hrs"))
# Output: ['102']
print(pattern.findall("102 hours"))
# Output: ['102']
Explanation
The above relies on the fact that raw strings (r'...') and f-string interpolation (f'...') can be combined as:
fr'...'
per PEP 498.
The cases string:
fr'({number})(?:[\s\d\-\+\/]*)(?:{units})'
Its parts, in sequence:
fr'({number})' - capturing group '(\d+[.,]?\d*)' for integers or floats
r'(?:[\s\d\-\+\/]*)' - non-capturing group for the characters allowed between number and units (i.e. space, +, -, digit, /)
fr'(?:{units})' - non-capturing group for units
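Since the question mentions that the strings live in a dataframe column and will be processed with apply, here is a hedged sketch of a per-string helper that returns only the first number, or None when nothing matches. The name first_number_before_unit and the shortened unit list are assumptions for illustration:

```python
import re

UNITS = "hours|hour|hrs|days|day"
# number, then any run of spaces/digits/+/-/ characters, then a unit
PATTERN = re.compile(rf"(\d+(?:[.,]\d+)?)[\s\d\-+/]*(?:{UNITS})")


def first_number_before_unit(text):
    """Return the first number that precedes a unit, or None if absent."""
    match = PATTERN.search(text)
    return match.group(1) if match else None


print(first_number_before_unit('2 Approximately 10.2 +/- 30hours'))  # 10.2
print(first_number_before_unit('102-130 hrs'))                        # 102
print(first_number_before_unit('no duration here'))                   # None
```

With pandas, such a function could be passed to Series.apply on the relevant column; returning None rather than indexing into a list avoids the error the question describes for rows with no match.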

Returning values from a function in Python3 with correct formatting

Working on a coding problem that I think I have licked, and I'm getting the correct values. However, the test conditions expect the values formatted in a different way, and I'm failing to figure out how to format the return value of my function correctly.
I get the values correctly when looking for the answer:
[4, 15, 7, 19, 1, 20, 13, 9, 4, 14, 9, 7, 8, 20]
but the test condition expects
should equal '20 8 5 14 1 18 23 8 1 12 2 1 3 15 14 19 1 20 13 9 4 14 9 7 8 20'
and for the life of me I haven't been able to figure this out yet.
View the original problem here: https://www.codewars.com/kata/546f922b54af40e1e90001da/train/python
Still very new to Python, but tackling these problems best I can. Code may be ugly, but it's mine =D
EDIT: I am looking for a way to reformat my return statement as a string instead of a list of integers.
Thanks for the help in advance! Any help is appreciated, even how to post better questions here.
Koruptedkernel.
import string
def alphabet_position(positions):
    #Declaring final position list.
    position = []
    #Stripping punctuation from the passed string.
    out1 = positions.translate(str.maketrans("","", string.punctuation))
    #Stripping digits from the passed string.
    out = out1.translate(str.maketrans("","", string.digits))
    #Removing Spaces from the passed string.
    outter = out.replace(" ","")
    #reducing to lowercase.
    mod_text = str.lower(outter)
    #For loop to iterate through alphabet and report index location to position list.
    for letter in mod_text:
        #Declare list of letters (lower) in the alphabet (US).
        alphabet = list('abcdefghijklmnopqrstuvwxyz')
        position.append(alphabet.index(letter) + 1)
    return(position)
#Call the function with text.
alphabet_position("The sunset sets at twelve o'clock.")
Here's your homework with some explanations (uses a comprehension, which are the BEST):
# full alphabet (a string, which can be iterated like a list)
alphabet = 'abcdefghijklmnopqrstuvwxyz'
# map each letter to its position (index + 1), stored as a string
alphabet_dict = {u: str(i + 1) for i, u in enumerate(alphabet)}
def alphabet_position(text):
    # map the text chars to their corresponding alphabet_dict item
    pos_list = [alphabet_dict.get(char.lower())
                for char in text if char.lower() in alphabet_dict]
    # convert to the insane string format they asked for
    return ' '.join(pos_list)
alphabet_position("The sunset sets at twelve o' clock.")
Output:
'20 8 5 19 21 14 19 5 20 19 5 20 19 1 20 20 23 5 12 22 5 15 3 12 15 3 11'
Had some great homework with dictionaries and tried and successfully did it THAT way.
Went back to the original way I did it (not as clean or efficient), but managed to get the list successfully converted by changing my for loop to
position.append(str(alphabet.index(letter) + 1))
and then used
return ' '.join(position)
to get the answer in the format the tests were looking for.
YES! I SOLVED A THING!
Thanks for all the help, I will try and get better at formulating questions to get better help.
TYTY
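Putting the two changes described above together (appending str values and joining them) gives something like the following. This is a condensed sketch rather than the poster's exact code: it assumes string.ascii_lowercase as the alphabet and filters characters with a membership test instead of translate:

```python
import string


def alphabet_position(text):
    """Return the 1-based alphabet positions of the letters, space-separated."""
    alphabet = string.ascii_lowercase
    position = []
    for letter in text.lower():
        if letter in alphabet:  # skips digits, spaces and punctuation
            # store strings, because str.join only accepts strings
            position.append(str(alphabet.index(letter) + 1))
    return " ".join(position)


print(alphabet_position("The sunset sets at twelve o'clock."))
# 20 8 5 19 21 14 19 5 20 19 5 20 19 1 20 20 23 5 12 22 5 15 3 12 15 3 11
```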

Removing rows from a DataFrame based on words in a string

Novice programmer here seeking help.
I have a Dataframe that looks like this:
Current
0 "Invest in $APPL, $FB and $AMZN"
1 "Long $AAPL, Short $AMZN"
2 "$AAPL earnings announcement soon"
3 "$FB is releasing a new product. Will $FB's product be good?"
4 "$Fb doing good today"
5 "$AMZN high today. Will $amzn continue like this?"
I also have a list with all the cashtags: cashtags = ["$AAPL", "$FB", "$AMZN"]
Basically, I want to go through all the lines in this column of the DataFrame and keep the rows with a unique cashtag, regardless if it is in caps or not, and delete all others.
Desired Output:
Desired
2 "$AAPL earnings announcement soon"
3 "$FB is releasing a new product. Will $FB's product be good?"
4 "$Fb doing good today"
5 "$AMZN high today. Will $amzn continue like this?"
I've tried to basically count how many times the word appears in the string and add that value to a new column so that I can delete the rows based on the number.
for i in range(0,len(df)-1):
    print(i, end = "\r")
    tweet = df["Current"][i]
    count = 0
    for word in cashtags:
        count += str(tweet).count(word)
    df["Word_count"][i] = count
However if I do this I will be deleting rows that I don't want to. For example, rows where the unique cashtag is mentioned several times ([3],[5])
How can I achieve my desired output?
Rather than summing the count of each cashtag, you should sum its presence or absence, since you don't care how many times each cashtag occurs, only how many cashtags.
for tag in cashtags:
    count += tag in tweet
Or more succinctly: sum(tag in tweet for tag in cashtags)
To make the comparison case insensitive, you can upper case the tweets beforehand. Additionally, it would be more idiomatic to filter on a temporary series and avoid explicitly looping over the dataframe (though you may need to read up more about Pandas to understand how this works):
df[df.Current.apply(lambda tweet: sum(tag in tweet.upper() for tag in cashtags)) == 1]
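The presence-counting idea can be checked without pandas on the sample rows from the question. A minimal sketch; the helper name distinct_tag_count is hypothetical, and the tweets list simply copies the question's data:

```python
cashtags = ["$AAPL", "$FB", "$AMZN"]

tweets = [
    "Invest in $APPL, $FB and $AMZN",
    "Long $AAPL, Short $AMZN",
    "$AAPL earnings announcement soon",
    "$FB is releasing a new product. Will $FB's product be good?",
    "$Fb doing good today",
    "$AMZN high today. Will $amzn continue like this?",
]


def distinct_tag_count(tweet, tags=cashtags):
    """Count how many different tags occur in the tweet, ignoring case."""
    upper = tweet.upper()
    # `tag in upper` is True or False; summing booleans counts the True values
    return sum(tag in upper for tag in tags)


kept = [tweet for tweet in tweets if distinct_tag_count(tweet) == 1]
print([distinct_tag_count(tweet) for tweet in tweets])  # [2, 2, 1, 1, 1, 1]
```

Note that this is a substring test, so a tag such as $FB would also be found inside a longer tag that starts with the same letters.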
If you ever want to generalise your question to any tag, then this is a good place for a regular expression.
You want to match against (\$\w+)(?!.*\1) (see e.g. here for a detailed explanation), but the general structure is:
\$\w+: find a dollar sign followed by one or more letters/numbers (or
an _). If you just wanted to count how many tags you had, this is all you need,
e.g.
df.Current.str.count(r'\$\w+')
will print
0 3
1 2
2 1
3 2
4 1
5 2
but filtering on this count would remove rows where the same tag appears more than once, so you need to add a negative lookahead:
(?!.*\1): a negative lookahead, meaning don't match if the same text appears again later on. As a result, only the last occurrence of each tag in the string is counted.
Using this, you can then use the pandas Series.str methods, specifically Series.str.count (the re.I flag makes the match case insensitive):
import re
df[df.Current.str.count(r'(\$\w+)(?!.*\1)', re.I) == 1]
which will give you your desired output
Current
2 $AAPL earnings announcement soon
3 $FB is releasing a new product. Will $FB's pro...
4 $Fb doing good today
5 $AMZN high today. Will $amzn continue like this?
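The effect of the negative lookahead can be inspected on single strings with the re module alone. A small sketch using the same pattern; the helper name last_occurrences is made up for this example:

```python
import re

# A tag, not followed later in the string by the same tag (case-insensitive)
UNIQUE_TAG = re.compile(r'(\$\w+)(?!.*\1)', re.I)


def last_occurrences(text):
    """Return each tag once, taken from its last occurrence in the text."""
    return UNIQUE_TAG.findall(text)


print(last_occurrences("Long $AAPL, Short $AMZN"))
# ['$AAPL', '$AMZN']
print(last_occurrences("$FB is releasing a new product. Will $FB's product be good?"))
# ['$FB']
print(last_occurrences("$AMZN high today. Will $amzn continue like this?"))
# ['$amzn']
```

A row is kept when this list has exactly one element, which is what the == 1 comparison on str.count expresses.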

Manipulate time-range in a pandas Dataframe

Need to clean up a csv import, which gives me a range of times (in string form). Code is at bottom; I currently use regular expressions and replace() on the df to convert other chars. Just not sure how to:
select the current 24 hour format numbers and add :00
how to select the 12 hour format numbers and make them 24 hour.
Input (from csv import):
break_notes
0 15-18
1 18.30-19.00
2 4PM-5PM
3 3-4
4 4-4.10PM
5 15 - 17
6 11 - 13
So far I have got it to look like (remove spaces, AM/PM, replace dot with colon):
break_notes
0 15-18
1 18:30-19:00
2 4-5
3 3-4
4 4-4:10
5 15-17
6 11-13
However, I would like it to look like this ('HH:MM-HH:MM' format):
break_notes
0 15:00-18:00
1 18:30-19:00
2 16:00-17:00
3 15:00-16:00
4 16:00-16:10
5 15:00-17:00
6 11:00-13:00
My code is:
data = pd.read_csv('test.csv')
data.break_notes = data.break_notes.str.replace(r'([P].|[ ])', '').str.strip()
data.break_notes = data.break_notes.str.replace(r'([.])', ':').str.strip()
Here is a converter function based on your sample input data. convert_entry takes a complete entry, splits it on the dash, and passes each part to convert_single, since the two halves of an entry can be converted individually. After converting both, it joins them with a dash again.
convert_single uses a regex to find the important parts of the time string.
It starts with some digits (\d+) representing the hours, then optionally a dot or colon followed by more digits ([.:]?(\d+)?) representing the minutes, and after that optionally AM or PM ((AM|PM)?); only PM is relevant in this case.
import re
def convert_single(s):
    # group 1: hours, group 2: optional minutes, group 3: optional AM/PM
    m = re.search(pattern=r"(\d+)[.:]?(\d+)?(AM|PM)?", string=s)
    hours = m.group(1)
    minutes = m.group(2) or "00"
    if m.group(3) == "PM":
        hours = str(int(hours) + 12)
    return hours.zfill(2) + ":" + minutes.zfill(2)
def convert_entry(value):
    start, end = value.split("-")
    start = convert_single(start)
    end = convert_single(end)
    return "-".join((start, end))
values = ["15-18", "18.30-19.00", "4PM-5PM", "3-4", "4-4.10PM", "15 - 17", "11 - 13"]
for value in values:
    cvalue = convert_entry(value)
    print(cvalue)
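The desired output in the question also turns unmarked small hours such as 3-4 into 15:00-16:00, which needs an extra rule. The sketch below is one possible variant under a loudly stated assumption: any hour below a cutoff (here 8, an arbitrary guess) without an AM/PM marker is treated as afternoon. The names to_24h, convert_range and pm_cutoff are hypothetical.

```python
import re

# hours, optional .MM or :MM, optional AM/PM marker
TIME_RE = re.compile(r"(\d{1,2})(?:[.:](\d{2}))?\s*(AM|PM)?", re.I)


def to_24h(part, pm_cutoff=8):
    """Convert one side of a range, e.g. '4.10PM', to 'HH:MM'."""
    m = TIME_RE.search(part)
    if m is None:
        raise ValueError(f"no time found in {part!r}")
    hour = int(m.group(1))
    minutes = m.group(2) or "00"
    marker = (m.group(3) or "").upper()
    if marker == "PM" and hour < 12:
        hour += 12
    elif marker == "AM" and hour == 12:
        hour = 0
    elif not marker and hour < pm_cutoff:
        # ASSUMPTION: an unmarked small hour means afternoon
        hour += 12
    return f"{hour:02d}:{minutes}"


def convert_range(value):
    """Convert 'start-end' into 'HH:MM-HH:MM'."""
    start, end = value.split("-")
    return f"{to_24h(start)}-{to_24h(end)}"


for value in ["15-18", "18.30-19.00", "4PM-5PM", "3-4", "4-4.10PM", "15 - 17", "11 - 13"]:
    print(convert_range(value))
```

Whether a cutoff like this is appropriate depends on the data; if breaks can happen in the early morning, the rule would misclassify them and the marker would have to be required instead.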
