Add leading 0 to column of data in data frame - python

The existing data in my data frame has lost its leading zeros, which are a required part of the part number. Each stored number needs to be 8 digits long, but the lengths now vary because the leading 0s were removed.
Sorry, I am new to Python; this may be a built-in function in pandas, but I have not found a way to do this kind of formatting.
I have over 2000 part numbers to convert overall.
E.g.:
Part No
9069
38661
90705
9070
907
970206
Part number needs to be:
Part No
00009069
00038661
00090705
00009070
00000907
00970206

Use astype(str) before zfill, and assign the result back to the column:
df['Part No'] = df['Part No'].astype(str).str.zfill(8)
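As a minimal sketch of the approach above, assuming the column is named 'Part No' as in the question and was read in as integers (the sample frame below is made up for illustration):

```python
import pandas as pd

# Hypothetical sample data mirroring the part numbers in the question.
df = pd.DataFrame({"Part No": [9069, 38661, 90705, 9070, 907, 970206]})

# Convert to strings first, then left-pad with zeros to 8 characters.
df["Part No"] = df["Part No"].astype(str).str.zfill(8)

print(df["Part No"].tolist())
```

If the data comes from a CSV file, reading the column as text in the first place (for example pd.read_csv(..., dtype={"Part No": str})) should keep the leading zeros, so no padding is needed afterwards.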

Related

Integer out of range. when I use to_numeric() on string in Python

I have a column containing numbers in string format. I want to convert them to numbers, so I use to_numeric() on that column, but I then get an "Integer out of range" error.
Here is a number straight from my dataset:
x: 3934057714693296797966
df_test['Number_converted']=pd.to_numeric(df_test['Number_string'])
The largest number I have in my population is 28 digits. Any ideas how I can deal with this error?
Looks like the thing I am using cannot handle such big numbers
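One possible workaround, sketched here under the assumption that exact integer values are needed: Python's built-in int has arbitrary precision, so values beyond the 64-bit range can be parsed with int() instead of being forced into a fixed-width integer type. The name number_strings and the sample values other than the one quoted above are hypothetical.

```python
# Hypothetical sample; the first value is the one quoted in the question.
number_strings = ["3934057714693296797966", "42", "1234567890123456789012345678"]

# Python ints are arbitrary precision, so 22- or 28-digit values parse exactly.
numbers = [int(s) for s in number_strings]

# Largest value a signed 64-bit integer can hold.
INT64_MAX = 2**63 - 1
too_big_for_int64 = [n for n in numbers if n > INT64_MAX]

print(numbers[0] + 1)          # exact arithmetic still works
print(len(too_big_for_int64))  # how many values exceed the int64 range
```

In pandas, such values would presumably have to live in an object-dtype column (for example via df_test['Number_string'].apply(int)), which is slower than a native integer column. Converting to float instead keeps the magnitude but not all the digits.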

combining real and imag columns in dataframe into complex number to obtain magnitude using np.abs

I have a data frame that has complex numbers split into a real and an imaginary column. I want to add a column (2, actually, one for each channel) to the dataframe that computes the log magnitude:
     ch1_real  ch1_imag  ch2_real  ch2_imag  ch1_phase  ch2_phase  distance
79   0.011960 -0.003418  0.005127 -0.019530     -15.95    -75.290       0.0
78  -0.009766 -0.005371 -0.015870  0.010010    -151.20    147.800       1.0
343  0.002197  0.010990  0.003662 -0.013180      78.69    -74.480       2.0
80  -0.002686  0.010740  0.011960  0.013430     104.00     48.300       3.0
341 -0.007080  0.009033  0.016600 -0.000977     128.10     -3.366       4.0
If I try this:
df['ch1_log_mag']=20*np.log10(np.abs(complex(df.ch1_real,df.ch1_imag)))
I get the error "TypeError: cannot convert the series to <class 'float'>", because I think the built-in complex cannot work on an array.
So I then experimented with using loc to pick out the first element of ch1_real, for example, to work out how to use it to accomplish what I'm trying to do, but I couldn't figure out how to do it:
df.loc[0,df['ch1_real']]
This produces a KeyError.
Brute forcing it works,
df['ch1_log_mag'] = 20 * np.log10(np.sqrt(df.ch1_real**2+ df.ch1_imag**2))
but, I believe it is more legible to use np.abs to get the magnitude, plus I'm more interested in understanding how dataframes and indexing dataframes work and why what I initially attempted does not work.
btw, what is the difference between df.ch1_real and df['ch1_real'] ? When do I use one vs. the other?
Edit: more attempts at solution
I tried using apply, since my understanding is that it "applies" the function passed to it to each row (by default):
df.apply(complex(df['ch1_real'], df['ch1_imag']))
but this generates the same TypeError, since I think the issue is that complex cannot work on Series. Perhaps if I cast the series to float?
After reading this post, I tried using pd.to_numeric to convert a series to type float:
dfUnique.apply(complex(pd.to_numeric(dfUnique['ch1_real'],errors='coerce'), pd.to_numeric(dfUnique['ch1_imag'],errors='coerce')))
to no avail.
You can do simple multiplication with 1j which denotes the complex number 0+1j, see imaginary literals:
df['ch1_log_mag'] = 20 * np.log10((df.ch1_real + 1j * df.ch1_imag).abs())
complex(df.ch1_real, df.ch1_imag) doesn't work because the built-in complex needs scalar arguments, not whole Series. df.loc[0,df['ch1_real']] is not a valid expression: the second argument must be a column label such as a string, not a Series, and the first must be an existing row label (df.loc[79,'ch1_real'] would work for accessing an element).
If you want to use apply, it should be 20 * np.log10(df.apply(lambda x: complex(x.ch1_real, x.ch1_imag), 1).abs()), but as apply is just a disguised loop over the rows of the dataframe, it's not recommended performance-wise.
There's no difference between df.ch1_real and df['ch1_real'] when reading an existing column; it's a matter of personal preference. If your column name contains spaces or dots or the like, or clashes with an existing DataFrame attribute or method name, you must use the latter form, however.
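The points above can be put together in a short sketch. The two rows are copied from the question's table; everything else (the variable names ch1 and brute) is illustrative only.

```python
import numpy as np
import pandas as pd

# Two rows copied from the question's table, keeping the original index labels.
df = pd.DataFrame(
    {"ch1_real": [0.011960, -0.009766], "ch1_imag": [-0.003418, -0.005371]},
    index=[79, 78],
)

# Vectorized: build a complex Series, then take its magnitude with np.abs.
ch1 = df["ch1_real"] + 1j * df["ch1_imag"]
df["ch1_log_mag"] = 20 * np.log10(np.abs(ch1))

# Same result as the explicit square-root form from the question.
brute = 20 * np.log10(np.sqrt(df["ch1_real"] ** 2 + df["ch1_imag"] ** 2))
print(np.allclose(df["ch1_log_mag"], brute))

# Scalar access: .loc takes a row label and a column label.
print(df.loc[79, "ch1_real"])
```

Note that the new column is created with bracket syntax; attribute-style assignment (df.ch1_log_mag = ...) does not create a column.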

0's in the beginning are being skipped & I'm not sure how to fix it

The output value is not including the 0's in the beginning, can someone help me fix the problem?
def bitwiseOR(P, Q):
    return bin(P | Q)
bitwiseOR(0b01010111, 0b00111000)
OUTPUT: '0b1111111'
The leading zeroes are just for representation, so you can utilize Format Specification Mini-Language to display them as you wish:
Format string:
#           Includes the 0b prefix
0{length}   Pads with leading zeroes so the total length (prefix included) is length
b           Formats the number in binary
def bitwiseOR(P, Q, length=10):
    return format(P | Q, f'#0{length}b')

x = bitwiseOR(0b01010111, 0b00111000)
print(x)  # 0b01111111
Leading zeros are a property of the string you produce, not the number. So, for example, if you're looking for a way to make the following two calls produce different results, that's not possible [1]:
bitwiseOR(0b01010111, 0b00111000)
bitwiseOR( 0b1010111, 0b111000)
However, if you can provide the number of digits separately, then you can do this using the format() function. It accepts a second argument which lets you customize how the number is printed out using the format spec. Based on that spec, you can print a number padded with zeros to a given width like this:
>>> format(127, '#010b')
'0b01111111'
Here the code consists of four pieces:
# means apply the 0b prefix at the beginning
0 means pad with leading zeros
10 means the total length of the resulting string should be at least 10 characters
b means to print the number in binary
You can tweak the format code to produce your desired string length, or even take the length from a variable.
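As a small sketch of taking the length from a variable: the helper name bitwise_or_padded and its digits parameter are hypothetical, chosen here so that the caller specifies the number of binary digits rather than the total string length.

```python
def bitwise_or_padded(p, q, digits=8):
    """Return p | q as a binary string zero-padded to `digits` binary digits."""
    # The width in the format spec counts the '0b' prefix, hence digits + 2.
    return format(p | q, f"#0{digits + 2}b")

print(bitwise_or_padded(0b01010111, 0b00111000))          # 8 binary digits
print(bitwise_or_padded(0b1010111, 0b111000, digits=12))  # 12 binary digits
```

Because the width is a minimum, a result that needs more digits than requested is printed in full rather than truncated.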
[1] Well... technically there is a way to make Python re-read its own source code and possibly produce different results that way, but that's not useful in any real program; it's only useful if you want to learn something about how the Python interpreter works.

Historical database number formatting

Currently I am working with a historical database (in MS Access) containing passages of ships through the Sound (the strait between Denmark and Sweden).
I am having problems with the way amounts of products on board of ships were recorded. This generally takes the following forms:
12 1/15 (integer - space - fraction)
1/4 (fraction)
1 (integer)
I'd like to convert all these numbers to floats/decimal, in order to do some calculations. There are some additional challenges which are mainly caused by the lack of uniform input:
- not all rows have a value
- some rows have the value '-'; I'd like to skip these
- some rows contain '*' when a number or a part of a number is missing; these can be skipped too
My first question is: Is there a way I could directly convert this in Access SQL? I have not been able to find anything but perhaps I overlooked something.
The second option I attempted is to export the table (called cargo), use python to convert the value and then output it and import the table again. I have a function to convert the standard three formats:
from fractions import Fraction
import pandas
import numpy
def fracToString(number):
    conversionResult = float(sum(Fraction(s) for s in number.split()))
    return conversionResult
df = pandas.read_csv('cargo.csv', usecols = [0,5], header = None, names = ['id_passage', 'amount'])
df['amountDecimal'] = df['amount'].dropna().apply(fracToString)
This works for empty rows; however, values containing '*' or '-' or other characters that the fracToString function can't handle raise a ValueError. Since these are just a couple of records out of over 4 million, they can be omitted. Is there a way to tell pandas.apply() to just skip to the next row if the fracToString function throws a ValueError?
Thank you in advance,
Alex
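One way this could be approached, sketched under the assumption that unparsable values may simply become NaN: catch the exception inside the conversion function, so that apply() never sees it. The name frac_to_float is hypothetical.

```python
import math
from fractions import Fraction

def frac_to_float(text):
    """Convert '12 1/15', '1/4' or '1' to a float; return NaN if unparsable."""
    try:
        parts = text.split()
        if not parts:
            return math.nan  # empty or whitespace-only cell
        return float(sum(Fraction(p) for p in parts))
    except (ValueError, ZeroDivisionError, AttributeError):
        # ValueError: '-', '*' and other junk; ZeroDivisionError: e.g. '1/0';
        # AttributeError: non-string input such as a float NaN.
        return math.nan

for sample in ["12 1/15", "1/4", "1", "-", "1*", ""]:
    print(repr(sample), frac_to_float(sample))
```

With the frame from the question, this would presumably be used as df['amountDecimal'] = df['amount'].apply(frac_to_float), and the NaN rows could then be dropped or ignored in later calculations.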

randint with leading zero's

I want to generate numbers from 00000 to 99999.
with
number=randint(0,99999)
I only get values without leading zeros, of course: a 23 instead of a 00023.
Is there a trick to generate always 5 digit-values in the sense of %05d or do I really need to play a python-string-trick to fill the missing 0s at front in case len() < 5?
Thanks for reading and helping,
B
You will have to do a Python string trick, since an integer, per se, does not have leading zeroes:
number="%05d" % randint(0,99999)
The numbers generated by randint are integers. Integers are integers and will be printed without leading zeroes.
If you want a string representation, which can have leading zeroes, try:
str(randint(0, 99999)).rjust(5, "0")
Alternatively, str(randint(0, 99999)).zfill(5), which provides slightly better performance than string formatting (20%) and str.rjust (1%).
randint generates integers. Those are simple numbers without any inherent visual representation. The leading zeros would only be visible if you create strings from those numbers (and thus another representation).
Thus, you have to use a string function to get leading zeros (and you have to deal with those strings later on); for example, you cannot do calculations with them directly afterwards. To create these strings, you can do something like
number = "%05d" % random.randint(0,99999)
The gist of all that is that an integer is not the same as a string, even if they look similar.
>>> '12345' == 12345
False
For Python, you're generating a bunch of numbers; only when you print or display one is it converted to a string, and only then can it have padding.
You can also store your number as a formatted string:
number="%05d" % random.randint(0,99999)
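The variants mentioned in the answers above can be compared side by side. This is only a sketch; the variable names are illustrative.

```python
import random

n = random.randint(0, 99999)   # plain integer, usable in arithmetic

as_percent = "%05d" % n        # printf-style formatting
as_fstring = f"{n:05d}"        # format spec in an f-string
as_zfill = str(n).zfill(5)     # string method

# All three produce the same 5-character string.
print(n, as_percent, as_fstring, as_zfill)
```

Keeping n as an integer and formatting only at output time avoids having to convert back with int() whenever a calculation is needed.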
