I need help with some code.
I wrote code to get the last 2 rows from a CSV file and then save them to another file.
The code looks like this:
import csv
import pandas as pd

with open(outputFileName, "r") as f:
    reader = csv.reader(f, delimiter=",")
    data = list(reader)
    row_count = len(data)
# Skip everything except the last two rows
df = pd.read_csv(outputFileName, skiprows=row_count - 2)
df.to_csv(r'D:\koreguoti.csv', index=False)
The data in the file now looks like this (but without the names Column1 and Column2; I just want to show that the information is in different columns):
Column1 | Column2
2021.03.17 12:00:00 P+ 0 | 644.0
0 2021.03.17 12:00:00 P- 0 | 6735.0
So I need to have it in this format (with names of columns):
Date | Time | P | Value
0 2021.03.17 | 12:00:00 | P+| 644.0
1 2021.03.17 | 12:00:00 | P-| 6735.0
Could anybody help me?
I'd do text = csv_file.split() (where csv_file holds the file's text), keep only what I want, and then reorganise it.
For example, if you have:
Column 1 Column 2
12 15
then text = csv_file.split() gives you text = ["Column", "1", "Column", "2", "12", "15"].
You just take the last two and do
print("Month : Day :\n\
{} {}".format(text[4], text[5]))
and that's it.
Of course, you will need to adapt this until it works for you.
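To make the split-and-reorganise idea concrete for the data in the question, here is a minimal sketch using only the standard library. It assumes each row has two comma-separated fields and that the first field always holds four space-separated parts; the in-memory sample and the name _zero are just for illustration.

```python
import csv
import io

# In-memory stand-in for the two-row CSV file from the question
sample = io.StringIO(
    "2021.03.17 12:00:00 P+ 0,644.0\n"
    "2021.03.17 12:00:00 P- 0,6735.0\n"
)

rows = []
for first, value in csv.reader(sample):
    # Split the first field on whitespace and drop the unused zero
    date, time, p, _zero = first.split()
    rows.append([date, time, p, value])

# Write the reorganised rows with the wanted header
out = io.StringIO()
writer = csv.writer(out)
writer.writerow(["Date", "Time", "P", "Value"])
writer.writerows(rows)
print(out.getvalue())
```

With a real file you would pass open file objects to csv.reader and csv.writer instead of the StringIO buffers.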
Solved by working around it:
df['0'] = ['no']
df['1'] = ['no']
df['2'] = ['no']
df.to_csv(r'D:\koreguoti1.csv', index=False)
#---------------------------------------------------------------------------
#Rename column names
df = pd.read_csv(r'D:\koreguoti1.csv', header=None)
df.rename(columns={0: 'Data', 1: 'Laikas', 2: 'P', 3: 'Nulis', 4: 'Verte'}, inplace=True)
# Copy values from one column to another
df['Verte'] = df['Laikas']
# Split first columns to 4 columns
split_data = df["Data"].str.split(" ")
data = split_data.to_list()
names = ["Data", "Laikas", "P", "Nulis"]
new_df = pd.DataFrame(data, columns=names)
new_df.insert(4, "Verte", 0)
# adding needed column
new_df['Verte'] = df['Laikas']
# Deleting not needed column "Nulis"
del new_df['Nulis']
#print(new_df)
# Save everything to new file
new_df.to_csv(r'D:\sutvarkyti.csv', index=False)
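A shorter route to the same result might be tail(2) plus str.split(..., expand=True), which avoids the intermediate file. This is only a sketch: it assumes the source file has no header row, the first sample row is made up to show that only the last two rows survive, and the column names raw and Zero are hypothetical.

```python
import io
import pandas as pd

# In-memory stand-in for the source CSV; the first row is invented
raw = io.StringIO(
    "2021.03.17 11:00:00 P+ 0,640.0\n"
    "2021.03.17 12:00:00 P+ 0,644.0\n"
    "2021.03.17 12:00:00 P- 0,6735.0\n"
)

# Keep only the last two rows
df = pd.read_csv(raw, header=None, names=["raw", "Value"])
df = df.tail(2).reset_index(drop=True)

# Split the first column into four columns in one step
parts = df["raw"].str.split(" ", expand=True)
parts.columns = ["Date", "Time", "P", "Zero"]

# Drop the unused zero and attach the value column
result = parts[["Date", "Time", "P"]].assign(Value=df["Value"])
print(result)
```

Calling result.to_csv(path, index=False) would then write the four named columns.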
I'm trying to modify the Address column data by removing all the characters before the comma.
Sample data:
**ADDRESS**
0 Ksfc Layout,Bangalore
1 Vishweshwara Nagar,Mysore
2 Jigani,Bangalore
3 Sector-1 Vaishali,Ghaziabad
4 New Town,Kolkata
Expected Output:
**ADDRESS**
0 Bangalore
1 Mysore
2 Bangalore
3 Ghaziabad
4 Kolkata
I tried this code, but it's not working. Can someone correct it?
import pandas as pd
import regex as re
data = pd.read_csv("train.csv")
data.ADDRESS.replace(re.sub(r'.*,',"", data.ADDRESS), regex=True, inplace=True)
Try this:
data.ADDRESS = data.ADDRESS.str.split(',').str[-1]
You can do it without a regex:
def removeFirst(x):
    # Keep only the text after the last comma
    return x.split(",")[-1]

df['ADDRESS'] = df['ADDRESS'].apply(removeFirst)
You can also do it like this, without a regex:
data['ADDRESS'] = data['ADDRESS'].str.split(',').str[-1]
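Use Series.str.replace with regex=True (recent pandas versions treat the pattern as a literal string by default):
data['ADDRESS'] = data['ADDRESS'].str.replace(r'.*,', '', regex=True)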
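For comparison, here is a small self-contained sketch that runs the split and regex approaches on the sample data from the question. The rsplit variant is an extra option not mentioned in the answers; the variable names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({
    "ADDRESS": [
        "Ksfc Layout,Bangalore",
        "Vishweshwara Nagar,Mysore",
        "Jigani,Bangalore",
        "Sector-1 Vaishali,Ghaziabad",
        "New Town,Kolkata",
    ]
})

# Split on every comma and keep the last piece
by_split = data["ADDRESS"].str.split(",").str[-1]

# Greedy regex: remove everything up to and including the last comma
by_regex = data["ADDRESS"].str.replace(r".*,", "", regex=True)

# Split once from the right; avoids splitting the whole string
by_rsplit = data["ADDRESS"].str.rsplit(",", n=1).str[-1]

print(by_split.tolist())
```

All three give the same result on this data. They also agree when an address contains more than one comma, because each keeps only the text after the last one.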
I am trying to re-arrange a file to match a BACS bank format. In order for it to work the columns in the csv below need to be of a specific length. I have figured out the abcdabcd column as it's a repeating pattern (as are a couple more in the file), but several columns have random numbers that I cannot easily target.
Is there a way for me to target either (ideally) a specific column based on its header, or alternatively target everything up to a comma to butcher something that could work?
In my example file below, you'll see three columns where the value changes. If targeting everything up to a specific character is the solution, I was thinking of using .ljust to fill the column up to the specified length (and then sorting it out manually in Excel).
Original File
a,b,c,d,e,f,g,h,i,j,k
12345,1234567,0,11,123456,12345678,1234567,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,12345678,abcdabcd,A ABCD
123456,1234567,0,11,123456,12345678,12345,abcdabcd,A ABCD
12345,1234567,0,11,123456,12345678,1234567,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456789,abcdabcd,A ABCD
Ideal output
a,b,c,d,e,f,g,h,i,j,k
123450,12345670,0,11,123456,12345678,123456700,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456780,abcdabcd,A ABCD
123456,12345670,0,11,123456,12345678,123450000,abcdabcd,A ABCD
123450,12345670,0,11,123456,12345678,123456700,abcdabcd,A ABCD
123456,12345678,0,11,123456,12345678,123456789,abcdabcd,A ABCD
Code
with open('file.txt', 'r') as file:
    filedata = file.read()
filedata = filedata.replace('12345', '12345'.ljust(6, '0'))
with open('file.txt', 'w') as file:
    file.write(filedata)
EDIT:
Something similar to "Python - How to add zeros to and integer/string?", but targeting either a specific column by its header or at least the first column.
EDIT2:
I am using the code below to rearrange my columns. Could it be modified to work with string lengths?
import pandas as pd
## Read csv / tab-delimited in this example
df = pd.read_csv('test.txt', sep='\t')
## Reorder columns
df = df[['h','i','c','g','a','b','e','d','f','j','k']]
## Write csv / tab-delimited
df.to_csv('test', sep='\t')
Using pandas, you can convert the column to str and then use .str.pad. You can make a dict with the requested lengths:
lengths = {
    "a": 6,
    "b": 8,
    "c": 3,
    "d": 6,
    "e": 8,
}
and use it like this:
result = pd.DataFrame(
    {
        column_name: column.str.pad(
            lengths.get(column_name, 0), side="right", fillchar="0"
        )
        for column_name, column in df.astype(str).items()
    }
)
If the fillchar is different per column, you can get that from a dict as well
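A minimal sketch of the per-column fill character idea, assuming a second dict named fillchars (a hypothetical name) alongside lengths. The sample frame and the chosen widths are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "a": [12345, 123456],
    "b": [1234567, 12345678],
    "h": ["abcdabcd", "abcdabcd"],
})

# Target width per column; columns not listed are left unchanged
lengths = {"a": 6, "b": 8, "h": 10}
# Fill character per column; "0" is used when a column is not listed
fillchars = {"h": " "}

result = pd.DataFrame(
    {
        name: column.str.pad(
            lengths.get(name, 0),
            side="right",
            fillchar=fillchars.get(name, "0"),
        )
        for name, column in df.astype(str).items()
    }
)
print(result)
```

Here columns a and b are right-padded with zeros, while h is padded with spaces up to width 10.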
You can also use str.format with a fill character and an alignment:
>>> '{:0>5}'.format(4)
'00004'
>>> '{:0<5}'.format(4)
'40000'
>>> '{:0^5}'.format(4)
'00400'
Example:
#--------------DEFs------------------
def number_zero_right(number, len_number):
    # Left-align the number and fill with zeros on the right
    return ('{:0<' + str(len_number) + '}').format(number)
#--------------MAIN------------------
a = 12345
b = 1234567
c = 0
d = 11
e = 123456
f = 12345678
g = 1234567
h = 'abcdabcd'
i = 'A'
j = 'ABCD'
print(a,b,c,d,e,f,g,h,i,j)
# > 12345 1234567 0 11 123456 12345678 1234567 abcdabcd A ABCD
a = number_zero_right(a,6)
b = number_zero_right(b,8)
c = number_zero_right(c,1)
d = number_zero_right(d,2)
e = number_zero_right(e,6)
f = number_zero_right(f,8)
g = number_zero_right(g,9)
print(a,b,c,d,e,f,g,h,i,j)
#> 123450 12345670 0 11 123456 12345678 123456700 abcdabcd A ABCD
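To apply the same right-padding per column header while reading a CSV, something like the following could work. It is a sketch using csv.DictReader and str.ljust; the widths dict and the shortened four-column sample are assumptions for illustration.

```python
import csv
import io

# Target width per header; headers not listed are left as they are
widths = {"a": 6, "b": 8, "g": 9}

# In-memory stand-in for the input file
source = io.StringIO(
    "a,b,g,h\n"
    "12345,1234567,1234567,abcdabcd\n"
    "123456,12345678,12345,abcdabcd\n"
)

reader = csv.DictReader(source)
padded_rows = [
    # ljust pads on the right; a width of 0 leaves the value unchanged
    {name: value.ljust(widths.get(name, 0), "0") for name, value in row.items()}
    for row in reader
]

# Write the padded rows back out with the original header order
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=reader.fieldnames)
writer.writeheader()
writer.writerows(padded_rows)
print(out.getvalue())
```

Since everything stays a string, values that already have leading zeros are not altered by the padding.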
Managed to get there, so thought I'd post in case someone has a similar issue. This only works on one column, but that's enough for me now.
#import pandas
import pandas as pd
#open file and convert data to str
data = pd.read_csv('Test.CSV', dtype = str)
# width of output string
width = 6
# fillchar
char ="_"
#Change the contents of column named ColumnID
data["ColumnID"]= data["ColumnID"].str.ljust(width, char)
#print output
print(data)
I'm trying to update the strings in a .csv file that I am reading using Pandas. The .csv contains a column named 'About', which holds the rows of data I want to manipulate.
I've already used the .str methods to update it, but the changes are not reflected in the exported .csv file. Some of my code can be seen below.
import pandas as pd
df = pd.read_csv('data.csv')
df.About.str.lower() #About is the column I am trying to update
df.About.str.replace('[^a-zA-Z ]', '')
df.to_csv('newdata.csv')
You need to assign the output back to the column. You can also chain both operations, since they work on the same column About. Because the values are converted to lowercase first, the regex only needs to exclude lowercase letters and spaces (pass regex=True, since recent pandas versions treat the pattern as a literal string by default):
df = pd.read_csv('data.csv')
df.About = df.About.str.lower().str.replace('[^a-z ]', '', regex=True)
df.to_csv('newdata.csv', index=False)
Sample:
df = pd.DataFrame({'About':['AaSD14%', 'SDD Aa']})
df.About = df.About.str.lower().str.replace('[^a-z ]', '', regex=True)
print (df)
About
0 aasd
1 sdd aa
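The reason the original code had no effect is that the .str methods return a new Series rather than modifying the column in place. A small sketch on the same sample data shows the difference; the names unchanged and cleaned are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"About": ["AaSD14%", "SDD Aa"]})

# The result is discarded, so the column keeps its original values
df["About"].str.lower()
unchanged = df["About"].tolist()

# Assigning the result back is what actually updates the column
df["About"] = df["About"].str.lower().str.replace(r"[^a-z ]", "", regex=True)
cleaned = df["About"].tolist()

print(unchanged)
print(cleaned)
```

Only the assigned version would show up in a later df.to_csv call.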
import pandas as pd
columns = ['About']
data = ["ALPHA", "OMEGA", "ALpHOmGA"]
df = pd.DataFrame(data, columns=columns)
df.About = df.About.str.lower().str.replace('[^a-zA-Z ]', '', regex=True)
print(df)
OUTPUT:
      About
0     alpha
1     omega
2  alphomga
Example Dataframe:
>>> df
About
0 JOHN23
1 PINKO22
2 MERRY jen
3 Soojan San
4 Remo55
Solution: another way, using a compiled regex with flags
>>> df.About.str.lower().str.replace(regex_pat, '', regex=True)
0 john
1 pinko
2 merry jen
3 soojan san
4 remo
Name: About, dtype: object
Explanation:
[^a-z]+ matches a single character not present in the list below
+ Quantifier: matches between one and unlimited times, as many times as possible, giving back as needed (greedy)
a-z a single character in the range between a (index 97) and z (index 122) (case sensitive)
$ asserts position at the end of a line
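The definition of regex_pat is not shown above. Going by the explanation, it was presumably something like [^a-z]+$, so the following is a reconstruction under that assumption rather than the original code.

```python
import re
import pandas as pd

df = pd.DataFrame(
    {"About": ["JOHN23", "PINKO22", "MERRY jen", "Soojan San", "Remo55"]}
)

# Assumed pattern: one or more non-lowercase characters at the end of the string
regex_pat = re.compile(r"[^a-z]+$")

# regex=True is required when passing a compiled pattern in recent pandas
result = df["About"].str.lower().str.replace(regex_pat, "", regex=True)
print(result)
```

Because the pattern is anchored with $, only trailing characters are removed, which is why the spaces inside "merry jen" and "soojan san" are kept.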