I am trying to write a Python script that extracts text from a text file and writes it to an Excel file.
The problem is that I do not know how to extract the strings that follow each equals sign.
I am new to Python, and so far I have only managed to open the file.
The data looks like this:
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x020a", RAC="0x68", RACRANGE="0x73", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="511100001243", ENDIMSI="53110100270380", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x01Fa", RAC="0x67", RACRANGE="0x63", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="", ENDIMSI="", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
Output should be like below:
#!/usr/bin/python
import csv
import re
fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s+=\s(.*)'.format('|'.join(fieldnames)), re.I)
with open('input.txt') as f_input, open('output.csv', 'wb') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
Your corrected pattern is HERE
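I would break text like that into blocks and then find the matches in each block:
import csv
import re
fieldnames = ['AREA', 'MCC', 'MNC']
# Capture everything between "NAME=" and the next comma.
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))
# 'input.txt' is the input file name used in the question.
with open('input.txt') as f_input:
    data = f_input.read()
# Each block runs from one "ADD IUACCAREALST:" to the next (or to the end of the data).
for block in re.finditer(r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)', data, flags=re.S | re.M):
    print(re_fields.findall(block.group(1)))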
Prints:
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
At that point, use each list of tuples to create a dict forming that csv record; write it to the csv file. Done!
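That last step could be sketched roughly as follows. This is a minimal example, assuming Python 3 and shortened versions of the sample lines from the question. The helper name `blocks_to_rows` is hypothetical, stripping the double quotes from the values is an assumption about the desired output, and an in-memory buffer stands in for the real output file.

```python
import csv
import io
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))
re_blocks = re.compile(
    r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)', flags=re.S)


def blocks_to_rows(data):
    """Yield one dict per ADD IUACCAREALST block (hypothetical helper)."""
    for block in re_blocks.finditer(data):
        pairs = re_fields.findall(block.group(1))
        # Assumption: the surrounding double quotes are not wanted in the CSV.
        yield {key: value.strip('"') for key, value in pairs}


# Shortened sample lines based on the data in the question.
sample = (
    'ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x020a", CTRLTYPE=REJECT;\n'
    'ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x01Fa", CTRLTYPE=REJECT;\n'
)

rows = list(blocks_to_rows(sample))

# The buffer stands in for open('output.csv', 'w', newline='').
buffer = io.StringIO(newline='')
writer = csv.DictWriter(buffer, fieldnames=fieldnames)
writer.writeheader()
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

To write a real file, replace the buffer with a file opened in text mode with newline=''.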
I have a CSV file with two columns of data. I want to extract the data from one column and write it to a text file, with each element wrapped in single quotes and the elements separated by commas. For example, I have this:
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
without any space between two elements, separated by a comma.
Edit:
This is what I tried:
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row["taxable_entity_id"])
            # print(row["taxable_entity_id"])
    text_file.close()
csv_File.close()
and this is what I got:
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
You were close. Since you want a single line in the output file, you should write it in one call, using a generator expression:
import csv
with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        # use QUOTE_ALL to force the quoting
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
        writer.writerow((row["taxable_entity_id"] for row in reader))
Also, do not call close(), since you have (correctly) used with.
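The character-by-character output in the question comes from passing a bare string to writerow, which treats its argument as a sequence of fields. A small sketch of that behaviour, using an in-memory buffer instead of a real file (the variable names `wrong` and `right` are just for illustration):

```python
import csv
import io

te_id = "45efc167-9254-406c-b5a8-6aef91a73dd9"

# A bare string is iterated character by character: one field per character.
wrong = io.StringIO(newline='')
csv.writer(wrong).writerow(te_id)

# A list holding the string yields a single field, as intended.
right = io.StringIO(newline='')
csv.writer(right).writerow([te_id])

print(wrong.getvalue())
print(right.getvalue())
```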
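Try this (the column name is taken from the sample data in the question):
import pandas as pd
df = pd.read_csv('nameoffile.csv', delimiter=',')
# Select the column by its header name; df[0] would raise a KeyError here.
X = df['taxable_entity_id'].values
with open('newfile.txt', 'w') as f:
    # Wrap each value in single quotes and join with commas, without a trailing comma.
    f.write(','.join("'{}'".format(i) for i in X))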
It seems a little odd that you basically want a one-row CSV file for the taxable_entity_ids, but it is certainly possible. You also don't need to explicitly close() the open files, because the with context manager will do it for you automatically.
You also need to open the CSV file with newline='' as shown in all the examples in the csv module's documentation.
Lastly, if you want all the fields to be quoted, you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv
inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"
with open(outp_filename, 'w', newline='') as text_file, \
        open(inp_filename, 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
    taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
    writer.writerow(taxable_entity_ids)
print('done')
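The difference between the two quoting modes can be seen with a small in-memory comparison. This is only a sketch: the helper name `one_row` is hypothetical, and the two IDs are taken from the sample data in the question.

```python
import csv
import io

ids = ["45efc167-9254-406c-b5a8-6aef91a73dd9",
       "5ae97680-f489-4182-9dcb-eb07a73fab15"]


def one_row(quoting):
    """Write ids as one CSV row with the given quoting mode (hypothetical helper)."""
    buffer = io.StringIO(newline='')
    csv.writer(buffer, quotechar="'", quoting=quoting).writerow(ids)
    return buffer.getvalue().rstrip('\r\n')


# QUOTE_MINIMAL only quotes fields that contain special characters,
# so these plain IDs are written without quotes.
minimal = one_row(csv.QUOTE_MINIMAL)

# QUOTE_ALL wraps every field in the quote character.
quoted = one_row(csv.QUOTE_ALL)

print(minimal)
print(quoted)
```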
I have a bunch of CSV files which I will be combining to a single CSV file named 'Combined'. For each CSV file, once the data is appended to the 'Combined' file, I want to insert a fresh column before column 1 in 'Combined' and insert the name of the CSV file from which data was copied in that iteration. Is there any way of doing this in Python?
This can be done as follows. First open a CSV file for output. Then use Python's glob library to list all of the CSV files in the folder. For each row in a CSV file, insert the filename as the first column entry and then write the row to output.csv:
import glob
import csv
with open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    for filename in glob.glob('*.csv'):
        # output.csv also matches '*.csv', so skip it
        if filename == 'output.csv':
            continue
        with open(filename, newline='') as f_input:
            csv_input = csv.reader(f_input)
            for row in csv_input:
                row.insert(0, filename)
                csv_output.writerow(row)
So for example, if you had these two CSV files:
num.csv
1,2,3,4,5
1,2,3,4,5
1,2,3,4,5
letter.csv
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
a,b,c,d,e,f
It would create the following output.csv file:
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
letter.csv,a,b,c,d,e,f
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
num.csv,1,2,3,4,5
This assumes you are using Python 3.x.
I want to write a simple script which will parse a text file of mine.
Pattern is the following:
0.061024 seconds for Process 0 to send.
0.060062 seconds for Process 1 to receive.
This goes on in a loop.
The python file looks like this:
import fileinput, csv
data = []
for line in fileinput.input():
    time, sep, status = line.partition("seconds")
    if sep:
        print(time.strip())
with open('result.csv', 'w') as f:
    w = csv.writer(f)
    w.writerow('send receive'.split())
    w.writerows(data)
This prints the desired output in bash and also creates a CSV file with the two column headers, send and receive. How do I fill those columns with the values printed by
print(time.strip())
I would like to have this output in a CSV file in two columns.
How should I do it?
You can use the writer function that comes with the csv module:
import csv
# In Python 3, open the file in text mode with newline='' (not 'wb').
with open('file.csv', 'w', newline='') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ',
                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for var in list_of_values:
        cwriter.writerow(var)
This assumes that you have all the rows as separate lists within list_of_values, as in:
list_of_values = [['col1', 'col2'],['col1', 'col2']]
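One way to build such a list from the log lines in the question is to remember each send time until the matching receive line arrives. The following is a sketch under assumptions: the helper name `pair_times` is hypothetical, lines are assumed to alternate send/receive as in the question, and the second pair of sample values is invented for illustration.

```python
import csv
import io

log_lines = [
    "0.061024 seconds for Process 0 to send.",
    "0.060062 seconds for Process 1 to receive.",
    "0.059871 seconds for Process 0 to send.",     # invented sample value
    "0.058113 seconds for Process 1 to receive.",  # invented sample value
]


def pair_times(lines):
    """Collect [send, receive] pairs from the log lines (hypothetical helper)."""
    rows = []
    pending_send = None
    for line in lines:
        time, sep, status = line.partition("seconds")
        if not sep:
            continue  # not a timing line
        if "send" in status:
            pending_send = time.strip()
        elif "receive" in status and pending_send is not None:
            rows.append([pending_send, time.strip()])
            pending_send = None
    return rows


data = pair_times(log_lines)

# The buffer stands in for open('result.csv', 'w', newline='').
buffer = io.StringIO(newline='')
w = csv.writer(buffer)
w.writerow(['send', 'receive'])
w.writerows(data)
print(buffer.getvalue())
```

With the question's script, the lines would come from fileinput.input() instead of the hard-coded list.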
I have CSV files in which the data is formatted as follows:
file1.csv
ID,NAME
001,Jhon
002,Doe
file2.csv
ID,SCHOOLS_ATTENDED
001,my Nice School
002,His lovely school
file3.csv
ID,SALARY
001,25
002,40
The ID field is a kind of primary key that will be used to fetch each record.
What is the most efficient way to read 3 to 4 files, look up the corresponding data, and store it in another CSV file with the headings (ID,NAME,SCHOOLS_ATTENDED,SALARY)?
The file sizes are in the hundreds of MB (100 to 200 MB).
Hundreds of megabytes aren't that much. Why not go for a simple approach using the csv module and collections.defaultdict:
import csv
from collections import defaultdict
result = defaultdict(dict)
fieldnames = {"ID"}
for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key) # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<class 'dict'>,
{'001': {'NAME': 'Jhon', 'SCHOOLS_ATTENDED': 'my Nice School', 'SALARY': '25'},
'002': {'NAME': 'Doe', 'SCHOOLS_ATTENDED': 'His lovely school', 'SALARY': '40'}})
You could then combine that into a CSV file (not my prettiest work, but good enough for now):
with open("out.csv", "w", newline="") as outfile:
    writer = csv.DictWriter(outfile, sorted(fieldnames))
    writer.writeheader()
    for item in result:
        result[item]["ID"] = item
        writer.writerow(result[item])
out.csv then contains
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
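If some IDs are missing from one of the files, DictWriter fills the gaps with its restval (an empty string by default). The sketch below is a variation on the approach above, not the same code: it keeps columns in first-seen order instead of sorting them, uses in-memory strings in place of the real files, and the extra ID 003 is invented to show a missing salary.

```python
import csv
import io
from collections import defaultdict

# In-memory stand-ins for the CSV files; ID 003 is invented and has no salary.
sources = [
    "ID,NAME\n001,Jhon\n002,Doe\n003,Ann\n",
    "ID,SALARY\n001,25\n002,40\n",
]

result = defaultdict(dict)
fieldnames = ["ID"]
for text in sources:
    reader = csv.DictReader(io.StringIO(text, newline=""))
    # Record each new column once, in the order it is first seen.
    for name in reader.fieldnames:
        if name not in fieldnames:
            fieldnames.append(name)
    for row in reader:
        result[row["ID"]].update(row)

out = io.StringIO(newline="")
# restval is the value written for keys that are missing from a row.
writer = csv.DictWriter(out, fieldnames, restval="")
writer.writeheader()
writer.writerows(result.values())
merged = out.getvalue().splitlines()
print("\n".join(merged))
```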
The following is working code for combining multiple CSV files that have a specific keyword in their names into one final CSV file. I have set the default keyword to "file", but you can set it to an empty string if you want to combine all CSV files in folder_path. The code takes the header from the first CSV file and uses it as the header of the combined CSV file. It ignores the headers of all other CSV files.
import csv
import glob
#staticmethod
def Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file(folder_path, keyword='file'):
    # Takes the header only from the first CSV; all other CSV headers are skipped and their data is appended to the final CSV.
    # folder_path is concatenated directly, so it must end with a path separator.
    fileNames = glob.glob(folder_path + "*" + keyword + "*" + ".csv")  # fileNames includes folder_path too
    # csv.reader returns one list per row. The output file is opened with newline=''
    # so that writerows does not produce doubled newlines.
    with open(folder_path + "Combined_csv.csv", "w", newline='') as fout:
        print('Combining multiple csv files into 1')
        csv_write_file = csv.writer(fout, delimiter=',')
        with open(fileNames[0], mode='rt', newline='') as read_file:  # utf8
            csv_read_file = csv.reader(read_file, delimiter=',')
            csv_write_file.writerows(csv_read_file)
        for num in range(1, len(fileNames)):
            with open(fileNames[num], mode='rt', newline='') as read_file:  # utf8
                csv_read_file = csv.reader(read_file, delimiter=',')
                next(csv_read_file)  # ignore header
                csv_write_file.writerows(csv_read_file)