Extract strings from text file and write it to Excel - python

I am trying to write a Python script that extracts text from a text file and writes it into an Excel file.
The problem is that I do not know how to extract the strings that follow each equals sign.
I am new to Python and at this stage have only managed to open the file.
The data looks like below:
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x020a", RAC="0x68", RACRANGE="0x73", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="511100001243", ENDIMSI="53110100270380", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
ADD IUACCAREALST: AREA=RA, MCC="510", MNC="28", LAC="0x01Fa", RAC="0x67", RACRANGE="0x63", SUBRANGE=SPECIAL_IMSI_RANGE, BEGIMSI="", ENDIMSI="", CTRLTYPE=REJECT, CAUSE=ROAMING_NOT_ALLOWED_IN_LA;
The output should be one row per statement, with the values for fields such as AREA, MCC and MNC in separate columns. This is my code so far:
#!/usr/bin/python
import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s+=\s(.*)'.format('|'.join(fieldnames)), re.I)

with open('input.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()

Your corrected pattern is in the code below. I would break text like that into blocks and then find the matches in each block:
import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))

with open(fn) as f_input:  # fn is the path to your input file, e.g. 'input.txt'
    data = f_input.read()

for block in re.finditer(r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)', data, flags=re.S | re.M):
    print(re_fields.findall(block.group(1)))
Prints:
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
[('AREA', 'RA'), ('MCC', '"510"'), ('MNC', '"28"')]
At that point, use each list of tuples to build a dict forming that CSV record and write it to the CSV file. Done!
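For example, a minimal sketch of that last step, reusing the corrected pattern above and assuming 'input.txt' and 'output.csv' as the file names, could look like this:

import csv
import re

fieldnames = ['AREA', 'MCC', 'MNC']
re_fields = re.compile(r'({})\s*=\s*([^,]+),'.format('|'.join(fieldnames)))

with open('input.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.DictWriter(f_output, fieldnames=fieldnames)
    csv_output.writeheader()
    data = f_input.read()
    for block in re.finditer(r'(?<=ADD IUACCAREALST:)(.*?)(?=ADD IUACCAREALST:|\Z)', data, flags=re.S | re.M):
        # turn the list of (field, value) tuples into a dict and strip the surrounding quotes
        record = {field: value.strip('"') for field, value in re_fields.findall(block.group(1))}
        csv_output.writerow(record)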

Related

Need help in extracting data from csv and writing to a text file

I have a CSV with two columns of data. I want to extract the data from one column and write it to a text file, with each element wrapped in single quotes and separated by a comma. For example, I have this:
taxable_entity_id,id
45efc167-9254-406c-b5a8-6aef91a73dd9,331999
5ae97680-f489-4182-9dcb-eb07a73fab15,103507
00018d93-ae71-4367-a0da-f252cea4dfa2,32991
I want all the taxable_entity_ids in a text file like this:
'45efc167-9254-406c-b5a8-6aef91a73dd9','5ae97680-f489-4182-9dcb-eb07a73fab15','00018d93-ae71-4367-a0da-f252cea4dfa2'
with no spaces between the elements, just a comma separating them.
Edit:
This is what I tried:
import csv

with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_MINIMAL)
        for row in reader:
            writer.writerow(row["taxable_entity_id"])
            # print(row["taxable_entity_id"])
text_file.close()
csv_File.close()
and this is what I got:
4,5,e,f,c,1,6,7,-,9,2,5,4,-,4,0,6,c,-,b,5,a,8,-,6,a,e,f,9,1,a,7,3,d,d,9
5,a,e,9,7,6,8,0,-,f,4,8,9,-,4,1,8,2,-,9,d,c,b,-,e,b,0,7,a,7,3,f,a,b,1,5
0,0,0,1,8,d,9,3,-,a,e,7,1,-,4,3,6,7,-,a,0,d,a,-,f,2,5,2,c,e,a,4,d,f,a,2
You were close. Since you want a single line in the output file, you should write it all at once using a generator expression:
import csv

with open("Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv", 'r') as csv_File:
    reader = csv.DictReader(csv_File)
    with open("te_id.csv", 'w') as text_file:
        # use QUOTE_ALL to force the quoting
        writer = csv.writer(text_file, quotechar='\'', quoting=csv.QUOTE_ALL)
        writer.writerow((row["taxable_entity_id"] for row in reader))
And do not call close, since you have (correctly) used with.
Try this:
import pandas as pd

df = pd.read_csv('nameoffile.csv', delimiter=',')
X = df['taxable_entity_id'].values  # select the wanted column by name

f = open('newfile.txt', 'w')
for value in X:
    f.write(value + ',')
f.close()
It seems a little odd that you basically want a one-row CSV file for the taxable_entity_ids, but it is certainly possible. You also don't need to explicitly close() the open files, because the with context manager will do it for you automatically.
You also need to open the CSV file with newline='', as shown in all the examples in the csv module's documentation.
Lastly, if you want all the fields to be quoted, you need to use quoting=csv.QUOTE_ALL instead of quoting=csv.QUOTE_MINIMAL.
import csv

inp_filename = "Taxable_entity_those_who_filed_G1_M_July_but_not_in_Aug.csv"
outp_filename = "te_id.csv"

with open(outp_filename, 'w', newline='') as text_file, \
        open(inp_filename, 'r', newline='') as csv_File:
    reader = csv.DictReader(csv_File)
    writer = csv.writer(text_file, quotechar="'", quoting=csv.QUOTE_ALL)
    taxable_entity_ids = (row["taxable_entity_id"] for row in reader)
    writer.writerow(taxable_entity_ids)

print('done')

Reading in CSV files with newline characters embedded

I am currently reading data from a CSV file and loading tokens and their definitions into a dictionary. The code works fine until it hits a place where the data in the CSV file looks like this:
"Token000\nip address\ntesttestest"
Here is my code so far:
for line in f:
    if "Token" in line and re.search(r"Token\d", line):
        commaIndex = line.index(",", line.index("Token"))
        csvDict[line[line.index("Token"): commaIndex]] = line[commaIndex + 1: line.index(",", commaIndex + 1)]
Use this:
import csv

data = {}
with open('your_file.csv') as csv_file:
    reader = csv.reader(csv_file, skipinitialspace=True, quotechar="'")
    for row in reader:
        data[row[0]] = row[1:]

print(data)
I recommend that you take a look at the csv module documentation.
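As a side note, if the fields with embedded newlines are wrapped in double quotes as in the sample above, a minimal sketch along these lines would keep each quoted field intact across physical lines (the file name 'your_file.csv' is assumed, as above):

import csv

# With the file opened using newline='', csv.reader keeps a quoted field
# together even when it spans several physical lines. The default
# quotechar '"' is assumed to match the double quotes in the sample data.
with open('your_file.csv', newline='') as csv_file:
    for row in csv.reader(csv_file):
        print(row)  # the embedded newlines stay inside the quoted field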

How to extract specific columns from a log file to a CSV file

I0625 17:25:22.544378 3366 solver.cpp:229] Iteration 7120, loss = 8.79839
Expected output:
Iteration 7120 loss = 8.79839
You would probably need to use a regular expression to find lines that contain the values you want and to extract them. These lines can then be written in CSV format using Python's CSV library as follows:
import re
import csv

with open('log.txt') as f_input, open('output.csv', 'w', newline='') as f_output:
    csv_output = csv.writer(f_output)
    csv_output.writerow(['Iteration', 'loss'])

    for line in f_input:
        re_values = re.search(r'Iteration (\d+), loss = ([0-9.]+)', line)
        if re_values:
            csv_output.writerow(re_values.groups())
Giving you output.csv in CSV format, one row per matching log line; for the single sample line above that is:
Iteration,loss
7120,8.79839

Python: redirect text parsing output to CSV file

I want to write a simple script which will parse a text file of mine.
The pattern is the following:
0.061024 seconds for Process 0 to send.
0.060062 seconds for Process 1 to receive.
This goes on in a loop.
The Python file looks like this:
import fileinput, csv

data = []
for line in fileinput.input():
    time, sep, status = line.partition("seconds")
    if sep:
        print(time.strip())

with open('result.csv', 'w') as f:
    w = csv.writer(f)
    w.writerow('send receive'.split())
    w.writerows(data)
This gives me the desired output in bash, and it also creates the two columns, send and receive. How do I fill them with the values that are printed by
print(time.strip())
I would like to have this output in a CSV file in two columns. How should I do it?
You can use the writer function that comes with the csv module:
import csv

with open('file.csv', 'w', newline='') as csvfile:
    cwriter = csv.writer(csvfile, delimiter=' ',
                         quotechar='|', quoting=csv.QUOTE_MINIMAL)
    for var in list_of_values:
        cwriter.writerow(var)
This assumes that you have all the rows as separate lists within list_of_values, as in:
list_of_values = [['col1', 'col2'],['col1', 'col2']]
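For the question's input, one way to build list_of_values is sketched below; it assumes every "send" line is immediately followed by its matching "receive" line, as in the sample:

import fileinput

# Collect the times printed by the question's script into rows of two
# columns (send, receive) that csv.writer can write out.
list_of_values = [['send', 'receive']]
pair = []
for line in fileinput.input():
    time, sep, status = line.partition("seconds")
    if sep:
        pair.append(time.strip())
        if len(pair) == 2:   # one send time and one receive time collected
            list_of_values.append(pair)
            pair = []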

Combining multiple CSV files into a single one

I have CSV files in which the data is formatted as follows:
file1.csv
ID,NAME
001,Jhon
002,Doe
file2.csv
ID,SCHOOLS_ATTENDED
001,my Nice School
002,His lovely school
file3.csv
ID,SALARY
001,25
002,40
The ID field is a kind of primary key that will be used to fetch records.
What is the most efficient way to read 3 to 4 files, get the corresponding data, and store it in another CSV file with the headings (ID,NAME,SCHOOLS_ATTENDED,SALARY)?
The file sizes are in the hundreds of megabytes (100-200 MB).
Hundreds of megabytes aren't that much. Why not go for a simple approach using the csv module and collections.defaultdict:
import csv
from collections import defaultdict

result = defaultdict(dict)
fieldnames = {"ID"}

for csvfile in ("file1.csv", "file2.csv", "file3.csv"):
    with open(csvfile, newline="") as infile:
        reader = csv.DictReader(infile)
        for row in reader:
            id = row.pop("ID")
            for key in row:
                fieldnames.add(key)  # wasteful, but I don't care enough
                result[id][key] = row[key]
The resulting defaultdict looks like this:
>>> result
defaultdict(<type 'dict'>,
{'001': {'SALARY': '25', 'SCHOOLS_ATTENDED': 'my Nice School', 'NAME': 'Jhon'},
'002': {'SALARY': '40', 'SCHOOLS_ATTENDED': 'His lovely school', 'NAME': 'Doe'}})
You could then combine that into a CSV file (not my prettiest work, but good enough for now):
with open("out.csv", "w", newline="") as outfile:
writer = csv.DictWriter(outfile, sorted(fieldnames))
writer.writeheader()
for item in result:
result[item]["ID"] = item
writer.writerow(result[item])
out.csv then contains
ID,NAME,SALARY,SCHOOLS_ATTENDED
001,Jhon,25,my Nice School
002,Doe,40,His lovely school
The following is working code for combining multiple CSV files whose names contain a specific keyword into one final CSV file. I have set the default keyword to "file", but you can set it to an empty string if you want to combine all CSV files from folder_path. This code takes the header from your first CSV file and uses it as the header of the final combined file; the headers of all the other CSV files are ignored.
import csv
import glob

def Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file(folder_path, keyword='file'):
    # Takes the header only from the first CSV; all other CSV headers are skipped
    # and their data rows are appended to the final CSV.
    fileNames = glob.glob(folder_path + "*" + keyword + "*" + ".csv")  # fileNames include the folder_path too
    with open(folder_path + "Combined_csv.csv", "w", newline='') as fout:
        print('Combining multiple csv files into 1')
        # newline='' avoids the extra blank lines that csv.writer would otherwise
        # introduce when copying rows that already end in a newline
        csv_write_file = csv.writer(fout, delimiter=',')
        with open(fileNames[0], mode='rt') as read_file:
            csv_read_file = csv.reader(read_file, delimiter=',')
            csv_write_file.writerows(csv_read_file)
        for num in range(1, len(fileNames)):
            with open(fileNames[num], mode='rt') as read_file:
                csv_read_file = csv.reader(read_file, delimiter=',')
                next(csv_read_file)  # skip the header row
                csv_write_file.writerows(csv_read_file)
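A hedged usage example follows; the folder path is only a placeholder, and the trailing slash matters because the function concatenates it directly with the file names:

# Hypothetical call: combine every CSV under ./csv_folder/ whose name contains "file"
Combine_multiple_csv_files_thatContainsKeywordInTheirNames_into_one_csv_file('./csv_folder/', keyword='file')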
