Import a CSV file and do arithmetic operations without importing any library in Python - python

My CSV file looks like this:
Time_stamp; Mobile_number; Download; Upload; Connection_start_time; Connection_end_time; location
1/2/2020 10:43:55;+917777777777;213455;2343;1/2/2020 10:43:55;1/2/2020 10:47:25;09443
1/3/2020 10:33:10;+919999999999;345656;3568;1/3/2020 10:33:10;1/3/2020 10:37:20;89442
1/4/2020 11:47:57;+919123456654;345789;7651;1/4/2020 11:11:10;1/4/2020 11:40:22;19441
1/5/2020 11:47:57;+919123456543;342467;4157;1/5/2020 11:44:10;1/5/2020 11:59:22;29856
1/6/2020 10:47:57;+917777777777;213455;2343;1/6/2020 10:43:55;1/6/2020 10:47:25;09443
My question is:
Without importing any library, how can I read a CSV file, have the user enter a mobile number, and have the program show the data usage for that number? That is, an arithmetic operation (adding uplink and downlink) to get the result (total data used) for that specific mobile number.
Here is what my code looks like (I don't want to import pandas or any other library):
import pandas as pd
df = pd.read_csv('test.csv', sep=';')
df.columns = [col.strip() for col in df.columns]
usage = df[['Download', 'Upload']][df.Mobile_number == +917777777777].sum().sum()
print(usage)

I'd use csv.DictReader:
In [29]: import csv
In [30]: with open('x', 'r') as f:
    ...:     r = csv.DictReader(f, delimiter=';')
    ...:     dct = {}
    ...:     for row in r:
    ...:         dct.setdefault(row[' Mobile_number'], []).append(row)
    ...:
In [31]: dct
Out[31]:
{'+917777777777': [OrderedDict([('Time_stamp', '1/2/2020 10:43:55'),
(' Mobile_number', '+917777777777'),
(' Download', '213455'),
(' Upload', '2343'),
(' Connection_start_time', '1/2/2020 10:43:55'),
(' Connection_end_time', '1/2/2020 10:47:25'),
(' location', '09443')]),
OrderedDict([('Time_stamp', '1/6/2020 10:47:57'),
(' Mobile_number', '+917777777777'),
(' Download', '213455'),
(' Upload', '2343'),
(' Connection_start_time', '1/6/2020 10:43:55'),
(' Connection_end_time', '1/6/2020 10:47:25'),
(' location', '09443')])],
'+919999999999': [OrderedDict([('Time_stamp', '1/3/2020 10:33:10'),
(' Mobile_number', '+919999999999'),
(' Download', '345656'),
(' Upload', '3568'),
(' Connection_start_time', '1/3/2020 10:33:10'),
(' Connection_end_time', '1/3/2020 10:37:20'),
(' location', '89442')])],
'+919123456654': [OrderedDict([('Time_stamp', '1/4/2020 11:47:57'),
(' Mobile_number', '+919123456654'),
(' Download', '345789'),
(' Upload', '7651'),
(' Connection_start_time', '1/4/2020 11:11:10'),
(' Connection_end_time', '1/4/2020 11:40:22'),
(' location', '19441')])],
'+919123456543': [OrderedDict([('Time_stamp', '1/5/2020 11:47:57'),
(' Mobile_number', '+919123456543'),
(' Download', '342467'),
(' Upload', '4157'),
(' Connection_start_time', '1/5/2020 11:44:10'),
(' Connection_end_time', '1/5/2020 11:59:22'),
(' location', '29856')])]}
In [32]:
You can then process the list of dicts for a given mobile number with something like usage = sum(float(_[' Download']) + float(_[' Upload']) for _ in dct['+917777777777'])

Noting that you specifically wanted to avoid importing any libraries (I assume this means you want to avoid importing even from the included modules) - for a trivial file (I named one supermarkets.csv, the content looks like this):
ID,Address,City,State,Country,Name,Employees
1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8
2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15
3,332 Hill St,San Francisco,California 94114,USA,Super River,25
4,3995 23rd St,San Francisco,CA 94114,USA,Ben's Shop,10
5,1056 Sanchez St,San Francisco,California,USA,Sanchez,12
6,551 Alvarado St,San Francisco,CA 94114,USA,Richvalley,20
Then you can do something like this:
data = []
with open("supermarkets.csv") as f:
    for line in f:
        data.append(line)
print(data)
From here you can manipulate each of the entries in the list using string tools and list comprehensions.
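As a minimal sketch of the "string tools and list comprehensions" step, assuming a simple file in which no field contains a comma or quotes (the helper name parse_rows is made up for illustration):

```python
def parse_rows(lines, sep=","):
    """Turn raw text lines into a list of dicts keyed by the header row."""
    # Drop surrounding whitespace/newlines and skip blank lines.
    rows = [line.strip() for line in lines if line.strip()]
    header = [name.strip() for name in rows[0].split(sep)]
    # Pair the fields of every remaining line with the header names.
    return [dict(zip(header, [field.strip() for field in row.split(sep)]))
            for row in rows[1:]]


sample = [
    "ID,Address,City,State,Country,Name,Employees\n",
    "1,3666 21st St,San Francisco,CA 94114,USA,Madeira,8\n",
    "2,735 Dolores St,San Francisco,CA 94119,USA,Bready Shop,15\n",
]
records = parse_rows(sample)
total_employees = sum(int(r["Employees"]) for r in records)
print(total_employees)  # 23
```

With a real file you could pass the open file object itself instead of the sample list, since iterating over it yields lines.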

You could use open, which does not require any library, to read your file, and then iterate through it with readlines. Split each line and check your condition based on where your data sits in the line.
usage = 0
with open('test.csv', 'r') as f:
    for line in f.readlines():
        try:
            line_sp = line.split(';')
            if line_sp[1] == '+917777777777':
                usage += int(line_sp[2]) + int(line_sp[3])
        except (IndexError, ValueError):
            # skip lines that are too short or hold non-numeric values
            pass
print(usage)
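The same idea can be wrapped in a function that takes the number as a parameter. This is only a sketch: the name total_usage is hypothetical, and it assumes the column order shown in the question (mobile number, download, upload in positions 1 to 3) with the header on the first line.

```python
def total_usage(lines, number, sep=";"):
    """Sum Download + Upload for every row whose Mobile_number matches."""
    total = 0
    for i, line in enumerate(lines):
        fields = [field.strip() for field in line.split(sep)]
        # Skip the header row and anything too short to be a data row.
        if i == 0 or len(fields) < 4:
            continue
        if fields[1] == number:
            total += int(fields[2]) + int(fields[3])
    return total


sample = [
    "Time_stamp; Mobile_number; Download; Upload; Connection_start_time; Connection_end_time; location",
    "1/2/2020 10:43:55;+917777777777;213455;2343;1/2/2020 10:43:55;1/2/2020 10:47:25;09443",
    "1/3/2020 10:33:10;+919999999999;345656;3568;1/3/2020 10:33:10;1/3/2020 10:37:20;89442",
    "1/6/2020 10:47:57;+917777777777;213455;2343;1/6/2020 10:43:55;1/6/2020 10:47:25;09443",
]
print(total_usage(sample, "+917777777777"))  # 431596
```

To use it on the real file, pass the open file object as lines and the stripped result of input() as number.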

Using no imported modules
# read file and create dict of phone numbers
phone_dict = dict()
with open('test.csv') as f:
    for i, l in enumerate(f.readlines()):
        l = l.strip().split(';')
        if i != 0:  # skip the header row
            mobile = l[1]
            download = int(l[2])
            upload = int(l[3])
            if phone_dict.get(mobile) is None:
                phone_dict[mobile] = {'download': [download], 'upload': [upload]}
            else:
                phone_dict[mobile]['download'].append(download)
                phone_dict[mobile]['upload'].append(upload)
print(phone_dict)
{'+917777777777': {'download': [213455, 213455], 'upload': [2343, 2343]},
'+919999999999': {'download': [345656], 'upload': [3568]},
'+919123456654': {'download': [345789], 'upload': [7651]},
'+919123456543': {'download': [342467], 'upload': [4157]}}
# function to return usage
def return_usage(data: dict, number: str):
    download_usage = sum(data[number]['download'])
    upload_usage = sum(data[number]['upload'])
    return download_usage + upload_usage
# get user input to return usage
number = input('Please input a phone number ').strip()
usage = return_usage(phone_dict, number)
print(usage)
>>> Please input a phone number +917777777777
>>> 431596

A combination of csv and defaultdict can fit your use case:
import csv
from collections import defaultdict
d = defaultdict(list)
with open('data.txt', newline='') as csvfile:
    reader = csv.DictReader(csvfile, delimiter=';', skipinitialspace=True)
    headers = reader.fieldnames
    for row in reader:
        row['Usage'] = int(row['Upload']) + int(row['Download'])
        d[row.get('Mobile_number')].append(row["Usage"])
print(d)
defaultdict(list,
{'+917777777777': [215798, 215798],
'+919999999999': [349224],
'+919123456654': [353440],
'+919123456543': [346624]})
# get the sum for a specific mobile number:
sum(d.get("+917777777777"))
431596
Additional details:
new_d = {}
for k, v in d.items():
    kb = sum(v)
    mb = kb / 1024
    gb = kb / 1024**2
    usage = f"{kb}KB/{mb:.2f}MB/{gb:.2f}GB"
    new_d[k] = usage
print(new_d)
{'+917777777777': '431596KB/421.48MB/0.41GB',
'+919999999999': '349224KB/341.04MB/0.33GB',
'+919123456654': '353440KB/345.16MB/0.34GB',
'+919123456543': '346624KB/338.50MB/0.33GB'}

Related

How do I parse this kind of text file with special separator

I need to parse the following text file into a dataframe. Any suggestions about the methods?
Input:
('name: ', u'Jacky')
('male: ', True)
('hobby: ', u'play football and bascket')
('age: ', 24.0)
----------------
('name: ', u'Belly')
('male: ', True)
('hobby: ', u'dancer')
('age: ', 74.0)
----------------
('name: ', u'Chow')
('male: ', True)
('hobby: ', u'artist')
('age: ', 46.0)
output:
name male hobby age
jacky True football 24
...
I used regex to parse your text file:
import re
import pandas as pd
text_path = 'text.txt'
my_dict = {}
pattern = r"\('(\w+):\s+',\s+u*'*([a-zA-Z0-9\s.]*)'*\)"
with open(text_path, 'r') as txt:
    for block in re.split(r"-+\n", txt.read()):
        for line in filter(None, block.split('\n')):
            col_name, value = re.search(pattern, line).group(1, 2)
            try:
                value = int(float(value))
            except ValueError:
                value = True if value == 'True' else False if value == 'False' else value
            if col_name in my_dict:
                my_dict[col_name].append(value)
            else:
                my_dict[col_name] = [value]
df = pd.DataFrame(my_dict)
print(df)
Output :
name male hobby age
0 Jacky True play football and bascket 24
1 Belly True dancer 74
2 Chow True artist 46
Boolean values are real bools (True or False), not strings, and numerical values (like age) are ints rather than strings (you could keep them as floats).
Ask me if you don't understand something.
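As an alternative to the regex (a different technique, not the one used above): every non-separator line in the input happens to be a valid Python tuple literal, so ast.literal_eval from the standard library can parse it directly. A minimal sketch, assuming Python 3, where the u'' prefix is still accepted; parse_records is a hypothetical helper name:

```python
import ast


def parse_records(text):
    """Parse blocks of "('key: ', value)" lines separated by dashed lines."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if set(line) == {"-"}:
            # A line made only of dashes closes the current record.
            if current:
                records.append(current)
                current = {}
            continue
        key, value = ast.literal_eval(line)  # e.g. ('age: ', 24.0)
        current[key.rstrip(": ")] = value
    if current:
        records.append(current)
    return records


sample = """('name: ', u'Jacky')
('male: ', True)
('hobby: ', u'play football and bascket')
('age: ', 24.0)
----------------
('name: ', u'Belly')
('male: ', True)
('hobby: ', u'dancer')
('age: ', 74.0)
"""
records = parse_records(sample)
print(records[0])
```

The resulting list of dicts can be passed to pd.DataFrame(records) if a dataframe is needed. Here ages stay floats, because that is how they are written in the file.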
I don't know of any existing parser that handles this data convention, so I suggest building your own. I would call readlines() on the open file so I can iterate over the lines of data and apply the right parser to each one. Finally, I would combine the data and create a DataFrame. Example code is below:
import pandas as pd
import sys
def parse_from_weird_file_to_pandas_df(file):
    with open(file, 'r') as f:
        content = f.readlines()
    # Each record spans 5 lines: name, male, hobby, age and a separator
    name_vals = [_parse_text(content[line]) for line in range(0, len(content), 5)]
    male_vals = [_parse_bool(content[line]) for line in range(1, len(content), 5)]
    hobby_vals = [_parse_text(content[line]) for line in range(2, len(content), 5)]
    age_vals = [_parse_int(content[line]) for line in range(3, len(content), 5)]
    df_rows = list(zip(name_vals, male_vals, hobby_vals, age_vals))
    df = pd.DataFrame(data=df_rows, columns=["name", "male", "hobby", "age"])
    return df
def _parse_text(text_line):
    text = text_line[text_line.find("u'") + 2: text_line.find("')")]
    return text
def _parse_bool(bool_line):
    val_bool = bool_line[bool_line.find("', ") + 3: bool_line.find(")")]
    return True if val_bool == "True" else False
def _parse_int(int_line):
    val_int = int_line[int_line.find("', ") + 3: int_line.find(")")]
    return int(float(val_int))
If you wish to shorten 'play football and bascket' to just 'football', you can do so by, for example, creating a list of all available hobbies, looping over it for each parsed hobby, and returning the one that matches.
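A minimal sketch of that hobby-matching idea. The list KNOWN_HOBBIES and the function name shorten_hobby are assumptions made up for this example; the real list would have to come from your data.

```python
# Hypothetical list of known hobbies; extend it to fit your data.
KNOWN_HOBBIES = ["football", "dancer", "artist"]


def shorten_hobby(raw_hobby, known=KNOWN_HOBBIES):
    """Return the first known hobby found in the text, else the text itself."""
    lowered = raw_hobby.lower()
    for hobby in known:
        if hobby in lowered:
            return hobby
    return raw_hobby


print(shorten_hobby("play football and bascket"))  # football
print(shorten_hobby("dancer"))                      # dancer
```

Falling back to the original text for unknown hobbies is a design choice; returning None instead would make unmatched entries easier to spot.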
Here is some quick code I wrote just before lunch; it is not optimised but seems to work. I did not remove the 'u' in the strings and did not convert the ints, but you should be able to manage that. If not, let me know and I will work on it afterwards.
The .join removes unnecessary characters, and I assume you always have 4 objects per record.
import pandas as pd
file = open("yourfile.txt", 'r')
lines = file.readlines()
init = True
list_to_append = []
df = pd.DataFrame(columns=['name', 'male', 'hobby', 'age'])
for line in lines:
    if '---' not in line:
        line = line.split(',')[1]
        processed_line = ''.join(c for c in line if c not in " ()'\n")
        list_to_append.append(processed_line)
        if len(list_to_append) == 4:
            df.loc[len(df)] = list_to_append
            list_to_append = []
    else:
        pass
file.close()

Data generation Python

I'm trying to generate a dataset based on an existing one. I was able to implement a method that randomly changes the contents of files, but I can't write the result back to a file. I also need to write the number of changed words to the file, since I want to use this dataset to train a neural network. Could you help me?
Input: files with 2 lines of text in each.
Output: files with 3 (maybe) lines: the first line does not change, the second changes according to the method, and the third shows the number of words changed. (If it is better to do this differently for deep learning tasks, I would be glad of advice, since I'm a beginner.)
from random import randrange
import os
Path = "D:\\corrected data\\"
filelist = os.listdir(Path)
if __name__ == "__main__":
    new_words = ['consultable', 'partie ', 'celle ', 'également ', 'forte ', 'statistiques ', 'langue ',
                 'cadeaux', 'publications ', 'notre', 'nous', 'pour', 'suivr', 'les', 'vos', 'visitez ', 'thème ', 'thème ', 'thème ', 'produits', 'coulisses ', 'un ', 'atelier ', 'concevoir ', 'personnalisés ', 'consultable', 'découvrir ', 'fournit ', 'trace ', 'dire ', 'tableau', 'décrire', 'grande ', 'feuille ', 'noter ', 'correspondant', 'propre',]
    nb_words_to_replace = randrange(10)
    #with open("1.txt") as file:
    for i in filelist:
        # if i.endswith(".txt"):
        with open(Path + i,"r",encoding="utf-8") as file:
            # for line in file:
            data = file.readlines()
            first_line = data[0]
            second_line = data[1]
            print(f"Original: {second_line}")
            # print(f"FIle: {file}")
            second_line_array = second_line.split(" ")
            for j in range(nb_words_to_replace):
                replacement_position = randrange(len(second_line_array))
                old_word = second_line_array[replacement_position]
                new_word = new_words[randrange(len(new_words))]
                print(f"Position {replacement_position} : {old_word} -> {new_word}")
                second_line_array[replacement_position] = new_word
            res = " ".join(second_line_array)
            print(f"Result: {res}")
        with open(Path + i,"w") as f:
            for line in file:
                if line == second_line:
                    f.write(res)
In short, you have two questions:
How to properly replace line number 2 (and 3) of the file.
How to keep track of number of words changed.
How to properly replace line number 2 (and 3) of the file.
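Your code:
with open(Path + i,"w") as f:
    for line in file:
        if line == second_line:
            f.write(res)
The file is opened for writing only, so reading is not enabled, and for line in file will not work: f is defined, but file (which has already been read) is used instead. To fix this, do the following instead:
with open(Path + i, "r+", encoding="utf-8") as file:
    lines = file.read().splitlines()  # splitlines() removes the \n characters
    lines[1] = res.rstrip("\n")  # res is the modified second line
    file.seek(0)  # go back to the start before overwriting
    file.write("\n".join(lines) + "\n")
    file.truncate()  # drop leftover bytes if the new content is shorter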
However, you want to add more lines to it. I suggest you structure the logic differently.
How to keep track of the number of words changed.
Add a variable changed_words_count and increment it when old_word != new_word.
Resulting code:
for i in filelist:
    filepath = Path + i
    # The lines that will be replacing the file
    new_lines = [""] * 3
    with open(filepath, "r", encoding="utf-8") as file:
        data = file.readlines()
        first_line = data[0].rstrip("\n")
        second_line = data[1].rstrip("\n")
        second_line_array = second_line.split(" ")
        changed_words_count = 0
        for j in range(nb_words_to_replace):
            replacement_position = randrange(len(second_line_array))
            old_word = second_line_array[replacement_position]
            new_word = new_words[randrange(len(new_words))]
            # A word replaced does not mean the word has changed.
            # It could be replacing itself.
            # Check if the replacing word is different
            if old_word != new_word:
                changed_words_count += 1
            second_line_array[replacement_position] = new_word
        # Add the lines to the new file lines
        new_lines[0] = first_line
        new_lines[1] = " ".join(second_line_array)
        new_lines[2] = str(changed_words_count)
        print(f"Result: {new_lines[1]}")
    with open(filepath, "w", encoding="utf-8") as file:
        # writelines() does not add newlines, so join the lines explicitly
        file.write("\n".join(new_lines) + "\n")
Note: Code not tested.
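For experimenting without touching any files, the replacement step can be isolated in a small function. This is only a sketch under stated assumptions: replace_words and the sample sentence are made up, and the count here compares the final line with the original word by word, which differs slightly from the per-replacement counter above (a position hit twice counts once).

```python
import random


def replace_words(line, new_words, n_replacements, rng):
    """Overwrite random positions with random new words.

    Returns the new line and how many positions differ from the original.
    """
    original = line.split(" ")
    words = list(original)
    for _ in range(n_replacements):
        words[rng.randrange(len(words))] = rng.choice(new_words)
    changed = sum(1 for old, new in zip(original, words) if old != new)
    return " ".join(words), changed


rng = random.Random(0)  # fixed seed only to make the demo repeatable
new_line, changed = replace_words(
    "nous visitons les coulisses", ["notre", "pour", "vos"], 3, rng
)
print(new_line)
print(changed)
```

Passing the random generator in as a parameter keeps the function deterministic under a fixed seed, which helps when the generated dataset has to be reproducible.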

how to parse a txt file to csv and modify formatting

Is there a way I can use python to take my animals.txt file results and convert it to csv and format it differently?
Currently the animals.txt file looks like this:
ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
KEYWORDS:- ['"help,goose,Grease,GB"']
ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
KEYWORDS:- ['"Moose, boar, hansel"']
I would like the CSV file to present it as:
ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS
Followed by the results underneath each header
Here is a script I have written:
import re
import csv
with open("animals.txt") as f: text = f.read()
data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    data[k] = re.findall(r'%s:- (.*)' % k, text)
csv_file = 'out.csv'
with open(csv_file, 'w') as csvfile:
    writer = csv.DictWriter(csvfile, fieldnames=keys)
    writer.writeheader()
    for x in data:
        writer.writerow(x)
An easy way to do this is to parse the text with a regex and store the matches in a dict, just before you write the final CSV:
import re
# `text` is your input text
data = {}
keys = ['ID', 'NAME', 'PROJECT NAME', 'REPORT ID', 'REPORT NAME', 'KEYWORDS']
for k in keys:
    # ^ together with re.M anchors each key to the start of a line
    data[k] = re.findall(r'^%s:- (.*)$' % k, text, flags=re.M)
And to CSV:
import csv
csv_file = 'out.csv'
with open(csv_file, 'w', newline='') as csvfile:
    writer = csv.writer(csvfile, quoting=csv.QUOTE_NONE, escapechar='\\')
    writer.writerow(data.keys())
    for i in range(len(data[keys[0]])):
        writer.writerow([data[k][i] for k in keys])
Output in csv:
ID,NAME,PROJECT NAME,REPORT ID,REPORT NAME,KEYWORDS
512,GOOSE,Random,30321,ANIMAL,['\"help\,goose\,Grease\,GB\"']
566,MOOSE,Random,30213,ANIMAL,['\"Moose\, boar\, hansel\"']
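Note that I used re.M (multiline mode), since there's a trick in your text: without anchoring each key to the start of a line, ID would be matched twice (once more inside REPORT ID). The default writer settings also needed tweaking: quoting is turned off, and \ is used to escape the quotes and commas.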
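The effect of re.M can be checked on a shortened copy of the sample data. A small sketch (the KEYWORDS lines are left out for brevity):

```python
import re

text = """ID:- 512
NAME:- GOOSE
PROJECT NAME:- Random
REPORT ID:- 30321
REPORT NAME:- ANIMAL
ID:- 566
NAME:- MOOSE
PROJECT NAME:- Random
REPORT ID:- 30213
REPORT NAME:- ANIMAL
"""

# Unanchored: "ID:- " is also found inside "REPORT ID:- ".
unanchored = re.findall(r"ID:- (.*)", text)
# Anchored with ^ under re.M: only lines that start with "ID:- " match.
anchored = re.findall(r"^ID:- (.*)$", text, flags=re.M)

print(unanchored)  # ['512', '30321', '566', '30213']
print(anchored)    # ['512', '566']
```

The same applies to NAME, which would otherwise also match inside PROJECT NAME and REPORT NAME.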
This should work:
fname = 'animals.txt'
with open(fname) as f:
    content = f.readlines()
content = [x.strip() for x in content]
output = 'ID, NAME, PROJECT NAME, REPORT ID, REPORT NAME, KEYWORDS\n'
line_output = ''
for i in range(0, len(content)):
    if content[i]:
        line_output += content[i].split(':-')[-1].strip() + ','
    elif not content[i] and not content[i - 1]:
        output += line_output.rstrip(',') + '\n'
        line_output = ''
output += line_output.rstrip(',') + '\n'
print(output)
Here is the code in AutoIt (www.autoitscript.com):
Global $values_A = StringRegExp(FileRead("json.txt"), '[ID|NAME|KEYWORDS]:-\s(.*)?', 3)
For $i = 0 To UBound($values_A) - 1 Step +6
    FileWrite('out.csv', $values_A[$i] & ',' & $values_A[$i + 1] & ',' & $values_A[$i + 2] & ',' & $values_A[$i + 3] & ',' & $values_A[$i + 4] & ',' & $values_A[$i + 5] & @CRLF)
Next

Create variables from text file in Python

This is linked to another question, where I used the first answer. I tried changing the code, but it didn't seem to work, as that example has "[]" in the variables.
I have a text file here:
room1North = CP
room1East = CP
room1South = OP
room1West = OP
room2North = OP
room2East = CP
room2South = EP
room2West = OP
I would like Python to create variables with the values from the text file, so that, for example, the variable room1North holds "CP" in Python.
I have the following code so far:
with open("maze files.txt", "r") as f:
    data = f.readlines()
room1North, room1East, room1South, room1West, room2North, room2Eeast, room2South, room2West = [d.split('=')[1].split('\n')[0] for d in data]
I get the following error:
IndexError: list index out of range
You don't actually want separate variables; you want a single dict whose keys are read from the file.
with open("maze files.txt", "r") as f:
    data = {k:v for k, v in [line.strip().replace(' ', '').split("=") for line in f]}
# data["room1North"] == "CP"
# data["room1East"] == "CP"
# data["room1South"] == "OP"
# etc
Change your code as below, so that lines without an '=' (such as blank lines, which would raise the IndexError) are skipped:
with open("maze files.txt", "r") as f:
    data = f.readlines()
room1North, room1East, room1South, room1West, room2North, room2Eeast, room2South, room2West = [d.split('=')[1].strip() for d in data if '=' in d]
I think you'd have more luck using a dictionary rather than relying on separate variables.
with open("maze files.txt", "r") as f:
    data = f.readlines()
rooms = {}
for i in data:
    currentRoom = i.replace(' ', '').strip().split('=')
    rooms[currentRoom[0]] = currentRoom[1]
What you'll be left with is a dictionary like the following:
print(rooms)
#{'room1North': 'CP', 'room1East': 'CP', 'room1South': 'OP', 'room1West': 'OP', 'room2North': 'OP', 'room2East': 'CP', 'room2South': 'EP', 'room2West': 'OP'}
You can reference each room and its value with rooms["room1North"]
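A slightly more defensive sketch of the dictionary approach, assuming the file may contain blank lines or lines without an '=' (the function name parse_settings is made up for illustration):

```python
def parse_settings(lines):
    """Build a dict from 'name = value' lines, ignoring anything else."""
    settings = {}
    for line in lines:
        if "=" not in line:
            continue  # blank or malformed line
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip()
    return settings


sample = ["room1North = CP\n", "room1East = CP\n", "\n", "room1South = OP\n"]
rooms = parse_settings(sample)
print(rooms["room1North"])  # CP
```

Splitting with a maximum of one split keeps any further '=' characters inside the value instead of raising an unpacking error.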

Python: Search particular string in file

I have a text file that stores order info in the following format. I am trying to search for an order by the first line of its block, which is the ID, and print the next 7 lines. But my code either checks just the first line or prints all lines that contain the input number. Could somebody help me?
4735
['Total price: ', 1425.0]
['Type of menu: ', 'BBQ']
['Type of service: ', ' ']
['Amount of customers: ', 25.0]
['Discount: ', '5%', '= RM', 75.0]
['Time: ', '2017-01-08 21:39:19']
3647
['Total price: ', 2000.0]
['Type of menu: ', ' ']
['Type of service: ', 'Tent ']
['Amount of customers: ', 0]
.......
I use the following code to search in text file.
try:
    f = open('Bills.txt', 'r')
    f.close()
except IOError:
    absent_input = (raw_input("|----File was not founded----|\n|----Press 'Enter' to continue...----|\n"))
    report_module = ReportModule()
    report_module.show_report()
Id_input = (raw_input("Enter ID of order\n"))
with open("Bills.txt", "r") as f:
    searchlines = f.readlines()
    j = len(searchlines) - 1
    for i, line in enumerate(searchlines):
        if Id_input in str(line): # I also try to check in this way (Id_input == str(line)), but it didn't work
            k = min(i + 7, j)
            for l in searchlines[i:k]: print l,
            print
        else:
            absent_input = (raw_input("|----Order was not founded----|\n|----Press 'Enter' to continue...----|\n"))
            report_module = ReportModule()
            report_module.show_report()
Check the following code.
Id_input = (raw_input("Enter ID of order\n")).strip()
try:
    f = open("Bills.txt", "r")
    print_rows = False
    for idline in f:
        if idline.strip() == Id_input:
            print_rows = True
            continue
        if print_rows:
            if idline.startswith("["):
                print idline
            else:
                break
    if not print_rows:
        absent_input = (raw_input("|----Order was not founded----|\n|---- Press 'Enter' to continue...----|\n"))
        report_module = ReportModule()
        report_module.show_report()
except IOError:
    absent_input = (raw_input("|----File was not founded----|\n|---- Press 'Enter' to continue...----|\n"))
    report_module = ReportModule()
    report_module.show_report()
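The code above is Python 2 (raw_input, print statement). The same search logic can be sketched in Python 3 as a function that works on any iterable of lines; find_order is a hypothetical name, and the sketch assumes each order ID sits alone on its line, directly followed by its "[...]" detail lines.

```python
def find_order(lines, order_id):
    """Return the detail lines that follow the line equal to order_id.

    Returns None when no line matches the ID exactly.
    """
    details = []
    found = False
    for line in lines:
        stripped = line.strip()
        if not found:
            # Exact comparison, so "47" does not match "4735".
            found = stripped == order_id
            continue
        if stripped.startswith("["):
            details.append(stripped)
        else:
            break  # the next order ID (or anything else) ends the block
    return details if found else None


sample = [
    "4735\n",
    "['Total price: ', 1425.0]\n",
    "['Type of menu: ', 'BBQ']\n",
    "3647\n",
    "['Total price: ', 2000.0]\n",
]
print(find_order(sample, "3647"))  # ["['Total price: ', 2000.0]"]
print(find_order(sample, "47"))    # None
```

Comparing the stripped line for equality, rather than using in, is what avoids printing every line that merely contains the digits of the ID.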
