Changing values in a CSV file in Python

I have a CSV file with data like this:
reviews pos_neg
men hecne bawa duwmedim bu kitabdan. yeqin wie sunni ayrimi etmisiz men wieyem dede babadan bildiyim dualar bu kitabda bawqadi Negative
Cavidanquluzade_official instagram seyfem xeber pırogramı Negative
səhər və axşam zikrlərində 108 ci zikrin başlığının latınca yazılışı verilmiyib. Positive
Bir müslümana aid herşey var Positive
Now I need to change all "Positive" labels to "1" and all "Negative" labels to "0". Could you please help me?

If I were to write that kind of utility, I'd do something like this:
import argparse
import csv
import sys
parser = argparse.ArgumentParser()
parser.add_argument('file', type=argparse.FileType())
args = parser.parse_args()
reader = csv.DictReader(args.file)
# reuse the input's own header (e.g. 'reviews', 'pos_neg') for the output
out = csv.DictWriter(sys.stdout, reader.fieldnames)
out.writeheader()
for row in reader:
    # 'Negative' -> 0, 'Positive' -> 1 (the label's position in the list)
    row['pos_neg'] = ['Negative', 'Positive'].index(row['pos_neg'])
    out.writerow(row)
Can be used like this on the command line:
$ python converter.py the_file.csv > the_new_file.csv
You may need to adjust the CSV reader's options (for example delimiter='\t'), as it appears you may have a tab-separated (TSV) or fixed-width file.
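A minimal sketch of the same conversion for a tab-separated file, assuming the reviews/pos_neg header from the question and a single tab between the two columns (convert_rows is a hypothetical helper name). Because it works on any file-like object, it can be tried on an in-memory string:

```python
import csv
import io

def convert_rows(infile, outfile, delimiter='\t'):
    """Rewrite pos_neg labels as 0 (Negative) / 1 (Positive)."""
    labels = {'Negative': 0, 'Positive': 1}
    reader = csv.DictReader(infile, delimiter=delimiter)
    writer = csv.DictWriter(outfile, reader.fieldnames, delimiter=delimiter)
    writer.writeheader()
    for row in reader:
        # strip() tolerates stray spaces around the label
        row['pos_neg'] = labels[row['pos_neg'].strip()]
        writer.writerow(row)

# assumed sample: one row from the question, tab-separated
sample = "reviews\tpos_neg\nBir müslümana aid herşey var\tPositive\n"
out = io.StringIO()
convert_rows(io.StringIO(sample), out)
print(out.getvalue())
```

For real files, open both with newline='' as the csv module documentation recommends, e.g. open('the_file.csv', newline='', encoding='utf-8').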

Related

Get specific value within a file in python

Hello, I have a file like this:
line .....
line ...content....
SEs for parameters:
0.290391 0.273460 0.236199 0.177329 0.205789 0.221322 0.283763 0.133840 0.119349 0.161495 0.166068 0.340432 0.267828 0.211030 0.175328 0.201448 0.172427 0.244625 0.118869 0.070389 0.085757 0.121992 0.295142 0.371023 0.286122 0.114233 0.191837 0.086125 0.119095 0.061429 0.116536 0.030760 0.018447
contennn
llinnee
some stuf ...
I would like to get the last value on the line after the "SEs for parameters:" match (0.018447) and save it in a variable called Number,
so that I get:
print(Number)
0.018447
Does anyone have an idea how to do this in Python 3?
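Well, I found a solution using:
import re
with open("path/file.txt", "r") as ifile:
    for line in ifile:
        if line.startswith("SEs for parameters:"):
            # the values are on the line right after the marker
            SE = next(ifile, '').strip()
            # split on whitespace and keep the last value as a number
            Number = float(re.split(r'\s+', SE)[-1])
            break
print(Number)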
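The same approach wrapped in a small function (last_value_after is a hypothetical name) that accepts any iterable of lines, so it works on an open file or on a list of strings. str.split() with no arguments already splits on runs of whitespace, so re isn't needed here:

```python
def last_value_after(lines, marker="SEs for parameters:"):
    """Return the last whitespace-separated value on the line after `marker`."""
    lines = iter(lines)
    for line in lines:
        if line.startswith(marker):
            values = next(lines, "").split()
            # None if the marker is the last line or the next line is blank
            return float(values[-1]) if values else None
    return None

# shortened version of the sample file from the question
text = """line .....
SEs for parameters:
0.290391 0.273460 0.030760 0.018447
contennn
"""
Number = last_value_after(text.splitlines())
print(Number)  # 0.018447
```

With a file: with open("path/file.txt") as f: Number = last_value_after(f).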

How to convert Google Maps GeoJSON to GPX, retaining location names

I have exported my Google Maps points of interest (saved places / locations) via the Takeout tool. How can I convert them to GPX so that I can import them into OsmAnd?
I tried using gpsbabel:
gpsbabel -i geojson -f my-saved-locations.json -o gpx -F my-saved-locations_converted.gpx
But this did not retain the title/name of each point of interest - and instead just used names like WPT001, WPT002, etc.
In the end, I solved this by writing a small Python script to convert between the formats.
It can easily be adapted for specific needs:
#!/usr/bin/env python3
import argparse
import json
import xml.etree.ElementTree as ET
from xml.dom import minidom
def ingestJson(geoJsonFilepath):
    poiList = []
    with open(geoJsonFilepath) as fileObj:
        data = json.load(fileObj)
        for f in data["features"]:
            # GeoJSON coordinates are [longitude, latitude]
            poiList.append({'title': f["properties"]["Title"],
                            'lon': f["geometry"]["coordinates"][0],
                            'lat': f["geometry"]["coordinates"][1],
                            'link': f["properties"].get("Google Maps URL", ''),
                            'address': f["properties"].get("Location", {}).get("Address", '')})
    return poiList
def dumpGpx(gpxFilePath, poiList):
    gpx = ET.Element("gpx", version="1.1", creator="", xmlns="http://www.topografix.com/GPX/1/1")
    for poi in poiList:
        wpt = ET.SubElement(gpx, "wpt", lat=str(poi["lat"]), lon=str(poi["lon"]))
        ET.SubElement(wpt, "name").text = poi["title"]
        ET.SubElement(wpt, "desc").text = poi["address"]
        if poi["link"]:
            # GPX 1.1 puts the URL in the href attribute of <link>
            ET.SubElement(wpt, "link", href=poi["link"])
    xmlstr = minidom.parseString(ET.tostring(gpx)).toprettyxml(encoding="utf-8", indent=" ")
    with open(gpxFilePath, "wb") as f:
        f.write(xmlstr)
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--inputGeoJsonFilepath', required=True)
    parser.add_argument('--outputGpxFilepath', required=True)
    args = parser.parse_args()
    poiList = ingestJson(args.inputGeoJsonFilepath)
    dumpGpx(args.outputGpxFilepath, poiList=poiList)
if __name__ == "__main__":
    main()
...
It can be called like so:
./convert-googlemaps-geojson-to-gpx.py \
--inputGeoJsonFilepath my-saved-locations.json \
--outputGpxFilepath my-saved-locations_converted.gpx
There is also an npm package called "togpx":
https://github.com/tyrasd/togpx
I didn't try it, but it claims to keep as much information as possible.
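GeoJSON (RFC 7946) stores each position as [longitude, latitude], while GPX puts lat and lon in separate attributes, which is why the script above reads index 0 as lon. A minimal self-contained sketch of that mapping for one feature (feature_to_wpt is a hypothetical helper; the made-up feature only mimics the Title property the script reads):

```python
import xml.etree.ElementTree as ET

def feature_to_wpt(parent, feature):
    """Append a GPX <wpt> for one GeoJSON Point feature to `parent`."""
    # GeoJSON order is [lon, lat]; ignore an optional third (altitude) value
    lon, lat = feature["geometry"]["coordinates"][:2]
    wpt = ET.SubElement(parent, "wpt", lat=str(lat), lon=str(lon))
    ET.SubElement(wpt, "name").text = feature["properties"].get("Title", "")
    return wpt

gpx = ET.Element("gpx", version="1.1", creator="", xmlns="http://www.topografix.com/GPX/1/1")
feature = {"type": "Feature",
           "geometry": {"type": "Point", "coordinates": [13.405, 52.52]},
           "properties": {"Title": "Berlin"}}
feature_to_wpt(gpx, feature)
print(ET.tostring(gpx, encoding="unicode"))
```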

Python configparser reads comments in values

ConfigParser also reads inline comments as part of the values. Why? Shouldn't inline comments be ignored by default?
I reproduce my problem with the following script:
import configparser
config = configparser.ConfigParser()
config.read("C:\\_SVN\\BMO\\Source\\Server\\PythonExecutor\\Resources\\visionapplication.ini")
for section in config.sections():
    for item in config.items(section):
        print("{}={}".format(section, item))
The ini file looks as follows:
[LPI]
reference_size_mm_width = 30 ;mm
reference_size_mm_height = 40 ;mm
print_pixel_pitch_mm = 0.03525 ; mm
eye_cascade = "TBD\haarcascade_eye.xml" #
The output:
C:\_Temp>python read.py
LPI=('reference_size_mm_width', '30 ;mm')
LPI=('reference_size_mm_height', '40 ;mm')
LPI=('print_pixel_pitch_mm', '0.03525 ; mm')
LPI=('eye_cascade', '"TBD\\haarcascade_eye.xml" #')
I don't want to read '30 ;mm'; I want to read just the number '30'.
What am I doing wrong?
PS: I'm using Python 3.7.
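Hi, pass inline_comment_prefixes when creating the ConfigParser object. Since your file uses both ; and #, list both (a prefix is only treated as an inline comment when it is preceded by whitespace):
config = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
See the configparser documentation for details.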
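A self-contained check using read_string() with the INI content from the question, so no file path is needed. Note that configparser does not strip the quotes around eye_cascade; they remain part of the value:

```python
import configparser

INI = r"""
[LPI]
reference_size_mm_width = 30 ;mm
reference_size_mm_height = 40 ;mm
print_pixel_pitch_mm = 0.03525 ; mm
eye_cascade = "TBD\haarcascade_eye.xml" #
"""

# ';' and '#' both start an inline comment when preceded by whitespace
config = configparser.ConfigParser(inline_comment_prefixes=(";", "#"))
config.read_string(INI)

width = config.getint("LPI", "reference_size_mm_width")
pitch = config.getfloat("LPI", "print_pixel_pitch_mm")
cascade = config.get("LPI", "eye_cascade")
print(width, pitch, cascade)
```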

Python3 argparse

I have been struggling with this for a few days now and still don't have a good solution. Instead of posting code this time (which, for this problem, has lately led to unhelpful tangents), let me describe exactly what I am trying to accomplish; perhaps this will streamline the solution.
All I am trying to do is run a Python program while passing in a few arguments that control what the program does. Allow me to give a specific example.
Example Syntax Structure
program_name function_to_run variable_1 variable_2 variable_n
Generic Syntax Example
parrot add "Mr Fluffy" "Red" "15oz"
Another Example
datamine search "Chris"
To expand on these examples: the first program, "parrot", has an add function. When the program is run with the add function from the command line, it expects three arguments (name, color, weight). In the second example, the program named "datamine" has a function named "search" that expects a single string (the search term). The idea is that a program (datamine, for example) will have several functions that can be used, say "add", "search", and "delete", and each will expect different arguments. Running datamine help would list each function along with its required and/or optional arguments.
Using argparse, I have not been able to figure out a working implementation of this yet. From past experience, I think the solution will involve custom actions. Can anyone please help with some example code? I am using Python 3, by the way.
Thanks for the help!
Use subparsers. The docs give a good example of how to use set_defaults to specify the function that should be called for each subparser:
One particularly effective way of handling sub-commands is to combine the use of the add_subparsers() method with calls to set_defaults() so that each subparser knows which Python function it should execute.
In your examples, parrot and datamine would be separate parsers in separate modules, and add and search would be subparsers under them respectively. For example, the datamine module would look something like this:
#!/usr/bin/env python
# datamine
import argparse
def add(a, b):
    print(a + b)
def search(query, search_all=True):
    # run_my_search_app is a placeholder for your own search code
    run_my_search_app(query, search_all=search_all)
if __name__ == '__main__':
    # create the top-level parser
    parser = argparse.ArgumentParser()
    # dest/required (Python 3.7+) make argparse reject a missing command
    subparsers = parser.add_subparsers(dest='command', required=True)
    # create the parser for the "add" command
    parser_add = subparsers.add_parser('add')
    parser_add.add_argument('-a', type=int, default=1)
    parser_add.add_argument('-b', type=int, default=2)
    parser_add.set_defaults(func=add)
    # create the parser for the "search" command
    parser_search = subparsers.add_parser('search')
    parser_search.add_argument('query')
    parser_search.add_argument('--search-all', action='store_true')
    parser_search.set_defaults(func=search)
    args = vars(parser.parse_args())
    args.pop('command')  # the subcommand name is not a parameter of add/search
    func = args.pop('func')
    func(**args)
If this file is executable in your shell as datamine, you can do:
datamine add -a 11 -b 5
datamine search foo --search-all
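Because parse_args() also accepts an explicit list of strings, the same dispatch can be exercised without a shell. A sketch under that idea (build_parser and run are hypothetical names, and search here just returns a string instead of calling a real search backend):

```python
import argparse

def add(a, b):
    return a + b

def search(query, search_all=False):
    # hypothetical stand-in for a real search backend
    return f"searching {query!r} (all={search_all})"

def build_parser():
    parser = argparse.ArgumentParser(prog="datamine")
    subparsers = parser.add_subparsers(dest="command", required=True)
    p_add = subparsers.add_parser("add")
    p_add.add_argument("-a", type=int, default=1)
    p_add.add_argument("-b", type=int, default=2)
    p_add.set_defaults(func=add)
    p_search = subparsers.add_parser("search")
    p_search.add_argument("query")
    p_search.add_argument("--search-all", action="store_true")
    p_search.set_defaults(func=search)
    return parser

def run(argv):
    """Parse argv, then call the function the chosen subparser registered."""
    args = vars(build_parser().parse_args(argv))
    args.pop("command")
    func = args.pop("func")
    return func(**args)

print(run(["add", "-a", "11", "-b", "5"]))
print(run(["search", "foo", "--search-all"]))
```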
Without optional flags you don't need anything fancy; just look at sys.argv directly:
import sys
def my_add(*args):
    print(','.join(args))
def my_search(*args):
    print(args)
fn_map = {"add": my_add, "search": my_search}
if sys.argv[1:]:
    fn = fn_map[sys.argv[1]]
    rest = sys.argv[2:]
    fn(*rest)
Sample runs:
1951:~/mypy$ python stack43990444.py
1951:~/mypy$ python stack43990444.py add "Mr Fluffy" "Red" "15oz"
Mr Fluffy,Red,15oz
1951:~/mypy$ python stack43990444.py search "Chris"
('Chris',)
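As written, an unknown command raises KeyError. A sketch of the same sys.argv dispatch that reports unknown commands instead and returns results so it can be tested (dispatch and FN_MAP are hypothetical names):

```python
import sys

def my_add(*args):
    return ','.join(args)

def my_search(*args):
    return args

FN_MAP = {"add": my_add, "search": my_search}

def dispatch(argv):
    """Look up argv[0] in FN_MAP and call it with the remaining arguments."""
    if not argv:
        return None
    fn = FN_MAP.get(argv[0])
    if fn is None:
        # unknown command: exit with a message instead of a KeyError traceback
        sys.exit(f"unknown command {argv[0]!r}; choose from {', '.join(FN_MAP)}")
    return fn(*argv[1:])

if __name__ == "__main__":
    result = dispatch(sys.argv[1:])
    if result is not None:
        print(result)
```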
Here is a fully functional extrapolation of your parrot example using subparsers. The data set (created by this code) and usage examples are at the bottom. Beware: the example set does not consist strictly of parrots.
#!/usr/bin/env python3
import argparse
import json
def load_parrots():
    # start with an empty list if parrots.json doesn't exist yet
    try:
        with open('parrots.json', 'r') as parrotdb:
            return json.load(parrotdb)
    except FileNotFoundError:
        return []
def save_parrots(parrots):
    with open('parrots.json', 'w') as parrotdb:
        json.dump(parrots, parrotdb)
def add_parrot(name, weight, kind, **kwargs):
    print("Adding {} of type {} and size {}".format(name, kind, weight))
    parrots = load_parrots()
    parrots.append({'name': name, 'weight': weight, 'type': kind})
    save_parrots(parrots)
def delete_parrot(name, **kwargs):
    print("Uh oh! What happened to {}?".format(name))
    parrots = [p for p in load_parrots() if p.get('name') != name]
    save_parrots(parrots)
def show_parrots(name=None, weight=None, kind=None, **kwargs):
    for p in load_parrots():
        if name or weight or kind:
            # print parrots matching at least one of the given filters
            if (name and name in p['name']) or weight == p['weight'] or kind == p['type']:
                print("{}\t{}\t{}".format(p['name'], p['weight'], p['type']))
        else:
            print("{}\t{}\t{}".format(p['name'], p['weight'], p['type']))
parser = argparse.ArgumentParser(description="Manage Parrots")
subparsers = parser.add_subparsers()
add_parser = subparsers.add_parser('insert', aliases=['add', 'a'])
add_parser.add_argument('name')
add_parser.add_argument('weight', type=int)
add_parser.add_argument('kind')
add_parser.set_defaults(func=add_parrot)
del_parser = subparsers.add_parser("delete", aliases=['del', 'd'])
del_parser.add_argument('name')
del_parser.set_defaults(func=delete_parrot)
ls_parser = subparsers.add_parser('list', aliases=['show', 'ls'])
ls_parser.add_argument('--name')
# dest='weight' so the value reaches show_parrots' weight parameter
ls_parser.add_argument('--size', type=int, dest='weight')
ls_parser.add_argument('--type', dest='kind')
ls_parser.set_defaults(func=show_parrots)
args = parser.parse_args()
if hasattr(args, 'func'):
    args.func(**vars(args))
else:
    # no subcommand given
    parser.print_help()
Dataset and usage examples:
➜ ~ cat parrots.json
[{"name": "tweety", "weight": 4, "type": "yellow"}, {"name": "donald", "weight": 18, "type": "white"}, {"name": "daffy", "weight": 12, "type": "black"}]
➜ ~ ./parrot.py ls
tweety 4 yellow
donald 18 white
daffy 12 black
➜ ~ ./parrot.py ls --name tweety
tweety 4 yellow
➜ ~ ./parrot.py delete tweety
Uh oh! What happened to tweety?
➜ ~ ./parrot.py ls --name tweety
➜ ~

BeautifulSoup in Python not parsing right

I am running Python 2.7.5 and using the built-in html parser for what I am about to describe.
The task I am trying to accomplish is to take a chunk of HTML that is essentially a recipe. Here is an example:
html_chunk = "<h1>Miniature Potato Knishes</h1><p>Posted by bettyboop50 at recipegoldmine.com May 10, 2001</p><p>Makes about 42 miniature knishes</p><p>These are just yummy for your tummy!</p><p>3 cups mashed potatoes (about<br> 2 very large potatoes)<br>2 eggs, slightly beaten<br>1 large onion, diced<br>2 tablespoons margarine<br>1 teaspoon salt (or to taste)<br>1/8 teaspoon black pepper<br>3/8 cup Matzoh meal<br>1 egg yolk, beaten with 1 tablespoon water</p><p>Preheat oven to 400 degrees F.</p><p>Sauté diced onion in a small amount of butter or margarine until golden brown.</p><p>In medium bowl, combine mashed potatoes, sautéed onion, eggs, margarine, salt, pepper, and Matzoh meal.</p><p>Form mixture into small balls about the size of a walnut. Brush with egg yolk mixture and place on a well-greased baking sheet and bake for 20 minutes or until well browned.</p>"
The goal is to separate out the header, junk, ingredients, instructions, serving, and number of ingredients.
Here is my code that accomplishes that:
from bs4 import BeautifulSoup
def list_to_string(items):
    joined = ""
    for item in items:
        joined += str(item)
    return joined
def get_ingredients(soup):
    # the ingredients paragraph is the first <p> containing <br> tags
    for p in soup.find_all('p'):
        if p.find('br'):
            return p
def get_instructions(p_list, ingredient_index):
    instructions = []
    instructions += p_list[ingredient_index+1:]
    return instructions
def get_junk(p_list, ingredient_index):
    junk = []
    junk += p_list[:ingredient_index]
    return junk
def get_serving(p_list):
    for item in p_list:
        item_str = str(item).lower()
        # test each keyword separately
        if any(word in item_str for word in ("yield", "make", "serve", "serving")):
            yield_index = p_list.index(item)
            del p_list[yield_index]
            return item
def ingredients_count(ingredients):
    ingredients_list = ingredients.find_all(text=True)
    return len(ingredients_list)
def get_header(soup):
    return soup.find('h1')
def html_chunk_splitter(soup):
    ingredients = get_ingredients(soup)
    if ingredients is None:
        error = 1
        header = ""
        junk_string = ""
        instructions_string = ""
        serving = ""
        count = ""
    else:
        p_list = soup.find_all('p')
        serving = get_serving(p_list)
        ingredient_index = p_list.index(ingredients)
        junk_list = get_junk(p_list, ingredient_index)
        instructions_list = get_instructions(p_list, ingredient_index)
        junk_string = list_to_string(junk_list)
        instructions_string = list_to_string(instructions_list)
        header = get_header(soup)
        error = ""
        count = ingredients_count(ingredients)
    return (header, junk_string, ingredients, instructions_string,
            serving, count, error)
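A condition of the form ("yield" or "make" or "serve" or "serving") in item_str only ever tests for "yield", because or returns its first truthy operand. A quick stdlib check of the difference, using the lower-cased "Makes about 42 miniature knishes" paragraph from the sample:

```python
KEYWORDS = ("yield", "make", "serve", "serving")

text = "makes about 42 miniature knishes"

# ("yield" or "make" or ...) evaluates to just "yield"
buggy = ("yield" or "make" or "serve" or "serving") in text
# any() tests every keyword
fixed = any(word in text for word in KEYWORDS)
print(buggy, fixed)  # False True
```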
It works well except for chunks that contain strings like "Sauté": soup = BeautifulSoup(html_chunk) turns Sauté into garbled characters. This is a problem because I have a huge CSV file of recipes like html_chunk, and I'm trying to structure all of them nicely and then load the output back into a database. I checked whether Sauté comes out right using an HTML previewer, and it still comes out garbled. I don't know what to do about this.
What's stranger is that when I do what BeautifulSoup's documentation shows:
BeautifulSoup("Sacré bleu!")
# <html><head></head><body>Sacré bleu!</body></html>
I get garbled output instead, with the é mangled.
But my colleague tried that on his Mac, running from the terminal, and he got exactly what the documentation shows.
I really appreciate all your help. Thank you.
This is not a parsing problem; it is an encoding problem.
Whenever you work with text that might contain non-ASCII characters (or write Python programs that contain such characters, e.g. in comments or docstrings), you should put a coding cookie on the first line or, if there is a shebang line, on the second line. This matters in Python 2, where source files are assumed to be ASCII by default:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Make sure the cookie matches the file's actual encoding (in vim: :set fenc=utf-8).
BeautifulSoup tries to guess the encoding, but sometimes it guesses wrong. You can specify the encoding with the from_encoding parameter; note that it only has an effect when you pass bytes, not an already-decoded Unicode string.
For example:
soup = BeautifulSoup(html_text, from_encoding="UTF-8")
The encoding is usually declared by the web page itself, for example in the charset of the HTTP Content-Type header or in a <meta charset> tag.
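One common way this kind of garbling arises (an assumption about your setup, not a diagnosis) is UTF-8 bytes being decoded as Latin-1. A stdlib-only sketch of the effect and of reversing it:

```python
original = "Sauté"

# UTF-8 bytes decoded with the wrong codec produce mojibake
garbled = original.encode("utf-8").decode("latin-1")
print(garbled)  # SautÃ©

# if you already have garbled text, undoing the wrong step recovers it
repaired = garbled.encode("latin-1").decode("utf-8")
print(repaired)  # Sauté
```

With BeautifulSoup, the cleaner fix is to avoid the wrong decode in the first place: pass the raw bytes together with from_encoding, or decode the bytes yourself with the correct codec before parsing.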
