Python regex to capture a comma-delimited list of items - python

I have a list of weather forecasts that start with a similar prefix that I'd like to remove. I'd also like to capture the city names:
Some examples:
If you have vacation or wedding plans in Phoenix, Tucson, Flagstaff,
Salt Lake City, Park City, Denver, Estes Park, Colorado Springs,
Pueblo, or Albuquerque, the week will...
If you have vacation or wedding plans for Miami, Jacksonville, Macon,
Charlotte, or Charleston, expect a couple systems...
If you have vacation or wedding plans in Pittsburgh, Philadelphia,
Atlantic City, Newark, Baltimore, D.C., Richmond, Charleston, or
Dover, expect the week...
The strings start with a common prefix "If you have vacation or wedding plans in" and the last city has "or" before it. The list of cities is of variable length.
I've tried this:
>>> text = 'If you have vacation or wedding plans in NYC, Boston, Manchester, Concord, Providence, or Portland'
>>> re.search(r'^If you have vacation or wedding plans in ((\b\w+\b), ?)+ or (\w+)', text).groups()
('Providence,', 'Providence', 'Portland')
>>>
I think I'm pretty close, but obviously it's not working. I've never tried to do something with a variable number of captured items; any guidance would be greatly appreciated.
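The attempt above returns only 'Providence' because a repeated capturing group reports only its last repetition. The following is a minimal sketch of that behavior and of one common workaround: capture the whole run in a single group, then use re.findall on it. It uses the sample text from the question and assumes single-word city names.

```python
import re

text = ('If you have vacation or wedding plans in NYC, Boston, '
        'Manchester, Concord, Providence, or Portland')

# A quantified group such as ((\w+), ?)+ is reported only once: re keeps
# the text of its LAST repetition, so group(2) is just 'Providence'.
m = re.search(r'((\w+), ?)+ or (\w+)', text)
last_repetition = m.group(2)
print(last_repetition)  # Providence

# Workaround: capture the whole run in ONE group, then pull every item
# out of it with re.findall, which returns all non-overlapping matches.
m = re.search(r'plans (?:in|for) ((?:\w+, )+)or (\w+)', text)
cities = re.findall(r'(\w+), ', m.group(1)) + [m.group(2)]
print(cities)  # ['NYC', 'Boston', 'Manchester', 'Concord', 'Providence', 'Portland']
```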

Alternative solution here (probably just for sharing and educational purposes).
If you were to solve it with nltk, it would be called a Named Entity Recognition problem. Using a snippet based on nltk.chunk.ne_chunk_sents():
import nltk

# First run only: download the tokenizer, tagger and NE chunker data, e.g.
# nltk.download('punkt'); nltk.download('averaged_perceptron_tagger')
# nltk.download('maxent_ne_chunker'); nltk.download('words')
# (resource names may differ in newer nltk releases)

def extract_entity_names(t):
    """Recursively collect the words of every 'NE' subtree."""
    entity_names = []
    if hasattr(t, 'label') and t.label:
        if t.label() == 'NE':
            entity_names.append(' '.join([child[0] for child in t]))
        else:
            for child in t:
                entity_names.extend(extract_entity_names(child))
    return entity_names

sample = "If you have vacation or wedding plans in Phoenix, Tucson, Flagstaff, Salt Lake City, Park City, Denver, Estes Park, Colorado Springs, Pueblo, or Albuquerque, the week will..."
sentences = nltk.sent_tokenize(sample)
tokenized_sentences = [nltk.word_tokenize(sentence) for sentence in sentences]
tagged_sentences = [nltk.pos_tag(sentence) for sentence in tokenized_sentences]
# binary=True labels every named entity 'NE' instead of PERSON, GPE, etc.
chunked_sentences = nltk.ne_chunk_sents(tagged_sentences, binary=True)
entity_names = []
for tree in chunked_sentences:
    entity_names.extend(extract_entity_names(tree))
print(entity_names)
It prints exactly the desired result:
['Phoenix', 'Tucson', 'Flagstaff', 'Salt Lake City', 'Park City', 'Denver', 'Estes Park', 'Colorado Springs', 'Pueblo', 'Albuquerque']

Here is my approach: use the csv module to parse the lines (I assume they are in a text file named data.csv, one forecast per line; please change this to suit your situation). After parsing each line:
Discard the last cell; it is not a city name
Remove 'If ...' from the first cell
Remove 'or ' from the last cell (which used to be the next-to-last)
Here is the code:
import csv

def cleanup(row):
    new_row = row[:-1]  # drop the last cell: it is the rest of the forecast
    new_row[0] = new_row[0].replace('If you have vacation or wedding plans in ', '')
    new_row[0] = new_row[0].replace('If you have vacation or wedding plans for ', '')
    # remove only a leading 'or ', so a name that contains 'or ' is not mangled
    if new_row[-1].startswith('or '):
        new_row[-1] = new_row[-1][len('or '):]
    return new_row

if __name__ == '__main__':
    with open('data.csv', newline='') as f:
        reader = csv.reader(f, skipinitialspace=True)
        for row in reader:
            row = cleanup(row)
            print(row)
Output:
['Phoenix', 'Tucson', 'Flagstaff', 'Salt Lake City', 'Park City', 'Denver', 'Estes Park', 'Colorado Springs', 'Pueblo', 'Albuquerque']
['Miami', 'Jacksonville', 'Macon', 'Charlotte', 'Charleston']
['Pittsburgh', 'Philadelphia', 'Atlantic City', 'Newark', 'Baltimore', 'D.C.', 'Richmond', 'Charleston', 'Dover']
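To try this approach without creating data.csv, note that csv.reader accepts any iterable of lines. The sketch below feeds it two of the example forecasts through io.StringIO; the sample text and the PREFIXES tuple are illustrative assumptions, and the startswith checks are a slightly stricter variant of the replace() calls.

```python
import csv
import io

# Two of the example forecasts, one per line, standing in for data.csv.
SAMPLE = io.StringIO(
    "If you have vacation or wedding plans for Miami, Jacksonville, Macon, "
    "Charlotte, or Charleston, expect a couple systems...\n"
    "If you have vacation or wedding plans in Pittsburgh, Philadelphia, "
    "Atlantic City, Newark, Baltimore, D.C., Richmond, Charleston, or "
    "Dover, expect the week...\n"
)

PREFIXES = ("If you have vacation or wedding plans in ",
            "If you have vacation or wedding plans for ")

def cleanup(row):
    new_row = row[:-1]  # the last cell is the rest of the sentence
    for prefix in PREFIXES:
        if new_row[0].startswith(prefix):
            new_row[0] = new_row[0][len(prefix):]
    if new_row[-1].startswith("or "):  # strip only a leading "or "
        new_row[-1] = new_row[-1][len("or "):]
    return new_row

# skipinitialspace=True drops the space that follows each comma
rows = [cleanup(row) for row in csv.reader(SAMPLE, skipinitialspace=True)]
for row in rows:
    print(row)
```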

import re
s = "If you have vacation or wedding plans for Miami, Jacksonville, Macon, Charlotte, or Charleston, expect a couple systems"
p = re.compile(r"If you have vacation or wedding plans (in|for) ((\w+, )+)or (\w+)")
m = p.match(s)
print(m.group(2))  # output: Miami, Jacksonville, Macon, Charlotte,
cities = m.group(2).split(", ")  # cities = ['Miami', 'Jacksonville', 'Macon', 'Charlotte', '']
cities[-1] = m.group(4)  # replace the empty last item with the city after "or"
print(cities)  # cities = ['Miami', 'Jacksonville', 'Macon', 'Charlotte', 'Charleston']
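Each city before the "or" is matched by the repeated group (\w+, ), and the last one by or (\w+); the captured run is then split on ", ". Note that \w+ only matches single words, so multi-word names such as "Salt Lake City" or names with dots such as "D.C." will not match this pattern.
Since the same pattern is applied to a lot of data, it is better to compile it once and reuse the compiled object.
PS: according to the examples you provided, the word after "plans" can be either "for" or "in".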
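Because \w+ stops at spaces and dots, a variant that accepts multi-word names can capture everything up to ", or" lazily and then split on commas. This is a minimal sketch that assumes the list ends in "..., or <last city>" and that no city name contains a comma or the word " or "; PATTERN and extract_cities are illustrative names.

```python
import re

# (?:in|for) covers both wordings seen in the examples.
PATTERN = re.compile(
    r"If you have vacation or wedding plans (?:in|for) (.*?),? or ([^,]+)"
)

def extract_cities(text):
    """Return the listed cities, allowing names like 'Salt Lake City'.

    Assumes the list ends in '..., or <last city>' and that no city name
    contains a comma or the word ' or '.
    """
    m = PATTERN.match(text)
    if m is None:
        return []
    head, last = m.groups()  # lazy (.*?) stops at the first ', or '
    return [city.strip() for city in head.split(",")] + [last.strip()]

print(extract_cities(
    "If you have vacation or wedding plans in Pittsburgh, Philadelphia, "
    "Atlantic City, Newark, Baltimore, D.C., Richmond, Charleston, or "
    "Dover, expect the week..."
))
```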

How about this:
>>> text = 'If you have vacation or wedding plans for Phoenix, Tucson, Flagstaff, Salt Lake City, Park City, Denver, Estes Park, Colorado Springs, Pueblo, or Albuquerque, the week will'
>>> match = re.search(r'^If you have vacation or wedding plans (in|for) ([\w ,]+)',text).groups()[1].split(", ")
Output
>>> match
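['Phoenix', 'Tucson', 'Flagstaff', 'Salt Lake City', 'Park City', 'Denver', 'Estes Park', 'Colorado Springs', 'Pueblo', 'or Albuquerque', 'the week will']
Note that the last city still carries its "or " prefix and the text after it ('the week will') is kept, so both still need to be cleaned up.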
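This pattern leaves "or " on the last city and keeps whatever follows it, so a small post-processing step can finish the job. The following sketch assumes the last city is the first item starting with "or " and that everything after it is not a city; trim_at_or is a hypothetical helper name.

```python
import re

text = ('If you have vacation or wedding plans for Phoenix, Tucson, Flagstaff, '
        'Salt Lake City, Park City, Denver, Estes Park, Colorado Springs, '
        'Pueblo, or Albuquerque, the week will')

items = re.search(r'^If you have vacation or wedding plans (?:in|for) ([\w ,]+)',
                  text).group(1).split(', ')

def trim_at_or(items):
    """Drop 'or ' from the last city and discard anything after it."""
    for i, item in enumerate(items):
        if item.startswith('or '):
            return items[:i] + [item[len('or '):]]
    return items  # no 'or ' item found: leave the list unchanged

print(trim_at_or(items))
```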

Related

Sort a list alphabetically by last name, and by book title when the names are the same

I'm trying to sort a list of author names and books, like the one below, by the authors' last names. Does anyone know how to get the index of the word right before the ',' delimiter? Those are the last names.
I need to put that index value in the lambda x: x[here].
Also, if two entries have the same author, how do I order them alphabetically by book title?
name_list= ["Dan Brown,The Da Vinci Code",
"Cornelia Funke,Inkheart",
"H G Wells,The War Of The Worlds",
"William Goldman,The Princess Bride",
"Harper Lee,To Kill a Mockingbird",
"Gary Paulsen,Hatchet",
"Jodi Picoult,My Sister's Keeper",
"Philip Pullman,The Golden Compass",
"J R R Tolkien,The Lord of the Rings",
"J R R Tolkien,The Hobbit",
"J.K. Rowling,Harry Potter Series",
"C S Lewis,The Lion the Witch and the Wardrobe",
"Louis Sachar,Holes",
"F. Scott Fitzgerald,The Great Gatsby",
"Eric Walters,Shattered",
"John Wyndham,The Chrysalids"]
def sorting(name):
    last_name = []
    name_list = book_rec(name)
    for i in name_list:
        last_name.append(i.split())
    name_list = []
    for i in sorted(last_name, key=lambda x: x[]):  # what index goes here?
        name_list.append(' '.join(i))
    return name_list
Split on the comma and keep the first part (the author); then split that on whitespace and keep the last word (the last name):
name_list.sort(key=lambda x: x.split(',')[0].split()[-1])
If you also want to sort by book title when the author's last name is the same, it's better to use a function that returns the key:
def sorting_key(author_title):
    # split only on the first comma, in case a title contains one
    author, title = author_title.split(',', 1)
    # first by author last name, then by book title
    return author.split()[-1], title

name_list.sort(key=sorting_key)
print(name_list)
Output:
['Dan Brown,The Da Vinci Code',
'F. Scott Fitzgerald,The Great Gatsby',
'Cornelia Funke,Inkheart',
'William Goldman,The Princess Bride',
'Harper Lee,To Kill a Mockingbird',
'C S Lewis,The Lion the Witch and the Wardrobe',
'Gary Paulsen,Hatchet',
"Jodi Picoult,My Sister's Keeper",
'Philip Pullman,The Golden Compass',
'J.K. Rowling,Harry Potter Series',
'Louis Sachar,Holes',
'J R R Tolkien,The Hobbit',
'J R R Tolkien,The Lord of the Rings',
'Eric Walters,Shattered',
'H G Wells,The War Of The Worlds',
'John Wyndham,The Chrysalids']
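The tuple key works because Python compares tuples element by element, so the title is only consulted when the last names tie. Since list.sort is stable, the same order can also be obtained with two passes: secondary key first, primary key last. This is a short sketch on a subset of the list; the lambda keys are illustrative.

```python
name_list = ["J R R Tolkien,The Lord of the Rings",
             "Dan Brown,The Da Vinci Code",
             "J R R Tolkien,The Hobbit",
             "H G Wells,The War Of The Worlds"]

def sorting_key(author_title):
    author, title = author_title.split(',', 1)
    return author.split()[-1], title

print(sorting_key("J R R Tolkien,The Hobbit"))  # ('Tolkien', 'The Hobbit')

# One pass with a tuple key...
one_pass = sorted(name_list, key=sorting_key)

# ...or two stable passes: sort by title, then by last name.
two_pass = sorted(name_list, key=lambda s: s.split(',', 1)[1])
two_pass.sort(key=lambda s: s.split(',', 1)[0].split()[-1])

print(one_pass == two_pass)  # True
```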

Obtaining A List of Car Brands from Wikipedia Using Beautiful Soup

What I want to do is scrape all the car makes for each country off the Wikipedia page and get them into a dictionary.
https://en.wikipedia.org/wiki/List_of_car_brands
Right now I am using BeautifulSoup, and I found that each of these cars sits inside ul and li tags, but I am not sure how to select them specifically. Where the cars come from isn't really relevant; I just want to get all of them into a list.
My current code gets all of the li tags on the page, but I'm not sure how to narrow the selection down.
# Import packages
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Specify url of the web page
source = urlopen('https://en.wikipedia.org/wiki/List_of_car_brands').read()
# Make a soup
soup = BeautifulSoup(source,'lxml')
for link in soup.find_all("li"):
    print(link)
Any direction or idea of what I can do would be much appreciated. Cheers!
Flat list with cars
Select all <li> elements inside <ul> elements that are following siblings of an <h2>, take their text (dropping the citation markers), and slice the result at the first list entry of the 'See also' section:
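cars = [li.text.split('[')[0] for li in soup.select('h2~ul li')]
# 'Timeline of motor vehicle brands' is the first entry under 'See also'
cars = cars[:cars.index('Timeline of motor vehicle brands')]
print(cars)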
Output
['Zanella (1948–present)', 'Anasagasti (1911–1915)', 'Andino (1967–1973)', 'ASA (1961– 1969)', 'Eniak (1983–1989)', 'Hispano-Argentina (1925–1953)', 'Industrias Aeronáuticas y Mecánicas del Estado (IAME, Mechanical Aircraft Industries of the State, 1951–1979), not to be confused with Italian American Motor Engineering', 'Industrias Kaiser Argentina (IKA, 1956–1975), United Kingdom', 'Alpha Sports (1963–present)', 'Arrow (1963–present)',...]
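The answers here remove citations with split('[')[0], which keeps only the text before the first '['. When an item's text spans several nested entries joined by '\n' (as in some of the outputs below), everything after the first citation is lost. The following is a hedged alternative that deletes just the bracketed markers. It assumes citations look like [12] or [citation needed] and that brand names never contain square brackets; strip_citations is a hypothetical helper.

```python
import re

# Assumption: every [...] span is a citation marker, never part of a name.
CITATION = re.compile(r"\[[^\]]*\]")

def strip_citations(text):
    """Remove every bracketed marker, keeping the text that follows it."""
    return CITATION.sub("", text).strip()

sample = "A (1990)[2]\nB (2000)"
naive = sample.split("[")[0]     # 'A (1990)': the B entry is lost
clean = strip_citations(sample)  # 'A (1990)\nB (2000)'
print(repr(naive), repr(clean))
```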
List of dicts with cars and country
This takes a bit more work, but the idea is nearly the same: select every <h2> that has a <ul> among its following siblings, then walk through its following siblings until the next <h2> is reached:
data = []
for h in soup.select('h2:has(~ul)'):
    cars = []
    # collect list items until the next section heading
    for tag in h.next_siblings:
        if tag.name == 'ul':
            for x in tag.text.split('\n'):
                cars.append(x.split('[')[0])  # drop citation markers
        elif tag.name == 'h2':
            break
    if 'See also' not in h.text:
        data.append({
            'country': h.text.split('[')[0],
            'cars': cars
        })
print(data)
Output
[{'country': 'Argentina', 'cars': ['Zanella (1948–present)', 'Anasagasti (1911–1915)', 'Andino (1967–1973)', 'ASA (1961– 1969)', 'Eniak (1983–1989)', 'Hispano-Argentina (1925–1953)', 'Industrias Aeronáuticas y Mecánicas del Estado (IAME, Mechanical Aircraft Industries of the State, 1951–1979), not to be confused with Italian American Motor Engineering', 'Industrias Kaiser Argentina (IKA, 1956–1975), United Kingdom']}, {'country': 'Australia', 'cars': ['Alpha Sports (1963–present)', 'Arrow (1963–present)', 'Birchfield (2003–present)', 'Bolwell (1979–present)', 'Borland Racing Developments (1984–present)', 'Bufori (1986–present)', 'Bullet (1996–present)', 'Carbontech (1999–present)', 'Daytona (2002–present)', 'Devaux (2001–present)', 'DRB Sports Cars (1997–present)', 'Elfin Cars (1958–present)', 'Finch Restorations (1965–present)', 'Jacer (1995–present)', 'Joss Developments (2004–present)', 'McKernan (2012–present)', 'Minetti Sports Cars (2003–present)', 'Nota (1955–present)', 'PRB (1978–present)', 'Puma Clubman (1998–present)', 'Python (1981–present)', 'Quantum (2015–present)', 'Roaring Forties (1997–present)', 'Spartan-V (2004–present)', 'Stealth Special Vehicles (2004–present)', 'Stohr Cars (1991–present)', 'Ascort (1958–1960)', 'Austin (1954–1983)', 'Australian Six (1919–1930)', 'Australis (1897–1907)', 'Birchfield (2003–2004)', 'Blade (2008–2013)', 'Buchanan', 'Buckle (1955–1959)', 'Bush Ranger (1977–2016)', 'Caldwell Vale (1907–1913)', 'Cheetah', 'Chrysler (1957–1981)', 'Ford (1925–2016) (continues as a brand applied to imported cars)', 'FPV (2002–2014)', 'Giocattolo (1986–1989)', 'Goggomobil (1958–1961)', 'Hartnett (1949–1955)', 'Holden (1948–2017) (continues as a brand applied to imported cars)', 'HSV (1987–2017)', 'Honda', 'Ilinga (1974-1975)', 'Kaditcha', 'Leyland (1973–1982)', 'Lloyd-Hartnett (1957–1962)', 'Lonsdale (1982–1983) (Cars produced and exported by Mitsubishi Australia and sold in the UK by the Colt Car Company under the Lonsdale brand.)', 
'Mercedes-Benz (1890–present)', 'Mitsubishi (1980–2008) (The brand continued to be used in Australia for fully imported cars after 2008.)', 'Morris (1947–1973)', 'Nissan (1983–1992) (The brand continued to be used in Australia for fully imported cars after 1992.)', 'Pellandini (1970–1978)', 'Pioneer', 'Purvis Eureka (1974–1991)', 'Shrike (1988–1989)', 'Southern Cross (1931–1935)', 'Statesman (1971–1984)', 'Tarrant (1900–1907)', 'Toyota, Australian production finished (1963–2017)', 'Volkswagen', 'Zeta (1963–1965)']}, {'country': 'Austria', 'cars': ['Eurostar Automobilwerk', 'KTM', 'Magna Steyr', 'ÖAF', 'Puch', 'Steyr Motors GmbH', 'Rosenbauer', 'Tushek&Spigel Supercars', 'Austro-Daimler (1889–1934)', 'Austro-Tatra (1934–1948)', 'Custoca (also known as Custoka) (1966–1988)', 'Denzel (1948–1959)', 'Felber Autoroller (1952–1953)', 'Gräf & Stift (1902–2001)', 'Grofri (1921–1931)', 'Libelle (1952–1954)', 'Lohner-Porsche (1900–1905)', 'Möve 101', 'Steyr automobile', 'Steyr-Daimler-Puch']}, {'country': 'Azerbaijan', 'cars': ['GA (1986–present)', 'Khazar (2018–present)', 'NAZ (2010–present)', 'Aziz (2005–2010)']}, {'country': 'Belgium', 'cars': ['Ecar (2015–present)', 'Edran (1984–present)', 'Gillet (1982–present)', 'Imperia Automobiles (2008–present)', 'ADK (1930)', 'Alatac (1913–1914)', 'Alberta (1906)', 'Alfa Legia (1914)', 'ALP (1920)', 'Altona (1946)', 'AMA (1913)', 'Antoine (1903)', "d'Aoust (1927)", 'Apal (1998)', 'Aquila (1903)', 'Astra (1931)', 'ATA', 'Auto Garage (1911)', 'Auto-Mixte (1906–1912)', 'Avior (1947)', 'Bastin (1909)', 'Beckett & Farlow (1908)', 'Belga (1921)', 'Belga-Rise (1935)', 'Belgica (1909)', 'Bercley (1900)', 'Bovy (1914)', 'Cambier (1898)', 'CAP (1914)', 'Catala (1914)', 'CIE (1898)', "CLA (Compagnie Liégeoise d'Automobiles) (1901)", 'Claeys-Flandria (1955)', 'Coune (1947)', 'Cyclecars R&D (1921)', 'Dasse (1924)', 'De Cosmo (1908)', 'De Wandre (1923)', 'DéChamps (1906)', 'Delecroix (1899)', 'Delin (1901)', 'Direct (1905)', 'Dyle & Bacalan 
(1906)', 'Escol (1938)', 'Excelsior (1904–1932)', 'Fab (1914)', 'FD (1925)', 'Fif (1914)', 'Flaid (1921)', 'FN (1935)', 'Fondu (1912)', 'Frenay (1914)', 'Germain (1901)', 'Hermes (1909)', 'Hermes-Mathis (1914)', 'Imperia (1906–1948)', 'Imperia-Abadal (1913–1917)', 'Jeecy-Vea (1926)', 'Jenatzy (Société Générale des Transports Automobiles) (1905)', 'Juwel (1928)', 'Kleinstwagen (1952)', 'Knap (moved to France in 1899 or 1900) (1909)', 'L&B', 'Linon (1914)', 'Loza (1925)', 'Matthieu (1906)', 'Matthys Frères et Osy (1927)', 'Mécanique et Moteurs (1906)', 'Meeussen (1972)', 'Métallurgique (1913)', 'Miesse (1926)', 'Minerva (1939)', 'Nagant (1927)', 'Oracle (2005)', 'P.L.M. (Keller) (1955)', 'P-M (1924)', 'Peterill (1899)', 'Pieper (1903)', 'Pipe (1922)', 'R.A.L. (1914)', 'Ranger (General Motors brand) (1970–1978)', 'Royal Star (1910)', 'Rumpf (1899)', 'S.C.H. (1928)', 'Sava (1923)', 'SOMEA', 'Speedsport (1927)', 'Springuel (1912)', 'Taunton (1922)', 'Turner-Miesse (1913)', 'Vanclee (1989)', 'Vincke (1905)', 'Vivinus (1912)', 'Widi (1960)', 'Wilford (1901)', 'Zelensis (1962)']},...]
One approach is to get all list items with no class attribute (which skips the table of contents at the top), then cut the list off at the first occurrence of 'Timeline of motor vehicle brands':
from urllib.request import urlopen
from bs4 import BeautifulSoup
# Specify url of the web page
source = urlopen('https://en.wikipedia.org/wiki/List_of_car_brands').read()
# Make a soup
# name a parser explicitly; otherwise bs4 warns and picks one itself
soup = BeautifulSoup(source, 'html.parser')
# get all list items that have no class attribute
list_items = [i.get_text() for i in soup.find_all('li', class_=None)]
# keep only the items before 'Timeline of motor vehicle brands'
list_items = list_items[:list_items.index('Timeline of motor vehicle brands')]
# remove citations from strings
list_items = [i.split('[')[0] for i in list_items]
Output:
['Zanella (1948–present)', 'Anasagasti (1911–1915)', 'Andino (1967–1973)', 'ASA (1961– 1969)', 'Eniak (1983–1989)', 'Hispano-Argentina (1925–1953)', 'Industrias Aeronáuticas y Mecánicas del Estado (IAME, Mechanical Aircraft Industries of the State, 1951–1979), not to be confused with Italian American Motor Engineering', 'Industrias Kaiser Argentina (IKA, 1956–1975), United Kingdom', 'Alpha Sports (1963–present)', 'Arrow (1963–present)', 'Birchfield (2003–present)', 'Bolwell (1979–present)', 'Borland Racing Developments (1984–present)', 'Bufori (1986–present)', 'Bullet (1996–present)', 'Carbontech (1999–present)', 'Daytona (2002–present)', 'Devaux (2001–present)', 'DRB Sports Cars (1997–present)', 'Elfin Cars (1958–present)', 'Finch Restorations (1965–present)', 'Jacer (1995–present)', 'Joss Developments (2004–present)', 'McKernan (2012–present)', 'Minetti Sports Cars (2003–present)', 'Nota (1955–present)', 'PRB (1978–present)', 'Puma Clubman (1998–present)', 'Python (1981–present)', 'Quantum (2015–present)', 'Roaring Forties (1997–present)', 'Spartan-V (2004–present)', 'Stealth Special Vehicles (2004–present)', 'Stohr Cars (1991–present)', 'Ascort (1958–1960)', 'Austin (1954–1983)', 'Australian Six (1919–1930)', 'Australis (1897–1907)', 'Birchfield (2003–2004)', 'Blade (2008–2013)', 'Buchanan', 'Buckle (1955–1959)', 'Bush Ranger (1977–2016)', 'Caldwell Vale (1907–1913)', 'Cheetah', 'Chrysler (1957–1981)', 'Ford (1925–2016) (continues as a brand applied to imported cars)', 'FPV (2002–2014)', 'Giocattolo (1986–1989)', 'Goggomobil (1958–1961)', 'Hartnett (1949–1955)', 'Holden (1948–2017) (continues as a brand applied to imported cars)', 'HSV (1987–2017)', 'Honda', 'Ilinga (1974-1975)', 'Kaditcha', 'Leyland (1973–1982)', 'Lloyd-Hartnett (1957–1962)', 'Lonsdale (1982–1983) (Cars produced and exported by Mitsubishi Australia and sold in the UK by the Colt Car Company under the Lonsdale brand.)', 'Mercedes-Benz (1890–present)', 'Mitsubishi (1980–2008) (The brand continued 
to be used in Australia for fully imported cars after 2008.)', 'Morris (1947–1973)', 'Nissan (1983–1992) (The brand continued to be used in Australia for fully imported cars after 1992.)', 'Pellandini (1970–1978)', 'Pioneer', 'Purvis Eureka (1974–1991)', 'Shrike (1988–1989)', 'Southern Cross (1931–1935)', 'Statesman (1971–1984)', 'Tarrant (1900–1907)', 'Toyota, Australian production finished (1963–2017)', 'Volkswagen', 'Zeta (1963–1965)', 'Eurostar Automobilwerk', 'KTM', 'Magna Steyr', 'ÖAF', 'Puch', 'Steyr Motors GmbH', 'Rosenbauer', 'Tushek&Spigel Supercars', 'Austro-Daimler (1889–1934)', 'Austro-Tatra (1934–1948)', 'Custoca (also known as Custoka) (1966–1988)', 'Denzel (1948–1959)', 'Felber Autoroller (1952–1953)', 'Gräf & Stift (1902–2001)', 'Grofri (1921–1931)', 'Libelle (1952–1954)', 'Lohner-Porsche (1900–1905)', 'Möve 101', 'Steyr automobile', 'Steyr-Daimler-Puch', 'GA (1986–present)', 'Khazar (2018–present)', 'NAZ (2010–present)', 'Aziz (2005–2010)', 'Ecar (2015–present)', 'Edran (1984–present)', 'Gillet (1982–present)', 'Imperia Automobiles (2008–present)', 'ADK (1930)', 'Alatac (1913–1914)', 'Alberta (1906)', 'Alfa Legia (1914)', 'ALP (1920)', 'Altona (1946)', 'AMA (1913)', 'Antoine (1903)', "d'Aoust (1927)", 'Apal (1998)', 'Aquila (1903)', 'Astra (1931)', 'ATA', 'Auto Garage (1911)', 'Auto-Mixte (1906–1912)', 'Avior (1947)', 'Bastin (1909)', 'Beckett & Farlow (1908)', 'Belga (1921)', 'Belga-Rise (1935)', 'Belgica (1909)', 'Bercley (1900)', 'Bovy (1914)', 'Cambier (1898)', 'CAP (1914)', 'Catala (1914)', 'CIE (1898)', "CLA (Compagnie Liégeoise d'Automobiles) (1901)", 'Claeys-Flandria (1955)', 'Coune (1947)', 'Cyclecars R&D (1921)', 'Dasse (1924)', 'De Cosmo (1908)', 'De Wandre (1923)', 'DéChamps (1906)', 'Delecroix (1899)', 'Delin (1901)', 'Direct (1905)', 'Dyle & Bacalan (1906)', 'Escol (1938)', 'Excelsior (1904–1932)', 'Fab (1914)', 'FD (1925)', 'Fif (1914)', 'Flaid (1921)', 'FN (1935)', 'Fondu (1912)', 'Frenay (1914)', 'Germain (1901)', 'Hermes (1909)', 
'Hermes-Mathis (1914)', 'Imperia (1906–1948)', 'Imperia-Abadal (1913–1917)', 'Jeecy-Vea (1926)', 'Jenatzy (Société Générale des Transports Automobiles) (1905)', 'Juwel (1928)', 'Kleinstwagen (1952)', 'Knap (moved to France in 1899 or 1900) (1909)', 'L&B', 'Linon (1914)', 'Loza (1925)', 'Matthieu (1906)', 'Matthys Frères et Osy (1927)', 'Mécanique et Moteurs (1906)', 'Meeussen (1972)', 'Métallurgique (1913)', 'Miesse (1926)', 'Minerva (1939)', 'Nagant (1927)', 'Oracle (2005)', 'P.L.M. (Keller) (1955)', 'P-M (1924)', 'Peterill (1899)', 'Pieper (1903)', 'Pipe (1922)', 'R.A.L. (1914)', 'Ranger (General Motors brand) (1970–1978)', 'Royal Star (1910)', 'Rumpf (1899)', 'S.C.H. (1928)', 'Sava (1923)', 'SOMEA', 'Speedsport (1927)', 'Springuel (1912)', 'Taunton (1922)', 'Turner-Miesse (1913)', 'Vanclee (1989)', 'Vincke (1905)', 'Vivinus (1912)', 'Widi (1960)', 'Wilford (1901)', 'Zelensis (1962)', 'PRETIS (?–1969)', 'TAS (1982–1992)', 'Abais', 'Adamo GT', 'Agrale (1982–present)', 'Americar', 'Amoritz GT', 'Avallone', 'Bianco', 'BRM Buggy (1969–present)', 'Bugre (1970–present)', 'Centaurus', 'Chamonix (1987–present)', 'Edra (1989–present)', 'Kramer', 'Lobini (2002–present)', 'Matra', 'San Vito', 'TAC (2004–present)', 'Tanger', 'Vemag', 'Villa GT', 'Volkswagen (1953–present)', 'W.W. 
Trevis (1998–present)', 'Dacon (1964-1996)', 'Dardo (1981)', 'Democrata (1967)', 'Engesa (1963–1993)', 'Envemo (1978–c.1994)', 'Equus Thundix', 'Fabral', 'Farus', 'FNM (1960–1963)', 'Gurgel (1966–1995)', 'Hofstetter turbo (1986–1989)', 'Miura (1977–c.1987)', 'MP Lafer (1974–c.1990)', 'Puma (1967–1997)', 'Santa Matilde (1977–c.1997)', 'Troller (1998–2021)', 'Uirapuru (1966–1968)', 'Willys', 'Litex Motors', 'SIN Cars', 'Bulgaralpine', 'Bulgarrenault', 'Moskvitch', 'Pirin-Fiat', 'Sofia', 'ElectraMeccanica (2015–present)', 'HTT (automobile) (2010–present)', 'Intermeccanica (1959–present)', 'Wingho (1999-present)', 'Acadian (1961–1971)', 'Amherst (1912)', 'Asüna (1992–1995)', 'Beaumont', 'Bourassa (1926)', 'Bricklin (1974–1975)', 'Brock (1921)', 'Brooks (1923–1926)', 'Canadian (1921)', 'Canadian Motor (1900–1902)', 'Clinton (1911–1912)', 'Colonial (1922)', 'Dominion Motors Frontenac (1931–1933)', 'Enterra (1987)', 'Envoy', 'Epic', 'Frontenac (1959–1960)', 'Gareau (1910)', 'Gray-Dort (1915–1925)', 'London Six (1922–1924)', 'Manic GT (1969–1971)', 'McLaughlin (1908–1922)', 'Meteor (1949–1976)', 'Monarch (1946–1961)', 'Moose Jaw Standard (1916–1919)', 'Queen (1901–1903)', 'Studebaker (1963–1966)', 'Tudhope (1906–1913)', 'ZENN (2006–2010)', 'BAIC Group', 'Baolong (1998–present)', 'Beijing Automotive Industry Holding Corporation\nBeijing Automobile Works (1958–present)', 'Beijing Automobile Works (1958–present)', 'BYD (2003–present)', "Chang'an Motors (1990–present)\nChanghe (since 1986)\nHafei", 'Changhe (since 1986)', 'Hafei', 'Chery (Qirui) (1997–present)', 'Dadi', 'Dongfeng (1969–present)\nDongfeng Fengshen\nVenucia (2010–present)', 'Dongfeng Fengshen', 'Venucia (2010–present)', 'First Automobile Works (FAW) (1953–present)\nFAW Tianjin (Xiali) (1986–present)\nHaima Automobile (2004–present)\nHongqi (Red Flag) (1958–present)\nHuali', 'FAW Tianjin (Xiali) (1986–present)', 'Haima Automobile (2004–present)', 'Hongqi (Red Flag) (1958–present)', 'Huali', 'Forta', 'Foton 
(1996–present)', 'Fudi (1996–present)', 'Fukang (company) (1990–present)', 'Fuqi', 'Geely (Jili) (1998–present)\nShanghai Maple Guorun Automobile (2003–present)\nZhejiang Geely Automobile', 'Shanghai Maple Guorun Automobile (2003–present)', 'Zhejiang Geely Automobile', 'LTI Shanghai Automobile', 'Great Wall Motors (1984–present)\nHaval', 'Haval', 'Green Field Motor (2010–present)', 'Guangzhou Automobile Industry Group (GAIG) (2000–present)\nChangfeng Motor\nLiebao\nGAC Group\nGonow\nTrumpchi\nGuangqi Honda\nEverus', 'Changfeng Motor\nLiebao', 'Liebao', 'GAC Group', 'Gonow\nTrumpchi', 'Trumpchi', 'Guangqi Honda\nEverus', 'Everus', 'Guizhou / Yunque', 'Hawtai (Huatai)', 'Huachen (Brilliance)\nJinbei (1992–present)', 'Jinbei (1992–present)', 'Huayang', 'Hwanghai', 'Icona', 'Jianghuai (JAC) (1999–present)', 'Jiangling (JMC) (1993–present)', 'Jiangnan (company) (1988–present)', 'Jonway (2005–present)', 'Kingstar (2004–present)\nLandwind', 'Landwind', 'Li Nian (Everus) (2010–present)', 'Lifan (2005–present)', 'NIO (2014–present)', 'Polarsun Automobile (Zhongshun) (2004–present)', 'Qoros (2013–present)', 'SAIC Motor\nMG Motor\nNanjing Automobile Corporation (NAC) (1947–present)\nNanjing Soyat (2004–present)\nYuejin (1995–present)\nRoewe (2006–present)\nWuling (1958–present)', 'MG Motor', 'Nanjing Automobile Corporation (NAC) (1947–present)\nNanjing Soyat (2004–present)\nYuejin (1995–present)', 'Nanjing Soyat (2004–present)', 'Yuejin (1995–present)', 'Roewe (2006–present)', 'Wuling (1958–present)', 'Shaanxi Automobile Group', 'Shuanghuan (1998–present)', 'Sichuan Tengzhong', 'Soueast Motors / Dongnan', 'Tianma (Heavenly Horse) (1995–present)', 'Tongtian (2002–present)', 'Venucia (2010–present)', 'Xinkai (1984–present)', 'Yema Auto (1994–present)', 'Youngman (2001–present)', 'Yutong Group', 'Zhonghua (1985–present)', 'Zhongxing (Zxauto) (1991–present)', 'Zhongyu (2004–present)', 'Zotye (2005–present)\nDomy Auto', 'Domy Auto', 'Crobus', 'Đuro Đaković', 'DOK-ING', 'Rimac', 
'Tvornica motora zagreb', 'Tvornica Autobusa Zagreb', 'Avia (1919–present)', 'Bureko (2007–present)', 'Gordon (1997–present)', 'Jawa (1929–present)', 'Kaipan (1997–present)', 'Karosa (1896–present (since 2007 IVECO))', 'MTX / Metalex (1969–present)', 'MW Motors (2010–present)', 'Praga (1907–present)', 'Škoda (1895–present)', 'SVOS (1992–present)', 'Tatra (1850–present)', 'Aero (1929–1947)', 'Aspa (1924–25)', 'Gatter (1926–37)', 'Gnom (1921–24)', 'Hakar', 'ISIS (1922–24)', 'KAN (1911–14)', 'LIAZ (1951–2002)', 'Premier (1913–14)', 'RAF (Reichenberger Automobil Fabrik) (1907–1954)', 'Rösler & Jauernig (1896–1908)', 'Sibrava (1921–29)', 'Start (1921–31)', 'Stelka (1920–1922)', 'TAZ (1961–99)', 'Vechet (1911–14)', 'Velorex (1951–1971)', 'Walter (1909–1954)', 'Wikov (1922–35)', 'Zbrojovka Brno (1923–36)', 'PVP Karting', 'Zenvo Automotive (2004–present)', 'Anglo-Dane (1902–1917)', 'Brems (1900 and 1907)', 'Dansk (1901–1907)', 'Krampers (1890–1960)', 'Egy-Tech (2010–present)', 'Speranza (1998–present)', 'Nasr (1960–2008)', ' Nobe', 'Holland Car (2005–2013)', 'Elcat', 'Electric Raceabout (prototype, not in production)', 'Finlandia (1922–1924)', 'Korvensuu (1912–1913)', 'Sisu Auto', 'Toroidion (2015- ,prototype, not yet in production)', 'Valmet Automotive', 'Vanaja (1943–1968)', 'Valtra', 'Veemax (racecar, 1960s - 1978)', 'Wiima', 'Aixam', 'Alpine', 'Bolloré', 'Bugatti', 'Chatenet', 'Citroën', 'De la Chapelle', 'DS', 'Goupil', 'Ligier', 'Microcar', 'Peugeot', 'PGO', 'Renault', 'Venturi', 'Ballot (1905–1932)', 'Berliet (1899–1978)', 'Chenard-Walcker (1899–1946)', 'Darracq (1897–1902)', 'DB (1938–1961)', 'De Dion-Bouton (1883–1932)', 'Delage (1906–1953)', 'Delahaye (1894–1954)', 'Facel Vega (1939–1964)', 'Gobron-Brillié (1898–1930)', 'Hotchkiss (1903–1955)', 'Lorraine-Dietrich', 'Matra (1964–2003)', 'Mega(1919-1933)', 'Panhard (1887–2012)', 'Panhard et Levassor (1887–1940)', 'Rosengart (1927–1955)', 'Salmson (1920–1957)', 'Saviem (1955–1978)', 'Simca (1934–1979)', 'Talbot 
(1916–1959)', 'Talbot-Lago (1935–1959)', 'Tracta (1926–1934)', 'VELAM (1955–1959)', 'Vespa', 'Voisin (1919–1939)', '9ff', 'ABT Sportsline', 'Audi', 'Alpina', 'Artega', 'BMW', 'BMW M', 'Borgward', 'Citycom', 'CityEl', 'Ford-Werke', 'Gemballa', 'Gumpert (2004-2013, 2016-present)', 'Isdera', 'Lotec', 'Magirus', 'Maybach', 'Mercedes-AMG', 'Mercedes-Benz', 'Opel', 'Porsche', 'Ruf', 'Smart', 'Volkswagen', 'Wiesmann', 'Amphicar (1960–1968)', 'Apal', 'Auto Union (1932–1969)', 'Beck (2012)', 'DKW', 'Gatter (1952-1958) (formerly Czechoslovakia 1926-1937)', 'Glas (1883–1966)', 'Goliath (1928–1961)', 'Hansa (1905–1931)', 'Heinkel (1956–1958)', 'Horch (1904–1932)', 'Kodiak (1983-1985)', 'Lloyd (1908–1963)', 'Maybach (1909–2013)', 'Mercedes (1900–1926)', 'Messerschmitt (1953–1964)', 'NSU (1873–1969)', 'Trabant (1957–1991)', 'VW-Porsche (1969–1976)', 'Wanderer (1911–1941)', 'Wartburg (1898–1991)', 'Balkania (1945–present)', 'ELVO (1973–present)', 'Kioleides (1968–present)', 'Korres (2002–present)', 'Namco (1973–present)', 'Replicar Hellas (2007–present)', 'Saracakis (1923–present)', 'Temax (1925–present)', 'Alta (1968–1978)', 'Attica (1958–1972)', 'Autokinitoviomihania Ellados (1975–1984)', 'Automeccanica (1980–1995)', 'Balkania (1975–1995)', 'BET (1965–1975', 'Biamax (1956–1986)', 'C.AR (1970–1992)', 'Candia (1965–1990)', 'Diana (1976–1990)', 'DIM (1977–1982)', 'EBIAM (1979–1984)', 'Enfield (1973–1976)', 'Hercules (1980–1983)', 'MAVA-Renault (1979–1985)', 'MEBEA (1960–1983)', 'Neorion (1974–1975)', 'Pan-Car (1968–1994)', 'Record (1957–1999)', 'Scavas (1973–1992)', 'Styl Kar (1970)', 'Tangalakis (1935–1939)', 'Theologou (1915–1926)', 'Tejas Motors', 'Ashok Leyland ', 'DC Design', 'Eicher Motors', 'Force Motors', 'Tata Motors', 'Mahindra & Mahindra', 'Maruti Suzuki', 'KIA Motors', 'Hindustan Motors (1963–2014)', 'ICML (2012–2018)', 'Maruti (1983–2007)', 'Premier (1947–2016)', 'Reva', 'Sipani Motors (Sunrise Auto Industries) (1973–1995)', 'Standard (1949–1988)', 'Esemka', 'Fin 
Komodo', 'Pindad', 'Timor', 'Bahman', 'Diar', 'Iran Khodro (1962–present)', 'Khodro Kaveer', 'Kish Khodro', 'Morattab', 'MVM', 'Pars Khodro (1967–present)', 'Paykan', 'Reyan', 'SAIPA (1966–present)', 'Shahab Khodro', 'Zagross Khodro', 'Shamrock', 'TMC Costin', 'Alesbury (1907–1908)', 'GAC Ireland (1980–1986)', 'AIL', 'Plasan', 'Autocars', 'Kaiser-Ilin Industries', 'Abarth', 'ACM', 'Alfa Romeo', 'Casalini', 'Cizeta', 'Dagger', 'De Tomaso', 'DR', 'Ferrari', 'Fiat', 'Giannini', 'Giottiline', 'Grecav', 'Iveco', 'Lamborghini', 'Lancia', 'Maserati', 'Mazzanti', 'Pagani', 'Piaggio', 'Pininfarina', 'Qvale', 'ASA (1961–1969)', 'Autobianchi (1955–1995)', 'Bertone (1982–1989)', 'Bizzarrini (1964–1969)', 'Cisitalia (1946–1963)', 'Covini (1978-2016)', 'Innocenti (1920–1996)', 'Intermeccanica (moved to Canada)', 'Iso (1953–1974)', 'Karlmann', 'O.S.C.A. (1947–1967)', 'Siata (1926–1970)', 'Baby-Brousse (1964–1979)', 'Aspark', 'Daihatsu', 'Datsun', 'Dome (1975-present)', 'Englon', 'Lexus', 'Honda', 'Acura', 'Isuzu', 'Mazda', 'Mini', 'Mitsubishi', 'Mitsuoka', 'Nissan', 'Infiniti', 'Proton', 'Renault', 'SEAT', 'Subaru', 'Suzuki', 'Toyota', 'Yamaha Motor', 'Autozam (1989–1998)', 'Colt (1974–1984) (cars produced and exported by Mitsubishi Motors and imported into the UK by the Colt Car Company and marketed under the Colt brand)', 'ɛ̃fini (1991–1997)', 'Eunos (1989–1996)', 'Hino (1961–1967)', 'Prince (1952–1966)', 'Scion (2003–2016)', 'Toyopet', 'Mobius (2013–present)', 'Nyayo (1986–1999)', 'Orca', 'Assal', 'Bufori', 'Proton', 'Perodua', 'TD2000', 'Inokom', 'Naza', 'Hicom', 'Mildef', 'Mykar', 'Autobuses King', 'Cimex', 'Dina', 'FANASA', 'Grupo Electrico Motorizado', 'Mastretta', 'VAM', 'Vhul', 'Zacua', 'Matchedje Motors (2014–2017) owned by China Tong Jian Investment – built rebadged Fudi F16 pickups.', 'Uri-Automobile (1995–2008 moved to South Africa)', 'Hulas Motors', 'Donkervoort', "Van Doorne's Automobiel Fabrieken", 'Spyker (1999–present)', 'Vencer', 'DAF', 'Spyker (1899–1926)', 
'Eysink', 'Almac (1985–present)', 'Alternative Cars (1984–present)', 'Chevron (1984–present)', 'Fraser (1988–present)', 'Hulme (2005–present)', 'Leitch (1986–present)', 'Saker (1989–present)', 'Anziel (1967)', 'Beattie (1997–2001) thence Redline', 'Carlton (1922–1928)', 'Cobra (1983–1990)', 'Crowther (1968–1978)', 'De Joux (1970)', "Dennison (1900–1905) – New Zealand's first indigenous car", 'Everson (1935–1989)', 'Heron (1964–1990)', 'Marlborough (1912–1922) thence Carlton', 'McRae (1990–2003)', 'Mistral (1957–1960)', 'Redline (2001–2009)', 'Steel Brothers (1973–1981)', 'Trekka (1966–1973)', 'UltraCommuter (2006–2013)', 'Wood (1901–1903)', 'Izuogu (1997–2006)', 'Innoson Vehicle Manufacturing', 'Pyeonghwa Motors', 'Pyongsang Auto Works', 'Sungri Motors', 'Kongsberg', 'Bjering', 'Buddy', 'Geijer', 'Norsk', 'Think', 'Troll', 'Atlas Honda', 'Honda Atlas Cars Pakistan', 'FAW Pakistan', 'Ghandhara Nissan', 'Ghandhara Industries', 'Heavy Industries Taxila', 'Hinopak', 'Master', 'Millat Tractors', 'Pak Suzuki', 'Indus Motors Company', 'Yamaha Motor Pakistan', 'Sazgar', 'Hyundai Nishat Motors', 'Kia Lucky Motors', 'United Auto Industries', 'Prince DFSK', 'MG JW Automobile', 'Adam Motor Company (Defunct)', 'Nexus Automotive (Defunct)', 'Dewan Farooque Motors (Defunct)', 'Arrinera', 'FSO', 'Melex', 'Polski Fiat (1932–1939, 1968–1992)', 'UMM (União Metalo-Mecânica) (1978–2001)', 'Portaro (1975–1995)', 'Dacia (1966–present)', 'Ford România S.A. (2008–present)',
'Oltcit (1976–1991)', 'ARO (1957–2006)', 'Derways (2003–present)', 'GAZ (1932–present)', 'Lada (1966–present)', 'UAZ (1941–present)', 'ZiL (1916–present)', 'Izh (1965–2008)', 'Moskvitch (1929–2010)', 'Russo-Balt (1894–1929/2006)', 'Marussia (2007–2014)', 'FCA Srbija (2008–present)', 'Zastava TERVO (2017–present)', 'IDA-Opel (1977–1992)', 'Yugo (1944–2008)', 'Zastava (1953–2008)', 'Revoz', 'Atax (1938–1949)', 'TAM (1947–2011)', 'Birkin (1982–present)', 'Harper Sports Cars (2014–present)', 'Perana (2007–present)', 'Puma (1973–1974, 1989–1991, 2006–present)', 'Shaka (1995–present)', 'Superformance (1996–present)', 'Uri International Vehicle & Equipment Marketing (2008–present)', 'Badsey (1979–1983, then the company moved to the USA', 'Eagle', 'GSM (1958–1964)', 'Hayden Dart (1997–2003)', 'Hi-Tech (1992–1996)', 'Optimal Energy (2008–2012)', 'Perana (1967–1996; a famous Ford manufacturer, today only active as a Ford dealer)', 'Protea (1957–1958)', 'Ranger (1968–1973)', 'Sao (1985–1994)', 'CT&T', 'Galloper', 'Genesis', 'Hyundai', 'Kia', 'Renault Samsung', 'Ssangyong', 'Edison Motors', 'Asia (1965–1999)', 'Daewoo (1983–2002)', 'GMK (1972–1976)', 'Saehan (1976–1983)', 'Saenara (1962–1965)', 'Proto (1997–2017)', 'Shinjin (1965–1972)', 'Aspid', 'Comarth', 'Cupra', 'DSD Design and Motorsport', 'GTA Motor', 'Hurtan', 'SEAT', 'Spania GTA', 'Tramontana (sports car)', 'Tauro Sport Auto', 'UROVESA', 'Hispano Suiza (Planned Revival in 2019)', 'Pegaso', 'Santana', 'Volvo Cars (1927–present)', 'Koenigsegg (1994–present)', 'Polestar (1996–present)', 'NEVS (2012–present) (bought by Saab)', 'Von Braun Holding Company (2014–present)', 'Allevo (1890–1916) / (made cars 1903–1907)', 'Vagnsaktiebolaget I Södertelge (later VABIS, then Scania-Vabis)', 'Rengsjöbilen (1914–1916)', 'Saab (1945–2012)', 'Hult Healey (1984–1990)', 'Jösse Car (1994–1999)', 'Tjorven (1968–1971)', 'MBM (1960–1967)', 'Monteverdi (1967–1984)', 'Ranger (General Motors brand) (1970–1975)', 'CMC (1973–present)',
'Formosa', 'Luxgen', 'Thunder Power', 'Tobe', 'Yue Long/Yulon/YLN (affiliated to Nissan)', 'Akepanich', 'C-FEE', 'Cherdchai', 'Deva', 'Kwaithong', 'Mine', 'Siam V.M.C.', 'Thai Rung', 'Vera', 'Barkia (2010–present)', 'Industries Mécaniques Maghrébines (1982–1988, 1991–present)', 'Wallyscar (2007–present)', 'Anadol', 'Devrim', 'Diardi', 'Etox', 'EVT S1', 'Imza', 'Özaltin', 'Sazan', 'Tofaş (1968–date)', 'TOGG', 'Kiira', 'ZAZ (1923–present)', 'Devel Motors', 'Shayton', 'W Motors', 'Zarooq Motors', 'AC Cars', 'Arash', 'Ariel (1991–present)', 'Aston Martin (1913–present)', 'BAC', 'Bentley (1919–present)', 'Bristol', 'Caterham Cars (1957–present)', 'David Brown (2013–present)', 'Dendrobium', 'Farbio', 'Ford', 'Ginetta (1958–present)', 'Jaguar (1935–present)', 'Jowett (1906–1954)', 'Keating Supercars (2006-present)', 'Lagonda', 'Land Rover (1948–present)', 'Lister', 'London Electric Vehicle Company (LEVC) (2013–present)', 'Lotus (1952–present)', 'Mini', 'McLaren (2010–present)', 'Morgan (1910–present)', 'Noble (1999–present)', 'Radical (1997–present)', 'Rolls-Royce (1904–present)', 'Trident', 'TVR (1946–2006,2013-present)', 'Vauxhall (1903–present)', 'AC', 'Allard (1945–1957)', 'Alvis', 'Armstrong Siddeley', 'Ascari (1995–2010)', 'Austin', 'Austin-Healey (1952–1972)', 'Berkeley (1956–1960)', 'Bond', 'Bristol (1945–2020)', 'British Salmson (1934–1939)', 'Buckler (1947–1962)', 'Chambers Motors (1904–1929)', 'Chrysler Europe (1976–1979)', 'Daimler', 'Dutton', 'Elva', 'Fairthorpe', 'Farboud Limited (1999-2006)', 'Frazer Nash (1925–1957)', 'Gilbern (1959–1973)', 'Gordon-Keeble', 'Healey', 'Hillman', 'Humber', 'Invicta', 'Jensen', 'Jowett (1906–1954)', 'Lanchester', 'Lea-Francis', 'Lloyd (1936–1950)', 'Lotus-Cortina', 'Marauder (1950–1952)', 'MG (1924–2011)', 'Metropolitan (1953–1961)', 'Morris (1913–1984)', 'Nash-Healey (1951–1954)', 'Ohio Electric (circa 1917)', 'Panther (1972–1990)', 'Paramount (1950–1956)', 'Peel Engineering Company (1955–1969)', 'Peerless (1957–1960)',
'Princess (1957–1960) (1975–1981)', 'Reliant', 'Riley (1907–1969)', 'Rover (1904–2005)', 'RW (1983–2000)', 'Singer', 'Standard', 'Sunbeam', 'Sunbeam-Talbot', 'Swallow (1954–1955)', 'Talbot', 'Tornado', 'Triumph', 'Trojan', 'Turner', 'Tyrrell', 'Vanden Plas', 'Warwick (1960–1962)', 'Wolseley', 'AM General', 'Anteros (2005–present)', 'Aurica (2010–present)', 'Bollinger Motors (2014-present)', 'Bremach (2009–present)', 'Buick (1903–present)', 'BXR (2008–present)', 'Cadillac (1902–present)', 'Chevrolet (1911–present)', 'Chrysler (1925–present)', 'Custom Crafted Cars (2013–present)', 'Dodge (1900–present)', 'Elio Motors (2009–present)', 'Equus Automotive (2014–present)', 'E-Z-GO (1954–present)', 'Falcon (2009–present)', 'Faraday (2014–present)', 'Ford (1903–present)', 'General Motors (1908–present)', 'Genovation', 'GMC (1913–present)', 'Hummer (1992–2010; 2020–present)', 'Opel (1941-present) (from Germany)', 'Hennessey (1991–present)', 'Humble Motors (2020-present)', 'Jeep (1941–present)', 'Karma (2016–present) (Formerly Fisker inc.)', 'Lincoln (1917–present)', 'Local (2007—present)', 'Lucid (2014–present)', 'Lyons (2011–present)', 'Niama-Reisser (2005–present)', 'Panoz (1989–present)', 'Polaris (1954–present)\nGEM', 'GEM', 'Racefab (1991–present)', 'RAESR (2014–present)', 'Ram Trucks (2010–present)', 'Rezvani (2014–present)', 'Rivian (2009–present)', 'Rossion (2007–present)', 'Saleen (1980–present)', 'Scuderia Cameron Glickenhaus', 'Shelby American (1962–present)', 'SSC (1999–present)', 'Tesla (2003–present)', 'Trion Supercars (2012–present)', 'Vehicle Production Group (2011-2013)', 'Zimmer (1978–1988, 1997–present)', 'Ajax (1925-1926)', 'AMC (1954–1987)', 'American Simplex (1906–1910) (renamed to Amplex in 1910)', 'Amplex (1910–1915) (previously known as American Simplex)', 'Auburn', 'Checker', 'Cord', 'Crosley', 'DeLorean Motor Company (1981–1983)', 'DeSoto (1928–1960)', 'Detroit Electric (1907–1939)', 'Devon (2008-2013)', 'Duesenberg', 'Eagle (1987–1998)',
'Edsel (1958–1960)', 'Frazer', 'Fisker (2011–2014)', 'Geo (General Motors brand) (1989–1997)', 'Henry J (1950–1954)', 'Hudson (1909–1957)', 'Hupmobile (1909–1939)', 'Imperial (1955–1975, 1981–1983) (Chrysler Corporation brand – Imperial was also used as a Chrysler model name in certain other years)', 'Jordan', 'Kaiser', 'LaFayette', 'LaSalle (1927–1940)', 'Marmon (1851-1933)', 'Marquette (General Motors brand)', 'Maxwell', 'Mercer (1909–1925)', 'Mercury (1938–2011)', 'Merkur (1985–1989)', 'Moon', 'Mosler (1993–2013)', 'Nash', 'Navistar International', 'Oakland (1908–1931)', 'Oldsmobile (1897–2004)', 'Packard (1899–1958)', 'Plymouth (1928–2001)', 'Pontiac (1926–2010)', 'Rambler (1897–1914 & 1958–1969)', 'Reliable Dayton (1906–1909)', 'Saturn (1985–2010)', 'Staver (1907–1914)', 'Stearns-Knight', 'Studebaker (1852–1967)', 'Vector (1989-1993)', 'Willys (1908–1963)', 'Nordex (1962–present)', 'Dellepiane (1980)', 'El Terruno (1960)', 'Grumett (1960–1982)', 'Guitolar (1970–2004)', 'Indio (1969–1977)', 'Industrias WARV (1966–1972)', 'Lima (1970–1980)', 'Mauser', 'Metalurgica Laguarda (1963)', 'Taller Danree y Silveira (1950–1960)', 'ChienThang', 'La Dalat', 'THACO', 'VinFast', 'Vinaxuki (2004–2015)']

Python file parsing, can't catch strings in new line

I'm parsing a large text file with 56,900 book titles, their authors, and etext numbers, and I'm trying to extract the authors.
The file looks like this:
TITLE and AUTHOR ETEXT NO.
Aspects of plant life; with special reference to the British flora,      56900
by Robert Lloyd Praeger
The Vicar of Morwenstow, by Sabine Baring-Gould 56899
[Subtitle: Being a Life of Robert Stephen Hawker, M.A.]
Raamatun tutkisteluja IV, mennessä Charles T. Russell 56898
[Subtitle: Harmagedonin taistelu]
[Language: Finnish]
Raamatun tutkisteluja III, mennessä Charles T. Russell 56897
[Subtitle: Tulkoon valtakuntasi]
[Language: Finnish]
Tom Thatcher's Fortune, by Horatio Alger, Jr. 56896
A Yankee Flier in the Far East, by Al Avery 56895
and George Rutherford Montgomery
[Illustrator: Paul Laune]
Nancy Brandon's Mystery, by Lillian Garis 56894
Nervous Ills, by Boris Sidis 56893
[Subtitle: Their Cause and Cure]
Pensées sans langage, par Francis Picabia 56892
[Language: French]
Helon's Pilgrimage to Jerusalem, Volume 2 of 2, by Frederick Strauss 56891
[Subtitle: A picture of Judaism, in the century
which preceded the advent of our Savior]
Fra Tommaso Campanella, Vol. 1, di Luigi Amabile 56890
[Subtitle: la sua congiura, i suoi processi e la sua pazzia]
[Language: Italian]
The Blue Star, by Fletcher Pratt 56889
Importanza e risultati degli incrociamenti in avicoltura, 56888
di Teodoro Pascal
[Language: Italian]
The Junior Classics, Volume 3: Tales from Greece and Rome, by Various 56887
~ ~ ~ ~ Posting Dates for the below eBooks: 1 Mar 2018 to 31 Mar 2018 ~ ~ ~ ~
TITLE and AUTHOR ETEXT NO.
The American Missionary, Volume 41, No. 1, January, 1887, by Various 56886
Morganin miljoonat, mennessä Sven Elvestad 56885
[Author a.k.a. Stein Riverton]
[Subtitle: Salapoliisiromaani]
[Language: Finnish]
"Trip to the Sunny South" in March, 1885, by L. S. D 56884
Balaam and His Master, by Joel Chandler Harris 56883
[Subtitle: and Other Sketches and Stories]
Susien saaliina, mennessä Jack London 56882
[Language: Finnish]
Forged Egyptian Antiquities, by T. G. Wakeling 56881
The Secret Doctrine, Vol. 3 of 4, by Helena Petrovna Blavatsky 56880
[Subtitle: Third Edition]
No Posting 56879
The author name usually starts after "by". When a line has no "by", the author name starts after a comma. However, when the line does contain "by", a comma can also be part of the title.
So I search for "by" first, and then for a comma.
Here is what I tried:
def search_by_author():
    fhand = open('GUTINDEX.ALL')
    print("Search by Author:")
    for line in fhand:
        if not line.startswith(" [") and not line.startswith("TITLE"):
            if not line.startswith("~"):
                words = line.rstrip()
                words = line.lstrip()
                words = words[:-6]
                if ", by" in words:
                    words = words[words.find(', by'):]
                    words = words[5:]
                    print (words)
                else:
                    words = words[words.find(', '):]
                    words = words[5:]
                    if "," in words:
                        words = words[words.find(', '):]
                        if words.startswith(','):
                            words =words[words.find(','):]
                            print (words)
                        else:
                            print (words)
                    else:
                        print (words)
                if " by" in words:
                    words = words[words.find('by')]
                    print(words)

search_by_author()
However, it can't find the author name for entries where the author is on the next line, like:
Aspects of plant life; with special reference to the British flora,      56900
by Robert Lloyd Praeger
In your file, the information about one book can span several lines, and each book's entry is followed by a blank line. I used that blank line to collect all the lines of one entry, and then parsed the combined text to get the author.
import re

def search_by_author():
    book_info = ''
    with open('GUTINDEX.ALL') as fhand:
        for line in fhand:
            line = line.rstrip()
            if line.startswith('TITLE') or line.startswith('~'):
                continue
            if len(line) == 0:
                # remove info in square brackets from book_info
                book_info = re.sub(r'\[.*$', '', book_info)
                if 'by ' in book_info:
                    tokens = book_info.split('by ')
                else:
                    tokens = book_info.split(',')
                if len(tokens) > 1:
                    authors = tokens[-1].strip()
                    print(authors)
                book_info = ''
            else:
                # remove ETEXT NO. from line
                line = re.sub(r'\d+$', '', line)
                book_info += ' ' + line.rstrip()

search_by_author()
Output:
Robert Lloyd Praeger
Sabine Baring-Gould
mennessä Charles T. Russell
mennessä Charles T. Russell
Horatio Alger, Jr.
Al Avery and George Rutherford Montgomery
Lillian Garis
Boris Sidis
par Francis Picabia
Frederick Strauss
di Luigi Amabile
Fletcher Pratt
di Teodoro Pascal
Various
Various
mennessä Sven Elvestad
L. S. D
Joel Chandler Harris
mennessä Jack London
T. G. Wakeling
Helena Petrovna Blavatsky
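The output above still contains the words mennessä, par and di. The Finnish, French and Italian entries use these where English entries use "by". The following is a minimal sketch that treats those words as author markers too. It assumes that these are the only non-English markers in GUTINDEX.ALL, which is probably not true for the full file. It also assumes, as stated above, that entries are separated by blank lines. The name parse_authors is hypothetical. Unlike the answer's code, an entry with no recognised marker is skipped rather than split on its last comma.

```python
import re
from typing import Iterable, List

# Words that introduce the author: "by" (English) plus the Finnish, French and
# Italian markers seen in the sample entries. ASSUMPTION: this list is incomplete.
AUTHOR_MARKER = re.compile(r"(?:,\s*|\s)(?:by|mennessä|par|di)\s+")
# The ETEXT number at the end of an entry's first line.
ETEXT_NO = re.compile(r"\s*\d+$")


def parse_authors(lines: Iterable[str]) -> List[str]:
    """Return one author string per entry; entries are separated by blank lines."""
    authors: List[str] = []
    entry: List[str] = []

    def flush() -> None:
        if not entry:
            return
        text = " ".join(entry)
        # Drop everything from the first "[" on ([Subtitle: ...], [Language: ...]).
        text = re.sub(r"\[.*$", "", text).strip()
        parts = AUTHOR_MARKER.split(text)
        # Entries without a recognised marker (e.g. "No Posting") are skipped.
        if len(parts) > 1:
            # Keep the text after the last marker, so a "by" earlier in the
            # title is ignored.
            authors.append(parts[-1].strip())
        entry.clear()

    for raw in lines:
        line = raw.rstrip()
        if line.startswith(("TITLE", "~")):
            continue  # column header or "Posting Dates" separator
        if not line:
            flush()
        else:
            entry.append(ETEXT_NO.sub("", line).strip())
    flush()  # the last entry may not be followed by a blank line
    return authors
```

The function accepts any iterable of lines, so you can pass it an open file object directly. For example, open('GUTINDEX.ALL', encoding='utf-8') would work, assuming the index is UTF-8 encoded.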

File content into dictionary

I need to turn this file's content into a dictionary where every key is the name of a movie and every value is a set containing the names of the actors who play in it.
Example of file content:
Brad Pitt, Sleepers, Troy, Meet Joe Black, Oceans Eleven, Seven, Mr & Mrs Smith
Tom Hanks, You have got mail, Apollo 13, Sleepless in Seattle, Catch Me If You Can
Meg Ryan, You have got mail, Sleepless in Seattle
Diane Kruger, Troy, National Treasure
Dustin Hoffman, Sleepers, The Lost City
Anthony Hopkins, Hannibal, The Edge, Meet Joe Black, Proof
This should get you started:
line = "a, b, c, d"
result = {}
names = line.split(", ")
actor = names[0]
movies = names[1:]
result[actor] = movies
Try the following:
res_dict = {}
with open('my_file.txt', 'r') as f:
    for line in f:
        my_list = [item.strip() for item in line.split(',')]
        res_dict[my_list[0]] = my_list[1:]  # To make it a set, use: set(my_list[1:])
Explanation:
split(',') splits each line into a list, using , as the separator.
strip() removes the spaces around each element of that list.
When you use a with statement, you do not need to close the file explicitly.
[item.strip() for item in line.split(',')] is called a list comprehension.
Output:
>>> res_dict
{'Diane Kruger': ['Troy', 'National Treasure'], 'Brad Pitt': ['Sleepers', 'Troy', 'Meet Joe Black', 'Oceans Eleven', 'Seven', 'Mr & Mrs Smith'], 'Meg Ryan': ['You have got mail', 'Sleepless in Seattle'], 'Tom Hanks': ['You have got mail', 'Apollo 13', 'Sleepless in Seattle', 'Catch Me If You Can'], 'Dustin Hoffman': ['Sleepers', 'The Lost City'], 'Anthony Hopkins': ['Hannibal', 'The Edge', 'Meet Joe Black', 'Proof']}
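Both answers map each actor to a list of movies. The question, however, asks for the reverse: each movie mapped to the set of actors in it. The following is a minimal sketch of that inversion using collections.defaultdict(set). Like the answers, it assumes that titles never contain commas. The function name movies_to_actors is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def movies_to_actors(lines: Iterable[str]) -> Dict[str, Set[str]]:
    """Map each movie title to the set of actors whose line lists it."""
    result: Dict[str, Set[str]] = defaultdict(set)
    for line in lines:
        fields = [item.strip() for item in line.split(",")]
        if not fields[0]:
            continue  # skip blank lines
        actor, movies = fields[0], fields[1:]
        for movie in movies:
            result[movie].add(actor)
    return dict(result)  # plain dict, so looking up an unknown title raises KeyError
```

Pass it the open file from the answer above, e.g. movies_to_actors(f) inside the with open('my_file.txt', 'r') as f: block. With the example file, 'Troy' maps to {'Brad Pitt', 'Diane Kruger'}.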

Python Replace multiple words in the list

Below is the code I tried, but I'm not getting the expected results.
import re

def multiwordReplace(text, wordDic):
    """
    take a text and replace words that match a key in a dictionary with
    the associated value, return the changed text
    """
    rc = re.compile('|'.join(map(re.escape, wordDic)))
    def translate(match):
        return wordDic[match.group(0)]
    return rc.sub(translate, text)

wordDic = {
    'ANGLO': 'ANGLO IRISH BANK',
    'ANGLO IRISH': 'ANGLO IRISH BANK'
}

def replace(match):
    return wordDic[match.group(0)]
    #return ''.join(y for y in match.group(0).split())

str1 = ['ANGLO IRISH CORP PLC - THIS FOLLOWS THE BANK NATIONALIZATION BY THE GOVT OF THE REPUBLIC OF IRELAND',
        'ANGLO CORP PLC - THIS FOLLOWS THE BANKS NATIONALIZATION BY THE GOVT OF THE REPUBLIC OF IRELAND']

for item in str1:
    str2 = multiwordReplace(item, wordDic)
    print(str2)
    print(re.sub('|'.join(r'\b%s\b' % re.escape(s) for s in wordDic),
                 replace, item))
Output:
ANGLO IRISH BANK IRISH CORP PLC - THIS FOLLOWS THE BANK NATIONALIZATION BY THE GOVT OF THE REPUBLIC OF IRELAND
ANGLO IRISH BANK CORP PLC - THIS FOLLOWS THE BANKS NATIONALIZATION BY THE GOVT OF THE REPUBLIC OF IRELAND
The first line should start with just 'ANGLO IRISH BANK', not 'ANGLO IRISH BANK IRISH'.
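Sort the keys so that the longest possible match comes first. A regex alternation tries its alternatives from left to right and uses the first one that matches, not the longest one, so 'ANGLO' wins over 'ANGLO IRISH' when it is listed first. In multiwordReplace, build the pattern like this: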
longest_first = sorted(wordDic, key=len, reverse=True)
rc = re.compile('|'.join(map(re.escape, longest_first)))
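Putting the longest-first sort together with the \b word boundaries from the question's second attempt gives a self-contained version. This is a sketch; multiword_replace and word_dic are illustrative names that mirror the question's multiwordReplace and wordDic.

```python
import re
from typing import Dict


def multiword_replace(text: str, word_dic: Dict[str, str]) -> str:
    """Replace every key of word_dic found in text with its value."""
    # Longest keys first: the alternation takes the first alternative that
    # matches, so 'ANGLO IRISH' must be tried before 'ANGLO'.
    longest_first = sorted(word_dic, key=len, reverse=True)
    # The non-capturing group makes both \b anchors apply to every alternative,
    # so a key cannot match inside a longer word (e.g. 'ANGLO' in 'ANGLOPHONE').
    pattern = re.compile(r"\b(?:%s)\b" % "|".join(map(re.escape, longest_first)))
    return pattern.sub(lambda m: word_dic[m.group(0)], text)


word_dic = {"ANGLO": "ANGLO IRISH BANK", "ANGLO IRISH": "ANGLO IRISH BANK"}
```

With this, 'ANGLO IRISH CORP PLC ...' becomes 'ANGLO IRISH BANK CORP PLC ...', and 'ANGLO CORP PLC ...' becomes 'ANGLO IRISH BANK CORP PLC ...'.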
