Cleaning web-scraped data and merging it together

The website URL is https://www.justia.com/lawyers/criminal-law/maine

I only want the lawyers' names and where their offices are.

import requests
from bs4 import BeautifulSoup

url = 'https://www.justia.com/lawyers/criminal-law/maine'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
Lawyer_name = soup.find_all("a", "url main-profile-link")
for i in Lawyer_name:
    print(i.find(text=True))
address = soup.find_all("span", "-address -hide-landscape-tablet")
for x in address:
    print(x.find_all(text=True))

The names print fine, but the addresses print with extra whitespace that I want to remove:

['\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\t88 Hammond Street', '\n\t\t\t\t\t\t\t\t\t\t\t\t\t\t\tBangor,\t\t\t\t\tME 04401\t\t\t\t\t\t    ']

So the output I am trying to get for each lawyer looks like this (first example):

Hunter J Tzovarras
88 Hammond Street
Bangor, ME 04401

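I am trying to solve two problems:

How can I clean up the addresses so they are easier to read?

How can I keep each lawyer's name matched with the right address so they don't get mixed up?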

Use x.get_text() instead of x.find_all():
for x in address:
    print(x.get_text(strip=True))
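
If you would rather keep the list that x.find_all(text=True) returns, the fragments can also be cleaned with plain string methods. The following is a minimal sketch, not taken from the original answer: clean_fragments is a hypothetical helper name, and the sample list is a shortened copy of the fragments shown in the question.

```python
def clean_fragments(fragments):
    """Collapse runs of whitespace in each fragment and drop empty ones."""
    cleaned = (" ".join(fragment.split()) for fragment in fragments)
    return [line for line in cleaned if line]


# Shortened copy of the fragments printed in the question
raw = [
    "\n\t\t\t88 Hammond Street",
    "\n\t\t\tBangor,\t\t\t\t\tME 04401\t\t\t\t\t\t    ",
]

lines = clean_fragments(raw)
print("\n".join(lines))
# 88 Hammond Street
# Bangor, ME 04401
```

str.split() with no arguments splits on any run of spaces, tabs, or newlines, so joining the pieces with a single space also fixes the tabs between "Bangor," and "ME".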

Full working code:

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.justia.com/lawyers/criminal-law/maine'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

# lawyer names come from the title attribute of each avatar link
names = [x.get('title').strip() for x in soup.select('a.lawyer-avatar')]
# address text with surrounding whitespace and tabs removed
addresses = [x.get_text(strip=True).replace('\t', '').strip()
             for x in soup.find_all("span", class_="-address -hide-landscape-tablet")]

# pair names with addresses by position
df = pd.DataFrame(list(zip(names, addresses)), columns=['Lawyer_name', 'address'])
print(df)
Output:
             Lawyer_name                              address
0              William T. Bly Esq                  119 Main StreetKennebunk,ME 04043
1                    John S. Webb                    949 Main StreetSanford,ME 04073
2              William T. Bly Esq                    20 Oak StreetEllsworth,ME 04605
3          Christopher Causey Esq                          16 Middle StSaco,ME 04072
4                 Robert Van Horn                   88 Hammond StreetBangor,ME 04401
5                    John S. Webb       37 Western Ave., Unit #307Kennebunk,ME 04043
6              Hunter J Tzovarras                  4 Union Park RoadTopsham,ME 04086
7      Michael Stephen Bowser Jr.            241 Main StreetP.O. Box 57Saco,ME 04072
8                   Richard Regan            6 City CenterSuite 301Portland,ME 04101
9             Robert Guillory Esq            75 Pearl St. Suite 400Portland,ME 04101
10                  Dylan R. Boyd      160 Capitol StreetP.O. Box 79Augusta,ME 04332
11                 Luke Rioux Esq                 10 Stoney Brook LaneLyman,ME 04002
12               David G. Webbert        15 Columbia Street, Ste. 301Bangor,ME 04401
13                  Amy Fairfield              32 Saco AveOld Orchard Beach,ME 04064
14      Mr. Richard Lyman Hartley         62 Portland Rd., Ste. 44Kennebunk,ME 04043      
15           Neal L Weinstein Esq                647 U.S. Route One#203York,ME 03909      
16                  Albert Hansen      76 Tandberg Trail (Route 115)Windham,ME 04062      
17          Russell Goldsmith Esq        Two C PlazaPO Box 4600Portland,ME 04112      
18            Miklos Pongratz Esq           18 Market Square Suite 5Houlton,ME 04730      
19       Bradford Pattershall Esq       5 Island View DrCumberland Foreside,ME 04110      
20             Michele D L Kenney    12 Silver StreetP.O. Box 559Waterville,ME 04903      
21                   John Simpson                 344 Mount Hope Ave.Bangor,ME 04402      
22         Mariah America Gleaton                  192 Main StreetEllsworth,ME 04605      
23                Wayne Foote Esq                85 Brackett StreetPortland,ME 04102      
24                      Will Ashe                  16 Union StreetBrunswick,ME 04011      
25                Peter J Cyr Esq     482 Congress Street Suite 402Portland,ME 04101      
26  Jonathan Steven Handelman Esq                            PO Box 335York,ME 03909      
27            Richard Smith Berne                 36 Ossipee Trl W.Standish,ME 04084      
28             Meredith G. Schmid             75 Pearl St.Suite 216Portland,ME 04101      
29                Gregory LeClerc           28 Long Sands Road, Suite 5York,ME 03909      
30                   Cory McKenna                      20 Mechanic StCamden,ME 04843      
31                Thomas P. Elias  P.O. Box 1049304 Han St. Suite 1KBangor,ME...      
32           Christopher  MacLean        1250 Forest Avenue, Ste 3APortland,ME 04103      
33               Zachary J. Smith      415 Congress StreetSuite 202Portland,ME 04101      
34                 Stephen Sweatt      919 Ridge RoadP.O. BOX 119Bowdoinham,ME 04008      
35           Michael Turndorf Esq        1250 Forest Avenue, Ste 3APortland,ME 04103      
36     Andrews Bruce Campbell Esq                   133 State StreetAugusta,ME 04330      
37                Timothy Zerillo               110 Portland StreetFryeburg,ME 04037      
38               Walter McKee Esq          440 Walnut Hill RdNorth Yarmouth,ME 04097      
39                 Shelley Carter                  70 State StreetEllsworth,ME 04605      
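
Note that in the output above the street and city run together ("119 Main StreetKennebunk"), because get_text(strip=True) joins the text fragments with no separator. One possible way around that is to pass a separator to get_text and then tidy each line. This is only a sketch: SAMPLE_HTML is an assumed stand-in for the real page markup, which may differ, and address_lines is a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Assumed markup for one address span; the live page may be structured differently
SAMPLE_HTML = (
    '<span class="-address -hide-landscape-tablet">\n'
    "\t\t\t88 Hammond Street<br/>\n"
    "\t\t\tBangor,\t\t\tME 04401\t\t\n"
    "</span>"
)


def address_lines(tag):
    """Return the address as a list of lines with whitespace collapsed."""
    # separator="\n" keeps a boundary between the fragments that
    # get_text(strip=True) would otherwise glue together
    text = tag.get_text(separator="\n", strip=True)
    return [" ".join(line.split()) for line in text.split("\n") if line.strip()]


soup = BeautifulSoup(SAMPLE_HTML, "html.parser")
span = soup.find("span", class_="-address -hide-landscape-tablet")
print(address_lines(span))
# ['88 Hammond Street', 'Bangor, ME 04401']
```

Joining the returned lines with ", " gives a single-line address, and joining with "\n" gives the two-line layout asked for in the question.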

For your second question, you can save them in a dictionary like this:

import re
import requests
from bs4 import BeautifulSoup

url = 'https://www.justia.com/lawyers/criminal-law/maine'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
# parse all names and save them in a list
lawyer_names = soup.find_all("a", "url main-profile-link")
lawyer_names = [name.find(text=True).strip() for name in lawyer_names]
# parse all addresses and save them in a list
lawyer_addresses = soup.find_all("span", "-address -hide-landscape-tablet")
lawyer_addresses = [re.sub(r'\s+', ' ', address.get_text(strip=True)) for address in lawyer_addresses]
# map names to addresses
lawyer_dict = dict(zip(lawyer_names, lawyer_addresses))
print(lawyer_dict)

Output dictionary:

{'Albert Hansen': '62 Portland Rd., Ste. 44Kennebunk, ME 04043',
 'Amber Lynn Tucker': '415 Congress St., Ste. 202P.O. Box 7542Portland, ME 04112',
 'Amy Fairfield': '10 Stoney Brook LaneLyman, ME 04002',
 'Andrews Bruce Campbell Esq': '919 Ridge RoadP.O. BOX 119Bowdoinham, ME 04008',
 'Bradford Pattershall Esq': 'Two C PlazaPO Box 4600Portland, ME 04112',
 'Christopher Causey Esq': '949 Main StreetSanford, ME 04073',
 'Cory McKenna': '75 Pearl St.Suite 216Portland, ME 04101',
 'David G. Webbert': '160 Capitol StreetP.O. Box 79Augusta, ME 04332',
 'David Nelson Wood Esq': '120 Main StreetSuite 110Saco, ME 04072',
 'Dylan R. Boyd': '6 City CenterSuite 301Portland, ME 04101',
 'Gregory LeClerc': '36 Ossipee Trl W.Standish, ME 04084',
 'Hunter J Tzovarras': '88 Hammond StreetBangor, ME 04401',
 'John S. Webb': '16 Middle StSaco, ME 04072',
 'John Simpson': '5 Island View DrCumberland Foreside, ME 04110',
 'Jonathan Steven Handelman Esq': '16 Union StreetBrunswick, ME 04011',
 'Luke Rioux Esq': '75 Pearl St. Suite 400Portland, ME 04101',
 'Mariah America Gleaton': '12 Silver StreetP.O. Box 559Waterville, ME 04903',
 'Meredith G. Schmid': 'PO Box 335York, ME 03909',
 'Michael Stephen Bowser Jr.': '37 Western Ave., Unit #307Kennebunk, ME 04043',
 'Michael Turndorf Esq': '415 Congress StreetSuite 202Portland, ME 04101',
 'Michele D L Kenney': '18 Market Square Suite 5Houlton, ME 04730',
 'Miklos Pongratz Esq': '76 Tandberg Trail (Route 115)Windham, ME 04062',
 'Mr. Richard Lyman Hartley': '15 Columbia Street, Ste. 301Bangor, ME 04401',
 'Neal L Weinstein Esq': '32 Saco AveOld Orchard Beach, ME 04064',
 'Peter J Cyr Esq': '85 Brackett StreetPortland, ME 04102',
 'Richard Regan': '4 Union Park RoadTopsham, ME 04086',
 'Richard Smith Berne': '482 Congress Street Suite 402Portland, ME 04101',
 'Robert Guillory Esq': '241 Main StreetP.O. Box 57Saco, ME 04072',
 'Robert Van Horn': '20 Oak StreetEllsworth, ME 04605',
 'Russell Goldsmith Esq': '647 U.S. Route One#203York, ME 03909',
 'Shelley Carter': '110 Portland StreetFryeburg, ME 04037',
 'Thaddeus Day Esq': '440 Walnut Hill RdNorth Yarmouth, ME 04097',
 'Thomas P. Elias': '28 Long Sands Road, Suite 5York, ME 03909',
 'Timothy Zerillo': '1250 Forest Avenue, Ste 3APortland, ME 04103',
 'Todd H Crawford Jr': '1288 Roosevelt Trl, Ste #.O. Box 753Raymond, ME 04071',
 'Walter McKee Esq': '133 State StreetAugusta, ME 04330',
 'Wayne Foote Esq': '344 Mount Hope Ave.Bangor, ME 04402',
 'Will Ashe': '192 Main StreetEllsworth, ME 04605',
 'William T. Bly Esq': '119 Main StreetKennebunk, ME 04043',
 'Zachary J. Smith': 'P.O. Box 1049304 Han St. Suite 1KBangor, ME 04401'}
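
One caveat with dict(zip(...)): a dictionary keeps only one entry per name, and zip silently stops at the shorter list. The table printed earlier on this page lists "William T. Bly Esq" and "John S. Webb" twice each, and it pairs 88 Hammond Street with a different name than the question expects, so the two lists may not always line up. A more defensive sketch keeps one record per row and checks the lengths first. pair_records is a hypothetical helper name, and the sample values are copied from that earlier table.

```python
def pair_records(names, addresses):
    """Pair names with addresses by position, refusing mismatched lengths."""
    if len(names) != len(addresses):
        raise ValueError(
            f"{len(names)} names but {len(addresses)} addresses; "
            "the two lists probably do not line up"
        )
    return [{"name": n, "address": a} for n, a in zip(names, addresses)]


# Sample values copied from the first three rows of the earlier output table
names = ["William T. Bly Esq", "John S. Webb", "William T. Bly Esq"]
addresses = [
    "119 Main StreetKennebunk,ME 04043",
    "949 Main StreetSanford,ME 04073",
    "20 Oak StreetEllsworth,ME 04605",
]

records = pair_records(names, addresses)
as_dict = dict(zip(names, addresses))

print(len(records))  # 3: every row is kept
print(len(as_dict))  # 2: the repeated name overwrote its earlier entry
```

A length check only catches lists of different sizes. If the page ever yields the same number of names and addresses in a different order, extracting both values from the same parent element for each lawyer would be the safer route.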
