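A while ago my manager asked me to look up information on some companies. I usually use Aiqicha (爱企查) for this, so I decided to write a program to do the lookup. Here is the code:
import json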
import requests
import re
from lxml import etree
# 公司名称 (the company name to search for) must be assigned a string before this line
url="https://aiqicha.baidu.com/s?q="+公司名称+"=0"
headers={"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/78.0.3904.108 Safari/537.36",
"Cookie": "BAIDUID=FCA8661E3619BECE060CC564924BCC62:FG=1; PSTM=1598866843; BIDUPSID=E0F38C456F9E422ADF83AC42B7D6101A; BDUSS=WQ0VGd1RFNjMmZsallMY2h0cHpxcGJ3UX4tc000d1RSU3RFaUt0eTE2R1VGSGhmSVFBQUFBJCQAAAAAAAAAAAEAAAA3fsVHxfSzzLrDs7UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJSHUF-Uh1BfO; BDUSS_BFESS=WQ0VGd1RFNjMmZsallMY2h0cHpxcGJ3UX4tc000d1RSU3RFaUt0eTE2R1VGSGhmSVFBQUFBJCQAAAAAAAAAAAEAAAA3fsVHxfSzzLrDs7UAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAJSHUF-Uh1BfO; BDPPN=4a85ba200a8603ef878bc33a1be441f3; log_guid=1a14b30029743b225cc8614df11b9eb2; H_PS_PSSID=7560_32606_1431_32045_32680_32116_31322_32691; BDORZ=B490B5EBF6F3CD402E515D22BCDA1598; BDSFRCVID=B_FOJeC627JtTMnro8G-M4zom7dhgP3TH6aogQEIojxEwhB2gJ6wEG0PeM8g0KAbDINlogKK3gOTH4PF_2uxOjjg8UtVJeC6EG0Ptf8g0M5; H_BDCLCKID_SF=tRKOoILKfIt3fP36qRQj-ICShUFs3qRlB2Q-5KL-JhcMSh6kK4PWQIuIjh6y26bb2IvToMbdJJjoeUjHytn82MLWM-KHKMIqb2TxoUJHBCnJhhvq-xOzX4AebPRiJ-b9Qg-JbpQ7tt5W8ncFbT7l5hKpbt-q0x-jLTnhVn0MBCK0HPonHjKKejoX3f; Hm_lvt_baca6fe3dceaf818f5f835b0ae97e4cc=1599189361,1599210076,1599439817,1599439901; BDRCVFR[feWj1Vr5u3D]=I67x6TjHwwYf0; delPer=0; PSINO=6; Hm_lpvt_baca6fe3dceaf818f5f835b0ae97e4cc=1599448132"}
res=requests.get(url,headers=headers)
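# strip the escaped slashes (\/) from the response; the raw string avoids an invalid-escape warning
res=res.text.replace(r'\/','')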
res=res.encode('utf-8').decode('unicode_escape')
# res=re.findall('{(.*?)}',res)
res=re.findall(r'{"pid":(.*?)}],',res)
# print(res)
for aa in res:
    # aa=aa.strip('')
    aa=aa.replace('','')  # no-op as written: replaces the empty string with itself
    aa=aa.replace('','')
    # print(aa)
    bb=re.findall(r'"entName":"(.*?)",',aa)   # company name
    cc=re.findall(r'"regCap":"(.*?)",',aa)    # registered capital
    bids=re.findall(r'"bid":"(.*?)",',aa)     # bid field of the record
    gongsiming={'username':'',
                'zijin':'',
                'dizhi':''}
    for ae,ac,bid in zip(bb,cc,bids):
        # print(ae,ac,bid)
        # if ae=="北京蜂盒科技有限公司":
        #     print(ac)
        # 'username' holds the company name, 'zijin' the registered capital, 'dizhi' the bid
        gongsiming={'username':ae,
                    'zijin':ac,
                    'dizhi':bid}
        # gongsiming['username']=ae
        # gongsiming['zijin']=ac
        # gongsiming['dizhi']=bid
        print(gongsiming)
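All I needed here was the company name and the registered capital; none of the other fields mattered, so I only did a simple extraction. If you want other information, just pick it out with another regular expression. As for why I used regular expressions: the page source is very complicated. I originally meant to use json, but I could not figure it out, and regular expressions give the same result.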
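
For comparison, here is a rough sketch of how the json route might look next to the regex route. It runs on a made-up sample string, not on the real page: the wrapper key "items" and all values are assumptions for illustration, and only the field names pid, entName, regCap and bid are taken from the regular expressions in the code. The real response may be shaped differently, so treat this as a starting point rather than a drop-in replacement.

```python
import json
import re

# Hypothetical payload: the "items" key and the values are invented for this
# example; only the field names (pid, entName, regCap, bid) come from the post.
SAMPLE = (
    r'{"items":[{"pid":"1","entName":"\u5317\u4eac Example Co",'
    r'"regCap":"100\u4e07","bid":"b1"},'
    r'{"pid":"2","entName":"Demo Ltd","regCap":"500\u4e07","bid":"b2"}]}'
)


def extract_with_json(text):
    """Parse the text as JSON and keep only name and registered capital."""
    data = json.loads(text)  # json.loads decodes \uXXXX escapes on its own
    return [{"username": r["entName"], "zijin": r["regCap"]}
            for r in data["items"]]


def extract_with_regex(text):
    """Same fields via regex, decoding the \\uXXXX escapes by hand first."""
    decoded = text.encode("utf-8").decode("unicode_escape")
    names = re.findall(r'"entName":"(.*?)"', decoded)
    caps = re.findall(r'"regCap":"(.*?)"', decoded)
    return [{"username": n, "zijin": c} for n, c in zip(names, caps)]


if __name__ == "__main__":
    for row in extract_with_json(SAMPLE):
        print(row)
```

On this sample both functions return the same list. The json version keeps each record's fields together, so a record with a missing field raises a KeyError instead of silently shifting the zip pairing the way separate findall calls can.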