Access weather data from a script?

  • Question
  • Updated 2 years ago
Has anybody figured out how to scrape weather data from the new system? Either from the SmartHUB directly, or from the new myacurite.com?

I had a Python script that successfully scraped acu-link.com. I'm trying to adapt that script to the new site.

dsaracin

  • 13 Posts
  • 2 Reply Likes

Posted 2 years ago


George D. Nincehelser

  • 6741 Posts
  • 1249 Reply Likes
It's not difficult to monitor the data from the SmartHUB. For example, you can use a Raspberry Pi with two Ethernet ports, bridge the ports, then use tcpflow to monitor the HTTP traffic.

All the data is now human-readable without knowing any arcane formulas.
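As a rough illustration of what "human-readable" means in practice, a captured HTTP request line can be split into named fields with nothing but the Python standard library. This is only a sketch: the request path and the field names below (`/weatherstation/update`, `id`, `tempf`, `humidity`) are made up for the example and are not taken from an actual SmartHUB capture.

```python
from urllib.parse import urlsplit, parse_qs


def parse_request_line(line: str) -> dict:
    """Return the query parameters of an HTTP request line as a flat dict."""
    parts = line.split()
    if len(parts) < 2:
        return {}  # not a request line we can use
    target = parts[1]  # e.g. "/path?key=value&..."
    query = urlsplit(target).query
    # parse_qs returns a list per key; keep only the first value of each.
    return {key: values[0] for key, values in parse_qs(query).items()}


# Hypothetical request line, for illustration only.
sample = "GET /weatherstation/update?id=HUB123&tempf=71.3&humidity=45 HTTP/1.1"
print(parse_request_line(sample))
```

The same function could be applied to each request line that tcpflow prints, assuming the hub sends its readings as URL query parameters.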

dsaracin

It is too difficult when the layout of your house prevents you from wiring your hub to your Pi ;) I barely got my wife to agree to put the hub in a semi-public space (needs to be there for sensor reception). No way I'm getting my Pi up there, too! ;)

Also, don't get me wrong, what you and others have done to intercept the communication from the hub to AcuRite's servers is impressive. Seriously. I'm glad it works for you. But IMO that and/or futzing with DNS are the wrong kind of hacks. A much better solution would be scraping the data directly. I had it working before. I'm hoping to get it working that way again.

George D. Nincehelser

No. It's not a man-in-the-middle attack. It's perfectly legitimate. That's basically how networking devices work: they read the data from a network interface, process it, then send it back out on another interface.

As for the limitation of the SmartHUB firmware, you might want to look up the spec sheets for the microcontroller inside.  There just isn't much in the way of resources to work with.

There's also the issue that the existing http server is in the boot firmware and can't be modified remotely.

dsaracin

It is not normal operating process for network devices to process the payload of a packet. They process the headers to route it. They do not inspect, let alone process, the payload. Doing that is the very definition of a MiTM attack.

From the first line of the Wikipedia page:

In cryptography and computer security, a man-in-the-middle attack (often abbreviated MitM, MiM attack, MitMA or the same using all capital letters) is an attack where the attacker secretly relays and possibly alters the communication between two parties who believe they are directly communicating with each other.

Kind of sounds like what you're doing. It's textbook, to be quite frank.

Anyway....  

As for modifying the firmware: I'll gladly admit that we're both speculating here, and I'll gladly admit that the device is most likely resource constrained. But, like I said, they already have an HTTP server. Adding the current sensor state can't cost much: just an extra column in an HTML page. That's got to be almost zero on top of what they're already doing.

I get what you have works, and I'm glad it does for you. And it's very clever: kudos for figuring it out. But you should at least be honest about what it is. That would hopefully increase the pressure on AcuRite to implement a real solution.

George D. Nincehelser

Being a network engineer and architect for many years, I'll have to disagree with your opinions.  We often get involved analyzing payload data to debug applications.

And I'm not speculating about the hardware or firmware.  Please do not speak for me about what I know.

dsaracin

Dude, this isn't personal. This is simply a fact: inspecting a cleartext HTTP request as it passes through your code is almost the very definition of a MiTM attack.

What would happen if your intercepting code modified the weather data as it intercepted it, then passed that new data along? What would the server show??? Your modified data. This is 100% MiTM.

That said, I suppose it's just a matter of perspective. I know what you mean about analyzing payload data to debug applications: Wireshark, etc. But the key difference there is that it's done to debug, not for normal operational reasons. And I guess that's the difference to me. I mean, have you really ever deployed something like this to a customer?

That said - I admit eavesdropping on the data as it passes through the network does seem more stable than what I'm proposing. More on this below...

George D. Nincehelser

Sorry, no.  It's not a man-in-the-middle attack.  

Yes, it's a much more stable technique than "screen scraping", and people have been reading bridge traffic this way for years. It all resides cleanly on your own network and doesn't abuse the resources of others.

tandy1000

  • 37 Posts
  • 3 Reply Likes
I would think that web scraping is pretty fragile unless they've provided an interface that's intended for getting data programmatically. What about publishing to Wunderground and using their API? Or an SDR with rtl_433, which might give you better range than the hub? Or just build a nice-looking box to hold the Pi and sit under the hub, to hide the wires and appease the wife. :)

dsaracin

Yeah, you're right: it would be fragile. Or, rather, it is fragile: my solution is currently broken while George's isn't ;).

I thought about Wunderground but I've got interior data, too, and that seems wonky to publish there.

I'm going to try to hack the page. I'll let you know how it goes. And if anybody else figures it out: please let me know.

Thanks all!

tandy1000

I agree, it would be odd to put interior sensor data on a weather site. I'd also argue that hacking the web page output is just as fishy as (if not more so than) intercepting traffic on your own local network. :) But that's a philosophical debate that probably won't get anyone anywhere.

Anyway, I use the packet-sniffing method, but I will likely pursue the rtl_433 solution as a backup in case our hubs ever start sending their data encrypted.
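For anyone curious what the rtl_433 route might look like, here is a minimal sketch. It assumes rtl_433 is run with `-F json`, which prints one JSON object per decoded transmission, and that AcuRite sensors show up with "Acurite" somewhere in the `model` field. The sample lines and the exact field names (`temperature_C`, `humidity`) are illustrative assumptions; check the real output for your sensors.

```python
import json


def acurite_readings(lines):
    """Yield decoded JSON events whose "model" field mentions Acurite."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip anything that is not valid JSON
        if isinstance(event, dict) and "acurite" in str(event.get("model", "")).lower():
            yield event


# Illustrative input only; real rtl_433 output may use different fields.
sample_lines = [
    '{"time": "2017-01-01 12:00:00", "model": "Acurite-Tower", "id": 1234, "temperature_C": 21.5, "humidity": 40}',
    '{"time": "2017-01-01 12:00:05", "model": "Some-Other-Sensor", "id": 7}',
    "not json at all",
]

for reading in acurite_readings(sample_lines):
    print(reading["model"], reading.get("temperature_C"), reading.get("humidity"))
```

In a live setup the same generator could be fed `sys.stdin` instead of the sample list, with rtl_433 piped into the script.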

George D. Nincehelser

Yep.  Screen scraping is fragile and not well regarded in the professional world.  

In the past, someone was trying to screen-scrape the old system every 5 seconds. The system just wasn't built for that, and it caused a lot of problems for everyone.

I'm not an advocate of the "encrypt everywhere" mentality, either. It makes it extremely hard to debug applications, let alone know what information might be leaking out of your home or business.

dsaracin

Somebody really tried to scrape the screen every 5 seconds? That's insane!