Context Files Losing Parameters

grantroot
Posts: 23
Member Since:
2008-07-17

I'm having trouble with the DND and CFWD status not syncing to the phone's LEDs, and the problem can be traced to data loss in the context files in /var/cache/aastra. As far as I noticed, this never happened during our testing phase, with a handful of phones (57i CT) polling the sync.php script every five minutes.

Three things changed at roughly the same time:
- The XML Scripts package was updated to 2.1.1.
- I changed the "action uri poll interval" to 60 seconds.
- We provisioned and deployed 110 phones.

Since that time, here's what happens:

1) A phone for extension 3333 (for example) is newly provisioned. It has both CFWD and DND keys defined in /var/cache/aastra/3333.context, and the LEDs work just fine when the DND or CFWD modes are activated.

2) Within a few minutes or hours, the 3333.context file "decays" and some of the features no longer sync. There's no random file corruption; the file format is still valid and readable -- but some of the parameters are just gone. Usually the "key" parameter in the [cfwd] section goes first, then later the "key" for the [dnd] section. Sometimes the whole [cfwd] section disappears. The other sections of the file (e.g. [daynight], [speed]) seem to persist with no problems.

Due to the precise nature of the parameter deletion, I believe that some sort of race condition in the reading and writing of the context files is causing partial files to be written out every so often. I don't think it's caused by my having the phones update every minute, but that's probably making it happen much sooner.
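
One way a race of this kind could leave a valid-looking file with whole sections missing is sketched below. It is a single-process simulation under stated assumptions, not the actual Aastra code: `read_sections` and `format_sections` are hypothetical helpers, the file is reduced to `[section]` / `data=` pairs, and the two "scripts" are interleaved by hand. It relies only on the fact that PHP's `fopen()` with mode "w" truncates the file as soon as it is opened.

```php
<?php
// Hypothetical helper: parse "[section]" / "data=..." pairs into an array.
function read_sections($file)
{
    $sections = array();
    $current = '';
    foreach (file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES) as $line) {
        if ($line[0] === '[') {
            $current = trim($line, '[]');
        } elseif (strpos($line, 'data=') === 0) {
            $sections[$current] = substr($line, 5);
        }
    }
    return $sections;
}

// Hypothetical helper: serialize the array back to the same text format.
function format_sections($sections)
{
    $out = '';
    foreach ($sections as $name => $data) {
        $out .= '[' . $name . "]\n" . 'data=' . $data . "\n";
    }
    return $out;
}

$file = tempnam(sys_get_temp_dir(), 'ctx');
file_put_contents($file, format_sections(array('cfwd' => 'A', 'dnd' => 'B', 'daynight' => 'C')));

// Script 1 reads the full file, then opens it with "w": the file is now empty.
$s1 = read_sections($file);
$s1['cfwd'] = 'A2';
$h1 = fopen($file, 'w');

// Script 2 reads inside that window and sees no sections at all.
$s2 = read_sections($file);
$s2['dnd'] = 'B2';

// Script 1 finishes its write; script 2 then overwrites it with what it saw.
fwrite($h1, format_sections($s1));
fclose($h1);
file_put_contents($file, format_sections($s2));

$survivors = array_keys(read_sections($file));
unlink($file);
echo implode(',', $survivors), "\n";
```

In this interleaving only the section written by the second script survives, while the file itself stays well-formed.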

Has anyone else seen this issue? Can anyone suggest a clean fix for it?

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



necits
Posts: 419
Member Since:
2008-02-23
I had the same issue.

I had the same issue. Cleared out /var/cache/aastra and reprovisioned phones. It hasn't happened since but I am getting ready to put this system into production and you have me a little worried now.

--

Michael Mathewson CCNA,MCSE
Owner/Consultant
Northeast CT IT Solutions



aastra1
Posts: 287
Member Since:
2006-11-06
Potential explanation

I will have a look at this problem, but keep in mind that polling is only needed to synchronize the phone's status for DND, CFWD, etc. when they can be activated by other means, such as a star code; daynight is a slightly different problem. In any case, 110 phones polling every minute makes an average of 330 script calls per minute (three scripts per phone), which means 5 to 6 calls per second to scripts that read and write the context files and also open the AGI.
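
The load estimate above works out as follows. This is only an arithmetic sketch; the variable names are illustrative, and the figure of three scripts per poll corresponds to the three sync scripts named in this thread (dnd.php, cfwd.php, daynight.php).

```php
<?php
$phones = 110;              // deployed phones
$scriptsPerPoll = 3;        // dnd.php, cfwd.php, daynight.php
$pollIntervalSeconds = 60;  // "action uri poll interval"

// Script calls hitting the server per minute and per second.
$callsPerMinute = $phones * $scriptsPerPoll * (60 / $pollIntervalSeconds);
$callsPerSecond = $callsPerMinute / 60;

echo $callsPerMinute, " calls/minute, ", $callsPerSecond, " calls/second\n";
```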
I see 2 possible sources of the problem:
- the AGI connection fails, which could translate into a wrong status
- the read/write of the context file is not protected by a locking mechanism, so the scripts may fail to complete the read/write and then corrupt the file.
I would bet on the second option. Since 2.3.0 the PhoneExecute items are asynchronous, so the 3 scripts are now launched at the same time after the action uri is triggered. Before 2.3.0 the operations were synchronous (dnd.php, then cfwd.php, then daynight.php), so there was no risk of two scripts accessing the same context file at once.
I will run some tests ASAP to reproduce this and will publish a fix. I might create 3 different context files, one for each application, so they don't interfere anymore, but first I have to make sure that this is the origin of the problem.
But again, I would like to understand why you need a sync every minute.

Stay tuned

aastra1

--

---
aastra1
Aastra XML scripts 2.3.0 now available



necits
Posts: 419
Member Since:
2008-02-23
I had the same problem occur

I had the same problem occur using the default poll interval of 1800. I also am only using three phones. Could the action uri xml sip notify and action uri poll be conflicting? Is the action URI poll even necessary now that the scripts use xml sip notify?

--

Michael Mathewson CCNA,MCSE
Owner/Consultant
Northeast CT IT Solutions



grantroot
Posts: 23
Member Since:
2008-07-17
Follow-up

Thanks for checking into this!

Actually, I don't need the updates to occur every minute. I just want to make sure the phones sync up fairly quickly, and that I can tell my users something reasonable like "within a minute" or "within five minutes".

6 requests a second is no problem for Apache, of course. I don't know the limitations of the AGI. As for the context files, any given file would only be opened three times per minute, right? Those three times would be very close together, but they'd be very close together no matter what my sync interval was.

I did not know about the PhoneExecute items being asynchronous with the new firmware. I'll bet you're right that it is the source of the trouble. Some sort of locking does seem to be needed.

My scenario is fairly repeatable at the moment, so I can do some testing for you if that helps. We're going live sometime this week, though, so if I don't get an official fix I'll probably have to put in some kind of ugly hack to tide me over. :-(

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



aastra1
Posts: 287
Member Since:
2006-11-06
Temporary fix

All,

I have reproduced the problem, which came from race conditions in the save_user_context function as the 3 scripts were trying to write at the same time.
Here is a quick fix for this issue; I will implement a better solution in the next version of the scripts.

In asterisk/cfwd.php
Line 80
Replace Aastra_save_user_context($user,'cfwd',$data);
By if($cf!=$last) Aastra_save_user_context($user,'cfwd',$data,'cf');
Line 115
Replace Aastra_save_user_context($user,'cfwd',$data);
By Aastra_save_user_context($user,'cfwd',$data,'cf');

In asterisk/daynight.php
Line 160
Replace Aastra_save_user_context($user,'daynight',$data);
By Aastra_save_user_context($user,'daynight',$data,'dn');
Line 205
Replace Aastra_save_user_context($user,'daynight',$data);
By if($night!=$last) Aastra_save_user_context($user,'daynight',$data,'dn');

In asterisk/dnd.php
Line 111
Replace Aastra_save_user_context($user,'dnd',$data);
By Aastra_save_user_context($user,'dnd',$data,'dnd');
Line 151
Replace Aastra_save_user_context($user,'dnd',$data);
By if($dnd!=$last) Aastra_save_user_context($user,'dnd',$data,'dnd');

In include/AastraCommon.php
Line 839
Replace function Aastra_get_user_context($user,$appli)
By function Aastra_get_user_context($user,$appli,$special='')
Line 845
Replace $file=AASTRA_PATH_CACHE.$user.".context";
By
if($special=='') $file=AASTRA_PATH_CACHE.$user.".context";
else $file=AASTRA_PATH_CACHE.$user."-".$special.".context";
Line 869
Replace function Aastra_save_user_context($user,$appli,$data)
By function Aastra_save_user_context($user,$appli,$data,$special='')
Line 875
Replace $file=AASTRA_PATH_CACHE.$user.".context";
By
if($special=='') $file=AASTRA_PATH_CACHE.$user.".context";
else $file=AASTRA_PATH_CACHE.$user."-".$special.".context";

I checked my fix with 5 phones polling every second. Since each application now has its own context file, updated only when needed, it should work. I will work on a more elegant solution for the future.
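
The file-name selection that the AastraCommon.php changes above add to both Aastra_get_user_context and Aastra_save_user_context can be sketched in isolation. This is a minimal sketch, not the library code: `demo_context_file` and `DEMO_PATH_CACHE` are hypothetical names, and the cache path (with a trailing slash) is an assumption based on the /var/cache/aastra directory mentioned in this thread.

```php
<?php
// Assumed stand-in for AASTRA_PATH_CACHE (trailing slash assumed).
define('DEMO_PATH_CACHE', '/var/cache/aastra/');

// Hypothetical helper mirroring the patched lines: an empty $special keeps
// the shared file, anything else selects a per-application file.
function demo_context_file($user, $special = '')
{
    if ($special == '') {
        return DEMO_PATH_CACHE . $user . '.context';
    }
    return DEMO_PATH_CACHE . $user . '-' . $special . '.context';
}

echo demo_context_file('3333'), "\n";
echo demo_context_file('3333', 'cf'), "\n";
echo demo_context_file('3333', 'dnd'), "\n";
```

Because each script then writes a different file, the writers no longer overwrite one another, provided that the reads use the same `$special` value as the saves.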

Can you implement the fix and let me know if everything is fixed?

Thanks for finding this one.

Regards

Aastra1

--

---
aastra1
Aastra XML scripts 2.3.0 now available



grantroot
Posts: 23
Member Since:
2008-07-17
Thanks!

Thanks for your work on this! I'll try it later today and see if it works for me.

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



grantroot
Posts: 23
Member Since:
2008-07-17
Problems, and Proposed Alternative

The solution proposed above has several problems.
1) (Serious) The saving is done to application-specific context files, but the reading is always from the main context file. Effectively, the saves don't do anything.
2) (Serious) If the reading *were* done from the application-specific context files, then startup.php would have to be altered to create those files in the first place.
3) (Minor) The "special" parameter seems to duplicate the application parameter. Seems like one should be enough.

I think that the solution *may* be much simpler: locking the context file properly while writing to it, so that other processes cannot read from it before it is completely written. Here's the proposed fix:

AastraCommon.php
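888,891c888,895
< foreach($array as $key=>$value)
< {
< fputs($handle,'['.$key.']'."\n");
< fputs($handle,'data='.$value['data']."\n");
---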
> if (flock($handle, LOCK_EX))
> {
> foreach($array as $key=>$value)
> {
> fputs($handle,'['.$key.']'."\n");
> fputs($handle,'data='.$value['data']."\n");
> }
> flock($handle, LOCK_UN);

(The tabs got lost in the above diff files, but you get the idea.)

I would also support your idea of writing out the file only when needed, like this:

cfwd.php
80c80
< Aastra_save_user_context($user,'cfwd',$data);
---
> if ($cf!=$last) Aastra_save_user_context($user,'cfwd',$data);

dnd.php
151c151
< Aastra_save_user_context($user,'dnd',$data);
---
> if($dnd!=$last) Aastra_save_user_context($user,'dnd',$data);

daynight.php
205c205
< Aastra_save_user_context($user,'daynight',$data);
---
> if($night!=$last) Aastra_save_user_context($user,'daynight',$data);

I have the first part of this fix in place now, and it seems to be working. I will test it overnight and see if the problem is solved, then report back.
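
The "write only when needed" idea can be sketched on its own. This is a minimal illustration with hypothetical names: `save_if_changed` is not part of the Aastra scripts, and the `$save` callback merely stands in for a call such as Aastra_save_user_context.

```php
<?php
// Hypothetical helper: invoke the save callback only when the polled status
// differs from the last stored one. Returns true when a write happened.
function save_if_changed($current, $last, $save)
{
    if ($current != $last) {
        call_user_func($save, $current);
        return true;
    }
    return false;
}

// Stand-in for the real save function: just count the writes.
$writes = 0;
$save = function ($status) use (&$writes) {
    $writes++;
};

// Five polls, during which the status changes twice.
$last = 0;
foreach (array(0, 0, 1, 1, 0) as $polled) {
    if (save_if_changed($polled, $last, $save)) {
        $last = $polled;
    }
}
echo $writes, " writes for 5 polls\n";
```

Skipping the unchanged polls shrinks the window in which concurrent writers can collide, but it does not remove the need for locking.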

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



aastra1
Posts: 287
Member Since:
2006-11-06
flock is probably the way to go

Grant,

You are right, I forgot to read the user context from the same 'special' file in each application, my bad. I did some tests with flock, but once in a while it failed because the lock was only taken when writing; we should also protect the reads.
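
Protecting the read side could look roughly like the sketch below. It is an illustration under assumptions, not the Aastra code: `read_locked` is a hypothetical helper that takes a shared lock (`LOCK_SH`) before reading. On systems with advisory locking, the lock only has an effect if the writers take `LOCK_EX` on the same file.

```php
<?php
// Hypothetical helper: read all lines of a file under a shared lock.
// Several readers may hold LOCK_SH at once; a writer holding LOCK_EX
// makes them wait until it has finished.
function read_locked($file)
{
    $lines = array();
    $handle = @fopen($file, 'r');
    if ($handle) {
        if (flock($handle, LOCK_SH)) {
            while (($line = fgets($handle)) !== false) {
                $lines[] = $line;
            }
            flock($handle, LOCK_UN);
        }
        fclose($handle);
    }
    return $lines;
}

$file = tempnam(sys_get_temp_dir(), 'ctx');
file_put_contents($file, "[dnd]\ndata=abc\n");
$lines = read_locked($file);
unlink($file);
$missing = read_locked($file); // the file is gone now: empty array, no warning
echo count($lines), " lines read\n";
```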

Let me know how your changes work for you and I will apply them to the code.

Thanks for your help.

Aastra1

--

---
aastra1
Aastra XML scripts 2.3.0 now available



grantroot
Posts: 23
Member Since:
2008-07-17
Fail

Hmmm. The change I made didn't work -- in fact, the context files got even more mangled.

I'm going to have to dig deeper into what's happening here. I'll report back when I come up with something.

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



aastra1
Posts: 287
Member Since:
2006-11-06
Two options to fix this issue

Hi Grant,

I can see 2 ways to fix this, since flock does not work:
1-Use the Asterisk database, rather than the context file, to store the last status for each application
2-Change sync.php to make the actions synchronous again, which means 'chaining' the calls to sync.php.

Let me know what you find on your side.

aastra1

--

---
aastra1
Aastra XML scripts 2.3.0 now available



grantroot
Posts: 23
Member Since:
2008-07-17
Success with flock

Got it!

You were most likely correct that the Aastra_readINIfile function needed to lock the context file as well, so I tried that, but with no luck. Then after some research and careful thought, I figured out what was wrong. A file cannot be locked until after it is opened. Opening it with mode "w" will immediately truncate the file contents, allowing other processes to read an empty file before the lock can be placed. The solution is to open the file with mode "r+" and then explicitly truncate the contents *after* the lock is in place. Here is the patch:

--- AastraCommon.php.orig	2008-10-28 13:11:52.000000000 -0400
+++ AastraCommon.php	2008-10-29 16:16:02.000000000 -0400
@@ -582,7 +582,21 @@
 ###################################################################################################
 function Aastra_readINIfile ($filename, $commentchar, $delim) 
 {
-$array1 = @file($filename);
+$array1 = array();
+$handle = @fopen($filename, "r");
+if ($handle)
+	{
+	if (flock($handle, LOCK_SH))
+		{
+		while (!feof($handle))
+			{
+			$array1[] = fgets($handle);
+			}
+		flock($handle, LOCK_UN);
+		}   
+	fclose($handle);
+	}
+
 $section = '';
 foreach ($array1 as $filedata) 
 	{
@@ -882,13 +896,18 @@
 $array[$appli]['data']=base64_encode(serialize($data));
 
 # Create cache file
-$handle = @fopen($file, "w");
+$handle = @fopen($file, "r+");
 if($handle)
 	{
-	foreach($array as $key=>$value)
+	if (flock($handle, LOCK_EX))
 		{
-		fputs($handle,'['.$key.']'."\n");
-		fputs($handle,'data='.$value['data']."\n");
+        	ftruncate($handle, 0);
+		foreach($array as $key=>$value)
+			{
+			fputs($handle,'['.$key.']'."\n");
+			fputs($handle,'data='.$value['data']."\n");
+			}
+		flock($handle, LOCK_UN);
 		}
 	fclose($handle);
 	}

This fix has been in place for the last 24 hours, and not one context file has lost parameters.
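
The difference between the two open modes described above can be seen in a small single-process demonstration. This is a sketch using a temporary file rather than a real context file; the variable names are illustrative.

```php
<?php
$original = "[cfwd]\ndata=abc\n";
$file = tempnam(sys_get_temp_dir(), 'ctx');

// Mode "w": the contents are gone as soon as fopen() returns,
// before any lock could possibly be taken.
file_put_contents($file, $original);
$handle = fopen($file, 'w');
clearstatcache();
$sizeAfterW = filesize($file);
fclose($handle);

// Mode "r+": the contents stay intact until we truncate under the lock.
file_put_contents($file, $original);
$handle = fopen($file, 'r+');
clearstatcache();
$sizeAfterRPlus = filesize($file);
if (flock($handle, LOCK_EX)) {
    ftruncate($handle, 0);               // safe: the exclusive lock is held
    fwrite($handle, "[cfwd]\ndata=xyz\n");
    fflush($handle);                      // flush before releasing the lock
    flock($handle, LOCK_UN);
}
fclose($handle);

$final = file_get_contents($file);
unlink($file);
echo $sizeAfterW, " bytes after 'w', ", $sizeAfterRPlus, " bytes after 'r+'\n";
```

With "w" a reader arriving right after the open sees an empty file; with "r+" it sees the old contents until the writer holds the lock and truncates.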

I also think I will add the tiny mods listed above to dnd.php, cfwd.php and daynight.php so that the context file will not be written to when the status hasn't changed. I'm pretty sure it introduces a theoretical opportunity for the context file to get out of sync, perhaps if a user presses a button at the same time as the polling occurs, so it may not be ideal for everyone. But I suspect the chances of that are minimal, and the fix is easy. Since it will reduce my server's I/O load by about 330 writes per minute, it's worth it to me.

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



aastra1
Posts: 287
Member Since:
2006-11-06
Great fix

Grant,

Thanks for your work, your fix is great and will be part of the next version.

Great job!

aastra1

--

---
aastra1
Aastra XML scripts 2.3.0 now available



grantroot
Posts: 23
Member Since:
2008-07-17
Fix Utilities

Thanks!

In case anyone else has had this happen with large numbers of phones and needs to fix the "key" parameters in the context files *without* hosing the other data, here is a simple way. There's a program and a script to run it against all the files. Note that the program has hard-coded values and will need to be changed based on your setup.

fixcontext.php

#!/usr/bin/php

fixcontext.sh (to be run from /var/www/html/aastra/asterisk)

#!/bin/bash

ls -1 /var/cache/aastra/*.context | cut -d/ -f5 | cut -d. -f1 | xargs ./fixcontext.php
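
For readers less familiar with `cut`, the pipeline above turns each path such as /var/cache/aastra/3333.context into the bare name 3333 before passing it to the program. A rough PHP equivalent of that extraction is sketched below; `extension_from_context_path` is a hypothetical name, and the sketch strips only the `.context` suffix, whereas `cut -d. -f1` cuts at the first dot.

```php
<?php
// Hypothetical helper: derive the user/extension name from a context path.
function extension_from_context_path($path)
{
    return basename($path, '.context');
}

$paths = array(
    '/var/cache/aastra/3333.context',
    '/var/cache/aastra/3334.context',
);
$extensions = array_map('extension_from_context_path', $paths);
echo implode(' ', $extensions), "\n";
```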

Hope this helps!

--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.



grantroot
Posts: 23
Member Since:
2008-07-17
Fix to the Fix -- VERY IMPORTANT!!!

D'oh! I missed something in the fix posted above.

In cases where the context file does not exist, opening the file with mode "r+" will not attempt to create it.

Instead, you should use mode "a+". The corrected patch is below:

--- AastraCommon.php.orig	2008-10-28 13:11:52.000000000 -0400
+++ AastraCommon.php	2008-11-06 15:48:20.000000000 -0500
@@ -582,7 +582,21 @@
 ###################################################################################################
 function Aastra_readINIfile ($filename, $commentchar, $delim) 
 {
-$array1 = @file($filename);
+$array1 = array();
+$handle = @fopen($filename, "r");
+if ($handle)
+	{
+	if (flock($handle, LOCK_SH))
+		{
+		while (!feof($handle))
+			{
+			$array1[] = fgets($handle);
+			}
+		flock($handle, LOCK_UN);
+		}   
+	fclose($handle);
+	}
+
 $section = '';
 foreach ($array1 as $filedata) 
 	{
@@ -882,13 +896,18 @@
 $array[$appli]['data']=base64_encode(serialize($data));
 
 # Create cache file
-$handle = @fopen($file, "w");
+$handle = @fopen($file, "a+");
 if($handle)
 	{
-	foreach($array as $key=>$value)
+	if (flock($handle, LOCK_EX))
 		{
-		fputs($handle,'['.$key.']'."\n");
-		fputs($handle,'data='.$value['data']."\n");
+        	ftruncate($handle, 0);
+		foreach($array as $key=>$value)
+			{
+			fputs($handle,'['.$key.']'."\n");
+			fputs($handle,'data='.$value['data']."\n");
+			}
+		flock($handle, LOCK_UN);
 		}
 	fclose($handle);
 	}
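
The behaviour behind this correction can be checked with a short sketch. It is an illustration with hypothetical names (`save_locked`, the temporary directory), not the patched Aastra function: with "r+" the open fails for a missing file, while "a+" creates it, and after the truncation under the lock the appended data starts at the beginning of the now-empty file.

```php
<?php
// Hypothetical helper: replace a file's contents under an exclusive lock,
// opening it with the given mode.
function save_locked($file, $contents, $mode = 'a+')
{
    $handle = @fopen($file, $mode);
    if (!$handle) {
        return false;                     // e.g. "r+" on a missing file
    }
    $ok = false;
    if (flock($handle, LOCK_EX)) {
        ftruncate($handle, 0);            // empty the file only once locked
        rewind($handle);
        $ok = (fwrite($handle, $contents) !== false);
        fflush($handle);
        flock($handle, LOCK_UN);
    }
    fclose($handle);
    return $ok;
}

$dir = sys_get_temp_dir() . '/ctx_demo_' . uniqid();
mkdir($dir);
$file = $dir . '/3333.context';

$withRPlus = save_locked($file, "[dnd]\ndata=abc\n", 'r+'); // file missing
$withAPlus = save_locked($file, "[dnd]\ndata=abc\n", 'a+'); // file created
$again     = save_locked($file, "[dnd]\ndata=xyz\n", 'a+'); // file replaced
$final = file_get_contents($file);

unlink($file);
rmdir($dir);
var_dump($withRPlus, $withAPlus, $final);
```

Newer PHP versions also document a "c" open mode that creates the file without truncating it, intended for exactly this lock-then-truncate pattern; whether it is available depends on the PHP version installed, so treat it as an option to verify rather than a drop-in change.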
--

Grant Root
MIS Supervisor, Dayton-Phoenix Group, Inc.


